8 results · AI-generated index
arxiv.org
research

Large Language Models: A Survey

This survey provides an overview of large language models, including their data requirements, architecture, and applications. The authors discuss the importance of high-quality training data for achieving state-of-the-art results.

huggingface.io
article

Data Requirements for Large Language Models

This article discusses the data requirements for training large language models, including the types of data needed, data preprocessing, and data augmentation techniques. The authors provide examples of popular datasets used for training language models.

github.com
tool

Large Language Model Training Data

This repository provides a collection of datasets and tools for training large language models. The repository includes datasets such as Common Crawl, Wikipedia, and BookCorpus, as well as scripts for data preprocessing and augmentation.

aclweb.org
research

The Importance of Data Quality for Large Language Models

This paper argues that high-quality training data is essential for achieving state-of-the-art results with large language models, and it provides guidelines for evaluating and improving data quality.

nsf.gov
official

Data Requirements for Large Language Models: A Government Perspective

This report discusses the data requirements for large language models from a government perspective. The report highlights the need for high-quality training data and provides recommendations for funding and supporting data collection and curation efforts.

youtube.com
video

Large Language Models: Data, Compute, and Ethics

This video lecture discusses the data requirements for large language models, as well as the computational resources and ethical considerations involved in training them. The speaker provides an overview of the current state of large language models and of future directions.

stanford.edu
article

Data Curation for Large Language Models

This course provides an overview of data curation for large language models, including data collection, preprocessing, and augmentation. The course covers topics such as data quality, data privacy, and data ethics.

ieee.org
news

Large Language Model Data Requirements: A Survey of Industry Practices

This survey provides an overview of industry practices for data collection and curation for large language models. The authors discuss the types of data used, data preprocessing techniques, and data augmentation methods, as well as challenges and future directions.