Natural Language Processing Datasets
Explore a wide range of large datasets for NLP model training, including text classification, sentiment analysis, and language translation.
The Linguistic Data Consortium at the University of Pennsylvania offers a variety of large datasets for NLP research and model training, including speech and text corpora.
A curated collection of large NLP datasets for model training, covering tasks such as question answering, text summarization, and named entity recognition.
A non-profit organization providing a large dataset of web pages for NLP model training, with over 25 terabytes of text data available for download.
The Stanford NLP Group provides a range of large datasets for NLP research and model training, including the Stanford Question Answering Dataset and the Stanford Sentiment Treebank.
A search engine for datasets, including large datasets for NLP model training, with filters for data type, license, and more.
An article discussing popular large datasets for NLP model training, including IMDB, 20 Newsgroups, and the WikiText dataset.
The National Institute of Standards and Technology provides a large dataset for NLP model training, focusing on text analysis and information retrieval.