Natural Language Processing Datasets
Discover and download popular NLP datasets, including large corpora for text classification, sentiment analysis, and language modeling.
Research paper discussing the importance of large datasets in NLP, highlighting popular corpora such as Common Crawl and Wikipedia.
Comprehensive collection of NLP datasets, including the Brown Corpus, Penn Treebank, and WordNet, for use in natural language processing tasks.
Research group focused on NLP, with resources and datasets for tasks such as sentiment analysis, question answering, and machine translation.
Non-profit organization providing a large corpus of web pages for use in NLP research, with over 25 terabytes of data available.
Book chapter discussing the use of large datasets in NLP, with examples using popular libraries such as NLTK and spaCy.
Repository of linguistic datasets, including corpora for speech recognition, machine translation, and text summarization.
Search engine for datasets, with a large collection of NLP datasets, including those for text classification, sentiment analysis, and language modeling.