8 results · AI-generated index
huggingface.co
tool

Natural Language Processing Datasets

Discover and download popular NLP datasets, including large corpora for text classification, sentiment analysis, and language modeling.

mit.edu
research

Big Data for Natural Language Processing

Research paper discussing the importance of large datasets in NLP, highlighting popular corpora such as Common Crawl and Wikipedia.

nltk.org
official

NLTK Data: Corpora and Lexicons

Collection of corpora and lexical resources for use with NLTK, including the Brown Corpus, a sample of the Penn Treebank, and WordNet.

stanford.edu
article

The Stanford Natural Language Processing Group

Research group focused on NLP, with resources and datasets for tasks such as sentiment analysis, question answering, and machine translation.

commoncrawl.org
article

Common Crawl: A Large Corpus of Web Pages

Non-profit organization that crawls the web and provides free, openly available archives of web pages, widely used in NLP research; the corpus spans petabytes of data across its periodic crawls.

oreilly.com
article

Natural Language Processing with Python

Book on natural language processing in Python, covering how to access and work with text corpora, with examples using the NLTK library.

ldc.upenn.edu
official

Linguistic Data Consortium: NLP Datasets

Repository of linguistic datasets, including corpora for speech recognition, machine translation, and text summarization.

datasetsearch.research.google.com
tool

Google Dataset Search: NLP Datasets

Search engine that indexes dataset descriptions published across the web, useful for finding NLP datasets for text classification, sentiment analysis, and language modeling.