8 results · AI-generated index
huggingface.co
tool

Natural Language Processing Datasets

Explore a wide range of large datasets for NLP model training, including text classification, sentiment analysis, and machine translation.

ldc.upenn.edu
research

Linguistic Data Consortium

The Linguistic Data Consortium at the University of Pennsylvania offers a variety of large datasets for NLP research and model training, including speech and text corpora.

github.com
tool

NLP Dataset Collection

A curated collection of large NLP datasets for model training, covering tasks such as question answering, text summarization, and named entity recognition.

commoncrawl.org
official

Common Crawl

A non-profit organization that maintains an open repository of web crawl data for NLP model training, comprising petabytes of raw web pages and extracted text that are freely available for download.

nlp.stanford.edu
research

Stanford Natural Language Processing Group

The Stanford NLP Group provides a range of large datasets for NLP research and model training, including the Stanford Question Answering Dataset and the Stanford Sentiment Treebank.

datasetsearch.research.google.com
tool

Google Dataset Search

A search engine for datasets, including large datasets for NLP model training, with filters for data type, license, and more.

towardsdatascience.com
article

NLP Datasets for Machine Learning

An article discussing popular large datasets for NLP model training, including IMDB, 20 Newsgroups, and the WikiText dataset.

www.nist.gov
official

National Institute of Standards and Technology NLP Dataset

The National Institute of Standards and Technology provides a large dataset for NLP model training, focusing on text analysis and information retrieval.