8 results · AI-generated index
huggingface.co
tool

Natural Language Processing Datasets

Explore a wide range of large text datasets for NLP tasks, including text classification, language modeling, and question answering.

ldc.upenn.edu
research

Linguistic Data Consortium

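The Linguistic Data Consortium is an open consortium of universities, libraries, corporations, and government research laboratories, hosted by the University of Pennsylvania, that creates and distributes language resources for NLP research, including the Penn Treebank text corpus and the Switchboard speech corpus.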

commoncrawl.org
article

Common Crawl

Common Crawl is a non-profit organization that provides a large, freely available corpus of web pages for NLP research and development.

nlp.stanford.edu
official

Stanford Natural Language Processing Group

The Stanford NLP Group provides access to a variety of large text datasets, including the Stanford Question Answering Dataset and the Stanford Sentiment Treebank.

datasetsearch.research.google.com
tool

Google Dataset Search

Google Dataset Search is a search engine for datasets, including large text datasets for NLP tasks such as language modeling and text classification.

www.nih.gov
official

National Institutes of Health (NIH) NLP Dataset

The NIH provides a large text dataset for NLP research, including clinical notes and medical literature, to support the development of NLP models for healthcare applications.

www.kaggle.com
news

Kaggle NLP Competitions

Kaggle hosts a variety of NLP competitions built on large text datasets, covering tasks such as text classification and language modeling.

www.aclweb.org
research

The ACL Anthology

The ACL Anthology is a digital archive of papers and proceedings from the Association for Computational Linguistics, including research on large text datasets for NLP tasks.