Large Text Datasets for NLP

huggingface.co tool

NLP Datasets

The Hugging Face Datasets library provides access to a wide range of large text datasets for NLP tasks, including question answering, text classification, and language modeling.

arxiv.org research

Large Text Datasets for Natural Language Processing

This research paper discusses the importance of large text datasets in NLP and surveys popular datasets in the field, including Common Crawl and Wikipedia.

kaggle.com tool

Natural Language Processing Datasets

Kaggle provides a variety of large text datasets for NLP tasks, including text classification, sentiment analysis, and machine translation, along with notebooks (formerly called kernels) and competitions for practicing and improving your skills.

ldc.upenn.edu official

LDC Data Catalog

The Linguistic Data Consortium (LDC) at the University of Pennsylvania offers a wide range of large text datasets, including the Gigaword dataset and the TDT5 dataset, for use in NLP research and development.

github.com tool

NLP Dataset Collection

This GitHub repository provides a collection of links to large text datasets for NLP tasks, including datasets for language modeling, text classification, and question answering, along with scripts to download and preprocess the data.
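The repository's own preprocessing scripts are not reproduced here. As a hypothetical illustration of one typical step, the sketch below normalizes raw text (Unicode NFKC, lowercasing, whitespace collapsing) and drops exact duplicates. The function names `normalize` and `dedupe` are assumptions made for this example, and real pipelines often add near-duplicate detection on top of this.

```python
import hashlib
import re
import unicodedata
from typing import Iterable, Iterator

# Matches any run of whitespace (spaces, tabs, newlines).
_WHITESPACE = re.compile(r"\s+")


def normalize(text: str) -> str:
    """NFKC-normalize, collapse whitespace runs to one space, and lowercase."""
    text = unicodedata.normalize("NFKC", text)
    return _WHITESPACE.sub(" ", text).strip().lower()


def dedupe(docs: Iterable[str]) -> Iterator[str]:
    """Yield normalized documents, skipping empty ones and exact duplicates.

    Only a 16-byte BLAKE2b digest is stored per unique document,
    not the document text itself.
    """
    seen = set()
    for doc in docs:
        clean = normalize(doc)
        if not clean:
            continue
        digest = hashlib.blake2b(clean.encode("utf-8"), digest_size=16).digest()
        if digest in seen:
            continue
        seen.add(digest)
        yield clean


if __name__ == "__main__":
    raw = ["Hello  World", "hello world", "", "\ufb01ne"]  # \ufb01 is the "fi" ligature
    print(list(dedupe(raw)))  # ['hello world', 'fine']
```

Hashing the normalized text rather than the raw text means documents that differ only in case, spacing, or Unicode compatibility forms count as duplicates.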

stanford.edu edu

Text Datasets for NLP

The Stanford Natural Language Processing Group provides a list of large text datasets for NLP research, including the Stanford Question Answering Dataset (SQuAD) and the Stanford Sentiment Treebank.
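SQuAD is distributed as nested JSON (`data` → `paragraphs` → `qas` → `answers`), where each answer carries its text and a character offset `answer_start` into the paragraph's `context`. The sketch below flattens that layout into one record per answer. It assumes the file has already been read with `json.load`; the names `QAExample` and `iter_squad_examples` are hypothetical, and the sample is hand-written, not real SQuAD data.

```python
import json
from typing import Iterator, NamedTuple


class QAExample(NamedTuple):
    """One flattened question-answer pair from a SQuAD-style file."""
    qid: str
    context: str
    question: str
    answer_text: str
    answer_start: int


def iter_squad_examples(squad: dict) -> Iterator[QAExample]:
    """Walk data -> paragraphs -> qas -> answers and yield one record per answer.

    Questions with an empty answer list (as with SQuAD 2.0's unanswerable
    questions) yield nothing. Files that list several reference answers per
    question yield one record for each of them.
    """
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                for answer in qa.get("answers", []):
                    yield QAExample(
                        qid=qa["id"],
                        context=context,
                        question=qa["question"],
                        answer_text=answer["text"],
                        answer_start=answer["answer_start"],
                    )


if __name__ == "__main__":
    sample = json.loads("""
    {"version": "1.1",
     "data": [{"title": "Example",
               "paragraphs": [{"context": "The cat sat on the mat.",
                               "qas": [{"id": "q1",
                                        "question": "Where did the cat sit?",
                                        "answers": [{"text": "on the mat",
                                                     "answer_start": 12}]}]}]}]}
    """)
    for ex in iter_squad_examples(sample):
        # answer_start is a character offset into the context string.
        span = ex.context[ex.answer_start:ex.answer_start + len(ex.answer_text)]
        assert span == ex.answer_text
        print(ex.question, "->", ex.answer_text)
```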

aclweb.org article

Large-Scale Text Datasets for NLP

This article discusses the challenges and opportunities of working with large-scale text datasets in NLP, including the need for efficient data processing and storage solutions.
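One common way to handle the processing side of this is to stream a corpus instead of loading it whole. The minimal sketch below is not taken from the article: it assumes a plain-text corpus with one document per line, and the function names `iter_batches` and `count_tokens` are hypothetical. Lines are read lazily and grouped into fixed-size batches, so memory use is bounded in the number of lines held at once (the token counter itself still grows with vocabulary size).

```python
import io
from collections import Counter
from typing import Iterable, Iterator, List


def iter_batches(lines: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Group a (possibly huge) stream of lines into lists of batch_size lines.

    Only one batch is held in memory at a time.
    """
    batch: List[str] = []
    for line in lines:
        batch.append(line.rstrip("\n"))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, partially filled batch
        yield batch


def count_tokens(lines: Iterable[str], batch_size: int = 10_000) -> Counter:
    """Count whitespace-separated tokens over a line stream, batch by batch."""
    counts: Counter = Counter()
    for batch in iter_batches(lines, batch_size):
        for line in batch:
            counts.update(line.split())
    return counts


if __name__ == "__main__":
    # io.StringIO stands in for open("corpus.txt", encoding="utf-8");
    # iterating a real file object is also lazy, one line at a time.
    corpus = io.StringIO("the cat sat\nthe dog sat\nthe end\n")
    print(count_tokens(corpus, batch_size=2).most_common(2))  # [('the', 3), ('sat', 2)]
```

Batching is not needed for counting alone, but the same generator can feed batched steps such as tokenization or writing output shards.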

data.gov gov

NLP Data Repository

The US government's open data repository provides a collection of large text datasets for NLP tasks, drawn from government agencies and other sources, available for download and use in research and development.