8 results · AI-generated index
nltk.org
tool

NLTK Data

The Natural Language Toolkit (NLTK) distributes a wide range of free text corpora for NLP tasks, including books (Project Gutenberg selections), news articles (the Reuters corpus), and web text.

arxiv.org
research

Large Text Corpus for NLP Research

This paper presents a large-scale text corpus for NLP research, containing over 100 million words from various sources, including books and articles.

commoncrawl.org
article

Common Crawl

Common Crawl is a non-profit organization that publishes a free, open corpus of crawled web pages for NLP research and development, with new crawls released regularly.

datasetsearch.research.google.com
tool

Google's Dataset Search

Google's Dataset Search is a search engine that indexes datasets published across the web, including a wide range of free and open text corpora for NLP.

nlp.stanford.edu
official

The Stanford Natural Language Processing Group

The Stanford NLP Group provides a range of free resources for NLP research and development, including text corpora such as the Stanford Question Answering Dataset (SQuAD).

huggingface.co
tool

Hugging Face Datasets

Hugging Face Datasets is a platform that hosts a wide range of text corpora for NLP tasks, including datasets for language modeling and sentiment analysis.

wikipedia.org
article

The Wikipedia Corpus

The Wikipedia Corpus is a large corpus of text from Wikipedia articles, available for free download and use in NLP research and development.

ldc.upenn.edu
official

Linguistic Data Consortium

The Linguistic Data Consortium (LDC), hosted by the University of Pennsylvania, distributes a wide range of linguistic resources, including text corpora, for NLP research and development. Many of its corpora require membership or a license fee.