large text corpus for machine learning

huggingface.io tool

Large Text Corpus for Machine Learning

The Hugging Face Datasets library provides a wide range of large text corpora for machine learning, including the popular WikiText and BookCorpus datasets.

stanford.edu article

Machine Learning with Large Text Corpora

This course covers the fundamentals of machine learning with large text corpora, including topic modeling, sentiment analysis, and text classification.

commoncrawl.org official

Common Crawl: A Large Corpus of Web Pages

Common Crawl is a non-profit organization that provides a large corpus of web pages for machine learning and research, updated regularly.

meta.wikimedia.org article

The Wikipedia Corpus

The Wikipedia Corpus is a large text corpus based on Wikipedia articles, suitable for machine learning and natural language processing tasks.

arxiv.org research

Text Corpus for Machine Learning Research

This research paper presents a new large text corpus for machine learning research, focusing on low-resource languages and domains.

blog.google news

Google's Large Text Corpus for Machine Learning

Google has released a large text corpus for machine learning research, including a massive dataset of text from the web and books.

kaggle.com tool

Text Data for Machine Learning

Kaggle provides a wide range of text datasets for machine learning, including large corpora for text classification, sentiment analysis, and topic modeling.

youtube.com video

Large Text Corpus Analysis with Python

This video tutorial covers how to analyze large text corpora with Python, using popular libraries such as NLTK and spaCy.
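
As a rough illustration of the kind of corpus analysis the last result describes, the sketch below counts token frequencies while streaming text line by line, so the whole corpus never has to fit in memory. It is not taken from the video: it assumes a simple regex tokenizer from the standard library in place of NLTK or spaCy, and the names `tokenize`, `count_tokens`, and `read_lines` are hypothetical helpers chosen for this example.

```python
import re
from collections import Counter
from typing import Iterable, Iterator, List

# Assumption: a deliberately simple tokenizer. Lowercase alphanumeric runs,
# optionally followed by an apostrophe suffix (e.g. "don't").
# A real pipeline would likely use NLTK or spaCy here instead.
TOKEN_RE = re.compile(r"[a-z0-9]+(?:'[a-z]+)?")


def tokenize(line: str) -> List[str]:
    """Split one line of text into lowercase tokens."""
    return TOKEN_RE.findall(line.lower())


def count_tokens(lines: Iterable[str]) -> Counter:
    """Accumulate token frequencies over any iterable of lines."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(tokenize(line))
    return counts


def read_lines(path: str) -> Iterator[str]:
    """Lazily yield lines from a UTF-8 text file (hypothetical corpus path)."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        yield from handle


if __name__ == "__main__":
    # Tiny in-memory sample; for a real corpus, pass read_lines("corpus.txt").
    sample = ["The cat sat.", "the CAT ran"]
    for token, freq in count_tokens(sample).most_common(2):
        print(token, freq)
```

Because `count_tokens` accepts any iterable of strings, the same function works on a list in memory, a file streamed through `read_lines`, or records pulled from a dataset loader.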