machine learning large corpus dataset

stanford.edu research

Large Corpus Datasets for Machine Learning

Stanford University's Natural Language Processing Group provides large text corpora for machine learning research, including the Stanford Question Answering Dataset (SQuAD) and the Stanford Sentiment Treebank (SST).

kaggle.com tool

Machine Learning Datasets

Kaggle hosts text corpora such as the 20 Newsgroups dataset and the IMDB sentiment analysis dataset, which can be used for text classification and sentiment analysis.

nsf.gov official

Large-Scale Machine Learning

The National Science Foundation funds research in large-scale machine learning, including the development of new algorithms and techniques for processing large text corpora.

wikipedia.org article

The Wikipedia Corpus

The Wikipedia Corpus is a large text corpus containing the text of Wikipedia articles. It can be used for machine learning tasks such as text classification and named entity recognition.

google.io article

Large Corpus Dataset for Machine Translation

Google's Machine Translation team has released a large corpus for machine translation research, consisting of paired translations of text in multiple languages.

towardsdatascience.com article

Text Classification with Large Corpus Datasets

This article discusses using large text corpora for text classification, including pre-trained language models and transfer learning.

coursera.org video

Machine Learning with Large Datasets

This online course covers the basics of machine learning with large datasets, including data preprocessing, feature extraction, and model evaluation.
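
The three stages named in this description (data preprocessing, feature extraction, model evaluation) can be illustrated with a minimal Python sketch. Nothing here is taken from the course itself: the function names, the toy count-based classifier, and the tiny TRAIN and TEST sets are all hypothetical and chosen only to keep the example self-contained.

```python
import re
from collections import Counter


def preprocess(text):
    """Preprocessing: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def extract_features(tokens):
    """Feature extraction: bag-of-words counts (token -> frequency)."""
    return Counter(tokens)


def train(examples):
    """Toy model (an assumption for illustration): summed token counts per label."""
    totals = {}
    for text, label in examples:
        feats = extract_features(preprocess(text))
        totals.setdefault(label, Counter()).update(feats)
    return totals


def predict(model, text):
    """Pick the label whose training counts overlap most with the input."""
    feats = extract_features(preprocess(text))

    def score(label):
        return sum(count * model[label][tok] for tok, count in feats.items())

    # sorted() makes tie-breaking deterministic (alphabetical by label).
    return max(sorted(model), key=score)


def accuracy(model, examples):
    """Model evaluation: fraction of held-out examples predicted correctly."""
    correct = sum(predict(model, text) == label for text, label in examples)
    return correct / len(examples)


# Hypothetical toy data; a real corpus would be read from disk in batches.
TRAIN = [
    ("a great and moving film", "pos"),
    ("great acting, wonderful story", "pos"),
    ("a dull and boring film", "neg"),
    ("boring plot, terrible acting", "neg"),
]
TEST = [
    ("wonderful and moving", "pos"),
    ("terrible and dull", "neg"),
]

model = train(TRAIN)
print(accuracy(model, TEST))
```

On a genuinely large dataset the same three stages would typically stream documents rather than hold them in a list, and the count-based model would be replaced by a proper classifier; the separation into preprocess, extract_features, and accuracy is the part that carries over.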

commoncrawl.org research

The Common Crawl Dataset

The Common Crawl dataset is a large corpus of crawled web pages. It can be used for machine learning tasks such as text classification and information retrieval.