large scale text corpus for nlp

www.aclweb.org research

Large Scale Text Corpus for NLP Research

The ACL Anthology is a digital archive of over 50,000 papers on natural language processing and related topics, and its full texts can serve as a large-scale corpus for NLP research.

commoncrawl.org tool

Common Crawl

Common Crawl is a non-profit organization that provides a large-scale corpus of web crawl data for NLP, with petabytes of data freely available for download.

www.nltk.org article

Natural Language Processing with Python

The Natural Language Toolkit (NLTK) provides access to a range of text corpora for NLP, including the Brown Corpus, the Reuters Corpus, and a selection of Project Gutenberg texts.

nlp.stanford.edu official

The Stanford Natural Language Processing Group

The Stanford NLP Group provides a range of datasets for NLP research, including the Stanford Sentiment Treebank and the Stanford Question Answering Dataset (SQuAD).

www.ibm.com article

Large Scale Text Analysis with Hadoop

This article discusses how to use Hadoop to analyze large-scale text corpora for NLP, including how to preprocess and tokenize text data.
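
The preprocess-and-tokenize step mentioned above can be sketched in plain Python. This is a minimal illustration in the map/reduce style that Hadoop uses, not the code from the IBM article: the function names `tokenize`, `map_phase`, and `reduce_phase`, the token pattern, and the sample lines are all assumptions made for the example.

```python
import re
from collections import Counter

# Assumed token pattern: runs of letters/digits, with an optional
# apostrophe suffix so that "don't" stays one token.
TOKEN_RE = re.compile(r"[a-z0-9]+(?:'[a-z]+)?")

def tokenize(line):
    """Preprocess (lowercase) a line and split it into word tokens."""
    return TOKEN_RE.findall(line.lower())

def map_phase(lines):
    """Emit (token, 1) pairs, the way a word-count mapper would."""
    for line in lines:
        for token in tokenize(line):
            yield token, 1

def reduce_phase(pairs):
    """Sum the counts per token, the way a word-count reducer would."""
    counts = Counter()
    for token, n in pairs:
        counts[token] += n
    return counts

# Hypothetical two-line corpus, used only to exercise the functions.
sample = [
    "Large corpora need preprocessing.",
    "Preprocessing large corpora takes time.",
]
word_counts = reduce_phase(map_phase(sample))
```

On a real cluster the mapper and reducer would run as separate processes over partitions of the corpus; here they are chained in one process only to show the data flow.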

books.google.com tool

The Google Ngram Viewer

The Google Ngram Viewer is a tool for searching the Google Books Ngram corpus, a large-scale text corpus for NLP. It lets users search and visualize the frequency of words and phrases in books over time.
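
The idea of tracking phrase frequency over time can be sketched as follows. This is a toy illustration, not Google's implementation: it assumes whitespace tokenization and that frequency means the share of a year's n-grams (of the same length) that match the phrase. The names `ngrams` and `relative_frequency_by_year` and the sample documents are hypothetical.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-token tuples from a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def relative_frequency_by_year(docs, phrase):
    """Map each year to the fraction of its n-grams equal to `phrase`.

    `docs` is an iterable of (year, text) pairs.
    """
    target = tuple(phrase.lower().split())
    n = len(target)
    hits = Counter()    # matching n-grams per year
    totals = Counter()  # all n-grams per year
    for year, text in docs:
        grams = ngrams(text.lower().split(), n)
        totals[year] += len(grams)
        hits[year] += sum(1 for g in grams if g == target)
    return {
        year: hits[year] / totals[year]
        for year in sorted(totals)
        if totals[year] > 0
    }

# Hypothetical two-document corpus, one document per year.
toy_docs = [
    (1990, "the neural network learns"),
    (2020, "neural network after neural network"),
]
freq = relative_frequency_by_year(toy_docs, "neural network")
```

Normalizing by the yearly total matters because the number of books differs greatly from year to year, so raw counts alone would mostly reflect corpus size.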

www.mit.edu research

Text Corpus Construction for NLP Research

This research paper discusses the construction of large-scale text corpora for NLP research, including the challenges and opportunities of working with big data.

www.coursera.org video

Introduction to Natural Language Processing

This online course provides an introduction to NLP, including how to work with large-scale text corpora and how to use popular NLP tools and techniques.