Large Scale Text Corpus for NLP Research
The ACL Anthology is a large scale text corpus for NLP research, containing over 50,000 papers on natural language processing and related topics.
Common Crawl is a non-profit organization that maintains an open repository of web crawl data, widely used as a large scale text corpus for NLP, with petabytes of data available for download.
The Natural Language Toolkit (NLTK) provides access to a range of text corpora for NLP, including the Brown Corpus, the Reuters Corpus, and a selection of Project Gutenberg texts.
The Stanford NLP Group provides a range of large scale text corpora for NLP research, including the Stanford Sentiment Treebank and the Stanford Question Answering Dataset.
This article discusses how to use Hadoop to analyze large scale text corpora for NLP, including how to preprocess and tokenize text data.
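As a rough illustration of the preprocessing and tokenization step in a MapReduce-style job, the sketch below mimics the map and reduce phases of a word count in plain Python. It does not call Hadoop itself. The names `tokenize`, `map_phase`, and `reduce_phase` are hypothetical, and the regex tokenizer is a simplifying assumption rather than the method of any particular article.

```python
import re
from collections import defaultdict
from itertools import chain

# Assumed tokenizer: lowercase alphanumeric runs, optionally with an
# apostrophe suffix (e.g. "isn't"). Real pipelines often use richer rules.
TOKEN_RE = re.compile(r"[a-z0-9]+(?:'[a-z]+)?")

def tokenize(line):
    """Lowercase a line and split it into simple word tokens."""
    return TOKEN_RE.findall(line.lower())

def map_phase(line):
    """Emit (token, 1) pairs for one input line, as a mapper would."""
    return [(token, 1) for token in tokenize(line)]

def reduce_phase(pairs):
    """Sum the counts for each token, as a reducer would."""
    counts = defaultdict(int)
    for token, n in pairs:
        counts[token] += n
    return dict(counts)

# Toy input standing in for lines of a much larger corpus.
lines = ["The corpus is large.", "The corpus isn't small."]
pairs = chain.from_iterable(map_phase(line) for line in lines)
counts = reduce_phase(pairs)
```

In an actual Hadoop Streaming setup the mapper and reducer would typically be separate scripts reading from standard input, with the framework handling the grouping of keys between the two phases.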
The Google Ngram Viewer is a tool built on a large scale text corpus, the Google Books Ngram data, that allows users to search and visualize the frequency of words and phrases in books over time.
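The idea of tracking phrase frequency over time can be sketched as follows. This is a minimal illustration on toy data, not the Ngram Viewer's actual implementation: the helper names `ngrams` and `relative_frequency_by_year` are hypothetical, and whitespace splitting is an assumed stand-in for real tokenization.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-token tuples in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def relative_frequency_by_year(docs, phrase):
    """Compute, per year, occurrences of `phrase` divided by the total
    number of n-grams of the same length.

    docs: iterable of (year, text) pairs (hypothetical input format).
    """
    target = tuple(phrase.lower().split())
    n = len(target)
    hits, totals = Counter(), Counter()
    for year, text in docs:
        grams = ngrams(text.lower().split(), n)
        totals[year] += len(grams)
        hits[year] += sum(1 for gram in grams if gram == target)
    # Skip years with no n-grams to avoid dividing by zero.
    return {year: hits[year] / totals[year]
            for year in sorted(totals) if totals[year]}

# Toy documents standing in for dated book text.
docs = [(1900, "the old sea and the old man"), (2000, "the new sea")]
freqs = relative_frequency_by_year(docs, "the old")
```

Normalizing by the total number of n-grams per year keeps years with more published text from dominating the trend.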
This research paper discusses the construction of large scale text corpora for NLP research, including the challenges and opportunities of working with big data.
This online course provides an introduction to NLP, including how to work with large scale text corpora and how to use popular NLP tools and techniques.