Download
Multiple formats, open access
You can download the entire corpus or individual annotation layers in the formats listed below. Please make sure to read about reconstructing the Reddit token data: it is not included in the downloadable version, but it can be restored using a script. If you are interested in other subsets or formats of the data, please contact Amir Zeldes.
Format | Annotations |
---|---|
relANNIS3.3 | all (merged), for search with ANNIS |
PAULA XML | all (merged), in standoff XML |
TreeTagger/CWB/CQPWeb XML | token annotations and TEI markup, including sentence types and speakers |
Penn-style brackets | tokens, POS tags, constituent categories and PTB function labels |
CoNLL-U | UD dependencies, morphology, sentence types, speakers, entities, coreference, Wikification and RST dependencies |
CoNLL coreference format | untyped coreference and entities, excluding bridging relations |
WebAnno TSV3 format | typed coreference, including bridging, entity types, Wikification and information structure |
Enhanced Rhetorical Structure Theory | untokenized text with eRST analyses in .rs4 XML, lisp brackets, DISRPT formats and RST dependencies |
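If you work with the CoNLL-U files, the following minimal Python sketch reads the generic ten-column CoNLL-U layout defined by the Universal Dependencies specification. It is not an official GUM tool, and the function name `read_conllu` is hypothetical. It also assumes nothing about which GUM-specific annotations appear in comment lines or in the MISC column. It keeps `# key = value` comments as sentence metadata and leaves every column as a raw string.

```python
from typing import Dict, Iterable, Iterator, List

# The ten tab-separated columns defined by the CoNLL-U specification.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def read_conllu(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one sentence at a time as {"comments": {...}, "tokens": [...]}.

    Hypothetical helper: a generic CoNLL-U reader, not part of any GUM tooling.
    Multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "3.1") are kept
    as ordinary rows with their ID left as a string.
    """
    comments: Dict[str, str] = {}
    tokens: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line ends the current sentence.
            if tokens:
                yield {"comments": comments, "tokens": tokens}
            comments, tokens = {}, []
        elif line.startswith("#"):
            # Keep "# key = value" metadata; comments without "=" are ignored.
            key, sep, value = line[1:].partition("=")
            if sep:
                comments[key.strip()] = value.strip()
        else:
            cols = line.split("\t")
            if len(cols) != len(FIELDS):
                raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
            tokens.append(dict(zip(FIELDS, cols)))
    # Flush the last sentence if the input lacks a trailing blank line.
    if tokens:
        yield {"comments": comments, "tokens": tokens}


if __name__ == "__main__":
    # Invented two-token sentence, for illustration only.
    sample = [
        "# sent_id = example-1\n",
        "# text = Hello world\n",
        "1\tHello\thello\tINTJ\tUH\t_\t0\troot\t_\t_\n",
        "2\tworld\tworld\tNOUN\tNN\tNumber=Sing\t1\tvocative\t_\t_\n",
        "\n",
    ]
    for sent in read_conllu(sample):
        print(sent["comments"]["text"], [t["upos"] for t in sent["tokens"]])
```

To read a downloaded file, pass an open file handle, for example `open("doc.conllu", encoding="utf-8")` (the file name is a placeholder). The reader works line by line as a generator, so a large file does not need to fit in memory.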