Welcome to the GU ANNIS Web Interface
Legend for symbols
- Open access corpus, no password required
- Restricted access corpus, see here for obtaining a login
- Licensed for GU students and staff (you will still need a login)
- Limited access license, please inquire on a case-by-case basis
- corpus language
- number of tokens in corpus
- number of documents in corpus
Annotation layer symbols:
- POS tags / lemmas
- constituent trees
- dependency trees
- named entities / discourse referents
- coreference annotation
- discourse structure
- document structure
- semantic annotation
- error annotation
- alignment
- metaphor annotation
For all questions and details about obtaining a login to restricted corpora, see this page.
For larger/flat annotated corpora, also see our CQP web interface.
This page is maintained by the Corpus Linguistics lab, Corpling@GU.
Multilayer Corpora
GME - Graded Modal Expressions (GME - Graded Modal Expressions)
eng-us / 301,090 / 534
Georgetown University Multilayer Corpus (GUM)
eng-us / 228,399 / 235
OntoNotes 3.0 - WSJ section (OntoNotes)
eng-us / 370,789 / 597
OntoNotes 5.0 Chinese Dependencies (OntoNotes5_Chinese_dep)
zho / 1,050,841 / 2,036
OntoNotes 5.0 Coref Section (OntoNotes5_coref)
eng-us / 1,590,885 / 3,393
OntoNotes 5.0 Dependencies (OntoNotes5_dep)
eng-us / 2,589,499 / 12,721
The Potsdam Commentary Corpus Sampler (pcc2)
deu / 399 / 2
Gendered Ambiguous Pronouns (UA_English-GAP)
eng / 374,975 / 4,454
Arabic Treebank (Buckwalter vocalized) (arabic.treebank)
ara / 177,950 / 734
Chinese Treebank 9.0 (Chinese Treebank 9.0)
zho / 2,287,073 / 3,726
Corpus of Regional African American Language (CORAAL) (CORAAL)
eng-aae / 2,122,362 / 271
English Web Treebank (English.Web.Treebank)
eng-us / 272,779 / 1,174
English Web Treebank - Universal Dependencies (English.Web.Treebank_UD)
eng-us / 254,830 / 1,174
English Web Treebank - Universal Dependencies V2 (English.Web.Treebank_UD2)
eng-us / 254,829 / 1,174
Foreebank En - English Web Support Forum Treebank (Foreebank-en)
eng / 15,613 / 1
Foreebank Fr - French Web Support Forum Treebank (Foreebank-fr)
fra / 19,667 / 1
IAHLT UD Hebrew Treebank (IAHLT_HTB)
he / 155,919 / 203
Open American National Corpus - Manually Annotated Subcorpus (Court Transcripts) (MASC_court)
eng-us / 37,756 / 39
Switchboard Telephone Conversation Constituent Corpus (switchboard_const)
eng-us / 1,095,089 / 646
Switchboard Telephone Conversation Dependency Corpus (Switchboard (dep))
eng-us / 1,287,379 / 649
The Tiger Treebank version 2 (tiger2)
deu / 888,578 / 1,971
French and English Coreference Databases and Corpora (UA_French-Democrat1921)
fr / 284,885 / 126
Potsdam Commentary Corpus (UA_German-PCC)
deu / 33,222 / 176
UD Hebrew IAHLT Wikipedia section (UD_Hebrew-IAHLTwiki)
he / 140,949 / 39
UD Spanish AnCora (UD_Spanish-AnCora)
es / 559,782 / 1,635
UD Telugu MTG (UD_Telugu-MTG)
tel / 6,465 / 3
Spanish Universal Dependency Treebank 2.0 (unidep.es)
spa / 375,180 / 369
Japanese Universal Dependency Treebank 2.0 (unidep.jp)
jap / 80,172 / 80
Wall Street Journal Dependency Corpus (Wall Street Journal (dep))
eng-us / 1,173,766 / 2,312
Wall Street Journal Constituent Treebank (wsj.const_ptb)
eng-us / 1,209,785 / 2,235
CALLHOME Mandarin Telephone Conversation Treebank (zh.callhome.tb)
zho / 108,531 / 41
Xinhua Mandarin News Treebank (zh.xinhua.tb)
zho / 106,934 / 325
Historical Corpora
Penn Parsed Corpus of Early Modern English - Helsinki Subcorpus (PPCEME_helsinki)
eng-eme / 627,993 / 147
Penn Parsed Corpus of Early Modern English - Penn Subcorpus 1 (PPCEME_penn1)
eng-eme / 636,421 / 152
T-CODEX Tatian V2.1 (Tatian 2.1)
ohg / 11,295 / 2,030
TraCES Corpus of the Classical Ethiopic Language (Ge'ez) (Traces_SGML)
gez / 181,577 / 23
Parallel Corpora
SMULTRON Parallel Treebank Sampler (SMULTRON_Banana)
eng-us,deu / 3,782 / 2
Learner Corpora
CityU Corpus of Essay Drafts of English Language Learners (cityu-2007-08A)
eng-L2 / 600,031 / 1,018
CityU Corpus of Essay Drafts of English Language Learners (cityu-2007-08B)
eng-L2 / 1,173,329 / 1,696
CityU Corpus of Essay Drafts of English Language Learners (cityu-2008-09A)
eng-L2 / 3,428,414 / 3,872
CityU Corpus of Essay Drafts of English Language Learners (cityu-2008-09B)
eng-L2 / 2,151,821 / 4,046
CityU Corpus of Essay Drafts of English Language Learners (cityu-2009-10B)
eng-L2 / 424,841 / 532
The MERLIN corpus - L2 Czech (MERLIN_Czech)
cze-L2 / 79,969 / 441
The MERLIN corpus - L2 German (MERLIN_German)
deu-L2 / 154,335 / 1,033
The MERLIN corpus - L2 Italian (MERLIN_Italian)
ita-L2 / 107,211 / 813
Miscellaneous Corpora
VU Amsterdam Metaphor Corpus (VUAMC)
eng-uk / 238,905 / 117
Hausa Corpora
SFB632 A5 Hausa News Corpus (a5.hausa.news)
hau / 2,017 / 4
SFB632 A5 Hausa Film Corpus [Umarnin Uwa] (a5.hausa.umarnin.uwa_V2)
hau / 10,194 / 47
Discourse Treebanks
COVID Discourse Treebank (CovidDTB)
eng / 60,849 / 300
Georgetown Chinese Discourse Treebank (GCDT)
zho / 62,905 / 50
Instructional Discourse Treebank (Instr-DT)
eng / 56,337 / 176
The Penn Discourse Treebank 3.0 (PDTB)
eng-us / 1,156,308 / 2,161
RST Discourse Treebank (RST-DT)
eng-us / 203,352 / 385
RST Discourse Treebank (dependencies) (RST-DT_rsd)
eng-us / 203,352 / 385
RST Spanish Treebank (rst.spanish.treebank)
spa / 57,895 / 267
The Chinese Science Discourse Treebank (Sci-CDTB)
zh / 18,761 / 109
Science Discourse Treebank (SciDTB)
eng / 102,493 / 798
Coptic SCRIPTORIUM Corpora
Apophthegmata Patrum (apophthegmata.patrum)
cop / 12,117 / 94
Besa - Letters (besa.letters)
cop / 4,543 / 5
Coptic Universal Dependency Treebank (coptic.treebank)
cop / 55,016 / 80
Documentary Papyri (doc.papyri)
cop / 289 / 3
Dormition of John (dormition.john)
cop / 3,211 / 1
Canons of Apa Johannes (johannes.canons)
cop / 22,509 / 14
Life of Aphou (life.aphou)
cop / 4,848 / 2
Life of Cyrus (life.cyrus)
cop / 3,559 / 2
The History of Eustathius and Theopiste (life.eustathius.theopiste)
cop / 11,000 / 2
Life of John the Kalybites (life.john.kalybites)
cop / 8,373 / 2
Life of Longinus and Lucius (life.longinus.lucius)
cop / 11,903 / 5
Life of Onnophrius (life.onnophrius)
cop / 8,677 / 4
Life of Paul of Tamma (life.paul.tamma)
cop / 4,147 / 2
Life of Phib (life.phib)
cop / 4,691 / 2
Life of Pisentius (life.pisentius)
cop / 23,057 / 3
Coptic SCRIPTORIUM, Coptic Magical Papyri (magical.papyri)
cop / 578 / 4
Martyrdom of Victor (martyrdom.victor)
cop / 18,253 / 8
Mysteries of John the Evangelist (mysteries.john)
cop / 6,458 / 2
Instructions of Apa Pachomius (pachomius.instructions)
cop / 12,986 / 2
Coptic SCRIPTORIUM, Marcion (pistis.sophia)
cop / 39,271 / 8
Pistis Sophia (proclus.homilies)
cop / 5,214 / 2
Pseudo-Athanasius Discourses (pseudo.athanasius.discourses)
cop / 13,976 / 3
Pseudo-Basil of Caesarea Discourse (pseudo.basil)
cop / 3,837 / 1
Encomium on Victor (pseudo.celestinus)
cop / 23,584 / 3
Pseudo-Chrysostom (pseudo.chrysostom)
cop / 8,606 / 2
Pseudo-Ephrem Writings (pseudo.ephrem)
cop / 11,245 / 3
Encomium on Demetrius Archbishop of Alexandria (pseudo.flavianus)
cop / 7,639 / 2
Pseudo-Theophilus on the Cross (pseudo.theophilus)
cop / 4,974 / 4
Pseudo-Timothy of Alexandria Discourses (pseudo.timothy)
cop / 9,749 / 2
Sahidica Bible - 1 Corinthians (sahidica.1corinthians)
cop / 12,454 / 16
Sahidica Bible - Mark (sahidica.mark)
cop / 20,278 / 16
Sahidica Coptic New Testament (sahidica.nt)
cop / 248,718 / 259
The Book of Ruth (OT) of the Old and New Testament (sahidic.ot)
cop / 464,977 / 729
The Gospel of Mark (NT) of the Old and New Testament (sahidic.ruth)
cop / 3,503 / 4
Shenoute - Acephalous 22 (shenoute.a22)
cop / 8,351 / 6
Shenoute - Abraham Our Father: YA 535-40 (shenoute.abraham)
cop / 7,696 / 7
Shenoute - Some Kinds of People Sift Dirt (shenoute.dirt)
cop / 6,236 / 6
Shenoute - I See Your Eagerness (shenoute.eagerness)
cop / 18,368 / 17
Shenoute - Not Because a Fox Barks (shenoute.fox)
cop / 2,812 / 1
Shenoute - In the Night: BV278-282 (shenoute.night)
cop / 1,180 / 1
Shenoute - Because of You Too O Prince of Evil XH 185-194 (shenoute.prince)
cop / 4,613 / 2
Shenoute - Whoever Seeks God Will Find CZ 129-137 (shenoute.seeks)
cop / 2,195 / 1
Shenoute - God Says Through Those Who Are His: GF 259-262 (shenoute.those)
cop / 9,488 / 13
Shenoute - Unknown Work 5-1: GF 381-88 (shenoute.unknown5_1)
cop / 2,602 / 2