Corpus description, PPCME2, release 5

General information

The PPCME2 text samples are based largely on the Middle English section of the Diachronic Part of the Helsinki Corpus of English Texts, with certain additions and deletions. The PPCME2 samples, however, are considerably larger than the Helsinki ones. For the earliest Helsinki time period, all texts are exhaustively sampled. For each later Helsinki time period, two texts were expanded to 50,000 words, and the remaining texts are represented by the Helsinki Corpus sample.

The current edition of the PPCME2 contains roughly 1.2 million words of running text. Each of the 58 text samples in the corpus is available in three forms: parsed, part-of-speech tagged, and unannotated text. In addition, there is a file with philological and bibliographical information about each text.

Helsinki periods

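The main Helsinki time periods are M1-M4, each covering approximately one hundred years. In addition, texts originally written in one period but whose earliest manuscript dates from a later period are given two-digit period designations; M23, for example, marks a text composed in the M2 period and preserved in a manuscript from the M3 period. Texts whose composition date is unknown are designated MX followed by the digit of the manuscript period. Table 1 is a list of all Helsinki periods as they appear in the corpus file names.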

Table 1: Helsinki periods
Period designation   Composition date   Manuscript date
MX1                  unknown            1150-1250
M1                   1150-1250          1150-1250
M2                   1250-1350          1250-1350
M23                  1250-1350          1350-1420
M24                  1250-1350          1420-1500
M3                   1350-1420          1350-1420
M34                  1350-1420          1420-1500
MX4                  unknown            1420-1500
M4                   1420-1500          1420-1500
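
As an illustration of how the period designations can be used, the following Python sketch stores Table 1 as a lookup table and reads the designation off a text name. It assumes that the designation is the part of the name after the last hyphen, as in the text names listed in this document (for example cmaelr3-m23); the names PERIODS and period_of are hypothetical and are not part of the corpus distribution.

```python
# Table 1 as a lookup table: designation -> (composition date, manuscript date).
# Each date is a (start, end) pair of years; None means the date is unknown.
PERIODS = {
    "MX1": (None, (1150, 1250)),
    "M1": ((1150, 1250), (1150, 1250)),
    "M2": ((1250, 1350), (1250, 1350)),
    "M23": ((1250, 1350), (1350, 1420)),
    "M24": ((1250, 1350), (1420, 1500)),
    "M3": ((1350, 1420), (1350, 1420)),
    "M34": ((1350, 1420), (1420, 1500)),
    "MX4": (None, (1420, 1500)),
    "M4": ((1420, 1500), (1420, 1500)),
}


def period_of(text_name):
    """Return the period designation encoded in a text name.

    Assumption: the designation is whatever follows the last hyphen,
    so 'cmancriw-1-m1' yields 'M1'.
    """
    suffix = text_name.rsplit("-", 1)[-1].upper()
    if suffix not in PERIODS:
        raise ValueError(f"no Helsinki period found in {text_name!r}")
    return suffix


if __name__ == "__main__":
    name = "cmaelr3-m23"
    composition, manuscript = PERIODS[period_of(name)]
    print(name, period_of(name), composition, manuscript)
```

Keeping the dates as pairs of years rather than strings makes it straightforward to filter texts by composition or manuscript date.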

Wordcount information

Wordcounts for the individual text samples, along with date and genre information, are given below. If the table is copied and pasted into a text file, the resulting file can be imported into other applications; the field separator is the space character. The wordcounts exclude punctuation and extralinguistic material such as page numbers or token ID numbers.

Text Date Genre Wordcount
1200-brut-m1 1190_1215 VERSE 41561
1490-caxton-benet-m4 1490 RULE 7864
cmaelr3-m23 c1400 RULE 16952
cmaelr4-m4 a1450 RELIG_TREATISE 11117
cmancriw-1-m1 c1230 RELIG_TREATISE 49211
cmancriw-2-m1 c1230 RELIG_TREATISE 15376
cmastro-m3 a1450_c1391 HANDBOOK_ASTRO 6847
cmayenbi-m2 1340 RELIG_TREATISE 45641
cmbenrul-m3 a1425 RULE 18224
cmboeth-m3 ?a1425_c1380 PHILOSOPHY 10203
cmbrut3-m3 c1400 HISTORY 50158
cmcapchr-m4 a1464 HISTORY 52506
cmcapser-m4 c1452 SERMON 1459
cmcloud-m3 a1425_?a1400 RELIG_TREATISE 15631
cmctmeli-m3 c1390 PHILOSOPHY/FICTION 16939
cmctpars-m3 c1390 RELIG_TREATISE 30259
cmearlps-m2 c1350 BIBLE 44598
cmedmund-m4 c1450_1438 BIOGRAPHY_LIFE_OF_SAINT 3831
cmedthor-m34 c1440_?1350 RELIG_TREATISE 13896
cmedvern-m3 c1390 RELIG_TREATISE 12798
cmequato-m3 c1392 HANDBOOK_ASTRO 6274
cmfitzja-m4 ?1495 SERMON 5808
cmgaytry-m34 c1440 SERMON 5207
cmgregor-m4 c1475 HISTORY 36671
cmhali-m1 c1225_?c1200 RELIG_TREATISE 8915
cmhilton-m34 a1450_a1396 RELIG_TREATISE 4906
cmhorses-m3 a1450 HANDBOOK_MEDICINE 6315
cminnoce-m4 1497 SERMON 4247
cmjulia-m1 c1225_?c1200 BIOGRAPHY_LIFE_OF_SAINT 7219
cmjulnor-m34 c1450_c1400 RELIG_TREATISE 5029
cmkathe-m1 c1225_?c1200 BIOGRAPHY_LIFE_OF_SAINT 9105
cmkempe-m4 c1450 RELIG_TREATISE 62926
cmkentho-m1 a1150_c1125 HOMILY 4272
cmkentse-m2 c1275 HOMILY 3504
cmlamb1-m1 a1225 HOMILY 6475
cmlambx1-mx1 a1225 HOMILY 20691
cmmalory-m4 a1470 ROMANCE 57393
cmmandev-m3 ?a1425_c1400 TRAVELOGUE 51556
cmmarga-m1 c1225_?c1200 BIOGRAPHY_LIFE_OF_SAINT 8539
cmmirk-m34 a1500_a1415 SERMON 57548
cmntest-m3 c1388 BIBLE 10986
cmorm-m1 ?c1200 HOMILY_POETRY 53474
cmotest-m3 a1425_a1382 BIBLE 9842
cmpeterb-m1 c1150 HISTORY 7310
cmpolych-m3 a1387 HISTORY 45769
cmpurvey-m3 c1388 RELIG_TREATISE 39454
cmreynar-m4 1481 FICTION 8775
cmreynes-m4 1470-1500 HANDBOOK_OTHER 8852
cmrollep-m24 a1450_?1348 RELIG_TREATISE 17850
cmrolltr-m24 c1440_a1349 RELIG_TREATISE 17611
cmroyal-m34 c1450_c1425 SERMON 6191
cmsawles-m1 c1225_?c1200 HOMILY 4318
cmsiege-m4 c1500 ROMANCE 7618
cmthorn-mx4 c1440 HANDBOOK_MEDICINE 5717
cmtrinit-mx1 a1225 HOMILY 41554
cmvices1-m1 a1225_c1200 RELIG_TREATISE 27521
cmvices4-m34 c1450_c1400 RELIG_TREATISE 7044
cmwycser-m3 c1400 SERMON 55666
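
The wordcount table above can be read with a few lines of Python once it has been pasted into a text file. The sketch below is only an illustration: the function name read_wordcounts is hypothetical, and a three-row excerpt of the table stands in for the pasted file. It relies on the fact that the fields are separated by spaces and that no field contains a space.

```python
import io
from collections import defaultdict


def read_wordcounts(lines):
    """Parse space-separated rows of the form: Text Date Genre Wordcount."""
    rows = []
    for line in lines:
        fields = line.split()
        # Skip blank lines, the header row, and anything without four fields.
        if len(fields) != 4 or fields[0] == "Text":
            continue
        text, date, genre, count = fields
        rows.append((text, date, genre, int(count)))
    return rows


# A short excerpt of the table, standing in for the pasted text file.
SAMPLE = """Text Date Genre Wordcount
cmkentse-m2 c1275 HOMILY 3504
cmlamb1-m1 a1225 HOMILY 6475
cmreynar-m4 1481 FICTION 8775
"""

rows = read_wordcounts(io.StringIO(SAMPLE))

# Sum the wordcounts per genre.
totals = defaultdict(int)
for _text, _date, genre, count in rows:
    totals[genre] += count

if __name__ == "__main__":
    for genre in sorted(totals):
        print(genre, totals[genre])
```

To process the full table, pass an open file object instead of the excerpt, for example `read_wordcounts(open("wordcounts.txt"))`, where the file name is whatever was chosen when saving the pasted table.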