Our EMNLP 2024 paper presents a valuable genre-diverse PDTB-style dataset for English shallow discourse parsing across modalities, text types, and domains using a cascade of conversion modules leveraging enhanced RST annotations, thereby also enabling theoretical studies of discourse relation variation across frameworks
Research
-
GUM7 – four added genres, Wikification and more!
The first release of GUM series 7 now adds four more genres to our multilayer corpus, in addition to brand new annotation layers, corrections, and more. This post outlines the main changes and additions to the corpus.
More -
Entities in the Coptic Treebank
With the release of Version 2.6 of Universal Dependencies, our focus has shifted to handling Named and Non-Named Entity Recognition (NER/NNER) in Coptic data. As a result of intensive work by the Coptic Scriptorium team in the past few months,...
More -
New features in our NLP pipeline
-
A Neural Network Reads the Newspaper
... in search of discourse signals! We now know a lot about what cues people use to identify discourse relations, but can we teach computers to notice the same signals?
More -
What you say where - a discourse heatmap
Does discourse structure constrain where we talk about what? Research on recurring mentions within discourse graphs shows back-reference is sensitive to the reasons why sentences and groups of sentences are uttered. In the image above, ...
More