Our EMNLP 2024 paper presents a valuable genre-diverse PDTB-style dataset for English shallow discourse parsing across modalities, text types, and domains, built with a cascade of conversion modules that leverage enhanced RST annotations, thereby also enabling theoretical studies of discourse relation variation across frameworks.
P.Duk. inv. 282 fr. B verso
In our Machine Learning for Ancient Languages (ML4AL) workshop paper, we present a bidirectional RNN model for predicting Coptic characters in manuscript lacunae and use it to rank the likelihood of various textual reconstructions. A live demo of our models is available here!
Our EACL 2024 paper promotes a strict definition of entity salience by presenting GUMsley, a 12-genre challenge dataset for entity salience evaluation, and shows that adding salient entities to summarization models yields higher-quality summaries with fewer hallucinated entities.
Check out our AACL-IJCNLP 2023 paper about incorporating singletons and mention-based features to improve coreference generalization.
Our SIGDIAL 2023 paper on English RST parsing errors examines and models some of the factors associated with parsing difficulties.
Our LAW-XVII 2023 (co-located with ACL 2023) paper on a Genre-Diverse Multilayer Challenge Set for English NLP and Linguistic Evaluation presents GENTLE, a new mixed-genre English challenge corpus totaling 17K tokens and consisting of 8 unusual text types for out-of-domain evaluation. It is openly released as part of Universal Dependencies version 2.12, available here.
Our ACL 2023 Findings paper on Multi-Genre Data and Evaluation for English Abstractive Summarization presents a 12-genre challenge set for English abstractive summarization (the extreme summarization task) following both general and genre-specific guidelines.
Our EACL 2023 paper presents a thorough investigation of RST generalizability issues, focusing on the impact of data diversity, and promotes multi-genre benchmarks for RST parsing based on our experimental results.
Check out our ACL paper about generalization in SOTA coreference resolution, including the new OntoGUM dataset for evaluation.
Please join us online for Digital Coptic 3, the virtual workshop for DH projects on Coptic!
Would you like to have more data to work with? Check out our LREC paper, where we present a freely available, genre-balanced English web corpus totaling 4M tokens and featuring a large number of high-quality automatic annotation layers, including dependency trees, non-named entity annotations, coreference resolution, and discourse trees in Rhetorical Structure Theory.
2020-06-07
Shabnam and Amir's paper on Reddit part-of-speech tagging was accepted to WAC-XII.
2020-05-18
Thoughts on how to treebank social media? Read our LREC paper.
Logan and Amir will present a paper on converting Stanford Dependencies to Universal Dependencies using a multilayer corpus at the LAW-MWE-CxG-2018 workshop at COLING 2018.
If you have any questions or feedback, please let us know! If you'd like to join us: we accept new PhD and Master's students every year; please contact Amir Zeldes for more information.