A simple configurable tool for manipulating dependency trees
DepEdit reads and writes files encoded in the CoNLL dependency format (10 columns). It's a simple Python script which can:
You can also import it into your projects as a preprocessing module.
For detailed instructions, please see the User Guide.
Here are some example scenarios in which DepEdit can be helpful:
DepEdit is a self-contained Python script that is compatible with Python 2.X and 3.X and only needs a configuration file to run.
You can download the script itself (just depedit.py, without installing), or optionally install it via pip and run it as a module in your project (see below). Command line usage is either file by file, or using a glob pattern (e.g. *.conll10), in which case output files are created with a configurable suffix such as '.depedit' before the extension:
> python depedit.py -c config_file.ini INPUT.conll10 > OUTPUT.conll10
> python depedit.py -c config_file.ini *.conllu
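As a rough illustration of the naming scheme described above (a suffix inserted before the file extension), here is a minimal sketch. The helper name `suffixed_output_name` is hypothetical and is not part of DepEdit; the script's own logic may differ in detail.

```python
import os


def suffixed_output_name(input_path, suffix=".depedit"):
    """Insert `suffix` before the extension of `input_path` (illustrative helper, not DepEdit's API)."""
    base, ext = os.path.splitext(input_path)
    return base + suffix + ext


# With the default suffix, corpus.conllu becomes corpus.depedit.conllu
print(suffixed_output_name("corpus.conllu"))
```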
Configuration files are text files with one instruction per line and optional blank lines and comments (beginning with ';' or '#'). Each instruction contains 3 columns, as in the following example:
;Connect nouns to a preceding article or possessive pronoun with the 'det' function
pos=/DT|PRP\$/;pos=/NNS?/	#1.#2	#2>#1;#1:func=det
;Change to-infinitive from aux to mark
text=/^[Tt]o$/&func=/aux/	none	#1:func=mark
The first column describes the tokens to be matched using regular expressions.
The middle column defines relationships between tokens. It refers to each token in the definition by number (#1, #2...) and specifies:
The third column specifies what to do if a rule matches:
It is also possible to define variables for frequently used (parts of) regular expressions.
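To make the file layout concrete, the following sketch splits a configuration text into its three columns while skipping blank lines and comments. It assumes the columns are separated by tabs and does not handle variable definitions. The function `parse_instruction` is a hypothetical name used for illustration, not DepEdit's own parser.

```python
def parse_instruction(line):
    """Return (nodes, rels, actions) for an instruction line, or None for blank lines and comments."""
    stripped = line.strip()
    # Comments begin with ';' or '#'; blank lines are ignored
    if not stripped or stripped.startswith((";", "#")):
        return None
    columns = stripped.split("\t")  # assumption: tab-separated columns
    if len(columns) != 3:
        raise ValueError("expected 3 tab-separated columns, got %d" % len(columns))
    return tuple(columns)


config_text = (
    ";Change to-infinitive from aux to mark\n"
    "\n"
    "text=/^[Tt]o$/&func=/aux/\tnone\t#1:func=mark\n"
)

# Keep only real instructions, dropping comments and blank lines
rules = [r for r in map(parse_instruction, config_text.splitlines()) if r is not None]
print(rules)
```

Each resulting tuple holds the token definitions, the relations between them, and the actions to apply, in that order.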
To import DepEdit into an existing project, you may want to install depedit as a module, rather than including depedit.py in your own codebase. You can install from PyPI via pip:
> pip install depedit
from depedit import DepEdit
infile = open("path/to/infile.txt")
config_file = open("path/to/config.ini")
d = DepEdit(config_file)
result = d.run_depedit(infile)
Alternatively, you can also create a configuration inside your module, without reading it from a text file. There are several ways of doing this, which all achieve the same result:
from depedit import DepEdit
d = DepEdit()
##############################
# Ways to add transformations:
##############################
# From a single string per instruction
d.add_transformation("pos=/V/\tnone\t#1:func=x")
# From args
d.add_transformation("pos=/V/\tnone\t#1:func=z", "pos=/V/\tnone\t#1:func=y")
# From a list
d.add_transformation(["pos=/V/\tnone\t#1:func=a", "pos=/V/\tnone\t#1:func=b"])
# From a dictionary
d.add_transformation({"nodes": "pos=/V/", "rels": "none", "actions": "#1:pos=a"})
If you are using DepEdit in a scholarly paper, please cite the following reference:
@InProceedings{PengZeldes2020,
  author    = {Siyao Peng and Amir Zeldes},
  title     = {All Roads Lead to {UD}: Converting {S}tanford and {P}enn Parses to {E}nglish {U}niversal {D}ependencies with Multilayer Annotations},
  booktitle = {Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions ({LAW}-{MWE}-{C}x{G}-2018)},
  year      = {2018},
  pages     = {167--177},
  address   = {Santa Fe, NM},
  url       = {https://www.aclweb.org/anthology/W18-4918}
}
© 2015-2021 Amir Zeldes. Code released under the Apache 2.0 License.