A simple configurable tool for manipulating dependency trees
DepEdit reads and writes files encoded in the CoNLL dependency format (10 columns). It's a simple Python script which can:
You can also import it into your projects as a preprocessing module.
For detailed instructions, please see the User Guide.
Here are some example scenarios in which DepEdit can be helpful:
DepEdit is a self-contained Python script that is compatible with Python 2.X and 3.X and only needs a configuration file to run.
You can download the script itself (just depedit.py, without installing), or optionally install it via pip and run it as a module in your project (see below). Command line usage is either file by file, or using a glob pattern (e.g. *.conll10), in which case output files are created with a configurable suffix such as '.depedit' before the extension:
> python depedit.py -c config_file.ini INPUT.conll10 > OUTPUT.conll10
> python depedit.py -c config_file.ini *.conllu
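The batch-mode naming described above can be pictured with a short sketch. The helper `suffixed_output_name` is hypothetical and not part of DepEdit. It only shows where a suffix such as '.depedit' would land, assuming the extension is whatever follows the last dot in the file name:

```python
import os

def suffixed_output_name(path, suffix=".depedit"):
    """Insert `suffix` before the file extension.

    Hypothetical helper for illustration only; not DepEdit's own code.
    """
    base, ext = os.path.splitext(path)  # "doc1.conllu" -> ("doc1", ".conllu")
    return base + suffix + ext

print(suffixed_output_name("doc1.conllu"))  # doc1.depedit.conllu
```

Under this assumption the input file is never overwritten, since the output name always differs from the input name when the suffix is non-empty.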
Configuration files are text files with one instruction per line and optional blank lines and comments (beginning with ';' or '#'). Each instruction contains 3 columns, as in the following example:
| ;Connect nouns to a preceding article or possessive pronoun with the 'det' function | ||
| pos=/DT|PRP\$/;pos=/NNS?/ | #1.#2 | #2>#1;#1:func=det |
| ;Change to-infinitive from aux to mark | ||
| text=/^[Tt]o$/&func=/aux/ | none | #1:func=mark |
The first column describes the tokens to be matched using regular expressions.
The middle column defines relationships between tokens. It refers to each token in the definition by number (#1, #2...) and specifies:
The third column specifies what to do if a rule matches:
It is also possible to define variables for frequently used (parts of) regular expressions.
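To make the three-column layout concrete, here is a minimal sketch that splits configuration text into (nodes, relations, actions) triples. It is not DepEdit's own parser. It assumes the columns are separated by tabs, and the function name `parse_config` is made up for this example. Variable definitions are not handled.

```python
def parse_config(text):
    """Split DepEdit-style config text into (nodes, relations, actions) triples.

    Illustrative only: assumes tab-separated columns; not DepEdit's parser.
    """
    instructions = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments, which begin with ';' or '#'
        if not line or line[0] in ";#":
            continue
        columns = line.split("\t")
        if len(columns) != 3:
            raise ValueError("expected 3 tab-separated columns: %r" % line)
        instructions.append(tuple(columns))
    return instructions

config = (
    ";Change to-infinitive from aux to mark\n"
    "text=/^[Tt]o$/&func=/aux/\tnone\t#1:func=mark\n"
    "\n"
    "pos=/DT|PRP\\$/;pos=/NNS?/\t#1.#2\t#2>#1;#1:func=det\n"
)

for nodes, rels, actions in parse_config(config):
    print(nodes, "|", rels, "|", actions)
```

Because real instructions start with a token definition such as `pos=` or `text=`, a leading '#' can safely be treated as a comment marker even though '#1' and '#2' appear later in the line.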
To import DepEdit into an existing project, you may want to install depedit as a module, rather than including depedit.py in your own codebase. You can install from PyPI via pip:
> pip install depedit
| from depedit import DepEdit |
| infile = open("path/to/infile.txt") |
| config_file = open("path/to/config.ini") |
| d = DepEdit(config_file) |
| result = d.run_depedit(infile) |
Alternatively, you can also create a configuration inside your module, without reading it from a text file. There are several ways of doing this, which all achieve the same result:
| from depedit import DepEdit |
| d = DepEdit() |
| ############################## |
| # Ways to add transformations: |
| ############################## |
| # From a single string per instruction |
| d.add_transformation("pos=/V/\tnone\t#1:func=x") |
| # From args |
| d.add_transformation("pos=/V/\tnone\t#1:func=z","pos=/V/\tnone\t#1:func=y") |
| # From a list |
| d.add_transformation(["pos=/V/\tnone\t#1:func=a","pos=/V/\tnone\t#1:func=b"]) |
| # From a dictionary |
| d.add_transformation({"nodes":"pos=/V/","rels": "none","actions":"#1:pos=a"}) |
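The string and dictionary forms above carry the same three columns. The sketch below makes that correspondence explicit with a hypothetical helper, `instruction_to_string`, which is not part of DepEdit and is not needed in practice, since the dictionary can be passed to `add_transformation` directly:

```python
def instruction_to_string(instruction):
    """Join a {"nodes", "rels", "actions"} dict into one tab-separated instruction.

    Hypothetical helper for illustration; not part of DepEdit.
    """
    return "\t".join(instruction[key] for key in ("nodes", "rels", "actions"))

rule = {"nodes": "pos=/V/", "rels": "none", "actions": "#1:pos=a"}
print(repr(instruction_to_string(rule)))  # 'pos=/V/\tnone\t#1:pos=a'
```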
If you are using DepEdit in a scholarly paper, please cite the following reference:
@InProceedings{PengZeldes2018,
author = {Siyao Peng and Amir Zeldes},
title = {All Roads Lead to {UD}: Converting {S}tanford and {P}enn Parses
to {E}nglish {U}niversal {D}ependencies with Multilayer Annotations},
booktitle = {Proceedings of the Joint Workshop on Linguistic Annotation,
Multiword Expressions and Constructions ({LAW}-{MWE}-{C}x{G}-2018)},
year = {2018},
pages = {167--177},
address = {Santa Fe, NM},
url = {https://www.aclweb.org/anthology/W18-4918}
}
© 2015-2021 Amir Zeldes. Code released under the Apache 2.0 License.