
From Abstract Syntax to Universal Dependencies
Author(s) -
Prasanth Kolachina,
Aarnte Ranta
Publication year - 2016
Publication title -
linguistic issues in language technology
Language(s) - English
Resource type - Journals
eISSN - 1945-3590
pISSN - 1945-3604
DOI - 10.33011/lilt.v13i.1391
Subject(s) - computer science , abstract syntax tree , parsing , natural language processing , artificial intelligence , syntax , abstract syntax , programming language , syntax error , annotation , dependency (uml)
syntax is a semantic tree representation that lies between parse trees and logical forms. It abstracts away from word order and lexical items, but contains enough information to generate both surface strings and logical forms. Abstract syntax is commonly used in compilers as an intermediate between source and target languages. Grammatical Framework (GF) is a grammar formalism that generalizes the idea to natural languages, to capture cross-lingual generalizations and perform interlingual translation. As one of the main results, the GF Resource Grammar Library (GF-RGL) has implemented a shared abstract syntax for over 30 languages. Each language has its own set of concrete syntax rules (morphology and syntax), by which it can be generated from the abstract syntax and parsed into it. This paper presents a conversion method from abstract syntax trees to dependency trees. The method is applied for converting GF-RGL trees to Universal Dependencies (UD), which uses a common set of labels for different languages. The correspondence between GF-RGL and UD turns out to be good, and the relatively few discrepancies give rise to interesting questions about universality. The conversion also has potential for practical applications: (1) it makes the GF parser usable as a rule-based dependency parser; (2) it enables bootstrapping UD treebanks from GF treebanks; (3) it defines formal criteria to assess the informal annotation schemes of UD; (4) it gives a method to check the consistency of manually annotated UD trees with respect to the annotation schemes; (5) it makes information from UD treebanks available.