Supertagging the Long Tail with Tree-Structured Decoding of Complex Categories
Author(s) - Jakob Prange, Nathan Schneider, Vivek Srikumar
Publication year - 2021
Publication title - Transactions of the Association for Computational Linguistics
Language(s) - English
Resource type - Journals
ISSN - 2307-387X
DOI - 10.1162/tacl_a_00364
Subject(s) - computer science, parsing, tree (set theory), constructive, set (abstract data type), decoding methods, artificial intelligence, domain (mathematical analysis), natural language processing, tree structure, fraction (chemistry), training set, machine learning, binary tree, algorithm, mathematics, programming language, process (computing), mathematical analysis, chemistry, organic chemistry
Although current CCG supertaggers achieve high accuracy on the standard WSJ test set, few systems make use of the categories’ internal structure that will drive the syntactic derivation during parsing. The tagset is traditionally truncated, discarding the many rare and complex category types in the long tail. However, supertags are themselves trees. Rather than give up on rare tags, we investigate constructive models that account for their internal structure, including novel methods for tree-structured prediction. Our best tagger is capable of recovering a sizeable fraction of the long-tail supertags and even generates CCG categories that have never been seen in training, while approximating the prior state of the art in overall tag accuracy with fewer parameters. We further investigate how well different approaches generalize to out-of-domain evaluation sets.
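To illustrate the abstract's observation that supertags are themselves trees, the following minimal sketch reads a CCG category string into a binary tree. It is not the authors' implementation: the names `Category`, `parse_category`, and `size` are hypothetical, and the sketch assumes CCGbank-style notation in which slashes that are not grouped by parentheses associate to the left.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Category:
    """A CCG category as a binary tree (hypothetical helper class).

    Atomic categories (e.g. "NP", "S[dcl]") are leaves. Complex categories
    are inner nodes labelled with a slash, with the result category as the
    left child and the argument category as the right child.
    """
    label: str
    result: Optional["Category"] = None
    argument: Optional["Category"] = None

    @property
    def is_atomic(self) -> bool:
        return self.result is None

    def __str__(self) -> str:
        if self.is_atomic:
            return self.label

        def wrap(cat: "Category") -> str:
            # Parenthesize complex children so the string round-trips.
            return str(cat) if cat.is_atomic else f"({cat})"

        return f"{wrap(self.result)}{self.label}{wrap(self.argument)}"


def parse_category(text: str) -> Category:
    """Parse a category string such as "(S[dcl]\\NP)/NP" into a tree."""
    pos = 0

    def parse_operand() -> Category:
        nonlocal pos
        if pos < len(text) and text[pos] == "(":
            pos += 1
            node = parse_complex()
            if pos >= len(text) or text[pos] != ")":
                raise ValueError(f"missing ')' in {text!r}")
            pos += 1
            return node
        start = pos
        while pos < len(text) and text[pos] not in "/\\()":
            pos += 1
        if start == pos:
            raise ValueError(f"expected a category at position {pos} in {text!r}")
        return Category(text[start:pos])

    def parse_complex() -> Category:
        nonlocal pos
        node = parse_operand()
        # Assumption: ungrouped slashes associate to the left.
        while pos < len(text) and text[pos] in "/\\":
            slash = text[pos]
            pos += 1
            node = Category(slash, node, parse_operand())
        return node

    tree = parse_complex()
    if pos != len(text):
        raise ValueError(f"unexpected {text[pos]!r} at position {pos} in {text!r}")
    return tree


def size(cat: Category) -> int:
    """Count all nodes (slashes and atomic categories) in the tree."""
    if cat.is_atomic:
        return 1
    return 1 + size(cat.result) + size(cat.argument)
```

For example, `parse_category("(S[dcl]\\NP)/NP")` yields a root labelled `/` whose argument is the leaf `NP` and whose result is the subtree for `S[dcl]\NP`. A tagger that builds categories node by node can, in principle, assemble trees for tags that never appeared whole in its training data, which is the intuition behind the constructive models the abstract describes.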