Constrained Versions of DEDICOM for Use in Unsupervised Part-Of-Speech Tagging
Author(s) -
Daniel Dunlavy,
Peter Chew
Publication year - 2016
Language(s) - English
Resource type - Reports
DOI - 10.2172/1254278
Subject(s) - bigram , computer science , hidden markov model , domain (mathematical analysis) , artificial intelligence , factor (programming language) , mathematics , mathematical analysis , trigram , programming language
This reports describes extensions of DEDICOM (DEcomposition into DIrectional COMponents) data models [3] that incorporate bound and linear constraints. The main purpose of these extensions is to investigate the use of improved data models for unsupervised part-of-speech tagging, as described by Chew et al. [2]. In that work, a single domain, two-way DEDICOM model was computed on a matrix of bigram fre- quencies of tokens in a corpus and used to identify parts-of-speech as an unsupervised approach to that problem. An open problem identi ed in that work was the com- putation of a DEDICOM model that more closely resembled the matrices used in a Hidden Markov Model (HMM), speci cally through post-processing of the DEDICOM factor matrices. The work reported here consists of the description of several models that aim to provide a direct solution to that problem and a way to t those models. The approach taken here is to incorporate the model requirements as bound and lin- ear constrains into the DEDICOM model directly and solve the data tting problem as a constrained optimization problem. This is in contrast to the typical approaches in the literature, where the DEDICOM model is t using unconstrained optimization approaches, andmore » model requirements are satis ed as a post-processing step.« less
Accelerating Research
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom
Address
John Eccles HouseRobert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom