FASTUS: A System for Extracting Information from Natural-Language Text
Author(s) - Jerry R. Hobbs, Douglas E. Appelt, John Bear, David Israël, W. Mabry Tyson
Publication year - 1992
Publication title - CiteSeerX (The Pennsylvania State University)
Language(s) - English
Resource type - Reports
DOI - 10.21236/ada259435
Subject(s) - computer science, natural language processing, natural (archaeology), artificial intelligence, information retrieval, geography, archaeology
Abstract - FASTUS is a system for extracting information from free text in English, and potentially other languages as well, for entry into a database, and potentially for other applications. It works essentially as a cascaded, nondeterministic finite state automaton. There are four steps in the operation of FASTUS. In Step (1) sentences are scanned for certain trigger words to determine whether further processing should be done. In Step (2) noun groups, verb groups, and prepositions and some other particles are recognized. The input to Step (3) is the sequence of phrases recognized in Step (2); patterns of interest are identified in Step (3) and corresponding incident structures are built up. In Step (4) incident structures that derive from the same incident are identified and merged, and these are used in generating database entries. FASTUS is an order of magnitude faster than any comparable system; it can process a news report in an average of less than eleven seconds. This translates directly into fast development time. In the three and a half weeks between its first use and the MUC-4 evaluation in May 1992, we were able to build up its domain knowledge to a point where it was among the leaders in the evaluation.
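The four steps described in the abstract can be illustrated with a toy cascade. The sketch below is not the FASTUS implementation: the word lists, the function names (`filter_by_trigger`, `recognize_phrases`, `match_patterns`, `merge_incidents`, `extract`), the incident slots, and the two clause patterns are all assumptions made for illustration, and the hand-written deterministic scanners stand in for the nondeterministic finite-state machinery the abstract mentions. It only shows how each stage consumes the output of the previous one.

```python
import re

# Hypothetical toy lexicon; the real system's domain knowledge is far larger.
TRIGGER_VERBS = {"bombed", "attacked", "killed"}
AUXILIARIES = {"was", "were"}
PREPOSITIONS = {"by", "in", "on", "at"}
DETERMINERS = {"the", "a", "an"}
CLOSED_CLASS = TRIGGER_VERBS | AUXILIARIES | PREPOSITIONS


def filter_by_trigger(sentences):
    """Step 1: keep only sentences that contain a trigger word."""
    return [s for s in sentences
            if TRIGGER_VERBS & set(re.findall(r"[a-z]+", s.lower()))]


def recognize_phrases(sentence):
    """Step 2: split a sentence into noun groups, verb groups, prepositions."""
    tokens = [t.lower() for t in re.findall(r"[A-Za-z]+", sentence)]
    phrases, i = [], 0
    while i < len(tokens):
        word = tokens[i]
        if word in AUXILIARIES or word in TRIGGER_VERBS:
            j = i
            while j < len(tokens) and tokens[j] in AUXILIARIES:
                j += 1
            if j < len(tokens) and tokens[j] in TRIGGER_VERBS:
                kind = "VG-PASSIVE" if j > i else "VG-ACTIVE"
                phrases.append((kind, tokens[j]))
                i = j + 1
            else:
                i = j  # auxiliaries with no known verb: skip them
        elif word in PREPOSITIONS:
            phrases.append(("P", word))
            i += 1
        else:
            j = i
            while j < len(tokens) and tokens[j] not in CLOSED_CLASS:
                j += 1
            head = [t for t in tokens[i:j] if t not in DETERMINERS]
            if head:
                phrases.append(("NG", " ".join(head)))
            i = j
    return phrases


def match_patterns(phrases):
    """Step 3: match clause patterns over the phrase sequence, build incidents."""
    incidents = []
    for k, (kind, text) in enumerate(phrases):
        if not kind.startswith("VG") or k == 0 or phrases[k - 1][0] != "NG":
            continue
        inc = {"type": text, "perpetrator": None, "target": None, "location": None}
        subject, rest = phrases[k - 1][1], phrases[k + 1:]
        if kind == "VG-PASSIVE":
            inc["target"] = subject
        else:
            inc["perpetrator"] = subject
            if rest and rest[0][0] == "NG":
                inc["target"] = rest[0][1]
        # "by NG" fills the perpetrator of a passive clause; "in NG" the location.
        for (k1, t1), (k2, t2) in zip(rest, rest[1:]):
            if k1 == "P" and k2 == "NG":
                if t1 == "by" and kind == "VG-PASSIVE":
                    inc["perpetrator"] = t2
                elif t1 == "in":
                    inc["location"] = t2
        incidents.append(inc)
    return incidents


def merge_incidents(incidents):
    """Step 4: merge incidents whose filled slots do not conflict."""
    merged = []
    for inc in incidents:
        for old in merged:
            if all(old[s] is None or inc[s] is None or old[s] == inc[s] for s in inc):
                for s in inc:
                    if old[s] is None:
                        old[s] = inc[s]
                break
        else:
            merged.append(dict(inc))
    return merged


def extract(sentences):
    """Run the four steps in sequence and return database-style records."""
    incidents = []
    for sentence in filter_by_trigger(sentences):
        incidents.extend(match_patterns(recognize_phrases(sentence)))
    return merge_incidents(incidents)


report = [
    "The embassy was bombed by guerrillas.",
    "The weather was mild.",
    "Guerrillas bombed the embassy in Lima.",
]
print(extract(report))
```

In this made-up report, the second sentence is dropped in Step 1 because it contains no trigger word, and the two remaining sentences yield compatible incident structures that Step 4 merges into a single record with the perpetrator, target, and location slots filled. Because each stage is a cheap scan over short sequences, the cascade avoids full parsing, which is the design point the abstract ties to processing speed.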