Premium
Extending regular expressions with context operators and parse extraction
Author(s) -
Kearns Steven M.
Publication year - 1991
Publication title -
software: practice and experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.437
H-Index - 70
eISSN - 1097-024X
pISSN - 0038-0644
DOI - 10.1002/spe.4380210803
Subject(s) - regular expression , parsing , computer science , character (mathematics) , context (archaeology) , matching (statistics) , automaton , expression (computer science) , pattern matching , regular language , parse tree , finite state machine , theoretical computer science , artificial intelligence , natural language processing , algorithm , programming language , mathematics , paleontology , statistics , geometry , biology
Regular expressions are used in many applications to specify patterns because any regular expression can be compiled into a very efficient one‐pass pattern matcher called a finite automaton. Finding matches is useful, but even more useful is parse extraction, which describes in detail how a pattern matches some input. After matching an address, for example, parse extraction makes it easy to find out the Zip code part of the address. We present an elegant, efficient algorithm for extracting a parse after matching with a finite automaton. In addition, we extend the regular expression language to include new operators for matching arbitrary left context and single character right context. The extended language can be matched as efficiently as the usual regular expression language, but is more expressive. Finally, we suggest how to apply the matching algorithms to match regular expressions containing arbitrary right context and single character left context. In effect, this allows one to specify patterns that seem to require an unlimited amount of look‐ahead to match.