Multi-Layered Pipeline Parsing of Phrases from Multi-Axiom Grammars
by James S. Jones
Thesis for Ph.D. in Computer Science
The University of Iowa
May 1997
Abstract
Conventional approaches for specifying and processing programming languages
have taken a two-layered approach. The division between these layers has been
based on differences in broad classes of languages and machines which have
encourged the use of different methods for language definition and parsing.
As a result, many-layered approaches which also support the processing of
phrase components of a language are not well represented, certainly not
those for left-to-right parsing.
A multi-layer, unified approach for language specification and parsing is
presented which is based on the multi-axiom grammar (MAG) and properties
associated with it. These are defined and it is shown why the MAG is
superior to the single-axiom context free grammar (CFG) and the all-axiom
algebraic grammar (AG) for specifying a language and all of its layers
(superior with regard to the definable classes of languages and
phrase definition).
An algorithm is described to stratify a programming language generated
by a MAG into a hierarchy of layers, each being a sublanguage of phrases
generated by a subgrammar. The layering process is based on a notion
of primitivity such that a phrase at a given layer is composed
of primitive elements and (sub)phrases from other layers.
The PHRASE parsing method is defined which operates for a given language
layer and tokenizes any phrase at that layer which is embedded in the
input stream. It is a multi-axiom extension of LR parsing, named for
its actions: Pass, Halt, Reduce, Accept, Shift, and Error.
Simplified-MLR (SMLR) and fully general Multi-axiom LR (MLR) implementations
of PHRASE are described.
It is shown how this collection of layered parsers are used in a
multi-pass fashion to parse an entire input stream
based on a multi-layered tokenization process, and furthermore how this
can be pipelined parallelized for high speed, left-to-right
language translation.
UMI Dissertation Services
UMI Number: 9731814
300 North Zeeb Road
Ann Arbor, MI 48103