PHRASE parsers from Multi-Axiom Grammars

by Teodor Rus, James S. Jones

Abstract

Multi-axiom grammars (MAG) are alternatives to the single-axiom context free grammars (CFG) and all-axiom algebraic grammars (AG) for programming language specification. Neither phrase recognition nor algebraic mechanisms for language processing are supported by CFGs. AGs support algebraic mechanisms for language processing but specify a smaller class of languages. MAGs avoid these limitations. This paper describes a new parsing algorithm developed on this basis which recognizes any phrase in the language. Moreover, it does so by distributing the parsing task among a collection of smaller parsers which handle well-defined layers of the language in a piping manner. These language-layers are determined by the algebraic properties of the MAGs and are described in the paper. Basic definitions are given for multi-axiom grammar and language as well as for algebraic notions of subgrammar, primitive subgrammar, quotient grammar, and grammar/language layer. Algorithms are described to stratify a programming language into a hierarchy of layers, to construct parsers for each layer analogous to LR construction, and to accomplish the overall task of multi-layered parsing in pipeline fashion based on a tokenization which occurs between the language layers. This pipeline parallel process is a model for high speed, left-to-right language translation.

Keywords : Grammar; Subgrammar; Pipeline parsing; Tokenize; Phrase


Return to Jim Jones's faculty homepage