Abstract and Introduction from IWPT93 paper
Below is a portion of the introduction from Parsing English with a
Link Grammar, by Daniel Sleator and Davy Temperley, which
appeared in The Third International Workshop on Parsing Technologies,
1993. This paper is also available in
postscript form. The first few paragraphs of the introduction contain a
description of the basic logic of link grammars.
We define a new type of formal grammatical system called a LINK
GRAMMAR. A sequence of words is in the language of a link grammar
if there is a way to draw LINKS between words in such a way that
(1) the local requirements of each word are satisfied, (2) the
links do not cross, and (3) the words form a connected graph. We
have encoded English grammar into such a system, and written a
program (based on new algorithms) for efficiently parsing with a
link grammar. The breadth of English phenomena that our system
handles is quite large -- it handles: noun-verb agreement,
questions, imperatives, complex and irregular verbs, many types of
nouns, past- or present-participles in noun phrases, commas, a
variety of adjective types, prepositions, adverbs, relative
clauses, possessives, coordinating conjunctions, and many other
things. A number of sophisticated and new techniques were used to
allow efficient parsing of this very complex grammar. Our program
is written in C, and the entire system may be obtained via
anonymous ftp.
1. Introduction
Most sentences of most natural languages have the property that if arcs are
drawn connecting each pair of words that relate to each other, then the
arcs will not cross. This well-known phenomenon, which we call PLANARITY,
is the basis of LINK GRAMMARS, our new formal language system.
A link grammar consists of a set of WORDS (the terminal symbols of the
grammar), each of which has a LINKING REQUIREMENT. A sequence of words is a
SENTENCE of the language defined by the grammar if there exists a way to
draw arcs (which we shall hereafter call LINKS) among the words so as to
satisfy the following conditions:
Planarity: The links do not cross (when drawn above the words).
Connectivity: The links suffice to connect all the words of the
sequence together.
Satisfaction: The links satisfy the linking requirements of each word
in the sequence.
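To make the first two conditions concrete, here is a minimal C sketch (our illustration, not code from the paper) that checks planarity and connectivity for a candidate set of links, with words numbered by position. The satisfaction condition depends on the dictionary, which is described next.

   /* A minimal sketch of the planarity and connectivity checks.
      Words are numbered 0..n-1; each link joins two word positions. */
   #include <stdio.h>

   typedef struct { int left, right; } Link;   /* assume left < right */

   /* Planarity: no two links cross when drawn above the words. */
   int planar(const Link *ls, int m) {
       for (int i = 0; i < m; i++)
           for (int j = i + 1; j < m; j++) {
               /* two arcs cross iff their endpoints interleave */
               int a = ls[i].left, b = ls[i].right;
               int c = ls[j].left, d = ls[j].right;
               if ((a < c && c < b && b < d) || (c < a && a < d && d < b))
                   return 0;
           }
       return 1;
   }

   /* Connectivity: the links join all n words into one component
      (checked with a tiny union-find). */
   int find(int *p, int x) { return p[x] == x ? x : (p[x] = find(p, p[x])); }

   int connected(const Link *ls, int m, int n) {
       int parent[64];                          /* enough for a sketch */
       for (int i = 0; i < n; i++) parent[i] = i;
       int comps = n;
       for (int i = 0; i < m; i++) {
           int a = find(parent, ls[i].left), b = find(parent, ls[i].right);
           if (a != b) { parent[a] = b; comps--; }
       }
       return comps == 1;
   }

   int main(void) {
       /* links of "the cat chased a snake": D(0,1) S(1,2) O(2,4) D(3,4) */
       Link linkage[] = { {0,1}, {1,2}, {2,4}, {3,4} };
       printf("planar: %d\n", planar(linkage, 4));          /* 1 */
       printf("connected: %d\n", connected(linkage, 4, 5)); /* 1 */
       return 0;
   }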
The linking requirements of each word are contained in a DICTIONARY. To
illustrate the linking requirements, the following diagram shows a simple
dictionary for the words ``a'', ``the'', ``cat'', ``snake'', ``Mary'',
``ran'', and ``chased''. The linking requirement of each word is
represented by the diagram above the word.
           O----+
   +----D D--+  | +--S O--+ +--S  S----+   S--+   +--O
   |         |  | |       | |          |      |   |
+--|--+    +-|--\-/-+   +-\-/-+     +--|--+  +|---|+
|  *  |    | *   *  |   |  *  |     |  *  |  |*   *|
+-----+    +--------+   +-----+     +-----+  +-----+
   a          cat        Mary         ran     chased
  the        snake
At the end of each tentacle sticking out of each of these boxes is a
character. This is called a CONNECTOR. A connector is SATISFIED by
matching it to a compatible connector (one with the same label) facing in
the opposite direction. Exactly one of the connectors attached to a given
``*'' must be satisfied (the others, if any, must not be used). Thus,
``cat'' requires a D connector to its left, and either an O connector to
its left or an S connector to its right. Attaching a pair of matching
connectors together corresponds to drawing a link between that pair of
words.
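For the simple dictionary above, the matching rule can be stated in a few lines of C. This is our own illustration: here matching is exact label equality, while the full system's connector matching is somewhat more elaborate.

   /* A sketch of connector matching for the toy dictionary above. */
   #include <stdio.h>
   #include <string.h>

   typedef struct {
       const char *label;   /* e.g. "D", "S", "O" */
       char dir;            /* '+': partner must lie to the right;
                               '-': partner must lie to the left  */
   } Connector;

   /* A connector on an earlier word matches one on a later word iff
      the labels agree and the directions face each other. */
   int match(const Connector *a, const Connector *b) {
       return strcmp(a->label, b->label) == 0 && a->dir == '+' && b->dir == '-';
   }

   int main(void) {
       Connector the_D = { "D", '+' }, cat_D = { "D", '-' }, cat_S = { "S", '+' };
       printf("%d\n", match(&the_D, &cat_D));   /* 1: "the" can link to "cat" */
       printf("%d\n", match(&the_D, &cat_S));   /* 0: labels differ */
       return 0;
   }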
The following diagram shows how the linking requirements are satisfied in
the sentence ``The cat chased a snake''.
                           +---------O--------+
                           |                  |
   +---D---+   +---S---+   |      +---D---+   |
   |       |   |       |   |      |       |   |
+--|--+  +-|---|-+    +|---|+  +--|--+  +-|---|--+
|  *  |  | *   * |    |*   *|  |  *  |  | *   *  |
+-----+  +-------+    +-----+  +-----+  +--------+
  the       cat       chased      a       snake
(The unused connectors have been suppressed here.) It is easy to see that
``Mary chased the cat'' and ``the cat ran'' are also sentences of this
grammar. The sequence of words ``the Mary chased cat'' is not in this
language. Any attempt to satisfy the linking requirements leads to a
violation of one of the three rules. Here is one attempt:
   +-------------D-------------+
   |                           |
   |         +---S---+   +---O-+---+
   |         |       |   |     |   |
+--|--+  +---|---+  +|---|+  +-|---|--+
|  *  |  |   *   |  |*   *|  | *   *  |
+-----+  +-------+  +-----+  +--------+
  the      Mary     chased      cat
In this attempt, the D link crosses the O link. Similarly, ``ran Mary''
and ``cat ran chased'' are not part of this language.
A set of links that prove that a sequence of words is in the language of a
link grammar is called a LINKAGE. From now on we'll use simpler diagrams to
illustrate linkages. Here is the simplified form of the diagram showing
that ``the cat chased a snake'' is part of this language:
           +---O---+
 +-D-+--S--+   +-D-+
 |   |     |   |   |
the cat chased a snake
We have a succinct, computer-readable notation for expressing the
dictionary of linking requirements. The following dictionary encodes the
linking requirements of the previous example.
  words      | formula
-------------|--------------------
  a the      | D+
  ran        | S-
  Mary John  | O- or S+
  chased     | S- & O+
  snake cat  | D- & (O- or S+)
The linking requirement for each word is expressed as a formula involving
the operators & and or, parentheses, and connector names. The + or -
suffix on a connector name indicates the direction (relative to the word
being defined) in which the matching connector (if any) must lie. The & of
two formulas is satisfied by satisfying both the formulas. The or of two
formulas requires that exactly one of its formulas be satisfied. The order
of the arguments of an & operator is significant. The farther left a
connector is in the expression, the nearer the word to which it connects
must be. Thus, when using ``cat'' as an object, its determiner (to which
it is connected with its D- connector) must be closer than the verb (to
which it is connected with its O- connector).
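The formulas themselves are small expression trees, and checking whether a chosen set of connectors satisfies one is a short recursion: an & needs both of its arguments satisfied, an or exactly one. The C sketch below is our illustration of this (the ordering constraint on & arguments is not modeled here).

   /* A sketch of formula satisfaction.  The tree below encodes the
      dictionary entry for "cat":  D- & (O- or S+). */
   #include <stdio.h>
   #include <string.h>

   typedef enum { CONN, AND, OR } Kind;

   typedef struct Node {
       Kind kind;
       const char *conn;        /* for CONN: a connector, e.g. "D-" */
       struct Node *l, *r;      /* for AND / OR: the two arguments  */
   } Node;

   /* Is connector c among the n connectors actually used? */
   static int used(const char *c, const char **use, int n) {
       for (int i = 0; i < n; i++)
           if (strcmp(c, use[i]) == 0) return 1;
       return 0;
   }

   static int satisfied(const Node *f, const char **use, int n) {
       switch (f->kind) {
       case CONN: return used(f->conn, use, n);
       case AND:  return satisfied(f->l, use, n) && satisfied(f->r, use, n);
       case OR:   return satisfied(f->l, use, n) ^ satisfied(f->r, use, n);
       }
       return 0;
   }

   int main(void) {
       Node D = { CONN, "D-", 0, 0 }, O = { CONN, "O-", 0, 0 };
       Node S = { CONN, "S+", 0, 0 };
       Node or_os = { OR, 0, &O, &S };
       Node cat = { AND, 0, &D, &or_os };      /* D- & (O- or S+) */

       const char *as_subject[] = { "D-", "S+" };        /* "the cat ran"  */
       const char *both_used[]  = { "D-", "O-", "S+" };  /* or wants one   */
       printf("%d\n", satisfied(&cat, as_subject, 2));   /* 1 */
       printf("%d\n", satisfied(&cat, both_used, 3));    /* 0 */
       return 0;
   }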
We can roughly divide our work on link grammars into three parts: the link
grammar formalism and its properties, the construction of a wide-coverage
link grammar for English, and efficient algorithms and techniques for
parsing link grammars. We now touch briefly on all three of these aspects.
Link grammars are a new and elegant context-free grammatical formalism.
[Two footnotes are appropriate here. (1) Link grammars resemble dependency
grammars and categorial grammars. There are also many significant
differences. The relationship between these formalisms and our work is
described in section 7. (2) The proof of the context-freeness of link
grammars is not included in this paper, but appears in our technical
report [14]. The reader is urged not to jump to any conclusions about the
limitations of our system because of its near context-freeness.
Context-free systems can differ in many ways, including the ease with which
the same grammar can be expressed, the efficiency with which the same
grammar can be parsed, and the usefulness of the output of the parser for
further processing.]
Link grammars have a unique combination of useful properties:
1. In a link grammar each word of the lexicon is given a definition
describing how it can be used in a sentence. The grammar is
distributed among the words. Such a system is said to be LEXICAL.
This has several important advantages. It makes it easier to
construct a large grammar, because a change in the definition of a
word only affects the grammaticality of sentences involving that
word. The grammar can easily be constructed incrementally.
Furthermore, expressing the grammar of the irregular verbs of
English is easy -- there's a separate definition for each word.
Another nice feature of a lexical system is that it allows the
construction of useful probabilistic language models. This has led
researchers to construct lexical versions of other grammatical
systems, such as tree-adjoining grammars [13]. Lafferty and the
present authors have also constructed such a probabilistic model
for link grammars [11].
2. Unlike in a phrase structure grammar, when a sentence is parsed
with a link grammar, words that are associated semantically and
syntactically are directly linked. This makes it easy to enforce
agreement, and to gather statistical information about the
relationships between words.
3. In English, whether or not a noun needs a determiner is independent
of whether it is used as a subject, an object, or as part of a
prepositional phrase. The algebraic notation we
developed for expressing a link grammar takes advantage of this
orthogonality. Any lexical grammatical system, if it is to be used
by a human, must have such a capability. In our current on-line
dictionary the word ``cat'' can be used in 369 different ways, and
for ``time'' this number is 1689. A compact link grammar formula
captures this large number of possibilities, and can easily be
written and comprehended by a human being.
4. Another interesting property of link grammars is that they have no
explicit notion of constituents or categories. In most sentences
parsed with our dictionaries, constituents can be seen to emerge as
contiguous connected collections of words attached to the rest of
the sentence by a particular type of link. For example in the
dictionary above, S links always attach a noun phrase (the
connected collection of words at the left end of the link) to a
verb (on the right end of the link). O links work in a similar
fashion. In these cases the links of a sentence can be viewed as
an alternative way of specifying the constituent structure of the
sentence. On the other hand, this is not the way we think about
link grammars, and we see no advantage in taking that perspective.
Our second result is the construction of a link grammar dictionary for
English. The goal we set for ourselves was to make a link grammar that can
distinguish, as accurately as possible, syntactically correct English
sentences from incorrect ones. We chose a formal or newspaper-style
English as our model. The result is a link grammar of roughly eight
hundred definitions (formulas) and 25,000 words that captures many phenomena
of English grammar. It handles: noun-verb agreement, questions,
imperatives, complex and irregular verbs, many types of nouns, past- or
present-participles in noun phrases, commas, a variety of adjective types,
prepositions, adverbs, relative clauses, possessives, coordinating
conjunctions, unbounded dependencies, and many other things.
The third result described in this paper is a program for parsing with link
grammars. The program reads in a dictionary (in a form very similar to the
table above) and will parse sentences according to the given grammar. The
program does an exhaustive search -- it finds every way of parsing the
given sequence with the given link grammar. It is based on our own O(n^3)
algorithm (n is the number of words in the sentence to be parsed). The
program also makes use of several very effective data structures and
heuristics to speed up parsing. The program is comfortably fast -- parsing
typical newspaper sentences in a few seconds on a modern workstation.
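This excerpt does not spell out the algorithm itself, but the role planarity plays in making cubic time possible is easy to see: any link out of the leftmost word of a span splits the span into two independent subproblems. The toy C program below (our illustration, not the authors' algorithm) uses that recursion to count non-crossing link diagrams over n words, ignoring connectors and connectivity entirely.

   /* Counting non-crossing link diagrams with an O(n^3) recursion:
      word i is either unlinked, or linked to some word k, which splits
      the span i..j into the independent spans i+1..k-1 and k+1..j. */
   #include <stdio.h>

   #define N 10
   static long memo[N][N];
   static int  done[N][N];

   static long count(int i, int j) {
       if (i > j) return 1;                 /* empty span: one way */
       if (done[i][j]) return memo[i][j];
       long ways = count(i + 1, j);         /* word i stays unlinked */
       for (int k = i + 1; k <= j; k++)     /* or links to word k */
           ways += count(i + 1, k - 1) * count(k + 1, j);
       done[i][j] = 1;
       return memo[i][j] = ways;
   }

   int main(void) {
       for (int n = 1; n <= 5; n++)
           printf("n=%d: %ld diagrams\n", n, count(0, n - 1));
       return 0;
   }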