31 May, 2024: link-grammar 5.12.5 released! This release fixes MS Windows build issues, removes .lg_history litter, and fixes memory consumption issues related to large machine-learned dictionaries. See the ChangeLog at the bottom for a description of other changes in this release.
March, 2024: The abisource.com website is gone forever; the new Link Grammar home page is at https://opencog.github.io/link-grammar-website/. Downloads of current and older source code tarballs are available at https://www.gnucash.org/link-grammar/downloads/. These changes are expected to be permanent.
The Link Grammar Parser exhibits the linguistic (natural language) structure of English, Thai, Russian, Arabic, Persian and limited subsets of a half-dozen other languages. This structure is a graph of typed links (edges) between the words in a sentence. One may obtain the more conventional HPSG (constituent) and dependency style parses from Link Grammar by applying a collection of rules to convert to these different formats. This is possible because Link Grammar goes a bit "deeper" into the "syntactico-semantic" structure of a sentence: it provides considerably more fine-grained and detailed information than what is commonly available in conventional parsers.
The theory of Link Grammar parsing was originally developed in 1991 by Davy Temperley, John Lafferty and Daniel Sleator, at the time professors of linguistics and computer science at Carnegie Mellon University. The three initial publications on this theory provide the best introduction and overview; since then, there have been hundreds of publications further exploring, examining and extending the ideas.
Although based on the original Carnegie-Mellon code base, the current Link Grammar package has dramatically evolved and is profoundly different from earlier versions. There have been innumerable bug fixes; performance has improved by more than an order of magnitude. The package is fully multi-threaded, fully UTF-8 enabled, and has been scrubbed for security, enabling cloud deployment. Parse coverage of English has been dramatically improved; other languages have been added (most notably, Thai and Russian). There is a raft of new features, including support for morphology, dialects, and a fine-grained weight (cost) system, allowing vector-embedding-like behaviour. There is a new, sophisticated tokenizer tailored for morphology: it can offer alternative splittings for morphologically ambiguous words. Dictionaries can be updated at run-time, enabling systems that perform continuous learning of grammar to also parse at the same time. That is, dictionary updates and parsing are mutually thread-safe. Classes of words can be recognized with regexes. Random planar graph parsing is fully supported; this allows uniform sampling of the space of planar graphs.
The latest addition is an experimental sentence generator; it is being used in the OpenCog Language Learning project, which aims to automatically learn Link Grammars from corpora, using brand-new and innovative information theoretic techniques, somewhat similar to those found in artificial neural nets (deep learning), but using explicitly symbolic representations.
The parser includes APIs in several programming languages, as well as a handy command-line tool for playing with it. Here's some typical output:
linkparser> This is a test!
        Linkage 1, cost vector = (UNUSED=0 DIS= 0.00 LEN=6)

    +-------------Xp------------+
    +----->WV----->+---Ost--+   |
    +---Wd---+-Ss*b+  +Ds**c+   |
    |        |     |  |     |   |
LEFT-WALL this.p is.v a  test.n !

(S (NP this.p) (VP is.v (NP a test.n)) !)

            LEFT-WALL    0.000  Wd+ hWV+ Xp+
               this.p    0.000  Wd- Ss*b+
                 is.v    0.000  Ss- dWV- O*t+
                    a    0.000  Ds**c+
               test.n    0.000  Ds**c- Os-
                    !    0.000  Xp- RW+
           RIGHT-WALL    0.000  RW-
This rather busy display illustrates many interesting things. For example, the Ss*b link connects the verb and the subject, and indicates that the subject is singular. Likewise, the Ost link connects the verb and the object, and also indicates that the object is singular. The WV (verb-wall) link points at the head-verb of the sentence, while the Wd link points at the head-noun. The Xp link connects to the trailing punctuation. The Ds**c link connects the noun to the determiner: it again confirms that the noun is singular, and also that the noun starts with a consonant. (The PH link, not required here, is used to force phonetic agreement, distinguishing 'a' from 'an'). These link types are documented in the English Link Documentation.
The bottom of the display is a listing of the "disjuncts" used for each word. The disjuncts are simply a list of the connectors that were employed to form the links. They are particularly interesting because they serve as an extremely fine-grained form of a "part of speech" or "grammatical category", although they also can be interpreted as "semantic selections". Thus, for example, the disjunct S- O+ indicates a transitive verb: it's a verb that takes both a subject and an object. The additional markup above indicates that 'is' is not only being used as a transitive verb, but also conveys finer details: it is a transitive verb that took a singular subject, and was used as (is usable as) the head verb of a sentence. The floating-point value is the "cost" of the disjunct; it very roughly captures the log-likelihood of this particular grammatical (and semantic!) usage. Much as parts-of-speech correlate with word-meanings, so also fine-grained parts-of-speech correlate with much finer distinctions and gradations of meaning.
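To make the relationship between disjuncts and links concrete, here is a minimal Python sketch that parses lines of the disjunct listing and merges a right-pointing connector with a left-pointing one into a link label. It is not part of the Link Grammar API: the names Connector, parse_connector, parse_disjunct_line and link_label are hypothetical, and the subscript-matching rule (characters must agree, with '*' and missing trailing characters acting as wildcards) is an assumption that happens to reproduce the labels in the diagram above. The optional h/d (head/dependent) prefix is parsed but not checked.

```python
import re
from typing import NamedTuple, Optional


class Connector(NamedTuple):
    head_dep: str   # "h", "d" or "" (optional head/dependent marker)
    base: str       # upper-case link type, e.g. "O"
    subscript: str  # lower-case refinement with '*' wildcards, e.g. "*t"
    direction: str  # "+" connects to the right, "-" connects to the left


CONNECTOR_RE = re.compile(r"^([hd]?)([A-Z]+)([a-z0-9*]*)([+-])$")


def parse_connector(text: str) -> Connector:
    match = CONNECTOR_RE.match(text)
    if match is None:
        raise ValueError(f"not a connector: {text!r}")
    return Connector(*match.groups())


def parse_disjunct_line(line: str):
    """Split a listing line such as 'is.v 0.000 Ss- dWV- O*t+'."""
    word, cost, *connectors = line.split()
    return word, float(cost), [parse_connector(c) for c in connectors]


def link_label(left: Connector, right: Connector) -> Optional[str]:
    """Merge a '+' connector (left word) with a '-' connector (right word).

    Assumed rule: bases must be equal; subscripts are compared position by
    position, where '*' or a missing character matches anything.
    Returns the merged label, or None if the connectors cannot link.
    """
    if left.direction != "+" or right.direction != "-":
        return None
    if left.base != right.base:
        return None
    width = max(len(left.subscript), len(right.subscript))
    merged = []
    for a, b in zip(left.subscript.ljust(width, "*"),
                    right.subscript.ljust(width, "*")):
        if a == "*":
            merged.append(b)
        elif b == "*" or a == b:
            merged.append(a)
        else:
            return None
    return left.base + "".join(merged).rstrip("*")


# The verb offers O*t+ and the noun offers Os-; together they give Ost.
print(link_label(parse_connector("O*t+"), parse_connector("Os-")))   # Ost
# The subject offers Ss*b+ and the verb offers Ss-; together: Ss*b.
print(link_label(parse_connector("Ss*b+"), parse_connector("Ss-")))  # Ss*b
```

Summing the cost column of the parsed lines for this sentence gives 0.0, which is consistent with the DIS= 0.00 entry of the cost vector shown in the display.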
The link-grammar parser also supports morphological analysis. Here is an example in Russian:
linkparser> это теста
        Linkage 1, cost vector = (UNUSED=0 DIS= 0.00 LEN=4)

             +-----MVAip-----+
    +---Wd---+       +-LLCAG-+
    |        |       |       |
LEFT-WALL это.msi тест.= =а.ndnpi
The LL link connects the stem 'тест' to the suffix 'а'. The MVA link connects only to the suffix, because, in Russian, it is the suffixes that carry all of the syntactic structure, and not the stems. The Russian lexis is documented here.
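As an illustration of how the stem and suffix tokens in this display relate to the surface words, the following Python sketch rejoins them. It assumes only the notation visible in the example: a stem token ends in '.=', a suffix token starts with '=', and anything after the last '.' is a dictionary subscript. The function names are hypothetical and are not part of the Link Grammar API; other dictionaries may mark morphemes differently.

```python
def strip_subscript(token: str) -> str:
    """Drop the subscript after the last '.', e.g. 'это.msi' -> 'это'."""
    head, sep, _ = token.rpartition(".")
    return head if sep and head else token


def join_morphemes(tokens):
    """Rejoin stem tokens ('тест.=') with the suffix token ('=а.ndnpi') that follows."""
    words = []
    stem = None
    for token in tokens:
        if token.endswith(".="):
            if stem is not None:          # stem with no suffix: keep it as-is
                words.append(stem)
            stem = token[:-2]
        elif token.startswith("=") and stem is not None:
            words.append(stem + strip_subscript(token[1:]))
            stem = None
        else:
            if stem is not None:
                words.append(stem)
                stem = None
            words.append(strip_subscript(token))
    if stem is not None:
        words.append(stem)
    return words


print(join_morphemes(["LEFT-WALL", "это.msi", "тест.=", "=а.ndnpi"]))
# ['LEFT-WALL', 'это', 'теста']
```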
The Thai dictionary is now fully developed, effectively covering the entire language. An example in Thai:
linkparser> นายกรัฐมนตรี ขึ้น กล่าว สุนทรพจน์
        Linkage 1, cost vector = (UNUSED=0 DIS= 2.00 LEN=2)

    +---------LWs--------+
    |           +<---S<--+--VS-+-->O-->+
    |           |        |     |       |
LEFT-WALL นายกรัฐมนตรี.n ขึ้น.v กล่าว.v สุนทรพจน์.n
The VS link connects two verbs 'ขึ้น' and 'กล่าว' in a serial verb construction. A summary of link types is documented here. Full documentation of Thai Link Grammar can be found here.
Thai Link Grammar also accepts POS-tagged and named-entity-tagged inputs. Each word can be annotated with the Link POS tag. For example:
linkparser> เมื่อวานนี้.n มี.ve คน.n มา.x ติดต่อ.v คุณ.pr ครับ.pt
Found 1 linkage (1 had no P.P. violations)
        Unique linkage, cost vector = (UNUSED=0 DIS= 0.00 LEN=12)

                          +---------------------PT--------------------+
    +---------LWs---------+---------->VE---------->+                  |
    |           +<---S<---+-->O-->+       +<--AXw<-+--->O--->+        |
    |           |         |       |       |        |         |        |
LEFT-WALL เมื่อวานนี้.n[!] มี.ve[!] คน.n[!] มา.x[!] ติดต่อ.v[!] คุณ.pr[!] ครับ.pt[!]
Full documentation for the Thai dictionary can be found here.
The Thai dictionary accepts LST20 tagsets for POS and named entities, to bridge the gap between fundamental NLP tools and the Link Parser. For example:
linkparser> วันที่_25_ธันวาคม@DTM ของ@PS ทุก@AJ ปี@NN เป็น@VV วัน@NN คริสต์มาส@NN
Found 348 linkages (348 had no P.P. violations)
        Linkage 1, cost vector = (UNUSED=0 DIS= 1.00 LEN=10)

    +--------------------------------LWs--------------------------------+
    |               +<------------------------S<------------------------+
    |               |                +---------->PO--------->+          |
    |               +----->AJpr----->+            +<---AJj<--+          +---->O---->+------NZ-----+
    |               |                |            |          |          |           |             |
LEFT-WALL วันที่_25_ธันวาคม@DTM[!] ของ@PS[!].pnn ทุก@AJ[!].jl ปี@NN[!].n เป็น@VV[!].v วัน@NN[!].na คริสต์มาส@NN[!].n
Note that each word above is annotated with LST20 POS tags and NE tags. Full documentation for both the Link POS tags and the LST20 tagsets can be found here. More information about LST20, e.g. annotation guideline and data statistics, can be found here.
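The two tagged input formats shown above (word.tag for Link POS tags, word@TAG for LST20 tags) are easy to produce from the output of an external tagger. The Python sketch below is only an illustration: tag_tokens is a hypothetical helper, and the convention of joining multi-word units with underscores is inferred from the token วันที่_25_ธันวาคม@DTM in the example rather than taken from the Thai dictionary documentation.

```python
def tag_tokens(tagged, separator):
    """Build a tagged input line for the parser prompt.

    tagged    -- list of (word, tag) pairs from some external tagger
    separator -- '.' for Link POS tags, '@' for LST20 tags
    """
    if separator not in (".", "@"):
        raise ValueError("separator must be '.' or '@'")
    parts = []
    for word, tag in tagged:
        # Assumption: a multi-word unit is written with underscores.
        token = "_".join(word.split())
        parts.append(f"{token}{separator}{tag}")
    return " ".join(parts)


lst20 = [("วันที่ 25 ธันวาคม", "DTM"), ("ของ", "PS"), ("ทุก", "AJ"), ("ปี", "NN")]
print(tag_tokens(lst20, "@"))
# วันที่_25_ธันวาคม@DTM ของ@PS ทุก@AJ ปี@NN

link_pos = [("คน", "n"), ("มา", "x"), ("ติดต่อ", "v")]
print(tag_tokens(link_pos, "."))
# คน.n มา.x ติดต่อ.v
```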
The any language supports uniformly-sampled random planar graphs:
linkparser> asdf qwer tyuiop fghj bbb
Found 1162 linkages (1162 had no P.P. violations)

             +-------ANY------+-------ANY------+
    +---ANY--+--ANY--+        +---ANY--+--ANY--+
    |        |       |        |        |       |
LEFT-WALL asdf[!] qwer[!] tyuiop[!] fghj[!] bbb[!]
The ady language does likewise, performing random morphological splittings:
linkparser> asdf qwerty fghjbbb
Found 1512 linkages (1512 had no P.P. violations)

                                  +------------------ANY-----------------+
    +-----ANY----+-------ANY------+                  +---------LL--------+
    |            |                |                  |                   |
LEFT-WALL asdf[!ANY-WORD] qwerty[!ANY-WORD] fgh[!SIMPLE-STEM].= =jbbb[!SIMPLE-SUFF]
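"Planar" here means that no two links cross when drawn above the sentence. A small Python sketch of that check follows; it is an illustration only, not the sampling algorithm used by the parser. Links are given as pairs of word positions, and the link list is my reading of the five-word any-language example above (0 = LEFT-WALL, 1 = asdf, 2 = qwer, 3 = tyuiop, 4 = fghj, 5 = bbb). The function names are hypothetical.

```python
from itertools import combinations


def crosses(a, b):
    """True if links a and b (pairs of word positions) cross each other."""
    (l1, r1), (l2, r2) = sorted(a), sorted(b)
    return l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1


def is_planar(links):
    """True if no pair of links crosses."""
    return not any(crosses(a, b) for a, b in combinations(links, 2))


# Links as read off the diagram for "asdf qwer tyuiop fghj bbb".
links = [(0, 1), (1, 2), (1, 3), (3, 4), (4, 5), (3, 5)]
print(is_planar(links))             # True

# A link from qwer (2) to fghj (4) would cross the asdf-tyuiop link (1, 3).
print(is_planar(links + [(2, 4)]))  # False
```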
An extended overview and summary of Link Grammar can be found on the Link Grammar Wikipedia page, which touches on most of the important, primary aspects of the theory. However, it is no substitute for the original papers published on the topic:
A fairly comprehensive bibliography of papers written before 2004 is here and is mirrored here. A sampling of publications that reference Link Grammar in some way can be found here; some of these may be downloaded here.
There is an extensive set of pages documenting the English dictionary; specifically, the names of links and their meanings, as well as how to write new rules. There is also a short primer for creating dictionaries for new languages.
The documentation for the C/C++ programming API is here. Bindings for other programming languages can be found in the bindings directory in the GitHub Link Grammar Repo.
The source code to the system can be downloaded as a tarball. The current stable version is Link Grammar 5.12.5 (May 2024). Older versions are available here.
GitHub hosts the primary link-grammar repository. Issues (bugs) should be reported there. Developers who are not part of the core development team should not use or deploy the source from GitHub: it is unstable and frequently buggy and broken! All other users should use the tarballs only!
The mailing list for Link Grammar discussion is at the link-grammar google group.
Ongoing development of Link Grammar is guided and supported by the Open Cognition project, where the parser plays an important role in the OpenCog natural language processing subsystem. Research and implementation is ongoing; current work includes investigations into unsupervised learning of language.
Link Grammar is a natural language parser, not a human-level artificial general intelligence. This means that there are many sentences that it cannot parse correctly, or at all. There are entire classes of speech and writing that it cannot handle, including Twitter posts, IRC chat logs, Valley-girl basilect, Old and Middle English, stock-market listings and raw HTML dumps.
Link Grammar works best with "newspaper English", as taught to and written by those educated in American colleges: standard-sized sentences, with proper grammar, proper punctuation, and correct capitalization. Link Grammar has difficulties with the following types of textual input:
It is hoped that the unsupervised learning of language proposal will be of sufficient power and ability to handle most of these exceptional cases. Work is currently ongoing.
Ranked in order of maturity.
For Russian, documentation of the links and word classes is available as a list of examples.
A very rudimentary Lithuanian dictionary has been created; so far, almost nothing works. The documentation is here.
Version 5.12.5 (31 May 2024)
Version 5.12.4 (28 March 2024)
Version 5.12.3 (24 March 2023)
Version 5.12.2 (9 March 2023)
Version 5.12.1 (5 March 2023)
Improved AtomSpace behavior. In this version, MST parsing becomes not just usable, but also fast.
Version 5.12.0 (26 November 2022)
This release contains an important bug-fix for a multi-threaded race and crash in the regex code. This is quite rare: I was seeing a crash after 24 hours when running 6 threads. If you're not running at that level, chances are slim you'll see it. But still.
Also notable: this version can attach to a live dictionary running in the AtomSpace. This offers some major improvements over the previous version; a bit more is planned, as integration becomes tighter.
Version 5.11.0 (27 September 2022)
Most notable in this release is a preliminary prototype interface to the OpenCog AtomSpace. This allows working directly with language data in the AtomSpace, avoiding the need to export the language model.
A list of older changes can be found here.
Issues concerning this website should be handled by opening a bug report at https://github.com/opencog/link-grammar-website/
Current versions of the Link Grammar parser software, language dictionaries and documentation are available under the LGPL v2.1 license. Versions prior to 5.0.0 are available under a variant of the BSD license.
Copyright (c) 2003-2004 Daniel Sleator, David Temperley, and John Lafferty. All rights reserved.
Copyright (c) 2003 Peter Szolovits
Copyright (c) 2004,2012,2013 Sergey Protasov
Copyright (c) 2006 Sampo Pyysalo
Copyright (c) 2007 Mike Ross
Copyright (c) 2008,2009,2010 Borislav Iordanov
Copyright (c) 2008-2022 Linas Vepstas
Copyright (c) 2014-2022 Amir Plivatsky
Copyright (c) 2021-2022 Prachya Boonkwan.