Speech and Language Resources Available on the Web
CHAPTER 3: Morphology and Finite-State Transducers
, the two-level morphological parser.
A public domain FSA/FST toolbox implemented in SICStus Prolog
. Includes a binary version for Sparc/Solaris, for those Solaris users without Prolog.
XEROX XRCE (Xerox Research Centre Europe)'s
Xerox Finite-State Compiler
. Not available for free, but you can try out your own examples on the web site.
Computational Phonology and Morphology SIL site
CHAPTER 4: Computational Phonology and text-to-speech
Speech Synthesis System.
Course in Phonetics
, with very useful materials, including sets of International Phonetic Alphabet (IPA) charts together with sounds.
CMU pronunciation dictionary
The International Phonetic Association
Has fonts, printable charts of the International Phonetic Alphabet, etc.
labelling guideline and general introduction.
CHAPTER 5: Probabilistic Models of Pronunciation and Spelling
Decision Tree Package
CHAPTER 6: N-grams
The SRI Language Modeling toolkit
The CMU-Cambridge Statistical Language Modeling toolkit
FTP site for Geoffrey Sampson's C code for a Simple Good-Turing Frequency Estimator
CHAPTER 7: HMMs and Speech Recognition
The CSLU Speech Toolkit
. Free, but you have to register first. A comprehensive suite of speech recognition, synthesis, and dialog tools.
The Mississippi State ISIP public domain speech recognizer
Also includes a discrete HMM toolkit, a decision tree package, etc.
CHAPTER 8: Word Classes and Part-of-Speech Tagging
Brill's supervised and unsupervised
Transformation-Based Learning (TBL) tagger
; this is C code from 1996.
Here's the ftp site
Xerox PARC's HMM tagger (Common LISP Code from 1994)
Adwait Ratnaparkhi's MXPOST (Maximum Entropy POS Tagger)
CHAPTERS 10 and 12: Parsing and Probabilistic Parsing
Penn XTAG project
Large tree-adjoining grammar, grammar tree-viewer, parser, and SuperTagger. (C/Common LISP/Tcl/Tk)
CMU Link Grammar Parser
NYU "Apple Pie" Parser
(bottom-up probabilistic non-lexicalized chart parser and grammar from the Penn Treebank)
Conexor Dependency and Phrase structure parser
(The software is not free, but you can try out some examples using the form on the web site.)
A bracket scoring program
, by Satoshi Sekine and Michael Collins. It reports precision, recall, non-crossing brackets, and tagging accuracy.
CHAPTER 11: Features and Unification
ALE (Attribute Logic Engine)
. This is a freeware system (written in Prolog) which integrates phrase structure parsing and constraint logic programming with typed feature structures as terms.
CHAPTER 16: Lexical Semantics
, the WordNet
site, and the European
at the University of Stuttgart
CHAPTER 17: Word Sense Disambiguation and Information Retrieval
The SENSEVAL effort to evaluate WSD systems.
Glasgow's collection of publicly available
IR test collections.
Reuters-21578 text categorization test collection
courtesy of David Lewis.
A set of pointers to IR resources from
information retrieval system from Cornell.
MG text indexing and retrieval system
from the Managing Gigabytes text.
A set of papers and demos describing
Latent Semantic Indexing
CHAPTER 20: Generation
Pointers to natural language generation resources can be found on the
CHAPTER 21: Machine Translation
European Association for MT
site contains a wealth of MT information.
The EGYPT Statistical Machine Translation Toolkit
Adwait Ratnaparkhi's MXTERMINATOR (Sentence Boundary Detector)
Andrew McCallum's C library "Bow: A Toolkit for Statistical Language Modeling, Text Retrieval, Classification and Clustering"
Important Lists/Sources of Software, Corpora, and Other SLP Resources
The Natural Language Software Registry
Chris Manning's list of resources for statistical natural language processing and corpus-based computational linguistics
Linguistic Data Consortium (LDC)
The ACL NLP/CL Universe Index of Computational Linguistics and Natural Language Processing resources, including a search engine
The Center for Lexical Research
Summer Institute of Linguistics (SIL): Linguistic Resources on the Internet
The ELSNET Resources Page
TELRI Research Archive of Computational Tools and Resources
Bavarian Archive for Speech Signals