SPEECH and LANGUAGE PROCESSING

An Introduction to Natural Language Processing,
Computational Linguistics, and Speech Recognition
Second Edition

by Daniel Jurafsky and James H. Martin

Last Update: January 6, 2009

The 2nd edition is now available. A million thanks to everyone who sent us corrections and suggestions for all the draft chapters. You can find the book at Amazon.

Chapter 1 is available below for your reading pleasure.

Figures, slides and an instructor's manual are available from Prentice Hall at their Instructor's Resource Site (registration required).

An errata page for this edition is available.

Material from the 1st edition is still available.

Table of Contents

Preface

1 Introduction (PDF)

I: Words

2 Regular Expressions and Automata
3 Words and Transducers
4 N-grams
5 Part-of-Speech Tagging
6 Hidden Markov and Maximum Entropy Models

II: Speech

7 Phonetics
8 Speech Synthesis
9 Automatic Speech Recognition
10 Speech Recognition: Advanced Topics
11 Computational Phonology

III: Syntax

12 Formal Grammars of English
13 Syntactic Parsing
14 Statistical Parsing
15 Features and Unification
16 Language and Complexity

IV: Semantics and Pragmatics

17 The Representation of Meaning
18 Computational Semantics
19 Lexical Semantics
20 Computational Lexical Semantics
21 Computational Discourse

V: Applications

22 Information Extraction
23 Question Answering and Summarization
24 Dialog and Conversational Agents
25 Machine Translation

Chapter 1: Introduction

This chapter is largely the same, with updated history and pointers to newer applications. PDF for Chapter 1.

Chapter 2: Regular Expressions and Automata

This chapter is largely the same with some bug fixes.

Chapter 3: Words and Transducers

This new version of the chapter still focuses on morphology and FSTs, but has been expanded in various ways. There are more details about the formal descriptions of finite-state transducers, many bugs are fixed, and two new sections on words and subwords have been added. The first new section covers word and sentence tokenization, including algorithms for English as well as the maxmatch algorithm for Chinese word segmentation. The second covers spelling correction and minimum edit distance; it is an extended version of the edit-distance section from Chapter 5 of the first edition, with clearer figures, for example in the explanation of the minimum-edit-distance backtrace.
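
For readers who want to try it out, here is a minimal Python sketch of the dynamic programming behind the new edit-distance section. It is our illustration, not the book's pseudocode, and uses the Levenshtein variant in which substitutions cost 2:

```python
def min_edit_distance(source, target, ins_cost=1, del_cost=1, sub_cost=2):
    """Minimum edit distance by dynamic programming.
    Default costs follow the Levenshtein variant in which substitutions
    cost 2 (set sub_cost=1 for the unit-cost version)."""
    n, m = len(source), len(target)
    # d[i][j] = cheapest way to turn source[:i] into target[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + del_cost
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + ins_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if source[i - 1] == target[j - 1] else sub_cost
            d[i][j] = min(d[i - 1][j] + del_cost,      # delete
                          d[i][j - 1] + ins_cost,      # insert
                          d[i - 1][j - 1] + sub)       # substitute or copy
    return d[n][m]

print(min_edit_distance("intention", "execution"))  # prints 8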

Chapter 4: N-grams (Formerly Chapter 6)

This language model chapter has had a complete overhaul. It now includes more examples, a more complete description of Good-Turing, expanded sections on practical issues like perplexity and evaluation, coverage of language modeling toolkits and the ARPA format, and an overview of modern methods like interpolated Kneser-Ney.
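
To make the perplexity discussion concrete, here is a small Python sketch, ours rather than the book's, of a bigram model with add-one smoothing and a perplexity calculation; the chapter itself covers the better smoothing methods named above:

```python
import math
from collections import Counter

def train_bigram(sentences):
    """MLE bigram model with add-one (Laplace) smoothing. Add-one is
    used here only to keep the sketch short; Good-Turing and Kneser-Ney
    do much better in practice."""
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for s in sentences:
        toks = ["<s>"] + s.split() + ["</s>"]
        vocab.update(toks)
        unigrams.update(toks[:-1])            # counts of bigram contexts
        bigrams.update(zip(toks, toks[1:]))
    V = len(vocab)
    def prob(prev, w):
        return (bigrams[(prev, w)] + 1) / (unigrams[prev] + V)
    return prob

def perplexity(prob, sentence):
    """Perplexity of one sentence: 2 ** (- average log2 probability)."""
    toks = ["<s>"] + sentence.split() + ["</s>"]
    logp = sum(math.log2(prob(a, b)) for a, b in zip(toks, toks[1:]))
    return 2 ** (-logp / (len(toks) - 1))

lm = train_bigram(["I am Sam", "Sam I am", "I do not like green eggs and ham"])
print(perplexity(lm, "I am Sam"))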

Chapter 5: Part-of-Speech Tagging (Formerly Chapter 8)

The main change to this revised chapter is a greatly expanded, and hence self-contained, description of bigram and trigram HMM part-of-speech tagging, including Viterbi decoding and deleted interpolation smoothing. Together with the new Chapter 6, this allows a complete introduction to HMMs for courses that don't use the speech recognition chapters. Other changes in this chapter include expanded descriptions of unknown word modeling and part-of-speech tagging in other languages, and many bug fixes. Finally, we've moved this chapter earlier in the book.
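
The deleted-interpolation step is compact enough to sketch. The following Python is our illustration in the style of the algorithm used in Brants' TnT tagger, not code from the book; it assumes Counter-like tag n-gram counts uni, bi, and tri, with N the total number of tag tokens:

```python
def deleted_interpolation(uni, bi, tri, N):
    """Set the trigram/bigram/unigram interpolation weights: each
    trigram votes, with weight equal to its count, for whichever order
    of model best predicts its final tag once that trigram's own
    occurrence is deleted from the counts."""
    l1 = l2 = l3 = 0.0
    for (t1, t2, t3), c in tri.items():
        p_uni = (uni[t3] - 1) / (N - 1) if N > 1 else 0.0
        p_bi = (bi[(t2, t3)] - 1) / (uni[t2] - 1) if uni[t2] > 1 else 0.0
        p_tri = (c - 1) / (bi[(t1, t2)] - 1) if bi[(t1, t2)] > 1 else 0.0
        best = max(p_uni, p_bi, p_tri)
        if best == p_tri:
            l3 += c
        elif best == p_bi:
            l2 += c
        else:
            l1 += c
    total = l1 + l2 + l3
    return l1 / total, l2 / total, l3 / total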

Chapter 6: Hidden Markov and Maximum Entropy Models (Formerly part of Chapter 7 and Appendix D)

This new chapter introduces two sequence models: HMMs and MEMMs. It gives the details of Hidden Markov Models, including Forward, Viterbi, and EM. It then introduces MaxEnt models, beginning with linear regression, followed by logistic regression, then the extension to MaxEnt, and finally the MEMM and the Viterbi intuition.
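
As a taste of the HMM material, here is a minimal Viterbi decoder in Python; the code and the example numbers are ours, loosely in the spirit of the chapter's ice-cream weather example, not the book's own figures:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Viterbi decoding: the most probable hidden-state sequence for an
    observation sequence. Real taggers work in log space to avoid
    underflow; plain products keep this sketch short."""
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prev, p = max(((r, V[t - 1][r] * trans_p[r][s]) for r in states),
                          key=lambda x: x[1])
            V[t][s] = p * emit_p[s][obs[t]]
            back[t][s] = prev
    state = max(V[-1], key=V[-1].get)
    path = [state]
    for t in range(len(obs) - 1, 0, -1):
        state = back[t][state]
        path.append(state)
    return path[::-1]

# Hidden weather states emit observed ice-cream counts (made-up numbers).
states = ("HOT", "COLD")
start = {"HOT": 0.8, "COLD": 0.2}
trans = {"HOT": {"HOT": 0.7, "COLD": 0.3}, "COLD": {"HOT": 0.4, "COLD": 0.6}}
emit = {"HOT": {1: 0.2, 2: 0.4, 3: 0.4}, "COLD": {1: 0.5, 2: 0.4, 3: 0.1}}
print(viterbi([3, 1, 3], states, start, trans, emit))  # ['HOT', 'HOT', 'HOT']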

Chapter 7: Phonetics (Formerly parts of Chapters 4, 5, and 7)

This chapter is an introduction to articulatory and acoustic phonetics for speech processing, as well as foundational tools like the ARPAbet, wavefile formats, phonetic dictionaries, and PRAAT.

Chapter 8: Speech Synthesis

This is a new chapter on speech synthesis.

Chapter 9: Automatic Speech Recognition (Formerly 7)

This significantly expanded speech recognition chapter gives a complete introduction to HMM-based speech recognition, including extraction of MFCC features, Gaussian mixture model acoustic models, and embedded training.
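
One small piece of that pipeline can be sketched directly: the log-likelihood of a single feature frame under a diagonal-covariance Gaussian mixture, the per-state acoustic score in such a system. The Python below is our illustration, not code from the book:

```python
import math

def gmm_frame_log_likelihood(x, weights, means, variances):
    """Log-likelihood of one feature vector (e.g. a 39-dimensional MFCC
    frame) under a diagonal-covariance Gaussian mixture, computed with
    log-sum-exp for numerical stability."""
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            ll -= 0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        log_terms.append(ll)
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))

# Two mixture components over a toy 2-dimensional feature space.
print(gmm_frame_log_likelihood(
    x=[0.5, -0.2],
    weights=[0.6, 0.4],
    means=[[0.0, 0.0], [1.0, -1.0]],
    variances=[[1.0, 1.0], [0.5, 0.5]]))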

Chapter 10: Speech Recognition: Advanced Topics (New Chapter)

This new second chapter on speech recognition covers advanced topics like decision-tree clustering for context-dependent phones, advanced decoding (including n-best lists, lattices, confusion networks, and stack decoding), robustness (including MLLR adaptation), discriminative training, and human speech recognition.

Chapter 11: Computational Phonology (Formerly parts of Chapters 4, 5, and 7)

This chapter is a brief introduction to computational phonology, including phonological and morphological learning, finite-state models, OT, and Stochastic OT.

Chapter 12: Formal Grammars of English (Formerly 9)

This chapter still focuses on CFGs for English and includes a revamped and somewhat expanded grammar for the ATIS domain. New and expanded sections cover: treebanks with a focus on the Penn Treebank, searching treebanks with tgrep and tgrep2, heads and head-finding rules, dependency grammars, categorial grammar, and grammars for spoken language processing.

Chapter 13: Syntactic Parsing (Formerly 10)

The focus of this chapter is still on parsing with CFGs. It now includes sections on CKY, Earley, and agenda-based (chart) parsing. In addition, there is a new section on partial parsing, with a focus on machine-learning-based base-phrase chunking and the use of IOB tags.
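
CKY in particular is short enough to sketch; here is a minimal Python recognizer for a grammar in Chomsky normal form, our illustration with a toy grammar rather than the book's pseudocode:

```python
from collections import defaultdict

def cky_recognize(words, grammar, lexicon):
    """CKY recognition for a CFG in Chomsky normal form.
    grammar: binary rules as (A, B, C) triples meaning A -> B C
    lexicon: maps each word to its set of preterminals
    table[(i, j)] holds every nonterminal that spans words[i:j]."""
    n = len(words)
    table = defaultdict(set)
    for i, w in enumerate(words):
        table[(i, i + 1)] = set(lexicon.get(w, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):          # split point
                for A, B, C in grammar:
                    if B in table[(i, k)] and C in table[(k, j)]:
                        table[(i, j)].add(A)
    return table

grammar = [("S", "NP", "VP"), ("VP", "V", "NP"), ("NP", "Det", "N")]
lexicon = {"the": {"Det"}, "dog": {"N"}, "bit": {"V"}, "man": {"N"}}
table = cky_recognize("the dog bit the man".split(), grammar, lexicon)
print("S" in table[(0, 5)])  # True: the sentence is in the language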

Chapter 14: Statistical Parsing (Formerly 12)

This statistical parsing chapter has been extensively revised. It now covers PCFGs, probabilistic CKY parsing, parent annotations, the Collins parser, and touches on advanced topics such as discriminative reranking and parsing for language modeling.

Chapter 15: Features and Unification (Formerly 11)

Mainly bug fixes.

Chapter 16: Language and Complexity (Formerly 13)

Mainly bug fixes.

Chapter 17: The Representation of Meaning (Formerly 14)

This chapter still covers basic notions surrounding meaning representation languages. It now has better coverage of model-theoretic semantics for meaning representations, and a new section on Description Logics, their role as a basis for OWL, and OWL's role in the Semantic Web.

Chapter 18: Computational Semantics (Formerly 15)

This chapter covers compositional approaches to semantic analysis at the sentence level. The primary focus is on rule-to-rule approaches based on lambda expressions. There is also new coverage of unification-based approaches to computational semantics. The old Chapter 15's coverage of semantic grammars has been moved to the discourse chapter, and its coverage of information extraction has been expanded and moved to the new Chapter 22.

Chapter 19: Lexical Semantics (Formerly 16)

This chapter still covers the basics of lexical semantics, including sense relations, semantic roles, and primitive decomposition. The treatment of semantic roles has been updated, as has the coverage of WordNet, and new sections have been added for PropBank and FrameNet.

Chapter 20: Computational Lexical Semantics (New Chapter; Parts of old Chs. 15, 16 and 17)

The focus of this new chapter is on computing with word meanings. The three main topics are word sense disambiguation, computing relations between words (similarity, hyponymy, etc.), and semantic role labeling. It considerably expands the treatment of these topics.
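
As a flavor of the word-relatedness material, here is a small Python sketch, ours and not the book's, of one of the simplest distributional approaches: co-occurrence vectors compared with cosine similarity:

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """For each word, count the words appearing within +/- window of it."""
    vecs = defaultdict(Counter)
    for s in sentences:
        toks = s.split()
        for i, w in enumerate(toks):
            lo, hi = max(0, i - window), min(len(toks), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vecs[w][toks[j]] += 1
    return vecs

def cosine(v1, v2):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(c * v2[k] for k, c in v1.items())
    n1 = math.sqrt(sum(c * c for c in v1.values()))
    n2 = math.sqrt(sum(c * c for c in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

vecs = cooccurrence_vectors(["the cat sat on the mat",
                             "the dog sat on the rug"])
print(cosine(vecs["cat"], vecs["dog"]))  # 1.0: identical contexts here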

Chapter 21: Computational Discourse

This rewritten chapter includes a number of updates to the first edition. The anaphora resolution section is updated to include modern log-linear methods, and a section on the more general problem of coreference is also included. The coherence section describes cue-based methods for rhetorical relation and coherence relation extraction. Finally, there is a significant new section on discourse segmentation (including TextTiling).

Chapter 22: Information Extraction (New chapter: Parts of old 15)

This new chapter surveys current approaches to information extraction. The main topics are named entity recognition, relation detection, temporal expression analysis, and template filling. The primary focus is on supervised machine learning approaches to these topics. The coverage of finite-state methods (FASTUS) has been moved here from the old Chapter 15.

Chapter 23: Question Answering and Summarization (Mostly new; Parts of old 17 and 20)

This new chapter covers two applications, question answering and summarization, along with a brief introduction to the necessary background material from information retrieval. The chapter includes factoid question answering, single-document summarization, generic multiple-document summarization, and query-focused summarization.

Chapter 24: Dialog and Conversational Agents (Formerly 19)

This is a completely rewritten version of the dialogue chapter. It includes much more information on modern dialogue systems, including VoiceXML, confirmation and clarification dialogues, the information-state model, Markov decision processes, and other current approaches to dialogue agents.

Chapter 25: Machine Translation

The MT chapter has been extensively rewritten, with a significant new section covering statistical MT, including IBM Model 1, Model 3, and HMM alignment. A new evaluation section covers human evaluation and BLEU, and there are new sections on SYSTRAN and more details on cross-linguistic divergences.
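
IBM Model 1 has a famously compact EM training loop, sketched below in Python on a toy parallel corpus; this is our illustration of the general idea, not code from the book:

```python
from collections import defaultdict

def ibm_model1(pairs, iterations=10):
    """EM training of IBM Model 1 translation probabilities t(f | e).
    pairs: (english_tokens, foreign_tokens) sentence pairs. Core EM loop
    only: no NULL word, and no fertility or distortion (those belong to
    the higher-numbered models)."""
    f_vocab = {f for _, fs in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))   # uniform start
    for _ in range(iterations):
        count = defaultdict(float)   # expected counts c(f, e)
        total = defaultdict(float)   # expected counts c(e)
        for es, fs in pairs:
            for f in fs:
                z = sum(t[(f, e)] for e in es)    # normalize over this pair
                for e in es:
                    c = t[(f, e)] / z             # fractional count
                    count[(f, e)] += c
                    total[e] += c
        for (f, e), c in count.items():           # M-step
            t[(f, e)] = c / total[e]
    return t

pairs = [("the house".split(), "das Haus".split()),
         ("the book".split(), "das Buch".split()),
         ("a book".split(), "ein Buch".split())]
t = ibm_model1(pairs)
print(round(t[("Haus", "house")], 2))  # grows toward 1.0 as EM iterates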