Thesis Defense - Salvetti

Detecting Deception in Text: A Corpus-Driven Approach
Computer Science PhD Candidate

Deception is a pervasive psycholinguistic phenomenon, ranging from lies during legal trials to fabricated online reviews. Its identification has been studied for centuries, from the ancient Chinese method of spitting dry rice to the modern polygraph. The recent proliferation of deceptive online reviews has increased the need for automatic deception filtering systems. Although human performance at detecting deception is generally at chance level, previous research suggests that the linguistic signals resulting from conscious deception are sufficient for building automatic systems capable of distinguishing deceptive documents from truthful ones. Our interest is in identifying the invariant traits of deception in text, and we argue that these encouraging results in automatic deception detection are mainly due to the side effects of corpus-specific features. This does no harm to practical applications, but it does not foster a deeper investigation of deception. To demonstrate this and to allow researchers and practitioners to share results, we have developed the largest publicly available shared multidimensional deception corpus for online reviews. In an attempt to overcome the inherent lack of ground truth, we have also developed a set of semi-automatic techniques to ensure corpus validity. This thesis shows that detecting deception using supervised machine learning methods is brittle. Experiments conducted using this corpus show that accuracy changes across different kinds of deception (e.g., lying vs. fabrication) and across text content dimensions (e.g., sentiment), demonstrating the limitations of previous studies. Preliminary results confirm a statistical separation, though a smaller one, between fabricated and truthful reviews, but they do not confirm the existence of a separation between truths and lies.

Committee: James Martin, Professor (Chair)
Clayton Lewis, Professor
Wayne Ward, Research Professor
Daniel Jurafsky, Stanford University
Peter Norvig, Google
Department of Computer Science
University of Colorado Boulder
Boulder, CO 80309-0430 USA
May 5, 2012 (14:20)