The current focus of my work is handling speech recognition errors in spoken language dialogue systems. Errors introduced into spoken language dialogue systems (SLDSs) by imperfect automatic speech recognition (ASR) are one of the major problems facing these systems today. Unless a new paradigm for speech recognition arises, ASR cannot be expected to perform much better than human recognition, and even humans make approximately 10% errors when recognizing words out of context.
If errors are inevitable, it would be useful to have a principled approach to identifying when they occur and a set of well-grounded and generalizable techniques that enable us to handle them appropriately.
My research studies the nature of errors introduced into dialogue by imperfect automatic speech recognition, explores how errors can be identified before they are introduced into dialogue, and seeks ways to catch user protest when errors are introduced and to handle the repair.
My work will result in a detailed theory of error recognition and repair, implemented in a working spoken language dialogue system.
You can contact me at stephenc@ics.mq.edu.au
2005 -- Conference Paper: Investigating the Acoustic Sources of Speech Recognition Errors submitted to Interspeech 2005 pdf
2005 -- Workshop Paper:Speech Recognition: Handling Error Recognition and Repair submitted to ACL-05 Students Workshop pdf
2004 -- Conference Paper (jointly with Professor Robert Dale) User Responses to Speech Recognition Errors: Consistency of Behaviour across Domains Accepted for SST 2004 pdf
2004 -- presentation: Towards a Theory of Error Recognition and Repair powerpoint
2004 -- Ph.D Proposal: Handling Speech Recognition Errors in Spoken Language Dialogue Systems postscript
2004 -- Workshop Paper: Handling Speech Recognition Errors in Spoken Language Dialogue Systems submitted to ACL-04 Students Workshop pdf
2002 -- Honours Thesis: VoiceDBC: A semi-automatic tool for writing speech applications pdf
2002 -- Software: VoiceDBC V1
No, I didn't have a year off; I was just figuring out what to do for my Ph.D. ;-)
If you find you cannot track down any of these, email me and I'll try to drag out a copy.
J. Allen. Robust understanding in a dialogue system. In 34th Meeting of the Association for Computational Linguistics, 1995.
J. Allen, D. Byron, M. Dzikovska, G. Ferguson, L. Galescu, and A. Stent. An architecture for a generic dialogue shell. Natural Language Engineering, 1(1), 1998.
J. Austin. How to do things with words. Oxford University Press, 1962.
A. Baddeley. Working Memory. Clarendon Press, Oxford, 1987.
A. Bagga, G. C. Stein, and T. Strzalkowski. FidelityXpress: A Multi-Modal System for Financial Transaction. In 6th Conference on Content-Based Multimedia Information Access (RIAO’00), 2000.
B. Balentine and D. Morgan. How to Build a Speech Recognition Application. Enterprise Integration Group, San Ramon, California, 1999.
J. Bear, J. Dowding, P. Price, and E. E. Shriberg. Labeling conventions for notating grammatical repairs in speech. 1992.
C. Bennett and A. I. Rudnicky. The Carnegie Mellon Communicator Corpus. In Proceedings of ICSLP 2002, pages 341–344, Denver, Colorado, 2002.
D. M. Bikel, S. Miller, R. Schwartz, and R. Weischedel. Nymble: a High-Performance Learning Name-finder. In ANLP-97. ANLP, 1997.
R. Bod. Spoken Dialogue Interpretation with the DOP Model. In COLING-ACL’98: 17th International Conference on Computational Linguistics/36th Annual Meeting, 1998.
D. Boden and D. H. Zimmerman. Talk and Social Structure, chapter Structure in Action, pages 3 – 21. Polity Press, Cambridge and University of California Press, 1991.
D. Caplan and G. Waters. Verbal working memory and sentence comprehension. Behavior and Brain Sciences, 22(1):77–126, 1999.
D. Chan. Creating a voice portal in the university domain. Honours Thesis, Macquarie University, December 2003.
H. H. Clark and E. F. Schaefer. Arenas of Language Use, chapter Contributing to Discourse., pages 144–175. University of Chicago and CSLI, 1992.
W. Cohen. Learning trees and rules with set-valued features. In AAAI-96, 1996.
N. Dahlback. Representation of Discourse. Department of Communication Studies and Department of Computers and Information Science, Linkoping University, 1991.
N. Dahlback, A. Jonsson, and L. Ahrenberg. Wizard of Oz studies - Why and How. In Proceedings of the 1993 International Workshop on Intelligent User Interfaces, pages 193–200, 1993.
M. Danieli. On the use of expectations for detecting and repairing human-machine miscommunication. Computational Linguistics, 13:11, 1996.
DRI. Discourse resource initiative home page. http://www.georgetown.edu/faculty/luperfos/Discourse-Treebank/dri-home.html, April 2003.
S. Furui, K. Maekawa, H. Isahara, T. Shinozaki, and T. Ohdaira. Towards the realization of spontaneous speech recognition. In ICSLP, volume 3, pages 518–521, 2000.
L. Gillick, Y. Ito, L. Manganaro, M. Newman, F. Scattone, S. Wegmann, J. Yamron, and P. Zhan. Dragon Systems’ Automatic Transcription of the New TDT Corpus. Technical report, DARPA, 1998.
F. Goldman-Eisler. Psycholinguistics: Experiments in Spontaneous Speech. Academic Press, New York, 1968.
D. A. Good and B. Butterworth. Temporal Variables in Speech, chapter Hesitancy as a conversational resource: Some methodological implications. Mouton, The Hague, 1980.
S. Greenberg, S. Chang, and J. Hollenback. An introduction to the Diagnostic Evaluation of Switchboard-Corpus Automatic Speech Recognition Systems. In NIST Speech Transcription Workshop, College Park, MD. NIST, May 2000.
H. P. Grice. Syntax and Semantics, volume 3, pages 43 – 48. Academic Press, New York, 1975.
R. Grishman. Information extraction and speech recognition. Technical report, Computer Science Department, New York University, 1998.
B. J. Grosz and C. L. Sidner. Intentions in Communication, chapter Plans for discourse, pages 417 – 444. MIT Press, Cambridge, MA, 1990.
D. Gusfield. Algorithms on Strings, Trees, and Sequences. Cambridge University Press, 1999. ISBN 0 521 58519 8 hardback.
B. Hansen, D. G. Novick, and S. Sutton. Prevention and repair of breakdowns in a simple task domain. In AAAI1996 Workshop on Detecting, Repairing and Preventing Human-Machine Miscommunication, 1996.
J. D. Harnsberger and L. A. Goshert. Reduced, citation, and hyperarticulated speech in the Laboratories: Some acoustic analyses. Progress Report 24, Speech Research Laboratories, Department of Psychology, Indiana University, Bloomington, IN 47405, 2000.
T. J. Hazen, S. Seneff, and J. Polifroni. Recognition confidence scoring and its use in speech understanding systems. Computer Speech and Language, 16:49–67, 2002.
P. A. Heeman and J. F. Allen. Robustness in Language and Speech Technology, chapter Improving Robustness by Modelling Spontaneous Speech Events, pages 123–152. Kluwer Academic Publishers, Dordrecht, 2001.
J. Hirschberg, D. Litman, and M. Swerts. Prosodic cues to recognition errors. In ASRU, 1999.
J. Hirschberg, D. Litman, and M. Swerts. Generalizing prosodic prediction of speech recognition errors. In International Conference on Spoken Language Processing (ICSLP). ICSLP, September 2000.
E. Horvitz and T. Paek. Harnessing Models of Users’ Goals to Mediate Clarification Dialog in Spoken Language Systems. In Eighth Conference on User Modeling, Sonthofen, Germany, July 2001.
X. Huang, A. Acero, and H.-W. Hon. Spoken Language Processing: A Guide to Theory, Algorithm, and System Development. Prentice Hall PTR, Upper Saddle River, New Jersey, 2001.
P. W. Jordan and R. H. Thomason. Refining the categories of miscommunication. In AAAI-96, Portland, 1996.
D. Jurafsky, L. Shriberg, and D. Biasca. Switchboard swbd-damsl shallow-discourse-function annotation coders manual, draft 13. ???, August 1997.
C.M. Karat, C. Halverson, D. Horn, and J. Karat. Patterns of entry and correction in large vocabulary continuous speech recognition systems. In CHI’99, Pittsburgh, 1999.
M. Kay, J. M. Gawron, and P. Norvig. Verbmobil: A translation system for face-to-face dialog. Lecture Notes, Centre for the Study of Language and Information, 1994.
M. Klein. Dialogue act markup. Technical report, MATE, April 1999a.
M. Klein. Proposed structure of coding book. Technical report, MATE, March 1999b.
S. Larsson. Coding schemas for dialogue moves. Technical report, Department of Linguistics, Goteborg University, 1998.
N. Lesh, C. Rich, and C. L. Sidner. Using plan recognition in human-computer collaboration. In 7th Int. Conf. on User Modeling, pages 3 – 32, 1999.
W. J. M. Levelt. Monitoring and self-repair in speech. Cognition, 14:41 – 104, 1983.
V. I. Levenshtein. Binary codes capable of correcting deletions, insertions and reversals. Sov. Phys. Dokl, 10:707–710, 1966.
G.A. Levow. Characterizing and recognizing spoken corrections in human-computer dialogue. In COLING-ACL ’98. COLING-ACL, 1998.
G.A. Levow. Understanding recognition failures in spoken corrections in human-computer dialogue. In ESCA Workshop on Dialogue and Prosody, Eindhoven, Netherlands, 1999. ESCA.
L. Libuda. Improving clarification dialogs in speech command systems with the help of user modelling: A conceptualization for an in-car user interface. ABIS-Workshop 2001, 2001.
R. Lickley, D. McKelvie, and E. G. Bard. Comparing human and automatic speech recognition using word gating. In Satellite Meeting on Disfluency in Spontaneous Speech. ICPhS99, 1999.
Y. Liu, E. Shriberg, and A. Stolcke. Automatic disfluency identification in conversational speech using multiple knowledge systems. Eurospeech, 2003. to appear.
K. E. Lochbaum. The need for intentionally-based approaches to language. Technical report, Aiken Computational Lab, Harvard, 1993.
MADCOW. Multi-site data collection for a spoken language corpus. In DARPA Speech and Natural Language Workshop. DARPA, 1992.
MATE. Mate home page. http://www.ims.uni-stuttgart.de/projekte/mate/, May 1998.
S.W. McRoy and G. Hirst. The repair of speech act misunderstanding by abductive inference. Computational Linguistics, 21(4):435–478, 1995.
H. Nanjo, A. Lee, and T. Kawahara. Automatic diagnosis of recognition errors in large vocabulary continuous speech recognition systems. In ICSLP, volume 2, pages 1027 – 1030, 2000.
S. Oviatt. The CHAM model of hyperarticulate adaptation during human-computer error resolution. In International Conference on Spoken Language Processing, Sydney, Australia, 1998.
S. Oviatt, M. MacEachern, and G.-A. Levow. Predicting hyperarticulate speech during human-computer error resolution. Speech Communication, 24(2):87–110, 1998.
S. L. Oviatt. Predicting spoken disfluencies during human-computer interaction. Computer Speech and Language, 9:19 – 35, 1995.
V. Pallotta. Cognitive Language Engineering: Towards Robust Human-Computer Interaction. PhD thesis, Swiss Federal Institute of Technology, Lausanne, 2002.
D. D. Palmer and M. Ostendorf. Improving information extraction by modeling errors in ASR output. In the Human Language Technology Workshop, pages 156–160, March 2001.
D. Perlis and K. Purang. Conversational adequacy: Mistakes are the essence. Technical report, Department of Computer Science, University of Maryland, 1996.
K. Purang. Systems that detect and repair their own mistakes. PhD thesis, University of Maryland, 2001.
C. Rich and C. L. Sidner. Collagen: A collaborative manager for software interface agents. Technical report, Mitsubishi, Massachusetts, 1998.
E. Ringger. Correcting Speech Recognition Errors. PhD thesis, University of Rochester, 2000.
E. Rosch. Cognition and Categorization, chapter Principles of Categorization, pages 28–46. Lawrence Erlbaum Associates, Hillsdale, N.J., 1978.
H. Sacks, E. Schegloff, and G. Jefferson. A simple system for the organization of turn-taking in conversation. Language, 50(4):696–735, 1974.
H. Sacks, E. Schegloff, and G. Jefferson. The preference for self-correction in the organization of repair in conversation. Language, 53(2):361–382, 1977.
R. San-Segundo, B. Pellom, and W. Ward. Confidence measures for dialogue management in the CU Communicator System. In ICASSP’2000, 2000.
J. R. Searle. Speech Acts. Cambridge University Press, Cambridge, 1969.
J. R. Searle. Expression and Meaning. Cambridge University Press, Cambridge, 1979.
T. Shinozaki and S. Furui. An assessment of automatic speech recognition techniques for spontaneous speech in comparison with human performance. In SSPR 2003, pages 95–98, 2003.
E. Shriberg, J. Bear, and J. Dowding. Automatic detection and correction of repairs in human-computer dialog. In M. Marcus, editor, DARPA Speech and Natural Language Workshop, 1992, pages 419–424. DARPA, Harriman, New York, 1992.
H. Soltau and A. Waibel. On the influence of hyperarticulated speech on the recognition performance. In International Conference on Spoken Language Processing, Sydney, Australia, 1998.
L. J. Stifelman. User repairs of speech recognition errors: An intonational analysis. Technical report, Speech Research Group, MIT Media Laboratory, May 1993.
B. Suhm, B. Myers, and A. Waibel. Interactive recovery from speech recognition errors in speech user interfaces. In ICSLP 96, October 1996.
TRAINS. Trains home page. http://www.cs.rochester.edu/research/trains/, April 2003.
D. R. Traum. A Computational Theory of Grounding in Natural Language Conversation. PhD thesis, University of Rochester, New York, 1994.
W. Wahlster, editor. Verbmobil: Foundations of Speech-to-Speech Translation. Springer-Verlag, Berlin, 2000.
D. A. G. Williams. Knowing What You Don’t Know: Roles for Confidence Measures in Automatic Speech Recognition. PhD thesis, Department of Computer Science, University of Sheffield, 1999.
L. Wittgenstein. The Blue and Brown Books. Basil Blackwell, Oxford, 1969.
N. Yankelovich, G.-A. Levow, and M. Marx. Designing SpeechActs: Issues in Speech User Interfaces. In CHI’95 Proceedings Papers, 1995.
R. Zhang and A. I. Rudnicky. Word level confidence annotation using combinations of features. In Proceedings of Eurospeech 2001, pages 2105–2108, Aalborg, Denmark, 2001.
E. Zoltan-Ford. How to get people to say and type what computers can understand. International Journal of Man-Machine Studies, 34(4):527–547, 1991.