CALL FOR PAPERS

HLT/NAACL 2007 Workshop on
Confidence Estimation for Natural Language Processing

http://cenlp.iltevents.org/

University of Rochester, Rochester, New York
April 26, 2007 (right after HLT/NAACL 2007)

-------------------------
Description
-------------------------

Confidence estimation (CE) is a method for automatically estimating the reliability of output generated by NLP systems. It is useful in scenarios where the underlying systems make errors, especially when human users are involved. This is the case in many areas of NLP, where systems are known to be imperfect. Applications can exploit knowledge of likely mistakes to improve their interactions with users and thereby enhance their overall effectiveness.

The workshop aims to bring together researchers from different NLP fields. We would like to provide a forum for sharing ideas, techniques, and experience on common aspects of CE across different areas of NLP. The status of CE research varies widely: in speech recognition it has been studied extensively and is now widely used in applications such as dialog systems, but it is still nascent in other areas such as information retrieval, machine translation, and question answering. We hope to give researchers an opportunity to learn from previous and ongoing work in other fields.

The workshop invites contributions dealing with various aspects of CE. A core area is the design of statistical or machine learning frameworks for determining the correctness of system output. These frameworks may vary in the level of granularity they target (words, phrases, sentences, etc.), in the choice of confidence features, and in the method used to combine information from different features. Features can draw on different knowledge sources, such as semantic, syntactic, or acoustic properties.
They can be derived from the underlying NLP system (e.g., posterior probabilities or related quantities), or they can be intrinsic to the current input, the current output, or the relation between the two. An interesting question is whether there are common methods that are successful in different areas of NLP, or whether each area requires a different approach.

Evaluation is also an important issue in CE across the different fields of NLP. It is usually performed using metrics such as classification error rate, normalized cross entropy, or ROC curves. However, none of these appears to capture all aspects of CE evaluation, and we invite researchers to develop and study new evaluation measures. Reliable evaluation of CE performance is also linked to the problems of calibration and of evaluating predictive uncertainty, which have been studied in machine learning and statistics. We therefore invite contributions exploring this relationship, in particular contributions applying calibration models to NLP systems.

In many areas of NLP, an additional question arises: how can we define a gold standard for evaluating CE techniques? In speech recognition this is relatively straightforward, but in fields like machine translation or summarization, correctness is subjective. It is typically assessed using automatic evaluation metrics of questionable reliability. Since many CE systems rely on learning techniques trained on examples of correct and incorrect predictions, we are particularly interested in CE techniques that work well with unreliable or "fuzzy" references.

Another focus of the workshop will be the application of CE in different areas of NLP.
Possible applications include speech dialog systems; TransType-style systems in MT; post-editing scenarios; system combination approaches that merge the output of several different NLP systems; the use of CE layers as one component of an NLP system; the use of CE in training (statistical) NLP models, e.g., in semi-supervised approaches; interactive NLP systems; and situations where several NLP tasks are coupled, such as speech translation, information extraction on translated text, or cross-language information retrieval.

The workshop invites technical papers related to confidence estimation for natural language processing. Possible topics include, but are not limited to:

* CE techniques for different NLP areas
* Design and training of CE models; machine learning
* Confidence features exploring different knowledge sources
* Evaluation and calibration of CE
* CE with imprecise or unreliable references
* Applications of CE in NLP
* Lessons learned from CE systems deployed in applications
* Common aspects of CE across different fields in NLP

-------------------------
Organizers
-------------------------

Nicola Ueffing, George Foster, Cyril Goutte
Interactive Language Technologies Group, National Research Council Canada

Send inquiries to Nicola.Ueffing@cnrc-nrc.gc.ca

-------------------------
Program committee
-------------------------

Eugene Agichtein (Emory University, Atlanta)
Aron Culotta (U of Massachusetts, Amherst)
Chris Drummond (NRC Canada, Ottawa/U of Ottawa)
Marcello Federico (ITC-irst, Trento)
Simona Gandrabur (Idilia, Montreal)
Christian Gollan (RWTH Aachen University)
Didier Guillevic (Idilia, Montreal)
Philippe Langlais (RALI, U de Montreal)
Hermann Ney (RWTH Aachen University)
Chris Quirk (Microsoft Research, Redmond)
Carl Edward Rasmussen (Max Planck Institute, Tübingen)
Alberto Sanchis (U of Valencia)
Michel Simard (NRC Canada, Gatineau)
Enrique Vidal (U of Valencia)
Stephan Vogel (CMU, Pittsburgh)

-------------------------
Shared task
-------------------------

Because we believe this is the first workshop of its kind, and because CE spans several very different NLP applications, we do not propose a shared task for this workshop. However, we intend to devote part of the workshop to this topic: we plan to hold a wrap-up discussion at the end of the workshop to explore the possibility of organizing a shared task in the future. We invite participants to propose common shared tasks and to share ideas on how to design a shared task that would be interesting to people working in different areas of NLP.

-------------------------
Submission information
-------------------------

Researchers interested in presenting their work at the workshop should prepare a PDF version of their paper, with a maximum of 8 pages (including references), in the two-column format of the ACL proceedings. Please prepare your PDF submission using the HLT/NAACL 2007 format. Reviewing will be double-blind. Submissions will be handled electronically; see http://www.cs.rochester.edu/meetings/hlt-naacl07/workshops.shtml for details.

-------------------------
Important dates
-------------------------

Paper submission: Jan 18
Notification: Feb 22
Camera-ready papers: Mar 01
Workshop: Apr 26