NAACL HLT 2007 Workshop
Organizers:
National Research Council Canada
** Latest news (26 Jan 2007) **: We received some good submissions to the workshop. Unfortunately, there were not enough of them to set up a full workshop program, so we must regretfully cancel the workshop. Thanks to all who expressed interest, to those who agreed to serve on the committee, and especially to those who submitted their work! We hope you will be able to present it on another occasion!

Workshop description:

Confidence estimation (CE) is a method for automatically estimating the reliability of output generated by NLP systems. It is useful in scenarios where the underlying systems make errors, especially when human users are involved. This is the case in many areas of NLP, because systems are known to be imperfect. Applications can exploit knowledge of likely mistakes to improve their interactions with users and thereby enhance their overall effectiveness.

The workshop aims at bringing researchers from different NLP fields together. We would like to provide a forum for sharing ideas, techniques, and experience on common aspects of CE across different areas of NLP. The status of CE research varies widely: in speech recognition it has been studied extensively, and is now widely used in applications such as dialog systems; but it is nascent in other areas such as information retrieval, machine translation, and question answering. We hope to provide an opportunity for researchers to learn from previous and ongoing work in other fields.

Modelling

The workshop invites contributions dealing with various aspects of CE. A core area is the design of statistical or machine learning frameworks for determining correctness of system output. These may vary in the level of granularity they target (words, phrases, sentences, etc.); in the choice of confidence features; and in the method used to combine information from different features. Features can explore different knowledge sources such as semantic, syntactic, or acoustic properties.
They can be derived from the underlying NLP system (e.g. posterior probabilities or related quantities); or they can be intrinsic to the current input, the current output, or the relation between the two.

Evaluation

Evaluation is also an important issue in CE. It is usually performed using metrics such as classification error rate, normalized cross entropy or ROC curves. However, none of these appears to capture all aspects of CE evaluation. We invite researchers to develop and study new evaluation measures.

Reliable evaluation of CE performance is also linked to the problem of calibration and evaluation of predictive uncertainty studied in machine learning and statistics. We therefore invite contributions exploring this relationship, and in particular contributions applying calibration models to NLP systems.

In many areas of NLP, an additional question arises: how can we define a gold standard for evaluating CE techniques? In speech recognition, this is relatively straightforward, but in fields like machine translation or summarization, correctness is subjective. It is typically assessed using automatic evaluation metrics that are of questionable reliability. Since many CE systems rely on learning techniques trained on examples of correct and incorrect predictions, we are particularly interested in CE techniques that work well with unreliable or "fuzzy" references.

Applications

Another focus of the workshop will be the application of CE in different areas of NLP.
Possible applications include speech dialogue systems; TransType-style systems in MT; post-editing scenarios; system combination approaches merging the output of several different NLP systems; the use of CE layers as one component of an NLP system; their use in training (statistical) NLP models, e.g. in semi-supervised approaches; interactive NLP systems; and situations where several NLP tasks are coupled, such as speech translation and information extraction on translated text, or cross-language information retrieval.

Main topics:

The workshop invites technical papers related to confidence estimation for natural language processing. Possible topics include, but are not limited to:
Shared Task:

Because we believe this is the first workshop of its kind, and as CE spans several very different NLP applications, we do not propose a shared task for this workshop.

Related work:

There was a 2003 CLSP workshop on Confidence Estimation for Statistical MT, which produced a final report.

John Blatz, Erin Fitzgerald, George Foster, Simona Gandrabur, Cyril Goutte, Alex Kulesza, Alberto Sanchis, and Nicola Ueffing (2004) Confidence Estimation for Machine Translation. In Proceedings of Coling 2004, Geneva, August 2004, pp. 315-321.

Aron Culotta and Andrew McCallum (2004) Confidence Estimation for Information Extraction. HLT-2004.

Simona Gandrabur, George Foster, and Guy Lapalme (2006) Confidence Estimation for NLP Applications. NRC technical report 48755.

Nicola Ueffing and Hermann Ney (2005) Word-Level Confidence Estimation for Machine Translation using Phrase-Based Translation Models. In Proceedings of HLT/EMNLP 2005, Vancouver, Canada, October 2005, pp. 763-770.

F. Wessel, R. Schlüter, K. Macherey, and H. Ney (2001) Confidence Measures for Large Vocabulary Continuous Speech Recognition. IEEE Transactions on Speech and Audio Processing 9(3):288-298.