


1st Eval4NLP 2020: Online
- Steffen Eger, Yang Gao, Maxime Peyrard, Wei Zhao, Eduard H. Hovy: Proceedings of the First Workshop on Evaluation and Comparison of NLP Systems, Eval4NLP 2020, Online, November 20, 2020. Association for Computational Linguistics 2020, ISBN 978-1-952148-82-8
- Klaus-Michael Lux, Maya Sappelli, Martha A. Larson: Truth or Error? Towards systematic analysis of factual errors in abstractive summaries. 1-10
- Oleg V. Vasilyev, Vedant Dharnidharka, John Bohannon: Fill in the BLANC: Human-free quality estimation of document summaries. 11-20
- João Sedoc, Lyle H. Ungar: Item Response Theory for Efficient Human Evaluation of Chatbots. 21-33
- Hwanhee Lee, Seunghyun Yoon, Franck Dernoncourt, Doo Soon Kim, Trung Bui, Kyomin Jung: ViLBERTScore: Evaluating Image Caption Using Vision-and-Language BERT. 34-39
- Kawin Ethayarajh, Dorsa Sadigh: BLEU Neighbors: A Reference-less Approach to Automatic Evaluation. 40-50
- Xi Chen, Nan Ding, Tomer Levinboim, Radu Soricut: Improving Text Generation Evaluation with Batch Centering and Tempered Word Mover Distance. 51-59
- Jacob Bremerman, Huda Khayrallah, Douglas W. Oard, Matt Post: On the Evaluation of Machine Translation n-best Lists. 60-68
- Rahul Jha, Keping Bi, Yang Li, Mahdi Pakdaman, Asli Celikyilmaz, Ivan Zhiboedov, Kieran McDonald: Artemis: A Novel Annotation Methodology for Indicative Single Document Summarization. 69-78
- Reda Yacouby, Dustin Axman: Probabilistic Extension of Precision, Recall, and F1 Score for More Thorough Evaluation of Classification Models. 79-91
- Adam Poliak: A survey on Recognizing Textual Entailment as an NLP Evaluation. 92-109
- Jingcheng Niu, Gerald Penn: Grammaticality and Language Modelling. 110-119
- Jesper Brink Andersen, Mikkel Bak Bertelsen, Mikkel Hørby Schou, Manuel R. Ciosici, Ira Assent: One of these words is not like the other: a reproduction of outlier identification using non-contextual word representations. 120-130
- Shiran Dudy, Steven Bedrick: Are Some Words Worth More than Others? 131-142
- Kiril Gashteovski, Rainer Gemulla, Bhushan Kotnis, Sven Hertling, Christian Meilicke: On Aligning OpenIE Extractions with Knowledge Bases: A Case Study. 143-154
- Hanna Wecker, Annemarie Friedrich, Heike Adel: ClusterDataSplit: Exploring Challenging Clustering-Based Data Splits for Model Performance Evaluation. 155-163
- Neslihan Iskender, Tim Polzehl, Sebastian Möller: Best Practices for Crowd-based Evaluation of German Summarization: Comparing Crowd, Expert and Automatic Evaluation. 164-175
- Nathan Stringham, Mike Izbicki: Evaluating Word Embeddings on Low-Resource Languages. 176-186















