It seems that the default values of keyword arguments in Hugging Face's BERTScore API do not get the best out of BERTScore.
idf: By default, it is off. We should probably turn it on; see "Importance Weighting" on page 4 of the BERTScore paper. However, since we use the same setting for both the traditional and the new approach, I am not sure whether it matters.
model_type: The default language model is roberta-large when lang=en. According to BERTScore's leaderboard, other models have higher correlation with human ratings. However, since we use the same language model for both the traditional/ref-based and the new/DocAsRef approach, I am not sure whether it matters.
use_fast_tokenizer: Default is off. Please turn it on to speed things up. Hugging Face's fast tokenizers are implemented in Rust instead of Python.
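A minimal sketch of how these overrides could be passed through Hugging Face's `evaluate` wrapper for BERTScore. The keyword names (`idf`, `model_type`, `use_fast_tokenizer`) follow that wrapper; the `model_type` shown is just one example of a higher-ranked model from the leaderboard, and the actual `compute` call is commented out since it downloads a large model:

```python
# Non-default BERTScore settings proposed above, collected in one place.
# Keyword names assume Hugging Face's `evaluate` bertscore wrapper; verify
# against the installed version before running.
bertscore_kwargs = {
    "lang": "en",
    "idf": True,                 # importance weighting ("Importance Weighting", BERTScore paper, p. 4)
    "model_type": "microsoft/deberta-xlarge-mnli",  # example leaderboard model, not the default roberta-large
    "use_fast_tokenizer": True,  # Rust-backed fast tokenizer instead of the Python one
}

# Actual scoring (requires `pip install evaluate bert_score` and a model download):
# import evaluate
# bertscore = evaluate.load("bertscore")
# results = bertscore.compute(
#     predictions=candidate_summaries,
#     references=reference_summaries,
#     **bertscore_kwargs,
# )

print(bertscore_kwargs["idf"], bertscore_kwargs["use_fast_tokenizer"])
```

Since both the traditional ref-based and the DocAsRef runs would share this one dict, any effect of these settings should at least be applied consistently across the two approaches.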
@NKWBTB @lihebi Let me know your thoughts.