It seems that the default values of keyword arguments in Hugging Face's BERTScore API do not get the best out of BERTScore.
idf: By default, it is off. We should probably turn it on; see "Importance Weighting" on page 4 of the BERTScore paper. However, since we use the same setting for both the traditional and the new approach, I am not sure whether it matters.
model_type: The default language model is roberta-large when lang=en. According to BERTScore's leaderboard, other models have higher correlation with human ratings. However, since we use the same language model for both the traditional/ref-based and the new/DocAsRef approach, I am not sure whether it matters.
use_fast_tokenizer: Default is off. Please turn it on to speed things up. Hugging Face's fast tokenizers are implemented in Rust instead of Python.
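A minimal sketch of how these overrides could be passed through Hugging Face's `evaluate` wrapper for BERTScore. The keyword names (`idf`, `model_type`, `use_fast_tokenizer`) follow that wrapper; the `model_type` shown is just one example of a higher-ranked model from the leaderboard, and the actual `compute` call is commented out since it downloads a large model:

```python
# Non-default BERTScore settings proposed above, collected in one place.
# Keyword names assume Hugging Face's `evaluate` bertscore wrapper; verify
# against the installed version before running.
bertscore_kwargs = {
    "lang": "en",
    "idf": True,                 # importance weighting ("Importance Weighting", BERTScore paper, p. 4)
    "model_type": "microsoft/deberta-xlarge-mnli",  # example leaderboard model, not the default roberta-large
    "use_fast_tokenizer": True,  # Rust-backed fast tokenizer instead of the Python one
}

# Actual scoring (requires `pip install evaluate bert_score` and a model download):
# import evaluate
# bertscore = evaluate.load("bertscore")
# results = bertscore.compute(
#     predictions=candidate_summaries,
#     references=reference_summaries,
#     **bertscore_kwargs,
# )

print(bertscore_kwargs["idf"], bertscore_kwargs["use_fast_tokenizer"])
```

Since both the traditional ref-based and the DocAsRef runs would share this one dict, any effect of these settings should at least be applied consistently across the two approaches.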
@NKWBTB @lihebi Let me know your thoughts.