In the code below, we used two models of quite different capacities: for bert-score and bertscore-sentence-MNLI, we used RoBERTa-large, which is about 1.6 GB (the default for bert-score as implemented in HF's `evaluate` library). But for bertscore-sentence, which is built on top of Sentence-BERT, we used all-MiniLM-L6-v2, which is only about 80 MB. This puts our bertscore-sentence approach at a huge disadvantage. Of course, we picked that model to be fast in pilot studies.
https://github.com/SigmaWe/DocAsRef_0/blob/de4de4b4275e661621bebf3b2f92d8676e2f81c2/dar_env.py#L8-L11
I think that if we use a larger-capacity model for bertscore-sentence, we can further boost our sentence-based pairwise approach.
There are two directions we can try:
- A quick one: just use a larger model trained by the Sentence-BERT project. Let's try two: `all-mpnet-base-v2` and `all-roberta-large-v1`. The former is still much smaller than RoBERTa-large but scores higher on the Sentence-BERT leaderboard, while the latter is just RoBERTa-large trained further with Sentence-BERT's dot-product loss. So let's test both of these versions:

  ```python
  sent_embedder = sentence_transformers.SentenceTransformer("all-mpnet-base-v2")
  sent_embedder = sentence_transformers.SentenceTransformer("all-roberta-large-v1")
  ```
BTW, we can use HF's `transformers` library for Sentence-BERT as well. That way, we don't have to import both `transformers` and `sentence_transformers`, and we can consolidate all the code under one framework.
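A minimal sketch of that consolidation, assuming we load the same Sentence-BERT checkpoint through plain `transformers` and reproduce its pooling ourselves (the `all-*` models use mean pooling over real tokens followed by L2 normalization; the `embed` helper below is our own, not part of either library):

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Same checkpoint we already use for bertscore-sentence, loaded via
# transformers instead of sentence_transformers.
model_name = "sentence-transformers/all-MiniLM-L6-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

def embed(sentences):
    batch = tokenizer(sentences, padding=True, truncation=True,
                      return_tensors="pt")
    with torch.no_grad():
        token_embs = model(**batch).last_hidden_state  # (batch, seq, dim)
    # Mean-pool over non-padding tokens only, then L2-normalize --
    # mirroring what sentence-transformers does for the all-* models.
    mask = batch["attention_mask"].unsqueeze(-1).float()
    sent_embs = (token_embs * mask).sum(dim=1) / mask.sum(dim=1)
    return torch.nn.functional.normalize(sent_embs, p=2, dim=1)

embs = embed(["A cat sits on the mat.", "A dog barks."])
print(embs.shape)  # torch.Size([2, 384]) for all-MiniLM-L6-v2
```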
- A slower but completely fair approach: also use RoBERTa-large (generally pretrained, not fine-tuned on MNLI) to embed each sentence, extracting the embedding corresponding to the [CLS] token. For how to do it, see here.
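The fair setup above could look roughly like the sketch below. It takes the hidden state of the first token, RoBERTa's `<s>` (its [CLS] equivalent); `cls_embed` is a hypothetical helper name, and `roberta-base` is used here only to keep the example light (swap in `"roberta-large"` for the actual comparison):

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Generally pretrained RoBERTa, no MNLI fine-tuning.
model_name = "roberta-base"  # use "roberta-large" for the real run
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

def cls_embed(sentence):
    batch = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # (1, seq, dim)
    # First token is <s>, RoBERTa's [CLS] equivalent.
    return hidden[0, 0]

emb = cls_embed("A cat sits on the mat.")
print(emb.shape)  # torch.Size([768]) for roberta-base
```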