Hello! Thank you for providing a simple implementation of so many models.
I have a question regarding the Attention Based MIL.
In the original implementation, the attention scores are computed as
$a_i = \frac{e^{f_i}}{\sum_{j} e^{f_j}}$,
that is, using a (masked) softmax over the latent scores $f$.
However, in your implementation, you are using a sigmoid:
`weights = torch.sigmoid(scores)`
which causes the attention values to lose the property of summing to one. Wouldn't a softmax be more appropriate?
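To make the suggestion concrete, here is a minimal sketch of the masked softmax I have in mind, assuming raw attention logits of shape `(batch, n_instances)` and a binary padding mask (the function name and shapes are my own, not from your code):

```python
import torch

def masked_softmax_attention(scores: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Normalize per-instance attention logits with a masked softmax.

    scores: (batch, n_instances) raw attention logits f_i
    mask:   (batch, n_instances) 1 for real instances, 0 for padding
    """
    # Padded positions get -inf so they receive exactly zero weight
    scores = scores.masked_fill(mask == 0, float("-inf"))
    # a_i = exp(f_i) / sum_j exp(f_j), taken only over valid instances
    return torch.softmax(scores, dim=-1)
```

With this, the weights within each bag sum to one over the valid instances, matching the formula from the paper, whereas the sigmoid version does not.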
Also, the code for the functions used in this model is not provided:
self.score = MonoAdditiveAttentionScore(D, D)
self.pool = CountMILPool(D)
Could you also provide the implementation of these functions?
Thank you!