BERT or Transformer? #22

Hi Kevin, 

Thank you for the impressive work!

Section 3.3 states that "a vanilla bidirectional transformer architecture" is adopted, citing the original Transformer paper. Appendix C2 likewise uses "an auto-regressive transformer" as a baseline.

I am confused because the implementation appears to use a BERT architecture for both the main model and the auto-regressive baseline. I am wondering whether the implementation or the preprint has been updated since.
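For context on the terms involved, here is a minimal sketch of the usual distinction, assuming that "bidirectional" and "auto-regressive" refer only to the self-attention mask: a bidirectional (BERT-style) encoder lets every position attend to every other position, while an auto-regressive model restricts position i to positions j <= i. The functions `attention_mask`, `masked_softmax`, and `self_attention` are hypothetical illustrations, not code from the repository, and use identity Q/K/V projections instead of learned weights. The sketch does not say which variant the repository actually implements.

```python
import math


def attention_mask(seq_len: int, causal: bool) -> list[list[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    Bidirectional (BERT-style encoder): every position sees every position.
    Auto-regressive (causal): position i sees only positions j <= i.
    """
    return [[(not causal) or j <= i for j in range(seq_len)] for i in range(seq_len)]


def masked_softmax(scores: list[float], mask: list[bool]) -> list[float]:
    """Softmax over the allowed positions; masked-out positions get weight 0."""
    top = max(s for s, m in zip(scores, mask) if m)  # subtract max for stability
    exps = [math.exp(s - top) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: list[list[float]], causal: bool) -> list[list[float]]:
    """Single-head scaled dot-product self-attention.

    Simplification (assumption): Q = K = V = x, i.e. identity projections.
    """
    n, d = len(x), len(x[0])
    mask = attention_mask(n, causal)
    out = []
    for i in range(n):
        scores = [sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(d) for j in range(n)]
        weights = masked_softmax(scores, mask[i])
        out.append([sum(weights[j] * x[j][k] for j in range(n)) for k in range(d)])
    return out


if __name__ == "__main__":
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    ar = self_attention(x, causal=True)
    bi = self_attention(x, causal=False)
    # Causal: position 0 can only attend to itself, so its output equals its input.
    print(ar[0])
    # Bidirectional: position 0 also mixes in later tokens.
    print(bi[0] != x[0])
```

With the causal mask, the first position's output is just its own vector, whereas the bidirectional version blends in later tokens; at the last position both variants see the whole sequence and give identical outputs. In this sketch the only difference between the two models is the mask, not the layer stack.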

Best,
