This article surveys several neural language models developed as improvements over the popular BERT model. The next section gives a short overview of BERT (Devlin et al.), but reading the original paper is recommended, along with the ‘Attention Is All You Need’ paper (Vaswani et al.).

These models extend BERT in three ways: making it more accurate on tasks it was already good at, improving its performance on tasks where it was weaker, and making it lighter and faster.


BERT is an encoder based on the transformer architecture…

Aditya Desai

