This article covers several neural language models developed as improvements over the popular BERT model. The next section gives a short overview of BERT (Devlin et al., 2018), but reading the original paper is recommended, along with the ‘Attention Is All You Need’ paper (Vaswani et al., 2017).

In this article, we discuss models that extend BERT’s capabilities in three ways: making it more accurate on tasks it already handles well, improving its performance on tasks where it falls short, and making it lighter and faster.


BERT is an encoder based on the transformer architecture…
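At the core of each transformer encoder layer is scaled dot-product self-attention, in which every token position computes a weighted average over all positions. The sketch below is a minimal, illustrative pure-Python version of that operation (single head, no learned projections, no masking); it is not BERT’s actual implementation, and all names here are our own.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention.

    Q, K, V are lists of equal-length float vectors. For each query,
    compute similarity scores against all keys, scale by sqrt(d),
    softmax into weights, and return the weighted sum of the values.
    """
    d = len(K[0])  # key dimension, used for the 1/sqrt(d) scaling
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)  # weights sum to 1 across all keys
        out.append([
            sum(w * v[j] for w, v in zip(weights, V))
            for j in range(len(V[0]))
        ])
    return out
```

In BERT, this operation runs with multiple heads per layer, each with learned query/key/value projection matrices, and the attention is bidirectional: every token attends to every other token in the sequence.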

Aditya Desai
