Evaluation of Various BERT Algorithms

We evaluated five BERT models across eight GLUE tasks, identifying BERT Uncased as the top performer.

Abstract

BERT (Bidirectional Encoder Representations from Transformers) is one of the most popular Deep Learning techniques, having gained prominence following the introduction of the Transformer architecture. Multiple versions of BERT exist, each designed to address particular shortcomings of the original model. In this paper, we compare five different BERT models on the GLUE (General Language Understanding Evaluation) benchmark. This comparison provides practical insights into which of these models is best suited to specific problems.

Index Terms—BERT, GLUE, Transformers, Deep Learning
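As background for the comparison described above, GLUE tasks are scored with simple classification metrics; for example, the CoLA task uses the Matthews correlation coefficient. The sketch below (an illustration, not code from this paper) computes that metric from binary predictions and gold labels, using only the standard library:

```python
import math

def matthews_corrcoef(preds, labels):
    """Matthews correlation coefficient for binary predictions,
    the metric GLUE uses for the CoLA task."""
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    tn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Conventionally defined as 0 when any confusion-matrix margin is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy example: 6 of 8 predictions correct.
preds  = [1, 0, 1, 1, 0, 1, 0, 0]
labels = [1, 0, 0, 1, 0, 1, 1, 0]
print(matthews_corrcoef(preds, labels))  # → 0.5
```

In practice, libraries such as scikit-learn or Hugging Face's evaluation tooling provide this metric, but the closed form above makes the computation explicit.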