Evaluation of Various BERT Algorithms

We evaluated five BERT models across eight GLUE tasks, identifying BERT Uncased as the top performer.

Abstract

BERT (Bidirectional Encoder Representations from Transformers) is one of the most popular deep learning techniques, having gained widespread adoption following the introduction of the Transformer architecture. Multiple versions of BERT exist, each designed to address particular shortcomings of the original model. In this paper, we compare five different BERT models on the GLUE (General Language Understanding Evaluation) benchmark. This comparison provides insights into the practical application of these models to specific problems.

Index Terms—BERT, GLUE, Transformers, Deep Learning