KazBERT: A Custom BERT Model for the Kazakh Language

Published in Preprint, 2025

Recommended citation: Gainulla, Y. (2025). "KazBERT: A Custom BERT Model for the Kazakh Language."

KazBERT is a BERT-based model designed and trained specifically for Kazakh language tasks. The model is pretrained with the Masked Language Modeling (MLM) objective on a multilingual text corpus containing Kazakh, Russian, and English text.
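
The MLM objective mentioned above can be illustrated with a small sketch: a fraction of the input tokens is selected as prediction targets, and of those, roughly 80% are replaced with `[MASK]`, 10% with a random token, and 10% left unchanged (the standard BERT recipe). This is a generic illustration of the technique, not code from the KazBERT training pipeline; the function name and the 15% masking rate are assumptions based on the original BERT setup.

```python
import random

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """BERT-style MLM masking (illustrative, not the KazBERT pipeline).

    Selects ~mask_prob of positions as prediction targets; of those,
    80% become [MASK], 10% a random token, 10% stay unchanged.
    Returns (masked_tokens, labels) where labels is None at
    positions the model is not asked to predict.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)          # this position is a prediction target
            r = rng.random()
            if r < 0.8:
                masked.append("[MASK]")
            elif r < 0.9:
                # stand-in for "random vocabulary token": sample from the input
                masked.append(rng.choice(tokens))
            else:
                masked.append(tok)      # kept unchanged, still predicted
        else:
            labels.append(None)
            masked.append(tok)
    return masked, labels
```

During pretraining, the model sees the masked sequence and is trained to recover the original token at every position where the label is not `None`.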

Key Features

  • Custom tokenizer optimized for the Kazakh language
  • Trained on a diverse Kazakh text corpus
  • Supports downstream NLP tasks in Kazakh

Download paper here

View model on Hugging Face