KazBERT: A Custom BERT Model for the Kazakh Language
Published in Preprint, 2025
Recommended citation: Gainulla, Y. (2025). "KazBERT: A Custom BERT Model for the Kazakh Language."
KazBERT is a BERT-based model designed and fine-tuned specifically for Kazakh language tasks. It is pretrained with Masked Language Modeling (MLM) on a multilingual text corpus containing Kazakh, Russian, and English text.
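As a rough illustration of how an MLM-pretrained BERT checkpoint like this could be queried, the sketch below uses the Hugging Face `transformers` fill-mask pipeline. The model ID `your-namespace/KazBERT` is a placeholder, not the actual published checkpoint name.

```python
# Minimal sketch: querying a BERT-style masked language model for Kazakh
# with the Hugging Face transformers fill-mask pipeline.
# NOTE: "your-namespace/KazBERT" is a hypothetical Hub ID used for illustration.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="your-namespace/KazBERT")

# Kazakh prompt: "[MASK] is the capital of Kazakhstan."
for prediction in fill_mask("[MASK] — Қазақстанның астанасы."):
    print(prediction["token_str"], round(prediction["score"], 3))
```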
Key Features
- Custom tokenizer optimized for the Kazakh language
- Trained on a diverse Kazakh text corpus
- Supports downstream NLP tasks in Kazakh (see the sketch after this list)
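
One common way to reuse such a pretrained encoder for a downstream task is to load it with a task-specific head and fine-tune it. The sketch below shows sentence-level classification; the Hub ID, label count, and example sentence are placeholder assumptions, not taken from the KazBERT release.

```python
# Sketch of adapting the pretrained encoder to a downstream classification task.
# NOTE: the checkpoint ID and num_labels are hypothetical placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "your-namespace/KazBERT"  # placeholder Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

# Tokenize a Kazakh sentence ("This film is very good.") and score it.
inputs = tokenizer("Бұл фильм өте жақсы.", return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))
```

The classification head added here is randomly initialized, so it would still need fine-tuning on labeled Kazakh data before the predictions are meaningful.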
