Optical character recognition for historical scripts like Old Turkic runiform script poses significant challenges due to the need for abundant annotated data and varying writing styles, materials, and degradations. The paper proposes a novel data synthesis pipeline that augments parametric generation with 3D rendering to build realistic and diverse training data for Old Turkic runiform script grapheme classification. Our approach synthesizes distance field variations of graphemes, applies parametric randomization, and renders them in simulated 3D scenes with varying textures, lighting, and environments. We train a Vision Transformer model on the synthesized data and evaluate its performance on the Kül Tegin inscription photographs. Experimental results demonstrate the effectiveness of our approach, with the model achieving high accuracy without seeing any real-world data during training. We finally discuss avenues for future research. Our work provides a promising direction to overcome data scarcity in Old Turkic runiform script.
optical character recognition Old Turkic runiform script data synthesis
Optical character recognition for historical scripts like Old Turkic runiform script poses significant challenges due to the need for abundant annotated data and varying writing styles, materials, and degradations. The paper proposes a novel data synthesis pipeline that augments parametric generation with 3D rendering to build realistic and diverse training data for Old Turkic runiform script grapheme classification. Our approach synthesizes distance field variations of graphemes, applies parametric randomization, and renders them in simulated 3D scenes with varying textures, lighting, and environments. We train a Vision Transformer model on the synthesized data and evaluate its performance on the Kül Tegin inscription photographs. Experimental results demonstrate the effectiveness of our approach, with the model achieving high accuracy without seeing any real-world data during training. We finally discuss avenues for future research. Our work provides a promising direction to overcome data scarcity in Old Turkic runiform script.
optical character recognition Old Turkic runiform script data synthesis
Birincil Dil | Türkçe |
---|---|
Konular | Dilbilim (Diğer) |
Bölüm | Articles |
Yazarlar | |
Yayımlanma Tarihi | 21 Temmuz 2024 |
Gönderilme Tarihi | 15 Haziran 2024 |
Kabul Tarihi | 14 Temmuz 2024 |
Yayımlandığı Sayı | Yıl 2024 Cilt: 8 Sayı: 2 |