Can AI Save Centuries of Kurdish History?

by Web Fonts August 18th, 2025

This study tackles the challenge of digitizing fragile historical Kurdish publications, which current OCR systems fail to process because of damaged pages, non-standard fonts, and a lack of datasets. Using Google’s open-source Tesseract 5.0, the researchers built a custom dataset of over 1,200 annotated text lines from pre-1950 Kurdish documents provided by the Zheen Center. The adapted Arabic model reached a character recognition accuracy of 84%, and the team developed a user-friendly web app for text extraction. The project highlights the need for larger public datasets and further technical work to preserve low-resource languages such as Kurdish.
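The summary reports 84% character recognition but does not say how that figure was computed. A common convention in OCR evaluation is character accuracy = 1 − CER, where the character error rate (CER) is the Levenshtein edit distance between the OCR output and the ground-truth line, divided by the length of the ground truth. The sketch below assumes that definition; the function names are hypothetical and not taken from the study.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions
    needed to turn `hyp` into `ref` (classic dynamic-programming version)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, rc in enumerate(ref, start=1):
        curr = [i]
        for j, hc in enumerate(hyp, start=1):
            cost = 0 if rc == hc else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_accuracy(ref: str, hyp: str) -> float:
    """Hypothetical helper: 1 - CER for one line, clamped at 0 because
    CER can exceed 1 when the OCR output is much longer than the reference."""
    if not ref:
        return 1.0 if not hyp else 0.0
    return max(0.0, 1.0 - levenshtein(ref, hyp) / len(ref))


def corpus_character_accuracy(pairs) -> float:
    """Hypothetical helper: pool edits and reference characters over many
    (ground_truth, ocr_output) line pairs before dividing, so long lines
    weigh more than short ones."""
    total_edits = sum(levenshtein(r, h) for r, h in pairs)
    total_chars = sum(len(r) for r, _ in pairs)
    if total_chars == 0:
        return 1.0
    return max(0.0, 1.0 - total_edits / total_chars)


if __name__ == "__main__":
    # One dropped letter in an 8-character Kurdish word gives 1 - 1/8.
    print(f"{character_accuracy('کوردستان', 'کوردسان'):.3f}")  # prints 0.875
```

Pooling over the whole test set (rather than averaging per-line scores) is one reasonable choice for a dataset of many short annotated lines; the study may have aggregated differently.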

Table of Links

Abstract and 1. Introduction

1.1 Printing Press in Iraq and Iraqi Kurdistan

1.2 Challenges in Historical Documents

1.3 Kurdish Language

Abstract

Kurdish libraries have many historical publications that were printed back in the early days when printing devices were brought to Kurdistan. Having a good ...

Copyright of this story solely belongs to hackernoon.com.
