Can AI Save Centuries of Kurdish History?
hackernoon.com
This study tackles the challenge of digitizing fragile historical Kurdish publications, which current OCR systems fail to process due to damaged pages, non-standard fonts, and lack of datasets. Using Google’s open-source Tesseract 5.0, researchers built a custom dataset of over 1,200 annotated lines from pre-1950 Kurdish documents provided by the Zheen Center. The adapted Arabic model achieved promising accuracy (84% character recognition), and a user-friendly web app was developed for text extraction. The project highlights the need for larger public datasets and technical innovation to preserve low-resource languages like Kurdish.
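The summary does not say how the 84% character recognition figure was computed. One common convention for OCR evaluation is character accuracy defined as 1 minus the character error rate (CER), where CER is the edit distance between the OCR output and the ground-truth line divided by the length of the ground truth. The sketch below illustrates that convention only; the function names `levenshtein` and `character_accuracy` are hypothetical and are not taken from the study.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where each insertion, deletion, or substitution costs 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Assumed metric: 1 - CER, clamped at 0. Not confirmed by the source."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    cer = levenshtein(reference, hypothesis) / len(reference)
    return max(0.0, 1.0 - cer)
```

Under this definition, a 50-character ground-truth line whose OCR output needs 8 character edits would score 1 - 8/50 = 0.84. These numbers are illustrative arithmetic, not data from the study.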
Table of Links
1.1 Printing Press in Iraq and Iraqi Kurdistan
1.2 Challenges in Historical Documents
Abstract
Kurdish libraries hold many historical publications that were printed in the early days when printing devices were brought to Kurdistan. Having a good ...
Copyright of this story solely belongs to hackernoon.com.