dc.contributor.author: Dolek, Ishak
dc.contributor.author: KURT, ATAKAN
dc.date.accessioned: 2022-07-04T14:36:56Z
dc.date.available: 2022-07-04T14:36:56Z
dc.identifier.citation: Dolek I., KURT A., "A deep learning model for Ottoman OCR", CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2022
dc.identifier.issn: 1532-0626
dc.identifier.other: vv_1032021
dc.identifier.other: av_8292a5b7-6568-472c-a66a-0acd459d8d21
dc.identifier.uri: http://hdl.handle.net/20.500.12627/183522
dc.identifier.uri: https://doi.org/10.1002/cpe.6937
dc.description.abstract: Ottoman OCR remains an open problem because OCR models for Arabic do not perform well on Ottoman, and models trained specifically on Ottoman documents have not produced satisfactory results either. We present a deep learning model, and an OCR tool built on that model, for the OCR of printed Ottoman documents in the naskh font. We propose an end-to-end trainable CRNN architecture consisting of CNN, RNN (LSTM), and CTC layers for the Ottoman OCR problem. An experimental comparison of this model, called , with the Tesseract Arabic, Tesseract Persian, ABBYY FineReader, Miletos, and Google Docs OCR tools or models was performed using a test data set of 21 pages of original documents. With character recognition accuracies of 88.86% on raw text, 96.12% on normalized text, and 97.37% on joined text, the Hybrid model outperforms the others by a marked margin. Our model outperforms the next best model by a clear margin of 4%, which is a significant improvement considering the difficulty of the Ottoman OCR problem and the huge size of the Ottoman archives to be processed. The Hybrid model also achieves 58% word recognition accuracy on normalized text, which is the only rate above 50%.
dc.language.iso: eng
dc.subject: Engineering and Technology
dc.subject: COMPUTER SCIENCE, SOFTWARE ENGINEERING
dc.subject: Computer Science
dc.subject: Engineering, Computing and Technology (ENG)
dc.subject: COMPUTER SCIENCE, THEORY AND METHODS
dc.subject: Computer Sciences
dc.subject: Bioinformatics
dc.subject: Databases and Data Structures
dc.subject: Theoretical Computer Science
dc.subject: Software
dc.subject: General Computer Science
dc.subject: Computer Science (miscellaneous)
dc.subject: Computer Science Applications
dc.subject: Physical Sciences
dc.title: A deep learning model for Ottoman OCR
dc.type: Article
dc.relation.journal: CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE
dc.contributor.department: İstanbul Üniversitesi-Cerrahpaşa
dc.contributor.firstauthorID: 3407190

