Search Results (48)

Search Parameters:
Keywords = handwritten text recognition

15 pages, 1506 KB  
Proceeding Paper
Artificial Intelligence for Historical Manuscripts Digitization: Leveraging the Lexicon of Cyril
by Stavros N. Moutsis, Despoina Ioakeimidou, Konstantinos A. Tsintotas, Konstantinos Evangelidis, Panagiotis E. Nastou and Antonis Tsolomitis
Eng. Proc. 2025, 107(1), 8; https://doi.org/10.3390/engproc2025107008 - 21 Aug 2025
Viewed by 271
Abstract
Artificial intelligence (AI) is a cutting-edge and revolutionary technology in computer science that has the potential to completely transform a wide range of disciplines, including the social sciences, the arts, and the humanities. Therefore, since its significance has been recognized in engineering and medicine, history, literature, paleography, and archaeology have recently embraced AI as new opportunities have arisen for preserving ancient manuscripts. Acknowledging the importance of digitizing archival documents, this paper explores the use of advanced technologies during this process, showing how these are employed at each stage and how the unique challenges inherent in past scripts are addressed. Our study is based on Cyril’s Lexicon, a Byzantine-era dictionary of great historical and linguistic significance in Greek territory.

25 pages, 30383 KB  
Article
Multimodal Handwritten Exam Text Recognition Based on Deep Learning
by Hua Shi, Zhenhui Zhu, Chenxue Zhang, Xiaozhou Feng and Yonghang Wang
Appl. Sci. 2025, 15(16), 8881; https://doi.org/10.3390/app15168881 - 12 Aug 2025
Viewed by 362
Abstract
To address the complex challenge of recognizing mixed handwritten text in practical scenarios such as examination papers and to overcome the limitations of existing methods that typically focus on a single category, this paper proposes MHTR, a Multimodal Handwritten Text Adaptive Recognition algorithm. The framework comprises two key components, a Handwritten Character Classification Module and a Handwritten Text Adaptive Recognition Module, which work in conjunction. The classification module performs fine-grained analysis of the input image, identifying different types of handwritten content such as Chinese characters, digits, and mathematical formulas. Based on these results, the recognition module dynamically selects specialized sub-networks tailored to each category, thereby enhancing recognition accuracy. To further reduce errors caused by similar character shapes and diverse handwriting styles, a Context-aware Recognition Optimization Module is introduced. This module captures local semantic and structural information, improving the model’s understanding of character sequences and boosting recognition performance. Recognizing the limitations of existing public handwriting datasets, particularly their lack of diversity in character categories and writing styles, this study constructs a heterogeneous, integrated handwritten text dataset. The dataset combines samples from multiple sources, including Chinese characters, numerals, and mathematical symbols, and features high structural complexity and stylistic variation to better reflect real-world application needs. Experimental results show that MHTR achieves a recognition accuracy of 86.63% on the constructed dataset, significantly outperforming existing methods. Furthermore, the context-aware optimization module demonstrates strong adaptive correction capabilities in various misrecognition scenarios, confirming the effectiveness and practicality of the proposed approach for complex, multi-category handwritten text recognition tasks.

15 pages, 2124 KB  
Article
Toward Building a Domain-Based Dataset for Arabic Handwritten Text Recognition
by Khawlah Alhefdhi, Abdulmalik Alsalman and Safi Faizullah
Electronics 2025, 14(12), 2461; https://doi.org/10.3390/electronics14122461 - 17 Jun 2025
Viewed by 852
Abstract
The problem of automatic recognition of handwritten text has recently been widely discussed in the research community. Handwritten text recognition is considered a challenging task for cursive scripts, such as Arabic-language scripts, due to their complex properties. Although the demand for automatic text recognition is growing, especially to assist in digitizing archival documents, limited datasets are available for Arabic handwritten text compared to other languages. In this paper, we present novel work on building the Real Estate and Judicial Documents dataset (REJD dataset), which aims to facilitate the recognition of Arabic text in millions of archived documents. This paper also discusses the use of Optical Character Recognition and deep learning techniques, aiming to serve as the initial version in a series of experiments and enhancements designed to achieve optimal results.

18 pages, 18456 KB  
Article
iForal: Automated Handwritten Text Transcription for Historical Medieval Manuscripts
by Alexandre Matos, Pedro Almeida, Paulo L. Correia and Osvaldo Pacheco
J. Imaging 2025, 11(2), 36; https://doi.org/10.3390/jimaging11020036 - 25 Jan 2025
Cited by 1 | Viewed by 2057
Abstract
The transcription of historical manuscripts aims at making our cultural heritage more accessible to experts and also to the larger public, but it is a challenging and time-intensive task. This paper contributes an automated solution for text layout recognition, segmentation, and recognition to speed up the transcription process of historical manuscripts. The focus is on transcribing Portuguese municipal documents from the Middle Ages in the context of the iForal project, including the contribution of an annotated dataset containing Portuguese medieval documents, notably a corpus of 67 Portuguese royal charters. The proposed system can accurately identify document layouts, isolate the text, segment, and transcribe it. Results for the layout recognition model achieved 0.98 mAP@0.50 and 0.98 precision, while the text segmentation model achieved 0.91 mAP@0.50, detecting 95% of the lines. The text recognition model achieved 8.1% character error rate (CER) and 25.5% word error rate (WER) on the test set. These results can then be validated by palaeographers with less effort, contributing to achieving high-quality transcriptions faster. Moreover, the automatic models developed can be utilized as a basis for the creation of models that perform well for other historical handwriting styles, notably using transfer learning techniques. The contributed dataset has been made available on the HTR United catalogue, which includes training datasets to be used for automatic transcription or segmentation models. The models developed can be used, for instance, on the eScriptorium platform, which is used by a vast community of experts.
(This article belongs to the Section Document Analysis and Processing)

18 pages, 4420 KB  
Article
Machine Learning Approach for Arabic Handwritten Recognition
by A. M. Mutawa, Mohammad Y. Allaho and Monirah Al-Hajeri
Appl. Sci. 2024, 14(19), 9020; https://doi.org/10.3390/app14199020 - 6 Oct 2024
Cited by 3 | Viewed by 4136
Abstract
Text recognition is an important area of the pattern recognition field. Natural language processing (NLP) and pattern recognition have been utilized efficiently in script recognition. Much research has been conducted on handwritten script recognition. However, the research on the Arabic language for handwritten text recognition received little attention compared with other languages. Therefore, it is crucial to develop a new model that can recognize Arabic handwritten text. Most of the existing models used to recognize Arabic text are based on traditional machine learning techniques. Therefore, we implemented a new model using deep machine learning techniques by integrating two deep neural networks. In the new model, the architecture of the Residual Network (ResNet) model is used to extract features from raw images. Then, the Bidirectional Long Short-Term Memory (BiLSTM) and connectionist temporal classification (CTC) are used for sequence modeling. Our system improved the recognition rate of Arabic handwritten text compared to other models of a similar type with a character error rate of 13.2% and word error rate of 27.31%. In conclusion, the domain of Arabic handwritten recognition is advancing swiftly with the use of sophisticated deep learning methods.
(This article belongs to the Special Issue Applied Intelligence in Natural Language Processing)
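The ResNet–BiLSTM pipeline above ends in CTC, whose decoding step collapses a per-frame labeling into the final text by merging consecutive repeats and then dropping blanks. A minimal sketch of greedy CTC decoding, with a hypothetical blank symbol (the paper's exact decoding strategy is not stated in the abstract):

```python
from itertools import groupby

BLANK = "-"  # assumed stand-in for the CTC blank label

def ctc_greedy_decode(frame_labels):
    """Collapse a best-path per-frame labeling: merge consecutive
    repeats, then remove blank symbols."""
    collapsed = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in collapsed if label != BLANK)

# Fourteen frames of per-frame argmax labels collapse to "hello";
# the blank between the two l-runs keeps the double letter.
print(ctc_greedy_decode(list("--hh-e-ll-lo--")))  # prints "hello"
```

Note how "ll-l" decodes to a double letter while "ll" alone would not: the blank is what lets CTC emit repeated characters.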

19 pages, 3640 KB  
Article
Recognition of Chinese Electronic Medical Records for Rehabilitation Robots: Information Fusion Classification Strategy
by Jiawei Chu, Xiu Kan, Yan Che, Wanqing Song, Kudreyko Aleksey and Zhengyuan Dong
Sensors 2024, 24(17), 5624; https://doi.org/10.3390/s24175624 - 30 Aug 2024
Viewed by 1874
Abstract
Named entity recognition is a critical task in the electronic medical record management system for rehabilitation robots. Handwritten documents often contain spelling errors and illegible handwriting, and healthcare professionals frequently use different terminologies. These issues adversely affect the robot’s judgment and precise operations. Additionally, the same entity can have different meanings in various contexts, leading to category inconsistencies, which further increase the system’s complexity. To address these challenges, a novel medical entity recognition algorithm for Chinese electronic medical records is developed to enhance the processing and understanding capabilities of rehabilitation robots for patient data. This algorithm is based on a fusion classification strategy. Specifically, a preprocessing strategy is proposed according to clinical medical knowledge, which includes redefining entities, removing outliers, and eliminating invalid characters. Subsequently, a medical entity recognition model is developed to identify Chinese electronic medical records, thereby enhancing the data analysis capabilities of rehabilitation robots. To extract semantic information, the ALBERT network is utilized, and BILSTM and MHA networks are combined to capture the dependency relationships between words, overcoming the problem of different meanings for the same entity in different contexts. The CRF network is employed to determine the boundaries of different entities. The research results indicate that the proposed model significantly enhances the recognition accuracy of electronic medical texts by rehabilitation robots, particularly in accurately identifying entities and handling terminology diversity and contextual differences. This model effectively addresses the key challenges faced by rehabilitation robots in processing Chinese electronic medical texts, and holds important theoretical and practical value.
(This article belongs to the Special Issue Dynamics and Control System Design for Robot Manipulation)

16 pages, 5331 KB  
Article
A Gateway API-Based Data Fusion Architecture for Automated User Interaction with Historical Handwritten Manuscripts
by Christos Spandonidis, Fotis Giannopoulos and Kyriakoula Arvaniti
Heritage 2024, 7(9), 4631-4646; https://doi.org/10.3390/heritage7090218 - 27 Aug 2024
Viewed by 1287
Abstract
To preserve handwritten historical documents, libraries are choosing to digitize them, ensuring their longevity and accessibility. However, the true value of these digitized images lies in their transcription into a textual format. In recent years, various tools have been developed utilizing both traditional and AI-based models to address the challenges of deciphering handwritten texts. Despite their importance, there are still several obstacles to overcome, such as the need for scalable and modular solutions, as well as the ability to cater to a continuously growing user community autonomously. This study focuses on introducing a new information fusion architecture, specifically highlighting the Gateway API. Developed as part of the μDoc.tS research program, this architecture aims to convert digital images of manuscripts into electronic text, ensuring secure and efficient routing of requests from front-end applications to the back end of the information system. The validation of this architecture demonstrates its efficiency in handling a large volume of requests and effectively distributing the workload. One significant advantage of this proposed method is its compatibility with everyday devices, eliminating the need for extensive computational infrastructures. It is believed that the scalability and modularity of this architecture can pave the way for a unified multi-platform solution, connecting diverse user environments and databases.

26 pages, 12966 KB  
Article
Optical Medieval Music Recognition—A Complete Pipeline for Historic Chants
by Alexander Hartelt, Tim Eipert and Frank Puppe
Appl. Sci. 2024, 14(16), 7355; https://doi.org/10.3390/app14167355 - 20 Aug 2024
Cited by 3 | Viewed by 1493
Abstract
Manual transcription of music is tedious work, which can be greatly facilitated by optical music recognition (OMR) software. However, OMR software is error-prone, in particular for older handwritten documents. This paper introduces and evaluates a pipeline that automates the entire OMR workflow in the context of the Corpus Monodicum project, enabling the transcription of historical chants. In addition to typical OMR tasks such as staff line detection, layout detection, and symbol recognition, the rarely addressed tasks of text and syllable recognition and assignment of syllables to symbols are tackled. For quantitative and qualitative evaluation, we use documents written in square notation developed in the 11th–12th century, but the methods apply to many other notations as well. Quantitative evaluation measures the number of necessary interventions for correction, which are about 0.4% for layout recognition including the division of text in chants, 2.4% for symbol recognition including pitch and reading order, and 2.3% for syllable alignment with correct text and symbols. Qualitative evaluation showed an efficiency gain compared to manual transcription with an elaborate tool by a factor of about 9. In a second use case with printed chants in similar notation from the “Graduale Synopticum”, the evaluation results for symbols are much better, except for syllable alignment, indicating the difficulty of this task.

15 pages, 5521 KB  
Article
A Historical Handwritten French Manuscripts Text Detection Method in Full Pages
by Rui Sang, Shili Zhao, Yan Meng, Mingxian Zhang, Xuefei Li, Huijie Xia and Ran Zhao
Information 2024, 15(8), 483; https://doi.org/10.3390/info15080483 - 14 Aug 2024
Viewed by 1544
Abstract
Historical handwritten manuscripts pose challenges to automated recognition techniques due to their unique handwriting styles and cultural backgrounds. In order to solve the problems of complex text word misdetection, omission, and insufficient detection of wide-pitch curved text, this study proposes a high-precision text detection method based on improved YOLOv8s. Firstly, the Swin Transformer is used to replace C2f at the end of the backbone network to solve the shortcomings of fine-grained information loss and insufficient learning features in text word detection. Secondly, the Dysample (Dynamic Upsampling Operator) method is used to retain more detailed features of the target and overcome the shortcomings of information loss in traditional upsampling to realize the text detection task for dense targets. Then, the LSK (Large Selective Kernel) module is added to the detection head to dynamically adjust the feature extraction receptive field, which solves the cases of extreme aspect ratio words, unfocused small text, and complex shape text in text detection. Finally, in order to overcome the CIOU (Complete Intersection Over Union) loss in target box regression with unclear aspect ratio, insensitivity to size change, and insufficient correlation between target coordinates, Gaussian Wasserstein Distance (GWD) is introduced to modify the regression loss to measure the similarity between the two bounding boxes in order to obtain high-quality bounding boxes. Compared with state-of-the-art methods, the proposed method achieves optimal performance in text detection, with the precision and mAP@0.5 reaching 86.3% and 82.4%, which are 8.1% and 6.7% higher than the original method, respectively. The advancement of each module is verified by ablation experiments. The experimental results show that the method proposed in this study can effectively realize complex text detection and provide a powerful technical means for historical manuscript reproduction.
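GWD-style regression losses model each box as a 2-D Gaussian (center as mean, half-width and half-height as standard deviations) and compare boxes by the 2-Wasserstein distance. For axis-aligned boxes this has a simple closed form; the sketch below uses that common modeling assumption and is not necessarily the exact formulation of this paper:

```python
def gwd2(box1, box2):
    """Squared 2-Wasserstein distance between two axis-aligned boxes
    (cx, cy, w, h), each modeled as the Gaussian
    N((cx, cy), diag(w^2/4, h^2/4))."""
    cx1, cy1, w1, h1 = box1
    cx2, cy2, w2, h2 = box2
    center_term = (cx1 - cx2) ** 2 + (cy1 - cy2) ** 2   # mean distance
    shape_term = ((w1 - w2) ** 2 + (h1 - h2) ** 2) / 4.0  # covariance distance
    return center_term + shape_term

# Identical boxes have zero distance; the distance grows smoothly
# with center offset, unlike IoU, which is zero for disjoint boxes.
print(gwd2((0, 0, 4, 2), (0, 0, 4, 2)))  # prints 0.0
print(gwd2((0, 0, 4, 2), (3, 4, 4, 2)))  # prints 25.0
```

In practice this distance is typically squashed into a bounded loss (e.g. via a log or sigmoid-like mapping) before training, since the raw value is unbounded.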

15 pages, 529 KB  
Article
A Pix2Pix Architecture for Complete Offline Handwritten Text Normalization
by Alvaro Barreiro-Garrido, Victoria Ruiz-Parrado, A. Belen Moreno and Jose F. Velez
Sensors 2024, 24(12), 3892; https://doi.org/10.3390/s24123892 - 16 Jun 2024
Cited by 2 | Viewed by 2061
Abstract
In the realm of offline handwritten text recognition, numerous normalization algorithms have been developed over the years to serve as preprocessing steps prior to applying automatic recognition models to handwritten text scanned images. These algorithms have demonstrated effectiveness in enhancing the overall performance of recognition architectures. However, many of these methods rely heavily on heuristic strategies that are not seamlessly integrated with the recognition architecture itself. This paper introduces the use of a Pix2Pix trainable model, a specific type of conditional generative adversarial network, as the method to normalize handwritten text images. Also, this algorithm can be seamlessly integrated as the initial stage of any deep learning architecture designed for handwritten recognition tasks. All of this facilitates training the normalization and recognition components as a unified whole, while still maintaining some interpretability of each module. Our proposed normalization approach learns from a blend of heuristic transformations applied to text images, aiming to mitigate the impact of intra-personal handwriting variability among different writers. As a result, it achieves slope and slant normalizations, alongside other conventional preprocessing objectives, such as normalizing the size of text ascenders and descenders. We will demonstrate that the proposed architecture replicates, and in certain cases surpasses, the results of a widely used heuristic algorithm across two metrics and when integrated as the first step of a deep recognition architecture.
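The slant normalization that heuristic baselines perform can be written as a horizontal shear mapping each ink coordinate (x, y) to (x − y·tan(θ), y). A minimal coordinate-level sketch of that transform, for intuition only (the paper's model learns such corrections from data rather than applying them explicitly):

```python
import math

def deslant(points, slant_deg):
    """Apply a horizontal shear that removes a uniform slant angle:
    (x, y) -> (x - y * tan(slant), y), with y measured from the baseline."""
    t = math.tan(math.radians(slant_deg))
    return [(x - y * t, y) for x, y in points]

# A stroke slanted at 45 degrees becomes vertical after deslanting.
stroke = deslant([(0, 0), (1, 1), (2, 2)], 45.0)
print([(round(x, 6), y) for x, y in stroke])  # prints [(0.0, 0), (0.0, 1), (0.0, 2)]
```

Slope (baseline rotation) and ascender/descender size normalization are analogous affine or piecewise rescaling steps applied before recognition.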

18 pages, 24722 KB  
Article
Historical Text Line Segmentation Using Deep Learning Algorithms: Mask-RCNN against U-Net Networks
by Florian Côme Fizaine, Patrick Bard, Michel Paindavoine, Cécile Robin, Edouard Bouyé, Raphaël Lefèvre and Annie Vinter
J. Imaging 2024, 10(3), 65; https://doi.org/10.3390/jimaging10030065 - 5 Mar 2024
Cited by 6 | Viewed by 4145
Abstract
Text line segmentation is a necessary preliminary step before most text transcription algorithms are applied. The leading deep learning networks used in this context (ARU-Net, dhSegment, and Doc-UFCN) are based on the U-Net architecture. They are efficient, but fall under the same concept, requiring a post-processing step to perform instance (e.g., text line) segmentation. In the present work, we test the advantages of Mask-RCNN, which is designed to perform instance segmentation directly. This work is the first to directly compare Mask-RCNN- and U-Net-based networks on text segmentation of historical documents, showing the superiority of the former over the latter. Three studies were conducted, one comparing these networks on different historical databases, another comparing Mask-RCNN with Doc-UFCN on a private historical database, and a third comparing the handwritten text recognition (HTR) performance of the tested networks. The results showed that Mask-RCNN outperformed ARU-Net, dhSegment, and Doc-UFCN using relevant line segmentation metrics, that performance evaluation should not focus on the raw masks generated by the networks, that a light mask processing is an efficient and simple solution to improve evaluation, and that Mask-RCNN leads to better HTR performance.
(This article belongs to the Section Document Analysis and Processing)

18 pages, 3564 KB  
Article
Offline Mongolian Handwriting Recognition Based on Data Augmentation and Improved ECA-Net
by Qing-Dao-Er-Ji Ren, Lele Wang, Zerui Ma and Saheya Barintag
Electronics 2024, 13(5), 835; https://doi.org/10.3390/electronics13050835 - 21 Feb 2024
Cited by 3 | Viewed by 1711
Abstract
Writing is an important carrier of cultural inheritance, and the digitization of handwritten texts is an effective means to protect national culture. Compared to Chinese and English handwriting recognition, the research on Mongolian handwriting recognition started relatively late and achieved few results due to the characteristics of the script itself and the lack of corpus. First, according to the characteristics of Mongolian handwritten characters, the random erasing data augmentation algorithm was modified, and a dual data augmentation (DDA) algorithm was proposed by combining the improved algorithm with horizontal wave transformation (HWT) to augment the dataset for training the Mongolian handwriting recognition. Second, the classical CRNN handwriting recognition model was improved. The structure of the encoder and decoder was adjusted according to the characteristics of the Mongolian script, and the attention mechanism was introduced in the feature extraction and decoding stages of the model. An improved handwriting recognition model, named the EGA model, suitable for the features of Mongolian handwriting was suggested. Finally, the effectiveness of the EGA model was verified by a large number of data tests. Experimental results demonstrated that the proposed EGA model improves the recognition accuracy of Mongolian handwriting, and the structural modification of the encoder and decoder effectively balances the recognition accuracy and complexity of the model.
(This article belongs to the Special Issue Deep Learning in Image Processing and Pattern Recognition)

30 pages, 3035 KB  
Review
Advancements and Challenges in Handwritten Text Recognition: A Comprehensive Survey
by Wissam AlKendi, Franck Gechter, Laurent Heyberger and Christophe Guyeux
J. Imaging 2024, 10(1), 18; https://doi.org/10.3390/jimaging10010018 - 8 Jan 2024
Cited by 24 | Viewed by 13082
Abstract
Handwritten Text Recognition (HTR) is essential for digitizing historical documents in different kinds of archives. In this study, we introduce a hybrid form archive written in French: the Belfort civil registers of births. The digitization of these historical documents is challenging due to their unique characteristics such as writing style variations, overlapped characters and words, and marginal annotations. The objective of this survey paper is to summarize research on handwritten text documents and provide research directions toward effectively transcribing this French dataset. To achieve this goal, we presented a brief survey of several modern and historical HTR offline systems of different international languages, and the top state-of-the-art contributions reported for the French language specifically. The survey classifies the HTR systems based on techniques employed, datasets used, publication years, and the level of recognition. Furthermore, an analysis of the systems’ accuracies is presented, highlighting the best-performing approach. We have also showcased the performance of some HTR commercial systems. In addition, this paper presents a summarization of the HTR datasets that are publicly available, especially those identified as benchmark datasets in the International Conference on Document Analysis and Recognition (ICDAR) and the International Conference on Frontiers in Handwriting Recognition (ICFHR) competitions. This paper, therefore, presents updated state-of-the-art research in HTR and highlights new directions in the research field.
(This article belongs to the Section Computer Vision and Pattern Recognition)

15 pages, 2015 KB  
Article
Gated Convolution and Stacked Self-Attention Encoder–Decoder-Based Model for Offline Handwritten Ethiopic Text Recognition
by Direselign Addis Tadesse, Chuan-Ming Liu and Van-Dai Ta
Information 2023, 14(12), 654; https://doi.org/10.3390/info14120654 - 9 Dec 2023
Cited by 1 | Viewed by 2421
Abstract
Offline handwritten text recognition (HTR) is a long-standing research problem with a wide range of applications, including assisting visually impaired users, human–robot interaction, and the automatic entry of business documents. However, due to variations in writing styles, visual similarities between different characters, overlap between characters, and source document noise, designing an accurate and flexible HTR system is challenging. The problem becomes serious when the algorithm has a low learning capacity and when the text used is complex and has a lot of characters in the writing system, such as Ethiopic script. In this paper, we propose a new model that recognizes offline handwritten Ethiopic text using a gated convolution and stacked self-attention encoder–decoder network. The proposed model has a feature extraction layer, an encoder layer, and a decoder layer. The feature extraction layer extracts high-dimensional invariant feature maps from the input handwritten image. Using the extracted feature maps, the encoder and decoder layers transcribe the corresponding text. For the training and testing of the proposed model, we prepare an offline handwritten Ethiopic text-line dataset (HETD) with 2800 samples and a handwritten Ethiopic word dataset (HEWD) with 10,540 samples obtained from 250 volunteers. The experiment results of the proposed model on HETD show a 9.17 Character Error Rate (CER) and a 13.11 Word Error Rate (WER). On HEWD, the model shows an 8.22 CER and a 9.17 WER. These results and the prepared datasets will be used as a baseline for future research.
(This article belongs to the Special Issue Intelligent Information Technology)

13 pages, 1081 KB  
Article
Experimenting with Training a Neural Network in Transkribus to Recognise Text in a Multilingual and Multi-Authored Manuscript Collection
by Carlotta Capurro, Vera Provatorova and Evangelos Kanoulas
Heritage 2023, 6(12), 7482-7494; https://doi.org/10.3390/heritage6120392 - 29 Nov 2023
Cited by 7 | Viewed by 3358
Abstract
This work aims at developing an optimal strategy to automatically transcribe a large quantity of uncategorised, digitised archival documents when resources include handwritten text by multiple authors and in several languages. We present a comparative study to establish the efficiency of a single multilingual handwritten text recognition (HTR) model trained on multiple handwriting styles instead of using a separate model for every language. When successful, this approach allows us to automate the transcription of the archive, reducing manual annotation efforts and facilitating information retrieval. To train the model, we used the material from the personal archive of the Dutch glass artist Sybren Valkema (1916–1996), processing it with Transkribus.
(This article belongs to the Special Issue XR and Artificial Intelligence for Heritage)
