Historical Document Processing: Bridging the Gap between Computer Scientists and Humanities Scholars

A special issue of Journal of Imaging (ISSN 2313-433X). This special issue belongs to the section "Document Analysis and Processing".

Deadline for manuscript submissions: closed (31 August 2022) | Viewed by 10457

Special Issue Editors


Guest Editor
Department of Electrical and Information Engineering and Applied Mathematics, University of Salerno, 84084 Fisciano, Italy
Interests: neurocomputational models of handwriting learning and execution; handwriting analysis and recognition; neural networks and evolutionary computation; historical document processing

Guest Editor
Department of Electrical and Information Engineering, University of Cassino and Southern Lazio, 03043 Cassino, FR, Italy
Interests: evolutionary computation; machine learning; feature selection; pattern recognition; Bayesian networks; cultural heritage

Special Issue Information

Dear Colleagues,

Historical documents remain an unsurpassed source of information for tracking changes in human societies over past centuries. Inventory lists, accounting records, census registries, notary archives, meeting notes, treaties and essays are just a few of the diverse types of documents produced in the past, from which scholars in the human sciences and economics have extracted information to draw a picture of the cultural evolution of humankind.

Research in artificial intelligence, deep learning and document image analysis has provided methodologies and tools for the storage, access and retrieval of historical documents, but open issues still call for better solutions. Moreover, researchers in those fields evaluate the quality of their solutions mostly on technical grounds, and only occasionally take the perspective of humanities scholars into account when designing those solutions and evaluating their performance.

This Special Issue aims to collect the most recent advances in the field of historical document processing that address the issues outlined above. We request contributions presenting techniques (methods, tools, performance evaluation and analysis) and empirical studies that will contribute to the future roadmap of historical document processing and provide guidelines and recommendations for their adoption by humanities scholars.

Prof. Dr. Angelo Marcelli
Prof. Dr. Francesco Fontanella
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Journal of Imaging is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • document digital image acquisition and storage
  • preprocessing techniques
  • layout analysis
  • handwritten text recognition
  • keyword spotting
  • performance evaluation
  • human–computer interaction
  • user-centered design and evaluation

Published Papers (5 papers)

Research

19 pages, 21882 KiB  
Article
End-to-End Transcript Alignment of 17th Century Manuscripts: The Case of Moccia Code
by Giuseppe De Gregorio, Giuliana Capriolo and Angelo Marcelli
J. Imaging 2023, 9(1), 17; https://doi.org/10.3390/jimaging9010017 - 13 Jan 2023
Cited by 1 | Viewed by 1377
Abstract
The growth of digital libraries has yielded a large number of handwritten historical documents in the form of images, often accompanied by a digital transcription of the content. The ability to track the position of the words of the digital transcription in the images can be important both for the study of the document by humanities scholars and for further automatic processing. We propose a learning-free method for automatically aligning the transcription to the document image. The method receives as input the digital image of the document and the transcription of its content and aims at linking the transcription to the corresponding images within the page at the word level. The method comprises two main original contributions: a line-level segmentation algorithm capable of detecting text lines with curved baselines, and a text-to-image alignment algorithm capable of dealing with under- and over-segmentation errors at the word level. Experiments on pages from a 17th-century Italian manuscript have demonstrated that the line segmentation method segments 92% of the text lines correctly and that the method achieves an alignment accuracy greater than 68%. Moreover, the performance achieved on widely used data sets compares favourably with the state of the art.

18 pages, 55042 KiB  
Article
CorDeep and the Sacrobosco Dataset: Detection of Visual Elements in Historical Documents
by Jochen Büttner, Julius Martinetz, Hassan El-Hajj and Matteo Valleriani
J. Imaging 2022, 8(10), 285; https://doi.org/10.3390/jimaging8100285 - 15 Oct 2022
Cited by 4 | Viewed by 2706
Abstract
Recent advances in object detection facilitated by deep learning have led to numerous solutions in a myriad of fields ranging from medical diagnosis to autonomous driving. However, historical research is yet to reap the benefits of such advances. This is generally due to the low number of large, coherent, and annotated datasets of historical documents, as well as the overwhelming focus on Optical Character Recognition to support the analysis of historical documents. In this paper, we highlight the importance of visual elements, in particular illustrations in historical documents, and offer a public multi-class historical visual element dataset based on the Sphaera corpus. Additionally, we train an image extraction model based on the YOLO architecture and publish it through a publicly available web service to detect and extract multi-class images from historical documents in an effort to bridge the gap between traditional and computational approaches in historical studies.

17 pages, 13100 KiB  
Article
Using Paper Texture for Choosing a Suitable Algorithm for Scanned Document Image Binarization
by Rafael Dueire Lins, Rodrigo Bernardino, Ricardo da Silva Barboza and Raimundo Correa De Oliveira
J. Imaging 2022, 8(10), 272; https://doi.org/10.3390/jimaging8100272 - 5 Oct 2022
Cited by 5 | Viewed by 1540
Abstract
The intrinsic features of documents, such as paper color, texture, aging, translucency, and the kind of printing, typing or handwriting, are important with regard to how to process and enhance their image. Image binarization is the process of producing a monochromatic image from its color version. It is a key step in the document processing pipeline. The recent Quality-Time Binarization Competitions for documents have shown that no single binarization algorithm is good for every kind of document image. This paper uses a sample of the texture of the scanned historical documents as the main document feature to select which of 63 widely used algorithms, each applied to five different versions of the input images for a total of 315 document image-binarization schemes, provides a reasonable quality-time trade-off.

11 pages, 1453 KiB  
Article
X-ray Dark-Field Imaging for Improved Contrast in Historical Handwritten Literature
by Bernhard Akstaller, Stephan Schreiner, Lisa Dietrich, Constantin Rauch, Max Schuster, Veronika Ludwig, Christina Hofmann-Randall, Thilo Michel, Gisela Anton and Stefan Funk
J. Imaging 2022, 8(9), 226; https://doi.org/10.3390/jimaging8090226 - 24 Aug 2022
Viewed by 1784
Abstract
If ancient documents are too fragile to be opened, X-ray imaging can be used to recover the content non-destructively. As an extension to conventional attenuation imaging, dark-field imaging provides access to microscopic structural object information, which can be especially advantageous for materials with weak attenuation contrast, such as certain metal-free inks in paper. With cotton paper and different self-made inks based on authentic recipes, we produced test samples for attenuation and dark-field imaging at a metal-jet X-ray source. The resulting images show letters written in metal-free ink that were recovered via grating-based dark-field imaging. Without the need for synchrotron-like beam quality, these results set the ground for a mobile dark-field imaging setup that could be brought to a library for document scanning, avoiding long transport routes for valuable historic documents.

12 pages, 5397 KiB  
Article
Hierarchical Fusion Using Subsets of Multi-Features for Historical Arabic Manuscript Dating
by Kalthoum Adam, Somaya Al-Maadeed and Younes Akbari
J. Imaging 2022, 8(3), 60; https://doi.org/10.3390/jimaging8030060 - 1 Mar 2022
Cited by 4 | Viewed by 2320
Abstract
Automatic dating tools for historical documents can greatly assist paleographers and save them time and effort. This paper describes a novel method for estimating the date of historical Arabic documents that employs hierarchical fusions of multiple features. A set of traditional features and features extracted by a residual network (ResNet) are fused in a hierarchical approach using joint sparse representation. To address noise during the fusion process, a new approach based on subsets of multiple features is considered. Supervised and unsupervised classifiers are then used for classification. We show that hierarchical fusion based on subsets of multiple features produces promising, significantly improved results on the KERTAS dataset.