Machine Learning and Data Mining for Digital Scholarly Editions
University of Rostock, 9-10 June 2022
In several areas of the Digital Humanities, Data Mining and Machine Learning techniques are increasingly applied and discussed, for example to process and extract information from digital images that represent humanistic sources, or to analyze full texts that are relevant to the Humanities and have either been extracted from images or are born digital. The goal of both Data Mining and Machine Learning is to develop and apply effective and intelligent methods to detect, extract and structure information from large amounts of data that could not reasonably be processed with other methods (Alpaydin 2014, Han and Kamber 2012).
As a subfield of Digital Humanities, Digital Scholarly Editing is no exception to this trend. Data Mining and Machine Learning methods have been used for several tasks in the Digital Scholarly Editing workflow, for example to prepare the transcription and scholarly description of texts by recognizing and classifying text from image data (Boenig et al. 2016, Kestemont et al. 2017, Reul et al. 2019, Brusuelas 2021), to automatically compare text witnesses and reconstruct their history (Nassourou 2013, Hoenen 2018), or to enrich edited texts with information about mentioned entities, topics, or other content- and discourse-related phenomena (Koncar et al. 2020, Haeder 2020).
However, compared to other Digital Humanities subfields such as Computational Linguistics or Computational Literary Studies, these methods have not yet reached the same widespread use in Digital Scholarly Editing, nor are they discussed there as intensely and fundamentally. There may be several reasons for this. Digital Scholarly Editions, which can be defined as “the critical representation of historic documents” in “editions that are guided by a digital paradigm in their theory, method and practice” (Sahle 2016), have particularly high demands regarding the accuracy of transcription and annotation. At the same time, the focus is often on historic and handwritten texts, which makes computational treatment more challenging. Also, the amount and extent of edited materials are often not as large as in the analysis of other kinds of text or image collections.
The main purpose of this conference is to foster the discussion on Machine Learning and Data Mining techniques in the area of Digital Scholarly Editing. The following questions can be addressed:
- Where can Machine Learning and Data Mining be usefully and meaningfully applied in a Digital Scholarly Editing workflow?
- How are Machine Learning and Data Mining already used for the creation of Digital Scholarly Editions and what are potential use cases for the future?
- Which challenges in Digital Scholarly Editing can be successfully addressed by using Machine Learning and Data Mining?
- Do editions pose special challenges to the application of Machine Learning and Data Mining that need to be overcome?
- What biases or side effects arise when applying Machine Learning and Data Mining methods to historical data/texts?
- How does the use of Machine Learning and Data Mining change the way editors work and the way editions are created? Does it change the role of the editor? How does it change the methods of editing?
- How does Digital Scholarly Editing relate to other Digital Humanities subfields regarding the application of Machine Learning and Data Mining?
- How can a critical engagement with Machine Learning and Data Mining techniques in Digital Scholarly Editing be developed and encouraged?
We are interested in a wide range of topics where Machine Learning and Data Mining can be used in the Digital Scholarly Editing workflow, for example pattern recognition in image analysis, OCR, NLP (tokenization, lemmatization, part-of-speech tagging, NER), topic modeling, sentiment analysis, clustering and classification tasks which prepare transcription, interpretation, text constitution, annotation, and commentary. We encourage proposals that go beyond the presentation of specific research projects towards more general reflections about Machine Learning and Data Mining for Digital Scholarly Editions.
Papers of 4,000 to 6,000 words (not counting the bibliographic references) should be submitted to email@example.com in .odt or .docx format by 10 February 2022. We only accept papers in English. Please see the submission guidelines. The proposals will be carefully reviewed by the scientific committee, and authors will be notified about acceptance in April 2022.
The conference will be held on 9 and 10 June 2022. After the conference, papers can be revised. The final version should be submitted for publication by 31 August 2022. The proceedings of the conference will be published Open Access in the IDE’s book series SIDE.
Authors whose proposals are based on research data are encouraged to also publish the data sets, for example on GitHub. A Zenodo community will be created to bundle and archive the data sets related to the conference.
To learn more: https://www.i-d-e.de/call-for-papers-ml-dse/