This entry presents a comprehensive overview of the computational study of Old English, surveying its evolution from early digital corpora to recent artificial intelligence applications. Five interconnected domains are examined: textual resources (including the Helsinki Corpus, the Dictionary of Old English Corpus, and the York-Toronto-Helsinki Parsed Corpus), lexicographical resources (from Bosworth–Toller to the Dictionary of Old English), corpus lemmatisation (covering both prose and poetic texts), treebanks (particularly Universal Dependencies frameworks), and artificial intelligence applications. The entry shows that computational methodologies have transformed Old English studies by facilitating large-scale analyses of morphology, syntax, and semantics that were previously impossible with traditional philological methods. Recent innovations are highlighted, including the development of lexical databases like Nerthusv5, dependency parsing methods, and the application of transformer models and NLP libraries to historical language processing. In spite of these advances, problems persist, including limited corpus size, orthographic inconsistency, and methodological difficulties in applying modern computational techniques to historical languages. The entry concludes that the future of computational Old English studies lies in the integration of AI capabilities with traditional philological expertise, an approach that enhances traditional scholarship and opens new avenues for understanding Anglo-Saxon language and culture.
This entry provides a comprehensive overview of computational linguistic approaches to the study of Old English. The computational study of Old English applies digital methodologies to analyse the earliest diachronic stage of the English language (c. 600–1150 CE). The term "computational" refers here to digital methods, not to the medieval computus tradition. This interdisciplinary field integrates philology, corpus linguistics, and artificial intelligence in order to carry out large-scale, data-driven studies of Old English texts. By digitising manuscripts, annotating syntactic structures, and training machine learning models, scholars of Old English can capture synchronic and diachronic phenomena at the morphological, syntactic, and semantic levels of analysis. Many of these phenomena were previously imperceptible to traditional manual analysis. The foundation of this approach lies in curated resources such as electronic corpora, machine-readable dictionaries, and annotated treebanks, which provide structured datasets for computational models.
The remainder of this entry is structured as follows:
Section 2 examines textual resources, including digital corpora like the Helsinki Corpus, the Dictionary of Old English Corpus, and the York-Toronto-Helsinki Parsed Corpus of Old English Prose.
Section 3 explores lexicographical resources and discusses how dictionaries from Bosworth–Toller to the Dictionary of Old English have been digitised and enhanced with textual data.
Section 4 tackles corpus lemmatisation methodologies for both prose and poetry.
Section 5 deals with treebanks and focuses on the application of Universal Dependencies frameworks to Old English.
Section 6 investigates artificial intelligence developments through the progression from rule-based systems to modern transformer models.
Section 7 draws the main conclusions of this entry.