Technology is constantly opening up new vistas for humanities research, and with them come new uses for existing major reference resources, such as the DMLBS, when these are available in digital form.
In this guest post, Lynne Cahill of the ChartEx project describes one way in which the DMLBS’s digital data is already making a huge difference to research into the medieval world, research of a kind that could scarcely have been imagined when the project was first proposed a century ago.
The ChartEx Project is developing new ways of exploring the full-text content of digital historical records. The project will demonstrate its approach using medieval charters, which survive in abundance from the 12th to the 16th centuries and are one of the richest sources for studying the lives of people in the past. Charters record legal transactions of property of all kinds (houses, workshops, fields and meadows) and describe the people who lived there. For the centuries before records such as censuses or birth registers existed, charters were, and still are, the major resource for researching people, for tracing changes in communities over time and for finding ancestors.
The new ChartEx tools will use a combination of Natural Language Processing (NLP) and Data Mining to extract information about places, people and events in their lives from the charters automatically and find new relationships between these entities. The project will then build an interactive “virtual workbench” that will allow historians, archivists and others interested in charters to explore the information extracted and add further information and comments. This workbench will enable researchers to really dig into the content of the records, to recover their rich descriptions of places and people, and to go far beyond current digital catalogues which restrict searches to a few key facts about each document.
The ChartEx consortium is an innovative partnership between historians, archivists, and experts in computer science and artificial intelligence from Canada, the Netherlands, the UK and the US. The ChartEx Project is funded by the Digging into Data Challenge.
The NLP task involves processing both medieval English and Latin. For the English, we are using a combination of modern English resources, many of which are available off the shelf, and resources compiled by the historians from their personal experience of working with charters. For the Latin, there is much less available off the shelf, so we are fortunate to have access to the Dictionary of Medieval Latin from British Sources. Being able to send a list of English words and receive (often by return) a list of all Latin words from the dictionary with that English word in the definition allows us to build up a significant base of Latin vocabulary. This is particularly important for Latin, for two reasons. First, it is a relatively under-resourced language in terms of electronically available word lists, dictionaries and grammars. Second, different varieties of Latin were used at different times and in different geographical locations. The DMLBS focuses on exactly the variety we are encountering in the charters, so we can be confident that the words and their definitions will be more appropriate for our purposes than entries from a classical or school Latin dictionary, for example.
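To make the idea of this reverse lookup concrete, here is a minimal sketch in Python of how a list of English words might be matched against dictionary definitions to collect Latin headwords. It is an illustration only, not the system the DMLBS actually uses. The sample entries are a handful of common Latin words with simplified glosses invented for this example, not text taken from the dictionary, and the function names build_reverse_index and lookup are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical sample data: (Latin headword, simplified English gloss).
# These glosses are invented for illustration and are NOT DMLBS entries.
SAMPLE_ENTRIES = [
    ("terra", "land, earth; ground"),
    ("pratum", "meadow"),
    ("domus", "house, dwelling"),
    ("mesuagium", "messuage, dwelling house with its land"),
    ("campus", "field, open land"),
]


def build_reverse_index(entries):
    """Map each English word found in a definition to the Latin headwords using it."""
    index = defaultdict(set)
    for headword, definition in entries:
        # Split the definition into lowercase alphabetic tokens.
        for word in re.findall(r"[a-z]+", definition.lower()):
            index[word].add(headword)
    return index


def lookup(index, english_words):
    """Return, for each English word, a sorted list of matching Latin headwords."""
    return {w: sorted(index.get(w.lower(), set())) for w in english_words}


index = build_reverse_index(SAMPLE_ENTRIES)
result = lookup(index, ["land", "house", "meadow", "mill"])
for english, latin in result.items():
    print(english, "->", latin)
```

A word with no match, such as "mill" in this toy sample, simply returns an empty list. A real lookup over a full dictionary would also need to handle inflected English forms, multi-word definitions and the structure of the entries themselves, none of which this sketch attempts.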
The DMLBS project welcomes approaches for collaboration from other research projects for which our data and/or expertise can make a difference of this kind, especially in the development of new research methods. With a large and well-structured dataset that can be rapidly and flexibly processed, the project expects its data to be fundamental to a whole range of other forms of research, many of which can even now scarcely be imagined, in much the same way that 100 years ago our forerunners could not have foreseen the uses to which the results of their proposed dictionary would be put.