Dotting ‘i’s and crossing ‘t’s

2012 has been a busy year for the DMLBS. It was about half a year ago that we launched a new project website and this blog. Both have already shown themselves to be achieving their aim of bringing our work to a wider audience. (Between them, the website and blog have had thousands of hits, from readers in more than a dozen countries worldwide.)

We have continued to develop our digital data, exploring how to publish the dictionary online in the future and using it to collaborate with other projects.

However, this flurry of activity does not mean the core work of the dictionary team has been neglected – quite the reverse. Preparing material for publication has gone on apace as we build up to completing the printed dictionary next year.

When we launched the website and blog earlier this year we were just putting the finishing touches to Fasc. XV of the dictionary. We are now in that same last stage for the next fascicule, as we prepare to send it to our typesetters, Data Standards. Accordingly this last fortnight has been spent incorporating final editorial revisions and honing the transformations that we use to check the data.

Anyone who has published a book, especially an academic book, will be painfully aware that the final part of the work can be particularly arduous. Even once the ‘text proper’ has been checked, reviewed and revised several times so that we are as content as we can be about the ‘content’, there are numerous more technical things that we have to ensure are right, such as alphabetical order, sense numbering and lettering, references, and cross-references.

In the early fascicules of the dictionary it was always a laborious task to make sure that each of the hundreds of cross-references of various sorts in the text actually pointed to a valid destination, and to ensure that the right form of reference was being used for each source quotation (of which there are more than 20,000 in a fascicule of around 200 pages). It was also surprisingly hard work to get alphabetical order correct to the fifth, sixth, seventh letter or further.

Now, however, because we create the dictionary as XML data, these crucial but repetitive tasks can be largely automated or eliminated. We can build XSL and XQuery transformations to check that each entry is in its correct alphabetical place and has a homograph number if needed, that each sense is correctly numbered and lettered, that each sense and subsense has a corresponding quotation paragraph (and vice versa), and so on.
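To give a flavour of what such checks involve, here is a minimal sketch in Python rather than in the XSL and XQuery we actually use. The element and attribute names (`entry`, `headword`, `sense`, `n`) and the sample data are invented for illustration and are not the dictionary's real schema; the alphabetical comparison is also a plain letter-by-letter one, which takes no account of any special ordering conventions a Latin dictionary might need.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: element names are assumptions, not the real schema.
SAMPLE = """
<dictionary>
  <entry><headword>sal</headword><sense n="1"/><sense n="2"/></entry>
  <entry><headword>salarium</headword><sense n="1"/><sense n="3"/></entry>
  <entry><headword>sacer</headword><sense n="1"/></entry>
</dictionary>
"""


def out_of_order(root):
    """Return headwords that sort before the entry preceding them."""
    problems = []
    previous = None
    for entry in root.iter("entry"):
        headword = entry.findtext("headword", default="")
        key = headword.casefold()  # simple case-insensitive comparison
        if previous is not None and key < previous:
            problems.append(headword)
        previous = key
    return problems


def misnumbered(root):
    """Return headwords whose senses are not numbered 1, 2, 3, ... in order."""
    problems = []
    for entry in root.iter("entry"):
        numbers = [sense.get("n") for sense in entry.findall("sense")]
        expected = [str(i) for i in range(1, len(numbers) + 1)]
        if numbers != expected:
            problems.append(entry.findtext("headword", default=""))
    return problems


root = ET.fromstring(SAMPLE)
print(out_of_order(root))   # ['sacer']
print(misnumbered(root))    # ['salarium']
```

The point of the sketch is only that each rule becomes a small, repeatable test that reports the entries needing attention, rather than something an editor has to verify by eye.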

Similarly we can check every cross-reference against the data for the dictionary so far. Since the majority of these match a valid destination, we can eliminate much of the drudgery of checking (at which human beings are not just slow but also not very accurate) and reserve our attention for the instances where editorial judgement is needed. (You might well ask how erroneous cross-references creep into the drafting process in the first place. The answer is that typically they were accurate when first drafted, but the destination was later moved for some reason during revision by someone unaware that a cross-reference pointed to it.) The form of bibliographical references can also be verified in the same way against our bibliography data.
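The cross-reference check can be sketched in the same spirit. Again this is an illustration in Python under assumed names (`entry` elements with an `id` attribute, `xref` elements with a `target` attribute), not our actual transformations or data model.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: 'id' and 'target' attributes are assumptions.
SAMPLE = """
<dictionary>
  <entry id="sal"><headword>sal</headword><xref target="salarium"/></entry>
  <entry id="salarium"><headword>salarium</headword><xref target="salinum"/></entry>
</dictionary>
"""


def dangling_xrefs(root):
    """Return (source entry id, target) pairs whose target matches no entry."""
    known = {entry.get("id") for entry in root.iter("entry")}
    dangling = []
    for entry in root.iter("entry"):
        for xref in entry.iter("xref"):
            target = xref.get("target")
            if target not in known:
                dangling.append((entry.get("id"), target))
    return dangling


root = ET.fromstring(SAMPLE)
print(dangling_xrefs(root))  # [('salarium', 'salinum')]
```

Everything that resolves is passed silently; only the unmatched pairs are listed for an editor to look at, which is where the judgement is needed. A check of bibliographical references could follow the same pattern, with the set of known identifiers drawn from the bibliography data instead of from the entries.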

Perhaps the most significant advantage of developing methods of electronic checking is that they can then be used throughout the editorial process to keep the material in good order from the very start, especially now that the dictionary is drafted in its electronic form. Thus the data currently being checked for typesetting has been through many of these checks several times already, after each stage of revision. As a result, the final pre-typesetting stage is this time far easier and quicker than ever before, and at the same time more accurate.

We now expect Fasc. XVI to be published in January 2013, which means that 2012 will not be the first calendar year in the dictionary’s history in which two fascicules have been published. Nevertheless, XVI will be appearing just six months after XV, which itself came only seven months after XIV, making a total of more than 600 pages of dictionary text published in little over a year.
