Revision (part II)

In this second post in a series on building our revision workflow, we turn to the second stage of the process: building the editorial core.

II. Structure

It’s all very well recording possible items for review. When it comes to doing something with these proposed corrigenda, the situation is more complex. We need to have not only a final revised version of the text but also a record of what changes have been made (e.g., with a view to printing a list of corrections).

Our starting point is the fact that we have complete, well-formed XML for the whole dictionary (as far as it has been published or drafted), valid against our own specially designed XML Schemas. Because the dictionary is still in preparation, the XML data for the published part must continue to reflect the dictionary as it was published, so that we can rely on it for reference (e.g. when preparing cross-references involving the part already published). At the same time, we naturally want to implement revision changes in the data as we go, so that we can also see the most accurate and up-to-date version of the dictionary. Since we need both versions in any case (at least until the new version of any amended text is published), it makes sense to hold them in such a way that we can straightforwardly extract a list of the changes made for publication.

We therefore can’t just start making changes to the data as and when we feel like it. We need to know the status of the data at any point, and at the heart of this is the fact that in any case changes need to go through due editorial review before being released (just as the draft dictionary is carefully reviewed and revised during its editorial workflow towards publication).

Now, we do already maintain version control through successive editorial stages in the drafting process using Subversion (in the form of VisualSVN Server and TortoiseSVN), so why not use that to control the different versions arising through revision? It turns out that this kind of solution is not sufficient on its own for our needs, in a number of ways. First, although various ‘diff’ tools are available for comparing successive versions, they usually work by displaying two versions of a file side by side, and so they highlight the changes without necessarily enabling an editor or reviewer to see why a change has been made (which would seriously slow the editorial review process). Second, the results of such diffs on XML data are themselves very hard to interpret, since they are available only as raw plain-text XML rather than in the visually styled display that has revolutionized our drafting process over the last few years. Third, such diff tools work best (and sometimes work only) on a line-by-line basis, which makes changes to inline elements very difficult to spot and assess, especially small changes such as the correction of a single misprinted character, and those are changes we will frequently need to make.

There are a number of other technical difficulties that would also need to be overcome for a Subversion-based solution (managing appropriate read and write repository access, and so on), but the impracticality of the diff process for editorial review is in itself too great a sticking point to make their resolution worth investigating. Accordingly, although we’ll continue to use Subversion to control successive versions in the revision system (something it is very good at), Subversion won’t quite do what we need for handling old and new versions simultaneously and exposing both for XML processing.
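To illustrate the line-by-line problem, here is a minimal sketch in Python using the standard `difflib` module. It is not one of the tools named above, and the one-line `<q2>` sample is hypothetical; the point is only that a line-based diff reports the whole line as changed when a single character differs.

```python
import difflib

# Hypothetical one-line quotation group; only one digit differs (1280 -> 1180).
old = "<q2><qt>quot1, dated 1280, with the rest of the quotation text</qt></q2>"
new = "<q2><qt>quot1, dated 1180, with the rest of the quotation text</qt></q2>"


def line_diff(a, b):
    """Return the changed lines of a unified diff, without file or hunk headers."""
    diff = difflib.unified_diff(a.splitlines(), b.splitlines(), lineterm="", n=0)
    return [line for line in diff if not line.startswith(("---", "+++", "@@"))]


changed = line_diff(old, new)
for line in changed:
    print(line)

# Count how many characters actually differ between the two versions.
differing = sum(1 for x, y in zip(old, new) if x != y)
print(f"{len(changed)} whole lines reported, {differing} character(s) changed")
```

The diff reports one removed line and one added line, each containing the full quotation, and leaves the reviewer to find the single altered digit.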

We have instead decided simply to include the new version of some part of the data alongside the old version within any amended XML data file, allowing both to be seen and compared easily (and allowing for notes to be added as desired). This further enables a two-stage editorial process of formalizing an amendment (i.e. putting the appropriately tagged new version in its relevant place alongside the existing version) and then after its editorial approval incorporating it as the live version (basically by swapping the old and new versions, with the old one retained to provide part of any corrigendum instruction to be printed, e.g. For {old version} read {new version}).

Creating a structure that will handle the draft entry of changes, their editorial review and approval (or rejection), and their implementation in the data and collation for publication has meant some adaptation to the principal DMLBS XML schema as well as the devising of new transformations (e.g., to incorporate the approved changes, and eventually to produce a formatted list for publication).

Thankfully in addition to our data for the dictionary we have a perfect set of sample amendments as a testbed: an existing set of corrigenda, published at the start of vol. II of the dictionary.


Corrigenda (A-L) from Fasc. VI

So how did we go about formulating the schema changes to be made? We began by establishing our requirements. First, we roughly sketched out the key points of the editorial revision process: recorded suggestions (from the tracking system) are turned into draft amendments that sit in the relevant place in the data alongside the current version (stage 1); the draft amendments are then reviewed and, if approved, incorporated as the live version, with the replaced data retained for reference, marked as old (stage 2).

Second, we considered the kinds of revision that we want to be able to make to the dictionary, such as substituting one word for another in a quotation, or moving a quotation from one entry to another. For this exercise the printed list was invaluable in suggesting possible kinds of revision; though we have not considered ourselves to be limited by the list, if the new system can cope satisfactorily with the majority of its changes, we can be confident that it can cope with the majority of our future needs. (Not all of its changes, indeed, would be relevant in any case, e.g. those relating to things now carried out automatically, such as the running heads.)

What about additions or deletions? An addition can be treated as a new version with no old version, and a deletion as an old version with no new version (with deleted entries being a special case, on which more in a later post). Thus a structure covering substitutions would be sufficient for both these scenarios too.

The two stages in the proposed workflow point to two different structures, related to each other. In the first, the substitution needs to be represented as alternatives, i.e. a grouped pair of siblings of the same type. For instance, a quotation element <qt>quot1</qt> that is to be replaced with a different element <qt>quot2</qt> might best be treated as one of two sibling children of a <revision> element placed within <q2>, the usual parent of <qt>; this might point to:

 <q2><revision><qt>quot1</qt><qt>quot2</qt></revision></q2>

The <revision> element could also allow notes as children and be marked for whether the revision would feature in a printed list or not (i.e. simply be incorporated silently).

 <q2><revision type="print"><qt>quot1</qt><qt>quot2</qt>↩
<note>quot2 is a newer version of quot1</note></revision></q2>

A disadvantage of this structure, though, is that the status of quot1 and quot2 as old or new versions is ambiguous, being implied only by position. Adequate though this might be for a substitution (in which, say, old might conventionally precede new), for an addition or deletion the sole <qt> child would be indistinguishable in status. For this reason we instead mark each explicitly as old or new, using <for> and <read> respectively as intermediate parent wrappers:

 <q2><revision type="print"><for><qt>quot1</qt></for><read><qt>quot2</qt></read></revision></q2>

Now an addition can be seen clearly, structured as a <read> without a <for>, and a deletion as a <for> without a <read>.
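As a minimal sketch of how this rule might be read by a program, the following Python function classifies a `<revision>` purely by which wrappers it contains. The element names `<revision>`, `<for>`, `<read>`, `<q2>` and `<qt>` come from the discussion above; the function name and the sample strings are illustrative assumptions, not part of our actual tooling.

```python
import xml.etree.ElementTree as ET


def revision_kind(revision):
    """Classify a stage-1 <revision> by the presence of <for> and <read>."""
    has_old = revision.find("for") is not None
    has_new = revision.find("read") is not None
    if has_old and has_new:
        return "substitution"
    if has_new:
        return "addition"
    if has_old:
        return "deletion"
    raise ValueError("a <revision> needs a <for>, a <read>, or both")


# Illustrative samples only.
samples = {
    "both": "<q2><revision><for><qt>quot1</qt></for><read><qt>quot2</qt></read></revision></q2>",
    "read only": "<q2><revision><read><qt>quot2</qt></read></revision></q2>",
    "for only": "<q2><revision><for><qt>quot1</qt></for></revision></q2>",
}
for label, xml in samples.items():
    revision = ET.fromstring(xml).find("revision")
    print(label, "->", revision_kind(revision))
```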

Two further matters arise: first, at what structural levels (i.e. in what elements) do we allow this kind of <revision> to appear? Clearly it makes sense for there to be some limits to this: review of the existing list suggests that the key levels are substitution of entry, lemma, sense, subsense (definition), subsense (quotation group), quotation. Although it is tempting to allow changes within the text of individual elements, their structure and downstream processing would be considerably more complex, and as a result, we decided that since every proposed change could be formulated in terms of a change at one or more of these levels, every change would be treated at the lowest possible of the levels identified.

This raises the second matter: though it is desirable to record all changes, even small ones, in such a unified and consistent way, would it really be desirable to print a full quotation in both old and new versions in a corrigendum list when only a small part (perhaps just a single letter) was being changed? Clearly the answer is no, and so the structure is adapted to allow a <print> note to clarify for publication the change made, effectively summarizing the differences between the full <qt> children of <for> and <read>:

 <q2><revision type="print"><for><qt>quot1</qt></for><read><qt>quot2</qt></read>↩
<print>in quot1 for 1280 read 1180</print></revision></q2>

By having the new version present in full (with or without a <print> note), we can make the process of converting to the second stage of structure very much easier (on which transformation, more next time).
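The role of the `<print>` note can be sketched in Python as follows. This is an illustration under stated assumptions rather than our actual transformation: the function names are hypothetical, and the "add" and "delete" wordings for additions and deletions are placeholders of our own choosing. Where a `<print>` note is present it is used as the corrigendum text; otherwise the full old and new versions are quoted.

```python
import xml.etree.ElementTree as ET


def text_of(element):
    """Flatten an element to its text content ('' if the element is absent)."""
    return "".join(element.itertext()).strip() if element is not None else ""


def corrigendum_line(revision):
    """Build the printed instruction for one <revision> (hypothetical helper)."""
    note = revision.find("print")
    if note is not None:
        # A <print> note summarizes the change, so it takes precedence.
        return text_of(note)
    old = text_of(revision.find("for"))
    new = text_of(revision.find("read"))
    if old and new:
        return f"for {old} read {new}"
    # Assumed wording for additions and deletions.
    return f"add {new}" if new else f"delete {old}"


with_note = ET.fromstring(
    "<revision><for><qt>quot1</qt></for><read><qt>quot2</qt></read>"
    "<print>in quot1 for 1280 read 1180</print></revision>"
)
without_note = ET.fromstring(
    "<revision><for><qt>quot1</qt></for><read><qt>quot2</qt></read></revision>"
)
print(corrigendum_line(with_note))
print(corrigendum_line(without_note))
```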

The necessary updates to the schema for stage 1 therefore permit <revision> elements at their relevant levels, with appropriate children allowed within the <read> and <for> children (e.g. within <q2> the children of <read> and <for> are <qt>, while within <entry> the children of <read> and <for> are <s1>), and allowing for <note>, <print>, an indication of type (print/silent), and editorial authorisation.
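The level-specific constraint is enforced by the schema itself, but its effect can be sketched in Python. The check below covers only the two pairings stated above (`<qt>` within `<q2>`, `<s1>` within `<entry>`); the table and function names are hypothetical, and revisions at other levels are simply skipped.

```python
import xml.etree.ElementTree as ET

# Only the two pairings mentioned in the text; other levels are not modelled here.
ALLOWED_CHILD = {"q2": "qt", "entry": "s1"}


def misplaced_children(root):
    """List (parent, wrapper, child) triples that break the level rule."""
    problems = []
    for parent in root.iter():
        expected = ALLOWED_CHILD.get(parent.tag)
        if expected is None:
            continue  # level not covered by this sketch
        for revision in parent.findall("revision"):
            for wrapper in revision:
                if wrapper.tag not in ("for", "read"):
                    continue  # <note>, <print> etc. are not version wrappers
                for item in wrapper:
                    if item.tag != expected:
                        problems.append((parent.tag, wrapper.tag, item.tag))
    return problems


good = ET.fromstring(
    "<q2><revision><for><qt>quot1</qt></for><read><qt>quot2</qt></read></revision></q2>"
)
bad = ET.fromstring("<q2><revision><for><s1>sense</s1></for></revision></q2>")
print(misplaced_children(good))
print(misplaced_children(bad))
```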

The second structure needs to hold an amended entry at the end of the process and reflect its heritage while fitting in seamlessly, i.e. sharing a structure, with other unamended entries around it. Of course, one option is simply to mark the change as authorized and leave it as is:

 <q2><revision type="print"><for><qt>quot1</qt></for><read><qt>quot2</qt></read>↩
<print>in quot1 for 1280 read 1180</print>↩
<auth incorporated="2013-07-01"/></revision></q2>

But this makes it harder to distinguish and process the amended entry among many unamended entries in which, for instance, the other (live) <qt> elements are direct children of <q2>. The second structure therefore has the new version in the position it would have occupied in an unamended entry, i.e. in the place where the <revision> element was, with the replaced version retained, now as its descendant:

 <q2><qt>quot2<amended incorporated="2013-07-01" type="print"><qt>quot1</qt>↩
<print>in quot1 for 1280 read 1180</print></amended></qt></q2>
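The move from the stage-1 structure to the stage-2 structure can be sketched in Python as below. This is a hedged illustration, not our actual transformation (which is the subject of the next post): the function name is hypothetical, a `type` attribute on `<revision>` is assumed (defaulting to "print"), approval is taken as already given, and `<note>` children are carried over alongside `<print>`.

```python
import xml.etree.ElementTree as ET


def incorporate(parent, date):
    """Replace each <revision> child of `parent` with its stage-2 <amended> form."""
    for revision in parent.findall("revision"):
        position = list(parent).index(revision)
        amended = ET.Element(
            "amended",
            {"incorporated": date, "type": revision.get("type", "print")},
        )
        old, new = revision.find("for"), revision.find("read")
        if old is not None:
            amended.extend(list(old))  # retain the replaced version(s)
        amended.extend(revision.findall("print"))
        amended.extend(revision.findall("note"))
        parent.remove(revision)
        if new is not None and len(new):
            # Substitution or addition: the new version goes live and
            # carries its history as a descendant.
            replacements = list(new)
            replacements[0].append(amended)
        else:
            # Deletion: only the record of the old version remains.
            replacements = [amended]
        for offset, element in enumerate(replacements):
            parent.insert(position + offset, element)


q2 = ET.fromstring(
    '<q2><revision type="print"><for><qt>quot1</qt></for><read><qt>quot2</qt></read>'
    "<print>in quot1 for 1280 read 1180</print></revision></q2>"
)
incorporate(q2, "2013-07-01")
print(ET.tostring(q2, encoding="unicode"))
```

Applied to a substitution, the sketch yields the new `<qt>` as a direct child of `<q2>`, with the old `<qt>` and the `<print>` note nested inside `<amended>`.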

This structure works well for substitutions and insertions (which would, e.g., lack the <qt> child of <amended>) but less well for deletions. For these the logic simply points to:

 <q2><amended incorporated="2013-07-01" type="print">↩
<qt>quot1</qt></amended></q2>

Of course, this might look like the record of a substitution for the parent <q2>:

 <q2><amended incorporated="2013-07-01" type="print"><q2><qt>quot3</qt>↩
</q2></amended><qt>quot4</qt></q2>

However, the node names in fact prevent ambiguity. In the case of a deletion the parent and child of <amended> have different names, whereas for a substitution they will have the same name. So no confusion need ever arise: each type of change at each level can uniquely be identified by structure and/or name.
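The name-based rule can be expressed as a short Python sketch. The function name is hypothetical, and treating `<print>` and `<note>` as annotations rather than retained versions is an assumption made for the example; the comparison of parent and child names follows the rule just described.

```python
import xml.etree.ElementTree as ET

ANNOTATIONS = ("print", "note")  # assumed not to count as retained old versions


def amended_kind(parent, amended):
    """Classify a stage-2 <amended> by comparing its parent's and child's names."""
    old_versions = [child for child in amended if child.tag not in ANNOTATIONS]
    if not old_versions:
        return "insertion"  # nothing was replaced
    if old_versions[0].tag == parent.tag:
        return "substitution"  # same name: parent replaced the retained child
    return "deletion"  # different names: child was removed from parent


def classify_all(root):
    """Return (parent name, kind) for every <amended> element in the tree."""
    return [
        (parent.tag, amended_kind(parent, child))
        for parent in root.iter()
        for child in parent
        if child.tag == "amended"
    ]


substitution = ET.fromstring(
    "<q2><qt>quot2<amended><qt>quot1</qt></amended></qt></q2>"
)
deletion = ET.fromstring("<q2><amended><qt>quot1</qt></amended></q2>")
print(classify_all(substitution))
print(classify_all(deletion))
```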

The coding of these alternatives in XSD was not straightforward (‘unique particle attribution’ is a real and unnecessary pain here: why should we care which pattern is matched if an element matches more than one valid pattern within its type, provided it does match one?) but finally we have a working Schema that captures both stage one with <revision> and stage two with <amended>.

In the next post we look at the editorial workflow itself and considerations around types of changes, approval, and the first steps in transforming from one stage to the next.

Read the other posts in this series here.

