Overcoming the Perils of Mapping Medieval Spaces: GIS and Other Data Visualizations of the Literary Real and Fictive
Adapted from a presentation given at the Fifty-Fourth International Congress on Medieval Studies, Western Michigan University, 10 May
This essay discusses the various false starts, wrong turns, and general errantry inherent in beginning digital humanities projects, but with particular attention to those that deal with Historical Literary GIS. This talk springs from my ongoing DH project, Morrois (Mapping of Romance Realms and Other Imagined Spaces), a database and data visualization site which is a geographic concordance of literary spaces in Middle English manuscripts.
This project began with a straightforward hypothesis: medieval romance is a literary mode which centres on explorations of unknown spaces in the journeys undertaken by the romance protagonist or in the encountering of foreign or strange things in known spaces, both imaginatively recreated by the romance reader. Further, the mode itself, as its name indicates, is also inherently focused on the transmission of texts across language, space, and time. These phenomena have driven critical interest in the various spatial markers of literary texts, and there are manifold research possibilities afforded through distant reading, GIS, and other data visualizations when applied to these phenomena.
To give a sense of the research questions that drive my project, I’ll offer a few examples of these different sorts of place/space movements within just ten lines of a single romance. The Sultan of Babylon is a late fourteenth- or early fifteenth-century romance which describes a Muslim attack on Rome under the eponymous sultan in retribution for the Romans looting a wrecked Muslim treasure ship. In the introduction to this text, we are given the following text:
Listinythe a while and ye shall see,
As King Lowes witnessith that cas,
As it is wryten in romaunce
And founden in bokes of antiquyté
At Seinte Denyse Abbey in Fraunce,
There as cronycles remembrede be,
Howe Laban, the kinge of hie degre,
And syre and Sowdon of hie Babilon,
Conquerede grete parte of Christianté,
That was born in Askalon. (20-32)
Listen a while and you shall see,
As King Louis saw in that case,
As it is written in romance [French],
And found in books of antiquity
At St-Denis Abbey in France,
There as it by chronicles remembered be,
How Laban, the king of high degree,
And sire and Sultan of High Babylon,
Conquered a great part of Christianity
Who was born in Askalon.
We have four immediately recognizable places: St-Denis Abbey, France, Babylon (i.e., Cairo), and Askalon (Ashkelon). These place names can, however, be added to and qualified. “Babylon” is part of Laban’s title, that is, “the Sultan of Babylon,” and as such can be read as a description of a person with a sense of place inherent in it. Similar adjectival usage can be found in “romaunce,” meaning here French, with language bearing a similar moveable sense of place. Finally, “Christianté” is used here to refer to a geo-religious entity, which is often placed against competing spaces of religious identities such as “heathenness,” as with Otuel’s converted king Garcy, who “[a]l cristendom, more & lasse, He þouʒte to maken heþennesse,” (41-2) or even “Fairy,” as when “Launfal, wythouten fable, / That noble knyght of the Rounde Table, / Was take ynto Fayrye” (1033-5). Place names, whether referring to a place itself, or being used adjectivally to refer to people or things, can be found everywhere in these texts, and often with great frequency. This ratio of place names to lines (6:10) falls roughly at the median: some texts, like the Alliterative Morte Arthure, have sections that run closer to 2:1, while others, like Amis and Amiloun, have very few, at about 2:100.
The more I encountered these place name usages and variations, the more I began to think about the various ways that these spatial markers, when read distantly, could reveal a large amount of information to us, particularly when placed against different witnesses of the same text, or when examining place names across multiple texts in the same manuscript. The places named in a manuscript can offer a sense of the spatial literacy of the original author and audience, while depictions of cultural interaction (war, travel, or marriage) mark out permeable boundaries that separate regions. Potential anxieties about geographic senses of identity can be revealed (as we see in the Sultan of Babylon, when the relics of St. Peter’s are looted and taken to Muslim territory), but so too can the many forms of medieval cultural transmission, multiethnic communities, and the complex issues surrounding medieval concepts of race and nationalism. Further, romance reflects certain tradeable and consumable commodities, often named for their exotic origins (e.g., damask, Arabian horses, Bordeaux wine); reading for these markers of cosmopolitan, luxury taste can give insight into audience familiarity with them and their potential scarcity or availability in certain places or times.
As an end goal, I will extract place-name data from all Middle English manuscripts which contain romances (as defined by the Manual of Writings in Middle English, 1050-1500) in both verse and prose. This gives us a rough total romance count of 105, appearing in a variety of single-text manuscripts, romance-only manuscripts, and composite manuscripts containing romance texts alongside devotional, didactic, and historical works. While romance is at the heart of the project, I argue that the geographic markers contained in any non-romance materials found with romance texts can offer further evidence of the spatial literacy of the composers and copyists of individual manuscript witnesses. Subsequently, I believe that the ontology I’ve developed is flexible enough to allow the project to then engage with romance texts in other languages. While this is obviously essential to any complete conception of trends in spatial knowledge in polyglot England, it will eventually permit engagement with the significantly large corpus of non-English manuscripts, and of ongoing digitization and transcription projects in Germany, France, and elsewhere.
This is obviously a very ambitious end goal. My current two-year project, funded by the Social Sciences and Humanities Research Council of Canada, has helpfully limited the scope to two critically important manuscripts in the Middle English period: the so-called Auchinleck Manuscript (Edinburgh, National Library of Scotland Advocates MS 19.2.1), and the so-called Lincoln Thornton Manuscript (Lincoln Cathedral Library MS 91).
Auchinleck is a collection of forty-four texts compiled by a group of five professional scribes working in London in the 1330s. As the earliest example of lay and commercial book production at a relatively early point in the history of Middle English, it has long held a very prominent place in discussions of Middle English language and literature. Much of the scholarship on Auchinleck, including monographs by Thorlac Turville-Petre, Ralph Hanna, and Siobhan Bly Calkin, has addressed the apparent themes of a proto-national identity of Englishness (or indeed London-ness) as distinct from those of France, the rest of Christendom, or, indeed, the larger medieval world.
Lincoln Thornton is a collection of approximately 100 items, compiled and copied by amateur scribe and Yorkshire landowner Robert Thornton in the 1430s and 1440s. Alongside Thornton’s other smaller manuscript (the “London Thornton,” British Library MS Additional 31042), the Lincoln Thornton has been studied for the glimpse it offers into “a fifteenth-century English gentleman’s long-term, worshipful endeavour to compile a sizeable library for his own personal use and the edification of his family” (Susanna Fein, "The Contents of Robert Thornton's Manuscripts," in Robert Thornton and His Books: Essays on the Lincoln and London Thornton Manuscripts, edited by Susanna Fein and Michael Johnston, pp. 13-65 [Woodbridge, Suffolk: York Medieval Press in association with The Boydell Press, 2014], 15).
With Auchinleck and the Lincoln Thornton, I hope, through a distant-reading investigation of all place-name data, to address two major research questions. The first is not whether Auchinleck truly engages in a nationalizing or patriotic enterprise, but how: are anxieties of place observable not only in references to various Englishnesses, but also in deployment of these Englishings proximate to references to the strange or foreign? As an oft-discussed example of this, the Auchinleck Sir Orfeo, a reimagining of Orpheus and Eurydice, contains a couplet not found in other witnesses to this text which establishes the setting:
Þis king soiournd in Traciens
Þat was a cite of noble defens;
For Winchester was cleped þo
Traciens wiþouten no. (47-50)
The second question springs from the combination of devotional material, romances, and recipes found in Lincoln Thornton. As Jennifer Bartlett has recently shown ("Arthur’s Dinner; Or, Robert Thornton Goes Shopping," Arthuriana 26, no. 1 (2016), 165-179), the lavish feast scene found in the Alliterative Morte Arthure, though on the one hand filled with exotic food and wine, is also comprised of items to which Thornton himself would have had relatively easy access in his own location and time. Here, I want to see how markers of elite taste are used, differently or similarly, across texts with different functions (romance versus recipe, for example).
Enough for theory. What I intend to focus on more today is practice, which can be roughly organized into five separate stages of project development:
(1) ontology development;
(2) platform selection;
(3) data collection;
(4) data entry and migration; and
(5) data visualization.
These stages must at the outset run concurrently, since the actual practices of entry, migration, and visualization can be profoundly helpful to both platform selection and ontology development.
In my case, we began by collecting place name instances from a selection of romances, reading through the text and entering all instances in an Excel spreadsheet. The ability to save the data in a format as simple as CSV meant that selecting a platform could wait until we began to get a scope of how many place names we’d have, and what sorts of places they’d be. Data entry naturally required a rough ontology (simply for naming the columns), and we came up with a rough template capturing the line number, the name as it appears in text, the Modern English name, the form that it takes (i.e., person, place, thing), and then fields for notes and bibliographic information.
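A template of this kind is easy to sketch as a CSV file. The following Python example is a minimal illustration only: the column names are hypothetical stand-ins for the headings described above, and the two sample rows are drawn from the Sir Orfeo lines quoted earlier, with modern identifications supplied for illustration.

```python
import csv
import io

# Hypothetical column names modelled on the template described above;
# the project's actual spreadsheet headings may differ.
FIELDS = ["line", "name_in_text", "modern_name", "form", "notes", "bibliography"]

# Illustrative rows based on the Sir Orfeo passage (lines 47-50) quoted earlier.
SAMPLE_ROWS = [
    {"line": 47, "name_in_text": "Traciens", "modern_name": "Thrace",
     "form": "place", "notes": "", "bibliography": ""},
    {"line": 49, "name_in_text": "Winchester", "modern_name": "Winchester",
     "form": "place", "notes": "identified with Traciens in the text",
     "bibliography": ""},
]


def to_csv(rows):
    """Serialize a list of row dictionaries to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def from_csv(text):
    """Read CSV text back into a list of dictionaries (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


csv_text = to_csv(SAMPLE_ROWS)
round_trip = from_csv(csv_text)
```

Because CSV stores everything as text, a round trip returns the line number as a string; any later platform has to decide how to type each column.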
Parallel to this, I presented on these early stages at a couple of DH conferences and workshops and quickly began to see some major flaws in this data collection process. The major takeaway from this is that it’s critical to think like an end user as much as possible and to workshop the project at every opportunity. For instance, the “form” field lacked any fixed vocabulary, meaning that searchability was going to be inherently limited; it also lacked any sense of specifics or hierarchies of meaning. Simply saying “place” is fine enough, but it fails to incorporate in any meaningful way relationships between places: London, for example, is a place within Britain, which is within Christendom. A failure to note that in any way further limits the types of queries we might want to ask, such as, “Show me every place named in these romances that features places in Britain.” Another place issue was that I lacked any external resources for these place names: as I was looking to produce map-based visualizations, latitude and longitude were necessary at a minimum. The Getty Thesaurus of Geographic Names was suggested to me, though GeoNames is also a useful resource. I went with Getty because it has a focus on historical places, allowing me to capture entities such as “the Roman Empire.” Each Getty place is marked by a unique numerical identifier, so we quickly added a Getty Identifier column to our spreadsheets.
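The fixed-vocabulary problem can be illustrated with a short Python sketch. The three terms below follow the person, place, thing distinction mentioned above; the function names and the sample rows are hypothetical, and a real vocabulary would likely be richer.

```python
from enum import Enum


class Form(Enum):
    """A fixed vocabulary for the 'form' column (assumed three-term version)."""
    PLACE = "place"
    PERSON = "person"
    THING = "thing"


def parse_form(raw):
    """Normalize a free-text entry to the vocabulary, or raise ValueError."""
    return Form(raw.strip().lower())


def find_problems(rows):
    """Return (row index, offending value) for entries outside the vocabulary."""
    problems = []
    for index, row in enumerate(rows):
        try:
            parse_form(row["form"])
        except ValueError:
            problems.append((index, row["form"]))
    return problems


# Hypothetical spreadsheet rows showing the kind of drift free text allows.
ROWS = [
    {"name_in_text": "Babilon", "form": "Person "},
    {"name_in_text": "Fraunce", "form": "place"},
    {"name_in_text": "romaunce", "form": "language"},
]

problems = find_problems(ROWS)
```

Checking entries at the point of data entry, rather than at migration, keeps the column searchable and surfaces categories (here, language) that the vocabulary has not yet accounted for.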
It was also important to find some sort of platform for displaying this data to get a better sense of how it might be used. The Omeka content management system was recommended to me, and, since it’s open-source, and even has a relatively decent (though not customizable) web-based version, I went with it. Omeka lets you batch upload data through CSV files, so this further cemented it as a choice. One further benefit of Omeka is that it employs the Dublin Core Metadata Standard, which further pushed me to think as regularly as possible about the forms my data was taking. However, while the metadata fields offered by Dublin Core are, as you can see, useful for metatextual data related to the manuscripts, they are nearly worthless for my intratextual data. Since I had by this point secured virtual server space through Compute Canada, I went ahead and installed the full version of Omeka (now called Omeka Classic), simply so I could explore how much the Omeka format could be customized to my purposes. The answer was: not much. Although we could effectively change the names of the various fields to more clearly represent the actual data, this also removed any interoperability of those fields across databases.
Nonetheless, we achieved two goals. First, we had a somewhat usable concordance of all the place names found in the Alliterative Morte, which we were able to put online and make some use of. As you can see here, it’s sortable by line number and alphabetically by both Modern and Middle English name. Further, there is a rudimentary “tagging” system, which gives some sense of the underlying ontologies of place, person, and thing. By manually entering the latitude and longitude from the Getty Thesaurus, we were also able to use the Neatline plugin, which gives us some sense of the places as projected on a map. More on that later. Secondly, and more importantly, gaining familiarity with the Dublin Core Metadata Schema opened up the far wider possibilities of Linked Open Data.
Linked Open Data can in brief be summarized through the four principles offered by Tim Berners-Lee:
- Use URIs to name (identify) things.
- Use HTTP URIs so that these things can be looked up (interpreted, "dereferenced").
- Provide useful information about what a name identifies when it's looked up, using open standards such as RDF, SPARQL, etc.
- Refer to other things using their HTTP URI-based names when publishing data on the Web.
Effectively, all the conceptual things I deal with in my intratextual material can largely be identified using pre-existing URIs. The Getty Thesaurus offers a fine example of this: if we look to the URL for their record of Florence, we see that Florence is identified by the unique ID “7000457” at the end of its URI. The same principle is at play in the corresponding record for “floren” in the Middle English Dictionary. As Linked Open Data entities, I can now use these URIs to identify a relationship between the Modern English (Getty) and Middle English (MED) place names. These URIs also allow me to extract additional semantic information from both sources.
From Getty, we can see a hierarchy of places, such that “Florence” is now nested within a series of other geographic concepts, so my query “Show me every place named in these romances that features places in Italy” can now be gleaned automatically through references to Getty. Similarly, I can engage with the variant spellings, quotes, and dates of attested usage from the MED.
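The hierarchical query can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the project's implementation: the Florence URI follows the Getty pattern of base address plus ID, but every other URI, the `BROADER` predicate, and the hard-coded triples are hypothetical placeholders standing in for data that would really be retrieved from Getty.

```python
GETTY_TGN_BASE = "http://vocab.getty.edu/tgn/"


def tgn_uri(getty_id):
    """Build a Getty TGN URI from the numeric identifier stored in the spreadsheet."""
    return GETTY_TGN_BASE + getty_id


FLORENCE = tgn_uri("7000457")

# Placeholder URIs and predicate (hypothetical, for illustration only).
TUSCANY = "http://example.org/place/tuscany"
ITALY = "http://example.org/place/italy"
LONDON = "http://example.org/place/london"
BRITAIN = "http://example.org/place/britain"
BROADER = "http://example.org/vocab/broader"

# Subject-predicate-object statements: each place points to its containing place.
TRIPLES = {
    (FLORENCE, BROADER, TUSCANY),
    (TUSCANY, BROADER, ITALY),
    (LONDON, BROADER, BRITAIN),
}


def ancestors(place, triples):
    """Follow 'broader' links upward and return every containing place in order."""
    parent = {s: o for s, p, o in triples if p == BROADER}
    chain = []
    while place in parent and parent[place] not in chain:
        place = parent[place]
        chain.append(place)
    return chain


def places_within(region, mentioned, triples):
    """Answer 'show me every mentioned place that lies within this region'."""
    return [p for p in mentioned if region in ancestors(p, triples)]


in_italy = places_within(ITALY, [FLORENCE, LONDON], TRIPLES)
```

The point of the design is that the containment information lives with the external authority, not in the spreadsheet: the project records only the identifier, and the nesting comes along with it.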
The standards we’ve elected to use for this new underlying database conform to the Resource Description Framework (RDF) specifications, and a series of ontology schemas conforming to those RDF specifications. These ontologies can be joined together, so my metatextual data relating to the manuscripts themselves still employ some parts of the Dublin Core Schema. To that, we’ve appended the Erlangen Functional Requirements for Bibliographic Records object-oriented (EFRBRoo) ontology and the Sharing Ancient Wisdoms (SAWS) ontology, and generated a few ontologies of our own. Overall, this schema is perhaps best expressed visually, as we can see here.
In this image, we can trace the various conceptual entities that underlie any single text within a manuscript, including the scribe, various titles given by both scribe and modern editor, and then a breakdown of both meta- and intratextual entities into persons, places, and things. This permeability between the meta- and intratextual allows for a scribe to reference either themselves or other real-world persons within the text, allowing us to encode geographically-named persons that might sign their work. We allow for place names to lie underneath person and thing references: this is to say, any place name refers to a place, but that reference can then be appended to a person or thing named for a place.
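The nesting of place names beneath person and thing references can be sketched as a small Python data model. The class and field names here are hypothetical and do not reproduce the project's RDF schema; the sample references come from the Sultan of Babylon passage quoted earlier.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class PlaceRef:
    """A reference to a place, as spelled in the text and in Modern English."""
    name_in_text: str
    modern_name: str


@dataclass(frozen=True)
class PersonRef:
    """A reference to a person, optionally named for a place."""
    name_in_text: str
    place: Optional[PlaceRef] = None


@dataclass(frozen=True)
class ThingRef:
    """A reference to a thing, optionally named for a place."""
    name_in_text: str
    place: Optional[PlaceRef] = None


Reference = Union[PlaceRef, PersonRef, ThingRef]


def place_names(refs: List[Reference]) -> List[str]:
    """Collect modern place names, whether direct or nested under a person or thing."""
    names = []
    for ref in refs:
        place = ref if isinstance(ref, PlaceRef) else ref.place
        if place is not None:
            names.append(place.modern_name)
    return names


REFS = [
    PlaceRef("Fraunce", "France"),
    PersonRef("Sowdon of hie Babilon", PlaceRef("Babilon", "Babylon")),
    PlaceRef("Askalon", "Ashkelon"),
]

names = place_names(REFS)
```

Under this arrangement a query for places returns Babylon even though, in the text, it appears only as part of a title.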
Our first order of business after this lengthy process was to be able to duplicate the end-user visualizations offered by Omeka. Using D3 SPARQL queries, we can provide a similar (if not yet visually polished) list of place names sortable in the same way as Omeka allows, and a similar rough extraction of latitude and longitude to produce simple points that refer to places on a map. Additionally, we can now produce charts and graphs showing frequency of place names, which is a rather helpful form of distant reading when assessing particular focuses within or across texts.
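The sorting and frequency counts described here can be approximated in plain Python. This sketch does not use SPARQL or D3; it simply assumes the query results have already arrived as rows of line number, Middle English name, and Modern English name. The column labels are hypothetical, and the three rows come from the Sir Orfeo lines quoted earlier.

```python
from collections import Counter

# Assumed column order for each record (hypothetical labels).
COLUMNS = ("line", "middle_english", "modern_english")

# Rows based on Sir Orfeo, lines 47-50, as quoted earlier.
RECORDS = [
    (49, "Winchester", "Winchester"),
    (47, "Traciens", "Thrace"),
    (50, "Traciens", "Thrace"),
]


def sort_records(records, key):
    """Sort by one column; text columns are compared case-insensitively."""
    index = COLUMNS.index(key)

    def sort_key(record):
        value = record[index]
        return value.casefold() if isinstance(value, str) else value

    return sorted(records, key=sort_key)


def frequency(records):
    """Count how often each Modern English place name is mentioned."""
    return Counter(record[2] for record in records)


by_line = sort_records(RECORDS, "line")
counts = frequency(RECORDS)
```

The same counts can then be passed to any charting tool to produce the frequency graphs mentioned above.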
Returning to variant spellings and different usages of place names, one benefit of our manual data extraction is that each addition of a new place also works to compile a geographical vocabulary. The first of two next steps we plan to take is to develop a script that can automatically read plain text transcriptions of texts, identify place names, and add that new data to the database. Simultaneously, the text itself can be marked up in TEI, allowing us to have a marked-up, hyperlinked text that highlights place names within the text. A finished example of this marked-up text can be seen in the Icelandic Saga Map project.
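A first approximation of such a script might look like the following Python sketch. It is not the planned implementation: the gazetteer is a hypothetical two-entry vocabulary with placeholder URIs, and matching is by exact spelling only, so every variant would need its own entry. The output uses the TEI `placeName` element with a `ref` attribute pointing at the place's URI.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical vocabulary: attested spelling -> placeholder URI for the place.
GAZETTEER = {
    "Winchester": "http://example.org/place/winchester",
    "Traciens": "http://example.org/place/thrace",
}


def tag_places(line, gazetteer):
    """Wrap each known place name in a TEI placeName element, escaping other text."""
    if not gazetteer:
        return escape(line)
    # Try longer spellings first so they are not shadowed by shorter ones.
    spellings = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(s) for s in spellings) + r")\b")
    pieces = []
    position = 0
    for match in pattern.finditer(line):
        name = match.group(1)
        pieces.append(escape(line[position:match.start()]))
        pieces.append(
            "<placeName ref=%s>%s</placeName>" % (quoteattr(gazetteer[name]), escape(name))
        )
        position = match.end()
    pieces.append(escape(line[position:]))
    return "".join(pieces)


tagged = tag_places("For Winchester was cleped þo", GAZETTEER)
```

Exact matching will miss unattested spellings and will wrongly tag common words that happen to coincide with a place name, so any automatic pass would still need human review before its results enter the database.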
Finally, I should note that neither the Neatline visualization, nor the D3 SPARQL map, nor the Icelandic Saga Map, reflect space and place in a way that accurately represents the sense of these places: the lat/long coordinates reduce an island, country, or continent to a single point rather than the full shape we might desire. Although our work is still very much at the theoretical development level, I have partnered with GIS and historical map librarians to produce a secondary database of shapefiles that will allow us to show, for example, the kingdom of France as a polygonal space which can change over time, and other, fuzzier shapefiles for georeligious entities such as Christendom and Heathenness, which might be able to overlap each other and be weighted as all, mostly, equally, or slightly marked as that space. Overlapping these fuzzy spaces (and that is a technical term), using models developed to study air and water pollution or animal migration, might allow us to ask the database for all romance mentions of places that are contested (that is, overlapping) between these georeligious or geopolitical entities. For the unmappable space, I have enlisted the help of my collaborator, Sean Winslow, to develop a best practice for how to show those on a map, or indicate their mentions without actually placing them on a map at all.
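The idea of weighted, overlapping spaces can be illustrated with a deliberately simple Python sketch. It does not reproduce the pollution or migration models mentioned above; it substitutes the standard fuzzy-set intersection, in which a place's degree of overlap between two entities is the smaller of its two membership weights. The place labels and weights are invented placeholders and make no historical claim.

```python
# Invented membership weights between 0 (not part of the entity) and 1 (fully part).
MEMBERSHIP = {
    "Christendom": {"PlaceA": 1.0, "PlaceB": 0.4},
    "Heathenness": {"PlaceB": 0.6, "PlaceC": 1.0},
}


def overlap(place, first, second, membership):
    """Fuzzy intersection: the minimum of the place's two membership weights."""
    return min(
        membership[first].get(place, 0.0),
        membership[second].get(place, 0.0),
    )


def contested(first, second, membership, threshold=0.0):
    """List places whose overlap between the two entities exceeds the threshold."""
    places = set(membership[first]) | set(membership[second])
    return sorted(
        p for p in places if overlap(p, first, second, membership) > threshold
    )


contested_places = contested("Christendom", "Heathenness", MEMBERSHIP)
```

Raising the threshold narrows the query from any overlap at all to places that are strongly claimed by both entities.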
On that last point, I can conclude with a brief summary of what’s most important in any digital humanities project involving mapping. When I first spoke to GIS specialists about how to deal with unmappable spaces like Heaven or Fairy land, one said, “what if you simply chose spaces on the map that wouldn’t be used otherwise, like something in North America?” Since we all know, pace Belinda Carlisle, that heaven is not a place on earth, it’s critical firstly to introduce computer scientists to the nature of our field and the ways in which we use the material. I would also note that any DH project should expect slow starts, with little immediate payoff in many of the ways recognized as scholarly output: developing a robust ontology is time-consuming. Using prebuilt database systems like Omeka, even if imperfect, allows you to spend more time on data collection and thinking about the material in ways helpful to the project, even if you move away from it.
John A. Geck, PhD, LMS
Director of Medieval Studies
Assistant Professor of English
Department of English
Memorial University of Newfoundland