Assessing the State of Digital Tools: An Examination of Juxta

In 2002, Ray Siemens and a group of Digital Humanities (DH) scholars began a project whose aim was to create an electronic edition of the Devonshire Manuscript that would provide an “opportunity to evaluate the applicability and reliability of digital visualization tools” (par. 3). Three programs in particular were used in an attempt to interpret the manuscript in ways that would not be possible with a traditional print edition. The first tool, Piespy, was originally created to monitor IRC channels and visualize relationships between users with a mathematical model (Piespy). The second tool, Simile Timeline, was designed to visualize dates, either as singular events or as events spanning a length of time, in an interactive timeline (Simile Timeline). The third tool, TextArc, was designed for analyzing financial news updates, although it has since been re-envisioned as a more broadly defined text analysis tool (TextArc). Clearly, these three programs did not begin as scholarly tools but were lifted and modified to fit the needs of Siemens and his team.

This demonstrates an important issue currently facing the humanities, namely, the creation and utilization of digital tools for scholarly research. It is notable that a project aimed at evaluating digital visualization tools examined only tools that were not created for the humanities. Siemens’s team, perhaps anticipating the rising need for DH scholars to create their own tools, actually modified Simile Timeline in order to better accommodate their intended application of the program (par. 21). The majority of humanities scholars, however, do not possess the skills to create tools such as Piespy, Simile Timeline, or TextArc, and do not appear inclined to learn how to do so. Ten years ago, Jerome McGann wrote that “the general field of humanities education and scholarship will not take the use of digital technology seriously until one demonstrates how its tools improve the ways we explore and explain aesthetic works – until, that is, they expand our interpretational procedures” (xii). Ten years later, where do humanities scholars stand?

For this project, I examined one of these programs in greater depth with the intent of questioning whether current digital tools offer potential for research that would not normally be possible through traditional research methods. The focus of my examination will be Juxta, a collation and comparison program designed at the University of Virginia and available free of charge on its official website. Juxta, alongside another digital tool, Ivanhoe, was part of an initiative headed by McGann himself to create tools for the Digital Humanities from the ground up. In 2003, McGann and his team passed Juxta and Ivanhoe on to NINES. While Juxta is thus not new, it is a unique program to analyze because of its heritage as a product of McGann and its current position within NINES as part of a larger suite of digital tools.

I have framed my examination of Juxta around John Unsworth’s concept of the scholarly primitive. Unsworth defines scholarly primitives, borrowing from the Stanford Encyclopedia of Philosophy, as a “‘finite list of self-understood terms’ from which, without recourse to further definitions or explanations, axiomatic logic may proceed” (par. 1). Unsworth’s use of primitives breaks the tools of producing knowledge down to their most basic components, which has enabled me to explore whether, as McGann says is key, new digital tools allow us to “expand our interpretational procedures” (xii). Unsworth uses primitives in a specific way which I intend to emulate: he consciously employs them figuratively, describing the interpretational processes they embody in a “self-consciously analogical way” (par. 1). I focus on four primitives that I feel are especially prominent in Juxta: discovery, comparison, annotation, and visualization.

I used two sets of sample texts to test the abilities of Juxta. The first was the 1890 and 1891 versions of Oscar Wilde’s The Picture of Dorian Gray. Wilde first published the story in thirteen chapters in Lippincott’s, a Philadelphia-based magazine that saw distribution in England (Ross). Wilde then expanded the story to twenty chapters and published it as a book the next year. With the lengthier book edition, Wilde fleshed out the characters and narrative while removing some of the homoerotic content from the 1890 version that had drawn criticism from certain members of the public. The relationship between the characters Basil and Dorian shifts from a perceived romance in the 1890 version to one of artistic appreciation in the 1891 version. I used Juxta to pinpoint these revisions within the text and to demonstrate one practical use for Juxta: finding thematic or narrative changes within revised texts.

The second set of texts was the 1798, 1800, and 1817 versions of Samuel Taylor Coleridge’s Rime of the Ancient Mariner. The intent here was to test how useful Juxta is when comparing subtle revisions to a text over a period of several years, in this case nineteen. I wished to chart the changes in dialect between the 1798 and 1817 texts for which the poem is well known. I was also interested in how Juxta would collate the gloss added to the 1817 version. It is important to note that the work I did with these examples was carried out without any intent of generating new knowledge about the texts themselves. I chose these texts because I knew that comparing them to earlier editions would yield results. My concern was rather to determine how successfully Juxta facilitated the process of comparing texts and making interpretation possible.

Before launching into the results of my examination, I wish to briefly outline the program’s interface. Juxta is divided into three panels. On the left is the Comparison Panel, which lists all texts that have been added to the comparison set. The user can select which texts from the comparison set are included in the collation; one text always serves as the base and all others as witnesses. On the bottom is the Secondary Panel, which serves a variety of purposes: it allows the user to examine the source text, view images or notes, and look through search results, amongst other things. To the right is the Document Panel, where the collation process unfolds. Texts can be viewed individually in collation mode or in side-by-side comparison, and each viewing mode carries with it different options.

Discovery

The first and perhaps most basic primitive which Juxta enables is discovery. The success of Juxta is that, in essence, all of the work is taken out of the discovery process. To begin, the user must select two or more texts in plaintext or .xml format, which then form a comparison set. Additional texts can later be added or removed so long as they are in the proper format. I created two comparison sets, which I titled Dorian Gray and Ancient Mariner. Discovery is as simple as uploading two texts and pressing the “collate documents” button. Any and all changes between the base text and the selected witness texts are immediately rendered visible. Blue highlights designate variations between the base and witness texts. Clicking on a highlighted portion opens a pop-up that shows how the text has been changed in other editions and lists precisely what category of change it belongs to, such as addition, deletion, or word change. By making the discovery of changes automatic, Juxta eliminates the most tedious component of the collation process, effectively giving scholars leave to dedicate all of their faculties to the decidedly more important task of interpreting the changes that have been identified.
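To make the underlying operation concrete, the following is a minimal sketch, in Python, of the kind of discovery step Juxta automates: classifying the differences between a base text and a witness text as additions, deletions, or word changes. This is an illustration only, not Juxta’s actual implementation, and the file names are placeholders for whatever plaintext versions are being compared.

# Minimal sketch (not Juxta's own code) of automated discovery of changes
# between a base text and a witness text. File names are placeholders.
import difflib

def discover_changes(base_path, witness_path):
    with open(base_path, encoding="utf-8") as f:
        base = f.read().split()
    with open(witness_path, encoding="utf-8") as f:
        witness = f.read().split()
    matcher = difflib.SequenceMatcher(None, base, witness)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # identical runs of words need no attention
        category = {"replace": "word change", "delete": "deletion", "insert": "addition"}[tag]
        changes.append((category, " ".join(base[i1:i2]), " ".join(witness[j1:j2])))
    return changes

# Example: list the first ten differences between the two Dorian Gray texts.
for category, old, new in discover_changes("dorian_gray_1890.txt", "dorian_gray_1891.txt")[:10]:
    print(category, "|", old, "->", new)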

Comparison

There are, as noted, two modes for viewing texts: collation mode, which shows a single text, and side-by-side mode. For the purpose of actively comparing two texts I preferred the side-by-side view. The process of comparison was extremely fruitful with the Dorian Gray comparison set. As the 1891 text had been expanded into a novel, it included large portions of blue-highlighted text where new chapters had been added. There were also select passages that had been expanded from the 1890 to the 1891 version. Finally, both texts were peppered with minor word changes. My interest was in the rarer moments where the highlighting in the 1890 version designated not simply word changes but larger, more substantial deletions. Scrolling through the text, I was able to find several places where these deletions had occurred, many of which were of interest.

The Ancient Mariner comparison set presented far more difficulty. The problems began even prior to the actual collation process. Unable to find a single source for all three versions, I used two different books from the Hathi Trust Digital Library, each of which presented the poem differently. Amongst the differences were the positioning of line numbers and the general layout of the stanzas. Neither source provided easy-to-download plaintext files; instead, I had to copy and paste the poems piecemeal, one page at a time. The quality of the OCR was a significant problem. The difficulties of the texts’ formatting were apparent as soon as the collation process began: Juxta was unable to register some of the characters, and as a result boxes appeared throughout the text where there should have been apostrophes or quotation marks.

Added to this were the issues created by the gloss to the 1817 version of the text. I had intentionally chosen Rime of the Ancient Mariner because I expected the gloss would cause some difficulty and I wished to see how Juxta would handle an unconventional text layout. Because of Juxta’s reliance on plaintext or .xml formatting, the gloss, which was originally set apart from the poem, was merged into the body of the text, where it appeared as additional stanzas. The amount of difference between the texts was therefore exaggerated. Whenever I found what appeared to be an interesting addition to the 1817 text, I needed to refer to a print copy of the 1817 edition to see whether the addition was in fact a new stanza or simply part of the gloss. To get around the gloss, formatting, and OCR problems, I had to comb through the plaintext of the 1798 and 1817 versions, deleting the gloss, line numbers, and any immediately apparent OCR errors. I then added these new documents to the Ancient Mariner comparison set as Ancient Mariner 1798 Formatted and Ancient Mariner 1817 Formatted, keeping the original, unformatted versions as a record of how much I had to alter the text. This created two further issues. First, it countered the very purpose of using Juxta, which is to speed up the process of comparing texts. Second, by editing the texts I introduced room for new errors that were not there originally, thereby defeating the entire purpose of the exercise.
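To give a sense of the clean-up this involved, here is a rough sketch of the sort of preparation script that could stand in for part of the hand editing described above. The file names and the specific substitutions are hypothetical assumptions for illustration, not anything supplied by Juxta, and removing the gloss itself still had to be done manually because nothing in the plaintext marks where it begins or ends.

# Hypothetical clean-up script: strip trailing line numbers and replace the
# characters that the OCR rendered as unreadable boxes or curly quotes.
# File names and substitutions are illustrative assumptions only.
import re

def clean_plaintext(in_path, out_path):
    cleaned = []
    with open(in_path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            # Remove trailing line numbers such as "5", "10", "15" left by the page layout.
            line = re.sub(r"\s*\d+\s*$", "", line)
            # Normalize characters the OCR could not render properly.
            line = (line.replace("\ufffd", "'")
                        .replace("\u2018", "'").replace("\u2019", "'")
                        .replace("\u201c", '"').replace("\u201d", '"'))
            cleaned.append(line)
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n".join(cleaned))

clean_plaintext("ancient_mariner_1817_raw.txt", "ancient_mariner_1817_formatted.txt")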

Upon editing the texts to a point where I felt I could adequately (albeit with potentially additional errors) compare them, I was able to find a number of interesting changes. Most notable, of course, were the stanzas from the 1798 edition that were removed in subsequent versions; Juxta made it very easy to see which stanzas had been cut. The comparison also demonstrated a consistent shift in dialect. There were, however, a few apparent changes that concerned me at first and required me to refer back to a print copy of the 1817 text. Even after looking through the poem carefully, I was still able to find OCR errors within the formatted text.

Ultimately the Dorian Gray comparison set was more successful for a number of reasons. First, because my focus was on larger, more substantial changes to the text, I did not have to be concerned about the possibility of OCR errors. Second, the conventional novel format of The Picture of Dorian Gray lent itself more readily to this type of examination than Rime of the Ancient Mariner, which is traditionally read in conjunction with a gloss. Added to this were the issues of having to resort to several different sources for the texts, each with different formatting. This demonstrates some of the structural limitations Juxta currently faces.

Annotating

The annotation capabilities of Juxta are also worth mentioning. Clicking on a portion of text prompts the user to add a note. Notes appear within the Secondary Panel, where they can be easily accessed at any time, and annotated changes appear within the text with an orange highlight. I found the note feature useful for distinguishing between different types of textual changes in Rime of the Ancient Mariner: I differentiated between word changes, spelling changes, and possible OCR errors. Unfortunately, notes are confined to the comparison set and cannot be shared in any meaningful way other than by providing someone with the comparison set file itself.

Visualizing

While Juxta is not a visually intensive tool, it is permeated by a number of visual cues that make information gathering more accessible. As I have already mentioned, Juxta highlights changes to the text. These highlights come in a variety of colours to designate different meanings, in the same way that Microsoft Word distinguishes between spelling, grammar, and word choice errors using red, green, and blue squiggly lines respectively. In Juxta, changes to the text are highlighted in blue, annotations in orange, and search results in yellow. The blue highlights appear in varying intensities to represent how many witness texts deviate from the base text. Furthermore, the act of viewing texts side by side is facilitated by the lines that link corresponding changes in the two texts; as I scroll through the texts, the lines stretch and contract, ensuring that the relative position of each change is always clear. The Comparison Panel also includes a useful heat map, which visualizes the degree of change between the base text and each witness text. Hovering the mouse over the heat map also gives a numerical value for this difference, but the heat map is intuitive enough on its own.

Juxta’s most substantial visual component is the histogram graph, which charts the exact points in each text where changes have occurred. This is especially useful in longer texts, where, by viewing the graph, one can see patterns of change and determine which parts of the text were revised the most.
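As an illustration of the idea behind such a graph, and emphatically not Juxta’s own code, the following sketch counts how many changes fall within each segment of a base text; the Dorian Gray file names and the choice of twenty segments are placeholder assumptions.

# Sketch of a change histogram: bin each detected change by its relative
# position in the base text and count the changes per segment.
import difflib

def change_histogram(base_lines, witness_lines, bins=20):
    counts = [0] * bins
    matcher = difflib.SequenceMatcher(None, base_lines, witness_lines)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            counts[min(i1 * bins // max(len(base_lines), 1), bins - 1)] += 1
    return counts

with open("dorian_gray_1890.txt", encoding="utf-8") as f:
    base_lines = f.read().splitlines()
with open("dorian_gray_1891.txt", encoding="utf-8") as f:
    witness_lines = f.read().splitlines()

# Print a simple per-segment summary of revision activity.
for segment, count in enumerate(change_histogram(base_lines, witness_lines)):
    print(f"segment {segment:2d}: {count} changes")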

Future Changes

There are some changes that could make Juxta and other collation tools more useful; they involve the input these tools receive and the output they generate. First, the success of tools like Juxta is bottlenecked by the quality of the texts that are put in. If Juxta were nested within a database or network of databases that took care of formatting, it would remove the concerns I encountered while trying to prepare Rime of the Ancient Mariner. As it is, Juxta is already housed within NINES, giving it a considerable advantage. Second, it would be convenient if it were easier to share output made with Juxta. Again, this could be accomplished if Juxta were intuitively embedded within a database system. Juxta already saves work within comparison set files, which combine texts along with any annotations the scholar has created. It would be satisfying to be able to “publish” annotated changes in a text. Such an option might entail creating a simplified, view-only interface that opens when a reader clicks on it. This sort of functionality could potentially be included with a text when it is searched for in a database, appearing next to the Twitter, Facebook, or other such buttons. The significance of networked digital tools is not lost on Unsworth. He suggests that:

With the possible exception of a class of activities we’ll call authoring, the most interesting things that you can do with standalone tools and standalone resources is, I would argue, less interesting and less important than the least interesting thing you can do with networked tools and networked resources. (par. 19)

So long as Juxta remains one of these standalone tools, its full potential will not be reached.

Conclusion

For the purpose of this project I have framed my examination of Juxta around four different scholarly primitives. This is not an exhaustive list: I could have discussed searching, sampling, or linking, each of which is present in Juxta to some degree. But the point of this examination is not to determine how many primitives Juxta utilizes; any tool, regardless of complexity or worth, is bound to reflect some form of primitive. Rather, the point is to determine whether a digital tool like Juxta can, to use McGann’s words once more, “expand our interpretational procedures” (xii). Put another way, do the scholarly primitives employed by Juxta alter the process of interpretation in any meaningful way?

While my intent is that the observations I have made will speak for themselves and allow readers to come to their own conclusions, I will venture to say that Juxta does, indeed, open up new avenues for interpretation. For any readers who wish to experiment with my text samples, I have also included the two comparison sets I used. As mentioned earlier, the output I generated with Juxta was not intended to accomplish anything new. Rather, I hope to have shown that Juxta frees the user to pursue thought and interpretation further than is possible manually by automating the most labour-intensive processes. Juxta allows the user to experiment with ideas more freely by removing the risk that traditionally accompanies the long process of collating a text which may or may not yield anything of use. The histogram graph further enables new kinds of experimentation by providing information and visualizing patterns that would not normally be available to scholars.

Programs such as Juxta, and the numerous other digital tools that have been developed since, offer an optimistic glimpse of the future of scholarly work. The potential for the next generation of digital tools, tools created by humanities scholars themselves, to generate academic discourse is clear. The issue at hand, then, is not the tools themselves but the apathy surrounding their use. For change to occur, scholars must make a more conscious effort to promote the use of digital tools in the discipline, and the earlier the better. Collation tools like Juxta provide the kind of interpretational data that could be of use to scholars at all levels, from the undergraduate to the postdoctoral.

Works Cited

Coleridge, Samuel Taylor, and Carleton Eldredge Noyes. The Rime of the Ancient Mariner. New York: Globe School Book Co., 1900. Hathi Trust Digital Library. Web. 12 Nov. 2011.

Huynh, David François. Timeline. Massachusetts Institute of Technology. 2000. Web. 26 Oct. 2011. http://www.simile-widgets.org/timeline/

Juxta: Collation Software for Scholars. NINES. 13 Sept. 2011. Web. 15 Oct. 2011. http://juxtasoftware.org/about.html

McGann, Jerome. Radiant Textuality: Literature After the World Wide Web. Basingstoke: Palgrave Macmillan, 2001. Print.

Mutton, Paul. “Piespy Social Network Bot.” Jibble. N.p., 2010. Web. 26 Oct. 2011. http://www.jibble.org/piespy

Paley, Bradford. TextArc. N.p., 2002. Web. 26 Oct. 2011. http://www.textarc.org/

Ross, Alex. “Deceptive Picture.” New Yorker 87.23 (2011): 64-70. Academic Search Premier. Web. 10 Nov. 2011.

Siemens, Ray. “Drawing Networks in the Devonshire Manuscript (BL Add 17492): Toward Visualizing a Writing Community’s Shared Apprenticeship, Social Valuation, and Self-Validation.” Digital Studies / Le Champ Numérique 1.1 (2009): n. pag. Web. 24 Oct. 2011.

Unsworth, John. “Scholarly Primitives: What Methods Do Humanities Researchers Have in Common and How Might Our Tools Reflect This?” Humanities Computing: Formal Methods, Experimental Practice. London: King’s College, 2000. Print.

Wilde, Oscar. The Picture of Dorian Gray. London: Lippincott’s, 1890. Project Gutenberg. Web. 10 Nov. 2011.

—. The Picture of Dorian Gray. London, 1891. Project Gutenberg. Web. 10 Nov. 2011.

 
