“It was you that broke the new wood” Ezra Pound famously said of Walt Whitman; perhaps it’s fitting then, that Whitman is so foundational to the space being carved out by the digital humanities. The Walt Whitman Archive, run by Kenneth Price and Ed Folsom, was one of the first of a relatively small handful of very ambitious digitization projects – the Archive was launched in 1995, just two years after the first online edition of the complete works of Shakespeare (http://shakespeare.mit.edu/). Looking at its successes and challenges, particularly in comparison to a later generation of virtual archives, I aim here to draw useful conclusions about the direction of such archives, perhaps even moving away from the metaphor of the archive altogether.

When analyzing a literary website, we naturally start by asking ourselves how visiting it compares to the experience of reading a codex critical edition, or of visiting the author’s real-world literary archive(s) in person. Helpfully, the Archive’s directors have done some public reflecting on how they would like their work to measure up against those experiences. In “Database as Genre,” Folsom discusses the advantages of an online archive: “Where before scholars had to travel to many individual archives to examine Whitman’s poetry manuscripts, they are now able to access all those manuscripts from a single integrated finding guide and to display the manuscripts from diverse archives side by side, thus discovering lost connections” (1575). In terms of its comprehensiveness, Folsom is right to tout the singular achievement of his archive: it includes all known manuscripts of Whitman’s poetry plus every major published edition, all his Civil War correspondence, every contemporary review of Whitman’s work, a long biography of Whitman written by Folsom and Price, every known photo of Whitman, a rare audio recording of Whitman’s voice, and numerous sundry items, like material by and about Whitman’s circle of “devotees,” or teaching guides and syllabi. For all this, the website is quite easily navigable, with one of the clearest and most self-explanatory main navigation bars I’ve ever seen (fig. 1).

FIGURE 1. Main Navigation Bar of The Walt Whitman Archive.

Folsom’s claim that this will facilitate the discovery of “lost connections” is persuasive. The creativity made possible by the comprehensiveness of the virtual archive is probably its greatest achievement; in a sense, though, this is not a departure from but rather a continuation of the thrust of 20th Century literary scholarship. The move to displace the voice of any one central authority, thereby opening up new possibilities for interpretation, has been central to this scholarship, from the intentional fallacy to the death of the author. Folsom uses Deleuze’s metaphor of the rhizome to describe the effect of visiting the Archive: in contrast to climbing a tree, where there are many branches but only one trunk, in the Archive all materials exist on one plane and spread out in a network, with no particular experience of Whitman taking precedence.

I was skeptical of this claim: it is notoriously difficult to displace the voice of the interpreter in any critical edition. As Price himself says, “neutrality is finally not a possibility” (“Electronic Scholarly Editions”), and the fact that Folsom and Price wrote the biography of Whitman provided on the site made me even more skeptical. After visiting the site, though, I think the claim that this experience allows more critical freedom for the individual interpreter is a defensible one. I quickly jumped from Folsom and Price’s biography (fig. 2);

FIGURE 2. Biography of Walt Whitman, with mother’s name highlighted.

to the Walt Whitman Encyclopedia entry for Whitman’s mother (fig. 3);

FIGURE 3: Entry on Louisa Whitman from The Walt Whitman Encyclopedia.

and then to all of Whitman’s Civil War correspondence to and from his mother, though I had to go through Biography >> Correspondence >> and then through an alphabetical list to find it (fig. 4).

FIGURE 4: Hyperlinked List of Louisa Whitman’s correspondence with Walt Whitman.

 I could easily problematize Folsom and Price’s claims about Whitman’s mother and his relationship to her through my own reading of these primary sources; doing the same thing with an edited codex would have required at least a second volume of correspondence, if not a trip to one of the Whitman archives scattered around the US. In this sense, the access to knowledge provided by this website is truly unprecedented.

       On other counts, the status of access to knowledge through the website is less clear. On the “Conditions of Use” page, it says the copyright to the Archive is held by Folsom and Price – it’s not immediately clear, though, what this means they own, given that the copyrights to many of the individual materials on the site are still held by the institutions that granted use of them to the Archive (more than 20 institutions are listed on the “Holding Institutions” page, with contact information for each). Also, Whitman’s writing should, in theory, have long ago entered the public domain – Whitman died in 1892. To someone unfamiliar with US Copyright law, it could seem that Folsom and Price are claiming they own Whitman’s works; therefore, their claims to ownership need to be spelled out more clearly. They have helpfully, however, provided a link to fair use provisions and made clear that these apply to all materials in the Archive. Even more helpfully, Folsom and Price have licensed all the TEI/XML files (the coded versions of the documents on the site, which cannot be immediately reused as websites but can be manipulated and put to other uses by developers) under Creative Commons and provided them for download in the “Resources” section (fig. 5).

FIGURE 5: TEI/XML downloads page, with Creative Commons Licensing.

This is a very generous move, and one Folsom and Price should be applauded for.

       There are shortcomings, though, with this site’s generosity towards its users. Though they are provided with all the information they might need to make different cases about Whitman’s life and work, users of the Archive are not given any platform on which to make that case. In “Remediating Whitman” Meredith McGill notes that “[d]espite the revolutionary capacities of the new technologies, pioneering digital projects such as  The Walt Whitman Archive hew surprisingly closely to normative ideas of the author and the work…keep[ing] such projects from functioning in the radical ways that Folsom describes” (1593). She cites several examples: the focus on a single author; the emphasis on Whitman’s poetry versus his prose, and on Leaves of Grass specifically; the centrality of the author’s biography; and the absence of any reader-generated data. This last point she makes quite briefly, but as someone raised in the Web 2.0 environment, it had the most impact on my experience of the site: I expect to be able to interact with other users, to be exposed to and benefit from their knowledge and opinions about the site and about Whitman. The Whitman Archive doesn’t have even a basic “guest book” feature where users can enter comments – the “Comments” link, found in tiny script in the footer section of the page, simply links to an auto-email feature that emails Folsom and Price directly (fig. 6).

FIGURE 6: Website Footer, with a link to “Comments?”

Price has talked about the need for a new metaphor for online “archives,” such as “arsenal,” understood primarily as a “public place for making,” which according to Price “suits current aspects of the genre … and will no doubt characterize it even more in this age of social networking” (“Edition, Project”). But the Whitman Archive isn’t really a public place for making, particularly when compared to NINES (www.nines.org), which allows users to discuss articles, create Exhibits of related works, tag articles and then search through these user-generated tags. The “public making” features of NINES are mostly neglected, though, and for lack of a critical mass of active users they generally haven’t been central to the experience of using the site. So while we can fault the shortcomings of The Whitman Archive from the perspective of creativity and rhizomatic experience (if departure from archives is what Price and Folsom were interested in, then The Whitman Archive represents a missed opportunity), we are still left with the problem of how to stop remediating the archive and start building participatory online “places for making.” Hopefully, with time, the digital humanities will move more in the direction of valuing, and providing access to, scholars’ shared knowledge about authors and their work.

Works Cited

Folsom, Ed. “Database as Genre: The Epic Transformation of Archives.” PMLA (October 2007) 122:1571-1579.

McGill, Meredith. “Remediating Whitman.” PMLA (October 2007) 122:1580-1612.

Price, Kenneth M. “Electronic Scholarly Editions.” A Companion to Digital Literary Studies, eds. Susan Schreibman and Ray Siemens. Oxford: Blackwell, 2008. http://www.digitalhumanities.org/companionDLS/

–. “Edition, Project, Database, Archive, Thematic Research Collection: What’s in a Name?” Digital Humanities Quarterly (Summer 2009) 3:3. http://digitalhumanities.org/dhq/vol/3/3/000053/000053.html

Crowdsourcing the Academy:
Digital Humanities and the Evolution of the University

In Writing Machines (2002), Katherine Hayles relates the reaction of her colleagues at a conference of the North American Society for the Study of Romanticism when she drew their attention to the fact that The William Blake Archive’s rhetoric is dominated by print, with tools specifically designed to get as close as possible to the original colour and size of the printed page. Her “obvious moral that the literary community could no longer afford to treat text on screen as if it were print read in a vertical position” did not resonate well with her audience (43). Five years later, in the inaugural issue of Digital Humanities Quarterly, Joseph Raben identifies print superiority as a mode of thinking that is deeply entrenched in all aspects of the humanities in academia. One of the central concerns of DHQ, he posits, is to challenge “the status of online publications as an inferior medium” and the fact that “electronic media is not as highly regarded by the gatekeepers of tenure and promotion as the traditional hard-bound book and the article offprint” (2). Finally, in a more recent article, “What Is Digital Humanities and What’s It Doing in English Departments?”, Matthew Kirschenbaum highlights the social aspect of Digital Humanities, its public visibility, community and significant impact on the world outside the academy, and its position as a “free-floating signifier” that “increasingly serves to focus the anxiety and even outrage of individual scholars over their own lack of agency amid the turmoil in their institutions and profession” (60). These three independent standpoints form a short yet telling narrative of the field. Most striking is the number of unique observations and questions digital humanities scholars bring to bear on the role of the humanities and the structure of the university as a whole.
Questioning everything from knowledge “gatekeepers” and access, to traditional ideas of scholarship and pedagogy, and the role of community and public good, digital scholarship raises provocative questions for humanities scholars and challenges not only the traditional way of doing things within the academy, but the structure of the university itself.

It would be difficult to deny that digital humanities has become a key player in the evolution of academia. No longer located at the periphery, digital scholarship has had and continues to have a profound impact on academic discourse, scholarship and pedagogy. In an ADE Bulletin from 1988, Cynthia Selfe posits that “experimentation at the most basic level” is the only way to characterize the English department’s affinity with computers: “As a profession, we are just learning how to live with computers, just beginning to integrate these machines effectively into writing- and reading-intensive courses, just starting to consider the implications of the multilayered literacy associated with computers” (qtd. in Kirschenbaum 55). The theme of experimentation is, of course, still relevant to digital humanities scholarship today, with new experimental digital tools and emerging methodologies such as data mining that radically extend research, especially research involving large groups of texts. Indeed, in trying to keep up with technological advances, digital scholarship is bound to retain the label of “emerging field” no matter the strength of its professional apparatus. But the words “emerging” and “experimentation” need not carry any negative connotations. While digital scholarship may not yet have resulted in extensive alteration of university structure, it has certainly set the stage for much fruitful experimentation, and encouraged and challenged scholars working to advance understanding of large-scale institutional change.

One such experimental strategy is crowdsourcing. Whether digital humanists recruit the public for help with transcribing large collections of manuscripts or crowdsource an entire book project through multiple social media channels, with no restrictions on submission format, massive collaboration is an approach that is charged with potential for rewarding alterations within academic discourse and scholarly practices. This paper will trace how digital humanists use crowdsourcing to challenge traditional views of knowledge production and consumption in academia. I will consider three case studies to explore the benefits and criticism digital humanists face in soliciting public engagement in major scholarly endeavours and extending scholarship and pedagogy beyond the immediate confines of their scholarly community: University College London’s Bentham Project and its decision to recruit the general public in an effort to accelerate the process of transcribing Jeremy Bentham’s papers; my own experience using 18thConnect.org’s TypeWright tool, with its pedagogical merits and public benefits; and Dan Cohen and Tom Scheinfeldt’s book project Hacking the Academy, a volume crowdsourced in one week through social media outlets that provocatively takes the structure of the academy as its subject and presents an innovative method for assembling and publishing collaborative scholarly work. In assessing each of these cases, I will argue that the theme of experimentation that still marks much of the work in digital humanities studies is also one of rejuvenation. An active participant in the evolution of academia, digital humanities has at its core a simple but pertinent question: how can technology and new media revitalize and revamp the traditional academic system?

The Oxford English Dictionary, arguably the first “massively-crowdsourced collation of English knowledge” (Lanxon; “Crowdsourcing”), does not contain an entry for the word “crowdsourcing.” And while Wikipedia explains that wiki is an “unusual medium” and warns that academics “may reject Wikipedia-sourced material completely” (“Wikipedia”), its collaborative environment seems a fitting place to produce a reliable entry for “crowdsourcing.” The entry suggests that Jeff Howe coined the word in a June 2006 Wired article, “The Rise of Crowdsourcing,” where he argued that outsourcing was being replaced by a new commercially advantageous system of “crowdsourcing”: “the new pool of cheap labor” (Howe). Given the recent coinage of the word, it is perhaps not surprising that the OED does not contain a definition. Then again, having become the definitive (and academic) standard of reference that it is, the OED may not have included an entry for “crowdsourcing” because the word is not “significant” or “important” enough, or is not “likely to stand the test of time” (“How”). Whatever the case may be, as another Wired article shrewdly observes, the OED and Wikipedia have remarkably similar origins (Lanxon). Professor James Murray, chief OED editor from 1879 until his death in 1915, spent several decades crowdsourcing word definitions. Collaborating in much the same way as today’s Wikipedia contributors, the general public would send in slips of paper with quotations that showed word usage, along with a suggested definition, date of first use, etymology, and other data (Lanxon). Unlike Wikipedia’s, however, the OED’s claim to authority is quite prominent, taking center stage as its subtitle: “The definitive record of the English language” (“Home”).
Whether or not Wikipedia will ever acquire within academic circles the same status as an encyclopedia as the OED holds as a dictionary, this comparison reveals that crowdsourcing has been around for a long time and that there is little difference between the creation of the most trusted and one of the most mistrusted sources for scholarly research. Moreover, with the aid of technological advantages, such as instantaneous editing and steady contributions and improvements (Lanxon), Wikipedia might take less time to reach academic acceptance.

At the heart of the problem of academic acceptance of crowdsourced projects such as Wikipedia, which allow participants from all walks of life, lies the threat to the academe’s own importance. These projects challenge the rank and prestige of the “Ivory Tower” that houses experts far away from amateurs. Consequently, decisions to crowdsource scholarly work are often imbued with fears and concerns combined with a genuine interest in experimentation or a desire to challenge the status quo of established scholarly practices.

Such is the case with the Transcribe Bentham project, “a participatory initiative,” as its subtitle puts it, that allows anyone who signs up for a MediaWiki account to transcribe manuscripts from a large collection of papers of the Enlightenment philosopher Jeremy Bentham. The reasons the editors decided in favour of crowdsourcing are quite simple and commonplace in today’s academia: a massive undertaking and a lack of funding (P. Cohen). The project is a daunting job. Since its inception more than fifty years ago, fewer than half of the papers have been transcribed (Wallace). Despite the fact that the bulk of this scholarly endeavour would eventually be completed by the general public, the editors make sure to highlight that the goal of the project is to produce “the authoritative edition of the Collected Works of Jeremy Bentham” (Causer, my emphasis), a subtitle prominently featured on the project’s blog page. Again, although its production of knowledge is in many ways very similar to Wikipedia’s (it even uses MediaWiki, a software platform that was originally designed for use on Wikipedia), the project exhibits labels such as “the definitive” or “the authoritative” in an effort to attract attention to the expertise of the scholars that oversee it. What is driving this fear that the work has less claim to authority when it involves participants outside the academy on the one hand, or that it is, in fact, recognized as “definitive” or “authoritative” despite its non-academic contributors on the other?

To be sure, there is the real fear that the public would produce poor-quality work. Daniel Stowell, who is in charge of the Papers of Abraham Lincoln in Springfield, Illinois, explains that the hiring of non-academic transcribers for their project resulted in extremely poor quality and multiple gaps in the work: “we were spending more time and money correcting them as creating them from scratch” (qtd. in P. Cohen). Of course, unlike the public working on the Bentham Project, the non-academics working on the Papers of Abraham Lincoln were paid workers. It is difficult to say whether or not this fact makes a difference in the final quality of the work. On the one hand, paid transcription entails more responsibility for the quality of the work produced; on the other, those who volunteer are generally genuinely interested in producing quality work. Whether or not remuneration makes a difference, Mr. Stowell is understandably sceptical towards new crowdsourcing projects like Transcribe Bentham: the Papers of Abraham Lincoln’s platform for crowdsourcing, which was designed by the National Center for Supercomputing Applications at the University of Illinois, Urbana-Champaign, was eventually abandoned (P. Cohen).

But aside from the alleged corruption of valuable scholarly materials, there seems to be a deeper concern on the side of the academe, whose response to crowdsourcing, among other digital scholarship, is still overly defensive, even when its members choose to collaborate with the public! Even as they smudge the border between those within and without the Ivory Tower, academics still feel the need to continually reassert their importance by fortifying the final product, and therefore the academic institution, with labels such as “definitive” and “authoritative,” labels that call attention to the power and authority that govern knowledge production. The editors of the Bentham Project examine and correct all submitted transcripts. So why is there such a pressing need to defend the legitimacy of digital scholarship that moves away from traditional scholarly practices by engaging the public? Why is the institution so reluctant to accept online scholarship unless it adheres as much as possible to the established (dare I say print) model of scholarship? After all, the “definitive” or the “authoritative” edition of an author’s oeuvre is one of the most highly regarded products in print publication. Finally, is the primary goal of the Bentham Project to “form the basis of future scholarship including printed editions”? Or is engaging the public a goal in itself? Both of these aims, along with “preserving national heritage” and “widen[ing] access,” are listed on the “About Us” page (Transcribe Bentham).

Karen Mason, one of the volunteer transcribers, explains that she considers her help a “service to the scholarly community” (qtd. in P. Cohen). Aside from widening access and preserving national heritage, it seems that a service of the scholarly community back to the community at large should be pedagogical in nature. Indeed, the opening paragraph of the “About Us” page clearly outlines the educational benefits of participation:

We would like to encourage all those who have an interest in Bentham or those with an interest in history, politics, law, philosophy and economics, fields to which Bentham made significant contributions, to visit the site. Those with an enthusiasm for palaeography, transcription and manuscript studies will be interested in Bentham’s handwriting, while those involved in digital humanities, education and heritage learning will find the site intriguing. Undergraduates and school pupils studying Bentham’s ideas are particularly encouraged to use the site to enhance their learning experience. (Transcribe Bentham)

Public engagement and education are definitely part of the Bentham Project’s agenda. “I’m no Bentham scholar,” Mason reflects on her experience, “but I am interested in history, so it’s interesting to look at the addenda and deletions in the manuscripts and generally follow the thought processes of a man living in 18th-century England” (qtd. in P. Cohen). Despite pedagogical merits, it would be hard to argue that the primary goal of the project is to extend scholarship beyond the academy. As the second paragraph of the “About Us” page conspicuously admits, “helping UCL’s Bentham Project in its task of producing a new authoritative edition of the Collected Works of Jeremy Bentham” is the end goal. Again, the word “authoritative” here signals that at its core the project is by scholars and for scholars.

Not all crowdsourcing projects, however, are influenced as much by the logic of print scholarship/academic superiority as the Bentham Project. Sharon Leon, a historian at George Mason University, is working on a free digital crowdsourcing plug-in that can be attached to any database, allowing the public to transcribe the materials (P. Cohen). Leon’s approach to crowdsourced transcription is much more daring and forward thinking, freed from the constraints of the academe’s self-importance. She does not fear the ostensible corruption of materials that comes with the public’s involvement: “We’re not looking for perfect . . . We’re looking for progressive improvement, which is a completely different goal from someone who is creating a letter-press edition” (qtd. in P. Cohen). The end goal of Leon’s digital tool is clearly not a perfectly transcribed scholarly archive; rather, it is active public participation and collaboration that extends beyond the academy. This is experimentation at its best. Only time will tell whether the tool will be a success, but, at least for now, this scholar has intentionally chosen to adopt the logic of Wikipedia—one of the most feared non-academic sources.

“Progressive improvement” is also the strategy of another transcription tool, TypeWright, which is currently being developed by 18thConnect.org. TypeWright is a crowdsourced correction tool that is designed to improve OCR texts. The aim of the tool is the progressive improvement of the text-versions of the documents that have been created from the page images hosted by commercial databases such as Gale’s Eighteenth Century Collections Online (ECCO). As the homepage of the tool explains, the corrected text-versions are “crucially necessary: they are what enables full-text searching, datamining, preserving, and curating texts of historical importance.” I had the privilege of being a beta tester for this tool and of writing a report outlining my experience and recommendations.

The call for beta testers was distributed through the usual scholarly communication channels, such as e-mail to departments and instructors with an interest in digital humanities, though it potentially extended beyond academia. The fact that Gale’s OCR versions of eighteenth-century documents are already hopelessly corrupted, and that TypeWright, at least at this stage, targets contributors that are more or less within the academy, means that the fears of further corruption with this crowdsourcing endeavour are somewhat limited. Nonetheless, the tool has a strong apparatus that allows anyone to review all changes to the original OCR text.

Setting aside the problem of poor-quality corrections that TypeWright will most likely encounter in the future, in my experience of testing the tool, I found it to be an excellent example of how digital scholarship can enhance pedagogy. Indeed, one of the first survey questions at the end of my testing asked me what I had learned during the exercise. And I was struck by how much one could learn, with a digital tool, about eighteenth-century print practices. In the brief time that I was testing the tool, I learned, for example, that printers often used extra spacing as part of the ornamentation of text. In the text I was working with, Love-letters on all Occasions Lately Passed between Persons of Distinction, collected by Mrs. Eliza Haywood, the publishing city, London, had an extra space between each letter. Similarly, the first word of the letter, “Madam,” had larger spacing between letters. I also learned that, aside from spacing, mechanically typed text is remarkably varied in its use of lower and upper case letters as well as bold and cursive typefaces. Whether used for emphasis, distinction, or aesthetic reasons, almost every page of my document contained words in bold typeface, italics, small capitals, or regular capitals. TypeWright can certainly be a great pedagogical asset, enhancing discussion and providing an exciting way to study eighteenth-century print culture at the high school or university level.

In addition to its pedagogical merits, TypeWright also has an agreement with Gale that allows it to contribute directly to the public good. Once a work is corrected, Gale releases the page images, as well as the newly corrected text-versions behind them, from its copyright, thus allowing free access to materials that are part of a subscription-only commercial database. Such an unusual arrangement demonstrates that digital humanities scholarship can not only enhance pedagogy and enrich academic discourse, but also provide new avenues for reforming the current institutional system, which allows large corporations such as Gale to set a price for, and thus control access to, knowledge.

The last project I will consider is Dan Cohen and Tom Scheinfeldt’s book Hacking the Academy, a volume crowdsourced in one week that focuses on the changing institutional system in the face of digital technologies and challenges the traditional model for academic publishing. Dan Cohen even crowdsourced the title of the volume in an earlier blog entry, asking commenters to come up with a title for his new book that would explore “the way in which common web tech/methods should influence academia, rather than academia thinking it can impose its methods and genres on the web” (“Crowdsourcing the Title”). As the volume editors explain in the preface, they used multiple online channels to distribute their call for papers, which asked “intentionally provocative questions” such as “Can an algorithm edit a journal? Can a library exist without books? Can students build and manage their own learning management platforms? Can a conference be held without a program? Can Twitter replace a scholarly society?” The goal was to provoke contributors to comment on the state of today’s academy, to question and “hack” every aspect of the institutional infrastructure (Davidson). “The book,” the call for papers continued, “will itself be an exercise in reimagining the edited volume. Any blog post, video response, or other media created for the volume and tweeted (or tagged) with the hashtag #hackacad will be aggregated at hackingtheacademy.org” (Davidson). More importantly, the editors wanted the volume not only to spread across multiple digital media formats and websites, but to become interactive, with continuous commentary, blog entries, tweets, and the ability of contributors to respond directly to each other instead of restricting themselves to “inert, isolated chapters that normally populate edited volumes” (D. Cohen et al.).

The call for papers was a tremendous success. During only seven days, 329 submissions from 177 authors were made, incorporating various genres, formats and multimedia. Evidently, the experiment was a remarkable achievement that ostensibly proved that the web can and does influence academia and that academia can no longer “impose its methods and genres” on digital humanities scholarship. Reflecting back on the project in a blog entry, Dan Cohen observes that letting the submissions dictate the form of the volume was an exciting and liberating experience: “I think one of the real breakthroughs that Tom and I had in this process is realizing that we didn’t need to adhere to a standard edited-volume format of same-size chapters” (“Some Thoughts”). Furthermore, the editors discovered that this model for academic volume publishing could be easily repeated and is in many respects superior to the traditional model:

Sure, it’s not ideal for some academic cases, and speed is not necessarily of the essence. But for “state of the field” volumes, vibrant debates about new ideas, and books that would benefit from blended genres, it seems like an improvement upon the staid “you have two years to get me 8,000 words for a chapter” model of the edited book. (“Some Thoughts”)

There was, however, a crucial problem with Hacking the Academy: the unedited versus the edited volume and the requirements for the preparation of a print edition, forthcoming under the University of Michigan Press digitalculturebooks imprint.

While the experiment does indeed demonstrate how digital technologies can usefully transform the production and consumption of knowledge, Dan Cohen might have overstated his case about the academy no longer having a strong grip on the methods and genres of digital scholarship. Derek Bruff, one of the commentators to Dan Cohen’s “Some Thoughts on the Hacking the Academy Process and Model” blog entry, astutely traces the differences between the unedited volume and the final online product:

. . . I’ll admit I was frustrated to see that, as innovative as this project was, you didn’t find a way to include audio and video contributions in the edited volume, even though such contributions were requested. I know that audio and video pieces are in the “unedited” volume, but I would have liked to have seen a couple of digital humanists come up with a creative way to include them in the edited volume, as well.  Also, it seems that most of the pieces in the edited volume are from people in the humanities. I understand that given the nature of the call for submissions (via Twitter accounts and blogs run by humanists), this is to be expected. But there was nothing in the project description (at least, that I noticed) that limited contributions to those in coming from the humanities. You have some important voices from outside the humanities (Wesch, Jarvis, Junco, and others), but it’s still a very humanities-centered volume. . . . I hope that what you’ve done will inspire others to take on similarly inventive projects, perhaps ones that can respond to the criticisms I’ve made here.

What Katherine Hayles described as print-centered rhetoric (42) is clearly working its way into the online edited volume, anticipating the final—and even more highly prized—product: the print edition. As Bruff notes, the edited volume cuts out all of the multimedia and most of the “unusual genre” pieces. Moreover, it makes sure that accepted and respected digital humanists such as John Unsworth, Matthew Kirschenbaum, and Kathleen Fitzpatrick (with five entries!) are featured on its pages. Despite its potential, the unedited volume will, more likely than not, soon be abandoned. What will remain is the “authoritative” print edition with more or less “inert” entries and pieces of interaction, tasked with the job to perpetually exude the sense of professional self-importance.

Experimentation is indeed the perfect way to describe what marks much digital humanities scholarship today. Whether trying to accelerate painstaking scholarly endeavors through crowdsourcing or reimagining the edited volume for the digital age, scholars are attempting to move away from digital scholarship modeled on established print practices and towards new models that have the potential to transform the very structure of institutional knowledge production. Some experiments might not be as successful as others, and, as I have argued in this paper, print-centered logic and the fear of losing “Ivory Tower” status still undermine a great many digital endeavors. The next important step is to continue experimenting. As Derek Bruff notes in his comment, the hope is that each project will inspire others to build on, improve, and actively participate in the evolution of academia. As with Wikipedia or the OED, “progressive improvement” is a slow but steady path to a better institution.

Works Cited

Causer, Tim. The Bentham Project Blog. University College London, 21 Nov. 2011. Web. 25 Nov. 2011. <http://blogs.ucl.ac.uk/bentham-project/>.

Cohen, Dan. “Some Thoughts on the Hacking the Academy Process and Model.” Dan Cohen Blog. 8 Sept. 2011. Web. 2 Dec. 2011.

___. “Crowdsourcing the Title of My Next Book.” Dan Cohen Blog. 4 Aug. 2010. Web. 2 Dec. 2011.

Cohen, Dan, and Tom Scheinfeldt. Preface. Hacking the Academy: A Book Crowdsourced in One Week. Eds. Dan Cohen et al. University of Michigan Press, 8 Sept. 2011. Web. 1 Dec. 2011. <http://www.digitalculture.org/hacking-the-academy/introductions/>.

Cohen, Patricia. “Scholars Recruit Public for Project.”  New York Times. New York Times, 27 Dec. 2010. Web. 2 Dec. 2011.

“Crowdsourcing.” Wikipedia: The Free Encyclopedia. Wikimedia Foundation, Inc. 10 Dec. 2011. Web. 10 Dec. 2011.

Davidson, Cathy. “Hacking the Academy!” Cathy Davidson Blog. HASTAC, 24 May. 2010. Web. 2 Dec. 2011.

Hayles, N. Katherine. Writing Machines. Cambridge, Mass.: MIT Press, 2002. Print.

“Home: Oxford English Dictionary.” OED Online. Sept. 2011. Oxford UP. 8 Dec. 2011.

“How do you decide whether a new word should be included in an Oxford dictionary?” Oxford Dictionaries: Frequently Asked Questions. Oxford University Press. Web. 10 Dec. 2011. <http://oxforddictionaries.com/words/how-do-you-decide-whether-a-new-word-should-be-included-in-an-oxford-dictionary>.

Howe, Jeff. “The Rise of Crowdsourcing.” Wired. Jun. 2006. Web. 2 Dec. 2011. <http://www.wired.com/wired/archive/14.06/crowds.html>.

Lanxon, Nate. “How the Oxford English Dictionary started out like Wikipedia.” Wired. 13 Jan. 2011. Web. 2 Dec. 2011. <http://www.wired.co.uk/news/archive/2011-01/13/the-oxford-english-wiktionary>.

McGann, Jerome J. Radiant Textuality: Literature After the World Wide Web. New York: Palgrave, 2001. Print.

Transcribe Bentham: A Participatory Initiative. University College London, 6 Dec 2011. Web. 10 Dec. 2011. <http://www.transcribe-bentham.da.ulcc.ac.uk/td/Transcribe_Bentham>.

Wallace H. Valerie. “About the Bentham Project.” UCL Bentham Project, 6 Dec. 2010. Web. 10 Dec. 2011. <http://www.ucl.ac.uk/Bentham-Project/about>.

“Wikipedia: Citing Wikipedia.” Wikipedia: The Free Encyclopedia. Wikimedia Foundation, Inc. 20 Sept. 2011. Web. 10 Dec. 2011.

Below is the link to the first phase of an online archive/database/thematic research collection that I put together this term. I used Omeka.net, an online exhibit builder, to display a series of documents that the BC Archives has generously allowed me to display online on an ongoing basis. Please feel free to explore the site and give me feedback!

“This exhibit, the first phase of ‘the story is more than itself,’ consists of correspondence, literary works, illustrations, newspaper articles, and magazine reviews from between 1939 and 1941. Taken together, these materials provide a brief glimpse into the work and life-stories of three individuals—Alice Ravenhill, Anthony Walsh, and Noel Stewart—who coordinated their efforts to promote the artistic abilities of Indigenous children enrolled in Indian Residential and Day Schools in British Columbia.”

http://thestoryismorethanitself.omeka.net/exhibits/show/artsbasededucation

Assessing the State of Digital Tools: An Examination of Juxta

In 2002, Ray Siemens and a group of Digital Humanities (DH) scholars began a project whose aim was to create an electronic edition of the Devonshire Manuscript that would provide an “opportunity to evaluate the applicability and reliability of digital visualization tools” (par. 3). Three programs in particular were used in an attempt to interpret the manuscript in a way that would not be possible with a traditional print edition. The first tool, Piespy, was originally created to monitor IRC channels and visualize relationships between users using a mathematical model (Piespy). The second tool, Simile Timeline, was designed to visualize dates—as either singular events or as events spanning a length of time—in an interactive time line (Simile Timeline). The third tool, TextArc, was designed for use in analyzing financial news updates although it has, since then, become re-envisioned as a more broadly defined text analysis tool (Textarc). As is clear, these three programs did not begin as scholarly tools but were lifted and modified to fit the needs of Siemens and his team.

This demonstrates an important issue currently facing the humanities, namely, the creation and utilization of digital tools for scholarly research. It is of note that a project aimed at evaluating digital visualization tools examined exclusively tools that were not created for the humanities. Siemens’s team, perhaps anticipating the rising need for DH scholars to create their own tools, actually modified Simile Timeline in order to better accommodate their intended application of the program (par. 21). The majority of humanities scholars, however, do not possess the skills to create tools such as Piespy, Simile Timeline, or TextArc and do not appear inclined to learn how to do so. Ten years ago, Jerome McGann wrote that “the general field of humanities education and scholarship will not take the use of digital technology seriously until one demonstrates how its tools improve the ways we explore and explain aesthetic works – until, that is, they expand our interpretational procedures” (xii). Ten years later, where do humanities scholars stand?

For this project, I examined one such tool in greater depth, with the intent of asking whether current digital tools offer potential for research that would not normally be possible by traditional methods. The focus of my examination is Juxta, a collation and comparison program designed at the University of Virginia and available free of charge on its official website. Juxta, alongside another digital tool, Ivanhoe, was part of an initiative headed by McGann himself to create tools for the Digital Humanities from the ground up. In 2003, McGann and his team passed Juxta and Ivanhoe on to NINES. While Juxta is thus not new, it is a unique program to analyze because of its heritage as a product of McGann’s initiative and its current position within NINES as part of a larger suite of digital tools.

I have framed my examination of Juxta around John Unsworth’s concept of the scholarly primitive. Unsworth defines the scholarly primitive, drawing on the Stanford Encyclopedia of Philosophy, as a “‘finite list of self-understood terms’ from which, without recourse to further definitions or explanations, axiomatic logic may proceed” (par. 1). Unsworth’s use of primitives breaks the tools of knowledge production down to their most basic components, which has enabled me to explore whether, as McGann says is key, new digital tools allow us to “expand our interpretational procedures” (xii). Unsworth uses primitives in a specific way that I intend to emulate: he consciously employs them figuratively, describing the interpretational processes they embody in a “self-consciously analogical way” (par. 1). I focus on four primitives that I feel are especially prominent in Juxta: discovery, comparison, annotation, and visualization.

I used two sets of sample texts in order to test the abilities of Juxta. The first was the 1890 and 1891 versions of Oscar Wilde’s The Picture of Dorian Gray. Wilde first published the story in a thirteen-chapter form in Lippincott’s, a Philadelphia-based magazine that was also distributed in England (Ross). Wilde then expanded the story to twenty chapters and published it as a book the next year. With the lengthier book edition, Wilde fleshed out the characters and narrative while removing some of the homoerotic content from the 1890 version that had drawn criticism from certain members of the public. The relationship between the characters Basil and Dorian shifts from that of a perceived romance in the 1890 version to one of artistic appreciation in the 1891 version. I used Juxta to pinpoint these revisions within the text and demonstrate one practical use for Juxta: finding thematic or narrative changes within revised texts. The second set of texts I used was the 1798, 1800, and 1817 versions of Samuel Taylor Coleridge’s Rime of the Ancient Mariner. The intent here was to test how useful Juxta is when comparing subtle revisions to a text over a period of several years, in this case, nineteen years. I wished to chart the changes in dialect between the 1798 and 1817 texts for which the poem has gained notoriety. I was also interested in how Juxta would collate the gloss to the 1817 version. It is important to note that the work I did with these examples was carried out without any intent of generating new knowledge about the texts themselves. I chose these texts because I knew that comparing their editions would yield results. My concern was chiefly to determine how successfully Juxta facilitated the process of comparing texts and making interpretation possible.

Before launching into the results of my examination, I wish to briefly outline the program’s interface. Juxta is divided into three panels. On the left is the Comparison Panel, which lists all texts that have been added to the comparison set. The user can select which texts from the comparison set are included in the collation. One text is always the base and all others are witnesses. On the bottom is the Secondary Panel, which serves a variety of purposes: it allows the user to examine the source text, view images or notes, and look through search results, amongst other things. To the right is the Document Panel. It is here that the collation process unfolds. Texts can be viewed individually in collation mode or in side-by-side comparison, and each viewing mode carries with it different options.

Discovery

The first and perhaps most basic primitive that Juxta enables is discovery. The success of Juxta is that, in essence, all work is taken out of the discovery process. To begin, the user must select two or more texts in plaintext or .xml format that will then form a comparison set. Additional texts can later be added or removed so long as they are in the proper format. I created two comparison sets, which I titled Dorian Gray and Ancient Mariner. Discovery is as simple as uploading two texts and pressing the “collate documents” button. Any and all differences between the base text and the selected witness texts are immediately rendered visible. Blue highlights designate variations between the base and witness texts. Clicking on a highlighted portion opens a pop-up that shows what the text has been changed to in other editions and lists precisely what category of change it belongs to, such as addition, deletion, or word change. By rendering the process of discovering changes in a text automatic, Juxta has eliminated the most tedious component of the collation process, effectively giving the scholar leave to dedicate all of his or her faculties to the decidedly more important task of interpreting those changes that have been identified.
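The kind of automatic discovery described above can be approximated in a few lines of code. The following is only a minimal sketch, assuming a simple word-by-word comparison built on Python’s standard difflib module; it is not Juxta’s own collation algorithm, and the function name collate and the sample sentences are invented for illustration.

```python
from difflib import SequenceMatcher


def collate(base: str, witness: str):
    """Return (category, base_words, witness_words) for each difference."""
    base_words = base.split()
    witness_words = witness.split()
    matcher = SequenceMatcher(a=base_words, b=witness_words, autojunk=False)
    # Map difflib's opcode names onto the categories named in the essay.
    labels = {"insert": "addition", "delete": "deletion", "replace": "change"}
    variants = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # identical runs of words are not variants
        variants.append((labels[tag], base_words[i1:i2], witness_words[j1:j2]))
    return variants


# Invented sample sentences, not quotations from either edition.
base = "the painter looked at the portrait with quiet pride"
witness = "the painter looked long at the picture with pride"

for category, old, new in collate(base, witness):
    print(category, old, new)
```

Each reported variant carries its category together with the affected words on both sides, which is roughly the information the pop-up presents for a highlighted passage.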

Comparison

There are, as noted, two modes for viewing texts: collation mode, which shows a single text, and side-by-side mode. For the purpose of actively comparing two texts I preferred the side-by-side view. The process of comparison was extremely fruitful with the Dorian Gray comparison set. As the 1891 text had been expanded into a novel, it included large portions of blue highlighted text where new chapters had been added. There were also select passages that had been expanded from the 1890 to the 1891 version. Finally, both texts were peppered with minor word changes. My interest was in the rarer moments where the highlighting in the 1890 version designated not simply word changes but larger, more substantial deletions. Scrolling through the text, I was able to find several moments where these deletions had occurred, many of which were of interest.

The Ancient Mariner comparison set yielded far more difficulty. The problems began even prior to the actual collation process. Unable to find a single source for all three versions, I used two different books from the Hathi Trust Digital Library, each of which presented the poem differently. Amongst the differences were the positioning of line numbers and the general layout of the stanzas. None of the sources provided easy to download plaintext files. Instead, I had to copy and paste the poems piece-meal one page at a time. The quality of the OCR was a significant problem. The difficulties of the three texts’ formatting were present immediately upon beginning the collation process. When I finally ran the texts through Juxta, the program was unable to register some of the characters. As a result, boxes appear throughout the text where there should instead be apostrophes or quotations. Added to this were the issues that the gloss for the 1817 version of the text created. I had intentionally chosen Rime of the Ancient Mariner because I expected the gloss would cause some difficulty and I wished to see how Juxta would handle unconventional text layout. Because of Juxta’s reliance on plaintext or .xml formatting, the gloss which was originally separate was combined into the body of the poem appearing as separate stanzas. The amount of difference between the texts was exaggerated due to the addition of the gloss. Whenever I found what appeared to be an interesting addition to the 1817 text I would need to refer to a print copy of the 1817 edition to see whether the addition was in fact a new stanza or simply part of the gloss. In order to get around the gloss, formatting, and OCR problems I had to comb through the plaintext of the 1798 and 1817 versions deleting the gloss, line numbers, and any immediately apparent OCR errors. I then added these new documents to the Ancient Mariner comparison set as Ancient Mariner 1798 Formatted and Ancient Mariner 1817 Formatted. 
I kept the original, unformatted versions as a reference for how much I had to alter the text. This workaround created a number of issues. First, it countered the very purpose of using Juxta, which is to speed up the process of comparing texts. Second, by editing the text, I introduced room for errors that were not there originally, thereby undermining the purpose of the exercise.
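Part of this manual clean-up could, in principle, be scripted. The following is a rough sketch under stated assumptions: that line numbers appear as bare digits at the start or end of a verse line, and that the characters rendered as boxes were typographic quotation marks and apostrophes. The function name clean_transcription and the sample lines are hypothetical, and nothing here removes the gloss or corrects OCR misreadings, both of which would still require checking against a print copy.

```python
import re

# Typographic quotation marks mapped to plain ASCII equivalents.
ASCII_QUOTES = {"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'}


def clean_transcription(text: str) -> str:
    """Normalize quotation marks and strip line numbers from each line."""
    cleaned = []
    for line in text.translate(str.maketrans(ASCII_QUOTES)).splitlines():
        # Drop a number printed at the very start or very end of a line.
        # Caveat: this also removes a genuine numeral in that position.
        line = re.sub(r"^\s*\d+\s+|\s+\d+\s*$", "", line)
        cleaned.append(line.strip())
    return "\n".join(cleaned)


# Invented sample lines, not text from any edition of the poem.
sample = "5  \u201cThe wind fell silent,\u201d\nthe sailor\u2019s tale began  10"
print(clean_transcription(sample))
```

Running the same script over every version would at least make the alterations consistent and repeatable, although it introduces its own risk of silently deleting text that only looks like a line number.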

Once I had edited the texts to a point where I felt I could adequately compare them (albeit with potentially additional errors), I was able to find a few interesting changes. Most notable, of course, were the stanzas deleted from the 1798 edition. Juxta made it very easy to see which stanzas had been removed from the text in subsequent editions. The comparison also demonstrated a consistent shift in dialect. There were, however, a few changes in the text that gave me pause at first and that required me to refer back to a print copy of the 1817 text. Even after looking through the poem carefully, I was still able to find OCR errors within the formatted text.

Ultimately, the Dorian Gray comparison set was more successful for a number of reasons. First, because my focus was on larger, more substantial changes to the text, I did not have to be concerned about the possibility of OCR errors. Second, the conventional novel format of The Picture of Dorian Gray lent itself more readily to this type of examination than Rime of the Ancient Mariner, which is traditionally read in conjunction with a gloss. Added to this were the issues of having to resort to several different sources for the texts, each with different formatting. This demonstrates some of the structural limitations Juxta currently faces.

Annotating

It is worth mentioning the annotation capabilities of Juxta. Clicking on a portion of text gives the user a prompt to add a note. Notes appear within the Secondary panel where they can then be easily accessed at any time. Changes that have been annotated appear within the text with orange highlight. I found the noting feature useful for delineating between different types of textual changes in Rime of the Ancient Mariner. I differentiated between word changes, spelling changes, and possible OCR errors. Unfortunately, notes are confined within the comparison set and cannot be shared in any meaningful way other than by providing someone with your same comparison set file.

Visualizing

While Juxta is not a visually intensive tool, it is permeated by a number of visual cues that make information gathering more accessible. As I have already mentioned, Juxta highlights changes to the text. These highlights come in a variety of colours to designate different meanings, in the same way that Microsoft Word distinguishes between spelling, grammar, and word choice errors using red, green, and blue squiggly lines respectively. In Juxta, changes to the text are highlighted in blue, annotations in orange, and search results in yellow. The blue highlights appear in varying shades to represent how many witness texts deviate from the base text. Furthermore, the act of viewing texts side-by-side is facilitated by the lines that link changes between texts together. As I scroll through the texts, the lines stretch and contract, ensuring that the relative position of each change is always clear. The Comparison Panel also includes a useful heat map that visualizes the degree of change between the base text and each witness text. Hovering the mouse over the heat map also gives a numerical value for this difference, but the heat map is intuitive enough on its own.

Juxta’s most substantial visual component is the histogram graph which charts the exact points in each text where changes have occurred. This is especially useful in longer texts where, by viewing the graph, one can see patterns of change and determine what parts of the text were revised the most.
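The idea behind such a graph can be illustrated with a short sketch. It assumes, as a simplification, that the base text is divided into equal slices and that changed words are counted per slice; this is not how Juxta itself computes its histogram, and the function name change_histogram and the sample strings are invented for illustration.

```python
from difflib import SequenceMatcher


def change_histogram(base: str, witness: str, bins: int = 10):
    """Count changed base-text words in each of `bins` equal slices."""
    base_words = base.split()
    witness_words = witness.split()
    counts = [0] * bins
    if not base_words:
        return counts
    matcher = SequenceMatcher(a=base_words, b=witness_words, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # An insertion covers no base words, so count it once where it falls.
        for i in range(i1, max(i2, i1 + 1)):
            position = min(i, len(base_words) - 1)
            counts[position * bins // len(base_words)] += 1
    return counts


# Invented sample strings standing in for two versions of a text.
counts = change_histogram("a b c d e f g h", "a b c d e f X h", bins=4)
for index, count in enumerate(counts):
    print(f"slice {index}: {'#' * count}")
```

With longer texts and more slices, the tallest bars would point to the stretches of the base text that were revised most heavily.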

Future Changes

There are some changes that could take place to make Juxta and other collation tools more useful. They involve the input and output generated from their use. First, the success of tools like Juxta is bottle-necked by the quality of the texts that are put in. If Juxta was nested within a database or network of databases that could then take out the trouble of worrying about formatting it would remove the concerns I encountered trying to prepare Rime of the Ancient Mariner. As it is, Juxta is already housed within NINES giving it an unprecedented advantage. Second, it would be convenient if it were easier to share output made with Juxta. Again, this could be accomplished if Juxta was intuitively embedded within a database system. Juxta already saves work within comparison set files which combine texts along with any annotations the scholar has created. It would be satisfying to be able to “publish” annotated changes in a text. Such an option might entail creating a simplified, view-only interface that will open up when clicked on by a reader. This sort of functionality could potentially be included with a text when it is searched for in a database, appearing next to the Twitter, Facebook, or other such buttons. The significance of networked digital tools is not lost on Unsworth. He suggests that:

With the possible exception of a class of activities we’ll call authoring, the most interesting things that you can do with standalone tools and standalone resources is, I would argue, less interesting and less important than the least interesting thing you can do with networked tools and networked resources. (par. 19)

So long as Juxta remains one of these standalone tools, its full potential will not be reached.

Conclusion

For the purpose of this project I have framed my examination of Juxta around four different scholarly primitives. This is not an exhaustive list. I could have discussed searching, sampling, or linking, each of which is present in Juxta to some degree. But the point of this examination is not to determine how many primitives Juxta utilizes—any tool, regardless of complexity or worth is bound to reflect some form of primitive. Rather, the point is to prove how a digital tool like Juxta can, using McGann’s words once more, “expand our interpretational procedures” (xii). Said in other terms, do the scholarly primitives employed by Juxta alter the process of interpretation in any meaningful way?

While my intent is that the observations I have made will speak for themselves and allow the reader to come to his or her own conclusion, I will venture to say that Juxta does, indeed, open up new avenues for interpretation. In case any readers wish to experiment with my text samples, I have also included the two comparison sets I used. As mentioned earlier, the output I generated with Juxta was not intended to accomplish anything new. Rather, I hope to have shown that Juxta frees the user to pursue thought and interpretation further than is possible manually by rendering all labour-intensive processes automatic. Juxta allows the user to experiment with ideas more freely by removing the risk that would traditionally accompany the long process of collating a text, which may or may not yield anything of use. The histogram graph further enables new kinds of experimentation by providing information and visualizing patterns that would not normally be available to scholars.

Programs such as Juxta, and the numerous other digital tools that have been developed since, offer an optimistic glimpse of the future of scholarly work. The potential for academic discourse to be generated by the next generation of digital tools (tools created by humanities scholars themselves) is clear. The issue at hand, then, is not the tools themselves but the apathy surrounding their use. For change to occur, scholars must make a more conscious effort to promote the use of digital tools in the discipline, and the earlier the better. Collation tools like Juxta provide the kind of interpretational data that could be of use to scholars at all levels, from the undergraduate to the postdoctoral.

Works Cited

Coleridge, Samuel Taylor, and Carleton Eldredge Noyes. The Rime of the Ancient Mariner. New York: Globe School Book Co, 1900. Hathi Trust Digital Library. Web. 12 Nov.

Huynh, David François. Timeline. Massachusetts Institute of Technology. 2000. Web. 26 Oct. 2011. http://www.simile-widgets.org/timeline/

Juxta: Collation Software for Scholars. NINES. 13 Sept. 2011. Web. 15 Oct. 2011. http://juxtasoftware.org/about.html

McGann, Jerome. Radiant Textuality: Literature After the World Wide Web. Basingstoke: Palgrave Macmillan, 2011. Print.

Mutton, Paul. “Piespy Social Network Bot,” Jibble. N.p. 2010. Web. 26 Oct 2011. http://www.jibble.org/piespy

Paley, Bradford. TextArc. N. P. 2002. Web. 26 Oct. 2011. http://www.textarc.org/

Ross, Alex. “Deceptive Picture.” New Yorker 87.23 (2011): 64-70. Academic Search Premier. Web. 10 Nov. 2011.

Siemens, Ray. “Drawing Networks in the Devonshire Manuscript (BL Add 17492): Toward Visualizing a Writing Community’s Shared Apprenticeship, Social Valuation, and Self-Validation.” Digital Studies / Le Champ Numerique 1.1 (2009): n. pag. Web. 24 Oct. 2011.

Unsworth, John. “Scholarly Primitives: What Methods Do Humanities Researchers Have in Common and How Might our Tools Reflect This?” Humanities Computing: Formal Methods, Experimental Practice. King’s College: London, 2000. Print.

Wilde, Oscar. The Picture of Dorian Gray. London: Lippincotts, 1890. Project Gutenberg. Web. 10 Nov. 2011.

—. The Picture of Dorian Gray. London, 1891. Project Gutenberg. Web. 10 Nov. 2011.


Reilly’s website is at: user-1.comp1855.ca

Ben’s website is at: thestoryismorethanitself.omeka.net