Crowdsourcing the Academy:
Digital Humanities and the Evolution of the University

In Writing Machines (2002), Katherine Hayles relates the reaction of her colleagues at a conference of the North American Society for the Study of Romanticism when she drew their attention to the fact that The William Blake Archive’s rhetoric is dominated by print, with tools specifically designed to get as close as possible to the original colour and size of the printed page. Her “obvious moral that the literary community could no longer afford to treat text on screen as if it were print read in a vertical position” did not resonate well with her audience (43). Five years later, in the inaugural issue of Digital Humanities Quarterly, Joseph Raben identifies print superiority as a mode of thinking that is deeply entrenched in all aspects of the humanities in academia. One of the central concerns of DHQ, he posits, is to challenge “the status of online publications as an inferior medium” and the fact that “electronic media is not as highly regarded by the gatekeepers of tenure and promotion as the traditional hard-bound book and the article offprint” (2). Finally, in a more recent article, “What Is Digital Humanities and What’s It Doing in English Departments?” Matthew Kirschenbaum highlights the social aspect of Digital Humanities, its public visibility, community and significant impact on the world outside the academy, and its position as a “free-floating signifier” that “increasingly serves to focus the anxiety and even outrage of individual scholars over their own lack of agency amid the turmoil in their institutions and profession” (60). These three independent standpoints form a short, yet telling narrative of the field. Most striking is the number of unique observations and questions digital humanities scholars bring to bear on the role of the humanities and the structure of the university as a whole.
Questioning everything from knowledge “gatekeepers” and access, to traditional ideas of scholarship and pedagogy, and the role of community and public good, digital scholarship raises provocative questions for humanities scholars and challenges not only the traditional way of doing things within the academy, but the structure of the university itself.

It would be difficult to deny that digital humanities has become a key player in the evolution of academia. No longer located at the periphery, digital scholarship has had and continues to have a profound impact on academic discourse, scholarship and pedagogy. In an ADE Bulletin from 1988, Cynthia Selfe posits that “experimentation at the most basic level” is the only way to characterize the English department’s affinity with computers: “As a profession, we are just learning how to live with computers, just beginning to integrate these machines effectively into writing- and reading-intensive courses, just starting to consider the implications of the multilayered literacy associated with computers” (qtd. in Kirschenbaum 55). The theme of experimentation is, of course, still relevant to digital humanities scholarship today, with new experimental digital tools and emerging methodologies such as data mining that radically extend research, especially research involving large groups of texts. Indeed, in trying to keep up with technological advances, digital scholarship is bound to retain the label of “emerging field” no matter the strength of its professional apparatus. But the words “emerging” and “experimentation” need not carry any negative connotations. While digital scholarship may not yet have resulted in extensive alteration of university structure, it has certainly set the stage for much fruitful experimentation, and encouraged and challenged scholars working to advance understanding of large-scale institutional change.

One such experimental strategy is crowdsourcing. Whether digital humanists recruit the public for help with transcribing large collections of manuscripts or crowdsource an entire book project through multiple social media channels, with no restrictions on submission format, massive collaboration is an approach that is charged with potential for rewarding alterations within academic discourse and scholarly practices. This paper will trace how digital humanities utilizes crowdsourcing to challenge traditional views of knowledge production and consumption in academia. I will consider three case studies to explore the benefits and criticism digital humanists face in soliciting public engagement in major scholarly endeavours and extending scholarship and pedagogy beyond the immediate confines of their scholarly community: University College London’s Bentham Project and its decision to recruit the general public in an effort to accelerate the process of transcribing Jeremy Bentham’s papers; my own experience using the TypeWright tool, along with its pedagogical merits and public benefits; and Dan Cohen’s and Tom Scheinfeldt’s book project titled Hacking the Academy, a volume crowdsourced in one week through social media outlets that provocatively takes the structure of the academy as its subject and presents an innovative method for assembling and publishing collaborative scholarly work. In assessing each of these cases, I will argue that the theme of experimentation that still marks much of the work in digital humanities studies is also one of rejuvenation. An active participant in the evolution of academia, digital humanities has at its core a simple but pertinent question: how can technology and new media revitalize and revamp the traditional academic system?

The Oxford English Dictionary, arguably the first “massively-crowdsourced collation of English knowledge” (Lanxon; “Crowdsourcing”), does not contain an entry for the word “crowdsourcing.” And while Wikipedia explains that wiki is an “unusual medium” and warns that academics “may reject Wikipedia-sourced material completely” (“Wikipedia”), its collaborative environment seems a fitting place to produce a reliable entry for “crowdsourcing.” The entry suggests that Jeff Howe coined the word in a June 2006 Wired article, “The Rise of Crowdsourcing,” where he argued that outsourcing was being replaced by a new commercially advantageous system of “crowdsourcing”: “the new pool of cheap labor” (Howe). Given the recent coinage of the word, it is perhaps not surprising that the OED does not contain a definition. Then again, having become the definitive (and academic) standard of reference that it is, the OED may not have included an entry for “crowdsourcing” because the word is not “significant” or “important” enough, or is not “likely to stand the test of time” (“How”). Whatever the case may be, as another Wired article shrewdly observes, both the OED and Wikipedia have remarkably similar origins (Lanxon). Professor James Murray, chief OED editor from 1879 until his death in 1915, spent several decades crowdsourcing word definitions. Collaborating in much the same way as today’s Wikipedia contributors, the general public would send in slips of paper with quotations that showed word usage, along with a suggested definition, date of first use, etymology, and other data (Lanxon). Unlike Wikipedia, however, the OED’s claim to authority is quite prominent, taking center stage as its subtitle: “The definitive record of the English language” (“Home”).
Whether or not Wikipedia will ever acquire within academic circles the same status as an encyclopedia as the OED holds as a dictionary, this comparison reveals that crowdsourcing has been around for a long time and that there is little difference between the creation of the most trusted and one of the most mistrusted sources for scholarly research. Moreover, with the aid of technological advantages, such as instantaneous editing and steady contributions and improvements (Lanxon), Wikipedia might take less time to reach academic acceptance.

At the heart of the problem of academic acceptance of crowdsourced projects such as Wikipedia, which allow participants from all walks of life, lies the threat to the academe’s own importance. These projects challenge the rank and prestige of the “Ivory Tower” that houses experts far away from amateurs. Consequently, decisions to crowdsource scholarly work are often imbued with fears and concerns combined with a genuine interest in experimentation or a desire to challenge the status quo of established scholarly practices.

Such is the case with the Transcribe Bentham project, “a participatory initiative,” as its subtitle puts it, that allows anyone who signs up for a MediaWiki account to transcribe manuscripts from a large collection of papers of the Enlightenment philosopher Jeremy Bentham. The reasons the editors decided in favour of crowdsourcing are quite simple and commonplace in today’s academia: a massive undertaking and a lack of funding (P. Cohen). The project is a daunting job. Since its inception more than fifty years ago, fewer than half of the papers have been transcribed (Wallace). Despite the fact that the bulk of this scholarly endeavour would eventually be completed by the general public, the editors make sure to highlight that the goal of the project is to produce “the authoritative edition of the Collected Works of Jeremy Bentham” (Causer, my emphasis), a prominently featured subtitle on the project’s blog page. Again, although its production of knowledge is in many ways very similar to Wikipedia’s (it even uses MediaWiki, a software platform that was originally designed for use on Wikipedia), the project exhibits labels such as “the definitive” or “the authoritative” in an effort to attract attention to the expertise of the scholars that oversee it. What is driving this fear that the work has less claim to authority when it involves participants outside the academy on the one hand, or that it is, in fact, recognized as “definitive” or “authoritative” despite its non-academic contributors on the other?

To be sure, there is the real fear that the public would produce poor-quality work. Daniel Stowell, who is in charge of the Papers of Abraham Lincoln in Springfield, Illinois, explains that the hiring of non-academic transcribers for their project resulted in extremely poor quality and multiple gaps in the work: “we were spending more time and money correcting them as creating them from scratch” (qtd. in P. Cohen). Of course, unlike the public working on the Bentham Project, the non-academics working on the Papers of Abraham Lincoln were paid workers. It is difficult to say whether or not this fact makes a difference in the final quality of the work. On the one hand, paid transcription entails more responsibility for the quality of the work produced; on the other, those who volunteer are generally genuinely interested in producing quality work. Whether or not remuneration makes a difference, Mr. Stowell is understandably sceptical towards new crowdsourcing projects like Transcribe Bentham: the Papers of Abraham Lincoln’s own crowdsourcing platform, designed by the National Center for Supercomputing Applications at the University of Illinois, Urbana-Champaign, was eventually abandoned (P. Cohen).

But aside from the alleged corruption of valuable scholarly materials, there seems to be a deeper concern on the side of the academe, whose response to crowdsourcing, among other digital scholarship, is still overly defensive, even when its members choose to collaborate with the public! Smudging the border between those within and without the Ivory Tower, academics still feel the need to continually reassert their importance by fortifying the final product, and therefore the academic institution, with labels such as “definitive” and “authoritative,” labels that call attention to the power and authority that govern knowledge production. The editors of the Bentham Project examine and correct all submitted transcripts. So why is there such a pressing need to defend the legitimacy of digital scholarship that moves away from traditional scholarly practices by engaging the public? Why is the institution so reluctant to accept online scholarship unless it adheres as much as possible to the established (dare I say print) model of scholarship? After all, the “definitive” or the “authoritative” edition of an author’s oeuvre is one of the most highly regarded products in print publication. Finally, is the primary goal of the Bentham Project to “form the basis of future scholarship including printed editions”? Or is engaging the public a goal in itself? Both of these aims, along with “preserving national heritage” and “widen[ing] access,” are listed on the “About Us” page (Transcribe Bentham).

Karen Mason, one of the volunteer transcribers, explains that she considers her help a “service to the scholarly community” (qtd. in P. Cohen). Aside from widening access and preserving national heritage, it seems that a service of the scholarly community back to the community at large should be pedagogical in nature. Indeed, the opening paragraph of the “About Us” page clearly outlines the educational benefits of participation:

We would like to encourage all those who have an interest in Bentham or those with an interest in history, politics, law, philosophy and economics, fields to which Bentham made significant contributions, to visit the site. Those with an enthusiasm for palaeography, transcription and manuscript studies will be interested in Bentham’s handwriting, while those involved in digital humanities, education and heritage learning will find the site intriguing. Undergraduates and school pupils studying Bentham’s ideas are particularly encouraged to use the site to enhance their learning experience. (Transcribe Bentham)

Public engagement and education are definitely part of the Bentham Project’s agenda. “I’m no Bentham scholar,” Mason reflects on her experience, “but I am interested in history, so it’s interesting to look at the addenda and deletions in the manuscripts and generally follow the thought processes of a man living in 18th-century England” (qtd. in P. Cohen). Despite pedagogical merits, it would be hard to argue that the primary goal of the project is to extend scholarship beyond the academy. As the second paragraph of the “About Us” page conspicuously admits, “helping UCL’s Bentham Project in its task of producing a new authoritative edition of the Collected Works of Jeremy Bentham” is the end goal. Again, the word “authoritative” here signals that at its core the project is by scholars and for scholars.

Not all crowdsourcing projects, however, are influenced as much by the logic of print scholarship/academic superiority as the Bentham Project. Sharon Leon, a historian at George Mason University, is working on a free digital crowdsourcing plug-in that can be attached to any database, allowing the public to transcribe the materials (P. Cohen). Leon’s approach to crowdsourced transcription is much more daring and forward thinking, freed from the constraints of the academe’s self-importance. She does not fear the ostensible corruption of materials that comes with the public’s involvement: “We’re not looking for perfect . . . We’re looking for progressive improvement, which is a completely different goal from someone who is creating a letter-press edition” (qtd. in P. Cohen). The end goal of Leon’s digital tool is clearly not a perfectly transcribed scholarly archive; rather, it is active public participation and collaboration that extends beyond the academy. This is experimentation at its best. Only time will tell whether the tool will be a success, but, at least for now, this scholar has intentionally chosen to adopt the logic of Wikipedia—one of the most feared non-academic sources.

“Progressive improvement” is also the strategy of another transcription tool, TypeWright, which is currently in development. TypeWright is a crowdsourced correction tool that is designed to improve OCR texts. The aim of the tool is the progressive improvement of the text-versions of the documents that have been created from the page images hosted by commercial databases such as Gale’s Eighteenth Century Collections Online (ECCO). As the homepage of the tool explains, the corrected text-versions are “crucially necessary: they are what enables full-text searching, datamining, preserving, and curating texts of historical importance.” I had the privilege to be a beta tester for this tool and to write a report outlining my experience and recommendations.

The call for beta testers was crowdsourced through the usual scholarly communication channels, such as an e-mail to departments and instructors with an interest in digital humanities, though it potentially extended beyond academia. The fact that Gale’s OCR versions of eighteenth-century documents are already hopelessly corrupted, and that TypeWright, at least at this stage, targets contributors that are more or less within the academy, means that the fears of further corruption with this crowdsourcing endeavour are somewhat limited. Nonetheless, the tool has a strong apparatus that allows anyone to review all changes to the original OCR text.

Setting aside the problem of poor-quality corrections that TypeWright will most likely encounter in the future, in my experience of testing the tool, I found it to be an excellent example of how digital scholarship can enhance pedagogy. Indeed, one of the first survey questions at the end of my testing asked me what I had learned during the exercise. And I was struck by how much one could learn, with a digital tool, about eighteenth-century print practices. In the brief time that I was testing the tool, I learned, for example, that printers often used extra spacing as part of the ornamentation of text. In the text I was working with, Love-letters on all Occasions Lately Passed between Persons of Distinction, collected by Mrs. Eliza Haywood, the publishing city, London, had an extra space between each letter. Similarly, the first word of the letter, “Madam,” had larger spacing between letters. I also learned that, aside from spacing, mechanically typed text is remarkably varied in its use of lower and upper case letters as well as bold and cursive typefaces. Whether used for emphasis, distinction, or aesthetic reasons, almost every page of my document contained words in bold typeface, italics, small capitals, or regular capitals. TypeWright can certainly be a great pedagogical asset, enhancing discussion and providing an exciting way to study eighteenth-century print culture at the high school or university level.

In addition to its pedagogical merits, TypeWright also has an agreement with Gale that allows it to directly contribute to the public good. Once a work is corrected, Gale releases the page images, as well as the newly corrected text-versions behind them, from its copyright, thus allowing free access to materials that are part of a subscription-only commercial database. Such an unusual arrangement demonstrates that digital humanities scholarship can not only enhance pedagogy and enrich academic discourse, but also provide new avenues to beneficially reform the current institutional system that allows large corporations such as Gale to set a price for and thus control access to knowledge.

The last project I will consider is Dan Cohen’s and Tom Scheinfeldt’s book titled Hacking the Academy, a volume crowdsourced in one week that focuses the discussion on the changing institutional system in the face of digital technologies and challenges the traditional model for academic publishing. Dan Cohen even crowdsourced the title of the volume in an earlier blog entry, asking commenters to come up with a title for his new book that would explore “the way in which common web tech/methods should influence academia, rather than academia thinking it can impose its methods and genres on the web” (“Crowdsourcing the Title”). As the volume editors explain in the preface, they used multiple online channels to distribute their call for papers that asked “intentionally provocative questions” such as “Can an algorithm edit a journal? Can a library exist without books? Can students build and manage their own learning management platforms? Can a conference be held without a program? Can Twitter replace a scholarly society?” The goal was to provoke contributors to comment on the state of today’s academy, to question and “hack” every aspect of the institutional infrastructure (Davidson). “The book,” the call for papers continued, “will itself be an exercise in reimagining the edited volume. Any blog post, video response, or other media created for the volume and tweeted (or tagged) with the hashtag #hackacad will be aggregated at” (Davidson). More importantly, the editors wanted the volume not only to spread across multiple digital media formats and websites, but also to become interactive, with continuous commentary, blog entries, tweets, and the ability of contributors to directly respond to each other instead of restricting themselves to “inert, isolated chapters that normally populate edited volumes” (D. Cohen et al.).

The call for papers was a tremendous success. In only seven days, 177 authors made 329 submissions, incorporating various genres, formats and multimedia. Evidently, the experiment was a remarkable achievement that ostensibly proved that the web can and does influence academia and that academia can no longer “impose its methods and genres” on digital humanities scholarship. Reflecting on the project in a blog entry, Dan Cohen observes that letting the submissions dictate the form of the volume was an exciting and liberating experience: “I think one of the real breakthroughs that Tom and I had in this process is realizing that we didn’t need to adhere to a standard edited-volume format of same-size chapters” (“Some Thoughts”). Furthermore, the editors discovered that this model for academic volume publishing could be easily repeated and is in many respects superior to the traditional model:

Sure, it’s not ideal for some academic cases, and speed is not necessarily of the essence. But for “state of the field” volumes, vibrant debates about new ideas, and books that would benefit from blended genres, it seems like an improvement upon the staid “you have two years to get me 8,000 words for a chapter” model of the edited book. (“Some Thoughts”)

There was, however, a crucial problem with Hacking the Academy: the unedited versus the edited volume and the requirements for the preparation of a print edition, forthcoming under the University of Michigan Press digitalculturebooks imprint.

While the experiment does indeed demonstrate how digital technologies can usefully transform the production and consumption of knowledge, Dan Cohen might have overstated his case about the academy no longer having a strong grip on the methods and genres of digital scholarship. Derek Bruff, one of the commenters on Dan Cohen’s “Some Thoughts on the Hacking the Academy Process and Model” blog entry, astutely traces the differences between the unedited volume and the final online product:

. . . I’ll admit I was frustrated to see that, as innovative as this project was, you didn’t find a way to include audio and video contributions in the edited volume, even though such contributions were requested. I know that audio and video pieces are in the “unedited” volume, but I would have liked to have seen a couple of digital humanists come up with a creative way to include them in the edited volume, as well.  Also, it seems that most of the pieces in the edited volume are from people in the humanities. I understand that given the nature of the call for submissions (via Twitter accounts and blogs run by humanists), this is to be expected. But there was nothing in the project description (at least, that I noticed) that limited contributions to those in coming from the humanities. You have some important voices from outside the humanities (Wesch, Jarvis, Junco, and others), but it’s still a very humanities-centered volume. . . . I hope that what you’ve done will inspire others to take on similarly inventive projects, perhaps ones that can respond to the criticisms I’ve made here.

What Katherine Hayles described as print-centered rhetoric (42) is clearly working its way into the online edited volume, anticipating the final—and even more highly prized—product: the print edition. As Bruff notes, the edited volume cuts out all of the multimedia and most of the “unusual genre” pieces. Moreover, it makes sure that accepted and respected digital humanists such as John Unsworth, Matthew Kirschenbaum, and Kathleen Fitzpatrick (with five entries!) are featured on its pages. Despite its potential, the unedited volume will, more likely than not, soon be abandoned. What will remain is the “authoritative” print edition with more or less “inert” entries and pieces of interaction, tasked with the job to perpetually exude the sense of professional self-importance.

Experimentation is indeed the perfect way to describe what marks much of digital humanities scholarship today. Whether trying to accelerate painstaking scholarly endeavors through crowdsourcing, or reimagining an edited volume in the digital age, scholars are attempting to move away from producing digital scholarship modeled on established print practices towards new models that have the potential to transform the very structure of institutional knowledge production. Some experiments might not be as successful as others, and, as I have argued in this paper, print-centered logic and the fear of losing the “Ivory Tower” status still undermine a great deal of digital endeavors. The next important step is to continue experimenting. As Derek Bruff notes in his comment, the hope is that each project will inspire others to build on, improve, and actively participate in the evolution of academia. Just as with Wikipedia or the OED, “progressive improvement” is a slow but steady path to a better institution.

