Testing TypeWright

I participated as one of the “power users” in the testing of the TypeWright tool hosted by 18thConnect.org. For my assessment of the tool I chose a work titled Love-letters on all Occasions Lately Passed between Persons of Distinction, collected by Mrs. Eliza Haywood, and corrected the first eight pages of the document. After completing my evaluation, I submitted a report that outlined what I learned, my experience with the interface, and suggestions for future development. The sections below contain my evaluation of the tool as well as a more comprehensive version of the report I submitted to 18thConnect.org.

For those who are not familiar with the tool, here is a Video Introduction.

TypeWright Beta Evaluation:

How much did you learn about mechanically typed text from this exercise?

To test TypeWright, I chose a work titled Love-letters on all Occasions Lately Passed Between Persons of Distinction, Collected by Mrs. Eliza Haywood. Before I started the exercise, I did not realize that spacing between letters would be the main issue. Starting with the first line of the book, where the OCR text for “LOVE-LETTERS” was “LOV E-LETTERS,” most of the changes on the eight pages I corrected involved deleting or adding spaces between letters and words.
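To make the distinction between spacing errors and other OCR errors concrete, here is a minimal Python sketch that compares an OCR line with its corrected version and reports whether only the spacing changed. This is not part of TypeWright; the function name `classify_correction` and the sample lines other than “LOV E-LETTERS” are my own illustrative assumptions.

```python
def classify_correction(ocr_line: str, corrected_line: str) -> str:
    """Label a correction as 'unchanged', 'spacing-only', or 'other'.

    Hypothetical helper for illustration; not a TypeWright function.
    """
    if ocr_line == corrected_line:
        return "unchanged"
    # Strip all whitespace from both lines. If what remains is identical,
    # the corrector only added or deleted spaces.
    if "".join(ocr_line.split()) == "".join(corrected_line.split()):
        return "spacing-only"
    return "other"


# The first pair comes from the first line of the book; the remaining
# pairs are invented examples of the kinds of errors described in this report.
samples = [
    ("LOV E-LETTERS", "LOVE-LETTERS"),
    ("L O N D O N", "LONDON"),
    ("Miftrefs", "Mistress"),
    ("Madam", "Madam"),
]

for ocr, fixed in samples:
    print(f"{ocr!r} -> {fixed!r}: {classify_correction(ocr, fixed)}")
```

Counting the labels over a set of corrected pages would be one way to check the impression that spacing accounts for most of the corrections.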

I also noticed that printers often used extra spacing as part of the ornamentation of the text. In my example, the city of publication, London, was set with an extra space between each letter. Similarly, the first word of the letter, Madam, was set with wider spacing between letters.

I also learned that, aside from spacing, mechanically typed text is remarkably varied in its use of lower and upper case letters as well as bold and italic typefaces. Whether used for emphasis, distinction, or aesthetic reasons, almost every page of my document contained words in bold typeface, italics, small capitals, or regular capitals. Variations of the letter “s” were the second most common source of OCR errors. While OCR worked surprisingly well with some words that contained the long “s,” words like “shine” or “mistress,” where the letter “s” was in close proximity to other problem letters (“h” and “t” in these cases), would almost always require correction. The doubling of the letter “s” also frequently caused multiple mistakes.

Overall, I think that this exercise is valuable to any student of literature at the undergraduate or the graduate level.

Please tell us anything else you’d like us to know about your experience using the TypeWright beta.

I had some difficulty finding a TypeWright enabled text. I wanted to work with one of the texts by Eliza Haywood. A search for “Haywood” from the TypeWright tab, where the engine searches specifically for TypeWright enabled texts, returned 28 results. Fewer than half of these results allowed me to edit the work; of the first 10 results, only one did not take me to the error page.

I was aware of the last line issue (I learned from the walkthrough video that the last line would not retain any changes if one did not return to the previous line), but on several occasions I still forgot to return to the previous line to make sure that the last line edits would be saved. Perhaps there should be a “check/save” button for the entire page.

I also had some problems with adding and deleting lines. The section of the book that I worked on contained an ornately decorated initial capital letter (p. 2). Because of this initial, the OCR text appeared to have missed the first line. I tried to insert the line above the section, and while I was able to do so, the line disappeared as soon as I navigated away from the page. A bug with line insertion was reported in the original “call for testers” e-mail, so the feature had probably not been fixed when I did my testing. When I moved to the third line of that section, I realized that the first line had actually been captured by the OCR text. Because of the positioning of the ornamented initial (the ornaments took up some space, forcing the actual letter to sit slightly below the first line), the first line was moved to the position of the second line, while the second line was mistakenly captured as the first one. I was able to delete the first line (which was now in the position of the second line) to make sure that the actual second line was followed by the third line of the text. I was not able, however, to transcribe the first line of the text in any position above the second line. Similarly, the page header of some of the pages I corrected contained the word “DEDICATIONS,” which the OCR missed, and I was not able to add that line above the first line of the text.

I also encountered several instances where the last line was on the same level as a page number/identifier. Page 2 of my document, for instance, contained this OCR text in the last line: “A3 fully.” Clearly, “A3” was not part of the original last line, but I was not sure whether to leave this line intact or insert a line with “A3” below to separate the page identifier from the rest of the text. Perhaps small issues such as the one I had with page numbers should be documented as part of the “Instructions” text at the bottom of each “Edit” page.

What additional features would you like to see?

Given the problems I encountered with adding and deleting lines, and assuming the line behavior I experienced with my text was not unique, a “swap/switch lines” feature would be a good addition to the “insert line above/below” buttons. It would also be great if the red frame that displays the current line were more interactive. For example, when adding a line that was missed by the OCR, I could move the frame to the exact line position on the scan. Similarly, if the OCR missed the last character or word of a line, I could expand the red frame to cover the missing part of the scanned page.

As I mentioned above, a “check/save” button for the entire page, one that could perhaps also allow a final review of all the changes made, would be a great feature. I understand the need to confirm that a line is correct with the “Assert the line is correct” keyboard shortcut or the “check mark” button. On some pages, however, most of the lines were already correct, and on several occasions I forgot to assert them. I think it would be better to automatically assert that each unedited line is correct. If developers also implement the final “check/save page” feature, I would still be forced to double-check that each line marked as correct is in fact correct. Allowing the user to see the entire page as plain corrected text would also be a good reviewing strategy.

Finally, one of the most frustrating parts of the user interface was not being able to see the entire image scan. For example, when I came across the decorated initial capital, I constantly had to jump up and down the lines around the letter in an effort to figure out the OCR mistakes. I understand that showing the full scan of the page would require more scrolling for the corrector, but if developers introduce the ability to resize the scanned page window, users could select a size that is comfortable for their screens and resolutions.
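To illustrate the “swap/switch lines” suggestion, here is a small Python sketch of how such an operation might behave. It is only a guess at a possible design, not a description of how TypeWright stores its data: the `Line` record, its `frame_top` field, and the `swap_lines` function are all hypothetical, and the sample text is invented.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """Hypothetical record for one transcribed line on a page."""
    frame_top: int  # assumed vertical position of the red frame on the scan
    text: str       # OCR or corrected text shown for that frame


def swap_lines(lines: list[Line], i: int, j: int) -> None:
    """Exchange the text of two lines while leaving their frames in place."""
    lines[i].text, lines[j].text = lines[j].text, lines[i].text


# Invented example of the situation near the decorated initial:
# the OCR captured the first two lines in the wrong order.
page = [
    Line(frame_top=100, text="second line of the letter"),
    Line(frame_top=130, text="first line of the letter"),
    Line(frame_top=160, text="third line of the letter"),
]

swap_lines(page, 0, 1)

for line in page:
    print(line.frame_top, line.text)
```

With an operation like this, the corrector would not need to delete and reinsert lines to restore the reading order, which is where I ran into trouble.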