
Digital Aladore and aladore-book

I started the Digital Aladore project in 2014 as a grad student project blog with the idea to take an obscure digitized public domain text and make it more usable. In 2018, I used the text to develop the aladore-book project, exploring ideas of using static web generation to create ebooks with added data features.

Here is an introduction adapted from my presentation “Digital Aladore: Reflecting on Five Years of a DH Learning Project” at DHSI Colloquium 2020:

Abstract

Digital Aladore started in 2014 as a project blog with the idea to take an obscure digitized public domain text and make it more usable. Grappling with the crummy digital legacy of the Million Book Project/Universal Library digitization initiatives of the 2000s, it was conceptually an opportunity to apply a background in Classical Studies to investigating the twisted details of digital witnesses and textual transmission. However, after encountering the limitations of digitization that failed to take advantage of the affordances of digital media, the project led down a path to learning the software and approaches for transforming images, text, and data. Importantly, publicly articulating the learning process of encountering challenges and discovering solutions resulted in a lasting resource of accessible explanations about using technical tools–attracting significant use from outside of academic sources. After more than five years of maintaining Digital Aladore as a venue for thinking through ideas and tinkering with DH tools, it has moved far beyond its origins as a course assignment in library school. The experience guided my development into the realm of open software, content, and communication, while providing a playground of data for tooling up DH projects. This talk reflects on the unexpected value, outcomes, and sustainability of a simple side project for learning DH hands-on.

Presentation

Rather than introducing my official scholarship, this presentation reflects on more than five years of a Digital Humanities side project: a little personal gem that has sat simmering on the side of my desk since grad school. It also reflects on the unexpected value that comes from having this space for thinking through ideas and tinkering with DH tools. This little project acts as a medium for exploration and learning, echoing a theme DH thinkers have commonly returned to: “messing about” and play. That theme is especially appropriate in the DHSI context, where traditionally we come together for a week of exploring, learning, and concocting new ideas in lovely Victoria, which I am missing very much this year.

Digital Aladore was a project I started with the idea to take an obscure digitized public domain text and make it more usable. It has its roots in a weird mix of academic ideas, personal whimsy, and, of course, a class assignment.

In 2014 I was taking a course at UBC nicknamed “OK MOOC”. It was a collaboration of six international universities offering in-person, for-credit classes embedded inside a free online course with over 12,000 other learners. It was very intense and overwhelming, but truly fascinating.

Digital Aladore came out of that mix, thinking about open culture, working with public domain text and open source software–as well as an aim for transparency and clear communication.

My background was in Classical Studies, and I was working in a digitization lab at the time. I was fascinated with how the scholarship of textual transmission might apply in digital archives, particularly given the legacy of terrible metadata and horrific OCR from mass digitization projects of the late 2000s, such as the Million Book Project/Universal Library.

You can find these digitized texts online, but due to their poor quality they are barely usable for reading or analysis. On the other hand, there are academic projects such as the William Morris archive that provide critical edition quality text, but are often locked in outdated web sites (the Adobe Flash Edition). I mention William Morris here because Aladore is basically an imitation of his romances–but Morris is popular enough to have good human-edited reading editions available in places such as Project Gutenberg. For more obscure texts, we are stuck trying to read auto-generated ebooks full of bad OCR, or a lousy PDF.

I really just wanted to read a nice clean copy of Aladore, so I set out to make my own, learning everything that went into it along the way.

In retrospect, this is a classic digital librarian idea: untangle metadata, create more usable access, and report out about everything.

So the blog was just that–step-by-step notes about open source tools and processes to work with digitized texts, publicly articulating my learning process and recording solutions.

I processed the two print editions of Aladore that were available, and then I collated them to help in the textual editing process.
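
To make "collation" a little more concrete, here is a minimal sketch of comparing two witnesses word by word with Python's standard `difflib` module. It is only an illustration, not the workflow used for Digital Aladore: the `collate` helper and the two sample lines are invented for this example, and real OCR output would need normalization (line breaks, hyphenation, punctuation) first.

```python
import difflib


def collate(witness_a, witness_b):
    """Return (tag, words_a, words_b) for each span where two witnesses disagree."""
    a, b = witness_a.split(), witness_b.split()
    # autojunk=False stops difflib from discarding very frequent words on long texts.
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    variants = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            variants.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return variants


# Invented sample lines, not quotations from either edition.
edition_one = "the knight rode out of the city at dawn"
edition_two = "the knight rode out of the citv at dawn"

for tag, left, right in collate(edition_one, edition_two):
    print(f"{tag}: {left!r} / {right!r}")
```

Each reported span is a place where the two witnesses differ, which is exactly where an editor needs to go back to the page images and decide on a reading.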

I wanted to engage people in thinking about textual transmission and concepts of openness. However, my most popular posts ended up being the practical steps for using OCR. And they seem to get used–for example, I ran across a forum post from an IT person trying to optimize a batch script based on a Digital Aladore post.

Think about that: some publishing company is using random code copied from a DH student learning blog…

Documenting learning is powerful.

That approach of open communication has stuck with me in my contributions to organizations such as Software Carpentry and Programming Historian, and in my own workshop resources.

The text of Digital Aladore lives on.

For me, it acts as a well-known text data set for testing out ideas or any new tools that come my way, a playground of data for tooling up for other, more important, DH projects. And a lot of the use cases end up built into my current edition, the aladore-book project, where I play off the “collections as data” concept to make a variety of handy text derivatives readily available for use on the web.

And, true to the original project, it is also reader friendly, so if you are looking for an obscure summer read, check it out.