A few bonus examples to practice your Refine + API skills…

ChronAm Demo

The Chronicling America project provides access to millions of pages of digitized historic newspapers. It also offers a simple, open API for interacting with the repository programmatically. Unlike IIIF, which is a shared standard, this API is custom built into the repository system and is the same one its own web pages use to retrieve data. Read the documentation to learn how to build URL queries.

To search individual pages, the recipe is the base URL https://chroniclingamerica.loc.gov/search/pages/results? followed by your query parameters, with format=json added to get machine-readable results.

Let’s say we want to see what was in the news this week in Idaho 100 years ago. Build up the query string key+value pairs, much as you would fill in an “advanced search” form:

- state=Idaho
- dateFilterType=range
- date1=05/13/1919
- date2=05/19/1919
- sequence=1 (front pages only)
- format=json

Combine all the parameters with & to get:

https://chroniclingamerica.loc.gov/search/pages/results?state=Idaho&dateFilterType=range&date1=05%2F13%2F1919&date2=05%2F19%2F1919&sequence=1&format=json
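Rather than typing the percent-encoding by hand, you can let Python assemble the query string. Here is a minimal standalone Python 3 sketch (run outside OpenRefine) that builds the same URL from the key/value pairs above:

```python
from urllib.parse import urlencode

# Base search endpoint from the ChronAm documentation
base = "https://chroniclingamerica.loc.gov/search/pages/results?"

# The same key/value pairs as the hand-built query above
params = {
    "state": "Idaho",
    "dateFilterType": "range",
    "date1": "05/13/1919",
    "date2": "05/19/1919",
    "sequence": 1,        # front pages only
    "format": "json",     # ask for machine-readable results
}

# urlencode joins the pairs with "&" and percent-encodes the
# slashes in the dates (05/13/1919 becomes 05%2F13%2F1919)
query_url = base + urlencode(params)
print(query_url)
```

This produces exactly the link above, and swapping in different parameter values gives you new queries without worrying about the encoding.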

Now use that link as the data source for a new Refine project: choose Create Project > Web Addresses (URLs), paste in the query URL, and parse the response as JSON.

Now you have a full-text data set of the front pages of Idaho newspapers from this week 100 years ago!

Text Processing API

Text-processing.com provides some basic natural language processing APIs designed for learning, available without an API key. The APIs are based on Python NLTK (see the NLTK Book for a great introduction to NLP and programming).

Like many API services used to enhance data, such as geocoding or named entity recognition, Text-processing uses HTTP POST to transfer information to the server for processing. A POST can be significantly more complex than a GET, since it allows any amount of data to be attached to the body of the request. It also keeps the information out of the URL, where it could end up in server logs and browser history; note that the body is not encrypted unless the request is sent over HTTPS.
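To make the GET/POST difference concrete, here is a small Python 3 sketch (using a placeholder example.com URL, no request is actually sent) that builds the same payload both ways:

```python
from urllib.parse import urlencode
from urllib.request import Request

payload = {"text": "the dog was faithful"}
body = urlencode(payload).encode("ascii")

# GET: the parameters ride along in the URL itself
get_req = Request("http://example.com/api?" + urlencode(payload))

# POST: the same parameters travel in the request body instead
post_req = Request("http://example.com/api", data=body)

print(get_req.get_method())   # GET
print(post_req.get_method())  # POST — urllib switches methods when data is attached
print(post_req.data)
```

Notice that attaching `data` is all it takes for urllib to send a POST; the URL itself stays clean.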

However, since a POST is not just a URL, using this type of API means creating the request with a programming language, such as Python. Luckily, OpenRefine has Jython (a Java implementation of Python) built in!

Let’s start another new project, get some text data, parse it, and use the API to test the Sentiment Analysis service.

Start Aladore data project:

This sets up a project where we can start exploring the text as a collection of lines. Let’s explore how the text represents “dog”.
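One simple way to do that is a custom text facet that flags lines mentioning the word. Here is a sketch, wrapped in a hypothetical helper function for testing outside OpenRefine; in Refine's expression editor (language set to Python / Jython) you would enter only the `return` line, since `value` is supplied for you:

```python
def mentions_dog(value):
    # Case-insensitive check, so "Dog" and "dog" both match
    return "dog" in value.lower()

# In OpenRefine's custom text facet the whole expression would be just:
#   return "dog" in value.lower()
print(mentions_dog("His dog followed him to Aladore."))  # True
print(mentions_dog("No mention here."))                  # False
```

Faceting on True/False then lets you filter the project down to the matching lines.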

Sentiment analysis API:

import urllib2, urllib
url = "http://text-processing.com/api/sentiment/"
# URL-encode the cell value so spaces and punctuation survive as form data
data = "text=" + urllib.quote_plus(value)
# passing a data argument makes urlopen send a POST instead of a GET
post = urllib2.urlopen(url, data)
return post.read()

This POST request sends our text data to the Text-processing API, which returns JSON data. We could then parse that response with further functions, such as value.parseJson()["probability"]["neg"].
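To see what that parsing step is doing, here is a standalone Python 3 sketch using a made-up response in the general shape the sentiment endpoint returns, a label plus neg/neutral/pos probabilities (the numbers here are invented):

```python
import json

# Hypothetical sample response; the real values come back from the API
sample = '{"probability": {"neg": 0.30, "neutral": 0.15, "pos": 0.70}, "label": "pos"}'

result = json.loads(sample)
# Same path as the GREL expression value.parseJson()["probability"]["neg"]
neg_score = result["probability"]["neg"]
print(result["label"], neg_score)
```

Pulling out a single number like `neg_score` into its own column makes it easy to sort or facet the lines by sentiment.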