Here comes Technology
This simple DIY project consists of only a few PHP files. It is written by hand, without a PHP framework!
What are Application Programming Interfaces (APIs) and SPARQL?
Of course, you can read the Wikipedia article, but if you are not a techie person, it will not mean much to you. I don't know the precise definition either. It does not matter. What matters more is whether we can use it. To me, at least, an API is a way to access data (and/or a service/software). Normally, people visit a website and browse its content. Sometimes we also interact with it by posting information. We upload photos to Facebook, send our name and email address to buy a ticket, or simply type keywords to search on Google. What happens behind the scenes is database transactions. What we send is stored in a database, so the website provider can process the data of thousands of users efficiently. Google, too, is a kind of database, which is why we can find websites after typing keywords.
Now, instead of going through a normal website, an API allows you to access such a database (and/or service/software) directly. It is an extra service the website provider offers to share the data (and/or service/software) they have. Why? Because the offered data is standardised, other people can use it quickly and efficiently. In particular, software developers can write code to process it automatically. This is important: normally users have to type something and click buttons when using a website, but as long as the code is written accordingly, a computer/machine can process the data without us. In this way, APIs are useful for building new services/software. For instance, if you have access to a weather API and a map API (e.g. Google Maps), you can show the temperatures of different locations on a map. This can be a totally new service which is useful for somebody. APIs allow us to reuse somebody else's data (and/or service/software). Why not use such a nice idea for Digital Humanities? That's what JCDJ tries to do.
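To make the weather-plus-map idea concrete, here is a minimal sketch of such a "mashup" in Python. Everything in it is invented for illustration: the field names and the sample data stand in for what two hypothetical APIs might return as parsed JSON; no real API is called.

```python
# A toy "mashup": join temperature readings from a (hypothetical)
# weather API with coordinates from a (hypothetical) map/geocoding API.
# All field names and data here are made up for illustration.

def merge_weather_and_places(weather_json, places_json):
    """Pair each temperature reading with the coordinates of its city."""
    coords = {p["city"]: (p["lat"], p["lon"]) for p in places_json}
    merged = []
    for reading in weather_json:
        city = reading["city"]
        if city in coords:  # skip cities the map API does not know
            lat, lon = coords[city]
            merged.append({"city": city, "temp_c": reading["temp_c"],
                           "lat": lat, "lon": lon})
    return merged

# Sample responses, as the two APIs might return them (JSON already parsed)
weather = [{"city": "Vienna", "temp_c": 21.5},
           {"city": "Salzburg", "temp_c": 18.0}]
places = [{"city": "Vienna", "lat": 48.21, "lon": 16.37},
          {"city": "Salzburg", "lat": 47.81, "lon": 13.04}]

for point in merge_weather_and_places(weather, places):
    print(point)
```

Each merged record now carries a city, a temperature, and coordinates, which is exactly what a map widget needs to place a labelled marker.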
A SPARQL endpoint is similar to an API. SPARQL itself is a rule (i.e. a language) for querying a database. A SPARQL endpoint is a place where queries can be built and the results viewed on a website (and/or accessed directly, like an API). So, if somebody opens up a SPARQL endpoint, we can use their database with the SPARQL language. SPARQL is used to query a specialised data type called RDF (Resource Description Framework). RDF consists of simple sentences: Subject, Predicate, and Object. For example, "Mozart lived in Vienna" in RDF looks like: <Mozart> <livedIn> <Vienna>. So, if we ask "Who lived in Vienna?" in a SPARQL query, we get <Mozart>. Over the last decade, many people have created RDF datasets and opened up SPARQL endpoints, so we can now access RDF databases all over the web. The powerful expressiveness of RDF makes our knowledge richer.
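The "Who lived in Vienna?" question above could be written as a SPARQL query roughly like this. This is only a sketch of the toy triple, not a query against any real endpoint; the `ex:` prefix is a made-up namespace for illustration.

```sparql
# "Who lived in Vienna?" against the toy triple <Mozart> <livedIn> <Vienna>.
# The prefix ex: is an invented namespace, not a real vocabulary.
PREFIX ex: <http://example.org/>

SELECT ?person
WHERE {
  ?person ex:livedIn ex:Vienna .
}
```

The variable `?person` matches any Subject that has the Predicate `ex:livedIn` pointing at `ex:Vienna`, so the result set would contain `ex:Mozart`.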
- A very nice, non-techie SPARQL tutorial with real examples was written by a historian, so check it out for more concrete Digital Humanities examples.
- Go Sugimoto also uses SPARQL and RDF in another project based on data from Wikipedia. Check out my WiQiZi.
Due to the use of full-text search and a chain of APIs, query performance is rather slow. Unfortunately, there is not much we can do about it, as this application depends entirely on external APIs. In particular, the combination of full-text search and named entity recognition built on top of it makes good performance hard to achieve. On the other hand, we have also shown that the current API set-ups could be a bottleneck for serious implementations of distributed data-centric research in Digital Humanities. In fact, this issue is discussed in my academic paper. Let's hope the situation will improve in the near future!
The code is super-portable. To use the same technique for another resource of the Open Library (and Internet Archive), we only need to change the URI of the resource. In addition, if a full-text API is available elsewhere (JSON format would be the easiest to implement), it is relatively straightforward to reuse the code for a resource, regardless of its provenance. The new API would certainly need to be analysed and the access to its data refactored, but the rest of the code should work in general. The code has been applied to the following resources:
- Voices from the Orient, or, The testimony of the monuments [microform] : of the recent historical and topographical discoveries : and of the customs and traditions of the people in the Orient to the veracity of the sacred record (Burnfield, George 1884)
- A voyage round the world, but more particularly to the north-west coast of America [microform] : performed in 1785, 1786, 1787, and 1788, in the King George and Queen Charlotte, Captains Portlock and Dixon ; dedicated, by permission, to Sir Joseph Banks, Bart. (Beresford, William, fl.; Dixon, George, d. 1789)
- The Hittites [microform] : their inscription and their history (Campbell, John, 1890)
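The "change one URI" idea can be sketched as follows. This is not the project's actual PHP code (which is not shown in this post); it is an illustrative Python sketch in which the resource identifier, the URL pattern, and the function name are all invented.

```python
# Sketch of the "swap one resource identifier" portability idea.
# The identifier and the URL pattern below are hypothetical; the real
# project is a set of hand-written PHP files not reproduced here.

# Identifier of the resource to serve. Swapping this single value
# would retarget the whole application to another resource.
RESOURCE_ID = "example-journal-id"  # hypothetical identifier

def fulltext_search_url(query):
    """Build a (hypothetical) full-text search API call for the resource."""
    return ("https://api.example.org/fulltext"
            f"?resource={RESOURCE_ID}&q={query}")

print(fulltext_search_url("Hittites"))
```

Keeping the resource identifier in one place is what makes porting to another Open Library item a one-line change; only the API-specific parsing would need refactoring.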
JCDJ is the first test case, but other resources can be released quickly in the near future.
No scalable service
As this project is largely experimental, JCDJ only uses the free tiers of the APIs. Therefore, a scalable and stable service cannot be offered at the moment. In particular, it uses the free version of a commercial API (the Dandelion API), whose use is limited to 1,000 units (entity extractions) per day. If there is high demand for JCDJ, we will consider how to offer a more robust service.
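One simple way to stay under such a daily cap is a quota counter that resets each day. The sketch below is purely illustrative, written in Python with an invented class name; it does not show how JCDJ actually handles the Dandelion limit.

```python
from datetime import date

# Minimal sketch of a daily-quota guard for a rate-limited API, such as
# a free tier capped at 1,000 units per day. Illustrative only.

class DailyQuota:
    def __init__(self, limit):
        self.limit = limit
        self.day = date.today()
        self.used = 0

    def try_spend(self, units=1):
        """Record usage and return True if today's quota allows it."""
        today = date.today()
        if today != self.day:          # new day: reset the counter
            self.day, self.used = today, 0
        if self.used + units > self.limit:
            return False               # over the daily cap; back off
        self.used += units
        return True

quota = DailyQuota(limit=1000)
print(quota.try_spend())   # True: first unit of the day
```

Before each entity-recognition call, the application would check `try_spend()` and show a friendly "quota exhausted" message instead of failing mid-request.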
APIs/SPARQL endpoint used
- The Open Library to search inside the journal
- Dandelion to extract entities
- Wikipedia/Wikimedia to extract metadata of thumbnails
- Google Maps to display maps
- DBpedia to fetch coordinates of places
Data are only displayed; no data are stored on our servers.