Kuldar Aas: So, there is data – what can we do with it? - (Re)using data on Archives Portal Europe

published under CC-BY-SA license

Abstract:

At the time of writing (May 2014) the Archives Portal Europe includes data from 397 archival institutions, in total more than 38 million descriptive units which reference more than 141 million digital objects.[1] By the end of the Archives Portal Europe network of excellence (APEx) project in 2015 the number of institutions and the amount of data are expected to grow even further, so the portal will include a huge set of data with much potential for reuse.

However, this potential has not yet been exploited much, as the main focus of the APEx project, and of the previous Archives Portal Europe network (APEnet) project, has been the development of tools for data gathering, normalisation and simple presentation.

This article gives an overview of the first emerging solutions for more effective data reuse and research, and discusses further possibilities which reach beyond the lifetime of the APEx project.


 

First steps in the Archives Portal Europe

Put in highly simplified terms, there are two crucial components to a user-oriented portal: the content which users can access and the functionality which allows them to use the content. Therefore, there are also two possible ways of developing the portal: starting with the user functionality or starting with the content.

The Archives Portal Europe decided to start with providing content. During the APEnet project, which ran from January 2009 to January 2012, the main aim was to develop functions which allow content providers, that is archives and other institutions with archival collections, to provide content to the portal and thus ensure that a reasonable amount of data can be uploaded. In regard to user-oriented functions everything remained rather simple and straightforward. The functions offered in the very first public version of the portal in 2011 included simple and advanced searching, faceted browsing and, for logged-in users, the possibility to save searches (so that these could be executed again later).

Within the APEx project (which started in March 2012 and is due to finish by the end of 2015) the primary goal is likewise to extend the amount and scope of content available in the portal. However, while putting together the project, it was also decided that user-related functions would need more attention. Therefore, a Work Package for this purpose was added.[2]

The work on user-related functions started with a broad scoping survey among the key data providers, the European national archives. As a result, the core user groups and the most wished-for functions were defined, and the latter have also been analysed in considerable detail. A detailed description of the survey and its results was published in the Work Package's report to the European Commission in 2013.[3]

In brief, the responses showed quite clearly that, as far as archivists themselves are concerned, the main user groups for the Archives Portal Europe are academic staff, hobby historians and teachers (i.e. educational purposes). In terms of user-related functions, the top five wishes were a personal research space with personalised mash-up possibilities, feedback tools, tagging, Linked Data and updates to faceted search.

In order to comply with the content providers' requests, two user-related functions became available in early 2014:

  • Feedback form: It allows users to send feedback to data providers about their content. Essentially, when a user opens a description on the portal and discovers errors or missing information, (s)he can fill in a simple form and let the archives know about it;
  • My Pages: Users can register and have access to personalised services. As of now it is possible to save searches, but more options are going to be added rather soon.

Of course, testing things is better than reading about them, so now it’s time to open www.archivesportaleurope.net, register for a My Pages account, and try saving searches and giving feedback to the archives!

 

And that’s it?

Well, while it is great what the APEx project has already achieved, obviously that is not the end. For the very near future, plans include a major update to My Pages, adding functions for bookmarking, creating and managing collections, and also sharing and collaborating.

By the end of 2014 logged-in users of the Archives Portal Europe should be able to create thematic collections which include saved searches and specific bookmarks, extend these with personal context descriptions and, ultimately, share them with friends, colleagues or the general public. Additionally, discussions are being held about extending this towards a personal exhibition function which would allow users to design and highlight their ideas around European history.

Now, recall the user groups mentioned above: academic researchers, hobby historians and teachers. It is rather clear that the My Pages function, even in its most extended version, is mainly useful for hobby historians and teachers, both of whom have less demanding research objectives. For scientific researchers, however, this is not sufficient, as research nowadays is shifting more and more towards specific Digital Humanities tools, like data mining and automated text processing applications customised to provide effective results in a specific research area. Their need for background data also often reaches beyond the scope of Archives Portal Europe. In short, scientific researchers need to analyse archival data in combination with data from other sources, using the specific tools most suitable and convenient for the researcher.

When the needs are scoped like this, it is also rather clear that it is not reasonable for the APEx project to start developing specific functions and tools, as:

  • the project partners mainly represent data providers rather than researchers and therefore lack the necessary knowledge about the state-of-the-art tools users might need;
  • the whole field of Digital Humanities is developing quickly, which also means that the Archives Portal Europe would need to invest significantly to keep its tools up to date and competitive;
  • as already mentioned, the data researchers need often goes beyond archival descriptions, so the portal would also need functions to hold other kinds of data (often of significant size) in users’ personal spaces.

That is mainly why the more reasonable way forward for APEx seems to be to open up the data, that is, to make the data itself as easy to use as possible, so that researchers can simply take the data and work on it in any tool or environment they fancy. Technically speaking, the approach should therefore be to use the principles known from Open Data and Linked Data initiatives.

A first contact in this field is emerging right now, as the APEx project is starting discussions about sharing data with the CENDARI project[4], which concentrates on medieval and First World War research and is developing state-of-the-art research tools for that purpose.

 

What is this "Open Data"?

To discuss the relevance of Open Data in the Archives Portal Europe, it might be useful to explain the concept, even though the approach itself is becoming better and better known.

Well, the most generic definition of Open Data is data which has been made available under a public license. As such, any data which has been put onto the web and explicitly allows reuse can be seen as Open Data. However, Tim Berners-Lee, the inventor of the Web and initiator of Linked Data, has defined additional and more complex levels of Open Data. Especially at the highest level, also called five star open data or Linked Open Data (LOD), the approach also includes the use of specific technical standards known from Semantic Web initiatives.[5]

The special thing about Linked Open Data is that it allows data to be reused not only by humans but also by automated applications: it is machine-readable. Put rather simply, the idea of Linked Open Data is that:

  • all the data is in a common format (in most cases the W3C standard Resource Description Framework (RDF) is used);
  • single values within data are connected to global vocabularies (like Wikipedia).

So, why should archivists and the Archives Portal Europe be concerned about Linked Open Data? It is especially the connection to global vocabularies that allows clever pieces of software to connect different datasets to each other rather easily and in an automated manner. As an example, when a German archival description mentions Köln and an English description mentions Cologne, and both are connected to the same vocabulary value, like http://www.geonames.org/2886242/koeln.html, then the two descriptions can be linked automatically despite the fact that different name forms have been used.
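The Köln/Cologne example can be sketched in a few lines of Python. This is only an illustration of the principle, not code from the portal: the record identifiers and the field names (id, place_label, place_uri) are invented for the example, and only the GeoNames address is taken from the text above.

```python
from collections import defaultdict

# The vocabulary value mentioned in the article.
GEONAMES_COLOGNE = "http://www.geonames.org/2886242/koeln.html"

# Hypothetical descriptions: ids and field names are assumptions for illustration.
descriptions = [
    {"id": "de-001", "place_label": "Köln", "place_uri": GEONAMES_COLOGNE},
    {"id": "en-001", "place_label": "Cologne", "place_uri": GEONAMES_COLOGNE},
    {"id": "en-002", "place_label": "Tallinn", "place_uri": None},  # not tagged yet
]


def link_by_vocabulary(records):
    """Group description ids by the vocabulary URI they are connected to."""
    groups = defaultdict(list)
    for record in records:
        uri = record.get("place_uri")
        if uri:  # untagged descriptions cannot be linked automatically
            groups[uri].append(record["id"])
    # Only URIs shared by at least two descriptions produce a link.
    return {uri: ids for uri, ids in groups.items() if len(ids) > 1}


links = link_by_vocabulary(descriptions)
print(links)
```

The two descriptions end up in the same group because they share the vocabulary value, even though their name forms differ; the untagged third description stays unlinked, which is exactly the data quality problem discussed next.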

Linked Open Data makes it easier to reuse data and to analyse it automatically. It is also one of the big trends in current Digital Humanities, and that is why Archives Portal Europe should be concerned.

However, the main technical obstacle to delivering Linked Open Data from Archives Portal Europe is the quality of the data. Most archivists probably understand that connecting specific data elements to global vocabularies can be rather hard work. Most archival institutions have created their digital descriptions by simply typing them in from paper catalogues in a massive effort. Of course such work is prone to errors, and typos are rather common. In addition, the number of parallel (historic) name forms and terms found in descriptions makes it hard to connect these to a global vocabulary which uses only current terms and names.

But that is actually the power of the Archives Portal Europe. While the task of tagging archival descriptions with terms from vocabularies would be quite a mouthful for any single archival institution, the APEx project has offered a good chance to bring together experts from various archives and look into this together.

This lengthy introduction leads to a short statement on what APEx is doing. Namely, work is currently being completed on selecting the most appropriate global vocabularies for use with archival descriptions, and on evaluating some tools which might be used to find vocabulary terms in individual descriptions automatically. More details and the results of this work are going to be published as a report within the APEx project in summer 2014.

But to give away the main objectives already now: the idea is that the Archives Portal Europe Dashboard, the central tool for data management in the portal, could include some simple tools and scripts which data providers can use to analyse, tag and publish their data as Linked Open Data. In essence, data providers would have some additional big buttons to push, which execute specific automated tools and let data providers review the results. Data providers would thus not need to analyse and customise specific tools themselves, but could simply use the tools provided by the Archives Portal Europe, which are already optimised for use on EAD, EAC-CPF and EAG.

However, what is not yet known is whether all of this can be implemented within the APEx project lifetime or will happen later in the framework of the proposed Archives Portal Europe Foundation. Since Linked Open Data is really the big opportunity for allowing more complex forms of reuse, it has to be implemented!
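To give a feeling for what finding vocabulary terms automatically could involve, here is a minimal sketch using fuzzy string matching from Python's standard library. It is not one of the tools evaluated by APEx. The tiny vocabulary, the example.org address and the function name suggest_term are assumptions made for the illustration; only the GeoNames address for Köln comes from this article.

```python
import difflib

# Hypothetical mini-vocabulary mapping name forms (current and historic) to URIs.
# The example.org URI is a placeholder, not a real vocabulary entry.
VOCABULARY = {
    "Köln": "http://www.geonames.org/2886242/koeln.html",
    "Cologne": "http://www.geonames.org/2886242/koeln.html",
    "Tallinn": "http://example.org/place/tallinn",
    "Reval": "http://example.org/place/tallinn",  # historic name form of Tallinn
}


def suggest_term(value, vocabulary, cutoff=0.8):
    """Suggest the closest vocabulary label and its URI for a typed value.

    Returns None when nothing is similar enough, so that an archivist
    can review the value by hand instead of accepting a wrong match.
    """
    matches = difflib.get_close_matches(value, list(vocabulary), n=1, cutoff=cutoff)
    if not matches:
        return None
    label = matches[0]
    return label, vocabulary[label]


# Typed values as they might appear in a description, including typos.
for typed in ["Colonge", "Talinn", "Reval", "Xyz"]:
    print(typed, "->", suggest_term(typed, VOCABULARY))
```

A suggestion is only a suggestion: the cutoff decides how tolerant the matching is, and the result still needs review by the data provider, which is why the sketch returns None rather than guessing.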

 

Having the technical stuff sorted out is the solution!

Indeed, until now APEx has mainly concentrated on the technical components. But (and that is a really huge but) there is also the issue of the will of content providers. In a legally correct scenario every content provider of the Archives Portal Europe would need to agree to the terms of usage for every single function which can present data in a different way than the original. As an example, the Archives Portal Europe should not be allowed to roll out a personal exhibition function unless data providers allow their data to be used that way. While the issue is not that pressing as long as data remains in Archives Portal Europe, in the case of Linked Open Data content would of course end up in other environments, which archivists cannot control. Therefore, some kind of legally correct confirmation from data providers is necessary before their content is published in such a way.

Next to that, there is also the question of ensuring a reasonable service level for functions where users would or could expect responses from archival institutions. As an example, it is technically rather simple to provide forum and feedback possibilities, but it is by no means simple to get archivists to commit themselves to maintaining the functions and responding to user queries within a reasonable timeframe.

As soon as there are more than just individual user-related functions available and more than one data provider is included, there is a need for a formal management regime or policy. The scope of such a policy should be to set down things like:

  • how new user-related functionality is agreed upon;
  • who is in charge of managing functionality and where input from data providers or Archives Portal Europe is needed;
  • who is allowed to reuse the content outside Archives Portal Europe;
  • how requests for reusing content are evaluated.

In 2013 the Cultural Heritage Information Management Research Group (CHIM) of the Victoria University of Wellington, New Zealand, carried out a survey called "Digital cultural heritage 2.0: behind the scenes".[6] The objective of the survey was to look at how memory institutions manage their Web 2.0 programmes, and in Europe the results are rather controversial. While more than 80% of the European archives which answered the survey have implemented Web 2.0 or user-oriented functions, fewer than half of these have actual sustainability plans to maintain and manage these functions. In short: most archives have until now concentrated on the question of what to do and less on how to maintain it.

This is really a huge opportunity for the Archives Portal Europe. Right here and right now there is the chance for most of the European archival institutions to come together and start discussing what the future of accessing archival records should look like, resulting in a European policy for managing and maintaining reuse of archival content!

 

Final words

To summarise everything written above, reusing archival content in ways which reach beyond current capabilities is not simple. First of all, it is not always seen as a core priority, and at the same time specific skills and personnel are needed for the more technical tasks, as well as archivists to maintain and manage some of the functions. Taking into account that most archival institutions do not have extensive resources available for these purposes, it is the big opportunity of the APEx project, and of the APE Foundation afterwards, to continue bringing the archival community together to carry out the necessary discussions.

And of course, here is a personal opportunity as well: after you have read this article, we would be more than happy if you would let us know about your thoughts and good ideas (and you are more than welcome to take these to social media), because at the end of the day it comes down to being in it all together.

1.
See the latest numbers on the Archives Portal Europe homepage (http://www.archivesportaleurope.net/) (viewed 9 May 2014).
2.
"Work Package 6 of the APEx project is tasked with applying innovative Web 2.0 functionalities to the Archives Portal Europe, allowing for user-compliant accessibility." APEx project homepage (http://www.apex-project.eu/index.php/en/about-apex/2-uncategorised/31-work-packages#WP6) (viewed 9 May 2014). 
3.
APEx project: First Analysis report: Applying Web 2.0 solutions in archival applications. Den Haag, 2013. (http://www.apex-project.eu/images/docs/D61_Web20_In_Archival_Applications.pdf) (viewed 9 May 2014).
4.
Read more about CENDARI in Aleksandra Pawliczek: “Building up a Research Infrastructure on the First World War across Borders” (http://www.apex-project.eu/index.php/en/articles/180-building-up-a-research-infrastructure-on-the-first-world-war-across-borders) (viewed 9 May 2014). 
5.
5 Star Open Data (http://5stardata.info/) (viewed 9 May 2014).
6.
Unfortunately the final report of the survey is not yet available. 

About the Author

Kuldar Aas

Kuldar Aas is Deputy Head of the digital preservation bureau of the National Archives of Estonia. In the APEx project he acted as WP6 lead from March 2012 until January 2014. Since February 2014 he has acted as a technical expert in several Work Packages.


Copyright © 2012 - 2015 APEx project - All Rights Reserved
The APEx project is co-funded by the European Commission via the ICT PSP framework, 5th call, theme 2.1 - aggregating content for Europeana
 
