Chezkie Kasnett: The Historical Archive Reborn — Approach and Strategy for the Archive Network

published under CC-BY-SA license

Abstract:

The Israel Heritage Archive Network Project (IHAN) was launched in January 2013 as part of the national "LandMarks" cultural heritage project. The project is aimed at all archive institutions throughout Israel that hold historical material of cultural heritage value; participating archives include public, private, government, and commercial archives. The aim of IHAN is twofold: first, to digitally preserve valuable cultural heritage materials for the future, and second, to make these materials freely available to the public online.


 

Introduction

The digital revolution has drastically changed our day-to-day lives in recent years, specifically with regard to our expectations of availability and access to information, whether it is the train schedule, the news, or even historical material. In an era where almost all information is available on the Internet, researchers and the inquisitive alike expect to be able to access source material and historical material online. The digital era has drastically influenced the research field, creating new areas of research and exciting new opportunities for researchers and academics alike. It has transformed the cumbersome job of collecting information, which in the past was a very large part of the researcher's activities, into a simpler, faster, and more efficient task, even enabling the researcher to go far deeper than ever before. Material that in the past was inaccessible without crossing borders is now readily accessible. The researcher can today ask questions that could not have been asked in the past, in ways previously not conceivable. This new environment presents a great opportunity for historical archives, yet simultaneously presents new challenges and demands.

Today's historical archive is vastly different from its predecessor. Gone are the days when the archive was a lone body with static records and material on the shelves, a physical storage location for the document or object. Today's archive is a virtual resource for education and scholarship: a place of interaction and of learning via active engagement, with ever-growing exposure to the public. In order to survive, the modern archive must put itself on the digital map. This requires enabling electronic access to its catalog, digitizing source material, and engaging existing audiences as well as new ones. Today's archive professionals must gain familiarity with, and often expertise in, new technologies ranging from digitization and electronic storage to electronic catalogs, databases, standards, and formats.

In addition to the core source material, increasing importance is being given to multimedia collections, electronic repositories and databases, digital surrogates, digitally born material, and Internet search engines that allow access to material from afar, dramatically improving the research approach. The role of the archive has not changed, but the method has.

If the first digital revolution was the eruption of information, then today's trend is connecting all of this information together into a network of interlinked records. The tremendous potential value of information networks, and archive networks in particular, lies in connecting the information, creating a rich and deep information resource that highlights the context, depth, sources, and story of the target. The greater the network, the more complete the information "picture" becomes. An important tool is the semantic tagging of the material located in the network. Beyond linking material between archives based on like records and subject matter, semantic analysis of texts allows much greater depth in correlating material on the document level, something that has been but a dream for most archives until now. This concept must constitute the core of archive network projects, as is the case with the Israel Heritage Archive Network Project.

The Israel Heritage Archive Network is a national cultural heritage project aimed at bringing together over 600 historical archives into an online network, providing public access to their holdings. Many important historical archives are unknown and virtually inaccessible to the public. There exists a real danger of losing historically valuable, national material. The project is a platform for cultural heritage archives to improve the quality of and access to their collections. The objectives of the project are to provide access to valuable national material to the public free of charge, and to provide a framework and infrastructure for the long-term digital preservation of the electronic records and their related digital objects from these archives. The project will create and utilize shared controlled vocabularies among participating archives and will allow relationships between archive institutions based on semantic processing of both the records and the underlying OCR (Optical Character Recognition) text of the digitized objects. In a revolutionary step, the project will allow for greater record resolution, down to the document level of archival holdings. Using crowdsourcing, the project will allow the public to enrich the information on the record level, contributing information and tagging objects in the system. An advanced search solution utilizing the latest technologies will allow users to search across the holdings of all the archives at once, achieving an instant range and depth of results never before possible.

 

The project

The project was initiated by the Heritage Division of the Office of the Prime Minister of the State of Israel. The National Library of Israel was chosen to manage and execute the project. The Israel State Archives is a co-manager of the project representing the archives and providing consultation and assistance on content and standards. The Association of Israeli Archivists also represents the archives and acts as an advisor to the archives themselves and to the project on matters of standards, participating archives, and content.

 

Project team

A number of professional teams were assembled to manage and execute the project. These consist of a steering committee and four professional task teams, each responsible for a specific area of the project: a Technology team, a Legal team, a Content team, and a Project team. The Content team is responsible for selecting the archives that will be subsidized by the Project, as well as the content within each archive that will be digitized and added to the Project website. The Project team consists of three dedicated team members, a project manager and two technical experts, and is responsible for the management and execution of the project action items. The Project Manager works with the Steering Committee and the other teams to define project goals and timelines, make project decisions, and resolve important project issues.

 

Budget and timeline

The Project was allocated a four-year timeframe. The first year was defined as a pilot stage during which the technology infrastructure was to be implemented, followed by the addition of nine archives into the project. The subsequent three years would be dedicated to adding further archives into the Project.

It became clear that many hundreds of historical archives exist throughout Israel and that the limited Project budget could cover only a small percentage of them. The Content team was tasked with selecting a total of 40 archives that would be funded by the Project. The funding would go towards digitization and cataloging efforts only.

 

Challenges

Such a project understandably faces many challenges. These challenges can be grouped into three major categories, namely, a) the archives, b) technology and c) the process.

The archive challenges include selecting the archives to receive project funding, dealing with many different metadata standards (or none at all), organizations of differing size and structure, the poor physical state of many of the archives and their holdings, the hesitancy of archives to join the project, legal issues such as copyright and privacy, and the sheer enormity of the cumulative data across the participating archives.

In terms of the technology, the challenges lie in a number of core areas:

  1. Data: The difficulty in controlling and managing the information, standardization, and unification of the data.
  2. Digitization: Digitizing a broad range of materials, often in poor physical condition, and ensuring their long-term digital preservation.
  3. Delivery: Defining a broad target audience and creating a usable, intuitive, engaging website that seeks to be useful to a broad audience including experienced researchers, students, academics, and the general public.

Lastly, the project process poses the challenges of unifying many different catalogs into a single data repository, dealing with data in numerous languages, working with many agents in the execution of the project, funding and bureaucracy, and the lack of standards and unified methodology among archives.

 

Strategy

The strategy for undertaking such a challenging project rests on a number of core factors:

  1. Secure the necessary funding.
  2. Build a team of experts.
  3. Define and learn the target audience.
  4. Develop and implement a content strategy.
  5. Implement a comprehensive access and discovery strategy in engaging the public.
  6. Adopt international standards for metadata and digitization.
  7. Implement a powerful and robust technology platform.
  8. Adopt a long-term Digital Preservation strategy.
  9. Address copyright and legal aspects prior to undertaking such a project. Many archives hold material that is limited by privacy restrictions and/or under copyright.
  10. Implement a KIS (Keep It Simple) approach.
  11. Learn from other projects. Don't attempt to re-invent the wheel.
  12. Involve the archives and the public.
  13. Create a win-win situation for the maximal participation of potential archives.

 

Technology approach and solution

The Project is to develop and implement a technology platform that will host and provide access to the selected historical archival material from as many archives as possible. At the start of the project it was clear that a thorough analysis of all potential archives was necessary in order to get a clear picture of the number of candidate archives as well as the nature, relevance, quality, and physical state of their holdings. A national survey of archives was therefore undertaken over the course of fourteen months. A team of six surveyors with archive backgrounds was recruited and a comprehensive questionnaire was written. Close to 600 archives were surveyed during this period. The accumulated data was then analyzed by the Content team in order to select the archives that would be funded by the Project based on pre-defined criteria.

While the core aim of the Project is the digitization and online publishing of the 40 selected archives, the broader aim is of course for all historical archives to participate in the Project. Special attention was therefore dedicated to the technology infrastructure, so that non-selected archives can participate by sourcing their own funds for digitization and cataloging while still being able to upload their catalog data and digital scans into the Project at no direct cost. To achieve this, a technology infrastructure containing a number of components was built:

  1. The international EAD (Encoded Archival Description) standard was adopted for the import of all catalogs into the Project. The Project released an updated and slightly modified version of EAD (EAD-Israel, or EADI) for use by archive institutions wishing to participate in the Project.
  2. A mapping tool is being built to allow archives to export their data from whatever Archive Management System (AMS) or catalog they use as an EADI file, which can then be imported into the Project database.
  3. An AMS to contain all of the catalog data of all participating archives. A leading international AMS vendor was selected, localized, and implemented.
  4. A Digital Asset Management (DAM) and long-term digital preservation system was implemented for the storage of the digital scans of the project material.
  5. An engaging front-end web portal to provide access and search capability on the records and digital files of all the archives. A comprehensive solution is being constructed based on a combination of a number of Google solution platforms.
  6. The National Library of Israel thesaurus of authority records was integrated into the AMS for use by all participating archives.
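
As a rough illustration of the export step described in item 2, the following Python sketch maps one flat catalog record to a minimal EAD-style XML fragment. The element names used (ead, eadheader, eadid, archdesc, did, unitid, unittitle, unitdate, repository, corpname) come from the general EAD 2002 vocabulary. The EADI modifications are not described in this article, so this is not the Project's actual schema or mapping tool, and the record field names and function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def record_to_ead(record: dict) -> ET.Element:
    """Map one flat catalog record to a minimal EAD-style <ead> element.

    The input keys ("id", "title", "date", "repository", "level") are
    hypothetical; a real AMS export would need its own field mapping.
    """
    ead = ET.Element("ead")

    # Header carrying the identifier of the finding aid.
    header = ET.SubElement(ead, "eadheader")
    ET.SubElement(header, "eadid").text = record["id"]

    # Archival description; "file" is assumed as the default level.
    archdesc = ET.SubElement(ead, "archdesc", level=record.get("level", "file"))
    did = ET.SubElement(archdesc, "did")
    ET.SubElement(did, "unitid").text = record["id"]
    ET.SubElement(did, "unittitle").text = record["title"]

    # Optional fields are emitted only when present in the source record.
    if record.get("date"):
        ET.SubElement(did, "unitdate").text = record["date"]
    if record.get("repository"):
        repository = ET.SubElement(did, "repository")
        ET.SubElement(repository, "corpname").text = record["repository"]

    return ead


def export_ead(record: dict) -> str:
    """Serialize the mapped record as a Unicode XML string."""
    return ET.tostring(record_to_ead(record), encoding="unicode")


if __name__ == "__main__":
    sample = {
        "id": "EXAMPLE-0001",
        "title": "Committee meeting minutes",
        "date": "1948",
        "repository": "Example Local Archive",
    }
    print(export_ead(sample))
```

A production mapping tool would also need to validate its output against the published schema and handle multi-level descriptions (fonds, series, file, item), which this sketch leaves out.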

As a result of this architecture, a number of achievements were attained.

  1. Any archive could participate in the Project by exporting their data into the Project via EADI.
  2. Archives that do not have an AMS in place could purchase a license and use the Project AMS to manage their archive as a SaaS (Software as a Service) solution.
  3. A unique national digital preservation repository was created for all digital derivative scans. The digital repository will serve as a national digital backup of the historical assets contained in the many archives across the country.
  4. A unique central catalog repository of all historical archives is being created to allow centralized and in-depth search across multiple archive catalogs and collections simultaneously.
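
The idea behind a central catalog repository (item 4) can be sketched with a small inverted index built over the records of several archives, so that one query returns matches from all of them at once. This is only a toy model under stated assumptions: the article says the actual portal is being built on Google solution platforms, and the archive names, record identifiers, and function names below are hypothetical.

```python
from collections import defaultdict


def build_index(catalogs: dict) -> dict:
    """Build an inverted index: token -> set of (archive, record_id) pairs.

    `catalogs` maps an archive name to a dict of record_id -> title.
    Tokenization is naive (lowercase, whitespace split) for illustration.
    """
    index = defaultdict(set)
    for archive, records in catalogs.items():
        for record_id, title in records.items():
            for token in title.lower().split():
                index[token].add((archive, record_id))
    return index


def search(index: dict, query: str) -> list:
    """Return the records, across all archives, that contain every query token."""
    tokens = query.lower().split()
    if not tokens:
        return []
    # Start from the first token's postings and intersect with the rest.
    hits = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        hits &= index.get(token, set())
    return sorted(hits)


if __name__ == "__main__":
    catalogs = {
        "archive_a": {"a1": "Kibbutz meeting minutes 1948", "a2": "Harvest photographs"},
        "archive_b": {"b1": "Kibbutz photographs 1950"},
    }
    index = build_index(catalogs)
    print(search(index, "kibbutz"))
    print(search(index, "kibbutz photographs"))
```

A real deployment would add multilingual tokenization, authority-record lookups, and full-text search over OCR output, none of which is modeled here.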

 

Project impact

The Project portal is expected to be launched during the second half of 2015. Additional content and functionality will be added to the portal over the course of the next few years, considerably expanding the content available on the site and substantially enhancing the user experience.

As a result, many archives and their content will become available to the public worldwide for the first time, linked in a comprehensive network that allows in-depth search and navigation across all of the archives simultaneously.

It is expected that the Project will be a rich, comprehensive, and valuable resource for both the public and academic circles for historical content on Israeli cultural history, covering the broad spectrum of Israeli life and society.

Furthermore, the Project will serve as a useful model for digital preservation, content sharing and access that can be implemented globally for digital projects in general and cultural heritage projects in particular.

 

 

About the Author

Chezkie Kasnett

Chezkie Kasnett currently holds the position of Head of Digital Projects at the National Library of Israel, where he is responsible for leading core digitization initiatives. He has over twelve years' experience in the information technology field and in international project management. His areas of expertise include process and systems development, digitization, OCR, and enterprise knowledge management solutions. Prior to joining the National Library, Chezkie worked at a number of leading technology companies in the fields of intelligent information and knowledge mining, where he successfully managed their large-scale US and European projects. Chezkie holds a B.A. in Business Administration and Information Systems.


Copyright © 2012 - 2015 APEx project - All Rights Reserved
The APEx project is co-funded by the European Commission via the ICT PSP framework, 5th call, theme 2.1 - aggregating content for Europeana
 
