Email Archiving with ePADD at Harvard Library

On Tuesday, April 18, 2023, Digital Preservation Services (DPS) and Library Technology Services (LTS) gave a joint presentation about new email archiving tools and workflows at Harvard Library. The presentation was an update on work originally introduced in February 2021; you can revisit that recording or read more about it here. Read on for a brief recap of the project and how to get involved.

Finishing the ePADD+ Project

Harvard Library initially partnered with Stanford University and the University of Manchester in 2020, when the institutions aligned their email archiving priorities and agreed to collaborate on a project to embed preservation functionality into the open-source tool, ePADD, which already supported email appraisal, processing, discovery, and delivery. The collaboration was awarded a grant through the University of Illinois’ Email Archives: Building Community and Capacity regrant program, funded by the Andrew W. Mellon Foundation. 

The resulting project, called Integrating Preservation Functionality into ePADD (ePADD+ for short), ran from January 2021 to December 2022. Over those two years, the project team delivered two full version updates of the ePADD tool. Highlights of the new features include, but are not limited to:

  • Retention of the full header profiles for record authenticity
  • Support for multipart messages
  • Ability to import sidecar files, which can be used to cover anything from repository deposit information to donor agreements and more
  • Richer metadata capture, including automatic and manual additions of PREMIS metadata
  • Export of a preservation-ready bag that can optionally include the original email, the post-appraisal (processed) copy of the email, a preservation copy, and sidecar files
  • Optional integration of the commercial format conversion tool, Emailchemy, for direct import and export of a wider variety of email formats

[Image: a starry black background overlaid with the text "ePADD now with preservation functionality"]
Image via Digital Preservation Services

As with any open-source tool, there have been ongoing concerns about how to resource bug fixes and security updates between grant-funded development windows. The project partners launched a new website that brings together community resources, projects, and workflows, with the aim of expanding adoption of the tool. There is also a new ePADD Steering Group that will unite international user institutions to strategically advance the ePADD project, as well as a Code Group that aims to lower the barrier for new contributors to the codebase.

Implementing ePADD at Harvard Library

Decommissioning old tooling and adopting a new, open-source tool for email archiving involved careful, strategic collaboration between DPS and LTS. Beginning in mid-2022, LTS and DPS partnered to tackle:

  • Decommissioning our homegrown email archiving system, EAS, without major disruption to existing users
  • Triaging the email collections stored in EAS, either migrating them into the Digital Repository Service (DRS) for long-term preservation or returning them to the collecting units for ingest into ePADD
  • Designing, engineering, and implementing the new ePADD-to-DRS deposit pipeline

The LTS team determined early on that the architecture previously designed for the Harvard Data Commons project, which facilitated the deposit of research data from Dataverse into the DRS, could be reused for the ePADD deposit pipeline as well. While the software engineering team iterated on that architecture, the LTS operations, support, systems, and storage teams coordinated the onboarding and documentation of the new tooling. Final testing and a soft launch of the new end-to-end workflow took place this month, in April 2023.

[Image: workflow diagram showing how a PST file could be processed using the ePADD software]
Sample workflow for a PST file. Use cases will vary by curatorial unit and donor circumstances.

Getting started with ePADD

Curators who are interested in using ePADD to process archival email and preserve it in the DRS can visit the LTS wiki for information on what to download and install and how to get started. Anyone interested in exploring the ePADD project more generally can visit the project website.

For updates on using ePADD at Harvard Library, you can join the LTS ePADD listserv here. To receive broader community updates about the project, you can join the general ePADD listserv here. Please feel free to contact Tricia Patterson (tricia_patterson@harvard.edu), Senior Digital Preservation Specialist in DPS, with any questions not covered by these resources.