Digital Preservation Implementation Plan
1. Preservation Activities
This plan implements Northern Illinois University Libraries' (NIUL) Digital Preservation Policy, describing ongoing strategies to preserve the content data objects contained in NIUL's digital repository. These strategies include:
Archival File Formats
NIUL is committed to the use of file formats that support long-term sustainability. In general, the considerations for selecting file formats include the “openness” of the file format, its level of support as a preservation format in the academic/scholarly community, and its suitability for later format migration. Upon ingest, every file in the repository is subject to identification of its file format and other significant characteristics, including a reference to the file format’s entry (if it exists) in PRONOM, The National Archives’ online format registry. This association ensures that information on the internal structure of the file is always available, and it can further be used to determine when format migration should take place (if allowed by the object’s preservation level) in order to mitigate the risks posed by obsolete file formats. The process used for file format identification and NIUL's preferred formats are documented below in the Preferred File Formats section.
Normalization and Migration
As mentioned above, NIUL works to identify file formats well-suited to its approach to preservation and access. Upon ingest, materials not conforming to NIUL’s accepted standards will be converted to one of the previously identified formats. Whenever possible, NIUL will attempt to preserve the essential characteristics of the object. When NIUL perceives that a content data object is stored in a format that is at risk of obsolescence, a new version of this content will be created in a format more suited to long-term preservation and use. This transformation may consist of migration to a newer version of the content’s existing format, or transformation to a different format altogether. In all cases, preservation of the object’s intellectual content will be prioritized over the preservation of a specific presentation style.
Bit Stream Copying
NIUDL maintains backups of all information contained in NIUL’s digital repositories for use in the event of data loss. In combination with regular fixity checks, which identify potentially damaged content, this process ensures the integrity of content in NIUL’s digital repository, and provides a foundation for its disaster recovery plans. In addition to operating system level backups, two copies of all NIUL content data objects are kept in cold storage indefinitely: 1) one copy of the archival FOXML required by the Fedora Commons repository for migration, which contains all of the files associated with the object and its administrative history, and 2) one copy of the raw, archival file. These procedures are documented in the Backup and Bit Stream Copying section.
Fixity Checking
All materials in the repository are subject to regular fixity checks. This activity, when combined with bit stream copying, mitigates the risk of objects becoming corrupt in the repository, as it enables the repository manager to identify damaged or corrupted content, and to revert to a valid version of the object from a previous point in time. This strategy is documented in the Fixity Procedures section.
Preservation Levels
These preservation activities are applied to materials in the repository according to the material’s designated Preservation Level, which is indicated in the Collection Development Policy. These levels include:
Bit-level Preservation
Objects preserved at this level will be subject to Bit Stream Copying, Fixity Checking, and Documentation of File Formats. This is a baseline level of preservation activity which ensures that the object, once ingested into the repository, can be maintained in a valid and uncorrupted state. It also attempts to provide representation information for the object through documentation of its file format, though at this level no migration activities will take place. This preservation level should be considered less robust than the “Full Preservation” level, and should only be considered in situations where Full Preservation is not a viable strategy. Common issues of unsuitability include lack of privilege to perform migration activities on the material, the presence of material in unknown or unsupported file formats, or the material’s failure to conform to a valid format. Objects are preserved at this level permanently.
While the bulk of the objects that NIUL curates are located in the digital repository, many objects still remain on other platforms, including DSpace and local network-attached storage. The long-term goal of NIUL is to migrate all of these objects into the same platform, in order to take full advantage of the digital preservation strategies outlined in this document. In the meantime, every effort will be made to provide bit-level preservation for these objects through local backups of data storage areas.
Full Preservation
Items preserved at this level will receive the benefit of all of the above-mentioned preservation activities, as appropriate. Upon ingest into the repository, the material will undergo file format identification and normalization/transformation to archival file formats. As time goes on, these formats will be monitored by NIUL staff, and should the criteria for format migration be met, the files will be migrated to a new format. In addition, all activities associated with the “Bit-level Preservation” preservation level will be carried out. Objects preserved at this level may be de-escalated to bit-level preservation only at the discretion of the Digital Collections Steering Committee and Preservation Committee.
No Preservation
In rare cases, the repository may contain material for which NIUL is unable or unwilling to accept preservation responsibility. Although incidental preservation activities may take place upon this material, NIUL accepts no responsibility for its long-term accessibility or validity. This level is an exception to NIUL’s digital preservation activities, rather than a part of its preservation strategy, but is included here for completeness. Examples of materials at this level would include objects that are known to be corrupt or not authorized for preservation at any level but which, for some reason, cannot be deleted immediately.
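The relationship between preservation levels and preservation activities described above can be summarized in a short sketch. This is an illustration only, not part of NIUL's systems; the names `PreservationLevel`, `ACTIVITIES`, and `is_applied` are hypothetical.

```python
from enum import Enum


class PreservationLevel(Enum):
    """The three levels named in this plan (identifiers are hypothetical)."""
    NONE = "No Preservation"
    BIT_LEVEL = "Bit-level Preservation"
    FULL = "Full Preservation"


# Activities guaranteed at the bit level, as listed in this section.
BIT_LEVEL_ACTIVITIES = frozenset(
    {"bit stream copying", "fixity checking", "file format documentation"}
)

# Full Preservation adds normalization and migration on top of the
# bit-level activities; No Preservation guarantees nothing.
ACTIVITIES = {
    PreservationLevel.NONE: frozenset(),
    PreservationLevel.BIT_LEVEL: BIT_LEVEL_ACTIVITIES,
    PreservationLevel.FULL: BIT_LEVEL_ACTIVITIES | {"normalization", "format migration"},
}


def is_applied(level, activity):
    """Return True if the activity is guaranteed at the given level."""
    return activity in ACTIVITIES[level]
```

For example, `is_applied(PreservationLevel.BIT_LEVEL, "format migration")` is false, which reflects the statement that no migration takes place at the bit level.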
2. SIP, DIP, and AIP Definitions
Submission Information Package (SIP)
The SIP is the information package that is delivered to or created by NIUL for use in the construction of one or more Archival Information Packages (AIPs). SIPs will include the content files and some level of descriptive information, including file naming standards or project directory structure. SIPs may be delivered via electronic transfer (e.g., FTP), loaded from submitted media, or simply mounted (e.g., CD-ROM) on the staging file system for processing.
SIPs may be temporarily stored on backup storage during processing (e.g., metadata creation or remediation, error resolution, etc.); however, this is not meant to be a permanent storage solution. SIPs must be used to create AIPs for long-term preservation, or they must be destroyed according to the retention schedule or other policies governing this process (see Collection Development Policy). Initial deposits may remain with the unit of origin, in accordance with local practice.
The format of the SIP may vary from submitter to submitter, based on the submitter’s willingness or ability to provide the content and metadata in a specific format. For a given content type, file format preferences are described in the Preferred File Formats section below.
Dissemination Information Package (DIP)
OAIS describes a DIP as "the Information Package, derived from a part, or all, of one or more AIPs, received by the Consumer in response to a request to the OAIS."
DIPs are always generated from a single AIP, which resides in the NIUL digital repository, Fedora Commons. User access to digital assets is provided through the Northern Illinois University Digital Library (NIUDL), whenever possible in accordance with donor agreements and copyright law. Depending on their level of access, the user may see basic object metadata, an access version of the digital object, and download links to all available files. Contextual information is provided in the form of links to parent collections.
Archival Information Package (AIP)
The AIP is the information package consisting of the Content Information (CI), Preservation Description Information (PDI), Packaging Information (PI), and Descriptive Information (DI) that is archived by NIUL in the NIUDL. The type and level of content in a NIUDL AIP can vary, depending on the content provided by the submitter.
- Content Information (CI): Consists of the Content Data Object (CDO), which is the focus of preservation, as well as Representation Information (RI), which records the CDO's file format and version and a reference to a format registry in order to provide information on how to interpret the file. CDO and RI are stored in Fedora Commons as the OBJ and TECHMD datastreams, respectively.
- Preservation Description Information (PDI): Documents and supports the preservation processes, including Reference Information, such as the identifiers used to identify an asset locally (e.g. URI) and globally (e.g. NIUDL PID); Provenance Information, which provides a history of preservation events in the object's lifetime, beginning at ingest into NIUDL and referencing any preservation activities taken on the object (e.g., replacement due to corruption, format migration, fixity checking, etc.); Context Information about how a CDO relates to other CDOs or to other conceptual entities (e.g. a newer version of an object that supersedes an older version); and Fixity Information generated at the time of ingest in order to later determine whether or not the item remains in the same state as when it was ingested. This information can be used to determine the integrity of an object being copied within the system (as in the case of a change in storage location), or for periodic integrity checks. In Fedora Commons, PDI is contained in the FOXML, which is used to derive PREMIS records when AIPs are created for long-term cold storage.
- Packaging Information (PI): Combines CI and PDI into a single logical package, either the FOXML in Fedora Commons or the bag manifest.
- Descriptive Information (DI): Depending on the type of CDO, the format and level of descriptive metadata can vary. In general, every CDO contains both a MODS record and a Dublin Core record, which are included with the PDI.
Although the contents of the SIP, DIP, and AIP may vary, depending on the willingness or ability of the content provider to submit preferred formats, they generally contain:
Content Model | SIP | DIP |
---|---|---|
Audio | MODS; OBJ: .wav, .mp3 | PROXY_MP3 (derivative created by LAME encoder) |
Basic Image | MODS; OBJ: .gif, .png, .jpg, .jpeg | MEDIUM_SIZE (compressed image) |
Large Image | MODS; OBJ: .tif, .tiff, .jp2 | JP2; JPG |
Video | MODS; OBJ: .mp4, .avi, .mov, .ogg, .qt, .mkv, .m4v | MP4 (derivative created by ffmpeg); MKV (derivative created by ffmpeg) |
PDF | MODS; OBJ: .pdf | PREVIEW (image created by ImageMagick); FULL-TEXT (created by pdftext or uploaded) |
Islandora Paged Content | OBJ: .tif, .tiff, .jp2 | JP2; JPG; RELS-INT; OCR; HOCR |
Book | MODS | |
Compound Object | MODS | PDF (as applicable) |
Also included in each DIP are the OBJ, RELS-EXT, MODS and/or DC, TN (thumbnail image, as applicable), and TECHMD.
3. Preferred File Formats
NIUL requires identification of the type of file format during ingest into the repository in order to help mitigate risk posed by format obsolescence. While there are no format restrictions on the content data objects that will be accepted, well-known, widely accepted formats that support long-term preservation are strongly preferred. If a content provider wants to submit a specific format that does not adhere to the criteria in this document, an agreement must be reached between the content provider and NIUL.
To this end, NIUL uses DROID, JHOVE, file utility, Exiftool, NLNZ Metadata Extractor, and ffident through the FITS software package to identify file formats on ingest to the repository. An example characterization, listing the tools that contributed to the identification, is below:
<identification>
<identity format="Tagged Image File Format" mimetype="image/tiff" toolname="FITS" toolversion="0.6.1">
<tool toolname="Jhove" toolversion="1.5"/>
<tool toolname="file utility" toolversion="5.04"/>
<tool toolname="Exiftool" toolversion="7.74"/>
<tool toolname="Droid" toolversion="3.0"/>
<tool toolname="NLNZ Metadata Extractor" toolversion="3.4GA"/>
<tool toolname="ffident" toolversion="0.2"/>
</identity>
</identification>
NIUL is unable to ingest files with formats that are not recognized by FITS. If FITS identifies a conflict in the identification of a format between two or more tools, that conflict will be recorded in the object’s technical metadata, but the file will be accepted. Format consensus will be evaluated periodically by the repository manager, who will be responsible for investigating and resolving conflicts.
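The conflict check described above can be illustrated with a minimal sketch that reads an identification block like the example. It assumes that consensus appears as a single identity element and that a conflict appears as more than one; that convention, along with the names `SAMPLE` and `summarize_identification`, is an assumption made for illustration. Real FITS output is typically namespaced, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

# The example characterization from this section, used as test input.
SAMPLE = """<identification>
  <identity format="Tagged Image File Format" mimetype="image/tiff" toolname="FITS" toolversion="0.6.1">
    <tool toolname="Jhove" toolversion="1.5"/>
    <tool toolname="file utility" toolversion="5.04"/>
    <tool toolname="Exiftool" toolversion="7.74"/>
    <tool toolname="Droid" toolversion="3.0"/>
    <tool toolname="NLNZ Metadata Extractor" toolversion="3.4GA"/>
    <tool toolname="ffident" toolversion="0.2"/>
  </identity>
</identification>"""


def summarize_identification(xml_text):
    """Collect each reported identity and flag a conflict.

    Assumption: more than one <identity> child means the tools disagreed.
    """
    root = ET.fromstring(xml_text)
    identities = []
    for identity in root.findall("identity"):
        identities.append({
            "format": identity.get("format"),
            "mimetype": identity.get("mimetype"),
            "tools": [tool.get("toolname") for tool in identity.findall("tool")],
        })
    return {"identities": identities, "conflict": len(identities) > 1}


summary = summarize_identification(SAMPLE)
```

A summary of this kind could be stored with the object's technical metadata so that the repository manager can list the objects whose identifications still need review.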
Preferred File Formats
NIUL can only commit to providing full-level preservation for preferred formats, which are detailed below. These files are sometimes also referred to as “archival masters.”
Acceptable File Formats
Acceptable formats are those that NIUL will accept, but for which it can only guarantee basic, bit-level preservation.
Normalization
For assets with file formats that do not fall into the preferred or acceptable categories, normalization may be considered, depending on the availability of funding, technology, and/or staff time. In such cases, files will be transformed into a preferred or acceptable format in order to provide access to the content and long-term preservation. However, the original file, including format identification, will be included in the AIP.
Resource Type | Preferred Format | Acceptable Format |
---|---|---|
Image | TIFF | JPEG, GIF, PNG |
Document | ||
Audio | WAV or FLAC | MP3 |
Video | AVI | MPEG, MP4, OGG, MKV, Quicktime |
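As an illustration of how the table above translates into a preservation commitment, the following sketch looks up a format by resource type. The function name and return strings are hypothetical, and the Document row is omitted because the table lists no formats for it.

```python
# Formats copied from the table above; the Document row is left out
# because no formats are listed for it.
PREFERRED = {
    "image": {"TIFF"},
    "audio": {"WAV", "FLAC"},
    "video": {"AVI"},
}
ACCEPTABLE = {
    "image": {"JPEG", "GIF", "PNG"},
    "audio": {"MP3"},
    "video": {"MPEG", "MP4", "OGG", "MKV", "QUICKTIME"},
}


def preservation_commitment(resource_type, fmt):
    """Map a resource type and format name to the commitment described here."""
    kind = resource_type.strip().lower()
    name = fmt.strip().upper()
    if name in PREFERRED.get(kind, set()):
        return "full preservation"
    if name in ACCEPTABLE.get(kind, set()):
        return "bit-level preservation"
    # Anything else may be considered for normalization, resources permitting.
    return "normalization candidate"
```

Under these assumptions, a TIFF image maps to full preservation, an MP3 to bit-level preservation, and an unlisted format to a normalization candidate.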
4. Backup and Bit Stream Copying
Data Storage and Software Backup
NIUL is committed to regular backup procedures of both data storage areas and its software infrastructure (e.g. databases, application files). These backups are intended to serve as the basis for restoration of NIUL materials and services in the case of disaster or large-scale corruption of data.
Incremental snapshots of the operating system, application files, and databases are taken daily and retained for 60 days in Amazon S3, which is designed for 99.999999999% durability. Temporary files and staging areas are excluded from these snapshots.
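The 60-day retention window can be expressed as a simple date comparison. The sketch below is illustrative only; the function name is hypothetical, and it assumes that a snapshot exactly 60 days old is still retained.

```python
from datetime import date, timedelta

RETENTION_DAYS = 60  # retention period stated in this plan


def expired_snapshots(snapshot_dates, today):
    """Return, in ascending order, the snapshot dates older than the window.

    Assumption: a snapshot taken exactly RETENTION_DAYS ago is still kept.
    """
    cutoff = today - timedelta(days=RETENTION_DAYS)
    return sorted(d for d in snapshot_dates if d < cutoff)
```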
The Fedora Commons objectStore, which contains XML about each object in the repository, is stored on an Amazon EBS volume, which is included in the daily snapshots mentioned above. The datastreamStore, which contains all of the individual files associated with objects, is located on Amazon S3. S3 is mounted like a traditional filesystem using yas3fs on our Backend instance, using a local cache to sync with S3. The cache has a maximum size limit of 20 GB, but files typically sync immediately.
Content Data Object Copying
Backups of the archival FOXML, which contains XML about each object and all files encoded in base64, and of the archival file ("OBJ") are created by the Islandora S3 Backup module automatically upon ingest. This module leverages Drupal Rules to create, modify, and delete backups depending on certain triggers (i.e. object or datastream ingested, modified, or purged). These backups are sent to S3, where lifecycle rules immediately transition the files to Glacier, Amazon's cold storage solution.
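A lifecycle rule of the kind described above might look like the following sketch, which builds the configuration as a plain dictionary in the general shape of an S3 lifecycle configuration and prints it as JSON. The rule ID and the empty prefix are hypothetical, and `Days: 0` is used here to represent an immediate transition; NIUL's actual rule is not documented in this plan.

```python
import json


def glacier_lifecycle(prefix=""):
    """Build a lifecycle configuration that moves objects to Glacier.

    The rule ID and prefix are placeholders, not NIUL's real settings.
    """
    return {
        "Rules": [
            {
                "ID": "backups-to-glacier",    # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},  # empty prefix matches every key
                "Transitions": [
                    {"Days": 0, "StorageClass": "GLACIER"}
                ],
            }
        ]
    }


print(json.dumps(glacier_lifecycle(), indent=2))
```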
5. Fixity Procedures
NIUL is committed to maintaining the integrity of digital assets in its care. This includes creating checksums for all files ingested into the repository and regular fixity checking of those assets.
NIUDL Repository
On ingest to the repository, the Islandora Checksum module calculates an MD5 checksum for every file associated with an asset. These values are stored in the object’s audit log. Each month, the Islandora Checksum Checker module verifies that these checksums have not changed, detecting errors that may have been introduced through malicious interference or through degradation of files over time. The results of each verification will also be recorded in the object’s audit log. At the end of each cycle, a completion report will be e-mailed to the repository manager, who will be responsible for restoring from backup any files with mismatched checksums using the Islandora S3 Backup module.
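The principle behind this fixity check can be shown in a minimal sketch: compute an MD5 digest at ingest, then recompute it later and compare. This is not the Islandora modules' implementation; the function names and the dictionary of recorded checksums are hypothetical stand-ins for the audit log.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(recorded):
    """Return the paths whose current checksum differs from the recorded one.

    `recorded` maps a file path to the checksum captured at ingest.
    A missing file is also reported as a failure.
    """
    failures = []
    for path, expected in recorded.items():
        if not Path(path).is_file() or md5_of(path) != expected:
            failures.append(path)
    return failures
```

An empty result from `verify_fixity` means every file still matches its ingest checksum; any path returned would be a candidate for restoration from backup.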
Long-Term Remote Storage
Checksums are generated by the long-term storage service provider (Amazon Glacier) and are used to verify the integrity of all files deposited. Regular, systematic integrity checks are performed and the system is built to be automatically self-healing.
6. Selection and Appraisal
Selection and appraisal of materials to be preserved by NIUL will be conducted in accordance with the Digital Preservation Policy and Digital Collection Development Policy by curatorial staff and subject specialists from the content provider units, including:
- Rare Books and Special Collections
- Regional History Center and University Archives
- Southeast Asia Collection
- Digital Collections and Scholarship
- Reference & Research (as needed)
Selection will be done in consultation with digital preservation staff in Technical Services. All items selected must adhere to the individual unit’s collection development policy.
All copyrights and literary rights are to be enforced in keeping with their respective deeds of gift documents where applicable.
Legal requirements for record retention must also be observed where applicable, in accordance with Library and University policies.
Content providers are strongly encouraged to submit files in the designated preferred file formats; any non-standard file formats submitted will be transformed into preservation formats where possible, with the option to retain the original file where feasible. All files must be scanned for viruses and malware by the content provider before submission.
7. Review Cycle and Acknowledgments
This plan was approved by the Preservation Committee and Library Management Group on May 16, 2017. It will be reviewed annually, or as needed, by the Preservation Committee to assure timely revisions as technology progresses, preservation strategies and experiences mature, and resources change. The latest version was approved on February 11, 2019.
Significant portions of this document were inspired by or borrowed from OCUL's Scholars Portal Trusted Digital Repository Documents and York University's Preservation Policy.