1. Preservation Activities
This plan implements Northern Illinois University Libraries' (NIUL) Digital Preservation Policy, describing ongoing strategies to preserve the content data objects contained in NIUL's digital repository. These strategies include:
Archival File Formats
NIUL is committed to the use of file formats that support long-term sustainability. In general, the considerations for selecting file formats include the “openness” of the file format, its level of support as a preservation format in the academic/scholarly community, and its suitability for later format migration. Upon ingest, every file in the repository is subject to identification of its file format and other significant characteristics, including a reference to the file format’s entry (if it exists) in PRONOM, The National Archives’ online format registry. This association ensures that information on the internal structure of the file is always available, and can be further used to determine when format migration should take place (if allowed by the object’s preservation level) in order to mitigate the risks posed by obsolete file formats. The process used for file format identification and NIUL's preferred formats are documented below in the Preferred File Formats section.
Normalization and Migration
As mentioned above, NIUL works to identify file formats well-suited to its approach to preservation and access. Upon ingest, materials not conforming to NIUL’s accepted standards will be converted to one of the previously identified formats. Whenever possible, NIUL will attempt to preserve the essential characteristics of the object. When NIUL perceives that a content data object is stored in a format that is at risk of obsolescence, a new version of this content will be created in a format more suited to long-term preservation and use. This transformation may consist of migration to a newer version of the content’s existing format, or transformation to a different format altogether. In all cases, preservation of the object’s intellectual content will be prioritized over the preservation of a specific presentation style.
Bit Stream Copying
NIUL maintains regularly scheduled backups of all information contained in its digital repositories for use in the event of data loss. In combination with regular fixity checks, which identify potentially damaged content, this process ensures the integrity of content in NIUL’s digital repository, and provides a foundation for its disaster recovery plans. In addition to backups, two copies of all NIUL content data objects are maintained in a repository-agnostic format: 1) one copy located off-campus at the Department of Information Technology, and 2) one copy synced to DuraCloud. These procedures are documented in the Backup and Bit Stream Copying section.
Fixity Checking
All materials in the repository are subject to regular fixity checks. This activity, when combined with bit stream copying, mitigates the risk of objects becoming corrupt in the repository, as it enables the repository manager to identify damaged or corrupted content, and to revert to a valid version of the object from a previous point in time. This strategy is documented in the Fixity Procedures section.
Preservation Levels
These preservation activities are applied to materials in the repository according to each material’s designated Preservation Level, which is indicated in the Collection Development Policy. These levels include:
Bit-level Preservation
Objects preserved at this level will be subject to Bit Stream Copying, Fixity Checking, and Documentation of File Formats. This is a baseline level of preservation activity which ensures that the object, once ingested into the repository, can be maintained in a valid and uncorrupted state. It also attempts to provide representation information for the object through documentation of its file format, though at this level no migration activities will take place. This preservation level should be considered less robust than the “Full Preservation” level, and should only be considered in situations where Full Preservation is not a viable strategy. Common issues of unsuitability include lack of privilege to perform migration activities on the material, the presence of material in unknown or unsupported file formats, or the material’s failure to conform to a valid format. Objects are preserved at this level permanently.
While the bulk of the objects that NIUL curates are located in the digital repository, many objects still remain on other platforms, including DSpace and local network-attached storage. The long-term goal of NIUL is to migrate all of these objects into the same platform, in order to take full advantage of the digital preservation strategies outlined in this document. In the meantime, every effort will be made to provide bit-level preservation for these objects through regular syncing to DuraCloud and backups of data storage areas.
Full Preservation
Items preserved at this level will receive the benefit of all of the above-mentioned preservation activities, as appropriate. Upon ingest into the repository, the material will undergo file format identification and normalization/transformation to archival file formats. Over time, these formats will be monitored by NIUL staff, and should the criteria for format migration be met, the files will be migrated to a new format. In addition, all activities associated with the “Bit-level Preservation” level will be carried out. Objects preserved at this level may be de-escalated to bit-level preservation only at the discretion of the Digital Collections Steering Committee and Preservation Committee.
No Preservation
In rare cases, the repository may contain material for which NIUL is unable or unwilling to accept preservation responsibility. Although incidental preservation activities may take place upon this material, NIUL accepts no responsibility for its long-term accessibility or validity. This level is an exception to NIUL’s digital preservation activities, rather than a part of its preservation strategy, but is included here for completeness. Examples of materials at this level would include objects that are known to be corrupt or not authorized for preservation at any level but which, for some reason, cannot be deleted immediately.
2. SIP, DIP, and AIP Definitions
Submission Information Package (SIP)
The SIP is the information package that is delivered to or created by NIUL for use in the construction of one or more Archival Information Packages (AIPs). SIPs will include the content files and some level of descriptive information, including file naming standards or project directory structure. SIPs may be delivered via electronic transfer (e.g., FTP), loaded from submitted media, or simply mounted (e.g., CD-ROM) on the staging file system for processing.
SIPs may be temporarily stored on backup storage during processing (e.g., metadata creation or remediation, error resolution, etc.); however, this is not meant to be a permanent storage solution. SIPs must be used to create AIPs for long-term preservation, or they must be destroyed according to the retention schedule or other policies governing this process (see Collection Development Policy). Initial deposits may remain with the unit of origin, in accordance with local practice.
The format of the SIP may vary from submitter to submitter, based on the submitter’s willingness or ability to provide the content and metadata in a specific format. For a given content type, file format preferences are described in the Preferred File Formats section below.
Dissemination Information Package (DIP)
OAIS describes a DIP as "the Information Package, derived from a part, or all, of one or more AIPs, received by the Consumer in response to a request to the OAIS."
DIPs are always generated from a single AIP, which resides in the NIUL digital repository, Fedora Commons. User access to digital assets is provided through the Northern Illinois University Digital Library (NIUDL), whenever possible in accordance with donor agreements and copyright law. Depending on their level of access, the user may see basic object metadata, an access version of the digital object, and download links to all available files. Contextual information is provided in the form of links to parent collections.
Archival Information Package (AIP)
The AIP is the information package consisting of the Content Information (CI), Preservation Description Information (PDI), Packaging Information (PI), and Descriptive Information (DI) that is archived by NIUL in the NIUDL. The type and level of content in a NIUDL AIP can vary, depending on the content provided by the submitter.
- Content Information (CI): Consists of the Content Data Object (CDO), which is the focus of preservation, as well as Representation Information (RI), which contains information on the CDO's file format, version, and a reference to a format registry in order to provide information on how to interpret the file. CDO and RI are both stored in Fedora Commons as OBJ and TECHMD datastreams, respectively.
- Preservation Description Information (PDI): Documents and supports the preservation processes, including Reference Information, such as the identifiers used to identify an asset locally (e.g. URI) and globally (e.g. NIUDL PID); Provenance Information, which provides a history of preservation events in the object's lifetime, beginning at ingest into NIUDL and referencing any preservation activities taken on the object (e.g., replacement due to corruption, format migration, fixity checking, etc.); Context Information about how a CDO relates to other CDOs or to other conceptual entities (e.g. a newer version of an object that supersedes an older version); and Fixity Information generated at the time of ingest in order to later determine whether or not the item remains in the same state as when it was ingested. This information can be used to determine integrity of an object being copied within the system (as in the case of a change in storage location), or for periodic integrity checks. In Fedora Commons, PDI is contained in the FOXML, which is used to derive PREMIS records when AIPs are created for long-term cold storage.
- Packaging Information (PI): Combines CI and PDI together into a single logical package, either the FOXML in Fedora Commons or the bag manifest.
- Descriptive Information (DI): Depending on the type of CDO, the format and level of descriptive metadata can vary. In general, every CDO contains both a MODS and a Dublin Core record, which are included with the PDI.
Although the contents of the SIP, DIP, and AIP may vary, depending on the willingness or ability of the content provider to submit preferred formats, they generally contain:
Content Model | SIP | DIP |
---|---|---|
Audio | MODS; OBJ: .wav, .mp3 | PROXY_MP3 (derivative created by LAME encoder) |
Basic Image | MODS; OBJ: .gif, .png, .jpg, .jpeg | MEDIUM_SIZE (compressed image) |
Large Image | MODS; OBJ: .tif, .tiff, .jp2 | JP2, JPG |
Video | MODS; OBJ: .mp4, .avi, .mov, .ogg, .qt, .mkv, .m4v | MP4 (derivative created by ffmpeg); MKV (derivative created by ffmpeg) |
PDF | MODS; OBJ: .pdf | PREVIEW (image created by ImageMagick); FULL-TEXT (created by pdftotext or uploaded) |
Islandora Paged Content | OBJ: .tif, .tiff, .jp2 | JP2, JPG, RELS-INT, OCR, HOCR |
Book | MODS | |
Compound Object | MODS | PDF (as applicable) |
Also included in each DIP are the OBJ, RELS-EXT, MODS and/or DC, TN (thumbnail image, as applicable), and TECHMD.
3. Preferred File Formats
NIUL requires identification of the file format during ingest into the repository in order to help mitigate the risk posed by format obsolescence. While there are no format restrictions on the content data objects that will be accepted, well-known, widely accepted formats that support long-term preservation are strongly preferred. If a content provider wants to submit a specific format that does not adhere to the criteria in this document, an agreement must be reached between the content provider and NIUL.
To this end, NIUL uses DROID, JHOVE, file utility, Exiftool, NLNZ Metadata Extractor, and ffident through the FITS software package to identify file formats on ingest to the repository. An example characterization and format registry reference is shown below:
<identification>
<identity format="Tagged Image File Format" mimetype="image/tiff" toolname="FITS" toolversion="0.6.1">
<tool toolname="Jhove" toolversion="1.5"/>
<tool toolname="file utility" toolversion="5.04"/>
<tool toolname="Exiftool" toolversion="7.74"/>
<tool toolname="Droid" toolversion="3.0"/>
<tool toolname="NLNZ Metadata Extractor" toolversion="3.4GA"/>
<tool toolname="ffident" toolversion="0.2"/>
</identity>
</identification>
NIUL is unable to ingest files with formats that are not recognized by FITS. If FITS identifies a conflict in the identification of a format between two or more tools, that conflict will be recorded in the object’s technical metadata, but the file will be accepted. Format consensus will be evaluated periodically by the repository manager, who will be responsible for investigating and resolving conflicts.
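As a rough illustration of how the identification output above might be inspected, the following Python sketch parses a snippet shaped like the example and reports whether a format was recognized and whether the tools disagree. It is a sketch under stated assumptions, not NIUL's actual workflow: the function name `summarize_identification` is hypothetical, the sample omits the XML namespace and extra detail found in real FITS output, and treating more than one distinct `identity` element as a conflict is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

# Sample shaped like the characterization shown above (abbreviated).
# Real FITS output is namespaced and richer; this sketch ignores that.
SAMPLE = """<identification>
  <identity format="Tagged Image File Format" mimetype="image/tiff" toolname="FITS" toolversion="0.6.1">
    <tool toolname="Jhove" toolversion="1.5"/>
    <tool toolname="Droid" toolversion="3.0"/>
  </identity>
</identification>"""


def summarize_identification(xml_text):
    """Summarize the identities found in an identification block.

    Assumption: each <identity> element is one candidate format, so more
    than one distinct (format, mimetype) pair is treated as a conflict.
    """
    root = ET.fromstring(xml_text)
    identities = []
    for identity in root.iter("identity"):
        identities.append({
            "format": identity.get("format"),
            "mimetype": identity.get("mimetype"),
            "tools": [tool.get("toolname") for tool in identity.findall("tool")],
        })
    distinct = {(item["format"], item["mimetype"]) for item in identities}
    return {
        "identities": identities,
        "recognized": len(identities) > 0,   # no identity: format not recognized
        "conflict": len(distinct) > 1,       # tools disagree on the format
    }


if __name__ == "__main__":
    print(summarize_identification(SAMPLE))
```

A summary like this could feed the periodic review described above, with conflicting identifications flagged for the repository manager rather than blocking ingest.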
Preferred File Formats
NIUL can only commit to providing full-level preservation for preferred formats, which are detailed below. These files are sometimes also referred to as “archival masters.”
Acceptable File Formats
Acceptable formats are those that NIUL will accept, but for which it can only guarantee basic, bit-level preservation.
Normalization
For assets with file formats that do not fall into the preferred or acceptable categories, normalization may be considered, depending on the availability of funding, technology, and/or staff time. In such cases, files will be transformed into a preferred or acceptable format in order to provide access to the content and long-term preservation. However, the original file, including format identification, will be included in the AIP.
Resource Type | Preferred Format | Acceptable Format |
---|---|---|
Image | TIFF | JPEG, GIF, PNG |
Document | | |
Audio | WAV or FLAC | MP3 |
Video | AVI | MPEG, MP4, OGG, MKV, Quicktime |
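The relationship between the format categories and the preservation commitment can be sketched as a simple lookup. The Python example below is illustrative only: the function name `preservation_commitment` and the returned labels are hypothetical, the format lists are copied from the table above, and the Document row is omitted because the table lists no formats for it. Actual preservation levels are assigned according to the Collection Development Policy.

```python
# Format lists copied from the table above. The Document row is omitted
# because the table does not list any formats for it.
PREFERRED = {
    "Image": {"TIFF"},
    "Audio": {"WAV", "FLAC"},
    "Video": {"AVI"},
}
ACCEPTABLE = {
    "Image": {"JPEG", "GIF", "PNG"},
    "Audio": {"MP3"},
    "Video": {"MPEG", "MP4", "OGG", "MKV", "QUICKTIME"},
}


def preservation_commitment(resource_type, file_format):
    """Return a hypothetical label for the commitment a format qualifies for."""
    fmt = file_format.upper()
    if fmt in PREFERRED.get(resource_type, set()):
        return "full"                    # preferred format: full preservation
    if fmt in ACCEPTABLE.get(resource_type, set()):
        return "bit-level"               # acceptable format: bit-level only
    return "consider-normalization"      # neither list: normalization may apply


if __name__ == "__main__":
    for resource_type, fmt in [("Image", "tiff"), ("Audio", "mp3"), ("Video", "wmv")]:
        print(resource_type, fmt, preservation_commitment(resource_type, fmt))
```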
4. Backup and Bit Stream Copying
Data Storage and Software Backup
NIUL is committed to regular backup procedures of both data storage areas and its software infrastructure (e.g. databases, application files). These backups are intended to serve as the basis for restoration of NIUL materials in the case of disaster or large-scale corruption of data. Data backup at NIU is coordinated by Technology Services and the Department of Information Technology. Since the data is stored on physical hardware located in the Libraries, it uses the same backup hardware and software as other University Libraries systems:
- Backups of the production servers are scheduled to happen twice a week. Backups of the working space for in-process objects occur daily.
- Backups are checked for integrity every week using an automated disk safe verification tool.
- Backups are retained for 60 days.
- A compressed backup is sent to a remote data center weekly.
- To test the process of restoring backups, restores of selected backups are run on a monthly basis.
Content Data Object Copying
Copies of the content data objects themselves are created using the BagIt specification. Each “bag” consists of a “payload” and “tags.” The payload contains all of the files in the AIP, as well as a PREMIS record that is derived during bag creation. The tags contain metadata about the payload, including a manifest listing every file and each file’s corresponding checksum (see Fixity Procedures below). Details about optional components contained in the bags are documented in NIUDL's BagIt Profile, which is instantiated in a JSON file that can be externally referenced using an HTTP URI. This profile in no way modifies the canonical BagIt specification.
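To make the payload and tag structure concrete, the following Python sketch writes the two core tag files for a directory that already holds its payload under `data/`. It is a minimal illustration using only the standard library, not the Islandora BagIt module that NIUL uses; the function names, the SHA-1 default, and the BagIt version string are assumptions, and optional tag files such as bag-info.txt are omitted.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path, algorithm="sha1", chunk_size=1 << 20):
    """Hash a file in chunks so large payload files are not read at once."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_minimal_bag(bag_dir, algorithm="sha1"):
    """Write bagit.txt and a payload manifest for files under bag_dir/data."""
    bag_dir = Path(bag_dir)
    payload = sorted(p for p in (bag_dir / "data").rglob("*") if p.is_file())
    # Bag declaration; the version string is an assumption for this sketch.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    # One manifest line per payload file: "<checksum>  <path relative to bag>".
    lines = [
        f"{file_digest(p, algorithm)}  {p.relative_to(bag_dir).as_posix()}\n"
        for p in payload
    ]
    manifest = bag_dir / f"manifest-{algorithm}.txt"
    manifest.write_text("".join(lines), encoding="utf-8")
    return manifest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "data"
        data.mkdir()
        (data / "OBJ.txt").write_bytes(b"example payload")
        print(write_minimal_bag(tmp).read_text(encoding="utf-8"))
```

The manifest written here is the file a later fixity check would read to confirm that each payload file still matches its recorded checksum.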
When new content data objects are ingested or when changes have been made to previously ingested objects, bags are automatically created using the Islandora BagIt module on long-term remote storage, which is located at the Department of Information Technology.
A copy of the content data object bag is then synced with DuraCloud, which is configured to use Amazon S3 storage. S3 includes ongoing checksum calculations on all traffic and automatic self-healing. The repository manager can periodically download “health status” reports from DuraCloud, consisting primarily of integrity checks.
5. Fixity Procedures
NIUL is committed to maintaining the integrity of digital assets in its care. This includes creating checksums for all files ingested into the repository and regular fixity checking of those assets.
NIUDL Repository
On ingest to the repository, the Islandora Checksum Module calculates an MD5 checksum for every file associated with an asset. These values are stored in the object’s audit log. On a bi-monthly basis, the Islandora Checksum Checker module is used to verify that these checksums have not changed, detecting errors that may have been introduced through malicious interference or through degradation of files over time. The results of each verification will also be recorded in the object’s audit log. At the end of each cycle, a completion report will be e-mailed to the repository manager, who will be responsible for restoring from backup any files with mismatched checksums.
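The verification cycle described above can be sketched in a few lines of Python. This is an illustration of the general technique (recompute a digest and compare it with the value recorded at ingest), not the Islandora Checksum Checker itself; the function names and the `recorded` mapping of file paths to MD5 values are hypothetical stand-ins for the audit log.

```python
import hashlib
import os
import tempfile


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(recorded):
    """Compare current digests with those recorded at ingest.

    `recorded` is a hypothetical mapping of file path -> MD5 hex digest.
    Returns one entry per file that is missing, unreadable, or changed.
    """
    mismatches = []
    for path, expected in recorded.items():
        try:
            actual = md5_of(path)
        except OSError:
            actual = None  # file missing or unreadable
        if actual != expected:
            mismatches.append({"path": path, "expected": expected, "actual": actual})
    return mismatches


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "OBJ.tif")
        with open(path, "wb") as handle:
            handle.write(b"original bytes")
        recorded = {path: md5_of(path)}   # value stored at ingest
        print(check_fixity(recorded))      # nothing has changed yet
        with open(path, "wb") as handle:
            handle.write(b"corrupted bytes")
        print(check_fixity(recorded))      # the altered file is reported
```

Any entries returned would correspond to the files the repository manager restores from backup at the end of a cycle.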
Long-Term Local Storage
On ingest to the repository, SHA1 checksums are generated for each file in the AIP. When restoring from backup, these values are used to verify that the files have not changed following deposit in long-term remote storage. There is no fixity checking in local storage.
Long-Term Remote Storage
Checksums are generated by the long-term storage service provider (DuraCloud) and used to verify the integrity of all files deposited. This service generates a report following the completion of each cycle. It is the responsibility of the repository manager to investigate any mismatched checksums in long-term storage.
6. Selection and Appraisal
Selection and appraisal of materials to be preserved by NIUL will be conducted in accordance with the Digital Preservation Policy and Digital Collection Development Policy by curatorial staff and subject specialists from the content provider units, including:
- Rare Books and Special Collections
- Regional History Center and University Archives
- Southeast Asia Collection
- Digital Collections and Scholarship
- Reference & Research (as needed)
Selection will be done in consultation with digital preservation staff in Technical Services. All items selected must adhere to the individual unit’s collection development policy.
All copyrights and literary rights are to be enforced in keeping with their respective deeds of gift documents where applicable.
Legal requirements for record retention must also be observed where applicable, in accordance with Library and University policies.
Content providers are strongly encouraged to submit files in the designated preferred file formats; any non-standard file formats submitted will be transformed into preservation formats where possible, with the option to retain the original file where feasible. All files must be scanned for viruses and malware by the content provider before submission.
7. Review Cycle and Acknowledgments
This plan was approved on May 16, 2017. It will be reviewed annually, or as needed, by the Preservation Committee to assure timely revisions as technology progresses, preservation strategies and experiences mature, and resources change.
Significant portions of this document were inspired by or borrowed from OCUL's Scholars Portal Trusted Digital Repository Documents and York University's Preservation Policy.