Metadata Quality Assessment Plan
This assessment plan applies only to descriptive metadata in Huskie Commons and the NIU Digital Library (NIUDL), including MODS records, which are created for objects in NIUDL, and Dublin Core (DC) records, created for all objects in both repositories. Scope is limited to quantitative analysis of elements and attributes using community best practices, the NIU Data Dictionary, and associated Metadata Project Specifications for evaluation. The strategy described below relies on harvesting and quality analysis scripts developed by Mark Phillips (UNT) and Christina Harlow (Cornell), with additional scripts and modifications developed locally by the Metadata Librarian for harvesting and analyzing MODS records from NIUDL.
Assessment Workflow
The Metadata QA workflow has two stages: harvesting records and analyzing them. Harvesting methods vary depending on the target repository and schema.
Repository | Records | Method | Command |
Huskie Commons | DC | oaiharvest.py | python oaiharvest.py -m oai_dc -o huskiecommons.oaidc.xml -l http://commons.lib.niu.edu/oai/request |
NIUDL | DC | oaiharvest.py | python oaiharvest.py -m oai_dc -o niudl.oaidc.xml -l http://digital.lib.niu.edu/oai2 |
NIUDL | MODS | oaiharvest.py | python oaiharvest.py -m mods -o niudl.mods.xml -l http://digital.lib.niu.edu/oai2 |
By default, these methods harvest every record in the repository. However, each also accepts the optional argument “-s” to narrow the harvest to a specific set or collection. This is often desirable when requirements vary from collection to collection and, in the case of MODS records in NIUDL, because harvesting the entire repository can take several hours.
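For orientation, a set-limited harvest corresponds to an OAI-PMH ListRecords request that carries a set parameter. The following is a minimal sketch of how such a request URL is formed under standard OAI-PMH 2.0 conventions. The helper name build_list_records_url and the set name "example_set" are hypothetical and are not part of oaiharvest.py.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix, set_spec=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    # verb and metadataPrefix are required by OAI-PMH for ListRecords.
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    # The optional "set" parameter restricts the harvest to one set.
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # "example_set" is a placeholder, not a real NIUDL set name.
    print(build_list_records_url("http://digital.lib.niu.edu/oai2", "mods", "example_set"))
```

Omitting the set argument yields the request for the whole repository, which matches the default behavior described above.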
There are two methods for analyzing harvested records, which vary according to the target schema.
Repository | Records | Method | Command |
Huskie Commons | DC | oaidc_analysis.py | python oaidc_analysis.py huskiecommons.oaidc.xml |
NIUDL | DC | oaidc_analysis.py | python oaidc_analysis.py niudl.oaidc.xml |
NIUDL | MODS | oaimods_analysis.py | python oaimods_analysis.py niudl.mods.xml |
Records are analyzed by counting the occurrences of each element in the set and displaying their overall utilization. The completeness of the set is calculated by measuring against the set itself (collection_completeness), recommendations by the Kernel Metadata and Electronic Resource Citations community (wwww_completeness), the DPLA’s minimum requirements (dpla_completeness), and NIU’s data dictionary (niu_completeness). Below is an example analysis of DC records in Huskie Commons.
dc_completeness: all 15 DC elements
collection_completeness: only those elements used within a collection
wwww_completeness: elements recommended by the Kernel Metadata and Electronic Resource Citations community (requires who, what, where, and when)
dpla_completeness: title, rights, and identifier
niu_completeness: required elements in the NIU Data Dictionary
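The idea behind these measures can be illustrated with a short sketch. It scores each record by the share of required elements it contains, then averages across the set, using the three DPLA elements listed above. This is an assumption about the general approach, not a description of how oaidc_analysis.py computes its figures; the function names and the sample records are hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
DPLA_REQUIRED = ("title", "rights", "identifier")  # from the list above


def completeness(record, required):
    """Fraction of required DC elements present with non-empty text."""
    present = 0
    for name in required:
        values = record.findall(f".//{{{DC_NS}}}{name}")
        if any((v.text or "").strip() for v in values):
            present += 1
    return present / len(required)


def average_completeness(records, required):
    """Mean per-record completeness (averaging is an assumption here)."""
    scores = [completeness(r, required) for r in records]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical sample data: one complete record, one with a title only.
SAMPLE = """<records xmlns:dc="http://purl.org/dc/elements/1.1/">
  <record><dc:title>A</dc:title><dc:rights>R</dc:rights><dc:identifier>1</dc:identifier></record>
  <record><dc:title>B</dc:title></record>
</records>"""

if __name__ == "__main__":
    records = ET.fromstring(SAMPLE).findall("record")
    print(average_completeness(records, DPLA_REQUIRED))
```

Swapping DPLA_REQUIRED for a different tuple of element names would give the analogous score for another benchmark, such as the required elements in the NIU Data Dictionary.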
Metadata Quality Assessment Report
A Metadata Quality Assessment Report will be prepared annually, consisting of collection-by-collection analysis and recommendations for remediation. Remediation strategies will be developed and implemented by the Metadata Librarian, with subsequent reports demonstrating any improvements that result from these activities.
The report will only include new or living collections, defined as any collection that has grown or changed in the preceding calendar year.