    mphillips$ python pyoaiharvest.py -l http://digital.library.unt.edu/explore/collections/I

The above command will harvest all records from the OAI-PMH repository available at the URL http://digital.library.unt.edu/explore/collections/UNTSW/oai/ and save them as a file named untsw.dc.xml. The default metadata format, oai_dc, is requested from the repository and transmitted back to the harvester.

For a full list of command-line options, see the help screen for the script:

    mphillips$ python pyoaiharvest.py -h
    Usage: pyoaiharvest.py [options]

    Options:
      -h, --help            show this help message and exit
      -l LINK, --link=LINK  URL of repository
      -o FILENAME, --filename=FILENAME
                            write repository to file
      -f FROMDATE, --from=FROMDATE
                            harvest records from this date yyyy-mm-dd
      -u UNTIL, --until=UNTIL
                            harvest records until this date yyyy-mm-dd
      -m MDPREFIX, --mdprefix=MDPREFIX
                            use the specified metadata format
      -s SETNAME, --setName=SETNAME
                            harvest the specified set

The tool supports requesting a specific setSpec, requesting a different metadata format, or limiting the harvest to a date range, which is useful for updating collections of existing records.

Repository Breakers

Once a set of metadata records has been harvested, the next step is converting the records into a text format that can act as input to common command-line tools. The tool dc_breaker.py is used for this function. It takes the output of the pyoaiharvest.py script as input and provides a set of options for converting it into formats easily used by other command-line tools.

    mphillips$ python dc_breaker.py untsw.dc.xml
    1000 records processed

    {http://purl.org/dc/elements/1.1/}contributor:  ==============  1032/1835
    {http://purl.org/dc/elements/1.1/}coverage:     ==               172/1835
    {http://purl.org/dc/elements/1.1/}creator:                      1834/1835
    {http://purl.org/dc/elements/1.1/}date:                         1807/1835
    {http://purl.org/dc/elements/1.1/}description:                  1832/1835
    {http://purl.org/dc/elements/1.1/}format:                       1835/1835
    {http://purl.org/dc/elements/1.1/}identifier:                   1835/1835
    {http://purl.org/dc/elements/1.1/}language:                     1835/1835
    {http://purl.org/dc/elements/1.1/}publisher:    =========        681/1835
    {http://purl.org/dc/elements/1.1/}relation:     ====             351/1835
    {http://purl.org/dc/elements/1.1/}rights:                       1671/1835
    {http://purl.org/dc/elements/1.1/}source:                       1426/1835
    {http://purl.org/dc/elements/1.1/}subject:                      1835/1835
    {http://purl.org/dc/elements/1.1/}title:                        1835/1835
    {http://purl.org/dc/elements/1.1/}type:                         1835/1835

    dc completeness 79.258856
    collection completeness 84.250681
    wwww completeness 99.604905
    average completeness 87.704814

The example above runs the tool without any options selected. This generates output that is helpful for seeing the overall utilization of fields within a collection of metadata records. It lists the fifteen elements of the oai_dc metadata scheme, each with a bar visualizing the percentage of records in the repository file that have at least one value for that element. Next comes a column giving the number of records that contain the field out of the total number of records in the collection and, finally, the percentage utilization of the field in the collection.
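
The per-element counts described above can be approximated in a few lines of Python. The following is a minimal sketch, not the dc_breaker.py implementation: it assumes each harvested record is wrapped in the standard oai_dc:dc container, and it assumes that an element with only empty text does not count as a value. The names SAMPLE, field_utilization, and report are hypothetical, introduced only for this illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespaced tag names as ElementTree reports them.
OAI_DC = "{http://www.openarchives.org/OAI/2.0/oai_dc/}dc"
DC_NS = "{http://purl.org/dc/elements/1.1/}"

# Tiny stand-in for a harvested file (hypothetical data, two records).
SAMPLE = """<records xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <oai_dc:dc>
    <dc:title>First record</dc:title>
    <dc:creator>Doe, Jane</dc:creator>
    <dc:creator>Roe, Richard</dc:creator>
  </oai_dc:dc>
  <oai_dc:dc>
    <dc:title>Second record</dc:title>
    <dc:creator></dc:creator>
  </oai_dc:dc>
</records>"""


def field_utilization(xml_text):
    """Return (total_records, Counter mapping each dc element to the
    number of records with at least one non-empty value for it)."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    total = 0
    for record in root.iter(OAI_DC):
        total += 1
        # Build a set so a repeated element counts once per record.
        present = {
            child.tag
            for child in record
            if child.tag.startswith(DC_NS) and (child.text or "").strip()
        }
        counts.update(present)
    return total, counts


def report(total, counts):
    """Format one 'element: count/total | percent' line per element."""
    lines = []
    for tag in sorted(counts):
        pct = 100.0 * counts[tag] / total
        lines.append("%s: %d/%d | %.2f%%" % (tag, counts[tag], total, pct))
    return lines


total, counts = field_utilization(SAMPLE)
for line in report(total, counts):
    print(line)
```

With the two sample records, title is counted in both records and creator in only one, because the second record's creator element is empty. Collecting the tags into a set per record is what makes this a count of records with at least one value, rather than a count of values.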