Core Infrastructure Considerations for Large Digital Libraries Page: 6
iv, 19 p.
Geneva Henry
Model, however, thought should be given to alternative database
implementations. Under consideration are
* modifying Solr, for example by caching and precomputing search
results, to accommodate continued growth;
* moving to the NoSQL document database and combining that
with Solr; and
* developing a management solution for RDF triple stores. This
option is of some concern, however, because billions of triples would
be needed to represent the content, which could have a significant
negative impact on performance.
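The first option, caching and precomputing search results, can be
illustrated with a minimal sketch. It assumes a least-recently-used
cache placed in front of a slow search backend. The names CachedSearch
and toy_backend are hypothetical, and nothing here is a Solr API; the
backend is a stand-in so that the example runs on its own.

```python
from collections import OrderedDict
from typing import Callable, List


class CachedSearch:
    """LRU cache in front of a search backend (hypothetical wrapper, not a Solr API)."""

    def __init__(self, backend: Callable[[str], List[str]], capacity: int = 1000) -> None:
        self._backend = backend
        self._capacity = capacity
        self._cache: "OrderedDict[str, List[str]]" = OrderedDict()
        self.backend_calls = 0  # counts how often the slow backend is actually hit

    def precompute(self, queries: List[str]) -> None:
        """Warm the cache with queries expected to be popular."""
        for query in queries:
            self.search(query)

    def search(self, query: str) -> List[str]:
        key = " ".join(query.lower().split())  # normalize case and whitespace
        if key in self._cache:
            self._cache.move_to_end(key)  # mark as most recently used
            return self._cache[key]
        self.backend_calls += 1
        results = self._backend(key)
        self._cache[key] = results
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)  # evict the least recently used entry
        return results


def toy_backend(query: str) -> List[str]:
    """Stand-in for a real index: substring match over a two-document corpus."""
    corpus = {
        "doc1": "digital library infrastructure",
        "doc2": "library metadata",
    }
    return sorted(doc for doc, text in corpus.items() if query in text)
```

Precomputing trades storage and staleness for latency: cached results
must be invalidated or rebuilt when the underlying index changes, which
this sketch does not attempt.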
The choices made regarding content format, metadata, and
search and browse strategies, as well as the ability of the project to
maintain the technology, should inform the selection of a
repository platform or data management infrastructure.
2.4 Content Distribution and Format Assumptions
Decisions about content management may affect the approach taken
to ensuring the content is available when needed. One means of
providing reliable access is to mirror (i.e., replicate) the digital library
in multiple locations so that if there is a problem at one location, the
mirrored site can provide continued access to the resources and
services. Mirroring across multiple geographic locations can facilitate
better, more reliable delivery of the resources than does a single site.
Examples of large archives that have mirror sites include the Internet
Archive, with the production site in San Francisco and the mirror site
at Bibliotheca Alexandrina in Egypt; Europeana, with mirrored sites
at its host provider's data centers in Amsterdam and Almere in the
Netherlands (it is unclear which site is the primary production site);
and HathiTrust, with the production site at the University of
Michigan and the mirror site at Indiana University's Indianapolis campus.
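How a client might take advantage of a mirror can be sketched as
follows. This is an illustration under stated assumptions, not a
description of how any of the archives named above route requests. The
function fetch_with_failover, the exception AllSitesUnavailable, and
the fetch callable are hypothetical; the caller supplies the function
that actually retrieves an item from a site.

```python
from typing import Callable, List, Sequence, Tuple


class AllSitesUnavailable(Exception):
    """Raised when neither the production site nor any mirror responds."""


def fetch_with_failover(
    sites: Sequence[str],
    fetch: Callable[[str, str], bytes],
    item_id: str,
) -> Tuple[str, bytes]:
    """Try each site in order and return (site, content) from the first that answers.

    `sites` lists the production site first, followed by its mirrors.
    `fetch` is a caller-supplied function (an assumption of this sketch)
    that raises OSError, or a subclass such as ConnectionError or
    TimeoutError, when a site cannot be reached.
    """
    errors: List[str] = []
    for site in sites:
        try:
            return site, fetch(site, item_id)
        except OSError as exc:
            errors.append(f"{site}: {exc}")  # remember the failure, try the next site
    raise AllSitesUnavailable("; ".join(errors))
```

Failover of this kind addresses availability only. Keeping the mirror's
content consistent with the production site is a separate replication
problem.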
Mirroring is not the only means of configuring systems to ensure
continued high availability. The projects and organizations
mentioned in the previous paragraph are also attentive to load balancing
their servers and distributing functions across multiple servers. High
availability configurations will generally support scalability so that
as traffic increases and content grows, systems will continue to meet
user demands. NSDL supports high availability by using a Fedora-level
transaction journaling system developed for the project. This
system allows for replication of transactions in real time to two
"follower" systems, ensuring minimal downtime in the event of updates
and failures (Krafft, Birkland, and Cramer 2008).
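The general idea of journal-based replication can be sketched in a few
lines. This is a simplified, in-memory illustration and not the NSDL or
Fedora implementation: the classes Repository and Leader, and the shape
of a journal entry, are assumptions made for the example. Each write is
appended to an ordered journal, and followers replay any entries they
have not yet applied.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# (sequence number, object id, new content, or None to delete the object)
Entry = Tuple[int, str, Optional[str]]


@dataclass
class Repository:
    """A store that applies journal entries in sequence order."""
    objects: Dict[str, str] = field(default_factory=dict)
    last_applied: int = 0

    def apply(self, entry: Entry) -> None:
        seq, object_id, content = entry
        if seq <= self.last_applied:
            return  # already applied, so replaying is harmless
        if content is None:
            self.objects.pop(object_id, None)
        else:
            self.objects[object_id] = content
        self.last_applied = seq


@dataclass
class Leader:
    """Accepts writes, journals them, and replicates them to followers."""
    repo: Repository = field(default_factory=Repository)
    journal: List[Entry] = field(default_factory=list)
    followers: List[Repository] = field(default_factory=list)

    def write(self, object_id: str, content: Optional[str]) -> None:
        entry = (len(self.journal) + 1, object_id, content)
        self.journal.append(entry)  # journal first, so the change can be replayed
        self.repo.apply(entry)
        for follower in self.followers:
            self.catch_up(follower)

    def catch_up(self, follower: Repository) -> None:
        # Sequence numbers start at 1 and are contiguous, so the entry with
        # sequence number n sits at list index n - 1.
        for entry in self.journal[follower.last_applied:]:
            follower.apply(entry)
```

Because a follower replays the journal from its own last applied
sequence number, a follower that joins late or misses updates converges
to the leader's state on the next write. A production system would also
need durable journal storage and network transport, which are omitted
here.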
Backup and restore services are critical for the recovery of
content in the event of a catastrophic failure. Such services are among
the simplest but most fundamental for any repository. The frequency
of backups and the media chosen for backups (e.g., tape, disk) vary
across projects. As discussed previously, storage solutions can
have redundancy built into their architecture. In addition to the
redundancy provided in its clustered storage, HathiTrust provides
Henry, Geneva. Core Infrastructure Considerations for Large Digital Libraries, text, July 2012; Washington, DC. (https://digital.library.unt.edu/ark:/67531/metadc98133/m1/11/: accessed April 24, 2024), University of North Texas Libraries, UNT Digital Library, https://digital.library.unt.edu.