Automatic generation of warehouse mediators using an ontology engine
This article is part of the collection entitled: Office of Scientific & Technical Information Technical Reports and was provided to UNT Digital Library by the UNT Libraries Government Documents Department.
schema change. Section 4 presents the DataFoundry
ontology and motivates the design decisions, while
Section 5 outlines how it is used to generate the mediators.
We conclude with a discussion of possible future research
directions.
2 Related Work
There are two areas of ontology-based research
related to this work: using ontologies to describe
domain-specific knowledge, and using ontologies as an
information resource for database integration. Efforts
such as Cyc [LG90], the Generalized Upper Model
[BMR], the Unified Medical Language System (UMLS)
[COS], and Miller's work on information capacity
[MIR93] are primarily concerned with accurately
representing a particular subset of domain-specific
knowledge. Such work focuses on ensuring that the semantics
represented in the model accurately reflect the real-world
semantics of domain-specific concepts. Other efforts, such
as the Information Manifold [KLS95, LRO96, FKL97],
TSIMMIS [CMH94], and InfoSleuth [BBB97], utilize
ontologies as an information repository used to guide
program execution. In these projects the semantics of the
concepts represented in the ontology, while important, are
not the focus of the work. Because this is similar to our
approach, two of these projects, the Information Manifold
and TSIMMIS, are discussed in slightly more detail.
The Information Manifold uses a federated database
architecture to provide a consistent view of data
represented in a distributed heterogeneous environment.
The emphasis of the project is on correctly identifying the
relevant set of data sources for a particular query. To this
end, an ontology is used to represent information about
the contents of each data source. The available
information includes both the type of information
available from the data source, represented as a query on
the global schema, and the coverage of the source,
represented as the likelihood that an arbitrary instance of
the data set is present in the data source. When given a
query, the Information Manifold uses this information to
identify the relevant data sources and order them based
on how likely they are to contain information required
by the query. By querying the most relevant sources first,
this approach makes effective use of limited resources.
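The source-ordering idea described above can be illustrated with a minimal sketch. It is not the Information Manifold's actual mechanism: the names SourceDescription and rank_sources, the sample sources, and the coverage values are hypothetical, and a simple set of topics stands in for the "query on the global schema" that describes each source's contents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceDescription:
    """Hypothetical ontology entry describing one data source."""
    name: str
    topics: frozenset   # assumption: stand-in for a query on the global schema
    coverage: float     # likelihood that an arbitrary instance is in this source


def rank_sources(sources, query_topics):
    """Keep the sources relevant to the query, most likely to help first."""
    relevant = [s for s in sources if s.topics & query_topics]
    return sorted(relevant, key=lambda s: s.coverage, reverse=True)


# Illustrative data only; none of these sources come from the paper.
SOURCES = [
    SourceDescription("source_a", frozenset({"protein"}), 0.40),
    SourceDescription("source_b", frozenset({"protein", "gene"}), 0.90),
    SourceDescription("source_c", frozenset({"taxonomy"}), 0.75),
]

ranked = rank_sources(SOURCES, frozenset({"protein"}))
print([s.name for s in ranked])
```

Under these assumptions source_c is dropped as irrelevant, and source_b is queried before source_a because its coverage is higher.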
TSIMMIS takes a different approach to integrating
heterogeneous data sources, providing a multi-database
interface instead of a federated schema. As a result, the
end user must resolve the semantic and syntactic conflicts
that arise between the data sources. To help ease this
burden, the table attributes are tagged with metadata
describing the value's semantics. For example, a
temperature field could be marked as "degrees
Fahrenheit." While this is a practical approach to
accessing heterogeneous data sources, multi-databases
place a heavy burden on the end user, and are infeasible
unless all users are familiar with the semantic
differences between the underlying databases.
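As a rough illustration of semantic tagging, the sketch below attaches a tag to each value and leaves the reconciliation to the consumer, which is the burden the text describes. It does not reproduce TSIMMIS's data model; TaggedValue, to_celsius, and the tag strings other than "degrees Fahrenheit" are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedValue:
    """A value paired with metadata describing its semantics (hypothetical)."""
    value: float
    semantics: str  # e.g. "degrees Fahrenheit"


def to_celsius(t: TaggedValue) -> float:
    """Resolve the semantic conflict by hand, as an end user would have to."""
    if t.semantics == "degrees Celsius":
        return t.value
    if t.semantics == "degrees Fahrenheit":
        return (t.value - 32.0) * 5.0 / 9.0
    raise ValueError(f"unrecognized semantics tag: {t.semantics!r}")


# Two sources report the same temperature in different units.
readings = [
    TaggedValue(212.0, "degrees Fahrenheit"),
    TaggedValue(100.0, "degrees Celsius"),
]
print([to_celsius(r) for r in readings])
```

The tag makes the conflict visible, but every consumer still needs a rule like to_celsius for each pair of representations it encounters.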
An important issue, orthogonal to what information is
contained within an ontology, is how that information is
represented. One of the major problems in the area of
ontology research is the inability to transfer knowledge
between different systems. This arises not only because
the systems represent different, possibly conflicting,
aspects of a problem domain, but also because the actual
data representations are incompatible. This is a particularly
significant problem in the domain of intelligent agents,
where knowledge sharing between heterogeneous systems
is a critical requirement. In an effort to help alleviate this
problem, the Stanford Knowledge Systems Laboratory
project, as part of the DARPA knowledge sharing effort,
has developed Ontolingua [Gru92, FFR97], a language
capable of translating between various ontology
representations, including the Knowledge Interchange
Format (KIF) [GSS94] and LOOM [Mac93]. While not a
complete solution, this work should reduce the effort
required to exchange knowledge between existing
knowledge bases.
As previously discussed, ontologies are a common
method of representing knowledge for interacting with
heterogeneous data sources. DataFoundry's usage differs
from these efforts in that it represents a different set of
knowledge: the knowledge required to identify and
resolve both syntactic and semantic conflicts between
information stored in these sources. The DataFoundry
ontology is represented using Ontolingua because its
ability to translate an ontology into several different
formats provides the greatest flexibility.
3 The Role of Mediators
In a mediated warehouse architecture, mediators are
used to obtain data from a source and transfer it to a
warehouse. This data is then used either to populate
warehouse tables or to dynamically answer a query; the
distinction is irrelevant for purposes of this discussion. Ideally, the
mediator interacts with the data source through a wrapper,
which is capable of interpreting the data source and
forwarding the required data to the mediator. Depending
on how the data source is represented (e.g., relational
database, flat files, etc.), the wrapper may simply be the
default DBMS interface, or it may be custom built.
Unfortunately, when the wrapper is custom built, the
distinction between the wrapper and mediator may
become blurred. Frequently in operational databases the
wrapper, transformation, and population functionality are
combined into a single program, as shown in Figure 1(a).
In this situation, whenever the source schema changes,
both the wrapper and the transformation methods must be
manually updated to reflect these modifications. This can
entail a significant effort due to the complexity of the
resulting code.
T. Critchlow, M. Ganesh, R. Musick
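The ideal separation of wrapper and mediator described in this section might look like the following sketch. It is an illustration under stated assumptions, not DataFoundry's code: the tab-separated source format, the field names, and the functions flat_file_wrapper and mediator are all hypothetical.

```python
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, str]


def flat_file_wrapper(lines: Iterable[str]) -> Iterator[Record]:
    """Wrapper: interprets the source format only (assumed tab-separated)."""
    for line in lines:
        accession, temp_f = line.rstrip("\n").split("\t")
        yield {"accession": accession, "temp_f": temp_f}


def mediator(records: Iterable[Record]) -> List[Dict[str, object]]:
    """Mediator: transforms source records into the warehouse representation."""
    return [
        {
            "id": r["accession"].upper(),
            "temp_c": (float(r["temp_f"]) - 32.0) * 5.0 / 9.0,
        }
        for r in records
    ]


# Illustrative source data; a real wrapper would read a file or query a DBMS.
source_lines = ["ab123\t212\n", "cd456\t32\n"]
warehouse_rows = mediator(flat_file_wrapper(source_lines))
print(warehouse_rows)
```

With the two roles kept apart, a change to the file layout touches only flat_file_wrapper, while a change to the warehouse representation touches only mediator. When both are fused into one program, any source schema change forces edits throughout.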
Critchlow, T. Automatic generation of warehouse mediators using an ontology engine, article, April 1, 1998; California. (https://digital.library.unt.edu/ark:/67531/metadc688375/m1/4/?rotate=270: accessed April 24, 2024), University of North Texas Libraries, UNT Digital Library, https://digital.library.unt.edu; crediting UNT Libraries Government Documents Department.