Current proposals for
languages to query semi-structured data provide only limited capabilities for
flexible querying, with no ability to rank the answers for users.
Our research involves
the investigation and development of techniques for enabling users to query
semi-structured data in a flexible fashion. This is achieved by allowing the
user to specify approximation and relaxation operations on the conditions of
a query, so that query results can be returned ranked by how closely they
match the original query.
Application areas
The outcomes from this
research will be useful in domains where users may not be familiar with the
structure of the data or where they may want to browse the data in an
exploratory manner. One application which is currently being investigated as a
case study is the L4All system. This
system allows users to create and maintain a record of their personal learning
and work experiences to date (visualised in the form of a timeline), as well as
their future learning and career aspirations. Users can search over this
information, with the aim of supporting collaborative formulation of future
learning goals and aspirations.
Even though L4All users are able to pose queries for
finding relevant timelines and the learning and work episodes within them, the
flexibility of the querying mechanisms provided by the system is limited. The case study aims to extend L4All by allowing users to specify
approximations and relaxations to be applied to their initial search query.
Query results will then be returned incrementally, ranked in order of
increasing "edit distance" from the original query.
An example
To illustrate the
principle of query approximation, assume we have the data
below.
Jane's timeline, where "next" indicates the sequencing of successive episodes in the
timeline and "prereq" indicates that Jane has stated that undertaking an earlier
episode was necessary in order for her to be able to proceed to a later episode.
Tom might pose a query
asking which jobs have an "English Studies" degree as a "prereq"
(prerequisite). Without query approximation, no results from Jane's timeline
would be returned, even though it is clear that this timeline would be of
interest to Tom.
However, some answers can be returned by applying
query approximation to Tom's query. Replacing "prereq" in the query by
"next", at an edit cost of 1, returns the answer Air Travel Assistant
(from episode ep2). Replacing "prereq" by "next" and inserting a
second "next", at a combined edit cost of 2, returns the answer Journalist.
Inserting "next" twice in front of "prereq", also at a combined edit
cost of 2, returns the answer Assistant Editor.
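
The ranking in this example can be reproduced with a small sketch. It is only an illustration under stated assumptions, not the project's implementation. The graph is a hypothetical reconstruction of Jane's timeline, inferred from the three answers described above: the episode identifiers ep1, ep3 and ep4, the edge list, and the names EPISODES, EDGES, ranked_answers and ranked_jobs are all assumptions. The query is simplified to a sequence of edge labels, and only the two edit operations mentioned in the example are modelled, substitution and insertion, each at a cost of 1. A shortest-path search over (episode, query position) pairs then yields answers in order of increasing edit distance.

```python
import heapq

# Hypothetical reconstruction of Jane's timeline (an assumption, inferred
# from the answers in the text; the original figure is not available).
EPISODES = {
    "ep1": ("degree", "English Studies"),
    "ep2": ("job", "Air Travel Assistant"),
    "ep3": ("job", "Journalist"),
    "ep4": ("job", "Assistant Editor"),
}
EDGES = [
    ("ep1", "next", "ep2"),
    ("ep2", "next", "ep3"),
    ("ep3", "next", "ep4"),
    ("ep3", "prereq", "ep4"),
]


def ranked_answers(start, query, edges=EDGES):
    """Return [(episode, cost)] ordered by non-decreasing edit cost.

    `query` is a list of edge labels. Following an edge whose label equals
    the current query label costs 0; substituting a different label costs 1;
    following an extra edge without consuming a query label (insertion)
    costs 1.
    """
    outgoing = {}
    for source, label, target in edges:
        outgoing.setdefault(source, []).append((label, target))

    best = {(start, 0): 0}      # cheapest known cost per (episode, position)
    heap = [(0, start, 0)]      # entries are (cost, episode, query position)
    answers, seen = [], set()

    while heap:
        cost, node, pos = heapq.heappop(heap)
        if cost > best.get((node, pos), float("inf")):
            continue            # stale heap entry
        if pos == len(query):   # whole query consumed: node is an answer
            if node not in seen:
                seen.add(node)
                answers.append((node, cost))
            continue
        for label, target in outgoing.get(node, []):
            step = 0 if label == query[pos] else 1
            moves = [
                (target, pos + 1, cost + step),  # exact match or substitution
                (target, pos, cost + 1),         # insertion of this label
            ]
            for nxt, nxt_pos, nxt_cost in moves:
                if nxt_cost < best.get((nxt, nxt_pos), float("inf")):
                    best[(nxt, nxt_pos)] = nxt_cost
                    heapq.heappush(heap, (nxt_cost, nxt, nxt_pos))
    return answers


def ranked_jobs(start, query):
    """Keep only answers that are jobs, reported by title."""
    return [
        (EPISODES[ep][1], cost)
        for ep, cost in ranked_answers(start, query)
        if EPISODES[ep][0] == "job"
    ]


if __name__ == "__main__":
    # Tom's query: jobs reachable from the English Studies degree via "prereq".
    print(ranked_jobs("ep1", ["prereq"]))
    # [('Air Travel Assistant', 1), ('Journalist', 2), ('Assistant Editor', 2)]
```

Because the search always expands the cheapest state first, answers can be handed to the user incrementally as they are found. On this assumed graph no answer has cost 0, which corresponds to the exact query returning nothing from Jane's timeline.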