Keyword Query Routing

January 8, 2017

2147

ABSTRACT:

Keyword search is an intuitive paradigm for searching linked data sources on the web. We propose to route keywords only to relevant sources to reduce the high cost of processing keyword search queries over all sources. We propose a novel method for computing top-k routing plans based on their potentials to contain results for a given keyword query. We employ a keyword-element relationship summary that compactly represents relationships between keywords and the data elements mentioning them. A multilevel scoring mechanism is proposed for computing the relevance of routing plans based on scores at the level of keywords, data elements, element sets, and subgraphs that connect these elements. Experiments carried out using 150 publicly available sources on the web showed that valid plans (precision@1 of 0.92) that are highly relevant (mean reciprocal rank of 0.89) can be computed in 1 second on average on a single PC. Further, we show that routing greatly helps to improve the performance of keyword search without compromising its result quality.
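The abstract reports precision@1 and mean reciprocal rank (MRR). The sketch below shows how these two metrics are commonly computed. It assumes that each query comes with a ranked list of plans and that each plan has been judged correct or not. That input format and the class name RankingMetrics are illustrative only. They are not the evaluation code behind the reported numbers.

```java
import java.util.List;

// Sketch of the two ranking metrics quoted in the abstract. Each query is
// represented by a ranked list of judgements (an assumed input format):
// entry i is true when the plan at rank i + 1 was judged valid/relevant.
class RankingMetrics {

    // precision@1: fraction of queries whose top-ranked plan is judged correct.
    static double precisionAt1(List<List<Boolean>> judgements) {
        if (judgements.isEmpty()) return 0.0;
        int hits = 0;
        for (List<Boolean> ranking : judgements) {
            if (!ranking.isEmpty() && ranking.get(0)) hits++;
        }
        return (double) hits / judgements.size();
    }

    // MRR: mean of 1 / rank of the first correct plan (a query contributes 0 if none).
    static double meanReciprocalRank(List<List<Boolean>> judgements) {
        if (judgements.isEmpty()) return 0.0;
        double sum = 0.0;
        for (List<Boolean> ranking : judgements) {
            for (int i = 0; i < ranking.size(); i++) {
                if (ranking.get(i)) {
                    sum += 1.0 / (i + 1);
                    break;
                }
            }
        }
        return sum / judgements.size();
    }

    public static void main(String[] args) {
        // Three hypothetical queries: first correct plan at rank 1, at rank 2, and never.
        List<List<Boolean>> judged = List.of(
                List.of(true, false),
                List.of(false, true),
                List.of(false, false));
        System.out.println(precisionAt1(judged));       // one of three queries -> 1/3
        System.out.println(meanReciprocalRank(judged)); // (1 + 1/2 + 0) / 3 = 0.5
    }
}
```

precision@1 only looks at the top plan. MRR also rewards systems that place a correct plan near the top.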

AIM:

Linked Data describes a method of publishing structured data so that it can be interlinked and become more useful. Keyword search is an intuitive paradigm for searching Linked Data sources on the web. We propose to route keywords only to relevant sources to reduce the high cost of processing keyword search queries over all sources. In this project, we implement the computation of top-k routing plans, ranked by their potential to contain results for a given keyword query.

SYNOPSIS:

In recent years the Web has evolved from a global information space of linked documents to one where both documents and data are linked. Underpinning this evolution is a set of best practices for publishing and connecting structured data on the Web known as Linked Data. The adoption of the Linked Data best practices has led to the extension of the Web with a global data space connecting data from diverse domains such as people, companies, books, scientific publications, films, music, television and radio programmes, genes, proteins, drugs and clinical trials, online communities, statistical and scientific data, and reviews. This Web of Data enables new types of applications. There are generic Linked Data browsers which allow users to start browsing in one data source and then navigate along links into related data sources. There are Linked Data search engines that crawl the Web of Data by following links between data sources and provide expressive query capabilities over aggregated data, similar to how a local database is queried today. The Web of Data also opens up new possibilities for domain-specific applications. Unlike Web 2.0 mashups, which work against a fixed set of data sources, Linked Data applications operate on top of an unbound, global data space. This enables them to deliver more complete answers as new data sources appear on the Web.

We propose to investigate the problem of keyword query routing for keyword search over a large number of structured and Linked Data sources. Routing keywords only to relevant sources can reduce the high cost of searching for structured results that span multiple sources. To the best of our knowledge, the work presented in this paper represents the first attempt to address this problem.

We use a graph-based data model to characterize individual data sources. In that model, we distinguish between an element-level data graph, which represents relationships between individual data elements, and a set-level data graph, which captures information about groups of elements. This set-level graph essentially captures a part of the Linked Data schema on the web that is represented in RDFS, i.e., relations between classes. Often, the schema for RDF data on the web is incomplete or does not exist at all. In such a case, a pseudoschema can be obtained by computing a structural summary such as a DataGuide.

EXISTING SYSTEM:

Existing work falls into two main categories:

Ø Schema-based approaches

Ø Schema-agnostic approaches

Schema-based approaches are implemented on top of off-the-shelf databases. A keyword query is processed by mapping keywords to elements of the database (called keyword elements). Then, using the schema, valid join sequences are derived and employed to join (“connect”) the computed keyword elements, forming so-called candidate networks that represent possible results to the keyword query.
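The first step above, mapping each keyword to its keyword elements, is often implemented with an inverted index. The sketch below is a minimal version of that step. Its assumptions are a lower-casing, non-word-splitting tokenizer and string element IDs; the class name KeywordElementIndex is hypothetical. The later schema-driven steps are not shown, namely deriving join sequences and building candidate networks.

```java
import java.util.*;

// Minimal inverted index from keywords to the data elements (keyword elements)
// whose text mentions them. Element IDs and the tokenizer are illustrative
// assumptions, not part of any specific keyword search system.
class KeywordElementIndex {
    private final Map<String, Set<String>> index = new HashMap<>();

    // Register an element and index every lower-cased token of its text.
    void addElement(String elementId, String text) {
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) continue; // leading punctuation yields an empty token
            index.computeIfAbsent(token, t -> new TreeSet<>()).add(elementId);
        }
    }

    // Map each query keyword to its keyword elements (empty set if it is not mentioned).
    Map<String, Set<String>> keywordElements(List<String> query) {
        Map<String, Set<String>> result = new LinkedHashMap<>();
        for (String keyword : query) {
            String k = keyword.toLowerCase();
            result.put(k, index.getOrDefault(k, Collections.emptySet()));
        }
        return result;
    }

    public static void main(String[] args) {
        KeywordElementIndex idx = new KeywordElementIndex();
        idx.addElement("country:DE", "Germany Berlin Europe");
        idx.addElement("city:Berlin", "Berlin capital");
        System.out.println(idx.keywordElements(List.of("Berlin", "Europe")));
        // {berlin=[city:Berlin, country:DE], europe=[country:DE]}
    }
}
```

A keyword that maps to an empty set already shows that no candidate network can contain it.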

Schema-agnostic approaches operate directly on the data. Structured results are computed by exploring the underlying data graph. The goal is to find structures in the data called Steiner trees (Steiner graphs in general), which connect keyword elements. Various kinds of algorithms have been proposed for the efficient exploration of keyword search results over data graphs, which might be very large. Examples are bidirectional search and dynamic programming.
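The idea of connecting keyword elements directly in the data graph can be illustrated with a simple "distinct-root" heuristic. The heuristic runs one breadth-first search per keyword. It then picks the node whose summed distance to all keywords is smallest; the shortest paths from that node approximate a Steiner tree. This is a deliberately simpler technique than the bidirectional search or dynamic programming named above. It assumes an undirected graph with unit edge weights, and the class name is hypothetical.

```java
import java.util.*;

// Sketch of connecting keyword elements with a simple "distinct-root"
// heuristic: one multi-source BFS per keyword, then pick the node whose summed
// hop distance to all keywords is smallest. A result tree (not built here)
// would be the union of shortest paths from that root to each keyword, an
// approximation of a Steiner tree. Unit weights and an undirected graph are
// assumptions; this is neither bidirectional search nor dynamic programming.
class DistinctRootSearch {
    private final Map<String, List<String>> adj = new HashMap<>();

    void addEdge(String a, String b) {
        adj.computeIfAbsent(a, x -> new ArrayList<>()).add(b);
        adj.computeIfAbsent(b, x -> new ArrayList<>()).add(a);
    }

    // Hop distance from the nearest node in `sources` to every reachable node.
    private Map<String, Integer> bfs(Set<String> sources) {
        Map<String, Integer> dist = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        for (String s : sources) {
            dist.put(s, 0);
            queue.add(s);
        }
        while (!queue.isEmpty()) {
            String u = queue.poll();
            for (String v : adj.getOrDefault(u, List.of())) {
                if (!dist.containsKey(v)) {
                    dist.put(v, dist.get(u) + 1);
                    queue.add(v);
                }
            }
        }
        return dist;
    }

    // Best root connecting all keyword element sets, or null if none exists.
    String bestRoot(List<Set<String>> keywordElementSets) {
        if (keywordElementSets.isEmpty()) return null;
        List<Map<String, Integer>> dists = new ArrayList<>();
        for (Set<String> elements : keywordElementSets) dists.add(bfs(elements));
        String best = null;
        int bestCost = Integer.MAX_VALUE;
        for (String node : dists.get(0).keySet()) {
            int cost = 0;
            boolean reachesAll = true;
            for (Map<String, Integer> d : dists) {
                Integer hops = d.get(node);
                if (hops == null) { reachesAll = false; break; }
                cost += hops;
            }
            // Ties are broken lexicographically so the result is deterministic.
            if (reachesAll && (best == null || cost < bestCost
                    || (cost == bestCost && node.compareTo(best) < 0))) {
                best = node;
                bestCost = cost;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        DistinctRootSearch g = new DistinctRootSearch();
        g.addEdge("x", "a");
        g.addEdge("x", "b");
        g.addEdge("x", "c");
        // Elements a, b and c mention the three keywords; x connects them in 3 hops total.
        System.out.println(g.bestRoot(List.of(Set.of("a"), Set.of("b"), Set.of("c")))); // x
    }
}
```

Each BFS touches the whole reachable graph. This explains why such exploration becomes expensive when it spans many large sources.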

Existing work on keyword search relies on an element-level model (i.e., data graphs) to compute keyword query results.

DISADVANTAGES OF EXISTING SYSTEM:

Ø The number of potential results may increase exponentially with the number of sources and links between them. Yet many of these results may be unnecessary, especially when they are not relevant to the user.

Ø For the routing problem, existing approaches compute results capturing specific elements at the data level, although routing only needs to identify the relevant sources.

Ø Without routing, keywords are sent to all sources, whether or not those sources are relevant.

PROPOSED SYSTEM:

We propose to route keywords only to relevant sources to reduce the high cost of processing keyword search queries over all sources. We propose a novel method for computing top-k routing plans based on their potentials to contain results for a given keyword query. We employ a keyword-element relationship summary that compactly represents relationships between keywords and the data elements mentioning them. A multilevel scoring mechanism is proposed for computing the relevance of routing plans based on scores at the level of keywords, data elements, element sets, and subgraphs that connect these elements. We propose to investigate the problem of keyword query routing for keyword search over a large number of structured and Linked Data sources.

ADVANTAGES OF PROPOSED SYSTEM:

· Routing keywords only to relevant sources can reduce the high cost of searching for structured results that span multiple sources.

· The routing plans produced can be used to compute results from multiple sources.

MODULES:

ü Linked Data Generation

ü Key level Mapping

ü Multilevel Inter relationship

ü Routing Plan

MODULES DESCRIPTION:

Linked Data Generation

The GeoNames services make it possible to add geospatial semantic information to the World Wide Web. All of the more than 6.2 million GeoNames toponyms now have a unique URL with a corresponding XML web service. In this module, we use the Country Info, Time Zone and Finance Info services. The retrieved data are represented in a graph model that resembles RDF: entities stand for RDF resources, data values stand for RDF literals, and relations and attributes correspond to RDF triples. While this model is primarily used for RDF Linked Data on the web, it is general enough to capture XML and relational data as well.
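The graph model described above can be represented with a very small triple store. In this sketch, entities are resources and data values are literals. A relation links two entities, and an attribute links an entity to a literal. Identifiers such as country:DE and gn:countryName are hypothetical placeholders. They are not the actual GeoNames vocabulary, and the DataGraph class is illustrative.

```java
import java.util.*;

// Minimal sketch of the RDF-like graph model: entities stand for RDF resources,
// data values for RDF literals, and each relation or attribute is stored as a
// (subject, predicate, object) triple. Identifiers such as "gn:countryName"
// are hypothetical placeholders, not the actual GeoNames vocabulary.
class DataGraph {

    static final class Triple {
        final String subject, predicate, object;
        final boolean objectIsLiteral; // true for attributes, false for relations

        Triple(String subject, String predicate, String object, boolean objectIsLiteral) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
            this.objectIsLiteral = objectIsLiteral;
        }
    }

    private final List<Triple> triples = new ArrayList<>();

    // Relation: entity -> entity.
    void addRelation(String subject, String predicate, String objectEntity) {
        triples.add(new Triple(subject, predicate, objectEntity, false));
    }

    // Attribute: entity -> literal data value.
    void addAttribute(String subject, String predicate, String literal) {
        triples.add(new Triple(subject, predicate, literal, true));
    }

    // Entities are all subjects plus every object that is not a literal.
    Set<String> entities() {
        Set<String> result = new TreeSet<>();
        for (Triple t : triples) {
            result.add(t.subject);
            if (!t.objectIsLiteral) result.add(t.object);
        }
        return result;
    }

    // Literal values attached to an entity (the text later used for keyword matching).
    List<String> literalsOf(String entity) {
        List<String> result = new ArrayList<>();
        for (Triple t : triples) {
            if (t.subject.equals(entity) && t.objectIsLiteral) result.add(t.object);
        }
        return result;
    }

    public static void main(String[] args) {
        DataGraph g = new DataGraph();
        g.addAttribute("country:DE", "gn:countryName", "Germany");
        g.addRelation("country:DE", "gn:timezone", "tz:Europe/Berlin");
        g.addAttribute("tz:Europe/Berlin", "gn:gmtOffset", "1.0");
        System.out.println(g.entities());               // [country:DE, tz:Europe/Berlin]
        System.out.println(g.literalsOf("country:DE")); // [Germany]
    }
}
```

The same structure can hold rows of a relational table or XML elements: each row or element becomes an entity, and its fields become attributes.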

Key level Mapping

The set-level graph essentially captures a part of the Linked Data schema on the web that is represented in RDFS, i.e., relations between classes. Often, the schema for RDF data on the web is incomplete or does not exist at all. In such a case, a pseudoschema can be obtained by computing a structural summary such as a DataGuide. A set-level data graph can therefore be derived from either a given schema or a generated pseudoschema. The web of data is modeled as a web graph $W = (G, N, E)$, where $G$ is the set of all data graphs, $N$ is the set of all nodes, and $E$ is the set of all “internal” edges that connect elements within a particular source.
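The sketch below shows one way to derive a set-level graph when no schema is given. Every entity is replaced by its type. A set-level edge is then kept whenever at least one element-level edge connects instances of those types. This is a simplified, type-based pseudoschema rather than a full DataGuide construction. The Untyped fallback class and the method names are hypothetical.

```java
import java.util.*;

// Sketch of deriving a set-level graph (a simple pseudoschema) from an
// element-level graph: each entity is replaced by its type, and a set-level
// edge (typeA, predicate, typeB) is kept whenever at least one element-level
// edge connects instances of those types. This is a simplified type-based
// summary, not a full DataGuide construction.
class SetLevelSummary {
    static final String UNTYPED = "Untyped"; // hypothetical class for entities without a type

    // elementEdges holds {subject, predicate, object} triples between entities.
    static Set<List<String>> deriveSetLevelEdges(Map<String, String> typeOf,
                                                 List<String[]> elementEdges) {
        Set<List<String>> setEdges = new LinkedHashSet<>();
        for (String[] e : elementEdges) {
            String fromType = typeOf.getOrDefault(e[0], UNTYPED);
            String toType = typeOf.getOrDefault(e[2], UNTYPED);
            setEdges.add(List.of(fromType, e[1], toType)); // duplicate type-level edges collapse here
        }
        return setEdges;
    }

    public static void main(String[] args) {
        Map<String, String> typeOf = Map.of(
                "Berlin", "City", "Paris", "City",
                "DE", "Country", "FR", "Country");
        List<String[]> edges = List.of(
                new String[] {"Berlin", "capitalOf", "DE"},
                new String[] {"Paris", "capitalOf", "FR"},
                new String[] {"Berlin", "locatedIn", "DE"});
        System.out.println(deriveSetLevelEdges(typeOf, edges));
        // [[City, capitalOf, Country], [City, locatedIn, Country]]
    }
}
```

In this example, three element-level edges collapse into two set-level edges. This compression is what makes the set-level graph cheap to search compared with the element-level data graph.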

Multilevel Inter relationship

The search space of keyword query routing is modeled using a multilevel inter-relationship graph, which captures inter-relationships between elements at different levels. A keyword is mentioned in some entity descriptions at the element level. Entities at the element level are associated with a set-level element via their type. A set-level element is contained in a source. There is an edge between two keywords if two elements at the element level mentioning these keywords are connected via a path. We propose a ranking scheme that deals with relevance at these multiple levels.
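The keyword-level edges described above can be computed with a breadth-first search. The search starts from the elements that mention one keyword and stops at the first element that mentions the other. The path-length bound MAX_HOPS is a hypothetical parameter added to keep the search cheap. The document does not specify such a bound, and the class and method names are illustrative.

```java
import java.util.*;

// Sketch of keyword-level edges in the multilevel inter-relationship graph:
// two keywords are linked when an element mentioning one is connected by a
// path to an element mentioning the other. MAX_HOPS is a hypothetical bound
// added so the path search stays finite and cheap.
class KeywordRelationshipSummary {
    static final int MAX_HOPS = 3;

    private final Map<String, List<String>> adj = new HashMap<>();
    private final Map<String, Set<String>> elementsByKeyword = new HashMap<>();

    void addEdge(String a, String b) {
        adj.computeIfAbsent(a, x -> new ArrayList<>()).add(b);
        adj.computeIfAbsent(b, x -> new ArrayList<>()).add(a);
    }

    // Record that `element` mentions `keyword` (link from keyword level to element level).
    void mention(String keyword, String element) {
        elementsByKeyword.computeIfAbsent(keyword, k -> new HashSet<>()).add(element);
    }

    // Fewest hops between elements of k1 and k2, or -1 if none within MAX_HOPS.
    int keywordDistance(String k1, String k2) {
        Set<String> targets = elementsByKeyword.getOrDefault(k2, Set.of());
        Map<String, Integer> dist = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        for (String start : elementsByKeyword.getOrDefault(k1, Set.of())) {
            dist.put(start, 0);
            queue.add(start);
        }
        while (!queue.isEmpty()) {
            String u = queue.poll();
            int d = dist.get(u);
            if (targets.contains(u)) return d; // BFS order makes this the minimum
            if (d == MAX_HOPS) continue;       // do not expand beyond the bound
            for (String v : adj.getOrDefault(u, List.of())) {
                if (!dist.containsKey(v)) {
                    dist.put(v, d + 1);
                    queue.add(v);
                }
            }
        }
        return -1;
    }

    // Keyword-level edges as "k1-k2:hops", with keywords in alphabetical order.
    List<String> keywordEdges() {
        List<String> keywords = new ArrayList<>(new TreeSet<>(elementsByKeyword.keySet()));
        List<String> edges = new ArrayList<>();
        for (int i = 0; i < keywords.size(); i++) {
            for (int j = i + 1; j < keywords.size(); j++) {
                int d = keywordDistance(keywords.get(i), keywords.get(j));
                if (d >= 0) edges.add(keywords.get(i) + "-" + keywords.get(j) + ":" + d);
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        KeywordRelationshipSummary s = new KeywordRelationshipSummary();
        s.addEdge("e1", "e2");
        s.addEdge("e2", "e3");
        s.mention("berlin", "e1");
        s.mention("germany", "e2");
        s.mention("euro", "e3");
        s.mention("tokyo", "e4"); // e4 is not connected to anything
        System.out.println(s.keywordEdges()); // [berlin-euro:2, berlin-germany:1, euro-germany:1]
    }
}
```

These keyword pairs and their distances can be precomputed once per source and stored as a compact summary. At query time, routing then only needs to look them up instead of searching the data graph.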

Routing Plan

Given the web graph $W = (G, N, E)$ and a keyword query $K$, the mapping $K \rightarrow 2^{G}$ that associates a query with a set of data graphs is called a keyword routing plan $RP$. A plan $RP$ is considered valid w.r.t. $K$ when the union of its data graphs contains a result for $K$. The problem of keyword query routing is to find the top-k keyword routing plans based on their relevance to a query. A relevant plan should correspond to the information need intended by the user.
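The brute-force sketch below finds top-k routing plans. Several simplifications are assumptions, not taken from the document. Each keyword is routed to exactly one source. A plan is kept only if every keyword pair is linked according to a precomputed summary, used here as a stand-in for validity. A plan's score is the product of hypothetical per-keyword source scores, a placeholder for the multilevel scoring rather than its actual formula. The source names geo, dbpedia and finance are also hypothetical.

```java
import java.util.*;

// Brute-force sketch of top-k keyword routing. Simplifying assumptions that are
// NOT taken from the source: each keyword is routed to exactly one source, a plan
// is kept only if every keyword pair is linked in a precomputed summary, and a
// plan's score is the product of hypothetical per-keyword source scores (a
// stand-in for the multilevel scoring, not its actual formula).
class TopKRouting {

    static final class Plan {
        final Map<String, String> sourceOf; // keyword -> data graph (source)
        final double score;

        Plan(Map<String, String> sourceOf, double score) {
            this.sourceOf = sourceOf;
            this.score = score;
        }

        @Override
        public String toString() {
            return sourceOf + " score=" + score;
        }
    }

    // Summary key meaning "k1 in source s1 is connected to k2 in source s2" (order-independent).
    static String key(String k1, String s1, String k2, String s2) {
        String a = k1 + "@" + s1, b = k2 + "@" + s2;
        return a.compareTo(b) <= 0 ? a + "|" + b : b + "|" + a;
    }

    static List<Plan> topK(List<String> query, Map<String, Map<String, Double>> keywordScores,
                           Set<String> linkedPairs, int k) {
        // Min-heap on score: the head is always the weakest of the current top-k plans.
        PriorityQueue<Plan> best = new PriorityQueue<>(Comparator.comparingDouble((Plan p) -> p.score));
        enumerate(query, 0, new LinkedHashMap<>(), 1.0, keywordScores, linkedPairs, k, best);
        List<Plan> result = new ArrayList<>(best);
        result.sort(Comparator.comparingDouble((Plan p) -> p.score).reversed());
        return result;
    }

    private static void enumerate(List<String> query, int i, Map<String, String> partial, double score,
                                  Map<String, Map<String, Double>> keywordScores, Set<String> linkedPairs,
                                  int k, PriorityQueue<Plan> best) {
        if (i == query.size()) {
            best.add(new Plan(new LinkedHashMap<>(partial), score));
            if (best.size() > k) best.poll(); // drop the lowest-scoring plan
            return;
        }
        String keyword = query.get(i);
        for (Map.Entry<String, Double> candidate : keywordScores.getOrDefault(keyword, Map.of()).entrySet()) {
            String source = candidate.getKey();
            boolean linked = true;
            for (Map.Entry<String, String> prev : partial.entrySet()) {
                if (!linkedPairs.contains(key(prev.getKey(), prev.getValue(), keyword, source))) {
                    linked = false;
                    break;
                }
            }
            if (!linked) continue; // prune: no plan extending this choice can pass the check
            partial.put(keyword, source);
            enumerate(query, i + 1, partial, score * candidate.getValue(), keywordScores, linkedPairs, k, best);
            partial.remove(keyword);
        }
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> scores = Map.of(
                "berlin", Map.of("geo", 0.9, "dbpedia", 0.6),
                "euro", Map.of("finance", 0.8, "dbpedia", 0.5));
        Set<String> linked = Set.of(
                key("berlin", "geo", "euro", "finance"),
                key("berlin", "geo", "euro", "dbpedia"),
                key("berlin", "dbpedia", "euro", "dbpedia"));
        // Expected order: geo+finance (about 0.72), then geo+dbpedia (about 0.45).
        for (Plan p : topK(List.of("berlin", "euro"), scores, linked, 2)) System.out.println(p);
    }
}
```

Full enumeration grows exponentially with the number of keywords. A practical router would instead explore candidate plans in score order and stop once k plans are found. The bounded min-heap shown here only keeps memory proportional to k.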

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS:

Ø System : Pentium IV 2.4 GHz.

Ø Hard Disk : 40 GB.

Ø Floppy Drive : 1.44 MB.

Ø Monitor : 15-inch VGA Colour.

Ø Mouse : Logitech.

Ø RAM : 512 MB.

SOFTWARE REQUIREMENTS:

Ø Operating system : Windows XP/7.

Ø Coding Language : Java/J2EE

Ø IDE : Eclipse Kepler