Search Engine Technique by Using Similarity Measures

Publisher: Guru Nanak Publications
ISSN: 2278-0947
Series: Volume 4 Issue 1
Authors: Vijayalakshmi. K, Sudarson Jena, R. Rajeswara Rao


The World Wide Web, commonly abbreviated as WWW and generally known as the web, is a system of interlinked documents accessible over the Internet. Search engines are tools that retrieve information relevant to a user’s query. Web search engines return a large number of results, some of which are not relevant to the query entered. These engines have their roots in information retrieval systems, which build a keyword index and respond to a keyword query with a ranked list of documents. This paper deals with optimizing the relevance of the information returned by web search engines using clickthrough data. Information retrieval systems aim to rank more relevant documents above less relevant ones. We propose a new approach to learning a retrieval function that maximizes the information the user prefers. The goal is to develop a method that uses the query log together with the similarity between the ranking generated by a search engine and the user’s preferred ranking. Specifically, we present a method for learning retrieval functions using a Set Similarity Measure (S3M)[4]. The method is shown to be well-founded in a risk minimization framework and is feasible even for large sets of queries. We also develop an evaluation framework in which the performance of the algorithms is compared in terms of whether the clusters (groups of Web search engine rankings and user-clicked preferences) are correctly identified, using a replicated clustering approach.
In addition, we investigate whether clustering performance is affected by different sequence representations and distance measures, as well as by other factors such as the number of actual Web user clusters, the number of Web pages, the similarity between clusters, the minimum session length, the number of user sessions, and the number of clusters to form.
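
The abstract does not define S3M. As an illustration only, the sketch below assumes a formulation that combines an order-sensitive term (longest common subsequence length, normalized by the longer sequence) with an order-insensitive term (Jaccard similarity of the item sets), weighted by p and 1 - p. The function names, the weight p = 0.5, and the document identifiers are hypothetical and are not taken from the paper.

```python
from typing import Hashable, Sequence


def lcs_length(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def s3m(a: Sequence[Hashable], b: Sequence[Hashable], p: float = 0.5) -> float:
    """Assumed S3M form: p * sequence similarity + (1 - p) * set similarity."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty sequences are identical
    seq_sim = lcs_length(a, b) / max(len(a), len(b))
    set_a, set_b = set(a), set(b)
    set_sim = len(set_a & set_b) / len(set_a | set_b)  # Jaccard similarity
    return p * seq_sim + (1 - p) * set_sim


# Hypothetical data: the engine's ranking and the order in which a user clicked.
engine_ranking = ["d1", "d2", "d3", "d4", "d5"]
clicked_order = ["d3", "d1", "d5"]
score = s3m(engine_ranking, clicked_order, p=0.5)
print(f"similarity = {score:.2f}")
```

Under these assumptions, a score near 1 means the engine's ranking closely matches both the content and the order of the user's clicks, while a lower score signals a ranking that a learned retrieval function could try to improve. Raising p puts more weight on order agreement; lowering it puts more weight on shared documents.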


Keywords: Clustering, Distance measure, Ranking, Web search engines.
