Clickthrough Data and the World Wide Web
A real dilemma for most existing Web search engines is that users expect accurate results, yet the queries they submit are short, fewer than two terms on average, which limits how well an engine can answer them. In recent years, much work in the web search community has gone into expanding query terms with similar keywords in order to refine the results that search engines return. One basic idea is to use click-through data, which records the interactions between users and the search engine, as feedback for learning the similarity between query keywords. Existing work on similarity measures for search engines is normally classified into two categories: query-term-based query expansion and document-term-based query expansion. The former measures the similarity between query terms by propagating similarity through the web pages that were clicked, while the latter measures the similarity between document terms and search queries, based primarily on the search engine's query log. The intuition behind this idea is that web pages are similar if they are visited by users who issue related queries, and queries are similar if the corresponding users visit related pages.
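The intuition that queries are similar when their users click related pages can be sketched in a few lines. This is only an illustration, assuming a query log reduced to (query, clicked URL) pairs and using Jaccard overlap of the clicked-page sets as the similarity; the function names, the log entries, and the choice of Jaccard are assumptions made for the example, not something the text prescribes.

```python
from collections import defaultdict


def clicked_pages_by_query(log):
    """Group a log of (query, clicked_url) pairs into query -> set of URLs."""
    pages = defaultdict(set)
    for query, url in log:
        pages[query].add(url)
    return dict(pages)


def query_similarity(pages, q1, q2):
    """Jaccard overlap of the pages clicked for two queries (0.0 to 1.0)."""
    a = pages.get(q1, set())
    b = pages.get(q2, set())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical log entries, invented purely for illustration.
log = [
    ("svm", "example.org/svm-tutorial"),
    ("svm", "example.org/kernel-methods"),
    ("support vector machine", "example.org/svm-tutorial"),
    ("support vector machine", "example.org/kernel-methods"),
    ("support vector machine", "example.org/svm-light"),
    ("jaguar car", "example.org/jaguar-dealers"),
]

pages = clicked_pages_by_query(log)
print(query_similarity(pages, "svm", "support vector machine"))
print(query_similarity(pages, "svm", "jaguar car"))
```

Two queries that share most of their clicked pages score close to 1, while queries with no clicked pages in common score 0. A real system would also need to handle noise and sparsely clicked queries, which this sketch ignores.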
Web personalisation has become a popular and critical problem as the number of Internet users continues to grow and ready access to information becomes a necessity. As a result, web developers try to customise websites to meet users' needs, drawing on knowledge obtained from user navigation behaviour. User page visits are essentially sequential in nature, so they call for efficient clustering algorithms designed for sequential data. The sequence and set similarity measure (S3M) suits this setting because it captures both the order in which web pages are visited and their content.
The expected outcome of this research project is a method for searching web documents efficiently while learning retrieval functions with set similarity measures. Challenges in achieving this outcome include limited time, the availability of resources on similarity measures as a search engine technique, and the collection of data relevant to the research problem. Other researchers have approached these constraints by managing time and resources carefully, restricting themselves to the resources their projects actually need, and adhering to research ethics. Some focus only on the immediate problem of their projects and disregard issues that are unrelated or that might distract from their line of argument. However, problems remain with these approaches, since feedback on the work may require the researchers to redo or revise parts of their discussion. This research avoids the problems encountered by previous researchers by strategically planning the various phases of the project.
In this approach, the researcher held a number of sessions to plan the project carefully, considering its problem, target, resources, timeframe, significance, aims and objectives, and expected contribution or outcome. This benefits the researcher because it organises the stages needed to complete the research project.
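The idea of a similarity measure that captures both the order of page visits and their content, mentioned earlier in this section, can be illustrated with a small sketch. It assumes a weighted sum of two parts: a longest-common-subsequence ratio for order and a Jaccard overlap for content. The weighting, the function names, and the sample sessions are assumptions for illustration and should not be read as the exact definition used in the research.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two page sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def jaccard(a, b):
    """Overlap of the sets of pages visited, ignoring order."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def sequence_set_similarity(a, b, order_weight=0.5):
    """Weighted mix of order similarity (LCS ratio) and content similarity.

    order_weight is an assumed tuning parameter between 0 and 1.
    """
    if not a or not b:
        return 0.0
    order = lcs_length(a, b) / max(len(a), len(b))
    return order_weight * order + (1 - order_weight) * jaccard(a, b)


# Two hypothetical sessions: same pages, slightly different order.
s1 = ["home", "products", "cart", "checkout"]
s2 = ["home", "cart", "products", "checkout"]
print(sequence_set_similarity(s1, s2))
```

The two sessions visit the same pages, so the content part is 1.0, but only three of the four visits line up in order, so the order part is 0.75. Raising `order_weight` makes the measure more sensitive to navigation order, and lowering it favours shared content.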
Clickthrough Data Applied in Search Engines
Click-through data in search engines can be thought of as triplets (q, r, c) consisting of the query q, the ranking r presented to the user, and the set c of links the user clicked. Figure 1 illustrates this with an example: the user issued the query “support vector machine”, received the ranking shown in Figure 1, and then clicked on the links ranked 1, 3, and 7. Since each query corresponds to one triplet, the amount of data potentially available is practically unlimited. Users do not click on links at random; they make a somewhat informed choice. Although clickthrough data is typically noisy and clicks are not perfect relevance judgments, the clicks are likely to convey some information. The main question, therefore, is how this information can be extracted. Before deriving a model of how clickthrough data can be analysed, it is also important to consider how it is recorded.
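The (q, r, c) triplet and the example query can be written down as a small data structure. The sketch below also shows one possible way to read information out of the clicks, assuming that a clicked link is preferred over every unclicked link ranked above it. That reading is an assumption added for illustration, and the class name, the function name, and the placeholder links are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clickthrough:
    query: str                # q: the query text
    ranking: tuple[str, ...]  # r: result links in the order shown
    clicked: frozenset[int]   # c: 1-based ranks of the links clicked


def preference_pairs(record):
    """Return (preferred_rank, skipped_rank) pairs.

    Assumption: a clicked link is preferred over each unclicked link
    that was ranked above it.
    """
    pairs = []
    for rank in sorted(record.clicked):
        for higher in range(1, rank):
            if higher not in record.clicked:
                pairs.append((rank, higher))
    return pairs


record = Clickthrough(
    query="support vector machine",
    ranking=tuple(f"link{i}" for i in range(1, 11)),  # placeholder links
    clicked=frozenset({1, 3, 7}),
)
print(preference_pairs(record))
# [(3, 2), (7, 2), (7, 4), (7, 5), (7, 6)]
```

Under this reading, the click on link 3 suggests it is more relevant than the skipped link 2, and the click on link 7 suggests it is more relevant than links 2, 4, 5, and 6. The click on link 1 yields no pair because nothing was ranked above it.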
Although clickthrough data has received significant attention in the web search community as a basis for measuring the similarity between query terms, most existing work ignores the important fact that the similarity between query terms frequently evolves over time.