IMPROVING BUSINESS ON WEB USING WEB MINING
Posted On April 13, 2012 by Sneha Latha filed under Miscellaneous
The Internet has changed the rules for today’s businesses, which now increasingly face the challenge of improving and sustaining performance throughout the enterprise. The growth of the World Wide Web and its enabling technologies has made data collection, data exchange and information exchange easier and has sped up most major business functions. Delays in retail, manufacturing, shipping, and customer service processes are no longer accepted as necessary evils, and firms that improve these (and other) critical functions have an edge in the battle for margins. Technology has been brought to bear on myriad business processes and has effected massive change in the form of automation, tracking, and communications, but many of the most profound changes are yet to come. Leaps in computational power have enabled businesses to collect and process large amounts of data. The availability of data and of the necessary computational resources, together with the potential of data mining, shows great promise of transforming the way businesses perform their work. Well-known successes of companies such as Amazon.com provide evidence to that end. By leveraging the large repositories of data collected by corporations, data mining techniques and methods offer unprecedented opportunities for understanding business processes and predicting future behavior. With the Web serving as the realm of many of today’s businesses, firms can improve their ability to know when and what customers want by understanding customer behavior, find bottlenecks in internal processes, and better anticipate industry trends.
2. Web Mining
Web mining is the application of data mining techniques to extract knowledge from Web data, including Web documents, hyperlinks between documents, and usage logs of Web sites. A panel organized at ICTAI 1997 (Srivastava and Mobasher, 97) asked the question "Is there anything distinct about Web mining (compared to data mining in general)?" While no definitive conclusions were reached then, the tremendous attention paid to Web mining in the past decade, and the number of significant ideas that have been developed, have answered this question in the affirmative. Many informative surveys exist in the literature that address various aspects of Web mining (Cooley et al, 1997; Kosala and Blockeel, 2000; Mobasher, 2005).
Two different approaches have been taken in defining Web mining. The first was a 'process-centric view', which defined Web mining as a sequence of tasks (Etzioni, 1996). The second was a 'data-centric view', which defined Web mining in terms of the types of Web data that were being used in the mining process (Cooley et al, 1997).
The second definition has become more acceptable, as is evident from the approach adopted in most recent papers that have addressed the issue. In this chapter, we use the data-centric view of Web mining, which is defined as, “Web mining is the application of data mining techniques to extract knowledge from Web data, i.e. Web Content, Web Structure and Web Usage data.”
The attention paid to Web mining in research, the software industry, and Web-based organizations has led to the accumulation of considerable experience. Its application in business computing has also proved highly useful.
2.1 Web Mining Taxonomy
Web mining can be broadly divided into three distinct categories according to the kinds of data to be mined. We provide a brief overview of the three categories; an illustration depicting the taxonomy is shown in Figure 2.
2.1.1 Web content mining:
Web content mining is the process of extracting useful information from the contents of Web documents. Content data corresponds to the collection of information on a Web page that is conveyed to users. It may consist of text, images, audio, video, or structured records such as lists and tables. The application of text mining to Web content has been the most widely researched. Issues addressed in text mining include topic discovery, extraction of association patterns, clustering of Web documents and classification of Web pages.
Research activities on this topic have drawn heavily on techniques developed in other disciplines such as Information Retrieval (IR) and Natural Language Processing (NLP). While a significant body of work exists on extracting knowledge from images in the fields of image processing and computer vision, the application of these techniques to Web content mining has been limited.
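As an illustration of how text mining can support the clustering and classification of Web documents, the following is a minimal sketch that weights terms with TF-IDF and compares pages by cosine similarity. TF-IDF and cosine similarity are standard Information Retrieval techniques chosen here for illustration; the article does not prescribe them. The function names and the sample page texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per document, weight = TF * log(N / DF)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical page texts, invented for this example.
pages = [
    "cheap flights and hotel deals",
    "hotel deals and cheap flights to paris",
    "python data mining tutorial",
]
vectors = tf_idf_vectors(pages)
print(cosine(vectors[0], vectors[1]))  # the two travel pages share terms
print(cosine(vectors[0], vectors[2]))  # no shared terms
```

Pages with similar vocabulary score closer to 1, which is the kind of signal a clustering or classification step could build on.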
2.1.2 Web structure mining:
Web structure mining is the process of discovering structure information from the Web. The structure of a typical Web graph consists of Web pages as nodes and hyperlinks as edges connecting related pages. Web structure mining can be further divided into two kinds based on the type of structured information used.
• Hyperlinks: A hyperlink is a structural unit that connects a location in a Web page to a different location, either within the same Web page or on a different Web page. A hyperlink that connects to a different part of the same page is called an Intra-Document Hyperlink, and a hyperlink that connects two different pages is called an Inter-Document Hyperlink. There has been a significant body of work on hyperlink analysis (see the survey paper on hyperlink analysis, Desikan et al, 2002).
• Document Structure: The content within a Web page can also be organized in a tree-structured format, based on the various HTML and XML tags within the page. Here, mining efforts have focused on automatically extracting document object model (DOM) structures out of documents.
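The distinction between intra-document and inter-document hyperlinks can be sketched with Python's standard library. This is only an illustrative sketch: the class and function names and the example URLs are hypothetical, and a link is treated as intra-document when it resolves to the same URL once the fragment is ignored.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def classify_links(page_url, html):
    """Split a page's links into (intra_document, inter_document) lists."""
    parser = LinkCollector()
    parser.feed(html)
    base = urldefrag(page_url).url  # page URL without any #fragment
    intra, inter = [], []
    for href in parser.hrefs:
        target = urljoin(page_url, href)  # resolve relative links
        if urldefrag(target).url == base:
            intra.append(target)
        else:
            inter.append(target)
    return intra, inter


# Hypothetical page, invented for this example.
html = '<a href="#top">Top</a> <a href="b.html">B</a> <a href="http://other.org/">Other</a>'
intra, inter = classify_links("http://example.com/a.html", html)
print(intra)
print(inter)
```

The inter-document links collected this way form the edges of the Web graph described above, with pages as nodes.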
2.1.3 Web usage mining:
Web usage mining is the application of data mining techniques to discover interesting usage patterns from Web data, in order to understand and better serve the needs of Web-based applications. Usage data captures the identity or origin of Web users along with their browsing behavior at a Web site. Web usage mining itself is further classified depending on the kind of usage data used:
• Web Server Data: User logs are collected by the Web server. Typical data includes IP address, page reference and access time.
• Application Server Data: Commercial application servers, such as WebLogic, have significant features in the framework to enable E-commerce applications to be built on top of them with little effort. A key feature is the ability to track various kinds of business events and log them in application server logs.
• Application Level Data: New kinds of events can always be defined in an application, and logging can be turned on for them, generating histories of these specially defined events.
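To make the Web server data concrete, the sketch below extracts the IP address, page reference and access time from one log line. It assumes the log is written in the widely used Common Log Format; the article does not say which format is used, and the sample line and function name are hypothetical.

```python
import re
from datetime import datetime

# Assumed layout (Common Log Format):
# host ident user [time] "METHOD path protocol" status size
CLF_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<page>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_log_line(line):
    """Return a dict of usage fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    return {
        "ip": match.group("ip"),
        "page": match.group("page"),
        "time": datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        "status": int(match.group("status")),
    }


# Hypothetical log line, invented for this example.
sample = '192.0.2.10 - - [13/Apr/2012:10:15:32 +0530] "GET /products/42 HTTP/1.1" 200 5120'
record = parse_log_line(sample)
print(record["ip"], record["page"], record["time"].isoformat())
```

Records parsed this way are the raw input for the usage analyses discussed in the rest of the article.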
3. How Is Web Mining Going to Make the User’s Life Easier?
The ultimate objective of most data mining projects is to use the resulting insights and models to improve the business, sharing the results across departments through effective visualizations such as charts and maps.
3.1 Solution for Business Decision Problems of a Retailer’s E-Commerce Web Site Using Web Mining
In an e-commerce web site, web mining adds value by reducing the time needed to analyze users’ behavior on the site and by revealing web site usage trends. Such a site requires a web data mining system for business users and data analysts as an end-to-end solution comprising data gathering, cleansing, ETL operations, and warehousing. The resulting business intelligence system provides user-friendly, flexible, dynamic, multidimensional factual reporting, supported by visualization and web data mining techniques.
In the e-commerce web site, the data sources include not only customer registration and demographic information but also web click-streams, responses to direct-mail and email campaigns, and orders placed through the website or the call center, among other sources. The quantity of data can exceed 100 million records. The e-commerce web site architecture can collect additional click-stream data besides the data in the web logs; web logs hold sensitive information such as customers’ logins, session information, IP addresses indicating the area or region they belong to, their age, and how frequently they use the web site. A focus on business-to-customer (B2C) e-commerce for retailers helps in understanding the business needs, developing the required domain expertise, and designing out-of-the-box reports and analyses that present future trends and patterns of customer behavior clearly to the business user. Such a system can answer business questions such as: Who are the heavy spenders at the web site? Are the customers who agree to receive emails from the web site heavy spenders? Such answers reflect customer loyalty. Based on these results, the business decision maker can design promotional and discount offers and increase the likelihood of new customer registrations at the web site.
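The business questions about heavy spenders and email opt-in can be sketched as a small aggregation. This is an illustrative sketch only: the record layout, the spending threshold and the function names are assumptions, not part of any system described in the article.

```python
from collections import defaultdict


def heavy_spenders(orders, threshold):
    """Return the set of customer ids whose total order value meets the threshold.

    orders is an iterable of (customer_id, amount) pairs (assumed layout).
    """
    totals = defaultdict(float)
    for customer_id, amount in orders:
        totals[customer_id] += amount
    return {cid for cid, total in totals.items() if total >= threshold}


def opt_in_rate_among_heavy_spenders(orders, email_opt_in, threshold):
    """Fraction of heavy spenders who agreed to receive emails."""
    heavy = heavy_spenders(orders, threshold)
    if not heavy:
        return 0.0
    return len(heavy & email_opt_in) / len(heavy)


# Hypothetical data, invented for this example.
orders = [("c1", 300.0), ("c1", 250.0), ("c2", 40.0), ("c3", 600.0)]
email_opt_in = {"c1", "c2"}
print(heavy_spenders(orders, threshold=500.0))
print(opt_in_rate_among_heavy_spenders(orders, email_opt_in, threshold=500.0))
```

In a real deployment, with data volumes like those mentioned above, the same aggregation would more likely run as a query in the data warehouse than in application memory.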
3.2 Solution to the Search Engine Problems and How Web Mining Can Help in Improving the Business Decisions
Because search engines draw on the enormous amount of information in web sites and web pages, engineering, implementing and improving a search engine is a challenging task. Indexing web pages alone is a huge undertaking, and search engines receive tens of millions of queries per day, which indicates the tremendous magnitude of the data. Beyond the problem of scaling traditional search techniques to data of this magnitude, new technical challenges arise in using the additional information present in hypertext to produce better search results. The real question of how to build a practical large-scale system that can exploit hypertext information can be answered by using web mining techniques, which improve the capabilities of search engines and give better results to customers. Web mining also helps with the problem of dealing effectively with uncontrolled hypertext collections, where anyone can publish anything they want. Web mining applications are used by web sites in areas such as Web search (e.g., Google and Yahoo), Web vertical search (e.g., FatLens and Become), Web recommendations (e.g., Amazon.com), Web advertising (e.g., Google and Yahoo), and Web site design (e.g., landing page optimization).
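One well-known way of exploiting hypertext information for ranking is PageRank, which scores a page by the scores of the pages that link to it. The article does not name a specific algorithm, so the following is only a minimal sketch of that technique; the damping factor of 0.85, the iteration count and the tiny example graph are assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a {page: [linked pages]} mapping."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives the same small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Pass this page's rank on equally to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: a and b both link to c, and c links back to a.
graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
for page, score in sorted(pagerank(graph).items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

The page with the most incoming links receives the highest score, which is the intuition behind using link structure in addition to page content.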
4. The Various Business Areas Where Web Mining has Helped in Improving the Business Decision Making
Analysis of click-stream data, i.e. web mining, uncovers real-time e-business opportunities across geographies. It provides ways to target the right customers, understand their needs, and customize services and strategies in real or near-real time. Advertising is no exception: it uses the opportunities provided by online customer analytics to promote the right products in real time to the right customer. Web mining also helps measure the effectiveness of a web site as a marketing channel by quantifying users’ behavior while on the web site.
Analytical CRM applies business intelligence and reporting methodologies such as data mining and analytical processing to CRM applications. While earlier CRM implementations focused on improving operational efficiency in the sales and service functions through tailor-made solutions for call-center management, analytical CRM solutions use intelligence solutions to analyze the data, identify demographic profiles and measure the purchase frequency and other behavioral patterns of customers. Given the amount of available online content, organizations today put a premium on understanding, adopting and managing it and on converting it into knowledge that helps them serve their customers better, improve operations and accelerate the delivery of products to markets. The World Wide Web is a fertile area for web mining, which can provide applications, methods and algorithms that benefit various real-world applications with respect to the critical e-CRM function.
4.3 Customer Behavior
Web mining helps in understanding concerns such as the current and future profitability of every customer and the relationship between behavior and loyalty at the website. Models based on customer-centric web behavior can be used not only for identifying improvements in the appeal of the web site and for segmentation based on web behavior, which provides a precise basis for personalization, but also for predicting customers’ future behavior, which is essential for website content planning and design.
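One simple way to predict a visitor's next step from past behavior is a first-order Markov model over page transitions. This is a sketch of one possible technique, not a method the article prescribes; the function names and the sample sessions are hypothetical.

```python
from collections import Counter, defaultdict


def build_transitions(sessions):
    """Count how often each page is followed directly by each other page."""
    counts = defaultdict(Counter)
    for pages in sessions:
        for current, following in zip(pages, pages[1:]):
            counts[current][following] += 1
    return counts


def predict_next(counts, page):
    """Return (most likely next page, estimated probability), or None if unseen."""
    followers = counts.get(page)
    if not followers:
        return None
    next_page, hits = followers.most_common(1)[0]
    return next_page, hits / sum(followers.values())


# Hypothetical browsing sessions, invented for this example.
sessions = [
    ["home", "catalog", "cart"],
    ["home", "catalog", "product"],
    ["home", "catalog", "cart", "checkout"],
]
transitions = build_transitions(sessions)
print(predict_next(transitions, "catalog"))
```

A site could use such predictions for content planning, for example by making the most likely next page easier to reach.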
4.4 Web Usage Mining for Proxy Server
Web usage mining is an aspect of data mining that has received a lot of attention in recent years. Commercial companies as well as academic researchers have developed an extensive array of tools that run data mining algorithms on log files from web servers in order to identify user behavior on a particular website. Performing this kind of investigation on your website can provide information that can be used to better accommodate users’ needs. An area that has received much less attention is the investigation of user behavior on proxy servers. Servers of Internet Service Providers (ISPs) log traffic from thousands of users to many websites. This can give a general overview of user behavior on the Internet or an overview of behavior within a specific sector.
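Before usage patterns can be mined from server or proxy logs, the requests of each user are usually grouped into sessions. The sketch below splits a user's requests whenever the gap between two consecutive requests exceeds a timeout. The 30-minute timeout, the tuple layout and the function name are assumptions made for illustration.

```python
from collections import defaultdict


def sessionize(requests, timeout_seconds=1800):
    """Group (user, timestamp_seconds, url) records into per-user sessions.

    Returns a list of (user, [urls in visit order]) tuples.
    """
    by_user = defaultdict(list)
    for user, timestamp, url in requests:
        by_user[user].append((timestamp, url))

    sessions = []
    for user, hits in by_user.items():
        hits.sort()  # order each user's requests by time
        current = [hits[0][1]]
        last_timestamp = hits[0][0]
        for timestamp, url in hits[1:]:
            if timestamp - last_timestamp > timeout_seconds:
                # Gap too long: close the current session and start a new one.
                sessions.append((user, current))
                current = []
            current.append(url)
            last_timestamp = timestamp
        sessions.append((user, current))
    return sessions


# Hypothetical proxy log records, invented for this example.
requests = [
    ("u1", 0, "/a"),
    ("u1", 600, "/b"),
    ("u1", 4000, "/c"),
    ("u2", 100, "/a"),
]
for user, urls in sessionize(requests):
    print(user, urls)
```

On a proxy server the user key might be a client IP address or an authenticated account, depending on what the proxy records.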
4.5 Web Site Service Quality Improvement
The World Wide Web is one of the most used interfaces for accessing remote data and commercial and non-commercial services, and the number of actors involved in these transactions is growing very quickly. Everyone who uses the Web knows that the connection to a popular website may be very slow during rush hours, and it is well known that web users tend to leave a site if the wait time for a page to be served exceeds a given value. Therefore, performance and service quality attributes have gained enormous relevance in service design and deployment. This has led to the development of web benchmarking tools that are widely available in the market. One of the most common criticisms of this approach is that the synthetic workload produced by web stressing tools is far from realistic. Moreover, websites need to be analyzed to discover commercial rules, and user profiles and models must be extracted from log files and monitored data.
In today’s era, where the entire world has become a global village and the driving force is the Internet, spanning e-business, blogs, and search engines, the major question facing business users is how to retain existing customers while also understanding the patterns and trends of customer behavior, so that their decisions can be supported with facts presented through visualizations and appropriate reporting made possible by web mining. The accuracy of the derived patterns generally improves with the amount of sample data used by the data mining techniques. The advantages of using web mining in search engines, e-commerce, CRM, customer behavior analysis, cross-selling, and web site service quality improvement are noticeable. Web mining techniques can be applied successfully when they follow a keen analysis of clearly understood business needs and requirements. Another governing factor is the amount of data: the more voluminous the data, the more closely the predicted trends and patterns are likely to reflect actual behavior.
Although web mining techniques can be applied even to small web sites with few web pages and links, web mining may not be the answer for improving them: given the complexity of the algorithms involved, it may not be the optimum solution in terms of cost. Possible further applications include on-line social networking community software, which can use web mining techniques to explore the effectiveness of on-line networking, and knowledge management web sites. Web mining can also be useful in bioinformatics, e-governance and e-learning.
- Berson Alex, et al, 2000. Building Data Mining Applications for CRM. Publishers, TATA McGRAW HILL, New Delhi, INDIA.
- G. K. Gupta, 2006. Introduction to Data Mining with Case Studies. Publishers, Prentice Hall, New Delhi, INDIA.
- N. Girija, 2006. Web Mining. Publishers, ICFAI University Press, Hyderabad, INDIA.
- J. Srivastava, et al, 2000. Web Usage Mining: Discovery and Applications of Usage patterns from Web Data, ACM SIGKDD Explorations, Vol 1, No 2, pp 12-23.
- Magdalini Eirinaki et al, 2003. Web Mining for web personalization. In ACM Transactions on Internet Technology, Vol.3, No. 1, pp 1- 27.
- R. Kosala, et al, 2000. Web Mining Research: A Survey, ACM SIGKDD Explorations, Vol 2, No 1, pp 1-15.
- S. Chakrabarti, et al, 2000. Data Mining for Hypertext: A Tutorial Survey, ACM SIGKDD Explorations, Vol 1, No 2, pp 1-11.
- Joshi, A. et al, 2000. On mining web access logs. In ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, pp 63-69
- Magdalini Eirinaki, et al, 2005. Web path recommendations based on page ranking and Markov models, Proceedings of the 7th annual ACM international workshop on web information and data management, Bremen, Germany.
By Mohit Gupta, Rachit Mohan Garg
M.Tech Student, Jaypee University of Information Technology, H.P