
Advances in metrics and e-commerce evaluation techniques
Author(s) - Constantine J. Aivalis (Κωνσταντίνος Αϊβαλής)
Publication year - 2021
Language(s) - English
Resource type - Dissertations/theses
DOI - 10.12681/eadd/48115
Subject(s) - data science , computer science
Web Application Evaluation and Analytics Systems is intrinsically a multidisciplinary topic of Informatics, since its primary goal is to collect data from every possible facet of the operational environment of a web application system, process it and provide insight to management. Analytics applications are primarily based on Network Programming techniques, Data Structure Design and Implementation, Database Management Systems, Human Factors and Interfaces, Web Development Frameworks, Data and Information Visualization and Data Mining. Social Media, with their broad content and their Application Programming Interfaces, also play a part, as do mobile devices, which enhance the experience by extending the geographical boundaries of accessibility. This research aims at analyzing the status quo, studying and proposing advances in Web Analytics applications and providing a software prototype that includes innovative Analytics techniques. Problems and arising issues are pointed out, and appropriate solutions and techniques are presented. These provide software specifications and implementations of systems that offer full insight into the way web and E-Commerce applications operate, as well as into the way they are visited and used by customers and the public on the Internet. Metrics concerning performance, customer habits and visitor behavior are examined, and various algorithms and environments that have been developed to provide them are described. Hardware and software innovations that result in a perpetual evolution of the platforms and foundations used to develop and operate Web Applications are taken into consideration. How this evolution of platforms is dealt with is a topic that is analyzed here as well. The perpetual evolution triggers the development of appropriate techniques to accommodate and support adaptive measurement technologies and Analytics applications. Several solutions that have been implemented are presented in this work.
Big Data techniques allow horizontal scaling of the volume of data the application can support, as well as enrichment of the variety of data sources. They provide higher accuracy and velocity, save time, give a more exact picture of the operation and, finally, offer remote accessibility to the e-commerce application. In addition to data collected internally by the web server that runs the web application and enriched by Big Data techniques, data sources such as various Social Media applications are used to enhance the information collection even further. Social Media offer many possibilities for combining personal information from external sources and allow the analyzer to complete the insight. There are various forms of Analytics applications and add-ons. In general, Analytics applications can be viewed as compound systems, or often as mash-up applications, that operate as web-based applications, desktop applications or both. The workflow consists of four general groups of functionalities that can be conceptually viewed and implemented in a variety of ways:
• Data Collection
• Data Preparation & Storage
• Computational and Mining
• Result Presentation and Data Visualization
The data collection procedure is traditionally based on extracting data from various sources and from the log files generated by the web server that hosts the web application. Additional sources are used in parallel; they enhance the information and provide a more consistent and global view of the users’ profiles. The collected data need preparation before storage. This preparation involves locating the data sources, cleaning up redundant and unnecessary data, separating data fields, grouping and indexing. Storage takes place mainly in relational database systems, to provide flexible support for complicated queries, data correlations and searches at a later phase. Computational algorithms, data mining techniques and applications form the kernel layer of the Analyzer.
They generate metrics based on the collected data sets and enhance the existing data sets by importing additional information from external sources. Modular design and integration with contemporary report generation and visualization systems allow feeding the visualization subsystems with data. Interactive data visualization systems establish two-way communication with the computational subsystems and provide excellent interfaces for filtering and extracting results and metrics. This research also focuses on ways of extending the input data of an analyzer to support retail business transactions. The key clicks of web application users are substituted by sensor input, generated automatically by mobile applications that are activated by the retail customer's transaction data while the customer visits the shopping floor and searches through its aisles for products. A sensor and mobile device environment is particularly cheap, easy to integrate and useful, especially where the retailer operates electronic shopping applications in parallel, selling online the same products that are offered on the shelves. Like most web applications and services, e-commerce applications are often implemented without significant built-in subsystems and modules that act as internal performance measuring mechanisms. Developers constantly strive to program as efficiently as they can, using the best and fastest available run-time optimization techniques and tools. Beyond that, overall performance measurement is not a primary responsibility of the programmers. The reason lies mainly in the fact that the performance of web applications depends heavily on the characteristics of the environment they operate in.
These environments are usually complex and consist of servers and various network connections, as well as services that are distributed externally, often in other countries, such as payment portals and security certificates. Thus, measurements need to be designed in a way that extends beyond the core software application implemented by the development team, whose main goal is to fulfill the functional requirements of the designers. Because responsiveness is always a major factor, and high revenue is sought in all e-commerce applications, it is important to perform these measurements well and accurately. Occasionally overall response times increase, and the responsible administrators often lack the means even to notice. To remedy such conditions, it is crucial to have on hand a precise system for measuring and visualizing performance, user actions and behavior, one that makes all bottlenecks and problems visible and even predicts shortages of resources. This work presents techniques for obtaining operational and transactional data and presenting them in time to monitor the safe operation of any web site, and particularly of an e-commerce site. Innovative ideas range from log file data enhancement techniques to real-time visualization and customer behavioral pattern analysis. A customizable and extendable log file analyzer has been developed. During the development phase, four distinct versions of the Analyzer Application were produced. They can be used interchangeably according to the operational environment under scrutiny. These versions are presented in detail in the current document as steps of development. They can be viewed as complementary functionalities. Important requirements for the software developer are ease of installation, integration, configuration and tuning. Portability across operating environments and the possibility of combining cross-platform installations led to the use of 100% Java as the development platform.
This way, evaluating any e-commerce application becomes easy for the administrators. Issues involved in creating a prototype customizable e-commerce log file analyzer for measuring customer access to e-shops are pointed out and solutions are provided. The analyzer is basically a toolbox: a set of the tools needed to load the necessary data into its database and provide exact insight into customer access and system response of e-shops. Insight can be obtained through extensive reporting, graphical reports and various visuals and statistics. The analyzer provides answers to standard questions such as:
• How many times has a specific product been added to the cart over a period?
• What is the average, maximum or minimum duration of visits?
• How many bytes were sent from the web server per day?
• What is the average duration of a complete payment cycle?
• How many customers have visited the site in a specified time span?
• How much revenue do we make per day or per hour?
It also provides answers to more interesting questions on session analysis, such as what route the customer took within the web application, which web pages or selections are only seldom visited, and what the profile characteristics of the visitors are. Additionally, comparisons between different e-shops and comparisons with previous years, months and days are feasible and easy, as are error detection and reporting. The first approach of this research is based on the first four steps of the Quantitative Analysis Cycle of an E-business Site:
1. Insight of e-business site architecture
2. Measuring system performance from different reference points
3. Understanding customer behavior by generating a Customer Behavior Model Graph
4. Workload and Session analysis
The final approach provides a platform that will promote:
5. Performance model development
6. Performance parameter definition
7. Workload forecasting
8. Prediction of site performance
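Two of the measurements named above, visit duration statistics and the page-to-page transitions that underlie a Customer Behavior Model Graph, can be illustrated with a minimal Java sketch. It is not the analyzer's actual code: the class `SessionMetrics`, the record `Hit` and the state names ("home", "search", "cart", "pay") are hypothetical, and sessions are assumed to be already reconstructed and ordered by time.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.LongSummaryStatistics;
import java.util.Map;

// Hypothetical sketch; all names here are assumptions, not the thesis implementation.
class SessionMetrics {

    /** One request inside a session: the visited state and its time in epoch seconds. */
    record Hit(String state, long epochSeconds) {}

    /** Visit duration in seconds: time of the last hit minus time of the first hit. */
    static long durationSeconds(List<Hit> session) {
        if (session.isEmpty()) {
            return 0;
        }
        return session.get(session.size() - 1).epochSeconds() - session.get(0).epochSeconds();
    }

    /** Minimum, maximum and average visit duration over all sessions. */
    static LongSummaryStatistics durationStats(List<List<Hit>> sessions) {
        return sessions.stream()
                .mapToLong(SessionMetrics::durationSeconds)
                .summaryStatistics();
    }

    /** Counts how often each state is directly followed by each other state. */
    static Map<String, Map<String, Integer>> transitionCounts(List<List<Hit>> sessions) {
        Map<String, Map<String, Integer>> counts = new LinkedHashMap<>();
        for (List<Hit> session : sessions) {
            for (int i = 0; i + 1 < session.size(); i++) {
                String from = session.get(i).state();
                String to = session.get(i + 1).state();
                counts.computeIfAbsent(from, k -> new LinkedHashMap<>())
                      .merge(to, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Estimated probability of moving from one state to another (edge weight in the graph). */
    static double transitionProbability(Map<String, Map<String, Integer>> counts,
                                        String from, String to) {
        Map<String, Integer> row = counts.getOrDefault(from, Map.of());
        int total = row.values().stream().mapToInt(Integer::intValue).sum();
        return total == 0 ? 0.0 : (double) row.getOrDefault(to, 0) / total;
    }

    public static void main(String[] args) {
        List<List<Hit>> sessions = List.of(
                List.of(new Hit("home", 0), new Hit("search", 20),
                        new Hit("cart", 50), new Hit("pay", 90)),
                List.of(new Hit("home", 100), new Hit("search", 130), new Hit("home", 160)));

        LongSummaryStatistics stats = durationStats(sessions);
        System.out.println("min=" + stats.getMin() + " max=" + stats.getMax()
                + " avg=" + stats.getAverage());
        System.out.println("P(search -> cart) = "
                + transitionProbability(transitionCounts(sessions), "search", "cart"));
    }
}
```

For the two sample sessions the durations are 90 and 60 seconds, so the sketch reports a minimum of 60, a maximum of 90 and an average of 75.0, and it estimates the probability of moving from "search" to "cart" as 0.5. A real analyzer would read the hits from its database rather than from in-memory lists.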
The e-shop log file analysis tool can transparently display user actions and allow management to locate weak spots of the e-shop design, since it provides information about all paths selected by users before a successful purchase or even an unsuccessful purchase attempt. It allows measurement of the times between all steps involved. Most open-source e-commerce solutions offer some statistical tools, such as reports displaying orders per day or highest-selling items. These reports inform the staff about daily procedures and cover only successful purchases. The log file analysis application, on the other hand, is more comprehensive and powerful, informing about performance, user behavior and user preferences. The goal of a business-to-customer (B2C) e-shop application is to promote retail sales and create profit. A virtual store allows buying products or services through a website, in analogy to a bricks-and-mortar retailer or a shopping mall. The Internet is no longer a niche technology; it is mass media and an utterly integral part of modern life. Over 85 percent of the world’s online population has used the Internet to make a purchase. Intention to shop online in Europe is high: 79 percent of online European consumers plan to purchase products or services via the Internet in the next six months. Online consumers in Norway and Great Britain show the greatest propensity, with almost 90 percent planning a web purchase soon. The e-shop must have a minimal interface, consisting of search engines and product presentation mechanisms. It must also support easy and quick adding of items to the cart and, finally, allow secure payments and possibly offer one-page checkout. Deficient performance of an e-shop will lead to lost revenue. According to the so-called 8-second rule, a user will not tolerate delays longer than 8 seconds per page refresh of a website, not even if the user has a low-speed dial-up connection.
This forces the e-shop designer to design and implement every page as efficiently as possible. In 2001, Zona Research reported more than $25 billion in potential lost business due to Web performance issues. Today, not only has overall bandwidth increased dramatically, but so have the number of users, the demand for multimedia and the overall traffic. The 8-second rule still applies and the need to measure performance is still valid, but under very different conditions. The web server can be configured so that any access to the e-shop is registered in an access log file. The remote IP address, time stamp of access, requested or sent object, size in bytes, duration etc. are registered there. The log file analyzer mainly operates on this file. Many issues involved with e-shop log files must be considered. Successful sale sessions are always fewer than the total number of sessions. In addition to sessions of human users, there are sessions created by crawlers and robots. These sessions distort the measurements. Web Robots (also known as Web Wanderers, Crawlers, or Spiders) are programs that traverse the Web automatically. Search engines, agents and research applications use them to index web content, spammers use them to scan for email addresses, and they have many other uses. Robots can often be identified through their behavior, but it may not always be feasible to detect them. The assumption is made that robots never enter the pay section of the e-shop. Still, theoretically a robot could be used to buy products. The log file analyzer we have developed contains a toolbox with the specialized tools necessary for measuring the performance of e-shops as well as customer behavioral patterns. Standard general-purpose log file analyzers usually process log files only, to evaluate access hits, calculate bandwidth and report visited pages on an hourly and daily basis, as well as visitor countries and browser-agent statistics.
This information is very useful for the administrator of a content management system, a portal, or even a static website, because the pages visited and the visit durations are enough to measure the success of the site. An e-commerce site administrator, on the other hand, needs more specific information about the performed actions and transactions, which must be combined with the log file data. E-shop specific data about products, product categories, orders and customers are used in our toolbox to gain more precise information about the access events. This way, the end user deals with business-specific objects, since familiar terms and items are used. That makes the application easier for the administrator to adopt. The initial approach, as seen in Figure 1, has been built as a standalone application with a simple, intuitive and easy-to-use graphical user interface. It maintains its own database and includes options that allow its user to easily adapt and load data from log files and to exchange data bidirectionally with the e-shop database. This model has been used as the base application upon which various extensions and add-on features were implemented. The application can run anywhere, not necessarily on the machine where the web server resides. It can accommodate multiple e-shops running on multiple web server architectures, and it provides a basis for the later approaches. It is a typical compound, menu-driven ETL application that includes integrated visualization components and mechanisms that integrate data and information of the e-shop into the analyzer. The second approach integrates external metrics collected from a tagging Analytics provider, such as Google Analytics. The Java API allows access to registered users. This addition turns the entire system into a hybrid. Hybrid Analytics applications alleviate the limitations of both pure log file analyzers and tagging systems.
These two initial approaches are further extended, and Log File Analyzers with additional capabilities have been designed; they are described and presented in this document.
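The access log fields described earlier (remote IP address, time stamp, requested object, size in bytes) and the removal of robot sessions can be illustrated with a short Java sketch. It is only an illustration under stated assumptions, not the thesis implementation: it assumes the web server writes the Common Log Format, it omits the optional duration field, and it uses one simple behavioral heuristic, treating any address that requests `/robots.txt` as a robot. The class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; assumes Common Log Format lines and a simple robot heuristic.
class AccessLogSketch {

    /** The fields kept from one access log line. */
    record Entry(String remoteIp, String timestamp, String method,
                 String path, int status, long bytes) {}

    // host ident user [time] "METHOD path protocol" status bytes
    private static final Pattern CLF = Pattern.compile(
            "^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\] \"(\\S+) (\\S+)[^\"]*\" (\\d{3}) (\\d+|-)");

    /** Parses one log line; returns an empty Optional if the line does not match. */
    static Optional<Entry> parse(String line) {
        Matcher m = CLF.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        // A "-" in the size field means that no body was sent.
        long bytes = m.group(6).equals("-") ? 0L : Long.parseLong(m.group(6));
        return Optional.of(new Entry(m.group(1), m.group(2), m.group(3), m.group(4),
                Integer.parseInt(m.group(5)), bytes));
    }

    /** Heuristic (an assumption): an address that fetched /robots.txt is a robot. */
    static Set<String> likelyRobots(List<Entry> entries) {
        Set<String> robots = new HashSet<>();
        for (Entry e : entries) {
            if (e.path().equals("/robots.txt")) {
                robots.add(e.remoteIp());
            }
        }
        return robots;
    }

    /** Total bytes sent, ignoring all requests from addresses flagged as robots. */
    static long bytesSentToHumans(List<Entry> entries) {
        Set<String> robots = likelyRobots(entries);
        return entries.stream()
                .filter(e -> !robots.contains(e.remoteIp()))
                .mapToLong(Entry::bytes)
                .sum();
    }

    public static void main(String[] args) {
        String line = "192.0.2.1 - - [10/Oct/2021:13:55:36 +0200] "
                + "\"GET /shop/cart.php?add=42 HTTP/1.1\" 200 2326";
        System.out.println(parse(line).orElseThrow());
    }
}
```

As the abstract notes, robots cannot always be detected, so a heuristic like this one will miss robots that never request `/robots.txt`. It is meant only to show where such filtering would sit between parsing and metric computation.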