Classification and prediction of matrix structured data with applications to recommendation systems, identifying anti-socials and bot-nets
Author(s) - Shokat Fadaee
Publication year - 2017
Language(s) - English
Resource type - Dissertations/theses
DOI - 10.17760/d20284626
Subject(s) - matrix decomposition, computer science, missing data, artificial intelligence, data matrix, recommender system, sparse matrix, data mining, machine learning, non-negative matrix factorization, big data, information retrieval, data science, eigenvalues and eigenvectors, gaussian
Matrix representations are a natural way to represent many forms of networked and tabulated data, such as connections among people, user preferences over items, or (the time series of) bot-net attacks against entities. Models based on matrix factorization have been extensively studied in machine learning and statistical analysis. In this thesis, we address issues related to learning with matrix-structured data, such as interpreting missing values in a highly sparse matrix. Missing information in matrix-structured data can have serious consequences: biased parameter estimates, increased standard errors, and a reduced ability to generalize findings. We address this issue in three different applications.

In the first part of this dissertation, our goal is to determine the structural differences between categories of networks (represented as adjacency matrices) and to use these differences to predict a network's category. We propose Cliqster, a new Bernoulli process-based model for unweighted networks. Building on this model, we present an efficient algorithm that transforms a network into a new space that is both concise and discriminative, while preserving the identity of the network as much as possible. The algorithm is interpretable and intuitive.

In the second part of this work, the matrices represent users’ preferences over items (in the form of ordinal, or “1-5 star”, ratings). Recommendation systems address the task of predicting a user’s ratings of items and are widely used by commercial service providers to make suggestions to users. Collaborative filtering (CF) systems, one of the most popular classes of recommendation systems, use the behavior history of the aggregate user base to provide individual recommendations, and they are effective when almost all users faithfully express their opinions.
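To make neighborhood-based collaborative filtering concrete, here is a minimal sketch in Python with NumPy, assuming a small dense rating array in which `np.nan` marks a missing rating. The function name `user_based_predict`, the mean-centered cosine similarity, and the neighborhood size `k` are illustrative choices, not the method developed in the thesis. Missing entries are skipped rather than filled with zeros, since zero-filling would drag user means and similarities toward 0, one concrete way missing data biases estimates.

```python
import numpy as np


def user_based_predict(R: np.ndarray, user: int, item: int, k: int = 2) -> float:
    """Predict R[user, item] with user-based neighborhood collaborative filtering.

    Hypothetical helper for illustration. R is a (users x items) rating matrix
    in which np.nan marks a missing rating. Missing entries are skipped, never
    treated as zeros. Assumes every user has at least one observed rating.
    """
    observed = ~np.isnan(R)
    # Per-user mean over observed ratings only.
    means = np.array([R[u, observed[u]].mean() for u in range(R.shape[0])])

    neighbors = []  # (similarity, neighbor index)
    for v in range(R.shape[0]):
        if v == user or not observed[v, item]:
            continue  # a neighbor must have rated the target item
        common = observed[user] & observed[v]
        if common.sum() < 2:
            continue  # too few co-rated items for a meaningful similarity
        a = R[user, common] - means[user]
        b = R[v, common] - means[v]
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        if denom > 0:
            # Mean-centered cosine similarity over the co-rated items.
            neighbors.append((float(a @ b / denom), v))

    # Keep the k neighbors with the strongest (absolute) similarity.
    neighbors.sort(key=lambda sv: abs(sv[0]), reverse=True)
    top = neighbors[:k]
    weight = sum(abs(s) for s, _ in top)
    if weight == 0:
        return float(means[user])  # no usable neighbors: fall back to the user mean
    offset = sum(s * (R[v, item] - means[v]) for s, v in top) / weight
    return float(means[user] + offset)


# Toy 3-user x 4-item matrix; user 0 has not rated item 2.
R = np.array([
    [5.0, 4.0, np.nan, 1.0],
    [5.0, 4.0, 5.0, 1.0],
    [1.0, 2.0, 1.0, 5.0],
])
print(round(user_based_predict(R, user=0, item=2), 3))  # → 4.583
```

In this toy matrix, user 0 agrees with user 1 and disagrees with user 2 on the co-rated items, so both neighbors push the prediction for item 2 above user 0's mean of 10/3.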
However, CF systems are vulnerable to malicious users who bias their inputs in order to change the overall ratings of a specific group of items. CF systems largely fall into two categories, neighborhood-based and (matrix) factorization-based, and adversarial input can influence recommendations in both, leading to instability in estimation and prediction. Although the robustness of different collaborative filtering algorithms has been extensively studied, designing an efficient system that is immune to manipulation remains a challenge. We propose Chiron, a novel hybrid recommendation system with adaptive graph-based user/item similarity regularization. Chiron ties the performance benefits
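The abstract does not give Chiron's objective, so the following is only a generic sketch of graph-based similarity regularization in a factorization-based recommender, assuming Python with NumPy. A low-rank model `P @ Q.T` is fit on observed ratings only, and a graph-Laplacian penalty pulls together the latent factors of users (or items) that a given similarity matrix marks as similar. The function names, the fixed similarity matrices `W_users` and `W_items`, the penalty weights, and plain full-batch gradient descent are all assumptions; unlike Chiron, the graph here is fixed rather than adaptive.

```python
import numpy as np


def laplacian(W: np.ndarray) -> np.ndarray:
    """Unnormalized graph Laplacian L = D - W of a symmetric similarity matrix."""
    return np.diag(W.sum(axis=1)) - W


def graph_regularized_mf(R, W_users, W_items, rank=2, lam=0.1, alpha=0.5,
                         beta=0.5, lr=0.01, epochs=2000, seed=0):
    """Hypothetical sketch: fit R ~ P @ Q.T on observed entries of R.

    Minimizes, by full-batch gradient descent,
        sum over observed (u, i) of (R[u, i] - P[u] @ Q[i])**2
        + lam * (||P||^2 + ||Q||^2)
        + alpha * tr(P.T @ L_u @ P) + beta * tr(Q.T @ L_i @ Q).
    Since tr(P.T @ L @ P) = 0.5 * sum_{u,v} W[u, v] * ||P[u] - P[v]||^2,
    users linked in the graph are pushed toward similar factors.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    mask = ~np.isnan(R)
    R0 = np.where(mask, R, 0.0)  # placeholder values; masked out of the loss
    L_u, L_i = laplacian(W_users), laplacian(W_items)
    P = 0.1 * rng.standard_normal((n_users, rank))
    Q = 0.1 * rng.standard_normal((n_items, rank))
    for _ in range(epochs):
        E = mask * (P @ Q.T - R0)  # residuals on observed entries only
        # Both gradients use the current P and Q before either is updated.
        grad_P = 2 * (E @ Q + lam * P + alpha * L_u @ P)
        grad_Q = 2 * (E.T @ P + lam * Q + beta * L_i @ Q)
        P -= lr * grad_P
        Q -= lr * grad_Q
    return P, Q


# Hypothetical toy data: users 0/1 and users 2/3 are marked as similar.
R = np.array([
    [5.0, 4.0, np.nan, 1.0],
    [5.0, 4.0, 5.0, 1.0],
    [1.0, 2.0, 1.0, 5.0],
    [1.0, np.nan, 1.0, 4.0],
])
W_users = np.array([
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
W_items = np.zeros((4, 4))  # no item graph in this toy example
P, Q = graph_regularized_mf(R, W_users, W_items)
print(np.round(P @ Q.T, 2))  # reconstructed matrix, including the missing cells
```

The Laplacian term is a weighted sum of squared differences between linked users' factors, so it limits how far any single user's factors can drift from those of its graph neighbors; whether and how this yields robustness against manipulation is what a system such as Chiron would have to establish.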