Open Access
Collective Classification of Network Data
Author(s) - Ben London, Lise Getoor
Publication year - 2014
Language(s) - English
DOI - 10.1201/b17320-16
Communication networks, financial transaction networks, networks describing physical systems, and social networks are all becoming increasingly important in our day-to-day lives. Often, we are interested in models of how nodes in the network influence each other (for example, who infects whom in an epidemiological network), or in models for predicting an attribute of interest based on observed attributes of objects in the network (for example, predicting political affiliations based on online purchases and interactions), or in identifying important nodes in the network (for example, critical nodes in communication networks). In most of these scenarios, an important step in achieving our final goal is classifying, or labeling, the nodes in the network.

Given a network and a node v in the network, there are three distinct types of correlations that can be utilized to determine the classification or label of v:
(1) The correlations between the label of v and the observed attributes of v.
(2) The correlations between the label of v and the observed attributes (including observed labels) of nodes in the neighborhood of v.
(3) The correlations between the label of v and the unobserved labels of objects in the neighborhood of v.
Collective classification refers to the combined classification of a set of interlinked objects using all three types of information just described.

Many applications produce data with correlations between labels of interconnected nodes. The simplest types of correlation can be the result of homophily (nodes with similar labels are more likely to be linked) or the result of social influence (nodes that are linked are more likely to have similar labels), but more complex dependencies among labels often exist. Within the machine-learning community, classification is typically done on each object independently, without taking into account any underlying network that connects the nodes.
Collective classification does not fit well into this setting. For instance, in the web page classification problem, where web pages are interconnected by hyperlinks and the task is to assign each web page the label that best indicates its topic, it is common to assume that the labels of interconnected web pages are correlated. Such interconnections occur naturally in data from a variety of applications, such as bibliographic data, email networks, and social networks. Traditional classification techniques ignore the correlations represented by these interconnections and would be hard-pressed to match the classification accuracy achievable with a collective classification approach.

Although traditional exact probabilistic inference algorithms, such as variable elimination and the junction tree algorithm, can in principle perform collective classification, they are practical only when the graph structure of the network satisfies certain conditions. In general, exact inference is known to be ...
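To make the three types of correlation concrete, the following is a minimal sketch of one simple way to classify nodes collectively: an iterative scheme in which each unlabeled node repeatedly combines a score from its own attributes with votes from its neighbors' current labels. It is an illustration under stated assumptions, not the specific algorithm of the chapter. The function name `collective_classify`, the `local_scores` input (standing in for the output of any per-node attribute classifier), the `weight` parameter, and the toy graph are all hypothetical.

```python
from collections import Counter


def collective_classify(edges, local_scores, observed,
                        labels=("A", "B"), weight=1.0, iterations=10):
    """Illustrative iterative label update on an undirected graph.

    edges:        list of (u, v) pairs.
    local_scores: node -> {label: score} from the node's own attributes
                  (correlation type 1); assumed to come from any local model.
    observed:     node -> known label (feeds correlation type 2).
    weight:       hypothetical knob for how much neighbor votes count.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    nodes = set(local_scores) | set(neighbors) | set(observed)

    # Bootstrap: observed labels are fixed; every other node starts from
    # its own attributes only.
    current = dict(observed)
    for n in nodes:
        if n not in observed:
            scores = local_scores.get(n, {})
            current[n] = max(labels, key=lambda l: scores.get(l, 0.0))

    for _ in range(iterations):
        changed = False
        for n in sorted(nodes):
            if n in observed:
                continue
            # Neighbor votes mix observed labels (type 2) with the current
            # guesses for unobserved labels (type 3).
            votes = Counter(current[m] for m in neighbors.get(n, ()))
            scores = local_scores.get(n, {})
            best = max(labels,
                       key=lambda l: scores.get(l, 0.0) + weight * votes[l])
            if best != current[n]:
                current[n] = best
                changed = True
        if not changed:
            break  # labels are stable
    return current


# Toy chain a - b - c, where only a's label is observed.
edges = [("a", "b"), ("b", "c")]
local_scores = {"b": {"A": 0.4, "B": 0.6}, "c": {"A": 0.5, "B": 0.5}}
observed = {"a": "A"}
result = collective_classify(edges, local_scores, observed)
print(result)
```

In this toy example, node b's own attributes slightly favor label B, but its neighbors pull it toward A once their labels are taken into account. Setting `weight=0.0` reduces the procedure to independent, attribute-only classification, which is the traditional setting contrasted above.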
