Description-Driven Community Detection
Author(s) -
Simon Pool,
Francesco Bonchi,
Matthijs van Leeuwen
Publication year - 2014
Publication title -
acm transactions on intelligent systems and technology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.914
H-Index - 63
eISSN - 2157-6912
pISSN - 2157-6904
DOI - 10.1145/2517088
Subject(s) - computer science , data science , set (abstract data type) , graph , context (archaeology) , domain (mathematical analysis) , social network (sociolinguistics) , node (physics) , data mining , information retrieval , world wide web , social media , theoretical computer science , paleontology , mathematical analysis , mathematics , structural engineering , engineering , biology , programming language
Traditional approaches to community detection, as studied by physicists, sociologists, and more recently computer scientists, aim at simply partitioning the social network graph. However, with the advent of online social networking sites, richer data has become available: beyond the link information, each user in the network is annotated with additional information, e.g., demographics, shopping behaviour, or interests. In this context it is therefore important to develop mining methods which can take advantage of all available information. In the case of community detection this means finding good communities (a set of nodes cohesive in the social graph) which are associated with good descriptions in terms of user information (node attributes). Having good descriptions associated to our models make them understandable by the domain experts, and thus more useful in real-world applications. Another requirement dictated by real-world applications, is to develop methods that can use, when available, any domain-specific background knowledge. In the case of community detection the background knowledge could be a vague description of the communities sought in a specific application, or some prototypical nodes (e.g., good customers in the past), that represent what the analyst is looking for (a community of similar users).Towards this goal, in this article we define and study the problem of finding a diverse set of cohesive communities with concise descriptions. We propose an effective algorithm that alternates between two phases: a hill-climbing phase producing (possibly overlapping) communities, and a description induction phase which uses techniques from supervised pattern set mining. Our framework has the nice feature of being able to build well-described cohesive communities starting from any given description or seed set of nodes, which makes it very flexible and easily applicable in real-world applications.Our experimental evaluation confirms that the proposed method discovers cohesive communities with concise descriptions in realistic and large online social networks such as Delicious, Flickr, and LastFM.status: publishe
Accelerating Research
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom
Address
John Eccles HouseRobert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom