Social Graph Community Differentiated by Node Features with Partly Missing Information | Zendy

Vladislav Chesnokov | Zendy; P. G. Klyucharev | Zendy

AI Assistant Blog Pricing

Home ZAIA Blog

Open Access

Social Graph Community Differentiated by Node Features with Partly Missing Information

Author(s) -

Vladislav Chesnokov,

P. G. Klyucharev

Publication year - 2014

Publication title -

science and education of the bauman mstu

Language(s) - English

Resource type - Journals

ISSN - 1994-0408

DOI - 10.7463/0915.0811704

Subject(s) - computer science , node (physics) , social graph , graph , social network (sociolinguistics) , data science , world wide web , information retrieval , theoretical computer science , social media , engineering , structural engineering

This paper proposes a new algorithm for community differentiation in social graphs, which uses information both on the graph structure and on the vertices. We consider user's ego-network i.e. his friends, with no himself, where each vertex has a set of features such as details on a workplace, institution, etc. The task is to determine missing or unspecified features of the vertices, based on their neighbors' features, and use these features to differentiate the communities in the social graph. Two vertices are believed to belong to the same community if they have a common feature. A hypothesis has been put forward that if most neighbors of a vertex have a common feature, there is a good probability that the vertex has this feature as well. The proposed algorithm is iterative and updates features of vertices, based on its neighbors, according to the hypothesis. Share of neighbors that form a majority is specified by the algorithm parameter. Complexity of single iteration depends linearly on the number of edges in the graph.

To assess the quality of clustering three normalized metrics were used, namely: expected density, silhouette index, and Hubert's Gamma Statistic. The paper describes a method for test sampling of 2.000 graphs of the user's social network \VKontakte". The API requests addressed \VKontakte" and parsing HTML-pages of user's profiles and search results provided crawling. Information on user's group membership, secondary and higher education, and workplace was used as features. To store data the PostgreSQL DBMS was used, and the gexf format was used for data processing. For the test sample, metrics for several values of algorithm parameter were estimated: the value of index silhouettes was low (0.14-0.20), but within the normal range; the value of expected density was high, i.e. 1.17-1.52; the value of Hubert's gamma statistic was 0.94-0.95 that is close to the maximum. The number of vertices with no features was calculated before and after applying the algorithm. With two-third of vertices before using the algorithm their number has fallen to 25-64% depending on algorithm parameter after applying it. The proposed algorithm can be used for clustering the social graphs, but it should be modified to differentiate overlapping communities.

The content you want is available to Zendy users.

Already have an account? Click here to sign in.

Having issues? You can contact us here

Empowering knowledge with every search

About

About Careers Publisher Partners Contact Us

Learn

FAQs Blog Terms of Use Privacy Policy

About

Learn

Discover

Explore