Distributed OSN Crawling System based on Ajax Simulation
Author(s) -
Shan Jixi,
Ying Sha,
Yang Li,
Kai Xu,
Li Guo
Publication year - 2013
Publication title -
Procedia Computer Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.334
H-Index - 76
ISSN - 1877-0509
DOI - 10.1016/j.procs.2013.05.105
Subject(s) - ajax, computer science, crawling, world wide web, asynchronous communication, javascript, middleware (distributed applications), web application, xml, web crawler, database, computer network, medicine, anatomy
In the age of Web 2.0, many online social networks (OSNs), such as Facebook, Twitter, and Weibo, have become the most popular platforms for spreading information, and they attract growing attention from the Information Retrieval (IR) community. However, traditional web crawling systems run into trouble because of complicated OSN web pages, the explosive growth of messages, and the heavy use of Asynchronous JavaScript and XML (Ajax). We design and implement a distributed system based on Message Oriented Middleware (MOM) and Ajax simulation, which crawled 70 million detailed Twitter items in one month. The data acquisition results show two things. First, crawling with Ajax simulation can retrieve items loaded by Ajax without limitations. Second, the distributed system based on MOM and Ajax simulation can crawl massive OSN data completely, quickly, frequently, and without restriction.
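The abstract names two ingredients: a message queue that distributes crawl tasks to workers, and Ajax simulation that keeps issuing the "load more" requests a browser would send on scroll. The sketch below shows how these can fit together under clearly stated assumptions. It is not the authors' implementation. Python's in-process `queue.Queue` with threads stands in for a real MOM broker, since the abstract does not name one. The function `ajax_load_more`, the `FAKE_TIMELINES` data, and `PAGE_SIZE` are hypothetical placeholders for an OSN endpoint that returns one page of items plus a cursor.

```python
import queue
import threading

# Hypothetical in-memory stand-in for an OSN timeline endpoint. Like an Ajax
# "load more" call, it returns one page of items plus a cursor for the next page.
FAKE_TIMELINES = {
    "alice": [f"alice-tweet-{i}" for i in range(7)],
    "bob": [f"bob-tweet-{i}" for i in range(3)],
    "carol": [],
}
PAGE_SIZE = 3


def ajax_load_more(user, cursor):
    """Simulated Ajax request: return (items, next_cursor); next_cursor is None at the end."""
    items = FAKE_TIMELINES.get(user, [])
    page = items[cursor:cursor + PAGE_SIZE]
    next_cursor = cursor + PAGE_SIZE if cursor + PAGE_SIZE < len(items) else None
    return page, next_cursor


def crawl_worker(tasks, results):
    """Consume user IDs from the task queue (the MOM stand-in) until a None sentinel arrives."""
    while True:
        user = tasks.get()
        if user is None:
            tasks.task_done()
            break
        cursor = 0
        # Keep issuing simulated Ajax requests until the timeline is exhausted,
        # so items a browser would only load on scroll are also collected.
        while cursor is not None:
            page, cursor = ajax_load_more(user, cursor)
            for item in page:
                results.put((user, item))
        tasks.task_done()


def run_crawl(users, num_workers=2):
    """Distribute users over worker threads and return all (user, item) pairs collected."""
    tasks, results = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=crawl_worker, args=(tasks, results))
               for _ in range(num_workers)]
    for w in workers:
        w.start()
    for user in users:
        tasks.put(user)
    for _ in workers:
        tasks.put(None)  # one shutdown sentinel per worker
    for w in workers:
        w.join()
    collected = []
    while not results.empty():
        collected.append(results.get())
    return collected


if __name__ == "__main__":
    for user, item in run_crawl(["alice", "bob", "carol"]):
        print(user, item)
```

Decoupling producers from workers through a queue is what allows more crawler nodes to be added without changing the task source. In a real deployment, each worker would run on its own machine and connect to the broker over the network, and `ajax_load_more` would replay the OSN's actual asynchronous requests.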