A Framework for Collecting YouTube Meta-Data
Author(s) - Haroon Malik, Zifeng Tian
Publication year - 2017
Publication title - Procedia Computer Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.334
H-Index - 76
ISSN - 1877-0509
DOI - 10.1016/j.procs.2017.08.347
Subject(s) - computer science , scalability , treasure , process (computing) , world wide web , data science , data collection , data set , social media , set (abstract data type) , volume (thermodynamics) , multimedia , information retrieval , database , artificial intelligence , philosophy , statistics , physics , theology , mathematics , quantum mechanics , programming language , operating system
YouTube is currently the most popular and successful video-sharing website. Its videos have become a treasure trove of data that can be used in fields of research ranging from STEM education to medical science. However, previous studies and analyses have been conducted only on very small volumes of YouTube video data. To date, no mechanism exists to systematically and continuously collect, process, and store the rich set of YouTube data. In this paper, we present a methodology to fill this gap, i.e., to systematically and continuously mine and store YouTube data. The methodology has two modules: video discovery and video meta-data collection. Our methodology is robust, efficient, and scalable. Over a period of two months, using our methodology, we discovered 16,000,000 videos and mined the complete meta-data of more than 42,000 videos.
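The two-module structure described in the abstract can be illustrated with a minimal sketch. The abstract does not say how discovery or collection is implemented, so everything below is an assumption: discovery is modeled as a breadth-first walk over a "related videos" graph, and `related`, `fetch`, `TOY_GRAPH`, and the in-memory `store` are hypothetical stand-ins rather than the authors' code or any real YouTube API.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical callables standing in for real data sources (assumptions).
RelatedFn = Callable[[str], Iterable[str]]
MetadataFn = Callable[[str], Dict[str, object]]


def discover_videos(seeds: Iterable[str], related: RelatedFn, limit: int) -> List[str]:
    """Module 1 (assumed design): breadth-first discovery of video IDs."""
    seen: Set[str] = set()
    order: List[str] = []
    queue = deque(seeds)
    while queue and len(order) < limit:
        video_id = queue.popleft()
        if video_id in seen:
            continue  # already discovered through another path
        seen.add(video_id)
        order.append(video_id)
        queue.extend(v for v in related(video_id) if v not in seen)
    return order


def collect_metadata(video_ids: Iterable[str], fetch: MetadataFn,
                     store: Dict[str, Dict[str, object]]) -> Dict[str, Dict[str, object]]:
    """Module 2 (assumed design): fetch and store meta-data once per video."""
    for video_id in video_ids:
        if video_id in store:
            continue  # skip videos whose meta-data is already stored
        store[video_id] = fetch(video_id)
    return store


# Toy data so the sketch runs without network access.
TOY_GRAPH = {"a": ["b", "c"], "b": ["c", "d"], "c": ["a"], "d": []}


def toy_related(video_id: str) -> List[str]:
    return TOY_GRAPH.get(video_id, [])


def toy_fetch(video_id: str) -> Dict[str, object]:
    return {"id": video_id, "title": f"video {video_id}"}


discovered = discover_videos(["a"], toy_related, limit=10)
# Collection may lag behind discovery, so only part of the list is mined here.
store = collect_metadata(discovered[:2], toy_fetch, {})
```

Keeping the two stages separate lets discovery run ahead of collection, which fits the reported gap between the number of videos discovered and the number whose meta-data was fully mined.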