Algorithm of web page similarity comparison based on visual block
Author(s) -
Xingchen Li,
Weizhe Zhang,
Desheng Wang,
Bin Zhang,
Hui He
Publication year - 2019
Publication title -
computer science and information systems
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.244
H-Index - 24
eISSN - 2406-1018
pISSN - 1820-0214
DOI - 10.2298/csis180915028l
Subject(s) - computer science , web page , information retrieval , similarity (geometry) , page view , phishing , static web page , block (permutation group theory) , world wide web , matching (statistics) , data mining , artificial intelligence , web navigation , image (mathematics) , the internet , mathematics , statistics , geometry
Phishing often deceives users due to the relative similarity to the true pages on a layout and leads to considerable losses for the society. Consequently, detecting phishing sites has been an urgent activity. By researching phishing web pages using web page screenshots, we discover that this kind of web pages use numerous web page screenshots to achieve the close similarity to the true page and avoid the text and structure similarity detection. This study introduces a new similarity matching algorithm based on visual blocks. First, the RenderLayer tree of the web page is obtained to extract the visual block. Second, an algorithm that will settle the jumbled visual blocks, including the deletion of the small visual blocks and the emergence of the overlapping visual blocks, is designed. Finally, the similarity between the two web pages is assessed. The proposed algorithm sets different thresholds to achieve the optimal missing and false alarm rates.
Accelerating Research
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom
Address
John Eccles HouseRobert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom