Open Access
The Impact of Feature Selection on Web Spam Detection
Author(s) -
Jaber Karimpour,
Ali Noroozi,
Adeleh Abadi
Publication year - 2012
Publication title -
International Journal of Intelligent Systems and Applications
Language(s) - English
Resource type - Journals
eISSN - 2074-9058
pISSN - 2074-904X
DOI - 10.5815/ijisa.2012.09.08
Subject(s) - spamming , spamdexing , computer science , feature selection , deep web , data mining , set (abstract data type) , search engine optimization , selection (genetic algorithm) , search engine , machine learning , feature (linguistics) , rank (graph theory) , genetic algorithm , artificial intelligence , information retrieval , the internet , web search engine , world wide web , web search query , linguistics , philosophy , mathematics , combinatorics , programming language
Search engines are among the most important tools for managing the massive amount of distributed web content. Web spamming attempts to deceive search engines into ranking some pages higher than they deserve. Many methods have been proposed to combat web spamming and to detect spam pages. A basic approach is classification, i.e., learning a model that labels web pages as spam or non-spam. This work tries to select the best feature set for web spam classification using the imperialist competitive algorithm and the genetic algorithm (GA). The imperialist competitive algorithm is a novel optimization algorithm inspired by the socio-political process of imperialism in the real world. Experiments carried out on the WEBSPAM-UK2007 data set show that feature selection improves classification accuracy and that the imperialist competitive algorithm outperforms GA.
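To make the idea of search-based feature selection concrete, the following is a minimal sketch of a genetic algorithm choosing a feature subset for a binary classifier. It is not the authors' implementation: the synthetic data stands in for WEBSPAM-UK2007, the nearest-centroid classifier stands in for whatever classifier the paper uses, and every name and parameter (`make_data`, `centroid_accuracy`, `ga_select`, population size, mutation rate) is a hypothetical choice made for illustration. The imperialist competitive algorithm is not shown.

```python
import random

def make_data(rng, n=200, n_informative=3, n_noise=7):
    """Synthetic two-class data (an assumption, not WEBSPAM-UK2007):
    only the first n_informative features carry class signal."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(2.0 * label, 1.0) for _ in range(n_informative)]
        row += [rng.gauss(0.0, 3.0) for _ in range(n_noise)]
        X.append(row)
        y.append(label)
    return X, y

def centroid_accuracy(X_train, y_train, X_test, y_test, mask):
    """Accuracy of a nearest-centroid classifier restricted to the
    features whose mask bit is 1. An empty mask scores 0.0."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X_test, y_test):
        dist = {c: sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[c]))
                for c in (0, 1)}
        correct += min(dist, key=dist.get) == t
    return correct / len(y_test)

def ga_select(fitness, n_features, rng, pop_size=20, generations=30, p_mut=0.1):
    """Evolve binary feature masks; return the fittest mask seen."""
    # Seed the population with the full feature set so the result is
    # never worse than the no-selection baseline under this fitness.
    pop = [[1] * n_features]
    pop += [[rng.randint(0, 1) for _ in range(n_features)]
            for _ in range(pop_size - 1)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]      # truncation selection
        children = [ranked[0][:]]              # elitism: keep the top mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = children
        best = max(pop + [best], key=fitness)
    return best

rng = random.Random(0)
X, y = make_data(rng)
X_train, y_train = X[:140], y[:140]
X_val, y_val = X[140:], y[140:]

def fitness(mask):
    """Wrapper-style fitness: validation accuracy using the masked features."""
    return centroid_accuracy(X_train, y_train, X_val, y_val, mask)

all_features = [1] * 10
best_mask = ga_select(fitness, n_features=10, rng=rng)
print("selected mask:", best_mask)
print("accuracy with all features:", fitness(all_features))
print("accuracy with selected features:", fitness(best_mask))
```

In this sketch the fitness of a mask is simply the validation accuracy of the classifier trained on the selected features, which is the usual wrapper formulation. A realistic experiment would evaluate the final mask on a separate held-out test set, since the validation accuracy used during the search is optimistically biased.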
