Open Access
Using a Self Organizing Feature Map for Extracting Representative Web Pages from a Web Site
Author(s) - Sebastián A. Ríos, Juan Velazquez, Hiroshi Yasuda, Terumasa Aoki
Publication year - 2006
Publication title - International Journal of Computational Intelligence Research
Language(s) - English
Resource type - Journals
eISSN - 0974-1259
pISSN - 0973-1873
DOI - 10.5019/j.ijcir.2006.59
Subject(s) - computer science , feature (linguistics) , world wide web , web site , information retrieval , data mining , the internet , philosophy , linguistics
We introduce a method for improving a web site's content by identifying its most representative web pages. The process begins by transforming the text content of each web page into a feature vector using the vector space model for documents. Next, a Self Organizing Feature Map (SOFM) receives these vectors as input and generates a set of clusters whose centroids contain the most representative text content for a topic in the site. In the vectorial representation of a web page, the text content is transformed into a set of numeric values. Consequently, after the SOFM has run, the clusters contain vectors whose relation to the pages of the web site is not clear. By applying a Reverse Cluster Analysis (RCA), it is possible to identify which pages are represented in each cluster. The RCA consists of comparing the vectors in each cluster with the vector representations of the pages. The pages whose vectorial representation is near the cluster's centroid are then extracted. This approach was tested on a real web site in order to show its effectiveness. The results indicate that it is possible to identify representative web pages in a web site and, in this way, improve the site's text content.
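The pipeline described in the abstract (vector space model, SOFM clustering, then Reverse Cluster Analysis) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state the term weighting, map topology, similarity measure, or nearness threshold, so the TF-IDF weighting, the two-unit 1-D map, cosine similarity, and the 0.5 threshold are all assumptions. The function names `tfidf_vectors`, `train_som`, and `reverse_cluster_analysis` and the four sample pages are hypothetical.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Vector space model: one TF-IDF vector per document (assumed weighting)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    n_docs = len(docs)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[w] * math.log(n_docs / doc_freq[w]) for w in vocab])
    return vocab, vectors


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def train_som(vectors, n_units=2, epochs=50, seed=0):
    """Train a small 1-D self-organizing map; returns the unit weight vectors."""
    rng = random.Random(seed)
    # Initialize each unit from a randomly chosen input vector (assumption).
    units = [list(rng.choice(vectors)) for _ in range(n_units)]
    for epoch in range(epochs):
        decay = 1 - epoch / epochs
        rate = 0.5 * decay                      # decaying learning rate
        radius = max(1, n_units // 2) * decay   # shrinking neighborhood
        for vec in rng.sample(vectors, len(vectors)):
            # Winner is the unit most similar to the input vector.
            winner = max(range(n_units), key=lambda u: cosine(units[u], vec))
            for u in range(n_units):
                grid_dist = abs(u - winner)
                if grid_dist <= radius:
                    influence = rate * math.exp(
                        -grid_dist ** 2 / (2 * (radius + 1e-9) ** 2)
                    )
                    units[u] = [w + influence * (x - w) for w, x in zip(units[u], vec)]
    return units


def reverse_cluster_analysis(units, vectors, threshold=0.5):
    """Map each SOM unit back to the pages whose vectors lie near it."""
    pages_per_unit = {u: [] for u in range(len(units))}
    for page_idx, vec in enumerate(vectors):
        sims = [cosine(unit, vec) for unit in units]
        best = max(range(len(units)), key=sims.__getitem__)
        if sims[best] >= threshold:             # keep only pages near the centroid
            pages_per_unit[best].append((page_idx, sims[best]))
    for members in pages_per_unit.values():
        members.sort(key=lambda m: m[1], reverse=True)
    return pages_per_unit


# Hypothetical page texts, for illustration only.
pages = [
    "loans credit interest rates",
    "credit card interest fees",
    "branch opening hours location",
    "branch location map address",
]
vocab, vectors = tfidf_vectors(pages)
units = train_som(vectors)
representative = reverse_cluster_analysis(units, vectors)
for unit, members in representative.items():
    print(unit, [(pages[i], round(sim, 2)) for i, sim in members])
```

Each entry of `representative` lists the pages closest to one map unit, most similar first. Under the assumptions above, these would be the candidate representative pages for that cluster. A real site would need proper tokenization, stop-word removal, and a larger 2-D map.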
