A Heuristic Approach for Web Content Extraction

Open Access

Author(s) - Neha Gupta, Saba Hilal

Publication year - 2011

Publication title - International Journal of Computer Applications

Language(s) - English

Resource type - Journals

ISSN - 0975-8887

DOI - 10.5120/1945-2601

Subject(s) - computer science, heuristic, content extraction, information retrieval, world wide web, web content, web page, artificial intelligence

The internet has become central to everyday life, and almost anything can be searched for online. Web pages, however, usually contain a large amount of information that does not interest the user because it is not part of the page's main content. Extracting the main content requires data mining techniques, and a great deal of research has already been done in this field. Current automatic techniques remain unsatisfactory because their output does not match the user's query. In this paper, we present an automatic approach that extracts the main content of a web page, using the tag tree and heuristics to filter out the clutter and display the main content. Experimental results show that the technique presented in this paper outperforms existing techniques dramatically.
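The abstract does not say which heuristics the authors apply to the tag tree, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes one commonly used heuristic, link density (the share of a block's text that sits inside `<a>` tags), and drops any block-level subtree above a threshold. The names `SKIP_TAGS`, `BLOCK_TAGS`, `Node`, `TreeBuilder`, `stats`, `extract_main`, and the `max_link_density` parameter are all hypothetical; only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser

# Assumed tag sets for this sketch; the paper does not specify any.
SKIP_TAGS = {"head", "script", "style", "nav", "header", "footer", "aside", "form"}
BLOCK_TAGS = {"div", "section", "article", "table", "tr", "td", "ul", "ol", "li", "p"}
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Node:
    """One element of the tag tree; children are Node objects or text strings."""
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a simple tag tree from an HTML string."""
    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:  # void elements never get children
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace
        if text:
            self.current.children.append(text)


def stats(node, in_link=False):
    """Return (total_chars, link_chars) for the text under node."""
    if node.tag in SKIP_TAGS:
        return 0, 0
    in_link = in_link or node.tag == "a"
    total = linked = 0
    for child in node.children:
        if isinstance(child, str):
            total += len(child)
            linked += len(child) if in_link else 0
        else:
            t, l = stats(child, in_link)
            total += t
            linked += l
    return total, linked


def extract_main(html, max_link_density=0.5):
    """Return page text with skipped tags and link-heavy blocks filtered out."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    parts = []

    def walk(node):
        if node.tag in SKIP_TAGS:
            return
        if node.tag in BLOCK_TAGS:
            total, linked = stats(node)
            if total and linked / total > max_link_density:
                return  # heuristic: mostly links, treat as clutter
        for child in node.children:
            if isinstance(child, str):
                parts.append(child)
            else:
                walk(child)

    walk(builder.root)
    return " ".join(parts)


sample = """
<html><head><title>T</title><script>var x = 1;</script></head><body>
<div id="menu"><a href="/">Home</a> <a href="/blog">Blog</a></div>
<div id="content"><h1>Heading</h1>
<p>Main article text with an <a href="/ref">inline link</a> inside it.</p></div>
<div id="footer"><a href="/login">Sign in</a></div>
</body></html>
"""
print(extract_main(sample))
```

On the sample page, the menu and footer blocks consist entirely of link text and are dropped, while the content block keeps its inline link because the density test is applied only to block-level elements. A single threshold like this is easy to fool (for example by short non-link boilerplate), which is presumably why an approach such as the one described would combine several heuristics.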
