Research Article

A Clickstream-based Focused Trend Parallel Web Crawler

by F. Ahmadi-Abkenari, Ali Selamat
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 9 - Issue 5
Published: November 2010
Authors: F. Ahmadi-Abkenari, Ali Selamat
DOI: 10.5120/1385-1866

F. Ahmadi-Abkenari, Ali Selamat. A Clickstream-based Focused Trend Parallel Web Crawler. International Journal of Computer Applications. 9, 5 (November 2010), 1-8. DOI=10.5120/1385-1866

@article{ 10.5120/1385-1866,
  author    = { F. Ahmadi-Abkenari and Ali Selamat },
  title     = { A Clickstream-based Focused Trend Parallel Web Crawler },
  journal   = { International Journal of Computer Applications },
  year      = { 2010 },
  volume    = { 9 },
  number    = { 5 },
  pages     = { 1-8 },
  doi       = { 10.5120/1385-1866 },
  publisher = { Foundation of Computer Science (FCS), NY, USA }
}
%0 Journal Article
%D 2010
%A F. Ahmadi-Abkenari
%A Ali Selamat
%T A Clickstream-based Focused Trend Parallel Web Crawler
%J International Journal of Computer Applications
%V 9
%N 5
%P 1-8
%R 10.5120/1385-1866
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The immense and still-growing size of the World Wide Web creates many obstacles for all-purpose single-process crawlers, including incorrect answers among search results and poor scaling. As a result, more advanced heuristics are needed to provide more accurate search results in a timely manner. Employing link-dependent Web page importance metrics within a parallel crawler imposes considerable overhead on the overall search system, and such metrics cannot cover authorized Web content in the dark net or authorized fresh pages; these metrics are therefore not a complete solution within a search engine's architecture. This paper proposes applying a link-independent Web page importance metric to govern the priority rule within the crawl frontier. It does so through a modest weighted architecture for a focused structured parallel Web crawler (CFP crawler), in which credit is assigned to URLs in the crawl frontier according to a clickstream-based prioritizing algorithm.
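As a rough illustration of what a clickstream-driven priority rule in a crawl frontier could look like, the following minimal Python sketch orders URLs by how often they appear in a visit log. It is not the paper's algorithm: the log format (pairs of session ID and URL), the raw visit count used as the score, and the names `click_scores` and `Frontier` are all assumptions made for this example.

```python
import heapq
from collections import Counter


def click_scores(clickstream):
    """Count visits per URL in a list of (session_id, url) records (assumed format)."""
    return Counter(url for _session, url in clickstream)


class Frontier:
    """Hypothetical crawl frontier that pops the most-clicked URL first."""

    def __init__(self, scores):
        self._scores = scores
        self._heap = []
        self._seen = set()
        self._order = 0  # tie-breaker: earlier insertions pop first

    def push(self, url):
        if url in self._seen:
            return  # ignore URLs that were already queued
        self._seen.add(url)
        # heapq is a min-heap, so the score is negated to pop the largest first.
        heapq.heappush(self._heap, (-self._scores.get(url, 0), self._order, url))
        self._order += 1

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


# Toy log: "/b" has three visits, "/a" two, "/c" one, "/d" none.
log = [("s1", "/a"), ("s1", "/b"), ("s2", "/b"),
       ("s3", "/b"), ("s3", "/c"), ("s2", "/a")]
frontier = Frontier(click_scores(log))
for url in ["/c", "/a", "/b", "/d", "/a"]:
    frontier.push(url)
order = [frontier.pop() for _ in range(len(frontier))]
print(order)
```

Because the score depends only on logged visits, no link graph has to be built or exchanged between parallel crawler processes, which is the kind of overhead the abstract attributes to link-dependent metrics. A real system would presumably weight the clicks rather than simply count them.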

Index Terms
Computer Science
Information Sciences
Keywords

Clickstream analysis, Focused crawlers, Parallel crawlers, Web data management, Web page importance metrics
