Depth-limited crawling
Reported by hayato | September 6th, 2009 @ 12:31 AM
I suggest adding a new feature to Anemone.
Problem
Anemone 0.1.2 follows every link within the same domain. For some root
URLs this makes Anemone crawl far too many pages and take a long time.
Suggestion
Add an option to Anemone that limits link following by depth count.
I have attached a concept implementation and an RSpec test.
Please consider my suggestion.
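(The attached concept implementation is not reproduced here. The sketch below only illustrates the intended usage of such a feature; the :depth_limit option name and the page.depth attribute are assumptions for illustration, not the attached patch.)

    require 'anemone'

    # Stop following links once a page is more than the given number of
    # hops away from the root URL. Option and attribute names are
    # illustrative assumptions, not Anemone 0.1.2 API.
    Anemone.crawl("http://www.example.com/", :depth_limit => 2) do |anemone|
      anemone.on_every_page do |page|
        puts "depth #{page.depth}: #{page.url}"
      end
    end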
Comments and changes to this ticket
chris (at chriskite) September 6th, 2009 @ 07:35 PM
- State changed from new to open
- Assigned user set to chris (at chriskite)
chris (at chriskite) October 24th, 2009 @ 01:06 PM
- State changed from open to resolved
Anemone is a Ruby library that makes it quick and painless to write programs that spider a website. It provides a simple DSL for performing actions on every page of a site, skipping certain URLs, and calculating the shortest path to a given page on a site.
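For context, a minimal sketch of that DSL (the URL and the skip pattern are placeholders):

    require 'anemone'

    Anemone.crawl("http://www.example.com/") do |anemone|
      # Skip URLs matching a pattern.
      anemone.skip_links_like(/\/private\//)

      # Run a block on every crawled page.
      anemone.on_every_page do |page|
        puts page.url
      end
    end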