Depth-limited crawling
Reported by hayato | September 6th, 2009 @ 12:31 AM
I suggest adding a new feature to Anemone.
Problem
Anemone 0.1.2 follows every link within the same domain. For some root
URLs this makes Anemone crawl far too many pages and take a long time.
Suggestion
Add an option to Anemone that limits link following by depth count.
I have attached a concept implementation and an RSpec test.
Please consider my suggestion.
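(The attached concept implementation is not reproduced here. The sketch below only illustrates the intended usage of such a feature; the :depth_limit option name and the page.depth attribute are assumptions for illustration, not the attached patch.)

    require 'anemone'

    # Stop following links once a page is more than the given number of
    # hops away from the root URL. Option and attribute names are
    # illustrative assumptions, not Anemone 0.1.2 API.
    Anemone.crawl("http://www.example.com/", :depth_limit => 2) do |anemone|
      anemone.on_every_page do |page|
        puts "depth #{page.depth}: #{page.url}"
      end
    end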
Comments and changes to this ticket
chris (at chriskite) September 6th, 2009 @ 07:35 PM
- State changed from new to open
- Assigned user set to chris (at chriskite)
chris (at chriskite) October 24th, 2009 @ 01:06 PM
- State changed from open to resolved
Anemone is a Ruby library that makes it quick and painless to write programs that spider a website. It provides a simple DSL for performing actions on every page of a site, skipping certain URLs, and calculating the shortest path to a given page on a site.
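For context, a minimal sketch of that DSL (the URL and the skip pattern are placeholders):

    require 'anemone'

    Anemone.crawl("http://www.example.com/") do |anemone|
      # Skip URLs matching a pattern.
      anemone.skip_links_like(/\/private\//)

      # Run a block on every crawled page.
      anemone.on_every_page do |page|
        puts page.url
      end
    end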