#20 ✓resolved
Luke Hartman

Crawl not staying on domain

Reported by Luke Hartman | February 4th, 2010 @ 04:39 PM

While running a crawl, I am sometimes getting page results from other domains. The following code:

require 'rubygems'
require 'anemone'
require 'pp'

Anemone.crawl("http://www.ou.edu/") do |a|
  a.on_every_page do |x|
    pp x.url
  end
end

also visits pages on multiple other domains and subdomains.
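To make the report easier to reproduce, off-domain results can be picked out by comparing each reported URL's host with the start URL's host. The following is only a rough sketch using Ruby's standard `uri` library. `same_host?` is a hypothetical helper, not part of Anemone, and the sample URLs other than the start URL are made up for illustration.

```ruby
require 'uri'

# Hypothetical helper (not part of Anemone): returns true when +candidate+
# has exactly the same host as +base+, ignoring case. Subdomains count as
# a different host.
def same_host?(base, candidate)
  base_host = URI(base.to_s).host
  cand_host = URI(candidate.to_s).host
  return false if base_host.nil? || cand_host.nil?
  base_host.downcase == cand_host.downcase
rescue URI::InvalidURIError
  false
end

START_URL = "http://www.ou.edu/"

# Illustrative URLs only; these are assumptions, not output from the crawl.
sample_urls = [
  "http://www.ou.edu/some/page",
  "http://sub.ou.edu/",
  "http://www.example.com/"
]

# Keep only the URLs whose host differs from the start URL's host.
off_domain = sample_urls.reject { |u| same_host?(START_URL, u) }
```

Inside the `on_every_page` block above, the same check could be applied to `x.url` to print only the pages that fall outside `www.ou.edu`.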



Anemone is a Ruby library that makes it quick and painless to write programs that spider a website. It provides a simple DSL for performing actions on every page of a site, skipping certain URLs, and calculating the shortest path to a given page on a site.
