#30 open

No response code or referer after running for hours

Reported by michael.harrington | July 26th, 2010 @ 01:09 PM

OS X Snow Leopard
Ruby 1.8.7 (System Ruby)

require 'rubygems'
require 'bundler'
Bundler.setup
require 'anemone'

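# One output file per response code, opened lazily on first use.
files = {}

Anemone.crawl('http://local.acton.org') do |anemone|
  anemone.on_every_page do |page|
    puts "#{page.code}: #{page.url} (#{page.referer})"
    # Pages with a nil code all go to internal_unknowns.txt.
    files[page.code] ||= File.open("internal_#{page.code || 'unknown'}s.txt", 'w')
    files[page.code] << "#{page.url} (#{page.referer})\n"
    files[page.code].flush
  end
  # Parentheses avoid Ruby's "ambiguous first argument" warning for the regex.
  anemone.skip_links_like(/login\?/)
end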

files.each do |code, file|
  file.close
end

For the better part of an hour everything moves along smoothly, but at a certain point first one of the tentacles, and then all of them, start giving me nil response codes and referers.

I see this kind of console output:

----
302: https://local.acton.org/it/user/login?destination=global%252Farticles-it (http://local.acton.org/it/global/articles-it)
302: http://local.acton.org/it/user/login?destination=global%252Farticles-it (http://local.acton.org/it/global/articles-it)
200: http://local.acton.org/it/support/donating-appreciated-assets (http://local.acton.org/it/index/support)
200: http://local.acton.org/it/global/articles-it?page=1 (http://local.acton.org/it/global/articles-it)
200: http://local.acton.org/it/global/articles-it?page=2 (http://local.acton.org/it/global/articles-it)
: http://local.acton.org/it/global/article/lettera-dal-direttore-maggio-2010-it ()
200: http://local.acton.org/it/global/articles-it?page=3 (http://local.acton.org/it/global/articles-it)
: http://local.acton.org/it/global/article/lettera-dal-direttore-aprile-2010-it ()
: http://local.acton.org/it/global/article/il-profeta-jim-wallis-e-la-chiesa-dell%25E2%2580%2599ignoranza-e-it ()
: http://local.acton.org/it/global/article/la-scienza-della-custodia-peccato-sostenibilit%25C3%25A0-e-it ()
: http://local.acton.org/it/global/article/lettera-dal-direttore-it-0 ()
: http://local.acton.org/it/global/article/due-evviva-ai-vescovi-inglesi-e-gallesi-it ()
----

And the empty codes/referers continue for hundreds of pages.

Any idea what's going on or how to fix it?
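
One way to narrow this down is to log any exception attached to a page whose code is nil. The following is a minimal sketch, not a confirmed diagnosis. It assumes the page object responds to `error` (Anemone::Page appears to expose this in recent versions, but that is an assumption here). `page_log_line` and `StubPage` are hypothetical names used only for illustration, and the 'connection reset' error is made-up sample data.

```ruby
# Hypothetical helper: builds one log line per page and, when the response
# code is missing, appends whatever exception the page object carries.
def page_log_line(page)
  line = "#{page.code || 'unknown'}: #{page.url} (#{page.referer})"
  # Assumption: the page responds to #error; respond_to? keeps this safe
  # on versions where it does not.
  if page.code.nil? && page.respond_to?(:error) && page.error
    line << " [#{page.error.class}: #{page.error.message}]"
  end
  line
end

# Stand-in for a crawled page so the helper can be tried without a crawl.
StubPage = Struct.new(:code, :url, :referer, :error)

ok = StubPage.new(200,
                  'http://local.acton.org/it/support/donating-appreciated-assets',
                  'http://local.acton.org/it/index/support',
                  nil)
failed = StubPage.new(nil,
                      'http://local.acton.org/it/global/article/lettera-dal-direttore-it-0',
                      nil,
                      RuntimeError.new('connection reset'))

puts page_log_line(ok)
puts page_log_line(failed)
```

In the crawl script, the `puts` line inside `on_every_page` could be swapped for `puts page_log_line(page)`, so that any fetch exception shows up next to the empty code and referer.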

Anemone is a Ruby library that makes it quick and painless to write programs that spider a website. It provides a simple DSL for performing actions on every page of a site, skipping certain URLs, and calculating the shortest path to a given page on a site.