Timeouts bubble up through rescue blocks and kill tentacles
Reported by michael.harrington | July 30th, 2010 @ 12:39 PM | in 0.4.1
The thread body of the tentacle looks like this:
def run
  loop do
    link, referer, depth = @link_queue.deq
    break if link == :END
    @http.fetch_pages(link, referer, depth).each { |page| @page_queue << page }
    delay
  end
end
Theoretically, we are protected from exceptions because fetch_pages has a catch-all rescue...
def fetch_pages(url, referer = nil, depth = nil)
  begin
    url = URI(url) unless url.is_a?(URI)
    pages = []
    get(url, referer) do |response, code, location, redirect_to, response_time|
      pages << Page.new(location, :body => response.body.dup,
                                  :code => code,
                                  :headers => response.to_hash,
                                  :referer => referer,
                                  :depth => depth,
                                  :redirect_to => redirect_to,
                                  :response_time => response_time)
    end
    return pages
  rescue => e
    if verbose?
      puts e.inspect
      puts e.backtrace
    end
    return [Page.new(url, :error => e)]
  end
end
For the most part, you are safe: a bare rescue clause only catches StandardError and its descendants. Timeout::Error, however, derives from Interrupt (under Ruby 1.8; the hierarchy changed in 1.9.2), not StandardError, so it sails right past the rescue block and kills the tentacle thread!
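The bypass is easy to demonstrate without any networking. In this minimal sketch, FakeTimeout is a hypothetical stand-in for Ruby 1.8's Timeout::Error, subclassed from Interrupt the same way, so the behavior reproduces on any Ruby version:

```ruby
# Hypothetical stand-in for Ruby 1.8's Timeout::Error < Interrupt
class FakeTimeout < Interrupt; end

def guarded
  yield
  :ok
rescue => e              # only matches StandardError and descendants
  :caught_by_bare_rescue
rescue Interrupt => e    # must be named explicitly to be caught
  :caught_by_interrupt_rescue
end

guarded { raise "boom" }       # => :caught_by_bare_rescue
guarded { raise FakeTimeout }  # => :caught_by_interrupt_rescue
```

Without the second rescue clause, the FakeTimeout would propagate out of guarded entirely, which is exactly how the tentacle thread dies.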
The solution is to explicitly capture it with an additional rescue Timeout::Error => e.
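A rough sketch of the fix follows; fetch_pages_safely, the Struct-based Page, and the simulated slow request are illustrative stand-ins, not Anemone's actual code. Naming Timeout::Error alongside the catch-all returns the error page either way, whichever hierarchy the running Ruby uses:

```ruby
require 'timeout'

# Minimal stand-in for Anemone's Page, just enough to run the sketch
Page = Struct.new(:url, :error)

def fetch_pages_safely(url)
  # Simulate a request that exceeds its deadline
  Timeout.timeout(0.01) { sleep 1 }
  [Page.new(url, nil)]
rescue Timeout::Error, StandardError => e
  # On Ruby 1.8, Timeout::Error is not a StandardError, so it must be
  # named explicitly; on newer Rubies the extra class name is harmless.
  [Page.new(url, e)]
end

pages = fetch_pages_safely("http://example.com/")
```

The same effect can be had with a separate rescue Timeout::Error => e clause above the bare rescue; listing both classes in one clause just avoids duplicating the error-page logic.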
Comments and changes to this ticket
- chris (at chriskite) July 30th, 2010 @ 07:31 PM
- State changed from new to open
- Milestone set to 0.4.1
- Milestone order changed from 185603 to 0
Anemone is a Ruby library that makes it quick and painless to write programs that spider a website. It provides a simple DSL for performing actions on every page of a site, skipping certain URLs, and calculating the shortest path to a given page on a site.