#33 open

Timeouts bubble up through rescue blocks and kill tentacles

Reported by michael.harrington | July 30th, 2010 @ 12:39 PM | in 0.4.1

The thread body of the tentacle looks like this:

    def run
      loop do
        link, referer, depth = @link_queue.deq

        break if link == :END

        @http.fetch_pages(link, referer, depth).each { |page| @page_queue << page }
      end
    end


Theoretically, we are protected from exceptions because fetch_pages has a catch-all rescue...

    def fetch_pages(url, referer = nil, depth = nil)
      begin
        url = URI(url) unless url.is_a?(URI)
        pages = []
        get(url, referer) do |response, code, location, redirect_to, response_time|
          pages << Page.new(location, :body => response.body.dup,
                                      :code => code,
                                      :headers => response.to_hash,
                                      :referer => referer,
                                      :depth => depth,
                                      :redirect_to => redirect_to,
                                      :response_time => response_time)
        end

        return pages
      rescue => e
        # A bare rescue only catches StandardError and its subclasses.
        if verbose?
          puts e.inspect
          puts e.backtrace
        end
        return [Page.new(url, :error => e)]
      end
    end

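For the most part, this is safe. The exception is Timeout::Error, which (on Ruby 1.8) derives from Interrupt, not StandardError. A bare rescue only catches StandardError and its subclasses, so the timeout bypasses the rescue block, propagates out of run, and kills the tentacle thread!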
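
To illustrate why a bare rescue misses this kind of exception, here is a minimal sketch. LegacyTimeoutError and where_caught are hypothetical names made up for this example. The stand-in class is needed because Ruby 1.9 and later define Timeout::Error as a RuntimeError, so the real class would be caught there.

```ruby
# Hypothetical stand-in for a timeout error that subclasses Interrupt
# instead of StandardError (not part of Anemone or the timeout library).
class LegacyTimeoutError < Interrupt; end

# Raises error_class inside a bare rescue and reports where it was caught.
def where_caught(error_class)
  begin
    raise error_class, "simulated failure"
  rescue => e        # same as `rescue StandardError => e`
    return :bare_rescue
  end
rescue Exception     # only reached if the bare rescue let the error through
  :escaped
end

puts where_caught(RuntimeError)        # handled by the bare rescue
puts where_caught(LegacyTimeoutError)  # slips past the bare rescue
```

In a tentacle there is no outer handler like the one in this sketch, so the escaped exception would end the thread.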

The solution is to explicitly capture it with an additional rescue Timeout::Error => e.
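
A rough sketch of what that could look like, under simplifying assumptions: SketchPage and fetch_pages_sketch are hypothetical stand-ins for Anemone's Page and fetch_pages, and the block passed in plays the role of the HTTP request. This is not the actual patch.

```ruby
require 'timeout'

# Hypothetical, simplified stand-in for Anemone's Page class.
SketchPage = Struct.new(:url, :error)

# Simplified fetch_pages: the block plays the role of the HTTP request.
def fetch_pages_sketch(url)
  pages = []
  yield pages
  pages
rescue Timeout::Error => e   # named explicitly, so its ancestry no longer matters
  [SketchPage.new(url, e)]
rescue => e                  # every other StandardError
  [SketchPage.new(url, e)]
end

result = fetch_pages_sketch("http://example.com/") do
  raise Timeout::Error, "execution expired"
end
puts result.first.error.inspect
```

The two clauses could also be merged into a single `rescue StandardError, Timeout::Error => e`, since both branches do the same thing here.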

Anemone is a Ruby library that makes it quick and painless to write programs that spider a website. It provides a simple DSL for performing actions on every page of a site, skipping certain URLs, and calculating the shortest path to a given page on a site.

