#11 ✓resolved
urbanadventurer

Doesn't handle redirection properly.

Reported by urbanadventurer | October 5th, 2009 @ 09:45 AM

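Doesn't handle redirection properly. Crawling http://treshna.com/, which responds with a 302 redirect to http://www.treshna.com, returns a single page with @code=nil, @headers=nil and no links: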

irb(main):171:0> Anemone.crawl("http://treshna.com/") do |a|
irb(main):172:1*   a.on_every_page do |x|
irb(main):173:2*     pp x
irb(main):174:2>   end
irb(main):175:1> end

#<Anemone::Page:0xb758bd8c
 @aliases=[], @code=nil, @data=#, @depth=0, @headers=nil, @links=[], @referer=nil, @url=#<URI::HTTP:0xb758d7cc URL:http://treshna.com/>>
=> #<Anemone::Core:0xb758d998 @urls=[#<URI::HTTP:0xb758d7cc URL:http://treshna.com/>], @skip_link_patterns=[], @pages={"http://treshna.com/"=>#<Anemone::Page:0xb758bd8c @links=[], @referer=nil, @url=#<URI::HTTP:0xb758d7cc URL:http://treshna.com/>, @data=#, @aliases=[], @headers=nil, @code=nil, @depth=0>}, @on_pages_like_blocks={}, @tentacles=[#<Thread:0xb758d5b0 dead>, #<Thread:0xb758d4fc dead>, #<Thread:0xb758d448 dead>, #<Thread:0xb758d394 dead>], @on_every_page_blocks=[#<Proc:0xb758e550@(irb):172>], @after_crawl_blocks=[]>

curl -vv treshna.com
* About to connect() to treshna.com port 80 (#0)
* Trying 210.48.71.196... connected
* Connected to treshna.com (210.48.71.196) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.18.2 (i486-pc-linux-gnu) libcurl/7.18.2 OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.10
> Host: treshna.com
> Accept: */*
>

< HTTP/1.1 302 Found
< Date: Mon, 05 Oct 2009 14:44:02 GMT
< Server: Apache/2.2.9 (Debian) PHP/5.2.6-1+lenny3 with Suhosin-Patch proxy_html/3.0.0 mod_ssl/2.2.9 OpenSSL/0.9.8g
< Location: http://www.treshna.com
< Content-Length: 366
< Content-Type: text/html; charset=iso-8859-1
<
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
302 Found
Found
The document has moved here.
Apache/2.2.9 (Debian) PHP/5.2.6-1+lenny3 with Suhosin-Patch proxy_html/3.0.0 mod_ssl/2.2.9 OpenSSL/0.9.8g Server at treshna.com Port 80
* Connection #0 to host treshna.com left intact
* Closing connection #0
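The Location header above sends the client from treshna.com to www.treshna.com, which is a different host. A minimal sketch using only Ruby's standard uri library (redirect_target and same_host_redirect? are hypothetical helper names, not part of Anemone) resolves a Location value against the requested URL and checks whether the redirect stays on the same host:

```ruby
require 'uri'

# Hypothetical helper: resolve a Location header against the URL that was
# requested. URI#merge returns an absolute Location as-is and resolves a
# relative one (e.g. "/index.php") against the request URL.
def redirect_target(request_url, location)
  URI.parse(request_url).merge(location)
end

# Hypothetical helper: true when the redirect stays on the requested host.
# A crawler that only follows links on its starting host would drop the
# redirect target whenever this returns false.
def same_host_redirect?(request_url, location)
  redirect_target(request_url, location).host == URI.parse(request_url).host
end

puts redirect_target("http://treshna.com/", "http://www.treshna.com")
# prints http://www.treshna.com
puts same_host_redirect?("http://treshna.com/", "http://www.treshna.com")
# prints false
```

For this ticket the check fails because "www.treshna.com" and "treshna.com" are compared as different host names, even though they belong to the same domain.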

Comments and changes to this ticket

  • chris (at chriskite), November 5th, 2009 @ 10:27 AM

    • State changed from “new” to “resolved”
    • Assigned user set to “chris (at chriskite)”

    Since Anemone limits the crawl to the host of the starting URL, it won't switch over to your www subdomain after the redirect. You'll need to start on the host you intend to crawl, e.g. http://www.treshna.com/.
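One way to follow this advice without hard-coding the www host is to resolve the redirect first and then start the crawl on whatever URL the server finally answers from. This is a sketch under assumptions, not an Anemone feature: resolve_redirects is a hypothetical helper built on Ruby's Net::HTTP, and its fetch parameter exists only so the redirect logic can be exercised without network access.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper (not part of Anemone): follow HTTP redirects from +url+
# and return the final URL as a string. +fetch+ defaults to a real GET via
# Net::HTTP.get_response and can be replaced with a stub.
def resolve_redirects(url, limit: 5, fetch: ->(uri) { Net::HTTP.get_response(uri) })
  uri = URI.parse(url)
  limit.times do
    response = fetch.call(uri)
    # Stop on any non-3xx response, or a 3xx without Location (e.g. 304).
    return uri.to_s unless response.is_a?(Net::HTTPRedirection) && response['location']
    # Location may be relative, so resolve it against the current URI.
    uri = uri.merge(response['location'])
  end
  raise ArgumentError, "too many redirects starting from #{url}"
end

# Usage (needs the anemone gem and network access):
#   start = resolve_redirects("http://treshna.com/")
#   Anemone.crawl(start) do |a|
#     a.on_every_page { |page| puts page.url }
#   end
```

Capping the number of hops guards against redirect loops; the default of 5 is an arbitrary choice.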


Anemone is a Ruby library that makes it quick and painless to write programs that spider a website. It provides a simple DSL for performing actions on every page of a site, skipping certain URLs, and calculating the shortest path to a given page on a site.
