#24 ✓resolved
rb2k

Memleak with pages?

Reported by rb2k | May 15th, 2010 @ 04:06 PM

Is it possible that, when running many .crawl() operations, the @pages hash keeps growing, with no method that lets the user empty it?
(I'm using a simple Hash as the storage backend in my case.)
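
A minimal sketch of the behavior in question (the URL is a placeholder; this assumes Anemone's default in-memory Hash storage):

    require 'anemone'

    # Every page visited during a crawl is recorded in the core's page
    # store. With the default Hash backend, that data lives in memory
    # for as long as the core (or the hash) is referenced.
    core = Anemone.crawl("http://example.com")
    puts core.pages.size   # grows with the number of pages crawled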

Comments and changes to this ticket

  • chris (@chriskite)

    chris (@chriskite) May 25th, 2010 @ 09:13 PM

    • State changed from “new” to “resolved”

    Yes, although it's not really a "leak" because it is intentionally storing data about all the pages you crawl. If you crawl a lot of pages, that data has to go somewhere, and if you're using a Hash then it's in memory. Using the TokyoCabinet storage engine is a good solution, as the data is persisted on disk and doesn't use nearly as much memory.
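
    A minimal sketch of that setup (the URL and filename are placeholders; it assumes the tokyocabinet gem is installed alongside anemone):

        require 'anemone'

        Anemone.crawl("http://example.com") do |anemone|
          # Persist page data to disk instead of an in-memory Hash.
          anemone.storage = Anemone::Storage.TokyoCabinet("crawl.tch")
        end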

  • rb2k

    rb2k May 25th, 2010 @ 09:16 PM

    I crawl different domains, though.
    Is there a way to "reset" the page cache between crawls?
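
    One workaround, as an untested sketch (the domains are placeholders, and this is not a confirmed Anemone API for clearing the store): give each domain its own crawl, so nothing holds a reference to the previous run's page store:

        require 'anemone'

        %w[http://example.com http://example.org].each do |domain|
          Anemone.crawl(domain) do |anemone|
            anemone.on_every_page { |page| puts page.url }
          end
          # No reference to the returned core is kept, so its page store
          # becomes eligible for garbage collection here.
        end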
