Class NextUrlsInSQS
In: lib/spider/next_urls_in_sqs.rb
Parent: Object

A specialized class that uses Amazon SQS to track nodes to walk. It supports two operations, push and pop: push adds items to the queue, and pop pulls items off the queue.

This is useful if you want multiple Spider processes crawling the same data set.

To use it with Spider, pass an instance to the store_next_urls_with method:

 Spider.start_at('http://example.com/') do |s|
   s.store_next_urls_with NextUrlsInSQS.new(AWS_ACCESS_KEY, AWS_SECRET_ACCESS_KEY, queue_name)
 end

Methods

new   pop   push  

Public Class methods

Construct a new NextUrlsInSQS instance. The arguments are either passed to RightAws::SqsGen2 (part of the right_aws gem) or, in the case of the optional queue name, used to select the Amazon SQS queue.

Public Instance methods

Pull an item off the queue, looping until data is found. Data on the queue is encoded with YAML.

Put data on the queue. Data is encoded with YAML.
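
The push/pop contract described above can be sketched without an AWS account. The class below is a hypothetical in-memory stand-in, not part of Spider or right_aws: the name NextUrlsInMemory, the poll_interval parameter, and the payload shape are assumptions made for illustration. It mirrors only the documented behaviour: push encodes data with YAML, and pop loops until data is found and returns it decoded.

```ruby
require 'yaml'

# Hypothetical in-memory stand-in for NextUrlsInSQS (an assumption for
# illustration; it does not talk to Amazon SQS).
class NextUrlsInMemory
  # poll_interval is an assumed parameter: seconds to wait between polls.
  def initialize(poll_interval = 0.01)
    @messages = []            # each entry is a YAML string, as on a real queue
    @lock = Mutex.new         # guards @messages across threads
    @poll_interval = poll_interval
  end

  # Put data on the queue, encoded with YAML.
  def push(data)
    encoded = data.to_yaml
    @lock.synchronize { @messages.push(encoded) }
  end

  # Pull an item off the queue, looping until data is found,
  # then decode it from YAML.
  def pop
    loop do
      encoded = @lock.synchronize { @messages.shift }
      return YAML.load(encoded) unless encoded.nil?
      sleep @poll_interval
    end
  end
end

# Illustrative payload only; the structure Spider actually stores may differ.
queue = NextUrlsInMemory.new
queue.push({ 'http://example.com/' => ['http://example.com/about'] })
next_urls = queue.pop
```

Because pop blocks until something arrives, a consumer that calls pop on an empty queue waits for a producer to push. With a shared queue such as SQS, this is what would let several processes draw from the same set of pending URLs.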
