Using td-agent to archive data from SQS

Oft times we need to archive data for billing or viewing traffic trends and so on. This data can come from either one or multiple application servers. This post describes the use of fluentd/td-agent as a data collector.

td-agent is a data collector tool invented by Treasure Data. It is open sourced and maintained by the community under the name of fluentd.

Architecture

The architecture used for this example is –

td-agent Architecture Diagram

Data Flow

  • Server pushes data to SQS.
  • td-agent SQS input plugin pulls data from SQS.
  • td-agent makes an API call to another service (such as an abstraction layer over a Database model).

Adding Plugins

Include the following plugins under /etc/td-agent/plugin –

Configuration

The configuration to be included in td-agent:

<source>
  type sqs
  sqs_url {queue_url}
  aws_key_id {your_aws_key_id}
  aws_sec_key {your_aws_secret_key}
  sqs_endpoint {sqs_endpoint}
  receive_interval 0.25
  max_number_of_messages 10
  tag demo
</source>

<match demo>
  type http
  endpoint_url    {api_endpoint_url}
  http_method     post   # default: post
  serializer      json   # default: form
  raise_on_error  false  # default: true
</match>

Now restart td-agent for the setup to work. 🙂

td-agent Dataflow

td-agent has many XML directives but our focus here are – source and match. <source></source> has details of the input plugin and <match></match> the output plugin.

<source>

Source is used to fetch data from different sources. Source can be of multiple types – http (HTTP POST), tcp (TCP Payload), unix (Unix sockets) and so forth. Example of an HTTP built-in plugin.

<source>
  type http
  port 8888
  bind 0.0.0.0
</source>

You could also write custom source plugins to accept data from any system in any format. In our example, we will fetch the data from SQS using a custom input plugin.

<source>
  type sqs
  sqs_url {queue_url}
  aws_key_id {your_aws_key_id}
  aws_sec_key {your_aws_secret_key}
  sqs_endpoint {sqs_endpoint}
  receive_interval 0.25
  max_number_of_messages 10
  tag demo
</source>

Here the type sqs is not native to td-agent. It is registered in the in_sqs.rb script line no. 8.

Plugin.register_input('sqs', self) # This allows use of type sqs

The remaining parameters are values which are used as part of fetching data from SQS by the script. td-agent defines an event using three entities – tag, time and record.

  • Tag – A string used by to direct data flow. Example: tag demo
  • Time – Time in Unix time format.
  • Record – The actual data in JSON format.

Line no. 61 in the in_sqs.rb script instructs td-agent engine to route the data to the specific match directive.

Engine.emit(@tag, Time.now.to_i, record)

<match>

Match is used to direct the input fetched using source to defined output. Match also can be of multiple types – file (write into file), mongo (store in Mongo DB), copy (store the same data in multiple outputs) and so forth. Example of an file built-in plugin.

<match pattern>            # pattern is a tag as defined in source
  type file
  path /var/log/fluent/myapp
  compress gzip
</match>

In our example we are again using a custom output plugin to send this as an HTTP API call to another system.

<match demo>
  type http
  endpoint_url    {api_endpoint_url}
  http_method     post   # default: post
  serializer      json   # default: form
  raise_on_error  false  # default: true
</match>

Line no. 109 in the out_http.rb script makes the HTTP request to the API which would have the logic to handle it.

res = Net::HTTP.new(uri.host, uri.port).start {|http| http.request(req)}

Note that the above plugin works better with bufferize plugin to buffer data into a file and then make the API call.

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s