Add CloudWatch Agent to EC2 instances via puppet

Sometimes a few servers need additional metrics to help debug performance problems. Extra alerting is also helpful on stateful systems that must stick around longer than auto-scaled instances.

In AWS, beyond the instance-level metrics that EC2 collects and displays on its own, operating-system-level metrics come in handy. Per-volume free disk space is a good example: it's a metric you don’t want to see run down to zero.

For that purpose, AWS created the CloudWatch agent, a service that can be installed on various operating systems. For the sake of this article, I’m assuming a Debian-based OS.

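In a Puppet module, create a new file, config.json, which will hold the CloudWatch agent’s configuration. Given the puppet:/// source used further down, mine lives in the module's files/ directory at files/opt/aws/amazon-cloudwatch-agent/bin/config.json.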

{
  "agent": {
    "metrics_collection_interval": 60,
    "run_as_user": "cwagent"
  },
  "metrics": {
    "append_dimensions": {
      "AutoScalingGroupName": "${aws:AutoScalingGroupName}",
      "ImageId": "${aws:ImageId}",
      "InstanceId": "${aws:InstanceId}",
      "InstanceType": "${aws:InstanceType}"
    },
    "metrics_collected": {
      "collectd": {
        "metrics_aggregation_interval": 60
      },
      "cpu": {
        "measurement": [
          "cpu_usage_idle",
          "cpu_usage_iowait",
          "cpu_usage_user",
          "cpu_usage_system"
        ],
        "metrics_collection_interval": 60,
        "resources": [
          "*"
        ],
        "totalcpu": false
      },
      "disk": {
        "measurement": [
          "used_percent",
          "inodes_free"
        ],
        "metrics_collection_interval": 60,
        "resources": [
          "*"
        ]
      },
      "diskio": {
        "measurement": [
          "io_time"
        ],
        "metrics_collection_interval": 60,
        "resources": [
          "*"
        ]
      },
      "mem": {
        "measurement": [
          "mem_used_percent"
        ],
        "metrics_collection_interval": 60
      },
      "statsd": {
        "metrics_aggregation_interval": 60,
        "metrics_collection_interval": 10,
        "service_address": ":8125"
      },
      "swap": {
        "measurement": [
          "swap_used_percent"
        ],
        "metrics_collection_interval": 60
      }
    }
  }
}
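
A stray comma in this file is easy to miss until the agent refuses to load it, so it can be worth checking that the JSON parses before Puppet ships it. Here is a minimal sketch, assuming python3 is available on the machine where you edit the module. The validate_json helper and the sample file are made up for illustration and are not part of the agent's tooling.

```shell
#!/bin/sh
# Succeeds only if the given file parses as JSON.
# Assumption: python3 is on PATH; `jq empty FILE` is a common alternative.
validate_json() {
  python3 -m json.tool "$1" > /dev/null 2>&1
}

# Throwaway sample with a deliberate trailing comma (not the real config).
sample=$(mktemp)
printf '{ "agent": { "metrics_collection_interval": 60, } }\n' > "$sample"

if validate_json "$sample"; then
  echo "sample parses"
else
  echo "sample is not valid JSON"
fi
rm -f "$sample"
```

This only checks syntax. It does not know which keys the agent accepts, so a typo in a metric name would still get through.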

Here’s the Puppet configuration I needed to get the CloudWatch agent up and running:


  file { "/tmp/amazon-cloudwatch-agent.deb":
    source => "https://s3.amazonaws.com/amazoncloudwatch-agent/ubuntu/amd64/latest/amazon-cloudwatch-agent.deb",
    ensure => present
  }

  package { "collectd":
    ensure => present
  }

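  package { "cloudwatch-agent":
    # The .deb installs a package named amazon-cloudwatch-agent. Setting name
    # makes Puppet query dpkg for that package rather than the resource title.
    name     => "amazon-cloudwatch-agent",
    source   => "/tmp/amazon-cloudwatch-agent.deb",
    require  => [Package["collectd"], File["/tmp/amazon-cloudwatch-agent.deb"]],
    provider => dpkg,
    ensure   => latest,
  }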

  file { "/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json":
    source  => "puppet:///modules/pb_profile/opt/aws/amazon-cloudwatch-agent/bin/config.json",
    ensure  => present,
    owner   => "root",
    group   => "root",
    mode    => "0644",
    require => Package["cloudwatch-agent"]
  }

  file { '/etc/cron.d/cloudwatch_agent_on_boot':
    ensure => present,
    owner  => 'root',
    mode   => '0644',
    group  => 'root',
    source => 'puppet:///modules/pb_profile/etc/cron.d/cloudwatch_agent_on_boot'
  }

Finally, the cron job that’s responsible for starting the CloudWatch agent process:

## Managed by puppet
@reboot root /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json -s 2>&1 | logger -t pmm_register_on_boot

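P.S. Don’t forget the trailing newline in the cron file! Cron expects every entry to end with a newline, and a final line without one may be skipped.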
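
The newline is easy to lose when an editor strips it, so here is a small sketch of how one might check for it before committing the file. The ends_with_newline helper and the temporary files are hypothetical, used only to show the check.

```shell
#!/bin/sh
# Returns 0 when the file is non-empty and its last byte is a newline.
# Command substitution strips a trailing newline, so an empty result
# from `tail -c 1` means the final byte was "\n".
ends_with_newline() {
  [ -s "$1" ] && [ -z "$(tail -c 1 "$1")" ]
}

# Demo on throwaway files (placeholder entries, not the real cron job).
good=$(mktemp)
bad=$(mktemp)
printf '@reboot root /bin/true\n' > "$good"
printf '@reboot root /bin/true' > "$bad"

if ends_with_newline "$good"; then echo "good: ends with newline"; fi
if ! ends_with_newline "$bad"; then echo "bad: missing trailing newline"; fi
rm -f "$good" "$bad"
```

Pointed at the cron.d source file in the module, the same function gives a quick yes or no before Puppet distributes it.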

The EC2 instance needs an IAM role attached through an instance profile. I’ve found that these managed policies need to be included:

  • AmazonEC2RoleforSSM
  • CloudWatchAgentServerPolicy
  • AmazonSSMManagedInstanceCore

The SSM policies aren’t critical; they are only needed if you keep the agent configuration in SSM.
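
If the role is managed by hand rather than through infrastructure code, the policies can be attached with the AWS CLI. The sketch below only prints the commands so they can be reviewed first. The role name my-ec2-role and the print_attach_commands helper are placeholders, and the ARNs assume the standard AWS-managed policy paths.

```shell
#!/bin/sh
# Print the AWS CLI commands that would attach the three managed policies.
# Nothing is executed here; pipe the output to `sh` once it looks right.
print_attach_commands() {
  role_name=$1
  for arn in \
    arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy \
    arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore \
    arn:aws:iam::aws:policy/service-role/AmazonEC2RoleforSSM
  do
    echo "aws iam attach-role-policy --role-name $role_name --policy-arn $arn"
  done
}

# "my-ec2-role" is a hypothetical role name.
print_attach_commands my-ec2-role
```

Printing first keeps the step reversible: nothing touches IAM until the output is deliberately run.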

Development-Operations
Jack Peterson