Amazon ECS Cluster
Overview
This service contains Terraform code to deploy a production-grade cluster on AWS using Amazon Elastic Container Service (ECS).
This service launches an ECS cluster on top of an Auto Scaling Group that you manage. If you wish to launch an ECS cluster on top of Fargate that is completely managed by AWS, refer to the ecs-fargate-cluster module. Refer to the section EC2 vs Fargate Launch Types for more information on the differences between the two flavors.
Figure: ECS architecture diagram
Features
This Terraform module launches an Elastic Container Service (ECS) cluster that you can use to run Docker containers. The cluster consists of a configurable number of instances in an Auto Scaling Group (ASG). Each instance:
Runs the ECS Container Agent so it can communicate with the ECS scheduler.
Authenticates with a Docker repo so it can download private images. The Docker repo auth details should be encrypted using AWS Key Management Service (KMS) and passed in as input variables. When booting up, the instances use gruntkms to decrypt the data in memory. Note that the IAM role for these instances, which uses var.cluster_name as its name, must be granted access to the Customer Master Key (CMK) used to encrypt the data (see the sketch after this list).
Runs the CloudWatch Logs Agent to send all logs in syslog to CloudWatch Logs. This is configured using the cloudwatch-agent module.
Emits custom metrics that are not available by default in CloudWatch, including memory and disk usage. This is configured using the cloudwatch-agent module.
Runs the syslog module to automatically rotate and rate limit syslog so that your instances don't run out of disk space from large volumes of logs.
Runs the ssh-grunt module so that developers can upload their public SSH keys to IAM and use those SSH keys, along with their IAM user names, to SSH to the ECS nodes.
Runs the auto-update module so that the ECS nodes install security updates automatically.
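To make the KMS requirement above concrete, here is a minimal, hypothetical sketch of granting the ECS node IAM role permission to decrypt the Docker auth data. It assumes you manage the CMK in the same account, that the cluster (and therefore the IAM role named after var.cluster_name) has already been created, and that var.docker_auth_cmk_id is your own variable holding the CMK ID; none of these names come from this module.

# Look up the instance IAM role created by this service (it is named after var.cluster_name).
data "aws_iam_role" "ecs_nodes" {
  name = var.cluster_name
}

# Hypothetical grant allowing the ECS nodes to decrypt the Docker auth data with gruntkms.
resource "aws_kms_grant" "ecs_nodes_docker_auth" {
  name              = "ecs-nodes-docker-auth-decrypt"
  key_id            = var.docker_auth_cmk_id # your own variable; not an input of this module
  grantee_principal = data.aws_iam_role.ecs_nodes.arn
  operations        = ["Decrypt"]
}

Depending on your setup, you could instead add the role to the CMK's key policy; either way, decryption access must be granted explicitly.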
Learn
Note: This repo is a part of the Gruntwork Service Catalog, a collection of reusable, battle-tested, production-ready infrastructure code. If you've never used the Service Catalog before, make sure to read How to use the Gruntwork Service Catalog!
Under the hood, this is all implemented using Terraform modules from the Gruntwork terraform-aws-ecs repo. If you are a subscriber and don’t have access to this repo, email support@gruntwork.io.
Core concepts
To understand core concepts like what ECS is and the different cluster types, see the documentation in the terraform-aws-ecs repo.
To use ECS, you first deploy one or more EC2 Instances into a "cluster". The ECS scheduler can then deploy Docker containers across any of the instances in this cluster. Each instance needs to have the Amazon ECS Agent installed so it can communicate with ECS and register itself as part of the right cluster.
For more info on ECS clusters, including how to run Docker containers in a cluster, how to add additional security group rules, how to handle IAM policies, and more, check out the ecs-cluster documentation in the terraform-aws-ecs repo.
For info on finding your Docker container logs and custom metrics in CloudWatch, check out the cloudwatch-agent documentation.
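To tie these concepts together, here is a minimal, hypothetical sketch of instantiating this service from your own Terraform code. The source URL, ref, AMI ID, VPC ID, and subnet IDs are placeholders and assumptions for illustration; the input names match the required variables documented in the Reference section below.

module "ecs_cluster" {
  # Placeholder source and version; use the Service Catalog URL and ref appropriate for your setup.
  source = "git::git@github.com:gruntwork-io/terraform-aws-service-catalog.git//modules/services/ecs-cluster?ref=<VERSION>"

  cluster_name          = "example-ecs-cluster"
  cluster_instance_ami  = "ami-0123456789abcdef0" # an AMI built from the ecs-node-al2.json Packer template
  cluster_instance_type = "t3.medium"
  cluster_min_size      = 2
  cluster_max_size      = 4

  vpc_id         = "vpc-0123456789abcdef0"
  vpc_subnet_ids = ["subnet-aaaa1111", "subnet-bbbb2222", "subnet-cccc3333"]
}

Once applied, the ASG registers each instance with the ECS cluster, and the ECS scheduler can place containers on any of them.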
Repo organization
- modules: The main implementation code for this repo, broken down into multiple standalone, orthogonal submodules.
- examples: This folder contains working examples of how to use the submodules.
- test: Automated tests for the modules and examples.
Deploy
Non-production deployment (quick start for learning)
If you just want to try this repo out for experimenting and learning, check out the following resources:
- examples/for-learning-and-testing folder: The examples/for-learning-and-testing folder contains standalone sample code optimized for learning, experimenting, and testing (but not direct production usage).
Production deployment
If you want to deploy this repo in production, check out the following resources:
- examples/for-production folder: The examples/for-production folder contains sample code optimized for direct usage in production. This is code from the Gruntwork Reference Architecture, and it shows you how we build an end-to-end, integrated tech stack on top of the Gruntwork Service Catalog.
Manage
For information on how to configure cluster autoscaling, see How do you configure cluster autoscaling?
For information on how to manage your ECS cluster, see the documentation in the terraform-aws-ecs repo.
Reference
Inputs
Required
cluster_instance_ami
string(required)The AMI to run on each instance in the ECS cluster. You can build the AMI using the Packer template ecs-node-al2.json. One of cluster_instance_ami or cluster_instance_ami_filters is required.
cluster_instance_ami_filters
object(required)Properties on the AMI that can be used to look up a prebuilt AMI for use with ECS workers. You can build the AMI using the Packer template ecs-node-al2.json. Only used if cluster_instance_ami is null. One of cluster_instance_ami or cluster_instance_ami_filters is required. Set to null if cluster_instance_ami is set. See the example at the end of the Required inputs below.
object({
# List of owners to limit the search. Set to null if you do not wish to limit the search by AMI owners.
owners = list(string)
# Name/Value pairs to filter the AMI off of. There are several valid keys, for a full reference, check out the
# documentation for describe-images in the AWS CLI reference
# (https://docs.aws.amazon.com/cli/latest/reference/ec2/describe-images.html).
filters = list(object({
name = string
values = list(string)
}))
})
cluster_instance_type
string(required)The type of instances to run in the ECS cluster (e.g. t2.medium)
cluster_max_size
number(required)The maximum number of instances to run in the ECS cluster
cluster_min_size
number(required)The minimum number of instances to run in the ECS cluster
cluster_name
string(required)The name of the ECS cluster
vpc_id
string(required)The ID of the VPC in which the ECS cluster should be launched
vpc_subnet_ids
list(required)The IDs of the subnets in which to deploy the ECS cluster instances
list(string)
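As referenced above, here is a minimal, hypothetical example of using cluster_instance_ami_filters instead of a hard-coded cluster_instance_ami to look up a prebuilt AMI. The owner account ID and the AMI name pattern are illustrative assumptions; any filter keys supported by describe-images will work.

cluster_instance_ami = null

cluster_instance_ami_filters = {
  # Hypothetical account that owns the AMIs built from ecs-node-al2.json
  owners = ["111122223333"]

  filters = [
    {
      name   = "name"
      values = ["ecs-node-al2-*"]
    },
    {
      name   = "state"
      values = ["available"]
    },
  ]
}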
Optional
alarms_sns_topic_arn
list(optional)The ARNs of SNS topics where CloudWatch alarms (e.g., for CPU, memory, and disk space usage) should send notifications
list(string)
[]
allow_ssh_from_cidr_blocks
list(optional)The IP address ranges in CIDR format from which to allow incoming SSH requests to the ECS instances.
list(string)
[]
allow_ssh_from_security_group_ids
list(optional)The IDs of security groups from which to allow incoming SSH requests to the ECS instances.
list(string)
[]
autoscaling_termination_protection
bool(optional)Protect EC2 instances running ECS tasks from being terminated due to scale in (spot instances do not support lifecycle modifications). Note that the behavior of termination protection differs between clusters with capacity providers and clusters without. When capacity providers are turned on and this flag is true, only instances that have 0 ECS tasks running will be scaled in, regardless of capacity_provider_target. If capacity providers are turned off and this flag is true, this will prevent ANY instances from being scaled in.
false
capacity_provider_enabled
bool(optional)Enable a capacity provider to autoscale the EC2 ASG created for this ECS cluster. See the combined example at the end of the Optional inputs below.
false
capacity_provider_max_scale_step
number(optional)Maximum step adjustment size to the ASG's desired instance count. A number between 1 and 10000.
null
capacity_provider_min_scale_step
number(optional)Minimum step adjustment size to the ASG's desired instance count. A number between 1 and 10000.
null
capacity_provider_target
number(optional)Target cluster utilization for the ASG capacity provider; a number from 1 to 100. This number influences when scale out happens, and when instances should be scaled in. For example, a setting of 90 means that new instances will be provisioned when all instances are at 90% utilization, while instances that are only 10% utilized (CPU and Memory usage from tasks = 10%) will be scaled in.
null
cloud_init_parts
map(optional)Cloud init scripts to run on the ECS cluster instances during boot. See the part blocks in https://www.terraform.io/docs/providers/template/d/cloudinit_config.html for syntax
map(object({
filename = string
content_type = string
content = string
}))
{}
cloudwatch_log_group_kms_key_id
string(optional)The ID (ARN, alias ARN, AWS ID) of a customer managed KMS Key to use for encrypting log data.
null
cloudwatch_log_group_name
string(optional)The name of the log group to create in CloudWatch. Defaults to cluster_name-logs.
""
cloudwatch_log_group_retention_in_days
number(optional)The number of days to retain log events in the log group. Refer to https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudwatch_log_group#retention_in_days for all the valid values. When null, the log events are retained forever.
null
cloudwatch_log_group_tags
map(optional)Tags to apply on the CloudWatch Log Group, encoded as a map where the keys are tag keys and values are tag values.
map(string)
null
cluster_access_from_sgs
list(optional)Specify a list of Security Groups that will have access to the ECS cluster. Only used if enable_cluster_access_ports is set to true.
list(any)
[]
cluster_instance_associate_public_ip_address
bool(optional)Whether to associate a public IP address with an instance in a VPC
false
cluster_instance_keypair_name
string(optional)The name of the Key Pair that can be used to SSH to each instance in the ECS cluster
null
default_user
string(optional)The default OS user for the ECS worker AMI. For Amazon Linux AMIs, which is what the Packer template in ecs-node-al2.json uses, the default OS user is 'ec2-user'.
ec2-user
disallowed_availability_zones
list(optional)A list of availability zones in the region that should be skipped when deploying ECS. You can use this to avoid availability zones that may not be able to provision the resources (e.g., the instance type does not exist there). If empty, all availability zones are allowed.
list(string)
[]
enable_cloudwatch_log_aggregation
bool(optional)Set to true to enable CloudWatch log aggregation for the ECS cluster
true
enable_cloudwatch_metrics
bool(optional)Set to true to enable CloudWatch metrics collection for the ECS cluster
true
enable_cluster_access_ports
list(optional)Specify a list of ECS Cluster ports which should be accessible from the security groups given in cluster_access_from_sgs
list(any)
[]
enable_ecs_cloudwatch_alarms
bool(optional)Set to true to enable several basic CloudWatch alarms around CPU usage, memory usage, and disk space usage. If set to true, make sure to specify SNS topics to send notifications to using alarms_sns_topic_arn.
true
enable_fail2ban
bool(optional)Enable fail2ban to block brute-force login attempts. Defaults to true
true
enable_ip_lockdown
bool(optional)Enable ip-lockdown to block access to the instance metadata. Defaults to true
true
enable_ssh_grunt
bool(optional)Set to true to add IAM permissions for ssh-grunt (https://github.com/gruntwork-io/terraform-aws-security/tree/master/modules/ssh-grunt), which will allow you to manage SSH access via IAM groups.
true
external_account_ssh_grunt_role_arn
string(optional)If your IAM users are defined in a separate AWS account, this variable is used to specify the ARN of an IAM role that allows ssh-grunt to retrieve IAM group and public SSH key info from that account.
""
high_cpu_utilization_evaluation_periods
number(optional)The number of periods over which data is compared to the specified threshold
2
high_cpu_utilization_period
number(optional)The period, in seconds, over which to measure the CPU utilization percentage. Only used if enable_ecs_cloudwatch_alarms is set to true.
300
high_cpu_utilization_statistic
string(optional)The statistic to apply to the alarm's high CPU metric. One of the following is supported: SampleCount, Average, Sum, Minimum, Maximum
Average
high_cpu_utilization_threshold
number(optional)Trigger an alarm if the ECS Cluster has a CPU utilization percentage above this threshold. Only used if enable_ecs_cloudwatch_alarms is set to true.
90
high_disk_utilization_period
number(optional)The period, in seconds, over which to measure the disk utilization percentage. Only used if enable_ecs_cloudwatch_alarms is set to true.
300
high_disk_utilization_threshold
number(optional)Trigger an alarm if the EC2 instances in the ECS Cluster have a disk utilization percentage above this threshold. Only used if enable_ecs_cloudwatch_alarms is set to true.
90
high_memory_utilization_evaluation_periods
number(optional)The number of periods over which data is compared to the specified threshold
2
high_memory_utilization_period
number(optional)The period, in seconds, over which to measure the memory utilization percentage. Only used if enable_ecs_cloudwatch_alarms is set to true.
300
high_memory_utilization_statistic
string(optional)The statistic to apply to the alarm's high memory metric. One of the following is supported: SampleCount, Average, Sum, Minimum, Maximum
Average
high_memory_utilization_threshold
number(optional)Trigger an alarm if the ECS Cluster has a memory utilization percentage above this threshold. Only used if enable_ecs_cloudwatch_alarms is set to true.
90
internal_alb_sg_ids
list(optional)The Security Group IDs for the internal ALB.
list(string)
[]
multi_az_capacity_provider
bool(optional)Enable a multi-AZ capacity provider to autoscale the EC2 ASGs created for this ECS cluster. Only used if capacity_provider_enabled = true.
false
public_alb_sg_ids
list(optional)The Security Group IDs for the public ALB.
list(string)
[]
should_create_cloudwatch_log_group
bool(optional)When true, precreate the CloudWatch Log Group to use for log aggregation from the EC2 instances. This is useful if you wish to customize the CloudWatch Log Group with various settings such as retention periods and KMS encryption. When false, the CloudWatch agent will automatically create a basic log group to use.
true
ssh_grunt_iam_group
string(optional)If you are using ssh-grunt, this is the name of the IAM group from which users will be allowed to SSH to the nodes in this ECS cluster. This value is only used if enable_ssh_grunt=true.
ssh-grunt-users
ssh_grunt_iam_group_sudo
string(optional)If you are using ssh-grunt, this is the name of the IAM group from which users will be allowed to SSH to the nodes in this ECS cluster with sudo permissions. This value is only used if enable_ssh_grunt=true.
ssh-grunt-sudo-users
tenancy
string(optional)The tenancy of this server. Must be one of: default, dedicated, or host.
default
use_managed_iam_policies
bool(optional)When true, all IAM policies will be managed as dedicated policies rather than inline policies attached to the IAM roles. Dedicated managed policies are friendlier to automated policy checkers, which may scan a single resource for findings. As such, it is important to avoid inline policies when targeting compliance with various security standards.
true
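To show how several of the optional inputs above combine, here is a hedged sketch that enables the capacity provider, sets a log retention period, wires the CloudWatch alarms to an SNS topic, and injects a small cloud-init script. These arguments would be set inside the module block shown in the Core concepts example; the SNS topic ARN, utilization target, retention period, and script contents are assumptions for illustration, not recommended values.

  # Capacity-provider-driven autoscaling (see the capacity_provider_* inputs above)
  capacity_provider_enabled        = true
  capacity_provider_target         = 90
  capacity_provider_max_scale_step = 2
  capacity_provider_min_scale_step = 1

  # Retention for the pre-created log group (should_create_cloudwatch_log_group defaults to true)
  cloudwatch_log_group_retention_in_days = 30

  # Send the CPU, memory, and disk alarms to a pre-existing (hypothetical) SNS topic
  enable_ecs_cloudwatch_alarms = true
  alarms_sns_topic_arn         = ["arn:aws:sns:us-east-1:111122223333:ecs-cluster-alarms"]

  # Extra boot-time configuration via cloud-init
  cloud_init_parts = {
    "install-tools" = {
      filename     = "install-tools.sh"
      content_type = "text/x-shellscript"
      content      = <<-EOF
        #!/usr/bin/env bash
        set -e
        yum install -y jq
      EOF
    }
  }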
Outputs
A list of all the CloudWatch Dashboard metric widgets available in this module.
The ID of the ECS cluster
The name of the ECS cluster's autoscaling group (ASG)
For configurations with multiple ASGs, this contains a list of ASG names.
For configurations with multiple capacity providers, this contains a list of all capacity provider names.
The ID of the launch configuration used by the ECS cluster's auto scaling group (ASG)
The name of the ECS cluster
The ID of the VPC into which the ECS cluster is launched
The VPC subnet IDs into which the ECS cluster can launch resources.
The ARN of the IAM role applied to ECS instances
The ID of the IAM role applied to ECS instances
The name of the IAM role applied to ECS instances
The ID of the security group applied to ECS instances
The CloudWatch Dashboard metric widget for the ECS cluster workers' CPU utilization metric.
The CloudWatch Dashboard metric widget for the ECS cluster workers' Memory utilization metric.