EKS Cluster with On-demand Capacity Reservations (ODCR)¶

This pattern demonstrates how to use on-demand capacity reservations (ODCRs) with Amazon EKS. The solution consists of three main components:

1. The node group that will use the ODCRs should only be given subnets in the availability zone where the ODCR capacity is allocated. For example, if the ODCRs are allocated in us-west-2b, the node group should only be given subnet IDs that reside in us-west-2b. If subnets that reside in other AZs are provided, it's possible to encounter an error such as InvalidParameterException: The following supplied instance types do not exist .... This error is not guaranteed to appear, and it may seem random because the underlying autoscaling group(s) provision nodes into different AZs at random. It only occurs when the underlying autoscaling group tries to provision instances into an AZ where capacity is not allocated and there is insufficient on-demand capacity for the desired instance type.
2. A custom launch template is required in order to specify the capacity_reservation_specification arguments. This is how the ODCRs are integrated into the node group: it tells the autoscaling group to use the provided capacity reservation(s).

   Info

   By default, the terraform-aws-eks module creates and uses a custom launch template with EKS managed node groups, which means users only need to supply the capacity_reservation_specification in their node group definition.
3. A resource group must be created for the capacity reservations. The resource group acts like a container, allowing ODCRs to be added or removed as needed to adjust the available capacity. Using the resource group allows this capacity to be adjusted without any modification or disruption to the existing node group or launch template. As soon as an ODCR has been associated with the resource group, the node group can scale up to start using that capacity.
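The subnet restriction described above can be sketched by looking up the private subnet by availability zone instead of by list position. This is only a sketch under stated assumptions: the variable `odcr_availability_zone` and the locals `private_subnet_by_az` and `odcr_subnet_ids` are hypothetical names that are not part of this pattern, and the lookup assumes the VPC module creates exactly one private subnet per AZ, in the same order as its `azs` output.

```hcl
# Hypothetical input: the AZ where the capacity reservation(s) were created
variable "odcr_availability_zone" {
  description = "Availability zone where the ODCR capacity is allocated (e.g. us-west-2b)"
  type        = string
}

locals {
  # Assumption: one private subnet per AZ, listed in the same order as module.vpc.azs
  private_subnet_by_az = zipmap(module.vpc.azs, module.vpc.private_subnets)

  # Only the subnet that resides in the ODCR's availability zone
  odcr_subnet_ids = [local.private_subnet_by_az[var.odcr_availability_zone]]
}
```

The node group could then set `subnet_ids = local.odcr_subnet_ids` so that instances are only launched in the AZ where the reserved capacity exists.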

Links:

Code¶

################################################################################
# Required Input
################################################################################

variable "capacity_reservation_arns" {
  description = "List of on-demand capacity reservation ARNs for the node group"
  type        = list(string)
}

################################################################################
# Cluster
################################################################################

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.34"

  cluster_name    = local.name
  cluster_version = "1.32"

  # Give the Terraform identity admin access to the cluster
  # which will allow it to deploy resources into the cluster
  enable_cluster_creator_admin_permissions = true
  cluster_endpoint_public_access           = true

  # These will become the default in the next major version of the module
  bootstrap_self_managed_addons   = false
  enable_irsa                     = false
  enable_security_groups_for_pods = false

  cluster_addons = {
    coredns                   = {}
    eks-node-monitoring-agent = {}
    eks-pod-identity-agent = {
      before_compute = true
    }
    kube-proxy = {}
    vpc-cni = {
      most_recent    = true
      before_compute = true
    }
  }

  # Add security group rules on the node group security group to
  # allow EFA traffic
  enable_efa_support = true

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  eks_managed_node_group_defaults = {
    node_repair_config = {
      enabled = true
    }
  }

  eks_managed_node_groups = {
    odcr = {
      # The EKS AL2023 NVIDIA AMI provides all of the necessary components
      # for accelerated workloads w/ EFA
      ami_type       = "AL2023_x86_64_NVIDIA"
      instance_types = ["p5.48xlarge"]

      # Mount instance store volumes in RAID-0 for kubelet and containerd
      # https://github.com/awslabs/amazon-eks-ami/blob/master/doc/USER_GUIDE.md#raid-0-for-kubelet-and-containerd-raid0
      cloudinit_pre_nodeadm = [
        {
          content_type = "application/node.eks.aws"
          content      = <<-EOT
            ---
            apiVersion: node.eks.aws/v1alpha1
            kind: NodeConfig
            spec:
              instance:
                localStorage:
                  strategy: RAID0
          EOT
        }
      ]

      min_size     = 2
      max_size     = 2
      desired_size = 2

      # This will:
      # 1. Create a placement group to place the instances close to one another
      # 2. Ignore subnets that reside in AZs that do not support the instance type
      # 3. Expose all of the available EFA interfaces on the launch template
      enable_efa_support = true

      labels = {
        "vpc.amazonaws.com/efa.present" = "true"
        "nvidia.com/gpu.present"        = "true"
      }

      taints = {
        # Ensure only GPU workloads are scheduled on this node group
        gpu = {
          key    = "nvidia.com/gpu"
          value  = "true"
          effect = "NO_SCHEDULE"
        }
      }

      # First subnet is in the "${local.region}a" availability zone
      # where the capacity reservation is created
      # TODO - Update the subnet to match the availability zone of *YOUR capacity reservation
      subnet_ids = [element(module.vpc.private_subnets, 0)]

      # Targeted on-demand capacity reservation
      capacity_reservation_specification = {
        capacity_reservation_target = {
          capacity_reservation_resource_group_arn = aws_resourcegroups_group.odcr.arn
        }
      }
    }

    # This node group is for core addons such as CoreDNS
    default = {
      instance_types = ["m5.large"]

      min_size     = 1
      max_size     = 2
      desired_size = 2
    }
  }

  tags = local.tags
}

################################################################################
# Resource Group
################################################################################

resource "aws_resourcegroups_group" "odcr" {
  name        = "${local.name}-p5-odcr"
  description = "P5 instance on-demand capacity reservations"

  configuration {
    type = "AWS::EC2::CapacityReservationPool"
  }

  configuration {
    type = "AWS::ResourceGroups::Generic"

    parameters {
      name   = "allowed-resource-types"
      values = ["AWS::EC2::CapacityReservation"]
    }
  }
}

resource "aws_resourcegroups_resource" "odcr" {
  count = length(var.capacity_reservation_arns)

  group_arn    = aws_resourcegroups_group.odcr.arn
  resource_arn = element(var.capacity_reservation_arns, count.index)
}
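The pattern above expects existing reservation ARNs as input. For readers who manage their reservations in Terraform as well, the following is a rough sketch of how one might be created and added to the resource group. The resource names `p5` are hypothetical, the instance count and availability zone are illustrative values, and creating the reservation only succeeds if AWS has the capacity available at that time.

```hcl
# Hypothetical: a targeted ODCR in the same AZ as the node group subnet
resource "aws_ec2_capacity_reservation" "p5" {
  instance_type     = "p5.48xlarge"
  instance_platform = "Linux/UNIX"
  availability_zone = "${local.region}a"
  instance_count    = 2

  # Targeted reservations are only used by instances that explicitly target them
  instance_match_criteria = "targeted"

  tags = local.tags
}

# Add the reservation to the resource group referenced by the node group
resource "aws_resourcegroups_resource" "p5" {
  group_arn    = aws_resourcegroups_group.odcr.arn
  resource_arn = aws_ec2_capacity_reservation.p5.arn
}
```

Because the node group targets the resource group rather than an individual reservation, adding or removing a reservation this way should not require changes to the node group or its launch template.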

Deploy¶

See here for the prerequisites and steps to deploy this pattern.
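Since `capacity_reservation_arns` is a required input, a value has to be supplied at deploy time. One way to do this is with a `terraform.tfvars` file, sketched below. The account ID and reservation ID are placeholders, not real values.

```hcl
# terraform.tfvars (placeholder values, replace with your own reservation ARNs)
capacity_reservation_arns = [
  "arn:aws:ec2:us-west-2:111122223333:capacity-reservation/cr-0123456789abcdef0",
]
```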

Validate¶

Navigate to the EC2 console. On the left-hand side, click Capacity Reservations under the Instances section. You should see the capacity reservation(s) that have been created, similar to the screenshot below. For this example, the Available capacity column is empty, which means that the capacity reservations have been fully utilized by the example (as expected).
Click on one of the capacity reservation IDs to view its details. You should see details similar to the screenshot below. For this example, Available capacity is 0 instances, which means that the capacity reservation has been fully utilized by the example (as expected).

Destroy¶

terraform destroy -target="module.eks_blueprints_addons" -auto-approve
terraform destroy -target="module.eks" -auto-approve
terraform destroy -auto-approve

See here for more details on cleaning up the resources created.