EKS w/ ML Capacity Block Reservation (CBR)

This pattern demonstrates how to use ML capacity block reservations (CBR) with Amazon EKS. The solution consists primarily of the following components:

  1. The node group, either EKS managed or self-managed, that will use the CBR should only be given subnets in the availability zone where the CBR has been allocated. For example, if the CBR is allocated to us-west-2b, the node group should only be given subnet IDs that reside in us-west-2b. If subnets in other AZs are provided, it's possible to encounter an error such as InvalidParameterException: The following supplied instance types do not exist .... This error is not guaranteed to appear and may seem random, because the underlying autoscaling group(s) provision nodes into different AZs at random. It only occurs when the underlying autoscaling group tries to provision instances into an AZ where capacity is not allocated and there is insufficient on-demand capacity for the desired instance type.
  2. The launch template should specify the instance_market_options and capacity_reservation_specification arguments. This is how the node group uses the CBR: these arguments tell the autoscaling group to launch instances using the provided capacity reservation.
  3. In the case of EKS managed node group(s), the capacity_type should be set to "CAPACITY_BLOCK".
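
The subnet restriction in item 1 can be expressed without hard-coding a list index. The following is a minimal sketch, not part of the pattern: the `capacity_reservation_az` variable and the two locals are hypothetical names, and it assumes `module.vpc` is the terraform-aws-modules/vpc module (which exposes an `azs` output) configured with exactly one private subnet per AZ, listed in the same order as the AZs.

```hcl
# Hypothetical input: the AZ where YOUR capacity block was allocated
variable "capacity_reservation_az" {
  description = "Availability zone of the ML capacity block reservation, e.g. us-west-2b"
  type        = string
}

locals {
  # Assumption: module.vpc.azs and module.vpc.private_subnets have the same
  # length and order, so they can be zipped into an AZ name => subnet ID map
  private_subnet_by_az = zipmap(module.vpc.azs, module.vpc.private_subnets)

  # Single-element list containing only the subnet in the reservation's AZ
  cbr_subnet_ids = [local.private_subnet_by_az[var.capacity_reservation_az]]
}
```

With this in place, a node group could set `subnet_ids = local.cbr_subnet_ids` so that it is only ever offered the subnet in the reservation's AZ. If the VPC has more private subnets than AZs, `zipmap` will fail, and a different lookup would be needed.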

Code

################################################################################
# Required Input
################################################################################

# See https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/capacity-blocks-using.html
# on how to obtain a ML capacity block reservation. Once acquired, you can provide
# the reservation ID through this input to deploy the pattern
variable "capacity_reservation_id" {
  description = "The ID of the ML capacity block reservation for the node group"
  type        = string
}

################################################################################
# Cluster
################################################################################

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.26"

  cluster_name    = local.name
  cluster_version = "1.31"

  # Give the Terraform identity admin access to the cluster
  # which will allow it to deploy resources into the cluster
  enable_cluster_creator_admin_permissions = true
  cluster_endpoint_public_access           = true

  cluster_addons = {
    coredns                = {}
    eks-pod-identity-agent = {}
    kube-proxy             = {}
    vpc-cni = {
      most_recent = true
    }
  }

  # Add security group rules on the node group security group to
  # allow EFA traffic
  enable_efa_support = true

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  eks_managed_node_groups = {
    cbr = {
      # The EKS AL2023 NVIDIA AMI provides all of the necessary components
      # for accelerated workloads w/ EFA
      ami_type       = "AL2023_x86_64_NVIDIA"
      instance_types = ["p5e.48xlarge"]

      # Mount instance store volumes in RAID-0 for kubelet and containerd
      # https://github.com/awslabs/amazon-eks-ami/blob/master/doc/USER_GUIDE.md#raid-0-for-kubelet-and-containerd-raid0
      cloudinit_pre_nodeadm = [
        {
          content_type = "application/node.eks.aws"
          content      = <<-EOT
            ---
            apiVersion: node.eks.aws/v1alpha1
            kind: NodeConfig
            spec:
              instance:
                localStorage:
                  strategy: RAID0
          EOT
        }
      ]

      min_size     = 2
      max_size     = 2
      desired_size = 2

      # This will:
      # 1. Create a placement group to place the instances close to one another
      # 2. Ignore subnets that reside in AZs that do not support the instance type
      # 3. Expose all of the available EFA interfaces on the launch template
      enable_efa_support = true

      labels = {
        "vpc.amazonaws.com/efa.present" = "true"
        "nvidia.com/gpu.present"        = "true"
      }

      taints = {
        # Ensure only GPU workloads are scheduled on this node group
        gpu = {
          key    = "nvidia.com/gpu"
          value  = "true"
          effect = "NO_SCHEDULE"
        }
      }

      # First subnet is in the "${local.region}a" availability zone
      # where the capacity reservation is created
      # TODO - Update the subnet to match the availability zone of YOUR capacity reservation
      subnet_ids = [element(module.vpc.private_subnets, 0)]

      # ML capacity block reservation
      capacity_type = "CAPACITY_BLOCK"
      instance_market_options = {
        market_type = "capacity-block"
      }
      capacity_reservation_specification = {
        capacity_reservation_target = {
          capacity_reservation_id = var.capacity_reservation_id
        }
      }
    }
    # This node group is for core addons such as CoreDNS
    default = {
      instance_types = ["m5.large"]

      min_size     = 2
      max_size     = 2
      desired_size = 2
    }
  }

  # Self-managed node group equivalent for ML capacity block reservation.
  # This is not required for ML CBR support with EKS managed node groups;
  # it is included only to show usage with both node group types. Users
  # should select the one that works for their use case.
  self_managed_node_groups = {
    cbr2 = {
      # The EKS AL2023 NVIDIA AMI provides all of the necessary components
      # for accelerated workloads w/ EFA
      ami_type      = "AL2023_x86_64_NVIDIA"
      instance_type = "p5e.48xlarge"

      # Mount instance store volumes in RAID-0 for kubelet and containerd
      # https://github.com/awslabs/amazon-eks-ami/blob/master/doc/USER_GUIDE.md#raid-0-for-kubelet-and-containerd-raid0
      cloudinit_pre_nodeadm = [
        {
          content_type = "application/node.eks.aws"
          content      = <<-EOT
            ---
            apiVersion: node.eks.aws/v1alpha1
            kind: NodeConfig
            spec:
              instance:
                localStorage:
                  strategy: RAID0
              kubelet:
                flags:
                  - --node-labels=vpc.amazonaws.com/efa.present=true,nvidia.com/gpu.present=true
                  - --register-with-taints=nvidia.com/gpu=true:NoSchedule
          EOT
        }
      ]

      min_size     = 2
      max_size     = 2
      desired_size = 2

      # This will:
      # 1. Create a placement group to place the instances close to one another
      # 2. Ignore subnets that reside in AZs that do not support the instance type
      # 3. Expose all of the available EFA interfaces on the launch template
      enable_efa_support = true

      # First subnet is in the "${local.region}a" availability zone
      # where the capacity reservation is created
      # TODO - Update the subnet to match the availability zone of YOUR capacity reservation
      subnet_ids = [element(module.vpc.private_subnets, 0)]

      # ML capacity block reservation
      instance_market_options = {
        market_type = "capacity-block"
      }
      capacity_reservation_specification = {
        capacity_reservation_target = {
          capacity_reservation_id = var.capacity_reservation_id
        }
      }
    }
  }

  tags = local.tags
}
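
For readers who want to see what the three components look like without the module, the following is a rough sketch using plain AWS provider resources. It is a standalone illustration under stated assumptions, not what the module generates verbatim: `cluster_name`, `node_role_arn`, and `cbr_subnet_id` are hypothetical inputs, `var.capacity_reservation_id` is the variable declared in the pattern above, and EFA, placement group, labels, and taints are omitted for brevity.

```hcl
# Hypothetical inputs; these names are not part of the pattern
variable "cluster_name" {
  type = string
}

variable "node_role_arn" {
  type = string
}

variable "cbr_subnet_id" {
  description = "Subnet in the same AZ as the capacity block reservation"
  type        = string
}

# Component 2: the launch template targets the capacity block reservation
resource "aws_launch_template" "cbr" {
  name_prefix = "cbr-"

  instance_market_options {
    market_type = "capacity-block"
  }

  capacity_reservation_specification {
    capacity_reservation_target {
      capacity_reservation_id = var.capacity_reservation_id
    }
  }
}

resource "aws_eks_node_group" "cbr" {
  cluster_name    = var.cluster_name
  node_group_name = "cbr"
  node_role_arn   = var.node_role_arn

  # Component 1: only the subnet in the reservation's AZ
  subnet_ids = [var.cbr_subnet_id]

  # Component 3: managed node groups must declare the capacity type
  capacity_type  = "CAPACITY_BLOCK"
  ami_type       = "AL2023_x86_64_NVIDIA"
  instance_types = ["p5e.48xlarge"]

  launch_template {
    id      = aws_launch_template.cbr.id
    version = aws_launch_template.cbr.latest_version
  }

  scaling_config {
    min_size     = 2
    max_size     = 2
    desired_size = 2
  }
}
```

The instance type is set on the node group rather than in the launch template, mirroring how the pattern passes `instance_types` to the managed node group.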

Deploy

See here for the prerequisites and steps to deploy this pattern.

Destroy

terraform destroy -target="module.eks_blueprints_addons" -auto-approve
terraform destroy -target="module.eks" -auto-approve
terraform destroy -auto-approve

See here for more details on cleaning up the resources created.