Problem#

  • EKS node group created but nodes are not joining the cluster
  • kubectl get nodes shows no nodes
  • EC2 instances are in Running state

Checklist#

1. VPC DNS Settings#

| Setting | Required Value | Check |
|---|---|---|
| enable_dns_hostnames | true | |
| enable_dns_support | true | |

Verification:

resource "aws_vpc" "main" {
  enable_dns_hostnames = true
  enable_dns_support   = true
}
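
You can also verify the DNS attributes of an existing VPC from the AWS CLI (substitute your own VPC ID):

# Both attributes should report "Value": true
aws ec2 describe-vpc-attribute --vpc-id <vpc-id> --attribute enableDnsSupport
aws ec2 describe-vpc-attribute --vpc-id <vpc-id> --attribute enableDnsHostnames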

2. Subnet Tags#

EKS and the AWS Load Balancer Controller require specific tags to recognize subnets.

| Subnet | Tag | Check |
|---|---|---|
| Public | kubernetes.io/role/elb = 1 | |
| Private | kubernetes.io/role/internal-elb = 1 | |
| Both | kubernetes.io/cluster/<cluster-name> = shared | |

Verification:

# modules/vpc/main.tf - Public Subnet
tags = {
  "kubernetes.io/role/elb"                        = "1"
  "kubernetes.io/cluster/${var.eks_cluster_name}" = "shared"
}

# modules/vpc/main.tf - Private Subnet
tags = {
  "kubernetes.io/role/internal-elb"               = "1"
  "kubernetes.io/cluster/${var.eks_cluster_name}" = "shared"
}
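
You can spot-check the tags on the cluster's subnets with the AWS CLI (substitute your cluster name):

# List subnets carrying the cluster tag, together with their tags
aws ec2 describe-subnets \
  --filters "Name=tag-key,Values=kubernetes.io/cluster/<cluster-name>" \
  --query "Subnets[].{Subnet:SubnetId,Tags:Tags}"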

3. Subnet Settings#

| Subnet | Setting | Required Value | Check |
|---|---|---|---|
| Public | map_public_ip_on_launch | true | |
| Private | map_public_ip_on_launch | false (default) | |
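
One way to confirm the setting on an existing subnet, assuming you know its ID:

# Should return true for public subnets, false for private ones
aws ec2 describe-subnets --subnet-ids <subnet-id> \
  --query "Subnets[].MapPublicIpOnLaunch"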

4. Security Group Communication#

Required ports between nodes and the control plane

| Direction | Port | Purpose | Check |
|---|---|---|---|
| Node → Control Plane | 443 | API Server | |
| Control Plane → Node | 10250 | Kubelet | |
| Node ↔ Node | All | Pod communication | |

When using terraform-aws-modules/eks:

# Required SG rules are created automatically
attach_cluster_primary_security_group = true
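
To inspect the rules that actually ended up on the node security group, assuming you have its ID:

# List all ingress and egress rules attached to the security group
aws ec2 describe-security-group-rules \
  --filters Name=group-id,Values=<node-sg-id>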

5. Cluster Endpoint Settings#

| Setting | Recommended | Description | Check |
|---|---|---|---|
| cluster_endpoint_private_access | true | Private nodes access the API via the internal network | |
| cluster_endpoint_public_access | true | External kubectl access | |

Note: If private_access = false, nodes must reach the API server via the NAT Gateway and the public internet, which complicates routing.
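
The live values can be read back from the cluster itself:

# Both fields should be true with the recommended settings
aws eks describe-cluster --name <cluster-name> \
  --query "cluster.resourcesVpcConfig.[endpointPrivateAccess,endpointPublicAccess]"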


6. VPC Endpoints (Required for AL2023)#

| Item | Description |
|---|---|
| Service name | com.amazonaws.<region>.eks-auth |
| Type | Interface (PrivateLink) |
| Required | Mandatory when using the AL2023 AMI |

Why:

  • AL2023 uses nodeadm (replaces legacy bootstrap.sh)
  • nodeadm uses EKS Pod Identity for authentication
  • eks-auth API is PrivateLink-only (not reachable via NAT Gateway)

Symptom:

  • Node bootstrap log shows nodeadm: done!
  • But node never registers with the cluster

Fix:

# config.yaml
vpc_endpoints:
  - eks-auth
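
After applying, you can confirm the endpoint exists and is available (substitute your region):

# Should return one endpoint entry in the "available" state
aws ec2 describe-vpc-endpoints \
  --filters "Name=service-name,Values=com.amazonaws.<region>.eks-auth" \
  --query "VpcEndpoints[].{Id:VpcEndpointId,State:State}"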

Other VPC Endpoints#

| Endpoint | Type | NAT Alternative | Cost |
|---|---|---|---|
| S3 | Gateway | Possible | Free |
| ECR (ecr.api, ecr.dkr) | Interface | Possible | Paid |
| STS | Interface | Possible | Paid |
| eks-auth | Interface | Not possible | Paid |

Conclusion: When using a NAT Gateway, only the eks-auth endpoint is mandatory; the rest are optional.


7. AMI Type and User Data#

AL2 vs AL2023 Differences#

| Item | AL2 | AL2023 |
|---|---|---|
| Bootstrap | bootstrap.sh | nodeadm |
| Authentication | IAM Role | EKS Pod Identity |
| eks-auth required | No | Yes (mandatory) |
| AMI type | AL2_ARM_64 | AL2023_ARM_64_STANDARD |

User Data Conflict Check#

  • Custom user_data can conflict with the AL2023 bootstrap process
  • When using terraform-aws-modules/eks, the user data is generated automatically, so it is best not to override it

Verification:

# modules/eks/main.tf - Node Group
infra = {
  ami_type = "AL2023_ARM_64_STANDARD"
  # no custom user_data or launch_template
}
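
The AMI type actually in use by the node group can be confirmed with:

# Should print AL2023_ARM_64_STANDARD
aws eks describe-nodegroup --cluster-name <cluster-name> \
  --nodegroup-name <nodegroup-name> --query "nodegroup.amiType"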

8. IAM Policies#

Required policies for the node IAM Role

| Policy | Purpose | Check |
|---|---|---|
| AmazonEKSWorkerNodePolicy | Basic node policy | (auto via module) |
| AmazonEKS_CNI_Policy | VPC CNI | (auto via module) |
| AmazonEC2ContainerRegistryReadOnly | ECR pull | (auto via module) |
| AmazonEKSWorkerNodeMinimalPolicy | AL2023 minimal permissions | |
| AmazonEBSCSIDriverPolicy | EBS CSI | |
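
To list what is actually attached to the node role, assuming you know the role name created by the module:

# Compare the output against the table above
aws iam list-attached-role-policies --role-name <node-role-name>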

9. EKS Access Entries#

Authentication method for EKS 1.30+

| Type | Purpose |
|---|---|
| EC2_LINUX | Node group (auto-created) |
| STANDARD | User/role access |

SSO access configuration:

# config.yaml
eks:
  access_entries:
    devops_sso:
      enabled: true
      role_arn: "arn:aws:iam::ACCOUNT_ID:role/aws-reserved/sso.amazonaws.com/REGION/AWSReservedSSO_..."
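
If you need to grant access manually rather than through config.yaml, the equivalent AWS CLI calls look roughly like this (the access policy chosen here is only an example):

# Create an access entry for the SSO role and grant cluster admin
aws eks create-access-entry --cluster-name <cluster-name> \
  --principal-arn <sso-role-arn> --type STANDARD
aws eks associate-access-policy --cluster-name <cluster-name> \
  --principal-arn <sso-role-arn> \
  --policy-arn arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy \
  --access-scope type=cluster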

Debugging Commands#

Check EC2 instance logs#

# Connect to node via SSM
aws ssm start-session --target <instance-id>

# Bootstrap logs
sudo journalctl -u nodeadm -f
sudo cat /var/log/cloud-init-output.log

# kubelet status
sudo systemctl status kubelet
sudo journalctl -u kubelet -f

Check cluster status#

# Node list
kubectl get nodes

# Cluster info
kubectl cluster-info

# Check Access Entries
aws eks list-access-entries --cluster-name <cluster-name>

Final Checklist#

  • VPC DNS settings (enable_dns_hostnames, enable_dns_support)
  • Subnet tags (kubernetes.io/role/elb, kubernetes.io/cluster/...)
  • Public subnet map_public_ip_on_launch = true
  • Security group allows ports 443 and 10250
  • cluster_endpoint_private_access = true
  • eks-auth VPC Endpoint created (required for AL2023)
  • AMI type confirmed (AL2023_ARM_64_STANDARD)
  • No custom User Data
  • IAM policies include AmazonEKSWorkerNodeMinimalPolicy

Notes#

  • A missing eks-auth VPC Endpoint is the primary cause for the EKS 1.35 + AL2023 combination
  • NAT Gateway alone cannot reach the eks-auth API (PrivateLink only)
  • terraform-aws-modules/eks v20.x handles most settings automatically