Amazon EKS Best Practices: Optimizing Your Kubernetes Clusters

Amazon Elastic Kubernetes Service (EKS) is a managed Kubernetes service that simplifies container orchestration at scale. This guide covers best practices for EKS clusters across cluster design, security, resource management, monitoring, cost optimization, and operations.

%%{init: {'theme': 'base', 'themeVariables': { 'background': 'transparent', 'primaryColor': '#FF9900', 'primaryTextColor': '#232F3E', 'lineColor': '#147EB4' }}}%%
flowchart TB
    subgraph Network["VPC & Networking"]
        direction TB
        subgraph AZ1["Availability Zone 1"]
            direction TB
            Subnet1["fa:fa-network-wired Public Subnet"]
            Subnet2["fa:fa-network-wired Private Subnet"]
            MNG1["fa:fa-server Managed Node Group"]
        end
        subgraph AZ2["Availability Zone 2"]
            direction TB
            Subnet3["fa:fa-network-wired Public Subnet"]
            Subnet4["fa:fa-network-wired Private Subnet"]
            MNG2["fa:fa-server Managed Node Group"]
        end
    end
    subgraph Control["Control Plane & Security"]
        direction LR
        EKS["fa:fa-cogs EKS Control Plane"]
        OIDC["fa:fa-key OIDC Provider"]
        IAM["fa:fa-shield-alt IAM Roles"]
        SG["fa:fa-lock Security Groups"]
    end
    subgraph Workloads["Workload Management"]
        direction TB
        subgraph Resources["Resource Management"]
            direction LR
            HPA["fa:fa-balance-scale HPA"]
            VPA["fa:fa-arrows-alt VPA"]
            CA["fa:fa-expand Cluster Autoscaler"]
        end
        subgraph Storage["Storage Solutions"]
            direction LR
            EBS["fa:fa-hdd EBS CSI"]
            EFS["fa:fa-folder EFS CSI"]
            S3["fa:fa-database S3"]
        end
    end
    subgraph Observability["Monitoring & Logging"]
        direction LR
        CW["fa:fa-chart-line CloudWatch"]
        CT["fa:fa-history CloudTrail"]
        Prometheus["fa:fa-chart-bar Prometheus"]
        Fluentbit["fa:fa-stream Fluent Bit"]
    end
    EKS --> MNG1
    EKS --> MNG2
    EKS --> OIDC
    OIDC --> IAM
    MNG1 --> Resources
    MNG2 --> Resources
    Resources --> Storage
    EKS --> Observability
    %% Styling
    classDef networkNode fill:#FF9900,stroke:#FF9900,color:#232F3E,stroke-width:2px
    classDef controlNode fill:#232F3E,stroke:#232F3E,color:#FFFFFF,stroke-width:2px
    classDef workloadNode fill:#147EB4,stroke:#147EB4,color:#FFFFFF,stroke-width:2px
    classDef observeNode fill:#147EB4,stroke:#147EB4,color:#FFFFFF,stroke-width:2px
    classDef groupStyle fill:transparent,stroke:#147EB4,stroke-width:2px
    class Subnet1,Subnet2,Subnet3,Subnet4,MNG1,MNG2 networkNode
    class EKS,OIDC,IAM,SG controlNode
    class HPA,VPA,CA,EBS,EFS,S3 workloadNode
    class CW,CT,Prometheus,Fluentbit observeNode
    class Network,Control,Workloads,Resources,Storage,Observability groupStyle

Cluster Design Best Practices

1. Node Group Configuration

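# Managed node group declared through the AWS Controllers for Kubernetes (ACK)
# EKS controller, which is assumed to be installed in the cluster applying this.
apiVersion: eks.services.k8s.aws/v1alpha1
kind: Nodegroup
metadata:
  name: production-nodes
spec:
  name: production-nodes
  clusterName: production-cluster
  nodeRole: arn:aws:iam::111122223333:role/eks-node-role
  # Use subnets in different Availability Zones for multi-AZ placement
  subnets:
    - subnet-0123456789abcdef0
    - subnet-0123456789abcdef1
  instanceTypes:
    - m5.large
    - m5a.large
  scalingConfig:
    minSize: 2
    maxSize: 10
    desiredSize: 3
  labels:
    role: application
    environment: production
  # Pods need a matching toleration to be scheduled onto these nodes
  taints:
    - key: dedicated
      value: production
      effect: NO_SCHEDULE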

2. High Availability Setup

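apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-app
spec:
  replicas: 3
  # apps/v1 requires a selector that matches the pod template labels
  selector:
    matchLabels:
      app: sample-app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: sample-app
    spec:
      # Spread replicas evenly across Availability Zones
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: sample-app
      # Prefer placing replicas on different nodes
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - sample-app
                topologyKey: kubernetes.io/hostname
      containers:
        - name: sample-app
          image: <your-image>  # placeholder: replace with your application image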

Security Best Practices

1. IAM Roles for Service Accounts (IRSA)

apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-service-account
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/app-role
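
The role annotation only takes effect for workloads that run under this service account. The following is a minimal sketch of such a workload; the pod name and image are hypothetical placeholders, and only the `serviceAccountName` value comes from the manifest above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: irsa-demo  # hypothetical name, for illustration only
spec:
  # Must match the annotated ServiceAccount in the same namespace
  serviceAccountName: app-service-account
  containers:
    - name: app
      image: <your-image>  # placeholder: replace with your application image
```

This assumes the cluster's OIDC provider is associated with IAM and that the trust policy of `app-role` allows this service account to assume the role.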

2. Network Policies

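apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-traffic
spec:
  podSelector:
    matchLabels:
      app: secure-app
  policyTypes:
    - Ingress
    - Egress
  ingress:
    # The two entries under "from" are ORed: traffic is allowed from any pod in
    # a namespace labeled environment=production, or from role=frontend pods
    # in this policy's own namespace
    - from:
        - namespaceSelector:
            matchLabels:
              environment: production
        - podSelector:
            matchLabels:
              role: frontend
      ports:
        - protocol: TCP
          port: 8080
  egress:
    # Only TCP 5432 to production namespaces is allowed; all other egress,
    # including DNS lookups, is denied unless another policy permits it
    - to:
        - namespaceSelector:
            matchLabels:
              environment: production
      ports:
        - protocol: TCP
          port: 5432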
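
Because the policy above restricts egress to TCP 5432, the selected pods can no longer resolve service names. A companion policy along the following lines is one way to restore DNS. It is a sketch that assumes CoreDNS runs in `kube-system` with the `k8s-app: kube-dns` label (the usual EKS setup); the policy name is hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress  # hypothetical name
spec:
  podSelector:
    matchLabels:
      app: secure-app
  policyTypes:
    - Egress
  egress:
    - to:
        # namespaceSelector and podSelector in one entry are ANDed
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns  # assumed CoreDNS label
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```

Network policies are additive, so this rule widens what `restrict-traffic` allows without replacing it.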

Resource Management

1. Resource Quotas

apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-resources
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    pods: "20"
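
Once a quota covers CPU and memory requests and limits, pods that omit those values are rejected in that namespace. A LimitRange can supply defaults so such pods are still admitted. The sketch below uses a hypothetical name and illustrative default values that should be tuned per workload.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-container-resources  # hypothetical name
spec:
  limits:
    - type: Container
      # Applied when a container sets no requests (illustrative values)
      defaultRequest:
        cpu: 100m
        memory: 128Mi
      # Applied when a container sets no limits (illustrative values)
      default:
        cpu: 500m
        memory: 512Mi
```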

2. Horizontal Pod Autoscaling

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sample-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80

Monitoring and Logging

1. CloudWatch Container Insights

apiVersion: v1
kind: ConfigMap
metadata:
  name: cwagent-config
  namespace: amazon-cloudwatch
data:
  cwagentconfig.json: |
    {
      "logs": {
        "metrics_collected": {
          "kubernetes": {
            "cluster_name": "production-cluster",
            "metrics_collection_interval": 60
          }
        },
        "force_flush_interval": 5
      }
    }

2. Prometheus and Grafana Setup

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: app-monitor
spec:
  selector:
    matchLabels:
      app: sample-app
  endpoints:
    - port: metrics
      interval: 15s
      path: /metrics
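
A ServiceMonitor selects Services rather than pods, and `port: metrics` refers to a named Service port. The following sketch shows a Service that the monitor above would match; the port number 8080 is an assumption about where the application exposes `/metrics`.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: sample-app
  labels:
    app: sample-app  # matched by the ServiceMonitor selector
spec:
  selector:
    app: sample-app
  ports:
    - name: metrics  # referenced by name in the ServiceMonitor endpoint
      port: 8080  # assumed port
      targetPort: 8080
```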

Cost Optimization

1. Spot Instances Configuration

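# Spot-backed managed node group, declared through the ACK EKS controller
# (assumed to be installed in the cluster applying this)
apiVersion: eks.services.k8s.aws/v1alpha1
kind: Nodegroup
metadata:
  name: spot-nodes
spec:
  name: spot-nodes
  clusterName: production-cluster
  nodeRole: arn:aws:iam::111122223333:role/eks-node-role
  # Subnets are required; these reuse the IDs from the node group example above
  subnets:
    - subnet-0123456789abcdef0
    - subnet-0123456789abcdef1
  # Several similar instance types widen the available Spot capacity pools
  instanceTypes:
    - m5.large
    - m5a.large
    - m5d.large
  capacityType: SPOT
  scalingConfig:
    minSize: 2
    maxSize: 10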

2. Cluster Autoscaler Settings

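# Cluster Autoscaler takes these settings as command-line flags, so they are
# set on the container in its Deployment (fragment shown)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  template:
    spec:
      containers:
        - name: cluster-autoscaler
          command:
            - ./cluster-autoscaler
            - --cloud-provider=aws
            - --scale-down-unneeded-time=5m
            - --scale-down-delay-after-add=5m
            - --scale-down-utilization-threshold=0.5
            - --skip-nodes-with-system-pods=true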

Operational Excellence

1. GitOps Implementation

apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: app-source
spec:
  interval: 1m
  url: https://github.com/org/app-config
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: app-config
spec:
  interval: 10m
  path: ./overlays/production
  # Delete cluster objects that are removed from the repository
  prune: true
  sourceRef:
    kind: GitRepository
    name: app-source

2. Backup and Disaster Recovery

apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-backup
spec:
  schedule: "0 0 * * *"
  template:
    includedNamespaces:
      - production
    storageLocation: default
    volumeSnapshotLocations:
      - default
    ttl: 720h
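
Backups only help if they can be restored. As a sketch of the recovery side, a Velero Restore can reference the schedule above and restore from its most recent successful backup. The resource name is hypothetical, and the `velero` namespace is an assumption about where Velero is installed.

```yaml
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: restore-production  # hypothetical name
  namespace: velero  # assumed Velero installation namespace
spec:
  # Restore from the latest successful backup created by this schedule
  scheduleName: daily-backup
  includedNamespaces:
    - production
```

Running a restore like this periodically against a non-production cluster is a practical way to verify that the backups are usable.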

Best Practices Checklist

Cluster Management
- Use managed node groups
- Deploy across multiple Availability Zones
- Keep the control plane up to date
- Use cluster autoscaling
Security
- Enable IRSA
- Implement network policies
- Use security groups
- Run regular security audits
Resource Management
- Set resource quotas
- Configure HPA/VPA
- Implement pod disruption budgets
- Use efficient pod scheduling
Monitoring
- Enable Container Insights
- Set up Prometheus/Grafana
- Configure alerts
- Implement logging
Cost Optimization
- Use Spot Instances
- Implement autoscaling
- Review costs regularly
- Clean up unused resources
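
The checklist item on pod disruption budgets can be sketched as follows for the `sample-app` Deployment used earlier. The budget name is hypothetical, and `minAvailable: 2` is an illustrative choice for a three-replica Deployment.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: sample-app-pdb  # hypothetical name
spec:
  # With 3 replicas, at most one pod can be evicted voluntarily at a time
  minAvailable: 2
  selector:
    matchLabels:
      app: sample-app
```

A budget like this limits voluntary disruptions such as node drains during upgrades or autoscaler scale-down. It does not protect against involuntary failures.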

Troubleshooting Guide

Common issues and what to check:

Node Group Issues
- Check IAM roles
- Verify security groups
- Review capacity issues
Networking Problems
- Validate CNI configuration
- Check network policies
- Review service mesh setup
Resource Constraints
- Monitor resource usage
- Review quota limits
- Check scaling policies
