How to (and why) replace AWS CNI with Calico on an AWS EKS cluster

Every EKS cluster comes with the default AWS CNI plugin, which provides some nice features, such as giving each pod an address within the VPC subnet range, with the performance of an ENI. So why on earth would you want to use some other CNI?

Apart from some SNAT issues you may encounter while deploying your first clusters, there’s one BIG limitation of AWS CNI: it comes from the number of ENIs and IP addresses that you can assign to a single instance.

This table shows those limits. As you can see, for t3.large, for example, you can assign 3×12 = 36 IP addresses to a single EC2 instance. This seriously limits the number of pods that can be scheduled on a single node. It may or may not be a problem, but if you hit the wall with this limit, here’s the recipe for replacing AWS CNI with Calico.
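The pod limit that EKS derives from those numbers is slightly lower than the raw address count: the primary address of each ENI is not handed out to pods, while two host-network pods (aws-node and kube-proxy) don't need a VPC address at all. A minimal sketch of that arithmetic, where the function name max_pods is only illustrative:

```shell
# max_pods ENI_COUNT IPS_PER_ENI
# Prints the pod limit for an instance type under the AWS CNI:
#   ENIs * (IPs per ENI - 1) + 2
max_pods() {
  local enis=$1 ips_per_eni=$2
  echo $(( enis * (ips_per_eni - 1) + 2 ))
}

max_pods 3 12   # t3.large: prints 35
```

So a t3.large tops out at 35 pods, including the system pods that every node already runs.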

Remove existing AWS CNI components

First, we need to get rid of AWS CNI. But please, don’t just delete the daemonset like other tutorials suggest, as you’ll leave other parts of that component hanging around in your cluster.

To do it properly, just do this:

# Download the AWS CNI manifest, then delete every object it defines
curl -fsSL https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/release-1.5/config/v1.5/aws-k8s-cni.yaml -o aws-cni.yaml
kubectl delete -f aws-cni.yaml
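
Before moving on, it may be worth confirming that the AWS CNI daemonset is really gone. A minimal sketch, assuming the default manifest (which names the daemonset aws-node in the kube-system namespace); the helper name aws_cni_removed is made up for illustration:

```shell
# Succeeds only when the aws-node DaemonSet can no longer be found.
# Caveat: any kubectl failure (e.g. no cluster access) is also reported
# as "gone", so make sure kubectl can reach the cluster first.
aws_cni_removed() {
  if kubectl -n kube-system get daemonset aws-node >/dev/null 2>&1; then
    echo "aws-node is still present"
    return 1
  fi
  echo "aws-node is gone"
}
```

Call aws_cni_removed after the delete; a non-zero exit status means the daemonset is still there.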

Add Calico components

Now it’s time to add the Calico CNI components. For a typical deployment, the standard manifest is enough (for larger deployments, please read the docs and adjust the manifest accordingly):

curl https://docs.projectcalico.org/manifests/calico.yaml > calico.yaml
kubectl apply -f calico.yaml
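
It can take a moment before Calico is running on every node. A small sketch for waiting on it, assuming the standard manifest, which creates a daemonset named calico-node in kube-system; the helper name wait_for_calico and the default timeout are my own choices:

```shell
# wait_for_calico [TIMEOUT]
# Blocks until the calico-node DaemonSet has rolled out on all nodes,
# or fails once TIMEOUT (default: 120s) has passed.
wait_for_calico() {
  local timeout=${1:-120s}
  kubectl -n kube-system rollout status daemonset/calico-node --timeout="$timeout"
}
```

For example, wait_for_calico 5m gives a larger cluster a bit more time.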

Disable max pods limit

Unfortunately, replacing the CNI plugin is not enough. We also need to pass the following flag to the EKS bootstrap script:

--use-max-pods false

Depending on how the cluster is deployed, there may be different ways to accomplish that. I use the Terraform EKS module, and in that case it’s as simple as adding the flag to bootstrap_extra_args:

module "eks" {
  source           = "terraform-aws-modules/eks/aws"
  version          = "8.2.0"
  ...
  worker_groups_launch_template = [
    {
      ...
      bootstrap_extra_args    = "--use-max-pods false"
    }
  ]
}

Terraform will rotate the nodes for you. After that, you should be able to run (almost) as many pods as you wish :)
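
To check that the new nodes really picked up the change, you can look at the pod capacity each node reports. With the ENI-based limit disabled, the kubelet should fall back to its built-in default (110 pods per node, unless your configuration sets something else). A minimal sketch; the helper name node_pod_capacity and the column titles are arbitrary:

```shell
# Lists every node together with the number of pods it is willing to run.
node_pod_capacity() {
  kubectl get nodes -o custom-columns=NODE:.metadata.name,MAX_PODS:.status.allocatable.pods
}
```

If a node still shows the old ENI-based number, it was most likely not replaced yet.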

The last part is about fixing stuff that depends on the CNI plugin.

Fix kube2iam

If you use kube2iam (IRSA not elastic enough? :)), you need to change the interface to cali+. For example, for a Helm-based deployment you need to set:

host:
  iptables: true
  interface: cali+

Fix metrics-server

Another problem I discovered after deploying Calico is that the EKS-managed Kubernetes API server cannot reach the internal CIDR that Calico uses. The easiest acceptable fix is to enable host networking for the metrics-server pod (again, for Helm-based deployments):

hostNetwork:
  enabled: true

And that’s all – now you have a working EKS cluster without the pod number limit :)