The following is part of a series of posts called "Building a complete Kubernetes backed infrastructure".
This series of posts describes my approach to standing up a scalable infrastructure for hosting both internal and external software in a company setting.
This series focuses on AWS specifically as I have found EKS to be the most complicated Kubernetes provider to set up, but the principles and workflow should be easily applied to any other provider or on-prem situation.
If you’ve used Kubernetes for at least a little while, you’ll be aware of the issues that can be caused when using `PersistentVolume`s that only support `ReadWriteOnce` claims and not `ReadWriteMany`. With a `ReadWriteOnce` claim, only one pod can be assigned a given volume claim and have it mounted at any given time.
This might not sound like an issue if you’re only planning to run a single replica of a service and not scale horizontally, which is often the case for personal projects, but there are a number of less obvious issues even then.
One of the biggest problems with `ReadWriteOnce` volume claims comes when you are using a `Deployment` and you want to do rollouts. A rollout will typically create a new `ReplicaSet`, which will create the required `Pod`, check things have started correctly, then switch traffic over to the new `Pod` before scaling down the old `ReplicaSet`. The issue here is that the new `Pod` will not be able to start, as it will not be able to mount the `PersistentVolumeClaim`, which will still be bound to the existing `Pod`. Basically, `ReadWriteOnce` stops you from doing `Deployment` based rollouts and introduces the need for service downtime, as the existing `Pod` must be removed before the new `Pod` can start.
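To make this concrete, here is a minimal sketch of a `Deployment` that mounts a `ReadWriteOnce` claim (the names and image are placeholders, and the claim is assumed to be backed by something like gp2/EBS). With the default `RollingUpdate` strategy, the replacement `Pod` is created while the old one still holds the claim, so the rollout stalls:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app                 # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-app
  strategy:
    type: RollingUpdate             # the default: the new pod starts before the old one is removed
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: app
          image: nginx:1.21         # placeholder image
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: example-data # a ReadWriteOnce claim, e.g. backed by the default gp2 class
```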
To get around this, instead of using the default `gp2` storage class which is made available when creating the cluster, a new storage class can be created which is backed by AWS EFS. EFS is basically an NFS offering with a different name. Using EFS/NFS allows multiple pods to access a given `PersistentVolumeClaim` at the same time, avoiding the issue outlined previously.
To set this up, I used the kubernetes-sigs/aws-efs-csi-driver repo. I’ve found that there is a lot of conflicting information regarding getting EFS working in EKS, but this was the best source to follow. It is installed via the following:

```sh
kubectl apply -k "github.com/kubernetes-sigs/aws-efs-csi-driver/deploy/kubernetes/overlays/stable/?ref=release-1.0"
```
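If you want to confirm the driver is running before moving on, its pods are deployed into `kube-system` (the exact pod names come from the manifests in that repo, so treat them as an assumption):

```sh
kubectl get pods -n kube-system | grep efs-csi
```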
You’ll also need to create a new EFS volume. It is pretty easy to create one in the AWS console and you can just select the defaults. You also need to create a new security group to allow the EC2 instances in your Kubernetes cluster to access EFS. To do this, go to the EC2 dashboard in the AWS console and navigate to the Security Groups section. From here, create a new security group, add a name and description, and make sure you select the VPC of your cluster. Add an inbound rule with the type set to NFS; for the source, in my case I added `172.31.0.0/16`, allowing anything from the shared VPC (you can check this within your VPC, it is just what is listed as the IPv4 CIDR).
Attaching this security group to the EFS volume you just created is done via the Network tab in the EFS view: ‘Manage’ the mount targets which were automatically created and change the security group to the one you just created. Without this, the EFS mounts will fail with some very unhelpful error messages, so if things don’t work, this is the first thing to check.
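If you prefer the CLI to clicking through the console, the equivalent setup looks roughly like the following sketch. All of the ids and the CIDR are placeholders for your own VPC, subnets and security group, and NFS is just TCP port 2049:

```sh
# Create the EFS file system (defaults are fine)
aws efs create-file-system --tags Key=Name,Value=cluster-efs

# Security group allowing NFS (TCP 2049) from the VPC CIDR
aws ec2 create-security-group \
  --group-name efs-from-cluster \
  --description "Allow NFS from the cluster VPC" \
  --vpc-id vpc-xxxxxxxx
aws ec2 authorize-security-group-ingress \
  --group-id sg-xxxxxxxx \
  --protocol tcp --port 2049 \
  --cidr 172.31.0.0/16

# One mount target per subnet the worker nodes run in, using that security group
aws efs create-mount-target \
  --file-system-id fs-xxxxxxxx \
  --subnet-id subnet-xxxxxxxx \
  --security-groups sg-xxxxxxxx
```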
Once complete, take a note of the file system id for the new volume, which is in the form of `fs-XXXXXXX`. This is required in the Kubernetes configuration.
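If you didn’t note it down, the id can also be pulled out with the AWS CLI:

```sh
aws efs describe-file-systems --query 'FileSystems[*].[FileSystemId,Name]' --output table
```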
Next, it is a case of creating a new storage class for EFS, which can be done by applying a configuration such as the following.
```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: aws-efs
provisioner: efs.csi.aws.com
```
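Assuming the above is saved as `efs-storageclass.yaml` (the filename is just an example), it can be applied and checked like any other resource:

```sh
kubectl apply -f efs-storageclass.yaml
kubectl get storageclass aws-efs
```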
After the storage class is available, the next step is to create a `PersistentVolume` using the file system id of the EFS volume that was just created, like the below:
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: VOLUME_NAME
spec:
  capacity:
    storage: 1Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: aws-efs
  claimRef:
    name: VOLUME_PVC
    namespace: default   # the namespace the claim below will be created in
  csi:
    driver: efs.csi.aws.com
    volumeHandle: FILE_SYSTEM_ID
```
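To apply and check it (the filename is just an example):

```sh
kubectl apply -f efs-pv.yaml
kubectl get pv VOLUME_NAME
```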
After applying the above you should now have a provisioned `PersistentVolume`. To claim that volume, you need to create a `PersistentVolumeClaim` like the below.
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: VOLUME_PVC
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: aws-efs
  resources:
    requests:
      storage: 1Gi
```
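Apply it and check that it binds to the `PersistentVolume` created above (again, the filename is just an example):

```sh
kubectl apply -f efs-pvc.yaml
kubectl get pvc VOLUME_PVC   # STATUS should show Bound
```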
Any number of pods can now use this `PersistentVolumeClaim` and mount it as they want. Apart from the benefit explained above of being able to do `Deployment` rollouts, there is another great benefit: different services can share a common file system. This is obviously an anti-pattern if not done carefully, as services should be built around a well designed and defined API, but if done with limited scope and proper guardrails it can be beneficial.
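As a sketch of what consuming the claim looks like (the names and image are placeholders), the following `Deployment` runs two replicas that both mount the same `ReadWriteMany` claim, and rollouts no longer block on the volume:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: shared-files-app            # placeholder name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: shared-files-app
  template:
    metadata:
      labels:
        app: shared-files-app
    spec:
      containers:
        - name: app
          image: nginx:1.21         # placeholder image
          volumeMounts:
            - name: shared
              mountPath: /mnt/shared
      volumes:
        - name: shared
          persistentVolumeClaim:
            claimName: VOLUME_PVC   # the ReadWriteMany claim created above
```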
One such example is a Hadoop / MapReduce style pattern where different services take directories as input and write their output into other directories for other services to pick up. On a previous project this worked fantastically: we had a good number of integrations with external parties that would extract data and store it in a directory on EFS, a worker service would pick up these files and transform them into a number of outputs, and those in turn were processed by dedicated loading services. This workflow made things much more scalable than building a full ETL system for each integration.
The EFS CSI driver also has the ability to mount a specific directory on the EFS volume. This can be leveraged so that you only need one EFS instance and each service can have its own directory on it. This is done via the `volumeHandle` property and will look something like `fs-xxxxxxxx:/service-name`. But beware: if that directory doesn’t exist on the EFS volume when you try to mount it, the mount will fail. You will need to create these directories on the EFS volume manually prior to using them.
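In practice that just means changing the `volumeHandle` on the `PersistentVolume`, something like the following sketch, where `/service-name` is an example path that must already exist on the file system:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: service-name-data                     # placeholder name
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: aws-efs
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-xxxxxxxx:/service-name   # the directory must be created on EFS first
```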