Testing the Resilience of your Kubernetes systems using Chaos Mesh

August 7, 2023 Operator Officer

Heard of Chaos Engineering?

According to Wikipedia, chaos engineering is the discipline of experimenting on a software system in production in order to build confidence in the system’s capability to withstand turbulent and unexpected conditions.

In layman’s terms, it means deliberately injecting faults and failures into your system to see whether it copes and performs as expected in those failure scenarios.

With more and more organisations globally adopting Kubernetes as their de facto container orchestrator, it has become essential to test how resilient our Kubernetes systems are and how they respond during failures and outages.

This is where `Chaos Mesh`, an open-source tool, is perfectly placed. It helps you inject failures into your Kubernetes setup so that you can look out for, and fix, any abnormal or unexpected behaviour. The idea is to prepare our systems and make them resilient to common failure and outage scenarios before those scenarios occur in production.

Chaos Mesh is an open-source, cloud-native Chaos Engineering platform built using Kubernetes custom resource definitions (CRDs). It can simulate different types of faults and also has the capability to orchestrate fault scenarios. In fact, we can consider Chaos Mesh as the Swiss army knife for implementing Chaos Engineering on Kubernetes systems.
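
Because experiments are expressed as Kubernetes custom resources, a fault is declared much like any other Kubernetes object. The sketch below is a minimal illustration, not taken from the original post: it builds a `PodChaos` manifest as a Python dictionary and prints it as JSON. The helper name `pod_kill_experiment`, the experiment name `demo-pod-kill`, and the `app: web` label are hypothetical; the field names follow my reading of the Chaos Mesh `v1alpha1` API and should be checked against the version you install.

```python
import json


def pod_kill_experiment(name, namespace, target_labels):
    """Build a PodChaos manifest that kills one randomly chosen matching pod.

    Hypothetical helper for illustration; the field names assume the
    chaos-mesh.org/v1alpha1 API.
    """
    return {
        "apiVersion": "chaos-mesh.org/v1alpha1",
        "kind": "PodChaos",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # "pod-kill" deletes the selected pod; its controller should recreate it.
            "action": "pod-kill",
            # "one" picks a single pod at random from those matching the selector.
            "mode": "one",
            "selector": {
                "namespaces": [namespace],
                "labelSelectors": dict(target_labels),
            },
        },
    }


if __name__ == "__main__":
    # Assumed target: pods labelled app=web in the default namespace.
    manifest = pod_kill_experiment("demo-pod-kill", "default", {"app": "web"})
    print(json.dumps(manifest, indent=2))
```

Since `kubectl apply -f` accepts JSON as well as YAML, the printed output could be saved to a file and applied to a cluster where Chaos Mesh is installed.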

Chaos Mesh is built exclusively for Kubernetes and offers comprehensive fault injection methods for complex systems running on it, covering faults in Pods, the network, file systems, and even the kernel.

Currently, more than 50 organizations are using Chaos Mesh to test and improve the resiliency of their systems. These adopters include names like ByteDance, DataStax, Percona, Prudential, NetEase Fuxi, RabbitMQ, SHAREit, and XPeng Motors.

Chaos Mesh was accepted as a CNCF Sandbox project in 2020, has since moved to the Incubating level, and has gained huge popularity in recent times.
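
To make the network category concrete, here is a hedged sketch of a `NetworkChaos` experiment that adds latency to the traffic of selected pods. As before, this is an illustration rather than an example from the original post: the helper name `network_delay_experiment`, the experiment name, the labels, and the default latency and duration are assumptions, and the field names reflect my understanding of the `v1alpha1` API.

```python
import json


def network_delay_experiment(name, namespace, target_labels,
                             latency="100ms", duration="60s"):
    """Build a NetworkChaos manifest that delays traffic for all matching pods.

    Hypothetical helper for illustration; the field names assume the
    chaos-mesh.org/v1alpha1 API.
    """
    return {
        "apiVersion": "chaos-mesh.org/v1alpha1",
        "kind": "NetworkChaos",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "action": "delay",
            # "all" applies the fault to every pod matching the selector.
            "mode": "all",
            "selector": {
                "namespaces": [namespace],
                "labelSelectors": dict(target_labels),
            },
            # Added latency per packet; jitter of 0ms keeps the delay constant.
            "delay": {"latency": latency, "jitter": "0ms"},
            # The fault is lifted automatically after this period.
            "duration": duration,
        },
    }


if __name__ == "__main__":
    # Assumed target: pods labelled app=web in the default namespace.
    manifest = network_delay_experiment("demo-net-delay", "default", {"app": "web"})
    print(json.dumps(manifest, indent=2))
```

Other fault categories follow the same pattern: a different `kind`, with an action and parameters specific to that kind.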

Main components of Chaos Mesh:
* Chaos Dashboard: The visualization component of Chaos Mesh. Chaos Dashboard offers a set of user-friendly web interfaces through which users can manipulate and observe Chaos experiments.
* Chaos Controller Manager: The core logical component of Chaos Mesh. Chaos Controller Manager is primarily responsible for scheduling and managing Chaos experiments.
* Chaos Daemon: The main executive component. Chaos Daemon runs in the DaemonSet mode and has Privileged permission by default (which can be disabled).
* Chaosd: A toolkit to inject failures into non-Kubernetes nodes.

Some of the features that Chaos Mesh supports:
* Chaos injection
* Pod crash
* Network failure
* Load test
* I/O failure
* Event tracking
* Associated alarm
* Timing telemetry
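
The fault-injection features in this list correspond to custom resource kinds in Chaos Mesh. The lookup below is a small sketch of that correspondence. The kind names are real Chaos Mesh CRDs, but pairing "Load test" with `StressChaos` is my interpretation of the list, and the helper `kind_for` is hypothetical.

```python
# Assumed mapping from the feature names above to Chaos Mesh CRD kinds.
FAULT_KINDS = {
    "pod crash": "PodChaos",
    "network failure": "NetworkChaos",
    "load test": "StressChaos",  # interpretation: CPU/memory stress
    "i/o failure": "IOChaos",
}


def kind_for(feature):
    """Return the CRD kind for a feature name, ignoring case and outer spaces."""
    key = feature.strip().lower()
    if key not in FAULT_KINDS:
        raise KeyError(f"no fault kind known for feature: {feature!r}")
    return FAULT_KINDS[key]


if __name__ == "__main__":
    for feature in ("Pod crash", "Network failure", "Load test", "I/O failure"):
        print(f"{feature}: {kind_for(feature)}")
```

The remaining entries in the list (event tracking, associated alarm, timing telemetry) describe observability around experiments rather than fault types, so they are left out of the mapping.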

Alternatives to Chaos Mesh
Chaos Mesh is not the only chaos engineering platform for Kubernetes. Another CNCF-hosted competitor is LitmusChaos. Much like Chaos Mesh, LitmusChaos is an open-source, cloud-native project that is built for Kubernetes and uses CRDs for chaos management. Other options include Chaos Monkey (the original chaos engineering tool), Gremlin (which offers chaos engineering as a service), Chaos Toolkit, and KubeInvaders.