Chaos engineering is a kind of contradiction: it works against the very system it is protecting in order to build an environment that is more resilient and more secure. How does it work? How is introducing errors useful, and how does it help to secure the digital environment? Understanding this discipline can lead to substantial improvements in how a system withstands failures and attacks.
What is it?
The concept of chaos engineering is based on four principles defined by Netflix. In order, they are: defining a “stable” state, forming a hypothesis that this state will continue, introducing variables that reflect real-world events, and trying to disprove the hypothesis.
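The four steps can be sketched in a few lines of Python. This is only a toy illustration, not Netflix's tooling: the `ToyCluster` class, its three replicas, and the `run_experiment` function are hypothetical names invented for the example, and the "fault" is simply a flag flipped in memory.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Hypothetical stand-in for a service running on several replicas."""
    replicas: dict = field(
        default_factory=lambda: {"a": True, "b": True, "c": True}
    )

    def availability(self) -> float:
        """Fraction of replicas that are currently healthy."""
        return sum(self.replicas.values()) / len(self.replicas)

    def serves_requests(self) -> bool:
        """The service answers as long as at least one replica is healthy."""
        return any(self.replicas.values())


def run_experiment(cluster: ToyCluster, rng: random.Random) -> dict:
    # 1. Define the "stable" state: the service answers requests.
    steady_before = cluster.serves_requests()

    # 2. Hypothesis: the service keeps answering after losing one replica.
    # 3. Introduce a realistic event: one replica, picked at random, fails.
    victim = rng.choice(sorted(cluster.replicas))
    cluster.replicas[victim] = False

    # 4. Try to disprove the hypothesis by checking the stable state again.
    return {
        "steady_before": steady_before,
        "victim": victim,
        "hypothesis_holds": cluster.serves_requests(),
        "availability": cluster.availability(),
    }


if __name__ == "__main__":
    print(run_experiment(ToyCluster(), random.Random()))
```

In a real experiment the stable state would be a measured business or system metric, and a disproved hypothesis is the useful outcome, because it points at a weakness to fix.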
Through a series of tests, characteristics of the infrastructure such as availability, security, and performance are assessed. The goal is to find and resolve problems in these distributed systems in order to bolster the recovery capabilities of the entire system. In short, the aim is infrastructure that withstands extreme conditions.
Resilience and “antifragility”
Chaos engineering can only be understood through the definition of “antifragility”, a term coined by Nassim Nicholas Taleb. This is the precursor concept of chaos engineering and, in turn, is based on resilience. Resilience is defined as the ability to absorb disturbances. These disturbances are caused by stressors, or stress factors, that trigger destabilization.
Resilience is a concept widely used in the study of living organisms (ecology, physiology, psychology, etc.) and refers to the ability to actively overcome problems and adapt to the situation. “Antifragility” goes beyond resilience, since it implies that the system evolves: it grows from the stress to which it has been subjected and adapts to new failures.
Panda Adaptive Defense is a tool that keeps a close eye on the principles of antifragility and adds resilience to the company, while increasing visibility into the state of the corporate network.
The Simian Army
Taking all this into account, large companies such as Netflix or Amazon see chaos engineering as a way to test their infrastructure and make their systems more mature, more robust, and more evolved. In short, more resilient. Since repeatedly analyzing and correcting problems at growing scale is a very difficult task, they use heuristic strategies that prioritize decisions aimed simply at resolving problems.
Netflix, for example, uses its own suite of applications called the Simian Army, which tests the stability of its network. The Simian Army has more than a dozen stressors that test the system in various ways. Security Monkey, for example, is just one “piece” of the Simian Army. It implements a security strategy based on chaos engineering in cloud-computing platforms.
How can chaos engineering help companies?
The first question is, why should a company consider using chaos engineering?
Implementing a strategy based on chaos engineering helps to strengthen the antifragility of a platform, and can also help it meet the control objectives and requirements of PCI-DSS in case of audits. Thus, any company could benefit greatly from implementing a tool such as Security Monkey in its security strategy.
This would require a “chaosification” of the platform in a controlled manner, which could consist of actions of the following type: disabling SG (Security Group) rules, modifying files at random, opening listening ports at random, injecting malicious traffic into the VPC (Virtual Private Cloud), randomly killing processes while they are running… and the list of havoc-wreaking could go on.
Thanks to this tool (or strategy), deeper visibility into the consequences of attacks can be achieved, with the intention of improving defenses. This, in the long run, is the basis of a more mature and reliable system, capable of recovering from attacks and reducing losses in the face of a serious security incident, something that should be mandatory for any high-availability service.
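As a rough sketch of what “controlled” could mean in practice, the Python below picks actions of this type at random but only against an explicit allow-list of targets, and only in dry-run mode. Everything here is an assumption made for illustration: the action names, the `plan_chaos` and `execute` functions, and the target name are hypothetical, and no real cloud API or Security Monkey interface is called.

```python
import random

# Hypothetical action names mirroring the kinds of disruption listed above.
ACTIONS = [
    "disable_security_group_rule",
    "modify_random_file",
    "open_random_port",
    "inject_malicious_traffic",
    "kill_random_process",
]


def plan_chaos(rng: random.Random, rounds: int, allowed_targets: list) -> list:
    """Pick (action, target) pairs at random, restricted to an allow-list."""
    if not allowed_targets:
        raise ValueError("refusing to plan chaos without an explicit allow-list")
    return [
        (rng.choice(ACTIONS), rng.choice(allowed_targets))
        for _ in range(rounds)
    ]


def execute(plan: list, dry_run: bool = True) -> list:
    """Log each planned action; real fault injection is intentionally omitted."""
    log = []
    for action, target in plan:
        if not dry_run:
            # A real implementation would call the platform's own tooling here.
            raise NotImplementedError("only dry-run mode exists in this sketch")
        log.append(f"DRY-RUN {action} on {target}")
    return log


if __name__ == "__main__":
    plan = plan_chaos(random.Random(), rounds=3, allowed_targets=["staging-web-1"])
    for line in execute(plan):
        print(line)
```

Keeping the target allow-list and the dry-run switch separate from the random selection is one way to limit the blast radius: the randomness decides what happens, never where it is permitted to happen.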