At the beginning of 2020, we were all faced with an absolute truth: the system was not working the way we all expected. Shockingly, a new, highly transmissible, deadly virus meant that we had to take extreme measures, fight adversity and, sadly, endure millions of excess deaths. In the wake of this tragedy, we humans responded to the problem in various ways - all of which had one goal: to put ourselves in a position where the system could eventually run smoothly again while being resilient enough to handle issues as they occurred.
Biological systems are complex
How is it ever possible to know and predict everything? It’s not. What is possible is to build resilience. To build resilience, the first thing we need to do is know what is going on. How big is the problem? Where? Why? How? At first, we didn’t have enough data. We started testing people for the virus. We built a continuous stream of signals that gave us insights. Over time, we got more acquainted with the data; we got better at capturing it, interpreting it, sharing it, and spotting quickly and easily when an anomaly occurred. This took time. It did not happen overnight.
It also required a change in mindset: individuals had to be more aware of their health status and the potential collateral damage they could cause to people around them or even across the world. We all realised how interconnected we are. At the same time, this empowered us to be better at observing. Better at monitoring the signals. Better at reporting, acting on and reacting to the signals.
Systems generally work
That’s the world and humans. In my day-to-day job, I am a software engineer. It is obvious to me that the process we have all gone through in the last couple of years is akin to observability in software systems. It’s not that we don’t have a world and human bodies that generally work. We do. But what happens when things stop working? How do we go about it?
Software systems have a story too
Likewise, in software systems, we expect the systems to do more or less what their purpose demands. There is a story that they are part of and that simply stands proud. Most of the time. The question is what happens when things don’t work. Or simply when we want to gauge how the system is performing. How well and proudly the story is standing. The solution to this is to build observable systems. Software systems that allow us to ask questions and build familiarity with the system. The system should allow us to query its outputs and give us reliable enough insights. Those insights will never be the absolute truth. But they inform us and help us build resilience.
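To make "asking questions of a system" a little more concrete, here is a minimal Python sketch. Everything in it is hypothetical and made up for illustration: the EventLog class, its emit and query methods, the error_rate helper and the event fields (status, duration_ms) do not come from any real library. It assumes the system records structured events as it runs, so that we can query them afterwards.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class EventLog:
    """Hypothetical in-memory store of structured events (illustration only)."""

    events: list[dict[str, Any]] = field(default_factory=list)

    def emit(self, name: str, **fields: Any) -> None:
        # Record one event with a timestamp and arbitrary key/value context.
        self.events.append({"name": name, "ts": time.time(), **fields})

    def query(self, predicate: Callable[[dict[str, Any]], bool]) -> list[dict[str, Any]]:
        # Ask a question of the recorded events after the fact.
        return [e for e in self.events if predicate(e)]


def error_rate(log: EventLog, name: str) -> float:
    # Fraction of events with the given name whose status is "error".
    matching = log.query(lambda e: e["name"] == name)
    if not matching:
        return 0.0
    errors = [e for e in matching if e.get("status") == "error"]
    return len(errors) / len(matching)


# Assumed example traffic: three checkout attempts and one login.
log = EventLog()
log.emit("checkout", status="ok", duration_ms=120)
log.emit("checkout", status="error", duration_ms=950)
log.emit("checkout", status="ok", duration_ms=130)
log.emit("login", status="ok", duration_ms=40)

slow = log.query(lambda e: e["duration_ms"] > 100)
print(f"checkout error rate: {error_rate(log, 'checkout'):.2f}")
print(f"slow events: {len(slow)}")
```

The point of the sketch is not the storage, which a real system would replace with a proper telemetry pipeline. It is that each event carries enough context for us to ask questions we had not thought of when the code was written.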
It’s all about feedback
We can’t make radical or even ordinary decisions without feedback from our systems - whether they are human systems or software systems. So my advice would be: pay attention to your systems - whether it’s the individual components or the system as a whole. Get to know them. Start getting a feel for the granularity of data you need to understand and analyse. Learn how to choose where and how to get that data. Learn how to collect that data in the first place.
A better world
A better world and better software systems rely on us collectively having a better understanding of how they are performing when we really need it. Work towards that goal every day. Today’s imminent threat is the pandemic; tomorrow’s threat is climate change.