Digital transformation isn’t a new pursuit for businesses. Rather than a destination, businesses have started to see it as a continuum: the further along a company is, the better it can adapt and thrive in the face of change and uncertainty, such as the COVID-19 pandemic.
Introduction to IT Resilience
Until the pandemic struck, organizations stayed within their comfort zones, treating digital transformation projects as long-horizon initiatives that stretched into the future. The pandemic brought that horizon into view far sooner than expected, exposing the real state of each organization’s IT resilience.
IT resilience is commonly defined as a company’s ability to deal with technology disruption. Organizations with proficient digital capabilities had already future-proofed themselves with faster, cheaper and more agile ways of working remotely, which gave them an edge over competitors during the pandemic.
Moreover, the latest technology helped companies develop new ways of making themselves available to, and preferred by, their customers. Technological capabilities helped them keep tabs on the market’s pulse and adapt their offerings in tune with customer needs.
Technologically resilient firms continue to thrive in a period of adversity.
However, the pandemic was not the only time organizations have faced tech disruption. While global uncertainty peaked during the pandemic, according to the International Monetary Fund and Stanford University, it had been rising for over 30 years, especially in the last decade.
Severe Tech Outages and Their Impact
In September 2019, Slack’s stock dipped 14% after its quarterly earnings report revealed that the company had taken an $8.2 million revenue hit from credits issued to customers following service-level disruptions.
In October 2021, Facebook and its family of apps, including WhatsApp and Instagram, faced a six-hour-long outage, rendering its services inaccessible to billions of global users.
Technology resilience is a vital aspect of any organization today.
The Ponemon Institute estimated the average cost of an unplanned tech outage to be nearly $90,000 per minute. Businesses deal with hundreds of resilience incidents on an ongoing basis, and the associated costs keep escalating, both in money and in lost customer satisfaction.
When faced with a tech disruption, many companies focus on restoring service and fixing the issue. Understandably, technology teams reach for the solution that would solve the problem in the moment.
However, duct-taping gaps works only for so long. As a result, organizations end up constantly grappling with technology flare-ups, fixing one issue just before the next one appears.
Let’s look at how startups can build a resilient tech stack and strengthen IT resilience.
Designing a Resilient Tech Stack
McKinsey identifies seven critical steps to building a resilient technology stack.
Focus on User Journey
Instead of focusing only on critical assets (systems and applications), organizations should fix the weakest link in the customer journey. This helps them move away from the duct-taping approach and identify the components of the user journey that can make or break the user’s experience.
IT resilience, in effect, is not about modernizing applications but about understanding how all applications, API calls and third-party dependencies work in tandem to result in a unified customer journey and then identifying the critical components to strengthen.
Instead of viewing resiliency as an IT infrastructure issue, companies should take a risk-based approach, which can work in two directions. The first is a business-driven, top-down approach that prioritizes customer journeys by risk. For instance, companies should ask which customer journeys significantly impact revenue and customer satisfaction.
The second is a bottom-up approach, which assesses each technology component to build a risk profile for that asset. A detailed risk profile includes the probability of failure, its impact and the ability to detect and minimize it.
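One way to make the bottom-up risk profile concrete is to score each component on the three factors just mentioned and rank components by the product. The sketch below is only an illustration: the 1 to 10 scales and the multiplication are borrowed from FMEA-style scoring as an assumption, and the component names and scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentRisk:
    """Hypothetical risk profile for one technology component."""
    name: str
    failure_likelihood: int    # 1 (rare) .. 10 (frequent)
    impact: int                # 1 (minor) .. 10 (severe)
    detection_difficulty: int  # 1 (caught at once) .. 10 (hard to detect)

    def score(self) -> int:
        # Assumed scoring rule: product of the three factors.
        return self.failure_likelihood * self.impact * self.detection_difficulty


def rank_by_risk(components):
    """Return components ordered from highest to lowest risk score."""
    return sorted(components, key=lambda c: c.score(), reverse=True)


# Made-up example values, not real measurements.
components = [
    ComponentRisk("payment-api", 3, 9, 4),
    ComponentRisk("message-queue", 5, 7, 6),
    ComponentRisk("third-party-login", 4, 8, 2),
]

ranked = rank_by_risk(components)
for component in ranked:
    print(f"{component.name}: {component.score()}")
```

Ranking by a single score is a simplification; a team might just as reasonably weight impact more heavily or review the three factors side by side.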
Data and Analytics
No matter how strong an organization is technologically, there will be IT incidents and related IT data. Many organizations handle IT incidents with disparate legacy tools, which means their data isn’t ready for insights, discovery and decision-making.
By using artificial intelligence and data analytics, organizations can get better insight into the why of tech disruption instead of learning only the when. Better data and analytics can inform IT departments and senior leadership about the most common tech disturbances.
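As a rough illustration of what consolidated incident data makes possible, the sketch below counts incidents and total downtime per root cause. It uses plain aggregation rather than any AI technique, and the record fields and values are hypothetical, not drawn from a real incident tool.

```python
from collections import Counter, defaultdict

# Hypothetical incident records, as they might look once merged from several tools.
incidents = [
    {"service": "checkout", "root_cause": "config change", "minutes_down": 42},
    {"service": "search", "root_cause": "capacity", "minutes_down": 15},
    {"service": "checkout", "root_cause": "config change", "minutes_down": 30},
    {"service": "login", "root_cause": "third-party dependency", "minutes_down": 55},
    {"service": "search", "root_cause": "config change", "minutes_down": 8},
]


def summarize(records):
    """Return (incident count per cause, total downtime minutes per cause)."""
    counts = Counter(r["root_cause"] for r in records)
    downtime = defaultdict(int)
    for r in records:
        downtime[r["root_cause"]] += r["minutes_down"]
    return counts, dict(downtime)


counts, downtime = summarize(incidents)
for cause, n in counts.most_common():
    print(f"{cause}: {n} incidents, {downtime[cause]} minutes down")
```

Even a simple summary like this points at the why (here, configuration changes) rather than only the when.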
Design for Anomalies
Traditionally, organizations undertake capacity planning and maintain a buffer of 50 percent on top of it for higher traffic. The pandemic drove digital traffic surges of 300 to 500 percent, leaving many legacy systems unable to handle the load.
Infrastructure capabilities such as containerized applications can augment capacity across the technical stack and address issues in the middleware, such as message queues. Designing for anomalies means organizations are more resilient to sudden surges in traffic.
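The gap between a 50 percent buffer and a pandemic-scale surge can be worked through with simple arithmetic. The sketch below assumes the surge figures mean peak traffic at 300 to 500 percent of the planned baseline, and the baseline of 1,000 requests per second is a hypothetical number chosen for illustration.

```python
def provisioned_capacity(baseline: float, buffer_pct: float) -> float:
    """Capacity after adding a percentage buffer on top of the planned baseline."""
    return baseline * (1 + buffer_pct / 100)


def overload_factor(baseline: float, buffer_pct: float, peak_pct: float) -> float:
    """Peak load divided by provisioned capacity; above 1.0 means overloaded."""
    peak_load = baseline * peak_pct / 100
    return peak_load / provisioned_capacity(baseline, buffer_pct)


baseline_rps = 1000  # hypothetical planned load, requests per second

for peak_pct in (300, 500):
    factor = overload_factor(baseline_rps, buffer_pct=50, peak_pct=peak_pct)
    print(f"peak at {peak_pct}% of baseline: {factor:.2f}x provisioned capacity")
```

Under these assumptions the system would need between two and a little over three times its provisioned capacity, which is the kind of gap that elastic, containerized infrastructure is meant to absorb.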
Automate and Address
More resilient organizations continually invest in talent acquisition and reskilling initiatives around DevOps automation. These initiatives enable modern engineering practices such as CI/CD pipelines to automate software deployments.
By employing these engineering practices, companies can improve uptime and use automation to identify and address IT issues quickly.
The One-person Syndrome
Delegating too many responsibilities to a few people can be a bottleneck to resiliency. If you notice that a few people in your organization help everyone and know how to take care of everything, it’s time to recognize and encourage resilient behaviors across the whole team instead.
The one-person syndrome harms an organization in many ways. One is that it discourages other employees from stepping up and learning how to handle responsibilities and fix gaps. People are also a significant part of your technology stack, so promote resilience within teams.
Failure is inevitable and a constant. Companies should get more proactive in identifying gaps in IT before they expand and become glaring. Some methods companies can employ include pre-mortem analysis, chaos engineering, strategy testing and problem simulation.
In the big picture, the livelihood of your business and employees depends on the technology working for your customers. A combination of reliability, scalability and redundancy can give organizations the resilience they need.
As organizations accelerated innovation in the wake of the pandemic, they faced severe service disruptions. However, it’s not too late to build more resilient tech organizations.