Make your resilience work actually work.

We fix broken resilience programs.

AI won't fix what your organization can't see.

We help engineering organizations understand why they keep having the same types of incidents despite investing in incident analysis, GameDays, operational readiness reviews, monitoring, chaos engineering, and, increasingly, even AI.

Is this your reality?

  • The same types of incidents keep happening despite thorough postmortems and action items

  • You invested in chaos engineering but can't tell if it's actually making you more resilient

  • Your teams do incident reviews, but the learning never spreads beyond the room

  • You're spending more time fighting fires than improving your systems

  • You know something's broken organizationally, but can't pinpoint what

  • Leadership is asking "why does this keep happening?" and you don't have a good answer

  • You're deploying AI into operations but can't tell if it's making better decisions or just faster ones

  • You're betting on AI to improve operations but nobody can explain what happens when it's wrong

Sound familiar? These aren't isolated problems. They share a common root. Your organization has invested in resilience practices, but the feedback loops that should turn those practices into actual learning are broken. Incidents get reviewed but the insights don't spread. Experiments get run but the results don't change decisions. The gap between how your resilience program looks on paper and how it works in practice is where the real problem lives.

Here's how we help organizations break this cycle.


NEW BOOK

Why We Still Suck at Resilience

Your organization does incident reviews, runs GameDays, and practices chaos engineering. So why do the same incidents keep happening? This book explains why, and what to do about it.


Start with diagnosis, then build from there

Most organizations start with the Resilience Assessment. Once we've identified what's actually broken, we can help you strengthen specific capabilities or partner for long-term transformation. But diagnosis comes first—you need to know what to fix.

1. Resilience Assessment

  • Your organization keeps having the same types of incidents despite doing retros, chaos engineering, and architecture reviews. Something organizational is broken—but what?

    What we do:

    We diagnose the feedback loop failures and organizational patterns that prevent your teams from learning and adapting.

    We embed with your teams to see how work actually happens versus how it's described. We participate in your incident reviews, observe GameDays, sit in on chaos experiments, and join operational readiness reviews. We watch how teams interact, what incentives and pressures they face, and where the gaps appear between policy and practice.

    Through this combination of observation and structured interviews, we identify exactly what's blocking resilience and give you a clear roadmap to fix it.

    What you get:

    ✓ Embedded observation of your actual resilience practices (incident reviews, GameDays, ORRs, chaos experiments)

    ✓ Stakeholder interviews across engineering, ops, and leadership

    ✓ Deep analysis of your feedback loops and incident patterns

    ✓ Written report with prioritized, actionable recommendations

    ✓ 2-hour executive readout session with your leadership team

    Timeline: 6-8 weeks from kickoff to delivery


2. Strengthen
(After Assessment)

  • Once we've identified what's broken, we help you build specific capabilities to address the gaps.

    These are focused 2-4 month engagements that build one capability deeply.

    Recent Strengthen engagements have included designing chaos engineering programs from scratch, rebuilding incident analysis processes to focus on organizational learning rather than action-item compliance, and establishing Operational Readiness Reviews that teams actually trust.

    Typical services:
    Chaos Engineering Programs - Design and implement systematic resilience testing

    Operational Readiness Reviews - Validate systems are actually ready for production

    Incident Analysis Process - Improve how your teams learn from failures

3. Transform
(Long-term Partnership)

  • For organizations ready for comprehensive change, we offer ongoing strategic partnership to embed resilience into your culture and operations.

    We become a long-term partner embedded in your leadership rhythm, joining monthly strategy sessions, running quarterly health assessments, and serving as a sounding board when new challenges emerge.

    What this looks like:
    Monthly strategic sessions with leadership

    Quarterly organizational health assessments

    Ongoing advisory as you implement changes

    Access to us for architecture reviews and escalations

    Typical engagement: 6-12+ months of collaboration


Our team

Adrian Hornsby has spent nearly 25 years building and operating software systems, from research and telecommunications at Nokia through multiple startups to nearly a decade at AWS, where he progressed from Solutions Architect to Principal Engineer on the AWS Fault Injection Service team. He authored much of AWS's resilience and chaos engineering guidance, trained field communities across the organization, and worked with internal teams including Prime Video, Amazon Search, and Lambda. He also holds a patent for fault-injection impact zone identification.

He is the author of Why We Still Suck at Resilience and writes the Resilience Bites newsletter. His work draws on resilience engineering research to explain why organizations keep having the same incidents despite doing all the right things, and his framework goes beyond tooling and process into the organizational patterns, feedback loops, and tensions that determine whether resilience efforts actually work.

Today, through Resilium Labs, Adrian works with Fortune 500 companies and growth-stage organizations worldwide to diagnose broken resilience programs and help teams build the capabilities to fix them. He advises VC portfolio companies on engineering practices, serves on advisory boards, and speaks regularly at international conferences.

Based in Finland. Working globally.


Ready to stop fighting fires?

The best first step is a conversation to understand your current challenges and resilience goals. We'll explore whether our approach aligns with your needs and discuss which step in the journey makes sense for your organization.

There's no obligation, and this conversation alone often provides valuable perspectives on your resilience opportunities.

Your information remains confidential, and we'll respond promptly.

Resilium Labs Oy
+358 (0)504361615
adhorn@resiliumlabs.com