Arto Liukkonen
I broke the production
#1 about 6 minutes
A personal story of breaking production at scale
The speaker recounts causing a major production outage by running a backfill script that overwhelmed the Facebook API and halted data updates.
#2 about 2 minutes
Judging intentions versus actions during incidents
We tend to judge others by their actions but ourselves by our intentions, so we should assume good intent from colleagues during incidents.
#3 about 2 minutes
Why individual blame is a counterproductive response
When a production issue occurs, it's a system failure, not an individual's fault, as responsibility is shared across developers, reviewers, and processes.
#4 about 3 minutes
How to build a psychologically safe blameless culture
Shifting to a blameless culture requires fostering trust, understanding intentions, practicing self-awareness, and owning mistakes without displacing frustration.
#5 about 2 minutes
Using blameless postmortems for system-level learning
Blameless postmortems, originating from aviation and healthcare, focus on investigating root causes to strengthen systems rather than assigning individual blame.
#6 about 3 minutes
The power of positive feedback in code reviews
Applying the five-to-one ratio of positive to negative interactions can improve team dynamics, especially by adding positive comments during code reviews.
#7 about 2 minutes
Using pre-mortems to proactively prevent failures
Pre-mortems are a proactive exercise where teams imagine a project has already failed in order to identify potential risks and edge cases beforehand.
#8 about 3 minutes
Incident resolution and key cultural takeaways
The incident took 20 hours to fully resolve but was a valuable learning experience that exposed system flaws and reinforced a healthy team culture.
#9 about 2 minutes
Q&A on customer impact and worst production breaks
The speaker answers audience questions about customer reactions to the outage and shares a story about his worst production break involving a failed form.
Matching moments
01:40 MIN
How engineers handle production errors and monitoring
DevOps at Netflix
04:07 MIN
The pitfalls of a "move fast and break things" culture
Navigating the Future of Junior Developers in Tech
02:26 MIN
How blame culture undermines your testing strategy
Your Testing Strategy is broken - lets fix it!
02:46 MIN
The negative impact of 'move fast' culture
Navigating the Future of Junior Developers in Tech
07:45 MIN
Q&A on production code analysis and performance bottlenecks
Data Science on Software Data
05:52 MIN
Q&A on shared systems and scaling productivity
Forget Developer Platforms, Think Developer Productivity!
05:05 MIN
Shifting from blame to learning in incident analysis
Empathy: The secret sauce of Resilience
01:58 MIN
What we can learn from high-profile software failures
The Software Bug All Stars - and what we can learn from them
Related Videos
Answering the Million Dollar Question: Why did I Break Production?
Luís Ventura
Shipping Quality Software In Hostile Environments
Luka Kladaric
What we Learned from Reading 100+ Kubernetes Post-Mortems
Noaa Barki
What I learned as a developer from accidents in space
Andrey Sitnik
Chaos in Containers - Unleashing Resilience
Maish Saidel-Keesing
Empathy: The secret sauce of Resilience
Malin Litwinski
The Software Bug All Stars - and what we can learn from them
Christian Seifert
Building a culture from chaos
Steve Upton

From learning to earning
Jobs that call for the skills explored in this talk.

iits-consulting GmbH
München, Germany
Intermediate
Go
Docker
DevOps
Kubernetes

Mittwald CM Service GmbH & Co. KG
Espelkamp, Germany
Intermediate
Senior
Linux
Docker
DevOps
Kubernetes

Peter Park System GmbH
München, Germany
Senior
Python
Docker
Node.js
JavaScript

smartclip Europe GmbH
Hamburg, Germany
Intermediate
Senior
GIT
Linux
Python
Kubernetes
CONTIAMO GMBH
Berlin, Germany
Senior
Python
Docker
TypeScript
PostgreSQL
IKEA
Amsterdam, Netherlands
Intermediate
Azure
Kubernetes
Google Cloud Platform
Amazon Web Services (AWS)
Scripting (Bash/Python/Go/Ruby)

SYSKRON GmbH
Regensburg, Germany
Intermediate
Senior
.NET
Python
Kubernetes