Nele Uhlemann
Handling incidents collaboratively is like solving a rubix cube
#1about 4 minutes
The Rubik's Cube metaphor for engineering teams
Different engineering teams like backend and SREs operate on different sides of the system, requiring collaboration during incidents.
#2about 3 minutes
The first phase of resolving incidents collaboratively
The initial step in incident response is to establish a common understanding and transparency across teams before applying quick fixes.
#3about 2 minutes
Preventing future incidents with best practices
After resolving an incident, teams must collaborate on prevention by documenting best practices for patterns like service retries.
#4about 2 minutes
Discovering incidents through system observability
The discovery phase relies on making systems observable by collecting telemetry data like logs, metrics, and traces.
#5about 2 minutes
Standardizing telemetry collection with OpenTelemetry
OpenTelemetry provides a vendor-neutral standard for instrumenting applications, preventing vendor lock-in for observability backends.
#6about 2 minutes
Simplifying metrics with the Autometrics library
The open-source Autometrics library uses decorators to automatically generate key metrics like latency, errors, and request rate from functions.
#7about 5 minutes
Demo of generating metrics and SLOs from code
A live demo shows how Autometrics provides live metrics in the IDE and helps define SLOs that can be visualized in Grafana.
#8about 1 minute
Summary of collaborative incident management phases
A recap of the three key phases for collaborative incident handling: resolving, preventing, and discovering issues together.
#9about 2 minutes
Q&A on tooling and open source contribution
The speaker answers audience questions about managing tool complexity and the role of community contributions in open-source projects.
Related jobs
Jobs that call for the skills explored in this talk.
Matching moments
03:53 MIN
Applying agile and SRE principles to incident response
Applying Agile Principles to Incident Management
04:41 MIN
Actionable takeaways for SREs on incident management
Serverless Observability: where SLOs meet transforms
05:45 MIN
Using an incident console to manage response and resolvers
Applying Agile Principles to Incident Management
02:39 MIN
Fostering cross-team collaboration with SLOs
Serverless Observability: where SLOs meet transforms
01:31 MIN
Understanding observability and the need for a process
Mastering AI-Driven Problem Solving in Engineering with Observability
04:52 MIN
Handling operational challenges and infrastructure failures at scale
How building an industry DBMS differs from building a research one
01:40 MIN
How engineers handle production errors and monitoring
DevOps at Netflix
06:29 MIN
Overcoming observability challenges with a unified platform
All your telemetry data from any source in one place
Featured Partners
Related Videos
Applying Agile Principles to Incident Management
Tobias Dunn-Krahn
Mastering AI-Driven Problem Solving in Engineering with Observability
Jemiah Sius
Empathy: The secret sauce of Resilience
Malin Litwinski
SRE Methods In an Agency Environment
Martin Beránek
Metrics Handle with Care: The Paradox of Measuring Team Performance
Stefan Stelzer & Volker Zöpfel
Breaking Silos: Successful Collaboration Between Tech & Business Teams in Complex Enterprise Systems
Stefan Menschner & Alexander Weißhaupt
We adopted DevOps and are Cloud-native, Now What?
Bruno Amaro Almeida
Planet-Scale Dashboards
Robert Lehmann
Related Articles
View all articles
.gif?w=240&auto=compress,format)

.png?w=240&auto=compress,format)
From learning to earning
Jobs that call for the skills explored in this talk.

IKEA
Amsterdam, Netherlands
Senior
Team Leadership
Product Management
Project Management

Peter Park System GmbH
München, Germany
Senior
Python
Docker
Node.js
JavaScript


AUTO1 Group SE
Berlin, Germany
Intermediate
Senior
ELK
Terraform
Elasticsearch

CONTIAMO GMBH
Berlin, Germany
Senior
Python
Docker
TypeScript
PostgreSQL


egocentric Systems GmbH
Dresden, Germany
Intermediate
Senior
DevOps
Kubernetes

Peter Park System GmbH
München, Germany
Intermediate
Senior
Bash
Linux
Python
