Dainius Jocas

May 12, 2021 • WeAreDevelopers LIVE

Don't Change the Partition Count for Kafka Topics!

A well-intentioned infrastructure change silently corrupted our search index. Discover how increasing a Kafka topic's partition count can break your entire data pipeline.

#1 about 5 minutes

An overview of the data indexing pipeline architecture

The system moves data from a MySQL primary data store to an Elasticsearch search server using a Kafka and Kafka Connect pipeline.

#2 about 1 minute

Using Kafka partition offset for optimistic concurrency control

The system leverages the Kafka partition offset as the document version number in Elasticsearch to enable parallel indexing without data consistency issues.
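The version check behind this scheme can be sketched as a toy in-memory model. It assumes external versioning semantics, where a write is accepted only if its version is strictly higher than the stored one. The `VersionedIndex` class and its method names are hypothetical and are not the Elasticsearch client API; the sketch also remembers delete versions indefinitely, which is a simplification.

```python
from dataclasses import dataclass, field


class VersionConflict(Exception):
    """Raised when an incoming version is not newer than the stored one."""


@dataclass
class VersionedIndex:
    """Hypothetical in-memory stand-in for an externally versioned index."""
    docs: dict = field(default_factory=dict)      # key -> document body
    versions: dict = field(default_factory=dict)  # key -> last accepted version

    def _check(self, key, version):
        # Accept only versions strictly higher than the last accepted one.
        current = self.versions.get(key)
        if current is not None and version <= current:
            raise VersionConflict(f"{key}: incoming {version} <= stored {current}")

    def index(self, key, body, version):
        self._check(key, version)
        self.docs[key] = body
        self.versions[key] = version

    def delete(self, key, version):
        self._check(key, version)
        self.docs.pop(key, None)
        self.versions[key] = version  # keep the delete's version (simplification)


# Two workers index the same key out of order; the offset is the version.
idx = VersionedIndex()
idx.index("product-1", {"name": "newer"}, version=11)
try:
    idx.index("product-1", {"name": "older"}, version=10)  # late arrival
except VersionConflict as err:
    print("rejected:", err)
print(idx.docs["product-1"])
```

Because offsets within one partition only grow, the later message for a key always wins regardless of which worker finishes first. The same check also rejects a delete whose offset is lower than the stored version.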

#3 about 2 minutes

Investigating a mysterious data deletion failure in production

A bug report describes Elasticsearch failing to delete documents and therefore serving stale data, but the problem could not be reproduced in local or testing environments.

#4 about 5 minutes

Discovering the offset and version number mismatch

Manual inspection reveals that the document version in Elasticsearch is significantly higher than the new message offset in the Kafka topic for the same key.

#5 about 4 minutes

How changing partition count breaks message ordering guarantees

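Increasing the Kafka topic's partition count leaves the key hash function unchanged but changes the partition count that the hash is taken modulo. New messages for an existing key can therefore land in a different partition, where their offsets may be lower than the version already stored in Elasticsearch.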
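A minimal sketch of the key-to-partition mapping shows why the count matters. It assumes the common "hash of the key modulo the partition count" rule. `zlib.crc32` is only a stand-in hash for illustration (Kafka's default partitioner uses murmur2 on the serialized key bytes), and the function names, key names, and partition counts are hypothetical.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition as hash(key) mod partition count.

    zlib.crc32 is a stand-in hash, not the hash Kafka actually uses.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def moved_keys(keys, old_count: int, new_count: int):
    """Return the keys whose partition changes when the count changes."""
    return [k for k in keys
            if partition_for(k, old_count) != partition_for(k, new_count)]


keys = [f"product-{i}" for i in range(100)]
moved = moved_keys(keys, old_count=4, new_count=6)
print(f"{len(moved)} of {len(keys)} keys map to a different partition")
```

The hash of each key is identical before and after the change; only the modulus differs. Every key in `moved` will have its next message written to a partition with an unrelated offset sequence, so offsets for that key are no longer guaranteed to increase.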

#6 about 4 minutes

The solution and key lessons for managing Kafka topics

The fix required a full data re-ingestion into a new Kafka topic, highlighting the lesson to never increase partition count when message ordering is critical.




