Philipp Krenn
Make Your Data FABulous
#1 about 7 minutes
Understanding the CAP theorem for distributed systems
The CAP theorem states that a distributed data store can only provide two of three guarantees: consistency, availability, and partition tolerance.
#2 about 3 minutes
Introducing the FAB theory for datastore tradeoffs
The FAB theory proposes another set of tradeoffs for data stores, where you can only pick two of three attributes: fast, accurate, or big.
#3 about 7 minutes
How terms aggregation trades accuracy for speed
Elasticsearch's terms aggregation may return inaccurate counts by default: to improve performance, each shard reports only its top local terms, so a term that ranks low on some shards can be undercounted or missed in the merged result.
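The effect can be reproduced with a small simulation. This is only a sketch under stated assumptions, not Elasticsearch's actual implementation: the shard contents are invented, and `size` and `shard_size` are borrowed from the aggregation's parameter names with toy values chosen to make the error visible.

```python
from collections import Counter

# Hypothetical per-shard term counts (invented data for illustration).
SHARDS = [
    {"a": 10, "b": 8, "c": 7},
    {"c": 9, "a": 3, "d": 2},
]


def shard_top_terms(shard_counts, shard_size):
    """Each shard reports only its `shard_size` most frequent terms."""
    return dict(Counter(shard_counts).most_common(shard_size))


def approximate_terms_agg(shards, size, shard_size):
    """Merge the truncated per-shard lists, as a coordinating node would."""
    merged = Counter()
    for shard_counts in shards:
        merged.update(shard_top_terms(shard_counts, shard_size))
    return merged.most_common(size)


def exact_terms_agg(shards, size):
    """Merge the full per-shard counts (what a single shard would see)."""
    merged = Counter()
    for shard_counts in shards:
        merged.update(shard_counts)
    return merged.most_common(size)


if __name__ == "__main__":
    print(approximate_terms_agg(SHARDS, size=2, shard_size=2))
    # [('a', 13), ('c', 9)]
    print(exact_terms_agg(SHARDS, size=2))
    # [('c', 16), ('a', 13)]
```

In this toy data, term "c" falls outside the first shard's top two, so its 7 documents there are never reported: the merged count for "c" is too low and the ranking is wrong. Raising `shard_size` reduces the error at the cost of more work per shard.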
#4 about 8 minutes
Inconsistent relevance scores in distributed full-text search
Full-text search relevance scores using TF-IDF can be inconsistent because inverse document frequency is calculated per-shard, not globally.
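A short calculation illustrates why per-shard statistics skew scores. The sketch below assumes the simplified textbook formula idf = ln(N / df), which is not the exact formula Lucene uses, and the document counts are invented for the example.

```python
import math

# Hypothetical shard statistics for a single search term (invented data).
SHARDS = [
    {"total_docs": 4, "docs_with_term": 1},  # term is rare on this shard
    {"total_docs": 4, "docs_with_term": 3},  # term is common on this shard
]


def idf(total_docs, docs_with_term):
    """Simplified textbook IDF: ln(N / df). An assumption, not Lucene's formula."""
    return math.log(total_docs / docs_with_term)


def per_shard_idf(shards):
    """IDF as each shard would compute it from its own local statistics."""
    return [idf(s["total_docs"], s["docs_with_term"]) for s in shards]


def global_idf(shards):
    """IDF computed from statistics summed across all shards."""
    total = sum(s["total_docs"] for s in shards)
    with_term = sum(s["docs_with_term"] for s in shards)
    return idf(total, with_term)


if __name__ == "__main__":
    for i, value in enumerate(per_shard_idf(SHARDS)):
        print(f"shard {i}: idf = {value:.3f}")
    print(f"global:  idf = {global_idf(SHARDS):.3f}")
```

With these numbers, the same term is weighted more heavily on the first shard than on the second, so two otherwise identical documents can receive different scores depending on which shard holds them.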
#5 about 2 minutes
Using a single shard to ensure data accuracy
Forcing an index to use a single shard guarantees accurate aggregations and relevance scores by eliminating distributed calculations, but sacrifices horizontal scaling.
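As a rough illustration, a single-shard index can be requested through the index settings at creation time. The sketch below only builds the JSON request body; the index name `fab-demo` is hypothetical, and sending the request (for example as `PUT /fab-demo`) is left out so the example stays self-contained.

```python
import json

# Hypothetical index name, used only for illustration.
INDEX_NAME = "fab-demo"


def build_single_shard_index_body(number_of_replicas=1):
    """Build a create-index request body that pins the index to one primary shard.

    `number_of_shards` and `number_of_replicas` are Elasticsearch index settings;
    the replica count here is an arbitrary choice for the example.
    """
    return {
        "settings": {
            "index": {
                "number_of_shards": 1,
                "number_of_replicas": number_of_replicas,
            }
        }
    }


if __name__ == "__main__":
    body = build_single_shard_index_body()
    print(f"PUT /{INDEX_NAME}")
    print(json.dumps(body, indent=2))
```

Replicas still provide redundancy and read throughput, but with one primary shard the data itself can no longer be spread across nodes, which is the scaling cost described above.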
#6 about 1 minute
Why you must consciously choose your data tradeoffs
Understand and explicitly choose the tradeoffs in your data systems, such as those described by the CAP theorem and the FAB theory, to avoid unexpected behavior.