Aarno Aukia
DevOps for AI: running LLMs in production with Kubernetes and KubeFlow
#1 · about 3 minutes
Applying DevOps principles to machine learning operations
The evolution of software operations, from reactive firefighting to automated DevOps, provides a model for maturing current MLOps practices.
#2 · about 3 minutes
Defining AI, machine learning, and generative AI
AI is a broad concept that has evolved through machine learning and deep learning to the latest trend of generative AI, which can create new content.
#3 · about 4 minutes
How large language models generate text with tokens
LLMs work by converting text into numerical tokens and then using a large statistical model to predict the most probable next token in a sequence.
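The tokenize-then-predict loop described above can be sketched with a toy vocabulary and a hard-coded probability table (a real model derives these probabilities from billions of parameters; the words, IDs, and probabilities here are purely illustrative):

```python
# Toy illustration of next-token prediction -- not a real LLM.
# Text is mapped to integer token IDs, and a "model" (here just a
# hard-coded probability table) scores candidate next tokens.

vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}
inv_vocab = {i: t for t, i in vocab.items()}

def tokenize(text):
    """Map whitespace-separated words to integer token IDs."""
    return [vocab[w] for w in text.split()]

# Hypothetical probabilities for the token following "the cat";
# a real model computes these from the full token history.
next_token_probs = {2: 0.7, 3: 0.2, 4: 0.1}  # "sat" is most likely

def predict_next(token_ids):
    """Pick the most probable next token (greedy decoding).
    This toy ignores the history; a real model conditions on it."""
    best = max(next_token_probs, key=next_token_probs.get)
    return inv_vocab[best]

ids = tokenize("the cat")
print(ids)                # [0, 1]
print(predict_next(ids))  # sat
```

Generating a full response is just this step in a loop: append the predicted token to the sequence and predict again until a stop token appears.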
#4 · about 2 minutes
Using prompt engineering to guide LLM responses
Prompt engineering involves crafting detailed instructions and providing context within a prompt to guide the LLM toward a desired and accurate answer.
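A minimal sketch of that pattern, assembling instructions, context, and the user's question into one structured prompt (the section labels and wording are one common convention, not a fixed API):

```python
def build_prompt(instructions, context, question):
    """Assemble a structured prompt: role instructions first,
    supporting context next, then the user's question -- a common
    prompt-engineering pattern for steering an LLM toward grounded
    answers."""
    return (
        f"System: {instructions}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer only using the context above. "
        "If the context is insufficient, say so."
    )

prompt = build_prompt(
    instructions="You are a concise support assistant.",
    context="Service X runs on Kubernetes and scales via KServe.",
    question="How does Service X scale?",
)
print(prompt)
```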
#5 · about 2 minutes
Understanding and defending against prompt injection attacks
User-provided input can be manipulated to bypass instructions or extract sensitive information, requiring defensive measures against prompt injection.
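Two simple defensive layers can be sketched as follows; the blocked phrases and delimiter tags are hypothetical examples, and real deployments combine many techniques (input screening, output filtering, privilege separation) rather than relying on any single check:

```python
# Sketch of two naive defenses against prompt injection.

# Example phrases only -- real filters use much broader detection.
SUSPICIOUS = ["ignore previous instructions", "reveal your system prompt"]

def screen_input(user_text):
    """Reject input containing known injection phrases."""
    lowered = user_text.lower()
    return not any(phrase in lowered for phrase in SUSPICIOUS)

def wrap_untrusted(user_text):
    """Fence user input in delimiters so the system prompt can tell
    the model to treat everything inside as data, not instructions."""
    return f"<untrusted>\n{user_text}\n</untrusted>"

print(screen_input("What is the weather today?"))            # True
print(screen_input("Ignore previous instructions and ..."))  # False
```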
#6 · about 3 minutes
Advanced techniques like RAG and model fine-tuning
Beyond basic prompts, you can use Retrieval-Augmented Generation (RAG) to add dynamic context or fine-tune a model with specific data for better performance.
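The RAG flow can be sketched end to end with a toy retriever; word overlap stands in for the vector-embedding similarity search a production system would use, and the documents are invented examples:

```python
# Minimal RAG sketch: retrieve the document most relevant to the
# query (by word overlap here; real systems use vector embeddings),
# then inject it into the prompt as context.

documents = [
    "KServe serves models on Kubernetes with autoscaling.",
    "Ollama runs open-source LLMs on a local machine.",
    "Prometheus collects metrics; Grafana visualizes them.",
]

def retrieve(query, docs):
    """Return the document sharing the most words with the query."""
    q = set(query.lower().split())
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

def rag_prompt(query):
    """Build a prompt with the retrieved document as context."""
    context = retrieve(query, documents)
    return f"Context: {context}\nQuestion: {query}\nAnswer:"

print(rag_prompt("How does KServe scale models on Kubernetes?"))
```

Fine-tuning takes the opposite approach: instead of injecting knowledge at query time, it bakes it into the model's weights with additional training data.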
#7 · about 5 minutes
Choosing between cloud APIs and self-hosted models
LLMs can be consumed via managed cloud APIs, which are simple but opaque, or by self-hosting open-source models for greater control and data privacy.
#8 · about 2 minutes
Streamlining local development with the Ollama tool
Ollama simplifies running open-source LLMs on a local machine for development by managing model downloads and hardware acceleration, acting like Docker for LLMs.
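Ollama exposes a local REST API (by default on localhost:11434) with a documented `/api/generate` endpoint. The sketch below builds such a request with only the standard library; the model name `llama3` is an example, and actually sending the request assumes the Ollama daemon is running:

```python
import json
import urllib.request

def generate_request(model, prompt, host="http://localhost:11434"):
    """Build a request for Ollama's /api/generate endpoint.
    stream=False asks for a single JSON response instead of chunks."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode()
    return urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

req = generate_request("llama3", "Why run LLMs on Kubernetes?")
print(req.full_url)  # http://localhost:11434/api/generate

# With the Ollama daemon running, send it like this:
#   resp = urllib.request.urlopen(req)
#   print(json.loads(resp.read())["response"])
```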
#9 · about 6 minutes
Running LLMs in production with Kubeflow and KServe
Kubeflow and its component KServe provide a robust, Kubernetes-native framework for deploying, scaling, and managing LLMs in a production environment.
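A deployment with KServe centers on its InferenceService custom resource. The fragment below is a hedged sketch: the `apiVersion` and `kind` are KServe's, but the model format, storage URI, and resource numbers are placeholders to adapt to your model and cluster:

```yaml
# Sketch of a KServe InferenceService manifest; the storage URI,
# model format, and GPU count are placeholders, not recommendations.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: demo-llm
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface                          # example runtime
      storageUri: "hf://example-org/example-model" # placeholder
      resources:
        limits:
          nvidia.com/gpu: "1"   # depends on model size and hardware
```

Applying a manifest like this lets KServe handle pod scheduling, autoscaling, and request routing for the model server.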
#10 · about 2 minutes
Monitoring LLM performance with KServe's observability tools
KServe integrates with tools like Prometheus and Grafana to provide detailed metrics and dashboards for monitoring LLM response times and resource usage.