Learning materials for those who want to get into MLOps
Curated by Bartosz Mikulski

Why is DevOps for Machine Learning so Different?

The role of MLOps is to support the whole flow of training, serving, rollout, and monitoring, not only deployment and testing. The entire workflow is...

Scaling An ML Team (0–10 People)

Don't write your own tools. Everything you need at the beginning has already been written by someone else. Invest effort in automation. It would be...

Guide to File Formats for Machine Learning: Columnar, Training, Inferencing, and the Feature Store

The author explains the difference between tabular, columnar, hierarchical, and nested data formats. They also mention data partitioning to speed up parallel processing and list...

Modern SQL

The author goes far beyond the basic SQL tutorials that are stuck at the SQL-92 standard.

What is a Vector Database?

We transform the raw data into vector embeddings to train/use an ML model (for example, in language processing). Vector databases store such embeddings and offer...
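At its core, the lookup a vector database performs is nearest-neighbor search over embeddings. A minimal brute-force sketch with NumPy (toy vectors, hypothetical; real vector databases use approximate indexes such as HNSW to scale):

```python
import numpy as np

def top_k_similar(query: np.ndarray, index: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k embeddings most similar to the query (cosine similarity)."""
    # Normalize so that a dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    m = index / np.linalg.norm(index, axis=1, keepdims=True)
    scores = m @ q
    # argsort is ascending: take the last k, reverse for best-first order.
    return np.argsort(scores)[-k:][::-1]

# Toy "database" of four 3-dimensional embeddings.
index = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
query = np.array([1.0, 0.05, 0.0])
print(top_k_similar(query, index, k=2))  # indices of the two nearest vectors
```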

Automating Data Protection at Scale, Part 1

A story of building a Data Protection Platform at Airbnb, automating data encryption, leakage protection, and GDPR compliance.

Uber’s Journey Toward Better Data Culture From First Principles

Building a data-first architecture where every data point has an owner and quality metrics.

Data Versioning: What is it & why is it important?

An explanation of why we need data versioning and what kinds of data versioning tools exist.

The Guide to Data Versioning

A description of three kinds of data versioning implementations and an explanation of how it is implemented in LakeFS. Additionally, a few best practices regarding data...

Unit Testing Data: What Is It and How Do You Do It?

Monitoring the data quality and ensuring that the feature store always contains valuable data. Hints of the kinds of data quality checks that we can...
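A data quality check of the kind the article describes can be as simple as asserting schema expectations over each incoming batch. A sketch with a hypothetical schema (`id`, `age`); real pipelines typically use a framework such as Great Expectations for this:

```python
def check_batch(rows):
    """Return a list of (row_index, reason) data-quality violations.

    The checks here assume a hypothetical schema: a non-empty id
    and an age within a plausible human range.
    """
    violations = []
    for i, row in enumerate(rows):
        if not row.get("id"):
            violations.append((i, "missing id"))
        age = row.get("age")
        if age is None or not (0 <= age <= 120):
            violations.append((i, "age out of range"))
    return violations

good = {"id": "a1", "age": 34}
bad = {"id": "", "age": 200}
print(check_batch([good, bad]))  # only the second record violates the checks
```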

What are Feature Stores and Why Are They Critical for Scaling Data Science?

This text explains the difference between online and offline features. Also, it shows the benefits of having a feature store, such as an interface between...

What is a Feature Store?

An explanation focusing on the technical building blocks of a feature store and the separation of responsibilities between data engineers and data scientists.
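To make the building blocks concrete: the online path of a feature store boils down to low-latency lookups of the latest feature values per entity. A deliberately tiny in-memory sketch (real systems like Feast add offline storage, point-in-time-correct joins, and freshness monitoring):

```python
from datetime import datetime, timezone

class TinyFeatureStore:
    """In-memory sketch of a feature store's online serving path."""

    def __init__(self):
        self._store = {}

    def put(self, entity_id: str, features: dict) -> None:
        # Data engineers' pipelines write the latest feature values here.
        self._store[entity_id] = {
            "features": features,
            "updated_at": datetime.now(timezone.utc),
        }

    def get(self, entity_id: str, names: list) -> dict:
        # Data scientists' services read features at prediction time.
        row = self._store[entity_id]["features"]
        return {n: row.get(n) for n in names}

fs = TinyFeatureStore()
fs.put("user-1", {"avg_order_value": 52.3, "orders_30d": 4})
print(fs.get("user-1", ["orders_30d"]))  # point lookup at serving time
```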

Experiment Tracking: What it is, Best Practices & Tools

What is experiment tracking, and why do you need it? Ideas for implementing experiment tracking in a better way than a shared Excel file.

ML Experiment Tracking: What It Is, Why It Matters, and How to Implement It

Using experiment tracking to compare experiments, analyze results, debug the model training code, and improve team collaboration by sharing experiment results.
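The essence of experiment tracking is appending (parameters, metrics) records per run and querying them later. A minimal stand-in for tools like MLflow or Neptune, using JSON lines (all names here are made up for illustration):

```python
import json
import tempfile
import time
from pathlib import Path

def log_run(path: Path, params: dict, metrics: dict) -> None:
    """Append one experiment run (params + metrics) as a JSON line."""
    record = {"timestamp": time.time(), "params": params, "metrics": metrics}
    with path.open("a") as f:
        f.write(json.dumps(record) + "\n")

def best_run(path: Path, metric: str) -> dict:
    """Return the logged run with the highest value of the given metric."""
    runs = [json.loads(line) for line in path.read_text().splitlines()]
    return max(runs, key=lambda r: r["metrics"][metric])

path = Path(tempfile.mkdtemp()) / "runs.jsonl"
log_run(path, {"lr": 0.1}, {"val_acc": 0.81})
log_run(path, {"lr": 0.01}, {"val_acc": 0.86})
print(best_run(path, "val_acc")["params"])  # {'lr': 0.01}
```

Even this toy version already beats a shared spreadsheet: the log is append-only, machine-readable, and comparable across runs.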

The time and place for Jupyter notebooks in Data Science projects

What are the consequences of using Jupyter notebooks? Hopefully, this text will persuade you to use them only for quick prototyping ;)

Machine Learning Build system: why do you need it?

What is an ML build system, and why is it crucial for model reproducibility? How to test models using the build system to ensure they...

Designing ML Orchestration Systems for Startups

A case study on designing a machine learning orchestration platform for startups.

ML Model Orchestration Recommendations

What to expect from a properly built ML orchestration system and the key components of batch and streaming systems.

Reproducibility in ML: why it matters and how to achieve it

Root Causes of Non-Determinism and how to fix those issues
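One of the most common root causes of non-determinism is unpinned randomness. A sketch of the usual first step (frameworks such as PyTorch and TensorFlow add their own seed calls and deterministic-ops flags on top of these):

```python
import random

import numpy as np

def set_seeds(seed: int = 42) -> None:
    """Pin the common Python-level sources of randomness."""
    random.seed(seed)
    np.random.seed(seed)

set_seeds(7)
a = np.random.rand(3)
set_seeds(7)
b = np.random.rand(3)
print(np.array_equal(a, b))  # True: re-seeding reproduces the same draws
```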

Selecting your optimal MLOps stack: advantages and challenges

What do you need for ML pipeline automation, and how to evaluate whether the tools meet your expectations

Introduction to Unit Testing for Machine Learning

A comprehensive guide to testing in MLOps that covers data and model monitoring, principles of MLOps testing, and an example implementation

Effective testing for machine learning systems

An explanation of the difference between testing standard software and testing ML-based software. An introduction to the pre-training testing (data validation, assumptions about the training...

Why don’t we test machine learning like we test software?

An introduction to the idea of applying the property-based testing to ML models

A/B Testing Machine Learning Models

The process of testing multiple variants of the same ML model in production. An explanation of the factors affecting A/B testing.
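The routing step of an A/B test is typically a deterministic hash of the user id, so each user sticks to one variant across requests. A sketch (the variant names and split are made up):

```python
import hashlib

def assign_variant(user_id: str, split: float = 0.5) -> str:
    """Deterministically assign a user to model A or model B.

    Hashing the user id (instead of calling random.random() per request)
    keeps each user on the same variant, which the later analysis requires.
    """
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # value in [0, 1]
    return "model_a" if bucket < split else "model_b"

print(assign_variant("user-42"))  # same user always gets the same answer
```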

What Does it Mean to Deploy a Machine Learning Model?

How do you deploy machine learning models in an automated, reproducible, and auditable manner?

Batch Inference for Machine Learning Deployment

How to use an ML model in a batch process scheduled using cron, Airflow, Prefect, or any other workflow management software
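The job the scheduler triggers usually reduces to: read a batch, score every row, write the results. A sketch of that scoring step (`ThresholdModel` and the column names are hypothetical stand-ins for a model loaded from a registry):

```python
import csv
import io

def score_batch(rows, model):
    """Score every row with the model and yield (id, prediction) pairs."""
    for row in rows:
        features = [float(row["f1"]), float(row["f2"])]
        yield row["id"], model.predict(features)

class ThresholdModel:
    # Hypothetical model: predicts 1 when the feature sum exceeds 1.0.
    def predict(self, features):
        return int(sum(features) > 1.0)

# In production this would read from a file or table; cron/Airflow only
# decides *when* this script runs.
raw = "id,f1,f2\na,0.2,0.3\nb,0.9,0.4\n"
rows = csv.DictReader(io.StringIO(raw))
print(list(score_batch(rows, ThresholdModel())))  # [('a', 0), ('b', 1)]
```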

A guide to ML model serving

A short explanation of the building blocks needed to serve ML models in production

Deploying your first ML model in production

What to do when you want the model in production as fast as possible. Overengineering is fun, but right now, you need results. Fast.

Shadow deployment vs. canary release of machine learning models

How to roll out machine learning models in three stages to ensure that the model works properly in production
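The difference between the two rollout styles fits in a few lines of routing logic. A sketch (function and mode names are illustrative, not from the article):

```python
import random

def route(request, stable, candidate, mode: str, canary_share: float = 0.05):
    """Route one request in shadow or canary mode.

    Shadow: the candidate scores every request, but only the stable model's
    answer is returned; the candidate's output is logged and compared offline.
    Canary: a small share of traffic receives the candidate's answer for real.
    """
    if mode == "shadow":
        _ = candidate(request)  # observed, never served to the user
        return stable(request)
    if mode == "canary" and random.random() < canary_share:
        return candidate(request)
    return stable(request)

stable = lambda r: "old-model answer"
candidate = lambda r: "new-model answer"
print(route({"user": 1}, stable, candidate, mode="shadow"))  # always the stable answer
```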

Monitoring ML pipelines

An introduction to monitoring ML pipelines in production. The article covers monitoring your infrastructure, the input data, and the ML training process.

Model Monitoring: What it is & why does it matter?

Preventing ML model degradation over time by monitoring the model's performance KPIs

Concept Drift and Model Decay in Machine Learning

An explanation of the most common problems related to ML models deployed in production

Monitoring Machine Learning Models in Production

This comprehensive guide aims to at the very least make you aware of where the complexity in monitoring machine learning models in production comes from,...

What is ML Observability?

An introduction to ML Observability - the practice of obtaining an understanding of the model’s performance across all stages of the model development cycle.

ML Model Monitoring – 9 Tips From the Trenches

A practical guide to finding common problems with ML models and fixing them

Q&A: Do I need to monitor data drift if I can measure the ML model quality?

Monitoring data drift to get an early warning about incoming model performance problems
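One common drift metric (there are others, and the article may prefer a different one) is the Population Stability Index: compare the binned distribution of a feature in training data against live data. A NumPy sketch; the thresholds in the docstring are a widely used rule of thumb, not a standard:

```python
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between a reference sample and live data.

    Rule of thumb (varies by team): < 0.1 no drift, 0.1-0.25 moderate,
    > 0.25 significant drift worth investigating.
    """
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Clip to avoid log(0) for empty bins.
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
train = rng.normal(0, 1, 10_000)
print(psi(train, rng.normal(0, 1, 10_000)))  # tiny: same distribution
print(psi(train, rng.normal(1, 1, 10_000)))  # large: the mean has shifted
```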

Beyond Monitoring: The Rise of Observability

Using data observability to detect or prevent partial, erroneous, missing, or otherwise inaccurate data

Interpretable Machine Learning

This free book is about making machine learning models and their decisions interpretable.

Why you need to explain machine learning models

Increasing trust in ML models by figuring out why they generated the predictions. An introduction to the four most popular ML explainability methods

Explainability and Auditability in ML: Definitions, Techniques, and Tools

A guide to all explainability methods applied to various kinds of ML models

What Are the Prevailing Explainability Methods?

An explanation of SHAP and LIME explainability methods
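SHAP and LIME are substantial libraries, but the underlying idea — perturb the inputs and observe how the model reacts — can be shown with a much simpler relative, permutation importance. A NumPy sketch (this is explicitly not SHAP or LIME, just the same model-agnostic principle; the toy model and data are made up):

```python
import numpy as np

def permutation_importance(model, X, y, rng=None):
    """Shuffle one feature at a time and measure the drop in accuracy.

    A large drop means the model relies on that feature; zero drop means
    the feature is ignored. Simpler than SHAP/LIME, same perturb-and-observe idea.
    """
    rng = rng or np.random.default_rng(0)
    base = np.mean(model(X) == y)
    drops = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        rng.shuffle(Xp[:, j])  # destroy the feature's relationship with y
        drops.append(base - np.mean(model(Xp) == y))
    return np.array(drops)

# Hypothetical model that only looks at feature 0.
model = lambda X: (X[:, 0] > 0).astype(int)
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = (X[:, 0] > 0).astype(int)
print(permutation_importance(model, X, y))  # feature 0 matters, the others don't
```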