MLflow
MLflow is an open-source platform for managing the machine learning lifecycle, originally developed by Databricks. It provides experiment tracking, model packaging, a model registry, and deployment tooling. Because it can be fully self-hosted, it suits enterprise environments that need to keep data on their own infrastructure.
Core Components
MLflow consists of four main components:
| Component | Function | Description |
|---|---|---|
| MLflow Tracking | Experiment Tracking | Logs parameters, metrics, code versions, and model artifacts |
| MLflow Projects | Project Packaging | Describes how to run ML code in a standardized format |
| MLflow Models | Model Packaging | Unified model format with support for multiple deployment targets |
| Model Registry | Model Management | Model versioning, approval workflows, and lifecycle management |
Basic Usage
Installation
pip install mlflow
Experiment Tracking
import mlflow

# NOTE: model, optimizer, train_loader, val_loader, num_epochs,
# train_one_epoch, and evaluate are assumed to be defined elsewhere.

# Set the experiment name
mlflow.set_experiment("image-classification")

with mlflow.start_run(run_name="resnet50-baseline"):
    # Log hyperparameters
    mlflow.log_param("learning_rate", 1e-3)
    mlflow.log_param("batch_size", 32)
    mlflow.log_param("optimizer", "AdamW")

    for epoch in range(num_epochs):
        train_loss = train_one_epoch(model, train_loader, optimizer)
        val_loss, val_acc = evaluate(model, val_loader)

        # Log metrics, using the epoch as the step
        mlflow.log_metric("train_loss", train_loss, step=epoch)
        mlflow.log_metric("val_loss", val_loss, step=epoch)
        mlflow.log_metric("val_accuracy", val_acc, step=epoch)

    # Save the trained model to the run
    mlflow.pytorch.log_model(model, "model")

    # Log artifacts
    mlflow.log_artifact("config.yaml")
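When a run has many hyperparameters, calling `mlflow.log_param` once per value gets verbose. `mlflow.log_params` accepts a dictionary, so one option is to flatten a nested configuration and log it in a single call. The following is a minimal sketch under that assumption; `flatten_params` and `log_config` are hypothetical helper names, not part of MLflow, and the example config is made up for illustration.

```python
from typing import Any, Dict


def flatten_params(config: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten a nested dict into dotted keys.

    Hypothetical helper: {"optimizer": {"name": "AdamW"}} becomes
    {"optimizer.name": "AdamW"}.
    """
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into nested sections, carrying the dotted prefix along
            flat.update(flatten_params(value, name))
        else:
            flat[name] = value
    return flat


def log_config(config: Dict[str, Any]) -> None:
    """Log every entry of a (possibly nested) config to the active run."""
    # Imported lazily so flatten_params can be used without MLflow installed
    import mlflow

    mlflow.log_params(flatten_params(config))


if __name__ == "__main__":
    example_config = {
        "learning_rate": 1e-3,
        "batch_size": 32,
        "optimizer": {"name": "AdamW", "weight_decay": 0.01},
    }
    print(flatten_params(example_config))
```

Calling `log_config(example_config)` inside a `with mlflow.start_run():` block would record all four values as parameters of that run.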
Launching the UI
mlflow ui --port 5000
# Visit http://localhost:5000
Model Registry and Deployment
Model Registry
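import mlflow
from mlflow import MlflowClient

# Register a model (replace <run_id> with the ID of the run that logged it)
result = mlflow.register_model(
    model_uri="runs:/<run_id>/model",
    name="image-classifier",
)

# Model stage management
# Note: recent MLflow releases deprecate stages in favor of model version aliases.
client = MlflowClient()
client.transition_model_version_stage(
    name="image-classifier",
    version=result.version,  # the version created by register_model above
    stage="Production",  # None, Staging, Production, Archived
)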
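Registered models are addressed through `models:/` URIs, which can select a model version by number, by stage, or, in more recent MLflow releases, by alias. The sketch below only builds those URI strings so the three forms are easy to compare; `registry_uri` is a hypothetical helper, not an MLflow function, and the alias name `champion` is an arbitrary example.

```python
from typing import Optional


def registry_uri(
    name: str,
    *,
    version: Optional[int] = None,
    stage: Optional[str] = None,
    alias: Optional[str] = None,
) -> str:
    """Build a models:/ URI from exactly one selector (hypothetical helper)."""
    selectors = [s for s in (version, stage, alias) if s is not None]
    if len(selectors) != 1:
        raise ValueError("pass exactly one of version, stage, or alias")
    if alias is not None:
        # Alias form: models:/<name>@<alias>
        return f"models:/{name}@{alias}"
    if version is not None:
        # Version form: models:/<name>/<version>
        return f"models:/{name}/{version}"
    # Stage form: models:/<name>/<stage>
    return f"models:/{name}/{stage}"


if __name__ == "__main__":
    print(registry_uri("image-classifier", version=1))
    print(registry_uri("image-classifier", stage="Production"))
    print(registry_uri("image-classifier", alias="champion"))
```

Any of these strings can be passed wherever a model URI is expected, for example to `mlflow.pytorch.load_model`. The alias form assumes an alias was assigned first, for instance with `MlflowClient.set_registered_model_alias`.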
Model Deployment
# Load a registered model
import mlflow.pytorch
model = mlflow.pytorch.load_model("models:/image-classifier/Production")
# Deploy via REST API
# mlflow models serve -m "models:/image-classifier/Production" -p 5001
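Once `mlflow models serve` is running, predictions are requested by POSTing JSON to its `/invocations` endpoint. The sketch below assumes an MLflow 2.x scoring server on port 5001 (matching the command above) and uses the `inputs` payload key for tensor-like data; `build_invocation_request` and `predict` are hypothetical helper names, and the input shape a real model expects depends on how it was logged.

```python
import json
import urllib.request
from typing import Any, List


def build_invocation_request(
    inputs: List[Any], host: str = "http://localhost:5001"
) -> urllib.request.Request:
    """Build a POST request for the scoring server's /invocations endpoint."""
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url=f"{host}/invocations",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(inputs: List[Any], host: str = "http://localhost:5001") -> Any:
    """Send the request and decode the JSON response (requires a running server)."""
    request = build_invocation_request(inputs, host)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only builds and inspects the request; no server is contacted here
    request = build_invocation_request([[0.0, 1.0, 2.0]])
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Keeping request construction separate from the network call makes the payload easy to inspect before a server is available. For tabular models, the `dataframe_split` payload key may be a better fit than `inputs`.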
Automatic Logging with PyTorch
MLflow supports automatic logging for PyTorch Lightning:
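import mlflow
import pytorch_lightning as pl

# Enable autologging before creating the Trainer
mlflow.pytorch.autolog()

# Subsequent Trainer.fit calls log parameters, metrics, and the trained model automatically.
# model is assumed to be a LightningModule; train_loader and val_loader are DataLoaders.
trainer = pl.Trainer(max_epochs=10)
trainer.fit(model, train_loader, val_loader)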
Use Cases
- Enterprise Self-Hosting: When uploading data to third-party cloud services is not desirable
- ML Pipelines: Integration with orchestration tools such as Airflow and Kubeflow
- Model Governance: When model versioning, approval workflows, and stage transitions are required
- Multi-Framework Support: When teams work with multiple ML frameworks (PyTorch, TensorFlow, Scikit-learn, etc.)