Jeff Liu's AI Learning Notes

MLflow

MLflow is an open-source platform for managing the full machine learning lifecycle, originally developed by Databricks. It provides experiment tracking, model packaging, a model registry, and deployment tooling. Because it can be self-hosted, it is particularly well-suited for enterprise environments that need to keep data on their own infrastructure.

Core Components

MLflow consists of four main components:

Component	Function	Description
MLflow Tracking	Experiment Tracking	Logs parameters, metrics, code versions, and model artifacts
MLflow Projects	Project Packaging	Describes how to run ML code in a standardized format
MLflow Models	Model Packaging	Unified model format with support for multiple deployment targets
Model Registry	Model Registry	Model versioning, approval workflows, and lifecycle management

Basic Usage

Installation

pip install mlflow

Experiment Tracking

import mlflow
import mlflow.pytorch

# Assumed to be defined elsewhere: model, optimizer, train_loader, val_loader,
# train_one_epoch(), and evaluate()
num_epochs = 10  # example value

# Set the experiment name (the experiment is created if it does not exist)
mlflow.set_experiment("image-classification")

with mlflow.start_run(run_name="resnet50-baseline"):
    # Log hyperparameters
    mlflow.log_param("learning_rate", 1e-3)
    mlflow.log_param("batch_size", 32)
    mlflow.log_param("optimizer", "AdamW")

    for epoch in range(num_epochs):
        train_loss = train_one_epoch(model, train_loader, optimizer)
        val_loss, val_acc = evaluate(model, val_loader)

        # Log metrics once per epoch; step makes them plot as curves in the UI
        mlflow.log_metric("train_loss", train_loss, step=epoch)
        mlflow.log_metric("val_loss", val_loss, step=epoch)
        mlflow.log_metric("val_accuracy", val_acc, step=epoch)

    # Save the trained model as a run artifact
    mlflow.pytorch.log_model(model, "model")

    # Log an existing local file as an artifact
    mlflow.log_artifact("config.yaml")
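When hyperparameters live in a nested configuration, logging them one `log_param` call at a time gets tedious. The sketch below flattens a nested dict into dot-separated keys so it could be passed to `mlflow.log_params` in one call. The helper `flatten_config` and the sample `config` are hypothetical and not part of MLflow.

```python
from typing import Any


def flatten_config(config: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten a nested dict into dot-separated keys, e.g. 'optimizer.name'."""
    flat: dict[str, Any] = {}
    for key, value in config.items():
        full_key = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested sections, carrying the key path along
            flat.update(flatten_config(value, prefix=full_key))
        else:
            flat[full_key] = value
    return flat


# Hypothetical training configuration, mirroring the parameters logged above
config = {
    "learning_rate": 1e-3,
    "batch_size": 32,
    "optimizer": {"name": "AdamW", "weight_decay": 0.01},
}
params = flatten_config(config)

# Inside an active run, the flat dict can be logged in one call:
#     mlflow.log_params(params)
```

Flat keys keep each value in its own column of the MLflow UI, which makes runs easier to compare than a single serialized config string.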

Launching the UI

mlflow ui --port 5000
# Visit http://localhost:5000

Model Registry and Deployment

Model Registry

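import mlflow
from mlflow import MlflowClient

# Register the model logged by a finished run (replace <run_id> with the run's ID)
result = mlflow.register_model(
    model_uri="runs:/<run_id>/model",
    name="image-classifier",
)

# Model stage management
# Note: stages are deprecated as of MLflow 2.9 in favor of model version aliases
client = MlflowClient()
client.transition_model_version_stage(
    name="image-classifier",
    version=result.version,  # version number assigned at registration
    stage="Production",  # one of: None, Staging, Production, Archived
)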
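Recent MLflow releases deprecate stages in favor of model version aliases. As a rough sketch of the alias-based equivalent, the helper below points an alias at a version and returns the `models:/<name>@<alias>` URI that resolves to it. The function name `promote_with_alias` and the alias `champion` are assumptions chosen for illustration; `client` is expected to be an `MlflowClient` instance.

```python
def promote_with_alias(client, name: str, version: str, alias: str = "champion") -> str:
    """Point `alias` at the given model version and return a URI that resolves to it."""
    # MlflowClient.set_registered_model_alias(name, alias, version) reassigns the
    # alias if it already points at another version of the same model.
    client.set_registered_model_alias(name, alias, version)
    return f"models:/{name}@{alias}"


# Usage against a real registry (requires mlflow and a reachable tracking server):
#     from mlflow import MlflowClient
#     uri = promote_with_alias(MlflowClient(), "image-classifier", "1")
#     model = mlflow.pytorch.load_model(uri)
```

Passing the client in as an argument keeps the helper easy to exercise without a running registry.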

Model Deployment

# Load a registered model
import mlflow.pytorch

model = mlflow.pytorch.load_model("models:/image-classifier/Production")

# Serve the model over a local REST API (run this in a shell, not in Python)
# mlflow models serve -m "models:/image-classifier/Production" -p 5001
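Once the model is being served, predictions are requested by POSTing JSON to the server's `/invocations` endpoint. The sketch below builds such a request using only the standard library, assuming the MLflow 2.x payload format with an `inputs` key and the port 5001 used above. The helper `build_invocation_request` and the sample tensor are hypothetical; the real input shape depends on the model.

```python
import json
import urllib.request


def build_invocation_request(
    inputs, host: str = "http://localhost:5001"
) -> urllib.request.Request:
    """Build a POST request for the /invocations endpoint of a served model."""
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url=f"{host}/invocations",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical input: a batch with one 1x2x2 tensor; a real image model
# would expect its own input shape
request = build_invocation_request([[[[0.0, 0.1], [0.2, 0.3]]]])

# Sending the request requires the server to be running:
#     with urllib.request.urlopen(request) as response:
#         predictions = json.loads(response.read())
```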

Automatic Logging with PyTorch

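MLflow's PyTorch autologging hooks into PyTorch Lightning's Trainer; plain PyTorch training loops still need the manual logging calls shown earlier:

import mlflow
import mlflow.pytorch
import pytorch_lightning as pl  # or: import lightning.pytorch as pl

# Assumed to be defined elsewhere: model (a LightningModule), train_loader, val_loader
mlflow.pytorch.autolog()

# Any subsequent Trainer.fit() call automatically logs parameters, metrics, and the trained model
trainer = pl.Trainer(max_epochs=10)
trainer.fit(model, train_loader, val_loader)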

Use Cases

Enterprise Self-Hosting: When uploading data to third-party cloud services is not desirable
ML Pipelines: Integration with orchestration tools such as Airflow and Kubeflow
Model Governance: When model versioning, approval workflows, and stage transitions are required
Multi-Framework Support: When teams work with multiple ML frameworks (PyTorch, TensorFlow, Scikit-learn, etc.)
