
MLflow

MLflow is an open-source platform for managing the full machine learning lifecycle, originally developed by Databricks. It provides experiment tracking, model packaging, a model registry, and deployment tooling. Because the tracking server, registry, and artifact store can all be self-hosted, it is well suited to enterprise environments that must keep data on their own infrastructure.


Core Components

MLflow consists of four main components:

Component        | Function            | Description
MLflow Tracking  | Experiment Tracking | Logs parameters, metrics, code versions, and model artifacts
MLflow Projects  | Project Packaging   | Describes how to run ML code in a standardized format
MLflow Models    | Model Packaging     | Unified model format with support for multiple deployment targets
Model Registry   | Model Registry      | Model versioning, approval workflows, and lifecycle management

Basic Usage

Installation

pip install mlflow

Experiment Tracking

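import mlflow
import mlflow.pytorch

# Assumes model, optimizer, train_loader, val_loader, num_epochs,
# train_one_epoch, and evaluate are defined elsewhere in the training script.

# Set the experiment name (the experiment is created if it does not exist)
mlflow.set_experiment("image-classification")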

with mlflow.start_run(run_name="resnet50-baseline"):
    # Log hyperparameters
    mlflow.log_param("learning_rate", 1e-3)
    mlflow.log_param("batch_size", 32)
    mlflow.log_param("optimizer", "AdamW")

    for epoch in range(num_epochs):
        train_loss = train_one_epoch(model, train_loader, optimizer)
        val_loss, val_acc = evaluate(model, val_loader)

        # Log metrics
        mlflow.log_metric("train_loss", train_loss, step=epoch)
        mlflow.log_metric("val_loss", val_loss, step=epoch)
        mlflow.log_metric("val_accuracy", val_acc, step=epoch)

    # Log the trained model as an artifact of this run
    mlflow.pytorch.log_model(model, "model")

    # Log an additional file (here, the training config) as an artifact
    mlflow.log_artifact("config.yaml")
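When hyperparameters live in a nested configuration (such as the config.yaml logged above), logging them one call at a time gets tedious. The following is a minimal sketch, not part of the original example: `flatten_params` and `log_config` are hypothetical helper names, and the dotted key format is an assumption. `mlflow.log_params` is MLflow's batch counterpart to `mlflow.log_param`.

```python
from typing import Any, Dict


def flatten_params(config: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten a nested dict into dotted keys, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into nested sections, carrying the dotted prefix along
            flat.update(flatten_params(value, name))
        else:
            flat[name] = value
    return flat


def log_config(config: Dict[str, Any]) -> None:
    """Log every entry of a (possibly nested) config as run parameters."""
    import mlflow  # imported lazily so flatten_params works without MLflow installed

    # Must be called inside an active run, e.g. within `with mlflow.start_run():`
    mlflow.log_params(flatten_params(config))
```

Inside the `with mlflow.start_run(...)` block, a call such as `log_config({"optimizer": {"name": "AdamW", "lr": 1e-3}, "batch_size": 32})` would then replace the individual `log_param` calls.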

Launching the UI

mlflow ui --port 5000
# Visit http://localhost:5000
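In a self-hosted setup, training scripts also need to know where the tracking server lives. The sketch below assumes the server started by the command above is reachable at http://localhost:5000 and relies on the `MLFLOW_TRACKING_URI` environment variable, which MLflow reads when no tracking URI has been set in code. The helper name `configure_tracking` is hypothetical.

```python
import os


def configure_tracking(uri: str = "http://localhost:5000") -> str:
    """Point MLflow clients in this process at a tracking server."""
    # Strip a trailing slash so the same server is always written the same way
    normalized = uri.rstrip("/")
    # MLflow falls back to this variable when mlflow.set_tracking_uri() was not called
    os.environ["MLFLOW_TRACKING_URI"] = normalized
    return normalized
```

Calling `configure_tracking()` before the first `mlflow.start_run()` should be enough for runs to be sent to that server; `mlflow.set_tracking_uri(...)` is the in-code alternative.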

Model Registry and Deployment

Model Registry

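import mlflow
from mlflow import MlflowClient

# Register a model logged by a run (replace <run_id> with the actual run ID)
result = mlflow.register_model(
    model_uri="runs:/<run_id>/model",
    name="image-classifier"
)

# Model stage management
# Note: stages are deprecated as of MLflow 2.9 in favor of model version aliases.
client = MlflowClient()
client.transition_model_version_stage(
    name="image-classifier",
    version=result.version,  # the version number assigned at registration
    stage="Production"  # None, Staging, Production, Archived
)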
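Recent MLflow 2.x releases offer model version aliases as an alternative to stages. The sketch below shows one way the promotion step might look with aliases. The helper names `model_uri` and `promote` and the alias name "champion" are assumptions for illustration; `MlflowClient.set_registered_model_alias` and the `models:/<name>@<alias>` URI form are the MLflow pieces being used.

```python
from typing import Optional


def model_uri(name: str, *, alias: Optional[str] = None,
              version: Optional[int] = None) -> str:
    """Build a models:/ URI that targets either an alias or a fixed version."""
    if (alias is None) == (version is None):
        raise ValueError("pass exactly one of alias or version")
    if alias is not None:
        return f"models:/{name}@{alias}"
    return f"models:/{name}/{version}"


def promote(name: str, version: int, alias: str = "champion") -> str:
    """Point `alias` at `version` and return the URI that loads it."""
    from mlflow import MlflowClient  # lazy import; needs an MLflow release with alias support

    MlflowClient().set_registered_model_alias(name=name, alias=alias, version=str(version))
    return model_uri(name, alias=alias)
```

With this approach, consumers load `models:/image-classifier@champion` and never need to change when a new version is promoted.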

Model Deployment

# Load a registered model by stage for in-process inference
import mlflow.pytorch

model = mlflow.pytorch.load_model("models:/image-classifier/Production")

# Alternatively, serve the model as a local REST API (run this in a shell):
# mlflow models serve -m "models:/image-classifier/Production" -p 5001
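Once the model is served, clients send JSON to the server's `/invocations` endpoint. The following is a rough sketch using only the standard library, assuming the server from the command above listens on localhost:5001 and runs MLflow 2.x, where tensor input is passed under the `inputs` key. The function names are hypothetical, and the input must match whatever shape the model expects.

```python
import json
import urllib.request
from typing import Any

DEFAULT_URL = "http://localhost:5001/invocations"  # assumed host and port


def build_request(inputs: Any, url: str = DEFAULT_URL) -> urllib.request.Request:
    """Wrap nested-list tensor input in a JSON POST request for the scoring server."""
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(inputs: Any, url: str = DEFAULT_URL, timeout: float = 30.0) -> Any:
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(inputs, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Separating `build_request` from `predict` keeps the payload format easy to inspect without a running server. The exact structure of the response depends on the MLflow version and model flavor, so it is returned as-is.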

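Automatic Logging with PyTorch Lightning

MLflow's PyTorch autologging works with PyTorch Lightning models; plain PyTorch training loops still need explicit logging calls, as in the Experiment Tracking section:

import mlflow
import pytorch_lightning as pl

# Enable autologging before creating the Trainer
mlflow.pytorch.autolog()

# model is assumed to be a LightningModule; train_loader and val_loader are DataLoaders.
# Subsequent Trainer.fit calls log parameters, metrics, and the trained model automatically
trainer = pl.Trainer(max_epochs=10)
trainer.fit(model, train_loader, val_loader)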

Use Cases

  • Enterprise Self-Hosting: When uploading data to third-party cloud services is not desirable
  • ML Pipelines: Integration with orchestration tools such as Airflow and Kubeflow
  • Model Governance: When model versioning, approval workflows, and stage transitions are required
  • Multi-Framework Support: When teams work with multiple ML frameworks (PyTorch, TensorFlow, Scikit-learn, etc.)
