Weights & Biases (W&B)

Weights & Biases (commonly abbreviated as W&B or wandb) is an experiment tracking and visualization platform for machine learning teams. Beyond what TensorBoard provides, W&B adds cloud storage, team collaboration, and automated hyperparameter search.


Basic Usage

Installation and Setup

pip install wandb
wandb login  # enter your API key (available from wandb.ai)

Basic Integration

import wandb

# Initialize the experiment run
wandb.init(
    project="my-project",
    name="resnet50-baseline",
    config={
        "learning_rate": 1e-3,
        "batch_size": 32,
        "epochs": 100,
        "architecture": "ResNet-50",
        "optimizer": "AdamW",
    }
)

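# model, train_loader, val_loader, optimizer, and the two helper functions
# are assumed to be defined elsewhere.
for epoch in range(wandb.config.epochs):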
    train_loss = train_one_epoch(model, train_loader, optimizer)
    val_loss, val_acc = evaluate(model, val_loader)

    # Log metrics for this epoch
    wandb.log({
        "train_loss": train_loss,
        "val_loss": val_loss,
        "val_accuracy": val_acc,
        "lr": optimizer.param_groups[0]['lr'],
        "epoch": epoch,
    })

# Save the model checkpoint file to the run
wandb.save("model_best.pth")
wandb.finish()
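
The loop above depends on a model, data loaders, and helper functions that are not shown. One way to keep such a loop runnable without a model or a W&B login is to pass the logging function in as a parameter. The sketch below is only an illustration: `fit`, `train_step`, and `eval_step` are hypothetical names and not part of the wandb API.

```python
from typing import Callable, Dict, List

Metrics = Dict[str, float]

def fit(
    num_epochs: int,
    train_step: Callable[[int], float],
    eval_step: Callable[[int], float],
    log: Callable[[Metrics], None] = print,
) -> List[Metrics]:
    """Run a training loop and send one metrics dict per epoch to `log`."""
    history: List[Metrics] = []
    for epoch in range(num_epochs):
        metrics = {
            "epoch": epoch,
            "train_loss": train_step(epoch),
            "val_loss": eval_step(epoch),
        }
        log(metrics)  # pass log=wandb.log to send the metrics to W&B
        history.append(metrics)
    return history

# Stand-in steps so the sketch runs without a model or a W&B account.
records: List[Metrics] = []
fit(3, lambda e: 1.0 / (e + 1), lambda e: 1.2 / (e + 1), log=records.append)
```

In a real run, calling `wandb.init(...)` first and passing `log=wandb.log` should send the same dictionaries to the W&B dashboard.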

Core Features

Experiment Comparison

W&B automatically creates a Run for each call to wandb.init(). From the web interface, you can:

  • Compare training curves across different experiments side by side
  • Filter and sort experiments by hyperparameters
  • Create custom Dashboards

Hyperparameter Search (Sweep)

W&B Sweep provides automated hyperparameter search:

# Define the search space
sweep_config = {
    "method": "bayes",  # bayes, grid, random
    "metric": {"name": "val_accuracy", "goal": "maximize"},
    "parameters": {
        "learning_rate": {"min": 1e-5, "max": 1e-2, "distribution": "log_uniform_values"},
        "batch_size": {"values": [16, 32, 64, 128]},
        "weight_decay": {"min": 1e-5, "max": 1e-1, "distribution": "log_uniform_values"},
    }
}

sweep_id = wandb.sweep(sweep_config, project="my-project")

def train():
    wandb.init()
    config = wandb.config
    # Train using config.learning_rate, config.batch_size, etc.
    ...

wandb.agent(sweep_id, function=train, count=50)  # run 50 trials
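
The `log_uniform_values` distribution used in the sweep configuration draws values whose logarithm is uniformly distributed between `log(min)` and `log(max)`, so every decade of the range is equally likely. The standard-library sketch below illustrates the idea; `sample_log_uniform` is a hypothetical helper and not a description of how W&B implements its sampler.

```python
import math
import random

def sample_log_uniform(low: float, high: float, rng: random.Random) -> float:
    """Draw x between low and high such that log(x) is uniformly distributed."""
    if not 0 < low < high:
        raise ValueError("require 0 < low < high")
    return math.exp(rng.uniform(math.log(low), math.log(high)))

rng = random.Random(0)  # fixed seed so the sketch is reproducible
samples = [sample_log_uniform(1e-5, 1e-2, rng) for _ in range(10_000)]

# Fraction of samples in the lowest decade [1e-5, 1e-4). The range spans
# three decades, so roughly one third of the samples should land here.
low_decade = sum(s < 1e-4 for s in samples) / len(samples)
```

This is why a log-uniform range suits parameters such as the learning rate: a plain uniform draw over the same interval would place almost all samples in the top decade.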

Artifacts (Asset Management)

Artifacts are used for versioned management of datasets, models, and other files:

# Save the model as an artifact
artifact = wandb.Artifact('trained-model', type='model')
artifact.add_file('model_best.pth')
wandb.log_artifact(artifact)

# Load the artifact
artifact = wandb.use_artifact('trained-model:latest')
artifact_dir = artifact.download()
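
The `trained-model:latest` reference above relies on versioning: each logged artifact gets a version label (`v0`, `v1`, ...) and the `latest` alias points at the newest one. The toy model below sketches that behavior with content hashes, assuming that re-logging unchanged content does not create a new version. `ToyArtifactStore` is purely illustrative and is not how W&B stores artifacts.

```python
import hashlib
from typing import Dict, List

class ToyArtifactStore:
    """Toy model of versioned storage: v0, v1, ... plus a 'latest' alias."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[str]] = {}  # name -> content digests

    def log(self, name: str, content: bytes) -> str:
        """Store content and return its version label, e.g. 'v0'."""
        digest = hashlib.sha256(content).hexdigest()
        versions = self._versions.setdefault(name, [])
        # Unchanged content does not produce a new version.
        if not versions or versions[-1] != digest:
            versions.append(digest)
        return f"v{len(versions) - 1}"

    def resolve(self, ref: str) -> str:
        """Resolve 'name:latest' or 'name:vN' to a content digest."""
        name, _, alias = ref.partition(":")
        versions = self._versions[name]
        index = len(versions) - 1 if alias in ("", "latest") else int(alias[1:])
        return versions[index]

store = ToyArtifactStore()
first = store.log("trained-model", b"weights-epoch-10")
same = store.log("trained-model", b"weights-epoch-10")    # identical content
second = store.log("trained-model", b"weights-epoch-20")  # changed content
```

Pinning a specific version such as `trained-model:v0` instead of `latest` keeps downstream code reproducible when newer versions are logged.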

Tables and Visualization

# Log sample predictions
table = wandb.Table(columns=["image", "prediction", "ground_truth"])
for img, pred, gt in samples:
    table.add_data(wandb.Image(img), pred, gt)
wandb.log({"predictions": table})

Comparison with Other Tools

Feature               | TensorBoard     | W&B                    | MLflow
Experiment Tracking   | Local           | Cloud-based            | Local/Remote
Team Collaboration    | Not supported   | Native support         | Supported
Hyperparameter Search | Not supported   | Sweep (Bayesian, etc.) | Not supported (requires integration)
Model Registry        | Not supported   | Artifacts              | Model Registry
Free Tier             | Completely free | Free for individuals   | Open-source and free
Learning Curve        | Low             | Low                    | Moderate

Recommendations:

  • Personal projects / quick experiments -> TensorBoard (zero configuration)
  • Team collaboration / production projects -> W&B (most comprehensive feature set)
  • Self-hosted / MLOps Pipeline requirements -> MLflow
