Camera Calibration and Selection

Overview

Camera calibration is the process of determining a camera's intrinsic parameters and lens distortion coefficients. It is a prerequisite for all accurate visual measurements. This section covers the pinhole camera model, calibration methods, distortion correction, and guidelines for selecting the right camera for your application.

Pinhole Camera Model

Projection Equation

The pinhole model projects 3D world points onto a 2D image plane:

\[ s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K [R|t] \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} \]

Where:

  • \((u, v)\): Pixel coordinates
  • \((X, Y, Z)\): World coordinates
  • \(K\): Camera intrinsic matrix
  • \([R|t]\): Extrinsic matrix (rotation + translation)
  • \(s\): Projective scale factor, equal to the point's depth in the camera frame

Intrinsic Matrix

\[ K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \]

| Parameter | Meaning | Unit |
|---|---|---|
| \(f_x, f_y\) | Focal length in x/y direction | pixels |
| \(c_x, c_y\) | Principal point (intersection of optical axis and image plane) | pixels |

Converting focal length from mm to pixels:

\[ f_x = \frac{f_{\text{mm}} \times \text{width}_{\text{pixel}}}{\text{sensor\_width}_{\text{mm}}} \]

IMX219 Intrinsic Estimation

Focal length 3.04mm, sensor width 3.68mm, image width 1920 pixels:

\[f_x = \frac{3.04 \times 1920}{3.68} \approx 1586.1 \text{ pixels}\]
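
A one-line check of this conversion (the helper name is mine; the numbers are the IMX219 values quoted above, and a real calibration will differ somewhat, e.g. because cropped video modes change the effective sensor width):

def focal_mm_to_pixels(f_mm, sensor_width_mm, image_width_px):
    """Convert focal length from millimeters to pixels."""
    return f_mm * image_width_px / sensor_width_mm

print(focal_mm_to_pixels(3.04, 3.68, 1920))  # ~1586.1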

Extrinsic Matrix

\[ [R|t] = \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \end{bmatrix} \]
  • \(R\) (3x3 rotation matrix): Camera orientation relative to the world coordinate system
  • \(t\) (3x1 translation vector): Camera position in the world coordinate system

Complete Projection Process

\[ \begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} = [R|t] \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix} \quad \rightarrow \quad \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} f_x \frac{x'}{z'} + c_x \\ f_y \frac{y'}{z'} + c_y \end{bmatrix} \]
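
A minimal NumPy sketch of this two-step projection (extrinsic transform, then intrinsic projection); the K, R, t values below are illustrative placeholders, not calibration output:

import numpy as np

def project_point(K, R, t, X_w):
    """Project a 3D world point into pixel coordinates via the pinhole model."""
    X_c = R @ X_w + t                            # world frame -> camera frame
    x, y = X_c[0] / X_c[2], X_c[1] / X_c[2]      # perspective division
    u = K[0, 0] * x + K[0, 2]
    v = K[1, 1] * y + K[1, 2]
    return u, v

K = np.array([[1586.1, 0, 960], [0, 1586.1, 540], [0, 0, 1]])  # illustrative intrinsics
R, t = np.eye(3), np.array([0.0, 0.0, 0.0])                    # camera at world origin
print(project_point(K, R, t, np.array([0.1, 0.0, 1.0])))       # point 1m ahead, 0.1m right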

Lens Distortion

Distortion Types

Real lenses introduce distortion, mainly of two types:

Radial Distortion:

\[ \begin{aligned} x_{\text{distorted}} &= x(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) \\ y_{\text{distorted}} &= y(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) \end{aligned} \]

Where \(r^2 = x^2 + y^2\), and \((x, y)\) are normalized image coordinates.

Tangential Distortion:

\[ \begin{aligned} x_{\text{distorted}} &= x + 2p_1 xy + p_2(r^2 + 2x^2) \\ y_{\text{distorted}} &= y + p_1(r^2 + 2y^2) + 2p_2 xy \end{aligned} \]
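
In practice both models are applied together on normalized coordinates. A small sketch of the combined forward model (the function name and coefficient values are illustrative):

def distort_normalized(x, y, k1, k2, k3, p1, p2):
    """Apply radial + tangential distortion to normalized image coordinates."""
    r2 = x**2 + y**2
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_d = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x**2)
    y_d = y * radial + p1 * (r2 + 2 * y**2) + 2 * p2 * x * y
    return x_d, y_d

# Illustrative coefficients: mild barrel distortion
print(distort_normalized(0.5, 0.3, k1=-0.1, k2=0.01, k3=0.0, p1=0.001, p2=0.001))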

Distortion Coefficients

OpenCV uses 5 (or more) distortion coefficients:

\[ D = [k_1, k_2, p_1, p_2, k_3] \]

| Coefficient | Type | Effect |
|---|---|---|
| \(k_1\) | Radial | Primary distortion term (most important) |
| \(k_2\) | Radial | Secondary correction |
| \(k_3\) | Radial | Higher-order correction (needed for wide-angle lenses) |
| \(p_1\) | Tangential | Lens not parallel to sensor |
| \(p_2\) | Tangential | Lens not parallel to sensor |

Distortion Visualization

Barrel distortion (\(k_1 < 0\)) bows straight lines outward from the image center, so a square grid bulges like a barrel; it is common in wide-angle lenses. Pincushion distortion (\(k_1 > 0\)) bows straight lines inward toward the center, pinching the grid at its edges; it appears occasionally in telephoto lenses.

Zhang's Calibration Method

Principle

Zhang's method uses a planar calibration board (e.g., checkerboard) to estimate camera intrinsics and distortion coefficients from multiple photos taken at different angles.

Basic idea:

  1. Corner world coordinates on the checkerboard are known (\(Z=0\) plane)
  2. Detect corner pixel coordinates in images
  3. Each image provides a homography matrix \(H\)
  4. Solve for intrinsic \(K\) and distortion coefficients from multiple \(H\) matrices

Calibration Steps

import cv2
import numpy as np
import glob

# 1. Prepare checkerboard parameters
CHECKERBOARD = (9, 6)  # Number of inner corners
SQUARE_SIZE = 0.025    # Square size (meters)

# 3D world coordinates
objp = np.zeros((CHECKERBOARD[0]*CHECKERBOARD[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:CHECKERBOARD[0], 0:CHECKERBOARD[1]].T.reshape(-1, 2)
objp *= SQUARE_SIZE

obj_points = []  # 3D points
img_points = []  # 2D points

# 2. Detect corners
images = glob.glob('calibration_images/*.jpg')
for fname in images:
    img = cv2.imread(fname)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    ret, corners = cv2.findChessboardCorners(
        gray, CHECKERBOARD,
        flags=cv2.CALIB_CB_ADAPTIVE_THRESH | cv2.CALIB_CB_NORMALIZE_IMAGE
    )

    if ret:
        # Sub-pixel refinement
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
        )
        obj_points.append(objp)
        img_points.append(corners)

# 3. Calibrate (image size is (width, height); gray still holds the last image read)
ret, K, D, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None
)

print(f"Intrinsic matrix K:\n{K}")
print(f"Distortion coefficients D: {D.ravel()}")
print(f"Reprojection error: {ret:.4f} pixels")

# 4. Save calibration results
np.savez('calibration.npz', K=K, D=D)

Calibration Quality Assessment

Reprojection Error:

\[ \text{RMS} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \|\mathbf{p}_i - \hat{\mathbf{p}}_i\|^2} \]

| Reprojection Error | Quality |
|---|---|
| < 0.3 pixel | Excellent |
| 0.3 - 0.5 pixel | Good |
| 0.5 - 1.0 pixel | Acceptable |
| > 1.0 pixel | Needs recalibration |
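
The RMS value returned by cv2.calibrateCamera aggregates all views. A per-image breakdown helps spot individual bad captures; this sketch reuses the variables from the calibration script above:

total_err = 0
for i in range(len(obj_points)):
    # Re-project the board corners using the estimated intrinsics and per-view pose
    proj, _ = cv2.projectPoints(obj_points[i], rvecs[i], tvecs[i], K, D)
    err = cv2.norm(img_points[i], proj, cv2.NORM_L2) / np.sqrt(len(proj))  # per-point RMS
    print(f"Image {i}: {err:.4f} px")
    total_err += err
print(f"Mean per-view RMS error: {total_err / len(obj_points):.4f} px")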

Calibration Best Practices

Tips for Improving Calibration Accuracy

  1. At least 20 images, taken from different angles and distances
  2. The checkerboard should cover all areas of the image (including edges and corners)
  3. Include multiple tilt angles (not just facing the camera directly)
  4. Use a flat calibration board (print and attach to a flat surface)
  5. Ensure images are sharp (no motion blur)
  6. The calibration board should occupy 20-80% of the image area

Undistortion

Image Undistortion

# Method 1: Direct undistortion (simple, but the full computation runs per frame)
undistorted = cv2.undistort(img, K, D)

# Method 2: Pre-computed mapping tables (recommended; computed once)
h, w = img.shape[:2]
new_K, roi = cv2.getOptimalNewCameraMatrix(K, D, (w, h), alpha=0)  # alpha=0: keep only valid pixels
map1, map2 = cv2.initUndistortRectifyMap(K, D, None, new_K, (w, h), cv2.CV_16SC2)

# Apply the maps to each frame (cheap per-frame remap)
undistorted = cv2.remap(img, map1, map2, cv2.INTER_LINEAR)

Point Undistortion

# Convert distorted image points to undistorted coordinates
points_distorted = np.array([[100, 200], [300, 400]], dtype=np.float32)
points_distorted = points_distorted.reshape(-1, 1, 2)

points_undistorted = cv2.undistortPoints(
    points_distorted, K, D, P=new_K
)  # With P=new_K the output is in pixel coordinates; omit P for normalized coordinates

Field of View (FOV) Calculation

Computing FOV from Intrinsics

\[ \text{FOV}_H = 2 \arctan\left(\frac{w}{2f_x}\right), \quad \text{FOV}_V = 2 \arctan\left(\frac{h}{2f_y}\right) \]
import numpy as np

def compute_fov(K, image_size):
    fx, fy = K[0, 0], K[1, 1]
    w, h = image_size

    fov_h = 2 * np.degrees(np.arctan(w / (2 * fx)))
    fov_v = 2 * np.degrees(np.arctan(h / (2 * fy)))
    # Diagonal FOV: approximation using the geometric mean of fx and fy
    fov_d = 2 * np.degrees(np.arctan(
        np.sqrt(w**2 + h**2) / (2 * np.sqrt(fx * fy))
    ))

    return fov_h, fov_v, fov_d

fov_h, fov_v, fov_d = compute_fov(K, (1920, 1080))
print(f"Horizontal FOV: {fov_h:.1f} deg, Vertical FOV: {fov_v:.1f} deg, Diagonal FOV: {fov_d:.1f} deg")

FOV and Coverage Range

Coverage at distance \(d\):

\[ W = 2d \tan\left(\frac{\text{FOV}_H}{2}\right), \quad H = 2d \tan\left(\frac{\text{FOV}_V}{2}\right) \]

| FOV | Coverage at 1 m | Coverage at 3 m | Coverage at 5 m |
|---|---|---|---|
| 60 degrees | 1.15 m | 3.46 m | 5.77 m |
| 90 degrees | 2.00 m | 6.00 m | 10.0 m |
| 120 degrees | 3.46 m | 10.4 m | 17.3 m |
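
A small helper for the coverage formula (the function name is mine):

import numpy as np

def coverage_at_distance(fov_h_deg, fov_v_deg, d):
    """Width and height of the imaged area at distance d (same units as d)."""
    W = 2 * d * np.tan(np.radians(fov_h_deg) / 2)
    H = 2 * d * np.tan(np.radians(fov_v_deg) / 2)
    return W, H

print(coverage_at_distance(90, 60, 3.0))  # ~(6.00, 3.46) meters at 3 m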

Stereo Calibration

# Stereo calibration (img_points_left / img_points_right are per-camera corner
# detections of the same board views; each camera's K/D comes from prior mono calibration)
ret, K_left, D_left, K_right, D_right, R, T, E, F = cv2.stereoCalibrate(
    obj_points, img_points_left, img_points_right,
    K_left, D_left, K_right, D_right,
    gray.shape[::-1],
    flags=cv2.CALIB_FIX_INTRINSIC  # Fix already-calibrated intrinsics; solve only R, T, E, F
)

print(f"Baseline length: {np.linalg.norm(T):.4f}m")
print(f"Stereo calibration error: {ret:.4f} pixels")

Hand-Eye Calibration

Robot hand-eye calibration solves for the transform \(T_{CE}\) between the camera and the robot end-effector:

\[ A_i X = X B_i \]

Where \(A_i\) is the relative motion of the robot end-effector, \(B_i\) is the relative motion observed by the camera, and \(X = T_{CE}\).

# OpenCV hand-eye calibration (eye-in-hand configuration: camera on the end-effector)
R_target2cam = rvecs    # Board-to-camera rotations, one per view (from cv2.calibrateCamera)
t_target2cam = tvecs    # Board-to-camera translations
R_gripper2base = [...]  # End-effector-to-base rotations (from the robot controller)
t_gripper2base = [...]  # End-effector-to-base translations

R_cam2gripper, t_cam2gripper = cv2.calibrateHandEye(
    R_gripper2base, t_gripper2base,
    R_target2cam, t_target2cam,
    method=cv2.CALIB_HAND_EYE_TSAI
)

Selection Guide

Selection by Application Scenario

| Scenario | Key Requirement | Recommended Camera | Reason |
|---|---|---|---|
| Indoor SLAM | Depth + IMU | RealSense D435i | Active stereo + IMU |
| Outdoor SLAM | Waterproof, outdoor depth | ZED 2i | IP66 + passive stereo |
| Robotic Arm Grasping | Close-range accuracy | RealSense D405 | 7 cm minimum range, high accuracy |
| High-Speed VIO | Global shutter + IMU | OV9281 + IMU | Global shutter, no motion distortion |
| Object Detection | High-resolution RGB | IMX477 + lens | 12.3 MP, interchangeable lenses |
| Night Vision | Low-light performance | IMX462 | Starlight-grade CMOS |
| Omnidirectional Perception | 360-degree coverage | 4x fisheye solution | No blind spots |
| High-Speed Scenarios | Microsecond temporal resolution | DAVIS346 | Event camera |
| Education Prototypes | Low cost | RPi Camera V2 | $25 entry |

Selection Decision Flow

graph TD
    A[Vision Task] --> B{Budget?}
    B -->|<$50| C[RPi Camera V2/V3]
    B -->|$50-200| D{Need depth?}
    B -->|$200-500| E{Special requirements?}
    B -->|>$500| F[Industrial camera / Event camera]

    D -->|No| G[IMX477 + lens]
    D -->|Yes| H[OAK-D Lite]

    E -->|Outdoor waterproof| I[ZED 2i]
    E -->|Close-range accuracy| J[RealSense D405]
    E -->|General purpose| K[RealSense D435i]
    E -->|Long range| L[RealSense D455]

Key Parameter Checklist

Parameters to confirm during selection:

| Parameter | Question |
|---|---|
| Resolution | How fine a detection granularity is needed? |
| Frame Rate | Robot speed and control-frequency requirements? |
| FOV | How large an area needs to be covered? |
| Depth Range | Minimum and maximum working distances? |
| Depth Accuracy | Positioning/grasping accuracy requirements? |
| Interface | CSI (low latency) or USB (flexible)? |
| Shutter | Is a global shutter needed for high-speed motion? |
| Environment | Indoor/outdoor/waterproof/temperature range? |
| Power | Power budget, especially if battery-powered? |
| Size & Weight | Installation space constraints? |
| Software Ecosystem | ROS2 drivers, SDK maturity? |

See: Calibration and System Integration for more on calibration workflows in actual deployment.

Online and Self-Calibration

Why Online Calibration Is Needed

  • Camera parameters may change with temperature
  • Physical impacts may alter mounting pose
  • Lenses and structures may loosen over long-term operation

Self-Calibration Methods

| Method | Principle | Accuracy | Suitable For |
|---|---|---|---|
| Motion self-calibration | Recover intrinsics from image sequences | Medium | No calibration board available |
| Vanishing-point calibration | Use parallel lines in the scene | Medium | Structured environments |
| Marker-based calibration | AprilTag/ArUco markers | High | When markers can be placed |
| Continuous optimization | Joint intrinsic optimization in SLAM | High | Long-term operation |

# ArUco marker detection and pose estimation (OpenCV >= 4.7 ArucoDetector API)
import cv2

marker_size = 0.05  # Marker side length in meters (set to your printed size)

aruco_dict = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
parameters = cv2.aruco.DetectorParameters()
detector = cv2.aruco.ArucoDetector(aruco_dict, parameters)

corners, ids, rejected = detector.detectMarkers(image)  # image: input camera frame

if ids is not None:
    # Estimate marker pose
    rvecs, tvecs, _ = cv2.aruco.estimatePoseSingleMarkers(
        corners, marker_size, K, D
    )
    for rvec, tvec in zip(rvecs, tvecs):
        cv2.drawFrameAxes(image, K, D, rvec, tvec, marker_size/2)

Summary

  1. The pinhole model is the basic mathematical framework for camera projection
  2. Zhang's calibration method is the most commonly used, requiring 20+ multi-angle images
  3. Reprojection error <0.5px is the benchmark for good calibration
  4. Pre-computed mapping tables are best practice for real-time undistortion
  5. Camera selection should weigh resolution, frame rate, FOV, depth range and accuracy, interface, and operating environment together
  6. Hand-eye calibration is a necessary step for robotic arm vision

References

  • Zhang, Z. "A Flexible New Technique for Camera Calibration." IEEE TPAMI, 2000.
  • Hartley, R., & Zisserman, A. Multiple View Geometry in Computer Vision. Cambridge University Press, 2004.
  • OpenCV Camera Calibration Tutorial: https://docs.opencv.org/
  • Tsai, R. Y., & Lenz, R. K. "A New Technique for Fully Autonomous and Efficient 3D Robotics Hand/Eye Calibration." IEEE Transactions on Robotics and Automation, 1989.
