Project Initialization
Setting up a new deep learning project often runs into a variety of issues: CUDA acceleration silently falling back to CPU, dependency conflicts, and so on. This note covers the preparations needed when starting a new project and the pitfalls to watch out for.
Dependency Management
Although deep learning training is primarily done in Python, it differs significantly from typical Python programs. The CUDA runtime libraries required for training are system-level binary dependencies, not Python packages. For this reason, we typically adopt a two-tier structure using conda + pip: conda manages low-level dependencies, while pip handles Python packages.
System-Level Drivers
First, ensure that the system-level drivers are properly installed. You can check the NVIDIA driver installation status using the nvidia-smi command.
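Beyond eyeballing the nvidia-smi table, the driver version can be read programmatically. A minimal sketch (assumes nvidia-smi is on the PATH; the --query-gpu/--format flags are standard nvidia-smi options):

```python
import shutil
import subprocess

def driver_version():
    """Return the NVIDIA driver version reported by nvidia-smi, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None  # the CLI (and likely the driver) is not installed
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    # One line per GPU; take the first
    return out.stdout.strip().splitlines()[0]

print(driver_version())
```

If this prints None, fix the driver installation before touching conda or pip; nothing downstream will see the GPU without it.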
Conda Dependencies
We generally use conda to create the virtual environment, and then install PyTorch via pip.
Conda dependencies are typically specified in an environment.yml file. (The following is for reference only — this particular version still had numerous issues after installation.)
name: visinject
channels:
- conda-forge
- defaults
dependencies:
- python=3.10
- pip=23.3.1
# Scientific stack
- numpy=1.24.3
- scipy=1.11.4
- pillow=10.0.1
- matplotlib=3.8.0
- scikit-image=0.22.0
- tqdm=4.66.1
Install using the following command:
conda env create -f environment.yml
You can also install the conda environment inside the project directory. For example:
# Navigate to your project directory in the terminal
cd D:\Projects\MyNewAI
# Use -p to specify the path (./env refers to the env folder under the current directory)
conda create -p ./env python=3.11
# Activate the local env environment
conda activate ./env
This way, all dependencies are installed inside the env folder. This is convenient for developers who need to conserve disk space on the C drive.
pip Dependency Installation
pip installs packages from PyPI. Prefer conda for low-level dependencies where it packages them, and use pip for everything else, including packages that conda does not provide. (PyTorch is a common exception: its pip wheels from the CUDA index bundle the CUDA runtime, which is why it is installed via pip below.)
pip dependencies are typically listed in a requirements.txt file. (The following is for reference only — this version also had numerous issues after installation.)
--extra-index-url https://download.pytorch.org/whl/cu124
# PyTorch + CUDA 12.4 (GPU version, supports RTX 4090)
torch
torchvision
torchaudio
# Hugging Face & ML libraries
transformers>=4.37.0
transformers_stream_generator>=0.0.4
accelerate>=0.25.0
tiktoken>=0.5.2
einops>=0.7.0
sentencepiece>=0.1.99
protobuf>=3.20.0,<5
Install pip dependencies with:
pip install -r requirements.txt
If a conda environment is activated during installation, the packages will automatically be installed into that conda environment.
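A quick way to confirm from inside Python which environment pip will actually target (a sketch using only the standard library; CONDA_PREFIX is set by conda activate):

```python
import os
import sys

def env_info():
    """Show which environment the current interpreter (and hence `pip install`) targets."""
    return {
        "python": sys.executable,  # interpreter path; should live under the conda env
        "prefix": sys.prefix,      # install root pip uses for this interpreter
        "conda_prefix": os.environ.get("CONDA_PREFIX"),  # set by `conda activate`
    }

for key, value in env_info().items():
    print(f"{key}: {value}")
```

If prefix does not sit under conda_prefix, you are running a different interpreter than the activated environment, which is a common cause of "installed but can't import".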
Verifying the Installation
To verify the installation:
python -c "import torch; print(torch.__version__)"
python -c "import torch; print(torch.cuda.is_available())"
python -c "import torch; print(torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'N/A')"
Expected output (the raw values printed by the three commands above; exact version and device strings depend on your install, and the CUDA tag in the version should match the wheel index you used):
2.1.0+cu118
True
NVIDIA GeForce RTX 4090 Laptop GPU
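The one-liners above only check availability; to catch the silent CPU fallback mentioned at the start, it helps to actually run a small operation on the GPU. A minimal sketch (uses only standard PyTorch calls and degrades gracefully if torch is missing):

```python
import importlib.util

def cuda_report():
    """Return a short report on PyTorch/CUDA status, exercising the GPU if present."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch
    lines = [f"torch {torch.__version__}",
             f"cuda available: {torch.cuda.is_available()}"]
    if torch.cuda.is_available():
        # Run a tiny op on the device to confirm the GPU actually executes work
        x = torch.ones(2, 2, device="cuda")
        lines.append(f"device: {torch.cuda.get_device_name(0)}; "
                     f"sum check: {x.sum().item()}")
    return "\n".join(lines)

print(cuda_report())
```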
Saving the Environment
Since configuring environments can be quite tedious, the best practice is to export a working environment for future use:
conda env export --no-builds > environment.yml
Additional Notes
Hugging Face models can be very large. To conserve disk space on the C drive, you can store downloaded model weight files on the D drive instead.
Setting the Hugging Face cache location via the HF_HOME environment variable (Windows PowerShell):
# Set a user-level environment variable
[System.Environment]::SetEnvironmentVariable('HF_HOME', 'D:\hugging_face', 'User')
# Or via System Settings:
# 1. Win + R, type sysdm.cpl
# 2. Advanced → Environment Variables
# 3. Under "User variables", create a new entry:
# Variable name: HF_HOME
# Variable value: D:\hugging_face
Verification:
# Reload the environment variable
$env:HF_HOME = [System.Environment]::GetEnvironmentVariable('HF_HOME', 'User')
# Verify
echo $env:HF_HOME
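How the cache location is resolved can be sketched in Python (simplified; huggingface_hub also honors more specific overrides such as HF_HUB_CACHE, so check its documentation for the full precedence):

```python
import os
from pathlib import Path

def hf_home():
    """Resolve the Hugging Face base directory the way the libraries do (simplified)."""
    override = os.environ.get("HF_HOME")
    if override:
        return Path(override)
    # Default when HF_HOME is unset: ~/.cache/huggingface
    return Path.home() / ".cache" / "huggingface"

print(hf_home())
```

Note that the variable is read when the process starts, so terminals (and IDEs) opened before setting it will still use the old location.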