# Docker Development Environment
This guide explains how to use Docker for development and deployment in the Aether project. Our Docker setup provides a consistent environment with CUDA support, development tools, and the Rye package manager.
## Container Architecture

### Base Image
We use `nvidia/cuda:11.7.1-cudnn8-devel-ubuntu22.04` as our base image for:
- CUDA 11.7.1 and cuDNN 8 support
- Development tools compatibility
- Ubuntu 22.04 LTS stability
### Key Components

1. **Environment Configuration**

2. **Development Tools**
    - Git for version control
    - Build essentials for compilation
    - Rye package manager for Python dependencies

3. **Python Environment**
    - Managed by the Rye package manager
    - Dependencies declared in `pyproject.toml`
    - Locked versions in `requirements.lock`
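The components above could be sketched as a Dockerfile. This is an illustrative outline, not the project's actual Dockerfile; the package list and the Rye installation step are assumptions:

```dockerfile
# Base image with CUDA 11.7.1 and cuDNN 8 on Ubuntu 22.04
FROM nvidia/cuda:11.7.1-cudnn8-devel-ubuntu22.04

# Development tools: Git and build essentials
RUN apt-get update && apt-get install -y --no-install-recommends \
        git build-essential curl \
    && rm -rf /var/lib/apt/lists/*

# Rye package manager for Python dependencies
RUN curl -sSf https://rye.astral.sh/get | RYE_INSTALL_OPTION="--yes" bash
ENV PATH="/root/.rye/shims:${PATH}"

# Sync the locked Python environment
WORKDIR /workspace
COPY pyproject.toml requirements.lock ./
RUN rye sync
```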
## Building Images

### Using the Build Script

The `build.sh` script provides a flexible way to build and manage Docker images:
```bash
# Basic build
./docker/build.sh

# Build with custom name and tag
./docker/build.sh --name myproject --tag v1.0

# Build and convert to Singularity
./docker/build.sh --singularity

# Build and push to registry
./docker/build.sh --push --registry your.registry.com/username
```
### Build Script Options

| Option | Description | Default |
|---|---|---|
| `--name` | Image name | `"projection"` |
| `--tag` | Image tag | `"latest"` |
| `--singularity` | Convert to Singularity | `false` |
| `--push` | Push to registry | `false` |
| `--registry` | Registry URL | `""` |
## Development Workflow

### Local Development

1. **Build the Image**

2. **Run Development Container**

3. **Working with the Container**
    - Code changes on the host are reflected in the container
    - The Python environment is pre-configured
    - GPU support is enabled
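Steps 1 and 2 might look like the following. The image name `projection` matches the build script's default; the volume mount and GPU flags are illustrative assumptions rather than the project's exact invocation:

```shell
# 1. Build the image with the script's defaults (name "projection", tag "latest")
./docker/build.sh

# 2. Run a development container: mount the project at /workspace,
#    enable all GPUs, and open an interactive shell
docker run --rm -it \
    --gpus all \
    -v "$(pwd)":/workspace \
    -w /workspace \
    projection:latest \
    bash
```

With the project directory bind-mounted, edits made on the host are immediately visible inside the container.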
### Best Practices

1. **Volume Mounting**
    - Mount your project directory as `/workspace`
    - Consider mounting data directories separately
    - Use named volumes for persistent storage

2. **Environment Variables**

3. **Resource Management**
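A single `docker run` invocation can combine these practices: a named volume for persistent data, an environment variable passed through, and CPU/memory limits. The volume name, mount points, and limits below are illustrative assumptions:

```shell
# Named data volume, environment variable, and resource limits
docker run --rm -it \
    --gpus all \
    -v "$(pwd)":/workspace \
    -v aether_data:/data \
    -e CUDA_VISIBLE_DEVICES=0 \
    --cpus 4 \
    --memory 16g \
    projection:latest \
    bash
```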
## Cluster Deployment

### Converting to Singularity

1. **Build and Convert**

2. **Transfer to Cluster**

3. **Running on Cluster**
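The three steps above might look like this; the cluster hostname and destination path are placeholders:

```shell
# 1. Build the Docker image and convert it to a Singularity image
./docker/build.sh --singularity

# 2. Copy the resulting .sif file to the cluster
scp projection_latest.sif user@cluster.example.org:/home/user/

# 3. Run it on the cluster with GPU support (--nv)
singularity run --nv projection_latest.sif python train.py
```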
### SLURM Integration
For SLURM-based clusters, create job scripts like:
```bash
#!/bin/bash
#SBATCH --job-name=aether_training
#SBATCH --gpus=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G

singularity run --nv \
    -B /path/to/data:/data \
    projection_latest.sif \
    python train.py
```
## Troubleshooting

### Common Issues

1. **CUDA/GPU Access**
    - Ensure NVIDIA drivers are installed
    - Use `nvidia-smi` to verify GPU access
    - Check the `--gpus` flag in `docker run`

2. **Volume Permissions**

3. **Package Installation**
    - Update `pyproject.toml` and rebuild
    - Use `rye add package_name` inside the container
    - Check Rye environment activation
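For volume-permission problems, one common fix is to run the container with your host UID/GID so that files created in mounted volumes remain owned by you. This is a general Docker technique, not a project-specific command:

```shell
# Run as the current host user so files written to /workspace
# are not owned by root
docker run --rm -it \
    --user "$(id -u):$(id -g)" \
    -v "$(pwd)":/workspace \
    projection:latest \
    bash
```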
### Debugging Tips

1. **Container Inspection**

2. **Build Issues**
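A few standard Docker commands cover both tips. These are generic Docker CLI invocations; the container ID is a placeholder, and the Dockerfile path is an assumption:

```shell
# Container inspection
docker ps                              # list running containers
docker exec -it <container-id> bash    # open a shell inside one
docker logs <container-id>             # view its output

# Build issues: show full build output and bypass the cache
docker build --progress=plain --no-cache -f docker/Dockerfile .
```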