GROMACS Simulation on Bare Metal GPUs
1. Overview
This document outlines the end-to-end procedure for deploying and benchmarking GROMACS 2023.2 on an NVIDIA GPU cluster using Docker and the NVIDIA Container Toolkit. It provides a reproducible environment setup for GPU performance benchmarking, ensuring consistent results across nodes and configurations.
Objectives
- Deploy Docker and NVIDIA runtimes for GPU compute workloads
- Pull and run the official NGC GROMACS container
- Benchmark molecular dynamics simulations using benchPEP dataset
- Measure throughput (ns/day) and optimize performance parameters
System Architecture
The setup runs entirely in containers for reproducibility:
- Host OS: Ubuntu 24.04 LTS
- GPU Runtime: NVIDIA CUDA 12.x
- Container Platform: Docker Engine + NVIDIA Container Toolkit
- Benchmark Container: nvcr.io/hpc/gromacs:2023.2
2. Prerequisites
| Component | Description |
|---|---|
| OS | Ubuntu 24.04 LTS |
| GPU | NVIDIA data-center GPUs (this guide assumes 8× H100 per node) |
| Driver | 550+ (CUDA 12.x compatible) |
| Docker | 27.x or newer |
| NVIDIA Toolkit | libnvidia-container ≥ 1.15 |
| Internet Access | Required to pull NGC image and benchmark dataset |
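Before installing anything, it is worth confirming the driver prerequisite programmatically. The helper below is a sketch (the function name `driver_ok` is my own); it parses a version string in the form `nvidia-smi` reports and checks it against the 550 minimum.

```shell
# Hypothetical prerequisite check: does the NVIDIA driver meet the 550+
# requirement? Takes a version string like "550.54.15".
driver_ok() {
  major=${1%%.*}                       # "550.54.15" -> "550"
  [ "$major" -ge 550 ] 2>/dev/null && echo yes || echo no
}

# On a real host, feed it the live driver version:
# driver_ok "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)"
```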
3. Install Docker Engine
# Remove old packages
for pkg in docker.io docker-doc docker-compose docker-compose-v2 podman-docker containerd runc; do
  sudo apt-get remove -y $pkg || true
done
# Base utilities
sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg
# Docker repo & key
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | \
sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# Engine + CLI + Compose
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
# Add your user to the Docker group
sudo usermod -aG docker $USER
# (Log out and back in for the changes to apply)
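A quick post-install sanity check can catch an outdated Docker before it bites later. This is a sketch (the function names are mine, not part of Docker); it parses the `docker --version` banner and verifies the 27.x prerequisite from the table above.

```shell
# Hypothetical helper: extract the Docker major version from the
# "docker --version" banner and compare it to the 27.x prerequisite.
docker_major() {
  # $1 = e.g. "Docker version 27.3.1, build ce12230"
  echo "$1" | sed -n 's/^Docker version \([0-9]*\)\..*/\1/p'
}

check_docker() {
  ver=$(docker_major "$1")
  if [ -n "$ver" ] && [ "$ver" -ge 27 ]; then
    echo "OK (major $ver)"
  else
    echo "Docker 27.x or newer required"
  fi
}

# On a real host:
# check_docker "$(docker --version)"
```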
4. Install NVIDIA Container Toolkit
# Add NVIDIA GPG key and repo
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
# Configure and restart Docker
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
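`nvidia-ctk runtime configure` works by writing an `nvidia` runtime entry into `/etc/docker/daemon.json`, so a simple grep confirms the step took effect. The helper below is a sketch (the function name is mine):

```shell
# Hypothetical check: did nvidia-ctk register the "nvidia" runtime in
# Docker's daemon config?
has_nvidia_runtime() {
  # $1 = path to daemon.json (normally /etc/docker/daemon.json)
  grep -q '"nvidia"' "$1" && echo configured || echo missing
}

# On a real host:
# has_nvidia_runtime /etc/docker/daemon.json
```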
5. Validate GPU Visibility
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
Expected output: all eight H100 GPUs on the node are listed.
6. Login to NGC and Pull GROMACS Container
docker login nvcr.io
# Username: $oauthtoken
# Password: <your NGC API key>
docker pull nvcr.io/hpc/gromacs:2023.2
7. Prepare Benchmark Directory
mkdir -p ~/gmxbench && cd ~/gmxbench
wget -O benchPEP.zip https://www.mpinat.mpg.de/benchPEP.zip
unzip benchPEP.zip
ls -lh benchPEP.tpr
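A truncated download or failed unzip is easiest to catch here, before wasting a GPU allocation. The helper below is a sketch (the function name is mine); it simply verifies the input file exists and is non-empty, which is the failure mode the troubleshooting table later calls "benchmark fails early".

```shell
# Hypothetical integrity check: benchPEP.tpr is a large binary input, so
# an empty or missing file means the download or unzip failed.
check_input() {
  [ -s "$1" ] && echo ok || echo "missing or empty: $1"
}

# check_input ~/gmxbench/benchPEP.tpr
```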
8. Launch Container with GPUs Mounted
docker run --rm -it --gpus all -v ~/gmxbench:/bench nvcr.io/hpc/gromacs:2023.2 bash
Inside the container:
cd /bench
export GMX_ENABLE_DIRECT_GPU_COMM=1
export OMP_NUM_THREADS=8 # Adjust per CPU
gmx mdrun -s benchPEP.tpr \
-deffnm run1 \
-nsteps 500000 \
-nb gpu -bonded gpu -pme gpu \
-update cpu -pin on -ntmpi 8 -npme 1 -nstlist 200
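Rather than hard-coding `OMP_NUM_THREADS=8`, the per-rank thread count can be derived from the host CPU count, as the tuning section later suggests. The wrapper below is a sketch (the script and function names are mine); the mdrun line is commented out because it only makes sense inside the container with GPUs attached.

```shell
# Hypothetical wrapper (run_bench.sh): derive threads per rank from the
# host CPU count, assuming one thread-MPI rank per GPU (8 on this node).
threads_per_rank() {
  # $1 = total CPU threads, $2 = number of thread-MPI ranks
  echo $(( $1 / $2 ))
}

NRANKS=8
OMP_NUM_THREADS=$(threads_per_rank "$(nproc)" "$NRANKS")
export OMP_NUM_THREADS
export GMX_ENABLE_DIRECT_GPU_COMM=1

# Uncomment inside the container to launch:
# gmx mdrun -s benchPEP.tpr -deffnm run1 -nsteps 500000 \
#   -nb gpu -bonded gpu -pme gpu -update cpu \
#   -pin on -ntmpi "$NRANKS" -npme 1 -nstlist 200
```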
9. Key Parameters Explained
| Flag | Purpose |
|---|---|
| -nb gpu | Short-range forces computed on GPU |
| -bonded gpu | Bonded terms computed on GPU |
| -pme gpu -npme 1 | Long-range PME on one GPU |
| -update cpu | Runs coordinate updates on the CPU (GPU update is not supported for every input system) |
| GMX_ENABLE_DIRECT_GPU_COMM=1 | Enables peer-to-peer GPU communication |
| -ntmpi 8 | One rank per GPU |
| -nstlist 200 | Reduces neighbor-list rebuild overhead |
10. Monitor Performance
GPU Utilization
watch -n1 nvidia-smi
Throughput (inside container)
watch -n2 "grep -E 'Performance|ns/day' run1.log | tail -n1"
Expected performance (8× H100): ~8.0 ns/day
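For scripted benchmarking, the throughput can be pulled directly from the log instead of watching it. GROMACS writes a summary line of the form `Performance:  <ns/day>  <hour/ns>`, so the second field of the last such line is the figure to record. A minimal sketch (the function name `nsday` is mine):

```shell
# Hypothetical log parser: extract ns/day from the last "Performance:"
# line that mdrun wrote to its log.
nsday() {
  awk '/^Performance:/ { v = $2 } END { print v }' "$1"
}

# Inside the container:
# nsday /bench/run1.log
```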
11. Scaling and Optimization Tips
| Setting | Description |
|---|---|
| OMP Threads | Set OMP_NUM_THREADS = total_CPU_threads / 8 |
| DLB (Dynamic Load Balancing) | Let GROMACS manage automatically |
| CUDA Graphs | Enable via export GMX_CUDA_GRAPH=1 |
| Run Length | Increase -nsteps for smoother performance trends |
| Resume Runs | gmx mdrun -s benchPEP.tpr -cpi run1.cpt -append |
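The tuning knobs above can be collected into one environment block. Note that CUDA-graph support in GROMACS 2023 is experimental, and any gain is workload-dependent, so benchmark with and without `GMX_CUDA_GRAPH`; the resume invocation is shown commented as a sketch.

```shell
# A sketch of the optional tuning environment described above.
export GMX_ENABLE_DIRECT_GPU_COMM=1   # direct GPU-to-GPU communication
export GMX_CUDA_GRAPH=1               # experimental CUDA-graph execution

# Resuming an interrupted benchmark from its checkpoint file:
# gmx mdrun -s benchPEP.tpr -cpi run1.cpt -deffnm run1 -append \
#   -nb gpu -bonded gpu -pme gpu -update cpu -pin on -ntmpi 8 -npme 1
```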
12. Results Location
All output files (e.g., run1.log, run1.xtc, run1.edr, run1.cpt) are stored in:
~/gmxbench/
13. Troubleshooting
| Issue | Possible Fix |
|---|---|
| nvidia-smi: command not found | Verify container uses --gpus all and toolkit is installed |
| CUDA driver not found | Ensure host driver ≥ 550 and Docker was restarted |
| Benchmark fails early | Re-download benchPEP.zip or check disk space |
| Poor GPU scaling | Verify GMX_ENABLE_DIRECT_GPU_COMM=1 and NVLink topology |