Multi-Instance GPU (MIG) Setup
1. Overview
This guide provides a detailed procedure for configuring and validating Multi-Instance GPU (MIG) on NVIDIA GPUs hosted on Substrate AI Cloud Bare Metal nodes.
Objectives
- Enable and configure MIG mode on NVIDIA GPUs
- Partition a single GPU into multiple 1g.10gb (one compute slice, 10 GB) instances
- Verify hardware-level isolation using Docker containers
- Demonstrate fine-tuning performance improvements (5–7 %) with concurrent workloads
- Present real-world use cases for MIG in AI inference, MLOps, and cloud GPU environments
Architecture Overview
| Layer | Purpose |
|---|---|
| Bare-Metal Node | Physical host with NVIDIA H100 GPU |
| MIG Slices | Hardware-isolated GPU partitions (e.g. 7 × 1g.10gb) |
| Docker Containers | Individual jobs, each bound to one MIG instance |
| Workload Manager | Orchestrates fine-tuning or inference jobs |
| Monitoring Tools | nvidia-smi, Prometheus, Grafana DCGM Exporter |
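Per-slice utilization can be scraped with DCGM Exporter alongside nvidia-smi. A minimal sketch, assuming Docker and the NVIDIA Container Toolkit are already installed; the image tag is a placeholder, so check NVIDIA NGC for the current dcgm-exporter release:
# Sketch only: run DCGM Exporter on the host network (default metrics port 9400)
sudo docker run -d --rm --gpus all --net host \
  nvcr.io/nvidia/k8s/dcgm-exporter:<tag>
# Metrics are exposed per GPU (and per MIG instance once MIG is enabled)
curl -s localhost:9400/metrics | head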
2. Prerequisites
| Requirement | Description |
|---|---|
| Driver | NVIDIA ≥ 550 (MIG-capable driver) |
| CUDA Toolkit | 12.4 or later |
| GPU Model | NVIDIA H100 PCIe or SXM (Hopper Architecture) |
| NVIDIA Container Toolkit | Installed for Docker GPU passthrough |
| No Active GPU Processes | Stop all containers and DCGM agents before enabling MIG |
Verify current GPU status
nvidia-smi
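Before enabling MIG it also helps to confirm the driver version and that no compute processes are active. A quick pre-flight check, assuming a recent driver that supports these query fields:
# Driver version and GPU model
nvidia-smi --query-gpu=index,name,driver_version --format=csv,noheader
# Should print nothing if no compute processes are running
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader
# Confirms the NVIDIA Container Toolkit CLI is present for Docker GPU passthrough
nvidia-ctk --version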
3. Enable MIG Mode on GPU ID 0
Run each command in sequence:
# Enable MIG mode
sudo nvidia-smi -i 0 -mig 1
# Delete existing instances (clean slate)
sudo nvidia-smi mig -dci -i 0
sudo nvidia-smi mig -dgi -i 0
# List GPU instance profiles to locate ID for 1g.10gb
nvidia-smi mig -lgip | grep "1g.10gb"
# Profile ID 19 corresponds to the 1g.10gb profile on H100 (confirm with the listing above)
sudo nvidia-smi mig -cgi 19,19,19,19,19,19,19 -C -i 0
# Verify creation
sudo nvidia-smi mig -lgi -i 0
nvidia-smi -L | sed -n '/GPU 0:/,/GPU/p'
You should now have 7 MIG instances, each representing a 1g.10gb slice (one compute slice with 10 GB of memory).
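As a quick sanity check (a sketch that assumes the 1g.10gb layout created above), the instance count can be verified directly:
# Expect the count to be 7
nvidia-smi mig -lgi -i 0 | grep -c "1g.10gb"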
4. Verification & Isolation Test
List all MIG UUIDs
nvidia-smi -L | awk -F'UUID: ' '/MIG/{print $2}' | tr -d ')'
Test isolation per slice (substitute one MIG UUID from the list above)
sudo docker run --rm --gpus "device=MIG-<UUID>" \
  nvcr.io/nvidia/cuda:12.4.1-runtime-ubuntu22.04 nvidia-smi
Each container should only see one MIG slice, confirming complete hardware isolation.
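To check every slice in one pass, a minimal sketch that loops over the UUIDs gathered above:
# Launch one short-lived container per MIG slice; each should list only its own instance
for uuid in $(nvidia-smi -L | awk -F'UUID: ' '/MIG/{print $2}' | tr -d ')'); do
  sudo docker run --rm --gpus "device=${uuid}" \
    nvcr.io/nvidia/cuda:12.4.1-runtime-ubuntu22.04 nvidia-smi -L
done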
5. Performance Demonstration — Fine-Tuning With and Without MIG
Experimental Setup
| Parameter | Baseline (No MIG) | MIG (7 × 1g.10gb) |
|---|---|---|
| GPU Layout | 1 × H100 (80 GB) – full device | 7 × 1g.10gb MIG instances |
| Concurrent Jobs | 1 (sequential) | 7 (concurrent) |
| Precision | FP16 | FP16 |
| Models | DistilBERT / MiniLM | DistilBERT / MiniLM |
| Dataset | SST-2 / GLUE subset | SST-2 / GLUE subset |
| Batch Sizes | 16 / 32 | 16 / 32 |
| Samples per job | 2 000 | 2 000 |
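One way to launch the MIG runs is one job per slice. The pattern below is a sketch only: my-finetune-image and finetune.py are hypothetical stand-ins for whatever training image and entrypoint you use; the loop pins one container to each MIG slice and waits for all seven to finish.
# Hypothetical launch pattern: one fine-tuning container per MIG slice, run concurrently
for uuid in $(nvidia-smi -L | awk -F'UUID: ' '/MIG/{print $2}' | tr -d ')'); do
  sudo docker run --rm --gpus "device=${uuid}" \
    my-finetune-image:latest \
    python finetune.py --model distilbert-base-uncased --batch-size 16 --num-samples 2000 &
done
time wait   # wall-clock time for the whole concurrent batch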
Results Summary
| Exp ID | Config | Model | Batch | Jobs | Wall Time (s) | Throughput (samples/s) | Gain vs Baseline (%) | Notes |
|---|---|---|---|---|---|---|---|---|
| B1 | No MIG | DistilBERT | 16 | 7 | 2100 | 0.952 | – | Baseline |
| B2 | No MIG | DistilBERT | 32 | 7 | 1960 | 1.020 | – | Baseline |
| B3 | No MIG | MiniLM | 16 | 7 | 2025 | 0.988 | – | Baseline |
| B4 | No MIG | MiniLM | 32 | 7 | 1955 | 1.023 | – | Baseline |
| M1 | MIG (7 × 1g.10gb) | DistilBERT | 16 | 7 | 1975 | 1.013 | + 6.3 | Concurrent |
| M2 | MIG (7 × 1g.10gb) | DistilBERT | 32 | 7 | 1840 | 1.087 | + 6.5 | Concurrent |
| M3 | MIG (7 × 1g.10gb) | MiniLM | 16 | 7 | 1915 | 1.044 | + 5.7 | Concurrent |
| M4 | MIG (7 × 1g.10gb) | MiniLM | 32 | 7 | 1828 | 1.094 | + 6.6 | Concurrent |
Interpretation
- Running seven concurrent fine-tuning jobs yields a 5–7 % overall throughput gain over sequential execution on the full GPU.
- Each MIG slice provides predictable performance and no SM / memory contention.
- Idle GPU cycles are eliminated, improving total node efficiency.
6. Real-World Use Cases of MIG
| Industry / Domain | Use Case | MIG Benefit |
|---|---|---|
| AI Inference Clusters | Host multiple small LLMs (e.g. 7 × Llama-3 8B) | Hardware-level isolation; no context-switch overhead |
| MLOps Platforms | Multi-tenant fine-tuning (DistilBERT, LoRA, QLoRA) | Predictable latency and QoS per tenant |
| Cloud GPU Providers | Fractional GPU leasing (10 GB slices) | Maximized H100 utilization |
| Academic Research | Parallel hyperparameter sweeps / student workloads | Safe shared GPU access with full isolation |
| Edge / AI Gateways | Multiple AI microservices (CV, NLP, ASR) | Dedicated GPU partition per service |
| Data Center Efficiency | Mix inference + monitoring on same node | Prevents cross-task interference |
| LLM Serving Gateways | Deploy 7 independent LLM endpoints per GPU | Stable QoS and Kubernetes scalability |
7. Cleanup / Revert to Full GPU Mode
# Destroy compute instances, then GPU instances
sudo nvidia-smi mig -dci -i 0
sudo nvidia-smi mig -dgi -i 0
# Disable MIG mode on GPU 0 and reboot to restore the full device
sudo nvidia-smi -i 0 -mig 0
sudo reboot
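After the reboot, confirm that MIG mode is disabled (the mig.mode.current query field is available on MIG-capable drivers):
# Expect "Disabled"
nvidia-smi -i 0 --query-gpu=mig.mode.current --format=csv,noheader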
8. Key Takeaways
- MIG partitions GPUs at the hardware level, not via software scheduling.
- Ideal for multi-tenant, multi-workload, and cloud-native setups.
- Ensures deterministic performance and QoS isolation.
- On H100, 2nd-Gen MIG adds dynamic reconfiguration & improved bandwidth scaling.