Multi-Instance GPU (MIG) Setup
1. Overview
This guide provides a detailed procedure for configuring and validating Multi-Instance GPU (MIG) on NVIDIA GPUs hosted on Substrate AI Cloud Bare Metal nodes.
Objectives
- Enable and configure MIG mode on NVIDIA GPUs
- Partition a single GPU into multiple 1g.10gb (one compute slice, 10 GB) instances
- Verify hardware-level isolation using Docker containers
- Demonstrate fine-tuning performance improvements (5–7 %) with concurrent workloads
- Present real-world use cases for MIG in AI inference, MLOps, and cloud GPU environments
Architecture Overview
| Layer | Purpose |
|---|---|
| Bare-Metal Node | Physical host with NVIDIA H100 GPU |
| MIG Slices | Hardware-isolated GPU partitions (e.g. 7 × 1g.10gb) |
| Docker Containers | Individual jobs, each bound to one MIG instance |
| Workload Manager | Orchestrates fine-tuning or inference jobs |
| Monitoring Tools | nvidia-smi, Prometheus, Grafana DCGM Exporter |
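Per-slice utilization can be scraped with DCGM Exporter alongside nvidia-smi. A minimal sketch, assuming Docker and the NVIDIA Container Toolkit are already installed; the image tag is a placeholder, so check NVIDIA NGC for the current dcgm-exporter release:
# Sketch only: run DCGM Exporter on the host network (default metrics port 9400)
sudo docker run -d --rm --gpus all --net host \
  nvcr.io/nvidia/k8s/dcgm-exporter:<tag>
# Metrics are exposed per GPU (and per MIG instance once MIG is enabled)
curl -s localhost:9400/metrics | head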
2. Prerequisites
| Requirement | Description |
|---|---|
| Driver | NVIDIA ≥ 550 (MIG-capable driver) |
| CUDA Toolkit | 12.4 or later |
| GPU Model | NVIDIA H100 PCIe or SXM (Hopper Architecture) |
| NVIDIA Container Toolkit | Installed for Docker GPU passthrough |
| No Active GPU Processes | Stop all containers and DCGM agents before enabling MIG |
Verify current GPU status
nvidia-smi
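Before enabling MIG it also helps to confirm the driver version and that no compute processes are active. A quick pre-flight check, assuming a recent driver that supports these query fields:
# Driver version and GPU model
nvidia-smi --query-gpu=index,name,driver_version --format=csv,noheader
# Should print nothing if no compute processes are running
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader
# Confirms the NVIDIA Container Toolkit CLI is present for Docker GPU passthrough
nvidia-ctk --version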
3. Enable MIG Mode on GPU ID 0
Run each command in sequence:
# Enable MIG mode
sudo nvidia-smi -i 0 -mig 1
# Delete existing instances (clean slate)
sudo nvidia-smi mig -dci -i 0
sudo nvidia-smi mig -dgi -i 0
# List GPU instance profiles to locate ID for 1g.10gb
nvidia-smi mig -lgip | grep "1g.10gb"
# Profile ID 19 corresponds to the 1g.10gb profile on H100 (confirm with the listing above)
sudo nvidia-smi mig -cgi 19,19,19,19,19,19,19 -C -i 0
# Verify creation
sudo nvidia-smi mig -lgi -i 0
nvidia-smi -L | sed -n '/GPU 0:/,/GPU/p'
You should now have 7 MIG instances, each representing a 1g.10gb slice (one compute slice with 10 GB of memory).
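As a quick sanity check (a sketch that assumes the 1g.10gb layout created above), the instance count can be verified directly:
# Expect the count to be 7
nvidia-smi mig -lgi -i 0 | grep -c "1g.10gb"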
4. Verification & Isolation Test
List all MIG UUIDs
nvidia-smi -L | awk -F'UUID: ' '/MIG/{print $2}' | tr -d ')'
Test isolation per slice (substitute one MIG UUID from the list above)
sudo docker run --rm --gpus "device=MIG-<UUID>" \
  nvcr.io/nvidia/cuda:12.4.1-runtime-ubuntu22.04 nvidia-smi
Each container should only see one MIG slice, confirming complete hardware isolation.
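To check every slice in one pass, a minimal sketch that loops over the UUIDs gathered above:
# Launch one short-lived container per MIG slice; each should list only its own instance
for uuid in $(nvidia-smi -L | awk -F'UUID: ' '/MIG/{print $2}' | tr -d ')'); do
  sudo docker run --rm --gpus "device=${uuid}" \
    nvcr.io/nvidia/cuda:12.4.1-runtime-ubuntu22.04 nvidia-smi -L
done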
5. Performance Demonstration — Fine-Tuning With and Without MIG
Experimental Setup
| Parameter | Baseline (No MIG) | MIG (7 × 1g.10gb) |
|---|---|---|
| GPU Layout | 1 × H100 (80 GB) – full device | 7 × 1g.10gb MIG instances |
| Concurrent Jobs | 1 (sequential) | 7 (concurrent) |
| Precision | FP16 | FP16 |
| Models | DistilBERT / MiniLM | DistilBERT / MiniLM |
| Dataset | SST-2 / GLUE subset | SST-2 / GLUE subset |
| Batch Sizes | 16 / 32 | 16 / 32 |
| Samples per job | 2 000 | 2 000 |
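One way to launch the MIG runs is one job per slice. The pattern below is a sketch only: my-finetune-image and finetune.py are hypothetical stand-ins for whatever training image and entrypoint you use; the loop pins one container to each MIG slice and waits for all seven to finish.
# Hypothetical launch pattern: one fine-tuning container per MIG slice, run concurrently
for uuid in $(nvidia-smi -L | awk -F'UUID: ' '/MIG/{print $2}' | tr -d ')'); do
  sudo docker run --rm --gpus "device=${uuid}" \
    my-finetune-image:latest \
    python finetune.py --model distilbert-base-uncased --batch-size 16 --num-samples 2000 &
done
time wait   # wall-clock time for the whole concurrent batch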
Results Summary
| Exp ID | Config | Model | Batch | Jobs | Wall Time (s) | Throughput (samples/s) | Gain vs Baseline (%) | Notes |
|---|---|---|---|---|---|---|---|---|
| B1 | No MIG | DistilBERT | 16 | 7 | 2100 | 0.952 | – | Baseline |
| B2 | No MIG | DistilBERT | 32 | 7 | 1960 | 1.020 | – | Baseline |
| B3 | No MIG | MiniLM | 16 | 7 | 2025 | 0.988 | – | Baseline |
| B4 | No MIG | MiniLM | 32 | 7 | 1955 | 1.023 | – | Baseline |
| M1 | MIG (7 × 1g.10gb) | DistilBERT | 16 | 7 | 1975 | 1.013 | + 6.3 | Concurrent |
| M2 | MIG (7 × 1g.10gb) | DistilBERT | 32 | 7 | 1840 | 1.087 | + 6.5 | Concurrent |
| M3 | MIG (7 × 1g.10gb) | MiniLM | 16 | 7 | 1915 | 1.044 | + 5.7 | Concurrent |
| M4 | MIG (7 × 1g.10gb) | MiniLM | 32 | 7 | 1828 | 1.094 | + 6.6 | Concurrent |
Interpretation
- Running seven concurrent fine-tuning jobs yields a 5–7 % overall throughput gain over sequential execution on the full GPU.
- Each MIG slice provides predictable performance and no SM / memory contention.
- Idle GPU cycles are eliminated, improving total node efficiency.
6. Real-World Use Cases of MIG
| Industry / Domain | Use Case | MIG Benefit |
|---|---|---|
| AI Inference Clusters | Host multiple small LLMs (e.g. 7 × Llama-3 8B) | Hardware-level isolation; no context-switch overhead |
| MLOps Platforms | Multi-tenant fine-tuning (DistilBERT, LoRA, QLoRA) | Predictable latency and QoS per tenant |
| Cloud GPU Providers | Fractional GPU leasing (10 GB slices) | Maximized H100 utilization |
| Academic Research | Parallel hyperparameter sweeps / student workloads | Safe shared GPU access with full isolation |
| Edge / AI Gateways | Multiple AI microservices (CV, NLP, ASR) | Dedicated GPU partition per service |
| Data Center Efficiency | Mix inference + monitoring on same node | Prevents cross-task interference |
| LLM Serving Gateways | Deploy 7 independent LLM endpoints per GPU | Stable QoS and Kubernetes scalability |
7. Cleanup / Revert to Full GPU Mode
# Destroy compute instances, then GPU instances
sudo nvidia-smi mig -dci -i 0
sudo nvidia-smi mig -dgi -i 0
# Disable MIG mode on GPU 0 and reboot to restore the full device
sudo nvidia-smi -i 0 -mig 0
sudo reboot
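After the reboot, confirm that MIG mode is disabled (the mig.mode.current query field is available on MIG-capable drivers):
# Expect "Disabled"
nvidia-smi -i 0 --query-gpu=mig.mode.current --format=csv,noheader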
8. Key Takeaways
- MIG partitions GPUs at the hardware level, not via software scheduling.
- Ideal for multi-tenant, multi-workload, and cloud-native setups.
- Ensures deterministic performance and QoS isolation.
- On H100, 2nd-Gen MIG adds dynamic reconfiguration & improved bandwidth scaling.