
Multi Instance GPU Setup

1. Overview

This guide provides a detailed procedure for configuring and validating Multi-Instance GPU (MIG) on NVIDIA GPUs hosted on Substrate AI Cloud Bare Metal nodes.

Objectives

  • Enable and configure MIG mode on NVIDIA GPUs
  • Partition a single GPU into multiple 1g.10gb (10 GB) instances
  • Verify hardware-level isolation using Docker containers
  • Demonstrate fine-tuning performance improvements (5–7 %) with concurrent workloads
  • Present real-world use cases for MIG in AI inference, MLOps, and cloud GPU environments


Architecture Overview

| Layer | Purpose |
| --- | --- |
| Bare-Metal Node | Physical host with NVIDIA H100 GPU |
| MIG Slices | Hardware-isolated GPU partitions (e.g. 7 × 1g.10gb) |
| Docker Containers | Individual jobs, each bound to one MIG instance |
| Workload Manager | Orchestrates fine-tuning or inference jobs |
| Monitoring Tools | nvidia-smi, Prometheus, Grafana DCGM Exporter |

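If you adopt the Prometheus / Grafana layer from the table above, DCGM metrics are typically exposed by running the DCGM Exporter as a container on the node. The commands below are a minimal sketch: the image tag is a placeholder you should match to your driver and DCGM version, and the exporter should be started only after MIG partitioning is complete (the prerequisites below require stopping DCGM agents while MIG is being enabled).

# Run DCGM Exporter and expose Prometheus metrics on port 9400 (<tag> is a placeholder)
sudo docker run -d --rm --gpus all -p 9400:9400 nvcr.io/nvidia/k8s/dcgm-exporter:<tag>

# Spot-check that metrics are being served
curl -s localhost:9400/metrics | head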


2. Prerequisites

| Requirement | Description |
| --- | --- |
| Driver | NVIDIA ≥ 550 (MIG-capable driver) |
| CUDA Toolkit | 12.4 or later |
| GPU Model | NVIDIA H100 PCIe or SXM (Hopper architecture) |
| NVIDIA Container Toolkit | Installed for Docker GPU passthrough |
| No Active GPU Processes | Stop all containers and DCGM agents before enabling MIG |


Verify current GPU status

nvidia-smi
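
The checks below go one step further and verify the prerequisites from the table above; this is a minimal sketch using standard nvidia-smi query options and the CUDA compiler.

# Driver must be MIG-capable (≥ 550); also shows the current MIG mode
nvidia-smi --query-gpu=name,driver_version,mig.mode.current --format=csv

# CUDA toolkit must be 12.4 or later
nvcc --version

# No compute processes should be active before enabling MIG (expect an empty list)
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv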

3. Enable MIG Mode on GPU ID 0

Run each command in sequence:

# Enable MIG mode
sudo nvidia-smi -i 0 -mig 1

# Delete existing instances (clean slate)
sudo nvidia-smi mig -dci -i 0
sudo nvidia-smi mig -dgi -i 0

# List GPU instance profiles to locate ID for 1g.10gb
nvidia-smi mig -lgip | grep "1g.10gb"

# Create seven 1g.10gb GPU instances (profile ID 19 on H100) plus their compute instances
sudo nvidia-smi mig -cgi 19,19,19,19,19,19,19 -C -i 0

# Verify creation
sudo nvidia-smi mig -lgi -i 0
nvidia-smi -L | sed -n '/GPU 0:/,/GPU/p'

You should now have seven MIG instances, each representing a 1g.10gb (10 GB) slice.
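
Because the -C flag also creates a compute instance inside each GPU instance, you can optionally confirm both levels of the partitioning:

# List compute instances on GPU 0; expect one per GPU instance
sudo nvidia-smi mig -lci -i 0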


4. Verification & Isolation Test

List all MIG UUIDs

nvidia-smi -L | awk -F'UUID: ' '/MIG/{print $2}' | tr -d ')'

Test isolation per slice

# Substitute one of the MIG UUIDs listed above for <MIG-UUID>
sudo docker run --rm --gpus "device=<MIG-UUID>" \
    nvcr.io/nvidia/cuda:12.4.1-runtime-ubuntu22.04 nvidia-smi

Each container should only see one MIG slice, confirming complete hardware isolation.
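
To exercise every slice in one pass, a small loop (a sketch that reuses the UUID listing above) can start one container per MIG device and print what each one sees; every iteration should report exactly one MIG device.

# For each MIG UUID, bind a container to that slice and list its visible devices
for uuid in $(nvidia-smi -L | awk -F'UUID: ' '/MIG/{print $2}' | tr -d ')'); do
    echo "=== ${uuid} ==="
    sudo docker run --rm --gpus "device=${uuid}" \
        nvcr.io/nvidia/cuda:12.4.1-runtime-ubuntu22.04 nvidia-smi -L
done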


5. Performance Demonstration — Fine-Tuning With and Without MIG

Experimental Setup

| Parameter | Baseline (No MIG) | MIG (7 × 1g.10gb) |
| --- | --- | --- |
| GPU Layout | 1 × H100 (80 GB), full device | 7 × 1g.10gb MIG instances |
| Concurrent Jobs | 1 (sequential) | 7 (concurrent) |
| Precision | FP16 | FP16 |
| Models | DistilBERT / MiniLM | DistilBERT / MiniLM |
| Dataset | SST-2 / GLUE subset | SST-2 / GLUE subset |
| Batch Sizes | 16 / 32 | 16 / 32 |
| Samples per Job | 2 000 | 2 000 |

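For reference, the concurrent MIG runs follow the pattern sketched below. The image name (finetune:latest) and the finetune.py entrypoint with its arguments are placeholders for your own training setup, not tooling provided by this guide.

# Start one fine-tuning container per MIG slice (finetune:latest / finetune.py are placeholders)
for uuid in $(nvidia-smi -L | awk -F'UUID: ' '/MIG/{print $2}' | tr -d ')'); do
    sudo docker run -d --gpus "device=${uuid}" \
        finetune:latest \
        python finetune.py --model distilbert-base-uncased --batch-size 16 --samples 2000
done

# Block until all running containers finish (assumes no unrelated containers are running)
sudo docker wait $(sudo docker ps -q)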


Results Summary

| Exp ID | Config | Model | Batch | Jobs | Wall Time (s) | Throughput (samples/s) | Gain vs Baseline (%) | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| B1 | No MIG | DistilBERT | 16 | 7 | 2100 | 0.952 | – | Baseline |
| B2 | No MIG | DistilBERT | 32 | 7 | 1960 | 1.020 | – | Baseline |
| B3 | No MIG | MiniLM | 16 | 7 | 2025 | 0.988 | – | Baseline |
| B4 | No MIG | MiniLM | 32 | 7 | 1955 | 1.023 | – | Baseline |
| M1 | MIG (7 × 1g.10gb) | DistilBERT | 16 | 7 | 1975 | 1.013 | +6.3 | Concurrent |
| M2 | MIG (7 × 1g.10gb) | DistilBERT | 32 | 7 | 1840 | 1.087 | +6.5 | Concurrent |
| M3 | MIG (7 × 1g.10gb) | MiniLM | 16 | 7 | 1915 | 1.044 | +5.7 | Concurrent |
| M4 | MIG (7 × 1g.10gb) | MiniLM | 32 | 7 | 1828 | 1.094 | +6.6 | Concurrent |


Interpretation

  • 5–7 % overall throughput gain from running seven concurrent fine-tuning jobs; throughput in the table is samples per job divided by wall time (e.g. 2 000 / 1840 s ≈ 1.087 samples/s for M2), and the gain compares each MIG run with its matching baseline.
  • Each MIG slice delivers predictable performance, with no SM or memory contention between jobs.
  • Idle GPU cycles are eliminated, improving total node efficiency.



6. Real-World Use Cases of MIG

| Industry / Domain | Use Case | MIG Benefit |
| --- | --- | --- |
| AI Inference Clusters | Host multiple small LLMs (e.g. 7 × Llama-3 8B) | Hardware-level isolation; no context-switch overhead |
| MLOps Platforms | Multi-tenant fine-tuning (DistilBERT, LoRA, QLoRA) | Predictable latency and QoS per tenant |
| Cloud GPU Providers | Fractional GPU leasing (10 GB slices) | Maximized H100 utilization |
| Academic Research | Parallel hyperparameter sweeps / student workloads | Safe shared GPU access with full isolation |
| Edge / AI Gateways | Multiple AI microservices (CV, NLP, ASR) | Dedicated GPU partition per service |
| Data Center Efficiency | Mix inference + monitoring on the same node | Prevents cross-task interference |
| LLM Serving Gateways | Deploy 7 independent LLM endpoints per GPU | Stable QoS and Kubernetes scalability |



7. Cleanup / Revert to Full GPU Mode

# Destroy compute instances, then GPU instances, on GPU 0
sudo nvidia-smi mig -dci -i 0
sudo nvidia-smi mig -dgi -i 0

# Disable MIG mode and reboot to apply the change
sudo nvidia-smi -i 0 -mig 0
sudo reboot
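
Once the node is back up, a quick check (a sketch using a standard nvidia-smi query field) confirms that MIG mode is disabled and the full device is visible again:

# Expect MIG mode "Disabled" and a single full H100 listed
nvidia-smi --query-gpu=name,mig.mode.current --format=csv
nvidia-smi -L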

8. Key Takeaways

  • MIG partitions GPUs at the hardware level, not via software scheduling.
  • Ideal for multi-tenant, multi-workload, and cloud-native setups.
  • Ensures deterministic performance and QoS isolation.
  • On H100, 2nd-Gen MIG adds dynamic reconfiguration & improved bandwidth scaling.
