This post covers exposing an NVIDIA RTX 5070 Ti (Blackwell architecture) as a schedulable Kubernetes resource with time-slicing support, allowing multiple pods to share the GPU.
Hardware
| Node | GPU | Memory | Compute Capability |
|---|---|---|---|
| polycephala | NVIDIA GeForce RTX 5070 Ti | 16 GB | 12.0 (Blackwell, sm_120) |
The RTX 5070 Ti uses the new Blackwell architecture (GB203 chip) with compute capability sm_120. Because sm_120 is so new, software that has not yet added Blackwell support (older PyTorch builds, the current DCGM releases) fails on this GPU, as covered in the gotchas below.
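The reported compute capability can be confirmed straight from the driver; a quick check, assuming a driver recent enough to expose the compute_cap query field:
# Print GPU name and compute capability (should match the 12.0 in the table above)
nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader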
Architecture
┌─────────────────────────────────────────────────────────────────────┐
│ polycephala node │
│ ┌────────────────────┐ ┌─────────────────────────────────────┐ │
│ │ NVIDIA Driver │ │ containerd │ │
│ │ 580.95.05 │ │ ┌─────────────────────────────────┐ │ │
│ │ │◀──│ │ nvidia-container-runtime │ │ │
│ │ /dev/nvidia0 │ │ │ (RuntimeClass: nvidia) │ │ │
│ │ /dev/nvidiactl │ │ └─────────────────────────────────┘ │ │
│ └────────────────────┘ └─────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ nvidia-device-plugin DaemonSet │ │
│ │ - Discovers GPU │ │
│ │ - Registers nvidia.com/gpu resource │ │
│ │ - Time-slicing: 4 replicas per GPU │ │
│ └─────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ Kubernetes Scheduler │
│ polycephala: nvidia.com/gpu = 4 (time-sliced from 1 physical) │
└─────────────────────────────────────────────────────────────────────┘
Prerequisites on the GPU Node
1. NVIDIA Driver (Rocky Linux 10)
The driver was installed via RPMFusion:
sudo dnf install akmod-nvidia xorg-x11-drv-nvidia-cuda
The xorg-x11-drv-nvidia-cuda package is required for nvidia-smi. Without it, the nvidia-container-runtime cannot inject the nvidia-smi binary into containers.
Verify the driver:
nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05 Driver Version: 580.95.05 CUDA Version: 13.0 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| 0 NVIDIA GeForce RTX 5070 Ti Off | 00000000:21:00.0 Off | N/A |
| 0% 54C P8 15W / 300W | 42MiB / 16303MiB | 0% Default |
+-----------------------------------------+------------------------+----------------------+
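If nvidia-smi cannot find a device after a kernel update, the akmod module may still be rebuilding; two quick node-level sanity checks, assuming the RPMFusion akmods packaging:
# Confirm the kernel module is built for the running kernel and loaded
modinfo -F version nvidia
lsmod | grep '^nvidia'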
2. NVIDIA Container Toolkit
The container toolkit provides the nvidia runtime that injects GPU devices and driver libraries into containers started by containerd:
# Add NVIDIA container toolkit repository
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
# Install
sudo dnf install nvidia-container-toolkit
# Configure containerd to use nvidia runtime
sudo nvidia-ctk runtime configure --runtime=containerd
# Restart containerd
sudo systemctl restart containerd
This creates /etc/containerd/conf.d/99-nvidia.toml with the nvidia runtime configuration.
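To confirm the registration took effect (the exact drop-in path can vary between containerd setups, so inspecting the merged configuration is the safer check):
# Verify the nvidia runtime appears in the effective containerd config
sudo containerd config dump | grep -A 3 nvidia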
3. Label the GPU Node
kubectl label node polycephala nvidia.com/gpu.present=true
Directory Structure
infrastructure/nvidia-device-plugin/
├── device-plugin-daemonset.yaml # Device plugin + ConfigMap
├── dcgm-exporter.yaml.disabled # GPU metrics (disabled - no Blackwell support)
├── nvidia-runtime-class.yaml # RuntimeClass definition
├── setup-nvidia-device-plugin.sh # Installation script
├── test-gpu-workload.yaml # Test pods
└── README.md
Kubernetes Configuration
RuntimeClass
The RuntimeClass tells Kubernetes to use the nvidia container runtime for pods that request it:
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
name: nvidia
handler: nvidia
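Once applied, the RuntimeClass is a cluster-scoped object and should show up immediately:
# Confirm the RuntimeClass exists
kubectl get runtimeclass nvidia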
Device Plugin with Time-Slicing
The device plugin discovers GPUs and registers them as schedulable resources. Time-slicing is configured via ConfigMap to allow 4 pods to share one GPU:
apiVersion: v1
kind: Namespace
metadata:
name: nvidia-device-plugin
labels:
name: nvidia-device-plugin
annotations:
linkerd.io/inject: enabled
---
apiVersion: v1
kind: ConfigMap
metadata:
name: nvidia-device-plugin-config
namespace: nvidia-device-plugin
data:
config.yaml: |
version: v1
flags:
migStrategy: none
failOnInitError: false
sharing:
timeSlicing:
renameByDefault: false
failRequestsGreaterThanOne: false
resources:
- name: nvidia.com/gpu
replicas: 4
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: nvidia-device-plugin-daemonset
namespace: nvidia-device-plugin
labels:
app.kubernetes.io/name: nvidia-device-plugin
app.kubernetes.io/component: device-plugin
spec:
selector:
matchLabels:
app.kubernetes.io/name: nvidia-device-plugin
updateStrategy:
type: RollingUpdate
template:
metadata:
labels:
app.kubernetes.io/name: nvidia-device-plugin
app.kubernetes.io/component: device-plugin
spec:
tolerations:
- key: nvidia.com/gpu
operator: Exists
effect: NoSchedule
priorityClassName: system-node-critical
runtimeClassName: nvidia
containers:
- name: nvidia-device-plugin
image: nvcr.io/nvidia/k8s-device-plugin:v0.17.0
args:
- --config-file=/config/config.yaml
env:
- name: FAIL_ON_INIT_ERROR
value: "false"
resources:
limits:
cpu: 100m
memory: 128Mi
requests:
cpu: 50m
memory: 64Mi
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop: ["ALL"]
volumeMounts:
- name: device-plugin
mountPath: /var/lib/kubelet/device-plugins
- name: config
mountPath: /config
volumes:
- name: device-plugin
hostPath:
path: /var/lib/kubelet/device-plugins
- name: config
configMap:
name: nvidia-device-plugin-config
nodeSelector:
nvidia.com/gpu.present: "true"
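Once the DaemonSet is running, the plugin logs the configuration it loaded at startup, which is the quickest way to confirm the time-slicing settings were picked up:
# Tail the device plugin startup log (kubectl picks a pod from the DaemonSet)
kubectl -n nvidia-device-plugin logs ds/nvidia-device-plugin-daemonset | head -n 40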
Setup Script
#!/bin/bash
set -e
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
echo "=== NVIDIA GPU Device Plugin Setup ==="
GPU_NODES=$(kubectl get nodes -l nvidia.com/gpu.present=true -o name 2>/dev/null || true)
if [ -z "$GPU_NODES" ]; then
echo "WARNING: No nodes labeled with nvidia.com/gpu.present=true"
echo "Label GPU nodes with: kubectl label node <node-name> nvidia.com/gpu.present=true"
fi
echo "Creating NVIDIA RuntimeClass..."
kubectl apply -f "$SCRIPT_DIR/nvidia-runtime-class.yaml"
echo "Deploying NVIDIA device plugin with time-slicing..."
kubectl apply -f "$SCRIPT_DIR/device-plugin-daemonset.yaml"
echo "Waiting for device plugin to be ready..."
kubectl rollout status daemonset/nvidia-device-plugin-daemonset -n nvidia-device-plugin --timeout=120s || true
echo ""
echo "=== GPU Resources ==="
kubectl get nodes -l nvidia.com/gpu.present=true -o custom-columns='NODE:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
Deployment
cd infrastructure/nvidia-device-plugin/
./setup-nvidia-device-plugin.sh
=== NVIDIA GPU Device Plugin Setup ===
Creating NVIDIA RuntimeClass...
runtimeclass.node.k8s.io/nvidia created
Deploying NVIDIA device plugin with time-slicing...
namespace/nvidia-device-plugin created
configmap/nvidia-device-plugin-config created
daemonset.apps/nvidia-device-plugin-daemonset created
=== GPU Resources ===
NODE GPU
polycephala 4
The node reports 4 GPUs due to time-slicing (4 replicas of 1 physical GPU).
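The same allocatable count can be read directly from the node object:
# Query the time-sliced GPU count advertised by the node
kubectl get node polycephala \
  -o jsonpath='{.status.allocatable.nvidia\.com/gpu}{"\n"}'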
Using the GPU
Request nvidia.com/gpu in pod specifications:
apiVersion: v1
kind: Pod
metadata:
name: gpu-workload
spec:
runtimeClassName: nvidia
containers:
- name: app
image: pytorch/pytorch:2.8.0-cuda12.8-cudnn9-devel
resources:
limits:
nvidia.com/gpu: 1
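Once such a pod is running (the image's default entrypoint may need to be replaced with a long-running command for this quick check), the injected driver tooling can be verified from inside:
# nvidia-smi is injected by the nvidia container runtime
kubectl exec gpu-workload -- nvidia-smi -L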
Blackwell / PyTorch Compatibility
The RTX 5070 Ti uses the Blackwell architecture (sm_120), which only recent PyTorch builds compiled against CUDA 12.8 support:
| PyTorch Version | CUDA | sm_120 Support |
|---|---|---|
| 2.5.1 | 12.4 | ❌ |
| 2.6.0 | 12.6 | ❌ |
| 2.8.0 | 12.8 | ✅ |
Using an incompatible version produces:
NVIDIA GeForce RTX 5070 Ti with CUDA capability sm_120 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_50 sm_60 sm_70 sm_75 sm_80 sm_86 sm_90.
PyTorch 2.8.0 with CUDA 12.8 is required for Blackwell GPUs.
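Whether a given PyTorch build ships sm_120 kernels can be checked from the architecture list it was compiled with; a Blackwell-capable build should include sm_120 in the output:
# List the CUDA architectures baked into the installed PyTorch build
python3 -c "import torch; print(torch.__version__, torch.cuda.get_arch_list())"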
Test Workload
A PyTorch matrix multiplication test verifies the GPU works:
apiVersion: v1
kind: Pod
metadata:
name: gpu-test-pytorch
spec:
restartPolicy: Never
runtimeClassName: nvidia
containers:
- name: pytorch
image: pytorch/pytorch:2.8.0-cuda12.8-cudnn9-devel
command:
- python3
- -c
- |
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
print(f"CUDA version: {torch.version.cuda}")
print(f"GPU: {torch.cuda.get_device_name(0)}")
props = torch.cuda.get_device_properties(0)
print(f"GPU Memory: {props.total_memory / 1e9:.1f} GB")
print(f"Compute capability: {props.major}.{props.minor}")
print("Running matrix multiplication test...")
x = torch.randn(2000, 2000, device='cuda')
y = torch.matmul(x, x)
print(f"Matrix multiplication test: SUCCESS")
import time
start = time.time()
for _ in range(100):
y = torch.matmul(x, x)
torch.cuda.synchronize()
elapsed = time.time() - start
print(f"100x matmul (2000x2000): {elapsed:.3f}s")
resources:
limits:
nvidia.com/gpu: 1
Output:
PyTorch version: 2.8.0+cu128
CUDA available: True
CUDA version: 12.8
GPU: NVIDIA GeForce RTX 5070 Ti
GPU Memory: 16.6 GB
Compute capability: 12.0
Running matrix multiplication test...
Matrix multiplication test: SUCCESS
100x matmul (2000x2000): 0.056s
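To run the test and collect its output (using the manifest name from the directory layout above, and assuming a kubectl recent enough to support jsonpath wait conditions):
kubectl apply -f test-gpu-workload.yaml
# Wait for the one-shot pod to finish, then read its logs
kubectl wait pod/gpu-test-pytorch --for=jsonpath='{.status.phase}'=Succeeded --timeout=10m
kubectl logs gpu-test-pytorch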
Gotchas
Missing nvidia-smi
The RPMFusion akmod-nvidia package does not include nvidia-smi. Without it, containers cannot run nvidia-smi even if the driver is installed. Install xorg-x11-drv-nvidia-cuda:
sudo dnf install xorg-x11-drv-nvidia-cuda
NVIDIA_VISIBLE_DEVICES=void
If pods see NVIDIA_VISIBLE_DEVICES=void, the device plugin's --pass-device-specs flag is set. With that flag, the plugin sets the environment variable to "void" and passes device nodes through the Kubernetes device plugin API instead. Remove the flag to fall back to the standard environment-variable method:
# Don't use --pass-device-specs unless specifically needed
args:
- --config-file=/config/config.yaml
# NOT: - --pass-device-specs
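To check which mechanism a pod actually received, inspect its environment (using the gpu-workload pod from earlier as an example):
# "void" means device-spec passing; a device UUID/index means the envvar method
kubectl exec gpu-workload -- env | grep NVIDIA_VISIBLE_DEVICES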
DCGM Exporter Not Working
NVIDIA DCGM (Data Center GPU Manager) does not support Blackwell GPUs yet. Tested versions 3.3.9 and 4.4.2 both fail to collect metrics:
time=2026-01-02T23:06:30.305Z level=INFO msg="DCGM successfully initialized!"
time=2026-01-02T23:06:30.305Z level=INFO msg="Initializing system entities of type 'GPU'"
# No GPU metrics collected
nvidia-smi works inside the DCGM container, confirming the issue is DCGM library support, not driver access. The DCGM exporter configuration is included but disabled (dcgm-exporter.yaml.disabled) until NVIDIA adds Blackwell support.
Time-Slicing Limitations
Time-slicing allows multiple pods to share a GPU via context switching. Important limitations:
- No memory isolation between pods
- No performance guarantees
- All pods share the same 16GB VRAM
- One misbehaving pod can affect others
For memory isolation, MIG (Multi-Instance GPU) is required, but consumer GPUs like the RTX 5070 Ti do not support MIG.
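Because every time-sliced pod sees the whole GPU, memory pressure has to be watched at the node level; running this on polycephala lists the processes from all sharing pods against the single 16 GB pool:
# Show every compute process on the GPU and its VRAM usage
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv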
Summary
| Component | Status |
|---|---|
| NVIDIA Driver 580.95.05 | ✅ Working |
| nvidia-container-toolkit | ✅ Working |
| nvidia-device-plugin v0.17.0 | ✅ Working |
| Time-slicing (4 replicas) | ✅ Working |
| Linkerd integration | ✅ Working |
| PyTorch 2.8.0+cu128 | ✅ Working |
| DCGM Exporter | ⏸️ Disabled (no Blackwell support) |
The RTX 5070 Ti is now a schedulable Kubernetes resource. Up to 4 pods can share the GPU via time-slicing. PyTorch 2.8.0 with CUDA 12.8 is required for Blackwell compatibility.
Configuration available at k8s-configs/infrastructure/nvidia-device-plugin.