This post covers exposing an NVIDIA RTX 5070 Ti (Blackwell architecture) as a schedulable Kubernetes resource with time-slicing support, allowing multiple pods to share the GPU.
Hardware
| Node | GPU | Memory | Compute Capability |
|---|---|---|---|
| polycephala | NVIDIA GeForce RTX 5070 Ti | 16 GB | 12.0 (Blackwell, sm_120) |
The RTX 5070 Ti uses the new Blackwell architecture (GB203 chip) with compute capability sm_120. Because sm_120 is so new, software that has not yet added Blackwell support (older PyTorch builds, the current DCGM releases) fails on this GPU, as covered in the gotchas below.
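The reported compute capability can be confirmed straight from the driver; a quick check, assuming a driver recent enough to expose the compute_cap query field:
# Print GPU name and compute capability (should match the 12.0 in the table above)
nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader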
Architecture
┌─────────────────────────────────────────────────────────────────────┐
│ polycephala node │
│ ┌────────────────────┐ ┌─────────────────────────────────────┐ │
│ │ NVIDIA Driver │ │ containerd │ │
│ │ 580.95.05 │ │ ┌─────────────────────────────────┐ │ │
│ │ │◀──│ │ nvidia-container-runtime │ │ │
│ │ /dev/nvidia0 │ │ │ (RuntimeClass: nvidia) │ │ │
│ │ /dev/nvidiactl │ │ └─────────────────────────────────┘ │ │
│ └────────────────────┘ └─────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ nvidia-device-plugin DaemonSet │ │
│ │ - Discovers GPU │ │
│ │ - Registers nvidia.com/gpu resource │ │
│ │ - Time-slicing: 4 replicas per GPU │ │
│ └─────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ Kubernetes Scheduler │
│ polycephala: nvidia.com/gpu = 4 (time-sliced from 1 physical) │
└─────────────────────────────────────────────────────────────────────┘
Prerequisites on the GPU Node
1. NVIDIA Driver (Rocky Linux 10)
The driver was installed via RPMFusion:
sudo dnf install akmod-nvidia xorg-x11-drv-nvidia-cuda
The xorg-x11-drv-nvidia-cuda package is required for nvidia-smi. Without it, the nvidia-container-runtime cannot inject the nvidia-smi binary into containers.
Verify the driver:
nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05 Driver Version: 580.95.05 CUDA Version: 13.0 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| 0 NVIDIA GeForce RTX 5070 Ti Off | 00000000:21:00.0 Off | N/A |
| 0% 54C P8 15W / 300W | 42MiB / 16303MiB | 0% Default |
+-----------------------------------------+------------------------+----------------------+
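If nvidia-smi cannot find a device after a kernel update, the akmod module may still be rebuilding; two quick node-level sanity checks, assuming the RPMFusion akmods packaging:
# Confirm the kernel module is built for the running kernel and loaded
modinfo -F version nvidia
lsmod | grep '^nvidia'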
2. NVIDIA Container Toolkit
The container toolkit provides the nvidia runtime that injects GPU devices and driver libraries into containers started by containerd:
# Add NVIDIA container toolkit repository
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
# Install
sudo dnf install nvidia-container-toolkit
# Configure containerd to use nvidia runtime
sudo nvidia-ctk runtime configure --runtime=containerd
# Restart containerd
sudo systemctl restart containerd
This creates /etc/containerd/conf.d/99-nvidia.toml with the nvidia runtime configuration.
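To confirm the registration took effect (the exact drop-in path can vary between containerd setups, so inspecting the merged configuration is the safer check):
# Verify the nvidia runtime appears in the effective containerd config
sudo containerd config dump | grep -A 3 nvidia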
3. Label the GPU Node
kubectl label node polycephala nvidia.com/gpu.present=true
Directory Structure
infrastructure/nvidia-device-plugin/
├── device-plugin-daemonset.yaml # Device plugin + ConfigMap
├── dcgm-exporter.yaml.disabled # GPU metrics (disabled - no Blackwell support)
├── nvidia-runtime-class.yaml # RuntimeClass definition
├── setup-nvidia-device-plugin.sh # Installation script
├── test-gpu-workload.yaml # Test pods
└── README.md
Kubernetes Configuration
RuntimeClass
The RuntimeClass tells Kubernetes to use the nvidia container runtime for pods that request it:
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
name: nvidia
handler: nvidia
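Once applied, the RuntimeClass is a cluster-scoped object and should show up immediately:
# Confirm the RuntimeClass exists
kubectl get runtimeclass nvidia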
Device Plugin with Time-Slicing
The device plugin discovers GPUs and registers them as schedulable resources. Time-slicing is configured via ConfigMap to allow 4 pods to share one GPU:
apiVersion: v1
kind: Namespace
metadata:
name: nvidia-device-plugin
labels:
name: nvidia-device-plugin
annotations:
linkerd.io/inject: enabled
---
apiVersion: v1
kind: ConfigMap
metadata:
name: nvidia-device-plugin-config
namespace: nvidia-device-plugin
data:
config.yaml: |
version: v1
flags:
migStrategy: none
failOnInitError: false
sharing:
timeSlicing:
renameByDefault: false
failRequestsGreaterThanOne: false
resources:
- name: nvidia.com/gpu
replicas: 4
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: nvidia-device-plugin-daemonset
namespace: nvidia-device-plugin
labels:
app.kubernetes.io/name: nvidia-device-plugin
app.kubernetes.io/component: device-plugin
spec:
selector:
matchLabels:
app.kubernetes.io/name: nvidia-device-plugin
updateStrategy:
type: RollingUpdate
template:
metadata:
labels:
app.kubernetes.io/name: nvidia-device-plugin
app.kubernetes.io/component: device-plugin
spec:
tolerations:
- key: nvidia.com/gpu
operator: Exists
effect: NoSchedule
priorityClassName: system-node-critical
runtimeClassName: nvidia
containers:
- name: nvidia-device-plugin
image: nvcr.io/nvidia/k8s-device-plugin:v0.17.0
args:
- --config-file=/config/config.yaml
env:
- name: FAIL_ON_INIT_ERROR
value: "false"
resources:
limits:
cpu: 100m
memory: 128Mi
requests:
cpu: 50m
memory: 64Mi
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop: ["ALL"]
volumeMounts:
- name: device-plugin
mountPath: /var/lib/kubelet/device-plugins
- name: config
mountPath: /config
volumes:
- name: device-plugin
hostPath:
path: /var/lib/kubelet/device-plugins
- name: config
configMap:
name: nvidia-device-plugin-config
nodeSelector:
nvidia.com/gpu.present: "true"
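Once the DaemonSet is running, the plugin logs the configuration it loaded at startup, which is the quickest way to confirm the time-slicing settings were picked up:
# Tail the device plugin startup log (kubectl picks a pod from the DaemonSet)
kubectl -n nvidia-device-plugin logs ds/nvidia-device-plugin-daemonset | head -n 40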
Setup Script
#!/bin/bash
set -e
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
echo "=== NVIDIA GPU Device Plugin Setup ==="
GPU_NODES=$(kubectl get nodes -l nvidia.com/gpu.present=true -o name 2>/dev/null || true)
if [ -z "$GPU_NODES" ]; then
echo "WARNING: No nodes labeled with nvidia.com/gpu.present=true"
echo "Label GPU nodes with: kubectl label node <node-name> nvidia.com/gpu.present=true"
fi
echo "Creating NVIDIA RuntimeClass..."
kubectl apply -f "$SCRIPT_DIR/nvidia-runtime-class.yaml"
echo "Deploying NVIDIA device plugin with time-slicing..."
kubectl apply -f "$SCRIPT_DIR/device-plugin-daemonset.yaml"
echo "Waiting for device plugin to be ready..."
kubectl rollout status daemonset/nvidia-device-plugin-daemonset -n nvidia-device-plugin --timeout=120s || true
echo ""
echo "=== GPU Resources ==="
kubectl get nodes -l nvidia.com/gpu.present=true -o custom-columns='NODE:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
Deployment
cd infrastructure/nvidia-device-plugin/
./setup-nvidia-device-plugin.sh
=== NVIDIA GPU Device Plugin Setup ===
Creating NVIDIA RuntimeClass...
runtimeclass.node.k8s.io/nvidia created
Deploying NVIDIA device plugin with time-slicing...
namespace/nvidia-device-plugin created
configmap/nvidia-device-plugin-config created
daemonset.apps/nvidia-device-plugin-daemonset created
=== GPU Resources ===
NODE GPU
polycephala 4
The node reports 4 GPUs due to time-slicing (4 replicas of 1 physical GPU).
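The same allocatable count can be read directly from the node object:
# Query the time-sliced GPU count advertised by the node
kubectl get node polycephala \
  -o jsonpath='{.status.allocatable.nvidia\.com/gpu}{"\n"}'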
Using the GPU
Request nvidia.com/gpu in pod specifications:
apiVersion: v1
kind: Pod
metadata:
name: gpu-workload
spec:
runtimeClassName: nvidia
containers:
- name: app
image: pytorch/pytorch:2.8.0-cuda12.8-cudnn9-devel
resources:
limits:
nvidia.com/gpu: 1
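Once such a pod is running (the image's default entrypoint may need to be replaced with a long-running command for this quick check), the injected driver tooling can be verified from inside:
# nvidia-smi is injected by the nvidia container runtime
kubectl exec gpu-workload -- nvidia-smi -L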
Blackwell / PyTorch Compatibility
The RTX 5070 Ti uses the Blackwell architecture (sm_120), which only recent PyTorch builds compiled against CUDA 12.8 support:
| PyTorch Version | CUDA | sm_120 Support |
|---|---|---|
| 2.5.1 | 12.4 | ❌ |
| 2.6.0 | 12.6 | ❌ |
| 2.8.0 | 12.8 | ✅ |
Using an incompatible version produces:
NVIDIA GeForce RTX 5070 Ti with CUDA capability sm_120 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_50 sm_60 sm_70 sm_75 sm_80 sm_86 sm_90.
PyTorch 2.8.0 with CUDA 12.8 is required for Blackwell GPUs.
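Whether a given PyTorch build ships sm_120 kernels can be checked from the architecture list it was compiled with; a Blackwell-capable build should include sm_120 in the output:
# List the CUDA architectures baked into the installed PyTorch build
python3 -c "import torch; print(torch.__version__, torch.cuda.get_arch_list())"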
Test Workload
A PyTorch matrix multiplication test verifies the GPU works:
apiVersion: v1
kind: Pod
metadata:
name: gpu-test-pytorch
spec:
restartPolicy: Never
runtimeClassName: nvidia
containers:
- name: pytorch
image: pytorch/pytorch:2.8.0-cuda12.8-cudnn9-devel
command:
- python3
- -c
- |
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
print(f"CUDA version: {torch.version.cuda}")
print(f"GPU: {torch.cuda.get_device_name(0)}")
props = torch.cuda.get_device_properties(0)
print(f"GPU Memory: {props.total_memory / 1e9:.1f} GB")
print(f"Compute capability: {props.major}.{props.minor}")
print("Running matrix multiplication test...")
x = torch.randn(2000, 2000, device='cuda')
y = torch.matmul(x, x)
print(f"Matrix multiplication test: SUCCESS")
import time
start = time.time()
for _ in range(100):
y = torch.matmul(x, x)
torch.cuda.synchronize()
elapsed = time.time() - start
print(f"100x matmul (2000x2000): {elapsed:.3f}s")
resources:
limits:
nvidia.com/gpu: 1
Output:
PyTorch version: 2.8.0+cu128
CUDA available: True
CUDA version: 12.8
GPU: NVIDIA GeForce RTX 5070 Ti
GPU Memory: 16.6 GB
Compute capability: 12.0
Running matrix multiplication test...
Matrix multiplication test: SUCCESS
100x matmul (2000x2000): 0.056s
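To run the test and collect its output (using the manifest name from the directory layout above, and assuming a kubectl recent enough to support jsonpath wait conditions):
kubectl apply -f test-gpu-workload.yaml
# Wait for the one-shot pod to finish, then read its logs
kubectl wait pod/gpu-test-pytorch --for=jsonpath='{.status.phase}'=Succeeded --timeout=10m
kubectl logs gpu-test-pytorch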
Gotchas
Missing nvidia-smi
The RPMFusion akmod-nvidia package does not include nvidia-smi. Without it, containers cannot run nvidia-smi even if the driver is installed. Install xorg-x11-drv-nvidia-cuda:
sudo dnf install xorg-x11-drv-nvidia-cuda
NVIDIA_VISIBLE_DEVICES=void
If pods see NVIDIA_VISIBLE_DEVICES=void, the device plugin's --pass-device-specs flag is set. With that flag, the plugin sets the environment variable to "void" and passes device nodes through the Kubernetes device plugin API instead. Remove the flag to fall back to the standard environment-variable method:
# Don't use --pass-device-specs unless specifically needed
args:
- --config-file=/config/config.yaml
# NOT: - --pass-device-specs
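To check which mechanism a pod actually received, inspect its environment (using the gpu-workload pod from earlier as an example):
# "void" means device-spec passing; a device UUID/index means the envvar method
kubectl exec gpu-workload -- env | grep NVIDIA_VISIBLE_DEVICES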
DCGM Exporter Not Working
NVIDIA DCGM (Data Center GPU Manager) does not support Blackwell GPUs yet. Tested versions 3.3.9 and 4.4.2 both fail to collect metrics:
time=2026-01-02T23:06:30.305Z level=INFO msg="DCGM successfully initialized!"
time=2026-01-02T23:06:30.305Z level=INFO msg="Initializing system entities of type 'GPU'"
# No GPU metrics collected
nvidia-smi works inside the DCGM container, confirming the issue is DCGM library support, not driver access. The DCGM exporter configuration is included but disabled (dcgm-exporter.yaml.disabled) until NVIDIA adds Blackwell support.
Time-Slicing Limitations
Time-slicing allows multiple pods to share a GPU via context switching. Important limitations:
- No memory isolation between pods
- No performance guarantees
- All pods share the same 16GB VRAM
- One misbehaving pod can affect others
For memory isolation, MIG (Multi-Instance GPU) is required, but consumer GPUs like the RTX 5070 Ti do not support MIG.
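Because every time-sliced pod sees the whole GPU, memory pressure has to be watched at the node level; running this on polycephala lists the processes from all sharing pods against the single 16 GB pool:
# Show every compute process on the GPU and its VRAM usage
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv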
Summary
| Component | Status |
|---|---|
| NVIDIA Driver 580.95.05 | ✅ Working |
| nvidia-container-toolkit | ✅ Working |
| nvidia-device-plugin v0.17.0 | ✅ Working |
| Time-slicing (4 replicas) | ✅ Working |
| Linkerd integration | ✅ Working |
| PyTorch 2.8.0+cu128 | ✅ Working |
| DCGM Exporter | ⏸️ Disabled (no Blackwell support) |
The RTX 5070 Ti is now a schedulable Kubernetes resource. Up to 4 pods can share the GPU via time-slicing. PyTorch 2.8.0 with CUDA 12.8 is required for Blackwell compatibility.
Configuration available at k8s-configs/infrastructure/nvidia-device-plugin.