Osh + DRA .....
Osh without DRA
- There’s one type of osh for everyone: the wedding/teahouse (to'y/choyxona) osh.
- You only request how much you want: "Give me one portion."
- You get what’s cooked. You can’t ask for:
  - extra meat
  - less oil

Osh with DRA
- You request exactly what you want.
- You can customize it to your taste: "Give me one portion with:"
  - extra meat
  - less oil
  - eggs
Before
You ask for: nvidia.com/gpu: 1
You could not say:
- I need a GPU
- with these capabilities
- prepared in this way
resources:
  limits:
    nvidia.com/gpu: 1

After
Describe what you actually need...
You can ask for:
- nvidia.com/gpu: 1
- product ID: A100-SXM4-40GB
- memory: 40 GB
- cores: 3456 FP64
spec:
  devices:
    requests:
    - name: gpu
      exactly:
        deviceClassName: gpu
        selectors:
        - cel:
            expression: |-
              device.attributes[device.driver].model == "A100" &&
              device.capacity[device.driver].memory.compareTo(quantity("40Gi")) >= 0
Roles
Driver developer
.. is someone who understands how a piece of hardware works and writes the software that controls and allocates that hardware.
Decides:
- which attributes of the hardware to expose to DRA
- how to implement the interfaces that configure node resources on the fly

Cluster admin
.. is the one who installs the DRA driver, sets up DeviceClasses, and configures nodes (e.g., attaching GPUs) so workloads can use the hardware.

App developer/DevOps
.. is someone who knows what the application needs and defines its resource requirements as ResourceClaims.
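The attributes a driver developer decides to expose end up in ResourceSlice objects, which the driver publishes per node. A minimal sketch of such a slice, assuming the resource.k8s.io/v1 API and a hypothetical dra.pizza driver; the slice name and the two pizza attributes are illustrative, not output from a real driver:

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceSlice
metadata:
  # Hypothetical name; real drivers usually let the API server generate it
  name: kind-control-plane-dra.pizza-example
spec:
  # Driver that advertises (and later prepares) these devices
  driver: dra.pizza
  # Devices in this slice are local to one node
  nodeName: kind-control-plane
  pool:
    name: kind-control-plane
    generation: 1
    resourceSliceCount: 1
  devices:
  - name: margherita-pan-pizza
    attributes:
      # Qualified names are "domain/id"; CEL reads them per domain, e.g.
      # device.attributes["kitchen.pizza.example"].dough
      kitchen.pizza.example/dough:
        string: pan
      kitchen.pizza.example/extraCheeseAvailable:
        bool: true
```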
Let's play those roles.
We’re not writing the driver today :)
- Install the driver
- Analyze the resources
- Create DeviceClasses
Cluster admin

helm install dra-driver-pizza....

kubectl get resourceSlices
NAME                                 NODE                 DRIVER      POOL                 AGE
kind-control-plane-dra.pizza-9q2ls   kind-control-plane   dra.pizza   kind-control-plane   20h

apiVersion: pizza.kitchen/v1
kind: ResourceSlice
metadata:
  name: pizzahut-matinkyla
spec:
  pizzas:
  - name: margherita-pan-pizza
    attributes:
      kitchen.pizza.example/dough:
        string: pan
      kitchen.pizza.example/sauce:
        string: tomato
      kitchen.pizza.example/cheese:
        string: mozzarella
      kitchen.pizza.example/toppings:
        string: "basil"
      kitchen.pizza.example/extraCheeseAvailable:
        bool: true
      kitchen.pizza.example/availableSlices:
        string: "4,6,8"
  - name: pepperoni-pan-pizza
    attributes:
      kitchen.pizza.example/dough:
        string: pan
      kitchen.pizza.example/sauce:
        string: tomato
      kitchen.pizza.example/cheese:
        string: mozzarella
      kitchen.pizza.example/toppings:
        string: "pepperoni"
      kitchen.pizza.example/extraCheeseAvailable:
        bool: true
      kitchen.pizza.example/availableSlices:
        string: "6,8"

- Install the driver
- Analyze the resources
- Create DeviceClasses (for device filtering)
- Create a DeviceClass that selects CPU resources based on architecture constraints, such as x86_64 or ARM64
Cluster admin
apiVersion: resource.k8s.io/v1
kind: DeviceClass
metadata:
  name: amd-cpu
spec:
  selectors:
  - cel:
      expression: |
        device.attributes["hardware.cpu"].architecture == "x86_64" &&
        device.attributes["hardware.cpu"].vendor == "amd"

apiVersion: resource.k8s.io/v1
kind: DeviceClass
metadata:
  name: intel-cpu
spec:
  selectors:
  - cel:
      expression: |
        device.attributes["hardware.cpu"].architecture == "x86_64" &&
        device.attributes["hardware.cpu"].vendor == "intel"
- Create DeviceClasses (for device filtering). When a user creates a ResourceClaim, they don’t pick devices directly: the claim references a DeviceClass, and the scheduler picks matching devices.
Cluster admin
"I want a vegetarian pizza with mushrooms, basil, and extra cheese"
- deviceClass: vegetarian pizza
- attributes: mushrooms, basil, extra cheese

apiVersion: resource.k8s.io/v1
kind: DeviceClass
metadata:
  name: pizza-vegetarian
  ...
spec:
  selectors:
  - cel:
      expression: |
        device.attributes["kitchen.pizza.example"].cheese == "mozzarella" &&
        device.attributes["kitchen.pizza.example"].toppings.contains("mushroom") &&
        device.attributes["kitchen.pizza.example"].extraCheeseAvailable == true
apiVersion: resource.k8s.io/v1
kind: DeviceClass
metadata:
  name: gpu.amd.com
spec:
  selectors:
  - cel:
      expression: "device.driver == 'gpu.amd.com'"

apiVersion: resource.k8s.io/v1
kind: DeviceClass
metadata:
  name: mig.nvidia.com
spec:
  selectors:
  - cel:
      expression: "device.driver == 'gpu.nvidia.com' && device.attributes['gpu.nvidia.com'].type == 'mig'"

DevOps/app dev
- is given access to a cluster
- has DeviceClasses that describe categories of devices
- creates ResourceClaims (the resources a workload needs) and attaches them to the workload (Pod)
- DeviceClass CEL: a filter over the available devices
- ResourceClaim CEL: workload-specific hardware attributes. From the acceptable devices, which exact ones do I want right now?
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: my-pizza-order
spec:
  devices:
    requests:
    - name: pizza
      exactly:
        deviceClassName: pizza-vegetarian
        selectors:
        - cel:
            expression: |-
              device.attributes["kitchen.pizza.example"].cheese == "mozzarella" &&
              device.attributes["kitchen.pizza.example"].toppings.contains("mushroom") &&
              device.attributes["kitchen.pizza.example"].extraCheeseAvailable == true
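Attaching the claim to a workload happens in the Pod spec: the Pod lists the claim under spec.resourceClaims, and each container that needs the device references that entry by name. A minimal sketch, assuming the my-pizza-order claim already exists in the same namespace; the Pod name, container name, and image are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pizza-eater   # hypothetical Pod name
spec:
  resourceClaims:
  # Local name "pizza" points at the existing ResourceClaim object
  - name: pizza
    resourceClaimName: my-pizza-order
  containers:
  - name: app
    image: busybox:1.36   # placeholder image
    command: ["sleep", "3600"]
    resources:
      claims:
      # Give this container access to the devices allocated for "pizza"
      - name: pizza
```

Pointing at an existing ResourceClaim means every Pod that references it shares the allocated devices; a Pod that should get its own claim would use resourceClaimTemplateName with a ResourceClaimTemplate instead.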
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: claim-cpu-capacity-20
spec:
  devices:
    requests:
    - name: numa0-cpus
      exactly:
        deviceClassName: dra.cpu
        capacity:
          requests:
            dra.cpu/cpu: "10"
        selectors:
        - cel:
            expression: device.attributes["dra.cpu"].numaNodeID == 0
    - name: numa1-cpus
      exactly:
        deviceClassName: dra.cpu
        capacity:
          requests:
            dra.cpu/cpu: "10"
        selectors:
        - cel:
            expression: device.attributes["dra.cpu"].numaNodeID == 1

Pod: references a ResourceClaim
ResourceClaim: DeviceClass + CEL constraints
DeviceClass: cluster-wide device filter
ResourceSlice: devices advertised by the DRA driver
Scheduler: matches the claim to a node
ResourceClaim is allocated: its status is updated by the scheduler
DRA driver on the node: NodePrepareResources is called
Kubelet mounts the devices into the Pod sandbox
Pod starts: device ready, CDI injected
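The first step of this flow, a Pod referencing a claim, also lets containers split one claim by request. A sketch for the claim-cpu-capacity-20 example, assuming a Kubernetes version whose container claims support the request field; the Pod name, container names, and image are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: numa-pinned-app   # hypothetical Pod name
spec:
  resourceClaims:
  # Local name "cpus" refers to the existing claim-cpu-capacity-20 object
  - name: cpus
    resourceClaimName: claim-cpu-capacity-20
  containers:
  # Each container receives only the devices allocated for one request
  - name: worker-numa0
    image: busybox:1.36   # placeholder image
    command: ["sleep", "3600"]
    resources:
      claims:
      - name: cpus
        request: numa0-cpus
  - name: worker-numa1
    image: busybox:1.36   # placeholder image
    command: ["sleep", "3600"]
    resources:
      claims:
      - name: cpus
        request: numa1-cpus
```

Leaving out request would give a container everything allocated for the claim, i.e. the results of both NUMA requests.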
DRA isn't only about GPUs
CPU, memory, hugepages, NICs, RDMA network devices, etc.
Migration from Device plugins to DRA
DRA
By fmuyassarov