DRA (Dynamic Resource Allocation)

Before

You ask for: nvidia.com/gpu: 1

There is no way to say:

  • I need this type of device
  • with these capabilities
  • prepared in this way

After

Instead of requesting a number, you describe what you actually need.

You can ask for: nvidia.com/gpu: 1, product A100-SXM4-40GB, with 40 GB of memory and 3456 FP64 cores

  • ResourceClaim = what you want
  • Driver = what exists
  • Scheduler = matches them
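
Before (a count only, in the container spec):

resources:
  limits:
    nvidia.com/gpu: 1

After (a ResourceClaim that describes the device):

# Sketch in the resource.k8s.io/v1beta1 shape; other API versions lay out requests differently.
# The driver domain and attribute names are illustrative; each DRA driver defines its own.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: a100-claim
spec:
  devices:
    requests:
      - name: gpu
        deviceClassName: gpu
        selectors:
          - cel:
              expression: device.attributes["gpu.example.com"].model == "A100"
          - cel:
              expression: device.capacity["gpu.example.com"].memory.compareTo(quantity("40Gi")) >= 0
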
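
The difference between the two requests above can be sketched in a few lines of Python. This is a toy model, not the Kubernetes scheduler: the Device and Claim classes, the "model" attribute, and both pick_* helpers are hypothetical names made up for illustration. It only shows why a plain count cannot tell an A100 from any other GPU, while a claim with requirements can.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class Device:
    name: str
    attributes: dict      # e.g. {"model": "A100"}; attribute names are assumed
    memory_bytes: int


@dataclass
class Claim:
    model: str            # required value of the "model" attribute
    min_memory_bytes: int # minimum device memory


def matches(device: Device, claim: Claim) -> bool:
    """True if the device satisfies every requirement in the claim."""
    return (device.attributes.get("model") == claim.model
            and device.memory_bytes >= claim.min_memory_bytes)


def pick_by_count(devices, count):
    """Count-based request: any `count` devices will do."""
    return devices[:count]


def pick_by_claim(devices, claim):
    """Claim-based request: only a device that satisfies the claim qualifies."""
    return [d for d in devices if matches(d, claim)][:1]


inventory = [
    Device("gpu-0", {"model": "T4"}, 16 * GIB),
    Device("gpu-1", {"model": "A100"}, 40 * GIB),
]
claim = Claim(model="A100", min_memory_bytes=40 * GIB)

by_count = pick_by_count(inventory, 1)
by_claim = pick_by_claim(inventory, claim)
print([d.name for d in by_count])   # the first GPU in the list, whatever it is
print([d.name for d in by_claim])   # the GPU that meets the requirements
```

With this inventory the count-based request gets gpu-0 (the T4), and the claim-based request gets gpu-1 (the A100).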
                    +-------------------+        +---------------------+        +----------------------+
                    |       Pod         |        |   ResourceClaim     |        |   DRA Driver         |
                    |-------------------|        |---------------------|        |----------------------|
                    | needs:            | -----> | requests:           | -----> | knows real devices   |
                    | - "GPU workload"  |        | - device type       |        | - GPU A100           |
                    |                   |        | - capabilities      |        | - FPGA X             |
                    |                   |        | - constraints       |        | - topology info      |
                    +-------------------+        +---------------------+        +----------------------+
                             |                              |                               |
                             |                              v                               |
                             |                  +---------------------+                     |
                             |                  |   Scheduler         |                     |
                             |                  |---------------------|                     |
                             |                  | finds matching node |                     |
                             |                  +---------------------+                     |
                             |                              |                               |
                             v                              v                               v
                                    +--------------------------------------------------+
                                    |           Allocated & Prepared Device            |
                                    |   (correct hardware, configured, ready to use)   |
                                    +--------------------------------------------------+

                                                   ↓

                                            🚀 Pod starts
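
The flow in the diagram can be walked through as a small simulation. Again this is only a sketch under simplifying assumptions: the node inventory is a plain dict standing in for what drivers report, find_allocation and run_pod are hypothetical helpers, and the "prepare" step is reduced to a comment.

```python
def find_allocation(nodes, wanted, allocated):
    """Return (node, device_name) for the first free device that has every wanted attribute."""
    for node, devices in nodes.items():
        for dev in devices:
            free = (node, dev["name"]) not in allocated
            if free and all(dev.get(k) == v for k, v in wanted.items()):
                return node, dev["name"]
    return None


def run_pod(pod, nodes, wanted, allocated):
    """Allocate, prepare, then start; the pod stays pending if nothing matches."""
    hit = find_allocation(nodes, wanted, allocated)
    if hit is None:
        return f"{pod}: pending (no matching device)"
    allocated.add(hit)  # the device is now reserved for this pod's claim
    node, dev = hit
    # A real driver would configure the hardware here (the "prepared" step).
    return f"{pod}: started on {node} with {dev}"


# What the drivers report: one T4 on node-a, one A100 on node-b.
nodes = {
    "node-a": [{"name": "gpu-0", "model": "T4"}],
    "node-b": [{"name": "gpu-0", "model": "A100"}],
}
allocated = set()

first = run_pod("pod-1", nodes, {"model": "A100"}, allocated)
second = run_pod("pod-2", nodes, {"model": "A100"}, allocated)
print(first)
print(second)
```

The first pod lands on node-b because that is where the matching device lives. The second pod asks for the same kind of device and stays pending, since the only A100 is already allocated.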

By fmuyassarov
