Manage Ascend NPU quota with Kueue
This page shows how to use Alauda Build of Kueue to govern quota for Ascend NPU
resources exposed by the Ascend Device Plugin (shipped as part of the NPU
Operator). Kueue treats huawei.com/Ascend910 and similar device-plugin
resources as ordinary countable resources, so the same ClusterQueue /
ResourceFlavor / LocalQueue model used for CPU, memory and Nvidia GPUs
applies unchanged.
Prerequisites
- You have installed the Alauda Build of Kueue cluster plugin.
- You have installed the Alauda Build of NPU Operator cluster plugin with the Driver, Ascend Device Plugin and Ascend Docker Runtime components enabled in the deployment form (they are enabled by default). These three toggles are what make the cluster surface huawei.com/Ascend910 (or huawei.com/Ascend310P) as a schedulable extended resource on the Ascend nodes. See Install NPU Operator for the full procedure and the meaning of each deployment-form toggle.
- The Ascend Device Plugin is reporting NPU capacity on the node, e.g. kubectl describe node <ascend-node> shows huawei.com/Ascend910: <N> under both Capacity and Allocatable.
- The Alauda Container Platform Web CLI can communicate with your cluster.
Procedure
1. Identify the NPU node label
The ResourceFlavor selects which node pool the quota applies to via
spec.nodeLabels. The NPU Operator adds the following labels to every node it
manages — pick whichever fits your fleet:
The rest of this page uses accelerator: huawei-Ascend910.
2. Create the Kueue objects
Create a ResourceFlavor bound to the NPU node pool, a ClusterQueue that
declares an NPU quota, and a LocalQueue in the namespace where users will
submit jobs.
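A minimal sketch of such a manifest follows. It assumes the kueue.x-k8s.io/v1beta1 API version is served by your Kueue installation. The object names (ascend910-flavor, npu-cluster-queue, npu-local-queue), the npu-jobs namespace and the CPU and memory quota values are hypothetical placeholders, so adjust them to your environment.

```yaml
# ResourceFlavor: ties the quota to the Ascend 910 node pool.
apiVersion: kueue.x-k8s.io/v1beta1   # assumed API version; check your install
kind: ResourceFlavor
metadata:
  name: ascend910-flavor             # hypothetical name
spec:
  nodeLabels:
    accelerator: huawei-Ascend910    # label used throughout this page
---
# ClusterQueue: declares the NPU quota together with CPU and memory.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: npu-cluster-queue            # hypothetical name
spec:
  namespaceSelector: {}              # accept LocalQueues from any namespace
  resourceGroups:
    - coveredResources: ["cpu", "memory", "huawei.com/Ascend910"]
      flavors:
        - name: ascend910-flavor
          resources:                 # same order as coveredResources
            - name: cpu
              nominalQuota: 16       # assumed value
            - name: memory
              nominalQuota: 64Gi     # assumed value
            - name: huawei.com/Ascend910
              nominalQuota: 2        # total NPU cards the queue may hand out
---
# LocalQueue: the namespaced entry point that users reference from their Jobs.
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: npu-local-queue              # hypothetical name
  namespace: npu-jobs                # hypothetical namespace; must already exist
spec:
  clusterQueue: npu-cluster-queue
```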
- nodeLabels: Restricts the flavor to nodes labeled by the NPU Operator as Ascend 910. Replace with the label that matches your fleet (see the table above).
- coveredResources: Lists the device-plugin resource alongside CPU and memory so that all three are accounted for in admission decisions.
- nominalQuota: Total NPU cards the queue may hand out at any time. Set this to (or below) the sum of allocatable NPUs across the nodes selected by the flavor.
Apply the manifest with kubectl apply -f <filename>.yaml.
3. Submit an NPU job through the queue
Add the kueue.x-k8s.io/queue-name label and request NPUs in the container
resources block. Kueue's built-in batch/job integration keeps the Job
suspended and unsuspends it only after the quota check passes.
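A minimal sketch of such a Job follows, assuming a LocalQueue named npu-local-queue in a namespace named npu-jobs (both hypothetical). The Job name, the image placeholder, the command and the CPU and memory values are also illustrative assumptions. Substitute an image that contains the Ascend runtime libraries your workload needs.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: npu-sample-job                 # hypothetical name
  namespace: npu-jobs                  # hypothetical namespace
  labels:
    kueue.x-k8s.io/queue-name: npu-local-queue   # routes the Job to the LocalQueue
spec:
  suspend: true                        # Kueue sets this to false on admission
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: npu-workload
          image: "<your-ascend-image>" # placeholder; replace with a real image
          # Illustrative command: list the NPU device nodes, then idle briefly.
          command: ["sh", "-c", "ls /dev/davinci* && sleep 60"]
          resources:
            requests:
              cpu: "1"                 # assumed value
              memory: 2Gi              # assumed value
              huawei.com/Ascend910: "2"
            limits:
              cpu: "1"
              memory: 2Gi
              huawei.com/Ascend910: "2"   # must equal the request
```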
- queue-name label: Routes the Job to the LocalQueue you created above.
- suspend: true: Required so Kueue can hold the Job until quota is available. Kueue will set this back to false on admission.
- NPU request: Use huawei.com/Ascend910 (or huawei.com/Ascend910B4, etc.) to match the resource name reported by the Ascend Device Plugin on your nodes. Request and limit must be equal because this is an extended resource.
4. Observe admission
Submit two such Jobs each requesting 2 NPUs against a queue whose
nominalQuota is 2.
The first Job is admitted and runs as a Pod that receives /dev/davinci*
devices and ASCEND_VISIBLE_DEVICES from the Ascend Device Plugin. The second
Job stays suspended, and its Workload reports the quota shortage:
Inspecting the held Workload makes the reason explicit:
The status.conditions field reports the quota shortage:
Once the first Job finishes or is deleted, Kueue automatically admits the held Workload and unsuspends the second Job, and the Ascend Device Plugin assigns the freed NPU devices to its Pod.
Notes
- The ClusterQueue's nominalQuota for huawei.com/Ascend910 is enforced independently of the node's Allocatable. If Allocatable is lower than the quota (for example because the device plugin marks some chips unhealthy), an admitted Pod can still end up Pending at the kube-scheduler layer with Insufficient huawei.com/Ascend910. Keep nominalQuota ≤ the sum of Allocatable across the selected nodes.
- The same pattern applies to other Ascend resource names exposed by the device plugin (huawei.com/Ascend910B4, huawei.com/Ascend310P, …). Add them to coveredResources and to the container resources block as needed.
- Kueue's other features (fair sharing, cohorts, gang scheduling, preemption) work with NPU resources without any extra configuration.
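As an illustration of the note on other Ascend resource names, the sketch below extends a ClusterQueue to cover huawei.com/Ascend310P through a second flavor. It assumes a ResourceFlavor named ascend910-flavor already exists for the 910 pool and that 310P nodes carry an accelerator: huawei-Ascend310P label. The flavor and queue names, that label and all quota values are hypothetical, so verify the actual node labels before using it. Each flavor lists every covered resource and sets a quota of 0 for the card type its nodes do not have.

```yaml
apiVersion: kueue.x-k8s.io/v1beta1   # assumed API version; check your install
kind: ResourceFlavor
metadata:
  name: ascend310p-flavor            # hypothetical name
spec:
  nodeLabels:
    accelerator: huawei-Ascend310P   # assumed label; confirm on your nodes
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: npu-cluster-queue            # hypothetical name
spec:
  namespaceSelector: {}
  resourceGroups:
    - coveredResources: ["cpu", "memory", "huawei.com/Ascend910", "huawei.com/Ascend310P"]
      flavors:
        - name: ascend910-flavor     # assumed to exist already
          resources:
            - name: cpu
              nominalQuota: 16       # assumed value
            - name: memory
              nominalQuota: 64Gi     # assumed value
            - name: huawei.com/Ascend910
              nominalQuota: 2
            - name: huawei.com/Ascend310P
              nominalQuota: 0        # no 310P cards in the 910 pool
        - name: ascend310p-flavor
          resources:
            - name: cpu
              nominalQuota: 8        # assumed value
            - name: memory
              nominalQuota: 32Gi     # assumed value
            - name: huawei.com/Ascend910
              nominalQuota: 0        # no 910 cards in the 310P pool
            - name: huawei.com/Ascend310P
              nominalQuota: 4        # assumed value
```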