There's no mapping available from IP block versions to actual product
features, which makes these version numbers fairly useless for end
users.
In mixed GPU clusters, running a job that adds/updates node labels for
the relevant GPU features on each relevant node would be much more
user-friendly. This could be done easily by converting the output of a
given GPU API capability tool (e.g. "vainfo" for VA-API, "clinfo" for
OpenCL) to an NFD feature file.
(Such a thing would be outside the scope of this project though,
except maybe as an example / test-case.)
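As a rough illustration of the conversion idea above, the following
minimal sketch turns "profile : entrypoint" lines, as printed by
vainfo, into "name=value" lines suitable for an NFD feature file. The
"vaapi." prefix and the resulting label names are assumptions made up
for this example, not something this project defines.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strings"
)

// featureLines converts vainfo-style "VAProfileX : VAEntrypointY" lines
// into NFD feature-file lines. The "vaapi." naming scheme is hypothetical.
func featureLines(vainfoOutput string) []string {
	seen := map[string]bool{}
	scanner := bufio.NewScanner(strings.NewReader(vainfoOutput))
	for scanner.Scan() {
		parts := strings.SplitN(scanner.Text(), ":", 2)
		if len(parts) != 2 {
			continue // not a "key : value" line
		}
		profile := strings.TrimSpace(parts[0])
		entrypoint := strings.TrimSpace(parts[1])
		// Skip banner lines such as version or driver information.
		if !strings.HasPrefix(profile, "VAProfile") ||
			!strings.HasPrefix(entrypoint, "VAEntrypoint") {
			continue
		}
		seen[fmt.Sprintf("vaapi.%s.%s=true", profile, entrypoint)] = true
	}
	lines := make([]string, 0, len(seen))
	for line := range seen {
		lines = append(lines, line)
	}
	sort.Strings(lines) // stable output regardless of map iteration order
	return lines
}

func main() {
	sample := "vainfo: VA-API version: 1.14\n" +
		"      VAProfileH264Main               :\tVAEntrypointVLD\n" +
		"      VAProfileH264Main               :\tVAEntrypointEncSlice\n"
	for _, line := range featureLines(sample) {
		fmt.Println(line)
	}
}
```

A job would write these lines to a file in the directory that NFD's
local feature source reads, and NFD would then publish them as node
labels.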
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
The kubebuilder v3 based scaffolding has updated many things;
the changes are documented in [1].
Update the operator's functionality to the v3 level. Much of this
was already done earlier (e.g., by not using deprecated k8s APIs
anymore), so the changes here are minimal.
[1] https://book.kubebuilder.io/migration/v2vsv3.html
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
QAT_401xx is a derivative of 4xxx. Add support for that device
by including the device IDs (both PF and VF).
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Containers running on QAT Gen4 should be based on qatlib and therefore
kerneldrv is not the right mode. Skip registering 4xxx* devices to
ensure that mode is not used for them.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
grpc-go v1.43.0 deprecated grpc.WithInsecure() in favor of
insecure.NewCredentials(). Move to use the recommended approach
and drop the linter annotations.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Move the reallocation logic to GetPreferredAllocation and simplify
Allocate to use the device IDs given by the kubelet.
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
Add functionality to convert a container's tile annotation
into the corresponding Level Zero (L0) affinity mask. This helps to
target the container's workload at specific L0 subdevices.
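To illustrate the conversion, here is a minimal sketch that builds a
value in the "device.subdevice" list format used by Level Zero's
ZE_AFFINITY_MASK environment variable. The annotation format is not
described in this message, so the input is an already-parsed,
hypothetical structure: tiles[i] holds the tile indexes requested on
the i-th GPU visible to the container.

```go
package main

import (
	"fmt"
	"strings"
)

// affinityMask builds a "device.subdevice,..." mask string.
// The input layout is an assumption made for this example; the real
// annotation parsing is not shown here.
func affinityMask(tiles [][]int) string {
	var parts []string
	for device, deviceTiles := range tiles {
		for _, tile := range deviceTiles {
			parts = append(parts, fmt.Sprintf("%d.%d", device, tile))
		}
	}
	return strings.Join(parts, ",")
}

func main() {
	// Tiles 0 and 1 on the first GPU, tile 2 on the second GPU.
	fmt.Println("ZE_AFFINITY_MASK=" + affinityMask([][]int{{0, 1}, {2}}))
}
```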
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
Start using the newly created NodeFeatureRule configs with SGX.
This allows dropping the custom worker config.
Additionally, split the example NFD deployment into two steps:
1) plain NFD (+SGX json patches)
2) NodeFeatureRule creation
NodeFeatureRule creation is not guaranteed to succeed when it's
part of the same kustomization as the CRD creation. Users may
also have NFD already running, so allowing step 2) to be applied
alone works better in that scenario.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
The webhooks' default deployments depend on cert-manager. Our existing
documentation points to a specific cert-manager version giving users
the impression that it should be used. However, that is not the case.
Update the documentation so that we just point to the cert-manager
installation page. With this, we don't have to hard-code any
specific version.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
This adds a new label "gpu-numbers" for short numbered lists of
GPUs, omitting "card" from the names. It also adds splitting of long
label values.
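A minimal sketch of the idea, not the plugin's actual code: strip
"card" from the device names, join the numbers, and split the result
so that no chunk exceeds the 63-character limit Kubernetes places on
label values. The "." separator and the way chunks would be spread
over label names are assumptions for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// maxLabelValueLen is the Kubernetes limit for a label value.
const maxLabelValueLen = 63

// gpuNumbers drops the "card" prefix from each device name.
func gpuNumbers(cards []string) []string {
	nums := make([]string, 0, len(cards))
	for _, card := range cards {
		nums = append(nums, strings.TrimPrefix(card, "card"))
	}
	return nums
}

// splitJoined joins items with sep into chunks of at most max bytes,
// breaking only between items so every chunk starts and ends with an
// item. A single item longer than max is not handled in this sketch.
func splitJoined(items []string, sep string, max int) []string {
	var chunks []string
	current := ""
	for _, item := range items {
		switch {
		case current == "":
			current = item
		case len(current)+len(sep)+len(item) <= max:
			current += sep + item
		default:
			chunks = append(chunks, current)
			current = item
		}
	}
	if current != "" {
		chunks = append(chunks, current)
	}
	return chunks
}

func main() {
	nums := gpuNumbers([]string{"card0", "card1", "card2", "card3"})
	for i, chunk := range splitJoined(nums, ".", maxLabelValueLen) {
		fmt.Printf("chunk %d: %s\n", i, chunk)
	}
}
```

Breaking only between items keeps each chunk a valid label value on
its own, since a value may not begin or end with the separator.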
Similarly, this adds a new label "pci-groups" for PCI groups. Grouping
can be controlled with the env var GPU_PCI_GROUPING_LEVEL. The env var
dictates how many pci-folder names need to match in order for GPUs
to be considered to belong to the same group.
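The grouping rule can be sketched as follows. This is only an
illustration under assumptions: the input is a hypothetical map from
card name to the list of PCI folder names on its device path, and two
cards land in the same group when their first `level` folder names
are equal. How the plugin obtains the folder names and formats the
label value is not shown.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// groupByPCIPrefix groups cards whose first `level` PCI folder names
// match. The input layout is an assumption made for this example.
func groupByPCIPrefix(pciFolders map[string][]string, level int) [][]string {
	if level < 0 {
		level = 0
	}
	byPrefix := map[string][]string{}
	for card, folders := range pciFolders {
		prefix := folders
		if len(prefix) > level {
			prefix = prefix[:level]
		}
		key := strings.Join(prefix, "/")
		byPrefix[key] = append(byPrefix[key], card)
	}
	keys := make([]string, 0, len(byPrefix))
	for key := range byPrefix {
		keys = append(keys, key)
	}
	sort.Strings(keys) // deterministic group order
	groups := make([][]string, 0, len(keys))
	for _, key := range keys {
		sort.Strings(byPrefix[key])
		groups = append(groups, byPrefix[key])
	}
	return groups
}

func main() {
	folders := map[string][]string{
		"card0": {"0000:00:01.0", "0000:01:00.0"},
		"card1": {"0000:00:01.0", "0000:02:00.0"},
		"card2": {"0000:00:02.0", "0000:03:00.0"},
	}
	fmt.Println(groupByPCIPrefix(folders, 1))
	fmt.Println(groupByPCIPrefix(folders, 2))
}
```

With a level of 1, the first two cards share a group because they sit
under the same first folder; with a level of 2, every card ends up in
its own group.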
Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
If the kernel has CONFIG_VFIO_NOIOMMU enabled and the node admin
has explicitly set enable_unsafe_noiommu_mode VFIO parameter,
VFIO taints the kernel and writes "vfio-noiommu" to the IOMMU
group name. If these conditions are true, the /dev/vfio/ devices
are prefixed with "noiommu-".
This use case is documented for DPDK, so we don't want to break
it. It was broken before because we added DeviceMounts for
/dev/vfio/<iommugroup> files that did not exist.
See DPDK documentation for further information and warnings.
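The path selection described above can be sketched like this. The
helper name and its inputs are hypothetical; how the IOMMU group
number and group name are read from the system is left to the caller.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// vfioDevicePath returns the /dev/vfio node to mount for an IOMMU
// group. In no-IOMMU mode the group is named "vfio-noiommu" and the
// device node carries a "noiommu-" prefix.
// Hypothetical helper for illustration only.
func vfioDevicePath(group, groupName string) string {
	// A name read from a file usually ends with a newline.
	if strings.TrimSpace(groupName) == "vfio-noiommu" {
		return filepath.Join("/dev/vfio", "noiommu-"+group)
	}
	return filepath.Join("/dev/vfio", group)
}

func main() {
	fmt.Println(vfioDevicePath("42", ""))
	fmt.Println(vfioDevicePath("0", "vfio-noiommu\n"))
}
```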
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
While the labeling limit is obvious after a little thought, IMHO
limitations like this should either be stated up front or be in
their own section in the README. This commit does the former for the
GPU plugin fractional resources, and the latter for the NFD hook /
labeling.