Start using the newly created NodeFeatureRule configs with SGX.
This makes it possible to drop the custom worker config.
Additionally, split the example NFD deployment into two steps:
1) plain NFD (+ SGX JSON patches)
2) NodeFeatureRule creation
NodeFeatureRule creation is not guaranteed to succeed when it is
part of the same kustomization as the CRD creation. Users may
also have NFD already running, so being able to apply step 2) alone
works better in that scenario.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
The webhooks' default deployments depend on cert-manager. Our existing
documentation points to a specific cert-manager version, giving users
the impression that this version must be used. However, that is not the case.
Update the documentation so that it just points to the cert-manager
installation page. With this, we don't have to hard-code any
specific version.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
This adds a new label "gpu-numbers" for short numbered lists of
GPUs, omitting "card" from the names. It also adds splitting of long
label values.
Similarly, this adds a new label "pci-groups" for PCI groups. Grouping
can be controlled with the env var GPU_PCI_GROUPING_LEVEL. The env var
dictates how many PCI folder names need to match for GPUs
to be considered part of the same group.
Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
If the kernel has CONFIG_VFIO_NOIOMMU enabled and the node admin
has explicitly set the enable_unsafe_noiommu_mode VFIO parameter,
VFIO taints the kernel and writes "vfio-noiommu" to the IOMMU
group name. When both conditions hold, the /dev/vfio/ devices
are prefixed with "noiommu-".
This use case is documented for DPDK, so we don't want to break
it (as happened before, because we added DeviceMounts for
/dev/vfio/<iommugroup> files that did not exist).
See the DPDK documentation for further information and warnings.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
While the labeling limit is obvious after a little thought, IMHO
limitations like this should either be stated up front or be in
their own section in the README. This commit does the former for the GPU
plugin fractional resources, and the latter for the NFD hook / labeling.
Remove the sentence about the pre-built image, since a Docker Hub image
for the DLB plugin is available.
Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
There are a few things that were left unrenamed after #771.
Rename them to idxd-config-initcontainer.
Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
The new_id based driver binding fails on kernels 5.11+ when the
QAT VF is not bound to any driver: repeatedly writing the same
device ID to new_id fails with "file exists".
Move the new_id initialization to the beginning of the startup and
write the enabled device IDs only once.
This commit also fixes an issue where VF devices were not correctly detected
in virtual machines where the VF was not bound to any driver.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
After a closer review, it was noticed that some of the QAT dpdkdrv
unit tests need updating:
- "Broken igb_uio DPDKdriver..." actually tests an unknown device ID,
and we already have tests for that -> drop.
- "igb_uio DPDKdriver with one kernel bound device (not QAT device)"
tests something impossible: an unknown VF device ID originating from a
QAT PF -> drop.
- creating files for unbind/new_id etc. is unnecessary because
os.WriteFile() creates them during the tests -> drop these lines to
simplify unit test maintenance.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.42.0 to 1.43.0.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](https://github.com/grpc/grpc-go/compare/v1.42.0...v1.43.0)
---
updated-dependencies:
- dependency-name: google.golang.org/grpc
dependency-type: direct:production
update-type: version-update:semver-minor
...
---
In addition to the changes made by dependabot, I added nolint comments to ignore staticcheck (SA1019) errors.
This is because insecure.NewCredentials(), the recommended alternative, is still declared experimental.
So keep grpc.WithInsecure() with a nolint comment.
Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
In the capability files of the latest kernels, the GPU generation ("gen")
number is replaced with separate display, graphics, and media versions.
For compatibility with newer kernels, provide "gen" based on the new
labels (but without decimals), and for compatibility with older kernels,
provide the new labels based on "gen".
Because different kernels match different items in the action map,
the whole capability file gets parsed. Capability file parsing is
optimized by using a prefix check instead of scanf.
The "platform_gen" label is deprecated and can be dropped whenever it
becomes inconvenient (e.g. when lint complains about line count).
The cmdline flag descriptions referred to the old device nodes. With the
upstream driver, the device nodes are /dev/sgx_enclave and /dev/sgx_provision.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
To make the QAT plugin deployment consistent with the other plugins,
we update the default flags and deploy without the flag settings
provided by the ConfigMap.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
1. Implement the PreferredAllocator interface.
2. Provide 3 preferred allocation policies: balancedPolicy, packedPolicy and nonePolicy.
3. Provide the cmdline flag -allocation-policy balanced/packed/none to select which preferred allocation policy to use.
4. Add operator support.
Co-authored-by: Mikko Ylinen <mikko.ylinen@intel.com>
- used the same Go version as the project build
- used verbose output
- fixed gofmt check failures
Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
Commit 00a59e8f7d was incomplete: it did not update
the corresponding documentation. This commit fixes that.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
The devices enabled by default differ between the
kustomize and operator-based deployments.
This change harmonizes the defaults to c6xxvf and 4xxxvf
in both deployment options.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
This changes the memory reading to use the lmem_total_bytes
file instead of the addr_range file.
Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
Add govet-fieldalignment to .golangci.yml.
Fix the errors that come from enabling govet-fieldalignment:
- by reordering the fields of structs
- by adding nolint:govet annotations
Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>