Fetches xelink topology information from xpu-manager's REST
interface and stores it as labels under NFD's features.d directory.
NFD then assigns the labels to the node. On exit, the sidecar
removes the label file from disk.
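
A minimal sketch of the label file handling described above, assuming
NFD's "name=value" feature file format. The file name, the label name
and the temporary directory are hypothetical placeholders, not the
sidecar's actual values.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
	"strings"
)

// formatLabels renders labels as sorted "name=value" lines, one label per line.
func formatLabels(labels map[string]string) string {
	names := make([]string, 0, len(labels))
	for name := range labels {
		names = append(names, name)
	}
	sort.Strings(names)

	var b strings.Builder
	for _, name := range names {
		fmt.Fprintf(&b, "%s=%s\n", name, labels[name])
	}
	return b.String()
}

// writeLabelFile writes the labels into dir and returns a cleanup
// function that removes the file again (to be called on exit).
func writeLabelFile(dir, fileName string, labels map[string]string) (func() error, error) {
	path := filepath.Join(dir, fileName)
	if err := os.WriteFile(path, []byte(formatLabels(labels)), 0o644); err != nil {
		return nil, err
	}
	return func() error { return os.Remove(path) }, nil
}

func main() {
	// A temporary directory stands in for NFD's feature file directory.
	dir, err := os.MkdirTemp("", "features.d")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer os.RemoveAll(dir)

	// Hypothetical file and label names, used for illustration only.
	cleanup, err := writeLabelFile(dir, "xpum-sidecar-labels.txt", map[string]string{
		"example.com/xe-links": "0.0-1.0",
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer cleanup()

	fmt.Println("labels written under", dir)
}
```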
Co-authored-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
initcontainer enables and configures VFs
- only first pf is used to configure a vf
- only one vf is configured from the pf
add dlb-initcontainer kustomize overlay
update CRD to have initImage
implement operator to run initcontainer
update e2e test to run initcontainer overlay
update envtest to test initimage
Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
Operator has used "gpu-manager" as part of the cluster object names
it creates. Kustomize based deployments can be aligned with that.
Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
With the latest version of controller-tools, we get to set
reinvocationPolicy tag so that we no longer have to add that
field manually in our Admission Webhook manifests.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
6.0.0 kernel doesn't seem to have 'drm' module anymore and it makes
more sense to depend on the i915 module.
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
controller-runtime now defaults LeaderElectionResourceLock to
leases and we had not properly completed the migration to it.
Update the RBAC rules to get our controllers to write their
leader election locks to leases.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
SGX Admission webhook was quickly forked from FPGA's
implementation. After a bit of thinking, it turns out
leader election and metrics are not necessary for a
(idempotent) webhook-only functionality.
For the FPGA Admission webhook, metrics are not correctly
set up so it's better to disable the functionality. Leader
election is kept but the flag is renamed to align with
"kubebuilder v3 functionality", similar to the change we made
to the operator as well.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
kubebuilder v3 based scaffolding has updated many things
and they are documented in [1].
Update operator's functionality to v3 level. We've done
most/some of the changes earlier (e.g., by not using
deprecated k8s APIs anymore) so the changes are minimal.
[1] https://book.kubebuilder.io/migration/v2vsv3.html
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
This sample NFD CR can be used to create a new NFD instance with Intel plugin support in OpenShift.
Signed-off-by: Manish Regmi <manish.regmi@intel.com>
* Run the SGX container as container_device_plugin_t and the init container
as container_device_plugin_init_t. These labels are being added to the
container_selinux package upstream.
* Add an RBAC role for OpenShift.
Signed-off-by: Manish Regmi <manish.regmi@intel.com>
Start using the newly created NodeFeatureRule configs with SGX.
This allows dropping the custom worker config.
Additionally, split the example NFD deployment into two steps:
1) plain NFD (+SGX json patches)
2) NodeFeatureRule creation
NodeFeatureRule creation is not guaranteed to succeed when it's
part of the same kustomization with the CRD creation. Users may
also have NFD already running so allowing 2) alone works better
in that scenario.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Intel GPUs come in at least two classes: "0300" and "0380". Desktop GPUs with
3D / display support are in "0300" category, server/compute GPUs without
those are in "0380" category.
"0380" is missing so add it.
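
A minimal sketch of a class check that accepts both categories. It
assumes the class is read in sysfs form (e.g. "0x030000"), where the
first four hex digits are the class and subclass. The function name is
made up for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// gpuClasses lists the PCI class+subclass codes treated as GPUs:
// "0300" (VGA compatible controller) and "0380" (other display controller).
var gpuClasses = map[string]bool{
	"0300": true,
	"0380": true,
}

// isGPUClass reports whether a sysfs-style PCI class value such as
// "0x030000" belongs to one of the GPU classes. A plain "0300" works too.
func isGPUClass(pciClass string) bool {
	c := strings.TrimPrefix(strings.ToLower(strings.TrimSpace(pciClass)), "0x")
	if len(c) < 4 {
		return false
	}
	return gpuClasses[c[:4]]
}

func main() {
	for _, class := range []string{"0x030000", "0x038000", "0x020000"} {
		fmt.Printf("%s is GPU class: %v\n", class, isGPUClass(class))
	}
}
```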
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
PR #753 mistakenly changed the operator manifest yaml file: part of it was
unintentionally copied and pasted, and no one noticed.
Therefore, this commit regenerates the yaml file with the command "operator-sdk generate".
Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
NFD master and the upcoming release v0.10.0 dropped the
"custom-" prefix from custom labels. Update the default
SgxDevicePlugin sample accordingly.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Previously, the SGX plugin deployment pulled in NFD and
SGX webhook as well. This triggered kustomize issues when
trying to get everything under the same namespace.
This commit splits the three deployments into their own steps.
It allows keeping the static parts in [Before|After]Each
and makes it easier to build SGX plugin/application test
cases.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
There are a few things left un-renamed after #771.
Rename those to idxd-config-initcontainer.
Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
To make the controllers consistent, add a daemonset nodeSelector constraint to dlb, fpga and qat too.
Since the same code is used in many files, add a function that replaces the duplicated code.
Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
Resources in clusters with OwnerReferencesPermissionEnforcement
(e.g., OpenShift) get stricter checks for metadata.ownerReferences.
This appears via errors like:
“is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to
a resource you can’t set finalizers on: ...”
The fix is to add "update" permissions to finalizers subresource
for the xDevicePlugins resources.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
To make QAT plugin deployment consistent with the other plugins
we update the default flags and deploy without the flag settings
provided by the ConfigMap.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
1. Implement PreferredAllocator interface.
2. Provide 3 preferred allocation policies: balancedPolicy, packedPolicy and nonePolicy.
3. Provide the cmdline interface: -allocation-policy balanced/packed/none, to select which preferred allocation policy to use.
4. Add operator support.
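
A rough sketch of how the three policies could differ. It assumes
device IDs of the form "<card>-<index>" and deliberately simple
semantics: packed fills one card before moving to the next, balanced
picks round-robin across cards, none states no preference. The
plugin's real policies may weigh devices differently, and the function
names here are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// cardOf returns the card part of a device ID such as "card0-1".
func cardOf(id string) string {
	if i := strings.LastIndex(id, "-"); i >= 0 {
		return id[:i]
	}
	return id
}

// preferred picks up to size device IDs from available using the given policy.
func preferred(policy string, available []string, size int) []string {
	ids := append([]string(nil), available...)
	sort.Strings(ids)
	if size < 0 {
		size = 0
	}
	if size > len(ids) {
		size = len(ids)
	}

	switch policy {
	case "packed":
		// Sorted IDs are grouped by card, so a prefix fills one card first.
		return ids[:size]
	case "balanced":
		// Group IDs per card, then take one ID from each card in turn.
		byCard := map[string][]string{}
		var cards []string
		for _, id := range ids {
			c := cardOf(id)
			if _, seen := byCard[c]; !seen {
				cards = append(cards, c)
			}
			byCard[c] = append(byCard[c], id)
		}
		picked := make([]string, 0, size)
		for len(picked) < size {
			for _, c := range cards {
				if len(picked) == size {
					break
				}
				if len(byCard[c]) > 0 {
					picked = append(picked, byCard[c][0])
					byCard[c] = byCard[c][1:]
				}
			}
		}
		return picked
	default:
		// "none": no preference is expressed.
		return nil
	}
}

func main() {
	available := []string{"card1-0", "card0-1", "card0-0", "card1-1"}
	for _, policy := range []string{"balanced", "packed", "none"} {
		fmt.Println(policy, preferred(policy, available, 2))
	}
}
```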
Co-authored-by: Mikko Ylinen <mikko.ylinen@intel.com>
The provisioning config can be optionally stored in the ProvisioningConfig
configMap which is then passed to initcontainer through the volume mount.
There's also a possibility for node-specific configuration by
passing a node name via NODE_NAME in the initcontainer's environment
and a node-specific profile via the configMap volume mount.
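
A small sketch of the node-specific lookup, assuming the mounted
configMap exposes one profile per key. The key names ("default.conf",
"<node name>.conf") are hypothetical and only illustrate the fallback
from a node-specific profile to a shared one.

```go
package main

import (
	"fmt"
	"os"
)

// selectProfile returns the profile key to use for nodeName. It prefers a
// node-specific key when one exists and otherwise falls back to the default.
func selectProfile(nodeName string, profiles map[string]bool) string {
	if nodeName != "" {
		key := nodeName + ".conf"
		if profiles[key] {
			return key
		}
	}
	return "default.conf"
}

func main() {
	// Keys as they might appear in the mounted configMap (hypothetical).
	profiles := map[string]bool{
		"default.conf": true,
		"node-a.conf":  true,
	}
	// NODE_NAME is expected to be set in the initcontainer's environment.
	fmt.Println(selectProfile(os.Getenv("NODE_NAME"), profiles))
}
```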
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
To simplify the e2e node setup, change the QAT tests to deploy with
the sriov_numvfs overlay.
Moreover, as we are seeing the vfio-pci driver becoming built-in and
requiring opt-in parameters depending on the kernel version, it's
better to move the vfio-pci initcontainer step(s) to kernel cmdline/
modules-load.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
controller-gen v0.7.0 dropped the support for v1beta1 CRD API as it
was also dropped in k8s.io v1.22.
Update 'make generate' to only allow v1 CRD APIs and run it with
controller-gen v0.7.0.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Previously idxd kernel module instantiated some
default DSA devices and workqueues on boot.
This is a sample deployment that provisions DSA devices and
workqueues for intel-dsa-plugin with accel-config utility
through initcontainer.
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
The devices enabled by default are different between the
kustomize and operator based deployments.
This change harmonizes the defaults to c6xxvf and 4xxxvf
in both deployment options.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
All but one (VPU) of the published container images can be built with
static binaries which allows us to use distroless/static as the
base image. Moreover, when combined with stripping the plugin binaries,
we can get both build time and image size savings.
This is part 1 (of 2) of the rework. Part 2 will finish the
change by making some adjustments to VPU plugin image and moving the
FPGA/SGX/GPU initcontainers to distroless/static too.
Partial: #516
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
Add a patch to operator's manager.yaml to add the "--device fpga"
command line argument in order to enable per-device deployment.
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
We have been getting reports about the operator getting killed
with an OOMKilled reason. This indicates we consume more memory
than what the resource limit states.
Bump up the memory limit to 50M.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
In order to get rid of deprecation warnings when deploying the operator,
move away from v1beta1 in RBAC API.
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
The device plugins daemonsets are cluster wide and currently only
one device plugin instance per device is possible so making the
corresponding deviceplugin/v1 CRDs non-namespaced (i.e., scope: cluster)
fits better.
Previously, the device plugin daemonset was deployed in the same
namespace as the CR for that device but with the cluster scoped CRDs
we default to use the same namespace as the operator, unless overridden
via DEVICEPLUGIN_NAMESPACE env variable or a command line parameter
to operator manager deployment.
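
A minimal sketch of the namespace resolution described above. The
precedence (command line value first, then DEVICEPLUGIN_NAMESPACE,
then the operator's own namespace) is an assumption, and the function
name and the "operator-namespace" value are placeholders.

```go
package main

import (
	"fmt"
	"os"
)

// resolveNamespace picks the namespace for the device plugin daemonsets.
// An explicit command line value wins over the environment variable, and
// the operator's own namespace is the fallback.
func resolveNamespace(flagValue, envValue, operatorNamespace string) string {
	if flagValue != "" {
		return flagValue
	}
	if envValue != "" {
		return envValue
	}
	return operatorNamespace
}

func main() {
	// The flag value would normally come from the manager's command line;
	// it is left empty here to show the environment/default fallback.
	ns := resolveNamespace("", os.Getenv("DEVICEPLUGIN_NAMESPACE"), "operator-namespace")
	fmt.Println("deploying device plugins to namespace:", ns)
}
```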
Three additional changes in this commit:
- enable DSA envtest tests
- update controller-runtime to v0.8.1
- change device plugin envtest suite to use klog/v2
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
It looks like for a long time now we have accepted a setup where a valid QAT
device ID is accepted as a QAT device resource even though the device is
not "enabled" via kernelVfDrivers parameter.
Fix device ID validation to skip valid QAT devices that are not
explicitly specified in kernelVfDrivers.
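
The intended check can be sketched as follows. The device ID to driver
table is an illustrative subset and the function name is hypothetical.
The point is only that a known QAT device ID is accepted when its
driver is also listed in kernelVfDrivers.

```go
package main

import "fmt"

// vfDrivers maps QAT VF PCI device IDs to their kernel VF driver names.
// This is an illustrative subset, not the plugin's full table.
var vfDrivers = map[string]string{
	"0443": "dh895xccvf",
	"37c9": "c6xxvf",
	"4941": "4xxxvf",
}

// isEnabledQATDevice reports whether deviceID is a known QAT VF whose
// driver has been explicitly enabled via kernelVfDrivers.
func isEnabledQATDevice(deviceID string, kernelVfDrivers []string) bool {
	driver, known := vfDrivers[deviceID]
	if !known {
		return false
	}
	for _, enabled := range kernelVfDrivers {
		if enabled == driver {
			return true
		}
	}
	return false
}

func main() {
	enabled := []string{"c6xxvf", "4xxxvf"}
	for _, id := range []string{"37c9", "0443", "ffff"} {
		fmt.Printf("%s accepted: %v\n", id, isEnabledQATDevice(id, enabled))
	}
}
```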
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
The plugin now detects/accepts 4xxx and c4xxx devices too
and defaults to those drivers that are part of Linux mainline.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>