controller-runtime now defaults LeaderElectionResourceLock to
leases and we had missed the migration to it properly.
Update the RBAC rules to get our controllers to write their
leader election locks to leases.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
SGX Admission webhook was quickly forked from FPGA's
implementation. After a bit of thinking, it turns out
leader election and metrics are not necessary for a
(idempotent) webhook-only functionality.
For FPGA Admission webhook, the metrics isn't correctly
set up so it's better to disable the functionality. Leader
election is kept but the flag name is renamed to align with
"kubebuilder v3 functionality" similar to how we changed it
to the operator as well.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
kubebuilder v3 based scaffolding has updated many things
and they are documented in [1].
Update operator's functionality to v3 level. We've done
most/some of the changes earlier (e.g., by not using
deprecated k8s APIs anymore) so the changes are minimal.
[1] https://book.kubebuilder.io/migration/v2vsv3.html
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
This sample nfd cr can be used to make a new nfd instance with intel plugins support in openshift.
Signed-off-by: Manish Regmi <manish.regmi@intel.com>
* run the sgx container as container_device_plugin_t and init container
as container_device_plugin_init_t. these labels are being added to
container_selinux package upstream.
* add rbac role for openshift
Signed-off-by: Manish Regmi <manish.regmi@intel.com>
Start using the newly created NodeFeatureRule configs with SGX.
This allows to drop the custom worker config.
Additionally, split the example NFD deployment into two steps
1) plain NFD (+SGX json patches)
2) NodeFeatureRule creation
NodeFeatureRule creation is not guaranteed to succeed when it's
part of the same kustomization with the CRD creation. Users may
also have NFD already running so allowing 2) alone works better
in that scenario.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Intel GPUs come at least in two classes: "0300" and 0380". Desktop GPUs with
3D / display support are in "0300" category, server/compute GPUs without
those are in "0380" category.
"0380" is missing so add it.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
PR #753 had a huge mistake that changed operator manifest yaml file.
Some part was unintentionally copied and pasted, and no one noticed.
Therefore, this commit replaces the yaml file with the command "operator-sdk generate".
Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
NFD master and the upcoming release v0.10.0 dropped the
"custom-" prefix from custom labels. Update the default
SgxDevicePlugin sample accordingly.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Previously, the SGX plugin deployment pulled in NFD and
SGX webhook as well. This triggered kustomize issues when
trying to get everything under the same namespace.
This commit splits the three deployments into their own steps.
It allows to keep the static parts part of [Before|After]Each
and helps to build SGX plugin/application test cases more
easily.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
There are a few things left un-renamed after \#771.
Rename those to idxd-config-initcontainer.
Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
In order to make controllers consistent, I add a nodeselector constraint of daemonset to dlb, fpga, qat too.
Since the same code is commonly used in many files, I add a function that replaces duplicated code.
Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
Resources in clusters with OwnerReferencesPermissionEnforcement
(e.g., OpenShift) get stricter checks for metadata.ownerReferences.
This appears via errors like:
“is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to
a resource you can’t set finalizers on: ...”
The fix is to add "update" permissions to finalizers subresource
for the xDevicePlugins resources.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
To make QAT plugin deployment consistent with the other plugins
we update the default flags and deploy without the flag settings
provided by the ConfigMap.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
1. Implement PreferredAllocator interface.
2. Provide 3 preferred allocation policies: balancedPolicy, packedPolicy and nonePolicy.
3. Provide the cmdline interface: -allocation-policy balanced/packed/none, to select which preferred allocation policy to use.
4. Add operator support.
Co-authored-by: Mikko Ylinen <mikko.ylinen@intel.com>
The provisioning config can be optionally stored in the ProvisioningConfig
configMap which is then passed to initcontainer through the volume mount.
There's also a possibility for a node specific congfiguration through
passing a nodename via NODE_NAME into initcontainer's environment
and passing a node specific profile via configMap volume mount.
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
To simplify the e2e node setup, change the QAT tests to deploy with
the sriov_numvfs overlay.
Moreover, as we are seeing the vfio-pci driver becoming built-in and
requiring opt-in parameters depending on the kernel version, it's
better to move the vfio-pci initcontainer step(s) to kernel cmdline/
modules-load.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
controller-gen v0.7.0 dropped the support for v1beta1 CRD API as it
was also dropped in k8s.io v1.22.
update 'make generate' to only allow v1 CRD APIs and run it with
controller-gen v0.7.0.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Previously idxd kernel module instantiated some
default DSA devices and workqueues on boot.
This is a sample deployment that provisions DSA devices and
workqueues for intel-dsa-plugin with accel-config utility
through initcontainer.
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
The devices enabled by default are different between the
kustomize and operator based deployments.
This change harmonizes the defaults to c6xxvf and 4xxxvf
in both deployment options.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
All but one (VPU) of the published container images can be built with
static binaries which allows us to use distroless/static as the
base image. Moreover, when combined with stripping the plugin binaries,
we can get both build time and image size savings.
This is the part 1 (out of 2) of the rework. Part 2 will finish the
change by making some adjustments to VPU plugin image and moving the
FPGA/SGX/GPU initcontainers to distroless/static too.
Partial: #516
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
Add a patch to operator's manager.yaml to add "--device fpga"
command line in orfer to enable per device deployment.
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
We have been getting reports about the operator getting killed
with an OOMKilled reason. This indicates we consume more memory
than what the resource limit states.
Bump up the memory limit to 50M.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
In order to get rid of deprecation warnings when deploying the operator,
move away from v1beta1 in RBAC API.
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
The device plugins daemonsets are cluster wide and currently only
one device plugin instance per device is possible so making the
corresponding deviceplugin/v1 CRDs non-namespaced (i.e., scope: cluster)
fits better.
Previously, the device plugin daemonset was deployed in the same
namespace as the CR for that device but with the cluster scoped CRDs
we default to use the same namespace as the operator, unless overridden
via DEVICEPLUGIN_NAMESPACE env variable or a command line parameter
to operator manager deployment.
Three additional changes in this commit:
- enable DSA envtest tests
- update controller-runtime to v0.8.1
- change device plugin envtest suite to use klog/v2
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
It looks that for a long time now we have accepted a setup where a valid QAT
device ID is accepted as a QAT device resource even though the device is
not "enabled" via kernelVfDrivers parameter.
Fix device ID validation to skip valid QAT devices that are not
explicitly specified in kernelVfDrivers.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
The plugin now detects/accepts 4xxx and c4xxx devices too
and defaults to those drivers that are part of Linux mainline.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
The SGX device nodes have changed from /dev/sgx/[enclave|provision]
to /dev/sgx_[enclave|provision] in v4x RFC patches according to the
LKML feedback.
This changes moves to use the new device nodes. Backwards compatibility
is provided by adding /dev/sgx directory mount to containers. This
assumes the cluster admin has installed the udev rules provided in the
README to make the old device nodes as symlinks to the new device nodes.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
This adds the initImage field to the custom resource definition
and takes it into use.
The fpga webhook image validation function is split off into a
separate file.
Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
SGX aesmd (architectural enclave service daemon) can be used for SGX
DCAP Quote Generation. This commit adds a sample deployment that by
default talks to an Intel reference PCCS (Provisioning Certificate
Caching Service).
The default config provided is for a "single node" cluster that has
PCCS service localhost.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
With the addition of SGX webhook in the operator, full SGX stack
depends on having the operator deployed first. SgxDevicePlugin CRD
is set to get intel-sgx-plugin and intel-sgx-initcontainer deployed
by the operator.
As a pre-requisite, node-feature-discovery must be deployed but it
is currently deployed via sgx_plugin kustomization overlay only.
It's better to allow NFD with the SGX specific settings deployed with
a kustomization of its own.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
We currently build using trivialVersions=true and don't deal with
multiversion APIs and their conversion webhooks.
Therefore, drop the registration of the conversion webooks.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
This adds an nfd-hook for the gpu-plugin, which will create labels
for the GPUs that can then be used for POD deployment purposes or
creation of GPU extended resources which allow then finer grained
GPU resource management.
The nfd-hook will install to the host system when the
intel-gpu-initcontainer is run. It is added into the plugin deployment
yaml.
Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
This commit adds two initcontainers in a kustomize overlay to QAT
deployment. The overlay can be used to prepare QAT setup on a freshly
booted system.
Note: containerd/cri-o seem to have issues mounting sysfs rw in even
if the container is privileged. Therefore, we do a special /sys:/sys
bind mount for 'cat sriov_totalvs | tee sriov_numvfs' to work.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
this commits also changes validatePluginImage() to allow
image version as a parameter so that it can be used by by
other webooks too.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Some Ubuntu systems may run with Apparmor LSM policy enformements making
the default QAT daemonset to fail with (un)bind errors.
This commit adds a sample kustomize overlay to deploy the QAT daemonset with
Apparmor uconfined policy.
Fixes: #381
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>