Commit Graph

28 Commits

Author SHA1 Message Date
Tuomas Katila
6d9e96856d operator: modify service accounts and role bindings to be shared
Additional objects are shared between device plugin CRs. Once the last
CR is removed, the additional objects are also removed.

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-11-10 12:31:19 +02:00
Tuomas Katila
f9221c46fd operator: remove one-cr-per-kind limitation
Differentiate objects by adding cr names as suffixes
Drop kind book keeping and related functions from controllers

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-11-09 13:05:40 +02:00
Mikko Ylinen
48fd7b82fe controllers: use const appLabel in tests
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2023-10-30 13:43:18 +02:00
Tuomas Katila
a15c84c81e operator: fix controllers indicating changes when there are none
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-10-23 11:03:14 +03:00
Hyeongju Johannes Lee
20caa42e7a operator: add ctx to func UpgradeImages and logger for env vars of images
Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
2023-09-18 12:04:32 -07:00
Tuomas Katila
691dfc3483 gpu: refactor nfdhook functionality to plugin
NFD v0.14+ doesn't support binary NFD hooks by default, so there is
a need to move the label creation away from the GPU nfdhook.

Move extended resource label creation to plugin, and drop labels that were
already marked deprecated (platform_gen, media_version etc.).

Drop init-container from deployment files and operator. It is still possible
to use an initcontainer, but the default deployments do not support it.

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-09-12 16:20:33 +03:00
Tuomas Katila
974829ff7c gpu: try to fetch PodList from kubelet API
In large clusters and with resource management, the load
from gpu-plugins can become heavy for the api-server.
This change will start fetching pod listings from kubelet
and use api-server as a backup. Any other error than timeout
will also move the logic back to using api-server.

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-03-30 12:43:02 +03:00
Eero Tamminen
2f3dc23651 Use more generic name for NFD features host directory volume
NFD hooks are deprecated and going away:
https://github.com/kubernetes-sigs/node-feature-discovery/issues/856

This makes the mount names more future-proof, and shows where later
changes need to be done (to change operator mount directory, and
switch hook-using deployments e.g. to feature files).

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
2023-02-08 18:20:41 +02:00
Manish Regmi
22e9d5f882 add selinux labels for GPU plugins 2022-09-15 14:44:51 -04:00
Ed Bartosh
3e2f9b1e80
Merge pull request #907 from mythi/PR-2022-016
operator: QAT and GPU controller updates
2022-03-01 17:43:49 +02:00
Ed Bartosh
13780a8cdc implement terrascan check
Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2022-03-01 15:54:28 +02:00
Mikko Ylinen
fb73e2ecb3 gpu: avoid slice realloc in GpuDevicePlugin controller
The amount of GPU plugin parameters has increased but the
args slice capacity has not been changed. Update it to avoid
slice reallocations.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2022-03-01 07:48:17 +02:00
Oleg Zhurakivskyy
cfc8eb18cb operator: Support upgrade of plugins
The upgrade of the deployed plugins can be done by simply installing
a new release of the operator.

The operator auto-upgrades operator-managed plugins (CR images
and thus corresponding deployed daemonsets) to the current release
of the operator.

The [registry-url]/[namespace]/[image] are kept intact on the upgrade.

No upgrade is done for:

- Non-operator managed deployments
- Operator deployments without numeric tags

Closes #702

Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
2022-02-18 12:52:55 +02:00
Ed Bartosh
cec004c398 lint: enable wsl check
Fixes: #392

Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2021-12-17 11:48:48 +02:00
Hyeongju Johannes Lee
251727a3db operator: add node selection constraint (amd64 arch)
In order to make controllers consistent, I add a nodeselector constraint of daemonset to dlb, fpga, qat too.
Since the same code is commonly used in many files, I add a function that replaces duplicated code.

Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
2021-12-02 08:54:50 -08:00
Mikko Ylinen
b63bb53057 operator: allow controllers to touch ownerReferences always
Resources in clusters with OwnerReferencesPermissionEnforcement
(e.g., OpenShift) get stricter checks for metadata.ownerReferences.

This appears via errors like:
“is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to
a resource you can’t set finalizers on: ...”

The fix is to add "update" permissions to finalizers subresource
for the xDevicePlugins resources.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-11-26 08:28:29 +02:00
Ed Bartosh
b9b2de7889 operator: test NewDaemonSet for all plugins
Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2021-11-22 20:20:34 +02:00
Ed Bartosh
b6caadfc63 operator: use go:embed to generate daemonset objects
Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2021-11-22 16:55:55 +02:00
Xu, Guoshu
e4c4a8f7ac GPU devices resource preferred allocation methods.
1. Implement PreferredAllocator interface.
2. Provide 3 preferred allocation policies: balancedPolicy, packedPolicy and nonePolicy.
3. Provide the cmdline interface: -allocation-policy balanced/packed/none, to select which preferred allocation policy to use.
4. Add operator support.

Co-authored-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-11-17 22:55:10 +08:00
Eero Tamminen
86a86e2863 Add "-enable-monitoring" GPU plugin option operator support
Based on Ukri's examples and tested by Ukri (thanks!).
2021-06-29 17:33:03 +03:00
Ukri Niemimuukko
efe6efdc0e fix flag value
Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2021-06-24 12:40:56 +03:00
Ukri Niemimuukko
cf0073878b address Eero's comments
Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2021-06-24 12:07:12 +03:00
Ukri Niemimuukko
39f7c4c747 gpu resource manager operator parts
Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2021-06-24 11:49:08 +03:00
Mikko Ylinen
37618d4f85 operator: move deviceplugin/v1 CRDs to cluster scope
The device plugins daemonsets are cluster wide and currently only
one device plugin instance per device is possible so making the
corresponding deviceplugin/v1 CRDs non-namespaced (i.e., scope: cluster)
fits better.

Previously, the device plugin daemonset was deployed in the same
namespace as the CR for that device but with the cluster scoped CRDs
we default to use the same namespace as the operator, unless overridden
via DEVICEPLUGIN_NAMESPACE env variable or a command line parameter
to operator manager deployment.

Three additional changes in this commit:
- enable DSA envtest tests
- update controller-runtime to v0.8.1
- change device plugin envtest suite to use klog/v2

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-02-11 11:41:47 +02:00
Dmitry Rozhkov
5f0da56045 Upgrade to k8s v1.19.3 2020-11-10 16:09:20 +02:00
Ukri Niemimuukko
c935570bab operator: GPU-plugin initImage
This adds the initImage field to the custom resource definition
and takes it into use.

The fpga webhook image validation function is split off into a
separate file.

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2020-11-09 20:55:12 +02:00
Dmitry Rozhkov
d27b12a925 operator: fix crash when assigning to nil map 2020-06-26 12:35:25 +03:00
Dmitry Rozhkov
6b2fa0a264 operator: initial version with gpu and qat controllers 2020-06-25 13:48:41 +03:00