17 Commits

Author SHA1 Message Date
Tuomas Katila
74006cda80 depl: drop capabilities from all plugins
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2025-01-02 15:42:32 +02:00
Tuomas Katila
518a8606ff gpu: add levelzero sidecar support for plugin and the deployment files
In addition to the levelzero's health data use, this adds support to
scan devices in WSL. Scanning happens by retrieving Intel device
indices from the Level-Zero API.

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2024-09-19 19:14:15 +03:00
Tuomas Katila
95b7230374 gpu: enable monitoring for the default installations
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-12-08 08:42:08 +02:00
Tuomas Katila
691dfc3483 gpu: refactor nfdhook functionality to plugin
NFD v0.14+ doesn't support binary NFD hooks by default, so there is
a need to move the label creation away from the GPU nfdhook.

Move extended resource label creation to plugin, and drop labels that were
already marked deprecated (platform_gen, media_version etc.).

Drop init-container from deployment files and operator. It is still possible
to use an initcontainer, but the default deployments do not support it.

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-09-12 16:20:33 +03:00
Tuomas Katila
cb04ca0deb deployments: move from 'patchesStrategicMerge' to 'patches'
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-08-03 10:37:44 +03:00
Tuomas Katila
ec2930b331 deployments: move from 'bases' to 'resources'
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-08-03 10:37:44 +03:00
Tuomas Katila
974829ff7c gpu: try to fetch PodList from kubelet API
In large clusters and with resource management, the load
from gpu-plugins can become heavy for the api-server.
This change will start fetching pod listings from kubelet
and use api-server as a backup. Any other error than timeout
will also move the logic back to using api-server.

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-03-30 12:43:02 +03:00
Tuomas Katila
3a1880ec8b Remove overlays using kube-system
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-02-13 12:47:22 +02:00
Tuomas Katila
d1e8350c6e gpu: add new nfd + monitoring + shared-dev deployment option
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-01-05 14:13:13 +02:00
Ukri Niemimuukko
1d09cd6549 align gpu kustomize object naming with operator naming
Operator has used "gpu-manager" as part of the cluster object names
it creates. Kustomize based deployments can be aligned with that.

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2022-09-26 19:50:55 +03:00
Tuomas Katila
8ecf258a82 gpu: add nodeSelector to fractional overlay
Updated documentation indicates that fractional overlay
uses nfd so maybe it should.

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2022-09-22 10:45:11 +03:00
Tuomas Katila
c562db9b28 gpu: Improve installation options and documentation
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2022-09-15 15:19:23 +03:00
Alex Nordlund
79986a6096 Replace to-be obsolete patchesStrategicMerge with patches
2022-08-25 21:07:32 +02:00
Alex Nordlund
0636e2d3a1 Replace obsolete patches with patchesStrategicMerge
This was made obsolete in v1.0.9
https://github.com/kubernetes-sigs/kustomize/blob/v1.0.9/pkg/types/kustomization.go#L129
And stopped working in v3.0.3
https://github.com/kubernetes-sigs/kustomize/issues/1373
2022-08-25 09:28:03 +02:00
Ukri Niemimuukko
2341119d5e remove monitoring arg from fractional resource overlay
An easter-egg slipped in the args. This removes it.

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2021-06-16 12:09:07 +03:00
Ukri Niemimuukko
2c4d529d66 gpu_plugin: fractional resource management
Fractional resource management feature

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
Signed-off-by: Dmitry Rozhkov <dmitry.rozhkov@intel.com>
2021-06-04 13:06:50 +03:00
Antti Kervinen
d568f050c5 gpu_plugin: add kustomizations
- Default deployment: `kubectl apply -k deployments/gpu_plugin`
- Default deployment does not specify namespace anymore
  (was: `kube-system`).
- Variant: deploy only on nodes with Intel GPU label by NFD:
  `kubectl apply -k deployments/gpu_plugin/overlays/nfd_labeled_nodes`
- Variant: deploy to `kube-system` instead of user-defined namespace
  (or "default"):
  `kubectl apply -k deployments/gpu_plugin/overlays/namespace_kube-system`
- GPU plugin README updated.

Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
2020-02-07 14:56:52 +02:00