Commit Graph

31 Commits

Author SHA1 Message Date
Tuomas Katila
74006cda80 depl: drop capabilities from all plugins
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2025-01-02 15:42:32 +02:00
Hyeongju Johannes Lee
3b08a9074d Add cpu/memory requests and limits
Operator maturity level 3 requires cpu/memory requests and limits
for operands. Add them to all plugins deployed by operator

Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
2024-09-25 03:42:19 -07:00
Tuomas Katila
518a8606ff gpu: add levelzero sidecar support for plugin and the deployment files
In addition to the levelzero's health data use, this adds support to
scan devices in WSL. Scanning happens by retrieving Intel device
indices from the Level-Zero API.

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2024-09-19 19:14:15 +03:00
Tuomas Katila
402fb8d9cd gpu: add support for CDI devices
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2024-09-11 09:29:55 +03:00
Hyeongju Johannes Lee
83aa236e70 gpu: add updateStrategy to daemonSet
Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
2024-07-25 12:54:30 +03:00
Tuomas Katila
95b7230374 gpu: enable monitoring for the default installations
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-12-08 08:42:08 +02:00
Tuomas Katila
691dfc3483 gpu: refactor nfdhook functionality to plugin
NFD v0.14+ doesn't support binary NFD hooks by default, so there is
a need to move the label creation away from the GPU nfdhook.

Move extended resource label creation to plugin, and drop labels that were
already marked deprecated (platform_gen, media_version etc.).

Drop init-container from deployment files and operator. It is still possible
to use an initcontainer, but the default deployments do not support it.

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-09-12 16:20:33 +03:00
Tuomas Katila
cb04ca0deb deployments: move from 'patchesStrategicMerge' to 'patches'
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-08-03 10:37:44 +03:00
Tuomas Katila
ec2930b331 deployments: move from 'bases' to 'resources'
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-08-03 10:37:44 +03:00
Tuomas Katila
974829ff7c gpu: try to fetch PodList from kubelet API
In large clusters and with resource management, the load
from gpu-plugins can become heavy for the api-server.
This change will start fetching pod listings from kubelet
and use api-server as a backup. Any other error than timeout
will also move the logic back to using api-server.

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-03-30 12:43:02 +03:00
Tuomas Katila
3a1880ec8b Remove overlays using kube-system
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-02-13 12:47:22 +02:00
Mikko Ylinen
f559d8717d
Merge pull request #1327 from eero-t/nfd-features
Use more generic name for NFD features host directory volume
2023-02-13 11:45:26 +02:00
Eero Tamminen
2f3dc23651 Use more generic name for NFD features host directory volume
NFD hooks are deprecated and going away:
https://github.com/kubernetes-sigs/node-feature-discovery/issues/856

This makes the mount names more future-proof, and shows where later
changes need to be done (to change operator mount directory, and
switch hook-using deployments e.g. to feature files).

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
2023-02-08 18:20:41 +02:00
Tuomas Katila
d1e8350c6e gpu: add new nfd + monitoring + shared-dev deployment option
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-01-05 14:13:13 +02:00
Ukri Niemimuukko
1d09cd6549 align gpu kustomize object naming with operator naming
Operator has used "gpu-manager" as part of the cluster object names
it creates. Kustomize based deployments can be aligned with that.

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2022-09-26 19:50:55 +03:00
Tuomas Katila
8ecf258a82 gpu: add nodeSelector to fractional overlay
Updated documentation indicates that fractional overlay
uses nfd so maybe it should.

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2022-09-22 10:45:11 +03:00
Manish Regmi
22e9d5f882 add selinux labels for GPU plugins 2022-09-15 14:44:51 -04:00
Tuomas Katila
c562db9b28 gpu: Improve installation options and documentation
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2022-09-15 15:19:23 +03:00
Alex Nordlund
79986a6096 Replace to-be obsolete patchesStrategicMerge with patches 2022-08-25 21:07:32 +02:00
Alex Nordlund
0636e2d3a1 Replace obsolete patches with patchesStrategicMerge
This was made obsolete in v1.0.9
https://github.com/kubernetes-sigs/kustomize/blob/v1.0.9/pkg/types/kustomization.go#L129
And stopped working in v3.0.3
https://github.com/kubernetes-sigs/kustomize/issues/1373
2022-08-25 09:28:03 +02:00
Ed Bartosh
13780a8cdc implement terrascan check
Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2022-03-01 15:54:28 +02:00
Ukri Niemimuukko
2341119d5e remove monitoring arg from fractional resource overlay
An easter-egg slipped in the args. This removes it.

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2021-06-16 12:09:07 +03:00
Ukri Niemimuukko
2c4d529d66 gpu_plugin: fractional resource management
Fractional resource management feature

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
Signed-off-by: Dmitry Rozhkov <dmitry.rozhkov@intel.com>
2021-06-04 13:06:50 +03:00
Oleg Zhurakivskyy
272625cb39 deployments: Add missing default imagePullPolicy 2020-11-12 16:12:27 +00:00
Ukri Niemimuukko
505eadaf94 gpu-plugin nfd-hook
This adds an nfd-hook for the gpu-plugin, which will create labels
for the GPUs that can then be used for POD deployment purposes or
creation of GPU extended resources which allow then finer grained
GPU resource management.

The nfd-hook will install to the host system when the
intel-gpu-initcontainer is run. It is added into the plugin deployment
yaml.

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2020-10-01 12:02:57 +03:00
Dan Garfield
3be22ab9af
Add node selector to restrict to x86
Without the node selector this will deploy on arm nodes with a constantly retrying pod.
2020-04-23 20:55:20 -06:00
Antti Kervinen
d568f050c5 gpu_plugin: add kustomizations
- Default deployment: `kubectl apply -k deployments/gpu_plugin`
- Default deployment does not specify namespace anymore
  (was: `kube-system`).
- Variant: deploy only on nodes with Intel GPU label by NFD:
  `kubectl apply -k deployments/gpu_plugin/overlays/nfd_labeled_nodes`
- Variant: deploy to `kube-system` instead of user-defined namespace
  (or "default"):
  `kubectl apply -k deployments/gpu_plugin/overlays/namespace_kube-system`
- GPU plugin README updated.

Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
2020-02-07 14:56:52 +02:00
Mikko Ylinen
fd631fc31c deployments/gpu_plugin: limit host mounts
The default deployment gives rather wide host mounts. We can limit
the mounts only to the subdirectories the plugin needs and mount
them read-only.

Also, add notes that both QAT and GPU plugins can be run as non-root
user.

Fixes: #228

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2019-12-11 12:54:36 +02:00
Mikko Ylinen
7a8ff9ccc1 deployments: set readOnlyRootFilesystem
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2019-08-30 12:53:17 +03:00
Mikko Ylinen
d06f98690f images: tag with intel prefix
In preparations to get some of the images to hub.docker.com/intel,
start using intel/ prefix.

Moreover, set the Makefile variables so that the images built
by make [images|demos] can easily be pushed to any registry/org
by 'docker push' (e.g., by Jenkins).

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2019-08-29 13:21:19 +03:00
Ed Bartosh
14b4168cbd add GPU plugin deployment
Added DaemonSet yaml
Added deployment instructions to plugin's README
2018-09-14 13:55:08 +03:00