Commit Graph

544 Commits

Author SHA1 Message Date
Mikko Ylinen
5a59385a09 qat: drop c6xxvf from defaults
The devices searched by default are QAT Gen4+ only.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2024-06-11 07:31:49 +03:00
Tuomas Katila
20b7b5a4d7
Merge pull request #1748 from mythi/PR-2024-013
pkg/deviceplugin: move to grpc.NewClient()
2024-05-28 12:09:22 +03:00
Tuomas Katila
11c9753aca
Merge pull request #1745 from bart0sh/PR155-fpga-support-CDI
FPGA: support CDI
2024-05-28 11:19:58 +03:00
Mikko Ylinen
4d858c5364 pkg/deviceplugin: move to grpc.NewClient()
grpc.NewClient(), added in grpc-go v1.63, is the preferred way to
create a new ClientConn. In most of our usages, moving away from
grpc.Dial*() to it is straightforward.

However, we've also relied on grpc.Dial*()'s behavior of automatically
making a new connection to "test" that a connection is successful, which
isn't available anymore. Combined with the grpc.WithBlock dial option,
this usage is considered an "especially bad" way to handle a client
connection.

The recommended approach to test a server connection is to separately
make a connection and watch for the connection state to become Ready.
This change follows that recommendation.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2024-05-28 08:17:06 +03:00
Ed Bartosh
e58369ed13 rename deprecated prestart to createRuntime
The `prestart` hook is marked as deprecated in the OCI runtime spec:
https://github.com/opencontainers/runtime-spec/blob/main/config.md#posix-platform-hooks

Renamed `prestart` to `createRuntime` as suggested in the spec.

Replaced `CDI hook` with `OCI hook` to be clearer. CDI is just a
way to update the OCI config and, strictly speaking, there is no such
thing as a CDI hook.
2024-05-22 19:54:53 +03:00
Ed Bartosh
1fa557e680 crihook: update documentation 2024-05-22 15:59:36 +03:00
Ed Bartosh
ca6f8f3020 cri_hook: remove annotation check 2024-05-22 14:57:03 +03:00
Ed Bartosh
d245b2609d fpga: use CDI to run hooks 2024-05-22 14:56:58 +03:00
Ed Bartosh
988fbed528 deviceplugin: add DeviceInfo.hooks field 2024-05-22 13:13:38 +03:00
Tuomas Katila
e753423884
Merge pull request #1685 from hj-johannes-lee/PR-2024-001
qat: improve qat_dpdk_app, openssl-qat-engine
2024-05-15 11:28:26 +03:00
Tuomas Katila
7caba390e3 xpumanager sidecar: add note about using HTTPS with xpum
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2024-05-14 12:54:29 +03:00
Hyeongju Johannes Lee
2af37fd4cb qat_dpdk_app: drop generic
Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
2024-05-07 20:46:12 +03:00
Tuomas Katila
ff91a97934
Merge pull request #1720 from mythi/PR-2024-010
ci: move to golangci-lint v1.57.2
2024-05-03 12:55:29 +03:00
Tuomas Katila
05bb8ef156 qat: add support for 420xx driver and its devices (4946)
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2024-05-02 11:36:13 +03:00
Mikko Ylinen
54f9d730e9 ci: move to golangci-lint v1.57.2
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2024-05-02 09:18:27 +03:00
Tuomas Katila
4946b26018 gpu: doc: monitoring resource notes
Also align xelink-sidecar deployment with the new files in
the xpu manager project.

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2024-03-13 08:16:16 +02:00
Tuomas Katila
1de1024530 gpu: add xe notes
Co-authored-by: Eero Tamminen <eero.t.tamminen@intel.com>
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2024-03-12 15:41:44 +02:00
Tuomas Katila
e600fe9313 gpu: add support for the upcoming xe-driver
The plugin can support both the i915 and xe drivers dynamically, but
having both drivers on the same node with RM is not possible.

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2024-03-12 11:34:01 +02:00
Tuomas Katila
d5cb53a1d1 labeler: add xe support for tile counting
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2024-03-11 11:32:25 +02:00
Tuomas Katila
af04d41e1b labeler: use a function to store splittable labels
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2024-03-11 11:22:33 +02:00
Mikko Ylinen
2399794ef8 webhooks: make SGX mutator registration follow other webhooks
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2024-03-05 17:38:26 +02:00
hugo-syn
039865aec8
chore: Fix multiple typos (#1653)
* chore: Fix multiple typos

Signed-off-by: hugo-syn <hugo.vincent@synacktiv.com>
2024-01-25 08:18:48 +02:00
Oleg Zhurakivskyy
ab0e8bc146 qat: Add annotation configurability in the operator
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
2024-01-09 10:20:16 +02:00
Tuomas Katila
fd3ad4003f gpu: restructure readme
Split readme into smaller chunks, show only one "easy installation"
and hide the rest. Add some notes about tile resources.

Co-authored-by: Eero Tamminen <eero.t.tamminen@intel.com>
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-12-08 08:42:08 +02:00
Tuomas Katila
8640b1501c gpu: default to flat/combined mode for l0 affinity mask
With tile requests, the Level Zero affinity mask now defaults to
flat/combined mode. If ZE_FLAT_DEVICE_HIERARCHY is set to COMPOSITE
in the Pod's specification, the plugin will use the previous "x.y" format
instead of the new "x" in the affinity mask.

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-12-08 08:42:02 +02:00
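
The difference between the two affinity mask formats mentioned in the commit above can be sketched as follows. This is an illustration under stated assumptions, not the plugin's implementation: the `tile` type and `affinityMask` function are hypothetical, every device is assumed to have the same number of tiles, and the flat index is assumed to be `device * tilesPerDevice + tile`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// tile identifies one allocated tile (hypothetical type for this sketch).
type tile struct {
	Device int // device index as seen by the workload
	Tile   int // tile index within that device
}

// affinityMask builds a comma-separated mask for the given tiles.
// In composite mode each entry uses the "x.y" (device.tile) format.
// In flat mode each tile becomes its own entry "x", numbered here under
// the assumption that all devices have tilesPerDevice tiles.
func affinityMask(tiles []tile, tilesPerDevice int, composite bool) string {
	parts := make([]string, 0, len(tiles))
	for _, t := range tiles {
		if composite {
			parts = append(parts, fmt.Sprintf("%d.%d", t.Device, t.Tile))
		} else {
			parts = append(parts, strconv.Itoa(t.Device*tilesPerDevice+t.Tile))
		}
	}
	return strings.Join(parts, ",")
}

func main() {
	allocated := []tile{{Device: 0, Tile: 1}, {Device: 1, Tile: 0}}

	fmt.Println("composite:", affinityMask(allocated, 2, true))
	fmt.Println("flat:     ", affinityMask(allocated, 2, false))
}
```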
Tuomas Katila
ef16dc0e9d gpu: labeler: convert private getter functions into public
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-12-08 08:40:06 +02:00
Tuomas Katila
a0bc682b9b labeler: fix codeql issues
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-12-05 16:13:28 +02:00
Eero Tamminen
3ade6d44ce List writable render devices with no render-device.sh args
To help debug potential Kubernetes device usage issues.

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
2023-12-01 21:07:30 +02:00
Eero Tamminen
4b3944600f Fix (harmless) render-device.sh shellcheck warnings
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
2023-12-01 21:07:30 +02:00
Mikko Ylinen
d7997800a9 logging: move away from klogr to ktesting/textlogger
klog has added ktesting/textlogger and is going to deprecate
klogr. The deprecation is going to trigger golangci-lint (staticcheck)
errors, so rework the logging and move to ktesting/textlogger.

The commit also fixes the log level setting in the operator.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2023-11-20 09:46:41 +02:00
Tuomas Katila
89a0ad9b1d
Merge pull request #1580 from hsyrja/main
xelink support
2023-11-17 13:02:30 +02:00
Harri Syrjä
1d8e10dc17 lint issues fixed 2023-11-17 11:36:49 +02:00
Harri Syrjä
6b9b3c4cdf Xelink support with all changes 2023-11-15 16:17:03 +02:00
Tuomas Katila
f9221c46fd operator: remove one-cr-per-kind limitation
Differentiate objects by adding CR names as suffixes.
Drop kind bookkeeping and related functions from controllers.

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-11-09 13:05:40 +02:00
Mikko Ylinen
33e0e21a8b gpu: fix klog formatting typo
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2023-11-03 09:29:21 +02:00
Hyeongju Johannes Lee
f55e4327a7 dlb, demo: update dlb to 8.5.2
Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
2023-10-31 15:30:54 -07:00
Tuomas Katila
72b46bd349 iaa: convert qpl example to accel-config-demo
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-10-23 10:57:19 +03:00
Tuomas Katila
5016f54e47 qat: add support for new capabilities
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-10-13 15:09:57 +03:00
Tuomas Katila
c7162df440 qat: add heartbeat check and use that as a device healthiness indicator
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-10-13 15:09:04 +03:00
Mikko Ylinen
319843c94e vpu: remove deprecated plugin
The VPU plugin can only be used with devices that are
no longer supported by upper layers, such as OpenVINO.

The deprecation plan for the plugin was announced earlier
this year and post v0.28 marks the date when the plugin is removed
from the repo.

Releases before v0.29 have the plugin available should it
be needed.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2023-10-02 15:28:11 +03:00
Mikko Ylinen
4ba166a40e qat: make device ID scan less verbose
Currently, the QAT plugin warns when it finds a PCI ID that is
not an enabled QAT device. This is too verbose, so lower the
log priority to "Info".

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2023-09-20 14:02:52 +03:00
Mikko Ylinen
834f598f80 deployments: update to NFD v0.14.1 and drop custom GPU deployment
With recent NFD versions (v0.13+), it's no longer necessary to
start NFD with custom nfd-master args/RBAC settings to get numeric
labels registered as extended resources.

The same can be specified via NodeFeatureRules which also works for
"local" source with feature files.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2023-09-20 14:02:52 +03:00
Mikko Ylinen
44cc057b98
Merge pull request #1499 from hj-johannes-lee/PR-2023-026
operator: add image upgrade with env vars
2023-09-19 12:19:37 +03:00
Hyeongju Johannes Lee
6a60c745d2 operator: add image upgrade with env vars
Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
2023-09-18 11:45:52 -07:00
Mikko Ylinen
06f1db9fb8 qat: update README
The documentation needs clarifications on how QAT Gen4 SW differs from
older platforms:

- only upstream driver is available and due to this, the -mode parameter
  is now deprecated
- the QAT VF services are configurable and thus the resource names
  differ

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2023-09-15 18:49:52 +03:00
Tuomas Katila
031ee64590 gpu/doc: Add Max Series support and a note about SR-IOV
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-09-14 13:21:30 +03:00
Tuomas Katila
827b9a0ced fix crash with rm when kubelet request times out
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-09-12 16:20:33 +03:00
Tuomas Katila
ea659a5e4b nfd: add rules to label nodes with different GPUs
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-09-12 16:20:33 +03:00
Tuomas Katila
691dfc3483 gpu: refactor nfdhook functionality to plugin
NFD v0.14+ doesn't support binary NFD hooks by default, so there is
a need to move the label creation away from the GPU nfdhook.

Move extended resource label creation to plugin, and drop labels that were
already marked deprecated (platform_gen, media_version etc.).

Drop the init container from deployment files and the operator. It is still
possible to use an init container, but the default deployments do not support it.

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-09-12 16:20:33 +03:00
Tuomas Katila
e0c0bd612a
Merge pull request #1518 from uniemimu/fake
gpu_fakedev: better dra support
2023-09-08 14:08:34 +03:00