Commit Graph

533 Commits

Author SHA1 Message Date
Tuomas Katila
7caba390e3 xpumanager sidecar: add note about using HTTPS with xpum
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2024-05-14 12:54:29 +03:00
Tuomas Katila
ff91a97934
Merge pull request #1720 from mythi/PR-2024-010
ci: move to golangci-lint v1.57.2
2024-05-03 12:55:29 +03:00
Tuomas Katila
05bb8ef156 qat: add support for 420xx driver and its devices (4946)
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2024-05-02 11:36:13 +03:00
Mikko Ylinen
54f9d730e9 ci: move to golangci-lint v1.57.2
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2024-05-02 09:18:27 +03:00
Tuomas Katila
4946b26018 gpu: doc: monitoring resource notes
Also align xelink-sidecar deployment with the new files in
the xpu manager project.

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2024-03-13 08:16:16 +02:00
Tuomas Katila
1de1024530 gpu: add xe notes
Co-authored-by: Eero Tamminen <eero.t.tamminen@intel.com>
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2024-03-12 15:41:44 +02:00
Tuomas Katila
e600fe9313 gpu: add support for the upcoming xe-driver
Plugin can support both i915 and xe drivers dynamically. But
having both drivers on same node with RM is not possible.

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2024-03-12 11:34:01 +02:00
Tuomas Katila
d5cb53a1d1 labeler: add xe support for tile counting
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2024-03-11 11:32:25 +02:00
Tuomas Katila
af04d41e1b labeler: use a function to store splittable labels
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2024-03-11 11:22:33 +02:00
Mikko Ylinen
2399794ef8 webhooks: make SGX mutator registration to follow other webhooks
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2024-03-05 17:38:26 +02:00
hugo-syn
039865aec8
chore: Fix multiple typos (#1653)
* chore: Fix multiple typos

Signed-off-by: hugo-syn <hugo.vincent@synacktiv.com>
2024-01-25 08:18:48 +02:00
Oleg Zhurakivskyy
ab0e8bc146 qat: Add annotation configurability in the operator
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
2024-01-09 10:20:16 +02:00
Tuomas Katila
fd3ad4003f gpu: restructure readme
Split readme into smaller chunks, show only one "easy installation"
and hide the rest. Add some notes about tile resources.

Co-authored-by: Eero Tamminen <eero.t.tamminen@intel.com>
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-12-08 08:42:08 +02:00
Tuomas Katila
8640b1501c gpu: default to flat/combined mode for l0 affinity mask
With tile requests, the level zero affinity mask now defaults to
flat/combined mode. If ZE_FLAT_DEVICE_HIERARCHY is set to COMPOSITE
in the Pod's specification, plugin will use the previous "x.y" format
instead of the new "x" in the affinity mask.

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-12-08 08:42:02 +02:00
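A minimal sketch of the two mask formats described in commit 8640b1501c, not the plugin's actual code. The `tile` type, the `affinityMask` function, and the flat index calculation (`card*tilesPerCard + index`) are assumptions made for illustration; only the "x.y" versus "x" distinction comes from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// tile identifies one GPU tile by card number and tile index (hypothetical type).
type tile struct{ card, index int }

// affinityMask builds a comma-separated mask string.
// composite=true yields the older "x.y" (card.tile) entries;
// composite=false yields single "x" entries, assuming tiles are
// numbered consecutively across cards (an assumption of this sketch).
func affinityMask(tiles []tile, tilesPerCard int, composite bool) string {
	parts := make([]string, 0, len(tiles))
	for _, t := range tiles {
		if composite {
			parts = append(parts, fmt.Sprintf("%d.%d", t.card, t.index))
		} else {
			parts = append(parts, fmt.Sprintf("%d", t.card*tilesPerCard+t.index))
		}
	}
	return strings.Join(parts, ",")
}

func main() {
	req := []tile{{card: 0, index: 1}, {card: 1, index: 0}}
	fmt.Println(affinityMask(req, 2, true))  // composite format
	fmt.Println(affinityMask(req, 2, false)) // flat format
}
```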
Tuomas Katila
ef16dc0e9d gpu: labeler: convert private getter functions into public
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-12-08 08:40:06 +02:00
Tuomas Katila
a0bc682b9b labeler: fix codeql issues
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-12-05 16:13:28 +02:00
Eero Tamminen
3ade6d44ce List writable render devices with no render-device.sh args
To help debugging potential kubernetes device usage issues.

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
2023-12-01 21:07:30 +02:00
Eero Tamminen
4b3944600f Fix (harmless) render-device.sh shellcheck warnings
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
2023-12-01 21:07:30 +02:00
Mikko Ylinen
d7997800a9 logging: move away from klogr to ktesting/textlogger
klog has added ktesting/textlogger and is going to deprecate
klogr. The deprecation is going to trigger golangci-lint (staticcheck)
errors so rework the logging and move to ktesting/textlogger.

The commit also fixes the loglevel setting with operator.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2023-11-20 09:46:41 +02:00
Tuomas Katila
89a0ad9b1d
Merge pull request #1580 from hsyrja/main
xelink support
2023-11-17 13:02:30 +02:00
Harri Syrjä
1d8e10dc17 lint issues fixed
2023-11-17 11:36:49 +02:00
Harri Syrjä
6b9b3c4cdf Xelink support with all changes
2023-11-15 16:17:03 +02:00
Tuomas Katila
f9221c46fd operator: remove one-cr-per-kind limitation
Differentiate objects by adding cr names as suffixes
Drop kind book keeping and related functions from controllers

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-11-09 13:05:40 +02:00
Mikko Ylinen
33e0e21a8b gpu: fix klog formatting typo
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2023-11-03 09:29:21 +02:00
Hyeongju Johannes Lee
f55e4327a7 dlb, demo: update dlb to 8.5.2
Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
2023-10-31 15:30:54 -07:00
Tuomas Katila
72b46bd349 iaa: convert qpl example to accel-config-demo
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-10-23 10:57:19 +03:00
Tuomas Katila
5016f54e47 qat: add support for new capabilities
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-10-13 15:09:57 +03:00
Tuomas Katila
c7162df440 qat: add heartbeat check and use that as a device healthiness indicator
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-10-13 15:09:04 +03:00
Mikko Ylinen
319843c94e vpu: remove deprecated plugin
The VPU plugin can only be used with devices that are
no longer supported by upper layers, such as OpenVINO.

The deprecation plan for the plugin was announced earlier
this year and post v0.28 marks the date when the plugin is removed
from the repo.

Releases before v0.29 have the plugin available should it
be needed.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2023-10-02 15:28:11 +03:00
Mikko Ylinen
4ba166a40e qat: make device ID scan less verbose
Currently, the QAT plugin warns when it finds a PCI ID that is
not an enabled QAT device. This is too verbose so lower the
log priority to "Info".

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2023-09-20 14:02:52 +03:00
Mikko Ylinen
834f598f80 deployments: update to NFD v0.14.1 and drop custom GPU deployment
With the NFD recent versions (v0.13+), it's no longer necessary to
start NFD with custom nfd-master args/rbac settings to get numeric
labels registered as extended resources.

The same can be specified via NodeFeatureRules which also works for
"local" source with feature files.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2023-09-20 14:02:52 +03:00
Mikko Ylinen
44cc057b98
Merge pull request #1499 from hj-johannes-lee/PR-2023-026
operator: add image upgrade with env vars
2023-09-19 12:19:37 +03:00
Hyeongju Johannes Lee
6a60c745d2 operator: add image upgrade with env vars
Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
2023-09-18 11:45:52 -07:00
Mikko Ylinen
06f1db9fb8 qat: update README
The documentation needs clarifications on how QAT Gen4 SW differs from
older platforms:

- only upstream driver is available and due to this, the -mode parameter
  is now deprecated
- the QAT VF services are configurable and thus the resource names
  differ

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2023-09-15 18:49:52 +03:00
Tuomas Katila
031ee64590 gpu/doc: Add Max Series support and a note about SR-IOV
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-09-14 13:21:30 +03:00
Tuomas Katila
827b9a0ced fix crash with rm when kubelet request times out
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-09-12 16:20:33 +03:00
Tuomas Katila
ea659a5e4b nfd: add rules to label nodes with different GPUs
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-09-12 16:20:33 +03:00
Tuomas Katila
691dfc3483 gpu: refactor nfdhook functionality to plugin
NFD v0.14+ doesn't support binary NFD hooks by default, so there is
a need to move the label creation away from the GPU nfdhook.

Move extended resource label creation to plugin, and drop labels that were
already marked deprecated (platform_gen, media_version etc.).

Drop init-container from deployment files and operator. It is still possible
to use an initcontainer, but the default deployments do not support it.

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-09-12 16:20:33 +03:00
Tuomas Katila
e0c0bd612a
Merge pull request #1518 from uniemimu/fake
gpu_fakedev: better dra support
2023-09-08 14:08:34 +03:00
Ukri Niemimuukko
7ed30b2c4e gpu_fakedev: better dra support
This adds generation of the sys/bus folder files, so that dra
can use the generated files to an extent.

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2023-09-06 17:16:27 +03:00
Mikko Ylinen
7f685b5d89 sgx: add QuoteVerification demo and cleanup hostNetwork dependency
hostNetwork usage for SGX demo pods is not absolutely necessary so it's
better to clean it up and make IAS "security" scanners happier. It was
originally used to be able to use "localhost" PCCS but this change now
adds an example of how a proper PCCS URL can be configured using jq.

Additionally, SGX DCAP Quote Verification is added.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2023-08-31 14:23:19 +03:00
Hersh Pathak
f1bb5b7270 Update references to OpenShift
Remove obsolete content related to OpenShift version of operator.
Update links to point to Intel Technology Enabling for OpenShift: https://github.com/intel/intel-technology-enabling-for-openshift.
Signed-off-by: Hersh Pathak hersh.pathak@intel.com
2023-08-24 08:56:44 -07:00
Mikko Ylinen
60530ecdcd go.mod: bump sigs.k8s.io/controller-runtime from 0.15.1 to 0.16.0
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2023-08-23 09:30:08 +03:00
Tuomas Katila
446ab6642f Fix QAT kernel driver links
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-08-02 12:07:09 +03:00
ZhangLi
ca96e5b312 Change the regular expression string of the device type to support sIOV device of QAT
The string of the sIOV device type exceeds the range of [[alnum]], such as:
# adf_ctl status
Checking status of all devices.
There is 2 QAT acceleration device(s) in the system:
 qat_dev0 - type: vqat-adi,  inst_id: 0,  node_id: 0,  bsf: 0000:00:08.0,  #accel: 1 #engines: 1 state: up
 qat_dev1 - type: vqat-adi,  inst_id: 1,  node_id: 0,  bsf: 0000:00:09.0,  #accel: 1 #engines: 1 state: up
2023-07-24 15:59:11 +08:00
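The problem described in commit ca96e5b312 can be reproduced with a short sketch. Both patterns below are hypothetical and are not taken from the plugin source; they only show why a purely alphanumeric character class fails on a type such as "vqat-adi" from the adf_ctl output above, and how allowing a hyphen fixes the match.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical patterns for the "type:" field of an adf_ctl status line.
var (
	alnumOnly  = regexp.MustCompile(`type: ([[:alnum:]]+),`)
	withHyphen = regexp.MustCompile(`type: ([[:alnum:]\-]+),`)
)

// deviceType returns the captured device type, or "" when the pattern
// does not match the line.
func deviceType(re *regexp.Regexp, line string) string {
	m := re.FindStringSubmatch(line)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	line := " qat_dev0 - type: vqat-adi,  inst_id: 0,  node_id: 0,  bsf: 0000:00:08.0,  #accel: 1 #engines: 1 state: up"
	fmt.Printf("alnum only:  %q\n", deviceType(alnumOnly, line))
	fmt.Printf("with hyphen: %q\n", deviceType(withHyphen, line))
}
```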
Hyeongju Johannes Lee
c60a3afb26 fpga: fix naked return error from linter
golangci-lint version < v1.53.0 used nakedret linter that did not check
return values in conditionals. That got changed in v1.53.0 and some
of our code starts failing because of naked returns from conditionals.

Update the code to get nakedret linter passing.

Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
2023-07-20 10:17:08 +03:00
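For readers unfamiliar with the nakedret complaint in commit c60a3afb26, here is a generic sketch of the pattern and one way to satisfy the linter. The `halve` function is hypothetical and unrelated to the fpga code; it only illustrates replacing a bare `return` inside a conditional with explicit return values.

```go
package main

import (
	"errors"
	"fmt"
)

// halve is a hypothetical helper with named results.
func halve(n int) (half int, err error) {
	if n%2 != 0 {
		err = errors.New("odd input")
		// A bare "return" here would be a naked return inside a
		// conditional; the values are spelled out explicitly instead.
		return 0, err
	}

	half = n / 2

	return half, nil
}

func main() {
	fmt.Println(halve(8))
	fmt.Println(halve(3))
}
```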
Tuomas Katila
708b5b405e xpum sidecar: update readme
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-06-16 14:37:32 +03:00
Tuomas Katila
efb75007a6 xpum sidecar: add support to https
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-06-16 14:37:32 +03:00
Tuomas Katila
e34e93bd64 xpum sidecar: allow xelinks that are not tied to subdevices
With one tile GPUs, xelinks are no longer advertised to
be on subdevices.

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-06-16 14:37:26 +03:00
Mikko Ylinen
93bea62dc4 doc: update SGX docs NodeFeatureRule usage
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2023-05-31 09:10:53 +03:00