Commit Graph

574 Commits

Author SHA1 Message Date
Mikko Ylinen
b14cefd485 ci: fix .golangci.yml against JSONSchema validator
golangci-lint config can be verified using the following command:
golangci-lint config verify

Our config had some errors so fix them.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2025-02-17 11:04:39 +02:00
Tuomas Katila
71505e6d8d dsa: dpdk example workload and use it in e2e
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2025-02-07 13:42:24 +02:00
Ukri Niemimuukko
daf64052e5
Update cmd/gpu_plugin/fractional.md
Co-authored-by: Eero Tamminen <eero.t.tamminen@intel.com>
2025-01-24 15:33:15 +02:00
Ukri Niemimuukko
1c40eaaa83 Add deprecation notices about GAS 2025-01-23 20:21:36 +02:00
Mikko Ylinen
fe3eaeeb0b qat: drop AppArmor annotations
"unconfined" annotation was needed to get writes to new_id / bind
to succeed on AppArmor enabled OSes.

However, many things have changed:

* new_id should not be used anymore and it was dropped in the plugin.
* QAT initcontainer has assumed the role of HW initialization.
* vfio-pci is the preferred "dpdkDriver" and starting with QAT Gen4, it
is the only available VF driver so unbind isn't necessary.
* k8s AppArmor is "GA" since 1.30 and the annotation is deprecated.

As of now, the initcontainer will take care of binding QAT VFs to vfio-pci
so the plugin does not need to set AppArmor at all.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2025-01-16 13:54:37 +02:00
Mikko Ylinen
aaa720a329 qat: drop setupDeviceIDs()
setupDeviceIDs() is obsoleted and the preferred approach is driver_override
already implemented in qat-init.sh initcontainer.

The new_id mechanism was added way before we had the initcontainer support in place.
Furthermore, at least for vfio-pci we don't need it at all if the driver uses
ids=8086:<qat VF dev IDs>.

Drop write attempts to new_id in favor of the initcontainer functionality.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2025-01-16 13:54:37 +02:00
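
The driver_override approach mentioned in the commit above can be sketched in Go as follows. This is a minimal illustration of the generic Linux sysfs mechanism, not the contents of qat-init.sh: the function names, the example PCI address, and the scratch-directory dry run are all assumptions made for the example, and unbinding a device from a driver it is already bound to is left out.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// bindWithOverride asks the kernel to bind one PCI device to the given driver.
// sysfsPCI is normally "/sys/bus/pci"; it is a parameter (an assumption of this
// sketch) so the logic can be exercised against a scratch directory.
func bindWithOverride(sysfsPCI, bdf, driver string) error {
	// Step 1: tell the PCI core which driver is allowed to claim this device.
	override := filepath.Join(sysfsPCI, "devices", bdf, "driver_override")
	if err := os.WriteFile(override, []byte(driver), 0o644); err != nil {
		return fmt.Errorf("set driver_override for %s: %w", bdf, err)
	}

	// Step 2: ask the kernel to probe the device again so the override applies.
	probe := filepath.Join(sysfsPCI, "drivers_probe")
	if err := os.WriteFile(probe, []byte(bdf), 0o644); err != nil {
		return fmt.Errorf("probe %s: %w", bdf, err)
	}

	return nil
}

// dryRun builds a fake sysfs tree in a temporary directory, runs
// bindWithOverride against it, and returns what was written to driver_override.
func dryRun(bdf, driver string) (string, error) {
	root, err := os.MkdirTemp("", "fake-sysfs-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(root)

	if err := os.MkdirAll(filepath.Join(root, "devices", bdf), 0o755); err != nil {
		return "", err
	}
	if err := bindWithOverride(root, bdf, driver); err != nil {
		return "", err
	}

	data, err := os.ReadFile(filepath.Join(root, "devices", bdf, "driver_override"))
	return string(data), err
}

func main() {
	// The PCI address below is a made-up example value.
	got, err := dryRun("0000:6b:00.1", "vfio-pci")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("driver_override = %q\n", got)
}
```

On a real node the first argument would be /sys/bus/pci and the process would need enough privileges to write there, which is in line with doing this work in a privileged initcontainer rather than in the plugin.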
Mikko Ylinen
1a62423d21 qat-init: bind QAT VFs to vfio-pci
QAT device plugin has some initialization functions that require
special SecurityContext parameters (e.g., setting AppArmor policies
on some OSes).

It's better to move all of the initialization to the privileged
init-container that is already taking care of some parts of it.

With this change, we default to vfio-pci "DpdkDrv".

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2025-01-16 13:54:37 +02:00
Mikko Ylinen
6255810e0d gpu/rm: move to fake.NewClientSet()
k8s v1.32 client-go makes FakePods private so the current
resourcemanager fake client won't work anymore.

client-go provides a simple fake Client that works easily so
just move to use it.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2025-01-02 12:00:34 +02:00
Mikko Ylinen
3e141cc736 ci: move to golangci-lint v1.63.1
along with it, fix some wsl findings.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2025-01-02 12:00:34 +02:00
Tuomas Katila
e34355940a operator: drop rbac-proxy in favor of controller-runtime's authz/authn
rbac-proxy will be deprecated in 2025

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2024-12-17 10:17:14 +02:00
Hyeongju Lee
e2a82e8c95
Merge pull request #1852 from hj-johannes-lee/PR-2024-006
qat, initcontainer: add enablement of auto_reset
2024-10-02 15:41:04 +03:00
Hyeongju Johannes Lee
51b7745260 qat, initcontainer: add enablement of auto_reset
Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
2024-09-30 16:49:44 -07:00
Tuomas Katila
b060920201 labeler: fix false-failure with nfd file removal test
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2024-09-27 14:55:43 +03:00
Tuomas Katila
d9cb0fc3f9 gpu: add a note about non-default namespaces with fractional resources
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2024-09-25 13:15:36 +03:00
Tuomas Katila
b4a3ccd94d levelzero: use define for error temperature
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2024-09-23 08:27:45 +03:00
Tuomas Katila
80a5529ad3 levelzero: add missing calloc return value check
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2024-09-20 13:59:00 +03:00
Tuomas Katila
fc2dce588c Rename pci to PCI in various places
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2024-09-19 19:14:15 +03:00
Tuomas Katila
606ac77647 gpu: levelzero: documentation
Co-authored-by: Eero Tamminen <eero.t.tamminen@intel.com>
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2024-09-19 19:14:15 +03:00
Tuomas Katila
518a8606ff gpu: add levelzero sidecar support for plugin and the deployment files
In addition to the levelzero's health data use, this adds support to
scan devices in WSL. Scanning happens by retrieving Intel device
indices from the Level-Zero API.

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2024-09-19 19:14:15 +03:00
Tuomas Katila
2df9443fda gpu: add levelzero application
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2024-09-19 19:14:15 +03:00
Tuomas Katila
402fb8d9cd gpu: add support for CDI devices
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2024-09-11 09:29:55 +03:00
Mikko Ylinen
be10a8d923
Merge pull request #1810 from tkatila/cdi-make-more-generic 2024-09-04 17:08:31 +03:00
Tuomas Katila
c3a01b91ff cdi: restructure cdi support for more generic use
Pass the whole cdi.spec structure to DeviceInfo and use
cdiCache for interacting with the CDI files on the host.

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2024-09-04 13:56:22 +03:00
Tuomas Katila
f08998aae0 xpumanager sidecar: remove HTTPS use without certificates
Add deployment that uses cert-manager to provide self-signed certificates
Add functionality to verify server endpoint in the sidecar

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2024-08-30 14:36:21 +03:00
Tuomas Katila
7e5b280cd1 golangci-lint: upgrade to 1.60.3, fix one issue and ignore conversion errors
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2024-08-27 11:40:35 +03:00
Tuomas Katila
fa6d027b24 Fix some lint errors
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2024-08-27 11:40:29 +03:00
Tuomas Katila
42c34a74a4 tls: drop additional ciphers
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2024-08-21 12:28:02 +03:00
Tuomas Katila
1a13dcd3e2 tls: limit version to 1.2 only and selected ciphers
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2024-08-20 11:58:38 +03:00
Tuomas Katila
333d6369db add a note about production clusters and proper certificates
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2024-08-20 11:46:12 +03:00
Hyeongju Johannes Lee
ea6d52d443 QAT: make plugin read trimmed heartbeat status
The plugin used to consider only the value "-1", but there are some
cases where the files show "\n" or "\n\x00". This causes the plugin to
report the wrong device status. So, trim the value after \n so that only
the numerical value is read.

Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
2024-08-05 19:57:33 +03:00
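
The trimming described in the commit above could look roughly like this in Go. It is a sketch under assumptions, not the plugin's actual code: parseHeartbeat is a hypothetical name, and the input strings are examples of the "\n" and "\n\x00" endings the message mentions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseHeartbeat keeps only the text before the first newline and parses it
// as an integer, so trailing "\n" or "\n\x00" bytes do not affect the result.
func parseHeartbeat(raw string) (int, error) {
	value, _, _ := strings.Cut(raw, "\n") // drop the newline and everything after it
	value = strings.TrimSpace(value)

	n, err := strconv.Atoi(value)
	if err != nil {
		return 0, fmt.Errorf("unexpected heartbeat value %q: %w", raw, err)
	}

	return n, nil
}

func main() {
	for _, raw := range []string{"-1", "-1\n", "-1\n\x00", "0\n\x00"} {
		n, err := parseHeartbeat(raw)
		fmt.Printf("%q -> %d (err: %v)\n", raw, n, err)
	}
}
```

Comparing the parsed integer instead of the raw string means that "-1", "-1\n" and "-1\n\x00" are all treated the same way.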
Mikko Ylinen
5a59385a09 qat: drop c6xxvf from defaults
The devices searched by default are QAT Gen4+ only.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2024-06-11 07:31:49 +03:00
Tuomas Katila
20b7b5a4d7
Merge pull request #1748 from mythi/PR-2024-013
pkg/deviceplugin: move to grpc.NewClient()
2024-05-28 12:09:22 +03:00
Tuomas Katila
11c9753aca
Merge pull request #1745 from bart0sh/PR155-fpga-support-CDI
FPGA: support CDI
2024-05-28 11:19:58 +03:00
Mikko Ylinen
4d858c5364 pkg/deviceplugin: move to grpc.NewClient()
grpc.NewClient(), added in grpc-go v1.63, is the preferred way to
create a new ClientConn. In most of our usages, moving away from
grpc.Dial*() to it is straightforward.

However, we've also relied on grpc.Dial*()'s behavior of automatically
making a new connection to "test" that a connection is successful, and that
behavior isn't available anymore. Combined with the grpc.WithBlock dial option,
this usage is considered an "especially bad" way to handle a client connection.

The recommended approach to test a server connection is to separately
make a connection and watch for the connection state to become Ready. This
change follows that recommendation.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2024-05-28 08:17:06 +03:00
Ed Bartosh
e58369ed13 rename deprecated prestart to createRuntime
`prestart` hook is marked as deprecated in the OCI runtime spec:
https://github.com/opencontainers/runtime-spec/blob/main/config.md#posix-platform-hooks

Renamed `prestart` to `createRuntime` as suggested in the spec.

Replaced `CDI hook` with `OCI hook` to be clearer. CDI is just a
way to update OCI config and theoretically there is no such thing as
CDI hook.
2024-05-22 19:54:53 +03:00
Ed Bartosh
1fa557e680 crihook: update documentation 2024-05-22 15:59:36 +03:00
Ed Bartosh
ca6f8f3020 cri_hook: remove annotation check 2024-05-22 14:57:03 +03:00
Ed Bartosh
d245b2609d fpga: use CDI to run hooks 2024-05-22 14:56:58 +03:00
Ed Bartosh
988fbed528 deviceplugin: add DeviceInfo.hooks field 2024-05-22 13:13:38 +03:00
Tuomas Katila
e753423884
Merge pull request #1685 from hj-johannes-lee/PR-2024-001
qat: improve qat_dpdk_app, openssl-qat-engine
2024-05-15 11:28:26 +03:00
Tuomas Katila
7caba390e3 xpumanager sidecar: add note about using HTTPS with xpum
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2024-05-14 12:54:29 +03:00
Hyeongju Johannes Lee
2af37fd4cb qat_dpdk_app: drop generic
Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
2024-05-07 20:46:12 +03:00
Tuomas Katila
ff91a97934
Merge pull request #1720 from mythi/PR-2024-010
ci: move to golangci-lint v1.57.2
2024-05-03 12:55:29 +03:00
Tuomas Katila
05bb8ef156 qat: add support for 420xx driver and its devices (4946)
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2024-05-02 11:36:13 +03:00
Mikko Ylinen
54f9d730e9 ci: move to golangci-lint v1.57.2
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2024-05-02 09:18:27 +03:00
Tuomas Katila
4946b26018 gpu: doc: monitoring resource notes
Also align xelink-sidecar deployment with the new files in
the xpu manager project.

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2024-03-13 08:16:16 +02:00
Tuomas Katila
1de1024530 gpu: add xe notes
Co-authored-by: Eero Tamminen <eero.t.tamminen@intel.com>
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2024-03-12 15:41:44 +02:00
Tuomas Katila
e600fe9313 gpu: add support for the upcoming xe-driver
The plugin can support both i915 and xe drivers dynamically, but
having both drivers on the same node with RM is not possible.

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2024-03-12 11:34:01 +02:00
Tuomas Katila
d5cb53a1d1 labeler: add xe support for tile counting
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2024-03-11 11:32:25 +02:00
Tuomas Katila
af04d41e1b labeler: use a function to store splittable labels
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2024-03-11 11:22:33 +02:00