Commit Graph

408 Commits

Author SHA1 Message Date
Eero Tamminen
0b519ecf1e Deprecate debugfs GPU IP block version labels in NFD hook doc
There's no mapping available from IP block versions to actual product
features, which makes these version numbers fairly useless for end
users.

In mixed GPU clusters, running a job that adds/updates node labels for
the relevant GPU features to each relevant node would be much more
user-friendly.  This could be done easily by converting the output of a
given GPU API capability tool (e.g. "vainfo" for VA-API, "clinfo" for
OpenCL) to an NFD feature file.

(Such a thing would be outside this project's scope though, except
maybe as an example / test-case.)

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
2022-08-24 16:55:01 +03:00
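
The conversion suggested in commit 0b519ecf1e can be sketched roughly as follows. This is a minimal Python sketch, not part of the project: it assumes the capabilities have already been extracted from the tool output (parsing "vainfo" or "clinfo" output is not shown), that the feature file holds one `name=value` pair per line, and the feature names used here are hypothetical.

```python
def to_feature_file(capabilities):
    """Render {name: value} pairs as feature-file text, one name=value per line.

    Names are sorted so that repeated runs produce identical files.
    """
    lines = [f"{name}={value}" for name, value in sorted(capabilities.items())]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical capability names; a real job would derive them from tool output.
    caps = {"gpu-opencl-version": "3.0", "gpu-vaapi-h264-decode": "true"}
    print(to_feature_file(caps), end="")
```

A job running on each node could write the returned text to a file in the directory that NFD's local feature source reads; the exact path depends on the NFD deployment.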
Eero Tamminen
0b7cbc862d Improve GPU NFD hook documentation
Add table of contents, simplify introduction text.

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
2022-08-24 16:55:01 +03:00
Ed Bartosh
6177dd0dfe
Merge pull request #1093 from mythi/PR-2022-050
build: move to Go 1.19
2022-08-16 00:25:57 +03:00
astronaut0131
2d155edac7 sgx: add kind deployment notes for aesmd 2022-08-15 15:26:01 +08:00
Mikko Ylinen
642c4f7b59 build: move to Go 1.19 and golangci-lint 1.48 because of that
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2022-08-15 10:13:37 +03:00
Chelsea Mafrica
24eb52a912 docs: Fix missing code block in operator doc
Add the missing code block to a section of the operator README.

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2022-08-05 11:32:48 -07:00
Mikko Ylinen
3c948cc106
Merge pull request #1063 from bart0sh/PR144-upgrade-libDLB
dlb: update DLB to v7.7.0
2022-07-18 09:29:55 +03:00
Ed Bartosh
9f2db89da6 dlb: update DLB to v7.7.0
Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2022-07-03 15:08:14 +03:00
Huang Xin
89caad1cd4 doc: modify SGX device plugin deployments url from 'main' to '<RELEASE_VERSION>'
Signed-off-by: Huang Xin <xin1.huang@intel.com>
2022-06-25 17:33:46 +08:00
Ed Bartosh
c82b907472
Merge pull request #1055 from mythi/PR-2022-045
operator: align with kubebuilder v3 functionality
2022-06-20 23:12:21 +03:00
Mikko Ylinen
f9ca36cc26 set TLSMinVersion for webhook servers
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2022-06-20 19:04:50 +03:00
Mikko Ylinen
b48568c43a operator: align with kubebuilder v3 functionality
kubebuilder v3 based scaffolding has updated many things
and they are documented in [1].

Update operator's functionality to v3 level. We've done
most/some of the changes earlier (e.g., by not using
deprecated k8s APIs anymore) so the changes are minimal.

[1] https://book.kubebuilder.io/migration/v2vsv3.html

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2022-06-20 16:35:40 +03:00
Oleg Zhurakivskyy
f1ec14d106 iaa: Add e2e tests
Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
2022-06-09 15:00:25 +03:00
Mikko Ylinen
9bb5f303ab
Merge pull request #1048 from bart0sh/PR142-get-rid-of-sysfsDir
Get rid of unused sysfsDir parameter
2022-06-09 13:55:57 +03:00
Ed Bartosh
9d04ce825d idxd: get rid of unused sysfsDir parameter 2022-06-08 22:09:27 +03:00
Ed Bartosh
3df93cf04f rename image dsa-accel-config-demo -> accel-config-demo 2022-06-08 21:00:54 +03:00
Ed Bartosh
e182304c4d
Merge pull request #1030 from mythi/PR-2022-040
qat: add support for 401xx devices
2022-06-03 12:37:45 +03:00
Hyeongju Johannes Lee
276d25088e dlb: update the version of DLB driver & DPDK to new release
Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
2022-06-02 22:39:46 +03:00
Mikko Ylinen
8987f1ba53 qat: add support for 401xx devices
QAT_401xx is a derivative of 4xxx. Add support for that device
by including the device IDs (both PF and VF).

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2022-06-02 08:11:39 +03:00
Hyeongju Johannes Lee
85a12609a3 sgx: deprecate /dev/sgx/ mounts
Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
2022-05-09 18:59:34 +03:00
Mikko Ylinen
6ea51a3623 qat: kerneldrv: skip QAT Gen4 devices
Containers running on QAT Gen4 should be based on qatlib and therefore
kerneldrv is not the right mode. Skip registering 4xxx* devices to
ensure kerneldrv is not used with them.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2022-04-25 22:08:13 +03:00
Oleg Zhurakivskyy
e3a277c65f doc: Update the documentation on the DSA, IAA ConfigMap creation
Closes #941

Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
2022-04-25 10:17:17 +03:00
Mikko Ylinen
069b9bd79a qat: 4xxx: split generic resource to compression and crypto
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2022-04-07 22:33:17 +03:00
Mikko Ylinen
482ed7ba4d
Merge pull request #939 from hj-johannes-lee/qat-allocation-policy
qat: implement preferredAllocation policies
2022-04-07 21:15:49 +03:00
Hyeongju Johannes Lee
d3c8063ff3 qat: implement preferredAllocation policies
Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
2022-04-07 14:14:00 +03:00
Mikko Ylinen
2adad5ae76 drop deprecated grpc.WithInsecure()
grpc-go v1.43.0 deprecated grpc.WithInsecure() in favor of
insecure.NewCredentials(). Move to use the recommended approach
and drop the linter annotations.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2022-04-07 13:40:51 +03:00
Tonny Tzeng
bf94f566fd doc: unify test images build with make
Signed-off-by: Tonny Tzeng <tonny.tzeng@intel.com>
2022-04-01 15:49:43 +08:00
Mikko Ylinen
0f36cde605
Merge pull request #935 from tkatila/gpu/tiles-support-and-numa-mapping
gpu: add tiles annotation support
2022-03-30 19:33:09 +03:00
Tuomas Katila
8f6a235b5d gpu: Start using GetPreferredAllocation with fractional resources
Move reallocate logic to GetPreferredAllocation and simplify
allocate to use the kubelet's device IDs.

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2022-03-30 11:32:49 +03:00
Mikko Ylinen
18379e92f3
Merge pull request #937 from tkatila/gpu/numa-mapping
gpu: Add numa node mapping label for GPUs
2022-03-29 07:56:26 +03:00
Hyeongju Johannes Lee
7eeaddc563 gpu: fix typo in implementation of preferredAllocator interface
Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
2022-03-28 05:04:32 -07:00
Tuomas Katila
bdd72c8cf7 gpu: Add numa node mapping label for GPUs
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2022-03-24 14:29:05 +02:00
Tuomas Katila
db7e5bfc55 Add support for gas-container-tiles annotation
Adds functionality to convert a container's tile annotation
into the corresponding L0 affinity mask. This helps to target
a container's workload to specific L0 subdevices.

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2022-03-24 14:13:35 +02:00
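
The mask construction described in commit db7e5bfc55 can be illustrated with a short Python sketch. It is not the plugin's implementation: parsing of the annotation itself is not shown, the input is assumed to be an already parsed mapping from device index (as seen inside the container) to tile indices, and the output follows the comma-separated `device.subdevice` form used by the Level Zero ZE_AFFINITY_MASK variable.

```python
def affinity_mask(tiles_by_device):
    """Build a 'device.subdevice' list such as '0.0,0.1,1.0'.

    tiles_by_device maps a device index to the tile indices that the
    container should use on that device (hypothetical, pre-parsed input).
    """
    entries = []
    for device in sorted(tiles_by_device):
        for tile in sorted(tiles_by_device[device]):
            entries.append(f"{device}.{tile}")
    return ",".join(entries)


if __name__ == "__main__":
    print(affinity_mask({0: [0, 1], 1: [0]}))
```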
Mikko Ylinen
a03df7edd6 doc: fix operator usage instructions
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2022-03-16 08:10:58 +02:00
Ed Bartosh
6b27cf1f7c Implement IAA plugin, operator, demo
Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
2022-03-04 15:58:42 +02:00
Mikko Ylinen
c064bfc4f1 demo: add intel-opencl-icd
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2022-02-24 11:06:27 +02:00
Hyeongju Johannes Lee
5fe2c3ef4d dlb: update the link to dlb driver
Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
2022-02-18 20:00:24 +02:00
Ed Bartosh
d4966e089c
Merge pull request #857 from ozhuraki/operator-upgrade
operator: Support upgrade of plugins
2022-02-18 17:55:53 +02:00
Oleg Zhurakivskyy
f29171b067 operator: Add documentation on upgrade
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
2022-02-18 12:52:55 +02:00
Mikko Ylinen
72c4552253 deployments: move SGX NFD config to an NFD kustomize overlay
Start using the newly created NodeFeatureRule configs with SGX.
This allows dropping the custom worker config.

Additionally, split the example NFD deployment into two steps:

1) plain NFD (+SGX json patches)
2) NodeFeatureRule creation

NodeFeatureRule creation is not guaranteed to succeed when it's
part of the same kustomization with the CRD creation. Users may
also have NFD already running so allowing 2) alone works better
in that scenario.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2022-02-18 11:17:57 +02:00
Hyeongju Johannes Lee
d70397ebfb dlb: update README
Remove commands for building and loading dlb2 driver

Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
2022-02-16 16:10:51 +02:00
Mikko Ylinen
a74774f939 docs: update cert-manager installation instructions
The webhooks' default deployments depend on cert-manager. Our existing
documentation points to a specific cert-manager version giving users
the impression that it should be used. However, that is not the case.

Update the documentation so that we just point to the cert-manager
installation page. With this, we don't have to hard-code any
specific version.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2022-02-16 11:26:37 +02:00
Mikko Ylinen
1185f2329b crypto-perf: drop SYS_ADMIN capabilities
SYS_ADMIN capabilities are not necessary when using
vfio-pci.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2022-02-16 11:26:20 +02:00
Oleg Zhurakivskyy
656676b267 operator: Set klogr's format to FormatKlog
The default "Serialize" breaks multiline output.

Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
2022-02-09 16:49:35 +02:00
Ed Bartosh
8626d47d8b operator: implement NFD labelling rules
- added labelling rules for all supported devices
- updated operator installation instructions

Fixes: #768

Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2022-02-08 17:01:03 +02:00
Ed Bartosh
55f3e17dd0 add 'annotations' parameter to the NewDeviceInfo API
Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2022-02-07 15:15:30 +02:00
Tuomas Katila
6f57c55ef8 Add a total tile count to node's labels
Unlike the platform-specific tile count, this label does not
depend on debugfs.

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2022-01-26 09:57:33 +02:00
Ukri Niemimuukko
7520393041 gpu_nfdhook: gpu-numbers and pci-groups
This adds a new label "gpu-numbers" for short numbered lists of
GPUs, omitting "card" from the names. Also adds splitting of long
label values.

Similarly, this adds a new label "pci-groups" for PCI groups. Grouping
can be controlled by the env var GPU_PCI_GROUPING_LEVEL, which dictates
how many pci-folder names need to match for GPUs to be considered part
of the same group.

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2022-01-25 09:17:56 +02:00
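
The two labels from commit 7520393041 can be sketched in Python as below. This is an illustration under assumptions, not the hook's actual code: the "." separator, the way extra chunks would be named as labels, and the PCI path layout are all hypothetical. The 63-character limit is the Kubernetes maximum length of a label value.

```python
from collections import defaultdict

MAX_LABEL_VALUE_LEN = 63  # Kubernetes limit for a label value


def gpu_numbers(card_names):
    """Drop the 'card' prefix: ['card0', 'card12'] -> ['0', '12']."""
    return [n[len("card"):] if n.startswith("card") else n for n in card_names]


def split_label_value(numbers, sep=".", limit=MAX_LABEL_VALUE_LEN):
    """Join numbers with sep, starting a new chunk before the limit is exceeded."""
    chunks, current = [], ""
    for number in numbers:
        candidate = number if not current else current + sep + number
        if current and len(candidate) > limit:
            chunks.append(current)
            current = number
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def pci_groups(gpu_pci_paths, level):
    """Group GPUs whose first `level` PCI folder names match.

    gpu_pci_paths maps a GPU number to a hypothetical PCI folder path.
    """
    groups = defaultdict(list)
    for gpu, path in gpu_pci_paths.items():
        key = tuple(path.strip("/").split("/")[:level])
        groups[key].append(gpu)
    return sorted(sorted(members) for members in groups.values())
```

With a grouping level of 2, GPUs that share the same two leading folder names end up in one group; a higher level requires a longer common prefix and therefore produces smaller groups.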
Mikko Ylinen
c306f5ef68 qat: detect noiommu mode with VFIO
If the kernel has CONFIG_VFIO_NOIOMMU enabled and the node admin
has explicitly set enable_unsafe_noiommu_mode VFIO parameter,
VFIO taints the kernel and writes "vfio-noiommu" to the IOMMU
group name. If these conditions are true, the /dev/vfio/ devices
are prefixed with "noiommu-".

This use-case is documented for DPDK, so we don't want to break
it (as was the case before, because we added DeviceMounts for
/dev/vfio/<iommugroup> files that did not exist).

See DPDK documentation for further information and warnings.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2022-01-10 06:11:59 +02:00
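
The path selection described in commit c306f5ef68 amounts to the following Python sketch. It is only an illustration of the rule stated in the commit message, not the plugin's code: the IOMMU group name is passed in as a string (how it is read from the system is not shown), and the function names are hypothetical.

```python
def is_noiommu(group_name):
    """True when the IOMMU group name is the one VFIO writes in no-IOMMU mode."""
    return group_name.strip() == "vfio-noiommu"


def vfio_device_path(iommu_group, noiommu=False):
    """Return the /dev/vfio/ path to mount, adding the 'noiommu-' prefix if needed."""
    prefix = "noiommu-" if noiommu else ""
    return f"/dev/vfio/{prefix}{iommu_group}"


if __name__ == "__main__":
    print(vfio_device_path("12", is_noiommu("vfio-noiommu\n")))
```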
Eero Tamminen
36046d90a4 Make GPU plugin / resource label limitations more explicit
While the labeling limit is obvious after a little thought, IMHO
limitations like this should either be stated up front, or be in
their own section in the README.  This commit does the former for the
GPU plugin fractional resources, and the latter for the NFD hook / labeling.
2022-01-04 11:43:08 +02:00