Commit Graph

574 Commits

Author SHA1 Message Date
Mikko Ylinen
b81d2dcba8 Update SGX and FPGA webhook flags
SGX Admission webhook was quickly forked from FPGA's
implementation. After a bit of thinking, it turns out
leader election and metrics are not necessary for a
(idempotent) webhook-only functionality.

For the FPGA Admission webhook, metrics are not correctly
set up, so it's better to disable the functionality. Leader
election is kept but the flag is renamed to align with
"kubebuilder v3 functionality", similar to how we changed it
in the operator as well.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2022-09-13 13:18:28 +03:00
Mikko Ylinen
3abf10d7ff qat: read device capabilities from sysfs
Linux 6.0 adds sysfs-driver-qat entries to read device capabilities:
42e66b1cc3/Documentation/ABI/testing/sysfs-driver-qat

Implement the logic for reading from sysfs and prefer that over debugfs.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2022-09-09 14:16:03 +03:00
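The "prefer sysfs over debugfs" behavior described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the plugin's code: the helper name `readFirstAvailable` and the file names are hypothetical, and the two sources are faked in a temporary directory.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// readFirstAvailable returns the trimmed contents of the first file in
// paths that can be read and is not empty. (Hypothetical helper.)
func readFirstAvailable(paths ...string) (string, error) {
	for _, p := range paths {
		data, err := os.ReadFile(p)
		if err != nil {
			continue // source not present on this kernel, try the next one
		}
		if s := strings.TrimSpace(string(data)); s != "" {
			return s, nil
		}
	}
	return "", fmt.Errorf("no readable capability source among %v", paths)
}

// demo fakes both sources in a temporary directory and returns what is
// read before and after the preferred (sysfs-like) file exists.
func demo() (string, string, error) {
	dir, err := os.MkdirTemp("", "qat-caps")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(dir)

	sysfs := filepath.Join(dir, "sysfs-caps")     // hypothetical file name
	debugfs := filepath.Join(dir, "debugfs-caps") // hypothetical file name

	if err := os.WriteFile(debugfs, []byte("from-debugfs\n"), 0o600); err != nil {
		return "", "", err
	}
	before, err := readFirstAvailable(sysfs, debugfs)
	if err != nil {
		return "", "", err
	}
	if err := os.WriteFile(sysfs, []byte("from-sysfs\n"), 0o600); err != nil {
		return "", "", err
	}
	after, err := readFirstAvailable(sysfs, debugfs)
	if err != nil {
		return "", "", err
	}
	return before, after, nil
}

func main() {
	before, after, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(before, after)
}
```

Listing the preferred source first keeps the fallback order in one place, so older kernels without the sysfs entry still work through the second path.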
Tuomas Katila
230570f12e gpu: add mentions about data center gpu support
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2022-09-09 13:07:50 +03:00
Mikko Ylinen
307e960871 docs: fix remaining review comments
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2022-09-06 14:28:25 +03:00
Mikko Ylinen
8ac321f5e3 sgx: send nil TopologyInfo
/dev/sgx_* cannot be mapped to any topology. SGX itself is topology
aware but we cannot control it with TopologyInfo.

Currently, pkg/topology returns empty TopologyInfo{Nodes:[]*NUMANode{}}
for /dev/sgx_* but kubelet TopologyManager (when enabled and with a
policy other than 'none') interprets that as "Hint Provider has no
possible NUMA affinities for resource" and rejects the SGX resources.

What we want is "Hint Provider has no preference for NUMA affinity with
resource". This is communicated using nil TopologyInfo.

See: https://github.com/kubernetes/kubernetes/issues/112234

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2022-09-06 08:43:04 +03:00
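The difference between a nil and an empty TopologyInfo in the commit above can be illustrated with a small sketch. The types below are simplified stand-ins for the device plugin API types named in the commit message, and `describe` is a hypothetical helper that only mirrors the two interpretations quoted there; it is not kubelet code.

```go
package main

import "fmt"

// NUMANode and TopologyInfo are simplified stand-ins (assumptions) for
// the device plugin API types of the same names.
type NUMANode struct{ ID int64 }

type TopologyInfo struct{ Nodes []*NUMANode }

// describe mirrors the interpretation quoted in the commit message:
// nil means "no preference", an empty node list means "no possible
// NUMA affinities".
func describe(t *TopologyInfo) string {
	switch {
	case t == nil:
		return "no preference"
	case len(t.Nodes) == 0:
		return "no possible NUMA affinities"
	default:
		return fmt.Sprintf("affinity to %d NUMA node(s)", len(t.Nodes))
	}
}

func main() {
	fmt.Println(describe(nil))
	fmt.Println(describe(&TopologyInfo{Nodes: []*NUMANode{}}))
	fmt.Println(describe(&TopologyInfo{Nodes: []*NUMANode{{ID: 0}}}))
}
```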
Ed Bartosh
f0dd95274e
Merge pull request #1126 from mythi/PR-2022-054
docs: rework development guide
2022-09-02 17:59:15 +03:00
Ed Bartosh
5756725b09 fix lint failure
Removed unused import. This should fix this golangci-lint failure:
  can't run linter goanalysis_metalinter:
  buildir: failed to load package :
  could not load export data:
  no export data for "cloud.google.com/go/compute/metadata"

Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2022-09-02 12:02:06 +03:00
Mikko Ylinen
1b3accacc2 docs: rework development guide
Currently, each individual plugin README documents roughly the same
daily development steps to git clone, build, and deploy. Re-purpose
the plugin READMEs more towards cluster admin type of documentation
and start moving all development related documentation to DEVEL.md.

The same is true for e2e testing documentation, which is scattered
in places where it doesn't belong. It is good to have all day-to-day
development Howtos in a centralized place.

Finally, the cleanup includes some harmonization to plugins'
table of contents which now follows the pattern:

* [Introduction](#introduction)
(* [Modes and Configuration Options](#modes-and-configuration-options))
* [Installation](#installation)
    (* [Prerequisites](#prerequisites))
    * [Pre-built Images](#pre-built-images)
    * [Verify Plugin Registration](#verify-plugin-registration)
* [Testing and Demos](#testing-and-demos)
    * ...

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2022-08-31 20:00:15 +03:00
Eero Tamminen
fb18923298 Log GPU device share count & type count changes separately
And instead of accessing DeviceTree internals, add suitable method for it.

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
2022-08-31 17:23:57 +03:00
Mikko Ylinen
d826548d29
Merge pull request #1113 from eero-t/gpu-count-log
More detailed log for number of found GPU devices / resource types
2022-08-29 09:58:53 +03:00
Ed Bartosh
02446fca1d
Merge pull request #1114 from eero-t/prefix-option
Add "prefix" option to GPU plugin for scalability testing
2022-08-26 22:54:28 +03:00
Eero Tamminen
9d4b52188e Add "gpu_fakedev" documentation
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
2022-08-26 19:05:10 +03:00
Eero Tamminen
cc3aebbefc Add minimal example JSON to test "gpu_fakedev" generator
Config file is suitably indented so that it can be directly
appended to a suitable configMap header.

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
2022-08-26 19:05:10 +03:00
Eero Tamminen
c15feea1f8 Add code for generating fake GPU sysfs + devfs files
To facilitate GPU plugin scalability testing on a real cluster.

Pre-existing (fake) sysfs & devfs content needs to be removed first:

* Fake devfs directory is mounted from host so OCI runtime can "mount"
  device files also to workloads requesting fake devices. This means
  that those files can persist over fake GPU plugin life-time, and
  earlier files need to be removed, as they may not match

* DaemonSet restarts failing init containers, so errors about content
  created on previous generator run would prevent getting logs of the
  real error on first generator run

* Before removal, check that removed directory content is as expected,
  to avoid accidentally removing host sysfs/devfs content (in case
  container was erroneously granted access to the real thing)

Container runtime requires fake device files to be real devices:

* Use NULL devices to represent fake GPU devices:
  https://www.kernel.org/doc/Documentation/admin-guide/devices.txt

* Give more detailed logging for MkNod() failures as device
  node creation is the operation most likely to fail when the container
  does not have the necessary access rights

Created content is based on JSON config file (instead of e.g.
commandline options) so that (configMap providing) it can be updated
independently of the pod where generator is run.

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
2022-08-26 19:04:43 +03:00
Eero Tamminen
ddf2c8bc8f More detailed log for number of found GPU devices / resource types
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
2022-08-26 17:51:27 +03:00
Eero Tamminen
0b519ecf1e Deprecate debugfs GPU IP block version labels in NFD hook doc
There's no mapping available from IP block versions to actual product
features, which makes these version numbers fairly useless for end
users.

In mixed GPU clusters, running a job that adds/updates node labels for
the relevant GPU features to each relevant node would be much more
user-friendly.  This could be done easily by converting given GPU API
capability tool (e.g. "vainfo" for VA-API, "clinfo" for OpenCL) output
to a NFD feature file.

(Such a thing would be outside of this project's scope though, except
maybe as an example / test-case.)

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
2022-08-24 16:55:01 +03:00
Eero Tamminen
0b7cbc862d Improve GPU NFD hook documentation
Add table of contents, simplify introduction text.

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
2022-08-24 16:55:01 +03:00
Eero Tamminen
5666b8fa30 Add "prefix" option to GPU plugin for scalability testing
GPU plugin code assumes container paths to match host paths, and
container runtime prevents creating fake files under real paths. When
non-standard paths are used, devices can be faked for scalability
testing.

Note: If one wants to run both normal GPU plugin and faked one in same
cluster, all nodes providing fake "i915" resources should be labeled
differently from ones with real GPU plugin + devices, so that real GPU
workloads can be limited to correct nodes with a suitable
nodeSelector.

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
2022-08-24 14:32:53 +03:00
Ed Bartosh
6177dd0dfe
Merge pull request #1093 from mythi/PR-2022-050
build: move to Go 1.19
2022-08-16 00:25:57 +03:00
astronaut0131
2d155edac7 sgx: add kind deployment notes for aesmd
2022-08-15 15:26:01 +08:00
Mikko Ylinen
642c4f7b59 build: move to Go 1.19 and golangci-lint 1.48 because of that
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2022-08-15 10:13:37 +03:00
Chelsea Mafrica
24eb52a912 docs: Fix missing code block in operator doc
Add a missing code block to a section of the operator README.

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2022-08-05 11:32:48 -07:00
Mikko Ylinen
3c948cc106
Merge pull request #1063 from bart0sh/PR144-upgrade-libDLB
dlb: update DLB to v7.7.0
2022-07-18 09:29:55 +03:00
Ed Bartosh
9f2db89da6 dlb: update DLB to v7.7.0
Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2022-07-03 15:08:14 +03:00
Huang Xin
89caad1cd4 doc: modify SGX device plugin deployments url from 'main' to '<RELEASE_VERSION>'
Signed-off-by: Huang Xin <xin1.huang@intel.com>
2022-06-25 17:33:46 +08:00
Ed Bartosh
c82b907472
Merge pull request #1055 from mythi/PR-2022-045
operator: align with kubebuilder v3 functionality
2022-06-20 23:12:21 +03:00
Mikko Ylinen
f9ca36cc26 set TLSMinVersion for webhook servers
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2022-06-20 19:04:50 +03:00
Mikko Ylinen
b48568c43a operator: align with kubebuilder v3 functionality
kubebuilder v3 based scaffolding has updated many things
and they are documented in [1].

Update operator's functionality to v3 level. We've done
most/some of the changes earlier (e.g., by not using
deprecated k8s APIs anymore) so the changes are minimal.

[1] https://book.kubebuilder.io/migration/v2vsv3.html

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2022-06-20 16:35:40 +03:00
Oleg Zhurakivskyy
f1ec14d106 iaa: Add e2e tests
Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
2022-06-09 15:00:25 +03:00
Mikko Ylinen
9bb5f303ab
Merge pull request #1048 from bart0sh/PR142-get-rid-of-sysfsDir
Get rid of unused sysfsDir parameter
2022-06-09 13:55:57 +03:00
Ed Bartosh
9d04ce825d idxd: get rid of unused sysfsDir parameter
2022-06-08 22:09:27 +03:00
Ed Bartosh
3df93cf04f rename image dsa-accel-config-demo -> accel-config-demo
2022-06-08 21:00:54 +03:00
Ed Bartosh
e182304c4d
Merge pull request #1030 from mythi/PR-2022-040
qat: add support for 401xx devices
2022-06-03 12:37:45 +03:00
Hyeongju Johannes Lee
276d25088e dlb: update the version of DLB driver & DPDK to new release
Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
2022-06-02 22:39:46 +03:00
Mikko Ylinen
8987f1ba53 qat: add support for 401xx devices
QAT_401xx is a derivative of 4xxx. Add support for that device
by including the device IDs (both PF and VF).

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2022-06-02 08:11:39 +03:00
Hyeongju Johannes Lee
85a12609a3 sgx: deprecate /dev/sgx/ mounts
Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
2022-05-09 18:59:34 +03:00
Mikko Ylinen
6ea51a3623 qat: kerneldrv: skip QAT Gen4 devices
Containers running on QAT Gen4 should be based on qatlib and therefore
kerneldrv is not the right mode. Skip registering 4xxx* devices to
ensure it is not used.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2022-04-25 22:08:13 +03:00
Oleg Zhurakivskyy
e3a277c65f doc: Update the documentation on the DSA, IAA ConfigMap creation
Closes #941

Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
2022-04-25 10:17:17 +03:00
Mikko Ylinen
069b9bd79a qat: 4xxx: split generic resource to compression and crypto
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2022-04-07 22:33:17 +03:00
Mikko Ylinen
482ed7ba4d
Merge pull request #939 from hj-johannes-lee/qat-allocation-policy
qat: implement preferredAllocation policies
2022-04-07 21:15:49 +03:00
Hyeongju Johannes Lee
d3c8063ff3 qat: implement preferredAllocation policies
Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
2022-04-07 14:14:00 +03:00
Mikko Ylinen
2adad5ae76 drop deprecated grpc.WithInsecure()
grpc-go v1.43.0 deprecated grpc.WithInsecure() in favor of
insecure.NewCredentials(). Move to use the recommended approach
and drop the linter annotations.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2022-04-07 13:40:51 +03:00
Tonny Tzeng
bf94f566fd doc: unify test images build with make
Signed-off-by: Tonny Tzeng <tonny.tzeng@intel.com>
2022-04-01 15:49:43 +08:00
Mikko Ylinen
0f36cde605
Merge pull request #935 from tkatila/gpu/tiles-support-and-numa-mapping
gpu: add tiles annotation support
2022-03-30 19:33:09 +03:00
Tuomas Katila
8f6a235b5d gpu: Start using GetPreferredAllocation with fractional resources
Move reallocate logic to getpreferredallocation and simplify
allocate to use the kubelet's device ids.

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2022-03-30 11:32:49 +03:00
Mikko Ylinen
18379e92f3
Merge pull request #937 from tkatila/gpu/numa-mapping
gpu: Add numa node mapping label for GPUs
2022-03-29 07:56:26 +03:00
Hyeongju Johannes Lee
7eeaddc563 gpu: fix typo in implementation of preferredAllocator interface
Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
2022-03-28 05:04:32 -07:00
Tuomas Katila
bdd72c8cf7 gpu: Add numa node mapping label for GPUs
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2022-03-24 14:29:05 +02:00
Tuomas Katila
db7e5bfc55 Add support for gas-container-tiles annotation
Adds functionality to convert a container's tile annotation
into the corresponding L0 affinity mask. This helps to target
a container's workload to specific L0 subdevices.

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2022-03-24 14:13:35 +02:00
Mikko Ylinen
a03df7edd6 doc: fix operator usage instructions
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2022-03-16 08:10:58 +02:00
Ed Bartosh
6b27cf1f7c Implement IAA plugin, operator, demo
Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
2022-03-04 15:58:42 +02:00
Mikko Ylinen
c064bfc4f1 demo: add intel-opencl-icd
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2022-02-24 11:06:27 +02:00
Hyeongju Johannes Lee
5fe2c3ef4d dlb: update the link to dlb driver
Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
2022-02-18 20:00:24 +02:00
Ed Bartosh
d4966e089c
Merge pull request #857 from ozhuraki/operator-upgrade
operator: Support upgrade of plugins
2022-02-18 17:55:53 +02:00
Oleg Zhurakivskyy
f29171b067 operator: Add a documentation on upgrade
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
2022-02-18 12:52:55 +02:00
Mikko Ylinen
72c4552253 deployments: move SGX NFD config to an NFD kustomize overlay
Start using the newly created NodeFeatureRule configs with SGX.
This allows dropping the custom worker config.

Additionally, split the example NFD deployment into two steps

1) plain NFD (+SGX json patches)
2) NodeFeatureRule creation

NodeFeatureRule creation is not guaranteed to succeed when it's
part of the same kustomization with the CRD creation. Users may
also have NFD already running so allowing 2) alone works better
in that scenario.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2022-02-18 11:17:57 +02:00
Hyeongju Johannes Lee
d70397ebfb dlb: update README
Remove commands for building and loading dlb2 driver

Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
2022-02-16 16:10:51 +02:00
Mikko Ylinen
a74774f939 docs: update cert-manager installation instructions
The webhooks' default deployments depend on cert-manager. Our existing
documentation points to a specific cert-manager version giving users
the impression that it should be used. However, that is not the case.

Update the documentation so that we just point to cert-manager
installation page. With this, we don't have to hard-code to any
specific version.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2022-02-16 11:26:37 +02:00
Mikko Ylinen
1185f2329b crypto-perf: drop SYS_ADMIN capabilities
SYS_ADMIN capabilities are not necessary when using
vfio-pci.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2022-02-16 11:26:20 +02:00
Oleg Zhurakivskyy
656676b267 operator: Set klogr's format to FormatKlog
The default "Serialize" breaks multiline output.

Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
2022-02-09 16:49:35 +02:00
Ed Bartosh
8626d47d8b operator: implement NFD labelling rules
- added labelling rules for all supported devices
- updated operator installation instructions

Fixes: #768

Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2022-02-08 17:01:03 +02:00
Ed Bartosh
55f3e17dd0 add 'annotations' parameter to the NewDeviceInfo API
Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2022-02-07 15:15:30 +02:00
Tuomas Katila
6f57c55ef8 Add a total tile count to node's labels
This label isn't dependent on the debugfs as the platform
specific tile count is.

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2022-01-26 09:57:33 +02:00
Ukri Niemimuukko
7520393041 gpu_nfdhook: gpu-numbers and pci-groups
This adds a new label "gpu-numbers" for short numbered lists of
gpus, omitting "card" from the names. Also adds splitting of long
label values.

Similarly, this adds a new label "pci-groups" for PCI groups. Grouping
can be controlled by the env var GPU_PCI_GROUPING_LEVEL. The env var
dictates how many pci-folder names need to match in order for GPUs
to be considered to belong to a group.

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2022-01-25 09:17:56 +02:00
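A rough sketch of the "gpu-numbers" idea from the commit above: drop the "card" prefix from device names and split values that would not fit in a single label. The "." separator and the chunking strategy are assumptions made for illustration, and both helper names are hypothetical; only the 63-character limit for Kubernetes label values is a known constraint.

```go
package main

import (
	"fmt"
	"strings"
)

// maxLabelValueLen is the Kubernetes limit for a label value.
const maxLabelValueLen = 63

// gpuNumbers turns names like "card0" into "0" and joins them.
// The "." separator is an assumption for this sketch.
func gpuNumbers(cards []string) string {
	nums := make([]string, 0, len(cards))
	for _, c := range cards {
		nums = append(nums, strings.TrimPrefix(c, "card"))
	}
	return strings.Join(nums, ".")
}

// splitValue cuts s into chunks of at most limit bytes so that each
// chunk can be stored in its own label.
func splitValue(s string, limit int) []string {
	if limit <= 0 {
		return []string{s}
	}
	var chunks []string
	for len(s) > limit {
		chunks = append(chunks, s[:limit])
		s = s[limit:]
	}
	return append(chunks, s)
}

func main() {
	value := gpuNumbers([]string{"card0", "card1", "card10"})
	fmt.Println(value)
	fmt.Println(splitValue(value, maxLabelValueLen))
	fmt.Println(splitValue("abcdefg", 3))
}
```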
Mikko Ylinen
c306f5ef68 qat: detect noiommu mode with VFIO
If the kernel has CONFIG_VFIO_NOIOMMU enabled and the node admin
has explicitly set enable_unsafe_noiommu_mode VFIO parameter,
VFIO taints the kernel and writes "vfio-noiommu" to the IOMMU
group name. If these conditions are true, the /dev/vfio/ devices
are prefixed with "noiommu-".

This use-case is documented for DPDK so we don't want to break
it (as it was before because we added DeviceMounts to
/dev/vfio/<iommugroup> files that did not exist).

See DPDK documentation for further information and warnings.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2022-01-10 06:11:59 +02:00
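The device node naming described in the commit above could be sketched like this. It is an illustration under stated assumptions: `vfioDevicePath` is a hypothetical helper, and the caller is assumed to have already read the IOMMU group's name (empty when the group has none).

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// vfioDevicePath returns the /dev/vfio node for an IOMMU group.
// groupName is the group's name as written by VFIO; in no-IOMMU mode
// it is "vfio-noiommu" and the device node gets a "noiommu-" prefix.
func vfioDevicePath(group, groupName string) string {
	if strings.TrimSpace(groupName) == "vfio-noiommu" {
		return path.Join("/dev/vfio", "noiommu-"+group)
	}
	return path.Join("/dev/vfio", group)
}

func main() {
	fmt.Println(vfioDevicePath("12", ""))
	fmt.Println(vfioDevicePath("12", "vfio-noiommu\n"))
}
```

Resolving the node name before creating a device mount avoids pointing at a /dev/vfio/<iommugroup> file that does not exist in no-IOMMU mode.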
Eero Tamminen
36046d90a4 Make GPU plugin / resource label limitations more explicit
While the labeling limit is obvious after a little thought, IMHO
limitations like this should either be stated up front or be in
their own section in the README.  This commit does the former for the GPU
plugin fractional resources, and the latter for the NFD hook / labeling.
2022-01-04 11:43:08 +02:00
Ukri Niemimuukko
46dcffc33e README typofix
Label descriptions had extra underscores.

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2021-12-28 12:01:40 +02:00
Hyeongju Johannes Lee
a2d13eea4c dlb: update README
Remove the sentence for pre-built image since Dockerhub image for dlb
plugin is available.

Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
2021-12-22 03:49:49 -08:00
Hyeongju Johannes Lee
74ecd6919c dsa: Fix the names still left as idxd-initcontainer
There are a few things left un-renamed after #771.
Rename those to idxd-config-initcontainer.

Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
2021-12-21 04:39:19 -08:00
Mikko Ylinen
e09d52f6ff
Merge pull request #816 from hj-johannes-lee/dlb-flag-parse
dlb:Fix the problem that klog is not printed
2021-12-21 13:31:49 +02:00
Hyeongju Johannes Lee
515bd5908c dlb:Fix the problem that klog is not printed
Add flag parsing to get command line parameters so that parameters for
klog are not ignored
2021-12-21 01:58:58 -08:00
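The effect of the missing flag parsing in the commit above can be shown with the standard library alone. In this sketch a hypothetical "-v" verbosity flag stands in for the klog parameters; registered flags keep their defaults until Parse is called.

```go
package main

import (
	"flag"
	"fmt"
)

// verbosity registers a "-v" flag on a private FlagSet and returns its
// value, optionally parsing args first. The flag is a stand-in for the
// logging parameters mentioned in the commit message.
func verbosity(args []string, parse bool) int {
	fs := flag.NewFlagSet("demo", flag.ContinueOnError)
	v := fs.Int("v", 0, "log verbosity")
	if parse {
		if err := fs.Parse(args); err != nil {
			return -1
		}
	}
	return *v
}

func main() {
	args := []string{"-v=4"}
	fmt.Println("without Parse:", verbosity(args, false))
	fmt.Println("with Parse:", verbosity(args, true))
}
```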
Mikko Ylinen
c7e18d8b25 qat: rework driver binding
The new_id based driver binding is failing on kernels 5.11+ when the
QAT VF is not bound to any driver: attempts to write to new_id with
the same device ID repeatedly error with "file exists".

Move the new_id initialization to the beginning of the startup and
write the enabled device IDs only once.

This commit also fixes an issue where VF devices were not correctly detected
in virtual machines where the VF was not bound to any driver.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-12-21 08:20:02 +02:00
Mikko Ylinen
b48ca7f686 qat: update dpdkdrv unit tests
After a closer review, it was noticed that some of the QAT dpdkdrv
unit tests need updating:

- "Broken igb_uio DPDKdriver..." is actually testing unknown device ID
and we already have tests for it -> drop.
- "igb_uio DPDKdriver with one kernel bound device (not QAT device)" is
testing something impossible: an unknown VF devID is originated from a
QAT PF -> drop.
- creating files for unbind/new_id etc. is unnecessary because
os.WriteFile() creates them during the tests -> drop these lines to
simplify unit tests maintenance.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-12-21 08:20:02 +02:00
dependabot[bot]
9a16e80f2b build(deps): bump google.golang.org/grpc from 1.42.0 to 1.43.0
Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.42.0 to 1.43.0.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](https://github.com/grpc/grpc-go/compare/v1.42.0...v1.43.0)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

---
In addition to the changes made by dependabot, add nolint comments to ignore staticcheck (SA1019) errors.
This is because insecure.NewCredentials(), recommended as the alternative, is still declared experimental,
so keep grpc.WithInsecure() with a nolint comment.

Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
2021-12-20 04:50:39 -08:00
Ed Bartosh
cec004c398 lint: enable wsl check
Fixes: #392

Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2021-12-17 11:48:48 +02:00
Eero Tamminen
bcc737bd2a Adapt GPU label support to debugfs DRM entry changes
GPU generation "gen" number is replaced in the capability files of
latest kernels with separate display, graphics, and media versions.

For compatibility with newer kernels, provide "gen" based on the new
labels (but without decimals), and for older kernel compatibility, new
labels based on the "gen".

Because different kernels match different items from the action map,
whole capability file will get parsed. Capability file parsing is
optimized by using prefix check instead of scanf.

"platform_gen" label is deprecated, and can be dropped whenever it
becomes inconvenient (lint complains about line count etc).
2021-12-16 21:22:31 +02:00
Eero Tamminen
599fc18e71 Provide workaround for the media issue and document it
The issue is with VA-API and QSV, not VPL media API.
2021-12-15 18:40:33 +02:00
Hyeongju Johannes Lee
37dc1b124e dlb: update README
Add info on how to configure dlb driver and vfs.

Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
2021-12-14 12:05:35 -08:00
Mikko Ylinen
e83a811ec7 sgx: update README
The cmdline flags talked about the old device nodes. With the
upstream driver, the device nodes are /dev/sgx_[enclave|provision].

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-12-01 14:33:33 +02:00
Oleg Zhurakivskyy
fee2e12996 idxd-initcontainer: Drop libkmod, libudev
- Make libkmod, libudev optional
- Include accel-config, libjson-c, libuuid sources

Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
2021-11-30 15:32:23 +02:00
Mikko Ylinen
1c4ee778b3 sgx: update NFD deployment
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-11-25 17:13:03 +02:00
Dmitry Rozhkov
db20ce1fe4
Merge pull request #754 from mythi/PR-2021-062
qat: update default flags and deploy without ConfigMap
2021-11-22 10:00:02 +02:00
Hyeongju Johannes Lee
84d8408a4f README: add that operator supports DSA and DLB plugins
Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
2021-11-19 02:38:58 -08:00
Mikko Ylinen
b921a4a458 qat: update default flags and deploy without ConfigMap
To make QAT plugin deployment consistent with the other plugins
we update the default flags and deploy without the flag settings
provided by the ConfigMap.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-11-18 14:02:36 +02:00
Dmitry Rozhkov
471549c11d
Merge pull request #753 from hj-johannes-lee/dlb-operator
operator: Add DLB support
2021-11-18 10:23:16 +02:00
Dmitry Rozhkov
42cde4ff6c
Merge pull request #742 from guoshuxu/dev
GPU devices resource preferred allocation methods.
2021-11-18 10:22:03 +02:00
Xu, Guoshu
e4c4a8f7ac GPU devices resource preferred allocation methods.
1. Implement PreferredAllocator interface.
2. Provide 3 preferred allocation policies: balancedPolicy, packedPolicy and nonePolicy.
3. Provide the cmdline interface: -allocation-policy balanced/packed/none, to select which preferred allocation policy to use.
4. Add operator support.

Co-authored-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-11-17 22:55:10 +08:00
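To make the difference between the "packed" and "balanced" policies in the commit above more concrete, here is a simplified sketch. It is not the plugin's implementation: the device ID format "<card>-<slot>", the tie-breaking by card name, and both function bodies are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// groupByCard groups IDs of the assumed form "<card>-<slot>" by card.
func groupByCard(available []string) map[string][]string {
	groups := map[string][]string{}
	for _, id := range available {
		card, _, _ := strings.Cut(id, "-")
		groups[card] = append(groups[card], id)
	}
	return groups
}

// sortedCards returns the card names in a deterministic order.
func sortedCards(groups map[string][]string) []string {
	cards := make([]string, 0, len(groups))
	for c := range groups {
		cards = append(cards, c)
	}
	sort.Strings(cards)
	return cards
}

// packed exhausts one card before moving on to the next one.
func packed(available []string, size int) []string {
	groups := groupByCard(available)
	var out []string
	for _, card := range sortedCards(groups) {
		for _, id := range groups[card] {
			if len(out) == size {
				return out
			}
			out = append(out, id)
		}
	}
	return out
}

// balanced always takes the next ID from the card that currently has
// the most free slots, spreading the load across cards.
func balanced(available []string, size int) []string {
	groups := groupByCard(available)
	cards := sortedCards(groups)
	var out []string
	for len(out) < size {
		best := ""
		for _, c := range cards {
			if len(groups[c]) > 0 && (best == "" || len(groups[c]) > len(groups[best])) {
				best = c
			}
		}
		if best == "" {
			break // nothing left to allocate
		}
		out = append(out, groups[best][0])
		groups[best] = groups[best][1:]
	}
	return out
}

func main() {
	ids := []string{"card0-0", "card0-1", "card1-0", "card1-1"}
	fmt.Println("packed:  ", packed(ids, 2))
	fmt.Println("balanced:", balanced(ids, 2))
}
```

In this sketch, a "none" policy would simply return the first `size` IDs in the order they were offered.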
Hyeongju Johannes Lee
ff9034822b operator: Add DLB support
Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
2021-11-17 01:51:47 -08:00
Leow Chun Fung
1bbb0a6a7c Support for PCI VPU device 8086/4fc0 and 8086/4fc1
2021-11-16 22:13:33 +07:00
Ed Bartosh
80829f72b1 ci: improve golangci job
- used the same go version as for the project build
- used verbose output
- fixed gofmt check failures

Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2021-11-13 00:32:25 +02:00
Ed Bartosh
b03227f9d4 dlb: add documentation
Document DLB plugin

Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2021-11-11 12:25:25 +02:00
Hyeongju Johannes Lee
8362028560 dlb: Add new device plugin
Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
2021-11-11 11:51:49 +02:00
Oleg Zhurakivskyy
a7c612f7fc dsa: Rename dsa initcontainer to idxd
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
2021-11-09 12:00:44 +02:00
Oleg Zhurakivskyy
cdaf6b3807 dsa: Add a documentation on provisioning with ConfigMap
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
2021-11-09 10:31:50 +02:00
Hyeongju Johannes Lee
13f4ce82a1 Remove nolint annot.
Remove the annotation nolint:funlen since funlen is not used anymore.
2021-10-11 11:36:24 +03:00
Mikko Ylinen
e6cf299750 gpu: update READMEs
Commit 00a59e8f7d was not complete in that it didn't update
the corresponding documentation. This commit fixes that.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-10-08 11:57:16 +03:00
Oleg Zhurakivskyy
30ebc8e5d1 dsa: Add a documentation on provisioning with initcontainer
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
2021-10-01 12:16:50 +03:00
Mikko Ylinen
9d0d6cbe11 qat: set c6xxvf and 4xxxvf to default devices
The devices enabled by default are different between the
kustomize and operator based deployments.

This change harmonizes the defaults to c6xxvf and 4xxxvf
in both deployment options.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-09-23 10:50:38 +03:00
Dmitry Rozhkov
19d54b9fe8
Merge pull request #707 from uniemimu/mem_read
gpu nfdhook: new memory amount reading logic
2021-09-23 10:33:41 +03:00
Ukri Niemimuukko
64290020d7 gpu nfdhook: new memory amount reading logic
This changes the memory reading to be done through lmem_total_bytes
file instead of the addr_range file.

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2021-09-21 13:50:41 +03:00
Hyeongju Johannes Lee
8fc5df7e37 Add govet-fieldalignment
Add govet-fieldalignment to .golangci.yml
Fix errors that come from adding govet-fieldalignment
- by reordering the fields of structs
- by putting nolint:govet annotations

Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
2021-09-20 20:59:04 +03:00
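The field reordering that the fieldalignment check in the commit above asks for can be illustrated with two hypothetical structs holding the same fields in a different order. On typical 64-bit platforms the first occupies 24 bytes and the second 16, because padding is needed around the 8-byte field.

```go
package main

import (
	"fmt"
	"unsafe"
)

// padded places an 8-byte field between two 1-byte fields, which
// forces the compiler to insert padding.
type padded struct {
	a bool
	b int64
	c bool
}

// compact holds the same fields, largest first.
type compact struct {
	b int64
	a bool
	c bool
}

// sizes returns the sizes of both struct layouts in bytes.
func sizes() (uintptr, uintptr) {
	return unsafe.Sizeof(padded{}), unsafe.Sizeof(compact{})
}

func main() {
	p, c := sizes()
	fmt.Println("padded:", p, "compact:", c)
}
```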
Ukri Niemimuukko
0670a82cb1 gpu rm linter comment fixes
Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2021-09-10 14:35:13 +03:00
Li Ning
dcc12d9089 documentation: remove deprecated toc section in README
The 'Verify node kubelet config' content was removed in 6b208f8.

Signed-off-by: Li Ning <ning.a.li@transwarp.io>
2021-09-07 19:38:41 +08:00
Hyeongju Johannes Lee
4bc70ac544 Add goerr113 linter check
Add goerr113 linter check
Fix the usage of fmt.Errorf() by wrapping errors
Fix the usage of errors.New()
2021-09-03 11:02:14 +03:00
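The error handling style that goerr113 pushes towards, as in the commit above, is roughly the following: declare sentinel errors once and wrap them with `%w` so callers can match them with `errors.Is`. The error and function names here are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// errDeviceNotFound is a package-level sentinel error (hypothetical).
var errDeviceNotFound = errors.New("device not found")

// lookup wraps the sentinel with context instead of creating a new,
// unmatchable error with fmt.Errorf or errors.New at the call site.
func lookup(id string) error {
	return fmt.Errorf("lookup %q: %w", id, errDeviceNotFound)
}

// isNotFound reports whether err wraps errDeviceNotFound.
func isNotFound(err error) bool {
	return errors.Is(err, errDeviceNotFound)
}

func main() {
	err := lookup("card0")
	fmt.Println(err)
	fmt.Println(isNotFound(err))
}
```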
Hyeongju Johannes Lee
09ba9fde00 Update tool versions and fix errors and warnings that originated from the update
Update tool versions
Fix the errors and warnings originated from the update:
-Correct type deviceInfo (->DeviceInfo) to make it public
-Fix gpu_plugin.go and vpu_plugin_test.go where stylecheck errors occur
-Fix deprecation warnings
-Rename type 'PatcherManager' to 'Manager' to solve exported errors
-Rename type 'SgxMutator' to 'Mutator' to solve exported errors

Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
2021-08-25 07:09:34 +00:00
Mikko Ylinen
cfe2d65f32
Merge pull request #659 from 0x161e-swei/sgx-nfd-operator-dependency
Add SGX webhook operator as dependency of sgx-nfd
2021-07-28 06:20:32 +03:00
Shijia Wei
9b66176ca5 Add SGX admissionwebhook as dependency of sgx-nfd daemonset;
Mentioned dependency of the cert-manager in DaemonSet deployment method
in SGX README.
2021-07-27 00:39:59 -05:00
Ed Bartosh
8a54a9ba64 webhook: document mappings deployment
Fixes: #580

Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2021-07-26 14:23:10 +03:00
Eero Tamminen
83e7de0d41 Make GPU plugin intro information more generic & accurate
- Information on specific HW & virtualization types on which the GPU plugin
  is tested belongs to release notes, not to the README intro
  (where it has already become obsolete)
- HW offloading is provided by driver backends, not frontends
  (e.g. OneVPL is just one of the media driver frontends)
2021-06-22 18:27:17 +03:00
Ukri Niemimuukko
b0130e693f more documentation for fractional resources
This adds a section heading, TOC link, command line flag description
and a short explanation of what other dependent configuration
changes are needed with fractional resources in order for the command
line flag to achieve something useful.

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2021-06-14 16:25:38 +03:00
Ed Bartosh
98f80b5f47
Merge pull request #652 from uniemimu/hookupdate
add link to gpu_nfdhook and update hook README
2021-06-13 12:15:46 +03:00
Eero Tamminen
a2faa3a8fc Add section on GPU plugin options to its README
2021-06-11 19:55:43 +03:00
Ukri Niemimuukko
cbf7bab114 add link to gpu_nfdhook and update hook README
This adds a link from gpu-plugin README to the nfdhook README, and
updates the nfdhook README with label descriptions.

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2021-06-11 18:54:44 +03:00
skaajas
956154c1db
Updated GPU plugin-specific readme general description.
2021-06-11 15:50:14 +03:00
Ed Bartosh
9d8fb392f5
Merge pull request #637 from uniemimu/skip
add pf skip to gpu nfdhook
2021-06-11 10:57:39 +03:00
Ukri Niemimuukko
e3bf21dbe9 gpu_plugin: add documentation links to gpu aware scheduling
Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2021-06-10 19:46:35 +03:00
Ukri Niemimuukko
7ca5cfcfd6 add pf skip to gpu nfdhook
This corresponds to the previous gpu-plugin skip code.

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2021-06-10 18:44:57 +03:00
Mikko Ylinen
383778a24b qat: fix C4xxx driver name
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-06-10 08:45:23 +03:00
Ed Bartosh
e180bfdf07
Merge pull request #644 from mythi/PR-2021-034
qat: do not fail if driver/unbind file does not exist
2021-06-09 11:38:52 +03:00
Mikko Ylinen
e8115d1c8d qat: do not fail if driver/unbind file does not exist
<device>/driver symlink does not exist if the device is not bound
to any driver. bindDevice() failed when writing to <device>/driver/unbind
errored but IsNotExist() error is acceptable in case there's no driver
to unbind.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-06-09 11:09:24 +03:00
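The error handling described in the commit above can be sketched as follows: a missing driver/unbind file is accepted, any other write error is reported. The `unbind` helper and the fake device directories are hypothetical, and the PCI address is made up.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// unbind writes the device address to <devicePath>/driver/unbind.
// A missing file means the device has no driver, so there is nothing
// to unbind and no error is returned.
func unbind(devicePath, address string) error {
	unbindFile := filepath.Join(devicePath, "driver", "unbind")
	err := os.WriteFile(unbindFile, []byte(address), 0o600)
	if err != nil && !errors.Is(err, fs.ErrNotExist) {
		return fmt.Errorf("unbind %s: %w", address, err)
	}
	return nil
}

// demo runs unbind against a fake device without a driver directory
// and against one that has it, returning the first error seen.
func demo() error {
	dir, err := os.MkdirTemp("", "qat-unbind")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)

	unbound := filepath.Join(dir, "unbound")
	bound := filepath.Join(dir, "bound")
	if err := os.MkdirAll(unbound, 0o700); err != nil {
		return err
	}
	if err := os.MkdirAll(filepath.Join(bound, "driver"), 0o700); err != nil {
		return err
	}
	if err := unbind(unbound, "0000:3d:01.0"); err != nil {
		return err
	}
	return unbind(bound, "0000:3d:01.0")
}

func main() {
	if err := demo(); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("both devices handled")
}
```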
Dmitry Rozhkov
6aa1a47c9a
Merge pull request #638 from uniemimu/fractional
gpu_plugin: fractional resource management
2021-06-09 10:58:10 +03:00
Ukri Niemimuukko
2c4d529d66 gpu_plugin: fractional resource management
Fractional resource management feature

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
Signed-off-by: Dmitry Rozhkov <dmitry.rozhkov@intel.com>
2021-06-04 13:06:50 +03:00
Mikko Ylinen
facb4214a2 tree-wide: drop deprecated io/ioutil
Go 1.16 release notes announced the deprecation of io/ioutil [1]. It's easy
for us to move to what is recommended, so just do it.

[1] https://golang.org/doc/go1.16#ioutil

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-06-02 13:41:15 +03:00
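
For reference, the Go 1.16 replacements for the most common io/ioutil helpers look roughly like this. The function name demo and the file names are made up for the example; the sketch does not reproduce any code from the repository.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

// demo exercises the Go 1.16 replacements for io/ioutil helpers and returns
// what it read back so the result can be inspected.
func demo() (fileData string, entries int, streamData string, err error) {
	dir, err := os.MkdirTemp("", "ioutil-demo") // was ioutil.TempDir
	if err != nil {
		return "", 0, "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "value")
	if err = os.WriteFile(path, []byte("42"), 0o600); err != nil { // was ioutil.WriteFile
		return "", 0, "", err
	}

	data, err := os.ReadFile(path) // was ioutil.ReadFile
	if err != nil {
		return "", 0, "", err
	}

	// was ioutil.ReadDir; os.ReadDir returns []fs.DirEntry, not []fs.FileInfo.
	list, err := os.ReadDir(dir)
	if err != nil {
		return "", 0, "", err
	}

	all, err := io.ReadAll(strings.NewReader("stream")) // was ioutil.ReadAll
	if err != nil {
		return "", 0, "", err
	}
	return string(data), len(list), string(all), nil
}

func main() {
	fmt.Println(demo())
}
```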
Mikko Ylinen
06dbc1331b images: move intel-qat-plugin-kerneldrv to Debian
Also, update the documentation to reflect what is needed to
enable and use '-mode kernel'.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-06-02 13:39:39 +03:00
Leow Chun Fung
8e4b58c0f6 Implement support for PCI-based VPU 2021-05-19 18:15:17 +07:00
Mikko Ylinen
c3cf958c85 images: move most plugin images to distroless/static
All but one (VPU) of the published container images can be built with
static binaries which allows us to use distroless/static as the
base image. Moreover, when combined with stripping the plugin binaries,
we can get both build time and image size savings.

This is the part 1 (out of 2) of the rework. Part 2 will finish the
change by making some adjustments to VPU plugin image and moving the
FPGA/SGX/GPU initcontainers to distroless/static too.

Partial: #516

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2021-05-19 09:44:47 +03:00
Eero Tamminen
c575ce9099 Document GPU plugin test code test-case struct members 2021-05-06 11:02:57 +03:00
Eero Tamminen
57c8d76e1b Add minimal GPU plugin options testing
Tests plugin scan results in setups having none, one and multiple
eligible GPU devices, with and without SRIOV enabled, with two
different options values.

This does not cover verifying the number of devices added under the
"i915_monitoring" resource, as that would be a much larger change.
2021-05-05 17:09:09 +03:00
Eero Tamminen
ca9aa32556 Add "-enable-monitoring" option to GPU plugin
Make "i915_monitoring" resource (granting access to all GPUs) optional
so that it can be enabled only when it's needed.
2021-05-05 17:09:09 +03:00
Eero Tamminen
713c1ab170 Move GPU plugin CLI options to a struct
To help in:
* adding more CLI options in next and later commits, and
* to replace magic newDevicePlugin() input parameters with
  explicitly named one(s)
2021-05-05 17:09:09 +03:00
Eero Tamminen
06fac8128f Move GPU plugin sysfs device compatibility checks to own function
To reduce scan() function complexity before adding more functionality
to it.
2021-05-05 17:08:49 +03:00
Eero Tamminen
79b86fea2d Skip PF for "i915" resource when it has VFs
NOTE: this has impact only for GPUs which are virtualized with SR-IOV.

Access to physical devices (PFs) is disabled for "i915" resource when
they have configured virtual devices (VFs).

This is because:

* GPU resources are expected to be evenly split between VFs in such
  configurations

* But PF resource amount is expected to differ from VFs and typically
  retain only enough resources (just few MB of RAM), to be able to
  provide GPU metrics that are not available from VFs

* Neither the current GPU plugin, nor Kubernetes scheduling in
  general, has proper support for heterogeneous GPUs (= capability
  based scheduling)

Therefore "i915" resource needs to be limited to GPU devices with
homogeneous amount of resources, which in SR-IOV configurations is
expected to be the case only with VFs (when such are present).
2021-05-05 14:13:48 +03:00
Dmitry Rozhkov
38a59a57ea
Merge pull request #626 from mythi/PR-2021-028
sgx: add note about the SGX DCAP driver usage
2021-04-28 08:42:02 +03:00
Mikko Ylinen
111b833ea8 sgx: add note about the SGX DCAP driver usage
The SGX DCAP out-of-tree v1.41 driver is also known to work
with the SGX plugin. However, the default NFD labeling does not
work with the out-of-tree driver so warn users about it.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-04-27 22:10:21 +03:00
Eero Tamminen
e418c00fca Add "i915_monitoring" resource to GPU plugin
Which mounts all (Intel) GPU devices to requesting container.

This is needed e.g. to get GPU metrics from the node.  Requesting pod
does not know how many GPUs are on the node it gets assigned to, so
there needs to be a means to request them all.

(Only alternative for the new resource would be requesting Privileged
mode, which is clearly worse as that would grant pod access also to
all other devices and capabilities.)

This commit also:

* Adds "i915_monitoring" resource testing to: go test -v -run Scan

* Splits GPU plugin tests mock file system setup to a separate
  createTestFiles() function because otherwise TestScan() does not
  pass project's golangci-lint complexity limits
2021-04-27 14:21:05 +03:00
Ed Bartosh
08c2094329 update to cert-manager v1.3.1
Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2021-04-22 14:45:39 +03:00
Dmitry Rozhkov
3892baa4be
Merge pull request #615 from eero-t/gpu-plugin-testing-improvements
Gpu plugin testing improvements
2021-04-20 09:47:10 +03:00
Mikko Ylinen
280bdceb2a sgx: add separate admissionwebhook image
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-04-14 08:09:33 +03:00
Ed Bartosh
31614592c6
Merge pull request #599 from ozhuraki/operator-select-device-type
Make it possible to select supported devices in the operator
2021-04-12 19:09:59 +03:00
Ukri Niemimuukko
bb44156d4f gpu_nfdhook: make memory parsing more robust
This adds support for parsing hex and octal amounts as well.

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2021-04-09 16:23:48 +03:00
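
A minimal sketch of prefix-aware parsing, assuming the amounts are plain integers: strconv.ParseUint with base 0 infers the base from the prefix ("0x" for hex, a leading "0" or "0o" for octal). The function name parseMemoryAmount is hypothetical and not taken from the nfdhook source.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMemoryAmount is a hypothetical helper that accepts decimal, hex and
// octal byte amounts. Base 0 lets strconv pick the base from the prefix.
func parseMemoryAmount(s string) (uint64, error) {
	return strconv.ParseUint(strings.TrimSpace(s), 0, 64)
}

func main() {
	// All three inputs encode the same value in different bases.
	for _, in := range []string{"4096", "0x1000", "010000"} {
		v, err := parseMemoryAmount(in)
		fmt.Println(in, v, err)
	}
}
```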
Oleg Zhurakivskyy
6fbf7c9182 operator: README: Document per device deployment
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
2021-04-08 10:53:04 +00:00
Oleg Zhurakivskyy
2d27602ed0 operator: Add --device command line to operator
Add --device command line to operator's main.go which defines
the controllers/webhooks to set up.

Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
2021-04-08 10:33:47 +00:00
Eero Tamminen
f9158c1c3b Update GPU plugin copyrights 2021-04-01 15:20:35 +03:00
Eero Tamminen
8ca19d408f Fix GPU plugin error messages 2021-04-01 15:20:35 +03:00
Eero Tamminen
384d37ead0 Add test for multiple GPU devices 2021-04-01 15:20:35 +03:00
Eero Tamminen
49354693fb Fix GPU plugin test setup + better error message
Tests fail depending on the order in which they are run, unless mocked
files are cleaned between test runs.

Without this, the next commit would fail.
2021-04-01 15:20:35 +03:00
Mikko Ylinen
97bcecda04 operator: update usage guidelines
As the operator container image is available from a registry, we should
guide users to use it rather than build and deploy it locally.

Further, drop (un)deploy-operator targets in favor of simply using
kubectl for deployment.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-03-30 15:33:09 +03:00
Dmitry Shmulevich
c8b5dce247 added an option to create a node label if epc memory is present
updated README for SGX device plugin

Signed-off-by: Dmitry Shmulevich <dmitry.shmulevich@gmail.com>
2021-03-18 11:53:49 -07:00
Ukri Niemimuukko
f89b61f923 add tile count label
Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2021-02-26 20:39:48 +02:00
Mikko Ylinen
15ad4ed54b ci: drop master branch from workflow triggers
Also, polish the remaining docs hits to 'master'.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-02-23 10:51:04 +02:00
DougTW
7153923cfc Edited qat_plugin README
Replaced multiple instances of master with main.
Reworded line 15 "Verify QAT device plugin is registered" removed 'on master'
and corresponding section heading. Related to pr499.

Signed-off-by: DougTW <doug.martin@intel.com>
2021-02-18 13:59:40 +02:00
Mikko Ylinen
abfa3496a2 sgx: update SGX SDK/DCAP versions
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-02-18 09:31:28 +02:00
Mikko Ylinen
f8c20905aa update to cert-manager v1.2.0
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-02-12 15:39:07 +02:00
Mikko Ylinen
37618d4f85 operator: move deviceplugin/v1 CRDs to cluster scope
The device plugins daemonsets are cluster wide and currently only
one device plugin instance per device is possible so making the
corresponding deviceplugin/v1 CRDs non-namespaced (i.e., scope: cluster)
fits better.

Previously, the device plugin daemonset was deployed in the same
namespace as the CR for that device but with the cluster scoped CRDs
we default to use the same namespace as the operator, unless overridden
via DEVICEPLUGIN_NAMESPACE env variable or a command line parameter
to operator manager deployment.

Three additional changes in this commit:
- enable DSA envtest tests
- update controller-runtime to v0.8.1
- change device plugin envtest suite to use klog/v2

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-02-11 11:41:47 +02:00
Mikko Ylinen
c1f609c34a
Merge pull request #560 from DougTW/dm-edits-gpu-plugin
edited gpu_plugin README; changed 2 instances of master to main.
2021-02-10 15:00:54 +02:00
Mikko Ylinen
2409427939
Merge pull request #561 from DougTW/dm-edits-operator
Edited operator README. Changed 1 instance of master to main, line 78.
2021-02-10 10:36:56 +02:00
Mikko Ylinen
667aa943a4
Merge pull request #563 from DougTW/dm-edits-sgx-plugin
Editing sgx_plugin README. Replacing 'master' with 'main'.
2021-02-10 10:35:51 +02:00
Ed Bartosh
d446be3c3d
Merge pull request #558 from DougTW/dm-edits-fpga-adms-readme
fpga_admissionwebhook README.md; changed master to main
2021-02-10 10:15:04 +02:00
DougTW
a856f3215d Editing sgx_plugin README. Replacing 'master' with 'main'. Related to pr499.
Signed-off-by: DougTW <doug.martin@intel.com>
2021-02-09 17:17:05 -08:00
DougTW
80a7e4e651 Edited operator README. Changed 1 instance of master to main, line 78.
Signed-off-by: DougTW <doug.martin@intel.com>
2021-02-09 16:59:20 -08:00
DougTW
625b30fd1b Fixes 560. Edited gpu_plugin README. Restored master to line 157
Signed-off-by: DougTW <doug.martin@intel.com>
2021-02-09 16:49:30 -08:00
Mikko Ylinen
965936d8c3
Merge pull request #553 from bart0sh/PR0103-implement-dsa-operator
operator: add DSA support
2021-02-09 16:24:41 +02:00
DougTW
28cbebc81b edited gpu_plugin README; changed 2 instances of master to main. Related to PR 499.
Signed-off-by: DougTW <doug.martin@intel.com>
2021-02-08 18:40:47 -08:00
DougTW
467d4082d3 fpga_plugin-readme; changed one instance of master to main. Related to PR 499.
Signed-off-by: DougTW <doug.martin@intel.com>
2021-02-08 18:14:34 -08:00
DougTW
5ee1b6ce23 fpga_admissionwebhook README.md; changed master to main
Signed-off-by: DougTW <doug.martin@intel.com>
2021-02-08 17:24:46 -08:00
Ed Bartosh
884f8e3dfe operator: add DSA support
Fixes: #443

Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2021-02-09 02:13:27 +02:00
Mikko Ylinen
7561501a51
Merge pull request #550 from dmitsh/ds-ext-res
added implementation of EPC extended resource advertiser
2021-02-08 19:53:46 +02:00
Dmitry Shmulevich
3c3a3d1145 added implementation of EPC extended resource advertiser
Signed-off-by: Dmitry Shmulevich <dmitry.shmulevich@gmail.com>
2021-02-04 17:35:17 -08:00
Mikko Ylinen
e94857ce5d docs: harmonize device plugins operator naming
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-02-04 15:12:37 +02:00
Mikko Ylinen
0892a34705 move to k8s.io v1.20.x and klog/v2 v2.4.0
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-01-21 15:34:39 +02:00
Dmitry Rozhkov
771b0c7432
Merge pull request #544 from mythi/PR-2021-003
sgx: change getDefaultPodCount() logic
2021-01-13 10:31:16 +02:00
Mikko Ylinen
ed3a650ddd sgx: change getDefaultPodCount() logic
Decouple the default enclaveLimit/provisionLimit from core count. With
this change, the default limit is constant and it can be made relative
to core count by setting PODS_PER_CORE multiplier via env variable.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-01-12 20:24:46 +02:00
Ed Bartosh
6b208f8acf documentation: remove kubelet configuration check
Removed device plugin socket check from the documentation as
device plugin support is enabled by default in Kubelet.

Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2021-01-12 13:00:20 +02:00
Mikko Ylinen
da4a9fca96 qat: add note about vfio-pci module parameters
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-01-11 18:48:43 +02:00
Ed Bartosh
b007dc26f5 dsa: fix kubectl command line
Fixed kubectl command line to get allocatable DSA resources

Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2020-12-30 15:37:16 +02:00
Ed Bartosh
2e4de52f2b implement DSA demo
- Implemented demo image that runs accel-config tests
- Added testing instructions to the documentation

Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2020-12-28 14:45:25 +02:00
Ukri Niemimuukko
5d31dca018 gpu_nfdhook: remove devfs dependency
This removes the devfs dependency. Sysfs is sufficient for scanning
for the presence of GPUs.

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2020-12-23 15:43:48 +02:00
Mikko Ylinen
aef2e1655e qat: run TestScanPrivate tests in parallel
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-12-23 11:18:21 +02:00
Mikko Ylinen
26d4b6f3a8 qat: fix device ID validation
It looks like, for a long time now, we have accepted a setup where a valid
QAT device ID is accepted as a QAT device resource even though the device is
not "enabled" via kernelVfDrivers parameter.

Fix device ID validation to skip valid QAT devices that are not
explicitly specified in kernelVfDrivers.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-12-21 14:33:27 +02:00
Mikko Ylinen
85fce2dcab qat: rework device scanning
The updated dp.scan() changes how VF devices are detected. The
main reason for the change is to take into account cases where the QAT VF
driver is not present in the system at all but only the PF driver is
loaded (and the SR-IOV devices are enabled).

The rework also takes into account bare metal and VM deployments and
adds a test case for checking the virtualized environment.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-12-18 15:33:25 +02:00
Mikko Ylinen
2155a24e73 qat: add new devices and change defaults
The plugin now detects/accepts 4xxx and c4xxx devices too
and defaults to those drivers that are part of Linux mainline.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-12-17 15:23:00 +02:00
Mikko Ylinen
621122e456 sgx_epchook: update to cpuid/v2
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-12-15 19:58:13 +02:00
Ed Bartosh
2e7367eab3 fpga hook: language cleanup
Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2020-12-10 10:58:40 +02:00
Mikko Ylinen
312b771ab7
Merge pull request #494 from bart0sh/PR0093-DSA-draft
Implement DSA plugin
2020-12-09 15:15:46 +02:00
Mikko Ylinen
18ec3a449e qat: move to path/filepath
We have both "path" and "path/filepath" but the latter provides
everything needed so move it completely.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-12-08 07:38:20 +02:00
Mikko Ylinen
ad8bbcea21 qat: rework bus-device-function handling
The code was stripping out "0000:" (bus) and then adding
it back in several places.

That's not necessary so this change simplifies QAT VF addr
handling by operating using full BDF IDs.

Moreover, simplify function calls: use getDpdkDevice() once
for each VF device.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-12-08 07:37:16 +02:00
Ed Bartosh
174643436a implement DSA plugin
Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2020-12-03 17:24:48 +02:00
Dmitry Rozhkov
f0fa9df292 operator: prepare for publishing at operatorhub.io 2020-11-24 18:35:56 +02:00
Mikko Ylinen
d65cb902e6 sgx: move to RFC v4x device API
The SGX device nodes have changed from /dev/sgx/[enclave|provision]
to /dev/sgx_[enclave|provision] in v4x RFC patches according to the
LKML feedback.

This change moves to use the new device nodes. Backwards compatibility
is provided by adding /dev/sgx directory mount to containers. This
assumes the cluster admin has installed the udev rules provided in the
README to make the old device nodes as symlinks to the new device nodes.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-11-18 21:17:28 +02:00
Dmitry Rozhkov
5ec466b2eb add known issue for operator 2020-11-12 11:23:41 +02:00
Alexander D. Kanevskiy
75355c9937
Merge pull request #497 from bart0sh/PR0094-move-GetAPIVersion-out-of-NewPort
fpga: move GetAPIVersion call out of NewPort and NewFME
2020-11-11 12:09:13 +02:00
Ed Bartosh
2c73e2a0b3 fpga: move GetAPIVersion call out of NewPort and NewFME
This call is implemented by calling ioctl, which raises
"open /dev/intel-fpga-port.X: operation not permitted" error
when called inside unprivileged container.

This breaks FPGA plugin.

Calling this API from fpga_tool is still OK, so
moving calls there should fix the issue.

Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2020-11-10 16:44:20 +02:00
Dmitry Rozhkov
5f0da56045 Upgrade to k8s v1.19.3 2020-11-10 16:09:20 +02:00
Ed Bartosh
680da54fd9 fpga: improve port init
Used generic newPort API instead of device-specific
newDflPort and newIntelFpgaPort.

Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2020-11-01 01:47:49 +02:00
Dmitry Rozhkov
25a52b0b74
Merge pull request #478 from bart0sh/PR0091-FPGA-SRIO-V
fpga: reimplement device discovering
2020-10-30 10:05:05 +02:00
Mikko Ylinen
0f6eefee23 sgx: add documentation
This commit documents the SGX building blocks for Kubernetes and
how to deploy them in the cluster.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-10-27 15:02:40 +02:00
Ed Bartosh
243870a707 fpga: reimplement device discovering
Reimplemented discovery of the FPGA devices using
APIs from pkg/fpga/intel_fpga_linux. The APIs are also
used in the fpga_tool utility.

The API is more advanced and supports SR-IOV among other
things.

Fixes: #372

Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2020-10-26 21:45:52 +02:00
Dmitry Rozhkov
87143355ba
Merge pull request #483 from mythi/sgx-nfd
sgx: make SGX NFD kustomization overlay independent
2020-10-26 13:25:36 +02:00
Ukri Niemimuukko
5b5180ae00 gpu_nfdhook memory amount reading from sysfs
This adds reading of the GPU memory amount from the sysfs. As a
fallback the environment variable GPU_MEMORY_OVERRIDE remains.

Another environment variable GPU_MEMORY_RESERVED can be used to
reserve a dedicated byte amount outside of kubernetes usage.

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2020-10-26 09:45:43 +02:00
Mikko Ylinen
161298190f sgx: make SGX NFD kustomization overlay independent
With the addition of SGX webhook in the operator, full SGX stack
depends on having the operator deployed first. SgxDevicePlugin CRD
is set to get intel-sgx-plugin and intel-sgx-initcontainer deployed
by the operator.

As a pre-requisite, node-feature-discovery must be deployed but it
is currently deployed via sgx_plugin kustomization overlay only.

It's better to allow NFD with the SGX specific settings deployed with
a kustomization of its own.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-10-23 12:44:36 +03:00