533 Commits

Author SHA1 Message Date
Hyeongju Johannes Lee
8362028560 dlb: Add new device plugin
Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
2021-11-11 11:51:49 +02:00
Oleg Zhurakivskyy
a7c612f7fc dsa: Rename dsa initcontainer to idxd
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
2021-11-09 12:00:44 +02:00
Oleg Zhurakivskyy
cdaf6b3807 dsa: Add documentation on provisioning with ConfigMap
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
2021-11-09 10:31:50 +02:00
Hyeongju Johannes Lee
13f4ce82a1 Remove nolint annot.
Remove the annotation nolint:funlen since funlen is not used anymore.
2021-10-11 11:36:24 +03:00
Mikko Ylinen
e6cf299750 gpu: update READMEs
Commit 00a59e8f7d was not complete in that it didn't update
the corresponding documentation. This commit fixes that.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-10-08 11:57:16 +03:00
Oleg Zhurakivskyy
30ebc8e5d1 dsa: Add documentation on provisioning with initcontainer
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
2021-10-01 12:16:50 +03:00
Mikko Ylinen
9d0d6cbe11 qat: set c6xxvf and 4xxxvf to default devices
The devices enabled by default are different between the
kustomize and operator based deployments.

This change harmonizes the defaults to c6xxvf and 4xxxvf
in both deployment options.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-09-23 10:50:38 +03:00
Dmitry Rozhkov
19d54b9fe8
Merge pull request #707 from uniemimu/mem_read
gpu nfdhook: new memory amount reading logic
2021-09-23 10:33:41 +03:00
Ukri Niemimuukko
64290020d7 gpu nfdhook: new memory amount reading logic
This changes the memory reading to be done through lmem_total_bytes
file instead of the addr_range file.

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2021-09-21 13:50:41 +03:00
Hyeongju Johannes Lee
8fc5df7e37 Add govet-fieldalignment
Add govet-fieldalignment to .golangci.yml
Fix errors that come from adding govet-fieldalignment
- by reordering the fields of structs
- by putting nolint:govet annotations

Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
2021-09-20 20:59:04 +03:00
Ukri Niemimuukko
0670a82cb1 gpu rm linter comment fixes
Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2021-09-10 14:35:13 +03:00
Li Ning
dcc12d9089 documentation: remove deprecated toc section in README
The 'Verify node kubelet config' content was removed in 6b208f8.

Signed-off-by: Li Ning <ning.a.li@transwarp.io>
2021-09-07 19:38:41 +08:00
Hyeongju Johannes Lee
4bc70ac544 Add goerr113 linter check
Add goerr113 linter check
Fix the usage of fmt.Errorf() by wrapping errors
Fix the usage of errors.New()
2021-09-03 11:02:14 +03:00
Hyeongju Johannes Lee
09ba9fde00 Update tool versions and fix errors and warnings that originated from the update
Update tool versions
Fix the errors and warnings that originated from the update:
- Correct type deviceInfo (->DeviceInfo) to make it public
- Fix gpu_plugin.go and vpu_plugin_test.go where stylecheck errors occur
- Fix deprecation warnings
- Rename type 'PatcherManager' to 'Manager' to solve exported errors
- Rename type 'SgxMutator' to 'Mutator' to solve exported errors

Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
2021-08-25 07:09:34 +00:00
Mikko Ylinen
cfe2d65f32
Merge pull request #659 from 0x161e-swei/sgx-nfd-operator-dependency
Add SGX webhook operator as dependency of sgx-nfd
2021-07-28 06:20:32 +03:00
Shijia Wei
9b66176ca5 Add SGX admissionwebhook as dependency of sgx-nfd daemonset
Mention the dependency on cert-manager for the DaemonSet deployment method
in the SGX README.
2021-07-27 00:39:59 -05:00
Ed Bartosh
8a54a9ba64 webhook: document mappings deployment
Fixes: #580

Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2021-07-26 14:23:10 +03:00
Eero Tamminen
83e7de0d41 Make GPU plugin intro information more generic & accurate
- Information on the specific HW & virtualization types on which the GPU
  plugin is tested belongs in the release notes, not in the README intro
  (where it had already become obsolete)
- HW offloading is provided by driver backends, not frontends
  (e.g. OneVPL is just one of the media driver frontends)
2021-06-22 18:27:17 +03:00
Ukri Niemimuukko
b0130e693f more documentation for fractional resources
This adds a section heading, TOC link, command line flag description
and a short explanation of what other dependent configuration
changes are needed with fractional resources in order for the command
line flag to achieve something useful.

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2021-06-14 16:25:38 +03:00
Ed Bartosh
98f80b5f47
Merge pull request #652 from uniemimu/hookupdate
add link to gpu_nfdhook and update hook README
2021-06-13 12:15:46 +03:00
Eero Tamminen
a2faa3a8fc Add section on GPU plugin options to its README
2021-06-11 19:55:43 +03:00
Ukri Niemimuukko
cbf7bab114 add link to gpu_nfdhook and update hook README
This adds a link from gpu-plugin README to the nfdhook README, and
updates the nfdhook README with label descriptions.

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2021-06-11 18:54:44 +03:00
skaajas
956154c1db Updated GPU plugin-specific README general description.
2021-06-11 15:50:14 +03:00
Ed Bartosh
9d8fb392f5
Merge pull request #637 from uniemimu/skip
add pf skip to gpu nfdhook
2021-06-11 10:57:39 +03:00
Ukri Niemimuukko
e3bf21dbe9 gpu_plugin: add documentation links to gpu aware scheduling
Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2021-06-10 19:46:35 +03:00
Ukri Niemimuukko
7ca5cfcfd6 add pf skip to gpu nfdhook
This corresponds to the previous gpu-plugin skip code.

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2021-06-10 18:44:57 +03:00
Mikko Ylinen
383778a24b qat: fix C4xxx driver name
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-06-10 08:45:23 +03:00
Ed Bartosh
e180bfdf07
Merge pull request #644 from mythi/PR-2021-034
qat: do not fail if driver/unbind file does not exist
2021-06-09 11:38:52 +03:00
Mikko Ylinen
e8115d1c8d qat: do not fail if driver/unbind file does not exist
<device>/driver symlink does not exist if the device is not bound
to any driver. bindDevice() failed whenever writing to <device>/driver/unbind
returned an error, but an IsNotExist() error is acceptable when there is
no driver to unbind.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-06-09 11:09:24 +03:00
Dmitry Rozhkov
6aa1a47c9a
Merge pull request #638 from uniemimu/fractional
gpu_plugin: fractional resource management
2021-06-09 10:58:10 +03:00
Ukri Niemimuukko
2c4d529d66 gpu_plugin: fractional resource management
Fractional resource management feature

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
Signed-off-by: Dmitry Rozhkov <dmitry.rozhkov@intel.com>
2021-06-04 13:06:50 +03:00
Mikko Ylinen
facb4214a2 tree-wide: drop deprecated io/ioutil
Go 1.16 release notes announced the deprecation of io/ioutil [1]. It's easy
for us to move to the recommended replacements, so just do it.

[1] https://golang.org/doc/go1.16#ioutil

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-06-02 13:41:15 +03:00
Mikko Ylinen
06dbc1331b images: move intel-qat-plugin-kerneldrv to Debian
Also, update the documentation to reflect what is needed to
enable and use '-mode kernel'.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-06-02 13:39:39 +03:00
Leow Chun Fung
8e4b58c0f6 Implement support for PCI-based VPU
2021-05-19 18:15:17 +07:00
Mikko Ylinen
c3cf958c85 images: move most plugin images to distroless/static
All but one (VPU) of the published container images can be built with
static binaries which allows us to use distroless/static as the
base image. Moreover, when combined with stripping the plugin binaries,
we can get both build time and image size savings.

This is part 1 (of 2) of the rework. Part 2 will finish the
change by making some adjustments to VPU plugin image and moving the
FPGA/SGX/GPU initcontainers to distroless/static too.

Partial: #516

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2021-05-19 09:44:47 +03:00
Eero Tamminen
c575ce9099 Document GPU plugin test code test-case struct members
2021-05-06 11:02:57 +03:00
Eero Tamminen
57c8d76e1b Add minimal GPU plugin options testing
Tests plugin scan results in setups having none, one and multiple
eligible GPU devices, with and without SR-IOV enabled, with two
different option values.

This does not cover verifying the number of devices added under
"i915_monitoring" resource as that would be a much larger change.
2021-05-05 17:09:09 +03:00
Eero Tamminen
ca9aa32556 Add "-enable-monitoring" option to GPU plugin
Make "i915_monitoring" resource (granting access to all GPUs) optional
so that it can be enabled only when it's needed.
2021-05-05 17:09:09 +03:00
Eero Tamminen
713c1ab170 Move GPU plugin CLI options to a struct
To help in:
* adding more CLI options in next and later commits, and
* replacing magic newDevicePlugin() input parameters with
  explicitly named one(s)
2021-05-05 17:09:09 +03:00
Eero Tamminen
06fac8128f Move GPU plugin sysfs device compatibility checks to own function
To reduce scan() function complexity before adding more functionality
to it.
2021-05-05 17:08:49 +03:00
Eero Tamminen
79b86fea2d Skip PF for "i915" resource when it has VFs
NOTE: this has impact only for GPUs which are virtualized with SR-IOV.

Access to physical devices (PFs) is disabled for "i915" resource when
they have configured virtual devices (VFs).

This is because:

* GPU resources are expected to be evenly split between VFs in such
  configurations

* But the PF resource amount is expected to differ from the VFs; the PF
  typically retains only enough resources (just a few MB of RAM) to
  provide GPU metrics that are not available from VFs

* Neither the current GPU plugin, nor Kubernetes scheduling in
  general, has proper support for heterogeneous GPUs (= capability
  based scheduling)

Therefore "i915" resource needs to be limited to GPU devices with
homogeneous amount of resources, which in SR-IOV configurations is
expected to be the case only with VFs (when such are present).
2021-05-05 14:13:48 +03:00
Dmitry Rozhkov
38a59a57ea
Merge pull request #626 from mythi/PR-2021-028
sgx: add note about the SGX DCAP driver usage
2021-04-28 08:42:02 +03:00
Mikko Ylinen
111b833ea8 sgx: add note about the SGX DCAP driver usage
The SGX DCAP out-of-tree v1.41 driver is also known to work
with the SGX plugin. However, the default NFD labeling does not
work with the out-of-tree driver so warn users about it.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-04-27 22:10:21 +03:00
Eero Tamminen
e418c00fca Add "i915_monitoring" resource to GPU plugin
It mounts all (Intel) GPU devices into the requesting container.

This is needed e.g. to get GPU metrics from the node.  The requesting pod
does not know how many GPUs are on the node it gets assigned to, so
there needs to be a means to request them all.

(The only alternative to the new resource would be requesting Privileged
mode, which is clearly worse as that would grant the pod access to
all other devices and capabilities too.)

This commit also:

* Adds "i915_monitoring" resource testing to: go test -v -run Scan

* Splits GPU plugin tests mock file system setup to a separate
  createTestFiles() function because otherwise TestScan() does not
  pass the project's golangci-lint complexity limits
2021-04-27 14:21:05 +03:00
Ed Bartosh
08c2094329 update to cert-manager v1.3.1
Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2021-04-22 14:45:39 +03:00
Dmitry Rozhkov
3892baa4be
Merge pull request #615 from eero-t/gpu-plugin-testing-improvements
Gpu plugin testing improvements
2021-04-20 09:47:10 +03:00
Mikko Ylinen
280bdceb2a sgx: add separate admissionwebhook image
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-04-14 08:09:33 +03:00
Ed Bartosh
31614592c6
Merge pull request #599 from ozhuraki/operator-select-device-type
Make it possible to select supported devices in the operator
2021-04-12 19:09:59 +03:00
Ukri Niemimuukko
bb44156d4f gpu_nfdhook: make memory parsing more robust
This adds support for parsing hex and octal amounts as well.
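
A minimal sketch of such parsing, assuming the amount is read as a string from sysfs: with base 0, strconv.ParseUint infers the base from the prefix (0x for hex, a leading 0 for octal) following Go's integer literal syntax. parseMemoryAmount is a hypothetical name; the nfdhook's actual parser may differ.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMemoryAmount (hypothetical) parses a byte count written in decimal
// ("1048576"), hexadecimal ("0x100000") or octal ("04000000"). Base 0 makes
// strconv.ParseUint pick the base from the literal's prefix.
func parseMemoryAmount(s string) (uint64, error) {
	return strconv.ParseUint(strings.TrimSpace(s), 0, 64)
}

func main() {
	// Each line prints: 1048576 <nil>
	for _, s := range []string{"1048576", "0x100000", "04000000\n"} {
		n, err := parseMemoryAmount(s)
		fmt.Println(n, err)
	}
}
```
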

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2021-04-09 16:23:48 +03:00
Oleg Zhurakivskyy
6fbf7c9182 operator: README: Document per device deployment
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
2021-04-08 10:53:04 +00:00
Oleg Zhurakivskyy
2d27602ed0 operator: Add --device command line option to operator
Add a --device command line option to the operator's main.go, which
defines the controllers/webhooks to set up.

Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
2021-04-08 10:33:47 +00:00
Eero Tamminen
f9158c1c3b Update GPU plugin copyrights
2021-04-01 15:20:35 +03:00
Eero Tamminen
8ca19d408f Fix GPU plugin error messages
2021-04-01 15:20:35 +03:00
Eero Tamminen
384d37ead0 Add test for multiple GPU devices
2021-04-01 15:20:35 +03:00
Eero Tamminen
49354693fb Fix GPU plugin test setup + better error message
Tests fail depending on the order in which they are run, unless mocked
files are cleaned between test runs.

Without this, the next commit would fail.
2021-04-01 15:20:35 +03:00
Mikko Ylinen
97bcecda04 operator: update usage guidelines
As the operator container image is available from a registry, we should
guide users to use it rather than build and deploy it locally.

Further, drop (un)deploy-operator targets in favor of simply using
kubectl for deployment.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-03-30 15:33:09 +03:00
Dmitry Shmulevich
c8b5dce247 added an option to create a node label if EPC memory is present
updated README for SGX device plugin

Signed-off-by: Dmitry Shmulevich <dmitry.shmulevich@gmail.com>
2021-03-18 11:53:49 -07:00
Ukri Niemimuukko
f89b61f923 add tile count label
Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2021-02-26 20:39:48 +02:00
Mikko Ylinen
15ad4ed54b ci: drop master branch from workflow triggers
Also, polish the remaining 'master' references in the docs.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-02-23 10:51:04 +02:00
DougTW
7153923cfc Edited qat_plugin README
Replaced multiple instances of master with main.
Reworded line 15 "Verify QAT device plugin is registered": removed 'on master'
and the corresponding section heading. Related to pr499.

Signed-off-by: DougTW <doug.martin@intel.com>
2021-02-18 13:59:40 +02:00
Mikko Ylinen
abfa3496a2 sgx: update SGX SDK/DCAP versions
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-02-18 09:31:28 +02:00
Mikko Ylinen
f8c20905aa update to cert-manager v1.2.0
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-02-12 15:39:07 +02:00
Mikko Ylinen
37618d4f85 operator: move deviceplugin/v1 CRDs to cluster scope
The device plugin DaemonSets are cluster-wide and currently only
one device plugin instance per device is possible so making the
corresponding deviceplugin/v1 CRDs non-namespaced (i.e., scope: cluster)
fits better.

Previously, the device plugin daemonset was deployed in the same
namespace as the CR for that device but with the cluster scoped CRDs
we default to use the same namespace as the operator, unless overridden
via DEVICEPLUGIN_NAMESPACE env variable or a command line parameter
to operator manager deployment.

Three additional changes in this commit:
- enable DSA envtest tests
- update controller-runtime to v0.8.1
- change device plugin envtest suite to use klog/v2

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-02-11 11:41:47 +02:00
Mikko Ylinen
c1f609c34a
Merge pull request #560 from DougTW/dm-edits-gpu-plugin
edited gpu_plugin README; changed 2 instances of master to main.
2021-02-10 15:00:54 +02:00
Mikko Ylinen
2409427939
Merge pull request #561 from DougTW/dm-edits-operator
Edited operator README. Changed 1 instance of master to main, line 78.
2021-02-10 10:36:56 +02:00
Mikko Ylinen
667aa943a4
Merge pull request #563 from DougTW/dm-edits-sgx-plugin
Editing sgx_plugin README. Replacing 'master' with 'main'.
2021-02-10 10:35:51 +02:00
Ed Bartosh
d446be3c3d
Merge pull request #558 from DougTW/dm-edits-fpga-adms-readme
fpga_admissionwebhook README.md; changed master to main
2021-02-10 10:15:04 +02:00
DougTW
a856f3215d Editing sgx_plugin README. Replacing 'master' with 'main'. Related to pr499.
Signed-off-by: DougTW <doug.martin@intel.com>
2021-02-09 17:17:05 -08:00
DougTW
80a7e4e651 Edited operator README. Changed 1 instance of master to main, line 78.
Signed-off-by: DougTW <doug.martin@intel.com>
2021-02-09 16:59:20 -08:00
DougTW
625b30fd1b Fixes 560. Edited gpu_plugin README. Restored master to line 157
Signed-off-by: DougTW <doug.martin@intel.com>
2021-02-09 16:49:30 -08:00
Mikko Ylinen
965936d8c3
Merge pull request #553 from bart0sh/PR0103-implement-dsa-operator
operator: add DSA support
2021-02-09 16:24:41 +02:00
DougTW
28cbebc81b edited gpu_plugin README; changed 2 instances of master to main. Related to PR 499.
Signed-off-by: DougTW <doug.martin@intel.com>
2021-02-08 18:40:47 -08:00
DougTW
467d4082d3 fpga_plugin-readme; changed one instance of master to main. Related to PR 499.
Signed-off-by: DougTW <doug.martin@intel.com>
2021-02-08 18:14:34 -08:00
DougTW
5ee1b6ce23 fpga_admissionwebhook README.md; changed master to main
Signed-off-by: DougTW <doug.martin@intel.com>
2021-02-08 17:24:46 -08:00
Ed Bartosh
884f8e3dfe operator: add DSA support
Fixes: #443

Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2021-02-09 02:13:27 +02:00
Mikko Ylinen
7561501a51
Merge pull request #550 from dmitsh/ds-ext-res
added implementation of EPC extended resource advertiser
2021-02-08 19:53:46 +02:00
Dmitry Shmulevich
3c3a3d1145 added implementation of EPC extended resource advertiser
Signed-off-by: Dmitry Shmulevich <dmitry.shmulevich@gmail.com>
2021-02-04 17:35:17 -08:00
Mikko Ylinen
e94857ce5d docs: harmonize device plugins operator naming
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-02-04 15:12:37 +02:00
Mikko Ylinen
0892a34705 move to k8s.io v1.20.x and klog/v2 v2.4.0
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-01-21 15:34:39 +02:00
Dmitry Rozhkov
771b0c7432
Merge pull request #544 from mythi/PR-2021-003
sgx: change getDefaultPodCount() logic
2021-01-13 10:31:16 +02:00
Mikko Ylinen
ed3a650ddd sgx: change getDefaultPodCount() logic
Decouple the default enclaveLimit/provisionLimit from core count. With
this change, the default limit is constant and it can be made relative
to core count by setting PODS_PER_CORE multiplier via env variable.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-01-12 20:24:46 +02:00
Ed Bartosh
6b208f8acf documentation: remove kubelet configuration check
Removed device plugin socket check from the documentation as
device plugin support is enabled by default in Kubelet.

Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2021-01-12 13:00:20 +02:00
Mikko Ylinen
da4a9fca96 qat: add note about vfio-pci module parameters
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-01-11 18:48:43 +02:00
Ed Bartosh
b007dc26f5 dsa: fix kubectl command line
Fixed kubectl command line to get allocatable DSA resources

Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2020-12-30 15:37:16 +02:00
Ed Bartosh
2e4de52f2b implement DSA demo
- Implemented demo image that runs accel-config tests
- Added testing instructions to the documentation

Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2020-12-28 14:45:25 +02:00
Ukri Niemimuukko
5d31dca018 gpu_nfdhook: remove devfs dependency
This removes the devfs dependency. Sysfs is sufficient for detecting
the presence of GPUs.

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2020-12-23 15:43:48 +02:00
Mikko Ylinen
aef2e1655e qat: run TestScanPrivate tests in parallel
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-12-23 11:18:21 +02:00
Mikko Ylinen
26d4b6f3a8 qat: fix device ID validation
It looks like for a long time now we have accepted a setup where a valid QAT
device ID is accepted as a QAT device resource even though the device is
not "enabled" via the kernelVfDrivers parameter.

Fix device ID validation to skip valid QAT devices that are not
explicitly specified in kernelVfDrivers.
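
A minimal sketch of the stricter validation: a device ID counts only if it maps to a known VF driver and that driver is listed in kernelVfDrivers. acceptDevice is hypothetical, and the two-entry device ID table (0x37c9 for c6xxvf, 0x4941 for 4xxxvf) is illustrative and assumed; the plugin keeps its own table.

```go
package main

import "fmt"

// knownVFDrivers is an illustrative, assumed mapping from PCI device IDs to
// QAT VF driver names; it is not the plugin's actual table.
var knownVFDrivers = map[string]string{
	"0x37c9": "c6xxvf",
	"0x4941": "4xxxvf",
}

// acceptDevice (hypothetical) reports whether a VF should be exposed: its
// device ID must be a known QAT VF *and* its driver must be listed in
// kernelVfDrivers.
func acceptDevice(deviceID string, kernelVfDrivers []string) bool {
	driver, ok := knownVFDrivers[deviceID]
	if !ok {
		return false
	}

	for _, d := range kernelVfDrivers {
		if d == driver {
			return true
		}
	}

	return false
}

func main() {
	fmt.Println(acceptDevice("0x37c9", []string{"c6xxvf", "4xxxvf"})) // true
	fmt.Println(acceptDevice("0x4941", []string{"c6xxvf"}))           // false
}
```
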

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-12-21 14:33:27 +02:00
Mikko Ylinen
85fce2dcab qat: rework device scanning
The updated dp.scan() changes how VF devices are detected. The
main reason for the change is to take into account cases where the QAT VF
driver is not present in the system at all but only the PF driver is
loaded (and the SR-IOV devices are enabled).

The rework also takes into account bare metal and VM deployments and
adds a test case for checking the virtualized environment.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-12-18 15:33:25 +02:00
Mikko Ylinen
2155a24e73 qat: add new devices and change defaults
The plugin now detects/accepts 4xxx and c4xxx devices too
and defaults to those drivers that are part of Linux mainline.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-12-17 15:23:00 +02:00
Mikko Ylinen
621122e456 sgx_epchook: update to cpuid/v2
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-12-15 19:58:13 +02:00
Ed Bartosh
2e7367eab3 fpga hook: language cleanup
Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2020-12-10 10:58:40 +02:00
Mikko Ylinen
312b771ab7
Merge pull request #494 from bart0sh/PR0093-DSA-draft
Implement DSA plugin
2020-12-09 15:15:46 +02:00
Mikko Ylinen
18ec3a449e qat: move to path/filepath
We have both "path" and "path/filepath" but the latter provides
everything needed, so move to it completely.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-12-08 07:38:20 +02:00
Mikko Ylinen
ad8bbcea21 qat: rework bus-device-function handling
The code was stripping out "0000:" (bus) and then adding
it back in several places.

That's not necessary so this change simplifies QAT VF addr
handling by operating using full BDF IDs.

Moreover, simplify function calls: use getDpdkDevice() once
for each VF device.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-12-08 07:37:16 +02:00
Ed Bartosh
174643436a implement DSA plugin
Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2020-12-03 17:24:48 +02:00
Dmitry Rozhkov
f0fa9df292 operator: prepare for publishing at operatorhub.io
2020-11-24 18:35:56 +02:00
Mikko Ylinen
d65cb902e6 sgx: move to RFC v4x device API
The SGX device nodes have changed from /dev/sgx/[enclave|provision]
to /dev/sgx_[enclave|provision] in v4x RFC patches according to the
LKML feedback.

This change moves to the new device nodes. Backwards compatibility
is provided by adding a /dev/sgx directory mount to containers. This
assumes the cluster admin has installed the udev rules provided in the
README that make the old device nodes symlinks to the new device nodes.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-11-18 21:17:28 +02:00
Dmitry Rozhkov
5ec466b2eb add known issue for operator
2020-11-12 11:23:41 +02:00
Alexander D. Kanevskiy
75355c9937
Merge pull request #497 from bart0sh/PR0094-move-GetAPIVersion-out-of-NewPort
fpga: move GetAPIVersion call out of NewPort and NewFME
2020-11-11 12:09:13 +02:00
Ed Bartosh
2c73e2a0b3 fpga: move GetAPIVersion call out of NewPort and NewFME
This call is implemented by calling ioctl, which raises an
"open /dev/intel-fpga-port.X: operation not permitted" error
when called inside an unprivileged container.

This breaks FPGA plugin.

Calling this API from fpga_tool is still OK, so
moving calls there should fix the issue.

Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2020-11-10 16:44:20 +02:00
Dmitry Rozhkov
5f0da56045 Upgrade to k8s v1.19.3
2020-11-10 16:09:20 +02:00
Ed Bartosh
680da54fd9 fpga: improve port init
Used generic newPort API instead of device-specific
newDflPort and newIntelFpgaPort.

Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2020-11-01 01:47:49 +02:00
Dmitry Rozhkov
25a52b0b74
Merge pull request #478 from bart0sh/PR0091-FPGA-SRIO-V
fpga: reimplement device discovering
2020-10-30 10:05:05 +02:00
Mikko Ylinen
0f6eefee23 sgx: add documentation
This commit documents the SGX building blocks for Kubernetes and
how to deploy them in the cluster.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-10-27 15:02:40 +02:00
Ed Bartosh
243870a707 fpga: reimplement device discovering
Reimplemented discovering of the FPGA devices using
APIs from pkg/fpga/intel_fpga_linux. The APIs are also
used in the fpga_tool utility.

The API is more advanced and supports SR-IOV among other
things.

Fixes: #372

Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2020-10-26 21:45:52 +02:00
Dmitry Rozhkov
87143355ba
Merge pull request #483 from mythi/sgx-nfd
sgx: make SGX NFD kustomization overlay independent
2020-10-26 13:25:36 +02:00
Ukri Niemimuukko
5b5180ae00 gpu_nfdhook memory amount reading from sysfs
This adds reading of the GPU memory amount from the sysfs. As a
fallback the environment variable GPU_MEMORY_OVERRIDE remains.

Another environment variable, GPU_MEMORY_RESERVED, can be used to
reserve a dedicated number of bytes outside of Kubernetes usage.

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2020-10-26 09:45:43 +02:00
Mikko Ylinen
161298190f sgx: make SGX NFD kustomization overlay independent
With the addition of SGX webhook in the operator, full SGX stack
depends on having the operator deployed first. SgxDevicePlugin CRD
is set to get intel-sgx-plugin and intel-sgx-initcontainer deployed
by the operator.

As a prerequisite, node-feature-discovery must be deployed, but it
is currently deployed via the sgx_plugin kustomization overlay only.

It's better to allow NFD with the SGX-specific settings to be deployed
with a kustomization of its own.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-10-23 12:44:36 +03:00
Mikko Ylinen
e9dec450d6 improve docs for no_proxy when using cert-manager
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-10-21 14:57:41 +03:00
Mikko Ylinen
4e5eae62c4 update to cert-manager v1.0.3
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-10-16 22:37:57 +03:00
Ukri Niemimuukko
505eadaf94 gpu-plugin nfd-hook
This adds an nfd-hook for the gpu-plugin, which will create labels
for the GPUs that can then be used for POD deployment purposes or for
the creation of GPU extended resources, which then allow finer-grained
GPU resource management.

The nfd-hook is installed on the host system when the
intel-gpu-initcontainer is run. It is added to the plugin deployment
yaml.

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2020-10-01 12:02:57 +03:00
Kevin Putnam
1d149ffee6 Documentation: Fixes broken links and standardizes headers.
Signed-off-by: Kevin Putnam <kevin.putnam@intel.com>
2020-09-22 08:32:21 -07:00
Dmitry Rozhkov
1b82ab9df6 sync README.md files with the current state of the code
Closes #356
2020-09-16 10:54:39 +03:00
Mikko Ylinen
33a4f8f546 sgx: add SgxDevicePlugin CRD and admission webhook
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-09-10 15:31:26 +03:00
Ukri Niemimuukko
b2991b94e1 gpu_plugin: reduce topology scanning for high shared dev count
For every created device info, a new topology scan is performed in
the filesystem. The shared dev count was implemented so that for each
shared device, a new device info was created, which resulted in the
topology scan happening as many times per Scan round as there were
shared devs.

This fixes the issue by sharing a single device info among the
shared devices.

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2020-09-08 18:57:29 +03:00
Dmitry Rozhkov
9bdf3a4def
Merge pull request #440 from mythi/ctrl-runtime-062
go.mod: update controller-runtime to v0.6.2
2020-09-03 12:02:06 +03:00
Dmitry Rozhkov
41e23dab3f
Merge pull request #438 from mythi/updates-20200901
.gitignore + kind + cert-manager v1.0.0
2020-09-03 12:00:33 +03:00
Alexander Kanevskiy
c74cb563dc Implemented SR-IOV Release/Assign ioctl
fpgatool is now able to prepare FME via kernel ioctl to release and
assign ports for SR-IOV configurations.
2020-09-02 18:16:53 +03:00
Mikko Ylinen
f0d4754d53 move to cert-manager v1.0.0
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-09-02 18:07:05 +03:00
Mikko Ylinen
76aa7b91f0 go.mod: update controller-runtime to v0.6.2
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-09-02 15:16:12 +03:00
Dmitry Rozhkov
71075d4478 lint: enable exportloopref, prealloc and scopelint checks
2020-08-31 11:10:51 +03:00
Dmitry Rozhkov
be713f1c8b lint: enable errcheck
2020-08-28 16:14:14 +03:00
Mikko Ylinen
6b2148d22c
Merge pull request #431 from rojkov/staticcheck
linter: enable staticcheck
2020-08-26 18:08:09 +03:00
Ukri Niemimuukko
7244bd0f25 gpu_plugin: README.md update
Move remark about GVT-d to end of introduction. Remove remarks
about GVT-g for the time being.

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2020-08-25 13:45:10 +03:00
Dmitry Rozhkov
7ff08ee874 linter: enable staticcheck
2020-08-25 09:54:59 +03:00
Mikko Ylinen
a5f648077e sgx: add NFD EPC source, README and deployment YAMLs
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-08-24 16:33:45 +03:00
Ismo Puustinen
3ab60b4027 sgx: add tests for the plugin.
Signed-off-by: Ismo Puustinen <ismo.puustinen@intel.com>
2020-08-24 16:33:45 +03:00
Ismo Puustinen
8751afb6c7 sgx: add new plugin.
The SGX plugin exposes two device files as separate resources:

  * /dev/sgx/enclave   as sgx.intel.com/enclave
  * /dev/sgx/provision as sgx.intel.com/provision

The number of resources is configurable, but it's intended to be equal
to the pod count by default, so that any pod requiring access would have
it. The access control (who can do SGX remote attestation) is done
outside this plugin.

Signed-off-by: Ismo Puustinen <ismo.puustinen@intel.com>
2020-08-24 16:33:45 +03:00
Mikko Ylinen
cd068c797a ci: update tool versions
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-08-21 17:04:04 +03:00
Dmitry Rozhkov
200e2f8181 operator: add simple FPGA operator combined with FPGA webhook
2020-08-18 17:32:23 +03:00
Ed Bartosh
4794072273
Merge pull request #422 from rojkov/fpga-kubebuilder
fpga webhook: reimplement to use kubebuilder framework
2020-08-18 13:31:31 +03:00
Dmitry Rozhkov
a62c6f7d5e fpga webhook: reimplement to use kubebuilder framework
Simplify upgrade procedure to newer versions of kubernetes by relying on the
kubebuilder framework rather than using codegen directly.

Closes #377
2020-08-17 12:09:03 +03:00
Mikko Ylinen
1cfb849eef qat: update QAT software stack
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-08-12 23:08:59 +03:00
Dmitry Rozhkov
e87d94d4fb fpga: finalize plugin kustomization
closes #318
2020-07-01 11:57:45 +03:00
Mikko Ylinen
2f16509fe3
Merge pull request #376 from rojkov/operator-v3
operator: initial version with gpu and qat controllers
2020-06-25 15:49:49 +03:00
Dmitry Rozhkov
6b2fa0a264 operator: initial version with gpu and qat controllers
2020-06-25 13:48:41 +03:00
Alexander D. Kanevskiy
79ef9d54e2
Merge pull request #397 from rojkov/nakedret
fpga_tool: enable nakedret check
2020-06-24 20:52:33 +03:00
Dmitry Rozhkov
7177409f19 fpga webhook: rework deployment to use kustomize
Contributes to #318
2020-06-23 15:53:36 +03:00
Dmitry Rozhkov
339cdee501 linter: enable nakedret check
2020-06-23 12:04:35 +03:00
Mikko Ylinen
bc22a07638
Merge pull request #398 from rojkov/gosec
linter: enable gosec check
2020-06-16 16:16:02 +03:00
Dmitry Rozhkov
73aea0aa1b linter: enable gosec check
2020-06-11 17:56:24 +03:00
Dmitry Rozhkov
828e12f896 doc: add note about proxy to webhook doc
2020-06-11 16:06:54 +03:00
Dmitry Rozhkov
70f862f2aa add golangci linter
In this initial commit the following checks are disabled due to the
excessive amount of changes required:
- dupl (duplicate code)
- funlen (function length)
- goerr113 (errors handling expressions)
- gomnd (magic numbers)
- gosec (security)
- nakedret (naked returns)
- wsl (enforces the use of empty lines)
- errcheck (checking for unchecked errors)
- staticcheck (static analysis)
2020-06-08 14:01:13 +03:00
Dmitry Rozhkov
aabc45cbb5 gpu: increase code coverage for unit tests 2020-05-19 16:14:40 +03:00
Dmitry Rozhkov
c63dbf61b8 fpgawebhook: move to v2 API of fpga.intel.com group 2020-05-04 15:43:20 +03:00
Dmitry Rozhkov
99fcb69d33 fpga: compress fpga AF resource names 2020-04-29 11:59:50 +03:00
Dmitry Rozhkov
6c2eacfae5 webhook: remove mode of operation
fpga: make AFU resource name 63 char long

webhook: drop mode from README

webhook: extend mappings description

webhook: tighten CRD definitions

webhook: drop mapping to non-existing afuId

explicitly state mapping names can be in any format

use consistent terminology across fpga webhook and plugin
2020-04-22 13:55:43 +03:00
Dmitry Rozhkov
8fc187f4d8 move to k8s v1.18.2 release
Also fix the plugins and e2e tests
2020-04-17 12:40:18 +03:00
Mikko Ylinen
e4a57899d2 qat: fix UIO mounts
DPDK uses /sys/class/uio/uioX/device/[control|resource*] and we
had special mounts for the individual uioX paths. However, it turned
out this wasn't working as expected: host's /sys/class/uio/uioX/device/
was mounted to container's /sys/class/uio and DPDK failed to find
uioX/device/[control|resource*] files. Moreover, workloads requesting
more than one QAT resource still saw only one path.

While cri-o/containerd give sysfs read-only mounts, DPDK needs
device/config RW. Therefore, we need to mount host /sys/class/uio/uioX
to container /sys/class/uio/uioX for each requested device.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-04-01 09:08:55 +03:00
Ed Bartosh
2ec6677ab0 fpga tests cleanup
- used t.Run api for better visibility
- used ioutil.TempDir to create temporary directories

Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2020-03-31 14:36:15 +03:00
Ed Bartosh
a668c596b2 fpga_crihook: improve unit tests
- increased test coverage to 91.4%
- cleaned up the code
- removed unused test data

Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2020-03-31 11:57:06 +03:00
Alek Du
cfbb69ddd6 vpu: improve test coverage
Changed code a little bit to improve test coverage:
* call Scan in test code
* call Scan without hddl socket
* call Scan with 0 SharedDevNum
* move SharedDevNum in newDevicePlugin
* use Ticker instead of Sleep

Signed-off-by: Alek Du <alek.du@intel.com>
2020-03-31 14:12:59 +08:00
Graham Whaley
71d08224ee fpga: move to using klog for logs and debug
Move all the fpga components to using klog for logging
and debug. This includes replacing our homebrew 'fatal()'
with klog.Error().

Modify the deployment files to move from `-debug` to
`-v`, and set their default level to '1' (Info), rather
than full debug mode ('4').

Signed-off-by: Graham Whaley <graham.whaley@intel.com>
2020-03-24 14:31:53 +00:00
Ed Bartosh
cf731f3c18 fpga plugin: increase test coverage 2020-03-24 15:46:39 +02:00
Ed Bartosh
29be713a96 fpga_plugin: use time.Ticker instead of time.Sleep
Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2020-03-24 13:32:35 +02:00
Mikko Ylinen
a6bf48f8db dpdkdrv: improve unit test coverage
Add NewDevicePlugin() tests to improve test coverage. This also
contributes to "input validation" (part of #321) that wasn't done
properly before.

Fixes: #325

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-03-24 08:23:44 +02:00
Mikko Ylinen
336d2b34bc
Merge pull request #340 from grahamwhaley/20200316_klog_vpu
vpu: move to using klog
2020-03-24 08:10:21 +02:00
Graham Whaley
626bbb6ee2 gpu: move to using klog
Move from fmt to klog for logging and debug.
Also add an extra info level message noting when we find
new devices.

Signed-off-by: Graham Whaley <graham.whaley@intel.com>
2020-03-20 11:54:38 +00:00
Graham Whaley
82713d0cf9 vpu: move to using klog
Move to using klog for logging and debug for vpu plugin.

Signed-off-by: Graham Whaley <graham.whaley@intel.com>
2020-03-20 11:38:20 +00:00
Mikko Ylinen
15d4b10715
Merge pull request #329 from grahamwhaley/20200312_klog
klog: Add klog logging to framework and qat plugins
2020-03-19 16:59:44 +02:00
Graham Whaley
f8dbc896a1 devicemanager: qat: use klog for logging and debug
Move the framework, and the qat driver, to use `klog`
for logging and debug.

This has some noticeable effects:

1) Our default log output gains a bunch of annotation:
From:
    QAT device plugin started in 'dpdk' mode
To:
    I0312 11:51:02.057728    6053 qat_plugin.go:64] QAT device plugin started in 'dpdk' mode

(there is now a command line option to drop those annotations if
necessary).

2) We gain a bunch of command line parameters from klog for controlling log
levels and output. We go from 5 arguments to 17:

---
Usage of ./cmd/qat_plugin/qat_plugin:
  -add_dir_header
        If true, adds the file directory to the header
  -alsologtostderr
        log to standard error as well as files
  -debug
        enable debug output
  -dpdk-driver string
        DPDK Device driver for configuring the QAT device (default "vfio-pci")
  -kernel-vf-drivers string
        Comma separated VF Device Driver of the QuickAssist Devices in the system. Devices supported: DH895xCC,C62x,C3xxx and D15xx (default "dh895xccvf,c6xxvf,c3xxxvf,d15xxvf")
  -log_backtrace_at value
        when logging hits line file:N, emit a stack trace
  -log_dir string
        If non-empty, write log files in this directory
  -log_file string
        If non-empty, use this log file
  -log_file_max_size uint
        Defines the maximum size a log file can grow to. Unit is megabytes. If the value is 0, the maximum file size is unlimited. (default 1800)
  -logtostderr
        log to standard error instead of files (default true)
  -max-num-devices int
        maximum number of QAT devices to be provided to the QuickAssist device plugin (default 32)
  -mode string
        plugin mode which can be either dpdk (default) or kernel (default "dpdk")
  -skip_headers
        If true, avoid header prefixes in the log messages
  -skip_log_headers
        If true, avoid headers when opening log files
  -stderrthreshold value
        logs at or above this threshold go to stderr (default 2)
  -v value
        number for the log level verbosity
  -vmodule value
        comma-separated list of pattern=N settings for file-filtered logging
---

3) Our `-debug` flag is now replaced by the `klog` `-v n` flag.

*NOTE:* This is potentially a minor breaking change. Applying
this debug overlay to any previous (pre-klog edit) images will
cause the container to fail to launch, as it will not recognise
the new `-v` arguments.

We also update the kustomize deployment to move from using
DEBUG env vars to a VERBOSITY var that controls both
the log verbosity and whether debug mode is enabled.

Signed-off-by: Graham Whaley <graham.whaley@intel.com>
2020-03-19 11:20:48 +00:00
Mikko Ylinen
b021152eb8 qat: kerneldrv: fix device registration when run in VMs
Kerneldrv checks for available devices based on adf_ctl output.
We only accepted two cases: PFs if IOMMU is off and VFs if IOMMU
is on.

The right check is to skip PFs only if IOMMU is on and to allow all
other cases. This fixes two scenarios when running in VMs: VFs are now
accepted regardless of whether a (v)IOMMU is present.

Moreover, do not hard-code the domain '0000:' because the PCI domain
is not always 0000.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-03-16 20:17:57 +02:00
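The corrected filter from b021152eb8 can be sketched in Go as a single predicate. This is a hypothetical helper, not the plugin's code. It also assumes that VFs can be identified by a `vf` suffix in the adf_ctl type, as in `c6xxvf`.

```go
package main

import (
	"fmt"
	"strings"
)

// acceptDevice reports whether a device listed by adf_ctl should be
// registered. PFs are skipped only when IOMMU is on; VFs are always
// accepted (e.g. inside a VM with or without a vIOMMU).
// Detecting VFs by a "vf" type suffix is an assumption for illustration.
func acceptDevice(devType string, iommuOn bool) bool {
	isVF := strings.HasSuffix(devType, "vf")
	if iommuOn && !isVF {
		return false
	}
	return true
}

func main() {
	for _, t := range []string{"c6xx", "c6xxvf"} {
		for _, iommu := range []bool{false, true} {
			fmt.Printf("type=%s iommu=%v accept=%v\n", t, iommu, acceptDevice(t, iommu))
		}
	}
}
```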
Alek Du
7c2bc3bda0 vpu_plugin: add kustomizations
- Default deployment: `kubectl apply -k deployments/vpu_plugin`
- Default deployment does not specify namespace anymore
  (was: `kube-system`)
- Variant: deploy to `kube-system` instead of user-defined namespace
  (or `default`)
  `kubectl apply -k deployments/vpu_plugin/overlays/namespace_kube-system`
- VPU plugin README updated.
- Change volume mounts to readonly when possible

Signed-off-by: Alek Du <alek.du@intel.com>
2020-02-25 14:53:26 +08:00
Mikko Ylinen
332fbdc35c
Merge pull request #300 from askervin/55B_fpga_kustomization
fpga plugin kustomization, stage 2
2020-02-24 22:20:27 +02:00
Antti Kervinen
5fe8174077 fpga_plugin: add kustomization files
- Add script/fpga-plugin-prepare-for-kustomization.sh, creates contents
  for the secret needed by the fpga plugin webhook.
- Single-command fpga plugin + webhook deployment for both modes:
  - `kubectl create -k deployments/fpga_plugin/overlays/af`
  - `kubectl create -k deployments/fpga_plugin/overlays/region`
- Change intel-fpga-plugin image CMD to ENTRYPOINT.
2020-02-24 16:32:26 +02:00
Ed Bartosh
ca5d144e8e
Merge pull request #296 from mythi/gomod
fpga_plugin: drop dependency to k8s.io/kubernetes
2020-02-24 14:10:12 +02:00
Ed Bartosh
13836c2d09
Merge pull request #299 from mythi/gitclone
READMEs: use git clone to get the code
2020-02-24 12:42:32 +02:00
Mikko Ylinen
61c135d1d6 fpga_plugin: drop dependency to k8s.io/kubernetes
This commit drops the fpga_plugin dependency on k8s.io/kubernetes, which
was used to get GetHostname(). After this change, the plugin node
name can be set using the new -node-name parameter. Its default value
is read from the NODE_NAME environment variable.

If the node annotation override check fails, we continue with the default
mode parameter and do not exit like we did previously.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-02-21 18:48:30 +02:00
Mikko Ylinen
f145541caf READMEs: use git clone to get the code
go get'ing does not work due to our k8s.io/kubernetes dependency
so guide users to use git clone to get the code.

Fixes: #290

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-02-20 08:04:07 +02:00
Antti Kervinen
d04aa77ac5 fpga_plugin: orchestration/orchestrated fixed in READMEs
Not touching "orchestration programmed". Fixing only instances where
this refers directly to the mode recognized by the webhook-deploy.sh
script.

Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
2020-02-17 16:32:54 +02:00
Dmitry Rozhkov
3db440d2d4
Merge pull request #288 from askervin/kustomize-gpu
gpu_plugin: add kustomizations
2020-02-11 10:54:14 +02:00
Ed Bartosh
1f4928790f Implement function for DeviceInfo creation
- Made DeviceInfo fields private
- Implement NewDeviceInfo constructor
2020-02-07 15:26:37 +02:00
Antti Kervinen
d568f050c5 gpu_plugin: add kustomizations
- Default deployment: `kubectl apply -k deployments/gpu_plugin`
- Default deployment does not specify namespace anymore
  (was: `kube-system`).
- Variant: deploy only on nodes with Intel GPU label by NFD:
  `kubectl apply -k deployments/gpu_plugin/overlays/nfd_labeled_nodes`
- Variant: deploy to `kube-system` instead of user-defined namespace
  (or "default"):
  `kubectl apply -k deployments/gpu_plugin/overlays/namespace_kube-system`
- GPU plugin README updated.

Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
2020-02-07 14:56:52 +02:00
Mikko Ylinen
f036b72cff
Merge pull request #286 from askervin/kustomize
qat_plugin: add kustomizations
2020-02-06 13:53:08 +02:00
Antti Kervinen
ec8eef6daa qat_plugin: add kustomizations
- Default deployment: `kubectl apply -k deployments/qat_plugin`
- Debug variant: `kubectl apply -k deployments/qat_plugin/overlays/debug`
- Single-resource `yaml` naming convention:
  applying x-y-z.yaml configures k8s resource named x-y-z.
- QAT plugin README updated.

Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
2020-02-05 15:48:57 +02:00
Mikko Ylinen
28a89a2820 qat: README: clarify crypto-perf usage
crypto-perf instructions were outdated and had implicit
assumptions about the environment. More specifically:

Clear Linux builds DPDK libraries as shared, so for the
compress and crypto test applications to run, the memory and
QAT PMD libraries must be explicitly preloaded using the '-d' parameter.

Also, the test-crypto1 and test-compress1 deployments expect the
cluster to be configured with CPU Manager's static policy.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-02-04 13:32:10 +02:00
Mikko Ylinen
0c89f242aa
Merge pull request #283 from alekdu/fix_readme
vpu: refactor the vpu plugin readme
2020-02-04 13:28:38 +02:00
Alek Du
6321c424ca vpu: refactor the vpu plugin readme
Follow the standard format to fix the VPU plugin README.
Also added long logs from the Ubuntu OpenVINO demo job.

Signed-off-by: Alek Du <alek.du@intel.com>
2020-02-04 18:15:27 +08:00
Ed Bartosh
20ea365e62
Merge pull request #268 from grahamwhaley/20200117_fpga_readme
fpga: docs: update all the READMEs
2020-02-03 12:52:09 +02:00
Ed Bartosh
7e6e053349
Merge pull request #279 from rojkov/cleanup
Cleanup
2020-01-31 15:59:34 +02:00
Graham Whaley
07e902334f fpga: crio: docs: update README
Update the CRI-O webhook README, adding notes about what it is and
does, and that it is normally installed as part of the device
plugin daemonset.

Signed-off-by: Graham Whaley <graham.whaley@intel.com>
2020-01-30 16:19:19 +00:00
Graham Whaley
f39a374e9d fpga_admission: docs: expand README
Expand the FPGA webhook admission controller README.

Signed-off-by: Graham Whaley <graham.whaley@intel.com>
2020-01-30 16:19:19 +00:00
Graham Whaley
27bc562478 fpga plugin: docs: Clean up and expand README
Expand and re-arrange the README. Add some details about what the
plugin and other FPGA components provide.

Signed-off-by: Graham Whaley <graham.whaley@intel.com>
2020-01-30 16:19:19 +00:00
Dmitry Rozhkov
456c8f3ff1 fpga: fix stutter reported by golint 2020-01-30 15:17:27 +02:00
Dmitry Rozhkov
7695e450de fpga_crihook: remove unused struct field 2020-01-29 17:17:06 +02:00
Dmitry Rozhkov
3a845cfe15 fpga: rename files to make them linux-only 2020-01-29 17:17:06 +02:00
Graham Whaley
6537e38499 gpu: do not fail if device scanning fails
If we fail to scan for GPU devices (note that this is potentially
different from not finding any devices during a scan), then
warn on it, and go around the poll loop again. Do not treat
it as a fatal error or we might end up in a re-launch death
deploy loop...

Of course, getting a warning in your logs every 5s could also
be annoying, but is somewhat 'less fatal'.

Fixes: #260
Fixes: #230

Signed-off-by: Graham Whaley <graham.whaley@intel.com>
2020-01-29 09:24:50 +00:00
Mikko Ylinen
9d76946b49
Merge pull request #269 from grahamwhaley/20200121_qat_readme
qat: docs: Update the README
2020-01-29 07:29:27 +02:00
Alek Du
887e56e780 VPU: Add Intel Movidius MyriadX VPU plugin support
This patch adds support for the Intel VCAC-A card (with MyriadX 2485 VPUs).
Support for later VPUs will be added by reusing this plugin.

VCAC-A board info is at:
https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/media-analytics-vcac-a-accelerator-card-by-celestica-datasheet.pdf

Also add an OpenVINO HDDL VPU demo for the Intel VCAC-A card.

Signed-off-by: Alek Du <alek.du@intel.com>
2020-01-28 23:17:50 +08:00
Graham Whaley
1ca19696e0 qat: docs: Update the README
Update the QAT README. Add some descriptions. Add information about
the command line and config options.

Signed-off-by: Graham Whaley <graham.whaley@intel.com>
2020-01-27 16:51:00 +00:00
Graham Whaley
958ab2aa7e fpga: docs: Add diagrams for FPGA modes
Add draw.io diagrams and their generated PNG files for both
orchestrated and preprogrammed FPGA modes. These will
then be used in the documentation.

Signed-off-by: Graham Whaley <graham.whaley@intel.com>
2020-01-27 14:55:15 +00:00
Graham Whaley
88cec1fd16 fpga_tool: doc: add a basic README
The fpga_tool had no README. Add a basic one.
This is desirable, as we should at least reference the tool from the
fpga_plugins document.

Signed-off-by: Graham Whaley <graham.whaley@intel.com>
2020-01-17 16:36:40 +00:00
Graham Whaley
79a86c10e8 docs: gpu: Add more details, re-arrange section order
Re-arrange the section order a little (such as putting the use
of the DaemonSet before the sudo hand-deploy), and add a lot more
detail of what to expect, and how to check if the pod has launched
correctly.

Signed-off-by: Graham Whaley <graham.whaley@intel.com>
2020-01-17 13:34:13 +00:00
Graham Whaley
6705a8e461 docs: gpu: add high level details to README
Fill out the introduction to the GPU README to give some details around
what the plugin supports and how.

Signed-off-by: Graham Whaley <graham.whaley@intel.com>
2020-01-16 15:27:22 +00:00
Ed Bartosh
7aca59e032
Merge pull request #245 from rojkov/update-v1.17.0
bump k8s dependencies up to v1.17.0
2020-01-15 13:07:55 +02:00
Ed Bartosh
1b1206e39a fpga: change webhook service port
Changed the webhook listening port from 443 to 8443 so that a non-root
user account can bind to it.
2020-01-14 16:31:12 +02:00
Dmitry Rozhkov
814e2e1a50 bump k8s dependencies up to v1.17.0 2020-01-09 11:19:58 +02:00
Ed Bartosh
06c07a5961 deployments/fpga_plugin: limit host mounts
The default deployment gives rather wide host mounts.

Limited the sysfs mount to only the subdirectory the plugin
needs.

Made the sysfs and dev mounts read-only.

Added notes that the FPGA plugin can be run as a non-root user.
2019-12-12 13:07:19 +02:00
Mikko Ylinen
fd631fc31c deployments/gpu_plugin: limit host mounts
The default deployment gives rather wide host mounts. We can limit
the mounts only to the subdirectories the plugin needs and mount
them read-only.

Also, add notes that both QAT and GPU plugins can be run as non-root
user.

Fixes: #228

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2019-12-11 12:54:36 +02:00
Alexander Kanevskiy
67825dcc06 Fix admission hook for pods generated by ReplicaSet
In pods generated automatically by Deployments/ReplicaSets,
the name and namespace fields might be missing.
We can use the namespace information from the request itself.
2019-10-25 17:40:42 +03:00
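The fallback from 67825dcc06 can be sketched as below. The structs are hypothetical minimal stand-ins. The real `admission.k8s.io` AdmissionRequest does carry a `Namespace` field, but the type and function names here are illustrative only.

```go
package main

import "fmt"

// Hypothetical minimal stand-ins for the admission request and pod metadata.
type admissionRequest struct {
	Namespace string
}

type podMeta struct {
	Name      string
	Namespace string
}

// podNamespace prefers the pod's own namespace and falls back to the
// namespace carried by the admission request, because pods generated by
// a Deployment/ReplicaSet may arrive without one.
func podNamespace(req admissionRequest, pod podMeta) string {
	if pod.Namespace != "" {
		return pod.Namespace
	}
	return req.Namespace
}

func main() {
	req := admissionRequest{Namespace: "team-a"}
	fmt.Println(podNamespace(req, podMeta{})) // generated pod without namespace
	fmt.Println(podNamespace(req, podMeta{Name: "p", Namespace: "team-b"}))
}
```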
Ed Bartosh
57b4927eda crihook: simplified NewHookEnv signature 2019-09-16 12:56:35 +03:00
Ed Bartosh
8d21aff5ac crihook: removed unused field 2019-09-16 12:51:50 +03:00
Ed Bartosh
73ac87cd8d crihook: fix forgotten error check 2019-09-16 12:50:29 +03:00
Ed Bartosh
a6b3a217e8 crihook: fix ineffective Errorf call
Returned error instead of calling errors.Errorf with no effect.
2019-09-16 12:49:26 +03:00
Ubuntu
4f28657b6b fpga: fixed documentation and demo 2019-09-10 19:30:20 -05:00
Alexander Kanevskiy
cd263ba287 Update README file for fpga_crihook
The initcontainer is now built in the main build process, so there is no
need to download anything special.

Added a note about checking the OCI hooks configuration parameter in CRI-O.

Fixes: #192
2019-08-25 02:37:07 +03:00
Alexander Kanevskiy
2430e204d5 fpga_tool: UX improvements
- user readable output for fpgainfo/fmeinfo/portinfo commands
- new commands: list, list-fme, list-port
- new -q flag to suppress headers, progress and overly verbose messages
- install command will now fail if the destination file already exists
- new --force flag: allows overwriting files in install command
- removed development and debug output
2019-08-25 02:37:07 +03:00
Alexander Kanevskiy
71bb38f496 Implemented native FPGA flashing
Removed dependency to OPAE libraries
2019-08-25 02:37:01 +03:00
Ed Bartosh
de9df8373e fpga_plugin: support in-tree kernel driver
Extended the fpga plugin to support both in-tree (DFL) and
out-of-tree (OPAE) kernel drivers.

- fpga_crihook: move JSON parsing to separate functions
- decreased cyclomatic complexity of the CRI hook main() function
- increased readability
- increased test coverage

Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2019-08-24 18:27:15 +03:00
Alexander Kanevskiy
186ec6613c FPGA: migrate to ClearLinux environment
- Migrate to OPAE 1.3.2
- Build all the tools from the source
- ignore files in workspace
- minimal fpga_tool utility to check gbs/aocx file parsing and flashing
- implemented kernel IOCTL based flashing of bitstreams
- add PCI and sysfs functions
2019-08-24 02:55:19 +03:00
Mikko Ylinen
832e4aaf3c crypto-perf: add kustomization and move to deployments
We plan to use crypto-perf for simple QAT testing. This commit adds
kustomization to make the deployment easier. The original .yaml is
also moved to deployments/ with some changes.

For instance, it turns out that vfio-pci mode with DPDK also needs CAP_SYS_ADMIN
(see PR #187, which states that only igb_uio would need it).

kustomize is available as part of kubectl since kubernetes v1.14.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2019-08-20 22:01:44 +03:00
Dmitry Rozhkov
a2debf6fb4 qat: fix typo 2019-08-19 12:52:16 +03:00
Mikko Ylinen
d92b528ab6 qat: document kerneldrv mode and build instructions
-mode kerneldrv comes with no documentation. This patch adds a few
notes about it and instructions on how to build it if a user chooses
to have it enabled.

Closes: #197

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2019-08-19 09:56:57 +03:00
Dmitry Rozhkov
8390388f89 qat: make users explicitly opt in to have kernel mode compiled in 2019-08-14 13:41:44 +03:00
Mikko Ylinen
08a079ead2 crypto-perf: use IPC_LOCK to ensure mmap() works
Change SYS_ADMIN to IPC_LOCK capability to ensure DPDK gets to mmap() hugepages.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2019-06-12 07:31:01 +03:00
Dmitry Rozhkov
156970adca
Merge pull request #185 from mythi/iommu
qat_plugin: kerneldrv: register VF devices when IOMMU is on
2019-05-31 14:08:29 +03:00
Mikko Ylinen
4ba6af14b9 qat_plugin: kerneldrv: register VF devices when IOMMU is on
When IOMMU is on in the system, the physical function (PF)
devices cannot be used. This prevented using kerneldrv as it
was only written to work with PFs.

However, QAT bare metal functions can also be used when IOMMU
is enabled. In this case, they must be used via virtual functions
(VF).

This commit makes it possible to use kerneldrv when IOMMU is
on. An added side benefit is that we can now slice the same QAT HW
for both "dpdk" and "kernel" usages simultaneously.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2019-05-29 22:10:26 +03:00
Alexander Kanevskiy
4dc19851ee Pass correct PCI bus/device/function to fpgaconf
Partially helps with #148
2019-05-29 16:08:52 +03:00
Mikko Ylinen
4a80aa83e2 qat_plugin: kerneldrv: get device.id from inst_id
In adf_ctl output, qat_devX is a sequence number that includes both
PF and VF devices:

qat_dev0 - type: c6xx,  inst_id: 0,  node_id: 1,  bsf: 84:00.0, #accel: 5 #engines: 10 state: up
qat_dev1 - type: c6xx,  inst_id: 1,  node_id: 1,  bsf: 85:00.0, #accel: 5 #engines: 10 state: up
qat_dev2 - type: c6xx,  inst_id: 2,  node_id: 1,  bsf: 86:00.0, #accel: 5 #engines: 10 state: up
qat_dev3 - type: c6xxvf,  inst_id: 0,  node_id: 1,  bsf: 84:01.0, #accel: 1 #engines: 1 state: up
qat_dev4 - type: c6xxvf,  inst_id: 1,  node_id: 1,  bsf: 84:01.1, #accel: 1 #engines: 1 state: up
...

X cannot be used as the config file identifier because it does not match
the real id of the device. inst_id gives the real id, so use it to find
the right config file.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2019-05-28 15:49:45 +03:00
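Here is a minimal Go sketch of the inst_id lookup described in 4a80aa83e2, using the `adf_ctl` sample lines above as input. The regular expression, the `qatDevice` type, `parseAdfCtlLine`, and the `<type>_dev<inst_id>.conf` config naming are all assumptions for illustration, not the plugin's code.

```go
package main

import (
	"fmt"
	"regexp"
)

// adfCtlLine matches one device line of adf_ctl output; the pattern is an
// assumption based on the sample output only.
var adfCtlLine = regexp.MustCompile(`^qat_dev(\d+) - type: (\w+),\s+inst_id: (\d+),.*bsf: ([0-9a-fA-F:.]+),`)

// qatDevice holds the parsed fields (hypothetical type).
type qatDevice struct {
	Seq    string // X in qat_devX: a running number over both PFs and VFs
	Type   string // e.g. c6xx or c6xxvf
	InstID string // per-type instance id, the device's real id
	BSF    string // PCI bus/slot/function
}

func parseAdfCtlLine(line string) (qatDevice, bool) {
	m := adfCtlLine.FindStringSubmatch(line)
	if m == nil {
		return qatDevice{}, false
	}
	return qatDevice{Seq: m[1], Type: m[2], InstID: m[3], BSF: m[4]}, true
}

// configName derives the config file name from type and inst_id rather
// than from the qat_devX sequence number (assumed naming scheme).
func configName(d qatDevice) string {
	return fmt.Sprintf("%s_dev%s.conf", d.Type, d.InstID)
}

func main() {
	line := "qat_dev3 - type: c6xxvf,  inst_id: 0,  node_id: 1,  bsf: 84:01.0, #accel: 1 #engines: 1 state: up"
	if d, ok := parseAdfCtlLine(line); ok {
		fmt.Println(d.Seq, d.InstID, configName(d))
	}
}
```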
Alexander D. Kanevskiy
5a7c5079d4
Merge pull request #183 from rojkov/admission-readme
gpu: fix grammar
2019-05-24 17:29:54 +02:00
Alexander D. Kanevskiy
dd70b69c76
Merge pull request #182 from rojkov/verbosity
gpu: add log messages for not found cards
2019-05-24 16:33:50 +02:00
Dmitry Rozhkov
f5d5cd32ed gpu: fix grammar 2019-05-24 16:45:59 +03:00
Dmitry Rozhkov
44ff734be6 gpu: add log messages for not found cards
Let a user know the plugin can't find any Intel GPU if that's
the case. It might be cumbersome to realize that the plugin runs
on a host which doesn't have any Intel GPUs.

Also make the code less nested for better readability.
2019-05-24 16:19:06 +03:00
Dmitry Rozhkov
ea63ad94f2 webhook: add note on mapping applicability 2019-05-24 10:28:37 +03:00
Dmitry Rozhkov
da132f6584 qat: add kernel mode plugin 2019-04-25 14:15:32 +03:00
Rivera Gonzalez, Julio C
22b9c61c4d Adding support for dh895xcc devices
This commit makes it possible for qat2_plugin to use PCI
devices with communication chipsets 8925 to 8955.

Signed-off-by: Rivera Gonzalez, Julio C <julio.c.rivera.gonzalez@intel.com>
2019-04-25 14:14:09 +03:00
Dmitry Rozhkov
ca569b0f70 qat: initial support for openssl QAT engine 2019-04-25 14:14:09 +03:00
Ed Bartosh
ea5a06dfae
Merge pull request #172 from rojkov/issue-167-namespaced-fpga-mappings
fpga: mutate pods with CRDs from its corresponding namespace
2019-04-09 14:35:56 +03:00
Dmitry Rozhkov
565045f6f2 fpga: mutate pods with CRDs from its corresponding namespace
CRDs for AF or Region mappings are scoped to namespaces. So an
admitted pod has to be mutated with CRDs existing in the same
namespace as the pod's.

Closes #167
2019-04-02 12:17:08 +03:00
Dmitry Rozhkov
4bf8c5e685 Fix compilation issues 2019-02-19 16:12:56 +02:00
nolancon
52df9329e4 Re-order devices in scan loop
Fixes: #146

Removed whitespace
2019-01-23 13:41:22 +00:00
Dmitry Rozhkov
54332c5eea announce deviceplugin API public 2019-01-21 17:20:01 +02:00
Dmitry Rozhkov
7662cb9154 extend API to receive full specs instead of strings 2019-01-21 17:15:27 +02:00
Dmitry Rozhkov
58b62f579b qat: fix numbering of env vars
An `Allocate()` request can be used to allocate resources for many
containers thus `counter` needs to be reset for each container
response.
2018-12-12 13:42:05 +02:00
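To illustrate the counter reset in 58b62f579b, here is a minimal Go sketch. It uses simplified, hypothetical stand-ins (`containerRequest`, `containerResponse`, `allocate`) rather than the kubelet device plugin API types. It follows the `QAT0`, `QAT1` naming from 100ecf8340 and, purely for illustration, uses the device ID as each variable's value.

```go
package main

import "fmt"

// containerRequest is a hypothetical stand-in for one container's part
// of an Allocate() request.
type containerRequest struct {
	DeviceIDs []string
}

// containerResponse is a hypothetical stand-in for one container's part
// of an Allocate() response.
type containerResponse struct {
	Envs map[string]string
}

// allocate builds one response per container. The counter is declared
// inside the outer loop, so numbering restarts at QAT0 for every container.
func allocate(reqs []containerRequest) []containerResponse {
	resps := make([]containerResponse, 0, len(reqs))
	for _, req := range reqs {
		envs := make(map[string]string, len(req.DeviceIDs))
		counter := 0 // reset for each container
		for _, id := range req.DeviceIDs {
			envs[fmt.Sprintf("QAT%d", counter)] = id
			counter++
		}
		resps = append(resps, containerResponse{Envs: envs})
	}
	return resps
}

func main() {
	resps := allocate([]containerRequest{
		{DeviceIDs: []string{"0000:3d:01.0", "0000:3d:01.1"}},
		{DeviceIDs: []string{"0000:3d:01.2"}},
	})
	for i, r := range resps {
		fmt.Println(i, r.Envs)
	}
}
```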
ssehgal
100ecf8340 Improving consumption of devices by updating the environment variable name based on the number of devices requested in a container (e.g. QAT0, QAT1) 2018-12-05 15:11:23 +00:00
nolancon
1bb035cc64 PostAllocate implemented in QAT device plugin 2018-12-05 15:11:23 +00:00
ssehgal
eb6d48a512 QAT README update and crypto perf image tag correction 2018-12-03 14:03:55 +00:00
Ed Bartosh
1215bc7fb7 admissionwebhook: fix region regexp
The region regexp doesn't allow dots, which
results in incorrect matching of the arria10.dcp1.0 region.
2018-11-28 19:56:35 +02:00
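To illustrate the class of bug fixed in 1215bc7fb7, the Go sketch below compares two hypothetical patterns; the webhook's actual expression is not shown in this log. Inside a bracket expression `.` is a literal dot, so adding it to the character class is enough to accept `arria10.dcp1.0`.

```go
package main

import (
	"fmt"
	"regexp"
)

// Both patterns are hypothetical illustrations, not the webhook's code.
var (
	regionNoDots   = regexp.MustCompile(`^[a-z0-9-]+$`)  // rejects '.'
	regionWithDots = regexp.MustCompile(`^[a-z0-9.-]+$`) // accepts '.'
)

func main() {
	name := "arria10.dcp1.0"
	fmt.Println(regionNoDots.MatchString(name), regionWithDots.MatchString(name)) // prints: false true
}
```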
Mikko Ylinen
794b3077bd qat_plugin: readme: list all known VF devices
Not all QAT chips (e.g., 37c9) are listed in pci.ids, so
"grep QAT" does not show them.

Scan all known VF PCI ids in a loop to ensure all configured devices
are shown.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2018-11-28 10:32:31 +02:00
Mikko Ylinen
187f8040f0 qat_plugin: use vfio-pci as the default driver
vfio-pci uses IOMMU memory protection and is a safer default.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2018-11-28 10:32:31 +02:00
Frederik Carlier
d6016dedf9 Fix typos 2018-11-22 20:44:00 +00:00
Mikko Ylinen
00bbe922de qat: deployment: set parameters via ConfigMap
For easier deployments, fetch plugin command line arguments from ConfigMap.
When using ConfigMaps, qat_plugin.yaml needs no changes and can always
be used as is.

qat_plugin_default_configmap.yaml uses built-in defaults.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2018-11-20 13:43:00 +02:00
Dmitry Rozhkov
c2b635e627 webhook: reformat source code with gofmt 1.11 2018-10-04 11:03:24 +03:00
Dmitry Rozhkov
06487dcded crihook: do program multiple devices at once 2018-10-04 10:19:23 +03:00
Dmitry Rozhkov
6ce053a0a6 crihook: drop unused test data 2018-10-04 10:19:23 +03:00
Dmitry Rozhkov
dc21749a83 crihook: optimize regexp application 2018-10-04 10:19:23 +03:00
Dmitry Rozhkov
f1623cc5e9 webhook: add support for multiple FPGAs per container 2018-10-04 10:19:23 +03:00
Dmitry Rozhkov
90776a63c7 webhook: make debug message meaningful 2018-10-04 10:19:23 +03:00
Ed Bartosh
14b4168cbd add GPU plugin deployment
Added DaemonSet yaml
Added deployment instructions to plugin's README
2018-09-14 13:55:08 +03:00