238 Commits

Author SHA1 Message Date
Mikko Ylinen
ad8bbcea21 qat: rework bus-device-function handling
The code was stripping out "0000:" (bus) and then adding
it back in several places.

That's not necessary so this change simplifies QAT VF addr
handling by operating using full BDF IDs.

Moreover, simplify function calls: use getDpdkDevice() once
for each VF device.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-12-08 07:37:16 +02:00
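
The idea of working with full addresses throughout can be illustrated with a small check. This is a minimal sketch, not the plugin's code: the helper name isFullBDF and the regular expression are assumptions made for the example. In standard PCI notation a full address has the form domain:bus:device.function, so the leading "0000:" is the domain part.

```go
package main

import (
	"fmt"
	"regexp"
)

// fullBDF matches a PCI address written as domain:bus:device.function,
// for example "0000:3d:01.0". The pattern is illustrative only.
var fullBDF = regexp.MustCompile(`^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$`)

// isFullBDF (hypothetical helper) reports whether addr is already a
// complete address, so callers never need to strip or re-add a prefix.
func isFullBDF(addr string) bool {
	return fullBDF.MatchString(addr)
}

func main() {
	for _, addr := range []string{"0000:3d:01.0", "3d:01.0"} {
		fmt.Printf("%-14s full=%v\n", addr, isFullBDF(addr))
	}
}
```

Keeping one canonical form means every lookup uses the same string, which is the simplification the commit message describes.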
Dmitry Rozhkov
f0fa9df292 operator: prepare for publishing at operatorhub.io 2020-11-24 18:35:56 +02:00
Mikko Ylinen
d65cb902e6 sgx: move to RFC v4x device API
The SGX device nodes have changed from /dev/sgx/[enclave|provision]
to /dev/sgx_[enclave|provision] in v4x RFC patches according to the
LKML feedback.

This change moves to the new device nodes. Backwards compatibility
is provided by adding the /dev/sgx directory mount to containers. This
assumes the cluster admin has installed the udev rules provided in the
README, which make the old device nodes symlinks to the new device nodes.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-11-18 21:17:28 +02:00
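
The renaming described above can be summarized as a lookup table. This is only a sketch of the naming change: the map and the function resolveNode are hypothetical, and the commit itself relies on udev symlinks for compatibility rather than on code like this.

```go
package main

import "fmt"

// legacyToNew lists the two device node renames named in the commit message.
var legacyToNew = map[string]string{
	"/dev/sgx/enclave":   "/dev/sgx_enclave",
	"/dev/sgx/provision": "/dev/sgx_provision",
}

// resolveNode (hypothetical helper) returns the new-style path for a
// legacy path and leaves any other path unchanged.
func resolveNode(path string) string {
	if newPath, ok := legacyToNew[path]; ok {
		return newPath
	}
	return path
}

func main() {
	fmt.Println(resolveNode("/dev/sgx/enclave"))
	fmt.Println(resolveNode("/dev/sgx_provision"))
}
```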
Dmitry Rozhkov
5ec466b2eb add known issue for operator 2020-11-12 11:23:41 +02:00
Alexander D. Kanevskiy
75355c9937
Merge pull request #497 from bart0sh/PR0094-move-GetAPIVersion-out-of-NewPort
fpga: move GetAPIVersion call out of NewPort and NewFME
2020-11-11 12:09:13 +02:00
Ed Bartosh
2c73e2a0b3 fpga: move GetAPIVersion call out of NewPort and NewFME
This call is implemented by calling ioctl, which raises an
"open /dev/intel-fpga-port.X: operation not permitted" error
when called inside an unprivileged container.

This breaks the FPGA plugin.

Calling this API from fpga_tool is still OK, so
moving calls there should fix the issue.

Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2020-11-10 16:44:20 +02:00
Dmitry Rozhkov
5f0da56045 Upgrade to k8s v1.19.3 2020-11-10 16:09:20 +02:00
Ed Bartosh
680da54fd9 fpga: improve port init
Used generic newPort API instead of device-specific
newDflPort and newIntelFpgaPort.

Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2020-11-01 01:47:49 +02:00
Dmitry Rozhkov
25a52b0b74
Merge pull request #478 from bart0sh/PR0091-FPGA-SRIO-V
fpga: reimplement device discovering
2020-10-30 10:05:05 +02:00
Mikko Ylinen
0f6eefee23 sgx: add documentation
This commit documents the SGX building blocks for Kubernetes and
how to deploy them in the cluster.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-10-27 15:02:40 +02:00
Ed Bartosh
243870a707 fpga: reimplement device discovering
Reimplemented discovery of the FPGA devices using
APIs from pkg/fpga/intel_fpga_linux. The APIs are also
used in the fpga_tool utility.

The API is more advanced and supports SR-IOV among other
things.

Fixes: #372

Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2020-10-26 21:45:52 +02:00
Dmitry Rozhkov
87143355ba
Merge pull request #483 from mythi/sgx-nfd
sgx: make SGX NFD kustomization overlay independent
2020-10-26 13:25:36 +02:00
Ukri Niemimuukko
5b5180ae00 gpu_nfdhook memory amount reading from sysfs
This adds reading of the GPU memory amount from the sysfs. As a
fallback the environment variable GPU_MEMORY_OVERRIDE remains.

Another environment variable GPU_MEMORY_RESERVED can be used to
reserve a dedicated byte amount outside of kubernetes usage.

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2020-10-26 09:45:43 +02:00
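
The precedence described above (sysfs first, GPU_MEMORY_OVERRIDE as fallback, GPU_MEMORY_RESERVED subtracted) might look roughly like the following. This is a sketch under assumptions: the function usableMemory, the convention that 0 means "sysfs value unavailable", and the clamping at zero are all made up for the example and are not taken from the hook's source.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// usableMemory (hypothetical) picks the memory amount in bytes.
// sysfsBytes is the value read from sysfs, with 0 meaning "not available".
// override and reserved are the raw environment variable strings.
func usableMemory(sysfsBytes uint64, override, reserved string) uint64 {
	total := sysfsBytes
	if total == 0 {
		// Fall back to the override only when sysfs gave nothing usable.
		if v, err := strconv.ParseUint(override, 10, 64); err == nil {
			total = v
		}
	}
	res, err := strconv.ParseUint(reserved, 10, 64)
	if err != nil {
		res = 0 // unset or malformed: reserve nothing
	}
	if res >= total {
		return 0 // assumption: never report a negative amount
	}
	return total - res
}

func main() {
	// 8 GiB stands in for a value read from sysfs; it is made up for the example.
	const fromSysfs = 8 << 30
	fmt.Println(usableMemory(fromSysfs,
		os.Getenv("GPU_MEMORY_OVERRIDE"),
		os.Getenv("GPU_MEMORY_RESERVED")))
}
```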
Mikko Ylinen
161298190f sgx: make SGX NFD kustomization overlay independent
With the addition of SGX webhook in the operator, full SGX stack
depends on having the operator deployed first. SgxDevicePlugin CRD
is set to get intel-sgx-plugin and intel-sgx-initcontainer deployed
by the operator.

As a pre-requisite, node-feature-discovery must be deployed but it
is currently deployed via sgx_plugin kustomization overlay only.

It's better to allow NFD with the SGX-specific settings to be deployed
with a kustomization of its own.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-10-23 12:44:36 +03:00
Mikko Ylinen
e9dec450d6 improve docs for no_proxy when using cert-manager
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-10-21 14:57:41 +03:00
Mikko Ylinen
4e5eae62c4 update to cert-manager v1.0.3
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-10-16 22:37:57 +03:00
Ukri Niemimuukko
505eadaf94 gpu-plugin nfd-hook
This adds an nfd-hook for the gpu-plugin, which will create labels
for the GPUs that can then be used for pod deployment purposes or
for creation of GPU extended resources, which then allow finer-grained
GPU resource management.

The nfd-hook will install to the host system when the
intel-gpu-initcontainer is run. It is added into the plugin deployment
yaml.

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2020-10-01 12:02:57 +03:00
Kevin Putnam
1d149ffee6 Documentation: Fixes broken links and standardizes headers.
Signed-off-by: Kevin Putnam <kevin.putnam@intel.com>
2020-09-22 08:32:21 -07:00
Dmitry Rozhkov
1b82ab9df6 sync README.md files with the current state of the code
Closes #356
2020-09-16 10:54:39 +03:00
Mikko Ylinen
33a4f8f546 sgx: add SgxDevicePlugin CRD and admission webhook
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-09-10 15:31:26 +03:00
Ukri Niemimuukko
b2991b94e1 gpu_plugin: reduce topology scanning for high shared dev count
For every created device info, a new topology scan is performed in
the filesystem. The shared dev count was implemented so that for each
shared device, a new device info was created, which resulted in the
topology scan happening as many times per Scan-round, as there were
shared devs.

This fixes the issue by sharing a single device info among the
shared devices.

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2020-09-08 18:57:29 +03:00
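
The effect of sharing one device info can be shown with a counter. This is a minimal sketch, assuming a hypothetical buildTree function and a scan callback that stands in for the filesystem topology scan; none of these names come from the plugin.

```go
package main

import "fmt"

// deviceInfo holds what a topology scan produced for one physical device.
type deviceInfo struct {
	node     string
	topology string
}

// buildTree (hypothetical) creates sharedNum entries per physical node.
// The scan runs once per node and all shared entries point at the same info.
func buildTree(nodes []string, sharedNum int, scan func(string) string) map[string]*deviceInfo {
	tree := make(map[string]*deviceInfo, len(nodes)*sharedNum)
	for _, node := range nodes {
		info := &deviceInfo{node: node, topology: scan(node)}
		for i := 0; i < sharedNum; i++ {
			tree[fmt.Sprintf("%s-%d", node, i)] = info
		}
	}
	return tree
}

func main() {
	scans := 0
	scan := func(node string) string {
		scans++
		return "numa0" // placeholder result
	}
	tree := buildTree([]string{"card0", "card1"}, 10, scan)
	fmt.Println(len(tree), scans) // prints "20 2"
}
```

With two devices and a shared count of 10, the scan runs twice instead of twenty times.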
Dmitry Rozhkov
9bdf3a4def
Merge pull request #440 from mythi/ctrl-runtime-062
go.mod: update controller-runtime to v0.6.2
2020-09-03 12:02:06 +03:00
Dmitry Rozhkov
41e23dab3f
Merge pull request #438 from mythi/updates-20200901
.gitignore + kind + cert-manager v1.0.0
2020-09-03 12:00:33 +03:00
Alexander Kanevskiy
c74cb563dc Implemented SR-IOV Release/Assign ioctl
fpgatool is now able to prepare the FME via kernel ioctl to release and
assign ports for SR-IOV configurations.
2020-09-02 18:16:53 +03:00
Mikko Ylinen
f0d4754d53 move to cert-manager v1.0.0
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-09-02 18:07:05 +03:00
Mikko Ylinen
76aa7b91f0 go.mod: update controller-runtime to v0.6.2
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-09-02 15:16:12 +03:00
Dmitry Rozhkov
71075d4478 lint: enable exportloopref, prealloc and scopelint checks 2020-08-31 11:10:51 +03:00
Dmitry Rozhkov
be713f1c8b lint: enable errcheck 2020-08-28 16:14:14 +03:00
Mikko Ylinen
6b2148d22c
Merge pull request #431 from rojkov/staticcheck
linter: enable staticcheck
2020-08-26 18:08:09 +03:00
Ukri Niemimuukko
7244bd0f25 gpu_plugin: README.md update
Move remark about GVT-d to end of introduction. Remove remarks
about GVT-g for the time being.

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2020-08-25 13:45:10 +03:00
Dmitry Rozhkov
7ff08ee874 linter: enable staticcheck 2020-08-25 09:54:59 +03:00
Mikko Ylinen
a5f648077e sgx: add NFD EPC source, README and deployment YAMLs
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-08-24 16:33:45 +03:00
Ismo Puustinen
3ab60b4027 sgx: add tests for the plugin.
Signed-off-by: Ismo Puustinen <ismo.puustinen@intel.com>
2020-08-24 16:33:45 +03:00
Ismo Puustinen
8751afb6c7 sgx: add new plugin.
The SGX plugin exposes two device files as separate resources:

  * /dev/sgx/enclave   as sgx.intel.com/enclave
  * /dev/sgx/provision as sgx.intel.com/provision

The number of resources is configurable, but it's intended to be equal
to the pod count by default, so that any pod requiring access would have
it. The access control (who can do SGX remote attestation) is done
outside this plugin.

Signed-off-by: Ismo Puustinen <ismo.puustinen@intel.com>
2020-08-24 16:33:45 +03:00
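
The mapping of device files to resources with a configurable count can be sketched as follows. The type resource, the function makeResources, and the ID format are assumptions for illustration; only the device paths and resource names come from the commit message.

```go
package main

import "fmt"

// resource describes one advertised resource backed by a single device file.
type resource struct {
	name string   // extended resource name
	node string   // device file exposed to the container
	ids  []string // one ID per allocatable unit
}

// makeResources (hypothetical) advertises each device file `limit` times,
// so that up to `limit` pods can request it.
func makeResources(limit int) []resource {
	specs := []struct{ name, node string }{
		{"sgx.intel.com/enclave", "/dev/sgx/enclave"},
		{"sgx.intel.com/provision", "/dev/sgx/provision"},
	}
	out := make([]resource, 0, len(specs))
	for _, s := range specs {
		r := resource{name: s.name, node: s.node}
		for i := 0; i < limit; i++ {
			r.ids = append(r.ids, fmt.Sprintf("%s-%d", s.name, i))
		}
		out = append(out, r)
	}
	return out
}

func main() {
	for _, r := range makeResources(3) {
		fmt.Println(r.name, r.node, len(r.ids))
	}
}
```

Every ID of a resource maps to the same device file, which is why the count can simply follow the expected number of pods.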
Mikko Ylinen
cd068c797a ci: update tool versions
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-08-21 17:04:04 +03:00
Dmitry Rozhkov
200e2f8181 operator: add simple FPGA operator combined with FPGA webhook 2020-08-18 17:32:23 +03:00
Ed Bartosh
4794072273
Merge pull request #422 from rojkov/fpga-kubebuilder
fpga webhook: reimplement to use kubebuilder framework
2020-08-18 13:31:31 +03:00
Dmitry Rozhkov
a62c6f7d5e fpga webhook: reimplement to use kubebuilder framework
Simplify upgrade procedure to newer versions of kubernetes by relying on the
kubebuilder framework rather than using codegen directly.

Closes #377
2020-08-17 12:09:03 +03:00
Mikko Ylinen
1cfb849eef qat: update QAT software stack
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-08-12 23:08:59 +03:00
Dmitry Rozhkov
e87d94d4fb fpga: finalize plugin kustomization
closes #318
2020-07-01 11:57:45 +03:00
Mikko Ylinen
2f16509fe3
Merge pull request #376 from rojkov/operator-v3
operator: initial version with gpu and qat controllers
2020-06-25 15:49:49 +03:00
Dmitry Rozhkov
6b2fa0a264 operator: initial version with gpu and qat controllers 2020-06-25 13:48:41 +03:00
Alexander D. Kanevskiy
79ef9d54e2
Merge pull request #397 from rojkov/nakedret
fpga_tool: enable nakedret check
2020-06-24 20:52:33 +03:00
Dmitry Rozhkov
7177409f19 fpga webhook: rework deployment to use kustomize
Contributes to #318
2020-06-23 15:53:36 +03:00
Dmitry Rozhkov
339cdee501 linter: enable nakedret check 2020-06-23 12:04:35 +03:00
Mikko Ylinen
bc22a07638
Merge pull request #398 from rojkov/gosec
linter: enable gosec check
2020-06-16 16:16:02 +03:00
Dmitry Rozhkov
73aea0aa1b linter: enable gosec check 2020-06-11 17:56:24 +03:00
Dmitry Rozhkov
828e12f896 doc: add note about proxy to webhook doc 2020-06-11 16:06:54 +03:00
Dmitry Rozhkov
70f862f2aa add golangci linter
In this initial commit the following checks are disabled due to
the excessive number of changes required:
- dupl (duplicate code)
- funlen (function length)
- goerr113 (errors handling expressions)
- gomnd (magic numbers)
- gosec (security)
- nakedret (naked returns)
- wsl (forces to use empty lines)
- errcheck (checking for unchecked errors)
- staticcheck (static analysis)
2020-06-08 14:01:13 +03:00
Dmitry Rozhkov
aabc45cbb5 gpu: increase code coverage for unit tests 2020-05-19 16:14:40 +03:00