SGX Admission webhook was quickly forked from FPGA's
implementation. After a bit of thinking, it turns out
leader election and metrics are not necessary for a
(idempotent) webhook-only functionality.
For the FPGA Admission webhook, metrics are not correctly
set up so it's better to disable the functionality. Leader
election is kept but the flag name is renamed to align with
"kubebuilder v3 functionality" similar to how we changed it
to the operator as well.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Linux 6.0 adds sysfs-driver-qat entries to read device capabilities:
42e66b1cc3/Documentation/ABI/testing/sysfs-driver-qat
Implement the logic for reading from sysfs and prefer that over debugfs.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
/dev/sgx_* cannot be mapped to any topology. SGX itself is topology
aware but we cannot control it with TopologyInfo.
Currently, pkg/topology returns empty TopologyInfo{Nodes:[]*NUMANode{}}
for /dev/sgx_* but kubelet TopologyManager (when enabled and with the
policy other than 'none') interprets that as "Hint Provider has no
possible NUMA affinities for resource" and rejects the SGX resources.
What we want is "Hint Provider has no preference for NUMA affinity with
resource". This is communicated using nil TopologyInfo.
See: https://github.com/kubernetes/kubernetes/issues/112234
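
The difference between the two signals can be sketched as follows. The types
below are local stand-ins that only mirror the shape of the kubelet device
plugin API types, and topologyFor is a hypothetical helper, not the
pkg/topology implementation:

```go
package main

import (
	"fmt"
	"strings"
)

// NUMANode and TopologyInfo are local stand-ins mirroring the shape of the
// device plugin API types; they are redefined here only to keep the sketch
// self-contained.
type NUMANode struct{ ID int64 }

type TopologyInfo struct{ Nodes []*NUMANode }

// topologyFor is a hypothetical helper. It returns nil ("no preference for
// NUMA affinity") for SGX device nodes instead of an empty TopologyInfo,
// which TopologyManager reads as "no possible NUMA affinities".
func topologyFor(devPath string, numaNode int64) *TopologyInfo {
	if strings.HasPrefix(devPath, "/dev/sgx_") {
		return nil
	}
	return &TopologyInfo{Nodes: []*NUMANode{{ID: numaNode}}}
}

func main() {
	fmt.Println(topologyFor("/dev/sgx_enclave", 0) == nil) // SGX: no preference
	fmt.Println(len(topologyFor("/dev/dri/card0", 1).Nodes))
}
```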
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Removed unused import. This should fix this golangci-lint failure:
can't run linter goanalysis_metalinter:
buildir: failed to load package :
could not load export data:
no export data for "cloud.google.com/go/compute/metadata"
Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
Currently, each individual plugin README documents roughly the same
daily development steps to git clone, build, and deploy. Re-purpose
the plugin READMEs more towards cluster admin type of documentation
and start moving all development related documentation to DEVEL.md.
The same is true for e2e testing documentation, which is scattered
across places where it does not belong. It is good to have all
day-to-day development how-tos in one centralized place.
Finally, the cleanup includes some harmonization to plugins'
table of contents which now follows the pattern:
* [Introduction](#introduction)
(* [Modes and Configuration Options](#modes-and-configuration-options))
* [Installation](#installation)
(* [Prerequisites](#prerequisites))
* [Pre-built Images](#pre-built-images)
* [Verify Plugin Registration](#verify-plugin-registration)
* [Testing and Demos](#testing-and-demos)
* ...
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Config file is suitably indented so that it can be directly
appended to a suitable configMap header.
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
To facilitate GPU plugin scalability testing on a real cluster.
Pre-existing (fake) sysfs & devfs content needs to be removed first:
* Fake devfs directory is mounted from host so OCI runtime can "mount"
device files also to workloads requesting fake devices. This means
that those files can persist over fake GPU plugin life-time, and
earlier files need to be removed, as they may not match
* DaemonSet restarts failing init containers, so errors about content
created on previous generator run would prevent getting logs of the
real error on first generator run
* Before removal, check that removed directory content is as expected,
to avoid accidentally removing host sysfs/devfs content (in case
container was erroneously granted access to the real thing)
Container runtime requires fake device files to be real devices:
* Use NULL devices to represent fake GPU devices:
https://www.kernel.org/doc/Documentation/admin-guide/devices.txt
* Give more detailed logging for MkNod() failures as device
node creation is most likely operation to fail when container
does not have the necessary access rights
Created content is based on JSON config file (instead of e.g.
commandline options) so that (configMap providing) it can be updated
independently of the pod where generator is run.
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
There's no mapping available from IP block versions to actual product
features, which makes these version numbers fairly useless for end
users.
In mixed GPU clusters, running a job that adds/updates node labels for
the relevant GPU features to each relevant node would be much more
user-friendly. This could be done easily by converting given GPU API
capability tool (e.g. "vainfo" for VA-API, "clinfo" for OpenCL) output
to a NFD feature file.
(Such thing would be outside of this project scope though, except
maybe as an example / test-case.)
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
GPU plugin code assumes container paths to match host paths, and
container runtime prevents creating fake files under real paths. When
non-standard paths are used, devices can be faked for scalability
testing.
Note: If one wants to run both normal GPU plugin and faked one in same
cluster, all nodes providing fake "i915" resources should be labeled
differently from ones with real GPU plugin + devices, so that real GPU
workloads can be limited to correct nodes with a suitable
nodeSelector.
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
kubebuilder v3 based scaffolding has updated many things
and they are documented in [1].
Update operator's functionality to v3 level. We've done
most/some of the changes earlier (e.g., by not using
deprecated k8s APIs anymore) so the changes are minimal.
[1] https://book.kubebuilder.io/migration/v2vsv3.html
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
QAT_401xx is a derivative of 4xxx. Add support for that device
by including the device IDs (both PF and VF).
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Containers running on QAT Gen4 should be based on qatlib and therefore
kerneldrv is not the right mode. Skip registering 4xxx* devices to
ensure it is not used.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
grpc-go v1.43.0 deprecated grpc.WithInsecure() in favor of
insecure.NewCredentials(). Move to use the recommended approach
and drop the linter annotations.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Move reallocate logic to getpreferredallocation and simplify
allocate to use the kubelet's device ids.
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
Adds functionality to convert a container's tile annotation
into the corresponding L0 affinity mask. This helps to target
a container's workload to specific L0 subdevices.
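
A rough sketch of such a conversion. The annotation syntax
("card0:gt0+gt1,card1:gt0"), the function name tilesToAffinityMask and the
rule that device indices follow the order of cards in the annotation are all
assumptions made for this example. The output uses the "device.subdevice"
form that Level Zero accepts in ZE_AFFINITY_MASK:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// tilesToAffinityMask converts a tile annotation such as
// "card0:gt0+gt1,card1:gt0" into a mask such as "0.0,0.1,1.0".
// The annotation syntax is an assumption made for this sketch.
func tilesToAffinityMask(annotation string) (string, error) {
	var parts []string
	for devIdx, cardSpec := range strings.Split(annotation, ",") {
		fields := strings.SplitN(cardSpec, ":", 2)
		if len(fields) != 2 || fields[0] == "" || fields[1] == "" {
			return "", fmt.Errorf("malformed card spec %q", cardSpec)
		}
		for _, tile := range strings.Split(fields[1], "+") {
			if !strings.HasPrefix(tile, "gt") {
				return "", fmt.Errorf("malformed tile %q", tile)
			}
			n, err := strconv.Atoi(strings.TrimPrefix(tile, "gt"))
			if err != nil || n < 0 {
				return "", fmt.Errorf("malformed tile %q", tile)
			}
			// Device index is the card's position in the annotation.
			parts = append(parts, fmt.Sprintf("%d.%d", devIdx, n))
		}
	}
	return strings.Join(parts, ","), nil
}

func main() {
	mask, err := tilesToAffinityMask("card0:gt0+gt1,card1:gt0")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("ZE_AFFINITY_MASK=" + mask)
}
```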
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
Start using the newly created NodeFeatureRule configs with SGX.
This allows dropping the custom worker config.
Additionally, split the example NFD deployment into two steps
1) plain NFD (+SGX json patches)
2) NodeFeatureRule creation
NodeFeatureRule creation is not guaranteed to succeed when it's
part of the same kustomization with the CRD creation. Users may
also have NFD already running so allowing 2) alone works better
in that scenario.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
The webhooks' default deployments depend on cert-manager. Our existing
documentation points to a specific cert-manager version giving users
the impression that it should be used. However, that is not the case.
Update the documentation so that we just point to cert-manager
installation page. With this, we don't have to hard-code to any
specific version.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
This adds a new label "gpu-numbers" for short numbered lists of
gpus, omitting "card" from the names. Also adds splitting of long
label values.
Similarly this adds a new label "pci-groups" for PCI groups. Grouping
can be controlled by env var GPU_PCI_GROUPING_LEVEL. The env var
dictates how many pci-folder names need to match in order for GPUs
to be considered to belong in a group.
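
The numbered list and the value splitting could look roughly like this. The
helper names and the "." separator are assumptions for illustration; the
63-character bound is the Kubernetes limit for label values:

```go
package main

import (
	"fmt"
	"strings"
)

// maxLabelValueLen is the Kubernetes limit for a label value.
const maxLabelValueLen = 63

// gpuNumbers turns names such as "card0" into a compact list such as "0.1".
// The "." separator is an assumption made for this sketch.
func gpuNumbers(cards []string) string {
	nums := make([]string, 0, len(cards))
	for _, c := range cards {
		nums = append(nums, strings.TrimPrefix(c, "card"))
	}
	return strings.Join(nums, ".")
}

// splitLabelValue cuts value into consecutive chunks of at most limit bytes.
func splitLabelValue(value string, limit int) []string {
	if limit <= 0 {
		return nil
	}
	var chunks []string
	for len(value) > limit {
		chunks = append(chunks, value[:limit])
		value = value[limit:]
	}
	if value != "" {
		chunks = append(chunks, value)
	}
	return chunks
}

func main() {
	value := gpuNumbers([]string{"card0", "card1", "card2"})
	for i, chunk := range splitLabelValue(value, maxLabelValueLen) {
		fmt.Printf("chunk %d: %s\n", i, chunk)
	}
}
```

Note that a label value must also start and end with an alphanumeric
character, so a real implementation has to handle chunks that would begin or
end with a separator; this sketch does not.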
Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
If the kernel has CONFIG_VFIO_NOIOMMU enabled and the node admin
has explicitly set enable_unsafe_noiommu_mode VFIO parameter,
VFIO taints the kernel and writes "vfio-noiommu" to the IOMMU
group name. If these conditions are true, the /dev/vfio/ devices
are prefixed with "noiommu-".
This use-case is documented for DPDK so we don't want to break
it (as it was before because we added DeviceMounts to
/dev/vfio/<iommugroup> files that did not exist).
See DPDK documentation for further information and warnings.
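
As an illustration of the naming rule above, a hypothetical helper could pick
the device node from the IOMMU group number and the group name. The function
name and the way the group name is passed in are assumptions of this sketch:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// vfioDevicePath returns the /dev/vfio node for an IOMMU group. groupName is
// the name VFIO wrote for the group; it may be empty when no name is set.
func vfioDevicePath(group, groupName string) string {
	if strings.TrimSpace(groupName) == "vfio-noiommu" {
		// Unsafe no-IOMMU mode: the node is prefixed with "noiommu-".
		return filepath.Join("/dev/vfio", "noiommu-"+group)
	}
	return filepath.Join("/dev/vfio", group)
}

func main() {
	fmt.Println(vfioDevicePath("42", ""))
	fmt.Println(vfioDevicePath("0", "vfio-noiommu\n"))
}
```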
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
While the labeling limit is obvious after a little thought, IMHO
limitations like this should either be stated up front, or be in
their own section in the README. This commit does the former for the GPU
plugin fractional resources, and the latter for the NFD hook / labeling.
Remove the sentence for pre-built image since Dockerhub image for dlb
plugin is available.
Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
There are a few things left un-renamed after #771.
Rename those to idxd-config-initcontainer.
Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
The new_id based driver binding is failing on kernels 5.11+ when the
QAT VF is not bound to any driver: attempts to write to new_id with
the same device ID repeatedly error with "file exists".
Move the new_id initialization to the beginning of the startup and
write the enabled device IDs only once.
This commit also fixes an issue where VF devices were not correctly detected
in virtual machines where the VF was not bound to any driver.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
After a closer review, it was noticed that some of the QAT dpdkdrv
unit tests need updating:
- "Broken igb_uio DPDKdriver..." is actually testing unknown device ID
and we already have tests for it -> drop.
- "igb_uio DPDKdriver with one kernel bound device (not QAT device)" is
testing something impossible: an unknown VF devID is originated from a
QAT PF -> drop.
- creating files for unbind/new_id etc. is unnecessary because
os.WriteFile() creates them during the tests -> drop these lines to
simplify unit tests maintenance.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.42.0 to 1.43.0.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](https://github.com/grpc/grpc-go/compare/v1.42.0...v1.43.0)
---
updated-dependencies:
- dependency-name: google.golang.org/grpc
dependency-type: direct:production
update-type: version-update:semver-minor
...
---
In addition to the changes made by dependabot, add nolint comments to ignore staticcheck (SA1019) errors.
The recommended alternative, insecure.NewCredentials(), is still declared experimental,
so keep grpc.WithInsecure() with a nolint comment.
Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
GPU generation "gen" number is replaced in the capability files of
latest kernels with separate display, graphics, and media versions.
For compatibility with newer kernels, provide "gen" based on the new
labels (but without decimals), and for older kernel compatibility, new
labels based on the "gen".
Because different kernels match different items from the action map,
whole capability file will get parsed. Capability file parsing is
optimized by using prefix check instead of scanf.
"platform_gen" label is deprecated, and can be dropped whenever it
becomes inconvenient (lint complains about line count etc).
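
The prefix-based parsing and the two-way "gen" compatibility mapping can be
sketched like this. The capability line formats ("gen: 9",
"graphics version: 12.10") and the key names are assumptions for
illustration, not the actual action map:

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseCapabilities scans the whole capability text and collects the known
// fields using plain prefix checks.
func parseCapabilities(text string) map[string]string {
	// Assumed line prefixes mapped to the keys used in this sketch.
	prefixes := map[string]string{
		"gen: ":              "gen",
		"graphics version: ": "graphics_version",
		"media version: ":    "media_version",
		"display version: ":  "display_version",
	}
	found := map[string]string{}
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		for prefix, key := range prefixes {
			if strings.HasPrefix(line, prefix) {
				found[key] = strings.TrimSpace(strings.TrimPrefix(line, prefix))
			}
		}
	}
	if g, ok := found["graphics_version"]; ok {
		// Newer kernel: derive "gen" from the graphics version, no decimals.
		if _, has := found["gen"]; !has {
			found["gen"] = strings.SplitN(g, ".", 2)[0]
		}
	} else if g, ok := found["gen"]; ok {
		// Older kernel: derive the new values from "gen".
		for _, key := range []string{"graphics_version", "media_version", "display_version"} {
			if _, has := found[key]; !has {
				found[key] = g
			}
		}
	}
	return found
}

func main() {
	caps := parseCapabilities("graphics version: 12.10\nmedia version: 12\n")
	fmt.Println("gen:", caps["gen"])
}
```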
The cmdline flags talked about the old device nodes. With the
upstream driver, the devices nodes are /dev/sgx_[enclave|provision].
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
To make QAT plugin deployment consistent with the other plugins
we update the default flags and deploy without the flag settings
provided by the ConfigMap.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
1. Implement PreferredAllocator interface.
2. Provide 3 preferred allocation policies: balancedPolicy, packedPolicy and nonePolicy.
3. Provide the cmdline interface: -allocation-policy balanced/packed/none, to select which preferred allocation policy to use.
4. Add operator support.
Co-authored-by: Mikko Ylinen <mikko.ylinen@intel.com>
- used the same go version as for the project build
- used verbose output
- fixed gofmt check failures
Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
Commit 00a59e8f7d was not complete in that it didn't update
the corresponding documentation. This commit fixes that.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
The devices enabled by default are different between the
kustomize and operator based deployments.
This change harmonizes the defaults to c6xxvf and 4xxxvf
in both deployment options.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
This changes the memory reading to be done through lmem_total_bytes
file instead of the addr_range file.
Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
Add govet-fieldalignment to .golangci.yml
Fix errors that come from adding govet-fieldalignment
- by reordering the fields of structs
- by putting nolint:govet annotations
Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
Update tool versions
Fix the errors and warnings originating from the update:
-Correct type deviceInfo (->DeviceInfo) to make it public
-Fix gpu_plugin.go and vpu_plugin_test.go where stylecheck errors occur
-Fix deprecation warnings
-Rename type 'PatcherManager' to 'Manager' to solve exported errors
-Rename type 'SgxMutator' to 'Mutator' to solve exported errors
Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
- Information on specific HW & virtualization types on which GPU plugin
is tested belongs in release notes, not in the README intro
(where it has already become obsolete)
- HW offloading is provided by driver backends, not frontends
(e.g. OneVPL is just one of the media driver frontends)
This adds a section heading, TOC link, command line flag description
and a short explanation of what other dependent configuration
changes are needed with fractional resources in order for the command
line flag to achieve something useful.
Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
This adds a link from gpu-plugin README to the nfdhook README, and
updates the nfdhook README with label descriptions.
Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
<device>/driver symlink does not exist if the device is not bound
to any driver. bindDevice() failed when writing to <device>/driver/unbind
errored but IsNotExist() error is acceptable in case there's no driver
to unbind.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Go 1.16 release notes announced the deprecation of io/ioutil [1]. It's easy
for us to move to what is recommended so just do it.
[1] https://golang.org/doc/go1.16#ioutil
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
All but one (VPU) of the published container images can be built with
static binaries which allows us to use distroless/static as the
base image. Moreover, when combined with stripping the plugin binaries,
we can get both build time and image size savings.
This is the part 1 (out of 2) of the rework. Part 2 will finish the
change by making some adjustments to VPU plugin image and moving the
FPGA/SGX/GPU initcontainers to distroless/static too.
Partial: #516
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
Tests plugin scan results in setups having none, one and multiple
eligible GPU devices, with and without SRIOV enabled, with two
different options values.
This does not cover verifying number of devices added under
"i915_monitoring" resource as that would be much larger change.
To help in:
* adding more CLI options in next and later commits, and
* to replace magic newDevicePlugin() input parameters with
explicitly named one(s)
NOTE: this has impact only for GPUs which are virtualized with SR-IOV.
Access to physical devices (PFs) is disabled for "i915" resource when
they have configured virtual devices (VFs).
This is because:
* GPU resources are expected to be evenly split between VFs in such
configurations
* But PF resource amount is expected to differ from VFs and typically
retain only enough resources (just few MB of RAM), to be able to
provide GPU metrics that are not available from VFs
* Neither the current GPU plugin, nor Kubernetes scheduling in
general, has proper support for heterogeneous GPUs (= capability
based scheduling)
Therefore "i915" resource needs to be limited to GPU devices with
homogeneous amount of resources, which in SR-IOV configurations is
expected to be the case only with VFs (when such are present).
The SGX DCAP out-of-tree v1.41 driver is also known to work
with the SGX plugin. However, the default NFD labeling does not
work with the out-of-tree driver so warn users about it.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Which mounts all (Intel) GPU devices to requesting container.
This is needed e.g. to get GPU metrics from the node. Requesting pod
does not know how many GPUs are on the node it gets assigned to, so
there needs to be a means to request them all.
(Only alternative for the new resource would be requesting Privileged
mode, which is clearly worse as that would grant pod access also to
all other devices and capabilities.)
This commit also:
* Adds "i915_monitoring" resource testing to: go test -v -run Scan
* Splits GPU plugin tests mock file system setup to a separate
createTestFiles() function because otherwise TestScan() does not
pass project's golangci-lint complexity limits
Add --device command line to operator's main.go which defines
the controllers/webhooks to set up.
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
As the operator container image is available from a registry, we should
guide users to use it rather than build and deploy it locally.
Further, drop (un)deploy-operator targets in favor of simply using
kubectl for deployment.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Replaced multiple instances of master with main.
Reworded line 15 "Verify QAT device plugin is registered" removed 'on master'
and corresponding section heading. Related to pr499.
Signed-off-by: DougTW <doug.martin@intel.com>
The device plugins daemonsets are cluster wide and currently only
one device plugin instance per device is possible so making the
corresponding deviceplugin/v1 CRDs non-namespaced (i.e., scope: cluster)
fits better.
Previously, the device plugin daemonset was deployed in the same
namespace as the CR for that device but with the cluster scoped CRDs
we default to use the same namespace as the operator, unless overridden
via DEVICEPLUGIN_NAMESPACE env variable or a command line parameter
to operator manager deployment.
Three additional changes in this commit:
- enable DSA envtest tests
- update controller-runtime to v0.8.1
- change device plugin envtest suite to use klog/v2
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Decouple the default enclaveLimit/provisionLimit from core count. With
this change, the default limit is constant and it can be made relative
to core count by setting PODS_PER_CORE multiplier via env variable.
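
The resulting behaviour can be sketched as below. The constant defaultLimit
is a placeholder value for this example, not the plugin's real default, and
resourceLimit is a hypothetical helper:

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"strconv"
)

// defaultLimit is a placeholder for this sketch, not the plugin's default.
const defaultLimit = 20

// resourceLimit returns the constant default unless podsPerCore holds a
// positive integer, in which case the limit scales with the core count.
func resourceLimit(podsPerCore string, cores int) int {
	n, err := strconv.Atoi(podsPerCore)
	if err != nil || n <= 0 {
		return defaultLimit
	}
	return n * cores
}

func main() {
	limit := resourceLimit(os.Getenv("PODS_PER_CORE"), runtime.NumCPU())
	fmt.Println("enclave/provision limit:", limit)
}
```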
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Removed device plugin socket check from the documentation as
device plugin support is enabled by default in Kubelet.
Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
- Implemented demo image that runs accel-config tests
- Added testing instructions to the documentation
Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
It looks like for a long time now we have accepted a setup where a valid QAT
device ID is accepted as a QAT device resource even though the device is
not "enabled" via kernelVfDrivers parameter.
Fix device ID validation to skip valid QAT devices that are not
explicitly specified in kernelVfDrivers.
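
The stricter check amounts to a two-step test: the device ID must be a known
QAT VF, and its driver must be listed in kernelVfDrivers. A sketch with an
illustrative subset of device IDs and a hypothetical isEnabledDevice helper:

```go
package main

import "fmt"

// vfDriverByDeviceID maps QAT VF PCI device IDs to kernel VF driver names.
// Only an illustrative subset is listed in this sketch.
var vfDriverByDeviceID = map[string]string{
	"0x0443": "dh895xccvf",
	"0x37c9": "c6xxvf",
	"0x4941": "4xxxvf",
}

// isEnabledDevice reports whether deviceID belongs to a known QAT VF whose
// driver was explicitly enabled via the kernelVfDrivers list.
func isEnabledDevice(deviceID string, kernelVfDrivers []string) bool {
	driver, known := vfDriverByDeviceID[deviceID]
	if !known {
		return false
	}
	for _, enabled := range kernelVfDrivers {
		if enabled == driver {
			return true
		}
	}
	return false
}

func main() {
	enabled := []string{"c6xxvf", "4xxxvf"}
	fmt.Println(isEnabledDevice("0x37c9", enabled)) // valid and enabled
	fmt.Println(isEnabledDevice("0x0443", enabled)) // valid but not enabled
}
```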
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
The updated dp.scan() changes how VF devices are detected. The
main reason for the change is to take into account cases where the QAT VF
driver is not present in the system at all but only the PF driver is
loaded (and the SR-IOV devices are enabled).
The rework also takes into account bare metal and VM deployments and
adds a test case for checking the virtualized environment.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
The plugin now detects/accepts 4xxx and c4xxx devices too
and defaults to those drivers that are part of Linux mainline.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
We have both "path" and "path/filepath" but the latter provides
everything needed so move it completely.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
The code was stripping out "0000:" (bus) and then adding
it back in several places.
That's not necessary so this change simplifies QAT VF addr
handling by operating using full BDF IDs.
Moreover, simplify function calls: use getDpdkDevice() once
for each VF device.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
The SGX device nodes have changed from /dev/sgx/[enclave|provision]
to /dev/sgx_[enclave|provision] in v4x RFC patches according to the
LKML feedback.
This change moves to use the new device nodes. Backwards compatibility
is provided by adding /dev/sgx directory mount to containers. This
assumes the cluster admin has installed the udev rules provided in the
README to make the old device nodes as symlinks to the new device nodes.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
This call is implemented by calling ioctl, which raises
"open /dev/intel-fpga-port.X: operation not permitted" error
when called inside unprivileged container.
This breaks FPGA plugin.
Calling this API from fpga_tool is still OK, so
moving calls there should fix the issue.
Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
This commit documents the SGX building blocks for Kubernetes and
how to deploy them in the cluster.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Reimplemented discovering of the FPGA devices using
APIs from pkg/fpga/intel_fpga_linux. The APIs are also
used in the fpga_tool utility.
The API is more advanced and supports SR-IOV among other
things.
Fixes: #372
Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
This adds reading of the GPU memory amount from the sysfs. As a
fallback the environment variable GPU_MEMORY_OVERRIDE remains.
Another environment variable GPU_MEMORY_RESERVED can be used to
reserve a dedicated byte amount outside of kubernetes usage.
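
One way to read the rules above: take the sysfs value when it is available,
otherwise the override, then subtract the reserved amount. The helper names
and the use of 0 for "not available" are assumptions of this sketch:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// parseBytes parses a decimal byte count; empty or invalid input yields 0.
func parseBytes(s string) uint64 {
	n, err := strconv.ParseUint(s, 10, 64)
	if err != nil {
		return 0
	}
	return n
}

// usableMemory prefers the sysfs value, falls back to the override when the
// sysfs value is 0 (not available), and subtracts the reserved amount
// without going below zero.
func usableMemory(sysfsBytes, overrideBytes, reservedBytes uint64) uint64 {
	total := sysfsBytes
	if total == 0 {
		total = overrideBytes
	}
	if reservedBytes >= total {
		return 0
	}
	return total - reservedBytes
}

func main() {
	var fromSysfs uint64 // 0 stands for "sysfs value not available" here
	mem := usableMemory(
		fromSysfs,
		parseBytes(os.Getenv("GPU_MEMORY_OVERRIDE")),
		parseBytes(os.Getenv("GPU_MEMORY_RESERVED")),
	)
	fmt.Println("usable GPU memory in bytes:", mem)
}
```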
Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
With the addition of SGX webhook in the operator, full SGX stack
depends on having the operator deployed first. SgxDevicePlugin CRD
is set to get intel-sgx-plugin and intel-sgx-initcontainer deployed
by the operator.
As a pre-requisite, node-feature-discovery must be deployed but it
is currently deployed via sgx_plugin kustomization overlay only.
It's better to allow NFD with the SGX specific settings to be deployed with
a kustomization of its own.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>