This adds the initImage field to the custom resource definition
and starts using it.
The fpga webhook image validation function is split off into a
separate file.
Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
SGX aesmd (architectural enclave service daemon) can be used for SGX
DCAP Quote Generation. This commit adds a sample deployment that by
default talks to an Intel reference PCCS (Provisioning Certificate
Caching Service).
The default config provided is for a "single node" cluster where the
PCCS service runs on localhost.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
With the addition of the SGX webhook in the operator, the full SGX stack
depends on having the operator deployed first. The SgxDevicePlugin CRD
is set to get intel-sgx-plugin and intel-sgx-initcontainer deployed
by the operator.
As a prerequisite, node-feature-discovery must be deployed, but it
is currently deployed via the sgx_plugin kustomization overlay only.
It's better to allow NFD with the SGX-specific settings to be deployed
with a kustomization of its own.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
We currently build using trivialVersions=true and don't deal with
multiversion APIs and their conversion webhooks.
Therefore, drop the registration of the conversion webhooks.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
This adds an nfd-hook for the gpu-plugin, which will create labels
for the GPUs. The labels can then be used for pod deployment purposes
or for creating GPU extended resources, which in turn allow
finer-grained GPU resource management.
The nfd-hook will install to the host system when the
intel-gpu-initcontainer is run. It is added into the plugin deployment
yaml.
Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
This commit adds two initcontainers in a kustomize overlay to QAT
deployment. The overlay can be used to prepare QAT setup on a freshly
booted system.
Note: containerd/cri-o seem to have issues mounting sysfs rw even
if the container is privileged. Therefore, we do a special /sys:/sys
bind mount for 'cat sriov_totalvfs | tee sriov_numvfs' to work.
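For illustration, the `cat ... | tee ...` step above can be sketched in Go as follows. This is a minimal sketch, not the initcontainer's actual script: the function names `parseTotalVFs` and `enableAllVFs` are hypothetical, and the device directory is assumed to be a PCI device directory under a read-write /sys mount.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// parseTotalVFs parses the content of a sriov_totalvfs file
// (a decimal number followed by a newline).
func parseTotalVFs(content string) (int, error) {
	n, err := strconv.Atoi(strings.TrimSpace(content))
	if err != nil {
		return 0, fmt.Errorf("unexpected sriov_totalvfs content %q: %w", content, err)
	}
	if n < 0 {
		return 0, fmt.Errorf("negative VF count %d", n)
	}
	return n, nil
}

// enableAllVFs reads sriov_totalvfs in deviceDir and writes the same
// number to sriov_numvfs. The write only succeeds if sysfs is mounted rw.
func enableAllVFs(deviceDir string) error {
	raw, err := os.ReadFile(filepath.Join(deviceDir, "sriov_totalvfs"))
	if err != nil {
		return err
	}
	total, err := parseTotalVFs(string(raw))
	if err != nil {
		return err
	}
	numvfs := filepath.Join(deviceDir, "sriov_numvfs")
	return os.WriteFile(numvfs, []byte(strconv.Itoa(total)), 0o644)
}

func main() {
	if len(os.Args) != 2 {
		// Hypothetical usage: pass a PCI device directory from sysfs.
		fmt.Println("usage: enablevfs <pci-device-sysfs-dir>")
		return
	}
	if err := enableAllVFs(os.Args[1]); err != nil {
		fmt.Fprintln(os.Stderr, "enabling VFs failed:", err)
		os.Exit(1)
	}
}
```

With a read-only sysfs, the write to sriov_numvfs fails, which is why the extra bind mount is needed.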
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
This commit also changes validatePluginImage() to accept the
image version as a parameter so that it can be used by
other webhooks too.
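As a rough illustration of an image check that takes the expected version as a parameter, consider the sketch below. It is not the operator's validatePluginImage(): the function `checkImage`, its signature, and the version numbers are assumptions made for this example only.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// checkImage is a hypothetical helper. It verifies that image looks like
// "[registry/][org/]<expectedName>:<major>.<minor>[.<patch>]" and that the
// tag is at least minMajor.minMinor.
func checkImage(image, expectedName string, minMajor, minMinor int) error {
	slash := strings.LastIndex(image, "/")
	colon := strings.LastIndex(image, ":")
	// A colon before the last slash belongs to a registry port, not a tag.
	if colon <= slash {
		return fmt.Errorf("image %q has no tag", image)
	}
	name, tag := image[slash+1:colon], image[colon+1:]
	if name != expectedName {
		return fmt.Errorf("image name %q, expected %q", name, expectedName)
	}
	parts := strings.SplitN(tag, ".", 3)
	if len(parts) < 2 {
		return fmt.Errorf("tag %q is not a version", tag)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return fmt.Errorf("tag %q is not a version", tag)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return fmt.Errorf("tag %q is not a version", tag)
	}
	if major < minMajor || (major == minMajor && minor < minMinor) {
		return fmt.Errorf("version %s is older than %d.%d", tag, minMajor, minMinor)
	}
	return nil
}

func main() {
	// The image tags and the minimum version 0.19 are made up.
	fmt.Println(checkImage("intel/intel-sgx-plugin:0.19.0", "intel-sgx-plugin", 0, 19))
	fmt.Println(checkImage("intel/intel-sgx-plugin:0.18.1", "intel-sgx-plugin", 0, 19))
}
```

Passing the name and version in, rather than hardcoding them, is what lets several webhooks share one helper.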
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Some Ubuntu systems may run with AppArmor LSM policy enforcement, making
the default QAT daemonset fail with (un)bind errors.
This commit adds a sample kustomize overlay to deploy the QAT daemonset with
the AppArmor unconfined policy.
Fixes: #381
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
fpga: make AFU resource name 63 char long
webhook: drop mode from README
webhook: extend mappings description
webhook: tighten CRD definitions
webhook: drop mapping to non-existing afuId
explicitly state mappings names can be in any format
use consistent terminology across fpga webhook and plugin
Webhook uses region CRDs even if run in preprogrammed mode.
Adding them to the base configuration should fix this deployment error:
Failed to list *v1.FpgaRegion: the server could not find the requested resource
Fixes: #361
The same fix as previous:
The `-v 1` arg is treated as a single word, thus klog throws a
"flag provided but not defined: -v 1" error.
This time it's in the webhook kustomize base.
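The failure mode can be reproduced with Go's standard `flag` package, which klog registers its flags with. The sketch below uses a plain integer `-v` flag as a stand-in for klog's own; `parseVerbosity` is a hypothetical helper.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// parseVerbosity parses args with a stand-in "-v" flag and returns its value.
func parseVerbosity(args []string) (int, error) {
	fs := flag.NewFlagSet("plugin", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the output
	v := fs.Int("v", 0, "number for the log level verbosity")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	return *v, nil
}

func main() {
	// One array element, as in a YAML args list entry "-v 1".
	_, err := parseVerbosity([]string{"-v 1"})
	fmt.Println(err) // flag provided but not defined: -v 1

	// Two elements, or the "-v=1" form, parse as intended.
	v, _ := parseVerbosity([]string{"-v", "1"})
	fmt.Println(v) // 1
	v, _ = parseVerbosity([]string{"-v=1"})
	fmt.Println(v) // 1
}
```

In a container spec, this means the flag and its value must be separate list items under args, or be written as a single `-v=1` item.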
Move all the fpga components to using klog for logging
and debug. This includes replacing our homebrew 'fatal()'
with klog.Error().
Modify the deployment files to move from `-debug` to
`-v`, and set their default level to '1' (Info), rather
than full debug mode ('4').
Signed-off-by: Graham Whaley <graham.whaley@intel.com>
Move the framework, and the qat driver, to use `klog`
for logging and debug.
This has some noticeable effects:
1) Our default log output gains a bunch of annotation:
From:
QAT device plugin started in 'dpdk' mode
To:
I0312 11:51:02.057728 6053 qat_plugin.go:64] QAT device plugin started in 'dpdk' mode
(there is now a command line option to drop those annotations if
necessary).
2) We gain a bunch of command line parameters from klog for controlling log
levels and output. We go from 5 arguments to 17:
---
Usage of ./cmd/qat_plugin/qat_plugin:
-add_dir_header
If true, adds the file directory to the header
-alsologtostderr
log to standard error as well as files
-debug
enable debug output
-dpdk-driver string
DPDK Device driver for configuring the QAT device (default "vfio-pci")
-kernel-vf-drivers string
Comma separated VF Device Driver of the QuickAssist Devices in the system. Devices supported: DH895xCC,C62x,C3xxx and D15xx (default "dh895xccvf,c6xxvf,c3xxxvf,d15xxvf")
-log_backtrace_at value
when logging hits line file:N, emit a stack trace
-log_dir string
If non-empty, write log files in this directory
-log_file string
If non-empty, use this log file
-log_file_max_size uint
Defines the maximum size a log file can grow to. Unit is megabytes. If the value is 0, the maximum file size is unlimited. (default 1800)
-logtostderr
log to standard error instead of files (default true)
-max-num-devices int
maximum number of QAT devices to be provided to the QuickAssist device plugin (default 32)
-mode string
plugin mode which can be either dpdk (default) or kernel (default "dpdk")
-skip_headers
If true, avoid header prefixes in the log messages
-skip_log_headers
If true, avoid headers when opening log files
-stderrthreshold value
logs at or above this threshold go to stderr (default 2)
-v value
number for the log level verbosity
-vmodule value
comma-separated list of pattern=N settings for file-filtered logging
---
3) Our `-debug` flag is now replaced by the `klog` `-v n` flag.
*NOTE:* This is potentially a minor breaking change. Applying
this debug overlay to any previous (pre-klog edit) images will
cause the container to fail to launch, as it will not recognise
the new `-v` arguments.
We also update the kustomize deployment to move from using
DEBUG env vars to adding a VERBOSITY var that controls both
the log verbosity and now the debug mode enabling.
Signed-off-by: Graham Whaley <graham.whaley@intel.com>
Previously, the /dev/ion device was just an arbitrary string and the
plugin did not need the device for anything. After adding the checks for
topology hints, the device node must be bind mounted in the plugin
container.
Signed-off-by: Alek Du <alek.du@intel.com>
Previously, /dev/vfio/xx devices were just arbitrary strings and the
plugin did not need the devices for anything. After adding the checks
for topology hints, we need to read the devices attached to those so
the device nodes must be bind mounted in the plugin container.
Moreover, be more verbose about any errors coming from the topology code.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
We had securityContext specified twice and the latter was overwriting
readOnlyRootFilesystem=true.
With this commit, the container is properly mounted readonly. However,
we need a tmpfs for DPDK runtime data so an emptyDir volume is added
(NB: see kubernetes/issues/48912 for discussion on emptyDir mount options).
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
The runtime uses /dev/termination-log to write container termination
messages. If this file doesn't exist on the host, the runtime tries
to create it. As /dev is read-only for the intel-fpga-plugin container,
the attempt to create /dev/termination-log fails with this error:
Warning Failed kubelet, device-plugins-kubernetes-clearlinux-14-4.novalocal Error:
container create failed: container_linux.go:345: starting container process caused
"process_linux.go:430: container init caused \"rootfs_linux.go:58:
mounting \\\"/var/lib/kubelet/pods/d7262db5-e3fc-4b7b-bc2e-da245f600c4b/containers/intel-fpga-plugin/cddd0f76\\\"
to rootfs \\\"/var/lib/containers/storage/overlay/edd75bb94b1b4cf93ae1ea5c064945169fb329d0abdb56b7621cddfc721f6eda/merged\\\"
at \\\"/var/lib/containers/storage/overlay/edd75bb94b1b4cf93ae1ea5c064945169fb329d0abdb56b7621cddfc721f6eda/merged/dev/termination-log\\\"
caused \\\"open /var/lib/containers/storage/overlay/edd75bb94b1b4cf93ae1ea5c064945169fb329d0abdb56b7621cddfc721f6eda/merged/dev/termination-log: read-only file system\\\"\""
Setting terminationMessagePath to /tmp/termination-log, which is on a
rw-mounted file system, for the plugin container should fix this.
Fixes: #259
Since the Kubernetes v1.16 release, DaemonSet, Deployment, StatefulSet, and ReplicaSet in the extensions/v1beta1 and apps/v1beta2 API groups have been deprecated. This PR migrates the webhook deployment to use apps/v1 instead of extensions/v1beta1 and adds the selector part that the migration also requires.
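In apps/v1, spec.selector is mandatory and has to match the labels of the pod template. A minimal sketch of that matchLabels rule follows; the helper `selectorMatches` and the label values are hypothetical, and the sketch treats an empty selector as invalid.

```go
package main

import "fmt"

// selectorMatches reports whether every matchLabels entry is present,
// with the same value, in the pod template labels.
func selectorMatches(matchLabels, podLabels map[string]string) bool {
	if len(matchLabels) == 0 {
		return false // assumption: an empty selector is rejected
	}
	for key, want := range matchLabels {
		if got, ok := podLabels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	// Made-up labels for illustration only.
	template := map[string]string{"app": "fpga-webhook", "tier": "admission"}

	fmt.Println(selectorMatches(map[string]string{"app": "fpga-webhook"}, template)) // true
	fmt.Println(selectorMatches(map[string]string{"app": "other"}, template))        // false
	fmt.Println(selectorMatches(nil, template))                                      // false
}
```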
Signed-off-by: Hector Augusto Garcia Baleon <hector.augusto.garcia.baleon@intel.com>
The default deployment gives rather wide host mounts.
Limited sysfs mount only to the subdirectory the plugin
needs.
Mounted sysfs and dev mounts read-only.
Added notes that FPGA plugin can be run as non-root user.
The default deployment gives rather wide host mounts. We can limit
the mounts only to the subdirectories the plugin needs and mount
them read-only.
Also, add notes that both QAT and GPU plugins can be run as non-root
user.
Fixes: #228
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
In preparation for getting some of the images to hub.docker.com/intel,
start using the intel/ prefix.
Moreover, set the Makefile variables so that the images built
by make [images|demos] can easily be pushed to any registry/org
by 'docker push' (e.g., by Jenkins).
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
- used ENTRYPOINT instead of CMD in plugin and admission webhook
Dockerfiles to avoid duplicating commands in the pod yamls
- fixed path to deploy.sh script in fpga initcontainer Dockerfile
- Ordered collection in DCP release/region/afus order for simpler
maintenance.
- Got rid of ambiguous entries without dcp releases, e.g. Arria10,
Arria10-nlb3 etc.
- For AOCX files, afuId should be set to unique UUID
(can be seen via fpga_tool)
- arria10 now points to DCP 1.2 release
- added mappings for Stratix10 based D5005 PAC card
Clear Linux enables DPDK QAT PMD so we can move to use everything from
there. This saves maintenance efforts and we get more up-to-date DPDK.
The DPDK version in this update gives a tool for compress perf too, for
instance.
The commit also adds kustomize scripts that overlay the original DPDK
demo deployment to run dpdk-test-[compress|crypto]-perf test cases:
$ kubectl apply -k deployments/qat_dpdk_app/test-compress1/
$ kubectl apply -k deployments/qat_dpdk_app/test-crypto1/
New test cases ('ptest's with varying parameters) can be easily added
by following the pattern in test-[crypto|compress]1 directories.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
We plan to use crypto-perf for simple QAT testing. This commit adds
kustomization to make the deployment easier. The original .yaml is
also moved to deployments/ with some changes.
For instance, it turns out that vfio-pci mode with DPDK also needs
CAP_SYS_ADMIN (see PR #187, which states that only igb_uio would need it).
kustomize has been available as part of kubectl since Kubernetes v1.14.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
The current mappings break the admission webhook deployment with
errors like this:
Invalid value: "arria10_dcp1.0": a DNS-1123 subdomain must consist of
lower case alphanumeric characters, '-' or '.', and must start and end
with an alphanumeric character (e.g. 'example.com', regex used for
validation is
'[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'
The new mappings conform to the DNS-1123 regexp. They have been tested
with the compression demo and are known to work.
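The naming rule can be checked ahead of deployment with the regular expression quoted in the error message. This is a minimal sketch: `isDNS1123Subdomain` is a hypothetical helper, the 253-character limit is the usual Kubernetes subdomain length limit, and "arria10.dcp1.0" is only an example of a conforming name, not necessarily one of the new mappings.

```go
package main

import (
	"fmt"
	"regexp"
)

// dns1123Subdomain is the regex from the error message, anchored so that
// the whole name has to match.
var dns1123Subdomain = regexp.MustCompile(
	`^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`)

// isDNS1123Subdomain reports whether name is usable as a resource name.
func isDNS1123Subdomain(name string) bool {
	return len(name) <= 253 && dns1123Subdomain.MatchString(name)
}

func main() {
	fmt.Println(isDNS1123Subdomain("arria10_dcp1.0")) // false: '_' is not allowed
	fmt.Println(isDNS1123Subdomain("arria10.dcp1.0")) // true
	fmt.Println(isDNS1123Subdomain("Arria10"))        // false: upper case
}
```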
For easier deployments, fetch plugin command line arguments from ConfigMap.
When using ConfigMaps, qat_plugin.yaml needs no changes and can always
be used as is.
qat_plugin_default_configmap.yaml uses built-in defaults.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Added DCP 1.0 Arria10 region and compress AFU ids
to the mapping collection to be able to work with
DCP 1.0 bitstreams.
This is also an enabler for the FPGA demo that uses compress.aocx,
which cannot be compiled by the aoc compiler from DCP 1.1.
Added alternative builder for project images: buildah
https://github.com/containers/buildah
Considering that some of our plugins use CRI-O runtime it could be
a good idea to get rid of docker as a builder. It should allow us
not to run docker daemon at all, even for build purposes.
Kubernetes also goes this way encouraging users to switch to CRI
runtimes (CRI-O and containerd), so having non-docker builds supported
looks good from this perspective too.
Currently we have a hardcoded mapping from human-readable names of
AFs and FPGA regions like arria10-nlb0 to the resource names
produced by the FPGA device plugin. This is not a sustainable
long-term solution.
Implement CRD based mappings so that a new mapping can be added or
removed dynamically by cluster admins with CRD resources.
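Conceptually, the webhook then resolves names from a table that changes as CRD objects are created and deleted. The sketch below shows only that idea, assuming an in-memory store that add and delete event handlers would update. The type `mappingStore`, its methods, and the resource name are hypothetical and do not reflect the actual implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// mappingStore maps human-readable names to device plugin resource names.
// It is safe for concurrent use by event handlers and admission requests.
type mappingStore struct {
	mu     sync.RWMutex
	byName map[string]string
}

func newMappingStore() *mappingStore {
	return &mappingStore{byName: make(map[string]string)}
}

// add would be called when a mapping object is created or updated.
func (s *mappingStore) add(name, resource string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.byName[name] = resource
}

// remove would be called when a mapping object is deleted.
func (s *mappingStore) remove(name string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.byName, name)
}

// resolve returns the resource name for a human-readable name, if known.
func (s *mappingStore) resolve(name string) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	resource, ok := s.byName[name]
	return resource, ok
}

func main() {
	store := newMappingStore()
	// "example.com/af-placeholder" is a placeholder resource name.
	store.add("arria10-nlb0", "example.com/af-placeholder")
	fmt.Println(store.resolve("arria10-nlb0"))

	store.remove("arria10-nlb0")
	fmt.Println(store.resolve("arria10-nlb0"))
}
```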