intel-device-plugins-for-kubernetes

mirror of https://github.com/intel/intel-device-plugins-for-kubernetes.git synced 2025-06-03 03:59:37 +00:00

Author	SHA1	Message	Date
Mikko Ylinen	b81d2dcba8	Update SGX and FPGA webhook flags SGX Admission webhook was quickly forked from FPGA's implementation. After a bit of thinking, it turns out leader election and metrics are not necessary for an (idempotent) webhook-only functionality. For FPGA Admission webhook, the metrics aren't correctly set up so it's better to disable the functionality. Leader election is kept but the flag is renamed to align with "kubebuilder v3 functionality" similar to how we changed it for the operator as well. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2022-09-13 13:18:28 +03:00
Mikko Ylinen	3abf10d7ff	qat: read device capabilities from sysfs Linux 6.0 adds sysfs-driver-qat entries to read device capabilities: `42e66b1cc3/Documentation/ABI/testing/sysfs-driver-qat` Implement the logic for reading from sysfs and prefer that over debugfs. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2022-09-09 14:16:03 +03:00
Tuomas Katila	230570f12e	gpu: add mentions about data center gpu support Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>	2022-09-09 13:07:50 +03:00
Mikko Ylinen	307e960871	docs: fix remaining review comments Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2022-09-06 14:28:25 +03:00
Mikko Ylinen	8ac321f5e3	sgx: send nil TopologyInfo /dev/sgx_* cannot be mapped to any topology. SGX itself is topology aware but we cannot control it with TopologyInfo. Currently, pkg/topology returns empty TopologyInfo{Nodes:[]NUMANode{}} for /dev/sgx_ but kubelet TopologyManager (when enabled and with a policy other than 'none') interprets that as "Hint Provider has no possible NUMA affinities for resource" and rejects the SGX resources. What we want is "Hint Provider has no preference for NUMA affinity with resource". This is communicated using nil TopologyInfo. See: https://github.com/kubernetes/kubernetes/issues/112234 Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2022-09-06 08:43:04 +03:00
Ed Bartosh	f0dd95274e	Merge pull request #1126 from mythi/PR-2022-054 docs: rework development guide	2022-09-02 17:59:15 +03:00
Ed Bartosh	5756725b09	fix lint failure Removed unused import. This should fix this golangci-lint failure: can't run linter goanalysis_metalinter: buildir: failed to load package : could not load export data: no export data for "cloud.google.com/go/compute/metadata" Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>	2022-09-02 12:02:06 +03:00
Mikko Ylinen	1b3accacc2	docs: rework development guide Currently, each individual plugin README documents roughly the same daily development steps to git clone, build, and deploy. Re-purpose the plugin READMEs more towards cluster admin type of documentation and start moving all development related documentation to DEVEL.md. The same is true for e2e testing documentation which is scattered in places where it doesn't belong. It is good to have all day-to-day development Howtos in a centralized place. Finally, the cleanup includes some harmonization to plugins' table of contents which now follows the pattern: * [Introduction](#introduction) (* [Modes and Configuration Options](#modes-and-configuration-options)) * [Installation](#installation) (* [Prerequisites](#prerequisites)) * [Pre-built Images](#pre-built-images) * [Verify Plugin Registration](#verify-plugin-registration) * [Testing and Demos](#testing-and-demos) * ... Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2022-08-31 20:00:15 +03:00
Eero Tamminen	fb18923298	Log GPU device share count & type count changes separately And instead of accessing DeviceTree internals, add suitable method for it. Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>	2022-08-31 17:23:57 +03:00
Mikko Ylinen	d826548d29	Merge pull request #1113 from eero-t/gpu-count-log More detailed log for number of found GPU devices / resource types	2022-08-29 09:58:53 +03:00
Ed Bartosh	02446fca1d	Merge pull request #1114 from eero-t/prefix-option Add "prefix" option to GPU plugin for scalability testing	2022-08-26 22:54:28 +03:00
Eero Tamminen	9d4b52188e	Add "gpu_fakedev" documentation Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>	2022-08-26 19:05:10 +03:00
Eero Tamminen	cc3aebbefc	Add minimal example JSON to test "gpu_fakedev" generator Config file is suitably indented so that it can be directly appended to a suitable configMap header. Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>	2022-08-26 19:05:10 +03:00
Eero Tamminen	c15feea1f8	Add code for generating fake GPU sysfs + devfs files To facilitate GPU plugin scalability testing on a real cluster. Pre-existing (fake) sysfs & devfs content needs to be removed first: * Fake devfs directory is mounted from host so OCI runtime can "mount" device files also to workloads requesting fake devices. This means that those files can persist over fake GPU plugin life-time, and earlier files need to be removed, as they may not match * DaemonSet restarts failing init containers, so errors about content created on previous generator run would prevent getting logs of the real error on first generator run * Before removal, check that removed directory content is as expected, to avoid accidentally removing host sysfs/devfs content (in case container was erroneously granted access to the real thing) Container runtime requires fake device files to be real devices: * Use NULL devices to represent fake GPU devices: https://www.kernel.org/doc/Documentation/admin-guide/devices.txt * Give more detailed logging for MkNod() failures as device node creation is the operation most likely to fail when container does not have the necessary access rights Created content is based on JSON config file (instead of e.g. commandline options) so that (configMap providing) it can be updated independently of the pod where generator is run. Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>	2022-08-26 19:04:43 +03:00
Eero Tamminen	ddf2c8bc8f	More detailed log for number of found GPU devices / resource types Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>	2022-08-26 17:51:27 +03:00
Eero Tamminen	0b519ecf1e	Deprecate debugfs GPU IP block version labels in NFD hook doc There's no mapping available from IP block versions to actual product features, which makes these version numbers fairly useless for end users. In mixed GPU clusters, running a job that adds/updates node labels for the relevant GPU features to each relevant node would be much more user-friendly. This could be done easily by converting given GPU API capability tool (e.g. "vainfo" for VA-API, "clinfo" for OpenCL) output to an NFD feature file. (Such a thing would be outside of this project scope though, except maybe as an example / test-case.) Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>	2022-08-24 16:55:01 +03:00
Eero Tamminen	0b7cbc862d	Improve GPU NFD hook documentation Add table of contents, simplify introduction text. Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>	2022-08-24 16:55:01 +03:00
Eero Tamminen	5666b8fa30	Add "prefix" option to GPU plugin for scalability testing GPU plugin code assumes container paths to match host paths, and container runtime prevents creating fake files under real paths. When non-standard paths are used, devices can be faked for scalability testing. Note: If one wants to run both normal GPU plugin and faked one in same cluster, all nodes providing fake "i915" resources should be labeled differently from ones with real GPU plugin + devices, so that real GPU workloads can be limited to correct nodes with a suitable nodeSelector. Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>	2022-08-24 14:32:53 +03:00
Ed Bartosh	6177dd0dfe	Merge pull request #1093 from mythi/PR-2022-050 build: move to Go 1.19	2022-08-16 00:25:57 +03:00
astronaut0131	2d155edac7	sgx: add kind deployment notes for aesmd	2022-08-15 15:26:01 +08:00
Mikko Ylinen	642c4f7b59	build: move to Go 1.19 and golangci-lint 1.48 because of that Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2022-08-15 10:13:37 +03:00
Chelsea Mafrica	24eb52a912	docs: Fix missing code block in operator doc Add missing code block to a section of the operator README. Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>	2022-08-05 11:32:48 -07:00
Mikko Ylinen	3c948cc106	Merge pull request #1063 from bart0sh/PR144-upgrade-libDLB dlb: update DLB to v7.7.0	2022-07-18 09:29:55 +03:00
Ed Bartosh	9f2db89da6	dlb: update DLB to v7.7.0 Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>	2022-07-03 15:08:14 +03:00
Huang Xin	89caad1cd4	doc: modify SGX device plugin deployments url from 'main' to '<RELEASE_VERSION>' Signed-off-by: Huang Xin <xin1.huang@intel.com>	2022-06-25 17:33:46 +08:00
Ed Bartosh	c82b907472	Merge pull request #1055 from mythi/PR-2022-045 operator: align with kubebuilder v3 functionality	2022-06-20 23:12:21 +03:00
Mikko Ylinen	f9ca36cc26	set TLSMinVersion for webhook servers Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2022-06-20 19:04:50 +03:00
Mikko Ylinen	b48568c43a	operator: align with kubebuilder v3 functionality kubebuilder v3 based scaffolding has updated many things and they are documented in [1]. Update operator's functionality to v3 level. We've done most/some of the changes earlier (e.g., by not using deprecated k8s APIs anymore) so the changes are minimal. [1] https://book.kubebuilder.io/migration/v2vsv3.html Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2022-06-20 16:35:40 +03:00
Oleg Zhurakivskyy	f1ec14d106	iaa: Add e2e tests Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com> Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>	2022-06-09 15:00:25 +03:00
Mikko Ylinen	9bb5f303ab	Merge pull request #1048 from bart0sh/PR142-get-rid-of-sysfsDir Get rid of unused sysfsDir parameter	2022-06-09 13:55:57 +03:00
Ed Bartosh	9d04ce825d	idxd: get rid of unused sysfsDir parameter	2022-06-08 22:09:27 +03:00
Ed Bartosh	3df93cf04f	rename image dsa-accel-config-demo -> accel-config-demo	2022-06-08 21:00:54 +03:00
Ed Bartosh	e182304c4d	Merge pull request #1030 from mythi/PR-2022-040 qat: add support for 401xx devices	2022-06-03 12:37:45 +03:00
Hyeongju Johannes Lee	276d25088e	dlb: update the version of DLB driver & DPDK to new release Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>	2022-06-02 22:39:46 +03:00
Mikko Ylinen	8987f1ba53	qat: add support for 401xx devices QAT_401xx is a derivative of 4xxx. Add support for that device by including the device IDs (both PF and VF). Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2022-06-02 08:11:39 +03:00
Hyeongju Johannes Lee	85a12609a3	sgx: deprecate /dev/sgx/ mounts Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>	2022-05-09 18:59:34 +03:00
Mikko Ylinen	6ea51a3623	qat: kerneldrv: skip QAT Gen4 devices Containers running on QAT Gen4 should be based on qatlib and therefore kerneldrv is not the right mode. Skip registering 4xxx* devices to ensure it is not used. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2022-04-25 22:08:13 +03:00
Oleg Zhurakivskyy	e3a277c65f	doc: Update the documentation on the DSA, IAA ConfigMap creation Closes #941 Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>	2022-04-25 10:17:17 +03:00
Mikko Ylinen	069b9bd79a	qat: 4xxx: split generic resource to compression and crypto Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2022-04-07 22:33:17 +03:00
Mikko Ylinen	482ed7ba4d	Merge pull request #939 from hj-johannes-lee/qat-allocation-policy qat: implement preferredAllocation policies	2022-04-07 21:15:49 +03:00
Hyeongju Johannes Lee	d3c8063ff3	qat: implement preferredAllocation policies Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>	2022-04-07 14:14:00 +03:00
Mikko Ylinen	2adad5ae76	drop deprecated grpc.WithInsecure() grpc-go v1.43.0 deprecated grpc.WithInsecure() in favor of insecure.NewCredentials(). Move to use the recommended approach and drop the linter annotations. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2022-04-07 13:40:51 +03:00
Tonny Tzeng	bf94f566fd	doc: unify test images build with make Signed-off-by: Tonny Tzeng <tonny.tzeng@intel.com>	2022-04-01 15:49:43 +08:00
Mikko Ylinen	0f36cde605	Merge pull request #935 from tkatila/gpu/tiles-support-and-numa-mapping gpu: add tiles annotation support	2022-03-30 19:33:09 +03:00
Tuomas Katila	8f6a235b5d	gpu: Start using GetPreferredAllocation with fractional resources Move reallocate logic to getpreferredallocation and simplify allocate to use the kubelet's device ids. Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>	2022-03-30 11:32:49 +03:00
Mikko Ylinen	18379e92f3	Merge pull request #937 from tkatila/gpu/numa-mapping gpu: Add numa node mapping label for GPUs	2022-03-29 07:56:26 +03:00
Hyeongju Johannes Lee	7eeaddc563	gpu: fix typo in implementation of preferredAllocator interface Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>	2022-03-28 05:04:32 -07:00
Tuomas Katila	bdd72c8cf7	gpu: Add numa node mapping label for GPUs Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>	2022-03-24 14:29:05 +02:00
Tuomas Katila	db7e5bfc55	Add support for gas-container-tiles annotation Adds functionality to convert a container's tile annotation into the corresponding L0 affinity mask. This helps to target the container's workload to specific L0 subdevices. Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>	2022-03-24 14:13:35 +02:00
Mikko Ylinen	a03df7edd6	doc: fix operator usage instructions Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2022-03-16 08:10:58 +02:00
Ed Bartosh	6b27cf1f7c	Implement IAA plugin, operator, demo Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com> Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>	2022-03-04 15:58:42 +02:00
Mikko Ylinen	c064bfc4f1	demo: add intel-opencl-icd Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2022-02-24 11:06:27 +02:00
Hyeongju Johannes Lee	5fe2c3ef4d	dlb: update the link to dlb driver Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>	2022-02-18 20:00:24 +02:00
Ed Bartosh	d4966e089c	Merge pull request #857 from ozhuraki/operator-upgrade operator: Support upgrade of plugins	2022-02-18 17:55:53 +02:00
Oleg Zhurakivskyy	f29171b067	operator: Add a documentation on upgrade Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>	2022-02-18 12:52:55 +02:00
Mikko Ylinen	72c4552253	deployments: move SGX NFD config to an NFD kustomize overlay Start using the newly created NodeFeatureRule configs with SGX. This allows to drop the custom worker config. Additionally, split the example NFD deployment into two steps 1) plain NFD (+SGX json patches) 2) NodeFeatureRule creation NodeFeatureRule creation is not guaranteed to succeed when it's part of the same kustomization with the CRD creation. Users may also have NFD already running so allowing 2) alone works better in that scenario. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2022-02-18 11:17:57 +02:00
Hyeongju Johannes Lee	d70397ebfb	dlb: update README Remove commands for building and loading dlb2 driver Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>	2022-02-16 16:10:51 +02:00
Mikko Ylinen	a74774f939	docs: update cert-manager installation instructions The webhooks' default deployments depend on cert-manager. Our existing documentation points to a specific cert-manager version giving users the impression that it should be used. However, that is not the case. Update the documentation so that we just point to cert-manager installation page. With this, we don't have to hard-code to any specific version. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2022-02-16 11:26:37 +02:00
Mikko Ylinen	1185f2329b	crypto-perf: drop SYS_ADMIN capabilities SYS_ADMIN capabilities are not necessary when using vfio-pci. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2022-02-16 11:26:20 +02:00
Oleg Zhurakivskyy	656676b267	operator: Set klogr's format to FormatKlog The default "Serialize" breaks multiline output. Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>	2022-02-09 16:49:35 +02:00
Ed Bartosh	8626d47d8b	operator: implement NFD labelling rules - added labelling rules for all supported devices - updated operator installation instructions Fixes: #768 Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>	2022-02-08 17:01:03 +02:00
Ed Bartosh	55f3e17dd0	add 'annotations' parameter to the NewDeviceInfo API Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>	2022-02-07 15:15:30 +02:00
Tuomas Katila	6f57c55ef8	Add a total tile count to node's labels This label isn't dependent on the debugfs as the platform specific tile count is. Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>	2022-01-26 09:57:33 +02:00
Ukri Niemimuukko	7520393041	gpu_nfdhook: gpu-numbers and pci-groups This adds a new label "gpu-numbers" for short numbered lists of gpus, omitting "card" from the names. Also adds splitting of long label values. Similarly this adds a new label "pci-groups" for PCI groups. Grouping can be controlled by env var GPU_PCI_GROUPING_LEVEL. The env var dictates, how many pci-folder names need to match, in order for GPUs to be considered to belong in a group. Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>	2022-01-25 09:17:56 +02:00
Mikko Ylinen	c306f5ef68	qat: detect noiommu mode with VFIO If the kernel has CONFIG_VFIO_NOIOMMU enabled and the node admin has explicitly set enable_unsafe_noiommu_mode VFIO parameter, VFIO taints the kernel and writes "vfio-noiommu" to the IOMMU group name. If these conditions are true, the /dev/vfio/ devices are prefixed with "noiommu-". This use-case is documented for DPDK so we don't want to break it (as it was before because we added DeviceMounts to /dev/vfio/<iommugroup> files that did not exist). See DPDK documentation for further information and warnings. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2022-01-10 06:11:59 +02:00
Eero Tamminen	36046d90a4	Make GPU plugin / resource label limitations more explicit While the labeling limit is obvious after little thought, IMHO limitations like this should either be stated out front, or be in their own section in the README. Commit does former for the GPU plugin fractional resources, and latter for the NFD hook / labeling.	2022-01-04 11:43:08 +02:00
Ukri Niemimuukko	46dcffc33e	README typofix Label descriptions had extra underscores. Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>	2021-12-28 12:01:40 +02:00
Hyeongju Johannes Lee	a2d13eea4c	dlb: update README Remove the sentence for pre-built image since Dockerhub image for dlb plugin is available. Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>	2021-12-22 03:49:49 -08:00
Hyeongju Johannes Lee	74ecd6919c	dsa: Fix the names still left as idxd-initcontainer There are a few things left un-renamed after \#771. Rename those to idxd-config-initcontainer. Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>	2021-12-21 04:39:19 -08:00
Mikko Ylinen	e09d52f6ff	Merge pull request #816 from hj-johannes-lee/dlb-flag-parse dlb:Fix the problem that klog is not printed	2021-12-21 13:31:49 +02:00
Hyeongju Johannes Lee	515bd5908c	dlb:Fix the problem that klog is not printed Add flag parsing to get command line parameters so that klog-related parameters are not ignored	2021-12-21 01:58:58 -08:00
Mikko Ylinen	c7e18d8b25	qat: rework driver binding The new_id based driver binding is failing on kernels 5.11+ when the QAT VF is not bound to any driver: attempts to write to new_id with the same device ID repeatedly error with "file exists". Move the new_id initialization to the beginning of the startup and write the enabled device IDs only once. This commit also fixes an issue where VF devices were not correctly detected in virtual machines where the VF was not bound to any driver. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2021-12-21 08:20:02 +02:00
Mikko Ylinen	b48ca7f686	qat: update dpdkdrv unit tests After a closer review, it was noticed that some of the QAT dpdkdrv unit tests need updating: - "Broken igb_uio DPDKdriver..." is actually testing unknown device ID and we already have tests for it -> drop. - "igb_uio DPDKdriver with one kernel bound device (not QAT device)" is testing something impossible: an unknown VF devID is originated from a QAT PF -> drop. - creating files for unbind/new_id etc. is unnecessary because os.WriteFile() creates them during the tests -> drop these lines to simplify unit tests maintenance. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2021-12-21 08:20:02 +02:00
dependabot[bot]	9a16e80f2b	build(deps): bump google.golang.org/grpc from 1.42.0 to 1.43.0 Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.42.0 to 1.43.0. - [Release notes](https://github.com/grpc/grpc-go/releases) - [Commits](https://github.com/grpc/grpc-go/compare/v1.42.0...v1.43.0) --- updated-dependencies: - dependency-name: google.golang.org/grpc dependency-type: direct:production update-type: version-update:semver-minor ... --- In addition to changes made by dependabot, I add nolint comments to ignore staticcheck(SA1019) errors. This is because insecure.NewCredentials(), recommended as the alternative, is still declared experimental. So keep grpc.WithInsecure() with a nolint comment. Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>	2021-12-20 04:50:39 -08:00
Ed Bartosh	cec004c398	lint: enable wsl check Fixes: #392 Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>	2021-12-17 11:48:48 +02:00
Eero Tamminen	bcc737bd2a	Adapt GPU label support to debugfs DRM entry changes GPU generation "gen" number is replaced in the capability files of latest kernels with separate display, graphics, and media versions. For compatibility with newer kernels, provide "gen" based on the new labels (but without decimals), and for older kernel compatibility, new labels based on the "gen". Because different kernels match different items from the action map, whole capability file will get parsed. Capability file parsing is optimized by using prefix check instead of scanf. "platform_gen" label is deprecated, and can be dropped whenever it becomes inconvenient (lint complains about line count etc).	2021-12-16 21:22:31 +02:00
Eero Tamminen	599fc18e71	Provide workaround for the media issue and document it The issue is with VA-API and QSV, not VPL media API.	2021-12-15 18:40:33 +02:00
Hyeongju Johannes Lee	37dc1b124e	dlb: update README Add info on how to configure dlb driver and vfs. Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>	2021-12-14 12:05:35 -08:00
Mikko Ylinen	e83a811ec7	sgx: update README The cmdline flags talked about the old device nodes. With the upstream driver, the devices nodes are /dev/sgx_[enclave\|provision]. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2021-12-01 14:33:33 +02:00
Oleg Zhurakivskyy	fee2e12996	idxd-initcontainer: Drop libkmod, libudev - Make libkmod, libudev optional - Include accel-config, libjson-c, libuuid sources Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>	2021-11-30 15:32:23 +02:00
Mikko Ylinen	1c4ee778b3	sgx: update NFD deployment Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2021-11-25 17:13:03 +02:00
Dmitry Rozhkov	db20ce1fe4	Merge pull request #754 from mythi/PR-2021-062 qat: update default flags and deploy without ConfigMap	2021-11-22 10:00:02 +02:00
Hyeongju Johannes Lee	84d8408a4f	README: add that operator supports DSA and DLB plugins Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>	2021-11-19 02:38:58 -08:00
Mikko Ylinen	b921a4a458	qat: update default flags and deploy without ConfigMap To make QAT plugin deployment consistent with the other plugins we update the default flags and deploy without the flag settings provided by the ConfigMap. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2021-11-18 14:02:36 +02:00
Dmitry Rozhkov	471549c11d	Merge pull request #753 from hj-johannes-lee/dlb-operator operator: Add DLB support	2021-11-18 10:23:16 +02:00
Dmitry Rozhkov	42cde4ff6c	Merge pull request #742 from guoshuxu/dev GPU devices resource preferred allocation methods.	2021-11-18 10:22:03 +02:00
Xu, Guoshu	e4c4a8f7ac	GPU devices resource preferred allocation methods. 1. Implement PreferredAllocator interface. 2. Provide 3 preferred allocation policies: balancedPolicy, packedPolicy and nonePolicy. 3. Provide the cmdline interface: -allocation-policy balanced/packed/none, to select which preferred allocation policy to use. 4. Add operator support. Co-authored-by: Mikko Ylinen <mikko.ylinen@intel.com>	2021-11-17 22:55:10 +08:00
Hyeongju Johannes Lee	ff9034822b	operator: Add DLB support Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>	2021-11-17 01:51:47 -08:00
Leow Chun Fung	1bbb0a6a7c	Support for PCI VPU device 8086/4fc0 and 8086/4fc1	2021-11-16 22:13:33 +07:00
Ed Bartosh	80829f72b1	ci: improve golangci job - used the same go version as for the project build - used verbose output - fixed gofmt check failures Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>	2021-11-13 00:32:25 +02:00
Ed Bartosh	b03227f9d4	dlb: add documentation Document DLB plugin Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com> Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>	2021-11-11 12:25:25 +02:00
Hyeongju Johannes Lee	8362028560	dlb: Add new device plugin Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>	2021-11-11 11:51:49 +02:00
Oleg Zhurakivskyy	a7c612f7fc	dsa: Rename dsa initcontainer to idxd Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>	2021-11-09 12:00:44 +02:00
Oleg Zhurakivskyy	cdaf6b3807	dsa: Add a documentation on provisioning with ConfigMap Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>	2021-11-09 10:31:50 +02:00
Hyeongju Johannes Lee	13f4ce82a1	Remove nolint annot. Remove the annotation nolint:funlen since funlen is not used anymore.	2021-10-11 11:36:24 +03:00
Mikko Ylinen	e6cf299750	gpu: update READMEs Commit `00a59e8f7d` was not complete in that it didn't update the corresponding documentation. This commit fixes that. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2021-10-08 11:57:16 +03:00
Oleg Zhurakivskyy	30ebc8e5d1	dsa: Add a documentation on provisioning with initcontainer Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>	2021-10-01 12:16:50 +03:00
Mikko Ylinen	9d0d6cbe11	qat: set c6xxvf and 4xxxvf to default devices The devices enabled by default are different between the kustomize and operator based deployments. This change harmonizes the defaults to c6xxvf and 4xxxvf in both deployment options. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2021-09-23 10:50:38 +03:00
Dmitry Rozhkov	19d54b9fe8	Merge pull request #707 from uniemimu/mem_read gpu nfdhook: new memory amount reading logic	2021-09-23 10:33:41 +03:00
Ukri Niemimuukko	64290020d7	gpu nfdhook: new memory amount reading logic This changes the memory reading to be done through lmem_total_bytes file instead of the addr_range file. Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>	2021-09-21 13:50:41 +03:00
Hyeongju Johannes Lee	8fc5df7e37	Add govet-fieldalignment Add govet-fieldalignment to .golangci.yml Fix errors that come from adding govet-fieldalignment - by reordering the fields of structs - by putting nolint:govet annotations Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>	2021-09-20 20:59:04 +03:00
Ukri Niemimuukko	0670a82cb1	gpu rm linter comment fixes Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>	2021-09-10 14:35:13 +03:00
Li Ning	dcc12d9089	documentation: remove deprecated toc section in README The 'Verify node kubelet config' content was removed in `6b208f8`. Signed-off-by: Li Ning <ning.a.li@transwarp.io>	2021-09-07 19:38:41 +08:00
Hyeongju Johannes Lee	4bc70ac544	Add goerr113 linter check Add goerr113 lintercheck Fix the usage of fmt.Errorf() by wrapping errors Fix the usage of errors.New()	2021-09-03 11:02:14 +03:00
Hyeongju Johannes Lee	09ba9fde00	Update tool versions and fix errors and warnings that originated from the update Update tool versions Fix the errors and warnings originated from the update: -Correct type deviceInfo (->DeviceInfo) to make it public -Fix gpu_plugin.go and vpu_plugin_test.go where stylecheck errors occur -Fix deprecation warnings -Rename type 'PatcherManager' to 'Manager' to solve exported errors -Rename type 'SgxMutator' to 'Mutator' to solve exported errors Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>	2021-08-25 07:09:34 +00:00
Mikko Ylinen	cfe2d65f32	Merge pull request #659 from 0x161e-swei/sgx-nfd-operator-dependency Add SGX webhook operator as dependency of sgx-nfd	2021-07-28 06:20:32 +03:00
Shijia Wei	9b66176ca5	Add SGX admissionwebhook as dependency of sgx-nfd daemonset; Mentioned dependency of the cert-manager in DaemonSet deployment method in SGX README.	2021-07-27 00:39:59 -05:00
Ed Bartosh	8a54a9ba64	webhook: document mappings deployment Fixes: #580 Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>	2021-07-26 14:23:10 +03:00
Eero Tamminen	83e7de0d41	Make GPU plugin intro information more generic & accurate - Information on specific HW & virtualization types on which GPU plugin is tested belongs to release notes, not to README intro (where it has already become obsolete) - HW offloading is provided by driver backends, not frontends (e.g. OneVPL is just one of the media driver frontends)	2021-06-22 18:27:17 +03:00
Ukri Niemimuukko	b0130e693f	more documentation for fractional resources This adds a section heading, TOC link, command line flag description and a short explanation of what other dependent configuration changes are needed with fractional resources in order for the command line flag to achieve something useful. Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>	2021-06-14 16:25:38 +03:00
Ed Bartosh	98f80b5f47	Merge pull request #652 from uniemimu/hookupdate add link to gpu_nfdhook and update hook README	2021-06-13 12:15:46 +03:00
Eero Tamminen	a2faa3a8fc	Add section on GPU plugin options to its README	2021-06-11 19:55:43 +03:00
Ukri Niemimuukko	cbf7bab114	add link to gpu_nfdhook and update hook README This adds a link from gpu-plugin README to the nfdhook README, and updates the nfdhook README with label descriptions. Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>	2021-06-11 18:54:44 +03:00
skaajas	956154c1db	Updated GPU plugin-specific readme general description.	2021-06-11 15:50:14 +03:00
Ed Bartosh	9d8fb392f5	Merge pull request #637 from uniemimu/skip add pf skip to gpu nfdhook	2021-06-11 10:57:39 +03:00
Ukri Niemimuukko	e3bf21dbe9	gpu_plugin: add documentation links to gpu aware scheduling Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>	2021-06-10 19:46:35 +03:00
Ukri Niemimuukko	7ca5cfcfd6	add pf skip to gpu nfdhook This corresponds to the previous gpu-plugin skip code. Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>	2021-06-10 18:44:57 +03:00
Mikko Ylinen	383778a24b	qat: fix C4xxx driver name Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2021-06-10 08:45:23 +03:00
Ed Bartosh	e180bfdf07	Merge pull request #644 from mythi/PR-2021-034 qat: do not fail if driver/unbind file does not exist	2021-06-09 11:38:52 +03:00
Mikko Ylinen	e8115d1c8d	qat: do not fail if driver/unbind file does not exist <device>/driver symlink does not exist if the device is not bound to any driver. bindDevice() failed when writing to <device>/driver/unbind errored but IsNotExist() error is acceptable in case there's no driver to unbind. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2021-06-09 11:09:24 +03:00
Dmitry Rozhkov	6aa1a47c9a	Merge pull request #638 from uniemimu/fractional gpu_plugin: fractional resource management	2021-06-09 10:58:10 +03:00
Ukri Niemimuukko	2c4d529d66	gpu_plugin: fractional resource management Fractional resource management feature Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com> Signed-off-by: Dmitry Rozhkov <dmitry.rozhkov@intel.com>	2021-06-04 13:06:50 +03:00
Mikko Ylinen	facb4214a2	tree-wide: drop deprecated io/ioutil Go 1.16 release notes announced the deprecation of io/ioutil [1]. It's easy for us to move to what is recommended, so just do it. [1] https://golang.org/doc/go1.16#ioutil Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2021-06-02 13:41:15 +03:00
Mikko Ylinen	06dbc1331b	images: move intel-qat-plugin-kerneldrv to Debian Also, update the documentation to reflect what is needed to enable and use '-mode kernel'. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2021-06-02 13:39:39 +03:00
Leow Chun Fung	8e4b58c0f6	Implement support for PCI-based VPU	2021-05-19 18:15:17 +07:00
Mikko Ylinen	c3cf958c85	images: move most plugin images to distroless/static All but one (VPU) of the published container images can be built with static binaries which allows us to use distroless/static as the base image. Moreover, when combined with stripping the plugin binaries, we can get both build time and image size savings. This is the part 1 (out of 2) of the rework. Part 2 will finish the change by making some adjustments to VPU plugin image and moving the FPGA/SGX/GPU initcontainers to distroless/static too. Partial: #516 Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com> Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>	2021-05-19 09:44:47 +03:00
Eero Tamminen	c575ce9099	Document GPU plugin test code test-case struct members	2021-05-06 11:02:57 +03:00
Eero Tamminen	57c8d76e1b	Add minimal GPU plugin options testing Tests plugin scan results in setups having none, one and multiple eligible GPU devices, with and without SRIOV enabled, with two different options values. This does not cover verifying number of devices added under "i915_monitoring" resource as that would be much larger change.	2021-05-05 17:09:09 +03:00
Eero Tamminen	ca9aa32556	Add "-enable-monitoring" option to GPU plugin Make "i915_monitoring" resource (granting access to all GPUs) optional so that it can be enabled only when it's needed.	2021-05-05 17:09:09 +03:00
Eero Tamminen	713c1ab170	Move GPU plugin CLI options to a struct To help in: * adding more CLI options in next and later commits, and * to replace magic newDevicePlugin() input parameters with explicitly named one(s)	2021-05-05 17:09:09 +03:00
Eero Tamminen	06fac8128f	Move GPU plugin sysfs device compatibility checks to own function To reduce scan() function complexity before adding more functionality to it.	2021-05-05 17:08:49 +03:00
Eero Tamminen	79b86fea2d	Skip PF for "i915" resource when it has VFs NOTE: this has impact only for GPUs which are virtualized with SR-IOV. Access to physical devices (PFs) is disabled for "i915" resource when they have configured virtual devices (VFs). This is because: * GPU resources are expected to be evenly split between VFs in such configurations * But PF resource amount is expected to differ from VFs and typically retain only enough resources (just few MB of RAM), to be able to provide GPU metrics that are not available from VFs * Neither the current GPU plugin, nor Kubernetes scheduling in general, has proper support for heterogeneous GPUs (= capability based scheduling) Therefore "i915" resource needs to be limited to GPU devices with homogeneous amount of resources, which in SR-IOV configurations is expected to be the case only with VFs (when such are present).	2021-05-05 14:13:48 +03:00
Dmitry Rozhkov	38a59a57ea	Merge pull request #626 from mythi/PR-2021-028 sgx: add note about the SGX DCAP driver usage	2021-04-28 08:42:02 +03:00
Mikko Ylinen	111b833ea8	sgx: add note about the SGX DCAP driver usage The SGX DCAP out-of-tree v1.41 driver is also known to work with the SGX plugin. However, the default NFD labeling does not work with the out-of-tree driver so warn users about it. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2021-04-27 22:10:21 +03:00
Eero Tamminen	e418c00fca	Add "i915_monitoring" resource to GPU plugin Which mounts all (Intel) GPU devices to requesting container. This is needed e.g. to get GPU metrics from the node. Requesting pod does not know how many GPUs are on the node it gets assigned to, so there needs to be a means to request them all. (Only alternative for the new resource would be requesting Privileged mode, which is clearly worse as that would grant pod access also to all other devices and capabilities.) This commit also: * Adds "i915_monitoring" resource testing to: go test -v -run Scan * Splits GPU plugin tests mock file system setup to a separate createTestFiles() function because otherwise TestScan() does not pass project's golangci-lint complexity limits	2021-04-27 14:21:05 +03:00
Ed Bartosh	08c2094329	update to cert-manager v1.3.1 Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>	2021-04-22 14:45:39 +03:00
Dmitry Rozhkov	3892baa4be	Merge pull request #615 from eero-t/gpu-plugin-testing-improvements Gpu plugin testing improvements	2021-04-20 09:47:10 +03:00
Mikko Ylinen	280bdceb2a	sgx: add separate admissionwebhook image Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2021-04-14 08:09:33 +03:00
Ed Bartosh	31614592c6	Merge pull request #599 from ozhuraki/operator-select-device-type Make it possible to select supported devices in the operator	2021-04-12 19:09:59 +03:00
Ukri Niemimuukko	bb44156d4f	gpu_nfdhook: make memory parsing more robust This adds support for parsing hex and octal amounts as well. Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>	2021-04-09 16:23:48 +03:00
Oleg Zhurakivskyy	6fbf7c9182	operator: README: Document per device deployment Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>	2021-04-08 10:53:04 +00:00
Oleg Zhurakivskyy	2d27602ed0	operator: Add --device command line to operator Add --device command line to operator's main.go which defines the controllers/webhooks to set up. Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>	2021-04-08 10:33:47 +00:00
Eero Tamminen	f9158c1c3b	Update GPU plugin copyrights	2021-04-01 15:20:35 +03:00
Eero Tamminen	8ca19d408f	Fix GPU plugin error messages	2021-04-01 15:20:35 +03:00
Eero Tamminen	384d37ead0	Add test for multiple GPU devices	2021-04-01 15:20:35 +03:00
Eero Tamminen	49354693fb	Fix GPU plugin test setup + better error message Tests fail depending on the order in which they are run, unless mocked files are cleaned between test runs. Without this, the next commit would fail.	2021-04-01 15:20:35 +03:00
Mikko Ylinen	97bcecda04	operator: update usage guidelines As the operator container image is available from a registry, we should guide users to use it rather than build and deploy it locally. Further, drop (un)deploy-operator targets in favor of simply using kubectl for deployment. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2021-03-30 15:33:09 +03:00
Dmitry Shmulevich	c8b5dce247	added an option to create a node label if epc memory is present updated README for SGX device plugin Signed-off-by: Dmitry Shmulevich <dmitry.shmulevich@gmail.com>	2021-03-18 11:53:49 -07:00
Ukri Niemimuukko	f89b61f923	add tile count label Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>	2021-02-26 20:39:48 +02:00
Mikko Ylinen	15ad4ed54b	ci: drop master branch from workflow triggers Also, polish the remaining docs hits to 'master'. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2021-02-23 10:51:04 +02:00
DougTW	7153923cfc	Edited qat_plugin README Replaced multiple instances of master with main. Reworded line 15 "Verify QAT device plugin is registered" removed 'on master' and corresponding section heading. Related to pr499. Signed-off-by: DougTW <doug.martin@intel.com>	2021-02-18 13:59:40 +02:00
Mikko Ylinen	abfa3496a2	sgx: update SGX SDK/DCAP versions Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2021-02-18 09:31:28 +02:00
Mikko Ylinen	f8c20905aa	update to cert-manager v1.2.0 Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2021-02-12 15:39:07 +02:00
Mikko Ylinen	37618d4f85	operator: move deviceplugin/v1 CRDs to cluster scope The device plugins daemonsets are cluster wide and currently only one device plugin instance per device is possible so making the corresponding deviceplugin/v1 CRDs non-namespaced (i.e., scope: cluster) fits better. Previously, the device plugin daemonset was deployed in the same namespace as the CR for that device but with the cluster scoped CRDs we default to use the same namespace as the operator, unless overridden via DEVICEPLUGIN_NAMESPACE env variable or a command line parameter to operator manager deployment. Three additional changes in this commit: - enable DSA envtest tests - update controller-runtime to v0.8.1 - change device plugin envtest suite to use klog/v2 Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2021-02-11 11:41:47 +02:00
Mikko Ylinen	c1f609c34a	Merge pull request #560 from DougTW/dm-edits-gpu-plugin edited gpu_plugin README; changed 2 instances of master to main.	2021-02-10 15:00:54 +02:00
Mikko Ylinen	2409427939	Merge pull request #561 from DougTW/dm-edits-operator Edited operator README. Changed 1 instance of master to main, line 78.	2021-02-10 10:36:56 +02:00
Mikko Ylinen	667aa943a4	Merge pull request #563 from DougTW/dm-edits-sgx-plugin Editing sgx_plugin README. Replacing 'master' with 'main'.	2021-02-10 10:35:51 +02:00
Ed Bartosh	d446be3c3d	Merge pull request #558 from DougTW/dm-edits-fpga-adms-readme fpga_admissionwebhook README.md; changed master to main	2021-02-10 10:15:04 +02:00
DougTW	a856f3215d	Editing sgx_plugin README. Replacing 'master' with 'main'. Related to pr499. Signed-off-by: DougTW <doug.martin@intel.com>	2021-02-09 17:17:05 -08:00
DougTW	80a7e4e651	Edited operator README. Changed 1 instance of master to main, line 78. Signed-off-by: DougTW <doug.martin@intel.com>	2021-02-09 16:59:20 -08:00
DougTW	625b30fd1b	Fixes 560. Edited gpu_plugin README. Restored master to line 157 Signed-off-by: DougTW <doug.martin@intel.com>	2021-02-09 16:49:30 -08:00
Mikko Ylinen	965936d8c3	Merge pull request #553 from bart0sh/PR0103-implement-dsa-operator operator: add DSA support	2021-02-09 16:24:41 +02:00
DougTW	28cbebc81b	edited gpu_plugin README; changed 2 instances of master to main. Related to PR 499. Signed-off-by: DougTW <doug.martin@intel.com>	2021-02-08 18:40:47 -08:00
DougTW	467d4082d3	fpga_plugin-readme; changed one instance of master to main. Related to PR 499. Signed-off-by: DougTW <doug.martin@intel.com>	2021-02-08 18:14:34 -08:00
DougTW	5ee1b6ce23	fpga_admissionwebhook README.md; changed master to main Signed-off-by: DougTW <doug.martin@intel.com>	2021-02-08 17:24:46 -08:00
Ed Bartosh	884f8e3dfe	operator: add DSA support Fixes: #443 Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>	2021-02-09 02:13:27 +02:00
Mikko Ylinen	7561501a51	Merge pull request #550 from dmitsh/ds-ext-res added implementation of EPC extended resource advertiser	2021-02-08 19:53:46 +02:00
Dmitry Shmulevich	3c3a3d1145	added implementation of EPC extended resource advertiser Signed-off-by: Dmitry Shmulevich <dmitry.shmulevich@gmail.com>	2021-02-04 17:35:17 -08:00
Mikko Ylinen	e94857ce5d	docs: harmonize device plugins operator naming Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2021-02-04 15:12:37 +02:00
Mikko Ylinen	0892a34705	move to k8s.io v1.20.x and klog/v2 v2.4.0 Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2021-01-21 15:34:39 +02:00
Dmitry Rozhkov	771b0c7432	Merge pull request #544 from mythi/PR-2021-003 sgx: change getDefaultPodCount() logic	2021-01-13 10:31:16 +02:00
Mikko Ylinen	ed3a650ddd	sgx: change getDefaultPodCount() logic Decouple the default enclaveLimit/provisionLimit from core count. With this change, the default limit is constant and it can be made relative to core count by setting PODS_PER_CORE multiplier via env variable. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2021-01-12 20:24:46 +02:00
Ed Bartosh	6b208f8acf	documentation: remove kubelet configuration check Removed device plugin socket check from the documentation as device plugin support is enabled by default in Kubelet. Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>	2021-01-12 13:00:20 +02:00
Mikko Ylinen	da4a9fca96	qat: add note about vfio-pci module parameters Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2021-01-11 18:48:43 +02:00
Ed Bartosh	b007dc26f5	dsa: fix kubectl command line Fixed kubectl command line to get allocatable DSA resources Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>	2020-12-30 15:37:16 +02:00
Ed Bartosh	2e4de52f2b	implement DSA demo - Implemented demo image that runs accel-config tests - Added testing instructions to the documentation Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>	2020-12-28 14:45:25 +02:00
Ukri Niemimuukko	5d31dca018	gpu_nfdhook: remove devfs dependency This removes the devfs dependency. Sysfs is sufficient for scanning for the presence of GPUs. Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>	2020-12-23 15:43:48 +02:00
Mikko Ylinen	aef2e1655e	qat: run TestScanPrivate tests in parallel Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2020-12-23 11:18:21 +02:00
Mikko Ylinen	26d4b6f3a8	qat: fix device ID validation It looks like for a long time now we have accepted a setup where a valid QAT device ID is accepted as a QAT device resource even though the device is not "enabled" via kernelVfDrivers parameter. Fix device ID validation to skip valid QAT devices that are not explicitly specified in kernelVfDrivers. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2020-12-21 14:33:27 +02:00
Mikko Ylinen	85fce2dcab	qat: rework device scanning The updated dp.scan() changes how VF devices are detected. The main reason for the change is to take into account cases where the QAT VF driver is not present in the system at all but only the PF driver is loaded (and the SR-IOV devices are enabled). The rework also takes into account bare metal and VM deployments and adds a test case for checking the virtualized environment. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2020-12-18 15:33:25 +02:00
Mikko Ylinen	2155a24e73	qat: add new devices and change defaults The plugin now detects/accepts 4xxx and c4xxx devices too and defaults to those drivers that are part of Linux mainline. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2020-12-17 15:23:00 +02:00
Mikko Ylinen	621122e456	sgx_epchook: update to cpuid/v2 Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2020-12-15 19:58:13 +02:00
Ed Bartosh	2e7367eab3	fpga hook: language cleanup Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>	2020-12-10 10:58:40 +02:00
Mikko Ylinen	312b771ab7	Merge pull request #494 from bart0sh/PR0093-DSA-draft Implement DSA plugin	2020-12-09 15:15:46 +02:00
Mikko Ylinen	18ec3a449e	qat: move to path/filepath We have both "path" and "path/filepath" but the latter provides everything needed so move it completely. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2020-12-08 07:38:20 +02:00
Mikko Ylinen	ad8bbcea21	qat: rework bus-device-function handling The code was stripping out "0000:" (bus) and then adding it back in several places. That's not necessary so this change simplifies QAT VF addr handling by operating using full BDF IDs. Moreover, simplify function calls: use getDpdkDevice() once for each VF device. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2020-12-08 07:37:16 +02:00
Ed Bartosh	174643436a	implement DSA plugin Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>	2020-12-03 17:24:48 +02:00
Dmitry Rozhkov	f0fa9df292	operator: prepare for publishing at operatorhub.io	2020-11-24 18:35:56 +02:00
Mikko Ylinen	d65cb902e6	sgx: move to RFC v4x device API The SGX device nodes have changed from /dev/sgx/[enclave\|provision] to /dev/sgx_[enclave\|provision] in v4x RFC patches according to the LKML feedback. This change moves to use the new device nodes. Backwards compatibility is provided by adding /dev/sgx directory mount to containers. This assumes the cluster admin has installed the udev rules provided in the README to make the old device nodes as symlinks to the new device nodes. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2020-11-18 21:17:28 +02:00
Dmitry Rozhkov	5ec466b2eb	add known issue for operator	2020-11-12 11:23:41 +02:00
Alexander D. Kanevskiy	75355c9937	Merge pull request #497 from bart0sh/PR0094-move-GetAPIVersion-out-of-NewPort fpga: move GetAPIVersion call out of NewPort and NewFME	2020-11-11 12:09:13 +02:00
Ed Bartosh	2c73e2a0b3	fpga: move GetAPIVersion call out of NewPort and NewFME This call is implemented by calling ioctl, which raises "open /dev/intel-fpga-port.X: operation not permitted" error when called inside unprivileged container. This breaks FPGA plugin. Calling this API from fpga_tool is still OK, so moving calls there should fix the issue. Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>	2020-11-10 16:44:20 +02:00
Dmitry Rozhkov	5f0da56045	Upgrade to k8s v1.19.3	2020-11-10 16:09:20 +02:00
Ed Bartosh	680da54fd9	fpga: improve port init Used generic newPort API instead of device-specific newDflPort and newIntelFpgaPort. Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>	2020-11-01 01:47:49 +02:00
Dmitry Rozhkov	25a52b0b74	Merge pull request #478 from bart0sh/PR0091-FPGA-SRIO-V fpga: reimplement device discovering	2020-10-30 10:05:05 +02:00
Mikko Ylinen	0f6eefee23	sgx: add documentation This commit documents the SGX building blocks for Kubernetes and how to deploy them in the cluster. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2020-10-27 15:02:40 +02:00
Ed Bartosh	243870a707	fpga: reimplement device discovering Reimplemented discovering of the FPGA devices using APIs from pkg/fpga/intel_fpga_linux. The APIs are also used in the fpga_tool utility. The API is more advanced and supports SR-IOV among other things. Fixes: #372 Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>	2020-10-26 21:45:52 +02:00
Dmitry Rozhkov	87143355ba	Merge pull request #483 from mythi/sgx-nfd sgx: make SGX NFD kustomization overlay independent	2020-10-26 13:25:36 +02:00
Ukri Niemimuukko	5b5180ae00	gpu_nfdhook memory amount reading from sysfs This adds reading of the GPU memory amount from the sysfs. As a fallback the environment variable GPU_MEMORY_OVERRIDE remains. Another environment variable GPU_MEMORY_RESERVED can be used to reserve a dedicated byte amount outside of kubernetes usage. Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>	2020-10-26 09:45:43 +02:00
Mikko Ylinen	161298190f	sgx: make SGX NFD kustomization overlay independent With the addition of SGX webhook in the operator, full SGX stack depends on having the operator deployed first. SgxDevicePlugin CRD is set to get intel-sgx-plugin and intel-sgx-initcontainer deployed by the operator. As a pre-requisite, node-feature-discovery must be deployed but it is currently deployed via sgx_plugin kustomization overlay only. It's better to allow NFD with the SGX specific settings deployed with a kustomization of its own. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2020-10-23 12:44:36 +03:00

574 Commits