intel-device-plugins-for-kubernetes


mirror of https://github.com/intel/intel-device-plugins-for-kubernetes.git synced 2025-06-03 03:59:37 +00:00

Author	SHA1	Message	Date
Ed Bartosh	b4c2bd3afe	Merge pull request #1116 from eero-t/gpu_fakedev Add fake GPU device generator for scalability testing	2022-12-07 18:44:08 +02:00
Ukri Niemimuukko	59cd72a66f	fix gpu nfdhook numa labeling Numa labeling only worked when card numbering started from 0. Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>	2022-11-22 18:19:33 +02:00
Mikko Ylinen	afce0ed79c	Merge pull request #1196 from ozhuraki/e2e-operator operator: Add e2e tests for DSA, IAA	2022-11-17 21:30:33 +02:00
Oleg Zhurakivskyy	ef7954c8e1	operator: Add e2e tests for DSA, IAA Closes #1230 Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>	2022-11-17 17:47:21 +02:00
Hyeongju Johannes Lee	9b203ba6b8	iaa: fix the name of the demo image Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>	2022-11-11 15:37:11 +02:00
Mikko Ylinen	5876882066	operator: add support for Liveness and Readiness probes Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2022-11-03 10:25:07 +02:00
Hyeongju Johannes Lee	372dd73bfd	iaa: fix readme to have correct web links Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>	2022-10-31 13:07:17 +02:00
chaitanya1731	084bf53efb	Added ocp_quickstart_guide for OCP users Added operator installation steps for RedHat OpenShift Container Platform and updated main README to add the link Signed-off-by: chaitanya1731 <chaitanya.kulkarni@intel.com>	2022-10-13 01:10:31 -07:00
Ukri Niemimuukko	41b7b55727	gpu: log errors from pod listing Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>	2022-10-11 14:31:56 +03:00
Mikko Ylinen	75bff62ba1	Merge pull request #1183 from tkatila/gpu-demo-updates gpu: improve demo run instructions	2022-10-07 13:08:54 +03:00
Eero Tamminen	0b47ebd3e7	Add information on new DKMS kernel GPU driver packages Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>	2022-10-06 18:08:53 +03:00
Tuomas Katila	56bc5ebeee	Modifications based on Eero's comments Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>	2022-10-06 17:55:04 +03:00
Tuomas Katila	63cbe808a7	gpu: improve demo run instructions Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>	2022-10-05 16:10:03 +03:00
Mikko Ylinen	fd1b25b9d4	docs: move away from 01.org doc links Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2022-10-03 18:22:07 +03:00
Eero Tamminen	647b484e7a	Improve GPU drivers installation instructions - Add note about LTS kernel DKMS source repo - Correct note about the demo (unlike FPGA demo, GPU demo is not in docker hub) Fixes: `89d3c5a4f3` Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>	2022-09-28 12:40:30 +03:00
Eero Tamminen	9b3ee06cb1	Add GPU plugin README prerequisites section Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>	2022-09-23 20:32:46 +03:00
Tuomas Katila	eac635e439	gpu: fix documentation links Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>	2022-09-23 20:32:46 +03:00
Tuomas Katila	e375186458	Update cmd/gpu_plugin/README.md Co-authored-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>	2022-09-15 15:30:23 +03:00
Tuomas Katila	c562db9b28	gpu: Improve installation options and documentation Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>	2022-09-15 15:19:23 +03:00
Ed Bartosh	92cd51bec3	Merge pull request #1152 from mythi/PR-2022-063 Update SGX and FPGA webhook flags	2022-09-13 19:58:44 +03:00
Ed Bartosh	f2db3826d8	Merge pull request #1134 from mythi/PR-2022-058 qat: read device capabilities from sysfs	2022-09-13 19:56:45 +03:00
Mikko Ylinen	b81d2dcba8	Update SGX and FPGA webhook flags SGX Admission webhook was quickly forked from FPGA's implementation. After a bit of thinking, it turns out leader election and metrics are not necessary for an (idempotent) webhook-only functionality. For the FPGA Admission webhook, metrics aren't correctly set up, so it's better to disable that functionality. Leader election is kept, but the flag is renamed to align with "kubebuilder v3 functionality", similar to how we changed it in the operator. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2022-09-13 13:18:28 +03:00
Mikko Ylinen	3abf10d7ff	qat: read device capabilities from sysfs Linux 6.0 adds sysfs-driver-qat entries to read device capabilities: `42e66b1cc3/Documentation/ABI/testing/sysfs-driver-qat` Implement the logic for reading from sysfs and prefer that over debugfs. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2022-09-09 14:16:03 +03:00
Tuomas Katila	230570f12e	gpu: add mentions about data center gpu support Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>	2022-09-09 13:07:50 +03:00
Mikko Ylinen	307e960871	docs: fix remaining review comments Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2022-09-06 14:28:25 +03:00
Mikko Ylinen	8ac321f5e3	sgx: send nil TopologyInfo /dev/sgx_* cannot be mapped to any topology. SGX itself is topology aware but we cannot control it with TopologyInfo. Currently, pkg/topology returns empty TopologyInfo{Nodes:[]NUMANode{}} for /dev/sgx_* but kubelet TopologyManager (when enabled and with a policy other than 'none') interprets that as "Hint Provider has no possible NUMA affinities for resource" and rejects the SGX resources. What we want is "Hint Provider has no preference for NUMA affinity with resource". This is communicated using nil TopologyInfo. See: https://github.com/kubernetes/kubernetes/issues/112234 Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2022-09-06 08:43:04 +03:00
Ed Bartosh	f0dd95274e	Merge pull request #1126 from mythi/PR-2022-054 docs: rework development guide	2022-09-02 17:59:15 +03:00
Ed Bartosh	5756725b09	fix lint failure Removed unused import. This should fix this golangci-lint failure: can't run linter goanalysis_metalinter: buildir: failed to load package : could not load export data: no export data for "cloud.google.com/go/compute/metadata" Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>	2022-09-02 12:02:06 +03:00
Mikko Ylinen	1b3accacc2	docs: rework development guide Currently, each individual plugin README documents roughly the same daily development steps to git clone, build, and deploy. Re-purpose the plugin READMEs more towards cluster admin type of documentation and start moving all development related documentation to DEVEL.md. The same is true for e2e testing documentation, which is scattered in places where it doesn't belong. It is good to have all day-to-day development how-tos in a centralized place. Finally, the cleanup includes some harmonization of the plugins' tables of contents, which now follow the pattern: * [Introduction](#introduction) (* [Modes and Configuration Options](#modes-and-configuration-options)) * [Installation](#installation) (* [Prerequisites](#prerequisites)) * [Pre-built Images](#pre-built-images) * [Verify Plugin Registration](#verify-plugin-registration) * [Testing and Demos](#testing-and-demos) * ... Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2022-08-31 20:00:15 +03:00
Eero Tamminen	fb18923298	Log GPU device share count & type count changes separately And instead of accessing DeviceTree internals, add suitable method for it. Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>	2022-08-31 17:23:57 +03:00
Mikko Ylinen	d826548d29	Merge pull request #1113 from eero-t/gpu-count-log More detailed log for number of found GPU devices / resource types	2022-08-29 09:58:53 +03:00
Ed Bartosh	02446fca1d	Merge pull request #1114 from eero-t/prefix-option Add "prefix" option to GPU plugin for scalability testing	2022-08-26 22:54:28 +03:00
Eero Tamminen	9d4b52188e	Add "gpu_fakedev" documentation Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>	2022-08-26 19:05:10 +03:00
Eero Tamminen	cc3aebbefc	Add minimal example JSON to test "gpu_fakedev" generator Config file is suitably indented so that it can be directly appended to a suitable configMap header. Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>	2022-08-26 19:05:10 +03:00
Eero Tamminen	c15feea1f8	Add code for generating fake GPU sysfs + devfs files To facilitate GPU plugin scalability testing on a real cluster. Pre-existing (fake) sysfs & devfs content needs to be removed first: * Fake devfs directory is mounted from host so OCI runtime can "mount" device files also to workloads requesting fake devices. This means that those files can persist over fake GPU plugin life-time, and earlier files need to be removed, as they may not match * DaemonSet restarts failing init containers, so errors about content created on a previous generator run would prevent getting logs of the real error on the first generator run * Before removal, check that removed directory content is as expected, to avoid accidentally removing host sysfs/devfs content (in case container was erroneously granted access to the real thing) Container runtime requires fake device files to be real devices: * Use NULL devices to represent fake GPU devices: https://www.kernel.org/doc/Documentation/admin-guide/devices.txt * Give more detailed logging for MkNod() failures as device node creation is the operation most likely to fail when container does not have the necessary access rights Created content is based on JSON config file (instead of e.g. commandline options) so that it (provided by a configMap) can be updated independently of the pod where generator is run. Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>	2022-08-26 19:04:43 +03:00
Eero Tamminen	ddf2c8bc8f	More detailed log for number of found GPU devices / resource types Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>	2022-08-26 17:51:27 +03:00
Eero Tamminen	0b519ecf1e	Deprecate debugfs GPU IP block version labels in NFD hook doc There's no mapping available from IP block versions to actual product features, which makes these version numbers fairly useless for end users. In mixed GPU clusters, running a job that adds/updates node labels for the relevant GPU features to each relevant node would be much more user-friendly. This could be done easily by converting the output of a given GPU API capability tool (e.g. "vainfo" for VA-API, "clinfo" for OpenCL) to an NFD feature file. (Such thing would be outside of this project scope though, except maybe as an example / test-case.) Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>	2022-08-24 16:55:01 +03:00
Eero Tamminen	0b7cbc862d	Improve GPU NFD hook documentation Add table of contents, simplify introduction text. Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>	2022-08-24 16:55:01 +03:00
Eero Tamminen	5666b8fa30	Add "prefix" option to GPU plugin for scalability testing GPU plugin code assumes container paths to match host paths, and container runtime prevents creating fake files under real paths. When non-standard paths are used, devices can be faked for scalability testing. Note: If one wants to run both the normal GPU plugin and a faked one in the same cluster, all nodes providing fake "i915" resources should be labeled differently from ones with real GPU plugin + devices, so that real GPU workloads can be limited to correct nodes with a suitable nodeSelector. Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>	2022-08-24 14:32:53 +03:00
Ed Bartosh	6177dd0dfe	Merge pull request #1093 from mythi/PR-2022-050 build: move to Go 1.19	2022-08-16 00:25:57 +03:00
astronaut0131	2d155edac7	sgx: add kind deployment notes for aesmd	2022-08-15 15:26:01 +08:00
Mikko Ylinen	642c4f7b59	build: move to Go 1.19 and golangci-lint 1.48 because of that Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2022-08-15 10:13:37 +03:00
Chelsea Mafrica	24eb52a912	docs: Fix missing code block in operator doc Add missing code block to a section of the operator README. Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>	2022-08-05 11:32:48 -07:00
Mikko Ylinen	3c948cc106	Merge pull request #1063 from bart0sh/PR144-upgrade-libDLB dlb: update DLB to v7.7.0	2022-07-18 09:29:55 +03:00
Ed Bartosh	9f2db89da6	dlb: update DLB to v7.7.0 Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>	2022-07-03 15:08:14 +03:00
Huang Xin	89caad1cd4	doc: modify SGX device plugin deployments url from 'main' to '<RELEASE_VERSION>' Signed-off-by: Huang Xin <xin1.huang@intel.com>	2022-06-25 17:33:46 +08:00
Ed Bartosh	c82b907472	Merge pull request #1055 from mythi/PR-2022-045 operator: align with kubebuilder v3 functionality	2022-06-20 23:12:21 +03:00
Mikko Ylinen	f9ca36cc26	set TLSMinVersion for webhook servers Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2022-06-20 19:04:50 +03:00
Mikko Ylinen	b48568c43a	operator: align with kubebuilder v3 functionality kubebuilder v3 based scaffolding has updated many things and they are documented in [1]. Update operator's functionality to v3 level. We've done most/some of the changes earlier (e.g., by not using deprecated k8s APIs anymore) so the changes are minimal. [1] https://book.kubebuilder.io/migration/v2vsv3.html Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2022-06-20 16:35:40 +03:00
Oleg Zhurakivskyy	f1ec14d106	iaa: Add e2e tests Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com> Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>	2022-06-09 15:00:25 +03:00


445 Commits