intel-device-plugins-for-kubernetes


Mirror of https://github.com/intel/intel-device-plugins-for-kubernetes.git, synced 2025-06-03 03:59:37 +00:00

Author	SHA1	Message	Date
Tuomas Katila	13e20c9abf	gpu: fix documentation links Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>	2022-09-16 15:25:45 +03:00
Tuomas Katila	e375186458	Update cmd/gpu_plugin/README.md Co-authored-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>	2022-09-15 15:30:23 +03:00
Tuomas Katila	c562db9b28	gpu: Improve installation options and documentation Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>	2022-09-15 15:19:23 +03:00
Tuomas Katila	230570f12e	gpu: add mentions about data center gpu support Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>	2022-09-09 13:07:50 +03:00
Ed Bartosh	f0dd95274e	Merge pull request #1126 from mythi/PR-2022-054 docs: rework development guide	2022-09-02 17:59:15 +03:00
Mikko Ylinen	1b3accacc2	docs: rework development guide Currently, each individual plugin README documents roughly the same daily development steps to git clone, build, and deploy. Re-purpose the plugin READMEs more towards cluster-admin documentation and start moving all development-related documentation to DEVEL.md. The same is true for e2e testing documentation, which is scattered in places where it doesn't belong. It is good to have all day-to-day development how-tos in a centralized place. Finally, the cleanup includes some harmonization of the plugins' tables of contents, which now follow the pattern: * [Introduction](#introduction) (* [Modes and Configuration Options](#modes-and-configuration-options)) * [Installation](#installation) (* [Prerequisites](#prerequisites)) * [Pre-built Images](#pre-built-images) * [Verify Plugin Registration](#verify-plugin-registration) * [Testing and Demos](#testing-and-demos) * ... Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2022-08-31 20:00:15 +03:00
Eero Tamminen	fb18923298	Log GPU device share count & type count changes separately And instead of accessing DeviceTree internals, add a suitable method for it. Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>	2022-08-31 17:23:57 +03:00
Mikko Ylinen	d826548d29	Merge pull request #1113 from eero-t/gpu-count-log More detailed log for number of found GPU devices / resource types	2022-08-29 09:58:53 +03:00
Eero Tamminen	ddf2c8bc8f	More detailed log for number of found GPU devices / resource types Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>	2022-08-26 17:51:27 +03:00
Eero Tamminen	5666b8fa30	Add "prefix" option to GPU plugin for scalability testing GPU plugin code assumes that container paths match host paths, and the container runtime prevents creating fake files under real paths. When non-standard paths are used, devices can be faked for scalability testing. Note: If one wants to run both the normal GPU plugin and the faked one in the same cluster, all nodes providing fake "i915" resources should be labeled differently from the ones with the real GPU plugin + devices, so that real GPU workloads can be limited to the correct nodes with a suitable nodeSelector. Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>	2022-08-24 14:32:53 +03:00
Mikko Ylinen	642c4f7b59	build: move to Go 1.19 and golangci-lint 1.48 because of that Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2022-08-15 10:13:37 +03:00
Mikko Ylinen	2adad5ae76	drop deprecated grpc.WithInsecure() grpc-go v1.43.0 deprecated grpc.WithInsecure() in favor of insecure.NewCredentials(). Move to use the recommended approach and drop the linter annotations. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2022-04-07 13:40:51 +03:00
Mikko Ylinen	0f36cde605	Merge pull request #935 from tkatila/gpu/tiles-support-and-numa-mapping gpu: add tiles annotation support	2022-03-30 19:33:09 +03:00
Tuomas Katila	8f6a235b5d	gpu: Start using GetPreferredAllocation with fractional resources Move reallocate logic to getpreferredallocation and simplify allocate to use the kubelet's device ids. Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>	2022-03-30 11:32:49 +03:00
Hyeongju Johannes Lee	7eeaddc563	gpu: fix typo in implementation of preferredAllocator interface Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>	2022-03-28 05:04:32 -07:00
Tuomas Katila	db7e5bfc55	Add support for gas-container-tiles annotation Adds functionality to convert a container's tile annotation into the corresponding L0 affinity mask. This helps to target the container's workload to specific L0 subdevices. Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>	2022-03-24 14:13:35 +02:00
Mikko Ylinen	c064bfc4f1	demo: add intel-opencl-icd Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2022-02-24 11:06:27 +02:00
Ed Bartosh	55f3e17dd0	add 'annotations' parameter to the NewDeviceInfo API Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>	2022-02-07 15:15:30 +02:00
Eero Tamminen	36046d90a4	Make GPU plugin / resource label limitations more explicit While the labeling limit is obvious after a little thought, IMHO limitations like this should either be stated up front, or be in their own section in the README. This commit does the former for the GPU plugin fractional resources, and the latter for the NFD hook / labeling.	2022-01-04 11:43:08 +02:00
dependabot[bot]	9a16e80f2b	build(deps): bump google.golang.org/grpc from 1.42.0 to 1.43.0 Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.42.0 to 1.43.0. - [Release notes](https://github.com/grpc/grpc-go/releases) - [Commits](https://github.com/grpc/grpc-go/compare/v1.42.0...v1.43.0) --- updated-dependencies: - dependency-name: google.golang.org/grpc dependency-type: direct:production update-type: version-update:semver-minor ... --- In addition to the changes made by dependabot, I add nolint comments to ignore staticcheck (SA1019) errors. This is because insecure.NewCredentials(), recommended as the alternative, is still declared experimental. So keep grpc.WithInsecure() with a nolint comment. Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>	2021-12-20 04:50:39 -08:00
Ed Bartosh	cec004c398	lint: enable wsl check Fixes: #392 Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>	2021-12-17 11:48:48 +02:00
Eero Tamminen	599fc18e71	Provide workaround for the media issue and document it The issue is with VA-API and QSV, not VPL media API.	2021-12-15 18:40:33 +02:00
Xu, Guoshu	e4c4a8f7ac	GPU devices resource preferred allocation methods. 1. Implement PreferredAllocator interface. 2. Provide 3 preferred allocation policies: balancedPolicy, packedPolicy and nonePolicy. 3. Provide the cmdline interface: -allocation-policy balanced/packed/none, to select which preferred allocation policy to use. 4. Add operator support. Co-authored-by: Mikko Ylinen <mikko.ylinen@intel.com>	2021-11-17 22:55:10 +08:00
Mikko Ylinen	e6cf299750	gpu: update READMEs Commit `00a59e8f7d` was not complete in that it didn't update the corresponding documentation. This commit fixes that. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2021-10-08 11:57:16 +03:00
Hyeongju Johannes Lee	8fc5df7e37	Add govet-fieldalignment Add govet-fieldalignment to .golangci.yml Fix errors that come from adding govet-fieldalignment - by reordering the fields of structs - by putting nolint:govet annotations Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>	2021-09-20 20:59:04 +03:00
Ukri Niemimuukko	0670a82cb1	gpu rm linter comment fixes Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>	2021-09-10 14:35:13 +03:00
Li Ning	dcc12d9089	documentation: remove deprecated toc section in README The 'Verify node kubelet config' content was removed in `6b208f8`. Signed-off-by: Li Ning <ning.a.li@transwarp.io>	2021-09-07 19:38:41 +08:00
Hyeongju Johannes Lee	09ba9fde00	Update tool versions and fix errors and warnings that originated from the update Update tool versions Fix the errors and warnings that originated from the update: -Correct type deviceInfo (->DeviceInfo) to make it public -Fix gpu_plugin.go and vpu_plugin_test.go where stylecheck errors occur -Fix deprecation warnings -Rename type 'PatcherManager' to 'Manager' to solve exported errors -Rename type 'SgxMutator' to 'Mutator' to solve exported errors Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>	2021-08-25 07:09:34 +00:00
Eero Tamminen	83e7de0d41	Make GPU plugin intro information more generic & accurate - Information on the specific HW & virtualization types that the GPU plugin is tested on belongs in the release notes, not in the README intro (where it has already become obsolete) - HW offloading is provided by driver backends, not frontends (e.g. OneVPL is just one of the media driver frontends)	2021-06-22 18:27:17 +03:00
Ukri Niemimuukko	b0130e693f	more documentation for fractional resources This adds a section heading, TOC link, command line flag description and a short explanation of what other dependent configuration changes are needed with fractional resources in order for the command line flag to achieve something useful. Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>	2021-06-14 16:25:38 +03:00
Ed Bartosh	98f80b5f47	Merge pull request #652 from uniemimu/hookupdate add link to gpu_nfdhook and update hook README	2021-06-13 12:15:46 +03:00
Eero Tamminen	a2faa3a8fc	Add section on GPU plugin options to its README	2021-06-11 19:55:43 +03:00
Ukri Niemimuukko	cbf7bab114	add link to gpu_nfdhook and update hook README This adds a link from gpu-plugin README to the nfdhook README, and updates the nfdhook README with label descriptions. Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>	2021-06-11 18:54:44 +03:00
skaajas	956154c1db	Updated GPU plugin-specific readme general description.	2021-06-11 15:50:14 +03:00
Ed Bartosh	9d8fb392f5	Merge pull request #637 from uniemimu/skip add pf skip to gpu nfdhook	2021-06-11 10:57:39 +03:00
Ukri Niemimuukko	e3bf21dbe9	gpu_plugin: add documentation links to gpu aware scheduling Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>	2021-06-10 19:46:35 +03:00
Ukri Niemimuukko	7ca5cfcfd6	add pf skip to gpu nfdhook This corresponds to the previous gpu-plugin skip code. Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>	2021-06-10 18:44:57 +03:00
Dmitry Rozhkov	6aa1a47c9a	Merge pull request #638 from uniemimu/fractional gpu_plugin: fractional resource management	2021-06-09 10:58:10 +03:00
Ukri Niemimuukko	2c4d529d66	gpu_plugin: fractional resource management Fractional resource management feature Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com> Signed-off-by: Dmitry Rozhkov <dmitry.rozhkov@intel.com>	2021-06-04 13:06:50 +03:00
Mikko Ylinen	facb4214a2	tree-wide: drop deprecated io/ioutil Go 1.16 release notes announced the deprecation of io/ioutil [1]. It's easy for us to move to what was recommended, so just do it. [1] https://golang.org/doc/go1.16#ioutil Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2021-06-02 13:41:15 +03:00
Eero Tamminen	c575ce9099	Document GPU plugin test code test-case struct members	2021-05-06 11:02:57 +03:00
Eero Tamminen	57c8d76e1b	Add minimal GPU plugin options testing Tests plugin scan results in setups having none, one and multiple eligible GPU devices, with and without SRIOV enabled, with two different options values. This does not cover verifying number of devices added under "i915_monitoring" resource as that would be much larger change.	2021-05-05 17:09:09 +03:00
Eero Tamminen	ca9aa32556	Add "-enable-monitoring" option to GPU plugin Make "i915_monitoring" resource (granting access to all GPUs) optional so that it can be enabled only when it's needed.	2021-05-05 17:09:09 +03:00
Eero Tamminen	713c1ab170	Move GPU plugin CLI options to a struct To help in: * adding more CLI options in next and later commits, and * to replace magic newDevicePlugin() input parameters with explicitly named one(s)	2021-05-05 17:09:09 +03:00
Eero Tamminen	06fac8128f	Move GPU plugin sysfs device compatibility checks to own function To reduce scan() function complexity before adding more functionality to it.	2021-05-05 17:08:49 +03:00
Eero Tamminen	79b86fea2d	Skip PF for "i915" resource when it has VFs NOTE: this has an impact only on GPUs which are virtualized with SR-IOV. Access to physical devices (PFs) is disabled for the "i915" resource when they have configured virtual devices (VFs). This is because: * GPU resources are expected to be evenly split between VFs in such configurations * But the PF resource amount is expected to differ from the VFs and typically retain only enough resources (just a few MB of RAM) to be able to provide GPU metrics that are not available from VFs * Neither the current GPU plugin, nor Kubernetes scheduling in general, has proper support for heterogeneous GPUs (= capability based scheduling) Therefore the "i915" resource needs to be limited to GPU devices with a homogeneous amount of resources, which in SR-IOV configurations is expected to be the case only with VFs (when such are present).	2021-05-05 14:13:48 +03:00
Eero Tamminen	e418c00fca	Add "i915_monitoring" resource to GPU plugin Which mounts all (Intel) GPU devices to the requesting container. This is needed e.g. to get GPU metrics from the node. The requesting pod does not know how many GPUs are on the node it gets assigned to, so there needs to be a means to request them all. (The only alternative to the new resource would be requesting Privileged mode, which is clearly worse, as that would grant the pod access also to all other devices and capabilities.) This commit also: * Adds "i915_monitoring" resource testing to: go test -v -run Scan * Splits the GPU plugin tests' mock file system setup into a separate createTestFiles() function, because otherwise TestScan() does not pass the project's golangci-lint complexity limits	2021-04-27 14:21:05 +03:00
Eero Tamminen	f9158c1c3b	Update GPU plugin copyrights	2021-04-01 15:20:35 +03:00
Eero Tamminen	8ca19d408f	Fix GPU plugin error messages	2021-04-01 15:20:35 +03:00
Eero Tamminen	384d37ead0	Add test for multiple GPU devices	2021-04-01 15:20:35 +03:00


95 Commits