23 Commits

Author SHA1 Message Date
Tuomas Katila
13e20c9abf gpu: fix documentation links
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2022-09-16 15:25:45 +03:00
Eero Tamminen
0b519ecf1e Deprecate debugfs GPU IP block version labels in NFD hook doc
There's no mapping available from IP block versions to actual product
features, which makes these version numbers fairly useless for end
users.

In mixed GPU clusters, running a job that adds/updates node labels for
the relevant GPU features to each relevant node would be much more
user-friendly.  This could be done easily by converting the output of
a given GPU API capability tool (e.g. "vainfo" for VA-API, "clinfo"
for OpenCL) to an NFD feature file.

(Such a thing would be outside this project's scope, though, except
maybe as an example / test case.)

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
2022-08-24 16:55:01 +03:00
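
The conversion suggested in the commit above could end in a step like the following. This is a minimal sketch, not part of the project: it assumes the capability tool output has already been reduced to name/value pairs, and that the target is an NFD local-source feature file with one `name=value` entry per line. The function name `renderFeatureFile` and the feature names in `main` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// renderFeatureFile turns feature name/value pairs into feature file
// content, one "name=value" line per feature. Names are sorted so the
// output is stable between runs.
func renderFeatureFile(features map[string]string) string {
	names := make([]string, 0, len(features))
	for name := range features {
		names = append(names, name)
	}
	sort.Strings(names)

	var b strings.Builder
	for _, name := range names {
		fmt.Fprintf(&b, "%s=%s\n", name, features[name])
	}
	return b.String()
}

func main() {
	// Hypothetical feature names, for illustration only.
	fmt.Print(renderFeatureFile(map[string]string{
		"gpu.example/opencl": "true",
		"gpu.example/va-api": "true",
	}))
}
```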
Eero Tamminen
0b7cbc862d Improve GPU NFD hook documentation
Add table of contents, simplify introduction text.

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
2022-08-24 16:55:01 +03:00
Tuomas Katila
bdd72c8cf7 gpu: Add numa node mapping label for GPUs
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2022-03-24 14:29:05 +02:00
Tuomas Katila
6f57c55ef8 Add a total tile count to node's labels
This label isn't dependent on the debugfs as the platform
specific tile count is.

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2022-01-26 09:57:33 +02:00
Ukri Niemimuukko
7520393041 gpu_nfdhook: gpu-numbers and pci-groups
This adds a new label "gpu-numbers" for short numbered lists of
gpus, omitting "card" from the names. Also adds splitting of long
label values.

Similarly, this adds a new label "pci-groups" for PCI groups. Grouping
can be controlled by env var GPU_PCI_GROUPING_LEVEL. The env var
dictates how many pci-folder names need to match for GPUs to be
considered to belong to the same group.

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2022-01-25 09:17:56 +02:00
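
The "gpu-numbers" shortening and the splitting of long label values described above can be sketched as follows. This is an illustration under assumptions, not the hook's actual code: the "." separator and the function names `gpuNumbers` and `splitLabelValue` are hypothetical. The 63-character limit is the Kubernetes maximum for a label value.

```go
package main

import (
	"fmt"
	"strings"
)

// maxLabelValueLen is the Kubernetes limit for the length of a label value.
const maxLabelValueLen = 63

// gpuNumbers drops the "card" prefix from DRM card names,
// e.g. "card12" becomes "12".
func gpuNumbers(cards []string) []string {
	nums := make([]string, 0, len(cards))
	for _, c := range cards {
		nums = append(nums, strings.TrimPrefix(c, "card"))
	}
	return nums
}

// splitLabelValue joins items with sep into as few chunks as possible,
// each at most maxLabelValueLen characters long, without splitting an item.
// An item that alone exceeds the limit is left as its own chunk.
func splitLabelValue(items []string, sep string) []string {
	var chunks []string
	current := ""
	for _, item := range items {
		switch {
		case current == "":
			current = item
		case len(current)+len(sep)+len(item) <= maxLabelValueLen:
			current += sep + item
		default:
			chunks = append(chunks, current)
			current = item
		}
	}
	if current != "" {
		chunks = append(chunks, current)
	}
	return chunks
}

func main() {
	cards := []string{"card0", "card1", "card12"}
	fmt.Println(splitLabelValue(gpuNumbers(cards), "."))
}
```

Each returned chunk would then go into its own label; how the extra labels are named is left open here.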
Eero Tamminen
36046d90a4 Make GPU plugin / resource label limitations more explicit
While the labeling limit is obvious after a little thought, IMHO
limitations like this should either be stated up front, or be in
their own section in the README.  This commit does the former for the
GPU plugin fractional resources, and the latter for the NFD hook /
labeling.
2022-01-04 11:43:08 +02:00
Ukri Niemimuukko
46dcffc33e README typofix
Label descriptions had extra underscores.

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2021-12-28 12:01:40 +02:00
Ed Bartosh
cec004c398 lint: enable wsl check
Fixes: #392

Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2021-12-17 11:48:48 +02:00
Eero Tamminen
bcc737bd2a Adapt GPU label support to debugfs DRM entry changes
The GPU generation "gen" number is replaced in the capability files of
the latest kernels with separate display, graphics, and media versions.

For compatibility with newer kernels, provide "gen" based on the new
labels (but without decimals), and for older kernel compatibility, new
labels based on the "gen".

Because different kernels match different items from the action map,
the whole capability file will get parsed. Capability file parsing is
optimized by using a prefix check instead of scanf.

"platform_gen" label is deprecated, and can be dropped whenever it
becomes inconvenient (lint complains about line count etc).
2021-12-16 21:22:31 +02:00
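
The compatibility logic described above can be sketched roughly like this. It is a hedged illustration only: the capability line formats ("gen: 9", "graphics version: 12.5"), the label names, and the function `parseCapabilities` are assumptions, and the hook's real action map is not reproduced here.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// newVersionLabels are the assumed names for the per-block version labels.
var newVersionLabels = []string{"graphics_version", "media_version", "display_version"}

// parseCapabilities scans the whole capability text and matches each line
// with a plain prefix check. It then fills in "gen" from the graphics
// version (major part only) on newer kernels, and fills in the new
// version labels from "gen" on older kernels.
func parseCapabilities(text string) map[string]string {
	prefixes := map[string]string{
		"gen: ":              "gen",
		"graphics version: ": "graphics_version",
		"media version: ":    "media_version",
		"display version: ":  "display_version",
	}

	labels := map[string]string{}
	scanner := bufio.NewScanner(strings.NewReader(text))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		for prefix, name := range prefixes {
			if strings.HasPrefix(line, prefix) {
				labels[name] = strings.TrimSpace(strings.TrimPrefix(line, prefix))
			}
		}
	}

	if gen, ok := labels["gen"]; ok {
		// Older kernel: derive any missing new labels from "gen".
		for _, name := range newVersionLabels {
			if _, present := labels[name]; !present {
				labels[name] = gen
			}
		}
	} else if gv, ok := labels["graphics_version"]; ok {
		// Newer kernel: derive "gen" from the graphics version, no decimals.
		labels["gen"] = strings.SplitN(gv, ".", 2)[0]
	}
	return labels
}

func main() {
	fmt.Println(parseCapabilities("graphics version: 12.5\nmedia version: 12\n"))
	fmt.Println(parseCapabilities("gen: 9\n"))
}
```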
Hyeongju Johannes Lee
13f4ce82a1 Remove nolint annot.
Remove the annotation nolint:funlen since funlen is not used anymore.
2021-10-11 11:36:24 +03:00
Dmitry Rozhkov
19d54b9fe8
Merge pull request #707 from uniemimu/mem_read
gpu nfdhook: new memory amount reading logic
2021-09-23 10:33:41 +03:00
Ukri Niemimuukko
64290020d7 gpu nfdhook: new memory amount reading logic
This changes the memory reading to be done through the
lmem_total_bytes file instead of the addr_range file.

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2021-09-21 13:50:41 +03:00
Hyeongju Johannes Lee
8fc5df7e37 Add govet-fieldalignment
Add govet-fieldalignment to .golangci.yml
Fix errors that come from adding govet-fieldalignment
- by reordering the fields of structs
- by putting nolint:govet annotations

Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
2021-09-20 20:59:04 +03:00
Ukri Niemimuukko
cbf7bab114 add link to gpu_nfdhook and update hook README
This adds a link from gpu-plugin README to the nfdhook README, and
updates the nfdhook README with label descriptions.

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2021-06-11 18:54:44 +03:00
Ukri Niemimuukko
7ca5cfcfd6 add pf skip to gpu nfdhook
This corresponds to the previous gpu-plugin skip code.

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2021-06-10 18:44:57 +03:00
Mikko Ylinen
facb4214a2 tree-wide: drop deprecated io/ioutil
Go 1.16 release notes announced the deprecation of io/ioutil [1]. It's easy
for us to move to what is recommended, so just do it.

[1] https://golang.org/doc/go1.16#ioutil

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-06-02 13:41:15 +03:00
Ukri Niemimuukko
bb44156d4f gpu_nfdhook: make memory parsing more robust
This adds support for also parsing hex and octal amounts.

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2021-04-09 16:23:48 +03:00
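
One way to accept decimal, hex, and octal amounts in Go is to let `strconv.ParseUint` infer the base from the prefix by passing base 0. The sketch below shows that technique; whether the hook does it this way is an assumption, and `parseMemoryAmount` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMemoryAmount parses a byte amount written in decimal ("4096"),
// hex ("0x1000") or octal ("010000"). With base 0, strconv.ParseUint
// picks the base from the prefix of the string.
func parseMemoryAmount(raw string) (uint64, error) {
	return strconv.ParseUint(strings.TrimSpace(raw), 0, 64)
}

func main() {
	// All three inputs describe the same amount, 4096 bytes.
	for _, s := range []string{"4096", "0x1000", "010000"} {
		n, err := parseMemoryAmount(s)
		fmt.Println(s, n, err)
	}
}
```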
Ukri Niemimuukko
f89b61f923 add tile count label
Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2021-02-26 20:39:48 +02:00
Mikko Ylinen
0892a34705 move to k8s.io v1.20.x and klog/v2 v2.4.0
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-01-21 15:34:39 +02:00
Ukri Niemimuukko
5d31dca018 gpu_nfdhook: remove devfs dependency
This removes the devfs dependency. Sysfs is sufficient for scanning
the presence of GPUs.

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2020-12-23 15:43:48 +02:00
Ukri Niemimuukko
5b5180ae00 gpu_nfdhook memory amount reading from sysfs
This adds reading of the GPU memory amount from the sysfs. As a
fallback the environment variable GPU_MEMORY_OVERRIDE remains.

Another environment variable GPU_MEMORY_RESERVED can be used to
reserve a dedicated byte amount outside of kubernetes usage.

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2020-10-26 09:45:43 +02:00
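
The interplay of the sysfs value and the two environment variables could look roughly like the sketch below. Only the variable names GPU_MEMORY_OVERRIDE and GPU_MEMORY_RESERVED come from the commit message; the function names, the decimal-only parsing, and the choice to subtract the reserved amount from the override as well are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// envBytes reads a byte amount from the named environment variable.
// Unset or unparsable values count as 0.
func envBytes(getenv func(string) string, name string) uint64 {
	v, err := strconv.ParseUint(getenv(name), 10, 64)
	if err != nil {
		return 0
	}
	return v
}

// usableMemory returns the memory amount to advertise: the sysfs value
// when available, else GPU_MEMORY_OVERRIDE, minus GPU_MEMORY_RESERVED.
// The result never goes below zero.
func usableMemory(sysfsBytes uint64, sysfsOK bool, getenv func(string) string) uint64 {
	total := sysfsBytes
	if !sysfsOK {
		total = envBytes(getenv, "GPU_MEMORY_OVERRIDE")
	}
	reserved := envBytes(getenv, "GPU_MEMORY_RESERVED")
	if reserved >= total {
		return 0
	}
	return total - reserved
}

func main() {
	// Pretend sysfs reading failed, so the override variable is consulted.
	fmt.Println(usableMemory(0, false, os.Getenv))
}
```

Passing the lookup function as a parameter keeps the sketch testable without touching the real environment.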
Ukri Niemimuukko
505eadaf94 gpu-plugin nfd-hook
This adds an nfd-hook for the gpu-plugin, which will create labels
for the GPUs that can then be used for pod deployment purposes or
for the creation of GPU extended resources, which in turn allow
finer-grained GPU resource management.

The nfd-hook will install to the host system when the
intel-gpu-initcontainer is run. It is added into the plugin deployment
yaml.

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2020-10-01 12:02:57 +03:00