There's no mapping available from IP block versions to actual product
features, which makes these version numbers fairly useless for end
users.
In mixed GPU clusters, running a job that adds/updates node labels for
the relevant GPU features on each relevant node would be much more
user-friendly. This could be done easily by converting the output of a
given GPU API capability tool (e.g. "vainfo" for VA-API, "clinfo" for
OpenCL) to an NFD feature file.
(Such a thing would be outside this project's scope though, except
maybe as an example / test-case.)
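As a rough illustration of such a conversion, the sketch below turns
"vainfo"-style output into "name=value" lines of the kind NFD reads
from feature files. The "vaapi." label prefix, the function name and
the assumed "VAProfile... : VAEntrypoint..." line layout are
illustrative assumptions, not something this project provides.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// vainfoToFeatures converts "VAProfileX : VAEntrypointY" lines into
// "name=value" feature lines. The "vaapi." prefix is a made-up example name.
func vainfoToFeatures(output string) []string {
	seen := map[string]bool{}
	var features []string
	for _, line := range strings.Split(output, "\n") {
		fields := strings.SplitN(line, ":", 2)
		if len(fields) != 2 {
			continue
		}
		profile := strings.TrimSpace(fields[0])
		entrypoint := strings.TrimSpace(fields[1])
		// Skip header lines such as the driver or API version.
		if !strings.HasPrefix(profile, "VAProfile") ||
			!strings.HasPrefix(entrypoint, "VAEntrypoint") {
			continue
		}
		feature := "vaapi." + profile + "." + entrypoint + "=true"
		if !seen[feature] {
			seen[feature] = true
			features = append(features, feature)
		}
	}
	sort.Strings(features)
	return features
}

func main() {
	// Sample input typed in by hand; real tool output may differ.
	sample := "vainfo: Driver version: example\n" +
		"      VAProfileH264Main               :\tVAEntrypointVLD\n" +
		"      VAProfileNone                   :\tVAEntrypointVideoProc\n"
	for _, feature := range vainfoToFeatures(sample) {
		fmt.Println(feature)
	}
}
```

The printed lines could be redirected to a file in the NFD features
directory by the job that runs on each node.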
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
This adds a new label "gpu-numbers" for short numbered lists of
GPUs, omitting "card" from the names. It also adds splitting of long
label values.
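Splitting is needed because Kubernetes limits label values to 63
characters. A minimal sketch of the idea follows; the function name,
the "-partN" naming of the extra labels and the plain fixed-size
chunking are assumptions for illustration, not necessarily what this
commit implements.

```go
package main

import "fmt"

// maxLabelValueLen is the Kubernetes limit for the length of a label value.
const maxLabelValueLen = 63

// splitLabelValue cuts value into consecutive chunks of at most maxLen bytes.
// It assumes ASCII content, so cutting at a byte offset is safe.
func splitLabelValue(value string, maxLen int) []string {
	if maxLen <= 0 {
		return []string{value}
	}
	var chunks []string
	for len(value) > maxLen {
		chunks = append(chunks, value[:maxLen])
		value = value[maxLen:]
	}
	return append(chunks, value)
}

func main() {
	// Build a long dot-separated list of GPU numbers as sample input.
	value := "0"
	for i := 1; i < 40; i++ {
		value += fmt.Sprintf(".%d", i)
	}
	// Hypothetical naming scheme for the resulting labels.
	for i, chunk := range splitLabelValue(value, maxLabelValueLen) {
		fmt.Printf("gpu-numbers-part%d=%s\n", i, chunk)
	}
}
```

Kubernetes also requires label values to begin and end with an
alphanumeric character, which this sketch does not enforce at chunk
boundaries.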
Similarly, this adds a new label "pci-groups" for PCI groups. Grouping
can be controlled by the env var GPU_PCI_GROUPING_LEVEL, which
dictates how many PCI folder names need to match for GPUs to be
considered to belong to the same group.
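The grouping rule can be sketched roughly as below. The function
names, the sample PCI paths and the default level of 2 are
hypothetical; only the env var name comes from this commit.

```go
package main

import (
	"fmt"
	"os"
	"sort"
	"strconv"
	"strings"
)

// groupKey joins the first `level` PCI folder names of pciPath with "/".
func groupKey(pciPath string, level int) string {
	parts := strings.Split(strings.Trim(pciPath, "/"), "/")
	if level < 0 {
		level = 0
	}
	if level > len(parts) {
		level = len(parts)
	}
	return strings.Join(parts[:level], "/")
}

// groupByPCI groups card names whose PCI paths share the same group key.
// Groups and their members are sorted to give a stable result.
func groupByPCI(cardPaths map[string]string, level int) [][]string {
	byKey := map[string][]string{}
	for card, path := range cardPaths {
		key := groupKey(path, level)
		byKey[key] = append(byKey[key], card)
	}
	groups := make([][]string, 0, len(byKey))
	for _, cards := range byKey {
		sort.Strings(cards)
		groups = append(groups, cards)
	}
	sort.Slice(groups, func(i, j int) bool { return groups[i][0] < groups[j][0] })
	return groups
}

func main() {
	// Default level is an assumption made for this example only.
	level := 2
	if v, err := strconv.Atoi(os.Getenv("GPU_PCI_GROUPING_LEVEL")); err == nil {
		level = v
	}
	// Made-up PCI folder paths for three cards.
	cards := map[string]string{
		"card0": "pci0000:00/0000:00:01.0/0000:01:00.0",
		"card1": "pci0000:00/0000:00:01.0/0000:02:00.0",
		"card2": "pci0000:80/0000:80:01.0/0000:81:00.0",
	}
	fmt.Println(groupByPCI(cards, level))
}
```

With the sample paths, a level of 2 puts card0 and card1 in one group,
while a level of 3 leaves every card in its own group.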
Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
While the labeling limit is obvious after a little thought, IMHO
limitations like this should either be stated up front, or be in
their own section in the README. This commit does the former for the
GPU plugin fractional resources, and the latter for the NFD hook /
labeling.
The GPU generation "gen" number is replaced in the capability files of
the latest kernels with separate display, graphics, and media versions.
For compatibility with newer kernels, provide "gen" based on the new
labels (but without decimals), and for older kernel compatibility,
provide the new labels based on "gen".
Because different kernels match different items from the action map,
the whole capability file gets parsed. Capability file parsing is
optimized by using a prefix check instead of scanf.
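A minimal sketch of prefix-based parsing with the two-way
compatibility mapping is shown below. The line prefixes, the label
names and the choice to derive "gen" from the graphics version are
assumptions made for illustration; the real action map may differ.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// prefixToLabel is an illustrative action map from line prefix to label name.
var prefixToLabel = map[string]string{
	"gen: ":              "gen",
	"graphics version: ": "graphics_version",
	"media version: ":    "media_version",
	"display version: ":  "display_version",
}

// parseCapabilities scans every line and matches it by prefix, not scanf.
func parseCapabilities(content string) map[string]string {
	labels := map[string]string{}
	scanner := bufio.NewScanner(strings.NewReader(content))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		for prefix, label := range prefixToLabel {
			if strings.HasPrefix(line, prefix) {
				labels[label] = strings.TrimSpace(strings.TrimPrefix(line, prefix))
				break
			}
		}
	}
	// Newer kernels: derive "gen" from the graphics version, dropping decimals.
	if _, ok := labels["gen"]; !ok {
		if version, ok := labels["graphics_version"]; ok {
			labels["gen"] = strings.SplitN(version, ".", 2)[0]
		}
	}
	// Older kernels: fill in missing version labels from "gen".
	if gen, ok := labels["gen"]; ok {
		for _, name := range []string{"graphics_version", "media_version", "display_version"} {
			if _, ok := labels[name]; !ok {
				labels[name] = gen
			}
		}
	}
	return labels
}

func main() {
	fmt.Println(parseCapabilities("graphics version: 12.10\nmedia version: 12\n"))
	fmt.Println(parseCapabilities("gen: 9\n"))
}
```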
The "platform_gen" label is deprecated, and can be dropped whenever it
becomes inconvenient (lint complains about line count etc).
This changes the memory reading to be done through the
lmem_total_bytes file instead of the addr_range file.
Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
Add govet-fieldalignment to .golangci.yml
Fix errors that come from adding govet-fieldalignment
- by reordering the fields of structs
- by adding nolint:govet annotations
Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
This adds a link from gpu-plugin README to the nfdhook README, and
updates the nfdhook README with label descriptions.
Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
The Go 1.16 release notes announced the deprecation of io/ioutil [1].
It's easy for us to move to the recommended replacements, so just do it.
[1] https://golang.org/doc/go1.16#ioutil
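For reference, the kind of change involved looks like the sketch
below, which uses os.MkdirTemp, os.WriteFile and os.ReadFile in place
of ioutil.TempDir, ioutil.WriteFile and ioutil.ReadFile. The roundTrip
helper is made up for this example and is not code from this repo.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// roundTrip writes data to a temporary file and reads it back, using only
// the Go 1.16+ replacements for the deprecated io/ioutil helpers.
func roundTrip(data string) (string, error) {
	dir, err := os.MkdirTemp("", "example") // was ioutil.TempDir
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "value")
	if err := os.WriteFile(path, []byte(data), 0o600); err != nil { // was ioutil.WriteFile
		return "", err
	}

	content, err := os.ReadFile(path) // was ioutil.ReadFile
	if err != nil {
		return "", err
	}
	return string(content), nil
}

func main() {
	content, err := roundTrip("hello")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(content)
}
```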
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
This adds reading of the GPU memory amount from sysfs. The
environment variable GPU_MEMORY_OVERRIDE remains as a fallback.
Another environment variable, GPU_MEMORY_RESERVED, can be used to
reserve a dedicated byte amount outside of Kubernetes usage.
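The intended precedence can be sketched roughly as follows. The
function names, the placeholder sysfs path and the choice to treat an
unreadable or zero value as "missing" are assumptions for
illustration; only the two env var names come from this commit.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseBytes parses a decimal byte count and returns 0 if s is not a number.
func parseBytes(s string) uint64 {
	n, err := strconv.ParseUint(strings.TrimSpace(s), 10, 64)
	if err != nil {
		return 0
	}
	return n
}

// readSysfsBytes returns the byte count stored in path, or 0 if unreadable.
func readSysfsBytes(path string) uint64 {
	data, err := os.ReadFile(path)
	if err != nil {
		return 0
	}
	return parseBytes(string(data))
}

// usableMemory prefers the sysfs value, falls back to GPU_MEMORY_OVERRIDE,
// and then subtracts GPU_MEMORY_RESERVED without going below zero.
func usableMemory(sysfsBytes uint64, getenv func(string) string) uint64 {
	total := sysfsBytes
	if total == 0 {
		total = parseBytes(getenv("GPU_MEMORY_OVERRIDE"))
	}
	reserved := parseBytes(getenv("GPU_MEMORY_RESERVED"))
	if reserved >= total {
		return 0
	}
	return total - reserved
}

func main() {
	// Placeholder path; the real sysfs file location is not shown here.
	fromSysfs := readSysfsBytes("/path/to/gpu/memory/file")
	fmt.Println(usableMemory(fromSysfs, os.Getenv))
}
```

Passing the environment lookup as a function keeps the precedence
logic testable without touching the process environment.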
Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
This adds an nfd-hook for the gpu-plugin, which creates labels
for the GPUs. The labels can then be used for pod deployment purposes
or for creating GPU extended resources, which in turn allow
finer-grained GPU resource management.
The nfd-hook is installed on the host system when the
intel-gpu-initcontainer is run. It is added to the plugin deployment
yaml.
Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>