Commit Graph

25 Commits

Author SHA1 Message Date
Mikko Ylinen
6255810e0d gpu/rm: move to fake.NewClientSet()
In k8s v1.32, client-go makes FakePods private, so the current
resourcemanager fake client no longer works.

client-go provides a simple fake client that is easy to use, so
just switch to it.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2025-01-02 12:00:34 +02:00
Tuomas Katila
518a8606ff gpu: add levelzero sidecar support for plugin and the deployment files
In addition to using Level-Zero for health data, this adds support for
scanning devices in WSL. Scanning is done by retrieving Intel device
indices from the Level-Zero API.

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2024-09-19 19:14:15 +03:00
Mikko Ylinen
4d858c5364 pkg/deviceplugin: move to grpc.NewClient()
grpc.NewClient(), added in grpc-go v1.63, is the preferred way to
create a new ClientConn. In most of our usages, moving away from
grpc.Dial*() to it is straightforward.

However, we have also relied on grpc.Dial*() automatically establishing
a new connection to "test" that the connection succeeds. grpc.NewClient()
does not connect on creation, so that behavior is no longer available.
Combined with the grpc.WithBlock() dial option, this usage is considered
an "especially bad" way to handle a client connection.

The recommended approach to test a server connection is to create the
connection separately and then wait for its state to become Ready. This
change follows that recommendation.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2024-05-28 08:17:06 +03:00
Mikko Ylinen
54f9d730e9 ci: move to golangci-lint v1.57.2
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2024-05-02 09:18:27 +03:00
Tuomas Katila
e600fe9313 gpu: add support for the upcoming xe-driver
The plugin can support both the i915 and xe drivers dynamically, but
having both drivers on the same node with RM is not possible.

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2024-03-12 11:34:01 +02:00
Tuomas Katila
8640b1501c gpu: default to flat/combined mode for l0 affinity mask
With tile requests, the Level Zero affinity mask now defaults to
flat/combined mode. If ZE_FLAT_DEVICE_HIERARCHY is set to COMPOSITE
in the Pod's specification, the plugin uses the previous "x.y" format
instead of the new "x" format in the affinity mask.

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-12-08 08:42:02 +02:00
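A minimal sketch of the two affinity mask formats described in 8640b1501c. The tile type and affinityMask function are hypothetical, and the flat index assumes that every card in the container exposes the same number of tiles and that flat mode numbers tiles card by card (card*tilesPerCard + tile); the plugin's actual computation may differ.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// tile identifies one tile (sub-device) of a GPU as seen inside the container.
type tile struct {
	cardIdx int // card index within the container, 0-based
	tileIdx int // tile index within that card, 0-based
}

// affinityMask builds a comma-separated ZE_AFFINITY_MASK value.
// composite selects the "card.tile" format used when
// ZE_FLAT_DEVICE_HIERARCHY=COMPOSITE; otherwise the flat "index" format is
// produced. Assumption: every card exposes tilesPerCard tiles.
func affinityMask(tiles []tile, tilesPerCard int, composite bool) string {
	parts := make([]string, 0, len(tiles))
	for _, t := range tiles {
		if composite {
			parts = append(parts, fmt.Sprintf("%d.%d", t.cardIdx, t.tileIdx))
		} else {
			parts = append(parts, strconv.Itoa(t.cardIdx*tilesPerCard+t.tileIdx))
		}
	}
	return strings.Join(parts, ",")
}

func main() {
	tiles := []tile{{0, 1}, {1, 0}}
	fmt.Println(affinityMask(tiles, 2, false)) // 1,2
	fmt.Println(affinityMask(tiles, 2, true))  // 0.1,1.0
}
```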
Tuomas Katila
827b9a0ced fix crash with rm when kubelet request times out
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-09-12 16:20:33 +03:00
Tuomas Katila
691dfc3483 gpu: refactor nfdhook functionality to plugin
NFD v0.14+ doesn't support binary NFD hooks by default, so label
creation needs to move out of the GPU nfdhook.

Move extended resource label creation to plugin, and drop labels that were
already marked deprecated (platform_gen, media_version etc.).

Drop the init-container from the deployment files and the operator. It is
still possible to use an init-container, but the default deployments do not
support it.

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-09-12 16:20:33 +03:00
Tuomas Katila
532f2fe8cd gpu/rm: add error check in kubelet flow
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-05-24 09:52:07 +03:00
Mikko Ylinen
e428cd6c19 go.mod: update to k8s 1.27.1 and controller runtime 0.15.x
k8s 1.27.x triggers build errors on controller-runtime 0.14.x
so we will need to update to 0.15.x at the same time.

Changes include:

* k8s e2e framework moved to use Ginkgo context so we add
  test context to all our test nodes.
* adapt Ginkgo parameter modifications.
* adapt SGX admissionwebhook to InjectDecoder removal.
* adapt deviceplugins and FPGA CRDs to controller-runtime
  API changes.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2023-05-09 14:49:24 +03:00
Tuomas Katila
974829ff7c gpu: try to fetch PodList from kubelet API
In large clusters with resource management, the load from
gpu-plugins on the api-server can become heavy.
This change starts fetching pod listings from the kubelet
and uses the api-server as a backup. Any error other than a timeout
will also move the logic back to using the api-server.

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-03-30 12:43:02 +03:00
Ukri Niemimuukko
3feb185277 randomize cleanup interval and increase it to 20 minutes
Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2023-03-24 10:39:55 +02:00
Tuomas Katila
527f638367 test: gpu: add fake target for grpc.Dial
In preparation for grpc 1.52.0.

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2023-01-12 11:50:47 +02:00
Ukri Niemimuukko
8ed705d79c unexport internal types
ContainerAssignments and PodAssignmentDetails do not need to be exported.

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2022-12-30 16:39:41 +02:00
Ukri Niemimuukko
41b7b55727 gpu: log errors from pod listing
Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2022-10-11 14:31:56 +03:00
Mikko Ylinen
642c4f7b59 build: move to Go 1.19 and golangci-lint 1.48 because of that
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2022-08-15 10:13:37 +03:00
Mikko Ylinen
2adad5ae76 drop deprecated grpc.WithInsecure()
grpc-go v1.43.0 deprecated grpc.WithInsecure() in favor of
insecure.NewCredentials(). Move to use the recommended approach
and drop the linter annotations.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2022-04-07 13:40:51 +03:00
Tuomas Katila
8f6a235b5d gpu: Start using GetPreferredAllocation with fractional resources
Move the reallocation logic to GetPreferredAllocation and simplify
Allocate to use the kubelet's device IDs.

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2022-03-30 11:32:49 +03:00
Tuomas Katila
db7e5bfc55 Add support for gas-container-tiles annotation
Adds functionality to convert a container's tile annotation
into the corresponding L0 affinity mask. This helps target a
container's workload to specific L0 subdevices.

Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
2022-03-24 14:13:35 +02:00
dependabot[bot]
9a16e80f2b build(deps): bump google.golang.org/grpc from 1.42.0 to 1.43.0
Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.42.0 to 1.43.0.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](https://github.com/grpc/grpc-go/compare/v1.42.0...v1.43.0)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

---
In addition to the changes made by dependabot, I added nolint comments to ignore staticcheck (SA1019) errors,
because insecure.NewCredentials(), the recommended alternative, is still declared experimental.
So keep grpc.WithInsecure() with a nolint comment.

Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
2021-12-20 04:50:39 -08:00
Ed Bartosh
cec004c398 lint: enable wsl check
Fixes: #392

Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2021-12-17 11:48:48 +02:00
Hyeongju Johannes Lee
8fc5df7e37 Add govet-fieldalignment
Add govet-fieldalignment to .golangci.yml
Fix errors that come from adding govet-fieldalignment
- by reordering the fields of structs
- by putting nolint:govet annotations

Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
2021-09-20 20:59:04 +03:00
Ukri Niemimuukko
0670a82cb1 gpu rm linter comment fixes
Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2021-09-10 14:35:13 +03:00
Hyeongju Johannes Lee
09ba9fde00 Update tool versions and fix errors and warnings that originated from the update
Update tool versions
Fix the errors and warnings that originated from the update:
- Rename type deviceInfo to DeviceInfo to make it public
- Fix stylecheck errors in gpu_plugin.go and vpu_plugin_test.go
- Fix deprecation warnings
- Rename type 'PatcherManager' to 'Manager' to resolve exported errors
- Rename type 'SgxMutator' to 'Mutator' to resolve exported errors

Signed-off-by: Hyeongju Johannes Lee <hyeongju.lee@intel.com>
2021-08-25 07:09:34 +00:00
Ukri Niemimuukko
2c4d529d66 gpu_plugin: fractional resource management
Fractional resource management feature

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
Signed-off-by: Dmitry Rozhkov <dmitry.rozhkov@intel.com>
2021-06-04 13:06:50 +03:00