40 Commits

Author SHA1 Message Date
Kevin Putnam
1d149ffee6 Documentation: Fixes broken links and standardizes headers.
Signed-off-by: Kevin Putnam <kevin.putnam@intel.com>
2020-09-22 08:32:21 -07:00
Dmitry Rozhkov
1b82ab9df6 sync README.md files with the current state of the code
Closes #356
2020-09-16 10:54:39 +03:00
Ukri Niemimuukko
b2991b94e1 gpu_plugin: reduce topology scanning for high shared dev count
For every created device info, a new topology scan is performed in
the filesystem. The shared dev count was implemented so that a new
device info was created for each shared device, which caused the
topology scan to run as many times per Scan round as there were
shared devs.

This fixes the issue by sharing one device info among the shared
devices.

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2020-09-08 18:57:29 +03:00
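The idea in the commit above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: `deviceInfo`, `newDeviceInfo`, `buildDeviceTree`, and the `scanCount` counter are hypothetical names, and the topology scan is simulated by incrementing a counter.

```go
package main

import "fmt"

// deviceInfo is a hypothetical stand-in for the plugin's device info type.
type deviceInfo struct {
	nodes    []string // device node paths, e.g. /dev/dri/card0
	topology string   // result of the (expensive) topology scan
}

// scanCount records how many simulated topology scans have run.
var scanCount int

// newDeviceInfo simulates creating a device info, which triggers one
// topology scan per call.
func newDeviceInfo(nodes []string) *deviceInfo {
	scanCount++
	return &deviceInfo{nodes: nodes, topology: "numa-node-0"}
}

// buildDeviceTree registers sharedDevNum device IDs per card. All IDs of a
// card point at the same deviceInfo, so the scan runs once per card rather
// than once per shared device.
func buildDeviceTree(cards []string, sharedDevNum int) map[string]*deviceInfo {
	tree := make(map[string]*deviceInfo)
	for _, card := range cards {
		info := newDeviceInfo([]string{"/dev/dri/" + card})
		for i := 0; i < sharedDevNum; i++ {
			tree[fmt.Sprintf("%s-%d", card, i)] = info
		}
	}
	return tree
}

func main() {
	tree := buildDeviceTree([]string{"card0", "card1"}, 10)
	// prints "devices: 20, topology scans: 2"
	fmt.Printf("devices: %d, topology scans: %d\n", len(tree), scanCount)
}
```

Creating the info inside the inner loop instead would run the simulated scan 20 times for the same input, which is the behaviour the commit message describes as the problem.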
Ukri Niemimuukko
7244bd0f25 gpu_plugin: README.md update
Move remark about GVT-d to end of introduction. Remove remarks
about GVT-g for the time being.

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2020-08-25 13:45:10 +03:00
Mikko Ylinen
cd068c797a ci: update tool versions
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-08-21 17:04:04 +03:00
Dmitry Rozhkov
73aea0aa1b linter: enable gosec check 2020-06-11 17:56:24 +03:00
Dmitry Rozhkov
70f862f2aa add golangci linter
In this initial commit the following checks are disabled due to the
excessive amount of changes required:
- dupl (duplicate code)
- funlen (function length)
- goerr113 (errors handling expressions)
- gomnd (magic numbers)
- gosec (security)
- nakedret (naked returns)
- wsl (forces you to use empty lines)
- errcheck (checking for unchecked errors)
- staticcheck (static analysis)
2020-06-08 14:01:13 +03:00
Dmitry Rozhkov
aabc45cbb5 gpu: increase code coverage for unit tests 2020-05-19 16:14:40 +03:00
Graham Whaley
626bbb6ee2 gpu: move to using klog
Move from fmt to klog for logging and debug.
Also add an extra info level message noting when we find
new devices.

Signed-off-by: Graham Whaley <graham.whaley@intel.com>
2020-03-20 11:54:38 +00:00
Mikko Ylinen
f145541caf READMEs: use git clone to get the code
go get'ing does not work due to our k8s.io/kubernetes dependency
so guide users to use git clone to get the code.

Fixes: #290

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2020-02-20 08:04:07 +02:00
Dmitry Rozhkov
3db440d2d4
Merge pull request #288 from askervin/kustomize-gpu
gpu_plugin: add kustomizations
2020-02-11 10:54:14 +02:00
Ed Bartosh
1f4928790f Implement function for DeviceInfo creation
- Made DeviceInfo fields private
- Implement NewDeviceInfo constructor
2020-02-07 15:26:37 +02:00
Antti Kervinen
d568f050c5 gpu_plugin: add kustomizations
- Default deployment: `kubectl apply -k deployments/gpu_plugin`
- Default deployment does not specify namespace anymore
  (was: `kube-system`).
- Variant: deploy only on nodes with Intel GPU label by NFD:
  `kubectl apply -k deployments/gpu_plugin/overlays/nfd_labeled_nodes`
- Variant: deploy to `kube-system` instead of user-defined namespace
  (or "default"):
  `kubectl apply -k deployments/gpu_plugin/overlays/namespace_kube-system`
- GPU plugin README updated.

Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
2020-02-07 14:56:52 +02:00
Graham Whaley
6537e38499 gpu: do not fail if device scanning fails
If we fail to scan for GPU devices (note, that is potentially
different from not finding any devices during a scan), then
warn on it, and go around the poll loop again. Do not treat
it as a fatal error or we might end up in a re-launch death
deploy loop...

Of course, getting a warning in your logs every 5s could also
be annoying, but is somewhat 'less fatal'.

Fixes: #260
Fixes: #230

Signed-off-by: Graham Whaley <graham.whaley@intel.com>
2020-01-29 09:24:50 +00:00
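A poll loop that tolerates scan failures, as the commit above describes, might look like the sketch below. The names `scanFunc`, `poll`, and `errScan` are hypothetical, and a short interval stands in for the 5s mentioned in the message so that the example finishes quickly.

```go
package main

import (
	"errors"
	"log"
	"time"
)

// errScan is a hypothetical error used to simulate a failed scan.
var errScan = errors.New("scan failed")

// scanFunc is a hypothetical scanner returning the device IDs it found.
type scanFunc func() ([]string, error)

// poll runs scan up to rounds times. A failed scan is logged as a warning
// and the loop continues with the next round instead of aborting. It
// returns the number of failed scans.
func poll(scan scanFunc, rounds int, interval time.Duration, notify func([]string)) int {
	failures := 0
	for i := 0; i < rounds; i++ {
		devices, err := scan()
		if err != nil {
			log.Printf("warning: failed to scan for devices: %v", err)
			failures++
		} else {
			notify(devices)
		}
		time.Sleep(interval)
	}
	return failures
}

func main() {
	calls := 0
	// flaky fails on the first call only, then reports one device.
	flaky := func() ([]string, error) {
		calls++
		if calls == 1 {
			return nil, errScan
		}
		return []string{"card0"}, nil
	}
	failures := poll(flaky, 3, 10*time.Millisecond, func(d []string) {
		log.Printf("found devices: %v", d)
	})
	log.Printf("scan failures tolerated: %d", failures)
}
```

An empty result with a nil error still reaches `notify`, which keeps "scan failed" distinct from "no devices found", the distinction the commit message draws.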
Graham Whaley
79a86c10e8 docs: gpu: Add more details, re-arrange section order
Re-arrange the section order a little (such as putting the use
of the DaemonSet before the sudo hand-deploy), and add a lot more
detail of what to expect, and how to check if the pod has launched
correctly.

Signed-off-by: Graham Whaley <graham.whaley@intel.com>
2020-01-17 13:34:13 +00:00
Graham Whaley
6705a8e461 docs: gpu: add high level details to README
Fill out the introduction to the GPU README to give some details around
what the plugin supports and how.

Signed-off-by: Graham Whaley <graham.whaley@intel.com>
2020-01-16 15:27:22 +00:00
Dmitry Rozhkov
814e2e1a50 bump k8s dependencies up to v1.17.0 2020-01-09 11:19:58 +02:00
Mikko Ylinen
fd631fc31c deployments/gpu_plugin: limit host mounts
The default deployment gives rather wide host mounts. We can limit
the mounts only to the subdirectories the plugin needs and mount
them read-only.

Also, add notes that both QAT and GPU plugins can be run as a non-root
user.

Fixes: #228

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2019-12-11 12:54:36 +02:00
Dmitry Rozhkov
44ff734be6 gpu: add log messages for not found cards
Let a user know the plugin can't find any Intel GPU if that's
the case. It might be cumbersome to realize that the plugin runs
on a host which doesn't have any Intel GPUs.

Also make the code less nested for better readability.
2019-05-24 16:19:06 +03:00
Dmitry Rozhkov
54332c5eea announce deviceplugin API public 2019-01-21 17:20:01 +02:00
Dmitry Rozhkov
7662cb9154 extend API to receive full specs instead of strings 2019-01-21 17:15:27 +02:00
Frederik Carlier
d6016dedf9 Fix typos 2018-11-22 20:44:00 +00:00
Ed Bartosh
14b4168cbd add GPU plugin deployment
Added DaemonSet yaml
Added deployment instructions to plugin's README
2018-09-14 13:55:08 +03:00
Dmitry Rozhkov
47464fa034 Update gpu_plugin/README.md not to mention beignet 2018-09-04 11:09:23 +03:00
Dmitry Rozhkov
eccd70c600 replace glog with simpler home-grown debug logging 2018-08-16 17:40:16 +03:00
Dmitry Rozhkov
2ff6c5929a Use annotated errors for tracing 2018-08-16 17:31:19 +03:00
Dmitry Rozhkov
40246f64ad gpu_plugin: add -shared-dev-num option
The DRM driver of Intel i915 GPUs allows sharing one device
between many containers.

Make it possible to use the same device from different containers.
The exact number of containers sharing the same device can be limited
with the new option -shared-dev-num set to 1 by default.

closes #53
2018-08-14 14:54:49 +03:00
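For illustration, an option like `-shared-dev-num` with a default of 1 could be parsed with Go's standard `flag` package as sketched here. The function name `parseSharedDevNum`, the help text, and the lower-bound check are assumptions and are not taken from the plugin's source.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseSharedDevNum parses the -shared-dev-num option from args.
// The default of 1 follows the commit message; rejecting values below 1
// is an assumption made for this sketch.
func parseSharedDevNum(args []string) (int, error) {
	fs := flag.NewFlagSet("gpu_plugin", flag.ContinueOnError)
	n := fs.Int("shared-dev-num", 1, "number of containers that can share the same GPU device")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	if *n < 1 {
		return 0, fmt.Errorf("-shared-dev-num must be at least 1, got %d", *n)
	}
	return *n, nil
}

func main() {
	n, err := parseSharedDevNum(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("containers per device:", n)
}
```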
Ed Bartosh
21e0e5c518
Merge pull request #48 from rojkov/refactor-dev-plugins
Refactor dev plugins to increase code reuse
2018-07-31 13:37:16 +03:00
MCamp859
fb5e20c14a Edits for format and text flow.
Signed-off-by: MCamp859 <mary.camp@ptiglobal.net>
2018-07-30 13:59:26 -04:00
MCamp859
f3f749d4f5 Edits for format and text flow.
Signed-off-by: MCamp859 <mary.camp@ptiglobal.net>
2018-07-30 13:54:01 -04:00
Dmitry Rozhkov
1e7dbac162 Update README.md files to reflect changes caused by refactoring
update demo files
2018-07-30 15:29:33 +03:00
Dmitry Rozhkov
bbee3fde77 refactor device plugins to increase code reuse
Every device plugin is supposed to implement PluginInterfaceServer
interface to be exposed as a gRPC service. But this functionality is
common for all our device plugins and can be hidden in a Manager
which manages all gRPC servers dynamically.

The only mandatory functionality that needs to be provided by a device
plugin, and which differentiates one plugin from another, is the code
scanning the host for devices present on it.

Refactor the internal deviceplugin package to accept only
one mandatory method implementation from device plugins - Scan().

In addition to that, a device plugin can optionally implement a
PostAllocate() method which mutates responses returned by the
PluginInterfaceServer.Allocate() method.

Also, to narrow the gap between these device plugins and
kubevirt's collection, the naming scheme for resources has been changed.
Now device plugins provide a namespace for the device types they
operate with. E.g. for resources in format "color.example.com/<color>"
the namespace would be "color.example.com". So, the resource name
"intel.com/fpga-region-fffffff" becomes "fpga.intel.com/region-fffffff".
2018-07-30 15:29:33 +03:00
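The split between a mandatory Scan() and an optional PostAllocate() described in the commit above can be expressed in Go with a type assertion, roughly as below. This is a simplified sketch under stated assumptions: the interface and type names, the method signatures, and the map-based response are hypothetical and do not mirror the real deviceplugin package or the gRPC types. The "gpu.intel.com" namespace is only illustrative.

```go
package main

import "fmt"

// Scanner is the single mandatory interface in this sketch.
type Scanner interface {
	Scan() ([]string, error)
}

// PostAllocator is optional; plugins may implement it to mutate responses.
type PostAllocator interface {
	PostAllocate(response map[string]string) error
}

// Manager hides the common serving logic and knows the plugin's namespace.
type Manager struct {
	namespace string
	plugin    Scanner
}

// ResourceName builds "<namespace>/<device type>".
func (m *Manager) ResourceName(devType string) string {
	return m.namespace + "/" + devType
}

// Allocate builds a response and calls PostAllocate only if the plugin
// implements the optional interface.
func (m *Manager) Allocate(deviceID string) (map[string]string, error) {
	response := map[string]string{"device": deviceID}
	if pa, ok := m.plugin.(PostAllocator); ok {
		if err := pa.PostAllocate(response); err != nil {
			return nil, err
		}
	}
	return response, nil
}

// fpgaPlugin implements both interfaces.
type fpgaPlugin struct{}

func (fpgaPlugin) Scan() ([]string, error) { return []string{"region-fffffff"}, nil }

func (fpgaPlugin) PostAllocate(r map[string]string) error {
	r["annotation"] = "programmed"
	return nil
}

// gpuPlugin implements only the mandatory Scan method.
type gpuPlugin struct{}

func (gpuPlugin) Scan() ([]string, error) { return []string{"card0"}, nil }

func main() {
	for _, m := range []*Manager{
		{namespace: "fpga.intel.com", plugin: fpgaPlugin{}},
		{namespace: "gpu.intel.com", plugin: gpuPlugin{}},
	} {
		ids, _ := m.plugin.Scan()
		resp, _ := m.Allocate(ids[0])
		fmt.Println(m.ResourceName(ids[0]), resp)
	}
}
```

With this shape, a new plugin only has to supply Scan(); the manager detects PostAllocate() at run time, so plugins that do not need it carry no extra code.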
Alexander D. Kanevskiy
6c08dbdb64
Merge pull request #54 from zhenyw/gpu
gpu_plugin: skip drm control node
2018-07-26 15:03:04 +03:00
Zhenyu Wang
ec632e0b38 gpu_plugin: skip drm control node
The DRM control node is deprecated and has been removed in the latest
kernels. This skips any DRM control node found on the host.

v2: Fix lint error
v3: Fix regex string

Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
2018-07-26 10:35:53 +08:00
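One way to skip control nodes is to accept only directory entries that match an anchored card pattern. The sketch below assumes the pattern `^card[0-9]+$` and a helper named `filterCards`; the exact expression used by the plugin is not shown in this log.

```go
package main

import (
	"fmt"
	"regexp"
)

// cardRegexp matches DRM card nodes such as "card0". It is anchored at both
// ends so that names like "controlD64" or "renderD128" do not match.
// The pattern is an assumption made for this sketch.
var cardRegexp = regexp.MustCompile(`^card[0-9]+$`)

// filterCards keeps only the entries that look like DRM card nodes.
func filterCards(entries []string) []string {
	var cards []string
	for _, e := range entries {
		if cardRegexp.MatchString(e) {
			cards = append(cards, e)
		}
	}
	return cards
}

func main() {
	entries := []string{"card0", "controlD64", "renderD128", "card1"}
	fmt.Println(filterCards(entries)) // prints "[card0 card1]"
}
```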
Zhenyu Wang
6f3543884f gpu_plugin: Fix regex string for drm card node
As noted in a pull request comment, fix the regex for the drm card node.

Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
2018-07-26 10:33:12 +08:00
ssehgal
3eb2b10f75 Enabling support for QuickAssist Devices 2018-07-23 17:35:37 +01:00
Dmitry Rozhkov
945f6e98f7 Update GPU demo to run FFT on the device
Also the demo runs on top of Intel compute runtime instead of
Beignet.
2018-07-05 16:19:16 +03:00
Ed Bartosh
6a3953fc85 reformatted *.go with gofmt -s -w
This is done to fix https://goreportcard.com warnings:

gofmt 33%
Gofmt formats Go programs. We run gofmt -s on your code, where -s is for the "simplify" command

intel-device-plugins-for-kubernetes/cmd/fpga_plugin/fpga_plugin_test.go
Line 1: warning: file is not gofmted with -s (gofmt)

intel-device-plugins-for-kubernetes/internal/deviceplugin/deviceplugin_test.go
Line 1: warning: file is not gofmted with -s (gofmt)

intel-device-plugins-for-kubernetes/cmd/gpu_plugin/gpu_plugin_test.go
Line 1: warning: file is not gofmted with -s (gofmt)

intel-device-plugins-for-kubernetes/cmd/fpga_plugin/fpga_plugin.go
Line 1: warning: file is not gofmted with -s (gofmt)
2018-05-28 16:59:19 +03:00
Ed Bartosh
983245b5a9 Reworked README.md
Split into 3 parts:
- main part with high level description of the repository
- Build and test Intel GPU Device Plugin for Kubernetes
- Build and test Intel FPGA Device Plugin for Kubernetes

Added Intel logo to the main README.md

Fixes #2
2018-05-25 10:31:53 +03:00
Alexander Kanevskiy
d4d77a71e4 Initial public code release 2018-05-18 18:30:54 +03:00