Commit Graph

109 Commits

Author SHA1 Message Date
Alexander D. Kanevskiy
2a6eda891a
Merge pull request #68 from bart0sh/PR0030-fix-annotation-value
fpga_crihook: fix annotation value
2018-08-08 01:18:18 +03:00
Ed Bartosh
9de82c819f fpga_crihook: fix annotation value
Annotation value has been changed in FPGA plugin code recently.
Updated fpga_crihook to use the same value.
2018-08-07 17:40:16 +03:00
Ed Bartosh
71e8ea471a fpga_crihook: specify socket number when programming device
Added -S <device number> parameter to fpgaconf command line to
ensure usage of allocated device.
2018-08-07 17:32:59 +03:00
Alexander D. Kanevskiy
f8c3e9abf4
Merge pull request #62 from MCamp859/qat-readme-edits
QAT readme edits for format and text flow.
2018-08-03 20:56:52 +03:00
MCamp859
68c099db99 Fixed typo.
Signed-off-by: MCamp859 <mary.camp@ptiglobal.net>
2018-08-02 10:28:30 -04:00
MCamp859
1fe88a9067 Updated with review comments.
Signed-off-by: MCamp859 <mary.camp@ptiglobal.net>
2018-08-02 10:10:14 -04:00
MCamp859
4d5046f860 QAT readme edits for format and text flow.
Signed-off-by: MCamp859 <mary.camp@ptiglobal.net>
2018-07-31 16:08:27 -04:00
MCamp859
6544d35ab1 QAT readme edits for format and text flow.
Signed-off-by: MCamp859 <mary.camp@ptiglobal.net>
2018-07-31 16:04:47 -04:00
Mary Camp
51bb79bc60
Merge branch 'master' into fpga-readme-edits 2018-07-31 13:18:40 -04:00
MCamp859
be66697049 Replaced "afu" with "af" in 2 places.
Signed-off-by: MCamp859 <mary.camp@ptiglobal.net>
2018-07-31 10:33:18 -04:00
Ed Bartosh
21e0e5c518
Merge pull request #48 from rojkov/refactor-dev-plugins
Refactor dev plugins to increase code reuse
2018-07-31 13:37:16 +03:00
MCamp859
a29e04f614 Edits to FPGA readme files for format and text flow.
Signed-off-by: MCamp859 <mary.camp@ptiglobal.net>
2018-07-30 16:13:47 -04:00
MCamp859
fb5e20c14a Edits for format and text flow.
Signed-off-by: MCamp859 <mary.camp@ptiglobal.net>
2018-07-30 13:59:26 -04:00
MCamp859
f3f749d4f5 Edits for format and text flow.
Signed-off-by: MCamp859 <mary.camp@ptiglobal.net>
2018-07-30 13:54:01 -04:00
Dmitry Rozhkov
b6894b8195 refactor QAT plugin 2018-07-30 15:29:33 +03:00
Dmitry Rozhkov
972a80bedb fpga_admissionwebhook: update resource names 2018-07-30 15:29:33 +03:00
Dmitry Rozhkov
1e7dbac162 Update README.md files to reflect changes caused by refactoring
update demo files
2018-07-30 15:29:33 +03:00
Dmitry Rozhkov
bbee3fde77 refactor device plugins to increase code reuse
Every device plugin is supposed to implement PluginInterfaceServer
interface to be exposed as a gRPC service. But this functionality is
common for all our device plugins and can be hidden in a Manager
which manages all gRPC servers dynamically.

The only mandatory functionality that needs to be provided by a device
plugin and which differentiates one plugin from another is the code
scanning the host for devices present on it.

Refactor the internal deviceplugin package to accept only
one mandatory method implementation from device plugins - Scan().

In addition to that, a device plugin can optionally implement a
PostAllocate() method which mutates responses returned by
PluginInterfaceServer.Allocate() method.

Also to narrow the gap between these device plugins and the
kubevirt's collection the naming scheme for resources has been changed.
Now device plugins provide a namespace for the device types they
operate with. E.g. for resources in format "color.example.com/<color>"
the namespace would be "color.example.com". So, the resource name
"intel.com/fpga-region-fffffff" becomes "fpga.intel.com/region-fffffff".
2018-07-30 15:29:33 +03:00
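The split described in bbee3fde77 could be pictured roughly as below. This is a minimal sketch, not the repository's code: the names DeviceTree, Scanner, PostAllocator, colorPlugin and resourceNames are all hypothetical, and the real PostAllocate() works on the gRPC Allocate() response rather than on a plain map. It only shows one mandatory Scan() method, an optional hook detected with a type assertion, and resource names built as "<namespace>/<device type>".

```go
package main

import (
	"fmt"
	"sort"
)

// DeviceTree maps a device type (e.g. "region-fffffff") to device IDs.
// Hypothetical type used only in this sketch.
type DeviceTree map[string][]string

// Scanner is the single mandatory method a plugin provides in this sketch.
type Scanner interface {
	Scan() (DeviceTree, error)
}

// PostAllocator is optional; a manager can detect it with a type assertion.
// The map argument is a stand-in for the real Allocate() response.
type PostAllocator interface {
	PostAllocate(response map[string]string) error
}

// colorPlugin is a toy plugin that implements only Scan().
type colorPlugin struct{}

func (colorPlugin) Scan() (DeviceTree, error) {
	return DeviceTree{"red": {"dev0"}, "blue": {"dev1", "dev2"}}, nil
}

// resourceNames builds "<namespace>/<device type>" names in sorted order.
func resourceNames(namespace string, s Scanner) ([]string, error) {
	tree, err := s.Scan()
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(tree))
	for devType := range tree {
		names = append(names, namespace+"/"+devType)
	}
	sort.Strings(names)
	return names, nil
}

func main() {
	p := colorPlugin{}
	names, err := resourceNames("color.example.com", p)
	if err != nil {
		fmt.Println("scan failed:", err)
		return
	}
	fmt.Println(names)

	// The optional hook is discovered at run time, not required at compile time.
	_, hasHook := interface{}(p).(PostAllocator)
	fmt.Println("implements PostAllocate:", hasHook)
}
```

Keeping the optional hook behind a type assertion means a plugin that has nothing to mutate does not need an empty method.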
Mikko Ylinen
6c787ec658 qat: read maxNumdevice as integer 2018-07-26 14:20:44 +03:00
Alexander D. Kanevskiy
6c08dbdb64
Merge pull request #54 from zhenyw/gpu
gpu_plugin: skip drm control node
2018-07-26 15:03:04 +03:00
Ed Bartosh
a4b3f7f068 qat: fix formatting errors 2018-07-26 12:44:26 +03:00
Zhenyu Wang
ec632e0b38 gpu_plugin: skip drm control node
The DRM control node is deprecated and has been removed in the latest kernels.
This skips any DRM control node found on the host.

v2: Fix lint error
v3: Fix regex string

Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
2018-07-26 10:35:53 +08:00
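A minimal sketch of the kind of filter ec632e0b38 describes, assuming DRM entries are named like card0, controlD64 and renderD128. The pattern and the function name isCardNode are illustrative assumptions, not the expression used in the plugin.

```go
package main

import (
	"fmt"
	"regexp"
)

// cardNodeRE accepts names such as "card0" and nothing else.
// The pattern is an assumption made for this sketch.
var cardNodeRE = regexp.MustCompile(`^card[0-9]+$`)

// isCardNode reports whether a directory entry looks like a DRM card node.
func isCardNode(name string) bool {
	return cardNodeRE.MatchString(name)
}

func main() {
	for _, name := range []string{"card0", "card12", "controlD64", "renderD128"} {
		fmt.Printf("%-12s card node: %v\n", name, isCardNode(name))
	}
}
```

Anchoring the pattern at both ends is what keeps entries such as controlD64 out of the result.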
Zhenyu Wang
6f3543884f gpu_plugin: Fix regex string for drm card node
As noted on pull request comment, fix regex for drm card node.

Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
2018-07-26 10:33:12 +08:00
Alexander D. Kanevskiy
1140756037
Merge pull request #50 from swatisehgal/dev/qat
Enabling support for QuickAssist Devices
2018-07-24 16:13:18 +03:00
ssehgal
3eb2b10f75 Enabling support for QuickAssist Devices 2018-07-23 17:35:37 +01:00
Dmitry Rozhkov
ff813285ed typo fixed 2018-07-20 10:44:37 +03:00
Ed Bartosh
b1b2edf1b8 fpga_crihook: check if requested AF is programmed
Check if programmed AF id is equal to the requested AF id
after re-programming a device.
2018-07-18 12:27:45 +03:00
Ed Bartosh
9df1afdf43 fpga_crihook: check if intel annotation is set
Check if container annotation com.intel.fpga.mode is set to
"intel.com/fpga-region". This annotation is set by the device plugin.
So, the check should help to filter out unwanted workflows that
the device plugin is not aware of.
2018-07-16 16:12:59 +03:00
Ed Bartosh
2f9debe35b fpga_crihook: check if bitstream is already programmed
An FPGA device may already be programmed with the requested bitstream.
In this case the hook should not program the device again.
2018-07-13 14:16:27 +03:00
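The two checks from 2f9debe35b and b1b2edf1b8 (skip programming when the requested AF is already loaded, and verify the AF id after programming) might combine as in the sketch below. The device interface, ensureProgrammed and fakeDevice are hypothetical; the real hook drives an external programming tool rather than an in-memory fake.

```go
package main

import (
	"errors"
	"fmt"
)

// device is a hypothetical abstraction over one FPGA region.
type device interface {
	CurrentAF() (string, error)
	Program(afID string) error
}

// ensureProgrammed programs the device only when needed and then verifies
// the result. It returns true if a programming attempt was made.
func ensureProgrammed(d device, requested string) (bool, error) {
	current, err := d.CurrentAF()
	if err != nil {
		return false, err
	}
	if current == requested {
		return false, nil // already programmed, nothing to do
	}
	if err := d.Program(requested); err != nil {
		return false, err
	}
	after, err := d.CurrentAF()
	if err != nil {
		return true, err
	}
	if after != requested {
		return true, errors.New("programmed AF " + after + " does not match requested AF " + requested)
	}
	return true, nil
}

// fakeDevice keeps the AF id in memory; stuck simulates programming
// that silently has no effect.
type fakeDevice struct {
	af    string
	stuck bool
}

func (f *fakeDevice) CurrentAF() (string, error) { return f.af, nil }

func (f *fakeDevice) Program(afID string) error {
	if !f.stuck {
		f.af = afID
	}
	return nil
}

func main() {
	d := &fakeDevice{af: "aaaa"}
	for _, want := range []string{"aaaa", "bbbb"} {
		programmed, err := ensureProgrammed(d, want)
		fmt.Println(want, "programmed:", programmed, "err:", err)
	}
}
```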
Dmitry Rozhkov
8f977b7782 Send device list upon reconnecting to kubelet
When kubelet notifies the plugin about its restart by removing
the plugin's socket we do reconnect to kubelet, but we don't
send the current list of monitored devices to kubelet. As a result
kubelet is not aware of discovered devices if it restarts.

Always send the current list of monitored devices to kubelet
upon ListAndWatch() request.
2018-07-11 12:04:43 +03:00
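The behaviour described in 8f977b7782 can be sketched with a plain channel instead of a gRPC stream. The function listAndWatch and its send callback are stand-ins invented for this sketch; the point is only that the current device list goes out first on every request, before any later updates.

```go
package main

import "fmt"

// listAndWatch sends the current device list immediately, then forwards
// every later update until the updates channel is closed.
// send is a hypothetical stand-in for writing to the kubelet stream.
func listAndWatch(current []string, updates <-chan []string, send func([]string) error) error {
	if err := send(current); err != nil {
		return err
	}
	for devs := range updates {
		if err := send(devs); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	updates := make(chan []string, 1)
	updates <- []string{"dev0", "dev1"}
	close(updates)

	err := listAndWatch([]string{"dev0"}, updates, func(devs []string) error {
		fmt.Println("sent to kubelet:", devs)
		return nil
	})
	if err != nil {
		fmt.Println("stream failed:", err)
	}
}
```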
Ed Bartosh
7f83feaf99
Merge pull request #41 from rojkov/vpg-demo
Update GPU demo to run FFT on the device
2018-07-06 14:45:27 +03:00
Dmitry Rozhkov
945f6e98f7 Update GPU demo to run FFT on the device
Also the demo runs on top of Intel compute runtime instead of
Beignet.
2018-07-05 16:19:16 +03:00
Ed Bartosh
69a32df387 fpga_crihook: covered by tests 2018-07-05 13:49:09 +03:00
Ed Bartosh
b4476110f8 implement CRI prestart hook
The hook gets FPGA_REGION and FPGA_BITSTREAM environment variables
defined in a pod spec, finds bitstream file, verifies it and programs
FPGA device with it using fpga-configure tool from OPAE.
2018-07-05 13:49:09 +03:00
Dmitry Rozhkov
bb2403deb9 fpga: ignore afu_id in region mode
When running in the region mode we don't need to know AFU IDs
thus don't read them while in the mode.

It's important not to try to read them because in the region mode
AFUs are supposed to be reprogrammed on the fly anyway and the
afu_id files may become busy.
2018-07-04 12:02:07 +03:00
Ed Bartosh
54fd4f6f8f fpga: ignore EBUSY error when reading afu_id
Device discovery can get an EBUSY error when an AFU is being programmed.
It causes plugin to crash with error:
  Device scan failed: read /sys/class/fpga/intel-fpga-dev.0/intel-fpga-port.0/afu_id:
      device or resource busy

This error should be ignored because this is a valid use case.
Ignoring it is harmless because the AFU will be rediscovered on the next run,
when reprogramming is done.
2018-07-03 11:09:09 +03:00
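One way to tolerate the EBUSY case from 54fd4f6f8f is to test the read error with errors.Is and treat a busy file as "no id yet". This is a sketch under assumptions: readAfuID, busyRead and okRead are hypothetical helpers, and the read is injected as a function so the example runs without FPGA hardware.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
	"syscall"
)

// readAfuID returns ok=false and no error when the file is busy, so the
// caller can skip the device and pick it up on the next scan.
func readAfuID(read func() ([]byte, error)) (string, bool, error) {
	data, err := read()
	if errors.Is(err, syscall.EBUSY) {
		return "", false, nil // AFU is being reprogrammed; not a failure
	}
	if err != nil {
		return "", false, err
	}
	return strings.TrimSpace(string(data)), true, nil
}

// busyRead simulates reading afu_id while the device is being programmed.
func busyRead() ([]byte, error) {
	return nil, &os.PathError{Op: "read", Path: "afu_id", Err: syscall.EBUSY}
}

// okRead simulates a successful read of afu_id.
func okRead() ([]byte, error) {
	return []byte("f7df405cbd7acf7222f144b0b93acd18\n"), nil
}

func main() {
	for _, read := range []func() ([]byte, error){busyRead, okRead} {
		id, ok, err := readAfuID(read)
		fmt.Printf("id=%q ok=%v err=%v\n", id, ok, err)
	}
}
```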
Ed Bartosh
6a571e7d5b fpga: decrease cyclomatic complexity of scanFPGAs
Moved code that goes through sysfs to a separate function
getSysFsInfo to decrease cyclomatic complexity of the scanFPGAs
function.

This is required to get the next commit through our CI check.
2018-07-03 11:09:09 +03:00
Dmitry Rozhkov
87daf21eea
Merge pull request #36 from bart0sh/PR0020-Allocate-annotations
fpga: set container annotations
2018-07-02 10:16:36 +03:00
Ed Bartosh
cbd7173b1f fpga: set container annotations
Plugin sets container annotation com.intel.fpga.mode to
intel.com/fpga-region in region mode.

This should allow CRI-O to be configured to run reprogramming hooks
only when annotation is set.
2018-06-29 16:58:02 +03:00
Dmitry Rozhkov
3082d453ad extend webhook-deploy.sh to accept --mode
Since the webhook can operate in two modes, either `preprogrammed`
or `orchestrated`, extend the deploying script to support these
modes.
2018-06-29 16:30:36 +03:00
Dmitry Rozhkov
3d30fa2872 fpga_admissionwebhook: add orchestrated mode
In `orchestrated` mode the webhook parses requested resources and translates
them to a container's ENV variables to let CRI hooks program the FPGA with
requested bitstreams before starting the container.
2018-06-29 16:30:36 +03:00
Dmitry Rozhkov
62a8c50f6c automate FPGA webhook deployment 2018-06-20 14:54:49 +03:00
Dmitry Rozhkov
861b23308d Check node's annotations to set mode of FPGA plugin 2018-06-20 09:45:43 +03:00
Dmitry Rozhkov
562f8fe722 fpga-admissionwebhook: add initial implementation 2018-06-19 14:55:59 +03:00
Dmitry Rozhkov
371376cf73 fpga: restore mode names in resource prefixes
The latest refactoring of FPGA scanning accidentally removed
mode names from resource name prefixes.

Restore them back.
2018-06-18 14:10:53 +03:00
Dmitry Rozhkov
4a1b311e62 fix up misspelling 2018-06-15 15:25:43 +03:00
Dmitry Rozhkov
9800f5a5e6 run gofmt -s 2018-06-15 15:23:49 +03:00
Dmitry Rozhkov
979a8357c8 add regiondevel mode to fpga_plugin
In the `af` mode the plugin announces AFUs and tells kubelet
to pass only AFU ports to containers.

In the `region` mode the plugin announces region interfaces and tells
kubelet to pass only AFU ports to containers.

In the `regiondevel` mode the plugin announces region interfaces and
tells kubelet to pass AFU ports and FME devices to containers, so the
containers have full access to the regions.
2018-06-15 12:28:16 +03:00
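The difference between the three modes in 979a8357c8, as far as device nodes go, can be summarised in a small function. This is an illustrative sketch: deviceNodes is a hypothetical helper and the /dev paths in main are example values, not taken from the commit.

```go
package main

import (
	"errors"
	"fmt"
)

// deviceNodes returns the device nodes a container would receive in the
// given mode: only the AFU port in af and region modes, the port plus the
// FME device in regiondevel mode.
func deviceNodes(mode, port, fme string) ([]string, error) {
	switch mode {
	case "af", "region":
		return []string{port}, nil
	case "regiondevel":
		return []string{port, fme}, nil
	default:
		return nil, errors.New("unknown mode: " + mode)
	}
}

func main() {
	// Example paths, assumed for illustration only.
	port, fme := "/dev/intel-fpga-port.0", "/dev/intel-fpga-fme.0"
	for _, mode := range []string{"af", "region", "regiondevel"} {
		nodes, err := deviceNodes(mode, port, fme)
		fmt.Println(mode, nodes, err)
	}
}
```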
Dmitry Rozhkov
80b7b03576 fpga_plugin: refactor FPGA scans
This refactoring brings in device Cache running in its own
thread and scanning FPGA devices once every 5 secs. Then no timers
are used inside ListAndWatch() method of device managers and
no need to run scanning periodically in every device manager's
thread.

Cache generates update events and the plugin creates, updates or
deletes device managers on the fly upon receiving the events.

Introducing new modes of operations is a matter of adding a single
function converting and filtering the content of Cache.
2018-06-15 11:54:52 +03:00
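The update events mentioned in 80b7b03576 amount to comparing two consecutive scan results. The sketch below shows that comparison only; snapshot, event and diff are hypothetical names, and the periodic 5-second scan loop and the device managers themselves are left out.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// snapshot maps a resource name to the device IDs found in one scan.
type snapshot map[string][]string

// event lists which resources need a manager created, updated or deleted.
type event struct {
	created, updated, removed []string
}

// diff compares the previous and current scan results.
func diff(prev, cur snapshot) event {
	var ev event
	for name, devs := range cur {
		old, seen := prev[name]
		switch {
		case !seen:
			ev.created = append(ev.created, name)
		case !reflect.DeepEqual(old, devs):
			ev.updated = append(ev.updated, name)
		}
	}
	for name := range prev {
		if _, still := cur[name]; !still {
			ev.removed = append(ev.removed, name)
		}
	}
	// Sort so the result does not depend on map iteration order.
	sort.Strings(ev.created)
	sort.Strings(ev.updated)
	sort.Strings(ev.removed)
	return ev
}

func main() {
	prev := snapshot{"region-a": {"dev0"}, "region-b": {"dev1"}}
	cur := snapshot{"region-a": {"dev0", "dev2"}, "region-c": {"dev3"}}
	ev := diff(prev, cur)
	fmt.Println("created:", ev.created)
	fmt.Println("updated:", ev.updated)
	fmt.Println("removed:", ev.removed)
}
```

With this shape, a new mode only needs a function that converts the raw scan into a snapshot; the diffing stays the same.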
Ed Bartosh
ce2437d5ec fix go_vet error reported by goreportcard.com
Fixed go_vet error:
 Line 413: error: TestisValidPluginMode has malformed name: first letter after 'Test' must not be lowercase (vet)
2018-05-29 10:50:52 +03:00
Ed Bartosh
6a3953fc85 reformatted *.go with gofmt -s -w
This is done to fix https://goreportcard.com warnings:

gofmt 33%
Gofmt formats Go programs. We run gofmt -s on your code, where -s is for the "simplify" command

intel-device-plugins-for-kubernetes/cmd/fpga_plugin/fpga_plugin_test.go
Line 1: warning: file is not gofmted with -s (gofmt)

intel-device-plugins-for-kubernetes/internal/deviceplugin/deviceplugin_test.go
Line 1: warning: file is not gofmted with -s (gofmt)

intel-device-plugins-for-kubernetes/cmd/gpu_plugin/gpu_plugin_test.go
Line 1: warning: file is not gofmted with -s (gofmt)

intel-device-plugins-for-kubernetes/cmd/fpga_plugin/fpga_plugin.go
Line 1: warning: file is not gofmted with -s (gofmt)
2018-05-28 16:59:19 +03:00
Ed Bartosh
7310a98343 fix golint warnings
Fixed the following golint warnings:
./cmd/fpga_plugin/fpga_plugin.go:71:2: struct field fpgaId should be fpgaID
./cmd/fpga_plugin/fpga_plugin.go:78:44: func parameter fpgaId should be fpgaID
./cmd/fpga_plugin/fpga_plugin.go:104:8: var interfaceId should be interfaceID
./cmd/fpga_plugin/fpga_plugin.go:120:7: var interfaceIdFile should be interfaceIDFile
./cmd/fpga_plugin/fpga_plugin.go:156:8: range var fpgaId should be fpgaID
./cmd/fpga_plugin/fpga_plugin.go:254:6: range var fpgaId should be fpgaID
./cmd/fpga_plugin/fpga_plugin.go:254:14: should omit 2nd value from range; this loop is equivalent to `for fpgaId := range ...`
./internal/deviceplugin/deviceplugin.go:30:6: exported type DeviceInfo should have comment or be unexported
./internal/deviceplugin/deviceplugin.go:35:6: exported type Server should have comment or be unexported
./internal/deviceplugin/deviceplugin.go:39:1: exported method Server.Serve should have comment or be unexported
./internal/deviceplugin/deviceplugin.go:43:1: exported method Server.Stop should have comment or be unexported
2018-05-28 16:53:37 +03:00
Ed Bartosh
8a8971ed5c fpga: add prefix to FPGA resource name
Added mode ("af" or "region") prefix to the resource name to
distinguish between announced functions and regions, e.g.
 intel.com/fpga-af-f7df405cbd7acf7222f144b0b93acd18
 intel.com/fpga-region-ce48969398f05f33946d560708be108a
2018-05-28 15:38:09 +03:00
Ed Bartosh
983245b5a9 Reworked README.md
Split into 3 parts:
- main part with high level description of the repository
- Build and test Intel GPU Device Plugin for Kubernetes
- Build and test Intel FPGA Device Plugin for Kubernetes

Added Intel logo to the main README.md

Fixes #2
2018-05-25 10:31:53 +03:00
Ed Bartosh
4ef2705a8a use glog.Error when mode is incorrect
Using glog.Fatal produces a stack trace, which looks quite scary
for this simple case:
$ ./fpga_plugin -mode bla
F0523 15:17:57.997937   11555 fpga_plugin.go:237] Wrong mode: bla
goroutine 1 [running]:
github.com/intel/intel-device-plugins-for-kubernetes/vendor/github.com/golang/glog.stacks(0xc420214000, 0xc42018e000, 0x42, 0x8f)
	/home/ed/go/src/github.com/intel/intel-device-plugins-for-kubernetes/vendor/github.com/golang/glog/glog.go:769 +0xcf
github.com/intel/intel-device-plugins-for-kubernetes/vendor/github.com/golang/glog.(*loggingT).output(0xbf72c0, 0xc400000003, 0xc4200bea50, 0xba3309, 0xe, 0xed, 0x0)
	/home/ed/go/src/github.com/intel/intel-device-plugins-for-kubernetes/vendor/github.com/golang/glog/glog.go:720 +0x32d
github.com/intel/intel-device-plugins-for-kubernetes/vendor/github.com/golang/glog.(*loggingT).printDepth(0xbf72c0, 0x7f4500000003, 0x1, 0xc420079ec8, 0x2, 0x2)
	/home/ed/go/src/github.com/intel/intel-device-plugins-for-kubernetes/vendor/github.com/golang/glog/glog.go:646 +0x129
github.com/intel/intel-device-plugins-for-kubernetes/vendor/github.com/golang/glog.(*loggingT).print(0xbf72c0, 0x3, 0xc420079ec8, 0x2, 0x2)
	/home/ed/go/src/github.com/intel/intel-device-plugins-for-kubernetes/vendor/github.com/golang/glog/glog.go:637 +0x5a
github.com/intel/intel-device-plugins-for-kubernetes/vendor/github.com/golang/glog.Fatal(0xc420079ec8, 0x2, 0x2)
	/home/ed/go/src/github.com/intel/intel-device-plugins-for-kubernetes/vendor/github.com/golang/glog/glog.go:1128 +0x53
main.main()
	/home/ed/go/src/github.com/intel/intel-device-plugins-for-kubernetes/cmd/fpga_plugin/fpga_plugin.go:237 +0x5fb
2018-05-23 15:20:51 +03:00
Dmitry Rozhkov
49840e9720 fpga: add mode CLI switch
By default the fpga plugin announces regions' interface IDs. With the
added `-mode af` switch the plugin announces IDs of accelerator
functions instead of regions.
2018-05-21 15:45:38 +03:00
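A mode switch like the one in 49840e9720 could be parsed with the standard flag package as sketched below. The default value "region" is an assumption inferred from the message (regions are announced by default), and parseMode is a hypothetical helper rather than the plugin's code. Invalid values come back as an ordinary error, which the caller can log without a stack trace.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
)

// parseMode reads -mode from args and validates it.
// The default "region" is an assumption made for this sketch.
func parseMode(args []string) (string, error) {
	fs := flag.NewFlagSet("fpga_plugin", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the example output
	mode := fs.String("mode", "region", "device mode: region or af")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	switch *mode {
	case "region", "af":
		return *mode, nil
	}
	return "", errors.New("wrong mode: " + *mode)
}

func main() {
	for _, args := range [][]string{nil, {"-mode", "af"}, {"-mode", "bla"}} {
		mode, err := parseMode(args)
		fmt.Printf("args=%v mode=%q err=%v\n", args, mode, err)
	}
}
```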
Dmitry Rozhkov
7e830d7953 fpga: refactor afuID to fpgaId
We are going to use not only afu ids, but also regions' interface
ids as device identifiers in the future.
2018-05-21 12:23:04 +03:00
Dmitry Rozhkov
390d8583e9 init struct with explicit field names to avoid formatting warning 2018-05-21 11:05:13 +03:00
Alexander Kanevskiy
d4d77a71e4 Initial public code release 2018-05-18 18:30:54 +03:00