Commit Graph

72 Commits

Author SHA1 Message Date
Dmitry Rozhkov
f60aad70d6 webhook: deny all requests for unknown FPGA resources 2018-08-20 12:09:31 +03:00
Ed Bartosh
917b68206e
Merge pull request #90 from rojkov/readme
clarify fpga plugin modes in README.md
2018-08-20 11:41:23 +03:00
Dmitry Rozhkov
009a6ebfb6 clarify fpga plugin modes in README.md 2018-08-20 10:28:11 +03:00
Dmitry Rozhkov
eccd70c600 replace glog with simpler home-grown debug logging 2018-08-16 17:40:16 +03:00
Dmitry Rozhkov
2ff6c5929a Use annotated errors for tracing 2018-08-16 17:31:19 +03:00
Ed Bartosh
cf34e92cab
Merge pull request #88 from rojkov/shared-gpu
gpu_plugin: add -shared-dev-num option
2018-08-15 16:45:05 +03:00
Alexander D. Kanevskiy
5cb6f34234
Merge pull request #86 from bart0sh/PR0034-fix-link
fix broken links in the FPGA plugin documentation
2018-08-14 22:47:53 +03:00
Ed Bartosh
a0b6651c1d QAT: fix spelling
Fixed misspell warning:
   Line 153: warning: "Unknwon" is a misspelling of "Unknown" (misspell)
2018-08-14 16:42:02 +03:00
Ed Bartosh
cf8d6bbc3f fix broken links in the FPGA plugin documentation 2018-08-14 15:00:48 +03:00
Dmitry Rozhkov
40246f64ad gpu_plugin: add -shared-dev-num option
The DRM driver of Intel i915 GPUs allows sharing one device
between many containers.

Make it possible to use the same device from different containers.
The exact number of containers sharing the same device can be limited
with the new option -shared-dev-num set to 1 by default.

closes #53
2018-08-14 14:54:49 +03:00
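As a rough illustration of the sharing scheme described in 40246f64ad, the Go sketch below advertises each physical GPU several times so that the same device node can be handed to more than one container. The function name `expandSharedDevices` and the `<card>-<index>` ID format are assumptions made for this example and are not taken from the plugin's source.

```go
package main

import "fmt"

// expandSharedDevices returns sharedDevNum schedulable IDs for every physical
// card. Hypothetical helper: the name and the ID format are illustrative only.
func expandSharedDevices(cards []string, sharedDevNum int) []string {
	if sharedDevNum < 1 {
		// Mirror the documented default of 1 (no sharing).
		sharedDevNum = 1
	}
	ids := make([]string, 0, len(cards)*sharedDevNum)
	for _, card := range cards {
		for i := 0; i < sharedDevNum; i++ {
			ids = append(ids, fmt.Sprintf("%s-%d", card, i))
		}
	}
	return ids
}

func main() {
	// Two cards, each shareable by two containers.
	fmt.Println(expandSharedDevices([]string{"card0", "card1"}, 2))
}
```

With a share count of 1 the list contains one ID per card, which corresponds to the exclusive use that the option defaults to.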
Dmitry Rozhkov
92f72e4196 fpga_plugin: indicate unhealthy devices
When the device's firmware crashes, the OPAE driver reports all properties
of the device as a stream of binary ones. This effectively makes the
interface and AFU IDs look like "ffffffffffffffffffffffffffffffff".

Mark such devices as Unhealthy.

closes #77
2018-08-13 11:52:51 +03:00
Dmitry Rozhkov
56b85b1830 webhook: update README.md 2018-08-08 17:58:18 +03:00
Dmitry Rozhkov
271bc0d29a webhook: add dynamically configured mappings
Currently we have a hardcoded mapping from human-readable names of
AFs and FPGA regions, like arria10-nlb0, to the resource names
produced by the FPGA device plugin. This is not a sustainable
long-term solution.

Implement CRD-based mappings so that a new mapping can be added or
removed dynamically by cluster admins with CRD resources.
2018-08-08 17:58:18 +03:00
Alexander D. Kanevskiy
2a6eda891a
Merge pull request #68 from bart0sh/PR0030-fix-annotation-value
fpga_crihook: fix annotation value
2018-08-08 01:18:18 +03:00
Ed Bartosh
9de82c819f fpga_crihook: fix annotation value
Annotation value has been changed in FPGA plugin code recently.
Updated fpga_crihook to use the same value.
2018-08-07 17:40:16 +03:00
Ed Bartosh
71e8ea471a fpga_crihook: specify socket number when programming device
Added -S <device number> parameter to fpgaconf command line to
ensure usage of allocated device.
2018-08-07 17:32:59 +03:00
Alexander D. Kanevskiy
f8c3e9abf4
Merge pull request #62 from MCamp859/qat-readme-edits
QAT readme edits for format and text flow.
2018-08-03 20:56:52 +03:00
MCamp859
68c099db99 Fixed typo.
Signed-off-by: MCamp859 <mary.camp@ptiglobal.net>
2018-08-02 10:28:30 -04:00
MCamp859
1fe88a9067 Updated with review comments.
Signed-off-by: MCamp859 <mary.camp@ptiglobal.net>
2018-08-02 10:10:14 -04:00
MCamp859
4d5046f860 QAT readme edits for format and text flow.
Signed-off-by: MCamp859 <mary.camp@ptiglobal.net>
2018-07-31 16:08:27 -04:00
MCamp859
6544d35ab1 QAT readme edits for format and text flow.
Signed-off-by: MCamp859 <mary.camp@ptiglobal.net>
2018-07-31 16:04:47 -04:00
Mary Camp
51bb79bc60
Merge branch 'master' into fpga-readme-edits 2018-07-31 13:18:40 -04:00
MCamp859
be66697049 Replaced "afu" with "af" in 2 places.
Signed-off-by: MCamp859 <mary.camp@ptiglobal.net>
2018-07-31 10:33:18 -04:00
Ed Bartosh
21e0e5c518
Merge pull request #48 from rojkov/refactor-dev-plugins
Refactor dev plugins to increase code reuse
2018-07-31 13:37:16 +03:00
MCamp859
a29e04f614 Edits to FPGA readme files for format and text flow.
Signed-off-by: MCamp859 <mary.camp@ptiglobal.net>
2018-07-30 16:13:47 -04:00
MCamp859
fb5e20c14a Edits for format and text flow.
Signed-off-by: MCamp859 <mary.camp@ptiglobal.net>
2018-07-30 13:59:26 -04:00
MCamp859
f3f749d4f5 Edits for format and text flow.
Signed-off-by: MCamp859 <mary.camp@ptiglobal.net>
2018-07-30 13:54:01 -04:00
Dmitry Rozhkov
b6894b8195 refactor QAT plugin 2018-07-30 15:29:33 +03:00
Dmitry Rozhkov
972a80bedb fpga_admissionwebhook: update resource names 2018-07-30 15:29:33 +03:00
Dmitry Rozhkov
1e7dbac162 Update README.md files to reflect changes caused by refactoring
update demo files
2018-07-30 15:29:33 +03:00
Dmitry Rozhkov
bbee3fde77 refactor device plugins to increase code reuse
Every device plugin is supposed to implement the PluginInterfaceServer
interface to be exposed as a gRPC service. But this functionality is
common to all our device plugins and can be hidden in a Manager
which manages all gRPC servers dynamically.

The only mandatory functionality that a device plugin needs to provide,
and which differentiates one plugin from another, is the code
scanning the host for devices present on it.

Refactor the internal deviceplugin package to accept only
one mandatory method implementation from device plugins - Scan().

In addition to that, a device plugin can optionally implement a
PostAllocate() method which mutates responses returned by the
PluginInterfaceServer.Allocate() method.

Also, to narrow the gap between these device plugins and kubevirt's
collection, the naming scheme for resources has been changed.
Now device plugins provide a namespace for the device types they
operate with. E.g. for resources in format "color.example.com/<color>"
the namespace would be "color.example.com". So, the resource name
"intel.com/fpga-region-fffffff" becomes "fpga.intel.com/region-fffffff".
2018-07-30 15:29:33 +03:00
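The split described in bbee3fde77, where a plugin supplies only a scan routine and a namespace while shared code does the rest, might look roughly like the Go sketch below. The `Scanner` interface here is a simplified, hypothetical stand-in (the real Scan() signature is not shown in this log), and `fakeFPGA`, `resourceName` and `splitResourceName` are names invented for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// Scanner is a hypothetical, simplified version of the single mandatory
// method a device plugin provides after the refactoring.
type Scanner interface {
	// Scan returns the device IDs found on the host, grouped by device type.
	Scan() (map[string][]string, error)
}

// fakeFPGA is a toy Scanner that pretends to find one FPGA region.
type fakeFPGA struct{}

func (fakeFPGA) Scan() (map[string][]string, error) {
	return map[string][]string{"region-fffffff": {"intel-fpga-dev.0"}}, nil
}

// resourceName joins a plugin namespace and a device type into a resource name.
func resourceName(namespace, devType string) string {
	return namespace + "/" + devType
}

// splitResourceName is the inverse of resourceName.
func splitResourceName(name string) (namespace, devType string, ok bool) {
	parts := strings.SplitN(name, "/", 2)
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", "", false
	}
	return parts[0], parts[1], true
}

func main() {
	var s Scanner = fakeFPGA{}
	found, err := s.Scan()
	if err != nil {
		fmt.Println("scan failed:", err)
		return
	}
	for devType, ids := range found {
		fmt.Println(resourceName("fpga.intel.com", devType), ids)
	}
}
```

Keeping the namespace separate from the device type means the shared code can build full resource names without each plugin repeating the prefix.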
Mikko Ylinen
6c787ec658 qat: read maxNumdevice as integer 2018-07-26 14:20:44 +03:00
Alexander D. Kanevskiy
6c08dbdb64
Merge pull request #54 from zhenyw/gpu
gpu_plugin: skip drm control node
2018-07-26 15:03:04 +03:00
Ed Bartosh
a4b3f7f068 qat: fix formatting errors 2018-07-26 12:44:26 +03:00
Zhenyu Wang
ec632e0b38 gpu_plugin: skip drm control node
The DRM control node is deprecated and has been removed in the latest kernels.
Skip any DRM control node found on the host.

v2: Fix lint error
v3: Fix regex string

Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
2018-07-26 10:35:53 +08:00
Zhenyu Wang
6f3543884f gpu_plugin: Fix regex string for drm card node
As noted in a pull request comment, fix the regex for the DRM card node.

Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
2018-07-26 10:33:12 +08:00
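The node filtering described in the two gpu_plugin commits above can be illustrated with an anchored regular expression. The log does not show the actual pattern, so the one below is an assumption, as are the helper name `filterCardNodes` and the sample directory entries.

```go
package main

import (
	"fmt"
	"regexp"
)

// cardRE matches DRM card nodes such as "card0" only. Because the pattern is
// anchored at both ends, names like "controlD64", "renderD128" or connector
// entries like "card0-HDMI-A-1" do not match. Assumed pattern, for illustration.
var cardRE = regexp.MustCompile(`^card[0-9]+$`)

// filterCardNodes keeps only the entries that look like DRM card nodes.
func filterCardNodes(names []string) []string {
	var cards []string
	for _, n := range names {
		if cardRE.MatchString(n) {
			cards = append(cards, n)
		}
	}
	return cards
}

func main() {
	entries := []string{"card0", "controlD64", "renderD128", "card1", "card0-HDMI-A-1", "version"}
	fmt.Println(filterCardNodes(entries))
}
```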
Alexander D. Kanevskiy
1140756037
Merge pull request #50 from swatisehgal/dev/qat
Enabling support for QuickAssist Devices
2018-07-24 16:13:18 +03:00
ssehgal
3eb2b10f75 Enabling support for QuickAssist Devices 2018-07-23 17:35:37 +01:00
Dmitry Rozhkov
ff813285ed typo fixed 2018-07-20 10:44:37 +03:00
Ed Bartosh
b1b2edf1b8 fpga_crihook: check if requested AF is programmed
Check if programmed AF id is equal to the requested AF id
after re-programming a device.
2018-07-18 12:27:45 +03:00
Ed Bartosh
9df1afdf43 fpga_crihook: check if intel annotation is set
Check if the container annotation com.intel.fpga.mode is set to
"intel.com/fpga-region". This annotation is set by the device plugin,
so the check should help to filter out unwanted workflows that
the device plugin is not aware of.
2018-07-16 16:12:59 +03:00
Ed Bartosh
2f9debe35b fpga_crihook: check if bitstream is already programmed
An FPGA device can already be programmed with the requested bitstream.
In this case the hook should not program the device again.
2018-07-13 14:16:27 +03:00
Dmitry Rozhkov
8f977b7782 Send device list upon reconnecting to kubelet
When kubelet notifies the plugin about its restart by removing
the plugin's socket, we do reconnect to kubelet, but we don't
send the current list of monitored devices to kubelet. As a result,
kubelet is not aware of discovered devices if it restarts.

Always send the current list of monitored devices to kubelet
upon ListAndWatch() request.
2018-07-11 12:04:43 +03:00
Ed Bartosh
7f83feaf99
Merge pull request #41 from rojkov/vpg-demo
Update GPU demo to run FFT on the device
2018-07-06 14:45:27 +03:00
Dmitry Rozhkov
945f6e98f7 Update GPU demo to run FFT on the device
Also the demo runs on top of Intel compute runtime instead of
Beignet.
2018-07-05 16:19:16 +03:00
Ed Bartosh
69a32df387 fpga_crihook: covered by tests 2018-07-05 13:49:09 +03:00
Ed Bartosh
b4476110f8 implement CRI prestart hook
The hook gets the FPGA_REGION and FPGA_BITSTREAM environment variables
defined in a pod spec, finds the bitstream file, verifies it, and programs
the FPGA device with it using the fpga-configure tool from OPAE.
2018-07-05 13:49:09 +03:00
Dmitry Rozhkov
bb2403deb9 fpga: ignore afu_id in region mode
When running in region mode we don't need to know AFU IDs,
so don't read them while in that mode.

It's important not to try to read them because in region mode
AFUs are supposed to be reprogrammed on the fly anyway and the
afu_id files may become busy.
2018-07-04 12:02:07 +03:00
Ed Bartosh
54fd4f6f8f fpga: ignore EBUSY error when reading afu_id
Device discovery can get an EBUSY error when an AFU is being programmed.
It causes the plugin to crash with the error:
  Device scan failed: read /sys/class/fpga/intel-fpga-dev.0/intel-fpga-port.0/afu_id:
      device or resource busy

This error should be ignored, as this is a valid use case.
Ignoring it is harmless because the AFU will be rediscovered on the next run, when
reprogramming is done.
2018-07-03 11:09:09 +03:00
Ed Bartosh
6a571e7d5b fpga: decrease cyclomatic complexity of scanFPGAs
Moved the code that goes through sysfs to a separate function,
getSysFsInfo, to decrease the cyclomatic complexity of the scanFPGAs
function.

This is required to get the next commit through our CI check.
2018-07-03 11:09:09 +03:00