fpga: make AFU resource name 63 char long
webhook: drop mode from README
webhook: extend mappings description
webhook: tighten CRD definitions
webhook: drop mapping to non-existing afuId
explicitly state mapping names can be in any format
use consistent terminology across fpga webhook and plugin
Move all the fpga components to using klog for logging
and debug. This includes replacing our homebrew 'fatal()'
with klog.Error().
Modify the deployment files to move from `-debug` to
`-v`, and set their default level to '1' (Info), rather
than full debug mode ('4').
Signed-off-by: Graham Whaley <graham.whaley@intel.com>
This commit drops the fpga_plugin dependency on k8s.io/kubernetes, which
was used for GetHostname(). After this change, the plugin node name can
be set using the new -node-name parameter. Its default value is read
from the NODE_NAME environment variable.
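
A minimal sketch of how such a parameter could be wired up with the
standard flag package. Only the -node-name flag and the NODE_NAME
variable come from this commit; the function name nodeNameFromArgs and
its getenv parameter are hypothetical and are there to keep the example
testable.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// nodeNameFromArgs returns the node name for the plugin. The value of
// NODE_NAME is used as the default and an explicit -node-name argument
// overrides it. The function and its getenv parameter are hypothetical.
func nodeNameFromArgs(args []string, getenv func(string) string) (string, error) {
	fs := flag.NewFlagSet("fpga_plugin", flag.ContinueOnError)
	nodeName := fs.String("node-name", getenv("NODE_NAME"),
		"name of the node the plugin runs on")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return *nodeName, nil
}

func main() {
	name, err := nodeNameFromArgs(os.Args[1:], os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Println("node name:", name)
}
```

In a deployment, NODE_NAME would typically be filled in from the pod
spec, so the flag is only needed when overriding it.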
If the node annotation override check fails, we continue with the default
mode parameter and do not exit like we did previously.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
go get'ing does not work due to our k8s.io/kubernetes dependency
so guide users to use git clone to get the code.
Fixes: #290
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Not touching "orchestration programmed". Fixing only instances where
this refers directly to the mode recognized by the webhook-deploy.sh
script.
Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
Expand and re-arrange the README. Add some details about what the
plugin and other FPGA components provide.
Signed-off-by: Graham Whaley <graham.whaley@intel.com>
Add draw.io files and their generated PNG files for both
orchestrated and preprogrammed FPGA modes. These will
then be used in the documentation.
Signed-off-by: Graham Whaley <graham.whaley@intel.com>
The default deployment gives rather wide host mounts.
Limited sysfs mount only to the subdirectory the plugin
needs.
Mounted sysfs and dev mounts read-only.
Added notes that FPGA plugin can be run as non-root user.
Extended fpga plugin to support both in-tree (DFL) and
out-of-tree (OPAE) kernel drivers.
- fpga_crihook: move JSON parsing to separate functions
- decreased cyclomatic complexity of the CRI hook main() function
- increased readability
- increased test coverage
Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
When the device's firmware crashes, the OPAE driver reports all properties
of the device as a stream of binary ones. This effectively makes
interface and afu IDs look like "ffffffffffffffffffffffffffffffff".
Mark such devices as Unhealthy.
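
A rough sketch of the check described above. The names looksCrashed and
healthFor are hypothetical, and the "Healthy"/"Unhealthy" strings are
local stand-ins for the health values a device plugin reports to
kubelet.

```go
package main

import (
	"fmt"
	"strings"
)

// Local stand-ins for the health values reported to kubelet.
const (
	healthy   = "Healthy"
	unhealthy = "Unhealthy"
)

// looksCrashed reports whether an ID read from sysfs consists solely of
// 'f' hex digits, which is what the driver returns after a firmware
// crash. An empty ID is not treated as crashed.
func looksCrashed(id string) bool {
	id = strings.ToLower(strings.TrimSpace(id))
	if id == "" {
		return false
	}
	// Trimming 'f' from both ends leaves nothing only if every
	// character is 'f'.
	return strings.Trim(id, "f") == ""
}

// healthFor maps an interface or AFU ID to a health value.
func healthFor(id string) string {
	if looksCrashed(id) {
		return unhealthy
	}
	return healthy
}

func main() {
	for _, id := range []string{
		"ffffffffffffffffffffffffffffffff",
		"d8424dc4a4a3c413f89e433683f9040b",
	} {
		fmt.Println(id, healthFor(id))
	}
}
```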
Closes #77
Every device plugin is supposed to implement PluginInterfaceServer
interface to be exposed as a gRPC service. But this functionality is
common for all our device plugins and can be hidden in a Manager
which manages all gRPC servers dynamically.
The only mandatory functionality that needs to be provided by a device
plugin and which differentiates one plugin from another is the code
that scans the host for the devices present on it.
Refactor the internal deviceplugin package to accept only
one mandatory method implementation from device plugins - Scan().
In addition to that a device plugin can optionally implement a
PostAllocate() method which mutates responses returned by
PluginInterfaceServer.Allocate() method.
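
The split between a mandatory Scan() and an optional PostAllocate() can
be sketched with a type assertion, as below. All types and signatures
here (DeviceTree, AllocateResponse, Manager, and the method signatures)
are simplified assumptions for illustration and not the actual API of
the deviceplugin package. The annotation in the example is the one the
FPGA plugin sets in region mode.

```go
package main

import "fmt"

// DeviceTree maps device type -> device ID -> health (hypothetical).
type DeviceTree map[string]map[string]string

// AllocateResponse stands in for the gRPC Allocate() response.
type AllocateResponse struct {
	Annotations map[string]string
}

// Scanner is the only interface a plugin must implement.
type Scanner interface {
	Scan() (DeviceTree, error)
}

// PostAllocator is optional; the manager detects it at run time.
type PostAllocator interface {
	PostAllocate(resp *AllocateResponse) error
}

// Manager hides the gRPC plumbing shared by all plugins (omitted here).
type Manager struct {
	plugin Scanner
}

// Allocate builds a response and lets the plugin mutate it if the
// plugin implements PostAllocator.
func (m *Manager) Allocate() (*AllocateResponse, error) {
	resp := &AllocateResponse{Annotations: map[string]string{}}
	if pa, ok := m.plugin.(PostAllocator); ok {
		if err := pa.PostAllocate(resp); err != nil {
			return nil, err
		}
	}
	return resp, nil
}

// plainPlugin implements only Scan().
type plainPlugin struct{}

func (plainPlugin) Scan() (DeviceTree, error) {
	return DeviceTree{"region-fffffff": {"dev0": "Healthy"}}, nil
}

// regionPlugin additionally implements PostAllocate().
type regionPlugin struct{ plainPlugin }

func (regionPlugin) PostAllocate(resp *AllocateResponse) error {
	resp.Annotations["com.intel.fpga.mode"] = "intel.com/fpga-region"
	return nil
}

func main() {
	for _, p := range []Scanner{plainPlugin{}, regionPlugin{}} {
		resp, err := (&Manager{plugin: p}).Allocate()
		fmt.Println(resp.Annotations, err)
	}
}
```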
Also, to narrow the gap between these device plugins and kubevirt's
collection, the naming scheme for resources has been changed.
Now device plugins provide a namespace for the device types they
operate with. E.g. for resources in format "color.example.com/<color>"
the namespace would be "color.example.com". So, the resource name
"intel.com/fpga-region-fffffff" becomes "fpga.intel.com/region-fffffff".
When kubelet notifies the plugin about its restart by removing
the plugin's socket we do reconnect to kubelet, but we don't
send the current list of monitored devices to kubelet. As a result,
kubelet is not aware of discovered devices if it restarts.
Always send the current list of monitored devices to kubelet
upon ListAndWatch() request.
When running in region mode we don't need to know AFU IDs,
so don't read them while in that mode.
It's important not to try to read them because in region mode
AFUs are supposed to be reprogrammed on the fly anyway and the
afu_id files may become busy.
Device discovery can get an EBUSY error when an AFU is being programmed.
It causes the plugin to crash with the error:
Device scan failed: read /sys/class/fpga/intel-fpga-dev.0/intel-fpga-port.0/afu_id:
device or resource busy
This error should be ignored as this is a valid use case.
This is harmless as the AFU will be rediscovered on the next run, when
reprogramming is done.
Moved the code that goes through sysfs to a separate function,
getSysFsInfo, to decrease the cyclomatic complexity of the scanFPGAs
function.
This is required to get the next commit through our CI check.
The plugin sets the container annotation com.intel.fpga.mode to
intel.com/fpga-region in region mode.
This should allow configuring CRI-O to run reprogramming hooks
only when the annotation is set.