diff --git a/README.md b/README.md
index dbb50b96..c3e696c2 100644
--- a/README.md
+++ b/README.md
@@ -23,6 +23,7 @@ Table of Contents
 * [DLB device plugin](#dlb-device-plugin)
 * [IAA device plugin](#iaa-device-plugin)
 * [Device Plugins Operator](#device-plugins-operator)
+* [XeLink XPU-Manager sidecar](#xelink-xpu-manager-sidecar)
 * [Demos](#demos)
 * [Workload Authors](#workload-authors)
 * [Developers](#developers)
@@ -203,6 +204,12 @@ The [Device plugins operator README](cmd/operator/README.md) gives the installat
 
 The [Device plugins Operator for OCP](cmd/operator/ocp_quickstart_guide/README.md) gives the installation and usage details for the operator available on [Red Hat OpenShift Container Platform](https://catalog.redhat.com/software/operators/detail/61e9f2d7b9cdd99018fc5736).
 
+## XeLink XPU-Manager Sidecar
+
+To support interconnected GPUs in Kubernetes, the XeLink sidecar is needed.
+
+The [XeLink XPU-Manager sidecar README](cmd/xpumanager_sidecar/README.md) describes how the sidecar functions and how to use it.
+
 ## Demos
 
 The [demo subdirectory](demo/readme.md) contains a number of demonstrations for
diff --git a/cmd/xpumanager_sidecar/README.md b/cmd/xpumanager_sidecar/README.md
new file mode 100644
index 00000000..dff89d89
--- /dev/null
+++ b/cmd/xpumanager_sidecar/README.md
@@ -0,0 +1,72 @@
+# XeLink sidecar for Intel XPU Manager
+
+Table of Contents
+
+* [Introduction](#introduction)
+* [Modes and Configuration Options](#modes-and-configuration-options)
+* [Installation](#installation)
+    * [Install XPU-Manager with the Sidecar](#install-xpu-manager-with-the-sidecar)
+    * [Install Sidecar to an Existing XPU-Manager](#install-sidecar-to-an-existing-xpu-manager)
+* [Verify Sidecar Functionality](#verify-sidecar-functionality)
+
+## Introduction
+
+Intel GPUs can be interconnected via an XeLink. In some workloads it is beneficial to use GPUs that are XeLinked together for optimal performance. XeLink information is provided by [Intel XPU Manager](https://www.github.com/intel/xpumanager) via its metrics API. The XeLink sidecar retrieves the information from XPU Manager and stores it on the node under ```/etc/kubernetes/node-feature-discovery/features.d/``` as a feature label file. [NFD](https://github.com/kubernetes-sigs/node-feature-discovery) reads this file and converts it to Kubernetes node labels. These labels are then used by [GAS](https://github.com/intel/platform-aware-scheduling/tree/master/gpu-aware-scheduling) to make [scheduling decisions](https://github.com/intel/platform-aware-scheduling/blob/master/gpu-aware-scheduling/docs/usage.md#multi-gpu-allocation-with-xe-link-connections) for Pods.
+
+## Modes and Configuration Options
+
+| Flag | Argument | Default | Meaning |
+|:---- |:-------- |:------- |:------- |
+| -lane-count | int | 4 | Minimum lane count for an XeLink interconnect to be accepted |
+| -interval | int | 10 | Interval for XeLink topology fetching and label writing (seconds, >= 1) |
+| -startup-delay | int | 10 | Startup delay before the first topology fetching (seconds, >= 0) |
+| -label-namespace | string | gpu.intel.com | Namespace or prefix for the labels, e.g. **gpu.intel.com**/xe-links |
+
+The sidecar also accepts a number of other arguments. Use the -h option to see the complete list of options.
+
+## Installation
+
+The following sections detail how to obtain, deploy and test the XPU-Manager XeLink sidecar.
+
+### Pre-built Images
+
+[Pre-built images](https://hub.docker.com/r/intel/intel-xpumanager-sidecar)
+of this component are available on Docker Hub. These images are automatically built and uploaded
+to the hub from the latest main branch of this repository.
+
+Release-tagged images of the components are also available on Docker Hub, tagged with their
+release version numbers in the format `x.y.z`, corresponding to the branches and releases in this
+repository.
+
+Note: Replace `` with the desired [release tag](https://github.com/intel/intel-device-plugins-for-kubernetes/tags) or `main` to get `devel` images.
+
+See [the development guide](../../DEVEL.md) for details if you want to deploy a customized version of the plugin.
+
+### Install XPU-Manager with the Sidecar
+
+Install the XPU-Manager daemonset with the XeLink sidecar:
+
+```bash
+$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/xpumanager_sidecar?ref='
+```
+
+See the XPU-Manager Kubernetes files for additional information on [installation](https://github.com/intel/xpumanager/tree/master/deployment/kubernetes).
+
+### Install Sidecar to an Existing XPU-Manager
+
+Use a patch to add the sidecar to the XPU-Manager daemonset.
+
+```bash
+$ kubectl patch daemonsets.apps intel-xpumanager --patch-file 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/xpumanager_sidecar/kustom/kustom_xpumanager.yaml?ref='
+```
+
+NOTE: The sidecar patch will remove other resources from the XPU-Manager container. If your XPU-Manager daemonset is using, for example, the smarter device manager resources, those will be removed.
+
+## Verify Sidecar Functionality
+
+You can verify the sidecar's functionality by checking each node's xe-links label:
+
+```bash
+$ kubectl get nodes -A -o=jsonpath="{range .items[*]}{.metadata.name},{.metadata.labels.gpu\.intel\.com\/xe-links}{'\n'}{end}"
+master,0.0-1.0_0.1-1.1
+```