docs: gpu: Add more details, re-arrange section order

Re-arrange the section order a little (such as putting the use
of the DaemonSet before the sudo hand-deploy), and add a lot more
detail of what to expect, and how to check if the pod has launched
correctly.

Signed-off-by: Graham Whaley <graham.whaley@intel.com>
Graham Whaley 2020-01-17 13:31:55 +00:00
parent 6705a8e461
commit 79a86c10e8


# Table of Contents
* [Introduction](#introduction)
* [Installation](#installation)
    * [Getting the source code](#getting-the-source-code)
    * [Verify node kubelet config](#verify-node-kubelet-config)
    * [Deploying as a DaemonSet](#deploying-as-a-daemonset)
        * [Build the plugin image](#build-the-plugin-image)
        * [Deploy plugin DaemonSet](#deploy-plugin-daemonset)
    * [Deploy by hand](#deploy-by-hand)
        * [Build the plugin](#build-the-plugin)
        * [Run the plugin as administrator](#run-the-plugin-as-administrator)
    * [Verify plugin registration](#verify-plugin-registration)
    * [Testing the plugin](#testing-the-plugin)
# Introduction
and acceleration, supporting GPUs of the following hardware families:
- Integrated GPUs within Intel Xeon processors
- Intel Visual Compute Accelerator (Intel VCA)
The GPU plugin facilitates offloading the processing of computation-intensive workloads to GPU hardware.
There are two primary use cases:
- hardware vendor-independent acceleration using the [Intel Media SDK](https://github.com/Intel-Media-SDK/MediaSDK)
- OpenCL code tuned for high-end Intel devices.
For example, the Intel Media SDK can offload video transcoding operations, and the OpenCL libraries can provide computation acceleration for Intel GPUs.
For information on Intel GVT-g virtual GPU device passthrough (as opposed to full device passthrough), see
[this site](https://github.com/intel/gvt-linux/wiki/GVTg_Setup_Guide).
# Installation
The following sections detail how to obtain, build, deploy and test the GPU device plugin.
Examples are provided showing how to deploy the plugin either using a DaemonSet or by hand on a per-node basis.
## Getting the source code
> **Note:** It is presumed you have a valid and configured [golang](https://golang.org/) environment
> that meets the minimum required version.
```bash
$ go get -d -u github.com/intel/intel-device-plugins-for-kubernetes
```
## Verify node kubelet config
Every node that will be running the gpu plugin must have the
[kubelet device-plugins](https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/)
configured. For each node, check that the kubelet device plugin socket exists:
```bash
$ ls /var/lib/kubelet/device-plugins/kubelet.sock
/var/lib/kubelet/device-plugins/kubelet.sock
```
## Deploying as a DaemonSet
To deploy the gpu plugin as a DaemonSet, you first need to build a container image for the
plugin and ensure that it is visible to your nodes.
### Build the plugin image
The following will use `docker` to build a local container image called
`intel/intel-gpu-plugin` with the tag `devel`.
The image build tool can be changed from the default `docker` by setting the `BUILDER` argument
to the [`Makefile`](Makefile).
```bash
$ cd $GOPATH/src/github.com/intel/intel-device-plugins-for-kubernetes
$ make intel-gpu-plugin
...
Successfully tagged intel/intel-gpu-plugin:devel
```
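If your cluster has more than one node and no shared image store, the locally built image will not
automatically be visible to the other nodes. One way to distribute it, sketched below, is to tag and
push the image to a registry your nodes can pull from. The registry address `localhost:5000` is only
a placeholder; substitute your own, and if you change the image name, adjust the image reference in
the DaemonSet YAML to match.

```bash
# Sketch only: substitute a registry host that your nodes can reach.
$ docker tag intel/intel-gpu-plugin:devel localhost:5000/intel/intel-gpu-plugin:devel
$ docker push localhost:5000/intel/intel-gpu-plugin:devel
```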
### Deploy plugin DaemonSet
You can then use the example DaemonSet YAML file provided to deploy the plugin.
```bash
$ kubectl create -f ./deployments/gpu_plugin/gpu_plugin.yaml
daemonset.apps/intel-gpu-plugin created
```
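As a quick check that the DaemonSet has rolled out, confirm that a plugin pod has been scheduled
onto each GPU node and has reached the `Running` state. The pod name suffix, counts and timings
below are illustrative for a single-node cluster:

```bash
$ kubectl get daemonset intel-gpu-plugin
NAME               DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
intel-gpu-plugin   1         1         1       1            1           <none>          30s
$ kubectl get pods | grep intel-gpu-plugin
intel-gpu-plugin-xxxxx   1/1     Running   0          30s
```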
> **Note**: It is also possible to run the GPU device plugin using a non-root user. To do this,
> the nodes' DAC rules must be configured to allow device plugin socket creation and kubelet registration.
> Furthermore, the deployment's `securityContext` must be configured with appropriate `runAsUser`/`runAsGroup`.
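As a minimal sketch of that last point, the container spec in the DaemonSet YAML could carry a
`securityContext` along the following lines; the numeric IDs are placeholders, not values mandated
by the plugin:

```yaml
# Sketch only: merge into the container entry of deployments/gpu_plugin/gpu_plugin.yaml.
# Pick user/group IDs that satisfy your nodes' DAC rules for the kubelet socket directory.
securityContext:
  runAsUser: 1000
  runAsGroup: 1000
```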
## Deploy by hand
For development purposes, it is sometimes convenient to deploy the plugin 'by hand' on a node.
In this case, you do not need to build the complete container image, and can build just the plugin.
### Build the plugin
First we build the plugin:
```bash
$ cd $GOPATH/src/github.com/intel/intel-device-plugins-for-kubernetes
$ make gpu_plugin
```
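If the build succeeds, the plugin binary should be present in the source tree at the path used in
the next step:

```bash
$ ls $GOPATH/src/github.com/intel/intel-device-plugins-for-kubernetes/cmd/gpu_plugin/gpu_plugin
```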
### Run the plugin as administrator
Now we can run the plugin directly on the node:
```bash
$ sudo $GOPATH/src/github.com/intel/intel-device-plugins-for-kubernetes/cmd/gpu_plugin/gpu_plugin
device-plugin start server at: /var/lib/kubelet/device-plugins/gpu.intel.com-i915.sock
device-plugin registered
```
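The two log lines indicate that the plugin has created its own socket and successfully registered
with the kubelet. You can also confirm the socket exists on the node, using the path reported in
the log:

```bash
$ ls /var/lib/kubelet/device-plugins/gpu.intel.com-i915.sock
/var/lib/kubelet/device-plugins/gpu.intel.com-i915.sock
```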
## Verify plugin registration
You can verify the plugin has been registered with the expected nodes by searching for the relevant
resource allocation status on the nodes:
```bash
$ kubectl get nodes -o=jsonpath="{range .items[*]}{.metadata.name}{'\n'}{' i915: '}{.status.allocatable.gpu\.intel\.com/i915}{'\n'}"
master
i915: 1
```
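Alternatively, you can inspect a single node directly; a registered plugin advertises a non-zero
`gpu.intel.com/i915` resource in the node's capacity and allocatable lists:

```bash
$ kubectl describe node <node name> | grep gpu.intel.com
 gpu.intel.com/i915:  1
 gpu.intel.com/i915:  1
```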
## Testing the plugin
We can test the plugin is working by deploying the provided example OpenCL image with FFT offload enabled.
1. Build a Docker image with an example program offloading FFT computations to GPU:
```bash
$ cd demo
$ ./build-image.sh ubuntu-demo-opencl
...
Successfully tagged ubuntu-demo-opencl:devel
```
This command produces a Docker image named `ubuntu-demo-opencl`.
1. Create a job running unit tests off the local Docker image:
```bash
$ cd $GOPATH/src/github.com/intel/intel-device-plugins-for-kubernetes
$ kubectl apply -f demo/intelgpu-job.yaml
job.batch/intelgpu-demo-job created
```
1. Review the job's logs:
```bash
$ kubectl get pods | fgrep intelgpu
# replace the 'xxxxx' below with the pod name listed in the output above
$ kubectl logs intelgpu-demo-job-xxxxx
+ WORK_DIR=/root/6-1/fft
+ cd /root/6-1/fft
+ ./fft
+ uprightdiff --format json output.pgm /expected.pgm diff.pgm
+ cat diff.json
+ jq .modifiedArea
+ MODIFIED_AREA=0
+ [ 0 -gt 10 ]
+ echo Success
Success
```
If the pod did not successfully launch, possibly because it could not obtain the gpu
resource, it will be stuck in the `Pending` status:
```bash
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
intelgpu-demo-job-xxxxx 0/1 Pending 0 8s
```
This can be verified by checking the Events of the pod:
```bash
$ kubectl describe pod intelgpu-demo-job-xxxxx
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling <unknown> default-scheduler 0/1 nodes are available: 1 Insufficient gpu.intel.com/i915.
```