Merge pull request #1149 from eero-t/gpu-reqs

Add GPU plugin README prerequisites section
2025-06-03 03:59:37 +00:00 · 2022-09-27 12:32:22 +03:00 · 2022-09-27 12:32:22 +03:00 · 89d3c5a4f3
commit 89d3c5a4f3
parent d491f462f4 9b3ee06cb1
1 changed files with 78 additions and 5 deletions
--- a/cmd/gpu_plugin/README.md
+++ b/cmd/gpu_plugin/README.md
@ -5,6 +5,11 @@ Table of Contents
 * [Introduction](#introduction)
 * [Modes and Configuration Options](#modes-and-configuration-options)
 * [Installation](#installation)
    * [Prerequisites](#prerequisites)
        * [Drivers for discrete GPUs](#drivers-for-discrete-gpus)
            * [Kernel driver](#kernel-driver)
            * [User-space drivers](#user-space-drivers)
        * [Drivers for older (integrated) GPUs](#drivers-for-older-integrated-gpus)
    * [Pre-built Images](#pre-built-images)
         * [Install to all nodes](#install-to-all-nodes)
         * [Install to nodes with Intel GPUs with NFD](#install-to-nodes-with-intel-gpus-with-nfd)
@ -19,7 +24,8 @@ Table of Contents
 ## Introduction
 Intel GPU plugin facilitates Kubernetes workload offloading by providing access to
-discrete (including Intel® Data Center GPU Flex Series) and integrated Intel GPU device files.
+discrete (including Intel® Data Center GPU Flex Series) and integrated Intel GPU devices
 supported by the host kernel.
 Use cases include, but are not limited to:
 - Media transcode
@ -50,6 +56,73 @@ The following sections detail how to obtain, build, deploy and test the GPU devi
 Examples are provided showing how to deploy the plugin either using a DaemonSet or by hand on a per-node basis.
 ### Prerequisites
 Access to a GPU device requires firmware, kernel and user-space
 drivers supporting it.  Firmware and kernel driver need to be on the
 host, user-space drivers in the GPU workload containers.
 Intel GPU devices supported by the current kernel can be listed with:
 ```
 $ grep i915 /sys/class/drm/card?/device/uevent
 /sys/class/drm/card0/device/uevent:DRIVER=i915
 /sys/class/drm/card1/device/uevent:DRIVER=i915
 ```
 #### Drivers for discrete GPUs
 ##### Kernel driver
 For now, kernel needs to be built from sources. Later on there will
 also be pre-built kernels and/or DKMS GPU module distro packages for
 the enterprise / long-term-support kernels.
 While last 5.x upstream Linux kernel releases already had preliminary
 discrete Intel GPU support, one should really use kernel v6.x.
 In upstream kernels, discrete GPU support needs to be enabled with kernel
 `i915.force_probe=<PCI_ID>` command line option until relevant kernel
 driver features have been completed in upstream:
 https://www.kernel.org/doc/html/latest/gpu/rfc/index.html
 PCI IDs for the Intel GPUs on given host can be listed with:
 ```
 $ lspci | grep -e VGA -e Display | grep Intel
 88:00.0 Display controller: Intel Corporation Device 56c1 (rev 05)
 8d:00.0 Display controller: Intel Corporation Device 56c1 (rev 05)
 ```
 (`lspci` lists GPUs with display support as "VGA compatible controller",
 and server GPUs without display support, as "Display controller".)
 Mesa "Iris" 3D driver header provides a mapping between GPU PCI IDs and their Intel brand names:
 https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/include/pci_ids/iris_pci_ids.h
 If your kernel build does not find the correct firmware version for
 a given GPU from the host (see `dmesg | grep i915` output), latest
 firmware versions are available in upstream:
 https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/i915
 ##### User-space drivers
 Until new enough user-space drivers (supporting also discrete GPUs)
 are available directly from distribution package repositories, they
 can be installed to containers from Intel package repositories. See:
 https://dgpu-docs.intel.com/installation-guides/index.html
 Example container is listed in [Testing and demos](#testing-and-demos).
 Validation status against *upstream* kernel is listed in the user-space drivers release notes:
 * Media driver: https://github.com/intel/media-driver/releases
 * Compute driver: https://github.com/intel/compute-runtime/releases
 #### Drivers for older (integrated) GPUs
 For the older (integrated) GPUs, new enough firmware and kernel driver
 are typically included already with the host OS, and new enough
 user-space drivers (for the GPU containers) are in the host OS
 repositories.
 ### Pre-built Images
 [Pre-built images](https://hub.docker.com/r/intel/intel-gpu-plugin)
@ -155,8 +228,8 @@ master
 ## Testing and Demos
 We can test the plugin is working by deploying an OpenCL image and running `clinfo`.
-The sample OpenCL image can be built using `make intel-opencl-icd` and must be made
+[intel-opencl-icd](../../demo/intel-opencl-icd/) sample OpenCL image, built using
-available in the cluster.
+`make intel-opencl-icd` and available from DockerHub, is used for this.
 1. Create a job:
@ -174,8 +247,8 @@ available in the cluster.
    <log output>
    ```
-    If the pod did not successfully launch, possibly because it could not obtain the gpu
+    If the pod did not successfully launch, possibly because it could not obtain
-    resource, it will be stuck in the `Pending` status:
+    the requested GPU resource, it will be stuck in the `Pending` status:
    ```bash
    $ kubectl get pods