mirror of
https://github.com/intel/intel-device-plugins-for-kubernetes.git
synced 2025-06-03 03:59:37 +00:00
Merge pull request #1149 from eero-t/gpu-reqs
Add GPU plugin README prerequisites section
This commit is contained in:
commit
89d3c5a4f3
@ -5,6 +5,11 @@ Table of Contents
|
|||||||
* [Introduction](#introduction)
|
* [Introduction](#introduction)
|
||||||
* [Modes and Configuration Options](#modes-and-configuration-options)
|
* [Modes and Configuration Options](#modes-and-configuration-options)
|
||||||
* [Installation](#installation)
|
* [Installation](#installation)
|
||||||
|
* [Prerequisites](#prerequisites)
|
||||||
|
* [Drivers for discrete GPUs](#drivers-for-discrete-gpus)
|
||||||
|
* [Kernel driver](#kernel-driver)
|
||||||
|
* [User-space drivers](#user-space-drivers)
|
||||||
|
* [Drivers for older (integrated) GPUs](#drivers-for-older-integrated-gpus)
|
||||||
* [Pre-built Images](#pre-built-images)
|
* [Pre-built Images](#pre-built-images)
|
||||||
* [Install to all nodes](#install-to-all-nodes)
|
* [Install to all nodes](#install-to-all-nodes)
|
||||||
* [Install to nodes with Intel GPUs with NFD](#install-to-nodes-with-intel-gpus-with-nfd)
|
* [Install to nodes with Intel GPUs with NFD](#install-to-nodes-with-intel-gpus-with-nfd)
|
||||||
@ -19,7 +24,8 @@ Table of Contents
|
|||||||
## Introduction
|
## Introduction
|
||||||
|
|
||||||
Intel GPU plugin facilitates Kubernetes workload offloading by providing access to
|
Intel GPU plugin facilitates Kubernetes workload offloading by providing access to
|
||||||
discrete (including Intel® Data Center GPU Flex Series) and integrated Intel GPU device files.
|
discrete (including Intel® Data Center GPU Flex Series) and integrated Intel GPU devices
|
||||||
|
supported by the host kernel.
|
||||||
|
|
||||||
Use cases include, but are not limited to:
|
Use cases include, but are not limited to:
|
||||||
- Media transcode
|
- Media transcode
|
||||||
@ -50,6 +56,73 @@ The following sections detail how to obtain, build, deploy and test the GPU devi
|
|||||||
|
|
||||||
Examples are provided showing how to deploy the plugin either using a DaemonSet or by hand on a per-node basis.
|
Examples are provided showing how to deploy the plugin either using a DaemonSet or by hand on a per-node basis.
|
||||||
|
|
||||||
|
### Prerequisites
|
||||||
|
|
||||||
|
Access to a GPU device requires firmware, kernel and user-space
|
||||||
|
drivers supporting it. Firmware and kernel driver need to be on the
|
||||||
|
host, user-space drivers in the GPU workload containers.
|
||||||
|
|
||||||
|
Intel GPU devices supported by the current kernel can be listed with:
|
||||||
|
```
|
||||||
|
$ grep i915 /sys/class/drm/card?/device/uevent
|
||||||
|
/sys/class/drm/card0/device/uevent:DRIVER=i915
|
||||||
|
/sys/class/drm/card1/device/uevent:DRIVER=i915
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Drivers for discrete GPUs
|
||||||
|
|
||||||
|
##### Kernel driver
|
||||||
|
|
||||||
|
For now, kernel needs to be built from sources. Later on there will
|
||||||
|
also be pre-built kernels and/or DKMS GPU module distro packages for
|
||||||
|
the enterprise / long-term-support kernels.
|
||||||
|
|
||||||
|
While last 5.x upstream Linux kernel releases already had preliminary
|
||||||
|
discrete Intel GPU support, one should really use kernel v6.x.
|
||||||
|
|
||||||
|
In upstream kernels, discrete GPU support needs to be enabled with kernel
|
||||||
|
`i915.force_probe=<PCI_ID>` command line option until relevant kernel
|
||||||
|
driver features have been completed in upstream:
|
||||||
|
https://www.kernel.org/doc/html/latest/gpu/rfc/index.html
|
||||||
|
|
||||||
|
PCI IDs for the Intel GPUs on given host can be listed with:
|
||||||
|
```
|
||||||
|
$ lspci | grep -e VGA -e Display | grep Intel
|
||||||
|
88:00.0 Display controller: Intel Corporation Device 56c1 (rev 05)
|
||||||
|
8d:00.0 Display controller: Intel Corporation Device 56c1 (rev 05)
|
||||||
|
```
|
||||||
|
|
||||||
|
(`lspci` lists GPUs with display support as "VGA compatible controller",
|
||||||
|
and server GPUs without display support, as "Display controller".)
|
||||||
|
|
||||||
|
Mesa "Iris" 3D driver header provides a mapping between GPU PCI IDs and their Intel brand names:
|
||||||
|
https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/include/pci_ids/iris_pci_ids.h
|
||||||
|
|
||||||
|
If your kernel build does not find the correct firmware version for
|
||||||
|
a given GPU from the host (see `dmesg | grep i915` output), latest
|
||||||
|
firmware versions are available in upstream:
|
||||||
|
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/i915
|
||||||
|
|
||||||
|
##### User-space drivers
|
||||||
|
|
||||||
|
Until new enough user-space drivers (supporting also discrete GPUs)
|
||||||
|
are available directly from distribution package repositories, they
|
||||||
|
can be installed to containers from Intel package repositories. See:
|
||||||
|
https://dgpu-docs.intel.com/installation-guides/index.html
|
||||||
|
|
||||||
|
Example container is listed in [Testing and demos](#testing-and-demos).
|
||||||
|
|
||||||
|
Validation status against *upstream* kernel is listed in the user-space drivers release notes:
|
||||||
|
* Media driver: https://github.com/intel/media-driver/releases
|
||||||
|
* Compute driver: https://github.com/intel/compute-runtime/releases
|
||||||
|
|
||||||
|
#### Drivers for older (integrated) GPUs
|
||||||
|
|
||||||
|
For the older (integrated) GPUs, new enough firmware and kernel driver
|
||||||
|
are typically included already with the host OS, and new enough
|
||||||
|
user-space drivers (for the GPU containers) are in the host OS
|
||||||
|
repositories.
|
||||||
|
|
||||||
### Pre-built Images
|
### Pre-built Images
|
||||||
|
|
||||||
[Pre-built images](https://hub.docker.com/r/intel/intel-gpu-plugin)
|
[Pre-built images](https://hub.docker.com/r/intel/intel-gpu-plugin)
|
||||||
@ -155,8 +228,8 @@ master
|
|||||||
## Testing and Demos
|
## Testing and Demos
|
||||||
|
|
||||||
We can test the plugin is working by deploying an OpenCL image and running `clinfo`.
|
We can test the plugin is working by deploying an OpenCL image and running `clinfo`.
|
||||||
The sample OpenCL image can be built using `make intel-opencl-icd` and must be made
|
[intel-opencl-icd](../../demo/intel-opencl-icd/) sample OpenCL image, built using
|
||||||
available in the cluster.
|
`make intel-opencl-icd` and available from DockerHub, is used for this.
|
||||||
|
|
||||||
1. Create a job:
|
1. Create a job:
|
||||||
|
|
||||||
@ -174,8 +247,8 @@ available in the cluster.
|
|||||||
<log output>
|
<log output>
|
||||||
```
|
```
|
||||||
|
|
||||||
If the pod did not successfully launch, possibly because it could not obtain the gpu
|
If the pod did not successfully launch, possibly because it could not obtain
|
||||||
resource, it will be stuck in the `Pending` status:
|
the requested GPU resource, it will be stuck in the `Pending` status:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
$ kubectl get pods
|
$ kubectl get pods
|
||||||
|
Loading…
Reference in New Issue
Block a user