RHEL AI installation and GPU options: ROCm and CUDA

As of RHEL 9 (and with growing support in RHEL 8.10+), Red Hat provides official capabilities for AI/ML workloads, including support for GPU acceleration via ROCm (for AMD GPUs) and NVIDIA CUDA (for NVIDIA GPUs). However, the level of support, installation methods, and available tooling differ significantly between the two ecosystems.

Below is a structured overview of AI-related installation options on RHEL, including ROCm, CUDA, and complementary AI/ML frameworks.


1. NVIDIA CUDA on RHEL

Official Support

  • Fully supported by Red Hat in partnership with NVIDIA.
  • Available via RHEL NVIDIA GPU drivers and CUDA repositories.
  • Certified for use with OpenShift AI, Red Hat Enterprise Linux AI (RHEL AI), and NVIDIA AI Enterprise.

Installation Options

Option A: Use RHEL NVIDIA Driver + CUDA from NVIDIA

  1. Enable the required RHEL repos (BaseOS, AppStream, and CodeReady Builder).
  2. Install EPEL and the kernel headers/development packages matching the running kernel.
  3. Download and install the NVIDIA driver + CUDA Toolkit from NVIDIA’s site (select RHEL 9), or use NVIDIA’s RPM repo.
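The steps above might look like the following on an x86_64 RHEL 9 system. The repo names and the cuda-rhel9 repo URL are the ones NVIDIA and Red Hat normally publish, but treat this as a sketch and verify against the current CUDA installation guide:

```shell
# Sketch only – repo names, URLs, and package names can change between releases.

# 1. Enable CodeReady Builder (needed for some build dependencies).
sudo subscription-manager repos --enable=codeready-builder-for-rhel-9-x86_64-rpms

# 2. Install EPEL plus headers for the running kernel.
sudo dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm
sudo dnf install -y kernel-devel-$(uname -r) kernel-headers-$(uname -r)

# 3. Add NVIDIA's CUDA repo for RHEL 9, then install driver and toolkit.
sudo dnf config-manager --add-repo \
    https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo
sudo dnf module install -y nvidia-driver:latest-dkms
sudo dnf install -y cuda-toolkit

# 4. Reboot, then confirm the driver sees the GPU.
nvidia-smi
```

Using the DKMS module stream keeps the kernel module rebuilt automatically across kernel updates, which is the main failure mode of manual driver installs.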

Option B: Use NVIDIA AI Enterprise (NVAIE) on RHEL

  • Fully supported stack (drivers, CUDA, cuDNN, TensorRT, etc.).
  • Requires NVIDIA enterprise license.
  • Distributed via NGC Catalog and integrated with OpenShift AI.

Note: RHEL 9.2+ includes DKMS support, making NVIDIA driver updates more robust.


2. AMD ROCm on RHEL

Limited / Community-Level Support

  • Not officially certified by Red Hat for RHEL (as of early 2026).
  • AMD provides community-supported ROCm packages for RHEL 9, but not all RHEL versions or kernels are compatible.
  • Best support on RHEL 9.2–9.4 with specific AMD GPUs (e.g., MI210, MI250, RX 7900 XT).

Installation Steps (Community Approach)

  1. Check GPU compatibility:
    Only select AMD Instinct™ and Radeon Pro GPUs are supported. Consumer cards (e.g., RX 6000/7000) have partial or no support.
  2. Enable the required RHEL repos (CodeReady Builder and EPEL).
  3. Add the ROCm repo from AMD (repo.radeon.com).
  4. Install the ROCm packages.
  5. Reboot and verify the installation.
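A sketch of those community steps on RHEL 9.4 follows. The amdgpu-install RPM URL and the ROCm version number are illustrative assumptions — look up the exact path for your RHEL minor release on repo.radeon.com before running anything:

```shell
# Sketch only – version numbers and URLs below are examples, not canonical.

# 1. Prerequisites: EPEL and kernel headers for the running kernel.
sudo dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm
sudo dnf install -y kernel-devel-$(uname -r) kernel-headers-$(uname -r)

# 2. Add AMD's installer package (version/path are placeholders).
sudo dnf install -y \
    https://repo.radeon.com/amdgpu-install/6.1.3/rhel/9.4/amdgpu-install-6.1.60103-1.el9.noarch.rpm

# 3. Install the ROCm use case (pulls the amdgpu DKMS driver + ROCm userspace).
sudo amdgpu-install --usecase=rocm

# 4. Reboot, then verify the GPU is visible to ROCm.
rocminfo | grep -i gfx
/opt/rocm/bin/rocm-smi
```

Because the amdgpu driver is built via DKMS here, it will usually survive kernel updates, which mitigates the first known issue listed below.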

Known Issues:

  • Kernel updates may break ROCm (requires reinstall or DKMS setup).
  • SELinux may block GPU access—temporary workaround: setenforce 0 (not recommended in production).
  • No official Red Hat support; use at your own risk in enterprise environments.

3. RHEL AI (Red Hat Enterprise Linux AI)

Introduced in 2024, RHEL AI is a subscription-based offering that includes:

  • InstructLab – Open-source framework for local LLM training and tuning (based on IBM’s Granite models).
  • Optimized AI toolchain (PyTorch, vLLM, llama.cpp, etc.).
  • Integration with OpenShift AI for scalable MLOps.
  • Support for both NVIDIA and (future) AMD accelerators.

How to Install RHEL AI:

  1. Subscribe to RHEL AI via Red Hat Customer Portal.
  2. Enable the AI repository:
    sudo subscription-manager repos --enable=rhel-ai-for-x86_64-rpms
  3. Install and initialize InstructLab:
    sudo dnf install ilab
    ilab config init
    ilab model download

This is the recommended path for enterprise AI on RHEL, especially for NVIDIA-based systems.
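After the download step, a first session typically continues along these lines. The subcommand names follow upstream InstructLab’s grouped CLI; older ilab releases used flat commands (ilab serve, ilab chat), so check ilab --help on your version:

```shell
# Assumes the ilab CLI from the steps above is installed and configured.
ilab model serve &     # start a local inference server for the downloaded model
ilab model chat        # open an interactive chat session against that server
```

For tuning workflows, ilab also provides data-generation and training subcommands; their names and flags vary by release, so consult the version-matched documentation rather than memorized invocations.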


4. Other AI/ML Tools on RHEL

  Tool/Framework    Availability on RHEL
  PyTorch           Via PyPI, Conda, or RHEL AI repos
  TensorFlow        Community builds; not in base RHEL repos
  JupyterLab        Available in AppStream (dnf install jupyterlab)
  OpenVINO          Intel’s toolkit; supported on RHEL
  ONNX Runtime      Available via PyPI or Microsoft RPMs
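Whichever framework you choose, it is worth confirming which accelerator stack the OS actually exposes before installing GPU-specific wheels. A small stdlib-only sketch (the function name is ours, not part of any tool above) that keys off the vendor management CLIs the installs place on the PATH:

```python
import shutil


def detect_gpu_stack():
    """Best-guess accelerator stack, based on which vendor CLI is on PATH.

    nvidia-smi ships with the NVIDIA driver; rocminfo/rocm-smi ship with
    ROCm. Purely illustrative - presence of a CLI does not guarantee a
    working runtime, but it is a cheap first check.
    """
    if shutil.which("nvidia-smi"):
        return "cuda"
    if shutil.which("rocminfo") or shutil.which("rocm-smi"):
        return "rocm"
    return "cpu"


print(detect_gpu_stack())
```

On a CPU-only box this prints "cpu", which maps to the InstructLab-only row in the recommendations below.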

Recommendations by Use Case

  Use Case                      Recommended Stack
  Enterprise AI (production)    RHEL AI + NVIDIA GPU + OpenShift AI
  Research / Dev (NVIDIA)       CUDA + PyTorch/TensorFlow from NVIDIA
  AMD GPU experimentation       ROCm (community) + PyTorch (HIP)
  CPU-only LLMs                 RHEL AI + InstructLab (no GPU needed)
