# AMD HPC Training Examples Repo

Welcome to AMD's HPC Training Examples Repo!

(Last revision of this README: **April 2nd, 2025**).

Here you will find a variety of examples to showcase the capabilities of AMD's GPU software stack.
Please be aware that the repo is continuously updated to keep up with the most recent releases of the AMD software, and also to increase the number of examples and use cases that we strive to provide for our users.

## Repository Structure

Please refer to this table of contents to locate the exercises and examples you are interested in, sorted by topic.

1. [**HIP**](https://github.com/amd/HPCTrainingExamples/tree/main/HIP)
   1. ***HIP Functionality Checks***
      1. [`query_device`](https://github.com/amd/HPCTrainingExamples/tree/main/HIP/Stream_Overlap): checks that `hipMemGetInfo` works.
   2. ***Fundamental Examples***
      1. [`basic_examples`](https://github.com/amd/HPCTrainingExamples/tree/main/HIP/basic_examples): a collection of introductory exercises to get familiar with the HIP API and the HIP build process. Examples include a hipification of some CUDA code, device to host data transfer, error checking, and basic GPU kernel implementation. Begin here if you are just starting with HIP. [`README`](https://github.com/amd/HPCTrainingExamples/tree/main/HIP/basic_examples/README.md).
      2. [`Stream_Overlap`](https://github.com/amd/HPCTrainingExamples/tree/main/HIP/Stream_Overlap): this example shows how to share the workload of a GPU offload computation using several overlapping HIP streams. Note that AMD GPUs natively support the creation of multiple stream queues on the same GPU. The result is an additional gain in terms of time of execution due to the additional parallelism provided by the overlapping streams. [`README`](https://github.com/amd/HPCTrainingExamples/blob/main/HIP/Stream_Overlap/README.md).
      3. [`dgemm`](https://github.com/amd/HPCTrainingExamples/tree/main/HIP/dgemm): a (d)GEMM application created as an exercise to showcase simple matrix-matrix multiplications on AMD GPUs. [`README`](https://github.com/amd/HPCTrainingExamples/tree/main/HIP/dgemm/README.md).
      4. [`hip_stream`](https://github.com/amd/HPCTrainingExamples/tree/main/HIP/hip-stream): modification of the STREAM benchmark for HIP. [`README`](https://github.com/amd/HPCTrainingExamples/blob/main/HIP/hip-stream/README.md).
      5. [`jacobi`](https://github.com/amd/HPCTrainingExamples/tree/main/HIP/jacobi): distributed Jacobi solver, using GPUs to perform the computation and MPI for halo exchanges. [`README`](https://github.com/amd/HPCTrainingExamples/blob/main/HIP/jacobi/README.md).
      6. [`matrix_addition`](https://github.com/amd/HPCTrainingExamples/tree/main/HIP/matrix_addition): example of a HIP kernel performing a matrix addition.
      7. [`saxpy`](https://github.com/amd/HPCTrainingExamples/tree/main/HIP/saxpy): example of a HIP kernel performing a saxpy operation. [`README`](https://github.com/amd/HPCTrainingExamples/tree/main/HIP/saxpy/README.md).
      8. [`stencil_examples`](https://github.com/amd/HPCTrainingExamples/tree/main/HIP/stencil_examples): examples of stencil operations with a HIP kernel, including the use of timers and asynchronous copies.
      9. [`vectorAdd`](https://github.com/amd/HPCTrainingExamples/tree/main/HIP/vectorAdd): example of a HIP kernel to perform a vector add. Note that the `CMakeLists.txt` in this directory represents a good example of a portable CMakeLists to build on either AMD or Nvidia GPUs with HIP. [`README`](https://github.com/amd/HPCTrainingExamples/tree/main/HIP/vectorAdd/README.md).
      10. [`vector_addition_examples`](https://github.com/amd/HPCTrainingExamples/tree/main/HIP/vector_addition_examples): another example of a HIP kernel to perform vector addition, including different versions such as one using shared memory, one with timers, and a CUDA one to try [`HIPIFY`](https://github.com/amd/HPCTrainingExamples/tree/main/HIPIFY) and [`hipifly`](https://github.com/amd/HPCTrainingExamples/tree/main/hipifly) tools on. The examples in this directory are not part of the HIP test suite.
      11. [`reduction`](https://github.com/amd/HPCTrainingExamples/tree/main/HIP/reduction): several examples of reduction operations using HIP kernels. [`README`](https://github.com/amd/HPCTrainingExamples/blob/main/HIP/reduction/README.md).
      12. [`functions`](https://github.com/amd/HPCTrainingExamples/tree/main/HIP/functions): example on how to define a function to be called both from the host and from a HIP kernel. [`README`](https://github.com/amd/HPCTrainingExamples/tree/main/HIP/functions/README.md).
      13. [`allocators`](https://github.com/amd/HPCTrainingExamples/tree/main/HIP/allocators): example on how different ways of allocating memory affect the execution time of a vector add kernel performed on the CPU and on the GPU. [`README`](https://github.com/amd/HPCTrainingExamples/tree/main/HIP/allocators/README.md).
      14. [`transpose`](https://github.com/amd/HPCTrainingExamples/tree/main/HIP/transpose): examples looking at LDS and coalesced data reads and writes considering the transposition of a matrix with HIP. [`README`](https://github.com/amd/HPCTrainingExamples/blob/main/HIP/transpose/README.md). 
   3. ***CUDA to HIP Porting***
      1. [`HIPIFY`](https://github.com/amd/HPCTrainingExamples/tree/main/HIPIFY): example to show how to port CUDA code to HIP with HIPIFY tools. [`README`](https://github.com/amd/HPCTrainingExamples/blob/main/HIPIFY/README.md).
      2. [`hipifly`](https://github.com/amd/HPCTrainingExamples/tree/main/hipifly): example to show how to port CUDA code to HIP with hipifly tools. [`README`](https://github.com/amd/HPCTrainingExamples/blob/main/hipifly/vector_add/README.md).
   4. [`HIP-Optimizations`](https://github.com/amd/HPCTrainingExamples/tree/main/HIP-Optimizations): two examples are currently in this directory: the first one is a daxpy HIP kernel used to show how an initial version can be optimized to improve performance. [`README`](https://github.com/amd/HPCTrainingExamples/tree/main/HIP-Optimizations/daxpy/README.md). The second example shows how to reduce register pressure, and is based on the associated ROCm blog [post](https://rocm.blogs.amd.com/software-tools-optimization/register-pressure/README.html#how-to-reduce-register-pressure). [`README`](https://github.com/amd/HPCTrainingExamples/blob/main/HIP-Optimizations/register-pressure/README.md).
   5. [`HIPFort`](https://github.com/amd/HPCTrainingExamples/tree/main/HIPFort): two examples that show how to use the hipfort interface to call hipBLAS functions from Fortran.
      1. [`hipgemm`](https://github.com/amd/HPCTrainingExamples/tree/main/HIPFort/hipgemm): call the hipBLAS function `hipblasZgemm` from an OpenMP application code written in Fortran, leveraging the hipfort interface. [`README`](https://github.com/amd/HPCTrainingExamples/tree/main/HIPFort/hipgemm).
      2. [`matmult`](https://github.com/amd/HPCTrainingExamples/tree/main/HIPFort/matmult): this example compares the results of a matrix multiplication done with `hipblasDgemm` using hipBLAS and hipfort, with one done using a HIP kernel. For the HIP kernel, a proper interface has to be created, whereas hipfort already provides one for hipBLAS. With this example, users can better understand how hipfort works by creating such an interface themselves. [`README`](https://github.com/amd/HPCTrainingExamples/blob/main/HIPFort/matmult/README.md).
   6. [`HIPStdPar`](https://github.com/amd/HPCTrainingExamples/tree/main/HIPStdPar): several examples showing C++ Std Parallelism with HIP on AMD GPUs. [`README`](https://github.com/amd/HPCTrainingExamples/blob/main/HIPStdPar/CXX/README.md).
   7. [`HIP-OpenMP`](https://github.com/amd/HPCTrainingExamples/tree/main/HIP-OpenMP): several examples on HIP/OpenMP interoperability in Fortran and C++.
      1. **C++**
         1. [**Call HIP kernels from OpenMP app and vice-versa**](https://github.com/amd/HPCTrainingExamples/tree/main/HIP-OpenMP/CXX): this directory contains several examples on how to use OpenMP and HIP in the same application. A detailed explanation of the `saxpy` and `daxpy` examples in this directory is contained in the [`README`](https://github.com/amd/HPCTrainingExamples/blob/main/HIP-OpenMP/CXX/README.md).
         2. [`interop`](https://github.com/amd/HPCTrainingExamples/tree/main/HIP-OpenMP/CXX/interop): this example uses the OpenMP `interop` construct to synchronize a HIP kernel with an OpenMP kernel by placing them on the same HIP stream. The construct does not appear to work correctly at the moment, so a call to `hipStreamSynchronize` is made; details are in the [`README`](https://github.com/amd/HPCTrainingExamples/blob/main/HIP-OpenMP/CXX/interop/README.md).
      2. **Fortran**
         1. [`Calling_DGEMM`](https://github.com/amd/HPCTrainingExamples/tree/main/HIP-OpenMP/F/Calling_DGEMM): this example calls a rocBLAS dgemm function from an OpenMP application code written in Fortran. It has two versions, one with explicit memory management done with OpenMP, in the [`explicit`](https://github.com/amd/HPCTrainingExamples/tree/main/HIP-OpenMP/F/Calling_DGEMM/explicit) directory, and one that uses unified shared memory, in the [`usm`](https://github.com/amd/HPCTrainingExamples/tree/main/HIP-OpenMP/F/Calling_DGEMM/usm) directory. [`README`](https://github.com/amd/HPCTrainingExamples/blob/main/HIP-OpenMP/F/Calling_DGEMM/README.md).
2. [**MPI-examples**](https://github.com/amd/HPCTrainingExamples/tree/main/MPI-examples)
   1. ***Benchmarks***: GPU aware benchmarks (`collective.cpp` and `pt2pt.cpp`) to assess the performance of the communication libraries. [`README`](https://github.com/amd/HPCTrainingExamples/blob/main/MPI-examples/README.md) [`Video of Presentation`](https://www.youtube.com/watch?v=77bS4B2Mvmo&list=PLB4tvLCynFjQq2rOIjEy39IDaugXcFPNa&index=2&t=0s).
   2. [***GhostExchange***](https://github.com/amd/HPCTrainingExamples/tree/main/MPI-examples/GhostExchange): slimmed down example of an actual physics application where the solution is initialized on a square 2D or 3D domain discretized with a Cartesian grid, and then advanced in parallel using MPI communications with unified shared memory, so host pointers are passed to the MPI calls, even if a GPU aware installation of MPI is used.
      1. [`GhostExchange_ArrayAssign`](https://github.com/amd/HPCTrainingExamples/tree/main/MPI-examples/GhostExchange/GhostExchange_ArrayAssign): this version uses OpenMP to offload to the GPU. Detailed [`README`](https://github.com/amd/HPCTrainingExamples/blob/main/MPI-examples/GhostExchange/GhostExchange_ArrayAssign/README.md) files are provided here for the different versions of the `GhostExchange_ArrayAssign` code, which showcase how to use `Omnitrace` to profile this application. Note that while the timeline tracing tool is now `rocprof-sys`, `Omnitrace` still lives in its dedicated [GitHub repository](https://github.com/ROCm/omnitrace).
      2. [`GhostExchange_ArrayAssign_HIP`](https://github.com/amd/HPCTrainingExamples/tree/main/MPI-examples/GhostExchange/GhostExchange_ArrayAssign_HIP): this version uses HIP to offload to the GPU. In this case as well, detailed [`README`](https://github.com/amd/HPCTrainingExamples/blob/main/MPI-examples/GhostExchange/GhostExchange_ArrayAssign_HIP/README.md) files are provided here for the different versions of the `GhostExchange_ArrayAssign_HIP` code, which illustrate how to use `Omnitrace` to profile this application.
      3. [`GhostExchange3D_ArrayAssign`](https://github.com/amd/HPCTrainingExamples/tree/main/MPI-examples/GhostExchange/GhostExchange3D_ArrayAssign): a CPU only version of the Ghost Exchange example in 3D, without offloading to GPU.
3. [**ManagedMemory**](https://github.com/amd/HPCTrainingExamples/tree/main/ManagedMemory): programming model exercises. Topics covered are the APU programming model, OpenMP, performance portability frameworks (Kokkos and RAJA) and the discrete GPU programming model. Some HIP examples are also available. [`README`](https://github.com/amd/HPCTrainingExamples/blob/main/ManagedMemory/README.md).
4. [**MLExamples**](https://github.com/amd/HPCTrainingExamples/tree/main/MLExamples): this is a rapidly growing directory including a variety of machine learning (ML) and artificial intelligence (AI) related examples.
   1. [**Miscellaneous Examples**](https://github.com/amd/HPCTrainingExamples/tree/main/MLExamples): PyTorch MNIST examples, TensorFlow with Horovod, and Hugging Face transformers. [`README`](https://github.com/amd/HPCTrainingExamples/tree/main/MLExamples/README.md).
   2. [`AI_Surrogates`](https://github.com/amd/HPCTrainingExamples/tree/main/MLExamples/AI_Surrogates): this directory contains a variety of Jupyter notebooks that have been developed to show some applications of AI for science using surrogate models. There are no READMEs for these examples at the moment and we suggest users work directly with the Jupyter notebooks for details.
   3. [`PyTorch_Profiling`](https://github.com/amd/HPCTrainingExamples/tree/main/MLExamples/PyTorch_Profiling): a collection of examples to show how to profile PyTorch using AMD tools. [`README`](https://github.com/amd/HPCTrainingExamples/blob/main/MLExamples/PyTorch_Profiling/README.md).
   4. [`RAG_LangChainDemo`](https://github.com/amd/HPCTrainingExamples/tree/main/MLExamples/RAG_LangChainDemo): a RAG Chatbot Demo application. [`README`](https://github.com/amd/HPCTrainingExamples/blob/main/MLExamples/RAG_LangChainDemo/README.md).
   5. [`Neural_Operators`](https://github.com/amd/HPCTrainingExamples/tree/main/MLExamples/Neural_Operators): training a Fourier-Neural-Operator on a CFD dataset using PyTorch and performing first optimizations. [`README`](https://github.com/amd/HPCTrainingExamples/blob/main/MLExamples/Neural_Operators/README.md).
5. [**Occupancy**](https://github.com/amd/HPCTrainingExamples/tree/main/Occupancy): example on modifying thread occupancy, using several variants of a matrix vector multiplication leveraging shared memory and launch bounds.
6. [**rocprof-compute**](https://github.com/amd/HPCTrainingExamples/tree/main/rocprof-compute): several examples showing how to leverage rocprof-compute (formerly Omniperf) to perform kernel level optimization using HIP. **NOTE**: detailed READMEs are provided in each subdirectory. [`README`](https://github.com/amd/HPCTrainingExamples/blob/main/rocprof-compute/README.md). [`Video of Presentation`](https://fs.hlrs.de/projects/par/events/2025/GPU-AMD/day4/15_Lecture.mp4).
7. [**Omniperf-OpenMP**](https://github.com/amd/HPCTrainingExamples/tree/main/Omniperf-OpenMP): example showing how to leverage Omniperf (now rocprof-compute) to perform kernel level optimization using Fortran and OpenMP. [`README`](https://github.com/amd/HPCTrainingExamples/blob/main/Omniperf-OpenMP/README.md).
8. [**Omnitrace**](https://github.com/amd/HPCTrainingExamples/tree/main/Omnitrace)
   1. ***Omnitrace on Jacobi***: Omnitrace used on the Jacobi solver example. [`README`](https://github.com/amd/HPCTrainingExamples/tree/main/Omnitrace/README.md).
   2. ***Omnitrace by Example***: Omnitrace used on several versions of the Ghost Exchange example:
      1. ***OpenMP Version***: [`READMEs`](https://github.com/amd/HPCTrainingExamples/blob/main/MPI-examples/GhostExchange/GhostExchange_ArrayAssign) available for each of the different versions of the example code. [`Video of Presentation`](https://vimeo.com/951998260).
      2. ***HIP Version***:  [`READMEs`](https://github.com/amd/HPCTrainingExamples/blob/main/MPI-examples/GhostExchange/GhostExchange_ArrayAssign_HIP) available for each of the different versions of the example code.
9. [**Pragma_Examples**](https://github.com/amd/HPCTrainingExamples/tree/main/Pragma_Examples): a large variety of examples for OpenMP (in Fortran, C, and C++) and a few for OpenACC.
   1. [**OpenMP**](https://github.com/amd/HPCTrainingExamples/tree/main/Pragma_Examples/OpenMP): there are many OpenMP examples that span various languages (C, C++ and Fortran) and various levels of complexity. There is an introductory [`README`](https://github.com/amd/HPCTrainingExamples/tree/main/Pragma_Examples/README.md) for the OpenMP material, but users are strongly encouraged to browse this directory and its sub-directories in detail to go over as many examples as possible.
      1. [**C**](https://github.com/amd/HPCTrainingExamples/tree/main/Pragma_Examples/OpenMP/C): this directory contains many examples that go from simple constructs to complex constructs, device routines, reductions, build examples and also a Jacobi solver example. This directory contains a [`README`](https://github.com/amd/HPCTrainingExamples/blob/main/Pragma_Examples/OpenMP/C/README.md) but users are encouraged to browse each sub-directory independently and consult the dedicated READMEs anytime they are available.
      2. [**C++**](https://github.com/amd/HPCTrainingExamples/tree/main/Pragma_Examples/OpenMP/CXX): more complex exercises that explore optimizations with memory alignment, targeted use of the memory management directives and clauses, and setting ad-hoc parameters such as `num_threads()` and `thread_limit()`. There is also an example called `cpp_classes` that applies OpenMP offloading to a code using C++ classes. There is no specific README at the moment for this directory and users are encouraged to browse the sub-directories and associated READMEs independently.
      3. [**Fortran**](https://github.com/amd/HPCTrainingExamples/tree/main/Pragma_Examples/OpenMP/Fortran): as in the C sub-directory, there is a wide variety of examples here that span a similar set of cases as the C counterpart. For instance, the Jacobi solver example is also available here in Fortran. A top level [`README`](https://github.com/amd/HPCTrainingExamples/blob/main/Pragma_Examples/OpenMP/Fortran/README.md) is available, but once again users are strongly encouraged to browse the sub-directories and associated READMEs independently.
      4. [`Intro`](https://github.com/amd/HPCTrainingExamples/tree/main/Pragma_Examples/OpenMP/Intro): a collection of mostly C++ examples with some Fortran as well. There is no associated README at the moment so users will need to inspect the code directly for more details.
      5. [`USM`](https://github.com/amd/HPCTrainingExamples/tree/main/Pragma_Examples/OpenMP/USM): some examples specific to unified shared memory and OpenMP. [`README`](https://github.com/amd/HPCTrainingExamples/blob/main/Pragma_Examples/OpenMP/USM/README.md).
      6. [`OpenMP_CPU`](https://github.com/amd/HPCTrainingExamples/tree/main/Pragma_Examples/OpenMP_CPU): some examples of using OpenMP on the CPU.
   2. [**OpenACC**](https://github.com/amd/HPCTrainingExamples/tree/main/Pragma_Examples/OpenACC): a few examples of offloading to GPU using OpenACC.
      1. [**C**](https://github.com/amd/HPCTrainingExamples/tree/main/Pragma_Examples/OpenACC/C): examples of reductions, saxpy and vector addition in C using OpenACC.
      2. [**Fortran**](https://github.com/amd/HPCTrainingExamples/tree/main/Pragma_Examples/OpenACC/Fortran): examples of reductions and vector add in Fortran using OpenACC.
10. [**Speedup_Examples**](https://github.com/amd/HPCTrainingExamples/tree/main/Speedup_Examples): examples to show the speedup obtained going from a CPU to a GPU implementation. [`README`](https://github.com/amd/HPCTrainingExamples/blob/main/Speedup_Examples/rzf_training/README.md).
11. [**atomics_openmp**](https://github.com/amd/HPCTrainingExamples/tree/main/atomics_openmp): examples on atomic operations using OpenMP.
12. [**Kokkos**](https://github.com/amd/HPCTrainingExamples/tree/main/Kokkos): runs the Stream Triad example with a Kokkos implementation. [`README`](https://github.com/amd/HPCTrainingExamples/tree/main/Kokkos/README.md).
13. [**Rocgdb**](https://github.com/amd/HPCTrainingExamples/tree/main/Rocgdb): debugs the [`HPCTrainingExamples/HIP/saxpy`](https://github.com/amd/HPCTrainingExamples/tree/main/HIP/saxpy) example with Rocgdb. [`README`](https://github.com/amd/HPCTrainingExamples/tree/main/Rocgdb/README.md). [`Video of Presentation`](https://www.youtube.com/watch?v=8gg3aNUsR44&list=PLB4tvLCynFjQq2rOIjEy39IDaugXcFPNa&index=1&t=0s).
14. [**Rocprof**](https://github.com/amd/HPCTrainingExamples/tree/main/Rocprof): uses Rocprof to profile [`HPCTrainingExamples/HIPIFY/mini-nbody/hip/`](https://github.com/amd/HPCTrainingExamples/tree/main/HIPIFY/mini-nbody/hip). [`README`](https://github.com/amd/HPCTrainingExamples/tree/main/Rocprof/README.md).
15. [**Rocprofv3**](https://github.com/amd/HPCTrainingExamples/tree/main/Rocprofv3): uses Rocprofv3 to profile a Jacobi solver example.
    1. [`HIP`](https://github.com/amd/HPCTrainingExamples/tree/main/Rocprofv3/HIP): example showing how to use `rocprofv3` to profile the Jacobi solver example written in HIP and available at [`HPCTrainingExamples/HIP/jacobi`](https://github.com/amd/HPCTrainingExamples/tree/main/HIP/jacobi).
    2. [`OpenMP`](https://github.com/amd/HPCTrainingExamples/tree/main/Rocprofv3/OpenMP): this directory contains various examples on how to use Rocprofv3 to profile OpenMP applications:
       1. **Jacobi**: example showing how to use `rocprofv3` to profile the Jacobi solver example written with OpenMP and available at [`HPCTrainingExamples/Pragma_Examples/OpenMP/Fortran/7_jacobi/1_jacobi_usm`](https://github.com/amd/HPCTrainingExamples/tree/main/Pragma_Examples/OpenMP/Fortran/7_jacobi/1_jacobi_usm). [`README`](https://github.com/amd/HPCTrainingExamples/tree/main/Rocprofv3/OpenMP).
       2. [`Allocations_and_MemoryPool_MI300A`](https://github.com/amd/HPCTrainingExamples/blob/main/Rocprofv3/OpenMP/Allocations_and_MemoryPool_MI300A/Fortran): example showing the importance of reducing dynamic memory allocations on MI300A with unified memory. [`README`](https://github.com/amd/HPCTrainingExamples/tree/main/Rocprofv3/OpenMP/Allocations_and_MemoryPool_MI300A/Fortran/README.md).
16. [**rocm-blogs-codes**](https://github.com/amd/HPCTrainingExamples/tree/main/rocm-blogs-codes): this directory contains accompanying source code examples for select HPC ROCm blogs found at [https://rocm.blogs.amd.com](https://rocm.blogs.amd.com). [`README`](https://github.com/amd/HPCTrainingExamples/tree/main/rocm-blogs-codes/README.md).
17. [`Libraries`](https://github.com/amd/HPCTrainingExamples/tree/main/Libraries): examples showcasing how to integrate some of the HIP/ROCm libraries in your application code.
    1. [`matrix_exponential`](https://github.com/amd/HPCTrainingExamples/tree/main/Libraries/matrix_exponential): an example on how to use rocBLAS to compute the approximate solution of a linear system of ordinary differential equations. [`README`](https://github.com/amd/HPCTrainingExamples/blob/main/Libraries/matrix_exponential/README.md).
    2. [`ConjugateGradient`](https://github.com/amd/HPCTrainingExamples/tree/main/Libraries/ConjugateGradient): example showing how to use rocBLAS and rocSPARSE to solve a linear system with sparse symmetric positive definite matrix using a conjugate gradient algorithm. [`README`](https://github.com/amd/HPCTrainingExamples/tree/main/Libraries/ConjugateGradient/README.md).
    3. [`RocSolverRf`](https://github.com/amd/HPCTrainingExamples/tree/main/Libraries/RocSolverRf): this example shows how to solve a sequence of sparse linear systems with refactorization using RocSolverRf. [`README`](https://github.com/amd/HPCTrainingExamples/blob/main/Libraries/RocSolverRf/README.md).
18. [`rocprofiler-systems`](https://github.com/amd/HPCTrainingExamples/tree/main/rocprofiler-systems): an example of how to use the `rocprof-sys` timeline trace profile on the Jacobi solver example in [`HPCTrainingExamples/HIP/jacobi`](https://github.com/amd/HPCTrainingExamples/tree/main/HIP/jacobi).
19. [`rocprof-tracedecoder`](https://github.com/amd/HPCTrainingExamples/tree/main/rocprof-tracedecoder): an example of how to use the `rocprof-tracedecoder`. [`README`](https://github.com/amd/HPCTrainingExamples/blob/main/rocprof-tracedecoder/README.md).
20. [`Profiling-by-example`](https://github.com/amd/HPCTrainingExamples/tree/main/Profiling-by-example): a walk-through of how to profile the Jacobi solver example in [`HPCTrainingExamples/HIP/jacobi`](https://github.com/amd/HPCTrainingExamples/tree/main/HIP/jacobi) on Oak Ridge National Lab's machine Frontier, using `rocprofv3`, `rocprof-sys`, and `rocprof-compute`, effectively providing an example of an all-around profiling effort using AMD tools. [`README`](https://github.com/amd/HPCTrainingExamples/blob/main/Profiling-by-example/README-Jacobi-Frontier.md).
21. [`Affinity`](https://github.com/amd/HPCTrainingExamples/tree/main/Affinity): an example to show how to set proper affinity to CPU cores and GPUs. [`README`](https://github.com/amd/HPCTrainingExamples/blob/main/Affinity/README.md).
22. [**login_info**](https://github.com/amd/HPCTrainingExamples/tree/main/login_info)
    1. [***AAC***](https://github.com/amd/HPCTrainingExamples/tree/main/login_info/AAC): instructions on how to log in to the AMD Accelerator Cloud (AAC) resource. [`README`](https://github.com/amd/HPCTrainingExamples/tree/main/login_info/AAC/README.md).
23. [`Python`](https://github.com/amd/HPCTrainingExamples/tree/main/Python): growing directory of material on Python examples.
    1. [`cupy`](https://github.com/amd/HPCTrainingExamples/tree/main/Python/cupy): an example showing how to perform array sums on GPU and CPU using CuPy and NumPy. [`README`](https://github.com/amd/HPCTrainingExamples/blob/main/Python/cupy/README.md).
    2. [`mpi4py`](https://github.com/amd/HPCTrainingExamples/blob/main/Python/mpi4py): an example showing MPI communication in Python, leveraging MPI4Py. [`README`](https://github.com/amd/HPCTrainingExamples/blob/main/Python/mpi4py/README.md).
    3. [`tensorflow`](https://github.com/amd/HPCTrainingExamples/blob/main/Python/tensorflow): an example where TensorFlow is used for an MNIST classification problem. 
24. [**Doc**](https://github.com/amd/HPCTrainingExamples/tree/main/Doc): directory with LaTeX and PDF documents that contain some of the most relevant README files properly formatted for ease of reading. The PDF document is obtained by building the LaTeX document. Note: the document may be out of date compared to the READMEs in the repo, which are the most current source of information for these exercises.
25. [`tests`](https://github.com/amd/HPCTrainingExamples/tree/main/tests): this directory contains a large number of test scripts aimed at testing the installation of the software provided by the scripts in the companion repo [`HPCTrainingDock`](https://github.com/amd/HPCTrainingDock).


## Run the Tests

Most of the exercises in this repo can be run as a test suite by doing:

```
git clone https://github.com/amd/HPCTrainingExamples && \
cd HPCTrainingExamples && \
cd tests && \
./runTests.sh
```
You can also run a subset of the whole test suite by specifying the subset you are interested in as an input to the `runTests.sh` script. For instance: `./runTests.sh --pytorch`. To see a full list of the possible subsets that can be run: `./runTests.sh --help`.
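
As a minimal sketch of the subset workflow described above, the snippet below wraps the `runTests.sh` invocation in a small shell function. The helper name `run_subset` and the variable `TESTS_DIR` are hypothetical and not part of this repo; the only options taken from this README are `--pytorch` and `--help`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): run one subset of the test suite.
# Assumes the repo was cloned into ./HPCTrainingExamples as shown above;
# override TESTS_DIR if your clone lives elsewhere.
TESTS_DIR="${TESTS_DIR:-HPCTrainingExamples/tests}"

run_subset() {
    # $1 is a subset flag accepted by runTests.sh, e.g. --pytorch
    local subset="$1"
    if [ ! -x "${TESTS_DIR}/runTests.sh" ]; then
        echo "runTests.sh not found in ${TESTS_DIR}" >&2
        return 1
    fi
    # Run from the tests directory, in a subshell so the caller's
    # working directory is left unchanged.
    ( cd "${TESTS_DIR}" && ./runTests.sh "${subset}" )
}

# Example usage (commented out so sourcing this file has no side effects):
# run_subset --help      # list the available subsets
# run_subset --pytorch   # run only the PyTorch tests
```

Running the script from `tests` in a subshell keeps your current directory unchanged, which is convenient when trying several subsets in a row.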

**NOTE**: tests can also be run manually from their respective directories, provided the necessary modules have been loaded and they have been compiled appropriately.
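
When running an example manually, the build step depends on what the example directory provides. The sketch below is one hedged way to check this before building. The function name `detect_build_system` is hypothetical, and the presence of a `Makefile` in any given directory is an assumption; this README only confirms a `CMakeLists.txt` for `HIP/vectorAdd`. The `rocm` module name in the usage comment is likewise an assumption about your system.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): report which build files an
# example directory contains, so you know how to compile it by hand.
detect_build_system() {
    local dir="$1"
    if [ -f "${dir}/CMakeLists.txt" ]; then
        echo "cmake"
    elif [ -f "${dir}/Makefile" ]; then
        echo "make"
    else
        echo "unknown"
    fi
}

# Example usage (assumptions: a `rocm` environment module exists on your
# system, and the repo is cloned in the current directory):
# module load rocm
# detect_build_system HPCTrainingExamples/HIP/vectorAdd
```

Consult the README of each example, where available, for the authoritative build and run instructions.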

## Additional Resources

We recommend users also check out the [`rocm-examples`](https://github.com/rocm/rocm-examples) GitHub repo, which has a lot of content specific to HIP and ROCm libraries.

## Feedback
We welcome your feedback and contributions: feel free to use this repo to bring up any issues or submit pull requests.
The software made available here is released under the MIT license; more details can be found in [`LICENSE.md`](https://github.com/amd/HPCTrainingExamples/blob/main/LICENSE.md).
