
# Porting exercises
The C porting exercises can be found here (this is the directory of this README): 
```
cd $HOME/HPCTrainingExamples/Pragma_Examples/OpenMP/C
```
#### on aac6: 

Follow the instructions in the message of the day to allocate one GPU interactively.
Load the amdclang compiler and set up the environment:
```
module load rocm
export C_COMPILER=amdclang
```
#### on aac7:
Get an interactive session on a node:
```
salloc -N 1 --mem=100GB --gpus=1
```
Note: this gives you 1 GPU and 100 GB of memory, leaving the remaining resources of the node available to others.
Useful commands:
```
sinfo
```
lists the available nodes.
```
squeue
```
lists your job(s). If a job was not terminated correctly, you may have to use
```
scancel <JobID>
```
to terminate a job.

You can choose the Cray C compiler (cc) or the amdclang compiler.
##### amdclang compiler on aac7:
Note that in CPE/25.03 the `cc` compiler wrapper leads to a segfault at program finalization. Therefore we do not recommend using the compiler wrappers with amdclang on aac7 for now. If you have ROCm 6.3.3 or greater in that or a later version of CPE, you should not encounter any issues with PrgEnv-amd and the wrappers.
```
module load rocm
```
```
export C_COMPILER=amdclang
```
##### Cray C compiler on aac7:
Prepare the environment (these modules are loaded by default; check with ```module list```):
```
module load PrgEnv-cray
module load craype-x86-genoa
module load craype-accel-amd-gfx942
module load cce
module load rocm
```
```
export C_COMPILER=cc
```
> [!NOTE]
> The `C_COMPILER` environment variable is chosen instead of the standard `CC` variable for C compilers since `CC` clashes with the Cray C++ compiler wrapper `CC`.
#### On all systems independent of the compiler:
Setting
```
export HSA_XNACK=1
```
enables running without memory copies (use of `unified_shared_memory`) on MI300A. Setting
```
export HSA_XNACK=0
```
disables this, so the program behaves as on a discrete GPU with memory copies.
Check with
```
rocminfo
```
if ```xnack+``` (unified memory enabled) or ```xnack-``` (with memory copies) is set.
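
As a minimal sketch (not taken from the exercises), the difference between the two programming models could look as follows in source code. The function `scale` and the macro `USE_USM` are hypothetical names chosen for this illustration; the exercise READMEs describe the actual code. With `USE_USM` defined, the file requests `unified_shared_memory` and the kernel needs no map clause, which assumes a run with `HSA_XNACK=1`. Without it, the array is copied explicitly with a `map` clause, as on a discrete GPU.

```c
#include <stddef.h>

/* USE_USM is a hypothetical macro for this sketch: build with -DUSE_USM
   to select the unified-memory variant (run it with HSA_XNACK=1). */
#ifdef USE_USM
#pragma omp requires unified_shared_memory
#endif

/* Multiply every element of x by a. The loop is offloaded when the code is
   built with an OpenMP offload compiler; otherwise it runs on the host. */
void scale(double *x, int n, double a)
{
#ifdef USE_USM
    /* Unified memory: the GPU accesses the host allocation directly. */
    #pragma omp target teams distribute parallel for
#else
    /* Discrete-GPU model: copy x to the device before the loop and back after. */
    #pragma omp target teams distribute parallel for map(tofrom: x[0:n])
#endif
    for (int i = 0; i < n; ++i) {
        x[i] *= a;
    }
}
```

Calling `scale(v, n, 2.0)` on a host array `v` doubles its entries in both variants; only the way the data reaches the GPU differs.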

The exercises in the folders numbered 1 to 6 are small examples of what one may encounter when porting a real-world code.
Each exercise has its own README with instructions.
Exercises 1-5 each provide a CPU-only code to port yourself and a solution, including intermediate steps. Exercise 6 does not have a version to port yourself, but explains a common challenge for porting to discrete GPUs.
The instructions assume you work on MI300A, and some of the exercises explore the differences between the discrete GPU and the APU programming model (```HSA_XNACK=0``` or ```=1```).
We recommend doing the exercises in the order in which they are numbered: first all of them with unified memory, then again with map clauses or a data region.
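
For the second pass, a data region can keep arrays on the device across several kernels. The following is a minimal sketch under that assumption, not the solution of any exercise; the function name `saxpy_then_sum` is hypothetical. The `target data` construct copies `x` and `y` to the device once, both kernels reuse the device copies, and `y` is copied back when the region ends.

```c
/* Hypothetical example: a saxpy followed by a sum over the result.
   x and y stay on the device between the two kernels. */
double saxpy_then_sum(float a, const float *x, float *y, int n)
{
    double sum = 0.0;

    /* Data region: x is only read on the device, y is read and written. */
    #pragma omp target data map(to: x[0:n]) map(tofrom: y[0:n])
    {
        /* Kernel 1: y = a * x + y, using the arrays already on the device. */
        #pragma omp target teams distribute parallel for
        for (int i = 0; i < n; ++i) {
            y[i] = a * x[i] + y[i];
        }

        /* Kernel 2: reduce y into sum; the scalar is mapped back explicitly. */
        #pragma omp target teams distribute parallel for reduction(+:sum) map(tofrom: sum)
        for (int i = 0; i < n; ++i) {
            sum += y[i];
        }
    }
    return sum;
}
```

Without the enclosing data region, each kernel would need its own `map` clauses and the arrays would be transferred once per kernel when running with `HSA_XNACK=0`.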

Exercise 8 is a small app with a Jacobi solver that you can find in the CXX section. (Note: This code is explained in detail in a blog post: https://gpuopen.com/learn/amd-lab-notes/amd-lab-notes-jacobi-readme/.)
Choose one of the exercises in the sub-directories and use the README there for instructions:
```
cd 1_saxpy
cd 2_vecadd  
cd 3_reduction 
cd 4_reduction_scalars  
cd 5_reduction_array
cd 6_device_routine
```
