HPC 101

About HPC

What is HPC

HPC stands for High Performance Computing. The term refers to the field of using distributed computing resources to accelerate demanding workloads. In short, when we need to run some intensive calculations, we use many computers and combine their computational power; making that work is what HPC is about.

An HPC system consists of hardware and software components.

The hardware part normally includes a storage server, a login node, compute nodes (GPU/CPU), and the network that connects them.

The software part is normally based on Linux. Its core task is to translate your calculations into jobs on the different compute nodes and to collect the results afterwards. The system designed for this purpose is called SLURM.

So if you have a calculation that needs to be run, you submit it as a job via SLURM; the job enters a queue, and it is allocated resources and executed when they become available.
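
For example, a minimal SLURM batch script might look like the sketch below. The partition name, module name, resource numbers and script name are placeholders; check what your HPC system actually provides before copying it.

#!/bin/bash
#SBATCH --job-name=demo            # name shown in the queue
#SBATCH --partition=gpu            # placeholder partition; pick one listed by sinfo
#SBATCH --nodes=1                  # run on a single compute node
#SBATCH --ntasks=1                 # one task (process)
#SBATCH --cpus-per-task=4          # CPU cores for that task
#SBATCH --gres=gpu:1               # request one GPU (only on GPU partitions)
#SBATCH --mem=16G                  # memory for the job
#SBATCH --time=01:00:00            # walltime limit, hh:mm:ss
#SBATCH --output=demo_%j.out       # %j expands to the job ID

module load python/3.9             # module names differ between systems
python my_experiment.py            # placeholder for your own program

You submit the script with sbatch, SLURM places the job in the queue, and the output appears in the file named by --output once the job has run.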

There are many different HPC systems, and each organisation may have its own. For example, UWA has Kaya, the Pawsey centre has its own HPC, and the company DUG in WA also provides HPC for companies and researchers to use.

HPC systems from different organisations may vary in some command details (for example, Kaya does not support Singularity, a containerisation system for HPC that can run Docker images, but Pawsey does). In general, though, they are all based on the architecture above and all use SLURM to manage jobs.

Software Part

SLURM

This section is a general cheatsheet for managing jobs with SLURM; examples for specific HPC systems are given in later sections.
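
As a quick orientation before the full cheatsheets linked below, these are the everyday SLURM commands (job IDs such as 123456 are placeholders):

sbatch job.slurm          # submit a batch job script to the queue
squeue -u $USER           # list your queued and running jobs
scancel 123456            # cancel a job by its job ID
sinfo                     # show partitions and node availability
sacct -j 123456           # show accounting details for a finished job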

Module

Another important part of the software stack is the module system. On an HPC system you normally do not have permission to install whatever package you want, for security reasons. Instead, the required and commonly used software is pre-built and provided as modules. When you need a specific piece of software, for example Python 3.9 or Conda, instead of installing it via apt-get install xxx, you use module load Python3.9 and module load conda.

Before doing that, you should first check whether the module is available on the HPC system with the command module avail.
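
A typical module workflow therefore looks like the sketch below; module names such as python/3.9 vary between systems, so always check module avail first.

module avail                 # list every module available on this system
module avail python          # narrow the listing to modules matching "python"
module load python/3.9       # load a specific module into your environment
module list                  # show which modules are currently loaded
module purge                 # unload all modules and start from a clean state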

We will give concrete examples later for each HPC system we run on.

Here are some good introductory links for you to have a look at:

Slurm cheatsheet:

https://deic-hpc.github.io/EuroCC-knowledgepool/
https://slurm.schedmd.com/pdfs/summary.pdf
Module cheatsheets:
Modules - HPC Wiki
Cheatsheet for Module command