DUG

DUG is an HPC-as-a-service company based in West Perth.

It provides some compute resources for our research, including access to A100 GPUs.

To get access, you will first need to contact them to set up your account and project. During this step you will need to provide them with your SSH public key; you can generate a new one if you prefer.
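
If you need a new key pair, here is a minimal example (the dug_rsa_key file name is just a placeholder that matches the config further down):

ssh-keygen -t rsa -b 4096 -f ~/.ssh/dug_rsa_key
# creates ~/.ssh/dug_rsa_key (private, keep it) and ~/.ssh/dug_rsa_key.pub (the public key to send to DUG)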

Login

To log in, first load your provided SSH key via the command

ssh-add ~/.ssh/your_rsa_key # not the .pub one
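
You can check that the key is loaded with:

ssh-add -l # lists the keys currently held by the agent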

Then update your ~/.ssh/config and add this block:

Host dug
    HostName mcc_uwa
    User your_username
    ProxyJump [email protected]
    IdentityFile /path/to/your/dug_rsa_key

After this, you should be able to log in to the login node with ssh dug

It will prompt you for a password; type it in and you are in.

Run on the HPC

Keep in mind, as shown in the architecture diagram above, that there is very little storage on the login node (only 10 GB on DUG), so you will need to work from your data directory instead. This is similar to Kaya, where you work from the scratch or group directory.
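
For example (the project and user directory names below are placeholders matching the later examples; substitute your own):

cd /data/uwa_multimodality/your_directory # your data directory on DUG
df -h /data/uwa_multimodality # check how much space is available there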

Create conda env

This example was provided by Kai:

module load conda # load the conda module first
mkdir -p /data/uwa_multimodality/your_directory/env/conda
conda create -p /data/uwa_multimodality/your_directory/env/conda/text2video-finetune python=3.10 # create the env at an explicit path under your data directory
conda create --name ENV_NAME --clone /data/uwa_multimodality/your_directory/env/conda/text2video-finetune # optionally clone it into a named environment
# do not run these commands verbatim; adjust the paths and names for your own project

This will create the conda environment under your data directory
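
To use it later, activate the environment by its full path (a sketch assuming the path from the example above):

conda activate /data/uwa_multimodality/your_directory/env/conda/text2video-finetune
python --version # confirm the env's Python (3.10) is now on your PATH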

You will also want conda initialised automatically every time you log in, so add the section below to your .bashrc:

## >>> conda initialize >>>
. /data/uwa_multimodality/uwa_niuk/etc/profile.d/conda.sh
## <<< conda initialize <<<
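
To pick this up in your current session without logging out and back in:

source ~/.bashrc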

Also make sure the cache directories for the various tools point into your data folder:

export CONDA_PKGS_DIRS=/data/uwa_multimodality/uwa_niuk/.conda-pkgs
export PIP_CACHE_DIR=/data/uwa_multimodality/uwa_niuk/.pip-cache

mkdir -p "$PIP_CACHE_DIR"
mv ~/.cache/pip/* "$PIP_CACHE_DIR/" # move any existing pip cache out of your home directory

mkdir -p /data/uwa_multimodality/uwa_niuk/tmp
export TMPDIR=/data/uwa_multimodality/uwa_niuk/tmp

Do the same for the Hugging Face and PyTorch caches:

mkdir -p /data/uwa_multimodality/uwa_niuk/hf
mkdir -p /data/uwa_multimodality/uwa_niuk/torch
mkdir -p /data/uwa_multimodality/uwa_niuk/hf/datasets
mkdir -p /data/uwa_multimodality/uwa_niuk/hf/models

export TORCH_HOME=/data/uwa_multimodality/uwa_niuk/torch/
export HF_HOME=/data/uwa_multimodality/uwa_niuk/hf/
export HF_DATASETS_CACHE=/data/uwa_multimodality/uwa_niuk/hf/datasets
export TRANSFORMERS_CACHE=/data/uwa_multimodality/uwa_niuk/hf/models
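
These exports only apply to the current shell, so you will probably also want to add them to your .bashrc. A quick check that everything is set:

env | grep -E 'CONDA_PKGS_DIRS|PIP_CACHE_DIR|TMPDIR|TORCH_HOME|HF_HOME|HF_DATASETS_CACHE|TRANSFORMERS_CACHE'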

Job submit

Rather than #SBATCH as on Kaya, DUG uses #rj as the prefix for declaring HPC job parameters.

An example job script:

#!/bin/bash

#rj name=captionvideo queue=uwa_multimodality
#rj features=a100

module add cuda/compat/12.0

export http_proxy="http://proxy.per.dug.com:3128"
export https_proxy="http://proxy.per.dug.com:3128"

. /data/uwa_multimodality/uwa_niuk/etc/profile.d/conda.sh
conda activate text2video-finetune

python /data/uwa_multimodality/uwa_niuk/project/Text-To-Video-Finetuning/Video-BLIP2-Preprocessor/preprocess.py --video_directory /data/uwa_multimodality/uwa_niuk/project/Text-To-Video-Finetuning/data --config_name "secrets-human-blip2" --config_save_name "secrets-human-blip2" --prompt_amount 8

echo "Videos Captioned"

Access internet from compute nodes

The compute nodes cannot access the internet directly, so you will need to configure the proxy settings in your job script:

export http_proxy="http://proxy.per.dug.com:3128"
export https_proxy="http://proxy.per.dug.com:3128"
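
With the proxy variables set, a quick way to confirm outbound access from inside a job (the URL is just an example):

curl -I https://huggingface.co # should print HTTP response headers if the proxy is working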

We will keep this page updated.

Kai's original note is here:

https://github.com/Kai0226/PHD/blob/main/DUG%20connection.md