DUG

DUG is an HPC-as-a-service company based in West Perth.

It provides some compute resources for our research, including access to A100 GPUs.

To get access, you will first need to contact them to set up your account and project. During this step you will need to provide them with your SSH public key; you can generate a new one if you prefer.
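
If you need a new key pair, here is a minimal example (the dug_rsa_key file name is just a placeholder that matches the config further down):

ssh-keygen -t rsa -b 4096 -f ~/.ssh/dug_rsa_key
# creates ~/.ssh/dug_rsa_key (private, keep it) and ~/.ssh/dug_rsa_key.pub (the public key to send to DUG)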

Login

To log in, first load your provided SSH key via the command

ssh-add ~/.ssh/your_rsa_key # not the .pub one
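
You can check that the key is loaded with:

ssh-add -l # lists the keys currently held by the agent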

Then update your ~/.ssh/config and add this block:

Host dug
    HostName mcc_uwa
    User your_username
    ProxyJump [email protected]
    IdentityFile /path/to/your/dug_rsa_key

After this, you should be able to log in to the login node with ssh dug

It will prompt you for a password; type it in and you are in.

Run on the HPC

Keep in mind, as shown in the architecture diagram above, that there is very little storage on the login node (only 10 GB on DUG), so you will need to work from your data directory instead. This is similar to Kaya, where you work from the scratch or group directory.
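
For example (the project and user directory names below are placeholders matching the later examples; substitute your own):

cd /data/uwa_multimodality/your_directory # your data directory on DUG
df -h /data/uwa_multimodality # check how much space is available there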

Create conda env

This example was provided by Kai:

module load conda # load the conda module first
mkdir -p /data/uwa_multimodality/your_directory/env/conda
conda create -p /data/uwa_multimodality/your_directory/env/conda/text2video-finetune python=3.10 # create the env at an explicit path under your data directory
conda create --name ENV_NAME --clone /data/uwa_multimodality/your_directory/env/conda/text2video-finetune # optionally clone it into a named environment
# do not run these commands verbatim; adjust the paths and names for your own project

This will create the conda environment under your data directory
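
To use it later, activate the environment by its full path (a sketch assuming the path from the example above):

conda activate /data/uwa_multimodality/your_directory/env/conda/text2video-finetune
python --version # confirm the env's Python (3.10) is now on your PATH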

You will also want conda initialised automatically every time you log in, so add the section below to your .bashrc:

## >>> conda initialize >>>
. /data/uwa_multimodality/uwa_niuk/etc/profile.d/conda.sh
## <<< conda initialize <<<
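
To pick this up in your current session without logging out and back in:

source ~/.bashrc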

Also make sure the cache directories for the various tools point into your data folder:

export CONDA_PKGS_DIRS=/data/uwa_multimodality/uwa_niuk/.conda-pkgs
export PIP_CACHE_DIR=/data/uwa_multimodality/uwa_niuk/.pip-cache

mkdir -p "$PIP_CACHE_DIR"
mv ~/.cache/pip/* "$PIP_CACHE_DIR/" # move any existing pip cache out of your home directory

mkdir -p /data/uwa_multimodality/uwa_niuk/tmp
export TMPDIR=/data/uwa_multimodality/uwa_niuk/tmp

Do the same for the Hugging Face and PyTorch caches:

mkdir -p /data/uwa_multimodality/uwa_niuk/hf
mkdir -p /data/uwa_multimodality/uwa_niuk/torch
mkdir -p /data/uwa_multimodality/uwa_niuk/hf/datasets
mkdir -p /data/uwa_multimodality/uwa_niuk/hf/models

export TORCH_HOME=/data/uwa_multimodality/uwa_niuk/torch/
export HF_HOME=/data/uwa_multimodality/uwa_niuk/hf/
export HF_DATASETS_CACHE=/data/uwa_multimodality/uwa_niuk/hf/datasets
export TRANSFORMERS_CACHE=/data/uwa_multimodality/uwa_niuk/hf/models
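
These exports only apply to the current shell, so you will probably also want to add them to your .bashrc. A quick check that everything is set:

env | grep -E 'CONDA_PKGS_DIRS|PIP_CACHE_DIR|TMPDIR|TORCH_HOME|HF_HOME|HF_DATASETS_CACHE|TRANSFORMERS_CACHE'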

Job submit

Rather than #SBATCH as on Kaya, DUG uses #rj as the prefix for declaring HPC job parameters.

An example job script:

#!/bin/bash

#rj name=captionvideo queue=uwa_multimodality
#rj features=a100

module add cuda/compat/12.0

export http_proxy="http://proxy.per.dug.com:3128"
export https_proxy="http://proxy.per.dug.com:3128"

. /data/uwa_multimodality/uwa_niuk/etc/profile.d/conda.sh
conda activate text2video-finetune

python /data/uwa_multimodality/uwa_niuk/project/Text-To-Video-Finetuning/Video-BLIP2-Preprocessor/preprocess.py --video_directory /data/uwa_multimodality/uwa_niuk/project/Text-To-Video-Finetuning/data --config_name "secrets-human-blip2" --config_save_name "secrets-human-blip2" --prompt_amount 8

echo "Videos Captioned"

Access internet from compute nodes

The compute nodes cannot access the internet directly, so you will need to configure the proxy settings in your job script:

export http_proxy="http://proxy.per.dug.com:3128"
export https_proxy="http://proxy.per.dug.com:3128"
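
With the proxy variables set, a quick way to confirm outbound access from inside a job (the URL is just an example):

curl -I https://huggingface.co # should print HTTP response headers if the proxy is working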

We will keep this page updated.

Kai's original note is here:

https://github.com/Kai0226/PHD/blob/main/DUG%20connection.md