NSCC-Guide

This is a .md (Markdown) file.
View it in a Markdown viewer (recommended) or as plain text. We recommend using VS Code.

This guide was written by Lim Zi Xiong and Bryan Wong Wen Ping from Singapore Polytechnic.


Prelude

This guide serves as a "wtf is going on" explainer: it tries to explain why things are done a certain way and what is actually being done.
It is meant to be comprehensive in showing how to use NSCC.

These guides are "how to" guides meant to get jobs running (start from the first in the list).
For a quick start, you can refer to these guides.

  • Python_Singularity_Guide.md (RunPythonSingularity folder) -> Guide to using the TensorFlow library and running Python files
  • JupyterNoteBook_Guide.md (RunJupyterSingularity folder) -> Guide to setting up a Jupyter Notebook server

Visual Guide:

These are other useful resources online:

So what's actually happening?

NSCC owns the supercomputer, Aspire1, and lets users "borrow" its computing power.
To facilitate this, the nodes run Linux with the PBS Pro job scheduler (see Portable Batch System).
Users submit "jobs": requests for resources that contain the code/commands they want to run.
These "jobs" use Bash to tell the supercomputer what to do and which files to run.

"Jobs" are submitted on the NSCC login node, where they are queued and dispatched to an internal compute node to be processed.
To connect to the login node, we join the VPN used by NSCC, Sophos VPN.
Then we SSH into the login node.

PBS Pro has special commands to submit and manage jobs, such as qsub, qstat, and qdel.
As jobs are Bash scripts, we will need to use Bash to run our code,
i.e. all our code must be runnable from the Bash command line.

We need all our dependencies to be working on NSCC as well.
To enable the use of some commands, such as anaconda, we can add them as modules using module load.

Installing packages is problematic: we don't have root access, so we cannot just pip install <module>.
We cannot modify the system location where packages are installed; we have to install into our local (home) directory instead.

SSH (Secure Shell)

SSH is a network protocol that securely allows a user to access the command line (shell) of another computer.

We use this to access the NSCC login terminal.
A popular SSH client is PuTTY; we recommend using it to SSH into the login node.
It can be downloaded here.

Bash (Unix Shell)

Bash is a shell: a user interface for accessing an operating system's services, in this case Linux machines.
Basically, Bash is the Windows Command Prompt, but for Linux (UNIX) machines.

NSCC's nodes run Linux with PBS Pro and use the Bash shell.
Therefore we will need Bash commands to interact with NSCC.

Common Bash Commands:

  • ls => List all files + folders in current directory
  • cd => Change directory
  • mkdir => Makes a directory
  • rm => Removes an item
  • echo => Prints to console(stdout)
  • env => List all environment variables

TIP: To get an overview of what a command does, use the man command or the --help flag.

  • man <command> => opens the manual page for the command
  • <command> --help => brings up the help text for the command
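Stringing a few of these commands together, a throwaway session might look like this (the directory and file names here are arbitrary):

```shell
mkdir -p demo            # make a directory called demo
cd demo                  # move into it
echo "hello" > note.txt  # print "hello", redirected into a file
ls                       # lists the directory contents: note.txt
cd ..                    # go back up one level
rm -r demo               # remove demo and everything in it
```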

WinSCP, Transferring Files

NSCC supports the SFTP (Secure File Transfer Protocol) protocol for transferring files from your computer to NSCC.
A recommended program that supports SFTP is WinSCP.

Therefore, we use WinSCP to transfer Bash scripts and other files, such as Python files, to NSCC.

PBS Commands

PBS Pro has commands to submit, delete, and track jobs.
NSCC has guides on how to use these commands, as well as a quick reference sheet.

These directives are placed at the top of the Bash file, after the shebang line (e.g. #!/bin/bash), and always start with #PBS.

```bash
### The following requests 1 chunk with 5 CPUs and 1 GPU
#PBS -l select=1:ncpus=5:ngpus=1

### Specify the amount of time required
### Values less than 4 hours go into a higher-priority queue
#PBS -l walltime=2:00:00

### Specify the gpu/dgx queue
#PBS -q gpu
```

Some Common Commands:

  • qsub <shell_script> => Submit a job to queue
  • qdel <job_id> => Delete a running or queued job
  • qstat => Find information about current jobs
  • qstat -f <job_id> => Full information of specific job

To view more info about a command, use `man <command>` or `<command> --help`. For a list of commands, refer to the [quick reference sheet](https://help.nscc.sg/wp-content/uploads/2016/08/PBS_Professional_Quick_Reference.pdf)

PBS Queues

Different queues are used to satisfy the resource requirements of the various workloads that run on NSCC.
If we want to use a dgx GPU vs a normal GPU, we have to send our job to different queues.

As a user, we submit jobs into an external queue depending on our needs.
Examples:

  • normal => use CPUs only
  • gpu => use normal GPUs
  • dgx => use the special, faster DGX GPUs

Specific documentation on queues can be found on pg. 4 of the [quick start guide](https://help.nscc.sg/pbspro-quickstartguide/) under external queues.

Problem with pip install as no root access

Usually, when we add libraries for Python, we enter pip install <module> or conda install <module> into the command prompt.
However, if we pip install <module> on NSCC, we get a permission error, as we end up trying to overwrite files we don't have permission to modify.
pip packages are usually installed under /usr/..., but we do not have write access to that location on NSCC.

Therefore, we have to use pip install -U -q --user <module>:

  • -U => upgrade the package if it is already installed
  • -q => quiet installation (less output)
  • --user => install to the user home directory instead of the system directory (installs into site.USER_SITE) documentation
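For example, installing (or upgrading) a package into the user site and then printing where it went (numpy here is just a placeholder package):

```shell
# Install into the user site; no root access needed
pip install -U -q --user numpy
# Print the user-site directory the package landed in
python3 -c "import site; print(site.USER_SITE)"
```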

Modules

Modules are part of the Environment Modules package on Linux. Modules Documentation
Modules allow for dynamic modification of the user's environment ($PATH, $MANPATH) via modulefiles.
In essence, modules prepare the environment and make commands available to us.

We load the modules we want with the command module load.
Example: module load anaconda lets us use the conda command.

Commonly used commands:

  • module help => Show help manual
  • module list => List currently added modules
  • module avail => Show available modules
  • module add/load [modulefile] => Adds/loads a module
  • module remove/unload [modulefile] => Removes/unloads a module
  • module show [modulefile] => Shows what the modulefile does to the environment
  • module whatis [modulefile] => Queries what the module does

Accessing NSCC Login node

Visual Guide

It's recommended to follow the first half of the visual guide. Visual step-by-step guide

Steps Taken

  1. Download the Sophos VPN client from the NSCC website.
  2. Download an authenticator app on your phone; Sophos Authenticator is recommended.
  3. Use the app to access the VPN network.
  4. Log in to the NSCC login node using PuTTY.


Run a basic Python script on NSCC as a queued job.

Quick Guide

NSCC has made an NSCC PBSPro Quick Start Guide which can be followed.

Transfer files to NSCC

Use WinSCP with these settings:

  • Host name: aspire.nscc.sg
  • Port number: 22
  • User name: <your_user_name>
  • Password: <leave blank, enter when prompted>

Click and drag to copy files over.

Submitting a Job Using a Submission Script

To submit a job:

  • qsub submit.pbs

Where submit.pbs is:

```bash
#!/bin/bash

#PBS -q normal
#PBS -l select=1:ncpus=1:mem=100M
#PBS -l walltime=00:10:00
#PBS -N Sleep_Job
#PBS -o ~/outputfiles/Sleep_Job.o
#PBS -e ~/errorfiles/Sleep_Job.e

echo sleep job for 30 seconds
sleep 30
```

Of the format:

```bash
#!/bin/bash

[#PBS directives, specifying configuration for the job]

[Rest of the commands to run]
```

Checking Status and Other Commands

To check the status of your submitted jobs, qstat will list currently running jobs.

To delete a job, use qdel <job_id>


Running Python with TensorFlow

TensorFlow 2.0 is a tricky library to add.
TensorFlow is GPU dependent, and module load tensorflow only supports TensorFlow 1.4.
Even worse, module load anaconda and module load tensorflow are incompatible and will raise a warning.
Even pip install tensorflow can cause errors.

The only proper way to add GPU TensorFlow libraries we have found so far is by using containers.

Containers

Containers allow for OS-level virtualisation.
Basically, a container is a virtual machine that virtualises processes rather than the whole computer, which makes it much more efficient.

The main benefit of containers is isolation,
allowing us to package an application with all of its dependencies into a standardised unit.
In essence, we can put our setup into the container (such as installed dependencies), and it will work anywhere.

The most popular containerisation software is Docker.
However, for High Performance Computing (HPC) in a scientific context, Singularity is a popular option.
While both can be used with NSCC, in our guide we use Singularity.

NSCC has ready-made containers with TensorFlow that work with their GPUs.
So we can:

  1. "Boot up" a container with TensorFlow
  2. Add other dependencies as needed (pip install)
  3. Run our code from there.
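Putting these three steps into a job script might look like this. This is only a sketch: the image path is the NSCC example used in the Container Syntax section, pandas is a placeholder dependency, and train.py stands in for your own code.

```bash
#!/bin/bash
#PBS -q gpu
#PBS -l select=1:ncpus=5:ngpus=1
#PBS -l walltime=2:00:00

# 1. "Boot up" the TensorFlow container, running Bash inside it
singularity exec --nv /app/singularity/images/tensorflow/tensorflow_2.3.0_gpu_py3.simg /bin/bash << EOF
cd "\$PBS_O_WORKDIR"              # the container starts elsewhere; return to the submit directory
pip install -U -q --user pandas   # 2. add other dependencies as needed
python train.py                   # 3. run our code
EOF
```

This fragment only runs under PBS with Singularity available, so treat it as a template rather than something to execute locally.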

Container Syntax

Singularity container syntax: singularity exec --nv <image> /bin/bash << EOF
Example: singularity exec --nv /app/singularity/images/tensorflow/tensorflow_2.3.0_gpu_py3.simg /bin/bash << EOF

  • singularity exec => runs a command inside a Singularity image
  • --nv => enable Nvidia GPU support inside the container
  • <image> => the image path
  • /bin/bash => the command to run: Bash
  • << EOF => here-document; feeds the following terminal input into /bin/bash until it sees an EOF line

<< EOF and Bash variable resolution

Below << EOF, we enter the commands we want to run inside the container.
The container starts at the location defined in the image. As this image is defined by NSCC, it starts in some unfamiliar default location.
We will need to cd into the correct directory, like so: cd "$PBS_O_WORKDIR".
$PBS_O_WORKDIR refers to the directory the job was submitted from.

Variables in the outer Bash shell and in the container resolve differently when we use << EOF.
In the outer shell, $variable is resolved before the container ever sees it.
Using \$variable instead escapes the $ character, so the variable is not resolved in the outer shell but only when the command runs in the container.
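A minimal demonstration you can run in any Bash shell (the inner bash invocation stands in for the container's shell):

```shell
GREETING="outer"
bash << EOF
GREETING="inner"
echo "unescaped: $GREETING"     # expanded by the outer shell before bash runs -> outer
echo "escaped:   \$GREETING"    # passed through literally, expanded inside -> inner
EOF
```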


Running a Jupyter Notebook Server

Jupyter Notebook lets us run a web page on a host machine; by accessing that web page, we can run Python code on the machine.
The key point to note is that anyone who can access the web page can run Python code on the machine:
connect to the web page => run Python code!
Link to documentation

So the idea is to have our job run a Jupyter Notebook server, then connect from our own computer to that server!

SSH Tunneling

There are some caveats and problems with the above plan.
Our job runs on a compute node, which is not directly exposed to the internet.
It is, however, connected to the login node (the one you connect PuTTY to), which is exposed to the internet.

So... we need to "jump" from the login node to the compute node.
Enter SSH port forwarding/tunneling, which does exactly that.
We basically SSH into the login node, then tell the login node to forward our traffic to the compute node.

Syntax:
ssh -N -L localhost:<local_port>:<node>-ib0:<jupyter_port> <user>@aspire.nscc.sg

  • -N => don't run any command after the SSH connection is established
  • -L => local port forwarding
  • localhost:<local_port> => start the connection from our machine, on this local port
  • <node>-ib0 => when connected, forward to the compute node's ib0 interface
  • <jupyter_port> => on this port (the port the Jupyter Notebook server runs on)
  • <user>@aspire.nscc.sg => where to SSH to, and as which user

We cannot determine beforehand which compute node our job will run on (as far as we know).
So we need to query the running job with qstat -f <job_id> and read the node name from the exec_host field.

Hashing

Jupyter Notebook needs to verify that whoever is connecting is legitimate,
so we set a password using:

```python
from notebook.auth import passwd
passwd()
```

This generates a salted hash from our password, which we copy into our script.
So when we enter our password, the Jupyter Notebook server can verify it.

Why use a salted hash?
TL;DR: the password cannot be reverse-computed from the hash, and it won't fall to rainbow-table attacks.
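As an illustration of the salt-then-hash idea using standard Linux tools (this is not Jupyter's exact scheme, which uses notebook.auth.passwd as shown above, but the principle is the same):

```shell
password="hunter2"                                  # the user's chosen password
salt=$(od -An -N8 -tx1 /dev/urandom | tr -d ' \n')  # fresh random salt per password
hash=$(printf '%s%s' "$salt" "$password" | sha256sum | cut -d' ' -f1)
echo "stored on server: sha256:$salt:$hash"

# Verification: re-hash the login attempt with the stored salt and compare
attempt="hunter2"
check=$(printf '%s%s' "$salt" "$attempt" | sha256sum | cut -d' ' -f1)
[ "$check" = "$hash" ] && echo "password ok"
```

Because the salt is random, two users with the same password get different hashes, so a precomputed rainbow table is useless.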

Exporting variables

If we want to export shell variables into the Singularity container,
we use SINGULARITYENV_<VAR_NAME>=<VAR_VALUE>

About

📖 Guide to run GPU jobs on NSCC HPC resources using the "PBS Pro" job queue system
