Access the compute facility

First we need to check we can log in to the compute facility that we will be using: the Eddie computer at the Edinburgh Compute and Data Facility (ECDF).

IMPORTANT: you must never contact the University computing helpline directly for assistance, unless we ask you to! Everything that you need (e.g., filesystems, compute access) has already been configured for you. If you need help, ask in the lab or on the forums.

What you will learn in this part:

Working with a compute cluster:
- Logging in
- How to run a job
- Skills for working on a remote system
- Where files are stored

Logging in to Eddie in the terminal

We will be using the “Eddie” compute cluster, provided by the Edinburgh Compute and Data Facility, because it has the GPUs that we need.

Terminology:

Local computer – the computer you are sitting at, perhaps your own laptop or a PPLS lab computer

Login node – a computer that is part of Eddie, which you can log in to remotely

Compute node – a computer that is part of Eddie, which you cannot directly log in to, but can run jobs on. Some nodes contain a GPU, which is required for running the models that we will use in this exercise.

GPU – “graphics processing unit”, a specialised type of processor that performs a large number of computations at the same time (“in parallel”), which makes it well suited to the type of computations needed for neural networks

Job – a program that is run on one of the compute nodes by submitting it to the scheduler

Scheduler – a program running on the cluster that decides which job to run next

Filesystem – a place to store files. There are multiple filesystems which you’ll learn about shortly.

To log in to Eddie, you need to be either on the campus network or connected to the VPN. Then, in a terminal on your local computer, run:

ssh s1234567@eddie.ecdf.ed.ac.uk

Here, s1234567 stands for your own username, and the password is your EASE password. This will log you in to one of Eddie’s login nodes. Important: you must never perform any heavy computation on login nodes – they are shared between all users (and they don’t have GPUs).

For convenience, you can add the following to the SSH configuration file on your local computer (usually this is ~/.ssh/config):

Host eddie
  HostName eddie.ecdf.ed.ac.uk
  User s1234567

which will allow you to log in using

ssh eddie

and that will also be convenient when using VS Code.

Running your first job

Remember: do not run substantial jobs directly on a login node! There are two ways to run a job on the cluster:

- Schedule the job (add it to the queue) – this is the way you will use most often
- Request an interactive session on a compute node, then run the job directly – this is useful for development and debugging

Let’s run our first job! Create a shell script that prints “Hello world!” and save it as hello.sh in your home directory. Since this is such a simple script, it’s OK to try running it on the login node to check it works:

./hello.sh
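If you are unsure how to create the script, here is one possible way, shown only as a sketch. The file contents are an assumption about what a minimal hello.sh could look like; any script that prints the message will do. Note the chmod step: without execute permission, running ./hello.sh fails with “Permission denied”.

```shell
# Write a minimal script to hello.sh in the current directory.
# (The quoted 'EOF' stops the shell from expanding anything inside.)
cat > hello.sh << 'EOF'
#!/bin/sh
echo "Hello world!"
EOF

# Make the script executable so that ./hello.sh works.
chmod +x hello.sh
```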

Now add the job to the queue:

qsub hello.sh

You will get a message like: Your job 51978720 ("hello.sh") has been submitted
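The job number in that message is worth noting, because it identifies your job to the scheduler. If you ever want to pull it out of the message automatically, the following is a minimal sketch. It assumes the message has exactly the form shown above, and the variable names are illustrative only.

```shell
# The message printed by qsub, copied from the example above.
message='Your job 51978720 ("hello.sh") has been submitted'

# The job number is the third whitespace-separated word of the message.
job_id=$(printf '%s\n' "$message" | awk '{print $3}')

echo "$job_id"   # prints 51978720
```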

Now, how do we know whether our job is running, or has finished? We can ask the scheduler:

qstat

When run with no arguments, this will list all the jobs that you currently have in the queue. If you get no output, that means nothing is in the queue – so perhaps your job already finished. Since it ran on a compute node, it will not have printed “Hello world!” to our session on the login node. Instead, all output will be placed in a pair of files, one each for stdout and stderr:

hello.sh.e51978720
hello.sh.o51978720

The names for these files are constructed from the name of the job (which defaults to the name of the program or script that you submitted), e or o, and the job number (assigned by the scheduler). They will be saved in whichever directory you were in when you ran qsub.
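The naming rule can be written out as a short sketch. The job name and number below are taken from the example above (yours will differ), and the variable names are illustrative only.

```shell
job_name="hello.sh"   # defaults to the name of the submitted script
job_id="51978720"     # assigned by the scheduler

# One file per output stream: job name, then e or o, then the job number.
for kind in e o; do
  echo "${job_name}.${kind}${job_id}"
done
```

This prints the two file names listed above, which you can then look for with ls in the directory where you ran qsub.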

Take a look at their contents. Which one contains “Hello world!”? Why?

Next, start learning the useful skills in the following section. Take your time and don’t worry if at first they seem difficult or confusing.

Skills: working on a remote machine

A few clever techniques will make working on a remote machine more convenient. If you find this part difficult or confusing, just take it slowly and keep practising.

Skills: filesystems on ECDF

It’s important to understand the differences between the various filesystems. Each is for a specific purpose.

Related forums

Korin’s slides from the first lab session are available in the forums.
