Skills: filesystems on ECDF

It is important to understand the differences between the various filesystems, because each is intended for a specific purpose.

What you will learn:

  • Why there are multiple filesystems
  • What each of them should be used for

The Edinburgh Compute and Data Facility (ECDF) includes the Eddie compute cluster and several filesystems.

High-performance storage

Eddie has thousands of compute nodes, each of which needs constant access to files, so it requires high-performance (i.e., very fast) storage. Because this storage is specialised, it can only be accessed from within ECDF, and moving files to and from the high-performance filesystem requires a procedure called “staging”.

The high-performance filesystem provides your home directory, group space, and scratch space.

Your home directory

On ECDF, you have a relatively small quota in your home directory, /home/s1234567 (where s1234567 stands for your own UUN). This space is only for configuration files (e.g., ssh keys) and should not be used for storing data or models.
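If you are not sure whether data has crept into your home directory, a quick check is to list its largest files. The sketch below assumes a standard Linux environment with GNU find, du and sort; the function name list_large_files and the 50M default threshold are illustrative choices, not ECDF conventions.

```shell
# list_large_files: hypothetical helper (not an ECDF tool).
# Usage: list_large_files [DIR] [SIZE]
#   DIR  defaults to your home directory
#   SIZE is a find(1) size threshold, e.g. 50M (the default) or 1G
list_large_files() {
    local dir="${1:-$HOME}"
    local size="${2:-50M}"
    # Find regular files bigger than SIZE, print each one with a
    # human-readable size, and sort so that the largest files come last.
    find "$dir" -type f -size +"$size" -exec du -h {} + | sort -h
}
```

Running list_large_files with no arguments checks your home directory; anything it reports that is data or a model belongs in group space or scratch space instead.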

Group space

High-performance shared (i.e., “group”) space for the whole class is available at /exports/chss/eddie/ppls/groups/slpgpustorage. This space has a quota (i.e., a maximum amount of storage) that is shared by everyone, so please be careful about how much space you use.
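To keep an eye on how much of the shared quota your own files use, one option is to summarise disk usage per subdirectory with du. This is a sketch: usage_by_subdir is a hypothetical name, and du reports what files occupy on disk, which may not match exactly how the quota is accounted.

```shell
# usage_by_subdir: hypothetical helper (not an ECDF tool).
# Usage: usage_by_subdir DIR
# Prints the disk usage of each entry directly under DIR, largest last.
usage_by_subdir() {
    # -s: one total per argument; -h: human-readable sizes.
    # Errors (e.g. unreadable directories, or an empty DIR whose glob
    # does not expand) are discarded.
    du -sh "$1"/* 2>/dev/null | sort -h
}

# Example:
#   usage_by_subdir /exports/chss/eddie/ppls/groups/slpgpustorage/users/s1234567
```

On a large shared filesystem du can take a while, so it is best pointed at your own personal area rather than the whole group space.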

In the group space you will find tts_cw, which is a read-only shared copy of all the code and common data (e.g., the ARCTIC corpora). Do not make your own copy of this! Doing so would simply waste expensive storage space. Always use the shared copy!
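If you want the shared copy to appear inside your own working area, a symbolic link gives you a convenient path to it without duplicating any files. The sketch below is one way to do this, not a course requirement; link_shared is a hypothetical helper, and the example path assumes that tts_cw sits directly under the group space root.

```shell
# link_shared: hypothetical helper (not an ECDF tool).
# Usage: link_shared TARGET LINK
# Creates LINK as a symbolic link pointing at TARGET; no data is copied.
link_shared() {
    local target="$1" link="$2"
    # Never replace something that already exists at LINK
    # (including a dangling symbolic link).
    if [ -e "$link" ] || [ -L "$link" ]; then
        echo "link_shared: $link already exists; leaving it alone" >&2
        return 1
    fi
    ln -s "$target" "$link"
}

# Example (assumes tts_cw sits directly under the group space root):
#   link_shared /exports/chss/eddie/ppls/groups/slpgpustorage/tts_cw \
#       /exports/chss/eddie/ppls/groups/slpgpustorage/users/s1234567/tts_cw
```

The link itself takes up essentially no space, and the shared copy stays read-only when reached through it.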

You should create a personal area within the group space, using your UUN as the directory name (e.g., /exports/chss/eddie/ppls/groups/slpgpustorage/users/s1234567), and store everything that you need there. Keep your space tidy, and delete things that you don’t need.
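Creating the personal area is a single mkdir -p. The sketch below wraps it in a hypothetical helper, make_personal_area, and its example assumes that your login name ($USER) on Eddie is your UUN.

```shell
# make_personal_area: hypothetical helper (not an ECDF tool).
# Usage: make_personal_area GROUP_ROOT UUN
# Creates GROUP_ROOT/users/UUN if needed and prints its path.
make_personal_area() {
    local area="$1/users/$2"
    # -p: create missing parent directories, and do not fail if it exists.
    mkdir -p "$area" && printf '%s\n' "$area"
}

# Example (assumes $USER is your UUN):
#   make_personal_area /exports/chss/eddie/ppls/groups/slpgpustorage "$USER"
```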

The high-performance filesystem is extremely reliable. If you delete something accidentally, there is a way to recover it (ask in the lab or on the forums). But if the system has a major fault, files might be lost. Therefore, you should keep safe copies of anything that would be difficult to replace – such as your own speech recordings – somewhere else (not on ECDF).
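One way to keep those safe copies is to pull them from ECDF to your own computer with rsync, the same tool used for staging. This sketch assumes that it runs on your own machine and that ssh is configured so that eddie refers to a login node; backup_from_ecdf is a hypothetical helper and recordings is only a placeholder directory name.

```shell
# backup_from_ecdf: hypothetical helper, meant to run on your own
# computer (not on ECDF).
# Usage: backup_from_ecdf SOURCE DEST
#   SOURCE may be a remote path such as eddie:/path/to/dir/
backup_from_ecdf() {
    # Create the local destination if needed, then copy.
    # -a: recursive, preserving timestamps and permissions, so re-running
    #     it skips files that have not changed; -v: list what is copied.
    mkdir -p "$2" && rsync -av "$1" "$2"
}

# Example (the recordings directory name is a placeholder):
#   backup_from_ecdf \
#     eddie:/exports/chss/eddie/ppls/groups/slpgpustorage/users/s1234567/recordings/ \
#     "$HOME/ecdf-backup/recordings/"
```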

Scratch space

You have a substantial amount of space available in /exports/eddie/scratch/s1234567. This is called “scratch” space because it is only for short-lived files. It is free of charge and ideal for use with Eddie’s compute nodes. However, the filesystem automatically deletes files older than one month, so anything that you wish to keep for longer (e.g., a trained model) should be moved to group space.
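To spot scratch files that are approaching the automatic clean-up, and to move something worth keeping into group space, a sketch with find and mv might look like this. The helper names are hypothetical, the 21-day default is an arbitrary margin below one month, and it assumes the clean-up is based on modification time, which may not be exactly the criterion ECDF uses.

```shell
# Hypothetical helpers for scratch space (not ECDF tools).

# list_old_files DIR [DAYS]
# Lists regular files under DIR last modified more than DAYS whole days
# ago (default 21): candidates for the automatic clean-up.
list_old_files() {
    find "$1" -type f -mtime +"${2:-21}"
}

# keep_in_group_space SOURCE DEST_DIR
# Moves SOURCE (a file or directory) into DEST_DIR, creating it if needed.
keep_in_group_space() {
    mkdir -p "$2" && mv "$1" "$2"/
}

# Example (my_model is a placeholder name):
#   list_old_files /exports/eddie/scratch/s1234567
#   keep_in_group_space /exports/eddie/scratch/s1234567/my_model \
#       /exports/chss/eddie/ppls/groups/slpgpustorage/users/s1234567/models
```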

General storage

High-performance storage is expensive, so it is only used when necessary: for files needed by Eddie’s compute nodes. Cheaper, lower-performance storage is available for everything else. This storage is provided by DataStore and can be accessed from within ECDF at /chss/datastore/ppls/groups/slpgpustorage. You probably do not need to use this storage area.

Moving files to and from ECDF

There are several methods for moving files to or from ECDF; this procedure is called “staging”. Full details are available in the documentation, but you will probably only need the following method.

From any computer on the University network (including your personal computer, if it is connected to the VPN), you can copy a file to ECDF as shown below. This assumes that you have configured ssh so that the name eddie refers to a login node:

rsync -av some_file.txt eddie:/exports/chss/eddie/ppls/groups/full/path/to/destination/directory/
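The command above relies on ssh knowing what eddie means. A minimal sketch of that configuration follows. It assumes the Eddie login host name is eddie.ecdf.ed.ac.uk (check this against the ECDF documentation), and print_ssh_stanza is a hypothetical helper that only prints the entry, so you can review it before adding it to your ssh configuration.

```shell
# print_ssh_stanza: hypothetical helper (not an ECDF tool).
# Prints an ssh client configuration entry so that the name "eddie"
# refers to an Eddie login node.
# Assumption: the login host is eddie.ecdf.ed.ac.uk.
# Usage: print_ssh_stanza UUN
#   then, once you are happy with it:
#   print_ssh_stanza s1234567 >> ~/.ssh/config
print_ssh_stanza() {
    printf 'Host eddie\n    HostName eddie.ecdf.ed.ac.uk\n    User %s\n' "$1"
}
```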

rsync can also be used to recursively copy an entire directory, or to copy from ECDF.
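Copying a directory works the same way, because rsync's -a option copies recursively, and swapping the source and destination copies from ECDF instead. The detail that most often catches people out is the trailing slash on the source, shown in this sketch. stage_dir is a hypothetical wrapper, results is a placeholder directory name, and --dry-run (short form -n) is rsync's option for listing what would be copied without copying anything.

```shell
# stage_dir: hypothetical wrapper around rsync (not an ECDF tool).
# Usage: stage_dir [-n] SOURCE DEST
# The trailing slash on SOURCE matters:
#   stage_dir my_dir  dest/   creates dest/my_dir/...
#   stage_dir my_dir/ dest/   copies the contents of my_dir into dest/
# Either SOURCE or DEST may be remote (e.g. eddie:/path/), so the same
# command copies to ECDF or from ECDF.
stage_dir() {
    if [ "$1" = "-n" ]; then
        # --dry-run: list what would be transferred, copy nothing.
        rsync -av --dry-run "$2" "$3"
    else
        rsync -av "$1" "$2"
    fi
}

# Examples (results is a placeholder name):
#   stage_dir -n my_dir eddie:/exports/chss/eddie/ppls/groups/slpgpustorage/users/s1234567/
#   stage_dir my_dir eddie:/exports/chss/eddie/ppls/groups/slpgpustorage/users/s1234567/
#   stage_dir eddie:/exports/chss/eddie/ppls/groups/slpgpustorage/users/s1234567/results/ ./results/
```

Doing a dry run first is a cheap way to confirm the slash does what you intended before anything lands in the shared quota.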