
Welcome to Perimeter’s HPC system “Symmetry”

Symmetries are important in physics. Noether’s theorem states that every continuous symmetry of a physical system generates a conservation law. In honour of this principle, Perimeter’s HPC system is called Symmetry.

Symmetry is intended to serve the needs of Perimeter researchers, filling a gap between personal devices such as laptops and desktops, and large national systems such as those offered by Compute Canada. As such, each node of Symmetry is significantly more powerful than a laptop, but cannot compete with a national system such as Graham or Niagara.

(This documentation is still under construction and should be completed within the next few days. Please report errors, omissions, and suggestions to our help desk.)

Contact and help

As with all technical systems at Perimeter, the main channel for reporting issues and asking for assistance is our help desk.

For online discussions, there is a Gitter chat room, Computing at Perimeter. This chat room is not restricted to Symmetry; it covers all topics related to computational physics at Perimeter.

System description

Hardware

Symmetry consists of:

There are various additional bits and pieces, primarily for administration, that are mostly invisible to general users.

Available Software

Symmetry provides a wide range of software. If you need additional software, you can request it via the help desk, or you can install it into your home directory.

Pre-installed software:

Modules

Some of the software packages use Environment Modules. This means you need to load a module before the package is available. Use module avail to see which modules are available, and module load to load one. There are also module list, module unload, and module help.
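
For example, a typical module session might look like this (gcc is used here only as a hypothetical module name; check module avail for what actually exists):

$ module avail          # list all available modules
$ module load gcc       # load a module (hypothetical name)
$ module list           # show currently loaded modules
$ module unload gcc     # unload the module again
$ module help gcc       # show help for a module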

Python

Several Python versions are available, documented on this page.

Containers / Docker / Singularity

We provide the Singularity program for running containers on Symmetry.
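
As a rough sketch, pulling a Docker image and running a command inside it might look like this (the ubuntu image is only an illustrative example, and whether Singularity must first be loaded as a module is an assumption; check module avail):

$ module load singularity                                  # assumption: may not be needed
$ singularity pull docker://ubuntu                         # convert a Docker Hub image to a .sif file
$ singularity exec ubuntu_latest.sif cat /etc/os-release   # run a single command in the container
$ singularity shell ubuntu_latest.sif                      # or start an interactive shell inside it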

Using Symmetry

Access

All researchers at Perimeter have, in principle, access to Symmetry. Please contact the help desk to enable this access. It is probably a good idea to enable VPN and ssh access to Perimeter at the same time. Symmetry is located behind Perimeter’s firewall and is not directly accessible from the outside.

There are two ways to access Symmetry, either traditionally via ssh on the command line, or via a web browser and JupyterHub:

Access via ssh

To log in, use ssh USERNAME@symmetry. (Replace USERNAME with your user name.) This will ask for your Perimeter password. We recommend generating ssh keys and using an ssh agent or key chain to allow password-less access. (Question: Where is a good tutorial for this?)
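
Until a tutorial is linked here, a minimal sketch of the standard OpenSSH workflow, run on your own laptop or desktop, looks like this (details such as agent setup differ between operating systems):

$ ssh-keygen -t ed25519           # generate a key pair; choose a passphrase when asked
$ ssh-copy-id USERNAME@symmetry   # install the public key on Symmetry (asks for your password once)
$ ssh-add                         # add the key to your ssh agent / key chain
$ ssh USERNAME@symmetry           # later logins no longer ask for your Perimeter password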

On Linux and macOS, ssh is pre-installed. On Windows, you might need to install a client such as PuTTY, although recent versions of Windows 10 and 11 also ship with an OpenSSH client. (Question: Is there a better alternative to PuTTY?)

Access via JupyterHub

JupyterHub is documented here.

Remote Desktop / Mathematica / Matlab

In order to run graphical desktop applications on the Symmetry head nodes, we have a VNC server set up, described here.

Running jobs

While you can run jobs interactively on the head nodes, you need to be careful when doing so: head nodes are shared between all users of Symmetry. Do this only for tasks that do not need many resources, for example compiling code or briefly testing a Julia or Mathematica notebook. If in doubt, use a compute node instead.

If you overload a head node by using too much memory or too many threads, other users might suffer, and an administrator might have to step in and abort your task. (Question: Is there a tutorial explaining how to use top to monitor one’s processes?)
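
Until such a tutorial exists, a quick way to check your own usage on a head node is (replace USERNAME with your user name; press q to quit):

$ top -u USERNAME    # show only your processes; watch the %CPU and %MEM columns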

Generally, we recommend running jobs on Symmetry’s compute nodes, as described below.

Slurm resource manager

To avoid conflicts when accessing the compute nodes, we use the Slurm resource manager (also known as a “scheduler” or “queueing system”). Slurm keeps track of which compute nodes are currently used by whom. If you want to use a certain number of compute nodes, you have to ask Slurm, and you might have to wait until the nodes are available before your job can run.

The basic workflow is as follows:

  1. You write a batch script (shell script) for your job. (Below are some examples.) This script defines which resources you want (e.g. “4 nodes for 7 days”), and also how to run your job.

  2. You submit this script to Slurm via sbatch (see below for examples).

  3. If the system is busy, your job might have to wait in the queue for some time. Slurm will try to be “fair” to all users (whatever that means). Your job’s priority is determined by several factors, including how much you have used Symmetry recently, and how many resources your job requests.

  4. Slurm will run your job automatically (that’s what batch means). This generally does not work with notebooks (e.g. Mathematica, Jupyter). Instead, you need to write a script with a text editor (see below for examples).

  5. After the job has finished, you examine its output, which was presumably written to a file.
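
As a minimal sketch of steps 1 and 2, a batch script for a small serial job might look like the following (the job name, output file, and ./myprogram are hypothetical placeholders; adjust the resource requests to your needs):

#!/bin/bash
#SBATCH --job-name=myjob        # a name of your choice
#SBATCH --nodes=1               # request one compute node
#SBATCH --ntasks=1              # run a single task (process)
#SBATCH --time=1-00:00:00       # run for at most 1 day
#SBATCH --output=myjob.%j.out   # write output to this file (%j is the job id)

# the commands that make up your job; ./myprogram is a hypothetical executable
./myprogram

You would save this as, say, job.sh and submit it with sbatch job.sh.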

Using Slurm

After module load slurm, you can get an overall view of the system with sinfo. This might produce output like this:

$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
defq*        up 7-00:00:00      6 drain* cn[017,019-020,040,045,056]
defq*        up 7-00:00:00      1  drain cn002
defq*        up 7-00:00:00      1  alloc cn001
defq*        up 7-00:00:00     68   idle cn[003-016,018,021-039,041-044,046-055,057-076]
debugq       up    1:00:00      6 drain* cn[017,019-020,040,045,056]
debugq       up    1:00:00      1  drain cn002
debugq       up    1:00:00      1  alloc cn001
debugq       up    1:00:00     68   idle cn[003-016,018,021-039,041-044,046-055,057-076]

This means there are two partitions (queues) available, called defq (the default queue) and debugq (for debugging and short interactive jobs). defq allows jobs to run for up to 7 days, debugq for up to 1 hour. Not shown here is the fact that jobs in debugq have a much higher priority and will usually start before any jobs waiting in defq.

The description of the compute nodes is (unfortunately) repeated for both queues. Nodes in the drain state are not available; they are either reserved for administrative use (here cn002), or are unresponsive (here cn[017,019-020,040,045,056]; these nodes are presumably either being updated, or might be reporting a hardware issue). Nodes in the alloc state are currently in use, and nodes in the idle state are currently free.

The command squeue shows all jobs that are currently either waiting or running. squeue -u USERNAME (replace USERNAME with your user name) shows only your jobs.
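
For example (JOBID stands for the job id that squeue prints):

$ squeue              # all waiting and running jobs
$ squeue -u USERNAME  # only your own jobs
$ scancel JOBID       # cancel one of your own jobs, should that become necessary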

Running jobs interactively

(JupyterHub, srun, reservations, …)
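
While this section is being written, one common pattern is to ask Slurm for an interactive shell on a compute node with srun; a minimal sketch, using the debugq partition shown above, might be:

$ module load slurm
$ srun --partition=debugq --time=1:00:00 --pty bash   # wait for a node, then get a shell on it
$ exit                                                # leave the shell and release the node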

Running batch jobs

Slurm comes with extensive documentation and tutorials.

When running a job on Symmetry, you need to specify how many nodes and cores your job requests. Determining this correctly is not always straightforward:

Here are some examples that might be useful for a quick start:

Note: The Slurm scripts contain path names pointing into my (eschnetter’s) home directory. You need to change these to point into a directory of your own; otherwise you will not see the output.
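
As an illustration of the resource-request lines only (this is not one of the example scripts referred to above), a job that runs one multi-threaded process on a single node might request:

#!/bin/bash
#SBATCH --nodes=1              # one compute node
#SBATCH --ntasks=1             # one process ...
#SBATCH --cpus-per-task=8      # ... using 8 cores for its threads
#SBATCH --time=0-01:00:00      # 1 hour
#SBATCH --output=test.%j.out   # written to the directory you submit from

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK   # tell an OpenMP program how many threads to use
./mythreadedprogram                           # hypothetical executable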

File systems

(home directory, GPFS)