Current System Load

The plot below shows the status of the CPU nodes on the current Cirrus service for the past day (note: the Cirrus GPU nodes are not included in this plot).

A description of each of the status types is provided below the plot.

CPU

Cirrus Node Status graph

GPU

Cirrus GPU Node Status graph

Known Issues

We are experiencing a heavy load on the metadata server. Our systems team are investigating, but we suspect this is due to one or more users performing many I/O operations. We apologise for the inconvenience this is causing.

Service Alerts

No current service alerts

Recently Resolved Service Alerts

This table lists resolved service alerts from the past 30 days. A full list of historical resolved service alerts is available.

Status | Type | Start | End | Scope | User Impact | Reason
Resolved | Service alert | 2022-11-21 09:00 | 2022-11-21 11:00 | Login nodes | The Cirrus login nodes are currently unavailable to users | The Ceph home file system has issues due to a failure at the data centre
Resolved | Service alert | 2022-11-13 10:00 | 2022-11-14 09:00 | SAFE website | Users will get a security warning when trying to access SAFE website; some web browsers (e.g. Chrome) will not connect to SAFE website; Cirrus load plot on status page will not work | The website certificate has expired

Service Calendar and Maintenance

Maintenance Sessions: Quarter 4 2022 (1st October - 31st December 2022)

Status | Type | Start | End | System | User Impact | Reason
Planned | Full Maintenance | 2022-12-07 09:00 | 2022-12-07 17:00 | Cirrus | Cirrus will not be available to users. This includes the login nodes, compute nodes and access to the filesystems. We will notify users when it is returned to service. | Upgrade to the Slurm batch scheduler.

Maintenance Logs for previous periods

Previous maintenance logs

Module Updates

Module Update following Cirrus Upgrade September 2022

Description | Reason | Advice
Removed Molpro module and user doc section | No longer functional | No longer centrally supported on Cirrus
Forge to be updated | v20.0.3 found to have security flaw | Pending. Newer version will be installed as a replacement.
Updated mpi4py | All the mpi4py modules are tied to a particular version of python, 3.8.12. More flexibility is required such that users can run python-based parallel code using different python versions. | The mpi4py modules have been replaced by a suite of python modules: python/3.8.13, python/3.8.13-gpu, python/3.9.12, and python/3.9.12-gpu. The gpu modules load a miniconda3 python environment containing mpi4py 3.1.3 linked with OpenMPI 4.1.x and CUDA 11.6; whereas the cpu modules (no -gpu suffix) load a python environment containing mpi4py 3.1.3 linked with HPE MPT 2.25. (The python/3.8.13-gpu module is linked with OpenMPI 4.1.2 and the python/3.9.12-gpu module is linked with OpenMPI 4.1.4.)
Updated horovod | Updated | Module version 0.24.2-gpu has been replaced by 0.25.0-gpu.
Updated pytorch | Updated | Module version 1.11.0-gpu has been replaced by 1.12.0-gpu.
Updated tensorflow | Updated | Module version 2.8.0-gpu has been replaced by 2.9.1-gpu.
Updated scalasca | Version 2.5 no longer functional. | Please use 2.6-gcc8-mpt225 or 2.6-intel19-mpt225 instead.
Removed spack/2020 module | Not used. | Not required. Please contact the service desk if Spack installation is needed.
Updated tmux | Version 3.1b no longer functional. | Version 3.3a provided as replacement.
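
Users with existing job scripts may want to check them for module versions that the table above lists as replaced. The following is a minimal sketch of such a check, not a supported tool. The function name find_outdated_modules is hypothetical, and the "name/version" spelling of the horovod, pytorch and tensorflow modules is an assumption modelled on the python/3.8.13-gpu naming; the module names actually installed on the system may differ.

```python
import re

# Version replacements taken from the module update table.
# ASSUMPTION: modules are named "<package>/<version>"; the real names on
# the system may differ, so treat these keys as illustrative.
REPLACEMENTS = {
    "horovod/0.24.2-gpu": "horovod/0.25.0-gpu",
    "pytorch/1.11.0-gpu": "pytorch/1.12.0-gpu",
    "tensorflow/2.8.0-gpu": "tensorflow/2.9.1-gpu",
}

# Matches uncommented "module load ..." or "module add ..." lines and
# captures the module list, ignoring any trailing "# comment".
MODULE_LOAD = re.compile(r"^\s*module\s+(?:load|add)\s+(.+?)\s*(?:#.*)?$")


def find_outdated_modules(script_text):
    """Return (line_number, old_module, suggested_module) tuples."""
    findings = []
    for lineno, line in enumerate(script_text.splitlines(), start=1):
        match = MODULE_LOAD.match(line)
        if not match:
            continue
        # A single "module load" line can name several modules.
        for module in match.group(1).split():
            if module in REPLACEMENTS:
                findings.append((lineno, module, REPLACEMENTS[module]))
    return findings


if __name__ == "__main__":
    example = "#!/bin/bash\nmodule load pytorch/1.11.0-gpu\n"
    for lineno, old, new in find_outdated_modules(example):
        print(f"line {lineno}: replace {old} with {new}")
```

The sketch only reports suggestions and does not edit the script, so each change can be reviewed by hand.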

At Risk Maintenance Sessions

There is an ‘At-Risk’ session provisionally booked every Wednesday from 10:00 to 12:00. A user mailing will be sent if any work that may impact users is going to take place.
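
For users who schedule long-running or time-critical work, the weekly window can be checked programmatically. This is a minimal sketch under stated assumptions: the function names are hypothetical, times are naive datetimes taken to be in the service's local time, and the window is treated as including 10:00 and excluding 12:00.

```python
from datetime import datetime, time, timedelta

AT_RISK_WEEKDAY = 2           # datetime.weekday(): Monday is 0, Wednesday is 2
AT_RISK_START = time(10, 0)   # window opens at 10:00
AT_RISK_END = time(12, 0)     # window closes at 12:00 (assumed exclusive)


def in_at_risk_window(moment):
    """Return True if a naive datetime falls inside the weekly window."""
    return (
        moment.weekday() == AT_RISK_WEEKDAY
        and AT_RISK_START <= moment.time() < AT_RISK_END
    )


def next_at_risk_start(moment):
    """Return the start of the next window strictly after `moment`."""
    days_ahead = (AT_RISK_WEEKDAY - moment.weekday()) % 7
    candidate = datetime.combine(
        moment.date() + timedelta(days=days_ahead), AT_RISK_START
    )
    if candidate <= moment:
        # This week's window has already started, so move to next week.
        candidate += timedelta(days=7)
    return candidate


if __name__ == "__main__":
    now = datetime.now()
    print("Inside at-risk window:", in_at_risk_window(now))
    print("Next at-risk session starts:", next_at_risk_start(now))
```

Because the sessions are only provisionally booked, a True result means work may take place, not that it will.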

Service Calendar

We maintain a calendar for the Cirrus service that lists upcoming events (such as training courses and maintenance sessions):

We keep maintenance downtime to a minimum on the service but do occasionally need to perform essential work on the system. Maintenance sessions are used to ensure that:

Additional maintenance sessions can be scheduled for major hardware or software updates; major upgrades to facility plant and infrastructure; acceptance testing following major service upgrades and statutory electrical testing.