Current System Load

The plot below shows the status of the CPU nodes on the current Cirrus service for the past day (note: the Cirrus GPU nodes are not included in this plot).

A description of each of the status types is provided below the plot.

CPU

Cirrus Node Status graph

GPU

Cirrus GPU Node Status graph

Known Issues

We are experiencing a heavy load on the metadata server. Our systems team are investigating, but we suspect this is due to one or more users performing many I/O operations. We apologise for the inconvenience this is causing.

Service Alerts

No current service alerts

Recently Resolved Service Alerts

This table lists resolved service alerts from the past 30 days. A full list of historical resolved service alerts is available.

Status | Type | Start | End | Scope | User Impact | Reason
Resolved | Service alert | 2023-03-13 21:00 | 2023-03-14 15:25 | Work (Lustre) parallel file system | Users may see lack of responsiveness on Cirrus login nodes and reduced IO performance | Heavy load on the Lustre file system is causing contention for shared resources

Service Calendar and Maintenance

Maintenance Sessions: Quarter 4 2022 (1st October - 31st December 2022)

Quarter 1 2023

Status | Type | Start | End | System | User Impact | Reason
Planned | Partial Maintenance | 2023-02-07 09:00 | 2023-02-07 17:00 | Cirrus | CPU and GPU compute nodes will be unavailable. Login access and access to data will still be available. | Essential maintenance to the Cirrus liquid cooling system.

Maintenance Logs for previous periods

Previous maintenance logs

Module Updates

Module Update following Cirrus Upgrade September 2022

Description | Reason | Advice
Removed Molpro module and user doc section | No longer functional | No longer centrally supported on Cirrus
Forge to be updated | v20.0.3 found to have security flaw | Pending. Newer version will be installed as a replacement.
Updated mpi4py | All the mpi4py modules are tied to a particular version of python, 3.8.12. More flexibility is required such that users can run python-based parallel code using different python versions. | The mpi4py modules have been replaced by a suite of python modules: python/3.8.13, python/3.8.13-gpu, python/3.9.12, and python/3.9.12-gpu. The gpu modules load a miniconda3 python environment containing mpi4py 3.1.3 linked with OpenMPI 4.1.x and CUDA 11.6; whereas the cpu modules (no -gpu suffix) load a python environment containing mpi4py 3.1.3 linked with HPE MPT 2.25. (The python/3.8.13-gpu module is linked with OpenMPI 4.1.2 and the python/3.9.12-gpu module is linked with OpenMPI 4.1.4.)
Updated horovod | Updated | Module version 0.24.2-gpu has been replaced by 0.25.0-gpu.
Updated pytorch | Updated | Module version 1.11.0-gpu has been replaced by 1.12.0-gpu.
Updated tensorflow | Updated | Module version 2.8.0-gpu has been replaced by 2.9.1-gpu.
Updated scalasca | Version 2.5 no longer functional. | Please use 2.6-gcc8-mpt225 or 2.6-intel19-mpt225 instead.
Removed spack/2020 module | Not used. | Not required. Please contact the service desk if Spack installation is needed.
Updated tmux | Version 3.1b no longer functional. | Version 3.3a provided as replacement.
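
Because the replacement python modules link mpi4py against different MPI libraries (HPE MPT for the CPU modules, OpenMPI for the -gpu modules), it can be useful to confirm what a loaded module actually provides. The following is a minimal sketch, not an official Cirrus tool: it assumes one of the python modules listed above has already been loaded in the usual way (for example with `module load python/3.9.12`), and the function name `describe_python_mpi` is purely illustrative. If mpi4py cannot be imported, the MPI fields are simply left empty.

```python
import sys


def describe_python_mpi():
    """Report the interpreter version and, if importable, the mpi4py and MPI library versions."""
    info = {
        "python": "{}.{}.{}".format(*sys.version_info[:3]),
        "mpi4py": None,
        "mpi_library": None,
    }
    try:
        import mpi4py
        from mpi4py import MPI  # importing MPI initialises the underlying MPI library
    except (ImportError, RuntimeError):
        # mpi4py is not available here (e.g. no python module has been loaded)
        return info

    info["mpi4py"] = mpi4py.__version__
    # The library version string can span several lines; keep only the first
    lines = MPI.Get_library_version().splitlines()
    info["mpi_library"] = lines[0] if lines else ""
    return info


if __name__ == "__main__":
    for key, value in describe_python_mpi().items():
        print(f"{key}: {value}")
```

Running this under a CPU module and then under a -gpu module should show different entries for `mpi_library`, which is a quick way to check that the intended environment is active before submitting a job.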

At Risk Maintenance Sessions

There is an ‘At-Risk’ session provisionally booked every Wednesday from 10:00 to 12:00. A user mailing will be sent if any work that may impact users is going to take place.
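
When planning long jobs, it may help to check whether a run would overlap the weekly window. The sketch below computes the current or next Wednesday 10:00 to 12:00 window from a given time; the function names are hypothetical, and it assumes naive datetimes expressed in the same local time as the session. Since sessions are only provisionally booked, an overlap means work might take place, not that it will.

```python
from datetime import datetime, time, timedelta

AT_RISK_WEEKDAY = 2           # Monday is 0, so Wednesday is 2
AT_RISK_START = time(10, 0)
AT_RISK_END = time(12, 0)


def next_at_risk_session(now):
    """Return (start, end) of the current or next Wednesday 10:00-12:00 window."""
    days_ahead = (AT_RISK_WEEKDAY - now.weekday()) % 7
    day = (now + timedelta(days=days_ahead)).date()
    start = datetime.combine(day, AT_RISK_START)
    end = datetime.combine(day, AT_RISK_END)
    if now >= end:
        # This week's window has already finished; move to next week
        start += timedelta(days=7)
        end += timedelta(days=7)
    return start, end


def overlaps_at_risk(job_start, job_end):
    """True if a job running from job_start to job_end overlaps the next window."""
    start, end = next_at_risk_session(job_start)
    return job_start < end and job_end > start


if __name__ == "__main__":
    start, end = next_at_risk_session(datetime.now())
    print(f"Next at-risk window: {start} to {end}")
```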

Service Calendar

We maintain a calendar for the Cirrus service that lists upcoming events (such as training courses and maintenance sessions).

We keep maintenance downtime to a minimum on the service but do occasionally need to perform essential work on the system. Maintenance sessions are used to ensure that:

Additional maintenance sessions can be scheduled for major hardware or software updates; major upgrades to facility plant and infrastructure; acceptance testing following major service upgrades; and statutory electrical testing.