Current System Load

The plots below show the status of the Cirrus CPU nodes and GPU nodes over the past day. The GPU nodes are not included in the CPU plot and are shown in a separate plot.

A description of each status type is provided below the plots.

CPU

Cirrus Node Status graph

GPU

Cirrus GPU Node Status graph

Known Issues

We are experiencing heavy load on the metadata server. Our systems team are investigating, but we suspect this is caused by one or more users performing large numbers of I/O operations. We apologise for the inconvenience this is causing users.

Service Alerts

No current service alerts

Recently Resolved Service Alerts

This table lists resolved service alerts from the past 30 days. A full list of historical resolved service alerts is available.

| Status | Type | Start | End | Scope | User Impact | Reason |
|---|---|---|---|---|---|---|
| Resolved | Service Alert | 2025-03-12 08:53 | 2025-03-12 10:00 | Issues with the Slurm controller have been observed | Users can connect to the login node, but jobs will not start on the compute nodes. Users will not be able to issue Slurm commands. | Our systems team are investigating the issue. |
| Resolved | Service Alert | 2025-02-26 09:30 | 2025-02-26 12:00 | A group of nodes on Cirrus developed a technical fault. | Work has been prevented from starting on the affected nodes. Work already running on these nodes may fail but should not be charged. | Our systems team have identified a technical fault with some Cirrus nodes. These nodes have now been restored. |
| Resolved | Service Alert | 2025-02-13 12:30 | 2025-02-25 09:00 | Solid state (/scratch) RPOOL file system | Any jobs using the /scratch file system will fail. | The /scratch file system is 100% full. |

Cirrus Service End

EPSRC funding for the Cirrus service will end on 31st March 2025. EPCC plan to continue operating the Cirrus service until 30th June 2025 in an at-risk, unsupported mode. If major system issues arise between 1st April and 30th June 2025, we may need to end the service before 30th June.

Service Calendar and Maintenance

This section lists recent and upcoming maintenance sessions. A full list of past maintenance sessions is available.

No scheduled or recent maintenance sessions

Maintenance Logs for Previous Periods

Previous maintenance logs

At Risk Maintenance Sessions

An ‘At-Risk’ session is provisionally booked every Wednesday from 10:00 to 12:00. A user mailing will be sent if any work that may impact users is scheduled to take place.

Service Calendar

We maintain a calendar for the Cirrus service that lists upcoming events, such as training courses and maintenance sessions.

We keep maintenance downtime to a minimum on the service, but we occasionally need to perform essential work on the system. Maintenance sessions are used to ensure that:

Additional maintenance sessions can be scheduled for major hardware or software updates; major upgrades to facility plant and infrastructure; acceptance testing following major service upgrades; and statutory electrical testing.