Cirrus Service Status

Current System Load

The plots below show the status of the CPU and GPU nodes on the current Cirrus service for the past day.

A description of each of the status types is provided below each plot, along with a short example of how to query these states for yourself.

CPU

Cirrus Node Status graph

  • alloc: Nodes running user jobs
  • idle: Nodes available for user jobs
  • resv: Nodes in reservation and not available for standard user jobs
  • down, drain, maint, drng, comp: Nodes unavailable for user jobs
  • mix: Nodes in multiple states

GPU

Cirrus GPU Node Status graph

  • alloc: Nodes running user jobs
  • idle: Nodes available for user jobs
  • resv: Nodes in reservation and not available for standard user jobs
  • down, drain, maint, drng, comp: Nodes unavailable for user jobs
  • mix: Nodes in multiple states
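
The state names above are the standard short node-state labels reported by Slurm, which manages the Cirrus batch system. As a rough illustration, the Python sketch below (not an official Cirrus tool) summarises how many nodes are in each state by parsing the output of Slurm's `sinfo` command; it assumes it is run from a login node where `sinfo` is available on your `PATH`.

```python
import subprocess
from collections import Counter


def node_state_counts():
    """Return a Counter mapping Slurm node-state names to node counts."""
    # "%D %t" asks sinfo for the number of nodes and the compact state
    # name on each summary line, e.g. "132 alloc" or "4 idle".
    result = subprocess.run(
        ["sinfo", "--noheader", "--format=%D %t"],
        capture_output=True, text=True, check=True,
    )
    counts = Counter()
    for line in result.stdout.splitlines():
        if not line.strip():
            continue
        count, state = line.split(maxsplit=1)
        counts[state.strip()] += int(count)
    return counts


if __name__ == "__main__":
    for state, n in sorted(node_state_counts().items()):
        print(f"{state:>8}: {n} nodes")
```

Running `sinfo -o "%D %t"` directly at the command line gives the same snapshot; note that `sinfo` may append a flag such as `*` (node not responding) to the state name, so the counts will not necessarily group exactly as in the plots above.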

Service Alerts

| Status | Start | End | Scope | Impact | Reason |
| --- | --- | --- | --- | --- | --- |
| Ongoing | 2025-10-16 12:00 | 2025-10-31 18:00 | /work file system | Risk of unexpected I/O performance issues | Commissioning/testing of new Cirrus hardware sharing the same file system |

Recently Resolved Service Alerts

This table lists the last five resolved service alerts. A full list of historical resolved service alerts is available.

| Status | Start | End | Scope | Impact | Reason |
| --- | --- | --- | --- | --- | --- |
| Resolved | 2025-10-15 08:00 | 2025-10-16 14:30 | Login nodes | Risk of unexpected issues with new accounts on 15/16 Oct | Essential upgrade of authorisation servers |
| Resolved | 2025-08-11 13:50 | 2025-08-11 14:17 | SAFE, MFA at login | Login not accessible, SAFE not accessible | SAFE and Cirrus login MFA unavailable due to work on the SAFE database |
| Resolved | 2025-05-05 08:00 | 2025-05-05 10:30 | Slurm batch system, Lustre file system (work) and the solid state file system (RPOOL) | Users could connect to login nodes and access their data, but no new jobs started until further investigation took place | Issues with a switch on the Cirrus front end |
| Resolved | 2025-03-12 08:53 | 2025-03-12 10:00 | Slurm controller issues | Users could connect to the login nodes, but jobs would not start on the compute nodes and Slurm commands could not be issued | The systems team investigated the issue |
| Resolved | 2025-02-26 09:30 | 2025-02-26 12:00 | A group of Cirrus nodes with a technical fault | Work was prevented from starting on the affected nodes; work already running on these nodes may have failed but should be uncharged | The systems team identified a technical fault with some Cirrus nodes; these nodes have now been restored |

Service Maintenance Sessions

We keep maintenance downtime to a minimum on the service but do occasionally need to perform essential work on the system. Maintenance sessions are used to ensure that:

  • software versions are kept up to date;
  • firmware levels on HPE and third-party peripheral equipment are kept up to date;
  • essential security patches are applied;
  • failed/suspect hardware can be replaced;
  • new software can be installed;
  • periodic essential maintenance on HPE electrical and mechanical support equipment (refrigeration systems, air blowers and power distribution units) can be undertaken safely.

Additional maintenance sessions can be scheduled for major hardware or software updates; major upgrades to facility plant and infrastructure; acceptance testing following major service upgrades; and statutory electrical testing.

No upcoming or ongoing maintenance sessions

A full list of all previous maintenance sessions is available.