Cirrus Service Status

Current System Load

The plot below shows the status of the CPU nodes on the current Cirrus service for the past day.

A description of each of the status types is provided below the plot.

CPU

Cirrus Node Status graph

  • alloc: Nodes running user jobs
  • idle: Nodes available for user jobs
  • resv: Nodes in reservation and not available for standard user jobs
  • down, drain, maint, drng, comp: Nodes unavailable for user jobs
  • mix: Nodes in multiple states
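
As a rough illustration of how the status types above group together, the following minimal Python sketch tallies a list of node states into those categories. The category labels, the `STATE_CATEGORY` mapping, the `summarise` function and the sample data are all hypothetical names introduced for this example; it is not how the plot itself is produced.

```python
from collections import Counter

# Grouping taken from the status list above; category labels are illustrative.
STATE_CATEGORY = {
    "alloc": "running user jobs",
    "idle": "available",
    "resv": "reserved",
    "down": "unavailable",
    "drain": "unavailable",
    "maint": "unavailable",
    "drng": "unavailable",
    "comp": "unavailable",
    "mix": "multiple states",
}


def summarise(node_states):
    """Count nodes per category; unrecognised states are counted as 'unknown'."""
    return Counter(STATE_CATEGORY.get(state, "unknown") for state in node_states)


# Made-up sample data, one state per node.
sample = ["alloc", "alloc", "idle", "drng", "mix", "resv", "down"]
for category, count in summarise(sample).items():
    print(f"{category}: {count}")
```

States not listed above fall into an "unknown" bucket rather than raising an error, so the tally still works if a new state appears.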

Service Alerts

Status | Start | End | Scope | Impact | Reason
Ongoing | 2026-01-21 18:30 | 2026-02-06 18:00 | Cirrus cooling | Low probability risk of requirement to throttle compute for Cirrus to reduce cooling load | Maintenance being carried out on one of 3 pumps that cool ARCHER2 and Cirrus. The other 2 pumps will provide cooling unless there is a coincidental failure of one of the 2 remaining pumps.

Recently Resolved Service Alerts

This table lists the last five resolved service alerts. A full list of historical resolved service alerts is available.

Status | Start | End | Scope | Impact | Reason
Resolved | 2026-02-02 13:44 | 2026-02-02 14:31 | Compute nodes | New and pending jobs will remain queued and not run on Cirrus until the issue is resolved. | There's an ongoing issue with the cooling pump, which is being actively investigated.
Resolved | 2026-01-28 18:00 | 2026-01-30 10:00 | Cirrus home (CephFS) file system | Low risk of file system outage, performance degradation expected | Essential software upgrades to the CephFS file system
Resolved | 2026-01-07 08:00 | 2026-01-07 18:00 | Mains power supply to systems at ACF | Low risk of loss of mains power to ACF systems | Essential maintenance to ACF high voltage switchgear to resolve what is believed to be a minor issue with voltage transformation found during detailed checks after the replacement earlier this year.
Resolved | 2025-10-16 12:00 | 2025-11-19 18:00 | /work file system | Risk of unexpected I/O performance issues | Commissioning/testing of new Cirrus hardware sharing same file system
Resolved | 2025-10-15 08:00 | 2025-10-16 14:30 | Login nodes | Risk of unexpected issues with new accounts on 15/16 Oct | Essential upgrade of authorisation servers

Service Maintenance Sessions

We keep maintenance downtime to a minimum on the service but do occasionally need to perform essential work on the system. Maintenance sessions are used to ensure that:

  • software versions are kept up to date;
  • firmware levels on HPE and third-party peripheral equipment are kept up to date;
  • essential security patches are applied;
  • failed/suspect hardware can be replaced;
  • new software can be installed;
  • periodic essential maintenance on HPE electrical and mechanical support equipment (refrigeration systems, air blowers and power distribution units) can be undertaken safely.

Additional maintenance sessions can be scheduled for major hardware or software updates; major upgrades to facility plant and infrastructure; acceptance testing following major service upgrades; and statutory electrical testing.

No upcoming or ongoing maintenance sessions

A list of all previous maintenance sessions.