Full system power down
Friday 29th August 2025 09:00 - Thursday 18th September 18:00
Due to a significant Health and Safety risk associated with the power supply to the site, action is required at the Advanced Computing Facility (ACF). There will be a full power outage to the site during this period. Specialised external contractors will be working 24/7 throughout the outage to replace switchgear.
Users will not be able to connect to Cirrus and will not be able to access data on any of the Cirrus file systems. The system will be drained of jobs ahead of the power outage and jobs will not run during this period. Any queued jobs will remain in the queue during the outage and jobs will start once the service is returned. SAFE and the Cirrus website will be available.
- Current System Load
- Known Issues
- Current Issues
- Recent Issues
- Maintenance
- Cirrus service end
- Service Calendar and Maintenance
Current System Load
The plot below shows the status of the CPU nodes on the current Cirrus service for the past day (note: the Cirrus GPU nodes are not included in this plot).
A description of each of the status types is provided below the plot.
CPU
- alloc: Nodes running user jobs
- idle: Nodes available for user jobs
- resv: Nodes in reservation and not available for standard user jobs
- down, drain, maint, drng, comp: Nodes unavailable for user jobs
- mix: Nodes in multiple states
GPU
- alloc: Nodes running user jobs
- idle: Nodes available for user jobs
- resv: Nodes in reservation and not available for standard user jobs
- down, drain, maint, drng, comp: Nodes unavailable for user jobs
- mix: Nodes in multiple states
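The grouping of states described above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the plot: it assumes the node states arrive as text lines of the form `<state> <count>`, and the function name `summarise_states` and the sample numbers are hypothetical.

```python
from collections import Counter

# States that the list above groups together as unavailable for user jobs.
UNAVAILABLE = {"down", "drain", "maint", "drng", "comp"}
# States that the list above describes individually.
KNOWN = {"alloc", "idle", "resv", "mix"}


def summarise_states(lines):
    """Tally node counts per category from '<state> <count>' lines."""
    totals = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # skip blank or malformed lines
        # Schedulers may append a flag character (for example '*') to a
        # state name; keep only the letters so 'drain*' matches 'drain'.
        state = "".join(ch for ch in parts[0] if ch.isalpha())
        if state in UNAVAILABLE:
            category = "unavailable"
        elif state in KNOWN:
            category = state
        else:
            category = "other"  # a state not described on this page
        totals[category] += int(parts[1])
    return totals


if __name__ == "__main__":
    # Hypothetical sample data, not real Cirrus figures.
    sample = ["alloc 120", "idle 30", "resv 4", "drain* 2", "down 1", "mix 5"]
    for category, nodes in sorted(summarise_states(sample).items()):
        print(category, nodes)
```

If the scheduler is Slurm (an assumption here, based on the state names), input in this shape could be produced with `sinfo -h -o "%t %D"`, which prints the compact node state and the node count without a header line.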
Known Issues
We are experiencing a heavy load on the metadata server. Our systems team are investigating, but we suspect this is due to one or more users performing many I/O operations. We apologise for the inconvenience this is causing.
Service Alerts
No current service alerts
Recently Resolved Service Alerts
This table lists resolved service alerts from the past 30 days. A full list of historical resolved service alerts is available.
Status | Type | Start | End | Scope | User Impact | Reason |
---|---|---|---|---|---|---|
Resolved | Service Alert | 2025-08-11 13:50 | 2025-08-11 14:17 | SAFE, MFA at login | Login not accessible, SAFE not accessible | Due to work on SAFE database, SAFE and Cirrus login MFA are currently unavailable |
Cirrus Service End
The EPSRC funding for the Cirrus service ended on 31st March 2025. EPCC plan to operate the Cirrus service until later in the year in an at-risk, unsupported mode. If any major system issues arise during this period, we may need to end the service early.
Service Calendar and Maintenance
This section lists recent and upcoming maintenance sessions. A full list of past maintenance sessions is available.
Status | Type | Start | End | Scope | User Impact | Reason |
---|---|---|---|---|---|---|
Planned | Full | 2025-08-29 09:00 | 2025-09-18 18:00 | Full Cirrus system | Users will not be able to connect to Cirrus and will not be able to access data on any of the Cirrus file systems. The system will be drained of jobs ahead of the power outage and jobs will not run during this period. Any queued jobs will remain in the queue during the outage and jobs will start once the service is returned. SAFE and the Cirrus website will be available. | Due to a significant Health and Safety risk associated with the power supply to the site, action is required at the Advanced Computing Facility (ACF). There will be a full power outage to the site during this period. Specialised external contractors will be working 24/7 throughout the outage to replace switchgear. |
Maintenance Logs for previous periods
At Risk Maintenance Sessions
An ‘At-Risk’ session is provisionally booked every Wednesday from 10:00 to 12:00. A user mailing will be sent if any work that may impact users is going to take place.
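For users who script long-running or time-sensitive work, the weekly window can be checked programmatically. The sketch below is illustrative only: the function name `in_at_risk_session` is hypothetical, and it assumes the times are local UK time passed in as a naive `datetime`. It also cannot tell whether work is actually scheduled in a given week, since the booking is provisional.

```python
from datetime import datetime, time

AT_RISK_WEEKDAY = 2           # datetime.weekday(): Monday is 0, Wednesday is 2
AT_RISK_START = time(10, 0)   # 10:00, start of the provisional window
AT_RISK_END = time(12, 0)     # 12:00, end of the provisional window


def in_at_risk_session(when: datetime) -> bool:
    """Return True if `when` falls inside the Wednesday 10:00 to 12:00 window."""
    return (
        when.weekday() == AT_RISK_WEEKDAY
        and AT_RISK_START <= when.time() < AT_RISK_END
    )


if __name__ == "__main__":
    # 27 August 2025 is a Wednesday (two days before Friday 29 August 2025).
    print(in_at_risk_session(datetime(2025, 8, 27, 10, 30)))
```

The end of the window is treated as exclusive, so 12:00 itself counts as outside the session; that is a design choice for this sketch, not something the service states.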
Service Calendar
We maintain a calendar for the Cirrus service that lists upcoming events (such as training courses and maintenance sessions):
We keep maintenance downtime to a minimum on the service but do occasionally need to perform essential work on the system. Maintenance sessions are used to ensure that:
- software versions are kept up to date;
- firmware levels on HPE and third-party peripheral equipment are kept up to date;
- essential security patches are applied;
- failed/suspect hardware can be replaced;
- new software can be installed;
- periodic essential maintenance on HPE electrical and mechanical support equipment (refrigeration systems, air blowers and power distribution units) can be undertaken safely.
Additional maintenance sessions can be scheduled for major hardware or software updates, major upgrades to facility plant and infrastructure, acceptance testing following major service upgrades, and statutory electrical testing.