Dear MeluXina users,
On September 1st, starting at 09:00 CEST, maintenance to improve performance, stability and security will be performed on MeluXina. It is expected to take up to 4 days.
During this time, the system will not be accessible, and jobs submitted prior to the maintenance window will only start after the maintenance is concluded. Services running on the Cloud module will not be able to start computational jobs.
The following actions are tentatively planned for this maintenance:
- the DDN Lustre storage systems will be upgraded to a major new release (ExaScaler 6.3)
- the Slurm workload manager will be upgraded to a new release (23.11)
- the login and compute nodes will be upgraded to RHEL/Rocky Linux 8.10, including updated drivers for the Nvidia GPUs, BittWare FPGAs, Nvidia Infiniband and DDN Lustre
- several other updates for improved stability and security
For any questions, please contact our Service Desk.
Thank you for your understanding.
The LuxProvide team
UPDATES
- 2024-09-05 23:00 System open for production
- 2024-09-05 22:00 Most of the compute nodes have been updated with new HCA firmware
- 2024-09-05 12:30 The Infiniband network issue has been identified and a fix will be applied
- 2024-09-05 08:55 The Infiniband network issue has been escalated to the system vendor and is under investigation. The maintenance window will be extended until a fix can be implemented.
- 2024-09-04 23:55 An Infiniband network issue is being analysed and is preventing the system from returning to production. The maintenance window may need to be extended for the recovery of the fabric.
- 2024-09-01 09:00 Maintenance has started