Server stops working while trying to reach the storage drive
Yesterday, my home server shut down abruptly after a reboot. Once restarted, it ran normally for nearly an hour before shutting down again. Pressing the power button did nothing until I fully power-cycled the PSU. Today I removed the SSDs and hard drives, booted an Ubuntu live CD, and ran a CPU stress test, which finished after two hours without any problems. I then powered everything off, reconnected the hard drives, booted the live CD again, and imported my ZFS pool. With a scrub running alongside the CPU stress test, the system crashed after ten minutes. I restarted, reimported the pool, and tried to copy my configuration files and Docker Compose files off the ZFS array. The first attempt succeeded, but on the second attempt the system crashed again. This time I skipped the PSU power cycle: disconnecting and reconnecting the ATX power cable at the motherboard was enough to get it to boot again. From this, I suspect the motherboard more than the PSU, although I can't rule out the PSU. Is it possible for a hard drive to cause a sudden system shutdown without any other visible signs? Any guidance on further troubleshooting would be appreciated!
Hardware details: CPU – AMD Ryzen 5 3600; Motherboard – Asus ROG STRIX X470-F GAMING; RAM – 2 x Corsair Vengeance LPX, 16 GB DDR4-3600; Storage – 2 x SSD 120GB MP300 (RAID1 with mdadm); GPU – MSI GT 710 2GB; PSU – Antec Earthwatts Gold Pro 550W.
I see you stress-tested the CPU but didn't test the memory under load. I'd be concerned about possible memory problems, especially since the crashes happened during ZFS pool operations such as the scrub. Could you share which operating system you are using?
I normally run Ubuntu Server 24.04, but my tests were done from a live CD of the same version. I'm running Memtest now, although I would expect bad memory to cause a crash rather than a sudden power-off.
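To check whether disk activity alone triggers the power-off, independent of CPU load, a read-only pass over the pool with a durable progress log might help. The following is a minimal Python sketch, not a tested diagnostic: the function name `read_with_progress_log` is hypothetical, and it assumes the log file is written to a different device than the pool (for example a USB stick), since a live session keeps its own filesystem in RAM and loses it on power-off.

```python
import os
import time


def read_with_progress_log(root, log_path, chunk_size=1 << 20):
    """Read every file under root, logging each file name before reading it.

    Returns the total number of bytes read.
    """
    total = 0
    with open(log_path, "a", encoding="utf-8") as log:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                # Write and fsync the log line first, so the last line on disk
                # names the file being read if the machine loses power mid-read.
                log.write(f"{time.time():.3f} reading {path}\n")
                log.flush()
                os.fsync(log.fileno())
                try:
                    with open(path, "rb") as f:
                        # Read in fixed-size chunks and discard the data;
                        # only the disk activity matters here.
                        while chunk := f.read(chunk_size):
                            total += len(chunk)
                except OSError as exc:
                    # Record unreadable files instead of aborting the pass.
                    log.write(f"{time.time():.3f} error {path}: {exc}\n")
                    log.flush()
                    os.fsync(log.fileno())
    return total
```

It could be called as `read_with_progress_log("/mnt/pool", "/media/usb/read.log")`, where both paths are placeholders. If the machine powers off with the CPU otherwise idle, the last log line shows how far the read got and points toward the drives or their power delivery rather than CPU load. Note that a second pass over the same files may be served from the ZFS cache instead of the disks.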