System crash after GPU stress test
Hello! I finished assembling my PC two days ago. Here are the details: CPU - AMD Ryzen 7 7700X 4.5 GHz 8-core processor, CPU cooler - Thermalright Phantom Spirit 120 SE ARGB 66.17 CFM, Motherboard - Gigabyte B650 Gaming X AV V2 ATX AM5, Memory - G.Skill Flare X5 48 GB (2 x 24 GB) DDR5-5200 CL40, Storage - Silicon Power UD90 2 TB M.2-2280 PCIe 4.0 x4 NVMe SSD, Video Card - Gigabyte AERO OC GeForce RTX 4070 Ti Super 16 GB, Case - Phanteks XT Pro Ultra ATX Mid Tower, Power Supply - MSI MPG A850G PCIE5 850 W (80+ Gold certified), OS - Windows 11 Home, version 10.0.22631 x64, BIOS version - F31.
What I've done so far: updated the BIOS and enabled XMP. Ran Cinebench as a 10-minute stress test; temperatures stayed reasonable (~68°C max). After adjusting the PBO settings (-20 PBO, capped at 85 W, temperature limit at 85°C), it passed the stress test without issues. Then I ran an OCCT stability check and got no BSODs. The GPU stayed stable and topped out at 68°C.
During the stability tests, I got repeated crashes in the OCCT 3D Standard GPU benchmark. Minidump analysis initially pointed to driver problems, and running sfc /scannow repaired some corrupted Bluetooth driver files. Occasionally only the OCCT app would crash instead of the whole PC, but that happened at random. I turned off PBO completely, updated the BIOS and graphics drivers, and even swapped the memory stick positions.
I also disabled XMP. Link to gdrive with the minidumps, numbered (newest first). Some earlier crashes were still occurring, but they were resolved after these changes. A few of the minidumps are from before I tried troubleshooting steps that didn't help. The latest ones point to kernel or RAM issues, though I'd have expected to notice those during the stability test if they were present.
If anyone has suggestions for next steps, please let me know. Thanks!
Usually I haven't noticed it crashing during stress tests, though I've seen brief screen glitches that seemed odd. Recent checks suggest a memory problem might be at play. I ran an OCCT memory test which showed rapid errors. I also tried using only one memory stick in each slot and both times the system failed.
Stress tests are designed to push hardware to the point of failure, so if it isn't crashing under regular use, it's not something to be really concerned about. It's entirely possible; RAM can do some odd things when you stress test it. That can happen, especially because it isn't ECC, though technically DDR5 does have some on-die ECC functionality.
Check whether your exact RAM kit appears in the motherboard's memory support list. AMD CPUs typically don't support 24GB modules. Also, does the GPU need a firmware (VBIOS) update?
It seems the data comes from the dump files. A "memory" error isn't always RAM, although that's what people usually assume. Windows moves small amounts of RAM data to the page file and reads it back when needed, so a storage problem can look like a memory problem. The memory controller is built into the CPU, and if it fails, that also looks like a memory fault. Storage issues usually show up in about half of the dumps as storage or storage drivers, which isn't the case here, so storage is unlikely. If anything is overclocked or undervolted, set it back to stock. This includes XMP settings above 5200MT/s. Updating the BIOS might help. To check the RAM, run the machine normally with one stick at a time. If it crashes with only one of the sticks, that stick is faulty. If it crashes with either stick, the CPU is likely the problem. Memory testers often miss bad DDR4 and newer RAM.
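For anyone who wants the one-stick-at-a-time rule written out, here is a rough Python sketch of it. The function name `diagnose` and the stick labels are made up for illustration, and the "no stick crashes alone" branch is an assumption added here to cover the remaining case; the post above only describes the other two outcomes.

```python
def diagnose(crashed_alone):
    """Apply the one-stick-at-a-time rule of thumb.

    crashed_alone maps a (hypothetical) stick label to True if the
    system crashed while running normally with only that stick installed.
    """
    bad = [stick for stick, crashed in crashed_alone.items() if crashed]

    if not bad:
        # Assumption: this case is not covered in the post above.
        return "inconclusive: no stick crashes on its own"
    if len(bad) == len(crashed_alone):
        # Crashes with either stick: the CPU is the likely problem.
        return "CPU likely at fault: crashes with every stick"
    # Crashes with only some of the sticks: those sticks are suspect.
    return "faulty stick(s): " + ", ".join(bad)


if __name__ == "__main__":
    print(diagnose({"stick A": True, "stick B": False}))
    print(diagnose({"stick A": True, "stick B": True}))
```

This is only a rule of thumb, not a definitive diagnosis, and it assumes everything is at stock settings while testing.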
While doing that, I ran MemTest86 on one of the sticks. It hit two errors during the second pass. I plan to test the other stick and the remaining slots later today after work.
Consider that RAM may become defective occasionally, though this is rare. Alternatively, a mismatch could occur.