Author Topic: Brutal crash Corona 1.7 + threadripper  (Read 7242 times)

2017-10-28, 08:39:50

lolec

  • Active Users
  • **
  • Posts: 179
    • View Profile
Hello, I just upgraded my system to a Threadripper 1950 and I'm getting some pretty brutal crashes that "lock" my computer. I've never had crashes like this before in any other scenario, so I will try to describe them.

When rendering in interactive or normal mode, after a few minutes the video signal goes out. The fans and case lights stay on, but I get absolutely no video. Another strange thing is that my mouse's RGB logo goes black too.

After the crash, I can't turn the PC off by long-pressing the power button. I have to cut power completely, plug it back in (fans spin but still no video), long-press the power button to turn it off, and then turn it on again normally; at that point even the mouse comes back on. This is the only way I have found to recover from the crash.

This happened 4 or 5 times, and I decided to downgrade to 1.6.3, no issue so far.

I would like to reproduce this issue but I fear hardware damage; I've never seen a computer crash like this.

The system is not overclocked at all. It's a fresh install with only 3ds Max and Corona, and all the latest drivers.




2017-10-28, 09:36:58
Reply #1

maru

  • Corona Team
  • Active Users
  • ****
  • Posts: 12750
  • Marcin
    • View Profile
It sounds like some hardware malfunction, maybe something with the PSU? You could try running some stress tests other than Corona and see if your system is stable. Other than that, you can try underclocking your CPU and check if the crashes still appear in 1.7.
Marcin Miodek | chaos-corona.com
3D Support Team Lead - Corona | contact us

2017-10-29, 02:18:49
Reply #2

lolec

  • Active Users
  • **
  • Posts: 179
    • View Profile
It appears to be a hardware issue indeed.

After some more testing, 1.6.3 ended up crashing in the exact same way.

However, I returned to 1.7 and the crashes are easier to reproduce.

I'm trying to figure out which part is causing the crashes. Is there something that happens much more in 1.7 than in 1.6.3? Since the crashes are more common in 1.7, maybe that can lead me to the issue?

2017-10-29, 13:26:39
Reply #3

agentdark45

  • Active Users
  • **
  • Posts: 579
    • View Profile
If this were me, here's what I'd do to troubleshoot this:

1. Check all drivers are updated.

2. Fresh install of Windows/reformat.

3. Remove all but 1 of the RAM sticks (and then cycle that one to check that you haven't picked a possible dodgy RAM stick).

4. Try an alternative graphics card.

5. Disconnect all non-essential peripherals and try swapping the PSU if possible.

6. If all of the above fails, you are left with either A. a dodgy CPU (highly unlikely), or B. a dodgy motherboard (more likely; I've had several fail on me, but never had a CPU malfunction). Assuming your hardware vendors are decent people, I would at this point RMA both the motherboard and CPU for testing.
« Last Edit: 2017-10-29, 13:30:00 by agentdark45 »
Vray who?

2017-10-29, 20:12:17
Reply #4

lolec

  • Active Users
  • **
  • Posts: 179
    • View Profile
Thanks for your help. I'm not much of a PC hardware person, so I really appreciate it.

2017-10-29, 22:25:38
Reply #5

sprayer

  • Active Users
  • **
  • Posts: 794
    • View Profile
Check the event log for system and applications around the time of the crash; it can show the cause. I doubt that Corona is doing this. Just check whether it happens at 100% CPU load. Is the power supply sufficient for your configuration? Check the temperatures of all devices under 100% load.

2017-10-30, 02:42:21
Reply #6

danio1011

  • Active Users
  • **
  • Posts: 361
    • View Profile
Yeah, sounds like a heat or PSU issue. If you're seeing more crashes under 1.7 than 1.6, perhaps Corona 1.7 is slightly more demanding in some way? For example in CPU usage (resulting in heat) or RAM (exposing a flaw in one of your sticks)?

2017-10-30, 04:10:56
Reply #7

lolec

  • Active Users
  • **
  • Posts: 179
    • View Profile
The PSU is 1000 W, so I doubt that's the problem.

@sprayer, can you elaborate a little on the event log? Do you mean Corona's or Windows'? I tried googling but couldn't find anything definitive under that name for Windows.

I have stress tested the CPU for 12 hours with no crashes, so I'm leaning towards a RAM issue.

Will test sticks individually as agentdark45 suggested.

2017-10-30, 09:29:10
Reply #8

maru

  • Corona Team
  • Active Users
  • ****
  • Posts: 12750
  • Marcin
    • View Profile
Are you monitoring your CPU temperature? What is it before the crash?

2017-10-30, 10:16:57
Reply #9

sprayer

  • Active Users
  • **
  • Posts: 794
    • View Profile
I meant this event log: http://cdn.windowsreport.com/wp-content/uploads/2014/09/clear-event-log-in-Windows-8.png
Which stress test did you use? LinX heats the CPU well. RAM tests all run without booting the OS, so the easy method is to insert only one stick and run the Corona test. Windows has a RAM test at boot too, but it's not very informative. If it were RAM, though, you should see a Blue Screen of Death and some info in the logs.
Anyway, you should check everything; new hardware these days is not very stable. For example, I once had a memory leak caused by an Ethernet card driver.
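
For anyone who prefers the command line to the Event Viewer window, here is a minimal sketch of pulling the newest critical and error entries from the System and Application logs. It relies on the `wevtutil` tool that ships with Windows; the function names `build_wevtutil_query` and `print_recent_errors` are made up for this example, and the choice of 20 events is arbitrary.

```python
import subprocess
import sys
from typing import List


def build_wevtutil_query(log_name: str = "System", max_events: int = 20) -> List[str]:
    """Build a wevtutil command listing the newest critical/error events."""
    # In the Windows event schema, Level 1 = Critical and Level 2 = Error.
    xpath = "*[System[(Level=1 or Level=2)]]"
    return [
        "wevtutil", "qe", log_name,
        f"/q:{xpath}",       # XPath filter on the event level
        f"/c:{max_events}",  # maximum number of events to return
        "/rd:true",          # reverse direction: newest events first
        "/f:text",           # human-readable text instead of XML
    ]


def print_recent_errors() -> None:
    """Print recent critical/error events (hypothetical helper, Windows only)."""
    if sys.platform != "win32":
        print("wevtutil is only available on Windows")
        return
    for log in ("System", "Application"):
        print(f"===== {log} =====")
        result = subprocess.run(
            build_wevtutil_query(log), capture_output=True, text=True
        )
        print(result.stdout or result.stderr)
```

Calling `print_recent_errors()` after rebooting from a hard lock-up should show whatever Windows managed to record. After a freeze like the one described in this thread, the log may contain nothing from the moment of the crash itself, only the Kernel-Power entry (event ID 41) written on the next boot saying the previous shutdown was not clean.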

2017-10-30, 16:03:10
Reply #10

lolec

  • Active Users
  • **
  • Posts: 179
    • View Profile
I narrowed down the issue to using 2 ram sticks or more at the same time.

Here's what I did:

I tested each ram stick in each one of the slots and got no crashes, then started testing 2 sticks at a time and it appears that when I'm using more than one stick, the crash happens.

Tried to find something useful in the log, but it just stops at the time of the crash, in this case it looks like it crashed around 6:32 am.
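
Since the log simply stops when the machine locks up, one way to pin down the crash time more precisely is a small "heartbeat" file written during the render. The sketch below is only an illustration: the file name `heartbeat.log`, the 5-second interval, and the function names are all assumptions, not anything Corona or Windows provides.

```python
import os
import time
from datetime import datetime
from typing import Optional


def write_heartbeat(path: str, note: str = "") -> str:
    """Append one timestamped line and force it onto the disk."""
    line = f"{datetime.now().isoformat(timespec='seconds')} {note}".rstrip()
    with open(path, "a", encoding="utf-8") as f:
        f.write(line + "\n")
        f.flush()
        os.fsync(f.fileno())  # ask the OS to commit the write before returning
    return line


def last_heartbeat(path: str) -> Optional[str]:
    """Return the last recorded line: the last moment the machine was alive."""
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    return lines[-1] if lines else None


def run(path: str = "heartbeat.log", interval_s: float = 5.0,
        beats: Optional[int] = None) -> None:
    """Write a heartbeat every interval_s seconds; beats=None runs until stopped."""
    count = 0
    while beats is None or count < beats:
        write_heartbeat(path, f"beat {count}")
        count += 1
        time.sleep(interval_s)
```

Start `run()` in a separate console before rendering; after the forced power cycle, `last_heartbeat("heartbeat.log")` gives the crash time to within one interval. The `note` argument could carry a temperature or load reading if a monitoring tool is available to supply one.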

I'm now 100% sure it's not Corona's problem. I ran a RAM test that boots from a USB drive, so no OS, no drivers, no Corona... and it crashed in the same way.

However, I cannot make it crash using any of the stress tests I found, probably because each one focuses on a single component? Maybe Corona stresses the CPU and RAM at the same time?
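
To illustrate what loading the CPU and RAM at the same time could look like, here is a rough sketch: one worker per core fills a buffer with a known pattern, hashes it to keep the core busy, then re-reads the buffer to verify it. Everything here (names, the 64 MB buffer size, the number of rounds) is an assumption chosen for the example. It is no substitute for a dedicated memory tester, since a Python script cannot choose which physical addresses it touches.

```python
import hashlib
import multiprocessing as mp
import os
from typing import Optional, Tuple


def stress_worker(args: Tuple[int, int, int]) -> int:
    """Fill a buffer with a pattern, burn CPU, then verify; return mismatch count."""
    worker_id, buf_mb, rounds = args
    size = buf_mb * 1024 * 1024
    mismatches = 0
    for r in range(rounds):
        pattern = bytes([(worker_id + r) % 256])
        buf = bytearray(pattern * size)        # RAM: allocate and fill
        digest = hashlib.sha256(buf).digest()  # CPU + RAM: read the whole buffer
        for _ in range(50):                    # CPU: keep the core busy
            digest = hashlib.sha256(digest).digest()
        if buf.count(pattern) != size:         # RAM: re-read and verify every byte
            mismatches += 1
    return mismatches


def run_stress(workers: Optional[int] = None, buf_mb: int = 64,
               rounds: int = 10) -> int:
    """Run one worker per core in parallel and return the total mismatch count."""
    workers = workers or os.cpu_count() or 1
    jobs = [(i, buf_mb, rounds) for i in range(workers)]
    with mp.Pool(workers) as pool:
        return sum(pool.map(stress_worker, jobs))
```

On Windows, `run_stress()` has to be called from inside an `if __name__ == "__main__":` block because of how `multiprocessing` starts worker processes there. A non-zero return value means a buffer came back corrupted; on a machine that locks up the way this one does, the more likely outcome is that the script never gets to return at all.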

Does this sound like a RAM or motherboard issue to you guys? I'm leaning towards the mobo but, as I said, I don't have much experience.

2017-10-30, 16:08:58
Reply #11

vahur6

  • Active Users
  • **
  • Posts: 7
    • View Profile
Do you happen to use an MSI X399 motherboard?
I had this exact black-screen issue with hardware-intensive tasks.

It turns out MSI rushed an unstable product to market and only later updated the firmware to fix some major issues.
Once I downloaded the latest firmware the issues were gone.

If you use another board, you still might want to look into updating BIOS firmware.

2017-10-30, 16:42:19
Reply #12

lolec

  • Active Users
  • **
  • Posts: 179
    • View Profile
YES! MSI X399 Carbon.

So updating the firmware might fix this; I will look into that.

Did you also have problems with the included network card?

I'm leaning towards returning the mobo and switching to the ASUS Zenith (the only other one carried at the same store; they only refund in store credit), but I would have to pay 200 USD extra. Are you happy with the motherboard otherwise, or would you switch if you were me?

2017-10-30, 16:53:46
Reply #13

vahur6

  • Active Users
  • **
  • Posts: 7
    • View Profile
Yep, the firmware fixed all the issues I had with it.
Download it from here: https://www.msi.com/Motherboard/support/X399-GAMING-PRO-CARBON-AC

I was pulling my hair out trying to figure out what was wrong with my new build. Only after a week of troubleshooting everything did they release this firmware fix. No issues after that.

I don't have any other complaints with this mobo, it has every possible feature I could ask for.

2017-10-30, 16:58:10
Reply #14

lolec

  • Active Users
  • **
  • Posts: 179
    • View Profile
Did you have issues with the network card too? Were those fixed as well?