Author Topic: DR server needs to be restarted issue - V3 daily builds update  (Read 9523 times)

2018-10-24, 11:37:14

maru

  • Corona Team
  • Active Users
  • ****
  • Posts: 12758
  • Marcin
    • View Profile
We have received a few reports from users whose DR server gets stuck and has to be restarted to continue rendering. Unfortunately, we have never been able to reproduce this problem, and our investigation of your reports and minidumps has not yet led us to the root cause of this bug. This makes us suspect that the issue might be related to a specific hardware configuration, network setup, or 3rd party plugins and applications (however, we are not saying it's not our fault!).

We have identified some other bugs which we believe may be related to this problem, and we hope that fixing them will fix this one as well. Some of the fixes have already been released in recent builds, and some will be released in upcoming ones.

We have also added a "Restart 3ds Max after each render" option to the DR server based on your requests. We hope this will serve as a workaround for this problem until we can identify the real cause and fix it properly.

Please let us know if you are still experiencing this "DR server needs to be restarted" issue with the "Restart 3ds Max" checkbox enabled and disabled as this will greatly help us improve distributed rendering further.

*Update: note that the new version of the DR server application is installed into C:\Program Files\Corona\DR Server\DrServer.exe while the old one was installed into C:\Program Files\Corona\DrServer.exe. Make sure you are launching the correct version of the DR server application. It must have "DrServer | 3 (Release Candidate X)" text printed in its title bar.
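
For anyone unsure which copy is present on a render node, the check can be scripted. The following is only a minimal sketch in Python, not anything shipped with Corona: the two paths are the ones quoted above, while the helper name `find_dr_servers` and its injectable `exists` parameter are hypothetical and exist purely for illustration.

```python
import os
from pathlib import PureWindowsPath

# Install locations quoted in this post.
NEW_PATH = PureWindowsPath(r"C:\Program Files\Corona\DR Server\DrServer.exe")
OLD_PATH = PureWindowsPath(r"C:\Program Files\Corona\DrServer.exe")


def find_dr_servers(exists=os.path.exists):
    """Report which DrServer.exe copies are present (hypothetical helper).

    `exists` is injectable so the logic can be tried without a real install.
    """
    found = {
        "new": exists(str(NEW_PATH)),
        "old": exists(str(OLD_PATH)),
    }
    if found["new"] and found["old"]:
        advice = "Both copies found: launch the one in the 'DR Server' folder."
    elif found["new"]:
        advice = "Only the new copy found."
    elif found["old"]:
        advice = "Only the old copy found: the new DR server is not installed."
    else:
        advice = "No DrServer.exe found in either location."
    return found, advice
```

Calling `find_dr_servers()` on a render node returns the presence flags and a short piece of advice. The title bar text still has to be checked by eye.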

The newest daily build can be downloaded from https://coronarenderer.freshdesk.com/support/solutions/articles/5000570015
Feel free to share your feedback in this forum thread or through https://coronarenderer.freshdesk.com/support/tickets/new

Thank you in advance for testing and for your patience, and sorry for this inconvenience.

Update: 09.11.2018 - V3 RC5 released with yet another DR-related fix. Please try it.

« Last Edit: 2018-11-09, 14:37:26 by maru »
Marcin Miodek | chaos-corona.com
3D Support Team Lead - Corona | contact us

2018-10-25, 16:32:37
Reply #1

Dionysios.TS

  • Active Users
  • **
  • Posts: 766
    • View Profile
    • Evolvia Imaging
Unfortunately the master PC was stuck again this morning. I hoped the problem had disappeared, but I was wrong.
I can't figure out why it is happening.

It seems that if I use the DR servers from my local PC, everything works fine.
Using Backburner and sending the job to another server + DR randomly blocks the main server, while the DR ones continue to calculate I don't even know what?!?!?
Their task manager shows the CPU at 100% while the main server is blocked.

Dionysios -

2018-10-25, 16:35:28
Reply #2

maru

Quote
Unfortunately the master PC was stuck again this morning. I hoped the problem had disappeared, but I was wrong.

Did you try with the "restart 3ds Max" option on and off? Was it the same in both cases?

2018-10-25, 16:37:27
Reply #3

Dionysios.TS

Quote
Unfortunately the master PC was stuck again this morning. I hoped the problem had disappeared, but I was wrong.
Did you try with the "restart 3ds Max" option on and off? Was it the same in both cases?

We have just activated it. We have a very long list of renders to do, so we will know whether it works by tomorrow morning.
I'll let you know for sure! :)

Dionysios -

2018-10-26, 10:19:15
Reply #4

Dionysios.TS

No good news: the system got blocked twice last night, again... That's so sad...

Last night at 11pm I had to log in to the Master PC from home via TeamViewer, and the desktop was frozen. Fortunately I could start the task manager and kill the 3ds Max process.
Within a second, everything returned to normal. Backburner sent the same job again, it started successfully, and the job finished during the night.
After that, another job was completed 3 hours later and the next one started, but now that I can see the desktop, the process seems alive yet everything in the Corona UI is sooooo slow.
This is what happens before the process gets blocked! If I click the Post, Stats, History, DR, or LightMix tabs, all of them respond very slowly.

This is all I can give you for now, and I don't know how you guys are going to resolve this issue. It is getting really annoying...
We need to find a solution ASAP. Let me know if you need any extra data from us.

Thanks,

Dionysios -

2018-10-26, 14:19:42
Reply #5

maru

Thanks for testing, Dionysios, and sorry to hear about your results. We will definitely do our best to fix this.

2018-10-26, 14:24:16
Reply #6

Dionysios.TS

Quote
Thanks for testing, Dionysios, and sorry to hear about your results. We will definitely do our best to fix this.

I know it's not your fault, guys. I wish I could actually help...
I have an update: this morning I wrote that the system was frozen, and it was true, but I left it alone for a while and in the end it finished the render job and is now carrying on with the rest. So no crash for now.
We have 8 more jobs to be done in Backburner, so I'll let you know.

We have the "Restart 3ds Max" option ON now.

Thanks,

Dionysios -

2018-10-26, 14:25:57
Reply #7

maru

Quote
the system was frozen, and it was true, but I left it alone for a while and in the end it finished the render job and is now carrying on with the rest

Does it mean that you left the computer frozen, and then it unfroze and continued rendering?? Or maybe I misunderstood your message?

2018-10-26, 14:30:59
Reply #8

Dionysios.TS

Quote
the system was frozen, and it was true, but I left it alone for a while and in the end it finished the render job and is now carrying on with the rest
Does it mean that you left the computer frozen, and then it unfroze and continued rendering?? Or maybe I misunderstood your message?

Yes, I confirm!

It was frozen, I left it alone, and it carried on!

2018-10-26, 14:34:34
Reply #9

maru

Quote
It was frozen, I left it alone, and it carried on!

Woah, I don't think we've ever had a similar report. Not sure if it's a good thing, as it might potentially make the issue even more confusing. :/

2018-10-26, 14:39:02
Reply #10

Dionysios.TS

Quote
It was frozen, I left it alone, and it carried on!
Woah, I don't think we've ever had a similar report. Not sure if it's a good thing, as it might potentially make the issue even more confusing. :/

I know... :(
But what could it be that freezes the 3ds Max / Corona process so badly? And, by the way, it never happens at the start of the rendering process but ALWAYS near the end.

2018-10-30, 11:44:30
Reply #11

Dionysios.TS

I have some updates, but I am not sure whether they will help. Better to share them with you in any case.

Let's start:

- Yesterday I sent 6 jobs to Backburner.
- At 10pm I checked the DR Master via TeamViewer to see how the jobs were doing.
- One of the jobs was frozen in this state: the Corona VFB was open but frozen, and the noise threshold had been reached, so the job was 100% complete, but Corona was just sitting there, waiting and doing nothing.
- I decided to check what the other DR servers were doing at this point.
- DR Server 03 was in standby mode! Not working at all, which is normal if you consider that the rendering process had reached 100%.
- I then connected to my personal workstation, which I also use as a DR server at night, and guess what??? The DR server was still in rendering mode!!! WTF! :)
- I checked the logs and found that at that specific moment it was calculating passes and couldn't send the file to the DR Master!!!
- At this point I closed the DR server on my machine, and you know what? The DR Master finally completed the job instantly and saved the file...
- Last thing: I opened the task manager on my PC and found another 3ds Max process still running, even though I had closed the DR server a while before. It seems that a second 3ds Max process was running and maybe caused problems for the DR process at the end? I don't know, guys.
- I started the DR server on my machine again, and so far all the remaining Backburner jobs are running without problems.
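
The leftover-process check described in the last steps can also be scripted instead of done by hand in the task manager. The following is only a rough sketch in Python and not part of Corona or Backburner. It assumes the 3ds Max process image is named `3dsmax.exe` and relies on the built-in Windows `tasklist` command; the function names `parse_tasklist_csv` and `find_leftover_max` are made up for this example.

```python
import csv
import io
import subprocess


def parse_tasklist_csv(output):
    """Parse `tasklist /FO CSV /NH` output into (image_name, pid) pairs."""
    processes = []
    for row in csv.reader(io.StringIO(output)):
        # Real rows have several fields; the "INFO: No tasks ..." line has one.
        if len(row) >= 2 and row[1].isdigit():
            processes.append((row[0], int(row[1])))
    return processes


def find_leftover_max(image_name="3dsmax.exe"):
    """List running processes with the given image name (Windows only).

    The default image name is an assumption; adjust it if yours differs.
    """
    result = subprocess.run(
        ["tasklist", "/FI", f"IMAGENAME eq {image_name}", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    )
    return parse_tasklist_csv(result.stdout)
```

Running `find_leftover_max()` on a node after its DR server has been closed should return an empty list. Any entries left over are candidates for the stray 3ds Max process described above.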

Excuse me for the long message, but I am trying to help and be as detailed as I can.

Thanks,

Dionysios -

2018-10-30, 19:02:31
Reply #12

maru

Thanks for your message. It may be crucial that there is a 2nd instance of 3ds Max running. We have recently identified a similar issue.
We will investigate this - stay tuned for updates.

2018-10-30, 21:54:13
Reply #13

Dung (Ivan)

  • Corona Team
  • Active Users
  • ****
  • Posts: 38
    • View Profile
Hi, are you using Corona DR? Or Backburner servers? I want to set up the same scenario, and so far I have only been able to set it up with Backburner servers when using Backburner Monitor.

Thank you for your patience :)

2018-10-31, 10:33:44
Reply #14

Dionysios.TS

Quote
Hi, are you using Corona DR? Or Backburner servers? I want to set up the same scenario, and so far I have only been able to set it up with Backburner servers when using Backburner Monitor.

Thank you for your patience :)

Hi, I am glad to help!

I also have some updates, but first I'll answer your question.
This is our render farm setup:

Workstation 1 (used as a DR server at night; during the day it uses the DR servers below, if available, for rendering tests)
Workstation 2 (used as a DR server at night; during the day it uses the DR servers below, if available, for rendering tests)
DR Server 04  (used as the Backburner Manager and as the DR Master machine during the Backburner process)
DR Server 03  (DR server only)

So basically, DR Server 04 receives the jobs from Backburner and renders them using all the computers above in DR mode.

Here is today's update:

The freeze happened again last night, and I saw it remotely from my phone this morning.
Again, DR Server 04 was frozen, but this time it wasn't my workstation that was stuck on the saving process but DR Server 03.
"Magically", turning off the DR Server process on DR Server 03 resolved the problem for the DR process as a whole: the image was saved instantly and Backburner moved on to the next job.

I took a screenshot of the DR server's task manager when I closed the DR Server process, and I can see a Corona process still running!!! What is this? (See image below)
Sorry for the size of the screenshot, but it was taken with my phone.

So in general, during the Backburner process, one of the DR servers (the workstations or DR Server 03) has difficulty saving the passes and keeps going indefinitely, even though the passes on the master machine are 100% done. And the whole DR process freezes.

This is all I can report for now.

Thanks,

Dionysios -

« Last Edit: 2018-11-05, 13:47:46 by maru »