Author Topic: DR server needs to be restarted issue - V3 daily builds update  (Read 11487 times)

2018-10-31, 18:18:13
Reply #15

Dionysios.TS

  • Active Users
  • **
  • Posts: 777
    • View Profile
    • Evolvia Imaging
I would like to share some thoughts with you guys.

This issue is very strange. Something is really blocking the process, and I've often noticed that the freeze is not always there. Sometimes the DR Master just seems sluggish and the estimated remaining render time becomes extremely long, while the same scene opened on a local workstation and rendered with DR at the same time renders perfectly.

Sometimes I think the problem could be the Nvidia AI Denoiser, but I am not sure. We had freezes even when the denoiser was set to the Corona one, but the sluggish behaviour is almost the same as having 2 sessions of 3ds Max open locally and rendering an image with Nvidia AI on in one of them. It is sluggish as hell.

Maybe it is a .NET Framework bug; actually, it could be hundreds of things.

It would be perfect if we could replicate the issue on your network as well, but I don't know how we can do that.

I have a scene that lately always gives us problems when it is rendered via Backburner + DR and the job is sent to DR Server 04 rather than rendered locally.
It is quite heavy, as it is full of glass bricks with caustics enabled on their glass. I don't know if this could help you or not. In that particular scene Corona gives us strange remaining-time estimates when Backburner is used, while locally + DR works fine.

Let me know if we should prepare the scene for you.

Thanks,

Dionysios -

2018-11-04, 22:48:05
Reply #16

Giorgos Zacharioudakis

  • Active Users
  • **
  • Posts: 54
  • CG Artist at 500s studio
    • View Profile
    • 500s studio
Hi,

Has anyone encountered the above message? Every time I use DR it freezes and I have to close the DR server to get the master working. Then I get the error “sending masking samples to slaves time out”

Corona 3 RC4
Forming digital canvas with geometries shaped with lights, traveling the viewer’s mind through time and space.

2018-11-04, 23:14:39
Reply #17

Giorgos Zacharioudakis

  • Active Users
  • **
  • Posts: 54
  • CG Artist at 500s studio
    • View Profile
    • 500s studio
Increasing the timeout from 60 to 120 seems to work. But why is this happening in Corona 3?

2018-11-05, 13:58:42
Reply #18

maru

  • Corona Team
  • Active Users
  • ****
  • Posts: 13154
  • Marcin
    • View Profile
@Dionysios:

I'd like to sum up what we have so far. Please let me know if any of the below is not correct, or if you would like to add anything:

Main issue:
Rendering getting "stuck" when using Corona's distributed rendering with Autodesk Backburner.

Additional issues:
Sometimes the VFB becomes very slow.

Additional notes:
Quote
It seems if I use the DR servers from my local PC everything works fine.
Does this mean that if you use ONLY the Corona's distributed rendering (without Backburner), then everything is working fine, and the issue never appears?

Quote
next one started but now that I see the desktop the process seems alive but everything in the Corona UI is sooooo slow
So this means that you can see the VFB on the node computer, and that the VFB works very slowly. Right?
Is this happening when using Corona's DR + Backburner, or also when using Corona's DR only (without Backburner)?

-Sometimes the process gets un-stuck if you wait long enough - is this correct?

-When the rendering is stuck, are there always 2 or more instances of 3dsmax.exe in the task manager? Or is it sometimes stuck with just 1 3dsmax.exe in the task manager?

-Do you have all Windows Updates installed, including the newest Spectre patch?
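
For the question about how many 3dsmax.exe instances are running while the render is stuck, a small script can do the counting instead of checking the Task Manager by eye. This is only a sketch: it assumes Windows' built-in tasklist command with CSV output, and the function names are made up for illustration.

```python
import csv
import io
import subprocess


def count_processes(tasklist_csv: str, image_name: str = "3dsmax.exe") -> int:
    """Count rows of `tasklist /FO CSV /NH` output whose image name matches.

    The comparison is case-insensitive because Windows image names are.
    """
    rows = csv.reader(io.StringIO(tasklist_csv))
    wanted = image_name.lower()
    return sum(1 for row in rows if row and row[0].lower() == wanted)


def query_tasklist() -> str:
    """Windows only: return the process list as header-less CSV text."""
    result = subprocess.run(
        ["tasklist", "/FO", "CSV", "/NH"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

On a node, `count_processes(query_tasklist())` would give the number of 3dsmax.exe instances at that moment. Passing a different image name checks another process in the same way.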



@CloundN9:
Can you please contact us about this issue here https://coronarenderer.freshdesk.com/support/tickets/new and provide your full DR and Backburner logs (even if you are not using BB)? Here is how to get them: https://coronarenderer.freshdesk.com/support/solutions/articles/12000002065
Thanks.
Marcin Miodek | chaos-corona.com
3D Support Team Lead - Corona | contact us

2018-11-05, 14:46:54
Reply #19

Dionysios.TS

  • Active Users
  • **
  • Posts: 777
    • View Profile
    • Evolvia Imaging
Thanks for getting back, Maru. I'll answer your questions:

Quote
Does this mean that if you use ONLY the Corona's distributed rendering (without Backburner), then everything is working fine, and the issue never appears?
It happened only via Backburner. Locally we have never had any issues so far, so I can confirm that DR locally seems to work fine.

Quote
So this means that you can see the VFB on the node computer, and that the VFB works very slowly. Right?
Is this happening when using Corona's DR + Backburner, or also when using Corona's DR only (without Backburner)?
Only when I use DR + Backburner, it starts working very slowly and after a while the process freezes.

Quote
Sometimes the process gets un-stuck if you wait long enough - is this correct?
Yes, confirmed!

Quote
When the rendering is stuck, are there always 2 or more instances of 3dsmax.exe in the task manager? Or is it sometimes stuck with just 1 3dsmax.exe in the task manager?
Unfortunately that's random: in most cases I saw a second instance, but sometimes there was only 1.

Quote
Do you have all Windows Updates installed, including the newest Spectre patch?
I don't know this; I need to ask our IT manager. Every week we receive updates from our main server, so if it is important I can check with him right away.

Thanks!

2018-11-05, 14:58:25
Reply #20

maru

  • Corona Team
  • Active Users
  • ****
  • Posts: 13154
  • Marcin
    • View Profile
Thanks for the replies.

Quote
Do you have all Windows Updates installed, including the newest Spectre patch?
I don't know this; I need to ask our IT manager. Every week we receive updates from our main server, so if it is important I can check with him right away.
It would be great if you could check this. There are some random issues after applying the spectre/meltdown fix, usually they are related to CPU usage and general performance when rendering, but who knows...

One more question:
Are you running only one job on the network, and then all nodes are working on this single job?
Or are you submitting multiple jobs, and various nodes pick up various jobs?
When there are a few masters and a few nodes on one network, and various jobs are submitted, then some issues may appear.
« Last Edit: 2018-11-05, 15:03:49 by maru »

2018-11-05, 15:45:11
Reply #21

Dionysios.TS

  • Active Users
  • **
  • Posts: 777
    • View Profile
    • Evolvia Imaging
Quote
Thanks for the replies.

Quote
Do you have all Windows Updates installed, including the newest Spectre patch?
I don't know this; I need to ask our IT manager. Every week we receive updates from our main server, so if it is important I can check with him right away.
It would be great if you could check this. There are some random issues after applying the spectre/meltdown fix, usually they are related to CPU usage and general performance when rendering, but who knows...

One more question:
Are you running only one job on the network, and then all nodes are working on this single job?
Or are you submitting multiple jobs, and various nodes pick up various jobs?
When there are a few masters and a few nodes on one network, and various jobs are submitted, then some issues may appear.

I am 100% sure we didn't apply any microcode updates, so on the HW side things are the same as 1 year ago. As for the Windows OS side, we receive updates every 2 or 3 weeks, so I don't know yet whether that update is already installed. I checked the update history but can't see anything for it.
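
Rather than scrolling through the update history, the installed update IDs can be compared against a list. A minimal sketch, assuming the wmic command-line tool is available on the machine; the KB numbers that carry the Spectre/Meltdown mitigations depend on the Windows version, so they are left as a parameter, and the function names are hypothetical.

```python
import re
import subprocess


def parse_hotfix_ids(qfe_output: str) -> set:
    """Extract KB identifiers from `wmic qfe get HotFixID` style output."""
    found = re.findall(r"\bKB\d+\b", qfe_output, flags=re.IGNORECASE)
    return {kb.upper() for kb in found}


def missing_hotfixes(qfe_output: str, wanted) -> list:
    """Return the wanted KB IDs that do not appear in the output, sorted."""
    installed = parse_hotfix_ids(qfe_output)
    return sorted(kb.upper() for kb in wanted if kb.upper() not in installed)


def query_installed_hotfixes() -> str:
    """Windows only: list the IDs of installed updates as plain text."""
    result = subprocess.run(
        ["wmic", "qfe", "get", "HotFixID"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `missing_hotfixes(query_installed_hotfixes(), [...])` with the KB numbers named by the IT manager returns the ones that are not installed; an empty list means all of them are present.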

As for the network, I run 1 job on the network and all the nodes work on it. There is only 1 network Master.
When we have issues we use, as explained before, our local workstations + DR.

2018-11-07, 11:07:09
Reply #22

Dionysios.TS

  • Active Users
  • **
  • Posts: 777
    • View Profile
    • Evolvia Imaging
I have a clue for you guys!!!

Just now we got a freeze again.
The job was sent a while ago via Backburner to our DR Master.

The job started successfully; after a while the Corona UI on the DR Master froze and the calculation became very, very slow.
I checked all the DR servers (in this case one of them is one of our workstations) and found that it was failing to send the EXR dump file.
The CPUs were at 100% load, but the Corona process was frozen there as well.

I forced the DR Server application to shut down and the job in Backburner went back to normal!

At that moment, my assistant had opened a 3ds Max session to work on a simple scene while her PC was taking part in the DR process. Could that be the reason?
That is something random, by the way. I found the same problem with the regular DR Servers being frozen because they can't send the EXR dump file, and they block the whole process. And guess what? Another 3ds Max process was running.

Here is the error I saw in the DR log:

2018-11-07 09:58:11   Finished EXR dump after 2 s
2018-11-07 09:58:11   Received sampling focus mask (region 0 2378 6100 2460)
2018-11-07 09:59:16   Started EXR dump (206 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump8989.exr)
2018-11-07 09:59:19   Finished EXR dump after 2 s
2018-11-07 09:59:19   Received sampling focus mask (region 0 2460 6100 2542)
2018-11-07 10:00:24   Started EXR dump (206 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump8990.exr)
2018-11-07 10:01:30   Started EXR dump (205 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump8991.exr)
2018-11-07 10:02:38   Started EXR dump (205 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump8992.exr)
2018-11-07 10:03:44   Started EXR dump (206 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump8993.exr)
2018-11-07 10:04:51   Started EXR dump (205 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump8994.exr)
2018-11-07 10:05:58   Started EXR dump (201 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump8995.exr)
2018-11-07 10:07:04   Started EXR dump (186 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump8996.exr)
2018-11-07 10:08:10   Started EXR dump (185 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump8997.exr)
2018-11-07 10:09:17   Started EXR dump (184 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump8998.exr)
2018-11-07 10:10:24   Started EXR dump (184 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump8999.exr)
2018-11-07 10:10:24   Sending file to remote side failed
2018-11-07 10:11:30   Started EXR dump (183 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump9000.exr)
2018-11-07 10:12:35   Started EXR dump (182 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump9001.exr)
2018-11-07 10:13:41   Started EXR dump (181 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump9002.exr)
2018-11-07 10:14:47   Started EXR dump (181 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump9003.exr)
2018-11-07 10:15:54   Started EXR dump (180 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump9004.exr)
2018-11-07 10:17:00   Started EXR dump (178 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump9005.exr)
2018-11-07 10:18:05   Started EXR dump (173 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump9006.exr)
2018-11-07 10:19:11   Started EXR dump (128 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump9007.exr)
2018-11-07 10:20:15   Started EXR dump (114 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump9008.exr)
2018-11-07 10:20:25   Sending file to remote side failed
2018-11-07 10:21:20   Started EXR dump (114 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump9009.exr)
2018-11-07 10:22:24   Started EXR dump (114 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump9010.exr)
2018-11-07 10:23:29   Started EXR dump (110 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump9011.exr)
2018-11-07 10:24:33   Started EXR dump (106 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump9012.exr)
2018-11-07 10:25:38   Started EXR dump (105 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump9013.exr)
2018-11-07 10:26:42   Started EXR dump (104 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump9014.exr)
2018-11-07 10:27:46   Started EXR dump (81 455 470 bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump9015.exr)
2018-11-07 10:28:53   Started EXR dump (207 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump9016.exr)
2018-11-07 10:30:00   Started EXR dump (207 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump9017.exr)
2018-11-07 10:30:25   Sending file to remote side failed
2018-11-07 10:31:07   Started EXR dump (207 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump9018.exr)
2018-11-07 10:32:13   Started EXR dump (207 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump9019.exr)
2018-11-07 10:33:21   Started EXR dump (207 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump9020.exr)
2018-11-07 10:34:27   Started EXR dump (207 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump9021.exr)
2018-11-07 10:35:34   Started EXR dump (208 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump9022.exr)
2018-11-07 10:36:40   Started EXR dump (208 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump9023.exr)
2018-11-07 10:37:57   Started EXR dump (207 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump9024.exr)
2018-11-07 10:39:04   Started EXR dump (207 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump9025.exr)
2018-11-07 10:40:11   Started EXR dump (207 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump9026.exr)
2018-11-07 10:40:25   Sending file to remote side failed
2018-11-07 10:41:18   Started EXR dump (207 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump9027.exr)
2018-11-07 10:42:25   Started EXR dump (207 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump9028.exr)
2018-11-07 10:43:32   Started EXR dump (206 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump9029.exr)
2018-11-07 10:44:39   Started EXR dump (206 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump9030.exr)
2018-11-07 10:45:45   Started EXR dump (205 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump9031.exr)
2018-11-07 10:46:52   Started EXR dump (206 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump9032.exr)
2018-11-07 10:47:59   Started EXR dump (206 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump9033.exr)
2018-11-07 10:49:06   Started EXR dump (206 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump9034.exr)
2018-11-07 10:50:15   Started EXR dump (206 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump9035.exr)
2018-11-07 10:50:26   Sending file to remote side failed
2018-11-07 10:51:22   Started EXR dump (207 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump9036.exr)
2018-11-07 10:52:28   Started EXR dump (207 M bytes, filename: C:/Users/ABAGATELLA/AppData/Local/CoronaRenderer/DrData/dump9037.exr)

As you can see, until 09:58:08 everything was fine.
After that, something happens and it reports "Sending file to remote side failed".
So why do we get all those "Started EXR dump" messages from 10:00 onwards, with no matching "Finished EXR dump"? And why do we then always get the sending failed error?
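
A log like the one above can be scanned automatically for dumps that start but never finish. The following is a minimal sketch, assuming the line layout shown in the excerpt (timestamp, spaces, message); the function name and return shape are made up for illustration and are not part of Corona.

```python
import re
from datetime import datetime

# "2018-11-07 10:00:24   Started EXR dump (..., filename: C:/.../dump8990.exr)"
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(.*)$")
FILE_RE = re.compile(r"filename: (.+?)\)\s*$")


def summarize_dr_log(text):
    """Return (unfinished_dumps, send_failures) for a DR server log excerpt.

    unfinished_dumps: list of (timestamp, filename) for every "Started EXR
    dump" that is not followed by "Finished EXR dump" before the next start.
    send_failures: timestamps of "Sending file to remote side failed".
    """
    unfinished = []
    send_failures = []
    pending = None  # the dump that has started but not finished yet
    for raw in text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # skip blank or unrecognized lines
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        message = match.group(2)
        if message.startswith("Started EXR dump"):
            if pending is not None:
                unfinished.append(pending)
            name = FILE_RE.search(message)
            pending = (stamp, name.group(1) if name else "?")
        elif message.startswith("Finished EXR dump"):
            pending = None
        elif message.startswith("Sending file to remote side failed"):
            send_failures.append(stamp)
    if pending is not None:
        unfinished.append(pending)
    return unfinished, send_failures
```

Run on the excerpt above, this would list every dump from dump8990.exr onwards as unfinished. The failure timestamps in the excerpt (10:10:24, 10:20:25, 10:30:25, 10:40:25, 10:50:26) are about ten minutes apart, which may point to a fixed retry or timeout interval, although the log alone does not confirm that.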

Hope all this can help!

Thanks,

Dionysios -

2018-11-07, 14:57:41
Reply #23

Dionysios.TS

  • Active Users
  • **
  • Posts: 777
    • View Profile
    • Evolvia Imaging
It happened again just now, this time with 2 PCs (DR Servers), but no second 3ds Max instance had been opened on purpose or was found in the task manager...

Dionysios -

2018-11-07, 16:04:59
Reply #24

Dionysios.TS

  • Active Users
  • **
  • Posts: 777
    • View Profile
    • Evolvia Imaging
Another weird thing today!

The DR Master appears in the DR Servers list without having the DR Server service open!
See attached file!

2018-11-08, 16:26:49
Reply #25

maru

  • Corona Team
  • Active Users
  • ****
  • Posts: 13154
  • Marcin
    • View Profile
What's "workstation" and "DR master" - shouldn't it be the same thing?
Maybe there is some IP conflict in your network, and two computers are getting the same IP?
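
One quick way to look for such a clash is to resolve every machine name and see whether two names end up on the same address. This is only a sketch under assumptions: the host names would have to be supplied by the user, the function names are hypothetical, and it only catches clashes visible through name resolution, not two network cards claiming one IP at the ARP level.

```python
import socket
from collections import defaultdict


def find_shared_addresses(host_to_ip: dict) -> dict:
    """Return {ip: [hosts]} for every IP that more than one host maps to."""
    by_ip = defaultdict(list)
    for host, ip in host_to_ip.items():
        by_ip[ip].append(host)
    return {ip: sorted(hosts) for ip, hosts in by_ip.items() if len(hosts) > 1}


def resolve_hosts(hostnames) -> dict:
    """Resolve each name to an IPv4 address; names that fail are skipped."""
    resolved = {}
    for name in hostnames:
        try:
            resolved[name] = socket.gethostbyname(name)
        except OSError:
            continue  # unknown or unreachable name
    return resolved
```

Passing the names of the master, the workstations and the DR servers to `resolve_hosts` and the result to `find_shared_addresses` gives an empty dict when every machine has its own address.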

2018-11-09, 14:36:08
Reply #26

maru

  • Corona Team
  • Active Users
  • ****
  • Posts: 13154
  • Marcin
    • View Profile
We have just released RC5, which fixes yet another issue (Fixed DR server sometimes getting stuck when restarting slave 3ds Max). If anyone with the issue where DR server needs to be restarted to trigger rendering is reading this, please test the newest RC and report to us whether there is an improvement: https://coronarenderer.freshdesk.com/support/solutions/articles/5000570015

2018-11-13, 10:21:25
Reply #27

Dionysios.TS

  • Active Users
  • **
  • Posts: 777
    • View Profile
    • Evolvia Imaging
Quote
What's "workstation" and "DR master" - shouldn't it be the same thing?
Maybe there is some IP conflict in your network, and two computers are getting the same IP?

I thought the same thing, but I don't see any conflicts here.
Anyway, I'll install RC5 now and let you know.

Thanks!

Dionysios

2018-11-13, 15:34:11
Reply #28

Dionysios.TS

  • Active Users
  • **
  • Posts: 777
    • View Profile
    • Evolvia Imaging
I installed RC5 and unfortunately we just got the freeze problem again!
It is very difficult to work under such circumstances and I really don't know how to help more.

My workstation was blocking the process this time, but in the DR log I didn't see any of the EXR sending errors that kept happening last week. When I closed the DR Server process on my workstation, everything started to work perfectly again.

Are we sure the .NET Framework isn't blocking the process on the system??? Or the Nvidia Denoiser? Or who knows what else...

Actually, we had some problems with the .NET Framework in the past.

For now the only thing I can say is that working like this is almost impossible.
« Last Edit: 2018-11-13, 16:07:14 by Dionysios.TS »

2018-11-13, 16:02:07
Reply #29

maru

  • Corona Team
  • Active Users
  • ****
  • Posts: 13154
  • Marcin
    • View Profile
Quote
When I close the DR Server process on my workstation everything started to work again perfectly.
Can you please explain this sentence? Your DR setup has been a bit confusing to me all this time.
By "workstation" we usually mean a computer where a person is sitting and doing some stuff in 3ds Max. It is also the computer where you click "render" inside 3ds Max and then it either renders with the help of nodes, or sends the job to BB.
By "nodes" we usually mean computers where 3ds Max is running in command line mode, without its UI exposed, and no one is using those computers for working with 3ds Max (they may not even have monitors).

Is it the same for you, or are you using your computers in some different way?

Also, is there ever a situation for you where:
-You are using more than 1 instance of 3ds Max on the same PC?
-You are running Backburner Manager/Server/Monitor and Corona's DR server on one PC?
-You are running 3ds Max and Corona's DR server on one PC?