Author Topic: DR render problem.  (Read 8118 times)

2018-06-25, 14:27:01

d30evoco

  • Users
  • *
  • Posts: 3
    • View Profile
Hey guys, how's it going?
We are in the middle of a new project, and we recently upgraded our PCs just for it. The specs are Threadripper 1950X CPUs on MSI X399 motherboards, each with 64 GB RAM (Win 10, Max 2018, Corona 1.7.4). Everything looked dandy until we got to the DR part. Here's the problem: on average, one 4K render takes about 1 hour on one computer to reach a 5% noise level. When we connected 2 additional PCs, we were hoping to reach the same noise level at least 2 times faster, but that didn't happen. It took exactly the same amount of time. I wonder if anyone else has encountered this problem, and if so, how did you deal with it? As for the things we tried: the LAN is fine (100 Mb); we tried different Corona builds, even Corona 2.0, but it didn't really help; and we didn't set a limit by number of passes or by time, only by noise level. So where did we go wrong? Your help is appreciated, cheers.

 

2018-06-25, 15:57:30
Reply #1

TomG

  • Administrator
  • Active Users
  • *****
  • Posts: 5468
    • View Profile
Just some quick initial questions - what did DR report in the VFB? Did it show the machines contributing passes? Do you have the DR logs to show?
Tom Grimes | chaos-corona.com
Product Manager | contact us

2018-06-25, 17:50:41
Reply #2

d30evoco

  • Users
  • *
  • Posts: 3
    • View Profile
Hey, thanks for the reply. Yes, the DR slaves did contribute passes (they were connected and, as far as I remember, there were no errors), albeit really slowly. Unfortunately we didn't save the logs, but it would be easy to replicate the problem and save them.

2018-06-25, 17:58:01
Reply #3

TomG

  • Administrator
  • Active Users
  • *****
  • Posts: 5468
    • View Profile
TY for the info! I'll have to let the developers say what would be the most useful info for diagnosing from this point on, could be DR logs, maybe even the scene etc. not sure what would be best for them.

2018-06-26, 11:55:48
Reply #4

urbanite

  • Active Users
  • **
  • Posts: 13
    • View Profile
Hi

I have just installed Corona 2 and I am having the same or a similar problem. On my workstation, a test scene gets 15 passes in 6 minutes. Adding 2 nodes (an i7 and an AMD TR) changed nothing in terms of speed, although I can see in Task Manager that they are at 100% load. I cannot see the image preview on the Corona DR slaves. Running Max 2014.

2018-06-27, 09:54:18
Reply #5

d30evoco

  • Users
  • *
  • Posts: 3
    • View Profile
Any luck with that problem?

2018-06-27, 10:09:35
Reply #6

Frood

  • Active Users
  • **
  • Posts: 1922
    • View Profile
    • Rakete GmbH
Can someone please at least post a screenshot of the "Stats" and "DR" tabs of the Corona VFB while rendering? Without any further information this is hard to judge. And just to be sure: you are not using some additional time limit?


Good Luck


Never underestimate the power of a well placed level one spell.

2018-06-27, 13:36:20
Reply #7

urbanite

  • Active Users
  • **
  • Posts: 13
    • View Profile
No additional time limits. Further tests show that only old scenes (Corona 1.6 in my case) were having trouble with DR. Merging them into a new scene solved it.

2018-07-23, 20:37:58
Reply #8

Mario Rothenbühler

  • Active Users
  • **
  • Posts: 20
    • View Profile
    • ROBO STUDIO GmbH
Same problem here.
Rendering with 3 or 4 machines is not much faster than with 1 machine.
All machines contribute passes and CPU load is at 100%.
ROBO STUDIO GmbH
Ringstrasse 40  | 4900 Langenthal
mr@robostudio.swiss  | Mobile: +41 794 808 193
www.robostudio.swiss

2018-07-24, 13:21:19
Reply #9

maru

  • Corona Team
  • Active Users
  • ****
  • Posts: 12768
  • Marcin
    • View Profile
If anyone is having similar problems, please report them to support@corona-renderer.com or through https://coronarenderer.freshdesk.com/support/tickets/new
That's the best way to have your issue looked into and resolved.
Marcin Miodek | chaos-corona.com
3D Support Team Lead - Corona | contact us

2018-07-25, 14:07:08
Reply #10

Mario Rothenbühler

  • Active Users
  • **
  • Posts: 20
    • View Profile
    • ROBO STUDIO GmbH
Dear Team

I did some real tests here with 4 machines... I am also seeing weird render times with network rendering.
I think we have to solve this weird issue first... and after that we can look at the DR render times.

The specs of the machines are as follows (Corona Benchmark 1.3):

T7500-1 (local, gbit):
Corona 1.3 Benchmark Finished
BTR Scene 16 passes
Intel(R) Xeon(R) CPU X5670 @ 2.93GHz
 Real CPU Frequency [GHz]: 3
Render Time: 0:02:28, Rays/sec: 3 271 340

T7500-2 (local, gbit):
Corona 1.3 Benchmark Finished
BTR Scene 16 passes
Intel(R) Xeon(R) CPU X5680 @ 3.33GHz
 Real CPU Frequency [GHz]: Undetected
Render Time: 0:02:16, Rays/sec: 3'551'220

T7500-3 (local, gbit):
Corona 1.3 Benchmark Finished
BTR Scene 16 passes
Intel(R) Xeon(R) CPU X5680 @ 3.33GHz
 Real CPU Frequency [GHz]: Undetected
Render Time: 0:02:09, Rays/sec: 3'746'800

Render-02 (offsite, VPN, 100Mbit):
Corona 1.3 Benchmark Finished
BTR Scene 16 passes
Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70GHz (x2)
 Real CPU Frequency [GHz]: 2.7
Render Time: 0:00:32, Rays/sec: 14,957,200

According to the Corona Benchmark, the Azure RS-2 should be about 4x as fast as the other machines.
But in a real render it is only 2x as fast. This makes no sense... By comparison, in V-Ray the scaling of CPU render power is linear: more cores and more clock speed mean shorter render times, with maybe a 2-5% loss to network communication, but not more.
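The expectation of roughly 4x can be checked against the rays/sec figures posted above. The sketch below only divides those numbers; it assumes benchmark rays/sec is a fair proxy for render speed in a real scene, which may not hold. The function names are made up for this illustration.

```python
# Rays/sec figures copied from the Corona 1.3 Benchmark results in this post.
BENCHMARK_RAYS_PER_SEC = {
    "T7500-1": 3_271_340,
    "T7500-2": 3_551_220,
    "T7500-3": 3_746_800,
    "Render-02": 14_957_200,
}


def relative_speed(results, baseline):
    """Return each machine's rays/sec divided by the baseline machine's."""
    base = results[baseline]
    return {name: rays / base for name, rays in results.items()}


def ideal_combined_speedup(results, baseline):
    """Best-case speedup of all machines together over the baseline alone.

    Assumes perfectly linear scaling with zero network overhead, so this is
    an upper bound, not a prediction.
    """
    return sum(results.values()) / results[baseline]


if __name__ == "__main__":
    for name, ratio in relative_speed(BENCHMARK_RAYS_PER_SEC, "T7500-1").items():
        print(f"{name}: {ratio:.2f}x the speed of T7500-1")
    bound = ideal_combined_speedup(BENCHMARK_RAYS_PER_SEC, "T7500-1")
    print(f"All four machines, ideal case: {bound:.2f}x")
```

Comparing these ratios with the measured render times shows how far the real scene falls short of the benchmark scaling.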

Attached are a screenshot of the render running on all 4 machines
and a screenshot of the Backburner render times.

I have also attached the scene file... It is very basic, with not much in it.

Thank you in advance for your help.

/.mario'



2018-08-08, 11:41:22
Reply #11

maru

  • Corona Team
  • Active Users
  • ****
  • Posts: 12768
  • Marcin
    • View Profile
I am sorry, but I do not fully understand your report. For example, I do not see the Corona 1.3 Benchmark results of the "AZURE-RS-2" computer. What exactly is that computer, and what is its hardware?
I think it would be best if you could contact us using this form https://coronarenderer.freshdesk.com/support/tickets/new and then we would ask you some further questions to find out how exactly we could help. One issue per one support ticket please, so if you have two separate issues (one with render performance, and another one with DR), then please send them separately.

Thank you in advance,
Marcin

2018-08-08, 11:52:49
Reply #12

Mario Rothenbühler

  • Active Users
  • **
  • Posts: 20
    • View Profile
    • ROBO STUDIO GmbH
Hi Marcin, thank you for your feedback.

The specs and benchmark results are the ones listed under "Render-02".

Thank you

/.mario'

2018-08-13, 18:47:44
Reply #13

nicolasZ

  • Active Users
  • **
  • Posts: 16
    • View Profile
Hi,

I think I have the same problem as you.

When I render with the node with noise level limit enabled, it takes the same amount of time it would with only one machine.

I have tried rendering with only 1 thread on my master (# of threads set to 1 in System Settings) and with one slave machine: the slave only adds data on a small stripe, and that doesn't affect the noise level limit.

I don't know why, but I'm pretty sure it has been doing this since the last Windows update.

Before this Windows update everything worked fine with Corona DR (with 1.7.4 and V2). FYI: I'm working with Corona 2 and I have tried with 3ds Max 2016 and 2018.


@ TomG and maru : I have already reported this problem (ticket 12673).

I hope you guys will find a solution to that problem, because we have big projects that will begin soon.

Many thanks

2018-08-15, 09:47:02
Reply #14

dj_buckley

  • Active Users
  • **
  • Posts: 875
    • View Profile
I'm seeing similar results. If I start a render on the host and leave it for 20 or so passes, I get a decent time estimate. If I then fire up the node, the time estimate doesn't change at all once the node has kicked in and rendered a few passes. I'd expect it to nearly halve (if not more, as the node is far quicker than the host).

2018-08-16, 13:45:21
Reply #15

nicolasZ

  • Active Users
  • **
  • Posts: 16
    • View Profile
@dj_buckley: Could you please run the same test as I did and share your result?

Master with # of threads set to 1 in System Settings, plus 1 node.

Thanks

2018-08-17, 04:58:46
Reply #16

thehay95

  • Users
  • *
  • Posts: 1
    • View Profile
I have encountered the same problem

2018-08-21, 18:05:11
Reply #17

LorenzoS

  • Active Users
  • **
  • Posts: 291
    • View Profile
Hi all,
It seems I have a similar problem with DR.
Workstation: Ryzen Threadripper 1950X on Win 10, 64-bit.
3 nodes: i7 2700K on Win 7, 64-bit.

The contribution of the 3 nodes is lower than it should be, and sometimes I also notice long parsing times.

2018-08-29, 16:39:15
Reply #18

nicolasZ

  • Active Users
  • **
  • Posts: 16
    • View Profile

2018-08-29, 20:30:50
Reply #19

cgifarm

  • Active Users
  • **
  • Posts: 55
  • Your Brand New RenderFarm
    • View Profile
    • CGIFarm
Hey guys,

I see your render times are very short, so here is my experience with distributed rendering on our render farm:

We use distributed rendering on frames which take longer than 30 minutes to render. I say "render" specifically because it might take up to 12 - 20 minutes just to load the scene into memory, and that doesn't count as rendering.

If you are working with a scene which has lots of assets, all those assets must be transported over the network and loaded into memory (the network can be a bottleneck as well, as you mentioned you are using only 100 Mb switches).

My recommendation is to use distributed rendering for scenes which take longer to render; otherwise, for 6-10 minute renders, I recommend submitting 1 frame per machine.

When using a DR server you should not expect rendering to be twice as fast. That's because the master machine sends "pixel" coordinates for each node to render, then receives the data and adds that information to the main file. This happens over the network for each job sent to the nodes. While this is very fast on a local PC, with the CPU communicating with RAM, over the network it is a different story.
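The point about load time can be put into rough numbers. The sketch below is a deliberately simple model, not how Corona schedules work: it assumes every machine spends the same fixed time loading the scene (in parallel), that only the rendering part is shared, and that there is no network overhead at all. The function names and the relative-speed parameter are invented for illustration.

```python
def single_machine_minutes(load_min, render_min):
    """Total wall-clock time on one machine: load the scene, then render."""
    return load_min + render_min


def dr_minutes(load_min, render_min, node_speeds):
    """Idealised wall-clock time with DR.

    node_speeds lists the speed of each extra node relative to the master
    (master = 1.0). Load time is NOT reduced by adding nodes; only the
    rendering work is divided across the combined speed.
    """
    total_speed = 1.0 + sum(node_speeds)
    return load_min + render_min / total_speed


def dr_speedup(load_min, render_min, node_speeds):
    """How many times faster the DR run is than the single-machine run."""
    single = single_machine_minutes(load_min, render_min)
    return single / dr_minutes(load_min, render_min, node_speeds)


if __name__ == "__main__":
    # 15 minutes of loading, two extra nodes as fast as the master.
    for render_min in (6, 30, 300):
        gain = dr_speedup(15, render_min, [1.0, 1.0])
        print(f"{render_min:>3} min render: {gain:.2f}x faster with 3 machines")
```

Even in this best case, a short render gains little from two extra nodes, while a multi-hour render approaches the ideal 3x.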

Good luck!

Alex
Working on a Renderfarm Platform - checkout our website cgifarm.com and our cost calculator : https://www.cgifarm.com/renderfarm-cost-calculator

2018-08-29, 22:34:15
Reply #20

dj_buckley

  • Active Users
  • **
  • Posts: 875
    • View Profile
Makes sense, but when you only have two PCs connected in the same room through a small (but fast) switch, I'd expect to see a pretty much instant speed improvement when the node kicks in, especially when the node is twice as fast as the host at rendering a single frame. I'm also rendering images which take 5 hours+.

Admittedly I need to find a day when I can leave a render going on one machine until completion, then render again with DR on and do a full comparison. It's just not always practical in a production environment. It was obvious in V-Ray, where you could see the buckets; in Corona (or any progressive renderer) it's guesswork unless you actually sit through a full render as described above.

2018-08-29, 23:08:17
Reply #21

cgifarm

  • Active Users
  • **
  • Posts: 55
  • Your Brand New RenderFarm
    • View Profile
    • CGIFarm
The best thing is to see how many passes your DR node did and subtract that from the total amount. Then you will know which node rendered how much of the image.

You need to keep an eye on the stats while it renders or setup a screenshot program to take screenshots every minute or so.
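The subtraction described above can be written down as a small helper. This is only a sketch: the pass counts in the example are invented, and the node names are placeholders for whatever the DR tab reports.

```python
def pass_shares(total_passes, node_passes):
    """Return the fraction of all passes rendered by each machine.

    total_passes: total pass count reported for the render.
    node_passes:  dict mapping each DR node's name to the passes it
                  contributed. The master's share is whatever is left.
    """
    if total_passes <= 0:
        raise ValueError("total_passes must be positive")
    master_passes = total_passes - sum(node_passes.values())
    if master_passes < 0:
        raise ValueError("nodes report more passes than the total")
    shares = {"master": master_passes / total_passes}
    for name, passes in node_passes.items():
        shares[name] = passes / total_passes
    return shares


if __name__ == "__main__":
    # Hypothetical numbers, not taken from any render in this thread.
    for name, share in pass_shares(40, {"node-1": 12, "node-2": 8}).items():
        print(f"{name}: {share:.0%} of the passes")
```

If a node that benchmarks faster than the master ends up with a much smaller share, that points at waiting or network overhead rather than raw CPU speed.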

It's a pain to test, but yes, V-Ray with buckets shows the machine name while rendering. This is different, because each machine gets the full image to work on for a full pass.

It's better to set up test scenes which are known to render for, say, 1 hour on one machine and then run the test with DR. In my experience you get about 80% of the capacity of the second machine, because of the extra waiting time for network traffic and job management from the master. These tests take time, but it's good to know your tools and what to expect, so you know how to set deadlines and costs for clients. For me, an extra 30% on top of my estimate is the golden number which I always add; even if I do a great job planning, there is always something which requires more attention in a new project.
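As a worked example of those two rules of thumb, the sketch below applies the 80% figure and the 30% buffer to a known single-machine time. Both percentages are this poster's personal experience, not measured constants, and the function names are made up here.

```python
def estimate_dr_minutes(single_machine_min, node_speeds, node_efficiency=0.8):
    """Estimate DR render time from a known single-machine time.

    node_speeds lists each extra node's speed relative to the master
    (master = 1.0). Each node is assumed to deliver only node_efficiency
    of its capacity because of network and job-management waiting.
    """
    effective_speed = 1.0 + node_efficiency * sum(node_speeds)
    return single_machine_min / effective_speed


def quoted_minutes(estimate_min, buffer=0.30):
    """Add a planning buffer on top of the raw estimate."""
    return estimate_min * (1.0 + buffer)


if __name__ == "__main__":
    # A 60 minute render plus one extra node as fast as the master.
    raw = estimate_dr_minutes(60, [1.0])
    print(f"Estimated DR time: {raw:.1f} min")
    print(f"Time to plan for:  {quoted_minutes(raw):.1f} min")
```

With one equally fast node, a 60 minute render comes out at about 33 minutes rather than 30, and about 43 minutes once the buffer is added.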

I would also recommend making the master machine the strongest one if it's also rendering; otherwise it won't have enough cores to distribute jobs to the slave, and the slave might wait much longer for a job.

In some situations, if you use more DR nodes, you can set up the master machine not to do any rendering at all, so it just spawns jobs for the nodes and computes the final image.

Good luck!

2018-08-29, 23:17:22
Reply #22

cgifarm

  • Active Users
  • **
  • Posts: 55
  • Your Brand New RenderFarm
    • View Profile
    • CGIFarm
Hi,

Long parsing times can occur when all 3 nodes are copying files from the network at the same time. Please check your network speed; a slow hard drive or not enough RAM on the network storage can make things worse.

If you are using the master as the network share while it also renders, that can affect performance a lot. The best thing is to use a dedicated file server for the assets and at least a 1 Gbit network connection, if not 10 Gbit fiber, to make sure the network is not the bottleneck when doing DR.

To test network performance, you can test with 1 extra node, check parsing time, then add the other 2 etc.
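Alongside comparing parsing times, a rough throughput figure for the asset share can be obtained by timing a file copy. This is a generic sketch using only the Python standard library; it is unrelated to Corona's own tools, the function name is made up, and operating system caching can inflate the result if the same file is copied twice.

```python
import os
import shutil
import time


def copy_throughput_mb_per_s(src_path, dst_dir):
    """Copy one file into dst_dir, time it, and return MB per second.

    The copied file is deleted afterwards. Use a large asset (hundreds of
    MB) so that the timing is not dominated by start-up costs.
    """
    size_mb = os.path.getsize(src_path) / (1024 * 1024)
    dst_path = os.path.join(dst_dir, os.path.basename(src_path))
    start = time.perf_counter()
    shutil.copyfile(src_path, dst_path)
    elapsed = time.perf_counter() - start
    os.remove(dst_path)
    if elapsed <= 0:
        return float("inf")
    return size_mb / elapsed
```

Called on a render node with the path of a large file on the share and a local temporary folder, it gives a number to compare against the link speed: a 100 Mbit link tops out at about 12 MB/s and a 1 Gbit link at about 120 MB/s.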

You also need to take into account that a render node will start a 3ds Max instance and load the scene from scratch. On your workstation the scene is already loaded, so you might be misled into thinking the job starts rendering right away while the other machines are still parsing.

I'm not sure what your real problem is, but these are some of my thoughts based on the little info provided.

Good luck!

Alex

2018-09-03, 10:32:54
Reply #23

LorenzoS

  • Active Users
  • **
  • Posts: 291
    • View Profile
Hi Alex,
thanks for the reply.
After changing to a 10/100/1000 switch I noticed an improvement.
I will check the other aspects you suggested.


2018-09-03, 11:27:02
Reply #24

cgifarm

  • Active Users
  • **
  • Posts: 55
  • Your Brand New RenderFarm
    • View Profile
    • CGIFarm
You're welcome!

2018-09-21, 12:07:04
Reply #25

Luis.Goncalves

  • Active Users
  • **
  • Posts: 33
    • View Profile
Guys, something that helped me a lot with this issue was going to the System tab and lowering the synchronization interval from 60s to 5s.

I've done tests with this and here are my results

Synch 1s = 2:54m
Synch 5s = 3:02m
Synch 10s = 3:44m
Synch 60s = 5:26m

I've also sent the same image through Backburner as stripes, and the stripe that took the longest was 3:47m, so anything lower than that (with DR) means that the render farm is being well used.
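To make the comparison easier to read, the timings above can be converted to seconds and expressed as a speedup over the default 60s interval. The sketch only does arithmetic on the numbers in this post; the helper names are made up.

```python
# Render times from this post, keyed by synchronization interval in seconds.
TIMINGS = {1: "2:54", 5: "3:02", 10: "3:44", 60: "5:26"}
LONGEST_STRIPE = "3:47"  # slowest Backburner stripe, used as the reference


def to_seconds(mmss):
    """Convert an 'M:SS' string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def speedup_vs(timings, baseline_interval):
    """Return baseline render time divided by each interval's render time."""
    base = to_seconds(timings[baseline_interval])
    return {interval: base / to_seconds(t) for interval, t in timings.items()}


def beats_stripes(timings, stripe_time):
    """List the intervals whose DR render finished before the slowest stripe."""
    limit = to_seconds(stripe_time)
    return [i for i, t in sorted(timings.items()) if to_seconds(t) < limit]


if __name__ == "__main__":
    for interval, gain in sorted(speedup_vs(TIMINGS, 60).items()):
        print(f"sync {interval:>2}s: {gain:.2f}x faster than sync 60s")
    print("Faster than the slowest stripe:", beats_stripes(TIMINGS, LONGEST_STRIPE))
```

In this test most of the gain comes from going from 60s to 5s; dropping further to 1s adds comparatively little.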
« Last Edit: 2018-09-21, 12:38:02 by Luis.Goncalves »

2018-12-06, 08:07:11
Reply #26

Anadt

  • Users
  • *
  • Posts: 1
    • View Profile

2018-12-06, 13:24:20
Reply #27

hrvojezg00

  • Active Users
  • **
  • Posts: 273
    • View Profile
    • www.as-soba.com
Seems like we have the same problem. This wasn't the case with Corona 1.7 or 2, but 3 has a lot of issues with DR, which is slowing us down a lot! Corona team, please make progress on this issue.

Thanks,
Hrvoje

2018-12-17, 12:19:56
Reply #28

maru

  • Corona Team
  • Active Users
  • ****
  • Posts: 12768
  • Marcin
    • View Profile
We are working on improving DR. Unfortunately, as you know, network-related issues are complex, so it will take time.
Meanwhile, I would appreciate it if you could report your issues over at the support portal: https://coronarenderer.freshdesk.com/support/tickets/new
Do not forget to include DR logs and Backburner log as they may prove useful: https://coronarenderer.freshdesk.com/support/solutions/articles/12000002065

What you can try:

1) Make sure that you are running the same version of Corona on all PCs (e.g. Corona Renderer 3) and the corresponding version of DR server on all nodes (e.g. DR server 3). Keep in mind that in Corona Renderer 3 DR server is by default installed into C:\Program Files\Corona\DR Server and in older versions it was just C:\Program Files\Corona\

2) After installing a newer version of Corona and DR server (e.g. 2 > 3), your antivirus and firewall software may treat it as new software, so you may need to add it as an exception once again, otherwise it will be blocked. What works for some users is also adding firewall rules manually: https://coronarenderer.freshdesk.com/support/solutions/articles/12000050816
