Author Topic: Distributed Rendering takes ages to get data from nodes  (Read 3586 times)

2015-11-13, 18:59:43

Dippndots

  • Active Users
  • **
  • Posts: 298
  • Alex Fagan Co-Founder at The Faction
    • The Faction
This is probably an easy fix (hopefully), and I think it's to do with network bandwidth, but I didn't have enough time to troubleshoot before I left the office. Basically, I have 26 nodes rendering a 5K image; each did its required 5 or 6 passes in about 8 minutes, but it then took nearly an hour for all the data to reach the master machine and finish the render.

Should I reduce the maximum number of pixels transferred, or increase the time between sends? Or both?
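For a sense of scale, here is a rough back-of-the-envelope sketch in Python of how much raw framebuffer data each node accumulates (the 5120 x 2880 resolution, float32 RGBA format and 25-slave count are assumptions for illustration, not figures from the post):

```python
# Rough size of the raw framebuffer each render node accumulates.
# Assumptions (not from the post): 5K = 5120 x 2880, 32-bit float RGBA,
# beauty pass only, 25 slaves sending to the master.
width, height = 5120, 2880
bytes_per_pixel = 4 * 4                      # RGBA, float32 per channel
slaves = 25

per_node_mb = width * height * bytes_per_pixel / 1e6
total_gb = per_node_mb * slaves / 1e3
print(f"per node: {per_node_mb:.0f} MB")     # ~236 MB
print(f"all slaves: {total_gb:.1f} GB")      # ~5.9 GB
```

At gigabit line rate (~119 MB/s) that is under a minute of wire time for the beauty pass alone, and each render element adds roughly another 236 MB per node, so an hour-long collection phase points more at the synchronization settings than at raw bandwidth.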


2015-11-13, 19:19:06
Reply #1

Nekrobul

  • Primary Certified Instructor
  • Active Users
  • ***
  • Posts: 1028

Try these numbers: 3 sec synchronisation interval, max pixels 50000. These worked out for me with a limited VPN connection.
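To put rough numbers on those settings: if "max pixels 50000" is read as each slave sending at most 50,000 pixels of its accumulated buffer per synchronization (that reading, and the 16 bytes/pixel figure, are assumptions rather than documented behaviour), the implied traffic looks like this:

```python
# Implied per-slave traffic for the suggested settings.
# Assumption (not documented behaviour): each slave sends at most
# `max_pixels` pixels of its buffer every `interval_s` seconds.
max_pixels = 50_000
interval_s = 3.0
bytes_per_pixel = 16               # float32 RGBA, an assumption

per_sync_mb = max_pixels * bytes_per_pixel / 1e6
rate_mbit = per_sync_mb * 8 / interval_s
print(f"~{per_sync_mb:.1f} MB per sync, ~{rate_mbit:.1f} Mbit/s per slave")

# Time to stream a full 5K (5120 x 2880) buffer at that pace:
full_frame_px = 5120 * 2880
minutes = full_frame_px / max_pixels * interval_s / 60
print(f"~{minutes:.0f} min to drain one full 5K buffer")   # ~15 min
```

That modest per-slave rate is friendly to a thin VPN link; on a fast LAN you could likely afford a larger pixel cap or a shorter interval, under the same assumption.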
---------------------------------------------------------------
https://www.blackbellstudio.com/
https://www.behance.net/blackbell3d
CEO at "Blackbell Studio"

2015-11-14, 11:34:54
Reply #2

Dippndots

  • Active Users
  • **
  • Posts: 298
  • Alex Fagan Co-Founder at The Faction
    • The Faction
Thanks, I'll give that a go on Monday. Perhaps I should've mentioned that all the machines are on the same local network, if that makes any difference?

2015-11-16, 10:31:57
Reply #3

Dippndots

  • Active Users
  • **
  • Posts: 298
  • Alex Fagan Co-Founder at The Faction
    • The Faction
This worked pretty well; it's down to 27 minutes, so only about 17 minutes of that is grabbing data from the other nodes. Though I'm having another issue with all my render elements that might be prolonging this; I'll make a separate post for it.

2015-11-16, 11:03:20
Reply #4

Nekrobul

  • Primary Certified Instructor
  • Active Users
  • ***
  • Posts: 1028

27 minutes is quite long for a single image. What kind of connection do you have?
---------------------------------------------------------------
https://www.blackbellstudio.com/
https://www.behance.net/blackbell3d
CEO at "Blackbell Studio"

2015-11-16, 11:17:55
Reply #5

Dippndots

  • Active Users
  • **
  • Posts: 298
  • Alex Fagan Co-Founder at The Faction
    • The Faction
Well, about 10 minutes of that 27 was rendering. All the machines are talking to each other at a minimum of 1 Gbps.
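For reference, raw wire time alone should not explain a long tail on gigabit: even with several 32-bit render elements per node, pulling everything into a single gigabit port on the master is a matter of minutes (the resolution, element count, pixel format and slave count below are assumptions for illustration):

```python
# Rough wire-time estimate for collecting all slave buffers over 1 Gbps.
# Assumptions (not from the thread): 5K = 5120 x 2880, float32 RGBA,
# beauty + 6 render elements, 25 slaves, master NIC is the bottleneck.
width, height = 5120, 2880
bytes_per_pixel = 16
buffers = 1 + 6                       # beauty + render elements
slaves = 25
link_mb_per_s = 1000 / 8 * 0.95       # ~119 MB/s usable on gigabit

total_gb = width * height * bytes_per_pixel * buffers * slaves / 1e9
minutes = total_gb * 1e3 / link_mb_per_s / 60
print(f"~{total_gb:.0f} GB total, ~{minutes:.1f} min at line rate")  # ~41 GB, ~5.8 min
```

Anything much beyond that suggests the slaves are not streaming their buffers while they render, or that the master is spending its time merging rather than receiving.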

2015-11-16, 11:21:43
Reply #6

Nekrobul

  • Primary Certified Instructor
  • Active Users
  • ***
  • Posts: 1028

This is really, really weird. Actually, the data should be transferred while it renders, so it shouldn't take any additional time to dump it after the render finishes.
---------------------------------------------------------------
https://www.blackbellstudio.com/
https://www.behance.net/blackbell3d
CEO at "Blackbell Studio"

2015-11-16, 11:31:50
Reply #7

Dippndots

  • Active Users
  • **
  • Posts: 298
  • Alex Fagan Co-Founder at The Faction
    • The Faction
Yeah, this was how 1.1 was working for us; I just thought it was one of the changes to distributed rendering in 1.2 or 1.3.

Just to be clear, this is how the render happens: the master machine renders 5 passes (the total required is 125 passes, so I'm assuming it knows the other 25 machines will each do 5 passes as well), then another window pops up saying it's collecting data from the slaves. At this point the master machine stops rendering (CPU usage drops to under 20%), and for the rest of the time it is just collecting data from the slaves. All 26 machines are the same spec, so they all finish their 5 passes at around the same time.
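As a rough illustration of what that collection step has to do (this is a generic sketch of merging progressive accumulation buffers, not Corona's actual protocol), each machine keeps a full-resolution buffer plus the number of passes it rendered, and the master pulls every slave's buffer and combines them weighted by pass count:

```python
import numpy as np

# Generic sketch of merging progressive accumulation buffers from several
# render nodes. This is NOT Corona's actual implementation -- just the idea
# that the master must receive each node's full-resolution buffer and
# combine them weighted by how many passes each node contributed.

def merge_buffers(buffers, pass_counts):
    """buffers: list of (H, W, 4) float32 arrays, each already averaged over
    that node's own passes; pass_counts: passes rendered by each node."""
    total_passes = sum(pass_counts)
    merged = np.zeros_like(buffers[0])
    for buf, passes in zip(buffers, pass_counts):
        merged += buf * (passes / total_passes)   # pass-weighted average
    return merged

# Tiny demo frame; a real 5K frame would be 2880 x 5120 (~236 MB per node).
h, w = 288, 512
nodes = [np.random.rand(h, w, 4).astype(np.float32) for _ in range(26)]
final = merge_buffers(nodes, [5] * 26)            # 26 machines, 5 passes each
print(final.shape)                                # (288, 512, 4)
```

Whatever the real implementation does, each slave's contribution is a full-resolution buffer per render element, which is why extra elements multiply the amount of data the master has to pull in.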

This after-render data dumping might be tied into my other problem with distributed rendering via Backburner:
https://forum.corona-renderer.com/index.php/topic,10297.0.html