Author Topic: Slow DR Parsing etc

2023-11-10, 01:03:47

dj_buckley

Why are DR nodes so slow to join in?

Admittedly it's a relatively heavy scene, as you can see from how much RAM it's using.  But I'm 25 minutes into the render on the main workstation and, in the first screenshot, the nodes are still parsing/downloading assets etc.

Everything is reading and writing from the same server, all machines have very similar specs, and all are using the same software: Corona 10 Hotfix 2, Max 2024.

Also I can't help but come back to that "DR nodes use more RAM than the main workstation" issue from years ago.  It still blows my mind.  RAM shoots up to over 180GB on the nodes, which then causes them to fail rendering.  But the main PC renders it fine and happily churns along at 75GB RAM usage.

As you can see in the second screenshot, DAVECO01 has now failed and reverted back to 'Waiting for Render Start', and DAVECO03 has shot up to over 180GB.  Also note that in the second screenshot I'm now 35 minutes into the render.  It's taken 35 minutes for one of the nodes to fail and the other to still be precomputing the GI cache while using insane amounts of RAM, all the while the main workstation has been happily chugging through the render without issue.

Edit:  Aaaaaand just as I submitted that post, the second node failed.

2023-11-10, 09:20:49
Reply #1

Frood


Counter-question: how long does it take on the master to load the scene, press render, and have it actually rendering the first pass?

Quote from: dj_buckley
Also I can't help but come back to that "DR nodes use more RAM than the main workstation" issue from years ago.  It still blows my mind.  RAM shoots up to over 180GB on the nodes, which then causes them to fail rendering.  But the main PC renders it fine and happily churns along at 75GB RAM usage.

You are comparing the wrong data. The 75GB in the render status window is the active working set of the 3ds Max process, while the 180GB in the DR status display is the RAM commit size (and for the complete system at that, though this shouldn't make a significant difference here if the nodes are dedicated and nothing else is running). You'd have to compare the 180GB to the second value in the master's render status window: 235GB. So according to your screenshots the master uses more RAM, not the other way round.

Best is to compare using Task Manager with all RAM-relevant columns enabled in the "Details" tab. Do this on the master and on a slave. The question remains: who/what requests (and does not use) such a large amount of memory? Do you use any fancy plugin here? I'm aware of strange commit sizes with Max generally, but 75GB vs. 180+GB (commit size) does look extreme. Does the master use "only" 75GB for the process from the start, or is it much larger while loading/starting to render, dropping to 75GB later?
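For reference, a minimal sketch of reading those two columns programmatically rather than by screenshot, assuming Python with the psutil package on a Windows machine; the process name "3dsmax.exe" is an assumption, so adjust it if your setup differs. Run it on the master and on a slave during a render and compare per machine.

Code:
import psutil

TARGET = "3dsmax.exe"  # assumed process name for 3ds Max

for proc in psutil.process_iter(["name"]):
    if proc.info["name"] and proc.info["name"].lower() == TARGET:
        mem = proc.memory_info()
        # On Windows, rss is the working set and private is the commit
        # (the "Commit size" column in Task Manager's Details tab).
        print(f"PID {proc.pid}: working set {mem.rss / 2**30:.1f} GiB, "
              f"commit {mem.private / 2**30:.1f} GiB")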

As for the fail: do the nodes actually crash? As always, logs, logs, logs from the slave will help (a collection sketch follows the list):

- 3ds Max log: "Max.log"
- DrServer log: "DrLog.txt"
- Corona log: "CoronaMax2024_log.txt", "CoronaMax2024_errors.txt"
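For convenience, a hedged sketch of pulling those logs from the nodes over their administrative shares, assuming Python on the master. The node names come from the screenshots, but the share paths and the "render" user name are assumptions; adjust them to wherever the logs actually live on your machines.

Code:
import shutil
from pathlib import Path

NODES = ["DAVECO01", "DAVECO03"]  # node names from the screenshots
# Assumed locations relative to each node's C$ share; adjust as needed:
LOGS = [
    r"Users\render\AppData\Local\Autodesk\3dsMax\2024 - 64bit\ENU\Network\Max.log",
    r"Users\render\AppData\Local\CoronaRenderer\DrLog.txt",
    r"Users\render\AppData\Local\CoronaRenderer\CoronaMax2024_log.txt",
    r"Users\render\AppData\Local\CoronaRenderer\CoronaMax2024_errors.txt",
]

dest = Path("collected_logs")
for node in NODES:
    for log in LOGS:
        src = Path(rf"\\{node}\C$") / log
        if src.exists():
            target = dest / node / src.name
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, target)   # copy with timestamps preserved
            print(f"copied {src}")
        else:
            print(f"missing: {src}")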


Good Luck



« Last Edit: 2023-11-10, 09:27:27 by Frood »

2023-11-10, 09:27:38
Reply #2

dj_buckley

First question: about 5 minutes to load, 0.5 seconds to press render, less than 5 minutes to start rendering.

The master does use more when loading, but it doesn't exceed 100GB.

The point is, regardless of Task Manager details: I can render it on all 3 machines individually, if I open the scene on each and just render.  But I can't through DR.

Another issue while I remember: when I first open a scene in a fresh 3ds Max session and render, it struggles with all of the parsing and loading and I get RAM warnings.  However, if I cancel the render and render again, it all goes through much quicker with no issues.

I will be able to get it to render by optimizing, as there's a lot of unnecessary displacement for this shot etc., but that's beside the point.


« Last Edit: 2023-11-10, 10:02:52 by dj_buckley »

2023-11-10, 09:29:49
Reply #3

dj_buckley

I'll have more time today to investigate.  And I appreciate that the numbers aren't direct comparisons, but such a gulf between them seems crazy.  The nodes are literally dedicated; nothing else is running on them other than default Windows processes.  They just sit in a corner of the room, solely dedicated to rendering.

2023-11-10, 10:19:19
Reply #4

Frood

Quote from: dj_buckley
First question, less than 2 minutes.

Oh, amazing. Starting Max alone takes about 30 seconds on my computer :) So it loads, parses and starts rendering in 2 minutes, in fact?

Edit: seen your edit :)

Quote from: dj_buckley
The point is, regardless of Task Manager details: I can render it on all 3 machines individually, if I open the scene on each and just render.  But I can't through DR.

That's interesting, more later.

Quote from: dj_buckley
Another issue while I remember: when I first open a scene in a fresh 3ds Max session and render, it struggles with all of the parsing and loading and I get RAM warnings.  However, if I cancel the render and render again, it all goes through much quicker with no issues.

That's even more interesting, because with scenes using all available RAM I have struggled with this phenomenon forever. Look at the graph (an older one I captured, but it still applies). It shows a BB render node crunching a job. At the first bubble, I logged in, started another Max session and rendered some scene locally (the flat part of the passes graph). The second render stops at the second bubble. Look at the RAM consumption of the first job: it drops from 15+GB to about 6GB and stays at that level to the end. I can observe this to this day, also with larger scenes and larger impact. Sometimes it's even enough to just log on and do anything on the node.

One possible explanation: 3ds Max holds an entire copy of the scene while the renderer has another. At some point, a kind of purge seems to happen. I never found out how to trigger it on purpose.
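For anyone who wants to experiment, a speculative sketch of the usual memory-release calls 3ds Max exposes, driven from its built-in Python (pymxs). These functions do exist in MaxScript, but whether any of them reproduces the purge described above is unverified.

Code:
from pymxs import runtime as rt

rt.freeSceneBitmaps()   # drop cached bitmap data
rt.clearUndoBuffer()    # discard the undo history
rt.gc(light=True)       # light garbage collection (gc light:true in MaxScript)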

Quote from: dj_buckley
But such a gulf between them seems crazy.

I still don't see that until you check the working set (and not the commit size) of the DR node's Max process; curious what you will find.


Good Luck



« Last Edit: 2023-11-10, 10:23:44 by Frood »

2023-11-10, 10:20:37
Reply #5

dj_buckley

Yep, I misread the question; I've since edited with the loading times etc.

2023-11-11, 17:25:00
Reply #6

dj_buckley

Still testing all of this.  But out of interest, when optimizing, what's more important: texture resolution or texture file size?  I.e. if the texture is 8k but the file is only a few MB, does it need optimizing?

Anyway, for my first test I've just been watching the render with all of the windows open via remote desktop on one of the nodes.

The process in the VFB (underlined red) seems to be:

- Parsing (slowwwww; commit reaches around 70GB)
- Downloading Assets (slowwwwwww again; CPU usage down to 1%; +14GB commit)
- Parsing Scene (CPU usage still less than 1%; +3GB commit; then CPU usage jumps just above 1%; + another 12GB commit)
- Displacement (+5GB commit)
- Building Acc Structure (+15GB commit)
- Pre-computing GI Cache (+30GB commit; commit peaks at roughly 150GB)
- Rendering (commit starts to drop)

Max Process (underlined blue) does not exceed 90GB at any point during any of the above.
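To turn observations like the above into hard numbers, a minimal sampling sketch, assuming Python with psutil on a Windows node: it logs the Max process working set and commit once a second to a CSV so the per-stage growth can be charted. The process name is again an assumption.

Code:
import csv
import time
import psutil

def find_max_process():
    # Assumed process name; adjust if your setup differs.
    for p in psutil.process_iter(["name"]):
        if p.info["name"] and p.info["name"].lower() == "3dsmax.exe":
            return p
    return None

with open("ram_trace.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["time", "working_set_gib", "commit_gib"])
    proc = find_max_process()
    while proc and proc.is_running():
        mem = proc.memory_info()  # Windows-specific 'private' field used below
        writer.writerow([time.strftime("%H:%M:%S"),
                         round(mem.rss / 2**30, 2),
                         round(mem.private / 2**30, 2)])
        f.flush()
        time.sleep(1)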

« Last Edit: 2023-11-11, 18:07:59 by dj_buckley »

2023-11-11, 18:09:16
Reply #7

dj_buckley

I was also expecting these circled numbers to match, or have I got that wrong?

2023-11-11, 21:03:42
Reply #8

dj_buckley

So, further investigation: I've just been watching the scene on my workstation with DR turned off.

There are a couple of things I find odd.  When I first open the scene in a fresh Max session and press render, the RAM usage/Max process RAM is insane, but only on that first render.  On the first render the Max process shoots up to over 100GB.  But... if I cancel the render, the Max process idles between 55-65GB.  If I press render again now, the Max process drops to around 35-45GB initially; parsing the scene and calculating GI etc. go really quickly, and although the Max process does climb during this, it peaks once it starts rendering and never exceeds 65GB.

So ...

Question 1.  Why is the very first render twice as RAM-hungry as subsequent renders?

Question 2.  When using DR, why is the commit RAM accumulating so much during all of the different "tasks", i.e. Parsing, Displacement, Building Acc Structure etc.?  For example, "Pre-computing GI Cache" - why is that adding around 85GB to the commit RAM?

Rendering locally - the commit RAM/Max process initially drops by about 25GB (I'm assuming this is due to some cache clearing on pressing render after the first render), but then, as Corona goes through its process, that initial 25GB drop returns before it actually starts rendering, and then it stays there.

Rendering with DR - the commit RAM never drops; it just keeps adding up and adding up.  Those tasks are super quick when rendering locally and don't affect RAM all that much, relatively speaking.  But they take ages on the nodes and have a huge impact on RAM.

Edit:  I'm adding a screenshot from one of the nodes.  I opened the scene and rendered locally.  Again it gives the excessive RAM warning on the first render, but then RAM calms down on subsequent renders and I get no warnings/errors.  But what throws me are the details in the screenshot, for example the parsing, geometry and GI cache times.  They're really fast, yet these bits seem to take forever when rendering through DR.  For example, 4K Cache Precomp takes 11 seconds rendering locally, but from the details in my other posts it appears the commit RAM increases by another 30GB during this task under DR.  What is it doing in those 11 seconds to add that much to the commit RAM?  And why does that task not take 11 seconds when rendering through DR?
« Last Edit: 2023-11-12, 21:41:32 by dj_buckley »

2023-11-14, 10:37:13
Reply #9

Frood

Quote from: dj_buckley
But out of interest, when optimizing, what's more important: texture resolution or texture file size?  I.e. if the texture is 8k but the file is only a few MB, does it need optimizing?

The size on disk does not matter; it gets extracted into memory anyway. So for the rendering process, it makes no difference whether you load 8k from a small JPG or from an uncompressed file format. But if you use CoronaBitmaps, you save a lot of memory when using the out-of-core feature.
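As a worked example of why disk size is irrelevant, the back-of-the-envelope arithmetic for a single decompressed 8k texture, assuming the simplest case of 4 channels at 8 bits each:

Code:
width = height = 8192
channels = 4           # e.g. RGBA
bytes_per_channel = 1  # 8-bit
size = width * height * channels * bytes_per_channel
print(f"{size / 2**20:.0f} MiB in memory")  # 256 MiB, however small the file on disk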

Quote from: dj_buckley
Anyway, for my first test I've just been watching the render with all of the windows open via remote desktop on one of the nodes.

The "slowwwww" parts just come from loading assets imho. Your master already has a lot of them loaded, to set up the viewports for example. Network/disk usage (depending on the location of your stuff) should be high at the same time.

Quote from: dj_buckley
Max Process (underlined blue) does not exceed 90GB at any point during any of the above.

Those insane commit sizes you listed are exactly the issue I monitor on most jobs. And they are responsible for crashes, even if the process does not seem to need or use the provided virtual memory. It is just crazy to see the system paging gigabytes of RAM for a scene that can actually render with a fraction of it.

Quote from: dj_buckley
I was also expecting these circled numbers to match, or have I got that wrong?

No W11 here; the Memory page in Task Manager would have been useful, and I don't know what W11 shows there. If it is like W10, then yes. Additionally, I don't know how exactly Corona displays the values there (gibibytes vs. gigabytes, i.e. 2^30 vs. 10^9 bytes per "GB"). But I assume the values are fractional gibibytes.

Edit: Task Manager seems to show used RAM in your screenshot, while the DR tab shows the (system) commit size.
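For illustration, the same byte count expressed in both unit conventions; this alone makes two displays disagree by about 7% even when they measure the same thing:

Code:
ram = 180 * 2**30                # 180 GiB expressed in bytes
print(f"{ram / 2**30:.1f} GiB")  # 180.0 (2^30 bytes per unit)
print(f"{ram / 10**9:.1f} GB")   # 193.3 (10^9 bytes per unit)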

Quote from: dj_buckley
There are a couple of things I find odd.

Agree. I would like to know the answers as well. Except for the duplicated scene Max stores, I have no hint as to what is causing all that trouble.

But as for the slow DR, same as above: DrServer spawns a Max instance, and all the stuff that is already loaded when you press render interactively has to be processed first. If you look at your Max.log when loading a scene, you will notice a line like "Done loading file: (...)". Note the timestamp and see how many minutes you have to wait for Max to become responsive when loading a scene interactively. That time between loading the scene and having a "renderable" scene is added on DR nodes, because the scene is (currently) loaded on the slave every time you start rendering on the master.
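A hedged sketch for timing this: pull the timestamp of the "Done loading file" line out of Max.log so it can be compared against the moment Max becomes responsive or the render starts. The timestamp format is an assumption; check your own Max.log and adjust the pattern if it differs.

Code:
import re
from datetime import datetime

# Assumed log line prefix like "2023/11/10 01:03:47"; adjust to your Max.log.
PATTERN = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})")

def timestamps(path, needle):
    with open(path, errors="replace") as f:
        for line in f:
            if needle in line:
                m = PATTERN.match(line)
                if m:
                    yield datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S")

loaded = list(timestamps("Max.log", "Done loading file"))
if loaded:
    print("scene finished loading at", loaded[-1])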


Good Luck




« Last Edit: 2023-11-14, 10:58:49 by Frood »

2023-11-14, 10:48:03
Reply #10

dj_buckley

Nice one

Any idea why RAM usage is super high on the first (local) render, but if I simply press cancel after the first pass has started and press render again, the RAM usage drops significantly?  For example, on the first render the Max process is over 100GB; on the second render it's 20-30GB less.

2023-11-15, 10:59:08
Reply #11

maru

@dj_buckley - hi, and sorry for the delayed response.

As to "why RAM usage is high for the first render and becomes much lower for the next render", perhaps it might be related to the out of core textures feature of Corona? I am just guessing here, but could you try disabling that feature in your scene (Performance tab) and see if you are observing the same behavior?

Also, could you share with us any scene where those issues are reproducible? https://support.chaos.com/hc/en-us/requests/new

2023-11-15, 11:09:30
Reply #12

dj_buckley

Hi Maru

That has been happening since way before out-of-core textures were introduced, if I remember rightly.  But I'll give it a test.  Work is hectic at the minute, so it might need to wait until the weekend.  But I'll prepare and send over the really heavy scene I've been working with.  It'll be a good scene to stress-test future Corona versions with, too.

2023-11-15, 11:11:54
Reply #13

maru

Thanks a lot!

Corona (or maybe it's 3ds Max? not sure!) also does this thing called decompression, which could be affecting RAM usage. It is mostly visible when there are many hi-res textures used in the scene.
Once we have a sample scene and we are able to reproduce the issue, it should be easy to find out what exactly is going on.

2023-11-15, 11:16:18
Reply #14

dj_buckley

Yep, it could well be 3ds Max.  I have been optimizing this scene as I go, but it's still littered with (unnecessary) high-res textures.  My main workstation has 192GB, so admittedly I haven't really been paying attention to optimization until now, as the main workstation can handle it no problem.  It's the slaves that are the issue (128GB RAM).  Anyway, leave it with me and I'll get something sent across before next week.