Teamrendering - Clients do not start next job in queue after some time
HFPatzi:
Hey there,
First of all, here are our render farm specs:
Team Render Server machine:
CPU: 32 x 2.4 GHz (I think it's an Intel, but I'm not sure about that)
RAM: 512 GB
System: Windows 10
Team Render Server and Corona are on the latest version. No third-party plugins besides Corona are installed.
Team Render Clients (3x):
CPU: AMD Threadripper 24 x 4.0 GHz
RAM: 64 GB
System: Windows 11
Team Render Client and Corona are on the latest version. No third-party plugins besides Corona are installed.
The machines are all connected via a dedicated 10 Gigabit switch.
For quite some time now, we have had problems with our render farm.
For example, I put 10 jobs (still images) in the server's queue and start them all at once.
The first jobs run fine and every client participates. Then, at some unpredictable point, the clients don't start the next job in the queue. Not all clients stop at once: one machine might switch to idle after completing a job and not start the next one. Machines that turn idle stay idle until you add and start a new render job. The jobs pending in the active queue seem to be ignored. So after a few jobs, no client machine is rendering anymore: every client's status is idle, and there are still jobs in the active queue with status pending. There is no error or warning in any log whatsoever. It almost seems that the clients don't see any more jobs in the queue after turning idle.
The only workaround is to restart the client software (for example via the Team Render web interface). Once restarted, the clients immediately start with the next job in the queue.
We will contact Maxon support in parallel to this post. But I have some theories about what could cause this issue:
While monitoring the rendering process on the farm, I saw that one machine renders a bit slower than the other two. When the faster machines are done with their part, they directly start the next job in the queue while the slower machine still renders the old job. When the slower machine has finished, it joins the job the other machines are already rendering. Maybe at some point the gap between the faster machines and the slower one gets too big and the slower machine doesn't know where to continue. On the other hand, the two faster machines also switch to idle eventually.
Another theory is that the Team Render software counts a job as finished before denoising starts, so there is a growing gap between what the machine considers finished and when the job is actually finished. This might also explain why the web interface's progress bar goes over 100% (mostly to around 120%).
Another factor might be the different Windows versions. But AFAIK the server machine cannot be upgraded to Windows 11. Maybe there is a general problem with Team Render on Windows 11?
I'm not sure whether the Team Render settings in the Corona render settings could help solve this problem. With our old farm (same server machine, two older client machines) I worked out the values for fastest rendering (60s / 150MB). On our old farm we didn't have the clients' "idle" problem.
I hope I explained our issue somewhat understandably ;)
If you have further questions, let me know!
Have a nice day!
Moritz
HFPatzi:
Hey,
over the weekend, I tested some variations of Corona's Team Render settings (in the document render settings).
It looks like the problem of clients no longer working through the queue is solved by Arbitrated mode.
In Arbitrated mode, all jobs of my test queue (10 jobs) were rendered.
Unfortunately, another problem appeared. As said, all jobs rendered and all jobs have a correct result.
Some jobs, however, have the status "Am Zusammensetzen". I don't know the exact term in the English interface, but I think it translates to "assembling".
These jobs are also in the inactive queue and have render results.
According to the log files, these jobs do get the "assembling successful" status, but right after that they get the status "assembling" again. I'm not sure what causes this, but it would be nice if it could be fixed somehow.
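Here is a log where the status is set back to "assembling" after successful assembly (the German status lines read: "Eingereiht" = queued, "Am Vorbereiten" = preparing, "Am Rendern" = rendering, "Am Zusammensetzen" = assembling, "Rendern erfolgreich" = rendering successful, "Zusammensetzen erfolgreich" = assembling successful):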
--- Code: ---
2025/02/24 15:20:19 Created 'Testjob_CR_arbitrated_TR_05s_250MB_09' by 'XXXXXXX'
2025/02/24 15:21:24 Eingereiht
2025/02/24 15:21:24 Started Job by User XXXXXXX
2025/02/24 15:21:24 Eingereiht
2025/02/24 15:21:24 Am Vorbereiten
2025/02/24 15:21:24 Am Vorbereiten
2025/02/24 15:21:25 Am Rendern
2025/02/24 15:21:31 RENDER-CLIENT01: Downloaded Asset(s) in 4.779 seconds
2025/02/24 15:21:31 RENDER-CLIENT02: Downloaded Asset(s) in 4.900 seconds
2025/02/24 15:21:32 RENDER-CLIENT03: Downloaded Asset(s) in 5.908 seconds
2025/02/24 15:21:33 RENDER-CLIENT01: Rendering frame 0 of job 'Testjob_CR_arbitrated_TR_05s_250MB_09'
2025/02/24 15:21:33 RENDER-CLIENT02: Rendering frame 0 of job 'Testjob_CR_arbitrated_TR_05s_250MB_09'
2025/02/24 15:21:34 RENDER-CLIENT03: Rendering frame 0 of job 'Testjob_CR_arbitrated_TR_05s_250MB_09'
2025/02/24 15:26:46 Am Zusammensetzen
2025/02/24 15:26:50 Rendern erfolgreich
2025/02/24 15:26:50 Zusammensetzen erfolgreich
2025/02/24 15:26:50 Am Zusammensetzen
--- End code ---
And here is a log where everything worked as it should and the status is set to "completed":
--- Code: ---
2025/02/24 15:17:27 Created 'Testjob_CR_arbitrated_TR_05s_250MB_10' by 'XXXXXX'
2025/02/24 15:21:24 Eingereiht
2025/02/24 15:21:24 Eingereiht
2025/02/24 15:21:24 Started Job by User XXXXXX
2025/02/24 15:21:25 Am Vorbereiten
2025/02/24 15:21:25 Am Vorbereiten
2025/02/24 15:21:27 Am Rendern
2025/02/24 15:26:58 RENDER-CLIENT02: Downloaded Asset(s) in 5.035 seconds
2025/02/24 15:26:59 RENDER-CLIENT01: Downloaded Asset(s) in 5.373 seconds
2025/02/24 15:26:59 RENDER-CLIENT03: Downloaded Asset(s) in 5.965 seconds
2025/02/24 15:27:00 RENDER-CLIENT02: Rendering frame 0 of job 'Testjob_CR_arbitrated_TR_05s_250MB_10'
2025/02/24 15:27:01 RENDER-CLIENT01: Rendering frame 0 of job 'Testjob_CR_arbitrated_TR_05s_250MB_10'
2025/02/24 15:27:01 RENDER-CLIENT03: Rendering frame 0 of job 'Testjob_CR_arbitrated_TR_05s_250MB_10'
2025/02/24 15:32:56 Am Zusammensetzen
2025/02/24 15:33:00 Rendern erfolgreich
2025/02/24 15:33:00 Am Zusammensetzen
2025/02/24 15:33:00 Zusammensetzen erfolgreich
--- End code ---
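In the two logs above, the only visible difference is the order of the last two status lines, both written in the same second. A minimal sketch for flagging affected jobs automatically, assuming the job log is available as plain text in the format shown above. The function names `parse_log` and `stuck_in_assembling` are made up for illustration and are not part of Team Render or Corona:

```python
from datetime import datetime

# Status strings exactly as they appear in the (German) job logs above.
ASSEMBLING = "Am Zusammensetzen"
ASSEMBLED = "Zusammensetzen erfolgreich"


def parse_log(text):
    """Return (timestamp, message) tuples for every well-formed log line."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if len(line) < 20:
            continue  # too short to hold a timestamp plus a message
        try:
            stamp = datetime.strptime(line[:19], "%Y/%m/%d %H:%M:%S")
        except ValueError:
            continue  # not a log line (e.g. a forum code delimiter)
        entries.append((stamp, line[20:].strip()))
    return entries


def stuck_in_assembling(entries):
    """True if 'assembling' is logged again after the last 'assembling successful'."""
    messages = [message for _, message in entries]
    if ASSEMBLED not in messages:
        return False
    last_success = len(messages) - 1 - messages[::-1].index(ASSEMBLED)
    return ASSEMBLING in messages[last_success + 1:]


# Tails of the two logs quoted above.
job_09_tail = """
2025/02/24 15:26:46 Am Zusammensetzen
2025/02/24 15:26:50 Rendern erfolgreich
2025/02/24 15:26:50 Zusammensetzen erfolgreich
2025/02/24 15:26:50 Am Zusammensetzen
"""
job_10_tail = """
2025/02/24 15:32:56 Am Zusammensetzen
2025/02/24 15:33:00 Rendern erfolgreich
2025/02/24 15:33:00 Am Zusammensetzen
2025/02/24 15:33:00 Zusammensetzen erfolgreich
"""

print(stuck_in_assembling(parse_log(job_09_tail)))  # job 09: status set back
print(stuck_in_assembling(parse_log(job_10_tail)))  # job 10: completed normally
```

The check only looks at the order of the lines in the log, not at the timestamps, because both events carry the same one-second timestamp in the logs above.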
If you need any further information, just ask ;)
Greetings,
Moritz
HFPatzi:
I guess I celebrated too early. While Arbitrated mode seems to work fine with the regular image, there is a huge problem with the multipass image. Although the job rendered and finished correctly (status bar at 100% and green), the multipass result looks like it stopped somewhere in the initial pass (lots of black squares, etc.) and does not show the final image.
So after testing around with different versions of Corona, we're now back to square one. Cinema 4D 2025.0.2 and Corona 12 HF2 seem to render correct results in Manual mode, but the clients again refuse to start the next job in the active queue after some time. The only solution for now is to restart the clients when this happens. That's not ideal, especially when rendering over the weekend.
Looking forward to hearing from you!
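Until this is fixed, the manual restart could in principle be decided by a small watchdog. The following is only a sketch of the decision logic, under the assumption that the client status and the number of pending jobs can be read from somewhere. `ClientState`, `clients_to_restart` and the status values are hypothetical names, and how to query Team Render or trigger the restart is deliberately left out because no such interface is described here:

```python
from dataclasses import dataclass


@dataclass
class ClientState:
    name: str
    status: str          # hypothetical values: "idle" or "rendering"
    idle_seconds: float  # how long the client has been reporting "idle"


def clients_to_restart(clients, pending_jobs, grace_seconds=120.0):
    """Return names of clients that stay idle although jobs are still pending.

    grace_seconds avoids restarting a client that is only briefly idle
    between two jobs; the default is an arbitrary assumption.
    """
    if pending_jobs <= 0:
        return []  # an idle farm with an empty queue is fine
    return [
        client.name
        for client in clients
        if client.status == "idle" and client.idle_seconds >= grace_seconds
    ]


farm = [
    ClientState("RENDER-CLIENT01", "idle", 900.0),
    ClientState("RENDER-CLIENT02", "rendering", 0.0),
    ClientState("RENDER-CLIENT03", "idle", 30.0),
]
print(clients_to_restart(farm, pending_jobs=4))
```

With four pending jobs, only the client that has been idle longer than the grace period is selected; the one that just went idle gets a chance to pick up the next job on its own.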
Greetings,
Moritz
HFPatzi:
And on we go with this monologue. I just started some renders on our farm. Then I monitored the license server for a bit, and it seems that the render clients only acquire a license when they have to render something. My assumption was that as soon as the Team Render Client software is started, it keeps a license active for as long as it is running. But it looks like the licenses on the clients are released when a render job finishes, even if there are more jobs in the render queue. Is there a way to give the render clients some kind of permanent license, or maybe another way to fix this?
Greetings,
Moritz
bnji:
Hello Moritz,
Thank you for bringing this to our attention.
Is this happening with every single scene you're working on?
Are you able to share a test scene (including its assets) so we can investigate further?
If so, please submit a new support ticket so that you can upload a zipped scene project.
We're currently investigating other related reports, so it would be great to have more scenes where the issue is reproducible.
Looking forward to hearing from you soon.
Kind regards.