moderate [...] overclocking of my workstation.
3.6GHz to 4.6GHz
Holy moly, our definitions of moderate vastly differ :D
Welcome to the forum.
No, the code is not more inefficient. This has to do with how the CPU architecture crunches numbers. Calculations are split between the FPU and the ALU: one crunches floating-point operations, the other integer and logic operations. Rendering itself relies heavily on floating-point calculation, so while the heavy floating-point crunching is going on, the logic units may idle, even if only for milliseconds. By definition the CPU has reached its operational limit and usage reads 100%, but physically only one part of the chip is actually crunching numbers.
Thus the temperature does not rise as high as with a workload that happens to strike the perfect balance between the two. I have noticed this as well: denoising happens to hit that balance, delivering just enough logical operations for the FPU to keep up, so both halves are fully loaded and everything cooks.
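To make the distinction concrete, here is a toy sketch (my own, not a real stress test like Prime95 or Cinebench) of the two instruction mixes being discussed. One loop is almost pure floating-point work that lands on the FPU, the other is almost pure integer/bitwise work that lands on the ALU; a task manager would show 100% usage for either, even though different silicon is busy:

```python
import math

def fp_heavy(iterations):
    """Mostly floating-point ops: exercises the FPU."""
    x = 1.0001
    for _ in range(iterations):
        # sqrt/multiply/add all go through the FP pipeline
        x = math.sqrt(x * x + 1.0) - 0.5
    return x

def int_heavy(iterations):
    """Mostly integer/bitwise ops: exercises the ALU."""
    x = 0x12345678
    for _ in range(iterations):
        # shifts, xor and masking never touch the FPU
        x = ((x << 1) ^ (x >> 3)) & 0xFFFFFFFF
    return x
```

Pin one of these per core and both report "100% CPU", but the heat output and stability limits differ for exactly the reason described above: they saturate different execution units.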
On my FX 8350, and all AMD FX chips for that matter, there are only 4 FPUs for the supposed 8-core chip. I can run the Corona benchmark and Cinebench at almost 5 GHz and
thus have the fastest 8350 in the benchmark :D This is because I basically "only use half the chip".
As soon as I run Prime95 with small FFTs, or any other FPU-light but logic-heavy operation, the system crashes within a second, or, if I bump the voltage, cooks itself to metal-melting temperatures.
edit:
This is the same reason that sometimes at 100% CPU load you can still use Windows, while with workloads that eat up the logic side to 100%, the meter shows the same 100% usage but Windows becomes unusable.