I will leave Juraj to comment on his workflow :) I tested something similar this last week though, using ComfyUI, and I can say that trying to do "face improvement" at 512x512 does not work; it just uglifies things :) For me, I had to take a crop of my test image (which ended up being 374 x 669), upscale it through a 4x upscaler to 1496x2672, then pass that through the Realistic model at 0.49 denoising, and that improved both clothes and faces. Doing the same on the original crop with no upscaling just made things worse. So in the way I had things set up, a) you are not limited to 512 at all, and b) 512 actively makes things worse.
This may be dependent on GPU memory though.
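As an aside, for anyone wondering why 374 x 669 lands on 1496 x 2672 rather than exactly 4x (which would be 1496 x 2676): my guess is that the pipeline snaps dimensions down to a multiple of 8, which SD's latent space requires. A minimal sketch of that arithmetic, assuming the snap-down behaviour (the function name here is just illustrative, not a ComfyUI API):

```python
def sd_friendly_size(width, height, scale=4, multiple=8):
    """Scale a crop, then round each side DOWN to the nearest multiple of 8,
    since Stable Diffusion's latent space needs dimensions divisible by 8."""
    w = (width * scale) // multiple * multiple
    h = (height * scale) // multiple * multiple
    return w, h

print(sd_friendly_size(374, 669))  # -> (1496, 2672): 374*4 = 1496 is already
                                   # a multiple of 8; 669*4 = 2676 snaps to 2672
```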
For interest, this was on a 4080 laptop GPU with 12GB memory, with similar performance on a 3070 Ti desktop. Processing the crop took about 60 seconds in total (that is, upscaling it with a face-sensitive upscaler and then feeding it through the model, combined).
Also Tom (sorry to hijack the thread a bit here, feel free to move this into a new thread): I'm assuming your crop was of the whole person, so in effect the face was much, much smaller than the 374 x 669 crop. The upscale then allowed the face to fill the 512 marquee better in the resulting 1496x2672 image, giving SD more starting fidelity to work with?