I think that, just like the "old" method, the camera has to be outside the object, because the calculations are only triggered when a ray hits the surface. To animate passing through the cloud in the demo video, I used a ProBoolean to cut a cube surrounding the camera out of the cube that had the Volume material. That kept the camera always "outside" the object while still letting it fly through the cloud. This is unrelated to the stepping size; it's just that the code can't run unless a ray hits the surface to say "start running this shader" (I think... I'm not a dev, though I did write a ray-marching shader a lonnnng time ago, in another application, in a galaxy far, far away).
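To make the "camera must be outside" idea concrete, here is a minimal Python sketch of a hypothetical surface-triggered volume shader. It assumes the volume is an axis-aligned box and that marching only starts when the ray enters through the outer surface in front of the camera. The function names `ray_box_hits` and `shader_triggered` are made up for illustration; this is a guess at the behaviour, not how the renderer is confirmed to work.

```python
import math

def ray_box_hits(origin, direction, box_min, box_max):
    """Slab test: return (t_enter, t_exit) along the ray, or None on a miss."""
    t_enter, t_exit = -math.inf, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this pair of slabs: it must already lie between them.
            if o < lo or o > hi:
                return None
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_enter, t_exit = max(t_enter, t0), min(t_exit, t1)
    if t_enter > t_exit or t_exit < 0.0:
        return None  # missed the box, or the box is entirely behind the camera
    return t_enter, t_exit

def shader_triggered(origin, direction, box_min, box_max):
    """Hypothetical model: the volume shader only starts if the ray hits the
    outer surface in front of the camera (t_enter >= 0). A camera inside the
    box has t_enter < 0, so in this model nothing gets marched."""
    hits = ray_box_hits(origin, direction, box_min, box_max)
    return hits is not None and hits[0] >= 0.0
```

With a unit box from (-1, -1, -1) to (1, 1, 1), a camera at (0, 0, -5) looking along +z enters at t = 4 and triggers the shader, while a camera at the origin gets t_enter = -1 and does not. In this model, the ProBoolean cut-out works because the camera always sits in a hole outside the remaining geometry, so every ray still starts by hitting a surface.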
I believe slowness is inevitable given the number of calculations being done, so Single Bounce is certainly your friend! Depending on the material and the result you want, very large step sizes can be used, though you will lose detail, and you may see stepping artifacts if the material is fairly solid rather than wispy.
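The step-size trade-off can be illustrated with a minimal fixed-step ray march. This is only a sketch: it assumes simple Beer-Lambert absorption along a single ray with one density sample per step, and `march` and the two density profiles are hypothetical, not the renderer's actual code.

```python
import math

def march(density, t_start, t_end, step):
    """Fixed-step ray march along one ray.
    Samples the density at the midpoint of each step, accumulates optical
    depth, and returns (transmittance, number_of_samples). Fewer samples
    means faster, but anything thinner than a step can be skipped entirely."""
    optical_depth = 0.0
    samples = 0
    t = t_start
    while t < t_end:
        dt = min(step, t_end - t)  # shorten the last step so we stop at t_end
        optical_depth += density(t + 0.5 * dt) * dt
        samples += 1
        t += dt
    return math.exp(-optical_depth), samples

# Hypothetical "solid" feature: a thin, dense slab between t = 1.0 and t = 1.125.
thin_solid = lambda t: 8.0 if 1.0 <= t < 1.125 else 0.0
# Hypothetical "wispy" medium: low, uniform density along the whole ray.
wispy = lambda t: 0.5

print(march(thin_solid, 0.0, 2.0, 0.0625))  # 32 samples, slab is captured
print(march(thin_solid, 0.0, 2.0, 0.5))     # 4 samples, slab is missed
print(march(wispy, 0.0, 2.0, 0.5))          # 4 samples, same result as fine steps
```

With a step of 0.0625 the thin slab contributes an optical depth of 1.0 (transmittance exp(-1)), but with a step of 0.5 every sample point falls outside it, so it disappears (transmittance 1.0). The uniform wispy density gives the same transmittance at both step sizes, which is why large steps tend to be acceptable for wispy materials and show artifacts on solid ones.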