I always think of the top/bottom (i.e. left-eye/right-eye) images as two separate renders that happen at once. That means any background will indeed appear in full in both, rather than it being one render where the background gets stretched across the left- and right-eye results.
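To make the "two separate renders" idea concrete, here is a minimal Python sketch (plain lists, no image library) that treats a top/bottom frame as two full-width eye images stacked vertically. The function names are hypothetical, and it assumes the top half is the left eye as described above; check which order your own output actually uses.

```python
from typing import List, Tuple

# A frame is modeled as a list of rows; each row is a list of pixel values.
Frame = List[List[float]]


def split_top_bottom(frame: Frame) -> Tuple[Frame, Frame]:
    """Split a top/bottom stereo frame into (left_eye, right_eye).

    Assumes the top half is the left eye and the bottom half the right eye,
    each half being a complete render of the scene (background included).
    """
    height = len(frame)
    if height % 2:
        raise ValueError("top/bottom frame must have an even number of rows")
    half = height // 2
    left_eye = [row[:] for row in frame[:half]]
    right_eye = [row[:] for row in frame[half:]]
    return left_eye, right_eye


def stack_top_bottom(left_eye: Frame, right_eye: Frame) -> Frame:
    """Stack two equally sized eye images back into one top/bottom frame."""
    if len(left_eye) != len(right_eye):
        raise ValueError("both eyes must have the same number of rows")
    return [row[:] for row in left_eye] + [row[:] for row in right_eye]
```

Because each half is its own full render, splitting and re-stacking round-trips exactly; nothing about the background is shared or stretched between the two halves.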
I've never tried compositing onto an already-stereo image, but my first thought would be to render without the background, getting just the 3D object with an alpha channel, and recombine in post: the stereo output of the 3D object, with the background transparent, laid on top of the stereo footage. A non-stereo version of the background would still be used to get reflections etc., and a shadow catcher would catch shadows onto the table.
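A minimal sketch of the recombine-in-post step, assuming straight (unpremultiplied) alpha and a render that has already been split into left and right eyes. Each eye of the object render is laid over the matching eye of the stereo footage with the standard "over" operation, out = fg·α + bg·(1 − α). All names are hypothetical; if your render's alpha is premultiplied, the foreground term should not be multiplied by α a second time.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]  # straight (unpremultiplied) RGB, 0..1
Image = List[List[Pixel]]           # rows of pixels
Alpha = List[List[float]]           # per-pixel alpha, 0..1


def alpha_over(fg: Pixel, alpha: float, bg: Pixel) -> Pixel:
    """'Over' for straight alpha: fg * a + bg * (1 - a), per channel."""
    return tuple(f * alpha + b * (1.0 - alpha) for f, b in zip(fg, bg))


def composite_eye(render: Image, alpha: Alpha, plate: Image) -> Image:
    """Lay one eye of the object render over the same eye of the footage."""
    return [
        [alpha_over(f, a, b) for f, a, b in zip(r_row, a_row, p_row)]
        for r_row, a_row, p_row in zip(render, alpha, plate)
    ]


def composite_stereo(render_l: Image, alpha_l: Alpha,
                     render_r: Image, alpha_r: Alpha,
                     plate_l: Image, plate_r: Image) -> Tuple[Image, Image]:
    """Composite each eye independently; left and right never mix."""
    return (
        composite_eye(render_l, alpha_l, plate_l),
        composite_eye(render_r, alpha_r, plate_r),
    )
```

Keeping the eyes separate matters: the left-eye render must go over the left-eye footage, or the object's depth will fight the plate's. If the shadow catcher's shadows come through in the render as dark, partially transparent pixels (an assumption about your renderer's output), the same over operation would darken the table in the footage.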
I've never tested it though, so I may give it a try and see!