I quote from K.C. Pohlmann, Principles of Digital Audio, 3rd ed., McGraw-Hill, 1995:
As with other low-bit coders, to achieve high resolution, high sampling rates are required; for example, with an audio band of 22.05 kHz and 64 times oversampling, the internal sampling frequency rises to 2.8224 MHz, thus quantization noise is spread from dc to 1.4112 MHz.
Can you see why I'm confused? 64 is much higher than 8.
Pohlmann is not wrong at all about the noise - but, in the same way that the Watkinson references do, he assumes a certain knowledge of basic principles in his readers that perhaps he shouldn't...
The higher you raise the effective sampling rate, the wider the bandwidth of the noise - you can't just make magical gains here. You can cheat slightly by shaping the dither of your audio, which has the effect of squashing its contribution to the noise level in one part of the band and letting it rise slightly in another to compensate, but in reality this effect varies somewhat when played through different systems with different oversampling characteristics anyway. The difference that any of these techniques actually makes to the perceived noise floor is absolutely minimal.
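If it helps to see where Pohlmann's figures come from, here is a rough sketch of the arithmetic. It assumes the CD base rate of 44.1 kHz (which is what his 2.8224 MHz implies at 64x), and the function name is just something I made up for illustration. The noise bandwidth is simply half the internal sampling rate.

```python
# Sketch only: assumes a 44.1 kHz base sampling rate (CD audio).
# oversampled_rates() is a made-up helper name, not from any library.
BASE_RATE_HZ = 44_100


def oversampled_rates(oversampling_factor, base_rate_hz=BASE_RATE_HZ):
    """Return (internal sampling rate, noise bandwidth), both in Hz."""
    internal_rate_hz = base_rate_hz * oversampling_factor
    # Quantization noise is spread from dc up to half the internal rate.
    noise_bandwidth_hz = internal_rate_hz / 2
    return internal_rate_hz, noise_bandwidth_hz


for factor in (1, 4, 8, 64):
    rate_hz, bandwidth_hz = oversampled_rates(factor)
    print(
        f"{factor:>2}x: internal rate {rate_hz / 1e6:.4f} MHz, "
        f"noise spread from dc to {bandwidth_hz / 1e6:.4f} MHz"
    )
```

The 64x row gives the 2.8224 MHz and 1.4112 MHz in the quote; the 8x row is the same sum with a smaller multiplier.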
The bottom line is that you get more improvement because the filtering becomes much easier to implement. Other factors in the design make more difference to the output than the oversampling ratio does - like noise in the monotonic steps, and the overall stability of the conversion. But however you look at it, above 4x, most of the advantages are to the marketing department, not the consumer - a well-designed 8x oversampling system will sound better than a 64x one which uses naff components in the output stage, for instance. And quite frankly, you are looking at vanishingly small differences anyway. Better production techniques will often make far more of an improvement to the sound that comes out than changing from 8x to 64x oversampling will.