Audio Asylum Thread Printer Get a view of an entire thread on one page |
In Reply to: RE: 44.1 kHz shown scientifically to be inadequate posted by Tony Lauck on July 26, 2009 at 19:26:14
Let's repeat in short what the experiments *actually* were:
1) two vertically-arrayed tweeters are fed a 7kHz squarewave. The upper tweeter is kept aligned with the lower one (sound A) or moved backwards over a distance d (sound B). A listener is asked to distinguish between A and B. Note that sound A and sound B are both continuous and periodic, and also that one would expect the listener to be sensitive to the fundamental only (7kHz), as the next harmonic (the third) is at 21kHz, which is proven to be inaudible (for average people at normal listening levels). The experiment tells us that listeners perceive differences between A and B down to distances that correspond to a delay of 5us in free air. Note that the experiment does *NOT* test the audibility of two impulses 5us apart.
2) a headphone is fed a mono triangular signal of 7kHz. The signal is untreated (A) or pre-filtered by a first-order low-pass with a time constant T (B). Listeners can discern between A and B for values of T down to 5 us. Again, one expects that the listeners are only sensitive to the fundamental of both sounds.
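As a rough check on the numbers in the two experiments above, here is a minimal sketch. It assumes a speed of sound of 343 m/s (an assumption, not a figure from the thread or the papers) and an ideal first-order RC low-pass; the function names are made up for illustration.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 C; not from the papers


def delay_us(distance_mm):
    """Acoustic delay in microseconds for a path difference given in millimetres."""
    return distance_mm * 1e-3 / SPEED_OF_SOUND_M_S * 1e6


def first_order_gain_db(freq_hz, tau_s):
    """Magnitude response in dB of an ideal first-order low-pass with time constant tau."""
    return -10.0 * math.log10(1.0 + (2.0 * math.pi * freq_hz * tau_s) ** 2)


def corner_frequency_hz(tau_s):
    """-3 dB corner frequency of the same ideal first-order low-pass."""
    return 1.0 / (2.0 * math.pi * tau_s)


if __name__ == "__main__":
    print(delay_us(1.715))                    # path difference giving about 5 us
    print(corner_frequency_hz(5e-6))          # corner for T = 5 us
    print(first_order_gain_db(7_000, 5e-6))   # loss at the fundamental
    print(first_order_gain_db(21_000, 5e-6))  # loss at the third harmonic
```

Under these assumptions a 5 us time constant puts the corner near 31.8 kHz, with a loss of roughly 0.2 dB at 7kHz and about 1.6 dB at 21kHz. It says nothing about audibility, only about the size of the stimulus change.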
There is some theorising about the cause of people being capable of telling A and B apart, but no clear origin is pointed at, except perhaps
the non-linear mixing of 7k and 21k to produce 14k (by two mechanisms).
So far so good, and I'm sure no-one takes any offence.
Then K jumps to the conclusion that a 44.1kHz sample rate is insufficient. Quite a jump. And without further motivation, nor
an experiment where the same signals are sent through a 44.1kHz
encoding/decoding system, which, by the way, would still pass
21kHz.
This has sparked the mother of flame wars at the Stereophile forum,
and it was about the ugliest thing I've ever seen on a forum.
bring back dynamic range
Follow Ups:
Werner wrote:
This has sparked the mother of flame wars at the Stereophile forum.
Hopefully, we can conduct the discussion slightly better here.
Then K jumps to the conclusion that a 44.1kHz sample rate is insufficient. Quite a jump.
I feel you may be over-simplifying his position but perhaps I’ve missed something. Can you point me to where he draws (“jumps to”, even) that conclusion?
Thanks.
Dave
"Can you point me to where he draws (“jumps to”, even) that conclusion?"
I searched his papers and his FAQ for "44.1" and came up with the following paragraphs.
"B. Implications for sound reproduction
The result presented here has relevance for the performance
requirements of audio components and digital encoding
schemes. It is known that the bandwidth requirement
for sonically transparent audio reproduction is higher
than the 20 kHz: in the coding of digital audio it has been
noted (Stuart, 2004) that listeners show a preference for a
96 kHz sampling rate over the CD (digital compact disk)
standard of 44.1 (i.e., a 22 kHz Nyquist frequency). It
is sometimes thought that this may be due to the less
drastically sloped cutoff of the digital filter and the reduced
disturbances introduced in the audible pass band.
The present work shows that the bandwidth requirement
into the ultrasonic range is more fundamental and not just
due to artifacts of digital filtering. It is also commonly
conjectured in the audio literature that the time-domain
response of a system (e.g., temporal smearing caused by
capacitive and other energy-storage mechanisms in cables)
is a key factor in determining the transparency of reproduction
(see, for example, van Maanen, 1993). However a
search of the literature revealed an absence of a controlled
blind experiment comparable to the one conducted here.
The present work thus contributes toward a better fundamental
understanding and provides a quantitative measure
for audio-reproduction standards."
"Digital audio recording: In my papers, statements related to "consumer audio" refer to CD quality, i.e., 16 bits of vertical resolution and a 44.1 kHz sampling rate (when the work for these papers was begun around 2003, 24bit/96kHz and other fancier formats were not in common use in people's homes for music reproduction). For CD, the sampling period is 1/44100 ~ 23 microseconds and the Nyquist frequency fN for this is 22.05 kHz. Frequencies above fN must be removed by anti-alias/low-pass filtering to avoid aliasing. While oversampling and other techniques may be used at one stage or another, the final 44.1 kHz sampled digital data should have no content above fN. If there are two sharp peaks in sound pressure separated by 5 microseconds (which was the threshold upper bound determined in our experiments), they will merge together and the essential feature (the
presence of two distinct peaks rather than one blurry blob) is destroyed. There is no ambiguity about this and no number of vertical bits or DSP can fix this. Hence the temporal resolution of the CD is inadequate for delivering the essence of the acoustic signal (2 distinct peaks). However this lack of temporal resolution regarding the acoustic signal transmission should not be confused with the coding resolution of the digitizer, which is given by 23 microseconds/2^16 = 346 picoseconds. This latter quantity has no direct bearing on the system's ability to separate and keep distinct two nearby peaks and hence to preserve the details of musical sounds. Now the CD's lack of temporal resolution for complete fidelity is not systemic of the digital format in general: the problem is relaxed as one goes to higher sampling rates and by the time one gets to 192 kHz, the bandwidth and the ability to reproduce fine temporal details is likely to be adequate. I use the word "likely" rather state definitely for two reasons. In our research we found human temporal resolution to be ~5 microseconds. This is an upper bound: i.e., with even better equipment, younger subjects, more sensitive psychophysical testing protocols, etc., one might find a lower value. The second reason to not give an unambiguous green signal to a particular sampling rate is that the effective bandwidth that can be recorded is less than the Nyquist frequency because of the properties of the anti-aliasing filter, which is never perfect in real life. One more thing I want to add is that one forum poster inquired whether the blurring is an analog effect and not a digital one (“… this isn't a sampling-rate issue, it's a simple question of linear filtering…"). But the two are not separate. While it is true that the smearing may take place in the analog low-pass filter circuitry before the signal reaches the ADC, the low-pass filter cutoff is dictated directly by the sampling rate. 
The exact amount of smearing and other errors will depend on the slope and other details of the filter, but the big-picture conclusion is still the same."
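The arithmetic quoted in the FAQ paragraph above can be reproduced with a few lines. This is only a check of the stated figures; the variable names are illustrative, and the "coding resolution" expression is the FAQ's own definition, reproduced here without endorsing it.

```python
FS_HZ = 44_100   # CD sampling rate
BITS = 16        # CD word length

sampling_period_s = 1.0 / FS_HZ                        # quoted as "~ 23 microseconds"
nyquist_hz = FS_HZ / 2.0                               # quoted as 22.05 kHz
coding_resolution_s = sampling_period_s / 2 ** BITS    # quoted as 346 picoseconds

if __name__ == "__main__":
    print(f"sampling period : {sampling_period_s * 1e6:.2f} us")
    print(f"Nyquist         : {nyquist_hz / 1e3:.2f} kHz")
    print(f"period / 2^16   : {coding_resolution_s * 1e12:.0f} ps")
```

Note that the 346 ps figure follows from the unrounded period (about 22.68 us), not from the rounded 23 us.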
"As has been noted in the
literature [4], there have also been anecdotal claims by
listeners that an improvement in fidelity can be noticed
for sampling rates in excess of the 44.1 kHz sampling rate
of the digital compact disk (CD) even though the listeners
cannot hear pure tones above the 22 kHz Nyquist
frequency. Such subtle effects may be masked in average
mass-produced commercial audio systems and audiometric
apparatus used in psychoacoustic research, because of
the limited resolution of the equipment—the bottleneck
then arises from the limitations of the apparatus rather
than the ear."
Tony Lauck
"Diversity is the law of nature; no two entities in this universe are uniform." - P.R. Sarkar
"For CD, the sampling period is 1/44100 ~ 23 microseconds and the Nyquist frequency fN for this is 22.05 kHz. Frequencies above fN must be removed by anti-alias/low-pass filtering to avoid aliasing. While oversampling and other techniques may be used at one stage or another, the final 44.1 kHz sampled digital data should have no content above fN. If there are two sharp peaks in sound pressure separated by 5 microseconds (which was the threshold upper bound determined in our experiments), they will merge together and the essential feature (the
presence of two distinct peaks rather than one blurry blob) is destroyed. There is no ambiguity about this and no number of vertical bits or DSP can fix this. Hence the temporal resolution of the CD is inadequate for delivering the essence of the acoustic signal (2 distinct peaks). However this lack of temporal resolution regarding the acoustic signal transmission should not be confused with the coding resolution of the digitizer, which is given by 23 microseconds/2^16 = 346 picoseconds. This latter quantity has no direct bearing on the system's ability to separate and keep distinct two nearby peaks and hence to preserve the details of musical sounds."
That quote came from the FAQ Prof. Kunchur wrote as a reply to the questions and comments he got on the three published papers. It contains some interesting things. Let's go through them.
"While oversampling and other techniques may be used at one stage or another"
Oversampling techniques *must* be used in order to get correct reconstruction. This is not optional, and this should not be dismissed.
"the final 44.1 kHz sampled digital data should have no content above fN."
That data cannot have content above fN. Once sampled, fN is 'infinity' (or rather, the edge of the circle) and there is nothing outside/above it.
Methinks the two above quotes are rather unscientific in their formulation.
"If there are two sharp peaks in sound pressure separated by 5 microseconds (which was the threshold upper bound determined in our experiments), they will merge together "
That two such peaks would merge together with 44.1kHz sampling (or rather, as a result of the anti-aliasing filtering prior to the deed
of sampling) is obvious. We don't need any scientific publications for that.
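For anyone who wants to see the merging rather than take it as obvious, here is a minimal sketch. It assumes an ideal brick-wall low-pass at 22.05 kHz (real anti-alias filters are not ideal, so this is an idealisation) and counts the main peaks that remain after two unit impulses pass through it. All names are made up for illustration.

```python
import math


def sinc(x):
    """Normalised sinc, sin(pi x) / (pi x)."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)


def bandlimited_pair(t, separation_s, cutoff_hz):
    """Two unit impulses `separation_s` apart after an ideal low-pass (unscaled)."""
    half = separation_s / 2.0
    return sinc(2.0 * cutoff_hz * (t + half)) + sinc(2.0 * cutoff_hz * (t - half))


def count_main_peaks(separation_s, cutoff_hz=22_050.0, span_s=100e-6, step_s=0.1e-6):
    """Count local maxima that reach more than half of the highest value."""
    n = int(round(span_s / step_s))
    ys = [bandlimited_pair(i * step_s, separation_s, cutoff_hz) for i in range(-n, n + 1)]
    top = max(ys)
    return sum(
        1
        for i in range(1, len(ys) - 1)
        if ys[i] > ys[i - 1] and ys[i] >= ys[i + 1] and ys[i] > 0.5 * top
    )


if __name__ == "__main__":
    print(count_main_peaks(5e-6))    # impulses 5 us apart
    print(count_main_peaks(60e-6))   # impulses 60 us apart, for comparison
```

With impulses 5us apart a single peak survives; at 60us two peaks remain. This only illustrates the filtering, not whether the merging is audible.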
"(which was the threshold upper bound determined in our experiments)"
Er. NO.
The experiments were about a 5us delay in one source of a continuous, periodic 7kHz signal, and another about a 5us time constant in a first-order low-pass filter, again for a continuous and periodic signal. There were no experiments involving 'sharp peaks separated by 5us'. The auditory relevance of 44.1kHz not resolving two such peaks was not proven at all. That proof was not even on the agenda.
So
"Hence the temporal resolution of the CD is inadequate for delivering the essence of the acoustic signal (2 distinct peaks)."
is a very questionable claim (the jump), at least under the experimental evidence published in the three papers.
"this lack of temporal resolution regarding the acoustic signal transmission should not be confused with the coding resolution of the digitizer, which is given by 23 microseconds/2^16 = 346 picoseconds. "
It is a fact of nature that the spatio/temporal resolution(*) of a correctly band-limited sampled system is far finer than the actual spatio/temporal sample period. Many advanced systems operate according to this principle. In the awful wars JJ quoted cellphones, digital TV, and modems. I add the optical attitude control systems of spacecraft to the list. The last time I checked most satellites presently in orbit knew more or less where they were, so it must be working...
(* Meaning the accuracy with which the position of a signal event on a time or spatial axis can be determined.)
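A minimal sketch of that principle, under stated assumptions: a 7kHz sine delayed by 5us is sampled at 44.1kHz, rounded to 16 bits, and the delay is then recovered from the phase of the 7kHz component. The block length of 441 samples is chosen so that it holds exactly 70 periods; the function names are hypothetical.

```python
import math

FS_HZ = 44_100     # CD sampling rate
F0_HZ = 7_000.0    # test tone
N = 441            # 441 samples hold exactly 70 periods of 7 kHz at 44.1 kHz


def sample_tone(delay_s, bits=16):
    """Sample sin(2*pi*F0*(t - delay)) and round to signed `bits`-bit integers."""
    full_scale = 2 ** (bits - 1) - 1
    return [
        round(full_scale * math.sin(2.0 * math.pi * F0_HZ * (n / FS_HZ - delay_s)))
        for n in range(N)
    ]


def estimate_delay(samples):
    """Recover the delay from the phase of the F0 component (single-bin DFT)."""
    w = 2.0 * math.pi * F0_HZ / FS_HZ
    i_sum = sum(x * math.cos(w * n) for n, x in enumerate(samples))
    q_sum = sum(x * math.sin(w * n) for n, x in enumerate(samples))
    # For x = sin(w n - phi): i_sum is proportional to -sin(phi), q_sum to cos(phi).
    phase = math.atan2(-i_sum, q_sum)
    return phase / (2.0 * math.pi * F0_HZ)


if __name__ == "__main__":
    estimate = estimate_delay(sample_tone(5e-6))
    print(f"estimated delay: {estimate * 1e6:.4f} us")
```

The recovered delay lands within a nanosecond of 5us, although the sample period is about 22.7us. This shows timing accuracy for a band-limited signal; it does not address whether two separate peaks 5us apart stay distinct.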
bring back dynamic range
I am not interested in debating with people who misconstrue and misinterpret remarks for the purpose of discrediting people. I'm not saying that this applies to you, but from reading your post it might. More I will not say, because I don't want to further escalate the situation.
Tony Lauck
"Diversity is the law of nature; no two entities in this universe are uniform." - P.R. Sarkar
I just want to see the link between the experiments that have been done,
and the aural relevance of two sharp pulses spaced 5us apart. That is a valid question.
bring back dynamic range
Werner wrote:
Methinks the two above quotes are rather unscientific in their formulation.
The nits you pick at are undoubtedly present (i.e. some might find the text a little bit ambiguous, especially if they are so minded) but to complain of Kunchur’s writing in this context is a bit like complaining that, OK, that Stephen Hawking fellow is fine at physics and all that - but he’s shite at football. I don’t think the meaning is that hard to fathom.
We don't need any scientific publications for that.
The point was made in a note aimed at audio forum members, not in a scientific publication.
There were no experiments involving 'sharp peaks separated by 5us'. The auditory relevance of 44.1kHz not resolving two such peaks was not proven at all. That proof was not even on the agenda.
I don’t read Kunchur’s “FAQ” as suggesting that the point has been “proved”, rather that it is a logical consequence of his results. You suggest as much in the previous paragraph of your post (“We don't need any scientific publications for that”).
Again, you seem to be complaining of possible ambiguities in a text aimed at a lay audience rather than examining the experiments themselves. The latter strikes me as a more fruitful route. I don’t want to get into “quote swapping” but in the paper describing the low-pass filtering experiment, the phenomenon is described thus (“Procedure”, para 1):
The control tone was perceived to have a sharper or brighter timbre whereas the filtered one had a duller quality (no difference in loudness was perceived except for the largest setting of t=30 μs).
Note that this is for differences in a stimulus (input signal) that are so minute that they had previously been dismissed by one and all as completely imperceptible. No one in the field had even felt the need to construct apparatus competent to measure them.
Now that their perceptibility has been established, we can agree that RBCD recordings are inherently unable to reproduce that level of detail. A technology now 30 years old does not, in fact, capture “perfect sound forever”. It’s good - but it’s not perfect.
Kunchur points that out and all hell breaks loose. He doesn’t suggest that we have to dump our CD collections. My DAC samples at 44.1 kHz only but I’m not planning to throw it in the sea (certainly not at that price). And so on.
But I can readily see how these findings provide strong support for those who consistently report that better quality sound is provided by good recordings made at higher resolutions.
It seems almost self-evident that the ultra-fast rise times encountered in percussive transients, the female voice and so on push RBCD to its limits. I just cannot see what the fuss is about.
I’m sorry but I don’t understand your points about cellphones and about satellites not getting lost or why the points are relevant. Nor do I know who JJ is. All that’s probably my fault.
Dave
"Again, you seem to be complaining of possible ambiguities in a text aimed at a lay audience rather than examining the experiments themselves. The latter strikes me as a more fruitful route"
I have been reading and rereading the two initial papers, with the two experiments, for about a month now.
The papers describe experimental results that, assuming no mistake was made, indicate that:
1) people can discern when two vertically stacked loudspeakers replaying a continuous 7kHz squarewave are misaligned in depth down to a distance of 2mm (~5us)
2) ... when a monophonic headphone feed of a continuous 7kHz triangle is first-order low-pass filtered with a time constant down to 5 us.
Kunchur claims that all trivial mechanisms for distinction (i.e. those presently known in auditory perception) have been accounted for and that each was found to be below the JND threshold at the respective levels and frequencies. In other words: differences are detected, and this through a hitherto unknown mechanism.
Fine.
These two items, and only these two, are the basis of all further argument.
Now please tell me: how does it follow from 1) and/or 2) that a 44.1kHz sampling system is insufficient?
And tell me further: how does it follow from 1) and/or 2) that an audio coding system has to be capable of keeping separated two short impulses spaced 5us apart in time?
These are two claims made in the papers and/or the later FAQ, but I fail to see the connection, I fail to see how this is 'a logical consequence of his results' when the experiments did not involve a 44.1kHz sampling system, did not involve bandwidth limiting of the stimuli, and did not involve pulses spaced 5us apart.
This is a sincere question. Maybe I don't see the connection because
I'm stupid, ignorant, or wrongly educated. I'd like to know.
--
As for the FAQ being for laypeople: inaccuracy in terminology never ever brings any benefit.
bring back dynamic range
The relevance of the two pulse experiment is to the necessity of precise speaker alignment. (From the anecdotal story, this is what started Kunchur down his research path.)
The square wave test shows that a band limited 20 kHz channel is not aurally transparent. This would be a conclusive proof that a 40 kHz sampling rate is not adequate. By itself it is not conclusive proof that a 44.1 kHz LPCM system is not transparent, but it strongly suggests it. What it definitely does is prove that the bandwidth of the human hearing system is not accurately characterized by the detection threshold of high frequency sine waves. Since it is these numbers that are frequently used to "prove" that 44.1 kHz suffices, what Kunchur has done is to refute these proofs. Had Kunchur used a test frequency of 7351 Hz instead of 7000 Hz then he would have conclusively proven that 44.1 kHz LPCM is not adequate (subject of course to errors in experiment design, execution and analysis).
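The choice of 7351 Hz comes down to where the third harmonic falls relative to the Nyquist frequency. A minimal sketch, assuming an ideal square wave (odd harmonics only) and treating everything at or above half the sampling rate as removed; the function name is made up for illustration:

```python
def square_wave_harmonics_in_band(f0_hz, fs_hz):
    """Odd harmonics of an ideal square wave that lie below fs/2."""
    nyquist_hz = fs_hz / 2
    harmonics = []
    k = 1
    while k * f0_hz < nyquist_hz:
        harmonics.append(k * f0_hz)
        k += 2  # a square wave has only odd harmonics
    return harmonics


if __name__ == "__main__":
    for f0, fs in [(7000, 44100), (7351, 44100), (8000, 44100), (7000, 40000)]:
        print(f0, fs, square_wave_harmonics_in_band(f0, fs))
```

At 7000 Hz the 21 kHz harmonic still fits under 22.05 kHz, so a 44.1 kHz system can in principle carry it. At 7351 Hz the third harmonic is 22053 Hz, just above the limit, and only the fundamental survives, as is also the case for 7000 Hz at a 40 kHz sampling rate.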
Informally, Kunchur has demonstrated that simplistic models of human hearing are not adequate to describe reality, and therefore that simplistic engineering approximations in the design of audio equipment may not be appropriate when designing the highest quality equipment.
Tony Lauck
"Diversity is the law of nature; no two entities in this universe are uniform." - P.R. Sarkar
"Had Kunchur used a test frequency of 7351 Hz instead of 7000 Hz then he would have conclusively proven that"
Then again, the very outcome of the experiments might have been different.
But IMO the experiments should be redone, at 8kHz.
bring back dynamic range
If I were a betting man, I would bet that many if not most of the subjects would pass the test at a slightly higher frequency, given that they all passed the 7 kHz test. Not sure how large a bet I would place.
Speaking as an engineer, I would agree that the test should be repeated at 8 kHz. Or perhaps 8001 Hz. :-)
Tony Lauck
"Diversity is the law of nature; no two entities in this universe are uniform." - P.R. Sarkar
"Informally, Kunchur has demonstrated that simplistic models of human hearing are not adequate to describe reality, and therefore that simplistic engineering approximations in the design of audio equipment may not be appropriate when designing the highest quality equipment."
More accurately, Kunchur has demonstrated that one simplistic model of human hearing yields different results than another simplistic model of human hearing. What he has NOT demonstrated, what his tests do not even imply, is anything meaningful about the complex models that are humans listening to real music through real, complex, audio reproduction systems.
That requires controlled, unsighted listening tests.
P
And what is an even greater leap, stretch, whatever...is the idea put forth in this thread that this study provides evidence, much less proof, that redbook is inadequate, or even distinguishable from higher rez formats. There are no audio systems in this study, not even any complete speaker systems, no music. This provides evidence that listeners can hear a variance in alignment between two identical drivers playing a square wave. Nothing more.
All the rest is wild speculation fueled by wishful thinking. I'm not saying there is no audible difference between redbook and hi-res. I'm just saying that this study takes us no closer to settling that debate.
This one, while it may make many uncomfortable, does:
http://www.google.com/url?sa=t&source=web&ct=res&cd=5&url=http%3A%2F%2Fdrewdaniels.com%2Faudible.pdf&ei=OrJxStOCPJOxtgf0-pWNBA&usg=AFQjCNHv8OtMKvwB9x8GX78tugq--0IChw
Multiple systems, multiple rooms, many listener groups of varying expertise and "listening skills," hundreds of trials...and relatively the same results over and over again. Though I'm sure we'll find reasons to invalidate its conclusions. We always do.
P
nt
The situation would have been quite simple if Kunchur had used an 8 kHz square wave or if the PCM channel were band limited to 20 kHz.
It's not so simple, but he has established that previous measurements of the bandwidth of human hearing based on sine waves do not properly characterize the actual bandwidth of human hearing.
Tony Lauck
"Diversity is the law of nature; no two entities in this universe are uniform." - P.R. Sarkar