Audio Asylum Thread Printer Get a view of an entire thread on one page |
In Reply to: Some comments posted by David Aiken on March 13, 2005 at 13:18:35:
David,
Great points, all.
> Well, how about compression of drivers and the effect of that on transients and dynamics? There's also the speed that some components respond with and the effect of that on transient/dynamic performance but that could be considered a time-based error. <
But every one of those is easily measured using the four parameters I defined! Everything you mentioned will show up clearly in simple distortion and frequency response tests.
> Sorry but we don't always have the tools and models. <
Sure we do. You can easily invert a copy of the input to any device under test, combine it with the output and null out the original, and all that remains is the difference. Regardless of what effect caused the difference, it's trivial to identify the artifact completely and unambiguously.
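The null test described above can be sketched in a few lines. This is only an illustration under stated assumptions: `hypothetical_device` is a made-up stand-in that adds a small cubic nonlinearity, and it has unity gain and no delay, so no level matching or time alignment is needed. A real null test would need both before the subtraction means anything.

```python
import math

def null_test(input_signal, output_signal):
    """Invert the input, sum it with the output, and return what remains."""
    return [out + (-inp) for inp, out in zip(input_signal, output_signal)]

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def level_db(samples, reference):
    """RMS level of samples relative to reference, in dB.
    Undefined (math domain error) if samples are all zero, i.e. a perfect null."""
    return 20.0 * math.log10(rms(samples) / rms(reference))

SAMPLE_RATE = 48000   # Hz, assumed
TONE_HZ = 1000.0      # test tone, assumed
NUM_SAMPLES = 4800    # 0.1 s, a whole number of cycles at 1 kHz

test_tone = [math.sin(2 * math.pi * TONE_HZ * n / SAMPLE_RATE)
             for n in range(NUM_SAMPLES)]

def hypothetical_device(x):
    # Assumption for illustration only: unity gain plus a tiny cubic term.
    return x + 0.001 * x ** 3

device_output = [hypothetical_device(x) for x in test_tone]
residual = null_test(test_tone, device_output)
residual_db = level_db(residual, test_tone)
print(f"Residual level relative to input: {residual_db:.1f} dB")
```

Whatever the device did to the signal ends up in `residual`, whether it is distortion, noise, or a frequency response error. The sketch says nothing about whether a residual at a given level is audible.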
> take a look at some of John Atkinson's comments about the lack of fit with his test measurements and the subjective reviewer's comments in Stereophile <
Without repeatability in the form of double-blind test results, all subjective opinion is meaningless and worthless. Now, if some of the things people claim to hear but that can't be measured were identified repeatably in a proper test, I'd gladly reconsider. As far as I know this has not yet happened. And there's a reason this has not yet happened.
Follow Ups:
...is your mantra, "Without repeatability in the form of double-blind test results, all subjective opinion is meaningless and worthless."?
In your exchange above with David McGown...
"David,
> I prefer to understand how and why audio equipment and associated tweaks work, and apply a bit of scepticism where it doesn't make sense to me. <
Exactly, me too."
...you seemed a bit more resilient.
"Without repeatability in the form of double-blind test results, all subjective opinion is meaningless and worthless."
Come on, Ethan. That's more than a little arrogant.
One of the reasons such opinion is not meaningless and worthless is that it is a large part of what helps us all choose components. Are you really suggesting we make all purchasing choices simply on the basis of test results? That would be a little difficult when there are not independent test results available for every product and there are often inconsistencies in what is tested by different testing bodies, so comparison of test results for competing products is often difficult.
The truth is that subjective opinion is not meaningless and worthless, but its value does have limitations and a place. You can't make a sweeping statement like that and expect to be taken at face value.
But the real issue here is that you made that comment in response to a remark I made about your claim that we can correlate listener experience with objective test results and the subjective opinions I was referring to were those JA mentions in his test results and tries to correlate with his results, sometimes quite successfully and sometimes not.
You were the one to say that the correlation can be made, a statement you made without qualification. Now you say that "all subjective opinion is meaningless and worthless". Does that mean that you now accept that the correlations can't be made?
David,
> Are you really suggesting we make all purchasing choices simply on the basis of test results? <
No, of course not. Although I didn't say it, I assumed it was obvious that I was referring to tweaks and other changes that are not blatantly obvious. Anyone can hear the difference between a Bose Wave Radio and a pair of B&W loudspeakers. Where proper testing is needed is when comparing things that are always hotly disputed, such as the "improvement" from bi-wiring or a replacement power cord.
Fine, but as I said my comment was in relation to your statement in response to David McGown's question which stated "…do we have the tools and models to measure the right things and correlate them with what we perceive?" in which you said "Without repeatability in the form of double-blind test results, all subjective opinion is meaningless and worthless."
I'm not suggesting that there should be objective data to correlate with every perception - some perceptions ARE mistakes. And I'm not suggesting that there should be a perception to relate to everything measurable - some measurements indicate effects below the threshold of audibility.
But if, in the area where things are clearly audible and perceptions aren't generally disputed (which is a pretty big area in its own right), we can draw reliable correlations between perception and test data — and you said that we could — then I'd say that's a reasonable indication that some subjective perceptions are not meaningless nor worthless. If the data does allow us to infer what the associated subjective perceptions will be, then those subjective perceptions will also allow us to reliably infer what the tests will indicate.
I don't think you can have it both ways. If the objective data is meaningful and has worth, and if genuine and reliable correlations can be drawn between some objective data and some perceptions, then I'd say that was a more than fair indication that those particular perceptions were both meaningful and of worth.
And the fact is that we neither ask for, nor think we need, appropriate test data to support every perception we have nor the opinions we form from them. We happily accept most of our perceptions and the opinions that form as a result without question and without being questioned. The perceptions and opinions we question are only a very small proportion of the perceptions and opinions we have and form every day, and the proportion that are wrong is even smaller still. You are the person who chose to put the "all" into "Without repeatability in the form of double-blind test results, all subjective opinion is meaningless and worthless" and it is simply because of that choice that the statement is false.
Without repeatability, some perceptions definitely aren't to be trusted, but that's a long way from not trusting all perceptions. If you really want to claim that ALL subjective perceptions and any opinions we form as a result of those perceptions are meaningless and worthless, and I really do mean *ALL* here, that means that you assign no meaning or value to any perception or opinion at all that you have, so you don't give any importance at all to the fact that you perceive and believe yourself to be crossing the street, and that you simultaneously perceive and believe that a motor vehicle is about to strike you, or the fact that these two meaningless and worthless subjective perceptions and opinions formed as a result are quickly followed by another equally meaningless and worthless subjective perception of pain plus an opinion which says "this hurts", both of which immediately precede the cessation of all meaningless and worthless subjective perceptions and opinions whatsoever that comes with death.
And that little story is the ultimate argument about why we should never regard all subjective perceptions and opinions as meaningless and worthless, and why we also shouldn't bother to wait for double blind test data to support everything we perceive and believe.
And all of that comes from your choice to include the word "all" in your claim.
At a more meaningful level I would hesitate to suggest that any perception is meaningless and worthless because, even if it isn't a perception of a genuine effect (in other words, even if the perception is mistaken), the perception still indicates something that can be of genuine interest like the fact that a mistake has been made so there may still be some meaning and worth to the perception. In other words, subjective perceptions that are correct have the meaning and value we normally assign to correct perceptions, and subjective perceptions that are wrong may well find very useful meaning and value as data leading to our understanding of how errors of perception occur.
The problem is you're trying too hard to debunk some things and you're allowing your desire to do that to lead you into overstating your case by making extreme and universal statements that simply aren't universally true. They need to be qualified but if YOU don't qualify them when you make them, that makes your statement false and then the things that you want to assert as a result of those statements lose their support. You destroy your own argument.
I've got no problems with what you're trying to achieve. It's the way you're going about it that I'm criticising because it simply doesn't work. You're just ensuring the arguments you mount are logically invalid because you're poorly framing the premises on which your conclusions rest. Scientific proofs do have to be logically valid as well as having a basis in reliable observations and existing accepted theory and you're ruining the logical part of your efforts.
David,
I agree with much of what you wrote. But what about this?:
> I would hesitate to suggest that any perception is meaningless and worthless <
Okay, so suppose ten people claim to hear a real improvement from that $485 replacement volume knob. How much credibility do you give the idea that this knob really makes a difference?
As you said elsewhere (paraphrasing), it's a waste of time to test tweaks that appear to have no rational basis.
As I said, I hesitate to say that any perception is meaningless and worthless and I went on to say "even if it isn't a perception of a genuine effect (in other words, even if the perception is mistaken), the perception still indicates something that can be of genuine interest like the fact that a mistake has been made so there may still be some meaning and worth to the perception."
So is there anything of genuine interest in ten people claiming to hear a real improvement from that $485 replacement volume knob? There may be - it might convince someone else that there's a market in $50 replacement knobs.
OK, that's being facetious but I couldn't resist it :-) It may be useful info in the context of belief patterns associated with expenditure but I can't see much other meaning in it unless of course there's some reason for it actually producing a difference.
The only ways I could even imagine it causing a difference are that either there is some resonance associated with the original knob which is removed when the replacement is installed, or the replacement knob introduces a resonance which listeners find 'euphonic'. Nothing else makes sense. If I had one of that manufacturer's amps/preamps, what I'd be tempted to do is to test out the first of those alternatives by damping the original knob in some way - a couple of suitably sized broad rubber bands might do the trick nicely at the cost of a couple of cents - and see if I noticed a difference. If I did and I thought it sounded better I might even leave the rubber bands in place as a no brainer: low cost, perceived improvement, who gives a damn whether it really does or does not make a difference but it could be classified as a 'feel good' item. If the bands made no difference and it was looking like the knob might induce euphonic colouration of some kind, I'd simply forget it. I don't like colouration so I don't want to add euphonics if I can avoid it, and $485 worth of 'improvement' would need to be an awful lot of euphonics - definitely not my cup of tea.
I suspect if it really is doing something, it's the second alternative and it's adding some resonance of its own, as certain other fancy wood blocks and pucks are claimed to do. Since I've actually watched a classical guitar and lute maker at work, heard the differences in tone that different choices in wood can make, owned one of his instruments and experienced 'burn in' on a brand new instrument freshly strung and played for the first time (a phenomenon definitely due in part to getting the new strings stretched and up to the right tension - something that you notice for a brief period every time you restring a stringed instrument but also in part I believe to the wood going under tension for the first time and settling under tension), resonance from wood is something I can give some credence to. Even so, I'm not into adding colouration to reproduction. I think using different wood for tonal control is fine in an instrument, after all why shouldn't a performer choose the tonal colours they want in the music they play. When it comes to reproduction, however, I'd prefer not to mask the tonal colours chosen by the performers whose recordings I play with some other set of tonal colours that prevents me from hearing what's on the record. On the other hand, there are people out there who disagree with me totally on that point.
And that's being about as pragmatic about the idea of a knob making a difference as I can be. Run a very quick and dirty test out of sheer general interest to see whether the more interesting of my 2 guesses might have something in it, otherwise forget it. I don't think it's worth doing a scientifically conducted listening test - there's really nothing of scientific interest in finding out either that something can resonate and induce a colouration in reproduced sound that people like, or that people can fool themselves about expensive knobs. The first of those options isn't going to help anyone produce more accurate sound reproduction or something else of genuine intrinsic value, there's piles of people making that claim anyway, and it's easy and cheap for anyone to experiment with and reproduce themselves if they wish. There's really nothing new there at all. The second amounts to just another bit of trivia.
And I do have to say that I think the knob has a bit more going for it than the intelligent chip because I can think of two potential mechanisms by which it might actually do something - I can't think of any potential mechanisms for the chip.
David Aiken
David,
> The only ways I could even imagine it causing a difference is that there is either some resonance associated with either the original knob and that is removed when the replacement is installed, or the replacement knob introduces a resonance which listeners find 'euphonic'. Nothing else makes sense. <
But even that doesn't make sense because a knob that resonates is not in the signal path.
> I've actually watched a classical guitar and lute maker at work, heard the differences in tone that different choices in wood can make <
Absolutely. But those are not only "in" the signal path, they are the signal.
> experienced 'burn in' on a brand new instrument <
Yes, but this is not "burn in" in the same sense that audiophiles mean. Wood changes over time - mostly its moisture content - and there's no doubt this affects the sound of an acoustic instrument. But a solid state amplifier is not a musical instrument. Note I qualify this with "solid state" because tubes are known to be microphonic. (In a bad way, I might add.)
> I can't think of any potential mechanisms for the chip <
Nor can I. So now we're back to the original question: How do you account for people who are adamant they hear a difference after using that chip? If people are deluding themselves over the benefit of the chip, then why not other tweaks that defy all that's known about the science of audio?
BTW, thanks for your continued sane and thoughtful posts on this.
Ethan,
It's so much simpler to be back discussing a specific tweak :-)
"But even that doesn't make sense because a knob that resonates is not in the signal path."
Not sure that it isn't in the path. If one considers room surfaces as within the path, then the surface of the knob is a minor room surface…
But I know what you mean and I'm not sure that being in the signal path in that way is necessary. We know that unused musical instruments in a room can resonate when music is played in a room, and a Helmholtz resonator surely does. If something in the room resonates loudly enough, then it can affect the sound.
How loudly would it have to resonate? Well, at least as loudly as any overtone of an instrument/voice in a recording that's contributing to the tonality of the instrument/voice as we hear it. That means it doesn't have to be anywhere near as loud as the fundamental note in order to produce colouration since some overtones which contribute to the tonality of an instrument are quite a bit down in level from the fundamental (at extremely low frequencies, some overtones are also higher in level than the fundamental).
So, can a knob resonate that loudly? I don't know and it may well depend on the wood used and the frequency content of the music being played. At least that's a nice, clear cut and practical question for someone to attempt to answer and, if the answer is 'yes', then there's definitely nothing mysterious or unexplained about the phenomenon.
As to how I account for people hearing differences with the chip? Well, if it really doesn't do anything as we suspect, there are at least three possible explanations I can think of. The first is the one that always gets trotted out - people believe there's a difference because they want there to be one. I've got no problem with that phenomenon as a cause - I just don't think that account will explain every report. I think some reports may well result from the 'peer pressure' which can occur when someone listens in a group with a number of others who are convinced that it causes a difference, and allows themselves to be convinced by the others rather than their ears. I think some people may simply make a mistake listening to unfamiliar music on unfamiliar systems in a poor environment at a show. Tiredness and fatigue may also simply contribute to people making a mistake.
There are quite a few reasons for people making errors of perception and I think it is wrong simply to assume that everyone who does is gullible. That view simply doesn't have credibility for me.
And another reason for rejecting gullibility as a reason for everyone who reports hearing a difference doing so is that if that's how we explain that phenomenon, the 'inverse' of that explanation is going to end up being our single explanation for why some people don't report a difference when there genuinely is one - they don't report one because they simply don't believe in it. We have no worries rejecting that view - some people simply may not have the hearing acuity required, especially those with hearing impairments of some kind; test circumstances could have partially masked the difference; some people hesitate to report a difference they're not certain about; and once again simple perceptual error due to factors like tiredness and fatigue.
So, to my question.
Why is it that as a group we seem more willing to give the benefit of the doubt to people who fail to hear a difference that is there, writing that off as honest mistakes of one kind or another, than we are to those who think they do hear a difference that isn't there? Now there is a really interesting question which I think is critical to understanding why tweakers seem to excite such strongly antagonistic responses. I accept that there are going to be some tweaks where gullibility is much more of an issue than with others, but many tweaks don't fall into that category and many 'critics' simply can't see that there is a huge difference between how one considers a tweak like the chip and a tweak like acoustical treatment of a room or even just sitting on a higher or lower chair and changing one's position in relation to the speaker's driver array. Given the nature of the extremes in any individual difference that exists among humans, I have a sneaking suspicion that there may be as many people out there who are willing to dismiss any tweak out of hand, regardless of its basis, as there are people willing to accept any tweak out of hand, regardless of its basis. That just seems to be the nature of human difference - a relatively symmetrical bell curve distribution either side of the mean. Why shouldn't it apply here as well and, if it does, why are we generally more tolerant of people at the extreme on one side of the curve than we are of those at the extreme on the other side?
David,
> the surface of the knob is a minor room surface <
Yeah, really REALLY minor! And it's not even in the path of the sound waves.
> The first is the one that always gets trotted out - people believe there's a difference because they want there to be one. <
Actually, that's farther down my list. I think in many cases it's simply because human perception is so frail, and auditory memory is so short. I already mentioned being in a different state of mind after crawling around on the floor to change speaker wires. And if I didn't mention this before I will now: When you get up to change a cable and sit down again, unless you sit in the exact same place - within half an inch - the frequency response can be different because of comb filtering at the listening position.
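The comb filtering point can be made concrete with a deliberately simplified model: direct sound plus a single reflection. The figures here are assumptions for illustration, not measurements: a speed of sound of 343 m/s, a reflection at half the amplitude of the direct sound, a 1 m path difference, and a half-inch (0.0127 m) move that changes the path difference by the same amount. Real rooms have many reflections and the geometry rarely maps one-to-one like this.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room temperature

def comb_gain_db(freq_hz, path_difference_m, reflection_gain=0.5):
    """Level (dB re. direct sound alone) of direct sound plus one delayed reflection."""
    delay_s = path_difference_m / SPEED_OF_SOUND
    phase = 2 * math.pi * freq_hz * delay_s
    real = 1.0 + reflection_gain * math.cos(phase)
    imag = -reflection_gain * math.sin(phase)
    magnitude = math.hypot(real, imag)
    return 20 * math.log10(max(magnitude, 1e-12))  # guard against log of zero

def notch_frequencies(path_difference_m, count):
    """First `count` cancellation frequencies: odd multiples of c / (2 * d)."""
    return [(2 * k + 1) * SPEED_OF_SOUND / (2 * path_difference_m)
            for k in range(count)]

# Hypothetical seats: path difference of 1 m, then 1 m plus half an inch.
seat_a = 1.0
seat_b = 1.0 + 0.0127

# Pick the notch near 10 kHz for seat A and compare the two seats there.
probe_hz = notch_frequencies(seat_a, 30)[-1]
gain_a = comb_gain_db(probe_hz, seat_a)
gain_b = comb_gain_db(probe_hz, seat_b)
print(f"{probe_hz:.1f} Hz: seat A {gain_a:.1f} dB, seat B {gain_b:.1f} dB")
```

In this toy model the lowest notches barely move, but the notch spacing is fixed while each notch shifts in proportion to its frequency, so by 10 kHz a half-inch change can turn a dip into a peak.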
> Why is it that as a group we seem more willing to give the benefit of the doubt to people who fail to hear a difference <
That's not my experience at all. I hear tweakers all the time accuse others of having a tin ear, or not being experienced enough to hear some of the small (but of course real) improvements they can hear.
Ethan,
"> the surface of the knob is a minor room surface <
Yeah, really REALLY minor! And it's not even in the path of the sound waves."
Not in the path of the sound waves!!!! Everything in the room is eventually in the path of the sound waves. Whether or not it's in a direct path depends solely on where you site the amp in relation to the speakers and if it isn't in the direct path, it's going to be in the reflected path. There are no points in a room in contact with air that are not impinged upon by sound waves.
We'll have to break you of that habit of leaping to unfounded universal statements. :-)
I didn't say it wasn't a minor surface, and I didn't say categorically that the knob resonated. I said that resonance was the only conceivable explanation I could conceive and that, if that turned out to be the case and the resonance was strong enough, there would then be nothing mysterious about the reported perceptions of a difference between the stock knob and the wooden knob.
"> The first is the one that always gets trotted out - people believe there's a difference because they want there to be one. <
Actually, that's farther down my list. I think in many cases it's simply because human perception is so frail, and auditory memory is so short. I already mentioned being in a different state of mind after crawling around on the floor to change speaker wires. And if I didn't mention this before I will now: When you get up to change a cable and sit down again, unless you sit in the exact same place - within half an inch - the frequency response can be different because of comb filtering at the listening position."
I don't agree that "human perception is so frail". If it were, we would always be double checking our perceptions and we don't. We quite happily rely on our perceptions without questioning them for most of our waking day and they rarely let us down in normal use. That doesn't sound 'frail' to me - in fact it sounds quite robust. Yes, auditory memory is short *in some ways* but it doesn't appear to be in others. We can and do become reliably familiar with the sound of some things like the voices of our family, close friends, and people we see frequently. We can recognise their voices after long absences, even when we can't see them, and when I say long absences I mean days to weeks and longer - much longer than the few minutes researchers associate with auditory memory. And this can and does extend to the sounds emitted by objects. We don't look strangely at someone who says of their car "it doesn't sound quite normal today" or "it sounds a little different today" and give them the third degree or automatically tell them they're imagining it, but we treat similar comments about audio systems quite differently. Why?
Differences in position and factors like comb filtering won't explain why we treat audio differently. We hear the people whose voices we are familiar with in different positions in the room, at different distances, in different rooms, inside and outside, and recognition still occurs. It isn't clear to me that those sorts of factors necessarily play as big a part in mistakes about audio differences as you might think.
And I'm not trying to be funny. I'm simply making observations about what we see, hear and do in normal life - things we accept and regard as normal - and comparing them to how we treat things related to audio. Why should we treat audio differently to the rest of our lives?
As an explanation of where I'm coming from with this approach, I was very influenced by the work of JL Austin during my study of philosophy. Austin tried to resolve many questions by simply attending quite seriously to how normal people spoke about them in everyday life, rather than by how people spoke about the issue when engaged in a theoretical discussion or analysis of it. What I'm doing is simply looking at how we treat perceptions in real life - the bulk of the time when we're away from our systems - and it does appear different. We don't question the majority of our perceptions and we do accept them implicitly most of the time. We simply wouldn't get all that much done in real life if we questioned every perception and tried to check it, and we wouldn't go about our lives as naturally as we do accepting our perceptions and relying on them pretty much automatically if there was a high error rate. When we do make errors the reason is usually obvious, and we do learn from experience if we start making the same error regularly and allow for that, regardless of whether the error was obvious or not. All we seem to need to start putting in the "check it" loop is the knowledge that we've made a mistake regularly in this particular sort of situation.
Frankly, I think the whole discussion of perception of difference and making mistakes in audio is not going to progress very far at all while it is divorced so radically from our approach to perception in the rest of our lives. I am not convinced that there is something special about listening to audio and making judgements which turns that activity into a special case which requires rules and explanations which seem to be completely unique. I'm not trying to say that there aren't differences but I'm inclined to think they're more of degree rather than kind.
"> Why is it that as a group we seem more willing to give the benefit of the doubt to people who fail to hear a difference <
That's not my experience at all. I hear tweakers all the time accuse others of having a tin ear, or not being experienced enough to hear some of the small (but of course real) improvements they can hear."
Yes, tweakers do accuse others of not hearing things but take a look at the sorts of comments that the majority - the people in the middle part of the bell curve - make about those at the extremes. There does seem to be a difference in the comments made about the tweaky end and the other end for whom nothing makes a difference. The tweaky end do seem to be regarded as 'stranger'.
David,
> I said that resonance was the only conceivable explanation I could conceive <
I won't continue to argue about the audibility of a replacement volume control knob, or how much effect it may or may not have on the acoustics in a room. If your best explanation is that the knob actually is changing the sound - rather than realizing via common sense that this is clearly a case of poor auditory memory or wishful thinking - there's nothing I can say to change your mind.
Thanks.
That's not what I'm saying. As I said, ***IF*** it's doing something audible, then resonance is the only mechanism I can conceive. That's a long way from saying that it is actually changing the sound. Go back and re-read - nothing I said actually implies that it is changing the sound. It's all couched in terms of that very big "if".
It seems that when confronted with something like the knob or the chip, your immediate reaction is to decide that there is no way it can do what it claims therefore it must do nothing.
My response is pretty much to withhold judgement if there is no explanation or the claims don't make sense and there's no strong evidence that it may do something. If something genuinely does work, people don't have to get the explanation right for it to work and there are an awful lot of things that worked right for centuries while science had the wrong explanation. Cauterising wounds saved lives for well over two thousand years before the germ theory of disease came along to give us our current understanding, and some of the reasons people believed before then were really way out. What counts in determining whether or not something works is the result, not the explanation. Confusing the two will always eventually lead to mistakes. As far as the knob goes, I haven't heard comments from anyone who has tried it, or from anyone who has tried to disprove it in any way other than laughing at the claim and that form of disproof does not work. There's more than a few bits of science that we now accept which were originally laughed at.
The simple fact that you can't think of a way that something may work, or that you don't believe that it can work, is not and can never be a proof that it doesn't work. You can believe what you like, but believing for invalid or ungrounded reasons that something doesn't work is just as illogical and unreasonable as believing for invalid or ungrounded reasons that it does work. If you don't have genuinely logical and grounded reasons, and grounded reasons include a scientifically acceptable reason where one is available or pretty solid evidence that it actually does work if there isn't a currently acceptable scientific reason, then the only reasonable course is to withhold judgement instead of leaping to a conclusion.
And it's funny how people seem to leap to the conclusion that if you're not explicitly agreeing with them, you have to be explicitly agreeing with "the other side".
David, a possible pair of thought processes might go something like:
CASE 1 "He/ she doesn't quite get whatever it is, and ME, I know that there is something there to be gotten, but, as with him being a naysayer, it probably isn't worth the time or potential flame to attempt to show him the error of his way. What's the point?" Or, within your premise, "Maybe they just made an honest mistake".
CASE 2 "Oh, you bozo, how could you be so gullible/ weak/ dumb? I KNOW there is nothing there, and you are obviously a sucker/ putz/ moron, for thinking that there is."
It just seems easier to attack someone, rather than correct or educate them. Laziness? Perhaps. Also, it would seem easier to go after someone when that someone seems to be getting something when nothing is there to get. Or seemingly nothing to be got. It may be the difference between someone simply expressing a wrong opinion, (your "honest mistake") as opposed to someone who is guilty of a much more serious mistake, bordering on the criminal, that of being fooled. (Being a fool?) We always tend to ridicule those who have been "had". Additionally, there may be an element of "degrees of rightness", versus "degrees of wrongness", if you will.
I'm sure there is a basic psychological maxim or premise at work here, I just don't know what the correct term is. It is a shame that there is so much division, which always spirals downward. It seems division creates ever more division.
Now, if some of the things people claim to hear but that can't be measured were identified repeatably in a proper test, I'd gladly reconsider. As far as I know this has not yet happened. And there's a reason this has not yet happened.
Hmmm. Just "a" reason? Do you mean to say that you have conclusively excluded all other possible reasons?
se
> Without repeatability in the form of double-blind test results, all subjective opinion is meaningless and worthless. >
Without a "validation test" showing that forced-choice audio DBTs with music are as sensitive as blind ABAB testing, they are not "scientific" and in fact seem to mask small audible differences.
If you would.
People can make 2 sorts of error in a test - they may not hear a difference where one exists, or they may hear a difference where none exists. In some cases the nature of the test results in the people taking it being more prone to one of those types of error than the other. When that occurs, the test is biased towards a particular result and, if the bias is strong enough, the test can repeatedly yield the wrong result.
There has been a lot of discussion and argument here previously about whether DBT testing, and in particular ABX tests, are associated with a tendency for the people being tested to not hear differences where they exist. If that is the case, and many believe that it is, then this sort of testing actually masks differences and is not the most appropriate form of testing for determining whether or not people can hear a difference.
The scientific method does not require double blind testing. It merely requires appropriate and repeatable test procedures. Appropriateness is determined by what will produce the most accurate, reliable and repeatable result. The kind of tests employed to determine whether or not someone can hear a difference between 2 cables are not the same sort of tests that are used to determine whether exposure to electromagnetic radiation causes cancer, and those tests aren't the same as are used to determine whether greenhouse gas emission causes global warming, and so on. Blind and double blind testing are often used in psychological and perceptual testing to eliminate some sources of error, but it's also important not to introduce a new source of error in the process.
Test design is a major discipline in itself because the test has to be one which ensures that any difference between the control and the test situation is maintained and not concealed, and the method of determining whether or not there is a difference between the two has to be impartial and not favour one side over the other, either deliberately or accidentally. That sometimes is not all that easy to achieve.
Can you briefly explain where the errors can arise in testing? I've long thought that there was an inherent flaw in DBT/ABX, at least with regard to mindset, or opinion, or lack of either, on the part of the individual running or controlling the test.
I'm sure there may well be other, more concrete reasons as well.
> Can you briefly explain where the errors can arise in testing? I've long thought that there was an inherent flaw in DBT/ ABX, at least with regard to mindset, or opinion, or lack of either, on the part of the individual running or controlling the test. >
All of those things you mention are variables which affect the results of a DBT, which is why you need a lot of trials from a lot of people for the results to be statistically meaningful. A Swedish Audio Society DBT which was published a couple of years ago identified differences between two different CD players but found that experienced listeners (disc masterers and recording engineers) could identify the differences where the average audiophile couldn't. But there are more basic issues.
In actual science, no assumption can go untested. DBTs were designed for use in medicine for new drug trials and have been adapted to psychometric testing. In the latter application, a known test tone, noise or distortion artifact is introduced and the subjects are tested to see if they hear it, alone or added to a program, and in many cases these can be identified down to the threshold of hearing.
Since it is used in hearing tests like this, the ABX advocates assume that it is just as valid for comparing two audio components with unknown differences using music - a program which is dynamic and constantly changing. But this assumption has not been tested - i.e. forced choice ABX DBT compared to relaxed blind ABAB listening between two components of known and specified audible differences.
Part of the problem is correlating real audible differences to measurements because it's hit and miss (as you see from Stereophile's valiant attempts), particularly when the differences are out of the frequency response domain - like imaging, tonal color or dynamic contrasts, for example. How do you create two identical, say, amplifiers, but give one better dynamic contrasts and change nothing else for the test?
And then there's the theory of how the brain processes information. When you are relaxed, listening to music, your "right brain" (emotional, intuitive) is functioning. In an ABX-type DBT, you can listen to A and then switch to B. You are able to listen to each as long as you want switching back and forth. Now you switch to X. As your audible memory is quickly fading, you are forced to make a decision about whether X is A or B. Your brain switches to the "left side" (rational, logical) for the decision-making and you can no longer remember the differences you heard unless they are large, say about 2dB or greater, so you decide "no difference" and the results are null. DBT advocates scoff at this theory, but it has not been disproven by a validation test.
Most audio ABX DBT results are null. ABX-DBT proponents say almost all equipment, except speakers, sounds the same. Listen and decide for yourself.
And I always DO decide for myself.
In an ABX-type DBT, you can listen to A and then switch to B. You are able to listen to each as long as you want switching back and forth. Now you switch to X. As your audible memory is quickly fading, you are forced to make a decision about whether X is A or B.
No, you're not. Why do you keep perpetuating this myth?
You can switch back and forth between A and X or B and X just as long and as leisurely as you did A and B.
Hell, you don't even have to bother with comparing A and B. You can just choose to listen to either A and X or B and X.
Your brain switches to the "left side" (rational, logical) for the decision-making and you can no longer remember the differences you heard unless they are large, say about 2dB or greater, so you decide "no difference" and the results are null.
More mythology.
You don't have to remember the differences you heard. You simply have to be aware of whether there's a difference or not. That's it. Once you're aware of that, then the decision making takes care of itself. After the listening is over.
se
> You can switch back and forth between A and X or B and X just as long and as leisurely as you did A and B. >
Any other ABX users want to confirm this?
Here's what an article says about the ABX Comparator:
"The ABX Comparator is a manually-operated switcher that allows a listener or group thereof to select between either of two input sources (A or B) or an unknown source (X). When you push the X button, the device switches to A or B, but doesn't tell you which you're hearing. You're on your own. You make a note (on "score sheets" supplied) of whether you think it is A or B, and whether or not you prefer its sound, then go on to the next trial by pressing the button marked Up. This time, X may be the same device as it was previously, or it may be the other device. Only the ABX knows which is which. If you wish to refresh your memory about the sound of A or B, just push the appropriate button."
Yes, that's basically the way the ABX test works.
It is important to note that this is different from the way we make our judgements in real life about whether or not there is a difference. In real life we generally don't swap back and forth between our usual component or whatever and the new one. It's hard to swap cables or an amplifier in real life - we would have to get up and move, change things which is sometimes physically difficult due to space or the weight of a component, then sit down and listen again. What we often, probably usually, do is listen to the way things have been, make the change and listen for a while before forming an opinion, and maybe change things back again to see what we think going back to the original thing. If we call the original way A and the new way B, sometimes we notice a bigger difference going from A to B than from B to A, or vice versa, which is why we sometimes think we hear a difference when we make the change but decide we didn't when we swap back, or vice versa as the case may be.
In order to be able to set things up with the comparator for an ABX test, you have to use quite a bit more in the way of interconnects and/or speaker cables and then there's the circuitry in the ABX box itself as well. What you actually listen to in an ABX test is one of the two states of the whole system PLUS those additional cables and the ABX comparator. Those additions should not make a difference but if they do, then that difference may either make any difference the change in the system introduces more apparent or less apparent. In other words it is conceivable that the ABX setup introduces a confounding difference into the test situation. ABX proponents say it doesn't and some others say it does.
The difference in the listening and decision making process - the ability to swap between A and B quickly and easily rather than listening for long periods to avoid having to get up and physically change things - may also make a difference. I suspect that the ABX proponents will say it makes things more accurate because the person being tested has the ability to totally familiarise themselves with both A and B in any way they like and for as long as they like before pressing X and actually having the test. I think some others say that it can be easier to become confused as a result. In addition, in a test situation rather than normal life, some people may well be more inclined to say 'no difference' in cases where they think there might be a small difference but they're not quite sure.
So there's argument on both sides but it is important to note that there are differences in the reproduction chain in an ABX test to the chain in your living room when you try swapping something around at home, and there are some different characteristics to the listening and decision making process. Either these don't make a difference, in which case ABX is a good test procedure, or they do make a difference in which case it isn't because that could skew the results in favour of the wrong result - ie provide one of the two errors I mentioned in my short explanation. There are people with quite respectable credentials on both sides of the fence.
...and ABX DBTs compared to the way we normally listen and compare audio equipment. That piece is still missing and prevents the DBTers from claiming they truly have 'science' on their side.
I'm not dismissing the different types of bias that may enter into observational listening comparisons, even blind ones. But there is no 'scientific' basis for claiming one method is superior to the other - based on the evidence we have now - they just involve different types of potential errors.
You need to consider what kind of result you have when a test indicates that people do hear a difference between 2 different things.
It's not a pass/fail sort of thing. What the result indicates is that we can be confident to a particular level that people can accurately discriminate between the 2 things. The usual level of confidence required to be met for the result to be accepted as 'proof' is that it be significant to the .05 level - that amounts to being 95% positive that people can hear a difference.
Note that 100% confidence - significant to the 0.0 level - is impossible. There are some people in the population who won't be able to discriminate between the 2 things because of hearing difficulties. Likewise you can't get a 100% confident result that people can see a difference between red and green colour patches because some people are colour blind.
Now, accepting that .05 significance requirement (we can change it up or down, but it is the usual standard accepted for most tests), what is critical to realise is that where the test result is going to be most questionable is when it's actually very close to that level of significance. If the result achieves .01 significance - ie we can be 99% confident that people can reliably tell the difference - then it's pretty much a sure thing and no-one is going to question that sort of result. Test bias isn't going to cause a result to be that strong. Similarly, if the test only achieves .1 significance - ie we can only be 90% confident that people can tell the difference - then once again test bias isn't likely to be the reason that the result is so unconvincing. No-one is likely to retest if the result is very strong in one direction or the other.
On the other hand, if the significance the test result achieved is very close to the .05 level we're using for the threshold of proof, say .051 or .049 for example, then test bias may be an issue and doing a different test may result in the outcome coming out on the other side of the .05 significance level than on the original test. Test bias is always only likely to have a slight impact on the overall level of significance the test achieves so it's only an issue when the test result is hovering about the level normally accepted as proving the hypothesis. Any test with a genuinely high degree of bias would simply be discredited and not in use because respectable researchers would simply refuse to use it.
Note also that the level of significance achieved does have to be reasonably high for a result to be accepted as 'proof' that people can tell the difference between the 2 things. If someone was merely guessing, it is possible that they could have a good run of luck and get quite a few guesses right in a row in the short term but their accuracy score will drop to 50% or so over a long enough run. You need to set the threshold level you're going to accept reasonably high in order to ensure that you don't get fooled by a lucky run. The proportion of trials that have to be correct in order to achieve that level of significance will decrease as the total number of trials increases. In other words, you need to demand very high accuracy over a small run of trials because it's more likely that a high score over a small number of trials could occur by chance than it is over a much larger number of trials.
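To put rough numbers on that, here is a minimal Python sketch, assuming every trial is an independent 50/50 guess when no difference is heard and using the .05 threshold mentioned above. The function names are made up for illustration; all it does is add up binomial probabilities to find the lowest score that luck alone would reach no more than 5% of the time.

```python
from math import comb

def tail_probability(n_trials, n_correct):
    """Chance of scoring at least n_correct out of n_trials by pure guessing (p = 0.5)."""
    ways = sum(comb(n_trials, k) for k in range(n_correct, n_trials + 1))
    return ways / 2 ** n_trials

def min_correct(n_trials, alpha=0.05):
    """Smallest score whose chance of occurring by guessing is at or below alpha."""
    for n_correct in range(n_trials + 1):
        if tail_probability(n_trials, n_correct) <= alpha:
            return n_correct
    return None  # too few trials for any score to reach alpha

for n in (10, 16, 25, 50, 100):
    need = min_correct(n)
    print(f"{n:3d} trials: need {need} correct ({need / n:.0%})")
```

With 10 trials this asks for 9 correct, while with 16 trials 12 correct is enough, so the required count goes up with the length of the run but the required proportion comes down.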
And it is possible to get a statistical estimate of the level of bias a test introduces. If there's no bias, the incidence of people not hearing a difference when there is one should be the same as the incidence of them hearing a difference when there isn't one. It's possible to compare the size of the 2 errors, make allowances for the size of the samples, and come up with an estimate of what kind of error any bias in the test would produce. So you can tell by looking at the data and the overall result whether it's worth considering retesting.
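One way that comparison could be sketched in Python is below. It assumes a same/different style of test in which you already know which trials contained a real difference, and it uses a standard two-proportion z statistic as a stand-in for "making allowances for the size of the samples"; that choice, the function name and the counts are all illustrative assumptions, not results from any actual listening test.

```python
from math import sqrt

def error_rate_comparison(misses, diff_trials, false_alarms, same_trials):
    """Compare the miss rate with the false-alarm rate.

    misses:       'no difference' answers on trials where a real difference existed
    false_alarms: 'difference' answers on trials where nothing was changed
    Returns (miss_rate, false_alarm_rate, z); z near 0 suggests no lean either way.
    """
    miss_rate = misses / diff_trials
    false_alarm_rate = false_alarms / same_trials
    # Pooled error rate, used to estimate the spread expected from sampling alone
    pooled = (misses + false_alarms) / (diff_trials + same_trials)
    std_err = sqrt(pooled * (1 - pooled) * (1 / diff_trials + 1 / same_trials))
    z = (miss_rate - false_alarm_rate) / std_err if std_err > 0 else 0.0
    return miss_rate, false_alarm_rate, z

# Hypothetical counts: 30 misses in 100 'different' trials, 15 false alarms in 100 'same' trials
miss_rate, fa_rate, z = error_rate_comparison(30, 100, 15, 100)
print(f"miss rate {miss_rate:.2f}, false-alarm rate {fa_rate:.2f}, z = {z:.2f}")
```

A z value well beyond about 2 in either direction would suggest the two kinds of error are not balanced, which is the situation where retesting looks worth considering.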
Finally, there may or may not be problems with ABX testing. Read my post again and you will note that I was very careful not to say whether I thought there was or there wasn't. I've seen people argue both sides and my statistics studies were done long ago, were nothing more than very basic stats, and didn't include test design. I'm not qualified to make a decision on the question. But regardless of whether there are or are not problems with ABX, the situation may not be any better with any other test protocol you use. I wonder whether there are any perfect tests. Using one test to validate another if reservations can be raised about both is like trying to determine whether a ruler you bought at the office supplies shop is accurate by comparing it to a randomly selected 2nd brand of ruler. If the significance of your test result is borderline, you may well be better off either repeating the same test and doing a massive increase in the number of trials conducted because that would be expected to increase your significance, or reviewing the physical aspects of the test to see whether or not there's something in them that's masking the effect and removing any such features you find in order to make the effect more clear.
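As a rough illustration of why a massive increase in the number of trials helps a borderline result, here is a small Python sketch. It assumes the listeners keep scoring 60% correct however long the run is, which is a made-up figure, and that guessing is a fair coin flip; the function name is also just for illustration.

```python
from math import comb

def guessing_p_value(n_trials, n_correct):
    """Chance of scoring at least n_correct out of n_trials by pure guessing."""
    ways = sum(comb(n_trials, k) for k in range(n_correct, n_trials + 1))
    return ways / 2 ** n_trials

# Same hypothetical 60% hit rate, longer and longer runs
for n in (10, 50, 100, 200):
    correct = n * 6 // 10
    print(f"{correct:3d}/{n:<3d} correct: chance by guessing = {guessing_p_value(n, correct):.4f}")
```

The same 60% score that proves nothing over 10 trials drops below the .05 level by 100 trials, provided the hit rate really does hold up over the longer run.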
So even if there is some bias in a particular test protocol, really strong results one way or the other are likely to be reliable as indicators of whether or not there is a difference. It's the borderline results that are usually going to be contentious.
Hope there's enough in that to satisfy you somewhat, because that's about the limit of my knowledge of test reliability and design, and there may even be some errors in that, but I figured I'd better have a go at an answer since it was my post that prompted your question.
> You can switch back and forth between A and X or B and X just as long and as leisurely as you did A and B. >
Any other ABX users want to confirm this?
What's to confirm? You've got three buttons. A, B and X and you can switch between them however you choose.
Here's what an article says about the ABX Comparator:
"The ABX Comparator is a manually-operated switcher that allows a listener or group thereof to select between either of two input sources (A or B) or an unknown source (X). When you push the X button, the device switches to A or B, but doesn't tell you which you're hearing. You're on your own. You make a note (on "score sheets" supplied) of whether you think it is A or B, and whether or not you prefer its sound, then go on to the next trial by pressing the button marked Up. This time, X may be the same device as it was previously, or it may be the other device. Only the ABX knows which is which. If you wish to refresh your memory about the sound of A or B, just push the appropriate button."
And I don't see anything in there that's inconsistent with what I said.
se