In Reply to: RE: statistics question posted by mike1127 on June 25, 2009 at 11:17:03
There are a lot of potential problems with conducting amateur DBTs; see:
http://www.audioasylum.com/forums/prophead/messages/2190.html
http://www.audioasylum.com/forums/prophead/messages/2579.html
http://www.audioasylum.com/forums/prophead/messages/2580.html
for some discussion of the common problems and errors that get committed.
From a test methodology standpoint, one of the biggest problems is that even experienced listeners tire easily and quickly. The suggestion (more like a demand) of the ABX folks that one use 16 trials has, in my opinion, been a major contributor to that problem.
In the first cited URL I state that:
"The benchmark for the 16 trials was to get 12 or more correct, this would then establish that the listener had less than a 5% chance of just guessing that many correct. It is what is known as a confidence level of 95%. The criteria for what was considered 'good enough' so as to not be just due to chance, is supposed to be selected before the test, and then adhered to. Other confidence levels could be used, such as 99% (very strict, and usually extremely hard to do in these kinds of tests), or 90%. It should be noted, that for a 95% confidence level, that just conducting 20 runs would typically result in one that appeared to exceeded the 95% confidence level, even if everything was just random choices. So in order to take the test results as a valid positive, one would have to do better than this on the average."
What this means is that you would have to perform the test more than once to satisfy most of the objectivist folks; otherwise they would be very likely to deny that a single test had any meaning.
If you ran, say, 10 such listening tests of 16 trials each, and more than half of them came in at 12 or more of the 16 trials correct, this would be a strong indicator that the test results were showing something that was really there.
However, as I said above, doing 16 trials tends to become a self-fulfilling prophecy: many such tests end up with null results.
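To put rough numbers on the benchmark above, here is a minimal Python sketch. It assumes every trial is an independent 50/50 guess; the function name tail_probability is just an illustration, not something from the cited posts.

```python
from math import comb

def tail_probability(n_trials: int, min_correct: int, p_guess: float = 0.5) -> float:
    """Chance of at least min_correct successes in n_trials, each with probability p_guess."""
    return sum(
        comb(n_trials, k) * p_guess**k * (1 - p_guess) ** (n_trials - k)
        for k in range(min_correct, n_trials + 1)
    )

# One 16-trial test against the 12-or-more benchmark.
p_single = tail_probability(16, 12)
print(f"P(>= 12 of 16 by guessing) = {p_single:.4f}")  # 2517/65536, about 0.0384

# Ten such tests: chance that more than half (6 or more) pass by guessing alone.
p_majority = tail_probability(10, 6, p_guess=p_single)
print(f"P(>= 6 of 10 tests pass by guessing) = {p_majority:.2e}")
```

A single pass happens by luck a little under 4% of the time, while a majority of ten tests passing by luck is far rarer, which is why repeated tests carry so much more weight.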
Why is this?
I cover that in the three cited URLs. A quick version would be that by the time you get past about 8 or 10 trials, listening fatigue often sets in, and the rest of the results end up almost random.
If the last 7 or 8 trials are random, then even if you got 6 or 7 of the first 8 correct, the end result will usually fall below the cutoff of 12, and is declared "random results".
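The fatigue scenario can be worked out exactly. This is only a sketch under a simplifying assumption: the listener scores 6 or 7 on the first 8 trials, and the last 8 trials are pure coin flips. The helper chance_of_passing is a made-up name for illustration.

```python
from math import comb

def chance_of_passing(correct_so_far: int, remaining_trials: int, cutoff: int = 12) -> float:
    """Probability of reaching the cutoff when every remaining trial is a coin flip."""
    needed = max(cutoff - correct_so_far, 0)
    if needed > remaining_trials:
        return 0.0
    # Count the ways to get at least `needed` more correct answers by luck.
    ways = sum(comb(remaining_trials, k) for k in range(needed, remaining_trials + 1))
    return ways / 2**remaining_trials

for first_half in (6, 7):
    p = chance_of_passing(first_half, 8)
    print(f"{first_half} of first 8 correct, last 8 random: P(total >= 12) = {p:.3f}")
# 6 of first 8 -> 0.145
# 7 of first 8 -> 0.363
```

So even a listener who really did hear the difference early on would, under this assumption, miss the 12-of-16 cutoff most of the time.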
Yet if you run just 9 trials and get 7 correct, that is p=0.09 (or 9%), which meets a 90% criterion, though not a 95% criterion.
In the third of the cited URLs I discuss how many trials to run. The trial counts suggested there were all selected to keep the number of trials per test small, to help minimize listening fatigue and the poor results that come with it.
I strongly suggest that you read the three URLs cited, and make sure that any listening tests you intend to conduct avoid the most common and worst of the mistakes I list.
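For anyone who wants to check the 9-trial figure, or see what score a given number of trials requires, here is a small Python sketch. It assumes each trial is an independent 50/50 guess under the null, and the function names are hypothetical.

```python
from math import comb

def guess_p_value(n_trials: int, n_correct: int) -> float:
    """Chance of scoring n_correct or better out of n_trials by guessing alone."""
    return sum(comb(n_trials, k) for k in range(n_correct, n_trials + 1)) / 2**n_trials

def smallest_passing_score(n_trials: int, alpha: float):
    """Lowest score whose guessing probability is at or below alpha (None if no score qualifies)."""
    for score in range(n_trials + 1):
        if guess_p_value(n_trials, score) <= alpha:
            return score
    return None

print(f"7 of 9: p = {guess_p_value(9, 7):.4f}")  # 46/512, about 0.0898
print(smallest_passing_score(9, 0.10))   # 7
print(smallest_passing_score(9, 0.05))   # 8
print(smallest_passing_score(16, 0.05))  # 12
```

Whatever cutoff is used, it still has to be chosen before the test and then adhered to.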
Jon Risch