Audio Asylum Thread Printer
In Reply to: RE: statistics question posted by mike1127 on June 25, 2009 at 11:17:03
Use the calculator located at the link below. Example (set p=0.5 and don't change it):
12 trials (n=12), 10 correct identifications (k=10) [83% correct]: P-value = 0.019287 (1.93%), which is within your 5% level of significance.
In general, as the number of trials increases, the percentage of correct identifications required for a given level of significance decreases.
Example: with 75 trials, 46 correct identifications (61.33%) is within a 5% level of significance requirement.
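The calculator behind the link isn't reproduced here. As a minimal sketch, assuming it computes a one-sided exact binomial tail P(X >= k) with p = 0.5 (which matches the 0.019287 figure for 10 of 12), the numbers above can be checked in Python. The function names are illustrative, not from any library.

```python
from math import comb

def abx_p_value(n: int, k: int) -> float:
    """One-sided exact P-value: the probability of k or more correct answers
    out of n trials if every answer is a coin flip (p = 0.5)."""
    favourable = sum(comb(n, i) for i in range(k, n + 1))
    return favourable / 2**n  # exact integer ratio, rounded once to float

def min_correct(n: int, alpha: float = 0.05) -> int:
    """Smallest score k whose P-value falls below alpha (n + 1 if none does)."""
    for k in range(n + 1):
        if abx_p_value(n, k) < alpha:
            return k
    return n + 1

print(f"{abx_p_value(12, 10):.6f}")  # 0.019287
for n in (12, 16, 25, 50, 75, 100):
    k = min_correct(n)
    print(f"n={n:3d}  need {k:3d} correct ({k / n:.1%})")
```

The percentage column should fall as n grows, which is the general point made above about longer tests.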
---
More importantly, you should take the time to read Les Leventhal's contributions to The Highs & Lows of Double-Blind Testing article on the Stereophile site (Leventhal's material starts on page 2). It is an excellent primer on the statistics involved in typical DBT/ABX tests. In particular, you will learn about the problems inherent in tests with a low number of trials, such as 16, notably the high probability of what is called a Type 2 error (which in a DBT/ABX test generally means "... mistakenly concluding that audible differences are inaudible", as Leventhal puts it).
...
Back to your question ... we found that a 10 correct out of 12 trials test (83.3% correct) meets the 5% level of significance (pretty close to the 80% correct case).
Of course, the probability of a Type 2 error for a 12-trial test is even worse than it is for a 16-trial test, and the probability of a Type 2 error in a 16-trial test is itself unacceptably high, certainly too high to be considered "serious science", to say the very least.
That said, if you *do* consistently get 10 out of 12 correct, then that in itself is strong evidence of a sonic difference; the Type 2 error concern comes in when you fail to "score" 10 out of 12, just to be clear on that point.
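To put a rough number on the Type 2 concern, the sketch below computes the standard power of a one-sided binomial test: the probability that a listener who truly answers correctly 70% of the time still scores below the 5% criterion. The 0.7 hit rate and the function names are hypothetical choices for illustration, not figures from Leventhal's article.

```python
from math import comb

def tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def critical_k(n: int, alpha: float = 0.05) -> int:
    """Smallest score that rejects 'chance alone' (p = 0.5) at level alpha."""
    # k = n + 1 gives an empty sum (0.0), so next() always finds a value.
    return next(k for k in range(n + 2) if tail(n, k, 0.5) < alpha)

def type2_error(n: int, true_p: float, alpha: float = 0.05) -> float:
    """Probability that a listener with true hit rate true_p falls short of
    the critical score, i.e. an audible difference is declared inaudible."""
    return 1.0 - tail(n, critical_k(n, alpha), true_p)

for n in (12, 16, 50, 100):
    print(f"n={n:3d}  critical={critical_k(n):3d}  "
          f"Type 2 error at 70%: {type2_error(n, 0.7):.2f}")
```

Under these assumptions the 12- and 16-trial tests miss such a listener more often than not, while a 100-trial test rarely does.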
Everything matters, don't forget to tweak your placebos!
Edits: 06/25/09
Meaningful significance and statistical significance are quite different. One can get very high statistical significance with a large random sample while having no meaningful impact at all. A random sample of 25,000 would be sufficient for belt size to have a statistically significant impact on how people vote, with no meaningful significance.
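The same effect can show up with binary ABX-style outcomes. As a purely hypothetical illustration (made-up figures, not from any study), a 1-point edge over guessing, 51% correct, is practically negligible yet becomes highly "significant" over 25,000 trials. The sketch computes the exact one-sided binomial tail with integer arithmetic so nothing underflows at this sample size; the function name is illustrative.

```python
from math import comb

def exact_tail_half(n: int, k: int) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, 0.5), using integers throughout."""
    term = comb(n, k)  # C(n, k)
    total = 0
    for i in range(k, n + 1):
        total += term
        term = term * (n - i) // (i + 1)  # exact step from C(n, i) to C(n, i + 1)
    return total / 2**n

n = 25_000
k = 12_750  # 51% correct: one point above guessing
print(f"hit rate {k / n:.1%}, P-value {exact_tail_half(n, k):.2g}")
```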
But belt size is a continuous variable. We are talking about an ABX test, in which the answers are binary: true/false. Aren't these different topics?
This would only matter if you were trying to say that all people can hear a difference and were using a random sample. As I understand it, you are only seeking to satisfy yourself, not to generalize. You seek only to say that it is improbable, not that it is statistically significant. Inferential statistics are quite different from descriptive statistics.
Could you explain one more thing? I am aware that one can find correlation without proving causation. It seems like this issue is irrelevant to ABX testing.
Let's say in an ABX test I have a slight tendency to pick A. Because X is totally random in each trial, there is no way this can influence the results. That's my understanding.
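That reasoning can be checked with a quick simulation. The sketch assumes a hypothetical listener who hears nothing and simply answers "A" 70% of the time; because X is assigned to A or B at random on every trial, the hit rate should still hover around 50%. The bias, trial count and seed are arbitrary choices.

```python
import random

def simulate_biased_guesser(trials: int, bias_to_a: float, seed: int = 1) -> float:
    """Fraction correct for a listener who cannot hear any difference but
    answers 'A' with probability bias_to_a, while X is randomized each trial."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        x = rng.choice("AB")  # X is A or B with equal probability
        answer = "A" if rng.random() < bias_to_a else "B"  # biased guess
        correct += answer == x
    return correct / trials

print(f"{simulate_biased_guesser(100_000, 0.7):.3f}")  # should be near 0.5
```

Even an extreme bias (always answering "A") only changes which trials are right, not how many are right on average.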
easier to hear and better.
I must say that personally I am okay with just putting cables in and hearing a difference that I like. When I am studying whether people vote their party loyalty or whether states with concealed handgun laws have less crime, I am engaged in science and must be concerned with causation, explanation, and whether the data are valid for the questions I am asking. When I am deciding whether one set of cables is better than another, I am not engaged in science; I am assessing my tastes in sound. The magnitude of the improvement becomes important. Often I try blind tests if the improvement is small, but often if it is small I just stick with what I have.
Depending on how you conduct your testing, and how thorough and rigorous you are, your hypothesis and conclusions may vary. And some may call your testing methodology "poor science", but it's still science. I take issue with neither those who want more rigor nor those who want less; but I always appreciate tolerance for both sides.
Both sides of the river, there is bacteria; there must be meaning behind the moaning, is this living?
minded. When engaged in information gathering to assess regularities of some benefit to society, I insisted on valid measurement of concepts, random samples, careful methodology, and care to avoid spurious relationships when real experiments are not possible. How I would love to randomly pick 25 states to have concealed handgun laws and 25 to have none, and wait 20 years to see what differences there are between the states. My null hypothesis would, of course, be no differences.
I think it helps a lot that I am the whole population and that even if I'm wrong, there is little downside.
It seems to hit the fan each time someone posts that they did thus and so with good results. To someone else that may seem the depths of impossibility so they ridicule the poster rather than either trying it or just deciding that it's so unlikely that they aren't going to waste the time to check it out. Ridiculing others rather than thoughtfully examining your own understanding is very tempting and I've fallen off the wagon a few times myself.
I enjoyed this thread but like you, I believe, it's hard for me to see how statistics have much value for an individual listener. If I can't hear it, it doesn't matter and if I can I'll try to choose the best compromise if it isn't clear-cut. And I may share the result. Even if it isn't reliably predictive it does provide insights into things to try. And that's where AA shines, getting ideas to play with.
If I want to learn more about the underlying processes then I'd turn to measurements and try to find ones that correlate to the listening and from there try to reproduce the results with known changes which would hopefully be enough information to understand and usefully model whatever the process is.
Ironically even if it can be proven beyond question that Joe reliably hears a difference by putting marbles under his clock radio, that chunk of data alone adds little more to predicting my results than just his assertion that he does. One of the nicest things about this hobby is that you can try this stuff at home without the neighbors knowing.
Rick
"Meaningful significance and statistical significance are quite different."
An essential distinction.
There are two steps involved in going from statistical significance to meaningful significance. The first step bridges the gap from correlation to causation. The second step bridges the gap between an effect and a meaningful effect. Both steps are frequently contentious, as can be seen in numerous threads in this forum. The first step can be accomplished with a causal model; the second step requires a set of values.
Those people who view the world in terms of crude (e.g. black or white) facts should stay well away from anything to do with statistics.
Tony Lauck
"Diversity is the law of nature; no two entities in this universe are uniform." - P.R. Sarkar
Causation and explanation are the next steps in providing an understanding. I am mainly concerned with people dropping the word "statistical" and assuming that what they have found is meaningfully significant. This is quite prevalent in the research literature of the social sciences.
pounding of one's chest ... I suggest you do a little searching and add it if you locate it, as the addition would be a nice finishing touch to your post.
Everything matters, don't forget to tweak your placebos!
Not a problem, I was merely keeping it simple, not even addressing the larger issue; nor do I consider myself up to that task for that matter.
Everything matters, don't forget to tweak your placebos!
I understand. However, it brings to mind an experience I had in first year university. I had an Economics professor who had a way of creating in his students' minds, mine included, a feeling of now understanding how it all works.
But in a class near the end of the semester he shocked us (well, he certainly shocked me in any case) by declaring that everything we had learned was essentially incorrect, and that as we progressed we'd discover it all to be an egregious over-simplification. Yet he added that he still felt that his style of teaching with conviction, as he put it, was the correct way to approach a topic: basically a variation on the theme that one must first crawl before one can walk, and that at the crawl stage one should concentrate on doing that (alone) to the best of one's abilities.
It was a real-life lesson that stuck with me.
Everything matters, don't forget to tweak your placebos!
Edits: 06/26/09
I would outline why each was said to be better. Then I would use data to show they were irrelevant. This is essentially teaching against the textbook.
Now I have written my own text and develop everything from the data themselves. It is still confusing, but I get many students to think critically. The data show that "merit selection" of judges, where voters merely say whether or not a judge deserves another term, does nothing for the quality of justice but does produce younger judges and judges with degrees from more prestigious law schools.
> Example (set p=0.5 and don't change):
Thanks for the link. Does lower-case p represent the probability of the null hypothesis?
OK, a little on the null hypothesis... A null hypothesis is a statement that can be modeled mathematically, and hence something you can compare experimental results against; specifically, you compare the value obtained experimentally against what the mathematical model "says" about it.
For example, we can mathematically model the probabilities associated with coin tosses: the probability of getting a head (or a tail) on a single independent toss (trial) is 50%, and we can answer questions like:
- What is the probability of getting exactly 12 Heads when we toss the coin 125 times?
- What is the probability of getting at least 23 Tails when we toss the coin 50 times?
As it turns out, the coin toss experiment behind both questions, i.e. the probability of getting X Heads (or Tails) in n tosses (trials), is modeled by the Binomial Mass Function with p=.5 (the probability of getting a head, or a tail, on a single independent trial).
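Both example questions can be answered directly from the binomial formula P(X = k) = C(n, k) p^k (1 - p)^(n - k) with p = 0.5. A minimal Python sketch; the function names are illustrative.

```python
from math import comb

def pmf(n: int, k: int, p: float = 0.5) -> float:
    """P(exactly k successes in n independent trials)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def at_least(n: int, k: int, p: float = 0.5) -> float:
    """P(k or more successes in n independent trials)."""
    return sum(pmf(n, i, p) for i in range(k, n + 1))

print(f"exactly 12 Heads in 125 tosses: {pmf(125, 12):.3g}")
print(f"at least 23 Tails in 50 tosses: {at_least(50, 23):.4f}")
```

Note that p here is the per-toss probability under the model, not the probability that the null hypothesis is true.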
Hence jumping ahead we see that the traditional DBT/ABX test is similar to coin tosses when modeled mathematically.
But before we get there ...
---
OK, let's say we have a cable test. We want to test if the cables "sound different". So what is our null hypothesis? Is it...
The cables sound different.
So we do our test and, say, get 34 out of 50 correct. What does it mean? What is the mathematical model for "The cables sound different"? There isn't one! So forget it, that's not the null hypothesis!
Instead we propose that when it comes to being able to distinguish between the two cables (their "sound") that such is determined by "chance" alone, and that on a single independent trial the chance of correctly identifying X (i.e. in a traditional ABX test) is exactly 50%. Now we are getting somewhere, in fact that is our null hypothesis but we can put it simply as...
Distinguishing between the cables is determined by chance alone with p=.5
Hence now, when we run the test and get some result, we have something to compare against, namely the Binomial Mass Function with p=.5 (simply BMF hereafter). Aren't we clever!
Then we get to the level of significance. Well, for any test where the null hypothesis is modeled by the BMF (or by some other function, for that matter), we decide in advance what result we require to "reject" the null hypothesis; that is, we set a "level of significance" (LOS). A 5% LOS means...
We will reject the null hypothesis for any result for which the probability of obtaining that result due to chance alone is less than 5%.
So we run a test, getting X correct identifications, and the BMF tells us that the probability of getting at least X correct identifications is 7.5%. Well, that's greater than our 5% LOS, so we *don't* reject the null hypothesis; in other words, we accept that X correct identifications could have been due to chance alone... which fails to demonstrate a difference between the cables.
Now if we get Y correct identifications and the BMF tells us that the probability of getting at least Y correct identifications is 1.3%, then that's less than our 5% LOS, so we *do* reject the null hypothesis; in other words, we agree that the result *could not have been due to chance alone* (given our LOS) ... which would then imply that there is a difference between the cables.
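The decision rule just described fits in a few lines. This sketch assumes a one-sided test (only above-chance scores count as evidence) and the exact binomial tail with p = 0.5; the scores fed to it, including the 34-of-50 cable result mentioned earlier, are hypothetical inputs and the function names are illustrative.

```python
from math import comb

def p_value(n: int, k: int) -> float:
    """Probability of k or more correct out of n by chance alone (p = 0.5)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n

def reject_null(n: int, k: int, los: float = 0.05) -> bool:
    """Reject 'chance alone' only if the P-value is below the LOS set in advance."""
    return p_value(n, k) < los

for n, k in [(12, 9), (12, 10), (50, 34)]:
    verdict = "reject" if reject_null(n, k) else "do not reject"
    print(f"{k}/{n}: P = {p_value(n, k):.4f} -> {verdict} the null hypothesis")
```

Fixing the LOS before running the test, as described above, is what keeps the comparison honest; choosing it after seeing the P-value would defeat the purpose.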
Hope that helps.
Everything matters, don't forget to tweak your placebos!
Edits: 06/25/09 06/26/09