Propeller Head Plaza

Technical and scientific discussion of amps, cables and other topics.




Testing aka hypothesis testing is subject to errors, Type 1 and type 2


Posted on November 20, 2019 at 20:54:04
Timbo in Oz
Audiophile

Posts: 23221
Location: Canberra - in the ACT - SE Australia
Joined: January 30, 2002
How about some paras about type 1 and type 2 error, for ANY kind of testing.

DBT & ABX, together or separately are hypothesis tests.

The Null hypothesis being that 'we can't hear the difference .....'

Any kind of hypothesis test is subject to the two errors. No matter if it's in a management context, or economics, or ... anything.

For audio testing it is quite difficult to achieve a high number of trials 'n', so both types of error are quite likely.

And, setting the P any tighter than 90/10 (that is, alpha below 0.10), given the typically low 'n', is another driver of these errors, Type 2 in particular.
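
To put rough numbers on that, here is a minimal Python sketch, assuming each ABX trial is an independent right/wrong answer so the score is binomial. The function names and the listener hit rate p_true = 0.7 are assumptions for illustration, not taken from any published test. It finds the smallest passing score k that keeps alpha (Type 1) under a chosen ceiling, then reports beta (Type 2) for a listener who really does hear the difference 70% of the time.

```python
from math import comb

def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def abx_error_rates(n, alpha_max, p_true):
    """Return (k, alpha, beta) for an n-trial ABX test.

    k     : smallest score with P(X >= k | guessing) <= alpha_max
    alpha : actual Type 1 rate at that k (null: p = 0.5, pure guessing)
    beta  : Type 2 rate for a listener whose true hit rate is p_true
            (p_true is an assumed value, not a measured one)
    """
    for k in range(n + 2):          # k = n + 1 means "nobody can pass"
        alpha = binom_tail(n, k, 0.5)
        if alpha <= alpha_max:
            break
    beta = 1.0 - binom_tail(n, k, p_true)
    return k, alpha, beta

if __name__ == "__main__":
    for n in (10, 16, 50):
        for alpha_max in (0.10, 0.05, 0.01):
            k, alpha, beta = abx_error_rates(n, alpha_max, p_true=0.7)
            print(f"n={n:3d} ceiling={alpha_max:.2f} "
                  f"pass at {k}/{n} alpha={alpha:.4f} beta={beta:.3f}")
```

Under these assumptions, 10 trials with a 0.05 ceiling need 9 correct and beta comes out around 0.85; loosening the ceiling to 0.10 lets 8 correct pass and beta drops to around 0.62. A different assumed p_true moves those figures a lot, which is rather the point.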

I'm not anti testing.


Warmest

Tim Bailey

Skeptical Measurer & Audio Scrounger


 

Negative results of a single test have no significance. Nt, posted on November 24, 2019 at 03:02:38
Nt

 

Because 'n' is way too low! :-) and ;-) , posted on November 26, 2019 at 17:41:17
Timbo in Oz
Beta will be VERY high, close to 1.0 / 100%.




Warmest

Tim Bailey

Skeptical Measurer & Audio Scrounger


 

I might / Could have said BGBO., posted on December 16, 2019 at 22:30:30
Timbo in Oz
Blinding glimpse of the Bleeding Obvious.

Geoff? please engage with what I'm about, eh?


Warmest

Tim Bailey

Skeptical Measurer & Audio Scrounger


 

RE: Testing aka hypothesis testing is subject to errors, Type 1 and type 2, posted on January 27, 2020 at 13:17:47
pictureguy
Audiophile

Posts: 22597
Location: SoCal
Joined: October 19, 2008
In doing Control Charts for a 'process' we would gather a minimum of 20 data points while making NO changes to the equipment. All averages and standard deviations were recorded on a chart, which would then be 'backward' engineered for a grand average with 'warning' lines at 1, 2, and 3 'sigma' from X.

No opinions here, just straight statistics. Opinion comes LATER when you try to figure out how to respond to out of control conditions and just what Those are. For example? And I could be misremembering here......Something like 10 points for 'X' above the centerline in a row. Or below.
Or ANY point above or below the 3 sigma 'control limit'.
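
A rough Python sketch of the two rules just described, under some loud assumptions: sigma is taken as the plain sample standard deviation of the baseline points (real charts usually estimate it from subgroup ranges or moving ranges), and the run length is a parameter because the exact count varies by rule set (the Western Electric rules use 8 in a row on one side, the Nelson rules use 9). All function and variable names here are made up for illustration.

```python
from statistics import mean, stdev

def control_limits(baseline):
    """Centerline plus 1/2/3-sigma lines from baseline points.

    Assumes sigma is the sample standard deviation of the baseline,
    which is a simplification of how production charts are set up.
    """
    center = mean(baseline)
    sigma = stdev(baseline)
    lines = {m: (center - m * sigma, center + m * sigma) for m in (1, 2, 3)}
    return center, lines

def out_of_control(points, center, lines, run_length=8):
    """Return (index, reason) for every point that breaks a rule."""
    low3, high3 = lines[3]
    flags = []
    run_side, run_count = 0, 0
    for i, x in enumerate(points):
        # Rule A: any single point outside the 3-sigma control limits.
        if x < low3 or x > high3:
            flags.append((i, "beyond 3 sigma"))
        # Rule B: run_length points in a row on one side of the centerline.
        side = (x > center) - (x < center)   # +1 above, -1 below, 0 on the line
        if side != 0 and side == run_side:
            run_count += 1
        else:
            run_side, run_count = side, (1 if side != 0 else 0)
        if run_count >= run_length:
            flags.append((i, "run on one side"))
    return flags

if __name__ == "__main__":
    baseline = [9.8, 10.2] * 10              # 20 points, no equipment changes
    center, lines = control_limits(baseline)
    new_points = [10.1] * 8 + [11.0]
    for index, reason in out_of_control(new_points, center, lines):
        print(index, reason)
```

Deciding what to do once a point is flagged is the part that stays a judgment call.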

Link is a good intro to such stuff. But keep in mind that some companies use an abbreviated version of such rules to keep from driving themselves NUTS.
Too much is never enough

 

Are you writing about hypothesis tests, which ABX and DBT tests are?, posted on January 28, 2020 at 01:33:14
Timbo in Oz
I'm not at all sure that the tests you are describing are relevant to audio testing, where we are trying to find out what 'we' as human beings, collectively / reliably can hear

ALL ABX/DBT tests of what humans can, or cannot, reliably hear, are - hypothesis tests.

They are thus subject to the fixed rules that all hypothesis tests must be assessed by.

Given the parameters chosen by the testers.

Those choices - WILL - drive the reliability of the reported results.

I have not been offering 'opinions'!

I have, simply, been identifying the statistical / mathematical rules which, inevitably, & necessarily, apply to all hypothesis tests.

If you don't get that, go and do some finding out, eh?!

OR, STFUp.

Almost all the reported 'can we hear the difference' tests, fail my 'bullshit' test.

Because? They either try too hard (setting the tests too tight) OR do too few tests, aka 'n'.
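
For a feel of how many trials 'enough' might be, here is a hedged Python sketch that searches for the smallest n meeting both an alpha ceiling and a beta ceiling. It assumes independent binomial trials and an assumed true hit rate p_true; the names min_trials, critical_score and n_cap are hypothetical, chosen only for this example.

```python
from math import comb

def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def critical_score(n, alpha_max):
    """Smallest k with P(X >= k | guessing, p = 0.5) <= alpha_max."""
    for k in range(n + 2):          # k = n + 1 always qualifies (tail is 0)
        if binom_tail(n, k, 0.5) <= alpha_max:
            return k

def min_trials(alpha_max, beta_max, p_true, n_cap=500):
    """First n (up to n_cap) where beta <= beta_max at the critical score.

    p_true is an assumed listener hit rate. Returns (n, k, beta),
    or None if no n up to n_cap is large enough.
    """
    for n in range(1, n_cap + 1):
        k = critical_score(n, alpha_max)
        beta = 1.0 - binom_tail(n, k, p_true)
        if beta <= beta_max:
            return n, k, beta
    return None

if __name__ == "__main__":
    for p_true in (0.9, 0.7, 0.6):
        print(p_true, min_trials(alpha_max=0.05, beta_max=0.20, p_true=p_true))
```

Because scores are whole numbers, beta does not fall smoothly as n grows, so the first n that qualifies is a lower bound rather than a guarantee for every larger n. The smaller the assumed audible difference (p_true closer to 0.5), the faster the required n climbs.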

IF you do not understand these basic points, I suggest you give up and STFU, as well.

'Kay?

Yes, or No?


Warmest

Tim Bailey

Skeptical Measurer & Audio Scrounger


 

RE: Are you writing about hypothesis tests, which ABX and DBT tests are?, posted on January 28, 2020 at 10:50:20
pictureguy
All such tests must of course be evaluated statistically, right?
That's where I was headed.
Done more of the type I describe than I'll admit to.
Stats is required in many different fields, from Political Science to any of the sciences.

Furthermore, DBT and such are vastly complicated by the sheer amount of data and choices made, as you note, by the testers. Do you, for example, test one person at a time or groups? Do you control for seating position if several persons are tested at once?

The elephant in the room is of course one of Experimental Design. Depending on the number of variables, the number of tests needed for a valid result can skyrocket.
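
As a back-of-envelope illustration of how fast that grows, here is a small Python sketch of a full factorial layout. The factor names and levels are invented for the example, and a real design might well use a fractional or blocked layout instead of testing every combination.

```python
from itertools import product

def full_factorial(factors):
    """Every combination of factor levels, one dict per test condition."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]

if __name__ == "__main__":
    # Hypothetical factors for a listening test; none come from a real study.
    factors = {
        "listener": ["A", "B", "C", "D"],
        "seat": ["centre", "left", "right"],
        "programme": ["voice", "orchestra"],
    }
    conditions = full_factorial(factors)
    trials_per_condition = 16       # assumed number of ABX trials per cell
    print(len(conditions), "conditions")
    print(len(conditions) * trials_per_condition, "trials in total")
```

Each extra factor multiplies the condition count by its number of levels, and each condition still needs its own adequate 'n'.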

Yes, I think I've got a fair grip on the subject. And know enough not to 'try this at home'.
Too much is never enough

 

"And know enough not to 'try this at home'. ", posted on January 28, 2020 at 12:01:12
Timbo in Oz
Yes.

Very few reported ABX and DBT tests meet the low enough Beta requirement.

So, they prove nothing. I can recall one, by Audio magazine a few decades ago, about speaker cables IIRC which didn't.

And, shouldn't have been published or discussed.

The points about how many people and how seated, are relevant to another such by HFN&RR where the participants were across and in depth. Listening to a stereo system!!!!!!

Even if n had been high enough, I still didn't pay any attention to it.

In my 4 unit Statistics sub-major I had a Distinction average - B. Comm, Mgmt Science.

Uni here is unlike College. Most of the folks who are employed as Economists, Systems Analysts / Bus. Anal., Programmers, etc have a single degree.

But I was a paid Tutor in IS from 2nd semester on, late 80s to 90s - a 'mid-life crisis' degree, backed up by a history in economic / industry policy analysis, using main-frames and minis.

Lots of 'Aha' moments.

Having worked for Canberra's top Hi-fi shop, I do have some VDH interconnects between sources and the pre-amp, as I got them at Duratone's cost, ;-) and ;-).

I've had them for a LONG time. But, the 'long', <20 ft cables, to the power amps are digital-network coax. THE RCA plugs and sockets are gold plated and fit tight. I do clean the sockets and plugs, once a year.

The Pre's output impedance is 600 ohms, and there are two pairs of cables. It's early morning and I haven't looked at an audio manual, but I 'think' that comes to 300 ohms. Into several Kohms. Not that it matters.

I have used laser devices and protractors to set up my sphere speakers and their angling back, aimed at me. Based on measurements by their builders.

Time and phase coherence, matters to me, as does absolute polarity, on simple stereo recordings. It can matter on typical Decca Xmas tree recordings, too.

Most visitors agree, but not all.

And yet! ... I'm a former hunter and infantryman, and range officer and marksmanship instructor!!! I did wear ear muffs when it was practical / range safe.

Perhaps it's all that singing as a leader in a large group of firsts boy sopranos in a cathedral. 'Timing and Pitch! Tim-Jim!' he said to me.

I'm not going to test for anything, until I install the QUADs and subs.

All the sub boxes are now constructed, but not stuffed or 'finished' - way too hot in the big garage by 9 AM every day, plus the smoke haze that now seems permanent*.

And I water and fertilise, most mornings on the entire block, while it's still cool enough, plus *.


Warmest

Tim Bailey

Skeptical Measurer & Audio Scrounger


 

RE: "And know enough not to 'try this at home'. ", posted on January 28, 2020 at 12:47:52
pictureguy
My Stats background is MUCH less stellar than yours.
College was stats in PolySci. I still have my FAVORITE text of the period.....
'How to Lie with Statistics'.....mandatory reading for anyone running for office in this country.

Later, a LOT of my work duties centered on experimental design and stats. But these were ALL numerically based with no 'opinion' as part of the mix, unless I presented a solution which increased cycle time, in which case I would be requested to LOWER cycle time (increase production) at the possible expense of some numerical output. Probably standard deviation against the desired mean.

I would teach the practical aspects of SPC to line operators. Mainly proper reaction and who to call in the event of an 'out of control' condition when measuring something. Our charts ended up all online.

One day perhaps I'll relate some experiences during an ISO mandated audit by automotive auditors


Too much is never enough

 
