Comparisons: definitions – Manley SKIPJACK User Manual

Comparisons: Definitions

-Basic A/B testing is when the user can select between two items and each item is clearly
identified, which is useful for simple, direct comparisons.

-Blind A/B testing is when the user is not given any direct indication of which item is selected.
This keeps things more ‘honest’ and less prone to bias. It is also useful, but if it involves a second
party (a tester, as opposed to the testee), the tester might give some hint, glance or message,
however subtle and unconscious, due to their own bias or expectations.

-Double Blind A/B testing takes the above into account: neither the tester nor the testee knows
the selection as it is happening. This way the tester is also kept “honest”.

-A/B/X testing is particularly useful when the goal is determining thresholds of audibility. The
testee is allowed to audition both A & B, is then presented with a randomly selected X, and
presses either the “A button” or “B button” to indicate which one they think it is. If they are
correct only about 50% of the time over many trials, their choices are no better than random
guessing, and it is reasonable to conclude that the testee cannot hear an identifiable difference.

-Double Blind A/B/X testing is when neither the testee nor the tester can know which selection X is.

-Feedback is a variation that can be added to any of the above. The testee may be given some
indication while they are choosing or, in a blind test, after they have chosen, which allows the
testee to learn or refine their skills. With blind A/B/X tests the feedback could indicate either
‘right/wrong’ or whether X was A or B, after the choice. A less common variation on that theme is
feedback after a number of test runs, where the testee might be told, for example, that they got it
right 3 out of 5 times and to try again.

- Momentary silence between selections. Some comparisons, especially those involving devices that
may have slight time delays, like digital converters, benefit when a moment of silence is inserted
between selections. Typical times are between 0.2 and 2 seconds. This prevents subtle cues based
on precedence. Some test procedures use a variation that quickly ramps down the current selection,
then ramps up the next. This prevents pops and clicks from skewing preferences; these artifacts are
random and depend mostly on where (when) in a wave one flips a switch.
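
For readers who run comparisons in software rather than with a hardware switcher, the ramped
switch with a moment of silence can be sketched as a pair of gain envelopes. This is only an
illustration: the function name, the linear ramp shape, the 5 ms ramp time and the 48 kHz sample
rate are assumptions chosen for the example; only the 0.2 to 2 second silence range comes from
the text above.

```python
def switch_envelope(sample_rate=48000, ramp_s=0.005, gap_s=0.5):
    """Return per-sample (gain_old, gain_new) pairs for one ramped switch.

    The default values and the linear ramp are illustrative assumptions.
    gap_s is the momentary silence; use 0 for a ramp-only switch.
    """
    ramp_n = max(1, int(round(ramp_s * sample_rate)))  # samples per ramp
    gap_n = int(round(gap_s * sample_rate))            # samples of silence

    # Linear fade-out of the current selection, ending at exactly 0.0.
    fade_out = [1.0 - (i + 1) / ramp_n for i in range(ramp_n)]
    # Linear fade-in of the next selection, ending at exactly 1.0.
    fade_in = [(i + 1) / ramp_n for i in range(ramp_n)]

    envelope = [(g, 0.0) for g in fade_out]   # old selection ramps down
    envelope += [(0.0, 0.0)] * gap_n          # both selections muted
    envelope += [(0.0, g) for g in fade_in]   # new selection ramps up
    return envelope
```

Each output sample would be old_sample * gain_old + new_sample * gain_new. Because the two
gains are never non-zero at the same time, the selections never overlap, and because each gain
reaches zero gradually, the switch does not depend on where in a wave it happens.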

Each of the above common testing methods can be the best choice, depending on the purpose of the
test. Basic A/B testing is not a great way to perform scientific or academic statistical analysis
suitable for publication, but is a good way to find a casual preference. Double blind A/B/X tests
would probably be overkill in that situation. However, because preference might be influenced by
unwanted factors, a somewhat more serious test would, at least, be “blind” and repeated enough
times for the results to be accumulated. Using statistics and probabilities, one could then get a
handle on how significant the preference was. We see double-blind A/B/X testing with immediate
feedback as the preferred method for testing thresholds of audibility intended for possible
publication and peer review.
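
As one way to “get a handle” on significance, accumulated A/B/X results can be checked against
pure guessing with a one-sided binomial calculation. The sketch below is an illustration only: the
function name is hypothetical, and it assumes every trial is independent with a 50% chance of a
correct answer when the testee hears no difference.

```python
from math import comb


def abx_p_value(correct, trials):
    """Probability of scoring at least `correct` out of `trials` by guessing.

    Assumes independent trials and a 50% chance per trial (an assumption of
    this sketch, matching a two-choice A/B/X test with a random X).
    """
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # Count every outcome with `correct` or more right answers,
    # then divide by the total number of equally likely outcomes.
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


print(abx_p_value(3, 5))    # 0.5
print(abx_p_value(12, 16))  # about 0.038
```

A score of 3 out of 5 would happen half the time by guessing alone, so it says very little.
A score of 12 out of 16 would happen by chance less than 4% of the time, which falls under the
commonly used (but arbitrary) 5% cutoff. More trials give a more trustworthy result.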
