Informational masking: an experimentalist’s position statement

Christophe Micheyl
The energetic vs. informational distinction

According to a loose definition, informational masking (IM) is any form of perceptual masking or interference that cannot be construed as energetic masking (EM). Defined as such, IM is too vast a concept to be of much practical use (just like timbre). This does not mean that the distinction between EM and IM is unimportant; it certainly provides a good starting point. But if we want to make some progress, we must go beyond it. Oh and, BTW, for those of you who think that the detection of a pure tone in noise involves only EM, read Lutfi (1990).

The peripheral vs. central distinction

This distinction comes just above the previous one on the scientific ‘utility’ scale. Knowing that IM cannot be explained mainly or solely in terms of peripheral masking is somewhat useful. At the very least, it tells physiologists interested in the neural bases of this phenomenon that they shouldn’t look only in the auditory nerve. Unfortunately, contrary to the once predominant view among auditory scientists, the central auditory system is a lot more than just an appendix to the cochlea; so, knowing that IM is central still leaves us with a world of possible mechanisms to explore. In addition, it is worth noting that, even though IM undoubtedly involves central phenomena, peripheral functional properties such as frequency selectivity can nonetheless play an important role in determining it. In particular, the amount of IM in experiments where the effect appears to be due chiefly to the listener’s uncertainty can be predicted, quantitatively and rather successfully, from the ensemble variance of the peripheral auditory filter outputs evoked by randomly varying maskers (see Lutfi, 1993). From that point of view, the peripheral vs. central distinction starts to fade.

The low- vs. high-uncertainty distinction

This marks the great divide between the “purists”, who think that true IM arises only in conditions where the listener is highly uncertain about what the next stimulus is going to be like, and the “generalists”, who count as IM any perceptual interference or masking that cannot be EM, yet is observed under conditions where uncertainty is minimal. For instance, the latter might categorize MDI or other forms of across-channel interference as IM. However, we would like to suggest that the notion of “uncertainty” blurs things rather than clarifying them. Until uncertainty can be defined precisely and measured in some way other than through IM itself (which otherwise leads to a circular argument), the notion should be abandoned in favor of physical, quantifiable notions such as the degree of stimulus variability or the number of possible stimuli that can occur during the course of an experiment.

The synthetic vs. analytic distinction, selective attention, and perceptual organization

For some, IM is a failure of selective attention to the target. While this may be true, it does not help much, unless we can determine why selective attention to the target fails more easily than we can determine why IM occurs. Unfortunately, at this point, the factors that determine a listener’s selective attention seem to be at least as obscure as those that determine IM. Let’s note also that there are different ways in which selective attention to the target can fail. The listener’s attention can be distracted away from the target by one or more of the masker components (i.e., the listener is still attending selectively, but to the wrong thing). Alternatively, the listener may be unable to perceptually separate the target from the masker components, and IM may then result from the global (synthetic) percept being dominated by the more energetic and more numerous masker components.
Looking for ways of experimentally measuring the respective contributions (if any) of these two types of mechanisms (i.e., distraction and perceptual grouping) seems like a necessary avenue for research on IM. One interesting implication of the notion that IM depends upon perceptual organization (i.e., little or no IM if the target can be perceptually segregated from the masker) is that, knowing what factors promote segregation, we would also know what factors should reduce IM. Let’s note in this respect that perceptual segregation depends not just on the properties of the stimulus, but also on the listener’s experience (internal templates or memory traces). The distinction between analytic and synthetic listeners appears to imply that listeners differ greatly in their ability to form or utilize templates. The question, then, is why should they? Prior experience comes to mind, and while some “synthetic” listeners appear completely impermeable to protracted psychoacoustical training (Neff and Dethlefs, 1995), musical training appears to favor analytic-listening abilities (Oxenham et al., 2003), suggesting that at least some forms of training can influence IM.

Exploring the perceptual and neural mechanisms of IM: start simple

We would like to conclude this statement by arguing for a “start-simple” philosophy, which we are currently putting into practice as we try to better understand (and build models of) the perceptual and neural mechanisms of another vast and complex aspect of auditory perception: auditory scene analysis. Rather than trying to embrace the phenomenon in all its variety and complexity, we start by selecting a simple experimental paradigm that captures some basic aspect of the perceptual phenomenon under study. As it turns out, one of the paradigms we are working with comes from the IM literature, and involves the detection of repeating target tones among randomly varying, multi-component masker tones (Kidd et al., 1994; 2003).
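To make this kind of paradigm concrete, here is a toy Python sketch. It is not the actual stimulus used by Kidd and colleagues, nor Lutfi’s CoRE model; the sample rate, burst duration, component counts, frequency ranges, Gaussian filter shapes, and summed-variance statistic are all simplifying assumptions chosen for illustration. The sketch generates a repeating target tone with masker tones redrawn at random on every burst outside a protected spectral region, then computes a crude ensemble-variance statistic of the kind mentioned above.

```python
# Toy illustration only: a "multiple-bursts"-style stimulus in the spirit
# of Kidd et al. (1994), plus a crude ensemble-variance statistic in the
# spirit of Lutfi (1993). All parameter values are arbitrary assumptions.
import numpy as np

rng = np.random.default_rng(0)
FS = 16000  # sample rate in Hz (arbitrary)

def burst(freqs_hz, dur_s=0.06):
    """One burst: equal-amplitude tones summed, with 5-ms raised-cosine ramps."""
    t = np.arange(int(FS * dur_s)) / FS
    x = sum(np.sin(2 * np.pi * f * t) for f in freqs_hz)
    n = int(0.005 * FS)
    env = np.ones_like(t)
    ramp = 0.5 * (1 - np.cos(np.pi * np.arange(n) / n))
    env[:n], env[-n:] = ramp, ramp[::-1]
    return x * env / len(freqs_hz)

def masker_freqs(target_hz=1000.0, protect_octaves=0.5, n=6):
    """Masker components drawn anew for each burst, excluding a protected
    spectral region around the target (to minimize energetic masking)."""
    lo, hi = target_hz / 2**protect_octaves, target_hz * 2**protect_octaves
    out = []
    while len(out) < n:
        f = rng.uniform(200.0, 5000.0)
        if not lo < f < hi:
            out.append(f)
    return out

def trial(n_bursts=8, target_hz=1000.0):
    """Repeating target tone plus randomly redrawn maskers on every burst."""
    return np.concatenate(
        [burst([target_hz] + masker_freqs(target_hz)) for _ in range(n_bursts)])

x = trial()
print(x.shape)  # -> (7680,): 8 bursts of 60 ms at 16 kHz

# Crude ensemble-variance statistic: variance, across random masker draws,
# of Gaussian-weighted "filter" outputs computed on component frequencies.
def filter_outputs(freqs_hz, centers_hz, bw_hz=200.0):
    w = np.exp(-0.5 * ((np.asarray(freqs_hz)[None, :]
                        - centers_hz[:, None]) / bw_hz) ** 2)
    return w.sum(axis=1)  # one output per "filter"

centers = np.linspace(200.0, 5000.0, 25)
draws = np.array([filter_outputs(masker_freqs(), centers) for _ in range(500)])
ensemble_var = draws.var(axis=0).sum()
```

Under an account of this kind, stimulus sets that produce larger ensemble variance at the filter outputs (i.e., more trial-to-trial masker variability) would be expected to produce more IM.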
This paradigm meets all the above requirements: it minimizes the role of peripheral EM (thanks to the use of a protected spectral region around the target), produces high uncertainty (with spectro-temporally varying maskers), and allows us to explore the respective contributions of selective attention and perceptual organization. Even with such a simple paradigm, there are numerous stimulus parameters to control, and many factors that can influence perception, including, for instance, the average frequency separation between the target and masker components, their repetition rates, and their temporal regularity. Yet, with careful stimulus design, the respective contributions of different factors can be determined. Preliminary results suggest that performance is determined by only a few factors, such as the average frequency separation between the target and the maskers and their relative repetition rates. Other factors, such as temporal regularity, appear to play only a marginal role in the listener’s ability to hear out the repeating target tones among the random multi-tone maskers. Thus, the phenomenon may involve less complex and varied mechanisms than it may have seemed at first. These psychophysical results are then used to guide the search for neurophysiological correlates of the perceptual phenomenon in the primary auditory cortex, and to inspire the design of models involving relatively simple neural mechanisms such as frequency selectivity, lateral and forward inhibition, and adaptation. The relative success of this simple-minded approach to the exploration of the neural basis of auditory stream segregation in recent years (e.g., Bee and Klump, 2004; Fishman et al., 2004; Micheyl et al., 2005) leads us to believe that a similar strategy may also benefit the understanding of IM.

References

Bee MA, Klump GM (2004) Primitive auditory stream segregation: a neurophysiological study in the songbird forebrain. J Neurophysiol 92, 1088-1104.
Fishman YI, Arezzo JC, Steinschneider M (2004) Auditory stream segregation in monkey auditory cortex: effects of frequency separation, presentation rate, and tone duration. J Acoust Soc Am 116, 1656-1670.
Kidd G Jr, Mason CR, Deliwala PS, Woods WS, Colburn HS (1994) Reducing informational masking by sound segregation. J Acoust Soc Am 95, 3475-3480.
Kidd G Jr, Mason CR, Richards VM (2003) Multiple bursts, multiple looks, and stream coherence in the release from informational masking. J Acoust Soc Am 114, 2835-2845.
Lutfi RA (1990) How much masking is informational masking? J Acoust Soc Am 88, 2607-2610.
Lutfi RA (1993) A model of auditory pattern analysis based on component-relative-entropy. J Acoust Soc Am 94, 748-758.
Micheyl C, Tian B, Carlyon RP, Rauschecker JP (2005) Perceptual organization of tone sequences in the auditory cortex of awake macaques. Neuron 48, 139-148.
Neff DL, Dethlefs TM (1995) Individual differences in simultaneous masking with random-frequency, multicomponent maskers. J Acoust Soc Am 98, 125-134.
Oxenham AJ, Fligor BJ, Mason CR, Kidd G Jr (2003) Informational masking and musical training. J Acoust Soc Am 114, 1543-1549.
Chuck Watson

First, I am sorry I cannot attend this
conference. Useful discussions have already begun on the web site and I
will follow them with interest. Barbara and Shihab were kind enough to
allow me to add a couple of thoughts to the growing collection and I
appreciate that, and also congratulate them for doing such a good job of
organizing this workshop.

Bill Yost’s contribution mentioned my
past forays into IM, so I’ll begin by responding to a couple of his
thoughts. The easiest one is his concern about whether we should be
talking about informational masking at all. In our first mention
of that term (1976) we had reported that in tonal patterns the later
occurring components tended to interfere with the processing of earlier
ones, and noted that others had reported similar effects which authors
“…variously referred to as ‘recognition masking’, ‘blanking’,
‘informational masking’, or ‘temporal interference.’” In that paper we
elected to use “recognition masking”, since it seemed a better
descriptor of loss of frequency resolution as a consequence of
interference by temporally adjacent stimuli. By 1981, however, we had
realized that under high trial-to-trial uncertainty, not only were
frequency-discrimination thresholds elevated for components of tonal
patterns, but the detection thresholds for those same components were
also elevated by as much as 40-50 dB above the levels measured under
minimal uncertainty. That made me a bit more comfortable in referring
to the influence of context on target-tone processing as “informational
masking”, since the shift in detectability that some might infer from
that term did in fact occur.

Last year, in the paper in Acustica, I
argued for explicit recognition of at least two forms of IM, IM(S) and
IM(U). I am certainly not wedded to those terms, but do feel that the
experimental literature supports the considerable difference between the
effects of signal-masker similarity (better, SN vs. N) and of
uncertainty, or familiarity. Those effects differ greatly in
total magnitude and also in the duration of perceptual learning, and it
seems certain that at least some of their neurophysiological correlates
differ.

To return to Bill’s concerns, I don’t
believe there are three different sorts of phenomena at work here,
discretely affecting detection, discrimination and recognition because,
at least in our work with tonal patterns, the changes in detection and
discrimination thresholds are so highly correlated that it seems most
likely that they are different measures of the same mechanism of
interference. The commonality of the three is also captured by Bill’s
observation that the ability to recognize implies discrimination, which
in turn implies detection.

Now as to Bill’s concern that the term
“masking” should be reserved for detection experiments, I would not be
unhappy if we all began to refer to the interference of one
stimulus with another. A rose is a rose. Bill wants to restrict
“interference” to shifts in discrimination thresholds, but that term is
so well established in a broader sense that it might better be used as
the generic label for the whole class. Thus critical-band masking is
one form of interference, and the degraded recognition caused by a
same-sex competing talker is another quite different one.

As to Bill’s quip that IM was neither informational nor a form of masking, that is cute, but wrong on both counts. In a series of experiments we studied the role of the
both counts. In a series of experiments we studied the role of the
number and duration of target components, finally concluding that the
essential measure was the duration of the target tone as a proportion of
the total pattern. This seemed to imply an informational limit. Then
Bob Lutfi’s CoRE model made the informational assumption explicit and
managed to fit the data from almost (not quite) all of our experiments.
In the case of “masking”, by which Bill here apparently means only
detection threshold shifts, we have reported large shifts in detection
thresholds as uncertainty is manipulated, in numerous experiments.

Then there is Bill’s concern with the
central-peripheral distinction, which must hark back to his training as
a behaviorist. All I can suggest in response is this:
“…the limiting factors for discrimination of complex sounds are often central, whereas those that limit our hearing of tones, noise bursts, or clicks tend to be peripheral. We should probably clarify our use of the central-peripheral distinction. It is obvious that you cannot prove a physiological hypothesis with psychophysical data. Therefore, no literal anatomical or physiological inferences are intended by this distinction. It is merely a convenient way of summarizing some assumptions about the functional sequence of events that must occur as sensory information is processed. These assumptions are: (1) that there is an early
stage of auditory transduction that imposes certain fixed limits on
the resolving power of the whole system; and (2) that additional
limitations on information transmission are imposed at succeeding
stages of processing, some of which may not be specific to the
auditory system but rather are common to all sensory modalities.
These are not new assumptions. To them we have added an additional
criterion by which some of the postulated central and peripheral
factors limiting information processing might be distinguished. It
is that those limits on information processing that can be modified
by manipulating stimulus uncertainty, or by overtraining, are
central, whereas those insensitive to such limitations may not be.
This distinction is modest, but we believe it can be useful in
summarizing a variety of new research findings.”

That is what we thought about
central-peripheral distinctions in 1981 (Watson and Kelly) and that is
still pretty much the way I feel about them. The 2003 letter by Durlach
et al., assigns “IM” to degrading effects apparently occurring at higher
levels than the CN, which makes reasonable sense to me, although it will
be a bit difficult to implement their definition. As to Bill’s
identification of selective attention as the key concept, darn right!

I think it is important to recall an
important difference between a lot of IM studies and the way we hear
things in the real world. We seem able to selectively attend to
essential stimulus details (establish appropriate streams?) without any
required period of warm up or “stream forming.” In our work with
word-length ten-tone patterns under high-uncertainty conditions the
resolution of component frequency, duration, and intensity all were
extremely degraded, compared to performance under minimal uncertainty
(same pattern on every trial). But, following min(U) training, the now
familiar patterns could be presented in random order and the precise
resolution achieved by that training was maintained in the higher
uncertainty conditions (Spiegel and Watson, 1981).

We have recently returned to thinking
about the benefits of familiarity in relation to work on individual
differences in auditory abilities (briefly described in Watson and Kidd,
2002; expanded article in preparation). First, individual differences
in frequency and temporal resolution, obtained with many different
measures, fail to predict differences in listeners’ abilities to
identify speech sounds under speech-babble masking. Second, the only
non-speech measure that is correlated with several measures of speech
processing (recognition of nonsense syllables, words, sentences), is the
ability to identify familiar environmental sounds. In other
words, the ability to benefit from familiarity differs from one person
to another, and it covaries for speech and non-speech sounds. Tasks that
group together in factor analyses in repeated studies (there have now
been some 5-6 studies finding speech processing to be distinct from
other measures of auditory ability) may have a common neurophysiological
correlate and, Bill, we doubt that it is in the cochlea.

Thanks for your attention. Wish I could be there.

Chuck

References

Spiegel, M.F. and Watson, C.S. (1981). Factors
in the discrimination of tonal patterns. III. Selective attention and
the level of target tones. J. Acoust. Soc. Am., 69, 223-230.

Watson, C. S. (2005). Some comments on informational masking. Acta Acustica united with Acustica, 91,
502-512.

Watson, C.S., Kelly, W.J. and Wroton, H.W.
(1976). Factors in the discrimination of tonal patterns. II. Selective
attention and learning under various levels of stimulus uncertainty.
J. Acoust. Soc. Am., 60, 1176-1186.

Watson, C.S. and Kelly, W.J. (1981). The role
of stimulus uncertainty in the discrimination of auditory patterns. In
D.J. Getty and J.H. Howard (Eds.), Auditory and Visual Pattern
Recognition. Lawrence Erlbaum Associates.

Watson, C. S. and Kidd, G. R. (2002). On the lack of association between basic auditory abilities, speech processing, and other
cognitive skills. Seminars in Hearing, 23, 83-93.