Informational masking: an experimentalist’s position statement 

Christophe Micheyl

The energetic vs. informational distinction

According to a loose definition, informational masking (IM) is any form of perceptual masking or interference that cannot be construed as energetic masking (EM). Defined as such, IM is too vast a concept to be of much practical use – just like timbre. This does not mean that the distinction between EM and IM is not important; it certainly provides a good starting point. But if we want to make some progress, we must go beyond it. Oh and, BTW, for those of you who think that the detection of a pure tone in noise involves only EM, read: Lutfi (1990). 

The peripheral vs. central distinction

This distinction comes just above the previous one on the scientific ‘utility’ scale. Knowing that IM cannot be explained mainly or solely in terms of peripheral masking is somewhat useful. At the very least, it tells physiologists interested in the neural bases of this phenomenon that they shouldn’t just look in the auditory nerve. Unfortunately, contrary to the once predominant view among auditory scientists, the central auditory system is a lot more than just an appendix to the cochlea; so, knowing that IM is central still leaves us with a world of possible mechanisms to explore. In addition, it’s worth noting that, even though IM undoubtedly involves central phenomena, peripheral functional properties such as frequency selectivity can nonetheless play an important role in determining it: in particular, the amount of IM in experiments where the effect appears to be due chiefly to the listener’s uncertainty can be predicted (quantitatively and rather successfully) by the ensemble variance of peripheral auditory filter outputs evoked by randomly varying maskers (see Lutfi, 1993). From that point of view, the peripheral vs. central distinction starts to fade. 
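To make the idea concrete, here is a minimal toy sketch (not Lutfi’s actual CoRE model; the filter shape and all parameter values are assumptions chosen for illustration) of what "ensemble variance of peripheral auditory filter outputs" means: for each filter channel, compute the output on many random masker draws, then take the trial-to-trial variance.

```python
import numpy as np

rng = np.random.default_rng(0)

def filter_output(freqs, levels, cf, bw):
    """Level-weighted response of one crude auditory-filter stand-in
    (simple exponential weighting; an assumption, not a real filter model)
    to a set of tone components."""
    w = np.exp(-np.abs(freqs - cf) / bw)
    return np.sum(w * levels)

# Ensemble of randomly varying multi-tone maskers (hypothetical parameters)
cfs = np.linspace(300.0, 3000.0, 20)   # filter center frequencies (Hz)
n_trials, n_comp = 1000, 8
outputs = np.empty((n_trials, len(cfs)))
for t in range(n_trials):
    freqs = rng.uniform(300.0, 3000.0, n_comp)   # random component frequencies
    levels = np.full(n_comp, 60.0)               # fixed component level
    outputs[t] = [filter_output(freqs, levels, cf, 0.15 * cf) for cf in cfs]

# Trial-to-trial (ensemble) variance of each filter's output: the kind of
# quantity that, on Lutfi's (1993) account, predicts the amount of IM.
ensemble_var = outputs.var(axis=0)
print(ensemble_var.shape)  # one variance value per filter channel
```

The point of the sketch is only that this predictor is computed entirely from peripheral filter outputs, which is why its success blurs the peripheral vs. central distinction.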

The low- vs. high-uncertainty distinction

This marks the great divide between the “purists”, who think that true IM only arises in conditions where the listener is highly uncertain about what the next stimulus is going to be like, and the “generalists”, who count as IM any perceptual interference or masking that cannot be EM, yet is observed under conditions where uncertainty is minimal. For instance, the latter might categorize MDI (modulation detection interference) or other forms of across-channel interference as IM. However, we would like to suggest that the notion of “uncertainty” blurs things rather than clarifies them. Until uncertainty can be defined precisely and measured in some way other than through IM itself (which otherwise leads to a circular argument), the notion should be abandoned in favor of physical, quantifiable notions such as the degree of stimulus variability or the number of possible stimuli that can occur during the course of an experiment. 
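As a minimal sketch of what such a quantifiable replacement could look like (all numbers here are hypothetical), one can count the possible maskers in an experiment, or equivalently compute the entropy of the stimulus set, instead of appealing to an unmeasured “uncertainty”:

```python
import math

# Hypothetical multi-tone masker experiment: each masker has n_components
# tones, and each component can take one of n_freq_values discrete
# frequencies, drawn independently.
n_components = 8
n_freq_values = 50

# Number of distinct maskers that can occur over the experiment
n_possible_maskers = n_freq_values ** n_components

# Entropy of the stimulus set, in bits, if all maskers are equally likely
stimulus_entropy_bits = math.log2(n_possible_maskers)
print(n_possible_maskers, round(stimulus_entropy_bits, 1))
```

Unlike “uncertainty”, both quantities are fixed by the stimulus design and can be manipulated parametrically without any reference to the IM they are supposed to explain.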

The synthetic vs. analytic distinction, selective attention, and perceptual organization

For some, IM is a failure of selective attention to the target. While this may be true, it does not help much, unless determining why selective attention to the target fails is easier than determining why IM occurs. Unfortunately, at this point, the factors that determine a listener’s selective attention seem to be at least as obscure as those that determine IM. Note also that there are different ways in which selective attention to the target can fail. The listener’s attention can be drawn away from the target by one or more of the masker components (i.e., the listener is still attending selectively, but to the wrong thing). Alternatively, the listener may be unable to perceptually separate the target from the masker components, and IM may then result from the global (synthetic) percept being dominated by the more energetic and more numerous masker components. Finding ways to experimentally measure the respective contributions (if any) of these two types of mechanisms (i.e., distraction and perceptual grouping) seems like a necessary avenue for research on IM. One interesting implication of the notion that IM depends upon perceptual organization (i.e., little or no IM if the target can be perceptually segregated from the masker) is that if we knew what factors promote segregation, we would also know what factors should reduce IM. Note in this respect that perceptual segregation depends not just on the properties of the stimulus, but also on the listener’s experience (internal templates or memory traces). The distinction between analytic and synthetic listeners appears to imply that listeners differ greatly in their ability to form or utilize such templates. The question, then, is why should they? 
Prior experience comes to mind, and while some “synthetic” listeners appear completely impermeable to protracted psychoacoustical training (Neff and Dethlefs, 1995), musical training appears to favor analytic-listening abilities (Oxenham et al., 2003), suggesting that at least some forms of training can influence IM. 

Exploring the perceptual and neural mechanisms of IM: start simple

We’d like to conclude this statement by arguing for a “start-simple” philosophy, which we are currently putting into practice as we try to better understand (and build models of) the perceptual and neural mechanisms of another vast and complex aspect of auditory perception: auditory scene analysis. Rather than trying to embrace the phenomenon in all its variety and complexity, we start by selecting a simple experimental paradigm that captures some basic aspect of the perceptual phenomenon under study. As it turns out, one of the paradigms that we are working with comes from the IM literature, and involves the detection of repeating target tones among randomly varying, multi-component masker tones (Kidd et al., 1994; 2003). This paradigm meets all the above requirements: it minimizes the role of peripheral EM (thanks to the use of a protected spectral region around the target), produces high uncertainty (with spectro-temporally varying maskers), and allows us to explore the respective contributions of selective attention and perceptual organization. Even with such a simple paradigm, there are numerous stimulus parameters to control, and many factors that can influence perception, including, for instance, the average frequency separation between the target and masker components, their repetition rates, and their temporal regularity. Yet, with careful stimulus design, the respective contributions of different factors can be determined. Preliminary results suggest that performance is determined by only a few factors, such as the average frequency separation between the target and the maskers and their relative repetition rates. Other factors, such as temporal regularity, appear to play only a marginal role in the listener’s ability to hear out the repeating target tones among the random multi-tone maskers. Thus, the phenomenon may involve less complex and varied mechanisms than it might have seemed at first. 
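For readers unfamiliar with this class of stimuli, the following is a minimal sketch of the paradigm’s logic (not the actual Kidd et al. stimulus; sample rate, frequencies, durations, and the width of the protected region are all hypothetical): a target tone repeats at a fixed frequency across bursts, while masker tones are redrawn at random on each burst but kept out of a protected spectral region around the target, so that any masking is informational rather than energetic.

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 16000                                 # sample rate (Hz); an assumption
target_freq = 1000.0                       # repeating target-tone frequency
protect_lo, protect_hi = 800.0, 1250.0     # protected region around the target
tone_dur, n_bursts, n_maskers = 0.06, 8, 6

def tone(freq, dur):
    """Unramped pure tone (no onset/offset ramps, for brevity)."""
    t = np.arange(int(fs * dur)) / fs
    return np.sin(2 * np.pi * freq * t)

bursts = []
for _ in range(n_bursts):
    burst = tone(target_freq, tone_dur)    # the repeating target
    for _ in range(n_maskers):             # masker tones, redrawn every burst
        f = rng.uniform(200.0, 5000.0)
        while protect_lo < f < protect_hi: # resample until outside the
            f = rng.uniform(200.0, 5000.0) # protected region (minimizes EM)
        burst = burst + tone(f, tone_dur)
    bursts.append(burst)

stimulus = np.concatenate(bursts)
print(stimulus.shape)
```

Because the maskers never enter the protected region, detection of the target is limited by the random spectro-temporal variation of the maskers rather than by overlap at the auditory periphery.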
These psychophysical results are then used to guide the search for neurophysiological correlates of the perceptual phenomenon in the primary auditory cortex, and inspire the design of models involving relatively simple neural mechanisms such as frequency selectivity, lateral and forward inhibition, and adaptation. The relative success of this simple-minded approach to the exploration of the neural basis of auditory stream segregation in recent years (e.g., Bee and Klump, 2004; Fishman et al., 2004; Micheyl et al., 2005) leads us to believe that a similar strategy may also benefit the understanding of IM. 

References

    Bee MA, Klump GM. (2004) Primitive auditory stream segregation: a neurophysiological study in the songbird forebrain. J Neurophysiol. 92, 1088-1104.

    Fishman YI, Arezzo JC, Steinschneider M. (2004) Auditory stream segregation in monkey auditory cortex: effects of frequency separation, presentation rate, and tone duration. J Acoust Soc Am. 116, 1656-1670.

    Kidd G Jr, Mason CR, Deliwala PS, Woods WS, Colburn HS (1994) Reducing informational masking by sound segregation. J Acoust Soc Am 95, 3475-3480.

    Kidd G Jr, Mason CR, Richards VM. (2003) Multiple bursts, multiple looks, and stream coherence in the release from informational masking. J Acoust Soc Am. 114, 2835-2845.

    Lutfi RA (1990) How much masking is informational masking? J Acoust Soc Am. 88, 2607-2610.

    Lutfi RA (1993) A model of auditory pattern analysis based on component-relative-entropy. J Acoust Soc Am. 94, 748-758.

    Micheyl C, Tian B, Carlyon RP, Rauschecker JP. (2005) Perceptual organization of tone sequences in the auditory cortex of awake macaques. Neuron 48, 139-148.

    Neff DL, Dethlefs TM (1995) Individual differences in simultaneous masking with random-frequency, multicomponent maskers. J Acoust Soc Am. 98, 125-134.

    Oxenham AJ, Fligor BJ, Mason CR, Kidd G Jr (2003) Informational masking and musical training. J Acoust Soc Am. 114, 1543-1549.


Chuck Watson 
 

First, I am sorry I cannot attend this conference.  Useful discussions have already begun on the web site and I will follow them with interest.  Barbara and Shihab were kind enough to allow me to add a couple of thoughts to the growing collection and I appreciate that, and also congratulate them for doing such a good job of organizing this workshop. 
 

Bill Yost’s contribution mentioned my past forays into IM, so I’ll begin by responding to a couple of his thoughts.  The easiest one is his concerns about whether we should be talking about informational masking at all.  In our first mention of that term (1976) we had reported that in tonal patterns the later occurring components tended to interfere with the processing of earlier ones, and noted that others had reported similar effects which authors “…variously referred to as ‘recognition masking’, ‘blanking’, ‘informational masking’, or ‘temporal interference.’”  In that paper we elected to use “recognition masking”, since it seemed a better descriptor of loss of frequency resolution as a consequence of interference by temporally adjacent stimuli.  By 1981, however, we had realized that under high trial-to-trial uncertainty, not only were frequency-discrimination thresholds elevated for components of tonal patterns, but the detection thresholds for those same components were also elevated by as much as 40-50 dB above the levels measured under minimal uncertainty.  That made me a bit more comfortable in referring to the influence of context on target-tone processing as “informational masking”, since the shift in detectability that some might infer from that term did in fact occur.  Last year, in the paper in Acustica, I argued for explicit recognition of at least two forms of IM, IM(S) and IM(U).  I am certainly not wedded to those terms, but do feel that the experimental literature supports the considerable difference between the effects of signal-masker similarity (better, SN vs. N) and of uncertainty, or familiarity. Those effects differ greatly in total magnitude and also in the duration of perceptual learning, and it seems certain that at least some of their neurophysiological correlates differ.   
 

To return to Bill’s concerns, I don’t believe there are three different sorts of phenomena at work here, discretely affecting detection, discrimination, and recognition, because, at least in our work with tonal patterns, the changes in detection and discrimination thresholds are so highly correlated that it seems most likely that they are different measures of the same mechanism of interference.  The commonality of the three is also captured by Bill’s observation that the ability to recognize implies discrimination, which in turn implies detection.  Now as to Bill’s concern that the term “masking” should be reserved for detection experiments, I would not be unhappy if we all began to refer to the interference of one stimulus with another.  A rose is a rose.  Bill wants to restrict “interference” to shifts in discrimination thresholds, but that term is so well established in a broader sense that it might better be used as the generic label for the whole class.  Thus critical-band masking is one form of interference, and the degraded recognition caused by a same-sex competing talker is another quite different one. 
 

As to Bill’s quip that IM was neither informational nor did it involve masking, that is cute, but wrong on both counts.  In a series of experiments we studied the role of the number and duration of target components, finally concluding that the essential measure was the duration of the target tone as a proportion of the total pattern.  This seemed to imply an informational limit.  Then Bob Lutfi’s CoRE model made the informational assumption explicit and managed to fit the data from almost (not quite) all of our experiments.  In the case of “masking”, by which Bill here apparently means only detection threshold shifts, we have reported large shifts in detection thresholds as uncertainty is manipulated, in numerous experiments.   
 

Then there is Bill’s concern with the central-peripheral distinction, which must hark back to his training as a behaviorist.  All I can suggest in response is this:    
 

    “…the limiting factors for discrimination of complex sounds are often central, whereas those that limit our hearing of tones, noise bursts, or clicks tend to be peripheral.  We should probably clarify our use of the central-peripheral distinction.  It is obvious that you cannot prove a physiological hypothesis with psychophysical data.  Therefore, no literal anatomical or physiological inferences are intended by this distinction.  It is merely a convenient way of summarizing some assumptions about the functional sequence of events that must occur as sensory information is processed.  These assumptions are:

    (1) that there is an early stage of auditory transduction that imposes certain fixed limits on the resolving power of the whole system; and (2) that additional limitations on information transmission are imposed at succeeding stages of processing, some of which may not be specific to the auditory system but rather are common to all sensory modalities.  These are not new assumptions.  To them we have added an additional criterion by which some of the postulated central and peripheral factors limiting information processing might be distinguished.  It is that those limits on information processing that can be modified by manipulating stimulus uncertainty, or by overtraining, are central, whereas those insensitive to such limitations may not be.  This distinction is modest, but we believe it can be useful in summarizing a variety of new research findings.”  
     

That is what we thought about central-peripheral distinctions in 1981 (Watson and Kelly) and that is still pretty much the way I feel about them.  The 2003 letter by Durlach et al. assigns “IM” to degrading effects apparently occurring at higher levels than the CN, which makes reasonable sense to me, although it will be a bit difficult to implement their definition.  As to Bill’s identification of selective attention as the key concept, darn right!  
 

I think it is worth recalling an important difference between a lot of IM studies and the way we hear things in the real world.  We seem able to selectively attend to essential stimulus details (establish appropriate streams?) without any required period of warm-up or “stream forming.” In our work with word-length ten-tone patterns under high-uncertainty conditions, the resolution of component frequency, duration, and intensity were all extremely degraded compared to performance under minimal uncertainty (the same pattern on every trial).  But, following min(U) training, the now-familiar patterns could be presented in random order, and the precise resolution achieved by that training was maintained in the higher-uncertainty conditions (Spiegel and Watson, 1981).   
 

We have recently returned to thinking about the benefits of familiarity in relation to work on individual differences in auditory abilities (briefly described in Watson and Kidd, 2002; expanded article in preparation).  First, individual differences in frequency and temporal resolution, obtained with many different measures, fail to predict differences in listeners’ abilities to identify speech sounds under speech-babble masking.  Second, the only non-speech measure that is correlated with several measures of speech processing (recognition of nonsense syllables, words, sentences) is the ability to identify familiar environmental sounds.  In other words, the ability to benefit from familiarity differs from one person to another, and it covaries for speech and non-speech sounds. Tasks that group together in factor analyses in repeated studies (there have now been some 5-6 studies finding speech processing to be distinct from other measures of auditory ability) may have a common neurophysiological correlate and, Bill, we doubt that it is in the cochlea. 
 

Thanks for your attention.  Wish I could be there. 
 

Chuck

Spiegel, M.F. and Watson, C.S. (1981).  Factors in the discrimination of tonal patterns. III. Selective attention and the level of target tones.  J. Acoust. Soc. Am., 69, 223-230. 
 

Watson, C. S. (2005).  Some comments on informational masking.  Acta Acustica united with Acustica, 91, 502-512. 
 

Watson, C.S., Kelly, W.J. and Wroton, H.W. (1976).  Factors in the discrimination of tonal patterns. II.  Selective attention and learning under various levels of stimulus uncertainty.  J. Acoust. Soc. Am., 60, 1176-1186. 
 

Watson, C.S. and Kelly, W.J. (1981).  The role of stimulus uncertainty in the discrimination of auditory patterns.  In D.J. Getty and J.H. Howard (Eds.), Auditory and Visual Pattern Recognition.  Lawrence Erlbaum Associates. 
 

Watson, C. S. and Kidd, G. R. (2002).  On the lack of association between basic auditory abilities, speech processing, and other cognitive skills.  Seminars in Hearing, 23, 83-93.