Attention-Setting and Human Mental Function: An Introduction
Please note this is an old version of this entry, which may differ significantly from the current revision.
Subjects: Psychology

Working draft

  • attention
  • visual attention
  • human perception
  • top-down processing

1. Introduction

Attention is critical for human mental function; attention-setting is a framework for understanding intentional, top-down processes of attention. This entry introduces attention-setting and relates it to the larger topics of attention and experimental research on human attention. Attention-setting is the primary way in which the brain is cognitively “active” during everyday perception: it is the implementation of intentions and goals. For example, given a goal such as find (object), attention sets up and prioritizes relevant machinery in neural subsystems. These are not the only top-down effects in the brain, but they appear to be the most powerful and important.

We present attention-setting within a general treatment of attention that emphasizes visual tasks in complex scenes. We begin with some historical background and then develop the concept of attention-setting in tasks such as visual search. This is followed by a selective review of major experimental results on task-switching and temporal attention. Then, we develop the idea of attention-setting over seconds. This time scale is especially important in human behavior [1]. We present evidence that attention is especially powerful when it can be set and changed over seconds. We finish with some relevant highlights from neuroscience and cognitive health. Our general argument is that attention-setting is a pervasive top-down mental skill, which plays out over multiple timescales, especially the humanly important time scale of seconds.

1.1. Pieces of History

A basic principle of attention is that perceivers choose one stream of information out of many; this is a main way in which the brain is active. This idea was first explicated by William James in 1890 [2] and has been re-stated in recent decades [3,4]. The brain is limited, in contrast to the immense informational richness of any real-world situation. Furthermore, computational studies demonstrate that interpretation is exponentially more complex than the stimulus, an even stronger motivation for attention [5,6].
In the early days of scientific psychology in the 1890s, attention was a main topic (e.g., [7,8]). Effective empirical approaches such as “task switching” were being developed into the 1920s [9]. However, when behaviorism became dominant in the 1930s and 1940s, attention studies were not published in primary experimental research journals, because attention was not directly measurable. The cognitive approach to attention re-appeared in the 1950s, when internal mental constructs, and the evidence-based inferences necessary to support them, were again allowed in the research mainstream. Since that time, the field of attention has developed and flourished. Major results have emerged and been replicated many times, producing some areas of general empirical agreement.
The larger world-context in the 1950s was the dawning of the information age and ideas such as measuring information and manipulating it. Attention researchers could present different types of information and infer internal mechanisms from the resulting patterns of human performance. In the first major theory, attention was hypothesized to be an early filter that selected among sensory information coded in parallel across sensory registers. Consequently, information matching a single sensory register (a “channel”) should be easy to select and then process further into later, deeper perception [10,11]. With the then-modern technologies of tape recorders and headphones, researchers used broad sensory channels such as input ear, and difficult tasks such as dichotic listening: repeating one “relevant” message (e.g., words in the left ear) while ignoring other, irrelevant messages (words in the right ear). The early filter construct was supported by the result that repeating an input stream was efficient when it was distinct on a sensory basis (e.g., high voice in one ear) but less efficient when sensory selection was difficult (e.g., relevant inputs switched ears, or high voices in both ears) [10,11].
The second major theory moved the filter later into the processing stream, positing a late filter after basic perceptual identification had been completed. Objects were identified in parallel, and the filter was used to select a single identified object for admission to limited consciousness (e.g., [12,13]). As Kahneman and Treisman noted in 1984 [14], early and late selection were scientific paradigms in the Kuhnian sense [15]; the paradigms provided theory together with methods. The late selection methods tended to favor efficient processing; the stimuli were familiar and simple (e.g., letters in arrays), as were the tasks (e.g., detect a T or F), which encouraged late processing. The early selection methods, on the other hand, involved broad channels (such as ear) and difficult tasks that encouraged early selection [14]. Ultimately, the early versus late debate was resolved with hybrid models in which each type of selection could occur, but with important differences such as the relative ease of early selection and the broadness of late selection [16,17].
Note that attention was used mainly as a noun, denoting a mechanism. The early decades of research can be viewed as a search for the mechanism of attention, as well as a search for manipulations that could separate attention from other processes, such as perception. Kahneman and Treisman [14] acknowledged the fruitfulness of the filter metaphors but proposed a framework that was literally more integrative. Perception and attention were functionally related in the key construct of mental “object-representations”. A mental object can be thought of as a top-node in a perceptual hierarchy. Object representations efficiently integrated perceptual, conceptual, and event information, and this produced a priority for unified mental concepts. The object-metaphor was also quite fruitful and generated considerable research. The functional relationship between perception and attention was new to this attention framework, although some perceptual theorists of the 1970s suggested a similar relationship [1,18,19]. The functional relationship continues in most modern conceptions, including the present one.
As attention research grew over the next decades, the range of attentional functions studied in experiments greatly increased. Franconeri [20] describes and explains 15 different attentional limitations within a common framework. Geng, Leber, and Shomstein [21] recently called for research articles on attention and perception and published what they termed 40 different views. Research has also begun to address the complexity of real-world situations, which magnifies the importance of attention and priority (e.g., [6,22,23,24]). Situations that approach real-world complexity are emphasized here.
Over the decades, the theoretical metaphors of attention became less singular and more general, each capturing important aspects of attention: a single pool of energetic resources (e.g., [25]), multiple pools with distinct resources (e.g., [26,27]), the object-centered structures mentioned [14], attentional sets (e.g., [28,29]), and biased competition between representational networks [30,31,32,33]. The multiplicity of concepts suggests that the functions of attention are too varied and too pervasive to be captured by any single mechanism.

1.2. Attention-Setting

Attention-setting is a set of skills, i.e., mental actions. Attention-setting is a verb phrase designating the category of mental actions that initiate and prioritize mental functioning within the limited resources of the human brain. Attention sets up mental processing in accordance with the observer’s goals and situational parameters. Once set up, familiar processes such as “read that highway sign” run as a continuing interaction involving the mind, the display, and the larger situation. We illustrate attention-setting further with an example involving the well-studied process of visual search. Attention-setting extends over seconds in the example, consistent with our emphasis on humanly important time scales [1]. Theories and evidence supporting this example are noted below. In the section that follows, we relate attention-setting to the theoretical concepts in the literature and illustrate the pervasiveness of attention.
  • Theory and Evidence behind the Example
  • A comprehensive theory of visual search has been developed by Wolfe and colleagues and provides details on many important visual mechanisms—Guided Search Version 6.0 [34]. The theory integrates well-supported details of sensory coding channels and the priority map, and the paper provides useful further references. Zelinsky, Chen, Ahn, and Adeli [35] provide an amazing catalog of computational search models, with an emphasis on the general problem and real-world scenes. Zelinsky et al. treat eye fixations, which are closely linked to attention; they provide a complementary theoretical approach to top-down influences (see also [36]).
  • Attentional templates are included in most search models, and many experiments measure the set-up of the templates. Priority maps are also central constructs; they combine top-down knowledge and visual features from bottom-up parallel processing (e.g., [34,37,38]). Priority maps are used to guide the search toward likely target locations and away from unlikely locations [39]. The trade-offs in energetic resources between different tasks are a long-standing topic in basic and applied research (e.g., [27,40]). Internal attention, such as turning attention into one’s memory, is becoming a distinct research topic (e.g., [41,42]). Unconscious problem solving is a growing area of research. The idea that processes will be modified over time, through interaction with the world, was proposed by Neisser [1] as the perceptual cycle. We will develop the idea below as a useful framework for understanding attention over the time scale of seconds.
Imagine that the power went out at night and it is near dark in one’s home. Intention takes the form of a goal such as find (flashlight), and attention sets up processes to meet the goal. Attention sets up visual search processes by initiating the creation of an internal attentional template for the goal object (target), in visual working memory. The template can be fairly specific (my red flashlight in dim light) or abstract in various ways (any light source). The template is used in a matching process that compares it to a priority map of the visible world. The priority map combines sensory information (bottom-up features) and knowledge (top-down) on a spatial map. The sensory features in the map are weighted by priority (e.g., reddish glints of light, non-accidental shape properties). The knowledge includes historically likely locations (the flashlight should be on its shelf). The attentional template is matched against the priority map, to guide the search through the immediate scene. In near darkness, the search may be slow because the incoming features are limited by low light (a data limitation; [25]). Because bottom-up information is weak, top-down location knowledge will be more important, but only if valid (and only if the flashlight has been put back on its shelf).
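The flashlight example can be made concrete with a toy computation. The following is a minimal sketch, not a model from the literature cited here: it assumes a 3 x 3 grid of locations, a simple weighted sum of normalized maps, and a winner-take-all choice of the next location to inspect. The function names (`priority_map`, `next_fixation`), the feature names, and all numbers are hypothetical and chosen only for illustration.

```python
import numpy as np


def _normalize(m):
    """Scale a map so its entries sum to 1 (uniform map if it is all zeros)."""
    m = np.asarray(m, dtype=float)
    total = m.sum()
    return m / total if total > 0 else np.full(m.shape, 1.0 / m.size)


def priority_map(feature_maps, template_weights, location_prior, top_down_weight):
    """Combine bottom-up features and top-down location knowledge (illustrative).

    feature_maps     : dict of feature name -> 2D array of feature strength
    template_weights : dict of feature name -> weight from the attentional template
    location_prior   : 2D array of historically likely target locations
    top_down_weight  : value in [0, 1]; higher means location knowledge counts more
    """
    # Bottom-up features, weighted by how much the template prioritizes each one.
    bottom_up = sum(template_weights[name] * np.asarray(fmap, dtype=float)
                    for name, fmap in feature_maps.items())
    w = float(np.clip(top_down_weight, 0.0, 1.0))
    return (1.0 - w) * _normalize(bottom_up) + w * _normalize(location_prior)


def next_fixation(pmap):
    """Winner-take-all: return the (row, column) with the highest priority."""
    row, col = np.unravel_index(np.argmax(pmap), pmap.shape)
    return int(row), int(col)


# Near darkness: features are weak everywhere, apart from one reddish glint at (0, 2).
feature_maps = {
    "reddish_glint": np.array([[0.0, 0.0, 0.4],
                               [0.0, 0.0, 0.0],
                               [0.0, 0.0, 0.0]]),
    "cylinder_shape": np.full((3, 3), 0.1),
}
template_weights = {"reddish_glint": 1.0, "cylinder_shape": 1.0}

# Knowledge: the flashlight is usually on its shelf, at (2, 1).
location_prior = np.full((3, 3), 0.05)
location_prior[2, 1] = 0.6

# Strong reliance on location knowledge sends the search to the shelf first.
print(next_fixation(priority_map(feature_maps, template_weights, location_prior, 0.8)))  # (2, 1)
# Weak reliance on location knowledge sends the search to the glint first.
print(next_fixation(priority_map(feature_maps, template_weights, location_prior, 0.1)))  # (0, 2)
```

In this sketch, a single mixing parameter stands in for the idea that top-down location knowledge matters more when bottom-up information is weak. Published search models specify these components in far more detail.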
Once the search process has begun, it will continue to require some mental resources but usually less than at the start. In addition, attention can set up new processes such as reaching or tactile search, again drawing on resources. Attention can also set up internal processes. It can initiate wider problem solving, including memory retrieval, which is set up with a memory cue (e.g., when did I last use the flashlight?). The results of these processes (when I was fixing the toilet) can then be used to modify priorities. Problem solving is aided by abstract goals (find the light) and is set up by attention; a goal can serve as a memory cue that can activate unconscious knowledge (e.g., phones now have flashlights). The activation of an unconscious memory is not directly caused by intention; activation occurs because an abstract goal (memory cue) is broadcast to memory and there is a match.
In sum, attention sets up and guides processes at multiple levels and modifies priorities as results come in from the world, decision making, and memory. Attention sets up, guides, and prioritizes larger systems (e.g., visual search, tactile search, and memory retrieval), initiates the construction of central objects within systems (e.g., attentional templates and retrieval cues), and implements priorities at multiple levels (e.g., favoring particular tasks, locations for search, and certain visual features, while inhibiting unlikely features and locations). The exact number of attention-setting functions may be difficult to know because humans invent and tune new cognitive skills. Nevertheless, we think that attention-setting could explain the major goal-directed mental actions of the perceiver, across many situations.
The boundary between attention and other mental processes is an interesting issue. We argue that a strict boundary is not yet appropriate for attention. A more fruitful approach is to assume that attention-setting works directly with other processes and examine those functional relationships. At today’s levels of discovery, functional relationships are more important than carving mental nature into independent parts.
Attention-setting is an expansion of attentional set theory, which has emphasized specific sets within controlled situations (e.g., [23,28,43]). We expand upon this idea, arguing that attention-setting is a powerful set of skills involving setting and tuning. Setting often takes place over seconds, during interactions with the world. As we explain later in this paper, the settings of attention can have profound effects. Because set helps determine the information that observers pick up from the world, set also helps determine what observers understand and learn [1].

1.3. Relations to Other Major Concepts and Processes

The mental resources prioritized in attention-setting are often called “attention” in the literature, for simplicity. Resource limitations are critical (e.g., [25,26,27]), and attention-setting is constrained by the limitations. “Attending” is a basic result of attention-setting. Attention-setting is similar to the widely studied construct of attentional control (e.g., [6,44]). Attentional control is a fundamental executive process in the brain (e.g., [45]). However, attention-setting emphasizes the setup of brain processes to run rather than continuous control. Set-up (preparation) is often a highly resource-intensive process (e.g., [46]).
Attention-setting (and attention in general) is functionally related to many mental processes, and we will now mention some of the most important. Extending upward, there is the executive domain of meta-awareness and executive processing, where initiating and setting processes is often critical (e.g., see [47]). Attention in general is closely related to awareness; Graziano’s attention-schema theory provides good treatment (e.g., [48]). Attentional control is a major portion of intelligence, and attention-setting may be a core mechanism in the portion termed fluid intelligence, the flexible, creative, and problem-solving aspects of intelligence [44,49]. Skillful attentional control is necessary for creativity and imagination. Attention sets up mental “simulations” that involve knowledge and images assembled from memory and that may seem to run themselves as long as they continue to be attended (e.g., [50,51,52]).
Attention-setting works with each of the main types of memory. It is functionally related to working memory, which is a highly flexible, temporary representational space. Attention sets up and uses working memory in multiple ways, for example, as an image-like memory buffer, or a verbal rehearsal mechanism to remember a set of numbers [53]. Attention-setting also interacts with long-term memory, by setting up cues broadcast to memory (e.g., [54]). In fact, memory retrieval can be viewed as attention-setting turned inward [42]. Third, attention-setting is initially critical for developing implicit memory skills such as driving. Beginners set up the new tasks carefully in a serial, attention-controlled (and resource-intensive) manner. However, with practice the procedures become an implicit memory that runs with low resource requirements. The links between intentions and the networks that implement them are critical, and recent work has begun to flesh them out conceptually and formally (e.g., [55,56]). In summary, attention-setting contributes to many mental processes, and these functional relations are active research topics.

1.4. The Present Approach

The strongest arguments for the attention-setting framework come more from the “big picture” of attention than from any single experiment. We believe attention-setting is consistent with many of the thousands of experiments on top-down attention. Furthermore, critical support also comes from success in related fields, when attention is viewed as an active and pervasive top-down influence on networks. This includes research on attentional disabilities [57] and computational vision [6]. Near the end of this paper, we will bring in evidence from neuroscience and the emerging sciences of attentional and cognitive health and note that attention-setting has biological characteristics such as exercise-benefits and fatigue.
The overall goal in this paper is to provide an informative but highly selective tour of attention and attention-setting in the field of visual cognition. We use verbal and descriptive concepts typical of the field and emphasize relatively complex situations that begin to resemble the real world. The aim is to capture the most important messages from recent decades of experimental research, in a “consumer-friendly” manner. That should mean readable, but in psychology and neuroscience, the units readers care about most are “effect sizes”. How large is the effect of attention-setting on mental performance? Visual cognition provides some scales that are simple and intuitive. Here, our favored scale is the ability to perceive something in plain sight, such as a gorilla. Usually performance is near 100% for this process, but research puts some interesting marks on the other side of the scale.

2. Attention-Setting and Gorilla Missing

The missed-gorilla experiment is a landmark demonstration in the domain of attention [58]. We will describe that experiment, but first readers should note that they can still experience misperception in the original video, or experience it anew in the sequel, “Monkey Business” ( (accessed on 10 June 2021)).
The original experiment demonstrates the powerful effects of attention-setting over seconds. As mentioned, noticing a gorilla is usually near 100%, even in video. However, this ability is greatly reduced when healthy observers engage in a visually and mentally challenging task while watching a video with two interacting teams of players. In a representative condition, there were two teams of three players each (white shirts versus black shirts), and the task was to notice passes of a basketball by one team (task focus 1) and count the number of passes (task focus 2). The players in the other shirts should be ignored (suppression; task 3). This makes the observers busy, maybe as busy as crossing a city street. The critical finding was that when the gorilla walked in and pounded his chest, only 42% of observers reported noticing it when subsequently asked, “Did you notice anything else?”. The results have stood up to years of scrutiny, including careful considerations of memory [43], and further research, including more controlled conditions to be described. Because false alarms were low (no false gorilla reports by another group of observers), the hit rate is a valid scale of conscious perception. The missed gorilla is a marked failure of the mental processes that lead up to conscious perception, a failure that lasts for seconds. There is likely to be limited unconscious processing in this situation, however, as will be noted.
A reasonable explanation for the conscious failures is that the attention settings were for the relevant task, pass-counting. The settings enable processes for the three challenging foci mentioned above, beginning with the complex processes of tracking complex objects in space (both the ball and white-shirt players). This requires guidance systems for eye movements and attentional resources, as well as the executive direction of counting and remembering. The task set also includes the suppression of non-relevant information and especially the black-shirt players. Interestingly, when the colors are reversed for other observers (attend to black shirts, ignore white), the color-settings change and the gorilla is noticed 83% of the time. However, for other subjects, the gorilla is replaced by a woman wearing light grey clothes with an umbrella, and she is noticed only 58% of the time.
Thus, the results are not due to a single mechanism but instead a configuration of systems, the task set. The task settings are also likely to pertain to time and size scale; the basketball is the primary object and is relatively small and fairly fast, in contrast to the slower and larger unexpected people in guises. The configuration of systems gives the set selective high efficacy in the relevant task but causes the human to miss many other stimuli outside of the set. Noticing an unexpected stimulus requires bottom-up capture, to be described.

Gorilla Missing with More Control, and Bottom-Up Capture

Attention-setting is an internal action that changes the functioning of the brain. The match or mismatch between the settings and the experimenter’s stimuli can produce large differences in performance. Researchers can observe the match and mismatch by changing the task, and do so repeatedly (100s of times) to obtain more reliable data. When the task changes back and forth repeatedly, participants learn to change settings with some efficiency; this is known as task switching or task reconfiguration. This has been an intense area of research (e.g., for reviews see [59,60,61]), and we will be switching back to it throughout this paper. Note that in task switching research, the search for the mechanism (a single structural bottleneck) may be successful only in limited situations (cf. [62]). Perspectives of flexibility and practice are necessary to explain major results [59]. Thus, we return to the first switch in task, which usually produces the largest change in mental function.
Observing the first matches and mismatches of set requires a special type of experiment, usually one without task-specific practice that stabilizes performance. Findings such as missed gorillas helped inspire an era of these experiments that led to important insights. Gorilla missing with more controlled displays was measured in a program of experiments led by Steve Most, who was a graduate student drafted onto the Simons and Chabris gorilla team. Together, they directed an army of researchers with laptops far and wide across campuses, to conduct dozens of short experiments [29,63].
In a number of experiments, the stimuli were 8 smallish black or white circles and squares, about the size of a large-ish coin [29,63]. The shapes moved haphazardly across the display screen over seconds. Depending on the experiment, the task set was to track 4 of them, defined by color or by shape, and to ignore the 4 others. When squares were tracked (black and white), attention set a visual-cognitive “square-template” for the relevant squares while inhibiting irrelevant circles. The role of the gorilla was played by an unexpected ninth object that entered the screen. When attention was set for squares, observers noticed a square intruder much more often than a circle-intruder. In another experiment, the template for relevance was “black” (or “white”), and observers tracked that color. The gorilla was played by a cross similar in size, and either black, white, or one of 2 intermediate levels of grey. The cross was almost always noticed when it was the attended black or white color (93%), but noticing declined linearly over the next 3 grayscale steps away from the attended color, to 3% with maximum departure (e.g., black relevant; white cross). This function, spanning most of the range in performance scale, wins a prize for the largest effect size in this paper.
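To give a feel for the size of this function, the short sketch below interpolates between the two reported endpoints (93% and 3%) under the assumption that the decline is exactly linear across the 3 grayscale steps. The intermediate values it produces are idealized illustrations, not data reported in the experiments, and the function name is hypothetical.

```python
def linear_noticing_rates(start_pct, end_pct, steps):
    """Idealized noticing rates at each step, assuming an exactly linear decline."""
    slope = (end_pct - start_pct) / steps  # change in percentage points per step
    return [start_pct + slope * k for k in range(steps + 1)]


# Endpoints from the text: 93% when the cross matches the attended color,
# 3% at maximum departure, 3 grayscale steps away.
rates = linear_noticing_rates(93.0, 3.0, 3)
print(rates)  # [93.0, 63.0, 33.0, 3.0]
```

Under this assumption, each grayscale step away from the attended color costs about 30 percentage points of noticing.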
Another important special paradigm was devised by two sages of the cognitive revolution, Ariel Mack and Irvin Rock [64]. They sought to measure perception that was unprepared and low on directed attentional resources. The observers’ efforts were directed to a briefly appearing cross, for which they would compare the length of the two segments (to establish if the horizontal or vertical was longer). For 3 trials, only the cross appeared but briefly, so the observers were set for optimal size processing. On the fourth trial, the cross appeared again but along with a nearby, unexpected stimulus. Would observers notice a simple but unexpected stimulus such as a line or colored shape? Across many experiments, a variety of simple stimuli went unnoticed by most observers, even though the stimuli should activate simple feature detectors in the observers’ brains. At this point, the results were quite pleasing to a hard-core top-down theorist: even simple stimuli were not perceived, if the observer was not set for them.
However, researchers keep on experimenting, and the simple conclusion was qualified with an important twist. Mack and Rock [64] found that if the unexpected stimulus was more meaningful—the observer’s printed name—most observers (87%) noticed it. The finding echoes a now-classic finding that seriously hampered the early filter model, concerning information from an unattended (ignored) ear during dichotic listening. If selection was sensory-based, then everything in the ignored sensory channel was thought to simply decay; indeed, participants remembered nothing from that ear. Then, Moray [65] found that the participant’s own name could be noticed and remembered. Thus, unexpected but significant signals may be processed into awareness, through a primarily bottom-up route. Classic theories of attention added mechanisms for prioritizing significant information (e.g., [66]). More recent research indicates that this critical result has narrow boundary conditions, however [67]. More generally, as theorists recognized in the 1970s, the flow of information during perception is both bottom-up and top-down in nature (e.g., [19,25]). Humans like to be driven by their knowledge, but adaptation requires being open to unexpected inputs and new ideas. Modern theories include rapid bottom-up routes for efficiently processing familiar stimuli, along with more controlled top-down mechanisms (e.g., [68,69,70]). In fact, a possibly major difference between individuals is the degree to which a person is top-down or bottom-up in general [71].
However, in a more precise sense, the relative strength of purely bottom-up routes and top-down settings remains a critical issue. The ability of an unexpected but physically salient stimulus to capture attention is a demonstration of the power of bottom-up processing (e.g., [72,73]). For example, observers set to respond to blue-Ts can be slowed by a nearby but “irrelevant” red-X. This is at least somewhat independent of top-down settings. However, note that observers in such studies form general sets, such as using vision and responding rapidly to sudden stimuli. In the now large literature on this topic, critical factors include the spatial region to which attention is set and the degree of task-relevance of the stimulus (see [74], this issue). Bottom-up capture can be eliminated in some general conditions, for example, with an exclusionary attentional set such as “ignore red” or “ignore that region” (e.g., [75,76,77]). Thus, in many cases, the capture of attention is contingent on high-level settings (e.g., [78]). This is a critical indication of the power of top-down processing. Purely bottom-up capture appears to be limited to certain experimental conditions [79].
Nevertheless, bottom-up processing routes are efficient for a variety of stimuli, from familiar words to novel but typical everyday scenes. Additionally, some information is prioritized, including negative information and self-relevant information (e.g., [80,81]). Some processing is unconscious. For example, if a human is set to watch a certain region of the video screen and an unexpected but familiar word appears near there, it is likely to be processed to some depth in the brain, independent of other ongoing processes. The familiar word activates feature, letter, and word detectors in intermediate brain areas, resulting in some activation of meaning (cf. [30,82,83]). This can happen while an observer’s awareness is focused on another task. Such effects qualify the large effect sizes of task set on mental function that we have emphasized. When a stimulus such as a gorilla or a circle in clear view is missed, there is likely to be some stimulus-specific processing at unconscious levels.
The research discussed so far has focused on limited windows of time: single critical trials and events, either in the first parts of experiments or in some cases repeated over and over again. However, attention-setting takes place in time. Larger changes are likely to take more time, and sets can develop or change over time in an experiment. We are about to enter a new and important dimension.

This entry is adapted from the peer-reviewed paper 10.3390/jimaging8060159
