Research on visual working memory for information portrayed by items arranged in depth (i.e., distance from the observer) within peri-personal space is reviewed here. Most items lose their metric depths within half a second, even though their identities and spatial positions are retained. This paradoxical loss of depth information may arise because visual working memory retains the depth of only a single object, for the purpose of actions such as pointing or grasping, which usually apply to one thing at a time.
1. Depth and Visual Working Memory
In nearly all past research on the visual storage of recently presented objects or symbols, the stimuli have been presented in the fronto-parallel or ‘picture’ plane. However, since the seminal paper by Xu and Nakayama (2007) [1], several researchers have investigated the role that depth information might play in short-term visual memory [2,3,4,5,6,7,8,9]. Here we review the contributions of visual depth to iconic memory and to visual working memory (VWM) as discovered so far.
As an important preliminary, we first review evidence for the accurate perception of nearby spatial dimensions, especially depth, before discussing the retention of this information.
2. Visual Short-Term Memory for (x, y)
In contrast to memory for depth, short-term memory for visual stimuli presented in the picture plane (x, y) is well studied. In brief, there exists a visual short-term store (VSTS) which extends the ~500 ms duration of sensory (or ‘iconic’) memory [10] to more than 1 s [11,12,13,14,15] ([16], section 2.3.4). For example, Lindsay-Wilson [17], in a study of iconic persistence, found that the identification of random-dot letter patterns whose halves were displaced in time survived initial iconic decay at well-above-chance levels. Dual-task studies show that VWM and VSTS can function independently [18]. While the icon and VSTS enjoy relatively large capacities, only a few items enter VWM, either because of an attentional bottleneck (Broadbent [19]) between VSTS and VWM, or because VWM has an intrinsically limited capacity [20,21]. The VWM capacity of four items reported by Luck and Vogel [21] varies somewhat with object complexity [6,22,23] and the disposition of visual attention [24,25,26]. Since studies of VWM typically employ retention intervals of 900 ms or more, exceeding the duration of iconic memory but incorporating items from VSTS, it is tempting to infer that any attentional bottleneck or capacity limitation starts to exert its effects at some time between 0.5 and 0.9 s.
It is clearly important to determine which visual memories survive an eye movement. Iconic memory does not [10], and VSTS (and therefore VWM) may lose all but an attended item following a saccade [20,27]. Indeed, any form of interruption of vision, even by a blank field, may suffice to erase visual memory [28,29]. No studies have assessed whether depth information survives a saccade, a vergence movement, or some other interruption, so this review is limited to studies in which the eyes are held steady and the visual memory for items is not erased. Future studies may remedy this lacuna.
That memoranda are encoded with their spatial locations in iconic memory and in VSTS seems intuitive, and is presupposed when items are recalled by cues to spatial location ([10,12] and many others). Spatial location may help to bind features in VWM [30,31], most likely at encoding [32]. Item color is bound to item location [33] (Gobell, Tseng, and Sperling, 2004), and item color and item shape only come apart in visual memory if attention is distracted [34,35]. Change detection is made more difficult if the spatial configuration of the probe display differs from that of the test display [36], as if spatial arrangement is held in VWM. True, it is possible to know that a particular object had been presented recently but not where, as in backward masking by pattern [16], so one cannot a priori exclude unlocated items from visual memory. However, the consensus that location (x, y) is stored along with each item in VWM makes it possible that depth (z) is also stored.
We next consider whether this is the case or not, first in iconic memory, and then in working memory.
3. Depth Information in Iconic Memory
Reeves and Lei [9] were the first to study the contribution of depth to iconic memory, and concluded it was minimal. In their Experiment 1, rows of letters were either presented in one depth plane (‘flat’) or were separated into three depth planes. A partial report paradigm was used. Depth separation was induced using two depth cues, size (closer letters were made larger) and disparity. Depth had no effect on cued recalls at all, at any cue delay from –100 (pre-cue) to +700 ms (post-cue). In their Experiment 2, flat displays of four rows of letters were intermixed with concave displays in which the middle two rows appeared further away, or intermixed with both convex and concave displays, and again there was no depth effect. Their Experiment 3 encouraged a shift of attention between rows, and although items from rows 1 and 2 were slightly more likely to be transposed when the two rows were in different depth planes than in the same depth plane, no other transpositions showed any effect of depth. As none of the subjects had difficulty in seeing or reporting the depth manipulations, which were obvious, these authors concluded that iconic storage is effectively flat, consistent with the model of Sakitt [37]. It was not the case that partial reports could be increased by storing information from different depth planes in different locations in iconic memory. Either metric depth is not encoded in the icon, or it fades too rapidly to be useful, or it is retained but report is limited by a capacity limit or bottleneck.
Reeves and Lei [8] wondered whether the ‘flat’ icon they had found up to 700 ms was nevertheless sensitive to depth. They attempted to measure depth retention directly. Item information was made as simple as possible. Just four numerals, 1, 2, 3, and 4, were presented on each trial, in a well-spaced column running down the screen. Since only these four well-known numerals were presented, there could be no error in recalling their identities. Each numeral was presented in a different depth plane, but which numeral was in which depth plane was randomized. An arrow cue appeared next to fixation, either simultaneous with the numeral display, or just after it. Subjects reported the identity of the (single) numeral in the same depth plane as the arrow. They also counted backwards by threes during the retention interval to ensure that they did not rehearse verbally.
In Sperling-type partial report experiments, subjects can report up to 11 letters from a spatial array of 12 letters [10]. In contrast, subjects in this experiment performed very poorly. For some, the stimulus duration had to be extended to 800 ms for any depth information to be retained, but as this permits vergence to change, data from these subjects were suspect. However, eight subjects were able to do the task at better than chance with 200 ms displays. When the arrow cue was simultaneous with the numeral array, accuracy for the 200 ms subjects averaged 76% for recalling the depth of just one of four easy-to-see items. Accuracy dropped to 60% when the arrow cue was delayed from 0 ms to 200 ms, and then recovered slightly to 64% and 66% at delays of 700 ms and 1700 ms (see Table 1, row 4). (Accuracy remained well above the chance level of 25%, so depth information was not absent from the icon, but it may have been too limited to aid the subjects of Reeves and Lei [9], who had reported from arrays of 9 or 12 items.) As the four items were distinct, well known, and perceived correctly on every trial, it was retaining their depths, not perceiving the depths or retaining them as objects, that was so difficult.
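To make the comparison with the 25% chance level concrete, the short Python sketch below rescales the reported accuracies so that chance maps to 0 and perfect performance to 1. This is the standard correction for guessing and assumes a simple guessing model that the original studies do not necessarily adopt; the function name `correct_for_guessing` is hypothetical and used only for illustration.

```python
def correct_for_guessing(p_correct: float, n_alternatives: int) -> float:
    """Rescale accuracy so that chance performance is 0 and perfect is 1.

    Assumption (not from the source): errors arise only from uniform
    guessing among n_alternatives response options.
    """
    chance = 1.0 / n_alternatives
    return (p_correct - chance) / (1.0 - chance)


# Accuracies reported in the text for the 200 ms subjects, keyed by cue delay (ms).
reported = {0: 0.76, 200: 0.60, 700: 0.64, 1700: 0.66}

# Four numerals in four depth planes, so chance is 1/4.
corrected = {delay: correct_for_guessing(p, 4) for delay, p in reported.items()}

for delay, value in corrected.items():
    print(f"cue delay {delay:>4} ms: corrected accuracy {value:.2f}")
```

On this rescaled measure, roughly two-thirds of the available range is achieved with a simultaneous cue and about half with delayed cues, which is another way of expressing how weakly depth was retained even for a single cued item.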
Table 1. Key findings.

| Reference | Stimuli | Paradigm | Set Size | Display Time | Retention Interval | Accuracy |
|---|---|---|---|---|---|---|
| Xu & Nakayama [1] | Squares | Change detection (CDT) | N = 6 | 200 ms | 1000 ms | 77% (1 or 2 planes, mixed); 81% (2 planes, blocked) |
| Reeves & Lei [9] | Letters | Partial report of 3 items (Expt. 1) | N = 9 | 50 ms | ISI = 0, 100, 300, 700 ms | 80%, 54%, 52%, 53% at 0, 100, 300, 700 ms (one or two planes) |
| Chunharas et al. [2] | Colors | CDT (Expt. 2) | N = 2, 4, 6, 8, 12 | 500 ms | 900 ms | For N = 2, 4, 6, 8, 12: 80%, 60%, 40%, 29%, 15% (two planes); 80%, 60%, 39%, 28%, 13% (one plane) |
| Reeves & Lei [8] | Numeral | Partial report of 1 of 4 items | N = 4 | 200 ms (best subjects) | 0, 200, 700, or 1700 ms delay | 76%, 60%, 64%, 66% at 0, 200, 700, 1700 ms |
| Qian & Zhang [5] | Square | CDT | N = 1, 2, 4, 6 | 800 ms | 900 ms | Single display 71%; whole display 78% |
| Qian et al. [4] | Square | CDT; δ = metric depth change | N = 2, 3 | 800 ms | 900 ms | Order changed: 89% (small δ), 93% (large δ); order unchanged: 67% (small δ), 80% (large δ) |
| Zhang et al. [38] | Square | Adjustment | N = 1, 6 | 800 ms | 900 ms | 90% contraction bias (N = 1); 80% contraction bias (N = 6) |
| Wang et al. [39] | Square | CDT | N = 2, 4 | 800 ms | 900 ms | Fixating middle plane: Front 75%; Mid 63%; Back 73% |
| Li et al. [40] | Square | CDT; retro-cue | N = 4 | 800 ms | 1300 ms | Cue valid 77%; Neutral 71%; Invalid 68% |