Visual perception is dependent not only on low-level sensory input but also on high-level cognitive factors such as attention. In this paper, we sought to determine whether attentional processes can be internally monitored for the purpose of enhancing behavioural performance. To do so, we developed a novel paradigm involving an orientation discrimination task in which observers had the freedom to delay target presentation—by any amount required—until they judged their attentional focus to be complete. Our results show that discrimination performance is significantly improved when individuals self-monitor their level of visual attention and respond only when they perceive it to be maximal. Although target delay times varied widely from trial-to-trial (range 860 ms–12.84 s), we show that their distribution is Gaussian when plotted on a reciprocal latency scale. We further show that the neural basis of the delay times for judging attentional status is well explained by a linear rise-to-threshold model. We conclude that attentional mechanisms can be self-monitored for the purpose of enhancing human decision-making processes, and that the neural basis of such processes can be understood in terms of a simple, yet broadly applicable, linear rise-to-threshold model.
Our perception of the environment is dependent on both low-level sensory input and high-level cognitive factors such as attention and states of self-awareness (Smith et al. 2008). The visual, behavioural and neurophysiological consequences of these high-level mental factors are keenly sought. Numerous psychophysical studies show that attention improves behavioural performance and is vital for efficient interaction with the environment (Tucker & Ellis 1998; Anderson et al. 2002). Electro- and magneto-encephalographic studies show that attentional modulation either prior to or during visual stimulation affects the amplitude of cortical alpha rhythms, and that such changes are predictive of human visual performance (Yamagishi et al. 2005, 2008; Thut et al. 2006). Animal studies have shown that attention modulates neuronal signals in sensory areas, typically increasing firing rates to attended objects (Yantis & Serences 2003; Maunsell & Treue 2006). Recently, Cohen & Maunsell (2009) suggested that attention may improve performance primarily by reducing interneuronal correlations. In brief, the neural mechanisms associated with visual attention can be observed at both microscopic and macroscopic brain levels.
Here, we ask whether it is possible for individuals to internally monitor their attentional status. Anecdotal reports by observers engaged in visual experiments suggest they can. For example, despite attention being appropriately directed for a given task, as defined by experimental protocol, observers often excuse poor performance with the statement ‘I was not quite ready’.
Variable performance levels on a given task may, in large part, reflect an individual's attentional status (e.g. Huang et al. 2009). In this study, we employ a psychophysical paradigm to assess whether observers are able to self-monitor their attentional status, responding only when they perceive their attentional focus to be maximal. To quote the American psychologist William James (James 1890), we ask observers to ‘take possession of their mind’ and evaluate and act upon competing internal signals for the purpose of maximizing behavioural performance.
2. Material and Methods
Six volunteers, naive to the purpose of the experiments, and one of the authors (N.Y.) acted as observers. All had normal or corrected-to-normal visual acuity and full visual fields. Each gave written consent to participate in the experiment, which was approved by the ATR Human Subject Review Committee.
(b) Apparatus and stimuli
All stimuli were created using a VSG2/5 graphics card (Cambridge Research Systems, UK) and displayed on a gamma-corrected Sony GDM-F500 monitor with a resolution of 600 lines × 800 pixels. The mean luminance (40 cd m−2) and colour (CIE coordinates: x = 0.31, y = 0.32) of the stimuli were matched to those of the surround. The frame rate was 60 Hz. A response box with right and left buttons (CT3; Cambridge Research Systems) was positioned 10 cm in front of the participant. Head restraint was achieved using a chin rest. Eye movements were monitored at 250 Hz using an EyeLink II system (SR Research Ltd., Canada).
The target stimulus was a Gaussian-modulated (σ = 0.5°) sinusoid of two cycles/degree periodicity, occupying 3° × 3° of visual angle at a viewing distance of 100 cm. Its orientation was rotated from the horizontal by 5° (clockwise or anti-clockwise with equal probability) immediately prior to the start of each trial. The centre of the target stimulus was 7° directly below fixation.
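As an illustration of the stimulus geometry described above, the following minimal sketch builds a Gaussian-windowed sinusoid with the stated parameters (σ = 0.5°, two cycles/degree, 3° × 3°, tilted 5° from horizontal). The function name `gabor_patch`, the pixel density `pix_per_deg` and the sign convention for the tilt are assumptions made for illustration; they are not taken from the experimental software.

```python
import math

def gabor_patch(size_deg=3.0, pix_per_deg=20, sigma_deg=0.5,
                cycles_per_deg=2.0, tilt_deg=5.0, contrast=0.5):
    """Return a square grid (list of rows) of luminance modulations.

    Values lie in [-contrast, +contrast] around the mean luminance.
    pix_per_deg is a hypothetical display density, not a value from the paper.
    """
    n = int(round(size_deg * pix_per_deg))
    half = (n - 1) / 2.0
    theta = math.radians(tilt_deg)
    patch = []
    for row in range(n):
        y = (row - half) / pix_per_deg          # degrees from patch centre
        line = []
        for col in range(n):
            x = (col - half) / pix_per_deg
            # A horizontal grating varies along y; rotate that axis by the tilt.
            u = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma_deg ** 2))
            carrier = math.cos(2.0 * math.pi * cycles_per_deg * u)
            line.append(contrast * envelope * carrier)
        patch.append(line)
    return patch
```

Passing `tilt_deg=-5.0` gives the opposite tilt; which sign corresponds to clockwise depends on the display's coordinate convention.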
Two distracter targets of fixed 50 per cent contrast were presented on either side of the target stimulus (four distracters in total), each with the same size and spatial structure as the target stimulus but assigned a random orientation (0–360°) prior to each trial. The distracters and the target stimulus were aligned horizontally and separated by 3° (centre-to-centre spacing).
A binary-choice procedure was used in conjunction with a method of constant stimuli to measure psychometric functions of performance for judging orientation of the target stimulus as a function of its contrast, in the presence of four distracter targets of fixed 50 per cent contrast. Importantly, participants were aware of the spatial location of the target stimulus (and distracters) and were instructed to covertly attend that area. Eye movements were recorded to ensure that central fixation was maintained, and that there was no systematic variation in eye movements between experimental conditions (figure 1).
There were two experimental conditions (figure 2a). In both conditions, each trial began with the presentation of a black, bull's-eye fixation target (two open circles with radii of 0.2° and 0.4°). Participants were instructed to maintain fixation and to avoid blinking during each trial. In condition A, participants were asked to monitor their internal attentional status and, when they judged their attention to be maximal, depress a button to initiate a trial sequence. On depression of the button, an audible tone was presented (lasting 50 ms) and the fixation target changed from black to grey. One hundred milliseconds later, the stimulus and distracters were presented for one frame (17 ms) only. While maintaining fixation, participants were required to judge whether the grating stimulus was clockwise (right-button press) or anti-clockwise (left-button press) oriented. No response deadline was imposed. An inter-trial-interval of 2000 ms followed the response. During this time, the fixation target reverted to black, but with the inner circle filled to indicate the allowance of blinking. Presentation of the bull's-eye fixation target marked the beginning of the next trial. Condition B followed the same procedure except that the signals indicating the stimulus was about to be presented (i.e. audible tone and change in fixation colour from black to grey) were under computer control and occurred at some pseudo-random time after the beginning of each trial. The length of this pre-stimulus period in condition B was selected pseudo-randomly from the distribution of latencies recorded in the previous block of condition A. This was done to ensure that the distribution of pre-stimulus times was the same for each experimental condition.
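The yoking of condition B to condition A can be sketched as follows. This is a minimal illustration, assuming that the pre-stimulus delays for a condition B block are a pseudo-random reordering of the latencies recorded in the preceding condition A block (so the two distributions match exactly), and that the 100 ms interval is measured from cue onset. The function names and constants are hypothetical.

```python
import random

TONE_MS = 50              # audible tone duration, as stated in the procedure
CUE_TO_STIMULUS_MS = 100  # cue-to-stimulus interval (assumed measured from cue onset)
STIMULUS_MS = 17          # one frame at 60 Hz
ITI_MS = 2000             # inter-trial interval

def condition_b_delays(previous_a_latencies_ms, seed=None):
    """Reorder the previous condition A latencies pseudo-randomly.

    Sampling without replacement is an assumption; it keeps the distribution
    of pre-stimulus times identical across the two conditions.
    """
    rng = random.Random(seed)
    delays = list(previous_a_latencies_ms)
    rng.shuffle(delays)
    return delays

def stimulus_onset_ms(pre_stimulus_delay_ms):
    """Time from trial start to stimulus onset for a given pre-stimulus delay."""
    return pre_stimulus_delay_ms + CUE_TO_STIMULUS_MS
```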
Each observer participated in one practice block of 60 trials and 32 experimental blocks of 30 trials each, conducted over four consecutive days and lasting approximately 1.5 h d−1. Experimental conditions were completed alternately: on the first and third days, nine blocks of trials were completed, beginning with condition A; on the second and fourth days, seven blocks of trials were completed, beginning with condition B. In each trial block, the contrast of the target stimulus was pseudo-randomly chosen from 10 predefined contrasts (range 5–70%), yielding a total of 48 measures for each contrast (three measures per trial block × 16 blocks).
3. Results and discussion
Visual performance for discriminating orientation was significantly affected by experimental condition: performance was greatest for condition A, where participants self-monitored their attentional status, responding when they perceived it to be maximal (two-way repeated-measures ANOVA, F(1,120) = 11.934, p < 0.001). Enhanced performance was evident over intermediate and high stimulus contrasts, manifest as a horizontal separation between the group-averaged (n = 7) psychometric data for conditions A and B (figure 2b).
To assess further which factors were modulated by experimental condition, the data were modelled using both a two-parameter cumulative Gaussian distribution function (model 1; Crozier 1950) and a four-parameter sigmoidal function (model 2; Albrecht & Hamilton 1982). Based on Akaike's Information Criterion (AIC) (Akaike 1975), model 1 achieved the highest ranking and is presented here. Details of both models and the AIC calculation are given in appendix A. Figure 2b shows the maximum-likelihood fit of a cumulative distribution to each group-averaged dataset, where threshold (75% correct) corresponds to the mean (μ) and slope corresponds to the reciprocal of the standard deviation (σ) of the Gaussian function (condition A: μ = 0.237, σ = 0.118; condition B: μ = 0.361, σ = 0.221). Both parameters were significantly affected by experimental condition (condition A: μ = 0.241, σ = 0.122; condition B: μ = 0.352, σ = 0.190; p < 0.05, paired t-test, n = 7), indicating that condition A resulted in enhanced orientation discrimination performance with reduced response variability.
The elapsed time required for a participant to judge that their attentional focus was maximal varied from trial to trial (group-average autocorrelation with a lag of 1 was 0.144; see figure 3 for an example), and ranged from 860 ms to 12.84 s (median 3.45 s). Because these latencies are far longer than the inevitable delays associated with information transmission and the computation of motor commands, we assume that they primarily reflect the time taken for neuronal decision units to reach some criterion level. If so, the decision units appear (at first sight) not to follow any simple processing rule because the latency distribution, when plotted as a conventional histogram, was highly skewed towards the right (see figure 4 for group data; figure 5b for individual data).
However, similar non-normal distributions for reaction time (RT) measures of visual performance (Carpenter 1988) have been shown to obey a well-defined stochastic law: the reciprocal of RT is normally distributed, termed a recinormal distribution (Reddi et al. 2003). Here, we show that the latency distribution for judging attentional focus behaves in the same way: plotted on a reciprocal latency scale, the distribution is well modelled by a Gaussian function (figure 5c, left-hand panels). Confirmation of recinormality is further demonstrated by the straight-line fit obtained when plotting the cumulative latency distribution on a probit ordinate scale with latency on a reciprocal abscissa (figure 5c, right-hand panels). In our study, the reciprocal latency distribution for each participant conformed to a normal distribution (Kolmogorov–Smirnov one-sample test, p > 0.05).
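The recinormality check described above can be sketched in a few lines. This is an illustrative sketch rather than the analysis code used in the study: `reciprobit_points` returns the coordinates for a reciprobit plot (reciprocal latency against the probit of cumulative probability), and `ks_statistic_normal` returns the Kolmogorov–Smirnov distance between the reciprocal latencies and a Gaussian with the sample mean and standard deviation. Both function names and the plotting-position formula are assumptions, and no p-value is computed.

```python
from statistics import NormalDist, mean, stdev

def reciprobit_points(latencies_s):
    """Return (1/latency, probit of cumulative probability) pairs.

    For recinormal data these points fall close to a straight line.
    """
    ordered = sorted(latencies_s)
    n = len(ordered)
    std_normal = NormalDist()
    points = []
    for rank, t in enumerate(ordered, start=1):
        p = (rank - 0.5) / n  # plotting position; avoids p = 0 and p = 1
        points.append((1.0 / t, std_normal.inv_cdf(p)))
    return points

def ks_statistic_normal(values):
    """Largest gap between the empirical CDF and a fitted Gaussian CDF."""
    model = NormalDist(mean(values), stdev(values))
    ordered = sorted(values)
    n = len(ordered)
    d = 0.0
    for i, v in enumerate(ordered):
        cdf = model.cdf(v)
        d = max(d, abs(cdf - i / n), abs((i + 1) / n - cdf))
    return d
```

In use, `ks_statistic_normal([1.0 / t for t in latencies])` would be applied to each participant's latencies; small values indicate that the reciprocal latencies are close to Gaussian.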
A simple, yet broadly applicable, explanation for the recinormality of our latency measures is provided by a class of models known as linear rise-to-threshold models (Nakahara et al. 2006). One such model, termed the Linear Approach to Threshold with Ergodic Rate (LATER) model, has been successfully used to account for RT measures of visual performance (Carpenter & Williams 1995). Grounded in Bayes' Theorem, the LATER model not only provides an empirical description of motor responses but also defines an ideal mechanism for deciding between competing sources of information under ‘noisy’ or uncertain conditions (Nakahara et al. 2006). Based on this model, we propose that the output of an attentionally modulated decision mechanism (D) rises linearly from an initial level (D0) until it reaches threshold (DT), prompting the participant to initiate a trial (figure 5a). The recinormality of the latency measures is accounted for by assuming that the rate of rise (r) varies randomly from trial to trial in a Gaussian fashion (since latency (T) is proportional to (DT−D0)/r, the reciprocal latency 1/T is proportional to r and is therefore itself Gaussian), perhaps consequent upon the decision mechanism processing noisy, competing signals derived from both internal (attentional mechanisms) and external sources (visuo-motor processes).
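The rise-to-threshold account can be illustrated with a short simulation. The sketch below draws a Gaussian rate of rise on each trial and converts it to a latency via T = (DT − D0)/r. The parameter values are hypothetical, chosen only so that latencies fall in the range of a few seconds; they are not fitted to the data reported here. Redrawing non-positive rates is also an assumption, made so that every simulated trial reaches threshold.

```python
import random

def simulate_later(n_trials, d0=0.0, d_threshold=1.0,
                   rate_mean=0.3, rate_sd=0.08, seed=0):
    """Simulate latencies (s) from a linear rise-to-threshold process.

    On each trial the decision signal rises from d0 at a constant rate r,
    with r drawn from a Gaussian, and the latency is the time at which it
    reaches d_threshold.
    """
    rng = random.Random(seed)
    latencies = []
    while len(latencies) < n_trials:
        r = rng.gauss(rate_mean, rate_sd)
        if r <= 0:
            continue  # signal would never reach threshold; redraw (assumption)
        latencies.append((d_threshold - d0) / r)
    return latencies
```

A histogram of the simulated latencies is skewed to the right, whereas their reciprocals recover the Gaussian distribution of the rate, which is the signature of recinormality.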
We conclude that individuals are able to self-monitor their attentional status for the purpose of enhancing behavioural performance on a visual task, and that the neural basis of this decision process is well modelled by a linear rise-to-threshold model.
All participants gave written consent, and the study was approved by the ATR Human Subject Review Committee.
This work was supported by the JST PRESTO programme. The authors would like to thank Matthew de Brecht for programming, and Yuka Furukawa for her support with behavioural experiments.
APPENDIX A: DATA ANALYSIS
(a) Model 1
Each psychometric dataset was fitted, using a maximum-likelihood estimation procedure, with a cumulative Gaussian distribution function, assuming that percentage correct varies from 50 (chance) to 100 per cent. Response performance was defined as:

$$P(C) = 50 + 25\left[1 + \operatorname{erf}\!\left(\frac{C - \mu}{\sigma\sqrt{2}}\right)\right],$$

where P is percentage correct, C is the contrast of the target stimulus, μ is the mean and σ is the standard deviation of the normal distribution, and erf is the error function, defined as:

$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}}\int_{0}^{x} e^{-t^{2}}\,\mathrm{d}t.$$

With this form, performance is 75 per cent correct when C = μ. This function has a long history in modelling frequency-of-seeing data (Crozier 1950). Averaged across participants, the mean correlation coefficient between the data and predicted values (goodness-of-fit) was 0.934 for condition A and 0.926 for condition B.
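A minimal sketch of a maximum-likelihood fit for model 1 is given below, assuming the standard cumulative Gaussian form in which proportion correct rises from 0.5 (chance) to 1 and equals 0.75 at contrast μ. A coarse grid search stands in for the optimizer, which is not specified in the text; the function names and the binomial likelihood are assumptions made for illustration.

```python
import math

def p_correct(contrast, mu, sigma):
    """Proportion correct: 0.5 at chance, 0.75 at contrast == mu, 1.0 at ceiling."""
    z = (contrast - mu) / (sigma * math.sqrt(2.0))
    return 0.5 + 0.25 * (1.0 + math.erf(z))

def log_likelihood(params, contrasts, n_correct, n_trials):
    """Binomial log-likelihood (up to a constant) of the observed counts."""
    mu, sigma = params
    total = 0.0
    for c, k, n in zip(contrasts, n_correct, n_trials):
        # Clip to keep the logarithms finite at the extremes.
        p = min(max(p_correct(c, mu, sigma), 1e-9), 1.0 - 1e-9)
        total += k * math.log(p) + (n - k) * math.log(1.0 - p)
    return total

def fit_grid(contrasts, n_correct, n_trials, mus, sigmas):
    """Return the (mu, sigma) pair on the grid with the highest likelihood."""
    candidates = ((m, s) for m in mus for s in sigmas)
    return max(candidates,
               key=lambda ps: log_likelihood(ps, contrasts, n_correct, n_trials))
```

In practice a gradient-free optimizer seeded from the best grid point would refine the estimate; the grid alone is enough to show how threshold (μ) and slope (1/σ) are recovered from counts of correct responses at each contrast.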
(b) Model 2
For each dataset, the contrast yielding half-saturation responses (C50; threshold) was determined by fitting a sigmoidal function (hyperbolic ratio function):

$$\text{response} = R_{\max}\,\frac{C^{n}}{C^{n} + C_{50}^{n}} + M,$$

where response represents performance, C is contrast, n is the slope of the function, Rmax is the level at which the response saturates (asymptote) and M is the response at the lowest contrast. This function has been widely used to fit contrast response functions in both physiological (Naka & Rushton 1966; Sclar et al. 1990; Martinez-Trujillo & Treue 2002) and human psychophysical studies (Ross & Speed 1991; Ling & Carrasco 2006). Best-fit solutions were determined using the maximum-likelihood estimation procedure. Averaged across participants, the mean correlation coefficient between the data and predicted values (goodness-of-fit) was 0.962 for condition A and 0.959 for condition B.
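For reference, the hyperbolic ratio function can be written as a short routine. This sketch assumes the usual form with an additive baseline, in which the response equals M at zero contrast and M + Rmax/2 at C = C50; the function and argument names are illustrative.

```python
def hyperbolic_ratio(contrast, r_max, c50, n, m):
    """Hyperbolic ratio (Naka-Rushton type) contrast response function.

    r_max : saturating response above baseline
    c50   : contrast giving half of r_max (threshold)
    n     : exponent controlling the slope
    m     : baseline response at the lowest contrast
    """
    cn = contrast ** n
    return r_max * cn / (cn + c50 ** n) + m
```

Lowering `c50` while holding the other parameters fixed shifts the curve leftwards along the contrast axis without changing its asymptote or slope.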
Of the four parameters, only the half-saturation response was significantly affected (C50: 0.193 for condition A, 0.252 for condition B; figure 6a). The data from individual observers were consistent with the group-averaged data. For each participant, the value determined for each parameter using dataset A was plotted against that determined using dataset B. The results are shown as scatter plots in figure 6b, in which the diagonal lines of unit slope indicate no difference between experimental conditions. Note that the derived values for dynamic range (Rmax, M) and slope (n) were equally distributed about the diagonal, whereas the half-saturation responses (C50) for each participant fell below the diagonal. To quantify the difference between response function parameters in the two experimental conditions, a modulation index MI = (PA−PB)/(PA + PB) was calculated, where PA and PB refer to the parameters for condition A and condition B, respectively (Martinez-Trujillo & Treue 2002). The distribution of indices deviated significantly from zero for C50 only (p = 0.002, two-tailed t-test, d.f. = 6), indicating a decrease in this parameter in experimental condition A.
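The modulation index and the test of its deviation from zero can be sketched as follows. The one-sample t statistic shown here is the textbook form and is an assumption about how the test was computed; the helper names are illustrative.

```python
import math
from statistics import mean, stdev

def modulation_index(p_a, p_b):
    """MI = (PA - PB) / (PA + PB); negative when the parameter is smaller in A."""
    return (p_a - p_b) / (p_a + p_b)

def one_sample_t(values, mu0=0.0):
    """Return (t statistic, degrees of freedom) for a test against mu0."""
    n = len(values)
    t = (mean(values) - mu0) / (stdev(values) / math.sqrt(n))
    return t, n - 1
```

Applied to the group-level C50 values quoted above (0.193 and 0.252), `modulation_index` gives approximately −0.13, the negative sign reflecting the lower half-saturation value in condition A. With seven participants, `one_sample_t` applied to the individual indices has six degrees of freedom.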
(c) Model selection
AIC (Akaike 1975) provided a measure of the goodness-of-fit of models 1 and 2, and was used for model selection: the preferred model is the one with the lowest AIC value. Here, we used AICc (McQuarrie & Tsai 1998), which is AIC with a second-order correction for small sample sizes (here, 10 contrast levels). AICc depends on the number of parameters in the statistical model (k), the number of samples (n) and the residual sum of squares, $RSS = \sum_{i=1}^{n} (y_i - f(x_i))^{2}$, where y is the data, x is the stimulus value and f is the fitted model function. Averaged across participants, AICc was −3.5413 for model 1 and −2.7644 for model 2. This difference was significant (p < 0.001, paired t-test).
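To make the model comparison concrete, the sketch below computes a small-sample corrected AIC from the residual sum of squares. It uses one widely used least-squares form, AICc = n ln(RSS/n) + 2k + 2k(k + 1)/(n − k − 1); this is an assumption, and the expression used in the study may be normalized differently, so absolute values need not match those reported above. Only the ranking logic (lower is better, with a heavier penalty for the four-parameter model) is illustrated.

```python
import math

def rss(y, predicted):
    """Residual sum of squares between data and model predictions."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, predicted))

def aicc_least_squares(y, predicted, k):
    """Small-sample corrected AIC for a least-squares fit with k parameters.

    Assumed form: n*ln(RSS/n) + 2k + 2k(k+1)/(n-k-1).
    """
    n = len(y)
    if n - k - 1 <= 0:
        raise ValueError("too few samples for the AICc correction")
    aic = n * math.log(rss(y, predicted) / n) + 2 * k
    return aic + 2 * k * (k + 1) / (n - k - 1)
```

With n = 10 contrast levels and equal residuals, moving from k = 2 to k = 4 raises this criterion by about 10.3, so the four-parameter model must fit substantially better to be preferred.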
- Received April 27, 2010.
- Accepted May 21, 2010.
- © 2010 The Royal Society