Understanding Spatial Autocorrelation: History
Subjects: Geography

An enumeration of spatial autocorrelation’s (SA’s) polyvalent forms occurred nearly three decades ago. Attempts to conceive and disseminate a clearer explanation of it employ metaphors seeking to better relate SA to a student’s or spatial scientist’s personal knowledge databank.

  • Moran scatterplot
  • negative spatial autocorrelation
  • positive spatial autocorrelation

1. Introduction

Salvati [1] points out that “evidence from the analysis of scientific databanks and repositories indicates how the geography discipline has a strong potential for growth and [facilitating] the dissemination of complex global problems.” Realizing this potential requires wider awareness and deeper understanding of a fundamental property of all the geospatial data housed in the databanks and repositories he mentions, one that is often glossed over, ignored, or left untaught: spatial autocorrelation (SA), the tendency for (dis)similar attribute values to cluster on a map. As Griffith [2] professes, SA is everywhere! Accordingly, it is an essential ingredient for “develop[ing] and offer[ing] new strategies, visions and proposals on the role of sustainability and resilience related to urban and rural contexts” [1]; for example, it partially constitutes the spatial statistical theory underlying the tessellated stratified random sampling necessary for economically and efficiently monitoring and “studying [the] degree of resilience and future (sustainable) development [of large territories]” [1]. Not only is SA a fundamental property of georeferenced data, but it is also a fundamental geographic concept (e.g., Tobler’s [3] First Law of Geography; see https://www.researchgate.net/publication/276917830_Concepts_and_Principles_for_Spatial_Literacy (accessed on 24 August 2023)). Its history dates back to an informal, tacit, non-verbal awareness of the concept by, for example, Spilsbury in 1767 [4], who invented the jigsaw puzzle to teach geography, and Brandes in 1816 [5], who invented the isobar map to visualize general west-to-east movements of low pressure across Europe.
Nearly a century later, SA received formal recognition as a concept from Student [6]. There followed a quarter century of acknowledgements of it as a source of correlated data [7,8,9] and of its impacts on agricultural experimental designs [10,11], then its quantification by Moran [12] and Geary [13], its popularization by Cliff and Ord [14] as well as Journel and Huijbregts [15], and its promotion as part of standard spatial statistical/econometric practice by Paelinck and Klaassen [16], Anselin [17], Cressie [18], and Haining [19], among others.
The concept of SA may be more meticulously defined as follows:
Given a tertile-classified set of attribute values [i.e., relatively high (H), intermediate (M), and relatively low (L) magnitude groups] coupled with a posited definition of geographic neighbors (e.g., nearby points, adjacent line segments, and/or juxtaposed polygons sharing a common boundary of non-zero length, which is the rook definition, named for its resemblance to the moves of that chess piece), SA is the tendency for pairs of H, of M, and/or of L values to be neighbors under this definition (positive SA), or the tendency for contrasting high-low (H-L) or low-high (L-H) value pairings, as well as pairs of M values, to be neighbors under it (negative SA).
This tertile definition builds upon Anselin’s [20] local SA index conceptualization, which translates points in a Moran scatterplot into neighboring pairings denoted by high-high (H-H), low-low (L-L), H-L, and L-H; insignificant areal units constitute the M values. SA has other correlated data parallels, including those involving matched pairs, time series, space-time series, and network series [21].
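The definition above can be made concrete with a small sketch. It assumes a regular 4-by-4 lattice, binary rook weights, and two made-up maps (one clustered, one checkerboard); the function names are illustrative and do not come from the cited works. The quadrant labels follow the Moran scatterplot idea of comparing each value with the average of its neighbors, but no significance test is applied here, so the M (insignificant) group does not appear.

```python
from statistics import mean

def rook_neighbors(n_rows, n_cols):
    """Map each cell index to the cells sharing an edge with it (rook adjacency)."""
    nbrs = {}
    for r in range(n_rows):
        for c in range(n_cols):
            nbrs[r * n_cols + c] = [
                rr * n_cols + cc
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < n_rows and 0 <= cc < n_cols
            ]
    return nbrs

def morans_i(values, nbrs):
    """Moran's I with binary weights: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    n = len(values)
    m = mean(values)
    z = [v - m for v in values]
    s0 = sum(len(js) for js in nbrs.values())  # sum of all (directed) weights
    cross = sum(z[i] * z[j] for i, js in nbrs.items() for j in js)
    return (n / s0) * cross / sum(zi * zi for zi in z)

def scatterplot_quadrants(values, nbrs):
    """Label each cell by its own deviation and its neighbors' mean deviation."""
    m = mean(values)
    labels = []
    for i, js in nbrs.items():
        own = values[i] - m
        lag = mean(values[j] for j in js) - m
        labels.append(("H" if own >= 0 else "L") + "-" + ("H" if lag >= 0 else "L"))
    return labels

nbrs = rook_neighbors(4, 4)
# Hypothetical maps: left half high versus strictly alternating values.
clustered = [1 if c < 2 else 0 for r in range(4) for c in range(4)]
checkerboard = [(r + c) % 2 for r in range(4) for c in range(4)]

for name, values in (("clustered", clustered), ("checkerboard", checkerboard)):
    print(name, round(morans_i(values, nbrs), 3),
          sorted(set(scatterplot_quadrants(values, nbrs))))
```

For these two maps the clustered pattern gives a positive index (about 0.667) with only H-H and L-L pairings, while the checkerboard gives an index of -1 with only H-L and L-H pairings, matching the positive and negative SA cases in the definition.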
As SA was catapulted to the forefront of the quantitative spatial sciences, many students, of quantitative geography in particular, found understanding it and its consequences a challenge, spawning a set of earlier publications devoted to explicating it [22,23] (Griffith also published a monograph with this title in 1987). Contemporary literature, including Getis [24], Goodchild [25], Griffith [26], Haining [27], Legendre [28], and McMillen [29], contains a number of standalone explanatory treatments of SA. Today, the body of literature dedicated to SA is sizeable (Figure 1; for an updated version, see [30]).
Figure 1. Web of Science (2012–2018) SA keyword cloud infographics (arbitrary group coloring visually differentiates among perceived SA research communities; node size reflects weighted normalized citation counts, which tend to highlight leading community scholars); compilation and portrayals by Drs. Kai Hu (Jiangnan University) and Qing Luo (Wuhan Institute of Technology). Left (a): authors. Right (b): concepts.

2. SA: An Important Geospatial Synoptic Statistic

Elementary descriptive statistics are important for quantitative analyses because they condense a numerical dataset’s information content into a few informative summary values about those data. Two vital descriptors are the mean and the variance because they respectively reveal a typical value and the spread of a dataset, even if the mean is a function of other variables when treated in a multivariate context. SA becomes a third crucial descriptor for georeferenced data—Goodchild [25] describes it as being endemic—because, in part, it exposes the presence of inflation in the variance, and, in part, because it represents redundant information that supplies the “essential economies that allow complex surfaces to be represented in manageable volumes” [25].
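One standard way to read the "redundant information" and variance inflation remarks is through the variance of a sample mean computed from correlated observations. The sketch below is only an illustration under strong assumptions: four observations with equal variance and a single hypothetical correlation of 0.5 between every pair, whereas real spatial correlation would typically weaken with distance. The function names are illustrative.

```python
def variance_of_mean(sigma2, corr):
    """Variance of the sample mean for equal-variance observations
    whose pairwise correlations are given by the matrix corr."""
    n = len(corr)
    return sigma2 * sum(sum(row) for row in corr) / (n * n)

def effective_sample_size(corr):
    """Number of independent observations yielding the same variance of the mean."""
    n = len(corr)
    return n * n / sum(sum(row) for row in corr)

n, rho = 4, 0.5  # hypothetical: four observations, every pair correlated at 0.5
independent = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
correlated = [[1.0 if i == j else rho for j in range(n)] for i in range(n)]

print(variance_of_mean(1.0, independent))  # 0.25, the usual sigma^2 / n
print(variance_of_mean(1.0, correlated))   # 0.625, inflated by positive correlation
print(effective_sample_size(correlated))   # 1.6 independent-equivalent observations
```

Under these assumptions, positive correlation makes four observations carry roughly the information of 1.6 independent ones, which is one sense in which correlated georeferenced data are partly redundant.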
Couching this SA notion within a more technical statistical context, Legendre [28] emphasizes the commonly cited way in which SA undermines the independent observations assumption of standard statistical analysis, mentioning that SA most often materializes in a geographic distribution as patches or gradients. Haining [27] highlights this non-independence feature of SA as being instrumental to geography’s contribution to spatial statistics, commenting that SA relates to both the scale and the resolution of geographic data. Cliff and Ord [14] acknowledge that mis-specified regression models can create spurious residual SA, a theme discussed in detail by McMillen [29] and, in terms of omitted variables, by Griffith and Chun [34]; such misspecification can also introduce omitted variable bias, especially in the presence of disregarded negative SA [33]. In these two latter multivariate contexts, a response variable’s mean varies, rather than being a constant (e.g., only an intercept term); SA contained in a response variable is a function of either the SA latent in related covariates or the spatial lag terms appearing in spatial autoregressive model specifications (e.g., conditional autoregressive (CAR), simultaneous autoregressive (SAR), and autoregressive response (AR) versions being the most popular) that attempt to stand in for missing variable effects. Meanwhile, Goodchild [25] echoes the sentiment of the preceding paragraph, noting that SA is “… a monotonically decreasing function of distance [and hence] a fortunate characteristic of a wide range of spatially distributed phenomena.”

3. SA and Geographic Scale/Resolution

Legendre [28] addresses the geographic scale (i.e., geographic landscape size, relating to increasing domain sampling designs) issue, arguing that SA-related global patterns, which materialize across a geographic landscape as gradients, arise from spatial (e.g., distance decay) processes or from wide-ranging underlying common factors that elicit the formation of comparable outcomes in different regions and locations. Likewise, SA-related local patterns, which appear landscape-wide as disjoint patches separated by interstices, arise from processes that elicit the formation of numerous geographically small concentrations of outcomes at dispersed locations. Geographic scale provides the perspective that casts a clustering of similar values as either a gradient or patchiness. Pawley and McArdle [35] pair this scale issue with a recognition that the target of inference helps determine whether SA presents data analysis complications or an opportunity to achieve additional effectiveness and/or robustness.
The geographic resolution (i.e., size of an areal unit polygon, relating to infill spatial sampling designs) issue involves some sort of data averaging within polygons: as polygons increase in size, more geographic averaging occurs, with an accuracy that is highly correlated with any latent degree of positive SA. This averaging implies that, in practice, SA measurements should change as resolution becomes coarser. Employing regular square quadrats, Chou [36] finds that SA measures increase in magnitude, at a logarithmic rate, as resolution becomes finer; Zhang et al. [37] essentially corroborate this finding. Rodrigues and Tenedorio [38] report that the shape of irregular areal unit polygons also impacts SA measures: aggregating such nonuniform shapes of varying size does not necessarily render strictly decreasing values with increasing coarseness. Di et al. [39] also detect an inverse relationship between resolution and SA measurement, while uncovering a tendency for SA quantifications to decrease in magnitude when irregular areal unit polygons replace regular square ones. Describing this situation as the resolution sensitivity of SA, Mohan et al. [40] show that the aforementioned negative relationship is not necessarily a monotonically decreasing function, a finding similar to that of Rodrigues and Tenedorio [38], and devise a resolution correlogram tool based upon popular SA indices to adjust for this sensitivity.
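The averaging within polygons described above can be imitated on a regular lattice. The following is a minimal sketch, not a reproduction of any cited study: it assumes a made-up 8-by-8 surface with a simple linear gradient, binary rook weights, and coarsening by averaging non-overlapping 2-by-2 blocks. The function names are illustrative.

```python
def morans_i_grid(grid):
    """Moran's I for a rectangular grid using binary rook (edge-sharing) weights."""
    n_rows, n_cols = len(grid), len(grid[0])
    n = n_rows * n_cols
    m = sum(map(sum, grid)) / n
    z = [[v - m for v in row] for row in grid]
    cross = 0.0  # sum of z_i * z_j over undirected neighbor pairs
    pairs = 0    # number of undirected neighbor pairs
    for r in range(n_rows):
        for c in range(n_cols):
            if r + 1 < n_rows:  # neighbor below
                cross += z[r][c] * z[r + 1][c]
                pairs += 1
            if c + 1 < n_cols:  # neighbor to the right
                cross += z[r][c] * z[r][c + 1]
                pairs += 1
    denom = sum(v * v for row in z for v in row)
    # Counting each pair once in both cross and pairs leaves the ratio unchanged.
    return (n / pairs) * cross / denom

def coarsen(grid, k=2):
    """Average non-overlapping k-by-k blocks (dimensions must be divisible by k)."""
    return [
        [
            sum(grid[r + dr][c + dc] for dr in range(k) for dc in range(k)) / (k * k)
            for c in range(0, len(grid[0]), k)
        ]
        for r in range(0, len(grid), k)
    ]

fine = [[r + c for c in range(8)] for r in range(8)]  # hypothetical gradient surface
medium = coarsen(fine)    # 4-by-4
coarse = coarsen(medium)  # 2-by-2

for label, g in (("8x8", fine), ("4x4", medium), ("2x2", coarse)):
    print(label, round(morans_i_grid(g), 3))
```

For this particular gradient the index falls from about 0.857 to about 0.667 and then to 0 as the lattice coarsens, in line with the general tendency reported for regular square quadrats. Other surfaces need not behave this way, which is consistent with the non-monotonic results noted for irregular units.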
The principal implication here for the metaphor explicated is that the sizes, shapes, and numbers of jigsaw puzzle pieces [41] affect the interface between a puzzle and SA addressed in the ensuing discussion. This implication also alludes to the issue of geographic scale and resolution. If a puzzle’s size is held constant, then increasing its number of pieces (all of which frequently are alike in total area) is equivalent to changing its geographic resolution. As geographic resolution increases, visual clues from puzzle pieces become more obscure; as geographic resolution decreases, clues from border buffer areas become more informative. Although artwork, piece size/shape, and color range can contribute to the degree of difficulty of solving a given puzzle, its number of pieces tends to be most strongly and directly correlated with its degree of difficulty. As noted in the preceding paragraph, SA exhibits a similar type of tendency: it tends to increase in magnitude as resolution becomes finer, at a logarithmic rate.

This entry is adapted from the peer-reviewed paper 10.3390/geographies3030028
