Sources of Active Travel Data: Comparison
Please note this is a comparison between Version 1 by Mohammad Alattar and Version 2 by Amina Yu.

Active travel (AT), namely journeys that have been undertaken either entirely or partially using human-powered transportation modes such as walking, cycling, or using a wheelchair, has been the focus of much attention due to its potential for remedying negative impacts of urbanization. Among other benefits, AT helps to meet required physical activity guidelines and reduces traffic congestion and pollution. Furthermore, AT induces the uptake of emerging micromobility, a term that describes the use of electrically assisted lightweight vehicles such as e-bikes, e-scooters, e-skateboards, and hoverboards.

  • active travel
  • emerging data sources
  • crowdsourced data
  • cycling
  • Strava
  • public participation geographic information system (PPGIS)

1. Introduction

Among other benefits, AT helps to meet required physical activity guidelines and reduces traffic congestion and pollution [1][2][1,2]. Furthermore, AT induces the uptake of emerging micromobility, a term that describes the use of electrically assisted lightweight vehicles such as e-bikes, e-scooters, e-skateboards, and hoverboards. Micromobility transport modes are less physically taxing with a shorter travel duration, reducing the reliance on conventional vehicles, particularly for short journeys [3][4][3,4]. However, the emphasis of transport planning in most cities is still car-dominant, with policies such as minimum car parking requirements and gas subsidies aiming to reduce car delays across many urban transport networks [5][6].

In many cases AT can substitute for a large portion of journeys undertaken by motorized transport. Thus, to change this car-dominant transport paradigm, fostering AT requires well-informed policies and interventions, which have often been stalled by inadequate data available from traditional data sources. Manual data collection methods (e.g., using clickers to obtain AT user volumes) are laborious [6][10], with much data underreported [7][11]. Nevertheless, traditional data is useful for the validation of emerging data sources.

The advent and ubiquity of information and communication technology, including smartphones and wearable devices, has allowed for emerging AT data (hereinafter emerging data) ventures. Such data exhibit the three characteristics commonly associated with big data: volume (very large), variety (highly complex) and velocity (high growth rate), making them unmanageable through traditional methods [8][13]. Compared to traditional AT data, emerging data is much more voluminous, less obstructive, and relatively cheaper to collect. For example, ridership datasets from Strava (https://www.strava.com/, accessed on 11 June 2021), a social fitness network (SFN) where users can track, share and monitor their physical activities such as cycling and running (more about this type of data source is provided in Section 3.1), have a fine spatiotemporal resolution, whereas traditional counts are limited both spatially and temporally [9][14]. Likewise, safety incidents are underreported in traditional reports (e.g., police and insurance) compared to BikeMaps (https://bikemaps.org/, accessed on 11 June 2021), an online platform where users can voluntarily report concerns about cycling safety, including incidents such as collisions, near misses and hazards (more about this type of data source is provided in Section 3.3).

In an attempt to review the emerging data and address their potential and limitations, this work surveys the current emerging AT data ecosystem, and builds and expands on previous reviews [10][11][15,16] to include additional AT modes and new data sources. This literature review was conducted employing Google Scholar using the following terms: active travel, active transportation, cycling, cycle, bicycle, pedestrian, bike sharing systems (BSSs), big data, data collection, Strava, crowdsourced, emerging data, and traditional data. This paper aims to assess the state of knowledge on emerging data. Section 2 introduces potential outcomes of AT, while Section 3.1 provides a brief review of traditional AT data sources and Section 3.2 focuses on emerging data.

2. Active Travel Outcomes

Urbanization has transformed cities to obesogenic environments, a term that has been coined to describe environments that induce obesity through promoting a sedentary lifestyle and encouraging an excess calorie intake [12][17]. This has resulted in the first generation to have a shorter life expectancy than their parents [13][18].

Flint et al. [14][19] found that in the UK, AT users had a significantly lower body mass index compared to users of motorized travel modes, suggesting that they are less likely to be obese or suffer from related health conditions. [15][20] concluded that the duration of walking to work is positively associated with the reduction of hypertension. In Bogota, Colombia, cycling to school has been linked to an improved physical fitness profile compared to motorized transportation [16][21].

During the COVID-19 pandemic, AT provided transportation that adhered to the social distancing guidelines implemented to reduce the spread of the outbreak. Similarly, in Scotland BSSs were demonstrated as an alternative to public transportation [17][23]. Sydney exhibited a surge in the willingness to cycle due to hygiene reasons and recreational exposure [18][24]. In addition, delivery services increased their reliance on cycling during this period [19][25].

The physical activity gained from AT has been reported to improve mental wellbeing. Physically active individuals in the UK reported fewer anxiety-related symptoms and less emotional distress [20][26]. Similarly, in Alameda County, California, Camacho et al. [21][27] demonstrated that inactive individuals are more likely to develop clinical depression.

However, Gelb and Apparicio [22][28] observed that in Paris, France, close proximity to traffic can impose wellbeing threats to AT users via noise and air pollution. In addition to more serious threats resulting from traffic injuries [23][29], Stelling-Konczak et al. [24][30] explored the impacts of cyclists’ auditory perception (mobile conversation, music and electric cars’ quietness) on their safety. The compensatory behavior (i.e., reducing speed, looking around more frequently) of cyclists was found to counterbalance the risk arising from losing auditory cues such as tire and engine noises.

On a global scale, motorized vehicles are the second largest source of carbon emissions [25][32]. They also contribute to heat emission, ultimately magnifying the urban heat island effect. Since AT modes do not require fuel and produce substantially less heat, shifting from motorized transportation to AT will reduce climate change severity and the urban heat island effect [26][27][35,36]. Moreover, cities are designed to accommodate motorized transportation through highways, parking and tunnels.

Motorized transportation is associated with numerous overhead costs, such as fuel, insurance, and maintenance, whereas AT is much cheaper (or even free) [28][38]. Rauterkus & Miller [29][39] demonstrated a significant correlation between property values and walk scores when measuring walkability using sample properties from Jefferson County, Alabama, USA. Li & Joh [30][41] determined that bike scores exhibited a positive correlation with transit accessibility and property values in Austin, Texas, USA. These results collectively suggest that AT infrastructure investment can yield higher property values.

The New York City Department of Transportation [31][42] reported that retail sales increased by 49% when protected bicycle lanes were installed on 8th and 9th Avenues, compared to a 3% increase borough-wide, in Manhattan, New York. The Oakland Department of Transportation [32][43] also reported economic growth of 9% in retail sales after improvement via the Telegraph Avenue project in Oakland, California between 20th and 29th Streets. In particular, the project introduced eight high-visibility pedestrian crosswalks and bike lanes that stretched for nine blocks with parking protection to prevent vehicles from parking. This finding has been confirmed elsewhere, where sales and footfalls are proportional to AT users, for example in Toronto, Canada [33][44] and Auckland, New Zealand [34][45].

Additionally, enabling adequate societal participation, known as social inclusion, can be promoted through AT as it provides equitable accessibility and availability (transport equity). Transport equity takes two forms: (i) horizontal equity, where fairness is established between individuals who are in the same class of wealth and ability; and (ii) vertical equity, where fairness is established between different income and social classes [35][47]. Another desirable societal outcome is social interaction, where people engage in mutual leisure activities such as walking or cycling [36][48]. This in turn can enhance community livability and add a sense of social cohesion [37][49].

Indirectly, the social benefit of cycling and walking in the European Union has been estimated as €0.18 and €0.37 per kilometer using these modes, respectively. For automobiles, however, a social cost of €0.11 per kilometer has been estimated. This cost-benefit analysis includes numerous parameters such as environmental impact (cost of climate change impact; air, water and ground pollution; noise and space required for infrastructure), travel time and vehicle operation (cost of ownership and operation of a particular transport mode; travel time; roadway congestion imposed on other users), and other factors such as healthcare system savings, perceived safety and discomfort, and quality of life [38][50].
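The per-kilometer figures above can be turned into a rough per-trip estimate. The following is a minimal sketch, assuming that the quoted values scale linearly with distance and that the gain from a mode shift is the difference between the two modes' values; the function names and the 5 km trip length are hypothetical and do not come from the cited study.

```python
# Per-kilometer social values quoted in the text (EUR per km).
# Positive values are benefits to society, negative values are costs.
SOCIAL_VALUE_EUR_PER_KM = {
    "cycling": 0.18,
    "walking": 0.37,
    "car": -0.11,
}


def social_value(mode: str, distance_km: float) -> float:
    """Estimated social value (EUR) of one trip, assuming linear scaling."""
    return SOCIAL_VALUE_EUR_PER_KM[mode] * distance_km


def mode_shift_gain(from_mode: str, to_mode: str, distance_km: float) -> float:
    """Change in social value (EUR) when a trip moves from one mode to another."""
    return social_value(to_mode, distance_km) - social_value(from_mode, distance_km)


if __name__ == "__main__":
    trip_km = 5.0  # hypothetical trip length
    gain = mode_shift_gain("car", "cycling", trip_km)
    print(f"Shifting a {trip_km:.0f} km car trip to cycling: EUR {gain:.2f}")
```

Under these assumptions, replacing a 5 km car trip with cycling yields roughly €1.45 in social value (€0.90 of benefit gained plus €0.55 of cost avoided).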

3. Active Travel Data Sources

Traditional methods to collect AT data comprise manual and automated approaches. Manual methods require a low level of technology sophistication and are labor intensive, meaning more user input is required [39][51]. The primary advantage of such methods is that they can collect additional information on AT users, such as helmet usage, gender, travel direction, and mobile phone usage, and can differentiate between AT user types (e.g., cyclists, pedestrians, and skaters). These methods can also be used as ground truth counts to validate other methods [40][52].

Automated methods involve more advanced technology compared to manual methods. These methods replace human data collectors, and therefore require less or no user input and can be implemented for lengthier periods of time, irrespective of inclement weather conditions [41][12]. However, unlike manual methods, automatic methods cannot provide additional information on AT users. Table 1 provides a summary of traditional methods.

Table 1. Summary of traditional methods for generating traditional AT data.
Method | Description
Manual Methods
Video recording | A standard video camera mounted and directed (temporarily or permanently) in the path of AT users (sidewalks or multi-use trails). The footage is manually examined by the data collector using paper sheets, a handheld counter, or computer software [42].
Travel survey | Travel surveys ask subjects to describe their travel activities or any further information. Data collection methods are based on a range of instruments, such as GPS devices, interviews, and conventional web-based questionnaires [41].
Handheld counter | The use of handheld counters (also known as clickers or tally counters) to count AT users. The data collector can count up to 4,000 AT users per hour [43].
Ride-along observations | The observer collects data from participants during their trips. For instance, the data collector cycles with a study subject to perform a survey or an interview [44].
Automated Methods
Pneumatic tubes | Two rubber tubes are stretched across roadways or pathways, perpendicularly attached to the pavement surface. When a bicycle or wheelchair passes over the tubes, a pulse of air is generated, triggering an electrical contact that registers a count. The distance between the two tubes is programmed to determine the speed. This sensor is highly consumable, with a lifetime ranging from days to months [40][45].
Infrared sensors | Sensors utilize invisible light to detect AT users. There are two main types of sensors: active and passive. Active infrared instruments count AT users when the beam between the transmitter and the receiver is broken. Passive infrared sensors identify temperature variations as AT users move through the detection zone of the sensor. Note that surface temperatures can affect the accuracy of the sensor [40][45].
Magnetometers | Magnetometers detect changes in magnetic fields in the proximity of the sensor created by ferrous metal objects; thus, this sensor is not suitable for non-ferrous objects (e.g., carbon-fiber bicycles, pedestrians). The sensor is battery-powered and can be installed below the cycle path. Data are collected through radio communication [46].
Pressure and acoustic pads | A pressure pad sensor detects changes in weight that occur when AT users step on the detection zone. The sensor is capable of distinguishing between the pressure of cyclists and pedestrians. The acoustic pad sensor is limited to pedestrian counting as it uses ground energy waves caused by feet to detect changes. Both sensors are battery-powered and installed within the ground, making them less prone to vandalism [42][47].
CCTV | CCTV positioned on streets aided by artificial intelligence (AI) is able to generate data counts for pedestrians and cyclists. Cameras take pictures at predefined time intervals, then process those images to count pedestrians and cyclists [48].
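
The pneumatic tube entry in Table 1 notes that the distance between the two tubes is used to determine speed. A minimal sketch of that calculation follows, assuming each tube yields a timestamp when its air pulse fires; the function name, the 0.5 m spacing, and the timestamps are hypothetical and do not describe any specific counter product.

```python
def tube_speed_kmh(spacing_m: float, t_first_s: float, t_second_s: float) -> float:
    """Speed of a wheel crossing two tubes placed spacing_m metres apart.

    t_first_s and t_second_s are the times (in seconds) at which the first
    and second tubes registered an air pulse.
    """
    elapsed_s = t_second_s - t_first_s
    if elapsed_s <= 0:
        raise ValueError("second pulse must occur after the first pulse")
    metres_per_second = spacing_m / elapsed_s
    return metres_per_second * 3.6  # convert m/s to km/h


if __name__ == "__main__":
    # Hypothetical crossing: tubes 0.5 m apart, pulses 0.1 s apart.
    print(f"{tube_speed_kmh(0.5, 0.0, 0.1):.1f} km/h")
```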

Although the aforementioned limitations confine the applications of such data, they are typically used to validate, calibrate and in some cases complement emerging data sources. However, traditional data (specifically counts, which are considered to be reliable) often fail to accurately capture the number of AT users. Bunn [49][62] reported an outlier in Strava bike counter data, resulting from cycling in non-bicycle lanes.

Emerging methods of AT data collection feature high spatial and temporal coverage due to advances in smart devices [50][63]. Data obtained from emerging methods differ from traditional approaches, which are often spatially and temporally restricted, labor-intensive, time consuming, and cumbersome [9][14]. Emerging data can originate from various sources, most of which are considered to be crowdsourced, where a series of users provide data addressing the same topic. The following subsections describe the emerging data sources adopted to generate AT data.

Willberg et al. [44][57] surveyed the relevant research to evaluate traditional methods (counters and observations), BSSs, GPS tracking, SFNs, surveys and interviews, Public Participation Geographic Information Systems (PPGIS), and other sources in terms of their spatial and temporal patterns, demographics, trip purpose, determinants, and barriers.

3.1. Social Fitness Networks

The phrase “social fitness” has its origins in physical exercise, weight loss regimes, and means of motivating individuals to achieve their fitness goals. Likewise, SFNs allow users to track data on their various physical activities (e.g., walking, cycling, swimming, handcycling, skiing, etc.) and share them with online communities [51][64]. Examples of SFNs include Strava, MapMyFitness (https://www.mapmyfitness.com, accessed on 11 June 2021), and Fitbit (https://www.fitbit.com, accessed on 11 June 2021), which in turn distribute the data commercially [41][12].

These data sources can at times overrepresent certain demographic segments such as male, younger, and tech-savvy users [52][74]. In addition, Strava is associated with several privacy issues, such as unintentionally revealing military outpost locations [53][75]. Recent changes in Strava data specifications to maintain user anonymity have consequently resulted in information loss [54][76], and access to the data is also restricted by high acquisition fees [55][77]. Strelnikova [56][78] compared Strava and Endomondo (an SFN that has been retired) in terms of spatial and temporal resolution in South Florida, concluding that although Strava provides more detailed information, Endomondo contains data on small road segments and off-road tracks.

3.2. In-house developed apps

In-house developed apps, also known as regional bicycling tracking apps, offer region-wide cycling data through GPS-oriented travel diaries that provide GPS traces, trip purpose and demographic information. These apps are generally developed by or for public agencies and aim to record cycling travel patterns for app users in order to improve cycling within the community [41][12]. However, the success led a number of agencies and municipalities (e.g., Austin, Texas; Seattle, Washington; and Salt Lake City, Utah) to adopt the app. Other cities have rebranded the app, including Lane County, Oregon (LaneTracks); Atlanta, Georgia (Cycle Atlanta); and Philadelphia, Pennsylvania (CyclePhilly) [57][80]. Although in-house developed apps provide more nuance (i.e., disaggregated data at the track level) than SFNs (i.e., aggregated data at the street level), participant recruitment is considered the main challenge in deploying them because it is time-consuming and effort-intensive [10][15].

3.3. Participatory mapping

Spatial knowledge from ordinary/non-expert users forms datasets that can be collected through Volunteered Geographic Information (VGI) or a Public Participation Geographic Information System (PPGIS). VGI platforms (e.g., SafeLanes (https://safelanes.org/, accessed on 11 June 2021)) promote user engagement in the form of voluntary reporting of various issues that can have implications for transportation planning [41][12]. PPGIS platforms (e.g., Maptionnaire (https://maptionnaire.com/, accessed on 11 June 2021) and KoBo Toolbox (https://www.kobotoolbox.org/, accessed on 11 June 2021)) are map-based surveys that solicit spatial and nonspatial information by inviting respondents [58][85]. These platforms are usually operated by researchers or practitioners. The varied mapping skills of users and their varied familiarity with study areas may result in data inconsistencies for such data sources [58][85]. In addition, participatory mapping platforms are subject to vandalism through false data entries [59][89].

3.4. Imagery

High spatiotemporal resolution imagery obtained at low or no cost from satellites (e.g., Google Maps and Bing Maps), street view sources (e.g., Google Street View), or drones can be integrated into supervised or unsupervised methods to extract stationary (e.g., infrastructure) and non-stationary (e.g., ridership) data. Additionally, based on satellite imagery, volunteers can digitize identifiable features including roads and building footprints via the collaborative mapping project OpenStreetMap (OSM). Accordingly, many studies use OSM to extract street networks, a key factor in AT studies. 

3.5. Bike sharing systems

Docked BSSs (known as third generation systems) allow for short-term bike rental with fixed pickup and return locations (docks) across an area. In contrast, dockless BSSs (known as fourth generation systems) allow users to unlock and leave rental bikes within a geofence site [60][96]. BSS-conducted trips are generally less than 30 min [61][97] and the systems play a key role in increasing the connectivity between public transport and origin or destination locations (first mile/last mile) [62][98]. The spatial and temporal bike rebalancing issue is one of the main challenges of BSSs, where certain locations at certain times (e.g., rush hours) suffer from bike shortages causing user dissatisfaction and reducing service reliability [63][105]. This issue may increase overheads, as operators have to dispatch vehicles to restore the balance. In order to optimize the way BSSs operate, the provided data have been utilized to further investigate and mitigate this challenge. [64][106] propose financial incentives for users to pick up or drop off bikes in alternate locations. The vast majority of BSS dataset records are provided as origin–destination journeys rather than routes. Buning and Lulla’s [65][104] work has, however, incorporated GPS data that reveals information about the used routes rather than just origin and destination. Furthermore, BSS datasets may be detailed enough to infer many useful attributes about the user such as subscription type (annual or casual), gender, year of birth, trip timestamp, and home zip code.
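
The rebalancing issue can be illustrated directly from origin–destination records. The following is a minimal sketch, assuming each trip is reduced to a pair of station identifiers; the station names, the threshold, and both function names are hypothetical, and real BSS feeds would also need to be filtered by time window (e.g., rush hours).

```python
from collections import Counter


def station_net_flow(trips):
    """Net bike flow per station from (origin, destination) pairs.

    A negative value means more bikes left the station than arrived,
    i.e. the station is draining and may run short of bikes.
    """
    net = Counter()
    for origin, destination in trips:
        net[origin] -= 1       # a bike leaves the origin station
        net[destination] += 1  # and arrives at the destination station
    return dict(net)


def stations_needing_bikes(net_flow, threshold=-2):
    """Stations whose net flow is at or below the (hypothetical) threshold."""
    return sorted(s for s, flow in net_flow.items() if flow <= threshold)


if __name__ == "__main__":
    # Hypothetical trips recorded during one morning peak.
    trips = [("A", "B"), ("A", "C"), ("A", "B"), ("B", "C"), ("C", "A")]
    flow = station_net_flow(trips)
    print(flow)
    print(stations_needing_bikes(flow))
```

Because every trip removes one bike from its origin and adds one to its destination, the net flows always sum to zero across the system; the operational question is where the deficits concentrate.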

3.6. Social media

Social media platforms have great potential as reliable, cost-effective, and timely information sources [66][107]. Through mining techniques, researchers can extract user perceptions on certain topics, whereby user locations can be inferred from geotags [67][108]. These data have long been acquired from surveys, which require effort in recruiting the sample and may be hindered by low response rates [68][109]. Thus, transport policymakers can harvest information from social media to monitor traffic in real time, model travel behavior and demand, and qualitatively analyze the service quality of facilities [69][110]. Despite their benefits, social media data are subject to age group bias and inconsistencies in the data collection [68][109].

3.7. Other

Using GPS tracking apps (e.g., Gaia GPS (https://www.gaiagps.com/, accessed on 21 June 2021)), subjects can record their trips and donate them to researchers. Heesch and Langdon [70][115] evaluated the usefulness of this type of app in detecting changes resulting from infrastructure improvement on cycling behavior. The work identified a failure in triangulating GPS data due to insufficient traffic-monitoring devices, which may lead to problematic results. In order to overcome this, the authors suggested complementing GPS data with other data sources.

Data service companies (e.g., StreetLight (https://www.streetlightdata.com/, accessed on 21 June 2021)) can aggregate data from different sources to provide a user-friendly analytic platform. Turner [71][116] determined a high correlation between StreetLight data and ground-truth cyclist counts.
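
Validation against ground-truth counts of this kind is often summarized with a correlation coefficient. The sketch below computes Pearson's r between two count series; the source does not state which correlation measure was used, so Pearson's r is an assumption here, and the count values are invented for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical daily cyclist counts at the same five locations.
    counter_counts = [120, 340, 90, 410, 230]   # ground-truth counters
    emerging_counts = [30, 95, 20, 110, 60]     # emerging data source
    print(f"r = {pearson_r(counter_counts, emerging_counts):.3f}")
```

A high r indicates that the emerging source tracks relative differences between locations, even though its absolute volumes may capture only a fraction of all users.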

Several self-developed apps and web-based services aim to facilitate crowdsourced data. BikeCitizens (https://www.bikecitizens.net, accessed on 21 June 2021) employs user-recorded trips and experiences after they are anonymized to improve cycling in cities. The Bike Data Project (https://www.bikedataproject.org, accessed on 21 June 2021) aims to gather data from multiple platforms to improve cycling safety through the donation of user trips.

In response to the COVID-19 pandemic, Apple Mobility Data (https://covid19.apple.com/mobility, accessed on 21 June 2021) reports direction requests (walking, driving, and transit) from the Apple Maps app and compares them to a baseline volume from 13 January 2020. The spatial resolution is confined to a country/region, sub-region, or city, with a daily temporal resolution. Using these data, Oguzoglu [72][117] was able to infer walking trends in Istanbul during the lockdown.
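
A baseline comparison of this kind can be expressed as a percentage change relative to the reference day. The following is a minimal sketch under the assumption that raw daily request volumes are available as a date-keyed mapping; the function name and all request volumes are hypothetical and do not reproduce the published dataset.

```python
def percent_change_from_baseline(daily_volumes, baseline_date):
    """Percentage change of each day's volume relative to the baseline day.

    daily_volumes maps ISO date strings to request counts.
    """
    baseline = daily_volumes[baseline_date]
    if baseline <= 0:
        raise ValueError("baseline volume must be positive")
    return {
        day: 100.0 * (volume - baseline) / baseline
        for day, volume in daily_volumes.items()
    }


if __name__ == "__main__":
    # Hypothetical walking direction requests for one city.
    walking_requests = {
        "2020-01-13": 200,  # baseline day named in the text
        "2020-04-01": 50,   # hypothetical lockdown day
        "2020-07-01": 260,  # hypothetical post-lockdown day
    }
    for day, change in percent_change_from_baseline(walking_requests, "2020-01-13").items():
        print(f"{day}: {change:+.1f}%")
```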

4. Open Challenges and Research Directions

Numerous policies that operate at different scales (society, city, neighborhood, and individual) cater to AT. Table 2 presents an overview of policy types that aim to increase AT. [5][6] determined that more adequate data collection and methodologies are required to optimally implement these policies. The authors explicitly state the need for improved data and for large-scale studies to evaluate these policies. Given the fine spatiotemporal resolution of crowdsourced data, researchers and practitioners can prioritize locations that require policies and interventions and can also justify their investments by quantifying the policy impact.

Table 2. Policies to promote AT.
Policy Level | Description
Society | Policies to reduce the appeal of motorized vehicles through speed limit reductions and car parking limits, and to promote public transport to incorporate AT.
City | Policies to configure urban design through initiatives such as incorporating mixed land use within walking distance to residential areas, the application of car-free centers, reducing block size, and increasing street connectivity.
Neighborhood | Policies on AT infrastructure investments to make AT more convenient, comfortable and safe, by adopting separated paths, cycle tracks and end-of-trip facilities (e.g., bicycle parking, showers, lockers).
Individual | Policies targeting behavior change, for example through mass media and other campaigns or by providing financial incentives.
Reprinted with permission from ref. [5][6]. Copyright 2017 Springer International Publishing AG.

The ongoing need for evidence-based policies and investment makes such a practice an open challenge for future research. For example, AT trends have changed as a result of COVID-19 lockdowns worldwide, requirements to meet recommended physical activity levels, and policies to ensure safe commuting [73][118]. Furthermore, most streets do not maintain the recommended distance between people (2 m). These exceptional circumstances, including the lockdown, travel restrictions, and curfews, demand appropriate policies and interventions to accommodate AT during such conditions, and the embedding of these practices in future transport planning activities.

High-definition imagery can potentially be incorporated into AT studies. The employment of free and commercial images allows researchers to obtain data on features known to improve the AT experience, namely green spaces and water bodies [74][75][120,121]. These two features can be delineated using multispectral imagery through spectral indices such as the normalized difference vegetation index (NDVI) and normalized difference water index (NDWI), respectively. Researchers and practitioners can adopt the cloud service Google Earth Engine to manage, store and process the large amount of data [76][122].
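
Both indices are normalized band differences and can be computed per pixel from surface reflectance values. The sketch below uses the common definitions NDVI = (NIR − Red) / (NIR + Red) and, as an assumption, the green/near-infrared form of NDWI, (Green − NIR) / (Green + NIR); other NDWI variants exist, and the reflectance values shown are hypothetical.

```python
def normalized_difference(band_a: float, band_b: float) -> float:
    """Generic normalized difference (a - b) / (a + b), ranging from -1 to 1."""
    total = band_a + band_b
    if total == 0:
        return float("nan")  # undefined when both bands are zero
    return (band_a - band_b) / total


def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index; high values suggest vegetation."""
    return normalized_difference(nir, red)


def ndwi(green: float, nir: float) -> float:
    """Normalized difference water index (green/NIR form, an assumption here);
    positive values suggest open water."""
    return normalized_difference(green, nir)


if __name__ == "__main__":
    # Hypothetical reflectance values for two pixels.
    park_pixel = {"green": 0.08, "red": 0.05, "nir": 0.45}
    pond_pixel = {"green": 0.10, "red": 0.06, "nir": 0.03}
    for name, px in (("park", park_pixel), ("pond", pond_pixel)):
        print(name, round(ndvi(px["nir"], px["red"]), 2), round(ndwi(px["green"], px["nir"]), 2))
```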

Studies on cycling and micromobility are common (refer to [77][78][79][123,124,125] for micromobility studies in Austria, San Francisco, California, and Austin, Texas respectively). In contrast, products used to track non-cycling modes are limited. [19][25] indicated that emerging micromobility and conventional non-motorized vehicles (i.e., skateboards, scooters, and rollerblades) share common challenges and interests with conventional bikes. The gap in the literature on non-cycling modes, including mobility aids for those with reduced mobility, creates an opportunity for future researchers to conduct more studies on these modes.

Previous research has indicated the biases of emerging data, which in turn threaten the legitimacy of outcomes derived from these data. BSSs provide data on all their users, eliminating the potential for social desirability and self-selection biases. Although BSSs tend to be more reflective of casual cyclists and visitors, the data are subject to spatial anonymity as the origin (or destination) represents the check-in (or -out) location of the bicycle [44][57]. Combining multiple data sources, also known as data fusion, has great potential for overcoming data uncertainties and biases.

The increasing number of studies focusing on BSSs illustrates the merits of open data, whereby the data are openly accessible to the public. Such a practice facilitates replicability and prompts more researchers to attempt to answer questions using these data. Open data may also provide an opportunity for VGI platforms to engage researchers with their platforms as data collection instruments and increase their visibility among used platforms to achieve representative sample sizes.

Since SFN data are owned by third parties, they are subject to specification changes and acquisition fees that can complicate interpretation, replicability and acquisition. Additionally, the acquisition fees are determined by area size and time span. Thus, to avoid these constraints, open source apps may to some extent substitute for this data source. These challenges should be acknowledged by transport agencies prior to adopting this emerging data source.