Zhao, Y., Ning, Y., Li, H., Liao, Z., Liu, Y., & Li, F. (2023, December 18). Quality of OpenStreetMap Data. In Encyclopedia. https://encyclopedia.pub/entry/52882
Quality of OpenStreetMap Data

OpenStreetMap (OSM) is a potential source of open geospatial data for monitoring sustainable development goal (SDG) indicators. Improving the quality of these crowdsourced data has significant implications for monitoring and achieving SDGs such as zero hunger, sustainable cities, tenure security, and biodiversity preservation. OSM data quality has therefore attracted wide attention, yet improving its semantic quality remains challenging. Road type prediction is one important approach to this problem, but most existing algorithms show low accuracy owing to data sparseness and inaccurate descriptions.

OSM; SDG; quality evaluation; quality improvement; data

1. Introduction

Volunteered Geographic Information (VGI) has attracted wide attention from academia and industry since Goodchild coined the term in 2007 [1]. OpenStreetMap (OSM) is one of the most successful VGI projects. According to official OSM statistics, it has more than 9 million registered users. Thanks to its many volunteers and their familiarity with nearby features, geospatial data on the OSM platform are updated frequently and quickly. Because the data are free for all uses, a wide range of applications has been built on OSM spatial data [2]. OSM is also a potential source of open geospatial data for monitoring sustainable development goal (SDG) indicators [3]. Improving the quality of these crowdsourced data has significant implications for monitoring and achieving SDGs such as zero hunger, sustainable cities, tenure security, and biodiversity preservation [4].
Nevertheless, OSM spatial data suffer from quality issues such as inaccuracy and incompleteness, because the OSM platform lacks a rigorous error detection and notification mechanism during the submission process and many contributors lack knowledge of geography and mapping. Data quality also varies across countries and regions [5], and contributors differ greatly in experience and skill. It is therefore difficult for most contributors to describe OSM geographic elements with accurate semantic attributes (tags in OSM), which leads to semantic quality problems in the OSM dataset, especially for geographic objects of similar types [6]. To some extent, these issues have hindered the development of OSM and reduced its value as a source for monitoring some SDG indicators. Research on how to improve the quality of OSM data [7][8][9] has therefore been a popular topic in the academic community in recent years.
Spatial data mainly have three basic kinds of features: spatial, thematic, and temporal. The thematic features of geographic elements in OSM are mainly described by tags, which can be divided into element class tags and other attribute tags [6]. Element class tags distinguish one type of geographic element from another, for example highways, major roads, or residential roads. These tags are very important attributes that connect OSM elements to map layers [6]. Other attribute tags describe further characteristics of OSM elements, such as road name, width, and speed limit.
The quality of geographic information in OSM has been studied extensively in recent years. Most studies have focused on two aspects: quality evaluation [10][11][12] and quality improvement [7][13]. In terms of quality improvement, many scholars have focused on tag recommendation for OSM geographic objects [9][14][15][16][17][18][19][20][21][22]. Existing tag recommendation methods are mostly based on the characteristics of the OSM elements themselves [14][15][16][17][18][19]. For example, Storandt et al. proposed a system that recommends suitable tags for points of interest (POIs) based only on their names [15].
Roads are among the most important elements in OSM and underpin numerous applications such as navigation and network analysis [23][24][25][26]. Tag recommendation for the OSM road network is therefore of particular importance. At present, most research on road tag recommendation considers the object’s own characteristics and its restricted characteristics [18][19][20]. In addition, the experimental data in these studies are generally selected from areas with rich OSM data (such as London, UK), where OSM data are recognized to be extremely useful [27]. In other words, these models are generally effective only on dense OSM datasets. In some economically underdeveloped countries and regions, however, the relatively insufficient quality and availability of OSM data often limit their application, and incomplete geographic objects or inaccurate semantic descriptions occur to varying degrees. Thus, further improving the accuracy of tag recommendation remains challenging.
According to Tobler’s First Law of Geography [27], geographic things or attributes are related to one another in their spatial distribution, which can exhibit clustered, random, or regular patterns. These relationships can be described by the spatial context. Hence, extracting spatial context from the surroundings of geographic objects can enrich their characteristics and is very useful for predicting their types. As shown in Figure 1, the differences among three OSM road types (secondary, tertiary, and residential) are easy to see. Tertiary and residential roads are more often close to residential buildings (the regularly arranged buildings in the figure), whereas only a small part of a secondary road is adjacent to residential houses. The main function of secondary roads is to connect specific administrative centers, traffic hubs, commercial zones, and so on, and they are characteristically straight and wide.
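The distinction between element class tags and other attribute tags can be illustrated with a small sketch. The set of keys treated as class keys below is an assumption chosen for illustration, not an official or complete OSM list, and the example road is hypothetical.

```python
# Keys treated as "element class" keys here are an illustrative assumption,
# not an official or complete list of OSM class keys.
CLASS_KEYS = {"highway", "building", "landuse", "amenity", "leisure"}


def split_tags(tags):
    """Split an element's tags into (class_tags, attribute_tags)."""
    class_tags = {k: v for k, v in tags.items() if k in CLASS_KEYS}
    attribute_tags = {k: v for k, v in tags.items() if k not in CLASS_KEYS}
    return class_tags, attribute_tags


# Hypothetical residential road with a name, speed limit, and width.
road = {
    "highway": "residential",
    "name": "Example Street",
    "maxspeed": "30",
    "width": "6",
}
class_tags, attribute_tags = split_tags(road)
print(class_tags)      # the tag that determines the road type
print(attribute_tags)  # descriptive tags such as name and speed limit
```

In this sketch the `highway` value is the label that a road type prediction model would try to recommend, while the remaining tags could serve as input features.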
Figure 1. An example of different road types in OSM, where secondary, tertiary, and residential roads are purple, blue, and green, respectively, other types of roads are black, and buildings are yellow rectangles.
Several studies have shown that it is feasible to extract spatial contexts (SC) and apply them to tag recommendation for spatial objects [9][20][21][22], which can improve recommendation accuracy in regions where OSM data have poor completeness and low semantic accuracy. Ali et al. analyzed spatial context for recommending tags in ambiguous grassland classification, counting the relevant areal entities, such as “amenities” and “leisure”, and some linear entities in the study regions [21]; their work verified the effectiveness of spatial context. However, they did not adequately consider the influence of surrounding objects, for example POIs around park regions. Alghanim et al. added building context to the feature matrix of road elements and used the random forest (RF) algorithm to recommend road tags [9]. The work achieved good results on their dataset but ignored other spatial contexts, such as the connecting road context. Zhao et al. proposed a system for recommending suitable building labels that introduced external semantic features, including the location features of buildings, spatial co-location patterns of points of interest (POIs), nighttime light, and land use information of the buildings [22].

2. Quality Evaluation of OSM Data

In recent years, doubts about the ability of novice contributors to describe spatial geographic data accurately have raised many concerns about OSM data quality [8]. Many studies have therefore focused on evaluating the accuracy and completeness of OSM data. The International Cartographic Association (ICA) developed seven rules for assessing the quality of spatial geographic data [28]. Building on these seven rules, Barron et al. extended them to cover the semantic accuracy, geometric accuracy, and availability of spatial data [29]. Moreover, since contributors are an important part of map production, data-centric and contributor-centric assessments are often combined [11][12][30][31][32][33]. Overall, OSM quality assessment can be divided into extrinsic quality measures and intrinsic quality measures.
Extrinsic quality measures: This type of research evaluates OSM data externally by comparing them with authoritative data from official institutions [10][34][35][36]. Such methods rely on external data. However, authoritative datasets are harder to obtain than public ones, and they are sometimes updated more slowly than the fast-changing OSM data. These factors limit the application and extension of such evaluation methods.
Intrinsic quality measures: Intrinsic methods do not rely on external or authoritative data sources for validation [37][38]. They can measure the accuracy of OSM data by assessing changes across historical versions of the data or by using contributor reputation. For example, Fogliaroni et al. calculated a quality score for geographic features by analyzing the geometric, qualitative, and semantic changes in their edit history, and used it to approximate the quality of the spatial data [11]. Zhou and Zhao used spatial and geometric similarity to calculate the similarity between versions; the reputation of a contributor was then obtained by analyzing the implicit assessments computed from version similarity [31]. Mülligann et al. analyzed the spatial-semantic relations of point features and used spatial-semantic interaction to measure the semantic similarity of the change history of geographic elements [33].
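One simple form of extrinsic comparison is a length-based completeness ratio: the total road length in OSM divided by the total road length in a reference dataset for the same area. The sketch below is a minimal illustration under stated assumptions and is not the procedure of any cited study. It assumes that roads are given as polylines in a projected coordinate system measured in metres; the function names and sample coordinates are hypothetical.

```python
import math


def polyline_length(points):
    """Length of a polyline given as a list of (x, y) vertices in metres."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def length_completeness(osm_roads, reference_roads):
    """Ratio of total OSM road length to total reference road length.

    Values below 1 suggest missing OSM roads; values above 1 suggest that
    OSM holds more road length than the reference for this area.
    """
    reference_total = sum(polyline_length(r) for r in reference_roads)
    if reference_total == 0:
        raise ValueError("reference dataset has zero road length")
    osm_total = sum(polyline_length(r) for r in osm_roads)
    return osm_total / reference_total


# Hypothetical data: one 5 m OSM road versus one 10 m reference road.
osm_roads = [[(0, 0), (3, 4)]]
reference_roads = [[(0, 0), (10, 0)]]
ratio = length_completeness(osm_roads, reference_roads)
print(f"length completeness: {ratio:.2f}")
```

A ratio of this kind says nothing about positional or semantic accuracy, so in practice it would be only one of several extrinsic indicators.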
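The idea of scoring an element from its own edit history can be sketched with a simple stand-in measure. The example below uses the Jaccard similarity of tag sets between consecutive versions. This is an assumption made for illustration; the cited works define their own spatial, geometric, and semantic similarity measures. The function names and the sample history are hypothetical.

```python
def tag_similarity(tags_a, tags_b):
    """Jaccard similarity between two versions' (key, value) tag pairs."""
    items_a, items_b = set(tags_a.items()), set(tags_b.items())
    union = items_a | items_b
    if not union:
        return 1.0  # two untagged versions are treated as identical
    return len(items_a & items_b) / len(union)


def history_stability(versions):
    """Mean similarity between consecutive versions of one element.

    A value near 1 means that later edits mostly confirmed earlier tags.
    """
    if len(versions) < 2:
        return 1.0
    sims = [tag_similarity(a, b) for a, b in zip(versions, versions[1:])]
    return sum(sims) / len(sims)


# Hypothetical edit history of one road: a name is added, then confirmed.
history = [
    {"highway": "residential"},
    {"highway": "residential", "name": "A"},
    {"highway": "residential", "name": "A"},
]
score = history_stability(history)
print(f"history stability: {score:.2f}")
```

A contributor-centric variant could aggregate such scores over all versions authored by one user to approximate a reputation value.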

3. Quality Improvement of OSM Data

The quality improvement of OSM data has received much attention from researchers. It can be divided into two aspects: identifying and correcting erroneous OSM data, and OSM tag recommendation.
Identifying and correcting erroneous data for OSM: Early literature on improving the quality of crowdsourced geographic data focused on detecting and correcting erroneous elements. For example, Vargas-Muñoz et al. used a Markov random field method to maximize the correlation between annotations of OSM buildings and predicted building probability maps. After removing redundant geometric annotations based on the relationship between the building probability maps and thresholds, they used a convolutional neural network (CNN) to predict and add new building geometry annotations [7]. Kashian et al. analyzed the “semantics” of newly contributed data by identifying potential patterns of co-existence between POIs and other geographic features. They calculated the likelihood of a POI being registered at a given location in order to improve the detection and verification system of the OSM platform, which can improve the location accuracy of POIs registered in OSM [13]. These studies have contributed significantly to improving the semantic quality of OSM databases.
OSM tag recommendation: OSM does not have a proper tag verification mechanism, so OSM tags vary greatly between contributors. Tag recommendation is a good way to address this issue and can significantly improve the quality of OSM data [8]. There have been many studies on OSM and other open data in recent years. For example, Vandecasteele and Devillers built a tag recommendation system named “OSMantic”. The system calculates semantic similarity scores and suggests relevant tags in a timely manner as users submit their contributions, and it gives semantic accuracy hints when the score is too low [14]. Storandt et al. developed a recommendation system that needs only the POI name to recommend appropriate tags [15]. Jilani et al. constructed features of road elements, such as degree distribution, betweenness centrality, and the number of nodes in a bounding box. Their model represents roads and their features with a graph structure, and they trained an artificial neural network (ANN) to recommend tags [16]. Corcoran et al. focused on geometry and defined a series of geometric features of road elements, such as degree, road curvature, and parallelism; their model reported weighted accuracy and recall values of 68% and 65%, respectively [17]. Hacar used geometric and semantic features to classify and recommend the leisure tags of polygons [19]. Tag recommendation can motivate contributors to contribute correct tags, which is highly effective in improving quality, and it is therefore widely studied. However, these studies consider only geometric or other semantic features of the element itself, which depend heavily on the quality of the OSM data, so such methods often struggle to adapt to semantically incomplete datasets.
For geographic features, every element is related to other elements, and the strength of the influence depends on distance [27]. Current research tends to combine the features of geographic elements with their spatial context, with many results. For example, Zhang et al. used geometric features and restricted features of road elements to detect problems such as semantic inconsistency of tags, and gave intelligent suggestions based on the information available in the spatial context of the problem data [20]. Alghanim et al. used building context as a feature for analyzing road elements, counting context semantics within linear buffers of 20 m to 200 m, and developed classifiers to recommend classification tags for road elements on this basis [9]. Ali et al. identified and predicted several ambiguous grassland categories based on contextual attributes and topological features, analyzing building and road elements against the target elements under three selected topological relations. Among these, the connecting road context uses several road element categories related to park grass, including “foot” and “bike” [21].
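A buffer-based building context of the kind described above can be sketched as follows. This is a minimal illustration and not the implementation used in the cited work. It assumes projected coordinates in metres and represents each building by its centroid. The intermediate radii of 50 m and 100 m are chosen only for illustration; the source states just the 20 m to 200 m range. All function names are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.dist(p, a)  # degenerate segment
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.dist(p, (ax + t * dx, ay + t * dy))


def point_polyline_distance(p, line):
    """Shortest distance from point p to a polyline (list of vertices)."""
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))


def building_context(road, building_centroids, radii=(20, 50, 100, 200)):
    """Count building centroids within each buffer radius of the road."""
    dists = [point_polyline_distance(c, road) for c in building_centroids]
    return {r: sum(d <= r for d in dists) for r in radii}


# Hypothetical 100 m road with four nearby building centroids.
road = [(0, 0), (100, 0)]
centroids = [(50, 10), (50, 60), (150, 0), (50, 300)]
context = building_context(road, centroids)
print(context)
```

The resulting counts could be appended to a road's feature vector before training a classifier, so that roads with dense nearby buildings are more easily separated from roads with few.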

References

  1. Goodchild, M.F. Citizens as voluntary sensors: Spatial data infrastructure in the world of Web 2.0. Int. J. Spat. Data Infrastruct. Res. 2007, 2, 24–32.
  2. Ingensand, J.; Composto, S.; Ertz, O.; Rappo, D.; Nappez, M.; Produit, T.; Oberson, M.; Widmer, I.; Joost, S. Keys to successful scientific VGI projects. In Proceedings of the 4th Open Source Geospatial Research and Education Symposium (OGRS2016), Perugia, Italy, 14–16 October 2016; p. 14.
  3. Borkowska, S.; Pokonieczny, K. Analysis of OpenStreetMap data quality for selected counties in Poland in terms of sustainable development. Sustainability 2022, 14, 3728.
  4. Persello, C.; Wegner, J.D.; Hansch, R.; Tuia, D.; Ghamisi, P.; Koeva, M.; Camps-Valls, G. Deep learning and earth observation to support the sustainable development goals: Current approaches, open challenges, and future opportunities. IEEE Geosci. Remote Sens. Mag. 2022, 10, 172–200.
  5. Zhou, Q.; Zhang, Y.; Chang, K.; Brovelli, M.A. Assessing OSM building completeness for almost 13,000 cities globally. Int. J. Digit. Earth 2022, 15, 2400–2421.
  6. Zhao, Y.; Yang, W.; Liu, Y.; Liao, Z. Discovering transition patterns among OpenStreetMap feature classes based on the Louvain method. Trans. GIS 2022, 26, 236–258.
  7. Vargas-Muñoz, J.E.; Lobry, S.; Falcão, A.X.; Tuia, D. Correcting rural building annotations in OpenStreetMap using convolutional neural networks. ISPRS J. Photogramm. Remote Sens. 2019, 147, 283–293.
  8. Mooney, P.; Minghini, M. A review of OpenStreetMap data. In Mapping and the Citizen Sensor; Ubiquity Press: London, UK, 2017.
  9. Alghanim, A.; Jilani, M.; Bertolotto, M.; McArdle, G. Leveraging Road Characteristics and Contributor Behaviour for Assessing Road Type Quality in OSM. ISPRS Int. J. Geo-Inf. 2021, 10, 436.
  10. Mahabir, R.; Stefanidis, A.; Croitoru, A.; Crooks, A.; Agouris, P. Authoritative and Volunteered Geographical Information in a Developing Country: A Comparative Case Study of Road Datasets in Nairobi, Kenya. ISPRS Int. J. Geo-Inf. 2017, 6, 24.
  11. Fogliaroni, P.; D’Antonio, F.; Clementini, E. Data trustworthiness and user reputation as indicators of VGI quality. Geo-Spat. Inf. Sci. 2018, 21, 213–233.
  12. Anderson, J.; Soden, R.; Keegan, B.; Palen, L.; Anderson, K.M. The crowd is the territory: Assessing quality in peer-produced spatial data during disasters. Int. J. Hum. Comput. Interact. 2018, 34, 295–310.
  13. Kashian, A.; Richter, K.-F.; Rajabifard, A.; Chen, Y.; Both, A.; Duckham, M.; Kealy, A. Mining the co-existence of POIs in OpenStreetMap for faulty entry detection. In Proceedings of the 3rd Annual Conference of Research@Locate, Melbourne, Australia, 12–14 April 2016.
  14. Vandecasteele, A.; Devillers, R. Improving volunteered geographic information quality using a tag recommender system: The case of OpenStreetMap. In OpenStreetMap in GIScience; Springer: Berlin/Heidelberg, Germany, 2015; pp. 59–80.
  15. Storandt, S.; Funke, S. Automatic Improvement of Point-of-Interest Tags for OpenStreetMap Data. In Proceedings of the Free and Open Source Software for Geospatial (FOSS4G) Conference Proceedings, Seoul, Republic of Korea, 14–19 September 2015; p. 56.
  16. Jilani, M.; Corcoran, P.; Bertolotto, M. Automated quality improvement of road network in OpenStreetMap. In Proceedings of the Agile Workshop (Action and Interaction in Volunteered Geographic Information), Leuven, Belgium, 14–17 May 2013; p. 19.
  17. Corcoran, P.; Jilani, M.; Mooney, P.; Bertolotto, M. Inferring semantics from geometry: The case of street networks. In Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems, Washington, DC, USA, 3–6 November 2015; pp. 1–10.
  18. Jilani, M.; Corcoran, P.; Bertolotto, M. Machine learning for crowdsourced spatial data. In Machine Learning and Knowledge Discovery in Databases; Springer: Berlin/Heidelberg, Germany, 2016; pp. 294–297.
  19. Hacar, M. Using geometric and semantic attributes for semi-automated tag identification in OpenStreetMap data. In Proceedings of the 29th GISRUK Conference, Cardiff, UK, 13–16 April 2021.
  20. Zhang, X.; Ai, T. How to model roads in OpenStreetMap? A method for evaluating the fitness-for-use of the network for navigation. In Advances in Spatial Data Handling and Analysis; Springer: Berlin/Heidelberg, Germany, 2015; pp. 143–162.
  21. Ali, A.L.; Schmid, F.; Al-Salman, R.; Kauppinen, T. Ambiguity and plausibility: Managing classification quality in volunteered geographic information. In Proceedings of the 22nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Dallas, TX, USA, 4–7 November 2014; pp. 143–152.
  22. Zhao, Y.; Tang, X.; Liao, Z.; Liu, Y.; Liu, M.; Lin, J. Multi-Type Features Embedded Deep Learning Framework for Residential Building Prediction. ISPRS Int. J. Geo-Inf. 2023, 12, 356.
  23. Pazoky, S.H.; Pahlavani, P. Developing a multi-classifier system to classify OSM tags based on centrality parameters. Int. J. Appl. Earth Obs. Geoinf. 2021, 104, 102595.
  24. Kuo, C.-L.; Tsai, M.-H. Road characteristics detection based on joint convolutional neural networks with adaptive squares. ISPRS Int. J. Geo-Inf. 2021, 10, 377.
  25. Zhang, X.; Wang, T.; Jiao, D.; Zhou, Z.; Yu, J.; Cheng, X. Detecting inconsistent information in crowd-sourced street networks based on parallel carriageways identification and the rule of symmetry. ISPRS J. Photogramm. Remote Sens. 2021, 175, 386–402.
  26. Zourlidou, S.; Sester, M.; Hu, S. Recognition of Intersection Traffic Regulations from Crowdsourced Data. ISPRS Int. J. Geo-Inf. 2023, 12, 4.
  27. Tobler, W.R. A computer movie simulating urban growth in the Detroit region. Econ. Geogr. 1970, 46, 234–240.
  28. Morrison, J.L. Spatial data quality. Elem. Spat. Data Qual. 1995, 202, 1–12.
  29. Barron, C.; Neis, P.; Zipf, A. A comprehensive framework for intrinsic OpenStreetMap quality analysis. Trans. GIS 2014, 18, 877–895.
  30. Madubedube, A.; Coetzee, S.; Rautenbach, V. A Contributor-Focused Intrinsic Quality Assessment of OpenStreetMap in Mozambique Using Unsupervised Machine Learning. ISPRS Int. J. Geo-Inf. 2021, 10, 156.
  31. Zhou, X.; Zhao, Y. A Version-Similarity Based Trust Degree Computation Model for Crowdsourcing Geographic Data. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2016, 41, 327–333.
  32. Bishr, M.; Mantelas, L. A trust and reputation model for filtering and classifying knowledge about urban growth. GeoJournal 2008, 72, 229–237.
  33. Mülligann, C.; Janowicz, K.; Ye, M.; Lee, W.-C. Analyzing the spatial-semantic interaction of points of interest in volunteered geographic information. In Proceedings of the International Conference on Spatial Information Theory, Belfast, Ireland, 12–16 September 2011; pp. 350–370.
  34. Haklay, M. How good is volunteered geographical information? A comparative study of OpenStreetMap and Ordnance Survey datasets. Environ. Plan. B Plan. Des. 2010, 37, 682–703.
  35. Dorn, H.; Törnros, T.; Zipf, A. Quality Evaluation of VGI Using Authoritative Data—A Comparison with Land Use Data in Southern Germany. ISPRS Int. J. Geo-Inf. 2015, 4, 1657–1671.
  36. Zielstra, D.; Zipf, A. A comparative study of proprietary geodata and volunteered geographic information for Germany. In Proceedings of the 13th AGILE International Conference on Geographic Information Science, Guimarães, Portugal, 10–14 May 2010; pp. 1–15.
  37. Zhao, Y.; Zhou, X.; Li, G.; Xing, H. A spatio-temporal VGI model considering trust-related information. ISPRS Int. J. Geo-Inf. 2016, 5, 10.
  38. Zhao, Y.; Wei, X.; Liu, Y.; Liao, Z. A Reputation Model of OSM Contributor Based on Semantic Similarity of Ontology Concepts. Appl. Sci. 2022, 12, 11363.
Update Date: 19 Dec 2023