Location-based services have been commonly used in daily life. Spatial keyword query, as one of the important technologies in location-based services, has been widely used in many fields, such as intelligent navigation systems and spatial positioning systems. Different types of spatial keyword query problems have been studied in depth by scholars.
. Introduction
With the rapid development of Internet technology and sensor technology in recent years, location-based services have been commonly used in daily life. In particular, spatial keyword query, as one of the important technologies in location-based services, has been widely used in many fields, such as intelligent navigation systems and spatial positioning systems. Different types of spatial keyword query problems have been studied in depth by scholars at home and abroad, such as spatial keyword nearest neighbor query problems
[1][2][3], TOP-k spatial keyword queries
[4][5][6][7][8], inverse nearest neighbor queries
[9][10][11][12] and spatial keyword group queries
[13][14][15]. In the spatial database, a number of spatial–text objects, also known as points of interest (POI), are stored. A spatial keyword query returns one or some points of interest, which need to meet the various requirements of the query and be the best distance. Among the various branches of spatial keyword queries, the application of spatial keyword group queries in daily life has gradually spread in recent years and attracted the attention of many scholars. For example, when a tourist travels to a certain city, planning the desired location of the hotel needs to include activities such as eating, visiting the park, watching movies and so on for a period of time.
Temporal information is also an indispensable consideration in the field of spatial keyword queries. Since each POI such as stores or parks has its own opening hours, users cannot access the POI during non-opening hours. In the query system or recommendation system, different POIs have different opening hours, for example, the breakfast store is open from 6:30 to 10:00 and the shopping mall is open from 9:30 to 20:00, so for the different needs of different users, it is necessary to return different POI objects. Specifically, when the user wants to eat breakfast at 7:00, the query system should return the above breakfast store instead of the mall. Although there may be restaurants in the mall, the mall’s time does not meet the user’s needs. However, most researchers have focused on textual constraints and spatial constraints and have not considered the impact of temporal constraints in daily life.
Furthermore, with the development of science and technology in society, the needs of users are also increasing. For spatial keyword queries, users may not only want to query to meet the required keyword objects, but may also hope that, in the query process, they can be excluded from containing the user’s exclusion keyword objects. Moreover, some users around the world also have many exclusionary matters due to their religious beliefs and other reasons. However, few studies on spatial keyword querying have considered users’ exclusion preferences. Therefore, spatial keyword querying with exclusion keywords has great research value.
2. Spatial Keyword Group Query
The spatial keyword group query problem
[13][14][15], as a special form of spatial keyword query, has common applications in real life. The concept of the mCK (m-closest keywords) query was first proposed in the literature
[16]. Given a set of query keywords, the mCK query finds m interest points and the keyword information of this set of interest points jointly covers the query keywords and minimizes the maximum pairwise distance of the objects in the group. However, Zhang et al.
[16] only focused on the fact that each object contains only one query keyword and its proposed exact algorithm has low adaptability for large data sets. Considering the possibility of multiple keywords, Guo et al.
[17] relaxed the constraint of having only one keyword per object and proposed an approximation algorithm with a factor of (1.15 + ε). Deng et al.
[18] presented an optimal keyword coverage problem, which is a variant of the mCK query. This study considered both intra-group distance and keyword weights and incorporated both factors into a linear cost function, thereby proposing an accurate algorithm to solve this problem. Since the results of mCK queries do not provide sufficient support in some application scenarios, a new spatial keyword query, called the SK-Cover query (spatial keyword cover query), was proposed in the literature
[19]. This query considered the number of objects in the results set and put an approximate algorithm with a time complexity of O (log
m) for this query problem. Effective access policies and pruning rules were laid out to improve the efficiency and scalability of the algorithm in
[19]. Subsequently, Li et al.
[20] studied spatial keyword group queries and proposed a parametric approximation algorithm that allows the approximation ratio to be adaptive and the user to assign arbitrary query precision.
The literature
[13][14][15][16][17][18][19][20] are traditional spatial keyword group queries, focusing only on the study of keyword constraints and distance constraints. Nevertheless, some researchers have extended conventional spatial keyword group queries to some extent. Singh et al.
[21] considered objects labeled with keywords and embedded in a vector space to study the multi-objective query problem in a multi-dimensional space for the closest set of points satisfying a given set of keywords. In a low-dimensional space, IR trees are mainly used to support query processing. But in a high-dimensional space (with dimensions larger than 10), different MBRs (minimum bounding rectangles) of IR trees can contain a large amount of overlapping data, resulting in a reduced index performance. So, this literature studied indexing mechanisms and query algorithms in a high-dimensional space, using random projection and hash-based indexing structures, and achieved high scalability and speedup ratios. The CoSKQ (collective spatial keyword query) query model introduces query points based on mCK, and Gao, Cao and Zhao et al.
[22][23][24] studied the CoSKQ query model in a road network space. Unlike most studies, Zhao et al.
[24] proposed a popularity-aware aggregated keyword in road networks, aiming to find a set of popular POIs (i.e., a popularity region). The POIs cover the query keywords and satisfy the distance requirements from each node to the query node and between each node pair, such that the sum of the scores of these nodes for the query keywords is maximized. For this reason, a scaling technique for scoring was raised to reduce the search space, and a redundant computation reduction technique was proposed to reduce the redundant computations in query processing. Su et al.
[25] also studied ensemble spatial keyword queries in a road network environment, but the difference is that it is based on group-based ensemble queries, i.e., GBCK (group-based collective keyword). Considering the importance of the keyword levels for decision support, Zhang et al.
[26] proposed the Level Aware Set Space Keyword Query (LCSK) to find a set of POIs that jointly covers the query keywords with threshold constraints and minimal spatial distance cost. An exact algorithm and approximation algorithm with provable approximations related to this problem were designed for the LCSK. To better express more fine-grained preferences for cost-aware and distance-constrained keyword queries in the set space, Chan et al.
[27] proposed new criteria, optimized the cost function and designed an exact and approximate algorithm. Xu et al.
[28] differed from other works in that it studied mobile set space keyword queries from the dynamics of query points and raised two approximation algorithms based on the safety zone technique.
Considering the importance of temporal information in spatial keyword queries, some researchers have proposed query models with temporal awareness. Chen et al.
[29] first studied temporally aware Boolean keyword queries, known as a TABSKQ (time-aware Boolean spatial keyword query), and proposed an efficient index structure TA-tree and its corresponding query algorithm for this problem model. The index can effectively prune the search space and take into account both keyword information and temporal information. Chen et al.
[30] proposed two evaluation functions for time-aware aggregate keyword queries to meet different types of query requirements and both proposed corresponding processing algorithms. Chan et al.
[31] studied the model of indoor keyword routing with time constraints, i.e., the TIKRQ problem (time-constrained indoor keyword-aware routing query), and proposed a series of pruning rules and corresponding solution algorithms. Considering that most of the existing works cannot solve the spelling error cases and thus focus on the time-aware approximate set keyword search in traffic networks, Feng et al.
[32] proposed a TDAG-tree index for the distance pruning of query objects and designed two approximation algorithms to improve the processing efficiency to a large extent.