The paper is devoted to an overview of multi-agent principles, methods, and technologies intended to adaptive real-time data clustering. The proposed methods provide new principles of self-organization of records and clusters, represented by software agents, making it possible to increase the adaptability of different clustering processes significantly. The paper also presents a comparative review of the methods and results recently developed in this area and their industrial applications. An ability of self-organization of items and clusters suggests a new perspective to form groups in a bottom-up online fashion together with continuous adaption previously obtained decisions. Multi-agent technology allows implementing this methodology in a parallel and asynchronous multi-thread manner, providing highly flexible, scalable, and reliable solutions. Industrial applications of the intended for solving too complex engineering problems are discussed together with several practical examples of data clustering in manufacturing applications, such as the pre-analysis of customer datasets in the sales process, pattern discovery, and ongoing forecasting and consolidation of orders and resources in logistics, clustering semantic networks in insurance document processing. Future research is outlined in the areas such as capturing the semantics of problem domains and guided self-organization on the virtual market.
The known clustering task consists of categorizing a given matters collection according to its inner similarity, such that items belonging to the same group (cluster) are more alike to each other in comparison to ones located in additional sets. Such a problem is typically resolved with a predefined number of groups [1,2,3,4,5,6,7,8].
However, in many practical problems, data are received gradually, e.g., via real-time record-by-record fashion in small batches within unpredictable periods. The most representative example of such an application is customer classification in an internet portal, with a large number of visitors contributing a small but significant amount of data during each visit. User patterning aiming for coherent and up-to-date behavior suggests an adaptive and go-ahead clustering process, taking into account the dynamic nature of the data.
The main limitation of many current data mining methods and algorithms is the need to suggest a hypothesis about data configuration in advance that is frequently impossible in the online mode.
Despite considerable progress in data mining and machine learning technologies, the intended methods and algorithms cannot cope with the enormous amounts of data with critical uncertainty and dynamics. Complexity becomes even more prominent because of the diversity of datasets and growing of application domains. This difficulty also appears since the individual business requirements and computational complexity of the clustering process requires additional resources for the practical applications. An example of such a complexity is given by the authors of [9], where the similarity of user preferences given as a cluster criterion was combined with many other metrics like topology, domain-specific semantics, etc.
Future clustering is comprehended as a collection of if–then rules, providing the most understandable manner of cluster visualization. This technique produces the possibility for smart solutions in business to learn the behavior of orders and resources “on the fly” and to improve the required business capacities in a real-time fashion.
The second problem is to find possible options for order consolidation aiming to improve efficiency [14]. The clustering method was applied here to find groups of orders similar in the geographical location and time windows. It can be supplied simultaneously by one truck with a specific capacity. The consolidation of orders requires the use of journey time metrics (JTM) and analysis of nearness of source locations by geography and by distance/time (JTM), as well as destination locations in combination with the overlap of time intervals if one truck takes all consolidated orders.
Such orders can be delivered in different ways: All orders in the cluster could be shipped by one truck, or by several trucks similar or not to each other. Particular heuristics were developed to expose the options for decision-makers. The resulting clusters discovered some interesting hidden rules:
The obtained results are given in Figure 7.
In general, the solution is 1.5-times more consolidations, improving the schedule quality by 15%.
Multi-agent technology provides various recently developed methods for adaptive clustering by data their self-organization to expand the current technologies, which can be considered as a new generic methodology. The stated results designate the following research directions for future research in the efficiency of adaptive clustering using multi-agent technology:
The clustering process significantly depends on domain-specific knowledge and decision-making rules. One of the very promising new approaches for capturing domain semantics is an ontology in the form of semantic networks of concepts and relations, which can help improve clustering with the use of specific domain knowledge, e.g., in manufacturing, transport, agriculture, etc. In this case, the generic multi-agent framework of clustering could be customized for particular customers by domain-specific ontologies.
At the moment, agents use real money in decision-making. However, virtual money can be introduced as a way to regulate the abilities of clusters and records to make decisions and solve conflicts. For example, in transportation logistics, the sum of money available for a record can be set as a price for the order, which the customer is due to pay. Then, the history of VIP order is “richer” than a record of a single order from a regular customer. It makes it possible to form clusters with distant records, taking into account a growing context, and generating more groups as a result.
In the case of an advanced model of virtual markets, clusters and records pay virtual taxes to act in the system and stay in groups. This affects their financial resources, making some clusters and records evolutionally disappear from the system, thus decreasing the load on the network, while some of them grow and become more assertive.
A different model of microeconomics can also be applied. For example, clusters can take charges from records for the right to enter the following “club system” model (the amount of payment does not depend on the situation, the productivity of the records, or the number of club members) or according to the “shareholder” model (the amount depends on the case). Thus, agents of clusters and records are supposed to think not only about “money,” but about their “lifespan,” balancing between the criteria. The foremost could stay the same. Many advanced economic models for better managing agent behavior could be applied in the framework of the same multi-agent system.
The self-organization process can be stopped by local attractors, which is hard for agents to avoid. One of the solutions consists of the appointment of an agent of the “whole” system. Its mission is to monitor the situation in clustering, recover the potential local optimums, and to cascade changes during different types of interventions, for example, investing virtual money into some promising clusters or records.
Agents of records and clusters may use different self-learning tools, including neuro-networks or deep learning ones, to alter their decision-making rules and analyze the results.
As the next step in method development, we consider a situation when one record (cluster) can locate in a few groups simultaneously, paying for a membership or some virtual taxes, etc. In this case, the collected virtual money of each agent needs to be assigned by other agents of clusters and records in the best possible way on the virtual market, aiming to decide which candidates to choose.
The results of the research could be useful for designing a new generation of smart software products based on multi-agent technology that is able to adapt, learn, and evolve over their life cycle.
This entry is adapted from the peer-reviewed paper 10.3390/math8101664