Patent Analysis and Tools: Comparison
Please note this is a comparison between Version 1 by Konstantinos Georgiou and Version 2 by Jason Zhu.

Patent analysis is a field that concerns the analysis of patent records, for the purpose of extracting insights and trends, and it is widely used in various fields. Despite the abundance of proprietary software employed for this purpose, there is currently a lack of easy-to-use and publicly available software that can offer simple and intuitive visualizations, while advocating for open science and scientific software development.

  • patent analytics
  • scientific software development
  • topic modeling

1. Introduction

In this era of technological and entrepreneurial progress, an increasing number of companies seek to safeguard their intellectual property. Specifically, the number of annual patent applications has almost tripled in the last two decades, according to a study conducted by the World Intellectual Property Organization (WIPO) [1], rendering patent documents more valuable than ever before. Patents are widely considered a safe choice for large companies and organizations to secure commercial rights, avoid litigation and retain their competitive advantage [2].
The scope and importance of patenting are made clear when considering the large number of patent offices around the world, responsible for receiving, evaluating and granting patent applications. Such offices, with the most prominent ones being the United States Patent and Trademark Office (USPTO), the European Patent Office (EPO) and the China National Intellectual Property Administration (CNIPA), handle the difficult task of processing and analyzing patent documents, examining their objectives and their validity. This wealth of information has led to the emergence of patent analysis (PA) as a promising scientific domain that leverages data from patent offices to extract valuable results [3].
In brief, PA is a field that covers the study of patent documents utilizing proven methodologies and techniques comprising text mining, machine learning and data visualization [4,5]. The results of PA have numerous applications that can be exploited in different departments within an organization or a business, including R&D management, human resources, mergers and acquisitions, company evaluation and competitive intelligence [6]. In addition, PA offers a plethora of opportunities for the extraction of meaningful insights through the application of advanced approaches, such as topic modeling, network analysis and machine learning.
While PA offers valuable insights, it is a time-consuming, multi-stage process that requires specific skills. Patent documents must first be collected from various sources, either through the APIs offered by the patent offices, where available, or by using high-level programming languages and databases. Once collected, the documents must be preprocessed and filtered to meet criteria that depend on the research goals and the examined domain and, finally, analyzed using a set of methodologies. While this process may seem simple to a seasoned researcher or an individual with a background in programming, databases and data engineering, there are groups of users, such as industrial actors and business stakeholders, that may not possess these skills and need PA to be streamlined, automated and usable without prior knowledge.
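The three stages described above (collection, preprocessing/filtering and analysis) can be illustrated with a minimal sketch. It is not the workflow of any specific tool: the `Patent` record, its fields, the sample data and the function names are all hypothetical, and the collection stage is replaced by hard-coded records instead of a real patent office API.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Patent:
    """Hypothetical minimal patent record; real office data has many more fields."""
    number: str
    year: int
    assignee: str
    cpc: str       # main CPC class, e.g. "G06F"
    abstract: str


def collect():
    # Stand-in for the collection stage (API calls or database queries).
    # The records below are made up for illustration.
    return [
        Patent("P1", 2018, "Acme Corp", "G06F", "Method for compiling source code"),
        Patent("P2", 2019, "Acme Corp", "G06F", "System for testing software modules"),
        Patent("P3", 2019, "Beta Ltd", "H04L", "Protocol for secure packet routing"),
        Patent("P4", 2021, "Beta Ltd", "G06F", "Software deployment pipeline"),
    ]


def preprocess_and_filter(patents, cpc_prefix, first_year):
    # Keep only the records that match the research criteria.
    return [p for p in patents if p.cpc.startswith(cpc_prefix) and p.year >= first_year]


def analyze(patents):
    # Simple descriptive statistics: patents per assignee and per year.
    return {
        "per_assignee": Counter(p.assignee for p in patents),
        "per_year": Counter(p.year for p in patents),
    }


selected = preprocess_and_filter(collect(), cpc_prefix="G06F", first_year=2019)
summary = analyze(selected)
print(summary["per_assignee"])
print(summary["per_year"])
```

Even in this toy form, each stage requires decisions (which fields to keep, which classes and years to include) that a user without a programming background would need a tool to make on their behalf.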
Hence, in recent years, tools that automate the process of PA have emerged and have been utilized within organizations [4], due to the excessive volume of patent documents and the inherent complexity in analyzing them. These tools frequently offer the possibility of identifying and collecting related documents, filtering them based on established criteria and applying PA methodologies. Some of these tools are also offered for advanced scientific purposes and enable researchers from multiple disciplines to overcome the obstacles of PA and easily process patent entries.
However, while PA tools do exist and are in use, very few of them are available as free, accessible and open-source solutions, with the majority of tools being either proprietary or requiring payment after a short free trial. In addition, the existing open-source PA tools are somewhat complex to navigate, requiring a certain level of scientific knowledge. Thus, the lack of a flexible, open-source and public PA tool that can cater to the needs of multiple target groups for research purposes is a clear gap in the domain of PA software. Particularly in recent years, and even more so during the COVID-19 pandemic, the programming community has greatly encouraged the principles of open science [7,8] and scientific software development [9,10]. These two concepts combine the need for transparency and openness in all scientific domains with the creation of accessible software that processes and analyzes data using scientific concepts, is used primarily for research and moves science forward.

2. Patent Analysis Literature

Descriptive/Exploratory Analytics: Several studies have leveraged descriptive statistics to portray the temporal, geographical or technological development of patents in various fields. The results of these studies are either descriptive information about patents (e.g., most prominent organizations) or insights from multivariate methods that explain the relationships between multiple variables. Ardito et al. [11][13] focus on the IoT domain and explore its trends and dynamics on a country and assignee level, pinpointing the USA and China as prominent countries and Huawei and Qualcomm as the main assignees. Fujii et al. [12][14] and Tseng and Ting [13][15] explore the AI domain with knowledge-based methodologies and discover the main technologies and investors in AI trends. In the context of software engineering, Georgiou et al. [14][16] perform a large-scale analysis on patents from the USPTO to discover the geographical, organizational and technological distributions. Similar analyses have also been conducted in the fields of low-carbon technologies [15][17], RFID concepts [16][18], augmented reality [17][19], nanoscience [18][20] and photovoltaics [19][21], indicating that PA as a practice can be efficiently used in multiple application domains and yield practical results. Additional studies have also attempted to combine the use of PA with bibliometrics, enhancing the insights of PA with knowledge derived from the research literature and bibliometric indicators [20][21][22][22,23,24].
Topic Modeling: Apart from leveraging descriptive statistics and exploratory analysis on patent data, several studies have employed algorithms on patent data that extract topics and thematic axes, pinpointing promising technologies and objectives. Among them, the Latent Dirichlet Allocation (LDA) algorithm, proposed by Blei et al. [23][25], is by far the most popular when it comes to extracting topics in PA.
Due to its efficiency in extracting topics from textual information, LDA has been widely employed in many fields, including vehicular technologies [24][25][26,27], where Zhang et al. [25][27] leveraged a variation of LDA, namely the structural topic modeling (STM) algorithm [26][28], which has also been employed in [27][29] for the profiling of hydrogen technologies. Other fields include smart manufacturing [28][30], sustainable city development [29][31], data-oriented software [30][32] and telecommunication patents [31][33], with the latter reviewing assignee hotspots, based on the extracted topics. Hotspots are particularly important as they emphasize prime investors and technologies and they have also been investigated in a plethora of studies [32][33][34][35][34,35,36,37]. Patent roadmaps, which comprise emerging or trending technologies that pave the road for future patent applications, are also an important part of topic modeling studies. Kim et al. [36][38] propose a patent development map with a case study in 3D printing, using LDA, while Ma et al. [37][39] apply the same process in solar cell technologies. Zhang et al. [38][40] explore the Blockchain sector to assess technological maturity and forecast trending topics, while a large case study of patents in Australia [39][41] presents a methodology with semantic information that estimates development for specific topics, with a tailored case study. Finally, Kim et al. [40][42] leverage CPC clusters in telemedicine patents to evaluate the development of the field. It should be mentioned that topic modeling has also been employed in studies that explore the profiles of firms, along with their knowledge portfolios [41][43], and the identification of disruptive technologies that may alter the structure of the market [42][44], with a case study on photovoltaics.
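For orientation, the generative model that underlies LDA can be summarized as follows. The notation is the one commonly used in textbook presentations and is an assumption here, not taken from the studies cited above: K is the number of topics, D the number of patent documents, N_d the number of words in document d, and α and β are Dirichlet hyperparameters.

```latex
\begin{align*}
\phi_k &\sim \mathrm{Dirichlet}(\beta), & k &= 1,\dots,K \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha), & d &= 1,\dots,D \\
z_{d,n} \mid \theta_d &\sim \mathrm{Categorical}(\theta_d), & n &= 1,\dots,N_d \\
w_{d,n} \mid z_{d,n}, \phi &\sim \mathrm{Categorical}(\phi_{z_{d,n}})
\end{align*}
```

Here φ_k is the word distribution of topic k, θ_d is the topic mixture of document d, z_{d,n} is the topic assigned to the n-th word of document d and w_{d,n} is the observed word. Fitting the model to patent abstracts or claims means inferring φ and θ from the observed words, which yields the topics and the per-document topic proportions that the studies above interpret.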
Citation Networks: Patent citation networks have also been proven to be highly important, based on the related literature, as they portray the interrelations between patent records and uncover the most influential patents or technologies. The most common types of citation networks are the patent-to-patent network, which examines the citations between different patents, and the CPC-to-CPC network, which examines the citations between different patent classes. Patent citation networks have been found to be important indicators in the timely identification of notable patents [43][45], while their use contributes to the mapping of technological research and discovering deeper connections between different domains [44][46]. Patent citation analysis has been employed in multiple sectors to find prominent assignees and organizations, technologies and patent entries, including but not limited to vehicle batteries [45][47], mobile technologies [46][48], agricultural and natural case studies [47][48][49][50][51][52][49,50,51,52,53,54], printed electronics [53][55] and nanotechnology [54][56]. The diffusion of information in patent citation networks has also been studied [55][56][57,58], along with the identification of emerging technologies, their lifecycles [57][58][59][59,60,61] and the concept of open innovation [60][62] and whether it is reflected in patent citations. Technological trajectories are also an aspect that is investigated in patent citations, which can be translated into the forecasting of the evolution of an emerging technology or an established practice based on its status in a citation network. This concept has been studied in patents regarding communication standards and energy devices [61][62][63,64], fuel cell research [63][65] and Blockchain [64][66]. 
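The relationship between the two network types can be sketched in a few lines. The example below is illustrative only: the patent identifiers, CPC assignments and citation pairs are hypothetical, and each patent is assumed to have a single main CPC class, which simplifies real patent records.

```python
from collections import Counter

# Hypothetical data: main CPC class per patent and patent-to-patent
# citations given as (citing, cited) pairs.
cpc_of = {"P1": "G06F", "P2": "G06F", "P3": "H04L", "P4": "G06N"}
citations = [("P2", "P1"), ("P3", "P1"), ("P4", "P1"), ("P4", "P3"), ("P3", "P2")]

# Patent-to-patent view: the number of times a patent is cited (in-degree)
# is a simple indicator of its influence.
times_cited = Counter(cited for _, cited in citations)

# CPC-to-CPC view: map each citation to the classes of its two endpoints
# and count how often each class pair occurs.
cpc_edges = Counter((cpc_of[citing], cpc_of[cited]) for citing, cited in citations)

most_cited, count = times_cited.most_common(1)[0]
print(most_cited, count)
for (src, dst), weight in sorted(cpc_edges.items()):
    print(f"{src} -> {dst}: {weight}")
```

The weighted class pairs form the edges of the CPC-to-CPC network, so the same citation data supports both the patent-level and the technology-level analyses described above.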
Finally, several studies focus on assignees along with their associated technologies and their status on patent networks as a sign of competitive advantage, inventive prowess and the largest market share [65][66][67,68]. As PA has multiple applications, some studies have also proposed new approaches to exploring patent citations. More specifically, Hu et al. [67][69] introduce ego citation networks as an alternative means of exploring the citation of patents coupled with bibliographic references. Yang et al. [68][70] construct a comprehensive patent citation network leveraging direct, indirect, coupling and co-citation metrics, while Chakraborty et al. [69][71] use exponential random graph models to incorporate social parameters into a patent citation network. Finally, brokerage analysis [70][72], which exploits triadic relationships, has also been used in patent-to-patent networks [30][55][71][32,57,73].
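Two of the indirect measures mentioned above, bibliographic coupling and co-citation, have simple standard definitions that a short sketch can make concrete. The code below is not the method of Yang et al. or of any other cited study: it only applies the textbook definitions to hypothetical citation pairs, and the helper name `pair_strengths` is made up for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical patent-to-patent citations as (citing, cited) pairs.
citations = [("P3", "P1"), ("P3", "P2"), ("P4", "P1"), ("P4", "P2"), ("P5", "P2")]

cites = defaultdict(set)     # citing patent -> patents it cites
cited_by = defaultdict(set)  # cited patent  -> patents citing it
for citing, cited in citations:
    cites[citing].add(cited)
    cited_by[cited].add(citing)


def pair_strengths(neighbours):
    # For every pair of patents, count the neighbours they have in common.
    strengths = {}
    for a, b in combinations(sorted(neighbours), 2):
        shared = len(neighbours[a] & neighbours[b])
        if shared:
            strengths[(a, b)] = shared
    return strengths


# Bibliographic coupling: two patents cite the same earlier patents.
coupling = pair_strengths(cites)
# Co-citation: two patents are cited together by the same later patents.
co_citation = pair_strengths(cited_by)

print(coupling)
print(co_citation)
```

Coupling links patents that build on the same prior art, whereas co-citation links patents that later inventions treat as related, which is why combining them with direct citations gives a fuller picture of a network.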

3. Patent Analysis Tools

As mentioned in the Introduction, there are several PA tools that allow the processing of patent records, which are widely used by enterprises and organizations. In Table 1, basic information about the most popular PA tools is presented, highlighting their key characteristics and operations. An inspection of the table reveals that the majority of the tools are, indeed, proprietary and owned by large organizations (e.g., PatSeer, Derwent Innovation, Orbit Intelligence), with most of them providing access to millions of patent records from multiple offices. However, their proprietary nature means that most of them do not offer a free trial (or do so only upon request) and typically require a subscription for their services. In addition, most of the proprietary tools focus on providing business indicators for patent growth (e.g., portfolio quality, investment value), which are often based on AI methodologies, while some of them also provide topic modeling or citation analysis functionalities.
Apart from proprietary PA tools, there are also several public tools that act as either PA suites or patent search databases. Among them, Patent2Net [72][74] is an educational suite that leverages data from EPO and focuses on citation networks and clustering. The suite also provides an interface [73][75] that allows users to explore its capabilities and export results in various graph formats. The main target groups of Patent2Net are the educational and scientific communities [72][74], while PatentInspector strives to include more target groups, such as industrial investors, developers, inexperienced researchers and HR representatives. UnifiedPatents is another partially public PA suite that mainly focuses on business indicators and differs from PatentInspector in that it is primarily aimed at business owners and economists. The portal provides an intuitive interface and companies with smaller revenue can use it for free, although larger companies must pay for access. Finally, PatentMiner [74][76] is a notable effort that was undertaken before PatentInspector and provided an interface that executed advanced PA with topic modeling.
Table 1. Prominent patent analysis tools.
| Tool Name | Sources | Patent Searching | Semantic Analysis | Topic Modeling | Citation Networks | Business Indicators | Descriptive Statistics/Exploratory Analysis | Public | Proprietary | Free Trial |
|---|---|---|---|---|---|---|---|---|---|---|
| TopicTracker [75][77] | - | - | Yes | Yes | No | No | No | No | No | No |
| TechSpectrogram [76][78] | PatStat | No | No | Partial | Yes | No | No | No | No | No |
| PatentMiner [74][76] | USPTO, JPO, DPMA, IPO, CPD | Yes | Yes | Yes | Yes | No | No | Yes | No | No |
| Patent2Net [72][73][74,75] | EPO | Yes | No | No | Yes | No | Yes | Yes | No | No |
| PatSeer [77][79] | Multiple | Yes | No | No | No | Yes | Yes | No | Yes | No |
| Derwent [78][80] | Multiple | Yes | Yes | No | - | Yes | Yes | No | Yes | No |
| Orbit [79][81] | Multiple | Yes | Yes | No | - | No | Yes | No | Yes | No |
| IamIP [80][82] | Multiple | Yes | No | Yes | No | Yes | Yes | No | Yes | No |
| IPRally [81][83] | Multiple | Yes | Yes | No | No | Yes | No | No | Yes | Yes |
| PatBase [82][84] | Multiple | Yes | Yes | No | Yes | Yes | Yes | No | Yes | On Request |
| UnifiedPatents [83][85] | Multiple | Yes | Yes | No | No | Yes | Yes | Yes | No | Yes |
| SciTech Patent Art [84][86] | Multiple | Yes | No | Yes | No | Yes | Yes | No | Yes | On Request |
| Tradespace [85][87] | Multiple | Yes | Yes | No | No | Yes | No | No | Yes | On Request |
| AcclaimIP [86][88] | Multiple | Yes | No | No | Yes | Yes | Yes | No | Yes | Yes |
| Innography [87][89] | Multiple | Yes | Yes | No | Yes | Yes | Yes | No | Yes | No |
| IPLytics [88][90] | Multiple | Yes | Yes | No | No | Yes | Yes | No | Yes | On Request |
| Minesoft Origin [89][91] | Multiple | Yes | Yes | No | No | No | No | No | Yes | On Request |
| Octimine [90][92] | Multiple | Yes | Yes | No | No | Yes | Yes | No | Yes | On Request |
| Patent Inspiration [91][93] | Multiple | Yes | Yes | No | Yes | Yes | Yes | No | Yes | On Request |
| PatentSight [92][94] | Multiple | Yes | Yes | Yes | No | Yes | Yes | No | Yes | On Request |
| PatSnap [93][95] | Multiple | Yes | Yes | No | No | Yes | Yes | No | Yes | Yes |
| PatentInsight [94][96] | Multiple | Yes | Yes | Partial | Partial | Yes | Yes | No | Yes | On Request |
| PQAI [95][97] | Multiple | Yes | Yes | No | No | No | No | No | No | Yes |
| PatZilla [96][98] | EPO (mainly) | Yes | No | No | No | No | No | Yes | No | No |
| Google Patents [97][99] | Multiple | Yes | No | Partial | No | No | Yes | Yes | No | No |
| FreePatents [98][100] | USPTO, EPO, JPO, WIPO | Yes | No | No | No | No | No | Yes | No | No |
| Relucura (TechTraker, TechExplorer, Enterprise Web tool) [99][101] | Multiple | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes | No |
| The Lens [100][102] | Multiple | Yes | No | No | Partial | No | Yes | Yes | Yes | No |
| PatentR [101][103] | Multiple | No | No | No | No | No | Yes | Yes | No | Yes |
| Sumobrain [102][104] | Multiple | Yes | No | No | No | No | No | Yes | No | Yes |
| PatentAnalyzer [103][105] | Multiple | Yes | Yes | No | Yes | No | No | No | Yes | On Request |
| Patexia PatentAnalyzer [104][106] | Multiple | Yes | Yes | No | No | Yes | Yes | No | Yes | On Request |
| PatentInspector | USPTO | Yes 1 | No | Yes | Yes | No | Yes | Yes | No | Yes |
The remaining free PA tools (PatZilla, FreePatents and Google Patents) are not PA tools in the typical sense, as they mainly provide advanced search engines for the retrieval of patent documents. Thus, their PA capabilities are minimal and they cannot be considered similar to PatentInspector, which employs established scientific concepts and targets all types of users. Google Patents [97][99] in particular stands as one of the most popular patent search engines, encompassing data from multiple patent offices and offering limited descriptive information (e.g., top inventors, top organizations). The analysis of PA tools and suites reveals that, as stated in the Introduction, while there is a plethora of such tools in the market and in software repositories, few of them are suitable for users with limited coding or scientific backgrounds. PatentInspector emerges to cover this deficit, drawing on data from the USPTO while also offering different methodologies, efficient visualizations and interpretable insights. In addition, PatentInspector introduces a novel perspective of PA for both mainstream users and more advanced parties by including topic modeling methodologies that can profile the thematic axes of patent documents and aid users in making informed decisions.