ABSTRACT
With the increasing demand for finer granularity in patent analysis, some scholars have started to explore the application of SAO (Subject-Action-Object) triples to provide systematic descriptions of “problem-solution” in patent texts at a granular level. To investigate the progress in this field of research and assist scholars in identifying research topics and important research teams, this study utilizes the Web of Science database to retrieve academic papers in this domain. The study applies the BERTopic model and Gephi software to analyze the evolution of research topics and core research institutions in this field. The findings of this study reveal that the overall research in this field is in its preliminary exploration stage. Currently, scholars primarily focus on the application of SAO at the methodological level, and a stable core research group has not yet formed in this field. Future research should consider incorporating semantic associations between the Subject (S) and Object (O) and expand the application of SAO triples to the theoretical level. Moreover, research teams in this field should actively seek collaboration and exchange to promote the advancement of SAO application research in patent analysis.
INTRODUCTION
Patent literature systematically carries technological innovation knowledge, and it helps people understand and master technologies by providing systematic descriptions, thereby serving production and daily life.[1] Patent analysis has become an important tool for government, enterprises, and other sectors in conducting R&D management. Currently, patent analysis methods can be broadly classified into three categories: expert-based analysis methods,[2] bibliometric-based analysis methods,[3,4] and emerging information technology-based analysis methods utilizing deep learning, natural language processing, and others.[5–7] Expert-based analysis methods suffer from subjectivity and high costs in terms of manpower and resources. Bibliometric-based analysis methods partially overcome the issue of subjectivity but are limited to the level of patent literature analysis. With advancements in natural language processing, deep learning, and other technologies, the granularity of patent analysis has started to delve into the textual level. However, the explanatory power of the aforementioned technological methods regarding technical problems and solutions still requires further improvement.
In response to the aforementioned issues, some scholars have started to explore the application of the SAO (Subject-Action-Object) structure to reflect the “problem-solution” relationship in technological innovation.[8] This approach aims to provide a comprehensive and granular description of technical solutions in patent literature for further analysis in technology intelligence. Currently, the application of SAO in patent analysis has achieved promising research results,[8–10] further validating its applicability in patent analysis.
Based on the analysis above, this study uses bibliometric methods and retrieves, from the Web of Science (WOS) database, academic papers on patent analysis that apply SAO at the methodological level. The objective is to explore the research development trends in this field, identify research topics, and understand the evolution of research institutions. By conducting an evolutionary analysis of research themes and core research institutions within the field, this study aims to provide a reference for scholars interested in tracking research dynamics and promoting scientific collaboration in this area.
LITERATURE REVIEW
The SAO (Subject-Action-Object) triplet is a textual structure used to describe lexical relationships, where the Subject and Object components are connected through the Action component, thereby reflecting semantic logical relationships such as subordination and association between the two parts. As patent analysis delves deeper into the textual level, keyword-based analysis has been found to be deficient in lexical information and inadequate in reflecting the technical content of patent texts.[11] Some scholars have attempted to address these limitations by employing embedded text processing methods, transforming patent texts into embedded text vectors to represent the technical content of patents. Although these text vectors capture contextual relationships between words and provide richer semantic information, they do not clearly reflect the logical relationships between lexical items.[12] In light of these considerations, the SAO structure, which can clearly delineate the logical relationships between lexical items, has gradually been adopted in patent text analysis.
In the theoretical framework, the core of the TRIZ (Theory of Inventive Problem Solving) analysis framework is problem identification and resolution.[2] The SAO structure, which reflects lexical logical relationships, can effectively identify problems and solutions in the Subject (S) and Object (O) components. This makes SAO well-suited for TRIZ analysis, leading to scholarly research in this area. Moreover, SAO has been widely used due to its ability to reveal richer semantic information, with applications in fields such as new energy vehicles,[13] graphene,[14] and finance.[15] In specific research designs, some scholars have used SAO to represent key technological components in patent text analysis.[1] This has enabled studies in disruptive technology identification,[16] technological opportunity discovery,[17–19] and patent infringement determination,[9,20] enhancing the depth of patent text understanding. Additionally, some researchers have constructed semantic networks for patent text analysis, using the Action (A) component of SAO as the connecting edge and the Subject (S) and Object (O) components as nodes.[17,21,22] This approach helps to explore logical associations between technological contents in patent texts.
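The network construction described above can be illustrated with a minimal sketch. It assumes the SAO triples have already been extracted; the `SAO` type, the `build_sao_network` helper, and the sample triples are hypothetical and are not taken from any of the cited studies.

```python
from collections import defaultdict
from typing import NamedTuple


class SAO(NamedTuple):
    """One Subject-Action-Object triple extracted from a patent sentence."""
    subject: str
    action: str
    obj: str


def build_sao_network(triples):
    """Map each (subject, object) node pair to the set of actions linking them."""
    edges = defaultdict(set)
    for t in triples:
        edges[(t.subject, t.obj)].add(t.action)
    return dict(edges)


# Hypothetical triples, invented for illustration only.
triples = [
    SAO("electrode", "stores", "charge"),
    SAO("coating", "protects", "electrode"),
    SAO("coating", "covers", "electrode"),
]

network = build_sao_network(triples)
# Every Subject or Object that appears in a triple becomes a node.
nodes = {n for pair in network for n in pair}
```

Here the Subject and Object terms become nodes and each Action labels a directed edge, so two triples that share a node pair are merged into one edge carrying several actions.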
A review of the literature reveals that SAO has been extensively applied as a semantic analysis tool in the field of patent text analysis. Given the multidisciplinary and multi-methodological nature of research in this domain, studies employing SAO for patent text analysis are characterized by their complexity and diversity. This complexity may pose challenges for scholars who are new to the field, making it difficult for them to adapt to the multifaceted research landscape. Moreover, the increasing diversity of knowledge requirements in this area suggests that individual scholars or institutions may find it increasingly challenging to meet these demands on their own. Therefore, this study aims to systematically identify the primary research themes and core research institutions currently engaged in SAO-based patent analysis. By doing so, it provides scholars with a clear understanding of the prevailing research trends and the developmental trajectories of key institutions. This, in turn, enables researchers to more rapidly identify their own research positioning and reference strategies, and to swiftly select appropriate research partners. Additionally, the identification of hot topics and core research institutions in this study can assist science and technology management authorities in clarifying research directions, enhancing guidance, and efficiently allocating scientific research resources. Ultimately, these efforts will contribute to the development of the academic community in this field.
METHODOLOGY
First, this study designs a search query to retrieve academic papers in this domain from a bibliographic database. The abstract texts, keywords, publication dates, and other bibliographic information are obtained as the primary data source. Subsequently, the temporal changes in publication volume, the evolution of research topics, and the evolution of core research institutions are analyzed separately. The research framework of this study is illustrated in Figure 1:

Figure 1:
Research framework.
For the diachronic analysis of published papers, this study will use the obtained publication dates of the papers to create a bar-line mixed chart. By utilizing this chart, the changes in the publication volume can be analyzed, providing insights into the research development in this field. Additionally, based on the analysis results in this section, the study will determine the rules for dividing time windows, which will serve as a basis for subsequent evolutionary analysis.
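The two series behind such a bar-line chart (annual counts for the bars, cumulative counts for the line) could be derived as in the following sketch. The function name and the sample years are assumptions made for illustration and are not the study's data.

```python
from collections import Counter
from itertools import accumulate


def publication_series(years):
    """Return (sorted years, annual counts, cumulative counts)."""
    counts = Counter(years)
    ordered = sorted(counts)
    annual = [counts[y] for y in ordered]
    cumulative = list(accumulate(annual))
    return ordered, annual, cumulative


# Hypothetical publication years, not the study's data.
sample_years = [2011, 2011, 2012, 2015, 2015, 2015]
ordered, annual, cumulative = publication_series(sample_years)
```

Years without any publication are simply absent from the output in this sketch; a plotting step would need to fill them with zeros if a continuous time axis is wanted.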
Considering the limitations of author keywords, which may be incomplete and subjective,[23] this study intends to use topic modeling to extract hidden research topics from the abstracts of the papers for analysis. Currently, both the LDA (Latent Dirichlet Allocation) model and the BERTopic model have been widely used to explore the patterns of topic evolution. However, existing research suggests that the topics extracted by the BERTopic model have better semantic readability compared to the LDA model.[24] Therefore, in this study, the BERTopic model will be applied to the abstract texts to perform topic modeling, aiming to obtain more objective and comprehensive representations of research topics and reveal their evolution patterns. This approach will help scholars in this field to understand the development trajectory of research by organizing and analyzing the research topics more effectively.
Existing research indicates that larger research teams are more capable of advancing knowledge in a discipline.[25] Therefore, analyzing research institutions at the organizational level is more conducive to exploring the contributions of research teams to the field, compared to analyzing individual scholars. Based on this notion, this study constructs an institution co-occurrence matrix and forms a collaborative network among institutions in the field, using the co-occurrence relationships of research institutions in the bibliographic information of the papers. This analysis aims to reveal the core institutions engaged in research in this field and their evolution patterns. It will help researchers quickly identify leading research teams to track research hotspots and engage in collaborative research in the future.
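As a rough illustration of the co-occurrence step, the sketch below counts how often two institutions appear on the same paper. It assumes each paper is already represented as a list of institution names; the helper name and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def institution_cooccurrence(papers):
    """Count how often each unordered pair of institutions shares a paper."""
    pairs = Counter()
    for institutions in papers:
        unique = sorted(set(institutions))  # ignore repeats within one paper
        pairs.update(combinations(unique, 2))
    return pairs


# Hypothetical affiliation lists, one per paper.
papers = [
    ["Institution A", "Institution B"],
    ["Institution A", "Institution B", "Institution C"],
    ["Institution C"],
]
cooccurrence = institution_cooccurrence(papers)
```

The resulting pair counts are a sparse form of the co-occurrence matrix: each key is an edge of the collaboration network and each value is its weight.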
RESULTS
Diachronic analysis of publishing articles
This study retrieves the required literature data from the WOS (Web of Science) database. The search query used in this study is as follows: TS=’patent’ AND (TS=’SAO’ OR TS=’subject-action-object’). The search was conducted on June 8, 2024. To ensure the quality and representativeness of the literature, the data source was limited to the Web of Science Core Collection, and the document type was set as ‘article’. After excluding irrelevant articles, a total of 163 articles and their bibliographic information were retrieved. The publication volume for each year is shown in Figure 2. In the figure, the bars represent the number of publications in each year, while the line represents the cumulative publication count.

Figure 2:
Number of publications.
Based on the graph, it can be observed that the publication volume in this field shows a wave-like upward trend. From 1995 to 2010, the annual growth rate was steady, with an increase of 1-6 publications per year. In accordance with the patterns of literature growth, during the nascent stages of a discipline, the number of publications tends to be relatively low and exhibits a slow upward trend. This is primarily due to the immaturity of theoretical frameworks and the instability of research paradigms, which collectively act as limiting factors. This situation may reflect the early exploration stage of research in this field. In 2011 and 2012, there was a significant increase in the annual publication volume, followed by a slightly slower growth rate. This period may indicate a time when new research directions emerged and were being understood and discussed by the academic community. After 2015, the annual publication volume stabilized at six or more papers, indicating that research in this field entered a new stage.
It is important to note that the papers published in 2024, as detected in this study, refer to those published until June 8, 2024. Considering the previous publication trends, it is likely that the annual publication volume will remain at a high level for the entire year.
Based on the above analysis, this study preliminarily sets the time windows for analysis as follows: the initial exploration period (Stage I: 1995-2010), the platform breakthrough period (Stage II: 2011-2014), and the flourishing development period (Stage III: 2015-2024). The subsequent evolutionary analysis will be conducted based on these time window divisions.
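The time-window rule can be written down compactly. The sketch below uses the stage boundaries stated above; the `STAGES` mapping and the `stage_of` function are illustrative names, not part of the study.

```python
# Inclusive year ranges for the three stages defined in this study.
STAGES = {
    "Stage I": (1995, 2010),
    "Stage II": (2011, 2014),
    "Stage III": (2015, 2024),
}


def stage_of(year):
    """Return the stage label whose inclusive year range contains `year`."""
    for label, (start, end) in STAGES.items():
        if start <= year <= end:
            return label
    return None  # outside the study period
```

For example, `stage_of(2012)` returns `"Stage II"`, and a year outside 1995-2024 returns `None`.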
Topic Evolution Analysis
This study extracts the abstract texts of the retrieved papers as the data source for topic evolution analysis. First, the original abstract texts are preprocessed using the NLTK (Natural Language Toolkit) library in Python. The preprocessing steps include tokenization, stop word removal, part-of-speech (POS)-based lemmatization, and lowercasing.
Next, the BERTopic model is trained in stages to perform topic modeling on the preprocessed texts and obtain research topics for each stage. This study uses the default all-MiniLM-L6-v2 English document model for document embedding and employs UMAP for dimensionality reduction. The clustering method for topic word clustering is set to the default HDBSCAN clustering. Subsequently, candidate topic words are selected using c-TF-IDF. For each stage, two topics are obtained, and only the top 10 keywords with the highest probability associations are retained for each topic. The topics and corresponding keywords for each stage are presented in Table 1.
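To make the keyword-selection step more concrete, the following is a simplified, standard-library sketch of class-based TF-IDF. It assumes the formulation given in the BERTopic documentation, in which a term's frequency within a topic is multiplied by log(1 + A / f), where A is the average number of tokens per topic and f is the term's frequency across all topics. The library's own implementation differs in details such as normalization, and the function names and toy tokens here are hypothetical.

```python
import math
from collections import Counter


def c_tf_idf(class_tokens):
    """Score terms per class: tf(t, c) * log(1 + A / f(t)).

    class_tokens maps a topic label to the tokens of all its documents.
    A is the average number of tokens per class; f(t) is the total
    frequency of term t across all classes.
    """
    tf = {c: Counter(tokens) for c, tokens in class_tokens.items()}
    total = Counter()
    for counts in tf.values():
        total.update(counts)
    avg_len = sum(total.values()) / len(class_tokens)
    return {
        c: {t: n * math.log(1 + avg_len / total[t]) for t, n in counts.items()}
        for c, counts in tf.items()
    }


def top_terms(scores, k=10):
    """Return the k highest-scoring terms per class (ties broken alphabetically)."""
    return {
        c: [t for t, _ in sorted(s.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for c, s in scores.items()
    }


# Hypothetical toy tokens, not the study's abstracts.
toy = {
    "topic_a": ["patent", "sao", "patent", "technology"],
    "topic_b": ["oxygen", "sao", "patient", "oxygen"],
}
scores = c_tf_idf(toy)
keywords = top_terms(scores, k=2)
```

In this toy input the term "sao" occurs in both topics, so it is down-weighted relative to terms that are specific to one topic and does not make either top-2 list.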

Table 1:
Research topics and corresponding keywords for each stage.
Stage 1, Topic 1 | Stage 1, Topic 2 | Stage 2, Topic 1 | Stage 2, Topic 2 | Stage 3, Topic 1 | Stage 3, Topic 2 |
---|---|---|---|---|---|
patient | patent | patent | patient | technology | group |
pfo | study | technology | minute | patent | patient |
sao | brazil | technological | cerebral | sao | study |
oxygen | human | use | arterial | method | vancomycin |
arterial | drug | propose | oxygen | use | pda |
pulmonary | use | sao | increase | analysis | preterm |
copd | inventor | paper | pda | structure | sao |
stent | plant | infringement | sao | identify | oxygen |
mean | development | approach | infant | propose | high |
level | technology | identify | saturation | technological | result |
From an overall perspective, the research topics can be broadly categorized into two main categories: the application of SAO in technical analysis and the application of SAO in clinical medical research.
Regarding the keywords associated with the topics, initially, SAO was introduced as a novel text analysis method in the field of patent technology analysis. As the research progressed, this method was further applied in areas such as technology infringement and technology entity recognition.
In the field of clinical medical research, the keywords point to studies of how different medications and oxygen supply affect patient symptoms, with the effectiveness of these treatments as the central concern.
Analysis of the evolution of core institutions
According to the previous rules for dividing the time windows, the retrieved literature bibliographic data will be organized based on their publication dates. For the field “Affiliation”, if the data includes specific college or department information, those will be grouped under their higher-level institutions. For example, the “School of Management, Peking University” will be organized as “Peking University”. Once the bibliographic data is organized, co-occurrence matrices will be constructed in stages based on the collaborative relationships among institutions. The Gephi software will be used to visualize the collaborative network among institutions. Subsequently, network characteristics and node degree centrality will be calculated to analyze the overall evolutionary patterns of the network and identify core institutions. The collaborative network among institutions can be found in Figure 3, network indicators for each stage are presented in Table 2, and the core institutions are listed in Table 3.
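The affiliation grouping rule can be sketched as follows. This minimal version assumes the parent institution is written last in a comma-separated affiliation string, as in the example above; real bibliographic records may order the parts differently or append a city and country, so the rule would need to be adapted. The function name is hypothetical.

```python
def top_level_institution(affiliation):
    """Keep only the last comma-separated part of an affiliation string.

    Assumes the parent institution is written last, as in
    "School of Management, Peking University".
    """
    parts = [p.strip() for p in affiliation.split(",") if p.strip()]
    return parts[-1] if parts else ""


normalized = top_level_institution("School of Management, Peking University")
```

Applying the same rule to every affiliation before building the co-occurrence matrix keeps departments of one university from appearing as separate nodes.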

Figure 3:
Collaboration network of institutions in various stages (a, b, and c represent Stage 1, Stage 2, and Stage 3).

Table 2:
Network indicators for each stage.
Indicator | Stage 1 | Stage 2 | Stage 3 |
---|---|---|---|
Number of nodes | 44 | 50 | 141 |
Number of edges | 44 | 70 | 293 |
Diameter | 1 | 2 | 4 |
Average path length | 1 | 1.146 | 1.896 |
Density | 0.047 | 0.057 | 0.030 |
Average clustering coefficient | 1 | 0.986 | 0.927 |

Table 3:
Core institutions for each stage.
Stage 1 institution | Degree | Stage 2 institution | Degree | Stage 3 institution | Degree |
---|---|---|---|---|---|
Royal Brompton Hospital | 3 | Chinese University of Hong Kong | 7 | KU Leuven | 20 |
University of Wisconsin System | 3 | Peking Union Medical College | 4 | Leiden University | 19 |
University of Wisconsin Madison | 3 | Western University (University of Western Ontario) | 4 | Centro Hospitalar de Lisboa Ocidental | 16 |
Imperial College London | 3 | Chinese Academy of Medical Sciences – Peking Union Medical College | 4 | University Hospital Leuven | 14 |
Purdue University | 3 | Capital Medical University | 4 | Beijing Institute of Technology | 13 |
In Figure 3, the nodes represent research institutions involved in the field, with node size indicating their degree of centrality. Research institutions with a higher degree of centrality have greater influence within the collaborative network and can be considered leaders in the field.
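A node's degree, as reported for the core institutions, is the number of distinct partners it collaborates with. The sketch below computes it from an edge list and ranks the nodes; the helper names and the sample edges are hypothetical, and a normalized degree centrality would additionally divide each degree by the number of other nodes.

```python
from collections import defaultdict


def degrees(edges):
    """Return each node's degree: the number of distinct collaborators."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:  # skip self-loops
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {node: len(adj) for node, adj in neighbours.items()}


def top_institutions(edges, k=5):
    """Rank nodes by degree, highest first; ties broken alphabetically."""
    ranked = sorted(degrees(edges).items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]


# Hypothetical collaboration edges.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
ranking = top_institutions(edges, k=2)
```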
In terms of overall network characteristics, the increasing number of nodes and edges indicates that a growing number of institutions are conducting research using SAO in patent analysis and that collaborative relationships have become more diverse. The growth of network diameter and average path length suggests that any two research institutions in the collaborative network increasingly need to rely on intermediate partners to reach each other. Network density rose slightly from 0.047 in Stage 1 to 0.057 in Stage 2 before falling to 0.030 in Stage 3, while the average clustering coefficient declined from 1 to 0.927. Taken together, these trends point to the low connectivity of the collaboration network in this field.
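The density values in Table 2 can be checked from the reported node and edge counts. The sketch below assumes the collaboration network is treated as a simple undirected graph, for which density is 2E / (N(N - 1)); under that assumption the formula reproduces the reported figures. The function name is illustrative.

```python
def undirected_density(num_nodes, num_edges):
    """Density of a simple undirected graph: 2E / (N * (N - 1))."""
    if num_nodes < 2:
        return 0.0
    return 2 * num_edges / (num_nodes * (num_nodes - 1))


# Node and edge counts as reported in Table 2.
stage_counts = {"Stage 1": (44, 44), "Stage 2": (50, 70), "Stage 3": (141, 293)}
densities = {
    stage: round(undirected_density(n, e), 3)
    for stage, (n, e) in stage_counts.items()
}
```

Because the number of possible pairs grows quadratically with the number of nodes, the Stage 3 network has the lowest density even though it has by far the most edges.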
Regarding the highly central core institutions, in Stage 1 and Stage 2, the top five core institutions did not exhibit a significantly high degree of centrality, and there were no noticeable differences among them. This may indicate that during these stages, the research in this field was still in the exploratory phase, and there were no prominent core research teams or consensus on key issues within the academic community. In Stage 3, the top five core institutions showed a significant increase in degree centrality compared to the previous stages, and there was greater variation in their centrality. This could be attributed to the inclusion of more research institutions, which elevated the degree of centrality of the core institutions. Additionally, as research in the field progressed, a relatively stable set of core research institutions gradually emerged within the collaborative network.
DISCUSSION
This study conducted a bibliometric analysis based on the relevant literature on the application of SAO in patent analysis published in the WOS database from 1995 to 2024. Overall, the application of SAO in patent analysis is still in its early exploratory stage. The publication volume is relatively low, and the research topics are relatively fixed, indicating that a stable research community has not yet been formed.
From the evolution of research topics, currently, the main focus of SAO in patent analysis is its methodological application. Scholars have started to explore the use of SAO as an important representation of technical elements in patent literature in research areas such as patent infringement and technology entity recognition. It is worth noting that some scholars engaged in semantic TRIZ research believe that the SAO structure can be used to describe the relationship between problems and solutions in functional analysis.[8,26] They argue that the technical content and action relationships implied by the S and O parts should be included in the analysis. Currently, some scholars have attempted to build semantic networks based on the semantic associations between S and O to analyze the development trends of technology[21] and identify technological opportunities.[17] They have integrated the consideration of semantic associations between S and O into the fine-grained patent text analysis based on SAO. Furthermore, scholars need to consider how to expand the application of SAO in patent analysis from the methodological level to the theoretical level. They should delve into the insights provided by the SAO triplets on the content of technical solutions, implementation paths, and research motivations,[27] to provide new research perspectives and deepen the theoretical contribution to patent analysis. Comprehending the evolution of research themes in this field enables scholars to grasp the shifts in research hotspots, thereby facilitating more effective research endeavors. Concurrently, relevant authorities can utilize insights into these shifts to rationally design research projects, which in turn guides the developmental trajectory of research within the domain in a more targeted manner.
According to the evolution of core institutions revealed in this study, there is currently no relatively stable core research institution or research community centered around core institutions in this field. The evolutionary pattern of an expanding research institution collaboration network with decreased connectivity indicates that the collaborative research network in this field is still in its initial formation stage, and the “small world” phenomenon is not significant. Research collaboration is crucial for generating new knowledge and improving research efficiency.[28] Given the relatively limited channels for research collaboration and communication in this field, the academic community may encounter challenges in effective communication and collaboration in subsequent research. This could potentially impede the conduct of collaborative research and the production of high-quality research outcomes. Research institutions should explore external connections such as academic conferences and industry-academia-research cooperation to enhance mutual trust and promote collaborative innovation.[4] For individual scholars, the evolutionary patterns of core research institutions can facilitate the rapid identification of dominant research teams within the field. This enables scholars to seek collaborations with these core institutions to advance their own research endeavors. For science and technology management departments, analyzing the changes in core research institutions helps to identify the developmental trends of the academic community in this field. This, in turn, supports the formulation of more scientific and precise science and technology management policies, thereby promoting the development of research in this area. 
To enhance collaborative research in this field, promote the development of academic communities, and elevate research standards, science and technology authorities should take the lead in establishing open innovation platforms that integrate industry, academia, and research institutions. By optimizing the top-level design of research projects and funding mechanisms, they can effectively encourage and guide collaborative research initiatives. Simultaneously, strengthening intellectual property protection and data privacy safeguards will ensure the sustainable, long-term development of research partnerships.
CONCLUSION
This study retrieves research literature on the application of SAO in patent analysis from the WOS database and, on that basis, conducts a diachronic analysis, a topic evolution analysis, and a core research institution evolution analysis of the field. The study found that the overall research in this field is in the preliminary exploration stage. Currently, scholars primarily focus on the application of SAO at the methodological level. The findings provide a reference for scholars engaged in research in this field, allowing them to track research progress and identify hot research topics and core research institutions. This is conducive to promoting collaborative research and improving research output efficiency in this field.
However, it should be acknowledged that there are some limitations in this study. The research is based on the retrieval of relevant literature on the application of SAO in patent analysis from the WOS database for bibliometric analysis. Further analysis targeting different disciplines and research topics was not conducted. Therefore, future research endeavors should aim to expand the scope of data sources for retrieval and conduct comparative analyses of the application of Subject-Action-Object (SAO) structures in patent analysis across different disciplinary fields. This approach will facilitate a more comprehensive and nuanced evolutionary analysis.
Cite this article:
Li Y. Evolutionary Analysis of the Application of SAO in Patent Analysis: A Bibliometric Study. J Scientometric Res. 2025;14(2):460-466.
References
- Puccetti G, Giordano V, Spada I, Chiarello F, Fantoni G. Technology identification from patent texts: A novel named entity recognition method. Technol Forecasting Soc Change. 2023;186:122160.
- Ilevbare IM, Probert D, Phaal R. A review of TRIZ, and its benefits and challenges in practice. Technovation. 2013;33(2-3):30-7.
- Momeni A, Rost K. Identification and monitoring of possible disruptive technologies by patent-development paths and topic modeling. Technol Forecasting Soc Change. 2016;104:16-29.
- Su Y, Yan Y. The influence of the two-tier network of a regional innovation system on knowledge emergence. J Knowl Manag. 2023;27(9):2526-47.
- Meng F, Yang S, Wang J, Xia L, Liu H. Creating knowledge graph of electric power equipment faults based on BERT–BiLSTM–CRF model. J Electr Eng Technol. 2022;17(4):2507-16.
- Proceedings of the world conference on intelligent and 3-D technologies (WCI3DT 2022). 2023:435-45.
- New opportunities for innovation breakthroughs for developing countries and emerging economies. 2019;Vol. 572.
- Yoon J, Kim K. Identifying rapidly evolving technological trends for R&D planning using SAO-based semantic patent networks. Scientometrics. 2011;88(1):213-28.
- Park H, Yoon J, Kim K. Identifying patent infringement using SAO based semantic technological similarities. Scientometrics. 2012;90(2):515-29.
- Yang C, Zhu D, Wang X, Zhang Y, Zhang G, Lu J, et al. Requirement-oriented core technological components’ identification based on SAO analysis. Scientometrics. 2017;112(3):1229-48.
- Khan MQ, Shahid A, Uddin MI, Roman M, Alharbi A, Alosaimi W, et al. Impact analysis of keyword extraction using contextual word embedding. PeerJ Comput Sci. 2022;8:e967.
- Zhang Y, Liu T, Li W. Corporate fraud detection based on linguistic readability vector: application to financial companies in China. Int Rev Financ Anal. 2024;95:103405.
- Hu R, Ma W, Lin W, Chen X, Zhong Z, Zeng C, et al. Technology topic identification and trend prediction of new energy vehicle using LDA modeling. Complexity. 2022;2022(1):9373911.
- Yang C, Zhu F, Zhang G. In: Uncertainty modelling in knowledge engineering and decision making. 2016;Vol. 10:155-61.
- Othman R, Noordin MF, Gusmita RH, Sembok TM, Zulkifli Z. SAO extraction on patent discovery system development for Islamic finance and banking. 6th International Conference on Information and Communication Technology for The Muslim World (ICT4M). 2016:59-63.
- Qiao Y, Wang X, Huang Y, Zhang S, Yang X. Tech mining approach for identifying potentially disruptive technologies: from the perspective of technological alternatives. IEEE Trans Eng Manage. 2024;71:5921-38.
- Han X, Zhu D, Wang X, Li J, Qiao Y. Technology opportunity analysis: combining SAO networks and link prediction. IEEE Trans Eng Manage. 2021;68(5):1288-98.
- Li X, Wu Y, Cheng H, Xie Q, Daim T. Identifying technology opportunity using SAO semantic mining and outlier detection method: A case of triboelectric nanogenerator technology. Technol Forecasting Soc Change. 2023;189:122353.
- Liu Z, Feng J, Uden L. Technology opportunity analysis using hierarchical semantic networks and dual link prediction. Technovation. 2023;128:102872.
- Kim S, Yoon B. Patent infringement analysis using a text mining technique based on SAO structure. Comput Ind. 2021;125:103379.
- Yang C, Huang C, Su J. An improved SAO network-based method for technology trend analysis: A case study of graphene. J Inf. 2018;12(1):271-86.
- Yoon B, Kim S, Kim S, Seol H. Doc2vec-based link prediction approach using SAO structures: application to patent network. Scientometrics. 2022;127(9):5385-414.
- Peset F, Garzón-Farinós F, González L, García-Massó X, Ferrer-Sapena A, Toca-Herrera J, et al. Survival analysis of author keywords: an application to the library and information sciences area. J Assoc Inf Sci Technol. 2020;71(4):462-73.
- Contreras K, Verbel G, Sanchez J, Sanchez-Galan JE. Using topic modelling for analyzing Panamanian parliamentary proceedings with neural and statistical methods. Proceedings of the 2022 IEEE 40th Central America and Panama Convention (CONCAPAN). 2022.
- Wu L, Wang D, Evans JA. Large teams develop and small teams disrupt science and technology. Nature. 2019;566(7744):378-82.
- Moehrle MG, Walter L, Geritz A, Müller S. Patent-based inventor profiles as a basis for human resource decisions in research and development. R D Manag. 2005;35(5):513-24.
- Porter AL. How “tech mining” can enhance R&D management. Res Technol Manag. 2007;50(2):15-20.
- Wagner CS, Whetsell TA, Mukherjee S. International research collaboration: novelty, conventionality, and atypicality in knowledge recombination. Res Policy. 2019;48(5):1260-70.