Drug-resistant tuberculosis (DR-TB) is a form of tuberculosis that is resistant to at least one of the standard first-line anti-tuberculosis drugs. DR-TB can occur when patients do not complete their full course of TB medication, leading to the development of drug resistance. Improved diagnostics and more effective treatments are urgently needed to address this global health challenge. This study uses bibliometric and text mining techniques to conduct a topical analysis of scientific publications on drug-resistant tuberculosis. Data were extracted from the Web of Science (WOS) Core Collection citation database from its inception until April 25, 2022, and were analyzed using Python and Microsoft Excel. The results revealed that scientific publications on drug-resistant tuberculosis have increased in recent years, with the majority consisting of articles and reviews. The USA, India, and South Africa account for the largest numbers of publications. Furthermore, publications on drug-resistant tuberculosis were concentrated in the following topics: Drug Resistance, Care, Treatment, Drug Activity, Patient, and Drug Dose Therapy Regimen. The findings show that interest in drug-resistant tuberculosis is increasing and that controlling its prevalence is becoming one of the key health priorities in the world.
Tuberculosis, caused by the bacterium Mycobacterium tuberculosis, is one of the ten leading causes of death worldwide, although it is treatable in most cases. Tuberculosis is primarily a lung disease; however, Mycobacterium tuberculosis can spread to other organs and cause an extra-pulmonary form. A combination of four drugs, rifampin, isoniazid, ethambutol, and pyrazinamide, is the standard regimen for treating tuberculosis patients. However, one of the main concerns in tuberculosis is the emergence of strains resistant to anti-tuberculosis drugs, which are among the world’s leading causes of death. Resistance began to emerge soon after the introduction of the world’s first anti-tuberculosis drug in 1943. Multidrug-Resistant Tuberculosis (MDR-TB) emerged following the widespread use of rifampin (which began in the 1970s) and quickly developed into a major threat to tuberculosis control programs in many countries around the world.
MDR-TB is a form of tuberculosis infection caused by bacteria resistant to at least isoniazid and rifampin, the two most potent first-line anti-tuberculosis drugs. Some strains are also resistant to second-line drugs and are called extensively drug-resistant tuberculosis (XDR-TB).[3, 7]
Since the progress made toward eradicating tuberculosis over the last two decades is likely to be undermined by the growth of drug-resistant Mycobacterium tuberculosis, research on drug resistance is a critical component of future planning for the eradication of tuberculosis. Moreover, increasing research and innovation is one of the strategic elements in the fight against tuberculosis. As a result, there is a strong need for continuous evaluation of research and review of its trends, and bibliometric and text analysis techniques are widely considered among the most effective tools for this purpose.
Bibliometrics is a statistical approach for analyzing and evaluating different aspects of scientific publications on various topics. Bibliometric analysis covers a variety of factors, including country involvement, leading journals, top researchers, international collaboration, annual publication growth, and leading papers.
Text mining techniques are used to extract information and latent knowledge from textual data, especially scientific documents, and they have an extensive range of applications in the processing and analysis of documents. Accordingly, text mining techniques are used to analyze scientific publications, uncover latent knowledge and topics, and track the evolution of scientific publications over time.[10, 11] Furthermore, using scientometric and bibliometric techniques, many studies have examined the trend of scientific publications on tuberculosis and identified the share of organizations and countries in these publications.[12–16] Sweileh et al. reviewed the publications on drug-resistant tuberculosis from 2006 to 2015. The number of published documents, the countries and institutions with the highest participation, citation analysis, co-authorship, international collaboration, authors, and journals were all retrieved and analyzed in their study. Other studies analyzed scientific publications on coronavirus, COVID-19,[11,18,19] and brucellosis using text mining techniques. According to the previous literature, no research has analyzed scientific publications on tuberculosis and drug-resistant tuberculosis using text mining techniques; earlier work has primarily relied on scientometric and bibliometric methods. The present investigation therefore aimed to examine scientific publications on drug-resistant tuberculosis and analyze them using bibliometric and text mining techniques to recognize patterns and topics.
The present study is applied research that combines bibliometric and text mining techniques. The statistical population was all global drug-resistant tuberculosis publications. The Web of Science Core Collection (WOSCC) citation database was used to extract the data, based on the search strategy below, from the database’s inception until April 25, 2022.
WOS Search Strategy
TS= (“mycobacterium tuberculosis drug resistance” OR “mdr mycobacterium tuberculosis” OR “Multidrug-resistant TB” OR “MDR TB” OR “Drug-resistant tb” OR “mycobacterium tuberculosis antibiotic resistance” OR “Antibiotic resistance of mycobacterium tuberculosis” OR “M.tuberculosis antibiotic resistance” OR “antibiotic resistance of M.tuberculosis” OR “M.tuberculosis drug resistance” OR “drug resistance of M.tuberculosis” OR “extended drug resistant tb” OR “extended drug resistant mycobacterium tuberculosis” OR “XDR mycobacterium tuberculosis” OR “XDR TB” OR “drug resistance in mycobacterium tuberculosis” OR “drug resistance in TB” OR “Drug resistant TB” OR “Antibiotic resistant TB” OR “Antibiotic resistance in TB” OR “Antibiotic resistance in mycobacterium tuberculosis” OR “antibiotic resistance in M.tuberculosis” OR “XDR M.tuberculosis” OR “MDR M.tuberculosis”)
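A long boolean query like the one above is easier to maintain when it is assembled from a list of phrases. The following is a minimal sketch of that idea, not the procedure the authors used: the helper name `build_ts_query` is hypothetical, the term list is a short illustrative subset (with one case-variant duplicate added on purpose), and it assumes that letter case does not matter when the database matches terms.

```python
def build_ts_query(phrases):
    """Join quoted phrases with OR inside a TS=(...) topic field tag."""
    seen = set()
    unique = []
    for phrase in phrases:
        cleaned = phrase.strip()
        key = cleaned.lower()  # assumption: matching is case-insensitive
        if key and key not in seen:
            seen.add(key)
            unique.append(cleaned)
    quoted = " OR ".join(f'"{p}"' for p in unique)
    return f"TS=({quoted})"

# Illustrative subset of the strategy; "mdr tb" is a deliberate duplicate.
terms = [
    "Multidrug-resistant TB",
    "MDR TB",
    "XDR TB",
    "Drug-resistant tb",
    "Drug resistant TB",
    "mdr tb",
]
query = build_ts_query(terms)
print(query)
```

Hyphenated and unhyphenated variants are kept as separate phrases, as in the original strategy, because they are different search strings.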
The English-language titles and abstracts of the extracted publications were analyzed using text mining techniques and the Latent Dirichlet Allocation (LDA) topic modeling algorithm to discover the topics of drug-resistant tuberculosis in scientific publications.
The initial idea behind LDA was to view text as a blend of multiple topics, with the characteristics of each topic determined by its word distribution. Within LDA, individual words are referred to as terms, collections of words are treated as documents, and the collection of all documents is the corpus. All the distinct words present in the corpus make up the vocabulary. Each topic is represented by a probability distribution over the words in the vocabulary. LDA therefore produces two groups of probability distributions:
- A topic distribution for each document.
- A word distribution for each topic.
LDA aims to produce a model in which every document is composed of a few topics and each topic is dominated by a small set of central words.
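The two groups of distributions described above can be illustrated with a toy generative sketch. The vocabulary, topic names, and probabilities below are hypothetical and written by hand; real LDA draws these distributions from Dirichlet priors and infers them from the corpus, which this sketch does not attempt.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical word distributions, one per topic.
topic_word = {
    "resistance": {"mutation": 0.5, "rifampin": 0.3, "strain": 0.2},
    "care": {"patient": 0.5, "adherence": 0.3, "clinic": 0.2},
}

# Hypothetical topic distributions, one per document.
doc_topic = {
    "doc_a": {"resistance": 0.9, "care": 0.1},
    "doc_b": {"resistance": 0.2, "care": 0.8},
}

def sample(dist):
    """Draw one key from a {key: probability} mapping."""
    keys = list(dist)
    return random.choices(keys, weights=[dist[k] for k in keys], k=1)[0]

def generate_document(doc_id, length):
    """Generate words as LDA assumes: pick a topic, then a word from it."""
    words = []
    for _ in range(length):
        topic = sample(doc_topic[doc_id])         # per-document topic distribution
        words.append(sample(topic_word[topic]))   # per-topic word distribution
    return words

doc = generate_document("doc_a", 8)
print(doc)
```

Fitting an LDA model runs this story in reverse: given only the words, it estimates the topic mixture of each document and the word distribution of each topic.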
The LDA modeling algorithm is one of the most widely used text mining algorithms for extracting topics from scientific texts, and it is well suited to identifying semantically related themes in them.[22, 23] The text mining algorithms were implemented in Python using text-processing libraries such as Gensim, NLTK, and spaCy.
Gensim, NLTK, and spaCy are three popular Natural Language Processing (NLP) libraries for Python.
NLTK (Natural Language Toolkit) is a comprehensive library for NLP tasks such as tokenization, stemming, tagging, parsing, and machine learning. It provides a wide variety of tools and resources that can be used for both research and practical applications. Gensim is a library for unsupervised topic modeling and document similarity analysis. spaCy is a library for advanced NLP tasks such as part-of-speech tagging, named entity recognition, dependency parsing, and text classification.
The text mining process comprised three stages: data preprocessing; text mining and visualization techniques; and analysis of the results with knowledge extraction.
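The article does not list its exact preprocessing steps, so the following is only a sketch of what a typical first stage might look like, using the standard library alone. The stop-word list is a tiny hypothetical stand-in for the fuller lists shipped with NLTK or spaCy, and the function name `preprocess` and the minimum token length are assumptions.

```python
import re

# Hypothetical, deliberately small stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to", "with"}

def preprocess(text):
    """Lowercase, keep letter runs (hyphenated compounds stay whole),
    then drop stop words and tokens of two characters or fewer."""
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 2]

title = "Treatment of multidrug-resistant tuberculosis in the clinic"
print(preprocess(title))
# ['treatment', 'multidrug-resistant', 'tuberculosis', 'clinic']
```

Keeping hyphenated compounds such as "multidrug-resistant" as single tokens is a design choice; splitting them would instead merge their counts with "resistant".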
It is noteworthy that topic modeling does not determine the number of topics for a set of documents automatically. The CV (C_V) coherence measure was therefore used to identify an appropriate number of topics. In addition, the number of topics and the labels assigned to each topic were determined in consultation with subject specialists in the field of infectious diseases.
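The selection step can be sketched as follows: score the top words of each candidate model with a coherence measure and keep the number of topics with the highest mean score. To stay self-contained, this sketch swaps in the simpler UMass coherence rather than the C_V measure named above, so it is not the authors' computation. The toy documents and the candidate top-word lists are hypothetical; in practice the lists would come from LDA models trained with different topic counts, and every top word is assumed to occur in at least one document.

```python
import math
from itertools import combinations

def umass_coherence(top_words, documents):
    """UMass coherence of one topic: sum of log((D(wi, wj) + 1) / D(wj)),
    where D counts documents and wj ranks above wi in the topic."""
    doc_sets = [set(doc) for doc in documents]
    score = 0.0
    for j, i in combinations(range(len(top_words)), 2):  # j < i
        wj, wi = top_words[j], top_words[i]
        d_wj = sum(1 for d in doc_sets if wj in d)
        d_both = sum(1 for d in doc_sets if wj in d and wi in d)
        score += math.log((d_both + 1) / d_wj)
    return score

def pick_num_topics(candidates, documents):
    """Return the topic count whose topics have the highest mean coherence."""
    def mean_score(topics):
        return sum(umass_coherence(t, documents) for t in topics) / len(topics)
    return max(candidates, key=lambda k: mean_score(candidates[k]))

docs = [
    ["drug", "resistance", "mutation"],
    ["drug", "resistance", "strain"],
    ["patient", "care", "clinic"],
    ["patient", "care", "adherence"],
]
# Hypothetical top words per topic for two candidate models.
candidates = {
    2: [["drug", "resistance"], ["patient", "care"]],
    3: [["drug", "care"], ["patient", "mutation"], ["resistance", "clinic"]],
}
best_k = pick_num_topics(candidates, docs)
print(best_k)
```

A coherence score only narrows the choice; as the text notes, the final number of topics and their labels still relied on expert judgment.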
A search of the WOSCC database yielded 7,720 records of drug-resistant tuberculosis publications. Figure 1 presents the annual publication trend since 1952.
As shown in Figure 1, scientific publications on drug-resistant tuberculosis have increased since 2000, and the highest numbers of publications in this field were in 2019, 2020, and 2021.
Figure 2 presents the types of publications in the field of drug-resistant tuberculosis.
As indicated in Figure 2, the highest number of scientific publications on drug-resistant tuberculosis was of the article and review types.
Table 1 presents the countries with the highest number of drug-resistant tuberculosis publications.
As demonstrated in Table 1, the United States, India, and South Africa had the most publications on drug-resistant tuberculosis, with 1783, 1048, and 849 publications, respectively. Figure 3 indicates the results of topical modeling on six main topics in word cloud format.
Table 1 (excerpt). Rank 5: People’s Republic of China, 754 publications.
As demonstrated in the word clouds in Figure 3, the publications related to drug-resistant tuberculosis covered six main topics: “care,” “drug activity,” “patient,” “drug resistance,” “treatment,” and “drug dose therapy regimen.”
Moreover, Figure 4 indicates the share of publications for each of the six topics in the field of drug-resistant tuberculosis.
As shown in Figure 4, “drug resistance” with 28.2%, “care” with 23.82%, and “treatment” with 18.86% were the topics with the most publications on drug-resistant tuberculosis. Moreover, Figure 5 presents the publication trend of the six topics in the publications related to drug-resistant tuberculosis.
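Percentages of this kind can be derived by assigning each publication to a single topic and counting. The sketch below assumes assignment by dominant topic, which the article does not state explicitly; the function name `topic_shares` and the label list are hypothetical toy data, not the study's records.

```python
from collections import Counter

def topic_shares(dominant_topics):
    """Percentage of documents per topic, rounded to two decimals."""
    counts = Counter(dominant_topics)
    total = sum(counts.values())
    return {topic: round(100 * n / total, 2) for topic, n in counts.items()}

# Toy labels: one dominant topic per publication.
labels = (
    ["drug resistance"] * 3
    + ["care"] * 2
    + ["patient"] * 2
    + ["treatment"] * 1
)
shares = topic_shares(labels)
print(shares)
```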
In Figure 5, yellow indicates the lowest number of publications, and the color shifts toward green as the number of publications rises, so dark green marks the highest counts. According to Figure 5, “drug resistance” has had the highest publication rate among the topics in recent years, peaking in 2020, 2019, and 2021. “Care” and “treatment” are the other topics whose publications have recently grown faster than those of the remaining subjects.
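The data behind a heat map like this is a table of publication counts by year and topic. A minimal sketch of building such a table follows; the record list and the function name `year_topic_counts` are hypothetical, and each record is assumed to carry one publication year and one dominant topic.

```python
from collections import defaultdict

def year_topic_counts(records):
    """Count publications per (year, topic) from (year, topic) pairs."""
    table = defaultdict(lambda: defaultdict(int))
    for year, topic in records:
        table[year][topic] += 1
    # Plain dicts, with years in ascending order.
    return {year: dict(row) for year, row in sorted(table.items())}

# Toy records, not the study's data.
records = [
    (2020, "drug resistance"),
    (2019, "drug resistance"),
    (2020, "care"),
    (2020, "drug resistance"),
    (2021, "treatment"),
]
matrix = year_topic_counts(records)
print(matrix)
```

Each cell of the resulting table would then be mapped to a color, with higher counts shown in darker green.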
The present study shows a clear picture of scientific publications on drug-resistant tuberculosis using bibliometric and text mining techniques.
Drug-resistant tuberculosis is a universal health issue that, if left uncontrolled, could pose a global public health challenge.[29, 30] Concerns about the prevalence of drug-resistant tuberculosis are a priority; therefore, researchers around the world are looking for ways to solve the problem by conducting research in this area.
The results of this study indicated that scientific publications in the field of drug-resistant tuberculosis have been growing in recent years, and most of the publications have been articles and reviews. Moreover, the United States, India, and South Africa have the highest number of scientific publications.
The present study demonstrated the growing importance of drug-resistant tuberculosis from 2000 onwards, driven by emergency conditions and the spread of drug-resistant tuberculosis strains around the world. This trend has continued through the last three years, and drug resistance has become a significant problem in tuberculosis and a major concern for national health systems and the WHO. It suggests that drug-resistant tuberculosis strains have not yet been brought under control and that resistance is still increasing.
Furthermore, the findings of the present study show that most research on drug-resistant tuberculosis has been conducted in the United States, India, and South Africa, in that order. These countries are among the nations with large populations. On the one hand, many researchers in these countries are studying drug-resistant tuberculosis, and many research projects are being conducted in this field. On the other hand, due to the large populations of India and South Africa, cases of tuberculosis, and consequently of drug-resistant tuberculosis, are highly prevalent in these countries. In South Africa, another reason for the rising incidence of tuberculosis and, consequently, drug-resistant tuberculosis is the high number of HIV-positive people; because HIV weakens the immune system, these patients are prone to other infections, including Mycobacterium tuberculosis.
In this regard, Chang et al. (2019) reported that the field of tuberculosis, in general, grew rapidly from 1998 to 2017 with increasing global participation. However, the analysis of publications in the last decade indicated that the number of publications on tuberculosis less than doubled, while the number of publications on drug-resistant tuberculosis more than tripled, indicating that research on resistance is growing at a higher rate than tuberculosis research overall.
Regarding the countries with the most publications on tuberculosis, previous studies have shown that the United States had the most publications in this field.[12, 32] Moreover, Igwaran and Edoamodu (2021) showed that South Africa had the most publications among African countries. According to a study by Barja-Ore et al. (2023), both the United States and South Africa are at the forefront of research into tuberculosis in pregnancy, with a particular focus on prevention and treatment. Nieto-Chumbipuma and colleagues (2022) also note that over the past decade, South Africa has been one of the leading countries in publishing articles on tuberculosis within the African continent.
In addition, Garrido-Cardenas et al. (2020) demonstrated that the leading publishers of tuberculosis-related studies were the USA, the UK, and India. The reason behind this outcome is the substantial investment made by these countries in scientific research.
The results of this study showed that publications related to drug-resistant tuberculosis were mainly on the following subjects: Drug Resistance, Care, Treatment, Drug Activity, Patient, and Drug Dose Therapy Regimen.
Drug resistance is a key area of focus for many publications related to drug-resistant tuberculosis. Researchers are working to better understand how drug resistance develops and how it can be prevented. Care is another important subject area, as healthcare providers must provide specialized care to patients with drug-resistant tuberculosis. Treatment is also a major focus, as researchers work to identify effective treatments for drug-resistant tuberculosis.
Drug activity refers to the effectiveness of drugs used to treat drug-resistant tuberculosis. Many publications focus on testing new drugs or combinations of drugs to determine their effectiveness. Patient-centered care is also an important aspect of publications related to drug-resistant tuberculosis, as it recognizes the importance of involving patients in their own care.
Finally, drug dose therapy regimens are a key area of focus for publications related to drug-resistant tuberculosis. These regimens involve carefully managing the dosages and timing of medications to ensure the best possible outcome for patients.
Dastani et al. (2022) have also shown that scientific publications on tuberculosis fall under the topics of “Clinical symptoms”, “Diagnosis and treatment”, “Bacterial structure, pathogenicity and genetics”, and “Prevention”.
Igwaran and Edoamodu (2021) also indicated that tuberculosis-related studies in Africa focused on three main clusters, including infection types, severity of infection, and mycobacterium species.
Garrido-Cardenas et al. (2020) showed that in global studies related to tuberculosis, the drugs rifampicin, isoniazid, ethambutol, pyrazinamide, streptomycin, kanamycin, amikacin, and ciprofloxacin had the highest frequencies. The keyword analysis also indicated that drug-resistant tuberculosis and co-infection with HIV were two major health concerns about this disease.
This study revealed that the treatment of tuberculosis and the effect of various drugs on the bacterium that causes it have been studied since 1952, beginning with research on the first drugs to treat tuberculosis. With the discovery of anti-tuberculosis drugs and their introduction to the market and use in different countries, the resistance of the bacterium to these drugs came under consideration, and drug resistance has been one of the topics studied since 1980. Because patients must take anti-tuberculosis drugs for a long time, irregular use is likely, and drug resistance is an expected consequence. Drug resistance in tuberculosis has become a global problem associated with annual mortality, and the WHO implements extensive and costly control and prevention programs around the world to combat it.
Nowadays, with the expansion of communication and easier transportation, drug-resistant tuberculosis strains are transmitted between regions and countries, spreading easily from endemic, underdeveloped, or developing countries with poor health systems to countries with a higher level of health.
Moreover, the data analysis indicated that drug resistance is the most addressed issue in the field of drug-resistant tuberculosis, which indicates that the emergence of resistant strains has imposed high costs on the health system of different countries, and scientists are looking for a solution to overcome this problem.
In conclusion, publications related to drug-resistant tuberculosis cover a wide range of subjects, but there are several key areas of focus. Drug resistance, care, treatment, drug activity, patient, and drug dose therapy regimen are all important subject areas that researchers and healthcare providers must consider when working to prevent and treat drug-resistant tuberculosis. The results of bibliometric and text mining analyses on drug-resistant tuberculosis have shown the orientation of current global research in this field. Moreover, the interest in studying drug-resistant tuberculosis is increasing, and controlling its prevalence has grown into one of the top health priorities in the world.
Cite this article
Mardaneh J, Ahmadi R, Dastani M. Topical Analysis of Scientific Publications on Drug-Resistant Tuberculosis Using Bibliometric and Text Mining Techniques. J Scientometric Res. 2023;12(2):416-21.
Although the study provides valuable insights into the trends and patterns of scientific publications on drug-resistant tuberculosis, there are limitations that should be acknowledged.
Firstly, the study only analyzed publications indexed in the WOS Core Collection citation database, which may not include all relevant publications on the topic. Secondly, the study relied solely on bibliometric and text mining techniques to analyze the data, which may have limited the scope of the analysis. Thirdly, this study focused more on text mining techniques for data analysis and investigated only some bibliometric indicators; owing to limits on the number of pages, figures, and tables in the article, other bibliometric indicators were not considered.
The authors express their gratitude to the Infectious Diseases Research Center of Gonabad University of Medical Sciences for providing financial support for this study.
- Glaziou P, Floyd K, Raviglione MC. Global epidemiology of tuberculosis. Seminars in respiratory and critical care medicine. 2018;39(3):271-285.
- Dheda K, Gumbo T, Maartens G, Dooley KE, Murray M, Furin J, et al. The Lancet Respiratory Medicine Commission: 2019 update: epidemiology, pathogenesis, transmission, diagnosis, and management of multidrug-resistant and incurable tuberculosis. The Lancet Respiratory Medicine. 2019;7(9):820-6.
- Fassihi A, Azadpour Z, Delbari N, Saghaie L, Memarian HR, Sabet R, et al. Synthesis and antitubercular activity of novel 4-substituted imidazolyl-2, 6-dimethyl-N3, N5-bisaryl-1, 4-dihydropyridine-3, 5-dicarboxamides. European Journal of Medicinal Chemistry. 2009;44(8):3253-8.
- Luna JAC, Mendoza GP, de Castro FR. Multi-drug resistant tuberculosis, ten years later. Medicina Clínica (English Edition). 2021;156(8):393-401.
- Karagoz A, Tutun H, Altintas L, Alanbayi U, Yildirim D, Kocak N, et al. Molecular typing of drug-resistant Mycobacterium tuberculosis strains from Turkey. Journal of Global Antimicrobial Resistance. 2020;23:130-4.
- Khoshneviszadeh M, Edraki N, Javidnia K, Alborzi A, Pourabbas B, Mardaneh J, et al. Synthesis and biological evaluation of some new 1, 4-dihydropyridines containing different ester substitute and diethyl carbamoyl group as anti-tubercular agents. Bioorganic and medicinal chemistry. 2009;17(4):1579-86.
- Cohen KA, Manson AL, Desjardins CA, Abeel T, Earl AM. Deciphering drug resistance in Mycobacterium tuberculosis using whole-genome sequencing: progress, promise, and challenges. Genome Medicine. 2019;11(1):1-18.
- Thompson DF, Walker CK. A descriptive and historical review of bibliometrics with applications to medical sciences. Pharmacotherapy: The Journal of Human Pharmacology and Drug Therapy. 2015;35(6):551-9.
- Choudhary AK, Oluikpe P, Harding JA, Carrillo PM. The needs and benefits of Text Mining applications on Post-Project Reviews. Computers in Industry. 2009;60(9):728-40.
- Danesh F, Dastani M, Ghorbani M. Retrospective and prospective approaches of coronavirus publications in the last half-century: a Latent Dirichlet allocation analysis. Library Hi Tech. 2021.
- Dastani M, Danesh F. Iranian COVID-19 Publications in LitCovid: Text Mining and Topic Modeling. Scientific Programming. 2021:2021.
- Chang L, Su Y, Zhu R, Duan Z. Mapping international collaboration in tuberculosis research from 1998 to 2017: A scientometric study. Medicine. 2019;98(37).
- Nafade V, Nash M, Huddart S, Pande T, Gebreselassie N, Lienhardt C, et al. A bibliometric analysis of tuberculosis research, 2007–2016. PLoS One. 2018;13(6):e0199706.
- Groneberg DA, Weber E, Gerber A, Fischer A, Klingelhoefer D, Brueggmann D, et al. Density equalizing mapping of the global tuberculosis research architecture. Tuberculosis. 2015;95(4):515-22.
- Garrido-Cardenas JA, de Lamo-Sevilla C, Cabezas-Fernández MT, Manzano-Agugliaro F, Martínez-Lirola M. Global tuberculosis research and its future prospects. Tuberculosis. 2020;121:101917.
- Igwaran A, Edoamodu CE. Bibliometric Analysis on Tuberculosis and Tuberculosis-Related Research Trends in Africa: A Decade-Long Study. Antibiotics. 2021;10(4):423.
- Sweileh WM, AbuTaha AS, Sawalha AF, Al-Khalil S, Al-Jabi SW, Zyoud SeH, et al. Bibliometric analysis of worldwide publications on multi-, extensively, and totally drug–resistant tuberculosis (2006-2015). Multidisciplinary Respiratory Medicine. 2016;11(1):1-16.
- Radanliev P, De Roure D, Walton R. Data mining and analysis of scientific research data records on covid 19 mortality, immunity, and vaccine development-In the first wave of the Covid-19 pandemic. Diabetes and Metabolic Syndrome: Clinical Research and Reviews. 2020.
- Älgå A, Eriksson O, Nordberg M. Analysis of Scientific Publications During the Early Phase of the COVID-19 Pandemic: Topic Modeling Study. J Med Internet Res. 2020;22(11):e21559.
- Dastani M, Mardaneh J, Pouresmaeil O. Detecting Latent Topics and Trends in Global Publications on Brucellosis Disease Using Text Mining. Interdisciplinary Perspectives on Infectious Diseases. 2022:2022.
- Blei DM. Probabilistic topic models. Communications of the ACM. 2012;55(4):77-84.
- Kang HJ, Kim C, Kang K. Analysis of the Trends in Biochemical Research Using Latent Dirichlet Allocation (LDA). Processes. 2019;7(6):379.
- Jelodar H, Wang Y, Yuan C, Feng X, Jiang X, Li Y, et al. Latent Dirichlet Allocation (LDA) and Topic modeling: models, applications, a survey. Multimedia Tools and Applications. 2019;78(11):15169-211.
- Sarkar D. Text Analytics with Python: A Practitioner’s Guide to Natural Language Processing. 2016.
- Joshi C, Attar VZ, Kalamkar SP. An unsupervised topic modeling approach for adverse drug reaction extraction and identification from natural language text. 2022:505-14.
- Balaha HM, Saafan MM. Automatic Exam Correction Framework (AECF) for the MCQs, Essays, and Equations Matching. IEEE Access. 2021;9:32368-89.
- Siddiqui T, Amer AY, Khan NA. Criminal activity detection in social network by text mining: Comprehensive analysis. 2019:224-229.
- Dabade MS. Sentiment analysis of Twitter data by using deep learning And machine learning. Turkish Journal of Computer and Mathematics Education (TURCOMAT). 2021;12(6):962-70.
- Zemlyanaya N, Gelmanova I, Mishustin S, Janova G. Estimating costs for treating patients with Multi-Drug Resistant Tuberculosis (MDR TB) under the regional tuberculosis control program, Tomsk (Russia). Eur Respiratory Soc. 2015.
- Chung-Delgado K, Guillen-Bravo S, Revilla-Montag A, Bernabe-Ortiz A. Mortality among MDR-TB cases: comparison with drug-susceptible tuberculosis and associated factors. PLoS One. 2015;10(3):e0119332.
- Aaron L, Saadoun D, Calatroni I, Launay O, Memain N, Vincent V, et al. Tuberculosis in HIV-infected patients: a comprehensive review. Clinical microbiology and infection. 2004;10(5):388-98.
- Ramos J, Padilla S, Masia M, Gutierrez F. A bibliometric analysis of tuberculosis research indexed in PubMed, 1997-2006. The International Journal of Tuberculosis and Lung Disease. 2008;12(12):1461-8.
- Barja-Ore J, Retamozo-Siancas Y, Fernandez-Giusti A, Guerrero ME, Munive-Degregori A, Mayta-Tovalino F, et al. Trends, collaboration, and visibility of global scientific production on birth complications in pregnant women with tuberculosis: A scientometric study. International Journal of Mycobacteriology. 2023;12(2):111-6.
- Nieto-Chumbipuma J, Silva-Reategui L, Fernandez-Giusti A, Barja-Ore J, Retamozo-Siancas Y, Mayta-Tovalino F, et al. Scientometric analysis of the world scientific production on tuberculosis associated with COVID-19. The International Journal of Mycobacteriology. 2022;11(3):249-55.
- Dastani M, Mohammadzadeh A, Mardaneh J, Ahmadi R. Topic Analysis and Mapping of Tuberculosis Research Using Text Mining and Co-Word Analysis. Tuberculosis Research and Treatment. 2022.
- Alemu A, Bitew ZW, Worku T, Gamtesa DF, Alebel A. Predictors of mortality in patients with drug-resistant tuberculosis: A systematic review and meta-analysis. PLoS One. 2021;16(6):e0253848.