My Research

So you are interested in my research? Check out the most recent research down below!

Recent Research

The use of computing to manage, organize, and analyze biological and clinical data has become a very important element of biology and medical research. My current research in bioinformatics is rooted in an interdisciplinary context, in both style and content, owing to my decade-long professional career in a molecular biology laboratory. I bring my wide variety of research experience in the life sciences (plant, animal, bacterial, and viral molecular biology and biochemistry) into the classroom and into my research work in bioinformatics.

One of the broad goals of my research work in biomedical informatics is the study of information transmission and exchange in living systems, with a particular interest in mitochondria. Additionally, my collaborative translational research effort includes the study of effective communication methods between health service providers and patients with chronic diseases, such as diabetes, COPD, and hypertension (common among the elderly population). We are currently evaluating the role of an information technology solution (telehealth) in the delivery of remote health services, with the aim of extending health care to rural areas where traditional medicine is largely inaccessible.

  • Our research group uses biosystems, computational biology, and bioinformatics to advance our understanding of colorectal cancer (CRC) and identify biomarkers for diagnosis and treatment. Specifically, our studies have identified potential biomarkers for detecting CRC, investigated the role of claudins in CRC, proposed a Data-Driven Reference approach for identifying reliable biomarkers, investigated the role of claudins in obesity-induced organ and tissue-specific tight junction restructuring, and conducted a coexpression network analysis of miRNA-142 overexpression in neuronal cells. These studies have significantly contributed to our understanding of CRC and related diseases, as well as the development of reliable biomarkers for diagnosis and treatment.

    The study aimed to identify biomarkers that can objectively detect colorectal cancer (CRC) and improve CRC diagnosis and treatment. We identified differentially expressed genes (DEGs) and filtered them based on additional parameters to produce a prioritized list of 37 potential CRC biomarkers. The filtering was based on epithelial-mesenchymal transition enrichment, and the results were obtained from two independent datasets. The genes were ranked using a data-driven reference method, and ETV4, CLDN1, and CA2 were identified as the top-ranked biomarkers with an accuracy of 89% and an F1 score of 0.89. The study concluded that the combination of biological and statistical information produced a better set of CRC detection biomarkers.
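    For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how accuracy and an F1 score are computed from a binary confusion matrix. The counts are hypothetical and are not taken from the study; the function name is also an illustration only.

```python
def accuracy_and_f1(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Return (accuracy, F1) for a binary classifier from confusion counts.

    Assumes at least one predicted positive and one actual positive,
    so that precision and recall are both defined.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)   # fraction of predicted tumors that are tumors
    recall = tp / (tp + fn)      # fraction of actual tumors that were detected
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1


# Hypothetical counts for illustration only (NOT results from the study).
acc, f1 = accuracy_and_f1(tp=40, fp=10, fn=5, tn=45)
print(f"accuracy={acc:.2f}, F1={f1:.2f}")
```

    F1 is the harmonic mean of precision and recall, so it stays low when either false positives or false negatives are frequent, even if overall accuracy looks good.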

    This study investigates the role of claudins in colorectal cancer (CRC) and develops a molecular signature based on claudin-1 and claudin-7 associated with poor patient survival and chemoresistance. We used an integrated approach including publicly available datasets, CRC samples from patients, CRC cell lines, and patient-derived tumoroid models to validate our findings. Transcriptomic analysis initially yielded 23 genes that were differentially expressed along with higher claudin-1 and decreased claudin-7. From this analysis, we selected a claudin-associated molecular signature including PIK3CA, SLC6A6, TMEM43, and ASAP-1 based on their importance in CRC. The upregulation of these genes and their protein products was validated using multiple CRC patient datasets, in vitro chemoresistant cell lines, and patient-derived tumoroid models. Blocking these genes improved 5-FU sensitivity in chemoresistant CRC cells. The findings suggest that this claudin-based molecular signature is associated with poor prognosis as well as with characteristics of treatment-resistant CRC, including chemoresistance, metastasis, and relapse.

    The study addresses the challenge of identifying a set of reliable and reproducible biomarkers across various gene expression platforms and laboratories for single sample diagnosis and prognosis. The authors propose a Data-Driven Reference (DDR) approach that employs stably expressed housekeeping genes as references to eliminate platform-specific biases and non-biological variabilities. The method identifies biomarkers with 'built-in' features, which can be interpreted consistently regardless of profiling technology, enabling classification of single samples independently of platforms. The authors validate the approach with RNA-seq data of blood platelets and demonstrate its superior performance in classifying six different tumor types and molecular target statuses with smaller sets of biomarkers. The study also shows that the method is capable of identifying robust biomarkers for subgrouping medulloblastoma samples across different microarray platforms, and even identifies potential new biomarkers. The authors conclude that their data-driven method is simple yet powerful and contributes significantly to identifying a robust cross-platform gene signature for disease classification of single patients, facilitating precision medicine.
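    The idea of using stably expressed genes as an internal reference can be illustrated with a small sketch. This is a simplified illustration under stated assumptions, not the published DDR algorithm: the gene names (`REF1`, `MARKER1`, and so on), the expression values, and the choice of a log2 ratio to the geometric mean of the reference genes are all hypothetical.

```python
import math


def reference_normalized(sample: dict[str, float],
                         reference_genes: list[str]) -> dict[str, float]:
    """Express each non-reference gene as a log2 ratio to the geometric
    mean of the reference genes measured in the same sample."""
    # Mean of log2 values == log2 of the geometric mean.
    log_ref = sum(math.log2(sample[g]) for g in reference_genes) / len(reference_genes)
    return {
        gene: math.log2(value) - log_ref
        for gene, value in sample.items()
        if gene not in reference_genes
    }


# Hypothetical expression values for one sample on "platform A".
sample_a = {"REF1": 100.0, "REF2": 400.0, "MARKER1": 800.0, "MARKER2": 50.0}
# The same sample on a hypothetical "platform B" that reports values 3x higher.
sample_b = {gene: 3.0 * value for gene, value in sample_a.items()}

norm_a = reference_normalized(sample_a, ["REF1", "REF2"])
norm_b = reference_normalized(sample_b, ["REF1", "REF2"])
print(norm_a)
print(norm_b)
```

    Because every gene is measured relative to references from the same sample, a multiplicative platform bias cancels out, and both calls give essentially the same values. That is the property that lets a single sample be classified without re-normalizing against a cohort.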

    Obesity increases the risk of developing multiple organ disorders, but the exact underlying mechanisms are not well understood. This study focused on the role of gut permeability and subclinical inflammation in obesity-associated comorbidities. The study found that claudin proteins, which are important for maintaining the integrity of tight junctions, undergo tissue-specific switching in obese organs, leading to profound restructuring of the tight junctions. The study also found potential links between the claudins and signaling and metabolic pathways relevant to disease. In vitro studies supported the idea that changes in the tissue microenvironment play a causal role in these barrier deregulations. These findings shed light on the molecular processes underlying obesity-associated changes and suggest new opportunities for prevention and treatment.

    MicroRNAs are small molecules that regulate gene expression by binding to mRNA transcripts. miRNA-142 is a type of microRNA that is overexpressed in neurons and is known to regulate SIRT1 and MAOA genes. However, analyzing gene expression data by only focusing on up- or downregulated genes can overlook important relationships between genes that are affected by miRNA-142 overexpression. To better understand the impact of miRNA-142 overexpression on gene expression networks, a correlation network model was used to identify coexpressed genes in wild type and miRNA-142 overexpressing neuronal cells. By integrating miRNA seed sequence mapping information, genes greatly affected by miRNA-142 overexpression were identified. Analysis of the enriched networks revealed that genes related to nervous system development, such as TEAD2, PLEKHA6, and POGLUT1, were greatly impacted by miRNA-142 overexpression. This study highlights the importance of combining multiple sources of knowledge to infer meaningful relationships in systems biology.
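    A correlation network of this kind can be sketched in a few lines. The example below is a toy illustration, not the model used in the study: the gene labels, expression values, and the 0.9 threshold on the absolute Pearson correlation are all assumptions chosen for clarity.

```python
import math
from itertools import combinations


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def coexpression_edges(expr: dict[str, list[float]],
                       threshold: float = 0.9) -> set[tuple[str, str]]:
    """Link two genes when |r| across samples meets the threshold."""
    return {
        (g1, g2)
        for g1, g2 in combinations(sorted(expr), 2)
        if abs(pearson(expr[g1], expr[g2])) >= threshold
    }


# Hypothetical expression profiles (four samples per condition).
wild_type = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.1, 3.9, 6.2, 8.0],
    "GENE_C": [5.0, 1.0, 4.0, 2.0],
}
overexpressing = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [4.0, 1.0, 3.5, 2.0],
    "GENE_C": [5.0, 1.0, 4.0, 2.0],
}

wt_edges = coexpression_edges(wild_type)
oe_edges = coexpression_edges(overexpressing)
lost = wt_edges - oe_edges     # relationships present only in wild type
gained = oe_edges - wt_edges   # relationships that appear on overexpression
print("lost:", lost, "gained:", gained)
```

    Comparing the two edge sets highlights genes whose relationships change between conditions even when their individual expression levels would not stand out in a simple up/down analysis.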

  • Our research group applies information technology to healthcare and conducts studies in health informatics. We use various data-driven and technological approaches to improve healthcare outcomes. Our studies include investigating glycemic variability and diabetes complications, identifying patient characteristics for better glycemic control, summarizing international workshops on technology and data-driven innovations in cancer care, evaluating medical professionals' interpretation of genetic test results and opinions on direct-to-consumer genetic testing, analyzing cancer patients' and healthcare providers' perceptions of chemotherapy using Twitter data, and developing predictive models of mosquito-borne disease spread using social media and other open intelligence sources. These studies exemplify the potential of health informatics to improve healthcare outcomes and tackle public health challenges. They represent a small sample of the diverse research that falls under the health informatics umbrella.

    The study aimed to investigate the impact of glycemic variability (GV) on diabetes complications and identify patient characteristics associated with better GV control. Electronic data from patients with diabetes who had five recent hemoglobin A1C (HbA1c) values were analyzed using control variability grid analysis (CVGA) and the coefficient of variation (CV) to cluster glycemic fluctuations. LASSO was used to select important variables, and statistical tests such as chi-square, Fisher's exact test, Bonferroni chi-square adjusted residual analysis, and multivariate Kruskal-Wallis tests were performed to evaluate the disease outcomes. Better GV control was associated with a lower risk of disorders related to lipoproteins and to fluid, electrolyte, and acid-base balance. In contrast, patients with poor GV control were more likely to have other health issues and to require long-term drug therapy. The study suggests that reducing GV could help in managing diabetes complications by improving electrolyte balance and reducing lipid profile differences.
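    As a small worked example, the coefficient of variation for a series of five HbA1c readings can be computed as below. The readings are hypothetical, and the use of the sample standard deviation (rather than the population form) is an assumption, since the summary above does not specify it.

```python
import statistics


def coefficient_of_variation(values: list[float]) -> float:
    """Sample standard deviation divided by the mean, as a percentage."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


# Hypothetical HbA1c readings (%) for two patients, five values each.
fluctuating = [7.0, 7.5, 8.0, 7.5, 7.0]
stable = [7.4, 7.4, 7.4, 7.4, 7.4]

cv_fluctuating = coefficient_of_variation(fluctuating)
cv_stable = coefficient_of_variation(stable)
print(f"fluctuating: {cv_fluctuating:.2f}%  stable: {cv_stable:.2f}%")
```

    Both patients have the same mean HbA1c, but only the first shows measurable variability, which is why CV is useful alongside the average when grouping patients by glycemic control.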

    This study summarizes the results of two international workshops on the use of technology and data-driven innovations in the management of cancer care. Four EU H2020-funded projects organized the workshops, which identified several topics and challenges related to the use of ICT-based systems, including patient engagement, knowledge management, and trust. The study highlights the potential benefits of these innovations, but also outlines the challenges that must be addressed to maximize their impact. The paper concludes that trust and engagement across the stakeholder ecosystem are crucial for the successful implementation of technology and data-driven solutions in cancer care management. Practical recommendations are provided for future research and implementation efforts.

    The study evaluated medical professionals' ability to interpret genetic test results and their opinions on direct-to-consumer genetic tests (DTC-GT). Specialists had a higher correct interpretation rate, self-efficacy, and level of preparedness than medical providers. However, primary care providers can still provide accurate interpretation when specialists are unavailable. The findings suggest a need to increase the number of genetic specialists to meet the demand for precision medicine.

    This study aimed to analyze and compare cancer patients' and healthcare providers' perceptions of chemotherapy using Twitter data. Cancer-related Twitter accounts were collected and classified into individuals and organizations using a Long Short-Term Memory (LSTM) network with GloVe word embeddings. The study analyzed 13,273 and 14,051 chemotherapy-related tweets from individual and organizational accounts, respectively, using text mining approaches such as topic modeling, sentiment analysis, and word co-occurrence networks. Results showed that personal accounts had more emotional tweets about personal chemotherapy experiences, while professional accounts had a higher proportion of neutral tweets about side effects. However, information about the assessment of response to chemotherapy was lacking in tweets from organizations. The study highlights the potential of Twitter as a valuable healthcare data source for helping oncologists understand patients' experiences while undergoing chemotherapy, develop personalized therapy plans, and supplement clinical electronic medical records.
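    Of the text mining steps named above, the word co-occurrence network is the simplest to sketch. The example below is an illustration only: the three short texts are invented, tokenization is a plain whitespace split, and the study's actual preprocessing is not described here.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents: list[str]) -> Counter:
    """Count how many documents each unordered word pair appears in.

    Pairs are stored as alphabetically sorted tuples, so each can be
    read as a weighted edge of a co-occurrence network.
    """
    counts: Counter = Counter()
    for doc in documents:
        words = sorted(set(doc.lower().split()))
        counts.update(combinations(words, 2))
    return counts


# Invented example texts (not real tweets).
texts = [
    "chemo fatigue today",
    "fatigue after chemo",
    "chemo nausea",
]

edges = cooccurrence_counts(texts)
for (word_a, word_b), weight in edges.most_common(3):
    print(word_a, word_b, weight)
```

    Edge weights show which terms tend to be mentioned together, which is the kind of signal that can then be compared between individual and organizational accounts.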

    Mosquito-borne diseases such as West Nile virus, Chikungunya virus, and Zika virus pose significant public health challenges worldwide. Early prediction of disease spread is essential for effective public health interventions. However, predicting the spread of mosquito-borne diseases months in advance can be challenging, particularly when little information is available. To address this issue, the researchers proposed using social media and other open intelligence sources to develop predictive models of disease progression. In this study, they adapted a previously described model for the spread of mosquito-borne diseases and implemented a mixed model that can be executed quickly. The results indicate that this model can provide fast and relevant predictions with acceptable margins of error.

    The study aimed to assess the ability of customers of the direct-to-consumer (DTC) genetic testing company 23andMe to interpret and comprehend their test results, and to determine if honest brokers are needed. A total of 122 participants were polled in an online survey, where they were asked about their personal test results and to interpret the results of two mock test cases for type 2 diabetes and multiple sclerosis. The results showed that only 23.8% of the participants were able to interpret both cases correctly, although most of the subjects were able to correctly assess the risk for each case. Participants who read the supplemental material provided by the DTC test were almost 4 times more likely to correctly interpret the test results. The study suggests that involving more health professionals in the process may be necessary to ensure proper interpretation of DTC genetic test results, especially as the market for DTC genetic testing continues to grow.

    The study shows that a graph database architecture using SNOMED CT terminology for patient data can improve data richness and advanced data querying capability. The results demonstrate that logical disjunction and negation queries were possible using the data model, as were queries that extended beyond the structural IS_A hierarchy of SNOMED CT to use the defining attribute-values of SNOMED CT concepts as search parameters. This alternative approach to querying patient data can accommodate additional granularity of clinical concepts without sacrificing speed.
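    The disjunction and negation queries over an IS_A hierarchy can be illustrated with a small in-memory sketch. This is a stand-in for a graph database, not the architecture used in the study: the mini-hierarchy, concept labels, and patient records are hypothetical and are not real SNOMED CT content.

```python
# child concept -> parent concepts (hypothetical mini-hierarchy)
IS_A = {
    "type 2 diabetes": {"diabetes mellitus"},
    "type 1 diabetes": {"diabetes mellitus"},
    "diabetes mellitus": {"disorder"},
    "hypertension": {"disorder"},
}


def ancestors_or_self(concept: str) -> set[str]:
    """Follow IS_A edges upward and return the concept plus all ancestors."""
    seen = {concept}
    stack = [concept]
    while stack:
        for parent in IS_A.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def has(patient_concepts: set[str], target: str) -> bool:
    """True if any recorded concept is the target or one of its descendants."""
    return any(target in ancestors_or_self(c) for c in patient_concepts)


# Hypothetical patient records.
patients = {
    "p1": {"type 2 diabetes"},
    "p2": {"hypertension"},
    "p3": {"type 1 diabetes", "hypertension"},
    "p4": set(),
}

# Disjunction: diabetes mellitus (any subtype) OR hypertension.
either = {p for p, c in patients.items()
          if has(c, "diabetes mellitus") or has(c, "hypertension")}
# Negation: diabetes mellitus (any subtype) AND NOT hypertension.
diabetes_without_htn = {p for p, c in patients.items()
                        if has(c, "diabetes mellitus") and not has(c, "hypertension")}
print(sorted(either), sorted(diabetes_without_htn))
```

    Because a query on a parent concept automatically matches records coded with any of its descendants, data can be stored at whatever granularity the clinician recorded and still be retrieved by broader terms.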

  • Our work covers a wide range of fields, including biosystems and applied informatics in pharmacology and pharmacogenomics. We use computational methods to investigate the molecular mechanisms underlying biological processes, such as metabolite production, cancer therapy, drug discovery, and protein localization. We've developed the HerbMicrobeDataBase (HMDB) to study the effects of culinary herbs on gut health and have studied the role of the host immune system in response to cancer therapy. Our group has also developed computational pipelines for predicting therapeutic activities of natural compounds and identifying proteins localized to mitochondria. Through our work, we aim to improve our understanding of various biological processes and contribute to drug discovery.

    Secondary metabolites in plants have been widely used for various purposes, such as dyes, drugs, and perfumes. They are increasingly recognized as potential sources of new natural drugs and antibiotics. Recently, gut-associated microbes have been found to play important roles in human health, but our understanding of the impact of secondary metabolites from culinary herbs on the gut microbiome is limited. To address this gap, a graph-based database called HerbMicrobeDataBase (HMDB) was developed using the Neo4j framework. HMDB integrates knowledge from key biological entities associated with maintaining gut health and provides efficient storage, retrieval, and graphical presentation of botanical, biochemical, and pharmacological data for culinary herbs and the human microbiome. The resource is useful for understanding the molecular mechanisms of metabolite production and their therapeutic or toxicological effects on gut microbes.

    This study highlights the crucial role of the host immune system in response and resistance to cancer therapy, which has been the basis for the development of immunotherapy. However, the impact of the host immune response in attenuating the action of conventional anticancer therapies has not been fully explored. Despite advances in systemic therapy, the 5-year survival rate for adenocarcinoma remains low, with acquired resistance being the primary reason for treatment failure. Therefore, reliable biomarkers are needed for guiding treatment of lung and colon adenocarcinoma and predicting the outcomes of specific anticancer therapies. This study analyzed gene expression data from public resources and demonstrated how host immune competence influences the efficacy of various anticancer therapies. Moreover, the results shed light on the regulation of certain biochemical pathways relating to the immune system, suggesting that smart chemotherapeutic intervention strategies could be based on a patient's immune profile.

    Fragment-based approaches have become important in drug discovery, and natural compounds are being explored as potential sources of new drugs. This study presents a computational pipeline that automatically extracts statistically overrepresented chemical fragments in therapeutic classes and searches for similar fragments in a large database of natural products. By identifying enriched fragments in therapeutic groups, the researchers are able to focus on fragments that are likely to be active or structurally important. The results show that enriched fragments in several therapeutic classes are also found in many natural compounds, and the method can detect shared fragments even when overall similarity between a drug and a natural product is low. This approach has potential to predict therapeutic activities of natural compounds and identify novel leads for drug discovery.
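    One common way to test whether a fragment is statistically overrepresented in a therapeutic class is a one-sided hypergeometric test. The summary above does not say which test the pipeline uses, so the choice of test here is an assumption, and the counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(total: int, with_fragment: int,
                           class_size: int, class_with_fragment: int) -> float:
    """P(X >= class_with_fragment) when class_size drugs are drawn at random
    from `total` drugs, of which `with_fragment` contain the fragment."""
    upper = min(with_fragment, class_size)
    favorable = sum(
        comb(with_fragment, i) * comb(total - with_fragment, class_size - i)
        for i in range(class_with_fragment, upper + 1)
    )
    return favorable / comb(total, class_size)


# Hypothetical counts: 20 drugs overall, 5 contain the fragment;
# a therapeutic class of 4 drugs has 3 that contain it.
p_value = hypergeom_enrichment_p(total=20, with_fragment=5,
                                 class_size=4, class_with_fragment=3)
print(f"p = {p_value:.4f}")
```

    A small p-value means the fragment appears in the class more often than chance would predict. In a real pipeline many fragments are tested at once, so a multiple-testing correction would normally be applied before a fragment is called enriched.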

    Mitochondria play a critical role in energy production and have been implicated in various diseases. While the human mitochondrial genome contains only 13 protein-coding genes, recent proteomic studies have revealed that over 1000 proteins are localized to mitochondria. Although many nuclear-encoded proteins are thought to be localized to mitochondria through N-terminal signal peptides, only 27% of these proteins contain such signals. In this study, the authors present a computational framework to identify mRNAs with enriched structural features in their 3'-UTRs as a potential alternative to peptide signal-based localization. Using this approach, they identified seven new proteins that were not previously known to be localized to mitochondria but are likely involved in mitochondria-related functions based on literature evidence. Overall, this study provides insights into the mechanisms underlying protein localization to mitochondria and expands the known mitochondrial proteome.

Old Publication