Subscribe to Bioinformatics Oxford Journals feed

I2b2-etl: Python application for importing Electronic Health Data into the Informatics for Integrating Biology and the Bedside Platform

Fri, 02/09/2022 - 5:30am
Abstract
Motivation: The i2b2 platform is used at major academic health institutions and research consortia for querying electronic health data. However, a major obstacle to wider utilization of the platform is the complexity of data loading, which entails a steep learning curve for the platform’s complex data schemas. To address this problem, we have developed the i2b2-etl package that simplifies the data loading process, which will facilitate wider deployment and utilization of the platform.
Results: We have implemented i2b2-etl as a Python application that imports ontology and patient data using simplified input file schemas and provides inbuilt record-number de-identification and data validation. We describe a real-world deployment of i2b2-etl for a population-management initiative at MassGeneral Brigham.
Availability: i2b2-etl is a free, open-source application implemented in Python available under the Mozilla 2 license. The application can be downloaded as compiled Docker images. A live demo is available at https://i2b2clinical.org/demo-i2b2etl/ (username: demo, password: Etl@2021).
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

Heritability estimation for a linear combination of phenotypes via ridge regression

Fri, 02/09/2022 - 5:30am
Abstract
Motivation: The joint analysis of multiple phenotypes is important in many biological studies, such as plant and animal breeding. The heritability estimation for a linear combination of phenotypes is designed to account for correlation information. Existing methods for estimating heritability mainly focus on single phenotypes under random-effect models. These methods also require some stringent conditions, which calls for a more flexible and interpretable method for estimating heritability. Fixed-effect models emerge as a useful alternative.
Results: In this paper, we propose a novel heritability estimator based on multivariate ridge regression for linear combinations of phenotypes, yielding accurate estimates in both sparse and dense cases. Under mild conditions in the high-dimensional setting, the proposed estimator appears to be consistent and asymptotically normally distributed. Simulation studies show that the proposed estimator is promising under different scenarios. Compared with independently combined heritability estimates in the case of multiple phenotypes, the proposed method significantly improves the performance by considering correlations among those phenotypes. We further demonstrate its application in heritability estimation and correlation analysis for the Oryza sativa rice dataset.
Availability and implementation: An R package implementing the proposed method is available at https://github.com/xg-SUFE1/MultiRidgeVar, where covariance estimates are also given together with heritability estimates.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

pcnaDeep: A Fast and Robust Single-Cell Tracking Method Using Deep-Learning Mediated Cell Cycle Profiling

Thu, 01/09/2022 - 5:30am
Abstract
Computational methods that track single cells and quantify fluorescent biosensors in time-lapse microscopy images have revolutionised our approach in studying the molecular control of cellular decisions. One barrier that limits the adoption of single-cell analysis in biomedical research is the lack of efficient methods to robustly track single cells over cell division events. Here, we developed an application that automatically tracks and assigns mother-daughter relationships of single cells. By incorporating cell cycle information from a well-established fluorescent cell cycle reporter, we associate mitosis relationships enabling high fidelity long-term single-cell tracking. This was achieved by integrating a deep-learning based fluorescent PCNA signal instance segmentation module with a cell tracking and cell cycle resolving pipeline. The application offers a user-friendly interface and extensible APIs for customized cell cycle analysis and manual correction for various imaging configurations.
Availability: pcnaDeep is an open-source Python application under the Apache 2.0 licence. The source code, documentation and tutorials are available at https://github.com/chan-labsite/PCNAdeep.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

MetaboAnnotator: An efficient toolbox to annotate metabolites in genome-scale metabolic reconstructions

Thu, 01/09/2022 - 5:30am
Abstract
Motivation: Genome-scale metabolic reconstructions have been assembled for thousands of organisms using a wide range of tools. However, metabolite annotations, which are required to compare and link metabolites between reconstructions, remain incomplete. Here, we aim to further extend metabolite annotation coverage using various databases and chemoinformatic approaches.
Results: We developed a COBRA toolbox extension, named MetaboAnnotator, which facilitates the comprehensive annotation of metabolites with database-independent and database-dependent identifiers, obtains molecular structure files, and calculates metabolite formula and charge at pH 7.2. The resulting metabolite annotations allow for subsequent cross-mapping between reconstructions and mapping of, e.g., metabolomic data.
Availability: MetaboAnnotator and tutorials are freely available at https://github.com/opencobra.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

No means ‘No’; a non-im-proper modeling approach, with embedded speculative context

Tue, 30/08/2022 - 5:30am
Abstract
Motivation: Medical data are complex in nature, as terms that appear in records usually appear in different contexts. In this paper, we investigate how well the embeddings of various biomedical models (BioBERT, BioELECTRA, PubMedBERT) capture "negation and speculation context", and we find that these models are unable to differentiate "negated context" from "non-negated context". To measure the models' understanding, we use cosine similarity scores between pairs of negated and non-negated sentence embeddings. To improve these models, we introduce a generic super-tuning approach that enhances the embeddings on "negation and speculation context" by utilizing a synthesized dataset.
Results: After super-tuning, the models’ embeddings capture negative and speculative contexts much better. Furthermore, we fine-tuned the super-tuned models on various tasks and found that they outperformed the previous models and achieved state-of-the-art (SOTA) results on negation, speculation cue, and scope detection tasks on the BioScope abstracts and Sherlock datasets. We also confirmed that super-tuning incurs only a minimal trade-off in model performance on other tasks such as Natural Language Inference.
Availability: The source code and the models are available at: https://github.com/comprehend/engg-airesearch/tree/uncertainty-super-tuning.
Supplementary information: Supplementary data are available at Bioinformatics online.
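The cosine-similarity measurement mentioned in the abstract above can be illustrated with a minimal sketch. It assumes each sentence has already been mapped to a fixed-length vector by some embedding model; the function names and the toy vectors are hypothetical and are not taken from the authors' code. A score near 1 for a negated/non-negated pair would suggest the embedding does not separate the two contexts.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def mean_pair_similarity(pairs):
    """Average cosine similarity over (negated, non_negated) embedding pairs."""
    scores = [cosine_similarity(neg, non_neg) for neg, non_neg in pairs]
    return sum(scores) / len(scores)


# Hypothetical toy embeddings, not real model outputs.
toy_pairs = [
    ([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]),  # identical vectors
    ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),  # orthogonal vectors
]
average_score = mean_pair_similarity(toy_pairs)
```

With real embeddings, the same average could be compared before and after tuning to see whether negated and non-negated sentences move apart.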
Categories: Bioinformatics Trends

SEMgraph: an R Package for Causal Network Inference of High-Throughput Data with Structural Equation Models

Tue, 30/08/2022 - 5:30am
Abstract
Motivation: With the advent of high-throughput sequencing (HTS) in molecular biology and medicine, the need for scalable statistical solutions for modeling complex biological systems has become of critical importance. The increasing number of platforms and possible experimental scenarios raised the problem of integrating large amounts of new heterogeneous data and current knowledge, to test novel hypotheses and improve our comprehension of physiological processes and diseases.
Results: Combining network analysis and causal inference within the framework of structural equation modeling (SEM), we developed the R package SEMgraph. It provides a fully automated toolkit, managing complex biological systems as multivariate networks, ensuring robustness and reproducibility through data-driven evaluation of model architecture and perturbation, that is readily interpretable in terms of causal effects among system components.
Availability: SEMgraph package is available at https://cran.r-project.org/web/packages/SEMgraph.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

scFeatures: Multi-view representations of single-cell and spatial data for disease outcome prediction

Tue, 30/08/2022 - 5:30am
Abstract
Motivation: With the recent surge of large-cohort scale single cell research, it is of critical importance that analytical methods can fully utilize the comprehensive characterization of cellular systems that single cell technologies produce to provide insights into samples from individuals. Currently, there is little consensus on the best ways to compress information from the complex data structures of these technologies to summary statistics that represent each sample (e.g. individuals).
Results: Here, we present scFeatures, an approach that creates interpretable cellular and molecular representations of single-cell and spatial data at the sample level. We demonstrate that summarising a broad collection of features at the sample level is important both for understanding underlying disease mechanisms in different experimental studies and for accurately classifying disease status of individuals.
Availability: scFeatures is publicly available as an R package at https://github.com/SydneyBioX/scFeatures. All data used in this study are publicly available with accession ID reported in the Methods section.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

Tangent normalization for somatic copy-number inference in cancer genome analysis

Tue, 30/08/2022 - 5:30am
Abstract
Motivation: Somatic copy-number alterations (SCNAs) play an important role in cancer development. Systematic noise in sequencing and array data presents a significant challenge to the inference of SCNAs for cancer genome analyses. As part of The Cancer Genome Atlas (TCGA), the Broad Institute Genome Characterization Center developed the Tangent normalization method to generate copy-number profiles using data from single-nucleotide polymorphism (SNP) arrays and whole-exome sequencing (WES) technologies for over 10,000 pairs of tumors and matched normal samples. Here, we describe the Tangent method, which uses a unique linear combination of normal samples as a reference for each tumor sample, to subtract systematic errors that vary across samples. We also describe a modification of Tangent, called Pseudo-Tangent, which enables denoising through comparisons between tumor profiles when few normal samples are available.
Results: Tangent normalization substantially increases signal-to-noise ratios (SNRs) compared to conventional normalization methods in both SNP array and WES analyses. Tangent and Pseudo-Tangent normalizations improve the SNR by reducing noise with minimal effect on signal and exceed the contribution of other steps in the analysis such as choice of segmentation algorithm. Tangent and Pseudo-Tangent are broadly applicable and enable more accurate inference of SCNAs from DNA sequencing and array data.
Availability: Tangent is available at https://github.com/broadinstitute/tangent and as a Docker image (https://hub.docker.com/r/broadinstitute/tangent). Tangent is also the normalization method for the copy-number pipeline in Genome Analysis Toolkit 4 (GATK4).
Supplementary information: Supplementary data are available at Bioinformatics online.
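The idea of using a linear combination of normal samples as a per-tumor reference can be sketched with an ordinary least-squares projection. This is an illustrative assumption about how such a reference might be computed, not the Broad Institute implementation; the function name, the array shapes, and the toy data are hypothetical.

```python
import numpy as np


def tangent_like_normalize(tumor, normals):
    """Subtract from `tumor` its least-squares fit by the normal profiles.

    tumor   : array of shape (n_markers,), e.g. log-ratio coverage of one tumor
    normals : array of shape (n_markers, n_normals), one column per normal sample
    Returns (residual, weights), where `weights` defines the linear combination
    of normals used as the reference (an assumption of this sketch).
    """
    tumor = np.asarray(tumor, dtype=float)
    normals = np.asarray(normals, dtype=float)
    # Weights of the combination of normal profiles closest to the tumor profile.
    weights, *_ = np.linalg.lstsq(normals, tumor, rcond=None)
    reference = normals @ weights
    return tumor - reference, weights


# Hypothetical toy data: two "noise" patterns seen in normals, plus a tumor
# signal chosen to be orthogonal to both so it survives the subtraction intact.
normals = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [1.0, 0.0],
                    [0.0, 1.0]])
signal = np.array([1.0, 0.0, -1.0, 0.0])
tumor = 0.5 * normals[:, 0] + 2.0 * normals[:, 1] + signal

residual, weights = tangent_like_normalize(tumor, normals)
```

In this toy case the residual equals the planted signal because the signal was constructed to be orthogonal to the normal profiles; with real data some signal would generally overlap with the reference.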
Categories: Bioinformatics Trends

CovidGraph: A Graph to fight COVID-19

Tue, 30/08/2022 - 5:30am
Abstract
Summary: Reliable and integrated data is a prerequisite for effective research on the recent COVID-19 pandemic. The CovidGraph project integrates and connects heterogeneous COVID-19 data in a knowledge graph, referred to as “CovidGraph”. It provides easy access to multiple data sources through a single point of entry and enables flexible data exploration.
Availability and Implementation: More information on CovidGraph is available from the project website: https://healthecco.org/covidgraph/. Source code and documentation are provided on GitHub: https://github.com/covidgraph.
Supplementary information: Supplementary data is available at Bioinformatics online.
Categories: Bioinformatics Trends

APSCALE: advanced pipeline for simple yet comprehensive analyses of DNA Meta-barcoding data

Sat, 27/08/2022 - 5:30am
Abstract
Summary: DNA metabarcoding is an emerging approach to assess and monitor biodiversity worldwide, and consequently the number and size of data sets are increasing exponentially. To date, no published DNA metabarcoding data processing pipeline exists that i) is platform independent, ii) is easy to use (incl. GUI), iii) is fast (scales well with dataset size), and iv) complies with data protection regulations of, e.g., environmental agencies. The presented pipeline APSCALE meets these requirements and handles the most common tasks of sequence data processing, such as paired-end merging, primer trimming, quality filtering, clustering and denoising of any popular metabarcoding marker, such as ITS (internal transcribed spacer), 16S, or COI (cytochrome c oxidase subunit I). APSCALE comes in a command-line and a GUI version. The latter provides the user with additional summary statistics options and links to GUI-based downstream applications.
Availability: APSCALE is written in Python, a platform-independent language, and integrates functions of the open-source tools VSEARCH (Rognes et al. 2016), cutadapt (Martin et al. 2011) and LULU (Frøslev et al. 2017). All modules support multithreading to allow fast processing of larger DNA metabarcoding datasets. Further information and troubleshooting are provided on the respective GitHub pages for the command line version (https://github.com/DominikBuchner/apscale) and the GUI-based version (https://github.com/TillMacher/apscale_gui), including a detailed tutorial.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

expam—high-resolution analysis of metagenomes using distance trees

Sat, 27/08/2022 - 5:30am
Abstract
Summary: Shotgun metagenomic sequencing provides the capacity to understand microbial community structure and function at unprecedented resolution; however, current analytical methods are constrained by a focus on taxonomic classifications that may obfuscate functional relationships. Here we present expam, a tree-based, taxonomy-agnostic tool for identification of biologically relevant clades from shotgun metagenomic sequencing.
Availability and Implementation: expam is an open-source Python application released under the GNU General Public Licence v3.0. expam installation instructions, source code and tutorials can be found at https://github.com/seansolari/expam.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

DeepToA: An Ensemble Deep-Learning Approach to Predicting the Theater of Activity of a Microbiome

Sat, 27/08/2022 - 5:30am
Abstract
Motivation: Metagenomics is the study of microbiomes using DNA sequencing. A microbiome consists of an assemblage of microbes that is associated with a “theater of activity” (ToA). An important question is, to what degree does the taxonomic and functional content of the former depend on the (details of the) latter? Here we investigate a related technical question: Given a taxonomic and/or functional profile estimated from metagenomic sequencing data, how to predict the associated ToA? We present a deep-learning approach to this question. We use both taxonomic and functional profiles as input. We apply node2vec to embed hierarchical taxonomic profiles into numerical vectors. We then perform dimension reduction using clustering, to address the sparseness of the taxonomic data and thus make the problem more amenable to deep-learning algorithms. Functional features are combined with textual descriptions of protein families or domains. We present an ensemble deep-learning framework DeepToA for predicting the “theater of activity” of a microbial community, based on taxonomic and functional profiles. We use SHAP (SHapley Additive exPlanations) values to determine which taxonomic and functional features are important for the prediction.
Results: Based on 7,560 metagenomic profiles downloaded from MGnify, classified into ten different theaters of activity, we demonstrate that DeepToA has an accuracy of 98.30%. We show that adding textual information to functional features increases the accuracy.
Availability: Our approach is available at http://ab.inf.uni-tuebingen.de/software/deeptoa.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

Aclust2.0: a revamped unsupervised R tool for Infinium methylation beadchips data analyses

Fri, 26/08/2022 - 5:30am
Abstract
Motivation: A wide range of computational packages has been developed for regional DNA methylation analyses of Illumina’s Infinium array data. Aclust, one of the first unsupervised algorithms, was originally designed to analyze regional methylation of Infinium’s 27K and 450K arrays by clustering neighboring methylation sites prior to downstream analyses. However, Aclust relied on outdated packages that rendered it largely non-operational, especially with the newer Infinium EPIC and mouse arrays.
Results: We have created Aclust2.0, a streamlined pipeline that involves five steps for the analyses of human (450K and EPIC) and mouse array data. Aclust2.0 provides a user-friendly and versatile pipeline for regional DNA methylation analyses for molecular epidemiological and mouse studies.
Availability: Aclust2.0 is freely available on GitHub (https://github.com/OluwayioseOA/Alcust2.0.git).
Categories: Bioinformatics Trends

FastMix: A Versatile Data Integration Pipeline for Cell Type-Specific Biomarker Inference

Fri, 26/08/2022 - 5:30am
Abstract
Motivation: Flow cytometry (FCM) and transcription profiling are two widely used assays in translational immunology research. However, there is no data integration pipeline for analyzing these two types of assays together with experiment variables for biomarker inference. Current FCM data analysis mainly relies on subjective manual gating analysis, which is difficult to integrate directly with other automated computational methods. Existing deconvolutional analysis of bulk transcriptomics relies on predefined marker genes in the transcriptomics data, which are unavailable for novel cell types, and does not utilize the FCM data that provide canonical phenotypic definitions of the cell types.
Results: We developed a novel analytics pipeline - FastMix - for computational immunology, which integrates flow cytometry, bulk transcriptomics, and clinical covariates for identifying cell type-specific gene expression signatures and biomarker genes. FastMix addresses the “large p, small n” problem in the gene expression and flow cytometry integration analysis via a linear mixed effects model (LMER) for both cross-sectional and longitudinal studies. Its novel moment-based estimator not only reduces bias in parameter estimation but also is more efficient than iterative optimization. The FastMix pipeline also includes a cutting-edge flow cytometry data analysis method - DAFi - for identifying cell populations of interest and their characteristics. Simulation studies showed that FastMix produced smaller type I/II errors than competing methods. Validation using real data of two vaccine studies showed that FastMix identified a set of signature genes consistent with independent single cell RNA-seq analysis, producing additional interesting findings.
Availability: Source code of FastMix is publicly available at https://github.com/terrysun0302/FastMix.
Supplementary information: Supplementary text and data are available at Bioinformatics online.
Categories: Bioinformatics Trends

hCoCena: Horizontal integration and analysis of transcriptomics datasets

Fri, 26/08/2022 - 5:30am
Abstract
Motivation: Transcriptome-based gene co-expression analysis has become a standard procedure for structured and contextualized understanding and comparison of different conditions and phenotypes. Since large study designs with a broad variety of conditions are costly and laborious, extensive comparisons are hindered when utilizing only a single data set. Thus, there is an increased need for tools that allow the integration of multiple transcriptomic data sets with subsequent joint analysis, which can provide a more systematic understanding of gene co-expression and co-functionality within and across conditions. To make such an integrative analysis accessible to a wide spectrum of users with differing levels of programming expertise it is essential to provide user-friendliness and customizability as well as thorough documentation.
Results: This paper introduces horizontal CoCena (hCoCena: horizontal construction of co-expression networks and analysis), an R-package for network-based co-expression analysis that allows the analysis of a single transcriptomic data set as well as the joint analysis of multiple data sets. With hCoCena we provide a freely available, user-friendly, and adaptable tool for integrative multi-study or single-study transcriptomics analyses alongside extensive comparisons to other existing tools.
Availability: The hCoCena R-package is provided together with R Markdowns that implement an exemplary analysis workflow including extensive documentation and detailed descriptions of data structures and objects. Such efforts not only make the tool easy to use but also enable the seamless integration of user-written scripts and functions into the workflow, creating a tool that provides a clear design while remaining flexible and highly customizable. The package and additional information including an extensive Wiki are freely available on GitHub: https://github.com/MarieOestreich/hCoCena. The version at the time of writing has been added to Zenodo under the following link: https://doi.org/10.5281/zenodo.6911782
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

Propeller: testing for differences in cell type proportions in single cell data

Thu, 25/08/2022 - 5:30am
Abstract
Motivation: Single cell RNA Sequencing (scRNA-seq) has rapidly gained popularity over the last few years for profiling the transcriptomes of thousands to millions of single cells. This technology is now being used to analyse experiments with complex designs including biological replication. One question that can be asked from single cell experiments, which has been difficult to directly address with bulk RNA-seq data, is whether the cell type proportions are different between two or more experimental conditions. As well as gene expression changes, the relative depletion or enrichment of a particular cell type can be the functional consequence of disease or treatment. However, cell type proportion estimates from scRNA-seq data are variable and statistical methods that can correctly account for different sources of variability are needed to confidently identify statistically significant shifts in cell type composition between experimental conditions.
Results: We have developed propeller, a robust and flexible method that leverages biological replication to find statistically significant differences in cell type proportions between groups. Using simulated cell type proportions data we show that propeller performs well under a variety of scenarios. We applied propeller to test for significant changes in cell type proportions related to human heart development, ageing and COVID-19 disease severity.
Availability: The propeller method is publicly available in the open source speckle R package (https://github.com/phipsonlab/speckle). All the analysis code for the paper is available at the associated analysis website: https://phipsonlab.github.io/propeller-paper-analysis/. The speckle package, analysis scripts and datasets have been deposited at https://doi.org/10.5281/zenodo.7009042.
Supplementary information: Supplementary data are available at Bioinformatics online.
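The quantity being compared in the abstract above, per-sample cell type proportions, can be illustrated with a short sketch. It covers only the estimation of proportions for each biological replicate and does not implement the propeller test itself; the function name, sample identifiers, and cell type labels are hypothetical.

```python
from collections import Counter, defaultdict


def cell_type_proportions(cells):
    """Compute per-sample cell type proportions.

    cells: iterable of (sample_id, cell_type) pairs, one pair per cell.
    Returns {sample_id: {cell_type: proportion}}, with every observed cell
    type present for every sample (0.0 where a type was not seen).
    """
    counts = defaultdict(Counter)
    for sample_id, cell_type in cells:
        counts[sample_id][cell_type] += 1
    all_types = sorted({t for c in counts.values() for t in c})
    return {
        sample: {t: c[t] / sum(c.values()) for t in all_types}
        for sample, c in counts.items()
    }


# Hypothetical toy annotation: two samples, two cell types.
toy_cells = [
    ("s1", "T"), ("s1", "T"), ("s1", "B"), ("s1", "B"),
    ("s2", "T"), ("s2", "T"), ("s2", "T"), ("s2", "B"),
]
proportions = cell_type_proportions(toy_cells)
```

Proportions computed this way for each replicate would be the input to a downstream statistical comparison between groups.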
Categories: Bioinformatics Trends

LipidMS 3.0: an R-package and a web-based tool for LC-MS/MS data processing and lipid annotation

Thu, 25/08/2022 - 5:30am
Abstract
Motivation: LipidMS was initially envisioned to use fragmentation rules and data-independent acquisition (DIA) for lipid annotation. However, data-dependent acquisition (DDA) remains the most widespread acquisition mode for untargeted LC-MS/MS-based lipidomics. Here we present LipidMS 3.0, an R package that not only adds DDA and new lipid classes to its pipeline, but also the required functionalities to cover the whole data analysis workflow from pre-processing (i.e., peak-picking, alignment and grouping) to lipid annotation.
Results: We applied the new workflow in the data analysis of a commercial human serum pool spiked with 68 representative lipid standards acquired in full scan, DDA and DIA modes. When focusing on the detected lipid standard features and total identified lipids, LipidMS 3.0 data pre-processing performance is similar to XCMS, whereas it complements the annotations returned by MS-DIAL, providing a higher level of structural information and a lower number of incorrect annotations. To extend and facilitate LipidMS 3.0 usage among less experienced R-programming users, the workflow was also implemented as a web-based application.
Availability: The LipidMS R-package is freely available at https://CRAN.R-project.org/package=LipidMS and as a website at http://www.lipidms.com.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

Microbench: Automated metadata management for systems biology benchmarking and reproducibility in Python

Wed, 24/08/2022 - 5:30am
Abstract
Motivation: Computational systems biology analyses typically make use of multiple software packages and their dependencies, which are often run across heterogeneous compute environments. This can introduce differences in performance and reproducibility. Capturing metadata (e.g., package versions, GPU model) currently requires repetitious code and is difficult to store centrally for analysis. Even where virtual environments and containers are used, updates over time mean that versioning metadata should still be captured within analysis pipelines to guarantee reproducibility.
Results: Microbench is a simple and extensible Python package to automate metadata capture to a file or Redis database. Captured metadata can include execution time, software package versions, environment variables, hardware information, Python version, and more, with plugins. We present three case studies demonstrating Microbench usage to benchmark code execution and examine environment metadata for reproducibility purposes.
Availability: Install from the Python Package Index using pip install microbench. Source code is available from https://github.com/alubbock/microbench.
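To make the kind of metadata capture described above concrete, the following standard-library sketch records execution time, Python version, platform, and selected package versions as one JSON line per function call. It does not use or reproduce the Microbench API; the decorator name `capture_metadata` and its parameters are hypothetical.

```python
import functools
import io
import json
import platform
import sys
import time
from importlib import metadata


def _version_or_none(name):
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def capture_metadata(outfile, packages=()):
    """Decorator factory: write one JSON line of run metadata per call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Recorded even if the wrapped function raises.
                record = {
                    "function": func.__name__,
                    "elapsed_seconds": time.perf_counter() - start,
                    "python_version": platform.python_version(),
                    "platform": sys.platform,
                    "package_versions": {p: _version_or_none(p) for p in packages},
                }
                outfile.write(json.dumps(record) + "\n")
        return wrapper
    return decorator


# Usage with an in-memory log; a real pipeline might pass an open file instead.
log = io.StringIO()


@capture_metadata(log, packages=("pip",))
def simulate(n):
    return sum(i * i for i in range(n))


result = simulate(1000)
record = json.loads(log.getvalue().splitlines()[0])
```

Writing one JSON object per line keeps the log appendable from many runs and easy to load later for comparison across environments.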
Categories: Bioinformatics Trends

Multi-way relation-enhanced hypergraph representation learning for anti-cancer drug synergy prediction

Wed, 24/08/2022 - 5:30am
Abstract
Motivation: Drug combinations have exhibited promise in treating cancers with less toxicity and fewer adverse reactions. However, in vitro screening of synergistic drug combinations is time-consuming and labour-intensive because of the combinatorial explosion. Although a number of computational methods have been developed for predicting synergistic drug combinations, the multi-way relations between drug combinations and cell lines existing in drug synergy data have not been well exploited.
Results: We propose a multi-way relation-enhanced hypergraph representation learning method to predict anti-cancer drug synergy, named HypergraphSynergy. HypergraphSynergy formulates synergistic drug combinations over cancer cell lines as a hypergraph, in which drugs and cell lines are represented by nodes and synergistic drug-drug-cell line triplets are represented by hyperedges, and leverages the biochemical features of drugs and cell lines as node attributes. Then, a hypergraph neural network is designed to learn the embeddings of drugs and cell lines from the hypergraph and predict drug synergy. Moreover, the auxiliary task of reconstructing the similarity networks of drugs and cell lines is considered to enhance the generalization ability of the model. In the computational experiments, HypergraphSynergy outperforms other state-of-the-art synergy prediction methods on two benchmark datasets for both classification and regression tasks, and is applicable to unseen drug combinations or cell lines. The studies revealed that the hypergraph formulation allows us to capture and explain complex multi-way relations of drug combinations and cell lines, and also provides a flexible framework to make the best use of diverse information.
Availability and implementation: The source data and codes of HypergraphSynergy can be freely downloaded from https://github.com/liuxuan666/HypergraphSynergy.
Supplementary information: Supplementary data are available at Bioinformatics online.
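The hypergraph formulation described above, with drugs and cell lines as nodes and each synergistic drug-drug-cell line triplet as one hyperedge, can be sketched as a node-by-hyperedge incidence matrix. This is only an illustration of the data structure under assumed naming; the function name and the drug and cell line labels are hypothetical and are not taken from the HypergraphSynergy code.

```python
def build_incidence(triplets):
    """Build a hypergraph incidence matrix from (drug, drug, cell_line) triplets.

    Returns (nodes, H) where `nodes` is the sorted list of node names and
    H[i][j] == 1 if node i belongs to hyperedge j (the j-th triplet), else 0.
    """
    nodes = sorted({name for triplet in triplets for name in triplet})
    index = {name: i for i, name in enumerate(nodes)}
    H = [[0] * len(triplets) for _ in nodes]
    for j, triplet in enumerate(triplets):
        for name in triplet:
            H[index[name]][j] = 1
    return nodes, H


# Hypothetical synergistic triplets.
toy_triplets = [
    ("drugA", "drugB", "cellX"),
    ("drugA", "drugC", "cellX"),
]
nodes, H = build_incidence(toy_triplets)

# Node degree: the number of hyperedges each node participates in.
degrees = {name: sum(row) for name, row in zip(nodes, H)}
```

An incidence matrix of this form is a common input representation for hypergraph neural networks, with node attributes stored in a separate feature matrix aligned to `nodes`.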
Categories: Bioinformatics Trends

Predicting cross-tissue hormone-gene relations using balanced word embeddings

Wed, 24/08/2022 - 5:30am
Abstract
Motivation: Inter-organ/inter-tissue communication is central to multi-cellular organisms including humans, and mapping inter-tissue interactions can advance system-level whole-body modeling efforts. Large volumes of biomedical literature have fostered studies that map within-tissue or tissue-agnostic interactions, but literature mining studies that infer inter-tissue relations such as between hormones and genes are sorely missing.
Results: We present a first study to predict from biomedical literature the hormone-gene associations mediating inter-tissue signaling in the human body. Our BioEmbedS* models use neural network based Biomedical word Embeddings with a Support Vector Machine classifier to predict if a hormone-gene pair is associated or not, and whether an associated gene is involved in the hormone's production or response. Model training relies on our unified dataset HGv1 (Hormone-Gene version 1) of ground-truth associations between genes and endocrine hormones, which we compiled and carefully balanced in the embedded space to handle data disparities such as between poorly- vs. well-studied hormones. Our BioEmbedS model recapitulates known gene mediators of tissue-tissue signaling with 70.4% accuracy; predicts novel inter-tissue communication genes in humans which are enriched for hormone-related disorders; and generalizes well to mouse, thereby holding promise for its extension to other multi-cellular organisms as well.
Availability: Our model predictions and datasets are freely available at https://cross-tissue-signaling.herokuapp.com; all relevant code is at https://github.com/BIRDSgroup/BioEmbedS.
Supplementary information: Supplementary information is available at Bioinformatics online.
Categories: Bioinformatics Trends
