Subscribe to Bioinformatics Oxford Journals feed

AEON.py: Python Library for Attractor Analysis in Asynchronous Boolean Networks

Wed, 14/09/2022 - 5:30am
Abstract
Summary: AEON.py is a Python library for the analysis of long-term behaviour in very large asynchronous Boolean networks. It provides significant computational improvements over state-of-the-art methods for attractor detection. Furthermore, it admits the analysis of partially specified Boolean networks with uncertain update functions. It also includes techniques for identifying viable source-target control strategies and for assessing their robustness with respect to parameter perturbations.
Availability and implementation: All relevant results are available in the supplementary materials. The tool is accessible through https://github.com/sybila/biodivine-aeon-py.
Supplementary information: Supplementary data are available online through Bioinformatics.
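For intuition only: under asynchronous semantics one variable changes per step, and the attractors are the terminal strongly connected components of the resulting state-transition graph. The sketch below enumerates the state space of a tiny, hypothetical three-variable network by brute force. It does not use the AEON.py API or its algorithms for very large networks; the network and all function names are illustrative assumptions.

```python
from itertools import product

# Hypothetical toy network (an assumption for illustration, not from AEON.py):
#   a := b,  b := a,  c := a AND NOT c
UPDATE_FUNCTIONS = (
    lambda s: s[1],
    lambda s: s[0],
    lambda s: int(s[0] and not s[2]),
)


def successors(state, update_functions):
    """Asynchronous semantics: each successor changes exactly one variable."""
    for i, f in enumerate(update_functions):
        new_value = f(state)
        if new_value != state[i]:
            yield state[:i] + (new_value,) + state[i + 1:]


def reachable(state, update_functions):
    """All states reachable from `state`, including `state` itself."""
    seen = {state}
    stack = [state]
    while stack:
        current = stack.pop()
        for nxt in successors(current, update_functions):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return frozenset(seen)


def attractors(update_functions):
    """Terminal strongly connected components, found by brute force."""
    n = len(update_functions)
    reach = {s: reachable(s, update_functions) for s in product((0, 1), repeat=n)}
    # A state lies in an attractor iff every state it can reach can reach it back.
    return {r for s, r in reach.items() if all(s in reach[t] for t in r)}


if __name__ == "__main__":
    for attractor in attractors(UPDATE_FUNCTIONS):
        print(sorted(attractor))
```

Explicit enumeration visits all 2^n states, so it is only feasible for a handful of variables, which is why dedicated tools are needed for the very large networks the abstract refers to.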
Categories: Bioinformatics Trends

ScanExitronLR: characterization and quantification of exitron splicing events in long-read RNA-seq data

Tue, 13/09/2022 - 5:30am
Abstract
Summary: Exitron splicing is a type of alternative splicing where coding sequences are spliced out. Recently, exitron splicing has been shown to increase proteome plasticity and play a role in cancer. Long-read RNA-seq is well suited for quantification and discovery of alternative splicing events; however, there are currently no tools available for detection and annotation of exitrons in long-read RNA-seq data. Here we present ScanExitronLR, an application for the characterization and quantification of exitron splicing events in long reads. From a BAM alignment file, reference genome and reference gene annotation, ScanExitronLR outputs exitron events at the individual transcript level. Outputs of ScanExitronLR can be used in downstream analyses of differential exitron splicing. In addition, ScanExitronLR optionally reports exitron annotations such as truncation or frameshift type, nonsense-mediated decay status, and Pfam domain interruptions. We demonstrate that ScanExitronLR performs better on noisy long reads than currently published exitron detection algorithms designed for short-read data.
Availability: ScanExitronLR is freely available at https://github.com/ylab-hi/ScanExitronLR and distributed as a pip package on the Python Package Index.
Supplementary information: Supplementary data are available at Bioinformatics online.
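The core idea, a splice junction that falls entirely inside an annotated exon, can be sketched without any alignment library. The example below is a minimal illustration under stated assumptions: it parses a CIGAR string directly, treats `N` operations as splice junctions, and uses hypothetical exon coordinates. It is not ScanExitronLR's implementation, which works from BAM files and handles noisy reads and annotation.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance the reference position


def junctions(ref_start, cigar):
    """Reference intervals (0-based, half-open) skipped by N operations."""
    position = ref_start
    found = []
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            found.append((position, position + length))
        if op in REF_CONSUMING:
            position += length
    return found


def exitron_candidates(ref_start, cigar, exons):
    """Junctions lying strictly inside a single annotated exon.

    `exons` is a list of (start, end) intervals; the names and the strict
    containment rule are assumptions made for this sketch.
    """
    return [
        (start, end)
        for start, end in junctions(ref_start, cigar)
        for exon_start, exon_end in exons
        if exon_start < start and end < exon_end
    ]


if __name__ == "__main__":
    exons = [(90, 210), (400, 460)]  # hypothetical annotation
    print(exitron_candidates(100, "50M30N20M200N40M", exons))
```

In this toy read, the first skipped interval sits inside the first exon and is reported, while the second spans the gap between the two exons and is treated as an ordinary intron.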
Categories: Bioinformatics Trends

MGPLI: Exploring Multigranular Representations for Protein-Ligand Interaction Prediction

Mon, 12/09/2022 - 5:30am
Abstract
Motivation: Predicting the binding affinity of a potential drug against a protein target has always been a fundamental challenge in in silico drug discovery. Traditional in vitro and in vivo experiments are costly and time-consuming because they must search a large compound space. Recent years have witnessed significant success of deep learning-based models on the drug-target binding affinity (DTA) prediction task.
Results: Following the recent success of the Transformer model, we propose a multi-granularity protein ligand interaction (MGPLI) model, which adopts Transformer encoders to represent character-level and fragment-level features, modeling the possible interaction between residues and atoms or their segments. In addition, we use a Convolutional Neural Network (CNN) to extract higher-level features from the Transformer encoder outputs and a highway layer to fuse the protein and drug features. We evaluate MGPLI on different protein ligand interaction datasets and show improved prediction performance compared to state-of-the-art baselines.
Availability: The model scripts are available at https://github.com/IILab-Resource/MGDTA.git
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

Systematic comparison of ranking aggregation methods for gene lists in experimental results

Mon, 12/09/2022 - 5:30am
Abstract
Motivation: A common experimental output in biomedical science is a list of genes implicated in a given biological process or disease. The gene lists resulting from a group of studies answering the same, or similar, questions can be combined by ranking aggregation methods to find a consensus or a more reliable answer. Evaluating a ranking aggregation method on a specific type of data before using it is necessary to establish its reliability, since the properties of a dataset can influence the performance of an algorithm. Such evaluation on gene lists is usually based on a simulated database because of the lack of a known truth for real data. However, simulated datasets tend to be too small compared to experimental data and neglect key features, including heterogeneity of quality, relevance and the inclusion of unranked lists.
Results: In this study, a group of existing methods and their variations which are suitable for meta-analysis of gene lists are compared using simulated and real data. Simulated data were used to explore the performance of the aggregation methods under scenarios that emulate real genomic data, with varying heterogeneity of quality, noise level, and a mix of unranked and ranked data, using 20000 possible entities. In addition to the evaluation with simulated data, a comparison using real genomic data on the SARS-CoV-2 virus, cancer (NSCLC), and bacteria (macrophage apoptosis) was performed. We summarise the results of our evaluation in a simple flowchart to select a ranking aggregation method, and in an automated implementation using the meta-analysis by information content (MAIC) algorithm to infer heterogeneity of data quality across input data sets.
Availability: The code for simulated data generation and for running the edited versions of the algorithms: https://github.com/baillielab/comparison_of_RA_methods. Code to perform an optimal selection of methods based on the results of this review, using the MAIC algorithm to infer the characteristics of an input dataset, can be downloaded here: https://github.com/baillielab/maic. An online service for running MAIC: https://baillielab.net/maic.
Supplementary information: Supplementary file 1 is available at Bioinformatics online. Supporting data 1-7 (supporting results and collected real genomic data) are available on GitHub at: https://github.com/baillielab/comparison_of_RA_methods.
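To make "ranking aggregation" concrete, here is a minimal sketch of a Borda count, a generic textbook baseline for combining ranked lists. The abstract does not say whether Borda count is among the methods compared, and this is not the MAIC algorithm; the gene lists and the scoring convention (an item at position r in a list of length n earns n - r points) are assumptions for illustration.

```python
from collections import defaultdict


def borda(ranked_lists):
    """Aggregate ranked lists by summing position-based points.

    Genes absent from a list earn no points from it. Ties in the total
    are broken alphabetically so the output is deterministic.
    """
    scores = defaultdict(float)
    for genes in ranked_lists:
        n = len(genes)
        for position, gene in enumerate(genes):
            scores[gene] += n - position
    return sorted(scores, key=lambda gene: (-scores[gene], gene))


if __name__ == "__main__":
    studies = [  # hypothetical input lists, most significant gene first
        ["TP53", "EGFR", "KRAS"],
        ["EGFR", "TP53"],
        ["KRAS", "EGFR", "ALK"],
    ]
    print(borda(studies))
```

A scheme like this treats every list as equally trustworthy, which is exactly the limitation the abstract highlights when it stresses heterogeneity of quality across input data sets.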
Categories: Bioinformatics Trends

MODIG: Integrating Multi-Omics and Multi-Dimensional Gene Network for Cancer Driver Gene Identification based on Graph Attention Network Model

Mon, 12/09/2022 - 5:30am
Abstract
Motivation: Identifying genes that play a causal role in cancer evolution remains one of the biggest challenges in cancer biology. With the accumulation of high-throughput multi-omics data over decades, it becomes a great challenge to effectively integrate these data into the identification of cancer driver genes.
Results: Here, we propose MODIG, a graph attention network (GAT)-based framework to identify cancer driver genes by combining multi-omics pan-cancer data (mutations, copy number variants, gene expression, and methylation levels) with multi-dimensional gene networks. First, we established diverse types of gene relationship maps based on protein-protein interactions (PPI), gene sequence similarity, KEGG pathway co-occurrence, gene co-expression patterns, and Gene Ontology (GO). Then, we constructed a multi-dimensional gene network consisting of approximately 20,000 genes as nodes and five types of gene associations as multiplex edges. We applied a GAT to model within-dimension interactions to generate a gene representation for each dimension based on this graph. Moreover, we introduced a joint learning module to fuse multiple dimension-specific representations to generate general gene representations. Finally, we used the obtained gene representation to perform a semi-supervised driver gene identification task. The experiment results show that MODIG outperforms the baseline models in terms of area under precision-recall curves (AUPR) and area under the receiver operating characteristic curves (AUROC).
Availability: The MODIG program is available at https://github.com/zjupgx/modig. The code and data underlying this article are also available on Zenodo, at https://doi.org/10.5281/zenodo.7057241.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

CANTATA - prediction of missing links in Boolean networks using genetic programming

Mon, 12/09/2022 - 5:30am
Abstract
Motivation: Biological processes are complex systems with distinct behaviour. Despite the growing amount of available data, knowledge is sparse and often insufficient to investigate the complex regulatory behaviour of these systems. Moreover, different cellular phenotypes are possible under varying conditions. Mathematical models attempt to unravel these mechanisms by investigating the dynamics of regulatory networks. Therefore, a major challenge is to combine regulations and phenotypical information as well as the underlying mechanisms. To predict regulatory links in these models, we established an approach called CANTATA to support the integration of information into regulatory networks and retrieve potential underlying regulations. This is achieved by optimising both static and dynamic properties of these networks.
Results: Initial results show that the algorithm predicts missing interactions by recapitulating the known phenotypes while preserving the original topology and optimising the robustness of the model. The resulting models allow for hypothesising about the biological impact of certain regulatory dependencies.
Availability: Source code of the application, example files and results are available at https://github.com/sysbio-bioinf/Cantata.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

ASTRAL-Pro 2: ultrafast species tree reconstruction from multi-copy gene family trees

Mon, 12/09/2022 - 5:30am
Abstract
Motivation: Species tree inference from multi-copy gene trees has long been a challenge in phylogenomics. The recent method ASTRAL-Pro has made large strides by enabling multi-copy gene family trees as input and has been quickly adopted. Yet, its scalability, especially memory usage, needs to improve to accommodate the ever-growing dataset size.
Results: We present ASTRAL-Pro 2, an ultrafast and memory efficient version of ASTRAL-Pro that adopts a placement-based optimization algorithm for significantly better scalability without sacrificing accuracy.
Availability: The source code and binary files are publicly available at https://github.com/chaoszhang/ASTER; data are available at https://github.com/chaoszhang/A-Pro2_data.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

PHi-C2: interpreting Hi-C data as the dynamic 3D genome state

Sat, 10/09/2022 - 5:30am
Abstract
Summary: High-throughput chromosome conformation capture (Hi-C) is a widely used assay for studying the three-dimensional (3D) genome organization across the whole genome. Here, we present PHi-C2, a Python package supported by mathematical and biophysical polymer modeling that converts input Hi-C matrix data into the polymer model’s dynamics, structural conformations, and rheological features. The updated optimization algorithm for regenerating a highly similar Hi-C matrix provides a fast and accurate optimal solution compared to the previous version by eliminating the factors underlying the inefficiency of the optimization algorithm in the iterative optimization process. In addition, we have enabled a Google Colab workflow to run the algorithm, wherein users can easily change the parameters and check the results in the notebook. Overall, PHi-C2 represents a valuable tool for mining the dynamic 3D genome state embedded in Hi-C data.
Availability and implementation: PHi-C2 as the phic Python package is freely available under the GPL license and can be installed from the Python package index. The source code is available from GitHub at https://github.com/soyashinkai/PHi-C2. Moreover, users do not have to prepare a Python environment because PHi-C2 can run on Google Colab (https://bit.ly/3rlptGI).
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

A machine learning-based method for automatically identifying novel cells in annotating single cell RNA-seq data

Fri, 09/09/2022 - 5:30am
Abstract
Motivation: Single cell RNA sequencing (scRNA-seq) has been widely used to decompose complex tissues into functionally distinct cell types. The first and usually the most important step of scRNA-seq data analysis is to accurately annotate the cell labels. In recent years, many supervised annotation methods have been developed and shown to be more convenient and accurate than unsupervised cell clustering. One challenge faced by all the supervised annotation methods is the identification of novel cell types, defined as cell types that are absent from the training data and exist only in the testing data. Existing methods usually label the cells simply based on the correlation coefficients or confidence scores, which sometimes results in an excessive number of unlabeled cells.
Results: We developed a straightforward yet effective method combining an autoencoder with iterative feature selection to automatically identify novel cells from scRNA-seq data. Our method trains an autoencoder with the labeled training data and applies the autoencoder to the testing data to obtain reconstruction errors. By iteratively selecting features that demonstrate a bi-modal pattern and reclustering the cells using the selected features, our method can accurately identify novel cells that are not present in the training data. We further combined this approach with a support vector machine to provide a complete solution for annotating the full range of cell types. Extensive numerical experiments using five real scRNA-seq datasets demonstrated favorable performance of the proposed method over existing methods serving similar purposes.
Availability: Our R software package CAMLU is publicly available through the Zenodo repository (https://doi.org/10.5281/zenodo.7054422) or GitHub repository (https://github.com/ziyili20/CAMLU).
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

dICC: Distance-based Intraclass Correlation Coefficient for Metagenomic Reproducibility Studies

Fri, 09/09/2022 - 5:30am
Abstract
Summary: Because of their sparsity and high dimensionality, microbiome data are routinely summarized into pairwise distances capturing the compositional differences. Many biological insights can be gained by analyzing the distance matrix in relation to some covariates. A microbiome sampling method that characterizes the inter-sample relationship more reproducibly is expected to yield higher statistical power. Traditionally, the intra-class correlation coefficient (ICC) has been used to quantify the degree of reproducibility for a univariate measurement using technical replicates. In this work, we extend the traditional ICC to distance measures and propose a distance-based ICC (dICC). We derive the asymptotic distribution of the sample-based dICC to facilitate statistical inference. We illustrate dICC using a real dataset from a metagenomic reproducibility study.
Availability: dICC is implemented in the R CRAN package GUniFrac.
Supplementary information: Supplementary data are available at Bioinformatics online.
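As background for the "traditional ICC" that the abstract takes as its starting point, the sketch below computes the classical one-way random-effects ICC for a univariate measurement with technical replicates, ICC = (MSB - MSW) / (MSB + (k - 1) MSW). It is not the distance-based dICC and not the GUniFrac implementation; the function name and the example numbers are assumptions, and the code assumes every sample has the same number of replicates.

```python
def icc_oneway(measurements):
    """Classical one-way ICC for rows of equal-length technical replicates.

    measurements[i][j] is replicate j of sample i.
    """
    n = len(measurements)          # number of samples
    k = len(measurements[0])       # replicates per sample (assumed equal)
    grand_mean = sum(sum(row) for row in measurements) / (n * k)
    sample_means = [sum(row) / k for row in measurements]

    # Between-sample and within-sample mean squares from one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in sample_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(measurements, sample_means) for x in row
    ) / (n * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


if __name__ == "__main__":
    # Hypothetical data: three samples, two technical replicates each.
    print(icc_oneway([[1.0, 1.2], [2.0, 2.1], [3.1, 2.9]]))
```

Values near 1 indicate that replicates of the same sample agree far better than measurements from different samples, which is the notion of reproducibility the abstract extends to distance matrices.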
Categories: Bioinformatics Trends

The phers R package: using phenotype risk scores based on electronic health records to study Mendelian disease and rare genetic variants

Fri, 09/09/2022 - 5:30am
Abstract
Electronic health record (EHR) data linked to DNA biobanks are a valuable resource for understanding the phenotypic effects of human genetic variation. We previously developed the phenotype risk score (PheRS) as an approach to quantify the extent to which a patient’s clinical features resemble a given Mendelian disease. Using PheRS, we have uncovered novel associations between Mendelian disease-like phenotypes and rare genetic variants, and identified patients who may have undiagnosed Mendelian disease. Although the PheRS approach is conceptually simple, it involves multiple mapping steps and was previously only available as custom scripts, limiting the approach’s usability. Thus, we developed the phers R package, a complete and user-friendly set of functions and maps for performing a PheRS-based analysis on linked clinical and genetic data. The package includes up-to-date maps between EHR-based phenotypes (i.e., ICD codes and phecodes), human phenotype ontology (HPO) terms, and Mendelian diseases. Starting with occurrences of ICD codes, the package enables the user to calculate phenotype risk scores, validate the scores using case-control analyses, and perform genetic association analyses. By increasing PheRS’s transparency and usability, the phers R package will help improve our understanding of the relationships between rare genetic variants and clinically meaningful human phenotypes.
Availability: The phers R package is free and open-source, and available on CRAN and at https://phers.hugheylab.org.
Supplementary information: Supplementary data are available at Bioinformatics online.
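The scoring step can be sketched in a few lines. The example below assumes the commonly described PheRS formulation, in which each disease-relevant phenotype a patient has contributes a weight equal to the log of its inverse prevalence in the cohort. The use of base-10 logarithms, the placeholder phecodes, and the function names are assumptions for illustration; this is not the phers package API, and the ICD-to-phecode and HPO mapping steps are omitted.

```python
import math


def phecode_weights(cohort):
    """Weight each phecode by -log10 of its prevalence in the cohort.

    `cohort` maps a patient id to the set of phecodes observed for them.
    The log base is an assumption of this sketch.
    """
    n_patients = len(cohort)
    counts = {}
    for codes in cohort.values():
        for code in codes:
            counts[code] = counts.get(code, 0) + 1
    return {code: -math.log10(c / n_patients) for code, c in counts.items()}


def risk_score(patient_codes, disease_codes, weights):
    """Sum the weights of the disease-relevant phecodes the patient has."""
    return sum(weights[c] for c in patient_codes & disease_codes if c in weights)


if __name__ == "__main__":
    cohort = {  # hypothetical patients and placeholder phecodes
        "p1": {"X1", "X2"},
        "p2": {"X1"},
        "p3": {"X1", "X3"},
        "p4": {"X1"},
    }
    weights = phecode_weights(cohort)
    disease = {"X1", "X2"}  # hypothetical phecodes mapped to one disease
    for pid, codes in cohort.items():
        print(pid, round(risk_score(codes, disease, weights), 3))
```

Rare phenotypes dominate the score under this weighting: a code seen in every patient contributes nothing, while one seen in a quarter of the cohort contributes log10(4).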
Categories: Bioinformatics Trends

dsMTL - a computational framework for privacy-preserving, distributed multi-task machine learning

Thu, 08/09/2022 - 5:30am
Abstract
Motivation: In multi-cohort machine learning studies, it is critical to differentiate between effects that are reproducible across cohorts and those that are cohort-specific. Multi-task learning (MTL) is a machine learning approach that facilitates this differentiation through the simultaneous learning of prediction tasks across cohorts. Since multi-cohort data often cannot be combined into a single storage solution, an MTL application for geographically distributed data sources would have substantial utility.
Results: Here, we describe the development of “dsMTL”, a computational framework for privacy-preserving, distributed multi-task machine learning that includes three supervised algorithms and one unsupervised algorithm. First, we derive the theoretical properties of these methods and the relevant machine learning workflows to ensure the validity of the software implementation. Second, we implement dsMTL as a library for the R programming language, building on the DataSHIELD platform that supports the federated analysis of sensitive individual-level data. Third, we demonstrate the applicability of dsMTL for comorbidity modeling in distributed data. We show that comorbidity modeling using dsMTL outperformed conventional, federated machine learning, as well as the aggregation of multiple models built on the distributed datasets individually. The application of dsMTL was computationally efficient and highly scalable when applied to moderate-size (n < 500), real expression data given the actual network latency.
Availability: dsMTL is freely available at https://github.com/transbioZI/dsMTLBase (server-side package) and https://github.com/transbioZI/dsMTLClient (client-side package).
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

VRPharmer: Bringing Virtual Reality into Pharmacophore-based Virtual Screening with Interactive Exploration and Realistic Visualization

Thu, 08/09/2022 - 5:30am
Abstract
Summary: Current pharmacophore-based virtual screening (VS) software has limited interactive capabilities and less intuitive screening processes. In this study, a novel tool named VRPharmer is proposed to perform the entire VS workflow in VR environments. VRPharmer enables users to interactively perceive computation processes and immersively observe molecular structures. Besides a typical screening mode (OPT mode), VRPharmer provides a unique interactive screening mode (SCORE mode) for freely exploring the optimal binding poses. Pharmacophore models are editable to study the impact of each feature and further refine the screening results. Moreover, molecular rendering algorithms are improved for precise representations.
Availability and implementation: VRPharmer is open-source software under the MIT license. The released version is available at https://github.com/VRPharmer/VRPharmer.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

Icolos: A workflow manager for structure based post-processing of de novo generated small molecules

Thu, 08/09/2022 - 5:30am
Abstract
Summary: We present Icolos, a workflow manager written in Python as a tool for automating complex structure-based workflows for drug design. Icolos can be used as a standalone tool, for example in virtual screening campaigns, or can be used in conjunction with deep learning-based molecular generation facilitated for example by REINVENT, a previously published molecular de novo design package. In this publication, we focus on the internal structure and general capabilities of Icolos, using molecular docking experiments as an illustrative example.
Availability: The source code is freely available at https://github.com/MolecularAI/Icolos under the Apache 2.0 licence. Tutorial notebooks containing minimal working examples can be found at https://github.com/MolecularAI/IcolosCommunity.
Supplementary information: A detailed description of the package, including common use cases, a full list of supported steps, and implementation details is provided in the SI.
Categories: Bioinformatics Trends

Access to ground truth at unconstrained size makes simulated data as indispensable as experimental data for bioinformatics methods development and benchmarking

Thu, 08/09/2022 - 5:30am
Abstract
Motivation: The author instructions of this journal require that new methodology includes assessment on “actual biological data” as opposed to simulated data. When available in high quantity, fidelity, and generality, experimental (biological) data may ensure assessment validity. For many bioinformatics application areas, however, experimental data are not available at a sufficient scale or with sufficient access to ground truth to allow conclusive assessment.
Results: We argue that simulated and experimental data should be considered of equal importance in bioinformatics methods assessment, filling complementary roles. Simulations calibrated by experimental data offer the opportunity to assess and improve methodology early on, so that methods may reach a mature level by the time the capacity to generate large-scale experimental data has been established. Furthermore, the specification of a simulation algorithm contributes to transparency by revealing a method’s data assumptions. In summary, we argue that new bioinformatics methods should be validated on ground truth data that allow valid and reliable assessment, be it experimental or simulated (and ideally both).
Categories: Bioinformatics Trends

Correction to: Topological analysis as a tool for detection of abnormalities in protein–protein interaction data

Thu, 08/09/2022 - 5:30am
This is a correction to: Alicja W Nowakowska, Malgorzata Kotulska, Topological analysis as a tool for detection of abnormalities in protein–protein interaction data, Bioinformatics, Volume 38, Issue 16, 15 August 2022, Pages 3968–3975, https://doi.org/10.1093/bioinformatics/btac440
Categories: Bioinformatics Trends

DRviaSPCN: a software package for drug repurposing in cancer via a subpathway crosstalk network

Tue, 06/09/2022 - 5:30am
Abstract
Summary: Drug repurposing is an approach used to discover new indications for existing drugs. Recently, several computational approaches have been developed for drug repurposing in cancer. Nevertheless, no approaches have reported a systematic analysis of pathway crosstalk. Pathway crosstalk, which refers to the phenomenon of interaction or cooperation between pathways, is a critical aspect of tumor pathways that allows cancer cells to survive and acquire resistance to drug therapy. Here, we developed a systems biology R-based software package, DRviaSPCN, to repurpose drugs for cancer via a subpathway (SP) crosstalk network. This package provides a novel approach to prioritize cancer candidate drugs by considering drug-induced SPs and their crosstalk effects. The operation modes mainly include construction of the SP network and calculation of the centrality scores of SPs to reflect the influence of SP crosstalk; calculation of enrichment scores of drug- and disease-induced dysfunctional SPs, weighted by the centrality scores of SPs; evaluation of the drug-disease reverse association at the weighted SP level; identification of cancer candidate drugs; and visualization of the results. These capabilities enable DRviaSPCN to find cancer candidate drugs, complementing recent tools that do not consider crosstalk among pathways/SPs. DRviaSPCN may help to facilitate drug discovery.
Availability and implementation: The package is implemented in R and available under the GPL-2 license from the CRAN website (https://CRAN.R-project.org/package=DRviaSPCN).
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

Manifold-Classification of Neuron Types from Microscopic Images

Tue, 06/09/2022 - 5:30am
Abstract
Analysis of cell types is recognized as a major task in current single cell genotyping and phenotyping. In neuroscience, 3-D neuron morphologies are often reconstructed from multi-dimensional microscopic images. Recent studies indicate that neurons could form very complicated distributions in the feature space, and thus they can be explored using manifold analysis. We have developed manifold-classification toolkit (MCT) software to replace the conventional clustering analysis and discover cell subtypes from three state-of-the-art collections of single neurons’ 3-D morphologies that were reconstructed from images. We have gathered 9,208 3-D spatially registered whole mouse brain neurons from three datasets with the highest quality to date generated by the single neuron morphology community. To explore the manifold distribution, our method uses minimum spanning tree based principal skeletons to approximate locally linear embeddings of the morphological feature spaces, which correspond to dendritic arbors, axonal arbors, or both categories of arborization patterns of neurons. We show that manifold classification is a suitable approach for a majority of commonly referenced cell types, each of which was also discovered to contain multiple subtypes. Our results show an initial effort to employ manifold classification, rather than traditional clustering analysis, as an alternative framework for analyzing 3-D neuron morphologies reconstructed from 3-D microscopic images.
Availability: Freely available at https://github.com/Mr-strlen/Cell_Pattern_Analysis_Tool.
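The abstract's method builds on minimum spanning trees over points in a morphological feature space. As a rough illustration of that single building block, the sketch below runs a simple form of Prim's algorithm on a few hypothetical feature vectors using Euclidean distance. Extracting principal skeletons from the tree and relating them to locally linear embeddings are not shown, and nothing here reflects the MCT code itself.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph over `points`.

    Returns a list of (i, j) index pairs, one edge per added point.
    This simple version rescans all candidate edges on every step.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Cheapest edge that connects the tree to a point outside it.
        _, i, j = min(
            (math.dist(points[i], points[j]), i, j)
            for i in in_tree
            for j in range(n)
            if j not in in_tree
        )
        edges.append((i, j))
        in_tree.add(j)
    return edges


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors, one per neuron.
    features = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 5.0)]
    print(minimum_spanning_tree(features))
```

Long paths through such a tree trace the elongated, curved structure of the data, which is the intuition behind using it as a skeleton for a manifold instead of assigning points to compact clusters.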
Categories: Bioinformatics Trends

TVAR: Assessing Tissue-specific Functional Effects of Non-coding Variants with Deep Learning

Mon, 05/09/2022 - 5:30am
Abstract
Motivation: Analysis of whole-genome sequencing (WGS) for genetics is still a challenge due to the lack of accurate functional annotation of noncoding variants, especially the rare ones. As eQTLs have been extensively implicated in the genetics of human diseases, we hypothesize that rare noncoding variants discovered in WGS play a regulatory role in predisposing disease risk.
Results: With thousands of tissue- and cell-type-specific epigenomic features, we propose TVAR. This multi-label learning-based deep neural network predicts the functionality of noncoding variants in the genome based on eQTLs across 49 human tissues in the GTEx project. TVAR learns the relationships between high-dimensional epigenomics and eQTLs across tissues, taking the correlation among tissues into account to understand shared and tissue-specific eQTL effects. As a result, TVAR outputs tissue-specific annotations, with an average AUROC of 0.77 across these tissues. We evaluate TVAR’s performance on four complex diseases (coronary artery disease, breast cancer, type 2 diabetes, and schizophrenia), using TVAR’s tissue-specific annotations, and observe its superior performance in predicting functional variants for both common and rare variants, compared to five existing state-of-the-art tools. We further evaluate TVAR’s G-score, a scoring scheme across all tissues, on ClinVar, fine-mapped GWAS loci, and Massive Parallel Reporter Assay (MPRA) validated variants, and observe the consistently better performance of TVAR compared to other competing tools.
Availability: The TVAR source code and its scores on the ClinVar catalog, fine-mapped GWAS loci, high-confidence eQTLs from the GTEx dataset, and MPRA validated functional variants are available at https://github.com/haiyang1986/TVAR.
Supplementary information: Supplementary data are available at Bioinformatics online.
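Since the headline figure here is an average AUROC, a short sketch of what that metric measures may help. AUROC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The labels and scores below are made up for illustration and have no connection to TVAR's outputs.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels are 1 (functional variant) or 0 (non-functional); this O(P*N)
    formulation is fine for small examples.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in positives
        for q in negatives
    )
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    labels = [1, 1, 0, 0, 1, 0]                 # hypothetical ground truth
    scores = [0.9, 0.6, 0.7, 0.2, 0.8, 0.4]     # hypothetical predictions
    print(auroc(labels, scores))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the reported average of 0.77 sits between the two.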
Categories: Bioinformatics Trends

SD2: Spatially resolved transcriptomics deconvolution through integration of dropout and spatial information

Mon, 05/09/2022 - 5:30am
Abstract
Motivation: Unveiling the heterogeneity in tissues is crucial to explore cell-cell interactions and cellular targets of human diseases. Spatial transcriptomics (ST) supplies spatial gene expression profiles, which have revolutionized our biological understanding, but variations in the cell type proportions of each spot, which contains dozens of cells, would confound downstream analysis. Therefore, deconvolution of ST has been an indispensable step and a technical challenge towards a higher-resolution panorama of tissues.
Results: Here, we propose a novel ST deconvolution method called SD2 that integrates spatial information of ST data and embraces an important characteristic, dropout, which is traditionally considered an obstruction in single-cell RNA sequencing (scRNA-seq) data analysis. First, we extract the dropout-based genes as informative features from ST and scRNA-seq data by fitting a Michaelis-Menten function. After synthesizing pseudo-ST spots by randomly composing cells from scRNA-seq data, an auto-encoder is applied to discover low-dimensional and non-linear representations of the real- and pseudo-ST spots. Next, we create a graph containing embedded profiles as nodes, and edges determined by transcriptional similarity and spatial relationship. Given the graph, a graph convolutional neural network is used to predict the cell-type compositions for real-ST spots. We benchmark the performance of SD2 on the simulated seqFISH+ dataset with different resolutions and measurements, which show superior performance compared with the state-of-the-art methods. SD2 is further validated on three real-world datasets with different ST technologies, and demonstrates the capability to localize cell-type composition accurately with quantitative evidence. Finally, an ablation study is conducted to verify the contribution of the different modules proposed in SD2.
Availability: SD2 is freely available on GitHub (https://github.com/leihouyeung/SD2) and Zenodo (https://doi.org/10.5281/zenodo.7024684).
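The pseudo-spot step lends itself to a small sketch: draw a handful of annotated single cells at random, sum their expression vectors to form a synthetic spot, and record the cell-type proportions as the known answer for training. The abstract only says that cells are composed randomly, so the number of cells per spot, sampling with replacement, and all names below are assumptions; this is not SD2's code.

```python
import random


def synthesize_spot(cells, rng, min_cells=2, max_cells=10):
    """Build one pseudo-ST spot from randomly chosen single cells.

    `cells` is a list of (cell_type, expression_vector) pairs. Returns the
    summed expression vector and the cell-type proportions of the spot.
    The cell-count range and sampling with replacement are assumptions.
    """
    k = rng.randint(min_cells, max_cells)
    chosen = [rng.choice(cells) for _ in range(k)]

    n_genes = len(cells[0][1])
    expression = [sum(expr[g] for _, expr in chosen) for g in range(n_genes)]

    counts = {}
    for cell_type, _ in chosen:
        counts[cell_type] = counts.get(cell_type, 0) + 1
    proportions = {cell_type: c / k for cell_type, c in counts.items()}
    return expression, proportions


if __name__ == "__main__":
    reference = [  # hypothetical annotated scRNA-seq cells, three genes each
        ("T cell", [1, 0, 2]),
        ("B cell", [0, 3, 1]),
        ("T cell", [2, 1, 0]),
    ]
    rng = random.Random(0)
    expression, proportions = synthesize_spot(reference, rng)
    print(expression, proportions)
```

Because the composition of every synthetic spot is known, such spots can serve as labelled examples alongside the real, unlabelled ST spots.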
Categories: Bioinformatics Trends
