
GlycoEnzOnto: A GlycoEnzyme Pathway and Molecular Function Ontology

Bioinformatics Oxford Journals - Tue, 25/10/2022 - 5:30am
Abstract
Motivation: The ‘glycoEnzymes’ include a set of proteins with related enzymatic, metabolic, transport, structural and cofactor functions. Currently, there is no established ontology to describe glycoEnzyme properties and to relate them to glycan biosynthesis pathways.
Results: We present GlycoEnzOnto, an ontology describing 403 human glycoEnzymes curated along 139 glycosylation pathways, 134 molecular functions and 22 cellular compartments. The pathways described regulate nucleotide-sugar metabolism, glycosyl-substrate/donor transport, glycan biosynthesis and degradation. The role of each enzyme in the initiation, elongation/branching and capping/termination phases of glycosylation is described. IUPAC linear strings provide systematic, human- and machine-readable descriptions of individual reaction steps and enable automated knowledge-based curation of biochemical networks. All GlycoEnzOnto knowledge is integrated with Gene Ontology (GO) biological processes. GlycoEnzOnto enables improved transcript overrepresentation analyses and glycosylation pathway identification compared with other available schemas, e.g. KEGG and Reactome. Overall, GlycoEnzOnto is a holistic glycoinformatics resource for systems-level analyses.
Availability: https://github.com/neel-lab/GlycoEnzOnto
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
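GlycoEnzOnto is reported to improve transcript overrepresentation analyses. For background, a generic overrepresentation test asks whether a pathway's genes are over-represented among a list of hit transcripts, using a one-sided hypergeometric test. The sketch below shows only that generic test, not GlycoEnzOnto's own code; the function names and the gene identifiers g1 to g10 are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N genes, K of them in the pathway, n hits drawn)."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def overrepresentation(pathway_genes, hit_genes, universe):
    """Return (overlap size, one-sided p-value) for a single pathway gene set."""
    universe = set(universe)
    pathway = set(pathway_genes) & universe  # restrict to the measured background
    hits = set(hit_genes) & universe
    overlap = len(pathway & hits)
    p_value = hypergeom_upper_tail(overlap, len(universe), len(pathway), len(hits))
    return overlap, p_value


if __name__ == "__main__":
    universe = [f"g{i}" for i in range(1, 11)]  # hypothetical 10-gene background
    pathway = ["g1", "g2", "g3"]               # hypothetical pathway gene set
    hits = ["g1", "g2", "g3"]                  # hypothetical hit transcripts
    print(overrepresentation(pathway, hits, universe))  # p-value equals 1/120
```

In practice the p-values from many pathways would then be corrected for multiple testing; that step is omitted here.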

EcoTransLearn: an R-package to easily use Transfer Learning for Ecological Studies. A plankton case study

Bioinformatics Oxford Journals - Tue, 25/10/2022 - 5:30am
Abstract
Summary: In recent years, Deep Learning (DL) has been increasingly used in many fields, in particular image recognition, due to its ability to solve problems where traditional machine learning algorithms fail. However, building an appropriate DL model from scratch, especially in ecological studies, is difficult because of the dynamic nature and morphological variability of living organisms, and because labeling a large number of training images is costly in time, human resources and skills. To overcome this problem, Transfer Learning (TL) can be used to improve a classifier: information learnt across many domains from a very large training set of varied images is transferred to another domain that has a smaller amount of training data. To compensate for the lack of easy-to-use software optimized for ecological studies, we propose the EcoTransLearn R package, which allows greater automation in the classification of images acquired with various devices (FlowCam, ZooScan, photographs, etc.) by using different TL methods pre-trained on the generic ImageNet dataset.
Availability and implementation: EcoTransLearn is an open-source package. It is implemented in R and calls Python scripts for the image classification step (using the reticulate and tensorflow libraries). The source code, instruction manual and examples can be found at https://github.com/IFREMER-LERBL/EcoTransLearn.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

The LCD-Composer Webserver: High-Specificity Identification and Functional Analysis of Low-Complexity Domains in Proteins

Bioinformatics Oxford Journals - Tue, 25/10/2022 - 5:30am
Abstract
Summary: Low-complexity domains (LCDs) in proteins are regions enriched in a small subset of amino acids. LCDs exist in all domains of life, often have unusual biophysical behavior, and function in both normal and pathological processes. We recently developed an algorithm to identify LCDs based predominantly on amino acid composition thresholds. Here, we have integrated this algorithm into a webserver and augmented it with additional analysis options. Specifically, users can: 1) search for LCDs in whole proteomes by setting minimum composition thresholds for individual or grouped amino acids, 2) submit a known LCD sequence to search for similar LCDs, 3) search for and plot LCDs within a single protein, 4) statistically test for enrichment of LCDs within a user-provided protein set, and 5) specifically identify proteins with multiple types of LCDs.
Availability: The LCD-Composer server can be accessed at http://lcd-composer.bmb.colostate.edu. The corresponding command-line scripts can be accessed at https://github.com/RossLabCSU/LCD-Composer/tree/master/WebserverScripts.
Categories: Bioinformatics Trends
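LCD-Composer identifies LCDs predominantly from amino acid composition thresholds, for individual or grouped residues. A minimal sliding-window sketch of that idea follows. The window size of 20, the threshold values and the function name are assumptions for illustration; the sketch does not reproduce LCD-Composer's exact criteria or defaults.

```python
def find_lcds(seq: str, residues: str, min_fraction: float, window: int = 20):
    """Return merged half-open (start, end) intervals covered by sliding windows
    in which the residues listed in `residues` make up at least `min_fraction`."""
    targets = set(residues)  # one residue ("Q") or a group ("QN")
    merged = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        fraction = sum(aa in targets for aa in chunk) / window
        if fraction < min_fraction:
            continue
        end = start + window
        if merged and start <= merged[-1][1]:
            # Overlapping window: extend the current domain (ends only grow).
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    protein = "A" * 10 + "Q" * 20 + "A" * 10  # toy sequence with a Q-rich core
    print(find_lcds(protein, "Q", 0.8))
```

Merging overlapping windows reports each domain once, so its boundaries extend slightly beyond the enriched core whenever the threshold is below 100%.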

AHoJ: rapid, tailored search and retrieval of apo and holo protein structures for user-defined ligands

Bioinformatics Oxford Journals - Tue, 25/10/2022 - 5:30am
Abstract
Summary: Understanding the mechanism of action of a protein, or designing better ligands for it, often requires access to both a bound (holo) and an unbound (apo) state of the protein. Resources for the quick and easy retrieval of such conformations are severely limited. Apo-Holo Juxtaposition (AHoJ) is a web application for retrieving apo-holo structure pairs for user-defined ligands. Given a query structure and one or more user-specified ligands, it retrieves all other structures of the same protein that feature the same binding site(s), aligns them, and examines the superimposed binding sites to determine whether each structure is apo or holo relative to the query. The resulting superimposed datasets of apo-holo pairs can be visualized and downloaded for further analysis. AHoJ accepts multiple input queries, allowing the creation of customized apo-holo datasets.
Availability: Freely available for non-commercial use at http://apoholo.cz. Source code is available at https://github.com/cusbg/AHoJ-project.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
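AHoJ examines superimposed binding sites to decide whether each retrieved structure is apo or holo relative to the query. The abstract does not state the exact criterion, so the sketch below assumes a simple distance rule: a site counts as holo if any atom of the specified ligand lies within a cutoff of any binding-site atom. The 4.0 Å cutoff and the function name are hypothetical; `math.dist` requires Python 3.8 or later.

```python
import math


def classify_binding_site(site_atoms, ligand_atoms, cutoff=4.0):
    """Label a superimposed binding site 'holo' if any atom of the specified
    ligand lies within `cutoff` angstroms of any binding-site atom, else 'apo'.

    Coordinates are (x, y, z) tuples already in the query structure's frame,
    i.e. after superposition.
    """
    for site_xyz in site_atoms:
        for lig_xyz in ligand_atoms:
            if math.dist(site_xyz, lig_xyz) <= cutoff:
                return "holo"
    return "apo"


if __name__ == "__main__":
    site = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]             # toy binding-site atoms
    print(classify_binding_site(site, [(3.5, 0.0, 0.0)]))  # ligand atom close by
    print(classify_binding_site(site, []))                 # no ligand atoms present
```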

Correction to: Efficient permutation-based genome-wide association studies for normal and skewed phenotypic distributions

Bioinformatics Oxford Journals - Sat, 22/10/2022 - 5:30am
This is a correction to: Maura John, Markus J. Ankenbrand, Carolin Artmann, Jan A. Freudenthal, Arthur Korte, and Dominik G. Grimm, Efficient permutation-based genome-wide association studies for normal and skewed phenotypic distributions, Bioinformatics, Volume 38, Issue Supplement_2, September 2022, Pages ii5–ii12, https://doi.org/10.1093/bioinformatics/btac455
Categories: Bioinformatics Trends

OmicsEV: a tool for comprehensive quality evaluation of omics data tables

Bioinformatics Oxford Journals - Sat, 22/10/2022 - 5:30am
Abstract
Summary: RNA-Seq and mass spectrometry-based studies generate omics data tables with measurements for tens of thousands of genes across all samples in a study. The success of a study relies on the quality of these data tables, which is determined both by experimental data generation and by the computational methods used to process raw experimental data into quantitative data tables. We present OmicsEV, an R package for quality evaluation of omics data tables. For each data table, OmicsEV uses a series of methods to evaluate data depth, data normalization, batch effect, biological signal, platform reproducibility and multi-omics concordance. It produces comprehensive visual and quantitative evaluation results that help assess the quality of individual data tables and identify the optimal data processing method and parameters for the omics study under investigation.
Availability: The source code and the user manual of OmicsEV are available at https://github.com/bzhanglab/OmicsEV; the source code is released under the GPL-3 license.
Categories: Bioinformatics Trends

CRISPRon/off: CRISPR/Cas9 on- and off-target gRNA design

Bioinformatics Oxford Journals - Sat, 22/10/2022 - 5:30am
Abstract
Summary: The effectiveness of CRISPR/Cas9-mediated genome editing experiments largely depends on the guide RNA (gRNA) used by the CRISPR/Cas9 system for target recognition and cleavage activation. Careful design is necessary to select a gRNA with high editing efficiency at the on-target site and minimal off-target potential. Here we present our webserver for gRNA design, which has a user-friendly graphical interface and provides interoperability between our on- and off-target prediction tools, CRISPRon and CRISPRoff, for complete and streamlined gRNA selection.
Availability and implementation: The graphical interface uses the Integrative Genomics Viewer (IGV) JavaScript plugin. The backend tools are implemented in Python and C. The CRISPRon and CRISPRoff webservers and command-line tools are freely available at https://rth.dk/resources/crispr.
Categories: Bioinformatics Trends

DeepPerVar: a multimodal deep learning framework for functional interpretation of genetic variants in personal genome

Bioinformatics Oxford Journals - Sat, 22/10/2022 - 5:30am
Abstract
Motivation: Understanding the functional consequences of genetic variants, especially noncoding ones, is important but particularly challenging. Genome-wide association studies and quantitative trait locus analyses may be subject to limited statistical power and linkage disequilibrium, and are thus less suited to pinpointing causal variants. Moreover, most existing machine learning approaches that exploit functional annotations to interpret and prioritize putative causal variants cannot accommodate the heterogeneity of personal genetic variations and traits in a population study targeting a specific disease.
Results: By leveraging paired whole-genome sequencing data and epigenetic functional assays in a population study, we propose a multimodal deep learning framework that predicts genome-wide quantitative epigenetic signals by considering both personal genetic variations and traits. The approach can further evaluate the functional consequences of noncoding variants at the individual level by quantifying the allelic difference in predicted epigenetic signals. Applying the approach to the ROSMAP cohort, which studies Alzheimer’s disease (AD), we demonstrate that it can accurately predict quantitative epigenetic signals both genome-wide and in key genomic regions of AD causal genes, learn canonical motifs reported to regulate the expression of AD causal genes, improve partitioned heritability analysis, and prioritize putative causal variants in a GWAS risk locus. Finally, we release the deep learning model as a stand-alone Python toolkit and a web server.
Availability: https://github.com/lichen-lab/DeepPerVar
Categories: Bioinformatics Trends

Defining the extent of gene function using ROC curvature

Bioinformatics Oxford Journals - Sat, 22/10/2022 - 5:30am
Abstract
Motivation: Interactions between proteins help us understand how genes are functionally related and how they contribute to phenotypes. Experiments provide imperfect “ground truth” information about a small subset of potential interactions in a specific biological context, which machine learning methods can then extend to the whole genome across different contexts, such as conditions, tissues or species. However, evaluating the performance of these methods remains a critical challenge. Here, we propose to evaluate the generalizability of gene characterizations through the shape of performance curves.
Results: We identify Functional Equivalence Classes (FECs), subsets of annotated and unannotated genes that jointly drive performance, by assessing the presence of straight lines in ROC curves built from gene-centric prediction tasks, such as function or interaction prediction. FECs are widespread across data types and methods, and they can be used to evaluate the extent and context-specificity of functional annotations in a data-driven manner. For example, FECs suggest that B cell markers can be decomposed into shared primary markers (10 to 50 genes) and tissue-specific secondary markers (100 to 500 genes). In addition, FECs suggest the existence of functional modules that span a wide range of the genome, with marker sets spanning at most 5% of the genome and data-driven extensions of Gene Ontology sets spanning up to 40% of the genome. Because FECs are simple to assess visually and statistically, their identification in performance curves paves the way for novel functional characterization and more robust definitions of functional gene sets.
Availability: Code for analyses and figures is available at https://github.com/yexilein/pyroc.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
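The FEC abstract relies on straight lines in ROC curves. The ROC slope over a range of ranks is the ratio of the positive rate to the negative rate within that range, so a straight segment appears wherever positives and negatives are interleaved at a roughly constant ratio, i.e. where the predictor cannot order those genes. The brute-force sketch below only detects such segments geometrically. The tolerance and function names are assumptions, the search is O(n^3), and it is not the statistical procedure implemented in pyroc.

```python
def roc_points(scores, labels):
    """ROC points (FPR, TPR), one step per sample, ranked by descending score.
    Assumes both classes are present; ties keep input order (stable sort)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for i in order:
        if labels[i]:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def _distance_to_chord(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (x0, y0), (x1, y1), (x2, y2) = p, a, b
    length = ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
    if length == 0:
        return 0.0
    return abs((y2 - y1) * x0 - (x2 - x1) * y0 + x2 * y1 - y2 * x1) / length


def longest_straight_segment(points, tol=0.02):
    """Return (i, j) for the longest run of consecutive ROC points that all lie
    within `tol` of the chord joining points[i] and points[j]."""
    best = (0, 1)
    for i in range(len(points)):
        for j in range(i + 2, len(points)):
            inner = range(i + 1, j)
            if all(_distance_to_chord(points[k], points[i], points[j]) <= tol for k in inner):
                if j - i > best[1] - best[0]:
                    best = (i, j)
    return best


if __name__ == "__main__":
    # Perfectly interleaved positives and negatives: a near-diagonal staircase.
    pts = roc_points(list(range(8, 0, -1)), [1, 0] * 4)
    print(longest_straight_segment(pts, tol=0.2))
```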

LaGAT: Link-aware Graph Attention Network for Drug-Drug Interaction Prediction

Bioinformatics Oxford Journals - Sat, 22/10/2022 - 5:30am
Abstract
Motivation: Drug-drug interaction (DDI) prediction is a challenging problem in pharmacology and clinical applications. With the increasing availability of large biomedical databases, large-scale biological knowledge graphs containing drug information have been widely used for DDI prediction. However, large knowledge graphs inevitably suffer from data noise, which limits the performance and interpretability of models based on them. Recent studies attempt to improve models by introducing inductive bias through an attention mechanism. However, they all rely solely on the topology of each entity node to generate fixed attention pathways, without considering the semantic diversity of entity nodes across different drug pair links. This makes it difficult for models to select more meaningful nodes in order to overcome data quality limitations and make more interpretable predictions.
Results: To address this issue, we propose a Link-aware Graph Attention method for DDI prediction, called LaGAT, which generates different attention pathways for drug entities depending on the drug pair link. For a given drug pair link, LaGAT uses the embedding of one drug as a query vector to calculate attention weights, thereby selecting the appropriate topological neighbor nodes to obtain the semantic information of the other drug. We conduct separate experiments on binary and multi-class classification and visualize the attention pathways generated by the model. The results show that LaGAT better captures semantic relationships and achieves substantially better DDI prediction performance than both classical and state-of-the-art models.
Availability: The source code and data are available at https://github.com/Azra3lzz/LaGAT.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
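In LaGAT, the embedding of one drug in a pair serves as the query that weights the neighbor nodes of the other drug, so the same drug receives different attention pathways for different partners. The sketch below illustrates that idea with plain scaled dot-product softmax attention. The scoring function, the absence of learned projections and the function name are assumptions; the abstract does not specify LaGAT's exact formulation.

```python
import math


def link_aware_attention(query, neighbors):
    """Attend over drug B's neighbor embeddings using drug A's embedding as query.

    query: list of d floats (embedding of drug A in the pair)
    neighbors: list of n embeddings (each a list of d floats) around drug B
    Returns (attention weights, weighted sum of neighbor embeddings).
    """
    d = len(query)
    scores = [sum(q * x for q, x in zip(query, nb)) / math.sqrt(d) for nb in neighbors]
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    aggregated = [sum(w * nb[k] for w, nb in zip(weights, neighbors)) for k in range(d)]
    return weights, aggregated


if __name__ == "__main__":
    nbrs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # toy neighbor embeddings
    # Two different partner drugs focus attention on different neighbors.
    print(link_aware_attention([5.0, 0.0, 0.0], nbrs)[0])
    print(link_aware_attention([0.0, 0.0, 5.0], nbrs)[0])
```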

SYNPHONI: scale-free & phylogeny-aware reconstruction of synteny conservation & transformation across animal genomes

Bioinformatics Oxford Journals - Fri, 21/10/2022 - 5:30am
Abstract
Summary: Current approaches detect conserved genomic order either at chromosomal (macrosynteny) or at subchromosomal scales (microsynteny). The latter generally requires collinearity and hard thresholds on syntenic region size, thus excluding a major proportion of syntenies with recent expansions or minor rearrangements. SYNPHONI bridges the gap between micro- and macrosynteny detection, providing detailed information on both synteny conservation and transformation throughout the evolutionary history of animal genomes.
Availability and implementation: Source code, implemented in Python 3.9, is freely available at https://github.com/nsmro/SYNPHONI.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

Improving and evaluating deep learning models of cellular organization

Bioinformatics Oxford Journals - Thu, 20/10/2022 - 5:30am
Abstract
Motivation: Cells contain dozens of major organelles and thousands of other structures, many of which vary extensively in number, size, shape and spatial distribution. This complexity and variation dramatically complicate the use of both traditional and deep learning methods to build accurate models of cell organization. Most cellular organelles are distinct objects with defined boundaries that do not overlap, but the pixel resolution of most imaging methods is not sufficient to resolve these boundaries. Thus, while cell organization is conceptually object-based, most current methods are pixel-based. Using extensive image collections in which particular organelles were fluorescently labeled, deep learning methods can be used to build conditional autoencoder models for particular organelles. A major advance came with the use of a U-net approach to make multiple models all conditional upon a common, unlabeled reference image, allowing the relationships between different organelles to be at least partially inferred.
Results: We have developed improved GAN-based approaches for learning these models, as well as novel criteria for evaluating how well synthetic cell images reflect the properties of real images. The first set of criteria measures how well models preserve the expected property that organelles do not overlap. We also developed a modified loss function that allows the models to be retrained to minimize that overlap. The second set of criteria uses object-based modeling to compare object shape and spatial distribution between synthetic and real images. Our work provides the first demonstration that, at least for some organelles, deep learning models can capture object-level properties of cell images.
Availability: http://murphylab.cbd.cmu.edu/Software/2022_insilico
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
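The first set of criteria in the cell-organization abstract measures how well synthetic images preserve the property that organelles do not overlap. One simple way to quantify overlap between two predicted organelle masks is their intersection over union (IoU), shown below. The choice of IoU and the function name are assumptions; the paper's exact overlap criteria may differ.

```python
def mask_overlap(mask_a, mask_b):
    """Intersection over union of two same-shaped binary masks (lists of rows).

    For organelles that should not overlap, values near 0 are expected.
    """
    intersection = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            intersection += bool(a) and bool(b)  # pixel claimed by both organelles
            union += bool(a) or bool(b)          # pixel claimed by either organelle
    return intersection / union if union else 0.0


if __name__ == "__main__":
    nucleus = [[1, 1, 0], [0, 0, 0]]        # toy predicted mask for one organelle
    mitochondria = [[0, 1, 1], [0, 0, 0]]   # toy predicted mask for another
    print(mask_overlap(nucleus, mitochondria))
```

Averaging this value over all organelle pairs in a synthetic image would give a single score that could be compared against real images.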

A novel pipeline for computerized mouse spermatogenesis staging

Bioinformatics Oxford Journals - Thu, 20/10/2022 - 5:30am
Abstract
Motivation: Differentiating the 12 stages of the mouse seminiferous epithelial cycle is vital for understanding the dynamic process of spermatogenesis. However, it is challenging because adjacent spermatogenic stages are morphologically similar. Distinguishing Stages I-III from Stages IV-V is important for histologists to understand sperm development in wild-type mice and spermatogenic defects in infertile mice. To achieve this, we propose a novel pipeline for Computerized Spermatogenesis Staging (CSS).
Results: The CSS pipeline comprises four parts: 1) a seminiferous tubule segmentation model that extracts every single tubule; 2) a Multi-Scale Learning (MSL) model that integrates local and global information of a seminiferous tubule to distinguish Stages I-V from Stages VI-XII; 3) a Multi-Task Learning (MTL) model that segments the Multiple Testicular Cells (MTCs) for Stages I-V without an exhaustive requirement for manual annotation; and 4) a set of 204-dimensional image-derived features that discriminates Stages I-III from Stages IV-V by capturing cell-level and image-level representations. Experimental results suggest that the proposed MSL and MTL models outperform classic single-scale and single-task models when manual annotation is limited. In addition, the proposed image-derived features discriminate between Stages I-III and Stages IV-V. In conclusion, the CSS pipeline can not only provide histologists with a solution that facilitates quantitative analysis for spermatogenesis stage identification but also help them uncover novel computerized image-derived biomarkers.
Availability and implementation: https://github.com/jydada/CSS
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

MAGScoT - a fast, lightweight, and accurate bin-refinement tool

Bioinformatics Oxford Journals - Thu, 20/10/2022 - 5:30am
Abstract
Motivation: Recovery of metagenome-assembled genomes (MAGs) from shotgun metagenomic data is an important task for the comprehensive analysis of microbial communities from variable sources. Individual binning tools differ in their ability to leverage specific aspects of MAG reconstruction, while ensemble bin-refinement tools are often time-consuming and their computational demand increases with community complexity. We introduce MAGScoT, a fast, lightweight and accurate implementation for reconstructing the highest-quality MAGs from the output of multiple genome-binning tools.
Results: MAGScoT outperforms popular bin-refinement solutions in the quality and quantity of MAGs recovered, as well as in computation time and resource consumption.
Availability: MAGScoT is available via GitHub (https://github.com/ikmb/MAGScoT) and as an easy-to-use Docker container (https://hub.docker.com/repository/docker/ikmb/magscot).
Supplementary information: Supplementary data are available at Bioinformatics online. All scripts used to produce the binning results and the subsequent refinement are available via GitHub (https://github.com/mruehlemann/MAGScoT_benchmarking_scripts).
Categories: Bioinformatics Trends

Powerful and interpretable control of false discoveries in two-group differential expression studies

Bioinformatics Oxford Journals - Thu, 20/10/2022 - 5:30am
Abstract
Motivation: The standard approach for statistical inference in differential expression (DE) analyses is to control the False Discovery Rate (FDR). However, controlling the FDR does not in fact imply that the proportion of false discoveries itself is upper bounded, since the FDR is only the expected value of that proportion. Moreover, no statistical guarantee can be given on subsets of genes selected by FDR thresholding. These known limitations are overcome by post hoc inference, which provides guarantees on the number or proportion of false discoveries among arbitrary gene selections. However, post hoc inference methods are not yet widely used for DE studies.
Results: In this paper, we demonstrate the relevance and illustrate the performance of adaptive interpolation-based post hoc methods for two-group DE studies. First, we formalize the use of permutation-based methods to obtain sharp confidence bounds that are adaptive to the dependence between genes. Then, we introduce a generic linear-time algorithm for computing post hoc bounds, making these bounds applicable to large-scale two-group DE studies. The use of the resulting Adaptive Simes bound is illustrated on an RNA sequencing study. Comprehensive numerical experiments based on real microarray and RNA sequencing data demonstrate the statistical performance of the method.
Availability and implementation: A cross-platform open-source implementation within the R package sanssouci is available at https://sanssouci-org.github.io/sanssouci/.
Supplementary information: Supplementary data are available at Bioinformatics online. Rmarkdown vignettes for the differential analysis of microarray and RNA-seq data are available from the package.
Categories: Bioinformatics Trends
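The post hoc methods in the sanssouci abstract build on the Simes inequality. For background, the standard non-adaptive Simes post hoc bound uses thresholds t_k = αk/m over m tested genes; for any selection S, the number of false positives V(S) is bounded by min(|S|, min over k of [#{i in S : p_i > t_k} + k - 1]), and this holds simultaneously for all S with probability at least 1 - α whenever the Simes inequality holds (e.g. under positive dependence). The sketch below computes that bound with a naive O(|S|^2) loop. It is not the permutation-calibrated Adaptive Simes bound or the linear-time algorithm from the paper, and it does not use the sanssouci API; the function name is hypothetical.

```python
def simes_posthoc_bound(p_selected, m, alpha=0.1):
    """Upper bound on the number of false positives among the selected genes,
    using the non-adaptive Simes reference family t_k = alpha * k / m.

    p_selected: p-values of the selected genes; m: total number of genes tested.
    """
    size = len(p_selected)
    bound = size
    # k > |S| cannot improve the bound, because k - 1 >= |S| there.
    for k in range(1, size + 1):
        threshold = alpha * k / m
        above = sum(p > threshold for p in p_selected)
        bound = min(bound, above + k - 1)
    return bound


if __name__ == "__main__":
    selection = [0.0005, 0.0015, 0.5]  # toy p-values of three selected genes
    v = simes_posthoc_bound(selection, m=100, alpha=0.1)
    print(f"at most {v} false positives, so at least {len(selection) - v} true ones")
```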

ATLIGATOR: Editing protein interactions with an atlas-based approach

Bioinformatics Oxford Journals - Wed, 19/10/2022 - 5:30am
Abstract
Motivation: Recognition of specific molecules by proteins is a fundamental cellular mechanism and is relevant for many applications. The ability to modify binding is of key interest and can be achieved by repurposing established interaction motifs. We were specifically interested in a methodology for designing peptide-binding modules. By leveraging interaction data from known protein structures, we aim to accelerate the design of novel protein or peptide binders.
Results: We developed ATLIGATOR, a computational method to support the analysis and design of a protein’s interaction with a single side chain. Our program enables the building of interaction atlases based on structures from the PDB. From these atlases, pocket definitions are extracted that can be searched for frequent interactions. These searches can reveal similarities in unrelated proteins, as we show here for one example. Such frequent interactions can then be grafted onto a new protein scaffold as a starting point for the design process. ATLIGATOR is accessible through a Python API as well as a CLI with Python scripts.
Availability and implementation: ATLIGATOR is implemented in Python 3. Source code can be downloaded from GitHub (https://www.github.com/Hoecker-Lab/atligator), and the package can be installed from PyPI (“atligator”).
Categories: Bioinformatics Trends

CoxMKF: A Knockoff Filter for High-Dimensional Mediation Analysis with a Survival Outcome in Epigenetic Studies

Bioinformatics Oxford Journals - Tue, 18/10/2022 - 5:30am
Abstract
Motivation: In high-dimensional mediation analysis, it is of scientific interest to identify DNA methylation CpG sites that might mediate the effect of an environmental exposure on a survival outcome. However, there is a lack of powerful statistical methods that guarantee false discovery rate (FDR) control in finite-sample settings.
Results: In this article, we propose a novel method called CoxMKF, which applies aggregation of multiple knockoffs to a Cox proportional hazards model for a survival outcome with high-dimensional mediators. CoxMKF can achieve FDR control even in finite-sample settings, which is particularly advantageous when the sample size is not large. Moreover, CoxMKF can overcome the randomness of the unstable model-X knockoffs. Our simulation results show that CoxMKF controls the FDR well in finite samples. We further apply CoxMKF to a lung cancer dataset from The Cancer Genome Atlas (TCGA) project with 754 subjects and 365 306 DNA methylation CpG sites, and identify four DNA methylation CpG sites that might mediate the effect of smoking on overall survival among lung cancer patients.
Availability: The R package CoxMKF is publicly available at https://github.com/MinhaoYaooo/CoxMKF.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
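CoxMKF builds on the knockoff filter. In a single knockoff run, each candidate mediator j receives a statistic W_j (large positive values favour the real feature over its knockoff), and the standard knockoff+ rule selects all j with W_j >= T, where T is the smallest t > 0 with (1 + #{j : W_j <= -t}) / max(1, #{j : W_j >= t}) <= q. The sketch below implements only that thresholding step on precomputed W values. How W is computed from the Cox model and how CoxMKF aggregates multiple knockoff runs are not reproduced, and the function names are hypothetical.

```python
def knockoff_plus_threshold(W, q=0.1):
    """Knockoff+ threshold for feature statistics W at target FDR level q.
    Returns +inf when no candidate threshold achieves the target level."""
    for t in sorted({abs(w) for w in W if w != 0}):
        negatives = sum(w <= -t for w in W)  # knockoff "wins": estimated false positives
        positives = sum(w >= t for w in W)   # real-feature "wins": selections
        if (1 + negatives) / max(1, positives) <= q:
            return t
    return float("inf")


def knockoff_select(W, q=0.1):
    """Indices of features selected by the knockoff+ rule."""
    t = knockoff_plus_threshold(W, q)
    return [j for j, w in enumerate(W) if w >= t]


if __name__ == "__main__":
    # Hypothetical statistics for 11 CpG sites; one site favours its knockoff.
    W = [3.0, 2.5, 2.0, 1.5, 1.2, 1.1, 1.0, 0.9, 0.8, 0.7, -0.6]
    print(knockoff_select(W, q=0.25))
```

The "+1" in the numerator is what gives knockoff+ its finite-sample FDR guarantee, but it also means that at least 1/q features must pass the threshold before anything can be selected.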

AIscEA: Unsupervised Integration of Single-cell Gene Expression and Chromatin Accessibility via Their Biological Consistency

Bioinformatics Oxford Journals - Mon, 17/10/2022 - 5:30am
Abstract
Motivation: The integrative analysis of single-cell gene expression and chromatin accessibility measurements is essential for revealing gene regulation, but it is one of the key challenges in computational biology. Gene expression and chromatin accessibility are measured in different modalities, and no common features can be directly used to guide integration. Current state-of-the-art methods lack practical solutions for finding heterogeneous clusters, and they might not generate reliable results when cluster heterogeneity exists. More importantly, current methods lack an effective way to select hyper-parameters in an unsupervised setting. Therefore, applying computational methods to integrate single-cell gene expression and chromatin accessibility measurements remains difficult.
Results: We introduce AIscEA (Alignment-based Integration of single-cell gene Expression and chromatin Accessibility), a computational method that integrates single-cell gene expression and chromatin accessibility measurements using their biological consistency. AIscEA first defines a ranked similarity score to quantify the biological consistency between cell clusters across measurements. AIscEA then uses the ranked similarity score and a novel permutation test to identify cluster alignment across measurements. Finally, AIscEA applies graph alignment to the aligned cell clusters to align cells across measurements. We compared AIscEA with competing methods on several benchmark datasets and demonstrated that AIscEA is highly robust to the choice of hyper-parameters and better handles the cluster heterogeneity problem. Furthermore, AIscEA significantly outperforms state-of-the-art methods in integration accuracy when integrating real-world SNARE-seq and scMultiome-seq datasets.
Availability: AIscEA is available on FigShare at https://figshare.com/articles/software/AIscEA_zip/21291135 and on GitHub at https://github.com/elhaam/AIscEA.
Categories: Bioinformatics Trends

CEDA: integrating gene expression data with CRISPR pooled screen data identifies essential genes with higher expression

Bioinformatics Oxford Journals - Mon, 17/10/2022 - 5:30am
Abstract
Motivation: CRISPR-based genetic perturbation screens are a powerful tool to probe gene function. However, experimental noise, especially for lowly expressed genes, needs to be accounted for to maintain proper control of the false positive rate.
Method: We develop a statistical method, named CRISPR screen with Expression Data Analysis (CEDA), to integrate gene expression profiles and CRISPR screen data for identifying essential genes. CEDA stratifies genes by expression level and adopts a three-component mixture model for the log-fold change of single-guide RNAs (sgRNAs). An empirical Bayes prior and the Expectation-Maximization algorithm are used for parameter estimation and false discovery rate inference.
Results: Taking advantage of gene expression data, CEDA identifies essential genes with higher expression. Compared with existing methods, CEDA shows comparable reliability but higher sensitivity in detecting essential genes with moderate sgRNA fold change. Therefore, using the same CRISPR data, CEDA generates an additional list of hit genes.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
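CEDA models sgRNA log-fold changes with a three-component mixture whose parameters are estimated by Expectation-Maximization. The sketch below fits a plain maximum-likelihood three-component Gaussian mixture by EM. Gaussian components, the depleted/neutral/enriched initial means and the function names are assumptions; the sketch omits CEDA's expression-level stratification and empirical Bayes prior.

```python
import math


def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def em_three_component(lfc, mus=(-2.0, 0.0, 2.0), sds=(1.0, 1.0, 1.0), n_iter=200):
    """Fit a 3-component Gaussian mixture to sgRNA log-fold changes by EM.

    Initial means are chosen to represent depleted / neutral / enriched sgRNAs.
    Returns mixing weights, means, standard deviations and per-sgRNA
    responsibilities (posterior component probabilities).
    """
    pis, mus, sds = [1 / 3] * 3, list(mus), list(sds)
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each sgRNA.
        resp = []
        for x in lfc:
            dens = [p * normal_pdf(x, m, s) for p, m, s in zip(pis, mus, sds)]
            total = sum(dens) or 1e-300  # guard against complete underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and standard deviations.
        for k in range(3):
            nk = sum(r[k] for r in resp) + 1e-12
            pis[k] = nk / len(lfc)
            mus[k] = sum(r[k] * x for r, x in zip(resp, lfc)) / nk
            var = sum(r[k] * (x - mus[k]) ** 2 for r, x in zip(resp, lfc)) / nk
            sds[k] = math.sqrt(max(var, 1e-6))  # floor avoids a collapsed component
    return pis, mus, sds, resp


if __name__ == "__main__":
    toy_lfc = [-3.2, -3.0, -2.8, -0.2, 0.0, 0.2, 2.8, 3.0, 3.2]  # toy log-fold changes
    weights, means, sds, resp = em_three_component(toy_lfc)
    print([round(m, 2) for m in means])
```

The responsibility of the depleted component, resp[i][0], is the kind of posterior quantity from which an essential-gene call and a false discovery rate estimate could then be derived.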

scGNN 2.0: a graph neural network tool for imputation and clustering of single-cell RNA-Seq data

Bioinformatics Oxford Journals - Mon, 17/10/2022 - 5:30am
Abstract
Motivation: Gene expression imputation has been an essential step of the single-cell RNA-Seq data analysis workflow. Among several deep learning methods, scGNN gained substantial recognition after its debut in 2021 for its superior performance and its ability to produce a cell-cell graph. However, the scGNN implementation was relatively time-consuming and its performance could still be optimized.
Results: The implementation of scGNN 2.0 is significantly faster than scGNN thanks to a simplified closed-loop architecture. Across all eight datasets, cell clustering performance increased by 85.02% on average in terms of the adjusted Rand index, and the median L1 error of imputation was reduced by 67.94% on average. With the built-in visualizations, users can quickly assess the imputation and cell clustering results, compare them against benchmarks, and interpret cell-cell interactions. The expanded input and output formats also pave the way for custom workflows that integrate scGNN 2.0 with other scRNA-Seq toolkits on both Python and R platforms.
Availability: scGNN 2.0 is implemented in Python (as of version 3.8), with the source code available at https://github.com/OSU-BMBL/scGNN2.0.
Supplementary information: Supplementary files are available at Bioinformatics online.
Categories: Bioinformatics Trends
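scGNN 2.0 reports its clustering gains in terms of the adjusted Rand index (ARI). For reference, the sketch below computes the standard ARI from pair counts in the contingency table of two clusterings; it is written for illustration and is not taken from the scGNN 2.0 code base. The degenerate-case convention of returning 1.0 is an assumption.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two flat clusterings of the same cells."""
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0
    # Pairs of cells placed together in both clusterings.
    same_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    same_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    same_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = same_true * same_pred / total_pairs  # expected value under random labeling
    max_index = (same_true + same_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both clusterings are a single cluster
    return (same_both - expected) / (max_index - expected)


if __name__ == "__main__":
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # same partition, relabeled
    print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]))  # one cluster split in two
```

Because ARI compares co-membership of cell pairs, it is invariant to how cluster labels are named, which is why the relabeled partition scores a perfect 1.0.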

