Subscribe to Bioinformatics Oxford Journals feed
Updated: 5 hours 17 min ago

CCPLS reveals cell-type-specific spatial dependence of transcriptomes in single cells

Mon, 05/09/2022 - 5:30am
Abstract
Motivation: Cell-cell communications regulate internal cellular states, e.g., gene expression and cell functions, and play pivotal roles in normal development and disease states. Furthermore, single-cell RNA sequencing methods have revealed cell-to-cell expression variability of highly variable genes (HVGs), which is also crucial. Nevertheless, the regulation on cell-to-cell expression variability of HVGs via cell-cell communications is still largely unexplored. The recent advent of spatial transcriptome methods has linked gene expression profiles to the spatial context of single cells, which has provided opportunities to reveal those regulations. The existing computational methods extract genes with expression levels influenced by neighboring cell types. However, limitations remain in the quantitativeness and interpretability: they neither focus on HVGs nor consider the effects of multiple neighboring cell types.
Results: Here, we propose CCPLS (Cell-Cell communications analysis by Partial Least Square regression modeling), which is a statistical framework for identifying cell-cell communications as the effects of multiple neighboring cell types on cell-to-cell expression variability of HVGs, based on the spatial transcriptome data. For each cell type, CCPLS performs PLS regression modeling and reports coefficients as the quantitative index of the cell-cell communications. Evaluation using simulated data showed our method accurately estimated the effects of multiple neighboring cell types on HVGs. Furthermore, applications to the two real datasets demonstrate that CCPLS can extract biologically interpretable insights from the inferred cell-cell communications.
Availability: The R package is available at https://github.com/bioinfo-tsukuba/CCPLS. The data are available at https://github.com/bioinfo-tsukuba/CCPLS_paper.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
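
The CCPLS abstract above describes regressing HVG expression on neighboring cell types with PLS and reading the coefficients as the communication index. As a rough illustration of what a PLS coefficient is, here is a minimal single-response, single-component PLS sketch in plain Python. It is not the CCPLS implementation: the function name `pls1_one_component`, the toy `neighbor_scores` matrix (one column per hypothetical neighboring cell type), and the `expression` vector are all assumptions made for this example.

```python
import math


def center(values):
    """Subtract the mean so the column has zero average."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def pls1_one_component(X, y):
    """Return one regression coefficient per feature from a one-component PLS fit.

    X is a list of rows (samples x features); y is a list of responses.
    """
    n_features = len(X[0])
    columns = [center([row[j] for row in X]) for j in range(n_features)]
    y_centered = center(y)

    # Weight vector: direction in feature space with maximal covariance with y.
    weights = [dot(col, y_centered) for col in columns]
    norm = math.sqrt(dot(weights, weights))
    if norm == 0.0:
        return [0.0] * n_features
    weights = [w / norm for w in weights]

    # Scores: projection of each sample onto the weight vector.
    scores = [sum(columns[j][i] * weights[j] for j in range(n_features))
              for i in range(len(X))]

    # Regress y on the scores, then map back to per-feature coefficients.
    loading = dot(scores, y_centered) / dot(scores, scores)
    return [w * loading for w in weights]


# Hypothetical toy data: rows are cells of one type, columns are two neighbor types.
neighbor_scores = [[1, 1], [2, 0], [3, 0], [4, 1]]
expression = [2.0, 4.0, 6.0, 8.0]  # expression of one hypothetical HVG
coefficients = pls1_one_component(neighbor_scores, expression)
```

In this toy case the first neighbor type fully explains the expression and receives a coefficient of about 2, while the second receives about 0. A real analysis would use several components and many response genes.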

ABEILLE: a novel method for ABerrant Expression Identification empLoying machine Learning from RNA-sequencing data

Mon, 05/09/2022 - 5:30am
Abstract
Motivation: Current advances in omics technologies are paving the way for the diagnosis of rare diseases, serving as a complementary assay to identify the responsible gene. The use of transcriptomic data to identify aberrant gene expression (AGE) has been shown to reveal potentially pathogenic events. However, popular approaches for AGE identification are limited by their use of statistical tests, which require the choice of an arbitrary cut-off for significance assessment and the availability of several replicates, which is not always possible in clinical contexts.
Results: Hence, we developed ABEILLE (ABerrant Expression Identification empLoying machine LEarning from sequencing data), a variational autoencoder (VAE)-based method for the identification of AGEs from RNA-seq data without the need for replicates or a control group. ABEILLE combines a VAE, which can model any data without specific assumptions about their distribution, with a decision tree to classify genes as AGE or non-AGE. An anomaly score is associated with each gene in order to stratify AGEs by severity of aberration. We tested ABEILLE on semi-synthetic data and an experimental dataset, demonstrating the importance of the flexibility of the VAE configuration for identifying potential pathogenic candidates.
Availability: ABEILLE source code is freely available at https://github.com/UCA-MSI/ABEILLE.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

Systematic Replication Enables Normalization of High-throughput Imaging Assays

Mon, 05/09/2022 - 5:30am
Abstract
Motivation: High-throughput fluorescent microscopy is a popular class of techniques for studying tissues and cells through automated imaging and feature extraction of hundreds to thousands of samples. Like other high-throughput assays, these approaches can suffer from unwanted noise and technical artifacts that obscure the biological signal. In this work, we consider how an experimental design incorporating multiple levels of replication enables removal of technical artifacts from such image-based platforms.
Results: We develop a general approach to remove technical artifacts from high-throughput image data that leverages an experimental design with multiple levels of replication. To illustrate the methods, we consider microenvironment microarrays (MEMAs), a high-throughput platform designed to study cellular responses to microenvironmental perturbations. When applied to MEMAs, our approach removes unwanted spatial artifacts and thereby enhances the biological signal. This approach has broad applicability to diverse biological assays.
Availability: Raw data are on Synapse (syn2862345); analysis code is on GitHub (gjhunt/mema_norm); a reproducible Docker image is available on Docker Hub (gjhunt/mema_norm).
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

A closed formula relevant to “Theory of local k-mer selection with applications to long-read alignment” by Jim Shaw and Yun William Yu

Mon, 05/09/2022 - 5:30am
Abstract
Motivation: To handle the volume from next-generation sequencing data, modern sequence comparison often relies on summary sequence sketches such as minimizers, syncmers, and minimally overlapping words. Let us call an oligonucleotide of length k a k-mer. With the aim of anticipating the practical performance of a rule f that selects the k-mers in a sketch, Theorem 2 of Shaw and Yu gives a formula quantifying conservation of a sketch in the presence of a sequence mutation probability θ per base. Shaw and Yu give a four-variable recursion for computing the formula, a computation that is complicated, difficult to implement, and computationally expensive for large parameter values.
Results: For minimizers, the earliest of the k-mer sketches, this letter shows that Shaw and Yu’s recursion is equivalent to a simple explicit formula. The proof of the explicit formula can be generalized, with applications to other sequence sketches likely.
Categories: Bioinformatics Trends
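
For readers unfamiliar with the minimizer sketch mentioned in the abstract above, the following is a minimal illustration of how a minimizer rule selects k-mers. It assumes lexicographic ordering of k-mers and leftmost tie-breaking, which are choices made for this example only (practical tools often order k-mers by a hash). The function name `minimizers` and the window parameter `w` are hypothetical, and the sketch does not reproduce the conservation formula discussed in the letter.

```python
def minimizers(seq, k, w):
    """Return sorted (position, k-mer) pairs chosen as window minimizers.

    Each window covers w consecutive k-mers. The smallest k-mer in the
    window is selected, with ties broken by the leftmost position.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected = set()
    for start in range(len(kmers) - w + 1):
        window = range(start, start + w)
        best = min(window, key=lambda i: (kmers[i], i))
        selected.add((best, kmers[best]))
    return sorted(selected)


# Toy sequence: 6 overlapping 3-mers, windows of 3 consecutive k-mers.
sketch = minimizers("ACGTACGT", k=3, w=3)
# sketch == [(0, 'ACG'), (1, 'CGT'), (4, 'ACG')]
```

Adjacent windows frequently share the same minimizer, which is why the sketch is smaller than the full list of k-mers.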

Differential RNA Methylation Analysis for MeRIP-seq Data under General Experimental Design

Mon, 05/09/2022 - 5:30am
Abstract
Motivation: RNA epigenetics is an emerging field that studies post-transcriptional gene regulation. The dynamics of RNA epigenetic modification have been reported to be associated with many human diseases. A recently developed high-throughput technology, Methylated RNA Immunoprecipitation Sequencing (MeRIP-seq), enables the transcriptome-wide profiling of N6-methyladenosine (m6A) modification and the comparison of RNA epigenetic modifications. There are a few computational methods for the comparison of mRNA modifications under different conditions, but they all suffer from serious limitations.
Results: In this work, we develop a novel statistical method to detect differentially methylated mRNA regions from MeRIP-seq data. We model the sequence count data by a hierarchical negative binomial model that accounts for various sources of variation, and derive parameter estimation and statistical testing procedures for flexible statistical inference under general experimental designs. Extensive benchmark evaluations in simulation and real data analyses demonstrate that our method is more accurate, robust, and flexible compared to existing methods.
Availability: Our method TRESS is implemented as an R/Bioconductor package and is available at https://bioconductor.org/packages/devel/TRESS.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

grenepipe: A flexible, scalable, and reproducible pipeline to automate variant calling from sequence reads

Fri, 02/09/2022 - 5:30am
Abstract
Summary: We developed grenepipe, an all-in-one Snakemake workflow to streamline the data processing from raw high-throughput sequencing data of individuals or populations to genotype variant calls. Our pipeline offers a range of popular software tools within a single configuration file, automatically installs software dependencies, is highly optimized for scalability in cluster environments, and runs with a single command.
Availability: grenepipe is published under the GPLv3, and freely available at github.com/moiexpositoalonsolab/grenepipe.
Categories: Bioinformatics Trends

BERN2: an advanced neural biomedical named entity recognition and normalization tool

Fri, 02/09/2022 - 5:30am
Abstract
Summary: In biomedical natural language processing, named entity recognition (NER) and named entity normalization (NEN) are key tasks that enable the automatic extraction of biomedical entities (e.g., diseases and drugs) from the ever-growing biomedical literature. In this paper, we present BERN2 (Advanced Biomedical Entity Recognition and Normalization), a tool that improves the previous neural network-based NER tool (Kim et al., 2019) by employing a multi-task NER model and neural network-based NEN models to achieve much faster and more accurate inference. We hope that our tool can help annotate large-scale biomedical texts for various tasks such as biomedical knowledge graph construction.
Availability and implementation: Web service of BERN2 is publicly available at http://bern2.korea.ac.kr. We also provide local installation of BERN2 at https://github.com/dmis-lab/BERN2.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

I2b2-etl: Python application for importing Electronic Health Data into the Informatics for Integrating Biology and the Bedside Platform

Fri, 02/09/2022 - 5:30am
Abstract
Motivation: The i2b2 platform is used at major academic health institutions and research consortia for querying electronic health data. However, a major obstacle to wider utilization of the platform is the complexity of data loading, which entails a steep learning curve for the platform’s complex data schemas. To address this problem, we have developed the i2b2-etl package, which simplifies the data loading process and will facilitate wider deployment and utilization of the platform.
Results: We have implemented i2b2-etl as a Python application that imports ontology and patient data using simplified input file schemas and provides inbuilt record-number de-identification and data validation. We describe a real-world deployment of i2b2-etl for a population-management initiative at Mass General Brigham.
Availability: i2b2-etl is a free, open-source application implemented in Python available under the Mozilla 2 license. The application can be downloaded as compiled docker images. A live demo is available at https://i2b2clinical.org/demo-i2b2etl/ (username: demo, password: Etl@2021).
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

Heritability estimation for a linear combination of phenotypes via ridge regression

Fri, 02/09/2022 - 5:30am
Abstract
Motivation: The joint analysis of multiple phenotypes is important in many biological studies, such as plant and animal breeding. The heritability estimation for a linear combination of phenotypes is designed to account for correlation information. Existing methods for estimating heritability mainly focus on single phenotypes under random-effect models. These methods also require some stringent conditions, which calls for a more flexible and interpretable method for estimating heritability. Fixed-effect models emerge as a useful alternative.
Results: In this paper, we propose a novel heritability estimator based on multivariate ridge regression for linear combinations of phenotypes, yielding accurate estimates in both sparse and dense cases. Under mild conditions in the high-dimensional setting, the proposed estimator appears to be consistent and asymptotically normally distributed. Simulation studies show that the proposed estimator is promising under different scenarios. Compared with independently combined heritability estimates in the case of multiple phenotypes, the proposed method significantly improves the performance by considering correlations among those phenotypes. We further demonstrate its application in heritability estimation and correlation analysis for the Oryza sativa rice dataset.
Availability and implementation: An R package implementing the proposed method is available at https://github.com/xg-SUFE1/MultiRidgeVar, where covariance estimates are also given together with heritability estimates.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

pcnaDeep: A Fast and Robust Single-Cell Tracking Method Using Deep-Learning Mediated Cell Cycle Profiling

Thu, 01/09/2022 - 5:30am
Abstract
Computational methods that track single cells and quantify fluorescent biosensors in time-lapse microscopy images have revolutionised our approach to studying the molecular control of cellular decisions. One barrier that limits the adoption of single-cell analysis in biomedical research is the lack of efficient methods to robustly track single cells over cell division events. Here, we developed an application that automatically tracks single cells and assigns their mother-daughter relationships. By incorporating cell cycle information from a well-established fluorescent cell cycle reporter, we associate mitosis relationships, enabling high-fidelity long-term single-cell tracking. This was achieved by integrating a deep-learning-based fluorescent PCNA signal instance segmentation module with a cell tracking and cell cycle resolving pipeline. The application offers a user-friendly interface and extensible APIs for customized cell cycle analysis and manual correction for various imaging configurations.
Availability: pcnaDeep is an open-source Python application under the Apache 2.0 licence. The source code, documentation and tutorials are available at https://github.com/chan-labsite/PCNAdeep.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

MetaboAnnotator: An efficient toolbox to annotate metabolites in genome-scale metabolic reconstructions

Thu, 01/09/2022 - 5:30am
Abstract
Motivation: Genome-scale metabolic reconstructions have been assembled for thousands of organisms using a wide range of tools. However, metabolite annotations, which are required to compare and link metabolites between reconstructions, remain incomplete. Here, we aim to further extend metabolite annotation coverage using various databases and chemoinformatic approaches.
Results: We developed a COBRA toolbox extension, named MetaboAnnotator, which facilitates the comprehensive annotation of metabolites with database-independent and database-dependent identifiers, obtains molecular structure files, and calculates metabolite formula and charge at pH 7.2. The resulting metabolite annotations allow for subsequent cross-mapping between reconstructions and mapping of, e.g., metabolomic data.
Availability: MetaboAnnotator and tutorials are freely available at https://github.com/opencobra.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

No means ‘No’; a non-im-proper modeling approach, with embedded speculative context

Tue, 30/08/2022 - 5:30am
Abstract
Motivation: Medical data are complex in nature, as terms that appear in records usually appear in different contexts. In this paper, we investigate the embeddings of various biomedical models (BioBERT, BioELECTRA, PubMedBERT) for their understanding of "negation and speculation context", and find that these models were unable to differentiate "negated context" from "non-negated context". To measure the models' understanding, we used cosine similarity scores between pairs of negated and non-negated sentence embeddings. To improve these models, we introduce a generic super-tuning approach that enhances the embeddings on "negation and speculation context" by utilizing a synthesized dataset.
Results: After super-tuning, the models' embeddings capture negative and speculative contexts much better. Furthermore, we fine-tuned the super-tuned models on various tasks and found that they outperformed the previous models and achieved state-of-the-art (SOTA) results on negation, speculation cue, and scope detection tasks on the BioScope abstracts and Sherlock datasets. We also confirmed that super-tuning had only a minimal trade-off in model performance on other tasks such as Natural Language Inference.
Availability: The source code and the models are available at: https://github.com/comprehend/engg-airesearch/tree/uncertainty-super-tuning.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
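
The negation abstract above measures model understanding with cosine similarity between the embeddings of a sentence and its negated counterpart. The sketch below shows that measurement in plain Python. The three-dimensional vectors are hypothetical stand-ins for real sentence embeddings (which would come from a model such as BioBERT and have hundreds of dimensions), and the variable names are assumptions for this example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical embeddings of "patient has fever" and "patient has no fever".
emb_plain = [0.90, 0.10, 0.40]
emb_negated = [0.88, 0.12, 0.41]

score = cosine_similarity(emb_plain, emb_negated)
```

A score close to 1 for such a pair would suggest that the embedding barely distinguishes the negated sentence from the original, which is the failure mode the abstract reports before super-tuning.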

SEMgraph: an R Package for Causal Network Inference of High-Throughput Data with Structural Equation Models

Tue, 30/08/2022 - 5:30am
Abstract
Motivation: With the advent of high-throughput sequencing (HTS) in molecular biology and medicine, the need for scalable statistical solutions for modeling complex biological systems has become of critical importance. The increasing number of platforms and possible experimental scenarios raised the problem of integrating large amounts of new heterogeneous data and current knowledge, to test novel hypotheses and improve our comprehension of physiological processes and diseases.
Results: Combining network analysis and causal inference within the framework of structural equation modeling (SEM), we developed the R package SEMgraph. It provides a fully automated toolkit, managing complex biological systems as multivariate networks, ensuring robustness and reproducibility through data-driven evaluation of model architecture and perturbation, that is readily interpretable in terms of causal effects among system components.
Availability: SEMgraph package is available at https://cran.r-project.org/web/packages/SEMgraph.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

scFeatures: Multi-view representations of single-cell and spatial data for disease outcome prediction

Tue, 30/08/2022 - 5:30am
Abstract
Motivation: With the recent surge of large-cohort scale single cell research, it is of critical importance that analytical methods can fully utilize the comprehensive characterization of cellular systems that single cell technologies produce to provide insights into samples from individuals. Currently, there is little consensus on the best ways to compress information from the complex data structures of these technologies to summary statistics that represent each sample (e.g. individuals).
Results: Here, we present scFeatures, an approach that creates interpretable cellular and molecular representations of single-cell and spatial data at the sample level. We demonstrate that summarising a broad collection of features at the sample level is both important for understanding underlying disease mechanisms in different experimental studies and for accurately classifying disease status of individuals.
Availability: scFeatures is publicly available as an R package at https://github.com/SydneyBioX/scFeatures. All data used in this study are publicly available with accession ID reported in the Methods section.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

Tangent normalization for somatic copy-number inference in cancer genome analysis

Tue, 30/08/2022 - 5:30am
Abstract
Motivation: Somatic copy-number alterations (SCNAs) play an important role in cancer development. Systematic noise in sequencing and array data presents a significant challenge to the inference of SCNAs for cancer genome analyses. As part of The Cancer Genome Atlas (TCGA), the Broad Institute Genome Characterization Center developed the Tangent normalization method to generate copy-number profiles using data from single-nucleotide polymorphism (SNP) arrays and whole-exome sequencing (WES) technologies for over 10,000 pairs of tumors and matched normal samples. Here, we describe the Tangent method, which uses a unique linear combination of normal samples as a reference for each tumor sample, to subtract systematic errors that vary across samples. We also describe a modification of Tangent, called Pseudo-Tangent, which enables denoising through comparisons between tumor profiles when few normal samples are available.
Results: Tangent normalization substantially increases signal-to-noise ratios (SNRs) compared to conventional normalization methods in both SNP array and WES analyses. Tangent and Pseudo-Tangent normalizations improve the SNR by reducing noise with minimal effect on signal and exceed the contribution of other steps in the analysis such as choice of segmentation algorithm. Tangent and Pseudo-Tangent are broadly applicable and enable more accurate inference of SCNAs from DNA sequencing and array data.
Availability: Tangent is available at https://github.com/broadinstitute/tangent and as a Docker image (https://hub.docker.com/r/broadinstitute/tangent). Tangent is also the normalization method for the copy-number pipeline in Genome Analysis Toolkit 4 (GATK4).
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
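
The Tangent abstract above describes building, for each tumor sample, a linear combination of normal samples as a reference and subtracting it. One way to picture this is as a least-squares projection of the tumor profile onto the space spanned by the normal profiles. The sketch below makes that assumption explicitly. It is not the Broad Institute implementation, and the function names and toy profiles are hypothetical.

```python
def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt: orthonormal basis for the span of the given vectors."""
    basis = []
    for v in vectors:
        residual = list(v)
        for b in basis:
            c = dot(residual, b)
            residual = [r - c * bi for r, bi in zip(residual, b)]
        norm = dot(residual, residual) ** 0.5
        if norm > tol:  # skip vectors already in the span
            basis.append([r / norm for r in residual])
    return basis


def subtract_normal_reference(tumor, normals):
    """Remove from the tumor profile its best-fit combination of normals."""
    reference = [0.0] * len(tumor)
    for b in orthonormal_basis(normals):
        c = dot(tumor, b)
        reference = [ref + c * bi for ref, bi in zip(reference, b)]
    return [t - ref for t, ref in zip(tumor, reference)]


# Hypothetical log-ratio profiles over three genomic bins.
normals = [[1.0, 1.0, 0.0], [1.0, 0.0, 0.0]]  # systematic noise patterns
tumor = [3.0, 2.0, 5.0]                       # noise plus a gain in bin 3
denoised = subtract_normal_reference(tumor, normals)
# denoised is approximately [0.0, 0.0, 5.0]
```

The residual is orthogonal to every normal profile, so patterns shared with the normals are removed while the tumor-specific signal in the third bin is kept. Any true signal that happens to lie in the span of the normals would also be removed, which is a limitation of this simplified picture.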

CovidGraph: A Graph to fight COVID-19

Tue, 30/08/2022 - 5:30am
Abstract
Summary: Reliable and integrated data is a prerequisite for effective research on the recent COVID-19 pandemic. The CovidGraph project integrates and connects heterogeneous COVID-19 data in a knowledge graph, referred to as “CovidGraph”. It provides easy access to multiple data sources through a single point of entry and enables flexible data exploration.
Availability and Implementation: More information on CovidGraph is available from the project website: https://healthecco.org/covidgraph/. Source code and documentation are provided on GitHub: https://github.com/covidgraph.
Supplementary information: Supplementary data is available at Bioinformatics online.
Categories: Bioinformatics Trends

APSCALE: advanced pipeline for simple yet comprehensive analyses of DNA Meta-barcoding data

Sat, 27/08/2022 - 5:30am
Abstract
Summary: DNA metabarcoding is an emerging approach to assess and monitor biodiversity worldwide, and consequently the number and size of data sets are increasing exponentially. To date, no published DNA metabarcoding data processing pipeline exists that i) is platform independent, ii) is easy to use (incl. GUI), iii) is fast (scales well with dataset size), and iv) complies with data protection regulations of e.g., environmental agencies. The presented pipeline APSCALE meets these requirements and handles the most common tasks of sequence data processing, such as paired-end merging, primer trimming, quality filtering, clustering and denoising of any popular metabarcoding marker, such as ITS (internal transcribed spacer), 16S, or COI (cytochrome c oxidase subunit I). APSCALE comes in a command-line and a GUI version. The latter provides the user with additional summary statistics options and links to GUI-based downstream applications.
Availability: APSCALE is written in Python, a platform-independent language, and integrates functions of the open-source tools VSEARCH (Rognes et al. 2016), cutadapt (Martin et al. 2011) and LULU (Frøslev et al. 2017). All modules support multithreading to allow fast processing of larger DNA metabarcoding datasets. Further information and troubleshooting are provided on the respective GitHub pages for the command-line version (https://github.com/DominikBuchner/apscale) and the GUI-based version (https://github.com/TillMacher/apscale_gui), including a detailed tutorial.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

expam—high-resolution analysis of metagenomes using distance trees

Sat, 27/08/2022 - 5:30am
Abstract
Summary: Shotgun metagenomic sequencing provides the capacity to understand microbial community structure and function at unprecedented resolution; however, current analytical methods are constrained by a focus on taxonomic classifications that may obfuscate functional relationships. Here we present expam, a tree based, taxonomy agnostic tool for identification of biologically relevant clades from shotgun metagenomic sequencing.
Availability and Implementation: expam is an open-source Python application released under the GNU General Public Licence v3.0. expam installation instructions, source code and tutorials can be found at https://github.com/seansolari/expam.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

DeepToA: An Ensemble Deep-Learning Approach to Predicting the Theater of Activity of a Microbiome

Sat, 27/08/2022 - 5:30am
Abstract
Motivation: Metagenomics is the study of microbiomes using DNA sequencing. A microbiome consists of an assemblage of microbes that is associated with a “theater of activity” (ToA). An important question is, to what degree does the taxonomic and functional content of the former depend on the (details of the) latter? Here we investigate a related technical question: Given a taxonomic and/or functional profile estimated from metagenomic sequencing data, how to predict the associated ToA? We present a deep-learning approach to this question. We use both taxonomic and functional profiles as input. We apply node2vec to embed hierarchical taxonomic profiles into numerical vectors. We then perform dimension reduction using clustering, to address the sparseness of the taxonomic data and thus make the problem more amenable to deep-learning algorithms. Functional features are combined with textual descriptions of protein families or domains. We present an ensemble deep-learning framework DeepToA for predicting the “theater of activity” of a microbial community, based on taxonomic and functional profiles. We use SHAP (SHapley Additive exPlanations) values to determine which taxonomic and functional features are important for the prediction.
Results: Based on 7,560 metagenomic profiles downloaded from MGnify, classified into ten different theaters of activity, we demonstrate that DeepToA has an accuracy of 98.30%. We show that adding textual information to functional features increases the accuracy.
Availability: Our approach is available at http://ab.inf.uni-tuebingen.de/software/deeptoa.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

Aclust2.0: a revamped unsupervised R tool for Infinium methylation beadchips data analyses

Fri, 26/08/2022 - 5:30am
Abstract
Motivation: A wide range of computational packages has been developed for regional DNA methylation analyses of Illumina’s Infinium array data. Aclust, one of the first unsupervised algorithms, was originally designed to analyze regional methylation of Infinium’s 27K and 450K arrays by clustering neighboring methylation sites prior to downstream analyses. However, Aclust relied on outdated packages that rendered it largely non-operational, especially with the newer Infinium EPIC and mouse arrays.
Results: We have created Aclust2.0, a streamlined pipeline that involves five steps for the analyses of human (450K and EPIC) and mouse array data. Aclust2.0 provides a user-friendly and versatile pipeline for regional DNA methylation analyses in molecular epidemiological and mouse studies.
Availability: Aclust2.0 is freely available on GitHub (https://github.com/OluwayioseOA/Alcust2.0.git).
Categories: Bioinformatics Trends
