Poisson hurdle model-based method for clustering microbiome features

Bioinformatics Oxford Journals - Mon, 05/12/2022 - 5:30am
Abstract
Motivation: High-throughput sequencing technologies have greatly facilitated microbiome research and have generated a large volume of microbiome data with the potential to answer key questions regarding microbiome assembly, structure, and function. Cluster analysis aims to group features that behave similarly across treatments, and such grouping helps highlight the functional relationships among features and may provide biological insights into microbiome networks. However, clustering microbiome data is challenging due to the sparsity and high-dimensionality.
Results: We propose a model-based clustering method based on Poisson hurdle models for sparse microbiome count data. We describe an expectation-maximization algorithm and a modified version using simulated annealing to conduct the cluster analysis. Moreover, we provide algorithms for initialization and choosing the number of clusters. Simulation results demonstrate that our proposed methods provide better clustering results than alternative methods under a variety of settings. We also apply the proposed method to a sorghum rhizosphere microbiome dataset that results in interesting biological findings.
Availability: R package is freely available for download at https://cran.r-project.org/package=PHclust.
Supplementary information: Supplementary Materials are available at Bioinformatics online.
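As a rough illustration of the Poisson hurdle model named in the abstract above, the sketch below evaluates the textbook hurdle probability mass function: a point mass at zero plus a zero-truncated Poisson for positive counts. The abstract does not give the parameterization used in PHclust, so the function name and the parameters `p_zero` and `lam` are assumptions made for this example only.

```python
import math


def poisson_hurdle_pmf(k: int, p_zero: float, lam: float) -> float:
    """Probability of observing count k under a generic Poisson hurdle model.

    p_zero: probability that the count is zero (the "hurdle" part).
    lam:    rate of the zero-truncated Poisson that governs positive counts.
    Both parameter names are illustrative, not taken from the PHclust package.
    """
    if k < 0:
        return 0.0
    if k == 0:
        return p_zero
    # Poisson log-pmf at k, computed in log space for numerical stability.
    log_pois = k * math.log(lam) - lam - math.lgamma(k + 1)
    # Renormalise over k >= 1 (zero-truncation), then weight by 1 - p_zero.
    return (1.0 - p_zero) * math.exp(log_pois) / (1.0 - math.exp(-lam))


# A sparse feature: 70% zeros, positive counts centred near 3.
probabilities = [poisson_hurdle_pmf(k, p_zero=0.7, lam=3.0) for k in range(100)]
total = sum(probabilities)
```

Separating the zero probability from the count distribution is what lets a hurdle model describe sparse data in which zeros are far more common than a plain Poisson would predict.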
Categories: Bioinformatics Trends

Microbiome Toolbox: Methodological approaches to derive and visualize microbiome trajectories

Bioinformatics Oxford Journals - Mon, 05/12/2022 - 5:30am
Abstract
Motivation: The gut microbiome changes rapidly under the influence of different factors such as age, dietary changes, or medications to name just a few. To analyze and understand such changes we present a microbiome analysis toolbox. We implemented several methods for analysis and exploration to provide interactive visualizations for easy comprehension and reporting of longitudinal microbiome data.
Results: Based on the abundance of microbiome features such as taxa as well as functional capacity modules, and with the corresponding metadata per sample, the toolbox includes methods for 1) data analysis and exploration, 2) data preparation including dataset-specific preprocessing and transformation, 3) best feature selection for log-ratio denominators, 4) two-group analysis, 5) microbiome trajectory prediction with feature importance over time, 6) spline and linear regression statistical analysis for testing universality across different groups and differentiation of two trajectories, 7) longitudinal anomaly detection on the microbiome trajectory, and 8) simulated intervention to return anomaly back to a reference trajectory.
Availability: The software tools are open source and implemented in Python. For developers interested in additional functionality of the toolbox, it is modular allowing for further extension with custom methods and analysis. The code, Python package, and the link to the interactive dashboard are available on GitHub https://github.com/JelenaBanjac/microbiome-toolbox.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

TransFlow: a Snakemake workflow for transmission analysis of Mycobacterium tuberculosis whole-genome sequencing data

Bioinformatics Oxford Journals - Mon, 05/12/2022 - 5:30am
Abstract
Motivation: Whole-genome sequencing (WGS) is increasingly used to aid the understanding of Mycobacterium tuberculosis (MTB) transmission. The epidemiological analysis of tuberculosis based on the WGS technique requires a diverse collection of bioinformatics tools. Effectively using these analysis tools in a scalable and reproducible way can be challenging, especially for non-experts.
Results: Here, we present TransFlow (Transmission Workflow), a user-friendly, fast, efficient, and comprehensive WGS-based transmission analysis pipeline. TransFlow combines some state-of-the-art tools to take transmission analysis from raw sequencing data, through quality control, sequence alignment, and variant calling, into downstream transmission clustering, transmission network reconstruction, and transmission risk factor inference, together with summary statistics and data visualization in a summary report. TransFlow relies on Snakemake and Conda to resolve dependencies among consecutive processing steps and can be easily adapted to any computation environment.
Availability: TransFlow is freely available at https://github.com/cvn001/transflow.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

RatesTools: a nextflow pipeline for detecting de novo germline mutations in pedigree sequence data

Bioinformatics Oxford Journals - Mon, 05/12/2022 - 5:30am
Abstract
Summary: Here, we introduce RatesTools, an automated pipeline to infer de novo mutation rates from parent-offspring trio data of diploid organisms. By providing a reference genome and high-coverage, whole-genome resequencing data of a minimum of three individuals (sire, dam, offspring), RatesTools provides a list of candidate de novo mutations and calculates a putative mutation rate. RatesTools uses several quality filtering steps, such as discarding sites with low mappability and highly repetitive regions, as well as sites with low genotype and mapping qualities to find potential de novo mutations. In addition, RatesTools implements several optional filters based on post hoc assumptions of the heterozygosity and mutation rate of the organism. Filters are highly customizable to user specifications in order to maximize utility across a wide range of applications.
Availability: RatesTools is freely available at https://github.com/campanam/RatesTools under a Creative Commons Zero (CC0) license. The pipeline is implemented in Nextflow (Di Tommaso et al., 2017), Ruby (http://www.ruby-lang.org), Bash (https://www.gnu.org/software/bash/), and R (R Core Team 2020) with reliance upon several other freely available tools. RatesTools is compatible with macOS and Linux operating systems.
Supplementary information: Supplementary information documenting RatesTools’ performance using published datasets is available at Bioinformatics online.
Categories: Bioinformatics Trends

Visual Omics: A web-based platform for omics data analysis and visualization with rich graph-tuning capabilities

Bioinformatics Oxford Journals - Fri, 02/12/2022 - 5:30am
Abstract
Summary: With the continuous development of high-throughput sequencing technology, bioinformatic analysis of omics data plays an increasingly important role in life science research. Many R packages are widely used for omics analysis, such as DESeq2, clusterProfiler and STRINGdb, and some online tools based on them have been developed to free bench scientists from programming with these R packages. However, the charts generated by these tools are usually in a fixed, non-editable format and often fail to clearly demonstrate the details the researchers intend to express. To address these issues, we have created Visual Omics, an online tool for omics data analysis and scientific chart editing. Visual Omics integrates multiple omics analyses which include differential expression analysis, enrichment analysis, protein domain prediction and protein-protein interaction analysis with extensive graph presentations. It can also independently plot and customize basic charts that are involved in omics analysis, such as various PCA/PCoA plots, bar plots, box plots, heat maps, set intersection diagrams, bubble charts, and volcano plots. A distinguishing feature of Visual Omics is that it allows users to perform one-stop omics data analyses without programming, iteratively explore the form and layout of graphs online, and fine-tune parameters to generate charts that meet publication requirements.
Availability and implementation: Visual Omics can be used at http://bioinfo.ihb.ac.cn/visomics. Source code can be downloaded at http://bioinfo.ihb.ac.cn/software/visomics/visomics-1.1.tar.gz.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

DeepCellEss: Cell line-specific essential protein prediction with attention-based interpretable deep learning

Bioinformatics Oxford Journals - Fri, 02/12/2022 - 5:30am
Abstract
Motivation: Protein essentiality is usually accepted to be a conditional trait and strongly affected by cellular environments. However, existing computational methods often do not take such characteristics into account, preferring to incorporate all available data and train a general model for all cell lines. In addition, the lack of model interpretability limits further exploration and analysis of essential protein predictions.
Results: In this study, we proposed DeepCellEss, a sequence-based interpretable deep learning framework for cell line-specific essential protein predictions. DeepCellEss utilizes convolutional neural network and bidirectional long short-term memory to learn short- and long-range latent information from protein sequences. Further, a multi-head self-attention mechanism is used to provide residue-level model interpretability. For model construction, we collected extremely large-scale benchmark datasets across 323 cell lines. Extensive computational experiments demonstrate that DeepCellEss yields effective prediction performance for different cell lines, and outperforms existing sequence-based methods as well as network-based centrality measures. Finally, we conducted some case studies to illustrate the necessity of considering specific cell lines and the superiority of DeepCellEss. We believe that DeepCellEss can serve as a useful tool for predicting essential proteins across different cell lines.
Availability: The DeepCellEss web server is available at http://csuligroup.com:8000/DeepCellEss. The source code can be obtained from https://github.com/CSUBioGroup/DeepCellEss.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

Comprehensive visualization of cell-cell interactions in single-cell and spatial transcriptomics with NICHES

Bioinformatics Oxford Journals - Fri, 02/12/2022 - 5:30am
Abstract
Motivation: Recent years have seen the release of several toolsets that reveal cell-cell interactions from single-cell data. However, all existing approaches leverage mean celltype gene expression values, and do not preserve the single-cell fidelity of the original data. Here, we present NICHES (Niche Interactions and Communication Heterogeneity in Extracellular Signaling), a tool to explore extracellular signaling at the truly single-cell level.
Results: NICHES allows embedding of ligand-receptor signal proxies to visualize heterogeneous signaling archetypes within cell clusters, between cell clusters, and across experimental conditions. When applied to spatial transcriptomic data, NICHES can be used to reflect local cellular microenvironment. NICHES can operate with any list of ligand-receptor signaling mechanisms, is compatible with existing single-cell packages, and allows rapid, flexible analysis of cell-cell signaling at single-cell resolution.
Availability: NICHES is an open-source software implemented in R under academic free license v3.0 and it is available at github.com/msraredon/NICHES. Use-case vignettes are available at https://msraredon.github.io/NICHES/.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

Expression of Concern: Overcoming the inadaptability of sparse group lasso for data with various group structures by stacking

Bioinformatics Oxford Journals - Thu, 01/12/2022 - 5:30am
This is an Expression of Concern regarding: Huan He, Xinyun Guo, Jialin Yu, Chen Ai, Shaoping Shi, Overcoming the inadaptability of sparse group lasso for data with various group structures by stacking, Bioinformatics, Volume 38, Issue 6, 15 March 2022, Pages 1542–1549, https://doi.org/10.1093/bioinformatics/btab848
Categories: Bioinformatics Trends

Pharokka: a fast scalable bacteriophage annotation tool

Bioinformatics Oxford Journals - Thu, 01/12/2022 - 5:30am
Abstract
Summary: In recent years, there has been an increasing interest in bacteriophages which has led to growing numbers of bacteriophage genomic sequences becoming available. Consequently, there is a need for a rapid and consistent genomic annotation tool dedicated for bacteriophages. Existing tools either are not designed specifically for bacteriophages or are web- and email-based and require significant manual curation, which makes their integration into bioinformatic pipelines challenging. Pharokka was created to provide a tool that annotates bacteriophage genomes easily, rapidly and consistently with standards-compliant outputs. Moreover, Pharokka requires only two lines of code to install and use and takes under 5 minutes to run for an average 50 kb bacteriophage genome.
Availability and Implementation: Pharokka is implemented in Python and is available as a bioconda package using ‘conda install -c bioconda pharokka’. The source code is available on GitHub (https://github.com/gbouras13/pharokka). Pharokka has been tested on Linux-64 and MacOSX machines, and on Windows using a Linux Virtual Machine.
Supplementary information: All benchmarking input FASTA and output files, including the Python script calc_gff_coding_density_prokka.py, are available at https://doi.org/10.5281/zenodo.7227091.
Categories: Bioinformatics Trends

Scaling Neighbor-Joining to One Million Taxa with Dynamic and Heuristic Neighbor-Joining

Bioinformatics Oxford Journals - Thu, 01/12/2022 - 5:30am
Abstract
Motivation: The Neighbor-Joining algorithm is a widely used method to perform iterative clustering, and forms the basis for phylogenetic reconstruction in several bioinformatic pipelines. Although Neighbor-Joining is considered to be a computationally efficient algorithm, it does not scale well for datasets exceeding several thousand taxa (> 100 000). Optimizations to the canonical Neighbor-Joining algorithm have been proposed; these optimizations are, however, achieved through approximations or extensive memory usage, which is not feasible for large datasets.
Results: In this article two new algorithms, Dynamic and Heuristic Neighbor-Joining, are presented, which optimize the canonical Neighbor-Joining method to scale to millions of taxa without increasing the memory requirements. Both Dynamic and Heuristic Neighbor-Joining outperform the current gold standard methods to construct Neighbor-Joining trees, while Dynamic Neighbor-Joining is guaranteed to produce exact Neighbor-Joining trees.
Availability: https://bitbucket.org/genomicepidemiology/ccphylo.git
Supplementary information: Supplementary data are available at Bioinformatics online.
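To make the scaling problem in the Neighbor-Joining abstract above concrete, here is a minimal sketch of the pair-selection step of the canonical algorithm, which scans every pair of taxa for the smallest Q value. This is the standard textbook criterion, not the Dynamic or Heuristic variants the article introduces; the function name and the example distance matrix are illustrative assumptions.

```python
def nj_closest_pair(dist):
    """Return ((i, j), q) for the pair of taxa joined next by canonical NJ.

    dist is a symmetric matrix (list of lists) with a zero diagonal.
    Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)
    """
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    best_pair, best_q = None, float("inf")
    # Scanning all pairs costs O(n^2) per join; with n - 3 joins the
    # canonical algorithm is O(n^3) overall, which limits large inputs.
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if q < best_q:
                best_pair, best_q = (i, j), q
    return best_pair, best_q


# Small five-taxon example (taxa a..e as indices 0..4).
distances = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
pair, q_value = nj_closest_pair(distances)
```

Here the first join is taxa 0 and 1 with Q = -50, even though taxa 3 and 4 have the smallest raw distance; the row-sum correction is what distinguishes Neighbor-Joining from simple nearest-pair clustering.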
Categories: Bioinformatics Trends

Treenome Browser: co-visualization of enormous phylogenies and millions of genomes

Bioinformatics Oxford Journals - Thu, 01/12/2022 - 5:30am
Abstract
Summary: Treenome Browser is a web browser tool to interactively visualize millions of genomes alongside huge phylogenetic trees.
Availability and Implementation: Treenome Browser for SARS-CoV-2 can be accessed at cov2tree.org, or at taxonium.org for user-provided trees. Source code and documentation are available at github.com/theosanderson/taxonium and docs.taxonium.org/en/latest/treenome.html.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

EDIR: Exome Database of Interspersed Repeats

Bioinformatics Oxford Journals - Thu, 01/12/2022 - 5:30am
Abstract
Motivation: Intragenic exonic deletions are known to contribute to genetic diseases and are often flanked by regions of homology.
Results: In order to get a clearer view of these interspersed repeats encompassing a coding sequence, we have developed EDIR (Exome Database of Interspersed Repeats), which contains the positions of these structures within the human exome. EDIR has been calculated by an inductive strategy, rather than by a brute force approach, and can be queried through an R/Bioconductor package or a web interface allowing the rapid per-gene extraction of homology-flanked sequences throughout the exome.
Availability: The code used to compile EDIR can be found at https://github.com/lauravongoc/EDIR. The full data set of EDIR can be queried via an Rshiny application at http://193.70.34.71:3857/edir/. The R package for querying EDIR is called “EDIRquery” and is available on Bioconductor. The full EDIR data set can be downloaded from https://osf.io/m3gvx/ or http://193.70.34.71/EDIR.tar.gz
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

SCExecute: custom cell barcode-stratified analyses of scRNA-seq data

Bioinformatics Oxford Journals - Wed, 30/11/2022 - 5:30am
Abstract
Motivation: In single-cell RNA-sequencing (scRNA-seq) data, stratification of sequencing reads by cellular barcode is necessary to study cell-specific features. However, apart from gene expression, the analyses of cell-specific features are not sufficiently supported by available tools designed for high-throughput sequencing data.
Results: We introduce SCExecute, which executes a user-provided command on barcode-stratified, extracted on-the-fly, single cell binary alignment map (scBAM) files. SCExecute extracts the alignments with each cell barcode from aligned, pooled single-cell sequencing data. Simple commands, monolithic programs, multi-command shell-scripts, or complex shell-based pipelines are then executed on each scBAM file. scBAM files can be restricted to specific barcodes and/or genomic regions of interest. We demonstrate SCExecute with two popular variant callers, GATK and Strelka2, executed in shell-scripts together with commands for BAM file manipulation and variant filtering, to detect single cell-specific expressed Single Nucleotide Variants (sceSNVs) from droplet scRNA-seq data (10X Genomics Chromium System).
Conclusion: SCExecute facilitates custom cell-level analyses on barcoded scRNA-seq data using currently available tools and provides an effective solution for studying low (cellular) frequency transcriptome features.
Availability: SCExecute is implemented in Python3 using the Pysam package and distributed for Linux, MacOS and Python environments from https://horvathlab.github.io/NGS/SCExecute.
Supplementary information: Supplementary data are available at Bioinformatics online.
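The barcode stratification described in the SCExecute abstract above can be pictured with a small, library-free sketch. It groups pooled alignment records by cell barcode and optionally restricts the result to barcodes of interest. It is not SCExecute's implementation: the records are plain `(read_name, barcode)` tuples standing in for BAM alignments, and the function name and `keep` parameter are hypothetical.

```python
from collections import defaultdict


def stratify_by_barcode(alignments, keep=None):
    """Group pooled alignment records by cell barcode.

    alignments: iterable of (read_name, barcode) tuples, a stand-in for
                BAM records carrying a cell-barcode tag.
    keep:       optional set of barcodes; when given, other barcodes are
                dropped, mirroring the idea of restricting to cells of interest.
    """
    groups = defaultdict(list)
    for read_name, barcode in alignments:
        if barcode is None:
            continue  # reads without a barcode cannot be assigned to a cell
        if keep is not None and barcode not in keep:
            continue
        groups[barcode].append(read_name)
    return dict(groups)


pooled = [("r1", "AAAC"), ("r2", "TTTG"), ("r3", "AAAC"), ("r4", None)]
per_cell = stratify_by_barcode(pooled)
only_tttg = stratify_by_barcode(pooled, keep={"TTTG"})
```

In a real pipeline each per-cell group would be written out as its own alignment file and handed to the user's command; this sketch stops at the grouping step.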
Categories: Bioinformatics Trends

Systematic identification of biochemical networks in cancer cells by Functional Pathway Inference Analysis

Bioinformatics Oxford Journals - Wed, 30/11/2022 - 5:30am
Abstract
Motivation: Pathway inference methods are key for annotating the genome, for providing insights into the mechanisms of biochemical processes, and for enabling the discovery of signalling members and potential new drug targets. Here, we tested the hypothesis that genes with similar impact on cell viability across multiple cell lines belong to a common pathway, thus providing a conceptual basis for a pathway inference method based on correlated anti-proliferative gene properties.
Methods: To test this concept, we used recently available large scale RNAi screens to develop a method, termed Functional Pathway Inference Analysis (FPIA), to systemically identify correlated gene dependencies.
Results: To assess FPIA, we initially focused on PI3K/AKT/MTOR signalling, a prototypic oncogenic pathway for which we have a good sense of ground truth. Dependencies for AKT1, MTOR and PDPK1 were among the most correlated with those for PIK3CA (encoding PI3Kα), as returned by FPIA, whereas negative regulators of PI3K/AKT/MTOR signalling, such as PTEN, were anti-correlated. Following FPIA, MTOR, PIK3CA and PIK3CB produced significantly greater correlations for genes in the PI3K-Akt pathway versus other pathways. Application of FPIA to two additional pathways (p53 and MAPK) returned expected associations (e.g., MDM2 and TP53BP1 for p53 and MAPK1 and BRAF for MEK1). Over-representation analysis of FPIA-returned genes enriched the respective pathway, and FPIA restricted to specific tumour lineages uncovered cell type-specific networks. Overall, our study demonstrates the ability of FPIA to identify members of pro-survival biochemical pathways in cancer cells.
Availability: FPIA is implemented in a new R package named ‘cordial’ freely available from https://github.com/CutillasLab/cordial.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

PoDCall: positive droplet calling and normalisation of droplet digital PCR DNA methylation data

Bioinformatics Oxford Journals - Wed, 30/11/2022 - 5:30am
Abstract
Motivation: Droplet digital PCR (ddPCR) holds great promise for investigating DNA methylation with high sensitivity. Yet the lack of methods for analyzing ddPCR DNA methylation data has resulted in users processing the data manually at the expense of standardisation.
Results: PoDCall is an R package performing automated calling of positive droplets, quantification, and normalisation of methylation levels in ddPCR experiments. A Shiny application provides users with an intuitive and interactive interface to access PoDCall functionalities.
Availability: Bioconductor package: https://bioconductor.org/packages/PoDCall/
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

Unbiased pangenome graphs

Bioinformatics Oxford Journals - Wed, 30/11/2022 - 5:30am
Abstract
Motivation: Pangenome variation graphs model the mutual alignment of collections of DNA sequences. A set of pairwise alignments implies a variation graph, but there are no scalable methods to generate such a graph from these alignments. Existing related approaches depend on a single reference, a specific ordering of genomes, or a de Bruijn model based on a fixed k-mer length. A scalable, self-contained method to build pangenome graphs without such limitations would be a key step in pangenome construction and manipulation pipelines.
Results: We design the seqwish algorithm, which builds a variation graph from a set of sequences and alignments between them. We first transform the alignment set into an implicit interval tree. To build up the variation graph, we query this tree-based representation of the alignments to reduce transitive matches into single DNA segments in a sequence graph. By recording the mapping from input sequence to output graph, we can trace the original paths through this graph, yielding a pangenome variation graph. We present an implementation that operates in external memory, using disk-backed data structures and lock-free parallel methods to drive the core graph induction step. We demonstrate that our method scales to very large graph induction problems by applying it to build pangenome graphs for several species.
Availability: seqwish is published as free software under the MIT open source license. Source code and documentation are available at https://github.com/ekg/seqwish. seqwish can be installed via Bioconda https://bioconda.github.io/recipes/seqwish/README.html or GNU Guix https://github.com/ekg/guix-genomics/blob/master/seqwish.scm.
Categories: Bioinformatics Trends

Active Learning for Efficient Analysis of High-throughput Nanopore Data

Bioinformatics Oxford Journals - Tue, 29/11/2022 - 5:30am
Abstract
As the third-generation sequencing technology, nanopore sequencing has been used for high-throughput sequencing of DNA, RNA, and even proteins. Recently, many studies have begun to use machine learning technology to analyze the enormous data generated by nanopores. Unfortunately, the success of this technology is due to the extensive labeled data, which often suffer from enormous labor costs. Therefore, there is an urgent need for a novel technology that can not only rapidly analyze nanopore data with high-throughput, but also significantly reduce the cost of labeling. To achieve the above goals, we introduce active learning to alleviate the enormous labor costs by selecting the samples that need to be labeled. This work applies several advanced active learning technologies to the nanopore data, including the RNA classification dataset (RNA-CD) and the Oxford Nanopore Technologies barcode dataset (ONT-BD). Due to the complexity of the nanopore data (with noise sequence), the bias constraint is introduced to improve the sample selection strategy in active learning. The experimental results show that for the same performance metric, 50% labeling amount can achieve the best baseline performance for ONT-BD, while only 15% labeling amount can achieve the best baseline performance for RNA-CD. Crucially, the experiments show that active learning technology can assist experts in labeling samples, and significantly reduce the labeling cost. Active learning can greatly reduce the dilemma of difficult labeling of high-capacity nanopore data. We hope active learning can be applied to other problems in nanopore sequence analysis.
Availability: The main program is available at https://github.com/guanxiaoyu11/AL-for-nanopore.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

dnadna: a deep learning framework for population genetics inference

Bioinformatics Oxford Journals - Tue, 29/11/2022 - 5:30am
Abstract
Motivation: We present dnadna, a flexible python-based software for deep learning inference in population genetics. It is task-agnostic and aims at facilitating the development, reproducibility, dissemination, and reusability of neural networks designed for population genetic data.
Results: dnadna defines multiple user-friendly workflows. First, users can implement new architectures and tasks, while benefiting from dnadna utility functions, training procedure and test environment, which saves time and decreases the likelihood of bugs. Second, the implemented networks can be re-optimized based on user-specified training sets and/or tasks. Newly implemented architectures and pretrained networks are easily shareable with the community for further benchmarking or other applications. Finally, users can apply pretrained networks in order to predict evolutionary history from alternative real or simulated genetic datasets, without requiring extensive knowledge in deep learning or coding in general.
dnadna comes with a peer-reviewed, exchangeable neural network, allowing demographic inference from SNP data, that can be used directly or retrained to solve other tasks. Toy networks are also available to ease the exploration of the software, and we expect that the range of available architectures will keep expanding thanks to community contributions.
Availability and Implementation: dnadna is a Python (≥ 3.7) package, its repository is available at gitlab.com/mlgenetics/dnadna and its associated documentation at mlgenetics.gitlab.io/dnadna/.
Categories: Bioinformatics Trends

Distinguishing imported cases from locally acquired cases within a geographically limited genomic sample of an infectious disease

Bioinformatics Oxford Journals - Mon, 28/11/2022 - 5:30am
Abstract
Motivation: The ability to distinguish imported cases from locally acquired cases has important consequences for the selection of public health control strategies. Genomic data can be useful for this, for example using a phylogeographic analysis in which genomic data from multiple locations is compared to determine likely migration events between locations. However, these methods typically require good samples of genomes from all locations, which is rarely available.
Results: Here we propose an alternative approach that only uses genomic data from a location of interest. By comparing each new case with previous cases from the same location we are able to detect imported cases, as they have a different genealogical distribution than that of locally acquired cases. We show that, when variations in the size of the local population are accounted for, our method has good sensitivity and excellent specificity for the detection of imports. We applied our method to data simulated under the structured coalescent model and demonstrate relatively good performance even when the local population has the same size as the external population. Finally, we applied our method to several recent genomic datasets from both bacterial and viral pathogens, and show that it can, in a matter of seconds or minutes, deliver important insights on the number of imports to a geographically limited sample of a pathogen population.
Availability and Implementation: The R package DetectImports is freely available from https://github.com/xavierdidelot/DetectImports
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

varAmpliCNV: Analyzing variance of amplicons to detect CNVs in targeted NGS data

Bioinformatics Oxford Journals - Mon, 28/11/2022 - 5:30am
Abstract
Motivation: Computational identification of copy number variants (CNVs) in sequencing data is a challenging task. Existing CNV-detection methods account for various sources of variation and perform different normalization strategies. However, their applicability and predictions are restricted to specific enrichment protocols. Here, we introduce a novel tool named varAmpliCNV, specifically designed for CNV-detection in amplicon-based targeted resequencing data (Haloplex™ enrichment protocol) in the absence of matched controls. VarAmpliCNV utilizes principal component analysis (PCA) and/or metric dimensional scaling (MDS) to control variances of amplicon associated read counts enabling effective detection of CNV signals.
Results: Performance of VarAmpliCNV was compared against three existing methods (ConVaDING, ONCOCNV, DECoN) on data of 167 samples run with an aortic aneurysm gene panel (n = 30), including 9 positive control samples. Additionally, we validated the performance on a large deafness gene panel (n = 145) run on 138 samples, containing 4 positive controls. VarAmpliCNV achieved higher sensitivity (100%) and specificity (99.78%) in comparison to competing methods. In addition, unsupervised clustering of CNV segments and visualization plots of amplicons spanning these regions is included as a downstream strategy to filter out false positives.
Availability: https://hub.docker.com/r/cmgantwerpen/varamplicnv.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
