Reproducible acquisition, management, and meta-analysis of nucleotide sequence (meta)data using q2-fondue

Bioinformatics Oxford Journals - Tue, 20/09/2022 - 5:30am
Abstract
Motivation: The volume of public nucleotide sequence data has blossomed over the past two decades, and is ripe for re- and meta-analyses to enable novel discoveries. However, reproducible re-use and management of sequence datasets and associated metadata remain critical challenges. We created the open source Python package q2-fondue to enable user-friendly acquisition, re-use, and management of public sequence (meta)data while adhering to open data principles.
Results: q2-fondue allows fully provenance-tracked programmatic access to and management of data from the NCBI Sequence Read Archive (SRA). Unlike other packages allowing download of sequence data from the SRA, q2-fondue enables full data provenance tracking from data download to final visualization, integrates with the QIIME 2 ecosystem, prevents data loss upon space exhaustion, and allows download of (meta)data given a publication library. To highlight its manifold capabilities, we present executable demonstrations using publicly available amplicon, whole genome, and metagenome datasets.
Availability: q2-fondue is available as an open-source BSD-3-licensed Python package at https://github.com/bokulich-lab/q2-fondue. Usage tutorials are available in the same repository. All Jupyter notebooks used in this article are available under https://github.com/bokulich-lab/q2-fondue-examples.
Supplementary information: An introductory tutorial with background information and examples for the usage of q2-fondue can be accessed at https://github.com/bokulich-lab/q2-fondue/blob/main/tutorial/tutorial.md.
Categories: Bioinformatics Trends

CGV: Cancer Genome Viewer, a web service for integrative cancer genome and pharmacogenomic data analysis

Bioinformatics Oxford Journals - Tue, 20/09/2022 - 5:30am
Abstract
Motivation: Multiomic profiling data, such as The Cancer Genome Atlas (TCGA) and pharmacogenomic data, facilitate research into cancer mechanisms and drug development. However, it is not easy for researchers to connect, integrate, and analyze huge and heterogeneous data, which is a major obstacle to the utilization of cancer genomic data.
Results: We developed Cancer Genome Viewer (CGV), a user-friendly web service that provides functions to integrate and visualize cancer genome data and pharmacogenomic data. Users can easily select and customize the samples to be analyzed with the pre-defined selection options for patients’ clinic-pathological features from multiple data sets. Using the customized data set, users can perform subsequent data analyses comprehensively, including gene set analysis, clustering, or survival analysis. CGV also provides pre-calculated drug response scores from pharmacogenomic data, which may facilitate the discovery of new cancer targets and therapeutics.
Availability and implementation: CGV web service is implemented with the R Shiny application at http://cgv.sysmed.kr and the source code is freely available at https://git.sysmed.kr/sysmed_public/cgv.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

EagleImp: Fast and Accurate Genome-wide Phasing and Imputation in a Single Tool

Bioinformatics Oxford Journals - Tue, 20/09/2022 - 5:30am
Abstract
Background: Reference-based phasing and genotype imputation algorithms have been developed with sublinear theoretical runtime behaviour, but runtimes are still high in practice when large genome-wide reference datasets are used.
Methods: We developed EagleImp, a software based on the methods used in the existing tools Eagle2 and PBWT, which allows accurate and accelerated phasing and imputation in a single tool by algorithmic and technical improvements and new features.
Results: We compared accuracy and runtime of EagleImp with Eagle2, PBWT and prominent imputation servers using whole-genome sequencing data from the 1000 Genomes Project, the Haplotype Reference Consortium and simulated data with 1 million reference genomes. EagleImp was 2 to 30 times faster (depending on the single or multiprocessor configuration selected and the size of the reference panel) than Eagle2 combined with PBWT, with the same or better phasing and imputation quality in all tested scenarios. For common variants investigated in typical GWAS studies, EagleImp provided the same or higher imputation accuracy than the Sanger Imputation Service, Michigan Imputation Server and the newly developed TOPMed Imputation Server, despite their larger (not publicly available) reference panels. Additional features include automated chromosome splitting and memory management at runtime to avoid job aborts, fast reading and writing of large files, and various user-configurable algorithm and output options.
Conclusions: Due to the technical optimisations, EagleImp can perform fast and accurate reference-based phasing and imputation and is ready for future large reference panels in the order of one million genomes.
Availability: EagleImp is freely available for download from https://github.com/ikmb/eagleimp.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

MPVNN: Mutated Pathway Visible Neural Network architecture for interpretable prediction of cancer-specific survival risk

Bioinformatics Oxford Journals - Mon, 19/09/2022 - 5:30am
Abstract
Motivation: Survival risk prediction using gene expression data is important in making treatment decisions in cancer. Standard neural network (NN) survival analysis models are black boxes that lack interpretability. More interpretable visible neural network (VNN) architectures are designed using biological pathway knowledge, but they do not model how pathway structures can change for particular cancer types.
Results: We propose a novel Mutated Pathway VNN or MPVNN architecture, designed using prior signaling pathway knowledge and random replacement of known pathway edges using gene mutation data simulating signal flow disruption. As a case study, we use the PI3K-Akt pathway and demonstrate overall improved cancer-specific survival risk prediction of MPVNN over other similar sized NN and standard survival analysis methods. We show that trained MPVNN architecture interpretation, which points to smaller sets of genes connected by signal flow within the PI3K-Akt pathway that are important in risk prediction for particular cancer types, is reliable.
Availability: The data and code are available at https://github.com/gourabghoshroy/MPVNN
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

HMI-PRED 2.0: a biologist-oriented web application for prediction of host-microbe protein-protein interaction by interface mimicry

Bioinformatics Oxford Journals - Mon, 19/09/2022 - 5:30am
Abstract
Summary: HMI-PRED 2.0 is a publicly available web service for prediction of host-microbe protein-protein interaction by interface mimicry that is intended to be used without extensive computational experience. A microbial protein structure is screened against a database covering the entire available structural space of complexes of known human proteins.
Availability and implementation: HMI-PRED 2.0 provides user-friendly graphic interfaces for predicting, visualizing, and analyzing host-microbe interactions. HMI-PRED 2.0 is available at https://hmipred.org/.
Categories: Bioinformatics Trends

GraphLoc: a graph neural network model for predicting protein subcellular localization from immunohistochemistry images

Bioinformatics Oxford Journals - Fri, 16/09/2022 - 5:30am
Abstract
Motivation: Recognition of protein subcellular distribution patterns and identification of location biomarker proteins in cancer tissues are important for understanding protein functions and related diseases. Immunohistochemical (IHC) images enable visualizing the distribution of proteins at the tissue level, providing an important resource for protein localization studies. In the past decades, several image-based protein subcellular location prediction methods have been developed, but the prediction accuracies still have much room to improve due to the complexity of protein patterns resulting from multi-label proteins and variation of location patterns across cell types or states.
Results: Here, we propose a multi-label multi-instance model based on deep graph convolutional neural networks, GraphLoc, to recognize protein subcellular location patterns. GraphLoc builds a graph of multiple IHC images for one protein, learns protein-level representations by graph convolutions, and predicts multi-label information by a dynamic threshold method. Our results show that GraphLoc is a promising model for image-based protein subcellular location prediction with model interpretability. Furthermore, we apply GraphLoc to the identification of candidate location biomarkers and potential members for protein networks. A large portion of the predicted results have supporting evidence from the existing literature, and the new candidates also provide guidance for further experimental screening.
Availability: The dataset and code are available at: www.csbio.sjtu.edu.cn/bioinf/GraphLoc.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

SBbadger: Biochemical Reaction Networks with Definable Degree Distributions

Bioinformatics Oxford Journals - Fri, 16/09/2022 - 5:30am
Abstract
Motivation: An essential step in developing computational tools for the inference, optimization, and simulation of biochemical reaction networks is gauging tool performance against earlier efforts using an appropriate set of benchmarks. General strategies for the assembly of benchmark models include collection from the literature, creation via subnetwork extraction and de novo generation. However, with respect to biochemical reaction networks, these approaches and their associated tools are either poorly suited to generate models that reflect the wide range of properties found in natural biochemical networks or to do so in numbers that enable rigorous statistical analysis.
Results: In this work we present SBbadger, a Python-based software tool for the generation of synthetic biochemical reaction or metabolic networks with user-defined degree distributions, multiple available kinetic formalisms, and a host of other definable properties. SBbadger thus enables the creation of benchmark model sets that reflect properties of biological systems and generate the kinetics and model structures typically targeted by computational analysis and inference software. Here we detail the computational and algorithmic workflow of SBbadger, demonstrate its performance under various settings, provide sample outputs, and compare it to currently available biochemical reaction network generation software.
Availability and implementation: SBbadger is implemented in Python and is freely available at https://github.com/sys-bio/SBbadger and via PyPI at https://pypi.org/project/SBbadger/. Documentation can be found at https://SBbadger.readthedocs.io.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

Drug-Protein Interaction Prediction by Correcting the Effect of Incomplete Information in Heterogeneous Information

Bioinformatics Oxford Journals - Fri, 16/09/2022 - 5:30am
Abstract
Motivation: Large-scale heterogeneous data provides diverse perspectives for predicting drug-protein interactions (DPIs). However, the available information on molecular interactions and clinical associations related to drugs or proteins is incomplete because there may be unproven interactions and associations. This incomplete information in the available data is presented in the form of non-interaction and non-correlation, which may mislead the prediction model. Existing methods fuse incomplete and complete information without considering their integrity, so the negative effects of incomplete information still exist.
Results: We develop a network-based DPI prediction method named BRWCP, which uses the complete information network to correct the prediction results acquired by the incomplete information network. By integrating relevant heterogeneous information that may be incomplete, the feature similarities of drugs and proteins are obtained. Combining the feature similarities and known DPIs, an incomplete information-based drug-protein heterogeneous network is constructed. Then a bidirectional random walk with pruning algorithm is adopted in this heterogeneous network to predict potential DPIs. Next, the predicted DPIs are combined with the chemical fingerprint similarity of drugs and amino acid sequence similarity of proteins to construct the complete information network. The bidirectional random walk with pruning algorithm is applied in the new network to obtain the final prediction results until it converges. Experimental results show that BRWCP is superior to several state-of-the-art DPI prediction methods, and case studies further confirm its ability to tap potential drug-protein interactions.
Availability: The code of BRWCP is available at https://github.com/lyfdomain/BRWCP.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

StrainDesign: a Comprehensive Python Package for Computational Design of Metabolic Networks

Bioinformatics Oxford Journals - Fri, 16/09/2022 - 5:30am
Abstract
Summary: Various constraint-based optimization approaches have been developed for the computational analysis and design of metabolic networks. Herein we present StrainDesign, a comprehensive Python package that builds upon the COBRApy toolbox and integrates the most popular metabolic design algorithms, including nested strain optimization methods such as OptKnock, RobustKnock and OptCouple as well as the more general minimal cut sets approach. The optimization approaches are embedded in individual modules, which can also be combined for setting up more elaborate strain design problems. Advanced features, such as the efficient integration of GPR rules and the possibility to consider gene and reaction additions or regulatory interventions, have been generalized and are available for all modules. The package uses state-of-the-art preprocessing methods, supports multiple solvers and provides a number of enhanced tools for analyzing computed intervention strategies including 2D and 3D plots of user-selected metabolic fluxes or yields. Furthermore, a user-friendly interface for the StrainDesign package has been implemented in the GUI-based metabolic modeling software CNApy. StrainDesign thus provides a unique and rich framework for computational strain design in Python, uniting many algorithmic developments in the field and allowing modular extension in the future.
Availability and implementation: The StrainDesign package can be retrieved from PyPI, Anaconda and GitHub (https://github.com/klamt-lab/straindesign) and is also part of the latest CNApy package.
Categories: Bioinformatics Trends

CoDNaS-Q: a database of conformational diversity of the native state of proteins with quaternary structure

Bioinformatics Oxford Journals - Fri, 16/09/2022 - 5:30am
Abstract
Summary: A collection of conformers that exist in a dynamical equilibrium defines the native state of a protein. The structural differences between them describe their conformational diversity, a defining characteristic of the protein with an essential role in multiple cellular processes. Since most proteins carry out their functions by assembling into complexes, we have developed CoDNaS-Q, the first online resource to explore conformational diversity in homooligomeric proteins. It features a curated collection of redundant protein structures with known quaternary structure. CoDNaS-Q integrates relevant annotations that allow researchers to identify and explore the extent and possible reasons of conformational diversity in homooligomeric protein complexes.
Availability and implementation: CoDNaS-Q is freely accessible at http://ufq.unq.edu.ar/codnasq/ or https://codnas-q.bioinformatica.org/home. The data can be retrieved from the website. The source code of the database can be downloaded from https://github.com/SfrRonaldo/codnas-q.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

The sequence context in poly-alanine regions: structure, function and conservation

Bioinformatics Oxford Journals - Thu, 15/09/2022 - 5:30am
Abstract
Motivation: Poly-alanine (polyA) regions are protein stretches mostly composed of alanines. Despite their abundance in eukaryotic proteomes and their association with nine inherited human diseases, the structural and functional roles exerted by polyA stretches remain poorly understood. In this work we study how the amino acid context in which polyA regions are settled in proteins influences their structure and function.
Results: We identified glycine and proline as the most abundant amino acids within polyA and in the flanking regions of polyA tracts, in human proteins as well as in 17 additional eukaryotic species. Our analyses indicate that the non-structuring nature of these two amino acids influences the α-helical conformations predicted for polyA, suggesting a relevant role in reducing the inherent aggregation propensity of long polyA. Then, we show how polyA position in protein N-termini relates with their function as transit peptides. PolyA placed just after the initial methionine is often predicted as part of mitochondrial transit peptides, whereas when placed in downstream positions, polyA are part of signal peptides. A few examples from known structures suggest that short polyA can emerge by alanine substitutions in α-helices; but evolution by insertion is observed for longer polyA. Our results showcase the importance of studying the sequence context of homorepeats as a mechanism to shape their structure–function relationships.
Contact: munoz@uni-mainz.de or pau.bernado@cbs.cnrs.fr
Availability and implementation: The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

Multi-Omic Integration by Machine Learning (MIMaL)

Bioinformatics Oxford Journals - Thu, 15/09/2022 - 5:30am
Abstract
Motivation: Cells respond to environments by regulating gene expression to exploit resources optimally. Recent advances in technologies allow measuring the abundances of transcripts, proteins, lipids and metabolites. These highly complex datasets reflect the state of the different layers in a biological system. Multi-omics is the integration of these disparate methods and data to gain a clearer picture of the biological state. Multi-omic studies of the proteome and metabolome are becoming more common as mass spectrometry technology continues to be democratized. However, knowledge extraction through integration of these data remains challenging.
Results: Connections between molecules in different omic layers were discovered through a combination of machine learning and model interpretation. Discovered connections reflected protein control over metabolites. Proteins discovered to control citrate were mapped onto known genetic and metabolic networks, revealing that these protein regulators are novel. Further, clustering the magnitudes of protein control over all metabolites enabled prediction of five gene functions, each of which was validated experimentally. Two uncharacterized genes, YJR120W and YLD157C, were accurately predicted to modulate mitochondrial translation. Functions for three incompletely characterized genes were also predicted and validated, including SDH9, ISC1, and FMP52. A website enables results exploration and also MIMaL analysis of user-supplied multi-omic data.
Availability: The website for MIMaL is at https://mimal.app. Code for the website is at https://github.com/qdickinson/mimal-website. Code to implement MIMaL is at https://github.com/jessegmeyerlab/MIMaL.
Supplementary information: Supplementary figures are available at Bioinformatics online. Supporting data are available at https://doi.org/10.5281/zenodo.6537297. MS data are available under the identifier MSV000090100 at https://massive.ucsd.edu/ProteoSAFe/dataset.jsp?task=ba70b1440b2b4c488323fa6644b332cb
Categories: Bioinformatics Trends

SQuAPP – Simple Quantitative Analysis of Proteins & PTMs

Bioinformatics Oxford Journals - Wed, 14/09/2022 - 5:30am
Abstract
Summary: Comprehensive analysis of the proteome and its modulation by post-translational modification is increasingly used in biological and biomedical studies. As a result, proteomics data analysis is increasingly carried out by scientists with limited expertise in this type of data. While excellent software solutions for comprehensive and rigorous analysis of quantitative proteomic data exist, most are complex and not well suited for non-proteomics scientists. Integrative analysis of multi-level proteomics data on protein and diverse post-translational modifications (PTMs), like phosphorylation or proteolytic processing, remains particularly challenging and inaccessible to most biologists. To fill this void, we developed SQuAPP, an R-Shiny web-based analysis pipeline for the quantitative analysis of proteomic data. SQuAPP uses a streamlined workflow model to guide expert and novice users through quality control, data pre-processing, statistical analysis and visualization steps. Processing the protein, peptide, and post-translational modification datasets in parallel and their quantitative integration enable rapid identification of protein-level-independent modulation of protein modifications and intuitive interpretation of dynamic dependencies between different protein modifications.
Availability: SQuAPP is available at http://squapp.langelab.org/. The source code and local setup instructions can be accessed from https://github.com/LangeLab/SQuAPP.
Categories: Bioinformatics Trends

AEON.py: Python Library for Attractor Analysis in Asynchronous Boolean Networks

Bioinformatics Oxford Journals - Wed, 14/09/2022 - 5:30am
Abstract
Summary: AEON.py is a Python library for the analysis of the long-term behaviour in very large asynchronous Boolean networks. It provides significant computational improvements over the state-of-the-art methods for attractor detection. Furthermore, it admits the analysis of partially specified Boolean networks with uncertain update functions. It also includes techniques for identifying viable source-target control strategies and the assessment of their robustness with respect to parameter perturbations.
Availability and implementation: All relevant results are available in supplementary materials. The tool is accessible through https://github.com/sybila/biodivine-aeon-py.
Supplementary information: Supplementary data are available online through Bioinformatics.
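To make the notion of an attractor in an asynchronous Boolean network concrete, here is a minimal brute-force sketch. It does not use AEON.py or reflect its API or algorithms; the function names and the two toy networks are assumptions made purely for illustration, and explicit enumeration like this only works for a handful of variables.

```python
from itertools import product


def async_successors(state, update_fns):
    """States reachable in one asynchronous step (exactly one variable changes)."""
    succ = set()
    for i, fn in enumerate(update_fns):
        new_val = fn(state)
        if new_val != state[i]:
            succ.add(state[:i] + (new_val,) + state[i + 1:])
    return succ


def reachable(start, update_fns):
    """All states reachable from `start`, including `start` itself."""
    seen = {start}
    stack = [start]
    while stack:
        s = stack.pop()
        for t in async_successors(s, update_fns):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def attractors(update_fns):
    """Attractors as terminal strongly connected components of the state graph."""
    n = len(update_fns)
    reach = {s: reachable(s, update_fns)
             for s in product((False, True), repeat=n)}
    found = set()
    for s, rs in reach.items():
        # s lies in an attractor iff every state reachable from s can return to s.
        if all(s in reach[t] for t in rs):
            found.add(frozenset(rs))
    return found


# Hypothetical toy network 1: mutual activation (x0' = x1, x1' = x0).
mutual_activation = [lambda s: s[1], lambda s: s[0]]
# Hypothetical toy network 2: negative feedback loop (x0' = not x1, x1' = x0).
negative_feedback = [lambda s: not s[1], lambda s: s[0]]

fixed_points = attractors(mutual_activation)
oscillation = attractors(negative_feedback)
```

In the first toy network the two attractors are the fixed points (False, False) and (True, True); in the second, all four states form a single cyclic attractor. Because the explicit state graph grows as 2^n, tools aimed at very large networks need more scalable techniques than this enumeration.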
Categories: Bioinformatics Trends

ScanExitronLR: characterization and quantification of exitron splicing events in long-read RNA-seq data

Bioinformatics Oxford Journals - Tue, 13/09/2022 - 5:30am
Abstract
Summary: Exitron splicing is a type of alternative splicing where coding sequences are spliced out. Recently, exitron splicing has been shown to increase proteome plasticity and play a role in cancer. Long-read RNA-seq is well suited for quantification and discovery of alternative splicing events; however, there are currently no tools available for detection and annotation of exitrons in long-read RNA-seq data. Here we present ScanExitronLR, an application for the characterization and quantification of exitron splicing events in long-reads. From a BAM alignment file, reference genome and reference gene annotation, ScanExitronLR outputs exitron events at the individual transcript level. Outputs of ScanExitronLR can be used in downstream analyses of differential exitron splicing. In addition, ScanExitronLR optionally reports exitron annotations such as truncation or frameshift type, nonsense-mediated decay status, and Pfam domain interruptions. We demonstrate that ScanExitronLR performs better on noisy long-reads than currently published exitron detection algorithms designed for short-read data.
Availability: ScanExitronLR is freely available at https://github.com/ylab-hi/ScanExitronLR and distributed as a pip package on the Python Package Index.
Supplementary information: Supplementary data are available at Bioinformatics online.
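The basic idea of spotting an exitron candidate in an alignment can be sketched in a few lines. This is not ScanExitronLR's code or interface; it is a simplified illustration that assumes an exitron candidate is any spliced gap (CIGAR `N` operation) lying strictly inside a single annotated exon, and the coordinates, function names and example read are all hypothetical. Filtering, noise handling and annotation, which the abstract describes, are not attempted here.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM format).
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def splice_junctions(ref_start, cigar):
    """Reference intervals (0-based, half-open) skipped by N operations."""
    pos = ref_start
    junctions = []
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op == "N":
            junctions.append((pos, pos + length))
        if op in "MDN=X":  # operations that consume the reference
            pos += length
    return junctions


def candidate_exitrons(ref_start, cigar, exons):
    """Junctions that fall strictly inside one annotated exon."""
    hits = []
    for j_start, j_end in splice_junctions(ref_start, cigar):
        for e_start, e_end in exons:
            if e_start < j_start and j_end < e_end:
                hits.append((j_start, j_end))
                break
    return hits


# Hypothetical annotation: two exons separated by an intron at 400-600.
exons = [(100, 400), (600, 800)]
# Hypothetical read: one gap inside exon 1, then the ordinary intron.
read_start, read_cigar = 120, "80M90N110M200N50M"

junctions = splice_junctions(read_start, read_cigar)
exitrons = candidate_exitrons(read_start, read_cigar, exons)
```

Here the read yields two junctions, (200, 290) and (400, 600); only the first lies within an exon, so it is the sole exitron candidate, while the second is the ordinary intron between the two exons.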
Categories: Bioinformatics Trends

MGPLI: Exploring Multigranular Representations for Protein-Ligand Interaction Prediction

Bioinformatics Oxford Journals - Mon, 12/09/2022 - 5:30am
Abstract
Motivation: The capability to predict the potential drug binding affinity against a protein target has always been a fundamental challenge in in silico drug discovery. Traditional in vitro and in vivo experiments are costly and time-consuming because they need to search over a large compound space. Recent years have witnessed significant success of deep learning-based models on the drug-target binding affinity (DTA) prediction task.
Results: Following the recent success of the Transformer model, we propose a multi-granularity protein ligand interaction (MGPLI) model, which adopts the Transformer encoders to represent the character-level features and fragment-level features, modeling the possible interaction between residues and atoms or their segments. In addition, we use the Convolutional Neural Network (CNN) to extract higher-level features based on transformer encoder outputs and a highway layer to fuse the protein and drug features. We evaluate MGPLI on different protein ligand interaction datasets and show the improvement of prediction performance compared to state-of-the-art baselines.
Availability: The model scripts are available at https://github.com/IILab-Resource/MGDTA.git
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

Systematic comparison of ranking aggregation methods for gene lists in experimental results

Bioinformatics Oxford Journals - Mon, 12/09/2022 - 5:30am
Abstract
Motivation: A common experimental output in biomedical science is a list of genes implicated in a given biological process or disease. The gene lists resulting from a group of studies answering the same, or similar, questions can be combined by ranking aggregation methods to find a consensus or a more reliable answer. Evaluating a ranking aggregation method on a specific type of data before using it is required to support the reliability since the property of a dataset can influence the performance of an algorithm. Such evaluation on gene lists is usually based on a simulated database because of the lack of a known truth for real data. However, simulated datasets tend to be too small compared to experimental data and neglect key features, including heterogeneity of quality, relevance and the inclusion of unranked lists.
Results: In this study, a group of existing methods and their variations which are suitable for meta-analysis of gene lists are compared using simulated and real data. Simulated data was used to explore the performance of the aggregation methods as a function of emulating the common scenarios of real genomic data, with various heterogeneity of quality, noise level, and a mix of unranked and ranked data using 20000 possible entities. In addition to the evaluation with simulated data, a comparison using real genomic data on the SARS-CoV-2 virus, cancer (NSCLC), and bacteria (macrophage apoptosis) was performed. We summarise the results of our evaluation in a simple flowchart to select a ranking aggregation method, and in an automated implementation using the meta-analysis by information content (MAIC) algorithm to infer heterogeneity of data quality across input data sets.
Availability: The code for simulated data generation and for running the edited versions of the algorithms: https://github.com/baillielab/comparison_of_RA_methods. Code to perform an optimal selection of methods based on the results of this review, using the MAIC algorithm to infer the characteristics of an input dataset, can be downloaded here: https://github.com/baillielab/maic. An online service for running MAIC: https://baillielab.net/maic.
Supplementary information: Supplementary file 1 is available at Bioinformatics online. Supporting data 1-7 (supporting results and collected real genomic data) are available on GitHub at: https://github.com/baillielab/comparison_of_RA_methods.
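For readers unfamiliar with ranking aggregation, the following minimal sketch combines several ranked gene lists with a Borda count. Borda count is chosen here only because it is a classic, easy-to-follow method; the abstract does not state that it is among the methods compared, and it is not the MAIC algorithm. The gene names and lists are made up for illustration.

```python
from collections import defaultdict


def borda_aggregate(ranked_lists):
    """Aggregate ranked lists into one consensus ranking by Borda count.

    A gene at 0-based position p in a list earns (universe_size - p) points
    from that list; a gene missing from a list earns nothing from it.
    Ties are broken alphabetically so the output is deterministic.
    """
    universe = {gene for lst in ranked_lists for gene in lst}
    n = len(universe)
    scores = defaultdict(int)
    for lst in ranked_lists:
        for pos, gene in enumerate(lst):
            scores[gene] += n - pos
    return sorted(universe, key=lambda g: (-scores[g], g))


# Hypothetical ranked outputs of three studies (best hit first).
studies = [
    ["geneA", "geneB", "geneC"],
    ["geneB", "geneA", "geneD"],
    ["geneA", "geneD"],
]

consensus = borda_aggregate(studies)
```

With these toy lists the consensus order is geneA, geneB, geneD, geneC. Note that this simple scheme treats every input list as equally reliable and cannot use unranked lists, which are exactly the kinds of heterogeneity the study highlights.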
Categories: Bioinformatics Trends

MODIG: Integrating Multi-Omics and Multi-Dimensional Gene Network for Cancer Driver Gene Identification based on Graph Attention Network Model

Bioinformatics Oxford Journals - Mon, 12/09/2022 - 5:30am
Abstract
Motivation: Identifying genes that play a causal role in cancer evolution remains one of the biggest challenges in cancer biology. With the accumulation of high-throughput multi-omics data over decades, it becomes a great challenge to effectively integrate these data into the identification of cancer driver genes.
Results: Here, we propose MODIG, a graph attention network (GAT)-based framework to identify cancer driver genes by combining multi-omics pan-cancer data (mutations, copy number variants, gene expression, and methylation levels) with multi-dimensional gene networks. First, we established diverse types of gene relationship maps based on protein-protein interactions (PPI), gene sequence similarity, KEGG pathway co-occurrence, gene co-expression patterns, and Gene Ontology (GO). Then, we constructed a multi-dimensional gene network consisting of approximately 20,000 genes as nodes and five types of gene associations as multiplex edges. We applied a GAT to model within-dimension interactions to generate a gene representation for each dimension based on this graph. Moreover, we introduced a joint learning module to fuse multiple dimension-specific representations to generate general gene representations. Finally, we used the obtained gene representation to perform a semi-supervised driver gene identification task. The experiment results show that MODIG outperforms the baseline models in terms of area under precision-recall curves (AUPR) and area under the receiver operating characteristic curves (AUROC).
Availability: The MODIG program is available at https://github.com/zjupgx/modig. The code and data underlying this article are also available on Zenodo, at https://doi.org/10.5281/zenodo.7057241.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

CANTATA - prediction of missing links in Boolean networks using genetic programming

Bioinformatics Oxford Journals - Mon, 12/09/2022 - 5:30am
Abstract
Motivation: Biological processes are complex systems with distinct behaviour. Despite the growing amount of available data, knowledge is sparse and often insufficient to investigate the complex regulatory behaviour of these systems. Moreover, different cellular phenotypes are possible under varying conditions. Mathematical models attempt to unravel these mechanisms by investigating the dynamics of regulatory networks. Therefore, a major challenge is to combine regulations and phenotypical information as well as the underlying mechanisms. To predict regulatory links in these models, we established an approach called CANTATA to support the integration of information into regulatory networks and retrieve potential underlying regulations. This is achieved by optimising both static and dynamic properties of these networks.
Results: Initial results show that the algorithm predicts missing interactions by recapitulating the known phenotypes while preserving the original topology and optimising the robustness of the model. The resulting models allow for hypothesising about the biological impact of certain regulatory dependencies.
Availability: Source code of the application, example files and results are available at https://github.com/sysbio-bioinf/Cantata.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

ASTRAL-Pro 2: ultrafast species tree reconstruction from multi-copy gene family trees

Bioinformatics Oxford Journals - Mon, 12/09/2022 - 5:30am
Abstract
Motivation: Species tree inference from multi-copy gene trees has long been a challenge in phylogenomics. The recent method ASTRAL-Pro has made large strides by enabling multi-copy gene family trees as input and has been quickly adopted. Yet, its scalability, especially memory usage, needs to improve to accommodate the ever-growing dataset size.
Results: We present ASTRAL-Pro 2, an ultrafast and memory efficient version of ASTRAL-Pro that adopts a placement-based optimization algorithm for significantly better scalability without sacrificing accuracy.
Availability: The source code and binary files are publicly available at https://github.com/chaoszhang/ASTER; data are available at https://github.com/chaoszhang/A-Pro2_data.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
