PHi-C2: interpreting Hi-C data as the dynamic 3D genome state

Bioinformatics Oxford Journals - Sat, 10/09/2022 - 5:30am
Abstract
Summary: High-throughput chromosome conformation capture (Hi-C) is a widely used assay for studying the three-dimensional (3D) genome organization across the whole genome. Here, we present PHi-C2, a Python package supported by mathematical and biophysical polymer modeling that converts input Hi-C matrix data into the polymer model’s dynamics, structural conformations, and rheological features. The updated optimization algorithm for regenerating a highly similar Hi-C matrix provides a fast and accurate optimal solution compared to the previous version by eliminating the factors underlying the inefficiency of the optimization algorithm in the iterative optimization process. In addition, we have enabled a Google Colab workflow to run the algorithm, wherein users can easily change the parameters and check the results in the notebook. Overall, PHi-C2 represents a valuable tool for mining the dynamic 3D genome state embedded in Hi-C data.
Availability and implementation: PHi-C2 as the phic Python package is freely available under the GPL license and can be installed from the Python package index. The source code is available from GitHub at https://github.com/soyashinkai/PHi-C2. Moreover, users do not have to prepare a Python environment because PHi-C2 can run on Google Colab (https://bit.ly/3rlptGI).
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

A machine learning-based method for automatically identifying novel cells in annotating single cell RNA-seq data

Bioinformatics Oxford Journals - Fri, 09/09/2022 - 5:30am
Abstract
Motivation: Single cell RNA sequencing (scRNA-seq) has been widely used to decompose complex tissues into functionally distinct cell types. The first and usually the most important step of scRNA-seq data analysis is to accurately annotate the cell labels. In recent years, many supervised annotation methods have been developed and shown to be more convenient and accurate than unsupervised cell clustering. One challenge faced by all the supervised annotation methods is the identification of novel cell types, defined as cell types that are absent from the training data and exist only in the testing data. Existing methods usually label the cells simply based on the correlation coefficients or confidence scores, which sometimes results in an excessive number of unlabeled cells.
Results: We developed a straightforward yet effective method combining an autoencoder with iterative feature selection to automatically identify novel cells from scRNA-seq data. Our method trains an autoencoder with the labeled training data and applies the autoencoder to the testing data to obtain reconstruction errors. By iteratively selecting features that demonstrate a bi-modal pattern and reclustering the cells using the selected features, our method can accurately identify novel cells that are not present in the training data. We further combined this approach with a support vector machine to provide a complete solution for annotating the full range of cell types. Extensive numerical experiments using five real scRNA-seq datasets demonstrated favorable performance of the proposed method over existing methods serving similar purposes.
Availability: Our R software package CAMLU is publicly available through the Zenodo repository (https://doi.org/10.5281/zenodo.7054422) or GitHub repository (https://github.com/ziyili20/CAMLU).
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

dICC: Distance-based Intraclass Correlation Coefficient for Metagenomic Reproducibility Studies

Bioinformatics Oxford Journals - Fri, 09/09/2022 - 5:30am
Abstract
Summary: Due to the sparsity and high-dimensionality, microbiome data are routinely summarized into pairwise distances capturing the compositional differences. Many biological insights can be gained by analyzing the distance matrix in relation to some covariates. A microbiome sampling method that characterizes the inter-sample relationship more reproducibly is expected to yield higher statistical power. Traditionally, the intra-class correlation coefficient (ICC) has been used to quantify the degree of reproducibility for a univariate measurement using technical replicates. In this work, we extend the traditional ICC to distance measures and propose a distance-based ICC (dICC). We derive the asymptotic distribution of the sample-based dICC to facilitate statistical inference. We illustrate dICC using a real dataset from a metagenomic reproducibility study.
Availability: dICC is implemented in the R CRAN package GUniFrac.
Supplementary information: Supplementary data are available at Bioinformatics online.
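A minimal sketch of how a distance-based reproducibility index of this kind could be computed from a distance matrix and replicate labels. The abstract does not state the formula, so the form used here is an assumption: 1 minus the ratio of the mean squared within-replicate distance to the mean squared between-sample distance. For a univariate measurement with Euclidean distance under a one-way random-effects model, this ratio reduces in expectation to the classical ICC. The function name `distance_icc` and the toy data are hypothetical; the GUniFrac package remains the reference implementation.

```python
from itertools import combinations


def distance_icc(dist, groups):
    """Assumed index: 1 - mean squared within-group distance / mean squared between-group distance.

    dist   -- square, symmetric matrix of pairwise distances (list of lists)
    groups -- groups[i] is the biological sample that measurement i replicates
    """
    within, between = [], []
    for i, j in combinations(range(len(groups)), 2):
        target = within if groups[i] == groups[j] else between
        target.append(dist[i][j] ** 2)
    if not within or not between:
        raise ValueError("need at least one within-group and one between-group pair")
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    return 1.0 - mean_within / mean_between


# Toy data (made up): three samples, two technical replicates each,
# reduced to one dimension so the distance is a plain absolute difference.
values = [0.0, 0.1, 1.0, 1.2, 2.0, 1.9]
groups = ["a", "a", "b", "b", "c", "c"]
dist = [[abs(x - y) for y in values] for x in values]
toy_dicc = distance_icc(dist, groups)
print(f"toy dICC: {toy_dicc:.3f}")
```

Replicates that agree closely relative to the spread between samples push the value toward 1; replicates as far apart as unrelated samples push it toward 0.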
Categories: Bioinformatics Trends

The phers R package: using phenotype risk scores based on electronic health records to study Mendelian disease and rare genetic variants

Bioinformatics Oxford Journals - Fri, 09/09/2022 - 5:30am
Abstract
Electronic health record (EHR) data linked to DNA biobanks are a valuable resource for understanding the phenotypic effects of human genetic variation. We previously developed the phenotype risk score (PheRS) as an approach to quantify the extent to which a patient’s clinical features resemble a given Mendelian disease. Using PheRS, we have uncovered novel associations between Mendelian disease-like phenotypes and rare genetic variants, and identified patients who may have undiagnosed Mendelian disease. Although the PheRS approach is conceptually simple, it involves multiple mapping steps and was previously only available as custom scripts, limiting the approach’s usability. Thus, we developed the phers R package, a complete and user-friendly set of functions and maps for performing a PheRS-based analysis on linked clinical and genetic data. The package includes up-to-date maps between EHR-based phenotypes (i.e., ICD codes and phecodes), human phenotype ontology (HPO) terms, and Mendelian diseases. Starting with occurrences of ICD codes, the package enables the user to calculate phenotype risk scores, validate the scores using case-control analyses, and perform genetic association analyses. By increasing PheRS’s transparency and usability, the phers R package will help improve our understanding of the relationships between rare genetic variants and clinically meaningful human phenotypes.
Availability: The phers R package is free and open-source, and available on CRAN and at https://phers.hugheylab.org.
Supplementary information: Supplementary data are available at Bioinformatics online.
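The mapping steps described in this abstract (ICD code occurrences, then phecodes, then a disease-specific score) can be sketched roughly as below. This is an illustration, not the phers API. The maps, the placeholder codes, and all function names are hypothetical. The weighting (negative log10 of each phecode's prevalence in the cohort, summed over the disease-relevant phecodes a patient has) is an assumption, since the abstract does not give the scoring formula.

```python
import math

# Hypothetical toy maps with placeholder codes; a real analysis would use
# the curated maps shipped with the phers package.
ICD_TO_PHECODE = {"ICD_A": "P1", "ICD_B": "P1", "ICD_C": "P2", "ICD_D": "P3"}
DISEASE_TO_PHECODES = {"toy_disease": {"P1", "P2"}}


def patient_phecodes(icd_occurrences):
    """Map each patient's ICD codes to the set of phecodes they imply."""
    return {
        pid: {ICD_TO_PHECODE[code] for code in codes if code in ICD_TO_PHECODE}
        for pid, codes in icd_occurrences.items()
    }


def phecode_weights(phecodes_by_patient):
    """Assumed weighting: rarer phecodes count more (-log10 of cohort prevalence)."""
    n_patients = len(phecodes_by_patient)
    counts = {}
    for codes in phecodes_by_patient.values():
        for code in codes:
            counts[code] = counts.get(code, 0) + 1
    return {code: -math.log10(c / n_patients) for code, c in counts.items()}


def risk_scores(phecodes_by_patient, weights, disease):
    """Sum the weights of the disease-relevant phecodes each patient has."""
    relevant = DISEASE_TO_PHECODES[disease]
    return {
        pid: sum(weights[code] for code in codes & relevant)
        for pid, codes in phecodes_by_patient.items()
    }


# Toy cohort (made up): four patients with their recorded ICD codes.
occurrences = {"p1": ["ICD_A", "ICD_C"], "p2": ["ICD_B"], "p3": ["ICD_D"], "p4": []}
phecodes = patient_phecodes(occurrences)
weights = phecode_weights(phecodes)
scores = risk_scores(phecodes, weights, "toy_disease")
for pid in sorted(scores):
    print(pid, round(scores[pid], 3))
```

In this toy cohort the patient carrying both disease-relevant phecodes scores highest, and patients with none score zero.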
Categories: Bioinformatics Trends

dsMTL - a computational framework for privacy-preserving, distributed multi-task machine learning

Bioinformatics Oxford Journals - Thu, 08/09/2022 - 5:30am
Abstract
Motivation: In multi-cohort machine learning studies, it is critical to differentiate between effects that are reproducible across cohorts and those that are cohort-specific. Multi-task learning (MTL) is a machine learning approach that facilitates this differentiation through the simultaneous learning of prediction tasks across cohorts. Since multi-cohort data can often not be combined into a single storage solution, an MTL application for geographically distributed data sources would be of substantial utility.
Results: Here, we describe the development of “dsMTL”, a computational framework for privacy-preserving, distributed multi-task machine learning that includes three supervised algorithms and one unsupervised algorithm. First, we derive the theoretical properties of these methods and the relevant machine learning workflows to ensure the validity of the software implementation. Second, we implement dsMTL as a library for the R programming language, building on the DataSHIELD platform that supports the federated analysis of sensitive individual-level data. Third, we demonstrate the applicability of dsMTL for comorbidity modeling in distributed data. We show that comorbidity modeling using dsMTL outperformed conventional, federated machine learning, as well as the aggregation of multiple models built on the distributed datasets individually. The application of dsMTL was computationally efficient and highly scalable when applied to moderate-size (n < 500), real expression data given the actual network latency.
Availability: dsMTL is freely available at https://github.com/transbioZI/dsMTLBase (server-side package) and https://github.com/transbioZI/dsMTLClient (client-side package).
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

VRPharmer: Bringing Virtual Reality into Pharmacophore-based Virtual Screening with Interactive Exploration and Realistic Visualization

Bioinformatics Oxford Journals - Thu, 08/09/2022 - 5:30am
Abstract
Summary: Current pharmacophore-based virtual screening (VS) software has limited interactive capabilities and less intuitive screening processes. In this study, a novel tool named VRPharmer is proposed to perform the entire VS workflow in VR environments. VRPharmer enables users to interactively perceive computation processes and immersively observe molecular structures. Besides a typical screening mode (OPT mode), VRPharmer provides a unique interactive screening mode (SCORE mode) for freely exploring the optimal binding poses. Pharmacophore models are editable to study the impact of each feature and further refine the screening results. Moreover, molecular rendering algorithms are improved for precise representations.
Availability and implementation: VRPharmer is open-source software under the MIT license. The released version is available at https://github.com/VRPharmer/VRPharmer.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

Icolos: A workflow manager for structure based post-processing of de novo generated small molecules

Bioinformatics Oxford Journals - Thu, 08/09/2022 - 5:30am
Abstract
Summary: We present Icolos, a workflow manager written in Python as a tool for automating complex structure-based workflows for drug design. Icolos can be used as a standalone tool, for example in virtual screening campaigns, or can be used in conjunction with deep learning-based molecular generation facilitated for example by REINVENT, a previously published molecular de novo design package. In this publication, we focus on the internal structure and general capabilities of Icolos, using molecular docking experiments as an illustrative example.
Availability: The source code is freely available at https://github.com/MolecularAI/Icolos under the Apache 2.0 licence. Tutorial notebooks containing minimal working examples can be found at https://github.com/MolecularAI/IcolosCommunity.
Supplementary information: A detailed description of the package, including common use cases, a full list of supported steps, and implementation details is provided in the SI.
Categories: Bioinformatics Trends

Access to ground truth at unconstrained size makes simulated data as indispensable as experimental data for bioinformatics methods development and benchmarking

Bioinformatics Oxford Journals - Thu, 08/09/2022 - 5:30am
Abstract
Motivation: The author instructions of this journal require that new methodology includes assessment on “actual biological data” as opposed to simulated data. When available in high quantity, fidelity, and generality, experimental (biological) data may ensure assessment validity. For many bioinformatics application areas, however, experimental data are not available at a sufficient scale or with sufficient access to ground truth to allow conclusive assessment.
Results: We argue that simulated and experimental data should be considered of equal importance in bioinformatics methods assessment, filling complementary roles. Simulations calibrated by experimental data offer the opportunity to assess and improve methodology early on, so that methods may reach a mature level by the time the capacity to generate large-scale experimental data has been established. Furthermore, the specification of a simulation algorithm contributes to transparency by revealing a method’s data assumptions. In summary, we argue that new bioinformatics methods should be validated on ground truth data that allow valid and reliable assessment, be it experimental or simulated (and ideally both).
Categories: Bioinformatics Trends

Correction to: Topological analysis as a tool for detection of abnormalities in protein–protein interaction data

Bioinformatics Oxford Journals - Thu, 08/09/2022 - 5:30am
This is a correction to: Alicja W Nowakowska, Malgorzata Kotulska, Topological analysis as a tool for detection of abnormalities in protein–protein interaction data, Bioinformatics, Volume 38, Issue 16, 15 August 2022, Pages 3968–3975, https://doi.org/10.1093/bioinformatics/btac440
Categories: Bioinformatics Trends

DRviaSPCN: a software package for drug repurposing in cancer via a subpathway crosstalk network

Bioinformatics Oxford Journals - Tue, 06/09/2022 - 5:30am
Abstract
Summary: Drug repurposing is an approach used to discover new indications for existing drugs. Recently, several computational approaches have been developed for drug repurposing in cancer. Nevertheless, no approaches have reported a systematic analysis of pathway crosstalk. Pathway crosstalk, which refers to the phenomenon of interaction or cooperation between pathways, is a critical aspect of tumor pathways that allows cancer cells to survive and acquire resistance to drug therapy. Here, we developed a systems biology R-based software package, DRviaSPCN, to repurpose drugs for cancer via a subpathway (SP) crosstalk network. This package provides a novel approach to prioritize cancer candidate drugs by considering drug-induced SPs and their crosstalk effects. The operation modes mainly include construction of the SP network and calculation of the centrality scores of SPs to reflect the influence of SP crosstalk, calculation of enrichment scores of drug- and disease-induced dysfunctional SPs and their weighting by the centrality scores of SPs, evaluation of the drug-disease reverse association at the weighted SP level, identification of cancer candidate drugs, and visualization of the results. These capabilities enable DRviaSPCN to find cancer candidate drugs, complementing recent tools that do not consider crosstalk among pathways/SPs. DRviaSPCN may help to facilitate drug discovery.
Availability and implementation: The package is implemented in R and available under the GPL-2 license from the CRAN website (https://CRAN.R-project.org/package=DRviaSPCN).
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

Manifold-Classification of Neuron Types from Microscopic Images

Bioinformatics Oxford Journals - Tue, 06/09/2022 - 5:30am
Abstract
Analysis of cell types is recognized as a major task in current single cell genotyping and phenotyping. In neuroscience, 3-D neuron morphologies are often reconstructed from multi-dimensional microscopic images. Recent studies indicate that neurons could form very complicated distributions in the feature space, and thus they can be explored using manifold analysis. We have developed manifold-classification toolkit (MCT) software to replace the conventional clustering analysis to discover cell subtypes from three state-of-the-art collections of single neurons’ 3-D morphologies that were reconstructed from images. We have gathered 9,208 3-D spatially registered whole mouse brain neurons from three datasets with the highest quality to date generated by the single neuron morphology community. To explore manifold distribution, our method uses minimum spanning tree based principal skeletons to approximate locally linear embeddings and to explore the morphological feature spaces, which correspond to dendritic arbors, axonal arbors, or both categories of arborization patterns of neurons. We show that manifold classification is a suitable approach for a majority of frequently referenced cell types, each of which was also discovered to contain multiple subtypes. Our results show an initial effort to employ manifold classification, rather than traditional clustering analysis, as an alternative framework for analyzing 3-D neuron morphologies reconstructed from 3-D microscopic images.
Availability: Freely available at https://github.com/Mr-strlen/Cell_Pattern_Analysis_Tool.
Categories: Bioinformatics Trends

TVAR: Assessing Tissue-specific Functional Effects of Non-coding Variants with Deep Learning

Bioinformatics Oxford Journals - Mon, 05/09/2022 - 5:30am
Abstract
Motivation: Analysis of whole-genome sequencing (WGS) for genetics is still a challenge due to the lack of accurate functional annotation of noncoding variants, especially the rare ones. As eQTLs have been extensively implicated in the genetics of human diseases, we hypothesize that rare noncoding variants discovered in WGS play a regulatory role in predisposing disease risk.
Results: With thousands of tissue- and cell-type-specific epigenomic features, we propose TVAR. This multi-label learning-based deep neural network predicts the functionality of noncoding variants in the genome based on eQTLs across 49 human tissues in the GTEx project. TVAR learns the relationships between high-dimensional epigenomics and eQTLs across tissues, taking the correlation among tissues into account to understand shared and tissue-specific eQTL effects. As a result, TVAR outputs tissue-specific annotations, with an average AUROC of 0.77 across these tissues. We evaluate TVAR’s performance on four complex diseases (coronary artery disease, breast cancer, Type 2 diabetes, and Schizophrenia), using TVAR’s tissue-specific annotations, and observe its superior performance in predicting functional variants for both common and rare variants, compared to five existing state-of-the-art tools. We further evaluate TVAR’s G-score, a scoring scheme across all tissues, on ClinVar, fine-mapped GWAS loci, and Massive Parallel Reporter Assay (MPRA) validated variants, and observe the consistently better performance of TVAR compared to other competing tools.
Availability: The TVAR source code and its scores on the ClinVar catalog, fine mapped GWAS Loci, high confidence eQTLs from GTEx dataset, and MPRA validated functional variants are available at https://github.com/haiyang1986/TVAR.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

SD2: Spatially resolved transcriptomics deconvolution through integration of dropout and spatial information

Bioinformatics Oxford Journals - Mon, 05/09/2022 - 5:30am
Abstract
Motivation: Unveiling the heterogeneity in tissues is crucial to explore cell-cell interactions and cellular targets of human diseases. Spatial transcriptomics (ST) supplies spatial gene expression profiles, which have revolutionized our biological understanding, but variations in cell type proportions of each spot, which can contain dozens of cells, would confound downstream analysis. Therefore, deconvolution of ST has been an indispensable step and a technical challenge towards a higher-resolution panorama of tissues.
Results: Here, we propose a novel ST deconvolution method called SD2 that integrates spatial information of ST data and embraces an important characteristic, dropout, which is traditionally considered an obstruction in single-cell RNA sequencing (scRNA-seq) data analysis. First, we extract the dropout-based genes as informative features from ST and scRNA-seq data by fitting a Michaelis-Menten function. After synthesizing pseudo-ST spots by randomly composing cells from scRNA-seq data, an auto-encoder is applied to discover a low-dimensional and non-linear representation of the real- and pseudo-ST spots. Next, we create a graph containing embedded profiles as nodes, and edges determined by transcriptional similarity and spatial relationship. Given the graph, a graph convolutional neural network is used to predict the cell-type compositions for real-ST spots. We benchmark the performance of SD2 on the simulated seqFISH+ dataset with different resolutions and measurements, which show superior performance compared with the state-of-the-art methods. SD2 is further validated on three real-world datasets with different ST technologies, and demonstrates the capability to localize cell-type composition accurately with quantitative evidence. Finally, an ablation study is conducted to verify the contribution of the different modules proposed in SD2.
Availability: SD2 is freely available on GitHub (https://github.com/leihouyeung/SD2) and Zenodo (https://doi.org/10.5281/zenodo.7024684).
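The first step of this abstract, selecting dropout-based genes by fitting a Michaelis-Menten function, can be sketched as follows. The abstract does not give the parameterisation or the fitting procedure, so this assumes the common form in which a gene's expected dropout rate is 1 - S / (K + S) for mean expression S. K is fitted by a simple log-spaced grid search, which stands in for whatever optimiser the authors use. Genes are then ranked by how far their observed dropout rate lies above the fitted curve. All function names and the toy values are hypothetical.

```python
def mm_dropout(mean_expr, k):
    """Expected dropout rate under a Michaelis-Menten curve: 1 - S / (K + S)."""
    return 1.0 - mean_expr / (k + mean_expr)


def fit_k(means, dropouts, lo=1e-3, hi=1e3, steps=2000):
    """Least-squares fit of K over a log-spaced grid (a stand-in optimiser)."""
    ratio = (hi / lo) ** (1.0 / (steps - 1))
    best_k, best_err = lo, float("inf")
    k = lo
    for _ in range(steps):
        err = sum((mm_dropout(m, k) - d) ** 2 for m, d in zip(means, dropouts))
        if err < best_err:
            best_k, best_err = k, err
        k *= ratio
    return best_k


def dropout_features(genes, means, dropouts, top_n):
    """Return the genes whose observed dropout most exceeds the fitted curve."""
    k = fit_k(means, dropouts)
    residual = {g: d - mm_dropout(m, k) for g, m, d in zip(genes, means, dropouts)}
    return sorted(residual, key=residual.get, reverse=True)[:top_n]


# Toy data (made up): five genes lying exactly on a curve with K = 2,
# plus one gene with far more dropout than its mean expression predicts.
toy_means = [0.5, 1.0, 2.0, 4.0, 8.0]
toy_dropouts = [2.0 / (2.0 + m) for m in toy_means]
toy_k = fit_k(toy_means, toy_dropouts)

genes = ["g1", "g2", "g3", "g4", "g5", "g_out"]
selected = dropout_features(genes, toy_means + [4.0], toy_dropouts + [0.7], top_n=1)
print(round(toy_k, 2), selected)
```

A grid search keeps the sketch dependency-free; with real data a proper non-linear least-squares routine would be the more natural choice.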
Categories: Bioinformatics Trends

CCPLS reveals cell-type-specific spatial dependence of transcriptomes in single cells

Bioinformatics Oxford Journals - Mon, 05/09/2022 - 5:30am
Abstract
Motivation: Cell-cell communications regulate internal cellular states, e.g., gene expression and cell functions, and play pivotal roles in normal development and disease states. Furthermore, single-cell RNA sequencing methods have revealed cell-to-cell expression variability of highly variable genes (HVGs), which is also crucial. Nevertheless, the regulation on cell-to-cell expression variability of HVGs via cell-cell communications is still largely unexplored. The recent advent of spatial transcriptome methods has linked gene expression profiles to the spatial context of single cells, which has provided opportunities to reveal those regulations. The existing computational methods extract genes with expression levels influenced by neighboring cell types. However, limitations remain in the quantitativeness and interpretability: they neither focus on HVGs nor consider the effects of multiple neighboring cell types.
Results: Here, we propose CCPLS (Cell-Cell communications analysis by Partial Least Square regression modeling), which is a statistical framework for identifying cell-cell communications as the effects of multiple neighboring cell types on cell-to-cell expression variability of HVGs, based on the spatial transcriptome data. For each cell type, CCPLS performs PLS regression modeling and reports coefficients as the quantitative index of the cell-cell communications. Evaluation using simulated data showed our method accurately estimated the effects of multiple neighboring cell types on HVGs. Furthermore, applications to the two real datasets demonstrate that CCPLS can extract biologically interpretable insights from the inferred cell-cell communications.
Availability: The R package is available at https://github.com/bioinfo-tsukuba/CCPLS. The data are available at https://github.com/bioinfo-tsukuba/CCPLS_paper.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

ABEILLE: a novel method for ABerrant Expression Identification empLoying machine Learning from RNA-sequencing data

Bioinformatics Oxford Journals - Mon, 05/09/2022 - 5:30am
Abstract
Motivation: Current advances in omics technologies are paving the way for the diagnosis of rare diseases, serving as a complementary assay to identify the responsible gene. The use of transcriptomic data to identify aberrant gene expression (AGE) has been shown to reveal potential pathogenic events. However, popular approaches for AGE identification are limited by the use of statistical tests that imply the choice of an arbitrary cut-off for significance assessment and by the need for several replicates, which are not always available in clinical contexts.
Results: Hence, we developed ABEILLE (ABerrant Expression Identification empLoying machine LEarning from sequencing data), a variational autoencoder (VAE) based method for the identification of AGEs from the analysis of RNA-seq data without the need for replicates or a control group. ABEILLE combines the use of a VAE, able to model any data without specific assumptions on their distribution, and a decision tree to classify genes as AGE or non-AGE. An anomaly score is associated with each gene in order to stratify AGEs by severity of aberration. We tested ABEILLE on semi-synthetic and experimental datasets, demonstrating the importance of the flexibility of the VAE configuration to identify potential pathogenic candidates.
Availability: ABEILLE source code is freely available at https://github.com/UCA-MSI/ABEILLE.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

Systematic Replication Enables Normalization of High-throughput Imaging Assays

Bioinformatics Oxford Journals - Mon, 05/09/2022 - 5:30am
Abstract
Motivation: High-throughput fluorescent microscopy is a popular class of techniques for studying tissues and cells through automated imaging and feature extraction of hundreds to thousands of samples. Like other high-throughput assays, these approaches can suffer from unwanted noise and technical artifacts that obscure the biological signal. In this work we consider how an experimental design incorporating multiple levels of replication enables removal of technical artifacts from such image-based platforms.
Results: We develop a general approach to remove technical artifacts from high-throughput image data that leverages an experimental design with multiple levels of replication. To illustrate the methods we consider microenvironment microarrays (MEMAs), a high-throughput platform designed to study cellular responses to microenvironmental perturbations. When applied to MEMAs, our approach removes unwanted spatial artifacts and thereby enhances the biological signal. This approach has broad applicability to diverse biological assays.
Availability: Raw data are on Synapse (syn2862345); analysis code is on GitHub (gjhunt/mema_norm); a reproducible Docker image is available on Docker Hub (gjhunt/mema_norm).
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

A closed formula relevant to “Theory of local k-mer selection with applications to long-read alignment” by Jim Shaw and Yun William Yu

Bioinformatics Oxford Journals - Mon, 05/09/2022 - 5:30am
Abstract
Motivation: To handle the volume from next-generation sequencing data, modern sequence comparison often relies on summary sequence sketches such as minimizers, syncmers, and minimally overlapping words. Let us call an oligonucleotide of length k a k-mer. With the aim of anticipating the practical performance of a rule f that selects the k-mers in a sketch, Theorem 2 of Shaw and Yu gives a formula quantifying conservation of a sketch in the presence of a sequence mutation probability θ per base. Shaw and Yu give a four-variable recursion for computing the formula, a computation that is complicated, difficult to implement, and computationally expensive for large parameter values.
Results: For minimizers, the earliest of the k-mer sketches, this letter shows that Shaw and Yu’s recursion is equivalent to a simple explicit formula. The proof of the explicit formula can be generalized, with applications to other sequence sketches likely.
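For readers unfamiliar with the sketches named in this abstract, here is a minimal illustration of a minimizer selection rule: slide a window over w consecutive k-mers and keep the smallest k-mer in each window. Plain lexicographic order with leftmost tie-breaking is assumed for simplicity, whereas practical tools often order k-mers by a hash value. The function name is hypothetical, and the sketch does not reproduce the conservation formula of Shaw and Yu or the closed form derived in the letter.

```python
def minimizer_positions(seq, k, w):
    """Start positions of the k-mers selected as window minimizers.

    Each window covers w consecutive k-mers; the smallest k-mer in the
    window (lexicographic order, leftmost on ties) is selected.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected = set()
    for start in range(len(kmers) - w + 1):
        window = range(start, start + w)
        best = min(window, key=lambda i: (kmers[i], i))
        selected.add(best)
    return sorted(selected)


# Toy sequence (made up). With k = 3 the k-mers are
# ACG, CGT, GTA, TAC, ACG, CGT at positions 0..5.
positions = minimizer_positions("ACGTACGT", k=3, w=2)
print(positions)
```

Because consecutive windows overlap, neighbouring windows often pick the same k-mer, which is what makes the sketch smaller than the full list of k-mers.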
Categories: Bioinformatics Trends

Differential RNA Methylation Analysis for MeRIP-seq Data under General Experimental Design

Bioinformatics Oxford Journals - Mon, 05/09/2022 - 5:30am
Abstract
Motivation: RNA epigenetics is an emerging field to study the post-transcriptional gene regulation. The dynamics of RNA epigenetic modification have been reported to associate with many human diseases. Recently developed high-throughput technology named Methylated RNA Immunoprecipitation Sequencing (MeRIP-seq) enables the transcriptome-wide profiling of N6-methyladenosine (m6A) modification and comparison of RNA epigenetic modifications. There are a few computational methods for the comparison of mRNA modifications under different conditions but they all suffer from serious limitations.
Results: In this work, we develop a novel statistical method to detect differentially methylated mRNA regions from MeRIP-seq data. We model the sequence count data by a hierarchical negative binomial model that accounts for various sources of variations, and derive parameter estimation and statistical testing procedures for flexible statistical inferences under general experimental designs. Extensive benchmark evaluations in simulation and real data analyses demonstrate that our method is more accurate, robust, and flexible compared to existing methods.
Availability: Our method TRESS is implemented as an R/Bioconductor package and is available at https://bioconductor.org/packages/devel/TRESS.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

grenepipe: A flexible, scalable, and reproducible pipeline to automate variant calling from sequence reads

Bioinformatics Oxford Journals - Fri, 02/09/2022 - 5:30am
Abstract
Summary: We developed grenepipe, an all-in-one Snakemake workflow to streamline the data processing from raw high-throughput sequencing data of individuals or populations to genotype variant calls. Our pipeline offers a range of popular software tools within a single configuration file, automatically installs software dependencies, is highly optimized for scalability in cluster environments, and runs with a single command.
Availability: grenepipe is published under the GPLv3, and freely available at github.com/moiexpositoalonsolab/grenepipe
Categories: Bioinformatics Trends

BERN2: an advanced neural biomedical named entity recognition and normalization tool

Bioinformatics Oxford Journals - Fri, 02/09/2022 - 5:30am
Abstract
Summary: In biomedical natural language processing, named entity recognition (NER) and named entity normalization (NEN) are key tasks that enable the automatic extraction of biomedical entities (e.g., diseases and drugs) from the ever-growing biomedical literature. In this paper, we present BERN2 (Advanced Biomedical Entity Recognition and Normalization), a tool that improves the previous neural network-based NER tool (Kim et al., 2019) by employing a multi-task NER model and neural network-based NEN models to achieve much faster and more accurate inference. We hope that our tool can help annotate large-scale biomedical texts for various tasks such as biomedical knowledge graph construction.
Availability and implementation: Web service of BERN2 is publicly available at http://bern2.korea.ac.kr. We also provide local installation of BERN2 at https://github.com/dmis-lab/BERN2.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
