Online Narrative and Codified feature Search Engine (ONCE)

About ONCE

ONCE is a feature-generation search engine that identifies related narrative and codified electronic health record (EHR) features based on an input or target disease.

Narrative features: Standardized clinical concepts represented by concept unique identifiers (CUIs) from the Unified Medical Language System (UMLS).

Codified features: These features include

The key component that powers the ONCE feature selection engine is a set of semantic representations (embeddings) of all codified and NLP EHR concepts, trained via multi-source knowledge graph representation learning.

Online article knowledge: To identify an initial set of clinical concepts important to a target disease, the ONCE system additionally leverages a large knowledge base of medical articles that describe the relatedness between narrative features and the disease at a higher level. The current article corpus comprises data from seven online sources including Wikipedia, Medscape, and Merck Manuals.

A composite feature score: The final ONCE selection criteria are based on a composite score that integrates information on:

  • Relatedness of a candidate concept to the target concept as quantified by the semantic embeddings
  • Frequency of the candidate concept in the EHR
  • Weighted frequency of the concept in the disease-related knowledge source articles
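The exact formula behind the composite score is not given on this page. As a rough illustration only, the three ingredients listed above could be combined as a weighted sum. The function name, the weights, the use of a log transform on the EHR count, and the sample concepts below are all assumptions made for this sketch, not the ONCE implementation.

```python
import math

def composite_score(similarity, ehr_count, article_weight,
                    w_sim=1.0, w_freq=0.05, w_art=1.0):
    """Hypothetical weighted sum of the three listed ingredients.

    similarity     -- embedding relatedness to the target concept
    ehr_count      -- raw frequency of the candidate concept in the EHR
    article_weight -- weighted frequency in disease-related articles
    The weights are illustrative placeholders, not ONCE's values.
    """
    log_freq = math.log1p(ehr_count)  # dampen very common concepts
    return w_sim * similarity + w_freq * log_freq + w_art * article_weight

# Placeholder candidates: (similarity, EHR count, article weight)
candidates = {
    "concept_a": (0.82, 1500, 0.40),
    "concept_b": (0.35, 90000, 0.05),
    "concept_c": (0.77, 20, 0.60),
}

# Rank candidates from highest to lowest composite score
ranked = sorted(candidates,
                key=lambda name: composite_score(*candidates[name]),
                reverse=True)
print(ranked)
```

In this toy setup, a frequent but weakly related concept (concept_b) ranks below rarer concepts that are both similar to the target and well supported by the articles.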

To read more about ONCE and our paired phenotyping app, KOMAP, you can view our paper on medRxiv.

Quick Start Guide

Watch the attached instructional video to get started using ONCE.

Step 1: In the menu to the left, enter your disease of interest in the search box

Step 2: CUIs and Phecodes will automatically populate in the dropdown menus below the search box. You can use the dropdown menu to select one or many main CUIs from the list and one main Phecode. The main CUI(s) will determine the NLP features, and the main Phecode will determine the codified features

Step 3: Press the Search button to generate your feature lists, which will appear on the NLP Features and Codified Features tabs respectively

Step 4: Browse, filter and sort your features as desired. You can find details about each column in the Output Description section on this page

Step 5: You can download the full feature lists using the Download Full Results button at the top of each feature tab. You can also visit the Dictionary Download tab to download a dictionary of NLP features in a variety of formats that can be used directly for natural language processing of EHR charts or notes

Step 6: Use your generated feature lists as desired; see the Use Cases section on this page for suggested use cases and parameters, including phenotyping using our paired app, KOMAP

Use Cases

We do not always recommend using the full set of downloadable features for a disease of interest. Instead, find a use case that approximates your goal and select features accordingly:

Phenotyping

The features selected by ONCE can be used directly as features in a phenotyping algorithm - KOMAP was built to do this easily.

To use ONCE-selected features in phenotyping, keep the rows where the phenotyping_features column is True in the NLP and codified feature lists, respectively. This threshold is the most stringent, so only the most relevant candidate features are kept, which limits noise in the phenotyping algorithm.
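A minimal sketch of this filtering step, assuming the downloaded feature list is a CSV file whose header matches the column names in the Output Description section and whose flag columns hold the strings True and False. The file layout, the helper name select_features, and the CUIs and terms in the sample are placeholders, not real ONCE output.

```python
import csv
import io

# Placeholder data standing in for a downloaded NLP feature list
sample_csv = """cui,term,target_similarity,importance_score,phenotyping_features,expanded_features
C0000001,example term one,0.91,0.88,True,True
C0000002,example term two,0.64,0.52,False,True
C0000003,example term three,0.31,0.20,False,False
"""

def select_features(csv_text, flag_column):
    """Return the rows whose flag column is True (case-insensitive)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader
            if row[flag_column].strip().lower() == "true"]

phenotyping = select_features(sample_csv, "phenotyping_features")
print([row["cui"] for row in phenotyping])
```

The same helper can be pointed at the expanded_features column to obtain the broader set described under Clinical study.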

Clinical study

ONCE-selected features can also be used for broader clinical study. To select a set of features that is broader than the strict threshold used for phenotyping, keep the rows where the expanded_features column is True in the NLP and codified feature lists, respectively. The threshold for this set is more relaxed, so it gives a wider view of the features related to your disease of interest.

To select an even broader set of features, you may wish to use the entire list of downloadable features. If desired, you can use the importance_score to postprocess this list and create a custom threshold.
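One way to apply a custom threshold is sketched below, again assuming a CSV download with an importance_score column. The helper name filter_by_score, the cutoff of 0.5, and the sample variables are illustrative assumptions; an appropriate cutoff depends on your disease and study.

```python
import csv
import io

# Placeholder data standing in for a downloaded codified feature list
sample_csv = """Variable,Description,target_similarity,importance_score
VAR_A,placeholder description A,0.90,0.85
VAR_B,placeholder description B,0.70,0.55
VAR_C,placeholder description C,0.40,0.30
VAR_D,placeholder description D,0.20,0.10
"""

def filter_by_score(csv_text, min_score=None, top_k=None):
    """Sort rows by importance_score (descending), then apply an
    optional minimum score and an optional cap on the number of rows."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda r: float(r["importance_score"]), reverse=True)
    if min_score is not None:
        rows = [r for r in rows if float(r["importance_score"]) >= min_score]
    if top_k is not None:
        rows = rows[:top_k]
    return rows

kept = filter_by_score(sample_csv, min_score=0.5)
print([r["Variable"] for r in kept])
```

Passing top_k instead of min_score keeps a fixed number of features, which can be easier to compare across diseases than a fixed score cutoff.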

Output Description

NLP Features

cui contains UMLS CUIs that are related to the target CUI according to co-occurrence within online knowledge sources and EHR narrative notes

term lists the UMLS preferred terms for each CUI

target_similarity displays a cosine similarity measure between the target CUI and the corresponding CUI in the row
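For reference, cosine similarity between two embedding vectors is their dot product divided by the product of their lengths. The sketch below shows the calculation on made-up three-dimensional vectors; the actual ONCE embeddings and their dimensionality are not described on this page.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)

# Made-up embeddings for a target concept and one candidate concept
target = [0.2, 0.7, 0.1]
candidate = [0.1, 0.8, 0.3]
print(round(cosine_similarity(target, candidate), 3))
```

Values close to 1 indicate that the two concepts point in nearly the same direction in the embedding space, while values near 0 indicate little relatedness.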

importance_score scores each CUI based on the target similarity, the weight of the CUI in the knowledge base, and the log frequency of the CUI in the EHR

phenotyping_features selects features with a strict threshold on the importance score, selecting highly related features that can be used directly for phenotyping

expanded_features selects features with a more relaxed threshold on the importance score, selecting a broader range of features suitable for clinical study

Codified Features

Variable contains codified variable names that are related to the target PheCode according to co-occurrence within the EHR

Description lists the human-readable description for the variable names

target_similarity displays a cosine similarity measure between the target PheCode and the corresponding variable in the row

importance_score scores each variable based on the target similarity and the log frequency of the variable in the EHR

phenotyping_features selects features with a strict threshold on the importance score, selecting highly related features that can be used directly for phenotyping

expanded_features selects features with a more relaxed threshold on the importance score, selecting a broader range of features suitable for clinical study

Dictionary Download

This section of the app allows you to download a dictionary of NLP features for direct use with NLP software. You can select the filtering strategy to be used as well as the format of the downloaded file.

Filtering strategies:

Maximally relevant CUIs for use in phenotyping will select only CUIs where the phenotyping_features column is True

Broadly relevant CUIs for clinical study will select only CUIs where the expanded_features column is True

All CUIs from NLP Features tab will select all CUIs regardless of their relatedness to the target disease

Downloaded file format:

Preferred terms: 1 term per CUI will select only the UMLS preferred term for each selected CUI. This format is most useful for easy viewing of selected CUIs

Full dictionary: all available terms per CUI will select all UMLS terms for each selected CUI. This format will capture any spelling and syntax variations within terms for each CUI and is best used for NLP processing of EHR narrative notes
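As an illustration of how a full dictionary might be used, the sketch below counts CUI mentions in a note by matching every term variant. It assumes the dictionary can be read as (CUI, term) pairs; the actual download layout may differ, and the CUIs, helper names, and note text are placeholders. Simple string matching like this ignores negation and context, which dedicated clinical NLP software handles.

```python
import re

# Placeholder (CUI, term) pairs standing in for a downloaded dictionary
dictionary_rows = [
    ("C0000001", "heart failure"),
    ("C0000001", "cardiac failure"),
    ("C0000002", "edema"),
]

def build_matcher(rows):
    """Map lower-cased terms to CUIs and compile one regex for all terms."""
    term_to_cui = {term.lower(): cui for cui, term in rows}
    # Longest terms first so multi-word terms win over shorter overlaps
    terms = sorted(term_to_cui, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return term_to_cui, pattern

def count_cuis(note, term_to_cui, pattern):
    """Count how often each CUI is mentioned in a note."""
    counts = {}
    for match in pattern.finditer(note):
        cui = term_to_cui[match.group(1).lower()]
        counts[cui] = counts.get(cui, 0) + 1
    return counts

note = ("Patient with cardiac failure and leg edema. "
        "Heart failure was first noted in 2019.")
term_to_cui, pattern = build_matcher(dictionary_rows)
counts = count_cuis(note, term_to_cui, pattern)
print(counts)
```

Because both term variants map to the same placeholder CUI, the two mentions are counted together, which is the benefit of the full dictionary over the preferred-terms format.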

