Knowledge-Driven Online Multimodal Automated Phenotyping System (KOMAP)

What is it?

The KOMAP pipeline comprises two key components:
  1. Online Narrative and Codified feature Search engine (ONCE), powered by multi-source knowledge graph representation, and illustrated in the ONCE app;
  2. Online phenotyping algorithm training and validation:
    • Training:
      • KOMAP can train a multimodal phenotyping algorithm fully online based on a user-supplied summary of the feature matrix;
    • Validation:
      • KOMAP contains an online evaluation system to approximate evaluation metrics based on additional summary statistics derived from a validation set consisting of labeled data.

How does it work?

  • Training:
    Given a set of selected features, including a healthcare utilization measure and one or more main surrogate features (which ONCE can suggest), KOMAP training consists of three steps:
    1. Normalizing the main surrogates with the utilization;
    2. Denoising via regression on each main surrogate;
    3. Combining the derived risk scores of different surrogates.
    All three steps require only the empirical covariance matrix as input, FREE of any individual-level information.
  • Validation:
    • The key working assumption behind the proposed evaluation algorithm is that the features, conditional on the label, approximately follow a Gaussian distribution. Under this assumption, the ROC curve of the predicted score is uniquely determined by the conditional mean vectors and conditional covariance matrices.
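
The two ideas above can be illustrated with summary statistics alone. The sketch below is not KOMAP's implementation; it only shows (a) that least-squares regression slopes can be computed from a covariance matrix without individual-level rows, and (b) that under the Gaussian assumption the AUC of a linear score has a closed form in the conditional means and covariances. The function names `ols_from_cov` and `gaussian_auc` are hypothetical, and restricting to a linear score is an assumption made for illustration.

```python
import math
import numpy as np

def ols_from_cov(cov, target, predictors):
    """Least-squares slopes of column `target` on columns `predictors`,
    computed from the covariance matrix only (no patient-level data)."""
    s_xx = cov[np.ix_(predictors, predictors)]  # covariance among predictors
    s_xy = cov[predictors, target]              # covariance with the target
    return np.linalg.solve(s_xx, s_xy)

def gaussian_auc(w, mean0, cov0, mean1, cov1):
    """AUC of the linear score w @ x, assuming x given the label is Gaussian
    with (mean0, cov0) for negatives and (mean1, cov1) for positives."""
    gap = w @ (mean1 - mean0)                         # difference in score means
    spread = math.sqrt(w @ cov0 @ w + w @ cov1 @ w)   # sd of score difference
    z = gap / spread
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF

# Toy inputs (illustrative numbers, not real data).
toy_cov = np.array([[1.0, 0.5], [0.5, 2.0]])
slopes = ols_from_cov(toy_cov, target=0, predictors=[1])
toy_auc = gaussian_auc(np.array([1.0]),
                       np.array([0.0]), np.array([[0.5]]),
                       np.array([1.0]), np.array([[0.5]]))
print(slopes, toy_auc)
```

The closed form follows because the difference between a positive and a negative score is itself Gaussian, so the probability that it exceeds zero is a normal CDF value.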

How can I use it?

  • Step 0 (Optional): Identify a list of features related to your disease of interest (ONCE);
  • Step 1: Upload the training and validation covariance matrices of features, including the corrupted main surrogates; upload your dictionary connecting variable names to their descriptions;
  • Step 2: Specify the names of the main surrogate(s) and the healthcare utilization measure corresponding to the disease;
  • Step 3 (Optional, only needed for simulated AUC): With labeled data, upload prevalence, conditional mean vectors and conditional covariance matrices;
  • Step 4: Just click the "GO KOMAP" button and you are ready to go!
Need more instructions on KOMAP inputs?

Model inputs:

Upload covariance matrices
Step 1.1

Training covariance matrix

Step 1.2

Validation covariance matrix


  • Training and validation covariance matrices must have the same set of concepts as their column names and row names.
  • Each covariance matrix must contain at least one main surrogate and its corrupted version.
  • A corrupted surrogate is generated by replacing 20% of the surrogate's values with its mean.
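
As a rough sketch of how such an input could be prepared locally, the snippet below corrupts a surrogate and then computes a covariance matrix that includes both versions. The helper name `corrupt_surrogate`, the random choice of which 20% of entries to replace, and the simulated counts are all assumptions for illustration; the page does not specify how the entries are selected or whether features are transformed first.

```python
import numpy as np

def corrupt_surrogate(values, fraction=0.2, seed=0):
    """Return a copy of `values` with `fraction` of its entries replaced
    by the mean of `values`. Entries are picked at random (an assumption)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    corrupted = values.copy()
    n_replace = int(round(fraction * values.size))
    idx = rng.choice(values.size, size=n_replace, replace=False)
    corrupted[idx] = values.mean()
    return corrupted

# Simulated stand-ins for a main surrogate and a utilization measure.
rng = np.random.default_rng(1)
surrogate = rng.poisson(3.0, size=1000).astype(float)
utilization = rng.poisson(10.0, size=1000).astype(float)

# Columns: surrogate, corrupted surrogate, utilization (one row per patient).
features = np.column_stack([surrogate, corrupt_surrogate(surrogate), utilization])
cov = np.cov(features, rowvar=False)  # 3 x 3 matrix, shareable summary
print(cov.shape)
```

Only `cov`, with matching row and column names attached, would need to leave the local environment.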
Upload dictionary
Step 2


Specify feature names
Step 3


  1. Specify the number of surrogates you want to fit.
  2. Identify the name of each surrogate as well as its corrupted version.
  3. Identify the name of the healthcare utilization score.
(Optional) Upload conditional summary data
Step 4

Conditional summary data

Combine the following summary-level data into one Excel file:
  • Sheet 1: Conditional covariance matrix among patients with negative disease status.
  • Sheet 2: Conditional covariance matrix among patients with positive disease status.
  • Sheet 3: Conditional mean vector among patients with negative disease status.
  • Sheet 4: Conditional mean vector among patients with positive disease status.
  • Sheet 5: A single number indicating the disease prevalence.
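
The five quantities listed above can be computed from a labeled validation set along the following lines. This is a minimal sketch: the function name `conditional_summaries` and the dictionary keys are hypothetical, and writing the results into the five sheets of an Excel file is left out.

```python
import numpy as np

def conditional_summaries(features, labels):
    """Summary statistics by disease status.
    `features`: one row per patient, one column per feature.
    `labels`: 0/1 (or False/True) disease status per patient."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels).astype(bool)
    neg, pos = features[~labels], features[labels]
    return {
        "cov_negative": np.cov(neg, rowvar=False),   # Sheet 1
        "cov_positive": np.cov(pos, rowvar=False),   # Sheet 2
        "mean_negative": neg.mean(axis=0),           # Sheet 3
        "mean_positive": pos.mean(axis=0),           # Sheet 4
        "prevalence": float(labels.mean()),          # Sheet 5
    }

# Tiny made-up example: four patients, two features.
toy = conditional_summaries([[0, 0], [2, 2], [1, 3], [3, 5]], [0, 0, 1, 1])
print(toy["prevalence"], toy["mean_positive"])
```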


Covariance matrices (train + valid)

Download sample data
  • Toy train cov matrix:
  • Toy valid cov matrix:


Download sample data
  • Dictionary:

Feature names

Conditional summary data

Download sample data

Model outcomes: