Knowledge-Driven Online Multimodal Automated Phenotyping System (KOMAP)
What is it?
The KOMAP pipeline comprises two key components:
- Online Narrative and Codified feature Search engine (ONCE), powered by multi-source knowledge graph representations and available through the ONCE app;
- Online phenotyping algorithm training and validation:
  - KOMAP trains a multimodal phenotyping algorithm fully online from a user-supplied summary of the feature matrix;
  - KOMAP includes an online evaluation system that approximates evaluation metrics from additional summary statistics derived from a labeled validation set.
How does it work?
Given a set of selected features, including one or more main surrogate features (which ONCE can help identify) and a healthcare utilization measure, KOMAP training proceeds in three steps:
- Normalizing the main surrogates against the healthcare utilization measure;
- Denoising each main surrogate by regressing it on the selected features;
- Combining the risk scores derived from the different surrogates.
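The three training steps can be illustrated using only a covariance matrix. The sketch below is a minimal illustration under stated assumptions, not KOMAP's actual implementation: "normalizing" is taken to mean residualizing each surrogate on the utilization measure with a least-squares slope, ordinary least squares stands in for whatever (possibly penalized) regression KOMAP uses for denoising, and the scores are combined by a simple average of coefficient vectors. The function names and the simulated data are hypothetical.

```python
import numpy as np

def residualize_cov(cov, s, u):
    """Step 1 (assumed form): replace surrogate s by s - a*u, where
    a = Cov(s, u) / Var(u) is the least-squares slope on utilization u,
    and return the updated covariance matrix."""
    cov = np.asarray(cov, dtype=float)
    a = cov[s, u] / cov[u, u]
    out = cov.copy()
    out[s, :] = cov[s, :] - a * cov[u, :]    # Cov(s - a*u, j) for every j
    out[:, s] = out[s, :]                    # keep the matrix symmetric
    out[s, s] = cov[s, s] - a * cov[s, u]    # Var(s - a*u) at the LS slope
    return out

def ols_from_cov(cov, target, xs):
    """Step 2 (OLS stand-in): slopes of `target` on columns xs computed from
    covariances only, beta = Sigma_xx^{-1} Sigma_x,target."""
    return np.linalg.solve(cov[np.ix_(xs, xs)], cov[xs, target])

def combine_surrogates(cov, surrogates, u):
    """Steps 1-3: residualize each surrogate, regress it on its predictor
    indices, then average the coefficient vectors (hypothetical rule)."""
    betas = []
    for s, xs in surrogates:                 # (surrogate index, predictor indices)
        beta = np.zeros(cov.shape[0])
        beta[xs] = ols_from_cov(residualize_cov(cov, s, u), s, xs)
        betas.append(beta)
    return np.mean(betas, axis=0)

# Usage on simulated data; columns are S, S_corrupted, U, X.
rng = np.random.default_rng(0)
n = 5000
y = rng.binomial(1, 0.3, n)                  # latent disease status (simulated)
u = rng.poisson(5, n).astype(float)          # healthcare utilization
s = 2 * y + 0.3 * u + rng.normal(size=n)     # main surrogate
x = y + rng.normal(size=n)                   # another informative feature
s_cor = s.copy()
s_cor[rng.choice(n, size=n // 5, replace=False)] = s.mean()  # corrupted copy
data = np.column_stack([s, s_cor, u, x])
cov = np.cov(data, rowvar=False)
beta = combine_surrogates(cov, surrogates=[(0, [1, 3])], u=2)
score = data @ beta                          # linear risk score per patient
```

Working from covariances alone mirrors the fact that KOMAP only receives a summary of the feature matrix; the least-squares slopes computed this way match a regression on the centered patient-level data.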
The key working assumption behind KOMAP's evaluation algorithm is that, conditional on the disease label, the features jointly follow an approximately multivariate Gaussian distribution. Under this assumption, the ROC curve of the predicted score is uniquely determined by the conditional mean vectors and conditional covariance matrices.
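If, in addition, the predicted score is a linear combination β'X of the features (an assumption of this sketch), then within each class k ∈ {0, 1} the score is Gaussian with mean m_k = β'μ_k and variance s_k² = β'Σ_kβ. The ROC curve then follows from two normal CDFs, and the AUC has the closed form Φ((m₁ − m₀)/√(s₀² + s₁²)). The helper `binormal_roc` below is hypothetical and only illustrates this binormal calculation.

```python
import math
import numpy as np

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binormal_roc(beta, mu0, mu1, sigma0, sigma1, thresholds):
    """ROC points and AUC for the linear score beta'X when X | label k is
    Gaussian with mean mu_k and covariance sigma_k (k = 0 negative, 1 positive)."""
    beta = np.asarray(beta, dtype=float)
    m0, m1 = beta @ np.asarray(mu0, float), beta @ np.asarray(mu1, float)
    s0 = math.sqrt(beta @ np.asarray(sigma0, float) @ beta)   # score SD, negatives
    s1 = math.sqrt(beta @ np.asarray(sigma1, float) @ beta)   # score SD, positives
    fpr = np.array([1.0 - norm_cdf((t - m0) / s0) for t in thresholds])
    tpr = np.array([1.0 - norm_cdf((t - m1) / s1) for t in thresholds])
    auc = norm_cdf((m1 - m0) / math.sqrt(s0**2 + s1**2))
    return fpr, tpr, auc

# Two features; only the first one shifts between the classes.
fpr, tpr, auc = binormal_roc([1.0, 0.0], [0.0, 0.0], [1.0, 0.0],
                             np.eye(2), np.eye(2), np.linspace(-6, 7, 2001))
# auc = Phi(1 / sqrt(2)), about 0.760
```

Integrating the (fpr, tpr) curve numerically gives the same value as the closed form, which is a quick sanity check on the inputs.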
How can I use it?
- Step 0 (Optional): Identify a list of features related to your disease of interest (using ONCE);
- Step 1: Upload the training and validation covariance matrices of the features, including the corrupted main surrogates, and upload your dictionary mapping variable names to their descriptions;
- Step 2: Specify the names of the main surrogate(s) and of the healthcare utilization measure corresponding to the disease;
- Step 3 (Optional, only needed for simulated AUC): If you have labeled data, upload the disease prevalence, conditional mean vectors and conditional covariance matrices;
- Step 4: Just click the "GO KOMAP" button and you are ready to go!
Need more instructions on KOMAP inputs?
Upload covariance matrices
- Training and validation covariance matrices must use the same set of concepts as both their row names and column names.
- Each covariance matrix must contain at least one main surrogate and its corrupted version.
- The corrupted surrogate is generated by replacing 20% of the surrogate's values with its mean.
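A covariance matrix that meets these requirements can be prepared from patient-level data before uploading. The sketch below assumes the 20% of entries are chosen uniformly at random (the text only fixes the proportion), and the `_corrupted` suffix, the function name and the concept names are hypothetical; the upload file format itself is not shown.

```python
import numpy as np

def add_corrupted_surrogate(data, names, surrogate, frac=0.2, seed=0):
    """Append a corrupted copy of column `surrogate`, in which a random
    `frac` of the patients have their value replaced by the column mean."""
    data = np.asarray(data, dtype=float)
    j = names.index(surrogate)
    col = data[:, j].copy()
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(col), size=int(round(frac * len(col))), replace=False)
    col[idx] = data[:, j].mean()                     # corrupt the chosen entries
    new_names = names + [surrogate + "_corrupted"]   # hypothetical naming scheme
    return np.column_stack([data, col]), new_names

rng = np.random.default_rng(1)
names = ["main_surrogate", "utilization", "feature_x"]  # hypothetical concepts
train, train_names = add_corrupted_surrogate(rng.normal(size=(1000, 3)), names, "main_surrogate")
valid, valid_names = add_corrupted_surrogate(rng.normal(size=(500, 3)), names, "main_surrogate", seed=1)
cov_train = np.cov(train, rowvar=False)  # row i and column i both refer to train_names[i]
cov_valid = np.cov(valid, rowvar=False)
assert valid_names == train_names        # both matrices share the same concepts
```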
Specify feature names
- Specify the number of surrogates you want to fit.
- Identify the name of each surrogate as well as its corrupted version.
- Identify the name of the healthcare utilization score.
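Before running KOMAP, it can help to check that the names entered here actually appear in both covariance matrices. The sketch below represents the specification as a plain dictionary in which the number of surrogates is simply the length of the `surrogates` list; the keys, the function `check_spec` and the example names are hypothetical and do not reflect the app's internal format.

```python
def check_spec(spec, train_names, valid_names):
    """Return a list of problems with a surrogate/utilization specification."""
    problems = []
    if set(train_names) != set(valid_names):
        problems.append("training and validation matrices list different concepts")
    known = set(train_names) & set(valid_names)
    if len(spec["surrogates"]) < 1:
        problems.append("at least one main surrogate is required")
    for surrogate, corrupted in spec["surrogates"]:
        for name in (surrogate, corrupted):
            if name not in known:
                problems.append(f"{name!r} not found in both matrices")
    if spec["utilization"] not in known:
        problems.append(f"utilization {spec['utilization']!r} not found")
    return problems

names = ["main_surrogate", "main_surrogate_corrupted", "utilization", "feature_x"]
spec = {"surrogates": [("main_surrogate", "main_surrogate_corrupted")],
        "utilization": "utilization"}
bad = {"surrogates": [("main_surrogate", "typo_corrupted")],
       "utilization": "utilization"}
print(check_spec(spec, names, names))  # prints []
print(check_spec(bad, names, names))   # prints ["'typo_corrupted' not found in both matrices"]
```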
(Optional) Upload conditional summary data
Combine the following summary-level data into a single Excel file:
- Sheet 1: Conditional covariance matrix among patients with negative disease status.
- Sheet 2: Conditional covariance matrix among patients with positive disease status.
- Sheet 3: Conditional mean vector among patients with negative disease status.
- Sheet 4: Conditional mean vector among patients with positive disease status.
- Sheet 5: A single number indicating the disease prevalence.
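Once the five sheets are loaded (for example with `pandas.read_excel(path, sheet_name=None)`, which returns every sheet as a dictionary of DataFrames), their shapes should agree with one another, and the prevalence is what turns class-conditional rates into predictive values. The sketch below works on plain NumPy arrays; `check_conditional_summary` and `ppv_npv` are hypothetical helpers, and the PPV/NPV formulas are standard Bayes-rule identities shown only to illustrate what the prevalence adds.

```python
import numpy as np

def check_conditional_summary(sigma0, sigma1, mu0, mu1, prevalence):
    """Validate the five conditional summaries (sheets 1-5, in order)."""
    sigma0, sigma1 = np.asarray(sigma0, float), np.asarray(sigma1, float)
    mu0, mu1 = np.ravel(mu0).astype(float), np.ravel(mu1).astype(float)
    p = len(mu0)
    for name, s in (("sigma0", sigma0), ("sigma1", sigma1)):
        if s.shape != (p, p):
            raise ValueError(f"{name} has shape {s.shape}, expected {(p, p)}")
        if not np.allclose(s, s.T):
            raise ValueError(f"{name} is not symmetric")
    if mu1.shape != (p,):
        raise ValueError("mean vectors have different lengths")
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    return sigma0, sigma1, mu0, mu1, float(prevalence)

def ppv_npv(tpr, fpr, prevalence):
    """Positive and negative predictive values at one threshold (Bayes' rule)."""
    ppv = tpr * prevalence / (tpr * prevalence + fpr * (1 - prevalence))
    npv = (1 - fpr) * (1 - prevalence) / (
        (1 - fpr) * (1 - prevalence) + (1 - tpr) * prevalence)
    return ppv, npv

check_conditional_summary(np.eye(2), np.eye(2), [0, 0], [1, 0], 0.1)
ppv, npv = ppv_npv(tpr=0.8, fpr=0.1, prevalence=0.1)  # 0.08/0.17 and 0.81/0.83
```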