Levels of supervision
In supervised learning, an ML algorithm infers a function that maps inputs to outputs from labeled examples. For example, given records labeled as either normal birth or preterm birth, the algorithm learns to predict premature births from routine longitudinal EHR data.
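A minimal sketch of supervised learning, assuming logistic regression as the learner (the text does not name a specific algorithm). Here the "inferred function" is a weight vector and bias that map a record's features to a probability of preterm birth. The records, the two features, and all values are hypothetical toy data, not real EHR variables.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical toy data: (feature vector, label), where 1 = preterm birth, 0 = normal birth.
# The two features are made-up stand-ins for values derived from longitudinal EHR data.
TRAIN: List[Tuple[List[float], int]] = [
    ([0.1, 0.2], 0), ([0.3, 0.1], 0), ([0.2, 0.4], 0),
    ([1.8, 1.6], 1), ([1.5, 2.0], 1), ([2.1, 1.7], 1),
]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def score(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Linear score w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def fit_logistic_regression(data, lr: float = 0.5, epochs: int = 1000):
    """Infer weights w and bias b from labeled examples by batch gradient descent on log-loss."""
    d = len(data[0][0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in data:
            err = sigmoid(score(w, b, x)) - y  # derivative of log-loss w.r.t. the score
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b


def predict(w, b, x, threshold: float = 0.5) -> int:
    """Return 1 (preterm) if the predicted probability reaches the threshold, else 0."""
    return int(sigmoid(score(w, b, x)) >= threshold)


w, b = fit_logistic_regression(TRAIN)
print(predict(w, b, [0.2, 0.3]), predict(w, b, [1.9, 1.8]))  # prints: 0 1
```

In practice a library implementation (for example scikit-learn's `LogisticRegression`) would replace the hand-written gradient descent; it is spelled out here only to make the step of inferring a function from labeled examples visible.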
A common form of semi-supervised learning uses an ML algorithm trained on a small set of labeled data to assign labels to a much larger set of unlabeled data. For example, to provide plentiful case/control data for a downstream heart attack prediction task, an algorithm is first trained on EHR records whose labels have been carefully validated. The trained model is then applied to large numbers of unlabeled records, labeling each one as either a heart attack case or a patient with normal heart health (termed a control in this context).
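The labeling step described above is often called self-training or pseudo-labeling. A minimal sketch, assuming a nearest-centroid classifier stands in for whatever model is actually trained, and using a hypothetical `min_margin` threshold so that only confidently labeled records are kept as cases or controls. All data are made-up toy values.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]

# Hypothetical, hand-validated seed labels (toy features, not real EHR variables).
LABELED: List[Tuple[Vector, str]] = [
    ([0.0, 0.2], "control"), ([0.2, 0.0], "control"),
    ([2.0, 2.2], "case"), ([2.2, 2.0], "case"),
]
# Hypothetical unlabeled pool; in practice it would be far larger than the labeled set.
UNLABELED: List[Vector] = [[0.0, 0.1], [1.0, 1.2], [2.0, 2.0]]


def fit_centroids(labeled: List[Tuple[Vector, str]]) -> Dict[str, List[float]]:
    """Train: compute the mean feature vector of each label."""
    groups: Dict[str, List[Vector]] = {}
    for x, y in labeled:
        groups.setdefault(y, []).append(x)
    return {y: [sum(col) / len(pts) for col in zip(*pts)] for y, pts in groups.items()}


def pseudo_label(centroids: Dict[str, List[float]], unlabeled: List[Vector],
                 min_margin: float = 1.0) -> List[Tuple[Vector, str]]:
    """Label each unlabeled record with its nearest centroid, keeping only confident ones.

    Confidence is the gap between the distances to the nearest and second-nearest
    centroid; records closer to a tie than min_margin are skipped."""
    kept = []
    for x in unlabeled:
        (d1, label), (d2, _) = sorted((math.dist(x, c), y) for y, c in centroids.items())[:2]
        if d2 - d1 >= min_margin:
            kept.append((x, label))
    return kept


centroids = fit_centroids(LABELED)
pseudo = pseudo_label(centroids, UNLABELED)
print([label for _, label in pseudo])  # ['control', 'case']
```

The ambiguous record `[1.0, 1.2]` sits midway between the two centroids and is left unlabeled. Raising `min_margin` yields fewer but more reliable case/control labels for the downstream model; lowering it yields more labels with more noise.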
In unsupervised learning, an ML algorithm explores a data set without guidance, looking for hidden clusters or associations. Common applications include recommendation engines for online retail and music platforms, and visual tasks such as object recognition. For example, to identify clinical workflows, an algorithm can infer the clinical tasks performed by care team members from sequences of events captured in EHR user logs (which record who logged in and how they used the records). Unsupervised methods are also used to compress data, reducing the number of features in a data set while retaining the structure needed to study it. This enables, for example, downstream visualization and interpretation of otherwise intractable biomolecular data.
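Both unsupervised uses can be sketched in a few lines. The example below assumes each EHR user session has already been summarized as counts of hypothetical event types. It groups sessions with k-means (one common clustering method, not necessarily the one used for workflow discovery) and compresses the same vectors to a single feature with principal component analysis (PCA), again only one of several compression techniques. The event names and counts are invented for illustration.

```python
import math
from typing import List

# Hypothetical event types counted per EHR user session (illustrative names only).
EVENT_TYPES = ["open_chart", "view_labs", "write_note", "order_med"]
# Toy sessions: counts of each event type (made-up numbers).
SESSIONS: List[List[float]] = [
    [5, 4, 0, 0], [1, 0, 4, 5], [6, 5, 1, 0],
    [0, 1, 5, 4], [5, 6, 0, 1], [1, 0, 6, 4],
]


def kmeans(points, k, iters=20):
    """Plain k-means (Lloyd's algorithm), initialized with the first k points for
    determinism; k-means++ or several random restarts are more usual in practice."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        assign = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids


def first_principal_component(points, iters=100):
    """Compress each point to one number: its projection on the direction of greatest
    variance, found by power iteration on the sample covariance matrix."""
    n, d = len(points), len(points[0])
    mean = [sum(col) / n for col in zip(*points)]
    centered = [[x - m for x, m in zip(p, mean)] for p in points]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)] for i in range(d)]
    # Start from the largest centered row: it lies in the data's span, so cov @ v is
    # nonzero unless every point is identical (then all scores are correctly zero).
    v = list(max(centered, key=lambda r: sum(x * x for x in r)))
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    scores = [sum(x * vi for x, vi in zip(r, v)) for r in centered]
    return v, scores


assign, _ = kmeans(SESSIONS, k=2)
_, scores = first_principal_component(SESSIONS)
print(assign)  # [0, 1, 0, 1, 0, 1]
print([round(s, 1) for s in scores])
```

k-means needs the number of clusters `k` up front and returns arbitrary cluster ids, so a person still has to interpret what each cluster means (in this toy data, something like chart review versus documentation and ordering). The single PCA score separates the same two groups by sign, which is the kind of reduced representation that makes plotting and inspection feasible.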