type-1b” depending on whether the DVCs exhibit stronger differences in the mean or variance (Fig. 5a). In contrast to type-1 DVCs, type-2 and type-3 DVCs show differences only at the level of DNAm variance, with average levels of DNAm in each phenotype being statistically indistinguishable (Fig. 5a). The key difference between type-2 and type-3 DVCs is that in the type-2 case the increased variance is driven by a few outliers exhibiting coordinated changes (i.e. in the same direction), whereas in the type-3 case the increased variance is potentially due to more outliers but with a larger level of discoordination, with outliers exhibiting both hyper- and hypomethylation (Fig. 5a). Real data examples confirm the existence of these different types of DV (Fig. 5b), and although most of these focus on hypermethylation, analogous types of DV exhibiting hypomethylation are also observed (Additional file 1: Figure S4).

Demonstrating that this taxonomy of DV is of biological relevance, we observed that DVCs typically exhibited progressive changes in DNA methylation during carcinogenesis, evolving from being type-2 DVCs in the earliest stages of cancer to being type-1 DVCs in neoplasia (Fig. 6a, Additional file 1: Figure S5). Confirming these dynamics of DV on a global scale, we observed that type-1 and type-2 DV exhibited widely different frequencies depending on disease stage, with type-1 DV being very infrequent in pre-neoplastic lesions but much more prominent in neoplasia and invasive cancer (Fig. 6b). Thus, we posited that the variable performance of DV algorithms, and their critical dependence on disease stage, could be explained by their varying sensitivities to detect different types of DV. To this end, we conducted a simulation study in which we simulated DVCs from the type-1a, type-1b and type-2 subtypes, and then compared the sensitivity of the different algorithms to detect them (Methods). We observed that Bartlett's test, iEVORA and GAMLSS were able to retrieve all types of true DVCs with equal power without losing much control of the false discovery rate (FDR) (Fig. 6c). In contrast, although J-DMDV and DiffVar could achieve much better control of the FDR, their power to detect type-2 DV was clearly compromised (Fig. 6c).

Outlier DVCs are not markers of immune or stromal cells, but are enriched for transcription factor binding sites and PRC2/bivalent target genes

Teschendorff et al. BMC Bioinformatics (2016) 17

Fig. 3 Positive Predictive Values (PPVs) of DVCs identified from pre-neoplastic lesions in cervical neoplasia and invasive cervical cancer. a PPVs of differentially variable CpGs (DVCs) selected by each of five different DV algorithms from the ARTISTIC data set (comparing 75 normal cervical smear samples from women who 3 years later developed a CIN2+ to 77 from women who remained disease free), with the PPVs estimated in an independent Illumina 27k set profiling 24 normal cervical smears (N) and 24 CIN2+ samples. The number of top-ranked selected DVCs increases along the panels from left to right. The PPV was estimated as the fraction of hypermethylated DVCs attaining a t-statistic larger than 1.96 (P < 0.05) in the independent set. Only hypermethylation was considered due to the design of the 27k beadarray, which is overrepresented for probes in gene promoters. b As (a), but now for an independent Illumina 27k set profiling 15 normal cervical tissue samples (N) and 48 invasive cervical cancers (CC).
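The PPV estimate described in the Fig. 3 caption can be sketched in a few lines. This is a minimal illustration and not the authors' code: the Welch form of the t-statistic, the function names `welch_t` and `ppv`, the probe IDs and the beta values are all assumptions made for the example. The caption itself specifies only the t > 1.96 threshold and the restriction to hypermethylated DVCs.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(cases, controls):
    """Welch t-statistic; positive values mean higher methylation in cases.

    Assumption: the source does not say which t-test variant was used.
    Raises ZeroDivisionError if both groups have zero variance.
    """
    se = sqrt(variance(cases) / len(cases) + variance(controls) / len(controls))
    return (mean(cases) - mean(controls)) / se


def ppv(dvc_ids, beta_cases, beta_controls, threshold=1.96):
    """Fraction of selected DVCs with t > threshold in the independent set."""
    hits = sum(
        welch_t(beta_cases[cpg], beta_controls[cpg]) > threshold
        for cpg in dvc_ids
    )
    return hits / len(dvc_ids)


# Toy validation set: made-up beta values keyed by hypothetical probe IDs.
cases = {
    "cg_a": [0.30, 0.35, 0.40, 0.45],  # clearly hypermethylated in cases
    "cg_b": [0.10, 0.12, 0.11, 0.13],  # same mean as controls
}
controls = {
    "cg_a": [0.10, 0.12, 0.11, 0.13],
    "cg_b": [0.10, 0.13, 0.12, 0.11],
}

# Only cg_a validates, so one of the two selected DVCs counts as a hit.
print(ppv(["cg_a", "cg_b"], cases, controls))
```

Because the test is one-sided on positive t-statistics, a DVC that is hypomethylated in the independent set never counts as validated, which mirrors the caption's restriction to hypermethylation.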