Quality control visualization using GEO datasets#
This example uses the built-in quality control plotting function, plot_sample_deviations, to visualize the distribution of sample deviations from the population mean as a ridge density plot.
Import required classes and functions#
from biolearn.data_library import DataLibrary, GeoData
from biolearn.visualize import plot_sample_deviations
Create a dictionary of dataset display names to GeoData objects#
library = DataLibrary()
dataset_ids = ["GSE112618", "GSE110554", "GSE41169", "GSE52588"]
# Use dataset_id rather than id to avoid shadowing the Python builtin id()
datasets = {dataset_id: library.get(dataset_id).load() for dataset_id in dataset_ids}
Generate a quality control report for each dataset#
[dataset.quality_report().show() for dataset in datasets.values()]
Quality Report Summary
------------------------------------------------
Sample Count: 6
Methylation Sites: 866091
Missing Methylation Data: 0 (0.00%)
Samples With High Deviation: 0 (0.00%)
Methylation Sites With Over 20% of Reads Missing: 0 (0.00%)
Notes:
------------------------------------------------
- No missing data points implies that this data has already gone through an imputation process or that low quality reads were included.
Quality Report Summary
------------------------------------------------
Sample Count: 49
Methylation Sites: 866091
Missing Methylation Data: 0 (0.00%)
Samples With High Deviation: 24 (48.98%)
Methylation Sites With Over 20% of Reads Missing: 0 (0.00%)
Notes:
------------------------------------------------
- No missing data points implies that this data has already gone through an imputation process or that low quality reads were included.
- Your data set includes samples with a high deviation. It is likely that the methylation data for these samples has been distorted due to technical issues.
Quality Report Summary
------------------------------------------------
Sample Count: 95
Methylation Sites: 485577
Missing Methylation Data: 60482 (0.13%)
Samples With High Deviation: 0 (0.00%)
Methylation Sites With Over 20% of Reads Missing: 5 (0.00%)
Notes:
------------------------------------------------
- Your data set includes methylation sites that have over 20% of reads missing. Default imputation may replace the values for all reads from this site with a gold standard.
Quality Report Summary
------------------------------------------------
Sample Count: 87
Methylation Sites: 485577
Missing Methylation Data: 27449 (0.06%)
Samples With High Deviation: 0 (0.00%)
Methylation Sites With Over 20% of Reads Missing: 7 (0.00%)
Notes:
------------------------------------------------
- Your data set includes methylation sites that have over 20% of reads missing. Default imputation may replace the values for all reads from this site with a gold standard.
[None, None, None, None]
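The percentages in the reports above can be checked by hand. The sketch below assumes that the missing-data percentage is taken over every sample-by-site cell (samples × sites), that the high-deviation percentage is taken over the sample count, and that the over-20%-missing percentage is taken over the site count. The reported figures are consistent with those denominators, but the source does not state them. The helper name count_with_percent is hypothetical and is not part of biolearn.

```python
def count_with_percent(count: int, total: int) -> str:
    """Format a count with its share of the total, e.g. '24 (48.98%)'."""
    return f"{count} ({count / total:.2%})"


# Figures copied from the reports above (third report: 95 samples, 485577 sites).
samples, sites = 95, 485577

# Assumed denominator: one cell per (sample, site) pair.
missing_line = count_with_percent(60482, samples * sites)

# Assumed denominator: number of samples (second report: 24 of 49).
deviation_line = count_with_percent(24, 49)

# Assumed denominator: number of methylation sites.
sparse_sites_line = count_with_percent(5, sites)

print("Missing Methylation Data:", missing_line)
print("Samples With High Deviation:", deviation_line)
print("Methylation Sites With Over 20% of Reads Missing:", sparse_sites_line)
```

With two decimal places, 5 flagged sites out of 485577 round to 0.00%, which is why the report shows a nonzero count next to a zero percentage.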
Visualize the distribution of sample deviations from the population mean#
[Figure: plot quality control (ridge density plot of sample deviations for each dataset)]
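To make "deviation from the population mean" concrete, here is a minimal sketch on a toy matrix. It assumes one plausible definition: for each sample, the mean absolute difference between its values and the per-site mean across all samples, skipping missing reads. The source does not say how biolearn computes deviation, so the formula, the toy data, and the function names are all illustrative assumptions.

```python
from statistics import fmean

# Toy methylation matrix (hypothetical data): one row per CpG site,
# one column per sample; None marks a missing read.
matrix = {
    "cg01": [0.10, 0.12, 0.11, 0.60],
    "cg02": [0.80, 0.82, None, 0.30],
    "cg03": [0.50, 0.48, 0.52, 0.90],
}
n_samples = 4


def site_mean(values):
    """Population mean for one site, ignoring missing reads."""
    return fmean(v for v in values if v is not None)


def sample_deviation(matrix, sample_index):
    """Mean absolute difference between one sample and the per-site means."""
    diffs = [
        abs(values[sample_index] - site_mean(values))
        for values in matrix.values()
        if values[sample_index] is not None
    ]
    return fmean(diffs)


deviations = [sample_deviation(matrix, i) for i in range(n_samples)]
outlier_index = deviations.index(max(deviations))
print(deviations, outlier_index)
```

In this toy data the fourth sample differs from the others at every site, so it has the largest deviation. A ridge plot of such per-sample values, one ridge per dataset, is the kind of view the figure above provides.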
Total running time of the script: (0 minutes 16.229 seconds)
Estimated memory usage: 3315 MB