Note
This page is reference documentation. It only explains the function signature, not how to use it. Please refer to the user guide for the big picture.
biolearn.imputation.hybrid_impute
- biolearn.imputation.hybrid_impute(dnam, cpg_source, required_cpgs, threshold=0.8)
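Imputes missing values in a DNA methylation dataset using a per-site threshold. Sites whose fraction of available (non-missing) values falls below the threshold are replaced with values from an external source (cpg_source). The remaining sites have their gaps filled with the average of that site's existing values.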
- Parameters:
dnam (pd.DataFrame) – DataFrame with samples as columns and CpG sites as rows.
cpg_source (pd.Series) – Series containing reference values for CpG sites.
required_cpgs (list of str) – List of CpG sites that need to be in the final dataset.
threshold (float, optional) – Minimum fraction of non-missing values a CpG site needs for its gaps to be filled from its own average instead of being replaced from cpg_source. Default is 0.8.
- Returns:
DataFrame with missing values filled.
- Return type:
pd.DataFrame
- Raises:
ValueError – If any required CpG sites are missing from both the dataset and cpg_source.
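The hybrid strategy described on this page can be sketched in plain pandas. This is a minimal illustration under stated assumptions, not biolearn's implementation. `hybrid_impute_sketch` is a hypothetical name. "Threshold" is read as the fraction of samples with an observed value at a site, and "average" as that site's mean across samples. Every sample is assumed to receive the same `cpg_source` value for a replaced site. Poorly covered sites that are not in `required_cpgs` are simply dropped here, because this page does not say what the library does with them.

```python
import numpy as np
import pandas as pd


def hybrid_impute_sketch(dnam, cpg_source, required_cpgs, threshold=0.8):
    """Hypothetical sketch of the documented strategy (NOT biolearn's code)."""
    # Assumption: coverage = fraction of samples (columns) observed at each CpG (row).
    coverage = dnam.notna().mean(axis=1)

    # Well-covered sites: fill each gap with the mean of that site's observed values.
    kept = dnam.loc[coverage >= threshold]
    kept = kept.apply(lambda row: row.fillna(row.mean()), axis=1)

    # Required sites that are absent or poorly covered must come from cpg_source.
    # Assumption: poorly covered sites that are not required are dropped.
    to_fill = [cpg for cpg in required_cpgs if cpg not in kept.index]
    unavailable = [cpg for cpg in to_fill if cpg not in cpg_source.index]
    if unavailable:
        raise ValueError(f"Required CpG sites missing from cpg_source: {unavailable}")

    # Assumption: every sample receives the same reference value for a replaced site.
    replaced = pd.DataFrame(
        {sample: cpg_source.loc[to_fill] for sample in dnam.columns},
        index=to_fill,
    )
    return pd.concat([kept, replaced]).sort_index()


# Samples as columns, CpG sites as rows, as documented for dnam.
dnam = pd.DataFrame(
    {"s1": [0.1, 0.2, np.nan], "s2": [0.3, np.nan, np.nan]},
    index=["cg01", "cg02", "cg03"],
)
cpg_source = pd.Series({"cg01": 0.5, "cg02": 0.6, "cg03": 0.9})

# With threshold=0.5, cg02 (coverage 0.5) keeps its own mean, 0.2, for s2.
# cg03 (coverage 0.0) takes 0.9 from cpg_source in both samples.
result = hybrid_impute_sketch(
    dnam, cpg_source, ["cg01", "cg02", "cg03"], threshold=0.5
)
print(result)
```

Reading coverage as a fraction keeps `threshold` independent of cohort size: 0.8 means the same thing for 10 samples as for 1,000.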
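Rather than catching the ValueError, a caller might check in advance which required sites can be supplied by neither input. The helper below, `unfillable_required_cpgs`, is hypothetical and not part of biolearn. It assumes that a site only counts as available in `dnam` when its fraction of non-missing values meets `threshold`. Under that assumption it may flag more sites than a check for outright absence from the dataset would.

```python
import numpy as np
import pandas as pd


def unfillable_required_cpgs(dnam, cpg_source, required_cpgs, threshold=0.8):
    """Hypothetical pre-flight check: required CpGs neither input can supply."""
    # Fraction of samples (columns) with an observed value at each CpG site (row).
    coverage = dnam.notna().mean(axis=1)
    # Assumption: a site is usable from dnam only if its coverage meets the threshold.
    usable = set(coverage[coverage >= threshold].index)
    return sorted(
        cpg
        for cpg in required_cpgs
        if cpg not in usable and cpg not in cpg_source.index
    )


# Samples as columns, CpG sites as rows, matching the documented dnam layout.
dnam = pd.DataFrame(
    {"s1": [0.1, np.nan], "s2": [0.3, np.nan]},
    index=["cg01", "cg02"],
)
cpg_source = pd.Series({"cg02": 0.9})

missing = unfillable_required_cpgs(dnam, cpg_source, ["cg01", "cg02", "cg03"])
if missing:
    print(f"hybrid_impute would likely raise ValueError for: {missing}")
```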