Note
This page is reference documentation. It only explains the function signature, not how to use it. Please refer to the user guide for the big picture.
biolearn.imputation.hybrid_impute
- biolearn.imputation.hybrid_impute(dnam, cpg_source, required_cpgs, threshold=0.8)
- Imputes missing values in a DNA methylation dataset based on a threshold. CpG sites whose fraction of non-missing values falls below the threshold are replaced with values from an external source (cpg_source). The remaining sites have their gaps filled with the mean of that site's observed values.
- Parameters:
- dnam (pd.DataFrame) – DataFrame with samples as columns and CpG sites as rows. 
- cpg_source (pd.Series) – Series containing reference values for CpG sites. 
- required_cpgs (list of str) – List of CpG sites to impute. Missing CpG sites are imputed only if they appear in this list. 
- threshold (float, optional) – Minimum fraction of non-missing values a CpG site must have to be imputed from its own mean; sites below it are taken from cpg_source instead. Default is 0.8. 
 
- Returns:
- DataFrame with missing values filled. 
- Return type:
- pd.DataFrame 
- Raises:
- ValueError – If any required CpG site is missing from both the dataset and cpg_source.
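The hybrid strategy described above can be sketched in plain pandas. This is a hypothetical re-implementation for illustration only: `hybrid_impute_sketch` is not part of biolearn, and it assumes that `threshold` is compared against each site's fraction of non-missing samples and that "average" means the per-site (row) mean across samples. The toy data and reference values are made up.

```python
import numpy as np
import pandas as pd


def hybrid_impute_sketch(dnam, cpg_source, required_cpgs, threshold=0.8):
    """Hypothetical illustration of hybrid imputation (not biolearn's code).

    Assumptions: `threshold` is the minimum fraction of non-missing samples a
    CpG site (row) needs to be kept, and the "average" used for filling is the
    mean of that row's observed values.
    """
    # Fraction of samples (columns) that have a value at each CpG site (row).
    coverage = dnam.notna().mean(axis=1)

    # Keep well-covered sites; poorly covered sites are dropped here and may
    # be re-added from cpg_source below if they are required.
    kept = dnam.loc[coverage >= threshold]

    # Fill each kept site's gaps with the mean of its observed values.
    filled = kept.apply(lambda row: row.fillna(row.mean()), axis=1)

    # Required sites that are absent at this point must come from cpg_source.
    needed = [cpg for cpg in required_cpgs if cpg not in filled.index]
    unavailable = [cpg for cpg in needed if cpg not in cpg_source.index]
    if unavailable:
        raise ValueError(
            f"CpG sites missing from both dnam and cpg_source: {unavailable}"
        )

    # Setting with enlargement: each new row gets the same reference value
    # in every sample column.
    for cpg in needed:
        filled.loc[cpg] = cpg_source.loc[cpg]

    return filled.sort_index()


# Toy data: 3 CpG sites (rows) x 3 samples (columns).
dnam = pd.DataFrame(
    {
        "s1": [0.10, np.nan, 0.50],
        "s2": [0.30, np.nan, np.nan],
        "s3": [0.20, 0.90, 0.70],
    },
    index=["cg01", "cg02", "cg03"],
)
cpg_source = pd.Series({"cg02": 0.85, "cg04": 0.40})  # hypothetical reference values
required = ["cg01", "cg02", "cg03", "cg04"]

# Coverage is 1.0 for cg01, 1/3 for cg02 and 2/3 for cg03. With threshold=0.6,
# cg02 comes from cpg_source, cg03's gap is filled with mean(0.50, 0.70), and
# cg04 (absent from dnam) is added from cpg_source.
result = hybrid_impute_sketch(dnam, cpg_source, required, threshold=0.6)
print(result)

# With the default 0.8, cg03 is also routed to cpg_source, which lacks it.
try:
    hybrid_impute_sketch(dnam, cpg_source, required)
except ValueError as err:
    print(err)
```

The split keeps sample-specific information for sites that most samples measured, while sparsely measured sites fall back to a single reference value shared by every sample, since a mean over very few observations would be unreliable.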