Haploid imputation of parental haplotypes
Rationale
In this section, we start from the haploid-based imputed data and recode it into a paternal haplotype, a maternal haplotype and a differential haplotype using the parent-of-origin (PofO) predictions.
Pipeline
This step relies on a single script, located at pipeline/step5_encode_parental_haplotypes/src/encode.py.
It takes the following arguments:
| Option name | Argument | Default | Description |
|---|---|---|---|
| -i, --input_vcf | STRING | NA | Haploid-based imputed data in vcf.gz format |
| -p, --prob_file | STRING | NA | Parent-of-origin assignment probability file (see step 5 of the tutorial) |
| -o, --output | STRING | NA | Output file prefix |
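If you want to call encode.py directly or wrap it in your own tooling, the options table can be mirrored by a small command-line parser. The sketch below is not taken from encode.py. It assumes Python's standard argparse, and it reads the "NA" default as "no default", so every option is marked as required.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options of encode.py."""
    parser = argparse.ArgumentParser(
        description="Encode paternal, maternal and differential haplotypes."
    )
    # Default "NA" in the table is interpreted here as: the option is mandatory.
    parser.add_argument("-i", "--input_vcf", required=True,
                        help="Haploid-based imputed data in vcf.gz format")
    parser.add_argument("-p", "--prob_file", required=True,
                        help="Parent-of-origin assignment probability file")
    parser.add_argument("-o", "--output", required=True,
                        help="Output file prefix")
    return parser
```

Parsing `["-i", "imputed.vcf.gz", "-p", "PofO_probability.txt", "-o", "out"]` with this parser yields `args.input_vcf`, `args.prob_file` and `args.output`. The file name imputed.vcf.gz is a placeholder.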
The script uses the PofO assignment probabilities produced in step 5 of the tutorial (specifically, the file PofO_probability.txt) to assign alleles to the maternal and paternal haplotypes. It also encodes the differential haplotype. For this, it uses heterozygous sites only and encodes a site as 1 if the allele is paternally inherited and as 0 if it is maternally inherited. All homozygous sites are set to missing.
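The encoding rules can be illustrated for a single sample at a single biallelic site. This is a hypothetical sketch, not the logic of encode.py. It assumes the following:

- a phased genotype h1|h2 coded 0 (REF) / 1 (ALT);
- a probability `p_h1_paternal` that the first haplotype is paternal (the actual layout of PofO_probability.txt is not described here);
- a hard 0.5 threshold for the assignment;
- "the allele" in the differential coding refers to the ALT allele.

```python
from typing import Optional, Tuple


def split_parental(h1: int, h2: int, p_h1_paternal: float) -> Tuple[int, int]:
    """Return (paternal_allele, maternal_allele) for one phased genotype.

    Assumption: p_h1_paternal is the probability that haplotype 1 is
    paternally inherited, and a 0.5 threshold decides the assignment.
    """
    if p_h1_paternal >= 0.5:
        return h1, h2
    return h2, h1


def differential(h1: int, h2: int, p_h1_paternal: float) -> Optional[int]:
    """Differential haplotype code for one biallelic site.

    Heterozygous: 1 if the ALT allele (1) is paternal, 0 if it is maternal.
    Homozygous: None, i.e. missing.
    """
    if h1 == h2:
        return None
    paternal, _maternal = split_parental(h1, h2, p_h1_paternal)
    return 1 if paternal == 1 else 0


# Phased genotype 0|1, haplotype 1 likely paternal -> ALT is maternal
print(split_parental(0, 1, 0.9))  # (0, 1)
print(differential(0, 1, 0.9))    # 0
# Same genotype, haplotype 1 likely maternal -> ALT is paternal
print(differential(0, 1, 0.2))    # 1
# Homozygous sites carry no parent-of-origin information
print(differential(1, 1, 0.9))    # None
```

The hard threshold is the simplest possible choice. A probability-weighted (dosage-like) encoding would be an equally plausible alternative, and this sketch does not claim which one the pipeline uses.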
To run this step, adapt the input files to your data, then run the following command:
bash step0_encode_haplotypes.sh
The command above produces three output files in .vcf.gz format:
${output}.paternal_haplotype.vcf.gz
${output}.maternal_haplotype.vcf.gz
${output}.differential_haplotype.vcf.gz
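As a quick post-run check, the expected file names can be derived from the output prefix. The helpers `expected_outputs` and `missing_outputs` below are hypothetical. They only assume that the prefix passed with -o/--output is used verbatim, as in the names listed above.

```python
from pathlib import Path
from typing import Dict, List


def expected_outputs(prefix: str) -> Dict[str, Path]:
    """Paths of the three files produced for a given output prefix."""
    return {
        kind: Path(f"{prefix}.{kind}_haplotype.vcf.gz")
        for kind in ("paternal", "maternal", "differential")
    }


def missing_outputs(prefix: str) -> List[Path]:
    """Return the expected output files that do not exist on disk."""
    return [path for path in expected_outputs(prefix).values() if not path.exists()]
```

For example, after a successful run with the hypothetical prefix results/cohort, `missing_outputs("results/cohort")` should return an empty list.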