haploid imputation of parental haplotypes

Table of contents

  1. haploid imputation of parental haplotypes
    1. Rationale
    2. Pipeline

Rationale

In this section, we start from the haploid-based imputed data and recode it into a paternal haplotype, a maternal haplotype, and a differential haplotype using the parent-of-origin (PofO) predictions.


Pipeline

This step relies on a single script, which you can find at pipeline/step5_encode_parental_haplotypes/src/encode.py.

It takes as arguments:

| Option name | Argument | Default | Description |
|---|---|---|---|
| -i [--input_vcf] | STRING | NA | Haploid-based imputed data in vcf.gz format |
| -p --prob_file | STRING | NA | Parent-of-origin assignment probability file (see step 5 of the tutorial) |
| -o --output | STRING | NA | Output file prefix |
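As an illustration of how the three options fit together, here is a minimal sketch that assembles a direct call to the script. The flags and the script path come from this page; the `python` interpreter name, the helper `build_encode_command`, and the example file names `imputed.vcf.gz` and `my_cohort` are assumptions made for the example, not part of the pipeline.

```python
from typing import List

# Script location as given on this page.
SCRIPT = "pipeline/step5_encode_parental_haplotypes/src/encode.py"


def build_encode_command(input_vcf: str, prob_file: str, output_prefix: str) -> List[str]:
    """Assemble an argument list for encode.py (hypothetical helper)."""
    return [
        "python",  # assumption: the script is run with the Python interpreter
        SCRIPT,
        "--input_vcf", input_vcf,    # haploid-based imputed data (vcf.gz)
        "--prob_file", prob_file,    # PofO assignment probability file
        "--output", output_prefix,   # prefix for the output files
    ]


# Example file names are placeholders; adapt them to your data.
cmd = build_encode_command("imputed.vcf.gz", "PofO_probability.txt", "my_cohort")
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the paths point to real files.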

This script uses the PofO assignment probabilities produced in step 5 of the tutorial (specifically, the file PofO_probability.txt) to assign alleles to the maternal and paternal haplotypes. It also encodes the differential haplotype: using only heterozygous sites, it codes the data as 1 if the allele is paternally inherited and 0 if it is maternally inherited. All homozygous sites are set to missing.

To do this, run the following command after adapting the input files to your data:

bash step0_encode_haplotypes.sh

The above command will produce three output files in .vcf.gz format:

  • ${output}.paternal_haplotype.vcf.gz
  • ${output}.maternal_haplotype.vcf.gz
  • ${output}.differential_haplotype.vcf.gz
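The encoding rule described above can be sketched for a single phased site as follows. This is not the code of encode.py, only a minimal illustration under stated assumptions: alleles are coded 0 (reference) and 1 (alternate), "the allele" tracked by the differential haplotype is taken to be the alternate allele, and `hap0_is_paternal` is a hypothetical flag standing in for the decision derived from the PofO assignment probability.

```python
from typing import Optional, Tuple


def encode_site(hap0: int, hap1: int, hap0_is_paternal: bool) -> Tuple[int, int, Optional[int]]:
    """Return (paternal_allele, maternal_allele, differential) for one phased site.

    hap0, hap1: alleles on the two phased haplotypes (0 = ref, 1 = alt).
    hap0_is_paternal: hypothetical flag saying which haplotype is paternal.
    """
    # Assign each allele to its parental haplotype.
    if hap0_is_paternal:
        paternal, maternal = hap0, hap1
    else:
        paternal, maternal = hap1, hap0

    if paternal == maternal:
        # Homozygous site: set to missing in the differential haplotype.
        differential = None
    else:
        # Heterozygous site: 1 if the alternate allele is paternally
        # inherited, 0 if it is maternally inherited (assumption: the
        # tracked allele is the alternate one).
        differential = 1 if paternal == 1 else 0

    return paternal, maternal, differential


# A heterozygous site whose alternate allele sits on the paternal haplotype.
print(encode_site(1, 0, hap0_is_paternal=True))
# A homozygous site, which is missing in the differential haplotype.
print(encode_site(1, 1, hap0_is_paternal=True))
```

In this sketch the three returned values correspond to the entries written to the paternal, maternal, and differential output files, with `None` standing for a missing genotype.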



Copyright © 2022-2025 Robin Hofmeister, Theo Cavinato and Olivier Delaneau | All Rights Reserved | THORIN executables and source code are distributed under the MIT license.