Sequence Read Archive (SRA) files were converted to paired-end FASTQ format with the fastq-dump command of the SRA Toolkit (version 2.9.0, https://www.ncbi.nlm.nih.gov/sra/docs/toolkitsoft/), and the quality of each sample was assessed with FastQC (version 0.11.6, http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Adapters and low-quality reads were filtered using Trimmomatic (version 0.36)17. The clean reads were aligned to the equine reference genome (EquCab2) using the Burrows-Wheeler Aligner (version 0.7.17-r1188)18 and converted to binary (BAM) format with SAMtools (version 1.7)19. Picard (version 2.17.11, https://broadinstitute.github.io/picard/) was used to remove PCR duplicates. To remove systematic bias, base quality score recalibration was then performed according to the recommended workflow of the Genome Analysis Toolkit (GATK, version 3.8.0)20.
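The preprocessing steps described above can be sketched as an ordered set of command lines. This is a minimal illustration, not the exact commands used in this study: the file names, jar locations, read-group string, known-sites file and all Trimmomatic step parameters (ILLUMINACLIP, SLIDINGWINDOW, MINLEN) are assumptions, since the text does not report them. Index-building steps are omitted.

```python
from typing import Dict, List


def preprocessing_commands(
    sample: str,
    ref: str = "EquCab2.fa",               # hypothetical path to the indexed reference
    known_sites: str = "known_sites.vcf",  # hypothetical known-variant set for BQSR
) -> Dict[str, List[str]]:
    """Return one argv list per step, in execution order, for one paired-end run."""
    r1, r2 = f"{sample}_1.fastq", f"{sample}_2.fastq"
    p1, p2 = f"{sample}_1.paired.fq", f"{sample}_2.paired.fq"
    u1, u2 = f"{sample}_1.unpaired.fq", f"{sample}_2.unpaired.fq"
    sam = f"{sample}.sam"
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"
    table = f"{sample}.recal.table"
    # GATK requires read groups; this tag layout is an assumption.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    gatk = ["java", "-jar", "GenomeAnalysisTK.jar"]  # GATK 3.x invocation style

    return {
        # SRA accession -> two FASTQ files, one per mate
        "fastq_dump": ["fastq-dump", "--split-files", sample],
        # Per-sample quality report
        "fastqc": ["fastqc", r1, r2],
        # Adapter and quality trimming (step parameters are illustrative)
        "trimmomatic": [
            "java", "-jar", "trimmomatic-0.36.jar", "PE",
            r1, r2, p1, u1, p2, u2,
            "ILLUMINACLIP:TruSeq3-PE.fa:2:30:10", "SLIDINGWINDOW:4:20", "MINLEN:36",
        ],
        # Alignment of the surviving read pairs
        "bwa_mem": ["bwa", "mem", "-R", read_group, "-o", sam, ref, p1, p2],
        # SAM -> coordinate-sorted BAM
        "samtools_sort": ["samtools", "sort", "-o", sorted_bam, sam],
        # PCR duplicate removal
        "mark_duplicates": [
            "java", "-jar", "picard.jar", "MarkDuplicates",
            f"I={sorted_bam}", f"O={dedup_bam}", f"M={sample}.dup_metrics.txt",
            "REMOVE_DUPLICATES=true",
        ],
        # Base quality score recalibration: build the model, then apply it
        "base_recalibrator": gatk + [
            "-T", "BaseRecalibrator", "-R", ref, "-I", dedup_bam,
            "-knownSites", known_sites, "-o", table,
        ],
        "print_reads": gatk + [
            "-T", "PrintReads", "-R", ref, "-I", dedup_bam,
            "-BQSR", table, "-o", f"{sample}.recal.bam",
        ],
    }
```

Each list can be passed to `subprocess.run(cmd, check=True)` in dictionary order; building the commands separately from running them makes the pipeline easy to log and inspect per sample.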
Variant discovery was performed with “HaplotypeCaller” to detect SNPs and insertion-deletions (Indels), which were written to a VCF file. SNPs and Indels were then separated with the “SelectVariants” tool in GATK. To identify high-quality SNPs and Indels, we applied hard-filtering criteria to SNPs and Indels, respectively: (1) “QUAL < 40.0 || QD < 2.0 || MQ 60.0 || MQRankSum < -12.50 || ReadPosRankSum < -8.0”, (2) “QUAL < 40.0 || QD 200.0 || ReadPosRankSum < -20.0”. All filtered SNPs and Indels were annotated using SnpEff (version 4.3.1t-0, 2018-03-27).
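The logic of an OR-ed hard filter can be illustrated with a small sketch: a variant is flagged as soon as any single annotation falls below its threshold. The rule tables below use only the thresholds whose comparison is fully legible in the expressions above; the MQ term of the SNP filter and the QD term of the Indel filter are left out rather than guessed. Treating a missing annotation as "does not fail" is an assumption of this sketch, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Rule = Tuple[str, float]  # (annotation name, threshold); fail when value < threshold

SNP_RULES: List[Rule] = [
    ("QUAL", 40.0),
    ("QD", 2.0),
    ("MQRankSum", -12.5),
    ("ReadPosRankSum", -8.0),
]
INDEL_RULES: List[Rule] = [
    ("QUAL", 40.0),
    ("ReadPosRankSum", -20.0),
]


def failed_rules(annotations: Dict[str, float], rules: List[Rule]) -> List[str]:
    """Return the annotations that trip the filter, in rule order."""
    failed = []
    for name, threshold in rules:
        value = annotations.get(name)
        # Assumption: an absent annotation cannot fail its rule.
        if value is not None and value < threshold:
            failed.append(name)
    return failed


def passes(annotations: Dict[str, float], rules: List[Rule]) -> bool:
    """A variant passes only if no rule fires (the rules are OR-ed)."""
    return not failed_rules(annotations, rules)


if __name__ == "__main__":
    snp = {"QUAL": 35.0, "QD": 1.5, "MQRankSum": 0.2, "ReadPosRankSum": -1.0}
    print(failed_rules(snp, SNP_RULES))
```

Returning the list of failed annotations, rather than a bare boolean, makes it easy to tabulate which criterion removes the most variants.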
To better resolve the genetic relationships among all individuals, a phylogenetic tree was constructed using VCF-kit (version 0.1.6), and FigTree (version 1.4.3) was used to visualize the resulting phylogenetic tree.
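As a rough sketch of the tree-building step, VCF-kit's `vk phylo tree` subcommand can be called on the filtered VCF and its standard output, which is expected to be a Newick tree, saved to a file that FigTree can open. The text does not state which tree method was used, so the neighbour-joining default below is an assumption, as are the helper names and file paths.

```python
import subprocess
from typing import List


def vcfkit_tree_command(vcf_path: str, method: str = "nj") -> List[str]:
    """Build the VCF-kit command line; method is 'nj' or 'upgma'."""
    if method not in ("nj", "upgma"):
        raise ValueError(f"unsupported tree method: {method}")
    return ["vk", "phylo", "tree", method, vcf_path]


def write_newick(vcf_path: str, out_path: str, method: str = "nj") -> None:
    """Run VCF-kit and redirect the tree it prints into out_path."""
    with open(out_path, "w") as handle:
        subprocess.run(
            vcfkit_tree_command(vcf_path, method),
            stdout=handle,
            check=True,  # raise if vk exits with a non-zero status
        )
```

For example, `write_newick("snps.filtered.vcf", "snps.nwk")` would produce a file (hypothetical name) that can be loaded directly into FigTree.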