Characteristic | 1-missingness | Mean (SD) |
---|---|---|
Sample Volume (uL) | 100% | 60.15 (3.80) |
DNA mass (ng) - calculated | 99.6% | 13693.10 (10186.93) |
DNA concentration (ng/uL) | 99.6% | 228.58 (170.61) |
DNA purity (260/280) | 17.5% | 1.72 (0.10) |
Notes:
Total number of regions 712,265
2,572,251,922 (84%)
HG38 Total non-N length = 3,074,968,030
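The 84% figure can be checked by dividing the covered bases by the non-N length. This is a minimal sketch that assumes 2,572,251,922 is the number of bases spanned by the 712,265 regions; the variable names are illustrative only.

```python
# Values copied from the notes above.
covered_bases = 2_572_251_922  # assumed: bases spanned by the 712,265 regions
non_n_length = 3_074_968_030   # HG38 total non-N length

# Fraction of the non-N genome that is covered.
fraction = covered_bases / non_n_length
print(f"{fraction:.1%}")
```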
Samples | Retrieved | Submitted | Intersection |
---|---|---|---|
N | 2,357 | 2,069 | 2,036 |
Setdiff | 321 | 33 | 0 |
Gender | 1971/65 | | |
Singletons | 134 | 134 (46/88) | |
Twin pairs | 951 | 951 (401/550) | |
Triplets | 0 | -- | |
Trios | 159 (with DNA) | 161 (41/120) | |
Parents | 175 (with DNA) | | |
Missing from DB | 106 | | |
Missing Annot | 40 | | |
Total = Singletons + 2xTwins + Parents = 2211
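The total can be reproduced from the Retrieved column of the table. A small worked check, assuming the counts are the singleton, twin-pair and parent rows as listed:

```python
# Counts taken from the Retrieved column of the sample table.
singletons = 134
twin_pairs = 951
parents = 175  # parents with DNA

# Each twin pair contributes two individuals.
total = singletons + 2 * twin_pairs + parents
print(total)  # 2211
```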
Omics | N |
---|---|
PainExomes | 272 |
GOT2DExomes | 100 |
UK10K | 861 |
EB_Fat | 545 |
EB_Skin | 516 |
EB_LCL | 586 |
EB_WB | 298 |
Fat_450K | 449 |
10 Aug 2016
~1.03 in Montgomery 2013 GR
ratio of transition (Ti) to transversion (Tv) ~ 2.0-2.1
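As an illustration of how the Ti/Tv ratio is obtained, the sketch below classifies single-nucleotide substitutions and divides the counts. The function names and the `(ref, alt)` input format are assumptions made for this example; they are not taken from the tools used here.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    pair = {ref.upper(), alt.upper()}
    return len(pair) == 2 and (pair <= PURINES or pair <= PYRIMIDINES)


def ti_tv_ratio(snvs):
    """Ti/Tv over an iterable of (ref, alt) single-base substitutions."""
    ti = tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        # Skip anything that is not a simple A/C/G/T substitution.
        if ref == alt or ref not in "ACGT" or alt not in "ACGT" or len(ref) != 1 or len(alt) != 1:
            continue
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return ti / tv


# Toy input: two transitions (A>G, C>T) and one transversion (A>C).
print(ti_tv_ratio([("A", "G"), ("C", "T"), ("A", "C")]))  # 2.0
```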
number of concordant sites (that is, for the sites that share the same locus as a variant in the comp track, those that have the same alternate allele) / total
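The concordance definition above can be sketched as follows. This assumes that "total" means the number of eval sites sharing a locus with the comp track, and it uses a hypothetical dictionary representation keyed by `(chrom, pos)`.

```python
def concordance_rate(eval_sites, comp_sites):
    """Fraction of shared loci where eval and comp have the same alternate allele.

    Both arguments map (chrom, pos) -> alternate allele.
    """
    shared = [locus for locus in eval_sites if locus in comp_sites]
    if not shared:
        raise ValueError("no loci shared between eval and comp")
    concordant = sum(1 for locus in shared if eval_sites[locus] == comp_sites[locus])
    return concordant / len(shared)


# Toy data: two shared loci, one with a matching alternate allele.
eval_sites = {("20", 100): "A", ("20", 200): "T", ("20", 300): "G"}
comp_sites = {("20", 100): "A", ("20", 200): "C", ("20", 400): "G"}
print(concordance_rate(eval_sites, comp_sites))  # 0.5
```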
dbSNP
FDR=0.08
FP=14,458
TPhat=168,759
Total=183,804
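The reported FDR is consistent with dividing FP by Total. A minimal check, assuming that is how the 0.08 was derived:

```python
# Values copied from the notes above.
false_positives = 14_458
total_calls = 183_804

# Assumed definition: FDR = FP / Total.
fdr = false_positives / total_calls
print(round(fdr, 2))  # 0.08
```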
#MultiSNP / #SNP=0.01
Multiallelic variants
knownSNPsPartial: the number of loci at which at least one allele in eval was found in the known comparison file
knownSNPsComplete: the number of loci at which all alleles in eval were also found in the known comparison file
SNPNoveltyRate: the sum of knownSNPsPartial and knownSNPsComplete divided by nMultiSNPs
Filter Chr20: vcftools
Merge VCF: GATK CombineVariants/VCFmerge
WG HLI HG38 VCF
Convert to plink: vcftools, GATK
flip +ve strand: plink
Annotate VCF: ANNOVAR
Filter highpass regions: Bedtools
gen file
matrixeqtl file
phasing: shapeit2
GenotypeConcordance: GATK
multiallelic test
Exon HG19 vcf
Coordinate change HG38: liftOver/CrossMap
convert files: qctool / gtools
plink file
phased plink file
SUM(I(PI_HAT<0.9)) = 22
SUM(I(PI_HAT<0.9)) = 5
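The notation SUM(I(PI_HAT<0.9)) is an indicator sum: it counts the comparisons whose PI_HAT falls below 0.9. A minimal sketch with made-up values (the function name and the example list are illustrative, not the study data):

```python
def count_below(pi_hat_values, threshold=0.9):
    """Count values strictly below the threshold, i.e. SUM(I(PI_HAT < threshold))."""
    return sum(1 for value in pi_hat_values if value < threshold)


# Hypothetical PI_HAT values for sample-to-sample comparisons.
example = [1.0, 0.95, 0.5, 0.0]
print(count_below(example))  # 2
```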
Individual is unrelated to any sample, including its genotyped image
Discordant individuals: 5201, 5202, 50472, 59022, 92521
Sample swap within twin pairs:
HLI 5202 is actually SANGER 5201
HLI 5201 is actually SANGER 5202
Sample swaps in unpaired twins:
HLI 50472 is actually SANGER 50471
HLI 59021 is actually SANGER 59022
Individuals with low relatedness with SANGER sample:
HLI 92521 is matched to SANGER 92521 but with lower relatedness.
Unpaired twins in either SANGER or HLI:
SANGER 59021
HLI 50471
After removal of 16 individuals with |F|>.1
Before filtering SNPs with MAF<0.05 and HWE<1E-6
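To make the SNP filter concrete, the sketch below computes MAF from genotype counts and a Hardy-Weinberg p-value. It uses a 1-df chi-square approximation as a stand-in for the HWE test; dedicated tools such as plink typically use an exact test instead, so p-values will differ. All function names and the genotype-count inputs are assumptions for illustration.

```python
import math


def maf(n_aa, n_ab, n_bb):
    """Minor allele frequency from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    return min(p, 1 - p)


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) approximation to the HWE test p-value."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    if any(e == 0 for e in expected):
        return 1.0  # monomorphic site: no evidence against HWE
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square with 1 df.
    return math.erfc(math.sqrt(chi2 / 2))


def passes_qc(n_aa, n_ab, n_bb, maf_min=0.05, hwe_min=1e-6):
    """Keep a SNP only if MAF >= maf_min and HWE p-value >= hwe_min."""
    return (maf(n_aa, n_ab, n_bb) >= maf_min
            and hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= hwe_min)


print(passes_qc(25, 50, 25))  # True: common variant in perfect HWE
print(passes_qc(98, 2, 0))    # False: MAF is 0.01
print(passes_qc(50, 0, 50))   # False: no heterozygotes, strong HWE violation
```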
[alvesa@athena HLI_HG38]$ ls | grep 6052
60521.vcf.gz
60521.vcf.gz.tbi
Row | Client.Subject.ID | Gender | Ethnicity | Birth.Date | FamilyID | Relation | Zygosity | Stool.Sample | HLI.genome.ID |
---|---|---|---|---|---|---|---|---|---|
1731 | 60522 | F | White | 10/4/35 | 762 | Twin | MZ | TRUE | 176500025 |