Genome-wide association studies have linked millions of genetic variants to biomedical phenotypes, but their utility has been limited by a lack of mechanistic understanding and by widespread epistatic interactions. Recently, Transformer models have emerged as a powerful machine learning architecture with potential to address these and other challenges. Accordingly, here we introduce the Genotype-to-Phenotype Transformer (G2PT), a framework for modeling hierarchical information flow among variants, genes, multigenic systems, and phenotypes. As a proof of concept, we use G2PT to model the genetics of TG/HDL (triglycerides to high-density lipoprotein cholesterol), an indicator of metabolic health. G2PT predicts this trait via attention to 1,395 variants underlying at least 20 systems, including immune response and cholesterol transport, with accuracy exceeding the state of the art. It implicates 40 epistatic interactions, including epistasis between
The Cancer Genome Atlas (TCGA) sequencing analysis of head and neck squamous cell carcinoma (HNSCC) recently reported on gene fusions; however, few human papillomavirus (HPV)-positive samples were included, and the functional relevance of the identified fusions was not explored. We therefore performed an independent analysis of gene fusions in HPV-positive oropharyngeal SCC (OPSCC). RNA sequencing was performed on 47 HPV-positive OPSCC primary tumors and 25 normal mucosal samples from cancer-unaffected controls on an Illumina TruSeq platform. MapSplice2 was used for alignment and identification of fusion candidates. Putative fusions that had fewer than five spanning reads, were detected in normal tissues, or mapped to the same gene were filtered out. Selected fusions were validated by RT-PCR and Sanger sequencing. Within the 47 HPV-positive OPSCC tumors, 282 gene fusions were identified. Most fusions (85.1%) occurred in a single tumor, and the remaining fusions recurred in 2-16 tumors. Gene fusions were associated with significant upregulation of 16 genes (including EGFR and ERBB4) and downregulation of four genes (PTPRT, ZNF750, DLG2, SLCO5A1). Expression of these genes followed similar patterns of upregulation and downregulation in tumors without these fusions compared to normal tissue. Five of six gene fusions selected for validation were confirmed through RT-PCR and sequencing. This integrative analysis provides a method of prioritizing functionally relevant gene fusions that may be expanded to other tumor types. These results demonstrate that gene fusions may be one mechanism by which functionally relevant genes are altered in HPV-positive OPSCC.