Methods
Search for causal genes of Mendelian stroke (gene list of the panel)
To construct the panel, a comprehensive search for causal mutations and genes of Mendelian stroke was performed in November 2017 in the Online Mendelian Inheritance in Man (OMIM), Human Phenotype Ontology (HPO) and Human Gene Mutation Database (HGMD) Professional databases and in PubMed. Keywords from the following three categories were combined using the Boolean operator AND: category of hereditary disease (including “monogenic”, “Mendelian”, “single-gene”, “disorder”, and “disease”); genetic mutations (including “pathogenic mutation”, “base pair mismatch”, “DNA repeat expansion”, “trinucleotide repeat expansion”, “frameshift mutation”, “gain of function mutation”, “gene amplification”, “gene duplication”, “genomic instability”, “microsatellite instability”, “germ-line mutation”, “in/del mutation”, “loss of function mutation”, “mutagenesis, insertional”, “mutation, missense”, “point mutation”, “sequence deletion”, “gene deletion”, “sequence inversion”, “suppression, genetic”, and “synthetic lethal mutations”); and cerebrovascular disorders (including “stroke”, “cerebrovascular disease”, “ischemic stroke”, “brain infarction”, “transient ischemic attack”, “TIA”, “intracerebral hemorrhage”, “subarachnoid hemorrhage”, “aneurysm”, “moyamoya disease”, “moyamoya syndrome”, “artery dissection”, “arterial-venous malformation”, and “systematic embolic”) (figure 1).
Figure 1 Screening genes of the panel. HGMD, Human Gene Mutation Database; HPO, Human Phenotype Ontology; OMIM, Online Mendelian Inheritance in Man.
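As an illustration only, the keyword combination described above can be sketched in Python. The text states that the three keyword categories were combined with AND; joining the terms within a category with OR is an assumption here, and the function name `build_query` and the shortened term lists are hypothetical.

```python
def build_query(groups):
    """OR the terms inside each group (assumed), then AND the groups together."""
    clauses = []
    for terms in groups:
        quoted = " OR ".join(f'"{term}"' for term in terms)
        clauses.append(f"({quoted})")
    return " AND ".join(clauses)


# Shortened, illustrative subsets of the three keyword categories in the text.
disease_category = ["monogenic", "Mendelian", "single-gene"]
mutation_terms = ["pathogenic mutation", "frameshift mutation"]
cerebrovascular_terms = ["stroke", "moyamoya disease"]

query = build_query([disease_category, mutation_terms, cerebrovascular_terms])
print(query)
```

The exact query syntax accepted by each database differs, so a string of this form would need adapting before use in any particular search interface.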
Three authors carried out the manual search described above. Search results (database entries or publications) were included if they (1) reported a causal mutation for cerebrovascular disease, (2) described the detailed phenotype of the patients and (3) contained functional verification or prediction for the mutations. Entries or publications that used non-human materials or did not report a clear cause–effect relationship between mutations and phenotypes were excluded. Any disagreement on inclusion was reviewed by a senior neurologist and resolved by consensus among senior neurologists.
Afterwards, two authors independently extracted the genes and mutations from the qualifying search results (category of hereditary disease; genetic mutations; cerebrovascular disorders) and publications using a standardised form (online supplementary table s1). A senior expert then reviewed the two lists of genes and mutations and resolved disagreements according to the agreed inclusion criteria.
The panel also contained genes and genetic variants associated with stroke risk factors or susceptibility, some genetic variants related to the metabolism of drugs used in stroke therapy, and other genes designated by the expert group (figure 1). These genes were added for research purposes and will be discussed in other studies.
Construction of the panel
Based on the resulting list of genes, a SureSelect Target-Enrichment panel was designed using the online tool SureDesign (https://earray.chem.agilent.com/suredesign, Agilent Technologies, Santa Clara, California, USA). The panel mainly covered the coding exons of the genes and also covered some genetic variants associated with drug metabolism capacity. The panel was designed using the default parameter settings of SureDesign.
DNA preparation and NGS
For each participant, DNA was isolated from peripheral leukocytes using a DNA Isolation Kit (Bioteke, AU1802, Beijing, China). DNA libraries were prepared using the KAPA Library Preparation Kit (Kapa Biosystems, KR0453, Wilmington, Massachusetts, USA) following the manufacturer’s instructions. Target fragments were captured using the designed panel. Paired-end reads (150 bp) were generated on a HiSeq X10 or NovaSeq platform (Illumina, San Diego, California, USA).
Bioinformatics analysis
Trimmomatic (V.0.36) was applied to remove adapters and low-quality reads.10 Qualified reads were then aligned to the human reference genome (hg19, downloaded from the University of California, Santa Cruz (UCSC) Genome Browser database, http://genome.ucsc.edu/) using the Burrows-Wheeler Alignment tool.11 Genetic variants were called using the joint calling function of the Genome Analysis Toolkit (GATK, V.4.0.12.0) following best practice guidance.12–14 A hard filter (depth ≥9, genotype quality score ≥15) was applied for quality control of the variants. Genetic variants with an allele frequency <1% in the 1000 Genomes Project, the Genome Aggregation Database (gnomAD) and the Exome Sequencing Project V.6500 (esp6500) were further annotated using InterVar (Clinical Interpretation of Genetic Variants by the 2015 American College of Medical Genetics and Genomics (ACMG)-Association for Molecular Pathology (AMP) Guidelines) and dbscSNV.15 16
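The hard filter and the allele frequency cut-off can be sketched as follows. This is a minimal illustration and not the pipeline used in the study: the `Variant` record, the database keys and the helper names are hypothetical. It also assumes that a variant must be rare (<1%) in all three databases and that a variant absent from a database is treated as having a frequency of 0.

```python
from dataclasses import dataclass, field

MIN_DEPTH = 9   # depth >= 9 (from the text)
MIN_GQ = 15     # genotype quality score >= 15 (from the text)
MAX_AF = 0.01   # allele frequency < 1% (from the text)


@dataclass
class Variant:
    """Hypothetical, simplified variant record for illustration."""
    name: str
    depth: int
    gq: int
    # Population allele frequencies keyed by database name.
    # Assumption: a missing key means the variant was not observed there.
    pop_af: dict = field(default_factory=dict)


def passes_hard_filter(v):
    return v.depth >= MIN_DEPTH and v.gq >= MIN_GQ


def is_rare(v, databases=("1000g", "gnomad", "esp6500")):
    # Assumption: the variant must be below the cut-off in every database.
    return all(v.pop_af.get(db, 0.0) < MAX_AF for db in databases)


def select_for_annotation(variants):
    """Keep variants that pass quality control and are rare."""
    return [v for v in variants if passes_hard_filter(v) and is_rare(v)]


variants = [
    Variant("rare_good", depth=30, gq=99, pop_af={"gnomad": 0.0002}),
    Variant("low_depth", depth=5, gq=99),
    Variant("common", depth=40, gq=99, pop_af={"1000g": 0.12}),
]
kept = select_for_annotation(variants)
print([v.name for v in kept])
```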
Variant classification
Candidate variants were assessed for pathogenicity based on the ACMG guidelines.17 The specific criteria were as follows: (1) whether the variant had previously been reported in a functional or family segregation study; (2) the type of variant (eg, nonsense, frameshift or splicing mutation); (3) variant frequency in the ExAC, gnomAD and 1000 Genomes Project databases; (4) conservation of the altered residue and (5) family segregation and de novo status. Based on this information, each variant was assigned to one of the following categories: pathogenic, likely pathogenic, variant of uncertain significance (VUS), likely benign or benign.
When a patient’s sample was first submitted for testing, the variants were interpreted for pathogenicity according to the databases available at that time. Because the databases and the frequency data for different ethnic groups are continuously updated, these variants were reinterpreted using the latest database versions. All interpretation was performed in strict accordance with the ACMG guidelines.17
Informed consent and clinical diagnoses
To explore the effectiveness of the panel, patients who were highly suspected of having a Mendelian stroke were recruited. Informed consent was obtained from patients between April 2018 and February 2019. Detailed clinical features, cerebrovascular risk factors and family history were collected by neurologists. The phenotype-based diagnostic criteria for Mendelian stroke, adapted with some revisions from a published algorithm,2 were as follows: (1) a stroke or transient ischaemic attack of unknown aetiology; (2) the presence of <3 conventional vascular risk factors (such as hypertension, hypercholesterolaemia, diabetes mellitus, hyperhomocysteinaemia, obesity, hyperuricaemia, atrial fibrillation and smoking), young age at onset (≤55 years), positive family history or specific clinical features of Mendelian stroke (such as angiokeratoma or O’Sullivan sign); (3) age at onset >55 years, positive family history or high suspicion of Mendelian stroke by experienced neurologists. Patients who met (1) and any two items of (2) or any two items of (3) were considered highly suspected of having a Mendelian stroke and were enrolled in this study.
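One possible reading of the enrolment rule can be expressed as a short Python sketch. The interpretation used here is an assumption: criterion (1) is mandatory, and it must be accompanied by at least two items from group (2) or at least two items from group (3). The `Patient` fields and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical, simplified patient record for illustration."""
    cryptogenic_stroke_or_tia: bool      # criterion (1)
    n_conventional_risk_factors: int
    age_at_onset: int
    positive_family_history: bool
    specific_clinical_features: bool
    suspected_by_neurologist: bool


def is_highly_suspected(p):
    """Assumed reading: (1) AND (any two of (2) OR any two of (3))."""
    if not p.cryptogenic_stroke_or_tia:
        return False
    group2 = [
        p.n_conventional_risk_factors < 3,
        p.age_at_onset <= 55,
        p.positive_family_history,
        p.specific_clinical_features,
    ]
    group3 = [
        p.age_at_onset > 55,
        p.positive_family_history,
        p.suspected_by_neurologist,
    ]
    # sum() over booleans counts how many items in each group are met.
    return sum(group2) >= 2 or sum(group3) >= 2


young = Patient(True, 1, 40, False, False, False)
older_familial = Patient(True, 4, 60, True, False, False)
older_isolated = Patient(True, 4, 60, False, False, False)
print(is_highly_suspected(young),
      is_highly_suspected(older_familial),
      is_highly_suspected(older_isolated))
```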
Confirmation
Sanger sequencing (ABI 3730 DNA Analyzer, Thermo Fisher Scientific, Massachusetts, USA) was performed to verify variants classified as pathogenic, likely pathogenic or VUS. Primers were designed using Primer Premier V.5.0 (Premier Biosoft, USA), and PCR was performed on a LifeECO Thermal Cycler TC-96/G/H (b)C (Bioer Technology, China) to amplify the fragments covering the mutated sites. The PCR products were purified by agarose gel electrophoresis and then sequenced. Sanger sequencing results were analysed using Chromas Lite V.2.01 (Technelysium, Tewantin, Queensland, Australia).
Diagnostic testing
Diagnostic testing was performed to evaluate the sensitivity and specificity of the panel by assessing the concordance between panel and Sanger sequencing. The reference sequences were retrieved from the hg19 human genome in the UCSC genome browser, and Sanger sequencing results were aligned and compared with the reference sequences using Lasergene SeqMan Pro software (Version 7.1.0, DNASTAR, Madison, USA). For each of the 78 point mutations (online supplementary table s2), base calls from the panel and from Sanger sequencing were compared within a 41 bp fragment spanning 20 bp upstream to 20 bp downstream of the point mutation. For each of the 3 In/Dels (online supplementary table s2), base calls from the panel and from Sanger sequencing were compared within the 40 bp fragment upstream or downstream of the In/Del.
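The window-based comparison for point mutations can be sketched as below. The text does not define how true and false positives were counted, so the definitions here are assumptions: Sanger sequencing is taken as the reference standard, a position is "positive" when the Sanger base differs from the hg19 reference, and the panel is correct at that position only if it reports the same base as Sanger. All function names are hypothetical.

```python
FLANK = 20  # bases compared on each side of a point mutation (from the text)


def snv_window(pos, flank=FLANK):
    """Return the 1-based inclusive window around a point mutation."""
    return pos - flank, pos + flank


def confusion_counts(panel, sanger, reference):
    """Count per-base TP, FP, TN, FN with Sanger as the assumed standard."""
    if not (len(panel) == len(sanger) == len(reference)):
        raise ValueError("sequences must cover the same window")
    tp = fp = tn = fn = 0
    for p, s, r in zip(panel, sanger, reference):
        if s != r:          # Sanger reports a variant at this position
            if p == s:
                tp += 1
            else:
                fn += 1
        else:               # Sanger matches the reference
            if p == r:
                tn += 1
            else:
                fp += 1
    return tp, fp, tn, fn


def sensitivity_specificity(tp, fp, tn, fn):
    """Return (sensitivity, specificity); None when a denominator is zero."""
    sens = tp / (tp + fn) if (tp + fn) else None
    spec = tn / (tn + fp) if (tn + fp) else None
    return sens, spec


# Toy 41 bp window with a single variant at the central position.
reference = "A" * 41
sanger = "A" * 20 + "G" + "A" * 20
panel = sanger
counts = confusion_counts(panel, sanger, reference)
print(snv_window(100), counts, sensitivity_specificity(*counts))
```

Under these assumed definitions, counts would be pooled across all compared windows before computing the two metrics.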