Users can submit the genomic variants of interest by pasting them into the search box or by uploading a file that contains them. A Variant Call Format (VCF)-like input is expected: the first five columns give the chromosome, position, SNP ID, reference allele and alternative allele, respectively, but only the first two columns (chromosome and position) are mandatory. Before submitting, users can set options that filter variants by their structural annotations, to focus on those that merit detailed examination for their effects on protein function.
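The input format described above can be illustrated with a minimal parsing sketch. It is not the server's own parser: the `Variant` type and `parse_variant_line` function are hypothetical names, and splitting on any whitespace, skipping `#` lines, and treating `.` as a missing value (the usual VCF placeholder) are assumptions.

```python
from typing import NamedTuple, Optional


class Variant(NamedTuple):
    """One VCF-like input line (hypothetical container for illustration)."""
    chrom: str
    pos: int
    snp_id: Optional[str] = None
    ref: Optional[str] = None
    alt: Optional[str] = None


def parse_variant_line(line: str) -> Optional[Variant]:
    """Parse one VCF-like line; return None for blank or comment lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    fields = line.split()  # assumption: tab- or space-separated columns
    if len(fields) < 2:
        raise ValueError(f"chromosome and position are required: {line!r}")
    chrom, pos = fields[0], int(fields[1])
    # Columns 3 to 5 (SNP ID, reference allele, alternative allele) are optional.
    optional = [None if f == "." else f for f in fields[2:5]]
    optional += [None] * (3 - len(optional))
    return Variant(chrom, pos, *optional)
```

For example, a line holding only a chromosome and a position, such as `1 12345` (made-up coordinates), yields a `Variant` whose ID and allele fields are `None`.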
Searching structures for genomic variants from the main page
Enter genome coordinates in the text box or upload a file that contains them. The following “Search options” restrict the results to variants that are mapped to structures passing the selected filters.
- “Sequence Identity (%)” is the sequence identity between the human protein and the mapped protein in the PDB. By default, at least 80% identity is required for a protein to be displayed in the results.
- “Relative accessible surface area (rASA)” is the accessible surface area (ASA) of the amino acid relative to its value in an extended conformation of a Gly-X-Gly tripeptide for the same residue type. rASA takes a value between 0 and 1. If you are interested in residues buried in the protein core, limit rASA to small values, e.g. from 0 to 0.3.
- “Secondary structure” can be used to limit the variants to those with the specified secondary structures. Secondary structures are classified into eight classes by the DSSP program; “H”: α-helix, “G”: 3-10 helix, “I”: π-helix, “B”: beta-bridge, “E”: extended strand, “S”: bend, “T”: hydrogen bonded turn and “ “: others.
- “Change of relative ASA in complex” indicates the decrease in rASA of an amino acid when it is situated in a protein complex compared with the monomeric protein conformation. In other words, by setting large values for this option, e.g. from 0.5 to 1, you can choose those amino acids that contribute significantly to protein complex formation or protein-protein interaction.
- “UniProt accession for interaction partner” is used to select those amino acids that are located at the interaction surface with the protein specified by the given UniProt accession. For example, if you are interested in the variants that change amino acids located at the interface with the P53 protein, enter “P04637”, the accession for human P53 protein, in this box.
- “Distance to ligands (Å)” and “Ligand name” are used to select those variants that are mapped to amino acids in close proximity to ligands. Amino acids that are closer than the “Distance to ligands (Å)” to the ligands specified by “Ligand name” are selected. If “Ligand name” is blank, any non-protein residues other than water are regarded as ligands.
- “PDB experimental method” specifies the experimental method for structure determination, e.g. X-RAY DIFFRACTION, SOLUTION NMR and ELECTRON MICROSCOPY.
- “PDB publication date (yyyy-mm-dd)” can be used to filter PDB structures by their publication date. This option may be used to focus on variants mapped to PDB structures published within a given period, e.g. from 2020-01-01 to 2020-12-31.
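How several of these options combine can be sketched as a predicate over one mapped residue. This is only an illustration under stated assumptions: `SearchOptions`, `passes`, and the dictionary keys (`identity`, `rasa`, `ss`, `ligand_distance`) are hypothetical names, not the server's data model, and only a subset of the options is covered.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass
class SearchOptions:
    """A subset of the search options (hypothetical representation)."""
    min_identity: float = 80.0                       # default stated in the text
    rasa_range: Tuple[float, float] = (0.0, 1.0)     # rASA lies between 0 and 1
    secondary_structures: Optional[Set[str]] = None  # None: no restriction
    max_ligand_distance: Optional[float] = None      # in Å; None: no restriction


def passes(residue: dict, opts: SearchOptions) -> bool:
    """Return True if one mapped residue satisfies every active filter."""
    if residue["identity"] < opts.min_identity:
        return False
    low, high = opts.rasa_range
    if not (low <= residue["rasa"] <= high):
        return False
    if (opts.secondary_structures is not None
            and residue["ss"] not in opts.secondary_structures):
        return False
    if opts.max_ligand_distance is not None:
        distance = residue.get("ligand_distance")
        # "closer than" the given distance: a strict comparison is assumed here.
        if distance is None or distance >= opts.max_ligand_distance:
            return False
    return True
```

With `SearchOptions(rasa_range=(0.0, 0.3))`, for instance, only residues buried in the protein core pass, which corresponds to the rASA example given in the list.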
Result Summary page
The result summary page lists, in a table, those variants from the search that are mapped to at least one PDB structure satisfying the specified search options. Each row of the table corresponds to a variant in the input.
The “Base and amino acid change” column shows the changes in the mRNA and amino acid sequences in the RefSeq database, together with the description of the mRNA. The summary column summarizes the structural features of the mapped amino acids. Because a single variant can be mapped to multiple amino acids from different PDB files, numerical structural features are summarized in the form of “median (min-max)(number)” if multiple values exist or “value (number)” if all of the values are the same, whereas categorical features such as disorder and secondary structure are summarized by the fraction of each feature value.
The columns of the summary information are as follows:
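The two summary formats can be sketched as follows. The function names are hypothetical, and the number formatting (`:g`) is an assumption, since the text specifies only the overall layout of the summary string.

```python
from collections import Counter
from statistics import median


def summarize_numeric(values):
    """Format values as 'median (min-max)(number)' or 'value (number)'."""
    n = len(values)
    if n == 0:
        return ""
    low, high = min(values), max(values)
    if low == high:
        # All mapped residues share the same value.
        return f"{low:g} ({n})"
    return f"{median(values):g} ({low:g}-{high:g})({n})"


def summarize_categorical(labels):
    """Return the fraction of each category among the mapped residues."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}
```

For three mapped residues with rASA values 0.1, 0.3 and 0.2, `summarize_numeric` returns `0.2 (0.1-0.3)(3)`; for two residues that both have 0.5, it returns `0.5 (2)`.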
- rASA: the relative accessible surface area of the residue in its monomeric form
- buASA: the relative accessible surface area of the residue in a biological unit, which represents the biologically active protein complex. If multiple biological units are available for a PDB entry, the one with the smallest index is used.
- ΔASA: the difference in relative ASA between biological unit and monomeric state
- Disorder: the fraction of disordered residues among all of the mapped amino acids. An amino acid is considered disordered if its atom coordinates are missing.
- Hetero: the UniProt accessions of non-identical protein-protein interaction (PPI) partner proteins for which the amino acids take part in the interface.
- Homo: the fraction of amino acids that are involved in homomeric PPI
- SS8: the fractions of amino acids in eight secondary structure classes calculated by the DSSP program. The SS classes are “H”: α-helix, “G”: 3-10 helix, “I”: π-helix, “B”: beta-bridge, “E”: extended strand, “S”: bend, “T”: hydrogen bonded turn and “ “: others
- SS3: the fractions of amino acids in three secondary structure classes calculated by the DSSP program. SS8 classes “H”, “G” and “I” are classified as SS3 class “H” (helix), SS8 classes “B” and “E” as SS3 class “S” (sheet), and SS8 classes “S”, “T” and “ “ as SS3 class “L” (loop).
- Disulfide (inter): the fraction of amino acids involved in inter-subunit disulfide bond
- Disulfide (intra): the fraction of amino acids involved in intra-subunit disulfide bond
- HBond (M): the number of hydrogen bonds that the main-chain atoms (N or O) of the amino acid make
- HBond (S): the number of hydrogen bonds that the side-chain atoms of the amino acid make
- Contact (inter): the number of amino acids that make inter-subunit contacts with the mapped amino acids. Two residues that are not peptide bonded are defined to be in contact if at least one pair of their heavy atoms is at a distance shorter than 6 Å.
- Contact (intra): the number of amino acids that make intra-subunit contacts with the mapped amino acids.
- HBonds to DNA/RNA (M): the number of hydrogen bonds from the main-chain atoms (N or O) of the amino acids to DNA or RNA bases
- HBonds to DNA/RNA (S): the number of hydrogen bonds from the side-chain atoms of the mapped amino acids to DNA or RNA bases
- Ligand: three-letter codes of the non-protein molecules that are in contact (within 10 Å) with the mapped amino acids.
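The reduction from SS8 to SS3 described in the list above, together with the per-class fractions, can be sketched as follows. The mapping follows the text; the names `SS8_TO_SS3` and `ss3_fractions` are hypothetical.

```python
from collections import Counter

# SS8 -> SS3 mapping as described in the text: H/G/I -> H (helix),
# B/E -> S (sheet), S/T/" " -> L (loop).
SS8_TO_SS3 = {
    "H": "H", "G": "H", "I": "H",
    "B": "S", "E": "S",
    "S": "L", "T": "L", " ": "L",
}


def ss3_fractions(ss8_labels):
    """Return the fraction of mapped residues in each SS3 class."""
    ss3 = [SS8_TO_SS3[label] for label in ss8_labels]
    if not ss3:
        return {}
    counts = Counter(ss3)
    return {cls: counts[cls] / len(ss3) for cls in "HSL"}
```

Four mapped residues with SS8 labels “H”, “G”, “E” and “ “ give SS3 fractions of 0.5 for “H”, 0.25 for “S” and 0.25 for “L”.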
Variant Page
A variant page displays the structural features of all of the mapped amino acids in the PDB for a given variant and can be accessed by clicking the “detail” column in the result summary page. The table at the top of the page shows variant information including chromosome, position, ID, reference and alternative alleles, base and amino acid changes and the description of the mRNA.
Below the variant information table is a MolMil structure browser, which highlights the mapped amino acid and the subunit in which the mapped amino acid is included.
At the bottom, all of the mapped amino acids are shown in a table. The columns are roughly divided into entity, residue, monomeric, bio-unit, ligand and view.
The entity columns describe the mapped protein in the PDB and the sequence identity between the human protein and the mapped protein.
- PDB: PDB entry in which the amino acid is mapped
- Entity: Entity ID of the mapped protein structure
- UniProt: UniProt Accession for the mapped PDB structure
- Identity: Sequence identity between the query human protein and PDB protein by BLAST alignment
- Evalue: E-value of the BLAST search for the hit
- Date: publication date of the PDB entry
- ExpMethod: experimental method
The residue columns identify the mapped amino acid.
- Name: Residue name in three-letter code
- Label: label_asym_id and label_seq_id of the amino acid.
- Auth: auth_asym_id and auth_seq_id of the amino acid. “NA” is shown if the coordinate is missing (disorder)
The monomeric columns describe the structural features of the mapped amino acids in a monomeric state of the protein, i.e. the coordinates of the protein subunit are virtually isolated from the other coordinates in the PDB file.
- ASA: Accessible surface area of the residue in Å². If the coordinates for the amino acid are missing (“DO”) or the number of heavy atoms in the coordinates differs from the standard number for the residue type (“NA”), ASA is not provided.
- rASA: ASA relative to the ASA in a Gly-X-Gly tripeptide in extended conformation for the same residue type.
- SS: secondary structure in eight classes; “H”: α-helix, “G”: 3-10 helix, “I”: π-helix, “B”: beta-bridge, “E”: extended strand, “S”: bend, “T”: hydrogen bonded turn and “ “: others
- φ: dihedral angle φ
- ψ: dihedral angle ψ
The bio-unit columns describe the structural features of the mapped amino acids in the biological unit.
- ASA: ASA of the residue in the conformation specified in the biological unit.
- rASA: relative ASA of the residue in the conformation specified in the biological unit.
- ΔASA: the difference in ASA between monomeric and biounit structures.
- Interaction: if ΔASA is larger than 0, the interaction partner is shown in the format of “label_asym_id:UniProt Accession: ΔASA”. Here, ΔASA is calculated as the difference in ASA between the monomeric state and a complex state consisting of the mapped subunit and the interaction partner.
The ligand columns describe the ligands in proximity to the mapped amino acid.
- Name: the three-letter code of the ligand
- Distance: the atomic distance of the closest atom pair between the protein and ligand
- Atom(p): the atom name of the protein
- Atom(l): the atom name of the ligand
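The Distance, Atom(p) and Atom(l) values correspond to the closest atom pair between the mapped residue and a ligand. A minimal sketch of that search is shown below; the function name and the input layout (lists of atom name and coordinate pairs) are assumptions for illustration, not the server's implementation.

```python
import math


def closest_atom_pair(protein_atoms, ligand_atoms):
    """Find the closest protein-ligand atom pair.

    Each argument is a list of (atom_name, (x, y, z)) tuples in Å.
    Returns (distance, protein_atom_name, ligand_atom_name),
    or None if either list is empty.
    """
    best = None
    for p_name, p_xyz in protein_atoms:
        for l_name, l_xyz in ligand_atoms:
            distance = math.dist(p_xyz, l_xyz)  # Euclidean distance
            if best is None or distance < best[0]:
                best = (distance, p_name, l_name)
    return best
```

The same exhaustive pairwise search, restricted to heavy atoms, could also serve to test a distance-based residue contact, although a spatial index would be preferable for large structures.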