HRaDis
HomoRepeats and human Diseases
About this service
Using the HRaDis (HomoRepeats and human Diseases) service, you can:
  • Study the relation of homo-repeats (single-amino-acid tandem repeats) to human diseases.
  • Search the human proteome for proteins containing a given homo-repeat, together with the list of diseases and the GO annotations for these proteins.
  • Study the coupling of different homo-repeats in one protein.

Eighteen known neurological diseases are associated with genetic abnormalities that elongate simple single-letter motifs [1]. Thus, polyglutamine and polyalanine repeats that exceed a tolerated length in proteins are associated with such diseases. Indeed, previous reports indicate that developmental diseases are associated with expansions of homo-repeats such as poly-A (alanine): synpolydactyly type II (HOXD13), blepharophimosis (FOXL2), oculopharyngeal muscular dystrophy (PABPN1), infantile spasm syndrome (ARX), and holoprosencephaly (ZIC2) [2]. Expansion of polyalanine tracts causes at least nine inherited human diseases, but the mechanism by which expanded polyalanine tracts contribute to these disease states remains poorly understood.

Expansion of poly-Q is implicated in several neurodegenerative diseases, including Huntington’s disease and several spinocerebellar ataxia types. The length of the poly-Q repeat is critical to pathogenesis [3]. Huntington’s disease is caused by expansion of a run of CAG codons, which encode glutamine, in the IT15 gene. The wild-type genes of different people contain different numbers of CAG repeats; however, if their number exceeds 36, the disease develops. Length alone is not decisive, though: a 40-glutamine repeat is the normal allele of the forkhead box P2 transcription factor, a protein that has not been found to be associated with a poly-Q disease [3].
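The CAG threshold mentioned above can be illustrated with a minimal sketch. The function names are hypothetical and not part of HRaDis; the code simply counts the longest uninterrupted run of CAG in a nucleotide string and compares it with the value of 36 quoted in the text. It ignores reading frame and interrupted repeats, so it is an illustration of the counting step only.

```python
import re

def longest_cag_run(dna):
    """Return the length, in codons, of the longest uninterrupted CAG run.

    Assumption: the run is counted wherever it occurs in the string,
    without checking the reading frame of the gene.
    """
    runs = re.findall(r"(?:CAG)+", dna.upper())
    return max((len(run) // 3 for run in runs), default=0)

def exceeds_threshold(dna, threshold=36):
    """True if the longest CAG run is longer than `threshold` codons
    (36 is the value quoted in the text for Huntington's disease)."""
    return longest_cag_run(dna) > threshold

# Toy sequences, made up for illustration only.
short_allele = "ATG" + "CAG" * 20 + "CCGCCA"
long_allele = "ATG" + "CAG" * 40 + "CCGCCA"
print(longest_cag_run(short_allele), exceeds_threshold(short_allele))
print(longest_cag_run(long_allele), exceeds_threshold(long_allele))
```

As the FOXP2 example shows, a long run by itself does not imply disease, so such a count is only meaningful for a specific gene.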

The occurrence of homo-repeats in a protein sequence increases the ability of the protein to aggregate. The data above emphasize the importance of investigating the functional role of amino acid homo-repeats.


Example of polyalanine tracts in the ELN_human protein and the list of diseases associated with this protein:



  1. Jorda J., Xue B., Uversky V.N., Kajava A.V. 2010. Protein tandem repeats: The more perfect, the less structured. FEBS J. 277, 2673–2682.
  2. Mularoni L, Ledda A, Toll-Riera M, Albà MM. 2010. Natural selection drives the accumulation of amino acid tandem repeats in human proteins. Genome Res. 20:745–754.
  3. Robertson AL, Bate MA, Androulakis SG, Bottomley SP, Buckle AM. PolyQ: a database describing the sequence and domain context of polyglutamine repeats in proteins. Nucleic Acids Res. 2011 Jan;39(Database issue):D272-6.


 
Description

The shorter an amino acid homo-repeat is, the more probable its chance occurrence in a protein sequence and, therefore, the less significant its impact on protein structure and function. The shortest repeats taken into account in genome analysis are five to seven amino acids long [1–5]. This is the minimal length at which a homo-repeat can influence both protein function and structure.

To see the occurrence of a homo-repeat in proteins associated with diseases, the user first chooses an amino acid (one of the 20) and then the minimal length of the homo-repeat. For each protein, the UniProt ID, function, and GO annotations are shown in the Protein Info section. All diseases associated with the given protein are described in the Diseases section and classified according to the OMIM classification.
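The search step described above (one amino acid plus a minimal length) can be sketched as follows. This is an illustration only, not the code behind the service; the function name and the toy sequence are hypothetical.

```python
import re

def find_homo_repeats(sequence, amino_acid, min_length=5):
    """Return (start, length) for every run of `amino_acid` that is at
    least `min_length` residues long; `start` is a 0-based position."""
    pattern = re.compile(re.escape(amino_acid) + "{" + str(min_length) + ",}")
    return [(m.start(), len(m.group())) for m in pattern.finditer(sequence)]

# Toy sequence, made up for illustration.
toy = "MKAAAAAAGLQQQQQPAAAAK"
print(find_homo_repeats(toy, "A", 5))  # one run of six alanines
print(find_homo_repeats(toy, "Q", 5))  # one run of five glutamines
```

Applying the same function with different amino acids to one sequence gives a simple way to look at the coupling of different homo-repeats in one protein.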

We demonstrated that homo-repeats longer than 4 residues of the amino acids L, S, A, G, and P have a higher propensity to be coupled with diseases (Fig. 1).


Fig. 1

Fig. 1. Fraction of proteins linked to disease. The data were taken from the OMIM database http://www.omim.org/. Green colour corresponds to homo-repeats with the length larger than 4 and with Z>5, yellow with 3

Statistical analysis of homo-repeats

In the case when the homo-repeat and disease frequencies are independent, their distribution will have an average number of proteins

(1)

and the root-mean-square deviation

(2)

and the Z-value

(3)

where N is the number of proteins in the human proteome (59053), Na is the number of proteins associated with disease (2501 according to the OMIM database, see Table 1), Nb is the number of proteins with homo-repeats of length 5 or more, and Nab is the number of proteins in our database carrying both characters.
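A minimal numerical sketch of this analysis is given below. The expected count under independence, Na·Nb/N, and the Z-value as (Nab minus the expected count) divided by the deviation follow from the definitions above. The exact form of the root-mean-square deviation is an assumption here: the sketch uses sqrt(mean·(1 − Na/N)·(1 − Nb/N)), a common choice for two independent binary characters, which may differ from the expression used by the authors. The function name and the example values of Nb and Nab are hypothetical.

```python
import math

def association_z(n_total, n_a, n_b, n_ab):
    """Return (expected, sigma, z) for the overlap of two characters.

    n_total: number of proteins in the proteome (N)
    n_a:     proteins associated with disease (Na)
    n_b:     proteins with a homo-repeat of the chosen length (Nb)
    n_ab:    proteins carrying both characters (Nab)
    """
    expected = n_a * n_b / n_total
    # ASSUMPTION: deviation for independent characters; the authors'
    # exact formula is not given in the text.
    sigma = math.sqrt(expected * (1 - n_a / n_total) * (1 - n_b / n_total))
    return expected, sigma, (n_ab - expected) / sigma

# N and Na are taken from the text; Nb and Nab are made-up values.
expected, sigma, z = association_z(59053, 2501, 1000, 80)
print(f"expected={expected:.1f} sigma={sigma:.2f} Z={z:.2f}")
```

A large positive Z means that more proteins carry both the homo-repeat and a disease annotation than independence would predict.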


Table 1. Number of proteins with homo-repeats longer than 4 residues that are associated with diseases according to the OMIM database http://www.omim.org/; bold figures correspond to Z>5.

table1

Online Mendelian Inheritance in Man (OMIM®) is a continuously updated catalog of human genes and genetic disorders and traits, with particular focus on the molecular relationship between genetic variation and phenotypic expression.

Each OMIM entry is given a unique six-digit number as summarized below:
1----- (100000- ) 2----- (200000- ) Autosomal loci or phenotypes (entries created before May 15, 1994)
3----- (300000- ) X-linked loci or phenotypes
4----- (400000- ) Y-linked loci or phenotypes
5----- (500000- ) Mitochondrial loci or phenotypes
6----- (600000- ) Autosomal loci or phenotypes (entries created after May 15, 1994)
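The numbering scheme above can be turned into a small lookup, sketched below. The function name is hypothetical; the categories are copied from the list above, and any leading digit not listed there is treated as an error.

```python
def omim_category(mim_number):
    """Map a six-digit MIM number to its category by the first digit."""
    autosomal_old = "Autosomal loci or phenotypes (entries created before May 15, 1994)"
    categories = {
        "1": autosomal_old,
        "2": autosomal_old,
        "3": "X-linked loci or phenotypes",
        "4": "Y-linked loci or phenotypes",
        "5": "Mitochondrial loci or phenotypes",
        "6": "Autosomal loci or phenotypes (entries created after May 15, 1994)",
    }
    digits = str(mim_number)
    if len(digits) != 6 or not digits.isdigit():
        raise ValueError(f"not a six-digit MIM number: {mim_number!r}")
    if digits[0] not in categories:
        raise ValueError(f"leading digit not covered by the scheme: {mim_number!r}")
    return categories[digits[0]]

# 143100 is the MIM number of Huntington disease.
print(omim_category(143100))
```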


  1. Karlin S. 1995. Statistical significance of sequence patterns in proteins. Curr. Opin. Struct. Biol. 5 (3), 360–371.
  2. Katti M.V., Ranjekar P.K., Gupta V.S. 2001. Differential distribution of simple sequence repeats in eukaryotic genome sequences. Mol. Biol. Evol. 18 (7), 1161–1167.
  3. Karlin S., Brocchieri L., Bergman A., Mrazek J. 2002. Amino acid runs in eukaryotic proteomes and disease associations. Proc. Natl. Acad. Sci. U. S. A. 99 (1), 333–338.
  4. Lobanov MY, Galzitskaya OV. 2012. Occurrence of disordered patterns and homorepeats in eukaryotic and bacterial proteomes. Mol. Biosyst. 8:327–337.
  5. Lobanov MY, Galzitskaya OV. 2011. Disordered patterns in clustered Protein Data Bank and in eukaryotic and bacterial proteomes. PloS One 6:e27142.

Authors:
Galzitskaya O.V.
Team Leader
E-mail: ogalzit@vega.protres.ru
Lobanov M.Yu.
Programming
Sokolovskiy I.V.
Programming, Web-programming