- Search 122 proteomes for proteins with a given homorepeat, including the GO annotations for these proteins.
- Search 122 proteomes for proteins containing a given disordered pattern from the library of disordered patterns built on the clustered PDB, including the GO annotations for these proteins.
- Analyze the lengths of homorepeats in different proteomes.
- Investigate disordered regions in the chosen proteins in 122 proteomes.
- Study the coupling of different homorepeats in one protein.
- Determine the longest run of each amino acid in each proteome.
- Download the full list of proteins with a selected pattern or with a homorepeat of a given length.
Example: homorepeat N16 in a protein that also contains homorepeats H7, Q6, Q8, and Q15:
SEQUENCE Q9VSB3_DROME 628 aa 17.D_melanogaster
GO:0005515; F:protein binding
GO:0009987; P:cellular process
HR: HomoRepeat Q: Q6, Q8, Q15
HR: HomoRepeat N: N2, N16
HR: HomoRepeat H: H7
PT: Pattern set 2012: 1 HHHH
PT: Pattern set 2012: 71 QQQQQQQ
PT: Pattern set 2012: 98 NNNNN
PT: Pattern set 2012: 109 QQQQQP
1 MHKTTTNMQQ TTDLDFDFQM PCRYFKSSFA RSLSLNNNNN NNNNNNNNNN NALTLPKKPP TNEAKLQQQQ QQQQLENQES LERDNEQDSP CATPPPALPA 100
101 RRHTANMIHF GASNQLLTAQ PPHPPQHPQP PAQSARNLNL VWPMQPTHQP TQQQDANILA APSHHIQSHI YDLPQQMQPH QHHHHHHHQQ QQQQPQQQQQ 200
201 QQQQQQQQQQ HQLNVEAVIL QNQVDTLHWQ LKQTETNCEM YRAVMEEVAR FFERYQLQQQ LQQTQRNGEQ IARSKSLHHV HGVGNTSLQS DARDDDASSG 300
301 GSASYLRARS STNLMLNKSM HAMDEEHNYE TIAPAGSYNA FKDFTWRRSP KKSGGSGGCK SRLSAPEAAE EKLNQEAFRL ARTIRNLLHT SEQQPDLTQP 400
401 RHSLASISSL PSGNHRLCKG KTSSVMTLLT PPLHNSTSIM SATLETPSPG GKSNAELIFL RANNMRDSRL SLRSSTDSSV HSTISSTASS SSKVETDEET 500
501 QTQTTASNTA ISNSNNKTTS NKQSGSSTED ESGFSSISSF HDVGLPLSST LMNGNQRRLS MSSDSRNSTL KSGLNMVGLP MQNQTQVQVQ VQAQISTAPS 600
601 PSKTYRNANR YQRFSTLSNE DAAAVLWV 628
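The homorepeat annotations in this record can be checked with a few lines of Python. The sketch below is only an illustration and not the database's own code: the function names `find_homorepeats` and `longest_runs`, the `offset` parameter, and the choice of a regular expression are assumptions made here. The two fragments are residues 31-60 and 181-220 of Q9VSB3_DROME, copied from the sequence above.

```python
import re
from itertools import groupby

# Residues 31-60 and 181-220 of Q9VSB3_DROME, in the record's 10-residue blocks.
N_FRAGMENT = "RSLSLNNNNN" "NNNNNNNNNN" "NALTLPKKPP"
HQ_FRAGMENT = "QHHHHHHHQQ" "QQQQPQQQQQ" "QQQQQQQQQQ" "HQLNVEAVIL"


def find_homorepeats(seq, min_len=6, offset=0):
    """Return (residue, length, start) for every run of one residue
    that is at least min_len long. start is 1-based; offset shifts it
    so that positions in a fragment map back to the full sequence."""
    # (.) captures one residue, \1{k,} requires at least k more copies of it.
    pattern = re.compile(r"(.)\1{%d,}" % (min_len - 1))
    return [
        (m.group(1), len(m.group(0)), m.start() + 1 + offset)
        for m in pattern.finditer(seq)
    ]


def longest_runs(seq):
    """Return {residue: length of its longest run} for one sequence."""
    best = {}
    for residue, group in groupby(seq):
        run = sum(1 for _ in group)
        if run > best.get(residue, 0):
            best[residue] = run
    return best


if __name__ == "__main__":
    print(find_homorepeats(N_FRAGMENT, offset=30))
    print(find_homorepeats(HQ_FRAGMENT, offset=180))
    print(longest_runs(HQ_FRAGMENT))
```

On these fragments the search reports N16 starting at residue 36, and H7, Q6, and Q15 starting at residues 182, 189, and 196, which agrees with the HR lines of the record. Taking the maximum of `longest_runs` over all proteins of a proteome would give the per-proteome longest runs mentioned in the feature list.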
Source of proteomes: UniProt
Superkingdom | Kingdom * | Phylum | Class | Proteome at UniProt | Search (homorepeats) | Search (patterns) | UniProt download |
---|---|---|---|---|---|---|---|
All proteomes | | | | | homorepeats | patterns | |
Eukaryota | Metazoa | Chordata | Mammalia | 25.H_sapiens | homorepeats | patterns | link |
Eukaryota | Metazoa | Chordata | Mammalia | 22974.B_taurus | homorepeats | patterns | link |
Eukaryota | Metazoa | Chordata | Mammalia | 59.M_musculus | homorepeats | patterns | link |
Eukaryota | Metazoa | Chordata | Mammalia | 122.R_norvegicus | homorepeats | patterns | link |
Eukaryota | Metazoa | Chordata | Aves | 21457.G_gallus | homorepeats | patterns | link |
Eukaryota | Metazoa | Chordata | Actinopterygii | 20721.D_rerio | homorepeats | patterns | link |
Eukaryota | Metazoa | Chordata | Actinopterygii | 22388.T_nigroviridis | homorepeats | patterns | link |
Eukaryota | Metazoa | Arthropoda | Insecta | 17.D_melanogaster | homorepeats | patterns | link |
Eukaryota | Metazoa | Arthropoda | Insecta | 25396.D_pseudoobscura | homorepeats | patterns | link |
Eukaryota | Metazoa | Arthropoda | Insecta | 31436.A_aegypti | homorepeats | patterns | link |
Eukaryota | Metazoa | Arthropoda | Insecta | 78607.A_darlingi | homorepeats | patterns | link |
Eukaryota | Metazoa | Arthropoda | Insecta | 22426.A_gambiae | homorepeats | patterns | link |
Eukaryota | Metazoa | Nematoda | Chromadorea | 21633.C_briggsae | homorepeats | patterns | link |
Eukaryota | Metazoa | Nematoda | Chromadorea | 9.C_elegans | homorepeats | patterns | link |
Eukaryota | Metazoa | Nematoda | Chromadorea | 64800.L_loa | homorepeats | patterns | link |
Eukaryota | Metazoa | Nematoda | Enoplea | 79720.T_spiralis | homorepeats | patterns | link |
Eukaryota | Metazoa | Cnidaria | Anthozoa | 30565.N_vectensis | homorepeats | patterns | link |
Eukaryota | Viridiplantae | Streptophyta | Liliopsida | 23214.O_sativa | homorepeats | patterns | link |
Eukaryota | Viridiplantae | Streptophyta | Magnoliopsida | 3.A_thaliana | homorepeats | patterns | link |
Eukaryota | Viridiplantae | Chlorophyta | Prasinophyceae | 33157.Micromonas_sp | homorepeats | patterns | link |
Eukaryota | Viridiplantae | Chlorophyta | Prasinophyceae | 29351.O_lucimarinus | homorepeats | patterns | link |
Eukaryota | Viridiplantae | Chlorophyta | Prasinophyceae | 25972.O_tauri | homorepeats | patterns | link |
Eukaryota | Stramenopiles | Heterokontophyta | Phaeophyceae | 35109.E_siliculosus | homorepeats | patterns | link |
Eukaryota | Choanoflagellida | — | — | 30562.M_brevicollis | homorepeats | patterns | link |
Eukaryota | Euglenozoa | — | — | 83400.L_braziliensis | homorepeats | patterns | link |
Eukaryota | Euglenozoa | — | — | 83363.L_infantum | homorepeats | patterns | link |
Eukaryota | Euglenozoa | — | — | 71330.T_brucei_gambiense | homorepeats | patterns | link |
Eukaryota | Euglenozoa | — | — | 33602.T_cruzi | homorepeats | patterns | link |
Eukaryota | Alveolata | Apicomplexa | Aconoidasida | 32114.P_berghei | homorepeats | patterns | link |
Eukaryota | Alveolata | Apicomplexa | Aconoidasida | 31998.P_chabaudi | homorepeats | patterns | link |
Eukaryota | Alveolata | Apicomplexa | Aconoidasida | 493.P_falciparum | homorepeats | patterns | link |
Eukaryota | Alveolata | Apicomplexa | Aconoidasida | 31342.P_knowlesi | homorepeats | patterns | link |
Eukaryota | Alveolata | Apicomplexa | Aconoidasida | 31632.P_vivax | homorepeats | patterns | link |
Eukaryota | Alveolata | Apicomplexa | Aconoidasida | 21631.P_yoelii | homorepeats | patterns | link |
Eukaryota | Amoebozoa | — | — | 21395.D_discoideum | homorepeats | patterns | link |
Eukaryota | Amoebozoa | — | — | 35301.P_pallidum | homorepeats | patterns | link |
Eukaryota | Diplomonadida | — | — | 33600.G_intestinalis_ATCC_50803 | homorepeats | patterns | link |
Eukaryota | Diplomonadida | — | — | 35295.G_intestinalis_ATCC_50581 | homorepeats | patterns | link |
Eukaryota | Diplomonadida | — | — | 65115.G_intestinalis | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Dothideomycetes | 25591.P_nodorum | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Dothideomycetes | 79905.P_teres | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Eurotiomycetes | 29154.A_clavatus | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Eurotiomycetes | 33020.A_flavus | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Eurotiomycetes | 22118.N_fumigata_ATCC_MYA-4609 | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Eurotiomycetes | 31018.N_fumigata_CEA10 | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Eurotiomycetes | 29130.A_niger | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Eurotiomycetes | 23077.A_oryzae | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Eurotiomycetes | 28239.A_terreus | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Eurotiomycetes | 29157.N_fischeri | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Eurotiomycetes | 31898.P_chrysogenum | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Eurotiomycetes | 32999.P_marneffei | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Eurotiomycetes | 33056.T_stipitatus | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Eurotiomycetes | 34218.C_posadasii_C735 | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Eurotiomycetes | 34307.P_brasiliensis_Pb03 | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Eurotiomycetes | 34389.P_brasiliensis_Pb18 | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Eurotiomycetes | 34392.P_brasiliensis_ATCC_MYA-826 | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Eurotiomycetes | 34310.A_capsulata_ATCC_26029 | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Eurotiomycetes | 34967.A_capsulata_H143 | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Eurotiomycetes | 34495.A_dermatitidis_SLH14081 | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Eurotiomycetes | 34498.A_dermatitidis_ER-3 | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Eurotiomycetes | 35919.A_benhamiae | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Eurotiomycetes | 34471.A_otae | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Eurotiomycetes | 35921.T_verrucosum | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Eurotiomycetes | 34386.U_reesii | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Leotiomycetes | 30100.B_fuckeliana | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Leotiomycetes | 30103.S_sclerotiorum | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Saccharomycetes | 22024.C_albicans_SC5314 | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Saccharomycetes | 32738.C_dubliniensis | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Saccharomycetes | 19665.C_glabrata | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Saccharomycetes | 34491.C_tropicalis | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Saccharomycetes | 20018.D_hansenii | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Saccharomycetes | 29447.L_elongisporus | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Saccharomycetes | 29448.M_guilliermondii | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Saccharomycetes | 28727.S_stipitis | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Saccharomycetes | 20011.Y_lipolytica | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Saccharomycetes | 34493.C_lusitaniae | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Saccharomycetes | 34482.L_thermotolerans | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Saccharomycetes | 30091.S_cerevisiae_YJM789 | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Saccharomycetes | 31651.S_cerevisiae_RM11-1a | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Saccharomycetes | 34506.S_cerevisiae_JAY291 | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Saccharomycetes | 35062.S_cerevisiae_Lalvin_EC1118 | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Saccharomycetes | 71242.S_cerevisiae_ATCC_204508 | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Saccharomycetes | 30097.V_polyspora | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Sordariomycetes | 79902.C_graminicola | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Sordariomycetes | 35359.V_albo-atrum | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Sordariomycetes | 34970.N_haematococca | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Sordariomycetes | 22028.M_oryzae | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Sordariomycetes | 25585.C_globosum_NBRC_6347 | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Sordariomycetes | 22025.N_crassa | homorepeats | patterns | link |
Eukaryota | Fungi | Ascomycota | Sordariomycetes | 35280.S_macrospora | homorepeats | patterns | link |
Eukaryota | Fungi | Basidiomycota | — | 79908.P_graminis | homorepeats | patterns | link |
Eukaryota | Fungi | Basidiomycota | Agaricomycetes | 31020.C_cinerea | homorepeats | patterns | link |
Eukaryota | Fungi | Basidiomycota | Agaricomycetes | 31023.L_bicolor | homorepeats | patterns | link |
Eukaryota | Fungi | Basidiomycota | Agaricomycetes | 33031.P_placenta | homorepeats | patterns | link |
Eukaryota | Fungi | Basidiomycota | Tremellomycetes | 20846.C_neoformans_JEC21 | homorepeats | patterns | link |
Eukaryota | Fungi | Basidiomycota | Tremellomycetes | 21380.C_neoformans_B-3501A | homorepeats | patterns | link |
Eukaryota | Fungi | Basidiomycota | Ustilaginomycetes | 22029.U_maydis | homorepeats | patterns | link |
Bacteria | Acidobacteria | Acidobacteria | Solibacteres | 25797.S_usitatus | homorepeats | patterns | link |
Bacteria | Actinobacteria | Actinobacteria | Actinobacteria | 33926.C_acidiphila | homorepeats | patterns | link |
Bacteria | Actinobacteria | Actinobacteria | Actinobacteria | 35278.Frankia_sp_EuI1c | homorepeats | patterns | link |
Bacteria | Actinobacteria | Actinobacteria | Actinobacteria | 35534.F_sp_EUN1f | homorepeats | patterns | link |
Bacteria | Actinobacteria | Actinobacteria | Actinobacteria | 33113.R_opacus | homorepeats | patterns | link |
Bacteria | Actinobacteria | Actinobacteria | Actinobacteria | 25456.Rhodococcus_sp | homorepeats | patterns | link |
Bacteria | Actinobacteria | Actinobacteria | Actinobacteria | 37022.A_mediterranei | homorepeats | patterns | link |
Bacteria | Actinobacteria | Actinobacteria | Actinobacteria | 74443.K_setae | homorepeats | patterns | link |
Bacteria | Actinobacteria | Actinobacteria | Actinobacteria | 131.S_avermitilis | homorepeats | patterns | link |
Bacteria | Actinobacteria | Actinobacteria | Actinobacteria | 36666.S_bingchenggensis | homorepeats | patterns | link |
Bacteria | Actinobacteria | Actinobacteria | Actinobacteria | 84.S_coelicolor | homorepeats | patterns | link |
Bacteria | Actinobacteria | Actinobacteria | Actinobacteria | 34910.S_scabies | homorepeats | patterns | link |
Bacteria | Actinobacteria | Actinobacteria | Actinobacteria | 58962.S_violaceusniger | homorepeats | patterns | link |
Bacteria | Actinobacteria | Actinobacteria | Actinobacteria | 34011.S_roseum | homorepeats | patterns | link |
Bacteria | Proteobacteria | Proteobacteria | Alphaproteobacteria | 112.B_japonicum | homorepeats | patterns | link |
Bacteria | Proteobacteria | Proteobacteria | Betaproteobacteria | 22343.Burkholderia_sp_ATCC_17760 | homorepeats | patterns | link |
Bacteria | Proteobacteria | Proteobacteria | Betaproteobacteria | 25388.B_xenovorans | homorepeats | patterns | link |
Bacteria | Proteobacteria | Proteobacteria | Deltaproteobacteria | 33616.S_aurantiaca | homorepeats | patterns | link |
Bacteria | Proteobacteria | Proteobacteria | Deltaproteobacteria | 33223.H_ochraceum | homorepeats | patterns | link |
Bacteria | Proteobacteria | Proteobacteria | Deltaproteobacteria | 23351.M_xanthus | homorepeats | patterns | link |
Bacteria | Proteobacteria | Proteobacteria | Deltaproteobacteria | 32044.P_pacifica | homorepeats | patterns | link |
Bacteria | Proteobacteria | Proteobacteria | Deltaproteobacteria | 30295.S_cellulosum | homorepeats | patterns | link |
Bacteria | Bacteroidetes | Bacteroidetes | Sphingobacteria | 33930.C_pinensis | homorepeats | patterns | link |
Bacteria | Bacteroidetes | Bacteroidetes | Cytophagia | 32144.M_marina | homorepeats | patterns | link |
Bacteria | Chloroflexi | Chloroflexi | Ktedonobacteria | 36622.K_racemifer | homorepeats | patterns | link |
GO id | Type | Description | Ngo | Homorepeat | Nhm | Nhm,go | PZ | P1 | P2 |
---|---|---|---|---|---|---|---|---|---|
0016021 | C | integral to membrane | 85599 | L | 4386 | 665 | 9e-99 | 0.152 | 0.008 |
0016021 | C | integral to membrane | 85599 | C | 168 | 24 | 0.000107 | 0.143 | 0.000 |
0005634 | C | nucleus | 58238 | H | 3389 | 456 | 4e-103 | 0.135 | 0.008 |
0008270 | F | zinc ion binding | 68859 | Q | 20361 | 2680 | 0 | 0.132 | 0.039 |
0008270 | F | zinc ion binding | 68859 | H | 3389 | 433 | 2e-70 | 0.128 | 0.006 |
0005634 | C | nucleus | 58238 | Q | 20361 | 2461 | 0 | 0.121 | 0.042 |
0005634 | C | nucleus | 58238 | A | 18012 | 2148 | 0 | 0.119 | 0.037 |
0005891 | C | voltage-gated calcium channel complex | 237 | M | 115 | 13 | 4e-33 | 0.113 | 0.055 |
0008270 | F | zinc ion binding | 68859 | P | 13486 | 1453 | 2e-168 | 0.108 | 0.021 |
0008270 | F | zinc ion binding | 68859 | A | 18012 | 1882 | 2e-204 | 0.104 | 0.027 |
0005245 | F | voltage-gated calcium channel activity | 328 | M | 115 | 12 | 2e-28 | 0.104 | 0.037 |
0005634 | C | nucleus | 58238 | E | 13828 | 1393 | 2e-197 | 0.101 | 0.024 |
0070593 | P | dendrite self-avoidance | 60 | P | 13486 | 50 | 2e-91 | 0.004 | 0.833 |
0021551 | P | central nervous system morphogenesis | 66 | P | 13486 | 50 | 2e-87 | 0.004 | 0.758 |
0051635 | F | bacterial cell surface binding | 71 | P | 13486 | 50 | 1e-84 | 0.004 | 0.704 |
0008046 | F | axon guidance receptor activity | 74 | P | 13486 | 50 | 4e-83 | 0.004 | 0.676 |
0048846 | P | axon extension involved in axon guidance | 84 | P | 13486 | 53 | 2e-85 | 0.004 | 0.631 |
0008154 | P | actin polymerization or depolymerization | 69 | P | 13486 | 43 | 2e-69 | 0.003 | 0.623 |
0017048 | F | Rho GTPase binding | 297 | P | 13486 | 175 | 1e-270 | 0.013 | 0.589 |
0044403 | P | symbiosis, encompassing mutualism through parasitism | 53 | N | 12954 | 27 | 4e-41 | 0.002 | 0.509 |
0016319 | P | mushroom body development | 115 | P | 13486 | 51 | 2e-71 | 0.004 | 0.443 |
0030587 | P | sorocarp development | 97 | N | 12954 | 41 | 2e-57 | 0.003 | 0.423 |
0048745 | P | smooth muscle tissue development | 53 | Q | 20361 | 21 | 3e-25 | 0.001 | 0.396 |
0000298 | F | endopolyphosphatase activity | 51 | K | 6762 | 20 | 2e-33 | 0.003 | 0.392 |
0031152 | P | aggregation involved in sorocarp development | 69 | N | 12954 | 23 | 6e-30 | 0.002 | 0.333 |
0030479 | C | actin cortical patch | 154 | P | 13486 | 51 | 2e-63 | 0.004 | 0.331 |
0007413 | P | axonal fasciculation | 168 | P | 13486 | 54 | 3e-66 | 0.004 | 0.321 |
0020030 | C | infected host cell surface knob | 51 | E | 13828 | 16 | 2e-20 | 0.001 | 0.314 |
0030036 | P | actin cytoskeleton organization | 1060 | P | 13486 | 329 | 0 | 0.024 | 0.310 |
0020013 | P | modulation by symbiont of host erythrocyte aggregation | 50 | E | 13828 | 15 | 8e-19 | 0.001 | 0.300 |
0007476 | P | imaginal disc-derived wing morphogenesis | 82 | Q | 20361 | 24 | 5e-25 | 0.001 | 0.293 |
0044212 | F | transcription regulatory region DNA binding | 52 | A | 18012 | 15 | 8e-17 | 0.001 | 0.288 |
0043254 | P | regulation of protein complex assembly | 63 | E | 13828 | 18 | 7e-22 | 0.001 | 0.286 |
0007422 | P | peripheral nervous system development | 201 | P | 13486 | 57 | 3e-66 | 0.004 | 0.284 |
0020035 | P | cytoadherence to microvasculature, mediated by symbiont protein | 53 | E | 13828 | 15 | 2e-18 | 0.001 | 0.283 |
0004534 | F | 5'-3' exoribonuclease activity | 82 | G | 15639 | 23 | 4e-26 | 0.001 | 0.280 |
0000902 | P | cell morphogenesis | 380 | P | 13486 | 101 | 9e-113 | 0.007 | 0.266 |
0044403 | P | symbiosis, encompassing mutualism through parasitism | 53 | K | 6762 | 14 | 5e-21 | 0.002 | 0.264 |
0007298 | P | border follicle cell migration | 76 | Q | 20361 | 20 | 5e-20 | 0.001 | 0.263 |
0005523 | F | tropomyosin binding | 92 | P | 13486 | 24 | 8e-28 | 0.002 | 0.261 |
0030587 | P | sorocarp development | 97 | S | 19631 | 25 | 8e-25 | 0.001 | 0.258 |
0030587 | P | sorocarp development | 97 | Q | 20361 | 25 | 2e-24 | 0.001 | 0.258 |
0032420 | C | stereocilium | 51 | P | 13486 | 13 | 1e-15 | 0.001 | 0.255 |
0030118 | C | clathrin coat | 158 | Q | 20361 | 40 | 8e-38 | 0.002 | 0.253 |
0030276 | F | clathrin binding | 180 | Q | 20361 | 45 | 4e-42 | 0.002 | 0.250 |
GO id: index in the Gene Ontology database
Type: cellular component (C), molecular function (F), or biological process (P) (see the Gene Ontology database for details)
Ngo: number of proteins with the given annotation
Homorepeat: homorepeat of length 6 or more
Nhm: number of proteins with the given homorepeat
Nhm,go: number of proteins with both the given homorepeat and the given annotation
PZ: significance, the probability of finding Nhm,go or more such proteins among all proteins
Color coding: PZ < 10^-15, 10^-15 ≤ PZ < 10^-10, and 10^-10 ≤ PZ < 10^-7.
P1: Nhm,go / Nhm
P2: Nhm,go / Ngo
Color coding: P1,2 > 0.5, 0.3 ≤ P1,2 ≤ 0.5, and 0.1 < P1,2 < 0.3.
Details are given in the Description section.
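As a quick consistency check, P1 and P2 can be recomputed from the counts in any table row. The sketch below is a minimal illustration, not the site's code: the names `parse_row`, `ratios`, `ratio_tier`, and `pz_tier` and the tier labels "high", "medium", and "low" are assumptions introduced here, since the actual highlight colors are not stated in the text. PZ is only classified, not recomputed, because its formula is given only in the Description section.

```python
def parse_row(line):
    """Split one pipe-delimited table row into typed fields."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    go_id, go_type, description, n_go, feature, n_feature, n_both, pz, p1, p2 = cells
    return {
        "go_id": go_id,
        "type": go_type,
        "description": description,
        "n_go": int(n_go),            # Ngo
        "feature": feature,           # homorepeat residue or pattern
        "n_feature": int(n_feature),  # Nhm (or Npt in the pattern table)
        "n_both": int(n_both),        # Nhm,go (or Npt,go)
        "pz": float(pz),
        "p1": float(p1),
        "p2": float(p2),
    }


def ratios(n_both, n_feature, n_go):
    """Return (P1, P2) = (n_both / n_feature, n_both / n_go)."""
    return n_both / n_feature, n_both / n_go


def ratio_tier(p):
    """Hypothetical labels for the three P1/P2 ranges in the legend."""
    if p > 0.5:
        return "high"
    if 0.3 <= p <= 0.5:
        return "medium"
    if 0.1 < p < 0.3:
        return "low"
    return None  # outside all highlighted ranges


def pz_tier(pz):
    """Hypothetical labels for the three PZ ranges in the legend."""
    if pz < 1e-15:
        return "high"
    if pz < 1e-10:
        return "medium"
    if pz < 1e-7:
        return "low"
    return None  # not highlighted


if __name__ == "__main__":
    row = parse_row(
        "0016021 | C | integral to membrane | 85599 | L | 4386 | 665 | 9e-99 | 0.152 | 0.008 |"
    )
    p1, p2 = ratios(row["n_both"], row["n_feature"], row["n_go"])
    print(round(p1, 3), round(p2, 3), ratio_tier(p1), pz_tier(row["pz"]))
```

For the first row of the homorepeat table (665 of 4386 proteins with an L homorepeat are annotated "integral to membrane", out of 85599 proteins with that annotation), the recomputed ratios round to the tabulated 0.152 and 0.008.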
GO id | Type | Description | Ngo | Pattern | Npt | Npt,go | PZ | P1 | P2 |
---|---|---|---|---|---|---|---|---|---|
0003922 | F | GMP synthase (glutamine-hydrolyzing) activity | 132 | 72 IKSHHNVGGLP | 60 | 60 | 2e-225 | 1.000 | 0.455 |
0005524 | F | ATP binding | 89181 | 72 IKSHHNVGGLP | 60 | 60 | 3e-50 | 1.000 | 0.001 |
0006177 | P | GMP biosynthetic process | 269 | 72 IKSHHNVGGLP | 60 | 60 | 6e-203 | 1.000 | 0.223 |
0009186 | P | deoxyribonucleoside diphosphate metabolic process | 201 | 22 GKTNFFEK | 135 | 130 | 0 | 0.963 | 0.647 |
0046914 | F | transition metal ion binding | 682 | 22 GKTNFFEK | 135 | 130 | 0 | 0.963 | 0.191 |
0055114 | P | oxidation-reduction process | 58722 | 22 GKTNFFEK | 135 | 130 | 5e-127 | 0.963 | 0.002 |
0004748 | F | ribonucleoside-diphosphate reductase activity | 371 | 22 GKTNFFEK | 135 | 128 | 0 | 0.948 | 0.345 |
0006541 | P | glutamine metabolic process | 924 | 72 IKSHHNVGGLP | 60 | 53 | 4e-146 | 0.883 | 0.057 |
0004871 | F | signal transducer activity | 3946 | 78 NLREDGE | 91 | 27 | 3e-45 | 0.297 | 0.007 |
0005525 | F | GTP binding | 14556 | 78 NLREDGE | 91 | 27 | 3e-30 | 0.297 | 0.002 |
0008270 | F | zinc ion binding | 68859 | 51 VKPEVKP | 168 | 48 | 6e-22 | 0.286 | 0.001 |
0003677 | F | DNA binding | 52551 | 58 GGAKRH | 662 | 169 | 2e-82 | 0.255 | 0.003 |
0005634 | C | nucleus | 58238 | 93 AAHHHHHHH | 83 | 21 | 8e-11 | 0.253 | 0.000 |
0005634 | C | nucleus | 58238 | 58 GGAKRH | 662 | 157 | 1e-66 | 0.237 | 0.003 |
0003700 | F | sequence-specific DNA binding transcription factor activity | 34548 | 93 AAHHHHHHH | 83 | 19 | 5e-13 | 0.229 | 0.001 |
0005524 | F | ATP binding | 89181 | 93 AAHHHHHHH | 83 | 18 | 7e-06 | 0.217 | 0.000 |
0006355 | P | regulation of transcription, DNA-dependent | 45500 | 93 AAHHHHHHH | 83 | 18 | 4e-10 | 0.217 | 0.000 |
0046080 | P | dUTP metabolic process | 153 | 55 RGEGGFG | 470 | 99 | 3e-304 | 0.211 | 0.647 |
0008270 | F | zinc ion binding | 68859 | 167 GSHGM | 134 | 28 | 2e-10 | 0.209 | 0.000 |
0000786 | C | nucleosome | 1811 | 58 GGAKRH | 662 | 138 | 1e-251 | 0.208 | 0.076 |
0006334 | P | nucleosome assembly | 2143 | 58 GGAKRH | 662 | 138 | 3e-241 | 0.208 | 0.064 |
0005524 | F | ATP binding | 89181 | 30 IDPFT | 283 | 56 | 2e-13 | 0.198 | 0.001 |
0004170 | F | dUTP diphosphatase activity | 118 | 55 RGEGGFG | 470 | 93 | 8e-300 | 0.198 | 0.788 |
0003677 | F | DNA binding | 52551 | 51 VKPEVKP | 168 | 31 | 7e-13 | 0.185 | 0.001 |
0005634 | C | nucleus | 58238 | 142 GAHHHHH | 93 | 17 | 5e-07 | 0.183 | 0.000 |
0043565 | F | sequence-specific DNA binding | 14997 | 93 AAHHHHHHH | 83 | 15 | 3e-14 | 0.181 | 0.001 |
0005524 | F | ATP binding | 89181 | 89 DIPESQ | 610 | 107 | 2e-20 | 0.175 | 0.001 |
0008270 | F | zinc ion binding | 68859 | 23 GSRHHHH | 149 | 26 | 4e-08 | 0.174 | 0.000 |
0008270 | F | zinc ion binding | 68859 | 142 GAHHHHH | 93 | 16 | 0.000016 | 0.172 | 0.000 |
0008270 | F | zinc ion binding | 68859 | 67 GHMA | 1575 | 269 | 2e-67 | 0.171 | 0.004 |
0008270 | F | zinc ion binding | 68859 | 132 CGYSD | 95 | 16 | 0.000021 | 0.168 | 0.000 |
0005992 | P | trehalose biosynthetic process | 411 | 167 GSHGM | 134 | 22 | 3e-53 | 0.164 | 0.054 |
0004222 | F | metalloendopeptidase activity | 4576 | 132 CGYSD | 95 | 15 | 8e-21 | 0.158 | 0.003 |
0008270 | F | zinc ion binding | 68859 | 49 QQQQQG | 3391 | 534 | 9e-119 | 0.157 | 0.008 |
0003824 | F | catalytic activity | 21677 | 167 GSHGM | 134 | 21 | 6e-15 | 0.157 | 0.001 |
0006468 | P | protein phosphorylation | 23851 | 93 AAHHHHHHH | 83 | 13 | 3e-09 | 0.157 | 0.001 |
0015074 | P | DNA integration | 3400 | 51 VKPEVKP | 168 | 26 | 5e-38 | 0.155 | 0.008 |
0003924 | F | GTPase activity | 5656 | 78 NLREDGE | 91 | 14 | 4e-18 | 0.154 | 0.002 |
0008270 | F | zinc ion binding | 68859 | 32 KSCDK | 73 | 11 | 0.000947 | 0.151 | 0.000 |
0008270 | F | zinc ion binding | 68859 | 16 HIEGRH | 93 | 14 | 0.000209 | 0.151 | 0.000 |
0008270 | F | zinc ion binding | 68859 | 158 STSHHHHH | 76 | 11 | 0.001299 | 0.145 | 0.000 |
0007186 | P | G-protein coupled receptor protein signaling pathway | 2024 | 78 NLREDGE | 91 | 13 | 3e-22 | 0.143 | 0.006 |
0007193 | P | inhibition of adenylate cyclase activity by G-protein signaling pathway | 70 | 78 NLREDGE | 91 | 13 | 1e-41 | 0.143 | 0.186 |
0006355 | P | regulation of transcription, DNA-dependent | 45500 | 142 GAHHHHH | 93 | 13 | 0.000012 | 0.140 | 0.000 |
0003676 | F | nucleic acid binding | 35523 | 105 RGRPRG | 6022 | 837 | 0 | 0.139 | 0.024 |
0008270 | F | zinc ion binding | 68859 | 109 QQQQQP | 10773 | 1496 | 4e-275 | 0.139 | 0.022 |
0005524 | F | ATP binding | 89181 | 4 GPGSM | 313 | 43 | 2e-06 | 0.137 | 0.000 |
0005524 | F | ATP binding | 89181 | 90 KKGKS | 1172 | 161 | 1e-19 | 0.137 | 0.002 |
0003700 | F | sequence-specific DNA binding transcription factor activity | 34548 | 18 HHHHHGGS | 73 | 10 | 0.000015 | 0.137 | 0.000 |
0005634 | C | nucleus | 58238 | 18 HHHHHGGS | 73 | 10 | 0.000933 | 0.137 | 0.000 |
0006508 | P | proteolysis | 21294 | 132 CGYSD | 95 | 13 | 3e-09 | 0.137 | 0.001 |
0008270 | F | zinc ion binding | 68859 | 129 SHHHHHH | 726 | 99 | 3e-19 | 0.136 | 0.001 |
0008270 | F | zinc ion binding | 68859 | 71 QQQQQQQ | 20729 | 2813 | 0 | 0.136 | 0.041 |
0008270 | F | zinc ion binding | 68859 | 150 PPPPQ | 6446 | 869 | 1e-152 | 0.135 | 0.013 |
0008270 | F | zinc ion binding | 68859 | 152 GGGGSGGGGS | 943 | 127 | 9e-24 | 0.135 | 0.002 |
0005524 | F | ATP binding | 89181 | 7 VPRGS | 580 | 78 | 6e-10 | 0.134 | 0.001 |
0008270 | F | zinc ion binding | 68859 | 9 LEAHHH | 321 | 43 | 5e-09 | 0.134 | 0.001 |
0008270 | F | zinc ion binding | 68859 | 93 AAHHHHHHH | 83 | 11 | 0.002549 | 0.133 | 0.000 |
0005813 | C | centrosome | 523 | 78 NLREDGE | 91 | 12 | 3e-27 | 0.132 | 0.023 |
0005834 | C | heterotrimeric G-protein complex | 369 | 78 NLREDGE | 91 | 12 | 4e-29 | 0.132 | 0.033 |
0030496 | C | midbody | 163 | 78 NLREDGE | 91 | 12 | 2e-33 | 0.132 | 0.074 |
0051301 | P | cell division | 3391 | 78 NLREDGE | 91 | 12 | 1e-17 | 0.132 | 0.004 |
0005634 | C | nucleus | 58238 | 54 RRGKKK | 2477 | 326 | 5e-72 | 0.132 | 0.006 |
0005524 | F | ATP binding | 89181 | 101 SMAEG | 228 | 30 | 0.000140 | 0.132 | 0.000 |
0005622 | C | intracellular | 37927 | 158 STSHHHHH | 76 | 10 | 0.000044 | 0.132 | 0.000 |
0016491 | F | oxidoreductase activity | 31920 | 55 RGEGGFG | 470 | 61 | 6e-27 | 0.130 | 0.002 |
0005524 | F | ATP binding | 89181 | 156 AELAAATA | 185 | 24 | 0.000736 | 0.130 | 0.000 |
0003700 | F | sequence-specific DNA binding transcription factor activity | 34548 | 142 GAHHHHH | 93 | 12 | 4e-06 | 0.129 | 0.000 |
0005524 | F | ATP binding | 89181 | 103 RPQLDS | 1030 | 132 | 4e-14 | 0.128 | 0.001 |
0003676 | F | nucleic acid binding | 35523 | 55 RGEGGFG | 470 | 60 | 7e-24 | 0.128 | 0.002 |
0005524 | F | ATP binding | 89181 | 138 TDNGNS | 660 | 84 | 2e-09 | 0.127 | 0.001 |
0005634 | C | nucleus | 58238 | 167 GSHGM | 134 | 17 | 0.000049 | 0.127 | 0.000 |
0005634 | C | nucleus | 58238 | 109 QQQQQP | 10773 | 1358 | 2e-278 | 0.126 | 0.023 |
0005634 | C | nucleus | 58238 | 71 QQQQQQQ | 20729 | 2599 | 0 | 0.125 | 0.045 |
0005634 | C | nucleus | 58238 | 86 EDDEDED | 3411 | 427 | 2e-87 | 0.125 | 0.007 |
0005634 | C | nucleus | 58238 | 152 GGGGSGGGGS | 943 | 118 | 2e-25 | 0.125 | 0.002 |
0005634 | C | nucleus | 58238 | 57 GPSSG | 1549 | 193 | 3e-40 | 0.125 | 0.003 |
0005524 | F | ATP binding | 89181 | 96 ASIGQA | 1401 | 173 | 1e-16 | 0.123 | 0.002 |
0005524 | F | ATP binding | 89181 | 18 HHHHHGGS | 73 | 9 | 0.039845 | 0.123 | 0.000 |
0005634 | C | nucleus | 58238 | 129 SHHHHHH | 726 | 89 | 4e-19 | 0.123 | 0.002 |
0005524 | F | ATP binding | 89181 | 145 GGKKKK | 2755 | 337 | 5e-30 | 0.122 | 0.004 |
0005524 | F | ATP binding | 89181 | 74 YKDDD | 231 | 28 | 0.000792 | 0.121 | 0.000 |
0031683 | F | G-protein beta/gamma-subunit complex binding | 107 | 78 NLREDGE | 91 | 11 | 2e-32 | 0.121 | 0.103 |
0005634 | C | nucleus | 58238 | 77 GHHHHH | 1012 | 122 | 7e-25 | 0.121 | 0.002 |
0005634 | C | nucleus | 58238 | 49 QQQQQG | 3391 | 407 | 2e-78 | 0.120 | 0.007 |
0005634 | C | nucleus | 58238 | 150 PPPPQ | 6446 | 773 | 3e-147 | 0.120 | 0.013 |
0005524 | F | ATP binding | 89181 | 2 ENLYFQ | 259 | 31 | 0.000531 | 0.120 | 0.000 |
0005524 | F | ATP binding | 89181 | 53 SHMAS | 117 | 14 | 0.015867 | 0.120 | 0.000 |
0005524 | F | ATP binding | 89181 | 158 STSHHHHH | 76 | 9 | 0.048959 | 0.118 | 0.000 |
0006468 | P | protein phosphorylation | 23851 | 158 STSHHHHH | 76 | 9 | 7e-06 | 0.118 | 0.000 |
0005524 | F | ATP binding | 89181 | 12 GPLGS | 873 | 103 | 2e-09 | 0.118 | 0.001 |
0003677 | F | DNA binding | 52551 | 143 AMADIGS | 51 | 6 | 0.011706 | 0.118 | 0.000 |
0005634 | C | nucleus | 58238 | 1 HHHH | 9759 | 1145 | 2e-210 | 0.117 | 0.020 |
0008270 | F | zinc ion binding | 68859 | 1 HHHH | 9759 | 1134 | 3e-153 | 0.116 | 0.016 |
0005634 | C | nucleus | 58238 | 144 HHLHHHG | 414 | 48 | 3e-10 | 0.116 | 0.001 |
0016021 | C | integral to membrane | 85599 | 122 GSETMA | 468 | 54 | 6e-06 | 0.115 | 0.001 |
0005524 | F | ATP binding | 89181 | 10 SNAM | 2552 | 290 | 1e-21 | 0.114 | 0.003 |
0005524 | F | ATP binding | 89181 | 137 GVPRG | 590 | 67 | 3e-06 | 0.114 | 0.001 |
0005524 | F | ATP binding | 89181 | 130 EDDESD | 2934 | 333 | 2e-24 | 0.113 | 0.004 |
0005887 | C | integral to plasma membrane | 2359 | 30 IDPFT | 283 | 32 | 3e-47 | 0.113 | 0.014 |
0006468 | P | protein phosphorylation | 23851 | 30 IDPFT | 283 | 32 | 1e-16 | 0.113 | 0.001 |
0003677 | F | DNA binding | 52551 | 74 YKDDD | 231 | 26 | 8e-07 | 0.113 | 0.000 |
0005524 | F | ATP binding | 89181 | 136 EKKKS | 1688 | 189 | 4e-14 | 0.112 | 0.002 |
0005524 | F | ATP binding | 89181 | 154 KKEKK | 5682 | 629 | 2e-41 | 0.111 | 0.007 |
0005634 | C | nucleus | 58238 | 5 DDDDK | 1346 | 149 | 2e-26 | 0.111 | 0.003 |
0008270 | F | zinc ion binding | 68859 | 106 DHSPAP | 407 | 45 | 4e-07 | 0.111 | 0.001 |
0005524 | F | ATP binding | 89181 | 38 KKKAA | 1943 | 214 | 5e-15 | 0.110 | 0.002 |
0005524 | F | ATP binding | 89181 | 88 QQREEG | 619 | 68 | 8e-06 | 0.110 | 0.001 |
0005524 | F | ATP binding | 89181 | 166 KSASS | 1686 | 185 | 4e-13 | 0.110 | 0.002 |
0003676 | F | nucleic acid binding | 35523 | 32 KSCDK | 73 | 8 | 0.000539 | 0.110 | 0.000 |
0006355 | P | regulation of transcription, DNA-dependent | 45500 | 18 HHHHHGGS | 73 | 8 | 0.002530 | 0.110 | 0.000 |
0008270 | F | zinc ion binding | 68859 | 39 KKTSS | 1132 | 123 | 4e-16 | 0.109 | 0.002 |
0003677 | F | DNA binding | 52551 | 93 AAHHHHHHH | 83 | 9 | 0.003875 | 0.108 | 0.000 |
0005634 | C | nucleus | 58238 | 153 EEDDD | 7336 | 794 | 3e-128 | 0.108 | 0.014 |
0005488 | F | binding | 41088 | 16 HIEGRH | 93 | 10 | 0.000417 | 0.108 | 0.000 |
0006886 | P | intracellular protein transport | 6383 | 95 DAPDI | 262 | 28 | 6e-29 | 0.107 | 0.004 |
0005634 | C | nucleus | 58238 | 73 EEEED | 14402 | 1529 | 2e-238 | 0.106 | 0.026 |
0005634 | C | nucleus | 58238 | 20 EDEREE | 10648 | 1124 | 2e-173 | 0.106 | 0.019 |
0008270 | F | zinc ion binding | 68859 | 162 KSGYKD | 702 | 74 | 9e-10 | 0.105 | 0.001 |
0008270 | F | zinc ion binding | 68859 | 126 RSVRSN | 1338 | 140 | 9e-17 | 0.105 | 0.002 |
0055114 | P | oxidation-reduction process | 58722 | 55 RGEGGFG | 470 | 49 | 7e-09 | 0.104 | 0.001 |
0005524 | F | ATP binding | 89181 | 148 SGDDDD | 2678 | 279 | 3e-16 | 0.104 | 0.003 |
0005524 | F | ATP binding | 89181 | 123 EEEKKKE | 1958 | 203 | 4e-12 | 0.104 | 0.002 |
0005524 | F | ATP binding | 89181 | 65 AAVGGAA | 2377 | 246 | 3e-14 | 0.103 | 0.003 |
0003677 | F | DNA binding | 52551 | 99 RRRGR | 3347 | 346 | 3e-62 | 0.103 | 0.007 |
0005524 | F | ATP binding | 89181 | 124 RGGGGSG | 1136 | 117 | 2e-07 | 0.103 | 0.001 |
0008270 | F | zinc ion binding | 68859 | 40 GGSGGGGSGGG | 3798 | 389 | 2e-41 | 0.102 | 0.006 |
0008270 | F | zinc ion binding | 68859 | 105 RGRPRG | 6022 | 616 | 2e-64 | 0.102 | 0.009 |
0008270 | F | zinc ion binding | 68859 | 3 GSHM | 1351 | 138 | 9e-16 | 0.102 | 0.002 |
0008270 | F | zinc ion binding | 68859 | 57 GPSSG | 1549 | 158 | 9e-18 | 0.102 | 0.002 |
0005524 | F | ATP binding | 89181 | 35 MGRGS | 226 | 23 | 0.015574 | 0.102 | 0.000 |
0008270 | F | zinc ion binding | 68859 | 35 MGRGS | 226 | 23 | 0.000761 | 0.102 | 0.000 |
0003676 | F | nucleic acid binding | 35523 | 40 GGSGGGGSGGG | 3798 | 384 | 8e-113 | 0.101 | 0.011 |
0005524 | F | ATP binding | 89181 | 121 RGSMAS | 1999 | 202 | 4e-11 | 0.101 | 0.002 |
0005524 | F | ATP binding | 89181 | 34 SSSVD | 1774 | 179 | 6e-10 | 0.101 | 0.002 |
0005634 | C | nucleus | 58238 | 130 EDDESD | 2934 | 296 | 3e-43 | 0.101 | 0.005 |
0005524 | F | ATP binding | 89181 | 170 GSEED | 1201 | 121 | 3e-07 | 0.101 | 0.001 |
0005524 | F | ATP binding | 89181 | 75 LDNGED | 1231 | 124 | 2e-07 | 0.101 | 0.001 |
0005622 | C | intracellular | 37927 | 23 GSRHHHH | 149 | 15 | 0.000015 | 0.101 | 0.000 |
0005634 | C | nucleus | 58238 | 23 GSRHHHH | 149 | 15 | 0.001369 | 0.101 | 0.000 |
0003677 | F | DNA binding | 52551 | 49 QQQQQG | 3391 | 341 | 9e-59 | 0.101 | 0.006 |
0005524 | F | ATP binding | 89181 | 63 DSVISS | 3004 | 302 | 2e-15 | 0.101 | 0.003 |
0005634 | C | nucleus | 58238 | 127 DEEDE | 7229 | 723 | 2e-101 | 0.100 | 0.012 |
0004066 | F | asparagine synthase (glutamine-hydrolyzing) activity | 326 | 72 IKSHHNVGGLP | 60 | 6 | 8e-15 | 0.100 | 0.018 |
0005524 | F | ATP binding | 89181 | 82 NGDTPS | 980 | 98 | 5e-06 | 0.100 | 0.001 |
0005737 | C | cytoplasm | 40740 | 72 IKSHHNVGGLP | 60 | 6 | 0.007706 | 0.100 | 0.000 |
0006529 | P | asparagine biosynthetic process | 331 | 72 IKSHHNVGGLP | 60 | 6 | 9e-15 | 0.100 | 0.018 |
0070593 | P | dendrite self-avoidance | 60 | 19 PPPPP | 26359 | 50 | 6e-77 | 0.002 | 0.833 |
0021551 | P | central nervous system morphogenesis | 66 | 19 PPPPP | 26359 | 50 | 6e-73 | 0.002 | 0.758 |
0051635 | F | bacterial cell surface binding | 71 | 19 PPPPP | 26359 | 50 | 4e-70 | 0.002 | 0.704 |
0008046 | F | axon guidance receptor activity | 74 | 19 PPPPP | 26359 | 52 | 8e-73 | 0.002 | 0.703 |
0000298 | F | endopolyphosphatase activity | 51 | 31 KKKKK | 14181 | 34 | 6e-56 | 0.002 | 0.667 |
0048846 | P | axon extension involved in axon guidance | 84 | 19 PPPPP | 26359 | 54 | 3e-72 | 0.002 | 0.643 |
0008154 | P | actin polymerization or depolymerization | 69 | 19 PPPPP | 26359 | 44 | 7e-59 | 0.002 | 0.638 |
0017048 | F | Rho GTPase binding | 297 | 19 PPPPP | 26359 | 186 | 2e-241 | 0.007 | 0.626 |
0044403 | P | symbiosis, encompassing mutualism through parasitism | 53 | 98 NNNNN | 17632 | 30 | 2e-43 | 0.002 | 0.566 |
0020030 | C | infected host cell surface knob | 51 | 45 EEEEEEE | 16734 | 27 | 8e-39 | 0.002 | 0.529 |
0020013 | P | modulation by symbiont of host erythrocyte aggregation | 50 | 45 EEEEEEE | 16734 | 26 | 4e-37 | 0.002 | 0.520 |
0020035 | P | cytoadherence to microvasculature, mediated by symbiont protein | 53 | 45 EEEEEEE | 16734 | 26 | 3e-36 | 0.002 | 0.491 |
0030587 | P | sorocarp development | 97 | 98 NNNNN | 17632 | 44 | 2e-57 | 0.002 | 0.454 |
0005007 | F | fibroblast growth factor receptor activity | 65 | 94 ESSSS | 6210 | 29 | 5e-51 | 0.005 | 0.446 |
0016319 | P | mushroom body development | 115 | 19 PPPPP | 26359 | 51 | 8e-57 | 0.002 | 0.443 |
0020013 | P | modulation by symbiont of host erythrocyte aggregation | 50 | 73 EEEED | 14402 | 22 | 6e-31 | 0.002 | 0.440 |
0004451 | F | isocitrate lyase activity | 135 | 67 GHMA | 1575 | 59 | 1e-136 | 0.037 | 0.437 |
0020030 | C | infected host cell surface knob | 51 | 73 EEEED | 14402 | 22 | 1e-30 | 0.002 | 0.431 |
0020035 | P | cytoadherence to microvasculature, mediated by symbiont protein | 53 | 73 EEEED | 14402 | 22 | 3e-30 | 0.002 | 0.415 |
0030479 | C | actin cortical patch | 154 | 19 PPPPP | 26359 | 63 | 5e-67 | 0.002 | 0.409 |
0044403 | P | symbiosis, encompassing mutualism through parasitism | 53 | 31 KKKKK | 14181 | 21 | 1e-28 | 0.001 | 0.396 |
0000262 | C | mitochondrial chromosome | 61 | 135 GLVPR | 701 | 24 | 2e-63 | 0.034 | 0.393 |
0015079 | F | potassium ion transmembrane transporter activity | 209 | 105 RGRPRG | 6022 | 81 | 2e-134 | 0.013 | 0.388 |
0032982 | C | myosin filament | 137 | 89 DIPESQ | 610 | 52 | 6e-138 | 0.085 | 0.380 |
0015321 | F | sodium-dependent phosphate transmembrane transporter activity | 56 | 104 TSAETP | 2159 | 21 | 6e-45 | 0.010 | 0.375 |
0031152 | P | aggregation involved in sorocarp development | 69 | 98 NNNNN | 17632 | 25 | 3e-30 | 0.001 | 0.362 |
0020013 | P | modulation by symbiont of host erythrocyte aggregation | 50 | 20 EDEREE | 10648 | 18 | 6e-26 | 0.002 | 0.360 |
0030905 | C | retromer complex, outer shell | 50 | 153 EEDDD | 7336 | 18 | 7e-29 | 0.002 | 0.360 |
0006784 | P | heme a biosynthetic process | 67 | 162 KSGYKD | 702 | 24 | 3e-62 | 0.034 | 0.358 |
0020030 | C | infected host cell surface knob | 51 | 20 EDEREE | 10648 | 18 | 9e-26 | 0.002 | 0.353 |
0031429 | C | box H/ACA snoRNP complex | 193 | 105 RGRPRG | 6022 | 67 | 1e-107 | 0.011 | 0.347 |
0030036 | P | actin cytoskeleton organization | 1060 | 19 PPPPP | 26359 | 367 | 0 | 0.014 | 0.346 |
0020035 | P | cytoadherence to microvasculature, mediated by symbiont protein | 53 | 20 EDEREE | 10648 | 18 | 2e-25 | 0.002 | 0.340 |
0043254 | P | regulation of protein complex assembly | 63 | 45 EEEEEEE | 16734 | 21 | 4e-25 | 0.001 | 0.333 |
0004534 | F | 5'-3' exoribonuclease activity | 82 | 25 GGGGG | 27975 | 27 | 6e-26 | 0.001 | 0.329 |
0007413 | P | axonal fasciculation | 168 | 19 PPPPP | 26359 | 55 | 2e-52 | 0.002 | 0.327 |
0030118 | C | clathrin coat | 158 | 71 QQQQQQQ | 20729 | 51 | 2e-53 | 0.002 | 0.323 |
0048745 | P | smooth muscle tissue development | 53 | 71 QQQQQQQ | 20729 | 17 | 9e-19 | 0.001 | 0.321 |
0020013 | P | modulation by symbiont of host erythrocyte aggregation | 50 | 31 KKKKK | 14181 | 16 | 3e-20 | 0.001 | 0.320 |
0030016 | C | myofibril | 63 | 89 DIPESQ | 610 | 20 | 4e-52 | 0.033 | 0.317 |
0048268 | P | clathrin coat assembly | 164 | 71 QQQQQQQ | 20729 | 52 | 5e-54 | 0.003 | 0.317 |
0017022 | F | myosin binding | 60 | 113 REEEE | 4486 | 19 | 4e-33 | 0.004 | 0.317 |
0007298 | P | border follicle cell migration | 76 | 71 QQQQQQQ | 20729 | 24 | 1e-25 | 0.001 | 0.316 |
0020030 | C | infected host cell surface knob | 51 | 31 KKKKK | 14181 | 16 | 4e-20 | 0.001 | 0.314 |
0030276 | F | clathrin binding | 180 | 71 QQQQQQQ | 20729 | 55 | 5e-56 | 0.003 | 0.306 |
0007476 | P | imaginal disc-derived wing morphogenesis | 82 | 71 QQQQQQQ | 20729 | 25 | 3e-26 | 0.001 | 0.305 |
0020035 | P | cytoadherence to microvasculature, mediated by symbiont protein | 53 | 31 KKKKK | 14181 | 16 | 7e-20 | 0.001 | 0.302 |
0003984 | F | acetolactate synthase activity | 226 | 67 GHMA | 1575 | 68 | 1e-143 | 0.043 | 0.301 |
0030515 | F | snoRNA binding | 224 | 105 RGRPRG | 6022 | 67 | 2e-102 | 0.011 | 0.299 |
0008589 | P | regulation of smoothened signaling pathway | 54 | 94 ESSSS | 6210 | 16 | 2e-25 | 0.003 | 0.296 |
0046439 | P | L-cysteine metabolic process | 108 | 57 GPSSG | 1549 | 32 | 2e-68 | 0.021 | 0.296 |
0000902 | P | cell morphogenesis | 380 | 19 PPPPP | 26359 | 112 | 5e-99 | 0.004 | 0.295 |
0030199 | P | collagen fibril organization | 68 | 119 PPAPAG | 4544 | 20 | 6e-34 | 0.004 | 0.294 |
0035265 | P | organ growth | 55 | 94 ESSSS | 6210 | 16 | 3e-25 | 0.003 | 0.291 |
0008154 | P | actin polymerization or depolymerization | 69 | 70 PAPPP | 7801 | 20 | 4e-29 | 0.003 | 0.290 |
0007422 | P | peripheral nervous system development | 201 | 19 PPPPP | 26359 | 58 | 2e-51 | 0.002 | 0.289 |
0017172 | F | cysteine dioxygenase activity | 111 | 57 GPSSG | 1549 | 32 | 6e-68 | 0.021 | 0.288 |
0017134 | F | fibroblast growth factor binding | 59 | 94 ESSSS | 6210 | 17 | 1e-26 | 0.003 | 0.288 |
0048557 | P | embryonic digestive tract morphogenesis | 66 | 94 ESSSS | 6210 | 19 | 1e-29 | 0.003 | 0.288 |
0005201 | F | extracellular matrix structural constituent | 277 | 119 PPAPAG | 4544 | 77 | 6e-124 | 0.017 | 0.278 |
0005523 | F | tropomyosin binding | 92 | 19 PPPPP | 26359 | 25 | 2e-22 | 0.001 | 0.272 |
0007004 | P | telomere maintenance via telomerase | 107 | 71 QQQQQQQ | 20729 | 29 | 1e-28 | 0.001 | 0.271 |
0042162 | F | telomeric DNA binding | 111 | 71 QQQQQQQ | 20729 | 30 | 2e-29 | 0.001 | 0.270 |
0004534 | F | 5'-3' exoribonuclease activity | 82 | 19 PPPPP | 26359 | 22 | 9e-20 | 0.001 | 0.268 |
0021549 | P | cerebellum development | 53 | 71 QQQQQQQ | 20729 | 14 | 2e-14 | 0.001 | 0.264 |
0005545 | F | 1-phosphatidylinositol binding | 195 | 71 QQQQQQQ | 20729 | 51 | 3e-48 | 0.002 | 0.262 |
0030118 | C | clathrin coat | 158 | 109 QQQQQP | 10773 | 41 | 3e-50 | 0.004 | 0.259 |
0030587 | P | sorocarp development | 97 | 71 QQQQQQQ | 20729 | 25 | 3e-24 | 0.001 | 0.258 |
0031424 | P | keratinization | 74 | 109 QQQQQP | 10773 | 19 | 5e-24 | 0.002 | 0.257 |
0048268 | P | clathrin coat assembly | 164 | 109 QQQQQP | 10773 | 42 | 4e-51 | 0.004 | 0.256 |
0032420 | C | stereocilium | 51 | 19 PPPPP | 26359 | 13 | 6e-12 | 0.000 | 0.255 |
0017048 | F | Rho GTPase binding | 297 | 27 PPAPP | 7072 | 75 | 6e-103 | 0.011 | 0.253 |
0050839 | F | cell adhesion molecule binding | 119 | 45 EEEEEEE | 16734 | 30 | 3e-31 | 0.002 | 0.252 |
0030276 | F | clathrin binding | 180 | 109 QQQQQP | 10773 | 45 | 4e-54 | 0.004 | 0.250 |
0046934 | F | phosphatidylinositol-4,5-bisphosphate 3-kinase activity | 52 | 112 KKSKK | 4391 | 13 | 1e-21 | 0.003 | 0.250 |
0048749 | P | compound eye development | 56 | 71 QQQQQQQ | 20729 | 14 | 5e-14 | 0.001 | 0.250 |
GO id — Index in the Gene Ontology base
Type — Cellular component (C), molecular function (F) or biological process (P) (see the Gene Ontology base for details)
Ngo — number of proteins with the given annotation
Npt — number of proteins with the given pattern
Npt,go — number of proteins with the given pattern and annotation
PZ — significance, the probability to find the number of proteins Npt,go and larger among all proteins
Colorizing: green for PZ < 10⁻¹⁵, light green for 10⁻¹⁵ ≤ PZ < 10⁻¹⁰, and light yellow for 10⁻¹⁰ ≤ PZ < 10⁻⁷.
P1 — Npt,go / Npt
P2 — Npt,go / Ngo
Colorizing: green for P1,2 > 0.5, light green for 0.3 ≤ P1,2 ≤ 0.5, and light yellow for 0.1 < P1,2 < 0.3.
Details are described in the Description section.
with homorepeats or patterns in any proteomes
With the active study of disordered regions and their functions, we focus on long repeats of a single amino acid (homorepeats) (see Fig. 1). Our database includes 122 proteomes, 97 eukaryotic and 25 bacterial, which are divided into 9 eukaryotic kingdoms and 5 bacterial phyla. Together these proteomes contain 1 449 561 protein sequences, of which 771 786 have GO annotations. It has been found that leucine repeats are especially abundant in the «Receptor and/or Membrane» group, glutamine and alanine repeats in «Transcription factor and/or Development», and lysine repeats in «Metabolism». HRaP can be used to analyze evolutionary differences between proteins from different proteomes and the connections of these regions with particular functions.
To see the occurrence of a homorepeat, the user first chooses one of the 122 proteomes and then chooses the homorepeat of a given length, or the pattern, to investigate. The list of proteins with the given homorepeat or pattern then appears, together with GO annotations where these are available. Usually, long proteins contain a homorepeat or several different homorepeats. If a protein contains several homorepeats and patterns, all these regions are marked in different colors in the sequence. The section HomoRepeats or Patterns shows the occurrence of homorepeats of different lengths (or of patterns) in all 122 proteomes. The patterns and homorepeats associated with functions are presented in the section GO annotations. Figure 2 presents a comparative analysis of the number of proteins containing homorepeats 6 residues long in the 122 proteomes.
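The search for homorepeats of a given minimum length in a single sequence can be illustrated with a short sketch. This is not the HRaP implementation; the function name `find_homorepeats` and the toy sequence are hypothetical and chosen only for illustration.

```python
import re

def find_homorepeats(sequence, min_length=6):
    """Return (residue, start, length) for every run of one residue
    that is at least min_length long. Start positions are 1-based."""
    # (.) captures one residue; \1{n,} requires at least n more copies of it.
    pattern = re.compile(r"(.)\1{%d,}" % (min_length - 1))
    return [
        (m.group(1), m.start() + 1, len(m.group(0)))
        for m in pattern.finditer(sequence)
    ]

# Toy sequence (not a real protein): an N16 run and a Q6 run.
toy = "MK" + "N" * 16 + "ALT" + "Q" * 6 + "LE"
runs = find_homorepeats(toy)
for residue, start, length in runs:
    print(f"{residue}{length} at position {start}")
```

The regular expression is greedy, so each run is reported once at its full length rather than as several overlapping shorter repeats.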
We suggest that homorepeats and patterns are responsible for common functions of nonhomologous, unrelated proteins from different organisms. To test this, we carried out the following analysis. All GO annotations of proteins were collected for the set of 122 proteomes. The number of distinct annotations is 11 313. Proteins without annotations were combined into the class «absent annotation». The number of proteins containing at least one pattern from the latest version of the library (171 patterns, [1]) was calculated («Npt»), as was the number of proteins containing homorepeats of length 6 or more («Nhm»). The number of proteins with a given annotation was also calculated and is shown in the column «Ngo». For each pattern or homorepeat we can calculate the frequency of occurrence in all proteins.
Taking into account 171 patterns, 20 homorepeats, and 11 313 kinds of GO annotations, we have 11 313·(171+20) = 2 160 783 ≈ 2·10⁶ possible combinations. With this many combinations, events of that probability are expected to arise by chance, so we should not pay attention to events whose probability is higher than 10⁻⁷. Accordingly, the probabilities pz are colored as follows: green corresponds to pz < 10⁻¹⁵, light green to 10⁻¹⁵ ≤ pz < 10⁻¹⁰, and light yellow to 10⁻¹⁰ ≤ pz < 10⁻⁷.
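The text here does not reproduce the formula for pz, so the following is only a sketch under an assumption: that pz is the upper tail of a binomial distribution with Ngo trials and success probability Npt / N, where N = 1 449 561 is the total number of sequences. The function names are hypothetical. Under this assumption, the row for «dendrite self-avoidance» and PPPPP (Ngo = 60, Npt = 26359, Npt,go = 50) gives a value close to the listed 6e-77.

```python
from math import exp, lgamma, log

N_TOTAL = 1_449_561  # protein sequences in the 122 proteomes (from the text)

def log_binom_pmf(k, n, p):
    """Log of the binomial probability of exactly k successes in n trials."""
    log_comb = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_comb + k * log(p) + (n - k) * log(1 - p)

def binomial_tail(n_go, n_pt, n_pt_go, n_total=N_TOTAL):
    """ASSUMED model for pz: P(X >= n_pt_go) with X ~ Binomial(n_go, n_pt / n_total)."""
    p = n_pt / n_total
    return sum(exp(log_binom_pmf(k, n_go, p)) for k in range(n_pt_go, n_go + 1))

def pz_color(pz):
    """Color classes for pz as listed in the text; None means not highlighted."""
    if pz < 1e-15:
        return "green"
    if pz < 1e-10:
        return "light green"
    if pz < 1e-7:
        return "light yellow"
    return None

# Number of (annotation, pattern-or-homorepeat) combinations from the text.
n_combinations = 11_313 * (171 + 20)

pz = binomial_tail(n_go=60, n_pt=26_359, n_pt_go=50)
print(n_combinations, pz, pz_color(pz))
```

Working in log space avoids overflow in the binomial coefficients for annotations with tens of thousands of proteins.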
We also calculated the probabilities p1 = Npt,go / Npt and p2 = Npt,go / Ngo.
The patterns and homorepeats are sorted by p1 and p2 and colored as follows: green for p1 > 0.5, light green for 0.3 < p1 < 0.5, and light yellow for 0.1 < p1 < 0.3.
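As a small worked example, the two fractions can be computed from the counts in one table row, using the definitions given in the table legend (P1 = Npt,go / Npt, P2 = Npt,go / Ngo). The function names are hypothetical, and the handling of values exactly at 0.3 and 0.5 is an assumption, since the legend and the text state the boundaries slightly differently.

```python
def fractions(n_go, n_pt, n_pt_go):
    """Return (p1, p2): the share of pattern-containing proteins that carry
    the annotation, and the share of annotated proteins that carry the pattern."""
    return n_pt_go / n_pt, n_pt_go / n_go

def fraction_color(p):
    """Color class for p1 or p2; boundary handling is an assumption."""
    if p > 0.5:
        return "green"
    if p > 0.3:
        return "light green"
    if p > 0.1:
        return "light yellow"
    return None

# Row: dendrite self-avoidance, pattern PPPPP (Ngo=60, Npt=26359, Npt,go=50).
p1, p2 = fractions(n_go=60, n_pt=26_359, n_pt_go=50)
print(round(p1, 3), round(p2, 3), fraction_color(p1), fraction_color(p2))
```

A low p1 with a high p2, as in this row, means the pattern is widespread but nearly every protein with this annotation contains it.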
For each proteome we calculated a set of 109 values, one for each of the 109 patterns in the library (set 2010), each value being the number of proteins that contain the given disordered pattern at least once. Then, for every pair of proteomes, the correlation coefficient between the two sets of 109 values was calculated, giving a matrix of correlation coefficients. The coefficients were then averaged within each kingdom and phylum. Similar values were calculated for the set of 141 disordered patterns (set 2011), the set of 171 disordered patterns (set 2012), and the 20 homorepeats. For homorepeats, each proteome is described by 20 values: the number of proteins in which a homorepeat of six or more residues occurs at least once, for each of the 20 types of amino acid residues. A comparative analysis across the 122 proteomes showed that the correlation coefficients between these values are higher within a kingdom than between kingdoms, for the 9 kingdoms of eukaryota and 5 phyla of bacteria. The same result holds for the 109 selected disordered patterns (set 2010) [1], the 141 selected disordered patterns (set 2011) [2], and the 171 selected disordered patterns (set 2012) [3].
Example: Correlation between two proteomes
Number of proteins with homorepeats (L≥6) in 2 proteomes

Proteome | C | M | F | I | L | V | W | Y | A | G | T | S | Q | N | E | D | H | R | K | P |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
H.sapiens | 18 | 3 | 26 | 4 | 618 | 9 | 0 | 2 | 723 | 453 | 65 | 552 | 373 | 3 | 939 | 123 | 120 | 63 | 313 | 745 |
D.melanogaster | 4 | 2 | 1 | 0 | 48 | 3 | 0 | 0 | 576 | 429 | 233 | 409 | 914 | 208 | 100 | 96 | 145 | 39 | 38 | 315 |
Correlation = 47.6%
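The correlation for this pair can be recomputed from the 20 counts per proteome. The sketch below assumes that the listed value is the ordinary Pearson correlation coefficient; the text itself says only "correlation coefficient". The function name `pearson` is illustrative.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Counts in the column order of the table: C M F I L V W Y A G T S Q N E D H R K P
h_sapiens = [18, 3, 26, 4, 618, 9, 0, 2, 723, 453,
             65, 552, 373, 3, 939, 123, 120, 63, 313, 745]
d_melanogaster = [4, 2, 1, 0, 48, 3, 0, 0, 576, 429,
                  233, 409, 914, 208, 100, 96, 145, 39, 38, 315]

r = pearson(h_sapiens, d_melanogaster)
print(f"Correlation = {100 * r:.1f}%")
```

Repeating this for every pair of proteomes gives the matrix of correlation coefficients, which can then be averaged within and between kingdoms.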
- M.Yu. Lobanov, O.V. Galzitskaya, Occurrence of disordered patterns and homorepeats in eukaryotic and bacterial proteomes, Molecular BioSystems, 8 (2012) 327-337.
- L. Mularoni, A. Ledda, M. Toll-Riera, M.M. Alba, Natural selection drives the accumulation of amino acid tandem repeats in human proteins, Genome Res., 20 (2010) 745-754.
- M.Yu. Lobanov, E.I. Furletova, N.S. Bogatyreva, M.A. Roytberg, O.V. Galzitskaya, Library of disordered patterns in 3D protein structures, PLoS Computational Biology, 6 (10) (2010) e1000958.
Q. Why do some patterns have zero occurrences in proteomes?
A. The patterns have been obtained from the Protein Data Bank. Rare occurrence in proteomes means that these patterns are artificial additions to protein chains. For example, poly-H fragments at the termini of protein chains in the PDB are artificial parts added for better purification of proteins. In eukaryotic proteomes, however, such a repeat is likely to have a biological function (HHHHHH is practically absent from bacterial proteomes).
Q. Is there any meaning in the ordering of the patterns?
A. The patterns have been ordered according to their significance for prediction of disordered regions. These numbers have been assigned in the corresponding papers (J. Biomol. Struct. Dyn., 31, 1034-1043; PLoS Computational Biology, 6, e1000958; PLoS One, 6, e27142).
Q. Are there proteins associated with disease in the HRaP database?
A. The list of human proteins with homorepeats "associated with disease" can be found in this archive.
Q. Are there proteins with homorepeats of 6 or more residues in the Protein Data Bank?
A. The list of proteins with homorepeats of 6 or more residues from the clustered Protein Data Bank (J. Biomol. Struct. Dyn., 31, 1034-1043) can be found here (or download zip-archive).
The separate file for histidine repeats can be found in this archive.
Corresponding Author:
Oxana V. Galzitskaya
Programming:
Michail Yu. Lobanov
Web-programming:
Igor V. Sokolovskiy