


         UniProtKB/Swiss-Prot protein knowledgebase release 2020_03 statistics





1.  INTRODUCTION



Release 2020_03 of 17-Jun-20 of UniProtKB/Swiss-Prot contains 562755 sequence entries,

comprising 202599198 amino acids abstracted from 272950 references. 



521 sequences have been added since release 2020_02, the sequence data of

44 existing entries has been updated and the annotations of

217472 entries have been revised.



Number of fragments: 9224

Number of additional sequences produced by alternative splicing, initiation or promoter usage, or ribosomal frameshifting: 40291





Protein existence (PE):           entries     %



1: Evidence at protein level       104267   18.5%

2: Evidence at transcript level     56474   10.0%

3: Inferred from homology          386822   68.7%

4: Predicted                        13339    2.4%

5: Uncertain                         1853    0.3%
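
The percentage column above can be reproduced from the entry counts. The following is a minimal Python sketch, assuming only the counts listed in the table; the variable names are illustrative and are not part of any UniProt tool.

```python
# Entry counts per protein existence (PE) level, copied from the table above.
pe_counts = {
    "1: Evidence at protein level": 104267,
    "2: Evidence at transcript level": 56474,
    "3: Inferred from homology": 386822,
    "4: Predicted": 13339,
    "5: Uncertain": 1853,
}

# The five PE levels together should account for every entry in the release.
total_entries = sum(pe_counts.values())

# Percentage of entries per PE level, rounded to one decimal place.
pe_percent = {
    level: round(100 * count / total_entries, 1)
    for level, count in pe_counts.items()
}

for level, pct in pe_percent.items():
    print(f"{level:<34}{pe_counts[level]:>8}{pct:>7.1f}%")
```

Summing the counts gives a quick consistency check against the number of sequence entries stated at the top of the introduction.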



The growth of the database is summarized below.



   





2.  TAXONOMIC ORIGIN



   Total number of species represented in this release of UniProtKB/Swiss-Prot: 13919



   The first twenty species represent 121314 sequences:  21.6 % of the total

   number of entries.
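
   The figure for the first twenty species can be checked against the
   frequencies listed in table 2.2. A short Python sketch, assuming the
   twenty frequencies and the release total are copied correctly from this
   document (names are illustrative):

```python
# Frequencies of the twenty most represented species, from table 2.2.
top20 = [
    20368, 17042, 15983, 8106, 6721, 6012, 5140, 4518, 4191, 4149,
    4129, 4081, 3608, 3451, 3158, 2295, 2218, 2204, 2042, 1898,
]

total_entries = 562755  # sequence entries in release 2020_03

top20_sequences = sum(top20)
top20_share = round(100 * top20_sequences / total_entries, 1)

print(f"{top20_sequences} sequences, {top20_share} % of all entries")
```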





   2.1 Table of the frequency of occurrence of species



        Species represented 1x: 5705

                            2x: 2005

                            3x: 1088

                            4x:  707

                            5x:  517

                            6x:  409

                            7x:  318

                            8x:  253

                            9x:  229

                           10x:  146

                       11- 20x:  800

                       21- 50x:  467

                       51-100x:  221

                         >100x: 1054
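
   The frequency classes above partition the species set, so they should
   add up to the total number of species given at the start of this
   section. A minimal Python sketch of that check, which also derives the
   share of species represented by a single entry (a figure not stated in
   the release notes; names are illustrative):

```python
# Number of species per frequency class, copied from table 2.1.
species_by_frequency = {
    "1x": 5705, "2x": 2005, "3x": 1088, "4x": 707, "5x": 517,
    "6x": 409, "7x": 318, "8x": 253, "9x": 229, "10x": 146,
    "11-20x": 800, "21-50x": 467, "51-100x": 221, ">100x": 1054,
}

# Every species falls in exactly one class, so the classes sum to the total.
total_species = sum(species_by_frequency.values())

# Share of species that occur in only one entry, in percent.
single_entry_share = round(100 * species_by_frequency["1x"] / total_species, 1)

print(total_species, single_entry_share)
```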





   2.2  Table of the most represented species



  ------  ---------  --------------------------------------------

  Number  Frequency  Species

  ------  ---------  --------------------------------------------

       1      20368  Homo sapiens (Human)

       2      17042  Mus musculus (Mouse)

       3      15983  Arabidopsis thaliana (Mouse-ear cress)

       4       8106  Rattus norvegicus (Rat)

       5       6721  Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (Baker's yeast)

       6       6012  Bos taurus (Bovine)

       7       5140  Schizosaccharomyces pombe (strain 972 / ATCC 24843) (Fission yeast)

       8       4518  Escherichia coli (strain K12)

       9       4191  Bacillus subtilis (strain 168)

      10       4149  Dictyostelium discoideum (Slime mold)

      11       4129  Caenorhabditis elegans

      12       4081  Oryza sativa subsp. japonica (Rice)

      13       3608  Drosophila melanogaster (Fruit fly)

      14       3451  Xenopus laevis (African clawed frog)

      15       3158  Danio rerio (Zebrafish) (Brachydanio rerio)

      16       2295  Gallus gallus (Chicken)

      17       2218  Pongo abelii (Sumatran orangutan) (Pongo pygmaeus abelii)

      18       2204  Mycobacterium tuberculosis (strain ATCC 25618 / H37Rv)

      19       2042  Escherichia coli O157:H7

      20       1898  Mycobacterium tuberculosis (strain CDC 1551 / Oshkosh)

      21       1801  Salmonella typhimurium (strain LT2 / SGSC1412 / ATCC 700720)

      22       1787  Methanocaldococcus jannaschii  

      23       1707  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)

      24       1706  Haemophilus influenzae (strain ATCC 51907 / DSM 11121 / KW20 / Rd)

      25       1697  Escherichia coli O6:H1 (strain CFT073 / ATCC 700928 / UPEC)

      26       1685  Shigella flexneri

      27       1436  Sus scrofa (Pig)

      28       1383  Pseudomonas aeruginosa 

      29       1347  Salmonella typhi

      30       1244  Mycobacterium bovis (strain ATCC BAA-935 / AF2122/97)

      31       1175  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)

      32       1073  Synechocystis sp. (strain PCC 6803 / Kazusa)

      33       1035  Archaeoglobus fulgidus 

      34       1025  Yersinia pestis

      35       1012  Candida albicans (strain SC5314 / ATCC MYA-2876) (Yeast)

      36        986  Vibrio cholerae serotype O1 (strain ATCC 39315 / El Tor Inaba N16961)

      37        948  Emericella nidulans  

      38        941  Staphylococcus aureus (strain Mu50 / ATCC 700699)

      39        930  Salmonella paratyphi A (strain ATCC 9150 / SARB42)

      40        929  Staphylococcus aureus (strain N315)

      41        927  Ashbya gossypii (strain ATCC 10895 / CBS 109.51 / FGSC 9923 / NRRL Y-1056)  

      42        919  Kluyveromyces lactis   

      43        909  Acanthamoeba polyphaga mimivirus (APMV)

      44        903  Staphylococcus aureus (strain COL)

      45        896  Staphylococcus aureus (strain MW2)

      46        894  Oryctolagus cuniculus (Rabbit)

      47        894  Escherichia coli O6:K15:H31 (strain 536 / UPEC)

      48        890  Staphylococcus aureus (strain MSSA476)

      49        888  Staphylococcus aureus (strain MRSA252)

      50        883  Rhizobium meliloti (strain 1021) (Ensifer meliloti) (Sinorhizobium meliloti)

      51        882  Salmonella choleraesuis (strain SC-B67)

      52        879  Shigella sonnei (strain Ss046)

      53        878  Candida glabrata   

      54        874  Neurospora crassa 

      55        863  Yersinia pseudotuberculosis serotype I (strain IP32953)

      56        846  Oryza sativa subsp. indica (Rice)

      57        841  Escherichia coli O9:H4 (strain HS)

      58        834  Escherichia coli O139:H28 (strain E24377A / ETEC)

      59        833  Neosartorya fumigata (strain ATCC MYA-4609 / Af293 / CBS 101355 / FGSC A1100) 

      60        831  Zea mays (Maize)

      61        829  Shigella boydii serotype 4 (strain Sb227)

      62        827  Canis lupus familiaris (Dog) (Canis familiaris)

      63        825  Escherichia coli (strain UTI89 / UPEC)

      64        822  Shigella dysenteriae serotype 1 (strain Sd197)

      65        819  Escherichia coli (strain ATCC 8739 / DSM 1576 / Crooks)

      66        804  Streptomyces coelicolor (strain ATCC BAA-471 / A3(2) / M145)

      67        803  Pectobacterium atrosepticum (strain SCRI 1043 / ATCC BAA-672) 

      68        799  Staphylococcus aureus (strain NCTC 8325)

      69        793  Vibrio parahaemolyticus serotype O3:K6 (strain RIMD 2210633)

      70        791  Escherichia coli (strain SMS-3-5 / SECEC)

      71        787  Aquifex aeolicus (strain VF5)

      72        771  Escherichia coli O127:H6 (strain E2348/69 / EPEC)

      73        771  Escherichia coli (strain K12 / DH10B)

      74        770  Pasteurella multocida (strain Pm70)

      75        765  Escherichia coli O17:K52:H18 (strain UMN026 / ExPEC)

      76        765  Escherichia coli (strain K12 / MC4100 / BW2952)

      77        762  Escherichia coli (strain 55989 / EAEC)

      78        761  Escherichia coli O8 (strain IAI1)

      79        760  Staphylococcus epidermidis (strain ATCC 35984 / RP62A)

      80        760  Shigella flexneri serotype 5b (strain 8401)

      81        760  Staphylococcus epidermidis (strain ATCC 12228)

      82        758  Escherichia coli O45:K1 (strain S88 / ExPEC)

      83        756  Escherichia coli (strain SE11)

      84        753  Escherichia coli O7:K1 (strain IAI39 / ExPEC)

      85        751  Bacillus anthracis

      86        748  Photorhabdus laumondii subsp. laumondii (strain DSM 15139 / CIP 105565 / TT01)

      87        748  Escherichia coli O157:H7 (strain EC4115 / EHEC)

      88        742  Bacillus halodurans 

      89        739  Yersinia enterocolitica serotype O:8 / biotype 1B (strain NCTC 13174 / 8081)

      90        733  Vibrio vulnificus (strain CMCP6)

      91        731  Escherichia coli O81 (strain ED1a)

      92        725  Pseudomonas putida (strain ATCC 47054 / DSM 6125 / NCIMB 11950 / KT2440)

      93        722  Salmonella enteritidis PT4 (strain P125109)

      94        718  Vibrio vulnificus (strain YJ016)

      95        716  Salmonella paratyphi B (strain ATCC BAA-1250 / SPB7)

      96        715  Klebsiella pneumoniae subsp. pneumoniae (strain ATCC 700721 / MGH 78578)

      97        715  Enterobacter sp. (strain 638)

      98        715  Yersinia pestis bv. Antiqua (strain Nepal516)

      99        714  Salmonella paratyphi A (strain AKU_12601)

     100        714  Escherichia coli O1:K1 / APEC

     101        713  Yersinia pseudotuberculosis serotype O:1b (strain IP 31758)

     102        713  Salmonella newport (strain SL254)

     103        713  Salmonella agona (strain SL483)

     104        712  Salmonella schwarzengrund (strain CVM19633)

     105        711  Yersinia pestis bv. Antiqua (strain Antiqua)

     106        710  Salmonella heidelberg (strain SL476)

     107        702  Salmonella dublin (strain CT_02021853)

     108        698  Shigella boydii serotype 18 (strain CDC 3083-94 / BS512)

     109        698  Klebsiella pneumoniae (strain 342)

     110        695  Escherichia fergusonii (strain ATCC 35469 / DSM 13698 / CDC 0568-73)

     111        693  Pan troglodytes (Chimpanzee)

     112        692  Nostoc sp. (strain PCC 7120 / SAG 25.82 / UTEX 2576)

     113        686  Mycoplasma pneumoniae (strain ATCC 29342 / M129)

     114        684  Salmonella gallinarum (strain 287/91 / NCTC 13346)

     115        680  Pseudomonas syringae pv. tomato (strain ATCC BAA-871 / DC3000)

     116        678  Escherichia coli

     117        678  Citrobacter koseri (strain ATCC BAA-895 / CDC 4225-83 / SGSC4696)

     118        677  Staphylococcus aureus (strain USA300)

     119        672  Serratia proteamaculans (strain 568)

     120        669  Mycobacterium leprae (strain TN)

     121        668  Bacillus cereus 

     122        667  Yersinia pestis (strain Pestoides F)

     123        664  Bradyrhizobium diazoefficiens 

     124        664  Yarrowia lipolytica (strain CLIB 122 / E 150) (Yeast) (Candida lipolytica)

     125        657  Sinorhizobium fredii (strain NBRC 101917 / NGR234)

     126        653  Debaryomyces hansenii   

     127        650  Shewanella oneidensis (strain MR-1)

     128        650  Agrobacterium fabrum (strain C58 / ATCC 33970) (Agrobacterium tumefaciens 

     129        643  Staphylococcus aureus (strain bovine RF122 / ET3-1)

     130        642  Salmonella arizonae (strain ATCC BAA-731 / CDC346-86 / RSK2980)

     131        640  Yersinia pseudotuberculosis serotype O:3 (strain YPIII)

     132        634  Yersinia pseudotuberculosis serotype IB (strain PB1/+)

     133        622  Cronobacter sakazakii (strain ATCC BAA-894) (Enterobacter sakazakii)

     134        622  Treponema pallidum (strain Nichols)

     135        622  Methanothermobacter thermautotrophicus  

     136        618  Listeria monocytogenes serovar 1/2a (strain ATCC BAA-679 / EGD-e)

     137        615  Xanthomonas campestris pv. campestris 

     138        614  Staphylococcus haemolyticus (strain JCSC1435)

     139        613  Mesorhizobium japonicum  (Mesorhizobium loti 

     140        608  Helicobacter pylori (strain ATCC 700392 / 26695) (Campylobacter pylori)

     141        603  Ralstonia solanacearum (strain GMI1000) (Pseudomonas solanacearum)

     142        603  Listeria innocua serovar 6a (strain ATCC BAA-680 / CLIP 11262)

     143        602  Staphylococcus saprophyticus subsp. saprophyticus 

     144        602  Photobacterium profundum (strain SS9)

     145        601  Salmonella paratyphi C (strain RKS4594)

     146        600  Pseudomonas aeruginosa (strain UCBPP-PA14)

     147        600  Yersinia pestis bv. Antiqua (strain Angola)

     148        595  Bacillus cereus (strain ATCC 10987 / NRS 248)

     149        591  Pectobacterium carotovorum subsp. carotovorum (strain PC1)

     150        584  Rickettsia prowazekii (strain Madrid E)

     151        582  Neisseria meningitidis serogroup B (strain MC58)

     152        579  Caenorhabditis briggsae

     153        579  Brucella suis biovar 1 (strain 1330)

     154        574  Brucella melitensis biotype 1 (strain 16M / ATCC 23456 / NCTC 10094)

     155        573  Aliivibrio fischeri (strain ATCC 700601 / ES114) (Vibrio fischeri)

     156        572  Buchnera aphidicola subsp. Acyrthosiphon pisum (strain APS) 

     157        572  Caulobacter vibrioides (strain ATCC 19089 / CB15) (Caulobacter crescentus)

     158        569  Bacillus thuringiensis subsp. konkukian (strain 97-27)

     159        568  Helicobacter pylori (strain J99 / ATCC 700824) (Campylobacter pylori J99)

     160        567  Pseudomonas syringae pv. syringae (strain B728a)

     161        564  Bacillus licheniformis 

     162        562  Mycolicibacterium smegmatis (strain ATCC 700084 / mc(2)155) 

     163        562  Buchnera aphidicola subsp. Schizaphis graminum (strain Sg)

     164        562  Bacillus cereus (strain ZK / E33L)

     165        559  Clostridium acetobutylicum 

     166        559  Thermotoga maritima (strain ATCC 43589 / MSB8 / DSM 3109 / JCM 10099)

     167        557  Xanthomonas axonopodis pv. citri (strain 306)

     168        555  Pseudomonas fluorescens (strain Pf0-1)

     169        554  Neisseria meningitidis serogroup A / serotype 4A (strain Z2491)

     170        553  Oceanobacillus iheyensis 

     171        552  Pseudomonas fluorescens (strain ATCC BAA-477 / NRRL B-23932 / Pf-5)

     172        547  Pseudomonas savastanoi pv. phaseolicola  (Pseudomonas syringae pv. phaseolicola 

     173        540  Lactococcus lactis subsp. lactis (strain IL1403) (Streptococcus lactis)

     174        531  Erwinia tasmaniensis (strain DSM 17950 / CIP 109463 / Et1/99)

     175        531  Corynebacterium glutamicum 

     176        530  Aspergillus oryzae (strain ATCC 42149 / RIB 40) (Yellow koji mold)

     177        529  Sodalis glossinidius (strain morsitans)

     178        529  Listeria monocytogenes serotype 4b (strain F2365)

     179        528  Bordetella bronchiseptica (strain ATCC BAA-588 / NCTC 13252 / RB50) 

     180        524  Staphylococcus aureus (strain Newman)

     181        522  Xylella fastidiosa (strain 9a5c)

     182        521  Vibrio cholerae serotype O1 (strain ATCC 39541 / Classical Ogawa 395 / O395)

     183        518  Methanosarcina acetivorans (strain ATCC 35395 / DSM 2834 / JCM 12185 / C2A)

     184        517  Chromobacterium violaceum 

     185        516  Bordetella pertussis (strain Tohama I / ATCC BAA-589 / NCTC 13251)

     186        516  Deinococcus radiodurans 

     187        515  Xylella fastidiosa (strain Temecula1 / ATCC 700964)

     188        512  Pseudomonas aeruginosa (strain PA7)

     189        511  Streptomyces avermitilis 

     190        511  Streptococcus pneumoniae serotype 4 (strain ATCC BAA-334 / TIGR4)

     191        510  Haemophilus ducreyi (strain 35000HP / ATCC 700724)

     192        510  Geobacillus kaustophilus (strain HTA426)

     193        508  Bordetella parapertussis (strain 12822 / ATCC BAA-587 / NCTC 13253)

     194        507  Buchnera aphidicola subsp. Baizongia pistaciae (strain Bp)

     195        502  Pseudomonas entomophila (strain L48)

     196        501  Pyrococcus furiosus (strain ATCC 43587 / DSM 3638 / JCM 8422 / Vc1)

     197        499  Brucella abortus biovar 1 (strain 9-941)

     198        498  Acinetobacter baylyi (strain ATCC 33305 / BD413 / ADP1)

     199        497  Haemophilus influenzae (strain 86-028NP)

     200        496  Bacillus clausii (strain KSM-K16)

     201        496  Burkholderia pseudomallei (strain K96243)

     202        496  Rickettsia conorii (strain ATCC VR-613 / Malish 7)

     203        494  Proteus mirabilis (strain HI4320)

     204        493  Xanthomonas campestris pv. campestris (strain 8004)

     205        493  Pyrococcus horikoshii 

     206        492  Bacillus velezensis (strain DSM 23117 / BGSC 10A6 / FZB42) 

     207        491  Vibrio campbellii (strain ATCC BAA-1116 / BB120)

     208        491  Halobacterium salinarum (strain ATCC 700922 / JCM 11081 / NRC-1) 

     209        488  Methanosarcina mazei  

     210        487  Shewanella sp. (strain MR-7)

     211        486  Brucella abortus (strain 2308)

     212        486  Mannheimia succiniciproducens (strain MBEL55E)

     213        485  Thermosynechococcus elongatus (strain BP-1)

     214        484  Staphylococcus aureus (strain Mu3 / ATCC 700698)

     215        484  Shewanella sp. (strain MR-4)

     216        484  Pseudomonas aeruginosa (strain LESB58)

     217        483  Mycoplasma genitalium (strain ATCC 33530 / G-37 / NCTC 10195)

     218        482  Streptococcus pneumoniae (strain ATCC BAA-255 / R6)

     219        482  Saccharolobus solfataricus (strain ATCC 35092 / DSM 1617 / JCM 11322 / P2) 

     220        480  Solanum lycopersicum (Tomato) (Lycopersicon esculentum)

     221        480  Synechococcus elongatus (strain PCC 7942 / FACHB-805) (Anacystis nidulans R2)

     222        480  Lactobacillus plantarum (strain ATCC BAA-793 / NCIMB 8826 / WCFS1)

     223        478  Pseudomonas putida (strain ATCC 700007 / DSM 6899 / BCRC 17059 / F1)

     224        478  Nicotiana tabacum (Common tobacco)

     225        477  Pyrococcus abyssi (strain GE5 / Orsay)

     226        475  Cupriavidus necator (strain ATCC 17699 / H16 / DSM 428 / Stanier 337) 

     227        474  Burkholderia lata 

     228        472  Rhodopseudomonas palustris (strain ATCC BAA-98 / CGA009)

     229        469  Rhodobacter sphaeroides (strain ATCC 17023 / 2.4.1 / NCIB 8253 / DSM 158)

     230        468  Clostridium perfringens (strain 13 / Type A)

     231        468  Pseudomonas putida (strain GB-1)

     232        467  Campylobacter jejuni subsp. jejuni serotype O:2 

     233        467  Aeromonas hydrophila subsp. hydrophila 

     234        467  Enterococcus faecalis (strain ATCC 700802 / V583)

     235        467  Shewanella frigidimarina (strain NCIMB 400)

     236        466  Xanthomonas campestris pv. vesicatoria (strain 85-10)

     237        466  Shewanella sp. (strain ANA-3)

     238        465  Trichormus variabilis (strain ATCC 29413 / PCC 7937) (Anabaena variabilis)

     239        463  Burkholderia mallei (strain ATCC 23344)

     240        459  Cupriavidus pinatubonensis (strain JMP 134 / LMG 1197) (Cupriavidus necator 

     241        459  Ovis aries (Sheep)

     242        458  Methylococcus capsulatus (strain ATCC 33009 / NCIMB 11132 / Bath)

     243        457  Rickettsia felis (strain ATCC VR-1525 / URRWXCal2) (Rickettsia azadi)

     244        455  Xanthomonas oryzae pv. oryzae (strain MAFF 311018)

     245        455  Staphylococcus aureus (strain JH1)

     246        455  Shewanella baltica (strain OS185)

     247        453  Streptococcus mutans serotype c (strain ATCC 700610 / UA159)

     248        453  Pseudomonas putida (strain W619)

     249        452  Aeromonas salmonicida (strain A449)

     250        450  Caldanaerobacter subterraneus subsp. tengcongensis  





   

   2.3  Taxonomic distribution of the sequences



   



   Kingdom        sequences (% of the database)

    Archaea           19621 (  3%)

    Bacteria         334554 ( 59%)

    Eukaryota        191572 ( 34%)

    Viruses           17008 (  3%)





   Within Eukaryota:



   



    Category            sequences (% of Eukaryota) (% of the complete database)

     Human                  20369 ( 11%)           (  4%)

     Other Mammalia         46908 ( 24%)           (  8%)

     Other Vertebrata       18556 ( 10%)           (  3%)

     Viridiplantae          40435 ( 21%)           (  7%)

     Fungi                  34686 ( 18%)           (  6%)

     Insecta                 9365 (  5%)           (  2%)

     Nematoda                5038 (  3%)           (  1%)

     Other                  16215 (  8%)           (  3%)
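
   The two percentage columns use different denominators: the first is
   relative to the Eukaryota total, the second to the complete database.
   A minimal Python sketch of both calculations, assuming whole-number
   rounding as shown in the table (names are illustrative):

```python
total_entries = 562755  # sequence entries in release 2020_03

# Sequence counts per category within Eukaryota, copied from the table above.
eukaryota = {
    "Human": 20369,
    "Other Mammalia": 46908,
    "Other Vertebrata": 18556,
    "Viridiplantae": 40435,
    "Fungi": 34686,
    "Insecta": 9365,
    "Nematoda": 5038,
    "Other": 16215,
}

eukaryota_total = sum(eukaryota.values())

# For each category: (% of Eukaryota, % of the complete database).
shares = {
    name: (round(100 * n / eukaryota_total), round(100 * n / total_entries))
    for name, n in eukaryota.items()
}

for name, (pct_euk, pct_db) in shares.items():
    print(f"{name:<18}{eukaryota[name]:>8} ({pct_euk:>3}%) ({pct_db:>3}%)")
```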







3.  SEQUENCE SIZE



   Distribution of the sequences by size (excluding fragments)



               From   To  Number             From   To   Number

                  1-  50    9711             1001-1100     4032

                 51- 100   42881             1101-1200     2823

                101- 150   59346             1201-1300     2158

                151- 200   59120             1301-1400     2028

                201- 250   57986             1401-1500     1639

                251- 300   51783             1501-1600      804

                301- 350   52144             1601-1700      621

                351- 400   45226             1701-1800      567

                401- 450   37238             1801-1900      486

                451- 500   30036             1901-2000      383

                501- 550   21780             2001-2100      255

                551- 600   15502             2101-2200      353

                601- 650   12928             2201-2300      327

                651- 700    9277             2301-2400      221

                701- 750    7703             2401-2500      172

                751- 800    5583             >2500         1352

                801- 850    4818

                851- 900    5232

                901- 950    4064

                951-1000    2952



   





   The average sequence length in UniProtKB/Swiss-Prot is 360 amino acids.



   The shortest sequence is   GWA_SEPOF (P83570):     2 amino acids.

   The longest sequence is  TITIN_MOUSE (A2ASS6): 35213 amino acids.
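
   The average length follows from the totals given in the introduction.
   A minimal Python sketch, assuming the average is the total number of
   amino acids divided by the total number of entries (fragments included);
   it also derives how many entries remain once fragments are excluded, as
   in the size table. Names are illustrative.

```python
total_amino_acids = 202599198  # from the introduction
total_entries = 562755         # from the introduction
fragments = 9224               # "Number of fragments" in the introduction

# Mean sequence length over all entries.
average_length = total_amino_acids / total_entries

# Entries left after excluding fragments.
non_fragment_entries = total_entries - fragments

print(f"average length: {average_length:.0f} amino acids")
print(f"non-fragment entries: {non_fragment_entries}")
```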





4.  JOURNAL CITATIONS



   Note: the following citation statistics reflect the number of distinct

         journal citations.



   Total number of journals cited in this release of UniProtKB/Swiss-Prot: 2874





   4.1 Table of the frequency of journal citations



        Journals cited 1x:  917

                       2x:  404

                       3x:  174

                       4x:  134

                       5x:  117

                       6x:   84

                       7x:   77

                       8x:   65

                       9x:   39

                      10x:   29

                  11- 20x:  236

                  21- 50x:  235

                  51-100x:  125

                    >100x:  238





   4.2  List of the most cited journals in UniProtKB/Swiss-Prot



   No.   Citations   Journal name

   --    ---------   -------------------------------------------------------------

    1        25296   Journal of Biological Chemistry

    2        11773   Proceedings of the National Academy of Sciences of the U.S.A.

    3         6772   Journal of Bacteriology

    4         5748   Biochemical and Biophysical Research Communications

    5         5474   Biochemistry

    6         5081   Nucleic Acids Research

    7         4886   FEBS Letters

    8         4832   Gene

    9         4748   The EMBO Journal

   10         4673   Nature

   11         4416   Molecular and Cellular Biology

   12         4361   Journal of Molecular Biology

   13         3753   Biochimica et Biophysica Acta

   14         3641   Cell

   15         3433   European Journal of Biochemistry

   16         3313   Journal of Virology

   17         3101   Science

   18         2937   Biochemical Journal

   19         2642   Plant Physiology

   20         2621   Molecular Microbiology

   21         2537   Genomics

   22         2259   The American Journal of Human Genetics

   23         2235   Journal of Cell Biology

   24         2109   PLoS ONE

   25         2064   The Plant Cell

   26         1914   The Plant Journal

   27         1881   Plant Molecular Biology

   28         1880   Human Molecular Genetics

   29         1865   Genes and Development

   30         1824   Virology

   31         1790   Nature Genetics

   32         1701   Molecular Biology of the Cell

   33         1695   Development

   34         1596   Molecular Cell

   35         1584   Human Mutation

   36         1545   Oncogene

   37         1525   Journal of Immunology

   38         1403   Molecular and General Genetics

   39         1384   Structure

   40         1366   Journal of Biochemistry

   41         1334   Genetics

   42         1291   Journal of Cell Science

   43         1186   Blood

   44         1161   Infection and Immunity

   45         1151   Journal of General Virology

   46         1097   Microbiology

   47         1080   Archives of Biochemistry and Biophysics

   48         1079   Current Biology

   49         1068   Developmental Biology

   50          945   Journal of Neuroscience

   51          939   Applied and Environmental Microbiology

   52          916   Acta Crystallographica, Section D

   53          894   Cancer Research

   54          854   FEMS Microbiology Letters

   55          840   Yeast

   56          801   Toxicon

   57          794   Protein Science

   58          775   Neuron

   59          765   Journal of Clinical Investigation

   60          724   PLoS Genetics

   61          719   Plant and Cell Physiology

   62          712   American Journal of Physiology

   63          705   Human Genetics

   64          681   The Journal of Experimental Medicine

   65          650   Mechanisms of Development

   66          646   Nature Structural Biology

   67          645   Proteins

   68          641   Journal of Medical Genetics

   69          596   Nature Communications

   70          593   Nature Cell Biology

   71          576   Current Genetics

   72          570   Nature Structural and Molecular Biology

   73          563   Bioscience, Biotechnology, and Biochemistry

   74          552   The FEBS Journal

   75          549   Journal of Neurochemistry

   76          540   Molecular Endocrinology

   77          536   Developmental Cell

   78          529   The Journal of Clinical Endocrinology and Metabolism

   79          514   Endocrinology

   80          502   Scientific Reports

   81          501   Antimicrobial Agents and Chemotherapy

   82          488   Mammalian Genome

   83          468   Experimental Cell Research

   84          454   Molecular and Biochemical Parasitology

   85          438   Peptides

   86          438   Eukaryotic Cell

   87          435   Planta

   88          430   RNA

   89          430   Immunogenetics

   90          429   PLoS Pathogens

   91          412   Journal of the American Chemical Society

   92          407   Journal of Experimental Botany

   93          405   Journal of Molecular Evolution

   94          401   Molecular Biology and Evolution

   95          390   Molecular Pharmacology

   96          389   DNA and Cell Biology

   97          383   EMBO Reports

   98          382   American Journal of Medical Genetics. Part A

   99          382   The FASEB Journal

  100          376   Acta Crystallographica, Section F

  101          375   DNA Sequence

  102          375   Neurology

  103          373   Journal of Investigative Dermatology

  104          372   Molecular Plant-Microbe Interactions

  105          367   European Journal of Human Genetics

  106          355   Comparative Biochemistry and Physiology

  107          354   Immunity

  108          353   Biology of Reproduction

  109          340   Brain Research. Molecular Brain Research

  110          339   Biochimie

  111          331   Virus Research

  112          330   Genes to Cells

  113          324   The New England Journal of Medicine

  114          322   Clinical Genetics

  115          320   Developmental Dynamics

  116          305   Annals of Neurology

  117          300   Genome Research

  118          298   Biological Chemistry Hoppe-Seyler

  119          291   Journal of Lipid Research

  120          289   European Journal of Immunology

  121          288   BMC Genomics

  122          287   Nature Immunology

  123          282   Cytogenetics and Cell Genetics

  124          281   Investigative Ophthalmology and Visual Science

  125          273   Applied Microbiology and Biotechnology

  126          272   Journal of General Microbiology

  127          267   Journal of Human Genetics

  128          267   Cell Reports

  129          266   Journal of Medicinal Chemistry

  130          264   PLoS Biology

  131          252   Glycobiology

  132          250   Archives of Microbiology

  133          240   Traffic

  134          239   Molecular Immunology

  135          238   Journal of Cellular Biochemistry

  136          233   Molecular Genetics and Metabolism

  137          232   DNA Research

  138          225   Cell Cycle

  139          222   Protein Expression and Purification

  140          221   Diabetes

  141          219   Phytochemistry

  142          218   Hoppe-Seyler's Zeitschrift fur Physiologische Chemie

  143          217   Circulation Research

  144          217   Archives of Virology

  145          215   Nature Medicine

  146          213   Fungal Genetics and Biology

  147          208   Molecular and Cellular Endocrinology

  148          201   

  149          200   Nature Chemical Biology

  150          199   Molecular Genetics and Genomics





5.  STATISTICS FOR SOME LINE TYPES



The following table summarizes the total number of some UniProtKB/Swiss-Prot lines,

as well as the number of entries with at least one such line, and the

frequency of the lines.



                                      Total    Number of  Average

   Line type / subtype                number   entries    per entry  Rank

------------------------------------  -------- ---------  ---------  ----



References (RL)                      1244302                 2.21                                         

   Journal                           1068224     460508      1.90       1                                 

   Submitted to EMBL/GenBank/DDBJ     165011     149391      0.29       2                                 

   Submitted to other databases         7508       6919      0.01       3                                 

   Book citation                        1834       1811     <0.01       4                                 

   Plant Gene Register                   612        599     <0.01       5                                 

   Unpublished observations              460        456     <0.01       6                                 

   Thesis                                437        434     <0.01       7                                 

   Patent                                210        204     <0.01       8                                 

   Worm Breeder's Gazette                  6          6     <0.01       9                                 
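
The "Average per entry" column appears to be the total number of lines divided by the total number of entries in the release, not by the number of entries carrying such a line. A minimal Python sketch under that assumption; the "<0.01" cut-off is inferred from the table and the function name is hypothetical.

```python
TOTAL_ENTRIES = 562755  # sequence entries in release 2020_03


def average_per_entry(total_lines: int, total_entries: int = TOTAL_ENTRIES) -> str:
    """Format lines per entry in the style of the table (assumed convention)."""
    avg = total_lines / total_entries
    # Assumption: averages that round to less than 0.01 are printed as "<0.01".
    if round(avg, 2) < 0.01:
        return "<0.01"
    return f"{avg:.2f}"


# A few totals copied from the reference table above.
reference_lines = {
    "References (RL)": 1244302,
    "Journal": 1068224,
    "Submitted to EMBL/GenBank/DDBJ": 165011,
    "Submitted to other databases": 7508,
    "Book citation": 1834,
}

for name, total in reference_lines.items():
    print(f"{name:<34}{total:>9}  {average_per_entry(total):>6}")
```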



Total number of distinct authors cited in UniProtKB/Swiss-Prot: 424821



                                      Total    Number of  Average

   Line type / subtype                number   entries    per entry  Rank

------------------------------------  -------- ---------  ---------  ----

Comments (CC)                        2602188                 4.62                                         

   ACTIVITY REGULATION                 15695      15690      0.03      17                                 

   ALLERGEN                              906        906     <0.01      26                                 

   ALTERNATIVE PRODUCTS                25423      25423      0.05      13                                 

   BIOPHYSICOCHEMICAL PROPERTIES        9096       9089      0.02      20                                 

   BIOTECHNOLOGY                        1257       1240     <0.01      25                                 

   CATALYTIC ACTIVITY                 303256     242532      0.54       4                                 

   CAUTION                             13402      13125      0.02      18                                 

   COFACTOR                           127417     115829      0.23       7                                 

   DEVELOPMENTAL STAGE                 12691      12682      0.02      19                                 

   DISEASE                              7426       4985      0.01      21                                 

   DISRUPTION PHENOTYPE                16105      16101      0.03      16                                 

   DOMAIN                              51177      43830      0.09       9                                 

   FUNCTION                           472478     451041      0.84       2                                 

   INDUCTION                           22345      22314      0.04      15                                 

   INTERACTION                         22481      22481      0.04      14                                 

   MASS SPECTROMETRY                    6970       5365      0.01      23                                 

   MISCELLANEOUS                       43933      38643      0.08      12                                 

   PATHWAY                            140428     127112      0.25       6                                 

   PHARMACEUTICAL                        136        130     <0.01      29                                 

   POLYMORPHISM                         1279       1224     <0.01      24                                 

   PTM                                 58815      42950      0.10       8                                 

   RNA EDITING                           628        628     <0.01      28                                 

   SEQUENCE CAUTION                    44483      44411      0.08      11                                 

   SIMILARITY                         511586     507405      0.91       1                                 

   SUBCELLULAR LOCATION               352792     345069      0.63       3                                 

   SUBUNIT                            284674     281266      0.51       5                                 

   TISSUE SPECIFICITY                  47397      47337      0.08      10                                 

   TOXIC DOSE                            753        621     <0.01      27                                 

   WEB RESOURCE                         7159       5982      0.01      22                                 



Total number of comment topics: 29
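
The "Average per entry" column appears to be the total number of lines divided by the 562755 entries in this release, rounded to two decimals, with values that round to 0.00 printed as "<0.01". This is inferred from the figures in the table, not from a documented rule. A minimal Python sketch under that assumption (the function name is hypothetical):

```python
# Number of entries in release 2020_03, taken from the introduction.
TOTAL_ENTRIES = 562755


def average_per_entry(total_lines: int, entries: int = TOTAL_ENTRIES) -> str:
    """Format total/entries as in the 'Average per entry' column.

    Assumption: values that round to 0.00 are printed as '<0.01'.
    """
    rounded = f"{total_lines / entries:.2f}"
    return "<0.01" if rounded == "0.00" else rounded


# Rows taken from the Comments (CC) table above.
for topic, total in [("Comments (CC)", 2602188),
                     ("CATALYTIC ACTIVITY", 303256),
                     ("ALLERGEN", 906)]:
    print(f"{topic:<20} {average_per_entry(total)}")
```

The same calculation reproduces the printed averages for the feature and cross-reference tables, for example 4629208 feature lines giving 8.23.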





                                      Total    Number of  Average

   Line type / subtype                number   entries    per entry  Rank

------------------------------------  -------- ---------  ---------  ----

Features (FT)                        4629208                 8.23                                         

   ACT_SITE                           166384     100744      0.30      10                                 

   BINDING                            413818     109917      0.74       2                                 

   CA_BIND                              4211       1747      0.01      36                                 

   CARBOHYD                           118407      30386      0.21      15                                 

   CHAIN                              570910     555513      1.01       1                                 

   COILED                              22126      15318      0.04      27                                 

   COMPBIAS                            59211      31860      0.11      21                                 

   CONFLICT                           136856      47768      0.24      13                                 

   CROSSLNK                            23916       8594      0.04      26                                 

   DISULFID                           127890      34235      0.23      14                                 

   DNA_BIND                            11929      10670      0.02      32                                 

   DOMAIN                             205049     126188      0.36       8                                 

   HELIX                              274878      24982      0.49       6                                 

   INIT_MET                            17384      17336      0.03      28                                 

   INTRAMEM                             2799       1283     <0.01      37                                 

   LIPID                               13266       8562      0.02      30                                 

   METAL                              403361      97471      0.72       3                                 

   MOD_RES                            255220      72666      0.45       7                                 

   MOTIF                               44626      29149      0.08      23                                 

   MUTAGEN                             77197      16638      0.14      18                                 

   NON_CONS                             2523        820     <0.01      38                                 

   NON_STD                               358        283     <0.01      39                                 

   NON_TER                             12545       9615      0.02      31                                 

   NP_BIND                            159575      86470      0.28      11                                 

   PEPTIDE                             11910       8166      0.02      33                                 

   PROPEP                              14545      12402      0.03      29                                 

   REGION                             203976      95724      0.36       9                                 

   REPEAT                             106670      14883      0.19      16                                 

   SIGNAL                              42755      42754      0.08      24                                 

   SITE                                60330      32602      0.11      20                                 

   STRAND                             284625      23547      0.51       5                                 

   TOPO_DOM                           144117      29375      0.26      12                                 

   TRANSIT                              9222       9106      0.02      34                                 

   TRANSMEM                           374217      78244      0.66       4                                 

   TURN                                66264      20317      0.12      19                                 

   UNSURE                               5573        838      0.01      35                                 

   VAR_SEQ                             52275      22189      0.09      22                                 

   VARIANT                             98178      17164      0.17      17                                 

   ZN_FING                             30112      12867      0.05      25                                 



Total number of feature keys: 39







                                      Total    Number of  Average

   Line type / subtype                number   entries    per entry  Rank      Category

------------------------------------  -------- ---------  ---------  ----      -------------------------------------------

Cross-references (DR)               18928213                33.63                                                           

   ABCD                                 1896       1896     <0.01     124      Protocols and materials databases            

   Allergome                            1985       1283     <0.01     123      Protein family/group databases               

   Antibodypedia                       32082      31968      0.06      57      Protocols and materials databases            

   ArachnoServer                        1163       1154     <0.01     133      Organism-specific databases                  

   Araport                             16003      15907      0.03      84      Organism-specific databases                  

   Bgee                                56889      56889      0.10      42      Gene expression databases                    

   BindingDB                            5261       5261      0.01     101      Chemistry                                    

   BioCyc                             270351     244737      0.48      22      Enzyme and pathway databases                 

   BioGRID                             54865      53166      0.10      46      Protein-protein interaction databases        

   BioGRID-ORCS                        38898      38400      0.07      55      Other                                        

   BioMuta                             20313      20298      0.04      72      Polymorphism and mutation databases          

   BRENDA                              12975      12193      0.02      87      Enzyme and pathway databases                 

   CarbonylDB                           1157       1157     <0.01     135      PTM databases                                

   CAZy                                 9529       8588      0.02      91      Protein family/group databases               

   CCDS                                48836      34355      0.09      49      Sequence databases                           

   CDD                                184724     167613      0.33      28      Family and domain databases                  

   CGD                                  2000       1983     <0.01     122      Organism-specific databases                  

   ChEMBL                               7818       7655      0.01      95      Chemistry                                    

   ChiTaRS                             29624      29587      0.05      59      Other                                        

   CLAE                                  356        353     <0.01     150      Protein family/group databases               

   CollecTF                              135        135     <0.01     157      Gene expression databases                    

   ComplexPortal                       10221       5660      0.02      90      Protein-protein interaction databases        

   COMPLUYEAST-2DPAGE                     97         97     <0.01     158      2D gel databases                             

   ConoServer                            957        873     <0.01     137      Organism-specific databases                  

   CORUM                                5805       5805      0.01     100      Protein-protein interaction databases        

   CPTAC                                2221       1423     <0.01     120      Proteomic databases                          

   CTD                                 75272      74371      0.13      40      Organism-specific databases                  

   DEPOD                                 239        239     <0.01     155      PTM databases                                

   dictyBase                            4214       4100      0.01     110      Organism-specific databases                  

   DIP                                 17435      17396      0.03      80      Protein-protein interaction databases        

   DisGeNET                            15646      15415      0.03      85      Organism-specific databases                  

   DisProt                              1379       1367     <0.01     130      Family and domain databases                  

   DMDM                                16197      16195      0.03      83      Polymorphism and mutation databases          

   DNASU                               19056      18990      0.03      74      Protocols and materials databases            

   DOSAC-COBS-2DPAGE                     145        145     <0.01     156      2D gel databases                             

   DrugBank                            28101       4590      0.05      63      Chemistry                                    

   DrugCentral                          2532       2532     <0.01     117      Chemistry                                    

   EchoBASE                             4158       4158      0.01     111      Organism-specific databases                  

   eggNOG                             667204     332867      1.19       5      Phylogenomic databases                       

   ELM                                  1811       1811     <0.01     125      Protein-protein interaction databases        

   EMBL                               987951     550647      1.76       3      Sequence databases                           

   Ensembl                             98086      51353      0.17      36      Genome annotation databases                  

   EnsemblBacteria                    356229     337021      0.63      16      Genome annotation databases                  

   EnsemblFungi                        29787      28221      0.05      58      Genome annotation databases                  

   EnsemblMetazoa                      17898      10410      0.03      78      Genome annotation databases                  

   EnsemblPlants                       28264      21164      0.05      62      Genome annotation databases                  

   EnsemblProtists                      5022       4843      0.01     103      Genome annotation databases                  

   EPD                                 21130      21130      0.04      68      Proteomic databases                          

   ESTHER                               2564       2563     <0.01     116      Protein family/group databases               

   euHCVdb                                55         44     <0.01     160      Organism-specific databases                  

   EuPathDB                            39157      38963      0.07      53      Organism-specific databases                  

   EvolutionaryTrace                   16644      16644      0.03      82      Other                                        

   ExpressionAtlas                     48079      48079      0.09      50      Gene expression databases                    

   FlyBase                              4907       4781      0.01     105      Organism-specific databases                  

   Gene3D                             411415     319697      0.73      14      Family and domain databases                  

   GeneCards                           20331      20159      0.04      70      Organism-specific databases                  

   GeneDB                                589        533     <0.01     143      Genome annotation databases                  

   GeneID                             288063     279084      0.51      20      Genome annotation databases                  

   GeneReviews                          1479       1475     <0.01     126      Organism-specific databases                  

   GeneTree                            59856      59812      0.11      41      Phylogenomic databases                       

   Genevisible                         55242      55242      0.10      45      Gene expression databases                    

   GeneWiki                            10350      10267      0.02      89      Other                                        

   GenomeRNAi                          22160      22160      0.04      66      Other                                        

   GlyConnect                           2243       2107     <0.01     118      PTM databases                                

   GO                                3062717     537676      5.44       1      Ontologies                                   

   Gramene                             28264      21164      0.05      61      Genome annotation databases                  

   GuidetoPHARMACOLOGY                  2012       2012     <0.01     121      Chemistry                                    

   HAMAP                              330290     327370      0.59      17      Family and domain databases                  

   HGNC                                20315      20176      0.04      71      Organism-specific databases                  

   HOGENOM                            423356     423356      0.75      12      Phylogenomic databases                       

   HPA                                 18987      18852      0.03      75      Organism-specific databases                  

   IDEAL                                 985        985     <0.01     136      Family and domain databases                  

   IMGT_GENE-DB                          267        267     <0.01     154      Protein family/group databases               

   InParanoid                         140095     140095      0.25      29      Phylogenomic databases                       

   IntAct                              55322      55322      0.10      44      Protein-protein interaction databases        

   InterPro                          2309351     543883      4.10       2      Family and domain databases                  

   iPTMnet                             52671      52671      0.09      47      PTM databases                                

   jPOST                               26792      26792      0.05      64      Proteomic databases                          

   KEGG                               505480     476464      0.90       8      Genome annotation databases                  

   KO                                 406248     406147      0.72      15      Phylogenomic databases                       

   LegioList                             765        763     <0.01     140      Organism-specific databases                  

   Leproma                               672        669     <0.01     141      Organism-specific databases                  

   MaizeGDB                              520        516     <0.01     146      Organism-specific databases                  

   MalaCards                            4737       4733      0.01     106      Organism-specific databases                  

   MassIVE                             17475      17475      0.03      79      Proteomic databases                          

   MaxQB                               29595      29595      0.05      60      Proteomic databases                          

   MEROPS                              11485      11483      0.02      88      Protein family/group databases               

   MetOSite                             3107       3107      0.01     114      PTM databases                                

   MGI                                 16953      16913      0.03      81      Organism-specific databases                  

   MIM                                 21642      15284      0.04      67      Organism-specific databases                  

   MINT                                22791      22791      0.04      65      Protein-protein interaction databases        

   MoonDB                                348        348     <0.01     151      Protein family/group databases               

   MoonProt                              281        281     <0.01     152      Protein family/group databases               

   neXtProt                            20332      20332      0.04      69      Organism-specific databases                  

   NIAGADS                                68         68     <0.01     159      Organism-specific databases                  

   OGP                                   373        373     <0.01     149      2D gel databases                             

   OMA                                413869     413869      0.74      13      Phylogenomic databases                       

   OpenTargets                         18433      18280      0.03      76      Organism-specific databases                  

   Orphanet                             7658       4083      0.01      96      Organism-specific databases                  

   OrthoDB                            245253     245253      0.44      23      Phylogenomic databases                       

   PANTHER                            282905     271015      0.50      21      Family and domain databases                  

   PATRIC                              92332      92332      0.16      39      Genome annotation databases                  

   PaxDb                              125383     125383      0.22      33      Proteomic databases                          

   PDB                                192519      28974      0.34      26      3D structure databases                       

   PDBsum                             192519      28974      0.34      27      3D structure databases                       

   PeptideAtlas                        33378      33378      0.06      56      Proteomic databases                          

   PeroxiBase                            782        760     <0.01     139      Protein family/group databases               

   Pfam                               782327     523239      1.39       4      Family and domain databases                  

   PharmGKB                            18319      18301      0.03      77      Organism-specific databases                  

   Pharos                              20110      20110      0.04      73      Other                                        

   PHI-base                             1439       1194     <0.01     127      Other                                        

   PhosphoSitePlus                     39072      39072      0.07      54      PTM databases                                

   PhylomeDB                           96935      96935      0.17      37      Phylogenomic databases                       

   PIR                                124334     114074      0.22      34      Sequence databases                           

   PIRSF                              109134     108105      0.19      35      Family and domain databases                  

   PlantReactome                        1159        713     <0.01     134      Enzyme and pathway databases                 

   PomBase                              5132       5128      0.01     102      Organism-specific databases                  

   PRIDE                              234640     234640      0.42      24      Proteomic databases                          

   PRINTS                             130744     115960      0.23      31      Family and domain databases                  

   PRO                                 96893      96893      0.17      38      Other                                        

   ProMEX                                467        467     <0.01     148      Proteomic databases                          

   PROSITE                            479276     304741      0.85      10      Family and domain databases                  

   Proteomes                          499108     465757      0.89       9      Other                                        

   ProteomicsDB                        56831      35704      0.10      43      Proteomic databases                          

   PseudoCAP                            1390       1381     <0.01     129      Organism-specific databases                  

   Reactome                           130317      36866      0.23      32      Enzyme and pathway databases                 

   REBASE                                619        378     <0.01     142      Protein family/group databases               

   RefSeq                             615359     469214      1.09       6      Sequence databases                           

   REPRODUCTION-2DPAGE                  1259       1038     <0.01     131      2D gel databases                             

   RGD                                  8031       8028      0.01      94      Organism-specific databases                  

   RNAct                               43015      43015      0.08      52      Other                                        

   SABIO-RK                             4380       4380      0.01     107      Enzyme and pathway databases                 

   SFLD                                 8127       6036      0.01      93      Family and domain databases                  

   SGD                                  6740       6735      0.01      98      Organism-specific databases                  

   SignaLink                            3102       3102      0.01     115      Enzyme and pathway databases                 

   SIGNOR                               4302       4302      0.01     108      Enzyme and pathway databases                 

   SMART                              193598     142703      0.34      25      Family and domain databases                  

   SMR                                449410     449410      0.80      11      3D structure databases                       

   STRING                             329116     329116      0.58      18      Protein-protein interaction databases        

   SUPFAM                             510159     386419      0.91       7      Family and domain databases                  

   SWISS-2DPAGE                         1177       1177     <0.01     132      2D gel databases                             

   SwissLipids                          1428       1342     <0.01     128      Chemistry                                    

   SwissPalm                            8615       8615      0.02      92      PTM databases                                

   TAIR                                14782      14726      0.03      86      Organism-specific databases                  

   TCDB                                 7464       7409      0.01      97      Protein family/group databases               

   TIGRFAMs                           292753     272721      0.52      19      Family and domain databases                  

   TopDownProteomics                    3237       2960      0.01     112      Proteomic databases                          

   TreeFam                             45675      45670      0.08      51      Phylogenomic databases                       

   TubercuList                          2222       2186     <0.01     119      Organism-specific databases                  

   UCD-2DPAGE                            496        496     <0.01     147      2D gel databases                             

   UCSC                                50216      45832      0.09      48      Genome annotation databases                  

   UniCarbKB                             584        584     <0.01     144      PTM databases                                

   UniLectin                             276        276     <0.01     153      Protein family/group databases               

   UniPathway                         137831     124659      0.24      30      Enzyme and pathway databases                 

   VectorBase                            582        503     <0.01     145      Genome annotation databases                  

   VGNC                                 4275       4262      0.01     109      Organism-specific databases                  

   WBParaSite                              5          5     <0.01     161      Genome annotation databases                  

   World-2DPAGE                          931        920     <0.01     138      2D gel databases                             

   WormBase                             6301       4728      0.01      99      Organism-specific databases                  

   Xenbase                              5009       5000      0.01     104      Organism-specific databases                  

   ZFIN                                 3132       3127      0.01     113      Organism-specific databases                  



Total number of cross-referenced databases: 161



6.  AMINO ACID COMPOSITION



   6.1  Composition in percent for the complete database



   Ala (A) 8.25   Gln (Q) 3.93   Leu (L) 9.65   Ser (S) 6.63

   Arg (R) 5.53   Glu (E) 6.72   Lys (K) 5.81   Thr (T) 5.35

   Asn (N) 4.06   Gly (G) 7.08   Met (M) 2.41   Trp (W) 1.09

   Asp (D) 5.46   His (H) 2.27   Phe (F) 3.86   Tyr (Y) 2.92

   Cys (C) 1.38   Ile (I) 5.92   Pro (P) 4.73   Val (V) 6.86



   Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.00



   



   [Figure: amino acid composition chart]

   Legend: gray = aliphatic, red = acidic, green = small hydroxy,

           blue = basic, black = aromatic, white = amide, yellow = sulfur





   6.2  Classification of the amino acids by their frequency



   Leu, Ala, Gly, Val, Glu, Ser, Ile, Lys, Arg, Asp, Thr, Pro, Asn, Gln,

   Phe, Tyr, Met, His, Cys, Trp
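
The ordering in 6.2 follows directly from the percentages in 6.1. A short Python sketch that sorts the twenty standard residues by decreasing frequency (the variable names are illustrative; the values are copied from 6.1):

```python
# Composition in percent, copied from section 6.1.
composition = {
    "Ala": 8.25, "Gln": 3.93, "Leu": 9.65, "Ser": 6.63,
    "Arg": 5.53, "Glu": 6.72, "Lys": 5.81, "Thr": 5.35,
    "Asn": 4.06, "Gly": 7.08, "Met": 2.41, "Trp": 1.09,
    "Asp": 5.46, "His": 2.27, "Phe": 3.86, "Tyr": 2.92,
    "Cys": 1.38, "Ile": 5.92, "Pro": 4.73, "Val": 6.86,
}

# Sort residue names by decreasing percentage.
by_frequency = sorted(composition, key=composition.get, reverse=True)

print(", ".join(by_frequency))
```

Because the published percentages are rounded to two decimals, their sum is close to, but not exactly, 100.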





7.  MISCELLANEOUS STATISTICS



4464 entries are encoded on a mitochondrion, and 3933 are encoded on a plasmid.



12189 entries are encoded on a plastid, of which 21 are encoded on apicoplasts,

11624 on chloroplasts, 51 on organellar chromatophores, 145 on cyanelles,

149 on non-photosynthetic plastids and 199 on unspecified types of plastid.
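
As a consistency check, the plastid sub-counts above should add up to the stated total of 12189. A small Python sketch (the dictionary keys are shorthand labels, not UniProtKB vocabulary):

```python
# Plastid-encoded entries by plastid type, copied from the text above.
plastid_counts = {
    "apicoplast": 21,
    "chloroplast": 11624,
    "organellar chromatophore": 51,
    "cyanelle": 145,
    "non-photosynthetic plastid": 149,
    "unspecified plastid": 199,
}

stated_total = 12189
computed_total = sum(plastid_counts.values())

# Report whether the breakdown matches the stated total.
print(computed_total, computed_total == stated_total)
```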



Number of entries with at least one sequence correction: 79900