


         UniProtKB/Swiss-Prot protein knowledgebase release 2021_03 statistics





1.  INTRODUCTION



Release 2021_03 of 02-Jun-21 of UniProtKB/Swiss-Prot contains 565254 sequence entries,

comprising 203850821 amino acids abstracted from 279685 references. 



633 sequences have been added since release 2021_02, the sequence data of

26 existing entries has been updated and the annotations of

279556 entries have been revised.



Number of fragments: 9261

Number of additional sequences produced by alternative splicing, initiation or promoter usage, or ribosomal frameshifting: 40563





Protein existence (PE):           entries     %



1: Evidence at protein level       106971   18.9%

2: Evidence at transcript level     56260   10.0%

3: Inferred from homology          386903   68.4%

4: Predicted                        13273    2.3%

5: Uncertain                         1847    0.3%
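
The percentages in the protein existence table can be reproduced from the entry counts. The following is a minimal Python sketch, assuming each percentage is the class count divided by the total number of entries and rounded to one decimal place; the variable names are illustrative and not part of the release.

```python
# Entry counts copied from the protein existence (PE) table above.
pe_counts = {
    "1: Evidence at protein level": 106971,
    "2: Evidence at transcript level": 56260,
    "3: Inferred from homology": 386903,
    "4: Predicted": 13273,
    "5: Uncertain": 1847,
}

# The five classes together should cover every entry in the release.
total_entries = sum(pe_counts.values())

# Share of all entries in each PE class, rounded to one decimal place
# (assumption: this is how the release rounds its percentages).
pe_percent = {
    label: round(100 * count / total_entries, 1)
    for label, count in pe_counts.items()
}

for label, count in pe_counts.items():
    print(f"{label:<34}{count:>8}{pe_percent[label]:>7.1f}%")
```

The five counts add up to 565254, the number of sequence entries stated in the introduction.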



The growth of the database is summarized below.



   





2.  TAXONOMIC ORIGIN



   Total number of species represented in this release of UniProtKB/Swiss-Prot: 14085



   The first twenty species represent 121862 sequences:  21.6 % of the total

   number of entries.
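
The figure for the first twenty species can be checked against the table of the most represented species in section 2.2. A small Python sketch, with the twenty frequencies copied from that table and the total entry count taken from the introduction; the names used are illustrative only.

```python
# Frequencies of the twenty most represented species (table 2.2, ranks 1-20).
top_twenty = [
    20386, 17082, 16111, 8135, 6721, 6014, 5137, 4519, 4265, 4191,
    4150, 4119, 3641, 3471, 3204, 2297, 2258, 2218, 2045, 1898,
]

TOTAL_ENTRIES = 565254  # sequence entries in release 2021_03

top_twenty_total = sum(top_twenty)
top_twenty_share = 100 * top_twenty_total / TOTAL_ENTRIES

print(f"{top_twenty_total} sequences: {top_twenty_share:.1f} % of the total")
```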





   2.1 Table of the frequency of occurrence of species



        Species represented 1x: 5758

                            2x: 2035

                            3x: 1090

                            4x:  728

                            5x:  513

                            6x:  430

                            7x:  319

                            8x:  256

                            9x:  234

                           10x:  144

                       11- 20x:  824

                       21- 50x:  475

                       51-100x:  224

                         >100x: 1055
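
As a consistency check, the frequency classes above should add up to the total number of species (14085). A minimal Python sketch, with the counts copied from the table; the dictionary keys and variable names are illustrative.

```python
# Number of species per frequency class, copied from table 2.1.
species_per_class = {
    "1x": 5758, "2x": 2035, "3x": 1090, "4x": 728, "5x": 513,
    "6x": 430, "7x": 319, "8x": 256, "9x": 234, "10x": 144,
    "11-20x": 824, "21-50x": 475, "51-100x": 224, ">100x": 1055,
}

TOTAL_SPECIES = 14085  # total species stated at the start of section 2

class_total = sum(species_per_class.values())
print(class_total == TOTAL_SPECIES)

# Share of species that are represented by exactly one entry.
singleton_share = 100 * species_per_class["1x"] / TOTAL_SPECIES
print(f"{singleton_share:.1f} % of species occur once")
```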





   2.2  Table of the most represented species



  ------  ---------  --------------------------------------------

  Number  Frequency  Species

  ------  ---------  --------------------------------------------

       1      20386  Homo sapiens (Human)

       2      17082  Mus musculus (Mouse)

       3      16111  Arabidopsis thaliana (Mouse-ear cress)

       4       8135  Rattus norvegicus (Rat)

       5       6721  Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (Baker's yeast)

       6       6014  Bos taurus (Bovine)

       7       5137  Schizosaccharomyces pombe (strain 972 / ATCC 24843) (Fission yeast)

       8       4519  Escherichia coli (strain K12)

       9       4265  Caenorhabditis elegans

      10       4191  Bacillus subtilis (strain 168)

      11       4150  Dictyostelium discoideum (Slime mold)

      12       4119  Oryza sativa subsp. japonica (Rice)

      13       3641  Drosophila melanogaster (Fruit fly)

      14       3471  Xenopus laevis (African clawed frog)

      15       3204  Danio rerio (Zebrafish) (Brachydanio rerio)

      16       2297  Gallus gallus (Chicken)

      17       2258  Mycobacterium tuberculosis (strain ATCC 25618 / H37Rv)

      18       2218  Pongo abelii (Sumatran orangutan) (Pongo pygmaeus abelii)

      19       2045  Escherichia coli O157:H7

      20       1898  Mycobacterium tuberculosis (strain CDC 1551 / Oshkosh)

      21       1806  Salmonella typhimurium (strain LT2 / SGSC1412 / ATCC 700720)

      22       1787  Methanocaldococcus jannaschii  

      23       1707  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)

      24       1705  Haemophilus influenzae (strain ATCC 51907 / DSM 11121 / KW20 / Rd)

      25       1702  Escherichia coli O6:H1 (strain CFT073 / ATCC 700928 / UPEC)

      26       1688  Shigella flexneri

      27       1438  Sus scrofa (Pig)

      28       1403  Pseudomonas aeruginosa 

      29       1347  Salmonella typhi

      30       1244  Mycobacterium bovis (strain ATCC BAA-935 / AF2122/97)

      31       1174  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)

      32       1085  Synechocystis sp. (strain PCC 6803 / Kazusa)

      33       1036  Archaeoglobus fulgidus 

      34       1027  Yersinia pestis

      35       1014  Candida albicans (strain SC5314 / ATCC MYA-2876) (Yeast)

      36        988  Vibrio cholerae serotype O1 (strain ATCC 39315 / El Tor Inaba N16961)

      37        978  Emericella nidulans  

      38        941  Staphylococcus aureus (strain Mu50 / ATCC 700699)

      39        930  Salmonella paratyphi A (strain ATCC 9150 / SARB42)

      40        929  Staphylococcus aureus (strain N315)

      41        928  Ashbya gossypii (strain ATCC 10895 / CBS 109.51 / FGSC 9923 / NRRL Y-1056)  

      42        919  Kluyveromyces lactis   

      43        909  Acanthamoeba polyphaga mimivirus (APMV)

      44        903  Staphylococcus aureus (strain COL)

      45        896  Staphylococcus aureus (strain MW2)

      46        894  Oryctolagus cuniculus (Rabbit)

      47        894  Escherichia coli O6:K15:H31 (strain 536 / UPEC)

      48        890  Staphylococcus aureus (strain MSSA476)

      49        888  Staphylococcus aureus (strain MRSA252)

      50        887  Rhizobium meliloti (strain 1021) (Ensifer meliloti) (Sinorhizobium meliloti)

      51        882  Salmonella choleraesuis (strain SC-B67)

      52        879  Shigella sonnei (strain Ss046)

      53        878  Candida glabrata   

      54        875  Neurospora crassa 

      55        863  Yersinia pseudotuberculosis serotype I (strain IP32953)

      56        852  Oryza sativa subsp. indica (Rice)

      57        847  Escherichia coli O9:H4 (strain HS)

      58        839  Neosartorya fumigata (strain ATCC MYA-4609 / Af293 / CBS 101355 / FGSC A1100) 

      59        834  Escherichia coli O139:H28 (strain E24377A / ETEC)

      60        833  Zea mays (Maize)

      61        833  Canis lupus familiaris (Dog) (Canis familiaris)

      62        829  Shigella boydii serotype 4 (strain Sb227)

      63        825  Escherichia coli (strain UTI89 / UPEC)

      64        822  Shigella dysenteriae serotype 1 (strain Sd197)

      65        819  Escherichia coli 

      66        808  Streptomyces coelicolor (strain ATCC BAA-471 / A3(2) / M145)

      67        803  Pectobacterium atrosepticum (strain SCRI 1043 / ATCC BAA-672) 

      68        800  Staphylococcus aureus (strain NCTC 8325 / PS 47)

      69        793  Vibrio parahaemolyticus serotype O3:K6 (strain RIMD 2210633)

      70        791  Escherichia coli (strain SMS-3-5 / SECEC)

      71        787  Aquifex aeolicus (strain VF5)

      72        775  Escherichia coli O127:H6 (strain E2348/69 / EPEC)

      73        771  Escherichia coli (strain K12 / DH10B)

      74        770  Pasteurella multocida (strain Pm70)

      75        765  Escherichia coli O17:K52:H18 (strain UMN026 / ExPEC)

      76        765  Escherichia coli (strain K12 / MC4100 / BW2952)

      77        762  Escherichia coli (strain 55989 / EAEC)

      78        761  Escherichia coli O8 (strain IAI1)

      79        760  Shigella flexneri serotype 5b (strain 8401)

      80        760  Staphylococcus epidermidis (strain ATCC 12228 / FDA PCI 1200)

      81        760  Staphylococcus epidermidis (strain ATCC 35984 / RP62A)

      82        758  Escherichia coli O45:K1 (strain S88 / ExPEC)

      83        756  Escherichia coli (strain SE11)

      84        756  Bacillus anthracis

      85        753  Escherichia coli O7:K1 (strain IAI39 / ExPEC)

      86        748  Photorhabdus laumondii subsp. laumondii (strain DSM 15139 / CIP 105565 / TT01)

      87        748  Escherichia coli O157:H7 (strain EC4115 / EHEC)

      88        742  Bacillus halodurans 

      89        739  Yersinia enterocolitica serotype O:8 / biotype 1B (strain NCTC 13174 / 8081)

      90        733  Vibrio vulnificus (strain CMCP6)

      91        731  Escherichia coli O81 (strain ED1a)

      92        726  Pseudomonas putida (strain ATCC 47054 / DSM 6125 / NCIMB 11950 / KT2440)

      93        722  Salmonella enteritidis PT4 (strain P125109)

      94        718  Vibrio vulnificus (strain YJ016)

      95        716  Salmonella paratyphi B (strain ATCC BAA-1250 / SPB7)

      96        715  Klebsiella pneumoniae subsp. pneumoniae (strain ATCC 700721 / MGH 78578)

      97        715  Enterobacter sp. (strain 638)

      98        715  Yersinia pestis bv. Antiqua (strain Nepal516)

      99        714  Salmonella paratyphi A (strain AKU_12601)

     100        714  Escherichia coli O1:K1 / APEC

     101        713  Yersinia pseudotuberculosis serotype O:1b (strain IP 31758)

     102        713  Salmonella newport (strain SL254)

     103        713  Salmonella agona (strain SL483)

     104        712  Salmonella schwarzengrund (strain CVM19633)

     105        711  Yersinia pestis bv. Antiqua (strain Antiqua)

     106        710  Salmonella heidelberg (strain SL476)

     107        702  Salmonella dublin (strain CT_02021853)

     108        699  Klebsiella pneumoniae (strain 342)

     109        698  Shigella boydii serotype 18 (strain CDC 3083-94 / BS512)

     110        697  Nostoc sp. (strain PCC 7120 / SAG 25.82 / UTEX 2576)

     111        695  Escherichia fergusonii 

     112        692  Pan troglodytes (Chimpanzee)

     113        687  Escherichia coli

     114        686  Mycoplasma pneumoniae (strain ATCC 29342 / M129)

     115        684  Salmonella gallinarum (strain 287/91 / NCTC 13346)

     116        680  Pseudomonas syringae pv. tomato (strain ATCC BAA-871 / DC3000)

     117        678  Citrobacter koseri (strain ATCC BAA-895 / CDC 4225-83 / SGSC4696)

     118        678  Staphylococcus aureus (strain USA300)

     119        672  Serratia proteamaculans (strain 568)

     120        669  Mycobacterium leprae (strain TN)

     121        668  Bacillus cereus 

     122        667  Yersinia pestis (strain Pestoides F)

     123        665  Yarrowia lipolytica (strain CLIB 122 / E 150) (Yeast) (Candida lipolytica)

     124        664  Bradyrhizobium diazoefficiens 

     125        658  Sinorhizobium fredii (strain NBRC 101917 / NGR234)

     126        653  Shewanella oneidensis (strain MR-1)

     127        653  Debaryomyces hansenii   

     128        651  Agrobacterium fabrum (strain C58 / ATCC 33970) (Agrobacterium tumefaciens 

     129        643  Staphylococcus aureus (strain bovine RF122 / ET3-1)

     130        642  Salmonella arizonae (strain ATCC BAA-731 / CDC346-86 / RSK2980)

     131        641  Yersinia pseudotuberculosis serotype O:3 (strain YPIII)

     132        634  Yersinia pseudotuberculosis serotype IB (strain PB1/+)

     133        622  Methanothermobacter thermautotrophicus  

     134        622  Treponema pallidum (strain Nichols)

     135        622  Cronobacter sakazakii (strain ATCC BAA-894) (Enterobacter sakazakii)

     136        621  Listeria monocytogenes serovar 1/2a (strain ATCC BAA-679 / EGD-e)

     137        615  Xanthomonas campestris pv. campestris 

     138        614  Staphylococcus haemolyticus (strain JCSC1435)

     139        613  Mesorhizobium japonicum  (Mesorhizobium loti 

     140        612  Pseudomonas aeruginosa (strain UCBPP-PA14)

     141        608  Helicobacter pylori (strain ATCC 700392 / 26695) (Campylobacter pylori)

     142        604  Listeria innocua serovar 6a (strain ATCC BAA-680 / CLIP 11262)

     143        603  Ralstonia solanacearum (strain GMI1000) (Pseudomonas solanacearum)

     144        602  Photobacterium profundum (strain SS9)

     145        602  Staphylococcus saprophyticus subsp. saprophyticus 

     146        601  Salmonella paratyphi C (strain RKS4594)

     147        600  Yersinia pestis bv. Antiqua (strain Angola)

     148        595  Bacillus cereus (strain ATCC 10987 / NRS 248)

     149        591  Pectobacterium carotovorum subsp. carotovorum (strain PC1)

     150        584  Rickettsia prowazekii (strain Madrid E)

     151        583  Neisseria meningitidis serogroup B (strain MC58)

     152        581  Mycolicibacterium smegmatis (strain ATCC 700084 / mc(2)155) 

     153        579  Caenorhabditis briggsae

     154        579  Brucella suis biovar 1 (strain 1330)

     155        574  Brucella melitensis biotype 1 (strain 16M / ATCC 23456 / NCTC 10094)

     156        573  Aliivibrio fischeri (strain ATCC 700601 / ES114) (Vibrio fischeri)

     157        573  Caulobacter vibrioides (strain ATCC 19089 / CB15) (Caulobacter crescentus)

     158        572  Buchnera aphidicola subsp. Acyrthosiphon pisum (strain APS) 

     159        569  Bacillus thuringiensis subsp. konkukian (strain 97-27)

     160        568  Helicobacter pylori (strain J99 / ATCC 700824) (Campylobacter pylori J99)

     161        567  Pseudomonas syringae pv. syringae (strain B728a)

     162        564  Bacillus licheniformis 

     163        562  Buchnera aphidicola subsp. Schizaphis graminum (strain Sg)

     164        562  Aspergillus oryzae (strain ATCC 42149 / RIB 40) (Yellow koji mold)

     165        562  Bacillus cereus (strain ZK / E33L)

     166        559  Clostridium acetobutylicum 

     167        559  Thermotoga maritima 

     168        557  Xanthomonas axonopodis pv. citri (strain 306)

     169        555  Pseudomonas fluorescens (strain Pf0-1)

     170        554  Pseudomonas fluorescens (strain ATCC BAA-477 / NRRL B-23932 / Pf-5)

     171        554  Neisseria meningitidis serogroup A / serotype 4A (strain DSM 15465 / Z2491)

     172        553  Oceanobacillus iheyensis 

     173        547  Pseudomonas savastanoi pv. phaseolicola  (Pseudomonas syringae pv. phaseolicola 

     174        540  Lactococcus lactis subsp. lactis (strain IL1403) (Streptococcus lactis)

     175        534  Corynebacterium glutamicum 

     176        531  Erwinia tasmaniensis 

     177        529  Sodalis glossinidius (strain morsitans)

     178        529  Listeria monocytogenes serotype 4b (strain F2365)

     179        528  Bordetella bronchiseptica (strain ATCC BAA-588 / NCTC 13252 / RB50) 

     180        524  Staphylococcus aureus (strain Newman)

     181        522  Xylella fastidiosa (strain 9a5c)

     182        521  Vibrio cholerae serotype O1 (strain ATCC 39541 / Classical Ogawa 395 / O395)

     183        519  Methanosarcina acetivorans (strain ATCC 35395 / DSM 2834 / JCM 12185 / C2A)

     184        517  Chromobacterium violaceum 

     185        516  Deinococcus radiodurans 

     186        516  Bordetella pertussis (strain Tohama I / ATCC BAA-589 / NCTC 13251)

     187        515  Xylella fastidiosa (strain Temecula1 / ATCC 700964)

     188        514  Streptococcus pneumoniae serotype 4 (strain ATCC BAA-334 / TIGR4)

     189        512  Pseudomonas aeruginosa (strain PA7)

     190        511  Streptomyces avermitilis 

     191        510  Haemophilus ducreyi (strain 35000HP / ATCC 700724)

     192        510  Geobacillus kaustophilus (strain HTA426)

     193        508  Bordetella parapertussis (strain 12822 / ATCC BAA-587 / NCTC 13253)

     194        507  Buchnera aphidicola subsp. Baizongia pistaciae (strain Bp)

     195        502  Pseudomonas entomophila (strain L48)

     196        502  Pyrococcus furiosus (strain ATCC 43587 / DSM 3638 / JCM 8422 / Vc1)

     197        499  Haemophilus influenzae (strain 86-028NP)

     198        499  Brucella abortus biovar 1 (strain 9-941)

     199        498  Acinetobacter baylyi (strain ATCC 33305 / BD413 / ADP1)

     200        496  Rickettsia conorii (strain ATCC VR-613 / Malish 7)

     201        496  Bacillus clausii (strain KSM-K16)

     202        496  Burkholderia pseudomallei (strain K96243)

     203        494  Xanthomonas campestris pv. campestris (strain 8004)

     204        494  Proteus mirabilis (strain HI4320)

     205        494  Pyrococcus horikoshii 

     206        493  Thermosynechococcus elongatus (strain BP-1)

     207        492  Bacillus velezensis (strain DSM 23117 / BGSC 10A6 / FZB42) 

     208        491  Halobacterium salinarum (strain ATCC 700922 / JCM 11081 / NRC-1) 

     209        491  Vibrio campbellii (strain ATCC BAA-1116 / BB120)

     210        490  Methanosarcina mazei  

     211        489  Brucella abortus (strain 2308)

     212        489  Solanum lycopersicum (Tomato) (Lycopersicon esculentum)

     213        487  Synechococcus elongatus (strain PCC 7942 / FACHB-805) (Anacystis nidulans R2)

     214        487  Shewanella sp. (strain MR-7)

     215        486  Mannheimia succiniciproducens (strain MBEL55E)

     216        484  Streptococcus pneumoniae (strain ATCC BAA-255 / R6)

     217        484  Staphylococcus aureus (strain Mu3 / ATCC 700698)

     218        484  Pseudomonas aeruginosa (strain LESB58)

     219        484  Shewanella sp. (strain MR-4)

     220        483  Mycoplasma genitalium (strain ATCC 33530 / G-37 / NCTC 10195)

     221        482  Saccharolobus solfataricus (strain ATCC 35092 / DSM 1617 / JCM 11322 / P2) 

     222        480  Lactobacillus plantarum (strain ATCC BAA-793 / NCIMB 8826 / WCFS1)

     223        478  Pseudomonas putida (strain ATCC 700007 / DSM 6899 / BCRC 17059 / F1)

     224        477  Pyrococcus abyssi (strain GE5 / Orsay)

     225        477  Nicotiana tabacum (Common tobacco)

     226        475  Cupriavidus necator  

     227        475  Burkholderia lata 

     228        472  Rhodopseudomonas palustris (strain ATCC BAA-98 / CGA009)

     229        469  Clostridium perfringens (strain 13 / Type A)

     230        469  Rhodobacter sphaeroides  

     231        468  Pseudomonas putida (strain GB-1)

     232        467  Shewanella frigidimarina (strain NCIMB 400)

     233        467  Aeromonas hydrophila subsp. hydrophila 

     234        467  Enterococcus faecalis (strain ATCC 700802 / V583)

     235        467  Campylobacter jejuni subsp. jejuni serotype O:2 

     236        466  Shewanella sp. (strain ANA-3)

     237        466  Xanthomonas campestris pv. vesicatoria (strain 85-10)

     238        465  Trichormus variabilis (strain ATCC 29413 / PCC 7937) (Anabaena variabilis)

     239        463  Burkholderia mallei (strain ATCC 23344)

     240        459  Cupriavidus pinatubonensis (strain JMP 134 / LMG 1197) (Cupriavidus necator 

     241        459  Ovis aries (Sheep)

     242        458  Methylococcus capsulatus (strain ATCC 33009 / NCIMB 11132 / Bath)

     243        457  Rickettsia felis (strain ATCC VR-1525 / URRWXCal2) (Rickettsia azadi)

     244        455  Staphylococcus aureus (strain JH1)

     245        455  Shewanella baltica (strain OS185)

     246        455  Xanthomonas oryzae pv. oryzae (strain MAFF 311018)

     247        453  Pseudomonas putida (strain W619)

     248        453  Streptococcus mutans serotype c (strain ATCC 700610 / UA159)

     249        452  Aeromonas salmonicida (strain A449)

     250        451  Caldanaerobacter subterraneus subsp. tengcongensis  
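
Rows of the table above follow a fixed pattern: a rank, a frequency, then the species name. A hedged Python sketch of how such rows could be parsed is shown below; the function name `parse_species_row` and the regular expression are assumptions made for illustration, not a format specification published with the release.

```python
import re
from typing import Optional, Tuple

# One data row: rank, entry count, then the species name (may contain spaces).
ROW_PATTERN = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\S.*?)\s*$")


def parse_species_row(line: str) -> Optional[Tuple[int, int, str]]:
    """Return (rank, frequency, species) for a data row, or None otherwise."""
    match = ROW_PATTERN.match(line)
    if match is None:
        return None  # header, separator, or blank line
    rank, frequency, species = match.groups()
    return int(rank), int(frequency), species


sample = [
    "  ------  ---------  --------------------------------------------",
    "  Number  Frequency  Species",
    "       1      20386  Homo sapiens (Human)",
    "      22       1787  Methanocaldococcus jannaschii  ",
    "",
]

# Keep only the lines that parsed as data rows.
rows = [row for row in map(parse_species_row, sample) if row is not None]
for rank, frequency, species in rows:
    print(rank, frequency, species)
```

Header, separator, and blank lines are skipped because they do not start with two integer columns.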





   

   2.3  Taxonomic distribution of the sequences



   



   Kingdom        sequences (% of the database)

    Archaea           19653 (  3%)

    Bacteria         335066 ( 59%)

    Eukaryota        193521 ( 34%)

    Viruses           17014 (  3%)





   Within Eukaryota:



   



    Category            sequences (% of Eukaryota) (% of the complete database)

     Human                  20387 ( 11%)           (  4%)

     Other Mammalia         47033 ( 24%)           (  8%)

     Other Vertebrata       18677 ( 10%)           (  3%)

     Viridiplantae          40925 ( 21%)           (  7%)

     Fungi                  35360 ( 18%)           (  6%)

     Insecta                 9457 (  5%)           (  2%)

     Nematoda                5174 (  3%)           (  1%)

     Other                  16508 (  9%)           (  3%)
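
Each category above is reported against two denominators: the Eukaryota total and the complete database. A minimal Python sketch of that calculation follows, assuming both percentages are rounded to the nearest integer; the variable names are illustrative.

```python
TOTAL_ENTRIES = 565254  # complete database, release 2021_03

# Sequence counts per category, copied from the table above.
eukaryota = {
    "Human": 20387,
    "Other Mammalia": 47033,
    "Other Vertebrata": 18677,
    "Viridiplantae": 40925,
    "Fungi": 35360,
    "Insecta": 9457,
    "Nematoda": 5174,
    "Other": 16508,
}

eukaryota_total = sum(eukaryota.values())

# For each category: (% of Eukaryota, % of the complete database).
shares = {
    category: (
        round(100 * count / eukaryota_total),
        round(100 * count / TOTAL_ENTRIES),
    )
    for category, count in eukaryota.items()
}

for category, (pct_euk, pct_db) in shares.items():
    print(f"{category:<18}{eukaryota[category]:>8} ({pct_euk:>3}%) ({pct_db:>3}%)")
```

The categories add up to 193521, the Eukaryota count given in the kingdom table.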







3.  SEQUENCE SIZE



   Distribution of the sequences by size (excluding fragments)



               From   To  Number             From   To   Number

                  1-  50    9789             1001-1100     4045

                 51- 100   43089             1101-1200     2850

                101- 150   59497             1201-1300     2180

                151- 200   59219             1301-1400     2045

                201- 250   58118             1401-1500     1653

                251- 300   51969             1501-1600      817

                301- 350   52393             1601-1700      625

                351- 400   45463             1701-1800      573

                401- 450   37377             1801-1900      490

                451- 500   30246             1901-2000      388

                501- 550   21982             2001-2100      260

                551- 600   15584             2101-2200      367

                601- 650   12976             2201-2300      332

                651- 700    9315             2301-2400      226

                701- 750    7766             2401-2500      183

                751- 800    5634             >2500         1390

                801- 850    4849

                851- 900    5252

                901- 950    4078

                951-1000    2973
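
Because fragments are excluded, the size classes above should add up to the number of entries minus the number of fragments. A short Python sketch of that check, with the counts copied from the table and the totals taken from the introduction; the names are illustrative.

```python
TOTAL_ENTRIES = 565254  # sequence entries in the release
FRAGMENTS = 9261        # number of fragments stated in the introduction

# (size class, number of sequences), copied from the table above.
size_bins = [
    ("1-50", 9789), ("51-100", 43089), ("101-150", 59497),
    ("151-200", 59219), ("201-250", 58118), ("251-300", 51969),
    ("301-350", 52393), ("351-400", 45463), ("401-450", 37377),
    ("451-500", 30246), ("501-550", 21982), ("551-600", 15584),
    ("601-650", 12976), ("651-700", 9315), ("701-750", 7766),
    ("751-800", 5634), ("801-850", 4849), ("851-900", 5252),
    ("901-950", 4078), ("951-1000", 2973), ("1001-1100", 4045),
    ("1101-1200", 2850), ("1201-1300", 2180), ("1301-1400", 2045),
    ("1401-1500", 1653), ("1501-1600", 817), ("1601-1700", 625),
    ("1701-1800", 573), ("1801-1900", 490), ("1901-2000", 388),
    ("2001-2100", 260), ("2101-2200", 367), ("2201-2300", 332),
    ("2301-2400", 226), ("2401-2500", 183), (">2500", 1390),
]

non_fragment_total = sum(count for _, count in size_bins)
print(non_fragment_total == TOTAL_ENTRIES - FRAGMENTS)
```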



   





   The average sequence length in UniProtKB/Swiss-Prot is 360 amino acids.



   The shortest sequence is   GWA_SEPOF (P83570):     2 amino acids.

   The longest sequence is  TITIN_MOUSE (A2ASS6): 35213 amino acids.
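
The average length can be approximated from the totals given in the introduction. The sketch below assumes the average is taken over all entries, fragments included, which this document does not state explicitly; under that assumption the quotient is slightly above 360, so the reported value matches the quotient truncated to an integer.

```python
TOTAL_AMINO_ACIDS = 203850821  # amino acids in the release (introduction)
TOTAL_ENTRIES = 565254         # sequence entries in the release (introduction)

# Mean sequence length over all entries (assumption: fragments included).
mean_length = TOTAL_AMINO_ACIDS / TOTAL_ENTRIES

# Integer part of the mean, comparable to the value reported above.
truncated_mean = TOTAL_AMINO_ACIDS // TOTAL_ENTRIES

print(f"mean length: {mean_length:.2f}")
print(f"truncated:   {truncated_mean}")
```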





4.  JOURNAL CITATIONS



   Note: the following citation statistics reflect the number of distinct

         journal citations.



   Total number of journals cited in this release of UniProtKB/Swiss-Prot: 2942





   4.1 Table of the frequency of journal citations



        Journals cited 1x:  939

                       2x:  408

                       3x:  182

                       4x:  145

                       5x:  122

                       6x:   86

                       7x:   63

                       8x:   70

                       9x:   44

                      10x:   32

                  11- 20x:  238

                  21- 50x:  244

                  51-100x:  126

                    >100x:  243





   4.2  List of the most cited journals in UniProtKB/Swiss-Prot



   Nb    Citations   Journal name

   --    ---------   -------------------------------------------------------------

    1        25882   Journal of Biological Chemistry

    2        12046   Proceedings of the National Academy of Sciences of the U.S.A.

    3         6906   Journal of Bacteriology

    4         5823   Biochemical and Biophysical Research Communications

    5         5586   Biochemistry

    6         5131   Nucleic Acids Research

    7         4960   FEBS Letters

    8         4852   Gene

    9         4806   The EMBO Journal

   10         4780   Nature

   11         4466   Molecular and Cellular Biology

   12         4406   Journal of Molecular Biology

   13         3839   Biochimica et Biophysica Acta

   14         3693   Cell

   15         3451   European Journal of Biochemistry

   16         3398   Journal of Virology

   17         3170   Science

   18         3012   Biochemical Journal

   19         2708   Plant Physiology

   20         2698   Molecular Microbiology

   21         2539   Genomics

   22         2303   The American Journal of Human Genetics

   23         2299   PLoS ONE

   24         2268   Journal of Cell Biology

   25         2103   The Plant Cell

   26         1967   The Plant Journal

   27         1910   Human Molecular Genetics

   28         1894   Plant Molecular Biology

   29         1890   Genes and Development

   30         1836   Virology

   31         1803   Nature Genetics

   32         1745   Molecular Biology of the Cell

   33         1736   Development

   34         1645   Molecular Cell

   35         1596   Human Mutation

   36         1572   Journal of Immunology

   37         1555   Oncogene

   38         1415   Structure

   39         1406   Molecular and General Genetics

   40         1377   Journal of Biochemistry

   41         1352   Genetics

   42         1316   Journal of Cell Science

   43         1204   Blood

   44         1197   Infection and Immunity

   45         1159   Journal of General Virology

   46         1118   Microbiology

   47         1106   Developmental Biology

   48         1105   Current Biology

   49         1099   Archives of Biochemistry and Biophysics

   50          964   Journal of Neuroscience

   51          957   Applied and Environmental Microbiology

   52          944   Acta Crystallographica, Section D

   53          910   Cancer Research

   54          867   FEMS Microbiology Letters

   55          841   Yeast

   56          839   Toxicon

   57          815   Protein Science

   58          796   Journal of Clinical Investigation

   59          788   PLoS Genetics

   60          784   Neuron

   61          753   American Journal of Physiology

   62          734   Plant and Cell Physiology

   63          734   Nature Communications

   64          716   Human Genetics

   65          704   The Journal of Experimental Medicine

   66          662   Proteins

   67          655   Mechanisms of Development

   68          652   Journal of Medical Genetics

   69          647   Nature Structural Biology

   70          615   Scientific Reports

   71          605   Nature Cell Biology

   72          594   Nature Structural and Molecular Biology

   73          594   The FEBS Journal

   74          579   Current Genetics

   75          578   Bioscience, Biotechnology, and Biochemistry

   76          553   Journal of Neurochemistry

   77          550   Developmental Cell

   78          546   Molecular Endocrinology

   79          533   The Journal of Clinical Endocrinology and Metabolism

   80          517   Endocrinology

   81          511   Antimicrobial Agents and Chemotherapy

   82          495   PLoS Pathogens

   83          490   Mammalian Genome

   84          477   Experimental Cell Research

   85          470   Molecular and Biochemical Parasitology

   86          452   Eukaryotic Cell

   87          445   Journal of the American Chemical Society

   88          442   Peptides

   89          442   Planta

   90          437   RNA

   91          430   Immunogenetics

   92          425   Journal of Experimental Botany

   93          407   Molecular Biology and Evolution

   94          406   Journal of Molecular Evolution

   95          401   Molecular Pharmacology

   96          400   The FASEB Journal

   97          396   Acta Crystallographica, Section F

   98          395   EMBO Reports

   99          392   DNA and Cell Biology

  100          390   American Journal of Medical Genetics. Part A

  101          383   European Journal of Human Genetics

  102          381   Journal of Investigative Dermatology

  103          379   Molecular Plant-Microbe Interactions

  104          378   Immunity

  105          377   DNA Sequence

  106          375   Neurology

  107          363   Comparative Biochemistry and Physiology

  108          359   Biology of Reproduction

  109          350   Biochimie

  110          341   Brain Research. Molecular Brain Research

  111          340   Genes to Cells

  112          339   Virus Research

  113          338   Clinical Genetics

  114          325   Developmental Dynamics

  115          323   The New England Journal of Medicine

  116          321   Journal of Lipid Research

  117          310   Annals of Neurology

  118          305   Cell Reports

  119          303   Nature Immunology

  120          301   Genome Research

  121          300   BMC Genomics

  122          299   Biological Chemistry Hoppe-Seyler

  123          298   European Journal of Immunology

  124          287   Investigative Ophthalmology and Visual Science

  125          287   Applied Microbiology and Biotechnology

  126          286   PLoS Biology

  127          284   Journal of Medicinal Chemistry

  128          281   Cytogenetics and Cell Genetics

  129          273   Journal of Human Genetics

  130          273   Journal of General Microbiology

  131          263   Glycobiology

  132          257   Archives of Microbiology

  133          256   

  134          247   Traffic

  135          245   Molecular Immunology

  136          242   Molecular Genetics and Metabolism

  137          240   Journal of Cellular Biochemistry

  138          233   DNA Research

  139          232   Phytochemistry

  140          230   Protein Expression and Purification

  141          230   Nature Medicine

  142          229   Cell Cycle

  143          225   Circulation Research

  144          225   Diabetes

  145          222   Archives of Virology

  146          218   Fungal Genetics and Biology

  147          218   Hoppe-Seyler's Zeitschrift fur Physiologische Chemie

  148          217   Nature Chemical Biology

  149          210   Molecular and Cellular Endocrinology

  150          208   Brain





5.  STATISTICS FOR SOME LINE TYPES



The following table summarizes the total number of some UniProtKB/Swiss-Prot lines,

as well as the number of entries with at least one such line, and the

frequency of the lines.



                                      Total    Number of  Average

   Line type / subtype                number   entries    per entry  Rank

------------------------------------  -------- ---------  ---------  ----



References (RL)                      1261529                 2.23                                         

   Journal                           1084809     463023      1.92       1                                 

   Submitted to EMBL/GenBank/DDBJ     165505     149820      0.29       2                                 

   Submitted to other databases         7603       6993      0.01       3                                 

   Book citation                        1855       1832     <0.01       4                                 

   Plant Gene Register                   612        599     <0.01       5                                 

   Unpublished observations              487        483     <0.01       6                                 

   Thesis                                439        436     <0.01       7                                 

   Patent                                213        206     <0.01       8                                 

   Worm Breeder's Gazette                  6          6     <0.01       9                                 



Total number of distinct authors cited in UniProtKB/Swiss-Prot: 436524



                                      Total    Number of  Average

   Line type / subtype                number   entries    per entry  Rank

------------------------------------  -------- ---------  ---------  ----

Comments (CC)                        2639637                 4.67                                         

   ACTIVITY REGULATION                 16392      16338      0.03      17                                 

   ALLERGEN                              925        925     <0.01      26                                 

   ALTERNATIVE PRODUCTS                25581      25581      0.05      13                                 

   BIOPHYSICOCHEMICAL PROPERTIES        9721       9708      0.02      20                                 

   BIOTECHNOLOGY                        1450       1424     <0.01      24                                 

   CATALYTIC ACTIVITY                 313402     245464      0.55       4                                 

   CAUTION                             13686      13400      0.02      18                                 

   COFACTOR                           128701     117417      0.23       7                                 

   DEVELOPMENTAL STAGE                 13203      13159      0.02      19                                 

   DISEASE                              7582       5101      0.01      21                                 

   DISRUPTION PHENOTYPE                17467      17455      0.03      16                                 

   DOMAIN                              53085      45359      0.09       9                                 

   FUNCTION                           477496     454890      0.84       2                                 

   INDUCTION                           23016      22961      0.04      15                                 

   INTERACTION                         23697      23697      0.04      14                                 

   MASS SPECTROMETRY                    7151       5507      0.01      22                                 

   MISCELLANEOUS                       44539      39134      0.08      12                                 

   PATHWAY                            141604     128166      0.25       6                                 

   PHARMACEUTICAL                        162        154     <0.01      29                                 

   POLYMORPHISM                         1316       1261     <0.01      25                                 

   PTM                                 60516      43762      0.11       8                                 

   RNA EDITING                           628        628     <0.01      28                                 

   SEQUENCE CAUTION                    44695      44624      0.08      11                                 

   SIMILARITY                         513929     509709      0.91       1                                 

   SUBCELLULAR LOCATION               355384     347435      0.63       3                                 

   SUBUNIT                            288376     284208      0.51       5                                 

   TISSUE SPECIFICITY                  48526      48298      0.09      10                                 

   TOXIC DOSE                            815        653     <0.01      27                                 

   WEB RESOURCE                         6592       5528      0.01      23                                 



Total number of comment topics: 29
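
The "Average per entry" column in these tables appears to be the total number of lines divided by the total number of entries in the release (565254, as given in the Introduction). The sketch below reproduces a few values under that assumption. The function name and the "<0.01" rule (shown whenever the value rounds to 0.00) are inferred from the rows above, not taken from any official UniProt tool.

```python
# Assumption: averages are taken over all entries in the release,
# not only over the entries that carry the given line type.
TOTAL_ENTRIES = 565254  # from the Introduction of release 2021_03


def average_per_entry(total_lines: int, total_entries: int = TOTAL_ENTRIES) -> str:
    """Format total_lines / total_entries the way the tables display it."""
    formatted = f"{total_lines / total_entries:.2f}"
    # Values that would print as 0.00 are shown as "<0.01" in the tables.
    return "<0.01" if formatted == "0.00" else formatted


if __name__ == "__main__":
    for label, total in [
        ("Comments (CC)", 2639637),
        ("SIMILARITY", 513929),
        ("ALLERGEN", 925),
    ]:
        print(f"{label:<20} {average_per_entry(total)}")
```

The same function also matches the totals of the Features (FT) and Cross-references (DR) tables, which suggests the three tables share one definition of the average.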





                                      Total    Number of  Average

   Line type / subtype                number   entries    per entry  Rank

------------------------------------  -------- ---------  ---------  ----

Features (FT)                        5038477                 8.91                                         

   ACT_SITE                           168907     102056      0.30      11                                 

   BINDING                            422229     112358      0.75       2                                 

   CA_BIND                              4241       1764      0.01      36                                 

   CARBOHYD                           120361      30829      0.21      16                                 

   CHAIN                              573499     557903      1.01       1                                 

   COILED                              22214      15380      0.04      27                                 

   COMPBIAS                           171724      73061      0.30      10                                 

   CONFLICT                           137437      47989      0.24      14                                 

   CROSSLNK                            24266       8720      0.04      26                                 

   DISULFID                           129956      34745      0.23      15                                 

   DNA_BIND                            11979      10731      0.02      33                                 

   DOMAIN                             208105     127634      0.37       9                                 

   HELIX                              291530      26160      0.52       7                                 

   INIT_MET                            17440      17392      0.03      28                                 

   INTRAMEM                             2889       1334      0.01      37                                 

   LIPID                               13411       8651      0.02      30                                 

   METAL                              413401      99071      0.73       3                                 

   MOD_RES                            257281      73058      0.46       8                                 

   MOTIF                               45433      29671      0.08      23                                 

   MUTAGEN                             82213      17592      0.15      19                                 

   NON_CONS                             2532        825     <0.01      38                                 

   NON_STD                               358        283     <0.01      39                                 

   NON_TER                             12595       9663      0.02      31                                 

   NP_BIND                            161524      87129      0.29      12                                 

   PEPTIDE                             12060       8278      0.02      32                                 

   PROPEP                              14761      12595      0.03      29                                 

   REGION                             411724     193929      0.73       4                                 

   REPEAT                             107503      14987      0.19      17                                 

   SIGNAL                              43238      43237      0.08      24                                 

   SITE                                62312      33468      0.11      21                                 

   STRAND                             300683      24669      0.53       6                                 

   TOPO_DOM                           146221      29728      0.26      13                                 

   TRANSIT                              9300       9181      0.02      34                                 

   TRANSMEM                           376849      78855      0.67       5                                 

   TURN                                70510      21306      0.12      20                                 

   UNSURE                               5597        850      0.01      35                                 

   VAR_SEQ                             52571      22349      0.09      22                                 

   VARIANT                             99345      17230      0.18      18                                 

   ZN_FING                             30278      12948      0.05      25                                 



Total number of feature keys: 39







                                      Total    Number of  Average

   Line type / subtype                number   entries    per entry  Rank      Category

------------------------------------  -------- ---------  ---------  ----      -------------------------------------------

Cross-references (DR)               18302460                32.38                                                           

   ABCD                                 2717       2717     <0.01     118      Protocols and materials databases            

   Allergome                            2010       1295     <0.01     125      Protein family/group databases               

   Antibodypedia                       32137      32026      0.06      57      Protocols and materials databases            

   ArachnoServer                        1164       1154     <0.01     135      Organism-specific databases                  

   Araport                             16131      16035      0.03      86      Organism-specific databases                  

   Bgee                                57476      57476      0.10      43      Gene expression databases                    

   BindingDB                            6061       6061      0.01     102      Chemistry databases                          

   BioCyc                             202443     198389      0.36      24      Enzyme and pathway databases                 

   BioGRID                             59325      57494      0.10      41      Protein-protein interaction databases        

   BioGRID-ORCS                        39044      38509      0.07      55      Miscellaneous databases                      

   BioMuta                             20307      20289      0.04      72      Genetic variation databases                  

   BMRB                                 6913       6913      0.01      99      3D structure databases                       

   BRENDA                              19017      17442      0.03      75      Enzyme and pathway databases                 

   CarbonylDB                           1157       1157     <0.01     136      PTM databases                                

   CAZy                                 9565       8621      0.02      92      Protein family/group databases               

   CCDS                                48745      34294      0.09      48      Sequence databases                           

   CDD                                196781     176326      0.35      25      Family and domain databases                  

   CGD                                  2001       1984     <0.01     126      Organism-specific databases                  

   ChEMBL                               8425       8246      0.01      95      Chemistry databases                          

   ChiTaRS                             29663      29626      0.05      61      Miscellaneous databases                      

   CLAE                                  359        356     <0.01     151      Protein family/group databases               

   CollecTF                              135        135     <0.01     159      Gene expression databases                    

   ComplexPortal                       11502       6454      0.02      89      Protein-protein interaction databases        

   COMPLUYEAST-2DPAGE                     97         97     <0.01     161      2D gel databases                             

   ConoServer                            968        880     <0.01     139      Organism-specific databases                  

   CORUM                                5809       5809      0.01     103      Protein-protein interaction databases        

   CPTAC                                2525       1632     <0.01     121      Proteomic databases                          

   CPTC                                  303        303     <0.01     153      Protocols and materials databases            

   CTD                                 75495      74593      0.13      40      Organism-specific databases                  

   DEPOD                                 254        254     <0.01     157      PTM databases                                

   dictyBase                            4215       4101      0.01     112      Organism-specific databases                  

   DIP                                 17473      17433      0.03      81      Protein-protein interaction databases        

   DisGeNET                            17033      16808      0.03      82      Organism-specific databases                  

   DisProt                              1539       1527     <0.01     128      Family and domain databases                  

   DMDM                                16191      16189      0.03      85      Genetic variation databases                  

   DNASU                               48110      48033      0.09      50      Protocols and materials databases            

   DOSAC-COBS-2DPAGE                     145        145     <0.01     158      2D gel databases                             

   DrugBank                            29424       4706      0.05      63      Chemistry databases                          

   DrugCentral                          2564       2564     <0.01     120      Chemistry databases                          

   EchoBASE                             4158       4158      0.01     113      Organism-specific databases                  

   eggNOG                             337209     331338      0.60      14      Phylogenomic databases                       

   ELM                                  1812       1812     <0.01     127      Protein-protein interaction databases        

   EMBL                               992352     552947      1.76       3      Sequence databases                           

   Ensembl                             98303      50798      0.17      35      Genome annotation databases                  

   EnsemblBacteria                    309234     297798      0.55      17      Genome annotation databases                  

   EnsemblFungi                        30189      28569      0.05      60      Genome annotation databases                  

   EnsemblMetazoa                      18483      10824      0.03      78      Genome annotation databases                  

   EnsemblPlants                       30512      21620      0.05      58      Genome annotation databases                  

   EnsemblProtists                      5060       4879      0.01     106      Genome annotation databases                  

   EPD                                 21225      21225      0.04      68      Proteomic databases                          

   ESTHER                               2588       2587     <0.01     119      Protein family/group databases               

   euHCVdb                                55         44     <0.01     163      Organism-specific databases                  

   EvolutionaryTrace                   16679      16679      0.03      84      Miscellaneous databases                      

   ExpressionAtlas                     48601      48601      0.09      49      Gene expression databases                    

   FlyBase                              4970       4845      0.01     107      Organism-specific databases                  

   Gene3D                             416796     323247      0.74      13      Family and domain databases                  

   GeneCards                           20348      20183      0.04      70      Organism-specific databases                  

   GeneDB                                625        569     <0.01     145      Genome annotation databases                  

   GeneID                             308666     283065      0.55      18      Genome annotation databases                  

   GeneReviews                          1493       1490     <0.01     129      Organism-specific databases                  

   GeneTree                            59065      59019      0.10      42      Phylogenomic databases                       

   Genevisible                         55258      55258      0.10      45      Gene expression databases                    

   GeneWiki                            10350      10267      0.02      91      Miscellaneous databases                      

   GenomeRNAi                          22184      22183      0.04      66      Miscellaneous databases                      

   GlyConnect                           2330       2200     <0.01     122      PTM databases                                

   GlyGen                              11179      11179      0.02      90      PTM databases                                

   GO                                3097265     540846      5.48       1      Ontologies                                   

   Gramene                             30512      21620      0.05      59      Genome annotation databases                  

   GuidetoPHARMACOLOGY                  2052       2052     <0.01     124      Chemistry databases                          

   HAMAP                              330628     327703      0.58      15      Family and domain databases                  

   HGNC                                20328      20195      0.04      71      Organism-specific databases                  

   HOGENOM                            424467     424467      0.75      11      Phylogenomic databases                       

   HPA                                 18983      18847      0.03      76      Organism-specific databases                  

   IDEAL                                 986        986     <0.01     138      Family and domain databases                  

   IMGT_GENE-DB                          267        267     <0.01     156      Protein family/group databases               

   InParanoid                         140406     140406      0.25      28      Phylogenomic databases                       

   IntAct                              56920      56920      0.10      44      Protein-protein interaction databases        

   InterPro                          2334204     546292      4.13       2      Family and domain databases                  

   iPTMnet                             52683      52683      0.09      46      PTM databases                                

   jPOST                               26397      26397      0.05      64      Proteomic databases                          

   KEGG                               502923     477412      0.89       8      Genome annotation databases                  

   LegioList                             765        763     <0.01     143      Organism-specific databases                  

   Leproma                               672        669     <0.01     144      Organism-specific databases                  

   MaizeGDB                              521        517     <0.01     146      Organism-specific databases                  

   MalaCards                            4823       4819      0.01     109      Organism-specific databases                  

   MassIVE                             18529      18529      0.03      77      Proteomic databases                          

   MaxQB                               29621      29621      0.05      62      Proteomic databases                          

   MEROPS                              13681      13681      0.02      88      Protein family/group databases               

   MetOSite                             3106       3106      0.01     116      PTM databases                                

   MGI                                 16992      16952      0.03      83      Organism-specific databases                  

   MIM                                 21963      15455      0.04      67      Organism-specific databases                  

   MINT                                22798      22798      0.04      65      Protein-protein interaction databases        

   MoonDB                                348        348     <0.01     152      Protein family/group databases               

   MoonProt                              281        281     <0.01     155      Protein family/group databases               

   neXtProt                            20365      20365      0.04      69      Organism-specific databases                  

   NIAGADS                                68         68     <0.01     162      Organism-specific databases                  

   OGP                                   373        373     <0.01     150      2D gel databases                             

   OMA                                422699     422699      0.75      12      Phylogenomic databases                       

   OpenTargets                         18321      18171      0.03      79      Organism-specific databases                  

   Orphanet                             7724       4116      0.01      98      Organism-specific databases                  

   OrthoDB                            246139     246139      0.44      21      Phylogenomic databases                       

   PANTHER                            287431     274964      0.51      20      Family and domain databases                  

   PathwayCommons                      19484      19484      0.03      74      Enzyme and pathway databases                 

   PATRIC                              92547      92547      0.16      38      Genome annotation databases                  

   PaxDb                              125666     125666      0.22      32      Proteomic databases                          

   PCDDB                                 125        125     <0.01     160      3D structure databases                       

   PDB                                216759      30515      0.38      22      3D structure databases                       

   PDBsum                             216759      30515      0.38      23      3D structure databases                       

   PeptideAtlas                        33723      33723      0.06      56      Proteomic databases                          

   PeroxiBase                            783        761     <0.01     142      Protein family/group databases               

   Pfam                               786427     524357      1.39       4      Family and domain databases                  

   PharmGKB                            18309      18290      0.03      80      Organism-specific databases                  

   Pharos                              20084      20084      0.04      73      Miscellaneous databases                      

   PHI-base                             1485       1230     <0.01     130      Miscellaneous databases                      

   PhosphoSitePlus                     39081      39081      0.07      54      PTM databases                                

   PhylomeDB                           97067      97067      0.17      37      Phylogenomic databases                       

   PIR                                124568     114293      0.22      33      Sequence databases                           

   PIRSF                              107824     106729      0.19      34      Family and domain databases                  

   PlantReactome                        1080        741     <0.01     137      Enzyme and pathway databases                 

   PomBase                              5129       5125      0.01     105      Organism-specific databases                  

   PRIDE                              140850     140850      0.25      27      Proteomic databases                          

   PRINTS                             131250     116377      0.23      30      Family and domain databases                  

   PRO                                 97286      97286      0.17      36      Miscellaneous databases                      

   ProMEX                                467        467     <0.01     149      Proteomic databases                          

   PROSITE                            482652     306199      0.85       9      Family and domain databases                  

   Proteomes                          505460     471285      0.89       7      Miscellaneous databases                      

   ProteomicsDB                        82334      52892      0.15      39      Proteomic databases                          

   PseudoCAP                            1410       1401     <0.01     132      Organism-specific databases                  

   Reactome                           125786      35877      0.22      31      Enzyme and pathway databases                 

   REBASE                                786        395     <0.01     141      Protein family/group databases               

   RefSeq                             617487     471725      1.09       5      Sequence databases                           

   REPRODUCTION-2DPAGE                  1259       1038     <0.01     133      2D gel databases                             

   RGD                                  8056       8053      0.01      96      Organism-specific databases                  

   RNAct                               43016      43016      0.08      52      Miscellaneous databases                      

   SABIO-RK                             4896       4896      0.01     108      Enzyme and pathway databases                 

   SASBDB                                476        476     <0.01     148      3D structure databases                       

   SFLD                                 9480       7361      0.02      93      Family and domain databases                  

   SGD                                  6740       6735      0.01     100      Organism-specific databases                  

   SignaLink                            3093       3093      0.01     117      Enzyme and pathway databases                 

   SIGNOR                               5380       5380      0.01     104      Enzyme and pathway databases                 

   SMART                              194397     143332      0.34      26      Family and domain databases                  

   SMR                                457260     457260      0.81      10      3D structure databases                       

   STRING                             329947     329947      0.58      16      Protein-protein interaction databases        

   SUPFAM                             515367     389645      0.91       6      Family and domain databases                  

   SWISS-2DPAGE                         1177       1177     <0.01     134      2D gel databases                             

   SwissLipids                          1471       1387     <0.01     131      Chemistry databases                          

   SwissPalm                            8647       8647      0.02      94      PTM databases                                

   TAIR                                14906      14850      0.03      87      Organism-specific databases                  

   TCDB                                 7804       7741      0.01      97      Protein family/group databases               

   TIGRFAMs                           293074     273011      0.52      19      Family and domain databases                  

   TopDownProteomics                    3237       2960      0.01     114      Proteomic databases                          

   TreeFam                             45790      45783      0.08      51      Phylogenomic databases                       

   TubercuList                          2277       2241     <0.01     123      Organism-specific databases                  

   UCD-2DPAGE                            496        496     <0.01     147      2D gel databases                             

   UCSC                                50468      46065      0.09      47      Genome annotation databases                  

   UniLectin                             288        288     <0.01     154      Protein family/group databases               

   UniPathway                         138437     125200      0.24      29      Enzyme and pathway databases                 

   VEuPathDB                           41468      40094      0.07      53      Organism-specific databases                  

   VGNC                                 4404       4391      0.01     111      Organism-specific databases                  

   WBParaSite                             48         43     <0.01     164      Genome annotation databases                  

   World-2DPAGE                          933        922     <0.01     140      2D gel databases                             

   WormBase                             6594       4866      0.01     101      Organism-specific databases                  

   Xenbase                              4540       4539      0.01     110      Organism-specific databases                  

   ZFIN                                 3178       3173      0.01     115      Organism-specific databases                  



Total number of cross-referenced databases: 164
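
The "Total number" and "Number of entries" columns can be combined to show how densely a resource is referenced within the entries that cite it, and what share of the release it covers. The following is a minimal sketch using figures from the table above. The helper names are hypothetical, and neither derived quantity is reported in this document.

```python
TOTAL_ENTRIES = 565254  # from the Introduction of release 2021_03


def lines_per_annotated_entry(total_lines: int, entries_with_line: int) -> float:
    """Mean number of cross-reference lines among entries that have at least one."""
    return total_lines / entries_with_line


def coverage(entries_with_line: int, total_entries: int = TOTAL_ENTRIES) -> float:
    """Fraction of all entries that carry at least one such cross-reference."""
    return entries_with_line / total_entries


if __name__ == "__main__":
    # (database, total number, number of entries) copied from the table above
    rows = [
        ("GO", 3097265, 540846),
        ("PDB", 216759, 30515),
        ("DrugBank", 29424, 4706),
    ]
    for name, total, entries in rows:
        density = lines_per_annotated_entry(total, entries)
        share = coverage(entries)
        print(f"{name:<10} {density:6.2f} lines/entry  {share:7.2%} of entries")
```

A database such as PDB has a low average per entry across the whole release but several lines per entry among the entries that cite it, which the "Average per entry" column alone does not show.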



6.  AMINO ACID COMPOSITION



   6.1  Composition in percent for the complete database



   Ala (A) 8.25   Gln (Q) 3.93   Leu (L) 9.65   Ser (S) 6.64

   Arg (R) 5.53   Glu (E) 6.72   Lys (K) 5.80   Thr (T) 5.35

   Asn (N) 4.06   Gly (G) 7.07   Met (M) 2.41   Trp (W) 1.10

   Asp (D) 5.46   His (H) 2.27   Phe (F) 3.86   Tyr (Y) 2.92

   Cys (C) 1.38   Ile (I) 5.91   Pro (P) 4.73   Val (V) 6.86



   Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.00



   



   Legend: gray = aliphatic, red = acidic, green = small hydroxy,

           blue = basic, black = aromatic, white = amide, yellow = sulfur





   6.2  Classification of the amino acids by their frequency



   Leu, Ala, Gly, Val, Glu, Ser, Ile, Lys, Arg, Asp, Thr, Pro, Asn, Gln,

   Phe, Tyr, Met, His, Cys, Trp
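
The ordering in 6.2 follows directly from the percentages in 6.1. As an illustrative sketch (the variable names are arbitrary), sorting the composition table in descending order reproduces the list above:

```python
# Percentages copied from section 6.1 (complete database).
composition = {
    "Ala": 8.25, "Gln": 3.93, "Leu": 9.65, "Ser": 6.64,
    "Arg": 5.53, "Glu": 6.72, "Lys": 5.80, "Thr": 5.35,
    "Asn": 4.06, "Gly": 7.07, "Met": 2.41, "Trp": 1.10,
    "Asp": 5.46, "His": 2.27, "Phe": 3.86, "Tyr": 2.92,
    "Cys": 1.38, "Ile": 5.91, "Pro": 4.73, "Val": 6.86,
}

# Sort residue names by their percentage, most frequent first.
ranked = sorted(composition, key=composition.get, reverse=True)

if __name__ == "__main__":
    print(", ".join(ranked))
```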





7.  MISCELLANEOUS STATISTICS



4463 entries are encoded on a mitochondrion, and 3949 are encoded on a plasmid.



12189 entries are encoded on a plastid, of which 21 are encoded on apicoplasts,
11624 on chloroplasts, 51 on organellar chromatophores, 145 on cyanelles,
149 on non-photosynthetic plastids and 199 on unspecified types of plastid.
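
As a quick consistency check, the plastid subtypes listed above should add up to the plastid total. The sketch below only uses figures from this section. The dictionary keys are shortened labels chosen for illustration.

```python
PLASTID_TOTAL = 12189

# Counts per plastid type, as listed in the text above.
plastid_breakdown = {
    "apicoplast": 21,
    "chloroplast": 11624,
    "organellar chromatophore": 51,
    "cyanelle": 145,
    "non-photosynthetic plastid": 149,
    "unspecified plastid": 199,
}

subtotal = sum(plastid_breakdown.values())

if __name__ == "__main__":
    print(f"sum of subtypes: {subtotal} (reported total: {PLASTID_TOTAL})")
    for kind, count in plastid_breakdown.items():
        print(f"{kind:<28} {count:>6}  {count / PLASTID_TOTAL:7.2%}")
```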



Number of entries with at least one sequence correction: 80278