


         UniProtKB/Swiss-Prot protein knowledgebase release 2023_04 statistics





1.  INTRODUCTION



Release 2023_04 of 13-Sep-2023 of UniProtKB/Swiss-Prot contains 570157 sequence

entries, curated from 294587 unique references and comprising 206173379 amino acids. 



365 sequences have been added since release 2023_03, the sequence data of

60 existing entries has been updated and the annotations of

391929 entries have been revised.



Number of fragments: 9286

Number of additional sequences produced by alternative splicing, initiation or promoter usage, or ribosomal frameshifting: 41011





Protein existence (PE):           entries     %



1: Evidence at protein level       112765   19.8%

2: Evidence at transcript level     55913    9.8%

3: Inferred from homology          386612   67.8%

4: Predicted                        13034    2.3%

5: Uncertain                         1833    0.3%
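
The percentage column above can be reproduced from the entry counts. The following is a minimal sketch, assuming each percentage is the count divided by the 570157 entries in the release, rounded to one decimal place; the variable names are illustrative and not part of the report.

```python
# Counts copied from the protein existence (PE) table above.
TOTAL_ENTRIES = 570157

pe_counts = {
    "1: Evidence at protein level": 112765,
    "2: Evidence at transcript level": 55913,
    "3: Inferred from homology": 386612,
    "4: Predicted": 13034,
    "5: Uncertain": 1833,
}

# Assumption: percentage = count / total entries, rounded to one decimal.
pe_percent = {
    level: round(100 * count / TOTAL_ENTRIES, 1)
    for level, count in pe_counts.items()
}

for level, count in pe_counts.items():
    print(f"{level:<33}{count:>8}{pe_percent[level]:>7.1f}%")
```

The five counts add up to the release total, so every entry carries exactly one PE level.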



The growth of the database is summarized below.



   





2.  TAXONOMIC ORIGIN



   Total number of species represented in this release of UniProtKB/Swiss-Prot: 14509



   The twenty most represented species account for 122865 sequences: 21.5% of the

   total number of entries.





   2.1 Table of the frequency of occurrence of species



        Species represented 1x: 5928

                            2x: 2099

                            3x: 1123

                            4x:  770

                            5x:  530

                            6x:  438

                            7x:  330

                            8x:  273

                            9x:  239

                           10x:  155

                       11- 20x:  836

                       21- 50x:  504

                       51-100x:  227

                         >100x: 1057





   2.2  Table of the most represented species



  ------  ---------  --------------------------------------------

  Number  Frequency  Species

  ------  ---------  --------------------------------------------

       1      20426  Homo sapiens (Human)

       2      17178  Mus musculus (Mouse)

       3      16369  Arabidopsis thaliana (Mouse-ear cress)

       4       8183  Rattus norvegicus (Rat)

       5       6727  Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (Baker's yeast)

       6       6040  Bos taurus (Bovine)

       7       5121  Schizosaccharomyces pombe (strain 972 / ATCC 24843) (Fission yeast)

       8       4530  Escherichia coli (strain K12)

       9       4457  Caenorhabditis elegans

      10       4191  Bacillus subtilis (strain 168)

      11       4186  Oryza sativa subsp. japonica (Rice)

      12       4159  Dictyostelium discoideum (Social amoeba)

      13       3723  Drosophila melanogaster (Fruit fly)

      14       3493  Xenopus laevis (African clawed frog)

      15       3306  Danio rerio (Zebrafish) (Brachydanio rerio)

      16       2307  Gallus gallus (Chicken)

      17       2306  Mycobacterium tuberculosis (strain ATCC 25618 / H37Rv)

      18       2218  Pongo abelii (Sumatran orangutan) (Pongo pygmaeus abelii)

      19       2046  Escherichia coli O157:H7

      20       1899  Mycobacterium tuberculosis (strain CDC 1551 / Oshkosh)

      21       1820  Salmonella typhimurium (strain LT2 / SGSC1412 / ATCC 700720)

      22       1787  Methanocaldococcus jannaschii  

      23       1710  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)

      24       1704  Haemophilus influenzae (strain ATCC 51907 / DSM 11121 / KW20 / Rd)

      25       1702  Escherichia coli O6:H1 (strain CFT073 / ATCC 700928 / UPEC)

      26       1696  Shigella flexneri

      27       1458  Sus scrofa (Pig)

      28       1451  Pseudomonas aeruginosa 

      29       1347  Salmonella typhi

      30       1244  Mycobacterium bovis (strain ATCC BAA-935 / AF2122/97)

      31       1176  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)

      32       1109  Candida albicans (strain SC5314 / ATCC MYA-2876) (Yeast)

      33       1087  Synechocystis sp. (strain PCC 6803 / Kazusa)

      34       1036  Archaeoglobus fulgidus 

      35       1030  Yersinia pestis

      36       1004  Emericella nidulans  

      37        997  Vibrio cholerae serotype O1 (strain ATCC 39315 / El Tor Inaba N16961)

      38        956  Neurospora crassa 

      39        941  Staphylococcus aureus (strain Mu50 / ATCC 700699)

      40        930  Salmonella paratyphi A (strain ATCC 9150 / SARB42)

      41        929  Staphylococcus aureus (strain N315)

      42        928  Ashbya gossypii (strain ATCC 10895 / CBS 109.51 / FGSC 9923 / NRRL Y-1056)  

      43        919  Kluyveromyces lactis   

      44        909  Acanthamoeba polyphaga mimivirus (APMV)

      45        905  Staphylococcus aureus (strain COL)

      46        901  Oryctolagus cuniculus (Rabbit)

      47        896  Aspergillus fumigatus (strain ATCC MYA-4609 / CBS 101355 / FGSC A1100 / Af293) 

      48        896  Staphylococcus aureus (strain MW2)

      49        894  Escherichia coli O6:K15:H31 (strain 536 / UPEC)

      50        890  Staphylococcus aureus (strain MSSA476)

      51        888  Staphylococcus aureus (strain MRSA252)

      52        887  Rhizobium meliloti (strain 1021) (Ensifer meliloti) (Sinorhizobium meliloti)

      53        882  Salmonella choleraesuis (strain SC-B67)

      54        882  Candida glabrata   

      55        879  Shigella sonnei (strain Ss046)

      56        867  Oryza sativa subsp. indica (Rice)

      57        863  Yersinia pseudotuberculosis serotype I (strain IP32953)

      58        848  Zea mays (Maize)

      59        847  Canis lupus familiaris (Dog) (Canis familiaris)

      60        847  Escherichia coli O9:H4 (strain HS)

      61        838  Escherichia coli O139:H28 (strain E24377A / ETEC)

      62        829  Shigella boydii serotype 4 (strain Sb227)

      63        825  Escherichia coli (strain UTI89 / UPEC)

      64        822  Shigella dysenteriae serotype 1 (strain Sd197)

      65        822  Escherichia coli 

      66        815  Streptomyces coelicolor (strain ATCC BAA-471 / A3(2) / M145)

      67        809  Staphylococcus aureus (strain NCTC 8325 / PS 47)

      68        803  Pectobacterium atrosepticum (strain SCRI 1043 / ATCC BAA-672) 

      69        796  Vibrio parahaemolyticus serotype O3:K6 (strain RIMD 2210633)

      70        791  Escherichia coli (strain SMS-3-5 / SECEC)

      71        788  Aquifex aeolicus (strain VF5)

      72        779  Escherichia coli O127:H6 (strain E2348/69 / EPEC)

      73        771  Escherichia coli (strain K12 / DH10B)

      74        770  Pasteurella multocida (strain Pm70)

      75        766  Escherichia coli O17:K52:H18 (strain UMN026 / ExPEC)

      76        765  Escherichia coli (strain K12 / MC4100 / BW2952)

      77        762  Escherichia coli (strain 55989 / EAEC)

      78        761  Escherichia coli O8 (strain IAI1)

      79        760  Staphylococcus epidermidis (strain ATCC 12228 / FDA PCI 1200)

      80        760  Staphylococcus epidermidis (strain ATCC 35984 / RP62A)

      81        760  Shigella flexneri serotype 5b (strain 8401)

      82        759  Escherichia coli O45:K1 (strain S88 / ExPEC)

      83        757  Bacillus anthracis

      84        756  Escherichia coli (strain SE11)

      85        753  Escherichia coli O7:K1 (strain IAI39 / ExPEC)

      86        749  Photorhabdus laumondii subsp. laumondii (strain DSM 15139 / CIP 105565 / TT01) 

      87        748  Escherichia coli O157:H7 (strain EC4115 / EHEC)

      88        744  Halalkalibacterium halodurans  

      89        739  Yersinia enterocolitica serotype O:8 / biotype 1B (strain NCTC 13174 / 8081)

      90        733  Vibrio vulnificus (strain CMCP6)

      91        732  Pseudomonas putida 

      92        731  Escherichia coli O81 (strain ED1a)

      93        722  Salmonella enteritidis PT4 (strain P125109)

      94        718  Vibrio vulnificus (strain YJ016)

      95        716  Salmonella paratyphi B (strain ATCC BAA-1250 / SPB7)

      96        715  Klebsiella pneumoniae subsp. pneumoniae (strain ATCC 700721 / MGH 78578)

      97        715  Yersinia pestis bv. Antiqua (strain Nepal516)

      98        715  Escherichia coli O1:K1 / APEC

      99        715  Enterobacter sp. (strain 638)

     100        714  Salmonella paratyphi A (strain AKU_12601)

     101        713  Yersinia pseudotuberculosis serotype O:1b (strain IP 31758)

     102        713  Salmonella agona (strain SL483)

     103        713  Salmonella newport (strain SL254)

     104        712  Salmonella schwarzengrund (strain CVM19633)

     105        711  Escherichia coli

     106        711  Yersinia pestis bv. Antiqua (strain Antiqua)

     107        710  Salmonella heidelberg (strain SL476)

     108        703  Nostoc sp. (strain PCC 7120 / SAG 25.82 / UTEX 2576)

     109        702  Salmonella dublin (strain CT_02021853)

     110        699  Klebsiella pneumoniae (strain 342)

     111        698  Shigella boydii serotype 18 (strain CDC 3083-94 / BS512)

     112        695  Escherichia fergusonii 

     113        692  Pan troglodytes (Chimpanzee)

     114        686  Mycoplasma pneumoniae (strain ATCC 29342 / M129 / Subtype 1) 

     115        684  Salmonella gallinarum (strain 287/91 / NCTC 13346)

     116        682  Pseudomonas syringae pv. tomato (strain ATCC BAA-871 / DC3000)

     117        679  Citrobacter koseri (strain ATCC BAA-895 / CDC 4225-83 / SGSC4696)

     118        679  Staphylococcus aureus (strain USA300)

     119        672  Serratia proteamaculans (strain 568)

     120        669  Bacillus cereus 

     121        669  Mycobacterium leprae (strain TN)

     122        667  Yersinia pestis (strain Pestoides F)

     123        666  Yarrowia lipolytica (strain CLIB 122 / E 150) (Yeast) (Candida lipolytica)

     124        664  Bradyrhizobium diazoefficiens 

     125        658  Shewanella oneidensis (strain MR-1)

     126        658  Sinorhizobium fredii (strain NBRC 101917 / NGR234)

     127        654  Agrobacterium fabrum (strain C58 / ATCC 33970) (Agrobacterium tumefaciens 

     128        653  Debaryomyces hansenii   

     129        643  Staphylococcus aureus (strain bovine RF122 / ET3-1)

     130        642  Salmonella arizonae (strain ATCC BAA-731 / CDC346-86 / RSK2980)

     131        642  Yersinia pseudotuberculosis serotype O:3 (strain YPIII)

     132        634  Yersinia pseudotuberculosis serotype IB (strain PB1/+)

     133        622  Listeria monocytogenes serovar 1/2a (strain ATCC BAA-679 / EGD-e)

     134        622  Methanothermobacter thermautotrophicus  

     135        622  Cronobacter sakazakii (strain ATCC BAA-894) (Enterobacter sakazakii)

     136        622  Treponema pallidum (strain Nichols)

     137        618  Pseudomonas aeruginosa (strain UCBPP-PA14)

     138        615  Xanthomonas campestris pv. campestris 

     139        614  Staphylococcus haemolyticus (strain JCSC1435)

     140        613  Mesorhizobium japonicum  (Mesorhizobium loti 

     141        612  Helicobacter pylori (strain ATCC 700392 / 26695) (Campylobacter pylori)

     142        605  Listeria innocua serovar 6a (strain ATCC BAA-680 / CLIP 11262)

     143        603  Ralstonia nicotianae (strain GMI1000) (Ralstonia solanacearum)

     144        602  Staphylococcus saprophyticus subsp. saprophyticus 

     145        602  Photobacterium profundum (strain SS9)

     146        601  Salmonella paratyphi C (strain RKS4594)

     147        600  Yersinia pestis bv. Antiqua (strain Angola)

     148        595  Bacillus cereus (strain ATCC 10987 / NRS 248)

     149        591  Pectobacterium carotovorum subsp. carotovorum (strain PC1)

     150        587  Neisseria meningitidis serogroup B (strain MC58)

     151        587  Mycolicibacterium smegmatis (strain ATCC 700084 / mc(2)155) 

     152        584  Rickettsia prowazekii (strain Madrid E)

     153        582  Caenorhabditis briggsae

     154        579  Brucella suis biovar 1 (strain 1330)

     155        576  Brucella melitensis biotype 1 (strain 16M / ATCC 23456 / NCTC 10094)

     156        575  Caulobacter vibrioides (strain ATCC 19089 / CB15) (Caulobacter crescentus)

     157        573  Aliivibrio fischeri (strain ATCC 700601 / ES114) (Vibrio fischeri)

     158        572  Buchnera aphidicola subsp. Acyrthosiphon pisum (strain APS) 

     159        569  Bacillus thuringiensis subsp. konkukian (strain 97-27)

     160        568  Helicobacter pylori (strain J99 / ATCC 700824) (Campylobacter pylori J99)

     161        568  Pseudomonas syringae pv. syringae (strain B728a)

     162        566  Aspergillus oryzae (strain ATCC 42149 / RIB 40) (Yellow koji mold)

     163        565  Bacillus licheniformis 

     164        564  Thermotoga maritima 

     165        562  Buchnera aphidicola subsp. Schizaphis graminum (strain Sg)

     166        562  Bacillus cereus (strain ZK / E33L)

     167        559  Clostridium acetobutylicum 

     168        557  Xanthomonas axonopodis pv. citri (strain 306)

     169        555  Pseudomonas fluorescens (strain Pf0-1)

     170        554  Neisseria meningitidis serogroup A / serotype 4A (strain DSM 15465 / Z2491)

     171        554  Pseudomonas fluorescens (strain ATCC BAA-477 / NRRL B-23932 / Pf-5)

     172        553  Oceanobacillus iheyensis 

     173        547  Pseudomonas savastanoi pv. phaseolicola  (Pseudomonas syringae pv. phaseolicola 

     174        540  Lactococcus lactis subsp. lactis (strain IL1403) (Streptococcus lactis)

     175        540  Corynebacterium glutamicum 

     176        531  Erwinia tasmaniensis 

     177        529  Bordetella bronchiseptica (strain ATCC BAA-588 / NCTC 13252 / RB50) 

     178        529  Sodalis glossinidius (strain morsitans)

     179        529  Listeria monocytogenes serotype 4b (strain F2365)

     180        524  Staphylococcus aureus (strain Newman)

     181        523  Vibrio cholerae serotype O1 (strain ATCC 39541 / Classical Ogawa 395 / O395)

     182        522  Xylella fastidiosa (strain 9a5c)

     183        521  Deinococcus radiodurans 

     184        519  Methanosarcina acetivorans (strain ATCC 35395 / DSM 2834 / JCM 12185 / C2A)

     185        519  Chromobacterium violaceum 

     186        516  Bordetella pertussis (strain Tohama I / ATCC BAA-589 / NCTC 13251)

     187        516  Streptococcus pneumoniae serotype 4 (strain ATCC BAA-334 / TIGR4)

     188        515  Xylella fastidiosa (strain Temecula1 / ATCC 700964)

     189        512  Pseudomonas aeruginosa (strain PA7)

     190        512  Geobacillus kaustophilus (strain HTA426)

     191        511  Acinetobacter baylyi (strain ATCC 33305 / BD413 / ADP1)

     192        511  Haemophilus ducreyi (strain 35000HP / ATCC 700724)

     193        511  Streptomyces avermitilis 

     194        508  Bordetella parapertussis (strain 12822 / ATCC BAA-587 / NCTC 13253)

     195        507  Buchnera aphidicola subsp. Baizongia pistaciae (strain Bp)

     196        506  Pyrococcus furiosus (strain ATCC 43587 / DSM 3638 / JCM 8422 / Vc1)

     197        505  Nicotiana tabacum (Common tobacco)

     198        505  Solanum lycopersicum (Tomato) (Lycopersicon esculentum)

     199        504  Pseudomonas entomophila (strain L48)

     200        499  Haemophilus influenzae (strain 86-028NP)

     201        499  Brucella abortus biovar 1 (strain 9-941)

     202        497  Burkholderia pseudomallei (strain K96243)

     203        496  Alkalihalobacillus clausii (strain KSM-K16) (Bacillus clausii)

     204        496  Proteus mirabilis (strain HI4320)

     205        496  Rickettsia conorii (strain ATCC VR-613 / Malish 7)

     206        495  Thermosynechococcus vestitus (strain NIES-2133 / IAM M-273 / BP-1)

     207        495  Pyrococcus horikoshii 

     208        494  Xanthomonas campestris pv. campestris (strain 8004)

     209        493  Synechococcus elongatus (strain ATCC 33912 / PCC 7942 / FACHB-805) 

     210        493  Streptococcus pneumoniae (strain ATCC BAA-255 / R6)

     211        492  Methanosarcina mazei  

     212        492  Brucella abortus (strain 2308)

     213        492  Halobacterium salinarum (strain ATCC 700922 / JCM 11081 / NRC-1) 

     214        492  Bacillus velezensis (strain DSM 23117 / BGSC 10A6 / LMG 26770 / FZB42) 

     215        491  Vibrio campbellii (strain ATCC BAA-1116)

     216        490  Saccharolobus solfataricus (strain ATCC 35092 / DSM 1617 / JCM 11322 / P2) 

     217        487  Shewanella sp. (strain MR-7)

     218        486  Mannheimia succiniciproducens (strain MBEL55E)

     219        484  Pseudomonas aeruginosa (strain LESB58)

     220        484  Staphylococcus aureus (strain Mu3 / ATCC 700698)

     221        484  Shewanella sp. (strain MR-4)

     222        483  Mycoplasma genitalium (strain ATCC 33530 / DSM 19775 / NCTC 10195 / G37) 

     223        483  Lactiplantibacillus plantarum (strain ATCC BAA-793 / NCIMB 8826 / WCFS1) 

     224        479  Pseudomonas putida (strain ATCC 700007 / DSM 6899 / BCRC 17059 / F1)

     225        477  Pyrococcus abyssi (strain GE5 / Orsay)

     226        476  Cupriavidus necator  

     227        475  Burkholderia lata 

     228        473  Campylobacter jejuni subsp. jejuni serotype O:2 

     229        472  Rhodopseudomonas palustris (strain ATCC BAA-98 / CGA009)

     230        470  Clostridium perfringens (strain 13 / Type A)

     231        469  Enterococcus faecalis (strain ATCC 700802 / V583)

     232        469  Cereibacter sphaeroides  

     233        468  Shewanella sp. (strain ANA-3)

     234        468  Pseudomonas putida (strain GB-1)

     235        467  Shewanella frigidimarina (strain NCIMB 400)

     236        467  Aeromonas hydrophila subsp. hydrophila 

     237        466  Xanthomonas campestris pv. vesicatoria (strain 85-10)

     238        465  Trichormus variabilis (strain ATCC 29413 / PCC 7937) (Anabaena variabilis)

     239        463  Burkholderia mallei (strain ATCC 23344)

     240        461  Cupriavidus pinatubonensis (strain JMP 134 / LMG 1197) (Cupriavidus necator 

     241        460  Methylococcus capsulatus (strain ATCC 33009 / NCIMB 11132 / Bath)

     242        460  Ovis aries (Sheep)

     243        457  Rickettsia felis (strain ATCC VR-1525 / URRWXCal2) (Rickettsia azadi)

     244        455  Shewanella baltica (strain OS185)

     245        455  Xanthomonas oryzae pv. oryzae (strain MAFF 311018)

     246        455  Staphylococcus aureus (strain JH1)

     247        453  Mycolicibacterium paratuberculosis (strain ATCC BAA-968 / K-10) 

     248        453  Streptococcus mutans serotype c (strain ATCC 700610 / UA159)

     249        453  Pseudomonas putida (strain W619)

     250        452  Aeromonas salmonicida (strain A449)
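
The top-twenty figure quoted at the start of section 2 can be checked against the first twenty rows of the table above. This is a small sketch, assuming the share is simply the sum of those frequencies divided by the 570157 entries in the release; the names used are illustrative.

```python
# Frequencies of ranks 1 to 20, copied from the table of the most
# represented species.
TOTAL_ENTRIES = 570157

top_twenty = [
    20426, 17178, 16369, 8183, 6727, 6040, 5121, 4530, 4457, 4191,
    4186, 4159, 3723, 3493, 3306, 2307, 2306, 2218, 2046, 1899,
]

top_twenty_total = sum(top_twenty)
top_twenty_share = 100 * top_twenty_total / TOTAL_ENTRIES

print(f"{top_twenty_total} sequences, "
      f"{top_twenty_share:.1f}% of {TOTAL_ENTRIES} entries")
```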





   

   2.3  Taxonomic distribution of the sequences



   



   Kingdom        sequences (% of the database)

    Archaea           19710 (  3%)

    Bacteria         336064 ( 59%)

    Eukaryota        197016 ( 35%)

    Viruses           17367 (  3%)





   Within Eukaryota:



   



    Category            sequences (% of Eukaryota) (% of the complete database)

     Human                  20427 ( 10%)           (  4%)

     Other Mammalia         47304 ( 24%)           (  8%)

     Other Vertebrata       18918 ( 10%)           (  3%)

     Viridiplantae          41669 ( 21%)           (  7%)

     Fungi                  36557 ( 19%)           (  6%)

     Insecta                 9726 (  5%)           (  2%)

     Nematoda                5375 (  3%)           (  1%)

     Other                  17040 (  9%)           (  3%)
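
The two percentage columns above use different denominators: the Eukaryota total for the first and the whole release for the second. The sketch below illustrates this, assuming whole-number rounding; the helper `percent` is a hypothetical name introduced for this example.

```python
# Counts copied from the taxonomic distribution tables above.
TOTAL_ENTRIES = 570157

kingdoms = {
    "Archaea": 19710,
    "Bacteria": 336064,
    "Eukaryota": 197016,
    "Viruses": 17367,
}

eukaryota = {
    "Human": 20427,
    "Other Mammalia": 47304,
    "Other Vertebrata": 18918,
    "Viridiplantae": 41669,
    "Fungi": 36557,
    "Insecta": 9726,
    "Nematoda": 5375,
    "Other": 17040,
}


def percent(part, whole):
    """Return part/whole as a percentage rounded to a whole number."""
    return round(100 * part / whole)


kingdom_pct = {name: percent(n, TOTAL_ENTRIES) for name, n in kingdoms.items()}

# For each eukaryotic category: (% of Eukaryota, % of the complete database).
euk_total = kingdoms["Eukaryota"]
euk_pct = {
    name: (percent(n, euk_total), percent(n, TOTAL_ENTRIES))
    for name, n in eukaryota.items()
}

for name, (of_euk, of_db) in euk_pct.items():
    print(f"{name:<18}{eukaryota[name]:>8} ({of_euk:>3}%) ({of_db:>3}%)")
```

The eukaryotic categories sum to the Eukaryota row, and the four kingdom rows sum to the release total.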







3.  SEQUENCE SIZE



   Distribution of the sequences by size (excluding fragments)



               From   To  Number             From   To   Number

                  1-  50    9963             1001-1100     4108

                 51- 100   43516             1101-1200     2890

                101- 150   59790             1201-1300     2205

                151- 200   59561             1301-1400     2068

                201- 250   58425             1401-1500     1671

                251- 300   52384             1501-1600      831

                301- 350   52827             1601-1700      642

                351- 400   45880             1701-1800      586

                401- 450   37685             1801-1900      503

                451- 500   30550             1901-2000      395

                501- 550   22275             2001-2100      271

                551- 600   15818             2101-2200      386

                601- 650   13153             2201-2300      340

                651- 700    9396             2301-2400      234

                701- 750    7868             2401-2500      195

                751- 800    5694             >2500         1458

                801- 850    4888

                851- 900    5299

                901- 950    4109

                951-1000    3007



   





   The average sequence length in UniProtKB/Swiss-Prot is 361 amino acids.



   The shortest sequence is   GWA_SEPOF (P83570):     2 amino acids.

   The longest sequence is  TITIN_MOUSE (A2ASS6): 35213 amino acids.
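
The report does not state how the average length is derived. As a rough check, the sketch below assumes it is the total amino acid count from the introduction divided by the number of entries (fragments included); under that assumption the quotient is about 361.6, which agrees with the reported 361 if the value is truncated rather than rounded.

```python
# Totals copied from the introduction of this release report.
TOTAL_AMINO_ACIDS = 206173379
TOTAL_ENTRIES = 570157

# Assumption: average length = total amino acids / total entries.
mean_length = TOTAL_AMINO_ACIDS / TOTAL_ENTRIES

print(f"mean length: {mean_length:.1f} amino acids")
print(f"truncated:   {int(mean_length)} amino acids")
```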





4.  JOURNAL CITATIONS



   Note: the following citation statistics reflect the number of distinct

         journal citations.



   Total number of journals cited in this release of UniProtKB/Swiss-Prot: 3128





   4.1 Table of the frequency of journal citations



        Journals cited 1x: 1002

                       2x:  427

                       3x:  221

                       4x:  139

                       5x:  125

                       6x:   87

                       7x:   66

                       8x:   79

                       9x:   49

                      10x:   36

                  11- 20x:  242

                  21- 50x:  258

                  51-100x:  140

                    >100x:  257





   4.2  List of the most cited journals in UniProtKB/Swiss-Prot



   Nb    Citations   Journal name

   --    ---------   -------------------------------------------------------------

    1        26958   Journal of Biological Chemistry

    2        12569   Proceedings of the National Academy of Sciences of the U.S.A.

    3         7151   Journal of Bacteriology

    4         6009   Biochemical and Biophysical Research Communications

    5         5811   Biochemistry

    6         5293   Nucleic Acids Research

    7         5063   Nature

    8         5063   FEBS Letters

    9         4937   The EMBO Journal

   10         4885   Gene

   11         4568   Journal of Molecular Biology

   12         4553   Molecular and Cellular Biology

   13         3999   Biochimica et Biophysica Acta

   14         3832   Cell

   15         3574   Journal of Virology

   16         3502   European Journal of Biochemistry

   17         3349   Science

   18         3142   Biochemical Journal

   19         2831   Molecular Microbiology

   20         2804   Plant Physiology

   21         2555   PLoS ONE

   22         2546   Genomics

   23         2422   The American Journal of Human Genetics

   24         2342   Journal of Cell Biology

   25         2195   The Plant Cell

   26         2033   The Plant Journal

   27         2001   Human Molecular Genetics

   28         1937   Genes and Development

   29         1922   Plant Molecular Biology

   30         1896   Virology

   31         1842   Nature Genetics

   32         1814   Development

   33         1805   Molecular Biology of the Cell

   34         1777   Molecular Cell

   35         1678   Journal of Immunology

   36         1645   Human Mutation

   37         1565   Oncogene

   38         1451   Structure

   39         1426   Molecular and General Genetics

   40         1411   Genetics

   41         1409   Journal of Biochemistry

   42         1372   Journal of Cell Science

   43         1269   Blood

   44         1259   Infection and Immunity

   45         1184   Journal of General Virology

   46         1181   Developmental Biology

   47         1170   Microbiology

   48         1144   Archives of Biochemistry and Biophysics

   49         1143   Nature Communications

   50         1133   Current Biology

   51         1015   Journal of Neuroscience

   52         1014   Applied and Environmental Microbiology

   53          991   Acta Crystallographica, Section D

   54          924   Cancer Research

   55          905   FEMS Microbiology Letters

   56          886   Toxicon

   57          886   PLoS Genetics

   58          877   American Journal of Physiology

   59          858   Protein Science

   60          849   Journal of Clinical Investigation

   61          847   Yeast

   62          841   Scientific Reports

   63          816   Neuron

   64          763   Plant and Cell Physiology

   65          751   Human Genetics

   66          751   The Journal of Experimental Medicine

   67          700   Journal of Medical Genetics

   68          691   Proteins

   69          671   The FEBS Journal

   70          671   Mechanisms of Development

   71          649   Nature Structural Biology

   72          642   Nature Structural and Molecular Biology

   73          635   PLoS Pathogens

   74          634   Nature Cell Biology

   75          622   Bioscience, Biotechnology, and Biochemistry

   76          589   Current Genetics

   77          582   Developmental Cell

   78          573   Journal of Neurochemistry

   79          552   Molecular Endocrinology

   80          549   The Journal of Clinical Endocrinology and Metabolism

   81          542   Antimicrobial Agents and Chemotherapy

   82          539   Endocrinology

   83          510   Molecular and Biochemical Parasitology

   84          500   Journal of the American Chemical Society

   85          495   Mammalian Genome

   86          489   Experimental Cell Research

   87          475   Peptides

   88          475   Eukaryotic Cell

   89          465   RNA

   90          463   Journal of Experimental Botany

   91          462   Cell Reports

   92          455   Planta

   93          445   EMBO Reports

   94          443   American Journal of Medical Genetics. Part A

   95          441   The FASEB Journal

   96          434   Immunogenetics

   97          431   Molecular Pharmacology

   98          421   Acta Crystallographica, Section F

   99          417   Molecular Biology and Evolution

  100          413   European Journal of Human Genetics

  101          407   Journal of Molecular Evolution

  102          407   Immunity

  103          404   Molecular Plant-Microbe Interactions

  104          396   Journal of Investigative Dermatology

  105          394   DNA and Cell Biology

  106          390   Neurology

  107          388   Clinical Genetics

  108          380   DNA Sequence

  109          378   Biochimie

  110          376   Biology of Reproduction

  111          369   Comparative Biochemistry and Physiology

  112          360   Virus Research

  113          358   

  114          356   Genes to Cells

  115          346   Journal of Lipid Research

  116          342   Brain Research. Molecular Brain Research

  117          338   Nature Immunology

  118          337   The New England Journal of Medicine

  119          335   Developmental Dynamics

  120          331   Annals of Neurology

  121          327   PLoS Biology

  122          325   BMC Genomics

  123          319   Applied Microbiology and Biotechnology

  124          314   European Journal of Immunology

  125          310   Journal of Medicinal Chemistry

  126          309   Genome Research

  127          307   Investigative Ophthalmology and Visual Science

  128          299   Biological Chemistry Hoppe-Seyler

  129          296   Journal of Human Genetics

  130          281   Cytogenetics and Cell Genetics

  131          280   Glycobiology

  132          280   Journal of General Microbiology

  133          277   Archives of Microbiology

  134          261   Brain

  135          261   Nature Chemical Biology

  136          258   Traffic

  137          258   Phytochemistry

  138          255   Molecular Genetics and Metabolism

  139          252   Molecular Immunology

  140          252   Protein Expression and Purification

  141          251   Nature Medicine

  142          248   Journal of Cellular Biochemistry

  143          247   Fungal Genetics and Biology

  144          240   Cell Cycle

  145          234   DNA Research

  146          233   Circulation Research

  147          233   Diabetes

  148          227   Archives of Virology

  149          223   Cell Research

  150          222   Journal of Structural Biology





5.  STATISTICS FOR SOME LINE TYPES



The following table summarizes the total number of some UniProtKB/Swiss-Prot lines,

as well as the number of entries with at least one such line, and the

frequency of the lines.



                                      Total    Number of  Average

   Line type / subtype                number   entries    per entry  Rank

------------------------------------  -------- ---------  ---------  ----



References (RL)                      1299566                 2.28                                         

   Journal                           1126189     472580      1.98       1                                 

   Submitted to EMBL/GenBank/DDBJ     161948     146088      0.28       2                                 

   Submitted to other databases         7763       7101      0.01       3                                 

   Book citation                        1866       1843     <0.01       4                                 

   Plant Gene Register                   613        600     <0.01       5                                 

   Unpublished observations              510        506     <0.01       6                                 

   Thesis                                457        454     <0.01       7                                 

   Patent                                214        207     <0.01       8                                 

   Worm Breeder's Gazette                  6          6     <0.01       9                                 
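
The "Average per entry" values above are consistent with dividing the total number of lines by all 570157 entries in the release, not by the number of entries that carry such a line. A minimal sketch under that assumption follows; the function name and the rule that values rounding to 0.00 are shown as "<0.01" are inferred from the table, not stated in the report.

```python
TOTAL_ENTRIES = 570157  # entries in release 2023_04


def average_per_entry(total_lines, entries=TOTAL_ENTRIES):
    """Format lines-per-entry the way the table appears to.

    Assumption: two decimals, with anything that would print as 0.00
    shown as "<0.01" instead.
    """
    text = f"{total_lines / entries:.2f}"
    return "<0.01" if text == "0.00" else text


# Totals copied from the References (RL) table above.
for label, total in [
    ("References (RL)", 1299566),
    ("Journal", 1126189),
    ("Submitted to EMBL/GenBank/DDBJ", 161948),
    ("Submitted to other databases", 7763),
    ("Book citation", 1866),
]:
    print(f"{label:<32}{total:>9}  {average_per_entry(total):>6}")
```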



Total number of distinct authors cited in UniProtKB/Swiss-Prot: 462640



                                      Total    Number of  Average

   Line type / subtype                number   entries    per entry  Rank

------------------------------------  -------- ---------  ---------  ----

Comments (CC)                        2726384                 4.78                                         

   ACTIVITY REGULATION                 18288      18171      0.03      17                                 

   ALLERGEN                              947        947     <0.01      26                                 

   ALTERNATIVE PRODUCTS                25843      25843      0.05      13                                 

   BIOPHYSICOCHEMICAL PROPERTIES       11265      11217      0.02      20                                 

   BIOTECHNOLOGY                        1851       1792     <0.01      24                                 

   CATALYTIC ACTIVITY                 337165     253712      0.59       4                                 

   CAUTION                             14365      14065      0.03      18                                 

   COFACTOR                           132734     120474      0.23       7                                 

   DEVELOPMENTAL STAGE                 14150      14069      0.02      19                                 

   DISEASE                              8273       5568      0.01      21                                 

   DISRUPTION PHENOTYPE                20272      20242      0.04      16                                 

   DOMAIN                              57963      49411      0.10       9                                 

   FUNCTION                           488435     464395      0.86       2                                 

   INDUCTION                           25347      25259      0.04      14                                 

   INTERACTION                         23985      23985      0.04      15                                 

   MASS SPECTROMETRY                    7511       5796      0.01      22                                 

   MISCELLANEOUS                       45630      40088      0.08      11                                 

   PATHWAY                            143743     129772      0.25       6                                 

   PHARMACEUTICAL                        166        159     <0.01      29                                 

   POLYMORPHISM                         1487       1359     <0.01      25                                 

   PTM                                 64042      45664      0.11       8                                 

   RNA EDITING                           634        634     <0.01      28                                 

   SEQUENCE CAUTION                    45159      45089      0.08      12                                 

   SIMILARITY                         518702     514408      0.91       1                                 

   SUBCELLULAR LOCATION               364166     355704      0.64       3                                 

   SUBUNIT                            296071     290828      0.52       5                                 

   TISSUE SPECIFICITY                  50781      50253      0.09      10                                 

   TOXIC DOSE                            851        680     <0.01      27                                 

   WEB RESOURCE                         6558       5553      0.01      23                                 



Total number of comment topics: 29
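
The "Average per entry" column appears to be the total number of lines divided by the total number of entries in the release (570157, as stated in the introduction), rounded to two decimals, with values that would round to 0.00 printed as "<0.01". The following Python sketch reproduces a few rows under that assumption; the function and variable names are illustrative and not part of any UniProt tool.

```python
# Sketch: reproduce the "Average per entry" column of the comment table.
# Assumption: average = total number / number of entries in the release,
# rounded to two decimals.

TOTAL_ENTRIES = 570157  # entry count stated in the introduction


def average_per_entry(total: int, entries: int = TOTAL_ENTRIES) -> str:
    """Format the per-entry average the way the table prints it."""
    value = total / entries
    # Values that would round to 0.00 are shown as "<0.01" in the table.
    if value < 0.005:
        return "<0.01"
    return f"{value:.2f}"


# A few rows copied from the table above: topic -> total number.
sample = {
    "Comments (CC)": 2726384,
    "FUNCTION": 488435,
    "SIMILARITY": 518702,
    "DISEASE": 8273,
    "PHARMACEUTICAL": 166,
}

for topic, total in sample.items():
    print(f"{topic:<16} {average_per_entry(total):>6}")
```

The same calculation applies to the feature and cross-reference tables, since all three use the same column layout.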





                                      Total    Number of  Average

   Line type / subtype                number   entries    per entry  Rank

------------------------------------  -------- ---------  ---------  ----

Features (FT)                        5292942                 9.28                                         

   ACT_SITE                           176467     104958      0.31       9                                 

   BINDING                           1189210     216459      2.09       1                                 

   CARBOHYD                           123798      31439      0.22      14                                 

   CHAIN                              578456     562525      1.01       2                                 

   COILED                              22445      15541      0.04      25                                 

   COMPBIAS                           174388      74082      0.31      10                                 

   CONFLICT                           138882      48427      0.24      12                                 

   CROSSLNK                            24867       8946      0.04      24                                 

   DISULFID                           134823      35864      0.24      13                                 

   DNA_BIND                            12157      10882      0.02      31                                 

   DOMAIN                             214539     131107      0.38       8                                 

   HELIX                              331916      28903      0.58       5                                 

   INIT_MET                            17535      17486      0.03      26                                 

   INTRAMEM                             3013       1384      0.01      34                                 

   LIPID                               13784       8844      0.02      28                                 

   MOD_RES                            261285      74425      0.46       7                                 

   MOTIF                               47460      30924      0.08      21                                 

   MUTAGEN                             94580      19523      0.17      17                                 

   NON_CONS                             2552        826     <0.01      35                                 

   NON_STD                               358        283     <0.01      36                                 

   NON_TER                             12618       9690      0.02      29                                 

   PEPTIDE                             12607       8719      0.02      30                                 

   PROPEP                              15250      12994      0.03      27                                 

   REGION                             320024     149772      0.56       6                                 

   REPEAT                             109011      15152      0.19      15                                 

   SIGNAL                              44161      44160      0.08      22                                 

   SITE                                64935      35233      0.11      19                                 

   STRAND                             338259      27248      0.59       4                                 

   TOPO_DOM                           149612      30364      0.26      11                                 

   TRANSIT                              9523       9403      0.02      32                                 

   TRANSMEM                           381189      79862      0.67       3                                 

   TURN                                80219      23550      0.14      18                                 

   UNSURE                               5744        892      0.01      33                                 

   VAR_SEQ                             53056      22606      0.09      20                                 

   VARIANT                            103573      17485      0.18      16                                 

   ZN_FING                             30646      13089      0.05      23                                 



Total number of feature keys: 36
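
The "Rank" column is consistent with ordering the feature keys by their total number, largest first. The Python sketch below rebuilds that ranking from the totals in the table above and checks that they add up to the Features (FT) total; it is an illustration under that assumption, not the procedure used to produce the release statistics.

```python
# Sketch: derive the "Rank" column from the "Total number" column.
# Totals are copied from the feature table above.

FEATURE_TOTALS = {
    "ACT_SITE": 176467, "BINDING": 1189210, "CARBOHYD": 123798,
    "CHAIN": 578456, "COILED": 22445, "COMPBIAS": 174388,
    "CONFLICT": 138882, "CROSSLNK": 24867, "DISULFID": 134823,
    "DNA_BIND": 12157, "DOMAIN": 214539, "HELIX": 331916,
    "INIT_MET": 17535, "INTRAMEM": 3013, "LIPID": 13784,
    "MOD_RES": 261285, "MOTIF": 47460, "MUTAGEN": 94580,
    "NON_CONS": 2552, "NON_STD": 358, "NON_TER": 12618,
    "PEPTIDE": 12607, "PROPEP": 15250, "REGION": 320024,
    "REPEAT": 109011, "SIGNAL": 44161, "SITE": 64935,
    "STRAND": 338259, "TOPO_DOM": 149612, "TRANSIT": 9523,
    "TRANSMEM": 381189, "TURN": 80219, "UNSURE": 5744,
    "VAR_SEQ": 53056, "VARIANT": 103573, "ZN_FING": 30646,
}

# Sort keys by total, largest first, then number them from 1.
ranked = sorted(FEATURE_TOTALS, key=FEATURE_TOTALS.get, reverse=True)
rank = {key: position for position, key in enumerate(ranked, start=1)}

# The per-key totals should add up to the Features (FT) line.
grand_total = sum(FEATURE_TOTALS.values())

for key in ranked[:5]:
    print(f"{rank[key]:>2}  {key:<10} {FEATURE_TOTALS[key]:>8}")
print("Sum of all feature keys:", grand_total)
```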







                                      Total    Number of  Average

   Line type / subtype                number   entries    per entry  Rank      Category

------------------------------------  -------- ---------  ---------  ----      -------------------------------------------

Cross-references (DR)               20592885                36.12                                                           

   ABCD                                 3063       3063      0.01     122      Protocols and materials databases            

   AGR                                 60776      60107      0.11      43      Organism-specific databases                  

   Allergome                            2036       1309     <0.01     129      Protein family/group databases               

   AlphaFoldDB                        546152     546152      0.96       9      3D structure databases                       

   Antibodypedia                       32277      32168      0.06      61      Protocols and materials databases            

   ArachnoServer                        1164       1154     <0.01     139      Organism-specific databases                  

   Araport                             16390      16294      0.03      91      Organism-specific databases                  

   Bgee                                61341      61339      0.11      41      Gene expression databases                    

   BindingDB                            6417       6417      0.01     107      Chemistry databases                          

   BioCyc                              47988      43951      0.08      52      Enzyme and pathway databases                 

   BioGRID                             61209      59317      0.11      42      Protein-protein interaction databases        

   BioGRID-ORCS                        44919      44336      0.08      54      Miscellaneous databases                      

   BioMuta                             20308      20283      0.04      75      Genetic variation databases                  

   BMRB                                 6905       6905      0.01     105      3D structure databases                       

   BRENDA                              20299      18498      0.04      76      Enzyme and pathway databases                 

   CarbonylDB                           1159       1159     <0.01     140      PTM databases                                

   CAZy                                 9603       8650      0.02      98      Protein family/group databases               

   CCDS                                49435      34604      0.09      50      Sequence databases                           

   CDD                                382467     300618      0.67      16      Family and domain databases                  

   CGD                                  2065       2048     <0.01     128      Organism-specific databases                  

   ChEMBL                               8871       8670      0.02      99      Chemistry databases                          

   ChiTaRS                             29738      29694      0.05      63      Miscellaneous databases                      

   CLAE                                  359        356     <0.01     155      Protein family/group databases               

   CollecTF                              137        137     <0.01     162      Gene expression databases                    

   ComplexPortal                       13251       7253      0.02      96      Protein-protein interaction databases        

   COMPLUYEAST-2DPAGE                     97         97     <0.01     164      2D gel databases                             

   ConoServer                            967        879     <0.01     142      Organism-specific databases                  

   CORUM                                5811       5811      0.01     108      Protein-protein interaction databases        

   CPTAC                                3472       1929      0.01     118      Proteomic databases                          

   CPTC                                  384        384     <0.01     153      Protocols and materials databases            

   CTD                                 76276      75365      0.13      39      Organism-specific databases                  

   DEPOD                                 254        254     <0.01     160      PTM databases                                

   dictyBase                            4224       4110      0.01     115      Organism-specific databases                  

   DIP                                 17537      17497      0.03      87      Protein-protein interaction databases        

   DisGeNET                            17012      16794      0.03      89      Organism-specific databases                  

   DisProt                              1736       1728     <0.01     131      Family and domain databases                  

   DMDM                                16170      16170      0.03      93      Genetic variation databases                  

   DNASU                               48326      48248      0.08      51      Protocols and materials databases            

   DOSAC-COBS-2DPAGE                     145        145     <0.01     161      2D gel databases                             

   DrugBank                            31175       4771      0.05      62      Chemistry databases                          

   DrugCentral                          2565       2565     <0.01     124      Chemistry databases                          

   EchoBASE                             4158       4158      0.01     116      Organism-specific databases                  

   eggNOG                             338896     333062      0.59      18      Phylogenomic databases                       

   ELM                                  1814       1814     <0.01     130      Protein-protein interaction databases        

   EMBL                              1004169     557433      1.76       3      Sequence databases                           

   Ensembl                             98515      48460      0.17      35      Genome annotation databases                  

   EnsemblBacteria                    309955     298424      0.54      20      Genome annotation databases                  

   EnsemblFungi                        23047      22605      0.04      69      Genome annotation databases                  

   EnsemblMetazoa                      18857      11444      0.03      82      Genome annotation databases                  

   EnsemblPlants                       35484      22109      0.06      58      Genome annotation databases                  

   EnsemblProtists                      5298       5043      0.01     111      Genome annotation databases                  

   EPD                                 23251      23251      0.04      67      Proteomic databases                          

   ESTHER                               2981       2980      0.01     123      Protein family/group databases               

   euHCVdb                                55         44     <0.01     166      Organism-specific databases                  

   EvolutionaryTrace                   16760      16760      0.03      90      Miscellaneous databases                      

   ExpressionAtlas                     52825      52825      0.09      48      Gene expression databases                    

   FlyBase                              4126       4011      0.01     117      Organism-specific databases                  

   Gene3D                             738284     458298      1.29       6      Family and domain databases                  

   GeneCards                           20377      20243      0.04      72      Organism-specific databases                  

   GeneID                             274931     267231      0.48      24      Genome annotation databases                  

   GeneReviews                          1578       1575     <0.01     132      Organism-specific databases                  

   GeneTree                            57081      57072      0.10      44      Phylogenomic databases                       

   Genevisible                         55276      55276      0.10      46      Gene expression databases                    

   GeneWiki                            10351      10269      0.02      97      Miscellaneous databases                      

   GenomeRNAi                          22256      22256      0.04      70      Miscellaneous databases                      

   GlyConnect                           2372       2215     <0.01     125      PTM databases                                

   GlyCosmos                           28902      28902      0.05      64      PTM databases                                

   GlyGen                              21596      21596      0.04      71      PTM databases                                

   GO                                3165709     546390      5.55       1      Ontologies                                   

   Gramene                             35484      22109      0.06      59      Genome annotation databases                  

   GuidetoPHARMACOLOGY                  2179       2179     <0.01     127      Chemistry databases                          

   HAMAP                              330866     327933      0.58      19      Family and domain databases                  

   HGNC                                20372      20242      0.04      73      Organism-specific databases                  

   HOGENOM                            426683     426683      0.75      15      Phylogenomic databases                       

   HPA                                 19323      19203      0.03      81      Organism-specific databases                  

   IDEAL                                 986        986     <0.01     141      Family and domain databases                  

   IMGT_GENE-DB                          267        267     <0.01     159      Protein family/group databases               

   InParanoid                         163517     163517      0.29      27      Phylogenomic databases                       

   IntAct                              56812      56812      0.10      45      Protein-protein interaction databases        

   InterPro                          2399884     550928      4.21       2      Family and domain databases                  

   iPTMnet                             54156      54156      0.09      47      PTM databases                                

   jPOST                               26408      26408      0.05      65      Proteomic databases                          

   KEGG                               503579     478355      0.88      12      Genome annotation databases                  

   LegioList                             765        763     <0.01     147      Organism-specific databases                  

   Leproma                               672        669     <0.01     148      Organism-specific databases                  

   MaizeGDB                              529        525     <0.01     150      Organism-specific databases                  

   MalaCards                            5619       5610      0.01     109      Organism-specific databases                  

   MANE-Select                         18373      18261      0.03      85      Genome annotation databases                  

   MassIVE                             18715      18715      0.03      83      Proteomic databases                          

   MaxQB                               33722      33722      0.06      60      Proteomic databases                          

   MEROPS                              14193      13775      0.02      94      Protein family/group databases               

   MetOSite                             3111       3111      0.01     121      PTM databases                                

   MGI                                 17088      17047      0.03      88      Organism-specific databases                  

   MIM                                 23196      16030      0.04      68      Organism-specific databases                  

   MINT                                23452      23452      0.04      66      Protein-protein interaction databases        

   MoonDB                                348        348     <0.01     156      Protein family/group databases               

   MoonProt                              281        281     <0.01     158      Protein family/group databases               

   NCBIfam                            299700     277265      0.53      21      Family and domain databases                  

   neXtProt                            20324      20324      0.04      74      Organism-specific databases                  

   NIAGADS                                69         69     <0.01     165      Organism-specific databases                  

   OGP                                   373        373     <0.01     154      2D gel databases                             

   OMA                                430195     430195      0.75      14      Phylogenomic databases                       

   OpenTargets                         18424      18279      0.03      84      Organism-specific databases                  

   Orphanet                             8144       4374      0.01     101      Organism-specific databases                  

   OrthoDB                            274783     274783      0.48      25      Phylogenomic databases                       

   PANTHER                           1002474     501638      1.76       4      Family and domain databases                  

   PathwayCommons                      19454      19454      0.03      80      Enzyme and pathway databases                 

   PATRIC                              92912      92912      0.16      37      Genome annotation databases                  

   PaxDb                              131498     131498      0.23      31      Proteomic databases                          

   PCDDB                                 127        127     <0.01     163      3D structure databases                       

   PDB                                275445      34119      0.48      23      3D structure databases                       

   PDBsum                             275445      34119      0.48      22      3D structure databases                       

   PeptideAtlas                        39451      39451      0.07      57      Proteomic databases                          

   PeroxiBase                            792        771     <0.01     144      Protein family/group databases               

   Pfam                               821789     538883      1.44       5      Family and domain databases                  

   PharmGKB                            18033      18014      0.03      86      Organism-specific databases                  

   Pharos                              20224      20224      0.04      78      Miscellaneous databases                      

   PHI-base                             1536       1274     <0.01     133      Miscellaneous databases                      

   PhosphoSitePlus                     39628      39628      0.07      56      PTM databases                                

   PhylomeDB                          115454     115454      0.20      33      Phylogenomic databases                       

   PIR                                125023     114704      0.22      32      Sequence databases                           

   PIRSF                              110842     109675      0.19      34      Family and domain databases                  

   PlantReactome                        1278        750     <0.01     136      Enzyme and pathway databases                 

   PomBase                              5129       5125      0.01     112      Organism-specific databases                  

   PRIDE                                 636        636     <0.01     149      Proteomic databases                          

   PRINTS                             150615     129354      0.26      28      Family and domain databases                  

   PRO                                 98140      98140      0.17      36      Miscellaneous databases                      

   ProMEX                                486        486     <0.01     152      Proteomic databases                          

   PROSITE                            490433     310432      0.86      13      Family and domain databases                  

   Proteomes                          505141     462035      0.89      11      Miscellaneous databases                      

   ProteomicsDB                        72660      45354      0.13      40      Proteomic databases                          

   PseudoCAP                            1460       1451     <0.01     135      Organism-specific databases                  

   Reactome                           141328      37969      0.25      29      Enzyme and pathway databases                 

   REBASE                                790        395     <0.01     145      Protein family/group databases               

   RefSeq                             556904     426108      0.98       8      Sequence databases                           

   REPRODUCTION-2DPAGE                  1260       1039     <0.01     137      2D gel databases                             

   RGD                                  8112       8111      0.01     102      Organism-specific databases                  

   RNAct                               43099      43099      0.08      55      Miscellaneous databases                      

   SABIO-RK                             5578       5578      0.01     110      Enzyme and pathway databases                 

   SASBDB                                774        774     <0.01     146      3D structure databases                       

   SFLD                                20265       9042      0.04      77      Family and domain databases                  

   SGD                                  6746       6741      0.01     106      Organism-specific databases                  

   SignaLink                           19959      19959      0.04      79      Enzyme and pathway databases                 

   SIGNOR                               7262       7262      0.01     103      Enzyme and pathway databases                 

   SMART                              205240     148124      0.36      26      Family and domain databases                  

   SMR                                514224     514224      0.90      10      3D structure databases                       

   STRING                             366524     366524      0.64      17      Protein-protein interaction databases        

   SUPFAM                             647395     458882      1.14       7      Family and domain databases                  

   SWISS-2DPAGE                         1177       1177     <0.01     138      2D gel databases                             

   SwissLipids                          1478       1394     <0.01     134      Chemistry databases                          

   SwissPalm                           13335      13335      0.02      95      PTM databases                                

   TAIR                                16364      16278      0.03      92      Organism-specific databases                  

   TCDB                                 8416       8342      0.01     100      Protein family/group databases               

   TopDownProteomics                    3236       2957      0.01     120      Proteomic databases                          

   TreeFam                             46115      46092      0.08      53      Phylogenomic databases                       

   TubercuList                          2325       2289     <0.01     126      Organism-specific databases                  

   UCD-2DPAGE                            496        496     <0.01     151      2D gel databases                             

   UCSC                                50826      46366      0.09      49      Genome annotation databases                  

   UniLectin                             315        315     <0.01     157      Protein family/group databases               

   UniPathway                         139766     126137      0.25      30      Enzyme and pathway databases                 

   VEuPathDB                           80969      74545      0.14      38      Organism-specific databases                  

   VGNC                                 4494       4480      0.01     114      Organism-specific databases                  

   WBParaSite                             49         47     <0.01     167      Genome annotation databases                  

   World-2DPAGE                          935        923     <0.01     143      2D gel databases                             

   WormBase                             6943       5061      0.01     104      Organism-specific databases                  

   Xenbase                              4836       4836      0.01     113      Organism-specific databases                  

   ZFIN                                 3269       3269      0.01     119      Organism-specific databases                  



Total number of cross-referenced databases: 167
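
Because every cross-reference row carries a category, the table can be summarized per category. The Python sketch below parses a handful of rows copied from the table and sums their totals by category. It assumes that database names contain no spaces and that the category is everything after the rank; `CrossRefRow` and `parse_row` are hypothetical helper names. Only six rows are included, so the sums are not full category totals.

```python
# Sketch: parse cross-reference rows and aggregate totals by category.
from collections import defaultdict
from typing import NamedTuple


class CrossRefRow(NamedTuple):
    database: str
    total: int
    entries: int
    average: str  # kept as text because the table prints "<0.01"
    rank: int
    category: str


def parse_row(line: str) -> CrossRefRow:
    # Split on whitespace five times; the remainder is the category,
    # which may itself contain spaces.
    name, total, entries, average, rank, category = line.strip().split(None, 5)
    return CrossRefRow(name, int(total), int(entries), average, int(rank), category)


# Six rows copied from the table above (not the complete table).
SAMPLE = """\
   AlphaFoldDB                        546152     546152      0.96       9      3D structure databases
   PDB                                275445      34119      0.48      23      3D structure databases
   EMBL                              1004169     557433      1.76       3      Sequence databases
   RefSeq                             556904     426108      0.98       8      Sequence databases
   GO                                3165709     546390      5.55       1      Ontologies
   WBParaSite                             49         47     <0.01     167      Genome annotation databases
"""

rows = [parse_row(line) for line in SAMPLE.splitlines() if line.strip()]

totals_by_category = defaultdict(int)
for row in rows:
    totals_by_category[row.category] += row.total

for category, total in sorted(totals_by_category.items()):
    print(f"{category:<30} {total:>9}")
```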



6.  AMINO ACID COMPOSITION



   6.1  Composition in percent for the complete database



   Ala (A) 8.25   Gln (Q) 3.93   Leu (L) 9.65   Ser (S) 6.65

   Arg (R) 5.53   Glu (E) 6.72   Lys (K) 5.80   Thr (T) 5.36

   Asn (N) 4.06   Gly (G) 7.07   Met (M) 2.41   Trp (W) 1.10

   Asp (D) 5.46   His (H) 2.27   Phe (F) 3.86   Tyr (Y) 2.92

   Cys (C) 1.38   Ile (I) 5.91   Pro (P) 4.74   Val (V) 6.85



   Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.00



   



   [Figure: amino acid composition chart]

   Legend: gray = aliphatic, red = acidic, green = small hydroxy,

           blue = basic, black = aromatic, white = amide, yellow = sulfur





   6.2  Classification of the amino acids by their frequency



   Leu, Ala, Gly, Val, Glu, Ser, Ile, Lys, Arg, Asp, Thr, Pro, Asn, Gln,

   Phe, Tyr, Met, His, Cys, Trp
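
The ordering in 6.2 follows directly from the percentages in 6.1. As a minimal Python sketch, sorting the twenty standard amino acids by their composition in descending order reproduces the list above; the dictionary simply restates the values from 6.1.

```python
# Sketch: rank the amino acids of section 6.1 by frequency (section 6.2).
# Percentages are copied from section 6.1.

COMPOSITION = {
    "Ala": 8.25, "Gln": 3.93, "Leu": 9.65, "Ser": 6.65,
    "Arg": 5.53, "Glu": 6.72, "Lys": 5.80, "Thr": 5.36,
    "Asn": 4.06, "Gly": 7.07, "Met": 2.41, "Trp": 1.10,
    "Asp": 5.46, "His": 2.27, "Phe": 3.86, "Tyr": 2.92,
    "Cys": 1.38, "Ile": 5.91, "Pro": 4.74, "Val": 6.85,
}

# Most frequent residue first.
by_frequency = sorted(COMPOSITION, key=COMPOSITION.get, reverse=True)

print(", ".join(by_frequency))
```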





7.  MISCELLANEOUS STATISTICS



4466 entries are encoded on a mitochondrion, and 3997 are encoded on a plasmid.



12199 entries are encoded on a plastid, of which 21 are encoded on apicoplasts,

11634 on chloroplasts, 51 on organellar chromatophores, 145 on cyanelles,

149 on non-photosynthetic plastids and 199 on unspecified types of plastid.
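
As a quick consistency check, the plastid subtypes listed above should add up to the plastid total. A short Python sketch of that arithmetic, using only the figures given in this section:

```python
# Sketch: check that the plastid subtypes sum to the stated plastid total.

PLASTID_TOTAL = 12199

plastid_subtypes = {
    "apicoplast": 21,
    "chloroplast": 11634,
    "organellar chromatophore": 51,
    "cyanelle": 145,
    "non-photosynthetic plastid": 149,
    "unspecified plastid": 199,
}

subtype_sum = sum(plastid_subtypes.values())
print(subtype_sum, subtype_sum == PLASTID_TOTAL)
```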



Number of entries with at least one sequence correction: 81063