


         UniProtKB/Swiss-Prot protein knowledgebase release 2024_03 statistics





1.  INTRODUCTION



Release 2024_03 of 29-May-2024 of UniProtKB/Swiss-Prot contains 571609 sequence
entries, curated from 299621 unique references and comprising 206878625 amino acids.

336 sequences have been added since release 2024_02, the sequence data of
185 existing entries has been updated, and the annotations of
332060 entries have been revised.



Number of fragments: 9294

Number of additional sequences produced by alternative splicing, initiation or promoter usage, or ribosomal frameshifting: 41169





Protein existence (PE):           entries     %



1: Evidence at protein level       114697   20.1%

2: Evidence at transcript level     55769    9.8%

3: Inferred from homology          386317   67.6%

4: Predicted                        13000    2.3%

5: Uncertain                         1826    0.3%
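
The percentages in the table above can be reproduced from the entry counts. The following Python sketch is an illustration only: the names pe_counts, share and pe_percent are hypothetical, and it assumes each percentage is the count divided by the total number of entries, rounded to one decimal place.

```python
# Entry counts per protein existence (PE) level, copied from the table above.
pe_counts = {
    1: 114697,  # Evidence at protein level
    2: 55769,   # Evidence at transcript level
    3: 386317,  # Inferred from homology
    4: 13000,   # Predicted
    5: 1826,    # Uncertain
}

# The five levels partition the release, so their sum is the entry total.
total_entries = sum(pe_counts.values())


def share(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to one decimal (assumed rule)."""
    return round(100 * count / total, 1)


pe_percent = {level: share(count, total_entries) for level, count in pe_counts.items()}

for level, count in pe_counts.items():
    print(f"PE {level}: {count:>7} {pe_percent[level]:>5.1f}%")
```

The sum of the five counts equals the 571609 entries reported in the introduction, so the levels account for every entry in the release.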



The growth of the database is summarized below.

   [Figure: growth of UniProtKB/Swiss-Prot]





2.  TAXONOMIC ORIGIN



   Total number of species represented in this release of UniProtKB/Swiss-Prot: 14626



   The twenty most represented species account for 123063 sequences, or 21.5% of
   the total number of entries.
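
As a cross-check, the figure for the twenty most represented species can be recomputed from the first twenty rows of table 2.2. This is a minimal Python sketch with hypothetical variable names; the frequencies and the entry total are copied from this document.

```python
# Frequencies of the twenty most represented species, in rank order,
# copied from table 2.2 of this document.
top20 = [
    20435, 17212, 16386, 8199, 6727, 6046, 5121, 4530, 4472, 4191,
    4187, 4160, 3778, 3506, 3332, 2309, 2309, 2218, 2046, 1899,
]

# Total number of entries in release 2024_03 (from the introduction).
TOTAL_ENTRIES = 571609

top20_total = sum(top20)
top20_share = 100 * top20_total / TOTAL_ENTRIES

print(f"{top20_total} sequences, {top20_share:.1f}% of all entries")
```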





   2.1  Table of the frequency of occurrence of species



        Species represented 1x: 5964

                            2x: 2121

                            3x: 1143

                            4x:  774

                            5x:  538

                            6x:  443

                            7x:  328

                            8x:  279

                            9x:  242

                           10x:  158

                       11- 20x:  838

                       21- 50x:  512

                       51-100x:  230

                         >100x: 1056
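
The occurrence classes above should add up to the total number of species. The Python sketch below (hypothetical names, values copied from table 2.1) checks this and derives the proportion of species represented by a single entry.

```python
# Number of species per occurrence class, copied from table 2.1.
species_by_frequency = {
    "1x": 5964, "2x": 2121, "3x": 1143, "4x": 774, "5x": 538,
    "6x": 443, "7x": 328, "8x": 279, "9x": 242, "10x": 158,
    "11-20x": 838, "21-50x": 512, "51-100x": 230, ">100x": 1056,
}

# Every species falls in exactly one class, so the classes sum to the total.
species_total = sum(species_by_frequency.values())

# Proportion of species that appear in only one entry.
single_entry_share = 100 * species_by_frequency["1x"] / species_total

print(f"{species_total} species, {single_entry_share:.1f}% represented once")
```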





   2.2  Table of the most represented species



  ------  ---------  --------------------------------------------

  Number  Frequency  Species

  ------  ---------  --------------------------------------------

       1      20435  Homo sapiens (Human)

       2      17212  Mus musculus (Mouse)

       3      16386  Arabidopsis thaliana (Mouse-ear cress)

       4       8199  Rattus norvegicus (Rat)

       5       6727  Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (Baker's yeast)

       6       6046  Bos taurus (Bovine)

       7       5121  Schizosaccharomyces pombe (strain 972 / ATCC 24843) (Fission yeast)

       8       4530  Escherichia coli (strain K12)

       9       4472  Caenorhabditis elegans

      10       4191  Bacillus subtilis (strain 168)

      11       4187  Oryza sativa subsp. japonica (Rice)

      12       4160  Dictyostelium discoideum (Social amoeba)

      13       3778  Drosophila melanogaster (Fruit fly)

      14       3506  Xenopus laevis (African clawed frog)

      15       3332  Danio rerio (Zebrafish) (Brachydanio rerio)

      16       2309  Mycobacterium tuberculosis (strain ATCC 25618 / H37Rv)

      17       2309  Gallus gallus (Chicken)

      18       2218  Pongo abelii (Sumatran orangutan) (Pongo pygmaeus abelii)

      19       2046  Escherichia coli O157:H7

      20       1899  Mycobacterium tuberculosis (strain CDC 1551 / Oshkosh)

      21       1827  Salmonella typhimurium (strain LT2 / SGSC1412 / ATCC 700720)

      22       1787  Methanocaldococcus jannaschii  

      23       1711  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)

      24       1704  Haemophilus influenzae (strain ATCC 51907 / DSM 11121 / KW20 / Rd)

      25       1702  Escherichia coli O6:H1 (strain CFT073 / ATCC 700928 / UPEC)

      26       1696  Shigella flexneri

      27       1460  Pseudomonas aeruginosa 

      28       1458  Sus scrofa (Pig)

      29       1349  Salmonella typhi

      30       1244  Mycobacterium bovis (strain ATCC BAA-935 / AF2122/97)

      31       1176  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)

      32       1144  Candida albicans (strain SC5314 / ATCC MYA-2876) (Yeast)

      33       1098  Synechocystis sp. (strain PCC 6803 / Kazusa)

      34       1038  Archaeoglobus fulgidus 

      35       1030  Yersinia pestis

      36       1016  Emericella nidulans  

      37        997  Vibrio cholerae serotype O1 (strain ATCC 39315 / El Tor Inaba N16961)

      38        979  Oryctolagus cuniculus (Rabbit)

      39        967  Neurospora crassa 

      40        942  Staphylococcus aureus (strain Mu50 / ATCC 700699)

      41        930  Salmonella paratyphi A (strain ATCC 9150 / SARB42)

      42        929  Staphylococcus aureus (strain N315)

      43        928  Eremothecium gossypii   

      44        925  Aspergillus fumigatus (strain ATCC MYA-4609 / CBS 101355 / FGSC A1100 / Af293) 

      45        919  Kluyveromyces lactis   

      46        909  Acanthamoeba polyphaga mimivirus (APMV)

      47        905  Staphylococcus aureus (strain COL)

      48        896  Staphylococcus aureus (strain MW2)

      49        894  Escherichia coli O6:K15:H31 (strain 536 / UPEC)

      50        890  Staphylococcus aureus (strain MSSA476)

      51        888  Candida glabrata   

      52        888  Staphylococcus aureus (strain MRSA252)

      53        887  Rhizobium meliloti (strain 1021) (Ensifer meliloti) (Sinorhizobium meliloti)

      54        882  Salmonella choleraesuis (strain SC-B67)

      55        879  Shigella sonnei (strain Ss046)

      56        872  Oryza sativa subsp. indica (Rice)

      57        863  Yersinia pseudotuberculosis serotype I (strain IP32953)

      58        850  Zea mays (Maize)

      59        847  Canis lupus familiaris (Dog) (Canis familiaris)

      60        847  Escherichia coli O9:H4 (strain HS)

      61        838  Escherichia coli O139:H28 (strain E24377A / ETEC)

      62        829  Shigella boydii serotype 4 (strain Sb227)

      63        825  Escherichia coli (strain UTI89 / UPEC)

      64        822  Shigella dysenteriae serotype 1 (strain Sd197)

      65        822  Escherichia coli 

      66        817  Streptomyces coelicolor (strain ATCC BAA-471 / A3(2) / M145)

      67        811  Staphylococcus aureus (strain NCTC 8325 / PS 47)

      68        804  Pectobacterium atrosepticum (strain SCRI 1043 / ATCC BAA-672) 

      69        796  Vibrio parahaemolyticus serotype O3:K6 (strain RIMD 2210633)

      70        791  Escherichia coli (strain SMS-3-5 / SECEC)

      71        788  Aquifex aeolicus (strain VF5)

      72        779  Escherichia coli O127:H6 (strain E2348/69 / EPEC)

      73        771  Escherichia coli (strain K12 / DH10B)

      74        770  Pasteurella multocida (strain Pm70)

      75        767  Escherichia coli O17:K52:H18 (strain UMN026 / ExPEC)

      76        765  Escherichia coli (strain K12 / MC4100 / BW2952)

      77        762  Escherichia coli (strain 55989 / EAEC)

      78        761  Escherichia coli O8 (strain IAI1)

      79        760  Shigella flexneri serotype 5b (strain 8401)

      80        760  Staphylococcus epidermidis (strain ATCC 35984 / RP62A)

      81        760  Staphylococcus epidermidis (strain ATCC 12228 / FDA PCI 1200)

      82        759  Escherichia coli O45:K1 (strain S88 / ExPEC)

      83        758  Bacillus anthracis

      84        756  Escherichia coli (strain SE11)

      85        753  Escherichia coli O7:K1 (strain IAI39 / ExPEC)

      86        749  Photorhabdus laumondii subsp. laumondii (strain DSM 15139 / CIP 105565 / TT01) 

      87        748  Escherichia coli O157:H7 (strain EC4115 / EHEC)

      88        744  Halalkalibacterium halodurans  

      89        739  Yersinia enterocolitica serotype O:8 / biotype 1B (strain NCTC 13174 / 8081)

      90        734  Pseudomonas putida 

      91        733  Vibrio vulnificus (strain CMCP6)

      92        731  Escherichia coli O81 (strain ED1a)

      93        722  Salmonella enteritidis PT4 (strain P125109)

      94        722  Escherichia coli

      95        718  Vibrio vulnificus (strain YJ016)

      96        716  Salmonella paratyphi B (strain ATCC BAA-1250 / SPB7)

      97        715  Yersinia pestis bv. Antiqua (strain Nepal516)

      98        715  Enterobacter sp. (strain 638)

      99        715  Klebsiella pneumoniae subsp. pneumoniae (strain ATCC 700721 / MGH 78578)

     100        715  Escherichia coli O1:K1 / APEC

     101        714  Salmonella paratyphi A (strain AKU_12601)

     102        713  Salmonella agona (strain SL483)

     103        713  Yersinia pseudotuberculosis serotype O:1b (strain IP 31758)

     104        713  Salmonella newport (strain SL254)

     105        712  Salmonella schwarzengrund (strain CVM19633)

     106        711  Yersinia pestis bv. Antiqua (strain Antiqua)

     107        710  Salmonella heidelberg (strain SL476)

     108        707  Nostoc sp. (strain PCC 7120 / SAG 25.82 / UTEX 2576)

     109        702  Salmonella dublin (strain CT_02021853)

     110        699  Klebsiella pneumoniae (strain 342)

     111        698  Shigella boydii serotype 18 (strain CDC 3083-94 / BS512)

     112        695  Escherichia fergusonii 

     113        692  Pan troglodytes (Chimpanzee)

     114        686  Mycoplasma pneumoniae (strain ATCC 29342 / M129 / Subtype 1) 

     115        684  Salmonella gallinarum (strain 287/91 / NCTC 13346)

     116        683  Pseudomonas syringae pv. tomato (strain ATCC BAA-871 / DC3000)

     117        679  Staphylococcus aureus (strain USA300)

     118        679  Citrobacter koseri (strain ATCC BAA-895 / CDC 4225-83 / SGSC4696)

     119        672  Serratia proteamaculans (strain 568)

     120        670  Bacillus cereus 

     121        669  Mycobacterium leprae (strain TN)

     122        668  Agrobacterium fabrum (strain C58 / ATCC 33970) (Agrobacterium tumefaciens 

     123        667  Bradyrhizobium diazoefficiens 

     124        667  Yarrowia lipolytica (strain CLIB 122 / E 150) (Yeast) (Candida lipolytica)

     125        667  Yersinia pestis (strain Pestoides F)

     126        662  Shewanella oneidensis 

     127        658  Sinorhizobium fredii (strain NBRC 101917 / NGR234)

     128        653  Debaryomyces hansenii   

     129        643  Staphylococcus aureus (strain bovine RF122 / ET3-1)

     130        642  Salmonella arizonae (strain ATCC BAA-731 / CDC346-86 / RSK2980)

     131        642  Yersinia pseudotuberculosis serotype O:3 (strain YPIII)

     132        634  Yersinia pseudotuberculosis serotype IB (strain PB1/+)

     133        623  Methanothermobacter thermautotrophicus  

     134        622  Listeria monocytogenes serovar 1/2a (strain ATCC BAA-679 / EGD-e)

     135        622  Treponema pallidum (strain Nichols)

     136        622  Cronobacter sakazakii (strain ATCC BAA-894) (Enterobacter sakazakii)

     137        620  Pseudomonas aeruginosa (strain UCBPP-PA14)

     138        615  Xanthomonas campestris pv. campestris 

     139        614  Staphylococcus haemolyticus (strain JCSC1435)

     140        613  Mesorhizobium japonicum  (Mesorhizobium loti 

     141        612  Helicobacter pylori (strain ATCC 700392 / 26695) (Campylobacter pylori)

     142        605  Listeria innocua serovar 6a (strain ATCC BAA-680 / CLIP 11262)

     143        603  Ralstonia nicotianae (strain GMI1000) (Ralstonia solanacearum)

     144        602  Staphylococcus saprophyticus subsp. saprophyticus 

     145        602  Photobacterium profundum (strain SS9)

     146        601  Salmonella paratyphi C (strain RKS4594)

     147        600  Yersinia pestis bv. Antiqua (strain Angola)

     148        595  Bacillus cereus (strain ATCC 10987 / NRS 248)

     149        591  Pectobacterium carotovorum subsp. carotovorum (strain PC1)

     150        588  Mycolicibacterium smegmatis (strain ATCC 700084 / mc(2)155) 

     151        588  Neisseria meningitidis serogroup B (strain MC58)

     152        584  Rickettsia prowazekii (strain Madrid E)

     153        582  Caenorhabditis briggsae

     154        579  Brucella suis biovar 1 (strain 1330)

     155        576  Brucella melitensis biotype 1 

     156        575  Caulobacter vibrioides (strain ATCC 19089 / CB15) (Caulobacter crescentus)

     157        573  Aliivibrio fischeri (strain ATCC 700601 / ES114) (Vibrio fischeri)

     158        572  Buchnera aphidicola subsp. Acyrthosiphon pisum (strain APS) 

     159        572  Aspergillus oryzae (strain ATCC 42149 / RIB 40) (Yellow koji mold)

     160        569  Bacillus thuringiensis subsp. konkukian (strain 97-27)

     161        568  Helicobacter pylori (strain J99 / ATCC 700824) (Campylobacter pylori J99)

     162        568  Pseudomonas syringae pv. syringae (strain B728a)

     163        565  Bacillus licheniformis 

     164        565  Thermotoga maritima 

     165        562  Bacillus cereus (strain ZK / E33L)

     166        562  Buchnera aphidicola subsp. Schizaphis graminum (strain Sg)

     167        559  Clostridium acetobutylicum 

     168        557  Xanthomonas axonopodis pv. citri (strain 306)

     169        555  Pseudomonas fluorescens (strain Pf0-1)

     170        554  Pseudomonas fluorescens (strain ATCC BAA-477 / NRRL B-23932 / Pf-5)

     171        554  Neisseria meningitidis serogroup A / serotype 4A (strain DSM 15465 / Z2491)

     172        553  Oceanobacillus iheyensis 

     173        547  Pseudomonas savastanoi pv. phaseolicola  (Pseudomonas syringae pv. phaseolicola 

     174        543  Corynebacterium glutamicum 

     175        540  Lactococcus lactis subsp. lactis (strain IL1403) (Streptococcus lactis)

     176        531  Erwinia tasmaniensis 

     177        530  Listeria monocytogenes serotype 4b (strain F2365)

     178        529  Sodalis glossinidius (strain morsitans)

     179        529  Bordetella bronchiseptica (strain ATCC BAA-588 / NCTC 13252 / RB50) 

     180        524  Staphylococcus aureus (strain Newman)

     181        523  Vibrio cholerae serotype O1 (strain ATCC 39541 / Classical Ogawa 395 / O395)

     182        522  Xylella fastidiosa (strain 9a5c)

     183        521  Deinococcus radiodurans 

     184        519  Chromobacterium violaceum 

     185        519  Streptococcus pneumoniae serotype 4 (strain ATCC BAA-334 / TIGR4)

     186        519  Methanosarcina acetivorans (strain ATCC 35395 / DSM 2834 / JCM 12185 / C2A)

     187        516  Bordetella pertussis (strain Tohama I / ATCC BAA-589 / NCTC 13251)

     188        515  Xylella fastidiosa (strain Temecula1 / ATCC 700964)

     189        512  Geobacillus kaustophilus (strain HTA426)

     190        512  Pseudomonas aeruginosa (strain PA7)

     191        512  Haemophilus ducreyi (strain 35000HP / ATCC 700724)

     192        511  Streptomyces avermitilis 

     193        511  Acinetobacter baylyi (strain ATCC 33305 / BD413 / ADP1)

     194        508  Bordetella parapertussis (strain 12822 / ATCC BAA-587 / NCTC 13253)

     195        507  Streptococcus pneumoniae (strain ATCC BAA-255 / R6)

     196        507  Buchnera aphidicola subsp. Baizongia pistaciae (strain Bp)

     197        506  Solanum lycopersicum (Tomato) (Lycopersicon esculentum)

     198        506  Pyrococcus furiosus (strain ATCC 43587 / DSM 3638 / JCM 8422 / Vc1)

     199        505  Nicotiana tabacum (Common tobacco)

     200        504  Pseudomonas entomophila (strain L48)

     201        499  Methanosarcina mazei  

     202        499  Haemophilus influenzae (strain 86-028NP)

     203        499  Brucella abortus biovar 1 (strain 9-941)

     204        498  Thermosynechococcus vestitus (strain NIES-2133 / IAM M-273 / BP-1)

     205        497  Proteus mirabilis (strain HI4320)

     206        497  Burkholderia pseudomallei (strain K96243)

     207        496  Pyrococcus horikoshii 

     208        496  Synechococcus elongatus (strain ATCC 33912 / PCC 7942 / FACHB-805) 

     209        496  Rickettsia conorii (strain ATCC VR-613 / Malish 7)

     210        496  Shouchella clausii (strain KSM-K16) (Alkalihalobacillus clausii)

     211        494  Xanthomonas campestris pv. campestris (strain 8004)

     212        493  Halobacterium salinarum (strain ATCC 700922 / JCM 11081 / NRC-1) 

     213        492  Bacillus velezensis (strain DSM 23117 / BGSC 10A6 / LMG 26770 / FZB42) 

     214        492  Brucella abortus (strain 2308)

     215        492  Saccharolobus solfataricus (strain ATCC 35092 / DSM 1617 / JCM 11322 / P2) 

     216        491  Vibrio campbellii (strain ATCC BAA-1116)

     217        487  Shewanella sp. (strain MR-7)

     218        486  Mannheimia succiniciproducens (strain MBEL55E)

     219        484  Shewanella sp. (strain MR-4)

     220        484  Pseudomonas aeruginosa (strain LESB58)

     221        484  Staphylococcus aureus (strain Mu3 / ATCC 700698)

     222        483  Lactiplantibacillus plantarum (strain ATCC BAA-793 / NCIMB 8826 / WCFS1) 

     223        483  Mycoplasma genitalium (strain ATCC 33530 / DSM 19775 / NCTC 10195 / G37) 

     224        479  Pseudomonas putida (strain ATCC 700007 / DSM 6899 / BCRC 17059 / F1)

     225        478  Pyrococcus abyssi (strain GE5 / Orsay)

     226        476  Cupriavidus necator  

     227        475  Burkholderia lata 

     228        475  Campylobacter jejuni subsp. jejuni serotype O:2 

     229        472  Rhodopseudomonas palustris (strain ATCC BAA-98 / CGA009)

     230        470  Enterococcus faecalis (strain ATCC 700802 / V583)

     231        470  Clostridium perfringens (strain 13 / Type A)

     232        470  Cereibacter sphaeroides  

     233        468  Pseudomonas putida (strain GB-1)

     234        468  Shewanella sp. (strain ANA-3)

     235        467  Shewanella frigidimarina (strain NCIMB 400)

     236        467  Aeromonas hydrophila subsp. hydrophila 

     237        466  Xanthomonas euvesicatoria pv. vesicatoria (strain 85-10) 

     238        465  Trichormus variabilis (strain ATCC 29413 / PCC 7937) (Anabaena variabilis)

     239        463  Burkholderia mallei (strain ATCC 23344)

     240        461  Cupriavidus pinatubonensis (strain JMP 134 / LMG 1197) (Cupriavidus necator 

     241        460  Methylococcus capsulatus (strain ATCC 33009 / NCIMB 11132 / Bath)

     242        460  Ovis aries (Sheep)

     243        457  Rickettsia felis (strain ATCC VR-1525 / URRWXCal2) (Rickettsia azadi)

     244        455  Xanthomonas oryzae pv. oryzae (strain MAFF 311018)

     245        455  Staphylococcus aureus (strain JH1)

     246        455  Shewanella baltica (strain OS185)

     247        453  Mycolicibacterium paratuberculosis (strain ATCC BAA-968 / K-10) 

     248        453  Pseudomonas putida (strain W619)

     249        453  Streptococcus mutans serotype c (strain ATCC 700610 / UA159)

     250        452  Aeromonas salmonicida (strain A449)





   

   2.3  Taxonomic distribution of the sequences



   



   Kingdom        sequences (% of the database)

    Archaea           19756 (  3%)

    Bacteria         336427 ( 59%)

    Eukaryota        198033 ( 35%)

    Viruses           17393 (  3%)





   Within Eukaryota:



   



    Category            sequences (% of Eukaryota) (% of the complete database)

     Human                  20436 ( 10%)           (  4%)

     Other Mammalia         47438 ( 24%)           (  8%)

     Other Vertebrata       18980 ( 10%)           (  3%)

     Viridiplantae          41744 ( 21%)           (  7%)

     Fungi                  36951 ( 19%)           (  6%)

     Insecta                 9845 (  5%)           (  2%)

     Nematoda                5390 (  3%)           (  1%)

     Other                  17249 (  9%)           (  3%)
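
The two tables above are consistent with each other and with the release total. The following Python sketch, with hypothetical names, recomputes the percentages; it assumes they are rounded to the nearest whole percent.

```python
# Sequences per kingdom, copied from the first table of section 2.3.
kingdoms = {"Archaea": 19756, "Bacteria": 336427, "Eukaryota": 198033, "Viruses": 17393}

# Breakdown within Eukaryota, copied from the second table.
eukaryota = {
    "Human": 20436, "Other Mammalia": 47438, "Other Vertebrata": 18980,
    "Viridiplantae": 41744, "Fungi": 36951, "Insecta": 9845,
    "Nematoda": 5390, "Other": 17249,
}

total = sum(kingdoms.values())


def pct(part: int, whole: int) -> int:
    """Whole-number percentage of part in whole (assumed rounding rule)."""
    return round(100 * part / whole)


kingdom_pct = {name: pct(n, total) for name, n in kingdoms.items()}

# For each category: (% of Eukaryota, % of the complete database).
euk_pct = {
    name: (pct(n, kingdoms["Eukaryota"]), pct(n, total))
    for name, n in eukaryota.items()
}

for name, (of_euk, of_db) in euk_pct.items():
    print(f"{name:<18}{eukaryota[name]:>7} ({of_euk:>3}%) ({of_db:>3}%)")
```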







3.  SEQUENCE SIZE



   Repartition of the sequences by size (excluding fragments)



               From   To  Number             From   To   Number

                  1-  50    9979             1001-1100     4135

                 51- 100   43598             1101-1200     2904

                101- 150   59894             1201-1300     2217

                151- 200   59669             1301-1400     2082

                201- 250   58561             1401-1500     1684

                251- 300   52515             1501-1600      839

                301- 350   52987             1601-1700      648

                351- 400   46023             1701-1800      591

                401- 450   37758             1801-1900      510

                451- 500   30638             1901-2000      399

                501- 550   22348             2001-2100      275

                551- 600   15861             2101-2200      387

                601- 650   13191             2201-2300      341

                651- 700    9423             2301-2400      235

                701- 750    7890             2401-2500      197

                751- 800    5708             >2500         1471

                801- 850    4897

                851- 900    5325

                901- 950    4117

                951-1000    3018



   





   The average sequence length in UniProtKB/Swiss-Prot is 361 amino acids.



   The shortest sequence is   GWA_SEPOF (P83570):     2 amino acids.

   The longest sequence is  TITIN_MOUSE (A2ASS6): 35213 amino acids.
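
The size table and the average length can be checked against the totals given in the introduction. The Python sketch below uses hypothetical names. It assumes the average is the total number of amino acids divided by the total number of entries; the quotient is about 361.9, so the reported 361 matches its integer part. How the published value is actually rounded is not stated in this document.

```python
# Counts per size class, copied from the table in section 3
# (1-50, 51-100, ..., 951-1000, then 1001-1100, ..., 2401-2500, >2500).
size_bins = [
    9979, 43598, 59894, 59669, 58561, 52515, 52987, 46023, 37758, 30638,
    22348, 15861, 13191, 9423, 7890, 5708, 4897, 5325, 4117, 3018,
    4135, 2904, 2217, 2082, 1684, 839, 648, 591, 510, 399,
    275, 387, 341, 235, 197, 1471,
]

# Totals from the introduction of this document.
TOTAL_ENTRIES = 571609
FRAGMENTS = 9294
TOTAL_AMINO_ACIDS = 206878625

# The table excludes fragments, so it should cover all remaining entries.
non_fragment_entries = sum(size_bins)

# Assumed definition of the average: amino acids per entry over all entries.
mean_length = TOTAL_AMINO_ACIDS / TOTAL_ENTRIES

print(non_fragment_entries, TOTAL_ENTRIES - FRAGMENTS)
print(f"mean length: {mean_length:.2f}")
```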





4.  JOURNAL CITATIONS



   Note: the following citation statistics reflect the number of distinct

         journal citations.



   Total number of journals cited in this release of UniProtKB/Swiss-Prot: 3158





   4.1  Table of the frequency of journal citations



        Journals cited 1x: 1000

                       2x:  426

                       3x:  223

                       4x:  149

                       5x:  123

                       6x:   90

                       7x:   64

                       8x:   77

                       9x:   47

                      10x:   43

                  11- 20x:  243

                  21- 50x:  268

                  51-100x:  145

                    >100x:  260





   4.2  List of the most cited journals in UniProtKB/Swiss-Prot



   Nb    Citations   Journal name

   --    ---------   -------------------------------------------------------------

    1        27287   Journal of Biological Chemistry

    2        12755   Proceedings of the National Academy of Sciences of the U.S.A.

    3         7219   Journal of Bacteriology

    4         6079   Biochemical and Biophysical Research Communications

    5         5885   Biochemistry

    6         5360   Nucleic Acids Research

    7         5176   Nature

    8         5108   FEBS Letters

    9         5010   The EMBO Journal

   10         4894   Gene

   11         4604   Journal of Molecular Biology

   12         4584   Molecular and Cellular Biology

   13         4050   Biochimica et Biophysica Acta

   14         3904   Cell

   15         3622   Journal of Virology

   16         3515   European Journal of Biochemistry

   17         3401   Science

   18         3186   Biochemical Journal

   19         2869   Molecular Microbiology

   20         2817   Plant Physiology

   21         2638   PLoS ONE

   22         2548   Genomics

   23         2445   The American Journal of Human Genetics

   24         2372   Journal of Cell Biology

   25         2201   The Plant Cell

   26         2039   The Plant Journal

   27         2034   Human Molecular Genetics

   28         1961   Genes and Development

   29         1925   Plant Molecular Biology

   30         1910   Virology

   31         1863   Molecular Cell

   32         1853   Nature Genetics

   33         1836   Molecular Biology of the Cell

   34         1832   Development

   35         1700   Journal of Immunology

   36         1661   Human Mutation

   37         1569   Oncogene

   38         1475   Structure

   39         1433   Molecular and General Genetics

   40         1429   Journal of Biochemistry

   41         1426   Genetics

   42         1395   Journal of Cell Science

   43         1318   Nature Communications

   44         1289   Blood

   45         1279   Infection and Immunity

   46         1190   Journal of General Virology

   47         1189   Developmental Biology

   48         1184   Microbiology

   49         1159   Archives of Biochemistry and Biophysics

   50         1156   Current Biology

   51         1034   Journal of Neuroscience

   52         1029   Applied and Environmental Microbiology

   53          996   Acta Crystallographica, Section D

   54          928   Cancer Research

   55          919   Scientific Reports

   56          918   FEMS Microbiology Letters

   57          916   PLoS Genetics

   58          891   Toxicon

   59          890   American Journal of Physiology

   60          872   Protein Science

   61          857   Journal of Clinical Investigation

   62          854   Yeast

   63          831   Neuron

   64          775   Plant and Cell Physiology

   65          764   The Journal of Experimental Medicine

   66          760   Human Genetics

   67          714   Journal of Medical Genetics

   68          700   Proteins

   69          696   The FEBS Journal

   70          693   PLoS Pathogens

   71          689   Nature Structural and Molecular Biology

   72          678   Mechanisms of Development

   73          652   Nature Structural Biology

   74          647   Nature Cell Biology

   75          632   Bioscience, Biotechnology, and Biochemistry

   76          596   Developmental Cell

   77          595   Current Genetics

   78          580   Antimicrobial Agents and Chemotherapy

   79          576   Journal of Neurochemistry

   80          554   Molecular Endocrinology

   81          551   The Journal of Clinical Endocrinology and Metabolism

   82          540   Endocrinology

   83          524   Molecular and Biochemical Parasitology

   84          517   Journal of the American Chemical Society

   85          516   Cell Reports

   86          496   Mammalian Genome

   87          492   Experimental Cell Research

   88          490   Eukaryotic Cell

   89          483   RNA

   90          477   Peptides

   91          473   Journal of Experimental Botany

   92          464   EMBO Reports

   93          461   The FASEB Journal

   94          457   Planta

   95          454   American Journal of Medical Genetics. Part A

   96          437   Molecular Pharmacology

   97          434   Immunogenetics

   98          431   Acta Crystallographica, Section F

   99          422   Molecular Biology and Evolution

  100          420   European Journal of Human Genetics

  101          416   Molecular Plant-Microbe Interactions

  102          413   Immunity

  103          407   Clinical Genetics

  104          407   Journal of Molecular Evolution

  105          405   Journal of Investigative Dermatology

  106          396   DNA and Cell Biology

  107          395   Neurology

  108          386   Biochimie

  109          381   DNA Sequence

  110          380   

  111          380   Biology of Reproduction

  112          374   Comparative Biochemistry and Physiology

  113          362   Virus Research

  114          358   Genes to Cells

  115          348   Journal of Lipid Research

  116          346   Nature Immunology

  117          344   Developmental Dynamics

  118          342   Brain Research. Molecular Brain Research

  119          341   The New England Journal of Medicine

  120          341   PLoS Biology

  121          338   Applied Microbiology and Biotechnology

  122          335   Annals of Neurology

  123          329   BMC Genomics

  124          327   Journal of Medicinal Chemistry

  125          316   European Journal of Immunology

  126          314   Genome Research

  127          308   Investigative Ophthalmology and Visual Science

  128          301   Journal of Human Genetics

  129          299   Biological Chemistry Hoppe-Seyler

  130          287   Glycobiology

  131          282   Journal of General Microbiology

  132          281   Cytogenetics and Cell Genetics

  133          279   Archives of Microbiology

  134          276   Nature Chemical Biology

  135          274   Brain

  136          262   Traffic

  137          260   Phytochemistry

  138          259   Nature Medicine

  139          259   Molecular Genetics and Metabolism

  140          258   Protein Expression and Purification

  141          256   Molecular Immunology

  142          251   Fungal Genetics and Biology

  143          249   Journal of Cellular Biochemistry

  144          245   Cell Cycle

  145          239   Circulation Research

  146          237   Cell Research

  147          234   DNA Research

  148          233   Diabetes

  149          228   New Phytologist

  150          227   Archives of Virology





5.  STATISTICS FOR SOME LINE TYPES



The following tables summarize the total number of some UniProtKB/Swiss-Prot lines,
the number of entries with at least one such line, and the average number of
such lines per entry (taken over all entries in the release).



                                      Total    Number of  Average

   Line type / subtype                number   entries    per entry  Rank

------------------------------------  -------- ---------  ---------  ----



References (RL)                      1313260                 2.30                                         

   Journal                           1141013     475625      2.00       1                                 

   Submitted to EMBL/GenBank/DDBJ     160717     144836      0.28       2                                 

   Submitted to other databases         7808       7138      0.01       3                                 

   Book citation                        1876       1853     <0.01       4                                 

   Plant Gene Register                   613        600     <0.01       5                                 

   Unpublished observations              536        532     <0.01       6                                 

   Thesis                                477        474     <0.01       7                                 

   Patent                                214        207     <0.01       8                                 

   Worm Breeder's Gazette                  6          6     <0.01       9                                 
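
The "Average per entry" and rank values of the reference table can be recomputed from the totals. This Python sketch is illustrative: the names are hypothetical, the average is assumed to be the line total divided by all 571609 entries, the "<0.01" threshold is an assumption that matches the rows shown, and the rank is assumed to follow the total number of lines.

```python
# Total number of entries in release 2024_03 (from the introduction).
TOTAL_ENTRIES = 571609

# Total number of RL lines per subtype, copied from the table above.
reference_lines = {
    "Journal": 1141013,
    "Submitted to EMBL/GenBank/DDBJ": 160717,
    "Submitted to other databases": 7808,
    "Book citation": 1876,
    "Plant Gene Register": 613,
    "Unpublished observations": 536,
    "Thesis": 477,
    "Patent": 214,
    "Worm Breeder's Gazette": 6,
}


def average_per_entry(total_lines: int, entries: int = TOTAL_ENTRIES) -> str:
    """Format lines per entry as in the table; '<0.01' cutoff is an assumption."""
    avg = total_lines / entries
    return "<0.01" if avg < 0.01 else f"{avg:.2f}"


rl_total = sum(reference_lines.values())

# Rank subtypes by total number of lines, largest first (assumed ranking rule).
ranked = sorted(reference_lines, key=reference_lines.get, reverse=True)

print(f"References (RL): {rl_total} lines, {average_per_entry(rl_total)} per entry")
for rank, name in enumerate(ranked, start=1):
    print(f"{rank}  {name:<32}{reference_lines[name]:>8}  {average_per_entry(reference_lines[name])}")
```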



Total number of distinct authors cited in UniProtKB/Swiss-Prot: 471457



                                      Total    Number of  Average

   Line type / subtype                number   entries    per entry  Rank

------------------------------------  -------- ---------  ---------  ----

Comments (CC)                        2750030                 4.81                                         

   ACTIVITY REGULATION                 18060      17938      0.03      17                                 

   ALLERGEN                              951        951     <0.01      26                                 

   ALTERNATIVE PRODUCTS                25922      25922      0.05      13                                 

   BIOPHYSICOCHEMICAL PROPERTIES       11618      11568      0.02      20                                 

   BIOTECHNOLOGY                        2059       1999     <0.01      24                                 

   CATALYTIC ACTIVITY                 342896     255402      0.60       4                                 

   CAUTION                             14428      14128      0.03      19                                 

   COFACTOR                           133541     121179      0.23       7                                 

   DEVELOPMENTAL STAGE                 14469      14364      0.03      18                                 

   DISEASE                              8410       5661      0.01      21                                 

   DISRUPTION PHENOTYPE                21218      21175      0.04      16                                 

   DOMAIN                              59416      50510      0.10       9                                 

   FUNCTION                           492691     468150      0.86       2                                 

   INDUCTION                           25884      25786      0.05      14                                 

   INTERACTION                         24252      24252      0.04      15                                 

   MASS SPECTROMETRY                    7571       5850      0.01      22                                 

   MISCELLANEOUS                       46265      40674      0.08      11                                 

   PATHWAY                            143876     129887      0.25       6                                 

   PHARMACEUTICAL                        171        164     <0.01      29                                 

   POLYMORPHISM                         1508       1380     <0.01      25                                 

   PTM                                 65304      46385      0.11       8                                 

   RNA EDITING                           637        637     <0.01      28                                 

   SEQUENCE CAUTION                    45270      45199      0.08      12                                 

   SIMILARITY                         520047     515709      0.91       1                                 

   SUBCELLULAR LOCATION               365919     357305      0.64       3                                 

   SUBUNIT                            298969     293500      0.52       5                                 

   TISSUE SPECIFICITY                  51291      50669      0.09      10                                 

   TOXIC DOSE                            862        690     <0.01      27                                 

   WEB RESOURCE                         6525       5534      0.01      23                                 



Total number of comment topics: 29
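
The "Average per entry" column is consistent with dividing each total by the 571609 entries in this release and rounding to two decimals, with values that round to 0.00 shown as "<0.01". The Python sketch below illustrates that reading. The function name average_per_entry is hypothetical, and the rounding rule is an assumption inferred from the published figures, not a documented procedure.

```python
# Number of entries in release 2024_03, as stated in the Introduction.
TOTAL_ENTRIES = 571609


def average_per_entry(total_number: int, total_entries: int = TOTAL_ENTRIES) -> str:
    """Format total/entries the way the statistics tables appear to.

    Assumption: values are rounded to two decimals, and anything that
    would print as 0.00 is shown as "<0.01" instead.
    """
    text = f"{total_number / total_entries:.2f}"
    return "<0.01" if text == "0.00" else text


if __name__ == "__main__":
    # A few totals taken from the Comments (CC) table above.
    for label, total in [("Comments (CC)", 2750030),
                         ("SIMILARITY", 520047),
                         ("DISEASE", 8410),
                         ("ALLERGEN", 951)]:
        print(f"{label:<16} {average_per_entry(total):>6}")
```

Note that the average uses the release-wide entry count as the denominator, not the "Number of entries" column, so it reads as annotations per database entry and not per annotated entry.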





                                      Total    Number of  Average

   Line type / subtype                number   entries    per entry  Rank

------------------------------------  -------- ---------  ---------  ----

Features (FT)                        5386957                 9.42                                         

   ACT_SITE                           176799     105470      0.31       9                                 

   BINDING                           1232736     218591      2.16       1                                 

   CARBOHYD                           124395      31682      0.22      14                                 

   CHAIN                              580021     563953      1.01       2                                 

   COILED                              22579      15608      0.04      25                                 

   COMPBIAS                           174992      74318      0.31      10                                 

   CONFLICT                           139291      48543      0.24      12                                 

   CROSSLNK                            25275       9069      0.04      24                                 

   DISULFID                           136616      36445      0.24      13                                 

   DNA_BIND                            12204      10927      0.02      31                                 

   DOMAIN                             217159     133103      0.38       8                                 

   HELIX                              343708      29756      0.60       5                                 

   INIT_MET                            17602      17553      0.03      26                                 

   INTRAMEM                             3088       1425      0.01      34                                 

   LIPID                               13902       8912      0.02      28                                 

   MOD_RES                            263138      74750      0.46       7                                 

   MOTIF                               47978      31235      0.08      21                                 

   MUTAGEN                             98693      20190      0.17      17                                 

   NON_CONS                             2662        833     <0.01      35                                 

   NON_STD                               358        283     <0.01      36                                 

   NON_TER                             12620       9699      0.02      30                                 

   PEPTIDE                             12634       8742      0.02      29                                 

   PROPEP                              15457      13204      0.03      27                                 

   REGION                             322342     149951      0.56       6                                 

   REPEAT                             109473      15203      0.19      15                                 

   SIGNAL                              44514      44513      0.08      22                                 

   SITE                                65401      35476      0.11      19                                 

   STRAND                             350234      28024      0.61       4                                 

   TOPO_DOM                           151793      30623      0.27      11                                 

   TRANSIT                              9569       9449      0.02      32                                 

   TRANSMEM                           382391      80097      0.67       3                                 

   TURN                                83048      24266      0.15      18                                 

   UNSURE                               5758        898      0.01      33                                 

   VAR_SEQ                             53297      22683      0.09      20                                 

   VARIANT                            104413      17532      0.18      16                                 

   ZN_FING                             30817      13157      0.05      23                                 



Total number of feature keys: 36







                                      Total    Number of  Average

   Line type / subtype                number   entries    per entry  Rank      Category

------------------------------------  -------- ---------  ---------  ----      -------------------------------------------

Cross-references (DR)               20534995                35.92                                                           

   ABCD                                 3126       3126      0.01     122      Protocols and materials databases            

   AGR                                 69304      68573      0.12      41      Organism-specific databases                  

   Allergome                            2041       1312     <0.01     130      Protein family/group databases               

   AlphaFoldDB                        547258     547258      0.96       9      3D structure databases                       

   Antibodypedia                       32313      32204      0.06      61      Protocols and materials databases            

   ArachnoServer                        1148       1138     <0.01     140      Organism-specific databases                  

   Araport                             16406      16310      0.03      92      Organism-specific databases                  

   Bgee                                61668      61664      0.11      42      Gene expression databases                    

   BindingDB                            6662       6662      0.01     108      Chemistry databases                          

   BioCyc                              48132      44085      0.08      52      Enzyme and pathway databases                 

   BioGRID                             61480      59573      0.11      43      Protein-protein interaction databases        

   BioGRID-ORCS                        45012      44427      0.08      54      Miscellaneous databases                      

   BioMuta                             20308      20282      0.04      76      Genetic variation databases                  

   BMRB                                 6910       6910      0.01     106      3D structure databases                       

   BRENDA                              20394      18582      0.04      72      Enzyme and pathway databases                 

   CarbonylDB                           1159       1159     <0.01     139      PTM databases                                

   CAZy                                 9658       8693      0.02      99      Protein family/group databases               

   CCDS                                49650      34772      0.09      50      Sequence databases                           

   CDD                                383275     301271      0.67      15      Family and domain databases                  

   CGD                                  2105       2088     <0.01     129      Organism-specific databases                  

   ChEMBL                               9023       8835      0.02     100      Chemistry databases                          

   ChiTaRS                             29775      29730      0.05      63      Miscellaneous databases                      

   CLAE                                  360        357     <0.01     154      Protein family/group databases               

   CollecTF                              137        137     <0.01     160      Gene expression databases                    

   ComplexPortal                       15622       8213      0.03      95      Protein-protein interaction databases        

   ConoServer                            967        879     <0.01     142      Organism-specific databases                  

   CORUM                                5812       5812      0.01     109      Protein-protein interaction databases        

   CPTAC                                3472       1929      0.01     117      Proteomic databases                          

   CPTC                                  389        389     <0.01     151      Protocols and materials databases            

   CTD                                 74921      74050      0.13      38      Organism-specific databases                  

   DEPOD                                 254        254     <0.01     158      PTM databases                                

   dictyBase                            2034       2001     <0.01     132      Organism-specific databases                  

   DIP                                 17558      17517      0.03      89      Protein-protein interaction databases        

   DisGeNET                            17610      17412      0.03      88      Organism-specific databases                  

   DisProt                              1787       1781     <0.01     134      Family and domain databases                  

   DMDM                                16171      16170      0.03      94      Genetic variation databases                  

   DNASU                               48409      48331      0.08      51      Protocols and materials databases            

   DOSAC-COBS-2DPAGE                     145        145     <0.01     159      2D gel databases                             

   DrugBank                            31638       4785      0.06      62      Chemistry databases                          

   DrugCentral                          2982       2982      0.01     124      Chemistry databases                          

   EchoBASE                             4158       4158      0.01     116      Organism-specific databases                  

   eggNOG                             339542     333684      0.59      16      Phylogenomic databases                       

   ELM                                  1814       1814     <0.01     133      Protein-protein interaction databases        

   EMBL                              1006581     558797      1.76       4      Sequence databases                           

   EMDB                                74435       8520      0.13      39      3D structure databases                       

   Ensembl                            113548      49216      0.20      33      Genome annotation databases                  

   EnsemblBacteria                     55473      55295      0.10      46      Genome annotation databases                  

   EnsemblFungi                        23242      22793      0.04      69      Genome annotation databases                  

   EnsemblMetazoa                      19161      11638      0.03      82      Genome annotation databases                  

   EnsemblPlants                       43787      22485      0.08      56      Genome annotation databases                  

   EnsemblProtists                      5401       5146      0.01     112      Genome annotation databases                  

   EPD                                 23263      23263      0.04      68      Proteomic databases                          

   ESTHER                               3009       3006      0.01     123      Protein family/group databases               

   euHCVdb                                55         44     <0.01     163      Organism-specific databases                  

   EvolutionaryTrace                   16792      16792      0.03      91      Miscellaneous databases                      

   ExpressionAtlas                     53142      53142      0.09      48      Gene expression databases                    

   FlyBase                              4200       4085      0.01     115      Organism-specific databases                  

   Gene3D                             740048     459391      1.29       6      Family and domain databases                  

   GeneCards                           20379      20247      0.04      73      Organism-specific databases                  

   GeneID                             293930     284156      0.51      22      Genome annotation databases                  

   GeneReviews                          1609       1605     <0.01     135      Organism-specific databases                  

   GeneTree                            56253      56243      0.10      45      Phylogenomic databases                       

   GeneWiki                            10351      10269      0.02      98      Miscellaneous databases                      

   GenomeRNAi                          22314      22314      0.04      70      Miscellaneous databases                      

   GlyConnect                           2372       2215     <0.01     125      PTM databases                                

   GlyCosmos                           28906      28906      0.05      64      PTM databases                                

   GlyGen                              22246      22246      0.04      71      PTM databases                                

   GO                                3394774     551192      5.94       1      Ontologies                                   

   Gramene                             43787      22485      0.08      55      Genome annotation databases                  

   GuidetoPHARMACOLOGY                  2228       2228     <0.01     128      Chemistry databases                          

   HAMAP                              330934     327998      0.58      18      Family and domain databases                  

   HGNC                                20379      20250      0.04      74      Organism-specific databases                  

   HOGENOM                            427443     427443      0.75      14      Phylogenomic databases                       

   HPA                                 19354      19215      0.03      81      Organism-specific databases                  

   IDEAL                                1100       1100     <0.01     141      Family and domain databases                  

   IMGT_GENE-DB                          267        267     <0.01     157      Protein family/group databases               

   InParanoid                         163971     163971      0.29      25      Phylogenomic databases                       

   IntAct                              57520      57520      0.10      44      Protein-protein interaction databases        

   InterPro                          2434262     552668      4.26       2      Family and domain databases                  

   iPTMnet                             54159      54159      0.09      47      PTM databases                                

   JaponicusDB                            43         43     <0.01     165      Organism-specific databases                  

   jPOST                               26412      26412      0.05      65      Proteomic databases                          

   KEGG                               502892     477772      0.88      12      Genome annotation databases                  

   LegioList                             765        763     <0.01     146      Organism-specific databases                  

   Leproma                               672        669     <0.01     147      Organism-specific databases                  

   MaizeGDB                              529        525     <0.01     149      Organism-specific databases                  

   MalaCards                            5664       5655      0.01     111      Organism-specific databases                  

   MANE-Select                         18505      18392      0.03      85      Genome annotation databases                  

   MassIVE                             19141      19141      0.03      83      Proteomic databases                          

   MaxQB                               33727      33727      0.06      60      Proteomic databases                          

   MEROPS                              14212      13794      0.02      96      Protein family/group databases               

   MetOSite                             3455       3455      0.01     118      PTM databases                                

   MGI                                 17122      17081      0.03      90      Organism-specific databases                  

   MIM                                 23480      16174      0.04      67      Organism-specific databases                  

   MINT                                23915      23915      0.04      66      Protein-protein interaction databases        

   MoonDB                                348        348     <0.01     156      Protein family/group databases               

   MoonProt                              368        368     <0.01     153      Protein family/group databases               

   NCBIfam                            300578     277572      0.53      21      Family and domain databases                  

   neXtProt                            20321      20321      0.04      75      Organism-specific databases                  

   NIAGADS                                69         69     <0.01     162      Organism-specific databases                  

   OGP                                   373        373     <0.01     152      2D gel databases                             

   OMA                                119853     119853      0.21      31      Phylogenomic databases                       

   OpenTargets                         18546      18400      0.03      84      Organism-specific databases                  

   Orphanet                             8178       4418      0.01     102      Organism-specific databases                  

   OrthoDB                            275597     275597      0.48      23      Phylogenomic databases                       

   PANTHER                           1007676     504252      1.76       3      Family and domain databases                  

   PathwayCommons                      19451      19451      0.03      80      Enzyme and pathway databases                 

   PATRIC                              93057      93057      0.16      36      Genome annotation databases                  

   PaxDb                              153655     153655      0.27      26      Proteomic databases                          

   PCDDB                                 134        134     <0.01     161      3D structure databases                       

   PDB                                304238      35646      0.53      19      3D structure databases                       

   PDBsum                             304238      35646      0.53      20      3D structure databases                       

   PeptideAtlas                        39609      39609      0.07      59      Proteomic databases                          

   PeroxiBase                            792        771     <0.01     145      Protein family/group databases               

   Pfam                               840434     541304      1.47       5      Family and domain databases                  

   PharmGKB                            18032      18013      0.03      87      Organism-specific databases                  

   Pharos                              20221      20221      0.04      78      Miscellaneous databases                      

   PHI-base                             2350       1843     <0.01     126      Miscellaneous databases                      

   PhosphoSitePlus                     42165      42165      0.07      58      PTM databases                                

   PhylomeDB                          115608     115608      0.20      32      Phylogenomic databases                       

   PIR                                125141     114808      0.22      30      Sequence databases                           

   PIRSF                              111008     109839      0.19      34      Family and domain databases                  

   PlantReactome                        1320        771     <0.01     137      Enzyme and pathway databases                 

   PomBase                              5129       5125      0.01     113      Organism-specific databases                  

   PRIDE                                 637        637     <0.01     148      Proteomic databases                          

   PRINTS                             150910     129596      0.26      27      Family and domain databases                  

   PRO                                 98139      98138      0.17      35      Miscellaneous databases                      

   ProMEX                                489        489     <0.01     150      Proteomic databases                          

   PROSITE                            492874     311452      0.86      13      Family and domain databases                  

   Proteomes                          507990     463174      0.89      11      Miscellaneous databases                      

   ProteomicsDB                        72725      45385      0.13      40      Proteomic databases                          

   PseudoCAP                            2036       2036     <0.01     131      Organism-specific databases                  

   Pumba                               18207      18207      0.03      86      Proteomic databases                          

   Reactome                           144384      38493      0.25      28      Enzyme and pathway databases                 

   REBASE                                798        395     <0.01     144      Protein family/group databases               

   RefSeq                             597451     452034      1.05       8      Sequence databases                           

   REPRODUCTION-2DPAGE                  1260       1039     <0.01     138      2D gel databases                             

   RGD                                  8132       8131      0.01     103      Organism-specific databases                  

   RNAct                               43109      43109      0.08      57      Miscellaneous databases                      

   SABIO-RK                             5756       5756      0.01     110      Enzyme and pathway databases                 

   SASBDB                                891        891     <0.01     143      3D structure databases                       

   SFLD                                20288       9055      0.04      77      Family and domain databases                  

   SGD                                  6746       6741      0.01     107      Organism-specific databases                  

   SignaLink                           19957      19957      0.03      79      Enzyme and pathway databases                 

   SIGNOR                               7573       7573      0.01     104      Enzyme and pathway databases                 

   SMART                              205810     148514      0.36      24      Family and domain databases                  

   SMR                                518578     518578      0.91      10      3D structure databases                       

   STRING                             335982     335982      0.59      17      Protein-protein interaction databases        

   SUPFAM                             648751     459866      1.13       7      Family and domain databases                  

   SwissLipids                          1478       1394     <0.01     136      Chemistry databases                          

   SwissPalm                           13354      13354      0.02      97      PTM databases                                

   TAIR                                16396      16310      0.03      93      Organism-specific databases                  

   TCDB                                 8576       8490      0.02     101      Protein family/group databases               

   TopDownProteomics                    3236       2957      0.01     121      Proteomic databases                          

   TreeFam                             46266      46243      0.08      53      Phylogenomic databases                       

   TubercuList                          2330       2294     <0.01     127      Organism-specific databases                  

   UCSC                                50926      46461      0.09      49      Genome annotation databases                  

   UniLectin                             360        360     <0.01     155      Protein family/group databases               

   UniPathway                         139800     126160      0.24      29      Enzyme and pathway databases                 

   VEuPathDB                           82051      75305      0.14      37      Organism-specific databases                  

   VGNC                                 3433       3430      0.01     119      Organism-specific databases                  

   WBParaSite                             51         49     <0.01     164      Genome annotation databases                  

   WormBase                             6963       5076      0.01     105      Organism-specific databases                  

   Xenbase                              4749       4749      0.01     114      Organism-specific databases                  

   ZFIN                                 3266       3265      0.01     120      Organism-specific databases                  



Total number of cross-referenced databases: 165
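
For readers who want to work with these tables programmatically, the following is a minimal Python sketch of a row parser. It assumes that columns are separated by two or more spaces and that names contain only single spaces, which holds for the rows shown here. The names StatRow and parse_row are hypothetical and not part of any UniProt tooling.

```python
import re
from typing import NamedTuple, Optional


class StatRow(NamedTuple):
    name: str
    total: int
    entries: int
    average: str             # kept as text because the table uses "<0.01"
    rank: int
    category: Optional[str]  # only present in the cross-reference table


def parse_row(line: str) -> Optional[StatRow]:
    """Parse one subtype row; return None for headers, rules and summaries."""
    fields = re.split(r"\s{2,}", line.strip())
    if len(fields) not in (5, 6):
        return None
    name, total, entries, average, rank = fields[:5]
    if not (total.isdigit() and entries.isdigit() and rank.isdigit()):
        return None
    category = fields[5] if len(fields) == 6 else None
    return StatRow(name, int(total), int(entries), average, int(rank), category)


if __name__ == "__main__":
    sample = "   PDB      304238      35646      0.53      19      3D structure databases   "
    print(parse_row(sample))
```

Summary lines such as "Cross-references (DR)" carry only a total and an average, so the sketch deliberately skips them instead of guessing the missing columns.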



6.  AMINO ACID COMPOSITION



   6.1  Composition in percent for the complete database



   Ala (A) 8.25   Gln (Q) 3.93   Leu (L) 9.65   Ser (S) 6.65

   Arg (R) 5.52   Glu (E) 6.72   Lys (K) 5.80   Thr (T) 5.36

   Asn (N) 4.06   Gly (G) 7.07   Met (M) 2.41   Trp (W) 1.10

   Asp (D) 5.46   His (H) 2.27   Phe (F) 3.86   Tyr (Y) 2.92

   Cys (C) 1.38   Ile (I) 5.91   Pro (P) 4.74   Val (V) 6.85



   Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.00



   



   (Figure: amino acid composition, colored by residue class.)

   Legend: gray = aliphatic, red = acidic, green = small hydroxy,
           blue = basic, black = aromatic, white = amide, yellow = sulfur





   6.2  Classification of the amino acids by their frequency



   Leu, Ala, Gly, Val, Glu, Ser, Ile, Lys, Arg, Asp, Thr, Pro, Asn, Gln,

   Phe, Tyr, Met, His, Cys, Trp
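
The ordering in 6.2 follows directly from the percentages in 6.1. As an illustration, this short Python sketch sorts the 6.1 values in descending order. The names COMPOSITION and rank_by_frequency are hypothetical and chosen only for this example.

```python
# Percent composition copied from section 6.1 (three-letter codes).
COMPOSITION = {
    "Ala": 8.25, "Gln": 3.93, "Leu": 9.65, "Ser": 6.65,
    "Arg": 5.52, "Glu": 6.72, "Lys": 5.80, "Thr": 5.36,
    "Asn": 4.06, "Gly": 7.07, "Met": 2.41, "Trp": 1.10,
    "Asp": 5.46, "His": 2.27, "Phe": 3.86, "Tyr": 2.92,
    "Cys": 1.38, "Ile": 5.91, "Pro": 4.74, "Val": 6.85,
}


def rank_by_frequency(composition: dict) -> list:
    """Return residue codes ordered from most to least frequent."""
    return sorted(composition, key=composition.get, reverse=True)


ranking = rank_by_frequency(COMPOSITION)

if __name__ == "__main__":
    print(", ".join(ranking))
```

No two residues share the same percentage in this release, so the ordering does not depend on how ties are broken.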





7.  MISCELLANEOUS STATISTICS



4467 entries are encoded on a mitochondrion, and 4013 are encoded on a plasmid.



12200 entries are encoded on a plastid, of which 22 are encoded on apicoplasts,
11634 on chloroplasts, 51 on organellar chromatophores, 145 on cyanelles,
149 on non-photosynthetic plastids and 199 on unspecified types of plastid.
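
The plastid breakdown can be checked against its stated total with a few lines of Python. This is only a consistency check on the figures quoted above; the names PLASTID_SUBTYPES and PLASTID_TOTAL are hypothetical.

```python
# Plastid subtype counts as quoted in the text above.
PLASTID_SUBTYPES = {
    "apicoplast": 22,
    "chloroplast": 11634,
    "organellar chromatophore": 51,
    "cyanelle": 145,
    "non-photosynthetic plastid": 149,
    "unspecified plastid": 199,
}
PLASTID_TOTAL = 12200  # stated total of plastid-encoded entries

subtype_sum = sum(PLASTID_SUBTYPES.values())

if __name__ == "__main__":
    # Report each subtype as a share of the stated plastid total.
    for name, count in PLASTID_SUBTYPES.items():
        print(f"{name:<28} {count:>6}  {100 * count / PLASTID_TOTAL:6.2f}%")
    print("subtypes sum to stated total:", subtype_sum == PLASTID_TOTAL)
```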



Number of entries with at least one sequence correction: 81257