         UniProtKB/Swiss-Prot protein knowledgebase release 2024_06 statistics





1.  INTRODUCTION



UniProtKB/Swiss-Prot release 2024_06 of 27-Nov-2024 contains 572619 sequence entries, curated from 303020 unique references and comprising 207431389 amino acids.

Since release 2024_05, 435 sequences have been added, the sequence data of 42 existing entries have been updated, and the annotations of 440420 entries have been revised.



Number of fragments: 9292

Number of additional sequences produced by alternative splicing, initiation or promoter usage, or ribosomal frameshifting: 41218





Protein existence (PE):           entries     %



1: Evidence at protein level       117598   20.5%

2: Evidence at transcript level     54363    9.5%

3: Inferred from homology          385971   67.4%

4: Predicted                        12873    2.2%

5: Uncertain                         1814    0.3%
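
The percentage column above can be reproduced from the entry counts. The following is a minimal sketch, assuming each percentage is the count divided by the sum of all five PE levels and rounded to one decimal; the names `pe_counts` and `pe_percentages` are illustrative and not part of any UniProt tool.

```python
# Counts copied from the protein existence (PE) table above.
pe_counts = {
    "1: Evidence at protein level": 117598,
    "2: Evidence at transcript level": 54363,
    "3: Inferred from homology": 385971,
    "4: Predicted": 12873,
    "5: Uncertain": 1814,
}

# The five PE levels should account for every entry in the release.
total_entries = sum(pe_counts.values())


def pe_percentages(counts):
    """Return each count as a percentage of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {level: round(100 * n / total, 1) for level, n in counts.items()}


if __name__ == "__main__":
    print(f"Total entries: {total_entries}")
    for level, pct in pe_percentages(pe_counts).items():
        print(f"{level:<34}{pct:>5.1f}%")
```

The sum of the five counts equals the 572619 entries stated in the introduction, so every entry carries exactly one PE level.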



The growth of the database is summarized below.

[Figure: growth of the UniProtKB/Swiss-Prot database]





2.  TAXONOMIC ORIGIN



   Total number of species represented in this release of UniProtKB/Swiss-Prot: 14724



   The twenty most represented species account for 123153 sequences, or 21.5% of the total number of entries.
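
The share of the twenty most represented species can be checked against the Frequency column of table 2.2. This is a small sketch, assuming the share is the sum of the first twenty frequencies divided by the 572619 entries given in the introduction; the variable names are illustrative.

```python
# Frequencies of ranks 1 to 20, copied from table 2.2.
top20 = [
    20421, 17229, 16394, 8207, 6727, 6048, 5121, 4530, 4487, 4191,
    4190, 4160, 3803, 3507, 3348, 2318, 2309, 2218, 2046, 1899,
]

# Total number of entries in release 2024_06 (from the introduction).
TOTAL_ENTRIES = 572619

top20_sequences = sum(top20)
top20_share = round(100 * top20_sequences / TOTAL_ENTRIES, 1)

if __name__ == "__main__":
    print(f"{top20_sequences} sequences, {top20_share}% of all entries")
```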





   2.1 Table of the frequency of occurrence of species



        Species represented 1x: 6006

                            2x: 2132

                            3x: 1152

                            4x:  783

                            5x:  542

                            6x:  448

                            7x:  326

                            8x:  281

                            9x:  241

                           10x:  163

                       11- 20x:  846

                       21- 50x:  515

                       51-100x:  229

                         >100x: 1060





   2.2  Table of the most represented species



  ------  ---------  --------------------------------------------

  Number  Frequency  Species

  ------  ---------  --------------------------------------------

       1      20421  Homo sapiens (Human)

       2      17229  Mus musculus (Mouse)

       3      16394  Arabidopsis thaliana (Mouse-ear cress)

       4       8207  Rattus norvegicus (Rat)

       5       6727  Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (Baker's yeast)

       6       6048  Bos taurus (Bovine)

       7       5121  Schizosaccharomyces pombe (strain 972 / ATCC 24843) (Fission yeast)

       8       4530  Escherichia coli (strain K12)

       9       4487  Caenorhabditis elegans

      10       4191  Bacillus subtilis (strain 168)

      11       4190  Oryza sativa subsp. japonica (Rice)

      12       4160  Dictyostelium discoideum (Social amoeba)

      13       3803  Drosophila melanogaster (Fruit fly)

      14       3507  Xenopus laevis (African clawed frog)

      15       3348  Danio rerio (Zebrafish) (Brachydanio rerio)

      16       2318  Mycobacterium tuberculosis (strain ATCC 25618 / H37Rv)

      17       2309  Gallus gallus (Chicken)

      18       2218  Pongo abelii (Sumatran orangutan) (Pongo pygmaeus abelii)

      19       2046  Escherichia coli O157:H7

      20       1899  Mycobacterium tuberculosis (strain CDC 1551 / Oshkosh)

      21       1828  Salmonella typhimurium (strain LT2 / SGSC1412 / ATCC 700720)

      22       1787  Methanocaldococcus jannaschii  

      23       1711  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)

      24       1704  Haemophilus influenzae (strain ATCC 51907 / DSM 11121 / KW20 / Rd)

      25       1702  Escherichia coli O6:H1 (strain CFT073 / ATCC 700928 / UPEC)

      26       1696  Shigella flexneri

      27       1477  Pseudomonas aeruginosa 

      28       1459  Sus scrofa (Pig)

      29       1349  Salmonella typhi

      30       1244  Mycobacterium bovis (strain ATCC BAA-935 / AF2122/97)

      31       1176  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)

      32       1145  Candida albicans (strain SC5314 / ATCC MYA-2876) (Yeast)

      33       1103  Synechocystis sp. (strain ATCC 27184 / PCC 6803 / Kazusa)

      34       1038  Archaeoglobus fulgidus 

      35       1030  Yersinia pestis

      36       1017  Emericella nidulans  

      37        997  Vibrio cholerae serotype O1 (strain ATCC 39315 / El Tor Inaba N16961)

      38        978  Oryctolagus cuniculus (Rabbit)

      39        969  Neurospora crassa 

      40        942  Staphylococcus aureus (strain Mu50 / ATCC 700699)

      41        935  Aspergillus fumigatus (strain ATCC MYA-4609 / CBS 101355 / FGSC A1100 / Af293) 

      42        930  Salmonella paratyphi A (strain ATCC 9150 / SARB42)

      43        929  Staphylococcus aureus (strain N315)

      44        928  Eremothecium gossypii   

      45        919  Kluyveromyces lactis   

      46        909  Acanthamoeba polyphaga mimivirus (APMV)

      47        905  Staphylococcus aureus (strain COL)

      48        896  Staphylococcus aureus (strain MW2)

      49        894  Escherichia coli O6:K15:H31 (strain 536 / UPEC)

      50        890  Staphylococcus aureus (strain MSSA476)

      51        888  Rhizobium meliloti (strain 1021) (Ensifer meliloti) (Sinorhizobium meliloti)

      52        888  Candida glabrata   

      53        888  Staphylococcus aureus (strain MRSA252)

      54        882  Salmonella choleraesuis (strain SC-B67)

      55        879  Shigella sonnei (strain Ss046)

      56        873  Oryza sativa subsp. indica (Rice)

      57        863  Yersinia pseudotuberculosis serotype I (strain IP32953)

      58        854  Canis lupus familiaris (Dog) (Canis familiaris)

      59        850  Zea mays (Maize)

      60        847  Escherichia coli O9:H4 (strain HS)

      61        838  Escherichia coli O139:H28 (strain E24377A / ETEC)

      62        829  Shigella boydii serotype 4 (strain Sb227)

      63        825  Escherichia coli (strain UTI89 / UPEC)

      64        822  Escherichia coli 

      65        822  Shigella dysenteriae serotype 1 (strain Sd197)

      66        818  Streptomyces coelicolor (strain ATCC BAA-471 / A3(2) / M145)

      67        811  Staphylococcus aureus (strain NCTC 8325 / PS 47)

      68        804  Pectobacterium atrosepticum (strain SCRI 1043 / ATCC BAA-672) 

      69        796  Vibrio parahaemolyticus serotype O3:K6 (strain RIMD 2210633)

      70        791  Escherichia coli (strain SMS-3-5 / SECEC)

      71        788  Aquifex aeolicus (strain VF5)

      72        779  Escherichia coli O127:H6 (strain E2348/69 / EPEC)

      73        771  Escherichia coli (strain K12 / DH10B)

      74        770  Pasteurella multocida (strain Pm70)

      75        767  Escherichia coli O17:K52:H18 (strain UMN026 / ExPEC)

      76        765  Escherichia coli (strain K12 / MC4100 / BW2952)

      77        762  Escherichia coli (strain 55989 / EAEC)

      78        761  Escherichia coli O8 (strain IAI1)

      79        760  Staphylococcus epidermidis (strain ATCC 12228 / FDA PCI 1200)

      80        760  Shigella flexneri serotype 5b (strain 8401)

      81        760  Staphylococcus epidermidis 

      82        759  Escherichia coli O45:K1 (strain S88 / ExPEC)

      83        758  Bacillus anthracis

      84        756  Escherichia coli (strain SE11)

      85        753  Escherichia coli O7:K1 (strain IAI39 / ExPEC)

      86        749  Photorhabdus laumondii subsp. laumondii (strain DSM 15139 / CIP 105565 / TT01) 

      87        748  Escherichia coli O157:H7 (strain EC4115 / EHEC)

      88        744  Halalkalibacterium halodurans  

      89        739  Yersinia enterocolitica serotype O:8 / biotype 1B (strain NCTC 13174 / 8081)

      90        737  Pseudomonas putida 

      91        733  Vibrio vulnificus (strain CMCP6)

      92        731  Escherichia coli O81 (strain ED1a)

      93        724  Escherichia coli

      94        722  Salmonella enteritidis PT4 (strain P125109)

      95        718  Vibrio vulnificus (strain YJ016)

      96        716  Salmonella paratyphi B (strain ATCC BAA-1250 / SPB7)

      97        715  Escherichia coli O1:K1 / APEC

      98        715  Yersinia pestis bv. Antiqua (strain Nepal516)

      99        715  Enterobacter sp. (strain 638)

     100        715  Klebsiella pneumoniae subsp. pneumoniae (strain ATCC 700721 / MGH 78578)

     101        714  Salmonella paratyphi A (strain AKU_12601)

     102        713  Salmonella newport (strain SL254)

     103        713  Yersinia pseudotuberculosis serotype O:1b (strain IP 31758)

     104        713  Salmonella agona (strain SL483)

     105        712  Salmonella schwarzengrund (strain CVM19633)

     106        711  Yersinia pestis bv. Antiqua (strain Antiqua)

     107        710  Salmonella heidelberg (strain SL476)

     108        708  Nostoc sp. (strain PCC 7120 / SAG 25.82 / UTEX 2576)

     109        702  Salmonella dublin (strain CT_02021853)

     110        699  Klebsiella pneumoniae (strain 342)

     111        698  Shigella boydii serotype 18 (strain CDC 3083-94 / BS512)

     112        695  Escherichia fergusonii 

     113        692  Pan troglodytes (Chimpanzee)

     114        686  Mycoplasma pneumoniae (strain ATCC 29342 / M129 / Subtype 1) 

     115        684  Salmonella gallinarum (strain 287/91 / NCTC 13346)

     116        683  Pseudomonas syringae pv. tomato (strain ATCC BAA-871 / DC3000)

     117        679  Staphylococcus aureus (strain USA300)

     118        679  Citrobacter koseri (strain ATCC BAA-895 / CDC 4225-83 / SGSC4696)

     119        672  Serratia proteamaculans (strain 568)

     120        670  Bacillus cereus 

     121        669  Agrobacterium fabrum (strain C58 / ATCC 33970) (Agrobacterium tumefaciens 

     122        669  Mycobacterium leprae (strain TN)

     123        667  Bradyrhizobium diazoefficiens 

     124        667  Yarrowia lipolytica (strain CLIB 122 / E 150) (Yeast) (Candida lipolytica)

     125        667  Yersinia pestis (strain Pestoides F)

     126        663  Shewanella oneidensis 

     127        658  Sinorhizobium fredii (strain NBRC 101917 / NGR234)

     128        653  Debaryomyces hansenii   

     129        643  Staphylococcus aureus (strain bovine RF122 / ET3-1)

     130        642  Salmonella arizonae (strain ATCC BAA-731 / CDC346-86 / RSK2980)

     131        642  Yersinia pseudotuberculosis serotype O:3 (strain YPIII)

     132        634  Yersinia pseudotuberculosis serotype IB (strain PB1/+)

     133        623  Methanothermobacter thermautotrophicus  

     134        622  Treponema pallidum (strain Nichols)

     135        622  Cronobacter sakazakii (strain ATCC BAA-894) (Enterobacter sakazakii)

     136        622  Listeria monocytogenes serovar 1/2a (strain ATCC BAA-679 / EGD-e)

     137        620  Pseudomonas aeruginosa (strain UCBPP-PA14)

     138        615  Xanthomonas campestris pv. campestris 

     139        614  Staphylococcus haemolyticus (strain JCSC1435)

     140        613  Mesorhizobium japonicum  (Mesorhizobium loti 

     141        612  Helicobacter pylori (strain ATCC 700392 / 26695) (Campylobacter pylori)

     142        605  Listeria innocua serovar 6a (strain ATCC BAA-680 / CLIP 11262)

     143        604  Ralstonia nicotianae (strain ATCC BAA-1114 / GMI1000) (Ralstonia solanacearum)

     144        602  Staphylococcus saprophyticus subsp. saprophyticus 

     145        602  Photobacterium profundum (strain SS9)

     146        601  Salmonella paratyphi C (strain RKS4594)

     147        600  Yersinia pestis bv. Antiqua (strain Angola)

     148        595  Bacillus cereus (strain ATCC 10987 / NRS 248)

     149        591  Pectobacterium carotovorum subsp. carotovorum (strain PC1)

     150        591  Mycolicibacterium smegmatis (strain ATCC 700084 / mc(2)155) 

     151        588  Neisseria meningitidis serogroup B (strain ATCC BAA-335 / MC58)

     152        584  Rickettsia prowazekii (strain Madrid E)

     153        582  Caenorhabditis briggsae

     154        580  Aspergillus oryzae (strain ATCC 42149 / RIB 40) (Yellow koji mold)

     155        579  Brucella suis biovar 1 (strain 1330)

     156        576  Brucella melitensis biotype 1 

     157        575  Caulobacter vibrioides (strain ATCC 19089 / CIP 103742 / CB 15) 

     158        573  Aliivibrio fischeri (strain ATCC 700601 / ES114) (Vibrio fischeri)

     159        572  Buchnera aphidicola subsp. Acyrthosiphon pisum (strain APS) 

     160        569  Bacillus thuringiensis subsp. konkukian (strain 97-27)

     161        568  Pseudomonas syringae pv. syringae (strain B728a)

     162        568  Helicobacter pylori (strain J99 / ATCC 700824) (Campylobacter pylori J99)

     163        566  Bacillus licheniformis 

     164        566  Thermotoga maritima 

     165        562  Bacillus cereus (strain ZK / E33L)

     166        562  Buchnera aphidicola subsp. Schizaphis graminum (strain Sg)

     167        559  Clostridium acetobutylicum 

     168        557  Xanthomonas axonopodis pv. citri (strain 306)

     169        555  Pseudomonas fluorescens (strain Pf0-1)

     170        554  Neisseria meningitidis serogroup A / serotype 4A (strain DSM 15465 / Z2491)

     171        554  Pseudomonas fluorescens (strain ATCC BAA-477 / NRRL B-23932 / Pf-5)

     172        553  Oceanobacillus iheyensis 

     173        547  Pseudomonas savastanoi pv. phaseolicola  (Pseudomonas syringae pv. phaseolicola 

     174        543  Corynebacterium glutamicum 

     175        541  Lactococcus lactis subsp. lactis (strain IL1403) (Streptococcus lactis)

     176        531  Erwinia tasmaniensis 

     177        530  Bordetella bronchiseptica (strain ATCC BAA-588 / NCTC 13252 / RB50) 

     178        530  Listeria monocytogenes serotype 4b (strain F2365)

     179        529  Sodalis glossinidius (strain morsitans)

     180        524  Staphylococcus aureus (strain Newman)

     181        523  Vibrio cholerae serotype O1 (strain ATCC 39541 / Classical Ogawa 395 / O395)

     182        522  Xylella fastidiosa (strain 9a5c)

     183        522  Deinococcus radiodurans 

     184        519  Methanosarcina acetivorans (strain ATCC 35395 / DSM 2834 / JCM 12185 / C2A)

     185        519  Chromobacterium violaceum 

     186        519  Streptococcus pneumoniae serotype 4 (strain ATCC BAA-334 / TIGR4)

     187        516  Bordetella pertussis (strain Tohama I / ATCC BAA-589 / NCTC 13251)

     188        515  Xylella fastidiosa (strain Temecula1 / ATCC 700964)

     189        512  Haemophilus ducreyi (strain 35000HP / ATCC 700724)

     190        512  Geobacillus kaustophilus (strain HTA426)

     191        512  Pseudomonas aeruginosa (strain PA7)

     192        511  Acinetobacter baylyi (strain ATCC 33305 / BD413 / ADP1)

     193        511  Streptomyces avermitilis 

     194        508  Bordetella parapertussis (strain 12822 / ATCC BAA-587 / NCTC 13253)

     195        507  Solanum lycopersicum (Tomato) (Lycopersicon esculentum)

     196        507  Streptococcus pneumoniae (strain ATCC BAA-255 / R6)

     197        507  Buchnera aphidicola subsp. Baizongia pistaciae (strain Bp)

     198        506  Pyrococcus furiosus (strain ATCC 43587 / DSM 3638 / JCM 8422 / Vc1)

     199        505  Nicotiana tabacum (Common tobacco)

     200        504  Pseudomonas entomophila (strain L48)

     201        501  Methanosarcina mazei  

     202        499  Brucella abortus biovar 1 (strain 9-941)

     203        499  Haemophilus influenzae (strain 86-028NP)

     204        498  Thermosynechococcus vestitus (strain NIES-2133 / IAM M-273 / BP-1)

     205        497  Burkholderia pseudomallei (strain K96243)

     206        497  Proteus mirabilis (strain HI4320)

     207        497  Pyrococcus horikoshii 

     208        496  Rickettsia conorii (strain ATCC VR-613 / Malish 7)

     209        496  Shouchella clausii (strain KSM-K16) (Alkalihalobacillus clausii)

     210        496  Synechococcus elongatus (strain ATCC 33912 / PCC 7942 / FACHB-805) 

     211        494  Halobacterium salinarum (strain ATCC 700922 / JCM 11081 / NRC-1) 

     212        494  Xanthomonas campestris pv. campestris (strain 8004)

     213        492  Bacillus velezensis (strain DSM 23117 / BGSC 10A6 / LMG 26770 / FZB42) 

     214        492  Saccharolobus solfataricus (strain ATCC 35092 / DSM 1617 / JCM 11322 / P2) 

     215        492  Brucella abortus (strain 2308)

     216        491  Vibrio campbellii (strain ATCC BAA-1116)

     217        488  Shewanella sp. (strain MR-7)

     218        486  Mannheimia succiniciproducens (strain KCTC 0769BP / MBEL55E)

     219        485  Shewanella sp. (strain MR-4)

     220        484  Staphylococcus aureus (strain Mu3 / ATCC 700698)

     221        484  Pseudomonas aeruginosa (strain LESB58)

     222        483  Mycoplasma genitalium (strain ATCC 33530 / DSM 19775 / NCTC 10195 / G37) 

     223        483  Lactiplantibacillus plantarum (strain ATCC BAA-793 / NCIMB 8826 / WCFS1) 

     224        480  Pseudomonas putida 

     225        478  Pyrococcus abyssi (strain GE5 / Orsay)

     226        477  Cupriavidus necator  

     227        475  Campylobacter jejuni subsp. jejuni serotype O:2 

     228        475  Burkholderia lata 

     229        472  Enterococcus faecalis (strain ATCC 700802 / V583)

     230        472  Rhodopseudomonas palustris (strain ATCC BAA-98 / CGA009)

     231        470  Clostridium perfringens (strain 13 / Type A)

     232        470  Cereibacter sphaeroides  

     233        468  Shewanella sp. (strain ANA-3)

     234        468  Pseudomonas putida (strain GB-1)

     235        467  Shewanella frigidimarina (strain NCIMB 400)

     236        467  Aeromonas hydrophila subsp. hydrophila 

     237        466  Xanthomonas euvesicatoria pv. vesicatoria (strain 85-10) 

     238        465  Trichormus variabilis (strain ATCC 29413 / PCC 7937) (Anabaena variabilis)

     239        463  Burkholderia mallei (strain ATCC 23344)

     240        461  Ovis aries (Sheep)

     241        461  Cupriavidus pinatubonensis (strain JMP 134 / LMG 1197) (Cupriavidus necator 

     242        460  Methylococcus capsulatus (strain ATCC 33009 / NCIMB 11132 / Bath)

     243        457  Rickettsia felis (strain ATCC VR-1525 / URRWXCal2) (Rickettsia azadi)

     244        455  Staphylococcus aureus (strain JH1)

     245        455  Xanthomonas oryzae pv. oryzae (strain MAFF 311018)

     246        455  Shewanella baltica (strain OS185)

     247        453  Mycolicibacterium paratuberculosis (strain ATCC BAA-968 / K-10) 

     248        453  Streptococcus mutans serotype c (strain ATCC 700610 / UA159)

     249        453  Pseudomonas putida (strain W619)

     250        452  Caldanaerobacter subterraneus subsp. tengcongensis  





   

   2.3  Taxonomic distribution of the sequences



   



   Kingdom        sequences (% of the database)

    Archaea           19778 (  3%)

    Bacteria         336651 ( 59%)

    Eukaryota        198746 ( 35%)

    Viruses           17444 (  3%)





   Within Eukaryota:



   



    Category            sequences (% of Eukaryota) (% of the complete database)

     Human                  20422 ( 10%)           (  4%)

     Other Mammalia         47473 ( 24%)           (  8%)

     Other Vertebrata       19004 ( 10%)           (  3%)

     Viridiplantae          41866 ( 21%)           (  7%)

     Fungi                  37303 ( 19%)           (  7%)

     Insecta                 9959 (  5%)           (  2%)

     Nematoda                5405 (  3%)           (  1%)

     Other                  17314 (  9%)           (  3%)







3.  SEQUENCE SIZE



   Distribution of the sequences by size (excluding fragments)



               From   To  Number             From   To   Number

                  1-  50    9983             1001-1100     4145

                 51- 100   43651             1101-1200     2915

                101- 150   59938             1201-1300     2227

                151- 200   59708             1301-1400     2090

                201- 250   58639             1401-1500     1693

                251- 300   52615             1501-1600      841

                301- 350   53063             1601-1700      649

                351- 400   46103             1701-1800      600

                401- 450   37829             1801-1900      529

                451- 500   30725             1901-2000      402

                501- 550   22445             2001-2100      277

                551- 600   15909             2101-2200      390

                601- 650   13219             2201-2300      342

                651- 700    9444             2301-2400      239

                701- 750    7915             2401-2500      197

                751- 800    5729             >2500         1484

                801- 850    4903

                851- 900    5333

                901- 950    4135

                951-1000    3021



   





   The average sequence length in UniProtKB/Swiss-Prot is 362 amino acids.
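
The average length is consistent with the totals given in the introduction. The sketch below assumes the average is simply the total number of amino acids divided by the total number of entries (fragments included); the function name `average_length` is illustrative.

```python
# Totals for release 2024_06, copied from the introduction.
TOTAL_AMINO_ACIDS = 207431389
TOTAL_ENTRIES = 572619


def average_length(total_residues: int, n_entries: int) -> float:
    """Mean sequence length in amino acids."""
    if n_entries <= 0:
        raise ValueError("n_entries must be positive")
    return total_residues / n_entries


mean_length = average_length(TOTAL_AMINO_ACIDS, TOTAL_ENTRIES)

if __name__ == "__main__":
    # Rounded to the nearest whole amino acid, as in the text above.
    print(round(mean_length))
```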



   The shortest sequence is   GWA_SEPOF (P83570):     2 amino acids.

   The longest sequence is  TITIN_MOUSE (A2ASS6): 35213 amino acids.





4.  JOURNAL CITATIONS



   Note: the following citation statistics reflect the number of distinct

         journal citations.



   Total number of journals cited in this release of UniProtKB/Swiss-Prot: 3189





   4.1 Table of the frequency of journal citations



        Journals cited 1x: 1009

                       2x:  428

                       3x:  222

                       4x:  151

                       5x:  126

                       6x:   90

                       7x:   65

                       8x:   79

                       9x:   48

                      10x:   45

                  11- 20x:  245

                  21- 50x:  274

                  51-100x:  144

                    >100x:  263





   4.2  List of the most cited journals in UniProtKB/Swiss-Prot



   Nb    Citations   Journal name

   --    ---------   -------------------------------------------------------------

    1        27476   Journal of Biological Chemistry

    2        12904   Proceedings of the National Academy of Sciences of the U.S.A.

    3         7271   Journal of Bacteriology

    4         6118   Biochemical and Biophysical Research Communications

    5         5939   Biochemistry

    6         5385   Nucleic Acids Research

    7         5255   Nature

    8         5123   FEBS Letters

    9         5038   The EMBO Journal

   10         4898   Gene

   11         4633   Journal of Molecular Biology

   12         4604   Molecular and Cellular Biology

   13         4070   Biochimica et Biophysica Acta

   14         3932   Cell

   15         3660   Journal of Virology

   16         3521   European Journal of Biochemistry

   17         3433   Science

   18         3204   Biochemical Journal

   19         2897   Molecular Microbiology

   20         2828   Plant Physiology

   21         2697   PLoS ONE

   22         2547   Genomics

   23         2458   The American Journal of Human Genetics

   24         2394   Journal of Cell Biology

   25         2206   The Plant Cell

   26         2048   Human Molecular Genetics

   27         2042   The Plant Journal

   28         1966   Genes and Development

   29         1928   Plant Molecular Biology

   30         1926   Virology

   31         1898   Molecular Cell

   32         1863   Nature Genetics

   33         1848   Molecular Biology of the Cell

   34         1834   Development

   35         1724   Journal of Immunology

   36         1669   Human Mutation

   37         1574   Oncogene

   38         1497   Structure

   39         1437   Molecular and General Genetics

   40         1436   Journal of Biochemistry

   41         1431   Genetics

   42         1430   Nature Communications

   43         1412   Journal of Cell Science

   44         1309   Blood

   45         1284   Infection and Immunity

   46         1199   Microbiology

   47         1196   Developmental Biology

   48         1194   Journal of General Virology

   49         1165   Current Biology

   50         1163   Archives of Biochemistry and Biophysics

   51         1051   Applied and Environmental Microbiology

   52         1051   Journal of Neuroscience

   53         1000   Acta Crystallographica, Section D

   54          988   Scientific Reports

   55          936   FEMS Microbiology Letters

   56          936   PLoS Genetics

   57          932   Cancer Research

   58          899   American Journal of Physiology

   59          891   Toxicon

   60          877   Protein Science

   61          868   Journal of Clinical Investigation

   62          855   Yeast

   63          845   Neuron

   64          776   Plant and Cell Physiology

   65          776   The Journal of Experimental Medicine

   66          766   Human Genetics

   67          726   Journal of Medical Genetics

   68          720   PLoS Pathogens

   69          719   Nature Structural and Molecular Biology

   70          708   The FEBS Journal

   71          703   Proteins

   72          679   Mechanisms of Development

   73          664   Nature Cell Biology

   74          654   Nature Structural Biology

   75          638   Bioscience, Biotechnology, and Biochemistry

   76          615   Antimicrobial Agents and Chemotherapy

   77          608   Developmental Cell

   78          600   Current Genetics

   79          577   Journal of Neurochemistry

   80          557   Molecular Endocrinology

   81          553   The Journal of Clinical Endocrinology and Metabolism

   82          546   Cell Reports

   83          543   Endocrinology

   84          534   Journal of the American Chemical Society

   85          528   Molecular and Biochemical Parasitology

   86          497   Eukaryotic Cell

   87          496   Experimental Cell Research

   88          496   Mammalian Genome

   89          486   RNA

   90          478   EMBO Reports

   91          477   Peptides

   92          473   Journal of Experimental Botany

   93          471   The FASEB Journal

   94          467   American Journal of Medical Genetics. Part A

   95          460   Planta

   96          440   Molecular Pharmacology

   97          436   Acta Crystallographica, Section F

   98          434   Immunogenetics

   99          432   European Journal of Human Genetics

  100          428   

  101          425   Clinical Genetics

  102          423   Molecular Biology and Evolution

  103          421   Immunity

  104          420   Molecular Plant-Microbe Interactions

  105          418   Journal of Investigative Dermatology

  106          407   Journal of Molecular Evolution

  107          398   Neurology

  108          397   DNA and Cell Biology

  109          390   Biochimie

  110          382   Biology of Reproduction

  111          381   DNA Sequence

  112          375   Comparative Biochemistry and Physiology

  113          362   Virus Research

  114          360   Genes to Cells

  115          356   Nature Immunology

  116          349   Journal of Lipid Research

  117          348   Applied Microbiology and Biotechnology

  118          347   PLoS Biology

  119          346   Developmental Dynamics

  120          342   Brain Research. Molecular Brain Research

  121          342   The New England Journal of Medicine

  122          338   Annals of Neurology

  123          338   Journal of Medicinal Chemistry

  124          335   BMC Genomics

  125          323   European Journal of Immunology

  126          319   Genome Research

  127          308   Investigative Ophthalmology and Visual Science

  128          305   Journal of Human Genetics

  129          299   Biological Chemistry Hoppe-Seyler

  130          290   Glycobiology

  131          289   Nature Chemical Biology

  132          286   Brain

  133          283   Archives of Microbiology

  134          282   Journal of General Microbiology

  135          281   Cytogenetics and Cell Genetics

  136          265   Traffic

  137          263   Fungal Genetics and Biology

  138          262   Phytochemistry

  139          261   Nature Medicine

  140          260   Molecular Genetics and Metabolism

  141          259   Protein Expression and Purification

  142          258   Molecular Immunology

  143          250   Journal of Cellular Biochemistry

  144          247   Cell Research

  145          246   Cell Cycle

  146          243   Circulation Research

  147          237   Diabetes

  148          234   DNA Research

  149          230   Chemistry and Biology

  150          230   Insect Biochemistry and Molecular Biology





5.  STATISTICS FOR SOME LINE TYPES



The following tables summarize, for selected UniProtKB/Swiss-Prot line types, the total number of lines, the number of entries with at least one such line, and the average number of such lines per entry.



                                      Total    Number of  Average

   Line type / subtype                number   entries    per entry  Rank

------------------------------------  -------- ---------  ---------  ----



References (RL)                      1321287                 2.31                                         

   Journal                           1149403     477298      2.01       1                                 

   Submitted to EMBL/GenBank/DDBJ     160311     144404      0.28       2                                 

   Submitted to other databases         7850       7171      0.01       3                                 

   Book citation                        1876       1853     <0.01       4                                 

   Plant Gene Register                   613        600     <0.01       5                                 

   Unpublished observations              536        532     <0.01       6                                 

   Thesis                                478        475     <0.01       7                                 

   Patent                                214        207     <0.01       8                                 

   Worm Breeder's Gazette                  6          6     <0.01       9                                 
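
The "Average per entry" column appears to divide the total number of lines by all 572619 entries in the release, not by the number of entries that contain such a line (for Journal, 1149403 / 477298 would give about 2.41, not 2.01). The sketch below works under that assumption; `per_entry` and `reference_lines` are illustrative names.

```python
# Total number of entries in release 2024_06 (from the introduction).
TOTAL_ENTRIES = 572619

# "Total number" column of the References (RL) table above.
reference_lines = {
    "Journal": 1149403,
    "Submitted to EMBL/GenBank/DDBJ": 160311,
    "Submitted to other databases": 7850,
    "Book citation": 1876,
    "Plant Gene Register": 613,
    "Unpublished observations": 536,
    "Thesis": 478,
    "Patent": 214,
    "Worm Breeder's Gazette": 6,
}


def per_entry(total_lines: int, n_entries: int = TOTAL_ENTRIES) -> str:
    """Format lines per entry with two decimals, using '<0.01' for tiny values."""
    avg = round(total_lines / n_entries, 2)
    return "<0.01" if avg < 0.01 else f"{avg:.2f}"


if __name__ == "__main__":
    for subtype, total in reference_lines.items():
        print(f"{subtype:<34}{total:>9}  {per_entry(total):>6}")
    print(f"{'References (RL)':<34}{sum(reference_lines.values()):>9}  "
          f"{per_entry(sum(reference_lines.values())):>6}")
```

The subtype totals add up to the 1321287 reference lines reported in the first row of the table.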



Total number of distinct authors cited in UniProtKB/Swiss-Prot: 477612



                                      Total    Number of  Average

   Line type / subtype                number   entries    per entry  Rank

------------------------------------  -------- ---------  ---------  ----

Comments (CC)                        2765031                 4.83                                         

   ACTIVITY REGULATION                 18564      18437      0.03      17                                 

   ALLERGEN                              954        954     <0.01      26                                 

   ALTERNATIVE PRODUCTS                25952      25952      0.05      14                                 

   BIOPHYSICOCHEMICAL PROPERTIES       11901      11849      0.02      20                                 

   BIOTECHNOLOGY                        2168       2107     <0.01      24                                 

   CATALYTIC ACTIVITY                 346628     256829      0.61       4                                 

   CAUTION                             14504      14199      0.03      19                                 

   COFACTOR                           134048     121660      0.23       7                                 

   DEVELOPMENTAL STAGE                 14580      14468      0.03      18                                 

   DISEASE                              8527       5728      0.01      21                                 

   DISRUPTION PHENOTYPE                21919      21861      0.04      16                                 

   DOMAIN                              60438      51186      0.11       9                                 

   FUNCTION                           494930     469889      0.86       2                                 

   INDUCTION                           26244      26138      0.05      13                                 

   INTERACTION                         24903      24903      0.04      15                                 

   MASS SPECTROMETRY                    7592       5869      0.01      22                                 

   MISCELLANEOUS                       46467      40866      0.08      11                                 

   PATHWAY                            144337     130332      0.25       6                                 

   PHARMACEUTICAL                        171        164     <0.01      29                                 

   POLYMORPHISM                         1510       1382     <0.01      25                                 

   PTM                                 66029      46750      0.12       8                                 

   RNA EDITING                           638        638     <0.01      28                                 

   SEQUENCE CAUTION                    45311      45240      0.08      12                                 

   SIMILARITY                         521064     516697      0.91       1                                 

   SUBCELLULAR LOCATION               367156     358398      0.64       3                                 

   SUBUNIT                            300327     294704      0.52       5                                 

   TISSUE SPECIFICITY                  51670      50985      0.09      10                                 

   TOXIC DOSE                            881        704     <0.01      27                                 

   WEB RESOURCE                         5618       5033      0.01      23                                 



Total number of comment topics: 29
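
The "Average per entry" figures in these tables are consistent with dividing
each total by the 572619 entries in the release, with anything that would
round to 0.00 printed as "<0.01". That rule is inferred from the numbers and
is not stated in the release notes, so the Python sketch below is only an
illustration; the function name average_per_entry is hypothetical.

```python
# Entry count of release 2024_06, taken from the introduction.
ENTRIES = 572619


def average_per_entry(total: int, entries: int = ENTRIES) -> str:
    """Format total/entries the way the tables appear to (assumed rule)."""
    text = f"{total / entries:.2f}"
    # Assumption: a value that would print as 0.00 is shown as "<0.01".
    return "<0.01" if text == "0.00" else text


print(average_per_entry(2765031))  # Comments (CC) total
print(average_per_entry(494930))   # FUNCTION
print(average_per_entry(954))      # ALLERGEN
```

For the three totals above this gives 4.83, 0.86 and <0.01, matching the
corresponding rows of the comments table.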





                                      Total    Number of  Average

   Line type / subtype                number   entries    per entry  Rank

------------------------------------  -------- ---------  ---------  ----

Features (FT)                        5446911                 9.51                                         

   ACT_SITE                           177453     105799      0.31       9                                 

   BINDING                           1250414     219596      2.18       1                                 

   CARBOHYD                           125195      31868      0.22      14                                 

   CHAIN                              581116     564940      1.01       2                                 

   COILED                              22637      15660      0.04      25                                 

   COMPBIAS                           175737      74628      0.31      10                                 

   CONFLICT                           139575      48630      0.24      12                                 

   CROSSLNK                            25429       9120      0.04      24                                 

   DISULFID                           137354      36612      0.24      13                                 

   DNA_BIND                            12218      10940      0.02      31                                 

   DOMAIN                             218181     133514      0.38       8                                 

   HELIX                              355513      30537      0.62       5                                 

   INIT_MET                            17598      17549      0.03      26                                 

   INTRAMEM                             3146       1464      0.01      34                                 

   LIPID                               13916       8913      0.02      28                                 

   MOD_RES                            263902      74894      0.46       7                                 

   MOTIF                               48412      31519      0.08      21                                 

   MUTAGEN                            102049      20648      0.18      17                                 

   NON_CONS                             2662        833     <0.01      35                                 

   NON_STD                               360        285     <0.01      36                                 

   NON_TER                             12613       9696      0.02      30                                 

   PEPTIDE                             12657       8765      0.02      29                                 

   PROPEP                              15489      13235      0.03      27                                 

   REGION                             324335     150583      0.57       6                                 

   REPEAT                             109718      15238      0.19      15                                 

   SIGNAL                              44737      44736      0.08      22                                 

   SITE                                65893      35649      0.12      19                                 

   STRAND                             361018      28761      0.63       4                                 

   TOPO_DOM                           152862      30789      0.27      11                                 

   TRANSIT                              9620       9497      0.02      32                                 

   TRANSMEM                           383801      80381      0.67       3                                 

   TURN                                86009      24938      0.15      18                                 

   UNSURE                               5763        900      0.01      33                                 

   VAR_SEQ                             53352      22713      0.09      20                                 

   VARIANT                            105326      17576      0.18      16                                 

   ZN_FING                             30851      13182      0.05      23                                 



Total number of feature keys: 36
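
The Rank column appears to order the keys by descending total number. The
Python sketch below checks this for the six most frequent feature keys, whose
totals are copied from the table above. Because these six are the largest
totals, their ranks within the subset coincide with the published ranks. How
ties are broken is not stated in the release notes and is not modelled here.

```python
# Total counts for six feature keys, copied from the table above.
totals = {
    "BINDING": 1250414,
    "CHAIN": 581116,
    "TRANSMEM": 383801,
    "STRAND": 361018,
    "HELIX": 355513,
    "REGION": 324335,
}

# Sort keys by total, largest first, and number them from 1.
ordered = sorted(totals, key=totals.get, reverse=True)
ranks = {key: position for position, key in enumerate(ordered, start=1)}

for key in ordered:
    print(f"{key:<10}{totals[key]:>9}{ranks[key]:>4}")
```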







                                      Total    Number of  Average

   Line type / subtype                number   entries    per entry  Rank      Category

------------------------------------  -------- ---------  ---------  ----      -------------------------------------------

Cross-references (DR)               21183251                36.99                                                           

   ABCD                                 3130       3130      0.01     122      Protocols and materials databases            

   AGR                                 69200      68513      0.12      42      Organism-specific databases                  

   Allergome                            2044       1314     <0.01     131      Protein family/group databases               

   AlphaFoldDB                        548003     548003      0.96      10      3D structure databases                       

   Antibodypedia                       32305      32196      0.06      61      Protocols and materials databases            

   AntiFam                                20         20     <0.01     163      Family and domain databases                  

   ArachnoServer                        1148       1138     <0.01     139      Organism-specific databases                  

   Araport                             16414      16318      0.03      92      Organism-specific databases                  

   Bgee                                61779      61775      0.11      44      Gene expression databases                    

   BindingDB                            6662       6662      0.01     107      Chemistry databases                          

   BioCyc                              48194      44143      0.08      53      Enzyme and pathway databases                 

   BioGRID                             61804      59841      0.11      43      Protein-protein interaction databases        

   BioGRID-ORCS                        45050      44464      0.08      55      Miscellaneous databases                      

   BioMuta                             20295      20268      0.04      78      Genetic variation databases                  

   BMRB                                 6912       6912      0.01     105      3D structure databases                       

   BRENDA                              20438      18626      0.04      73      Enzyme and pathway databases                 

   CarbonylDB                           1159       1159     <0.01     138      PTM databases                                

   CAZy                                 9703       8735      0.02      98      Protein family/group databases               

   CCDS                                49696      34799      0.09      51      Sequence databases                           

   CDD                                383784     301706      0.67      16      Family and domain databases                  

   CGD                                  2106       2089     <0.01     129      Organism-specific databases                  

   ChEMBL                               9144       8954      0.02      99      Chemistry databases                          

   ChiTaRS                             29789      29744      0.05      63      Miscellaneous databases                      

   CollecTF                              137        137     <0.01     157      Gene expression databases                    

   ComplexPortal                       17040       8686      0.03      91      Protein-protein interaction databases        

   ConoServer                            967        879     <0.01     141      Organism-specific databases                  

   CORUM                                5812       5812      0.01     108      Protein-protein interaction databases        

   CPTAC                                3472       1929      0.01     117      Proteomic databases                          

   CPTC                                  392        392     <0.01     150      Protocols and materials databases            

   CTD                                 76195      75326      0.13      40      Organism-specific databases                  

   DEPOD                                 254        254     <0.01     156      PTM databases                                

   dictyBase                            4225       4111      0.01     114      Organism-specific databases                  

   DIP                                 17565      17524      0.03      89      Protein-protein interaction databases        

   DisGeNET                            17608      17410      0.03      88      Organism-specific databases                  

   DisProt                              1772       1766     <0.01     133      Family and domain databases                  

   DMDM                                16168      16167      0.03      94      Genetic variation databases                  

   DNASU                               48450      48371      0.08      52      Protocols and materials databases            

   DrugBank                            31669       4787      0.06      62      Chemistry databases                          

   DrugCentral                          2982       2982      0.01     124      Chemistry databases                          

   EchoBASE                             4158       4158      0.01     115      Organism-specific databases                  

   eggNOG                             339868     333998      0.59      17      Phylogenomic databases                       

   ELM                                  1814       1814     <0.01     132      Protein-protein interaction databases        

   EMBL                              1008020     559745      1.76       4      Sequence databases                           

   EMDB                               102185       9780      0.18      36      3D structure databases                       

   Ensembl                            116642      51241      0.20      33      Genome annotation databases                  

   EnsemblBacteria                     55500      55322      0.10      48      Genome annotation databases                  

   EnsemblFungi                        23323      22874      0.04      69      Genome annotation databases                  

   EnsemblMetazoa                      21112      11819      0.04      72      Genome annotation databases                  

   EnsemblPlants                       43805      22489      0.08      56      Genome annotation databases                  

   EnsemblProtists                      5417       5161      0.01     111      Genome annotation databases                  

   ESTHER                               3014       3011      0.01     123      Protein family/group databases               

   euHCVdb                                55         44     <0.01     161      Organism-specific databases                  

   EvolutionaryTrace                   22656      22656      0.04      70      Miscellaneous databases                      

   ExpressionAtlas                     51114      51114      0.09      49      Gene expression databases                    

   FlyBase                              3933       3824      0.01     116      Organism-specific databases                  

   FunFam                             557539     326990      0.97       9      Family and domain databases                  

   Gene3D                             741215     460080      1.29       6      Family and domain databases                  

   GeneCards                           20374      20244      0.04      75      Organism-specific databases                  

   GeneID                             294083     284397      0.51      23      Genome annotation databases                  

   GeneReviews                          1609       1605     <0.01     134      Organism-specific databases                  

   GeneTree                            56345      56333      0.10      47      Phylogenomic databases                       

   GeneWiki                            10351      10269      0.02      97      Miscellaneous databases                      

   GenomeRNAi                          22332      22331      0.04      71      Miscellaneous databases                      

   GlyConnect                           2372       2215     <0.01     126      PTM databases                                

   GlyCosmos                           28907      28907      0.05      64      PTM databases                                

   GlyGen                              25595      25595      0.04      66      PTM databases                                

   GO                                3280483     551635      5.73       1      Ontologies                                   

   Gramene                             43805      22489      0.08      57      Genome annotation databases                  

   GuidetoPHARMACOLOGY                  2270       2270     <0.01     128      Chemistry databases                          

   HAMAP                              330971     328034      0.58      19      Family and domain databases                  

   HGNC                                20374      20246      0.04      74      Organism-specific databases                  

   HOGENOM                            427856     427856      0.75      15      Phylogenomic databases                       

   HPA                                 19354      19215      0.03      82      Organism-specific databases                  

   IDEAL                                1101       1101     <0.01     140      Family and domain databases                  

   IMGT_GENE-DB                          267        267     <0.01     155      Protein family/group databases               

   InParanoid                         164186     164186      0.29      26      Phylogenomic databases                       

   IntAct                              58226      58226      0.10      45      Protein-protein interaction databases        

   InterPro                          2560496     554318      4.47       2      Family and domain databases                  

   iPTMnet                             56773      56773      0.10      46      PTM databases                                

   JaponicusDB                            43         43     <0.01     162      Organism-specific databases                  

   jPOST                               26412      26412      0.05      65      Proteomic databases                          

   KEGG                               514523     476053      0.90      12      Genome annotation databases                  

   LegioList                             765        763     <0.01     145      Organism-specific databases                  

   Leproma                               672        669     <0.01     146      Organism-specific databases                  

   MaizeGDB                              529        525     <0.01     148      Organism-specific databases                  

   MalaCards                            5693       5684      0.01     110      Organism-specific databases                  

   MANE-Select                         18530      18417      0.03      85      Genome annotation databases                  

   MassIVE                             19138      19138      0.03      83      Proteomic databases                          

   MEROPS                              14242      13823      0.02      95      Protein family/group databases               

   MetOSite                             3456       3456      0.01     118      PTM databases                                

   MGI                                 17142      17101      0.03      90      Organism-specific databases                  

   MIM                                 23715      16305      0.04      68      Organism-specific databases                  

   MINT                                24101      24101      0.04      67      Protein-protein interaction databases        

   MoonDB                                348        348     <0.01     154      Protein family/group databases               

   MoonProt                              368        368     <0.01     152      Protein family/group databases               

   NCBIfam                            302221     278846      0.53      22      Family and domain databases                  

   neXtProt                            20307      20306      0.04      77      Organism-specific databases                  

   NIAGADS                                69         69     <0.01     159      Organism-specific databases                  

   OGP                                   373        373     <0.01     151      2D gel databases                             

   OMA                                120138     120138      0.21      32      Phylogenomic databases                       

   OpenTargets                         18559      18414      0.03      84      Organism-specific databases                  

   Orphanet                             7996       4386      0.01     102      Organism-specific databases                  

   OrthoDB                            276097     276097      0.48      24      Phylogenomic databases                       

   PANTHER                           1009161     504997      1.76       3      Family and domain databases                  

   PathwayCommons                      19440      19440      0.03      81      Enzyme and pathway databases                 

   PATRIC                              93140      93140      0.16      38      Genome annotation databases                  

   PaxDb                              153859     153859      0.27      27      Proteomic databases                          

   PCDDB                                 133        133     <0.01     158      3D structure databases                       

   PDB                                322825      36568      0.56      20      3D structure databases                       

   PDBsum                             322825      36568      0.56      21      3D structure databases                       

   PeptideAtlas                        39648      39648      0.07      60      Proteomic databases                          

   PeroxiBase                            793        772     <0.01     144      Protein family/group databases               

   Pfam                               854227     543616      1.49       5      Family and domain databases                  

   PharmGKB                            18032      18013      0.03      87      Organism-specific databases                  

   Pharos                              20206      20206      0.04      79      Miscellaneous databases                      

   PHI-base                             2416       1902     <0.01     125      Miscellaneous databases                      

   PhosphoSitePlus                     42170      42170      0.07      59      PTM databases                                

   PhylomeDB                          115685     115685      0.20      34      Phylogenomic databases                       

   PIR                                125204     114868      0.22      31      Sequence databases                           

   PIRSF                              111072     109903      0.19      35      Family and domain databases                  

   PlantReactome                        1320        771     <0.01     136      Enzyme and pathway databases                 

   PomBase                              5129       5125      0.01     112      Organism-specific databases                  

   PRIDE                                 637        637     <0.01     147      Proteomic databases                          

   PRINTS                             151119     129746      0.26      28      Family and domain databases                  

   PRO                                 98647      98646      0.17      37      Miscellaneous databases                      

   ProMEX                                489        489     <0.01     149      Proteomic databases                          

   PROSITE                            493776     311943      0.86      14      Family and domain databases                  

   Proteomes                          508561     463540      0.89      13      Miscellaneous databases                      

   ProteomicsDB                        72759      45401      0.13      41      Proteomic databases                          

   PseudoCAP                            2052       2052     <0.01     130      Organism-specific databases                  

   Pumba                               18206      18206      0.03      86      Proteomic databases                          

   Reactome                           145979      38881      0.25      29      Enzyme and pathway databases                 

   REBASE                                797        396     <0.01     143      Protein family/group databases               

   RefSeq                             598394     452707      1.05       8      Sequence databases                           

   REPRODUCTION-2DPAGE                  1260       1039     <0.01     137      2D gel databases                             

   RGD                                  8140       8139      0.01     101      Organism-specific databases                  

   RNAct                               43123      43123      0.08      58      Miscellaneous databases                      

   SABIO-RK                             5761       5761      0.01     109      Enzyme and pathway databases                 

   SASBDB                                933        933     <0.01     142      3D structure databases                       

   SFLD                                20366       9099      0.04      76      Family and domain databases                  

   SGD                                  6746       6741      0.01     106      Organism-specific databases                  

   SignaLink                           19953      19953      0.03      80      Enzyme and pathway databases                 

   SIGNOR                               7650       7650      0.01     103      Enzyme and pathway databases                 

   SMART                              206144     148747      0.36      25      Family and domain databases                  

   SMR                                521254     521254      0.91      11      3D structure databases                       

   STRING                             336375     336375      0.59      18      Protein-protein interaction databases        

   SUPFAM                             649776     460558      1.13       7      Family and domain databases                  

   SwissLipids                          1478       1394     <0.01     135      Chemistry databases                          

   SwissPalm                           13365      13365      0.02      96      PTM databases                                

   TAIR                                16404      16318      0.03      93      Organism-specific databases                  

   TCDB                                 8634       8547      0.02     100      Protein family/group databases               

   TopDownProteomics                    3236       2957      0.01     121      Proteomic databases                          

   TreeFam                             46317      46294      0.08      54      Phylogenomic databases                       

   TubercuList                          2339       2303     <0.01     127      Organism-specific databases                  

   UCSC                                50991      46518      0.09      50      Genome annotation databases                  

   UniLectin                             366        366     <0.01     153      Protein family/group databases               

   UniPathway                         140132     126478      0.24      30      Enzyme and pathway databases                 

   VEuPathDB                           86870      79663      0.15      39      Organism-specific databases                  

   VGNC                                 3448       3445      0.01     119      Organism-specific databases                  

   WBParaSite                             59         56     <0.01     160      Genome annotation databases                  

   WormBase                             6973       5096      0.01     104      Organism-specific databases                  

   Xenbase                              4751       4751      0.01     113      Organism-specific databases                  

   ZFIN                                 3282       3281      0.01     120      Organism-specific databases                  



Total number of cross-referenced databases: 163
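
Rows of the cross-reference table can be read mechanically, for example to
count databases per category. The Python sketch below is one possible
approach, not an official parser: the regular expression and the names ROW
and parse_row are assumptions, and the pattern relies on database names in
this table containing no spaces. The sample rows are copied from the table
above.

```python
import re
from collections import Counter

# Fields: name, total, entries, average ("<0.01" allowed), rank, category.
ROW = re.compile(
    r"^\s*(\S+)\s+(\d+)\s+(\d+)\s+(<?\d+\.\d+)\s+(\d+)\s+(\S.*?)\s*$"
)


def parse_row(line):
    """Return a dict for a database row, or None for any other line."""
    match = ROW.match(line)
    if match is None:
        return None
    name, total, entries, average, rank, category = match.groups()
    return {
        "name": name,
        "total": int(total),
        "entries": int(entries),
        "average": average,  # kept as text because of "<0.01"
        "rank": int(rank),
        "category": category,
    }


SAMPLE = """\
   GO                                3280483     551635      5.73       1      Ontologies
   InterPro                          2560496     554318      4.47       2      Family and domain databases
   PANTHER                           1009161     504997      1.76       3      Family and domain databases
   AntiFam                                20         20     <0.01     163      Family and domain databases
"""

rows = [row for row in map(parse_row, SAMPLE.splitlines()) if row]
by_category = Counter(row["category"] for row in rows)
print(by_category)
```

Header, separator and blank lines do not match the pattern and are skipped,
so the whole table can be passed through parse_row line by line.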



6.  AMINO ACID COMPOSITION



   6.1  Composition in percent for the complete database



   Ala (A) 8.25   Gln (Q) 3.93   Leu (L) 9.64   Ser (S) 6.65

   Arg (R) 5.52   Glu (E) 6.71   Lys (K) 5.80   Thr (T) 5.36

   Asn (N) 4.06   Gly (G) 7.07   Met (M) 2.41   Trp (W) 1.10

   Asp (D) 5.46   His (H) 2.27   Phe (F) 3.86   Tyr (Y) 2.92

   Cys (C) 1.38   Ile (I) 5.90   Pro (P) 4.74   Val (V) 6.85



   Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.00



   



   [Figure: amino acid composition chart, color-coded by residue class]

   Legend: gray = aliphatic, red = acidic, green = small hydroxy,

           blue = basic, black = aromatic, white = amide, yellow = sulfur





   6.2  Classification of the amino acids by their frequency



   Leu, Ala, Gly, Val, Glu, Ser, Ile, Lys, Arg, Asp, Thr, Pro, Asn, Gln,

   Phe, Tyr, Met, His, Cys, Trp
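
The ordering in 6.2 follows from sorting the percentages in 6.1 from highest
to lowest. A minimal Python sketch of that step, using the values from 6.1
(the variable names are illustrative only):

```python
# Percent composition for the 20 standard amino acids, copied from 6.1.
composition = {
    "Ala": 8.25, "Gln": 3.93, "Leu": 9.64, "Ser": 6.65,
    "Arg": 5.52, "Glu": 6.71, "Lys": 5.80, "Thr": 5.36,
    "Asn": 4.06, "Gly": 7.07, "Met": 2.41, "Trp": 1.10,
    "Asp": 5.46, "His": 2.27, "Phe": 3.86, "Tyr": 2.92,
    "Cys": 1.38, "Ile": 5.90, "Pro": 4.74, "Val": 6.85,
}

# Most frequent first; no two residues share a value, so no tie-breaking.
by_frequency = sorted(composition, key=composition.get, reverse=True)
print(", ".join(by_frequency))
```

The printed order matches the list in 6.2, from Leu down to Trp.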





7.  MISCELLANEOUS STATISTICS



4467 entries are encoded on a mitochondrion, and 4039 are encoded on a plasmid.



12200 entries are encoded on a plastid, of which 22 are encoded on apicoplasts,

11634 on chloroplasts, 51 on organellar chromatophores, 145 on cyanelles,

149 on non-photosynthetic plastids and 199 on unspecified types of plastid.
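
The plastid subtype counts add up to the plastid total, which suggests that
each plastid-encoded entry is counted under exactly one subtype. A short
Python check of the arithmetic, with illustrative variable names:

```python
# Plastid-encoded entries by subtype, copied from the sentence above.
plastid_total = 12200
by_type = {
    "apicoplast": 22,
    "chloroplast": 11634,
    "organellar chromatophore": 51,
    "cyanelle": 145,
    "non-photosynthetic plastid": 149,
    "unspecified": 199,
}

subtype_sum = sum(by_type.values())
print(subtype_sum, subtype_sum == plastid_total)
```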



Number of entries with at least one sequence correction: 81359