


         UniProtKB/Swiss-Prot protein knowledgebase release 2024_01 statistics





1.  INTRODUCTION



UniProtKB/Swiss-Prot release 2024_01 of 24-Jan-2024 contains 570830 sequence

entries, curated from 296829 unique references and comprising 206533160 amino acids.



Since release 2023_05, 415 sequences have been added, the sequence data of

169 existing entries have been updated, and the annotations of

433476 entries have been revised.



Number of fragments: 9300

Number of additional sequences produced by alternative splicing, initiation or promoter usage, or ribosomal frameshifting: 41105





Protein existence (PE):           entries     %



1: Evidence at protein level       113649   19.9%

2: Evidence at transcript level     55837    9.8%

3: Inferred from homology          386501   67.7%

4: Predicted                        13019    2.3%

5: Uncertain                         1824    0.3%
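The percentage column above can be reproduced from the entry counts. The following is a minimal Python sketch, assuming each percentage is the count divided by the sum of all five counts and rounded to one decimal place; the variable names are illustrative and not part of UniProtKB.

```python
# Entry counts per protein existence (PE) level, copied from the table above.
pe_counts = {
    "1: Evidence at protein level": 113649,
    "2: Evidence at transcript level": 55837,
    "3: Inferred from homology": 386501,
    "4: Predicted": 13019,
    "5: Uncertain": 1824,
}

# The five levels are expected to add up to the release total.
total_entries = sum(pe_counts.values())

# Assumption: percentage = count / total, rounded to one decimal place.
pe_percent = {
    level: round(100 * count / total_entries, 1)
    for level, count in pe_counts.items()
}

for level, count in pe_counts.items():
    print(f"{level:<34}{count:>7}{pe_percent[level]:>7.1f}%")
```

Under that assumption the five counts add up to the 570830 entries stated in the introduction.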



The growth of the database is summarized below.



   





2.  TAXONOMIC ORIGIN



   Total number of species represented in this release of UniProtKB/Swiss-Prot: 14570



   The first twenty species represent 122964 sequences, or 21.5% of the total

   number of entries.
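   The figure for the first twenty species can be checked against table 2.2. A short Python sketch, assuming the share is computed against the 570830 entries given in the introduction and rounded to one decimal place (variable names are illustrative):

```python
# Entry counts of the twenty most represented species (ranks 1-20 in table 2.2).
top_twenty = [
    20433, 17201, 16381, 8188, 6727, 6041, 5121, 4530, 4465, 4191,
    4186, 4159, 3753, 3495, 3314, 2308, 2308, 2218, 2046, 1899,
]
total_entries = 570830  # sequence entries in release 2024_01

top_twenty_sum = sum(top_twenty)

# Assumption: the share is rounded to one decimal place.
top_twenty_share = round(100 * top_twenty_sum / total_entries, 1)

print(top_twenty_sum, top_twenty_share)
```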





   2.1 Table of the frequency of occurrence of species



        Species represented 1x: 5941

                            2x: 2115

                            3x: 1135

                            4x:  773

                            5x:  534

                            6x:  439

                            7x:  330

                            8x:  277

                            9x:  239

                           10x:  158

                       11- 20x:  839

                       21- 50x:  505

                       51-100x:  228

                         >100x: 1057





   2.2  Table of the most represented species



  ------  ---------  --------------------------------------------

  Number  Frequency  Species

  ------  ---------  --------------------------------------------

       1      20433  Homo sapiens (Human)

       2      17201  Mus musculus (Mouse)

       3      16381  Arabidopsis thaliana (Mouse-ear cress)

       4       8188  Rattus norvegicus (Rat)

       5       6727  Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (Baker's yeast)

       6       6041  Bos taurus (Bovine)

       7       5121  Schizosaccharomyces pombe (strain 972 / ATCC 24843) (Fission yeast)

       8       4530  Escherichia coli (strain K12)

       9       4465  Caenorhabditis elegans

      10       4191  Bacillus subtilis (strain 168)

      11       4186  Oryza sativa subsp. japonica (Rice)

      12       4159  Dictyostelium discoideum (Social amoeba)

      13       3753  Drosophila melanogaster (Fruit fly)

      14       3495  Xenopus laevis (African clawed frog)

      15       3314  Danio rerio (Zebrafish) (Brachydanio rerio)

      16       2308  Gallus gallus (Chicken)

      17       2308  Mycobacterium tuberculosis (strain ATCC 25618 / H37Rv)

      18       2218  Pongo abelii (Sumatran orangutan) (Pongo pygmaeus abelii)

      19       2046  Escherichia coli O157:H7

      20       1899  Mycobacterium tuberculosis (strain CDC 1551 / Oshkosh)

      21       1821  Salmonella typhimurium (strain LT2 / SGSC1412 / ATCC 700720)

      22       1787  Methanocaldococcus jannaschii  

      23       1710  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)

      24       1704  Haemophilus influenzae (strain ATCC 51907 / DSM 11121 / KW20 / Rd)

      25       1702  Escherichia coli O6:H1 (strain CFT073 / ATCC 700928 / UPEC)

      26       1696  Shigella flexneri

      27       1460  Pseudomonas aeruginosa 

      28       1458  Sus scrofa (Pig)

      29       1348  Salmonella typhi

      30       1244  Mycobacterium bovis (strain ATCC BAA-935 / AF2122/97)

      31       1176  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)

      32       1143  Candida albicans (strain SC5314 / ATCC MYA-2876) (Yeast)

      33       1096  Synechocystis sp. (strain PCC 6803 / Kazusa)

      34       1036  Archaeoglobus fulgidus 

      35       1030  Yersinia pestis

      36       1005  Emericella nidulans  

      37        997  Vibrio cholerae serotype O1 (strain ATCC 39315 / El Tor Inaba N16961)

      38        960  Neurospora crassa 

      39        941  Staphylococcus aureus (strain Mu50 / ATCC 700699)

      40        930  Salmonella paratyphi A (strain ATCC 9150 / SARB42)

      41        929  Staphylococcus aureus (strain N315)

      42        928  Eremothecium gossypii   

      43        919  Kluyveromyces lactis   

      44        909  Acanthamoeba polyphaga mimivirus (APMV)

      45        905  Staphylococcus aureus (strain COL)

      46        902  Oryctolagus cuniculus (Rabbit)

      47        902  Aspergillus fumigatus (strain ATCC MYA-4609 / CBS 101355 / FGSC A1100 / Af293) 

      48        896  Staphylococcus aureus (strain MW2)

      49        894  Escherichia coli O6:K15:H31 (strain 536 / UPEC)

      50        890  Staphylococcus aureus (strain MSSA476)

      51        888  Staphylococcus aureus (strain MRSA252)

      52        888  Candida glabrata   

      53        887  Rhizobium meliloti (strain 1021) (Ensifer meliloti) (Sinorhizobium meliloti)

      54        882  Salmonella choleraesuis (strain SC-B67)

      55        879  Shigella sonnei (strain Ss046)

      56        867  Oryza sativa subsp. indica (Rice)

      57        863  Yersinia pseudotuberculosis serotype I (strain IP32953)

      58        849  Zea mays (Maize)

      59        847  Escherichia coli O9:H4 (strain HS)

      60        847  Canis lupus familiaris (Dog) (Canis familiaris)

      61        838  Escherichia coli O139:H28 (strain E24377A / ETEC)

      62        829  Shigella boydii serotype 4 (strain Sb227)

      63        825  Escherichia coli (strain UTI89 / UPEC)

      64        822  Shigella dysenteriae serotype 1 (strain Sd197)

      65        822  Escherichia coli 

      66        816  Streptomyces coelicolor (strain ATCC BAA-471 / A3(2) / M145)

      67        810  Staphylococcus aureus (strain NCTC 8325 / PS 47)

      68        803  Pectobacterium atrosepticum (strain SCRI 1043 / ATCC BAA-672) 

      69        796  Vibrio parahaemolyticus serotype O3:K6 (strain RIMD 2210633)

      70        791  Escherichia coli (strain SMS-3-5 / SECEC)

      71        788  Aquifex aeolicus (strain VF5)

      72        779  Escherichia coli O127:H6 (strain E2348/69 / EPEC)

      73        771  Escherichia coli (strain K12 / DH10B)

      74        770  Pasteurella multocida (strain Pm70)

      75        766  Escherichia coli O17:K52:H18 (strain UMN026 / ExPEC)

      76        765  Escherichia coli (strain K12 / MC4100 / BW2952)

      77        762  Escherichia coli (strain 55989 / EAEC)

      78        761  Escherichia coli O8 (strain IAI1)

      79        760  Staphylococcus epidermidis (strain ATCC 35984 / RP62A)

      80        760  Staphylococcus epidermidis (strain ATCC 12228 / FDA PCI 1200)

      81        760  Shigella flexneri serotype 5b (strain 8401)

      82        759  Escherichia coli O45:K1 (strain S88 / ExPEC)

      83        758  Bacillus anthracis

      84        756  Escherichia coli (strain SE11)

      85        753  Escherichia coli O7:K1 (strain IAI39 / ExPEC)

      86        749  Photorhabdus laumondii subsp. laumondii (strain DSM 15139 / CIP 105565 / TT01) 

      87        748  Escherichia coli O157:H7 (strain EC4115 / EHEC)

      88        744  Halalkalibacterium halodurans  

      89        739  Yersinia enterocolitica serotype O:8 / biotype 1B (strain NCTC 13174 / 8081)

      90        734  Pseudomonas putida 

      91        733  Vibrio vulnificus (strain CMCP6)

      92        731  Escherichia coli O81 (strain ED1a)

      93        722  Salmonella enteritidis PT4 (strain P125109)

      94        720  Escherichia coli

      95        718  Vibrio vulnificus (strain YJ016)

      96        716  Salmonella paratyphi B (strain ATCC BAA-1250 / SPB7)

      97        715  Yersinia pestis bv. Antiqua (strain Nepal516)

      98        715  Klebsiella pneumoniae subsp. pneumoniae (strain ATCC 700721 / MGH 78578)

      99        715  Escherichia coli O1:K1 / APEC

     100        715  Enterobacter sp. (strain 638)

     101        714  Salmonella paratyphi A (strain AKU_12601)

     102        713  Salmonella newport (strain SL254)

     103        713  Yersinia pseudotuberculosis serotype O:1b (strain IP 31758)

     104        713  Salmonella agona (strain SL483)

     105        712  Salmonella schwarzengrund (strain CVM19633)

     106        711  Yersinia pestis bv. Antiqua (strain Antiqua)

     107        710  Salmonella heidelberg (strain SL476)

     108        707  Nostoc sp. (strain PCC 7120 / SAG 25.82 / UTEX 2576)

     109        702  Salmonella dublin (strain CT_02021853)

     110        699  Klebsiella pneumoniae (strain 342)

     111        698  Shigella boydii serotype 18 (strain CDC 3083-94 / BS512)

     112        695  Escherichia fergusonii 

     113        692  Pan troglodytes (Chimpanzee)

     114        686  Mycoplasma pneumoniae (strain ATCC 29342 / M129 / Subtype 1) 

     115        684  Salmonella gallinarum (strain 287/91 / NCTC 13346)

     116        683  Pseudomonas syringae pv. tomato (strain ATCC BAA-871 / DC3000)

     117        679  Citrobacter koseri (strain ATCC BAA-895 / CDC 4225-83 / SGSC4696)

     118        679  Staphylococcus aureus (strain USA300)

     119        672  Serratia proteamaculans (strain 568)

     120        669  Bacillus cereus 

     121        669  Mycobacterium leprae (strain TN)

     122        667  Yersinia pestis (strain Pestoides F)

     123        666  Yarrowia lipolytica (strain CLIB 122 / E 150) (Yeast) (Candida lipolytica)

     124        666  Bradyrhizobium diazoefficiens 

     125        663  Agrobacterium fabrum (strain C58 / ATCC 33970) (Agrobacterium tumefaciens 

     126        659  Shewanella oneidensis (strain MR-1)

     127        658  Sinorhizobium fredii (strain NBRC 101917 / NGR234)

     128        653  Debaryomyces hansenii   

     129        643  Staphylococcus aureus (strain bovine RF122 / ET3-1)

     130        642  Yersinia pseudotuberculosis serotype O:3 (strain YPIII)

     131        642  Salmonella arizonae (strain ATCC BAA-731 / CDC346-86 / RSK2980)

     132        634  Yersinia pseudotuberculosis serotype IB (strain PB1/+)

     133        622  Listeria monocytogenes serovar 1/2a (strain ATCC BAA-679 / EGD-e)

     134        622  Methanothermobacter thermautotrophicus  

     135        622  Cronobacter sakazakii (strain ATCC BAA-894) (Enterobacter sakazakii)

     136        622  Treponema pallidum (strain Nichols)

     137        620  Pseudomonas aeruginosa (strain UCBPP-PA14)

     138        615  Xanthomonas campestris pv. campestris 

     139        614  Staphylococcus haemolyticus (strain JCSC1435)

     140        613  Mesorhizobium japonicum  (Mesorhizobium loti 

     141        612  Helicobacter pylori (strain ATCC 700392 / 26695) (Campylobacter pylori)

     142        605  Listeria innocua serovar 6a (strain ATCC BAA-680 / CLIP 11262)

     143        603  Ralstonia nicotianae (strain GMI1000) (Ralstonia solanacearum)

     144        602  Staphylococcus saprophyticus subsp. saprophyticus 

     145        602  Photobacterium profundum (strain SS9)

     146        601  Salmonella paratyphi C (strain RKS4594)

     147        600  Yersinia pestis bv. Antiqua (strain Angola)

     148        595  Bacillus cereus (strain ATCC 10987 / NRS 248)

     149        591  Pectobacterium carotovorum subsp. carotovorum (strain PC1)

     150        588  Neisseria meningitidis serogroup B (strain MC58)

     151        587  Mycolicibacterium smegmatis (strain ATCC 700084 / mc(2)155) 

     152        584  Rickettsia prowazekii (strain Madrid E)

     153        582  Caenorhabditis briggsae

     154        579  Brucella suis biovar 1 (strain 1330)

     155        576  Brucella melitensis biotype 1 (strain 16M / ATCC 23456 / NCTC 10094)

     156        575  Caulobacter vibrioides (strain ATCC 19089 / CB15) (Caulobacter crescentus)

     157        573  Aliivibrio fischeri (strain ATCC 700601 / ES114) (Vibrio fischeri)

     158        572  Buchnera aphidicola subsp. Acyrthosiphon pisum (strain APS) 

     159        571  Aspergillus oryzae (strain ATCC 42149 / RIB 40) (Yellow koji mold)

     160        569  Bacillus thuringiensis subsp. konkukian (strain 97-27)

     161        568  Pseudomonas syringae pv. syringae (strain B728a)

     162        568  Helicobacter pylori (strain J99 / ATCC 700824) (Campylobacter pylori J99)

     163        565  Bacillus licheniformis 

     164        564  Thermotoga maritima 

     165        562  Bacillus cereus (strain ZK / E33L)

     166        562  Buchnera aphidicola subsp. Schizaphis graminum (strain Sg)

     167        559  Clostridium acetobutylicum 

     168        557  Xanthomonas axonopodis pv. citri (strain 306)

     169        555  Pseudomonas fluorescens (strain Pf0-1)

     170        554  Neisseria meningitidis serogroup A / serotype 4A (strain DSM 15465 / Z2491)

     171        554  Pseudomonas fluorescens (strain ATCC BAA-477 / NRRL B-23932 / Pf-5)

     172        553  Oceanobacillus iheyensis 

     173        547  Pseudomonas savastanoi pv. phaseolicola  (Pseudomonas syringae pv. phaseolicola 

     174        543  Corynebacterium glutamicum 

     175        540  Lactococcus lactis subsp. lactis (strain IL1403) (Streptococcus lactis)

     176        531  Erwinia tasmaniensis 

     177        529  Listeria monocytogenes serotype 4b (strain F2365)

     178        529  Bordetella bronchiseptica (strain ATCC BAA-588 / NCTC 13252 / RB50) 

     179        529  Sodalis glossinidius (strain morsitans)

     180        524  Staphylococcus aureus (strain Newman)

     181        523  Vibrio cholerae serotype O1 (strain ATCC 39541 / Classical Ogawa 395 / O395)

     182        522  Xylella fastidiosa (strain 9a5c)

     183        521  Deinococcus radiodurans 

     184        519  Chromobacterium violaceum 

     185        519  Methanosarcina acetivorans (strain ATCC 35395 / DSM 2834 / JCM 12185 / C2A)

     186        518  Streptococcus pneumoniae serotype 4 (strain ATCC BAA-334 / TIGR4)

     187        516  Bordetella pertussis (strain Tohama I / ATCC BAA-589 / NCTC 13251)

     188        515  Xylella fastidiosa (strain Temecula1 / ATCC 700964)

     189        512  Geobacillus kaustophilus (strain HTA426)

     190        512  Pseudomonas aeruginosa (strain PA7)

     191        512  Haemophilus ducreyi (strain 35000HP / ATCC 700724)

     192        511  Streptomyces avermitilis 

     193        511  Acinetobacter baylyi (strain ATCC 33305 / BD413 / ADP1)

     194        508  Bordetella parapertussis (strain 12822 / ATCC BAA-587 / NCTC 13253)

     195        507  Buchnera aphidicola subsp. Baizongia pistaciae (strain Bp)

     196        506  Pyrococcus furiosus (strain ATCC 43587 / DSM 3638 / JCM 8422 / Vc1)

     197        506  Solanum lycopersicum (Tomato) (Lycopersicon esculentum)

     198        505  Nicotiana tabacum (Common tobacco)

     199        504  Pseudomonas entomophila (strain L48)

     200        502  Streptococcus pneumoniae (strain ATCC BAA-255 / R6)

     201        499  Haemophilus influenzae (strain 86-028NP)

     202        499  Brucella abortus biovar 1 (strain 9-941)

     203        498  Thermosynechococcus vestitus (strain NIES-2133 / IAM M-273 / BP-1)

     204        498  Methanosarcina mazei  

     205        497  Burkholderia pseudomallei (strain K96243)

     206        496  Shouchella clausii (strain KSM-K16) (Alkalihalobacillus clausii)

     207        496  Proteus mirabilis (strain HI4320)

     208        496  Rickettsia conorii (strain ATCC VR-613 / Malish 7)

     209        495  Pyrococcus horikoshii 

     210        494  Synechococcus elongatus (strain ATCC 33912 / PCC 7942 / FACHB-805) 

     211        494  Xanthomonas campestris pv. campestris (strain 8004)

     212        492  Halobacterium salinarum (strain ATCC 700922 / JCM 11081 / NRC-1) 

     213        492  Bacillus velezensis (strain DSM 23117 / BGSC 10A6 / LMG 26770 / FZB42) 

     214        492  Brucella abortus (strain 2308)

     215        491  Vibrio campbellii (strain ATCC BAA-1116)

     216        490  Saccharolobus solfataricus (strain ATCC 35092 / DSM 1617 / JCM 11322 / P2) 

     217        487  Shewanella sp. (strain MR-7)

     218        486  Mannheimia succiniciproducens (strain MBEL55E)

     219        484  Staphylococcus aureus (strain Mu3 / ATCC 700698)

     220        484  Shewanella sp. (strain MR-4)

     221        484  Pseudomonas aeruginosa (strain LESB58)

     222        483  Mycoplasma genitalium (strain ATCC 33530 / DSM 19775 / NCTC 10195 / G37) 

     223        483  Lactiplantibacillus plantarum (strain ATCC BAA-793 / NCIMB 8826 / WCFS1) 

     224        479  Pseudomonas putida (strain ATCC 700007 / DSM 6899 / BCRC 17059 / F1)

     225        477  Pyrococcus abyssi (strain GE5 / Orsay)

     226        476  Cupriavidus necator  

     227        475  Campylobacter jejuni subsp. jejuni serotype O:2 

     228        475  Burkholderia lata 

     229        472  Rhodopseudomonas palustris (strain ATCC BAA-98 / CGA009)

     230        470  Cereibacter sphaeroides  

     231        470  Enterococcus faecalis (strain ATCC 700802 / V583)

     232        470  Clostridium perfringens (strain 13 / Type A)

     233        468  Shewanella sp. (strain ANA-3)

     234        468  Pseudomonas putida (strain GB-1)

     235        467  Aeromonas hydrophila subsp. hydrophila 

     236        467  Shewanella frigidimarina (strain NCIMB 400)

     237        466  Xanthomonas campestris pv. vesicatoria (strain 85-10)

     238        465  Trichormus variabilis (strain ATCC 29413 / PCC 7937) (Anabaena variabilis)

     239        463  Burkholderia mallei (strain ATCC 23344)

     240        461  Cupriavidus pinatubonensis (strain JMP 134 / LMG 1197) (Cupriavidus necator 

     241        460  Methylococcus capsulatus (strain ATCC 33009 / NCIMB 11132 / Bath)

     242        460  Ovis aries (Sheep)

     243        457  Rickettsia felis (strain ATCC VR-1525 / URRWXCal2) (Rickettsia azadi)

     244        455  Staphylococcus aureus (strain JH1)

     245        455  Xanthomonas oryzae pv. oryzae (strain MAFF 311018)

     246        455  Shewanella baltica (strain OS185)

     247        453  Pseudomonas putida (strain W619)

     248        453  Streptococcus mutans serotype c (strain ATCC 700610 / UA159)

     249        453  Mycolicibacterium paratuberculosis (strain ATCC BAA-968 / K-10) 

     250        452  Caldanaerobacter subterraneus subsp. tengcongensis  





   

   2.3  Taxonomic distribution of the sequences



   



   Kingdom        sequences (% of the database)

    Archaea           19726 (  3%)

    Bacteria         336252 ( 59%)

    Eukaryota        197465 ( 35%)

    Viruses           17387 (  3%)





   Within Eukaryota:



   



    Category            sequences (% of Eukaryota) (% of the complete database)

     Human                  20434 ( 10%)           (  4%)

     Other Mammalia         47334 ( 24%)           (  8%)

     Other Vertebrata       18946 ( 10%)           (  3%)

     Viridiplantae          41698 ( 21%)           (  7%)

     Fungi                  36745 ( 19%)           (  6%)

     Insecta                 9776 (  5%)           (  2%)

     Nematoda                5383 (  3%)           (  1%)

     Other                  17149 (  9%)           (  3%)
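   The two percentage columns use different denominators: the Eukaryota total and the whole database. A minimal Python sketch of that calculation, assuming both columns are rounded to the nearest whole percent (the names below are illustrative):

```python
total_entries = 570830  # all entries in the release

# Sequence counts per category within Eukaryota, copied from the table above.
eukaryota = {
    "Human": 20434,
    "Other Mammalia": 47334,
    "Other Vertebrata": 18946,
    "Viridiplantae": 41698,
    "Fungi": 36745,
    "Insecta": 9776,
    "Nematoda": 5383,
    "Other": 17149,
}
eukaryota_total = sum(eukaryota.values())

# Assumption: both columns are rounded to the nearest whole percent.
shares = {
    name: (round(100 * n / eukaryota_total), round(100 * n / total_entries))
    for name, n in eukaryota.items()
}

for name, (of_eukaryota, of_database) in shares.items():
    print(f"{name:<18}{eukaryota[name]:>6} ({of_eukaryota:>3}%) ({of_database:>3}%)")
```

   The category counts add up to the Eukaryota row of the kingdom table.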







3.  SEQUENCE SIZE



   Distribution of the sequences by size (excluding fragments)



               From   To  Number             From   To   Number

                  1-  50    9977             1001-1100     4128

                 51- 100   43563             1101-1200     2900

                101- 150   59823             1201-1300     2209

                151- 200   59600             1301-1400     2069

                201- 250   58477             1401-1500     1679

                251- 300   52445             1501-1600      835

                301- 350   52890             1601-1700      645

                351- 400   45925             1701-1800      593

                401- 450   37721             1801-1900      507

                451- 500   30601             1901-2000      397

                501- 550   22318             2001-2100      273

                551- 600   15849             2101-2200      387

                601- 650   13164             2201-2300      340

                651- 700    9411             2301-2400      235

                701- 750    7881             2401-2500      197

                751- 800    5699             >2500         1467

                801- 850    4890

                851- 900    5311

                901- 950    4109

                951-1000    3015
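   Because fragments are excluded, the bins should add up to the number of entries minus the number of fragments reported in the introduction. A Python sketch of that consistency check (the dictionary layout is an illustrative choice, not a UniProtKB format):

```python
# Number of non-fragment sequences per length bin, copied from the table above.
size_bins = {
    "1-50": 9977, "51-100": 43563, "101-150": 59823, "151-200": 59600,
    "201-250": 58477, "251-300": 52445, "301-350": 52890, "351-400": 45925,
    "401-450": 37721, "451-500": 30601, "501-550": 22318, "551-600": 15849,
    "601-650": 13164, "651-700": 9411, "701-750": 7881, "751-800": 5699,
    "801-850": 4890, "851-900": 5311, "901-950": 4109, "951-1000": 3015,
    "1001-1100": 4128, "1101-1200": 2900, "1201-1300": 2209, "1301-1400": 2069,
    "1401-1500": 1679, "1501-1600": 835, "1601-1700": 645, "1701-1800": 593,
    "1801-1900": 507, "1901-2000": 397, "2001-2100": 273, "2101-2200": 387,
    "2201-2300": 340, "2301-2400": 235, "2401-2500": 197, ">2500": 1467,
}

total_entries = 570830  # entries in release 2024_01
fragments = 9300        # "Number of fragments" from the introduction

binned = sum(size_bins.values())
non_fragments = total_entries - fragments

print(binned, non_fragments)
```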



   





   The average sequence length in UniProtKB/Swiss-Prot is 361 amino acids.



   The shortest sequence is   GWA_SEPOF (P83570):     2 amino acids.

   The longest sequence is  TITIN_MOUSE (A2ASS6): 35213 amino acids.
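   The average length can be related to the totals in the introduction. The sketch below assumes the average is the total number of amino acids divided by the total number of entries, fragments included; the release notes do not state the exact definition, so this is an inference.

```python
total_amino_acids = 206533160  # from the introduction
total_entries = 570830         # from the introduction

# Assumption: average = all amino acids / all entries (fragments included).
mean_length = total_amino_acids / total_entries

# Integer division drops the fractional part instead of rounding.
truncated_mean = total_amino_acids // total_entries

print(f"{mean_length:.2f}", truncated_mean)
```

   Under this assumption the quotient lies between 361 and 362, so it agrees with the stated value of 361 only if the figure is truncated rather than rounded.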





4.  JOURNAL CITATIONS



   Note: the following citation statistics reflect the number of distinct

         journal citations.



   Total number of journals cited in this release of UniProtKB/Swiss-Prot: 3138





   4.1 Table of the frequency of journal citations



        Journals cited 1x:  996

                       2x:  430

                       3x:  223

                       4x:  142

                       5x:  124

                       6x:   85

                       7x:   69

                       8x:   77

                       9x:   49

                      10x:   33

                  11- 20x:  249

                  21- 50x:  261

                  51-100x:  140

                    >100x:  260





   4.2  List of the most cited journals in UniProtKB/Swiss-Prot



   Nb    Citations   Journal name

   --    ---------   -------------------------------------------------------------

    1        27122   Journal of Biological Chemistry

    2        12663   Proceedings of the National Academy of Sciences of the U.S.A.

    3         7188   Journal of Bacteriology

    4         6051   Biochemical and Biophysical Research Communications

    5         5841   Biochemistry

    6         5325   Nucleic Acids Research

    7         5110   Nature

    8         5084   FEBS Letters

    9         4963   The EMBO Journal

   10         4889   Gene

   11         4589   Journal of Molecular Biology

   12         4567   Molecular and Cellular Biology

   13         4021   Biochimica et Biophysica Acta

   14         3865   Cell

   15         3597   Journal of Virology

   16         3508   European Journal of Biochemistry

   17         3376   Science

   18         3157   Biochemical Journal

   19         2853   Molecular Microbiology

   20         2809   Plant Physiology

   21         2595   PLoS ONE

   22         2548   Genomics

   23         2426   The American Journal of Human Genetics

   24         2357   Journal of Cell Biology

   25         2200   The Plant Cell

   26         2037   The Plant Journal

   27         2015   Human Molecular Genetics

   28         1949   Genes and Development

   29         1923   Plant Molecular Biology

   30         1900   Virology

   31         1846   Nature Genetics

   32         1820   Development

   33         1820   Molecular Biology of the Cell

   34         1804   Molecular Cell

   35         1691   Journal of Immunology

   36         1657   Human Mutation

   37         1566   Oncogene

   38         1464   Structure

   39         1430   Molecular and General Genetics

   40         1423   Genetics

   41         1413   Journal of Biochemistry

   42         1388   Journal of Cell Science

   43         1276   Blood

   44         1272   Infection and Immunity

   45         1220   Nature Communications

   46         1186   Journal of General Virology

   47         1184   Developmental Biology

   48         1179   Microbiology

   49         1154   Archives of Biochemistry and Biophysics

   50         1146   Current Biology

   51         1020   Applied and Environmental Microbiology

   52         1016   Journal of Neuroscience

   53          993   Acta Crystallographica, Section D

   54          925   Cancer Research

   55          914   FEMS Microbiology Letters

   56          904   PLoS Genetics

   57          888   Toxicon

   58          881   American Journal of Physiology

   59          868   Scientific Reports

   60          864   Protein Science

   61          853   Journal of Clinical Investigation

   62          851   Yeast

   63          825   Neuron

   64          766   Plant and Cell Physiology

   65          756   The Journal of Experimental Medicine

   66          754   Human Genetics

   67          705   Journal of Medical Genetics

   68          692   Proteins

   69          683   The FEBS Journal

   70          676   Mechanisms of Development

   71          663   PLoS Pathogens

   72          659   Nature Structural and Molecular Biology

   73          650   Nature Structural Biology

   74          638   Nature Cell Biology

   75          624   Bioscience, Biotechnology, and Biochemistry

   76          592   Current Genetics

   77          589   Developmental Cell

   78          575   Journal of Neurochemistry

   79          554   Antimicrobial Agents and Chemotherapy

   80          554   Molecular Endocrinology

   81          550   The Journal of Clinical Endocrinology and Metabolism

   82          540   Endocrinology

   83          517   Molecular and Biochemical Parasitology

   84          505   Journal of the American Chemical Society

   85          496   Mammalian Genome

   86          489   Experimental Cell Research

   87          488   Cell Reports

   88          486   Eukaryotic Cell

   89          477   Peptides

   90          476   RNA

   91          464   Journal of Experimental Botany

   92          457   Planta

   93          454   The FASEB Journal

   94          452   EMBO Reports

   95          445   American Journal of Medical Genetics. Part A

   96          434   Immunogenetics

   97          432   Molecular Pharmacology

   98          425   Acta Crystallographica, Section F

   99          418   Molecular Biology and Evolution

  100          415   European Journal of Human Genetics

  101          412   Molecular Plant-Microbe Interactions

  102          410   Immunity

  103          407   Journal of Molecular Evolution

  104          400   Clinical Genetics

  105          398   Journal of Investigative Dermatology

  106          396   DNA and Cell Biology

  107          390   Neurology

  108          382   Biochimie

  109          380   DNA Sequence

  110          378   Biology of Reproduction

  111          371   

  112          370   Comparative Biochemistry and Physiology

  113          362   Virus Research

  114          358   Genes to Cells

  115          346   Journal of Lipid Research

  116          344   Nature Immunology

  117          342   Brain Research. Molecular Brain Research

  118          339   The New England Journal of Medicine

  119          337   Developmental Dynamics

  120          335   PLoS Biology

  121          334   Annals of Neurology

  122          326   BMC Genomics

  123          324   Applied Microbiology and Biotechnology

  124          314   Genome Research

  125          314   Journal of Medicinal Chemistry

  126          314   European Journal of Immunology

  127          308   Investigative Ophthalmology and Visual Science

  128          299   Biological Chemistry Hoppe-Seyler

  129          297   Journal of Human Genetics

  130          282   Journal of General Microbiology

  131          281   Glycobiology

  132          281   Cytogenetics and Cell Genetics

  133          278   Archives of Microbiology

  134          267   Nature Chemical Biology

  135          263   Brain

  136          259   Phytochemistry

  137          258   Traffic

  138          258   Molecular Genetics and Metabolism

  139          255   Molecular Immunology

  140          255   Nature Medicine

  141          254   Protein Expression and Purification

  142          249   Journal of Cellular Biochemistry

  143          249   Fungal Genetics and Biology

  144          242   Cell Cycle

  145          236   Circulation Research

  146          234   DNA Research

  147          233   Diabetes

  148          228   Cell Research

  149          227   Archives of Virology

  150          224   Journal of Structural Biology





5.  STATISTICS FOR SOME LINE TYPES



The following tables give, for selected UniProtKB/Swiss-Prot line types, the

total number of lines, the number of entries with at least one such line, and

the average number of lines per entry.



                                      Total    Number of  Average

   Line type / subtype                number   entries    per entry  Rank

------------------------------------  -------- ---------  ---------  ----



References (RL)                      1305541                 2.29                                         

   Journal                           1133240     474553      1.99       1                                 

   Submitted to EMBL/GenBank/DDBJ     160811     144932      0.28       2                                 

   Submitted to other databases         7788       7118      0.01       3                                 

   Book citation                        1875       1852     <0.01       4                                 

   Plant Gene Register                   613        600     <0.01       5                                 

   Unpublished observations              536        532     <0.01       6                                 

   Thesis                                458        455     <0.01       7                                 

   Patent                                214        207     <0.01       8                                 

   Worm Breeder's Gazette                  6          6     <0.01       9                                 
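The "Average per entry" column appears to divide the total number of lines by all 570830 entries in the release, not by the number of entries that carry such a line. A Python sketch under that assumption; the helper `average_per_entry` and the "<0.01" threshold are illustrative guesses at the formatting convention.

```python
total_entries = 570830  # all entries in release 2024_01

# Total number of reference lines per subtype, copied from the table above.
reference_lines = {
    "Journal": 1133240,
    "Submitted to EMBL/GenBank/DDBJ": 160811,
    "Submitted to other databases": 7788,
    "Book citation": 1875,
    "Plant Gene Register": 613,
    "Unpublished observations": 536,
    "Thesis": 458,
    "Patent": 214,
    "Worm Breeder's Gazette": 6,
}


def average_per_entry(total_lines: int, entries: int = total_entries) -> str:
    """Format lines per entry with two decimals (assumed table convention)."""
    value = total_lines / entries
    # Assumption: values that would round to 0.00 are shown as "<0.01".
    return "<0.01" if value < 0.005 else f"{value:.2f}"


all_reference_lines = sum(reference_lines.values())

print("References (RL)", all_reference_lines, average_per_entry(all_reference_lines))
for subtype, count in reference_lines.items():
    print(f"   {subtype:<32}{count:>8}  {average_per_entry(count)}")
```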



Total number of distinct authors cited in UniProtKB/Swiss-Prot: 466237



                                      Total    Number of  Average

   Line type / subtype                number   entries    per entry  Rank

------------------------------------  -------- ---------  ---------  ----

Comments (CC)                        2738587                 4.80                                         

   ACTIVITY REGULATION                 18544      18425      0.03      17                                 

   ALLERGEN                              948        948     <0.01      26                                 

   ALTERNATIVE PRODUCTS                25888      25888      0.05      13                                 

   BIOPHYSICOCHEMICAL PROPERTIES       11428      11380      0.02      20                                 

   BIOTECHNOLOGY                        1907       1847     <0.01      24                                 

   CATALYTIC ACTIVITY                 340065     254456      0.60       4                                 

   CAUTION                             14385      14086      0.03      18                                 

   COFACTOR                           133102     120778      0.23       7                                 

   DEVELOPMENTAL STAGE                 14308      14211      0.03      19                                 

   DISEASE                              8298       5589      0.01      21                                 

   DISRUPTION PHENOTYPE                20762      20726      0.04      16                                 

   DOMAIN                              58661      50003      0.10       9                                 

   FUNCTION                           490865     466522      0.86       2                                 

   INDUCTION                           25587      25492      0.04      14                                 

   INTERACTION                         24184      24184      0.04      15                                 

   MASS SPECTROMETRY                    7548       5831      0.01      22                                 

   MISCELLANEOUS                       45866      40279      0.08      11                                 

   PATHWAY                            143718     129735      0.25       6                                 

   PHARMACEUTICAL                        167        160     <0.01      29                                 

   POLYMORPHISM                         1502       1374     <0.01      25                                 

   PTM                                 64470      45934      0.11       8                                 

   RNA EDITING                           636        636     <0.01      28                                 

   SEQUENCE CAUTION                    45224      45153      0.08      12                                 

   SIMILARITY                         519322     514992      0.91       1                                 

   SUBCELLULAR LOCATION               364941     356407      0.64       3                                 

   SUBUNIT                            297873     292487      0.52       5                                 

   TISSUE SPECIFICITY                  51004      50426      0.09      10                                 

   TOXIC DOSE                            860        689     <0.01      27                                 

   WEB RESOURCE                         6524       5533      0.01      23                                 



Total number of comment topics: 29
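
The "Average per entry" column appears to be the total number of lines divided by the number of entries in the release (570830, from the introduction), rounded to two decimals, with values that round to 0.00 printed as "<0.01". The sketch below reproduces a few rows under that assumption. The function name and the choice of rows are illustrative and not part of the release notes.

```python
# Sketch: reproduce the "Average per entry" column for a few comment topics.
# Assumption: the divisor is the total entry count of the release (570830).
TOTAL_ENTRIES = 570830


def average_per_entry(total_lines: int, total_entries: int = TOTAL_ENTRIES) -> str:
    """Format total_lines / total_entries the way the table appears to."""
    text = f"{total_lines / total_entries:.2f}"
    # Values that round to 0.00 are printed as "<0.01" in the table.
    return "<0.01" if text == "0.00" else text


# Totals copied from the Comments (CC) table above.
comment_totals = {
    "Comments (CC)": 2738587,
    "CATALYTIC ACTIVITY": 340065,
    "FUNCTION": 490865,
    "WEB RESOURCE": 6524,
    "BIOTECHNOLOGY": 1907,
}

for topic, total in comment_totals.items():
    print(f"{topic:<20} {total:>8} {average_per_entry(total):>6}")
```

The same division applies to the Features (FT) and Cross-references (DR) tables, since all three share the column layout.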





                                      Total    Number of  Average

   Line type / subtype                number   entries    per entry  Rank

------------------------------------  -------- ---------  ---------  ----

Features (FT)                        5339060                 9.35                                         

   ACT_SITE                           176984     105259      0.31       9                                 

   BINDING                           1216961     217927      2.13       1                                 

   CARBOHYD                           124051      31529      0.22      14                                 

   CHAIN                              579201     563178      1.01       2                                 

   COILED                              22520      15576      0.04      25                                 

   COMPBIAS                           174690      74207      0.31      10                                 

   CONFLICT                           139087      48489      0.24      12                                 

   CROSSLNK                            24989       8976      0.04      24                                 

   DISULFID                           135519      36035      0.24      13                                 

   DNA_BIND                            12193      10916      0.02      31                                 

   DOMAIN                             215542     131669      0.38       8                                 

   HELIX                              335721      29161      0.59       5                                 

   INIT_MET                            17545      17496      0.03      26                                 

   INTRAMEM                             3026       1390      0.01      34                                 

   LIPID                               13812       8861      0.02      28                                 

   MOD_RES                            261729      74559      0.46       7                                 

   MOTIF                               47670      31043      0.08      21                                 

   MUTAGEN                             96528      19844      0.17      17                                 

   NON_CONS                             2681        836     <0.01      35                                 

   NON_STD                               358        283     <0.01      36                                 

   NON_TER                             12627       9704      0.02      30                                 

   PEPTIDE                             12629       8738      0.02      29                                 

   PROPEP                              15292      13037      0.03      27                                 

   REGION                             321132     150076      0.56       6                                 

   REPEAT                             109168      15173      0.19      15                                 

   SIGNAL                              44283      44282      0.08      22                                 

   SITE                                65147      35341      0.11      19                                 

   STRAND                             341973      27470      0.60       4                                 

   TOPO_DOM                           149962      30420      0.26      11                                 

   TRANSIT                              9564       9444      0.02      32                                 

   TRANSMEM                           381766      79963      0.67       3                                 

   TURN                                81139      23758      0.14      18                                 

   UNSURE                               5757        897      0.01      33                                 

   VAR_SEQ                             53221      22651      0.09      20                                 

   VARIANT                            103806      17501      0.18      16                                 

   ZN_FING                             30787      13135      0.05      23                                 



Total number of feature keys: 36
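
The Rank column appears to order the feature keys by their total number, largest first. The sketch below checks this for the five most frequent keys only. How ties are broken is not stated in the release notes and is not modeled here. The variable names are illustrative.

```python
# Sketch: derive the Rank column from the "Total number" column.
# Assumption: rank 1 goes to the key with the largest total.
# Totals copied from the Features (FT) table above (top five keys only).
feature_totals = {
    "HELIX": 335721,
    "BINDING": 1216961,
    "STRAND": 341973,
    "CHAIN": 579201,
    "TRANSMEM": 381766,
}

# Sort the keys by descending total, then number them from 1.
ranked_keys = sorted(feature_totals, key=feature_totals.get, reverse=True)
ranks = {key: position for position, key in enumerate(ranked_keys, start=1)}

for key in ranked_keys:
    print(f"{key:<10} {feature_totals[key]:>8} {ranks[key]:>3}")
```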







                                      Total    Number of  Average

   Line type / subtype                number   entries    per entry  Rank      Category

------------------------------------  -------- ---------  ---------  ----      -------------------------------------------

Cross-references (DR)               20546636                35.99                                                           

   ABCD                                 3067       3067      0.01     124      Protocols and materials databases            

   AGR                                 60842      60172      0.11      43      Organism-specific databases                  

   Allergome                            2037       1310     <0.01     132      Protein family/group databases               

   AlphaFoldDB                        546628     546628      0.96       9      3D structure databases                       

   Antibodypedia                       32300      32191      0.06      62      Protocols and materials databases            

   ArachnoServer                        1148       1138     <0.01     142      Organism-specific databases                  

   Araport                             16401      16305      0.03      93      Organism-specific databases                  

   Bgee                                61485      61485      0.11      41      Gene expression databases                    

   BindingDB                            6660       6660      0.01     109      Chemistry databases                          

   BioCyc                              48068      44022      0.08      53      Enzyme and pathway databases                 

   BioGRID                             61310      59413      0.11      42      Protein-protein interaction databases        

   BioGRID-ORCS                        44975      44390      0.08      55      Miscellaneous databases                      

   BioMuta                             20309      20283      0.04      77      Genetic variation databases                  

   BMRB                                 6910       6910      0.01     107      3D structure databases                       

   BRENDA                              20329      18525      0.04      75      Enzyme and pathway databases                 

   CarbonylDB                           1159       1159     <0.01     141      PTM databases                                

   CAZy                                 9618       8665      0.02     100      Protein family/group databases               

   CCDS                                49494      34651      0.09      51      Sequence databases                           

   CDD                                382807     300877      0.67      16      Family and domain databases                  

   CGD                                  2103       2086     <0.01     131      Organism-specific databases                  

   ChEMBL                               9021       8833      0.02     101      Chemistry databases                          

   ChiTaRS                             29761      29716      0.05      64      Miscellaneous databases                      

   CLAE                                  359        356     <0.01     158      Protein family/group databases               

   CollecTF                              137        137     <0.01     164      Gene expression databases                    

   ComplexPortal                       14953       7896      0.03      96      Protein-protein interaction databases        

   COMPLUYEAST-2DPAGE                     97         97     <0.01     166      2D gel databases                             

   ConoServer                            967        879     <0.01     144      Organism-specific databases                  

   CORUM                                5812       5812      0.01     110      Protein-protein interaction databases        

   CPTAC                                3472       1929      0.01     120      Proteomic databases                          

   CPTC                                  389        389     <0.01     155      Protocols and materials databases            

   CTD                                 69492      68729      0.12      40      Organism-specific databases                  

   DEPOD                                 254        254     <0.01     162      PTM databases                                

   dictyBase                            4224       4110      0.01     117      Organism-specific databases                  

   DIP                                 17546      17505      0.03      89      Protein-protein interaction databases        

   DisGeNET                            17011      16793      0.03      91      Organism-specific databases                  

   DisProt                              1721       1715     <0.01     135      Family and domain databases                  

   DMDM                                16171      16170      0.03      95      Genetic variation databases                  

   DNASU                               48377      48299      0.08      52      Protocols and materials databases            

   DOSAC-COBS-2DPAGE                     145        145     <0.01     163      2D gel databases                             

   DrugBank                            31176       4772      0.05      63      Chemistry databases                          

   DrugCentral                          2565       2565     <0.01     126      Chemistry databases                          

   EchoBASE                             4158       4158      0.01     118      Organism-specific databases                  

   eggNOG                             339191     333346      0.59      17      Phylogenomic databases                       

   ELM                                  1814       1814     <0.01     134      Protein-protein interaction databases        

   EMBL                              1005456     558056      1.76       3      Sequence databases                           

   EMDB                                69660       8097      0.12      39      3D structure databases                       

   Ensembl                            114577      50301      0.20      33      Genome annotation databases                  

   EnsemblBacteria                     55438      55260      0.10      46      Genome annotation databases                  

   EnsemblFungi                        23120      22677      0.04      70      Genome annotation databases                  

   EnsemblMetazoa                      19031      11558      0.03      84      Genome annotation databases                  

   EnsemblPlants                       36885      22344      0.06      60      Genome annotation databases                  

   EnsemblProtists                      5382       5127      0.01     113      Genome annotation databases                  

   EPD                                 23260      23260      0.04      69      Proteomic databases                          

   ESTHER                               3008       3006      0.01     125      Protein family/group databases               

   euHCVdb                                55         44     <0.01     168      Organism-specific databases                  

   EvolutionaryTrace                   16776      16776      0.03      92      Miscellaneous databases                      

   ExpressionAtlas                     53094      53094      0.09      49      Gene expression databases                    

   FlyBase                              4156       4041      0.01     119      Organism-specific databases                  

   Gene3D                             739077     458752      1.29       6      Family and domain databases                  

   GeneCards                           20380      20246      0.04      73      Organism-specific databases                  

   GeneID                             293156     283419      0.51      21      Genome annotation databases                  

   GeneReviews                          1591       1588     <0.01     136      Organism-specific databases                  

   GeneTree                            57210      57202      0.10      45      Phylogenomic databases                       

   Genevisible                         55283      55283      0.10      47      Gene expression databases                    

   GeneWiki                            10351      10269      0.02      99      Miscellaneous databases                      

   GenomeRNAi                          22289      22289      0.04      71      Miscellaneous databases                      

   GlyConnect                           2372       2215     <0.01     127      PTM databases                                

   GlyCosmos                           28903      28903      0.05      65      PTM databases                                

   GlyGen                              21596      21596      0.04      72      PTM databases                                

   GO                                3144059     547026      5.51       1      Ontologies                                   

   Gramene                             36885      22344      0.06      59      Genome annotation databases                  

   GuidetoPHARMACOLOGY                  2205       2205     <0.01     130      Chemistry databases                          

   HAMAP                              330877     327942      0.58      19      Family and domain databases                  

   HGNC                                20377      20248      0.04      74      Organism-specific databases                  

   HOGENOM                            427021     427021      0.75      15      Phylogenomic databases                       

   HPA                                 19354      19215      0.03      82      Organism-specific databases                  

   IDEAL                                 986        986     <0.01     143      Family and domain databases                  

   IMGT_GENE-DB                          267        267     <0.01     161      Protein family/group databases               

   InParanoid                         163754     163754      0.29      26      Phylogenomic databases                       

   IntAct                              57277      57277      0.10      44      Protein-protein interaction databases        

   InterPro                          2427248     551756      4.25       2      Family and domain databases                  

   iPTMnet                             54156      54156      0.09      48      PTM databases                                

   JaponicusDB                            43         43     <0.01     170      Organism-specific databases                  

   jPOST                               26410      26410      0.05      66      Proteomic databases                          

   KEGG                               493293     470312      0.86      12      Genome annotation databases                  

   LegioList                             765        763     <0.01     149      Organism-specific databases                  

   Leproma                               672        669     <0.01     150      Organism-specific databases                  

   MaizeGDB                              529        525     <0.01     152      Organism-specific databases                  

   MalaCards                            5619       5610      0.01     111      Organism-specific databases                  

   MANE-Select                         18422      18310      0.03      86      Genome annotation databases                  

   MassIVE                             19139      19139      0.03      83      Proteomic databases                          

   MaxQB                               33723      33723      0.06      61      Proteomic databases                          

   MEROPS                              14200      13782      0.02      97      Protein family/group databases               

   MetOSite                             3455       3455      0.01     121      PTM databases                                

   MGI                                 17111      17070      0.03      90      Organism-specific databases                  

   MIM                                 23295      16097      0.04      68      Organism-specific databases                  

   MINT                                23842      23842      0.04      67      Protein-protein interaction databases        

   MoonDB                                348        348     <0.01     160      Protein family/group databases               

   MoonProt                              368        368     <0.01     157      Protein family/group databases               

   NCBIfam                            300521     277567      0.53      20      Family and domain databases                  

   neXtProt                            20324      20324      0.04      76      Organism-specific databases                  

   NIAGADS                                69         69     <0.01     167      Organism-specific databases                  

   OGP                                   373        373     <0.01     156      2D gel databases                             

   OMA                                430603     430603      0.75      14      Phylogenomic databases                       

   OpenTargets                         18427      18282      0.03      85      Organism-specific databases                  

   Orphanet                             8177       4417      0.01     103      Organism-specific databases                  

   OrthoDB                            275159     275159      0.48      24      Phylogenomic databases                       

   PANTHER                           1003431     502117      1.76       4      Family and domain databases                  

   PathwayCommons                      19454      19454      0.03      81      Enzyme and pathway databases                 

   PATRIC                              92992      92992      0.16      36      Genome annotation databases                  

   PaxDb                              153442     153442      0.27      27      Proteomic databases                          

   PCDDB                                 132        132     <0.01     165      3D structure databases                       

   PDB                                288259      34877      0.50      23      3D structure databases                       

   PDBsum                             288259      34877      0.50      22      3D structure databases                       

   PeptideAtlas                        39465      39465      0.07      58      Proteomic databases                          

   PeroxiBase                            792        771     <0.01     147      Protein family/group databases               

   Pfam                               839359     540554      1.47       5      Family and domain databases                  

   PharmGKB                            18033      18014      0.03      88      Organism-specific databases                  

   Pharos                              20224      20224      0.04      79      Miscellaneous databases                      

   PHI-base                             2341       1837     <0.01     128      Miscellaneous databases                      

   PhosphoSitePlus                     42102      42102      0.07      57      PTM databases                                

   PhylomeDB                          115539     115539      0.20      32      Phylogenomic databases                       

   PIR                                125101     114773      0.22      31      Sequence databases                           

   PIRSF                              110920     109752      0.19      34      Family and domain databases                  

   PlantReactome                        1320        771     <0.01     138      Enzyme and pathway databases                 

   PomBase                              5129       5125      0.01     114      Organism-specific databases                  

   PRIDE                                 637        637     <0.01     151      Proteomic databases                          

   PRINTS                             150747     129449      0.26      28      Family and domain databases                  

   PRO                                 98141      98140      0.17      35      Miscellaneous databases                      

   ProMEX                                487        487     <0.01     154      Proteomic databases                          

   PROSITE                            491564     310962      0.86      13      Family and domain databases                  

   Proteomes                          503601     461663      0.88      11      Miscellaneous databases                      

   ProteomicsDB                        72691      45372      0.13      38      Proteomic databases                          

   PseudoCAP                            2036       2036     <0.01     133      Organism-specific databases                  

   Pumba                               18207      18207      0.03      87      Proteomic databases                          

   Reactome                           142323      38166      0.25      29      Enzyme and pathway databases                 

   REBASE                                787        391     <0.01     148      Protein family/group databases               

   RefSeq                             596391     451523      1.04       8      Sequence databases                           

   REPRODUCTION-2DPAGE                  1260       1039     <0.01     139      2D gel databases                             

   RGD                                  8120       8118      0.01     104      Organism-specific databases                  

   RNAct                               43106      43106      0.08      56      Miscellaneous databases                      

   SABIO-RK                             5579       5579      0.01     112      Enzyme and pathway databases                 

   SASBDB                                840        840     <0.01     146      3D structure databases                       

   SFLD                                20269       9044      0.04      78      Family and domain databases                  

   SGD                                  6746       6741      0.01     108      Organism-specific databases                  

   SignaLink                           19959      19959      0.03      80      Enzyme and pathway databases                 

   SIGNOR                               7457       7457      0.01     105      Enzyme and pathway databases                 

   SMART                              205489     148271      0.36      25      Family and domain databases                  

   SMR                                516325     516325      0.90      10      3D structure databases                       

   STRING                             335612     335612      0.59      18      Protein-protein interaction databases        

   SUPFAM                             648065     459341      1.14       7      Family and domain databases                  

   SWISS-2DPAGE                         1177       1177     <0.01     140      2D gel databases                             

   SwissLipids                          1478       1394     <0.01     137      Chemistry databases                          

   SwissPalm                           13344      13344      0.02      98      PTM databases                                

   TAIR                                16391      16305      0.03      94      Organism-specific databases                  

   TCDB                                 8525       8443      0.01     102      Protein family/group databases               

   TopDownProteomics                    3236       2957      0.01     123      Proteomic databases                          

   TreeFam                             46157      46134      0.08      54      Phylogenomic databases                       

   TubercuList                          2327       2291     <0.01     129      Organism-specific databases                  

   UCD-2DPAGE                            496        496     <0.01     153      2D gel databases                             

   UCSC                                50881      46420      0.09      50      Genome annotation databases                  

   UniLectin                             356        356     <0.01     159      Protein family/group databases               

   UniPathway                         139693     126059      0.24      30      Enzyme and pathway databases                 

   VEuPathDB                           81490      74972      0.14      37      Organism-specific databases                  

   VGNC                                 4510       4496      0.01     116      Organism-specific databases                  

   WBParaSite                             49         47     <0.01     169      Genome annotation databases                  

   World-2DPAGE                          936        924     <0.01     145      2D gel databases                             

   WormBase                             6953       5069      0.01     106      Organism-specific databases                  

   Xenbase                              4737       4737      0.01     115      Organism-specific databases                  

   ZFIN                                 3245       3244      0.01     122      Organism-specific databases                  



Total number of cross-referenced databases: 170



6.  AMINO ACID COMPOSITION



   6.1  Composition in percent for the complete database



   Ala (A) 8.25   Gln (Q) 3.93   Leu (L) 9.65   Ser (S) 6.65

   Arg (R) 5.52   Glu (E) 6.72   Lys (K) 5.80   Thr (T) 5.36

   Asn (N) 4.06   Gly (G) 7.07   Met (M) 2.41   Trp (W) 1.10

   Asp (D) 5.46   His (H) 2.27   Phe (F) 3.86   Tyr (Y) 2.92

   Cys (C) 1.38   Ile (I) 5.91   Pro (P) 4.74   Val (V) 6.85



   Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.00



   



   [Figure: amino acid composition chart]

   Legend: gray = aliphatic, red = acidic, green = small hydroxy,

           blue = basic, black = aromatic, white = amide, yellow = sulfur





   6.2  Classification of the amino acids by their frequency



   Leu, Ala, Gly, Val, Glu, Ser, Ile, Lys, Arg, Asp, Thr, Pro, Asn, Gln,

   Phe, Tyr, Met, His, Cys, Trp
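
The ordering in 6.2 follows from sorting the percentages in 6.1 from highest to lowest. A minimal sketch of that sort, using the twenty standard residues only (Asx, Glx and Xaa are left out) and illustrative variable names:

```python
# Sketch: rebuild the frequency classification of 6.2 from the table in 6.1.
# Percentages copied from section 6.1.
composition = {
    "Ala": 8.25, "Gln": 3.93, "Leu": 9.65, "Ser": 6.65,
    "Arg": 5.52, "Glu": 6.72, "Lys": 5.80, "Thr": 5.36,
    "Asn": 4.06, "Gly": 7.07, "Met": 2.41, "Trp": 1.10,
    "Asp": 5.46, "His": 2.27, "Phe": 3.86, "Tyr": 2.92,
    "Cys": 1.38, "Ile": 5.91, "Pro": 4.74, "Val": 6.85,
}

# Sort residue names by descending percentage.
by_frequency = sorted(composition, key=composition.get, reverse=True)

print(", ".join(by_frequency))
```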





7.  MISCELLANEOUS STATISTICS



4467 entries are encoded on a mitochondrion, and 3999 are encoded on a plasmid.



12200 entries are encoded on a plastid, of which 22 are encoded on apicoplasts,

11634 on chloroplasts, 51 on organellar chromatophores, 145 on cyanelles,

149 on non-photosynthetic plastids and 199 on unspecified types of plastid.
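
The plastid breakdown above can be checked against the stated total with a short sum. This is a sketch using the counts as given, with illustrative variable names:

```python
# Sketch: check that the plastid subtypes add up to the stated plastid total.
# Counts copied from the sentence above.
plastid_total = 12200
plastid_breakdown = {
    "apicoplast": 22,
    "chloroplast": 11634,
    "organellar chromatophore": 51,
    "cyanelle": 145,
    "non-photosynthetic plastid": 149,
    "unspecified plastid": 199,
}

breakdown_sum = sum(plastid_breakdown.values())
print(breakdown_sum, breakdown_sum == plastid_total)
```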



Number of entries with at least one sequence correction: 81170