UniProtKB/Swiss-Prot protein knowledgebase release 2024_04 statistics

1. INTRODUCTION

Release 2024_04 of 24-Jul-2024 of UniProtKB/Swiss-Prot contains 571864 sequence
entries, curated from 300755 unique references and comprising 207016062 amino
acids.

257 sequences have been added since release 2024_03, the sequence data of 22
existing entries has been updated and the annotations of 230683 entries have
been revised.

Number of fragments: 9293
Number of additional sequences produced by alternative splicing, initiation or
promoter usage, or ribosomal frameshifting: 41189

Protein existence (PE):            entries      %
1: Evidence at protein level        116647  20.4%
2: Evidence at transcript level      54396   9.5%
3: Inferred from homology           386070  67.5%
4: Predicted                         12925   2.3%
5: Uncertain                          1826   0.3%

The growth of the database is summarized below.

2. TAXONOMIC ORIGIN

Total number of species represented in this release of UniProtKB/Swiss-Prot: 14652

The first twenty species represent 123105 sequences: 21.5% of the total number
of entries.

2.1 Table of the frequency of occurrence of species

Species represented    1x:  5979
                       2x:  2125
                       3x:  1141
                       4x:   778
                       5x:   540
                       6x:   442
                       7x:   327
                       8x:   281
                       9x:   242
                      10x:   158
                  11- 20x:   839
                  21- 50x:   512
                  51-100x:   232
                    >100x:  1056

2.2 Table of the most represented species

------ --------- --------------------------------------------
Number Frequency Species
------ --------- --------------------------------------------
     1     20435 Homo sapiens (Human)
     2     17217 Mus musculus (Mouse)
     3     16389 Arabidopsis thaliana (Mouse-ear cress)
     4      8202 Rattus norvegicus (Rat)
     5      6727 Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (Baker's yeast)
     6      6047 Bos taurus (Bovine)
     7      5121 Schizosaccharomyces pombe (strain 972 / ATCC 24843) (Fission yeast)
     8      4530 Escherichia coli (strain K12)
     9      4484 Caenorhabditis elegans
    10      4191 Bacillus subtilis (strain 168)
    11      4190 Oryza sativa subsp. japonica (Rice)
    12      4160 Dictyostelium discoideum (Social amoeba)
    13      3786 Drosophila melanogaster (Fruit fly)
    14      3507 Xenopus laevis (African clawed frog)
    15      3338 Danio rerio (Zebrafish) (Brachydanio rerio)
    16      2309 Gallus gallus (Chicken)
    17      2309 Mycobacterium tuberculosis (strain ATCC 25618 / H37Rv)
    18      2218 Pongo abelii (Sumatran orangutan) (Pongo pygmaeus abelii)
    19      2046 Escherichia coli O157:H7
    20      1899 Mycobacterium tuberculosis (strain CDC 1551 / Oshkosh)
    21      1828 Salmonella typhimurium (strain LT2 / SGSC1412 / ATCC 700720)
    22      1787 Methanocaldococcus jannaschii
    23      1710 Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
    24      1704 Haemophilus influenzae (strain ATCC 51907 / DSM 11121 / KW20 / Rd)
    25      1702 Escherichia coli O6:H1 (strain CFT073 / ATCC 700928 / UPEC)
    26      1696 Shigella flexneri
    27      1460 Pseudomonas aeruginosa
    28      1459 Sus scrofa (Pig)
    29      1349 Salmonella typhi
    30      1244 Mycobacterium bovis (strain ATCC BAA-935 / AF2122/97)
    31      1176 Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)
    32      1144 Candida albicans (strain SC5314 / ATCC MYA-2876) (Yeast)
    33      1102 Synechocystis sp. (strain PCC 6803 / Kazusa)
    34      1038 Archaeoglobus fulgidus
    35      1030 Yersinia pestis
    36      1016 Emericella nidulans
    37       997 Vibrio cholerae serotype O1 (strain ATCC 39315 / El Tor Inaba N16961)
    38       978 Oryctolagus cuniculus (Rabbit)
    39       968 Neurospora crassa
    40       942 Staphylococcus aureus (strain Mu50 / ATCC 700699)
    41       930 Salmonella paratyphi A (strain ATCC 9150 / SARB42)
    42       929 Staphylococcus aureus (strain N315)
    43       928 Aspergillus fumigatus (strain ATCC MYA-4609 / CBS 101355 / FGSC A1100 / Af293)
    44       928 Eremothecium gossypii
    45       919 Kluyveromyces lactis
    46       909 Acanthamoeba polyphaga mimivirus (APMV)
    47       905 Staphylococcus aureus (strain COL)
    48       896 Staphylococcus aureus (strain MW2)
    49       894 Escherichia coli O6:K15:H31 (strain 536 / UPEC)
    50       890 Staphylococcus aureus (strain MSSA476)
    51       888 Staphylococcus aureus (strain MRSA252)
    52       888 Candida glabrata
    53       887 Rhizobium meliloti (strain 1021) (Ensifer meliloti) (Sinorhizobium meliloti)
    54       882 Salmonella choleraesuis (strain SC-B67)
    55       879 Shigella sonnei (strain Ss046)
    56       873 Oryza sativa subsp. indica (Rice)
    57       863 Yersinia pseudotuberculosis serotype I (strain IP32953)
    58       850 Zea mays (Maize)
    59       848 Canis lupus familiaris (Dog) (Canis familiaris)
    60       847 Escherichia coli O9:H4 (strain HS)
    61       838 Escherichia coli O139:H28 (strain E24377A / ETEC)
    62       829 Shigella boydii serotype 4 (strain Sb227)
    63       825 Escherichia coli (strain UTI89 / UPEC)
    64       822 Shigella dysenteriae serotype 1 (strain Sd197)
    65       822 Escherichia coli
    66       817 Streptomyces coelicolor (strain ATCC BAA-471 / A3(2) / M145)
    67       811 Staphylococcus aureus (strain NCTC 8325 / PS 47)
    68       804 Pectobacterium atrosepticum (strain SCRI 1043 / ATCC BAA-672)
    69       796 Vibrio parahaemolyticus serotype O3:K6 (strain RIMD 2210633)
    70       791 Escherichia coli (strain SMS-3-5 / SECEC)
    71       788 Aquifex aeolicus (strain VF5)
    72       779 Escherichia coli O127:H6 (strain E2348/69 / EPEC)
    73       771 Escherichia coli (strain K12 / DH10B)
    74       770 Pasteurella multocida (strain Pm70)
    75       767 Escherichia coli O17:K52:H18 (strain UMN026 / ExPEC)
    76       765 Escherichia coli (strain K12 / MC4100 / BW2952)
    77       762 Escherichia coli (strain 55989 / EAEC)
    78       761 Escherichia coli O8 (strain IAI1)
    79       760 Staphylococcus epidermidis (strain ATCC 35984 / RP62A)
    80       760 Staphylococcus epidermidis (strain ATCC 12228 / FDA PCI 1200)
    81       760 Shigella flexneri serotype 5b (strain 8401)
    82       759 Escherichia coli O45:K1 (strain S88 / ExPEC)
    83       758 Bacillus anthracis
    84       756 Escherichia coli (strain SE11)
    85       753 Escherichia coli O7:K1 (strain IAI39 / ExPEC)
    86       749 Photorhabdus laumondii subsp. laumondii (strain DSM 15139 / CIP 105565 / TT01)
    87       748 Escherichia coli O157:H7 (strain EC4115 / EHEC)
    88       744 Halalkalibacterium halodurans
    89       739 Yersinia enterocolitica serotype O:8 / biotype 1B (strain NCTC 13174 / 8081)
    90       736 Pseudomonas putida
    91       733 Vibrio vulnificus (strain CMCP6)
    92       731 Escherichia coli O81 (strain ED1a)
    93       722 Escherichia coli
    94       722 Salmonella enteritidis PT4 (strain P125109)
    95       718 Vibrio vulnificus (strain YJ016)
    96       716 Salmonella paratyphi B (strain ATCC BAA-1250 / SPB7)
    97       715 Yersinia pestis bv. Antiqua (strain Nepal516)
    98       715 Klebsiella pneumoniae subsp. pneumoniae (strain ATCC 700721 / MGH 78578)
    99       715 Enterobacter sp. (strain 638)
   100       715 Escherichia coli O1:K1 / APEC
   101       714 Salmonella paratyphi A (strain AKU_12601)
   102       713 Yersinia pseudotuberculosis serotype O:1b (strain IP 31758)
   103       713 Salmonella newport (strain SL254)
   104       713 Salmonella agona (strain SL483)
   105       712 Salmonella schwarzengrund (strain CVM19633)
   106       711 Yersinia pestis bv. Antiqua (strain Antiqua)
   107       710 Salmonella heidelberg (strain SL476)
   108       708 Nostoc sp. (strain PCC 7120 / SAG 25.82 / UTEX 2576)
   109       702 Salmonella dublin (strain CT_02021853)
   110       699 Klebsiella pneumoniae (strain 342)
   111       698 Shigella boydii serotype 18 (strain CDC 3083-94 / BS512)
   112       695 Escherichia fergusonii
   113       692 Pan troglodytes (Chimpanzee)
   114       686 Mycoplasma pneumoniae (strain ATCC 29342 / M129 / Subtype 1)
   115       684 Salmonella gallinarum (strain 287/91 / NCTC 13346)
   116       683 Pseudomonas syringae pv. tomato (strain ATCC BAA-871 / DC3000)
   117       679 Staphylococcus aureus (strain USA300)
   118       679 Citrobacter koseri (strain ATCC BAA-895 / CDC 4225-83 / SGSC4696)
   119       672 Serratia proteamaculans (strain 568)
   120       670 Bacillus cereus
   121       669 Mycobacterium leprae (strain TN)
   122       668 Agrobacterium fabrum (strain C58 / ATCC 33970) (Agrobacterium tumefaciens
   123       667 Bradyrhizobium diazoefficiens
   124       667 Yersinia pestis (strain Pestoides F)
   125       667 Yarrowia lipolytica (strain CLIB 122 / E 150) (Yeast) (Candida lipolytica)
   126       662 Shewanella oneidensis
   127       658 Sinorhizobium fredii (strain NBRC 101917 / NGR234)
   128       653 Debaryomyces hansenii
   129       643 Staphylococcus aureus (strain bovine RF122 / ET3-1)
   130       642 Yersinia pseudotuberculosis serotype O:3 (strain YPIII)
   131       642 Salmonella arizonae (strain ATCC BAA-731 / CDC346-86 / RSK2980)
   132       634 Yersinia pseudotuberculosis serotype IB (strain PB1/+)
   133       623 Methanothermobacter thermautotrophicus
   134       622 Cronobacter sakazakii (strain ATCC BAA-894) (Enterobacter sakazakii)
   135       622 Listeria monocytogenes serovar 1/2a (strain ATCC BAA-679 / EGD-e)
   136       622 Treponema pallidum (strain Nichols)
   137       620 Pseudomonas aeruginosa (strain UCBPP-PA14)
   138       615 Xanthomonas campestris pv. campestris
   139       614 Staphylococcus haemolyticus (strain JCSC1435)
   140       613 Mesorhizobium japonicum (Mesorhizobium loti
   141       612 Helicobacter pylori (strain ATCC 700392 / 26695) (Campylobacter pylori)
   142       605 Listeria innocua serovar 6a (strain ATCC BAA-680 / CLIP 11262)
   143       603 Ralstonia nicotianae (strain ATCC BAA-1114 / GMI1000) (Ralstonia solanacearum)
   144       602 Staphylococcus saprophyticus subsp. saprophyticus
   145       602 Photobacterium profundum (strain SS9)
   146       601 Salmonella paratyphi C (strain RKS4594)
   147       600 Yersinia pestis bv. Antiqua (strain Angola)
   148       595 Bacillus cereus (strain ATCC 10987 / NRS 248)
   149       591 Pectobacterium carotovorum subsp. carotovorum (strain PC1)
   150       588 Neisseria meningitidis serogroup B (strain MC58)
   151       588 Mycolicibacterium smegmatis (strain ATCC 700084 / mc(2)155)
   152       584 Rickettsia prowazekii (strain Madrid E)
   153       582 Caenorhabditis briggsae
   154       579 Brucella suis biovar 1 (strain 1330)
   155       576 Brucella melitensis biotype 1
   156       575 Caulobacter vibrioides (strain ATCC 19089 / CIP 103742 / CB 15)
   157       573 Aliivibrio fischeri (strain ATCC 700601 / ES114) (Vibrio fischeri)
   158       572 Buchnera aphidicola subsp. Acyrthosiphon pisum (strain APS)
   159       572 Aspergillus oryzae (strain ATCC 42149 / RIB 40) (Yellow koji mold)
   160       569 Bacillus thuringiensis subsp. konkukian (strain 97-27)
   161       568 Pseudomonas syringae pv. syringae (strain B728a)
   162       568 Helicobacter pylori (strain J99 / ATCC 700824) (Campylobacter pylori J99)
   163       565 Thermotoga maritima
   164       565 Bacillus licheniformis
   165       562 Buchnera aphidicola subsp. Schizaphis graminum (strain Sg)
   166       562 Bacillus cereus (strain ZK / E33L)
   167       559 Clostridium acetobutylicum
   168       557 Xanthomonas axonopodis pv. citri (strain 306)
   169       555 Pseudomonas fluorescens (strain Pf0-1)
   170       554 Neisseria meningitidis serogroup A / serotype 4A (strain DSM 15465 / Z2491)
   171       554 Pseudomonas fluorescens (strain ATCC BAA-477 / NRRL B-23932 / Pf-5)
   172       553 Oceanobacillus iheyensis
   173       547 Pseudomonas savastanoi pv. phaseolicola (Pseudomonas syringae pv. phaseolicola
   174       543 Corynebacterium glutamicum
   175       541 Lactococcus lactis subsp. lactis (strain IL1403) (Streptococcus lactis)
   176       531 Erwinia tasmaniensis
   177       530 Listeria monocytogenes serotype 4b (strain F2365)
   178       529 Bordetella bronchiseptica (strain ATCC BAA-588 / NCTC 13252 / RB50)
   179       529 Sodalis glossinidius (strain morsitans)
   180       524 Staphylococcus aureus (strain Newman)
   181       523 Vibrio cholerae serotype O1 (strain ATCC 39541 / Classical Ogawa 395 / O395)
   182       522 Xylella fastidiosa (strain 9a5c)
   183       521 Deinococcus radiodurans
   184       519 Streptococcus pneumoniae serotype 4 (strain ATCC BAA-334 / TIGR4)
   185       519 Methanosarcina acetivorans (strain ATCC 35395 / DSM 2834 / JCM 12185 / C2A)
   186       519 Chromobacterium violaceum
   187       516 Bordetella pertussis (strain Tohama I / ATCC BAA-589 / NCTC 13251)
   188       515 Xylella fastidiosa (strain Temecula1 / ATCC 700964)
   189       512 Pseudomonas aeruginosa (strain PA7)
   190       512 Haemophilus ducreyi (strain 35000HP / ATCC 700724)
   191       512 Geobacillus kaustophilus (strain HTA426)
   192       511 Streptomyces avermitilis
   193       511 Acinetobacter baylyi (strain ATCC 33305 / BD413 / ADP1)
   194       508 Bordetella parapertussis (strain 12822 / ATCC BAA-587 / NCTC 13253)
   195       507 Streptococcus pneumoniae (strain ATCC BAA-255 / R6)
   196       507 Buchnera aphidicola subsp. Baizongia pistaciae (strain Bp)
   197       506 Pyrococcus furiosus (strain ATCC 43587 / DSM 3638 / JCM 8422 / Vc1)
   198       506 Solanum lycopersicum (Tomato) (Lycopersicon esculentum)
   199       505 Nicotiana tabacum (Common tobacco)
   200       504 Pseudomonas entomophila (strain L48)
   201       499 Methanosarcina mazei
   202       499 Brucella abortus biovar 1 (strain 9-941)
   203       499 Haemophilus influenzae (strain 86-028NP)
   204       498 Thermosynechococcus vestitus (strain NIES-2133 / IAM M-273 / BP-1)
   205       497 Burkholderia pseudomallei (strain K96243)
   206       497 Pyrococcus horikoshii
   207       497 Proteus mirabilis (strain HI4320)
   208       496 Shouchella clausii (strain KSM-K16) (Alkalihalobacillus clausii)
   209       496 Rickettsia conorii (strain ATCC VR-613 / Malish 7)
   210       496 Synechococcus elongatus (strain ATCC 33912 / PCC 7942 / FACHB-805)
   211       494 Xanthomonas campestris pv. campestris (strain 8004)
   212       493 Halobacterium salinarum (strain ATCC 700922 / JCM 11081 / NRC-1)
   213       492 Brucella abortus (strain 2308)
   214       492 Saccharolobus solfataricus (strain ATCC 35092 / DSM 1617 / JCM 11322 / P2)
   215       492 Bacillus velezensis (strain DSM 23117 / BGSC 10A6 / LMG 26770 / FZB42)
   216       491 Vibrio campbellii (strain ATCC BAA-1116)
   217       487 Shewanella sp. (strain MR-7)
   218       486 Mannheimia succiniciproducens (strain MBEL55E)
   219       484 Staphylococcus aureus (strain Mu3 / ATCC 700698)
   220       484 Pseudomonas aeruginosa (strain LESB58)
   221       484 Shewanella sp. (strain MR-4)
   222       483 Lactiplantibacillus plantarum (strain ATCC BAA-793 / NCIMB 8826 / WCFS1)
   223       483 Mycoplasma genitalium (strain ATCC 33530 / DSM 19775 / NCTC 10195 / G37)
   224       479 Pseudomonas putida (strain ATCC 700007 / DSM 6899 / BCRC 17059 / F1)
   225       478 Pyrococcus abyssi (strain GE5 / Orsay)
   226       476 Cupriavidus necator
   227       475 Burkholderia lata
   228       475 Campylobacter jejuni subsp. jejuni serotype O:2
   229       472 Rhodopseudomonas palustris (strain ATCC BAA-98 / CGA009)
   230       470 Enterococcus faecalis (strain ATCC 700802 / V583)
   231       470 Cereibacter sphaeroides
   232       470 Clostridium perfringens (strain 13 / Type A)
   233       468 Shewanella sp. (strain ANA-3)
   234       468 Pseudomonas putida (strain GB-1)
   235       467 Aeromonas hydrophila subsp. hydrophila
   236       467 Shewanella frigidimarina (strain NCIMB 400)
   237       466 Xanthomonas euvesicatoria pv. vesicatoria (strain 85-10)
   238       465 Trichormus variabilis (strain ATCC 29413 / PCC 7937) (Anabaena variabilis)
   239       463 Burkholderia mallei (strain ATCC 23344)
   240       461 Ovis aries (Sheep)
   241       461 Cupriavidus pinatubonensis (strain JMP 134 / LMG 1197) (Cupriavidus necator
   242       460 Methylococcus capsulatus (strain ATCC 33009 / NCIMB 11132 / Bath)
   243       457 Rickettsia felis (strain ATCC VR-1525 / URRWXCal2) (Rickettsia azadi)
   244       455 Xanthomonas oryzae pv. oryzae (strain MAFF 311018)
   245       455 Shewanella baltica (strain OS185)
   246       455 Staphylococcus aureus (strain JH1)
   247       453 Pseudomonas putida (strain W619)
   248       453 Streptococcus mutans serotype c (strain ATCC 700610 / UA159)
   249       453 Mycolicibacterium paratuberculosis (strain ATCC BAA-968 / K-10)
   250       452 Aeromonas salmonicida (strain A449)

2.3 Taxonomic distribution of the sequences
Kingdom     sequences (% of the database)
Archaea         19762 (  3%)
Bacteria       336468 ( 59%)
Eukaryota      198217 ( 35%)
Viruses         17417 (  3%)

Within Eukaryota:
Category          sequences (% of Eukaryota) (% of the complete database)
Human                 20436 ( 10%)           (  4%)
Other Mammalia        47449 ( 24%)           (  8%)
Other Vertebrata      18986 ( 10%)           (  3%)
Viridiplantae         41755 ( 21%)           (  7%)
Fungi                 37048 ( 19%)           (  6%)
Insecta                9871 (  5%)           (  2%)
Nematoda               5402 (  3%)           (  1%)
Other                 17270 (  9%)           (  3%)

3. SEQUENCE SIZE

Repartition of the sequences by size (excluding fragments)

     From   To  Number             From   To  Number
        1-  50    9979             1001-1100    4135
       51- 100   43606             1101-1200    2907
      101- 150   59905             1201-1300    2218
      151- 200   59686             1301-1400    2084
      201- 250   58587             1401-1500    1688
      251- 300   52542             1501-1600     840
      301- 350   53005             1601-1700     648
      351- 400   46049             1701-1800     591
      401- 450   37770             1801-1900     513
      451- 500   30654             1901-2000     400
      501- 550   22375             2001-2100     276
      551- 600   15877             2101-2200     388
      601- 650   13197             2201-2300     341
      651- 700    9431             2301-2400     238
      701- 750    7898             2401-2500     197
      751- 800    5710                 >2500    1476
      801- 850    4900
      851- 900    5325
      901- 950    4117
      951-1000    3018
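
As a consistency check, the bin counts in the size table above should add up to the number of entries that are not fragments, that is 571864 entries minus 9293 fragments (both figures from the introduction). The Python sketch below transcribes the bin counts by hand and performs that sum. The names `SIZE_BINS`, `non_fragment_total` and `share_up_to` are illustrative only and do not belong to any UniProt tool.

```python
# Bin counts transcribed by hand from the size table (fragments excluded).
# Each key is the upper bound of a bin; None stands for the open ">2500" bin.
SIZE_BINS = {
    50: 9979, 100: 43606, 150: 59905, 200: 59686, 250: 58587,
    300: 52542, 350: 53005, 400: 46049, 450: 37770, 500: 30654,
    550: 22375, 600: 15877, 650: 13197, 700: 9431, 750: 7898,
    800: 5710, 850: 4900, 900: 5325, 950: 4117, 1000: 3018,
    1100: 4135, 1200: 2907, 1300: 2218, 1400: 2084, 1500: 1688,
    1600: 840, 1700: 648, 1800: 591, 1900: 513, 2000: 400,
    2100: 276, 2200: 388, 2300: 341, 2400: 238, 2500: 197,
    None: 1476,
}

TOTAL_ENTRIES = 571864  # from the introduction
FRAGMENTS = 9293        # from the introduction


def non_fragment_total(bins):
    """Sum all bin counts, i.e. the number of non-fragment sequences."""
    return sum(bins.values())


def share_up_to(bins, limit):
    """Fraction of non-fragment sequences whose length is <= limit."""
    covered = sum(n for upper, n in bins.items()
                  if upper is not None and upper <= limit)
    return covered / non_fragment_total(bins)


total = non_fragment_total(SIZE_BINS)
expected = TOTAL_ENTRIES - FRAGMENTS
print(total, expected, total == expected)  # 562571 562571 True
print(f"{share_up_to(SIZE_BINS, 500):.1%} of sequences have at most 500 residues")
```

Keying the bins by their upper bound keeps the cumulative query a one-line filter; the open-ended last bin is kept separate so that it never counts toward a finite limit.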
The average sequence length in UniProtKB/Swiss-Prot is 362 amino acids.
The shortest sequence is GWA_SEPOF (P83570): 2 amino acids.
The longest sequence is TITIN_MOUSE (A2ASS6): 35213 amino acids.

4. JOURNAL CITATIONS

Note: the following citation statistics reflect the number of distinct journal
citations.

Total number of journals cited in this release of UniProtKB/Swiss-Prot: 3166

4.1 Table of the frequency of journal citations

Journals cited    1x:  1005
                  2x:   421
                  3x:   221
                  4x:   152
                  5x:   127
                  6x:    88
                  7x:    62
                  8x:    79
                  9x:    49
                 10x:    42
             11- 20x:   245
             21- 50x:   270
             51-100x:   144
               >100x:   261

4.2 List of the most cited journals in UniProtKB/Swiss-Prot

 Nb Citations Journal name
--- --------- -------------------------------------------------------------
  1     27355 Journal of Biological Chemistry
  2     12803 Proceedings of the National Academy of Sciences of the U.S.A.
  3      7235 Journal of Bacteriology
  4      6090 Biochemical and Biophysical Research Communications
  5      5905 Biochemistry
  6      5372 Nucleic Acids Research
  7      5216 Nature
  8      5113 FEBS Letters
  9      5020 The EMBO Journal
 10      4895 Gene
 11      4615 Journal of Molecular Biology
 12      4591 Molecular and Cellular Biology
 13      4057 Biochimica et Biophysica Acta
 14      3913 Cell
 15      3634 Journal of Virology
 16      3518 European Journal of Biochemistry
 17      3409 Science
 18      3195 Biochemical Journal
 19      2881 Molecular Microbiology
 20      2821 Plant Physiology
 21      2660 PLoS ONE
 22      2548 Genomics
 23      2451 The American Journal of Human Genetics
 24      2376 Journal of Cell Biology
 25      2203 The Plant Cell
 26      2041 The Plant Journal
 27      2040 Human Molecular Genetics
 28      1962 Genes and Development
 29      1927 Plant Molecular Biology
 30      1919 Virology
 31      1879 Molecular Cell
 32      1858 Nature Genetics
 33      1842 Molecular Biology of the Cell
 34      1833 Development
 35      1712 Journal of Immunology
 36      1662 Human Mutation
 37      1570 Oncogene
 38      1484 Structure
 39      1434 Molecular and General Genetics
 40      1429 Journal of Biochemistry
 41      1426 Genetics
 42      1400 Journal of Cell Science
 43      1358 Nature Communications
 44      1295 Blood
 45      1280 Infection and Immunity
 46      1193 Journal of General Virology
 47      1189 Developmental Biology
 48      1185 Microbiology
 49      1161 Archives of Biochemistry and Biophysics
 50      1160 Current Biology
 51      1039 Journal of Neuroscience
 52      1034 Applied and Environmental Microbiology
 53       997 Acta Crystallographica, Section D
 54       941 Scientific Reports
 55       928 Cancer Research
 56       925 PLoS Genetics
 57       920 FEMS Microbiology Letters
 58       893 American Journal of Physiology
 59       891 Toxicon
 60       876 Protein Science
 61       863 Journal of Clinical Investigation
 62       854 Yeast
 63       837 Neuron
 64       776 Plant and Cell Physiology
 65       769 The Journal of Experimental Medicine
 66       762 Human Genetics
 67       717 Journal of Medical Genetics
 68       703 PLoS Pathogens
 69       702 The FEBS Journal
 70       700 Proteins
 71       697 Nature Structural and Molecular Biology
 72       678 Mechanisms of Development
 73       656 Nature Cell Biology
 74       654 Nature Structural Biology
 75       634 Bioscience, Biotechnology, and Biochemistry
 76       600 Developmental Cell
 77       595 Current Genetics
 78       581 Antimicrobial Agents and Chemotherapy
 79       576 Journal of Neurochemistry
 80       556 Molecular Endocrinology
 81       552 The Journal of Clinical Endocrinology and Metabolism
 82       540 Endocrinology
 83       531 Cell Reports
 84       524 Molecular and Biochemical Parasitology
 85       524 Journal of the American Chemical Society
 86       496 Mammalian Genome
 87       493 Experimental Cell Research
 88       492 Eukaryotic Cell
 89       483 RNA
 90       477 Peptides
 91       473 Journal of Experimental Botany
 92       467 EMBO Reports
 93       465 The FASEB Journal
 94       460 American Journal of Medical Genetics. Part A
 95       457 Planta
 96       437 Molecular Pharmacology
 97       434 Immunogenetics
 98       432 Acta Crystallographica, Section F
 99       426 European Journal of Human Genetics
100       422 Molecular Biology and Evolution
101       417 Molecular Plant-Microbe Interactions
102       417 Clinical Genetics
103       417 Immunity
104       407 Journal of Investigative Dermatology
105       407 Journal of Molecular Evolution
106       396 Neurology
107       396 DNA and Cell Biology
108       394
109       388 Biochimie
110       381 Biology of Reproduction
111       381 DNA Sequence
112       374 Comparative Biochemistry and Physiology
113       362 Virus Research
114       358 Genes to Cells
115       351 Nature Immunology
116       348 Journal of Lipid Research
117       344 Developmental Dynamics
118       342 PLoS Biology
119       342 The New England Journal of Medicine
120       342 Brain Research. Molecular Brain Research
121       339 Applied Microbiology and Biotechnology
122       336 Annals of Neurology
123       332 Journal of Medicinal Chemistry
124       330 BMC Genomics
125       322 European Journal of Immunology
126       316 Genome Research
127       308 Investigative Ophthalmology and Visual Science
128       302 Journal of Human Genetics
129       299 Biological Chemistry Hoppe-Seyler
130       287 Glycobiology
131       283 Nature Chemical Biology
132       282 Journal of General Microbiology
133       281 Cytogenetics and Cell Genetics
134       281 Archives of Microbiology
135       277 Brain
136       263 Traffic
137       261 Phytochemistry
138       259 Molecular Genetics and Metabolism
139       259 Nature Medicine
140       259 Protein Expression and Purification
141       257 Molecular Immunology
142       254 Fungal Genetics and Biology
143       250 Journal of Cellular Biochemistry
144       245 Cell Cycle
145       240 Circulation Research
146       240 Cell Research
147       235 Diabetes
148       234 DNA Research
149       229 New Phytologist
150       228 Archives of Virology

5. STATISTICS FOR SOME LINE TYPES

The following table summarizes the total number of some UniProtKB/Swiss-Prot
lines, as well as the number of entries with at least one such line, and the
frequency of the lines.
                                        Total Number of   Average
Line type / subtype                    number   entries per entry Rank
------------------------------------ -------- --------- --------- ----
References (RL)                       1315897                2.30
  Journal                             1143852    476151      2.00    1
  Submitted to EMBL/GenBank/DDBJ       160501    144615      0.28    2
  Submitted to other databases           7822      7149      0.01    3
  Book citation                          1876      1853     <0.01    4
  Plant Gene Register                     613       600     <0.01    5
  Unpublished observations                536       532     <0.01    6
  Thesis                                  477       474     <0.01    7
  Patent                                  214       207     <0.01    8
  Worm Breeder's Gazette                    6         6     <0.01    9

Total number of distinct authors cited in UniProtKB/Swiss-Prot: 473538

                                        Total Number of   Average
Line type / subtype                    number   entries per entry Rank
------------------------------------ -------- --------- --------- ----
Comments (CC)                         2754712                4.82
  ACTIVITY REGULATION                   18191     18068      0.03   17
  ALLERGEN                                952       952     <0.01   26
  ALTERNATIVE PRODUCTS                  25935     25935      0.05   14
  BIOPHYSICOCHEMICAL PROPERTIES         11699     11649      0.02   20
  BIOTECHNOLOGY                          2083      2023     <0.01   24
  CATALYTIC ACTIVITY                   344255    255789      0.60    4
  CAUTION                               14444     14144      0.03   19
  COFACTOR                             133722    121350      0.23    7
  DEVELOPMENTAL STAGE                   14510     14405      0.03   18
  DISEASE                                8462      5691      0.01   21
  DISRUPTION PHENOTYPE                  21431     21385      0.04   16
  DOMAIN                                59697     50693      0.10    9
  FUNCTION                             493214    468538      0.86    2
  INDUCTION                             25984     25885      0.05   13
  INTERACTION                           24259     24259      0.04   15
  MASS SPECTROMETRY                      7575      5854      0.01   22
  MISCELLANEOUS                         46378     40777      0.08   11
  PATHWAY                              143969    129969      0.25    6
  PHARMACEUTICAL                          171       164     <0.01   29
  POLYMORPHISM                           1508      1380     <0.01   25
  PTM                                   65662     46564      0.11    8
  RNA EDITING                             637       637     <0.01   28
  SEQUENCE CAUTION                      45273     45202      0.08   12
  SIMILARITY                           520288    515945      0.91    1
  SUBCELLULAR LOCATION                 366264    357589      0.64    3
  SUBUNIT                              299351    293787      0.52    5
  TISSUE SPECIFICITY                    51398     50741      0.09   10
  TOXIC DOSE                              869       692     <0.01   27
  WEB RESOURCE                           6531      5543      0.01   23

Total number of comment topics: 29

                                        Total Number of   Average
Line type / subtype                    number   entries per entry Rank
------------------------------------ -------- --------- --------- ----
Features (FT)                         5402916                9.45
  ACT_SITE                             176958    105539      0.31    9
  BINDING                             1235866    218845      2.16    1
  CARBOHYD                             124560     31717      0.22   14
  CHAIN                                580332    564207      1.01    2
  COILED                                22592     15616      0.04   25
  COMPBIAS                             175436     74490      0.31   10
  CONFLICT                             139338     48561      0.24   12
  CROSSLNK                              25346      9088      0.04   24
  DISULFID                             136729     36492      0.24   13
  DNA_BIND                              12207     10929      0.02   31
  DOMAIN                               217541    133286      0.38    8
  HELIX                                347050     29984      0.61    5
  INIT_MET                              17604     17555      0.03   26
  INTRAMEM                               3164      1436      0.01   34
  LIPID                                 13917      8921      0.02   28
  MOD_RES                              263356     74796      0.46    7
  MOTIF                                 48245     31398      0.08   21
  MUTAGEN                               99853     20351      0.17   17
  NON_CONS                               2662       833     <0.01   35
  NON_STD                                 358       283     <0.01   36
  NON_TER                               12617      9698      0.02   30
  PEPTIDE                               12635      8743      0.02   29
  PROPEP                                15457     13204      0.03   27
  REGION                               323103    150209      0.56    6
  REPEAT                               109509     15209      0.19   15
  SIGNAL                                44570     44569      0.08   22
  SITE                                  65497     35497      0.11   19
  STRAND                               353354     28228      0.62    4
  TOPO_DOM                             152409     30725      0.27   11
  TRANSIT                                9593      9473      0.02   32
  TRANSMEM                             382671     80157      0.67    3
  TURN                                  83811     24449      0.15   18
  UNSURE                                 5758       898      0.01   33
  VAR_SEQ                               53322     22696      0.09   20
  VARIANT                              104662     17551      0.18   16
  ZN_FING                               30834     13166      0.05   23

Total number of feature keys: 36

                                        Total Number of   Average
Line type / subtype                    number   entries per entry Rank Category
------------------------------------ -------- --------- --------- ---- -------------------------------------------
Cross-references (DR)                20562820               35.96
  ABCD                                   3130      3130      0.01  121 Protocols and materials databases
  AGR                                   69134     68437      0.12   41 Organism-specific databases
  Allergome                              2043      1313     <0.01  129 Protein family/group databases
  AlphaFoldDB                          547530    547530      0.96    9 3D structure databases
  Antibodypedia                         32302     32193      0.06   60 Protocols and materials databases
  ArachnoServer                          1148      1138     <0.01  138 Organism-specific databases
  Araport                               16409     16313      0.03   90 Organism-specific databases
  Bgee                                  61715     61711      0.11   42 Gene expression databases
  BindingDB                              6662      6662      0.01  106 Chemistry databases
  BioCyc                                48140     44093      0.08   52 Enzyme and pathway databases
  BioGRID                               61627     59675      0.11   43 Protein-protein interaction databases
  BioGRID-ORCS                          45024     44439      0.08   54 Miscellaneous databases
  BioMuta                               20308     20282      0.04   74 Genetic variation databases
  BMRB                                   6911      6911      0.01  104 3D structure databases
  BRENDA                                20407     18595      0.04   70 Enzyme and pathway databases
  CarbonylDB                             1159      1159     <0.01  137 PTM databases
  CAZy                                   9662      8697      0.02   97 Protein family/group databases
  CCDS                                  49671     34784      0.09   50 Sequence databases
  CDD                                  383431    301400      0.67   15 Family and domain databases
  CGD                                    2105      2088     <0.01  128 Organism-specific databases
  ChEMBL                                 9024      8836      0.02   98 Chemistry databases
  ChiTaRS                               29779     29734      0.05   62 Miscellaneous databases
  CollecTF                                137       137     <0.01  156 Gene expression databases
  ComplexPortal                         16289      8392      0.03   92 Protein-protein interaction databases
  ConoServer                              967       879     <0.01  140 Organism-specific databases
  CORUM                                  5812      5812      0.01  107 Protein-protein interaction databases
  CPTAC                                  3472      1929      0.01  116 Proteomic databases
  CPTC                                    390       390     <0.01  149 Protocols and materials databases
  CTD                                   75010     74139      0.13   39 Organism-specific databases
  DEPOD                                   254       254     <0.01  155 PTM databases
  dictyBase                              4225      4111      0.01  113 Organism-specific databases
  DIP                                   17563     17522      0.03   87 Protein-protein interaction databases
  DisGeNET                              17610     17412      0.03   86 Organism-specific databases
  DisProt                                1787      1781     <0.01  132 Family and domain databases
  DMDM                                  16171     16170      0.03   93 Genetic variation databases
  DNASU                                 48420     48342      0.08   51 Protocols and materials databases
  DrugBank                              31669      4787      0.06   61 Chemistry databases
  DrugCentral                            2982      2982      0.01  123 Chemistry databases
  EchoBASE                               4158      4158      0.01  114 Organism-specific databases
  eggNOG                               339636    333776      0.59   16 Phylogenomic databases
  ELM                                    1814      1814     <0.01  131 Protein-protein interaction databases
  EMBL                                1006946    559041      1.76    4 Sequence databases
  EMDB                                  82808      9009      0.14   38 3D structure databases
  Ensembl                              113604     49234      0.20   33 Genome annotation databases
  EnsemblBacteria                       55481     55303      0.10   47 Genome annotation databases
  EnsemblFungi                          23258     22809      0.04   66 Genome annotation databases
  EnsemblMetazoa                        19209     11666      0.03   80 Genome annotation databases
  EnsemblPlants                         43808     22497      0.08   56 Genome annotation databases
  EnsemblProtists                        5406      5151      0.01  110 Genome annotation databases
  ESTHER                                 3011      3008      0.01  122 Protein family/group databases
  euHCVdb                                  55        44     <0.01  159 Organism-specific databases
  EvolutionaryTrace                     22640     22640      0.04   67 Miscellaneous databases
  ExpressionAtlas                       53160     53160      0.09   48 Gene expression databases
  FlyBase                                3916      3807      0.01  115 Organism-specific databases
  Gene3D                               740362    459583      1.29    6 Family and domain databases
  GeneCards                             20382     20251      0.04   71 Organism-specific databases
  GeneID                               294063    284289      0.51   22 Genome annotation databases
  GeneReviews                            1605      1601     <0.01  133 Organism-specific databases
  GeneTree                              56294     56284      0.10   46 Phylogenomic databases
  GeneWiki                              10351     10269      0.02   96 Miscellaneous databases
  GenomeRNAi                            22322     22322      0.04   68 Miscellaneous databases
  GlyConnect                             2372      2215     <0.01  125 PTM databases
  GlyCosmos                             28906     28906      0.05   63 PTM databases
  GlyGen                                22246     22246      0.04   69 PTM databases
  GO                                  3332967    551127      5.83    1 Ontologies
  Gramene                               43808     22497      0.08   55 Genome annotation databases
  GuidetoPHARMACOLOGY                    2252      2252     <0.01  127 Chemistry databases
  HAMAP                                330940    328004      0.58   18 Family and domain databases
  HGNC                                  20379     20250      0.04   72 Organism-specific databases
  HOGENOM                              427571    427571      0.75   14 Phylogenomic databases
  HPA                                   19354     19215      0.03   79 Organism-specific databases
  IDEAL                                  1100      1100     <0.01  139 Family and domain databases
  IMGT_GENE-DB                            267       267     <0.01  154 Protein family/group databases
  InParanoid                           164035    164035      0.29   25 Phylogenomic databases
  IntAct                                57536     57536      0.10   44 Protein-protein interaction databases
  InterPro                            2545164    553565      4.45    2 Family and domain databases
  iPTMnet                               56774     56774      0.10   45 PTM databases
  JaponicusDB                              43        43     <0.01  161 Organism-specific databases
  jPOST                                 26413     26413      0.05   64 Proteomic databases
  KEGG                                 503584    477920      0.88   12 Genome annotation databases
  LegioList                               765       763     <0.01  144 Organism-specific databases
  Leproma                                 672       669     <0.01  145 Organism-specific databases
  MaizeGDB                                529       525     <0.01  147 Organism-specific databases
  MalaCards                              5693      5684      0.01  109 Organism-specific databases
  MANE-Select                           18512     18399      0.03   83 Genome annotation databases
  MassIVE                               19142     19142      0.03   81 Proteomic databases
  MEROPS                                14220     13801      0.02   94 Protein family/group databases
  MetOSite                               3455      3455      0.01  117 PTM databases
  MGI                                   17129     17088      0.03   89 Organism-specific databases
  MIM                                   23558     16199      0.04   65 Organism-specific databases
  MINT                                  17289     17289      0.03   88 Protein-protein interaction databases
  MoonDB                                  348       348     <0.01  153 Protein family/group databases
  MoonProt                                368       368     <0.01  151 Protein family/group databases
  NCBIfam                              301467    278327      0.53   21 Family and domain databases
  neXtProt                              20321     20321      0.04   73 Organism-specific databases
  NIAGADS                                  69        69     <0.01  158 Organism-specific databases
  OGP                                     373       373     <0.01  150 2D gel databases
  OMA                                  119938    119938      0.21   31 Phylogenomic databases
  OpenTargets                           18558     18412      0.03   82 Organism-specific databases
  Orphanet                               8227      4440      0.01  100 Organism-specific databases
  OrthoDB                              275766    275766      0.48   23 Phylogenomic databases
  PANTHER                             1008106    504468      1.76    3 Family and domain databases
  PathwayCommons                        19451     19451      0.03   78 Enzyme and pathway databases
  PATRIC                                93068     93068      0.16   36 Genome annotation databases
  PaxDb                                153705    153705      0.27   26 Proteomic databases
  PCDDB                                   134       134     <0.01  157 3D structure databases
  PDB                                  309384     35919      0.54   19 3D structure databases
  PDBsum                               309384     35919      0.54   20 3D structure databases
  PeptideAtlas                          39650     39650      0.07   59 Proteomic databases
  PeroxiBase                              792       771     <0.01  143 Protein family/group databases
  Pfam                                 840801    541531      1.47    5 Family and domain databases
  PharmGKB                              18032     18013      0.03   85 Organism-specific databases
  Pharos                                20221     20221      0.04   76 Miscellaneous databases
  PHI-base                               2385      1874     <0.01  124 Miscellaneous databases
  PhosphoSitePlus                       42170     42170      0.07   58 PTM databases
  PhylomeDB                            115641    115641      0.20   32 Phylogenomic databases
  PIR                                  125168    114834      0.22   30 Sequence databases
  PIRSF                                111031    109862      0.19   34 Family and domain databases
  PlantReactome                          1320       771     <0.01  135 Enzyme and pathway databases
  PomBase                                5129      5125      0.01  111 Organism-specific databases
  PRIDE                                   637       637     <0.01  146 Proteomic databases
  PRINTS                               150966    129640      0.26   27 Family and domain databases
  PRO                                   98651     98651      0.17   35 Miscellaneous databases
  ProMEX                                  489       489     <0.01  148 Proteomic databases
  PROSITE                              493123    311610      0.86   13 Family and domain databases
  Proteomes                            508530    463467      0.89   11 Miscellaneous databases
  ProteomicsDB                          72737     45392      0.13   40 Proteomic databases
  PseudoCAP                              2036      2036     <0.01  130 Organism-specific databases
  Pumba                                 18207     18207      0.03   84 Proteomic databases
  Reactome                             144979     38700      0.25   28 Enzyme and pathway databases
  REBASE                                  799       395     <0.01  142 Protein family/group databases
  RefSeq                               597630    452186      1.05    8 Sequence databases
  REPRODUCTION-2DPAGE                    1260      1039     <0.01  136 2D gel databases
  RGD                                    8135      8134      0.01  101 Organism-specific databases
  RNAct                                 43112     43112      0.08   57 Miscellaneous databases
  SABIO-RK                               5759      5759      0.01  108 Enzyme and pathway databases
  SASBDB                                  900       900     <0.01  141 3D structure databases
  SFLD                                  20299      9062      0.04   75 Family and domain databases
  SGD                                    6746      6741      0.01  105 Organism-specific databases
  SignaLink                             19957     19957      0.03   77 Enzyme and pathway databases
  SIGNOR                                 7608      7608      0.01  102 Enzyme and pathway databases
  SMART                                205923    148596      0.36   24 Family and domain databases
  SMR                                  519576    519576      0.91   10 3D structure databases
  STRING                               336118    336118      0.59   17 Protein-protein interaction databases
  SUPFAM                               649036    460051      1.13    7 Family and domain databases
  SwissLipids                            1478      1394     <0.01  134 Chemistry databases
  SwissPalm                             13358     13358      0.02   95 PTM databases
  TAIR                                  16399     16313      0.03   91 Organism-specific databases
  TCDB                                   8592      8505      0.02   99 Protein family/group databases
  TopDownProteomics                      3236      2957      0.01  120 Proteomic databases
  TreeFam                               46288     46265      0.08   53 Phylogenomic databases
  TubercuList                            2330      2294     <0.01  126 Organism-specific databases
  UCSC                                  50947     46480      0.09   49 Genome annotation databases
  UniLectin                               366       366     <0.01  152 Protein family/group databases
  UniPathway                           139870    126219      0.24   29 Enzyme and pathway databases
  VEuPathDB                             86199     79389      0.15   37 Organism-specific databases
  VGNC                                   3443      3440      0.01  118 Organism-specific databases
  WBParaSite                               52        50     <0.01  160 Genome annotation databases
  WormBase                               6977      5088      0.01  103 Organism-specific databases
  Xenbase                                4750      4750      0.01  112 Organism-specific databases
  ZFIN                                   3272      3271      0.01  119 Organism-specific databases

Total number of cross-referenced databases: 161

6. AMINO ACID COMPOSITION

6.1 Composition in percent for the complete database

Ala (A) 8.25   Gln (Q) 3.93   Leu (L) 9.64   Ser (S) 6.65
Arg (R) 5.52   Glu (E) 6.71   Lys (K) 5.80   Thr (T) 5.36
Asn (N) 4.06   Gly (G) 7.07   Met (M) 2.41   Trp (W) 1.10
Asp (D) 5.46   His (H) 2.27   Phe (F) 3.86   Tyr (Y) 2.92
Cys (C) 1.38   Ile (I) 5.91   Pro (P) 4.74   Val (V) 6.85

Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.00
Legend: gray = aliphatic, red = acidic, green = small hydroxy, blue = basic,
black = aromatic, white = amide, yellow = sulfur

6.2 Classification of the amino acids by their frequency

Leu, Ala, Gly, Val, Glu, Ser, Ile, Lys, Arg, Asp, Thr, Pro, Asn, Gln, Phe, Tyr,
Met, His, Cys, Trp

7. MISCELLANEOUS STATISTICS

4467 entries are encoded on a mitochondrion, and 4025 are encoded on a plasmid.

12200 entries are encoded on a plastid, of which 22 are encoded on apicoplasts,
11634 on chloroplasts, 51 on organellar chromatophores, 145 on cyanelles, 149
on non-photosynthetic plastids and 199 on unspecified types of plastid.

Number of entries with at least one sequence correction: 81280
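
The ordering given in section 6.2 follows directly from the percentages in section 6.1. The Python sketch below transcribes the 20 standard amino-acid percentages by hand and sorts them in decreasing order. The names `COMPOSITION` and `rank_by_frequency` are illustrative only, and the ambiguity codes (Asx, Glx, Xaa) are left out because the table reports them as zero.

```python
# Percent composition of the 20 standard amino acids, transcribed by hand
# from section 6.1 (ambiguity codes B, Z and X omitted: reported as zero).
COMPOSITION = {
    "Ala": 8.25, "Gln": 3.93, "Leu": 9.64, "Ser": 6.65,
    "Arg": 5.52, "Glu": 6.71, "Lys": 5.80, "Thr": 5.36,
    "Asn": 4.06, "Gly": 7.07, "Met": 2.41, "Trp": 1.10,
    "Asp": 5.46, "His": 2.27, "Phe": 3.86, "Tyr": 2.92,
    "Cys": 1.38, "Ile": 5.91, "Pro": 4.74, "Val": 6.85,
}


def rank_by_frequency(composition):
    """Return residue names sorted from most to least frequent."""
    return sorted(composition, key=composition.get, reverse=True)


ranking = rank_by_frequency(COMPOSITION)
print(", ".join(ranking))

# The rounded percentages need not add up to exactly 100.
print(f"sum of listed percentages: {sum(COMPOSITION.values()):.2f}")
```

All 20 values in the table are distinct, so the sort is unambiguous and no tie-breaking rule is needed.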