UniProtKB/Swiss-Prot protein knowledgebase release 2024_06 statistics


1.  INTRODUCTION

   Release 2024_06 of 27-Nov-2024 of UniProtKB/Swiss-Prot contains 572619
   sequence entries, curated from 303020 unique references and comprising
   207431389 amino acids.

   435 sequences have been added since release 2024_05, the sequence data of
   42 existing entries has been updated and the annotations of 440420 entries
   have been revised.

   Number of fragments: 9292
   Number of additional sequences produced by alternative splicing,
   initiation or promoter usage, or ribosomal frameshifting: 41218

   Protein existence (PE):            entries      %
   1: Evidence at protein level        117598  20.5%
   2: Evidence at transcript level      54363   9.5%
   3: Inferred from homology           385971  67.4%
   4: Predicted                         12873   2.2%
   5: Uncertain                          1814   0.3%

   The growth of the database is summarized below.


2.  TAXONOMIC ORIGIN

   Total number of species represented in this release of UniProtKB/Swiss-Prot:
   14724

   The first twenty species represent 123153 sequences: 21.5 % of the total
   number of entries.

   2.1  Table of the frequency of occurrence of species

        Species represented 1x: 6006
                            2x: 2132
                            3x: 1152
                            4x:  783
                            5x:  542
                            6x:  448
                            7x:  326
                            8x:  281
                            9x:  241
                           10x:  163
                       11- 20x:  846
                       21- 50x:  515
                       51-100x:  229
                         >100x: 1060

   2.2  Table of the most represented species

------ ---------  --------------------------------------------
Number Frequency  Species
------ ---------  --------------------------------------------
     1     20421  Homo sapiens (Human)
     2     17229  Mus musculus (Mouse)
     3     16394  Arabidopsis thaliana (Mouse-ear cress)
     4      8207  Rattus norvegicus (Rat)
     5      6727  Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (Baker's yeast)
     6      6048  Bos taurus (Bovine)
     7      5121  Schizosaccharomyces pombe (strain 972 / ATCC 24843) (Fission yeast)
     8      4530  Escherichia coli (strain K12)
     9      4487  Caenorhabditis elegans
    10      4191  Bacillus subtilis (strain 168)
    11      4190  Oryza sativa subsp. japonica (Rice)
    12      4160  Dictyostelium discoideum (Social amoeba)
    13      3803  Drosophila melanogaster (Fruit fly)
    14      3507  Xenopus laevis (African clawed frog)
    15      3348  Danio rerio (Zebrafish) (Brachydanio rerio)
    16      2318  Mycobacterium tuberculosis (strain ATCC 25618 / H37Rv)
    17      2309  Gallus gallus (Chicken)
    18      2218  Pongo abelii (Sumatran orangutan) (Pongo pygmaeus abelii)
    19      2046  Escherichia coli O157:H7
    20      1899  Mycobacterium tuberculosis (strain CDC 1551 / Oshkosh)
    21      1828  Salmonella typhimurium (strain LT2 / SGSC1412 / ATCC 700720)
    22      1787  Methanocaldococcus jannaschii
    23      1711  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
    24      1704  Haemophilus influenzae (strain ATCC 51907 / DSM 11121 / KW20 / Rd)
    25      1702  Escherichia coli O6:H1 (strain CFT073 / ATCC 700928 / UPEC)
    26      1696  Shigella flexneri
    27      1477  Pseudomonas aeruginosa
    28      1459  Sus scrofa (Pig)
    29      1349  Salmonella typhi
    30      1244  Mycobacterium bovis (strain ATCC BAA-935 / AF2122/97)
    31      1176  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)
    32      1145  Candida albicans (strain SC5314 / ATCC MYA-2876) (Yeast)
    33      1103  Synechocystis sp. (strain ATCC 27184 / PCC 6803 / Kazusa)
    34      1038  Archaeoglobus fulgidus
    35      1030  Yersinia pestis
    36      1017  Emericella nidulans
    37       997  Vibrio cholerae serotype O1 (strain ATCC 39315 / El Tor Inaba N16961)
    38       978  Oryctolagus cuniculus (Rabbit)
    39       969  Neurospora crassa
    40       942  Staphylococcus aureus (strain Mu50 / ATCC 700699)
    41       935  Aspergillus fumigatus (strain ATCC MYA-4609 / CBS 101355 / FGSC A1100 / Af293)
    42       930  Salmonella paratyphi A (strain ATCC 9150 / SARB42)
    43       929  Staphylococcus aureus (strain N315)
    44       928  Eremothecium gossypii
    45       919  Kluyveromyces lactis
    46       909  Acanthamoeba polyphaga mimivirus (APMV)
    47       905  Staphylococcus aureus (strain COL)
    48       896  Staphylococcus aureus (strain MW2)
    49       894  Escherichia coli O6:K15:H31 (strain 536 / UPEC)
    50       890  Staphylococcus aureus (strain MSSA476)
    51       888  Rhizobium meliloti (strain 1021) (Ensifer meliloti) (Sinorhizobium meliloti)
    52       888  Candida glabrata
    53       888  Staphylococcus aureus (strain MRSA252)
    54       882  Salmonella choleraesuis (strain SC-B67)
    55       879  Shigella sonnei (strain Ss046)
    56       873  Oryza sativa subsp. indica (Rice)
    57       863  Yersinia pseudotuberculosis serotype I (strain IP32953)
    58       854  Canis lupus familiaris (Dog) (Canis familiaris)
    59       850  Zea mays (Maize)
    60       847  Escherichia coli O9:H4 (strain HS)
    61       838  Escherichia coli O139:H28 (strain E24377A / ETEC)
    62       829  Shigella boydii serotype 4 (strain Sb227)
    63       825  Escherichia coli (strain UTI89 / UPEC)
    64       822  Escherichia coli
    65       822  Shigella dysenteriae serotype 1 (strain Sd197)
    66       818  Streptomyces coelicolor (strain ATCC BAA-471 / A3(2) / M145)
    67       811  Staphylococcus aureus (strain NCTC 8325 / PS 47)
    68       804  Pectobacterium atrosepticum (strain SCRI 1043 / ATCC BAA-672)
    69       796  Vibrio parahaemolyticus serotype O3:K6 (strain RIMD 2210633)
    70       791  Escherichia coli (strain SMS-3-5 / SECEC)
    71       788  Aquifex aeolicus (strain VF5)
    72       779  Escherichia coli O127:H6 (strain E2348/69 / EPEC)
    73       771  Escherichia coli (strain K12 / DH10B)
    74       770  Pasteurella multocida (strain Pm70)
    75       767  Escherichia coli O17:K52:H18 (strain UMN026 / ExPEC)
    76       765  Escherichia coli (strain K12 / MC4100 / BW2952)
    77       762  Escherichia coli (strain 55989 / EAEC)
    78       761  Escherichia coli O8 (strain IAI1)
    79       760  Staphylococcus epidermidis (strain ATCC 12228 / FDA PCI 1200)
    80       760  Shigella flexneri serotype 5b (strain 8401)
    81       760  Staphylococcus epidermidis
    82       759  Escherichia coli O45:K1 (strain S88 / ExPEC)
    83       758  Bacillus anthracis
    84       756  Escherichia coli (strain SE11)
    85       753  Escherichia coli O7:K1 (strain IAI39 / ExPEC)
    86       749  Photorhabdus laumondii subsp. laumondii (strain DSM 15139 / CIP 105565 / TT01)
    87       748  Escherichia coli O157:H7 (strain EC4115 / EHEC)
    88       744  Halalkalibacterium halodurans
    89       739  Yersinia enterocolitica serotype O:8 / biotype 1B (strain NCTC 13174 / 8081)
    90       737  Pseudomonas putida
    91       733  Vibrio vulnificus (strain CMCP6)
    92       731  Escherichia coli O81 (strain ED1a)
    93       724  Escherichia coli
    94       722  Salmonella enteritidis PT4 (strain P125109)
    95       718  Vibrio vulnificus (strain YJ016)
    96       716  Salmonella paratyphi B (strain ATCC BAA-1250 / SPB7)
    97       715  Escherichia coli O1:K1 / APEC
    98       715  Yersinia pestis bv. Antiqua (strain Nepal516)
    99       715  Enterobacter sp. (strain 638)
   100       715  Klebsiella pneumoniae subsp. pneumoniae (strain ATCC 700721 / MGH 78578)
   101       714  Salmonella paratyphi A (strain AKU_12601)
   102       713  Salmonella newport (strain SL254)
   103       713  Yersinia pseudotuberculosis serotype O:1b (strain IP 31758)
   104       713  Salmonella agona (strain SL483)
   105       712  Salmonella schwarzengrund (strain CVM19633)
   106       711  Yersinia pestis bv. Antiqua (strain Antiqua)
   107       710  Salmonella heidelberg (strain SL476)
   108       708  Nostoc sp. (strain PCC 7120 / SAG 25.82 / UTEX 2576)
   109       702  Salmonella dublin (strain CT_02021853)
   110       699  Klebsiella pneumoniae (strain 342)
   111       698  Shigella boydii serotype 18 (strain CDC 3083-94 / BS512)
   112       695  Escherichia fergusonii
   113       692  Pan troglodytes (Chimpanzee)
   114       686  Mycoplasma pneumoniae (strain ATCC 29342 / M129 / Subtype 1)
   115       684  Salmonella gallinarum (strain 287/91 / NCTC 13346)
   116       683  Pseudomonas syringae pv. tomato (strain ATCC BAA-871 / DC3000)
   117       679  Staphylococcus aureus (strain USA300)
   118       679  Citrobacter koseri (strain ATCC BAA-895 / CDC 4225-83 / SGSC4696)
   119       672  Serratia proteamaculans (strain 568)
   120       670  Bacillus cereus
   121       669  Agrobacterium fabrum (strain C58 / ATCC 33970) (Agrobacterium tumefaciens
   122       669  Mycobacterium leprae (strain TN)
   123       667  Bradyrhizobium diazoefficiens
   124       667  Yarrowia lipolytica (strain CLIB 122 / E 150) (Yeast) (Candida lipolytica)
   125       667  Yersinia pestis (strain Pestoides F)
   126       663  Shewanella oneidensis
   127       658  Sinorhizobium fredii (strain NBRC 101917 / NGR234)
   128       653  Debaryomyces hansenii
   129       643  Staphylococcus aureus (strain bovine RF122 / ET3-1)
   130       642  Salmonella arizonae (strain ATCC BAA-731 / CDC346-86 / RSK2980)
   131       642  Yersinia pseudotuberculosis serotype O:3 (strain YPIII)
   132       634  Yersinia pseudotuberculosis serotype IB (strain PB1/+)
   133       623  Methanothermobacter thermautotrophicus
   134       622  Treponema pallidum (strain Nichols)
   135       622  Cronobacter sakazakii (strain ATCC BAA-894) (Enterobacter sakazakii)
   136       622  Listeria monocytogenes serovar 1/2a (strain ATCC BAA-679 / EGD-e)
   137       620  Pseudomonas aeruginosa (strain UCBPP-PA14)
   138       615  Xanthomonas campestris pv. campestris
   139       614  Staphylococcus haemolyticus (strain JCSC1435)
   140       613  Mesorhizobium japonicum (Mesorhizobium loti
   141       612  Helicobacter pylori (strain ATCC 700392 / 26695) (Campylobacter pylori)
   142       605  Listeria innocua serovar 6a (strain ATCC BAA-680 / CLIP 11262)
   143       604  Ralstonia nicotianae (strain ATCC BAA-1114 / GMI1000) (Ralstonia solanacearum)
   144       602  Staphylococcus saprophyticus subsp. saprophyticus
   145       602  Photobacterium profundum (strain SS9)
   146       601  Salmonella paratyphi C (strain RKS4594)
   147       600  Yersinia pestis bv. Antiqua (strain Angola)
   148       595  Bacillus cereus (strain ATCC 10987 / NRS 248)
   149       591  Pectobacterium carotovorum subsp. carotovorum (strain PC1)
   150       591  Mycolicibacterium smegmatis (strain ATCC 700084 / mc(2)155)
   151       588  Neisseria meningitidis serogroup B (strain ATCC BAA-335 / MC58)
   152       584  Rickettsia prowazekii (strain Madrid E)
   153       582  Caenorhabditis briggsae
   154       580  Aspergillus oryzae (strain ATCC 42149 / RIB 40) (Yellow koji mold)
   155       579  Brucella suis biovar 1 (strain 1330)
   156       576  Brucella melitensis biotype 1
   157       575  Caulobacter vibrioides (strain ATCC 19089 / CIP 103742 / CB 15)
   158       573  Aliivibrio fischeri (strain ATCC 700601 / ES114) (Vibrio fischeri)
   159       572  Buchnera aphidicola subsp. Acyrthosiphon pisum (strain APS)
   160       569  Bacillus thuringiensis subsp. konkukian (strain 97-27)
   161       568  Pseudomonas syringae pv. syringae (strain B728a)
   162       568  Helicobacter pylori (strain J99 / ATCC 700824) (Campylobacter pylori J99)
   163       566  Bacillus licheniformis
   164       566  Thermotoga maritima
   165       562  Bacillus cereus (strain ZK / E33L)
   166       562  Buchnera aphidicola subsp. Schizaphis graminum (strain Sg)
   167       559  Clostridium acetobutylicum
   168       557  Xanthomonas axonopodis pv. citri (strain 306)
   169       555  Pseudomonas fluorescens (strain Pf0-1)
   170       554  Neisseria meningitidis serogroup A / serotype 4A (strain DSM 15465 / Z2491)
   171       554  Pseudomonas fluorescens (strain ATCC BAA-477 / NRRL B-23932 / Pf-5)
   172       553  Oceanobacillus iheyensis
   173       547  Pseudomonas savastanoi pv. phaseolicola (Pseudomonas syringae pv. phaseolicola
   174       543  Corynebacterium glutamicum
   175       541  Lactococcus lactis subsp. lactis (strain IL1403) (Streptococcus lactis)
   176       531  Erwinia tasmaniensis
   177       530  Bordetella bronchiseptica (strain ATCC BAA-588 / NCTC 13252 / RB50)
   178       530  Listeria monocytogenes serotype 4b (strain F2365)
   179       529  Sodalis glossinidius (strain morsitans)
   180       524  Staphylococcus aureus (strain Newman)
   181       523  Vibrio cholerae serotype O1 (strain ATCC 39541 / Classical Ogawa 395 / O395)
   182       522  Xylella fastidiosa (strain 9a5c)
   183       522  Deinococcus radiodurans
   184       519  Methanosarcina acetivorans (strain ATCC 35395 / DSM 2834 / JCM 12185 / C2A)
   185       519  Chromobacterium violaceum
   186       519  Streptococcus pneumoniae serotype 4 (strain ATCC BAA-334 / TIGR4)
   187       516  Bordetella pertussis (strain Tohama I / ATCC BAA-589 / NCTC 13251)
   188       515  Xylella fastidiosa (strain Temecula1 / ATCC 700964)
   189       512  Haemophilus ducreyi (strain 35000HP / ATCC 700724)
   190       512  Geobacillus kaustophilus (strain HTA426)
   191       512  Pseudomonas aeruginosa (strain PA7)
   192       511  Acinetobacter baylyi (strain ATCC 33305 / BD413 / ADP1)
   193       511  Streptomyces avermitilis
   194       508  Bordetella parapertussis (strain 12822 / ATCC BAA-587 / NCTC 13253)
   195       507  Solanum lycopersicum (Tomato) (Lycopersicon esculentum)
   196       507  Streptococcus pneumoniae (strain ATCC BAA-255 / R6)
   197       507  Buchnera aphidicola subsp. Baizongia pistaciae (strain Bp)
   198       506  Pyrococcus furiosus (strain ATCC 43587 / DSM 3638 / JCM 8422 / Vc1)
   199       505  Nicotiana tabacum (Common tobacco)
   200       504  Pseudomonas entomophila (strain L48)
   201       501  Methanosarcina mazei
   202       499  Brucella abortus biovar 1 (strain 9-941)
   203       499  Haemophilus influenzae (strain 86-028NP)
   204       498  Thermosynechococcus vestitus (strain NIES-2133 / IAM M-273 / BP-1)
   205       497  Burkholderia pseudomallei (strain K96243)
   206       497  Proteus mirabilis (strain HI4320)
   207       497  Pyrococcus horikoshii
   208       496  Rickettsia conorii (strain ATCC VR-613 / Malish 7)
   209       496  Shouchella clausii (strain KSM-K16) (Alkalihalobacillus clausii)
   210       496  Synechococcus elongatus (strain ATCC 33912 / PCC 7942 / FACHB-805)
   211       494  Halobacterium salinarum (strain ATCC 700922 / JCM 11081 / NRC-1)
   212       494  Xanthomonas campestris pv. campestris (strain 8004)
   213       492  Bacillus velezensis (strain DSM 23117 / BGSC 10A6 / LMG 26770 / FZB42)
   214       492  Saccharolobus solfataricus (strain ATCC 35092 / DSM 1617 / JCM 11322 / P2)
   215       492  Brucella abortus (strain 2308)
   216       491  Vibrio campbellii (strain ATCC BAA-1116)
   217       488  Shewanella sp. (strain MR-7)
   218       486  Mannheimia succiniciproducens (strain KCTC 0769BP / MBEL55E)
   219       485  Shewanella sp. (strain MR-4)
   220       484  Staphylococcus aureus (strain Mu3 / ATCC 700698)
   221       484  Pseudomonas aeruginosa (strain LESB58)
   222       483  Mycoplasma genitalium (strain ATCC 33530 / DSM 19775 / NCTC 10195 / G37)
   223       483  Lactiplantibacillus plantarum (strain ATCC BAA-793 / NCIMB 8826 / WCFS1)
   224       480  Pseudomonas putida
   225       478  Pyrococcus abyssi (strain GE5 / Orsay)
   226       477  Cupriavidus necator
   227       475  Campylobacter jejuni subsp. jejuni serotype O:2
   228       475  Burkholderia lata
   229       472  Enterococcus faecalis (strain ATCC 700802 / V583)
   230       472  Rhodopseudomonas palustris (strain ATCC BAA-98 / CGA009)
   231       470  Clostridium perfringens (strain 13 / Type A)
   232       470  Cereibacter sphaeroides
   233       468  Shewanella sp. (strain ANA-3)
   234       468  Pseudomonas putida (strain GB-1)
   235       467  Shewanella frigidimarina (strain NCIMB 400)
   236       467  Aeromonas hydrophila subsp. hydrophila
   237       466  Xanthomonas euvesicatoria pv. vesicatoria (strain 85-10)
   238       465  Trichormus variabilis (strain ATCC 29413 / PCC 7937) (Anabaena variabilis)
   239       463  Burkholderia mallei (strain ATCC 23344)
   240       461  Ovis aries (Sheep)
   241       461  Cupriavidus pinatubonensis (strain JMP 134 / LMG 1197) (Cupriavidus necator
   242       460  Methylococcus capsulatus (strain ATCC 33009 / NCIMB 11132 / Bath)
   243       457  Rickettsia felis (strain ATCC VR-1525 / URRWXCal2) (Rickettsia azadi)
   244       455  Staphylococcus aureus (strain JH1)
   245       455  Xanthomonas oryzae pv. oryzae (strain MAFF 311018)
   246       455  Shewanella baltica (strain OS185)
   247       453  Mycolicibacterium paratuberculosis (strain ATCC BAA-968 / K-10)
   248       453  Streptococcus mutans serotype c (strain ATCC 700610 / UA159)
   249       453  Pseudomonas putida (strain W619)
   250       452  Caldanaerobacter subterraneus subsp. tengcongensis

   2.3  Taxonomic distribution of the sequences

   Kingdom          sequences (% of the database)
   Archaea              19778 (  3%)
   Bacteria            336651 ( 59%)
   Eukaryota           198746 ( 35%)
   Viruses              17444 (  3%)

   Within Eukaryota:

   Category         sequences (% of Eukaryota) (% of the complete database)
   Human                20422 ( 10%)           (  4%)
   Other Mammalia       47473 ( 24%)           (  8%)
   Other Vertebrata     19004 ( 10%)           (  3%)
   Viridiplantae        41866 ( 21%)           (  7%)
   Fungi                37303 ( 19%)           (  7%)
   Insecta               9959 (  5%)           (  2%)
   Nematoda              5405 (  3%)           (  1%)
   Other                17314 (  9%)           (  3%)


3.  SEQUENCE SIZE

   Repartition of the sequences by size (excluding fragments)

     From   To  Number             From   To   Number
        1-  50    9983             1001-1100     4145
       51- 100   43651             1101-1200     2915
      101- 150   59938             1201-1300     2227
      151- 200   59708             1301-1400     2090
      201- 250   58639             1401-1500     1693
      251- 300   52615             1501-1600      841
      301- 350   53063             1601-1700      649
      351- 400   46103             1701-1800      600
      401- 450   37829             1801-1900      529
      451- 500   30725             1901-2000      402
      501- 550   22445             2001-2100      277
      551- 600   15909             2101-2200      390
      601- 650   13219             2201-2300      342
      651- 700    9444             2301-2400      239
      701- 750    7915             2401-2500      197
      751- 800    5729                 >2500     1484
      801- 850    4903
      851- 900    5333
      901- 950    4135
      951-1000    3021

   The average sequence length in UniProtKB/Swiss-Prot is 362 amino acids.

   The shortest sequence is GWA_SEPOF (P83570): 2 amino acids.
   The longest sequence is TITIN_MOUSE (A2ASS6): 35213 amino acids.


4.  JOURNAL CITATIONS

   Note: the following citation statistics reflect the number of distinct
   journal citations.

   Total number of journals cited in this release of UniProtKB/Swiss-Prot: 3189

   4.1  Table of the frequency of journal citations

        Journals cited 1x: 1009
                       2x:  428
                       3x:  222
                       4x:  151
                       5x:  126
                       6x:   90
                       7x:   65
                       8x:   79
                       9x:   48
                      10x:   45
                  11- 20x:  245
                  21- 50x:  274
                  51-100x:  144
                    >100x:  263

   4.2  List of the most cited journals in UniProtKB/Swiss-Prot

 Nb Citations  Journal name
--- ---------  -------------------------------------------------------------
  1     27476  Journal of Biological Chemistry
  2     12904  Proceedings of the National Academy of Sciences of the U.S.A.
  3      7271  Journal of Bacteriology
  4      6118  Biochemical and Biophysical Research Communications
  5      5939  Biochemistry
  6      5385  Nucleic Acids Research
  7      5255  Nature
  8      5123  FEBS Letters
  9      5038  The EMBO Journal
 10      4898  Gene
 11      4633  Journal of Molecular Biology
 12      4604  Molecular and Cellular Biology
 13      4070  Biochimica et Biophysica Acta
 14      3932  Cell
 15      3660  Journal of Virology
 16      3521  European Journal of Biochemistry
 17      3433  Science
 18      3204  Biochemical Journal
 19      2897  Molecular Microbiology
 20      2828  Plant Physiology
 21      2697  PLoS ONE
 22      2547  Genomics
 23      2458  The American Journal of Human Genetics
 24      2394  Journal of Cell Biology
 25      2206  The Plant Cell
 26      2048  Human Molecular Genetics
 27      2042  The Plant Journal
 28      1966  Genes and Development
 29      1928  Plant Molecular Biology
 30      1926  Virology
 31      1898  Molecular Cell
 32      1863  Nature Genetics
 33      1848  Molecular Biology of the Cell
 34      1834  Development
 35      1724  Journal of Immunology
 36      1669  Human Mutation
 37      1574  Oncogene
 38      1497  Structure
 39      1437  Molecular and General Genetics
 40      1436  Journal of Biochemistry
 41      1431  Genetics
 42      1430  Nature Communications
 43      1412  Journal of Cell Science
 44      1309  Blood
 45      1284  Infection and Immunity
 46      1199  Microbiology
 47      1196  Developmental Biology
 48      1194  Journal of General Virology
 49      1165  Current Biology
 50      1163  Archives of Biochemistry and Biophysics
 51      1051  Applied and Environmental Microbiology
 52      1051  Journal of Neuroscience
 53      1000  Acta Crystallographica, Section D
 54       988  Scientific Reports
 55       936  FEMS Microbiology Letters
 56       936  PLoS Genetics
 57       932  Cancer Research
 58       899  American Journal of Physiology
 59       891  Toxicon
 60       877  Protein Science
 61       868  Journal of Clinical Investigation
 62       855  Yeast
 63       845  Neuron
 64       776  Plant and Cell Physiology
 65       776  The Journal of Experimental Medicine
 66       766  Human Genetics
 67       726  Journal of Medical Genetics
 68       720  PLoS Pathogens
 69       719  Nature Structural and Molecular Biology
 70       708  The FEBS Journal
 71       703  Proteins
 72       679  Mechanisms of Development
 73       664  Nature Cell Biology
 74       654  Nature Structural Biology
 75       638  Bioscience, Biotechnology, and Biochemistry
 76       615  Antimicrobial Agents and Chemotherapy
 77       608  Developmental Cell
 78       600  Current Genetics
 79       577  Journal of Neurochemistry
 80       557  Molecular Endocrinology
 81       553  The Journal of Clinical Endocrinology and Metabolism
 82       546  Cell Reports
 83       543  Endocrinology
 84       534  Journal of the American Chemical Society
 85       528  Molecular and Biochemical Parasitology
 86       497  Eukaryotic Cell
 87       496  Experimental Cell Research
 88       496  Mammalian Genome
 89       486  RNA
 90       478  EMBO Reports
 91       477  Peptides
 92       473  Journal of Experimental Botany
 93       471  The FASEB Journal
 94       467  American Journal of Medical Genetics. Part A
 95       460  Planta
 96       440  Molecular Pharmacology
 97       436  Acta Crystallographica, Section F
 98       434  Immunogenetics
 99       432  European Journal of Human Genetics
100       428
101       425  Clinical Genetics
102       423  Molecular Biology and Evolution
103       421  Immunity
104       420  Molecular Plant-Microbe Interactions
105       418  Journal of Investigative Dermatology
106       407  Journal of Molecular Evolution
107       398  Neurology
108       397  DNA and Cell Biology
109       390  Biochimie
110       382  Biology of Reproduction
111       381  DNA Sequence
112       375  Comparative Biochemistry and Physiology
113       362  Virus Research
114       360  Genes to Cells
115       356  Nature Immunology
116       349  Journal of Lipid Research
117       348  Applied Microbiology and Biotechnology
118       347  PLoS Biology
119       346  Developmental Dynamics
120       342  Brain Research. Molecular Brain Research
121       342  The New England Journal of Medicine
122       338  Annals of Neurology
123       338  Journal of Medicinal Chemistry
124       335  BMC Genomics
125       323  European Journal of Immunology
126       319  Genome Research
127       308  Investigative Ophthalmology and Visual Science
128       305  Journal of Human Genetics
129       299  Biological Chemistry Hoppe-Seyler
130       290  Glycobiology
131       289  Nature Chemical Biology
132       286  Brain
133       283  Archives of Microbiology
134       282  Journal of General Microbiology
135       281  Cytogenetics and Cell Genetics
136       265  Traffic
137       263  Fungal Genetics and Biology
138       262  Phytochemistry
139       261  Nature Medicine
140       260  Molecular Genetics and Metabolism
141       259  Protein Expression and Purification
142       258  Molecular Immunology
143       250  Journal of Cellular Biochemistry
144       247  Cell Research
145       246  Cell Cycle
146       243  Circulation Research
147       237  Diabetes
148       234  DNA Research
149       230  Chemistry and Biology
150       230  Insect Biochemistry and Molecular Biology


5.  STATISTICS FOR SOME LINE TYPES

   The following table summarizes the total number of some UniProtKB/Swiss-Prot
   lines, as well as the number of entries with at least one such line, and the
   frequency of the lines.
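
   The "Average per entry" and "Rank" columns of the tables in this section
   appear to follow directly from the totals: the average is the total number
   of lines divided by the 572619 entries of the release, and the rank orders
   the subtypes by descending total. A minimal Python sketch of that
   arithmetic follows. The function names are illustrative only and are not
   part of any UniProt tool, and the "<0.01" threshold and the tie handling
   are assumptions inferred from the printed values.

```python
TOTAL_ENTRIES = 572619  # sequence entries in release 2024_06


def average_per_entry(total_lines: int, entries: int = TOTAL_ENTRIES) -> str:
    """Return the per-entry average formatted like the table column."""
    average = total_lines / entries
    # Assumption: values that would round to 0.00 are printed as "<0.01".
    return "<0.01" if average < 0.005 else f"{average:.2f}"


def rank_by_total(totals: dict[str, int]) -> dict[str, int]:
    """Rank subtypes by descending total; ties keep their input order."""
    ordered = sorted(totals, key=lambda name: totals[name], reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


# Totals copied from the comment-topic table. Only a subset is listed, so
# the ranks are relative to this subset.
comment_totals = {
    "FUNCTION": 494930,
    "SIMILARITY": 521064,
    "CATALYTIC ACTIVITY": 346628,
    "SUBCELLULAR LOCATION": 367156,
}

for topic, rank in rank_by_total(comment_totals).items():
    print(rank, topic, average_per_entry(comment_totals[topic]))
```

   Applied to the four most frequent comment topics, the sketch reproduces
   the ranks and averages printed for them in the comments table.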

                                         Total  Number of    Average
Line type / subtype                     number    entries  per entry  Rank
------------------------------------  --------  ---------  ---------  ----
References (RL)                        1321287                  2.31
  Journal                              1149403     477298       2.01     1
  Submitted to EMBL/GenBank/DDBJ        160311     144404       0.28     2
  Submitted to other databases            7850       7171       0.01     3
  Book citation                           1876       1853      <0.01     4
  Plant Gene Register                      613        600      <0.01     5
  Unpublished observations                 536        532      <0.01     6
  Thesis                                   478        475      <0.01     7
  Patent                                   214        207      <0.01     8
  Worm Breeder's Gazette                     6          6      <0.01     9

   Total number of distinct authors cited in UniProtKB/Swiss-Prot: 477612

                                         Total  Number of    Average
Line type / subtype                     number    entries  per entry  Rank
------------------------------------  --------  ---------  ---------  ----
Comments (CC)                          2765031                  4.83
  ACTIVITY REGULATION                    18564      18437       0.03    17
  ALLERGEN                                 954        954      <0.01    26
  ALTERNATIVE PRODUCTS                   25952      25952       0.05    14
  BIOPHYSICOCHEMICAL PROPERTIES          11901      11849       0.02    20
  BIOTECHNOLOGY                           2168       2107      <0.01    24
  CATALYTIC ACTIVITY                    346628     256829       0.61     4
  CAUTION                                14504      14199       0.03    19
  COFACTOR                              134048     121660       0.23     7
  DEVELOPMENTAL STAGE                    14580      14468       0.03    18
  DISEASE                                 8527       5728       0.01    21
  DISRUPTION PHENOTYPE                   21919      21861       0.04    16
  DOMAIN                                 60438      51186       0.11     9
  FUNCTION                              494930     469889       0.86     2
  INDUCTION                              26244      26138       0.05    13
  INTERACTION                            24903      24903       0.04    15
  MASS SPECTROMETRY                       7592       5869       0.01    22
  MISCELLANEOUS                          46467      40866       0.08    11
  PATHWAY                               144337     130332       0.25     6
  PHARMACEUTICAL                           171        164      <0.01    29
  POLYMORPHISM                            1510       1382      <0.01    25
  PTM                                    66029      46750       0.12     8
  RNA EDITING                              638        638      <0.01    28
  SEQUENCE CAUTION                       45311      45240       0.08    12
  SIMILARITY                            521064     516697       0.91     1
  SUBCELLULAR LOCATION                  367156     358398       0.64     3
  SUBUNIT                               300327     294704       0.52     5
  TISSUE SPECIFICITY                     51670      50985       0.09    10
  TOXIC DOSE                               881        704      <0.01    27
  WEB RESOURCE                            5618       5033       0.01    23

   Total number of comment topics: 29

                                         Total  Number of    Average
Line type / subtype                     number    entries  per entry  Rank
------------------------------------  --------  ---------  ---------  ----
Features (FT)                          5446911                  9.51
  ACT_SITE                              177453     105799       0.31     9
  BINDING                              1250414     219596       2.18     1
  CARBOHYD                              125195      31868       0.22    14
  CHAIN                                 581116     564940       1.01     2
  COILED                                 22637      15660       0.04    25
  COMPBIAS                              175737      74628       0.31    10
  CONFLICT                              139575      48630       0.24    12
  CROSSLNK                               25429       9120       0.04    24
  DISULFID                              137354      36612       0.24    13
  DNA_BIND                               12218      10940       0.02    31
  DOMAIN                                218181     133514       0.38     8
  HELIX                                 355513      30537       0.62     5
  INIT_MET                               17598      17549       0.03    26
  INTRAMEM                                3146       1464       0.01    34
  LIPID                                  13916       8913       0.02    28
  MOD_RES                               263902      74894       0.46     7
  MOTIF                                  48412      31519       0.08    21
  MUTAGEN                               102049      20648       0.18    17
  NON_CONS                                2662        833      <0.01    35
  NON_STD                                  360        285      <0.01    36
  NON_TER                                12613       9696       0.02    30
  PEPTIDE                                12657       8765       0.02    29
  PROPEP                                 15489      13235       0.03    27
  REGION                                324335     150583       0.57     6
  REPEAT                                109718      15238       0.19    15
  SIGNAL                                 44737      44736       0.08    22
  SITE                                   65893      35649       0.12    19
  STRAND                                361018      28761       0.63     4
  TOPO_DOM                              152862      30789       0.27    11
  TRANSIT                                 9620       9497       0.02    32
  TRANSMEM                              383801      80381       0.67     3
  TURN                                   86009      24938       0.15    18
  UNSURE                                  5763        900       0.01    33
  VAR_SEQ                                53352      22713       0.09    20
  VARIANT                               105326      17576       0.18    16
  ZN_FING                                30851      13182       0.05    23

   Total number of feature keys: 36

                           Total  Number of    Average
Line type / subtype       number    entries  per entry  Rank  Category
----------------------  --------  ---------  ---------  ----  -------------------------------------------
Cross-references (DR)   21183251                 36.99
  ABCD                      3130       3130       0.01   122  Protocols and materials databases
  AGR                      69200      68513       0.12    42  Organism-specific databases
  Allergome                 2044       1314      <0.01   131  Protein family/group databases
  AlphaFoldDB             548003     548003       0.96    10  3D structure databases
  Antibodypedia            32305      32196       0.06    61  Protocols and materials databases
  AntiFam                     20         20      <0.01   163  Family and domain databases
  ArachnoServer             1148       1138      <0.01   139  Organism-specific databases
  Araport                  16414      16318       0.03    92  Organism-specific databases
  Bgee                     61779      61775       0.11    44  Gene expression databases
  BindingDB                 6662       6662       0.01   107  Chemistry databases
  BioCyc                   48194      44143       0.08    53  Enzyme and pathway databases
  BioGRID                  61804      59841       0.11    43  Protein-protein interaction databases
  BioGRID-ORCS             45050      44464       0.08    55  Miscellaneous databases
  BioMuta                  20295      20268       0.04    78  Genetic variation databases
  BMRB                      6912       6912       0.01   105  3D structure databases
  BRENDA                   20438      18626       0.04    73  Enzyme and pathway databases
  CarbonylDB                1159       1159      <0.01   138  PTM databases
  CAZy                      9703       8735       0.02    98  Protein family/group databases
  CCDS                     49696      34799       0.09    51  Sequence databases
  CDD                     383784     301706       0.67    16  Family and domain databases
  CGD                       2106       2089      <0.01   129  Organism-specific databases
  ChEMBL                    9144       8954       0.02    99  Chemistry databases
  ChiTaRS                  29789      29744       0.05    63  Miscellaneous databases
  CollecTF                   137        137      <0.01   157  Gene expression databases
  ComplexPortal            17040       8686       0.03    91  Protein-protein interaction databases
  ConoServer                 967        879      <0.01   141  Organism-specific databases
  CORUM                     5812       5812       0.01   108  Protein-protein interaction databases
  CPTAC                     3472       1929       0.01   117  Proteomic databases
  CPTC                       392        392      <0.01   150  Protocols and materials databases
  CTD                      76195      75326       0.13    40  Organism-specific databases
  DEPOD                      254        254      <0.01   156  PTM databases
  dictyBase                 4225       4111       0.01   114  Organism-specific databases
  DIP                      17565      17524       0.03    89  Protein-protein interaction databases
  DisGeNET                 17608      17410       0.03    88  Organism-specific databases
  DisProt                   1772       1766      <0.01   133  Family and domain databases
  DMDM                     16168      16167       0.03    94  Genetic variation databases
  DNASU                    48450      48371       0.08    52  Protocols and materials databases
  DrugBank                 31669       4787       0.06    62  Chemistry databases
  DrugCentral               2982       2982       0.01   124  Chemistry databases
  EchoBASE                  4158       4158       0.01   115  Organism-specific databases
  eggNOG                  339868     333998       0.59    17  Phylogenomic databases
  ELM                       1814       1814      <0.01   132  Protein-protein interaction databases
  EMBL                   1008020     559745       1.76     4  Sequence databases
  EMDB                    102185       9780       0.18    36  3D structure databases
  Ensembl                 116642      51241       0.20    33  Genome annotation databases
  EnsemblBacteria          55500      55322       0.10    48  Genome annotation databases
  EnsemblFungi             23323      22874       0.04    69  Genome annotation databases
  EnsemblMetazoa           21112      11819       0.04    72  Genome annotation databases
  EnsemblPlants            43805      22489       0.08    56  Genome annotation databases
  EnsemblProtists           5417       5161       0.01   111  Genome annotation databases
  ESTHER                    3014       3011       0.01   123  Protein family/group databases
  euHCVdb                     55         44      <0.01   161  Organism-specific databases
  EvolutionaryTrace        22656      22656       0.04    70  Miscellaneous databases
  ExpressionAtlas          51114      51114       0.09    49  Gene expression databases
  FlyBase                   3933       3824       0.01   116  Organism-specific databases
  FunFam                  557539     326990       0.97     9  Family and domain databases
  Gene3D                  741215     460080       1.29     6  Family and domain databases
  GeneCards                20374      20244       0.04    75  Organism-specific databases
  GeneID                  294083     284397       0.51    23  Genome annotation databases
  GeneReviews               1609       1605      <0.01   134  Organism-specific databases
  GeneTree                 56345      56333       0.10    47  Phylogenomic databases
  GeneWiki                 10351      10269       0.02    97  Miscellaneous databases
  GenomeRNAi               22332      22331       0.04    71  Miscellaneous databases
  GlyConnect                2372       2215      <0.01   126  PTM databases
  GlyCosmos                28907      28907       0.05    64  PTM databases
  GlyGen                   25595      25595       0.04    66  PTM databases
  GO                     3280483     551635       5.73     1  Ontologies
  Gramene                  43805      22489       0.08    57  Genome annotation databases
  GuidetoPHARMACOLOGY       2270       2270      <0.01   128  Chemistry databases
  HAMAP                   330971     328034       0.58    19  Family and domain databases
  HGNC                     20374      20246       0.04    74  Organism-specific databases
  HOGENOM                 427856     427856       0.75    15  Phylogenomic databases
  HPA                      19354      19215       0.03    82  Organism-specific databases
  IDEAL                     1101       1101      <0.01   140  Family and domain databases
  IMGT_GENE-DB               267        267      <0.01   155  Protein family/group databases
  InParanoid              164186     164186       0.29    26  Phylogenomic databases
  IntAct                   58226      58226       0.10    45  Protein-protein interaction databases
  InterPro               2560496     554318       4.47     2  Family and domain databases
  iPTMnet                  56773      56773       0.10    46  PTM databases
  JaponicusDB                 43         43      <0.01   162  Organism-specific databases
  jPOST                    26412      26412       0.05    65  Proteomic databases
  KEGG                    514523     476053       0.90    12  Genome annotation databases
  LegioList                  765        763      <0.01   145  Organism-specific databases
  Leproma                    672        669      <0.01   146  Organism-specific databases
  MaizeGDB                   529        525      <0.01   148  Organism-specific databases
  MalaCards                 5693       5684       0.01   110  Organism-specific databases
  MANE-Select              18530      18417       0.03    85  Genome annotation databases
  MassIVE                  19138      19138       0.03    83  Proteomic databases
  MEROPS                   14242      13823       0.02    95  Protein family/group databases
  MetOSite                  3456       3456       0.01   118  PTM databases
  MGI                      17142      17101       0.03    90  Organism-specific databases
  MIM                      23715      16305       0.04    68  Organism-specific databases
  MINT                     24101      24101       0.04    67  Protein-protein interaction databases
  MoonDB                     348        348      <0.01   154  Protein family/group databases
  MoonProt                   368        368      <0.01   152  Protein family/group databases
  NCBIfam                 302221     278846       0.53    22  Family and domain databases
  neXtProt                 20307      20306       0.04    77  Organism-specific databases
  NIAGADS                     69         69      <0.01   159  Organism-specific databases
  OGP                        373        373      <0.01   151  2D gel databases
  OMA                     120138     120138       0.21    32  Phylogenomic databases
  OpenTargets              18559      18414       0.03    84  Organism-specific databases
  Orphanet                  7996       4386       0.01   102  Organism-specific databases
  OrthoDB                 276097     276097       0.48    24  Phylogenomic databases
  PANTHER                1009161     504997       1.76     3  Family and domain databases
  PathwayCommons           19440      19440       0.03    81  Enzyme and pathway databases
  PATRIC                   93140      93140       0.16    38  Genome annotation databases
  PaxDb                   153859     153859       0.27    27  Proteomic databases
  PCDDB                      133        133      <0.01   158  3D structure databases
  PDB                     322825      36568       0.56    20  3D structure databases
  PDBsum                  322825      36568       0.56    21  3D structure databases
  PeptideAtlas             39648      39648       0.07    60  Proteomic databases
  PeroxiBase                 793        772      <0.01   144  Protein family/group databases
  Pfam                    854227     543616       1.49     5  Family and domain databases
  PharmGKB                 18032      18013       0.03    87  Organism-specific databases
  Pharos                   20206      20206       0.04    79  Miscellaneous databases
  PHI-base                  2416       1902      <0.01   125  Miscellaneous databases
  PhosphoSitePlus          42170      42170       0.07    59  PTM databases
  PhylomeDB               115685     115685       0.20    34  Phylogenomic databases
  PIR                     125204     114868       0.22    31  Sequence databases
  PIRSF                   111072     109903       0.19    35  Family and domain databases
  PlantReactome             1320        771      <0.01   136  Enzyme and pathway databases
  PomBase                   5129       5125       0.01   112  Organism-specific databases
  PRIDE                      637        637      <0.01   147  Proteomic databases
  PRINTS                  151119     129746       0.26    28  Family and domain databases
  PRO                      98647      98646       0.17    37  Miscellaneous databases
  ProMEX                     489        489      <0.01   149  Proteomic databases
  PROSITE                 493776     311943       0.86    14  Family and domain databases
  Proteomes               508561     463540       0.89    13  Miscellaneous databases
  ProteomicsDB             72759      45401       0.13    41  Proteomic databases
  PseudoCAP                 2052       2052      <0.01   130  Organism-specific databases
  Pumba                    18206      18206       0.03    86  Proteomic databases
  Reactome                145979      38881       0.25    29  Enzyme and pathway databases
  REBASE                     797        396      <0.01   143  Protein family/group databases
  RefSeq                  598394     452707       1.05     8  Sequence databases
  REPRODUCTION-2DPAGE       1260       1039      <0.01   137  2D gel databases
  RGD                       8140       8139       0.01   101  Organism-specific databases
  RNAct                    43123      43123       0.08    58  Miscellaneous databases
  SABIO-RK                  5761       5761       0.01   109  Enzyme and pathway databases
  SASBDB                     933        933      <0.01   142  3D structure databases
  SFLD                     20366       9099       0.04    76  Family and domain databases
  SGD                       6746       6741       0.01   106  Organism-specific databases
  SignaLink                19953      19953       0.03    80  Enzyme and pathway databases
  SIGNOR                    7650       7650       0.01   103  Enzyme and pathway databases
  SMART                   206144     148747       0.36    25  Family and domain databases
  SMR                     521254     521254       0.91    11  3D structure databases
  STRING                  336375     336375       0.59    18  Protein-protein interaction databases
  SUPFAM                  649776     460558       1.13     7  Family and domain databases
  SwissLipids               1478       1394      <0.01   135  Chemistry databases
  SwissPalm                13365      13365       0.02    96  PTM databases
  TAIR                     16404      16318       0.03    93  Organism-specific databases
  TCDB                      8634       8547       0.02   100  Protein family/group databases
  TopDownProteomics         3236       2957       0.01   121  Proteomic databases
  TreeFam                  46317      46294       0.08    54  Phylogenomic databases
  TubercuList               2339       2303      <0.01   127  Organism-specific databases
  UCSC                     50991      46518       0.09    50  Genome annotation databases
  UniLectin                  366        366      <0.01   153  Protein family/group databases
  UniPathway              140132     126478       0.24    30  Enzyme and pathway databases
  VEuPathDB                86870      79663       0.15    39  Organism-specific databases
  VGNC                      3448       3445       0.01   119  Organism-specific databases
  WBParaSite                  59         56      <0.01   160  Genome annotation databases
  WormBase                  6973       5096       0.01   104  Organism-specific databases
  Xenbase                   4751       4751       0.01   113  Organism-specific databases
  ZFIN                      3282       3281       0.01   120  Organism-specific databases

   Total number of cross-referenced databases: 163


6.  AMINO ACID COMPOSITION

   6.1  Composition in percent for the complete database

   Ala (A) 8.25   Gln (Q) 3.93   Leu (L) 9.64   Ser (S) 6.65
   Arg (R) 5.52   Glu (E) 6.71   Lys (K) 5.80   Thr (T) 5.36
   Asn (N) 4.06   Gly (G) 7.07   Met (M) 2.41   Trp (W) 1.10
   Asp (D) 5.46   His (H) 2.27   Phe (F) 3.86   Tyr (Y) 2.92
   Cys (C) 1.38   Ile (I) 5.90   Pro (P) 4.74   Val (V) 6.85

   Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.00

   Legend: gray = aliphatic, red = acidic, green = small hydroxy,
           blue = basic, black = aromatic, white = amide, yellow = sulfur

   6.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Val, Glu, Ser, Ile, Lys, Arg, Asp, Thr, Pro, Asn, Gln, Phe,
   Tyr, Met, His, Cys, Trp


7.  MISCELLANEOUS STATISTICS

   4467 entries are encoded on a mitochondrion, and 4039 are encoded on a
   plasmid.

   12200 entries are encoded on a plastid, of which 22 are encoded on
   apicoplasts, 11634 on chloroplasts, 51 on organellar chromatophores, 145 on
   cyanelles, 149 on non-photosynthetic plastids and 199 on unspecified types
   of plastid.

   Number of entries with at least one sequence correction: 81359
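
   Several headline figures quoted in this release can be cross-checked
   against one another: the protein existence counts should add up to the
   total number of entries, the quoted percentages and the average sequence
   length should follow from the totals, and the plastid breakdown should add
   up to the plastid total. The following Python sketch performs these checks
   on numbers copied from the sections above. The constant and function names
   are illustrative assumptions, and rounding to one decimal place is inferred
   from the printed percentages rather than documented.

```python
TOTAL_ENTRIES = 572619         # section 1
TOTAL_AMINO_ACIDS = 207431389  # section 1

# Protein existence level -> (entries, percentage as printed in section 1)
PROTEIN_EXISTENCE = {
    "Evidence at protein level": (117598, 20.5),
    "Evidence at transcript level": (54363, 9.5),
    "Inferred from homology": (385971, 67.4),
    "Predicted": (12873, 2.2),
    "Uncertain": (1814, 0.3),
}

PLASTID_TOTAL = 12200  # section 7
PLASTID_BREAKDOWN = {
    "apicoplasts": 22,
    "chloroplasts": 11634,
    "organellar chromatophores": 51,
    "cyanelles": 145,
    "non-photosynthetic plastids": 149,
    "unspecified types of plastid": 199,
}


def percent_of_entries(count: int, digits: int = 1) -> float:
    """Share of all entries, rounded the way the printed figures appear to be."""
    return round(100 * count / TOTAL_ENTRIES, digits)


def check_release_figures() -> dict[str, bool]:
    """Return one pass/fail flag per consistency check."""
    counts = [count for count, _ in PROTEIN_EXISTENCE.values()]
    return {
        "pe_counts_sum_to_total": sum(counts) == TOTAL_ENTRIES,
        "pe_percentages_match": all(
            percent_of_entries(count) == printed
            for count, printed in PROTEIN_EXISTENCE.values()
        ),
        "average_length_is_362": round(TOTAL_AMINO_ACIDS / TOTAL_ENTRIES) == 362,
        "plastid_breakdown_sums_to_total": (
            sum(PLASTID_BREAKDOWN.values()) == PLASTID_TOTAL
        ),
    }


for check, passed in check_release_figures().items():
    print(f"{check}: {'ok' if passed else 'MISMATCH'}")
```

   The same pattern extends to other totals in this document, for example the
   kingdom counts of section 2.3, which should also add up to the total
   number of entries.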