UniProtKB/Swiss-Prot protein knowledgebase release 2023_04 statistics

1.  INTRODUCTION

Release 2023_04 of 13-Sep-2023 of UniProtKB/Swiss-Prot contains 570157
sequence entries, curated from 294587 unique references and comprising
206173379 amino acids.

365 sequences have been added since release 2023_03, the sequence data of
60 existing entries has been updated and the annotations of 391929 entries
have been revised.

Number of fragments: 9286

Number of additional sequences produced by alternative splicing,
initiation or promoter usage, or ribosomal frameshifting: 41011

Protein existence (PE):            entries       %
1: Evidence at protein level        112765   19.8%
2: Evidence at transcript level      55913    9.8%
3: Inferred from homology           386612   67.8%
4: Predicted                         13034    2.3%
5: Uncertain                          1833    0.3%

The growth of the database is summarized below.

2.  TAXONOMIC ORIGIN

Total number of species represented in this release of UniProtKB/Swiss-Prot:
14509

The first twenty species represent 122865 sequences: 21.5% of the total
number of entries.

2.1  Table of the frequency of occurrence of species

Species represented    1x:   5928
                       2x:   2099
                       3x:   1123
                       4x:    770
                       5x:    530
                       6x:    438
                       7x:    330
                       8x:    273
                       9x:    239
                      10x:    155
                  11- 20x:    836
                  21- 50x:    504
                  51-100x:    227
                    >100x:   1057

2.2  Table of the most represented species

------  ---------  --------------------------------------------
Number  Frequency  Species
------  ---------  --------------------------------------------
     1      20426  Homo sapiens (Human)
     2      17178  Mus musculus (Mouse)
     3      16369  Arabidopsis thaliana (Mouse-ear cress)
     4       8183  Rattus norvegicus (Rat)
     5       6727  Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (Baker's yeast)
     6       6040  Bos taurus (Bovine)
     7       5121  Schizosaccharomyces pombe (strain 972 / ATCC 24843) (Fission yeast)
     8       4530  Escherichia coli (strain K12)
     9       4457  Caenorhabditis elegans
    10       4191  Bacillus subtilis (strain 168)
    11       4186  Oryza sativa subsp. japonica (Rice)
    12       4159  Dictyostelium discoideum (Social amoeba)
    13       3723  Drosophila melanogaster (Fruit fly)
    14       3493  Xenopus laevis (African clawed frog)
    15       3306  Danio rerio (Zebrafish) (Brachydanio rerio)
    16       2307  Gallus gallus (Chicken)
    17       2306  Mycobacterium tuberculosis (strain ATCC 25618 / H37Rv)
    18       2218  Pongo abelii (Sumatran orangutan) (Pongo pygmaeus abelii)
    19       2046  Escherichia coli O157:H7
    20       1899  Mycobacterium tuberculosis (strain CDC 1551 / Oshkosh)
    21       1820  Salmonella typhimurium (strain LT2 / SGSC1412 / ATCC 700720)
    22       1787  Methanocaldococcus jannaschii
    23       1710  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
    24       1704  Haemophilus influenzae (strain ATCC 51907 / DSM 11121 / KW20 / Rd)
    25       1702  Escherichia coli O6:H1 (strain CFT073 / ATCC 700928 / UPEC)
    26       1696  Shigella flexneri
    27       1458  Sus scrofa (Pig)
    28       1451  Pseudomonas aeruginosa
    29       1347  Salmonella typhi
    30       1244  Mycobacterium bovis (strain ATCC BAA-935 / AF2122/97)
    31       1176  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)
    32       1109  Candida albicans (strain SC5314 / ATCC MYA-2876) (Yeast)
    33       1087  Synechocystis sp. (strain PCC 6803 / Kazusa)
    34       1036  Archaeoglobus fulgidus
    35       1030  Yersinia pestis
    36       1004  Emericella nidulans
    37        997  Vibrio cholerae serotype O1 (strain ATCC 39315 / El Tor Inaba N16961)
    38        956  Neurospora crassa
    39        941  Staphylococcus aureus (strain Mu50 / ATCC 700699)
    40        930  Salmonella paratyphi A (strain ATCC 9150 / SARB42)
    41        929  Staphylococcus aureus (strain N315)
    42        928  Ashbya gossypii (strain ATCC 10895 / CBS 109.51 / FGSC 9923 / NRRL Y-1056)
    43        919  Kluyveromyces lactis
    44        909  Acanthamoeba polyphaga mimivirus (APMV)
    45        905  Staphylococcus aureus (strain COL)
    46        901  Oryctolagus cuniculus (Rabbit)
    47        896  Aspergillus fumigatus (strain ATCC MYA-4609 / CBS 101355 / FGSC A1100 / Af293)
    48        896  Staphylococcus aureus (strain MW2)
    49        894  Escherichia coli O6:K15:H31 (strain 536 / UPEC)
    50        890  Staphylococcus aureus (strain MSSA476)
    51        888  Staphylococcus aureus (strain MRSA252)
    52        887  Rhizobium meliloti (strain 1021) (Ensifer meliloti) (Sinorhizobium meliloti)
    53        882  Salmonella choleraesuis (strain SC-B67)
    54        882  Candida glabrata
    55        879  Shigella sonnei (strain Ss046)
    56        867  Oryza sativa subsp. indica (Rice)
    57        863  Yersinia pseudotuberculosis serotype I (strain IP32953)
    58        848  Zea mays (Maize)
    59        847  Canis lupus familiaris (Dog) (Canis familiaris)
    60        847  Escherichia coli O9:H4 (strain HS)
    61        838  Escherichia coli O139:H28 (strain E24377A / ETEC)
    62        829  Shigella boydii serotype 4 (strain Sb227)
    63        825  Escherichia coli (strain UTI89 / UPEC)
    64        822  Shigella dysenteriae serotype 1 (strain Sd197)
    65        822  Escherichia coli
    66        815  Streptomyces coelicolor (strain ATCC BAA-471 / A3(2) / M145)
    67        809  Staphylococcus aureus (strain NCTC 8325 / PS 47)
    68        803  Pectobacterium atrosepticum (strain SCRI 1043 / ATCC BAA-672)
    69        796  Vibrio parahaemolyticus serotype O3:K6 (strain RIMD 2210633)
    70        791  Escherichia coli (strain SMS-3-5 / SECEC)
    71        788  Aquifex aeolicus (strain VF5)
    72        779  Escherichia coli O127:H6 (strain E2348/69 / EPEC)
    73        771  Escherichia coli (strain K12 / DH10B)
    74        770  Pasteurella multocida (strain Pm70)
    75        766  Escherichia coli O17:K52:H18 (strain UMN026 / ExPEC)
    76        765  Escherichia coli (strain K12 / MC4100 / BW2952)
    77        762  Escherichia coli (strain 55989 / EAEC)
    78        761  Escherichia coli O8 (strain IAI1)
    79        760  Staphylococcus epidermidis (strain ATCC 12228 / FDA PCI 1200)
    80        760  Staphylococcus epidermidis (strain ATCC 35984 / RP62A)
    81        760  Shigella flexneri serotype 5b (strain 8401)
    82        759  Escherichia coli O45:K1 (strain S88 / ExPEC)
    83        757  Bacillus anthracis
    84        756  Escherichia coli (strain SE11)
    85        753  Escherichia coli O7:K1 (strain IAI39 / ExPEC)
    86        749  Photorhabdus laumondii subsp. laumondii (strain DSM 15139 / CIP 105565 / TT01)
    87        748  Escherichia coli O157:H7 (strain EC4115 / EHEC)
    88        744  Halalkalibacterium halodurans
    89        739  Yersinia enterocolitica serotype O:8 / biotype 1B (strain NCTC 13174 / 8081)
    90        733  Vibrio vulnificus (strain CMCP6)
    91        732  Pseudomonas putida
    92        731  Escherichia coli O81 (strain ED1a)
    93        722  Salmonella enteritidis PT4 (strain P125109)
    94        718  Vibrio vulnificus (strain YJ016)
    95        716  Salmonella paratyphi B (strain ATCC BAA-1250 / SPB7)
    96        715  Klebsiella pneumoniae subsp. pneumoniae (strain ATCC 700721 / MGH 78578)
    97        715  Yersinia pestis bv. Antiqua (strain Nepal516)
    98        715  Escherichia coli O1:K1 / APEC
    99        715  Enterobacter sp. (strain 638)
   100        714  Salmonella paratyphi A (strain AKU_12601)
   101        713  Yersinia pseudotuberculosis serotype O:1b (strain IP 31758)
   102        713  Salmonella agona (strain SL483)
   103        713  Salmonella newport (strain SL254)
   104        712  Salmonella schwarzengrund (strain CVM19633)
   105        711  Escherichia coli
   106        711  Yersinia pestis bv. Antiqua (strain Antiqua)
   107        710  Salmonella heidelberg (strain SL476)
   108        703  Nostoc sp. (strain PCC 7120 / SAG 25.82 / UTEX 2576)
   109        702  Salmonella dublin (strain CT_02021853)
   110        699  Klebsiella pneumoniae (strain 342)
   111        698  Shigella boydii serotype 18 (strain CDC 3083-94 / BS512)
   112        695  Escherichia fergusonii
   113        692  Pan troglodytes (Chimpanzee)
   114        686  Mycoplasma pneumoniae (strain ATCC 29342 / M129 / Subtype 1)
   115        684  Salmonella gallinarum (strain 287/91 / NCTC 13346)
   116        682  Pseudomonas syringae pv. tomato (strain ATCC BAA-871 / DC3000)
   117        679  Citrobacter koseri (strain ATCC BAA-895 / CDC 4225-83 / SGSC4696)
   118        679  Staphylococcus aureus (strain USA300)
   119        672  Serratia proteamaculans (strain 568)
   120        669  Bacillus cereus
   121        669  Mycobacterium leprae (strain TN)
   122        667  Yersinia pestis (strain Pestoides F)
   123        666  Yarrowia lipolytica (strain CLIB 122 / E 150) (Yeast) (Candida lipolytica)
   124        664  Bradyrhizobium diazoefficiens
   125        658  Shewanella oneidensis (strain MR-1)
   126        658  Sinorhizobium fredii (strain NBRC 101917 / NGR234)
   127        654  Agrobacterium fabrum (strain C58 / ATCC 33970) (Agrobacterium tumefaciens
   128        653  Debaryomyces hansenii
   129        643  Staphylococcus aureus (strain bovine RF122 / ET3-1)
   130        642  Salmonella arizonae (strain ATCC BAA-731 / CDC346-86 / RSK2980)
   131        642  Yersinia pseudotuberculosis serotype O:3 (strain YPIII)
   132        634  Yersinia pseudotuberculosis serotype IB (strain PB1/+)
   133        622  Listeria monocytogenes serovar 1/2a (strain ATCC BAA-679 / EGD-e)
   134        622  Methanothermobacter thermautotrophicus
   135        622  Cronobacter sakazakii (strain ATCC BAA-894) (Enterobacter sakazakii)
   136        622  Treponema pallidum (strain Nichols)
   137        618  Pseudomonas aeruginosa (strain UCBPP-PA14)
   138        615  Xanthomonas campestris pv. campestris
   139        614  Staphylococcus haemolyticus (strain JCSC1435)
   140        613  Mesorhizobium japonicum (Mesorhizobium loti
   141        612  Helicobacter pylori (strain ATCC 700392 / 26695) (Campylobacter pylori)
   142        605  Listeria innocua serovar 6a (strain ATCC BAA-680 / CLIP 11262)
   143        603  Ralstonia nicotianae (strain GMI1000) (Ralstonia solanacearum)
   144        602  Staphylococcus saprophyticus subsp. saprophyticus
   145        602  Photobacterium profundum (strain SS9)
   146        601  Salmonella paratyphi C (strain RKS4594)
   147        600  Yersinia pestis bv. Antiqua (strain Angola)
   148        595  Bacillus cereus (strain ATCC 10987 / NRS 248)
   149        591  Pectobacterium carotovorum subsp. carotovorum (strain PC1)
   150        587  Neisseria meningitidis serogroup B (strain MC58)
   151        587  Mycolicibacterium smegmatis (strain ATCC 700084 / mc(2)155)
   152        584  Rickettsia prowazekii (strain Madrid E)
   153        582  Caenorhabditis briggsae
   154        579  Brucella suis biovar 1 (strain 1330)
   155        576  Brucella melitensis biotype 1 (strain 16M / ATCC 23456 / NCTC 10094)
   156        575  Caulobacter vibrioides (strain ATCC 19089 / CB15) (Caulobacter crescentus)
   157        573  Aliivibrio fischeri (strain ATCC 700601 / ES114) (Vibrio fischeri)
   158        572  Buchnera aphidicola subsp. Acyrthosiphon pisum (strain APS)
   159        569  Bacillus thuringiensis subsp. konkukian (strain 97-27)
   160        568  Helicobacter pylori (strain J99 / ATCC 700824) (Campylobacter pylori J99)
   161        568  Pseudomonas syringae pv. syringae (strain B728a)
   162        566  Aspergillus oryzae (strain ATCC 42149 / RIB 40) (Yellow koji mold)
   163        565  Bacillus licheniformis
   164        564  Thermotoga maritima
   165        562  Buchnera aphidicola subsp. Schizaphis graminum (strain Sg)
   166        562  Bacillus cereus (strain ZK / E33L)
   167        559  Clostridium acetobutylicum
   168        557  Xanthomonas axonopodis pv. citri (strain 306)
   169        555  Pseudomonas fluorescens (strain Pf0-1)
   170        554  Neisseria meningitidis serogroup A / serotype 4A (strain DSM 15465 / Z2491)
   171        554  Pseudomonas fluorescens (strain ATCC BAA-477 / NRRL B-23932 / Pf-5)
   172        553  Oceanobacillus iheyensis
   173        547  Pseudomonas savastanoi pv. phaseolicola (Pseudomonas syringae pv. phaseolicola
   174        540  Lactococcus lactis subsp. lactis (strain IL1403) (Streptococcus lactis)
   175        540  Corynebacterium glutamicum
   176        531  Erwinia tasmaniensis
   177        529  Bordetella bronchiseptica (strain ATCC BAA-588 / NCTC 13252 / RB50)
   178        529  Sodalis glossinidius (strain morsitans)
   179        529  Listeria monocytogenes serotype 4b (strain F2365)
   180        524  Staphylococcus aureus (strain Newman)
   181        523  Vibrio cholerae serotype O1 (strain ATCC 39541 / Classical Ogawa 395 / O395)
   182        522  Xylella fastidiosa (strain 9a5c)
   183        521  Deinococcus radiodurans
   184        519  Methanosarcina acetivorans (strain ATCC 35395 / DSM 2834 / JCM 12185 / C2A)
   185        519  Chromobacterium violaceum
   186        516  Bordetella pertussis (strain Tohama I / ATCC BAA-589 / NCTC 13251)
   187        516  Streptococcus pneumoniae serotype 4 (strain ATCC BAA-334 / TIGR4)
   188        515  Xylella fastidiosa (strain Temecula1 / ATCC 700964)
   189        512  Pseudomonas aeruginosa (strain PA7)
   190        512  Geobacillus kaustophilus (strain HTA426)
   191        511  Acinetobacter baylyi (strain ATCC 33305 / BD413 / ADP1)
   192        511  Haemophilus ducreyi (strain 35000HP / ATCC 700724)
   193        511  Streptomyces avermitilis
   194        508  Bordetella parapertussis (strain 12822 / ATCC BAA-587 / NCTC 13253)
   195        507  Buchnera aphidicola subsp. Baizongia pistaciae (strain Bp)
   196        506  Pyrococcus furiosus (strain ATCC 43587 / DSM 3638 / JCM 8422 / Vc1)
   197        505  Nicotiana tabacum (Common tobacco)
   198        505  Solanum lycopersicum (Tomato) (Lycopersicon esculentum)
   199        504  Pseudomonas entomophila (strain L48)
   200        499  Haemophilus influenzae (strain 86-028NP)
   201        499  Brucella abortus biovar 1 (strain 9-941)
   202        497  Burkholderia pseudomallei (strain K96243)
   203        496  Alkalihalobacillus clausii (strain KSM-K16) (Bacillus clausii)
   204        496  Proteus mirabilis (strain HI4320)
   205        496  Rickettsia conorii (strain ATCC VR-613 / Malish 7)
   206        495  Thermosynechococcus vestitus (strain NIES-2133 / IAM M-273 / BP-1)
   207        495  Pyrococcus horikoshii
   208        494  Xanthomonas campestris pv. campestris (strain 8004)
   209        493  Synechococcus elongatus (strain ATCC 33912 / PCC 7942 / FACHB-805)
   210        493  Streptococcus pneumoniae (strain ATCC BAA-255 / R6)
   211        492  Methanosarcina mazei
   212        492  Brucella abortus (strain 2308)
   213        492  Halobacterium salinarum (strain ATCC 700922 / JCM 11081 / NRC-1)
   214        492  Bacillus velezensis (strain DSM 23117 / BGSC 10A6 / LMG 26770 / FZB42)
   215        491  Vibrio campbellii (strain ATCC BAA-1116)
   216        490  Saccharolobus solfataricus (strain ATCC 35092 / DSM 1617 / JCM 11322 / P2)
   217        487  Shewanella sp. (strain MR-7)
   218        486  Mannheimia succiniciproducens (strain MBEL55E)
   219        484  Pseudomonas aeruginosa (strain LESB58)
   220        484  Staphylococcus aureus (strain Mu3 / ATCC 700698)
   221        484  Shewanella sp. (strain MR-4)
   222        483  Mycoplasma genitalium (strain ATCC 33530 / DSM 19775 / NCTC 10195 / G37)
   223        483  Lactiplantibacillus plantarum (strain ATCC BAA-793 / NCIMB 8826 / WCFS1)
   224        479  Pseudomonas putida (strain ATCC 700007 / DSM 6899 / BCRC 17059 / F1)
   225        477  Pyrococcus abyssi (strain GE5 / Orsay)
   226        476  Cupriavidus necator
   227        475  Burkholderia lata
   228        473  Campylobacter jejuni subsp. jejuni serotype O:2
   229        472  Rhodopseudomonas palustris (strain ATCC BAA-98 / CGA009)
   230        470  Clostridium perfringens (strain 13 / Type A)
   231        469  Enterococcus faecalis (strain ATCC 700802 / V583)
   232        469  Cereibacter sphaeroides
   233        468  Shewanella sp. (strain ANA-3)
   234        468  Pseudomonas putida (strain GB-1)
   235        467  Shewanella frigidimarina (strain NCIMB 400)
   236        467  Aeromonas hydrophila subsp. hydrophila
   237        466  Xanthomonas campestris pv. vesicatoria (strain 85-10)
   238        465  Trichormus variabilis (strain ATCC 29413 / PCC 7937) (Anabaena variabilis)
   239        463  Burkholderia mallei (strain ATCC 23344)
   240        461  Cupriavidus pinatubonensis (strain JMP 134 / LMG 1197) (Cupriavidus necator
   241        460  Methylococcus capsulatus (strain ATCC 33009 / NCIMB 11132 / Bath)
   242        460  Ovis aries (Sheep)
   243        457  Rickettsia felis (strain ATCC VR-1525 / URRWXCal2) (Rickettsia azadi)
   244        455  Shewanella baltica (strain OS185)
   245        455  Xanthomonas oryzae pv. oryzae (strain MAFF 311018)
   246        455  Staphylococcus aureus (strain JH1)
   247        453  Mycolicibacterium paratuberculosis (strain ATCC BAA-968 / K-10)
   248        453  Streptococcus mutans serotype c (strain ATCC 700610 / UA159)
   249        453  Pseudomonas putida (strain W619)
   250        452  Aeromonas salmonicida (strain A449)

2.3  Taxonomic distribution of the sequences

Kingdom      sequences  (% of the database)
Archaea          19710  (  3%)
Bacteria        336064  ( 59%)
Eukaryota       197016  ( 35%)
Viruses          17367  (  3%)

Within Eukaryota:

Category          sequences  (% of Eukaryota)  (% of the complete database)
Human                 20427  ( 10%)            (  4%)
Other Mammalia        47304  ( 24%)            (  8%)
Other Vertebrata      18918  ( 10%)            (  3%)
Viridiplantae         41669  ( 21%)            (  7%)
Fungi                 36557  ( 19%)            (  6%)
Insecta                9726  (  5%)            (  2%)
Nematoda               5375  (  3%)            (  1%)
Other                 17040  (  9%)            (  3%)

3.  SEQUENCE SIZE

Distribution of the sequences by size (excluding fragments)

     From   To   Number          From   To   Number
        1-  50     9963          1001-1100     4108
       51- 100    43516          1101-1200     2890
      101- 150    59790          1201-1300     2205
      151- 200    59561          1301-1400     2068
      201- 250    58425          1401-1500     1671
      251- 300    52384          1501-1600      831
      301- 350    52827          1601-1700      642
      351- 400    45880          1701-1800      586
      401- 450    37685          1801-1900      503
      451- 500    30550          1901-2000      395
      501- 550    22275          2001-2100      271
      551- 600    15818          2101-2200      386
      601- 650    13153          2201-2300      340
      651- 700     9396          2301-2400      234
      701- 750     7868          2401-2500      195
      751- 800     5694              >2500     1458
      801- 850     4888
      851- 900     5299
      901- 950     4109
      951-1000     3007
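
As a consistency check on the size table above, the bin counts should add up to the number of non-fragment entries, that is, the 570157 entries reported in the introduction minus the 9286 fragments. The following is a minimal Python sketch under that assumption; the names `SIZE_BINS`, `non_fragment_total` and `modal_bin` are illustrative only and are not part of any UniProt tool. The counts are copied from this release's figures.

```python
# Counts per sequence-length bin, copied from the size table (fragments excluded).
# Keys are (from, to); the open-ended ">2500" bin uses None as its upper bound.
SIZE_BINS = {
    (1, 50): 9963, (51, 100): 43516, (101, 150): 59790, (151, 200): 59561,
    (201, 250): 58425, (251, 300): 52384, (301, 350): 52827, (351, 400): 45880,
    (401, 450): 37685, (451, 500): 30550, (501, 550): 22275, (551, 600): 15818,
    (601, 650): 13153, (651, 700): 9396, (701, 750): 7868, (751, 800): 5694,
    (801, 850): 4888, (851, 900): 5299, (901, 950): 4109, (951, 1000): 3007,
    (1001, 1100): 4108, (1101, 1200): 2890, (1201, 1300): 2205,
    (1301, 1400): 2068, (1401, 1500): 1671, (1501, 1600): 831,
    (1601, 1700): 642, (1701, 1800): 586, (1801, 1900): 503,
    (1901, 2000): 395, (2001, 2100): 271, (2101, 2200): 386,
    (2201, 2300): 340, (2301, 2400): 234, (2401, 2500): 195,
    (2501, None): 1458,
}

TOTAL_ENTRIES = 570157  # from the introduction
FRAGMENTS = 9286        # from the introduction


def non_fragment_total(bins):
    """Sum the per-bin counts."""
    return sum(bins.values())


def modal_bin(bins):
    """Return the (from, to) key of the most populated bin."""
    return max(bins, key=bins.get)


if __name__ == "__main__":
    print("sum of bins:       ", non_fragment_total(SIZE_BINS))
    print("entries - fragments:", TOTAL_ENTRIES - FRAGMENTS)
    print("most populated bin: ", modal_bin(SIZE_BINS))
```

Note that the bins are 50 residues wide up to 1000 and 100 residues wide above that, so counts on either side of that boundary are not directly comparable.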

The average sequence length in UniProtKB/Swiss-Prot is 361 amino acids.

The shortest sequence is GWA_SEPOF (P83570): 2 amino acids.
The longest sequence is TITIN_MOUSE (A2ASS6): 35213 amino acids.

4.  JOURNAL CITATIONS

Note: the following citation statistics reflect the number of distinct
journal citations.

Total number of journals cited in this release of UniProtKB/Swiss-Prot: 3128

4.1  Table of the frequency of journal citations

Journals cited    1x:   1002
                  2x:    427
                  3x:    221
                  4x:    139
                  5x:    125
                  6x:     87
                  7x:     66
                  8x:     79
                  9x:     49
                 10x:     36
             11- 20x:    242
             21- 50x:    258
             51-100x:    140
               >100x:    257

4.2  List of the most cited journals in UniProtKB/Swiss-Prot

 Nb  Citations  Journal name
---  ---------  -------------------------------------------------------------
  1      26958  Journal of Biological Chemistry
  2      12569  Proceedings of the National Academy of Sciences of the U.S.A.
  3       7151  Journal of Bacteriology
  4       6009  Biochemical and Biophysical Research Communications
  5       5811  Biochemistry
  6       5293  Nucleic Acids Research
  7       5063  Nature
  8       5063  FEBS Letters
  9       4937  The EMBO Journal
 10       4885  Gene
 11       4568  Journal of Molecular Biology
 12       4553  Molecular and Cellular Biology
 13       3999  Biochimica et Biophysica Acta
 14       3832  Cell
 15       3574  Journal of Virology
 16       3502  European Journal of Biochemistry
 17       3349  Science
 18       3142  Biochemical Journal
 19       2831  Molecular Microbiology
 20       2804  Plant Physiology
 21       2555  PLoS ONE
 22       2546  Genomics
 23       2422  The American Journal of Human Genetics
 24       2342  Journal of Cell Biology
 25       2195  The Plant Cell
 26       2033  The Plant Journal
 27       2001  Human Molecular Genetics
 28       1937  Genes and Development
 29       1922  Plant Molecular Biology
 30       1896  Virology
 31       1842  Nature Genetics
 32       1814  Development
 33       1805  Molecular Biology of the Cell
 34       1777  Molecular Cell
 35       1678  Journal of Immunology
 36       1645  Human Mutation
 37       1565  Oncogene
 38       1451  Structure
 39       1426  Molecular and General Genetics
 40       1411  Genetics
 41       1409  Journal of Biochemistry
 42       1372  Journal of Cell Science
 43       1269  Blood
 44       1259  Infection and Immunity
 45       1184  Journal of General Virology
 46       1181  Developmental Biology
 47       1170  Microbiology
 48       1144  Archives of Biochemistry and Biophysics
 49       1143  Nature Communications
 50       1133  Current Biology
 51       1015  Journal of Neuroscience
 52       1014  Applied and Environmental Microbiology
 53        991  Acta Crystallographica, Section D
 54        924  Cancer Research
 55        905  FEMS Microbiology Letters
 56        886  Toxicon
 57        886  PLoS Genetics
 58        877  American Journal of Physiology
 59        858  Protein Science
 60        849  Journal of Clinical Investigation
 61        847  Yeast
 62        841  Scientific Reports
 63        816  Neuron
 64        763  Plant and Cell Physiology
 65        751  Human Genetics
 66        751  The Journal of Experimental Medicine
 67        700  Journal of Medical Genetics
 68        691  Proteins
 69        671  The FEBS Journal
 70        671  Mechanisms of Development
 71        649  Nature Structural Biology
 72        642  Nature Structural and Molecular Biology
 73        635  PLoS Pathogens
 74        634  Nature Cell Biology
 75        622  Bioscience, Biotechnology, and Biochemistry
 76        589  Current Genetics
 77        582  Developmental Cell
 78        573  Journal of Neurochemistry
 79        552  Molecular Endocrinology
 80        549  The Journal of Clinical Endocrinology and Metabolism
 81        542  Antimicrobial Agents and Chemotherapy
 82        539  Endocrinology
 83        510  Molecular and Biochemical Parasitology
 84        500  Journal of the American Chemical Society
 85        495  Mammalian Genome
 86        489  Experimental Cell Research
 87        475  Peptides
 88        475  Eukaryotic Cell
 89        465  RNA
 90        463  Journal of Experimental Botany
 91        462  Cell Reports
 92        455  Planta
 93        445  EMBO Reports
 94        443  American Journal of Medical Genetics. Part A
 95        441  The FASEB Journal
 96        434  Immunogenetics
 97        431  Molecular Pharmacology
 98        421  Acta Crystallographica, Section F
 99        417  Molecular Biology and Evolution
100        413  European Journal of Human Genetics
101        407  Journal of Molecular Evolution
102        407  Immunity
103        404  Molecular Plant-Microbe Interactions
104        396  Journal of Investigative Dermatology
105        394  DNA and Cell Biology
106        390  Neurology
107        388  Clinical Genetics
108        380  DNA Sequence
109        378  Biochimie
110        376  Biology of Reproduction
111        369  Comparative Biochemistry and Physiology
112        360  Virus Research
113        358
114        356  Genes to Cells
115        346  Journal of Lipid Research
116        342  Brain Research. Molecular Brain Research
117        338  Nature Immunology
118        337  The New England Journal of Medicine
119        335  Developmental Dynamics
120        331  Annals of Neurology
121        327  PLoS Biology
122        325  BMC Genomics
123        319  Applied Microbiology and Biotechnology
124        314  European Journal of Immunology
125        310  Journal of Medicinal Chemistry
126        309  Genome Research
127        307  Investigative Ophthalmology and Visual Science
128        299  Biological Chemistry Hoppe-Seyler
129        296  Journal of Human Genetics
130        281  Cytogenetics and Cell Genetics
131        280  Glycobiology
132        280  Journal of General Microbiology
133        277  Archives of Microbiology
134        261  Brain
135        261  Nature Chemical Biology
136        258  Traffic
137        258  Phytochemistry
138        255  Molecular Genetics and Metabolism
139        252  Molecular Immunology
140        252  Protein Expression and Purification
141        251  Nature Medicine
142        248  Journal of Cellular Biochemistry
143        247  Fungal Genetics and Biology
144        240  Cell Cycle
145        234  DNA Research
146        233  Circulation Research
147        233  Diabetes
148        227  Archives of Virology
149        223  Cell Research
150        222  Journal of Structural Biology

5.  STATISTICS FOR SOME LINE TYPES

The following table summarizes the total number of some UniProtKB/Swiss-Prot
lines, as well as the number of entries with at least one such line, and the
frequency of the lines.

                                     Total Number of    Average
Line type / subtype                 number   entries  per entry  Rank
--------------------------------  -------- ---------  ---------  ----
References (RL)                    1299566                 2.28
  Journal                          1126189    472580       1.98     1
  Submitted to EMBL/GenBank/DDBJ    161948    146088       0.28     2
  Submitted to other databases        7763      7101       0.01     3
  Book citation                       1866      1843      <0.01     4
  Plant Gene Register                  613       600      <0.01     5
  Unpublished observations             510       506      <0.01     6
  Thesis                               457       454      <0.01     7
  Patent                               214       207      <0.01     8
  Worm Breeder's Gazette                 6         6      <0.01     9

Total number of distinct authors cited in UniProtKB/Swiss-Prot: 462640

                                     Total Number of    Average
Line type / subtype                 number   entries  per entry  Rank
--------------------------------  -------- ---------  ---------  ----
Comments (CC)                      2726384                 4.78
  ACTIVITY REGULATION                18288     18171       0.03    17
  ALLERGEN                             947       947      <0.01    26
  ALTERNATIVE PRODUCTS               25843     25843       0.05    13
  BIOPHYSICOCHEMICAL PROPERTIES      11265     11217       0.02    20
  BIOTECHNOLOGY                       1851      1792      <0.01    24
  CATALYTIC ACTIVITY                337165    253712       0.59     4
  CAUTION                            14365     14065       0.03    18
  COFACTOR                          132734    120474       0.23     7
  DEVELOPMENTAL STAGE                14150     14069       0.02    19
  DISEASE                             8273      5568       0.01    21
  DISRUPTION PHENOTYPE               20272     20242       0.04    16
  DOMAIN                             57963     49411       0.10     9
  FUNCTION                          488435    464395       0.86     2
  INDUCTION                          25347     25259       0.04    14
  INTERACTION                        23985     23985       0.04    15
  MASS SPECTROMETRY                   7511      5796       0.01    22
  MISCELLANEOUS                      45630     40088       0.08    11
  PATHWAY                           143743    129772       0.25     6
  PHARMACEUTICAL                       166       159      <0.01    29
  POLYMORPHISM                        1487      1359      <0.01    25
  PTM                                64042     45664       0.11     8
  RNA EDITING                          634       634      <0.01    28
  SEQUENCE CAUTION                   45159     45089       0.08    12
  SIMILARITY                        518702    514408       0.91     1
  SUBCELLULAR LOCATION              364166    355704       0.64     3
  SUBUNIT                           296071    290828       0.52     5
  TISSUE SPECIFICITY                 50781     50253       0.09    10
  TOXIC DOSE                           851       680      <0.01    27
  WEB RESOURCE                        6558      5553       0.01    23

Total number of comment topics: 29

                                     Total Number of    Average
Line type / subtype                 number   entries  per entry  Rank
--------------------------------  -------- ---------  ---------  ----
Features (FT)                      5292942                 9.28
  ACT_SITE                          176467    104958       0.31     9
  BINDING                          1189210    216459       2.09     1
  CARBOHYD                          123798     31439       0.22    14
  CHAIN                             578456    562525       1.01     2
  COILED                             22445     15541       0.04    25
  COMPBIAS                          174388     74082       0.31    10
  CONFLICT                          138882     48427       0.24    12
  CROSSLNK                           24867      8946       0.04    24
  DISULFID                          134823     35864       0.24    13
  DNA_BIND                           12157     10882       0.02    31
  DOMAIN                            214539    131107       0.38     8
  HELIX                             331916     28903       0.58     5
  INIT_MET                           17535     17486       0.03    26
  INTRAMEM                            3013      1384       0.01    34
  LIPID                              13784      8844       0.02    28
  MOD_RES                           261285     74425       0.46     7
  MOTIF                              47460     30924       0.08    21
  MUTAGEN                            94580     19523       0.17    17
  NON_CONS                            2552       826      <0.01    35
  NON_STD                              358       283      <0.01    36
  NON_TER                            12618      9690       0.02    29
  PEPTIDE                            12607      8719       0.02    30
  PROPEP                             15250     12994       0.03    27
  REGION                            320024    149772       0.56     6
  REPEAT                            109011     15152       0.19    15
  SIGNAL                             44161     44160       0.08    22
  SITE                               64935     35233       0.11    19
  STRAND                            338259     27248       0.59     4
  TOPO_DOM                          149612     30364       0.26    11
  TRANSIT                             9523      9403       0.02    32
  TRANSMEM                          381189     79862       0.67     3
  TURN                               80219     23550       0.14    18
  UNSURE                              5744       892       0.01    33
  VAR_SEQ                            53056     22606       0.09    20
  VARIANT                           103573     17485       0.18    16
  ZN_FING                            30646     13089       0.05    23

Total number of feature keys: 36

                          Total Number of    Average
Line type / subtype      number   entries  per entry  Rank  Category
---------------------  -------- ---------  ---------  ----  -------------------------------------
Cross-references (DR)  20592885                36.12
  ABCD                     3063      3063       0.01   122  Protocols and materials databases
  AGR                     60776     60107       0.11    43  Organism-specific databases
  Allergome                2036      1309      <0.01   129  Protein family/group databases
  AlphaFoldDB            546152    546152       0.96     9  3D structure databases
  Antibodypedia           32277     32168       0.06    61  Protocols and materials databases
  ArachnoServer            1164      1154      <0.01   139  Organism-specific databases
  Araport                 16390     16294       0.03    91  Organism-specific databases
  Bgee                    61341     61339       0.11    41  Gene expression databases
  BindingDB                6417      6417       0.01   107  Chemistry databases
  BioCyc                  47988     43951       0.08    52  Enzyme and pathway databases
  BioGRID                 61209     59317       0.11    42  Protein-protein interaction databases
  BioGRID-ORCS            44919     44336       0.08    54  Miscellaneous databases
  BioMuta                 20308     20283       0.04    75  Genetic variation databases
  BMRB                     6905      6905       0.01   105  3D structure databases
  BRENDA                  20299     18498       0.04    76  Enzyme and pathway databases
  CarbonylDB               1159      1159      <0.01   140  PTM databases
  CAZy                     9603      8650       0.02    98  Protein family/group databases
  CCDS                    49435     34604       0.09    50  Sequence databases
  CDD                    382467    300618       0.67    16  Family and domain databases
  CGD                      2065      2048      <0.01   128  Organism-specific databases
  ChEMBL                   8871      8670       0.02    99  Chemistry databases
  ChiTaRS                 29738     29694       0.05    63  Miscellaneous databases
  CLAE                      359       356      <0.01   155  Protein family/group databases
  CollecTF                  137       137      <0.01   162  Gene expression databases
  ComplexPortal           13251      7253       0.02    96  Protein-protein interaction databases
  COMPLUYEAST-2DPAGE         97        97      <0.01   164  2D gel databases
  ConoServer                967       879      <0.01   142  Organism-specific databases
  CORUM                    5811      5811       0.01   108  Protein-protein interaction databases
  CPTAC                    3472      1929       0.01   118  Proteomic databases
  CPTC                      384       384      <0.01   153  Protocols and materials databases
  CTD                     76276     75365       0.13    39  Organism-specific databases
  DEPOD                     254       254      <0.01   160  PTM databases
  dictyBase                4224      4110       0.01   115  Organism-specific databases
  DIP                     17537     17497       0.03    87  Protein-protein interaction databases
  DisGeNET                17012     16794       0.03    89  Organism-specific databases
  DisProt                  1736      1728      <0.01   131  Family and domain databases
  DMDM                    16170     16170       0.03    93  Genetic variation databases
  DNASU                   48326     48248       0.08    51  Protocols and materials databases
  DOSAC-COBS-2DPAGE         145       145      <0.01   161  2D gel databases
  DrugBank                31175      4771       0.05    62  Chemistry databases
  DrugCentral              2565      2565      <0.01   124  Chemistry databases
  EchoBASE                 4158      4158       0.01   116  Organism-specific databases
  eggNOG                 338896    333062       0.59    18  Phylogenomic databases
  ELM                      1814      1814      <0.01   130  Protein-protein interaction databases
  EMBL                  1004169    557433       1.76     3  Sequence databases
  Ensembl                 98515     48460       0.17    35  Genome annotation databases
  EnsemblBacteria        309955    298424       0.54    20  Genome annotation databases
  EnsemblFungi            23047     22605       0.04    69  Genome annotation databases
  EnsemblMetazoa          18857     11444       0.03    82  Genome annotation databases
  EnsemblPlants           35484     22109       0.06    58  Genome annotation databases
  EnsemblProtists          5298      5043       0.01   111  Genome annotation databases
  EPD                     23251     23251       0.04    67  Proteomic databases
  ESTHER                   2981      2980       0.01   123  Protein family/group databases
  euHCVdb                    55        44      <0.01   166  Organism-specific databases
  EvolutionaryTrace       16760     16760       0.03    90  Miscellaneous databases
  ExpressionAtlas         52825     52825       0.09    48  Gene expression databases
  FlyBase                  4126      4011       0.01   117  Organism-specific databases
  Gene3D                 738284    458298       1.29     6  Family and domain databases
  GeneCards               20377     20243       0.04    72  Organism-specific databases
  GeneID                 274931    267231       0.48    24  Genome annotation databases
  GeneReviews              1578      1575      <0.01   132  Organism-specific databases
  GeneTree                57081     57072       0.10    44  Phylogenomic databases
  Genevisible             55276     55276       0.10    46  Gene expression databases
  GeneWiki                10351     10269       0.02    97  Miscellaneous databases
  GenomeRNAi              22256     22256       0.04    70  Miscellaneous databases
  GlyConnect               2372      2215      <0.01   125  PTM databases
  GlyCosmos               28902     28902       0.05    64  PTM databases
  GlyGen                  21596     21596       0.04    71  PTM databases
  GO                    3165709    546390       5.55     1  Ontologies
  Gramene                 35484     22109       0.06    59  Genome annotation databases
  GuidetoPHARMACOLOGY      2179      2179      <0.01   127  Chemistry databases
  HAMAP                  330866    327933       0.58    19  Family and domain databases
  HGNC                    20372     20242       0.04    73  Organism-specific databases
  HOGENOM                426683    426683       0.75    15  Phylogenomic databases
  HPA                     19323     19203       0.03    81  Organism-specific databases
  IDEAL                     986       986      <0.01   141  Family and domain databases
  IMGT_GENE-DB              267       267      <0.01   159  Protein family/group databases
  InParanoid             163517    163517       0.29    27  Phylogenomic databases
  IntAct                  56812     56812       0.10    45  Protein-protein interaction databases
  InterPro              2399884    550928       4.21     2  Family and domain databases
  iPTMnet                 54156     54156       0.09    47  PTM databases
  jPOST                   26408     26408       0.05    65  Proteomic databases
  KEGG                   503579    478355       0.88    12  Genome annotation databases
  LegioList                 765       763      <0.01   147  Organism-specific databases
  Leproma                   672       669      <0.01   148  Organism-specific databases
  MaizeGDB                  529       525      <0.01   150  Organism-specific databases
  MalaCards                5619      5610       0.01   109  Organism-specific databases
  MANE-Select             18373     18261       0.03    85  Genome annotation databases
  MassIVE                 18715     18715       0.03    83  Proteomic databases
  MaxQB                   33722     33722       0.06    60  Proteomic databases
  MEROPS                  14193     13775       0.02    94  Protein family/group databases
  MetOSite                 3111      3111       0.01   121  PTM databases
  MGI                     17088     17047       0.03    88  Organism-specific databases
  MIM                     23196     16030       0.04    68  Organism-specific databases
  MINT                    23452     23452       0.04    66  Protein-protein interaction databases
  MoonDB                    348       348      <0.01   156  Protein family/group databases
  MoonProt                  281       281      <0.01   158  Protein family/group databases
  NCBIfam                299700    277265       0.53    21  Family and domain databases
  neXtProt                20324     20324       0.04    74  Organism-specific databases
  NIAGADS                    69        69      <0.01   165  Organism-specific databases
  OGP                       373       373      <0.01   154  2D gel databases
  OMA                    430195    430195       0.75    14  Phylogenomic databases
  OpenTargets             18424     18279       0.03    84  Organism-specific databases
  Orphanet                 8144      4374       0.01   101  Organism-specific databases
  OrthoDB                274783    274783       0.48    25  Phylogenomic databases
  PANTHER               1002474    501638       1.76     4  Family and domain databases
  PathwayCommons          19454     19454       0.03    80  Enzyme and pathway databases
  PATRIC                  92912     92912       0.16    37  Genome annotation databases
  PaxDb                  131498    131498       0.23    31  Proteomic databases
  PCDDB                     127       127      <0.01   163  3D structure databases
  PDB                    275445     34119       0.48    23  3D structure databases
  PDBsum                 275445     34119       0.48    22  3D structure databases
  PeptideAtlas            39451     39451       0.07    57  Proteomic databases
  PeroxiBase                792       771      <0.01   144  Protein family/group databases
  Pfam                   821789    538883       1.44     5  Family and domain databases
  PharmGKB                18033     18014       0.03    86  Organism-specific databases
  Pharos                  20224     20224       0.04    78  Miscellaneous databases
  PHI-base                 1536      1274      <0.01   133  Miscellaneous databases
  PhosphoSitePlus         39628     39628       0.07    56  PTM databases
  PhylomeDB              115454    115454       0.20    33  Phylogenomic databases
  PIR                    125023    114704       0.22    32  Sequence databases
  PIRSF                  110842    109675       0.19    34  Family and domain databases
  PlantReactome            1278       750      <0.01   136  Enzyme and pathway databases
  PomBase                  5129      5125       0.01   112  Organism-specific databases
  PRIDE                     636       636      <0.01   149  Proteomic databases
  PRINTS                 150615    129354       0.26    28  Family and domain databases
  PRO                     98140     98140       0.17    36  Miscellaneous databases
  ProMEX                    486       486      <0.01   152  Proteomic databases
  PROSITE                490433    310432       0.86    13  Family and domain databases
  Proteomes              505141    462035       0.89    11  Miscellaneous databases
  ProteomicsDB            72660     45354       0.13    40  Proteomic databases
  PseudoCAP                1460      1451      <0.01   135  Organism-specific databases
  Reactome               141328     37969       0.25    29  Enzyme and pathway databases
  REBASE                    790       395      <0.01   145  Protein family/group databases
  RefSeq                 556904    426108       0.98     8  Sequence databases
  REPRODUCTION-2DPAGE      1260      1039      <0.01   137  2D gel databases
  RGD                      8112      8111       0.01   102  Organism-specific databases
  RNAct                   43099     43099       0.08    55  Miscellaneous databases
  SABIO-RK                 5578      5578       0.01   110  Enzyme and pathway databases
  SASBDB                    774       774      <0.01   146  3D structure databases
  SFLD                    20265      9042       0.04    77  Family and domain databases
  SGD                      6746      6741       0.01   106  Organism-specific databases
  SignaLink               19959     19959       0.04    79  Enzyme and pathway databases
  SIGNOR                   7262      7262       0.01   103  Enzyme and pathway databases
  SMART                  205240    148124       0.36    26  Family and domain databases
  SMR                    514224    514224       0.90    10  3D structure databases
  STRING                 366524    366524       0.64    17  Protein-protein interaction databases
  SUPFAM                 647395    458882       1.14     7  Family and domain databases
  SWISS-2DPAGE             1177      1177      <0.01   138  2D gel databases
  SwissLipids              1478      1394      <0.01   134  Chemistry databases
  SwissPalm               13335     13335       0.02    95  PTM databases
  TAIR                    16364     16278       0.03    92  Organism-specific databases
  TCDB                     8416      8342       0.01   100  Protein family/group databases
  TopDownProteomics        3236      2957       0.01   120  Proteomic databases
  TreeFam                 46115     46092       0.08    53  Phylogenomic databases
  TubercuList              2325      2289      <0.01   126  Organism-specific databases
  UCD-2DPAGE                496       496      <0.01   151  2D gel databases
  UCSC                    50826     46366       0.09    49  Genome annotation databases
  UniLectin                 315       315      <0.01   157  Protein family/group databases
  UniPathway             139766    126137       0.25    30  Enzyme and pathway databases
  VEuPathDB               80969     74545       0.14    38  Organism-specific databases
  VGNC                     4494      4480       0.01   114  Organism-specific databases
  WBParaSite                 49        47      <0.01   167  Genome annotation databases
  World-2DPAGE              935       923      <0.01   143  2D gel databases
  WormBase                 6943      5061       0.01   104  Organism-specific databases
  Xenbase                  4836      4836       0.01   113  Organism-specific databases
  ZFIN                     3269      3269       0.01   119  Organism-specific databases

Total number of cross-referenced databases: 167

6.  AMINO ACID COMPOSITION

6.1  Composition in percent for the complete database

Ala (A) 8.25   Gln (Q) 3.93   Leu (L) 9.65   Ser (S) 6.65
Arg (R) 5.53   Glu (E) 6.72   Lys (K) 5.80   Thr (T) 5.36
Asn (N) 4.06   Gly (G) 7.07   Met (M) 2.41   Trp (W) 1.10
Asp (D) 5.46   His (H) 2.27   Phe (F) 3.86   Tyr (Y) 2.92
Cys (C) 1.38   Ile (I) 5.91   Pro (P) 4.74   Val (V) 6.85

Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.00

Legend: gray = aliphatic, red = acidic, green = small hydroxy, blue = basic,
black = aromatic, white = amide, yellow = sulfur

6.2  Classification of the amino acids by their frequency

Leu, Ala, Gly, Val, Glu, Ser, Ile, Lys, Arg, Asp, Thr, Pro, Asn, Gln, Phe,
Tyr, Met, His, Cys, Trp

7.  MISCELLANEOUS STATISTICS

4466 entries are encoded on a mitochondrion, and 3997 are encoded on a
plasmid.

12199 entries are encoded on a plastid, of which 21 are encoded on
apicoplasts, 11634 on chloroplasts, 51 on organellar chromatophores, 145 on
cyanelles, 149 on non-photosynthetic plastids and 199 on unspecified types
of plastid.

Number of entries with at least one sequence correction: 81063
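
Two of the figures in sections 6 and 7 can be re-derived from numbers given in this report: the frequency ranking of section 6.2 follows from sorting the percentages of section 6.1, and the plastid subtypes of section 7 should add up to the plastid total. The Python sketch below illustrates both checks; the names `COMPOSITION`, `PLASTID_SUBTYPES` and `rank_by_frequency` are hypothetical and chosen only for this example, and the values are copied from the text above.

```python
# Percent composition copied from section 6.1. The ambiguity codes
# (Asx, Glx, Xaa) are omitted because they are reported as zero.
COMPOSITION = {
    "Ala": 8.25, "Gln": 3.93, "Leu": 9.65, "Ser": 6.65,
    "Arg": 5.53, "Glu": 6.72, "Lys": 5.80, "Thr": 5.36,
    "Asn": 4.06, "Gly": 7.07, "Met": 2.41, "Trp": 1.10,
    "Asp": 5.46, "His": 2.27, "Phe": 3.86, "Tyr": 2.92,
    "Cys": 1.38, "Ile": 5.91, "Pro": 4.74, "Val": 6.85,
}

# Plastid figures copied from section 7.
PLASTID_TOTAL = 12199
PLASTID_SUBTYPES = {
    "apicoplast": 21,
    "chloroplast": 11634,
    "organellar chromatophore": 51,
    "cyanelle": 145,
    "non-photosynthetic plastid": 149,
    "unspecified": 199,
}


def rank_by_frequency(composition):
    """Return residue names ordered from most to least frequent."""
    return sorted(composition, key=composition.get, reverse=True)


if __name__ == "__main__":
    print(", ".join(rank_by_frequency(COMPOSITION)))
    print("plastid subtypes sum:", sum(PLASTID_SUBTYPES.values()))
    print("reported plastid total:", PLASTID_TOTAL)
```

Because the published percentages are rounded to two decimals, they are not expected to sum to exactly 100.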