UniProtKB/Swiss-Prot protein knowledgebase release 2024_01 statistics

1.  INTRODUCTION

Release 2024_01 of 24-Jan-2024 of UniProtKB/Swiss-Prot contains 570830 sequence
entries, curated from 296829 unique references and comprising 206533160 amino
acids.

415 sequences have been added since release 2023_05, the sequence data of 169
existing entries has been updated and the annotations of 433476 entries have
been revised.

Number of fragments: 9300
Number of additional sequences produced by alternative splicing, initiation
or promoter usage, or ribosomal frameshifting: 41105

Protein existence (PE):            entries      %
1: Evidence at protein level        113649  19.9%
2: Evidence at transcript level      55837   9.8%
3: Inferred from homology           386501  67.7%
4: Predicted                         13019   2.3%
5: Uncertain                          1824   0.3%

The growth of the database is summarized below.

2.  TAXONOMIC ORIGIN

Total number of species represented in this release of UniProtKB/Swiss-Prot: 14570

The first twenty species represent 122964 sequences: 21.5% of the total number
of entries.

2.1  Table of the frequency of occurrence of species

Species represented      1x:  5941
                         2x:  2115
                         3x:  1135
                         4x:   773
                         5x:   534
                         6x:   439
                         7x:   330
                         8x:   277
                         9x:   239
                        10x:   158
                    11- 20x:   839
                    21- 50x:   505
                    51-100x:   228
                      >100x:  1057

2.2  Table of the most represented species

------  ---------  --------------------------------------------
Number  Frequency  Species
------  ---------  --------------------------------------------
     1      20433  Homo sapiens (Human)
     2      17201  Mus musculus (Mouse)
     3      16381  Arabidopsis thaliana (Mouse-ear cress)
     4       8188  Rattus norvegicus (Rat)
     5       6727  Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (Baker's yeast)
     6       6041  Bos taurus (Bovine)
     7       5121  Schizosaccharomyces pombe (strain 972 / ATCC 24843) (Fission yeast)
     8       4530  Escherichia coli (strain K12)
     9       4465  Caenorhabditis elegans
    10       4191  Bacillus subtilis (strain 168)
    11       4186  Oryza sativa subsp. japonica (Rice)
    12       4159  Dictyostelium discoideum (Social amoeba)
    13       3753  Drosophila melanogaster (Fruit fly)
    14       3495  Xenopus laevis (African clawed frog)
    15       3314  Danio rerio (Zebrafish) (Brachydanio rerio)
    16       2308  Gallus gallus (Chicken)
    17       2308  Mycobacterium tuberculosis (strain ATCC 25618 / H37Rv)
    18       2218  Pongo abelii (Sumatran orangutan) (Pongo pygmaeus abelii)
    19       2046  Escherichia coli O157:H7
    20       1899  Mycobacterium tuberculosis (strain CDC 1551 / Oshkosh)
    21       1821  Salmonella typhimurium (strain LT2 / SGSC1412 / ATCC 700720)
    22       1787  Methanocaldococcus jannaschii
    23       1710  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
    24       1704  Haemophilus influenzae (strain ATCC 51907 / DSM 11121 / KW20 / Rd)
    25       1702  Escherichia coli O6:H1 (strain CFT073 / ATCC 700928 / UPEC)
    26       1696  Shigella flexneri
    27       1460  Pseudomonas aeruginosa
    28       1458  Sus scrofa (Pig)
    29       1348  Salmonella typhi
    30       1244  Mycobacterium bovis (strain ATCC BAA-935 / AF2122/97)
    31       1176  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)
    32       1143  Candida albicans (strain SC5314 / ATCC MYA-2876) (Yeast)
    33       1096  Synechocystis sp. (strain PCC 6803 / Kazusa)
    34       1036  Archaeoglobus fulgidus
    35       1030  Yersinia pestis
    36       1005  Emericella nidulans
    37        997  Vibrio cholerae serotype O1 (strain ATCC 39315 / El Tor Inaba N16961)
    38        960  Neurospora crassa
    39        941  Staphylococcus aureus (strain Mu50 / ATCC 700699)
    40        930  Salmonella paratyphi A (strain ATCC 9150 / SARB42)
    41        929  Staphylococcus aureus (strain N315)
    42        928  Eremothecium gossypii
    43        919  Kluyveromyces lactis
    44        909  Acanthamoeba polyphaga mimivirus (APMV)
    45        905  Staphylococcus aureus (strain COL)
    46        902  Oryctolagus cuniculus (Rabbit)
    47        902  Aspergillus fumigatus (strain ATCC MYA-4609 / CBS 101355 / FGSC A1100 / Af293)
    48        896  Staphylococcus aureus (strain MW2)
    49        894  Escherichia coli O6:K15:H31 (strain 536 / UPEC)
    50        890  Staphylococcus aureus (strain MSSA476)
    51        888  Staphylococcus aureus (strain MRSA252)
    52        888  Candida glabrata
    53        887  Rhizobium meliloti (strain 1021) (Ensifer meliloti) (Sinorhizobium meliloti)
    54        882  Salmonella choleraesuis (strain SC-B67)
    55        879  Shigella sonnei (strain Ss046)
    56        867  Oryza sativa subsp. indica (Rice)
    57        863  Yersinia pseudotuberculosis serotype I (strain IP32953)
    58        849  Zea mays (Maize)
    59        847  Escherichia coli O9:H4 (strain HS)
    60        847  Canis lupus familiaris (Dog) (Canis familiaris)
    61        838  Escherichia coli O139:H28 (strain E24377A / ETEC)
    62        829  Shigella boydii serotype 4 (strain Sb227)
    63        825  Escherichia coli (strain UTI89 / UPEC)
    64        822  Shigella dysenteriae serotype 1 (strain Sd197)
    65        822  Escherichia coli
    66        816  Streptomyces coelicolor (strain ATCC BAA-471 / A3(2) / M145)
    67        810  Staphylococcus aureus (strain NCTC 8325 / PS 47)
    68        803  Pectobacterium atrosepticum (strain SCRI 1043 / ATCC BAA-672)
    69        796  Vibrio parahaemolyticus serotype O3:K6 (strain RIMD 2210633)
    70        791  Escherichia coli (strain SMS-3-5 / SECEC)
    71        788  Aquifex aeolicus (strain VF5)
    72        779  Escherichia coli O127:H6 (strain E2348/69 / EPEC)
    73        771  Escherichia coli (strain K12 / DH10B)
    74        770  Pasteurella multocida (strain Pm70)
    75        766  Escherichia coli O17:K52:H18 (strain UMN026 / ExPEC)
    76        765  Escherichia coli (strain K12 / MC4100 / BW2952)
    77        762  Escherichia coli (strain 55989 / EAEC)
    78        761  Escherichia coli O8 (strain IAI1)
    79        760  Staphylococcus epidermidis (strain ATCC 35984 / RP62A)
    80        760  Staphylococcus epidermidis (strain ATCC 12228 / FDA PCI 1200)
    81        760  Shigella flexneri serotype 5b (strain 8401)
    82        759  Escherichia coli O45:K1 (strain S88 / ExPEC)
    83        758  Bacillus anthracis
    84        756  Escherichia coli (strain SE11)
    85        753  Escherichia coli O7:K1 (strain IAI39 / ExPEC)
    86        749  Photorhabdus laumondii subsp. laumondii (strain DSM 15139 / CIP 105565 / TT01)
    87        748  Escherichia coli O157:H7 (strain EC4115 / EHEC)
    88        744  Halalkalibacterium halodurans
    89        739  Yersinia enterocolitica serotype O:8 / biotype 1B (strain NCTC 13174 / 8081)
    90        734  Pseudomonas putida
    91        733  Vibrio vulnificus (strain CMCP6)
    92        731  Escherichia coli O81 (strain ED1a)
    93        722  Salmonella enteritidis PT4 (strain P125109)
    94        720  Escherichia coli
    95        718  Vibrio vulnificus (strain YJ016)
    96        716  Salmonella paratyphi B (strain ATCC BAA-1250 / SPB7)
    97        715  Yersinia pestis bv. Antiqua (strain Nepal516)
    98        715  Klebsiella pneumoniae subsp. pneumoniae (strain ATCC 700721 / MGH 78578)
    99        715  Escherichia coli O1:K1 / APEC
   100        715  Enterobacter sp. (strain 638)
   101        714  Salmonella paratyphi A (strain AKU_12601)
   102        713  Salmonella newport (strain SL254)
   103        713  Yersinia pseudotuberculosis serotype O:1b (strain IP 31758)
   104        713  Salmonella agona (strain SL483)
   105        712  Salmonella schwarzengrund (strain CVM19633)
   106        711  Yersinia pestis bv. Antiqua (strain Antiqua)
   107        710  Salmonella heidelberg (strain SL476)
   108        707  Nostoc sp. (strain PCC 7120 / SAG 25.82 / UTEX 2576)
   109        702  Salmonella dublin (strain CT_02021853)
   110        699  Klebsiella pneumoniae (strain 342)
   111        698  Shigella boydii serotype 18 (strain CDC 3083-94 / BS512)
   112        695  Escherichia fergusonii
   113        692  Pan troglodytes (Chimpanzee)
   114        686  Mycoplasma pneumoniae (strain ATCC 29342 / M129 / Subtype 1)
   115        684  Salmonella gallinarum (strain 287/91 / NCTC 13346)
   116        683  Pseudomonas syringae pv. tomato (strain ATCC BAA-871 / DC3000)
   117        679  Citrobacter koseri (strain ATCC BAA-895 / CDC 4225-83 / SGSC4696)
   118        679  Staphylococcus aureus (strain USA300)
   119        672  Serratia proteamaculans (strain 568)
   120        669  Bacillus cereus
   121        669  Mycobacterium leprae (strain TN)
   122        667  Yersinia pestis (strain Pestoides F)
   123        666  Yarrowia lipolytica (strain CLIB 122 / E 150) (Yeast) (Candida lipolytica)
   124        666  Bradyrhizobium diazoefficiens
   125        663  Agrobacterium fabrum (strain C58 / ATCC 33970) (Agrobacterium tumefaciens
   126        659  Shewanella oneidensis (strain MR-1)
   127        658  Sinorhizobium fredii (strain NBRC 101917 / NGR234)
   128        653  Debaryomyces hansenii
   129        643  Staphylococcus aureus (strain bovine RF122 / ET3-1)
   130        642  Yersinia pseudotuberculosis serotype O:3 (strain YPIII)
   131        642  Salmonella arizonae (strain ATCC BAA-731 / CDC346-86 / RSK2980)
   132        634  Yersinia pseudotuberculosis serotype IB (strain PB1/+)
   133        622  Listeria monocytogenes serovar 1/2a (strain ATCC BAA-679 / EGD-e)
   134        622  Methanothermobacter thermautotrophicus
   135        622  Cronobacter sakazakii (strain ATCC BAA-894) (Enterobacter sakazakii)
   136        622  Treponema pallidum (strain Nichols)
   137        620  Pseudomonas aeruginosa (strain UCBPP-PA14)
   138        615  Xanthomonas campestris pv. campestris
   139        614  Staphylococcus haemolyticus (strain JCSC1435)
   140        613  Mesorhizobium japonicum (Mesorhizobium loti
   141        612  Helicobacter pylori (strain ATCC 700392 / 26695) (Campylobacter pylori)
   142        605  Listeria innocua serovar 6a (strain ATCC BAA-680 / CLIP 11262)
   143        603  Ralstonia nicotianae (strain GMI1000) (Ralstonia solanacearum)
   144        602  Staphylococcus saprophyticus subsp. saprophyticus
   145        602  Photobacterium profundum (strain SS9)
   146        601  Salmonella paratyphi C (strain RKS4594)
   147        600  Yersinia pestis bv. Antiqua (strain Angola)
   148        595  Bacillus cereus (strain ATCC 10987 / NRS 248)
   149        591  Pectobacterium carotovorum subsp. carotovorum (strain PC1)
   150        588  Neisseria meningitidis serogroup B (strain MC58)
   151        587  Mycolicibacterium smegmatis (strain ATCC 700084 / mc(2)155)
   152        584  Rickettsia prowazekii (strain Madrid E)
   153        582  Caenorhabditis briggsae
   154        579  Brucella suis biovar 1 (strain 1330)
   155        576  Brucella melitensis biotype 1 (strain 16M / ATCC 23456 / NCTC 10094)
   156        575  Caulobacter vibrioides (strain ATCC 19089 / CB15) (Caulobacter crescentus)
   157        573  Aliivibrio fischeri (strain ATCC 700601 / ES114) (Vibrio fischeri)
   158        572  Buchnera aphidicola subsp. Acyrthosiphon pisum (strain APS)
   159        571  Aspergillus oryzae (strain ATCC 42149 / RIB 40) (Yellow koji mold)
   160        569  Bacillus thuringiensis subsp. konkukian (strain 97-27)
   161        568  Pseudomonas syringae pv. syringae (strain B728a)
   162        568  Helicobacter pylori (strain J99 / ATCC 700824) (Campylobacter pylori J99)
   163        565  Bacillus licheniformis
   164        564  Thermotoga maritima
   165        562  Bacillus cereus (strain ZK / E33L)
   166        562  Buchnera aphidicola subsp. Schizaphis graminum (strain Sg)
   167        559  Clostridium acetobutylicum
   168        557  Xanthomonas axonopodis pv. citri (strain 306)
   169        555  Pseudomonas fluorescens (strain Pf0-1)
   170        554  Neisseria meningitidis serogroup A / serotype 4A (strain DSM 15465 / Z2491)
   171        554  Pseudomonas fluorescens (strain ATCC BAA-477 / NRRL B-23932 / Pf-5)
   172        553  Oceanobacillus iheyensis
   173        547  Pseudomonas savastanoi pv. phaseolicola (Pseudomonas syringae pv. phaseolicola
   174        543  Corynebacterium glutamicum
   175        540  Lactococcus lactis subsp. lactis (strain IL1403) (Streptococcus lactis)
   176        531  Erwinia tasmaniensis
   177        529  Listeria monocytogenes serotype 4b (strain F2365)
   178        529  Bordetella bronchiseptica (strain ATCC BAA-588 / NCTC 13252 / RB50)
   179        529  Sodalis glossinidius (strain morsitans)
   180        524  Staphylococcus aureus (strain Newman)
   181        523  Vibrio cholerae serotype O1 (strain ATCC 39541 / Classical Ogawa 395 / O395)
   182        522  Xylella fastidiosa (strain 9a5c)
   183        521  Deinococcus radiodurans
   184        519  Chromobacterium violaceum
   185        519  Methanosarcina acetivorans (strain ATCC 35395 / DSM 2834 / JCM 12185 / C2A)
   186        518  Streptococcus pneumoniae serotype 4 (strain ATCC BAA-334 / TIGR4)
   187        516  Bordetella pertussis (strain Tohama I / ATCC BAA-589 / NCTC 13251)
   188        515  Xylella fastidiosa (strain Temecula1 / ATCC 700964)
   189        512  Geobacillus kaustophilus (strain HTA426)
   190        512  Pseudomonas aeruginosa (strain PA7)
   191        512  Haemophilus ducreyi (strain 35000HP / ATCC 700724)
   192        511  Streptomyces avermitilis
   193        511  Acinetobacter baylyi (strain ATCC 33305 / BD413 / ADP1)
   194        508  Bordetella parapertussis (strain 12822 / ATCC BAA-587 / NCTC 13253)
   195        507  Buchnera aphidicola subsp. Baizongia pistaciae (strain Bp)
   196        506  Pyrococcus furiosus (strain ATCC 43587 / DSM 3638 / JCM 8422 / Vc1)
   197        506  Solanum lycopersicum (Tomato) (Lycopersicon esculentum)
   198        505  Nicotiana tabacum (Common tobacco)
   199        504  Pseudomonas entomophila (strain L48)
   200        502  Streptococcus pneumoniae (strain ATCC BAA-255 / R6)
   201        499  Haemophilus influenzae (strain 86-028NP)
   202        499  Brucella abortus biovar 1 (strain 9-941)
   203        498  Thermosynechococcus vestitus (strain NIES-2133 / IAM M-273 / BP-1)
   204        498  Methanosarcina mazei
   205        497  Burkholderia pseudomallei (strain K96243)
   206        496  Shouchella clausii (strain KSM-K16) (Alkalihalobacillus clausii)
   207        496  Proteus mirabilis (strain HI4320)
   208        496  Rickettsia conorii (strain ATCC VR-613 / Malish 7)
   209        495  Pyrococcus horikoshii
   210        494  Synechococcus elongatus (strain ATCC 33912 / PCC 7942 / FACHB-805)
   211        494  Xanthomonas campestris pv. campestris (strain 8004)
   212        492  Halobacterium salinarum (strain ATCC 700922 / JCM 11081 / NRC-1)
   213        492  Bacillus velezensis (strain DSM 23117 / BGSC 10A6 / LMG 26770 / FZB42)
   214        492  Brucella abortus (strain 2308)
   215        491  Vibrio campbellii (strain ATCC BAA-1116)
   216        490  Saccharolobus solfataricus (strain ATCC 35092 / DSM 1617 / JCM 11322 / P2)
   217        487  Shewanella sp. (strain MR-7)
   218        486  Mannheimia succiniciproducens (strain MBEL55E)
   219        484  Staphylococcus aureus (strain Mu3 / ATCC 700698)
   220        484  Shewanella sp. (strain MR-4)
   221        484  Pseudomonas aeruginosa (strain LESB58)
   222        483  Mycoplasma genitalium (strain ATCC 33530 / DSM 19775 / NCTC 10195 / G37)
   223        483  Lactiplantibacillus plantarum (strain ATCC BAA-793 / NCIMB 8826 / WCFS1)
   224        479  Pseudomonas putida (strain ATCC 700007 / DSM 6899 / BCRC 17059 / F1)
   225        477  Pyrococcus abyssi (strain GE5 / Orsay)
   226        476  Cupriavidus necator
   227        475  Campylobacter jejuni subsp. jejuni serotype O:2
   228        475  Burkholderia lata
   229        472  Rhodopseudomonas palustris (strain ATCC BAA-98 / CGA009)
   230        470  Cereibacter sphaeroides
   231        470  Enterococcus faecalis (strain ATCC 700802 / V583)
   232        470  Clostridium perfringens (strain 13 / Type A)
   233        468  Shewanella sp. (strain ANA-3)
   234        468  Pseudomonas putida (strain GB-1)
   235        467  Aeromonas hydrophila subsp. hydrophila
   236        467  Shewanella frigidimarina (strain NCIMB 400)
   237        466  Xanthomonas campestris pv. vesicatoria (strain 85-10)
   238        465  Trichormus variabilis (strain ATCC 29413 / PCC 7937) (Anabaena variabilis)
   239        463  Burkholderia mallei (strain ATCC 23344)
   240        461  Cupriavidus pinatubonensis (strain JMP 134 / LMG 1197) (Cupriavidus necator
   241        460  Methylococcus capsulatus (strain ATCC 33009 / NCIMB 11132 / Bath)
   242        460  Ovis aries (Sheep)
   243        457  Rickettsia felis (strain ATCC VR-1525 / URRWXCal2) (Rickettsia azadi)
   244        455  Staphylococcus aureus (strain JH1)
   245        455  Xanthomonas oryzae pv. oryzae (strain MAFF 311018)
   246        455  Shewanella baltica (strain OS185)
   247        453  Pseudomonas putida (strain W619)
   248        453  Streptococcus mutans serotype c (strain ATCC 700610 / UA159)
   249        453  Mycolicibacterium paratuberculosis (strain ATCC BAA-968 / K-10)
   250        452  Caldanaerobacter subterraneus subsp. tengcongensis

2.3  Taxonomic distribution of the sequences

Kingdom            sequences (% of the database)
Archaea                19726 (  3%)
Bacteria              336252 ( 59%)
Eukaryota             197465 ( 35%)
Viruses                17387 (  3%)

Within Eukaryota:

Category           sequences (% of Eukaryota) (% of the complete database)
Human                  20434 ( 10%)           (  4%)
Other Mammalia         47334 ( 24%)           (  8%)
Other Vertebrata       18946 ( 10%)           (  3%)
Viridiplantae          41698 ( 21%)           (  7%)
Fungi                  36745 ( 19%)           (  6%)
Insecta                 9776 (  5%)           (  2%)
Nematoda                5383 (  3%)           (  1%)
Other                  17149 (  9%)           (  3%)

3.  SEQUENCE SIZE

Repartition of the sequences by size (excluding fragments)

     From   To   Number             From   To   Number
        1-  50     9977             1001-1100     4128
       51- 100    43563             1101-1200     2900
      101- 150    59823             1201-1300     2209
      151- 200    59600             1301-1400     2069
      201- 250    58477             1401-1500     1679
      251- 300    52445             1501-1600      835
      301- 350    52890             1601-1700      645
      351- 400    45925             1701-1800      593
      401- 450    37721             1801-1900      507
      451- 500    30601             1901-2000      397
      501- 550    22318             2001-2100      273
      551- 600    15849             2101-2200      387
      601- 650    13164             2201-2300      340
      651- 700     9411             2301-2400      235
      701- 750     7881             2401-2500      197
      751- 800     5699                 >2500     1467
      801- 850     4890
      851- 900     5311
      901- 950     4109
      951-1000     3015

The average sequence length in UniProtKB/Swiss-Prot is 361 amino acids.

The shortest sequence is GWA_SEPOF (P83570): 2 amino acids.
The longest sequence is TITIN_MOUSE (A2ASS6): 35213 amino acids.

4.  JOURNAL CITATIONS

Note: the following citation statistics reflect the number of distinct
journal citations.

Total number of journals cited in this release of UniProtKB/Swiss-Prot: 3138

4.1  Table of the frequency of journal citations

Journals cited           1x:   996
                         2x:   430
                         3x:   223
                         4x:   142
                         5x:   124
                         6x:    85
                         7x:    69
                         8x:    77
                         9x:    49
                        10x:    33
                    11- 20x:   249
                    21- 50x:   261
                    51-100x:   140
                      >100x:   260

4.2  List of the most cited journals in UniProtKB/Swiss-Prot

 Nb  Citations  Journal name
---  ---------  -------------------------------------------------------------
  1      27122  Journal of Biological Chemistry
  2      12663  Proceedings of the National Academy of Sciences of the U.S.A.
  3       7188  Journal of Bacteriology
  4       6051  Biochemical and Biophysical Research Communications
  5       5841  Biochemistry
  6       5325  Nucleic Acids Research
  7       5110  Nature
  8       5084  FEBS Letters
  9       4963  The EMBO Journal
 10       4889  Gene
 11       4589  Journal of Molecular Biology
 12       4567  Molecular and Cellular Biology
 13       4021  Biochimica et Biophysica Acta
 14       3865  Cell
 15       3597  Journal of Virology
 16       3508  European Journal of Biochemistry
 17       3376  Science
 18       3157  Biochemical Journal
 19       2853  Molecular Microbiology
 20       2809  Plant Physiology
 21       2595  PLoS ONE
 22       2548  Genomics
 23       2426  The American Journal of Human Genetics
 24       2357  Journal of Cell Biology
 25       2200  The Plant Cell
 26       2037  The Plant Journal
 27       2015  Human Molecular Genetics
 28       1949  Genes and Development
 29       1923  Plant Molecular Biology
 30       1900  Virology
 31       1846  Nature Genetics
 32       1820  Development
 33       1820  Molecular Biology of the Cell
 34       1804  Molecular Cell
 35       1691  Journal of Immunology
 36       1657  Human Mutation
 37       1566  Oncogene
 38       1464  Structure
 39       1430  Molecular and General Genetics
 40       1423  Genetics
 41       1413  Journal of Biochemistry
 42       1388  Journal of Cell Science
 43       1276  Blood
 44       1272  Infection and Immunity
 45       1220  Nature Communications
 46       1186  Journal of General Virology
 47       1184  Developmental Biology
 48       1179  Microbiology
 49       1154  Archives of Biochemistry and Biophysics
 50       1146  Current Biology
 51       1020  Applied and Environmental Microbiology
 52       1016  Journal of Neuroscience
 53        993  Acta Crystallographica, Section D
 54        925  Cancer Research
 55        914  FEMS Microbiology Letters
 56        904  PLoS Genetics
 57        888  Toxicon
 58        881  American Journal of Physiology
 59        868  Scientific Reports
 60        864  Protein Science
 61        853  Journal of Clinical Investigation
 62        851  Yeast
 63        825  Neuron
 64        766  Plant and Cell Physiology
 65        756  The Journal of Experimental Medicine
 66        754  Human Genetics
 67        705  Journal of Medical Genetics
 68        692  Proteins
 69        683  The FEBS Journal
 70        676  Mechanisms of Development
 71        663  PLoS Pathogens
 72        659  Nature Structural and Molecular Biology
 73        650  Nature Structural Biology
 74        638  Nature Cell Biology
 75        624  Bioscience, Biotechnology, and Biochemistry
 76        592  Current Genetics
 77        589  Developmental Cell
 78        575  Journal of Neurochemistry
 79        554  Antimicrobial Agents and Chemotherapy
 80        554  Molecular Endocrinology
 81        550  The Journal of Clinical Endocrinology and Metabolism
 82        540  Endocrinology
 83        517  Molecular and Biochemical Parasitology
 84        505  Journal of the American Chemical Society
 85        496  Mammalian Genome
 86        489  Experimental Cell Research
 87        488  Cell Reports
 88        486  Eukaryotic Cell
 89        477  Peptides
 90        476  RNA
 91        464  Journal of Experimental Botany
 92        457  Planta
 93        454  The FASEB Journal
 94        452  EMBO Reports
 95        445  American Journal of Medical Genetics. Part A
 96        434  Immunogenetics
 97        432  Molecular Pharmacology
 98        425  Acta Crystallographica, Section F
 99        418  Molecular Biology and Evolution
100        415  European Journal of Human Genetics
101        412  Molecular Plant-Microbe Interactions
102        410  Immunity
103        407  Journal of Molecular Evolution
104        400  Clinical Genetics
105        398  Journal of Investigative Dermatology
106        396  DNA and Cell Biology
107        390  Neurology
108        382  Biochimie
109        380  DNA Sequence
110        378  Biology of Reproduction
111        371
112        370  Comparative Biochemistry and Physiology
113        362  Virus Research
114        358  Genes to Cells
115        346  Journal of Lipid Research
116        344  Nature Immunology
117        342  Brain Research. Molecular Brain Research
118        339  The New England Journal of Medicine
119        337  Developmental Dynamics
120        335  PLoS Biology
121        334  Annals of Neurology
122        326  BMC Genomics
123        324  Applied Microbiology and Biotechnology
124        314  Genome Research
125        314  Journal of Medicinal Chemistry
126        314  European Journal of Immunology
127        308  Investigative Ophthalmology and Visual Science
128        299  Biological Chemistry Hoppe-Seyler
129        297  Journal of Human Genetics
130        282  Journal of General Microbiology
131        281  Glycobiology
132        281  Cytogenetics and Cell Genetics
133        278  Archives of Microbiology
134        267  Nature Chemical Biology
135        263  Brain
136        259  Phytochemistry
137        258  Traffic
138        258  Molecular Genetics and Metabolism
139        255  Molecular Immunology
140        255  Nature Medicine
141        254  Protein Expression and Purification
142        249  Journal of Cellular Biochemistry
143        249  Fungal Genetics and Biology
144        242  Cell Cycle
145        236  Circulation Research
146        234  DNA Research
147        233  Diabetes
148        228  Cell Research
149        227  Archives of Virology
150        224  Journal of Structural Biology

5.  STATISTICS FOR SOME LINE TYPES

The following table summarizes the total number of some UniProtKB/Swiss-Prot
lines, as well as the number of entries with at least one such line, and the
frequency of the lines.
                                        Total Number of   Average
Line type / subtype                    number   entries per entry Rank
------------------------------------ -------- --------- --------- ----
References (RL)                       1305541                2.29
   Journal                            1133240    474553      1.99    1
   Submitted to EMBL/GenBank/DDBJ      160811    144932      0.28    2
   Submitted to other databases          7788      7118      0.01    3
   Book citation                         1875      1852     <0.01    4
   Plant Gene Register                    613       600     <0.01    5
   Unpublished observations               536       532     <0.01    6
   Thesis                                 458       455     <0.01    7
   Patent                                 214       207     <0.01    8
   Worm Breeder's Gazette                   6         6     <0.01    9

Total number of distinct authors cited in UniProtKB/Swiss-Prot: 466237

                                        Total Number of   Average
Line type / subtype                    number   entries per entry Rank
------------------------------------ -------- --------- --------- ----
Comments (CC)                         2738587                4.80
   ACTIVITY REGULATION                  18544     18425      0.03   17
   ALLERGEN                               948       948     <0.01   26
   ALTERNATIVE PRODUCTS                 25888     25888      0.05   13
   BIOPHYSICOCHEMICAL PROPERTIES        11428     11380      0.02   20
   BIOTECHNOLOGY                         1907      1847     <0.01   24
   CATALYTIC ACTIVITY                  340065    254456      0.60    4
   CAUTION                              14385     14086      0.03   18
   COFACTOR                            133102    120778      0.23    7
   DEVELOPMENTAL STAGE                  14308     14211      0.03   19
   DISEASE                               8298      5589      0.01   21
   DISRUPTION PHENOTYPE                 20762     20726      0.04   16
   DOMAIN                               58661     50003      0.10    9
   FUNCTION                            490865    466522      0.86    2
   INDUCTION                            25587     25492      0.04   14
   INTERACTION                          24184     24184      0.04   15
   MASS SPECTROMETRY                     7548      5831      0.01   22
   MISCELLANEOUS                        45866     40279      0.08   11
   PATHWAY                             143718    129735      0.25    6
   PHARMACEUTICAL                         167       160     <0.01   29
   POLYMORPHISM                          1502      1374     <0.01   25
   PTM                                  64470     45934      0.11    8
   RNA EDITING                            636       636     <0.01   28
   SEQUENCE CAUTION                     45224     45153      0.08   12
   SIMILARITY                          519322    514992      0.91    1
   SUBCELLULAR LOCATION                364941    356407      0.64    3
   SUBUNIT                             297873    292487      0.52    5
   TISSUE SPECIFICITY                   51004     50426      0.09   10
   TOXIC DOSE                             860       689     <0.01   27
   WEB RESOURCE                          6524      5533      0.01   23

Total number of comment topics: 29

                                        Total Number of   Average
Line type / subtype                    number   entries per entry Rank
------------------------------------ -------- --------- --------- ----
Features (FT)                         5339060                9.35
   ACT_SITE                            176984    105259      0.31    9
   BINDING                            1216961    217927      2.13    1
   CARBOHYD                            124051     31529      0.22   14
   CHAIN                               579201    563178      1.01    2
   COILED                               22520     15576      0.04   25
   COMPBIAS                            174690     74207      0.31   10
   CONFLICT                            139087     48489      0.24   12
   CROSSLNK                             24989      8976      0.04   24
   DISULFID                            135519     36035      0.24   13
   DNA_BIND                             12193     10916      0.02   31
   DOMAIN                              215542    131669      0.38    8
   HELIX                               335721     29161      0.59    5
   INIT_MET                             17545     17496      0.03   26
   INTRAMEM                              3026      1390      0.01   34
   LIPID                                13812      8861      0.02   28
   MOD_RES                             261729     74559      0.46    7
   MOTIF                                47670     31043      0.08   21
   MUTAGEN                              96528     19844      0.17   17
   NON_CONS                              2681       836     <0.01   35
   NON_STD                                358       283     <0.01   36
   NON_TER                              12627      9704      0.02   30
   PEPTIDE                              12629      8738      0.02   29
   PROPEP                               15292     13037      0.03   27
   REGION                              321132    150076      0.56    6
   REPEAT                              109168     15173      0.19   15
   SIGNAL                               44283     44282      0.08   22
   SITE                                 65147     35341      0.11   19
   STRAND                              341973     27470      0.60    4
   TOPO_DOM                            149962     30420      0.26   11
   TRANSIT                               9564      9444      0.02   32
   TRANSMEM                            381766     79963      0.67    3
   TURN                                 81139     23758      0.14   18
   UNSURE                                5757       897      0.01   33
   VAR_SEQ                              53221     22651      0.09   20
   VARIANT                             103806     17501      0.18   16
   ZN_FING                              30787     13135      0.05   23

Total number of feature keys: 36

                          Total Number of   Average
Line type / subtype      number   entries per entry Rank Category
---------------------- -------- --------- --------- ---- -------------------------------------------
Cross-references (DR)  20546636               35.99
ABCD                       3067      3067      0.01  124 Protocols and materials databases
AGR                       60842     60172      0.11   43 Organism-specific databases
Allergome                  2037      1310     <0.01  132 Protein family/group databases
AlphaFoldDB              546628    546628      0.96    9 3D structure databases
Antibodypedia             32300     32191      0.06   62 Protocols and materials databases
ArachnoServer              1148      1138     <0.01  142 Organism-specific databases
Araport                   16401     16305      0.03   93 Organism-specific databases
Bgee                      61485     61485      0.11   41 Gene expression databases
BindingDB                  6660      6660      0.01  109 Chemistry databases
BioCyc                    48068     44022      0.08   53 Enzyme and pathway databases
BioGRID                   61310     59413      0.11   42 Protein-protein interaction databases
BioGRID-ORCS              44975     44390      0.08   55 Miscellaneous databases
BioMuta                   20309     20283      0.04   77 Genetic variation databases
BMRB                       6910      6910      0.01  107 3D structure databases
BRENDA                    20329     18525      0.04   75 Enzyme and pathway databases
CarbonylDB                 1159      1159     <0.01  141 PTM databases
CAZy                       9618      8665      0.02  100 Protein family/group databases
CCDS                      49494     34651      0.09   51 Sequence databases
CDD                      382807    300877      0.67   16 Family and domain databases
CGD                        2103      2086     <0.01  131 Organism-specific databases
ChEMBL                     9021      8833      0.02  101 Chemistry databases
ChiTaRS                   29761     29716      0.05   64 Miscellaneous databases
CLAE                        359       356     <0.01  158 Protein family/group databases
CollecTF                    137       137     <0.01  164 Gene expression databases
ComplexPortal             14953      7896      0.03   96 Protein-protein interaction databases
COMPLUYEAST-2DPAGE           97        97     <0.01  166 2D gel databases
ConoServer                  967       879     <0.01  144 Organism-specific databases
CORUM                      5812      5812      0.01  110 Protein-protein interaction databases
CPTAC                      3472      1929      0.01  120 Proteomic databases
CPTC                        389       389     <0.01  155 Protocols and materials databases
CTD                       69492     68729      0.12   40 Organism-specific databases
DEPOD                       254       254     <0.01  162 PTM databases
dictyBase                  4224      4110      0.01  117 Organism-specific databases
DIP                       17546     17505      0.03   89 Protein-protein interaction databases
DisGeNET                  17011     16793      0.03   91 Organism-specific databases
DisProt                    1721      1715     <0.01  135 Family and domain databases
DMDM                      16171     16170      0.03   95 Genetic variation databases
DNASU                     48377     48299      0.08   52 Protocols and materials databases
DOSAC-COBS-2DPAGE           145       145     <0.01  163 2D gel databases
DrugBank                  31176      4772      0.05   63 Chemistry databases
DrugCentral                2565      2565     <0.01  126 Chemistry databases
EchoBASE                   4158      4158      0.01  118 Organism-specific databases
eggNOG                   339191    333346      0.59   17 Phylogenomic databases
ELM                        1814      1814     <0.01  134 Protein-protein interaction databases
EMBL                    1005456    558056      1.76    3 Sequence databases
EMDB                      69660      8097      0.12   39 3D structure databases
Ensembl                  114577     50301      0.20   33 Genome annotation databases
EnsemblBacteria           55438     55260      0.10   46 Genome annotation databases
EnsemblFungi              23120     22677      0.04   70 Genome annotation databases
EnsemblMetazoa            19031     11558      0.03   84 Genome annotation databases
EnsemblPlants             36885     22344      0.06   60 Genome annotation databases
EnsemblProtists            5382      5127      0.01  113 Genome annotation databases
EPD                       23260     23260      0.04   69 Proteomic databases
ESTHER                     3008      3006      0.01  125 Protein family/group databases
euHCVdb                      55        44     <0.01  168 Organism-specific databases
EvolutionaryTrace         16776     16776      0.03   92 Miscellaneous databases
ExpressionAtlas           53094     53094      0.09   49 Gene expression databases
FlyBase                    4156      4041      0.01  119 Organism-specific databases
Gene3D                   739077    458752      1.29    6 Family and domain databases
GeneCards                 20380     20246      0.04   73 Organism-specific databases
GeneID                   293156    283419      0.51   21 Genome annotation databases
GeneReviews                1591      1588     <0.01  136 Organism-specific databases
GeneTree                  57210     57202      0.10   45 Phylogenomic databases
Genevisible               55283     55283      0.10   47 Gene expression databases
GeneWiki                  10351     10269      0.02   99 Miscellaneous databases
GenomeRNAi                22289     22289      0.04   71 Miscellaneous databases
GlyConnect                 2372      2215     <0.01  127 PTM databases
GlyCosmos                 28903     28903      0.05   65 PTM databases
GlyGen                    21596     21596      0.04   72 PTM databases
GO                      3144059    547026      5.51    1 Ontologies
Gramene                   36885     22344      0.06   59 Genome annotation databases
GuidetoPHARMACOLOGY        2205      2205     <0.01  130 Chemistry databases
HAMAP                    330877    327942      0.58   19 Family and domain databases
HGNC                      20377     20248      0.04   74 Organism-specific databases
HOGENOM                  427021    427021      0.75   15 Phylogenomic databases
HPA                       19354     19215      0.03   82 Organism-specific databases
IDEAL                       986       986     <0.01  143 Family and domain databases
IMGT_GENE-DB                267       267     <0.01  161 Protein family/group databases
InParanoid               163754    163754      0.29   26 Phylogenomic databases
IntAct                    57277     57277      0.10   44 Protein-protein interaction databases
InterPro                2427248    551756      4.25    2 Family and domain databases
iPTMnet                   54156     54156      0.09   48 PTM databases
JaponicusDB                  43        43     <0.01  170 Organism-specific databases
jPOST                     26410     26410      0.05   66 Proteomic databases
KEGG                     493293    470312      0.86   12 Genome annotation databases
LegioList                   765       763     <0.01  149 Organism-specific databases
Leproma                     672       669     <0.01  150 Organism-specific databases
MaizeGDB                    529       525     <0.01  152 Organism-specific databases
MalaCards                  5619      5610      0.01  111 Organism-specific databases
MANE-Select               18422     18310      0.03   86 Genome annotation databases
MassIVE                   19139     19139      0.03   83 Proteomic databases
MaxQB                     33723     33723      0.06   61 Proteomic databases
MEROPS                    14200     13782      0.02   97 Protein family/group databases
MetOSite                   3455      3455      0.01  121 PTM databases
MGI                       17111     17070      0.03   90 Organism-specific databases
MIM                       23295     16097      0.04   68 Organism-specific databases
MINT                      23842     23842      0.04   67 Protein-protein interaction databases
MoonDB                      348       348     <0.01  160 Protein family/group databases
MoonProt                    368       368     <0.01  157 Protein family/group databases
NCBIfam                  300521    277567      0.53   20 Family and domain databases
neXtProt                  20324     20324      0.04   76 Organism-specific databases
NIAGADS                      69        69     <0.01  167 Organism-specific databases
OGP                         373       373     <0.01  156 2D gel databases
OMA                      430603    430603      0.75   14 Phylogenomic databases
OpenTargets               18427     18282      0.03   85 Organism-specific databases
Orphanet                   8177      4417      0.01  103 Organism-specific databases
OrthoDB                  275159    275159      0.48   24 Phylogenomic databases
PANTHER                 1003431    502117      1.76    4 Family and domain databases
PathwayCommons            19454     19454      0.03   81 Enzyme and pathway databases
PATRIC                    92992     92992      0.16   36 Genome annotation databases
PaxDb                    153442    153442      0.27   27 Proteomic databases
PCDDB                       132       132     <0.01  165 3D structure databases
PDB                      288259     34877      0.50   23 3D structure databases
PDBsum                   288259     34877      0.50   22 3D structure databases
PeptideAtlas              39465     39465      0.07   58 Proteomic databases
PeroxiBase                  792       771     <0.01  147 Protein family/group databases
Pfam                     839359    540554      1.47    5 Family and domain databases
PharmGKB                  18033     18014      0.03   88 Organism-specific databases
Pharos                    20224     20224      0.04   79 Miscellaneous databases
PHI-base                   2341      1837     <0.01  128 Miscellaneous databases
PhosphoSitePlus           42102     42102      0.07   57 PTM databases
PhylomeDB                115539    115539      0.20   32 Phylogenomic databases
PIR                      125101    114773      0.22   31 Sequence databases
PIRSF                    110920    109752      0.19   34 Family and domain databases
PlantReactome              1320       771     <0.01  138 Enzyme and pathway databases
PomBase                    5129      5125      0.01  114 Organism-specific databases
PRIDE                       637       637     <0.01  151 Proteomic databases
PRINTS                   150747    129449      0.26   28 Family and domain databases
PRO                       98141     98140      0.17   35 Miscellaneous databases
ProMEX                      487       487     <0.01  154 Proteomic databases
PROSITE                  491564    310962      0.86   13 Family and domain databases
Proteomes                503601    461663      0.88   11 Miscellaneous databases
ProteomicsDB              72691     45372      0.13   38 Proteomic databases
PseudoCAP                  2036      2036     <0.01  133 Organism-specific databases
Pumba                     18207     18207      0.03   87 Proteomic databases
Reactome                 142323     38166      0.25   29 Enzyme and pathway databases
REBASE                      787       391     <0.01  148 Protein family/group databases
RefSeq                   596391    451523      1.04    8 Sequence databases
REPRODUCTION-2DPAGE        1260      1039     <0.01  139 2D gel databases
RGD                        8120      8118      0.01  104 Organism-specific databases
RNAct                     43106     43106      0.08   56 Miscellaneous databases
SABIO-RK                   5579      5579      0.01  112 Enzyme and pathway databases
SASBDB                      840       840     <0.01  146 3D structure databases
SFLD                      20269      9044      0.04   78 Family and domain databases
SGD                        6746      6741      0.01  108 Organism-specific databases
SignaLink                 19959     19959      0.03   80 Enzyme and pathway databases
SIGNOR                     7457      7457      0.01  105 Enzyme and pathway databases
SMART                    205489    148271      0.36   25 Family and domain databases
SMR                      516325    516325      0.90   10 3D structure databases
STRING                   335612    335612      0.59   18 Protein-protein interaction databases
SUPFAM                   648065    459341      1.14    7 Family and domain databases
SWISS-2DPAGE               1177      1177     <0.01  140 2D gel databases
SwissLipids                1478      1394     <0.01  137 Chemistry databases
SwissPalm                 13344     13344      0.02   98 PTM databases
TAIR                      16391     16305      0.03   94 Organism-specific databases
TCDB                       8525      8443      0.01  102 Protein family/group databases
TopDownProteomics          3236      2957      0.01  123 Proteomic databases
TreeFam                   46157     46134      0.08   54 Phylogenomic databases
TubercuList                2327      2291     <0.01  129 Organism-specific databases
UCD-2DPAGE                  496       496     <0.01  153 2D gel databases
UCSC                      50881     46420      0.09   50 Genome annotation databases
UniLectin                   356       356     <0.01  159 Protein family/group databases
UniPathway               139693    126059      0.24   30 Enzyme and pathway databases
VEuPathDB                 81490     74972      0.14   37 Organism-specific databases
VGNC                       4510      4496      0.01  116 Organism-specific databases
WBParaSite                   49        47     <0.01  169 Genome annotation databases
World-2DPAGE                936       924     <0.01  145 2D gel databases
WormBase                   6953      5069      0.01  106 Organism-specific databases
Xenbase                    4737      4737      0.01  115 Organism-specific databases
ZFIN                       3245      3244      0.01  122 Organism-specific databases

Total number of cross-referenced databases: 170

6.  AMINO ACID COMPOSITION

6.1  Composition in percent for the complete database

Ala (A) 8.25   Gln (Q) 3.93   Leu (L) 9.65   Ser (S) 6.65
Arg (R) 5.52   Glu (E) 6.72   Lys (K) 5.80   Thr (T) 5.36
Asn (N) 4.06   Gly (G) 7.07   Met (M) 2.41   Trp (W) 1.10
Asp (D) 5.46   His (H) 2.27   Phe (F) 3.86   Tyr (Y) 2.92
Cys (C) 1.38   Ile (I) 5.91   Pro (P) 4.74   Val (V) 6.85

Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.00

Legend: gray = aliphatic, red = acidic, green = small hydroxy, blue = basic,
black = aromatic, white = amide, yellow = sulfur

6.2  Classification of the amino acids by their frequency

Leu, Ala, Gly, Val, Glu, Ser, Ile, Lys, Arg, Asp, Thr, Pro, Asn, Gln, Phe, Tyr,
Met, His, Cys, Trp

7.  MISCELLANEOUS STATISTICS

4467 entries are encoded on a mitochondrion, and 3999 are encoded on a plasmid.

12200 entries are encoded on a plastid, of which 22 are encoded on
apicoplasts, 11634 on chloroplasts, 51 on organellar chromatophores, 145 on
cyanelles, 149 on non-photosynthetic plastids and 199 on unspecified types of
plastid.

Number of entries with at least one sequence correction: 81170
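
Several derived figures in this report (the PE percentages, the share held by
the first twenty species, the "Average per entry" column and the frequency
ranking of section 6.2) can be recomputed from the raw counts. The sketch below
is one way to do that in Python. It is not part of the release and not the
procedure UniProt uses: the constant and function names are hypothetical, and
the assumption that "Average per entry" means total lines divided by the total
number of entries is inferred from the numbers, not stated in the report.

```python
"""Consistency checks on figures quoted in the statistics above (illustrative)."""

# Counts copied from sections 1, 2 and 5; the names are hypothetical.
TOTAL_ENTRIES = 570830
TOTAL_AMINO_ACIDS = 206533160
TOP20_SPECIES_SEQUENCES = 122964

PROTEIN_EXISTENCE = {
    "1: Evidence at protein level": 113649,
    "2: Evidence at transcript level": 55837,
    "3: Inferred from homology": 386501,
    "4: Predicted": 13019,
    "5: Uncertain": 1824,
}

LINE_TYPE_TOTALS = {
    "References (RL)": 1305541,
    "Comments (CC)": 2738587,
    "Features (FT)": 5339060,
    "Cross-references (DR)": 20546636,
}

# Composition in percent, copied from section 6.1.
COMPOSITION = {
    "Ala": 8.25, "Gln": 3.93, "Leu": 9.65, "Ser": 6.65,
    "Arg": 5.52, "Glu": 6.72, "Lys": 5.80, "Thr": 5.36,
    "Asn": 4.06, "Gly": 7.07, "Met": 2.41, "Trp": 1.10,
    "Asp": 5.46, "His": 2.27, "Phe": 3.86, "Tyr": 2.92,
    "Cys": 1.38, "Ile": 5.91, "Pro": 4.74, "Val": 6.85,
}


def percent_of_entries(count: int) -> float:
    """Share of all entries, in percent."""
    return 100.0 * count / TOTAL_ENTRIES


def average_per_entry(total_lines: int) -> float:
    """Assumed definition: total lines divided by the total number of entries."""
    return total_lines / TOTAL_ENTRIES


def frequency_ranking() -> list:
    """Amino acids ordered from most to least frequent."""
    return sorted(COMPOSITION, key=COMPOSITION.get, reverse=True)


if __name__ == "__main__":
    for label, count in PROTEIN_EXISTENCE.items():
        print(f"{label:<32} {count:>7} {percent_of_entries(count):5.1f}%")
    print(f"Top 20 species share: {percent_of_entries(TOP20_SPECIES_SEQUENCES):.1f}%")
    for label, total in LINE_TYPE_TOTALS.items():
        print(f"{label:<24} {average_per_entry(total):6.2f} per entry")
    print(f"Mean sequence length: {TOTAL_AMINO_ACIDS / TOTAL_ENTRIES:.2f}")
    print(", ".join(frequency_ranking()))
```

With these inputs the mean sequence length comes out just under 362, so the
reported value of 361 appears to be the integer part of the mean; that is an
observation from the numbers, not a documented rule.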