UniProtKB/Swiss-Prot protein knowledgebase release 2021_01 statistics

1. INTRODUCTION

Release 2021_01 of 10-Feb-21 of UniProtKB/Swiss-Prot contains 564277 sequence entries, comprising 203340877 amino acids abstracted from 276972 references. 312 sequences have been added since release 2020_06, the sequence data of 57 existing entries has been updated, and the annotations of 211834 entries have been revised.

Number of fragments: 9231
Number of additional sequences produced by alternative splicing, alternative initiation, alternative promoter usage or ribosomal frameshifting: 40440

Protein existence (PE):

                                entries       %
1: Evidence at protein level     105964   18.8%
2: Evidence at transcript level   56294   10.0%
3: Inferred from homology        386864   68.6%
4: Predicted                      13309    2.4%
5: Uncertain                       1846    0.3%

2. TAXONOMIC ORIGIN

Total number of species represented in this release of UniProtKB/Swiss-Prot: 14014

The first twenty species represent 121612 sequences, i.e. 21.6% of the total number of entries.

2.1 Table of the frequency of occurrence of species

Entries per species   Species
-------------------   -------
1x                       5734
2x                       2026
3x                       1091
4x                        723
5x                        511
6x                        419
7x                        319
8x                        257
9x                        230
10x                       143
11-20x                    812
21-50x                    470
51-100x                   224
>100x                    1055

2.2 Table of the most represented species

------ --------- --------------------------------------------
Number Frequency Species
------ --------- --------------------------------------------
     1     20396 Homo sapiens (Human)
     2     17063 Mus musculus (Mouse)
     3     16036 Arabidopsis thaliana (Mouse-ear cress)
     4      8118 Rattus norvegicus (Rat)
     5      6721 Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (Baker's yeast)
     6      6014 Bos taurus (Bovine)
     7      5140 Schizosaccharomyces pombe (strain 972 / ATCC 24843) (Fission yeast)
     8      4518 Escherichia coli (strain K12)
     9      4191 Bacillus subtilis (strain 168)
    10      4190 Caenorhabditis elegans
    11      4150 Dictyostelium discoideum (Slime mold)
    12      4104 Oryza sativa subsp. japonica (Rice)
    13      3623 Drosophila melanogaster (Fruit fly)
    14      3459 Xenopus laevis (African clawed frog)
    15      3195 Danio rerio (Zebrafish) (Brachydanio rerio)
    16      2297 Gallus gallus (Chicken)
    17      2239 Mycobacterium tuberculosis (strain ATCC 25618 / H37Rv)
    18      2218 Pongo abelii (Sumatran orangutan) (Pongo pygmaeus abelii)
    19      2042 Escherichia coli O157:H7
    20      1898 Mycobacterium tuberculosis (strain CDC 1551 / Oshkosh)
    21      1802 Salmonella typhimurium (strain LT2 / SGSC1412 / ATCC 700720)
    22      1787 Methanocaldococcus jannaschii
    23      1707 Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
    24      1705 Haemophilus influenzae (strain ATCC 51907 / DSM 11121 / KW20 / Rd)
    25      1697 Escherichia coli O6:H1 (strain CFT073 / ATCC 700928 / UPEC)
    26      1685 Shigella flexneri
    27      1438 Sus scrofa (Pig)
    28      1394 Pseudomonas aeruginosa
    29      1347 Salmonella typhi
    30      1244 Mycobacterium bovis (strain ATCC BAA-935 / AF2122/97)
    31      1174 Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)
    32      1079 Synechocystis sp. (strain PCC 6803 / Kazusa)
    33      1035 Archaeoglobus fulgidus
    34      1026 Yersinia pestis
    35      1014 Candida albicans (strain SC5314 / ATCC MYA-2876) (Yeast)
    36       988 Vibrio cholerae serotype O1 (strain ATCC 39315 / El Tor Inaba N16961)
    37       978 Emericella nidulans
    38       941 Staphylococcus aureus (strain Mu50 / ATCC 700699)
    39       930 Salmonella paratyphi A (strain ATCC 9150 / SARB42)
    40       929 Staphylococcus aureus (strain N315)
    41       928 Ashbya gossypii (strain ATCC 10895 / CBS 109.51 / FGSC 9923 / NRRL Y-1056)
    42       919 Kluyveromyces lactis
    43       909 Acanthamoeba polyphaga mimivirus (APMV)
    44       903 Staphylococcus aureus (strain COL)
    45       896 Staphylococcus aureus (strain MW2)
    46       894 Escherichia coli O6:K15:H31 (strain 536 / UPEC)
    47       894 Oryctolagus cuniculus (Rabbit)
    48       890 Staphylococcus aureus (strain MSSA476)
    49       888 Staphylococcus aureus (strain MRSA252)
    50       883 Rhizobium meliloti (strain 1021) (Ensifer meliloti) (Sinorhizobium meliloti)
    51       882 Salmonella choleraesuis (strain SC-B67)
    52       879 Shigella sonnei (strain Ss046)
    53       878 Candida glabrata
    54       875 Neurospora crassa
    55       863 Yersinia pseudotuberculosis serotype I (strain IP32953)
    56       848 Oryza sativa subsp. indica (Rice)
    57       841 Escherichia coli O9:H4 (strain HS)
    58       834 Escherichia coli O139:H28 (strain E24377A / ETEC)
    59       833 Neosartorya fumigata (strain ATCC MYA-4609 / Af293 / CBS 101355 / FGSC A1100)
    60       833 Zea mays (Maize)
    61       832 Canis lupus familiaris (Dog) (Canis familiaris)
    62       829 Shigella boydii serotype 4 (strain Sb227)
    63       825 Escherichia coli (strain UTI89 / UPEC)
    64       822 Shigella dysenteriae serotype 1 (strain Sd197)
    65       819 Escherichia coli (strain ATCC 8739 / DSM 1576 / Crooks)
    66       804 Streptomyces coelicolor (strain ATCC BAA-471 / A3(2) / M145)
    67       803 Pectobacterium atrosepticum (strain SCRI 1043 / ATCC BAA-672)
    68       800 Staphylococcus aureus (strain NCTC 8325 / PS 47)
    69       793 Vibrio parahaemolyticus serotype O3:K6 (strain RIMD 2210633)
    70       791 Escherichia coli (strain SMS-3-5 / SECEC)
    71       787 Aquifex aeolicus (strain VF5)
    72       772 Pasteurella multocida (strain Pm70)
    73       771 Escherichia coli O127:H6 (strain E2348/69 / EPEC)
    74       771 Escherichia coli (strain K12 / DH10B)
    75       765 Escherichia coli (strain K12 / MC4100 / BW2952)
    76       765 Escherichia coli O17:K52:H18 (strain UMN026 / ExPEC)
    77       762 Escherichia coli (strain 55989 / EAEC)
    78       761 Escherichia coli O8 (strain IAI1)
    79       760 Shigella flexneri serotype 5b (strain 8401)
    80       760 Staphylococcus epidermidis (strain ATCC 35984 / RP62A)
    81       760 Staphylococcus epidermidis (strain ATCC 12228 / FDA PCI 1200)
    82       758 Escherichia coli O45:K1 (strain S88 / ExPEC)
    83       756 Escherichia coli (strain SE11)
    84       756 Bacillus anthracis
    85       753 Escherichia coli O7:K1 (strain IAI39 / ExPEC)
    86       748 Escherichia coli O157:H7 (strain EC4115 / EHEC)
    87       748 Photorhabdus laumondii subsp. laumondii (strain DSM 15139 / CIP 105565 / TT01)
    88       742 Bacillus halodurans
    89       739 Yersinia enterocolitica serotype O:8 / biotype 1B (strain NCTC 13174 / 8081)
    90       733 Vibrio vulnificus (strain CMCP6)
    91       731 Escherichia coli O81 (strain ED1a)
    92       725 Pseudomonas putida (strain ATCC 47054 / DSM 6125 / NCIMB 11950 / KT2440)
    93       722 Salmonella enteritidis PT4 (strain P125109)
    94       718 Vibrio vulnificus (strain YJ016)
    95       716 Salmonella paratyphi B (strain ATCC BAA-1250 / SPB7)
    96       715 Yersinia pestis bv. Antiqua (strain Nepal516)
    97       715 Enterobacter sp. (strain 638)
    98       715 Klebsiella pneumoniae subsp. pneumoniae (strain ATCC 700721 / MGH 78578)
    99       714 Salmonella paratyphi A (strain AKU_12601)
   100       714 Escherichia coli O1:K1 / APEC
   101       713 Salmonella newport (strain SL254)
   102       713 Salmonella agona (strain SL483)
   103       713 Yersinia pseudotuberculosis serotype O:1b (strain IP 31758)
   104       712 Salmonella schwarzengrund (strain CVM19633)
   105       711 Yersinia pestis bv. Antiqua (strain Antiqua)
   106       710 Salmonella heidelberg (strain SL476)
   107       702 Salmonella dublin (strain CT_02021853)
   108       698 Klebsiella pneumoniae (strain 342)
   109       698 Shigella boydii serotype 18 (strain CDC 3083-94 / BS512)
   110       697 Nostoc sp. (strain PCC 7120 / SAG 25.82 / UTEX 2576)
   111       695 Escherichia fergusonii (strain ATCC 35469 / DSM 13698 / CDC 0568-73)
   112       692 Pan troglodytes (Chimpanzee)
   113       686 Escherichia coli
   114       686 Mycoplasma pneumoniae (strain ATCC 29342 / M129)
   115       684 Salmonella gallinarum (strain 287/91 / NCTC 13346)
   116       680 Pseudomonas syringae pv. tomato (strain ATCC BAA-871 / DC3000)
   117       678 Citrobacter koseri (strain ATCC BAA-895 / CDC 4225-83 / SGSC4696)
   118       677 Staphylococcus aureus (strain USA300)
   119       672 Serratia proteamaculans (strain 568)
   120       669 Mycobacterium leprae (strain TN)
   121       668 Bacillus cereus
   122       667 Yersinia pestis (strain Pestoides F)
   123       665 Yarrowia lipolytica (strain CLIB 122 / E 150) (Yeast) (Candida lipolytica)
   124       664 Bradyrhizobium diazoefficiens
   125       657 Sinorhizobium fredii (strain NBRC 101917 / NGR234)
   126       653 Debaryomyces hansenii
   127       650 Agrobacterium fabrum (strain C58 / ATCC 33970) (Agrobacterium tumefaciens
   128       650 Shewanella oneidensis (strain MR-1)
   129       643 Staphylococcus aureus (strain bovine RF122 / ET3-1)
   130       642 Salmonella arizonae (strain ATCC BAA-731 / CDC346-86 / RSK2980)
   131       641 Yersinia pseudotuberculosis serotype O:3 (strain YPIII)
   132       634 Yersinia pseudotuberculosis serotype IB (strain PB1/+)
   133       622 Treponema pallidum (strain Nichols)
   134       622 Cronobacter sakazakii (strain ATCC BAA-894) (Enterobacter sakazakii)
   135       622 Methanothermobacter thermautotrophicus
   136       619 Listeria monocytogenes serovar 1/2a (strain ATCC BAA-679 / EGD-e)
   137       615 Xanthomonas campestris pv. campestris
   138       614 Staphylococcus haemolyticus (strain JCSC1435)
   139       613 Mesorhizobium japonicum (Mesorhizobium loti
   140       608 Helicobacter pylori (strain ATCC 700392 / 26695) (Campylobacter pylori)
   141       603 Pseudomonas aeruginosa (strain UCBPP-PA14)
   142       603 Ralstonia solanacearum (strain GMI1000) (Pseudomonas solanacearum)
   143       603 Listeria innocua serovar 6a (strain ATCC BAA-680 / CLIP 11262)
   144       602 Staphylococcus saprophyticus subsp. saprophyticus
   145       602 Photobacterium profundum (strain SS9)
   146       601 Salmonella paratyphi C (strain RKS4594)
   147       600 Yersinia pestis bv. Antiqua (strain Angola)
   148       595 Bacillus cereus (strain ATCC 10987 / NRS 248)
   149       591 Pectobacterium carotovorum subsp. carotovorum (strain PC1)
   150       584 Rickettsia prowazekii (strain Madrid E)
   151       583 Neisseria meningitidis serogroup B (strain MC58)
   152       579 Caenorhabditis briggsae
   153       579 Brucella suis biovar 1 (strain 1330)
   154       574 Brucella melitensis biotype 1 (strain 16M / ATCC 23456 / NCTC 10094)
   155       573 Caulobacter vibrioides (strain ATCC 19089 / CB15) (Caulobacter crescentus)
   156       573 Aliivibrio fischeri (strain ATCC 700601 / ES114) (Vibrio fischeri)
   157       572 Buchnera aphidicola subsp. Acyrthosiphon pisum (strain APS)
   158       572 Mycolicibacterium smegmatis (strain ATCC 700084 / mc(2)155)
   159       569 Bacillus thuringiensis subsp. konkukian (strain 97-27)
   160       568 Helicobacter pylori (strain J99 / ATCC 700824) (Campylobacter pylori J99)
   161       567 Pseudomonas syringae pv. syringae (strain B728a)
   162       564 Bacillus licheniformis
   163       562 Buchnera aphidicola subsp. Schizaphis graminum (strain Sg)
   164       562 Bacillus cereus (strain ZK / E33L)
   165       561 Aspergillus oryzae (strain ATCC 42149 / RIB 40) (Yellow koji mold)
   166       559 Thermotoga maritima (strain ATCC 43589 / MSB8 / DSM 3109 / JCM 10099)
   167       559 Clostridium acetobutylicum
   168       557 Xanthomonas axonopodis pv. citri (strain 306)
   169       555 Pseudomonas fluorescens (strain Pf0-1)
   170       554 Neisseria meningitidis serogroup A / serotype 4A (strain DSM 15465 / Z2491)
   171       553 Pseudomonas fluorescens (strain ATCC BAA-477 / NRRL B-23932 / Pf-5)
   172       553 Oceanobacillus iheyensis
   173       547 Pseudomonas savastanoi pv. phaseolicola (Pseudomonas syringae pv. phaseolicola
   174       540 Lactococcus lactis subsp. lactis (strain IL1403) (Streptococcus lactis)
   175       534 Corynebacterium glutamicum
   176       531 Erwinia tasmaniensis (strain DSM 17950 / CIP 109463 / Et1/99)
   177       529 Listeria monocytogenes serotype 4b (strain F2365)
   178       529 Sodalis glossinidius (strain morsitans)
   179       528 Bordetella bronchiseptica (strain ATCC BAA-588 / NCTC 13252 / RB50)
   180       524 Staphylococcus aureus (strain Newman)
   181       522 Xylella fastidiosa (strain 9a5c)
   182       521 Vibrio cholerae serotype O1 (strain ATCC 39541 / Classical Ogawa 395 / O395)
   183       519 Methanosarcina acetivorans (strain ATCC 35395 / DSM 2834 / JCM 12185 / C2A)
   184       517 Chromobacterium violaceum
   185       516 Deinococcus radiodurans
   186       516 Bordetella pertussis (strain Tohama I / ATCC BAA-589 / NCTC 13251)
   187       515 Xylella fastidiosa (strain Temecula1 / ATCC 700964)
   188       514 Streptococcus pneumoniae serotype 4 (strain ATCC BAA-334 / TIGR4)
   189       512 Pseudomonas aeruginosa (strain PA7)
   190       511 Streptomyces avermitilis
   191       510 Haemophilus ducreyi (strain 35000HP / ATCC 700724)
   192       510 Geobacillus kaustophilus (strain HTA426)
   193       508 Bordetella parapertussis (strain 12822 / ATCC BAA-587 / NCTC 13253)
   194       507 Buchnera aphidicola subsp. Baizongia pistaciae (strain Bp)
   195       502 Pyrococcus furiosus (strain ATCC 43587 / DSM 3638 / JCM 8422 / Vc1)
   196       502 Pseudomonas entomophila (strain L48)
   197       499 Brucella abortus biovar 1 (strain 9-941)
   198       499 Haemophilus influenzae (strain 86-028NP)
   199       498 Acinetobacter baylyi (strain ATCC 33305 / BD413 / ADP1)
   200       496 Rickettsia conorii (strain ATCC VR-613 / Malish 7)
   201       496 Bacillus clausii (strain KSM-K16)
   202       496 Burkholderia pseudomallei (strain K96243)
   203       494 Pyrococcus horikoshii
   204       494 Proteus mirabilis (strain HI4320)
   205       494 Xanthomonas campestris pv. campestris (strain 8004)
   206       492 Thermosynechococcus elongatus (strain BP-1)
   207       492 Bacillus velezensis (strain DSM 23117 / BGSC 10A6 / FZB42)
   208       491 Vibrio campbellii (strain ATCC BAA-1116 / BB120)
   209       491 Halobacterium salinarum (strain ATCC 700922 / JCM 11081 / NRC-1)
   210       489 Solanum lycopersicum (Tomato) (Lycopersicon esculentum)
   211       488 Methanosarcina mazei
   212       487 Shewanella sp. (strain MR-7)
   213       487 Synechococcus elongatus (strain PCC 7942 / FACHB-805) (Anacystis nidulans R2)
   214       486 Mannheimia succiniciproducens (strain MBEL55E)
   215       486 Brucella abortus (strain 2308)
   216       484 Staphylococcus aureus (strain Mu3 / ATCC 700698)
   217       484 Pseudomonas aeruginosa (strain LESB58)
   218       484 Shewanella sp. (strain MR-4)
   219       483 Mycoplasma genitalium (strain ATCC 33530 / G-37 / NCTC 10195)
   220       482 Streptococcus pneumoniae (strain ATCC BAA-255 / R6)
   221       482 Saccharolobus solfataricus (strain ATCC 35092 / DSM 1617 / JCM 11322 / P2)
   222       480 Lactobacillus plantarum (strain ATCC BAA-793 / NCIMB 8826 / WCFS1)
   223       478 Nicotiana tabacum (Common tobacco)
   224       478 Pseudomonas putida (strain ATCC 700007 / DSM 6899 / BCRC 17059 / F1)
   225       477 Pyrococcus abyssi (strain GE5 / Orsay)
   226       475 Burkholderia lata
   227       475 Cupriavidus necator
   228       472 Rhodopseudomonas palustris (strain ATCC BAA-98 / CGA009)
   229       469 Rhodobacter sphaeroides (strain ATCC 17023 / 2.4.1 / NCIB 8253 / DSM 158)
   230       468 Clostridium perfringens (strain 13 / Type A)
   231       468 Pseudomonas putida (strain GB-1)
   232       467 Enterococcus faecalis (strain ATCC 700802 / V583)
   233       467 Aeromonas hydrophila subsp. hydrophila
   234       467 Shewanella frigidimarina (strain NCIMB 400)
   235       467 Campylobacter jejuni subsp. jejuni serotype O:2
   236       466 Xanthomonas campestris pv. vesicatoria (strain 85-10)
   237       466 Shewanella sp. (strain ANA-3)
   238       465 Trichormus variabilis (strain ATCC 29413 / PCC 7937) (Anabaena variabilis)
   239       463 Burkholderia mallei (strain ATCC 23344)
   240       459 Cupriavidus pinatubonensis (strain JMP 134 / LMG 1197) (Cupriavidus necator
   241       459 Ovis aries (Sheep)
   242       458 Methylococcus capsulatus (strain ATCC 33009 / NCIMB 11132 / Bath)
   243       457 Rickettsia felis (strain ATCC VR-1525 / URRWXCal2) (Rickettsia azadi)
   244       455 Shewanella baltica (strain OS185)
   245       455 Staphylococcus aureus (strain JH1)
   246       455 Xanthomonas oryzae pv. oryzae (strain MAFF 311018)
   247       453 Streptococcus mutans serotype c (strain ATCC 700610 / UA159)
   248       453 Pseudomonas putida (strain W619)
   249       452 Aeromonas salmonicida (strain A449)
   250       450 Dechloromonas aromatica (strain RCB)

2.3 Taxonomic distribution of the sequences
Kingdom      Sequences   % of the database
-----------  ---------   -----------------
Archaea          19641                  3%
Bacteria        334868                 59%
Eukaryota       192754                 34%
Viruses          17014                  3%

Within Eukaryota:

Category          Sequences   % of Eukaryota   % of the database
----------------  ---------   --------------   -----------------
Human                 20397              11%                  4%
Other Mammalia        46960              24%                  8%
Other Vertebrata      18616              10%                  3%
Viridiplantae         40718              21%                  7%
Fungi                 35121              18%                  6%
Insecta                9401               5%                  2%
Nematoda               5099               3%                  1%
Other                 16442               9%                  3%

3. SEQUENCE SIZE

Distribution of the sequences by length in amino acids (excluding fragments):

 From     To  Number
-----  -----  ------
    1     50    9757
   51    100   43014
  101    150   59443
  151    200   59201
  201    250   58074
  251    300   51899
  301    350   52301
  351    400   45375
  401    450   37316
  451    500   30167
  501    550   21909
  551    600   15548
  601    650   12957
  651    700    9299
  701    750    7730
  751    800    5603
  801    850    4836
  851    900    5245
  901    950    4074
  951   1000    2967
 1001   1100    4038
 1101   1200    2837
 1201   1300    2167
 1301   1400    2038
 1401   1500    1643
 1501   1600     809
 1601   1700     624
 1701   1800     572
 1801   1900     489
 1901   2000     386
 2001   2100     259
 2101   2200     361
 2201   2300     331
 2301   2400     225
 2401   2500     180
>2500           1372
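Because fragments are excluded from this distribution, the bin counts should add up to the 564277 entries minus the 9231 fragments reported in section 1. The minimal Python sketch below checks that and derives a cumulative share from the same table; the constant and function names are illustrative only and are not part of any UniProt tool.

```python
# Length bins from section 3 (fragments excluded): (label, number of sequences).
LENGTH_BINS = [
    ("1-50", 9757), ("51-100", 43014), ("101-150", 59443), ("151-200", 59201),
    ("201-250", 58074), ("251-300", 51899), ("301-350", 52301), ("351-400", 45375),
    ("401-450", 37316), ("451-500", 30167), ("501-550", 21909), ("551-600", 15548),
    ("601-650", 12957), ("651-700", 9299), ("701-750", 7730), ("751-800", 5603),
    ("801-850", 4836), ("851-900", 5245), ("901-950", 4074), ("951-1000", 2967),
    ("1001-1100", 4038), ("1101-1200", 2837), ("1201-1300", 2167), ("1301-1400", 2038),
    ("1401-1500", 1643), ("1501-1600", 809), ("1601-1700", 624), ("1701-1800", 572),
    ("1801-1900", 489), ("1901-2000", 386), ("2001-2100", 259), ("2101-2200", 361),
    ("2201-2300", 331), ("2301-2400", 225), ("2401-2500", 180), (">2500", 1372),
]

TOTAL_ENTRIES = 564277  # section 1
FRAGMENTS = 9231        # section 1


def binned_total(bins):
    """Total number of sequences across all length bins."""
    return sum(count for _, count in bins)


def cumulative_share(bins, upto_label):
    """Fraction of binned sequences in all bins up to and including `upto_label`."""
    total = binned_total(bins)
    running = 0
    for label, count in bins:
        running += count
        if label == upto_label:
            return running / total
    raise KeyError(upto_label)


if __name__ == "__main__":
    # The binned counts should equal the number of non-fragment entries.
    assert binned_total(LENGTH_BINS) == TOTAL_ENTRIES - FRAGMENTS
    share = cumulative_share(LENGTH_BINS, "351-400")
    print(f"{binned_total(LENGTH_BINS)} sequences binned; "
          f"{share:.1%} are at most 400 aa long")
```

Run as a script, it reports 555046 binned sequences, of which about 68.3% are 400 amino acids or shorter.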
The average sequence length in UniProtKB/Swiss-Prot is 360 amino acids. The shortest sequence is GWA_SEPOF (P83570): 2 amino acids. The longest sequence is TITIN_MOUSE (A2ASS6): 35213 amino acids.

4. JOURNAL CITATIONS

Note: the following citation statistics reflect the number of distinct journal citations.

Total number of journals cited in this release of UniProtKB/Swiss-Prot: 2917

4.1 Table of the frequency of journal citations

Times cited      Journals
--------------   --------
1x                    938
2x                    402
3x                    181
4x                    139
5x                    119
6x                     82
7x                     73
8x                     68
9x                     39
10x                    33
11-20x                237
21-50x                241
51-100x               125
>100x                 240

4.2 List of the most cited journals in UniProtKB/Swiss-Prot

--- --------- ------------------------------------------------------------
 Nb Citations Journal name
--- --------- ------------------------------------------------------------
  1     25629 Journal of Biological Chemistry
  2     11930 Proceedings of the National Academy of Sciences of the U.S.A.
  3      6854 Journal of Bacteriology
  4      5785 Biochemical and Biophysical Research Communications
  5      5546 Biochemistry
  6      5115 Nucleic Acids Research
  7      4923 FEBS Letters
  8      4845 Gene
  9      4774 The EMBO Journal
 10      4744 Nature
 11      4441 Molecular and Cellular Biology
 12      4384 Journal of Molecular Biology
 13      3801 Biochimica et Biophysica Acta
 14      3669 Cell
 15      3445 European Journal of Biochemistry
 16      3385 Journal of Virology
 17      3141 Science
 18      2981 Biochemical Journal
 19      2688 Plant Physiology
 20      2666 Molecular Microbiology
 21      2537 Genomics
 22      2288 The American Journal of Human Genetics
 23      2255 Journal of Cell Biology
 24      2213 PLoS ONE
 25      2093 The Plant Cell
 26      1948 The Plant Journal
 27      1895 Human Molecular Genetics
 28      1890 Plant Molecular Biology
 29      1881 Genes and Development
 30      1833 Virology
 31      1797 Nature Genetics
 32      1726 Molecular Biology of the Cell
 33      1713 Development
 34      1625 Molecular Cell
 35      1591 Human Mutation
 36      1556 Journal of Immunology
 37      1547 Oncogene
 38      1407 Structure
 39      1405 Molecular and General Genetics
 40      1373 Journal of Biochemistry
 41      1348 Genetics
 42      1306 Journal of Cell Science
 43      1198 Blood
 44      1181 Infection and Immunity
 45      1158 Journal of General Virology
 46      1109 Microbiology
 47      1093 Current Biology
 48      1090 Archives of Biochemistry and Biophysics
 49      1088 Developmental Biology
 50       954 Journal of Neuroscience
 51       948 Applied and Environmental Microbiology
 52       937 Acta Crystallographica, Section D
 53       903 Cancer Research
 54       858 FEMS Microbiology Letters
 55       841 Yeast
 56       817 Toxicon
 57       805 Protein Science
 58       783 Journal of Clinical Investigation
 59       780 Neuron
 60       759 PLoS Genetics
 61       730 Plant and Cell Physiology
 62       729 American Journal of Physiology
 63       713 Human Genetics
 64       700 The Journal of Experimental Medicine
 65       684 Nature Communications
 66       655 Mechanisms of Development
 67       654 Proteins
 68       648 Journal of Medical Genetics
 69       647 Nature Structural Biology
 70       602 Nature Cell Biology
 71       585 Nature Structural and Molecular Biology
 72       581 Scientific Reports
 73       579 Current Genetics
 74       574 The FEBS Journal
 75       572 Bioscience, Biotechnology, and Biochemistry
 76       552 Journal of Neurochemistry
 77       544 Developmental Cell
 78       544 Molecular Endocrinology
 79       533 The Journal of Clinical Endocrinology and Metabolism
 80       514 Endocrinology
 81       508 Antimicrobial Agents and Chemotherapy
 82       488 Mammalian Genome
 83       472 Experimental Cell Research
 84       469 Molecular and Biochemical Parasitology
 85       468 PLoS Pathogens
 86       448 Eukaryotic Cell
 87       441 Peptides
 88       438 Planta
 89       434 Journal of the American Chemical Society
 90       432 RNA
 91       430 Immunogenetics
 92       420 Journal of Experimental Botany
 93       406 Journal of Molecular Evolution
 94       403 Molecular Biology and Evolution
 95       398 Molecular Pharmacology
 96       392 EMBO Reports
 97       390 The FASEB Journal
 98       390 DNA and Cell Biology
 99       386 American Journal of Medical Genetics. Part A
100       385 Acta Crystallographica, Section F
101       377 Journal of Investigative Dermatology
102       376 Molecular Plant-Microbe Interactions
103       376 DNA Sequence
104       375 Neurology
105       374 European Journal of Human Genetics
106       371 Immunity
107       360 Comparative Biochemistry and Physiology
108       356 Biology of Reproduction
109       345 Biochimie
110       341 Brain Research. Molecular Brain Research
111       337 Virus Research
112       336 Genes to Cells
113       330 Clinical Genetics
114       323 The New England Journal of Medicine
115       322 Developmental Dynamics
116       319 Journal of Lipid Research
117       306 Annals of Neurology
118       301 Genome Research
119       299 Biological Chemistry Hoppe-Seyler
120       298 BMC Genomics
121       297 European Journal of Immunology
122       297 Nature Immunology
123       292 Cell Reports
124       285 Applied Microbiology and Biotechnology
125       284 Investigative Ophthalmology and Visual Science
126       282 Cytogenetics and Cell Genetics
127       279 Journal of Medicinal Chemistry
128       277 PLoS Biology
129       273 Journal of General Microbiology
130       270 Journal of Human Genetics
131       262 Glycobiology
132       255 Archives of Microbiology
133       245 Traffic
134       244 Molecular Immunology
135       239 Journal of Cellular Biochemistry
136       238 Molecular Genetics and Metabolism
137       237
138       232 DNA Research
139       229 Protein Expression and Purification
140       228 Phytochemistry
141       227 Cell Cycle
142       225 Diabetes
143       223 Nature Medicine
144       219 Archives of Virology
145       219 Circulation Research
146       218 Hoppe-Seyler's Zeitschrift fur Physiologische Chemie
147       216 Fungal Genetics and Biology
148       209 Nature Chemical Biology
149       209 Molecular and Cellular Endocrinology
150       204 Molecular Genetics and Genomics

5. STATISTICS FOR SOME LINE TYPES

The following tables summarize the total number of some UniProtKB/Swiss-Prot lines, the number of entries with at least one such line, and the average number of such lines per entry (computed over all 564277 entries).
                                    Total  Number of    Average
Line type / subtype                number    entries  per entry  Rank
--------------------------------  -------  ---------  ---------  ----
References (RL)                   1254796                  2.22
  Journal                         1078330     462042       1.91     1
  Submitted to EMBL/GenBank/DDBJ   165304     149648       0.29     2
  Submitted to other databases       7576       6975       0.01     3
  Book citation                      1855       1832      <0.01     4
  Plant Gene Register                 612        599      <0.01     5
  Unpublished observations            464        460      <0.01     6
  Thesis                              437        434      <0.01     7
  Patent                              212        205      <0.01     8
  Worm Breeder's Gazette                6          6      <0.01     9

Total number of distinct authors cited in UniProtKB/Swiss-Prot: 431899

                                    Total  Number of    Average
Line type / subtype                number    entries  per entry  Rank
--------------------------------  -------  ---------  ---------  ----
Comments (CC)                     2626922                  4.66
  ACTIVITY REGULATION               16137      16088       0.03    17
  ALLERGEN                            924        924      <0.01    26
  ALTERNATIVE PRODUCTS              25508      25508       0.05    13
  BIOPHYSICOCHEMICAL PROPERTIES      9499       9488       0.02    20
  BIOTECHNOLOGY                      1405       1381      <0.01    24
  CATALYTIC ACTIVITY               311270     243910       0.55     4
  CAUTION                           13518      13240       0.02    18
  COFACTOR                         128001     116345       0.23     7
  DEVELOPMENTAL STAGE               13020      12989       0.02    19
  DISEASE                            7513       5048       0.01    21
  DISRUPTION PHENOTYPE              16915      16905       0.03    16
  DOMAIN                            52441      44831       0.09     9
  FUNCTION                         475857     453509       0.84     2
  INDUCTION                         22752      22700       0.04    15
  INTERACTION                       23573      23573       0.04    14
  MASS SPECTROMETRY                  7061       5429       0.01    22
  MISCELLANEOUS                     44226      38870       0.08    12
  PATHWAY                          141162     127770       0.25     6
  PHARMACEUTICAL                      157        149      <0.01    29
  POLYMORPHISM                       1295       1241      <0.01    25
  PTM                               59800      43454       0.11     8
  RNA EDITING                         628        628      <0.01    28
  SEQUENCE CAUTION                  44626      44555       0.08    11
  SIMILARITY                       512923     508717       0.91     1
  SUBCELLULAR LOCATION             354294     346420       0.63     3
  SUBUNIT                          286814     282788       0.51     5
  TISSUE SPECIFICITY                48100      47918       0.09    10
  TOXIC DOSE                          783        638      <0.01    27
  WEB RESOURCE                       6720       5559       0.01    23

Total number of comment topics: 29

                                    Total  Number of    Average
Line type / subtype                number    entries  per entry  Rank
--------------------------------  -------  ---------  ---------  ----
Features (FT)                     4681627                  8.30
  ACT_SITE                         167671     101625       0.30    10
  BINDING                          416498     110706       0.74     2
  CA_BIND                            4214       1749       0.01    36
  CARBOHYD                         119765      30676       0.21    15
  CHAIN                            572513     556964       1.01     1
  COILED                            22167      15342       0.04    27
  COMPBIAS                          59233      31875       0.10    21
  CONFLICT                         137197      47896       0.24    13
  CROSSLNK                          24116       8683       0.04    26
  DISULFID                         129289      34570       0.23    14
  DNA_BIND                          11978      10715       0.02    33
  DOMAIN                           206884     127208       0.37     9
  HELIX                            285055      25728       0.51     6
  INIT_MET                          17435      17387       0.03    28
  INTRAMEM                           2862       1313       0.01    37
  LIPID                             13406       8644       0.02    30
  METAL                            405777      97939       0.72     3
  MOD_RES                          255988      72908       0.45     7
  MOTIF                             45094      29444       0.08    23
  MUTAGEN                           80289      17226       0.14    18
  NON_CONS                           2513        817      <0.01    38
  NON_STD                             358        283      <0.01    39
  NON_TER                           12558       9634       0.02    31
  NP_BIND                          160805      86892       0.28    11
  PEPTIDE                           12021       8242       0.02    32
  PROPEP                            14692      12531       0.03    29
  REGION                           208334      96917       0.37     8
  REPEAT                           107173      14932       0.19    16
  SIGNAL                            43055      43054       0.08    24
  SITE                              61475      33205       0.11    20
  STRAND                           294914      24270       0.52     5
  TOPO_DOM                         145404      29593       0.26    12
  TRANSIT                            9260       9144       0.02    34
  TRANSMEM                         375789      78632       0.67     4
  TURN                              68863      20959       0.12    19
  UNSURE                             5578        842       0.01    35
  VAR_SEQ                           52428      22275       0.09    22
  VARIANT                           98787      17195       0.18    17
  ZN_FING                           30189      12902       0.05    25

Total number of feature keys: 39

                           Total  Number of    Average
Line type / subtype       number    entries  per entry  Rank  Category
----------------------  --------  ---------  ---------  ----  -------------------------------------
Cross-references (DR)   18146830                 32.16
  ABCD                      2716       2716      <0.01   118  Protocols and materials databases
  Allergome                 2011       1296      <0.01   125  Protein family/group databases
  Antibodypedia            32100      31989       0.06    56  Protocols and materials databases
  ArachnoServer             1163       1154      <0.01   136  Organism-specific databases
  Araport                  16056      15960       0.03    85  Organism-specific databases
  Bgee                     57296      57294       0.10    42  Gene expression databases
  BindingDB                 5782       5782       0.01   103  Chemistry databases
  BioCyc                  202323     198273       0.36    24  Enzyme and pathway databases
  BioGRID                  58210      56424       0.10    41  Protein-protein interaction databases
  BioGRID-ORCS             38973      38460       0.07    54  Miscellaneous databases
  BioMuta                  20318      20301       0.04    71  Genetic variation databases
  BMRB                      6905       6905       0.01    99  3D structure databases
  BRENDA                   13021      12236       0.02    87  Enzyme and pathway databases
  CarbonylDB                1157       1157      <0.01   137  PTM databases
  CAZy                      9550       8606       0.02    92  Protein family/group databases
  CCDS                     48745      34302       0.09    48  Sequence databases
  CDD                     190486     171209       0.34    26  Family and domain databases
  CGD                       2001       1984      <0.01   126  Organism-specific databases
  ChEMBL                    7827       7663       0.01    96  Chemistry databases
  ChiTaRS                  29650      29613       0.05    60  Miscellaneous databases
  CLAE                       357        354      <0.01   151  Protein family/group databases
  CollecTF                   135        135      <0.01   159  Gene expression databases
  ComplexPortal            11392       6366       0.02    89  Protein-protein interaction databases
  COMPLUYEAST-2DPAGE          97         97      <0.01   161  2D gel databases
  ConoServer                 969        881      <0.01   139  Organism-specific databases
  CORUM                     5808       5808       0.01   102  Protein-protein interaction databases
  CPTAC                     2525       1632      <0.01   121  Proteomic databases
  CPTC                       296        296      <0.01   153  Protocols and materials databases
  CTD                      75330      74422       0.13    39  Organism-specific databases
  DEPOD                      254        254      <0.01   157  PTM databases
  dictyBase                 4215       4101       0.01   112  Organism-specific databases
  DIP                      17452      17412       0.03    80  Protein-protein interaction databases
  DisGeNET                 17034      16809       0.03    81  Organism-specific databases
  DisProt                   1425       1413      <0.01   131  Family and domain databases
  DMDM                     16195      16193       0.03    84  Genetic variation databases
  DNASU                    19072      19005       0.03    74  Protocols and materials databases
  DOSAC-COBS-2DPAGE          145        145      <0.01   158  2D gel databases
  DrugBank                 29149       4692       0.05    62  Chemistry databases
  DrugCentral               2533       2533      <0.01   120  Chemistry databases
  EchoBASE                  4158       4158       0.01   113  Organism-specific databases
  eggNOG                  336826     330968       0.60    15  Phylogenomic databases
  ELM                       1811       1811      <0.01   127  Protein-protein interaction databases
  EMBL                    990643     552048       1.76     3  Sequence databases
  Ensembl                  98873      51726       0.18    35  Genome annotation databases
  EnsemblBacteria         356516     337260       0.63    14  Genome annotation databases
  EnsemblFungi             30055      28462       0.05    59  Genome annotation databases
  EnsemblMetazoa           18076      10461       0.03    78  Genome annotation databases
  EnsemblPlants            30220      21450       0.05    58  Genome annotation databases
  EnsemblProtists           5046       4867       0.01   105  Genome annotation databases
  EPD                      21175      21175       0.04    67  Proteomic databases
  ESTHER                    2584       2583      <0.01   119  Protein family/group databases
  euHCVdb                     55         44      <0.01   163  Organism-specific databases
  EvolutionaryTrace        16668      16668       0.03    83  Miscellaneous databases
  ExpressionAtlas          48456      48456       0.09    49  Gene expression databases
  FlyBase                   4921       4796       0.01   106  Organism-specific databases
  Gene3D                  415333     322407       0.74    12  Family and domain databases
  GeneCards                20359      20194       0.04    68  Organism-specific databases
  GeneDB                     618        562      <0.01   145  Genome annotation databases
  GeneID                  264993     249251       0.47    20  Genome annotation databases
  GeneReviews               1479       1475      <0.01   128  Organism-specific databases
  GeneTree                 59985      59946       0.11    40  Phylogenomic databases
  Genevisible              55252      55252       0.10    45  Gene expression databases
  GeneWiki                 10350      10267       0.02    91  Miscellaneous databases
  GenomeRNAi               22173      22172       0.04    65  Miscellaneous databases
  GlyConnect                2320       2178      <0.01   122  PTM databases
  GlyGen                   11178      11178       0.02    90  PTM databases
  GO                     3082830     539890       5.46     1  Ontologies
  Gramene                  30220      21450       0.05    57  Genome annotation databases
  GuidetoPHARMACOLOGY       2041       2041      <0.01   124  Chemistry databases
  HAMAP                   330401     327481       0.59    16  Family and domain databases
  HGNC                     20339      20206       0.04    69  Organism-specific databases
  HOGENOM                 424046     424046       0.75    11  Phylogenomic databases
  HPA                      18983      18847       0.03    75  Organism-specific databases
  IDEAL                      985        985      <0.01   138  Family and domain databases
  IMGT_GENE-DB               267        267      <0.01   156  Protein family/group databases
  InParanoid              140274     140274       0.25    27  Phylogenomic databases
  IntAct                   55876      55876       0.10    44  Protein-protein interaction databases
  InterPro               2326050     545361       4.12     2  Family and domain databases
  iPTMnet                  52679      52679       0.09    46  PTM databases
  jPOST                    26396      26396       0.05    63  Proteomic databases
  KEGG                    499149     476657       0.88     8  Genome annotation databases
  LegioList                  765        763      <0.01   142  Organism-specific databases
  Leproma                    672        669      <0.01   143  Organism-specific databases
  MaizeGDB                   520        516      <0.01   146  Organism-specific databases
  MalaCards                 4823       4819       0.01   108  Organism-specific databases
  MassIVE                  17470      17470       0.03    79  Proteomic databases
  MaxQB                    29613      29613       0.05    61  Proteomic databases
  MEROPS                   11512      11510       0.02    88  Protein family/group databases
  MetOSite                  3106       3106       0.01   117  PTM databases
  MGI                      16974      16934       0.03    82  Organism-specific databases
  MIM                      21832      15396       0.04    66  Organism-specific databases
  MINT                     22798      22798       0.04    64  Protein-protein interaction databases
  MoonDB                     348        348      <0.01   152  Protein family/group databases
  MoonProt                   281        281      <0.01   155  Protein family/group databases
  neXtProt                 20338      20338       0.04    70  Organism-specific databases
  NIAGADS                     68         68      <0.01   162  Organism-specific databases
  OGP                        373        373      <0.01   150  2D gel databases
  OMA                     414480     414480       0.73    13  Phylogenomic databases
  OpenTargets              18361      18210       0.03    76  Organism-specific databases
  Orphanet                  7731       4117       0.01    97  Organism-specific databases
  OrthoDB                 245788     245788       0.44    21  Phylogenomic databases
  PANTHER                 280917     268488       0.50    19  Family and domain databases
  PathwayCommons           19492      19492       0.03    73  Enzyme and pathway databases
  PATRIC                   92453      92453       0.16    38  Genome annotation databases
  PaxDb                   125552     125552       0.22    31  Proteomic databases
  PCDDB                      129        129      <0.01   160  3D structure databases
  PDB                     209826      30003       0.37    22  3D structure databases
  PDBsum                  209826      30003       0.37    23  3D structure databases
  PeptideAtlas             33424      33424       0.06    55  Proteomic databases
  PeroxiBase                 783        761      <0.01   141  Protein family/group databases
  Pfam                    785366     523734       1.39     4  Family and domain databases
  PharmGKB                 18313      18294       0.03    77  Organism-specific databases
  Pharos                   20097      20097       0.04    72  Miscellaneous databases
  PHI-base                  1459       1212      <0.01   129  Miscellaneous databases
  PhosphoSitePlus          39080      39080       0.07    53  PTM databases
  PhylomeDB                97002      97002       0.17    37  Phylogenomic databases
  PIR                     124444     114176       0.22    33  Sequence databases
  PIRSF                   107786     106691       0.19    34  Family and domain databases
  PlantReactome             1163        715      <0.01   135  Enzyme and pathway databases
  PomBase                   5132       5128       0.01   104  Organism-specific databases
  PRIDE                   134830     134830       0.24    29  Proteomic databases
  PRINTS                  131178     116284       0.23    30  Family and domain databases
  PRO                      97294      97294       0.17    36  Miscellaneous databases
  ProMEX                     467        467      <0.01   148  Proteomic databases
  PROSITE                 481266     305701       0.85     9  Family and domain databases
  Proteomes               501655     467531       0.89     7  Miscellaneous databases
  ProteomicsDB             56871      35743       0.10    43  Proteomic databases
  PseudoCAP                 1401       1392      <0.01   132  Organism-specific databases
  Reactome                125359      35805       0.22    32  Enzyme and pathway databases
  REBASE                     622        384      <0.01   144  Protein family/group databases
  RefSeq                  615194     470036       1.09     5  Sequence databases
  REPRODUCTION-2DPAGE       1259       1038      <0.01   133  2D gel databases
  RGD                       8043       8040       0.01    95  Organism-specific databases
  RNAct                    43019      43019       0.08    51  Miscellaneous databases
  SABIO-RK                  4580       4580       0.01   109  Enzyme and pathway databases
  SASBDB                     422        422      <0.01   149  3D structure databases
  SFLD                      8213       6102       0.01    94  Family and domain databases
  SGD                       6740       6735       0.01   100  Organism-specific databases
  SignaLink                 3106       3106       0.01   116  Enzyme and pathway databases
  SIGNOR                    4843       4843       0.01   107  Enzyme and pathway databases
  SMART                   194131     143077       0.34    25  Family and domain databases
  SMR                     453874     453874       0.80    10  3D structure databases
  STRING                  329595     329595       0.58    17  Protein-protein interaction databases
  SUPFAM                  514592     389098       0.91     6  Family and domain databases
  SWISS-2DPAGE              1177       1177      <0.01   134  2D gel databases
  SwissLipids               1451       1367      <0.01   130  Chemistry databases
  SwissPalm                 8637       8637       0.02    93  PTM databases
  TAIR                     14833      14777       0.03    86  Organism-specific databases
  TCDB                      7709       7650       0.01    98  Protein family/group databases
  TIGRFAMs                292939     272891       0.52    18  Family and domain databases
  TopDownProteomics         3237       2960       0.01   114  Proteomic databases
  TreeFam                  45758      45751       0.08    50  Phylogenomic databases
  TubercuList               2258       2222      <0.01   123  Organism-specific databases
  UCD-2DPAGE                 496        496      <0.01   147  2D gel databases
  UCSC                     50324      45928       0.09    47  Genome annotation databases
  UniLectin                  282        282      <0.01   154  Protein family/group databases
  UniPathway              138219     125014       0.24    28  Enzyme and pathway databases
  VEuPathDB                40076      39822       0.07    52  Organism-specific databases
  VGNC                      4330       4317       0.01   111  Organism-specific databases
  WBParaSite                  48         43      <0.01   164  Genome annotation databases
  World-2DPAGE               932        921      <0.01   140  2D gel databases
  WormBase                  6441       4791       0.01   101  Organism-specific databases
  Xenbase                   4531       4530       0.01   110  Organism-specific databases
  ZFIN                      3169       3164       0.01   115  Organism-specific databases

Total number of cross-referenced databases: 164

6. AMINO ACID COMPOSITION

6.1 Composition in percent for the complete database

Ala (A)  8.25    Gln (Q)  3.93    Leu (L)  9.65    Ser (S)  6.63
Arg (R)  5.53    Glu (E)  6.72    Lys (K)  5.80    Thr (T)  5.35
Asn (N)  4.06    Gly (G)  7.07    Met (M)  2.41    Trp (W)  1.10
Asp (D)  5.46    His (H)  2.27    Phe (F)  3.86    Tyr (Y)  2.92
Cys (C)  1.38    Ile (I)  5.91    Pro (P)  4.73    Val (V)  6.86

Asx (B)  0.000   Glx (Z)  0.000   Xaa (X)  0.00
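Sorting the 6.1 percentages in descending order yields the classification given in 6.2. A minimal Python sketch (the dictionary and function names are illustrative) makes that derivation explicit; Asx, Glx and Xaa are left out because they are reported as zero.

```python
# Percentages from section 6.1, keyed by three-letter residue code.
COMPOSITION = {
    "Ala": 8.25, "Arg": 5.53, "Asn": 4.06, "Asp": 5.46, "Cys": 1.38,
    "Gln": 3.93, "Glu": 6.72, "Gly": 7.07, "His": 2.27, "Ile": 5.91,
    "Leu": 9.65, "Lys": 5.80, "Met": 2.41, "Phe": 3.86, "Pro": 4.73,
    "Ser": 6.63, "Thr": 5.35, "Trp": 1.10, "Tyr": 2.92, "Val": 6.86,
}


def rank_by_frequency(composition):
    """Return residue codes ordered from most to least frequent."""
    return sorted(composition, key=composition.get, reverse=True)


if __name__ == "__main__":
    print(", ".join(rank_by_frequency(COMPOSITION)))
    # The 20 published values add up to 99.89, not exactly 100.
    print(f"{sum(COMPOSITION.values()):.2f}")
```

None of the 20 values are tied, so the ordering is unambiguous.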
6.2 Classification of the amino acids by their frequency (from most to least frequent)

Leu, Ala, Gly, Val, Glu, Ser, Ile, Lys, Arg, Asp, Thr, Pro, Asn, Gln, Phe, Tyr, Met, His, Cys, Trp

7. MISCELLANEOUS STATISTICS

4464 entries are encoded on a mitochondrion, and 3945 are encoded on a plasmid.

12189 entries are encoded on a plastid, of which 21 are encoded on apicoplasts, 11624 on chloroplasts, 51 on organellar chromatophores, 145 on cyanelles, 149 on non-photosynthetic plastids and 199 on unspecified types of plastid.

Number of entries with at least one sequence correction: 80138
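The six plastid sub-types listed in section 7 account for all 12189 plastid-encoded entries. A short Python sketch (the names are illustrative, not part of any UniProt tool) verifies that sum and expresses each sub-type as a share of the plastid total:

```python
# Plastid sub-types from section 7 (number of entries per sub-type).
PLASTID_SUBTYPES = {
    "apicoplast": 21,
    "chloroplast": 11624,
    "organellar chromatophore": 51,
    "cyanelle": 145,
    "non-photosynthetic plastid": 149,
    "unspecified plastid": 199,
}
PLASTID_TOTAL = 12189


def breakdown(subtypes, total):
    """Check that sub-type counts add up to `total`; return each sub-type's share."""
    if sum(subtypes.values()) != total:
        raise ValueError("sub-type counts do not add up to the reported total")
    return {name: count / total for name, count in subtypes.items()}


if __name__ == "__main__":
    for name, share in breakdown(PLASTID_SUBTYPES, PLASTID_TOTAL).items():
        print(f"{name:<28}{share:7.2%}")
```

Chloroplast-encoded entries make up about 95.4% of the plastid total.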