UniProtKB/Swiss-Prot protein knowledgebase release 2012_02 statistics
1. INTRODUCTION
Release 2012_02 of 22-Feb-12 of UniProtKB/Swiss-Prot contains 534695 sequence entries,
comprising 189667883 amino acids abstracted from 207395 references.
468 sequences have been added since release 2012_01, the sequence data of
116 existing entries has been updated and the annotations of
456192 entries have been revised.
Number of fragments: 8976
Number of additional sequences produced by alternative splicing, initiation or promoter usage, or ribosomal frameshifting: 31041
Protein existence (PE):           entries      %
1: Evidence at protein level        74107  13.9%
2: Evidence at transcript level     69984  13.1%
3: Inferred from homology          374279  70.0%
4: Predicted                        14442   2.7%
5: Uncertain                         1883   0.4%
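The percentages in the protein existence table follow directly from the entry counts. The sketch below recomputes them; the dictionary and function names are illustrative and not part of any UniProt tooling, and rounding to one decimal place is an assumption about how the published figures were produced.

```python
# Entry counts copied from the protein existence (PE) table above.
pe_counts = {
    "1: Evidence at protein level": 74107,
    "2: Evidence at transcript level": 69984,
    "3: Inferred from homology": 374279,
    "4: Predicted": 14442,
    "5: Uncertain": 1883,
}

# The five levels partition the release, so their sum is the entry total.
total_entries = sum(pe_counts.values())


def pe_percentages(counts):
    """Return each level's share of all entries, rounded to one decimal."""
    total = sum(counts.values())
    return {level: round(100 * n / total, 1) for level, n in counts.items()}


for level, pct in pe_percentages(pe_counts).items():
    print(f"{level:<34} {pe_counts[level]:>7} {pct:5.1f}%")
```

The five counts add up to the 534695 entries reported for the release, which is a quick consistency check on the table.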
The growth of the database is summarized below.
2. TAXONOMIC ORIGIN
Total number of species represented in this release of UniProtKB/Swiss-Prot: 12726
The first twenty species represent 111314 sequences: 20.8% of the total
number of entries.
2.1 Table of the frequency of occurrence of species
Species represented      1x: 5365
                         2x: 1849
                         3x:  955
                         4x:  628
                         5x:  463
                         6x:  374
                         7x:  272
                         8x:  218
                         9x:  198
                        10x:  110
                    11- 20x:  655
                    21- 50x:  392
                    51-100x:  209
                      >100x: 1038
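The table above groups species by how many entries each one contributes. The sketch below shows one way such a binning could be done and checks that the reported bins add up to the species total. The function name and the example per-species counts are hypothetical; only the bin sizes are taken from the table.

```python
from collections import Counter


def bin_label(n):
    """Map a per-species entry count onto the bins used in table 2.1."""
    if n <= 10:
        return f"{n}x"
    if n <= 20:
        return "11-20x"
    if n <= 50:
        return "21-50x"
    if n <= 100:
        return "51-100x"
    return ">100x"


# Hypothetical per-species entry counts, for illustration only.
example_counts = {"species A": 1, "species B": 1, "species C": 14, "species D": 2500}
histogram = Counter(bin_label(n) for n in example_counts.values())
print(dict(histogram))

# Bin sizes as reported in table 2.1, from "1x" down to ">100x".
reported_bins = [5365, 1849, 955, 628, 463, 374, 272, 218, 198, 110,
                 655, 392, 209, 1038]
print(sum(reported_bins))  # should equal the number of species in the release
```

The reported bins sum to 12726, the total number of species given at the start of this section.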
2.2 Table of the most represented species
------ --------- --------------------------------------------
Number Frequency Species
------ --------- --------------------------------------------
1 20246 Homo sapiens (Human)
2 16473 Mus musculus (Mouse)
3 11018 Arabidopsis thaliana (Mouse-ear cress)
4 7690 Rattus norvegicus (Rat)
5 6619 Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (Baker's yeast)
6 5885 Bos taurus (Bovine)
7 4976 Schizosaccharomyces pombe (strain 972 / ATCC 24843) (Fission yeast)
8 4431 Escherichia coli (strain K12)
9 4244 Bacillus subtilis
10 4122 Dictyostelium discoideum (Slime mold)
11 3344 Caenorhabditis elegans
12 3331 Xenopus laevis (African clawed frog)
13 3141 Drosophila melanogaster (Fruit fly)
14 2835 Oryza sativa subsp. japonica (Rice)
15 2782 Danio rerio (Zebrafish) (Brachydanio rerio)
16 2229 Gallus gallus (Chicken)
17 2217 Pongo abelii (Sumatran orangutan)
18 2011 Escherichia coli O157:H7
19 1925 Mycobacterium tuberculosis
20 1795 Salmonella typhimurium
21 1787 Methanocaldococcus jannaschii
22 1707 Haemophilus influenzae (strain ATCC 51907 / DSM 11121 / KW20 / Rd)
23 1678 Shigella flexneri
24 1675 Escherichia coli O6
25 1623 Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
26 1407 Sus scrofa (Pig)
27 1346 Salmonella typhi
28 1244 Mycobacterium bovis
29 1221 Pseudomonas aeruginosa (strain ATCC 15692 / PAO1 / 1C / PRS 101 / LMG 12228)
30 1169 Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)
31 1026 Synechocystis sp. (strain PCC 6803 / Kazusa)
32 1013 Yersinia pestis
33 1000 Archaeoglobus fulgidus
34 955 Vibrio cholerae
35 930 Salmonella paratyphi A
36 926 Ashbya gossypii (strain ATCC 10895 / CBS 109.51 / FGSC 9923 / NRRL Y-1056)
37 925 Staphylococcus aureus (strain N315)
38 923 Staphylococcus aureus (strain Mu50 / ATCC 700699)
39 909 Acanthamoeba polyphaga mimivirus (APMV)
40 901 Kluyveromyces lactis
41 899 Staphylococcus aureus (strain COL)
42 895 Staphylococcus aureus (strain MW2)
43 889 Staphylococcus aureus (strain MSSA476)
44 888 Escherichia coli O6:K15:H31 (strain 536 / UPEC)
45 888 Staphylococcus aureus (strain MRSA252)
46 885 Oryctolagus cuniculus (Rabbit)
47 882 Salmonella choleraesuis
48 878 Shigella sonnei (strain Ss046)
49 868 Rhizobium meliloti (strain 1021) (Ensifer meliloti) (Sinorhizobium meliloti)
50 864 Yersinia pseudotuberculosis
51 861 Candida glabrata
52 841 Escherichia coli O9:H4 (strain HS)
53 834 Escherichia coli O139:H28 (strain E24377A / ETEC)
54 830 Neurospora crassa
55 829 Shigella boydii serotype 4 (strain Sb227)
56 824 Escherichia coli (strain UTI89 / UPEC)
57 819 Escherichia coli (strain ATCC 8739 / DSM 1576 / Crooks)
58 817 Shigella dysenteriae serotype 1 (strain Sd197)
59 799 Canis familiaris (Dog) (Canis lupus familiaris)
60 795 Vibrio parahaemolyticus
61 791 Escherichia coli (strain SMS-3-5 / SECEC)
62 783 Erwinia carotovora subsp. atroseptica (Pectobacterium atrosepticum)
63 777 Aquifex aeolicus (strain VF5)
64 773 Pasteurella multocida (strain Pm70)
65 771 Escherichia coli (strain K12 / DH10B)
66 765 Escherichia coli O127:H6 (strain E2348/69 / EPEC)
67 765 Escherichia coli (strain K12 / MC4100 / BW2952)
68 764 Escherichia coli O17:K52:H18 (strain UMN026 / ExPEC)
69 764 Emericella nidulans
70 762 Escherichia coli (strain 55989 / EAEC)
71 761 Escherichia coli O8 (strain IAI1)
72 760 Shigella flexneri serotype 5b (strain 8401)
73 759 Staphylococcus epidermidis (strain ATCC 35984 / RP62A)
74 758 Streptomyces coelicolor
75 757 Staphylococcus epidermidis (strain ATCC 12228)
76 756 Escherichia coli (strain SE11)
77 756 Escherichia coli O45:K1 (strain S88 / ExPEC)
78 753 Escherichia coli O7:K1 (strain IAI39 / ExPEC)
79 748 Escherichia coli O157:H7 (strain EC4115 / EHEC)
80 743 Photorhabdus luminescens subsp. laumondii (strain TT01)
81 735 Yersinia enterocolitica serotype O:8 / biotype 1B (strain 8081)
82 734 Bacillus halodurans
83 734 Staphylococcus aureus (strain NCTC 8325)
84 733 Vibrio vulnificus
85 732 Bacillus anthracis
86 731 Escherichia coli O81 (strain ED1a)
87 721 Salmonella enteritidis PT4 (strain P125109)
88 717 Vibrio vulnificus (strain YJ016)
89 716 Salmonella paratyphi B (strain ATCC BAA-1250 / SPB7)
90 715 Yersinia pestis bv. Antiqua (strain Nepal516)
91 714 Salmonella paratyphi A (strain AKU_12601)
92 713 Enterobacter sp. (strain 638)
93 713 Salmonella agona (strain SL483)
94 713 Escherichia coli O1:K1 / APEC
95 713 Salmonella newport (strain SL254)
96 713 Yersinia pseudotuberculosis serotype O:1b (strain IP 31758)
97 712 Klebsiella pneumoniae subsp. pneumoniae (strain ATCC 700721 / MGH 78578)
98 712 Salmonella schwarzengrund (strain CVM19633)
99 711 Yersinia pestis bv. Antiqua (strain Antiqua)
100 710 Salmonella heidelberg (strain SL476)
101 702 Salmonella dublin (strain CT_02021853)
102 698 Shigella boydii serotype 18 (strain CDC 3083-94 / BS512)
103 696 Klebsiella pneumoniae (strain 342)
104 695 Escherichia fergusonii (strain ATCC 35469 / DSM 13698 / CDC 0568-73)
105 690 Candida albicans (strain SC5314 / ATCC MYA-2876) (Yeast)
106 689 Zea mays (Maize)
107 687 Mycoplasma pneumoniae (strain ATCC 29342 / M129)
108 687 Pan troglodytes (Chimpanzee)
109 687 Nostoc sp. (strain PCC 7120 / UTEX 2576)
110 683 Salmonella gallinarum (strain 287/91 / NCTC 13346)
111 678 Citrobacter koseri (strain ATCC BAA-895 / CDC 4225-83 / SGSC4696)
112 675 Pseudomonas putida (strain KT2440)
113 675 Pseudomonas syringae pv. tomato (strain DC3000)
114 669 Serratia proteamaculans (strain 568)
115 668 Mycobacterium leprae
116 667 Yersinia pestis (strain Pestoides F)
117 666 Staphylococcus aureus (strain USA300)
118 658 Rhizobium sp. (strain NGR234)
119 656 Bradyrhizobium japonicum
120 653 Debaryomyces hansenii
121 648 Bacillus cereus (strain ATCC 14579 / DSM 31)
122 643 Escherichia coli
123 643 Staphylococcus aureus (strain bovine RF122 / ET3-1)
124 642 Salmonella arizonae (strain ATCC BAA-731 / CDC346-86 / RSK2980)
125 641 Yarrowia lipolytica (strain CLIB 122 / E 150) (Yeast) (Candida lipolytica)
126 638 Yersinia pseudotuberculosis serotype O:3 (strain YPIII)
127 634 Yersinia pseudotuberculosis serotype IB (strain PB1/+)
128 632 Neosartorya fumigata (strain ATCC MYA-4609 / Af293 / CBS 101355 / FGSC A1100)
129 629 Agrobacterium tumefaciens (strain C58 / ATCC 33970)
130 628 Shewanella oneidensis
131 622 Cronobacter sakazakii (strain ATCC BAA-894) (Enterobacter sakazakii)
132 615 Treponema pallidum (strain Nichols)
133 612 Staphylococcus haemolyticus (strain JCSC1435)
134 608 Methanobacterium thermoautotrophicum (strain Delta H)
135 605 Rhizobium loti (strain MAFF303099) (Mesorhizobium loti)
136 602 Photobacterium profundum (Photobacterium sp. (strain SS9))
137 602 Listeria monocytogenes
138 602 Staphylococcus saprophyticus subsp. saprophyticus
139 601 Salmonella paratyphi C (strain RKS4594)
140 601 Ralstonia solanacearum (strain GMI1000) (Pseudomonas solanacearum)
141 600 Yersinia pestis bv. Antiqua (strain Angola)
142 600 Xanthomonas campestris pv. campestris
143 593 Oryza sativa subsp. indica (Rice)
144 590 Bacillus cereus (strain ATCC 10987)
145 590 Listeria innocua
146 589 Pectobacterium carotovorum subsp. carotovorum (strain PC1)
147 585 Rickettsia prowazekii (strain Madrid E)
148 581 Neisseria meningitidis serogroup B
149 576 Brucella suis biovar 1 (strain 1330)
150 572 Brucella melitensis biotype 1 (strain 16M / ATCC 23456 / NCTC 10094)
151 572 Helicobacter pylori (strain ATCC 700392 / 26695) (Campylobacter pylori)
152 572 Buchnera aphidicola subsp. Acyrthosiphon pisum (strain APS)
153 567 Bacillus thuringiensis subsp. konkukian (strain 97-27)
154 565 Helicobacter pylori (strain J99) (Campylobacter pylori J99)
155 565 Pseudomonas syringae pv. syringae (strain B728a)
156 563 Caulobacter crescentus (Caulobacter vibrioides)
157 562 Bacillus licheniformis (strain DSM 13 / ATCC 14580)
158 562 Caenorhabditis briggsae
159 562 Buchnera aphidicola subsp. Schizaphis graminum (strain Sg)
160 561 Vibrio fischeri (strain ATCC 700601 / ES114)
161 560 Bacillus cereus (strain ZK / E33L)
162 558 Clostridium acetobutylicum
163 558 Pseudomonas aeruginosa (strain UCBPP-PA14)
164 556 Neisseria meningitidis serogroup A
165 556 Xanthomonas axonopodis pv. citri (Citrus canker)
166 552 Pseudomonas fluorescens (strain Pf0-1)
167 551 Oceanobacillus iheyensis (strain DSM 14371 / JCM 11309 / KCTC 3954 / HTE831)
168 546 Pseudomonas fluorescens (strain Pf-5 / ATCC BAA-477)
169 544 Pseudomonas syringae pv. phaseolicola (strain 1448A / Race 6)
170 531 Erwinia tasmaniensis (strain DSM 17950 / Et1/99)
171 530 Lactococcus lactis subsp. lactis (strain IL1403) (Streptococcus lactis)
172 529 Listeria monocytogenes serotype 4b (strain F2365)
173 528 Sodalis glossinidius (strain morsitans)
174 527 Streptococcus pneumoniae
175 523 Thermotoga maritima
176 522 Xylella fastidiosa
177 520 Bordetella bronchiseptica (strain ATCC BAA-588 / NCTC 13252 / RB50)
178 514 Chromobacterium violaceum
179 512 Bordetella pertussis
180 512 Xylella fastidiosa (strain Temecula1 / ATCC 700964)
181 511 Pseudomonas aeruginosa (strain PA7)
182 511 Vibrio cholerae serotype O1 (strain ATCC 39541 / Ogawa 395 / O395)
183 510 Haemophilus ducreyi (strain 35000HP / ATCC 700724)
184 508 Bordetella parapertussis
185 507 Buchnera aphidicola subsp. Baizongia pistaciae (strain Bp)
186 507 Geobacillus kaustophilus (strain HTA426)
187 506 Staphylococcus aureus (strain Newman)
188 501 Deinococcus radiodurans
189 500 Pseudomonas entomophila (strain L48)
190 499 Brucella abortus biovar 1 (strain 9-941)
191 498 Corynebacterium glutamicum (Brevibacterium flavum)
192 497 Rickettsia conorii (strain ATCC VR-613 / Malish 7)
193 496 Bacillus clausii (strain KSM-K16)
194 494 Haemophilus influenzae (strain 86-028NP)
195 494 Streptomyces avermitilis
196 493 Burkholderia pseudomallei (Pseudomonas pseudomallei)
197 492 Proteus mirabilis (strain HI4320)
198 492 Bacillus amyloliquefaciens (strain FZB42)
199 491 Xanthomonas campestris pv. campestris (strain 8004)
200 490 Vibrio harveyi (strain ATCC BAA-1116 / BB120)
201 490 Clostridium perfringens
202 487 Shewanella sp. (strain MR-7)
203 486 Methanosarcina acetivorans (strain ATCC 35395 / DSM 2834 / JCM 12185 / C2A)
204 485 Mannheimia succiniciproducens (strain MBEL55E)
205 484 Pseudomonas aeruginosa (strain LESB58)
206 484 Staphylococcus aureus (strain Mu3 / ATCC 700698)
207 484 Shewanella sp. (strain MR-4)
208 483 Mycoplasma genitalium (strain ATCC 33530 / G-37 / NCTC 10195)
209 480 Acinetobacter sp. (strain ADP1)
210 478 Thermosynechococcus elongatus (strain BP-1)
211 476 Enterococcus faecalis (Streptococcus faecalis)
212 476 Synechococcus elongatus (strain PCC 7942) (Anacystis nidulans R2)
213 475 Pyrococcus horikoshii
214 474 Burkholderia sp. (strain 383) (Burkholderia cepacia
215 474 Pseudomonas putida (strain F1 / ATCC 700007)
216 473 Brucella abortus (strain 2308)
217 473 Aspergillus oryzae (strain ATCC 42149 / RIB 40) (Yellow koji mold)
218 466 Xanthomonas campestris pv. vesicatoria (strain 85-10)
219 466 Shewanella frigidimarina (strain NCIMB 400)
220 466 Pseudomonas putida (strain GB-1)
221 466 Halobacterium salinarium (strain ATCC 700922 / JCM 11081 / NRC-1)
222 465 Pyrococcus abyssi (strain GE5 / Orsay)
223 465 Methanosarcina mazei
224 464 Aeromonas hydrophila subsp. hydrophila (strain ATCC 7966 / NCIB 9240)
225 463 Shewanella sp. (strain ANA-3)
226 462 Streptococcus pneumoniae (strain ATCC BAA-255 / R6)
227 462 Anabaena variabilis (strain ATCC 29413 / PCC 7937)
228 462 Cupriavidus necator (strain ATCC 17699 / H16 / DSM 428 / Stanier 337)
229 462 Rhodopseudomonas palustris (strain ATCC BAA-98 / CGA009)
230 461 Burkholderia mallei (Pseudomonas mallei)
231 458 Cupriavidus pinatubonensis (strain JMP134 / LMG 1197) (Alcaligenes eutrophus)
232 458 Lactobacillus plantarum (strain ATCC BAA-793 / NCIMB 8826 / WCFS1)
233 455 Staphylococcus aureus (strain JH1)
234 455 Rhodobacter sphaeroides (strain ATCC 17023 / 2.4.1 / NCIB 8253 / DSM 158)
235 454 Xanthomonas oryzae pv. oryzae (strain MAFF 311018)
236 453 Pseudomonas putida (strain W619)
237 453 Rickettsia felis (strain ATCC VR-1525 / URRWXCal2) (Rickettsia azadi)
238 452 Methylococcus capsulatus (strain ATCC 33009 / NCIMB 11132 / Bath)
239 452 Ovis aries (Sheep)
240 452 Shewanella baltica (strain OS185)
241 451 Streptococcus mutans
242 451 Aeromonas salmonicida (strain A449)
243 450 Pyrococcus furiosus (strain ATCC 43587 / DSM 3638 / JCM 8422 / Vc1)
244 449 Thermoanaerobacter tengcongensis
245 449 Staphylococcus aureus (strain JH9)
246 448 Mycobacterium paratuberculosis
247 448 Hahella chejuensis (strain KCTC 2396)
248 447 Vibrio fischeri (strain MJ11)
249 445 Nicotiana tabacum (Common tobacco)
250 445 Pseudomonas mendocina (strain ymp)
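The share attributed to the first twenty species at the start of section 2 can be reproduced from the first twenty rows of the table above. This is a minimal sketch; the variable names are illustrative, and rounding to one decimal place is an assumption.

```python
# Frequencies of the twenty most represented species (rows 1 to 20 above).
top_twenty = [20246, 16473, 11018, 7690, 6619, 5885, 4976, 4431, 4244, 4122,
              3344, 3331, 3141, 2835, 2782, 2229, 2217, 2011, 1925, 1795]

TOTAL_ENTRIES = 534695  # entries in release 2012_02

top_sum = sum(top_twenty)
share = round(100 * top_sum / TOTAL_ENTRIES, 1)

print(f"{top_sum} sequences, {share}% of all entries")
```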
2.3 Taxonomic distribution of the sequences
Kingdom      sequences (% of the database)
Archaea          18798 (  4%)
Bacteria        327658 ( 61%)
Eukaryota       172223 ( 32%)
Viruses          16016 (  3%)
Within Eukaryota:
Category           sequences (% of Eukaryota) (% of the complete database)
Human                  20247 ( 12%)           (  4%)
Other Mammalia         45441 ( 26%)           (  8%)
Other Vertebrata       16891 ( 10%)           (  3%)
Viridiplantae          32152 ( 19%)           (  6%)
Fungi                  30554 ( 18%)           (  6%)
Insecta                 8356 (  5%)           (  2%)
Nematoda                4212 (  2%)           (  1%)
Other                  14370 (  8%)           (  3%)
3. SEQUENCE SIZE
Distribution of the sequences by size (excluding fragments)
From   To  Number      From   To  Number
   1-  50    8682      1001-1100    3690
  51- 100   41017      1101-1200    2561
 101- 150   57183      1201-1300    1993
 151- 200   57392      1301-1400    1846
 201- 250   56078      1401-1500    1492
 251- 300   49431      1501-1600     725
 301- 350   49656      1601-1700     543
 351- 400   42935      1701-1800     453
 401- 450   35211      1801-1900     418
 451- 500   28279      1901-2000     336
 501- 550   20084      2001-2100     206
 551- 600   14374      2101-2200     277
 601- 650   12133      2201-2300     287
 651- 700    8757      2301-2400     170
 701- 750    7208      2401-2500     136
 751- 800    5108          >2500    1071
 801- 850    4479
 851- 900    4975
 901- 950    3827
 951-1000    2706
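Because the size table excludes fragments, its bins should add up to the number of entries minus the number of fragments reported in the introduction. The following sketch performs that check; the list names are illustrative.

```python
# Bin counts from the size table: 50-residue bins up to 1000,
# then 100-residue bins up to 2500, then the ">2500" bin.
bins_up_to_1000 = [8682, 41017, 57183, 57392, 56078, 49431, 49656, 42935,
                   35211, 28279, 20084, 14374, 12133, 8757, 7208, 5108,
                   4479, 4975, 3827, 2706]
bins_above_1000 = [3690, 2561, 1993, 1846, 1492, 725, 543, 453, 418, 336,
                   206, 277, 287, 170, 136, 1071]

TOTAL_ENTRIES = 534695  # from the introduction
FRAGMENTS = 8976        # from the introduction

binned = sum(bins_up_to_1000) + sum(bins_above_1000)
print(binned, TOTAL_ENTRIES - FRAGMENTS)
```

Both numbers come out to 525719, so the table accounts for every non-fragment entry.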
The average sequence length in UniProtKB/Swiss-Prot is 354 amino acids.
The shortest sequence is GWA_SEPOF (P83570): 2 amino acids.
The longest sequence is TITIN_MOUSE (A2ASS6): 35213 amino acids.
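The average length can be approximated from the totals given in the introduction. The sketch below assumes the average is taken over all entries, fragments included, and that the published figure is truncated rather than rounded; the source states neither, so treat both as assumptions.

```python
TOTAL_AMINO_ACIDS = 189667883  # from the introduction
TOTAL_ENTRIES = 534695         # from the introduction

# Exact mean length in residues per entry.
mean_length = TOTAL_AMINO_ACIDS / TOTAL_ENTRIES

# Integer division drops the fractional part (assumed truncation).
truncated_length = TOTAL_AMINO_ACIDS // TOTAL_ENTRIES

print(f"mean = {mean_length:.2f}, truncated = {truncated_length}")
```

Under these assumptions the exact mean is a little under 355, and truncation gives the 354 quoted above.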
4. JOURNAL CITATIONS
Note: the following citation statistics reflect the number of distinct
journal citations.
Total number of journals cited in this release of UniProtKB/Swiss-Prot: 2217
4.1 Table of the frequency of journal citations
Journals cited      1x: 721
                    2x: 287
                    3x: 146
                    4x: 107
                    5x:  99
                    6x:  77
                    7x:  46
                    8x:  37
                    9x:  31
                   10x:  28
               11- 20x: 175
               21- 50x: 186
               51-100x: 100
                 >100x: 177
4.2 List of the most cited journals in UniProtKB/Swiss-Prot
Nb Citations Journal name
-- --------- -------------------------------------------------------------
1 19870 Journal of Biological Chemistry
2 9076 Proceedings of the National Academy of Sciences of the U.S.A.
3 5389 Journal of Bacteriology
4 4856 Biochemical and Biophysical Research Communications
5 4557 Gene
6 4425 Nucleic Acids Research
7 4205 Biochemistry
8 4192 FEBS Letters
9 4061 The EMBO Journal
10 3788 Molecular and Cellular Biology
11 3534 Nature
12 3355 Journal of Molecular Biology
13 3179 European Journal of Biochemistry
14 3109 Biochimica et Biophysica Acta
15 2916 Cell
16 2497 Genomics
17 2348 Biochemical Journal
18 2341 Journal of Virology
19 2324 Science
20 1920 Molecular Microbiology
21 1769 Journal of Cell Biology
22 1625 Plant Physiology
23 1571 Plant Molecular Biology
24 1516 Genes and Development
25 1496 Virology
26 1473 The American Journal of Human Genetics
27 1429 Nature Genetics
28 1402 Human Molecular Genetics
29 1372 Oncogene
30 1318 Molecular and General Genetics
31 1270 Development
32 1211 Human Mutation
33 1207 Journal of Biochemistry
34 1191 Molecular Biology of the Cell
35 1134 The Plant Cell
36 1117 Journal of Immunology
37 1049 Genetics
38 1023 Molecular Cell
39 998 Structure
40 993 The Plant Journal
41 989 Journal of General Virology
42 918 Blood
43 915 Infection and Immunity
44 885 Archives of Biochemistry and Biophysics
45 869 Journal of Cell Science
46 797 Microbiology
47 787 Developmental Biology
48 780 Yeast
49 767 Cancer Research
50 742 Current Biology
51 691 FEMS Microbiology Letters
52 617 Acta Crystallographica, Section D
53 616 Human Genetics
54 615 Nature Structural Biology
55 612 Mechanisms of Development
56 610 Protein Science
57 603 Journal of Neuroscience
58 587 Applied and Environmental Microbiology
59 570 Neuron
60 565 Toxicon
61 553 Journal of Clinical Investigation
62 536 Current Genetics
63 515 American Journal of Physiology
64 503 The Journal of Experimental Medicine
65 478 Mammalian Genome
66 467 Molecular Endocrinology
67 453 Immunogenetics
68 448 Journal of Neurochemistry
69 441 Proteins
70 436 The Journal of Clinical Endocrinology and Metabolism
71 427 Molecular and Biochemical Parasitology
72 421 Endocrinology
73 401 Bioscience, Biotechnology, and Biochemistry
74 397 Nature Cell Biology
75 396 Plant and Cell Physiology
76 389 Journal of Molecular Evolution
77 386 Journal of Medical Genetics
78 373 DNA and Cell Biology
79 368 Molecular Biology and Evolution
80 361 DNA Sequence
81 354 Experimental Cell Research
82 325 Peptides
83 325 Brain Research. Molecular Brain Research
84 321 Tissue Antigens
85 312 Comparative Biochemistry and Physiology
86 303 PLoS ONE
87 299 Molecular Pharmacology
88 295 Antimicrobial Agents and Chemotherapy
89 293 Biological Chemistry Hoppe-Seyler
90 292 Journal of Investigative Dermatology
91 290 Developmental Cell
92 287 RNA
93 277 Cytogenetics and Cell Genetics
94 273 Biology of Reproduction
95 271 Neurology
96 261 Developmental Dynamics
97 261 Virus Research
98 260 Planta
99 257 Genome Research
100 257 Nature Structural and Molecular Biology
101 256 The FEBS Journal
102 252 Journal of General Microbiology
103 242 Molecular Plant-Microbe Interactions
104 233 Immunity
105 227 EMBO Reports
106 224 European Journal of Immunology
107 222 Genes to Cells
108 220 Biochimie
109 218 The New England Journal of Medicine
110 218 Hoppe-Seyler's Zeitschrift fur Physiologische Chemie
111 217 Annals of Neurology
112 216 Eukaryotic Cell
113 213 The FASEB Journal
114 210 European Journal of Human Genetics
115 210 DNA Research
116 199 Journal of Human Genetics
117 190 Investigative Ophthalmology and Visual Science
118 186 Archives of Virology
119 182 Molecular and Cellular Endocrinology
120 178 Archives of Microbiology
121 174 Journal of the American Chemical Society
122 172 American Journal of Medical Genetics. Part A
123 171 Molecular Immunology
124 170 BMC Genomics
125 170 Journal of Cellular Biochemistry
126 169 Diabetes
127 169 Insect Biochemistry and Molecular Biology
128 167 Glycobiology
129 167 American Journal of Medical Genetics
130 167 Molecular Phylogenetics and Evolution
131 166 Clinical Genetics
132 163 Nature Immunology
133 160 Journal of Medicinal Chemistry
134 159 DNA
135 158 International Journal of Cancer
136 156 Molecular Reproduction and Development
137 155 Hemoglobin
138 153 Circulation Research
139 153 Bioorganicheskaia Khimiia
140 146 Molecular and Cellular Neuroscience
141 146 Molecular Genetics and Metabolism
142 144 Biological Chemistry
143 142 Molecular Genetics and Genomics
144 139 British Journal of Haematology
145 138 General and Comparative Endocrinology
146 138 Acta Crystallographica, Section F
147 138 Animal Genetics
148 134 Protein Expression and Purification
149 133 Journal of Experimental Botany
150 133 Proteomics
5. STATISTICS FOR SOME LINE TYPES
The following table summarizes the total number of some UniProtKB/Swiss-Prot lines,
as well as the number of entries with at least one such line, and the
frequency of the lines.
Total Number of Average
Line type / subtype number entries per entry Rank
------------------------------------ -------- --------- --------- ----
References (RL) 1006647 1.88
Journal 801572 409174 1.50 1
Submitted to EMBL/GenBank/DDBJ 196199 176707 0.37 2
Submitted to other databases 6729 6273 0.01 3
Book citation 687 673 <0.01 4
Plant Gene Register 576 564 <0.01 5
Thesis 406 403 <0.01 6
Unpublished observations 285 281 <0.01 7
Patent 187 184 <0.01 8
Worm Breeder's Gazette 6 6 <0.01 9
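The "Average per entry" column in this section's tables appears to be the total number of lines divided by the total number of entries in the release. The sketch below reproduces that column for a few rows of the references table. The function name is illustrative, and the rule that values rounding below 0.01 are shown as "<0.01" is an assumption inferred from the tables, not something the source states.

```python
TOTAL_ENTRIES = 534695  # entries in release 2012_02


def per_entry(total_lines, entries=TOTAL_ENTRIES):
    """Format the average number of lines per entry as the tables show it."""
    avg = total_lines / entries
    # Assumed convention: anything that would round to 0.00 is shown as "<0.01".
    return "<0.01" if avg < 0.005 else f"{avg:.2f}"


# Totals copied from the references (RL) table above.
rows = {
    "References (RL)": 1006647,
    "Journal": 801572,
    "Submitted to EMBL/GenBank/DDBJ": 196199,
    "Submitted to other databases": 6729,
    "Book citation": 687,
}

for name, total in rows.items():
    print(f"{name:<36} {total:>8} {per_entry(total):>9}")
```

The same calculation applies to the comment, feature and cross-reference tables, since they share the same column layout.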
Total number of distinct authors cited in UniProtKB/Swiss-Prot: 317799
Total Number of Average
Line type / subtype number entries per entry Rank
------------------------------------ -------- --------- --------- ----
Comments (CC) 2357115 4.41
ALLERGEN 515 515 <0.01 26
ALTERNATIVE PRODUCTS 20358 20358 0.04 13
BIOPHYSICOCHEMICAL PROPERTIES 3985 3985 0.01 23
BIOTECHNOLOGY 324 322 <0.01 28
CATALYTIC ACTIVITY 238512 216612 0.45 4
CAUTION 7887 7729 0.01 19
COFACTOR 103434 95063 0.19 7
DEVELOPMENTAL STAGE 9533 9533 0.02 16
DISEASE 4904 3298 0.01 21
DISRUPTION PHENOTYPE 4357 4357 0.01 22
DOMAIN 36745 32558 0.07 10
ENZYME REGULATION 10368 10368 0.02 15
FUNCTION 409612 392767 0.77 2
INDUCTION 13852 13852 0.03 14
INTERACTION 9138 9138 0.02 17
MASS SPECTROMETRY 5004 3804 0.01 20
MISCELLANEOUS 31148 28736 0.06 12
PATHWAY 131183 118986 0.25 6
PHARMACEUTICAL 85 85 <0.01 29
POLYMORPHISM 848 802 <0.01 24
PTM 42439 33779 0.08 8
RNA EDITING 623 623 <0.01 25
SEQUENCE CAUTION 40195 40195 0.08 9
SIMILARITY 630685 509895 1.18 1
SUBCELLULAR LOCATION 319652 314117 0.60 3
SUBUNIT 235767 235767 0.44 5
TISSUE SPECIFICITY 36735 36735 0.07 11
TOXIC DOSE 492 478 <0.01 27
WEB RESOURCE 8735 7007 0.02 18
Total number of comment topics: 29
Total Number of Average
Line type / subtype number entries per entry Rank
------------------------------------ -------- --------- --------- ----
Features (FT) 3508748 6.56
ACT_SITE 134334 82112 0.25 9
BINDING 246164 67210 0.46 4
CA_BIND 3814 1581 0.01 35
CARBOHYD 105846 27001 0.20 14
CHAIN 541249 528892 1.01 1
COILED 19727 13531 0.04 26
COMPBIAS 52996 27998 0.10 18
CONFLICT 124394 43609 0.23 11
CROSSLNK 6280 3725 0.01 34
DISULFID 103788 27980 0.19 15
DNA_BIND 11286 10395 0.02 31
DOMAIN 156473 93242 0.29 6
HELIX 152500 15859 0.29 7
INIT_MET 15175 15175 0.03 27
INTRAMEM 1921 844 <0.01 38
LIPID 11358 7211 0.02 30
METAL 297218 72741 0.56 3
MOD_RES 191507 62961 0.36 5
MOTIF 34657 22341 0.06 24
MUTAGEN 38471 8979 0.07 21
NON_CONS 1967 728 <0.01 37
NON_STD 353 278 <0.01 39
NON_TER 12112 9247 0.02 29
NP_BIND 113323 71022 0.21 12
PEPTIDE 9765 6575 0.02 32
PROPEP 12442 10696 0.02 28
REGION 110157 58957 0.21 13
REPEAT 92992 13770 0.17 16
SIGNAL 37375 37365 0.07 22
SITE 40690 24105 0.08 20
STRAND 149267 14743 0.28 8
TOPO_DOM 127703 26402 0.24 10
TRANSIT 7917 7828 0.01 33
TRANSMEM 351730 72438 0.66 2
TURN 35190 12356 0.07 23
UNSURE 2984 520 0.01 36
VAR_SEQ 41239 17790 0.08 19
VARIANT 83220 16624 0.16 17
ZN_FING 29164 12719 0.05 25
Total number of feature keys: 39
Total Number of Average
Line type / subtype number entries per entry Rank Category
------------------------------------ -------- --------- --------- ---- -------------------------------------------
Cross-references (DR) 15217785 28.46
2DBase-Ecoli 85 85 <0.01 124 2D gel databases
Aarhus/Ghent-2DPAGE 126 96 <0.01 121 2D gel databases
AGD 932 926 <0.01 99 Organism-specific databases
Allergome 1413 868 <0.01 95 Protein family/group databases
ANU-2DPAGE 26 26 <0.01 130 2D gel databases
ArachnoServer 763 755 <0.01 105 Organism-specific databases
ArrayExpress 59678 59678 0.11 42 Gene expression databases
Bgee 39297 39297 0.07 47 Gene expression databases
BindingDB 295 295 <0.01 117 Other
BioCyc 248363 239909 0.46 22 Enzyme and pathway databases
BRENDA 4241 4234 0.01 86 Enzyme and pathway databases
CAZy 7526 6768 0.01 72 Protein family/group databases
CGD 669 649 <0.01 107 Organism-specific databases
CleanEx 30110 29468 0.06 51 Gene expression databases
COMPLUYEAST-2DPAGE 99 98 <0.01 123 2D gel databases
ConoServer 915 833 <0.01 101 Organism-specific databases
Cornea-2DPAGE 67 67 <0.01 125 2D gel databases
CTD 68075 67467 0.13 39 Organism-specific databases
CYGD 5594 5591 0.01 76 Organism-specific databases
dictyBase 4199 4083 0.01 87 Organism-specific databases
DIP 13453 13345 0.03 65 Protein-protein interaction databases
DisProt 397 394 <0.01 113 3D structure databases
DMDM 16779 16778 0.03 59 Polymorphism databases
DOSAC-COBS-2DPAGE 149 147 <0.01 120 2D gel databases
DrugBank 5318 1627 0.01 77 Other
EchoBASE 4167 4163 0.01 88 Organism-specific databases
ECO2DBASE 352 300 <0.01 115 2D gel databases
EcoGene 4292 4290 0.01 85 Organism-specific databases
eggNOG 428639 428639 0.80 9 Phylogenomic databases
EMBL 916523 524266 1.71 3 Sequence databases
Ensembl 66485 48014 0.12 40 Genome annotation databases
EnsemblBacteria 97851 84909 0.18 29 Genome annotation databases
EnsemblFungi 16535 16243 0.03 60 Genome annotation databases
EnsemblMetazoa 10891 8262 0.02 68 Genome annotation databases
EnsemblPlants 15743 13499 0.03 64 Genome annotation databases
EnsemblProtists 4425 4301 0.01 84 Genome annotation databases
euHCVdb 55 44 <0.01 126 Organism-specific databases
EuPathDB 785 784 <0.01 102 Organism-specific databases
FlyBase 5839 5465 0.01 75 Organism-specific databases
Gene3D 328152 253401 0.61 17 Family and domain databases
GeneCards 19975 19668 0.04 55 Organism-specific databases
GeneFarm 3048 3034 0.01 91 Organism-specific databases
GeneID 483613 464098 0.90 6 Genome annotation databases
GeneTree 56760 56731 0.11 43 Phylogenomic databases
Genevestigator 66416 66416 0.12 41 Gene expression databases
GenoList 7063 7051 0.01 73 Organism-specific databases
GenomeReviews 376020 356446 0.70 12 Genome annotation databases
GermOnline 41906 41332 0.08 46 Gene expression databases
GlycoSuiteDB 272 272 <0.01 118 PTM databases
GO 2168969 502275 4.06 1 Ontologies
Gramene 4723 4723 0.01 80 Organism-specific databases
H-InvDB 13251 12337 0.02 66 Organism-specific databases
HAMAP 311519 311343 0.58 18 Family and domain databases
HGNC 19756 19597 0.04 56 Organism-specific databases
HOGENOM 364729 364729 0.68 13 Phylogenomic databases
HOVERGEN 75077 75077 0.14 36 Phylogenomic databases
HPA 15761 12094 0.03 63 Organism-specific databases
HSSP 30133 30133 0.06 50 3D structure databases
InParanoid 68932 68932 0.13 38 Phylogenomic databases
IntAct 33040 33040 0.06 49 Protein-protein interaction databases
InterPro 1775403 509803 3.32 2 Family and domain databases
IPI 93192 66275 0.17 32 Sequence databases
KEGG 458305 436722 0.86 8 Genome annotation databases
KO 361260 360795 0.68 14 Phylogenomic databases
LegioList 763 761 <0.01 104 Organism-specific databases
Leproma 671 668 <0.01 106 Organism-specific databases
MaizeGDB 486 481 <0.01 111 Organism-specific databases
MEROPS 10265 10265 0.02 70 Protein family/group databases
MGI 16375 16330 0.03 61 Organism-specific databases
MIM 17324 13255 0.03 58 Organism-specific databases
MINT 17583 17583 0.03 57 Protein-protein interaction databases
NextBio 49249 49247 0.09 44 Other
neXtProt 20101 20101 0.04 54 Organism-specific databases
OGP 377 377 <0.01 114 2D gel databases
OMA 384921 384921 0.72 11 Phylogenomic databases
Orphanet 4048 2446 0.01 89 Organism-specific databases
OrthoDB 77881 77881 0.15 35 Phylogenomic databases
PANTHER 199934 185789 0.37 24 Family and domain databases
Pathway_Interaction_DB 4567 1665 0.01 83 Enzyme and pathway databases
PATRIC 308109 308091 0.58 20 Genome annotation databases
PDB 81976 17762 0.15 34 3D structure databases
PDBsum 81976 17762 0.15 33 3D structure databases
PeptideAtlas 5164 5164 0.01 78 Proteomic databases
PeroxiBase 764 747 <0.01 103 Protein family/group databases
Pfam 710769 495730 1.33 4 Family and domain databases
PharmGKB 15809 15484 0.03 62 Organism-specific databases
PHCI-2DPAGE 247 247 <0.01 119 2D gel databases
PhosphoSite 25542 25542 0.05 53 PTM databases
PhosSite 351 351 <0.01 116 PTM databases
PhylomeDB 169213 169213 0.32 25 Phylogenomic databases
PIR 117573 107503 0.22 28 Sequence databases
PIRSF 96345 96331 0.18 30 Family and domain databases
PMAP-CutDB 1457 1457 <0.01 94 Other
PMMA-2DPAGE 52 52 <0.01 127 2D gel databases
PomBase 5008 4948 0.01 79 Organism-specific databases
PptaseDB 34 34 <0.01 128 Protein family/group databases
PRIDE 74600 74600 0.14 37 Proteomic databases
PRINTS 137428 120280 0.26 27 Family and domain databases
ProDom 29211 29032 0.05 52 Family and domain databases
ProMEX 495 495 <0.01 110 Proteomic databases
PROSITE 475473 300997 0.89 7 Family and domain databases
ProtClustDB 341737 341737 0.64 15 Phylogenomic databases
ProteinModelPortal 428620 428620 0.80 10 3D structure databases
PseudoCAP 1229 1220 <0.01 97 Organism-specific databases
Rat-heart-2DPAGE 28 28 <0.01 129 2D gel databases
Reactome 10590 6700 0.02 69 Enzyme and pathway databases
REBASE 400 400 <0.01 112 Protein family/group databases
RefSeq 505774 465497 0.95 5 Sequence databases
REPRODUCTION-2DPAGE 1256 1035 <0.01 96 2D gel databases
RGD 7594 7590 0.01 71 Organism-specific databases
SGD 6638 6633 0.01 74 Organism-specific databases
Siena-2DPAGE 102 102 <0.01 122 2D gel databases
SMART 166051 124341 0.31 26 Family and domain databases
SMR 211222 211222 0.40 23 3D structure databases
STRING 308517 308515 0.58 19 Protein-protein interaction databases
SUPFAM 329638 261256 0.62 16 Family and domain databases
SWISS-2DPAGE 1183 1182 <0.01 98 2D gel databases
TAIR 11062 10982 0.02 67 Organism-specific databases
TCDB 3617 3602 0.01 90 Protein family/group databases
TIGR 34506 33726 0.06 48 Genome annotation databases
TIGRFAMs 288223 267925 0.54 21 Family and domain databases
TubercuList 1941 1905 <0.01 93 Organism-specific databases
UCD-2DPAGE 510 501 <0.01 109 2D gel databases
UCSC 47716 37166 0.09 45 Genome annotation databases
UniGene 95518 87862 0.18 31 Sequence databases
VectorBase 568 554 <0.01 108 Genome annotation databases
World-2DPAGE 919 908 <0.01 100 2D gel databases
WormBase 4699 3850 0.01 81 Organism-specific databases
Xenbase 4657 4652 0.01 82 Organism-specific databases
ZFIN 2703 2691 0.01 92 Organism-specific databases
Total number of cross-referenced databases: 130
6. AMINO ACID COMPOSITION
6.1 Composition in percent for the complete database
Ala (A) 8.26 Gln (Q) 3.93 Leu (L) 9.66 Ser (S) 6.55
Arg (R) 5.53 Glu (E) 6.75 Lys (K) 5.84 Thr (T) 5.34
Asn (N) 4.06 Gly (G) 7.08 Met (M) 2.42 Trp (W) 1.08
Asp (D) 5.46 His (H) 2.27 Phe (F) 3.86 Tyr (Y) 2.92
Cys (C) 1.36 Ile (I) 5.97 Pro (P) 4.70 Val (V) 6.87
Asx (B) 0.000 Glx (Z) 0.000 Xaa (X) 0.00
Legend: gray = aliphatic, red = acidic, green = small hydroxy,
blue = basic, black = aromatic, white = amide, yellow = sulfur
6.2 Classification of the amino acids by their frequency
Leu, Ala, Gly, Val, Glu, Ser, Ile, Lys, Arg, Asp, Thr, Pro, Asn, Gln,
Phe, Tyr, Met, His, Cys, Trp
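The ordering in section 6.2 can be derived by sorting the percentages of section 6.1 in descending order. A minimal sketch, with an illustrative dictionary name:

```python
# Composition in percent, copied from section 6.1 (standard residues only).
composition = {
    "Ala": 8.26, "Arg": 5.53, "Asn": 4.06, "Asp": 5.46, "Cys": 1.36,
    "Gln": 3.93, "Glu": 6.75, "Gly": 7.08, "His": 2.27, "Ile": 5.97,
    "Leu": 9.66, "Lys": 5.84, "Met": 2.42, "Phe": 3.86, "Pro": 4.70,
    "Ser": 6.55, "Thr": 5.34, "Trp": 1.08, "Tyr": 2.92, "Val": 6.87,
}

# Most frequent residue first; no two values tie, so the order is unambiguous.
ranked = sorted(composition, key=composition.get, reverse=True)

print(", ".join(ranked))
```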
7. MISCELLANEOUS STATISTICS
4461 entries are encoded on a mitochondrion, and 3639 are encoded on a plasmid.
12188 entries are encoded on a plastid,
of which 21 are encoded on apicoplasts,
11623 on chloroplasts,
51 on organellar chromatophores,
145 on cyanelles,
149 on non-photosynthetic plastids and
199 on unspecified types of plastid.
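The plastid breakdown above can be checked against its stated total. The sketch below sums the six subtypes; the dictionary name is illustrative.

```python
# Plastid-encoded entries by plastid type, copied from the list above.
plastid_entries = {
    "apicoplast": 21,
    "chloroplast": 11623,
    "organellar chromatophore": 51,
    "cyanelle": 145,
    "non-photosynthetic plastid": 149,
    "unspecified plastid": 199,
}

plastid_total = sum(plastid_entries.values())
print(plastid_total)  # compare with the total quoted for plastid-encoded entries
```

The subtypes sum to 12188, matching the total number of plastid-encoded entries.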
Number of entries with at least one sequence correction: 73088