To aid computation, CodonW automatically converts sequence information from its original text format into a numerical format. This is normally transparent to the user. However, to add genetic codes or to supply a personal choice of codon values for calculating the Fop, CAI or CBI indices, some understanding of the scheme used to convert sequences to numerical strings is advisable.
The indices Fop, CBI and CAI measure codon bias relative to the codon usage of a set of optimal genes, and there is an option to supply a personal choice of these values when calculating them. The values are read from a file: there must be one value for each codon (64 in total), and they must appear in the file in a set sequence (i.e. the numerical order of the codons: TTT, TCT ... GAG, GGG). This is also the order in which codon and amino acid results are recorded to file.
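The required ordering can be illustrated with a short Python sketch. It is not part of CodonW: the function names `codons_in_file_order` and `parse_codon_values` are hypothetical, and the whitespace-separated layout is only an assumption about how such a file might be arranged. The sketch generates the 64 codons in their numerical order (second position varying fastest, then the third, then the first) and pairs them with 64 values.

```python
from itertools import product

BASES = "TCAG"  # numerical order of the bases: T/U, C, A, G


def codons_in_file_order():
    """Return the 64 codons in numerical order: TTT, TCT ... GAG, GGG."""
    # product() varies its last element fastest, so the loop variables are
    # arranged as (first, third, second) to make the second position cycle
    # fastest, then the third, then the first.
    return [p1 + p2 + p3 for p1, p3, p2 in product(BASES, repeat=3)]


def parse_codon_values(text):
    """Pair 64 whitespace-separated numbers (assumed layout) with codons."""
    values = [float(token) for token in text.split()]
    if len(values) != 64:
        raise ValueError(f"expected 64 codon values, found {len(values)}")
    return dict(zip(codons_in_file_order(), values))


if __name__ == "__main__":
    order = codons_in_file_order()
    print(order[:5])   # ['TTT', 'TCT', 'TAT', 'TGT', 'TTC']
    print(order[-2:])  # ['GAG', 'GGG']
```

Checking the count before pairing guards against a value file that silently shifts every later codon by one position.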
Internally, CodonW recodes all nucleotides, codons and amino acids. Nucleotides are recoded as T/U=1, C=2, A=3, G=4. The 20 standard amino acids and the termination codons are recoded as integer values in the range 1 to 21. Note that stop codons are assigned the amino acid value 11 (see Table 2). Decisions about whether a codon is synonymous, and about how many members a particular synonymous amino acid family has, are taken at run-time and depend on the genetic code chosen.
Each codon is recoded into an integer value in the range 1 to 64 (see Table 1). The formula used to recode the codons is:

code = ((P1 - 1) * 16) + P2 + ((P3 - 1) * 4)

where P1, P2 and P3 are the recoded values of the three codon positions. Using this recoding convention, the codon ATG (A=3, T=1, G=4) has the value (2 * 16) + 1 + (3 * 4) = 45.
Unrecognised or non-translatable bases, codons or amino acids are assigned the value zero.
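As a minimal sketch of the recoding described above (not CodonW's own source code), the Python below maps bases to 1 to 4 and codons to 1 to 64, returning zero for anything unrecognised. The position weights 16, 1 and 4 are the ones that reproduce the ordering in Table 1 and the value 45 for ATG. The names `recode_base` and `recode_codon` are hypothetical, and accepting lower-case input is an assumption made here for convenience.

```python
NUCLEOTIDE_CODE = {"T": 1, "U": 1, "C": 2, "A": 3, "G": 4}


def recode_base(base):
    """Return 1-4 for T/U, C, A, G and 0 for any other character."""
    # Upper-casing the input is an assumption of this sketch.
    return NUCLEOTIDE_CODE.get(base.upper(), 0)


def recode_codon(codon):
    """Return the codon's integer code (1-64), or 0 if not translatable."""
    if len(codon) != 3:
        return 0
    p1, p2, p3 = (recode_base(base) for base in codon)
    if 0 in (p1, p2, p3):
        return 0  # at least one unrecognised base
    # First position weighs 16, third position 4, second position 1.
    return (p1 - 1) * 16 + p2 + (p3 - 1) * 4


if __name__ == "__main__":
    print(recode_codon("ATG"))  # 45
    print(recode_codon("UUU"))  # 1
    print(recode_codon("GGG"))  # 64
    print(recode_codon("ANG"))  # 0
```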
Table 1 Numerical values used for recoding codons

| Code | Codon | Amino acid | Code | Codon | Amino acid | Code | Codon | Amino acid | Code | Codon | Amino acid |
|------|-------|------------|------|-------|------------|------|-------|------------|------|-------|------------|
| 1 | UUU | Phe | 2 | UCU | Ser | 3 | UAU | Tyr | 4 | UGU | Cys |
| 5 | UUC | Phe | 6 | UCC | Ser | 7 | UAC | Tyr | 8 | UGC | Cys |
| 9 | UUA | Leu | 10 | UCA | Ser | 11 | UAA | STOP | 12 | UGA | STOP |
| 13 | UUG | Leu | 14 | UCG | Ser | 15 | UAG | STOP | 16 | UGG | Trp |
| 17 | CUU | Leu | 18 | CCU | Pro | 19 | CAU | His | 20 | CGU | Arg |
| 21 | CUC | Leu | 22 | CCC | Pro | 23 | CAC | His | 24 | CGC | Arg |
| 25 | CUA | Leu | 26 | CCA | Pro | 27 | CAA | Gln | 28 | CGA | Arg |
| 29 | CUG | Leu | 30 | CCG | Pro | 31 | CAG | Gln | 32 | CGG | Arg |
| 33 | AUU | Ile | 34 | ACU | Thr | 35 | AAU | Asn | 36 | AGU | Ser |
| 37 | AUC | Ile | 38 | ACC | Thr | 39 | AAC | Asn | 40 | AGC | Ser |
| 41 | AUA | Ile | 42 | ACA | Thr | 43 | AAA | Lys | 44 | AGA | Arg |
| 45 | AUG | Met | 46 | ACG | Thr | 47 | AAG | Lys | 48 | AGG | Arg |
| 49 | GUU | Val | 50 | GCU | Ala | 51 | GAU | Asp | 52 | GGU | Gly |
| 53 | GUC | Val | 54 | GCC | Ala | 55 | GAC | Asp | 56 | GGC | Gly |
| 57 | GUA | Val | 58 | GCA | Ala | 59 | GAA | Glu | 60 | GGA | Gly |
| 61 | GUG | Val | 62 | GCG | Ala | 63 | GAG | Glu | 64 | GGG | Gly |
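
Because synonymous family sizes depend on the genetic code chosen, they can be derived from a code table rather than fixed in advance. The Python sketch below illustrates the idea under stated assumptions: it stores the amino acid assignments of Table 1 as a 64-character string of one-letter codes in codon-code order, then counts family members. The names `UNIVERSAL_CODE`, `family_sizes` and `reassign` are hypothetical and do not describe how CodonW itself stores genetic codes.

```python
from collections import Counter

# One-letter amino acids for codon codes 1..64, in the order of Table 1
# ('*' marks a stop codon). Each group of four covers codes n..n+3.
UNIVERSAL_CODE = (
    "FSYC" "FSYC" "LS**" "LS*W"
    "LPHR" "LPHR" "LPQR" "LPQR"
    "ITNS" "ITNS" "ITKR" "MTKR"
    "VADG" "VADG" "VAEG" "VAEG"
)


def family_sizes(code_table):
    """Count how many codons encode each amino acid (or stop)."""
    return Counter(code_table)


def reassign(code_table, codon_code, amino_acid):
    """Return a copy of the table with one codon (1-64) reassigned."""
    index = codon_code - 1
    return code_table[:index] + amino_acid + code_table[index + 1:]


if __name__ == "__main__":
    sizes = family_sizes(UNIVERSAL_CODE)
    print(sizes["L"], sizes["M"], sizes["*"])  # 6 1 3

    # Illustration: reading UGA (code 12) as Trp, as several mitochondrial
    # codes do, changes two family sizes at once.
    variant = family_sizes(reassign(UNIVERSAL_CODE, 12, "W"))
    print(variant["W"], variant["*"])  # 2 2
```

Keeping the table as data means that a single reassignment automatically updates which codons count as synonymous.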
Table 2 Numerical values used to recode amino acids.

| Code | Amino acid | One letter code | Code | Amino acid | One letter code |
|------|------------|-----------------|------|------------|-----------------|
| 1 | Phe | F | 2 | Leu | L |
| 3 | Ile | I | 4 | Met | M |
| 5 | Val | V | 6 | Ser | S |
| 7 | Pro | P | 8 | Thr | T |
| 9 | Ala | A | 10 | Tyr | Y |
| 11 | Stop | * | 12 | His | H |
| 13 | Gln | Q | 14 | Asn | N |
| 15 | Lys | K | 16 | Asp | D |
| 17 | Glu | E | 18 | Cys | C |
| 19 | Trp | W | 20 | Arg | R |
| 21 | Gly | G | | | |
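
The amino acid codes of Table 2 can be expressed compactly as a lookup. The following Python sketch is an illustration only: it builds the mapping from a string of one-letter codes in Table 2 order and returns zero for any unrecognised symbol, as described earlier. The names `AMINO_ACID_ORDER` and `recode_protein` are hypothetical, and accepting lower-case input is an assumption of this sketch.

```python
# One-letter codes in the order of Table 2; position in the string gives
# the numerical code, so 'F' is 1, '*' (stop) is 11 and 'G' is 21.
AMINO_ACID_ORDER = "FLIMVSPTAY*HQNKDECWRG"

AMINO_ACID_CODE = {
    letter: code for code, letter in enumerate(AMINO_ACID_ORDER, start=1)
}


def recode_protein(sequence):
    """Recode one-letter amino acids as integers; unknown symbols give 0."""
    return [AMINO_ACID_CODE.get(letter, 0) for letter in sequence.upper()]


if __name__ == "__main__":
    print(recode_protein("MFG*"))  # [4, 1, 21, 11]
    print(recode_protein("MX"))    # [4, 0]
```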
Last updated on July 4, 1997 by John Peden
For the most up to date versions see http://codonw.sourceforge.net/DataRecoding.html