Data Recoding

        Data Recoding

        To aid computation codonW converts sequence information automatically from its original text format into a numerical format. This is normally transparent to the user. To add additional genetic codes or a personal choice of codon values for calculating the Fop, CAI or CBI indices, some understanding of the scheme used to convert the sequences to numerical strings is advisable.

        When calculating the indices Fop, CBI, or CAI which are a measure of codon bias in relation to the codon usage of a set of optimal genes, there is an option of using a personal choice of these values. These are read from file, there must be one value for each codon (64 in total) and they must be found in the file in a set sequence (i.e. the numerical order of the codons, TTT, TCT ... GAG, GGG). This is also the order in which codon and amino acid results are recorded to file.

         

        Internally CodonW recodes all nucleotides, codons and amino acids. Nucleotides are recoded as T/U=1, C=2, A=3, G=4. The 20 standard amino acids and the termination codons are recoded as integer values in the range 1 to 21, note that stop codons are assigned the amino acid value 11 (see Table 2). The decision about whether a codon is synonymous, or how many members are in a particular amino acid synonymous family are taken at run-time and are dependent on the genetic code chosen.

        Each codon is recoded into an integer value in the range 1 to 64, see Table 1. The formulae used to recode the codons is:

        Equation 1

        Where each of the three codon positions is represented by P1, P2 and P3. Using this recoding convention, the codon ATG has the value 45.

        Unrecognised or non-translatable bases, codons or amino acids are assigned the value zero.

         

        Table 1 Numerical values used for recoding codons

        Code

        Codon

        Amino acid

        Code

        Codon

        Amino acid

        Code

        Codon

        Amino acid

        Code

        Codon

        Amino acid

        1

        UUU

        Phe

        2

        UCU

        Ser

        3

        UAU

        Tyr

        4

        UGU

        Cys

        5

        UUC

        6

        UCC

        7

        UAC

        8

        UGC

        9

        UUA

        Leu

        10

        UCA

        11

        UAA

        STOP

        12

        UGA

        STOP

        13

        UUG

        14

        UCG

        15

        UAG

        16

        UGG

        Trp

        17

        CUU

        18

        CCU

        Pro

        19

        CAU

        His

        20

        CGU

        Arg

        21

        CUC

        22

        CCC

        23

        CAC

        24

        CGC

        25

        CUA

        26

        CCA

        27

        CAA

        Gln

        28

        CGA

        29

        CUG

        30

        CCG

        31

        CAG

        32

        CGG

        33

        AUU

        Ile

        34

        ACU

        Thr

        35

        AAU

        Asn

        36

        AGU

        Ser

        37

        AUC

        38

        ACC

        39

        AAC

        40

        AGC

        41

        AUA

        42

        ACA

        43

        AAA

        Lys

        44

        AGA

        Arg

        45

        AUG

        Met

        46

        ACG

        47

        AAG

        48

        AGG

        49

        GUU

        Val

        50

        GCU

        Ala

        51

        GAU

        Asp

        52

        GGU

        Gly

        53

        GUC

        54

        GCC

        55

        GAC

        56

        GGC

        57

        GUA

        58

        GCA

        59

        GAA

        Glu

        60

        GGA

        61

        GUG

        62

        GCG

        63

        GAG

        64

        GGG

         

        Table 2 Numerical values used to recode amino acids.

        Code

        Amino Acid

        One letter code

        Code

        Amino Acid

        One letter code

        1

        Phe

        F

        2

        Leu

        L

        3

        Ile

        I

        4

        Met

        M

        5

        Val

        V

        6

        Ser

        S

        7

        Pro

        P

        8

        Thr

        T

        9

        Ala

        A

        10

        Tyr

        Y

        11

        Stop

        *

        12

        His

        H

        13

        Gln

        Q

        14

        Asn

        N

        15

        Lys

        K

        16

        Asp

        D

        17

        Glu

        E

        18

        Cys

        C

        19

        Trp

        W

        20

        Arg

        R

        21

        Gly

        G

             

        Last updated on July 4, 1997 by John Peden

        For the most up to date versions see http://codonw.sourceforge.net/DataRecoding.html