Our genetic heritage is made up of long DNA molecules, which form our chromosomes. If we were able to see a bit of DNA under the microscope, we would see that it is a simple succession of four different nucleotides: adenine, thymine, cytosine and guanine, otherwise known as A, T, C and G. Our genes are hidden inside a text which is 3 billion characters long and based on an alphabet of 4 letters. But how do you read it? How do you make sense of it?
All genes have a beginning and an end. Between the two limits lies a recipe which produces, in the great majority of cases, a protein. A protein is a succession of molecules known as amino acids, of which there are 20, each represented by a letter. Going from DNA to protein is like going from Chinese to Russian. You need a translator. In other words, you need the genetic code.
Three consecutive ‘letters’ in DNA correspond to one amino acid in a protein. By reading these three-letter words one after another, a gene is translated into a protein. It is precisely in this way that our cells are able to produce proteins. And the genetic code is almost the same for all organisms: from viruses to elephants, from snails to tulips to humans.
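To make the idea of three-letter words concrete, here is a minimal Python sketch. The function name `split_into_codons` and the toy sequence are made up for illustration. It cuts a piece of DNA into consecutive words of three letters and counts how many different words a 4-letter alphabet allows.

```python
from itertools import product

BASES = "ACGT"  # the four letters of the DNA alphabet


def split_into_codons(dna):
    """Cut a DNA string into consecutive three-letter words.

    Letters left over at the end (fewer than three) are ignored.
    """
    usable = len(dna) - len(dna) % 3
    return [dna[i:i + 3] for i in range(0, usable, 3)]


# Every possible three-letter word: 4 * 4 * 4 = 64 of them.
all_codons = ["".join(letters) for letters in product(BASES, repeat=3)]

print(len(all_codons))                 # 64
print(split_into_codons("ATGTTCTAA"))  # ['ATG', 'TTC', 'TAA']
```

With 64 possible words and only 20 amino acids to encode, several different words can stand for the same amino acid.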
Here is one of the many ways of representing the genetic code. To read it:
start at the centre and move towards the outside.
An example: ATG (on the DNA) codes for M (methionine), and TTC codes for F (phenylalanine).
TAA, TAG and TGA do not code for an amino acid, but indicate a ‘stop’.
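The same reading exercise can be sketched in a few lines of Python. This is only an illustration: the names `GENETIC_CODE` and `translate` are invented here, and the table is the standard genetic code written in a compact form (bases in the order T, C, A, G, with `*` marking a stop). It assumes the sequence is already in the right reading frame.

```python
BASES = "TCAG"
# Standard genetic code, one letter per codon, in the order
# TTT, TTC, TTA, TTG, TCT, ... GGG. '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

GENETIC_CODE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate DNA three letters at a time, halting at the first stop."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = GENETIC_CODE[dna[i:i + 3]]
        if amino_acid == "*":  # TAA, TAG or TGA: stop reading
            break
        protein.append(amino_acid)
    return "".join(protein)


print(GENETIC_CODE["ATG"], GENETIC_CODE["TTC"])  # M F
print(translate("ATGTTCTAA"))                    # MF
```

A dictionary is used because translation is just a lookup: each three-letter word is the key, and the amino acid letter is the value.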
Nature is not simple. Genes which code for proteins represent barely 3 to 5% of our genome. Finding them is like hunting for a needle in a haystack! What is more, not only are these genes scattered all over our genome, but they are also frequently in parts. Scientists then have to seek out the parts and reassemble them so that the gene can be read correctly.
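Reassembling a gene from its parts can be pictured with a toy Python sketch. Everything here is invented for illustration: the short sequence, the coordinates of the parts, and the function name `assemble_gene`. Lower-case letters stand for the stretches that are not part of the recipe, upper-case letters for the parts that are.

```python
def assemble_gene(dna, parts):
    """Join the parts of a gene into one readable sequence.

    `parts` is a list of (start, end) positions, counted from 0,
    with `end` not included. Parts are joined in order of position.
    """
    return "".join(dna[start:end] for start, end in sorted(parts))


# Toy example: three coding parts separated by non-coding stretches.
dna = "ggATGTTaaaaCGGAgtgtTAAcc"
parts = [(2, 7), (11, 15), (19, 22)]  # hypothetical coordinates

print(assemble_gene(dna, parts))  # ATGTTCGGATAA
```

Once joined, the sequence reads as ATG, TTC, GGA, TAA. Notice that the word TTC only appears after the first two parts are put together, which is why the parts must be found and assembled before the gene can be read.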
Computer programs were designed for just this problem and are able to predict the beginning of a gene, its end, and what is found in between. Such programs are still light years away from giving perfect results, but they are a precious aid to biologists.
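To give a flavour of what such a program does, here is a deliberately naive Python sketch. It is not how real gene prediction programs work. It simply assumes that a gene begins with ATG (the usual start signal), ends at the first TAA, TAG or TGA found three letters at a time, is in one piece, and lies on the strand being read. The function name `find_open_reading_frames` and the `min_codons` parameter are invented for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_open_reading_frames(dna, min_codons=2):
    """Return (start, end) positions of candidate genes.

    A candidate starts at an ATG and runs, three letters at a time,
    up to and including the first stop codon. `end` is not included.
    `min_codons` is the minimum number of words before the stop.
    """
    found = []
    for start in range(len(dna) - 2):
        if dna[start:start + 3] != "ATG":
            continue
        for pos in range(start + 3, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    found.append((start, pos + 3))
                break  # stop reading at the first stop codon
    return found


dna = "CCATGTTCTAAGG"
for start, end in find_open_reading_frames(dna):
    print(start, end, dna[start:end])  # 2 11 ATGTTCTAA
```

A sketch like this would miss genes that are in several parts and would report many false candidates in real DNA, which hints at why the real programs are so much more elaborate.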