Our chromosomes are made up of long molecules of DNA. If we were able to see a bit of DNA under the microscope, we would see that it is a simple succession of four different nucleotides: adenine, thymine, cytosine and guanine, otherwise known as A, T, C and G.
The DNA of our 23 chromosomes has been sequenced. The result? Some 3 billion As, Ts, Cs and Gs! This is what we call the human genome.
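Since the genome is, in the end, a very long text written with only four letters, a computer can handle it as an ordinary character string. The minimal Python sketch below just counts each letter. The function name `base_counts` is our own, and the sequence in the usage line is a made-up toy example, not real human DNA.

```python
from collections import Counter


def base_counts(dna: str) -> dict[str, int]:
    """Count how many times each of the four letters A, T, C and G occurs.

    Hypothetical helper for illustration; lowercase input is accepted.
    """
    counts = Counter(dna.upper())
    # Report all four bases, even those that do not appear (count 0).
    return {base: counts.get(base, 0) for base in "ATCG"}


print(base_counts("ATGCGA"))  # {'A': 2, 'T': 1, 'C': 1, 'G': 2}
```

The same function would work on a real chromosome read from a file; only the length of the string changes.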
Genes are pieces of DNA of varying lengths. About 20,000 genes have been discovered, and they produce all the proteins in our body. These genes are hidden inside a text that is 3 billion characters long. How do you find them? How do you read them? How do you make sense of them?
All genes have a beginning and an end. Between these two limits lies a recipe which produces, for example, a protein. A protein is a chain of building blocks known as amino acids, of which there are 20, each represented by a letter. Going from DNA to protein is like going from Chinese to Russian: you need a translator. In other words, you need the genetic code.
Three consecutive ‘letters’ in DNA correspond to one amino acid in a protein. By reading these three-letter words, called codons, a gene is translated into a protein. It is precisely in this way that our cells are able to produce proteins. And the genetic code is almost the same for all organisms, from viruses to elephants, and from snails and tulips to humans.
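Reading DNA as three-letter words amounts to cutting the string into consecutive triplets. The small Python sketch below does exactly that, assuming the reading starts at the very first letter; `split_codons` is a hypothetical name chosen for this example, and any one or two leftover letters at the end are simply dropped.

```python
def split_codons(dna: str) -> list[str]:
    """Cut a DNA string into consecutive codons (triplets), starting at position 0.

    Trailing letters that do not form a full triplet are ignored.
    """
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # length rounded down to a multiple of 3
    return [dna[i:i + 3] for i in range(0, usable, 3)]


print(split_codons("ATGTTCTAA"))  # ['ATG', 'TTC', 'TAA']
```

Starting one or two letters later would give a completely different series of words, which is why knowing where a gene begins matters so much.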
Here is one of the many ways of representing the genetic code. To read it:
start at the centre and move towards the outside.
An example: ATG (in the DNA) codes for M (methionine), and TTC codes for F (phenylalanine).
TAA, TAG and TGA do not code for an amino acid, but indicate a ‘stop’.
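The wheel is one way of writing the genetic code; for a computer, a lookup table from the 64 possible codons to amino-acid letters is more convenient. The Python sketch below builds that table for the standard code (the one NCBI lists as translation table 1) from its usual compact form, with `*` marking the three stop codons, and then translates a sequence codon by codon. The names `CODON_TABLE` and `translate` are our own, and the sketch assumes the input contains only A, C, G and T.

```python
# Standard genetic code in compact form: codons are enumerated with each of the
# three positions running over T, C, A, G (first position varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))  # e.g. "ATG" -> "M", "TAA" -> "*"


def translate(dna: str) -> str:
    """Translate DNA three letters at a time from position 0.

    Stops at the first stop codon; letters other than A, C, G, T raise KeyError.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # TAA, TAG or TGA: end of the recipe
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGTTCTAA"))  # MF
```

This reproduces the examples given above: ATG gives M, TTC gives F, and TAA stops the translation.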
Nature is not simple. The genes that code for proteins represent less than 5% of our genome. Finding them is like hunting for a needle in a haystack! What is more, not only are these genes scattered all over our chromosomes, but they are also frequently split into several pieces. Scientists then have to seek out the pieces and reassemble them so that the gene can be read correctly.
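Once the pieces of a gene (called exons; the stretches between them are introns) have been located, reassembling them is the easy part: keep the pieces, in order, and drop what lies between. The Python sketch below assumes the exon positions are already known, which in reality is the hard problem; the sequence and coordinates are invented for illustration, and `splice` is a hypothetical helper name.

```python
def splice(gene: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon pieces of a gene and discard the introns between them.

    exons: (start, end) positions, counted from 0 with the end excluded,
    like Python slices. They are joined in order of their start position.
    """
    return "".join(gene[start:end] for start, end in sorted(exons))


# Toy gene: exon "ATG", intron "GTAAGTCCAG", exon "TTCTAA" (all made up).
toy_gene = "ATG" + "GTAAGTCCAG" + "TTCTAA"
print(splice(toy_gene, [(0, 3), (13, 19)]))  # ATGTTCTAA
```

The reassembled sequence can then be read three letters at a time like any other gene.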
Computer programs have been designed for exactly this problem: they try to predict where a gene begins, where it ends, and what lies in between. Such programs are still light years away from giving perfect results, but they are a precious aid to biologists.
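Real gene-prediction programs rely on statistical models and must cope with genes split into pieces. The Python sketch below is only a toy illustration of the simplest idea behind them, not how any actual predictor works: look for an ATG start codon and read three letters at a time until a stop codon in the same reading frame. It ignores the opposite DNA strand and split genes; the name `find_orfs` and the `min_codons` threshold are assumptions made for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) positions of naive 'ATG ... stop' stretches.

    Only the given strand is scanned, in its three reading frames.
    min_codons counts the codons before the stop codon; shorter hits are dropped.
    Positions are 0-based with the end excluded (the stop codon is included).
    """
    dna = dna.upper()
    hits = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # candidate beginning of a gene
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    hits.append((start, i + 3))
                start = None  # look for the next ATG in this frame
    return sorted(hits)


print(find_orfs("CCATGTTCTAAGG"))  # [(2, 11)]
```

On the toy sequence, positions 2 to 11 hold ATGTTCTAA, the same start and stop used in the examples above. Real genomes contain many such stretches by chance, which is one reason simple rules like this one are far from enough.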