Sunday, February 17, 2008

Hierarchical clustering using R

This example uses the file primates.txt, taken from Table 10.1 of the book Computational Genome Analysis.

Hy GCCCTCTTCCTAACACTCACAACAAAACTAACCAACACTAACATTACGGATGCCCAAGAA
Pa GCCCTTTTCCTAACACTCACAACAAAACTAACTAATACTAGTATTTCAGACGCCCAGGAA
Go GCCCTTTTCCTAACACTCACAACAAAGCTAACTAGCACCAACATCTCAGACGCCCAAGAA
Ho GCCCTTTTCCTAACACTCACAACAAAACTAACTAATACTAACATCTCAGACGCTCAGGAA
Po GCCCTTTTCCTAACACTCACAACGAAACTCACCAACACTAACATCTCAGATGCCCAAGAG

Above is the content of Table 10.1. If you copy these sequences into primates.txt as they are, you will get an error, so there is some work to do first. We use just 4 rows, as in the example.

You have to delete the labels (Hy, Pa, Go, etc.) and replace A, C, G, T with 1, 2, 3, 4, respectively.
This is easy to do with the find-and-replace function of any editor, even the simplest one, such as WordPad on Windows.
You will then get the following 4 sequences.

322242442241121242121121111241122112124112144123314322211311
322244442241121242121121111241124114124134144421312322213311
322244442241121242121121113241124132122112142421312322211311
322244442241121242121121111241124114124112142421312324213311
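The label removal and letter replacement described above can also be scripted instead of done by hand in an editor. Below is a minimal sketch in Python, not the book's method; the names CODES, encode_line, rows and encoded are made up for this illustration, and the input rows are the first four lines of the table above.

```python
# Numeric codes for the four nucleotides, as described in the post.
CODES = {"A": "1", "C": "2", "G": "3", "T": "4"}


def encode_line(line):
    """Drop the leading species label and encode the sequence as digits."""
    label, sequence = line.split()  # label (e.g. "Hy") is discarded
    return "".join(CODES[base] for base in sequence)


# The first four rows of Table 10.1, copied from the post.
rows = [
    "Hy GCCCTCTTCCTAACACTCACAACAAAACTAACCAACACTAACATTACGGATGCCCAAGAA",
    "Pa GCCCTTTTCCTAACACTCACAACAAAACTAACTAATACTAGTATTTCAGACGCCCAGGAA",
    "Go GCCCTTTTCCTAACACTCACAACAAAGCTAACTAGCACCAACATCTCAGACGCCCAAGAA",
    "Ho GCCCTTTTCCTAACACTCACAACAAAACTAACTAATACTAACATCTCAGACGCTCAGGAA",
]

encoded = [encode_line(row) for row in rows]
for digits in encoded:
    print(digits)
```

Each printed line should be a 60-digit string with no label in front, one per species.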

If you read this file into R as it is, the display of "dapes" will be wrong.
This is because R reads only 4 items: each whole sequence is treated as a single item.
So you have to use some software to turn the file into a text file in which every digit is separated from the next by a tab.
You will then have a correct primates.txt, and dapes will display correctly.
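Any tool can do the tab separation. As one possibility, here is a small Python sketch, assuming the four digit strings from this post are already at hand; the names tab_separate, encoded and lines are made up for this illustration, and the output file name primates.txt follows the post.

```python
def tab_separate(digits):
    """Put a tab between every pair of neighbouring characters."""
    return "\t".join(digits)


# The four encoded sequences from the post.
encoded = [
    "322242442241121242121121111241122112124112144123314322211311",
    "322244442241121242121121111241124114124134144421312322213311",
    "322244442241121242121121113241124132122112142421312322211311",
    "322244442241121242121121111241124114124112142421312324213311",
]

lines = [tab_separate(row) for row in encoded]

# Write one tab-separated sequence per line.
with open("primates.txt", "w") as handle:
    handle.write("\n".join(lines) + "\n")
```

With this layout each row has 60 tab-separated fields, so a whitespace-splitting reader such as R's read.table should see 60 values per row rather than one long number.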
