Rosalind is a nice web site I recently discovered. Like 99 scala problems, Topcoder or similar sites it poses little programming challenges but with a special focus on bioinformatics.
The first and simplest programming challenges is to count the nucleotides within a DNA sequence (see here). For instance, given a sequence "AGCTTTTCATTCTGACTGC", what are the frequencies of the four nucleotides A,C,T,G?
It is a trivial problem, I used to solve with a dictionary but reading the problem description, which requires an ordered output of the frequencies, I realized that there is an even more elegant solution.
An efficient but ugly solution would be to count the individual nucleotide occurrences within a loop, e.g.
s = "AGCTTTTCATTCTGACTGC" A,C,T,G = 0,0,0,0 for e in s: if e=='A': A +=1 if e=='C': C +=1 if e=='G': T +=1 if e=='T': G +=1 print A,C,T,G
The repetition of the four ifs is not nice and could be replaced as follows
s = "AGCTTTTCATTCTGACTGC" idx = {'A':0, 'C':1, 'G':2, 'T':3} counts = [0]*4 for e in s: counts[idx[e]] += 1 print counts
However, it remains a rather ugly solution and I find the following line much more elegant
print map(s.count, "ACTG")
The good news is, it is even pretty efficient. The sequence gets parsed 4 times and the complexity is therefore O(4n) = O(n), which is not worse than the complexity of the dictionary approach:
counts = dict() for e in s: counts[e] = counts.get(e,0)+1 print counts
print map(s.count, "ACTG")
Just for fun and comparison the same in Scala. First the dictionary approach
def counts = s.foldLeft(Map[Char,Int]() withDefaultValue 0){ (m,c) => m + ((c,m(c)+1)) } println(counts)
which is rather convoluted. Less efficient but much more readable is the version that uses "groupBy":
val s = "AGCTTTTCATTCTGACTGC" def counts = s.groupBy(identity).mapValues(_.size) println(counts)
and mapping the count function onto the nucleotides again leads to the most elegant code
val s = "AGCTTTTCATTCTGACTGC" val counts = "ACTG".map(ch => s.count(_==ch)) println(counts)
A slightly different version I saw on Rosalind
val s = "AGCTTTTCATTCTGACTGC" println( "ACGT" map (s count _.==) mkString " " )
No comments:
Post a Comment