Saturday, February 9, 2013

Toy DNA Analysis Part II

After the first activity and recess break, I posed the challenge of
finding out which animal is most closely related to the hippopotamus.
I gave them short (400-base) DNA sequences of various mammals and
asked them to figure out which was most similar to the hippo's.
Again, I just made up these short sequences to keep it simple.  In
real life, there are a lot more complications, such as varying numbers
of chromosomes, billions of base pairs, etc.

The bottom line is that they had to take an animal, look at each
position in the sequence, ask whether that animal's DNA at that
position was different from the hippo's, and total up the differences.
After repeating this for several animals, they would see which one had
the fewest differences.  After the near-chaos of the morning's first
activity, I thought it would be a good idea to go over the big picture
and assemble some pseudocode on the board based on their ideas:

get hippo dna
for each animal other than hippo:
    get this animal's dna
    compare this animal's dna to hippo dna
    print number of differences

I emphasized basic ideas, such as what belongs inside the loop and what belongs outside it. A program like this does the job:
 
# read the hippo's sequence once, outside the loop
hippodna = open('hippo.txt').read()
nletters = len(hippodna)

for filename in ('cat.txt','dog.txt','rhino.txt','bluewhale.txt','rockhyrax.txt','rhesusmonkey.txt'):
    dna = open(filename).read()
    # start a fresh count for each animal
    differences = 0
    position = 0
    for letters in hippodna:
        if hippodna[position] != dna[position]:
            differences += 1
        # move to the next position whether or not the letters matched
        position += 1
    print filename,'has',differences,'differences with hippo dna'

Yes, Python experts, I know there are more efficient ways to do it, but this seems most straightforward for a kid. I won't go into the detail I went into for the morning's first activity, but it was a similarly intense back-and-forth, with students running into an obstacle every 30 seconds, which I tried to turn into a learning experience. They again needed emphasis on proper indenting and punctuation, and on remembering that if they named a variable 'difference' early on then they couldn't refer to it as 'differences' later on. They again wanted to do one animal at a time rather than write a loop. But it was a really good learning experience.
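For the curious, here is one possible sketch of the kind of more compact approach alluded to above. It is not what the students wrote. The function name count_differences and the ten-letter sequences are made up for illustration (the class used 400-base sequences read from files), and it uses the Python 3 print function rather than the print statement in the program above.

```python
def count_differences(seq_a, seq_b):
    """Count the positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length")
    # zip pairs up the letters position by position,
    # so no separate position counter is needed
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


# Made-up toy sequences, for illustration only
hippo = "GATTACAGGC"
sequences = {
    "cat": "GATTTCAGCC",
    "bluewhale": "GATTACAGGA",
}

for name, dna in sequences.items():
    print(name, "has", count_differences(hippo, dna), "differences with hippo dna")
```

The explicit length check is a design choice: zip would otherwise silently stop at the end of the shorter sequence, which could hide a truncated file.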

This took close to an hour. At the end, we discussed how they might modify the program to see what's most closely related to some other mammal, and then a third mammal, etc., and build up a picture of how everything is related to everything else: a family tree. That was the emphasis of the morning's third activity.
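The modification discussed above could be sketched roughly as follows: compare every animal against every other animal and record each one's nearest neighbor. This is only a first step toward a family tree, not a tree-building method, and the function names and short sequences are hypothetical stand-ins rather than the data used in class. It is written for Python 3.

```python
def count_differences(seq_a, seq_b):
    """Count the positions at which two equal-length sequences differ."""
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def closest_relatives(sequences):
    """Map each animal to the other animal with the fewest differences."""
    closest = {}
    for name, dna in sequences.items():
        # (difference count, other animal) for every animal except this one
        others = [
            (count_differences(dna, other_dna), other)
            for other, other_dna in sequences.items()
            if other != name
        ]
        # min picks the smallest count; ties go to the alphabetically first name
        closest[name] = min(others)[1]
    return closest


# Made-up toy sequences, for illustration only
sequences = {
    "hippo": "GATTACAGGC",
    "bluewhale": "GATTACAGGA",
    "cat": "CATTTCTGGC",
    "dog": "CATTTCTGGA",
}

for animal, relative in closest_relatives(sequences).items():
    print(animal, "is most similar to", relative)
```

With these invented sequences the animals pair off into two groups, which is the sort of pattern a family tree would then organize.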
