In groups or on your own, draw an outline of the computational process used in the 4th domain paper. While doing so, consider: what data did they start with, how did they analyze it, where were parameters chosen, how were controls performed, etc.?
If you can use Jot! to draw, please do so.
Skim the “Instruction set reference” and “example logic gene”, here,
and the more detailed instructions, here,
and write AND and NOR from Table I of
by hand. I’ve transcribed them here: AND is
A and B
(i.e. ‘true’ if both ‘a’ and ‘b’ are true), while NOR is
(NOT A) and (NOT B)
(i.e. ‘true’ if neither ‘a’ nor ‘b’ is true).
A few hints. First,
IO
nop-C
will get a new input value (think true/false) and put it in the CX register. If you change nop-C to a nop-A, then the value will go into the AX register; by default, or with a nop-B, the value will go into the BX register. Any time you see a ?BX? in an instruction description, it means the output destination can be modified by following the instruction with a nop-A or a nop-C, as with IO.
Second, (A NAND A) is equivalent to (NOT A) – see the truth table in http://en.wikipedia.org/wiki/Logical_NAND. Similarly, NOT (A NAND B) is equivalent to (A AND B) (see the same truth table :).
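If you want to convince yourself of these two identities before writing any Avida code, here is a minimal Python sketch that checks them over the whole truth table. It treats inputs as single true/false values, as suggested above; the function names (nand, not_, and_, nor) are made up for this illustration and are not Avida instructions.

```python
from itertools import product


def nand(a, b):
    """NAND of two booleans: false only when both inputs are true."""
    return not (a and b)


def not_(a):
    # (A NAND A) is equivalent to (NOT A)
    return nand(a, a)


def and_(a, b):
    # NOT (A NAND B) is equivalent to (A AND B)
    return not_(nand(a, b))


def nor(a, b):
    # (NOT A) and (NOT B), built from NAND only
    return and_(not_(a), not_(b))


# Check every row of the truth table against Python's built-in operators.
for a, b in product([False, True], repeat=2):
    assert not_(a) == (not a)
    assert and_(a, b) == (a and b)
    assert nor(a, b) == (not (a or b))
    print(a, b, "AND:", and_(a, b), "NOR:", nor(a, b))
```

The point is only that everything here is composed from nand, which mirrors how you would chain NAND operations when writing the two functions by hand.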
Third, you can use ‘push’ and ‘pop’ to copy values from one register to another. For example,
push
pop
nop-C
copies from BX into CX. And
push
nop-C
pop
copies from CX into BX.
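To see why these two sequences copy a value, it can help to play them out on a toy model. The sketch below is an assumption-laden simplification, not Avida's implementation: ToyCPU is a hypothetical class with three registers and one stack, and the register argument stands in for a trailing nop-A or nop-C (with BX as the default).

```python
class ToyCPU:
    """A toy model of three registers and one stack (not Avida itself)."""

    def __init__(self, ax=0, bx=0, cx=0):
        self.regs = {"AX": ax, "BX": bx, "CX": cx}
        self.stack = []

    def push(self, reg="BX"):
        # push copies the register's value onto the stack;
        # the register keeps its value.
        self.stack.append(self.regs[reg])

    def pop(self, reg="BX"):
        # pop removes the top of the stack and stores it in the register.
        self.regs[reg] = self.stack.pop()


# push / pop / nop-C: copy BX into CX
cpu = ToyCPU(bx=1, cx=0)
cpu.push()        # push   (default register, BX)
cpu.pop("CX")     # pop followed by nop-C
print(cpu.regs)   # {'AX': 0, 'BX': 1, 'CX': 1}

# push / nop-C / pop: copy CX into BX
cpu2 = ToyCPU(bx=0, cx=1)
cpu2.push("CX")   # push followed by nop-C
cpu2.pop()        # pop    (default register, BX)
print(cpu2.regs)  # {'AX': 0, 'BX': 1, 'CX': 1}
```

In both cases the source register is unchanged, because push copies the value rather than moving it.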
In class we’ll also look at the default starting organism,
and an evolved guy with high fitness:
Consider an Avida environment with just two digital organisms, Art and Dakota.
In small groups, discuss how Avida handles updates, reproduction, mutation, and fitness calculations. Draw a diagram representing some aspect of this process & we’ll discuss it in class.
Using Jot!, sketch box-and-arrow diagrams of “program flow” (interpret what that means as you wish...) for the following four programs:
Can you change just the code in ‘fn’ to make the tests in these files pass? (The same ‘fn’ will work for all three files; the tests are just added incrementally.)
And can you change just the code in ‘fn2’ to make the tests in these files pass?
Take a look at this program from last week:
and now consider these modifications at the bottom:
Consider:
- do these tests (the ‘assert’ statements, and code setting them up) seem reasonable, given hw 5 and 6? (see: Tutorial 5)
- which of the programs from last week passes these tests? which of the programs are “wrong” and which of them are “right”?
- what are some additional tests that you might consider adding?
- are tests like these a reasonable way to “specify” the required results in homework assignments? why or why not?
- what is the cost of having these tests in the source code?
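As a concrete reference point for that discussion, here is a small sketch of what "tests as specification" can look like. The function gc_content is a hypothetical example invented for this illustration (it is not taken from the homework), and the code assumes Python 3 division.

```python
def gc_content(seq):
    """Return the fraction of G and C characters in seq (hypothetical example)."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


# Tests at the bottom of the file act as an executable specification.
assert gc_content("GGCC") == 1.0
assert gc_content("ATAT") == 0.0
assert gc_content("ATGC") == 0.5
assert gc_content("atgc") == 0.5   # lowercase input is accepted
assert gc_content("") == 0.0       # edge case: empty sequence
print("all tests passed")
```

Note what the asserts pin down that a prose description might leave open (case handling, the empty string), and also that they run every time the file is executed or imported, which is one of the costs to weigh.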
At the beginning of class, score the following five programs based on (a) concision, (b) ease of understanding, (c) correctness, and (d) efficiency. (If you could put your scores up on a Jot! board, that’d be extra super neat.)
https://github.com/ctb/edda/blob/master/doc/beacon-2011/week9/hw-a.py
https://github.com/ctb/edda/blob/master/doc/beacon-2011/week9/hw-b.py
https://github.com/ctb/edda/blob/master/doc/beacon-2011/week9/hw-c.py
https://github.com/ctb/edda/blob/master/doc/beacon-2011/week9/hw-d.py
https://github.com/ctb/edda/blob/master/doc/beacon-2011/week9/hw-e.py
At the beginning of class, draw out (using Jot! in local OR distance groups, as you want) the workflow or pipeline for the simulation done by Drummond and Wilke, here:
That is, draw boxes with each significant set of actions taken, and arrows between them; hint, there should be at least two boxes :). For each box/arrow, consider:
- how would you test that this aspect of the workflow works correctly?
- is this aspect of the workflow a proper analogy (in the Dakotan/Pennock sense) for the biological process under study? How is it or is it not?
Also consider: is it sufficient for the correctness of the final result that each step in the pipeline leading to the final result be correct? Is it necessary?
At the start of class, working alone or in groups, examine these code snippets and figure out what they will print out:
https://raw.github.com/ctb/edda/master/doc/beacon-2011/week8/d-4.py
https://raw.github.com/ctb/edda/master/doc/beacon-2011/week8/d-5.py
https://raw.github.com/ctb/edda/master/doc/beacon-2011/week8/d-6.py
https://raw.github.com/ctb/edda/master/doc/beacon-2011/week8/t-1.py
https://raw.github.com/ctb/edda/master/doc/beacon-2011/week8/t-2.py
https://raw.github.com/ctb/edda/master/doc/beacon-2011/week8/t-3.py
https://raw.github.com/ctb/edda/master/doc/beacon-2011/week8/f-1.py
https://raw.github.com/ctb/edda/master/doc/beacon-2011/week8/f-2.py
https://raw.github.com/ctb/edda/master/doc/beacon-2011/week8/f-3.py
https://raw.github.com/ctb/edda/master/doc/beacon-2011/week8/f-4.py
At the start of class: working alone or in local groups, examine the following code examples and (without running them!) predict what they will print out, and why.
https://raw.github.com/ctb/edda/master/doc/beacon-2011/week8/l-1.py
https://raw.github.com/ctb/edda/master/doc/beacon-2011/week8/l-2.py
https://raw.github.com/ctb/edda/master/doc/beacon-2011/week8/l-3.py
https://raw.github.com/ctb/edda/master/doc/beacon-2011/week8/l-4.py
https://raw.github.com/ctb/edda/master/doc/beacon-2011/week8/l-5.py
https://raw.github.com/ctb/edda/master/doc/beacon-2011/week8/l-6.py
https://raw.github.com/ctb/edda/master/doc/beacon-2011/week8/l-7.py
https://raw.github.com/ctb/edda/master/doc/beacon-2011/week8/l-8.py
https://raw.github.com/ctb/edda/master/doc/beacon-2011/week8/l-9.py
https://raw.github.com/ctb/edda/master/doc/beacon-2011/week8/d-1.py
https://raw.github.com/ctb/edda/master/doc/beacon-2011/week8/d-2.py
https://raw.github.com/ctb/edda/master/doc/beacon-2011/week8/d-3.py
For the first 15 minutes, discuss:
What are the evolutionary and health implications of Neandertal DNA in modern, non-African humans?
How did the investigators come to the conclusion that as much as 7.4% of Melanesian genomes are derived from Neandertals/Denisovans?
Presentations for today:
At the start of class: working alone or in local groups, examine the following code examples and (without running them!) predict what they will print out, and why.
https://raw.github.com/ctb/edda/master/doc/beacon-2011/week7/function-1.py
https://raw.github.com/ctb/edda/master/doc/beacon-2011/week7/function-2.py
https://raw.github.com/ctb/edda/master/doc/beacon-2011/week7/function-3.py
https://raw.github.com/ctb/edda/master/doc/beacon-2011/week7/function-4.py
https://raw.github.com/ctb/edda/master/doc/beacon-2011/week7/function-5.py
https://raw.github.com/ctb/edda/master/doc/beacon-2011/week7/scope-1.py
https://raw.github.com/ctb/edda/master/doc/beacon-2011/week7/scope-2.py
https://raw.github.com/ctb/edda/master/doc/beacon-2011/week7/scope-3.py
https://raw.github.com/ctb/edda/master/doc/beacon-2011/week7/scope-4.py
https://raw.github.com/ctb/edda/master/doc/beacon-2011/week7/scope-5.py
https://raw.github.com/ctb/edda/master/doc/beacon-2011/week7/scope-6.py
BONUS: If you want to be thoroughly confused, read this discussion of scoping rules – e.g. see the first answer.
Later in the class, we’ll also take a look at this little assembler:
At the start of class: working in your usual groups, ACROSS INSTITUTIONS, check out the list of sentences submitted, here:
Divide them into categories of easy- vs hard-to-assemble, and (based on your expertise in sequence assembly) further categorize the “hard” sentences around the reason they’d be hard to assemble. Think especially about:
- short reads
- single- vs paired-end reads / short-insert vs long insert
- reverse/forward matching sequences (which we didn’t discuss in class...)
- error prone sequencing
Please put at least one of each category up on a Jot! board specific to your group (your categories should encompass all the sentences, but you can use an exemplar of each sequence). If you disagree on anything, note that and ask.
At the start of class:
Working in groups or individually, please come up with a list of items you do not like about the course so far, and what (if anything) should be changed for each item. You might touch on some or all of these –
- programming tutorials
- programming HW
- programming support
- programming handins
- in-class group discussions
- in-class “global” discussions
- in-class presentations
- paper topics and discussions of papers
- remote interaction
- groups and group structure
and anything else miscellaneous you might have to say.
Please try to be inclusive, that is, synthesize everything the group has to say rather than eliminating concerns that aren’t universally agreed upon.
Choose one person at each location to be a scribe, write them down, and then paste them into this link. Anonymous and individual comments and suggestions are also welcome. You can submit during class, or after class – basically any time before Wed evening would be good.
Then, let’s play... THE ASSEMBLY GAME!
Each group should have a pile of papers, representing different sets of short-read sequencing data resulting from feeding the same set of English quotes into a woodchipper. The papers contain reads randomly chosen from the quotes; some are in paired-end form, where the ‘...’ represents missing text of unknown (but generally consistent...) length.
Your goal(s)? Using only pen(cil) and paper – no Google! –
- Identify the total number of source quotes.
- Determine the true source text of all the quotes.
This is known as the assembly problem in sequence analysis.
Also, develop (and be prepared to describe) a workflow for doing this in general (i.e. suppose you were given this every day for a week, how would you proceed?) Feel free to draw...
Issues to consider: how does adding more people to local groups help or hurt? How does adding more people to remote groups help or hurt? What kind of data preparation steps would be a good investment, if you were given the data in a Python script and could manipulate it at will? (Think: sorting? trimming off the end? removing?)
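To make the data preparation question concrete, here is a minimal Python sketch of the kind of steps you might try. It is one possible approach under simple assumptions (error-free reads, exact-match overlaps), not a real assembler; the function names prepare, overlap and merge and the example reads are invented for illustration.

```python
def prepare(reads):
    """Normalize whitespace, drop exact duplicates, and sort the reads."""
    cleaned = {" ".join(r.split()) for r in reads}
    return sorted(cleaned)


def overlap(left, right, min_len=3):
    """Length of the longest suffix of `left` that is a prefix of `right`."""
    for n in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:n]):
            return n
    return 0


def merge(left, right, min_len=3):
    """Join two reads on their overlap, or return None if they do not overlap."""
    n = overlap(left, right, min_len)
    return left + right[n:] if n else None


reads = [
    "the quick brown",
    "brown fox jumps",
    "the quick  brown",   # duplicate with extra whitespace
    "fox jumps over",
]
unique = prepare(reads)
print(unique)
print(merge("the quick brown", "brown fox jumps"))
```

Sorting and deduplication are cheap and make the pile easier to scan by eye; the min_len threshold is a guess that you would tune, since very short overlaps match by chance.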
Final question – each of the data sets (collections of paper) you got has different properties. Some of them are marked – others aren’t. What are the different properties that aren’t marked?
In-class questions to answer (in groups):
1. What did the simulation add to the Cell paper that you all read? (Drummond and Wilke)
2. What evidence was provided that the mechanism of selection against misfolding is cytotoxicity? Can you think of alternative mechanisms?
Answers to reading question 1: http://lyorn.idyll.org/~t/transfer/cse891/hw2-reading.html
Presentations:
HW answers for discussion:
Question 1 (reasons for sequencing human genome) https://docs.google.com/document/d/11jcO1eULx-rChK23yjfj1QzlDY0SZsFBhZDloa3DGoQ/edit?hl=en_US
Question 2 (novelty/coolness of human genome) https://docs.google.com/document/d/11y7TM171vcyURBHrRwo42VZRB07dlpEKHRSL5VrKuHY/edit?hl=en_US
Question 3 (less than 30k genes?) https://docs.google.com/document/d/1yslsJalu2aVjitm9ZH9o8MHHPgKsNHB_m_dW2hzvDyg/edit?hl=en_US