Algebra, statistics and computational biology
Graduate course
Dept. of Mathematics
Spring 2006
Anders Nedergaard Jensen and Niels Lauritzen
Link to the official course home page.
Mathematically, the human genome can be viewed as a string of
around 3 billion letters from the alphabet A, C, G and T, denoting
the base pairs in a DNA molecule. Around 5% of the genome
represents 25,000 actual genes, i.e. functional words coding for
proteins in the human body. In
analyzing genes and similarities between DNA sequences, one usually
resorts to statistical models for the joint distribution of discrete
random variables, such as hidden Markov models. Surprisingly, each of the
statistical models used can be seen as the solution set of a system
of highly structured polynomial equations (an algebraic variety).
This framework has been named algebraic statistics. The inference
algorithms in computational biology fall under the heading of the
new field of tropical algebraic geometry, where the usual operations
+ and ⋅ are replaced by minimum and +, respectively.
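As a minimal sketch of what replacing (+, ⋅) by (minimum, +) means, the Python snippet below evaluates the same expression over a small hidden Markov model in both arithmetics. The two-state model and all of its numbers are hypothetical, chosen only for illustration, and hidden paths are enumerated by brute force rather than by the dynamic programming one would use in practice.

```python
import itertools
import math

# Hypothetical two-state hidden Markov model; every number here is an
# illustrative assumption, not taken from the course material.
STATES = (0, 1)
INITIAL = (0.6, 0.4)                   # INITIAL[state]
TRANSITION = ((0.7, 0.3), (0.4, 0.6))  # TRANSITION[previous][current]
EMISSION = ((0.9, 0.1), (0.2, 0.8))    # EMISSION[state][symbol]


def path_probability(path, observed):
    """Joint probability of one hidden path and the observed symbols."""
    p = INITIAL[path[0]] * EMISSION[path[0]][observed[0]]
    for prev, cur, sym in zip(path, path[1:], observed[1:]):
        p *= TRANSITION[prev][cur] * EMISSION[cur][sym]
    return p


def marginal_probability(observed):
    """Ordinary arithmetic: a sum over hidden paths of products of parameters."""
    paths = itertools.product(STATES, repeat=len(observed))
    return sum(path_probability(path, observed) for path in paths)


def tropical_value(observed):
    """Tropical arithmetic: each parameter u becomes -log(u), every product
    becomes a sum and the sum over paths becomes a minimum."""
    best = math.inf
    for path in itertools.product(STATES, repeat=len(observed)):
        cost = -math.log(INITIAL[path[0]]) - math.log(EMISSION[path[0]][observed[0]])
        for prev, cur, sym in zip(path, path[1:], observed[1:]):
            cost += -math.log(TRANSITION[prev][cur]) - math.log(EMISSION[cur][sym])
        best = min(best, cost)
    return best


observed = (0, 1, 1)
print(marginal_probability(observed))       # probability of the observation
print(math.exp(-tropical_value(observed)))  # probability of the most likely hidden path
```

The ordinary evaluation gives the probability of the observed sequence, while the tropical evaluation gives the negative logarithm of the probability of the single most likely hidden path, which is the quantity computed by the Viterbi algorithm.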
Following the book Algebraic Statistics for Computational Biology,
we will go through statistical models in computational biology
in the context of algebra and polyhedral geometry. The book is
based on a highly successful seminar series at Berkeley in
2004-2005 organized by Pachter and Sturmfels. Pachter, Sturmfels and
Sullivant are organizing a workshop based on the book
at the Sophus Lie Conference Center in Nordfjordeid, Norway, in June 2006.
Since this is a math course related to biology, we plan a field trip to
Nordfjordeid in June.
Prerequisites
Algebra and basic statistics.