There's been a lot going on here over the last two months. First, Guillaume, a new member of our software team, started work here at the beginning of March. Second, we've been working heavily on the implementation and integration of the new case-management model (mentioned in the Feb update; it models how humans seek treatment when they have a simulated sickness event). This includes a considerable redesign of how we input data into it, to make it easier for the project scientists to work with. Then there have been a lot of smaller fixes, tweaks, and the like; for example, Guillaume has recently added the ability to "import" infections into our simulations (basically, when we try to model the elimination of malaria via various interventions, we need to see what kind of follow-up precautions are required to prevent malaria from reestablishing itself were an infected person to move into the area, or some such).
One thing we thought we'd mention is performance optimizations, that is, changes to the program that make it run faster without affecting its behaviour. I (Diggory) and our new developer, Guillaume, have found a few optimizations that made sense, given how much time profiling (performance analysis) showed the program spending in a few small regions of code. One of these, for example, was looking up age groups (skip to the next paragraph if you find it too technical…): every human, as represented by our model, has an age. When we report data (say, the number of sicknesses caused by malaria, or just the total number of humans) we don't want to entirely ignore the age in the output, and at the same time reporting the exact age of each human each time they're mentioned would just be too much data (we often run scenarios with a population of 100,000 humans). As a middle ground, we define a number of age groups and report the number of cases, humans, or whatever, divided into each age group (twenty-five under 1s, fifteen 1-2s, ..., twenty-six 60-99s). To get back to the optimization: every time we report something, we need to know which age group the human is in. This was done as a linear-time operation: is the human in the first age group? If not, are they in the second? And so on. Those of you who've taken courses on computer data structures should know a good answer to this one: use a binary-search approach to get the answer in logarithmic time (<=12? no -> <=20? yes -> <=14? yes -> <=13? no -> you're 14 years old) (computer scientists: you can't use a hash-map because your key is a floating-point number). But I realized it's possible to do this even faster: by remembering each human's age group from the previous time-step, you can just ask: "are they in the same age group or are they now in the next one?", which is constant time.
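For the technically curious, here is a rough sketch of the three lookup strategies described above. It is written in Python purely for illustration and is not the simulator's actual code; the group boundaries, the names (UPPER_BOUNDS, group_linear, group_binary, Human) and the five-day step in the usage note are all made up for this example.

```python
from bisect import bisect_right

# Hypothetical upper bounds (in years) of each age group, sorted ascending.
# Group i holds ages below UPPER_BOUNDS[i] and at or above the previous bound.
UPPER_BOUNDS = [1.0, 2.0, 5.0, 10.0, 15.0, 20.0, 40.0, 60.0, 99.0]


def group_linear(age):
    """Linear time: test each group in turn until one fits."""
    for index, upper in enumerate(UPPER_BOUNDS):
        if age < upper:
            return index
    return len(UPPER_BOUNDS)  # older than the last bound


def group_binary(age):
    """Logarithmic time: binary search for the first bound above the age."""
    return bisect_right(UPPER_BOUNDS, age)


class Human:
    """Remembers its age group between time-steps (illustrative only)."""

    def __init__(self, age):
        self.age = age
        self.group = group_binary(age)  # one full lookup at creation

    def advance(self, step_years):
        """Age the human by one time-step and return the current group."""
        self.age += step_years
        # Humans only get older, so the group either stays the same or
        # moves up. With a time-step shorter than the narrowest group,
        # this loop body runs at most once per call.
        while (self.group < len(UPPER_BOUNDS)
               and self.age >= UPPER_BOUNDS[self.group]):
            self.group += 1
        return self.group
```

For example, a Human created at age 0.9 starts in group 0, and calling advance(5 / 365) repeatedly moves it into group 1 once its age passes 1.0, without ever searching the whole list again.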
The age-group change and one other optimization together reduced execution time by about 10%. While I think we've dealt with the most obvious optimization candidates, anyone interested in profiling the code themselves, and possibly contributing a patch, is welcome to take a look at the code on Google Code (svn) or GitHub (a git clone, and thus not always 100% up-to-date, but it also hosts some branches not on svn).
I realize there have been some questions here about timelines. No, we're not going to give ourselves any fixed deadlines to stick to. It would be nice though if our new models are ready (coded up, with required input data) to run some testing scenarios by the end of this month (this will likely just be a few tests, rather than jumping straight into full-scale parameter fitting).
In the meantime, we are preparing another simulation run that we would like to start before the end of the month. The study will be an extension of some of the earlier work published here:
Tediosi F, Maire N, Penny M, Studer A, Smith T. Simulation of the cost-effectiveness of malaria vaccines. Malaria Journal 2009, 8:127
As the experimental design is very similar to this previous study, we plan to re-use some of the old software for creating the workunits, and as a consequence also an earlier implementation of the simulation model. We will therefore temporarily re-deploy an older version of the science application as the main application for this relatively small study (several tens of thousands of workunits).
Diggory and Nicolas
(Aside: does anyone know why this forum eats and double-encodes unicode sequences? For example an m-dash: — )