Status update -- June 2010

GGnaegi
Volunteer moderator
Joined: Mar 4 10
Posts: 98
Credit: 31,313
RAC: 159

Update June 2010

In May we started testing our new science application, which includes several new features. After fixing some problems with software library dependencies (discussed here) and testing new binaries, we were able to reduce the failure rate to an acceptable level: more than 90% of the workunits now run successfully on Mac and Windows. This week we deployed one more version of the application (6.41). It includes some minor changes and improvements and a new model for case management.

More importantly, we have now prepared the input data files that we will need to launch the next phase of model optimization (see previous update). These new workunits differ slightly from previous ones in their system requirements: the simulations demand more memory, and the checkpoints will therefore also use more disk space. We are aware that this may lead to problems on some older hardware, but there is currently no way to change this: checkpointing only works if we can save all of the relevant values to disk.
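
As a rough illustration of why checkpoint size tracks memory use, here is a minimal Python sketch of checkpointing. It is not the project's actual code: the state dictionary, the function names and the file layout are all hypothetical. The point it shows is that the whole in-memory state has to be written out so that a run can resume from it.

```python
import os
import pickle
import tempfile


def save_checkpoint(state, path):
    """Write the full simulation state to disk (hypothetical layout).

    The data is written to a temporary file first and then renamed, so an
    interrupted write does not leave a truncated checkpoint behind.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Read back a state previously written by save_checkpoint."""
    with open(path, "rb") as f:
        return pickle.load(f)


def run(total_steps, path, checkpoint_every=100):
    """Toy simulation loop that resumes from a checkpoint if one exists."""
    if os.path.exists(path):
        state = load_checkpoint(path)
    else:
        # Assumed toy state: a step counter plus one value per individual.
        state = {"step": 0, "population": [0.0] * 1000}
    while state["step"] < total_steps:
        state["population"] = [x + 1.0 for x in state["population"]]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(state, path)
    return state
```

In this sketch the checkpoint file grows with the size of the state held in memory, which is the relationship described above.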

So far, the results we get back for the new workunits look OK, and we plan to deploy this as our new main science application next week, at least for the Windows and Mac versions. We are aware that there are still a few issues with the validation of the results. For a small proportion of results, the outcomes differ even though the workunit ran successfully and an output file was uploaded. We are investigating this, but so far we have had difficulty reproducing these problems. Whenever we re-run such a workunit on our computers here, we get the same outcome as the "valid" result on malariacontrol.net.

Finally, we would like to point you to another scientific article that was recently published, and that is based on some of the runs you helped us with during late summer 2009. The study investigates the effect of differences between human individuals on malaria epidemiology in endemic countries. As you know, individuals differ from one another, and this is also true in a malaria-endemic community. Some will not seek any treatment; others will go to the clinic when the first fever appears. This type of heterogeneity, and others such as heterogeneity in transmission and in the risk of co-morbidity, were introduced into the simulation to see how they affect patterns of illness and death.
Ross A. and Smith T. Interpreting malaria age-prevalence and incidence curves: a simulation study of the effects of different types of heterogeneity. Malaria Journal 2010, 9:132
____________
Guillaume Gnaegi
Swiss Tropical and Public Health Institute
http://www.swisstph.ch

RandyC
Joined: Jun 23 06
Posts: 2695
Credit: 850,101
RAC: 1,184

Getting significant errors in v6.41. See thread here.

Michael Karlinsky
Joined: Mar 7 06
Posts: 94
Credit: 501,336
RAC: 0

Hi,

thanks for the update. But one question remains: Linux?

Michael
____________
Team Linux Users Everywhere

hardy
Volunteer moderator
Project administrator
Project developer
Joined: Feb 18 09
Posts: 141
Credit: 49,002
RAC: 160

thanks for the update. But one question remains: Linux?

Michael

Hi Michael,

Linux installations can vary quite a bit with regard to installed library versions, and unfortunately we're finding it a bit difficult to make Linux binaries that work on all systems. In any case, Linux binaries are available for the openmalariaBeta application, and once we've managed to solve the incompatibility problems, we'll release Linux versions for the non-testing applications too.

Normand Plouffe
Joined: Nov 30 09
Posts: 1
Credit: 485,620
RAC: 318

One of the issues I have is that there is very little time for the WU to complete. I usually want to store 2 days' worth of WUs (in case the server is down for maintenance or the internet connection is down). When Malaria Control sees that I share my computer with any other BOINC project, it will dispatch Malaria Control in HIGH PRIORITY because the deadline to return the WU result is too short.

How about giving a little more time between dispatch and deadline? You may get more clients.

hardy
Volunteer moderator
Project administrator
Project developer
Joined: Feb 18 09
Posts: 141
Credit: 49,002
RAC: 160

One of the issues I have is that there is very little time for the WU to complete. I usually want to store 2 days' worth of WUs (in case the server is down for maintenance or the internet connection is down). When Malaria Control sees that I share my computer with any other BOINC project, it will dispatch Malaria Control in HIGH PRIORITY because the deadline to return the WU result is too short.

How about giving a little more time between dispatch and deadline? You may get more clients.


This question has come up a few times. Basically, when we use openmalaria for parameter fitting, we iteratively create workunits based on previously completed sets of workunits. Thus, if we allow workunits more time and sets take longer to complete, our server has to use older completed sets when creating new work, and the fitting becomes less efficient.

As far as I understand, however, the BOINC client should still attempt to balance the overall workload between projects according to your settings, although it may do a whole batch from one project before doing another batch from a different project.
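
To illustrate the high-priority behaviour raised in the question, here is a minimal Python sketch of a deadline check. It assumes a much-simplified model in which a task only ever receives its project's resource share of one CPU. The function name and parameters are hypothetical, and this is not the BOINC client's actual scheduling code, which looks at all queued tasks together.

```python
def needs_high_priority(remaining_cpu_hours, hours_to_deadline, share_fraction):
    """Return True if the task would miss its deadline when it only
    receives its project's share of the CPU (simplified assumption)."""
    if share_fraction <= 0:
        return True  # the task gets no CPU time at all under normal sharing
    wall_hours_needed = remaining_cpu_hours / share_fraction
    return wall_hours_needed > hours_to_deadline


# Hypothetical numbers: a 10-hour task with a 2-day (48 h) deadline.
print(needs_high_priority(10, 48, 0.50))  # needs 20 h of wall-clock time
print(needs_high_priority(10, 48, 0.10))  # needs 100 h of wall-clock time
```

In this simplified picture, a longer deadline or a larger resource share both make the check pass, which is the trade-off being discussed in this thread.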

Warped
Joined: Aug 1 10
Posts: 22
Credit: 175,737
RAC: 598

Thanks for the information. It's nice to know that our crunching is contributing to a worthy cause.

Can anyone tell us whether UCT : malariacontrol.net does anything worthwhile?

Warped.

maire
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: Nov 7 05
Posts: 438
Credit: 118,258
RAC: 0

Thanks for the information. It's nice to know that our crunching is contributing to a worthy cause.

Can anyone tell us whether UCT : malariacontrol.net does anything worthwhile?

Warped.


Yes and maybe:
Yes, they're getting ready to test a BOSSA (BOINC-extension for volunteer thinking) project called africaMap. This should be a worthy project, and for this they have the server up and running. In addition, there's a possibility that there will be another BOINC project run out of UCT (currently in early planning).
Maybe, I'm not sure the workunits they're currently sending out are of any use. I asked their project admin to switch workunit creation off, should this not be the case.

Nick

____________
Nicolas Maire
Swiss Tropical and Public Health Institute
http://www.swisstph.ch

Warped
Joined: Aug 1 10
Posts: 22
Credit: 175,737
RAC: 598

Thanks for the prompt response, Nick.

It seems they have taken your request to heart. Not only did they switch workunit creation off, but it now seems that the entire site has been switched off.

The AfricaMap project was mentioned in March and nothing seems to have come of it. Hopefully this will be launched soon. In the meantime, I'll divert my malaria crunching to this project.

Thanks again.
Warped.

RedMenace
Joined: Apr 28 08
Posts: 1
Credit: 60,157
RAC: 0

I don't suppose you can get them to turn the servers back on for a bit? I have several dozen dangling wu's O_0

Oh well.

zombie67 [MM]
Joined: Jan 4 07
Posts: 63
Credit: 1,006,390
RAC: 0

Yes, please turn the project back on long enough for us to report the work in progress.

Even if it is not useful, the work has been done by the volunteers, and they need to get paid for it. Also, we need some way to clear out the completed tasks in our queue.

See more here:

https://malariacontrol.net/forum_thread.php?id=1023
____________
Dublin, CA
Team SETI.USA

John Neale
Joined: Feb 21 10
Posts: 83
Credit: 88,737
RAC: 37

Thanks for the prompt response, Nick.

It seems they have taken your request to heart. Not only did they switch workunit creation off, but it now seems that the entire site has been switched off.

The AfricaMap project was mentioned in March and nothing seems to have come of it. Hopefully this will be launched soon. In the meantime, I'll divert my malaria crunching to this project.

Thanks again.
Warped.


I agree with the sentiments expressed by others: that it would (at least) be polite if the UCT : malariacontrol.net admins could switch their site back on, with work creation switched off, to allow users to return work that they have completed. It would also allow users to set their resource share on that project to 0.

Nick, you seem to have some influence with the UCT Computer Science Department ... any chance you could use it one more time?

maire
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: Nov 7 05
Posts: 438
Credit: 118,258
RAC: 0

I'll check with them and see what I can do.
Nick

Thanks for the prompt response, Nick.

It seems they have taken your request to heart. Not only did they switch workunit creation off, but it now seems that the entire site has been switched off.

The AfricaMap project was mentioned in March and nothing seems to have come of it. Hopefully this will be launched soon. In the meantime, I'll divert my malaria crunching to this project.

Thanks again.
Warped.


I agree with the sentiments expressed by others: that it would (at least) be polite if the UCT : malariacontrol.net admins could switch their site back on, with work creation switched off, to allow users to return work that they have completed. It would also allow users to set their resource share on that project to 0.

Nick, you seem to have some influence with the UCT Computer Science Department ... any chance you could use it one more time?


____________
Nicolas Maire
Swiss Tropical and Public Health Institute
http://www.swisstph.ch

John Neale
Joined: Feb 21 10
Posts: 83
Credit: 88,737
RAC: 37

Thank you, Nick.

I'll check with them and see what I can do.
Nick

Thanks for the prompt response, Nick.

It seems they have taken your request to heart. Not only did they switch workunit creation off, but it now seems that the entire site has been switched off.

The AfricaMap project was mentioned in March and nothing seems to have come of it. Hopefully this will be launched soon. In the meantime, I'll divert my malaria crunching to this project.

Thanks again.
Warped.


I agree with the sentiments expressed by others: that it would (at least) be polite if the UCT : malariacontrol.net admins could switch their site back on, with work creation switched off, to allow users to return work that they have completed. It would also allow users to set their resource share on that project to 0.

Nick, you seem to have some influence with the UCT Computer Science Department ... any chance you could use it one more time?




zombie67 [MM]
Joined: Jan 4 07
Posts: 63
Credit: 1,006,390
RAC: 0

I'll check with them and see what I can do.
Nick


Any progress? We are all still sitting on a bunch of tasks that need to be returned.

Thanks!
____________
Dublin, CA
Team SETI.USA

jcmb
Joined: Aug 3 08
Posts: 2
Credit: 26,018
RAC: 75

The problem with the behaviour of the Malaria Control server at the moment seems to be that it is delivering WUs assuming that it has 100% of the BOINC resources. The BOINC client works out that it needs to run those WUs at high priority to make the deadline.

This effectively overrides my priority settings and starves the other projects, which is clearly not polite. At the moment I have all 4 cores running MC instead of being spread across the other projects, and I have to go in manually and suspend MC. If you just gave extra time to the WUs, it would be fine.

I think that it is important that BOINC projects play fair with the other projects.

Krunchin-Keith [USA]
Volunteer moderator
Volunteer tester
Joined: Nov 10 05
Posts: 3047
Credit: 5,330,818
RAC: 4,054

The problem with the behaviour of the Malaria Control server at the moment seems to be that it is delivering WUs assuming that it has 100% of the BOINC resources. The BOINC client works out that it needs to run those WUs at high priority to make the deadline.

This effectively overrides my priority settings and starves the other projects, which is clearly not polite. At the moment I have all 4 cores running MC instead of being spread across the other projects, and I have to go in manually and suspend MC. If you just gave extra time to the WUs, it would be fine.

I think that it is important that BOINC projects play fair with the other projects.

What BOINC does is run projects that it thinks won't finish in time, based on activity. When you suspend a project, it throws its computations off: the other projects continue to count up or down, but the number for the suspended project does not change while it is suspended. As each project is run, it builds up a Long Term Debt, and in some cases also a Short Term Debt, especially when tasks are run in High Priority outside the normal time slice the project should be allowed. After a while BOINC will settle down, give more time back to the other projects, and slack off running the project it overran before, in this case MC. You just have to be patient and let it run. It does this over a longer term, like a week to a month or more, not hour to hour; that is how the long term debt is supposed to work.
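
The debt bookkeeping described above can be sketched roughly as follows. This is a simplified Python model, not the client's real accounting: it assumes that over each period a project's debt changes by the CPU time its resource share entitled it to, minus the time it actually used, and that debts are then shifted so they average zero. The function and variable names are hypothetical.

```python
def update_debts(debts, shares, cpu_used):
    """Return new per-project debts after one accounting period.

    debts    -- current debt per project, in CPU hours (positive = owed time)
    shares   -- resource share per project (any positive units)
    cpu_used -- CPU hours each project actually received this period
    """
    total_share = sum(shares.values())
    total_time = sum(cpu_used.values())
    updated = {}
    for project in debts:
        entitled = shares[project] / total_share * total_time
        updated[project] = debts[project] + entitled - cpu_used.get(project, 0.0)
    # Assumed normalisation: keep the average debt at zero.
    mean = sum(updated.values()) / len(updated)
    return {project: debt - mean for project, debt in updated.items()}


# Hypothetical period: equal shares, but MC ran for all 10 hours.
debts = update_debts(
    {"MC": 0.0, "SETI": 0.0},
    {"MC": 100, "SETI": 100},
    {"MC": 10.0, "SETI": 0.0},
)
print(debts)
```

In this model the project that overran ends the period with a negative debt and the one that waited with a positive debt, so a scheduler that favours the largest debt would give the other projects their time back later on.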

mikey
Joined: Mar 23 07
Posts: 4120
Credit: 5,299,212
RAC: 1,698

The problem with the behaviour of the Malaria Control server at the moment seems to be that it is delivering WUs assuming that it has 100% of the BOINC resources. The BOINC client works out that it needs to run those WUs at high priority to make the deadline.

This effectively overrides my priority settings and starves the other projects, which is clearly not polite. At the moment I have all 4 cores running MC instead of being spread across the other projects, and I have to go in manually and suspend MC. If you just gave extra time to the WUs, it would be fine.

I think that it is important that BOINC projects play fair with the other projects.

What BOINC does is run projects that it thinks won't finish in time, based on activity. When you suspend a project, it throws its computations off: the other projects continue to count up or down, but the number for the suspended project does not change while it is suspended. As each project is run, it builds up a Long Term Debt, and in some cases also a Short Term Debt, especially when tasks are run in High Priority outside the normal time slice the project should be allowed. After a while BOINC will settle down, give more time back to the other projects, and slack off running the project it overran before, in this case MC. You just have to be patient and let it run. It does this over a longer term, like a week to a month or more, not hour to hour; that is how the long term debt is supposed to work.


Also, what is your BOINC cache size set to?

maire
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: Nov 7 05
Posts: 438
Credit: 118,258
RAC: 0

I'll check with them and see what I can do.
Nick


Any progress? We are all still sitting on a bunch of tasks that need to be returned.

Thanks!


Will let you know when I hear back...
Nick
____________
Nicolas Maire
Swiss Tropical and Public Health Institute
http://www.swisstph.ch

maire
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: Nov 7 05
Posts: 438
Credit: 118,258
RAC: 0

I'll check with them and see what I can do.
Nick


Any progress? We are all still sitting on a bunch of tasks that need to be returned.

Thanks!


Will let you know when I hear back...
Nick



Sorry to say that there seems to be no one in Cape Town at the moment in a position to take the project back online. I understand it is likely that the server will come back when they are ready for production hosting. At that point it should be possible to return tasks. I don't know when this will happen, though.

Nick
____________
Nicolas Maire
Swiss Tropical and Public Health Institute
http://www.swisstph.ch

zombie67 [MM]
Joined: Jan 4 07
Posts: 63
Credit: 1,006,390
RAC: 0

Okay. Thanks for the update.
____________
Dublin, CA
Team SETI.USA

jcmb
Joined: Aug 3 08
Posts: 2
Credit: 26,018
RAC: 75

While I like your theory, it is not what Malariacontrol is doing. I told BOINC not to accept new tasks from Malariacontrol but to finish the ones it had.

It did this. I then left it for a week without any MC work, and it was happy running SETI and Climateprediction. I then allowed it to get data from MC and was given 3 units that came down and are running at high priority. The machine has not been suspended since the download happened.

BOINC is set to use 10 GB of disk and all of the processors. It had been limited to 30% of the processors, but for this test I have it using all of them (and it had been running in that state for 12 hours before bringing the MC data down).

Why can't we just add a day or two to the WUs that get sent out?

mikey
Joined: Mar 23 07
Posts: 4120
Credit: 5,299,212
RAC: 1,698

While I like your theory, it is not what Malariacontrol is doing. I told BOINC not to accept new tasks from Malariacontrol but to finish the ones it had.

It did this. I then left it for a week without any MC work, and it was happy running SETI and Climateprediction. I then allowed it to get data from MC and was given 3 units that came down and are running at high priority. The machine has not been suspended since the download happened.


As soon as you told BOINC not to accept any Malaria units, it went negative in the time it 'owed' Malaria for the overall crunching; it is now making it up.

BOINC is set to use 10 GB of disk and all of the processors. It had been limited to 30% of the processors, but for this test I have it using all of them (and it had been running in that state for 12 hours before bringing the MC data down).

Why can't we just add a day or two to the WUs that get sent out?


I can only repeat what has been said in the past: this project uses the returned data in a timely manner to adjust the actual treatments being given in the field. Extending the deadlines can mean more people die.

GGnaegi
Volunteer moderator
Joined: Mar 4 10
Posts: 98
Credit: 31,313
RAC: 159


... I then allowed it to get data from MC and I was given 3 units that came down so that they are running high priority. The machine has not been suspended since the download happened.

...

Why can't we just add a day or two to the WUs that get sent out?


We are currently fitting some parameters; that's why the deadline is so short.
For further information, please have a look at this post.

Thank you for your feedback
Guillaume
____________
Guillaume Gnaegi
Swiss Tropical and Public Health Institute
http://www.swisstph.ch

RvP_LaN
Joined: Apr 15 07
Posts: 2
Credit: 976,895
RAC: 101

Sorry to say that there seems to be no one in Cape Town at the moment in a position to take the project back online

Thanks, Nick, for that clarification.

But, in fact, there should still be an appeal to the university management, or executive research services.

Even if UCT-Malaria was only a test for Malaria, it's just not polite to switch off a BOINC server like this. I don't get why it wasn't possible for such a university to keep a "left-behind" server (not even a physical machine, maybe a virtual server) that could handle the last requests of BOINC volunteers. Today, it's not even possible to detach from UCT-Malaria!

When people give their time and computer resources to a project, the least courtesy is that they can finish their WUs, or detach properly from the project. When correct information is given, and a deadline for the final shutdown is announced, people can act accordingly. For UCT-Malaria, it looks like someone just pulled the plug one morning...

When you have dozens of machines stuck with UCT-Malaria, the only possibility left now is to clean up manually, one by one, all previously attached computers... Nice...

If UCT is not willing to do anything, do you think that people at MalariaControl could put a server online just to respond in place of UCT-Malaria, so we can detach and clean up our BOINC clients? (I understand this could be delicate, if only because of domain name ownership.)

John Neale
Joined: Feb 21 10
Posts: 83
Credit: 88,737
RAC: 37

When people give their time and computer resources to a project, the least courtesy is that they can finish their WUs, or detach properly from the project. When correct information is given, and a deadline for the final shutdown is announced, people can act accordingly. For UCT-Malaria, it looks like someone just pulled the plug one morning...


I do agree. The funny thing is, in July this year, someone did manage to switch work creation for this project on for a few weeks ... just before they pulled the plug.

RvP_LaN
Joined: Apr 15 07
Posts: 2
Credit: 976,895
RAC: 101

Hi,

someone did manage to switch work creation for this project on for a few weeks ... just before they pulled the plug.

I'm truly sorry to read this, because I obviously didn't follow the project news closely enough to terminate it properly... Too bad.

Still, I continue to think that, nowadays, setting up a virtual machine in a "corner" of the network, with just the scheduler service running, is a minimal cost for IT services, and it would have given us the ability to detach quietly from UCT-Malaria, with a longer period in which to do so.

Regards

John Neale
Joined: Feb 21 10
Posts: 83
Credit: 88,737
RAC: 37

I'm truly sorry to read this, because I obviously didn't follow the project news closely enough to terminate it properly... Too bad.


There was no project news to follow. All that happened was that work creation was switched on in July (after a long hiatus), and then in August (after Nicolas Maire contacted UCT) the plug was pulled, completely and without warning. One can follow this by looking at the monthly Total Credit graph for UCT Malaria on BOINCstats.

Heavy Metal Dungeon Keeper
Joined: Mar 7 06
Posts: 2
Credit: 200,092
RAC: 1,007

Thanks for the information. It's nice to know that our crunching is contributing to a worthy cause.

Can anyone tell us whether UCT : malariacontrol.net does anything worthwhile?

Warped.


Yes and maybe:
Yes, they're getting ready to test a BOSSA (BOINC-extension for volunteer thinking) project called africaMap. This should be a worthy project, and for this they have the server up and running. In addition, there's a possibility that there will be another BOINC project run out of UCT (currently in early planning).
Maybe, I'm not sure the workunits they're currently sending out are of any use. I asked their project admin to switch workunit creation off, should this not be the case.

Nick


Any news on when or if UCT will return?


Copyright © 2013 africa@home