Important message concerning the map predictor application

Message boards : Malaria Control : Important message concerning the map predictor application

maire
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: Nov 7 05
Posts: 438
Credit: 118,258
RAC: 0

The new version of the mappredictor runs more stably than the first version. We do still see quite a few errors, but they are concentrated on a small number of hosts. The error usually occurs right at the beginning of a workunit.
As we are under a lot of pressure to deliver these results, we have decided on the following procedure: We will make use of the total Windows host population to go through the workunits as quickly as possible. However, if you have Windows hosts attached to the project and prefer not to receive mapping workunits, you can opt out using the procedure described below. Please also consider these points when making your decision:
The revised application version takes considerably longer to run (2-3 times longer than the old version, about 30-40 min on an average PC), so it is important to repeat the following facts here: this application uses BOINC's wrapper approach for legacy applications, which has a few drawbacks. There is no checkpointing and no feedback from the application about progress, so a workunit will start again from the beginning if it is interrupted. In addition, one of the input data files is now about 20 MB in size (it used to be 6 MB). You only download it once, but it may be a problem for slow connections.

Opt-out procedure:
- On malariacontrol.net, go to Your account
- Under malariacontrol.net preferences, choose View or edit
- Choose Edit malariacontrol.net preferences
- Set Run malariacontrol simulation application: yes
- Set Run map predictor application: no

The server-side scheduler currently ignores the ‘Run malariacontrol test application’ setting; whether or not you get work for the test application depends only on the ‘Run test applications?’ setting further up.
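The preference rules above amount to a small decision function. The following Python sketch is hypothetical: the parameter names and application labels are invented for illustration and are not taken from the project's scheduler code.

```python
def eligible_apps(run_simulation: bool, run_map_predictor: bool,
                  run_test_apps: bool, run_test_app: bool) -> set:
    """Hypothetical model of which applications a host is sent work for.

    Parameter names are illustrative, not the project's real preference keys.
    """
    apps = set()
    if run_simulation:
        apps.add("malariacontrol")
    if run_map_predictor:
        apps.add("mappredictor")
    # The 'Run malariacontrol test application' setting (run_test_app) is
    # currently ignored; only 'Run test applications?' decides test work.
    if run_test_apps:
        apps.add("malariacontrol test")
    return apps


# Opt-out settings from the procedure above, with test applications off:
print(sorted(eligible_apps(run_simulation=True, run_map_predictor=False,
                           run_test_apps=False, run_test_app=True)))
# ['malariacontrol']
```

In this model run_test_app never changes the result, which is exactly the quirk described above.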

We will not start sending out work before Monday, June 3rd 2007.

____________
Nicolas Maire
Swiss Tropical and Public Health Institute
http://www.swisstph.ch

FalconFly
Joined: Mar 7 06
Posts: 92
Credit: 5,517,713
RAC: 0

As we are under a lot of pressure to deliver these results [...]


Hope you don't mind my asking: what's the trouble?

I wouldn't mind running the Map Predictor application exclusively on my lonely Win32 host, but I don't have any option to select that to help out.
____________
Scientific Network : 44800 MHz - 77824 MB - 1970 GB

Betting Slip
Joined: Jul 9 06
Posts: 8
Credit: 211,118
RAC: 0

Bring em on...
____________

RandyC
Joined: Jun 23 06
Posts: 2695
Credit: 850,101
RAC: 1,184


I wouldn't mind running the Map Predictor application exclusively on my lonely Win32 host, but I don't have any option to select that to help out.


Try this:
1. Go to Your Account and select Malariacontrol.net Preferences
2. Set up one of the three venues (Home, Work, School) for Map Predictor only
3. Set your Win32 host to that venue and do a manual update on it

Your host should then complete any outstanding WUs and only download Map Predictor WUs (as available) from then on.

HTH

[edit typo]

maire
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: Nov 7 05
Posts: 438
Credit: 118,258
RAC: 0

As we are under a lot of pressure to deliver these results [...]


Hope you don't mind my asking: what's the trouble?

Deadlines... When we started this one, it all looked so simple: wrap the binary of the science app, create a few hundred thousand WUs, and 2 or 3 weeks later everything would be done. That was a few months ago, and there are a few people here who depend on these results. This is why we need a large proportion of the Windows hosts to contribute. If we can deliver, this will still be a nice example of volunteer computing solving a hard problem with a relatively small investment, despite the unforeseen complications.
Nick
____________
Nicolas Maire
Swiss Tropical and Public Health Institute
http://www.swisstph.ch

maire
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: Nov 7 05
Posts: 438
Credit: 118,258
RAC: 0


Try this:
1. Go to Your Account and select Malariacontrol.net Preferences
2. Set up one of the three venues (Home, Work, School) for Map Predictor only
3. Set your Win32 host to that venue and do a manual update on it

For those of you who manage separate locations using the 'Combined preferences' view: there's currently a small (cosmetic) issue. The table summary shows numbers instead of Yes/No. A non-zero value means "get work for this application"; zero means "no work for this application".
Nick
____________
Nicolas Maire
Swiss Tropical and Public Health Institute
http://www.swisstph.ch

The Gas Giant
Joined: Mar 7 06
Posts: 1213
Credit: 3,503,340
RAC: 1,667

There are a few WUs where every single host errors out, like this WU. The only issue I have with WUs erroring out is that the thread hangs until you acknowledge the error, and during that time no work is completed. Because of this, I know of some people who steer away from this project during these runs and then have not come back.

I'm also trying to remember how much credit was granted per WU last time. Was it 1.8?

Live long and BOINC.
____________
Paul
(S@H1 8888)

adrianxw
Joined: Mar 8 06
Posts: 145
Credit: 474,763
RAC: 883

I'm also trying to remember how much credit was granted per WU last time. Was it 1.8?

I would expect the next round of these, which have 2-3 times the execution time, to have 2-3 times the credit.

I had a few of those fail last time, but not many. The problem is, as stated above, that when they hang, they hang BOINC completely and no work gets done until you click the message box.

What I did was download a number of units, then suspend all the map ones and let the regular ones run (time-slicing with other projects). When I knew I was going to be sitting at the machine for a while, I'd suspend everything else and run the map WUs exclusively. Then if a problem appeared, I could click the box and away she'd go again.

I have set my local machines to allow these WUs. The machines that run at my remote site I have disallowed. I only visit that site 1-2 times a week, sometimes less, and don't wish to risk them standing idle.
____________
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

AnRM
Joined: Mar 7 06
Posts: 54
Credit: 2,130,571
RAC: 0

Bring em on...

Right on! ...moving our Windoze machines over from R@H to help... Cheers, Rog.
____________

FalconFly
Joined: Mar 7 06
Posts: 92
Credit: 5,517,713
RAC: 0

I tried setting the venue of my Win32 box to "Work" and set it to run Map Predictor exclusively... Seems that this will work out fine indeed, as long as there is a constant supply of these workunits :)
____________
Scientific Network : 44800 MHz - 77824 MB - 1970 GB

adrianxw
Joined: Mar 8 06
Posts: 145
Credit: 474,763
RAC: 883

I've set Home, Work and School to be Yes, Yes, Yes, but have not received any of the Map units. 1 machine in the Home group, 1 in Work and 2 in School. When they were circulating before, I got them.
____________
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

Guy Pauwels
Volunteer moderator
Volunteer tester
Joined: Nov 10 05
Posts: 62
Credit: 127,778
RAC: 0

I've set Home, Work and School to be Yes, Yes, Yes, but have not received any of the Map units. 1 machine in the Home group, 1 in Work and 2 in School. When they were circulating before, I got them.


Have you tried setting "Run malariacontrol simulation" to No? That works for me. Of course, then the map WUs are the only ones you get.
____________

BOINC.BE: For Belgians who love the smell of glowing red cpu's in the morning
Tutta55's Lair

adrianxw
Joined: Mar 8 06
Posts: 145
Credit: 474,763
RAC: 883

No, I haven't, but reading the original post, particularly...

We will make use of the total Windows host population to go through the workunits as quickly as possible.

... it sounded like they were pretty keen to get these things done, so I assumed they would preferentially send them to people who had not opted out.
____________
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

Klaus Rupprecht
Joined: Jun 20 06
Posts: 2
Credit: 9,283
RAC: 0

It doesn't work on my computer, I presume. When I run the mappredictor, I get a message like this every second:
07.06.07 20:16:44|malariacontrol.net beta|app reporting negative CPU: -458403381850.212400

And if I close BOINC, I get these error messages after the restart:
07.06.07 20:15:25|malariacontrol.net beta|Reason: Unrecoverable error for result mapwca0032911.txt_0 ( - exit code -1 (0xffffffff))
07.06.07 20:15:30|malariacontrol.net beta|[error] Can't rename output file mapwca0032911.txt_0_0
07.06.07 20:15:36|malariacontrol.net beta|[error] Can't rename output file mapwca0032911.txt_0_1

Sincerely, Klaus.

____________

mikey
Joined: Mar 23 07
Posts: 4120
Credit: 5,299,212
RAC: 1,698

It doesn't work on my computer, I presume. When I run the mappredictor, I get a message like this every second:
07.06.07 20:16:44|malariacontrol.net beta|app reporting negative CPU: -458403381850.212400

And if I close BOINC, I get these error messages after the restart:
07.06.07 20:15:25|malariacontrol.net beta|Reason: Unrecoverable error for result mapwca0032911.txt_0 ( - exit code -1 (0xffffffff))
07.06.07 20:15:30|malariacontrol.net beta|[error] Can't rename output file mapwca0032911.txt_0_0
07.06.07 20:15:36|malariacontrol.net beta|[error] Can't rename output file mapwca0032911.txt_0_1
Sincerely, Klaus.


Are you the only user on this machine? If so, are you running as Admin or just a User? It looks like a permissions issue.
____________

Krunchin-Keith [USA]
Volunteer moderator
Volunteer tester
Joined: Nov 10 05
Posts: 3047
Credit: 5,330,818
RAC: 4,054

I've set Home, Work and School to be Yes, Yes, Yes, but have not received any of the Map units. 1 machine in the Home group, 1 in Work and 2 in School. When they were circulating before, I got them.


I set my preferences for my home venue as follows to force my home Windows hosts to krunch only map predictor for now.

Run test applications? Yes
Run malariacontrol simulation application: No
Run malariacontrol test application: Yes
Run map predictor application: Yes

This gives me only mappredictor 5.17 and NO malariacontrol 5.50.

Before, I had lots of mc5.50 and few map5.17.

Over 100 done in the past 24 hours. Hope this helps expedite things.

The Gas Giant
Joined: Mar 7 06
Posts: 1213
Credit: 3,503,340
RAC: 1,667

It's a shame the WU replication is so high. All the WUs with errors keep getting sent out, even after 2 or 3 errors! Basically, none of these WUs are going to work, and they just hold up the processing of other WUs, both globally and on each host that gets one.

Can the replication be dialled back a little?

Live long and BOINC.

____________
Paul
(S@H1 8888)

Klaus Rupprecht
Joined: Jun 20 06
Posts: 2
Credit: 9,283
RAC: 0

Are you the only user on this machine? If so, are you running as Admin or just a User? It looks like a permissions issue.


I'm the only user, but I am using Windows 98 SE, not XP or Vista...

____________

Dagorath
Joined: Jun 26 06
Posts: 68
Credit: 71,310
RAC: 0

It's a shame the WU replication is so high. All the WUs with errors keep getting sent out, even after 2 or 3 errors! Basically, none of these WUs are going to work, and they just hold up the processing of other WUs, both globally and on each host that gets one.

Can the replication be dialled back a little?

Live long and BOINC.


The replication is only 2. I can't see them dialing it back to 1. Or am I missing something?

Alain posted in another thread yesterday that they have corrected a bug in the WUs themselves. He also says we won't get more buggy WUs, which makes me think they deleted all the buggy ones. That should help a lot.

Still, I get WUs that have 2 to 4 compute errors against them. 99% of them come from hosts running older versions of BOINC. They crunch error-free on my system, validate, and yield a canonical result if there is one other successful crunch on them. Then they're done and won't be replicated again.

Also, the crunch reports indicate the WUs crash immediately on startup, so next to zero CPU time is wasted.

I've been sending the following text in a private message to all my quorum partners who have failed a WU due to running an old version of BOINC...

You are running an older version of BOINC which is crashing most of the mappredictor work units you receive from Malariacontrol. The mappredictor work units run fine on the currently recommended BOINC 5.8.16. Download it here.

If you are not able to update your BOINC then please opt out of the mappredictor WUs until you are able to update. Instructions for opting out are given in the first post in this thread in the Malariacontrol forums. You can opt out of the mappredictor WUs and continue to crunch the regular malariacontrol WUs. Please do not detach from Malariacontrol because the regular malariacontrol WUs run fine even on older versions of BOINC.

If you are a member of a team then please inform your team members.






____________
--

svenni96
Joined: Mar 7 06
Posts: 7
Credit: 31,310
RAC: 0

Yep, there are two different errors.

One is caused by older clients and the other by an odd data string.
The latter should be fixed, as Dagorath mentioned.

@Gas Giant: Look at the WU tab. If there are many errors with "cant resume ..2", you should crunch the WU. If there is something like "- exit code 1282 (0x502)", you should abort it.

The last one you aborted was not buggy. One valid result was still received.
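svenni96's rule of thumb can be written down as a small Python function. This is a hypothetical helper, not a project tool: it assumes you have copied the other hosts' error messages from the WU page by hand, and the mapping of each message to a cause is inferred from the post rather than confirmed by the project.

```python
def triage(other_hosts_errors: list) -> str:
    """Suggest crunching or aborting a WU from the errors other hosts reported.

    Hypothetical helper encoding the rule of thumb above; the message texts
    are the ones quoted in the post.
    """
    if any("exit code 1282 (0x502)" in e for e in other_hosts_errors):
        return "abort"    # presumably the odd-data-string error
    # "cant resume ..2" is quoted in shortened form, so only "resume" is
    # matched here (an assumption). all() is True for an empty list.
    if all("resume" in e.lower() for e in other_hosts_errors):
        return "crunch"   # presumably old-client failures
    return "unsure"       # some other error: look at the WU page yourself


print(triage(["cant resume ..2", "cant resume ..2"]))           # crunch
print(triage(["cant resume ..2", "- exit code 1282 (0x502)"]))  # abort
```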
____________

Profile The Gas Giant
Avatar
Send message
Joined: Mar 7 06
Posts: 1213
Credit: 3,503,340
RAC: 1,667

Yep, there are two different errors.

One is caused by older clients and the other by an odd data string.
The latter should be fixed, as Dagorath mentioned.

@Gas Giant: Look at the WU tab. If there are many errors with "cant resume ..2", you should crunch the WU. If there is something like "- exit code 1282 (0x502)", you should abort it.

The last one you aborted was not buggy. One valid result was still received.

Yeah, but when I get a _4 or a _3 WU, the odds are pretty high that the WU is buggy and is going to fail, leaving my computer doing nothing until I acknowledge the error. Since I do not sit at my computer anywhere near 24/7, it could sit idle for a long time, just like yesterday. Plus there is no way to find the _4 and _3 WUs among mine without going through them one by one, which obviously is not something I'm too willing to do.

@Dagorath said...
The replication is only 2. I can't see them dialing it back to 1. Or am I missing something?

Yes, initial replication is 2, but on errors they will replicate much higher. That is what the max # of error/total/success results (7, 20, 10) is all about.
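In BOINC these three numbers are the per-workunit fields max_error_results, max_total_results and max_success_results. The Python sketch below shows, under simplifying assumptions, why a WU that crashes everywhere keeps being reissued. It assumes the server gives up once the error count exceeds max_error_results (the strict "greater than" is an assumption of this sketch) and leaves out the total and success limits, which matter for timeouts and disagreeing results rather than for crashes. It is a model, not BOINC's transitioner code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationLimits:
    quorum: int = 2              # two agreeing successes validate the WU
    max_error_results: int = 7   # first of the "7, 20, 10" limits above


def replay(outcomes: list, limits: ReplicationLimits = ReplicationLimits()):
    """Replay result outcomes ('ok' or 'error') for one WU, in order.

    Simplified model: successes always agree, and timeouts and invalid
    results are ignored, so the total and success limits never apply.
    Returns (results used, final state).
    """
    errors = successes = 0
    for used, outcome in enumerate(outcomes, start=1):
        if outcome == "error":
            errors += 1
        else:
            successes += 1
        if successes >= limits.quorum:
            return used, "validated"
        if errors > limits.max_error_results:   # strict '>' is assumed
            return used, "too many errors"
    return len(outcomes), "still waiting for results"


print(replay(["error"] * 10))                  # (8, 'too many errors')
print(replay(["error", "error", "ok", "ok"]))  # (4, 'validated')
```

Under these assumptions a WU that crashes on every host is sent out 8 times, as results _0 through _7, before the server gives up on it, while a WU with a couple of early old-client crashes still validates as soon as two hosts succeed.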

Alain posted in another thread yesterday that they have corrected a bug in the WUs themselves. He also says we won't get more buggy WUs, which makes me think they deleted all the buggy ones. That should help a lot.

Still, I get WUs that have 2 to 4 compute errors against them. 99% of them come from hosts running older versions of BOINC. They crunch error-free on my system, validate, and yield a canonical result if there is one other successful crunch on them. Then they're done and won't be replicated again.

I will take this on board and not abort any further _3 or _4 WUs.

Thanks for the feedback guys.

Paul.

Robbie Lawrence
Joined: Jan 4 07
Posts: 12
Credit: 39,680
RAC: 0

Hmm... I don't really have the time or will to read through all this, but from what I can gather I'm using the latest test version of BOINC, which is quite stable at present, and I'm still getting about a 40-50% failure rate with these errors... Am I missing something?

The Gas Giant
Joined: Mar 7 06
Posts: 1213
Credit: 3,503,340
RAC: 1,667

Hmm... I don't really have the time or will to read through all this, but from what I can gather I'm using the latest test version of BOINC, which is quite stable at present, and I'm still getting about a 40-50% failure rate with these errors... Am I missing something?

No, you're not missing anything. It looks like all the WUs that errored for you had errored out on other hosts as well, so it has nothing to do with you. In fact, I've recently had a WU error out that all other hosts had errored out on too. Based on this, I'm still tempted to abort any WU with a _3 or _4 suffix and will definitely abort any I get that are _5, _6 or _7.
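This heuristic, together with the earlier complaint that _3 and _4 WUs can only be found by going through them one by one, can be sketched as a filter over result names in Python. The sketch assumes you already have the result names as a list (how to get them out of BOINC is not covered here); the threshold of 5 is simply the value from the post, and as Dagorath pointed out, many WUs with a few errors against them do validate on current clients. All names except the first in the example are hypothetical.

```python
import re


def replica_index(result_name: str) -> int:
    """Return the trailing _N number of a result name (0 for '..._0')."""
    match = re.search(r"_(\d+)$", result_name)
    if match is None:
        raise ValueError(f"no _N suffix in {result_name!r}")
    return int(match.group(1))


def candidates_to_abort(result_names: list, threshold: int = 5) -> list:
    """Return the result names whose suffix is at or above the threshold."""
    return [name for name in result_names if replica_index(name) >= threshold]


# Hypothetical queue; only the first name appears in this thread.
queue = ["mapwca0032911.txt_0", "hypothetical_wu_1.txt_3", "hypothetical_wu_2.txt_6"]
print(candidates_to_abort(queue))               # ['hypothetical_wu_2.txt_6']
print(candidates_to_abort(queue, threshold=3))  # ['hypothetical_wu_1.txt_3', 'hypothetical_wu_2.txt_6']
```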

Robbie Lawrence
Send message
Joined: Jan 4 07
Posts: 12
Credit: 39,680
RAC: 0

Based on this, I'm still tempted to abort any WU with a _3 or _4 suffix and will definitely abort any I get that are _5, _6 or _7.


I shall join you in this. Will we continue to get them or are they forever gone now?
____________

Krunchin-Keith [USA]
Volunteer moderator
Volunteer tester
Joined: Nov 10 05
Posts: 3047
Credit: 5,330,818
RAC: 4,054

I shall join you in this. Will we continue to get them or are they forever gone now?


Still krunching. I have 5 running now (3 hosts) and 51 standing by to go. The last download was 5 minutes ago... no wait, make that now: another one completed and the host is grabbing another to refresh its queue. So yes, they still have some.

Dagorath
Joined: Jun 26 06
Posts: 68
Credit: 71,310
RAC: 0

I finally had one crash too and got the "application had a problem, want to report this to Microsoft?" popup, which halted all crunching. That sucks. Can that dirty crunch-halting popup be turned off somehow? I don't see a way to turn it off via Windows Control Panel. Maybe tweaking some registry setting turns it off?

Or is the only solution to run a second project at a lower resource share? I'm thinking only the mappredictor app would halt while the other project would crunch on?

EDIT ADDED: I am also thinking running a second project will not cause any loss of progress when BOINC switches projects because BOINC is not supposed to switch projects until the WU in progress checkpoints. Since mappredictor does not checkpoint then BOINC should, I think, wait until a mappredictor WU completes before switching to the other project. Is BOINC still working that way or has that feature been lost somewhere along the line?

____________
--

Kabal
Joined: Jun 20 06
Posts: 4
Credit: 176,273
RAC: 187

I finally had one crash too and got the "application had a problem, want to report this to Microsoft?" popup, which halted all crunching. That sucks. Can that dirty crunch-halting popup be turned off somehow? I don't see a way to turn it off via Windows Control Panel. Maybe tweaking some registry setting turns it off?

On Windows XP SP2:
Control Panel -> System -> Advanced -> Error reporting button (bottom - right) -> disable.
More detailed instructions:
http://support.microsoft.com/kb/310414
____________

Guy Pauwels
Volunteer moderator
Volunteer tester
Joined: Nov 10 05
Posts: 62
Credit: 127,778
RAC: 0

I finally had one crash too and got the "application had a problem, want to report this to Microsoft?" popup, which halted all crunching. That sucks. Can that dirty crunch-halting popup be turned off somehow? I don't see a way to turn it off via Windows Control Panel. Maybe tweaking some registry setting turns it off?

On Windows XP SP2:
Control Panel -> System -> Advanced -> Error reporting button (bottom - right) -> disable.
More detailed instructions:
http://support.microsoft.com/kb/310414


Now that is a useful piece of info I didn't know yet. If I could click 100 times on the '+' icon, I would ;)

The Gas Giant
Joined: Mar 7 06
Posts: 1213
Credit: 3,503,340
RAC: 1,667

Now that is a useful piece of info I didn't know yet. If I could click 100 times on the '+' icon, I would ;)

I'll second that!

Dagorath
Joined: Jun 26 06
Posts: 68
Credit: 71,310
RAC: 0

Me three! Thanks Kabal :)
____________
--

mikey
Joined: Mar 23 07
Posts: 4120
Credit: 5,299,212
RAC: 1,698

Me three! Thanks Kabal :)

Me four! Thanks from me too!
____________

Keck_Komputers
Volunteer moderator
Volunteer tester
Joined: Nov 10 05
Posts: 29
Credit: 281,138
RAC: 495


EDIT ADDED: I am also thinking running a second project will not cause any loss of progress when BOINC switches projects because BOINC is not supposed to switch projects until the WU in progress checkpoints. Since mappredictor does not checkpoint then BOINC should, I think, wait until a mappredictor WU completes before switching to the other project. Is BOINC still working that way or has that feature been lost somewhere along the line?

You are correct when using 5.8.x or later clients. There is one thing to consider, though: the client will wait no longer than twice the switch interval before forcing a switch. Even then, it should suspend the task to memory if the app has never checkpointed.
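The switching behaviour described here, together with the follow-up answer about the memory setting further down, can be modelled as a small Python decision function. This is a rough, hypothetical model, not BOINC's actual client scheduler code; has_checkpointed is a simplification meaning "a checkpoint exists that the task could resume from", and the minute values in the example are made up.

```python
def preemption(minutes_running: float, switch_interval: float,
               has_checkpointed: bool, leave_in_memory: bool) -> str:
    """Hypothetical model of the 5.8.x switching rule described above.

    Simplification: has_checkpointed means a checkpoint exists that the task
    could resume from; the real client's timing is more detailed.
    """
    if minutes_running < switch_interval:
        return "keep running"
    if not has_checkpointed and minutes_running < 2 * switch_interval:
        # Still waiting for a checkpoint (mappredictor never writes one).
        return "keep running"
    if not has_checkpointed or leave_in_memory:
        # A task that never checkpointed stays in memory whatever the
        # 'leave apps in memory' setting says, so its progress is kept.
        return "suspend, keep in memory"
    return "suspend, remove from memory"


# A slow host needing 90 minutes per WU, with a 60-minute switch interval:
print(preemption(90, 60, has_checkpointed=False, leave_in_memory=False))   # keep running
print(preemption(120, 60, has_checkpointed=False, leave_in_memory=False))  # suspend, keep in memory
```

With a 30-40 minute mappredictor WU and a switch interval of an hour, the WU would normally finish before any switch is even considered.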
____________
BOINC WIKI

BOINCing since 2002/12/8

Dagorath
Joined: Jun 26 06
Posts: 68
Credit: 71,310
RAC: 0

Thanks, John. Will it override and suspend to memory even if the "leave apps in memory while suspended" setting is no?

____________
--

Keck_Komputers
Volunteer moderator
Volunteer tester
Joined: Nov 10 05
Posts: 29
Credit: 281,138
RAC: 495

Thanks, John. Will it override and suspend to memory even if the "leave apps in memory while suspended" setting is no?

Yes, but only if the app has never checkpointed.
____________
BOINC WIKI

BOINCing since 2002/12/8

NadiaKoutzen
Joined: Nov 12 07
Posts: 1
Credit: 0
RAC: 0


I have received your thread re Malariacontrol



Copyright © 2013 africa@home