Optimizer growing up

Message boards : Malaria Control : Optimizer growing up

Michael
Volunteer moderator
Project scientist
Joined: May 5 06
Posts: 79
Credit: 494
RAC: 0

The "optimizer" application will leave testing status again some time next week, starting from Monday, 15 September.

Below is an old (but updated) post on what this application does and how you can opt out of running it. By default you will get workunits of this application, unless it is in testing status and you did not volunteer to run test workunits.

Only Windows hosts will get work.
____________
Michael

Michael
Volunteer moderator
Project scientist
Joined: May 5 06
Posts: 79
Credit: 494
RAC: 0

This post was last updated on 12 Sept. 08 and is maintained only for general information about the optimizer science application.

Watch this thread for news; the application will leave testing status during next week (from 15 September).

At first, it will be run as a test application, meaning that only users who have "run test applications" and "run optimizer application" checked in their account settings (under malariacontrol.net preferences) will get work.

In addition, only Windows hosts will get work.


Work units take 1 to 2 hours, depending on the model parameters. Checkpointing is now done and progress is shown, though not very reliably (so don't worry if it says 100% and then continues for a long time; the maximum is 2 hours for now).

Calculation is done by a Java program, contained within the standard BOINC "wrapper" application. You don't need to have Java installed; a Java runtime environment is included in the application.
Deadline: three days.

The name "optimizer" for this application was chosen because the server-side components are essentially a "general use" optimization framework, to be used by scientists in our group to work on more specific questions, e.g. to fit simpler models for which the "big" malaria model would just not be what you want. The insights from those calculations will help us to improve the main malariacontrol application in the future.


On the science of the project:


To make quantitative predictions of malaria transmission, it is very important to know how long an infection lasts in an infected human: the longer it lasts, the more mosquitoes can get infected; the more infected mosquitoes there are, the more humans get infected; and so on.
It may at first seem very straightforward to measure this: you just look when somebody gets infected, and then you keep taking blood samples until that person is not infected anymore.
Unfortunately, you only have about a 50% chance of detecting an infection, given that it is there. So you already have a problem: you don't know when the infection started, and you don't know exactly when it ended.
In addition, in areas of high malaria transmission people are very often infected with up to ten or more infections simultaneously, so you never know whether what you're seeing is still the same infection or a new one.
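To put a number on the detection problem: here is a minimal sketch, assuming each blood sample detects an existing infection independently with probability 0.5. Independence between samples is an assumption made only for this illustration, and the function name is made up.

```python
def prob_all_missed(k: int, p_detect: float = 0.5) -> float:
    """Probability that k independent samples all fail to detect
    an infection that is actually present."""
    return (1.0 - p_detect) ** k

# How likely is it that an infection goes unnoticed in k samples in a row?
for k in (1, 2, 3, 5):
    print(k, prob_all_missed(k))
```

So a single negative sample says very little about whether the infection has really ended, which is why the start and end dates are so uncertain.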
Recently some work at our institute has used new DNA-based methods (which allow different infections to be distinguished), together with a mathematical approach, to estimate the average duration of an untreated P. falciparum infection.


See Sama et al. 2006
(sorry, only the abstract is freely available to the public)


So far so good, this was an important step forward. The problem that remains is: how are the durations distributed? In other words: do all of the infections last exactly 200 days and then stop? Or does an infection have a constant probability of disappearing, no matter how old the infection is? Probably neither is true, but we need to describe the shape of that distribution of durations somehow, in order to make sensible predictions.
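The two extremes mentioned above can be compared in a few lines. This is only a sketch: the 200 days is the example value from the text, and giving the constant-probability case the same 200-day mean is an assumption made here so that the two are comparable.

```python
import math

MEAN_DAYS = 200.0  # example value from the text, used for both extremes

def surviving_fixed(t: float) -> float:
    """Fraction of infections still present at day t if every
    infection lasts exactly MEAN_DAYS."""
    return 1.0 if t < MEAN_DAYS else 0.0

def surviving_constant_hazard(t: float) -> float:
    """Fraction still present at day t if each infection clears at a
    constant rate 1/MEAN_DAYS per day (exponential durations)."""
    return math.exp(-t / MEAN_DAYS)

for day in (50, 100, 200, 400):
    print(day, surviving_fixed(day), round(surviving_constant_hazard(day), 3))
```

Both cases have the same average duration, yet they predict very different numbers of infections still present on any given day. That is why the shape of the distribution matters, not just its mean.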


For more on that, see Sama et al. 2006b


That's almost where we want to go, except for one thing: the above paper measures the distribution in people living in the US who had never experienced malaria before. They were infected on purpose, to cure their syphilis (the method of choice at that time). We don't know what the picture looks like in people living in areas of high transmission, with multiple infections at a time and after decades of being constantly infected.

Attempts to find a mathematical solution to this problem did not work out; the equations become unsolvable. But there is a way out: instead of using equations, we can use individual-based simulations. That means we simulate every single infection in a computer program and see which parameters best reproduce the data we have. The big drawback is that this just takes too long to calculate on a single computer.
That's what we need you guys and girls for, and thanks a lot for making this possible!!
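For the curious, the simulate-and-compare idea can be sketched very roughly as below. This is not the project's model: the exponential durations, the survey schedule, the summary statistic and the plain grid search are all assumptions chosen to keep the example short, and every name in it is made up.

```python
import random

def simulate_positive_fraction(mean_duration, n_infections=2000,
                               survey_interval=30, n_surveys=12,
                               p_detect=0.5, seed=0):
    """Simulate infections one by one and return the fraction of
    (infection, survey) pairs in which the infection was detected."""
    rng = random.Random(seed)
    positives = 0
    for _ in range(n_infections):
        # Assumed model: duration drawn from an exponential distribution.
        duration = rng.expovariate(1.0 / mean_duration)
        for s in range(n_surveys):
            day = s * survey_interval
            present = day < duration
            # A present infection is only detected about half the time.
            if present and rng.random() < p_detect:
                positives += 1
    return positives / (n_infections * n_surveys)

def fit_mean_duration(observed_fraction, candidates):
    """Return the candidate mean duration whose simulated output is
    closest to the observed summary statistic (plain grid search)."""
    return min(candidates,
               key=lambda m: abs(simulate_positive_fraction(m) - observed_fraction))

# Synthetic "data" generated with a known mean, purely for demonstration.
observed = simulate_positive_fraction(200, seed=1)
best = fit_mean_duration(observed, candidates=[50, 100, 150, 200, 250, 300])
print(best)
```

Each candidate parameter value needs its own full simulation, and a realistic model has many parameters and far more detail per infection, which is where the computing time goes.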

P.S.: Something about the data collection mentioned above, to prevent misunderstandings: there are strict ethical guidelines on how one is allowed to obtain such data. Since most malaria infections in high-transmission areas don't cause any symptoms, being infected with malaria doesn't mean you are sick (because of acquired immunity). People who did have symptoms were of course given treatment.
____________
Michael

Michael
Volunteer moderator
Project scientist
Joined: May 5 06
Posts: 79
Credit: 494
RAC: 0

Thanks to all those who helped with testing! After running the new version in testing status over the weekend, everything seems to be fine, with no major issues, so we are leaving testing status now! It's always a great moment to "unleash" the full power of a BOINC project and see the results coming back! Why am I so motivated at 9:20 on a Monday morning? I think I have an interesting job, not always, but right now this is very exciting... :)
We are confident we will have a batch of sensible results from this application by the end of November at the latest. Then it will be written up and published, preferably in an open-access journal, so we can post a link here.

cheers


____________
Michael

John Clark
Joined: Feb 10 08
Posts: 2085
Credit: 1,137,540
RAC: 740

Michael

Where can I look up data/information on this optimiser client, as I run Win XP on both the rigs I use for Malaria?

Maybe I am a little thick, but I assume "optimiser" refers to a Malaria project client which uses the specialist Intel instruction sets (like MMX, SSE, SSE2, SSE3, SSSE3x or SSE4.1)?

I would love to volunteer to test if this is the correct view, or test when I understand what I can do to contribute, as well as using the stock client.

I did not look around too deeply before I posted this, so I have some reading to carry out.

Hopefully my questions will be answered during this reading, but if there are any short answers you might post, I would appreciate the understanding (both for me and from you).
____________
Go away, I was asleep

Said a Russell, 3 Shih-Tzus & a Bischeon Frize

Krunchin-Keith [USA]
Volunteer moderator
Volunteer tester
Joined: Nov 10 05
Posts: 3047
Credit: 5,330,818
RAC: 4,054

Michael

Where can I look up data/information on this optimiser client, as I run Win XP on both the rigs I use for Malaria?

See my FAQ thread "Running the different applications" in "Number crunching".
See the 2nd post in this thread; it explains it too.

Maybe I am a little thick, but I assume "optimiser" refers to a Malaria project client which uses the specialist Intel instruction sets (like MMX, SSE, SSE2, SSE3, SSSE3x or SSE4.1)?

No, it is not an optimized application for Intel.
See the second post in this thread for a description. It explains what "optimizer" means.


I would love to volunteer to test if this is the correct view, or test when I understand what I can do to contribute, as well as using the stock client.

Set your settings, the more the merrier.

I did not look around too deeply before I posted this, so I have some reading to carry out.

Hopefully my questions will be answered during this reading, but if there are any short answers you might post, I would appreciate the understanding (both for me and from you).

John Clark
Joined: Feb 10 08
Posts: 2085
Credit: 1,137,540
RAC: 740

Done as per the highlighted bits in your first post.

Now we will see what happens.
____________
Go away, I was asleep

Said a Russell, 3 Shih-Tzus & a Bischeon Frize

Michael
Volunteer moderator
Project scientist
Joined: May 5 06
Posts: 79
Credit: 494
RAC: 0

Good! Thanks, Keith, for helping.
____________
Michael

d_a_dempsey
Joined: Feb 29 08
Posts: 3
Credit: 2,244,103
RAC: 4,754

I have received a couple of these "optimiser packets." I assume these are the work units of the name "Estimation of parameters...". Two of these have reached 100.000% completion, have no more remaining completion time--and continue to run. They will go to a "waiting to run" status, back to "running" but have not completed, even after being 100% complete for almost 24 hours.

Is this normal?

Michael
Volunteer moderator
Project scientist
Joined: May 5 06
Posts: 79
Credit: 494
RAC: 0

That does not sound good.

That it shows 100% and then continues is not bad in itself, since the percentage is only approximate, but it should not go on for 24 hours.
The maximum is 2 hours (for the batch that went out now).
Please abort them by hand and, if possible, post a link to the workunits; that would help us find the problem. This really shouldn't happen.

____________
Michael

Augustine
Joined: Mar 7 06
Posts: 36
Credit: 275,224
RAC: 0

I noticed the same issue and these WUs took longer than 2h to complete sitting at 100% for most of the time, for too long even with the CPU throttled at 17%:


HTH
____________

Michael
Volunteer moderator
Project scientist
Joined: May 5 06
Posts: 79
Credit: 494
RAC: 0

I noticed the same issue and these WUs took longer than 2h to complete sitting at 100% for most of the time, for too long even with the CPU throttled at 17%:


HTH



Augustine, those look correct. They took a long time because the CPU was throttled: for the first one it appears the CPU was throttled to about 5% (maybe you had more than one WU running at the same time?), and for the second one to about 15%, pretty close to the 17% you mentioned. If you compare the "CPU time" on the pages above with the actual time it took, and take the throttling into account, it seems OK to me.
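The check is simple arithmetic. Here is a minimal sketch; the one-hour CPU time and the 20-hour wall time are made-up example values, not the figures from the result pages, and the function names are hypothetical.

```python
def expected_wall_hours(cpu_hours: float, throttle_fraction: float) -> float:
    """Wall-clock time needed to accumulate cpu_hours when the task only
    gets throttle_fraction of the CPU (e.g. 0.17 for 17%)."""
    return cpu_hours / throttle_fraction

def implied_throttle(cpu_hours: float, wall_hours: float) -> float:
    """Effective CPU share implied by the reported CPU time and the
    actual elapsed time."""
    return cpu_hours / wall_hours

# One hour of CPU time at a 17% throttle takes almost six hours of wall time.
print(round(expected_wall_hours(1.0, 0.17), 2))
# One hour of CPU time spread over 20 hours implies an effective share of 5%.
print(implied_throttle(1.0, 20.0))
```

An effective share well below the configured throttle usually means the CPU was shared with something else, such as a second workunit.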

____________
Michael

Augustine
Joined: Mar 7 06
Posts: 36
Credit: 275,224
RAC: 0

If you compare the "CPU time" on the pages above with the actual time it took, and take the throttling into account, it seems OK to me.

OK. I did have one system error out most WUs though...

TIA

____________

Michael
Volunteer moderator
Project scientist
Joined: May 5 06
Posts: 79
Credit: 494
RAC: 0

If you compare the "CPU time" on the pages above with the actual time it took, and take the throttling into account, it seems OK to me.

OK. I did have one system error out most WUs though...

TIA


Saw it. Very strange, since the contained science app actually terminated correctly (judging by the standard error) and also gave the correct result. It only affects that one computer of yours. I also noticed that your client seems not to report back the app version, so something must be fishy there... :)

Things to try:
- Reset the malariacontrol project (so the applications are downloaded again).
- Try installing BOINC in a different place (not "allusers/application_data..."). I am not sure about this, but it's an unusual place to have BOINC installed.
- Otherwise I would recommend opting out of running optimizer WUs (check the "no" box for "run optimizer app" in your account's project settings).
____________
Michael

d_a_dempsey
Joined: Feb 29 08
Posts: 3
Credit: 2,244,103
RAC: 4,754

Tasks with issues:

35473384
35456755
____________
David

Augustine
Joined: Mar 7 06
Posts: 36
Credit: 275,224
RAC: 0

I also noticed that your client seems not to report back the app version, so something must be fishy there... :)

Unlike the other systems, it's running a beta client, 6.3.10. The WUs that succeeded do report the application version, but not those that failed. So maybe that's why.

Thanks.
____________

Ananas
Joined: Mar 7 06
Posts: 58
Credit: 704,023
RAC: 715

I guess you're after a medal for the worst BOINC application?

Each application task opened a GUI window asking me to install JAVA - great thing on an unattended cruncher - CPUs stuck for hours.

After confirming the installation, they still crashed within no time, no reason given, just file transfer errors.

It should at least have a warning ("Java required") behind the OptIn selection.

The Gas Giant
Joined: Mar 7 06
Posts: 1213
Credit: 3,503,340
RAC: 1,667

I believe Java is installed by the Malaria Control application. You just need to ensure you have 'install' rights.....

I always love it when people go off!

Augustine
Joined: Mar 7 06
Posts: 36
Credit: 275,224
RAC: 0

I believe Java is installed by the Malaria Control application. You just need to ensure you have 'install' rights...

How does this play with BOINC 6.0's new protection scheme using its own users?

TIA
____________

Michael
Volunteer moderator
Project scientist
Joined: May 5 06
Posts: 79
Credit: 494
RAC: 0

I guess you're after a medal for the worst BOINC application?

Each application task opened a GUI window asking me to install JAVA - great thing on an unattended cruncher - CPUs stuck for hours.

After confirming the installation, they still crashed within no time, no reason given, just file transfer errors.

It should at least have a warning ("Java required") behind the OptIn selection.


Hi Ananas,

I am sorry for the hassle you had with this application, and thank you for reporting it.
No, normally this application does not require a Java installation, because a JRE comes with the application itself. It is not really installed, just unzipped, meaning other apps won't find it and it is gone after the slot is cleaned up (so no issues with BOINC's new user policy, I suspect). However, when the launcher for the Java app is executed and doesn't find a JRE in the place where it should be, it starts looking for a pre-installed one and tries to use that. Only then, if everything fails, does it prompt you to install Java. If you install Java, that should do the trick, and in the future the optimizer app will use your newly installed JRE. This is not exactly what we wanted (it should use the JRE that comes with the app, not just any), but it will work for you. It does not completely solve things for us, though, since other people might hit that error too.

This is an interesting error report, because your results show that your errors all come out as " -161 no output file find", and obviously a missing JRE was the problem. BUT unzipping the JRE by the wrapper application didn't result in an error (that would show up in the stderr). I cannot really make sense of it so far, but maybe this points in the right direction for resolving the -161 error: the JRE is unzipped, but not being used by the launcher. We have to look into this.





____________
Michael

Michael
Volunteer moderator
Project scientist
Joined: May 5 06
Posts: 79
Credit: 494
RAC: 0

Update: optimizer is going back to testing status, at least for a few days. It really looks as if certain client versions don't do well (some extremely badly) with this application. We are trying to figure out whether that is the case, and which ones. Stay tuned.
____________
Michael

Thyme Lawn
Joined: Jun 20 06
Posts: 180
Credit: 1,138,879
RAC: 1,495

Looks like optimizer v1.55 has fixed the "upper cannot equal lower" problem with the opt_27_* tasks.

Just returned my first one with the new version, only to get a "too many total results" validation failure (WU 13134127). The 7th task was sent out 4 hours after the 6th was returned. If the 7th task was created when the 6th was returned it shouldn't have been. If it already existed its state should surely have been set to server state "Unsent" and outcome "Didn't need" when the 6th was returned.

The pair of tasks which ran longest have massively different claimed credit but the "stderr out" text between "Application terminated" and "called boinc_finish" is identical. This suggests the tasks would have validated successfully if the WU had been set up with maximum total results set to 7 (5 errors + 2 successful to meet the quorum) instead of 6.
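The arithmetic behind that last point, as a small sketch. The numbers (5 errors, a quorum of 2, a cap of 6 or 7) come from this post; the function name and the check itself are a simplification made up for illustration, not the server's actual logic.

```python
def can_still_reach_quorum(errors: int, quorum: int, max_total_results: int) -> bool:
    """True if `quorum` successful results still fit under the cap
    after `errors` results have already failed."""
    return errors + quorum <= max_total_results

print(can_still_reach_quorum(errors=5, quorum=2, max_total_results=6))
print(can_still_reach_quorum(errors=5, quorum=2, max_total_results=7))
```

With a cap of 6 the second successful result pushes the WU over the limit; with a cap of 7 it would have fitted.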
____________
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer

Michael
Volunteer moderator
Project scientist
Joined: May 5 06
Posts: 79
Credit: 494
RAC: 0

Looks like optimizer v1.55 has fixed the "upper cannot equal lower" problem with the opt_27_* tasks.

Yes, we fixed it; it was a bug in one of the Java libraries we used.



Just returned my first one with the new version, only to get a "too many total results" validation failure (WU 13134127). The 7th task was sent out 4 hours after the 6th was returned. If the 7th task was created when the 6th was returned it shouldn't have been. If it already existed its state should surely have been set to server state "Unsent" and outcome "Didn't need" when the 6th was returned.


That sounds to me like a server-side problem: probably the "transitioner" had not yet processed the newly received result, although 4 hours seems like a long time to me. There were some problems on the server side recently, causing the transitioner to be a bit behind; this should be fixed now (I will check back with Nick). In addition, I have raised the maximum number of results, so this validation problem will not happen anymore.

There are some problems remaining:
First, we still have the -161 error (with nothing at all in the standard error) and have so far not been able to resolve it, but it seems to be machine-dependent. The next app version will collect a bit more debugging information about this problem.

Second, there is an error caused by a bug in client versions 6.2.14 upwards. This bug is fixed in version 6.2.18, so please update your client if you are among those who get something like "can't get shmem().." in your stderr.
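If you want to check a version number against that range, a minimal sketch follows. It reads the statement above as "6.2.14 up to, but not including, 6.2.18" and treats everything from 6.2.18 onwards as fixed; that reading, and the function names, are assumptions of this example.

```python
def parse_version(text: str) -> tuple:
    """Turn a dotted version string such as '6.2.14' into (6, 2, 14)."""
    return tuple(int(part) for part in text.split("."))

def is_affected(client_version: str) -> bool:
    """True if the client falls in the range assumed to carry the bug."""
    return (6, 2, 14) <= parse_version(client_version) < (6, 2, 18)

for version in ("6.2.9", "6.2.14", "6.2.17", "6.2.18"):
    print(version, is_affected(version))
```

Comparing tuples of integers avoids the trap of comparing version strings as text, where "6.2.9" would sort after "6.2.14".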




____________
Michael

Thyme Lawn
Joined: Jun 20 06
Posts: 180
Credit: 1,138,879
RAC: 1,495

I set up the max nr of results, so this validation problem will not happen anymore.

Thanks Michael.
____________
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer

Michael
Volunteer moderator
Project scientist
Joined: May 5 06
Posts: 79
Credit: 494
RAC: 0

Version 1.59 is out.
It avoids some redundancy in the calculations: if two persons differ only by age but are otherwise identical, we only have to calculate the older one, write down the intermediate results, and then look up the result for the younger one in that table. Something along those lines.
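A toy sketch of that idea, to show why it saves work. The `yearly_step` update is a made-up placeholder and not the real model; the point is only that one pass up to the oldest age also yields the results for every younger age.

```python
from typing import Dict, List

def yearly_step(state: float, exposure: float) -> float:
    """Hypothetical one-year update of an individual's state."""
    return state + exposure * (1.0 - state)

def trajectory(max_age: int, exposure: float) -> List[float]:
    """State at every age from 0 to max_age, computed in a single pass."""
    states = [0.0]
    for _ in range(max_age):
        states.append(yearly_step(states[-1], exposure))
    return states

def results_by_age(ages: List[int], exposure: float) -> Dict[int, float]:
    """Compute only the oldest person, then read the younger ones
    from the table of intermediate results."""
    table = trajectory(max(ages), exposure)
    return {age: table[age] for age in ages}

print(results_by_age([5, 20, 40], exposure=0.1))
```

Without the table, the persons aged 5, 20 and 40 would need 65 yearly steps in total; with it, 40 steps are enough. This only works when the individuals really are identical apart from age, which matches the restriction to special cases.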

This will hopefully bring higher precision of results (or shorter calculation times); we are still testing what the benefit is.
However, it is clear that this strategy can only be applied to some special cases of the models you might want to fit. This doesn't bother us at the moment, because all we want to do is fit exactly those special cases.


In addition, it collects debugging information about the "JSmooth" launcher for the Java app (not in the standard error, though; it comes back to us as a second output file and does not appear in your web-based interface to the result database). With this information we hope to track down error -161, which is still around and unsolved, though not very common.

cheers
____________
Michael

Snugglebear
Joined: Jan 12 08
Posts: 2
Credit: 105,948
RAC: 0

Michael, has the issue with the infinite looping work units been resolved? I'm running a FreeBSD 7 x64 box with linux emulation that keeps hanging up on about 75% of the WUs sent my way. They will process through to 100% and then the system will go idle for upwards of 48 hours. Eventually some will be marked as finished and uploaded, but most will simply expire when the deadline passes. When I say idle, I mean it, too - the WU will sit there at 100%, CPU usage drops to zero, and boinc will not upload the result nor process the next unit. From playing around it appears that suspending and then resuming the WUs will cause the unit to resume processing for a few minutes and most of the time that allows them to complete successfully. Some, though, require four or five restarts in order to complete. Any assistance would be appreciated; manually restarting the jobs is tiresome.

Example WU that hung @ 100% and completed after restart:
wu_119_501_306221_0_1225446497_0

Sysinfo:
FreeBSD x.y.z 7.1-PRERELEASE FreeBSD 7.1-PRERELEASE #0: Mon Oct 6 19:41:19 PDT 2008 :/usr/obj/usr/src/sys/GENERIC amd64
Boinc 6.2.14
Platform is x86_64-pc-freebsd primary, i686-pc-linux-gnu alternate

Michael
Volunteer moderator
Project scientist
Joined: May 5 06
Posts: 79
Credit: 494
RAC: 0



Example WU that hung @ 100% and completed after restart:
wu_119_501_306221_0_1225446497_0

Hey, that's a different application! :) Wrong thread... I'll pass this on to Nick.
____________
Michael

Snugglebear
Joined: Jan 12 08
Posts: 2
Credit: 105,948
RAC: 0

Without seeing a nice architectural diagram it's all the same to me. Since yesterday there have been a string of good WUs that aren't hanging up. Averages have gone from 35/day to 80+/day.

Michael
Volunteer moderator
Project scientist
Joined: May 5 06
Posts: 79
Credit: 494
RAC: 0

Update: with version 1.60, the optimizer workunits now have an average duration of 2 hours and a maximum duration of 4 hours (compared to an average of 1 hour and a maximum of 2 hours before).

We are doing this to improve our fitting process.. that's why we are trying different settings. We can't really test this in the "testing state", because we wouldn't have enough hosts for that.. but even though one could call it "testing" we're not really testing if the software runs - it does run. We are now just fiddling with the parameters to fine-tune it.

you betcha?
____________
Michael

Thyme Lawn
Joined: Jun 20 06
Posts: 180
Credit: 1,138,879
RAC: 1,495

Haven't had a version 1.60 task yet, but I've just spotted that checkpointing wasn't being done by version 1.59.

29-Oct-2008 23:53:13 [malariacontrol.net] [cpu_sched] Starting opt_14_-17470_6_200908137_0 (initial)
29-Oct-2008 23:53:13 [malariacontrol.net] Starting task opt_14_-17470_6_200908137_0 using optimizer version 155
29-Oct-2008 23:59:24 [malariacontrol.net] [checkpoint_debug] result opt_14_-17470_6_200908137_0 checkpointed
30-Oct-2008 00:01:01 [CPDN Beta] [checkpoint_debug] result hadrm3spinupagvf_jdcp_1920_160_10002370_1 checkpointed
30-Oct-2008 00:02:18 [malariacontrol.net] [checkpoint_debug] result opt_14_-17470_6_200908137_0 checkpointed
30-Oct-2008 00:05:18 [malariacontrol.net] [checkpoint_debug] result opt_14_-17470_6_200908137_0 checkpointed

... cut ...

02/11/2008 17:10:29|malariacontrol.net|Starting opt_51_-14954_6_966686504_2
02/11/2008 17:10:29|malariacontrol.net|[cpu_sched] Starting opt_51_-14954_6_966686504_2 (initial)
02/11/2008 17:10:30|malariacontrol.net|Starting task opt_51_-14954_6_966686504_2 using optimizer version 159
02/11/2008 17:17:14|CPDN Beta|[checkpoint_debug] result hadrm3spinupagvf_jdcp_1920_160_10002370_1 checkpointed
02/11/2008 17:32:41|CPDN Beta|[checkpoint_debug] result hadrm3spinupagvf_jdcp_1920_160_10002370_1 checkpointed
02/11/2008 17:48:11|CPDN Beta|[checkpoint_debug] result hadrm3spinupagvf_jdcp_1920_160_10002370_1 checkpointed
02/11/2008 18:03:40|CPDN Beta|[checkpoint_debug] result hadrm3spinupagvf_jdcp_1920_160_10002370_1 checkpointed
02/11/2008 18:19:06|CPDN Beta|[checkpoint_debug] result hadrm3spinupagvf_jdcp_1920_160_10002370_1 checkpointed
02/11/2008 18:34:34|CPDN Beta|[checkpoint_debug] result hadrm3spinupagvf_jdcp_1920_160_10002370_1 checkpointed
02/11/2008 18:50:14|CPDN Beta|[checkpoint_debug] result hadrm3spinupagvf_jdcp_1920_160_10002370_1 checkpointed
02/11/2008 19:05:48|CPDN Beta|[checkpoint_debug] result hadrm3spinupagvf_jdcp_1920_160_10002370_1 checkpointed
02/11/2008 19:10:50|malariacontrol.net|Computation for task opt_51_-14954_6_966686504_2 finished

____________
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer

Thyme Lawn
Joined: Jun 20 06
Posts: 180
Credit: 1,138,879
RAC: 1,495

Haven't had a version 1.60 task yet, but I've just spotted that checkpointing wasn't being done by version 1.59.

I have now, and there's no checkpointing in version 1.60 either.
____________
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer

Thyme Lawn
Joined: Jun 20 06
Posts: 180
Credit: 1,138,879
RAC: 1,495

I have now, and there's no checkpointing in version 1.60 either.

Correction. There is, it just seems to be very haphazard.

The first one I had ran for 68 minutes without a checkpoint.
03-Nov-2008 05:41:36 [malariacontrol.net] [cpu_sched] Starting opt_50_-30543_6_749280695_3 (initial)
03-Nov-2008 05:41:36 [malariacontrol.net] Starting task opt_50_-30543_6_749280695_3 using optimizer version 160
03-Nov-2008 06:49:49 [malariacontrol.net] Computation for task opt_50_-30543_6_749280695_3 finished

Second one ran for 150 minutes, checkpointed after 33 minutes and then every 16 minutes.
03-Nov-2008 11:27:26 [malariacontrol.net] [cpu_sched] Starting opt_23_-72292_6_905372596_1 (initial)
03-Nov-2008 11:27:27 [malariacontrol.net] Starting task opt_23_-72292_6_905372596_1 using optimizer version 160
03-Nov-2008 12:00:05 [malariacontrol.net] [checkpoint_debug] result opt_23_-72292_6_905372596_1 checkpointed
03-Nov-2008 12:16:13 [malariacontrol.net] [checkpoint_debug] result opt_23_-72292_6_905372596_1 checkpointed
03-Nov-2008 12:32:20 [malariacontrol.net] [checkpoint_debug] result opt_23_-72292_6_905372596_1 checkpointed
03-Nov-2008 12:32:20 [malariacontrol.net] [cpu_sched] Preempting opt_23_-72292_6_905372596_1 (left in memory)
03-Nov-2008 16:29:22 [malariacontrol.net] [cpu_sched] Resuming opt_23_-72292_6_905372596_1
03-Nov-2008 16:29:22 [malariacontrol.net] Resuming task opt_23_-72292_6_905372596_1 using optimizer version 160
03-Nov-2008 16:45:35 [malariacontrol.net] [checkpoint_debug] result opt_23_-72292_6_905372596_1 checkpointed
03-Nov-2008 17:01:53 [malariacontrol.net] [checkpoint_debug] result opt_23_-72292_6_905372596_1 checkpointed
03-Nov-2008 17:18:05 [malariacontrol.net] [checkpoint_debug] result opt_23_-72292_6_905372596_1 checkpointed
03-Nov-2008 17:34:13 [malariacontrol.net] [checkpoint_debug] result opt_23_-72292_6_905372596_1 checkpointed
03-Nov-2008 17:34:13 [malariacontrol.net] [cpu_sched] Preempting opt_23_-72292_6_905372596_1 (left in memory)
03-Nov-2008 20:47:09 [malariacontrol.net] [cpu_sched] Resuming opt_23_-72292_6_905372596_1
03-Nov-2008 20:47:09 [malariacontrol.net] Resuming task opt_23_-72292_6_905372596_1 using optimizer version 160
03-Nov-2008 21:02:47 [malariacontrol.net] [checkpoint_debug] result opt_23_-72292_6_905372596_1 checkpointed
03-Nov-2008 21:11:21 [malariacontrol.net] Computation for task opt_23_-72292_6_905372596_1 finished
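For anyone who wants to pull the checkpoint intervals out of their own message log, here is a minimal sketch. It assumes the line layout shown above (a 20-character timestamp followed by the project and the [checkpoint_debug] flag) and uses three consecutive lines from this log as sample input; the function names are made up. Gaps that span a preemption would need to be excluded by hand.

```python
from datetime import datetime

LOG = """\
03-Nov-2008 12:00:05 [malariacontrol.net] [checkpoint_debug] result opt_23_-72292_6_905372596_1 checkpointed
03-Nov-2008 12:16:13 [malariacontrol.net] [checkpoint_debug] result opt_23_-72292_6_905372596_1 checkpointed
03-Nov-2008 12:32:20 [malariacontrol.net] [checkpoint_debug] result opt_23_-72292_6_905372596_1 checkpointed
"""

def checkpoint_times(log_text, task):
    """Return the timestamps of all checkpoint lines for one task."""
    times = []
    for line in log_text.splitlines():
        if "[checkpoint_debug]" in line and task in line and line.endswith("checkpointed"):
            # The timestamp occupies the first 20 characters of the line.
            times.append(datetime.strptime(line[:20], "%d-%b-%Y %H:%M:%S"))
    return times

def intervals_seconds(times):
    """Seconds between each pair of consecutive checkpoints."""
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]

stamps = checkpoint_times(LOG, "opt_23_-72292_6_905372596_1")
print(intervals_seconds(stamps))
```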

____________
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer

Michael
Volunteer moderator
Project scientist
Joined: May 5 06
Posts: 79
Credit: 494
RAC: 0

Sorry for the late reply.
I think this occurred because old workunits with different configurations were still in circulation when the new app version was already out.


But recently we are getting errors like this:

Exit status -177 (0xffffff4f) ERR_RSC_LIMIT_EXCEEDED

CPU time 593.1563

<message>
Maximum disk usage exceeded
</message>


even though the maximum disk usage is set to
<rsc_disk_bound>750000000</rsc_disk_bound>

That is 750 megabytes, which is way above what our app needs.
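As a quick check of that figure, a short sketch that reads the value from the XML fragment above and converts it. Only the `rsc_disk_bound` element comes from this post; the variable names are made up, and the assumption is that the value is a plain byte count.

```python
import xml.etree.ElementTree as ET

fragment = "<rsc_disk_bound>750000000</rsc_disk_bound>"

# Parse the single element and read its text as a number of bytes.
bound_bytes = int(float(ET.fromstring(fragment).text))

bound_mb = bound_bytes / 1_000_000        # decimal megabytes
bound_mib = bound_bytes / (1024 * 1024)   # binary mebibytes

print(bound_bytes, bound_mb, round(bound_mib, 1))
```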

Has anybody out there experienced this error, and does anybody have deeper insights?

thanks

____________
Michael

Thyme Lawn
Joined: Jun 20 06
Posts: 180
Credit: 1,138,879
RAC: 1,495

Has anybody out there experienced this error, and does anybody have deeper insights?

Not personally, but the likely cause has been reported here.
____________
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer

Michael
Volunteer moderator
Project scientist
Joined: May 5 06
Posts: 79
Credit: 494
RAC: 0

Has anybody out there experienced this error, and does anybody have deeper insights?

Not personally, but the likely cause has been reported here.


Ok thanks,
I know how to fix it and will stop sending workunits of this app right now. The stdout.txt files should be no problem, because the BOINC client will remove them once the WUs have finished.

Could anybody please post (some of) the contents of those files? They contain debugging information. Thanks!
____________
Michael

Michael
Volunteer moderator
Project scientist
Joined: May 5 06
Posts: 79
Credit: 494
RAC: 0

OK, the bug is fixed, version 1.61 is out, and we're slowly starting to send out workunits again.
____________
Michael

necronomicon
Joined: Jul 2 06
Posts: 10
Credit: 1,625,591
RAC: 0

Has anybody out there experienced this error, and does anybody have deeper insights?

Not personally, but the likely cause has been reported here.


Ok thanks,
I know how to fix it and will stop sending workunits of this app right now. The stdout.txt files should be no problem, because the BOINC client will remove them once the WUs have finished.

Could anybody please post (some of) the contents of those files? They contain debugging information. Thanks!


Do you still want one? I made a RAR; I think it came out at 16 MB, so I could upload it.

____________



Copyright © 2013 africa@home