New version 6.12 of the malariacontrol science application ready for testing |
Message boards : Number crunching : New version 6.12 of the malariacontrol science application ready for testing
Author | Message |
---|---|
Just a short post to announce that the new version of the malariacontrol science application is now ready for testing. |
|
ID: 9349 | Rating: 0 | rate: / | |
A deadline of less than 2 days? Ouch, best not to ask for too much work then. They come in at an estimated hour and a half, but I suspect they'll take a bit longer than that. |
|
ID: 9351 | Rating: 0 | rate: / | |
Welcome back Nick (and MCDN :) |
|
ID: 9352 | Rating: 0 | rate: / | |
Hi. |
|
ID: 9353 | Rating: 0 | rate: / | |
Thanks for the feedback! I've extended the deadline (for workunits created from now on), and static linking is on the list for the next version. |
|
ID: 9354 | Rating: 0 | rate: / | |
Initial estimate for me was 1.5 hrs, but after 2 hrs 40 min I've still got 1 hr 35 min left to go (ha ha!). [edit]Says it's 51% complete.[/edit] |
|
ID: 9355 | Rating: 0 | rate: / | |
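A rough cross-check of the numbers in the post above (2 hrs 40 min elapsed, 51% complete) can be sketched as below. This is only a linear extrapolation that assumes progress advances at a constant rate, which the application may well not do; the function name is made up for illustration and is not part of BOINC.

```python
def estimate_remaining_minutes(elapsed_minutes: float, fraction_done: float) -> float:
    """Linear extrapolation of remaining run time.

    Assumption: the reported progress fraction grows at a constant rate.
    """
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    total_minutes = elapsed_minutes / fraction_done
    return total_minutes - elapsed_minutes


elapsed = 2 * 60 + 40  # 2 hrs 40 min, taken from the post above
remaining = estimate_remaining_minutes(elapsed, 0.51)
print(f"{remaining:.0f} min remaining")  # about 154 min, i.e. roughly 2 h 34 min
```

Under that assumption the workunit would need about another 2 h 34 min, noticeably more than the 1 hr 35 min the client displayed.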
[quote]Initial estimate for me was 1.5 hrs[/quote] Same estimate on my Win XP Pro AMD XP2200+, but they finish in 3h 15 minutes. Still, since I had 10 of them waiting, I aborted some of them, so as not to drive the debt up too high for Malaria. Plus there are less than 24 hours to go until the deadline, so they would never all be done in time. ;-) ____________ Jord. BOINC FAQ Service |
|
ID: 9357 | Rating: 0 | rate: / | |
[quote]Initial estimate for me was 1.5 hrs[/quote] My first WU is now 93% complete and it looks like it will be 5 hrs per WU. That means with the 30 hr deadline allotted, I can only return 6 (MAYBE). Not sure how your system is so much faster than my 2600 (however, it isn't a dedicated cruncher either). Perhaps crunch times per WU are somewhat variable. |
|
ID: 9360 | Rating: 0 | rate: / | |
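The deadline arithmetic in the two posts above (5 hrs per WU against a 30 hr deadline, and 3h 15 min per WU with less than 24 hours left) can be written out as a small sketch. It assumes the host crunches nothing else, runs one task per core, and that run times are constant; the function name and the `cores` parameter are illustrative only.

```python
import math


def tasks_completable(hours_until_deadline: float, hours_per_task: float, cores: int = 1) -> int:
    """Number of tasks that can finish before the deadline.

    Assumptions: constant run time per task, one task per core,
    and no other project competing for the CPU.
    """
    if hours_per_task <= 0:
        raise ValueError("hours_per_task must be positive")
    return math.floor(hours_until_deadline / hours_per_task) * cores


print(tasks_completable(30, 5))      # 5 hrs per WU, 30 hr deadline
print(tasks_completable(24, 3.25))   # 3h 15 min per WU, 24 hours left
```

With these inputs the first case gives 6 tasks and the second gives 7, so a queue of 10 waiting tasks could not all finish in time on a single core.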
Hi. |
|
ID: 9361 | Rating: 0 | rate: / | |
My first two are taking 6-7 hours each on a Core 2 Duo (an order of magnitude longer than Peter Leman's quad). I sure hope there's some variance in the WU runtimes to account for this. |
|
ID: 9362 | Rating: 0 | rate: / | |
Hi Chipotle. |
|
ID: 9363 | Rating: 0 | rate: / | |
Hi Pete. |
|
ID: 9365 | Rating: 0 | rate: / | |
On Mac OS X 10.5.x on Intel with BOINC client 6.2.18, CPU throttling did not work at some stages. At those stages the CPU jumps to 100% and at the same time the disk I/O increases to about 2 to 3 MB/s. Each episode lasts about 15 sec and repeats every several minutes. |
|
ID: 9367 | Rating: 0 | rate: / | |
I see various run lengths on the same host (Windows XP). |
|
ID: 9374 | Rating: 0 | rate: / | |
[quote]On Mac OS X 10.5.x on Intel with BOINC client 6.2.18, CPU throttling did not work at some stages. At those stages the CPU jumps to 100% and at the same time the disk I/O increases to about 2 to 3 MB/s. Each episode lasts about 15 sec and repeats every several minutes.[/quote] From the symptoms, this sounds like the checkpoints. Does it coincide with your "Write to disk at most every" setting? |
|
ID: 9376 | Rating: 0 | rate: / | |
[quote]On Mac OS X 10.5.x on Intel with BOINC client 6.2.18, CPU throttling did not work at some stages. At those stages the CPU jumps to 100% and at the same time the disk I/O increases to about 2 to 3 MB/s. Each episode lasts about 15 sec and repeats every several minutes.[/quote] Yes, I also thought that the checkpoints are the cause of the I/O. The cycle matches the BOINC preferences. |
|
ID: 9377 | Rating: 0 | rate: / | |
As you may have seen, we have some issues with the assimilator, which is crashing on some of the validated results. It's a problem to do with the output file formatting, and we'll need to fix it before sending out more work. This may take a few days. |
|
ID: 9380 | Rating: 0 | rate: / | |
I had to abort two units that ran up to the time limit but were stuck in restart loops: restarting but not progressing. |
|
ID: 9391 | Rating: 0 | rate: / | |
A user reported an error: malariacontrol test app 6.12 failing with error code -1 (0xffffffff) |
|
ID: 9408 | Rating: 0 | rate: / | |
I hope this new version 6.12 has GPU support? |
|
ID: 9431 | Rating: 0 | rate: / | |
Back up and running, looking good so far |
|
ID: 9432 | Rating: 0 | rate: / | |
We've finally fixed the assimilator, and are currently sending out more work. Thanks for submitting problem reports! I should have mentioned that we already have quite good information on workunits that simply crashed (this is sent back to the server by your BOINC client). What is really useful is information on why you aborted a certain workunit (like Bill's "stuck in restart loops" above). |
|
ID: 9463 | Rating: 0 | rate: / | |
[quote]We've finally fixed the assimilator, and are currently sending out more work. Thanks for submitting problem reports! I should have mentioned that we already have quite good information on workunits that simply crashed (this is sent back to the server by your BOINC client). What is really useful is information on why you aborted a certain workunit (like Bill's "stuck in restart loops" above).[/quote] I have yet to receive any work units, even though I am signed up for everything (I think). My machine has two Xeon processors and 8 GB of RAM, so that should not be a problem. I get the message that no work is available even though the server status says that there is work available. I run Red Hat Enterprise Linux 5, if that matters. |
|
ID: 9469 | Rating: 0 | rate: / | |
The only work currently being sent out is test units, so if you haven't enabled the option to allow test WUs in your preferences you won't get anything. |
|
ID: 9471 | Rating: 0 | rate: / | |
Hi. |
|
ID: 9474 | Rating: 0 | rate: / | |
I allow lots of test units. My preferences say, in part: Run malariacontrol simulation application: Yes; Run malariacontrol test application: Yes; Run map predictor application: Yes; Run optimizer application: Yes. My message log says things like this: Wed 11 Mar 2009 03:03:12 AM EDT|malariacontrol.net|Message from server: No work sent Wed 11 Mar 2009 03:03:12 AM EDT|malariacontrol.net|Message from server: No work is available for malariacontrol.net Wed 11 Mar 2009 03:03:12 AM EDT|malariacontrol.net|Message from server: No work is available for malariacontrol.net test version Wed 11 Mar 2009 03:03:12 AM EDT|malariacontrol.net|Message from server: No work is available for Prediction of Malaria Prevalence Wed 11 Mar 2009 03:03:12 AM EDT|malariacontrol.net|Message from server: No work is available for Estimation of parameters of infection dynamics (variable duration, max 4h) |
|
ID: 9481 | Rating: 0 | rate: / | |
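For anyone wanting to sift a message log like the one in the post above, here is a minimal sketch that splits the pipe-delimited lines into timestamp, project, and message. The field layout is inferred only from the lines quoted in that post; `LogEntry` and `parse_line` are hypothetical names, and the time-zone abbreviation is simply dropped rather than interpreted.

```python
from datetime import datetime
from typing import NamedTuple


class LogEntry(NamedTuple):
    timestamp: datetime  # naive local time; zone abbreviation is discarded
    project: str
    message: str


def parse_line(line: str) -> LogEntry:
    """Split one 'timestamp|project|message' line (layout assumed from the post)."""
    stamp, project, message = line.split("|", 2)
    # Drop the trailing zone abbreviation (e.g. "EDT") before parsing.
    stamp_without_zone = stamp.rsplit(" ", 1)[0]
    # %a and %b expect English day/month names (C or English locale).
    timestamp = datetime.strptime(stamp_without_zone, "%a %d %b %Y %I:%M:%S %p")
    return LogEntry(timestamp, project, message)


sample = (
    "Wed 11 Mar 2009 03:03:12 AM EDT|malariacontrol.net|"
    "Message from server: No work is available for malariacontrol.net test version"
)
entry = parse_line(sample)
prefix = "Message from server: No work is available for "
if entry.message.startswith(prefix):
    print("No work for:", entry.message[len(prefix):])
```

Collecting the text after the "No work is available for" prefix gives the list of applications the scheduler refused to send work for in that request.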
I just had a work unit fail after 18 hours and 54 minutes with a compute error -177 (0xffffff4f). |
|
ID: 9497 | Rating: 0 | rate: / | |
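The two forms of the error codes quoted in this thread, -1 (0xffffffff) and -177 (0xffffff4f), are the same value shown as a signed integer and as a 32-bit two's-complement hex number. A small sketch of the conversion, with helper names chosen purely for illustration:

```python
def to_uint32_hex(code: int) -> str:
    """Show a signed exit code as 32-bit two's-complement hex."""
    return f"0x{code & 0xFFFFFFFF:08x}"


def from_uint32_hex(text: str) -> int:
    """Turn a 32-bit hex string back into the signed exit code."""
    value = int(text, 16)
    # Values with the top bit set represent negative numbers.
    return value - (1 << 32) if value >= (1 << 31) else value


print(to_uint32_hex(-1))              # 0xffffffff
print(to_uint32_hex(-177))            # 0xffffff4f
print(from_uint32_hex("0xffffff4f"))  # -177
```

This only explains the notation; what each code means is defined by the BOINC client and is not derived here.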
I have a workunit that seems stuck at 94.507% completion. I noticed this yesterday and there has been no progress since. It's been there for about 10h of computing time. Suspend/resume didn't help, so I cancelled it. |
|
ID: 9498 | Rating: 0 | rate: / | |
Thank you all for helping with this first round of testing the new application! 6.13, which is a rebuild of the application for Linux with a statically linked libstdc++, seems to work as expected, and the Linux success rate has increased from 27% to 99% as a consequence. This enables us to have a closer look at the remaining problems on all platforms. We have plenty of error data to analyze and will stop sending out new work for now. We'll start again once we've identified and hopefully fixed the most severe remaining problems. |
|
ID: 9502 | Rating: 0 | rate: / | |
Hello, [quote]Thank you all for helping with this first round of testing the new application! 6.13, which is a rebuild of the application for Linux with a statically linked libstdc++, seems to work as expected, and the Linux success rate has increased from 27% to 99% as a consequence. This enables us to have a closer look at the remaining problems on all platforms. We have plenty of error data to analyze and will stop sending out new work for now. We'll start again once we've identified and hopefully fixed the most severe remaining problems.[/quote] ____________ rbo |
|
ID: 9505 | Rating: 0 | rate: / | |
It looks like the WUs are restarting continuously. Will have to abort them in order to free up the computer for other WUs. |
|
ID: 9508 | Rating: 0 | rate: / | |
[quote]It looks like the WUs are restarting continuously. Will have to abort them in order to free up the computer for other WUs.[/quote] After I aborted them, I received the following messages: 3/14/2009 5:59:58 AM malariacontrol.net Sending scheduler request: Requested by user. 3/14/2009 5:59:58 AM malariacontrol.net Reporting 2 completed tasks, not requesting new tasks 3/14/2009 6:00:03 AM malariacontrol.net Scheduler request completed: got 0 new tasks 3/14/2009 6:00:13 AM malariacontrol.net [error] garbage_collect(); still have active task for acked result wu_501_514_2524_0_1236733575_1; state 0 3/14/2009 6:00:13 AM malariacontrol.net [error] garbage_collect(); still have active task for acked result wu_506_232_2542_0_1236751092_0; state 0 3/14/2009 6:00:13 AM malariacontrol.net Computation for task wu_501_514_2524_0_1236733575_1 finished 3/14/2009 6:00:13 AM malariacontrol.net Output file wu_501_514_2524_0_1236733575_1_0 for task wu_501_514_2524_0_1236733575_1 absent 3/14/2009 6:00:13 AM malariacontrol.net Computation for task wu_506_232_2542_0_1236751092_0 finished 3/14/2009 6:00:13 AM malariacontrol.net Output file wu_506_232_2542_0_1236751092_0_0 for task wu_506_232_2542_0_1236751092_0 absent 3/14/2009 6:00:19 AM malariacontrol.net Sending scheduler request: To report completed tasks. 3/14/2009 6:00:19 AM malariacontrol.net Reporting 2 completed tasks, not requesting new tasks 3/14/2009 6:00:24 AM malariacontrol.net Scheduler request completed: got 0 new tasks 3/14/2009 6:00:24 AM malariacontrol.net Message from server: Completed result wu_501_514_2524_0_1236733575_1 refused: result already reported as error 3/14/2009 6:00:24 AM malariacontrol.net Message from server: Completed result wu_506_232_2542_0_1236751092_0 refused: result already reported as error 3/14/2009 6:00:24 AM malariacontrol.net [error] Got ack for task wu_501_514_2524_0_1236733575_1, but can't find it 3/14/2009 6:00:24 AM malariacontrol.net [error] Got ack for task wu_506_232_2542_0_1236751092_0, but can't find it |
|
ID: 9509 | Rating: 0 | rate: / | |
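When a log like the one above mixes messages for several tasks, grouping the lines by task name makes it easier to see what happened to each one. The sketch below does that with a regular expression; the pattern is inferred only from the messages quoted in that post (task names following the words "task" or "result"), and `group_by_task` is a hypothetical helper, not a BOINC tool.

```python
import re
from collections import defaultdict

# Task names in the quoted log follow "task " or "result " and consist of
# "wu_" plus digits and underscores (an assumption based on that log only).
TASK_RE = re.compile(r"(?:task|result) (wu_[0-9_]+)")


def group_by_task(lines: list[str]) -> dict[str, list[str]]:
    """Map each task name to the log lines that mention it."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for line in lines:
        for task in set(TASK_RE.findall(line)):
            grouped[task].append(line)
    return dict(grouped)


lines = [
    "3/14/2009 6:00:13 AM malariacontrol.net Output file wu_501_514_2524_0_1236733575_1_0 "
    "for task wu_501_514_2524_0_1236733575_1 absent",
    "3/14/2009 6:00:24 AM malariacontrol.net Message from server: Completed result "
    "wu_506_232_2542_0_1236751092_0 refused: result already reported as error",
    "3/14/2009 6:00:24 AM malariacontrol.net [error] Got ack for task "
    "wu_501_514_2524_0_1236733575_1, but can't find it",
]
grouped = group_by_task(lines)
for task, task_lines in grouped.items():
    print(task, len(task_lines))
```

Because the pattern keys on the preceding word, the output file name (which carries an extra `_0` suffix) is not mistaken for a task name.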
So far the application runs great for me! I'm happy to crunch some more malaria in the future. |
|
ID: 9520 | Rating: 0 | rate: / | |