Another stuck WU

Message boards : Number crunching : Another stuck WU

clownius
Joined: Jun 21 06
Posts: 8
Credit: 10,588
RAC: 0

This WU has been stuck for around 24 hours at 33 mins and 69%. Actual run time has to be around 6 hours by now. I suspended and then resumed it and still couldn't get it unstuck, so I aborted it in the end. Maybe it's a Linux app issue, as both other machines that completed it were running Windows.
____________

RandyC
Joined: Jun 23 06
Posts: 2942
Credit: 926,890
RAC: 1,261

This WU has been stuck for around 24 hours at 33 mins and 69%. Actual run time has to be around 6 hours by now. I suspended and then resumed it and still couldn't get it unstuck, so I aborted it in the end. Maybe it's a Linux app issue, as both other machines that completed it were running Windows.


Often, shutting down BOINC and restarting it will get a stuck WU going again. Just suspending and resuming the WU won't force the app to reload from the checkpoint.

clownius
Joined: Jun 21 06
Posts: 8
Credit: 10,588
RAC: 0

Oops, too late, lol. I'll try that next time. I've had WUs unstick after a suspend and restart in the past, but I'm not sure which project that was.
____________

Augustine
Joined: Mar 7 06
Posts: 36
Credit: 275,224
RAC: 0

I've been having quite a few getting stuck too. Initially I thought that it only happened on systems which suspended at certain times of the day, but now I'm sure that it also happens on systems which run all the time and just switch from project to project. I've aborted them.

PS: I'm leaving the applications in memory.

____________

Dotsch
Joined: Jun 21 06
Posts: 65
Credit: 35,926
RAC: 5

Is the application using the CPU when this problem happens? If not, it could be the BOINC API problem. If a WU is suspended and resumed too often, the application can stay suspended and no longer get any CPU cycles. This can occur if the application is linked against a BOINC 5.4.x or pre-5.8.x API.
David Anderson has written a fix for this problem, which is included in the BOINC 5.8.8 API. The fix requires relinking the science application.
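
As a rough illustration of the version condition described above, here is a minimal Python sketch. It assumes that "affected" means any API version from 5.4.0 up to, but not including, 5.8.8 (the release said to contain the fix). The thread does not say exactly where the boundaries lie, and the function names are hypothetical, not part of BOINC.

```python
# Assumed boundaries, read from the post above; not confirmed by the project.
FIRST_AFFECTED = (5, 4, 0)  # "linked against a BOINC 5.4.x ... API"
FIRST_FIXED = (5, 8, 8)     # "included in the BOINC 5.8.8 API"


def parse_version(text):
    """Turn a string such as "5.8.8" into a tuple (5, 8, 8), padding with zeros."""
    parts = [int(p) for p in text.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def api_may_be_affected(api_version):
    """Return True if the given API version falls in the assumed affected range."""
    return FIRST_AFFECTED <= parse_version(api_version) < FIRST_FIXED
```

Under these assumptions, `api_may_be_affected("5.6.3")` is true and `api_may_be_affected("5.8.8")` is false. Note that this concerns the API version the science application was linked against, which is not necessarily the version of the BOINC client installed on the machine.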
____________

u.dgl.
Joined: Mar 8 06
Posts: 26
Credit: 1,170,930
RAC: 456

See my post:

https://malariacontrol.net/forum_thread.php?id=366

Message 2080

u.dgl.
____________

KAMasud
Joined: Jan 7 07
Posts: 12
Credit: 18,733
RAC: 0


:-( I have two WUs stuck. I ran them for six hours and then restarted, but nothing changed, so I will have to abort them. Does this have something to do with the version of BOINC? I am running Windows XP.
Regards,
Masud.
____________

Dotsch
Joined: Jun 21 06
Posts: 65
Credit: 35,926
RAC: 5


:-( I have two WUs stuck. I ran them for six hours and then restarted, but nothing changed, so I will have to abort them. Does this have something to do with the version of BOINC? I am running Windows XP.
Regards,
Masud.

Could you please try stopping the BOINC client, make sure that no BOINC or malariacontrol process is still running, and then start it again?
____________

The Gas Giant
Joined: Mar 7 06
Posts: 1214
Credit: 3,625,990
RAC: 2,644

I was chatting to someone on IRC (irc.freenode.org, channel #boinc) who says that nearly all the new WUs are getting stuck on his machine, and that he is aborting them all and giving up on MC.

I, personally, am having no problems.

Live long and BOINC.

____________
Paul
(S@H1 8888)

Ib Rasmussen
Joined: Jan 12 07
Posts: 2
Credit: 6,162,196
RAC: 5,483

I've just had two mapprediction WUs stuck at 0% for hours before I realised that something was wrong.

The units were mapwca0001378.txt_1 and mapwca0002001.txt_1. They were the last of the kind in my queue, otherwise I would have killed them off.

This was on an AMD machine running NT4.

/Ib

adrianxw
Joined: Mar 8 06
Posts: 145
Credit: 498,634
RAC: 246

Hej Ib,

There are others having problems with these new WUs as well, as discussed in this thread. My WU that got stuck was also running on an NT4 box, although an Intel-based one.

Venlig hilsen!
____________
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

m.mitch
Joined: Mar 8 06
Posts: 43
Credit: 92,560
RAC: 222

I have a work unit that's up to 17 hours 21 minutes and still at 0.00%. Is there a known fix, or should I abort?
____________


Join the #1 Aussie Alliance on Malaria Control

RandyC
Joined: Jun 23 06
Posts: 2942
Credit: 926,890
RAC: 1,261

I have a work unit that's up to 17 hours 21 minutes and still at 0.00%. Is there a known fix, or should I abort?


If it's one of the new application WUs, it shouldn't take more than one hour to run, so abort it.

If it's an old-style WU, try shutting down BOINC and restarting before you abort. (Give it a little while to see if the % complete is going up.)
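
The rule of thumb above can be sketched as a small Python function. This is only an illustration of the advice in this post: the one-hour limit comes from the post itself, while the function name, its parameters, and the returned strings are hypothetical and not part of BOINC or the malariacontrol.net application.

```python
NEW_APP_MAX_HOURS = 1.0  # from the post: new-application WUs should finish within an hour


def advise(new_style, elapsed_hours, restarted=False, progress_rising=False):
    """Suggest what to do with a WU that looks stuck.

    new_style       -- True for one of the new application WUs
    elapsed_hours   -- run time so far, in hours
    restarted       -- True if BOINC has already been shut down and restarted
    progress_rising -- True if the % complete went up after the restart
    """
    if new_style:
        # New-style WUs are short, so anything past the limit is treated as stuck.
        return "abort" if elapsed_hours > NEW_APP_MAX_HOURS else "wait"
    if not restarted:
        # Old-style WUs may recover once the app reloads from its checkpoint.
        return "restart BOINC"
    return "keep running" if progress_rising else "abort"
```

For example, `advise(new_style=True, elapsed_hours=17.35)` returns `"abort"`, while an old-style WU that has not yet been through a restart gets `"restart BOINC"`.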

m.mitch
Joined: Mar 8 06
Posts: 43
Credit: 92,560
RAC: 222

It's a Prediction of Malaria Prevalence 1.12.

I don't know if that's old, new or different. I'll have a quick scan through the fora and see if there's anything mentioned.

Edit: Couldn't find anything relevant.
____________


Join the #1 Aussie Alliance on Malaria Control

adrianxw
Joined: Mar 8 06
Posts: 145
Credit: 498,634
RAC: 246

That is one of the new WUs, which apparently should not take more than an hour. You should abort that one, in keeping with the advice in this thread, although they also say it will kill itself after a "reasonable" time...
____________
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

m.mitch
Joined: Mar 8 06
Posts: 43
Credit: 92,560
RAC: 222

That is one of the new WUs, which apparently should not take more than an hour. You should abort that one, in keeping with the advice in this thread, although they also say it will kill itself after a "reasonable" time...

I think 17+ hours is reasonable ;-) and it hasn't killed itself, so I'll abort it.

Thanks everyone.
____________


Join the #1 Aussie Alliance on Malaria Control


Copyright © 2013 africa@home