A second science application for malariacontrol.net


Profile maire
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: Nov 7 05
Posts: 439
Credit: 118,258
RAC: 0
Message 2031 - Posted 9 Feb 2007 18:41:47 UTC

    We are preparing the second science application to be run using malariacontrol.net. We plan to upload this application next Tuesday (Feb 13th) if our last tests come out ok.
    This application was developed at the Swiss Tropical Institute and predicts the spatial distribution of malaria. We'll post a short description of the scientific objectives in this thread early next week. As with all malariacontrol.net applications, the results will be published in peer-reviewed literature.

    There are a few things that are worth pointing out: This application makes use of BOINC's wrapper approach for legacy applications. This has a few drawbacks: there is no checkpointing and no feedback from the application about progress. Therefore a workunit will start from the beginning if interrupted. This should not be a big problem because the workunits are relatively short (less than half an hour on most PCs).

    We have several batches of workunits that we plan to send out over the next few weeks. This first stage will comprise a total of 200,000 workunits. The current simulation model will keep running and send out work in parallel.

    Nick
    ____________
    Nicolas Maire
    Swiss Tropical and Public Health Institute
    http://www.swisstph.ch

    Profile The Gas Giant
    Joined: Mar 7 06
    Posts: 1214
    Credit: 3,722,470
    RAC: 1,336
    Message 2033 - Posted 10 Feb 2007 10:32:33 UTC

      Thanks for the update. It's good to hear about what is happening.

      Shame about the lack of checkpoints. I run MC on my work laptop, which gets turned off twice a day (to/from work), so I will be losing crunch time because of it.

      Live long and BOINC.


      ____________
      Paul
      (S@H1 8888)

      RandyC
      Joined: Jun 23 06
      Posts: 3174
      Credit: 982,266
      RAC: 942
      Message 2037 - Posted 10 Feb 2007 21:54:52 UTC - in response to Message 2031.

        Last modified: 10 Feb 2007 21:55:48 UTC


        There are a few things that are worth pointing out: This application makes use of BOINC's wrapper approach for legacy applications. This has a few drawbacks: there is no checkpointing and no feedback from the application about progress. Therefore a workunit will start from the beginning if interrupted. This should not be a big problem because the workunits are relatively short (less than half an hour on most PCs).


        I see that the Wrapper process requires BOINC 5.5 or above. Can we assume that earlier clients will not be downloading the new application?

        n.b. Guess it's time to do that upgrade to 5.8 now.

        Profile Dagorath
        Joined: Jun 26 06
        Posts: 68
        Credit: 71,310
        RAC: 0
        Message 2039 - Posted 11 Feb 2007 0:00:08 UTC

          Thanks for the warning. I'll be detaching Malaria on Feb 12. No checkpoints is bad enough but no progress indicator as well? What??? Have you taken leave of your senses?!!


          ____________
          --

          Robbie Lawrence
          Joined: Jan 4 07
          Posts: 12
          Credit: 39,680
          RAC: 0
          Message 2045 - Posted 11 Feb 2007 6:09:55 UTC - in response to Message 2039.

            Thanks for the warning. I'll be detaching Malaria on Feb 12. No checkpoints is bad enough but no progress indicator as well? What??? Have you taken leave of your senses?!!


            I was thinking about this -- World Community Grid has a system where you can choose which projects you wish to have sent to your computer to crunch. Since some people don't want to or can't run this new application, and if you have plans to introduce more applications on top of these two, maybe introduce something like that?

            I for one will be more than happy to crunch both the new and old applications, since my computers never go off, and it's great to see progress on the project. Good work guys!

            Professor Desty Nova
            Joined: Mar 7 06
            Posts: 3
            Credit: 572,914
            RAC: 241
            Message 2047 - Posted 11 Feb 2007 9:32:37 UTC - in response to Message 2039.

              Thanks for the warning. I'll be detaching Malaria on Feb 12. No checkpoints is bad enough but no progress indicator as well? What??? Have you taken leave of your senses?!!


              Quote from the wrapper page: "A legacy application is one for which an executable is available, but not the source code. Therefore it cannot use the BOINC API and runtime system. However, such applications can be run using BOINC."

              So I guess if the original application doesn't have checkpoints, and you can't change the source code, you'll have to leave it like that.
              ____________


              Professor Desty Nova
              Researching Karma the Hard Way

              Profile Lonely
              Joined: Mar 8 06
              Posts: 2
              Credit: 135,142
              RAC: 0
              Message 2048 - Posted 11 Feb 2007 11:56:56 UTC

                Last modified: 11 Feb 2007 11:57:36 UTC

                Excuse my ignorance, but is it not possible to 'suspend' WUs prior to exiting the program? If this were possible, it would avoid any loss of work and by doing so, allow those who cannot have their PCs running 24/7 to continue their support without loss of work/time/energy consumed. As indicated, some will leave MC and go elsewhere... it may prove a loss you can ill afford.
                ____________

                Keck_Komputers
                Volunteer moderator
                Volunteer tester
                Joined: Nov 10 05
                Posts: 29
                Credit: 336,891
                RAC: 451
                Message 2049 - Posted 11 Feb 2007 12:02:28 UTC - in response to Message 2048.

                  Excuse my ignorance, but is it not possible to 'suspend' WUs prior to exiting the program? If this were possible, it would avoid any loss of work and by doing so, allow those who cannot have their PCs running 24/7 to continue their support without loss of work/time/energy consumed. As indicated, some will leave MC and go elsewhere... it may prove a loss you can ill afford.

                  Sorry, but if the app does not checkpoint, suspending does no good.
                  ____________
                  BOINC WIKI

                  BOINCing since 2002/12/8

                  Profile Ananas
                  Joined: Mar 7 06
                  Posts: 58
                  Credit: 752,054
                  RAC: 1
                  Message 2053 - Posted 11 Feb 2007 23:56:59 UTC

                    Suspend alone is usually not a problem, as long as "Leave applications in memory while suspended?" is set to "Yes".

                    The checkpoint is needed as soon as the application ends, i.e. leaves the computer memory.

                    Profile Dagorath
                    Joined: Jun 26 06
                    Posts: 68
                    Credit: 71,310
                    RAC: 0
                    Message 2054 - Posted 12 Feb 2007 0:32:03 UTC

                      The lack of checkpoints is not a big concern for me but it will be for many other crunchers.

                      The big problem is the lack of a progress indicator... we will never know if the app has been sitting there spinning its wheels for 2 or 3 days unless we keep a list of WUs and the times they start. We will always be facing the "to abort or not to abort" conundrum and wondering if the WU is just an extra long WU or whether it's stalled.

                      Nope, not for me!! That's far too much work and bother when the only favors I get in return are worthless credits, worn out hard drives and fans and power bills. This second app needs to be offered as an option. When the admins get that fixed up and get fixed credits working (far more important than a second app) then they can email me and let me know, I might even consider crunching Malaria again.



                      ____________
                      --

                      Profile maire
                      Volunteer moderator
                      Project administrator
                      Project developer
                      Project scientist
                      Joined: Nov 7 05
                      Posts: 439
                      Credit: 118,258
                      RAC: 0
                      Message 2056 - Posted 12 Feb 2007 16:34:21 UTC

                        Here's a little bit of additional information on the new science application. We hope this will help you to decide if you want to sit out for the estimated 2-3 weeks it will take to go through the various batches.
                        -The new application will only run on Windows clients. All other hosts will get workunits just like before.
                        -We have only tested the new application with core client versions 5.4 and later. (We don't need a minimum of 5.5 because we use a modified wrapper app.) Hosts with older core client versions will get workunits just like before.
                        -All workunits take a more or less constant time to complete. On our 2.4 GHz Pentium 4, a workunit takes just under 20 minutes to complete. This means that you should be able to tell when a workunit gets stuck (we have not seen this during the tests). Further, we have assigned a fixed credit of 1.8 to each workunit. This is our best estimate to match the credit of the previous application. We can adjust this if necessary after we get results back from the first small batches.

                        The remainder of this post outlines the scientific objectives of the new app:

                        Plasmodium falciparum malaria is the world’s most important parasitic disease and a major cause of morbidity and mortality in Africa. A frequently quoted estimate is that in 1995 in sub-Saharan Africa around 1 million deaths and 220 million clinical episodes were directly attributable to malaria, mostly in children below the age of 5 years. In epidemic-prone malaria areas in Southern Africa, about 2000 deaths and 200,000 clinical episodes occur annually. However, these figures are very uncertain, since reliable maps of the distribution of malaria transmission and the numbers of affected individuals are not available for most of the African continent. Reliable maps of the geographical distribution of malaria are urgently needed for accurate estimation of disease burden, for identifying which geographical areas should be prioritised for purposes of resource allocation and for assessing the progress of intervention programs.
                        The Mapping Malaria Risk in Africa (MARA/ARMA) project was established in 1996 to provide estimates of the distribution of malaria in Africa. It is a collaborative network of key African scientists and institutions with the aim of providing an atlas of malaria for evidence-based and targeted malaria control in Africa. The Swiss Tropical Institute is an active partner of this collaboration. To date, results of well over 10,000 malaria prevalence surveys from published and unpublished sources have been collated into a single, electronically accessible repository representing the most comprehensive database on malaria in Africa.
                        The current application analyses malaria survey data from the MARA database collected at over 300 locations in West and Central Africa. We fit a Bayesian geostatistical model to relate malaria prevalence to environmental factors such as rainfall, temperature, vegetation and abundance, among others, which were gathered via remote sensing. Based on this model we predict the malaria risk at locations with no prevalence data over a grid of 200,000 pixels.
                        Nick
                        ____________
                        Nicolas Maire
                        Swiss Tropical and Public Health Institute
                        http://www.swisstph.ch
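
A minimal sketch of the stuck-workunit check suggested in the post above: because the wrapper reports no progress, elapsed time is the only signal a volunteer has. The 20-minute expected runtime comes from the post; the function name `looks_stuck` and the threefold safety margin for slower hosts are assumptions made for illustration, not project settings.

```python
from datetime import timedelta

# From the post: a workunit takes just under 20 minutes on a 2.4 GHz Pentium 4.
EXPECTED_RUNTIME = timedelta(minutes=20)


def looks_stuck(elapsed: timedelta,
                expected: timedelta = EXPECTED_RUNTIME,
                slack_factor: float = 3.0) -> bool:
    """Return True if a task has run far longer than its expected runtime.

    slack_factor is a hypothetical margin so that slower hosts are not
    flagged; it is not a value used by the project.
    """
    return elapsed > expected * slack_factor


# An 11-minute run is well within the margin.
print(looks_stuck(timedelta(minutes=11)))
# A run of 12h 57min, as reported later in this thread, is far outside it.
print(looks_stuck(timedelta(hours=12, minutes=57)))
```

A check like this only says that a task is suspicious; whether to abort it is still the volunteer's call.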

                        renke
                        Joined: Jun 26 06
                        Posts: 2
                        Credit: 259,413
                        RAC: 0
                        Message 2060 - Posted 12 Feb 2007 19:58:05 UTC - in response to Message 2056.

                          The new application will only run on Windows clients


                          *sigh* no one loves penguins...

                          Chaz
                          Joined: Jun 22 06
                          Posts: 4
                          Credit: 21,095
                          RAC: 0
                          Message 2066 - Posted 13 Feb 2007 22:19:12 UTC - in response to Message 2056.

                            Nick
                            Thanks for the extra info. I'm sorry to see that one or two people have reacted badly to this new application. I am not a 24/7 cruncher, but I assume you will allow a long enough deadline so that if the odd work unit ends up being restarted it will not cause a problem.
                            I think the general reaction of most of the crunchers on this project will be to accept that at the moment this is the only way you can run this particular application and that checkpointing isn't such a major issue on 20-minute WUs. Please don't be disheartened by the odd negative response. I think you have already found that the majority of us will do our best to support this project, although you may get the odd bit of constructive criticism.
                            Cheers
                            Chaz
                            ____________

                            Profile Contact
                            Joined: Jun 24 06
                            Posts: 2
                            Credit: 87,852
                            RAC: 0
                            Message 2070 - Posted 14 Feb 2007 23:02:40 UTC - in response to Message 2056.

                              maire wrote:


                              -The new application will only run on Windows clients. All other hosts will get workunits just like before.

                              This app - predictor 112, will not run properly on Win9x. A DOS box is opened when app starts.

                              Unless you see a way to deny Win98 receiving this app until a possible fix, we should suspend Win98 hosts.

                              Post a news item if you want to test this app with Win98 in the future and we'll resume.

                              Alternatively, maybe you can use a section of prefs (already in BOINC code, I think) to determine if we want to run new apps, so we can run the older apps only if on Win9x.

                              For sure, keep up the good work!


                              ____________

                              Click and enter your name for your BOINC Statistics

                              Profile maire
                              Volunteer moderator
                              Project administrator
                              Project developer
                              Project scientist
                              Joined: Nov 7 05
                              Posts: 439
                              Credit: 118,258
                              RAC: 0
                              Message 2073 - Posted 15 Feb 2007 8:43:51 UTC - in response to Message 2070.


                                This app - predictor 112, will not run properly on Win9x. A DOS box is opened when app starts.

                                Unless you see a way to deny Win98 receiving this app until a possible fix, we should suspend Win98 hosts.

                                Post a news item if you want to test this app with Win98 in the future and we'll resume.

                                Alternatively, maybe you can use a section of prefs (already in BOINC code, I think) to determine if we want to run new apps, so we can run the older apps only if on Win9x.

                                For sure, keep up the good work!

                                Thanks, we're currently looking at the results from the first batch we sent out. We haven't sent any new workunits in the last 10 hours, but of course a few will be resent.
                                I'll let you know when and how we'll proceed.
                                Nick
                                ____________
                                Nicolas Maire
                                Swiss Tropical and Public Health Institute
                                http://www.swisstph.ch

                                Profile The Gas Giant
                                Joined: Mar 7 06
                                Posts: 1214
                                Credit: 3,722,470
                                RAC: 1,336
                                Message 2074 - Posted 15 Feb 2007 10:46:29 UTC

                                  Just completed a few WUs. I like the 1.8 credits for 660 seconds of work!

                                  Profile maire
                                  Volunteer moderator
                                  Project administrator
                                  Project developer
                                  Project scientist
                                  Joined: Nov 7 05
                                  Posts: 439
                                  Credit: 118,258
                                  RAC: 0
                                  Message 2075 - Posted 15 Feb 2007 11:15:46 UTC - in response to Message 2074.

                                    Just completed a few WUs. I like the 1.8 credits for 660 seconds of work!

                                    We've chosen that to match the credit per time we got on a small number of reference computers here that run both applications. We may adjust it a little for future workunits if we see it's too generous. Linux users should not be at a disadvantage.
                                    Nick
                                    ____________
                                    Nicolas Maire
                                    Swiss Tropical and Public Health Institute
                                    http://www.swisstph.ch
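
A quick worked calculation of the credit rate discussed in this exchange. The 1.8 credits and the 660 seconds come from the thread, and the 1200-second case approximates the "just under 20 mins" reported for the project's 2.4 GHz Pentium 4; the helper name `credits_per_hour` is made up for illustration.

```python
def credits_per_hour(credit: float, runtime_seconds: float) -> float:
    """Convert a fixed per-workunit credit into an hourly rate."""
    return credit * 3600.0 / runtime_seconds


FIXED_CREDIT = 1.8  # fixed credit per workunit, as stated in the thread

# A host that finishes a workunit in 660 seconds.
fast_host = credits_per_hour(FIXED_CREDIT, 660)
# The reference Pentium 4, taking roughly 20 minutes (1200 seconds).
reference_host = credits_per_hour(FIXED_CREDIT, 1200)

print(f"660 s host:  {fast_host:.2f} credits/hour")
print(f"1200 s host: {reference_host:.2f} credits/hour")
```

With a fixed credit per workunit, the hourly rate scales inversely with runtime, which is why a faster host sees the award as generous.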

                                    Profile maire
                                    Volunteer moderator
                                    Project administrator
                                    Project developer
                                    Project scientist
                                    Joined: Nov 7 05
                                    Posts: 439
                                    Credit: 118,258
                                    RAC: 0
                                    Message 2078 - Posted 15 Feb 2007 15:58:24 UTC - in response to Message 2070.


                                      This app - predictor 112, will not run properly on Win9x. A DOS box is opened when app starts.

                                      Dear Win98 users, we investigated this problem and have so far not found a workaround. It's caused by Win98 behaving a bit differently when starting a new process. The good news is that the program runs correctly, as long as you don't close that window. We are getting back valid results from Win98 clients. We therefore decided to start sending out workunits again. In the meantime we will keep looking for a workaround.
                                      Nick

                                      ____________
                                      Nicolas Maire
                                      Swiss Tropical and Public Health Institute
                                      http://www.swisstph.ch

                                      Profile The Gas Giant
                                      Joined: Mar 7 06
                                      Posts: 1214
                                      Credit: 3,722,470
                                      RAC: 1,336
                                      Message 2079 - Posted 15 Feb 2007 19:08:39 UTC - in response to Message 2075.

                                        Just completed a few WUs. I like the 1.8 credits for 660 seconds of work!

                                        We've chosen that to match the credit per time we got on a small number of reference computers here that run both applications. We may adjust it a little for future workunits if we see it's too generous. Linux users should not be at a disadvantage.
                                        Nick

                                        LOL...Linux users get screwed for credit with BOINC when compared to Windows in any case. Linux benchmarks lower than Windows on the same machine, so it claims lower credit. I wouldn't adjust it purely for that reason as the problem is not caused by the project, just BOINC. I would be comparing it to other projects. I feel the 1.8 credits is just about right.

                                        I only received 1 of the new WUs on my laptop and it was completed before I had to shut it down. So I didn't waste any CPU cycles.

                                        u.dgl.
                                        Joined: Mar 8 06
                                        Posts: 26
                                        Credit: 1,217,644
                                        RAC: 366
                                        Message 2080 - Posted 16 Feb 2007 8:37:40 UTC

                                          Hello,

                                          it seems that this WU is stuck:

                                          15.02.2007 20:39:25|malariacontrol.net beta|Starting task mapwca0000602.txt_1 using mappredictor version 112

                                          It has already been running for 12h 57min at 0.000%


                                          u.dgl.
                                          ____________

                                          u.dgl.
                                          Joined: Mar 8 06
                                          Posts: 26
                                          Credit: 1,217,644
                                          RAC: 366
                                          Message 2086 - Posted 16 Feb 2007 15:57:28 UTC

                                            Another one was stuck:

                                            16.02.2007 16:11:22|malariacontrol.net beta|Starting task mapwca0000822.txt_1 using mappredictor version 112

                                            u.dgl.

                                            ____________

                                            KAMasud
                                            Joined: Jan 7 07
                                            Posts: 12
                                            Credit: 18,733
                                            RAC: 0
                                            Message 2087 - Posted 16 Feb 2007 19:40:50 UTC - in response to Message 2078.


                                              Running Climate on Win9x also opens a DOS box, but I get valid results if I don't shut the DOS box :-) I don't mind these short WUs for a change, please keep them coming :-) As it is, I have set write to disk every 400 sec in Preferences :-( and I am from that part of the world where you can't rely on power :-)
                                              Regards
                                              Masud.


                                              This app - predictor 112, will not run properly on Win9x. A DOS box is opened when app starts.

                                              Dear Win98 users, we investigated this problem and have so far not found a workaround. It's caused by Win98 behaving a bit differently when starting a new process. The good news is that the program runs correctly, as long as you don't close that window. We are getting back valid results from Win98 clients. We therefore decided to start sending out workunits again. In the meantime we will keep looking for a workaround.
                                              Nick


                                              ____________

                                              B-Roy
                                              Joined: Jul 14 06
                                              Posts: 9
                                              Credit: 9,858
                                              RAC: 0
                                              Message 2088 - Posted 16 Feb 2007 23:37:28 UTC

                                                Is it actually normal that there is no screensaver for the WU "Prediction of Malaria Prevalence 1.12"?
                                                ____________

                                                Hans Sveen
                                                Joined: Mar 7 06
                                                Posts: 2
                                                Credit: 274,761
                                                RAC: 0
                                                Message 2089 - Posted 17 Feb 2007 3:09:57 UTC - in response to Message 2086.

                                                  Another one was stuck:

                                                  16.02.2007 16:11:22|malariacontrol.net beta|Starting task mapwca0000822.txt_1 using mappredictor version 112

                                                  u.dgl.

                                                  Hi!
                                                  I also got two that were stuck: one ran for almost 9.5 hours (result id: http://www.malariacontrol.net/result.php?resultid=4423465) and one for nearly 1.5 hours (http://www.malariacontrol.net/result.php?resultid=4429738). Both were on host id 298.
                                                  I also got one valid result, http://www.malariacontrol.net/result.php?resultid=4432377, on host id 250.

                                                  A wild guess: the first one is a dual-core CPU, the second an ordinary single-core CPU. Maybe this will help in solving the issue with "hanging" apps?

                                                  With regards,

                                                  ____________
                                                  Hans Sveen
                                                  Oslo, Norway

                                                  u.dgl.
                                                  Joined: Mar 8 06
                                                  Posts: 26
                                                  Credit: 1,217,644
                                                  RAC: 366
                                                  Message 2093 - Posted 17 Feb 2007 9:13:03 UTC

                                                    Hi,

                                                    it seems that dual-core PCs are the problem.

                                                    I have seen the same effect as Hans Sveen:

                                                    on my dual-core PC the WUs get stuck; the single-core ones have no problem.

                                                    Greetings

                                                    u.dgl.
                                                    ____________

                                                    Triciabuk
                                                    Joined: Jan 19 07
                                                    Posts: 1
                                                    Credit: 9,440
                                                    RAC: 0
                                                    Message 2096 - Posted 17 Feb 2007 11:04:41 UTC - in response to Message 2093.

                                                      Hi,

                                                      it seems that dual-core PCs are the problem.

                                                      I have seen the same effect as Hans Sveen:

                                                      on my dual-core PC the WUs get stuck; the single-core ones have no problem.

                                                      Greetings

                                                      u.dgl.



                                                      HI

                                                      Can we tell from the WU number whether a unit is one of the new ones?

                                                      I have completed units which have taken about 600, 1200 and 1500 seconds and run perfectly OK on a dual-core Pentium D. I don't know for sure that they were second application units though.

                                                      Regards

                                                      Tricia
                                                      ____________

                                                      wolfsong
                                                      Joined: Feb 11 07
                                                      Posts: 4
                                                      Credit: 7,048
                                                      RAC: 0
                                                      Message 2098 - Posted 17 Feb 2007 17:38:37 UTC - in response to Message 2096.

                                                        Last modified: 17 Feb 2007 17:42:43 UTC

                                                        Have also had problems with the new work units. Didn't think my PC was dual core, but so far none of the 4 mappredict units that have downloaded to my PC have worked; the time just trundles past and no progress is made. Normal units are still working fine.

                                                        I cleared all my other project work units down over-night and I even tried resetting the project but the 2 mappredict units I got today are showing the same problem.

                                                        Bit frustrating really!

                                                        Profile Nightbird
                                                        Send message
                                                        Joined: Mar 7 06
                                                        Posts: 110
                                                        Credit: 395,345
                                                        RAC: 0
                                                        Message 2100 - Posted 17 Feb 2007 21:21:14 UTC

                                                          Last modified: 17 Feb 2007 21:25:27 UTC

                                                          Same problem here with a Barton 3200+ under Millennium (CC 5.4.9). The "new" WU has been running endlessly (since yesterday), but a normal WU is working fine.

                                                          Edit:
                                                          I'm using BoincView and, according to this manager, the CPU efficiency = 0.
                                                          ____________

                                                          Do you want to get banned for 31 years and your account & credits deleted at a Boinc project ? Predictor@home is your best choice.

                                                          wolfsong
                                                          Send message
                                                          Joined: Feb 11 07
                                                          Posts: 4
                                                          Credit: 7,048
                                                          RAC: 0
                                                          Message 2101 - Posted 17 Feb 2007 22:43:31 UTC - in response to Message 2100.

                                                            On closer inspection I think I do have dual core (sys info shows 2 CPUs, I didn't know that!! Shows how well I know my PC, huh!). Guess that ties in with what the others have been saying.

                                                            Any chance of a response from the techies please? Do I let the units keep running? Seems a waste of time at the moment. Are they going to fix the problem? Is there anything I can do so that they will work?

                                                            Profile Nightbird
                                                            Send message
                                                            Joined: Mar 7 06
                                                            Posts: 110
                                                            Credit: 395,345
                                                            RAC: 0
                                                            Message 2102 - Posted 17 Feb 2007 23:43:52 UTC - in response to Message 2101.

                                                              Last modified: 17 Feb 2007 23:44:38 UTC

                                                              On closer inspection I think I do have dual core (sys info shows 2 CPUs, I didn't know that!! Shows how well I know my PC, huh!). Guess that ties in with what the others have been saying.

                                                              Any chance of a response from the techies please? Do I let the units keep running? Seems a waste of time at the moment. Are they going to fix the problem? Is there anything I can do so that they will work?

                                                              The wu is suspended on my machine.
                                                              The best thing now is to wait until Monday.

                                                              ____________

                                                              Do you want to get banned for 31 years and your account & credits deleted at a Boinc project ? Predictor@home is your best choice.

                                                              AnRM
                                                              Send message
                                                              Joined: Mar 7 06
                                                              Posts: 54
                                                              Credit: 2,130,571
                                                              RAC: 0
                                                              Message 2103 - Posted 17 Feb 2007 23:48:29 UTC - in response to Message 2101.

                                                                Last modified: 18 Feb 2007 0:20:27 UTC

                                                                On closer inspection I think I do have dual core (sys info shows 2 CPUs, I didn't know that!! Shows how well I know my PC, huh!). Guess that ties in with what the others have been saying.

                                                                Any chance of a response from the techies please? Do I let the units keep running? Seems a waste of time at the moment. Are they going to fix the problem? Is there anything I can do so that they will work?


                                                                Well, I'm not a techie, but you don't have a dual core. I have a number of AMD dual-core machines and they aren't having any problems with the new WUs. I would suggest that if you are using the BOINC screensaver, you change it to the 'blank' option found on your list of Windows-supplied screensavers. This will speed up your processing time and could eliminate your problem. The 2 CPU listing you were looking at is the BOINC default setting and only needs to be changed for multicore CPUs (more than 2), i.e. servers etc. Hope this helps....Rog.
                                                                Edit: Wolf, I see you are using BOINC version 5.4.11....you could also try upgrading to the current recommended version, i.e. 5.8.11, which is downloadable from the BOINC home page. FYI, our AMD X2s are configured with BOINC version 5.8.11 and blank screensavers. They seem stable and process these new WUs in about 7-8 minutes....Cheers, Rog.
                                                                ____________

                                                                Robbie Lawrence
                                                                Send message
                                                                Joined: Jan 4 07
                                                                Posts: 12
                                                                Credit: 39,680
                                                                RAC: 0
                                                                Message 2105 - Posted 18 Feb 2007 2:47:57 UTC

                                                                  This isn't specific to dual-core CPUs. Whether or not it is specific to a certain type of processor, I don't know, but I somehow doubt it.
                                                                  ____________

                                                                  adrianxw
                                                                  Avatar
                                                                  Send message
                                                                  Joined: Mar 8 06
                                                                  Posts: 145
                                                                  Credit: 512,868
                                                                  RAC: 200
                                                                  Message 2107 - Posted 18 Feb 2007 8:55:25 UTC

                                                                    Last modified: 18 Feb 2007 9:48:51 UTC

                                                                    This new WU is "stuck". It has so far run 08:38:52 at 100% CPU, so it is not the "says it is running but isn't" fault I've seen before. I stopped and restarted BOINC, as that normally clears stuck WUs, but won't know if this is fixed until it gets the CPU again, which, with the huge negative debt it has run up overnight, may be some time yet. In the "stuck" state, it has monopolised BOINC; none of my other projects have had any CPU overnight. I certainly hope I've not got any of these at my remote site, since I only get there occasionally.

                                                                    Machine is a 2.8GHz Northwood, not a dual core in sight. NT4 SP6a BOINC core 5.8.8, no graphics, leave in memory set.

                                                                    @WolfSong

                                                                    You are probably showing 2 processors because your CPU has hyper-threading, which makes a chip "appear" to have 2 processors when in fact Intel is using a few tricks to share two tasks on a single CPU. Both of my hyper-threaded machines show up that way.

                                                                    *** EDIT ***

                                                                    For the sake of research, I have suspended the other projects to let the problem wu run, it has started again from zero as expected. Will watch and report.

                                                                    I notice the other machine that is crunching that unit has not returned it yet either; normally that machine, like mine, returns WUs very quickly.

                                                                    The wu has now been running again for 50:45 at 100%, looks like it is going nowhere, but I\'ll give it another 30 minutes or so just to give it a chance.

                                                                    This is an insidious fault since there is no indication that anything is wrong to a casual glance.
                                                                    ____________
                                                                    Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

                                                                    Adywebb
                                                                    Avatar
                                                                    Send message
                                                                    Joined: Jan 5 07
                                                                    Posts: 15
                                                                    Credit: 11,657
                                                                    RAC: 0
                                                                    Message 2108 - Posted 18 Feb 2007 9:18:00 UTC

                                                                      Just to say I've not had a problem with any of these new WUs so far; all are completing without problem in around 10 minutes.
                                                                      ____________

                                                                      adrianxw
                                                                      Avatar
                                                                      Send message
                                                                      Joined: Mar 8 06
                                                                      Posts: 145
                                                                      Credit: 512,868
                                                                      RAC: 200
                                                                      Message 2110 - Posted 18 Feb 2007 10:27:34 UTC

                                                                        Last modified: 18 Feb 2007 11:15:20 UTC

                                                                        Sorry, it won't let me edit any more!

                                                                        It has run for 90 minutes now in its second life. I have suspended it pending advice. Collectively, it has had over 10 hours now.

                                                                        There are others having problems discussed in this thread.
                                                                        ____________
                                                                        Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

                                                                        Profile Nightbird
                                                                        Send message
                                                                        Joined: Mar 7 06
                                                                        Posts: 110
                                                                        Credit: 395,345
                                                                        RAC: 0
                                                                        Message 2111 - Posted 18 Feb 2007 11:03:14 UTC - in response to Message 2110.

                                                                          Sorry, it won't let me edit any more!

                                                                          It has run for 90 minutes now in its second life. I have suspended it pending advice. Collectively, it has had over 10 hours now.

                                                                          If the WU is left to run, it will probably finish with a "Maximum CPU time exceeded" error.
                                                                          ____________

                                                                          Do you want to get banned for 31 years and your account & credits deleted at a Boinc project ? Predictor@home is your best choice.

                                                                          wolfsong
                                                                          Send message
                                                                          Joined: Feb 11 07
                                                                          Posts: 4
                                                                          Credit: 7,048
                                                                          RAC: 0
                                                                          Message 2115 - Posted 18 Feb 2007 23:58:13 UTC

                                                                            Thanks for all the help :) Was not using the screensaver at all, but have updated Boinc version. Not had a mappredict wu download yet today though so no idea if it made any difference!

                                                                            Andreas
                                                                            Send message
                                                                            Joined: Feb 19 07
                                                                            Posts: 6
                                                                            Credit: 78,266
                                                                            RAC: 68
                                                                            Message 2121 - Posted 19 Feb 2007 15:09:03 UTC - in response to Message 2078.


                                                                              Dear Win98 users, we investigated this problem and have so far not found a workaround. It's caused by Win98 behaving a bit differently when starting a new process. The good news is that the program runs correctly, as long as you don't close that window. We are getting back valid results from Win98 clients. We therefore decided to start sending out workunits again. In the meantime we are looking for a workaround.
                                                                              Nick


                                                                              When I ran the predictor app, the DOS box opened, but I also got this message over and over:

                                                                              2007-02-19 15:56:21|malariacontrol.net beta|app reporting negative CPU: -737869762948.382080

                                                                              Other than that nothing happens. I was afraid something was wrong, so I aborted those results. Should I have let them run?

                                                                              Michael
                                                                              Volunteer moderator
                                                                              Project scientist
                                                                              Send message
                                                                              Joined: May 5 06
                                                                              Posts: 79
                                                                              Credit: 494
                                                                              RAC: 0
                                                                              Message 2123 - Posted 19 Feb 2007 18:22:20 UTC - in response to Message 2121.

                                                                                Dear users,
                                                                                there are a number of issues at the moment with the mappredictor application, which we would like to comment on.

                                                                                - Stuck workunits: We see from the results we got back that the client version clearly has no influence, while multi-core processors have about twice the probability of workunits getting stuck. But since single-core machines can also have this problem, multiple cores cannot be the only reason. We limited the fpops limit for the workunits, so they will terminate by themselves within a reasonable time if they get stuck. If they don't, please cancel them after an hour or so; they should in fact not take longer than 30 minutes.

                                                                                - Server load: as some may have noticed, the launch of the new application has made our server struggle a bit on a few occasions. We think that this has to do with the big number of hosts which are connecting for the first time and have to download the full application. This creates a huge amount of network traffic on our side.

                                                                                - We see that our error rate varies quite strongly with time, and conclude from this (for the time being) that there might be a connection between the transiently high server load and some of the errors. Therefore we are trying to throttle the number of workunits launched per hour to a value which the server can cope with, while still letting more clients download the application files. We would like to reach a stable state, because if there are problems related to the high server load, there is no way for us to really tell what is what. Currently we're almost down to zero for tonight, and will slowly go up again tomorrow.
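
As a rough illustration of the fpops limit mentioned in the first point: as far as I understand the BOINC client, it aborts a task with "Maximum CPU time exceeded" once the CPU time passes roughly the workunit's rsc_fpops_bound divided by the host's benchmarked floating-point speed. The sketch below is not project code; the function name and all numbers are made up for illustration, and the exact rule the client applies is an assumption here.

```python
def max_cpu_seconds(rsc_fpops_bound: float, host_fpops: float) -> float:
    """Rough CPU-time ceiling implied by a workunit's floating-point-operation bound.

    Assumption: the client aborts once CPU time exceeds about
    rsc_fpops_bound / host_fpops, where host_fpops is the host's
    benchmarked floating-point operations per second.
    """
    if rsc_fpops_bound <= 0 or host_fpops <= 0:
        raise ValueError("both arguments must be positive")
    return rsc_fpops_bound / host_fpops


if __name__ == "__main__":
    # Hypothetical values, not taken from any real mappredict workunit.
    bound = 3.6e12       # floating-point operations allowed
    host_speed = 1.0e9   # benchmarked operations per second
    print(f"{max_cpu_seconds(bound, host_speed) / 60:.0f} minutes")
```

Under that assumption, a slower host reaches the same bound later than a faster one, so the cut-off time for a stuck unit would differ from machine to machine.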

                                                                                We apologise for the inconvenience, hope you understand that we are doing what we can, thank you very much for your collaboration, and hope you keep crunching for us!!

                                                                                Cheers
                                                                                Michael


                                                                                ____________
                                                                                Michael

                                                                                Profile The Gas Giant
                                                                                Avatar
                                                                                Send message
                                                                                Joined: Mar 7 06
                                                                                Posts: 1214
                                                                                Credit: 3,722,470
                                                                                RAC: 1,336
                                                                                Message 2125 - Posted 19 Feb 2007 19:49:23 UTC

                                                                                  Thanks for the update Michael.

                                                                                  HomeGnome, yes, you should have let it run for at least 30 minutes and maybe up to an hour before aborting it.

                                                                                  Live long and BOINC.

                                                                                  ____________
                                                                                  Paul
                                                                                  (S@H1 8888)

                                                                                  u.dgl.
                                                                                  Send message
                                                                                  Joined: Mar 8 06
                                                                                  Posts: 26
                                                                                  Credit: 1,217,644
                                                                                  RAC: 366
                                                                                  Message 2127 - Posted 20 Feb 2007 7:40:01 UTC


                                                                                    - Stuck workunits: We see from the results we got back that the client version clearly has no influence, while multi-core processors have about twice the probability of workunits getting stuck. But since single-core machines can also have this problem, multiple cores cannot be the only reason. We limited the fpops limit for the workunits, so they will terminate by themselves within a reasonable time if they get stuck. If they don't, please cancel them after an hour or so; they should in fact not take longer than 30 minutes.


                                                                                    This sounds like a carnival joke!
                                                                                    Since the mappredictor application began, I have had exactly one WU on my dual-core PC that did not get stuck. A few moments ago I aborted the latest:

                                                                                    20.02.2007 08:29:43|malariacontrol.net beta|Unrecoverable error for result mapwca0006143.txt_3 (aborted by user)

                                                                                    That WU had been running overnight for more than 13 hours.

                                                                                    Greetings
                                                                                    u.dgl.

                                                                                    ____________

                                                                                    Andreas
                                                                                    Send message
                                                                                    Joined: Feb 19 07
                                                                                    Posts: 6
                                                                                    Credit: 78,266
                                                                                    RAC: 68
                                                                                    Message 2128 - Posted 20 Feb 2007 8:01:05 UTC - in response to Message 2125.

                                                                                      HomeGnome, yes, you should have let it run for at least 30 minutes and maybe up to an hour before aborting it.


                                                                                      OK, I'll do that. But what does the "negative CPU" message mean? Every second the predictor app is running I get this message. Other than that, nothing at all is happening. Is this normal on Win98 machines?

                                                                                      Franken_Power
                                                                                      Avatar
                                                                                      Send message
                                                                                      Joined: Jan 5 07
                                                                                      Posts: 11
                                                                                      Credit: 304,316
                                                                                      RAC: 0
                                                                                      Message 2129 - Posted 20 Feb 2007 8:27:00 UTC - in response to Message 2127.


                                                                                        - Stuck workunits: We see from the results we got back that the client version clearly has no influence, while multi-core processors have about twice the probability of workunits getting stuck. But since single-core machines can also have this problem, multiple cores cannot be the only reason. We limited the fpops limit for the workunits, so they will terminate by themselves within a reasonable time if they get stuck. If they don't, please cancel them after an hour or so; they should in fact not take longer than 30 minutes.


                                                                                        This sounds like a carnival joke!
                                                                                        Since the mappredictor application began, I have had exactly one WU on my dual-core PC that did not get stuck. A few moments ago I aborted the latest:

                                                                                        20.02.2007 08:29:43|malariacontrol.net beta|Unrecoverable error for result mapwca0006143.txt_3 (aborted by user)

                                                                                        That WU had been running overnight for more than 13 hours.

                                                                                        Greetings
                                                                                        u.dgl.

                                                                                        If you don't like carnival jokes like this, don't do beta projects.... :-)) I had a lot of these WUs on my AMD X2 5000+ and no errors....

                                                                                        Michael
                                                                                        Volunteer moderator
                                                                                        Project scientist
                                                                                        Send message
                                                                                        Joined: May 5 06
                                                                                        Posts: 79
                                                                                        Credit: 494
                                                                                        RAC: 0
                                                                                        Message 2133 - Posted 20 Feb 2007 10:23:03 UTC - in response to Message 2127.

                                                                                          Last modified: 20 Feb 2007 11:25:57 UTC


                                                                                          This sounds like a carnival joke!
                                                                                          Since the mappredictor application began, I have had exactly one WU on my dual-core PC that did not get stuck. A few moments ago I aborted the latest



                                                                                          It was not meant as a joke ;) If we look ACROSS the hosts, we find double the probability for multi-processor machines. On a given single host it may well be quite the same every time.

                                                                                          If you experience errors repeatedly, please try to reset the project. This causes the application files to be downloaded again. We would very much appreciate feedback from people who repeatedly had errors and did a reset (regardless of how many CPUs the host has). Did it change anything?

                                                                                          thanks
                                                                                          Michael


                                                                                          ____________
                                                                                          Michael

                                                                                          KAMasud
                                                                                          Send message
                                                                                          Joined: Jan 7 07
                                                                                          Posts: 12
                                                                                          Credit: 18,733
                                                                                          RAC: 0
                                                                                          Message 2137 - Posted 20 Feb 2007 13:03:23 UTC


                                                                                            :-) Report time :-) My Prescott 3.06 is not handling them :-) My Celeron 1.7 is not handling them either; as a matter of fact it has a WU stuck at the moment :-( I will try it on my P4 2.0 and my P3 1.5, and will also reset the project on my Prescott and then let you chaps know :-)
                                                                                            Regards
                                                                                            Masud.
                                                                                            ____________

                                                                                            adrianxw
                                                                                            Joined: Mar 8 06
                                                                                            Posts: 145
                                                                                            Credit: 512,868
                                                                                            RAC: 200
                                                                                            Message 2139 - Posted 20 Feb 2007 15:05:15 UTC

                                                                                              Last modified: 20 Feb 2007 15:07:04 UTC

                                                                                              We limited the fpops limit for the workunits, so they will terminate by themselves within reasonable time if they get stuck.

                                                                                              Can I ask what constitutes a reasonable time?

                                                                                              Why I ask is that my wu I described above ran for over 8 hours in a single instantiation without terminating. It was not time slicing either, BOINC was stuck crunching the MCDN wu, no other projects were seeing any CPU.

                                                                                              I have a couple of machines at a remote site that I don't visit every day. If one or, worse, both of these get a stuck unit, it might be a few days before I can visit the site to manually abort them.
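
As a rough way to reason about what "reasonable time" could mean: to my understanding, BOINC aborts a task once its running time exceeds roughly the workunit's fpops bound divided by the host's benchmarked floating-point speed. The sketch below only does that division. The bound and the host speed are made-up numbers, not the project's actual settings, and the function name is hypothetical.

```python
def max_seconds_before_abort(rsc_fpops_bound: float, host_flops: float) -> float:
    """Time budget implied by an fpops bound on a host of a given benchmarked speed."""
    if host_flops <= 0:
        raise ValueError("host_flops must be positive")
    return rsc_fpops_bound / host_flops


# Assumed values for illustration only (not taken from this project):
assumed_bound = 3.6e12   # floating-point operations allowed per workunit
assumed_host = 1.0e9     # host benchmark: 1 GFLOPS

limit_minutes = max_seconds_before_abort(assumed_bound, assumed_host) / 60
print(f"task would be aborted after about {limit_minutes:.0f} minutes")
```

If the limit is measured against CPU time rather than wall-clock time, a task that hangs without accumulating CPU time might never reach it, which would be one possible reason for a unit sitting for hours.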
                                                                                              ____________
                                                                                              Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

                                                                                              AnRM
                                                                                              Joined: Mar 7 06
                                                                                              Posts: 54
                                                                                              Credit: 2,130,571
                                                                                              RAC: 0
                                                                                              Message 2147 - Posted 20 Feb 2007 17:52:41 UTC - in response to Message 2137.


                                                                                                :-) Report time :-) My Prescott 3.06 is not handling them :-) My Celeron 1.7 is not handling them either; as a matter of fact it has a WU stuck at the moment :-( I will try it on my P4 2.0 and my P3 1.5, and also reset the project on my Prescott, and then let you chaps know :-)
                                                                                                Regards
                                                                                                Masud.

                                                                                                Masud, this must be very frustrating for you... We process about 400-500 MC WUs/day and have yet to see a stuck WU. We are running everything from Intel Celerons and AMD Durons to AMD64 X2 dual cores. Now these are 24/7, WinXP, 'blank' screensaver, MC-dedicated machines, so they have few processing interruptions. I mention this only because I was wondering if excessive checkpointing, process interruptions or not leaving suspended WUs 'in memory' could be causing this? I seem to recall that when Rosetta@Home started they had a similar problem. Hope this helps... Rog.
                                                                                                ____________

                                                                                                adrianxw
                                                                                                Joined: Mar 8 06
                                                                                                Posts: 145
                                                                                                Credit: 512,868
                                                                                                RAC: 200
                                                                                                Message 2148 - Posted 20 Feb 2007 19:18:28 UTC

                                                                                                  Last modified: 20 Feb 2007 19:18:51 UTC

                                                                                                  This is also happening on 24/7, leave-in-memory, no-graphics, BOINC-crunching-only machines. That is exactly the kind of setup I had which stuck. What is more, the "stick" was repeatable: after 8.5 hours I stopped and started BOINC, as that usually frees stuck WUs, then suspended all other projects and let the same WU run again. It stuck again.
                                                                                                  ____________
                                                                                                  Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

                                                                                                  Profile Nightbird
                                                                                                  Joined: Mar 7 06
                                                                                                  Posts: 110
                                                                                                  Credit: 395,345
                                                                                                  RAC: 0
                                                                                                  Message 2149 - Posted 20 Feb 2007 19:32:54 UTC - in response to Message 2129.


                                                                                                    - Stuck workunits: We see from the results we got back that the client version clearly has no influence, while multi-core processors have about a 2 times higher probability of workunits getting stuck. But since single-core machines can also have this problem, it cannot be the only reason. We limited the fpops limit for the workunits, so they will terminate by themselves within reasonable time if they get stuck. If they don't, please cancel them after an hour or so; they should in fact not take longer than 30 minutes.


                                                                                                    This sounds like a carnival joke!
                                                                                                    I had on my dual core pc since beginning of the mappredictor application one! wu, that was not stuck. A few moments ago i aborted the latest

                                                                                                    20.02.2007 08:29:43|malariacontrol.net beta|Unrecoverable error for result mapwca0006143.txt_3 (aborted by user)

                                                                                                    That wu was running overnight more than 13 hours.

                                                                                                    Greetings
                                                                                                    u.dgl.

                                                                                                    If you don't like carnival jokes like this, don't do beta projects... :-)) I had a lot of these WUs on my AMD X2 5000+ and no errors...

                                                                                                    Same here on my AMD X2s, running fine
                                                                                                    Athlon64 X2 4400+ - XP home - CC 5.6.4
                                                                                                    Athlon64 X2 4600+ - Win2k Sp4 - CC 5.4.9

                                                                                                    ____________

                                                                                                    Do you want to get banned for 31 years and your account & credits deleted at a Boinc project ? Predictor@home is your best choice.

                                                                                                    Profile Nightbird
                                                                                                    Joined: Mar 7 06
                                                                                                    Posts: 110
                                                                                                    Credit: 395,345
                                                                                                    RAC: 0
                                                                                                    Message 2150 - Posted 20 Feb 2007 19:37:11 UTC - in response to Message 2128.

                                                                                                      HomeGnome, yes, you should have let it run for at least 30 minutes and maybe up to an hour before aborting it.


                                                                                                      OK, I'll do that. But what does the "negative cpu" message mean? Every second the predictor app is running I get this message. Other than that, nothing at all is happening. Is this normal on win98 machines?

                                                                                                      Same on a Millennium machine
                                                                                                      http://www.malariacontrol.net/forum_thread.php?id=378

                                                                                                      ____________

                                                                                                      Do you want to get banned for 31 years and your account & credits deleted at a Boinc project ? Predictor@home is your best choice.

                                                                                                      wolfsong
                                                                                                      Joined: Feb 11 07
                                                                                                      Posts: 4
                                                                                                      Credit: 7,048
                                                                                                      RAC: 0
                                                                                                      Message 2151 - Posted 20 Feb 2007 21:27:30 UTC

                                                                                                        I reset the project, made no difference. Both mappredict wu on the machine at the time failed to run and I aborted them. Normal units still working perfectly. No more mappredict units have downloaded to my machine since.

                                                                                                        Michael
                                                                                                        Volunteer moderator
                                                                                                        Project scientist
                                                                                                        Joined: May 5 06
                                                                                                        Posts: 79
                                                                                                        Credit: 494
                                                                                                        RAC: 0
                                                                                                        Message 2155 - Posted 21 Feb 2007 14:57:54 UTC - in response to Message 2151.


                                                                                                          We have currently stopped sending out new workunits of the mappredictor application while we are trying to sort out this "stuck" problem. Our problem is that this error is a bit difficult to reproduce here, coz we never get it :)
                                                                                                          But maybe if we keep asking the right questions, you might be able to help us:

                                                                                                          The new application consists of two processes, mappredictor_1.12... (the boinc wrapper application), and predictor_1.12.. (the science application). If your workunit gets stuck and you have a look at the task manager, are both processes still active, or only one? which one? (make sure only one workunit of this application is running, otherwise you might see something that belongs to another workunit).
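
For anyone who prefers to check this from a list of process names rather than by eye, here is a minimal sketch of the check described above. It assumes the process names begin with the prefixes mentioned in this post; the full file names in the example are invented, and the function name is hypothetical. Note that "mappredictor_1.12" contains "predictor_1.12", so the match has to be on the start of the name, not on a substring.

```python
WRAPPER_PREFIX = "mappredictor_1.12"   # the BOINC wrapper application
SCIENCE_PREFIX = "predictor_1.12"      # the science application


def active_parts(process_names):
    """Report which of the two processes appear in a list of process names."""
    names = [name.lower() for name in process_names]
    return {
        "wrapper": any(n.startswith(WRAPPER_PREFIX) for n in names),
        "science_app": any(n.startswith(SCIENCE_PREFIX) for n in names),
    }


# Invented example names; copy the real ones from the task manager.
seen = ["boinc.exe", "predictor_1.12_example.exe", "explorer.exe"]
print(active_parts(seen))
```

With only one workunit of this application running, a result where one entry is true and the other false is the kind of detail worth reporting.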

                                                                                                          Thanks for reporting, so far your comments were very helpful to us.
                                                                                                          cheers
                                                                                                          Michael


                                                                                                          ____________
                                                                                                          Michael

                                                                                                          FreeLarry
                                                                                                          Joined: Jun 21 06
                                                                                                          Posts: 5
                                                                                                          Credit: 3,126,336
                                                                                                          RAC: 0
                                                                                                          Message 2159 - Posted 22 Feb 2007 9:05:00 UTC - in response to Message 2155.


                                                                                                            We have currently stopped sending out new workunits of the mappredictor application while we are trying to sort out this "stuck" problem. Our problem is that this error is a bit difficult to reproduce here, coz we never get it :)
                                                                                                            But maybe if we keep asking the right questions, you might be able to help us:

                                                                                                            The new application consists of two processes, mappredictor_1.12... (the boinc wrapper application), and predictor_1.12.. (the science application). If your workunit gets stuck and you have a look at the task manager, are both processes still active, or only one? which one? (make sure only one workunit of this application is running, otherwise you might see something that belongs to another workunit).

                                                                                                            Thanks for reporting, so far your comments were very helpful to us.
                                                                                                            cheers
                                                                                                            Michael



                                                                                                            mappredictor_1.12 - never saw this one running at any time in the task manager - it was there, just not using CPU
                                                                                                            predictor_1.12 - only process I ever saw running in task manager - would stay even after the unit had supposedly finished and reported

                                                                                                            Larry
                                                                                                            ____________

                                                                                                            FalconFly
                                                                                                            Joined: Mar 7 06
                                                                                                            Posts: 92
                                                                                                            Credit: 5,517,713
                                                                                                            RAC: 0
                                                                                                            Message 2190 - Posted 24 Feb 2007 10:39:51 UTC - in response to Message 2159.

                                                                                                              Last modified: 24 Feb 2007 10:49:56 UTC

                                                                                                              Not sure if it helps, but from my perspective, some Systems seem much more prone to the "Stuck WU" Problem than others.

                                                                                                              On 24 Systems (22 Linux, 2 Win2000), I've seen only 3 that have ever had this Problem repeatedly.
                                                                                                              These were Linux Boxes (two Dual Core, one Single Core) running an optimized 5.2.13 BOINC until a few days ago (I switched all of them to the official 5.8.11 release).

                                                                                                              One System had the Problem far more often than the two others (up to twice a day at peak times):
                                                                                                              Host 1598

                                                                                                              Still leaves me with no clue as to exactly why this was the one System most prone to have the Problem occur - it never had any Troubles with other Projects and did not show any abnormalities.

                                                                                                              It's a native 64bit Fedora Core 4 System, Terminal only (no GUI), some unneeded Services disabled and at Standard BIOS/Performance settings (no Tweaks or Overclock) - set up the same way I have all Linux Systems running.

                                                                                                              So far, after switching to 5.8.11, I have not witnessed any more Stuck WorkUnits, but it's probably too early to tell if that changed anything (I've had only 4.5 days, a cumulative 2600 hours of V5.8.11-based CPU time so far).

                                                                                                              Only if I see the new BOINC Version running without any Stuck WorkUnits for, let's say, a month (some 17000 hours total CPU time) would I go as far as to suspect that the new BOINC Version (somehow) fixed that elusive error.
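
As a quick check on those figures, a simple linear scaling of the observed CPU time gives about the same monthly total. This sketch assumes a 30-day month and a constant crunching rate across all hosts; the function name is hypothetical.

```python
def projected_cpu_hours(observed_hours: float, observed_days: float, target_days: float) -> float:
    """Scale observed CPU hours linearly to a longer observation period."""
    return observed_hours / observed_days * target_days


rate_per_day = 2600 / 4.5                       # CPU hours accumulated per day so far
month_total = projected_cpu_hours(2600, 4.5, 30)
print(f"{rate_per_day:.0f} CPU hours/day, about {month_total:.0f} CPU hours in 30 days")
```

That comes to roughly 578 CPU hours per day and a little over 17,300 CPU hours in 30 days, in line with the "some 17000 hours" estimate.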
                                                                                                              ____________
                                                                                                              Scientific Network : 44800 MHz - 77824 MB - 1970 GB

                                                                                                              FalconFly
                                                                                                              Joined: Mar 7 06
                                                                                                              Posts: 92
                                                                                                              Credit: 5,517,713
                                                                                                              RAC: 0
                                                                                                              Message 2207 - Posted 27 Feb 2007 21:56:45 UTC - in response to Message 2190.

                                                                                                                Last modified: 27 Feb 2007 22:03:16 UTC

                                                                                                                Seems I got my hopes up too soon.

                                                                                                                My Host 1598 caught a stuck one (WU_24_34_28669_0_497229148_2) again.

                                                                                                                The only odd thing is that it got stuck after 24m 10s instead of stalling right at the beginning (where I have seen most of the stuck ones so far).
                                                                                                                The task ran at 100% CPU load for over 100 Minutes, but basically made no progress - restarting BOINC helped as usual and the WorkUnit will now likely complete normally.




                                                                                                                ____________
                                                                                                                Scientific Network : 44800 MHz - 77824 MB - 1970 GB

                                                                                                                Michael
                                                                                                                Volunteer moderator
                                                                                                                Project scientist
                                                                                                                Joined: May 5 06
                                                                                                                Posts: 79
                                                                                                                Credit: 494
                                                                                                                RAC: 0
                                                                                                                Message 2217 - Posted 1 Mar 2007 11:16:49 UTC

                                                                                                                  Dear users,
                                                                                                                  We have fixed the bug which was causing the workunits to get stuck. We are planning to release a new application version today and will start sending out new workunits.
                                                                                                                  ____________
                                                                                                                  Michael

                                                                                                                  Franken_Power
                                                                                                                  Joined: Jan 5 07
                                                                                                                  Posts: 11
                                                                                                                  Credit: 304,316
                                                                                                                  RAC: 0
                                                                                                                  Message 2242 - Posted 5 Mar 2007 13:01:22 UTC - in response to Message 2159.


                                                                                                                    We have currently stopped sending out new workunits of the mappredictor application while we are trying to sort out this "stuck" problem.


                                                                                                                    Just received a new bunch of mapwc cands and all made an error.....

                                                                                                                    adrianxw
                                                                                                                    Joined: Mar 8 06
                                                                                                                    Posts: 145
                                                                                                                    Credit: 512,868
                                                                                                                    RAC: 200
                                                                                                                    Message 2305 - Posted 7 Mar 2007 8:30:50 UTC

                                                                                                                      Last modified: 7 Mar 2007 9:00:05 UTC

                                                                                                                      we have fixed the bug which was causing the workunits to get stuck.

                                                                                                                      Sorry, but no, you haven't!

                                                                                                                      This WU did the same as the other one I reported (this thread, 18/2). It got stuck, and when it stuck, it would not release the CPU until it finally crashed. My machine (not the same one as last time; it was this one) has claimed 0.08 seconds of CPU time, but if you look at the message log for this machine, you can see that it actually grabbed the CPU at 17:52 yesterday and held it until it crashed (or at least some event happened - see below) at 9:01 the next morning.

                                                                                                                      <core_client_version>5.8.11</core_client_version>
                                                                                                                      <![CDATA[
                                                                                                                      <message>
                                                                                                                      - exit code 1282 (0x502)
                                                                                                                      </message>
                                                                                                                      <stderr_txt>
                                                                                                                      o1
                                                                                                                      c1
                                                                                                                      app error: 0x502

                                                                                                                      entering rename_outfile
                                                                                                                      copying file to: ../../projects/www.malariacontrol.net/mapwca0044865.txt_1_0
                                                                                                                      error copying..2
                                                                                                                      copying file to: ../../projects/www.malariacontrol.net/mapwca0044865.txt_1_1
                                                                                                                      error copying..2
                                                                                                                      copying file to: ../../projects/www.malariacontrol.net/mapwca0044865.txt_1_2
                                                                                                                      error copying..2

                                                                                                                      </stderr_txt>
                                                                                                                      ]]>


                                                                                                                      The files it mentions in the above trace are not present in the target directory.

                                                                                                                      As you can see below, no other tasks ran in the interim; the only activity was Proteins@Home trying, and failing, to get in touch with its home, and MCDN reporting. All this time, SIMAP, Docking@Home and Rosetta were sitting waiting. At 22:36, Proteins tried again, got a "wait for 31 seconds", then nothing. There are no further log entries until MCDN aborts. 9:01 is "about" the time I arrived on site here. I saw the "task crashed send error report" type messagebox and pressed OK, so it may have been me pressing OK that started things up again. If I had not been to the site today, as I frequently am not, it may have sat like that for days.

                                                                                                                      This event demonstrates that BOINC is not dead, since the download/upload scheduling kept functioning for a while; it is, however, not swapping tasks. It is possible that MCDN actually crashed at 22:36 and posted the messagebox, causing BOINC to wait. I don't know, I wasn't here.

                                                                                                                      As I said before, I have machines at a site that I don't visit every day. This event occurred there. If you continue to send these wus, I will have no choice but to suspend MCDN at that site.

                                                                                                                      06/03/2007 17:52:13|proteins@home|Computation for task b.36.1.2.0A-63-46_0 finished
                                                                                                                      06/03/2007 17:52:13|malariacontrol.net beta|Starting mapwca0044865.txt_1
                                                                                                                      06/03/2007 17:52:14|malariacontrol.net beta|Starting task mapwca0044865.txt_1 using mappredictor version 114
                                                                                                                      06/03/2007 17:52:16|proteins@home|[file_xfer] Started upload of file b.36.1.2.0A-63-46_0_0.zip
                                                                                                                      06/03/2007 17:52:33|proteins@home|[file_xfer] Finished upload of file b.36.1.2.0A-63-46_0_0.zip
                                                                                                                      06/03/2007 17:52:33|proteins@home|[file_xfer] Throughput 53197 bytes/sec
                                                                                                                      06/03/2007 19:50:32|malariacontrol.net beta|Sending scheduler request: To report completed tasks
                                                                                                                      06/03/2007 19:50:32|malariacontrol.net beta|Reporting 1 tasks
                                                                                                                      06/03/2007 19:50:37|malariacontrol.net beta|Scheduler RPC succeeded [server version 507]
                                                                                                                      06/03/2007 19:50:37|malariacontrol.net beta|Deferring communication for 11 sec
                                                                                                                      06/03/2007 19:50:37|malariacontrol.net beta|Reason: requested by project
                                                                                                                      06/03/2007 20:16:36|proteins@home|Sending scheduler request: To report completed tasks
                                                                                                                      06/03/2007 20:16:36|proteins@home|Reporting 1 tasks
                                                                                                                      06/03/2007 20:16:58||Project communication failed: attempting access to reference site
                                                                                                                      06/03/2007 20:17:00||Access to reference site succeeded - project servers may be temporarily down.
                                                                                                                      06/03/2007 20:17:02|proteins@home|Scheduler request failed: couldn't connect to server
                                                                                                                      06/03/2007 20:17:02|proteins@home|Deferring communication for 1 min 0 sec
                                                                                                                      06/03/2007 20:17:02|proteins@home|Reason: scheduler request failed
                                                                                                                      06/03/2007 20:18:02|proteins@home|Sending scheduler request: To report completed tasks
                                                                                                                      06/03/2007 20:18:02|proteins@home|Reporting 1 tasks
                                                                                                                      06/03/2007 20:18:24||Project communication failed: attempting access to reference site
                                                                                                                      06/03/2007 20:18:25||Access to reference site succeeded - project servers may be temporarily down.
                                                                                                                      06/03/2007 20:18:27|proteins@home|Scheduler request failed: couldn't connect to server
                                                                                                                      06/03/2007 20:18:27|proteins@home|Deferring communication for 1 min 0 sec
                                                                                                                      06/03/2007 20:18:27|proteins@home|Reason: scheduler request failed
                                                                                                                      06/03/2007 20:19:28|proteins@home|Sending scheduler request: To report completed tasks
                                                                                                                      06/03/2007 20:19:28|proteins@home|Reporting 1 tasks
                                                                                                                      06/03/2007 20:19:50||Project communication failed: attempting access to reference site
                                                                                                                      06/03/2007 20:19:51||Access to reference site succeeded - project servers may be temporarily down.
                                                                                                                      06/03/2007 20:19:53|proteins@home|Scheduler request failed: couldn't connect to server
                                                                                                                      06/03/2007 20:19:53|proteins@home|Deferring communication for 1 min 0 sec
                                                                                                                      06/03/2007 20:19:53|proteins@home|Reason: scheduler request failed
                                                                                                                      06/03/2007 20:20:53|proteins@home|Sending scheduler request: To report completed tasks
                                                                                                                      06/03/2007 20:20:53|proteins@home|Reporting 1 tasks
                                                                                                                      06/03/2007 20:21:14||Project communication failed: attempting access to reference site
                                                                                                                      06/03/2007 20:21:17||Access to reference site succeeded - project servers may be temporarily down.
                                                                                                                      06/03/2007 20:21:19|proteins@home|Scheduler request failed: couldn't connect to server
                                                                                                                      06/03/2007 20:21:19|proteins@home|Deferring communication for 1 min 0 sec
                                                                                                                      06/03/2007 20:21:19|proteins@home|Reason: scheduler request failed
                                                                                                                      06/03/2007 20:22:19|proteins@home|Sending scheduler request: To report completed tasks
                                                                                                                      06/03/2007 20:22:19|proteins@home|Reporting 1 tasks
                                                                                                                      06/03/2007 20:22:41||Project communication failed: attempting access to reference site
                                                                                                                      06/03/2007 20:22:42||Access to reference site succeeded - project servers may be temporarily down.
                                                                                                                      06/03/2007 20:22:44|proteins@home|Scheduler request failed: couldn't connect to server
                                                                                                                      06/03/2007 20:22:44|proteins@home|Deferring communication for 1 min 11 sec
                                                                                                                      06/03/2007 20:22:44|proteins@home|Reason: scheduler request failed
                                                                                                                      06/03/2007 20:24:00|proteins@home|Sending scheduler request: To report completed tasks
                                                                                                                      06/03/2007 20:24:00|proteins@home|Reporting 1 tasks
                                                                                                                      06/03/2007 20:24:22||Project communication failed: attempting access to reference site
                                                                                                                      06/03/2007 20:24:23||Access to reference site succeeded - project servers may be temporarily down.
                                                                                                                      06/03/2007 20:24:25|proteins@home|Scheduler request failed: couldn't connect to server
                                                                                                                      06/03/2007 20:24:25|proteins@home|Deferring communication for 4 min 9 sec
                                                                                                                      06/03/2007 20:24:25|proteins@home|Reason: scheduler request failed
                                                                                                                      06/03/2007 20:28:37|proteins@home|Sending scheduler request: To report completed tasks
                                                                                                                      06/03/2007 20:28:37|proteins@home|Reporting 1 tasks
                                                                                                                      06/03/2007 20:28:59||Project communication failed: attempting access to reference site
                                                                                                                      06/03/2007 20:29:02||Access to reference site succeeded - project servers may be temporarily down.
                                                                                                                      06/03/2007 20:29:02|proteins@home|Scheduler request failed: couldn't connect to server
                                                                                                                      06/03/2007 20:29:02|proteins@home|Deferring communication for 11 min 39 sec
                                                                                                                      06/03/2007 20:29:02|proteins@home|Reason: scheduler request failed
                                                                                                                      06/03/2007 20:40:46|proteins@home|Sending scheduler request: To report completed tasks
                                                                                                                      06/03/2007 20:40:46|proteins@home|Reporting 1 tasks
                                                                                                                      06/03/2007 20:41:08||Project communication failed: attempting access to reference site
                                                                                                                      06/03/2007 20:41:14|proteins@home|Scheduler request failed: couldn't connect to server
                                                                                                                      06/03/2007 20:41:14|proteins@home|Deferring communication for 14 min 21 sec
                                                                                                                      06/03/2007 20:41:14|proteins@home|Reason: scheduler request failed
                                                                                                                      06/03/2007 20:41:15||Access to reference site succeeded - project servers may be temporarily down.
                                                                                                                      06/03/2007 20:55:40|proteins@home|Sending scheduler request: To report completed tasks
                                                                                                                      06/03/2007 20:55:40|proteins@home|Reporting 1 tasks
                                                                                                                      06/03/2007 20:56:02||Project communication failed: attempting access to reference site
                                                                                                                      06/03/2007 20:56:04||Access to reference site succeeded - project servers may be temporarily down.
                                                                                                                      06/03/2007 20:56:06|proteins@home|Scheduler request failed: couldn't connect to server
                                                                                                                      06/03/2007 20:56:06|proteins@home|Deferring communication for 1 hr 39 min 57 sec
                                                                                                                      06/03/2007 20:56:06|proteins@home|Reason: scheduler request failed
                                                                                                                      06/03/2007 22:36:07|proteins@home|Sending scheduler request: To report completed tasks
                                                                                                                      06/03/2007 22:36:07|proteins@home|Reporting 1 tasks
                                                                                                                      06/03/2007 22:36:13|proteins@home|Scheduler RPC succeeded [server version 509]
                                                                                                                      06/03/2007 22:36:13|proteins@home|Deferring communication for 31 sec
                                                                                                                      06/03/2007 22:36:13|proteins@home|Reason: requested by project

                                                                                                                      07/03/2007 09:01:02|malariacontrol.net beta|Deferring communication for 1 min 0 sec
                                                                                                                      07/03/2007 09:01:02|malariacontrol.net beta|Reason: Unrecoverable error for result mapwca0044865.txt_1 ( - exit code 1282 (0x502))
                                                                                                                      07/03/2007 09:01:02|malariacontrol.net beta|Computation for task mapwca0044865.txt_1 finished
                                                                                                                      07/03/2007 09:01:02|malariacontrol.net beta|Output file mapwca0044865.txt_1_0 for task mapwca0044865.txt_1 absent
                                                                                                                      07/03/2007 09:01:02|malariacontrol.net beta|Output file mapwca0044865.txt_1_1 for task mapwca0044865.txt_1 absent
                                                                                                                      07/03/2007 09:01:02|malariacontrol.net beta|Output file mapwca0044865.txt_1_2 for task mapwca0044865.txt_1 absent
                                                                                                                      07/03/2007 09:01:02|boincsimap|Restarting task 70303001.015376_0 using hmmer version 509
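
The silent period in a log like the one above can be measured rather than eyeballed. The following is a minimal sketch, assuming every entry follows the `DD/MM/YYYY HH:MM:SS|project|message` layout seen in the excerpt. The helper names `parse_line` and `largest_gap` are hypothetical and not part of BOINC, and only three lines copied from the log are used as sample input.

```python
from datetime import datetime

def parse_line(line):
    """Split one message-log line into (timestamp, project, message).

    Assumes the 'DD/MM/YYYY HH:MM:SS|project|message' layout from the excerpt.
    The project field may be empty, as in the '||Project communication failed' lines.
    """
    stamp, project, message = line.strip().split("|", 2)
    return datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S"), project, message

def largest_gap(lines):
    """Return (gap, entry_before, entry_after) for the longest silence in the log."""
    entries = [parse_line(l) for l in lines if l.strip()]
    best = None
    for prev, cur in zip(entries, entries[1:]):
        gap = cur[0] - prev[0]
        if best is None or gap > best[0]:
            best = (gap, prev, cur)
    return best

# Three entries copied from the log above: task start, last entry before the
# silence, and the first entry after it.
sample = [
    "06/03/2007 17:52:14|malariacontrol.net beta|Starting task mapwca0044865.txt_1 using mappredictor version 114",
    "06/03/2007 22:36:13|proteins@home|Reason: requested by project",
    "07/03/2007 09:01:02|malariacontrol.net beta|Computation for task mapwca0044865.txt_1 finished",
]

gap, before, after = largest_gap(sample)
# Wall-clock time between the task starting and being reported finished.
held = parse_line(sample[-1])[0] - parse_line(sample[0])[0]

print("longest silence:", gap)       # 10:24:49
print("task held the slot:", held)   # 15:08:48
```

On these three entries the longest silence is 10 h 24 min 49 s, and the task occupied its slot for 15 h 8 min 48 s of wall-clock time against the 0.08 seconds of CPU time claimed.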

                                                                                                                      ____________
                                                                                                                      Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

                                                                                                                      Profile maire
                                                                                                                      Volunteer moderator
                                                                                                                      Project administrator
                                                                                                                      Project developer
                                                                                                                      Project scientist
                                                                                                                      Joined: Nov 7 05
                                                                                                                      Posts: 439
                                                                                                                      Credit: 118,258
                                                                                                                      RAC: 0
                                                                                                                      Message 2378 - Posted 16 Mar 2007 17:24:34 UTC

                                                                                                                        Except for a few workunits that are being resent, there should be no more map predictor jobs. This app in its current version caused more trouble than we expected. We are currently working on a new version.

                                                                                                                        We won't start sending new workunits for this application without prior warning. After successful in-house testing, we will first start distributing them on an opt-in basis to those users who agree to receive beta application work (beta in beta in our case...). Those workunits that have been sent back already have been used to create a preliminary map.
                                                                                                                        Nick


                                                                                                                        ____________
                                                                                                                        Nicolas Maire
                                                                                                                        Swiss Tropical and Public Health Institute
                                                                                                                        http://www.swisstph.ch

                                                                                                                        adrianxw
                                                                                                                        Joined: Mar 8 06
                                                                                                                        Posts: 145
                                                                                                                        Credit: 512,868
                                                                                                                        RAC: 200
                                                                                                                        Message 2379 - Posted 16 Mar 2007 17:49:40 UTC

                                                                                                                          Cheers Nick. I'm quite happy to run it here, but at my remote site it is risky. I'm sure you understand.
                                                                                                                          ____________
                                                                                                                          Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

                                                                                                                          j2satx
                                                                                                                          Joined: Jan 4 07
                                                                                                                          Posts: 12
                                                                                                                          Credit: 2,124,846
                                                                                                                          RAC: 7
                                                                                                                          Message 2450 - Posted 30 Mar 2007 13:29:12 UTC - in response to Message 2378.

                                                                                                                            Except for a few workunits that are being resent, there should be no more map predictor jobs. This app in its current version caused more trouble than we expected. We are currently working on a new version.

                                                                                                                            We won't start sending new workunits for this application without prior warning. After successful in-house testing, we will first start distributing them on an opt-in basis to those users who agree to receive beta application work (beta in beta in our case...). Those workunits that have been sent back already have been used to create a preliminary map.
                                                                                                                            Nick



                                                                                                                            How does one "opt-in" to receive test WUs?

                                                                                                                            Ken Vogt
                                                                                                                            Joined: Mar 7 06
                                                                                                                            Posts: 3
                                                                                                                            Credit: 65,618
                                                                                                                            RAC: 0
                                                                                                                            Message 2466 - Posted 1 Apr 2007 13:51:53 UTC - in response to Message 2450.

                                                                                                                              How does one "opt-in" to receive test WUs?

                                                                                                                              Hi j2satx, see this thread.
                                                                                                                              ____________
                                                                                                                              Ken
