Slow performance, Possible invalid status- Suggestions?

jay_e
Joined: Oct 13 09
Posts: 39
Credit: 396,212
RAC: 361

Greetings,
A low priority request....
I have a WU that is taking a lot of time and is making slow progress - according to BOINC's time to complete..
Here is what BOINC shows at 12:57PM EDT
CPU Time: 09:29:44
Progress: 65.39%
To Complete: 04:10:38

After 90 minutes, BOINC shows:
CPU Time: 10:46:30
Progress: 70.49%
To Complete: 03:49:27
--
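A rough way to sanity-check BOINC's "To Complete" figure is to extrapolate from the two samples above. This is only a sketch: it assumes progress is linear in CPU time, which the application does not guarantee, and the function names are made up for illustration.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string, as shown by BOINC, into seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def remaining_cpu_seconds(t1: str, p1: float, t2: str, p2: float) -> float:
    """Extrapolate the CPU time still needed from two (CPU time, progress %) samples.

    Assumption: progress keeps advancing at the rate seen between the samples.
    """
    seconds_per_percent = (hms_to_seconds(t2) - hms_to_seconds(t1)) / (p2 - p1)
    return seconds_per_percent * (100.0 - p2)


# The two samples quoted in the post.
remaining = remaining_cpu_seconds("09:29:44", 65.39, "10:46:30", 70.49)
print(f"{remaining / 3600:.1f} hours of CPU time left at the observed rate")
```

At the observed rate this comes out to roughly 7.4 hours of CPU time, about double what BOINC was reporting.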


The WU: https://malariacontrol.net/workunit.php?wuid=24117818
--


The PC environment:
I have an old PC that has been reloaded with Debian Linux and has been put to work on BOINC.
Usually, I only run SETI and Malaria control on this PC.
Here is what BOINC says about it:
Libraries: libcurl/7.18.2 OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.8 libssh2/0.18
Data directory: /var/lib/boinc-client
Processor: 1 GenuineIntel Intel(R) Pentium(R) 4 CPU 1.60GHz [Family 15 Model 1 Stepping 2]
Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm up pebs bts
OS: Linux: 2.6.26-2-686
Memory: 630.56 MB physical, 4.66 GB virtual
Disk: 55.01 GB total, 49.45 GB free
--


Utilization Utilities
While the MC application is running, the system monitor says that about 62% of the physical memory is in use,
so memory should not be a problem. (When SETI is running, the entire system uses 23%.)

ps -fly -p 10557
S UID PID PPID C PRI NI RSS SZ WCHAN STIME TTY TIME CMD
R boinc 10557 2038 88 99 - 319056 83672 - 12:56 ? 01:26:20 openMalariaB_6.46_i686-pc-linux-gnu --compress-checkpoints=1
--
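For a second opinion on the memory figure, the RSS column of the ps output above can be compared with the physical memory BOINC reports. A minimal sketch, assuming RSS is in kilobytes (as procps documents for this column) and treating BOINC's "630.56 MB" as MiB, which is an assumption; the helper name is hypothetical.

```python
PS_HEADER = "S UID PID PPID C PRI NI RSS SZ WCHAN STIME TTY TIME CMD"
PS_ROW = ("R boinc 10557 2038 88 99 - 319056 83672 - 12:56 ? 01:26:20 "
          "openMalariaB_6.46_i686-pc-linux-gnu --compress-checkpoints=1")


def rss_kib(header: str, row: str) -> int:
    """Return the RSS field of one `ps -fly` row, located via the header line."""
    columns = header.split()
    # Limit the split so the command and its arguments stay in the last field.
    fields = row.split(None, len(columns) - 1)
    return int(fields[columns.index("RSS")])


PHYSICAL_MIB = 630.56  # "Memory: 630.56 MB physical" from BOINC, assumed to mean MiB

rss_mib = rss_kib(PS_HEADER, PS_ROW) / 1024
share = rss_mib / PHYSICAL_MIB
print(f"{rss_mib:.0f} MiB resident, {share:.0%} of physical memory")
```

RSS only counts pages currently resident, so it can understate what the process would like to keep in memory.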


Logs - Looking for errors..
/var/lib/boinc-client/slots
This message repeats 7 times:
Warning: ../../projects/malariacontrol.net/wu_760_520_225041_0_1286930896 uses a
potentially incompatible old schema version (20; current is 21).
Use SchemaTranslator to update.
Each message is followed with a short text. Here they are:
9120 RC
12996 RC
16028 RC
18753 RC
19323 RC
21291 RC
---
No related errors in syslog or messages.
---------------

Summary
On another, newer, faster machine, I had a WU marked invalid that had the same error message.

Question
========
Should I go ahead and abort this WU, or let it continue?
Any suggestions on how to avoid?

THANKS!!!,
Jay


PS
I'm using the version of BOINC that comes with the Debian distribution.
I tried the latest - but had other problems; so, I reinstalled from the Debian distribution.

PPS
Here is what BOINC downloaded for this WU:
Wed 13 Oct 2010 03:33:35 AM EDT|malariacontrol.net|Scheduler request succeeded: got 1 new tasks
Wed 13 Oct 2010 03:33:35 AM EDT|malariacontrol.net|New computer location: work
Wed 13 Oct 2010 03:33:38 AM EDT|malariacontrol.net|Started download of openMalariaB_6.46_i686-pc-linux-gnu
Wed 13 Oct 2010 03:33:38 AM EDT|malariacontrol.net|Started download of wu_760_520_225041_0_1286930896
Wed 13 Oct 2010 03:33:41 AM EDT|malariacontrol.net|Finished download of wu_760_520_225041_0_1286930896
Wed 13 Oct 2010 03:33:41 AM EDT|malariacontrol.net|Started download of densities.csv
Wed 13 Oct 2010 03:33:42 AM EDT|malariacontrol.net|Finished download of densities.csv
Wed 13 Oct 2010 03:33:42 AM EDT|malariacontrol.net|Started download of scenario_20.xsd
Wed 13 Oct 2010 03:33:44 AM EDT|malariacontrol.net|Finished download of scenario_20.xsd
Wed 13 Oct 2010 03:35:02 AM EDT|malariacontrol.net|Finished download of openMalariaB_6.46_i686-pc-linux-gnu




jay_e
Joined: Oct 13 09
Posts: 39
Credit: 396,212
RAC: 361

Here is an update... At 5:30 EDT, BOINC shows:
CPU Time: 11:48:49
Progress: 74.642%
To Complete: 03:28:30

So, in approx. 4.5 hours wall-clock time, the "To Complete" time has decreased
by 40 minutes.

Also, /var/lib/boinc-client/stderrdae.txt
contains "SIGPIPE: write on a pipe with no reader" 82 times.
This may be because I close the BOINC GUI when it is not in use?

/var/lib/boinc-client/slots/0/stderr.txt has no new errors.


I read that I can ignore the old schema warning...
https://malariacontrol.net/forum_thread.php?id=1049&nowrap=true#13971

Thanks in advance,
Jay

hardy
Volunteer moderator
Project administrator
Project developer
Joined: Feb 18 09
Posts: 141
Credit: 54,376
RAC: 129

Hi Jay

We have already received two results back for this work unit, though they don't agree (hence the third being sent out). However, they both have a CPU time of about 8500 (2+1/3 hours) — granted, the CPUs are faster than yours, but yours should probably still not take 16 hours or so.

I noticed that your machine has only 630 MB of RAM; unfortunately our openmalaria app can require over 400 MB to itself at times. I expect this could be the problem even though your resource monitor suggests otherwise (measuring real memory usage isn't very straightforward). We intend to reduce the memory requirements with an update at some point, but I can't say exactly when this will happen, so the best I can advise is to switch to another BOINC project requiring less memory!

The BOINC client in Debian squeeze is working fine for me by the way.

jay_e
Joined: Oct 13 09
Posts: 39
Credit: 396,212
RAC: 361

Hardy,
Thank you for the info!!
I'll resume the WU....

For fun, I tried doing a vmstat under different loads...
There's nothing like confusing the issue with more data. :-)
Idle(mostly)
=============

vmstat 10 4
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
1 233096 0 0 3 1 0 0 3 233096 0 0 0 9 11 102 3 1 97 0 0 233096 0 0 0 2 9 104 3 1 97 0 0 233096 0 0 0 1 10 104 3 0 96 0

Running a SETI WU
==================

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
2 233400 0 0 3 1 0 0 1 233400 0 0 1 0 0 1 233400 0 0 0 8 10 130 100 0 0 0 1 233400 0 0 0 1 9 126 100 0 0 0 1 233400 0 0 0 2 9 125 100 0 0 0

Idle, then resuming the MC WU
==============================

vmstat 10 10
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
2 233392 0 0 3 1 0 0 4 233392 0 0 0 10 10 104 3 1 96 0 0 233392 0 0 0 13 12 101 3 1 96 0 2 233392 0 0 0 2 10 107 5 1 94 0 1 234828 0 0 144 27 59 1077 59 5 36 0 1 44 234956 0 0 13 2 9 129 96 4 0 0 1 60 234956 0 0 1 0 0 1 68 234984 0 0 1 0 0 1 0 44344 7964 69568 234984 0 0 0 1 10 133 99 1 0 0 1 0 44344 7336 68080 233988 0 0 1 2 9 131 99 1 0 0

After running the MC WU for 5 minutes..
=======================================

vmstat 10 10
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
2 167584 0 0 3 1 0 0 1 167584 0 0 0 9 10 148 99 1 0 0 1 167584 0 0 0 1 10 142 99 1 0 0 1 167584 0 0 0 2 10 143 100 0 0 0 1 167584 0 0 0 4 9 143 99 1 0 0 1 167584 0 0 0 2 10 138 99 1 0 0 1 167584 0 0 0 2 8 139 99 1 0 0 1 167584 0 0 0 2 9 141 99 1 0 0 1 168612 0 0 0 3 10 141 99 1 0 0 1 6 193156 0 0 0 1565 19 162 97 3 0 0


Yes, it looks like I'm just squeaking by with memory and avoiding swapping and disk-thrashing. I'll take a look for an older-style memory card.

Overkill - WU and Update and safe-upgrade of OS and apps
========================================================

vmstat 10 10
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
2 0 68408 7724 3216 141700 0 0 3 1 0 0 1 0 71568 7228 2980 151712 1 421 767 3099 78 337 95 5 0 0 1 0 80780 6444 2864 160984 0 1000 0 3138 38 220 96 4 0 0 1 1 94000 7500 1824 155760 0 1260 560 3361 40 250 91 4 4 1 1 0 147112 312 50 4636 120 127 421 95 5 0 0 1 1 94992 7488 2164 141888 0 120 515 192 41 269 97 3 0 0 2 2 14335 28 222 97 3 0 0 1 2 143408 3 0 1 0 0 1 8 143408 0 0 0 2 9 144 99 1 0 0 1 6 143408 3 0 0 1 0 0

Thanks again!!
Jay

PS
Procs
r: The number of processes waiting for run time.
b: The number of processes in uninterruptible sleep.

Memory
swpd: the amount of virtual memory used.
free: the amount of idle memory.
buff: the amount of memory used as buffers.
cache: the amount of memory used as cache.

Swap
si: Amount of memory swapped in from disk (/s).
so: Amount of memory swapped to disk (/s).

IO
bi: Blocks received from a block device (blocks/s).
bo: Blocks sent to a block device (blocks/s).

System
in: The number of interrupts per second, including the clock.
cs: The number of context switches per second.

CPU
These are percentages of total CPU time.
us: Time spent running non-kernel code. (user time, including nice time)
sy: Time spent running kernel code. (system time)
id: Time spent idle. Prior to Linux 2.5.41, this includes IO-wait time.
wa: Time spent waiting for IO. Prior to Linux 2.5.41, included in idle.
st: Time stolen from a virtual machine. Prior to Linux 2.6.11, unknown.
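Using the field descriptions above, a small script can flag the vmstat samples in which the machine was actually swapping (non-zero si or so). This is a sketch that assumes the 16-column layout shown in the vmstat header; the function names are hypothetical, and the two sample rows are complete rows taken from the output in this post.

```python
VMSTAT_COLUMNS = "r b swpd free buff cache si so bi bo in cs us sy id wa".split()


def parse_row(line: str) -> dict:
    """Map one complete vmstat data row onto the column names."""
    values = [int(value) for value in line.split()]
    if len(values) != len(VMSTAT_COLUMNS):
        raise ValueError(f"expected {len(VMSTAT_COLUMNS)} fields, got {len(values)}")
    return dict(zip(VMSTAT_COLUMNS, values))


def is_swapping(row: dict) -> bool:
    """True when memory was swapped in or out during the sample interval."""
    return row["si"] > 0 or row["so"] > 0


# One row from resuming the MC WU, one from the "Overkill" run.
quiet = parse_row("1 0 44344 7964 69568 234984 0 0 0 1 10 133 99 1 0 0")
busy = parse_row("1 0 71568 7228 2980 151712 1 421 767 3099 78 337 95 5 0 0")

for name, row in (("MC WU only", quiet), ("Overkill", busy)):
    print(f"{name}: si={row['si']} so={row['so']} swapping={is_swapping(row)}")
```

A non-zero swpd on its own only means something was swapped out at some point; sustained si/so activity is what indicates thrashing.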

mikey
Joined: Mar 23 07
Posts: 4382
Credit: 5,361,193
RAC: 1,084

You need to add more memory to that PC! Each Malaria unit takes about 500 MB of RAM, and that is about all you have! So each unit is swapping to the hard drive, making it take FOREVER!!! The ideal unit will fit not only into memory but into the L2 cache as well. If it won't fit into memory (and remember the OS needs some too), the system uses the hard drive as virtual memory, which is horribly slow by comparison!!

jay_e
Joined: Oct 13 09
Posts: 39
Credit: 396,212
RAC: 361

Greetings!!

Well, the disk activity light was not blinking and the Swap-in and swap-out was zero while the WU was running.

(See VMSTAT data on previous message...)

But I did do an "Overkill" scenario by doing a software update while the WU was running, to verify that the disk activity light and the SI and SO values were reliable.

And yes, a trip to the part store is in order to get 500 meg or a gig of ram.

And yes, the WU did finish, with the elapsed time being about three times as much as estimated. But it may take a while for the run-time estimator to get a good average for the "B" application.

Thanks for your input!

Jay

Copyright © 2013 africa@home