BOINC V5.8.x Computer Info/Listings Bug

FalconFly
Joined: Mar 7 06
Posts: 92
Credit: 5,517,713
RAC: 0

I recently switched all my Systems to V5.8.11, later a few to 5.8.15 and noted the following :

- Computer Listings now include lots of additional CPU Information (while useful for BOINC itself to have it, does it really belong here ?)

Differences across Operating Systems :

Windows2000 SP4
AuthenticAMD
AMD Athlon(tm) XP 3000+ [x86 Family 6 Model 10 Stepping 0] [fpu tsc sse 3dnow mmx]

Linux (2.4.x up to 2.6.15 Kernel, various Distributions)
AuthenticAMD
AMD Athlon(tm) XP 3000+ [fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow]

Linux (2.6.18 Kernel - Fedora Core 6)
AuthenticAMD

----------------------
Windows Systems list Family/Model/Stepping, but only a few CPU Extensions.

Linux Systems up to 2.6.15 Kernel list only the CPU Extensions, but there it lists just about... everything.

Linux Systems with 2.6.18 Kernel, however, show only the CPU Manufacturer and that's it (?)
(have this on two separate Systems running FC6)

Note :
BOINCview V1.4.2b, which oversees the entire Network, sees the CPU Identification correctly on the 2.6.18 Systems, so BOINC 5.8 is reading/using it there - it just fails to report it into the Computer listing.
-----------------
Apart from the additional Info completely clogging up the Computer Listing pages, I believe this is a Bug worth looking into.

And while they're at it, the BOINC Devs could also improve the Linux BOINC Benchmark results by about 14% to match their Win32 counterpart (it's beyond me why, after more than 2 years, results across Operating Systems still aren't within 5% of each other).

----------------
After I posted this at EAH as well, it was pointed out to me that BOINC 5.8 on a later 2.6.19 Kernel apparently does not exhibit this problem.
____________
Scientific Network : 44800 MHz - 77824 MB - 1970 GB

FalconFly
Joined: Mar 7 06
Posts: 92
Credit: 5,517,713
RAC: 0

Update :

I used the internal Update mechanism of Fedora Core (yum) to see if that helps.

But even after upgrading everything (now running 2.6.20 Kernel), BOINC 5.8.15 still sees only the CPU Vendor but nothing else.

Any hints as to what could prevent BOINC from correctly reading out CPU Model and Extensions on Fedora Core 6 ?

Chris Sutton
Joined: Nov 10 05
Posts: 297
Credit: 4,941,683
RAC: 0

Any hints as to what could prevent BOINC from correctly reading out CPU Model and Extensions on Fedora Core 6 ?

I may be way off base here, so please treat with ample salt...

Doesn't BOINC read the CPU stuff from the /proc area? Has the 2.6.18 kernel perhaps moved it somewhere else? Or maybe the permissions have changed such that the BOINC process owner can't read the necessary /proc area anymore? (Permission problems can usually be identified easily by running BOINC once as root and observing the outcome.)

Just some ideas off the top of my head.
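
For anyone wanting to try the /proc idea by hand, here is a minimal Python sketch (not BOINC's own code) that pulls the vendor, model name and flags out of /proc/cpuinfo-style text. The function name `parse_cpuinfo` and the `SAMPLE` text are made up for illustration; the field names `vendor_id`, `model name` and `flags` are the ones x86 Linux kernels print.

```python
def parse_cpuinfo(text):
    """Return vendor, model name and flag list of the first CPU entry."""
    info = {}
    for line in text.splitlines():
        if not line.strip():
            if info:
                break  # a blank line ends the first processor block
            continue
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return {
        "vendor": info.get("vendor_id", ""),
        "model": info.get("model name", ""),
        "flags": info.get("flags", "").split(),
    }


# Shortened, hypothetical sample in the usual "key : value" layout.
SAMPLE = """processor   : 0
vendor_id   : AuthenticAMD
model name  : AMD Athlon(tm) 64 X2 Dual Core Processor 4000+
flags       : fpu vme de pse tsc

processor   : 1
vendor_id   : AuthenticAMD
"""

parsed = parse_cpuinfo(SAMPLE)
print(parsed["vendor"], "|", parsed["model"], "|", parsed["flags"])
```

On a real machine, calling `parse_cpuinfo(open("/proc/cpuinfo").read())` once as the BOINC user and once as root would show whether permissions make any difference.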

FalconFly
Joined: Mar 7 06
Posts: 92
Credit: 5,517,713
RAC: 0

I've checked and compared the client_state.xml which holds the data.

Oddly enough, it is absolutely correct on the affected Systems and compares 1:1 to other (not affected) Systems.

So basically the Data is all there, but it just gets lost somewhere on the way into the Project's Computer Listing.

Chris Sutton
Joined: Nov 10 05
Posts: 297
Credit: 4,941,683
RAC: 0

So basically the Data is all there, but it just gets lost somewhere on the way into the Project's Computer Listing.

Aah, ok.
Are there any funny characters after the CPU vendor (maybe localisation or > & < chars) that could "break" the XML being sent to the project in a scheduler request?
[edit]
Or maybe the closing xml tag is missing, or invalid?
[/edit]
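
A quick way to test both ideas (stray characters, missing closing tag) is to run the fragment through an XML parser. The following is only a sketch using Python's standard `xml.etree.ElementTree`; the helper name `check_host_info` and the two sample fragments are hypothetical, and it assumes the `<host_info>` block has been copied out of client_state.xml into a string.

```python
import xml.etree.ElementTree as ET


def check_host_info(xml_text):
    """Report whether a <host_info> fragment parses and how long p_model is."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # Raised for unescaped & or <, or for missing/mismatched closing tags.
        return {"well_formed": False, "error": str(err)}
    vendor = (root.findtext("p_vendor") or "").strip()
    model = (root.findtext("p_model") or "").strip()
    return {"well_formed": True, "vendor": vendor, "model_length": len(model)}


# Hypothetical fragments: one intact, one with the closing </p_model> lost.
GOOD = ("<host_info><p_vendor>AuthenticAMD</p_vendor>"
        "<p_model>CPU [fpu]</p_model></host_info>")
BROKEN = "<host_info><p_model>CPU [fpu</host_info>"

print(check_host_info(GOOD))
print(check_host_info(BROKEN)["well_formed"])
```

If the fragment parses cleanly, the reported `model_length` is also worth a look, since very long model strings are a likely suspect.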

FalconFly
Joined: Mar 7 06
Posts: 92
Credit: 5,517,713
RAC: 0

Now that's the odd thing.

The lines <p_vendor> and <p_model> on the affected machines are absolutely correct, the structure is identical with nothing abnormal (compared to my other machines) as far as I can see.

That's why I have no idea why only the CPU Vendor makes it into the Listings of all Projects I'm attached to (Malaria, LHC, EAH).

I left a note on the BOINC MessageBoards but so far no solution is in sight.

There's nothing different on these Fedora Core 6 Installations compared to my other Linux installs. And since BOINC works just fine on them in all other respects, I'm out of ideas.

Affected machines :

<host_info>
<timezone>3600</timezone>
<domain_name>Two</domain_name>
<ip_addr>127.0.0.1</ip_addr>
<host_cpid>32151634b18184a67c64ff9696a92100</host_cpid>
<p_ncpus>2</p_ncpus>
<p_vendor>AuthenticAMD</p_vendor>
<p_model>AMD Athlon(tm) 64 X2 Dual Core Processor 4000+ [fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow pni cx16 lahf_lm cmp_legacy svm cr8_legacy]</p_model>
<p_fpops>1821726573.128186</p_fpops>
<p_iops>3041526558.994365</p_iops>
<p_membw>1000000000.000000</p_membw>
<p_calculated>1174000914.683204</p_calculated>
<m_nbytes>1049313280.000000</m_nbytes>
<m_cache>524288.000000</m_cache>
<m_swap>2080366592.000000</m_swap>
<d_total>116912492544.000000</d_total>
<d_free>109149900800.000000</d_free>
<os_name>Linux</os_name>
<os_version>2.6.20-1.2925.fc6</os_version>
<accelerators>S3 Vision864</accelerators>
</host_info>

====================================

<host_info>
<timezone>3600</timezone>
<domain_name>Seven</domain_name>
<ip_addr>192.168.10.7</ip_addr>
<host_cpid>3a9f5c095eb3862651f89b2d942f8534</host_cpid>
<p_ncpus>2</p_ncpus>
<p_vendor>AuthenticAMD</p_vendor>
<p_model>AMD Athlon(tm) 64 X2 Dual Core Processor 3600+ [fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow pni cx16 lahf_lm cmp_legacy svm cr8_legacy]</p_model>
<p_fpops>1737633334.006470</p_fpops>
<p_iops>2910957752.381352</p_iops>
<p_membw>1000000000.000000</p_membw>
<p_calculated>1173977516.557949</p_calculated>
<m_nbytes>1049313280.000000</m_nbytes>
<m_cache>262144.000000</m_cache>
<m_swap>2080366592.000000</m_swap>
<d_total>75341406208.000000</d_total>
<d_free>69706780672.000000</d_free>
<os_name>Linux</os_name>
<os_version>2.6.20-1.2925.fc6</os_version>
<accelerators></accelerators>
</host_info>

====================================

<host_info>
<timezone>3600</timezone>
<domain_name>Eleven</domain_name>
<ip_addr>127.0.0.1</ip_addr>
<host_cpid>95a30a10f2bd2687c858c9bc6d009386</host_cpid>
<p_ncpus>2</p_ncpus>
<p_vendor>AuthenticAMD</p_vendor>
<p_model>AMD Athlon(tm) 64 X2 Dual Core Processor 3600+ [fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow pni cx16 lahf_lm cmp_legacy svm cr8_legacy]</p_model>
<p_fpops>1732994026.022534</p_fpops>
<p_iops>2971513138.927971</p_iops>
<p_membw>1000000000.000000</p_membw>
<p_calculated>1173977649.249337</p_calculated>
<m_nbytes>1049247744.000000</m_nbytes>
<m_cache>262144.000000</m_cache>
<m_swap>2080366592.000000</m_swap>
<d_total>7638237184.000000</d_total>
<d_free>5561090048.000000</d_free>
<os_name>Linux</os_name>
<os_version>2.6.18-1.2798.fc6</os_version>
<accelerators></accelerators>
</host_info>

====================================

Other machine not affected :

<host_info>
<timezone>3600</timezone>
<domain_name>Four</domain_name>
<ip_addr>127.0.0.1</ip_addr>
<host_cpid>c41133bc4cf3ae79b0a0f4566d149fcc</host_cpid>
<p_ncpus>2</p_ncpus>
<p_vendor>AuthenticAMD</p_vendor>
<p_model>AMD Athlon(tm) 64 X2 Dual Core Processor 4600+ [fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni cx16 lahf_lm cmp_legacy svm cr8_legacy]</p_model>
<p_fpops>2063037007.106441</p_fpops>
<p_iops>3499431052.433551</p_iops>
<p_membw>1000000000.000000</p_membw>
<p_calculated>1173977680.415684</p_calculated>
<m_nbytes>1049890816.000000</m_nbytes>
<m_cache>524288.000000</m_cache>
<m_swap>2080366592.000000</m_swap>
<d_total>75374960640.000000</d_total>
<d_free>43962748928.000000</d_free>
<os_name>Linux</os_name>
<os_version>2.6.15-1.2054_FC5</os_version>
<accelerators></accelerators>
</host_info>

Chris Sutton
Joined: Nov 10 05
Posts: 297
Credit: 4,941,683
RAC: 0

I left a note on the BOINC MessageBoards but so far no solution is in sight.

The host table in the BOINC schema only allows 254 chars for storing the model data. The values that you've posted are pretty close to that upper limit. I suspect that somewhere along the line the limit is being exceeded, and the closing tag is lost as a result.

In the other thread, suggest that the devs look into this possibility.
Someone will need to trace the field from start to finish, i.e. from reading the value in client_state.xml through to storing it in the db, and displaying it in the php page.

I'd volunteer, but I haven't looked at the source for about a year, so it would probably take me ages...

I'd be very surprised if they didn't find the problem somewhere there. ;-)

[edit]
As a quick test, you might try cutting a chunk out of the field in one of your affected hosts, and observing whether or not it reflects...
Ok, scratch that. It seems to get overwritten at startup and the file can't be edited while boinc is running. No quick test option there, sorry.

Ok, digging deeper, David may have already spotted and corrected this partially.
From the checkin_notes of 10 March:
David 10 Mar 2007
- scheduler: use 1024-char buffer for parsing
(handle large CPU model strings)

sched/
server_types.C


But I still suspect that the size of the p_model field in the host table needs to be increased, unless it's already been addressed by a schema update script.
Currently the create code for the host table looks like this:
create table host (
    id integer not null auto_increment,
    create_time integer not null,
    ...
    <snip>
    ...
    p_ncpus integer not null,
    p_vendor varchar(254),
    p_model varchar(254),
    p_fpops double not null,
    p_iops double not null,
    ...
    <snip>
    ...
    host_cpid varchar(254),
    external_ip_addr varchar(254),
    max_results_day integer not null,

    primary key (id)
) type=InnoDB;


And also the HOST structure in boinc_db.h:
struct HOST {
    int id;
    int create_time;
    ...
    <snip>
    ...
    int p_ncpus;                // Number of CPUs on host
    char p_vendor[256];         // Vendor name of CPU
    char p_model[256];          // Model of CPU
    double p_fpops;             // measured floating point ops/sec of CPU
    double p_iops;              // measured integer ops/sec of CPU
    ...
    <snip>
    ...
    char host_cpid[256];        // host cross-project ID
    char external_ip_addr[256]; // IP address seen by scheduler
    int max_results_day;        // maximum # of results to send per day per CPU
                                // this is dynamically adjusted to limit work sent to bad hosts

    // the following not stored in DB
    //
    double claimed_credit_per_cpu_sec;

    int parse(FILE*);
    int parse_time_stats(FILE*);
    int parse_net_stats(FILE*);
    int parse_disk_usage(FILE*);
    void fix_nans();
    void clear();
};


[/edit]
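
To illustrate the suspected failure mode, here is a small Python simulation. It is not the scheduler's actual parser: it only assumes that a line is read into a fixed-size buffer the way C's fgets() would, and that the value is then taken from between the opening and closing tags. The names `read_line_into_buffer`, `extract_tag`, `MODEL` and `LINE` are hypothetical, and the model string is made up.

```python
def read_line_into_buffer(line, buf_size):
    """Mimic C's fgets(): at most buf_size - 1 characters survive."""
    return line[: buf_size - 1]


def extract_tag(line, tag):
    """Return the text between <tag> and </tag>, or None if a tag is missing."""
    open_tag, close_tag = "<" + tag + ">", "</" + tag + ">"
    start, end = line.find(open_tag), line.find(close_tag)
    if start == -1 or end == -1:
        return None
    return line[start + len(open_tag):end]


# Hypothetical model string with 40 made-up flags, long enough to overflow
# a 256-byte buffer once the surrounding tags are added.
MODEL = "Example CPU [" + " ".join("flag%d" % i for i in range(40)) + "]"
LINE = "<p_model>" + MODEL + "</p_model>"

small = extract_tag(read_line_into_buffer(LINE, 256), "p_model")
large = extract_tag(read_line_into_buffer(LINE, 1024), "p_model")
print(len(LINE), small is None, large == MODEL)
```

With the small buffer the closing tag is cut off and nothing is extracted, which would match a listing that shows the vendor but no model. With a 1024-char buffer the value survives parsing, though a varchar(254) column could still be too short to store it.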

FalconFly
Joined: Mar 7 06
Posts: 92
Credit: 5,517,713
RAC: 0

That sounds promising.

As CPU caps are increasing with every generation, the 255 chars is 'on the edge' indeed.

I hope the 1024 Char fix is implemented soon (unfortunately they never release minor Version Details in the Release Pages, only the generic "What's new in 5.8"), so I'll test every new Release Version until I see it fixed.

Chris Sutton
Joined: Nov 10 05
Posts: 297
Credit: 4,941,683
RAC: 0

As CPU caps are increasing with every generation, the 255 chars is 'on the edge' indeed.

Like many others, I think there's too much data there already and the cpu flags are deserving of their own field(s).

Time will tell which way the development on this proceeds.

I hope the 1024 Char fix is implemented soon (unfortunately they never release minor Version Details in the Release Pages, only the generic "What's new in 5.8"), so I'll test every new Release Version until I see it fixed.

Well, if this is indeed the fix, then it looks like a scheduler change and, as such, implementation would be more dependent on the project admins than the client builders. It's been my experience that EAH and SAH are the projects that usually have server sides closest to the development releases, so they are likely the ones to implement the change first. If feasible, attach to these and try updating about once a week to see if the data gets through, though your best bet would be to communicate with the project admins and request that they advise you when the change gets implemented. Of course, that's much easier said than done. :)

Chris Sutton
Joined: Nov 10 05
Posts: 297
Credit: 4,941,683
RAC: 0

A quick check through the BOINC checkin notes reveals the following:

David 19 Mar 2007
- removed [features] from p_model;
move it to a separate field, p_features,
which is stored on the client and sent to server
but not stored in server DB.

- fix gcc 4.x warnings
- user web: change of app version list XML

client/
hostinfo_unix.C
hostinfo_win.C (new)
main.C
win/
hostinfo_win.cpp (removed)
html/user/
apps.php
lib/
filesys.C
hostinfo.C,h


Should solve your problem, so any client built after this date ought to reflect the changes....

I don't have a box with such model features, so let us know if there's any improvement. :)
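
For illustration only, the idea behind that check-in (keeping the model name and the feature list in separate fields) can be sketched in Python as below. This is not the code from hostinfo.C; the function name `split_model` is hypothetical, and it simply assumes the features are the last bracketed group in the old-style p_model string.

```python
def split_model(p_model):
    """Split 'name [flag flag ...]' into (name, list_of_flags)."""
    head, sep, tail = p_model.rpartition("[")
    tail = tail.rstrip()
    if not sep or not tail.endswith("]"):
        return p_model.strip(), []  # no trailing [features] group found
    return head.strip(), tail[:-1].split()


linux_style = "AMD Athlon(tm) XP 3000+ [fpu tsc sse 3dnow mmx]"
windows_style = ("AMD Athlon(tm) XP 3000+ "
                 "[x86 Family 6 Model 10 Stepping 0] [fpu tsc sse 3dnow mmx]")

print(split_model(linux_style))
print(split_model(windows_style))
```

Splitting on the last bracket means the Windows-style Family/Model/Stepping group stays with the model name, and only the flags move to the separate list.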

FalconFly
Avatar
Send message
Joined: Mar 7 06
Posts: 92
Credit: 5,517,713
RAC: 0

Very nice, sounds like the Devs acknowledged and fixed the Problem.

I'll keep a lookout and will do a test installation of the 5.8.17 Release Candidate on one of the affected machines today (maybe I can snatch a post-19 Mar build).


Copyright © 2013 africa@home