Discuss SMP problems Windoze in the Dev Folding forum on Dev Hardware, the forum for Dev Hardware's Folding@home team. The Dev Folding team contributes spare processor cycles to Stanford's research team, helping to find cures for disease. Join us to help science, help medicine, and help our team.
Posts: 412
Time spent in forums: 2 Days 14 h 13 m 44 sec
Reputation Power: 2017
SMP problems Windoze
OK since Stanford is no help, I figure I will post this here to see if I can get some help. Pasting most of the info that I posted on the foldingforums:
Originally they would run one full run and then error out when it started the second... THEN they wouldn't even start up.
Exact same problems on 2 completely different and 100% stable Core 2 systems here. The Deino client will not work either, whether I follow the instructions on the Stanford website or the ones on the forums.
1st system:
E6600 on GA-EP35-DS3P with 2GB of DDR2-1066
no overclocks
2nd system:
E2140 on GA-P965-DS3 with 2GB of DDR2-800
no overclocks
I have been a long-time folder (since the Pentium 3 days) and was a beta member on the old Stanford forums. As a long-time computer and hardware tech, I know both of these systems are 100% stable and nothing is changing in my network, on or off the PC. These errors started popping up the same day that Stanford had the server problems, so I am curious if it is handing out corrupted fahcore_a1 or other files. I can guarantee you 100% that this is nothing on my side, regardless of which error it gives, especially since it happens with the 5.91 and 5.92 clients as well (I just tested them on both systems and got the same type of error: client-core communications error).
It can't be my memory; both systems have been tested with multiple memory testing programs (I did all that before I came here to post). The E6600 has Mushkin DDR2-1066 at 2.1V (default speed and EPP voltage), which is exactly how it is set in the BIOS. The E2140 system has GSkill DDR2-800 at 1.8V, default speed and voltage there as well. Both systems are XP Pro and have Comodo firewall with proper firewall exceptions for anything FAH related. I made the appropriate exceptions in the firewall, then completely uninstalled it: no difference.
this is the log that is the same no matter what I use, mpi or deino:
when I try to start the mpiexec/MPICH service manually in services.msc:
I tried 6.22 and 6.22beta2R3
Error code -1
the MPICH service refuses to start, even after restarting the computer
I tried 5.91
Error code -1
the MPICH service refuses to start, even after restarting the computer
I tried 6.22 Deino
Error 63 <99>
Deino is running and active in the services but still get the error
I tried 5.92 Deino
Error 63 <99>
Deino is running and active in the services but still get the error
So something somewhere along the line from Stanford is messed up, as this is the exact same problem on 2 completely different computers with different hardware configurations. The only similarities are Intel Core2 CPUs (E6600 and E2140), Gigabyte motherboards (EP35-DS3P and P965-DS3), and WinXP (one Pro, one Home). I have looked through my services to see if there may be something interfering... nothing found.
Now with my E6600 system, I am running under Kubuntu using the v6.20 client with SMP flag and it is running perfectly... about 2 hours from the time of this post before it finishes its first SMP WU.
Last edited by screwballl : September 3rd, 2008 at 09:09 PM.
Posts: 518
Time spent in forums: 1 Week 6 Days 19 h 9 m 47 sec
Reputation Power: 6846
I've only ever run the Windows 5.91 and 5.91 extended clients.
I've read your threads here and at Stanford and don't recall you mentioning if you have the .NET Framework 2.0 required by Windows SMP.
If you do, then disregard my mentioning it
I'm guessing the missing .NET files are causing install.bat to hit errors when logging the ID and password required for MPIexec to run properly.
I had to set up a Windows boot login on both my XP systems and use the same login/password for the SMP client to get 5.91 SMP to run
I never could get Deino of any version to run on XP, so I gave up on it. Same with the newer SMP clients, did not work here, so I'm still running the 5.91 client that was extended by Stanford
Posts: 412
Time spent in forums: 2 Days 14 h 13 m 44 sec
Reputation Power: 2017
Both systems have .NET 2.0 SP1... I even removed and reinstalled .NET 2.0 on the E2140 system and that made no difference.
It ran just fine using 5.91 up until the last expiration... as soon as I attempted to use 6.22 it all went downhill... nothing except the client had changed on my side. I have tried all currently available clients (5.91, 5.92, 6.22, 6.22beta2R3) and nothing is working (except 6.20 SMP on Linux, which works fine).
Posts: 518
Time spent in forums: 1 Week 6 Days 19 h 9 m 47 sec
Reputation Power: 6846
OK, since .NET 2.0 is verified, I found something interesting in the log you posted in the opening post.
Your client is downloading a Deino core -
[02:39:10] Downloading core (/~pande/Win32/x86_Deino/Core_a1.fah from www.stanford.edu)
and MPIexec is trying to execute a core it can't open and run -
[02:39:23] - Calling 'mpiexec -np 4 -channel shm -env MPICH_USE_SMP_OPTIMIZATIONS 1 -host 127.0.0.1 FahCore_a1.exe -dir work/ -suffix 01 -checkpoint 15 -verbose -lifeline 12024 -version 622'
So you get a CPU error of -
[02:39:27] CoreStatus = 63 (99)
Now the way I read it at Stanford, Deino and MPI are 2 different types of SMP executables that don't interact and can't run each other's stuff. So it looks to me like you need to run the Windows MPI client on the MPI core and, vice versa, the Deino client on the Deino core.
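If anyone wants to check their own FAHlog for the same pattern, here is a rough Python sketch. It only pulls out the three lines quoted above and puts them side by side; it does not prove the core and launcher are incompatible, it just follows my reading of them. The regexes and the `summarize` name are my own and assume the log wording shown in this thread; other client versions may print these lines differently.

```python
import re

# Patterns modelled only on the three log lines quoted in this thread
# (assumption: other client versions may word them differently).
DOWNLOAD_RE = re.compile(r"Downloading core \((\S+) from")
CALL_RE = re.compile(r"Calling '([^']*)'")
STATUS_RE = re.compile(r"CoreStatus = ([0-9A-Fa-f]+) \((\d+)\)")


def summarize(log_text):
    """Collect the core download path, launch command and CoreStatus from a log."""
    download = DOWNLOAD_RE.search(log_text)
    call = CALL_RE.search(log_text)
    status = STATUS_RE.search(log_text)
    core_path = download.group(1) if download else None
    command = call.group(1) if call else None
    return {
        "core_path": core_path,
        # True when the download path names the Deino build of the core.
        "core_is_deino": bool(core_path) and "deino" in core_path.lower(),
        # True when the launch command carries an MPICH-named setting.
        "call_mentions_mpich": bool(command) and "mpich" in command.lower(),
        # The log prints the status twice: hexadecimal first, decimal in brackets.
        "status_hex": status.group(1) if status else None,
        "status_dec": int(status.group(2)) if status else None,
    }


SAMPLE = """\
[02:39:10] Downloading core (/~pande/Win32/x86_Deino/Core_a1.fah from www.stanford.edu)
[02:39:23] - Calling 'mpiexec -np 4 -channel shm -env MPICH_USE_SMP_OPTIMIZATIONS 1 -host 127.0.0.1 FahCore_a1.exe -dir work/ -suffix 01 -checkpoint 15 -verbose -lifeline 12024 -version 622'
[02:39:27] CoreStatus = 63 (99)
"""

info = summarize(SAMPLE)
print(info)
```

If `core_is_deino` and `call_mentions_mpich` both come back true, the log shows the same combination as the one quoted above. The two CoreStatus numbers are the same value in hex and decimal (0x63 is 99), so "63 (99)" is one error, not two.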
My SMP looks like this below at startup - I don't run any flags on any of my SMP clients
Code:
--- Opening Log file [August 23 23:32:33]
# SMP Client ################################################## ################
################################################## #############################
Folding@Home Client Version 5.91beta6
http://folding.stanford.edu
################################################## #############################
################################################## #############################
Launch directory: C:\Program Files\Folding@Home Windows SMP Client V1.01
Executable: C:\Program Files\Folding@Home Windows SMP Client V1.01\fah-SMP-591.exe
[23:32:33] - Ask before connecting: No
[23:32:33] - User name: JammerX99 (Team 12912)
[23:32:33] - User ID: 5CD0353733E2A087
[23:32:33] - Machine ID: 5
[23:32:33]
[23:32:33] Loaded queue successfully.
[23:32:33]
[23:32:33] + Processing work unit
[23:32:33] Core required: FahCore_a1.exe
[23:32:33] Core found.
[23:32:33] Working on Unit 03 [August 23 23:32:33]
[23:32:33] + Working ...
[23:32:33]
[23:32:33] *------------------------------*
[23:32:33] Folding@Home Gromacs SMP Core
[23:32:33] Version 1.74 (March 10, 2007)
[23:32:33]
[23:32:33] Preparing to commence simulation
[23:32:33] - Ensuring status. Please wait.
[23:32:50] - Looking at optimizations...
[23:32:50] - Working with standard loops on this execution.
[23:32:50] - Previous termination of core was improper.
[23:32:50] - Going to use standard loops.
[23:32:50] - Files status OK
[23:33:05] - Expanded 4824506 -> 24810145 (decompressed 514.2 percent)
[23:33:07]
[23:33:07] Project: 2665 (Run 2, Clone 548, Gen 41)
[23:33:07]
[23:33:08] Entering M.D.
[23:33:14] Calling FAH init
[23:33:16] Read topology
[23:33:16] s
[23:33:16] Writing local files
[23:33:16] Completed 9229 out of 250000 steps (3 percent)
[23:33:16] tions
[23:33:16] Writing local files
[23:33:16] Completed 9229 out of 250000 steps (3 percent)
[23:33:23] Extra SSE boost OK.
[23:39:17] Writing local files
[23:39:18] Completed 10000 out of 250000 steps (4 percent)
[23:58:27] Writing local files
[23:58:27] Completed 12500 out of 250000 steps (5 percent)
[00:17:36] Writing local files
[00:17:36] Completed 15000 out of 250000 steps (6 percent)
[00:36:46] Writing local files
[00:36:46] Completed 17500 out of 250000 steps (7 percent)
[00:55:52] Writing local files
[00:55:52] Completed 20000 out of 250000 steps (8 percent)
[01:15:02] Writing local files
[01:15:03] Completed 22500 out of 250000 steps (9 percent)
[01:34:12] Writing local files
[01:34:12] Completed 25000 out of 250000 steps (10 percent)
[01:53:22] Writing local files
[01:53:22] Completed 27500 out of 250000 steps (11 percent)
[02:12:32] Writing local files
[02:12:32] Completed 30000 out of 250000 steps (12 percent)
[02:31:38] Writing local files
[02:31:39] Completed 32500 out of 250000 steps (13 percent)
[02:50:45] Writing local files
[02:50:45] Completed 35000 out of 250000 steps (14 percent)
[03:09:47] Writing local files
[03:09:47] Completed 37500 out of 250000 steps (15 percent)
[03:28:57] Writing local files
[03:28:57] Completed 40000 out of 250000 steps (16 percent)
[03:48:07] Writing local files
[03:48:08] Completed 42500 out of 250000 steps (17 percent)
[04:07:17] Writing local files
[04:07:18] Completed 45000 out of 250000 steps (18 percent)
[04:26:26] Writing local files
[04:26:27] Completed 47500 out of 250000 steps (19 percent)
[04:45:35] Writing local files
[04:45:36] Completed 50000 out of 250000 steps (20 percent)
[05:04:42] Writing local files
[05:04:43] Completed 52500 out of 250000 steps (21 percent)
[05:23:43] Writing local files
[05:23:43] Completed 55000 out of 250000 steps (22 percent)
[05:42:45] Writing local files
[05:42:45] Completed 57500 out of 250000 steps (23 percent)
[06:01:49] Writing local files
[06:01:50] Completed 60000 out of 250000 steps (24 percent)
[06:20:54] Writing local files
[06:20:55] Completed 62500 out of 250000 steps (25 percent)
[06:39:56] Writing local files
[06:39:56] Completed 65000 out of 250000 steps (26 percent)
[06:59:00] Writing local files
[06:59:00] Completed 67500 out of 250000 steps (27 percent)
[07:18:02] Writing local files
[07:18:03] Completed 70000 out of 250000 steps (28 percent)
[07:37:06] Writing local files
[07:37:07] Completed 72500 out of 250000 steps (29 percent)
[07:56:07] Writing local files
[07:56:08] Completed 75000 out of 250000 steps (30 percent)
[08:15:11] Writing local files
[08:15:11] Completed 77500 out of 250000 steps (31 percent)
[08:34:15] Writing local files
[08:34:15] Completed 80000 out of 250000 steps (32 percent)
[08:53:18] Writing local files
[08:53:18] Completed 82500 out of 250000 steps (33 percent)
[09:12:20] Writing local files
[09:12:20] Completed 85000 out of 250000 steps (34 percent)
[09:31:23] Writing local files
[09:31:24] Completed 87500 out of 250000 steps (35 percent)
[09:50:28] Writing local files
[09:50:28] Completed 90000 out of 250000 steps (36 percent)
[10:09:34] Writing local files
[10:09:34] Completed 92500 out of 250000 steps (37 percent)
[10:28:34] Writing local files
[10:28:34] Completed 95000 out of 250000 steps (38 percent)
[10:47:38] Writing local files
[10:47:38] Completed 97500 out of 250000 steps (39 percent)
[11:06:40] Writing local files
[11:06:40] Completed 100000 out of 250000 steps (40 percent)
[11:25:45] Writing local files
[11:25:46] Completed 102500 out of 250000 steps (41 percent)
[11:44:51] Writing local files
[11:44:51] Completed 105000 out of 250000 steps (42 percent)
[12:03:57] Writing local files
[12:03:57] Completed 107500 out of 250000 steps (43 percent)
[12:23:01] Writing local files
[12:23:01] Completed 110000 out of 250000 steps (44 percent)
[12:42:06] Writing local files
[12:42:07] Completed 112500 out of 250000 steps (45 percent)
[13:01:11] Writing local files
[13:01:11] Completed 115000 out of 250000 steps (46 percent)
[13:20:11] Writing local files
[13:20:11] Completed 117500 out of 250000 steps (47 percent)
[13:39:19] Writing local files
[13:39:19] Completed 120000 out of 250000 steps (48 percent)
[13:58:29] Writing local files
[13:58:29] Completed 122500 out of 250000 steps (49 percent)
[14:17:38] Writing local files
[14:17:39] Completed 125000 out of 250000 steps (50 percent)
[14:36:48] Writing local files
[14:36:48] Completed 127500 out of 250000 steps (51 percent)
[14:55:54] Writing local files
[14:55:54] Completed 130000 out of 250000 steps (52 percent)
[15:15:02] Writing local files
[15:15:03] Completed 132500 out of 250000 steps (53 percent)
[15:34:11] Writing local files
[15:34:12] Completed 135000 out of 250000 steps (54 percent)
[15:53:20] Writing local files
[15:53:21] Completed 137500 out of 250000 steps (55 percent)
[16:12:32] Writing local files
[16:12:32] Completed 140000 out of 250000 steps (56 percent)
[16:31:42] Writing local files
[16:31:42] Completed 142500 out of 250000 steps (57 percent)
[16:50:51] Writing local files
[16:50:52] Completed 145000 out of 250000 steps (58 percent)
[17:09:58] Writing local files
[17:09:59] Completed 147500 out of 250000 steps (59 percent)
[17:29:03] Writing local files
[17:29:04] Completed 150000 out of 250000 steps (60 percent)
[17:48:10] Writing local files
[17:48:11] Completed 152500 out of 250000 steps (61 percent)
[18:07:16] Writing local files
[18:07:16] Completed 155000 out of 250000 steps (62 percent)
[18:26:23] Writing local files
[18:26:24] Completed 157500 out of 250000 steps (63 percent)
[18:45:31] Writing local files
[18:45:31] Completed 160000 out of 250000 steps (64 percent)
[19:04:36] Writing local files
[19:04:37] Completed 162500 out of 250000 steps (65 percent)
[19:23:43] Writing local files
[19:23:43] Completed 165000 out of 250000 steps (66 percent)
[19:42:44] Writing local files
[19:42:44] Completed 167500 out of 250000 steps (67 percent)
[20:01:48] Writing local files
[20:01:49] Completed 170000 out of 250000 steps (68 percent)
[20:20:52] Writing local files
[20:20:52] Completed 172500 out of 250000 steps (69 percent)
[20:39:58] Writing local files
[20:39:58] Completed 175000 out of 250000 steps (70 percent)
[20:59:05] Writing local files
[20:59:05] Completed 177500 out of 250000 steps (71 percent)
[21:18:12] Writing local files
[21:18:12] Completed 180000 out of 250000 steps (72 percent)
[21:37:15] Writing local files
[21:37:15] Completed 182500 out of 250000 steps (73 percent)
[21:56:21] Writing local files
[21:56:21] Completed 185000 out of 250000 steps (74 percent)
[22:15:32] Writing local files
[22:15:32] Completed 187500 out of 250000 steps (75 percent)
[22:34:36] Writing local files
[22:34:36] Completed 190000 out of 250000 steps (76 percent)
[22:53:54] Writing local files
[22:53:54] Completed 192500 out of 250000 steps (77 percent)
[23:12:59] Writing local files
[23:12:59] Completed 195000 out of 250000 steps (78 percent)
[23:32:05] Writing local files
[23:32:06] Completed 197500 out of 250000 steps (79 percent)
[23:51:11] Writing local files
[23:51:11] Completed 200000 out of 250000 steps (80 percent)
[00:10:12] Writing local files
[00:10:13] Completed 202500 out of 250000 steps (81 percent)
[00:29:13] Writing local files
[00:29:13] Completed 205000 out of 250000 steps (82 percent)
[00:48:16] Writing local files
[00:48:17] Completed 207500 out of 250000 steps (83 percent)
[01:07:22] Writing local files
[01:07:22] Completed 210000 out of 250000 steps (84 percent)
[01:26:25] Writing local files
[01:26:26] Completed 212500 out of 250000 steps (85 percent)
[01:45:28] Writing local files
[01:45:28] Completed 215000 out of 250000 steps (86 percent)
[02:04:30] Writing local files
[02:04:30] Completed 217500 out of 250000 steps (87 percent)
[02:23:33] Writing local files
[02:23:34] Completed 220000 out of 250000 steps (88 percent)
[02:42:39] Writing local files
[02:42:39] Completed 222500 out of 250000 steps (89 percent)
[03:01:47] Writing local files
[03:01:47] Completed 225000 out of 250000 steps (90 percent)
[03:20:53] Writing local files
[03:20:53] Completed 227500 out of 250000 steps (91 percent)
[03:39:58] Writing local files
[03:39:58] Completed 230000 out of 250000 steps (92 percent)
[03:59:05] Writing local files
[03:59:06] Completed 232500 out of 250000 steps (93 percent)
[04:18:09] Writing local files
[04:18:09] Completed 235000 out of 250000 steps (94 percent)
[04:37:14] Writing local files
[04:37:14] Completed 237500 out of 250000 steps (95 percent)
[04:56:13] Writing local files
[04:56:14] Completed 240000 out of 250000 steps (96 percent)
[05:15:19] Writing local files
[05:15:19] Completed 242500 out of 250000 steps (97 percent)
[05:34:23] Writing local files
[05:34:23] Completed 245000 out of 250000 steps (98 percent)
[05:53:30] Writing local files
[05:53:31] Completed 247500 out of 250000 steps (99 percent)
[06:12:37] Writing local files
[06:12:38] Completed 250000 out of 250000 steps (100 percent)
[06:12:38] Writing final coordinates.
[06:12:40] Past main M.D. loop
[06:12:40] Will end MPI now
[06:13:40]
[06:13:40] Finished Work Unit:
[06:13:40] - Reading up to 21421872 from "work/wudata_03.arc": Read 21421872
[06:13:40] - Reading up to 592316 from "work/wudata_03.xtc": Read 592316
[06:13:40] goefile size: 0
[06:13:40] logfile size: 212414
[06:13:40] Leaving Run
[06:13:43] - Writing 22232974 bytes of core data to disk...
[06:13:44] ... Done.
[06:13:44] - Failed to delete work/wudata_03.sas
[06:13:44] - Failed to delete work/wudata_03.goe
[06:13:44] Warning: check for stray files
[06:13:44] - Shutting down core
[06:15:44]
[06:15:44] Folding@home Core Shutdown: FINISHED_UNIT
[06:15:44]
[06:15:44] Folding@home Core Shutdown: FINISHED_UNIT
[06:15:48] CoreStatus = 64 (100)
[06:15:48] Sending work to server
[06:15:48] + Attempting to send results
[06:22:39] + Results successfully sent
[06:22:39] Thank you for your contribution to Folding@Home.
[06:22:39] + Number of Units Completed: 38
Last edited by jammerx99 : September 4th, 2008 at 05:22 PM.
Posts: 2,149
Time spent in forums: 3 Months 4 Weeks 1 Day 3 h 5 sec
Reputation Power: 9968
Not sure if I remember correctly but I have seen that error.
Try this: before you run the install, go into Task Manager, make sure you are looking at processes for all users, and check that there are no rogue smpd, mpiexec, core, fah or other folding processes running. I have had occasions when re-installing or trying to recover from an error where, although I had exited, one or more rogues were still running. Once you end all those processes (careful, some keep coming back until you end the host process), the install will run correctly, register, and correctly start the 2 test instances.
May not be it but I do remember having that problem.
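If you would rather not eyeball Task Manager, something like the Python sketch below could do the same check. Treat it as a rough idea, not a tested tool: the name fragments come straight from my list above, the function names are made up, and it assumes the `tasklist` command is available (as far as I know it ships with XP Pro but not XP Home) and that you run it from an account that can see all users' processes.

```python
import csv
import io
import subprocess

# Name fragments from the post above. Matching is a case-insensitive substring
# test, so "core" also catches FahCore_a1.exe (and could catch unrelated names).
SUSPECT_FRAGMENTS = ("smpd", "mpiexec", "core", "fah")


def parse_tasklist_csv(text):
    """Extract image names (first column) from `tasklist /FO CSV /NH` output."""
    return [row[0] for row in csv.reader(io.StringIO(text)) if row]


def find_leftovers(image_names, fragments=SUSPECT_FRAGMENTS):
    """Return the image names that contain any of the suspect fragments."""
    return [
        name for name in image_names
        if any(fragment in name.lower() for fragment in fragments)
    ]


def running_image_names():
    """Windows only: ask tasklist for the current process list."""
    result = subprocess.run(
        ["tasklist", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    )
    return parse_tasklist_csv(result.stdout)


# Made-up sample in tasklist's CSV layout, so the filter can be tried anywhere.
SAMPLE = (
    '"System Idle Process","0","Console","0","28 K"\n'
    '"smpd.exe","1234","Console","0","2,100 K"\n'
    '"FahCore_a1.exe","2345","Console","0","90,000 K"\n'
    '"explorer.exe","3456","Console","0","20,000 K"\n'
)

print(find_leftovers(parse_tasklist_csv(SAMPLE)))
```

On a real machine you would call `find_leftovers(running_image_names())` before running the install and end whatever it lists.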
Posts: 412
Time spent in forums: 2 Days 14 h 13 m 44 sec
Reputation Power: 2017
Quote:
Originally Posted by JohnFrank
Not sure if I remember correctly but I have seen that error.
Try this: before you run the install, go into Task Manager, make sure you are looking at processes for all users, and check that there are no rogue smpd, mpiexec, core, fah or other folding processes running. I have had occasions when re-installing or trying to recover from an error where, although I had exited, one or more rogues were still running. Once you end all those processes (careful, some keep coming back until you end the host process), the install will run correctly, register, and correctly start the 2 test instances.
May not be it but I do remember having that problem.
I have done this on fresh clean setups and it made no difference. It still refuses to run... almost always it is mpiexec that refuses to start (on the MPI client), and then Error 63 <99> from the Deino client.
Posts: 2,149
Time spent in forums: 3 Months 4 Weeks 1 Day 3 h 5 sec
Reputation Power: 9968
Quote:
Originally Posted by screwballl
I have done this on fresh clean setups and it made no difference. It still refuses to run... almost always it is mpiexec that refuses to start (on the MPI client), and then Error 63 <99> from the Deino client.
Maybe one of the guys running the new SMP clients can help as this is beyond me (I am still running the old 5.91 beta with extended deadlines). I am snowed under at the moment (figuratively speaking) which is why I have not been on for a few days, this may last for another week and a bit.
Don't give up. If you cannot fix it and no one else can help before I have time, I will sit down then and figure out how to get it running on one of my machines so I can try and help.
Posts: 518
Time spent in forums: 1 Week 6 Days 19 h 9 m 47 sec
Reputation Power: 6846
Screwballl, the only thing I can think of at this point is that your MPI username and password have to be identical to your Windows OS login username and password. If you don't have a Windows login for XP or Vista, then that's the problem.
I seem to recall having that problem when I tried to set up SMP on a dual core AMD system for my very first SMP F@H client. I had to create an XP login username and password and then use that same username and password to get MPI to load mpiexec.exe.