Infinite DistRTgen WU

Just as the title says!
Janos (retired)
Still a Newbie
Posts: 1919
Joined: Thu Feb 23, 2012 8:58 am
Location: Aberdeenshire, Scotland

#1 Infinite DistRTgen WU

Post by Janos (retired) »

Anyone else having problems with infinitely running DistRTgen WUs?

On ALL of my machines crunching DistRTgen I get a WU once or twice a DAY which just runs and runs. 100% usage for the duration and I have caught tasks running for over 8 hours (I should never sleep!).

I've tried all the normal stuff like a big hammer and a strong telling off but no resolution as yet. Project reset, reinstall of BOINC, underclocking, default clocking. I've even tried a complete OS reinstall.

All the machines are working fine otherwise and it is just DistRTgen WUs which give me bother. There is nothing on the DistRTgen forums, which tends to make me think it is something I am doing wrong.

Even a healthy dose of single malt has not solved things. Any ideas? Think I should try some more 18 year old Macallan?
Gary Mc
Boinc Sergeant
Posts: 109
Joined: Sun Nov 07, 2010 12:25 pm
Location: England

#2 Re: Infinite DistRTgen WU

Post by Gary Mc »

It happens from time to time. Just check it every now and then and delete the work unit if it is overrunning excessively. Having said that, it has not happened to me for some time, but I have done nothing in particular to try to resolve it.

Just cos I have a lot of credits does not mean I know what I am doing.

#3

Post by Janos (retired) »

I am going to have to find a fix as it is happening way too often. Like you, Gary, I have had the odd one in the past but the last few days have been crazy. Each machine is currently getting two or three a day. I just killed one WU which had been going for 2h 21m. :(

#4

Post by Gary Mc »

My ATI 7970 machine is running BOINC 7.0.27 (x64).

I am running the latest version of Windows 7, with automatic updates on.

Driver information :| - whatever this means...

Driver Packaging Version: 8.961-120405a-137813C-ATI
Provider: Advanced Micro Devices, Inc.
2D Driver Version: 8.01.01.1243
2D Driver File Path: /REGISTRY/MACHINE/SYSTEM/ControlSet001/Control/CLASS/{4D36E968-E325-11CE-BFC1-08002BE10318}/0002
Direct3D Version: 7.14.10.0903
OpenGL Version: 6.14.10.11631
AMD VISION Engine Control Center Version: 2012.0405.2205.37728
AMD Audio Driver Version: 7.12.0.7706

Best of luck. As I said, I usually just wait for these problems to resolve themselves.
:oops:
Alez
[ TSBT's Pirate ]
Posts: 10363
Joined: Thu Oct 04, 2012 1:22 pm
Location: roaming the planet

#5

Post by Alez »

I found this: http://www.freerainbowtables.com/phpBB3 ... fd2e117741

I don't know if it helps, but this is from the last post:

Mikey: I'm running stock settings. But 2 of the 4 cards involved came from the factory overclocked. I've got some additional info.

1. The dual GPU machine does get hanging WUs on both Device 0 and 1, it's just that Device 0 hangs are more common.

2. It appears that there is a driver reset each time the hang up appears. (This is probably the causative factor)

3. All cards involved come from Zotac.

Based on the above information, I don't know if the manufacturer is to blame or not. My other GPU machines (an older Pentium machine (DELL) with a GTX560, or a laptop with a GTX660M) don't seem to ever have a problem. I do think that the problem most likely occurs when there is a switch between another project (Collatz or PrimeGrid) and DistrRTgen. What I do know is that all GTX560 / Ti are running the same driver version and Windows 7.

#6

Post by Alez »

see here
http://www.setiusa.us/showthread.php?41 ... Fail/page2
scroll down to post #15

here http://www.xtremesystems.org/forums/sho ... og+reg+fix
scroll down to post #13

and here http://msdn.microsoft.com/en-us/library ... 85%29.aspx

Has this just started since you got the new 79xxs?

#7

Post by Janos (retired) »

Ah, nice work! It certainly seems logical that a GPU thread could cause a timeout.

I will give the registry settings a whirl and report back.

Cheers

#8

Post by Janos (retired) »

Just one hour after installing the new registry settings (with reboot) and I have an infinite WU. :(

#9

Post by Janos (retired) »

And another. This is nuts!

#10

Post by Janos (retired) »

Happened now 4 times but all on the same machine. It looks like the other two are fixed (famous last words).

I am going to reinstall drivers, Windows updates, the registry hack, etc. on the "failing machine" and see if that resolves things.

#11

Post by Alez »

Are you running multiple units on the card or a single instance? Might it be that that machine is memory bound? If you're not already doing so, try running 1 unit per GPU with a whole CPU core spare for each card.
I have a similar problem with POEM and my 3 GPUs. Units run fine on 1 GPU but simply stall out and keep resetting on the other two. I have to exclude POEM on these two to run it on 1 GPU.
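
For anyone wanting to try the same kind of exclusion, here is a rough sketch. It assumes the exclusion is done with `<exclude_gpu>` entries in BOINC's `cc_config.xml`; the post does not say how it was actually set up. The project URL is a placeholder, the device numbers 1 and 2 are guesses at which cards stall, and `build_cc_config` is a made-up helper name.

```python
import xml.etree.ElementTree as ET


def build_cc_config(project_url, excluded_devices, gpu_type="NVIDIA"):
    """Return cc_config.xml text that keeps one project off the given GPUs."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    for device_num in excluded_devices:
        # One <exclude_gpu> entry per device the project should not use.
        entry = ET.SubElement(options, "exclude_gpu")
        ET.SubElement(entry, "url").text = project_url
        ET.SubElement(entry, "device_num").text = str(device_num)
        ET.SubElement(entry, "type").text = gpu_type
    return ET.tostring(root, encoding="unicode")


# Placeholder URL (assumption): substitute the project's real URL as shown in BOINC.
config_text = build_cc_config("http://example.invalid/poem/", [1, 2])
print(config_text)
```

The printed text would go into `cc_config.xml` in the BOINC data directory, after which the client needs to re-read its config files or be restarted.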

#12

Post by Alez »

What version of BOINC are you running?

#13

Post by Janos (retired) »

I am running a default config with a single WU running on a single 7970. There are no other tasks running (during this testing phase).

The PC is using less than 25% memory.

Touch wood, the other two machines are still working well.

#14

Post by Janos (retired) »

alezevo1 wrote: What version of Boinc are you running ?
Windows 7 - 7.0.28 64bit

#15

Post by Alez »

I'm running 7.0.47. So far it's running very well. You could try upgrading. If it doesn't work you can always just reinstall 7.0.28.
I was running 7.0.45 and that eventually caused the whole of BOINC to go into a reset loop on both the CPUs and the GPUs when it tried to start a second POEM unit on the same GPU.

#16

Post by Janos (retired) »

Yeah, I might try that tomorrow. I was also thinking about swapping the 7970 in the machine which keeps failing with a card on another machine which seems to now be working - to test out any hardware issues.

#17

Post by Alez »

Two other thoughts...
Check that the power management options haven't reverted to sleeping the monitor or turning off the GPU. Or could your new KVM switch be causing the card to sense no monitor load and sleep the GPU?
And if I remember right, do the ATI cards not have a turbo mode or similar? I use Afterburner to check the core clocks etc. on my cards, as my 610 has a tendency (why, I don't know) to overclock itself from 810 MHz. At 870 MHz it runs SETI fine, but at 910 the SETI units stall on the GPU and sit with a scheduler wait message. The error messages are better and the config setup for GPUs is far better on 7.0.47. Do your cards not have the ability to increase the core clock speed as the demand on the GPU goes up?

#18

Post by Janos (retired) »

Power checked
Sleep checked
Same clock settings as the other two cards
Same driver versions
Same Windows setup (well, almost: this one is Ultimate and the other two are Pro)

I am going to test out the hardware tomorrow. Maybe flash the motherboard BIOS...

#19

Post by Alez »

Knew I'd read something about this when looking at ATI cards (all mine are NVIDIA):
http://forums.anandtech.com/archive/ind ... 44769.html

Apparently the ZeroCore tech can sometimes idle your cards by itself. I think you need to set the cards into high performance mode or something. Not sure if this is of any help or not.

#20

Post by Janos (retired) »

I *think* I have fixed it. :?

After much use of big hammer and single malt, I swapped the power feed to the card and it now seems to be working perfectly.

Credits be incoming :)

Thanks guys for the help with the debug process.

The registry settings were a superb tip and definitely fixed the mission of Microsoft to protect the user from his own stupidity, at all cost, because Windows knows better.

#21

Post by Alez »

Microsoft always knows better :roll:

#22

Post by Janos (retired) »

Janos wrote: I *think* I have fixed it. :?
Hmm, not quite. I just had to kill another task on the same machine. :evil:

#23

Post by Janos (retired) »

Different machine this time. The rate of infinite WUs has dropped dramatically but it is still happening all too often.

I will try plan D and see what happens next...

#24

Post by Janos (retired) »

Out of my three crunchers, two had infinite units this morning: one of 6:03:01 and the other 5:47:41 of utter wasted electric. Not happy.

#25

Post by Janos (retired) »

Came home to find more locked up WUs. That is more than 27 hours of GPU time lost today. Going to faff with more settings this evening but if it continues I will use a way bigger hammer.

#26

Post by Janos (retired) »

I sat down tonight and thought: I can't find the cause, so how do I cure the symptom?

So, I have written some code to check the time the WU has been running, and if it is more than 20% over the average of the last 5 WU completion times then it auto suspends the active WU. I can then manually see, at my leisure, if the suspended WU should be restarted or aborted.

Worst case is half a dozen suspended work units per day rather than 2 or 3 crunchers locked up achieving not very much for hours at a time.
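
The actual code is not posted in the thread, so what follows is only a minimal sketch of the decision rule as described, reading it as "suspend once the running time is more than 20% above the average of the last 5 completion times". The function name `should_suspend`, its parameters and the sample numbers are all made up for illustration. How the elapsed time is read from BOINC and how the task is then suspended are left out.

```python
from statistics import mean


def should_suspend(elapsed_s, recent_runtimes_s, margin=0.20, window=5):
    """Return True if a running task has overrun the recent average by more than `margin`."""
    recent = recent_runtimes_s[-window:]  # only the last `window` completed WUs count
    if not recent:
        return False  # no history yet, so there is no baseline to compare against
    limit = mean(recent) * (1.0 + margin)
    return elapsed_s > limit


# Made-up completion times in seconds (their average is 3600 s, so the limit is about 4320 s).
history = [3600, 3500, 3700, 3650, 3550]
print(should_suspend(3900, history))  # False: still within the 20% margin
print(should_suspend(8460, history))  # True: 2h 21m, like the WU killed earlier in the thread
```

Suspending rather than aborting keeps the decision with the human, which matches the "at my leisure" part of the plan.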