Xeon - not utilising more than 32 threads

scole250
Boinc Brigadier
Posts: 3040
Joined: Mon Feb 03, 2014 2:38 pm
Location: Goldsboro, (Eastern) North Carolina, USA

Re: Xeon - not utilising more than 32 threads

Unread post by scole250 » Fri Jun 02, 2017 8:26 pm

I can't recall if I reinstalled win7 on the 72 thread system last time it flaked out but I'll try to give it a whirl this weekend.

Bryan
Boinc Lieutenant Colonel
Posts: 900
Joined: Thu May 21, 2015 6:18 pm

Re: Xeon - not utilising more than 32 threads

Unread post by Bryan » Fri Jun 02, 2017 8:38 pm

scole250 wrote:
Fri Jun 02, 2017 8:26 pm
I can't recall if I reinstalled win7 on the 72 thread system last time it flaked out but I'll try to give it a whirl this weekend.
That was too easy, I should have asked for more :roll:

scole250
Boinc Brigadier
Posts: 3040
Joined: Mon Feb 03, 2014 2:38 pm
Location: Goldsboro, (Eastern) North Carolina, USA

Re: Xeon - not utilising more than 32 threads

Unread post by scole250 » Fri Jun 02, 2017 8:52 pm

Bryan wrote:
Fri Jun 02, 2017 8:38 pm
scole250 wrote:
Fri Jun 02, 2017 8:26 pm
I can't recall if I reinstalled win7 on the 72 thread system last time it flaked out but I'll try to give it a whirl this weekend.
That was too easy, I should have asked for more :roll:
And I'll be glad to pass along my findings for just a few measly bitcoins. :lol:

Bryan
Boinc Lieutenant Colonel
Posts: 900
Joined: Thu May 21, 2015 6:18 pm

Re: Xeon - not utilising more than 32 threads

Unread post by Bryan » Sat Jun 03, 2017 2:24 am

scole250 wrote:
Fri Jun 02, 2017 8:52 pm
Bryan wrote:
Fri Jun 02, 2017 8:38 pm
scole250 wrote:
Fri Jun 02, 2017 8:26 pm
I can't recall if I reinstalled win7 on the 72 thread system last time it flaked out but I'll try to give it a whirl this weekend.
That was too easy, I should have asked for more :roll:
And I'll be glad to pass along my findings for just a few measly bitcoins. :lol:
:lol: I knew there was a catch! The bitcoins are in the mail; watch for them.

noetus
Boinc Lance Corporal
Posts: 34
Joined: Tue May 30, 2017 3:15 am

Re: Xeon - not utilising more than 32 threads

Unread post by noetus » Wed Jun 07, 2017 10:42 pm

A point of clarification: was the earlier claim of a limit of 64 threads (by Bryan) related to Boinc specifically or was it supposed to cover any set of processes one might care to run?

In my own coding I do not code multi-threaded apps. I tried that and it didn't work out well (thread management was an issue and requires more complex coding than I am really capable of right now). Instead for naturally parallelisable tasks I code for single threads and then run multiple instances from the command line, letting the OS manage the multitasking.

Bryan
Boinc Lieutenant Colonel
Posts: 900
Joined: Thu May 21, 2015 6:18 pm

Re: Xeon - not utilising more than 32 threads

Unread post by Bryan » Wed Jun 07, 2017 11:19 pm

It is a basic function/limitation of any flavor of the Windows OS, including the server versions. It has nothing to do with BOINC specifically. You will run into the same problem with your program.

We did find a solution using the command start /node X /affinity 0xFFFFFFFFF [your_program.exe], which you can execute from the command window or from a batch/cmd file. When the program is launched you assign it to either node 0 or node 1, and then you set the affinity for which processors in that node the program is allowed to use. The 0xFFFFFFFFF is a bit mask for the allowed threads in the node. Each hex F covers 4 threads, so the 9 F's I showed would allow the program to use any of 36 threads. In your case (44 threads per node) you will need to add 2 more F's.

Check out the syntax/manual for the "start" command. Down in the multiprocessor section you will find the NUMA stuff.

https://ss64.com/nt/start.html

There are also NUMA commands you can use programmatically that will do the same thing, and with those you can actually go down to assigning at the thread level rather than at the program level.
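As a rough illustration of the bit mask arithmetic, here is a minimal Python sketch. The helper names (affinity_mask, start_command) are made up for this example, and your_program.exe is just the placeholder from the post. Only the start /node ... /affinity ... form itself comes from the start command documentation.

```python
def affinity_mask(threads: int) -> str:
    """Return a hex bit mask with the lowest `threads` bits set."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return "0x{:X}".format((1 << threads) - 1)


def start_command(node: int, threads: int, program: str = "your_program.exe") -> str:
    """Build the cmd.exe line that pins `program` to one NUMA node.

    When /node and /affinity are combined, the mask is counted from
    bit 0 of that node, so the same mask works for node 0 and node 1.
    """
    return "start /node {} /affinity {} {}".format(
        node, affinity_mask(threads), program
    )


if __name__ == "__main__":
    # 36 threads per node, as on the 72 thread machines in this thread.
    for node in (0, 1):
        print(start_command(node, 36))
```

Calling affinity_mask(44) instead gives the mask with eleven F's needed for a node with 44 threads.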

Bryan
Boinc Lieutenant Colonel
Posts: 900
Joined: Thu May 21, 2015 6:18 pm

Re: Xeon - not utilising more than 32 threads

Unread post by Bryan » Wed Jun 07, 2017 11:49 pm

To be clear about this, you can run your 88 threads from a program without a problem. However the efficiency will be the same as if you were running 64 threads.

There are 64 NUMA memory pipes available. If you run 64 threads or less then each thread will run at 100% load. If you enable 88 threads then Windows will start "sharing" the memory pipes. So the 24 threads (above the 64) would start sharing the memory pipes with 24 of the 1st 64 and therefore would only be running at 50% efficiency because 1/2 the time they are running and 1/2 the time they are waiting for memory.

You would wind up with 40 threads running at full load and 48 threads running at 50% load, i.e. the equivalent of 64 threads. Since the memory channels would have to be continuously loaded/unloaded, it would actually be less efficient than just running 64 threads.

So to overcome this limitation you would set your program up to launch 44 threads at Node 0 (CPU0) and then 44 threads to Node 1 (CPU1). Then all 88 threads would be running at 100% loading.

We proved the technique works as I had described in my post to Pete. We launched one BOINC client to node 0 and a 2nd BOINC client to node 1. On our 72 thread machines all threads were running at 100% load.
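The arithmetic behind the 40 + 48 split can be checked with a short Python sketch. It only models the sharing picture described in this post (every thread above the limit pairs with one of the first 64 and both run at 50%). It is not a measurement, and it ignores the extra load/unload overhead mentioned above. The function name effective_threads is an assumption made up for illustration.

```python
def effective_threads(enabled: int, limit: int = 64) -> float:
    """Full-speed-thread equivalent under the sharing model in the post.

    Each thread beyond `limit` pairs up with one of the first `limit`
    threads, and both members of a pair run at 50%.
    """
    extra = max(0, enabled - limit)
    if extra > limit:
        raise ValueError("model only covers up to 2 * limit threads")
    shared = 2 * extra        # threads running at 50%
    full = enabled - shared   # threads running at 100%
    return full * 1.0 + shared * 0.5


if __name__ == "__main__":
    for n in (44, 64, 72, 88):
        print(n, "threads enabled ->", effective_threads(n), "at full speed")
```

Under this model any count from 64 to 128 enabled threads comes out at 64, which is why splitting the work across the two nodes is the only way to get more.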

scole250
Boinc Brigadier
Posts: 3040
Joined: Mon Feb 03, 2014 2:38 pm
Location: Goldsboro, (Eastern) North Carolina, USA

Re: Xeon - not utilising more than 32 threads

Unread post by scole250 » Thu Jun 08, 2017 12:38 am

Actually, our observations have been that even if you limit the number of threads to 64, they still won't run as efficiently as they could. Another thing that happens with NUMA is that it was designed to use memory on the bus of the other CPU if needed (or not), and it will do so at a loss of performance. To get the best performance, you must use the start command directives /NODE and /AFFINITY to restrict processes to the same NUMA node wired to the processor.

Seeing is believing, and there are two things to set up so you can observe efficiency and affect it:
1. Install BoincTasks so you can see what kind of CPU utilization WUs are running at. http://efmer.com/b/?q=boinctasks_download If you are running Boinc on more than one system, it's the only way to go to manage things. It will require you to configure each Boinc client to allow remote GUI access from other systems.
2. Set up your systems to run multiple Boinc clients. See info here...viewtopic.php?f=172&t=3140 Not sure if you have access to that area yet. If not, let us know and we'll get it moved to an open access area.

A lot of things to do and understand in those 2 items. Going to take a little effort to set it all up. I don't think we can easily give a "go do A, B, C, D" list of instructions. If you have questions though, feel free to ask.

Bryan
Boinc Lieutenant Colonel
Posts: 900
Joined: Thu May 21, 2015 6:18 pm

Re: Xeon - not utilising more than 32 threads

Unread post by Bryan » Thu Jun 08, 2017 12:54 am

There are a couple of caveats for those wanting to run more than 64 threads of BOINC.

1. There are a couple of projects, like Yoyo, that don't allow multiple BOINC clients.

2. If the project uses VBox then it doesn't work because you only have one instance of VBox installed and it therefore falls under the 64 NUMA thread rule. You can assign a boinc client to each node but when it calls VBox then VBox is limited to 64 threads.

On standard BOINC projects it is phenomenal ... beats the heck out of having to turn off HT when you have no choice but to run Windows (like Gerasim).

noetus
Boinc Lance Corporal
Posts: 34
Joined: Tue May 30, 2017 3:15 am

Re: Xeon - not utilising more than 32 threads

Unread post by noetus » Thu Jun 08, 2017 12:57 pm

Let me see if I'm reading you right. So you're saying if I take a 44 core / 88 thread machine, and limit it to 32 cores / 64 threads in BIOS, then run Cinebench on Windows, then enable all cores in BIOS, and run Cinebench again, I won't see any significant difference in the benchmark scores?

scole250
Boinc Brigadier
Posts: 3040
Joined: Mon Feb 03, 2014 2:38 pm
Location: Goldsboro, (Eastern) North Carolina, USA

Re: Xeon - not utilising more than 32 threads

Unread post by scole250 » Thu Jun 08, 2017 1:20 pm

noetus wrote:
Thu Jun 08, 2017 12:57 pm
Let me see if I'm reading you right. So you're saying if I take a 44 core / 88 thread machine, and limit it to 32 cores / 64 threads in BIOS, then run Cinebench on Windows, then enable all cores in BIOS, and run Cinebench again, I won't see any significant difference in the benchmark scores?
I think you'll see a higher benchmark with 88 threads vs. 64, but you won't get anywhere near 100% utilization on all 88 threads at the same time. In order to get the most utilization of all 88 threads under Windows, you'll need to make sure 44 processes run on threads 0-43 on NUMA node 0 and the other 44 processes run on threads 44-87 on NUMA node 1. That doesn't occur automatically. To get Boinc to run under those constraints, you must set up 2 boinc clients and run each using the start command with the correct NODE and AFFINITY options.
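The 0-43 / 44-87 split can be sketched in Python as below. It assumes the threads divide evenly across the nodes and are numbered node by node, as described in this post. The function name node_plan and the your_program.exe placeholder are illustrative only, and the real BOINC client command line is not shown here.

```python
def node_plan(total_threads: int, nodes: int = 2):
    """Split threads evenly across nodes.

    Returns one (node, first_thread, last_thread, mask) tuple per node.
    The mask is relative to the node, so it is the same for every node.
    """
    if total_threads % nodes:
        raise ValueError("threads do not divide evenly across nodes")
    per_node = total_threads // nodes
    mask = "0x{:X}".format((1 << per_node) - 1)
    plan = []
    for node in range(nodes):
        first = node * per_node
        plan.append((node, first, first + per_node - 1, mask))
    return plan


if __name__ == "__main__":
    for node, first, last, mask in node_plan(88):
        # your_program.exe stands in for whatever is being launched.
        print("threads {}-{}: start /node {} /affinity {} your_program.exe".format(
            first, last, node, mask))
```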

Bryan
Boinc Lieutenant Colonel
Posts: 900
Joined: Thu May 21, 2015 6:18 pm

Re: Xeon - not utilising more than 32 threads

Unread post by Bryan » Thu Jun 08, 2017 5:38 pm

noetus wrote:
Thu Jun 08, 2017 12:57 pm
Let me see if I'm reading you right. So you're saying if I take a 44 core / 88 thread machine, and limit it to 32 cores / 64 threads in BIOS, then run Cinebench on Windows, then enable all cores in BIOS, and run Cinebench again, I won't see any significant difference in the benchmark scores?
No, I think Cinebench will use all 88 threads to their fullest capability. That is a professionally written benchmarking suite. That suite is used by every Tom, Dick, and Harry who tests computer systems. They would be fools to not take NUMA into account. If I were writing code to run benchmarks, one of the 1st things I would do is check the system topology to see how many CPUs, cores/threads, and NUMA nodes were available. I have absolutely no doubt that they do this and then manage the threads of the benchmark accordingly.

Don't forget the quote from Microsoft concerning the issue:
The reason for initially limiting all threads to a single group is that 64 processors is more than adequate for the typical application. An application that requires the use of multiple groups so that it can run on more than 64 processors must intentionally determine where to run its threads. The application is responsible for setting thread affinities to the desired groups.
I really think that Microsoft knows their product and how it operates :lol: Then again something may have changed in Win10 Enterprise but I haven't found any documentation along those lines. It also isn't mentioned in the new server editions including the "data center" versions. If it were to be changed that would be the logical place for it to occur.

noetus
Boinc Lance Corporal
Posts: 34
Joined: Tue May 30, 2017 3:15 am

Re: Xeon - not utilising more than 32 threads

Unread post by noetus » Fri Jun 09, 2017 1:05 am

OK, this is along the lines of what I assumed. Regarding single-threaded command line applications: I assume multiple instances of these will also be assigned to NUMA nodes according to best practices for efficiency.

Bryan
Boinc Lieutenant Colonel
Posts: 900
Joined: Thu May 21, 2015 6:18 pm

Re: Xeon - not utilising more than 32 threads

Unread post by Bryan » Fri Jun 09, 2017 1:27 am

Correct. You would want to keep track of how many you have running and where. You also want to "balance" the 2 nodes so they are both using approximately the same number of threads.
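One simple way to keep the nodes balanced is to send each new instance to whichever node currently has the fewest running. A minimal Python sketch of that bookkeeping follows. The function name assign_instances is made up for this example, and it only decides the node number to pass to start /node. It does not launch anything.

```python
def assign_instances(count: int, nodes: int = 2):
    """Assign each instance to the currently least-loaded node.

    Returns (placement, running): the node chosen for each instance
    in launch order, and the final number of instances per node.
    """
    running = [0] * nodes  # instances currently assigned to each node
    placement = []
    for _ in range(count):
        node = running.index(min(running))  # ties go to the lowest node
        running[node] += 1
        placement.append(node)
    return placement, running


if __name__ == "__main__":
    placement, running = assign_instances(88)
    print("instances per node:", running)
```

With an odd number of instances, one node ends up with exactly one more than the other.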

scole250
Boinc Brigadier
Posts: 3040
Joined: Mon Feb 03, 2014 2:38 pm
Location: Goldsboro, (Eastern) North Carolina, USA

Re: Xeon - not utilising more than 32 threads

Unread post by scole250 » Fri Jun 09, 2017 1:49 am

I've been running my 72 thread system under Win 7 again this afternoon and can confirm again that if you run a single Boinc client and allow the OS to manage the nodes, you will see CPU utilization vary from 50-100%. Most of the time I saw utilization between 90-96%; it occasionally peaked at 100% but just as often fell below 75%. It ran this way for a couple of hours. Then I ran two clients specifying the NODE and AFFINITY for each client, and CPU utilization stayed pegged at 100%.
