The importance of Node Interleaving on AMD compute nodes

Enabling Node Interleaving in the bios can greatly increase performance of a compute node. Node interleaving essentially lets the CPU decide where to put the memory, disabling it means that the user must explicitly tell where in memory to put data so that the associated CPU gets best performance.

An explanation of Node Interleaving can be found here

The end result, a 4-5x performance increase in terms of memory bandwidth.

In our lab we have several 64 core AMD nodes with the following specs:

Supermicro HBQGL-6F/HBQGL-IF
Supermicro 1042-LTF SuperServer

Processor	AMD 6274
Nickname	Interlagos
Clock (GHz)	2.2
Sockets/Node	4
Cores/Socket	16
NUMA/Socket	2
DP GFlops/Socket	140.8
Memory/Socket	32 GB
Bandwidth/Socket	102.4 GB/s
DDR3	1333 MHz
L1 cache (excl.)	16KB
L2 cache/# cores	2MB/2
L3 cache/# cores	8MB/8

I noticed a a few days ago that one of the nodes was performing horribly compared to the other so I decided to do some digging. I installed AMDAPPSDK on both machines and ran the clpeak benchmark with the following results:

Bad Compute Node:

 
Platform: AMD Accelerated Parallel Processing
  Device: AMD Opteron(TM) Processor 6274                 
    Driver version  : 1214.3 (sse2,avx,fma4) (Linux x64)
    Compute units   : 64
    Clock frequency : 2200 MHz

    Global memory bandwidth (GBPS)
      float   : 9.22
      float2  : 9.64
      float4  : 9.95
      float8  : 10.16
      float16 : 9.99

  	...

Good Compute Node:

 
Platform: AMD Accelerated Parallel Processing
  Device: AMD Opteron(TM) Processor 6274                 
    Driver version  : 1214.3 (sse2,avx,fma4) (Linux x64)
    Compute units   : 64
    Clock frequency : 2205 MHz

    Global memory bandwidth (GBPS)
      float   : 37.66
      float2  : 42.27
      float4  : 58.08
      float8  : 55.39
      float16 : 43.31

	...

There is a 4-5x differerence in memory bandwidth! I omitted the Flop rates of both nodes as they were identical. By enabling Node interleaving, the performance increases dramatically.

Bios Configuration

Note that I will be talking about Bios version 2.0 here.

I am going to provide the bios configuration of the faster machine for the CPU and the Memory options

Bios->Advanced->Processor & Clock Options

 
GART Error [Disabled]
Microcode Update [Enabled]
Secure Virtual Machine Mode [Disabled]
PowerNow [Enabled]
C State Mode [Disabled]
PowerCap [P-state 0]
HPC Mode [Disabled]
CPB Mode [Auto]
CPU DownCore Mode [Disabled]
C1E Support [Auto]
Clock Spread Spectrum [Disabled]

Bios->Advanced->Advanced Chipset Control -> NorthBridge Configuration

 
HT Speed Support [Auto]
IOMMU [Enabled]

Bios->Advanced->Advanced Chipset Control -> NorthBridge Configuration ->Memory Configuration

 
Bank Interleaving [Auto]
Node Interleaving [Auto]		THE MOST IMPORTANT CHANGE
Channel Interleaving [Auto]
CS Sparing Enable [Disabled]
Bank Swizzle Mode [Enabled]