Starting with AIX 6.1 TL08 and AIX 7.1 TL02, there's a new AIX CPU tuning feature called "Scaled Throughput" mode. It is supported on POWER7 and POWER7+ processors only (do not try this on POWER6!). This new mode has the ability to dispatch workload to more SMT threads per VP, avoiding the need to unfold additional VPs. I've heard it described as being more "POWER6-like". I'm not suggesting that you use this feature; this post simply discusses what it can do.
By default, AIX (on POWER7) operates in "Raw Throughput" mode. This mode provides the best performance per thread per core and the best response times, at the cost of utilising more cores (VPs) to process a system's workload. By comparison, "Scaled Throughput" mode provides a greater level of per-core throughput (processing) by dispatching more SMT threads on a core, which has the effect of utilising fewer VPs/cores. In this mode, more (or all) of the SMT threads on a core will be utilised before workload is dispatched to other VPs/cores in the system.
The schedo tuning command can be used to enable the new mode via a new parameter called vpm_throughput_mode, e.g.

# schedo -p -o vpm_throughput_mode=X
This tunable can be set to one of the following values:

0 = Legacy Raw mode (default).
1 = Scaled or "Enhanced Raw" mode with a higher threshold than legacy.
2 = Scaled mode, use primary and secondary SMT threads.
4 = Scaled mode, use all four SMT threads.
At this stage, this tunable is not restricted, but if you plan on experimenting with it, please be careful: make sure you understand how this tuning may impact your system, and always test new tuning in a non-production environment first!
I performed a couple of quick tests today, just to see what impact tuning the parameter would have on an AIX 7.1 TL2 system.
aixlpar1 : /
# oslevel -s
7100-02-01-1245

aixlpar1 : /
# lsconf | grep Mode
System Model: IBM,9119-FHB
Processor Implementation Mode: POWER 7
I started some CPU-intensive workload.

Name            PID  CPU%  PgSp Owner
ncpu        4718826  15.7  108K db
ncpu        7143668  15.7  108K db
ncpu        8126488  15.7  108K db
ncpu        5832920  15.6  108K db
The vpm_throughput_mode parameter was left at the default value (0).

# schedo -a | grep vpm_throughput_mode
vpm_throughput_mode = 0
As expected, the workload was evenly dispatched across the primary SMT threads of the 4 VPs assigned to the partition, i.e. logical CPUs 0, 4, 8 and 12. None of the secondary or tertiary SMT threads were active. This is the default mode and provides the greatest raw throughput (performance) per VP, as there's no overhead associated with enabling secondary or tertiary SMT threads.
Topas Monitor for host:aixlpar1        EVENTS/QUEUES    FILE/TTY
Fri Dec 21 11:06:34 2012   Interval:2  Cswitch     184  Readch      864
                                       Syscall     299  Writech     651
CPU  User%  Kern%  Wait%  Idle% Physc  Reads       111  Rawin         0
 0    99.7    0.3    0.0    0.0  0.63  Writes        1  Ttyout      233
 1     0.2    0.5    0.0   99.3  0.12  Forks         0  Igets         0
 2     0.0    0.0    0.0  100.0  0.12  Execs         0  Namei         9
 3     0.0    0.0    0.0  100.0  0.12  Runqueue   4.00  Dirblk        0
 4   100.0    0.0    0.0    0.0  0.63  Waitqueue   0.0
 5     0.0    0.0    0.0  100.0  0.12                   MEMORY
 6     0.0    0.0    0.0  100.0  0.12  PAGING           Real,MB    4096
 7     0.0    0.0    0.0  100.0  0.12  Faults        0  % Comp       22
 8   100.0    0.0    0.0    0.0  0.63  Steals        0  % Noncomp     2
 9     0.0    0.0    0.0  100.0  0.12  PgspIn        0  % Client      2
10     0.0    0.0    0.0  100.0  0.12  PgspOut       0
11     0.0    0.0    0.0  100.0  0.12  PageIn        0  PAGING SPACE
12   100.0    0.0    0.0    0.0  0.63  PageOut       0  Size,MB    2048
13     0.0    0.0    0.0  100.0  0.12  Sios          0  % Used        0
14     0.0    0.0    0.0  100.0  0.12                   % Free      100
15     0.0    0.0    0.0  100.0  0.12  NFS (calls/sec)
Next, I enabled scaled throughput mode (2). The workload slowly migrated to logical CPUs 4, 5, 8 and 9. Now only two primary SMT threads were active (lcpus 4 and 8) and two secondary SMT threads were active (lcpus 5 and 9). All of the processing was being performed by fewer VPs (almost like POWER6).
# schedo -p -o vpm_throughput_mode=2
Topas Monitor for host:aixlpar1        EVENTS/QUEUES    FILE/TTY
Fri Dec 21 11:07:36 2012   Interval:2  Cswitch     179  Readch      935
                                       Syscall     301  Writech     794
CPU  User%  Kern%  Wait%  Idle% Physc  Reads       112  Rawin         0
 0    14.1   60.3    0.0   25.6  0.00  Writes        2  Ttyout      304
 1     5.7   32.8    0.0   61.5  0.00  Forks         0  Igets         0
 2     0.0    1.6    0.0   98.4  0.00  Execs         0  Namei        10
 3     0.0    3.0    0.0   97.0  0.00  Runqueue   4.00  Dirblk        0
 4   100.0    0.0    0.0    0.0  0.47  Waitqueue   0.0
 5   100.0    0.0    0.0    0.0  0.47                   MEMORY
 6     0.0    0.0    0.0  100.0  0.03  PAGING           Real,MB    4096
 7     0.0    0.0    0.0  100.0  0.03  Faults        0  % Comp       22
 8   100.0    0.0    0.0    0.0  0.47  Steals        0  % Noncomp     2
 9   100.0    0.0    0.0    0.0  0.47  PgspIn        0  % Client      2
10     0.0    0.0    0.0  100.0  0.03  PgspOut       0
11     0.0    0.0    0.0  100.0  0.03  PageIn        0  PAGING SPACE
12     0.0   50.6    0.0   49.4  0.00  PageOut       0  Size,MB    2048
13     0.0   13.3    0.0   86.7  0.00  Sios          0  % Used        0
14     0.0    0.8    0.0   99.2  0.00                   % Free      100
15     0.0    0.8    0.0   99.2  0.00  NFS (calls/sec)
And finally, I tried scaled mode with all four SMT threads (4). All of the workload migrated to a single VP, with all four SMT threads utilised (the primary SMT thread on lcpu 0, and the secondary/tertiary SMT threads on lcpus 1, 2 and 3). This mode offers the lowest overall core consumption, but it has the (possibly negative) side effect of enabling more SMT threads on a single VP/core, which may not perform as well as the same workload dispatched evenly across 4 individual VPs/cores (on their primary SMT threads).
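A rough way to think about that trade-off is below. The SMT scaling factors used here are purely hypothetical placeholders (not measured AIX/POWER7 figures; real scaling depends heavily on the workload): assume a core running all four SMT threads delivers about 1.6x the throughput of a single busy primary thread.

```python
# Hypothetical, illustrative SMT scaling factors - NOT measured values.
ST = 1.0    # one busy primary thread per core (baseline)
SMT4 = 1.6  # assumed whole-core throughput with all four threads busy

# Raw mode (0): 4 tasks spread across 4 cores, one primary thread each.
raw_total = 4 * ST           # total throughput units
raw_cores = 4                # cores kept busy

# Scaled mode (4): the same 4 tasks packed onto all threads of one core.
scaled_total = 1 * SMT4      # total throughput units
scaled_cores = 1             # cores kept busy

# Per-core efficiency is higher in scaled mode, but total throughput
# (and hence per-task performance) is lower for this workload.
print(raw_total / raw_cores)        # 1.0 units per core
print(scaled_total / scaled_cores)  # 1.6 units per core
print(raw_total, scaled_total)      # 4.0 vs 1.6 total
```

Under these assumed numbers, scaled mode does exactly what its name suggests: more throughput per core consumed, at the cost of raw per-task performance.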
# schedo -p -o vpm_throughput_mode=4
Topas Monitor for host:aixlpar1        EVENTS/QUEUES    FILE/TTY
Fri Dec 21 11:08:30 2012   Interval:2  Cswitch     199  Readch      385
                                       Syscall     149  Writech     769
CPU  User%  Kern%  Wait%  Idle% Physc  Reads         2  Rawin         0
 0    99.7    0.3    0.0    0.0  0.25  Writes        1  Ttyout      375
 1    99.9    0.1    0.0    0.0  0.25  Forks         0  Igets         0
 2   100.0    0.0    0.0    0.0  0.25  Execs         0  Namei         6
 3   100.0    0.0    0.0    0.0  0.25  Runqueue   4.00  Dirblk        0
 4     0.0   48.8    0.0   51.2  0.00  Waitqueue   0.0
 5     0.0    5.5    0.0   94.5  0.00                   MEMORY
 6     0.0    3.6    0.0   96.4  0.00  PAGING           Real,MB    4096
 7     0.0    3.3    0.0   96.7  0.00  Faults        0  % Comp       22
 8     0.0   77.1    0.0   22.9  0.00  Steals        0  % Noncomp     2
 9     0.0   52.9    0.0   47.1  0.00  PgspIn        0  % Client      2
10     0.0   44.8    0.0   55.2  0.00  PgspOut       0
11     0.0   38.5    0.0   61.5  0.00  PageIn        0  PAGING SPACE
12     0.0   51.7    0.0   48.3  0.00  PageOut       0  Size,MB    2048
13     0.0    8.9    0.0   91.1  0.00  Sios          0  % Used        0
14     0.0    7.0    0.0   93.0  0.00                   % Free      100
15     0.0    7.4    0.0   92.6  0.00  NFS (calls/sec)
For more information on "Scaled Throughput" mode, take a look at the following presentation: http://t.co/jVtJ40Ty
Tags: smt, aix, vpm_throughput_mode, scaled_throughput, chris_gibson, power7