Using benchmarking to determine efficient usage of nodes in a cluster

Glenn R. Luecke (Iowa State University, Ames, Iowa, USA)
Ying Li (Iowa State University, Ames, Iowa, USA)
Martin Cuma (Centre for High Performance Computing, University of Utah, Salt Lake City, Utah, USA)

Benchmarking: An International Journal

ISSN: 1463-5771

Article publication date: 30 October 2007

Abstract

Purpose

The purpose of this paper is to evaluate how to use nodes in a cluster efficiently by studying the NAS Parallel Benchmarks (NASPB) on Intel Xeon and AMD Opteron dual CPU Linux clusters.

Design/methodology/approach

The NASPB performance results are presented both with one MPI process per node (1 ppn) and with two MPI processes per node (2 ppn). These results were analyzed by considering cache effects, code scalability, memory bandwidth within nodes, and the impact of MPI and the MPI communication network. Memory bandwidth was benchmarked using MPI versions of the Streams benchmarks. The impact of MPI and the MPI communication network was evaluated by benchmarking the performance of MPI sends and receives, MPI broadcast, and the MPI all‐to‐all routines.

Findings

The performance results from running the NASPB and from the memory bandwidth benchmarks show that better performance can sometimes be achieved using 1 ppn. Performance results show that the AMD Opteron/Myrinet cluster is able to achieve significantly better utilization of the second processor than the Intel Xeon/Myrinet cluster.

Practical implications

Most Linux clusters are purchased with two processors per node. One would like to run all applications on a cluster with two processors per node using 2 ppn instead of 1 ppn in order to utilize the second processor on each node. However, our results show that this is not always the best choice. Users should always assess their program performance with both 1 ppn and 2 ppn before running production calculations. This issue becomes even more important with the emergence of multi‐core processors.
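The recommendation to time a program with both 1 ppn and 2 ppn can be illustrated with a minimal sketch. It assumes the same job is run with the same total number of MPI processes in both layouts, so the 2 ppn run occupies half as many nodes. The function name `compare_ppn`, the 5% tolerance, and the timings in the usage lines are hypothetical and are not taken from the paper.

```python
def compare_ppn(time_1ppn: float, time_2ppn: float, tolerance: float = 0.05) -> dict:
    """Compare wall times (seconds) of one job run with 1 ppn and with 2 ppn.

    Assumption: both runs use the same total number of MPI processes,
    so the 2 ppn run uses half as many nodes as the 1 ppn run.
    """
    if time_1ppn <= 0 or time_2ppn <= 0:
        raise ValueError("wall times must be positive")

    # How much slower (>1) or faster (<1) the 2 ppn run is in wall time.
    slowdown = time_2ppn / time_1ppn

    # Node-seconds consumed by 2 ppn relative to 1 ppn; the factor 1/2
    # reflects the halved node count under the assumption above.
    node_cost_ratio = slowdown / 2.0

    if slowdown > 1.0 + tolerance:
        faster = "1ppn"
    elif slowdown < 1.0 - tolerance:
        faster = "2ppn"
    else:
        faster = "tie"

    return {"slowdown": slowdown, "node_cost_ratio": node_cost_ratio, "faster": faster}


# Hypothetical timings, for illustration only.
result = compare_ppn(time_1ppn=100.0, time_2ppn=130.0)
print(result["faster"], round(result["slowdown"], 2), round(result["node_cost_ratio"], 2))
```

A slowdown above 1 together with a node cost ratio below 1 would mean that 1 ppn finishes sooner while 2 ppn consumes fewer node-seconds, so the better choice depends on whether turnaround time or cluster throughput matters more.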

Originality/value

To the authors' best knowledge, this is the only detailed comparison of AMD Opteron and Intel Xeon dual processor node parallel performance on large Myrinet clusters. The paper should be of value to anyone considering running on or purchasing an AMD‐ or Intel‐based Linux cluster.

Citation

Luecke, G.R., Li, Y. and Cuma, M. (2007), "Using benchmarking to determine efficient usage of nodes in a cluster", Benchmarking: An International Journal, Vol. 14 No. 6, pp. 728-749. https://doi.org/10.1108/14635770710834518

Publisher

Emerald Group Publishing Limited

Copyright © 2007, Emerald Group Publishing Limited
