
OpenSHMEM and Related Technologies. Enhancing OpenSHMEM for Hybrid Environments: Third Workshop, OpenSHMEM 2016, Baltimore, MD, USA, August 2 – 4, 2016, Revised Selected Papers


Ping-Pong Latency

Figures 4 and 5 compare the round-trip time of shmem_put, shmem_put_nbi, and shmem_put_nbe for small and large messages, respectively. The origin PE sends a ping to the destination PE using shmem_put, shmem_put_nbi, or shmem_put_nbe, and then waits for the corresponding pong using shmem_int_wait_until. On receiving the ping, the destination PE responds with a pong through a put. The target PE waits on the last byte of the message.

Fig. 2. Comparing performance of shmem_getmem, shmem_getmem_nbi, and shmem_getmem_nbe

Fig. 3. Comparing performance of OpenSHMEM OSU shmem get many benchmark using 64 PEs

Fig. 4. Round-trip latency using put-based ping-pong benchmark for small messages


Fig. 5. Round-trip latency using put-based ping-pong benchmark for large messages

For our experiments, we use a Mellanox InfiniBand HCA as the network and the RC (Reliable Connection) protocol for data transfer, which guarantees in-order delivery of messages. For this setup, polling on the last byte of data to detect completion is a reasonable approach, although it might be inaccurate for networks and memory architectures that do not guarantee in-order delivery of messages. For completion, the shmem_put and shmem_put_nbi calls require a shmem_quiet, while shmem_put_nbe requires a shmem_wait_req on the request.
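The caveat about in-order delivery can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the benchmark code: it makes no OpenSHMEM calls, the "network" is a loop that copies fixed-size chunks into the destination buffer in a caller-chosen order, and all names (MSG_LEN, CHUNK, SENTINEL, deliver_chunk, last_byte_poll_is_accurate) are hypothetical. It checks whether the whole message is present at the first moment the last-byte poll succeeds.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MSG_LEN  16                  /* message size in bytes (illustrative) */
#define CHUNK    4                   /* bytes per simulated network packet   */
#define NCHUNKS  (MSG_LEN / CHUNK)
#define SENTINEL 0xAB                /* value the sender writes into each byte */

/* Stand-in for the arrival of one packet: copy one chunk into the target. */
static void deliver_chunk(unsigned char *dst, const unsigned char *src, int chunk)
{
    memcpy(dst + (size_t)chunk * CHUNK, src + (size_t)chunk * CHUNK, CHUNK);
}

/* What the polling target observes: has the last byte taken the new value? */
static bool last_byte_arrived(const unsigned char *dst)
{
    return dst[MSG_LEN - 1] == SENTINEL;
}

/* Ground truth: has every byte of the message arrived? */
static bool whole_message_arrived(const unsigned char *dst)
{
    for (size_t i = 0; i < MSG_LEN; i++)
        if (dst[i] != SENTINEL)
            return false;
    return true;
}

/* Deliver the chunks in the given order. Returns true if the complete
 * message is present at the first moment the last-byte poll succeeds,
 * i.e. if last-byte polling reports completion accurately. */
bool last_byte_poll_is_accurate(const int order[NCHUNKS])
{
    unsigned char src[MSG_LEN], dst[MSG_LEN];
    memset(src, SENTINEL, sizeof src);
    memset(dst, 0, sizeof dst);      /* target buffer starts cleared */

    for (int i = 0; i < NCHUNKS; i++) {
        deliver_chunk(dst, src, order[i]);
        if (last_byte_arrived(dst))
            return whole_message_arrived(dst);
    }
    return false;                    /* the final chunk was never delivered */
}
```

With the in-order sequence {0, 1, 2, 3} the function returns true. With the reordered sequence {0, 3, 1, 2} it returns false, because the last byte lands while chunks 1 and 2 are still missing. That is the situation in which an explicit completion call, rather than last-byte polling, would be needed.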

From the graphs, one can observe some performance differences. For a one-byte message, the round-trip latencies of shmem_put, shmem_put_nbi, and shmem_put_nbe are 1.58 μsec, 1.54 μsec, and 1.52 μsec, respectively. For a 4 MB message, the latencies are 753.29 μsec, 704.54 μsec, and 685.65 μsec, respectively. The performance difference for small messages is negligible.
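To put the reported latencies on a common scale, the relative reduction with respect to shmem_put can be computed directly from the numbers above. The helper below is a hypothetical illustration (the name percent_reduction is not from the paper), and it takes the latencies to be in microseconds.

```c
/* Percentage by which `candidate` latency is lower than `baseline` latency.
 * Both arguments must use the same unit; the unit cancels out. */
double percent_reduction(double baseline, double candidate)
{
    return 100.0 * (baseline - candidate) / baseline;
}
```

Applied to the 4 MB results, percent_reduction(753.29, 704.54) is about 6.5 for shmem_put_nbi and percent_reduction(753.29, 685.65) is about 9.0 for shmem_put_nbe. For the one-byte case, the corresponding values are about 2.5 and 3.8, which amounts to an absolute difference of at most 0.06 μsec.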

 