How to use offload over fabric with Knights Landing (Intel® Xeon Phi™ processor)

Offload over Fabric overview

The Intel® Xeon Phi™ x100 coprocessors, code named Knights Corner, support the offload programming model, allowing users to offload computations over PCIe and build heterogeneous applications. Such applications utilize the most prominent features of Intel® Xeon® processors and Intel Xeon Phi coprocessors at the same time. The most convenient way to create offloading programs is to use the Compiler Assisted Offload features of Intel® compilers (offload directives).

Many applications use the offload programming model and fit the heterogeneous model very well. Learn more about using LAMMPS* on Intel Xeon Phi coprocessors.

With the introduction of the Intel® Xeon Phi™ x200 processor, code named Knights Landing, the offload programming model is extended and can still be used with next to no changes to existing code. It is implemented as Offload over Fabric (OOF) and enables offloading from servers with Intel Xeon processors to servers with Intel Xeon Phi x200 processors within a high-speed network, such as Intel® Omni-Path Fabric.

The Offload over Fabric software is part of the Intel Xeon Phi processor software. The Intel Xeon Phi processor software will be available soon.

Compilation of offloading applications

I have written a very simple offloading program demonstrating how to run workloads on Intel Xeon Phi x100 coprocessor and Intel Xeon Phi x200 processor.

#include <stdint.h>
#include <stdio.h>

#pragma omp declare target
void what_cpu()
{
    uint32_t eax;
    const uint32_t xeon_phi_x100_id = 0x00010;
    const uint32_t xeon_phi_x200_id = 0x50070;

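    /* CPUID leaf 1 returns the processor signature in EAX. The instruction
       also overwrites EBX, ECX and EDX, so they are listed as clobbered. */
    __asm volatile("cpuid" : "=a"(eax) : "a"(1) : "ebx", "ecx", "edx");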

    uint32_t this_cpu_id =  eax & 0xF00F0;

    if (this_cpu_id == xeon_phi_x100_id)
        printf("This CPU is Intel(R) XeonPhi(TM) x100 Processor!\n");
    else
    if (this_cpu_id == xeon_phi_x200_id)
        printf("This CPU is Intel(R) XeonPhi(TM) x200 Processor!\n");
    else
        printf("This CPU is other Intel(R) Processor.\n");
}
#pragma omp end declare target

int main()
{
    printf("Running on host: ");
    what_cpu();

    #pragma omp target
    {
        printf("Running on target: ");
        what_cpu();
    }

    return 0;
}

Use the following command to compile this code using the Intel compiler for Intel Xeon Phi x100 coprocessor:

$ icc -qopenmp -o offload-example offload-example.c

This program executed on a system with Intel Xeon Phi coprocessors x100 outputs:

Offload execution on Intel(R) Xeon Phi(TM) coprocessor

Execute the following command to compile the same code for Offload over Fabric:

$ icc -qopenmp -qoffload-arch=mic-avx512 -o offload-example offload-example.c

Executing the program on a system connected over Intel Omni-Path Fabric to a server with Intel Xeon Phi x200 processor results in the following output:

Offload execution on Intel(R) Xeon Phi(TM) processor

The only difference is the -qoffload-arch=mic-avx512 switch, which compiles the offload code for the Intel Xeon Phi x200 processor. The identical command line can be used to compile code for the Intel Xeon Phi x200 coprocessor. Programs compiled for the Intel Xeon Phi x200 processor should run without recompilation on Intel Xeon Phi x200 coprocessors.

Use Intel® Parallel Studio XE 2017 Beta Update 1 or newer to compile offloading programs for the Intel Xeon Phi x200 processor.

Configuration of offloading application

When offloading over PCIe, the offloading infrastructure can easily find out if coprocessors are present in the system and available for offloading. In the network environment, Offload over Fabric uses the OFFLOAD_NODES environment variable to configure the nodes available for offloading from the offload host. The value of OFFLOAD_NODES is a comma-separated list of node names or IP addresses. OOF requires IP over IB (IP over InfiniBand) to be configured. IP over IB allows regular IP addresses to be used in a high-speed fabric network. Check the documentation of your fabric solution, such as Intel Omni-Path Fabric, for more details.

Execute the following commands to run the example code:

export OFFLOAD_NODES=192.168.0.200
./offload-example

Use the COI_OFFLOAD_NODES variable instead of OFFLOAD_NODES for Beta versions of Intel Parallel Studio XE 2017.

The connection is authenticated using SSH. When connecting, the name of the offloading user is used. It is highly recommended to configure authentication using SSH keys. Otherwise, the user will have to provide their password upon first connection made by the offloading infrastructure.

The OOF runtime uses virtual file system features to perform tasks connected to the offload process and memory management on Intel Xeon Phi x200 processor-based nodes. The runtime expects a tmpfs virtual file system to be mounted in the /tmp/intel-coi directory and a hugetlbfs file system to be mounted in the /tmp/intel-coi/COI2MB directory. The /tmp/intel-coi file system is used for basic memory management. The offloading runtime treats the size parameter of the file system mounted in /tmp/intel-coi as the total limit of memory that offloading processes can allocate on a target device. The file system mounted in /tmp/intel-coi/COI2MB is used for managing memory allocated using 2 MB memory pages.

If any of the file systems is not mounted during runtime initialization, the OOF runtime will attempt to mount it and will return an error if it is not possible. Perform the following steps to mount the required file systems:

[target]# mkdir -p /tmp/intel-coi
[target]# mount -t tmpfs -o size=32g tmpfs /tmp/intel-coi
[target]# chmod 1777 /tmp/intel-coi
[target]# mkdir -p /tmp/intel-coi/COI2MB
[target]# mount -t hugetlbfs none /tmp/intel-coi/COI2MB
[target]# chmod 1777 /tmp/intel-coi/COI2MB

Execute the following command to allow the offloading processes to use huge pages, which may improve performance:

[target]# echo 4000 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_overcommit_hugepages

Summary

The Offload over Fabric model is an easy migration path for users with an existing code base that uses Compiler Assisted Offload (offload directives), regardless of whether it is written with the original Intel® Language Extensions for Offload (LEO) syntax or the newer OpenMP target directives (as shown in the simple example earlier). Check the Intel Xeon Phi x200 processor software documentation for more details on Offload over Fabric. The documentation is available with the Intel Xeon Phi processor software.

Learn more about effective usage of the Intel® Compiler's offload features.

Intel, Xeon, and Intel Xeon Phi are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.