Case Study: BerkeleyGW using Intel® Xeon Phi™ Processors
BerkeleyGW is a materials science application for calculating the excited-state properties of materials, such as band gaps, band structures, absorption spectroscopy, photoemission spectroscopy and more....
Improve Performance with Vectorization
This article focuses on the steps to improve software performance with vectorization. Included are examples of full applications along with some simpler cases to illustrate the steps to vectorization....
Monte-Carlo simulation on Asian Options Pricing
This is an exercise in performance optimization on heterogeneous Intel architecture systems based on multi-core processors and manycore (MIC) coprocessors. NOTE: this lab follows the discussion in...
Direct N-body Simulation
Exercise in performance optimization on Intel Architecture, including Intel® Xeon Phi™ processors. NOTE: this lab is an overview of various optimizations discussed in Chapter 4 in the book "Parallel...
Multithreaded Transposition of Square Matrices with Common Code for Intel®...
In-place matrix transposition, a standard operation in linear algebra, is a memory bandwidth-bound operation. The theoretical maximum performance of transposition is the memory copy bandwidth. However,...
Fine-Tuning Vectorization and Memory Traffic on Intel® Xeon Phi™...
By Andrey Vladimirov, Colfax International. Common techniques for fine-tuning the performance of automatically vectorized loops in applications for Intel® Xeon Phi™ coprocessors are discussed. These...
Offload over Fabric to Intel® Xeon Phi™ Processor: Tutorial
The OpenMP* 4.0 device constructs supported by the Intel® C++ Compiler can be used to offload a workload from an Intel® Xeon® processor-based host machine to Intel® Xeon Phi™ coprocessors over...
How to Mount a Shared Directory on Intel® Xeon Phi™ Coprocessor
In order to run a native program on the Intel® Xeon Phi™ coprocessor, the program and any dependencies must be copied to the target platform. However, this approach takes away memory from the native...
Introduction to Heterogeneous Streams Library
Introduction: To efficiently utilize all available resources for the task concurrency application on heterogeneous platforms, designers need to understand the memory architecture, the thread utilization...
Caffe* Optimized for Intel® Architecture: Applying Modern Code Techniques
Improving the computational performance of a deep learning framework. Authors: Vadim Karpusenko, Ph.D., Intel Corporation; Andres Rodriguez, Ph.D., Intel Corporation; Jacek Czaja, Intel...
Hybrid Parallelism: A MiniFE* Case Study
In my first article, Hybrid Parallelism: Parallel Distributed Memory and Shared Memory Computing, I discussed the chief forms of parallelism: shared memory parallel programming and distributed memory...
Recipe: Building NAMD on Intel® Xeon® and Intel® Xeon Phi™ Processors
Purpose: This recipe describes, step by step, how to get, build, and run the NAMD (Scalable Molecular Dynamics) code on the Intel® Xeon Phi™ processor and Intel® Xeon® E5 processors for better...
Introducing DNN primitives in Intel® Math Kernel Library
Deep Neural Networks (DNNs) are on the cutting edge of the Machine Learning domain. These algorithms received wide industry adoption in the late 1990s and were initially applied to tasks such as...
Intel® HPC Developer Conference 2016 - Session Presentations
The 2016 Intel® HPC Developer Conference brought together developers from around the world to discuss code modernization in high-performance computing. For those who may have missed it or if you want...
Thread Parallelism in Cython*
Introduction: Cython* is a superset of Python* that additionally supports C functions and C types on variable and class attributes. Cython is used for wrapping external C libraries that speed up the...
Exploring MPI for Python* on Intel® Xeon Phi™ Processor
Introduction: Message Passing Interface (MPI) is a standardized message-passing library interface designed for distributed memory programming. MPI is widely used in the High Performance Computing (HPC)...
Quick Analysis of Vectorization Using the Intel® Advisor 2017 Tool
In this article we continue our exploration of vectorization on an Intel® Xeon Phi™ processor using examples of loops that we used in a previous article. We will discuss how to use the command-line...
Intel® Xeon Phi™ Processor 7200 Family Memory Management Optimizations
This paper examines software performance optimization for an implementation of a non-library version of DGEMM executing on the Intel® Xeon Phi™ processor (code-named Knights Landing, with acronym KNL)...
3D Isotropic Acoustic Finite-Difference Wave Equation Code: A Many-Core...
Finite difference is a simple and efficient mathematical tool that helps solve differential equations. In this paper, we solve an isotropic acoustic 3D wave equation using explicit, time domain finite...