Currently:

I’m a software engineer at NVIDIA Corporation on the ‘Programming Models and Runtime Systems for Deep Learning’ team (a subgroup of CUDA Software). I was a Ph.D. student in the CSE department at The Ohio State University and defended my dissertation on December 7th, 2016. I was part of the Network-based Computing Lab research group headed by Dr. D. K. Panda. I received my Bachelor of Technology degree (B.S. equivalent, no pun intended) from the Department of Information Technology at NITK.

My research interests include communication designs for accelerators/co-processors in HPC, power/energy-aware computing, and performance analysis/profiling of MPI programs. I contributed to the MVAPICH MPI project from April 2012 to August 2017.

Contact

akshay.v.3.14@gmail.com akvenkatesh@nvidia.com

CV

pdf

Journal Publications

MPI-based parallel synchronous vector evaluated particle swarm optimization for multi-objective design optimization of composite structures S.N. Omkar, Akshay Venkatesh, Mrunmaya Mudigere, Engineering Applications of Artificial Intelligence, Volume 25, Issue 8, December 2012

Conference Publications

MPI-GDS: High Performance MPI Designs with GPUDirect-aSync for CPU-GPU Control Flow Decoupling A. Venkatesh, Ching-Hsiang Chu, Khaled Hamidouche, Sreeram Potluri, Davide Rossetti and Dhabaleswar Panda - ICPP ’17, August 2017 [Accepted]
Offloaded GPU Collectives using CORE-Direct and CUDA Capabilities on IB Clusters A. Venkatesh, K. Hamidouche, H. Subramoni, DK Panda - HiPC ’15, December 2015
A Case for Application-Oblivious Energy-Efficient MPI Runtime A. Venkatesh, A. Vishnu, K. Hamidouche, N. Tallent, D. K. Panda, D. Kerbyson, and A. Hoisie - Supercomputing ’15, Nov 2015 [Best Student Paper Finalist]
Exploiting GPUDirect RDMA in Designing High Performance OpenSHMEM for NVIDIA GPU Clusters K. Hamidouche, A. Venkatesh, A. Awan, H. Subramoni, and D. K. Panda - IEEE Cluster 2015, Sep 2015
Designing Non-Blocking Personalized Collectives with Near Perfect Overlap for RDMA-Enabled Clusters H. Subramoni, A. Awan, K. Hamidouche, D. Pekurovsky, A. Venkatesh, S. Chakraborty, K. Tomko, and D. K. Panda - ISC ’15, Jul 2015
Non-blocking PMI Extensions for Fast MPI Startup S. Chakraborty, H. Subramoni, A. Moody, A. Venkatesh, J. Perkins, and D. K. Panda - CCGrid ’15, May 2015
A High Performance Broadcast Design with Hardware Multicast and GPUDirect RDMA for Streaming Applications on InfiniBand Clusters A. Venkatesh, H. Subramoni, K. Hamidouche, DK Panda - High Performance Computing ’14
Designing MPI Library with Dynamic Connected Transport (DCT) of InfiniBand: Early Experiences H. Subramoni, K. Hamidouche, A. Venkatesh, S. Chakraborty, DK Panda - Int’l Supercomputing Conference (ISC ’14), May 2014
High Performance Alltoall and Allgather designs for InfiniBand MIC Clusters A. Venkatesh, S. Potluri, R. Rajachandrasekar, M. Luo, K. Hamidouche, DK Panda - International Parallel and Distributed Processing Symposium (IPDPS ’14), May 2014
MVAPICH-PRISM: A Proxy-based Communication Framework using InfiniBand and SCIF for Intel MIC Clusters S. Potluri, D. Bureddy, K. Hamidouche, A. Venkatesh, K. Kandalla, H. Subramoni and D. K. Panda - Int’l Conference on Supercomputing (SC ’13), November 2013
Efficient Inter-node MPI Communication using GPUDirect RDMA for InfiniBand Clusters with NVIDIA GPUs S. Potluri, K. Hamidouche, A. Venkatesh, D. Bureddy and D. K. Panda - Int’l Conference on Parallel Processing (ICPP ’13), October 2013
Designing Optimized MPI Broadcast and Allreduce for Many Integrated Core (MIC) InfiniBand Clusters K. Kandalla, A. Venkatesh, K. Hamidouche, S. Potluri and D. K. Panda - Int’l Symposium on High-Performance Interconnects (HotI ’13), August 2013
Efficient Intra-node Communication on Intel-MIC Clusters S. Potluri, A. Venkatesh, D. Bureddy, K. Kandalla, and D. K. Panda - International Symposium on Cluster, Cloud and Grid Computing (CCGrid), May 2013
OMB-GPU: A Micro-Benchmark Suite for Evaluating MPI Libraries on GPU Clusters Devendar Bureddy, Hao Wang, A. Venkatesh, Sreeram Potluri, Dhabaleswar K. Panda - EuroMPI 2012

Workshop Publications

Optimizing Collective Communication in UPC J. Jose, K. Hamidouche, J. Zhang, A. Venkatesh, and D. K. Panda, Int’l Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS ’14), held in conjunction with the International Parallel and Distributed Processing Symposium (IPDPS ’14), May 2014
A Comprehensive Performance Evaluation of OpenSHMEM Libraries on InfiniBand Clusters J. Jose, J. Zhang, A. Venkatesh, S. Potluri and D. K. Panda, First OpenSHMEM Workshop: Experiences, Implementations and Tools (OpenSHMEM ’13), October 2013
UPC on MIC: Early Experiences with Native and Symmetric Modes M. Luo, M. Li, A. Venkatesh, X. Lu and D. K. Panda, Int’l Conference on Partitioned Global Address Space Programming Models (PGAS ’13), October 2013
Optimized MPI Gather Collective for Many Integrated Core (MIC) InfiniBand Clusters A. Venkatesh, K. Kandalla and D. K. Panda - Extreme Scaling Workshop, August 2013
Evaluation of Energy Characteristics of MPI Communication Primitives with RAPL A. Venkatesh, K. Kandalla, D. K. Panda - High-Performance, Power-Aware Computing (HPPAC), 2013

Short Paper Publications

MIC-Check: A Distributed Checkpointing Framework for the Intel Many Integrated Cores Architecture R. Rajachandrasekar, S. Potluri, A. Venkatesh, K. Hamidouche, Md. Rahman and D. K. Panda - International ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC 2014)

Misc:

Origin of my name