Currently:
I’m a software engineer at NVIDIA Corporation, working on the ‘Programming Models and Runtime Systems for Deep Learning’ team (a subgroup of CUDA Software). I was a Ph.D. student in the CSE department at The Ohio State University and defended my dissertation on December 7th, 2016. I was part of the Network-based Computing Lab research group headed by Dr. D. K. Panda. I received my Bachelor’s degree in Technology (B.S. equivalent (no pun intended)) from the Department of Information Technology at NITK.
My research interests include communication designs for accelerators/co-processors in HPC, power/energy-aware computing, and performance analysis/profiling of MPI programs. I contributed to the MVAPICH MPI Project from April 2012 to August 2017.
Contact
akshay.v.3.14@gmail.com akvenkatesh@nvidia.com
CV
Journal Publications
- MPI-based parallel synchronous vector evaluated particle swarm optimization for multi-objective design optimization of composite structures S.N. Omkar, Akshay Venkatesh, Mrunmaya Mudigere, Engineering Applications of Artificial Intelligence, Volume 25, Issue 8, December 2012
Conference Publications
- MPI-GDS: High Performance MPI Designs with GPUDirect-aSync for CPU-GPU Control Flow Decoupling A. Venkatesh, Ching-Hsiang Chu, Khaled Hamidouche, Sreeram Potluri, Davide Rossetti and Dhabaleswar Panda - ICPP ‘17, August 2017 [Accepted]
- Offloaded GPU Collectives using CORE-Direct and CUDA Capabilities on IB Clusters A. Venkatesh, K. Hamidouche, H. Subramoni, DK Panda - HiPC ‘15, December 2015
- A Case for Application-Oblivious Energy-Efficient MPI Runtime A. Venkatesh, A. Vishnu, K. Hamidouche, N. Tallent, D. K. Panda, D. Kerbyson, and A. Hoisie - Supercomputing ‘15, Nov 2015 [Accepted as Best Student Paper Finalist]
- Exploiting GPUDirect RDMA in Designing High Performance OpenSHMEM for NVIDIA GPU Clusters K. Hamidouche, A. Venkatesh, A. Awan, H. Subramoni, and D. K. Panda - IEEE Cluster 2015, Sep 2015
- Designing Non-Blocking Personalized Collectives with Near Perfect Overlap for RDMA-Enabled Clusters H. Subramoni, A. Awan, K. Hamidouche, D. Pekurovsky, A. Venkatesh, S. Chakraborty, K. Tomko, and D. K. Panda - ISC ‘15, Jul 2015
- Non-blocking PMI Extensions for Fast MPI Startup S. Chakraborty, H. Subramoni, A. Moody, A. Venkatesh, J. Perkins, and D. K. Panda - CCGrid ‘15, May 2015
- A High Performance Broadcast Design with Hardware Multicast and GPUDirect RDMA for Streaming Applications on InfiniBand Clusters A. Venkatesh, H. Subramoni, K. Hamidouche, DK Panda - High Performance Computing ‘14
- Designing MPI Library with Dynamic Connected Transport (DCT) of InfiniBand: Early Experiences H. Subramoni, K. Hamidouche, A. Venkatesh, S. Chakraborty, DK Panda - Int’l Super Computing Conference (ISC ‘14), May 2014
- High Performance Alltoall and Allgather designs for InfiniBand MIC Clusters A. Venkatesh, S. Potluri, R. Rajachandrasekar, M. Luo, K. Hamidouche, DK Panda - International Parallel and Distributed Processing Symposium (IPDPS ‘14), May 2014
- MVAPICH-PRISM: A Proxy-based Communication Framework using InfiniBand and SCIF for Intel MIC Clusters S. Potluri, D. Bureddy, K. Hamidouche, A. Venkatesh, K. Kandalla, H. Subramoni and D. K. Panda - Int’l Conference on Supercomputing (SC ‘13), November 2013
- Efficient Inter-node MPI Communication using GPUDirect RDMA for InfiniBand Clusters with NVIDIA GPUs S. Potluri, K. Hamidouche, A. Venkatesh, D. Bureddy and D. Panda - Int’l Conference on Parallel Processing (ICPP ‘13), October 2013
- Designing Optimized MPI Broadcast and Allreduce for Many Integrated Core (MIC) InfiniBand Clusters K. Kandalla, A. Venkatesh, K. Hamidouche, S. Potluri and D. K. Panda - Int’l Symposium on High-Performance Interconnects (HotI ‘13), August 2013.
- Efficient Intra-node Communication on Intel-MIC Clusters S. Potluri, A. Venkatesh, D. Bureddy, K. Kandalla, and D. K. Panda - International Symposium on Cluster, Cloud and Grid Computing (CCGrid), May 2013
- OMB-GPU: A Micro-Benchmark Suite for Evaluating MPI Libraries on GPU Clusters Devendar Bureddy, Hao Wang, A. Venkatesh, Sreeram Potluri, Dhabaleswar K. Panda, EUROMPI 2012
Workshop Publications
- Optimizing Collective Communication in UPC J. Jose, K. Hamidouche, J. Zhang, A. Venkatesh, and D. K. Panda, Int’l Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS ‘14), held in conjunction with International Parallel and Distributed Processing Symposium (IPDPS’14), May 2014
- A Comprehensive Performance Evaluation of OpenSHMEM Libraries on InfiniBand Clusters J. Jose, J. Zhang, A. Venkatesh, S. Potluri and D. K. Panda, First OpenSHMEM Workshop: Experiences, Implementations and Tools (OpenSHMEM ‘13), October 2013
- UPC on MIC: Early Experiences with Native and Symmetric Modes M. Luo, M. Li, A. Venkatesh, X. Lu and D. K. Panda, Int’l Conference on Partitioned Global Address Space Programming Models (PGAS ‘13), October 2013
- Optimized MPI Gather collective for Many Integrated Core (MIC) InfiniBand Clusters. A. Venkatesh, K. Kandalla and D. K. Panda - Extreme Scaling Workshop, August 2013.
- Evaluation of Energy Characteristics of MPI Communication Primitives with RAPL A. Venkatesh, K. Kandalla, D. K. Panda - High-Performance, Power-Aware Computing (HPPAC), 2013
Short Paper Publications
- MIC-Check: A Distributed Checkpointing Framework for the Intel Many Integrated Cores Architecture R. Rajachandrasekar, S. Potluri, A. Venkatesh, K. Hamidouche, Md. Rahman and D. K. Panda - International ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC 2014)