Evaluating Shared Memory Heterogeneous Systems Using Traverse-compute Workloads
Published in Open-Source Computer Architecture Research 2023, 2023
Abstract
Download here
Published in Open-Source Computer Architecture Research 2023, 2023
Download here
Published in 2023 IEEE International Symposium on Performance Analysis of Systems and Software, 2023
Shared memory heterogeneous systems are now mainstream, with nearly every mobile phone and tablet containing integrated processing units. However, developing applications for such devices is difficult as workloads must be decomposed across different processing units, and the decomposition must be flexible to account for the growing diversity of devices, each with different relative processing unit throughput. Furthermore, many devices require distinct programming front ends, requiring significant effort to write cross-platform applications. In this work, we identify a pragmatic class of applications, which we call traverse-compute applications, that are ideal for shared memory heterogeneous systems. These applications have a flexible heterogeneous decomposition where CPUs excel at traversing a tree structure, while accelerators excel at node computations. Leveraging this insight, we present Redwood: a framework for writing heterogeneous traverse-compute workloads. Redwood provides a simple processing unit abstraction and a tree traversal library that enables heterogeneous optimizations. Using Redwood, we implement Grove, a benchmark suite containing nine pragmatic tree traversal applications, e.g., k-nearest neighbors. We instantiate Redwood for three different heterogeneous programming platforms: CUDA, SYCL, and High-Level Synthesis; we use Grove to evaluate five shared memory heterogeneous systems. Our evaluation highlights the importance of flexible heterogeneous decomposition as the optimal parameters differ widely across platforms and applications. However, once optimally configured, heterogeneous implementations can provide up to 13.53x speedups (geomean of 3.01x) over homogeneous implementations, showcasing the potential of heterogeneous computing for these workloads.
Download here
Published in 2020 IEEE Intelligent Vehicles Symposium (IV), 2020
Simulation-based testing is critical for ensuring safety of autonomous vehicles. Autonomous vehicles are enabled by deep learning techniques which require a large quantity of data. With simulation testing, we can create rare events for testing and training of autonomous vehicles. Procedural generation of roads and modeling of driving behaviors in an easily extendable architecture ensures that we are able to create rare scenarios at scale with minimal artistic burden. In this paper, we present CruzWay, a system that both supports and creates these scenarios. With CruzWay, we are able to procedurally generate town sized road networks or road intersections. CruzWay supports generation of road meshes as well as navigation meshes from SUMO road network files. CruzWay can generate cars as well as pedestrians run by behavior trees (BTs) in this environment. The self-contained, modular nature of BTs in combination with procedural roads allows us to create a large number of scenarios.
Download here