Thallo – Scheduling for High-Performance Large-Scale Non-Linear Least-Squares Solvers

24 September 2021

journal article
research article
Published by Association for Computing Machinery (ACM) in ACM Transactions on Graphics

Vol. 40 (5), 1-14
https://doi.org/10.1145/3453986

Abstract

Large-scale optimization problems at the core of many graphics, vision, and imaging applications are often implemented by hand in tedious and error-prone processes in order to achieve high performance (in particular on GPUs), despite recent developments in libraries and DSLs. At the same time, these hand-crafted solver implementations reveal that the key to high performance is a problem-specific schedule that enables efficient usage of the underlying hardware. In this work, we incorporate this insight into Thallo, a domain-specific language for large-scale non-linear least-squares optimization problems. We observe various code reorganizations performed by implementers of high-performance solvers in the literature, and then define a set of basic operations that span these scheduling choices, thereby defining a large scheduling space. Users can either specify code transformations in a scheduling language or use an autoscheduler. Thallo takes as input a compact, shader-like representation of an energy function and a (potentially auto-generated) schedule, translating the combination into high-performance GPU solvers. Since Thallo can generate solvers from a large scheduling space, it can handle a large set of large-scale non-linear and non-smooth problems with various degrees of non-locality and compute-to-memory ratios, including diverse applications such as bundle adjustment, face blendshape fitting, and spatially-varying Poisson deconvolution, as seen in Figure 1. By abstracting schedules from the optimization, we outperform state-of-the-art GPU-based optimization DSLs by an average of 16× across all applications introduced in this work, and even some published hand-written GPU solvers by 30% or more.
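The separation of energy function and schedule described in the abstract can be illustrated with a small sketch. This is not Thallo's language or its generated code; it is a plain-Python Gauss-Newton solver for a hypothetical two-parameter model y = a·exp(b·x), written under the assumption that "schedule" can be stood in for by one choice: whether Jacobian rows are stored before the normal equations are formed, or recomputed and accumulated on the fly. All function names here are illustrative.

```python
import math

def residual_and_grad(params, x, y):
    """Energy term: residual r = a*exp(b*x) - y and its gradient w.r.t. (a, b)."""
    a, b = params
    e = math.exp(b * x)
    return a * e - y, (e, a * x * e)

def normal_equations_materialized(params, data):
    """Schedule A (illustrative): store every residual and Jacobian row, then reduce."""
    rows = [residual_and_grad(params, x, y) for x, y in data]
    jtj = [[sum(g[i] * g[j] for _, g in rows) for j in range(2)] for i in range(2)]
    jtr = [sum(g[i] * r for r, g in rows) for i in range(2)]
    return jtj, jtr

def normal_equations_fused(params, data):
    """Schedule B (illustrative): compute each row on the fly, never storing J."""
    jtj = [[0.0, 0.0], [0.0, 0.0]]
    jtr = [0.0, 0.0]
    for x, y in data:
        r, g = residual_and_grad(params, x, y)
        for i in range(2):
            jtr[i] += g[i] * r
            for j in range(2):
                jtj[i][j] += g[i] * g[j]
    return jtj, jtr

def gauss_newton(data, params, build, iterations=20):
    """Undamped Gauss-Newton: solve (J^T J) delta = -J^T r with a 2x2 closed form."""
    for _ in range(iterations):
        jtj, jtr = build(params, data)
        det = jtj[0][0] * jtj[1][1] - jtj[0][1] * jtj[1][0]
        da = (-jtr[0] * jtj[1][1] + jtr[1] * jtj[0][1]) / det
        db = (-jtr[1] * jtj[0][0] + jtr[0] * jtj[1][0]) / det
        params = (params[0] + da, params[1] + db)
    return params

# Noise-free samples of y = 2 * exp(0.5 * x); both schedules share the same energy.
data = [(0.25 * i, 2.0 * math.exp(0.5 * 0.25 * i)) for i in range(9)]
fit_a = gauss_newton(data, (1.8, 0.45), normal_equations_materialized)
fit_b = gauss_newton(data, (1.8, 0.45), normal_equations_fused)
```

Both variants minimize the same energy and differ only in how the intermediate Jacobian is handled, which trades memory traffic against recomputation. On a GPU that trade-off depends on the problem's compute-to-memory ratio, which is the kind of choice the abstract attributes to the schedule.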
