A Lasso for Hierarchical Interactions

Preprint

22 May 2012

Published in arXiv

http://arxiv.org/abs/1205.5050v2

Abstract

We add a set of convex constraints to the Lasso to produce sparse interaction models that honor the hierarchy restriction that an interaction only be included in a model if one or both variables are marginally important. We give a precise characterization of the effect of this hierarchy constraint, prove that hierarchy holds with probability one, and derive an unbiased estimate for the degrees of freedom of our estimator. A bound on this estimate reveals the amount of fitting "saved" by the hierarchy constraint. We distinguish between parameter sparsity -- the number of nonzero coefficients -- and practical sparsity -- the number of raw variables one must measure to make a new prediction. Hierarchy focuses on the latter, which is more closely tied to important data collection concerns such as cost, time, and effort. We develop an algorithm, available in the R package hierNet, and perform an empirical study of our method.
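
The distinction between parameter sparsity and practical sparsity can be made concrete with a small sketch. The code below is illustrative only and is not part of hierNet: the function names, the coefficient layout (a list `beta` of main effects and a symmetric matrix `theta` of pairwise interactions), and the two toy models are all assumptions made for this example. It counts nonzero coefficients, counts the raw variables a prediction would require, and checks whether a model obeys the "one or both" hierarchy restriction.

```python
# Illustrative sketch; names and data layout are assumptions, not the hierNet API.

def interaction_matrix(p, pairs):
    """Build a symmetric p x p interaction matrix from {(j, k): value}."""
    theta = [[0.0] * p for _ in range(p)]
    for (j, k), value in pairs.items():
        theta[j][k] = theta[k][j] = value
    return theta

def parameter_sparsity(beta, theta, tol=0.0):
    """Number of nonzero coefficients (each interaction pair counted once)."""
    p = len(beta)
    mains = sum(1 for b in beta if abs(b) > tol)
    inters = sum(1 for j in range(p) for k in range(j + 1, p)
                 if abs(theta[j][k]) > tol)
    return mains + inters

def practical_sparsity(beta, theta, tol=0.0):
    """Number of raw variables that must be measured to make a prediction."""
    p = len(beta)
    used = {j for j in range(p) if abs(beta[j]) > tol}
    for j in range(p):
        for k in range(j + 1, p):
            if abs(theta[j][k]) > tol:
                used.update((j, k))
    return len(used)

def obeys_hierarchy(beta, theta, strong=False, tol=0.0):
    """Check that every nonzero interaction has one (or, if strong, both)
    of its variables present as a nonzero main effect."""
    p = len(beta)
    needed = 2 if strong else 1
    for j in range(p):
        for k in range(j + 1, p):
            if abs(theta[j][k]) > tol:
                present = (abs(beta[j]) > tol) + (abs(beta[k]) > tol)
                if present < needed:
                    return False
    return True

# Two toy models on p = 4 variables with the same number of nonzero coefficients.
beta = [1.5, -0.7, 0.0, 0.0]
theta_hier = interaction_matrix(4, {(0, 1): 0.4})     # interaction among active mains
theta_nonhier = interaction_matrix(4, {(2, 3): 0.4})  # interaction among inactive mains

print(parameter_sparsity(beta, theta_hier), practical_sparsity(beta, theta_hier))        # 3 2
print(parameter_sparsity(beta, theta_nonhier), practical_sparsity(beta, theta_nonhier))  # 3 4
print(obeys_hierarchy(beta, theta_hier, strong=True), obeys_hierarchy(beta, theta_nonhier))  # True False
```

Both toy models have three nonzero coefficients, but the non-hierarchical one requires measuring all four variables, whereas the hierarchical one requires only two. This is the sense in which hierarchy targets practical sparsity.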
