Impact of the Extrinsic Geometry on Linear Regression

In this paper, we study linear regression applied to data structured on a manifold. We assume that the data manifold is smooth and embedded in a Euclidean space, and our objective is to reveal the impact of the data manifold's extrinsic geometry on the regression. Specifically, we analyze the impact of the manifold's curvatures (or of higher-order nonlinearity in the parameterization when the curvatures are locally zero) on the uniqueness of the regression solution. We find that the linear regression does not have a unique solution when the manifold is flat; otherwise, the manifold's curvature (or higher-order nonlinearity in the embedding) may contribute significantly to the solution, particularly along the normal directions of the manifold. Our findings thus reveal the role of data manifold geometry in ensuring the stability of regression models for out-of-distribution inference.

Main results

We investigate local linear regression on a smooth function $g(\mathbf{x})$ with data points sampled from different smooth manifolds $M\subset \mathbb R^d$.

  1. When $M$ is the parabola $M=\{(x, kx^2)\}\subset \mathbb R^2$, where $k$ characterizes the curvature, the linear regression has the leading-order solution $w_x = \frac{\partial g}{\partial x}$, $w_y = \frac{\partial g}{\partial y} + \frac{1}{2k}\frac{\partial^2 g}{\partial x^2}$. This shows a direct impact of the manifold's curvature on the linear regression (a minimal numerical sketch is given after this list). Numerical verification of this result is provided below:
    [Figure: numerical simulations with different values of $k$.]
  2. For any hypersurface $M\subset\mathbb R^d$, one can obtain a quadratic approximation locally. A generalized solution formula for the local linear regression then follows explicitly (see the second sketch after this list): $$ \begin{cases} w_{x_i} = \frac{\partial g}{\partial x_i},\\ w_y = \frac{\partial g}{\partial y} + \frac{1}{2}\frac{\displaystyle\sum_{i=1}^{d-1}k_i \frac{\partial^2 g}{\partial x_i^2}}{\displaystyle\sum_{i=1}^{d-1}k_i^2}. \end{cases} $$
  3. Data in practice usually contain noise. We show that Gaussian noise $\sim\mathcal N(0, \sigma^2)$ of a suitable scale can regularize the problematic behavior caused by vanishing curvature: as $k\to 0$ the noiseless correction $\frac{1}{2k}\frac{\partial^2 g}{\partial x^2}$ blows up, while the noise tempers it. This balancing effect is captured by the formula $$ w_y = \frac{\partial g}{\partial y} + \frac{1}{2}\frac{k}{k^2+\frac{45}{4}\sigma^2}\frac{\partial^2 g}{\partial x^2},$$ and verified in the numerical experiment below (see also the third sketch after this list):
    [Figure: numerical simulations with different scales of noise $\sigma$.]
    The intuition is that the noise blurs the problematic structure of the manifold.
  4. For manifolds with codimension $>1$, we perform numerical experiments to demonstrate the importance of being aware of the extrinsic dimension and the embedding dimension; these characteristics crucially affect the well-posedness of the regression algorithm. See the paper for details.
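
To make the result in item 1 concrete, here is a minimal NumPy sketch (not the paper's code; the test function $g(x,y)=\cos x + 2y$, the sampling window, and the sample size are illustrative choices). It fits a linear model to data on $y=kx^2$ and compares the fitted $w_y$ against the predicted $\frac{\partial g}{\partial y} + \frac{1}{2k}\frac{\partial^2 g}{\partial x^2}$:

```python
# Minimal sketch (illustrative, not from the paper): local linear regression
# on the parabola y = k x^2 with test function g(x, y) = cos(x) + 2 y, so that
# dg/dy = 2 and d^2g/dx^2 = -1 at the origin.
import numpy as np

def fitted_wy(k, eps=1e-2, n=2000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-eps, eps, n)          # local sample in a small window
    y = k * x**2                           # the embedding y = k x^2
    g = np.cos(x) + 2.0 * y                # target values on the manifold
    A = np.column_stack([np.ones(n), x, y])
    coef, *_ = np.linalg.lstsq(A, g, rcond=None)
    return coef[2]                         # the coefficient w_y

for k in [0.5, 1.0, 2.0, 4.0]:
    predicted = 2.0 - 1.0 / (2.0 * k)      # dg/dy + (1/(2k)) d^2g/dx^2
    print(f"k = {k}: fitted w_y = {fitted_wy(k):+.4f}, predicted {predicted:+.4f}")
```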
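The hypersurface formula in item 2 can be checked the same way. In this second sketch (again an illustrative setup, not the paper's code), the manifold is taken locally as $y=\sum_i k_i x_i^2$ with i.i.d. uniform samples, and $g$ is quadratic with no cross terms, so the predicted $w_y$ is exact in expectation:

```python
# Sketch for the hypersurface case: y = sum_i k_i x_i^2 in R^4 (d = 4), with a
# quadratic test function whose derivatives are prescribed below.
import numpy as np

rng = np.random.default_rng(0)
k = np.array([0.5, 1.0, 2.0])            # curvature parameters k_i
g_x = np.array([1.0, 1.0, 1.0])          # first derivatives dg/dx_i
g_xx = np.array([1.0, -2.0, 4.0])        # second derivatives d^2g/dx_i^2
g_y = 2.0                                # dg/dy

n, eps = 100_000, 1e-2
X = rng.uniform(-eps, eps, (n, 3))       # tangential coordinates x_1, x_2, x_3
y = X**2 @ k                             # normal coordinate on the hypersurface
g = X @ g_x + g_y * y + 0.5 * (X**2 @ g_xx)
A = np.column_stack([np.ones(n), X, y])
coef, *_ = np.linalg.lstsq(A, g, rcond=None)

predicted = g_y + 0.5 * (k @ g_xx) / (k @ k)   # the formula from item 2
print(f"fitted w_y = {coef[-1]:.3f}, predicted {predicted:.3f}")
```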
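For the noise result in item 3, the constant $\frac{45}{4}$ is consistent with sampling $x$ uniformly on $[-1,1]$ and adding the noise to the $y$-coordinate; this is our reading of the setup, and the quadratic test function below is again an illustrative choice. Under these assumptions, this third sketch reproduces the formula up to sampling error:

```python
# Sketch of the noise-regularization effect: for small curvature k the
# noiseless correction (here 3/k) is large, and Gaussian noise damps it.
import numpy as np

def noisy_wy(k, sigma, n=200_000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, n)                    # assumed sampling window
    y = k * x**2 + sigma * rng.standard_normal(n)    # noisy normal coordinate
    g = x + 2.0 * y + 3.0 * x**2                     # dg/dy = 2, d^2g/dx^2 = 6
    A = np.column_stack([np.ones(n), x, y])
    coef, *_ = np.linalg.lstsq(A, g, rcond=None)
    return coef[2]

k = 0.1                                  # small curvature: noiseless 3/k is large
for sigma in [0.0, 0.05, 0.2]:
    predicted = 2.0 + 3.0 * k / (k**2 + 45.0 / 4.0 * sigma**2)
    print(f"sigma = {sigma}: fitted w_y = {noisy_wy(k, sigma):+.3f}, "
          f"predicted {predicted:+.3f}")
```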

Publications

  • Liu, L., He, J., & Tsai, R. (2023). Linear Regression on Manifold Structured Data: the Impact of Extrinsic Geometry on Solutions. Topology, Algebra and Geometry in Machine Learning Workshop, International Conference on Machine Learning (ICML).