mat: Cholesky - Better support for custom matrices

Created by: btracey

In Gonum in general, we could do a better job of using matrix interfaces to support specialization. This issue is to think about mat/Cholesky and stat/distmv/Normal as a starting test case. This is a bit wordy, but I'm not sure where to draw the line between goals and non-goals.

The core opportunity is that there are common special cases for Cholesky factorization that can lead to significant improvements. The Cholesky decomposition of a Diagonal matrix is merely the square root of each entry. The current implementation forces O(n^2) space, which is a significant difference for scaling to large sizes. Yet, Diagonal covariance matrices are among the most common. Related improvements exist for Banded matrices (the Cholesky decomposition of a SymBanded is a TriBanded). I don't know that this is true, but I suspect there are certain sparse forms of symmetric matrix that also have a sparse triangular matrix.

On the Cholesky side, right now we store the Cholesky factorization as a TriDense, and instead we could store that as a kind of Triangular (more later). The mat types could be specially dealt with in Factorize, and then there could be an interface that is along the lines of Cholesky() Triangular so types could provide their specializations. Some functions (LogDet) extend naturally from this formulation. It seems like it has to be a non-goal to support custom versions of the "recovery" functions (UTo, LTo, etc.), because it's impossible to know what type you can extract to from an arbitrary Cholesky. The most important functions in Cholesky are the Solve routines. From a brief glance, all of them can be implemented if the Triangular is really

type CholTriangular interface {
    Triangular
    SolveVecTo(x *VecDense, b Vector)  error
    SolveMatTo(x *Dense, b Matrix) error
}

Which are methods that can be added to TriDense, DiagDense, etc. I think these are good methods to add anyway, as they are a core primitive in many matrix-agnostic algorithms (similarly MulTo is a core component of the approximate linear solve routines in exp. Scale we could either solve by adding a fourth method to CholTriangular, or by adding a switch for the right method. The last few functions (i.e. ExtendVecSym) are quite specialized and it's totally expected you would lose your sparsity properties.

Fixing Chol in this way on its own would have a big improvement for many common cases. We provide direct routines for important ones, NormalRand, NormalLogProb, and those routines would be faster and use less memory. I don't think it's directly possible to do the same for all of Normal under the way things currently are.

The problem is for Normal we keep a copy of the Symmetric matrix and its Cholesky factorization. As far as my imagination has taken me, the best way to implement this would be to have a CopySym() Symmetric method, and then change Normal to require a Symmetric that also implements CopySymer. This seems like a pretty tragic API decision. Another solution could be to keep the Symmetric instead of cloning it, but again, that does not match with the ways our APIs work in most places. It seems there are two reasonable solutions.

The first is that we don't do anything about it, and users need to decide if they want to use the full Normal or if they want to ask for Cholesky. An example of an algorithm would be a Kernel Density estimator, should it work with a Symmetric, or should it just work with Cholesky

The second is that we don't keep around the Symmetric at all. (I realize that we've had this discussion before, with me on the other side...). The most common operation that uses the Symmetric form is CovarianceMatrix, and for any common distrubution this would be an O(N^2) operation, so it's not out of line for it to also be an N^2 operation here. The other operations that use it (MarginalNormal) would be impossible to maintain a sparsity pattern anyway. The upside to only storing the Cholesky decomposition is that one saves the O(N^2) space cost, and it would also make NewNormalChol a nearly cost-free operation. These cost savings would then mean that NormalLogProb is redundant, and so those methods can be eliminated.

My proposal (assuming I haven't missed anything) would be to modify mat.Cholesky as discussed above, and modify distmv.Normal and distmv.StudentsT to only store the Cholesky decomposition rather than the symmetric matrix. This would allow the common cases of distmv.Normal to be fast and efficient when the covariance matrix has nice structure.

Admin message

mat: Cholesky - Better support for custom matrices