mat: Cholesky - Better support for custom matrices
Created by: btracey
In Gonum in general, we could do a better job of using matrix interfaces to support specialization. This issue is to think about mat/Cholesky
and stat/distmv/Normal
as a starting test case. This is a bit wordy, but I'm not sure where to draw the line between goals and non-goals.
The core opportunity is that there are common special cases for Cholesky factorization that can lead to significant improvements. The Cholesky decomposition of a Diagonal matrix is merely the square root of each entry. The current implementation forces O(n^2) space, which is a significant difference for scaling to large sizes. Yet, Diagonal covariance matrices are among the most common. Related improvements exist for Banded matrices (the Cholesky decomposition of a SymBanded
is a TriBanded
). I don't know that this is true, but I suspect there are certain sparse forms of symmetric matrix that also have a sparse triangular matrix.
On the Cholesky side, right now we store the Cholesky factorization as a TriDense
, and instead we could store that as a kind of Triangular
(more later). The mat
types could be specially dealt with in Factorize
, and then there could be an interface that is along the lines of Cholesky() Triangular
so types could provide their specializations. Some functions (LogDet
) extend naturally from this formulation. It seems like it has to be a non-goal to support custom versions of the "recovery" functions (UTo
, LTo
, etc.), because it's impossible to know what type you can extract to from an arbitrary Cholesky. The most important functions in Cholesky
are the Solve routines. From a brief glance, all of them can be implemented if the Triangular
is really
type CholTriangular interface {
Triangular
SolveVecTo(x *VecDense, b Vector) error
SolveMatTo(x *Dense, b Matrix) error
}
Which are methods that can be added to TriDense
, DiagDense
, etc. I think these are good methods to add anyway, as they are a core primitive in many matrix-agnostic algorithms (similarly MulTo
is a core component of the approximate linear solve routines in exp
. Scale
we could either solve by adding a fourth method to CholTriangular, or by adding a switch for the right method. The last few functions (i.e. ExtendVecSym
) are quite specialized and it's totally expected you would lose your sparsity properties.
Fixing Chol in this way on its own would have a big improvement for many common cases. We provide direct routines for important ones, NormalRand
, NormalLogProb
, and those routines would be faster and use less memory. I don't think it's directly possible to do the same for all of Normal
under the way things currently are.
The problem is for Normal
we keep a copy of the Symmetric matrix and its Cholesky factorization. As far as my imagination has taken me, the best way to implement this would be to have a CopySym() Symmetric
method, and then change Normal
to require a Symmetric
that also implements CopySymer
. This seems like a pretty tragic API decision. Another solution could be to keep the Symmetric instead of cloning it, but again, that does not match with the ways our APIs work in most places. It seems there are two reasonable solutions.
The first is that we don't do anything about it, and users need to decide if they want to use the full Normal
or if they want to ask for Cholesky
. An example of an algorithm would be a Kernel Density estimator, should it work with a Symmetric
, or should it just work with Cholesky
The second is that we don't keep around the Symmetric at all. (I realize that we've had this discussion before, with me on the other side...). The most common operation that uses the Symmetric form is CovarianceMatrix
, and for any common distrubution this would be an O(N^2) operation, so it's not out of line for it to also be an N^2 operation here. The other operations that use it (MarginalNormal
) would be impossible to maintain a sparsity pattern anyway. The upside to only storing the Cholesky decomposition is that one saves the O(N^2) space cost, and it would also make NewNormalChol
a nearly cost-free operation. These cost savings would then mean that NormalLogProb
is redundant, and so those methods can be eliminated.
My proposal (assuming I haven't missed anything) would be to modify mat.Cholesky
as discussed above, and modify distmv.Normal
and distmv.StudentsT
to only store the Cholesky decomposition rather than the symmetric matrix. This would allow the common cases of distmv.Normal
to be fast and efficient when the covariance matrix has nice structure.