Quickstart¶
The primary functionality of the library consists of dual implementations of a statically-ordered sparse-direct solver for symmetric/Hermitian matrices and a sparse-direct Determinantal Point Process sampler. Both support real and complex matrices and have sequential and multithreaded (via OpenMP’s task scheduler) implementations built on top of custom dense kernels.
This brief quickstart guide walks through building the examples and tests using the meson build system and provides short overviews of the major functionality.
Building the examples and tests¶
While catamari is a header-only library, there are a number of configuration options (e.g., which BLAS library to use, and whether OpenMP is enabled) which are handled through preprocessor directives and compiler/linker options determined during a configuration stage. Catamari uses meson, a modern alternative to CMake, for this configuration.
One might start with a debug build (the default buildtype in meson). Assuming that the Ninja build system was installed alongside meson (it typically is), one can configure and build catamari with its default options via:
mkdir build-debug/
meson build-debug
cd build-debug
ninja
A release version can be built by specifying the buildtype option as release:
mkdir build-release/
meson build-release -Dbuildtype=release
cd build-release
ninja
OpenMP task parallelism will be used by default if support for OpenMP was detected; shared-memory parallelism can be disabled with the -Ddisable_openmp=true configuration option.
Intel MKL support can be enabled by configuring with -Dmkl_lib=/PATH/TO/MKL/LIB.
By default, catamari is configured with catamari::Int equal to a 64-bit signed integer, but the library can be configured with 32-bit integer support via the -Duse_64bit=false option.
In any build configuration, the library’s unit tests can be run via:
ninja test
Using catamari::Buffer<T> instead of std::vector<T>¶
A well-known issue with std::vector<T> is that it cannot readily be used to allocate data without initializing each entry. In the case of catamari’s multithreaded sparse-direct solver, this sequential default-initialization overhead was seen to be a significant bottleneck when running on 16 cores. For this reason, catamari makes use of quotient’s quotient::Buffer<T> template class as an alternative buffer allocation mechanism; it is imported into catamari as catamari::Buffer<T>. Both std::vector<T> and catamari::Buffer<T> have the same operator[] entry access semantics.
The function catamari::Buffer<T>::Resize(std::size_t) is an alternative to std::vector<T>::resize(std::size_t) which does not default-initialize members. Likewise, catamari::Buffer<T>::Resize(std::size_t, const T& value) is an analogue of std::vector<T>::resize(std::size_t, const T& value), but it differs in that it ensures that all members of the result are equal to the specified value (not just the newly allocated ones).
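As a point of contrast, the following standard-library-only sketch (independent of catamari) shows the std::vector<T>::resize semantics that the value-taking Resize deviates from — resize only assigns the fill value to newly created elements:

```cpp
#include <vector>

// Resizes the vector {5, 5} to four entries with fill value 7 and returns
// the result. std::vector<T>::resize(n, value) only assigns `value` to the
// newly created elements; the preexisting entries keep their old contents.
// catamari::Buffer<T>::Resize(4, 7) would instead set every entry to 7.
inline std::vector<int> ResizeWithValue() {
  std::vector<int> values(2, 5);
  values.resize(4, 7);
  return values;  // {5, 5, 7, 7} -- not {7, 7, 7, 7}.
}
```
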
Lastly, the underlying data pointer can be accessed via catamari::Buffer<T>::Data() instead of std::vector<T>::data() (the begin() and end() member functions exist so that range-based for loops function over catamari::Buffer<T>).
A simple example combining all of these features is:
#include <iostream>
#include "catamari.hpp"
const std::size_t num_entries = 5;
catamari::Buffer<float> entries;
entries.Resize(num_entries);
// The five entries are not yet initialized.
// Initialize the i'th entry as i^2.
for (std::size_t i = 0; i < num_entries; ++i) {
entries[i] = i * i;
}
// Print the entries.
std::cout << "entries: ";
for (const float& entry : entries) {
std::cout << entry << " ";
}
std::cout << std::endl;
// Double the length of the buffer, zero-initializing every entry.
entries.Resize(2 * num_entries, 0.f);
// Extract a mutable pointer to the entries.
float* entries_ptr = entries.Data();
Manipulating dense matrices with BlasMatrix<Field>¶
The Basic Linear Algebra Subprograms (BLAS) established a standard format for representing dense matrices: column-major storage with metadata indicating the height, width, leading dimension, and pointer to the underlying buffer. The leading dimension, or row stride, of a matrix stored in column-major format is such that the entry $(i, j)$ is stored at position i + j * leading_dim in the buffer.
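The addressing rule can be checked directly. This standalone sketch (plain C++, no catamari types) fills a column-major buffer via the i + j * leading_dim formula:

```cpp
#include <cstddef>
#include <vector>

// Fills a height x width matrix, stored column-major with the given leading
// dimension, so that entry (i, j) equals 10 * i + j, and returns the flat
// buffer. Entry (i, j) lives at flat position i + j * leading_dim.
inline std::vector<double> FillColumnMajor(std::size_t height,
                                           std::size_t width,
                                           std::size_t leading_dim) {
  std::vector<double> buffer(leading_dim * width, 0.);
  for (std::size_t j = 0; j < width; ++j) {
    for (std::size_t i = 0; i < height; ++i) {
      buffer[i + j * leading_dim] = 10. * i + j;
    }
  }
  return buffer;
}
```

Note that positions with row index at or beyond `height` are padding implied by the leading dimension; they are left zero here.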
Catamari thus implements a minimal description of such a matrix format in its catamari::BlasMatrixView<Field> template structure. The data structure is meant to be a low-level, minimal interface to BLAS-like APIs and should typically be avoided by users in favor of the higher-level catamari::BlasMatrix<Field> class, which handles resource allocation and deallocation. catamari::BlasMatrixView<Field> should typically only be used when there is a preexisting buffer holding the column-major matrix data. For example:
#include "catamari.hpp"
const std::size_t height = 500;
const std::size_t width = 600;
const std::size_t leading_dim = 1000;
std::vector<double> buffer(leading_dim * width);
catamari::BlasMatrixView<double> matrix_view;
matrix_view.height = height;
matrix_view.width = width;
matrix_view.leading_dim = leading_dim;
matrix_view.data = buffer.data();
// One can now manipulate references to the (i, j) entry of the matrix
// using operator()(catamari::Int, catamari::Int). For example:
matrix_view(10, 20) = 42.;
However, a typical user should not need to manually allocate and attach a data buffer and can instead use catamari::BlasMatrix<Field>:
#include "catamari.hpp"
catamari::BlasMatrix<double> matrix;
matrix.Resize(height, width);
// One could alternatively have resized and initialized each entry with a
// particular value (e.g., 0) via matrix.Resize(height, width, 0.);
matrix(10, 20) = 42.;
The catamari::BlasMatrixView<Field> interface is exposed via the view member of the catamari::BlasMatrix<Field> class.
Manipulating sparse matrices with CoordinateMatrix<Field>¶
The current user-level interface for manipulating sparse matrices is the coordinate-format class catamari::CoordinateMatrix<Field>. Its primary underlying data is a lexicographically sorted catamari::Buffer<catamari::MatrixEntry<Field>> and an associated catamari::Buffer<Int> of row offsets (which serve the same role as in the Compressed Sparse Row (CSR) format). This storage scheme is thus a superset of the CSR format that explicitly stores both row and column indices for each entry.
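The role of the row-offset buffer can be illustrated without catamari. In this sketch (a hypothetical ComputeRowOffsets helper, mirroring the CSR convention), offsets[i] holds the index of the first stored entry belonging to row i, so row i's entries occupy the half-open range [offsets[i], offsets[i + 1]):

```cpp
#include <vector>

// Given the row indices of a matrix's lexicographically sorted entries,
// computes the CSR-style row-offset array: offsets[i] is the index of the
// first entry in row i, and offsets[num_rows] is the total entry count.
inline std::vector<int> ComputeRowOffsets(const std::vector<int>& entry_rows,
                                          int num_rows) {
  std::vector<int> offsets(num_rows + 1, 0);
  for (const int row : entry_rows) {
    ++offsets[row + 1];  // Count the entries in each row.
  }
  for (int i = 0; i < num_rows; ++i) {
    offsets[i + 1] += offsets[i];  // Prefix-sum the counts into offsets.
  }
  return offsets;
}
```
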
The catamari::MatrixEntry<Field> template struct is essentially a tuple of the catamari::Int row and column indices and a scalar value (of type Field). The class is designed so that the sorting and offset computation overhead can be amortized over batches of entry additions and removals. For example, the code block:
#include "catamari.hpp"
catamari::CoordinateMatrix<double> matrix;
matrix.Resize(5, 5);
matrix.ReserveEntryAdditions(6);
matrix.QueueEntryAddition(3, 4, 1.);
matrix.QueueEntryAddition(2, 3, 2.);
matrix.QueueEntryAddition(2, 0, 1.);
matrix.QueueEntryAddition(4, 2, 2.);
matrix.QueueEntryAddition(4, 4, 3.);
matrix.QueueEntryAddition(3, 2, 4.);
matrix.FlushEntryQueues();
const catamari::Buffer<catamari::MatrixEntry<double>>& entries =
matrix.Entries();
would return a reference to the underlying catamari::Buffer<catamari::MatrixEntry<double>> of matrix, which should contain the entry sequence: (2, 0, 1.), (2, 3, 2.), (3, 2, 4.), (3, 4, 1.), (4, 2, 2.), (4, 4, 3.).
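The lexicographic ordering of the flushed entries can be reproduced with std::sort. This standalone sketch (with a hypothetical Entry struct standing in for catamari::MatrixEntry<double>) sorts the six queued entries by (row, column):

```cpp
#include <algorithm>
#include <tuple>
#include <vector>

// A stand-in for catamari::MatrixEntry<double>.
struct Entry {
  int row;
  int column;
  double value;
};

// Sorts coordinate-format entries lexicographically by (row, column) -- the
// ordering maintained after the entry queues are flushed.
inline std::vector<Entry> LexicographicSort(std::vector<Entry> entries) {
  std::sort(entries.begin(), entries.end(),
            [](const Entry& a, const Entry& b) {
              return std::tie(a.row, a.column) < std::tie(b.row, b.column);
            });
  return entries;
}
```
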
Similarly, subsequently running the code block:
matrix.ReserveEntryRemovals(2);
matrix.QueueEntryRemoval(2, 3);
matrix.QueueEntryRemoval(0, 4);
matrix.FlushEntryQueues();
would modify the Buffer underlying the entries reference to contain the entry sequence: (2, 0, 1.), (3, 2, 4.), (3, 4, 1.), (4, 2, 2.), (4, 4, 3.).
Symmetric and Hermitian direct linear solvers¶
Catamari’s linear system solvers target the class of matrices which can be (reasonably stably) factored with either Cholesky, $LDL^T$, or $LDL^H$ factorizations, where $D$ is diagonal and $L$ is unit lower-triangular. This class is a strict (but large) subset of symmetric and Hermitian systems that contains Hermitian quasi-definite [GeorgeEtAl2006] and complex-symmetric matrices with positive-definite real and imaginary components [Higham1998]. The simplest counterexample of an invertible symmetric matrix that cannot be factored with such techniques is the exchange matrix

$$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},$$

which is better handled with Bunch-Kaufman type techniques (which use either 1x1 or 2x2 diagonal pivots). Catamari does not currently support 2x2 pivots.
Dense factorizations¶
Beyond their intrinsic usefulness, high-performance dense factorizations are a core component of supernodal sparse-direct solvers. Catamari therefore provides sequential and multithreaded (via OpenMP’s task scheduler) implementations of dense Cholesky, $LDL^T$, and $LDL^H$ factorizations (as one might infer, for both real and complex scalars).
Sequential (perhaps using multithreaded BLAS calls) dense Cholesky factorizations can easily be performed with a call to catamari::LowerCholeskyFactorization on a catamari::BlasMatrixView<Field>:
#include "catamari.hpp"
// Build a dense Hermitian positive-definite matrix.
catamari::BlasMatrix<catamari::Complex<double>> matrix;
matrix.Resize(num_rows, num_rows);
// Fill the matrix using commands of the form:
// matrix(row, column) = value;
// Perform the sequential, dense Cholesky factorization using a
// user-determined algorithmic block size.
const catamari::Int block_size = 64;
catamari::LowerCholeskyFactorization(block_size, &matrix.view);
Multithreaded dense Cholesky factorization can similarly be performed with a call to catamari::OpenMPLowerCholeskyFactorization, though care must be taken to avoid thread oversubscription by ensuring that only a single thread is used for each BLAS call. Each OpenMP routine in catamari assumes that it is within a #pragma omp single section of a #pragma omp parallel region.
#include "catamari.hpp"
// Build a dense Hermitian positive-definite matrix.
catamari::BlasMatrix<catamari::Complex<double>> matrix;
matrix.Resize(num_rows, num_rows);
// Fill the matrix using commands of the form:
// matrix(row, column) = value;
// Avoid BLAS thread oversubscription.
const int old_max_threads = catamari::GetMaxBlasThreads();
catamari::SetNumBlasThreads(1);
// Perform the multithreaded, dense Cholesky factorization using
// user-determined algorithmic tile and block sizes.
const catamari::Int tile_size = 128;
const catamari::Int block_size = 64;
#pragma omp parallel
#pragma omp single
catamari::OpenMPLowerCholeskyFactorization(
tile_size, block_size, &matrix.view);
// Restore the number of BLAS threads.
catamari::SetNumBlasThreads(old_max_threads);
Real and complex $LDL^T$ and $LDL^H$ factorizations can be executed with nearly identical code by instead calling catamari::LowerLDLTransposeFactorization, catamari::OpenMPLowerLDLTransposeFactorization, catamari::LowerLDLAdjointFactorization, or catamari::OpenMPLowerLDLAdjointFactorization.
Please see example/dense_factorization.cc for full examples of using the sequential and multithreaded dense factorizations.
Sparse-direct solver¶
Usage of catamari’s sparse-direct solver through the catamari::CoordinateMatrix<Field> template class is fairly straightforward and has an identical interface in sequential and multithreaded contexts (the multithreaded solver is invoked if the maximum number of OpenMP threads is detected as greater than one).
#include "catamari.hpp"
// Build a real or complex Hermitian input matrix.
//
// Alternatively, one could use
// catamari::CoordinateMatrix<Field>::FromMatrixMarket to read the matrix from
// a Matrix Market file (e.g., from the Davis sparse matrix collection). But
// keep in mind that one often needs to enforce explicit symmetry.
catamari::CoordinateMatrix<double> matrix;
matrix.Resize(num_rows, num_rows);
matrix.ReserveEntryAdditions(num_entries_upper_bound);
// Queue updates of entries in the sparse matrix using commands of the form:
// matrix.QueueEntryAddition(row, column, value);
matrix.FlushEntryQueues();
// Fill the options for the factorization.
catamari::SparseLDLControl ldl_control;
// The options for the factorization type are:
// * catamari::kCholeskyFactorization,
// * catamari::kLDLAdjointFactorization,
// * catamari::kLDLTransposeFactorization.
ldl_control.SetFactorizationType(catamari::kCholeskyFactorization);
// Factor the matrix.
catamari::SparseLDL<double> ldl;
const catamari::SparseLDLResult result = ldl.Factor(matrix, ldl_control);
// Solve a linear system using the factorization.
catamari::BlasMatrix<double> right_hand_sides;
right_hand_sides.Resize(num_rows, num_rhs);
// The (i, j) entry of the righthand side can easily be read or modified, e.g.:
// right_hand_sides(i, j) = 1.;
ldl.Solve(&right_hand_sides.view);
// Alternatively, one can solve using iterativerefinement, e.g., using:
catamari::RefinedSolveControl<double> refined_solve_control;
refined_solve_control.relative_tol = 1e-15;
refined_solve_control.max_iters = 3;
refined_solve_control.verbose = true;
ldl.RefinedSolve(matrix, refined_solve_control, &right_hand_sides.view);
Catamari’s sparse-direct solver (like CHOLMOD before it [ChenEtAl2008]) by default dynamically chooses between a right-looking multifrontal method and an up-looking simplicial approach based upon the arithmetic intensity of the factorization. The former approach is used for sufficiently high arithmetic intensity (and operation count), while the latter is used for sufficiently small and/or sparse factorizations. This default strategy can be overridden by modifying the catamari::SparseLDLControl::supernodal_strategy member variable of the control structure from its default value of catamari::kAdaptiveSupernodalStrategy to either catamari::kSupernodalFactorization or catamari::kScalarFactorization.
There is also support for efficiently factoring sequences of matrices with identical sparsity patterns, but different numerical values, via the member function catamari::SparseLDL<Field>::RefactorWithFixedSparsityPattern(const catamari::CoordinateMatrix<Field>& matrix). Such a technique is important for an efficient implementation of an Interior Point Method.
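A sketch of this pattern, reusing the `matrix`, `ldl`, and `right_hand_sides` objects set up in the solver example above (the value-update step is only indicated by a comment, since the mutation API is not shown in this guide):

```
// Sketch: amortize the symbolic analysis across an IPM-style sequence of
// matrices with a fixed sparsity pattern.
for (int iteration = 0; iteration < num_iterations; ++iteration) {
  // Overwrite the numerical values of `matrix` in place, keeping the
  // sparsity pattern unchanged.

  // Re-factor numerically without repeating the symbolic analysis.
  ldl.RefactorWithFixedSparsityPattern(matrix);

  // Solve with the refreshed factorization.
  ldl.Solve(&right_hand_sides.view);
}
```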
One can also browse the example/ folder for complete examples (e.g., solving 3D Helmholtz equations with PML boundary conditions, discretized using trilinear hexahedral elements, via a complex factorization).
Determinantal Point Process sampling¶
Catamari’s Determinantal Point Process [Macchi1975] [HoughEtAl2006] samplers all operate directly on the marginal kernel matrix: if $Y \sim \mathrm{DPP}(K)$ is a determinantal point process with a ground set of cardinality $n$ (so that we may identify the ground set with the indices $\{0, 1, \ldots, n - 1\}$), the probability of a subset $S$ being in a random sample $Y$ is given by

$$\mathbb{P}[S \subseteq Y] = \det(K_S),$$

where $K_S$ is the restriction of the row and column indices of the marginal kernel matrix $K$ to $S$.
The eigenvalues of a marginal kernel matrix are restricted to live in $[0, 1]$ (ensuring that all principal minors are valid probabilities). In many cases, the marginal kernel matrices are assumed to be Hermitian, in which case the eigenvalue bound is enough to guarantee an admissible kernel. More generally, non-Hermitian matrices with eigenvalues in $[0, 1]$ must also satisfy the condition [Brunel2018] that

$$(-1)^{|S|} \det(K - I_S) \geq 0 \quad \text{for every } S \subseteq \{0, \ldots, n - 1\},$$

where $I_S$ denotes the diagonal matrix with ones in the positions indexed by $S$ and zeros elsewhere.
Catamari separately handles Hermitian and non-Hermitian kernels: one case makes use of modified $LDL^H$ factorizations, the other modified $LU$ factorizations. The core notion is the close relationship between forming Schur complements and forming conditional DPPs.
Proposition (DPP Schur complements). Given disjoint subsets $A$ and $B$ of the ground set of a Determinantal Point Process $Y \sim \mathrm{DPP}(K)$ with marginal kernel $K$, the probabilities of $A$ being in the sample $Y$, respectively conditioned on $B$ being either in or outside of the sample, are

$$\mathbb{P}[A \subseteq Y \mid B \subseteq Y] = \det\left(K_A - K_{A,B} K_B^{-1} K_{B,A}\right)$$

and

$$\mathbb{P}[A \subseteq Y \mid B \cap Y = \emptyset] = \det\left(K_A - K_{A,B} (K_B - I)^{-1} K_{B,A}\right),$$

where $K_{A,B}$ denotes the restriction of the marginal kernel to the rows with indices in $A$ and columns with indices in $B$.

Proof: The first claim follows from

$$\mathbb{P}[A \subseteq Y \mid B \subseteq Y] = \frac{\mathbb{P}[A \cup B \subseteq Y]}{\mathbb{P}[B \subseteq Y]} = \frac{\det(K_{A \cup B})}{\det(K_B)}$$

and the Schur-complement determinant identity

$$\det(K_{A \cup B}) = \det(K_B) \det\left(K_A - K_{A,B} K_B^{-1} K_{B,A}\right).$$

The second claim follows from repeated application of the case where $B$ is a single element, $b$:

$$\mathbb{P}[A \subseteq Y \mid b \notin Y] = \frac{\mathbb{P}[A \subseteq Y] - \mathbb{P}[A \cup \{b\} \subseteq Y]}{1 - \mathbb{P}[b \in Y]} = \frac{\det(K_A) - \det(K_{A \cup \{b\}})}{1 - K_{b,b}} = \det\left(K_A - K_{A,b} (K_{b,b} - 1)^{-1} K_{b,A}\right).$$
As a corollary, given that the probability of a particular index being included in a DPP sample is equal to its (conditioned) diagonal value, we may modify a traditional triangular factorization of the marginal kernel to sample a DPP: when a classical factorization reaches a pivot value, we flip a Bernoulli coin with heads probability equal to the pivot value (which lies in $[0, 1]$) to decide if the pivot index will be in the sample. If the coin comes up heads, we keep the index and proceed as usual (as the conditional DPP with the pivot index kept equals a traditional Schur complement); otherwise, we subtract one from the pivot (making the value nonpositive, and negative almost surely) and proceed as usual.
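This corollary can be turned into a tiny dense sampler over plain std::vector storage. The following is a pedagogical sketch of the idea, not catamari’s implementation: each pivot is used as a Bernoulli heads probability, a rejected index has one subtracted from its pivot, and elimination then proceeds as an ordinary unpivoted factorization step.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Samples a DPP from a dense marginal kernel (stored as nested vectors) by
// flipping a Bernoulli coin at each pivot of a right-looking triangular
// factorization. Kept indices use the pivot as-is; rejected indices first
// have one subtracted from the pivot. The kernel is overwritten.
inline std::vector<int> SamplePivotDPP(
    std::vector<std::vector<double>>& kernel, std::mt19937& generator) {
  const std::size_t n = kernel.size();
  std::vector<int> sample;
  for (std::size_t k = 0; k < n; ++k) {
    // The pivot equals the probability that index k is in the sample,
    // conditioned on the inclusion/exclusion decisions already made.
    std::bernoulli_distribution coin(kernel[k][k]);
    if (coin(generator)) {
      sample.push_back(static_cast<int>(k));
    } else {
      kernel[k][k] -= 1.;  // Condition on index k being excluded.
    }
    // Standard Schur-complement update of the trailing submatrix.
    for (std::size_t i = k + 1; i < n; ++i) {
      for (std::size_t j = k + 1; j < n; ++j) {
        kernel[i][j] -= kernel[i][k] * kernel[k][j] / kernel[k][k];
      }
    }
  }
  return sample;
}
```

Note that the adjusted pivot is never zero when it is used as a divisor: a heads outcome requires a positive pivot, and a tails outcome shifts the pivot into $[-1, 0)$.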
Dense DPP sampling¶
A dense Hermitian DPP can be sampled from its kernel matrix (in a sequential manner, perhaps using multithreaded BLAS calls) using the routine catamari::SampleLowerHermitianDPP:
#include <random>
#include <vector>
#include "catamari.hpp"
catamari::BlasMatrix<catamari::Complex<double>> matrix;
matrix.Resize(num_rows, num_rows);
// Fill the matrix with calls of the form: matrix(i, j) = value;
std::random_device random_device;
std::mt19937 generator(random_device());
const catamari::Int block_size = 64;
const bool maximum_likelihood = false;
const int num_samples = 10;
std::vector<std::vector<catamari::Int>> samples(num_samples);
std::vector<double> log_likelihoods(num_samples);
for (int sample_index = 0; sample_index < num_samples; ++sample_index) {
auto matrix_copy = matrix;
samples[sample_index] = catamari::SampleLowerHermitianDPP(
block_size, maximum_likelihood, &matrix_copy, &generator);
log_likelihoods[sample_index] = catamari::DPPLogLikelihood(matrix_copy.view);
}
The Hermitian DPP can instead be sampled using OpenMP’s DAG scheduler by calling catamari::OpenMPSampleLowerHermitianDPP:
#include <random>
#include <vector>
#include "catamari.hpp"
catamari::BlasMatrix<catamari::Complex<double>> matrix;
matrix.Resize(num_rows, num_rows);
// Fill the matrix with calls of the form: matrix(i, j) = value;
// Ensure that the DAG-scheduled routine will use single-threaded BLAS calls.
const int old_max_threads = catamari::GetMaxBlasThreads();
catamari::SetNumBlasThreads(1);
std::random_device random_device;
std::mt19937 generator(random_device());
const catamari::Int block_size = 64;
const catamari::Int tile_size = 128;
const bool maximum_likelihood = false;
const int num_samples = 10;
std::vector<std::vector<catamari::Int>> samples(num_samples);
std::vector<double> log_likelihoods(num_samples);
for (int sample_index = 0; sample_index < num_samples; ++sample_index) {
auto matrix_copy = matrix;
#pragma omp parallel
#pragma omp single
{
samples[sample_index] = catamari::OpenMPSampleLowerHermitianDPP(
tile_size, block_size, maximum_likelihood, &matrix_copy, &generator);
log_likelihoods[sample_index] = catamari::DPPLogLikelihood(
matrix_copy.view);
}
}
// Revert to the original number of BLAS threads.
catamari::SetNumBlasThreads(old_max_threads);
An example of calling each of these routines can be found in example/dense_dpp.cc. A more interesting example, which builds and samples from a dense DPP that uniformly samples spanning trees over a 2D grid graph, is given in example/uniform_spanning_tree.cc.
Non-Hermitian dense DPPs can be sampled using catamari::SampleNonHermitianDPP or catamari::OpenMPSampleNonHermitianDPP. The example driver example/aztec_diamond.cc (https://gitlab.com/hodge_star/catamari/blob/master/example/aztec_diamond.cc) demonstrates their usage for uniformly sampling domino tilings of the Aztec diamond.
Sparse DPP sampling¶
Usage of catamari’s sparse-direct Hermitian DPP sampler via catamari::CoordinateMatrix is similar to usage of the library’s sparse-direct solver.
#include <vector>
#include "catamari.hpp"
// Build a real or complex symmetric input matrix.
//
// Alternatively, one could use
// catamari::CoordinateMatrix<Field>::FromMatrixMarket to read the matrix from
// a Matrix Market file (e.g., from the Davis sparse matrix collection). But
// keep in mind that one often needs to enforce explicit symmetry.
catamari::CoordinateMatrix<double> matrix;
matrix.Resize(num_rows, num_rows);
matrix.ReserveEntryAdditions(num_entries_upper_bound);
// Queue updates of entries in the sparse matrix using commands of the form:
// matrix.QueueEntryAddition(row, column, value);
matrix.FlushEntryQueues();
// Construct the sampler.
catamari::SparseHermitianDPPControl dpp_control;
catamari::SparseHermitianDPP<double> dpp(matrix, dpp_control);
// Extract samples (which can either be maximumlikelihood or not).
// The loglikelihoods of each sample are stored as well.
const bool maximum_likelihood = false;
std::vector<std::vector<catamari::Int>> samples(num_samples);
std::vector<double> log_likelihoods(num_samples);
for (int sample_index = 0; sample_index < num_samples; ++sample_index) {
samples[sample_index] = dpp.Sample(maximum_likelihood);
log_likelihoods[sample_index] = dpp.LogLikelihood();
}
Like catamari’s sparse-direct solver, the DPP sampler by default dynamically chooses between a right-looking multifrontal method and an up-looking simplicial approach based upon the arithmetic intensity of the factorization. This default strategy can be overridden by modifying the supernodal_strategy member variable of the control structure from its default value of catamari::kAdaptiveSupernodalStrategy to either catamari::kSupernodalFactorization or catamari::kScalarFactorization.
A full example of sampling a DPP from a scaled negative 2D Laplacian is given at example/dpp_shifted_2d_negative_laplacian.cc.
References¶
[Brunel2018]  Victor-Emmanuel Brunel, Learning Signed Determinantal Point Processes through the Principal Minor Assignment Problem, arXiv:1811.00465, 2018. URL: https://arxiv.org/abs/1811.00465
[ChenEtAl2008]  Yanqing Chen, Timothy A. Davis, William W. Hager, and Sivasankaran Rajamanickam, Algorithm 887: CHOLMOD, Supernodal Sparse Cholesky Factorization and Update/Downdate, ACM Trans. Math. Softw., 35(3), Article 22, October 2008. DOI: https://doi.org/10.1145/1391989.1391995
[GeorgeEtAl2006]  Alan George, Kh. D. Ikramov, and A. B. Kucherov, Some properties of symmetric quasi-definite matrices, SIAM J. Matrix Anal. Appl., 21(4), pp. 1318–1323, 2006. DOI: https://epubs.siam.org/doi/10.1137/S0895479897329400
[Higham1998]  Nicholas J. Higham, Factorizing complex symmetric matrices with positive definite real and imaginary parts, Mathematics of Computation, 67(224), pp. 1591–1599, 1998. URL: https://www.ams.org/journals/mcom/1998-67-224/S0025-5718-98-00978-8/S0025-5718-98-00978-8.pdf
[HoughEtAl2006]  J. Ben Hough, Manjunath Krishnapur, Yuval Peres, and Bálint Virág, Determinantal processes and independence, Probability Surveys, 3, pp. 206–229, 2006.
[Macchi1975]  Odile Macchi, The coincidence approach to stochastic point processes, Advances in Applied Probability, 7(1), pp. 83–122, 1975.