Most data compression schemes rely on the idea that most real-world data is sparse in some basis. For example, for a 400x400 pixel image, we almost never need all 160,000 pixel values to extract the relevant information. This is a consequence of the obscenely high-dimensional space the image lives in: the vast majority of points in a 160,000-dimensional space are incomprehensible noise that contain virtually no information, while images that mean something to humans occupy a tiny fraction of the space of all possible 400x400 images. So, given a 400x400 pixel image, we might compress it into a few relevant singular vectors or represent it in a Fourier or wavelet basis in which it is sparse. However, all of these compression schemes require that we first obtain the original 400x400 pixel image before determining which pixels we can discard. What if we could only obtain a sparse version of the image, perhaps around 5–10% of the total number of pixels, and use that to reconstruct the original image? Shouldn’t this be possible given that we know the image is sparse in some basis? This is the idea behind compressed sensing.
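To make the sparsity claim concrete, here is a minimal sketch (assuming scipy, plus its optional pooch dependency for the bundled sample image; the 512x512 "ascent" photo stands in for our 400x400 example) that transforms an image into the 2D DCT basis, discards 95% of the coefficients, and still reconstructs the image with small error:

```python
import numpy as np
from scipy.fft import dctn, idctn
from scipy.datasets import ascent  # 512x512 grayscale sample image (needs pooch)

# Represent the image in the 2D DCT basis, where natural images are
# approximately sparse.
img = ascent().astype(float)
coeffs = dctn(img, norm="ortho")

# Keep only the largest 5% of coefficients (by magnitude), zero out the rest.
thresh = np.quantile(np.abs(coeffs), 0.95)
sparse_coeffs = np.where(np.abs(coeffs) > thresh, coeffs, 0.0)

# Invert the transform: the error stays small despite discarding 95% of values.
recon = idctn(sparse_coeffs, norm="ortho")
rel_err = np.linalg.norm(recon - img) / np.linalg.norm(img)
print(f"relative reconstruction error: {rel_err:.3f}")
```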
“Compressed sensing can be summarized in a relatively simple statement: A given signal, if it is sufficiently sparse in a known basis, may be recovered (with high probability) using significantly fewer measurements than the signal length, if there are sufficiently many measurements and these measurements are sufficiently random”
-Brunton and Kutz, Data-Driven Science and Engineering
Suppose $\mathbf{x}$ is the original dense signal that is sufficiently sparse in some known basis. If $\mathbf{y}$ is a sparse measurement of $\mathbf{x}$, we can represent the relationship between $\mathbf{x}$ and $\mathbf{y}$ with a measurement matrix $\mathbf{C}$:

$$\mathbf{y} = \mathbf{C}\mathbf{x}$$
Typically, the measurement matrix $\mathbf{C}$ is a random matrix whose dimensions dictate how sparse a measurement $\mathbf{y}$ is. The goal is to reconstruct $\mathbf{x}$ from $\mathbf{y}$.
Suppose $\mathbf{y} \in \mathbb{R}^p$, $\mathbf{x} \in \mathbb{R}^n$, and $\mathbf{C}$ has dimension $p \times n$. The assumption for real-world data is that $\mathbf{x}$ is sparse in some basis $\boldsymbol{\Psi} \in \mathbb{R}^{n \times n}$. Let’s say that we need $K$ basis vectors in $\boldsymbol{\Psi}$ to represent $\mathbf{x}$; that is, $\mathbf{x}$ is $K$-sparse in $\boldsymbol{\Psi}$. We are interested in the case where $K < p \ll n$, i.e. although we necessarily require more measurements than the sparsity of $\mathbf{x}$, we want the number of measurements to be substantially less than the dimensionality of the signal space. In terms of the image example, we want the number of (random) pixels that we must sample for reconstruction to be much less than the total number of pixels in the original image.
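As a quick illustration of these dimensions, here is a minimal sketch of the measurement model $\mathbf{y} = \mathbf{C}\mathbf{x}$ with made-up sizes:

```python
import numpy as np

# Minimal sketch of the measurement model y = C x with p << n.
# The sizes here are made up for illustration.
rng = np.random.default_rng(0)
n, p = 4096, 256  # p is ~6% of n

x = rng.normal(size=n)       # stand-in for the (vectorized) dense signal
C = rng.normal(size=(p, n))  # random measurement matrix
y = C @ x                    # p compressed measurements

print(x.shape, C.shape, y.shape)  # (4096,) (256, 4096) (256,)
```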
Observe that given $\mathbf{y}$, there are infinitely many solutions to $\mathbf{y} = \mathbf{C}\mathbf{x}$, since there are many more variables than equations. However, recall that we have the condition that $\mathbf{x}$ is sparse in some basis $\boldsymbol{\Psi}$. The somewhat magical part is that given a “well-behaved” measurement matrix $\mathbf{C}$, a mathematical property called the Restricted Isometry Property (RIP) ensures that no two sufficiently sparse signals produce the same measurement $\mathbf{y}$. This means that we can almost think of there being one unique sparse solution, which must be $\mathbf{x}$, since we know it to be sparse in $\boldsymbol{\Psi}$.
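Certifying the RIP for a particular matrix is computationally intractable in general, but we can at least check empirically that a random Gaussian $\mathbf{C}$ approximately preserves the norms of sparse vectors, which is the property the RIP formalizes. A rough sketch, with made-up dimensions:

```python
import numpy as np

# Empirical sanity check of the RIP idea: a random Gaussian C (entries
# with variance 1/p) should approximately preserve the squared norm of
# K-sparse vectors, i.e. ||C x||^2 ~ ||x||^2.
rng = np.random.default_rng(0)
n, p, K = 1000, 100, 5
C = rng.normal(scale=1 / np.sqrt(p), size=(p, n))

ratios = []
for _ in range(2000):
    x = np.zeros(n)
    support = rng.choice(n, size=K, replace=False)
    x[support] = rng.normal(size=K)  # random K-sparse vector
    ratios.append(np.linalg.norm(C @ x) ** 2 / np.linalg.norm(x) ** 2)

# All ratios land close to 1, so distinct sparse signals map to
# well-separated measurements.
print(f"||Cx||^2 / ||x||^2 ranges over [{min(ratios):.2f}, {max(ratios):.2f}]")
```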
Because $\mathbf{x}$ is $K$-sparse in $\boldsymbol{\Psi}$, it can be represented as a vector $\mathbf{s}$ with $K$ nonzero entries such that $\mathbf{x} = \boldsymbol{\Psi}\mathbf{s}$. The goal then is to find a sparse $\mathbf{s}$ that satisfies $\mathbf{y} = \mathbf{C}\boldsymbol{\Psi}\mathbf{s}$. Again, the RIP ensures that there is a unique such $\mathbf{s}$, or at least uniqueness with high probability. We formulate the following optimization problem to find an estimate $\hat{\mathbf{s}}$:

$$\hat{\mathbf{s}} = \underset{\mathbf{s}}{\arg\min} \; \|\mathbf{s}\|_0 \quad \text{subject to} \quad \mathbf{y} = \mathbf{C}\boldsymbol{\Psi}\mathbf{s}$$
where $\|\mathbf{s}\|_0$, the $\ell_0$ pseudo-norm, gives the number of nonzero entries of $\mathbf{s}$. However, because this problem is non-convex and intractable, we substitute the $\ell_1$ norm, $\|\mathbf{s}\|_1 = \sum_k |s_k|$, for the $\ell_0$ pseudo-norm. This turns the optimization into a tractable convex one:

$$\hat{\mathbf{s}} = \underset{\mathbf{s}}{\arg\min} \; \|\mathbf{s}\|_1 \quad \text{subject to} \quad \mathbf{y} = \mathbf{C}\boldsymbol{\Psi}\mathbf{s}$$

It’s worth noting that “[t]here are very specific conditions that must be met for the $\ell_1$-minimization [above] to converge with high probability to the sparsest solution.” The conditions can be summarized as follows: the rows of $\mathbf{C}$ and the columns of $\boldsymbol{\Psi}$ must be uncorrelated (incoherent), and the number of measurements must be sufficiently large, on the order of $p \approx \mathcal{O}(K \log(n/K))$.
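Here is a minimal sketch of the $\ell_1$ recovery, posed as a linear program and solved with scipy’s linprog; for simplicity the signal is assumed sparse in the standard basis, i.e. $\boldsymbol{\Psi} = \mathbf{I}$, so $\mathbf{C}\boldsymbol{\Psi} = \mathbf{C}$ and $\mathbf{s} = \mathbf{x}$ (all dimensions are made up):

```python
import numpy as np
from scipy.optimize import linprog

# Recover a K-sparse signal from p << n random measurements by solving
# the l1 minimization as a linear program.
rng = np.random.default_rng(1)
n, p, K = 400, 100, 10

s_true = np.zeros(n)
s_true[rng.choice(n, size=K, replace=False)] = rng.normal(size=K)
C = rng.normal(size=(p, n))  # random measurement matrix (Psi = I here)
y = C @ s_true

# Standard trick: write s = u - v with u, v >= 0, so that
# ||s||_1 = sum(u) + sum(v) and the constraint y = C s stays linear.
c = np.ones(2 * n)
A_eq = np.hstack([C, -C])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
s_hat = res.x[:n] - res.x[n:]

print(f"max recovery error: {np.abs(s_hat - s_true).max():.2e}")
```

With $p = 100$ measurements of a $10$-sparse signal in $\mathbb{R}^{400}$, the recovered $\hat{\mathbf{s}}$ should match the true signal to solver precision with high probability. In practice one would typically reach for a dedicated convex-optimization solver, but the split into nonnegative parts makes explicit how the $\ell_1$ objective becomes linear.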
Resources: Brunton & Kutz, Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control