Data Compression

A common problem for computer scientists, which has very direct effects on the consumers of their products, is the way in which they store data that will be used by an end user. Finding an intelligent method to store this data in such a way that it does not take up an unnecessarily large amount of space is a common problem that has been looked at for many years. This becomes especially applicable when we are looking at the transmission of this data, or when we are storing it on some external system.

As an example, we will observe storing images and compressing them. The simplest way to represent an image in a computer is what we call a bitmap. For a grayscale image, you can imagine this as being represented as a large matrix of numbers with gray values, representing intensities, ranging from 0 to 255 (the numbers that can be represented in 8 binary bits, or one byte). For color images, we have almost the exact same thing, but instead we have to have three matrices – one each for the red, green, and blue values of each pixel. Every image can be represented in this way, or some equivalent combination of color values, for instance cyan, yellow, and magenta is also a popular model. Obviously, the amount of data grows very quickly as the image grows, with a 1000x1000 pixel image representing 3×10002=2.86 Megabytes, while still being a relatively small image.

For someone who is working with something such as billboards, it is very important that these images have as much information as possible. That is why high megapixel cameras, which capture details at a smaller and smaller level, are important in high end photography; however, if I am designing a webpage in which images will only be displayed as smaller 500 by 500 squares, this extra information is unnecessary and will only serve to fill up available space faster. The question, which we will address using topics covered in  linear algebra, is how can this information be compressed in such a way that the image maintains acceptable quality? How can we decide which values are important to keep in the compressed image?

We believe that these questions can be addressed systematically using concepts from linear algebra this semester. For instance we know that the eigenvalues are unique to any given matrix, so we could use these to help us approximate a solution. We also know that each matrix has a given rank, so when dealing with large numbers, as long as this rank is greater than 0, we have some set of linearly independent columns, so we can analyze this set and the space it occupies to try to reduce our image. For instance, even if the three columns have a rank of three, we know that they may only occupy a subspace of R3 meaning we only have to represent a limited space. Information such as this could help us in deciding how to approximate our image. Our project will attempt to use the characteristics of the matrices of the image as a way to approximate a condensed form of that image.