Let’s begin with the simplest form of visual data – images. Any image (or visual data) has a certain resolution, i.e. a number of pixels (samples) that represent it. In order to view/perceive an image we need a device, e.g. a computer screen or paper. Such devices have their own resolution – the pixel budget available – be it the screen on which an image is viewed or the printer used for printing the image on paper.
Going briefly back to what resolution is: while there are multiple definitions of resolution, we will consider pixel resolution, i.e. the pixel count of an image or other visual data. And what is a pixel? A pixel is the smallest element of an image, carrying a single colour. Below we have a “pixelized” image, i.e. an image of relatively low resolution with clearly visible pixels:
In this case, the resolution is given by “resolution width” x “resolution height”, i.e. the total number of pixels.
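As a tiny sketch of this definition (the specific resolution is just an illustrative example), the pixel count is simply width times height:

```python
# Pixel count of an image is width x height.
# 1920x1080 ("Full HD") is used here purely as an example resolution.
width, height = 1920, 1080
pixel_count = width * height
print(pixel_count)  # 2073600, i.e. roughly 2 megapixels
```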
Towards choosing the right pixels
Above are 4 downscaled versions of the same original image, produced with different downsampling techniques. Despite having the same number of pixels, they appear different to us. This is just one of many examples of how pixel choice can greatly affect our experience of an image.
What happens when the image resolution and the presentation resolution differ?
The actual resolution of an image and the resolution of the device presenting it, e.g. a portion of a screen, almost never match. Most commonly, the image has a higher original resolution, thanks to the rapid development and widespread availability of capture devices such as cameras. Some of the information contained within this resolution may have been lost to compression while storing or transferring the data, but that’s another story.
Hence, we often end up having to choose which pixels, or which combinations of them, from the original image we want to display on a screen. Shouldn’t that be easy? Not necessarily. In fact, the issue is so complex that a multitude of solutions have been proposed over the decades.
Having more data than the screen can show offers us a chance to choose a better option, but with a small caveat: it is not clear which choice of pixels will provide users with the best experience. Do we just randomly pick pixels? That may be a waste of the data we have and result in an unpleasantly noisy image. Alternatively, instead of taking a subset of the existing pixels, we could take averages of neighbouring pixels or create new pixels via some optimisation function.
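The difference between picking a subset of pixels and averaging neighbours can be sketched in a few lines of pure Python (the function names and the toy 4x4 greyscale image are mine, for illustration only):

```python
def subsample(img):
    """Halve resolution by keeping every other pixel in each dimension."""
    return [row[::2] for row in img[::2]]

def box_average(img):
    """Halve resolution by replacing each 2x2 block with its average."""
    h, w = len(img), len(img[0])
    return [
        [
            (img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) // 4
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]

# A 4x4 greyscale image containing a single bright pixel.
image = [
    [0, 0,   0, 0],
    [0, 200, 0, 0],
    [0, 0,   0, 0],
    [0, 0,   0, 0],
]

print(subsample(image))    # [[0, 0], [0, 0]] - the bright pixel is dropped
print(box_average(image))  # [[50, 0], [0, 0]] - it survives, dimmed
```

Subsampling simply discards the bright pixel, while box averaging preserves a trace of it in the smaller image – a small-scale illustration of why the downscaling method matters.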
This process of choosing a number of pixels in order to go from a higher to a lower resolution is called downscaling or downsampling. For decades, the problem was approached by picking mathematically optimal pixels, and then it was almost put to rest for a while. But recently, researchers realised that the mathematical optimum does not coincide with the perceptual optimum, i.e. what we like most. Perceptually-aware downscaling uses a metric that optimises the pixel choice for human liking and recognition, i.e. it aims at downscaling such that the resulting images/visual data look nicest and most informative to a typical human.
The issue gets even more complex for other types of visual data. In the case of video, for example, we have an even greater abundance of data: subsequent frames carry additional information that can be exploited, allowing further optimisation but also posing new challenges. Whatever the type of visual data we are dealing with, we always need proper downscaling.
Originally published on LinkedIn