Proj1

Introduction

The Prokudin-Gorskii Photo Collection is a collection of photos taken by Sergei Mikhailovich Prokudin-Gorskii in the early 1900s. The collection includes thousands of images othe Russian Empire, and is notable for being some of the first color photographs ever taken. Despite the fact that there were no ways to display color images at the time, Prokudin-Gorskii was able to take color photographs by taking three separate black and white photographs of the same scene, each with a different color filter in front of the lens. These were later bought by the Library of Congress and digitized, and are now available to the public. In this project, I attempted to colorize these images by aligning the three color plates and stacking them together to form a colorized image of the scene.

Example of a Prokudin-Gorskii Photo
Channels from top to bottom: Blue, Green, Red

Naive Implementation

I began by implementing a naive implementation of stitching together the color plates of the Prokudin-Gorskii photos. Simply stitching together the three different color plates would not yield a great result, as the plates are not perfectly aligned. Therefore, I attempted to choose a reference frame (the blue frame) and align the other two frames to this reference frame by using a simple exhaustive search algorithm. I iterated through a user input amount of horizontal and vertical pixel shifts (from -15 to 15) looking for the shift that produced the largest value of the chosen metric. The two metrics I used were:

Negative Sum of Squared Differences (SSD)

\[\text{NSSD}(u, v) = -\sum_{(x, y) \in N} [I(u + x, v + y) - P(x, y)]^2\]

Normalized Cross-Correlation (NCC)

THE MANAGEMENT OF A LARYNGOCELE CAUSING ACUTE AIRWAY OBSTRUCTION WITH A BALL VALVE EFFECT: A CASE REPORT \[\text{NCC}(u, v) = \frac{\sum_{(x,y)\in N} (I(u+x,v+y) - \bar{I})(P(x, y) - \bar{P})}{\sqrt{\sum_{(x, y) \in N} \left[I(u+x, v+y) - \bar{I}\right]^2 \sum_{(x, y) \in N} \left[P(x, y) - \bar{P}\right]^2}}\]

Since the edges of the image were not all the same and included some slight shifts and imperfections, I also chose to use the inner 85% of the image instead of the whole image. I tested out the naive implementation on small images and found that they both had comparable performance. They both produced more clear images than simply stacking the three channels together with no shift. I ended up choosing to use the NCC metric for all future steps.

Default Alignment (no shift)

Alignment Using NCC Metric R: (3, 2) G: (-3, 2)

Alignment Using NSSD Metric R: (3, 2) G: (-3, 2)

Image Pyramid Implementation

While the naive implementation works decently well for the smaller images, because the other images were on the order of 10x larger, the naive impelmentation would require search widths of over 100 pixels, and thus would be too slow to run. To speed up the search process, I implemented an image pyramid algorithm. The image pyramid algorithm works by creating a series of images that are downsampled by a factor of 2 each time until they are around 32 pixels along the shortest axis. The algorithm then starts by searching for the best shift in the smallest image, and then uses this shift to search for the best shift in the next largest image, and so on. Becuase it can do more of the searching in smaller images, this algorithm runs much faster than it would take for searching through the full images. While I initially used a Gaussian filter before downsampling to avoid aliasing, I found that leaving out the filtering significantly sped up the algorithm while not significantly affecting the results. I therefore chose to just downsample in the offset searching process. This ultimately worked well for almost every image in the library I ran the algorithm on.

Aligned Onion Church

Aligned Melons

Aligned Sculpture

Bells & Whistles

SSIM Metric

One image that performed very poorly with the default image pyramid and metrics I chose was the image of Emir. Because of the difference between the pixel brightness in each channel, the red channel ends up being aligned to be way off from where it should be. However, using the Structural Similarity Index (SSIM) metric, I was able to get a much better alignment for this image and comparable results for the rest of the images in the library. The SSIM metric is a metric that compares the similarity between two images by comparing the luminance, contrast, and structure of the images and is calculated as follows:

\[\text{SSIM}(u, v) = \frac{(2\mu_u\mu_v + c_1)(2\sigma_{uv} + c_2)}{(\mu_u^2 + \mu_v^2 + c_1)(\sigma_u^2 + \sigma_v^2 + c_2)}\] where: