Colorizing the Prokudin-Gorskii Photo Collection

By: Evan Chang

CS 180 Project 1

Introduction

The Prokudin-Gorskii Photo Collection is a collection of photos taken by Sergei Mikhailovich Prokudin-Gorskii in the early 1900s. The collection includes thousands of images of the Russian Empire, and is notable for containing some of the first color photographs ever taken. Although there was no way to display color images at the time, Prokudin-Gorskii was able to capture color photographs by taking three separate black-and-white exposures of the same scene, each with a different color filter in front of the lens. The plates were later bought by the Library of Congress and digitized, and are now available to the public. In this project, I attempted to colorize these images by aligning the three color plates and stacking them together to form a colorized image of the scene.
Monastery
Example of a Prokudin-Gorskii Photo
Channels from top to bottom: Blue, Green, Red

Naive Implementation

I began with a naive approach to stitching together the color plates of the Prokudin-Gorskii photos. Simply stacking the three plates does not yield a great result, as they are not perfectly aligned. I therefore chose a reference frame (the blue channel) and aligned the other two frames to it using a simple exhaustive search: I iterated over a user-specified range of horizontal and vertical pixel shifts (here, -15 to 15), looking for the shift that produced the largest value of the chosen metric. The two metrics I used were:

  1. Negative Sum of Squared Differences (NSSD)

     \[\text{NSSD}(u, v) = -\sum_{(x, y) \in N} [I(u + x, v + y) - P(x, y)]^2\]

  2. Normalized Cross-Correlation (NCC)

     \[\text{NCC}(u, v) = \frac{\sum_{(x,y)\in N} (I(u+x,v+y) - \bar{I})(P(x, y) - \bar{P})}{\sqrt{\sum_{(x, y) \in N} \left[I(u+x, v+y) - \bar{I}\right]^2 \sum_{(x, y) \in N} \left[P(x, y) - \bar{P}\right]^2}}\]
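The two metrics above can be sketched directly in NumPy. This is a minimal version, assuming both inputs are same-shaped grayscale float arrays; the function names are mine, not from the project code:

```python
import numpy as np

def nssd(window, patch):
    """Negative sum of squared differences: 0 for identical inputs,
    increasingly negative as the inputs diverge."""
    return -np.sum((window - patch) ** 2)

def ncc(window, patch):
    """Normalized cross-correlation: subtract each mean, then measure
    the cosine similarity of the flattened, centered arrays."""
    w = window - window.mean()
    p = patch - patch.mean()
    return np.sum(w * p) / np.sqrt(np.sum(w ** 2) * np.sum(p ** 2))
```

Note that because NCC subtracts the means and normalizes, it is invariant to brightness offsets and contrast scaling between the two plates, which NSSD is not.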

Since the borders of the plates are uneven and contain slight shifts and imperfections, I scored shifts using only the inner 85% of the image rather than the whole frame. I tested the naive implementation on the small images and found that the two metrics performed comparably; both produced noticeably clearer images than simply stacking the three channels with no shift. I chose the NCC metric for all subsequent steps.
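The exhaustive search described above can be sketched as follows. This is a minimal version under a few assumptions: grayscale NumPy arrays, `np.roll` for shifting (which wraps pixels circularly, acceptable for small shifts when the border is cropped away), and my own function names:

```python
import numpy as np

def crop_interior(im, frac=0.85):
    """Keep only the central `frac` fraction of the image along each axis,
    discarding the uneven plate borders before scoring."""
    h, w = im.shape
    dh, dw = int(h * (1 - frac) / 2), int(w * (1 - frac) / 2)
    return im[dh:h - dh, dw:w - dw]

def exhaustive_align(channel, ref, radius=15):
    """Return the (dy, dx) shift of `channel` that maximizes NCC
    against the reference channel, searching a square window."""
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            a = crop_interior(np.roll(channel, (dy, dx), axis=(0, 1)))
            b = crop_interior(ref)
            a = a - a.mean()
            b = b - b.mean()
            score = np.sum(a * b) / np.sqrt(np.sum(a**2) * np.sum(b**2))
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

With a radius of 15 this evaluates 31 x 31 = 961 candidate shifts, which is cheap for the small images but motivates the pyramid below for the full-resolution scans.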
Default Monastery Stack
Default Alignment (no shift)
NCC Aligned Monastery
Alignment Using NCC Metric   R: (3, 2)   G: (-3, 2)
NSSD Aligned Monastery
Alignment Using NSSD Metric   R: (3, 2)   G: (-3, 2)

Image Pyramid Implementation

While the naive implementation works decently well for the smaller images, the remaining images are on the order of 10x larger, so the naive implementation would require search widths of over 100 pixels and would be far too slow to run. To speed up the search, I implemented an image pyramid algorithm. The algorithm builds a series of images, each downsampled by a factor of 2, until the smallest is around 32 pixels along its shortest axis. It then finds the best shift in the smallest image and uses that estimate as the starting point for the search in the next larger image, and so on up to full resolution. Because most of the searching happens in small images, this runs much faster than an exhaustive search over the full images. I initially applied a Gaussian filter before downsampling to avoid aliasing, but found that leaving out the filtering significantly sped up the algorithm while not significantly affecting the results, so I chose to downsample directly during the offset search. This ultimately worked well for almost every image in the library I ran the algorithm on.
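The coarse-to-fine recursion can be sketched as below. This is a minimal, self-contained version under my own naming; like the text describes, it downsamples by plain striding with no Gaussian pre-filter, and it refines each doubled coarse estimate with a small local NCC search:

```python
import numpy as np

def _ncc(a, b):
    """Normalized cross-correlation of two same-shaped arrays."""
    a = a - a.mean()
    b = b - b.mean()
    return np.sum(a * b) / np.sqrt(np.sum(a**2) * np.sum(b**2))

def _search(channel, ref, center, radius):
    """Exhaustive NCC search over shifts within `radius` of `center`."""
    best_score, best_shift = -np.inf, center
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            score = _ncc(np.roll(channel, (dy, dx), axis=(0, 1)), ref)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift

def pyramid_align(channel, ref, min_size=32):
    """Coarse-to-fine alignment: recurse on 2x-downsampled copies until
    the shortest axis is ~32 px, then double and refine each estimate."""
    if min(ref.shape) <= min_size:
        return _search(channel, ref, (0, 0), 15)
    # Downsample by striding (no pre-filter), solve the coarse problem.
    dy, dx = pyramid_align(channel[::2, ::2], ref[::2, ::2], min_size)
    # One level up, the true shift is ~2x the coarse one; refine locally.
    return _search(channel, ref, (2 * dy, 2 * dx), 2)
```

Each level only searches a 5 x 5 window around the doubled coarse estimate, so the total work is dominated by the cheap base-level search rather than by the full-resolution image.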

Pyramid Aligned Onion Church
Aligned Onion Church
Pyramid Aligned Melons
Aligned Melons
Pyramid Aligned Sculpture
Aligned Sculpture

Bells & Whistles

SSIM Metric

One image that performed very poorly with the default image pyramid and metrics was the portrait of the Emir. Because the pixel brightnesses differ substantially between the three channels, the red channel ends up aligned far from its correct position. Using the Structural Similarity Index (SSIM) metric instead, I was able to get a much better alignment for this image and comparable results for the rest of the images in the library. SSIM compares two images in terms of their luminance, contrast, and structure, and is calculated as follows:

\[\text{SSIM}(u, v) = \frac{(2\mu_u\mu_v + c_1)(2\sigma_{uv} + c_2)}{(\mu_u^2 + \mu_v^2 + c_1)(\sigma_u^2 + \sigma_v^2 + c_2)}\] where \(\mu_u\) and \(\mu_v\) are the means of the two windows, \(\sigma_u^2\) and \(\sigma_v^2\) are their variances, \(\sigma_{uv}\) is their covariance, and \(c_1\), \(c_2\) are small constants that stabilize the division.
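As a sketch, the SSIM formula can be evaluated globally over a pair of images as below. This is a simplified, whole-image version of the metric (SSIM is usually computed over local windows and averaged); the constants assume images scaled to [0, 1] with the standard choices K1 = 0.01 and K2 = 0.03, and the function name is mine:

```python
import numpy as np

def ssim_global(u, v, c1=0.01**2, c2=0.03**2):
    """Whole-image SSIM for two same-shaped arrays in [0, 1]:
    compares mean luminance, contrast (variance), and structure
    (covariance) per the SSIM formula."""
    mu_u, mu_v = u.mean(), v.mean()
    var_u, var_v = u.var(), v.var()
    cov = ((u - mu_u) * (v - mu_v)).mean()
    return ((2 * mu_u * mu_v + c1) * (2 * cov + c2)) / (
        (mu_u**2 + mu_v**2 + c1) * (var_u + var_v + c2))
```

Because the structure term depends on the covariance of the mean-subtracted images rather than raw pixel values, SSIM is less sensitive to the per-channel brightness differences that threw off NCC on the Emir image.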

Pyramid Aligned Emir
NCC Alignment
SSIM Aligned Emir
SSIM Alignment

Final Results:

SSIM Aligned emir
Red Offset: (104, 40), Green Offset: (48, 20)
SSIM Aligned monastery
Red Offset: (2, 2), Green Offset: (-4, 2)
SSIM Aligned church
Red Offset: (58, -4), Green Offset: (26, 4)
SSIM Aligned three_generations
Red Offset: (110, 10), Green Offset: (52, 14)
SSIM Aligned melons
Red Offset: (178, 12), Green Offset: (80, 10)
SSIM Aligned onion_church
Red Offset: (108, 36), Green Offset: (52, 26)
SSIM Aligned train
Red Offset: (86, 30), Green Offset: (40, 6)
SSIM Aligned tobolsk
Red Offset: (6, 4), Green Offset: (2, 2)
SSIM Aligned icon
Red Offset: (88, 22), Green Offset: (40, 16)
SSIM Aligned cathedral
Red Offset: (12, 4), Green Offset: (4, 2)
SSIM Aligned self_portrait
Red Offset: (174, 36), Green Offset: (78, 28)
SSIM Aligned harvesters
Red Offset: (122, 12), Green Offset: (58, 14)
SSIM Aligned sculpture
Red Offset: (140, -26), Green Offset: (34, -10)
SSIM Aligned lady
Red Offset: (118, 12), Green Offset: (56, 8)