CS 180

Colorizing the Prokudin-Gorskii Photo Collection

By Evan Chang · CS 180 Project 1

Tech: Python, NumPy, Image alignment

Introduction

The Prokudin-Gorskii Photo Collection is a set of photos taken by Sergei Mikhailovich Prokudin-Gorskii in the early 1900s. The collection includes thousands of images of the Russian Empire, and is notable for being some of the first color photographs ever taken.

Despite the fact that there were no ways to display color images at the time, Prokudin-Gorskii was able to take color photographs by taking three separate black and white photographs of the same scene, each with a different color filter in front of the lens. These were later bought by the Library of Congress and digitized, and are now available to the public.

In this project, I colorized these images by aligning the three color plates and stacking them together to form a color image of the scene.

Monastery plates
Example of a Prokudin-Gorskii photo. Channels from top to bottom: blue, green, red.

Naive implementation

I began with a naive approach to stitching the color plates. Simply stacking the three plates does not work well because they are not perfectly aligned.

I chose a reference frame (the blue frame) and aligned the other two frames with an exhaustive search over horizontal and vertical pixel shifts (for example from \(-15\) to \(15\)), maximizing a chosen metric.

SSD and NCC metrics

The two metrics I used were:

1. Negative sum of squared differences (NSSD)

\[ \text{NSSD}(u, v) = -\sum_{(x, y) \in N} [I(u + x, v + y) - P(x, y)]^2 \]

2. Normalized cross-correlation (NCC)

\[ \text{NCC}(u, v) = \frac{\sum_{(x,y)\in N} (I(u+x,v+y) - \bar{I})(P(x, y) - \bar{P})}{\sqrt{\sum_{(x, y) \in N} \left[I(u+x, v+y) - \bar{I}\right]^2 \sum_{(x, y) \in N} \left[P(x, y) - \bar{P}\right]^2}} \]

Because the edges of the image were not consistent and included slight shifts and imperfections, I used the inner 85% of the image instead of the whole image.

On small images, both metrics performed comparably and both were clearer than stacking with no shift. I used the NCC metric for all later steps.

Default monastery stack
Default alignment (no shift)
NCC aligned monastery
NCC metric — R: (3, 2), G: (−3, 2)
NSSD aligned monastery
NSSD metric — R: (3, 2), G: (−3, 2)

Image pyramid implementation

While the naive method works on small images, larger images need search widths on the order of 100+ pixels, which is too slow for exhaustive search.

I implemented an image pyramid: repeatedly downsample by a factor of 2 until the shortest axis is around 32 pixels, search for the best shift at the smallest scale, then refine at each larger scale.

Searching mostly on small images makes the algorithm much faster than searching the full resolution. I initially used a Gaussian before downsampling to limit aliasing, but skipping the filter sped things up without hurting results much, so I downsampled directly for offset search. This worked well on almost every image I tried.

Onion church aligned
Aligned onion church
Melons aligned
Aligned melons
Sculpture aligned
Aligned sculpture

Bells and whistles

SSIM metric

One difficult case was Emir: channel brightness differs enough that NCC aligned the red channel poorly.

Using the structural similarity (SSIM) metric produced a much better alignment for Emir and similar quality on the rest of the library.

SSIM compares luminance, contrast, and structure:

\[ \text{SSIM}(u, v) = \frac{(2\mu_u\mu_v + c_1)(2\sigma_{uv} + c_2)}{(\mu_u^2 + \mu_v^2 + c_1)(\sigma_u^2 + \sigma_v^2 + c_2)} \]

where:

  • \(\mu_u\) and \(\mu_v\) are the mean pixel values of the two patches
  • \(\sigma_u^2\) and \(\sigma_v^2\) are the variances
  • \(\sigma_{uv}\) is the covariance
  • \(c_1\) and \(c_2\) are small constants to avoid division by zero
Emir NCC
NCC alignment (Emir)
Emir SSIM
SSIM alignment (Emir)

Final results

Pyramid alignment with SSIM on the full set of outputs:

Emir
Red offset: (104, 40), green: (48, 20)
Monastery
Red: (2, 2), green: (−4, 2)
Church
Red: (58, −4), green: (26, 4)
Three generations
Red: (110, 10), green: (52, 14)
Melons
Red: (178, 12), green: (80, 10)
Onion church
Red: (108, 36), green: (52, 26)
Train
Red: (86, 30), green: (40, 6)
Tobolsk
Red: (6, 4), green: (2, 2)
Icon
Red: (88, 22), green: (40, 16)
Cathedral
Red: (12, 4), green: (4, 2)
Self portrait
Red: (174, 36), green: (78, 28)
Harvesters
Red: (122, 12), green: (58, 14)
Sculpture
Red: (140, −26), green: (34, −10)
Lady
Red: (118, 12), green: (56, 8)