I began with a naive approach to stitching together the color plates of the Prokudin-Gorskii photos. Simply stacking the three color plates does not yield a great result, because the plates are not perfectly aligned. I therefore chose a reference frame (the blue frame) and aligned the other two frames to it using a simple exhaustive search: I iterated over a user-specified range of horizontal and vertical pixel shifts (from -15 to 15), looking for the shift that produced the largest value of the chosen metric. The two metrics I used, computed over the overlap region \(N\) between the shifted plate \(I\) and the reference \(P\), were:
\[\text{NSSD}(u, v) = -\sum_{(x, y) \in N} [I(u + x, v + y) - P(x, y)]^2\]
\[\text{NCC}(u, v) = \frac{\sum_{(x,y)\in N} (I(u+x,v+y) - \bar{I})(P(x, y) - \bar{P})}{\sqrt{\sum_{(x, y) \in N} \left[I(u+x, v+y) - \bar{I}\right]^2 \sum_{(x, y) \in N} \left[P(x, y) - \bar{P}\right]^2}}\]
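To make the search concrete, here is a minimal Python sketch of the exhaustive alignment, assuming the plates are float arrays of equal shape. The names (`exhaustive_align`, `nssd`, `ncc`, `interior`) are my own, and the border-cropping fraction is an illustrative choice rather than part of the original pipeline:

```python
import numpy as np

def interior(im, frac=0.1):
    # Crop a border fraction so wrap-around from np.roll and ragged
    # plate edges do not dominate the score (illustrative choice).
    h, w = im.shape
    return im[int(h * frac):int(h * (1 - frac)),
              int(w * frac):int(w * (1 - frac))]

def nssd(window, patch):
    # Negative sum of squared differences: larger is better.
    return -np.sum((window - patch) ** 2)

def ncc(window, patch):
    # Normalized cross-correlation: larger is better.
    w = window - window.mean()
    p = patch - patch.mean()
    return np.sum(w * p) / np.sqrt(np.sum(w ** 2) * np.sum(p ** 2))

def exhaustive_align(channel, reference, max_shift=15, metric=nssd):
    # Try every (dy, dx) in the search window and keep the shift
    # that maximizes the metric over the cropped interiors.
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = metric(interior(shifted), interior(reference))
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```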
While the naive implementation works decently well for the smaller images, the remaining images are on the order of 10x larger, so the naive implementation would require search widths of over 100 pixels and would be far too slow to run. To speed up the search, I implemented an image pyramid algorithm. It works by creating a series of images, each downsampled by a factor of 2 from the last, until the shortest axis is around 32 pixels. The algorithm then searches for the best shift in the smallest image and uses that shift to seed the search in the next-largest image, and so on up to full resolution. Because most of the searching happens in the smaller images, this runs much faster than searching the full-resolution images directly. While I initially applied a Gaussian filter before downsampling to avoid aliasing, I found that leaving out the filtering significantly sped up the algorithm without significantly affecting the results, so I chose to simply downsample during the offset search. This ultimately worked well for almost every image in the library I ran the algorithm on.
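A minimal sketch of this coarse-to-fine search, reusing `exhaustive_align` from above; the recursion and the plain stride-2 downsampling follow the description, while the ±2 refinement window at each level is an assumed value:

```python
def pyramid_align(channel, reference, base_search=15, min_size=32, metric=nssd):
    # Base case: the image is small enough for a full exhaustive search.
    if min(channel.shape) <= min_size:
        return exhaustive_align(channel, reference, base_search, metric)
    # Recurse on stride-2 downsampled copies (no Gaussian prefilter,
    # per the speed observation above).
    dy, dx = pyramid_align(channel[::2, ::2], reference[::2, ::2],
                           base_search, min_size, metric)
    # Double the coarse estimate, then refine with a small local search.
    dy, dx = 2 * dy, 2 * dx
    shifted = np.roll(channel, (dy, dx), axis=(0, 1))
    fy, fx = exhaustive_align(shifted, reference, 2, metric)
    return dy + fy, dx + fx
```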
One image that performed very poorly with the default image pyramid and metrics was the image of the Emir. Because the pixel brightnesses differ substantially between the channels, the red channel ends up aligned far from where it should be. Using the Structural Similarity Index (SSIM) metric instead, I was able to get a much better alignment for this image and comparable results for the rest of the images in the library. SSIM compares the similarity of two images in terms of their luminance, contrast, and structure, and is calculated as follows:
\[\text{SSIM}(u, v) = \frac{(2\mu_u\mu_v + c_1)(2\sigma_{uv} + c_2)}{(\mu_u^2 + \mu_v^2 + c_1)(\sigma_u^2 + \sigma_v^2 + c_2)}\] where \(\mu_u\) and \(\mu_v\) are the mean intensities of the two windows, \(\sigma_u^2\) and \(\sigma_v^2\) are their variances, \(\sigma_{uv}\) is their covariance, and \(c_1\), \(c_2\) are small constants that stabilize the division when the means and variances are close to zero.
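Plugging SSIM into the same search is straightforward once the metric is exposed as a callable. The sketch below uses scikit-image's `structural_similarity` and assumes the plates are floats in \([0, 1]\) (hence `data_range=1.0`):

```python
from skimage.metrics import structural_similarity

def ssim_metric(window, patch):
    # Mean SSIM over the overlap, used as an alignment score
    # (larger is better, like NSSD and NCC above).
    return structural_similarity(window, patch, data_range=1.0)

# For example, align the red plate to the blue reference with SSIM:
# shift = pyramid_align(red, blue, metric=ssim_metric)
```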