Image Mosaics

Evan Chang

CS180 Project 4

Introduction

In this project, I explored creating simple image mosaics using homographies and multi-resolution blending. All of these mosaics were created from images I took while trying my best to preserve the center of projection, which is what allows homographies to warp them correctly. For this reason, I favored landscape scenes with as little moving content as possible, and took sets of three pictures that could be stitched from left to right.

Image Warping and Mosaicing

Recovering Homographies

After taking my pictures, I manually marked correspondences between the two views of each pair of images. I also manually marked corners on the images corresponding to the extent of the images to include in my mosaic. Using these pairs of correspondences, I could recover the homography matrix satisfying the relationship \(p' = Hp\), where the homography matrix \(H\) has 8 degrees of freedom. Since each point pair gives us two equations, we need at least 4 correspondence pairs to solve for our homography matrix. With more pairs, we get a more robust solution by solving a least squares problem. Our homography matrix is a \(3 \times 3\) matrix of the form: \[ H = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & 1 \end{bmatrix} \] Flattening out this matrix and setting up the system of equations by plugging in our correspondence points, we get the following system we can solve by least squares: \[ \begin{bmatrix} x_1 & y_1 & 1 & 0 & 0 & 0 & -x_1x_1' & -y_1x_1' \\ 0 & 0 & 0 & x_1 & y_1 & 1 & -x_1y_1' & -y_1y_1' \\ x_2 & y_2 & 1 & 0 & 0 & 0 & -x_2x_2' & -y_2x_2' \\ 0 & 0 & 0 & x_2 & y_2 & 1 & -x_2y_2' & -y_2y_2' \\ & & & & \vdots & & & \\ \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \\ g \\ h \end{bmatrix} = \begin{bmatrix} x_1' \\ y_1' \\ x_2' \\ y_2' \\ \vdots \end{bmatrix} \]
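As a sketch of how this system can be assembled and solved in code (the function and variable names here are my own illustration, not from any starter code):

```python
import numpy as np

def compute_homography(pts_src, pts_dst):
    """Solve for the 3x3 homography H mapping pts_src -> pts_dst.

    pts_src, pts_dst: (N, 2) arrays of (x, y) correspondences, N >= 4.
    """
    A, b = [], []
    for (x, y), (xp, yp) in zip(pts_src, pts_dst):
        # Two rows per correspondence, matching the system above.
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    # Least-squares solution for the 8 unknowns (the ninth entry is fixed to 1).
    h, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)
```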

Image Warping and Rectification

Now that we have a homography matrix between two sets of correspondence points, we can use it to warp images into different perspectives. I first forward-warped my defined corner points with the homography to find the extent of the output, then inverse-warped every point inside that region, using scipy.interpolate.RegularGridInterpolator to interpolate the original pixel values at the warped coordinates.
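A minimal sketch of this inverse-warping step might look like the following, assuming H maps source coordinates to destination coordinates and the image is a float array:

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

def warp_image(im, H, out_h, out_w):
    """Inverse-warp im into an (out_h, out_w) canvas using homography H.

    H maps source (x, y) to destination (x', y'); we apply its inverse
    to every destination pixel and interpolate the source values there.
    """
    H_inv = np.linalg.inv(H)
    # Homogeneous coordinates of every destination pixel.
    ys, xs = np.mgrid[0:out_h, 0:out_w]
    dest = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = H_inv @ dest
    src = src[:2] / src[2]  # normalize by the homogeneous coordinate
    # Interpolate source pixel values at the warped coordinates.
    interp = RegularGridInterpolator(
        (np.arange(im.shape[0]), np.arange(im.shape[1])),
        im, bounds_error=False, fill_value=0)
    out = interp(np.stack([src[1], src[0]], axis=-1))  # (row, col) order
    return out.reshape(out_h, out_w, *im.shape[2:])
```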

An interesting use of this homography matrix, and a good test of its accuracy, is rectifying images. I chose a portion of my images that is rectangular in the real world and warped it to an actual rectangle (I arbitrarily chose base and height dimensions for my rectangles).
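As a hypothetical usage example (the corner coordinates below are made up for illustration), rectification is just a warp from four hand-picked corners to an axis-aligned rectangle, reusing the two sketches above:

```python
# Four clicked corners of a planar sign in the photo, in (x, y) order,
# and the arbitrarily chosen rectangle they should map to.
sign_corners = np.array([[412, 310], [868, 285], [890, 640], [430, 668]])
rect_corners = np.array([[0, 0], [400, 0], [400, 300], [0, 300]])
H = compute_homography(sign_corners, rect_corners)
rectified = warp_image(im, H, out_h=300, out_w=400)  # im: the original photo
```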

Rectification Results:
Picture of a sign in Oakland Arena for IVE
Original Image
Rectified Image of a sign in Oakland Arena for IVE
Rectified Portion of Image
Image of left portion of entrance way of Giannini Hall
Original Image
Rectified Image of a picture frame on the left side of Giannini Hall
Rectified Portion of Image
"
Image of a concert stage of (G)I-DLE
Original Image
Rectified Image of a concert screen of (G)I-DLE
Rectified Portion of Image

We can also use this same warping procedure to warp images into the same perspective to be used for image mosaics. Instead of warping to a rectangular region, we can warp correspondences from one image to another to get two images taken from different angles to be in the same perspective.

Image of left portion of entrance way of Giannini Hall
Left Image
Warped image of left portion of entrance way of Giannini Hall
Left Image (Perspective of Middle)
Image of middle portion of entrance way of Giannini Hall
Middle Image

Image Mosaics

To convert my sets of images into image mosaics, I first had to warp all of the images into the same perspective using the method described in the previous section. I designated one "middle" image that all of the other images would be warped to. I then lined up the images according to the locations of the correspondences in the middle image, allocating enough space around it to fit the other images. I then used a binary mask of each image's footprint followed by a distance transform to construct a blending mask. The distance transform gives the distance of each pixel to the nearest edge of the binary mask (I used Euclidean distance), which can be thought of as a "weight" for each pixel based on its distance from the "center" of the image.
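A sketch of this step, using scipy.ndimage.distance_transform_edt and assuming the warped image is a color array that is zero outside its footprint:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def blend_weights(warped_im):
    """Distance-transform weights: zero at the footprint border, growing
    toward the interior (assumes pixels outside the footprint are zero)."""
    mask = np.any(warped_im > 0, axis=-1)  # binary footprint of the image
    return distance_transform_edt(mask)    # Euclidean distance to nearest zero
```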

Distance Transform of Left Image
Distance Transform of Left Image
Distance Transform of Middle Image
Distance Transform of Middle Image
Distance Transform of Right Image
Distance Transform of Right Image

I then used the distance transforms to create the final blending mask: each pixel is assigned to whichever image has the larger distance-transform value there, giving a binary mask. With this blending mask and the warped, aligned images, I could blend the images together using a Laplacian stack to create the final image mosaic. I found that I had rotated quite a bit between perspectives when taking my pictures, which produced a very warped mosaic and made it slightly difficult to fully blend out the edges between the images (I also used images that were slightly too large, so the code would often crash my notebook kernel).
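The following is a rough sketch of that blending step for two aligned images, reusing the distance transforms from the previous sketch; the level count, blur sizes, and [0, 1] pixel range are assumptions on my part:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def blend(im1, im2, dist1, dist2, levels=4, sigma=2.0):
    """Two-image Laplacian-stack blend with a distance-transform mask."""
    # Binary mask: each pixel goes to whichever image is 'deeper'
    # inside its own footprint (larger distance-transform value).
    mask = (dist1 >= dist2).astype(float)[..., None]

    out = np.zeros_like(im1, dtype=float)
    g1, g2, gm = im1.astype(float), im2.astype(float), mask
    for lvl in range(levels):
        s = (sigma * 2 ** lvl,) * 2 + (0,)  # blur spatial dims only
        b1, b2 = gaussian_filter(g1, s), gaussian_filter(g2, s)
        bm = gaussian_filter(gm, s)
        if lvl < levels - 1:
            # Laplacian layer = current level minus the next (blurrier)
            # level, combined with a progressively softer mask.
            out += bm * (g1 - b1) + (1 - bm) * (g2 - b2)
            g1, g2, gm = b1, b2, bm
        else:
            # Lowest frequencies: blend the final Gaussian level directly.
            out += bm * b1 + (1 - bm) * b2
    return np.clip(out, 0, 1)
```

For a three-image mosaic, one way to apply this is pairwise: blend the left and middle images first, then blend that result with the right image.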

Image Mosaic Results:

Image of left portion of entrance way of Giannini Hall
Left Image
Image of middle portion of entrance way of Giannini Hall
Middle Image
Image of right portion of entrance way of Giannini Hall
Right Image
Blended Image of entrance way of Giannini Hall
Blended Image
Image of left portion of area of College of Natural Resources Parking Lot
Left Image
Image of middle portion of area of College of Natural Resources Parking Lot
Middle Image
Image of right portion of area of College of Natural Resources Parking Lot
Right Image
Blended Image of area of College of Natural Resources Parking Lot
Blended Image
Image of bottom portion of Sather Tower
Bottom Image
Image of middle portion of Sather Tower
Middle Image
Image of top portion of Sather Tower
Top Image
Blended Image of Sather Tower
Blended Image

Feature Matching for Autostitching

We can make our job easier by finding a way to automatically stitch our images together. In the previous parts, I manually chose good correspondences between each pair of images, but we can find them automatically using the methods proposed by Brown et al. in their paper "Multi-Image Matching using Multi-Scale Oriented Patches".

Harris Interest Point Detector

The first step we perform is detecting parts of the image that make good correspondence points. These "corners" are points in the image where the gradient is high in all directions. We can detect them by using the Harris interest point detector, which calculates the "cornerness" of a point with the following formula: \[ R = \frac{\det(M)}{\operatorname{trace}(M)} \] where \(M\) is our second moment matrix calculated from the gradients in our image.
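A minimal sketch of computing this response map with NumPy/SciPy (the Gaussian window size is an assumption on my part):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def harris_response(im, sigma=1.0):
    """Harris corner strength R = det(M) / trace(M) at every pixel of a
    grayscale float image."""
    Iy, Ix = np.gradient(im)
    # Second moment matrix entries, smoothed with a Gaussian window.
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)
    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    return det / (trace + 1e-8)  # epsilon avoids division by zero
```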

Here are some examples of our Harris Interest Points on an image:
Harris Interest Points on an Image
Harris Interest Points

Adaptive Non-Maximal Suppression

As we noticed, we have many interest points, and they are not well distributed across our image. Because we are taking images from multiple perspectives, we want to make sure we have a good distribution of points across our image to match with other images. We can do this by using Adaptive Non-Maximal Suppression: for each point, we compute the radius to the nearest point strong enough to suppress it (a robustness factor ensures a neighbor must be meaningfully stronger to do so), and keep the points with the largest radii. I used this method to extract the best 500 points from our Harris interest points.
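A sketch of ANMS as I understand it from Brown et al., using the simple O(N²) pairwise-distance approach (c_robust = 0.9 follows the paper's suggested value):

```python
import numpy as np
from scipy.spatial.distance import cdist

def anms(coords, strengths, n_keep=500, c_robust=0.9):
    """Keep the n_keep points with the largest suppression radii, i.e.
    the distance to the nearest sufficiently stronger point.

    coords: (N, 2) array of point locations; strengths: (N,) Harris values.
    """
    d = cdist(coords, coords)  # pairwise distances between points
    # Point j suppresses point i only if strengths[i] < c_robust * strengths[j].
    stronger = strengths[None, :] * c_robust > strengths[:, None]
    d[~stronger] = np.inf      # ignore points too weak to suppress us
    radii = d.min(axis=1)      # suppression radius of each point
    return coords[np.argsort(-radii)[:n_keep]]
```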

We can clearly see that our points are now more evenly distributed across the image compared to the full set of Harris points and a naive threshold on the Harris values:
Harris Interest Points on an Image
Harris Interest Points
Harris Interest Points on an Image (Thresholded)
Naive Thresholded Points
"
Adaptive Non-Maximal Suppression on Harris Interest Points
Points after ANMS

Extracting Feature Descriptors

Now that we have our interest points, we need to extract feature descriptors from them. These descriptors are what we will use to match points between images. We can use a version of the Multi-Scale Oriented Patches (MOPS) method to extract them: take a 40x40 patch around each interest point and downsample it to 8x8. The original paper also makes the descriptors rotation-invariant for better matching, but I did not implement this in my project. After extracting the patches, we normalize each one to have zero mean and unit variance.
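A sketch of this axis-aligned descriptor extraction (the blur sigma before sampling is my own choice to reduce aliasing, not a value from the project):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def extract_descriptors(im, coords, patch=40, out=8):
    """Axis-aligned MOPS-style descriptors: blur, sample a 40x40 window
    down to 8x8 (every 5th pixel), then bias/gain-normalize.

    im: grayscale float image; coords: (N, 2) array of (row, col) ints.
    """
    blurred = gaussian_filter(im, sigma=patch / out / 2)  # anti-alias
    half, step = patch // 2, patch // out
    descs = []
    for r, c in coords:
        window = blurred[r - half:r + half:step, c - half:c + half:step]
        if window.shape != (out, out):
            continue  # skip points too close to the image border
        descs.append(((window - window.mean()) / (window.std() + 1e-8)).ravel())
    return np.array(descs)
```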

Feature Descriptors of Interest Points
Randomly Selected Feature Descriptors of Our Image

Feature Matching

Now that we have our feature descriptor vectors, we can match them between images to find correspondences. While we could simply compute the distance between every pair of descriptors and take the nearest neighbor for each, not every feature patch in one image actually has a match in the other. This makes it better to use a more discriminative test: compare the error of the 1-nearest neighbor to the error of the 2-nearest neighbor (Lowe's ratio test). If the ratio of the two errors is less than a certain threshold, we consider the pair a good match.
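A sketch of this ratio test (the 0.6 threshold here is an illustrative value, not necessarily the one used in the project):

```python
import numpy as np
from scipy.spatial.distance import cdist

def match_features(desc1, desc2, ratio=0.6):
    """Lowe-style ratio test: accept a match only when the best candidate
    is much better than the second-best. Returns (K, 2) index pairs."""
    d = cdist(desc1, desc2)          # pairwise descriptor distances
    nearest = np.argsort(d, axis=1)  # candidates sorted by distance
    e1 = d[np.arange(len(desc1)), nearest[:, 0]]  # 1-NN error
    e2 = d[np.arange(len(desc1)), nearest[:, 1]]  # 2-NN error
    keep = e1 / e2 < ratio
    return np.stack([np.where(keep)[0], nearest[keep, 0]], axis=1)
```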

Feature Matches between Two Images
Feature Matches between Two Images

RANSAC

Our final step for automatically determining correspondence points is to use RANSAC (Random Sample Consensus) to filter out outliers. RANSAC works by randomly choosing 4 matches and calculating a homography matrix from them. We then use this homography to warp all of the matched points and count how many land close enough to their correspondences to be considered inliers. We repeat this process for a certain number of iterations and keep the homography with the most inliers; its inlier set gives us the final matching points we use for our image mosaic warps.
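A sketch of this loop, reusing the compute_homography function from the earlier sketch (the iteration count and inlier threshold are illustrative):

```python
import numpy as np

def ransac_homography(pts1, pts2, n_iters=1000, eps=2.0):
    """4-point RANSAC: keep the homography (and inlier set) that agrees
    with the most matches. pts1, pts2: (N, 2) matched (x, y) points."""
    best_inliers = np.zeros(len(pts1), dtype=bool)
    for _ in range(n_iters):
        idx = np.random.choice(len(pts1), 4, replace=False)
        H = compute_homography(pts1[idx], pts2[idx])
        # Project all of pts1 through H and measure distance to pts2.
        ones = np.ones((len(pts1), 1))
        proj = H @ np.hstack([pts1, ones]).T
        proj = (proj[:2] / proj[2]).T
        inliers = np.linalg.norm(proj - pts2, axis=1) < eps
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on all inliers for the final homography.
    return compute_homography(pts1[best_inliers], pts2[best_inliers]), best_inliers
```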

RANSAC Matches between Two Images
RANSAC Matches between Two Images
As we can see, using RANSAC significantly reduces our number of matches to only the best matches between our images.

Results:

We can then use these automatically matched correspondences to warp our images into the same perspective and into mosaics as we did in the previous part. As can be seen below, this worked about as well as my manual correspondences did, but it was much faster and less tedious.

Blended Image of Sather Tower
Manual Correspondences
Blended Image of Sather Tower
Automatic Correspondences

More Results:

Original Images:

Image of Sun Moon Lake
Image of Sun Moon Lake

Image Mosaic:

Blended Image of Sun Moon Lake

Original Images:

Image from Greek Theatre
Image from Greek Theatre

Image Mosaic:

Blended Image from Greek Theatre

Original Images:

Image from Edinburgh Castle
Image from Edinburgh Castle
Image from Edinburgh Castle

Image Mosaic:

Blended Image from Edinburgh Castle

Conclusion

I found this project to be quite fun. It was really interesting to warp images into different perspectives and blend them together into a single image. My favorite part of this project was probably the rectification, as it worked really well and was quite interesting to see the results of. I struggled a bit with my image mosaics, as the images I had chosen differed a little too much in perspective, so the warping created large distortions. The blending also was not as smooth as I would have liked due to these sharp changes in perspective. I also had some issues with the size of my images, as they were too large and would often crash my notebook kernel (although I realized I should have just resized my images, which is what I did after part 1 of the project). I also enjoyed trying out the automatic feature matching, since it was quite tedious to manually match point by point each time (although this did not work quite as well as manually finding points). Unfortunately, since many of the images I had chosen in the beginning did not have many regions conducive to the auto-matching algorithm I implemented, I had to find some new images to use. These images tended to turn out even better, though, as I took them with better knowledge of what kinds of images work well for this project.