In this project, I explored creating simple image mosaics using homographies and multi-resolution blending. All of the mosaics were built from images I took myself, doing my best to preserve the center of projection so that homographies could warp the views onto one another correctly. For this reason, I favored landscape-style scenes with as little moving content as possible, and took sets of three pictures that could be stitched from left to right.
After taking my pictures, I manually marked correspondences between each pair of views. I also manually marked corners corresponding to the extent of each image to include in the mosaic. Using these pairs of correspondences, I could recover the homography matrix satisfying the relationship \(p' = Hp\), where the homography matrix \(H\) has 8 degrees of freedom. Since each point pair gives us two equations, we need at least 4 correspondence pairs to solve for the homography. If we provide more, we get a more robust solution by solving a least squares problem. The homography is a \(3 \times 3\) matrix of the form: \[ H = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & 1 \end{bmatrix} \] Flattening out this matrix and plugging our correspondence points into the equations, we get the following system we can solve by least squares: \[ \begin{bmatrix} x_1 & y_1 & 1 & 0 & 0 & 0 & -x_1x_1' & -y_1x_1' \\ 0 & 0 & 0 & x_1 & y_1 & 1 & -x_1y_1' & -y_1y_1' \\ x_2 & y_2 & 1 & 0 & 0 & 0 & -x_2x_2' & -y_2x_2' \\ 0 & 0 & 0 & x_2 & y_2 & 1 & -x_2y_2' & -y_2y_2' \\ & & & & \vdots & & & \\ \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \\ g \\ h \end{bmatrix} = \begin{bmatrix} x_1' \\ y_1' \\ x_2' \\ y_2' \\ \vdots \end{bmatrix} \]
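As a concrete illustration, here is a minimal NumPy sketch of this least-squares setup (the helper name `compute_homography` is my own, not from any library):

```python
import numpy as np

def compute_homography(pts, pts_prime):
    """Estimate H such that pts_prime ~ H @ pts from (N, 2) arrays of
    (x, y) correspondences, N >= 4, via least squares."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(pts, pts_prime):
        # Two equations per correspondence pair, matching the system above.
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        b.append(xp)
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.append(yp)
    h, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    # Re-attach the fixed scale entry H[2,2] = 1 and reshape to 3x3.
    return np.append(h, 1.0).reshape(3, 3)
```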
Now that we have a homography matrix between two sets of correspondence points, we can use this matrix to warp images into different perspectives.
I used the homography matrix to first forward warp my defined corner points, and then used an inverse warp on all of the points inside this defined region, using scipy.interpolate.RegularGridInterpolator to interpolate the original pixel values at these warped points.
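A sketch of that inverse warp might look like the following (assuming a float RGB image and that `H` maps source coordinates to destination coordinates; the function name `warp_image` is hypothetical):

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

def warp_image(img, H, out_shape):
    """Inverse-warp img into an out_shape canvas using homography H.

    H maps source coordinates to destination coordinates, so we apply
    H^{-1} to every destination pixel and sample the source there.
    """
    h_out, w_out = out_shape
    ys, xs = np.mgrid[0:h_out, 0:w_out]
    # Homogeneous (x, y, 1) coordinates of every destination pixel.
    dest = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])

    src = np.linalg.inv(H) @ dest
    src /= src[2]  # divide by w to get back to (x, y)

    # Interpolators are indexed (row, col) = (y, x); out-of-bounds reads
    # are filled with 0 so pixels outside the source stay black.
    h_in, w_in = img.shape[:2]
    out = np.zeros((h_out * w_out, img.shape[2]))
    for c in range(img.shape[2]):
        interp = RegularGridInterpolator(
            (np.arange(h_in), np.arange(w_in)), img[:, :, c],
            bounds_error=False, fill_value=0)
        out[:, c] = interp(np.stack([src[1], src[0]], axis=-1))
    return out.reshape(h_out, w_out, -1)
```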
An interesting use of the homography matrix, and a good test of its accuracy, is rectifying images. I chose a rectangular portion of a scene and warped it to an axis-aligned rectangle (with an arbitrarily chosen width and height).
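For example, rectification only requires choosing the four destination corners by hand. The corner values below are made up for illustration, reusing the hypothetical `compute_homography` and `warp_image` sketches above:

```python
import numpy as np

# Four clicked corners of a rectangular object, ordered top-left, top-right,
# bottom-right, bottom-left (example values), mapped to an arbitrarily
# chosen 300x400 rectangle.
src_corners = np.array([[120, 85], [530, 110], [545, 620], [100, 590]])
rect_corners = np.array([[0, 0], [300, 0], [300, 400], [0, 400]])

H = compute_homography(src_corners, rect_corners)
rectified = warp_image(img, H, out_shape=(400, 300))
```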
We can also use this same warping procedure to bring images into a common perspective for image mosaics: instead of warping to a rectangular region, we warp the correspondences of one image onto those of another, so that two pictures taken from different angles end up in the same perspective.
To convert my sets of images into mosaics, I first warped all of the images into the same perspective using the method described in the previous section. I designated one "middle" image to which all of the other images would be warped, lined the images up according to the locations of the middle image's correspondences, and allocated space around it to fit the other images. I then took a binary mask of each image's shape and applied a distance transform to construct a blending weight for the images. The distance transform gives each pixel's distance to the nearest edge of the binary mask (I used Euclidean distance), which can be read as a per-pixel "weight" based on how far the pixel is from the "center" of the image.
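The distance-transform weighting is nearly a one-liner with scipy.ndimage. This sketch assumes pixels outside a warped image's footprint are exactly zero; carrying an explicit alpha mask through the warp would be more robust:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def distance_weights(warped):
    """Euclidean distance of each pixel to the nearest edge of the image's
    footprint on the mosaic canvas (zero outside the footprint)."""
    footprint = np.any(warped > 0, axis=2)  # binary mask of valid pixels
    return distance_transform_edt(footprint)
```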
I then compared the distance transforms of the overlapping images to produce the final binary blending mask: each pixel is assigned to whichever image it is more interior to. With this mask and the warped, aligned images, I blended the images together using a Laplacian stack to create the final mosaic. I found that I had rotated quite a bit between shots, which produced a very warped mosaic and made it somewhat difficult to fully blend out the seams between images (I also used images that were slightly too large, so the code would often crash my notebook kernel).
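A rough sketch of this blend for two aligned images, using Gaussian-blurred difference bands as the Laplacian stack. The level count and blur sizes here are arbitrary choices for illustration, not tuned values from my final code:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def blend_two(im1, im2, dist1, dist2, levels=3, sigma=2.0):
    """Blend two aligned images with a Laplacian stack, using the
    winner-take-all seam mask from the two distance transforms."""
    seam = (dist1 >= dist2).astype(float)  # binary blending mask
    out = np.zeros_like(im1, dtype=float)
    low1, low2 = im1.astype(float), im2.astype(float)
    for lvl in range(levels):
        if lvl < levels - 1:
            # Laplacian band = current level minus its blurred version.
            next1 = gaussian_filter(low1, sigma=(sigma, sigma, 0))
            next2 = gaussian_filter(low2, sigma=(sigma, sigma, 0))
            band1, band2 = low1 - next1, low2 - next2
            low1, low2 = next1, next2
        else:
            band1, band2 = low1, low2  # coarsest residual
        # Blur the seam more at coarser levels so low frequencies blend widely.
        m = gaussian_filter(seam, sigma=sigma * (lvl + 1))[..., None]
        out += m * band1 + (1 - m) * band2
    return out
```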
We can make our job easier by finding a way to stitch images together automatically. In the previous parts I manually chose good correspondences between each pair of images, but we can do this automatically using the methods proposed by Brown et al. in their paper "Multi-Image Matching using Multi-Scale Oriented Patches."
The first step is detecting points in the image that make good correspondence candidates. These "corners" are points where the image gradient is large in all directions. We can detect them with the Harris interest point detector, which scores the "cornerness" of each point as \[ R = \frac{\det(M)}{\operatorname{trace}(M)} \] where \(M\) is the second moment matrix computed from the image gradients.
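A from-scratch version of this response with scipy might look like this (the smoothing scale `sigma` is an arbitrary choice):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def harris_response(gray, sigma=1.0):
    """Corner strength R = det(M) / trace(M) from the second moment matrix M."""
    Iy, Ix = np.gradient(gaussian_filter(gray, sigma))
    # Smooth the products of gradients to form the entries of M per pixel.
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)
    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    return det / (trace + 1e-8)  # small epsilon avoids division by zero
```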
Here are some examples of Harris interest points on one of my images:

As we can see, there are many interest points, and they are not well distributed across the image. Because we are taking images from multiple perspectives, we want a good spread of points across the whole image to match against other images. We can achieve this with Adaptive Non-Maximal Suppression (ANMS), which keeps the strongest points by Harris value while suppressing points that are too close to a stronger neighbor (a robustness factor ensures a point is only suppressed by neighbors that are sufficiently stronger). I used this method to extract the best 500 points from the Harris interest points.
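A direct O(N²) sketch of ANMS, fine for a few thousand candidate corners (`c_robust = 0.9` follows the robustness factor suggested in the paper):

```python
import numpy as np

def anms(coords, strengths, n_points=500, c_robust=0.9):
    """coords: (N, 2) corner positions; strengths: (N,) Harris values.
    Keep the n_points corners with the largest suppression radius, i.e.
    the distance to the nearest sufficiently stronger corner."""
    n = len(coords)
    radii = np.full(n, np.inf)
    for i in range(n):
        # Points that are "sufficiently stronger" than point i.
        stronger = c_robust * strengths > strengths[i]
        if stronger.any():
            # Squared distances rank the same as distances, so skip the sqrt.
            d2 = np.sum((coords[stronger] - coords[i]) ** 2, axis=1)
            radii[i] = d2.min()
    keep = np.argsort(radii)[::-1][:n_points]
    return coords[keep]
```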
Our points are now clearly more evenly distributed across the image than both the full set and a naive threshold on the Harris values:

Now that we have our interest points, we need to extract feature descriptors from them; these descriptors are what we will match between images. I used a simplified version of the Multi-Scale Oriented Patches (MOPS) method: take a 40x40 patch around each interest point and downsample it to 8x8. The original paper also makes the descriptors rotation-invariant, which I did not implement. After extracting the patches, we normalize each one to zero mean and unit variance.
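A sketch of this axis-aligned descriptor (no rotation invariance, matching what I implemented). The blur before subsampling is there to avoid aliasing, and the code assumes the corner is at least 20 pixels from the image border:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def extract_descriptor(gray, y, x):
    """Blurred 40x40 patch around (y, x), subsampled to 8x8,
    then bias/gain normalized and flattened to a 64-vector."""
    patch = gaussian_filter(gray, 1.0)[y - 20:y + 20, x - 20:x + 20]
    desc = patch[::5, ::5]  # 40x40 -> 8x8 (every 5th pixel)
    desc = desc - desc.mean()
    return (desc / (desc.std() + 1e-8)).ravel()
```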
Now that we have our feature descriptor vectors, we can match them between images to find correspondences. We could simply pair each feature with its nearest neighbor by descriptor distance, but not every feature patch in one image has a true match in the other, so it is better to use a more discriminative test: compare the error of the 1-nearest neighbor to the error of the 2-nearest neighbor, and accept the match only if the ratio of the two errors is below a threshold.
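A brute-force sketch of this ratio test (the 0.6 threshold is an assumed value; the right cutoff depends on the image set):

```python
import numpy as np

def match_features(desc1, desc2, ratio=0.6):
    """Ratio test: keep a match only when the 1-NN distance is much smaller
    than the 2-NN distance. Returns paired indices into desc1 / desc2."""
    # Pairwise squared distances between the two descriptor sets.
    d2 = np.sum((desc1[:, None, :] - desc2[None, :, :]) ** 2, axis=2)
    order = np.argsort(d2, axis=1)
    rows = np.arange(len(desc1))
    nn1, nn2 = order[:, 0], order[:, 1]
    good = d2[rows, nn1] / d2[rows, nn2] < ratio
    return rows[good], nn1[good]
```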
Our final step for automatically determining correspondence points is to use RANSAC (Random Sample Consensus) to filter out outliers. RANSAC randomly chooses 4 matches, computes a homography from them, warps all matched points with that homography, and counts how many land close enough to their partners to be inliers. We repeat this for a fixed number of iterations and keep the homography with the most inliers; its inlier set gives the final matches used for the mosaic warps.
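A sketch of this loop, reusing the hypothetical `compute_homography` from earlier (the iteration count and the inlier threshold `eps`, in pixels, are assumptions):

```python
import numpy as np

def ransac_homography(pts1, pts2, n_iters=1000, eps=2.0):
    """Keep the inlier set of the best homography fit from 4 random
    matches over n_iters trials (eps: reprojection threshold in pixels)."""
    rng = np.random.default_rng()
    pts1_h = np.hstack([pts1, np.ones((len(pts1), 1))]).T  # homogeneous, 3xN
    best_inliers = np.zeros(len(pts1), dtype=bool)
    for _ in range(n_iters):
        sample = rng.choice(len(pts1), size=4, replace=False)
        H = compute_homography(pts1[sample], pts2[sample])
        proj = H @ pts1_h
        proj = (proj[:2] / proj[2]).T  # back to (x, y)
        inliers = np.linalg.norm(proj - pts2, axis=1) < eps
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return pts1[best_inliers], pts2[best_inliers]
```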
We can then use these to warp our images into the same perspective and into mosaics as we did in the previous part. As can be seen below, this worked about as well as my manual correspondences did, but it was much faster and less tedious.
Original Images:
Image Mosaic:
Original Images:
Image Mosaic:
Original Images:
Image Mosaic:
I found this project to be quite fun. It was really interesting to warp images into different perspectives and blend them into a single image. My favorite part was probably the rectification, as it worked really well and the results were interesting to see. I struggled a bit with my image mosaics: the images I had chosen involved too large a change in perspective, so the warps introduced large distortions, and those sharp perspective changes also kept the blending from being as smooth as I would have liked. I also had issues with image size, as my images were large enough to crash my notebook kernel (I eventually realized I should simply resize them, which is what I did after part 1 of the project). I also enjoyed trying out automatic feature matching, since manually matching point by point each time was quite tedious (although it did not work quite as well as manually chosen points). Unfortunately, many of the images I had chosen at the start did not have many regions conducive to the auto-matching algorithm I implemented, so I had to find some new images to use. Those images tended to turn out even better, though, as I took them with a better sense of what works well for this project.