Panorama Reconstruction with Homographies, Feature Matching, and RANSAC

Abstract

This work reconstructs mosaics by estimating projective transformations between overlapping images. The first half uses manually selected correspondences to compute homographies for rectification and panorama stitching. The second half automates correspondence discovery with feature detection, descriptor matching, and RANSAC.

The important technical thread is robust geometry: a panorama is only as good as the point correspondences used to estimate the warp, and a feature matcher is only useful if outliers are rejected before computing the final homography.

Homographies and Rectification

A homography is a $3 \times 3$ projective transform that maps points between two views of a plane. Using homogeneous coordinates:

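$$
\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} \sim H \begin{bmatrix} x \\ y \\ 1 \end{bmatrix},
\qquad
H = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & 1 \end{bmatrix}
$$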

Each correspondence contributes two linear equations after clearing the homogeneous denominator:

$$
x' = \frac{h_{11} x + h_{12} y + h_{13}}{h_{31} x + h_{32} y + 1},
\qquad
y' = \frac{h_{21} x + h_{22} y + h_{23}}{h_{31} x + h_{32} y + 1}
$$

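$$
\begin{bmatrix}
x & y & 1 & 0 & 0 & 0 & -x'x & -x'y \\
0 & 0 & 0 & x & y & 1 & -y'x & -y'y
\end{bmatrix} h
=
\begin{bmatrix} x' \\ y' \end{bmatrix}
$$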

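Stacking these equations for all correspondences gives a linear system in the eight unknown entries of H. Four correspondences (no three collinear) determine H exactly; with more, the system is overdetermined and is solved by least squares. For rectification, I selected the four corners of a planar object and mapped them to a rectangle, producing a front-facing view.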
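The stacked system above can be sketched in NumPy as follows. This is a minimal illustration rather than the project's actual code: the function names `compute_homography` and `apply_homography` are hypothetical, points are assumed to be given as (x, y) rows, and $h_{33}$ is fixed to 1 as in the equations above.

```python
import numpy as np

def compute_homography(src, dst):
    """Least-squares H (3x3, h33 = 1) mapping src -> dst from N >= 4 point pairs."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = src.shape[0]
    A = np.zeros((2 * n, 8))
    b = np.zeros(2 * n)
    for i, ((x, y), (xp, yp)) in enumerate(zip(src, dst)):
        # Two rows per correspondence, matching the 2x8 block in the text.
        A[2 * i] = [x, y, 1, 0, 0, 0, -xp * x, -xp * y]
        A[2 * i + 1] = [0, 0, 0, x, y, 1, -yp * x, -yp * y]
        b[2 * i] = xp
        b[2 * i + 1] = yp
    h, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.append(h, 1.0).reshape(3, 3)

def apply_homography(H, pts):
    """Map (N, 2) points through H and divide by the homogeneous coordinate."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]
```

For rectification, `src` would hold the four clicked corners and `dst` the corners of the target rectangle; with exactly four pairs the least-squares solve reduces to solving the 8x8 system.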

Manual Mosaics

For the manually stitched mosaics, I selected overlapping points in each image pair, computed a homography, inverse-warped one image into the other image's coordinate system, estimated an output canvas from the warped corner positions, and blended the images on that canvas.

The result depends on both geometric accuracy and blending. Even with a good homography, the output can show black regions or visible transitions if the camera rotation and canvas bounds produce uncovered areas.

Feature Detection and ANMS

The automatic pipeline begins with Harris corner detection, which identifies points where image intensity changes strongly in multiple directions. The Harris response is based on the second-moment matrix M:

$$
R = \det(M) - k \, \operatorname{trace}(M)^2
$$
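As a rough illustration of this response, the sketch below builds the second-moment matrix from image gradients and evaluates $R$ at every pixel. It makes several assumptions that are not taken from the write-up: a box window (summed with an integral image) stands in for the more usual Gaussian window, `k=0.05` and `radius=2` are placeholder values, and the names `box_sum` and `harris_response` are hypothetical.

```python
import numpy as np

def box_sum(a, radius):
    """Sum of `a` over a (2r+1)x(2r+1) window, zero-padded at the borders."""
    k = 2 * radius + 1
    padded = np.pad(a, radius, mode="constant")
    # Integral image with a leading row and column of zeros.
    integral = np.pad(padded, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    return (integral[k:, k:] - integral[:-k, k:]
            - integral[k:, :-k] + integral[:-k, :-k])

def harris_response(img, k=0.05, radius=2):
    """Per-pixel Harris response R = det(M) - k * trace(M)^2."""
    img = np.asarray(img, dtype=float)
    Iy, Ix = np.gradient(img)          # derivatives along rows (y) and columns (x)
    Sxx = box_sum(Ix * Ix, radius)     # entries of the second-moment matrix M
    Syy = box_sum(Iy * Iy, radius)
    Sxy = box_sum(Ix * Iy, radius)
    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    return det - k * trace ** 2
```

On a synthetic bright square, the response is positive at the corners, negative along straight edges, and zero in flat regions, which is the behavior the corner detector relies on.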

Raw Harris points cluster around textured regions, so I applied Adaptive Non-Maximal Suppression. ANMS keeps strong corners that are spatially spread out by assigning each point a suppression radius: the distance to the nearest point with significantly stronger response.

$$
r_i = \min_j \lVert x_i - x_j \rVert \quad \text{such that} \quad f(x_j) > c \, f(x_i)
$$
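The suppression radius can be computed directly from this definition. The sketch below is a brute-force version under stated assumptions: the name `anms`, the default `c=1.1`, and the use of a full pairwise distance matrix (O(N²) memory) are illustrative choices, and corner strengths are assumed to be positive.

```python
import numpy as np

def anms(points, strengths, num_keep, c=1.1):
    """Return indices of the num_keep points with the largest suppression radii."""
    points = np.asarray(points, dtype=float)
    strengths = np.asarray(strengths, dtype=float)
    # Pairwise Euclidean distances between all corner locations.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # stronger[i, j] is True when f(x_j) > c * f(x_i).
    stronger = strengths[None, :] > c * strengths[:, None]
    # r_i: distance to the nearest sufficiently stronger point (inf if none).
    radii = np.where(stronger, dist, np.inf).min(axis=1)
    order = np.argsort(-radii, kind="stable")
    return order[:num_keep], radii
```

The globally strongest corner has no stronger neighbor, so its radius is infinite and it is always kept; the remaining slots go to corners that are locally dominant over the widest area.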

Descriptors, Matching, and RANSAC

For each selected corner, I extracted a descriptor by sampling an $8 \times 8$ patch from a larger, blurred $40 \times 40$ region. The descriptor is normalized to reduce sensitivity to brightness and contrast. Matching uses descriptor distance with a nearest-neighbor ratio test to reject ambiguous matches: a descriptor $d_i$ is matched to its nearest neighbor $d_1$ only if $d_1$ is clearly closer than the second-nearest neighbor $d_2$:

$$
\frac{\lVert d_i - d_1 \rVert_2}{\lVert d_i - d_2 \rVert_2} < \tau
$$
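A minimal sketch of this ratio test, assuming descriptors are stored as rows of two arrays and that the second set contains at least two descriptors. The function name `match_ratio_test` and the default `tau=0.7` are hypothetical; the write-up does not state the threshold that was used.

```python
import numpy as np

def match_ratio_test(desc_a, desc_b, tau=0.7):
    """Return (M, 2) index pairs (i in desc_a, j in desc_b) passing the ratio test."""
    desc_a = np.asarray(desc_a, dtype=float)
    desc_b = np.asarray(desc_b, dtype=float)
    # Pairwise Euclidean distances between every descriptor in A and in B.
    diff = desc_a[:, None, :] - desc_b[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    order = np.argsort(dist, axis=1)
    rows = np.arange(len(desc_a))
    nearest = order[:, 0]
    second = order[:, 1]
    d1 = dist[rows, nearest]
    d2 = dist[rows, second]
    # Guard against division by zero when two candidates are identical to d_i.
    ratio = d1 / np.maximum(d2, 1e-12)
    keep = ratio < tau
    return np.column_stack([rows[keep], nearest[keep]])
```

A descriptor with two equally good candidates has a ratio near 1 and is dropped, while a descriptor with one clear best match has a ratio near 0 and is kept.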

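Even after the ratio test, outliers remain. RANSAC repeatedly samples a minimal set of four correspondences, estimates a candidate homography from it, projects all points, and counts the inliers whose reprojection error falls below a threshold. The final homography is refit from the inliers of the best candidate. The reprojection error of correspondence $i$, with $\pi$ denoting division by the homogeneous coordinate, is: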

$$
\varepsilon_i = \lVert x_i' - \pi(H x_i) \rVert_2
$$
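The loop described above can be sketched as follows. This is an illustrative version, not the project's code: the names `fit_homography`, `project`, and `ransac_homography`, the iteration count, the 2-pixel threshold, and the fixed random seed are all assumptions.

```python
import numpy as np

def fit_homography(src, dst):
    """Least-squares homography (h33 = 1) from N >= 4 correspondences."""
    x, y = src[:, 0], src[:, 1]
    xp, yp = dst[:, 0], dst[:, 1]
    zeros, ones = np.zeros_like(x), np.ones_like(x)
    rows_x = np.stack([x, y, ones, zeros, zeros, zeros, -xp * x, -xp * y], axis=1)
    rows_y = np.stack([zeros, zeros, zeros, x, y, ones, -yp * x, -yp * y], axis=1)
    A = np.vstack([rows_x, rows_y])
    b = np.concatenate([xp, yp])
    h, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.append(h, 1.0).reshape(3, 3)

def project(H, pts):
    """Apply H to (N, 2) points and divide by the homogeneous coordinate."""
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    with np.errstate(divide="ignore", invalid="ignore"):
        return homog[:, :2] / homog[:, 2:3]

def ransac_homography(src, dst, num_iters=500, threshold=2.0, seed=0):
    """Return (H, inlier_mask) where H is refit on the largest inlier set found."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(num_iters):
        sample = rng.choice(len(src), size=4, replace=False)  # minimal set
        H = fit_homography(src[sample], dst[sample])
        with np.errstate(invalid="ignore"):
            err = np.linalg.norm(dst - project(H, src), axis=1)
            inliers = err < threshold
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 4:
        raise ValueError("RANSAC found fewer than four inliers")
    return fit_homography(src[best], dst[best]), best
```

Degenerate samples (for example, nearly collinear points) simply produce a poor candidate with few inliers, so they lose out to better samples without special handling.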

Autostitched Results and Three-View Experiment

Using the RANSAC-filtered correspondences, I computed final automatic mosaics. The VLSB and skyline outputs are the strongest. The street and sunset experiments show that feature matches can be plausible while blending or warping still leaves visible artifacts.

Additional Implementation Notes

For rectification and mosaicing, I used inverse warping because it avoids holes in the output image. The output canvas is first defined in the target coordinate system. For each output pixel, the inverse homography maps the coordinate back into the source image, where the pixel value can be sampled. This is more stable than pushing source pixels forward because a forward warp can skip target pixels when the mapping expands or shears the image.
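A minimal sketch of inverse warping as described here, assuming `H` maps source (x, y) coordinates to output coordinates and that the source image is at least 2x2 pixels. The function name `inverse_warp`, the bilinear sampling, and the returned validity mask are illustrative choices; the write-up does not specify the interpolation that was used.

```python
import numpy as np

def inverse_warp(src_img, H, out_shape):
    """Sample src_img at H^-1 of every output pixel. Returns (image, valid_mask)."""
    src_img = np.asarray(src_img, dtype=float)
    h_out, w_out = out_shape
    h_src, w_src = src_img.shape[:2]
    ys, xs = np.mgrid[0:h_out, 0:w_out]
    out_pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src_pts = np.linalg.inv(H) @ out_pts
    eps = 1e-9  # tolerance for round-off at the image border
    with np.errstate(divide="ignore", invalid="ignore"):
        sx = src_pts[0] / src_pts[2]
        sy = src_pts[1] / src_pts[2]
        valid = ((sx >= -eps) & (sx <= w_src - 1 + eps)
                 & (sy >= -eps) & (sy <= h_src - 1 + eps))
    # Replace out-of-bounds coordinates so indexing stays safe; they are zeroed later.
    sx = np.clip(np.where(valid, sx, 0.0), 0, w_src - 1)
    sy = np.clip(np.where(valid, sy, 0.0), 0, h_src - 1)
    x0 = np.clip(np.floor(sx).astype(int), 0, w_src - 2)
    y0 = np.clip(np.floor(sy).astype(int), 0, h_src - 2)
    fx = (sx - x0)[:, None]
    fy = (sy - y0)[:, None]
    img = src_img.reshape(h_src, w_src, -1)
    # Bilinear interpolation between the four neighboring source pixels.
    top = (1 - fx) * img[y0, x0] + fx * img[y0, x0 + 1]
    bottom = (1 - fx) * img[y0 + 1, x0] + fx * img[y0 + 1, x0 + 1]
    out = (1 - fy) * top + fy * bottom
    out[~valid] = 0.0
    out = out.reshape((h_out, w_out) + src_img.shape[2:])
    return out, valid.reshape(h_out, w_out)
```

Because every output pixel is visited exactly once, the result has no holes; pixels that map outside the source are reported through the mask, which is also convenient for blending.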

Canvas construction is a practical but important part of the mosaic pipeline. After applying the homography to the corners of the image being warped, I used the transformed corners to estimate the bounds of the output panorama. Because some transformed coordinates can be negative, the implementation needs a translation offset so all content lands inside the positive pixel grid. Many visual artifacts in mosaics come not from the homography itself, but from this bookkeeping around bounds, offsets, and overlapping regions.
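The bounds and offset bookkeeping can be sketched as below. The function name `canvas_from_corners` and the choice to include the reference image's own corners in the bounds are assumptions made for illustration; shapes are (height, width) and `H` maps the warped image into the reference frame.

```python
import numpy as np

def canvas_from_corners(H, warp_shape, ref_shape):
    """Return (canvas_shape, T): canvas size and the translation into canvas coordinates."""
    h_w, w_w = warp_shape
    h_r, w_r = ref_shape
    corners = np.array([[0, 0, 1], [w_w - 1, 0, 1],
                        [w_w - 1, h_w - 1, 1], [0, h_w - 1, 1]], dtype=float).T
    warped = H @ corners
    warped = warped[:2] / warped[2]          # back to inhomogeneous (x, y)
    ref_corners = np.array([[0, 0], [w_r - 1, 0],
                            [w_r - 1, h_r - 1], [0, h_r - 1]], dtype=float).T
    all_pts = np.hstack([warped, ref_corners])
    x_min, y_min = np.floor(all_pts.min(axis=1)).astype(int)
    x_max, y_max = np.ceil(all_pts.max(axis=1)).astype(int)
    # Translation that shifts the most negative coordinate to pixel 0.
    T = np.array([[1, 0, -x_min], [0, 1, -y_min], [0, 0, 1]], dtype=float)
    canvas_shape = (int(y_max - y_min + 1), int(x_max - x_min + 1))
    return canvas_shape, T
```

The warped image is then rendered with `T @ H` instead of `H`, and the reference image is pasted at the offset `(-x_min, -y_min)`, so both land on the same positive pixel grid.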

The automatic feature pipeline follows the Brown, Szeliski, and Winder style of local patch matching. Harris detection supplies many candidate corners, ANMS selects a spatially distributed subset, and normalized patch descriptors make matching less sensitive to local brightness changes. The descriptor is deliberately low dimensional: the goal is not to describe every texture detail, but to create enough distinctiveness that corresponding corners are close in descriptor space.
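A rough sketch of such a descriptor, under assumptions that are not taken from the write-up: averaging each 5x5 block of the 40x40 window stands in for the blur-then-subsample step, normalization is zero mean and unit standard deviation, corners too close to the border are skipped, and the name `extract_descriptor` is hypothetical.

```python
import numpy as np

def extract_descriptor(img, x, y, window=40, size=8):
    """Return a normalized size*size descriptor for the corner at (x, y), or None."""
    img = np.asarray(img, dtype=float)
    half = window // 2
    # Skip corners whose window would fall outside the image.
    if (x - half < 0 or y - half < 0
            or x + half > img.shape[1] or y + half > img.shape[0]):
        return None
    patch = img[y - half:y + half, x - half:x + half]
    step = window // size
    # Block averaging: a box blur followed by subsampling to size x size.
    small = patch.reshape(size, step, size, step).mean(axis=(1, 3))
    vec = small.ravel()
    std = vec.std()
    if std < 1e-8:
        return None  # flat patch carries no usable signal
    # Bias/gain normalization: invariant to affine brightness changes.
    return (vec - vec.mean()) / std
```

With this normalization, replacing the image by `a * img + b` for any `a > 0` leaves the descriptor unchanged, which is the brightness and contrast insensitivity described above.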

The ratio test is useful because absolute descriptor distance alone can be misleading. A point in a repetitive texture may have several plausible matches, all with similar distances. The ratio between the nearest and second-nearest neighbor measures ambiguity. If the nearest match is only slightly better than the second-nearest, the correspondence is unreliable and should be rejected before RANSAC.

RANSAC then turns a noisy correspondence set into a robust geometric estimate. Each iteration samples a small set of matches, computes a homography, and asks how many other matches agree with that model. Outliers may be numerous, but they are unlikely to agree on the same projective transform. Once a large inlier set is found, refitting the homography on all inliers improves stability relative to using only the minimal sample.

The automatic panoramas reveal that feature matching and image blending are separate problems. RANSAC can produce a sensible set of inliers and still leave visible seams, exposure differences, or black regions if the projection surface and blending strategy are too simple. That is why a production panorama system usually adds cylindrical or spherical projection, gain compensation, seam finding, and multi-band blending after the geometric alignment stage.

The three-view sunset experiment also forced a more global way of thinking about stitching. With two images, there is a single homography and one overlap region. With three images, the middle view becomes a natural reference frame, and both outer images must be warped into that shared coordinate system. Any small error in either side becomes visible when the panorama is assembled, so multi-image stitching benefits from reasoning about a graph of pairwise relationships rather than treating every pair independently.
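One way to express the graph view is to compose pairwise homographies along a path to the reference image. The sketch below is illustrative only: the name `to_reference` and the dictionary layout (keys `(a, b)` holding the homography from image `a` to image `b`) are assumptions, and it presumes the composed matrix has a nonzero bottom-right entry.

```python
import numpy as np

def to_reference(pairwise, path):
    """Compose homographies along `path`, a list of image indices ending at the reference."""
    H = np.eye(3)
    for a, b in zip(path[:-1], path[1:]):
        # Left-multiply so the earliest hop is applied to a point first.
        H = pairwise[(a, b)] @ H
    return H / H[2, 2]  # rescale so the bottom-right entry is 1
```

With three views and the middle image as reference, each outer image needs only a one-hop path; longer chains arise when a panorama has more images, and errors accumulate along the path, which is why small misalignments on either side become visible.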

Technical Takeaways and Future Work

RANSAC was the central robustness tool: it allowed a good homography to be estimated from a noisy set of matches. The hardest part remained warping and blending; several mosaics shared similar black-space artifacts, suggesting that canvas placement and projection choice could be improved.

Future work would add cylindrical warping for wide panoramas, multi-band blending, exposure compensation, and a cleaner three-image stitching graph rather than stitching pairs in a fixed order.