Computer Science 5722 - Computer Vision
Assignment 3: Visual Odometry:
Motion from correspondences
Due : in class March 18 20 April 1
Visual Odometry We can estimate the trajectory of a moving
camera based on the image motion arising from it. In this assignment
you will extend your implementation of a simplified version of the
work of Konolige et
al. , to estimate the trajectory of a robot in an outdoor
environment. Although there are numerous approaches to VO, this
assignment will use monocular interest point feature tracking,
estimation of pose using RANSAC on the
8 point algorithm, and incremental bundle adjustment.
CenSurE Features. In the previous assignment you
implemented Konolige's CenSurE features. In this next step we use the
features and their frame to frame correspondences to estimate the
world motion of the camera.
Consensus correspondence. You will use RANSAC with the 8
point algorithm for estimating the Essential matrix
E which describes the epipolar geometry and therefore the
motion of the robot for a pair of frames (viewpoints) in the sequence.
The 8 point algorithm works as follows:
- We know that for normalized coordinates:
- We obtain normalized coordinates y by transforming pixel
correspondences p=[u,v,1] back to the image plane using calibrated
camera intrinsics focal length f (in pixels) and image center C.
- Hartley's paper (above) recommends further normalizing the points
y=[yx,yy,1] for each image, by translating their centroid to
the origin then scaling them so their average distance to the origin
is
. This gives 2 transforms
T1 and T2, which must be undone once the E matrix is estimated by T2'*E_est*T1.
- Use these normalized y to construct a linear system of equations
eg for each corresponding pair y1=[y1x,y1y,y1z] and y2=[y2x,y2y,y2z]
we have equation:
- We want to solve for the elements e of E, so we
construct a matrix A from our correspondences. To
estimate E we compute the singular value decomposition (SVD)
, the rightmost column of U (or V) is
reshaped to form our estimate of E.
- To enforce that E is rank 2 and has 2 equal nonzero
singular values we can compute [u,s,v] = SVD(E) and substitute
to estimate
- We know that
, so we can
extract the motion [R t] between frames.
- Let
- For [u,s,v] = SVD(E) , choose t to be the 3rd
column of u, Ra=uDv' and Rb=uD'v'. Any combination of R and t will
satisfy the epipolar constraint, but will not necessarily be
physically valid.
- Assuming that the first camera matrix is [I|0] and that t is unit
length, there are 4 possible solutions for the seconds camera
P1=[Ra|t], P2=[Ra|-t], P3=[Rb|t] and P4=[Rb|-t]. These represent 4 possible solutions for reconstruction from E:
- We must determine the actual configuration. To do this reconstruct
one point (corresponding pair) using ([I|0],P1) as the camera pair. A
linear method for triangulating a point given a pair of camera
matrices and the corresponding (normalized) image points is to form the system of equations:
- Solve for the homogeneous coordinates X=[x,y,z,w] using AX=0, using SVD.
- Test cheirality using: c1=zw, c2=dw, for [a,b,d] = P1X. If
c1<0 the point is behind cam1, and if c2<0 the point is behind
cam2. If c1>0 and c2>0 then P1 is the correct camera matrix. If c1<0 and c2<0 then P2 is selected. If c1c2<0 then use

to compute [p,q,r,n]=HX, and test zn>0, and if so select P3, otherwise P4.
RANSAC : basically selects models by randomly sampling a
small number of points and estimating parameters. The model which best
fits (most inliers) the observed data is chosen. In our case:
- Repeat for N samples:
- Randomly select 9 corresponding pairs.
- Fit the Essential matrix using the least squares technique above.
- Reconstruct all the correspondences and compute a distance metric
where d is the Euclidean distance in the image between the original correspondence and the projection of the reconstructed point.
- Count the number of inliers with distance less than some threshold t.
- Choose the E with the greatest number of inliers, use all the inliers to reestimate E, and the relative motion [R t] between the frames (viewpoints).
Download the zip file available here.
Create a MATLAB program (a .m file containing MATLAB commands) which
does the following:
- For each sequential pair of frames i and i+1 in the image
sequence compute a list of correspondences.
- Use the RANSAC approach to find Essential matrix E which best fits the observed correspondences.
- Record the associated motion for each pair of frames.
- Concatenate the motions to plot the motion of the robot.
Submit your completed program, .mat file and a brief explanation
of the threshold and design choices you made to Wei Xu according to
his instructions , by 11am (class
time) on the due date.