12. Bundle adjustment

12.1. Overview

Satellite position and orientation errors have a direct effect on the accuracy of digital elevation models produced by the Stereo Pipeline. If they are not corrected, these uncertainties will result in systematic errors in the overall position and slope of the DEM. Severe distortions can occur as well, resulting in twisted or “taco-shaped” DEMs, though in most cases these effects are quite subtle and hard to detect. In the worst case, such as with old mission data like Voyager or Apollo, these gross camera misalignments can inhibit Stereo Pipeline’s internal interest point matcher and block auto search range detection.


Fig. 12.1 Bundle adjustment is illustrated here using a color-mapped, hill-shaded DEM mosaic from Apollo 15, Orbit 33, images. (a) Prior to bundle adjustment, large discontinuities can exist between overlapping DEMs made from different images. (b) After bundle adjustment, DEM alignment errors are minimized and no longer visible.

Errors in camera position and orientation can be corrected using a process called bundle adjustment. Bundle adjustment is the process of simultaneously adjusting the properties of many cameras and the 3D locations of the objects they see in order to minimize the error between the estimated, back-projected pixel locations of the 3D objects and their actual measured locations in the captured images. This error is called the reprojection error.
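To make the quantity being minimized concrete, here is a minimal sketch of the reprojection error for a single observation. It assumes an idealized pinhole camera with no rotation and no lens distortion; the function names and all numeric values are illustrative assumptions and are not part of the Stereo Pipeline.

```python
import math

def project(point_cam, focal_length, center):
    """Project a 3D point given in camera coordinates (Z forward) to pixels."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point must be in front of the camera")
    u = focal_length * x / z + center[0]
    v = focal_length * y / z + center[1]
    return (u, v)

def reprojection_error(point_cam, measured_pixel, focal_length, center):
    """Pixel distance between the projected point and its measured location."""
    u, v = project(point_cam, focal_length, center)
    return math.hypot(u - measured_pixel[0], v - measured_pixel[1])

# Hypothetical values: focal length and optical center are in pixels.
err = reprojection_error(point_cam=(1.0, 2.0, 100.0),
                         measured_pixel=(513.0, 534.0),
                         focal_length=1000.0, center=(500.0, 510.0))
print(f"reprojection error: {err:.2f} pixels")
```

Bundle adjustment sums such errors (usually squared and passed through a robust cost) over all observations of all features, and varies the camera parameters and the 3D points to reduce the total.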

This complex process can be boiled down to this simple idea: bundle adjustment ensures that the observations in multiple images of a single ground feature are self-consistent. If they are not consistent, then the position and orientation of the cameras as well as the 3D position of the feature must be adjusted until they are. This optimization is carried out along with thousands (or more) of similar constraints involving many different features observed in other images. Bundle adjustment is very powerful and versatile: it can operate on just two overlapping images, or on thousands. It is also a dangerous tool. Careful consideration is required to ensure and verify that the solution does represent reality.

Bundle adjustment can also take advantage of ground control points (GCPs), which are 3D locations of features that are known a priori (often by measuring them by hand in another existing DEM). GCPs can improve the internal consistency of your DEM or align your DEM to an existing data product. Finally, even though bundle adjustment calculates the locations of the 3D objects it views, only the final properties of the cameras are recorded for use by the Ames Stereo Pipeline. Those properties can be loaded into the parallel_stereo program, which uses its own method for triangulating 3D feature locations.

When using the Stereo Pipeline, bundle adjustment is an optional step between the capture of images and the creation of DEMs. The bundle adjustment process described below should be completed prior to running the parallel_stereo command.

Although bundle adjustment is not a required step for generating DEMs, it is highly recommended for users who plan to create DEMs for scientific analysis and publication. Incorporating bundle adjustment into the stereo work flow not only results in DEMs that are more internally consistent, it is also the correct way to co-register your DEMs with other existing data sets and geodetic control networks.

A DEM obtained after bundle adjustment and stereo may need to be aligned to a known reference coordinate system. For that, use the pc_align tool (Section 16.49).

See the options --heights-from-dem and --reference-terrain further down for how to incorporate an external DEM in bundle adjustment. Note that these can only locally refine camera parameters; an initial alignment with pc_align is still necessary.

12.2. Bundle adjustment using ASP

Stereo Pipeline provides its own bundle adjustment tool, named bundle_adjust. Its usage is described in Section 16.5.

Here is an example of using this tool on a couple of Apollo 15 images, and its effect on decreasing the stereo triangulation error.


Fig. 12.2 Illustration of the triangulation error (intersection error) map (Section 11.4.1) for a pair of images before (left) and after (right) using Stereo Pipeline’s bundle_adjust. Red and black colors suggest higher error.

Running parallel_stereo without using bundle-adjusted camera models:

parallel_stereo AS15-M-1134.cub AS15-M-1135.cub run_noadjust/run

Performing bundle adjustment:

bundle_adjust AS15-M-1134.cub AS15-M-1135.cub -o run_ba/run

Here only camera orientations are refined. How to optimize the intrinsics (if applicable) is discussed further down (Section 12.2.1).

Running parallel_stereo while using the bundle-adjusted camera models:

parallel_stereo AS15-M-1134.cub AS15-M-1135.cub run_adjust/run \
  --bundle-adjust-prefix run_ba/run

A comparison of the two ways of doing stereo is shown in Fig. 12.2.

Bundle adjustment aims to make the cameras more self-consistent but offers no guarantees about their absolute positions (unless GCPs are used). In fact, the cameras can sometimes move a lot. The options --rotation-weight, --translation-weight, and --camera-weight can be used to constrain how much the cameras can move during bundle adjustment. Note that large values for these may impact the ability to make the cameras self-consistent.

ASP also offers the tool parallel_bundle_adjust which can create match files using multiple processes spread over multiple machines (Section 16.45). These can also be used later during stereo with the options --match-files-prefix and --clean-match-files-prefix.

12.2.1. Floating intrinsics and using a lidar or DEM ground truth

This section documents some advanced functionality. It is suggested that the reader study it carefully and invest a certain amount of time to fully take advantage of these concepts.

When the input cameras are of Pinhole type (Section 20.1), optical bar (Section 20.3), or CSM (Section 8.12), it is possible to optimize the intrinsic parameters (focal length, optical center, distortion, with a somewhat different list for optical bar cameras), in addition to the extrinsics.

It is also possible to take advantage of an existing terrain ground truth, such as a lidar file or a DEM, to correct imperfectly calibrated intrinsic parameters, which can result in greatly improved results, such as creating less distorted DEMs that agree much better with the ground truth.

See further down for how to optimize intrinsics with no constraints and for when ground constraints can be used (there exist options for sparse ground points and a DEM), and see Section 12.2.2 for how to optimize the intrinsics per sensor.

A first attempt at floating the intrinsics

We recommend first running bundle adjustment with the intrinsics fixed, to get the extrinsics mostly correct, as optimizing both at the same time may result in a non-convex problem that leads to a suboptimal local minimum. Then, we will jointly optimize (float) the intrinsics and extrinsics.

Note that when solving for intrinsics, bundle_adjust will by default optimize all intrinsic parameters and will share them across all cameras (which must be the same type). You can control this behavior with the --intrinsics-to-float and --intrinsics-to-share parameters.

Hence, the first invocation of camera optimization should be like:

bundle_adjust -t nadirpinhole --inline-adjustments      \
  left.tif right.tif left.tsai right.tsai -o run_ba/run

It is suggested that one run parallel_stereo with the obtained cameras:

parallel_stereo -t nadirpinhole --alignment-method epipolar      \
   left.tif right.tif run_ba/run-left.tsai run_ba/run-right.tsai \
   run_stereo/run

followed by DEM creation (Section 16.52):

point2dem --tr RESOLUTION --errorimage run_stereo/run-PC.tif

Then examine and plot the intersection error:

gdalinfo -stats run_stereo/run-IntersectionErr.tif
colormap run_stereo/run-IntersectionErr.tif
stereo_gui run_stereo/run-IntersectionErr_CMAP.tif

If desired, fancier stereo correlation algorithms can be used, such as MGM, as detailed in Section 6.1. For colormap (Section 16.14), --min and --max bounds can be specified if the automatic range is too large.

We also suggest inspecting the interest points (Section 16.64.9):

stereo_gui left.tif right.tif run_ba/run

and then viewing the interest points from the menu.

If the interest points are not well-distributed, this may result in large ray intersection errors where they are missing. If so, they can be re-created by deleting the existing ones and then modifying --ip-detect-method and --ip-per-tile. Or, one can take advantage of the just-completed stereo run and invoke stereo_tri with one of the two additional options:

--num-matches-from-disp-triplets 10000


--num-matches-from-disparity 10000

to create dense and uniformly distributed interest points with desired density (the latter creates a .match file that needs to be copied to the name bundle_adjust expects). See Section 17.5 for more details. See Section 12.3 for how to create sparse but still somewhat uniformly distributed interest points without making use of a stereo disparity.

The first of these options also ensures that if three images are present, and parallel_stereo is invoked on the first and second image, and then on the second and the third, followed by interest point generation, many interest points will be triplets, that is, the same feature will often be identified in all three images, which can be a very good constraint on bundle adjustment later. (To not generate the triangulated point cloud after this, add the option --compute-point-cloud-center-only.)

If the interest points are good and the mean intersection error is acceptable, but this error shows an odd nonlinear pattern, it may be necessary to optimize the intrinsics. We do so by using the cameras with the optimized extrinsics found earlier. This is just an early attempt; better approaches will be suggested below:

bundle_adjust -t nadirpinhole --inline-adjustments               \
  --solve-intrinsics --camera-weight 1                           \
  --max-pairwise-matches 20000                                   \
  left.tif right.tif run_ba/run-left.tsai run_ba/run-right.tsai  \
  -o run_ba_intr/run

It is important to note that only the non-zero intrinsics will be optimized, and the step size used in optimizing a certain intrinsic parameter is proportional to it. Hence, if an intrinsic is 0 and it is desired to optimize it, it should be set to a small non-zero value suggestive of its final estimated scale. If the algorithm fails to give a good solution, perhaps different initial values for the intrinsics should be tried. For example, one can try changing the sign of the initial distortion coefficients, or make their values much smaller.
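One parameterization that would produce the behavior described above is to optimize a multiplier applied to each initial intrinsic value rather than the value itself. The sketch below is a toy one-parameter illustration under that assumption; it is not bundle_adjust's code, and the function name, damping term, and target value are hypothetical.

```python
def fit_scaled_intrinsic(initial, target, iterations=20, damping=1e-12):
    """Fit value = initial * s to `target` by damped Gauss-Newton steps on s."""
    s = 1.0  # the multiplier starts at 1, so the value starts at `initial`
    for _ in range(iterations):
        residual = initial * s - target
        jacobian = initial  # d(residual)/ds; exactly zero when initial == 0
        s -= jacobian * residual / (jacobian * jacobian + damping)
    return initial * s

# A zero initial value cannot move, while a small non-zero seed can,
# even when the desired value has the opposite sign.
stuck = fit_scaled_intrinsic(initial=0.0, target=-2e-3)
moved = fit_scaled_intrinsic(initial=1e-3, target=-2e-3)
print(stuck, moved)
```

With such a parameterization the derivative with respect to the multiplier vanishes when the initial value is 0, which is why seeding a small non-zero value of roughly the right magnitude matters.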

Here we assumed all intrinsics are shared. See Section 12.2.2 for how to have several groups of intrinsics. See also the option --intrinsics-to-share.

Sometimes the camera weight may need to be decreased, even all the way to 0, if it appears that the solver is not aggressive enough, or it may need to be increased if perhaps it overfits. This will become less of a concern if there is some ground truth, as discussed later.

Next, one can run parallel_stereo as before, with the new cameras, and see if the obtained solution is more acceptable, that is, if the intersection error is smaller. A preliminary investigation can already be made right after bundle adjustment, by looking at the residual error files written before and after the optimization. These are pointmap .csv files saved in the output directory (Section 16.5.8).

If desired, these csv files can be converted to a DEM with point2dem, which can be invoked with:

--csv-format 1:lon,2:lat,4:height_above_datum

One can then look at their statistics, colorize them, and view them in stereo_gui (Section 16.64.6).
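For a quick look at such a residual file without gridding it, the statistics can also be computed directly. The sketch below assumes comment lines start with `#` and that the columns are longitude, latitude, height above datum, mean residual in pixels, and number of observations. The fourth column matches the --csv-format string above; the fifth is an assumption, and the sample rows are made up.

```python
import csv
import io
import statistics

# Hypothetical file contents; a real run would read the file from disk.
SAMPLE = """\
# lon, lat, height_above_datum, mean_residual, num_observations
3.10, 26.00, 1500.0, 0.20, 2
3.11, 26.01, 1510.0, 0.40, 3
3.12, 26.02, 1490.0, 1.50, 3
3.13, 26.03, 1505.0, 0.30, 2
"""

def summarize(text):
    """Return mean, median, and max residual, and the fraction of points
    observed in three or more images."""
    residuals, counts = [], []
    for row in csv.reader(io.StringIO(text)):
        if not row or row[0].lstrip().startswith("#"):
            continue  # skip blank lines and header comments
        residuals.append(float(row[3]))  # column 4: mean residual (pixels)
        counts.append(int(row[4]))       # column 5: observations (assumed)
    frac3 = sum(1 for c in counts if c >= 3) / len(counts)
    return (statistics.mean(residuals), statistics.median(residuals),
            max(residuals), frac3)

mean_res, median_res, max_res, frac3 = summarize(SAMPLE)
print(f"mean={mean_res:.2f} median={median_res:.2f} max={max_res:.2f} "
      f"seen 3+ times: {frac3:.0%}")
```

Comparing these numbers between the initial and final files gives a first indication of whether the optimization helped.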

This file also shows how often each feature is seen in the images, so, if three images are present, hopefully many features will be seen three times.

Using ground truth when floating the intrinsics

If a point cloud with ground truth, such as a lidar file or a DEM, exists (say, named lidar.csv), it can be used as part of bundle adjustment. For that, the stereo DEM obtained earlier first needs to be aligned to this ground truth, for example with:

pc_align --max-displacement VAL run_stereo/run-DEM.tif \
  lidar.csv -o run_align/run

(see the manual page of this tool in Section 16.49 for more details).

This alignment can then be applied to the cameras as well:

bundle_adjust -t nadirpinhole --inline-adjustments              \
  --initial-transform run_align/run-inverse-transform.txt       \
  left.tif right.tif run_ba/run-left.tsai run_ba/run-right.tsai \
  --apply-initial-transform-only -o run_align/run

Note that your lidar file may have some conventions as to what each column means, and then any tools that use this cloud must set --csv-format and perhaps also --datum and/or --csv-proj4.

If pc_align is called with the clouds in reverse order (the denser cloud should always be the first), when applying the transform to the cameras in bundle_adjust one should use transform.txt instead of inverse-transform.txt above.

See Section 16.49.14 for how to handle the case when input adjustments exist.

There are two ways of incorporating a ground constraint in bundle adjustment. The first one assumes that the ground truth is a DEM, and is very easy to use with a large number of images. See the section on using the heights from a reference DEM, further down, for more details. The second approach is in the upcoming section.

Sparse ground truth and using the disparity

Here we will discuss an approach that works when the ground truth can be sparse, and we make use of the stereo disparity. It requires more work to set up than the earlier one.

We will need to create a disparity from the left and right images that we will use during bundle adjustment. For that we will take the disparity obtained in stereo and remove any intermediate transforms stereo applied to the images and the disparity. This can be done as follows:

stereo_tri -t nadirpinhole --alignment-method epipolar \
  --unalign-disparity                                  \
  left.tif right.tif                                   \
  run_ba/run-left.tsai run_ba/run-right.tsai           \
  run_stereo/run

and then bundle adjustment can be invoked with this disparity and the lidar/DEM file. Note that we use the cameras obtained after alignment:

bundle_adjust -t nadirpinhole --inline-adjustments         \
  --solve-intrinsics --camera-weight 0                     \
  --max-disp-error 50                                      \
  --max-num-reference-points 1000000                       \
  --max-pairwise-matches 10000                             \
  --parameter-tolerance 1e-12                              \
  --reference-terrain lidar.csv                            \
  --reference-terrain-weight 5                             \
  --disparity-list run_stereo/run-unaligned-D.tif          \
  left.tif right.tif                                       \
  run_align/run-run-left.tsai run_align/run-run-right.tsai \
  -o run_ba_intr_lidar/run

Here we set the camera weight all the way to 0, since it is hoped that having a reference terrain is a sufficient constraint to prevent over-fitting.

This tool will write some residual files to the output directory, which may be studied to see if the error-to-lidar decreased. Each residual is defined as the distance, in pixels, between a terrain point projected into the left camera image and then transferred onto the right image via the unaligned disparity and its direct projection into the right camera.
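The residual definition above can be illustrated with a toy example. The sketch assumes two idealized pinhole cameras that differ only by a shift along the x axis, and it passes the disparity at the left pixel in directly instead of looking it up in a disparity image. All names and numbers are hypothetical.

```python
import math

def project(point, cam_x, focal_length, center):
    """Toy pinhole camera at (cam_x, 0, 0) looking along +Z, no rotation."""
    x, y, z = point
    return (focal_length * (x - cam_x) / z + center[0],
            focal_length * y / z + center[1])

def terrain_residual(point, disparity, left_cam_x, right_cam_x,
                     focal_length, center):
    """Distance in pixels between the disparity-transferred left pixel and
    the direct projection of the terrain point into the right camera."""
    lu, lv = project(point, left_cam_x, focal_length, center)
    transferred = (lu + disparity[0], lv + disparity[1])
    ru, rv = project(point, right_cam_x, focal_length, center)
    return math.hypot(transferred[0] - ru, transferred[1] - rv)

# Hypothetical setup: baseline of 1 unit, terrain point 100 units away.
point = (2.0, 1.0, 100.0)
center = (500.0, 500.0)
consistent = terrain_residual(point, (-10.0, 0.0), 0.0, 1.0, 1000.0, center)
off = terrain_residual(point, (-13.0, 4.0), 0.0, 1.0, 1000.0, center)
print(consistent, off)
```

When the cameras, the terrain, and the measured disparity agree, the residual is zero; a disparity that disagrees with the camera geometry by (-3, 4) pixels yields a residual of 5 pixels.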

If the initial errors in that file are large to start with, say more than 2-3 pixels, there is a chance something is wrong. Either the cameras are not well-aligned to each other or to the ground, or the intrinsics are off too much. In that case it is possible the errors are too large for this approach to reduce them effectively.

We strongly recommend that for this process one not rely on bundle adjustment to create interest points, but instead use the dense and uniformly distributed ones created with stereo, as suggested earlier.

The hope is that after these directions are followed, this will result in a smaller intersection error and a smaller error to the lidar/DEM ground truth (the latter can be evaluated by invoking geodiff --absolute on the ASP-created aligned DEM and the reference lidar/DEM file).

Here we assumed all intrinsics are shared. See Section 12.2.2 for how to have several groups of intrinsics. See also the option --intrinsics-to-share.

When the lidar file is large, in bundle adjustment one can use the flag --lon-lat-limit to read only a relevant portion of it. This can speed up setting up the problem but does not affect the optimization.

Sparse ground truth and multiple images

Everything mentioned earlier works with more than two images. In fact, having more images is highly desirable, and ideally the images overlap a lot. For example, one can create stereo pairs consisting of the first and second images, second and third, third and fourth, etc., and invoke the above logic for each pair, that is, run stereo, alignment to the ground truth, dense interest point generation, creation of unaligned disparities, and transformation of the cameras using the alignment transform matrix. Then, a directory can be made into which one copies the dense interest point files, and bundle adjustment with intrinsics optimization can be run jointly for all cameras. Hence, one should use a command as follows (the example here is for 4 images):

bundle_adjust -t nadirpinhole --inline-adjustments         \
  --solve-intrinsics  --camera-weight 0                    \
  img1.tif img2.tif img3.tif img4.tif                      \
  run_align_12/run-img1.tsai run_align_12/run-img2.tsai    \
  run_align_34/run-img3.tsai run_align_34/run-img4.tsai    \
  --reference-terrain lidar.csv                            \
  --disparity-list "$disp1 $disp2 $disp3"                  \
  --max-disp-error 50 --max-num-reference-points 1000000   \
  --overlap-limit 1 --parameter-tolerance 1e-12            \
  --reference-terrain-weight 5                             \
  -o run_ba_intr_lidar/run

In case it is desired to omit the disparity between one pair of images, for example, if they don’t overlap, instead of the needed unaligned disparity one can put the word none in this list.
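The value passed to --disparity-list can be assembled programmatically. The sketch below assumes one entry per consecutive image pair, consistent with the three entries used for four images above, and fills in the word none where no disparity is available. The function name and file paths are hypothetical.

```python
def disparity_list(num_images, disparity_files):
    """Build the value of --disparity-list for consecutive image pairs.

    disparity_files maps a 0-based pair (i, i + 1) to a disparity file;
    pairs without an entry get the word "none".
    """
    entries = []
    for i in range(num_images - 1):
        entries.append(disparity_files.get((i, i + 1), "none"))
    return " ".join(entries)

# Hypothetical paths; images 2 and 3 (0-based: 1 and 2) are assumed
# not to overlap, so no disparity exists for that pair.
files = {
    (0, 1): "run_stereo_12/run-unaligned-D.tif",
    (2, 3): "run_stereo_34/run-unaligned-D.tif",
}
value = disparity_list(4, files)
print(value)
```

The resulting string is meant to be passed in quotes, as a single argument, in the same way as "$disp1 $disp2 $disp3" in the command above.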

Notice that since this joint adjustment was initialized from several stereo pairs, the second camera picked above, for example, could have been either the second camera from the first pair, or the first camera from the second pair, so there was a choice to make. In Section 8.23 an example is shown where a preliminary bundle adjustment happens at the beginning, without using a reference terrain, then those cameras are jointly aligned to the reference terrain, and then one continues as done above, but this time one need not have dealt with individual stereo pairs.

The option --overlap-limit can be used to control which images should be tested for interest point matches, and a good value for it is, say, 1 if one plans to use the interest points generated by stereo, though a value of 2 may not hurt either. One may want to decrease --parameter-tolerance, for example, to 1e-12, and set a value for --max-disp-error, e.g., 50, to exclude unreasonable disparities (this last number may be something one should experiment with, and the results can be somewhat sensitive to it). A larger value of --reference-terrain-weight can improve the alignment of the cameras to the reference terrain.

Also note the earlier comment about sharing and floating the intrinsics individually.

Using the heights from a reference DEM

In some situations the DEM obtained with ASP is, after alignment, quite similar to the reference DEM, but the heights may be off. This can happen, for example, if the focal length is not accurately known. It is then possible after triangulating the interest point matches in bundle adjustment to replace their heights above datum with values obtained from the reference DEM, which are presumably more accurate. The triangulated points being optimized can then be constrained to not vary too much from these initial positions.

The option for this is --heights-from-dem dem.tif. An additional control is given, in the form of the option --heights-from-dem-weight. The larger its value is, the more constrained those points will be. This multiplies the difference between the triangulated points being optimized and their initial value on the DEM.

This weight value should be inversely proportional to the ground sample distance, as then it will convert the measurements from meters to pixels, which is consistent with the reprojection error term (error of projecting pixels into the camera). A less reliable DEM should result in a smaller weight being used.
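As a rough illustration of this scaling, the sketch below takes the weight to be a confidence factor divided by the ground sample distance. The proportionality constant is a judgment call, not a rule from the tool: the 10 m/pixel example in Section 12.2.2 uses a weight of 0.25, so 1/GSD should be treated only as a guide to the order of magnitude. The `dem_confidence` parameter is a hypothetical knob and not a bundle_adjust option.

```python
def heights_from_dem_weight(gsd_meters, dem_confidence=1.0):
    """Weight that converts a height difference in meters to pixel units.

    dem_confidence in (0, 1] is a hypothetical factor for down-weighting
    a less reliable DEM.
    """
    if gsd_meters <= 0:
        raise ValueError("ground sample distance must be positive")
    if not 0 < dem_confidence <= 1:
        raise ValueError("dem_confidence must be in (0, 1]")
    return dem_confidence / gsd_meters

# Hypothetical example: 10 m/pixel imagery, 3 m disagreement with the DEM.
weight = heights_from_dem_weight(10.0)
weighted_diff = weight * 3.0  # about 0.3, comparable to a pixel residual
print(weight, weighted_diff)
```

A 3 m height disagreement at 10 m/pixel thus contributes about 0.3 to the cost, on the same scale as a reprojection error of a fraction of a pixel.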

Then, the option --heights-from-dem-robust-threshold ensures that the weighted differences defined earlier when comparing to the DEM plateau at a certain level and do not dominate the problem. Below we set this to 0.1, which is smaller than the --robust-threshold value of 0.5 which is used to control the reprojection error. Some experimentation with this weight and threshold may be needed.

If a triangulated point does not fall on a valid DEM pixel, bundle adjustment falls back to the --tri-weight constraint, if this constraint is used, or otherwise the triangulated point is not constrained at all.

Here is an example, and note that, as in the earlier section, we assume that the cameras and the terrain are already aligned:

bundle_adjust -t nadirpinhole               \
  --inline-adjustments                      \
  --solve-intrinsics --camera-weight 0      \
  --max-pairwise-matches 20000              \
  --heights-from-dem dem.tif                \
  --heights-from-dem-weight 0.1             \
  --heights-from-dem-robust-threshold 0.1   \
  --parameter-tolerance 1e-12               \
  --remove-outliers-params "75.0 3.0 20 25" \
  left.tif right.tif                        \
  run_align/run-run-left.tsai               \
  run_align/run-run-right.tsai              \
  -o run_ba_hts_from_dem/run

Here we were rather generous with the parameters for removing outliers, as the input DEM may not be that accurate, and tying the solution too closely to it could otherwise cause some valid matches to be flagged as outliers.

It is suggested to use dense interest points as above (and adjust --max-pairwise-matches to not throw some of them out). We set --camera-weight 0, as hopefully the DEM constraint is enough to constrain the cameras.

It is important to note that here we assume that a simple height correction is enough. Hence this option is an approximation, and perhaps it should be used iteratively, and a subsequent pass of bundle adjustment should be done without it, or one should consider using a smaller weight above. This option can however be more effective than using --reference-terrain when there is a large uncertainty in camera intrinsics.

See two other large-scale examples of using this option, without floating the intrinsics, in the SkySat processing example (Section 8.23), using Pinhole cameras, and with linescan Lunar images with variable illumination (Section 13.9).

Here we assumed all intrinsics are shared. See Section 12.2.2 for how to have several groups of intrinsics. See also the option --intrinsics-to-share.

It is suggested to look at the documentation of all the options above and adjust them for your use case.

See Section 16.5 for the documentation of all options above, and Section 16.5.8 for the output reports being saved, which can help judge how well the optimization worked.

RPC lens distortion

If it is realized that the optimized intrinsics still do not make the ASP-generated DEMs agree very well with the ground truth, and some residual and systematic error can be seen either by comparing these two or in intersection error files, it may be convenient to convert the current camera models to ones with the distortion given by rational function coefficients (RPC) of a desired degree (Section 20.1). An RPC model can have a lot more coefficients to optimize, hence a better fit can be found. However, it is suggested to use low-degree polynomials as those are easy to fit, and go to higher degree only for refinement if needed.

An example showing how to convert a camera model to RPC is given in Section 16.15.

Working with map-projected images

If parallel_stereo was run with map-projected images, one can still extract dense interest point matches and the unaligned disparity from such a run, and these can be applied with the original unprojected images for the purpose of bundle adjustment (after being renamed appropriately). This may be convenient since, while bundle adjustment must always happen with the original images, parallel_stereo could be faster and more accurate when images are map-projected. It is suggested that the unaligned disparity and interest points obtained this way be examined carefully. In particular, the grid size used in mapprojection should be similar to the ground sample distance of the raw images for best results.

12.2.2. Refining the intrinsics per sensor

Given a set of sensors, with each acquiring several images, we will optimize the intrinsics per sensor. All images acquired with the same sensor will share the same intrinsics, and none will be shared across sensors.

We will work with Kaguya TC linescan cameras and the CSM camera model (Section 8.12). Pinhole cameras in .tsai format (Section 20.1) and Frame cameras in CSM format (Section 8.12.1) can be used as well.

See Section 12.2.1 for an introduction on how optimizing intrinsics works, and Section 8.14 for how to prepare and use Kaguya TC cameras.

Things to watch for

Optimizing the intrinsics can be tricky. One has to be careful to select a sufficiently large set of images that have a lot of overlap, similar illumination, and an overall good baseline between enough images (Section 8.1).

It is suggested to do a lot of inspections along the way. If things turn out to work poorly, it is often hard to understand at what step the process failed. Most of the time the fault lies with the data not satisfying the assumptions being made.

The process will fail if, for example, the data is not well-aligned before the refinement of intrinsics is started, if the illumination is so different that interest point matches cannot be found, or if something changed about a sensor and the same intrinsics don’t work for all images acquired with that sensor.

The cam_test tool (Section 16.9) can be used to check if the distortion model gets inverted correctly. The distortion model should also be expressive enough to model the distortion in the images.

Image selection

We chose a set of 10 Kaguya stereo pairs with a lot of overlap (20 images in total). The left image was acquired with the TC1 sensor, and the right one with TC2. These sensors have different intrinsics.

Some Kaguya images have different widths. These should not be mixed together.

Some images had very large differences in illumination (though not within the same stereo pair), in which case finding matching interest points can fail. Kaguya images are rather well-registered to start with, so the resulting small misalignment that could not be corrected by bundle adjustment was not a problem in solving for intrinsics, and pc_align (Section 16.49) was used later for individual alignment. This is not preferable, in general. It was tricky, however, to find many images with a lot of overlap, so this had to do.

A modification of the work flow for the case of images with very different illumination is described in a separate section.

Initial bundle adjustment with fixed intrinsics

Put the image and camera names in plain text files named images.txt and cameras.txt. These must be in one-to-one correspondence, and with one image or camera per line.

The TC1 images should be listed before the TC2 ones. Later we will use the same order when these are subdivided by sensor.
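The two lists can be written with a short script so that the one-to-one correspondence and the TC1-before-TC2 order hold by construction. In the sketch below the file names and the way each entry is labeled with its sensor are hypothetical; only the names images.txt and cameras.txt come from the text above.

```python
import tempfile
from pathlib import Path

def write_lists(entries, out_dir):
    """Write images.txt and cameras.txt, one entry per line, TC1 before TC2.

    entries is a list of (sensor, image, camera) tuples.
    """
    # Sorting on the sensor name is stable, so the original order is
    # kept within each sensor.
    ordered = sorted(entries, key=lambda e: e[0])
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "images.txt").write_text(
        "".join(f"{image}\n" for _, image, _ in ordered))
    (out_dir / "cameras.txt").write_text(
        "".join(f"{camera}\n" for _, _, camera in ordered))
    return ordered

# Hypothetical file names for two stereo pairs.
entries = [
    ("TC2", "tc2_a.cub", "tc2_a.json"),
    ("TC1", "tc1_a.cub", "tc1_a.json"),
    ("TC2", "tc2_b.cub", "tc2_b.json"),
    ("TC1", "tc1_b.cub", "tc1_b.json"),
]
with tempfile.TemporaryDirectory() as tmp:
    write_lists(entries, tmp)
    images = (Path(tmp) / "images.txt").read_text().splitlines()
    cameras = (Path(tmp) / "cameras.txt").read_text().splitlines()
print(images)
print(cameras)
```

Writing both files from the same list of tuples avoids the two lists drifting out of correspondence when entries are added or removed.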

Initial bundle adjustment is done with the intrinsics fixed.

parallel_bundle_adjust                      \
  --nodes-list nodes.txt                    \
  --image-list images.txt                   \
  --camera-list cameras.txt                 \
  --num-iterations 50                       \
  --tri-weight 0.2                          \
  --tri-robust-threshold 0.2                \
  --camera-weight 0                         \
  --auto-overlap-params 'dem.tif 15'        \
  --remove-outliers-params '75.0 3.0 20 20' \
  --ip-per-tile 2000                        \
  --matches-per-tile 2000                   \
  --max-pairwise-matches 20000              \
  -o ba/run

The option --auto-overlap-params is used with a prior DEM (such as gridded and filled with point2dem at low resolution based on LOLA RDR data). This is needed to estimate which image pairs overlap.

The option --remove-outliers-params is set so that only the worst outliers (with reprojection error of 20 pixels or more) are removed. That is because imperfect intrinsics may result in accurate interest points that have a somewhat large reprojection error. We want to keep such features in the corners to help refine the distortion parameters.
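To see why '75.0 3.0 20 20' removes only points with errors above 20 pixels, the sketch below assumes the four values are interpreted as pct, factor, err1, and err2, with a point removed when its reprojection error exceeds min(max(pct-th percentile * factor, err1), err2). Section 16.5 has the authoritative definition. The percentile rule and the sample errors are assumptions made for illustration.

```python
import math

def outlier_threshold(errors, pct, factor, err1, err2):
    """Return min(max(pct-th percentile * factor, err1), err2), in pixels."""
    ordered = sorted(errors)
    # Nearest-rank percentile; the tool's exact percentile rule may differ.
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    percentile = ordered[rank - 1]
    return min(max(percentile * factor, err1), err2)

# Hypothetical reprojection errors, in pixels.
errors = [0.2, 0.5, 0.8, 1.0, 1.5, 2.0, 6.0, 30.0]

# With err1 == err2 == 20 the threshold is 20 regardless of the data.
fixed = outlier_threshold(errors, 75.0, 3.0, 20.0, 20.0)
# A tighter, hypothetical setting lets the data drive the threshold.
adaptive = outlier_threshold(errors, 75.0, 3.0, 5.0, 8.0)
removed = [e for e in errors if e > fixed]
print(fixed, adaptive, removed)
```

Under this reading, setting err1 equal to err2 turns the rule into a fixed cutoff, so the 6-pixel point, which might be a valid match near an image edge, is kept.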

The option --ip-per-tile is set to a large value so that many interest points are generated, and then the best ones are kept. This can be way too large for big images. (Consider using instead --ip-per-image.)

Normally 50 iterations should be enough. Two passes will happen. After each pass outliers will be removed.

It is very strongly suggested to inspect the obtained clean match files (that is, without outliers) with stereo_gui, and the reprojection errors in the final pointmap.csv file (Section 16.5.8), using stereo_gui as well (Section 16.64.6). Insufficient or poorly distributed clean interest point matches will result in a poor solution.

The reprojection errors are plotted in Fig. 12.3.

Running stereo

We will use the optimized CSM cameras saved in the ba directory (Section 8.12.6). For each stereo pair, run:

parallel_stereo                    \
  --job-size-h 2500                \
  --job-size-w 2500                \
  --stereo-algorithm asp_mgm       \
  --subpixel-mode 9                \
  --nodes-list nodes.txt           \
  left.cub right.cub               \
  ba/run-left.adjusted_state.json  \
  ba/run-right.adjusted_state.json \
  stereo_left_right/run

Then we will create a DEM at the resolution of the input images, which in this case is 10 m/pixel. The local stereographic projection will be used.

point2dem --tr 10    \
  --errorimage       \
  --stereographic    \
  --proj-lon 93.7608 \
  --proj-lat 3.6282  \
  stereo_left_right/run-PC.tif

Normally it is suggested to rerun stereo with mapprojected images (Section 6.1.7) to get higher quality results. For the current goal, of optimizing the intrinsics, the produced terrain is good enough. See also Section 6 for a discussion of various stereo algorithms.

Inspect the produced DEMs and intersection error files (Section 16.52). The latter can be colorized (Section 16.64.5). Use gdalinfo -stats (Section 16.22) to see the statistics of the intersection error. In this case it turns out to be around 4 m, which, given the ground resolution of 10 m/pixel, is on the high side. The intersection errors are also higher at left and right image edges, due to distortion. (For a frame sensor this error will instead be larger in the corners.)

Evaluating agreement between the DEMs

Overlay the produced DEMs and check for any misalignment. This may happen if there are insufficient interest points or if the unmodelled distortion is large.

Create a blended average DEM from the produced DEMs using dem_mosaic (Section 16.19):

dem_mosaic stereo*/run-DEM.tif -o mosaic_ba.tif

It is useful to subtract each DEM from the mosaic using geodiff (Section 16.23):

geodiff mosaic_ba.tif stereo_left_right/run-DEM.tif \
  -o stereo_left_right/run

These differences can be colorized with stereo_gui using the --colorbar option (Section 16.64.5). The standard deviation of the obtained signed difference can be used as a measure of discrepancy. These errors should go down after refining the intrinsics.

Refining the intrinsics

We will use the camera files produced by bundle_adjust before, named ba/run-*.adjusted_state.json. These have the refined position and orientation. We will re-optimize those together with the distortion parameters (which in bundle_adjust go by the name other_intrinsics). It was found in this case that optimizing the focal length and optical center has no significant effect.

The images and (adjusted) cameras for individual sensors should be listed in separate files, but in the same overall order as before, to be able to reuse the match files. Then, the image lists will be passed to the --image-list option with a comma as separator (no spaces), and the same for the camera lists passed to --camera-list. The bundle adjustment command becomes:

bundle_adjust --solve-intrinsics                \
  --inline-adjustments                          \
  --intrinsics-to-float other_intrinsics        \
  --image-list tc1_images.txt,tc2_images.txt    \
  --camera-list tc1_cameras.txt,tc2_cameras.txt \
  --num-iterations 10                           \
  --clean-match-files-prefix ba/run             \
  --heights-from-dem mosaic_ba.tif              \
  --heights-from-dem-weight 0.25                \
  --heights-from-dem-robust-threshold 0.25      \
  --remove-outliers-params '75.0 3.0 20 20'     \
  --max-pairwise-matches 10000                  \
  -o ba_other_intrinsics/run
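The per-sensor lists passed to --image-list and --camera-list above can be assembled with a short script. The following is a minimal sketch, not part of Stereo Pipeline: the file names and the rule that the sensor is the prefix before the underscore are hypothetical, and it assumes the lists are read in the order given, so that concatenating them must reproduce the original image order.

```python
def split_by_sensor(names, sensor_of):
    """Split an ordered list into contiguous per-sensor lists.

    Raises ValueError if the entries of one sensor are not contiguous, since
    concatenating the per-sensor lists would then change the overall order.
    """
    groups = []  # list of (sensor, [names]) in order of first appearance
    for name in names:
        sensor = sensor_of(name)
        if groups and groups[-1][0] == sensor:
            groups[-1][1].append(name)
        elif any(s == sensor for s, _ in groups):
            raise ValueError(f"entries of sensor {sensor!r} are not contiguous")
        else:
            groups.append((sensor, [name]))
    return groups


def sensor_of(name):
    # Hypothetical naming rule: the sensor is the prefix before "_".
    return name.split("_")[0]


# Hypothetical image names, in the order used for the first bundle adjustment.
images = ["tc1_a.cub", "tc1_b.cub", "tc2_a.cub", "tc2_b.cub"]
groups = split_by_sensor(images, sensor_of)

# Comma-separated, with no spaces, as expected by --image-list.
image_list_arg = ",".join(f"{sensor}_images.txt" for sensor, _ in groups)

# Concatenating the per-sensor lists gives back the original order.
flattened = [name for _, names in groups for name in names]
print(image_list_arg)
```

The same function can be applied to the camera file names to build the argument for --camera-list.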

The values for --heights-from-dem-weight and --heights-from-dem-robust-threshold were chosen to be smaller than the value used for --robust-threshold, which is 0.5. That is because the DEM is not perfect, and we don’t want to overfit to it. The DEM-related weights can be increased to make the solution move more towards the DEM. See Section 16.5 for the documentation of all options above.
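To see why a smaller robust threshold limits how strongly large discrepancies pull on the solution, consider the sketch below. It assumes a Cauchy loss of the form used by the Ceres solver, rho(s) = b log(1 + s/b) with s the squared residual and b the squared threshold. Whether bundle_adjust applies exactly this form to the DEM constraint is an assumption made here for illustration.

```python
import math


def cauchy_cost(residual, threshold):
    """Cauchy loss of a residual: b * log(1 + s / b), s = residual^2, b = threshold^2."""
    b = threshold ** 2
    return b * math.log(1.0 + residual ** 2 / b)


# Compare the cost of small and large residuals for the two thresholds.
for r in (0.01, 1.0, 10.0):
    print(r, r ** 2, cauchy_cost(r, 0.5), cauchy_cost(r, 0.25))
```

For residuals well below the threshold the cost is close to the plain squared residual, while for large residuals it grows only logarithmically, and more slowly for the smaller threshold.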


Fig. 12.3 The reprojection errors (pointmap.csv) before (top) and after (bottom) refinement of distortion. Some outliers are still visible but are harmless. It can be seen that many red vertical patterns are now much attenuated (these correspond to individual image edges). On the right some systematic errors are seen (due to the search range in stereo chosen here being too small and some ridges having been missed). Those do not affect the optimization. Using mapprojected images would have helped with this. The ultimate check will be the comparison with LOLA RDR (Fig. 12.4). Plotted with stereo_gui (Section 16.64.6).

Recreation of the stereo DEMs

The new cameras can be used to redo stereo and the DEMs. It is suggested to use the option --prev-run-prefix in parallel_stereo to redo only the triangulation operation, which greatly speeds up processing (see Section 8.27.11).

As before, it is suggested to examine the intersection error and the difference between each produced DEM and the corresponding combined averaged DEM. These errors drop by a factor of about 2 and 1.5, respectively.

Comparing to an external ground truth

We solved for intrinsics by constraining against the averaged mosaicked DEM of the stereo pairs produced with initial intrinsics. This works reasonably well if the error due to distortion is somewhat small and the stereo pairs overlap enough that this error gets averaged out in the mosaic.

Ideally, a known accurate external DEM should be used. For example, one could create DEMs using LRO NAC data. Note that many such DEMs would need to be combined, because LRO NAC has a much smaller footprint.

Should such a DEM exist, before using it instead of the averaged mosaic, the mosaic (or individual stereo DEMs) should be first aligned to the external DEM. Then, the same alignment transform should be applied to the cameras (Section 16.49.14). Then the intrinsics optimization can happen as before.

We use the sparse LOLA RDR dataset for final validation. This works well enough because the ground footprint of Kaguya TC is rather large.

Each stereo DEM, before and after intrinsics refinement, is individually aligned to LOLA, and the signed difference to LOLA is found.

pc_align --max-displacement 50                   \
  --save-inv-transformed-reference-points        \
  dem.tif lola.csv                               \
  -o run_align/run

point2dem --tr 10    \
  --errorimage       \
  --stereographic    \
  --proj-lon 93.7608 \
  --proj-lat 3.6282  \
  run_align/run-trans_reference.tif

geodiff --csv-format 2:lon,3:lat,4:radius_km     \
  run_align/run-trans_reference-DEM.tif lola.csv \
  -o run_align/run

The pc_align tool is quite sensitive to the value of --max-displacement (Section 16.49.2). Here it was chosen to be somewhat larger than the vertical difference between the two datasets to align. That is because Kaguya TC is already reasonably well-aligned.


Fig. 12.4 The signed difference between aligned stereo DEMs and LOLA RDR before (top) and after (bottom) refinement of distortion. (Blue = -20 meters, red = 20 meters.) It can be seen that the warping of the DEMs due to distortion is much reduced. Plotted with stereo_gui (Section 16.64.6).

Handling images with very different illumination

If each stereo pair has consistent illumination, but the illumination is very different between pairs, then the above approach may not work well, as tie points could be hard to find. It is suggested to do the initial bundle adjustment separately for each stereo pair, followed by alignment of the individual produced DEMs to a reference dataset.

Apply the alignment transform to the pairwise bundle-adjusted cameras as well, and use these cameras for the refinement of intrinsics, with the ground constraint being the mosaic of these aligned DEMs.

It is suggested to examine how each aligned DEM differs from the reference, and the same for their mosaic. The hope is that the mosaicking will average out the errors in the individual DEMs.

If many such stereo pairs are present, for the purpose of refinement of intrinsics it is suggested to pick just a handful of them, corresponding to the area where the mosaicked DEM differs least from the reference, that is, where the distortion artifacts are most likely to have been averaged out well.

12.3. Custom approaches to interest points

12.3.1. Uniformly distributed interest points

To attempt to create roughly uniformly distributed sparse interest points during bundle adjustment, use options along the lines of --ip-per-tile 1000 --matches-per-tile 500 --max-pairwise-matches 10000. Note that if the images are big, this will result in a very large number of potential matches, because a tile has the size of 1024 pixels. (See Section 16.5.10 for the reference documentation for these options.)
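The growth in the number of interest points with image size can be estimated with simple arithmetic. The sketch below uses the tile size of 1024 pixels stated above. The image dimensions are hypothetical, and counting partial tiles at the image border as full tiles is an assumption.

```python
import math

TILE_SIZE = 1024  # pixels, as stated in the text


def potential_interest_points(width, height, ip_per_tile):
    """Return (number of tiles, upper bound on interest points) for one image."""
    tiles = math.ceil(width / TILE_SIZE) * math.ceil(height / TILE_SIZE)
    return tiles, tiles * ip_per_tile


# Hypothetical image of 20000 x 40000 pixels with --ip-per-tile 1000.
tiles, ips = potential_interest_points(20000, 40000, 1000)
print(tiles, ips)
```

This is why --max-pairwise-matches is worth setting together with the per-tile options, since it caps how many matches are kept for each image pair.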

For creating dense interest point matches from stereo disparity, see the section on solving for intrinsics.

12.3.2. Limit extent of interest point matches

To limit the triangulated points produced from interest points to a certain area during bundle adjustment, two approaches are supported. One is the option --proj-win, coupled with --proj-str.

The other is using the --weight-image option (also supported by the jitter solver, Section 16.35). In locations where a given georeferenced weight image has non-positive or nodata values, triangulated points will be ignored. Otherwise each reprojection error will be multiplied by the weight closest geographically to the triangulated point.
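The rule just described can be written down as a small function. This is only an illustration of the stated behavior, not the tool's code. It assumes the weight nearest to the triangulated point has already been looked up, and the function and parameter names are hypothetical.

```python
def weighted_residual(residual, weight, nodata=None):
    """Return the weighted reprojection error, or None if the point is ignored.

    weight is the weight image value nearest to the triangulated point;
    nodata is the nodata value of the weight image, if it has one.
    """
    if weight is None or weight == nodata or weight <= 0:
        return None  # non-positive or nodata weight: the point is skipped
    return residual * weight


print(weighted_residual(2.0, 0.5))                    # point kept, error halved
print(weighted_residual(2.0, 0.0))                    # non-positive weight
print(weighted_residual(2.0, -32768, nodata=-32768))  # nodata weight
```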

Such a weight image can be created from a regular georeferenced image with positive pixel values as follows. Open it in stereo_gui, and draw on top of it one or more polygons, each being traversed in a counterclockwise direction, and with any holes oriented clockwise (Section 16.64.7). Save this shape as poly.shp, and then run:

cp georeferenced_image.tif aux_image.tif
gdal_rasterize -i -burn -32768 poly.shp aux_image.tif

This will keep the data inside the polygons and set the data outside to this value. The value to burn should be negative and smaller than any valid pixel value in the image. To keep the data outside the polygons, omit the -i option.

Then, create a mask of valid values using image_calc (Section 16.31), as follows:

image_calc -c "max(sign(var_0), 0)" \
 --output-nodata-value var_0        \
 aux_image.tif -o weight.tif

Examine the obtained image in stereo_gui and click on various pixels to inspect the values.
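The values to expect can be worked out by hand. The sketch below evaluates the expression max(sign(var_0), 0) on a hypothetical row of pixels in plain Python. It only mirrors the arithmetic of the expression and does not model the handling of nodata by image_calc.

```python
def sign(x):
    """Return 1 for positive, -1 for negative, and 0 for zero input."""
    return (x > 0) - (x < 0)


def mask_value(pixel):
    """Evaluate max(sign(pixel), 0): 1 for positive pixels, 0 otherwise."""
    return max(sign(pixel), 0)


# Hypothetical pixel row; -32768 is the value burned outside the polygons.
row = [120, 35, -32768, 0, 7]
mask = [mask_value(p) for p in row]
print(mask)
```

Pixels burned with the negative value map to 0, while all valid positive pixels map to 1.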

If the image does not have positive values to start with, those values can be first shifted up with image_calc.

Various such weight images can be merged with dem_mosaic (Section 16.19) or the values manipulated with image_calc.

12.4. Bundle adjustment using ISIS

In what follows we describe how to do bundle adjustment using ISIS’s tool-chain. It also serves to describe bundle adjustment in more detail, which is applicable to other bundle adjustment tools as well, including Stereo Pipeline’s own tool.

In bundle adjustment, the position and orientation of each camera station are determined jointly with the 3D positions of a set of image tie points chosen in the overlapping regions between images. Tie points, as suggested by the name, tie multiple camera images together. Their physical manifestation would be a rock or small crater that can be observed across more than one image.

Tie points are automatically extracted using ISIS’s autoseed and pointreg (alternatively one could use a number of outside methods such as the famous SURF [BETG08]). Creating a collection of tie points, called a control network, is a three-step process. First, a general geographic layout of the points must be decided upon. This is traditionally just a grid layout with a spacing that allows for about 20-30 measurements to be made per image. Each grid point shows up at a slightly different projected location in each image because of the slight camera misalignments. The second step is to have an automatic registration algorithm try to find the same feature in all images using the prior grid as a starting location. The third step is to manually verify all measurements visually, checking to ensure that each measurement is looking at the same feature.

Fig. 12.5 A feature observation in bundle adjustment, from [MWLS09]

Bundle adjustment in ISIS is performed with the jigsaw executable. It generally follows the method described in [TMHF00] and determines the camera parameters that minimize the projection error given by \({\bf \epsilon} = \sum_k\sum_j(I_k-I(C_j, X_k))^2\) where \(I_k\) are the tie points on the image plane, \(C_j\) are the camera parameters, and \(X_k\) are the 3D positions associated with features \(I_k\). \(I(C_j, X_k)\) is an image formation model (i.e. forward projection) for a given camera and 3D point. To recap, it projects the 3D point, \(X_k\), into the camera with parameters \(C_j\). This produces a predicted image location for the 3D point that is compared against the observed location, \(I_k\). jigsaw then reduces this error with the Levenberg-Marquardt algorithm (LMA). Speed is improved by using sparse methods as described in [HZ04], [Kon10], and [CDHR08].
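The cost function can be made concrete with a toy example. The sketch below is not jigsaw's implementation. It assumes a simple pinhole camera without rotation, described only by its position and focal length, and it indexes each observation by both the point k and the camera j that saw it. All numbers are made up.

```python
def project(camera, point):
    """I(C_j, X_k): pinhole projection for a camera looking along +z."""
    cx, cy, cz, focal = camera
    x, y, z = point
    depth = z - cz
    return (focal * (x - cx) / depth, focal * (y - cy) / depth)


def reprojection_cost(cameras, points, observations):
    """Sum of squared pixel differences over all observations.

    observations maps (k, j) to the pixel (u, v) measured for point k
    in the image of camera j.
    """
    total = 0.0
    for (k, j), (u_obs, v_obs) in observations.items():
        u, v = project(cameras[j], points[k])
        total += (u_obs - u) ** 2 + (v_obs - v) ** 2
    return total


# Two cameras 10 units apart observing one ground point.
cameras = [(0.0, 0.0, 0.0, 100.0), (10.0, 0.0, 0.0, 100.0)]
points = [(5.0, 2.0, 50.0)]
observations = {(0, j): project(cameras[j], points[0]) for j in range(2)}

exact = reprojection_cost(cameras, points, observations)
moved = reprojection_cost(cameras, [(6.0, 2.0, 50.0)], observations)
print(exact, moved)
```

With the true point the cost is zero. Moving the point sideways by one unit shifts its predicted location by two pixels in each image, so the cost rises to 8 squared pixels.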

Even though the arithmetic for bundle adjustment sounds clever, there are faults with the base implementation. Imagine a case where all cameras and 3D points were collapsed into a single point. If you evaluate the above cost function, you’ll find that the error is indeed zero. This is not the correct solution if the images were taken from orbit. Another example is if a translation was applied equally to all 3D points and camera locations. This again would not affect the cost function. This fault comes from bundle adjustment’s inability to control the scale and translation of the solution. It will correct the geometric shape of the problem, yet it cannot guarantee that the solution will have correct scale and translation.
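The ambiguity in translation and scale can be checked numerically. The following sketch uses a hypothetical two-dimensional "flatland" pinhole camera (an assumption made to keep the example short) and shows that the predicted pixel locations, and hence the cost function, do not change when all cameras and points are shifted or scaled together.

```python
def project(camera, point, focal=100.0):
    """Flatland pinhole: camera at (cx, cz) looking along +z, point at (x, z)."""
    (cx, cz), (x, z) = camera, point
    return focal * (x - cx) / (z - cz)


def transform(p, scale=1.0, shift=(0.0, 0.0)):
    """Apply the same scale and shift to a camera center or a ground point."""
    return (scale * p[0] + shift[0], scale * p[1] + shift[1])


def pixels(cams, pts):
    """Predicted pixel location of every point in every camera."""
    return [project(c, p) for c in cams for p in pts]


cameras = [(0.0, 0.0), (8.0, 0.0)]
points = [(2.0, 32.0), (6.0, 64.0)]

original = pixels(cameras, points)
shifted = pixels([transform(c, shift=(16.0, -4.0)) for c in cameras],
                 [transform(p, shift=(16.0, -4.0)) for p in points])
scaled = pixels([transform(c, scale=0.5) for c in cameras],
                [transform(p, scale=0.5) for p in points])
print(original, shifted, scaled)
```

All three lists are identical, so the reprojection error alone cannot distinguish between these solutions.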

ISIS attempts to fix this problem by adding two additional cost functions to bundle adjustment. The first is \({\bf \epsilon} = \sum_j(C_j^{initial}-C_j)^2\). This constrains the camera parameters to stay relatively close to their initial values. Second, a small handful of 3D ground control points (GCPs) can be chosen by hand and added to the error metric as \({\bf \epsilon} = \sum_k(X_k^{gcp}-X_k)^2\) to constrain these points to known locations in the planetary coordinate frame. A physical example of a ground control point could be a lander whose location is well-known. GCPs could also be hand-picked points against a highly regarded, previously existing map such as the THEMIS Global Mosaic or the LRO-WAC Global Mosaic.
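The two extra terms are plain sums of squared differences. The sketch below evaluates them for made-up values. It leaves out any relative weighting between the terms, which a real solver would need, and all names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two parameter vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def camera_prior(initial_cameras, cameras):
    """Penalty for camera parameters drifting away from their initial values."""
    return sum(squared_distance(c0, c) for c0, c in zip(initial_cameras, cameras))


def gcp_term(gcps, points):
    """Penalty tying selected points to known positions.

    gcps maps a point index k to its known position X_k^gcp.
    """
    return sum(squared_distance(x_gcp, points[k]) for k, x_gcp in gcps.items())


initial_cameras = [(0.0, 0.0, 0.0)]
cameras = [(1.0, 2.0, 2.0)]
points = [(5.0, 5.0, 6.0), (0.0, 0.0, 0.0)]
gcps = {0: (5.0, 5.0, 5.0)}  # only point 0 has a known position

prior = camera_prior(initial_cameras, cameras)
anchor = gcp_term(gcps, points)
print(prior, anchor)
```

Unlike the reprojection error, both terms change when the whole solution is translated or scaled, which is what removes the ambiguity.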

As with other iterative optimization methods, there are several conditions that will cause bundle adjustment to terminate. When updates to parameters become insignificantly small or when the error, \({\bf \epsilon}\), becomes insignificantly small, then the algorithm has converged and the result is most likely as good as it will get. However, the algorithm will also terminate when the number of iterations becomes too large, in which case bundle adjustment may or may not have finished refining the parameters of the cameras.
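These three stopping conditions can be illustrated on a one-parameter problem. The sketch below uses a plain Gauss-Newton iteration as a stand-in for Levenberg-Marquardt. The tolerances and the toy residual are assumptions chosen for illustration and are not values used by jigsaw.

```python
def minimize(residual, derivative, x, max_iterations=50,
             update_tol=1e-12, error_tol=1e-12):
    """Gauss-Newton on one parameter; returns (x, reason for stopping)."""
    for _ in range(max_iterations):
        r = residual(x)
        if r * r < error_tol:
            return x, "error is small"
        step = -r / derivative(x)
        x += step
        if abs(step) < update_tol:
            return x, "update is small"
    return x, "iteration limit reached"


def residual(x):
    return x * x - 2.0  # zero at the square root of 2


def derivative(x):
    return 2.0 * x


root, reason = minimize(residual, derivative, x=1.0)
rough, early = minimize(residual, derivative, x=1.0, max_iterations=2)
print(root, reason)
print(rough, early)
```

The second call stops at the iteration limit with a visibly worse estimate, which is the situation in which the result should not be trusted without further checks.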

12.4.1. Tutorial: Processing Mars Orbital Camera images

This tutorial for ISIS’s bundle adjustment tools is taken from [Mor12a] and [Mor12b]. These tools are not a product of NASA nor the authors of Stereo Pipeline. They were created by USGS and their documentation is available at [Cen].

What follows is an example of bundle adjustment using two MOC images of Hrad Vallis. We use images E02/01461 and M01/00115, the same as used in Section 4.2. These images are available from NASA’s PDS. The ISIS mocproc program will operate on either the IMQ or IMG format files; we use the .imq files in the example below. For reference, the following ISIS commands convert the MOC images to ISIS cubes.

ISIS> mocproc from=e0201461.imq to=e0201461.cub mapping=no
ISIS> mocproc from=m0100115.imq to=m0100115.cub mapping=no

Note that the resulting images are not map-projected. Bundle adjustment requires the ability to project arbitrary 3D points into the camera frame. The process of map-projecting an image dissociates the camera model from the image. Map-projecting can be perceived as the generation of a new infinitely large camera sensor that may be parallel to the surface, a conic shape, or something more complex. That makes it extremely hard to project a random point into the camera’s original model. The math would follow the transformation from projection into the camera frame, then projection back down to the surface that ISIS uses, then finally up into the infinitely large sensor. Jigsaw does not support this and thus does not operate on map-projected images.

Before we can dive into creating our tie-point measurements we must finish prepping these images. The following commands will add a vector layer to the cube file that describes its outline on the globe. It will also create a data file that describes the overlapping sections between files.

ISIS> footprintinit from=e0201461.cub
ISIS> footprintinit from=m0100115.cub
ISIS> echo *cub |  xargs -n1 echo > cube.lis
ISIS> findimageoverlaps from=cube.lis overlaplist=overlap.lis

At this point, we are ready to start generating our measurements. This is a three-step process that requires defining a geographic pattern for the layout of the points on the ground, an automatic registration pass, and finally a manual cleanup of all measurements. Creating the ground pattern of measurements is performed with autoseed. It requires a settings file that defines the spacing in meters between measurements. For this example, write the following text into an autoseed.def file.

Group = PolygonSeederAlgorithm
      Name = Grid
      MinimumThickness = 0.01
      MinimumArea = 1
      XSpacing = 1000
      YSpacing = 2000
End_Group

The minimum thickness defines the minimum ratio between the sides of the region that can have points applied to it. A choice of 1 would define a square and anything less defines thinner and thinner rectangles. The minimum area argument defines the minimum square meters that must be in an overlap region. The last two are the spacing in meters between control points. Those values were specifically chosen for this pair so that about 30 measurements would be produced from autoseed. Having more control points just makes for more work later on in this process. Run autoseed with the following instruction.

Fig. 12.6 A visualization of the features laid out by autoseed in qnet. Note that the marks do not cover the same features between images. This is due to the poor initial SPICE data for MOC images.

ISIS> autoseed fromlist=cube.lis overlaplist=overlap.lis    \
          onet=control.net deffile=autoseed.def networkid=moc \
          pointid=???? description=hrad_vallis
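The effect of the spacing values on the number of seeded points can be estimated before running autoseed. The sketch below treats the overlap region as a rectangle, which is a simplification, since autoseed works on the actual overlap polygons. The overlap dimensions are hypothetical and are not measured from this image pair.

```python
def seeded_points(overlap_width_m, overlap_height_m, x_spacing, y_spacing,
                  min_thickness=0.01, min_area=1.0):
    """Estimate the number of grid points placed in a rectangular overlap."""
    short, long_side = sorted((overlap_width_m, overlap_height_m))
    too_thin = short / long_side < min_thickness
    too_small = overlap_width_m * overlap_height_m < min_area
    if too_thin or too_small:
        return 0  # the overlap region is rejected and gets no points
    return int(overlap_width_m // x_spacing) * int(overlap_height_m // y_spacing)


# Hypothetical overlap of 3 km by 20 km with the spacings from autoseed.def.
count = seeded_points(3000.0, 20000.0, x_spacing=1000, y_spacing=2000)

# A sliver of overlap, 100 m by 20 km, falls below the thickness ratio of 0.01.
sliver = seeded_points(100.0, 20000.0, x_spacing=1000, y_spacing=2000)
print(count, sliver)
```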

The next step is to perform auto registration of these features between the two images using pointreg. This program also requires a settings file that describes how to do the automatic search. Copy the text below into an autoRegTemplate.def file.

Object = AutoRegistration
  Group = Algorithm
    Name         = MaximumCorrelation
    Tolerance    = 0.7
  End_Group

  Group = PatternChip
    Samples = 21
    Lines   = 21
    MinimumZScore = 1.5
    ValidPercent = 80
  End_Group

  Group = SearchChip
    Samples = 75
    Lines   = 1000
  End_Group
End_Object

The search chip defines the search range within which pointreg will look for a match. The pattern chip is simply the kernel size of the matching template. The search range is specific to this image pair. The control network produced by autoseed had a large vertical offset, in the ballpark of 500 pixels. This large misalignment dictated the need for the large search in the lines direction. Use qnet to get an idea of what the pixel shifts look like in your stereo pair to help you decide on a search range. In this example, only one measurement failed to match automatically. Here are the arguments to use in this example of pointreg.

ISIS> pointreg fromlist=cube.lis cnet=control.net             \
           onet=control_pointreg.net deffile=autoRegTemplate.def
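The roles of the pattern chip, search chip, and tolerance can be illustrated with a one-dimensional template search. This is a simplified stand-in for what pointreg does, not its implementation. It assumes the match score is a normalized correlation coefficient, and the data are made up.

```python
import math


def correlation(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # flat window, no usable texture
    return cov / math.sqrt(var_a * var_b)


def register(pattern, search, tolerance=0.7):
    """Slide pattern over search; return (offset, score), or None if no match."""
    scores = [correlation(pattern, search[i:i + len(pattern)])
              for i in range(len(search) - len(pattern) + 1)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return (best, scores[best]) if scores[best] >= tolerance else None


pattern = [1, 5, 2]                # the "pattern chip"
search = [3, 3, 4, 12, 6, 3, 3]    # the "search chip"; the feature is at offset 2
match = register(pattern, search)
no_match = register(pattern, [3, 3, 3, 3, 3])  # featureless search area
print(match, no_match)
```

A larger search chip allows bigger offsets to be found at a higher computational cost, while the tolerance rejects candidates whose best score is too low, as happened for one measurement in this example.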

The third step is to manually edit the control network and verify the measurements in qnet. Type qnet in the terminal and then open cube.lis and lastly control_pointreg.net. From the Control Network Navigator window, click on the first point listed as 0001. That opens a third window called the Qnet Tool. That window will allow you to play a flip animation that shows the alignment of the feature between the two images. Correcting a measurement is performed by left-clicking in the right image, then clicking Save Measure, and finally clicking Save Point.

In this tutorial, measurement 0025 ended up being incorrect. Your number may vary if you used different settings than the above or if MOC SPICE data has improved since this writing. When finished, go back to the main Qnet window. Save the final control network as control_qnet.net by clicking on File, and then Save As.

Fig. 12.7 A visualization of the features after manual editing in qnet. Note that the marks now appear in the same location between images.

Once the control network is finished, it is finally time to start bundle adjustment. Here’s what the call to jigsaw looks like:

ISIS> jigsaw fromlist=cube.lis update=yes twist=no radius=yes \
           cnet=control_qnet.net onet=control_ba.net

The update option specifies that we would like to update the camera pointing if bundle adjustment converges. The twist=no option says not to solve for the camera rotation about the camera boresight. That property is usually very well known, as it is critical for integrating an image with a line-scan camera. The radius=yes option means that the radius of the 3D features can be solved for. Using radius=no will force the points to use height values from another source, usually LOLA or MOLA.

The above command will print a large amount of diagnostic information from every iteration of the optimization algorithm. The most important value to look at is sigma0. It represents the mean of pixel errors in the control network. In our run, the initial error was 1065 pixels and the final solution had an error of 1.1 pixels.

Producing a DEM using the newly created camera corrections is the same as covered in the Tutorial. jigsaw modifies a copy of the SPICE data that is stored internally in the cube file. Thus, when we want to create a DEM using the corrected camera geometry, no extra information needs to be given to parallel_stereo, since it is already contained in the file. In the event a mistake has been made, spiceinit will overwrite the SPICE data inside a cube file and restore the original uncorrected camera pointing. Hence, the stereo command does not change:

ISIS> parallel_stereo e0201461.cub m0100115.cub bundled/bundled