18. The stereo algorithms in ASP in detail
This chapter discusses ASP’s stereo algorithms in detail. For a brief summary and an illustration, see Section 6.1.1. For how to add new algorithms, see Section 18.8.
18.1. Block-matching
Block-matching is ASP’s oldest and default algorithm. It can be invoked with both stereo and parallel_stereo with the option:
--stereo-algorithm asp_bm
and works with any alignment method (Section 17.1.2).
For each pixel in the left image, the algorithm matches a small block
around this pixel with another similar block in the right image. The
block size is given by the value of --corr-kernel. The obtained correspondence is then refined based on the value of --subpixel-mode, using a block given by --subpixel-kernel. The option --corr-timeout can be used to ensure long-running block matching operations are stopped after a given time.
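As a sketch of how these options fit together, the following dry run assembles a block-matching command and prints it rather than executing it. The kernel sizes, subpixel mode, and timeout are illustrative values, not recommendations. The input file names and the output prefix run/run are placeholders, and the timeout is assumed here to be given in seconds.

```shell
# Illustrative values only; adjust them for your data.
corr_kernel=21       # block size for integer matching (width and height)
subpixel_mode=1      # refinement method applied after matching
subpixel_kernel=35   # block size used during refinement
corr_timeout=600     # assumed to be in seconds

# Placeholder inputs and output prefix.
cmd="parallel_stereo --stereo-algorithm asp_bm \
--corr-kernel $corr_kernel $corr_kernel \
--subpixel-mode $subpixel_mode \
--subpixel-kernel $subpixel_kernel $subpixel_kernel \
--corr-timeout $corr_timeout \
left.tif right.tif left.xml right.xml run/run"

# Dry run: print the command instead of executing it.
echo "$cmd"
```

Removing the echo and running the assembled command directly would start the actual processing.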
The related block-matching algorithm in OpenCV that ASP can invoke is discussed in Section 18.7.
18.2. Semi-Global Matching and More Global Matching algorithms
ASP implements the popular Semi-Global Matching (SGM) algorithm
introduced in [Hirschmuller08], and the More Global
Matching (MGM) algorithm [FDFM15], which is a
modification of SGM, and usually produces higher quality
results. These should be invoked with parallel_stereo, with the option --stereo-algorithm being passed the value asp_sgm and asp_mgm, respectively.
It is suggested to use these algorithms with --alignment-method local_epipolar. In that case a piecewise alignment between the left and right images is computed, which makes the disparity 1D and faster to find (Section 17.1.2).
However, the versions of SGM and MGM implemented by ASP can
perform a full 2D disparity search, similar to what is done in the
NG-fSGM algorithm [XLB+16]. Since ASP
processes a wide variety of cameras with varying degrees of metadata
quality, the standard assumption with SGM that the disparity search
can be performed only along a one-dimensional epipolar line does not
hold when the alignment method is not local_epipolar or for map-projected images.
The other major change is that ASP’s implementation uses a multi-resolution hierarchical search combined with a compressed memory scheme similar to what is used in the SGM algorithm [RWFH12].
The MGM algorithm reduces the amount of high frequency artifacts in
textureless regions at the cost of a longer run time. ASP also offers the option of a hybrid SGM/MGM mode (--stereo-algorithm asp_final_mgm) where MGM is used only for the final resolution level, which obtains results somewhere between the pure SGM and MGM options.
The greatest advantage of the SGM algorithm over the ASP block-matching algorithm is an improved ability to find disparity matches in areas of repetitive or low texture. SGM can also discern finer resolution features than the standard correlation algorithm since it tends to use much smaller matching kernels. Along with these advantages come several disadvantages. First, SGM is computationally expensive and requires a lot of memory. Second, in some situations it can produce noticeable artifacts at tile boundaries. Third, it can sometimes produce inaccurate results in textureless regions. With careful parameter selection and usage these disadvantages can be mitigated.
MGM is currently limited to using 8 simultaneous threads but SGM does not have a limit.
It is suggested to use these algorithms with default options. If desired, customizations can be done as follows.
- Set the --processes option keeping in mind memory constraints as discussed earlier. Each process will run one simultaneous SGM instance and consume memory (Section 16.52).
- The --corr-memory-limit-mb parameter limits the number of megabytes of memory that can be used by SGM/MGM. This limit is per-process. To be safe, make sure that you have more RAM available than the value of this parameter multiplied by the number of processes.
- See Section 16.52.3 regarding tiling and padding.
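The memory rule of thumb above amounts to a single multiplication. The sketch below checks it for a hypothetical machine. All three numbers are made-up examples, and the variable names are for illustration only.

```shell
# Hypothetical machine and settings; replace with your own numbers.
processes=4                # value intended for --processes
corr_memory_limit_mb=5000  # value intended for --corr-memory-limit-mb
available_ram_mb=32000     # RAM you can dedicate to the run

# Worst-case correlation memory across all simultaneous processes.
required_mb=$((processes * corr_memory_limit_mb))

if [ "$required_mb" -lt "$available_ram_mb" ]; then
    budget_ok=yes
else
    budget_ok=no
fi
echo "required: ${required_mb} MB, available: ${available_ram_mb} MB, ok: ${budget_ok}"
```

If the check fails, lowering either the number of processes or the per-process limit brings the total back under the available RAM.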
Each process spawned by parallel_stereo can use multiple threads with threads-singleprocess without affecting the stereo results.
When SGM or MGM is specified, certain stereo parameters have their default values replaced with values that will work with SGM. You can still manually specify these options.
- cost-mode (default 4). Mean absolute distance (MAD) (cost-mode <= 2) usually does not work well. The census transform mode (cost-mode 3) [ZW94] tends to perform better overall but can produce artifacts on featureless terrain. The ternary census transform mode (cost-mode 4) [HCW+16] is a modification of the census transform that is more stable on low contrast terrain but may be less accurate elsewhere.
- corr-kernel. SGM kernels must always be odd. The SGM algorithm works with much smaller kernel sizes than the regular integer correlator so the default large kernel is not recommended. The MAD cost mode can be used with any odd kernel size (including size 1) but the census cost modes can only be used with kernel sizes 3, 5, 7, and 9. Size 5 or 7 is usually a good choice. The default is 5.
- xcorr-threshold. By default this is enabled and set to 2, which doubles the run time of the SGM algorithm. Set it to -1 to turn it off, which may result in less accuracy. If setting min-xcorr-level to 1, one can perform the cross check on the smaller resolution levels without spending the time to run it on the highest resolution level, which is level 0.
- The median and texture filters in the stereo_fltr tool (defaults 3, 11, 0.13). These filters were designed specifically to clean up output from the SGM algorithm and are especially useful in suppressing image artifacts in low-texture portions of the image. A median filter size of 3 and a texture filter size of 11 are good starts but the best values will depend on your input images. The texture-smooth-scale parameter will have to be adjusted to taste, but a range of 0.13 to 0.15 is typical for icy images. These values are enabled by default and must be manually disabled. If your images have good texture throughout it may be best to disable these filters.
- The prefilter-mode setting is ignored when using SGM.
- The subpixel-mode option. If not set, or set to values 7-12, SGM will perform subpixel interpolation during the stereo correlation step and will not do additional work in the stereo refinement step. This means that after dealing with the long SGM processing time you do not need to follow it up with a slow subpixel option. If desired, you can specify a subpixel mode (with value between 1 and 4) to force those subpixel operations to be performed after the default SGM subpixel method. This can remove some staircasing and other artifacts. It is suggested that in this case subpixel mode 3 be used which is somewhat less accurate than subpixel mode 2 but faster.
Fig. 18.1 shows a comparison between two stereo
modes. The DEM on the left was generated using the default stereo parameters and --subpixel-mode 3. The DEM on the right was generated using the command:
parallel_stereo --stereo-algorithm asp_sgm \
--corr-kernel 7 7 --cost-mode 4 \
--median-filter-size 3 --texture-smooth-size 13 \
--texture-smooth-scale 0.13
Some grid pattern noise is visible in the image produced using SGM. Using --stereo-algorithm asp_mgm should reduce it.
18.3. Original implementation of MGM
ASP ships the MGM algorithm as implemented by its authors ([FDFM15]) at:
https://github.com/gfacciol/mgm
That program is released under the AGPL license. We do not link to it directly; rather, it is called as a separate process from stereo_corr, avoiding license compatibility issues.
To use it, run:
parallel_stereo --alignment-method local_epipolar \
--stereo-algorithm mgm \
--job-size-w 512 --job-size-h 512 \
--sgm-collar-size 128 \
left.tif right.tif left.xml right.xml
In this mode, locally aligned portions of the input left and right images are saved to disk, and the MGM program (named mgm) is called for each such pair. It writes the computed disparity back to disk, which is then ingested by ASP.
To be more specific, a global affine epipolar alignment of the left and right images is computed first. The aligned images are then broken up into tiles, with each tile being by default 1024 x 1024 pixels with a 512 pixel padding (hence the total tile size is 2048 x 2048). Local epipolar alignment is computed for each tile. The combination of the global and subsequent local alignment is then applied to each original image to get the locally aligned image tiles, which are written to disk to be passed to mgm.
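The padded tile size follows from simple arithmetic, sketched below. The sketch assumes the padding is added on both sides of the tile, as the default 1024 + 2 x 512 = 2048 case implies. It further assumes that --job-size-w and --job-size-h set the tile size and --sgm-collar-size sets the padding in the command shown earlier. The helper name padded_size is hypothetical.

```shell
# Padded tile side: tile size plus padding on both sides.
padded_size() {
    echo $(( $1 + 2 * $2 ))
}

default_side=$(padded_size 1024 512)   # default tile size and padding
custom_side=$(padded_size 512 128)     # values from the command shown earlier

echo "default: ${default_side} x ${default_side}"
echo "custom:  ${custom_side} x ${custom_side}"
```

Under these assumptions, the smaller job size and collar in the earlier command would give mgm tiles of 768 x 768 pixels.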
The mgm program has its own options. Some are environmental variables, to be set before the tool is called, such as CENSUS_NCC_WIN=5, while others are passed to the mgm executable on the command line, for example, -t census. To communicate any such options to this program, invoke parallel_stereo (for example) with:
--stereo-algorithm 'mgm CENSUS_NCC_WIN=5 -t census'
ASP will ensure these will be passed correctly to mgm.
By default, ASP uses:
MEDIAN=1 CENSUS_NCC_WIN=5 USE_TRUNCATED_LINEAR_POTENTIALS=1 TSGM=3 \
-s vfit -t census -O 8
These are adjusted depending on which ones the user chooses to override.
The CENSUS_NCC_WIN parameter is one of the more notable parameters, as it determines the size of the window to use for correlation, so it corresponds to the option --corr-kernel of ASP-implemented algorithms.
ASP automatically finds the minimum and maximum estimated disparity, and passes them to mgm via the -r and -R switches.
18.3.1. Options for mgm
- -r (default = -30):
Minimum horizontal disparity value. (The images are assumed to be rectified, which eliminates the vertical disparity.)
- -R (default = 30):
Maximum horizontal disparity value.
- -O (default = 4):
Number of search directions. Options: 2, 4, 8, 16.
- -P1 (default = 8):
SGM regularization parameter P1.
- -P2 (default = 32):
SGM regularization parameter P2.
- -p (default = none):
Prefilter algorithm. Options: none, census, sobelx, gblur. The census mode uses the window of dimensions CENSUS_NCC_WIN.
- -t (default = ad):
Distance function. Options: census, ad, sd, ncc, btad, btsd. For ncc the window of dimensions CENSUS_NCC_WIN is used. The bt option is the Birchfield-Tomasi distance.
- -truncDist (default = inf):
Truncate distances at nch * truncDist.
- -s (default = none):
Subpixel refinement algorithm. Options: none, vfit, parabola, cubic.
- -aP1 (default = 1):
Multiplier factor of P1 when sum |I1 - I2|^2 < nch * aThresh^2.
- -aP2 (default = 1):
Multiplier factor of P2 as above.
- -aThresh (default = 5):
Threshold for the multiplier factors.
- -m FILE (default = none):
A file with minimum input disparity.
- -M FILE (default = none):
A file with maximum input disparity.
- -l FILE (default = none):
Write here the disparity without the left-to-right test.
18.3.2. Environmental variables for mgm
These should be set on the command line before mgm is invoked. (ASP does that automatically.)
- CENSUS_NCC_WIN=3:
Size of the window for the census prefilter algorithm and NCC (normalized cross-correlation).
- TESTLRRL=1:
If 1, do left-to-right and right-to-left consistency checks.
- MEDIAN=0:
Radius of the median filter post-processing.
- TSGM=4:
Regularity level.
- TSGM_ITER=1:
Number of iterations.
- TSGM_FIX_OVERCOUNT=1:
If 1, fix overcounting of the data term in the energy.
- TSGM_DEBUG=0:
If 1, print debug information.
- TSGM_2LMIN=0:
Use the improved TSGM cost only for TSGM=2. Overrides the TSGM value.
- USE_TRUNCATED_LINEAR_POTENTIALS=0:
If 1, use the Felzenszwalb-Huttenlocher truncated linear potential. Then P1 and P2 change meaning. The potential they describe becomes V(p,q) = min(P2, P1*|p-q|).
18.4. OpenCV SGBM
The parallel_stereo program can invoke the OpenCV semi-global block-matching algorithm (SGBM) if called with:
--alignment-method local_epipolar \
--stereo-algorithm "opencv_sgbm"
Alternatively, the full string having this algorithm and its options can be used, as:
--alignment-method local_epipolar \
--stereo-algorithm \
"opencv_sgbm -mode sgbm -block_size 3 -P1 8 -P2 32
-prefilter_cap 63 -uniqueness_ratio 10 -speckle_size 100
-speckle_range 32 -disp12_diff 1"
If an invocation as follows is used:
--alignment-method local_epipolar \
--stereo-algorithm "opencv_sgbm -block_size 7"
ASP will use the earlier values for all the options except -block_size, which will be set to 7. Hence, the user needs to specify explicitly only the options whose values should differ from the defaults.
See an illustration in Fig. 6.1.
18.4.1. SGBM options
- -mode (default = sgbm):
Choose among several flavors of SGBM. Use sgbm for the less-memory intensive mode. Setting this mode to hh will run the full-scale two-pass dynamic programming algorithm. It will consume O(image_width * image_height * num_disparities) bytes of memory, and may run out of memory for a large input disparity range. Use 3way for yet another flavor which OpenCV does not document.
- -block_size (default = 3):
Block size to use to match blocks from left to right image. It must be an odd number >=1. Normally, it should be somewhere in the 3 - 11 range.
- -P1 (default = 8):
Multiplier for the first parameter controlling the disparity smoothness. This parameter is used for the case of slanted surfaces. This is multiplied by num_image_channels * block_size * block_size, and ASP uses num_image_channels = 1. It is used as the penalty on the disparity change by plus or minus 1 between neighbor pixels.
- -P2 (default = 32):
Multiplier for the second parameter controlling the disparity smoothness. This is multiplied by num_image_channels * block_size * block_size, and ASP uses num_image_channels = 1. This parameter is used for “solving” the depth discontinuities problem. The larger the values are, the smoother the disparity is. This parameter is the penalty on the disparity change by more than 1 between neighbor pixels. The algorithm requires P2 > P1.
- -disp12_diff (default = 1):
Maximum allowed difference (in integer pixel units) in the left-to-right vs right-to-left disparity check. Set it to a non-positive value to disable the check.
- -prefilter_cap (default = 63):
Truncation value for the prefiltered image pixels. The algorithm first computes the x-derivative at each pixel and clips its value by [-prefilter_cap, prefilter_cap] interval. The result values are passed to the Birchfield-Tomasi pixel cost function.
- -uniqueness_ratio (default = 10):
Margin in percentage by which the best (minimum) computed cost function value should “win” the second best value to consider the found match correct. Normally, a value within the 5 - 15 range is good enough.
- -speckle_size (default = 100):
Maximum size of smooth disparity regions to consider their noise speckles and invalidate. Set it to 0 to disable speckle filtering. Otherwise, set it somewhere in the 50 - 200 range.
- -speckle_range (default = 32):
Maximum disparity variation within each connected component. If you do speckle filtering, set the parameter to a positive value; it will be implicitly multiplied by 16. Normally, 1 or 2 is good enough.
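To make the role of the -P1 and -P2 multipliers concrete, the sketch below computes the effective smoothness penalties as multiplier * num_image_channels * block_size * block_size, using the default values listed above. The variable names are for illustration only.

```shell
# Effective SGBM smoothness penalties:
# multiplier * num_image_channels * block_size * block_size.
num_image_channels=1   # ASP uses a single channel
block_size=3           # default -block_size
p1_mult=8              # default -P1
p2_mult=32             # default -P2

p1=$((p1_mult * num_image_channels * block_size * block_size))
p2=$((p2_mult * num_image_channels * block_size * block_size))
echo "P1 penalty: $p1, P2 penalty: $p2"

# The algorithm requires P2 > P1.
if [ "$p2" -gt "$p1" ]; then penalties_ok=yes; else penalties_ok=no; fi
echo "P2 > P1: $penalties_ok"
```

Because both penalties scale with the square of the block size, raising -block_size to 7 with the same multipliers gives penalties of 392 and 1568.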
18.5. LIBELAS stereo algorithm
ASP ships and can invoke the LIBELAS (Library for Efficient Large-scale Stereo Matching) algorithm [GRU10], described at:
http://www.cvlibs.net/software/libelas/
See an illustration in Fig. 6.1.
We implemented an interface around this library to overcome its assumption of the disparity being always positive, and added other minor changes. Our fork having these additions is at:
https://github.com/NeoGeographyToolkit/libelas
This software is released under GPL. ASP does not link to it directly; rather, it is invoked via a system call, with its inputs and outputs being on disk.
To invoke it, run:
parallel_stereo --alignment-method local_epipolar \
--stereo-algorithm libelas \
<other options>
To override the values of any of its parameters, pass them as follows:
--stereo-algorithm "libelas -ipol_gap_width 100"
(This particular parameter is used to fill holes in the disparity, with a larger value resulting in bigger holes being filled.)
The algorithm options, and their defaults, as used by ASP, are as follows.
- -disp_min (default = 0):
Minimum disparity (ASP estimates this unless the user overrides it).
- -disp_max (default = 255):
Maximum disparity (ASP estimates this unless the user overrides it).
- -support_threshold (default = 0.85):
Maximum uniqueness ratio (best vs. second-best support match).
- -support_texture (default = 10):
Minimum texture for support points.
- -candidate_stepsize (default = 5):
Step size of regular grid on which support points are matched.
- -incon_window_size (default = 5):
Window size of inconsistent support point check.
- -incon_threshold (default = 5):
Disparity similarity threshold for support point to be considered consistent.
- -incon_min_support (default = 5):
Minimum number of consistent support points.
- -add_corners (default = 0):
Add support points at image corners with nearest neighbor disparities.
- -grid_size (default = 20):
Size of neighborhood for additional support point extrapolation.
- -beta (default = 0.02):
Image likelihood parameter.
- -gamma (default = 3):
Prior constant.
- -sigma (default = 1):
Prior sigma.
- -sradius (default = 2):
Prior sigma radius.
- -match_texture (default = 1):
Minimum texture for dense matching.
- -lr_threshold (default = 2):
Disparity threshold for left-right consistency check.
- -speckle_sim_threshold (default = 1):
Similarity threshold for speckle segmentation.
- -speckle_size (default = 200):
Maximum size of a speckle; speckles up to this size get removed.
- -ipol_gap_width (default = 3):
Fill holes in disparity of height and width at most this value.
- -filter_median (default = 0):
If non-zero, use an approximate median filter.
- -filter_adaptive_mean (default = 1):
If non-zero, use an approximate adaptive mean filter.
- -postprocess_only_left (default = 0):
If non-zero, saves time by not postprocessing the right image.
- -verbose (default = 0):
If non-zero, print some information about the values of the options being used, as well as what the input and output files are.
- -debug_images (default = 0):
If non-zero, save the images to disk right before being passed to libelas (the images are thus padded, aligned, and scaled to have byte pixels).
18.6. Multi-Scale Multi-Window stereo matching
ASP provides access to the Multi-Scale Multi-Window (MSMW) stereo matching algorithm [BF15], by invoking its two implementations msmw and msmw2 from:
https://github.com/centreborelli/s2p
(see the 3rdparty directory). While that repository is released under the AGPL-3.0 license and ASP is under the more permissive Apache 2.0 license, ASP invokes that functionality as an external program via a system call, so its license does not conflict with ours.
18.6.1. Options for msmw
To invoke the msmw algorithm, run parallel_stereo with the option:
--alignment-method local_epipolar \
--stereo-algorithm msmw
By default, ASP invokes this program as if it is called with:
--stereo-algorithm "msmw -i 1 -n 4 -p 4 -W 5 -x 9 -y 9 -r 1
-d 1 -t -1 -s 0 -b 0 -o 0.25 -f 0 -P 32"
In addition, ASP automatically calculates and passes to msmw values for the -m and -M options, which correspond to the estimated minimum and maximum disparity values.
Any options explicitly specified by the user, such as:
--stereo-algorithm "msmw -x 7 -y 7"
are substituted in the earlier string before ASP invokes this tool.
The meaning of these switches is as follows.
- -m:
Minimum disparity.
- -M:
Maximum disparity.
- -x (default = 0):
Width of the window (block) to match from the left to right image. Must be set to a positive odd value.
- -y (default = 0):
Matching window height. Must be set to a positive odd value.
- -w (default = 0):
Flag for weighting window.
- -W (default = 5):
Flag for using windows as lists (5x5 windows only). A non-zero value indicates how many of the orientations should be considered. (Note: Not sure what all this means.)
- -i (default = 1):
Type of distance.
- -p (default = 1):
Number of precisions for single scale.
- -P (default = 1):
Factor of disparity refinement by cubic interpolation.
- -n (default = 3):
Number of scales.
- -f (default = 0):
Standard deviation noise.
- -r (default = 0):
Reciprocity value.
- -g (default = 0):
Subpixel reciprocity flag.
- -R (default = 0):
Dual reciprocity value.
- -l (default = 0):
Inverse reciprocity value.
- -d (default = 0):
Mindist value.
- -t (default = 0):
Mindist dilatation.
- -s (default = 0):
Self-similarity value.
- -b (default = 0):
Integral of derivatives.
- -v (default = 0):
Variance value.
- -o (default = 0):
Remove isolated flag.
- -O (default = 0):
Remove isolated grain (number pixels).
- -C (default = -1):
Filter using the cost, train removing a fraction of the accepted points (e.g. 0.05).
- -a (default = 0):
Use Laplacian of the image instead of the image itself.
18.6.2. Options for msmw2
This flavor of the MSMW algorithm is called analogously, with:
--stereo-algorithm msmw2
ASP fills in its options as if it is called as:
--stereo-algorithm "msmw2 -i 1 -n 4 -p 4 -W 5 -x 9 -y 9
-r 1 -d 1 -t -1 -s 0 -b 0 -o -0.25 -f 0 -P 32 -D 0 -O 25 -c 0"
As earlier, any of these can be overridden. Compared to msmw, this tool has the additional options:
- -D (default = 0):
Regression mindist.
- -c (default = 0):
Combine last scale with the previous one to densify the result.
18.7. OpenCV BM
The simpler, but less capable, block-matching (BM) algorithm of OpenCV can be invoked in a very similar manner to OpenCV’s SGBM (Section 18.4), with the algorithm name passed to --stereo-algorithm being opencv_bm. It accepts the same parameters except -P1 and -P2, and uses in addition the option:
- -texture_thresh (default = 10):
The disparity is only computed for pixels whose “texture” measure is no less than this value. Hence lowering this will result in the disparity being computed at more pixels but it may be more erroneous.
The full default string of options that is used by --stereo-algorithm is:
"opencv_bm -block_size 21 -texture_thresh 10 -prefilter_cap 31
-uniqueness_ratio 15 -speckle_size 100 -speckle_range 32
-disp12_diff 1"
and any of these can be modified as for the SGBM algorithm. Notice how the BM algorithm has to use a bigger block size than SGBM.
18.8. Adding new algorithms to ASP
ASP makes it possible for anybody to add their own algorithm to be used for stereo correlation without having to recompile ASP itself.
Any such algorithm must be a program to be invoked as:
myprog <options> left_image.tif right_image.tif \
output_disparity.tif
Here, as often assumed in the computer vision community, the input images left_image.tif and right_image.tif are expected to be small image clips with epipolar alignment applied to them so that the epipolar lines are horizontal and the resulting disparity needs to be searched only in the x direction (along each row). The images must have the same size. (ASP will take care of preparing these images.) The images must be in the TIF format, with pixel values being of the float type, and no-data pixels being set to NaN. The output disparity is expected to satisfy the same assumptions and be of dimensions equal to those of the input images.
The options passed to this program are expected to have no other characters except letters, numbers, space, period, underscore, plus, minus, and equal signs. Each option must have exactly one value, such as:
-opt1 val1 -opt2 val2 -opt3 val3
(More flexible options, including boolean ones, so with no value, may be implemented going forward.)
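The constraints on the option string can be checked before handing it to ASP. The following is a minimal sketch, not part of ASP: the function name valid_plugin_options is hypothetical. It only tests the permitted character set and that the words pair up as option and value.

```shell
# Return 0 if the option string uses only permitted characters
# (letters, digits, space, period, underscore, plus, minus, equal sign)
# and consists of "-name value" pairs.
valid_plugin_options() {
    # Reject any character outside the permitted set.
    leftover=$(printf '%s' "$1" | tr -d 'A-Za-z0-9 ._+=-')
    [ -z "$leftover" ] || return 1
    # Each option has exactly one value, so the word count must be even.
    set -- $1
    [ $(( $# % 2 )) -eq 0 ]
}

valid_plugin_options "-opt1 val1 -opt2 val2 -opt3 val3" && echo "accepted"
valid_plugin_options "-opt1 val1; rm x" || echo "rejected"
```

The first call succeeds, and the second is rejected because of the semicolon. A string with an option but no value, such as "-opt1", would also be rejected by the word-count check.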
Such a program, say named myprog, should be copied to the location:
plugins/stereo/myprog/bin/myprog
relative to the ASP top-level directory, with any libraries in:
plugins/stereo/myprog/lib
Then, add a line to the file:
plugins/stereo/plugin_list.txt
in the ASP top-level directory, in the format:
myprog plugins/stereo/myprog/bin/myprog plugins/stereo/myprog/lib
The entries here are the program name (in lowercase), path to the program, and path to any libraries apart from those shipped with ASP (the last entry is optional).
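The installation steps above can be sketched as a short script. To stay safe to run, this sketch uses a temporary directory as a stand-in for the ASP top-level directory and creates a dummy myprog executable. Both are assumptions for demonstration only. In practice, asp_dir would point at the real ASP installation and myprog would be your own program.

```shell
# Stand-in for the ASP top-level directory, so this sketch does not
# touch a real installation.
asp_dir=$(mktemp -d)
mkdir -p "$asp_dir/plugins/stereo"

# Hypothetical plugin executable, created here only for demonstration.
src_dir=$(mktemp -d)
printf '#!/bin/sh\necho "myprog called with: $*"\n' > "$src_dir/myprog"
chmod +x "$src_dir/myprog"

# Copy the program to the expected location; libraries go in lib.
mkdir -p "$asp_dir/plugins/stereo/myprog/bin" "$asp_dir/plugins/stereo/myprog/lib"
cp "$src_dir/myprog" "$asp_dir/plugins/stereo/myprog/bin/myprog"

# Register it: program name, path to the program, path to its libraries.
echo "myprog plugins/stereo/myprog/bin/myprog plugins/stereo/myprog/lib" \
    >> "$asp_dir/plugins/stereo/plugin_list.txt"

cat "$asp_dir/plugins/stereo/plugin_list.txt"
```

Appending with >> leaves any existing entries in plugin_list.txt untouched.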
Then, ASP can invoke this program by calling it, for example, as:
parallel_stereo --alignment-method local_epipolar \
--stereo-algorithm "myprog <options>" \
<images> <cameras> <output prefix>
The program will be called for each pair of locally aligned tiles obtained from these input images, with one subdirectory for each such pair of inputs. That subdirectory will also have the output disparity produced by the program. All such disparities will be read back by ASP, blended together, then ASP will continue with the steps of disparity filtering and triangulation.
It may be helpful to visit one of these subdirectories and examine the stereo_corr log file, which will show precisely how the program was called, and also look at its input image tiles and output disparity stored there.