# User:S1151960

Jump to: navigation, search

Voxel colouring is a computer vision technique for volumetric scene reconstruction from a set of input images taken from a wide range of viewpoints. Steven M. Seitz and Charles R. Dyer first introduced voxel colouring in 1997[1].

As opposed to feature- and contour-based techniques that focus on reconstructing the shapes of the objects in the scene, voxel colouring approaches the problem as that of colour reconstruction. The core idea behind this technique is to find colour invariant voxels that constitute a complete scene with regards to the set of input images.

The ordinal visibility constraint that is a prerequisite for voxel colouring provides an inherent solution for the problem of occlusion in input images. This also makes it possible to implement voxel colouring as a one-pass algorithm.

## Main concepts

Voxel colouring assumes that the scene to be reconstructed is approximately Lambertian.

### Ordinal visibility constraint

In the context of computer vision, the effect of a certain scene point not being visible in a given image, because it is obstructed by another point in the scene, is called occlusion. Most of the scene reconstruction techniques are inefficient in handling significant occlusions in the input images[1].

Voxel colouring technique solves the problem of occlusion by introducing the ordinal visibility constraint, which is defined by Seitz & Dyer as follows:

There exists a norm ${\displaystyle \|{\boldsymbol {\cdot }}\|}$ such that for all scene points ${\displaystyle P}$ and ${\displaystyle Q}$, and input images ${\displaystyle I}$, ${\displaystyle P}$ occludes ${\displaystyle Q}$ in ${\displaystyle I}$ only if ${\displaystyle \|P\|<\|Q\|}$ [1]

This constraint limits the type of camera configurations that can be used to take input images. In general, the ordinal visibility constraint is satisfied if none of the scene points are contained within the convex hull formed by the camera positions (also called camera volume). In such case, the norm of the ordinal visibility constraint can be specified as the distance from the given scene point to the camera volume.

For instance, a set of inward-facing cameras positioned on a circle and capturing a scene inside that circle would be incompatible with the ordinal visibility constraint, because, in this case the circle comprises the camera volume, which also includes the scene points. However, moving the scene either below or above the camera volume (and rotating the cameras downwards or upwards, respectively) would create the compatible configuration.

The ordinal visibility constraint makes it possible to partition the volume space into the layers of uniform distance from the camera volume. The definition of the constraint ensures that the voxels on each layer can only be occluded by the voxels from the previous layers, which are closer to the camera volume. This property is the key for voxel colouring technique.

### Voxel consistency

For the purposes of voxel colouring, the input image ${\displaystyle I}$ is described as a set of infinitesimally small pixels ${\displaystyle p}$: ${\displaystyle I=\{p\}}$. Similarly, the scene ${\displaystyle S}$ that corresponds to the set of input images is described as a set of voxels (three-dimensional analogues of a pixel) ${\displaystyle V}$: ${\displaystyle S=\{V\}}$. The colour of a pixel ${\displaystyle p\in I}$ is denoted by ${\displaystyle colour(p,\ I)}$ and the color information of a voxel ${\displaystyle V\in S}$ is denoted by ${\displaystyle colour(V,\ S)}$.

The voxel ${\displaystyle V\in S}$ that is visible (not occluded) in image ${\displaystyle I}$ and is projected to the pixel ${\displaystyle p\in I}$ is given by ${\displaystyle V=S(p)}$.

If for every voxel ${\displaystyle V\in S}$ and image pixels ${\displaystyle p_{I_{i}}\in I_{i}}$ and ${\displaystyle p_{I_{j}}\in I_{j}\ (i,j=1..N)}$,

${\displaystyle S(p_{I_{i}})=S(p_{I_{j}})=V\implies colour(S(p_{I_{i}}),I_{i})=colour(S(p_{I_{j}}),I_{j})=colour(V,S)}$,

then the set ${\displaystyle S}$ is said to be voxel-consistent with the set of images ${\displaystyle I_{1},\dotsc ,I_{N}}$. In other words, if the set ${\displaystyle S}$ is voxel-consistent, then all image projections of each voxel ${\displaystyle V\in S}$ have the same colour as the voxel itself.

### Complete and consistent scenes

If for every image ${\displaystyle I_{i}\ (i=1..N)}$ and every pixel ${\displaystyle p_{I_{i}}\in I_{i}}$ exists voxel ${\displaystyle V\in S}$ such that ${\displaystyle V=S(p_{I_{i}})}$, then the scene ${\displaystyle S}$ is said to be complete with respect to the set of images ${\displaystyle I_{i}\ (i=1..N)}$.

The scene ${\displaystyle S}$ that is both complete and voxel-consistent with regards to the set of images ${\displaystyle I_{i}\ (i=1..N)}$ is said to be consistent with that set of images. If we denote by ${\displaystyle S_{V}}$ the set of all voxels from the given scene ${\displaystyle S}$ that are closer to the camera volume than the given voxel ${\displaystyle V}$, scene consistency can be defined more formally as follows:

Suppose ${\displaystyle S}$ is complete and, for each point ${\displaystyle V\in S,\ \{V\}\bigcup S_{V}}$ is voxel-consistent. Then ${\displaystyle S}$ is a consistent scene.

The inverse statement is also true:

If ${\displaystyle S}$ is a consistent scene then ${\displaystyle \{V\}\bigcup S_{V}}$ is a voxel-consistent set for every ${\displaystyle V\in S}$.

### Colour invariance

If for a given voxel ${\displaystyle V}$ and any scenes ${\displaystyle S_{1}}$ and ${\displaystyle S_{2}}$ that are consistent with the set of images ${\displaystyle I_{i}\ (i=1..N),\ V\in S_{1}\bigcap S_{2}}$ implies that ${\displaystyle colour(V,\ S_{1})=colour(V,\ S_{2})}$, then ${\displaystyle V}$ is said to be a colour invariant with regards to the set of image ${\displaystyle I_{i}\ (i=1..N)}$.

Let’s introduce the following notation:

${\displaystyle V_{p}=\{S(p)\ |\ {\hat {S}}-consistent,S\in {\hat {S}},\ \|S(p)\|=\min _{\hat {S}}\|{\hat {S}}(p)\|\}}$ for some ${\displaystyle p_{I_{i}}\in I_{i},i=1..N,}$

i.e. ${\displaystyle V_{p}}$ is the voxel, belonging to the scene(s) consistent with the set of input images, that projects to the pixel ${\displaystyle p_{I_{i}}}$ in the given image ${\displaystyle I_{i}}$ and is the closest to the camera volume compared to all the other voxels satisfying the previous criteria.

It is easy to notice that ${\displaystyle V_{p}}$ is a colour invariant, since ${\displaystyle V_{p}\in S}$ implies that ${\displaystyle V_{p}=S(p)}$, by definition. As scene consistency requires voxel-consistency, ${\displaystyle V_{p}}$ will have the same colour in all of its image projections ${\displaystyle p_{I_{i}},i=1..N}$. Since image projections remain the same independent of the scene ${\displaystyle S}$, voxel ${\displaystyle V_{p}}$ will have the same colour across all consistent scenes it belongs to. This conforms to the definition of the colour invariant.

Finally, if we denote by ${\displaystyle {\bar {S}}}$ the set of all ${\displaystyle V_{p}}$, i.e. the closest colour invariants corresponding to each of the pixels ${\displaystyle p_{I_{i}}\in I_{i},i=1..N,}$ we will get a complete scene (contains voxels for each of the pixels in input images) that is consistent with the set of input images (each voxel belongs to at least one consistent scene, i.e. each voxel has the same colour as all of its projections). ${\displaystyle {\bar {S}}}$ constitutes the voxel colouring of the input images.

## Voxel colouring algorithm

In general, the voxel colouring algorithm consists of the following three steps:

If ${\displaystyle \nu _{d}}$ is the set of voxels that are located at distance ${\displaystyle d}$ from the camera volume and ${\displaystyle \nu }$ is the set of all voxels, the idea of partitioning the space can be formalized as follows:

${\displaystyle \nu _{d}=\{V\ |\ \|V\|=d\},}$
${\displaystyle \nu =\bigcup \limits _{i=1}^{M}\nu _{d_{i}},}$ where ${\displaystyle d_{1},\dotsc ,d_{M}}$ is an increasing sequence of numbers.
• Step 2: Find the colour invariant voxels in each layer.

In the simplest case this can be achieved by iterating through all voxels in the layer, projecting them to each of the images to identify their footprint[2][3] (the set of all pixels included in the image projections of a voxel) and performing a voxel consistency test.

The projection here corresponds to "the intersection with the image plane of all rays from the camera center intersecting the voxel" [2].

Without noise or quantization effects, a consistent voxel should project to a set of pixels with equal color values. In the presence of these effects, the correlation of the pixel colors ${\displaystyle \lambda _{V}}$ is evaluated to measure the likelihood of voxel consistency[2]. While there are different heuristics for choosing the correlation function, in the simplest case the standard deviation of the pixel colour values in the visible (unoccluded) voxel projections can be used as ${\displaystyle \lambda _{V}}$ and thresholded by the maximum allowable error in the colour space (selected heuristically).

• Step 3: Mark all image pixels corresponding to the detected colour invariant voxel, if they have not been marked previously.

Marking of the pixels is necessary to account for occlusions. All pixels in input images are initially unmarked. We will denote the set of unmarked pixels from the footprint of voxel ${\displaystyle V}$ in image ${\displaystyle I_{i}}$ with ${\displaystyle \pi _{i},}$ and if the voxel passes the consistency test, the pixels from ${\displaystyle \bigcup \limits _{i=1}^{N}\pi _{i}}$ will be marked as corresponding to this voxel. If a part of the voxel’s footprint has already been marked, it means that the voxel is occluded in that image by another voxel(s), which has already been tested, because it is closer to the camera volume and therefore belongs to one of the previous layers (this is ensured by the ordinal visibility constraint).

Since occlusions are explicitly accounted for in the partitioning and marking steps, one pass through the voxel layers of increasing depth is enough to obtain the voxel colouring of the input images.

### Pseudo-code implementation

Here is the pseudo-code of the one-pass voxel colouring algorithm by Seitz and Dyer[2]:

${\displaystyle {\bar {S}}=\emptyset }$                                              //Initialize the reconstruction
for i = 1,...,M do                                  //Iterate through the layers
for every ${\displaystyle {\boldsymbol {V}}\in \nu _{d_{i}}}$ do                              //Iterate through voxels in the layer
${\displaystyle \pi =\emptyset }$
for j = 1,...,N do                                //Project the voxel to each image
compute footprint ${\displaystyle \rho \ }$ of ${\displaystyle {\boldsymbol {V}}}$ in ${\displaystyle I_{j}}$
${\displaystyle \pi _{j}=\{{\boldsymbol {p}}\in \rho \ |\ {\boldsymbol {p}}\ -\ unmarked\}}$
end for j
compute ${\displaystyle \lambda _{V}}$                                        //Evaluate voxel consistency
if ${\displaystyle \bigcup \limits _{j=0}^{N}\pi _{j}}$ is not empty and ${\displaystyle \lambda _{V} then
${\displaystyle {\bar {S}}={\bar {S}}\cup \{{\boldsymbol {V}}\}}$                                  //Color the voxel
${\displaystyle \pi =\pi \cup \bigcup \limits _{j=0}^{N}\pi _{j}}$                                 //Remember image pixels to mark
end if
end for ${\displaystyle {\boldsymbol {V}}}$
mark pixels in  ${\displaystyle \pi \ }$
end for i



Since all of the voxels are presumed to be transparent at the beginning and are changed to opaque only after passing the consistency test, Dyer compares it to the process of clay modelling.[4]

## Examples

Seitz & Dyer[1][2] presented the results of applying the voxel colouring technique to the scene reconstruction from both real and synthetic scene images. The famous examples used in the relevant literature are the reconstructions of the dinosaur toy and a rose from 21 input images taken by 360° rotation of each of the objects from the downward facing camera fixed above the object level.

Voxel colouring was shown to be efficient in re-projecting the scene views from the input images, preserving most of the fine features of the scene, as well as in reconstructing scene images from new viewpoints.

## References

1. ^ a b c d S. M. Seitz and C. R. Dyer, "Photorealistic Scene Reconstruction by Voxel Coloring", Proc. Computer Vision and Pattern Recognition Conf., pp. 1067-1073, 1997.
2. S. M. Seitz and C. R. Dyer, "Photorealistic Scene Reconstruction by Voxel Coloring", International Journal of Computer Vision, 35(2), pp. 151-173, 1999.
3. ^ L. Westover, "Footprint evaluation for volume rendering", Proc. SIGGRAPH 90, pp. 367–376, 1990.
4. ^ C. R. Dyer, "Volumetric scene reconstruction from multiple views" in "Foundations of Image Understanding", L. S. Davis, ed., Kluwer, Boston, pp. 469-489, 2001.