The use of optical 3D shape measurement devices is rapidly gaining importance, allowing real 3D objects to be reconstructed efficiently. The 3D shape and texture can be obtained from stereo images acquired with a freely moving camera. This approach measures the image displacement between corresponding feature points in stereo images of the same scene taken from different views. Although the camera parameters can be estimated from stereo image sequences, the accuracy can be improved by using a calibrated camera.
Active vision techniques, such as structured light, can also be used to improve the accuracy of the point correspondence. The introduction of active illumination aims to simplify the surface reconstruction problem and provides high-accuracy reconstruction by creating a point cloud that represents the shape of the scanned object. The structured light technique projects large patterns onto the scene using a video projector and observes deformations of the patterns in the image to infer depth. Moreover, for static objects, the accuracy can be substantially increased by using pattern sequences. To capture reliable data, binary coding can be used to scan a static object. Binary coding uses black-and-white stripes to form a sequence of projection patterns, such that each point on the surface of the object possesses a unique binary code that differs from the code of any other point. The binary coding technique is very reliable and less sensitive to the surface characteristics, since only binary values exist in all pixels. However, to achieve high spatial resolution, a large number of sequential patterns need to be projected. This work presents a metric measurement performance comparison between a stereo image approach and a binary-coded structured light approach.
Introduction
Recovering 3D structure from 2D images is useful in perception sensors and in computer vision techniques. In this approach, 2D images are processed to recover the depth information. One of the methods to recover the depth information depends on a previous camera calibration to determine the camera position relative to a reference coordinate system.1 In a traditional camera calibration, an object whose geometry in 3D space is known with good precision is used as a calibration object.2,3 The self-calibration technique, on the other hand, does not use any calibration object to determine the camera internal parameters. From the camera movement in a static scene, and considering the rigidity of the scene, it is possible to evaluate two constraints on the camera internal parameters using exclusively image information.4 Luong and Faugeras showed that correspondences between three images are sufficient to recover the camera internal and external parameters (considering that the images were taken by the same camera with fixed internal parameters).5
Although this approach is very flexible, it is not always reliable due to the number of parameters to estimate and their high noise sensitivity. In cases where the 3D model should look similar to the real object, the accuracy of the acquired data is essential. Therefore, whenever possible the calibration approach should be performed because it provides better accuracy when compared with the self-calibration approach. When calibration is impossible (e.g., scene reconstruction from an old movie), self-calibration is the only choice.6
Active vision techniques can also be used to improve the accuracy of the point correspondence. Among them, it is possible to mention the structured light technique. The introduction of active illumination aims to simplify the surface reconstruction problem and provides higher-accuracy reconstruction by creating a point cloud, which represents the shape of the scanned object. In the structured light technique, large patterns are projected onto the scene using a video projector, and deformations of the patterns are observed by a camera to infer depth. Although it is possible to use commercial structured light scanners (e.g., Kinect) to reconstruct scenes with good resolution, other devices attached to the projector and different patterns can be used to reach a resolution of around 50 µm.7,8
This paper presents a metric measurement performance comparison between a stereo image approach and a structured light approach. The stereo image approach uses at least two cameras, and the structured light approach uses one camera and a projector that works like an inverted camera. Because it is necessary to know whether an estimated measurement satisfies accuracy requirements, this work also presents the measurement error for both approaches. This paper is organized as follows. The stereo camera and the structured light techniques are described in sections two and three, respectively. The calibration step for both approaches is detailed in section four. The reprojection error is discussed in section five. Finally, some results are shown in section six, and section seven presents the conclusions and future work.
Stereo cameras
One approach to recover the depth information from 2D images is by using two cameras. The problem of recovering the missing dimension from a set of images is essentially the correspondence problem, which matches image properties between two or more images.9
Figure 1 shows how the 3D point M can be calculated given two images taken from cameras C and C'. The corresponding points m and m' within both images are projections of M. The depth can be recovered using a simple triangulation when the correspondences in the images are known. The geometric entities involved in the epipolar geometry are the epipoles, the epipolar plane, the epipolar line, and the baseline. The focal point of the camera is the camera center (C and C'). Each camera center projects onto a distinct point in the other camera's image plane. These two image points are called epipoles (e and e'). The epipolar plane is the plane that contains the camera centers and one 3D point. The epipolar line is the intersection of an epipolar plane with the image plane ($l_{m'}$ and $l'_{m}$). Note that each epipolar line passes through the epipole and the projected point on its image plane. The baseline is the line joining the optical centers, and its length is the separation between them. With this geometry, any point M in 3D space forms with the camera centers a plane that intersects each image plane in a line that necessarily passes through the epipole of that image.
[Figure 1: camera centers C and C' and the 3D point M]
Considering that two cameras are side by side (with just a translation difference), the difference in the position of the corresponding points in their respective images is called disparity, and it can be used to recover depth (see figure 2 for details). The disparity is the amount by which the two images of M are displaced relative to each other.
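Using the camera internal parameters and the baseline B (see figure 2 for details), the depth Z of the point M can be evaluated from the disparity d by the following relationship:

$$Z = \frac{f\,B}{d}$$

where f is the focal length expressed in pixels.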
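As a minimal sketch of the disparity-to-depth relationship, assuming a rectified side-by-side pair and the standard relation Z = f·B/d (f in pixels, B the baseline, d the disparity in pixels), the computation reduces to a single division. The function name and the numeric values below are illustrative assumptions, not values from the experiments.

```python
def depth_from_disparity(focal_px, baseline, disparity_px):
    """Depth Z = f * B / d for a rectified, side-by-side stereo pair.

    focal_px     : focal length in pixels
    baseline     : distance between the optical centers (sets the unit of Z)
    disparity_px : horizontal shift of the point between the two images, in pixels
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of both cameras")
    return focal_px * baseline / disparity_px


# Hypothetical setup: f = 800 px, B = 0.1 m
near = depth_from_disparity(800.0, 0.1, 40.0)  # large disparity, close point
far = depth_from_disparity(800.0, 0.1, 20.0)   # half the disparity, twice the depth
print(near, far)
```

Because depth is inversely proportional to disparity, a fixed error in the measured disparity produces a larger depth error for distant points than for near ones.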
In order to obtain the 3D position of the captured scene, it is necessary to accomplish two tasks. First, it is necessary to identify where each surface point that is visible in the left image is located in the right image. Second, the exact camera geometry must be known to compute the ray intersection point for associated pixels of the left and right cameras.
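The second task, computing the ray intersection point, can be sketched as follows. With noisy correspondences the two viewing rays rarely meet exactly, so this example uses the midpoint method (the point halfway between the closest points of the two rays). That choice, the function name, and the sample coordinates are assumptions made for illustration; the article does not state which triangulation method was used.

```python
def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint of the shortest segment joining two viewing rays.

    c1, c2 : camera centers (3-element sequences, world coordinates)
    d1, d2 : ray directions through the matched pixels (need not be unit length)
    """
    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    w = [p - q for p, q in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; depth cannot be triangulated")

    # Ray parameters of the mutually closest points
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    return [(u + v) / 2.0 for u, v in zip(p1, p2)]


# Hypothetical example: two cameras 0.1 units apart looking at M = (0.2, -0.1, 2.0)
M = [0.2, -0.1, 2.0]
left_center, right_center = [0.0, 0.0, 0.0], [0.1, 0.0, 0.0]
left_ray = [m - c for m, c in zip(M, left_center)]
right_ray = [m - c for m, c in zip(M, right_center)]
print(triangulate_midpoint(left_center, left_ray, right_center, right_ray))
```

In a real system the ray directions come from back-projecting the matched pixels with the calibrated camera parameters, which is why both the correspondence and the camera geometry must be known.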
Structured light
Structured-light methods use a projector to cast a structured light pattern onto the scene. This technique determines the corresponding coordinates between the projector and the structured-light patterns observed through the camera. By giving specific code words to every unitary position of the image, the projected pattern imposes the illusion of texture onto an object and increases the number of points of correspondence from two different perspectives.10 This active device is modeled as an inverse camera, so the calibration step is similar to the procedure used in a classical stereo vision system.11 As the structure of the projected pattern is known, the object can be reconstructed in 3D from a single image by looking for differences between the projected and the recorded pattern. Since the point location on both the projector and the camera is known, the distance from the camera to the object can be easily computed, and the reconstruction can be carried out by triangulation based on the correspondences.
A coded structured-light system is based on the projection of a single pattern or a sequence of patterns onto the scene, which is imaged by a single camera or a set of cameras. Coding strategies can be classified as temporal coding, spatial coding, and direct coding. In systems that use spatial encoding, the patterns are specially designed so that code words are assigned to a set of pixels. Every coded pixel has its own code word, so there is a direct mapping from the code words to the corresponding coordinates of the pixel in the pattern. The code words are simply numbers, which are mapped in the pattern by using gray levels, color, or geometrical representations. As the number of points (i.e., code words) that must be coded increases, the mapping of such code words to a pattern becomes more difficult. The drawback is that such systems typically need complex patterns or colors to encode position information, and the code must become large for the spatial codes to be determined uniquely. Such patterns are easily affected by textures, shape discontinuities, and the image compression caused by tilted surfaces.
To improve the accuracy of the reconstructed object, temporal encodings, which are based on the projection of a sequence of light patterns onto the object, are used in this work. While such patterns are not well suited to scanning dynamic scenes, they have the benefit of being easy to decode and are robust to surface-feature variation, producing accurate reconstructions for static objects.
A robust, commonly used structured light system is based on binary light patterns.12 The binary coded pattern uses black-and-white stripes to form a sequence of projection patterns, such that each point on the surface of the object possesses a unique binary code that differs from the code of any other point. In general, N patterns can code $2^N$ stripes. Figure 3 shows a simplified 3-bit projection pattern. Once this sequence of patterns is projected onto a static scene, there are eight ($2^3$) unique areas coded with unique stripes.3 The 3D coordinates (x, y, z) can be computed (based on a triangulation principle) for all eight points along each horizontal line, thus forming a full frame of the 3D image. The binary coding technique is very reliable, less sensitive to the surface characteristics, and easy to decode, since only binary values exist in all pixels. However, to achieve higher spatial resolution, a large number of sequential patterns need to be projected.
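The relationship between the number of patterns and the number of coded stripes can be illustrated with a short sketch. It assumes vertical stripes, one projector row, and ideal (already thresholded) black-and-white observations; the function names and the 16-column width are hypothetical choices for the example.

```python
def binary_patterns(num_bits, width):
    """Build num_bits stripe patterns for a projector row of `width` columns.

    Pattern k holds bit k (most significant first) of each column's stripe index,
    so the stack of patterns gives every stripe a unique binary code word.
    """
    stripes = 1 << num_bits  # N patterns encode 2**N stripes
    patterns = []
    for k in range(num_bits):
        shift = num_bits - 1 - k
        patterns.append([((x * stripes // width) >> shift) & 1 for x in range(width)])
    return patterns


def decode_binary(bits):
    """Recover the stripe index from the bits observed at one pixel (MSB first)."""
    index = 0
    for bit in bits:
        index = (index << 1) | bit
    return index


patterns = binary_patterns(3, 16)  # 3 patterns, 16 projector columns
codes = [decode_binary([p[x] for p in patterns]) for x in range(16)]
print(codes)  # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7]
```

Each additional pattern doubles the number of distinguishable stripes, which is why high spatial resolution requires a long projection sequence.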
The reconstruction performance of this approach relies on accurate pixel classification within the black-and-white stripes for both diffuse and nondiffuse scenes. Even though the process is conceptually simple, it is difficult to achieve robust classification in real-world scenes containing complex surface-light interactions that include indirect lighting effects.
For structured light analysis, projecting a Gray code (see figure 4) is superior to projecting a plain binary code.13 Since successive numbers of the Gray code differ in exactly one bit, a decoding error, which is most likely to occur at locations where one bit switches, introduces a misplacement of at most one resolution unit. In addition, the bright and dark lines in the finest-resolution pattern are twice as wide as those of the binary code. This facilitates analysis, especially at steep object surfaces where the code appears to be compressed. Since it uses a per-pixel varying threshold, the Gray code solution is very robust. However, the resolution is limited to half the size of the finest pattern.
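The two Gray code properties mentioned above (one bit changes between neighboring stripes, and the finest pattern has wider lines) can be checked with a small sketch. It uses the standard reflected binary Gray code; the function names are illustrative.

```python
def to_gray(n):
    """Convert a stripe index to its reflected binary Gray code."""
    return n ^ (n >> 1)


def from_gray(g):
    """Convert a Gray code word back to the plain stripe index."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


num_stripes = 8  # three patterns

# Neighboring stripes differ in exactly one bit
changed_bits = [bin(to_gray(i) ^ to_gray(i + 1)).count("1") for i in range(num_stripes - 1)]
print(changed_bits)  # [1, 1, 1, 1, 1, 1, 1]

# Finest (least significant) pattern: plain binary versus Gray code
print([i & 1 for i in range(num_stripes)])           # [0, 1, 0, 1, 0, 1, 0, 1]
print([to_gray(i) & 1 for i in range(num_stripes)])  # [0, 1, 1, 0, 0, 1, 1, 0]
```

In the last two rows, the plain binary pattern alternates at every stripe, while the interior lines of the Gray code pattern span two stripes, so they remain distinguishable where the pattern is compressed on a steep surface.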
Calibration
Camera calibration is the process of determining the internal geometric and optical characteristics of the camera (intrinsic parameters: principal point, focal length, and distortion coefficients) and the 3D position and orientation of the camera frame relative to a certain world coordinate system (extrinsic parameters). Among the camera calibration techniques, it is possible to mention the methods proposed by Tsai and Zhang.14,6 In the camera calibration, the pinhole camera model is used (see figure 5). This model considers the projection of 3D points onto a plane. Under this model, a 3D point with coordinates $M = (X, Y, Z)^T$ is mapped to the point where the line joining M to the center of projection meets the image plane. Using similar triangles, it is possible to compute that the point $(X, Y, Z)^T$ is mapped to $(fX/Z, fY/Z)^T$ on the image plane, where f is the focal length. This mapping assumes that the origin of coordinates in the image plane is the principal point. In the general case, where the origin of the image coordinates is not at the principal point, the principal point coordinates $(p_x, p_y)^T$ must be taken into account and the mapping becomes

$$(X, Y, Z)^T \mapsto \left(fX/Z + p_x,\; fY/Z + p_y\right)^T$$

In homogeneous coordinates, the matrix form of this mapping is written as

$$\begin{pmatrix} fX + Zp_x \\ fY + Zp_y \\ Z \end{pmatrix} = \begin{bmatrix} f & 0 & p_x & 0 \\ 0 & f & p_y & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}$$

and it has the following concise form:

$$x = K\,[I \mid 0]\,M_{CAM}, \qquad K = \begin{bmatrix} f & 0 & p_x \\ 0 & f & p_y \\ 0 & 0 & 1 \end{bmatrix}$$

where K is the camera calibration matrix. $M_{CAM}$ is used because the camera is assumed to be located at the origin of a Euclidean coordinate system with the principal axis of the camera pointing along the z-axis. Therefore, it is necessary to move the coordinate system to the world coordinate system. This can be done using the following relationship (with M in inhomogeneous coordinates in the first expression and in homogeneous coordinates in the second):

$$M_{CAM} = R\,(M - C), \qquad x = K R\,[I \mid -C]\,M$$

where R is a 3 × 3 rotation matrix representing the orientation of the camera coordinate frame, and C represents the coordinates of the camera center in the world coordinate frame. In particular, if the number of pixels per unit distance in image coordinates is $n_x$ and $n_y$ in the x and y directions, respectively, then the general form of the calibration matrix is

$$K = \begin{bmatrix} \alpha_x & 0 & x_0 \\ 0 & \alpha_y & y_0 \\ 0 & 0 & 1 \end{bmatrix}$$

where $\alpha_x = f\,n_x$ and $\alpha_y = f\,n_y$; similarly, $x_0 = n_x p_x$ and $y_0 = n_y p_y$.
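The pinhole mapping can be written out as a short sketch: a world point is first moved into the camera frame with the rotation R and camera center C, then scaled by the focal lengths in pixels and offset by the principal point. Lens distortion is ignored, and the function name, parameter names, and numeric values are assumptions chosen for illustration.

```python
def project_point(M, R, C, alpha_x, alpha_y, x0, y0):
    """Project world point M to pixel coordinates with a distortion-free pinhole model.

    R                : 3x3 rotation matrix (nested lists), world frame to camera frame
    C                : camera center in world coordinates
    alpha_x, alpha_y : focal length in pixels along x and y
    x0, y0           : principal point in pixels
    """
    diff = [m - c for m, c in zip(M, C)]
    # Camera-frame coordinates: M_cam = R (M - C)
    Xc, Yc, Zc = (sum(r * d for r, d in zip(row, diff)) for row in R)
    if Zc <= 0:
        raise ValueError("point is behind the camera")
    return (alpha_x * Xc / Zc + x0, alpha_y * Yc / Zc + y0)


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Hypothetical camera at the world origin: 800 px focal length, principal point (320, 240)
u, v = project_point([0.5, -0.25, 2.0], identity, [0.0, 0.0, 0.0], 800.0, 800.0, 320.0, 240.0)
print(u, v)
```

A point on the principal axis projects exactly onto the principal point, which gives a quick sanity check for any implementation of this model.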
The task of camera calibration is to determine the parameters of the transformation between an object in 3D space and the 2D image observed by the camera from visual information. The transformation includes:
• Extrinsic parameters (also called external parameters): camera orientation and location.
• Intrinsic parameters (also called internal parameters): characteristics of the camera ($\alpha_x$, $\alpha_y$, $x_0$, and $y_0$).
A classic calibration technique uses a known-size chessboard pattern, which was first presented by Zhang.6 Using the stereo camera to capture the calibration pattern image, it is possible to extract the corners at pixel (or subpixel) level and map them to world coordinates. If there is a sufficient number of correspondences between 3D and 2D points, one can solve a homogeneous linear system of equations for the 3 × 4 projection matrix P, which combines the calibration matrix K with the camera rotation and position. The solution is also denoted as implicit camera calibration, since the resulting parameters do not have any direct physical meaning. In the next stage, the intrinsic and extrinsic camera parameters can be extracted from the computed solution of P. A direct linear transform (DLT) can be used to compute the projection matrix with a noniterative algorithm. After evaluating the parameters of both cameras, the correspondence between image points in different views provides the relationship between the cameras (rotation and translation).
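The homogeneous linear system mentioned above can be sketched with a basic DLT. Each 3D-to-2D correspondence contributes two equations, and the projection matrix is taken as the singular vector associated with the smallest singular value. This is a minimal illustration that assumes NumPy is available, at least six non-coplanar 3D points, and noise-free data; it omits coordinate normalization and distortion handling, and it is not the planar chessboard procedure itself. The camera values and point coordinates are made up for the example.

```python
import numpy as np


def estimate_projection_dlt(points_3d, points_2d):
    """Estimate the 3x4 projection matrix (up to scale) from 3D-2D correspondences."""
    rows = []
    for (X, Y, Z), (u, v) in zip(points_3d, points_2d):
        Xh = [X, Y, Z, 1.0]
        # u = (p1 . Xh) / (p3 . Xh)  ->  p1 . Xh - u * (p3 . Xh) = 0
        rows.append(Xh + [0.0] * 4 + [-u * c for c in Xh])
        # v = (p2 . Xh) / (p3 . Xh)  ->  p2 . Xh - v * (p3 . Xh) = 0
        rows.append([0.0] * 4 + Xh + [-v * c for c in Xh])
    A = np.asarray(rows, dtype=float)
    _, _, vt = np.linalg.svd(A)
    return vt[-1].reshape(3, 4)  # null-space direction of A


def project(P, point_3d):
    """Apply a projection matrix and return pixel coordinates."""
    x = P @ np.append(np.asarray(point_3d, dtype=float), 1.0)
    return x[:2] / x[2]


# Hypothetical ground-truth camera used only to synthesize correspondences
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
Rt = np.hstack([np.eye(3), np.array([[0.1], [-0.05], [3.0]])])
P_true = K @ Rt

points_3d = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.2), (0.0, 1.0, 0.5), (1.0, 1.0, 1.0),
             (0.3, 0.7, 1.5), (0.8, 0.2, 0.9), (0.5, 0.5, 0.1), (0.2, 0.9, 1.2)]
points_2d = [project(P_true, p) for p in points_3d]

P_est = estimate_projection_dlt(points_3d, points_2d)
worst = max(np.linalg.norm(project(P_est, p) - q) for p, q in zip(points_3d, points_2d))
print(worst)  # largest reprojection distance in pixels
```

The estimated matrix differs from the true one by an arbitrary scale factor, so the comparison is done on reprojected pixel positions rather than on matrix entries.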
Calibration of structured light scanners is usually more complicated than that of a passive stereo pair. A standard approach consists of three steps: estimating the camera intrinsic matrix (camera calibration), estimating the plane equations for each of the projection planes (projector calibration), and finally, estimating the Euclidean transformation between the camera and the projector (projector-camera calibration). The camera is calibrated using a chessboard pattern, and the projector is calibrated as an inverse camera. This means that instead of taking pictures of a chessboard with known geometry and detecting the corners inside the images, a chessboard pattern with known geometry is projected with different orientations and positions, and the projections are measured with the calibrated camera.
Reprojection error
One of the errors in the 3D coordinate point localization is the systematic error arising from the calibration if the camera parameters are incorrectly estimated. This will result in 3D coordinate measurement errors, and so their effects must be quantified in order to determine the accuracy of the method. To quantify the calibration estimation, the reprojection error can be used.
The reprojection error is the image distance (in pixels) between the location of a point predicted by the estimated camera model and its actually detected location. The error of the estimation depends on the quality of the alignment, and it is effectively a good indicator of the accuracy of the calibration methodology. The quality of the camera calibration has a direct impact on the quality of the results in systems where the relative projection matrices between cameras must be very accurate.
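The reprojection error uses the camera projection matrix. Considering k detected points, where a point $x_j$ in the image corresponds to a 3D point $X_j$, the reprojection error for a view i with projection matrix $P_i$ is obtained from the image distances $d(x_j, P_i X_j)$ between each detected point and the projection of its 3D point, accumulated over the k points.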
A low reprojection error indicates an accurate projection matrix, at least for the points on the plane that were used to compute the projection matrix. Although the reprojection error may increase for 3D points off the plane, it provides a metric to evaluate the calibration performance.
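A reprojection error of this kind can be computed as in the sketch below. The root-mean-square aggregation over the k points is an assumption made for the example (other conventions, such as the mean distance or the sum of squared distances, are also common), and the projection matrix and point values are hypothetical.

```python
import math


def project(P, X):
    """Project a 3D point with a 3x4 projection matrix given as nested lists."""
    Xh = list(X) + [1.0]
    x, y, w = (sum(p * c for p, c in zip(row, Xh)) for row in P)
    return (x / w, y / w)


def rms_reprojection_error(P, points_3d, points_2d):
    """Root-mean-square pixel distance between detected and reprojected points."""
    squared = 0.0
    for X, (u, v) in zip(points_3d, points_2d):
        pu, pv = project(P, X)
        squared += (pu - u) ** 2 + (pv - v) ** 2
    return math.sqrt(squared / len(points_3d))


# Hypothetical camera and two correspondences; the first detection is 0.5 px off
P = [[800.0, 0.0, 320.0, 0.0],
     [0.0, 800.0, 240.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
points_3d = [(0.0, 0.0, 2.0), (0.5, 0.0, 2.0)]
detected = [(320.3, 240.4), (520.0, 240.0)]
print(rms_reprojection_error(P, points_3d, detected))
```

Evaluating the same function on points that were not used during calibration gives a more honest picture of how well the model generalizes off the calibration plane.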
Results
As mentioned before, reprojection errors are indicators of calibration accuracy. However, a minimum reprojection error measured in the calibration images does not ensure the best reconstruction accuracy for arbitrary objects, so the values are presented here as a reference for comparison. After the calibration, the reprojection error using stereo cameras was 0.2567 pixels for both cameras because the same pattern was used in the calibration. For the structured light approach, the camera reprojection error was 0.2792 pixels, and the projector reprojection error was 0.4329 pixels.
A detailed analysis of the measurement quality for both methods was performed generating the 3D point cloud of a flat surface. The ground truth analysis can be viewed in Table I, figure 6, and figure 7.
Table I: 3D Points Depth Ground Truth Comparison

| Method | Max. Error | Mean Error | Std. Deviation |
|---|---|---|---|
| Stereo camera | 0.42 | 0.00019 | 0.182178 |
| Structured light | 0.59 | 0.00503 | 0.232982 |
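The article does not state how the flat-surface statistics were computed. One plausible procedure, assumed here purely for illustration, is to fit a least-squares plane to the point cloud and report the maximum, mean, and standard deviation of the residuals along z. The function names and the synthetic points below are hypothetical.

```python
import statistics


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(points):
    """Least-squares plane z = a*x + b*y + c, solved from the 3x3 normal equations."""
    sxx = sxy = sx = syy = sy = sxz = syz = sz = 0.0
    for x, y, z in points:
        sxx += x * x
        sxy += x * y
        sx += x
        syy += y * y
        sy += y
        sxz += x * z
        syz += y * z
        sz += z
    A = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, float(len(points))]]
    rhs = [sxz, syz, sz]
    det = det3(A)
    if abs(det) < 1e-12:
        raise ValueError("points are collinear or too few to define a plane")
    coeffs = []
    for i in range(3):  # Cramer's rule: replace column i with the right-hand side
        Ai = [row[:] for row in A]
        for r in range(3):
            Ai[r][i] = rhs[r]
        coeffs.append(det3(Ai) / det)
    return tuple(coeffs)


def flatness_stats(points):
    """Return (max absolute, mean, standard deviation) of z residuals to the fitted plane."""
    a, b, c = fit_plane(points)
    residuals = [z - (a * x + b * y + c) for x, y, z in points]
    return (max(abs(r) for r in residuals),
            statistics.mean(residuals),
            statistics.pstdev(residuals))


# Synthetic, perfectly flat cloud: every statistic should be numerically zero
cloud = [(float(x), float(y), 0.1 * x - 0.2 * y + 5.0) for x in range(4) for y in range(4)]
print(flatness_stats(cloud))
```

With signed residuals, the mean stays close to zero even when the spread is large, so the standard deviation and the maximum error are the more informative figures.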
Although the stereo camera provided better accuracy for the ground truth measurement, it was necessary to use reference marks because the triangulation between points is only possible in a scene with features such as corners or edges. In the structured light approach, no marks were needed because the correspondence was established using the Gray code projected horizontally and vertically.
The reconstruction performance was also analyzed using a real object (see figure 8). Figure 9 shows the 3D point cloud recovered using both methods. The comparison of the recovered point clouds showed that the structured light approach can generate a higher density of points because it uses the projector pattern to add information. With the stereo camera approach, although it is possible to identify the contour of the object, the lack of resolution makes the measurement harder. The object measurement using the structured light approach was easier, and the analysis showed that the error is around 0.1 mm.
Conclusion
Tests showed that it is possible to perform the reconstruction using both techniques. With the stereo camera technique, the number of point correspondences in the images has a significant influence on the reconstruction resolution. With the structured light technique, tests showed that because the reconstruction depends on the illumination, the material of the scanned object has a great influence, especially for shiny, reflective, or transparent objects. The reconstruction was also degraded in areas that were in shadow during the scan.
For future work, it is necessary to improve the structured light calibration process and analyze other approaches to reconstruct shiny objects. With the stereo camera technique, the use of multiple images is necessary to increase the reconstruction resolution.
Acknowledgements
This project was partially supported by a joint project from JSPS/CAPES under the Japan-Brazil Research Cooperative Program and Grants-in-Aid for Scientific Research (grant 24500539). R. Y. Takimoto was supported by FAPESP (grant 2011/22402-8), and M. S. G. Tsuzuki was partially supported by CNPq (grant 309570/2010-7).
References
1. Ito, M. “Robot vision modelling-camera modelling and camera calibration.” Advanced Robotics, Vol. 5, pp. 321–335, 1991.
2. Faugeras, O. Three-Dimensional Computer Vision: a Geometric Viewpoint. MIT Press, 1993.
3. Zhang, Z. “A Flexible New Technique for Camera Calibration.” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 11. November 2000.
4. Maybank, S. J., and O. D. Faugeras. “A theory of self-calibration of a moving camera.” The International Journal of Computer Vision, Vol. 8, No. 2, pp. 123–152, Aug. 1992.
5. Luong, Q.-T., and O. D. Faugeras. “Self-calibration of a moving camera from point correspondences and fundamental matrices.” The International Journal of Computer Vision, Vol. 22, No. 3, pp. 261–289, 1997.
6. Zhang, Z. “Camera Calibration.” Chapter 2, pp. 4–43, Emerging Topics in Computer Vision, G. Medioni and S.B. Kang, eds., Prentice Hall Professional Technical Reference, 2004.
7. Takimoto, R. Y.; R. Vogelaar; E. K. Ueda; A. K. Sato; T. C. Martins; T. Gotoh; S. Kagei; and M. S. G. Tsuzuki. “3D Reconstruction Using Low Precision Scanner.” 11th IFAC Workshop on Intelligent Manufacturing Systems, pp. 239–244, São Paulo, Brazil, 2013.
8. Ritz, M.; F. Langguth; M. Scholz; M. Goesele; and A. Stork. “High-resolution acquisition of detailed surfaces with lens-shifted structured light.” Computers & Graphics, Vol. 36, No. 1, pp. 16–27, 2012.
9. Takimoto, R. Y.; T. C. Martins; F. K. Takase; and M. S. G. Tsuzuki. “Epipolar Geometry Estimation, Metric Reconstruction, and Error Analysis From Two Images.” Proceedings of the 14th IFAC Symposium on Information Control Problems in Manufacturing, pp. W196–W201, IFAC, Bucharest, Romania, 2012.
10. Salvi, J.; J. Batlle; and E. Mouaddib. “A robust-coded pattern projection for dynamic 3D scene measurement.” Pattern Recognition Letters, Vol. 19, No. 11, pp. 1055–1065, 1998.
11. Zhang, S. and P. S. Huang. “Novel method for structured light system calibration.” Optical Engineering, Vol. 45, No. 8, 2006.
12. Posdamer, J. L. and M. D. Altschuler. “Surface measurement by space-encoded projected beam systems.” Computer Graphics Image Processing, Vol. 18, No. 1, pp. 1–17, 1982.
13. Sato, K. and S. Inokuchi. “Range-imaging system utilizing nematic liquid crystal mask.” Proceedings of the International Conference on Computer Vision, pp. 657–661, IEEE Computer Society Press, 1987.
14. Tsai, R. Y. “An efficient and accurate camera calibration technique for 3D machine vision.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 364–374, Miami Beach, Florida, 1986.