During the development of our scanning systems we noticed distortion in the depth data (an effect previously reported by others) that affected the accuracy of our results. We have therefore developed our own calibration process for reducing the magnitude of this effect.
N.B. All of the discussion and analysis below was conducted with the Kinect in ‘near mode’ to maximise point resolution. We initially thought this might be a contributing factor to the observed distortion, but subjective analysis of the data showed the distortion to be apparent in both near and standard operating modes.
Our Investigative Work
As part of our scanning work we began scanning objects of known geometry, such as planes and cylinders. Whilst examining these scans, we noticed that the scanned objects appeared to have a markedly different geometry to the physical objects that were actually scanned. Figures 1 and 2 show typical point cloud scans of a cylinder and a plane, illustrating this point.
Figures 1 & 2, Distorted point cloud scans of a cylinder and plane (side view) respectively
Interestingly, we also noticed that the distortion in the cylinder was worse when the Kinect was oriented vertically. The distortion of the plane was unchanged, although the ‘distortion pattern’ visible on the plane appeared rotated in line with the rotation of the Kinect.
We began investigating this further, to determine whether the distortion varied across the Kinect’s operating range. We hypothesised that the errors would increase in magnitude with increasing depth, due to the decreasing depth resolution of the Kinect. To test this we scanned a plane at two different depths, and fitted a plane to each scan. Figures 3 and 4 below show the deviation (in mm) of each point about the fitted plane.
Figures 3 & 4, Planar distortion at 0.61m and 1.28m
Looking at figures 3 and 4, the distortion follows an approximately radially symmetric pattern which is consistent across the two depths. The distortion is also greatest at the extreme corners of the image, typical of the radial distortion you would expect at the outer extremities of a lens. As expected, the magnitude of the distortion appeared to increase with depth. These plots also explained why the distortion in the cylinder appeared worse when the Kinect was oriented vertically. When the Kinect was in the horizontal orientation, the cylinder wasn’t located within the two large error bands at the left and right of the depth image. However, when the Kinect was rotated vertically the two bands sat at the top and bottom of the image, coinciding with the top and bottom of the cylinder and hence producing the error pattern visible in figure 1 above.
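The plane-fitting and deviation analysis described above can be sketched in a few lines. This is our own illustrative NumPy sketch using synthetic data, not the code used in the study: a plane is fitted by least squares and the signed distance of each point from it is computed.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane fit via SVD.

    points: (N, 3) array of XYZ samples.
    Returns (centroid, unit normal).
    """
    centroid = points.mean(axis=0)
    # The right singular vector with the smallest singular value
    # is the normal of the best-fit plane through the centroid.
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    return centroid, vt[-1]

def plane_deviations(points, centroid, normal):
    """Signed distance of each point from the plane (same units as points)."""
    return (points - centroid) @ normal

# Synthetic example: a noisy tilted plane, noise std ~0.5 mm.
rng = np.random.default_rng(0)
xy = rng.uniform(-100, 100, size=(500, 2))
z = 0.2 * xy[:, 1] + rng.normal(scale=0.5, size=500)
pts = np.column_stack([xy, z])

c, n = fit_plane(pts)
dev = plane_deviations(pts, c, n)
print(dev.std())  # residual spread close to the injected ~0.5 mm noise
```

Plotting `dev` per pixel, as in figures 3 and 4, is what exposes the radial error pattern.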
We then tried to determine the source of the distortion, and in doing so tested a number of different Kinect devices. We found each device to have a differing magnitude of distortion, but the radial distortion pattern was consistent across all the devices tested. This suggests that some form of calibration procedure is performed at manufacture, aiming to calibrate each Kinect’s depth measurement. It appears this calibration is sufficient for normal operation, but not adequate for accurate scanning. However, without knowing more about the manufacturing procedures it is hard to say.
Our distortion correction method
If you wish to read more about our method of correcting depth distortion we recommend reading our published paper: details can be found at the bottom of the page. However, we give a brief outline below.
We first collected point cloud scans and corresponding depth images of a plane at regular intervals (~10mm) over the region we wanted to calibrate. To do this in a controlled manner we developed the calibration rig shown below in figure 5; however, our processing technique negates the need for accurate alignment between the Kinect and the plane.
Figure 5, The calibration rig
A plane was fitted to each point cloud scan, and converted back to a depth image via the Kinect’s integral coordinate mapping function. This was important, as it ensured any factory calibration parameters were applied to the data.
An image subtraction procedure was then used to subtract the fitted-plane depth image from the original depth image, leaving an image map of the distortion in each pixel at that particular depth. This procedure was repeated for each plane scan, building up a per-pixel database of errors and their corresponding reported depths.
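The subtraction and database-building step can be sketched as follows. This is an illustrative NumPy sketch with invented names and synthetic depth images, not the actual processing code:

```python
import numpy as np

# Kinect v1 depth image resolution.
H, W = 480, 640

raw_stack, err_stack = [], []

def add_scan(raw_depth, fitted_depth):
    """raw_depth, fitted_depth: (H, W) depth images in mm.

    Subtracts the fitted-plane depth image from the raw depth image and
    records both the per-pixel error and the reported depth it occurred at.
    """
    err_stack.append(raw_depth - fitted_depth)  # distortion at this depth
    raw_stack.append(raw_depth)                 # reported depth per pixel

# Example with two synthetic scans (uniform images for brevity):
add_scan(np.full((H, W), 612.0), np.full((H, W), 610.0))
add_scan(np.full((H, W), 1283.0), np.full((H, W), 1280.0))

# After all scans, pixel (v, u) has error samples err[:, v, u]
# observed at reported depths raw[:, v, u].
err = np.stack(err_stack)
raw = np.stack(raw_stack)
print(err[:, 0, 0])  # [2. 3.]
```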
A second order polynomial was fitted to the error data for each pixel, providing a relationship between the Kinect’s reported depth and the expected error. The polynomial coefficients associated with each pixel were then stored in a calibration file unique to the device.
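A minimal sketch of the per-pixel fit, shown for a single pixel with an assumed (invented) error model; `np.polyfit` stands in for whatever fitting routine was actually used:

```python
import numpy as np

# Plane scans every ~10 mm across the calibrated region.
depths = np.arange(600.0, 1300.0, 10.0)

# Assumed quadratic error model for one pixel, plus measurement noise.
true_error = 1e-6 * depths**2 - 0.002 * depths + 1.0
errors = true_error + np.random.default_rng(1).normal(scale=0.05, size=depths.size)

# np.polyfit returns [a, b, c] for a*d**2 + b*d + c (highest order first).
a, b, c = np.polyfit(depths, errors, deg=2)

def predicted_error(d):
    """Expected depth error (mm) at reported depth d (mm) for this pixel."""
    return np.polyval([a, b, c], d)
```

In a full calibration this fit is repeated for every pixel and the coefficients saved per device, e.g. as an (H, W, 3) array.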
During subsequent data collections, the entire image space was traversed and the reported depth of each pixel used in conjunction with the requisite polynomial coefficients (for that particular pixel) to calculate a correction factor. Once the depth of each point had been corrected, the depth image was passed to the Kinect’s coordinate mapping function to convert it to XYZ data. Figures 6 and 7 show a side view of a cylinder scan, illustrating how the scan looks before and after distortion correction.
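The correction pass could look something like the following vectorised sketch. The function and variable names are ours, and the constant 2 mm error model is purely illustrative:

```python
import numpy as np

def correct_depth(depth_mm, coeffs):
    """Apply the per-pixel quadratic error model to a whole depth image.

    depth_mm: (H, W) reported depths in mm.
    coeffs:   (H, W, 3) per-pixel quadratic coefficients [a, b, c]
              fitted at calibration time.
    """
    a, b, c = coeffs[..., 0], coeffs[..., 1], coeffs[..., 2]
    predicted_error = a * depth_mm**2 + b * depth_mm + c
    corrected = depth_mm - predicted_error
    # Pixels with no valid depth reading (0) are left untouched.
    return np.where(depth_mm > 0, corrected, depth_mm)

# Example: a constant 2 mm error everywhere (coeffs = [0, 0, 2]).
depth = np.full((480, 640), 800.0)
coeffs = np.zeros((480, 640, 3))
coeffs[..., 2] = 2.0
print(correct_depth(depth, coeffs)[0, 0])  # 798.0
```

The corrected image would then be handed to the Kinect's coordinate mapping function as described above.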
Figures 6 & 7, A raw and a distortion-corrected scan respectively
Our prediction on the cause of the distortion
If you are not familiar with the operating principles of the Kinect, it would be advisable to read our section on how the Kinect works, and ensure you are familiar with basic camera theory before reading further.
The Kinect’s depth image (and 3D scans) is actually produced by a ‘virtual camera’, calculated from the speckle reference pattern contained within the Kinect and uv data from the Kinect’s IR camera. The IR camera indicates the positions of the speckles after they have been shifted and morphed by objects in the space, which are then compared to the reference pattern. Stereo vision principles, together with a number of image processing techniques, are then used to calculate the depth of each point in space.
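The underlying triangulation can be illustrated with the standard stereo relation Z = f·b/d. The focal length and baseline below are approximate figures commonly quoted for the Kinect v1, not values from this study:

```python
# Illustrative only: the triangulation behind structured-light depth.
FOCAL_PX = 580.0    # IR camera focal length in pixels (approximate)
BASELINE_M = 0.075  # projector-to-camera baseline in metres (approximate)

def depth_from_disparity(disparity_px):
    """Depth (m) from speckle disparity (px) via the stereo relation Z = f*b/d."""
    return FOCAL_PX * BASELINE_M / disparity_px

print(depth_from_disparity(43.5))  # 1.0 m
print(depth_from_disparity(87.0))  # 0.5 m
```

Because depth varies inversely with disparity, any error in where a speckle lands relative to the reference pattern feeds directly into the computed Z.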
Our previous work has shown the Kinect’s IR camera to be calibrated for radial distortion, meaning that a point in space will appear in the correct place within the IR image (in terms of uv) and the observed ‘apparent’ depth will be correct. There appears to be a systematic shift of 3px in u and v between the IR image and the depth image, meaning that a point in space should appear in the correct uv position (neglecting scaling etc.) in both the IR image and the depth image.
This means the depth image uv values should be correct, yet the calculated Z values appear incorrect. The distortion of Z doesn’t appear to be related to the IR camera or the post-processing pipeline, suggesting the distortion of Z is related in some way to the speckle pattern. It’s unlikely that the reference pattern contained within the Kinect is itself distorted, which suggests the distortion arises from the IR projector’s optics.
It’s likely that the radial lens distortion of the IR projector hasn’t been taken into consideration, or that the protective plastic lens on the front of the Kinect is distorting the projected pattern. Either would cause the speckles to be projected incorrectly onto the scene, relative to the reference pattern, leading to the errors observed in depth.
A number of existing studies choose to correct depth in 3D space, without making any change to the X and Y values. However, as discussed above, the value of Z (relative to u and v) has an impact upon the calculated X and Y values. This is our reasoning for correcting the depth distortion in 2.5D space (uvZ): as highlighted, the uv data is accurate, and after correction the Z data is also accurate. The Kinect’s integral 2.5D to 3D mapping function can then be used to convert the 2.5D points to accurate 3D XYZ data.
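The reasoning that correcting Z in 2.5D also corrects X and Y can be seen from the pinhole back-projection that a coordinate mapping function implements. The intrinsics below are approximate Kinect v1 values used purely for illustration, not the factory calibration:

```python
import numpy as np

# Assumed (approximate) intrinsics for a 640x480 depth image.
FX, FY = 580.0, 580.0  # focal lengths (px)
CX, CY = 320.0, 240.0  # principal point (px)

def uvz_to_xyz(u, v, z):
    """Back-project a 2.5D (u, v, Z) point to 3D XYZ (same units as Z).

    Under the pinhole model, X and Y both scale with Z, so a corrected
    Z automatically propagates into corrected X and Y.
    """
    x = (u - CX) * z / FX
    y = (v - CY) * z / FY
    return np.array([x, y, z])

# A 10 mm change in Z at the image edge shifts X as well:
print(uvz_to_xyz(620, 240, 1000.0))  # X ≈ 517.2 mm
print(uvz_to_xyz(620, 240, 990.0))   # X ≈ 512.1 mm
```

This is why correcting Z alone in 3D space, leaving X and Y untouched, would leave residual errors that the 2.5D approach avoids.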
The caveat is that this is entirely our theory! Without knowing more about the internal operating principles of the PrimeSense SoC, or the data pipelines within their reference design, we cannot conclude whether the above discussion is correct. However, based on our extensive use of the Kinect, our research around this area, and knowledge of camera principles, this appears to be the most logical explanation.
Publications associated with this project
1. Clarkson S, Wheat J, Heller B, et al. (2013) Distortion Correction of Depth Data from Consumer Depth Cameras. 4th Int. Conf. Exhib. 3D Body Scanning