Solution
Our system uses Microsoft Kinect cameras to acquire depth images from above the door. The depth data is then run through an image processing pipeline that extracts individual human heads. The heads are then tracked, and counted when they cross a defined room edge.
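The counting step can be illustrated with a minimal sketch. It assumes each tracked head is available as a list of (x, y) image positions and that the room edge is given by two points; the names `side_of_edge` and `count_crossings` are hypothetical and not taken from the project, and the edge is treated as an infinite line for simplicity.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def side_of_edge(p: Point, a: Point, b: Point) -> float:
    """Signed area test: positive on one side of the line a->b, negative on the other."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def count_crossings(track: List[Point], a: Point, b: Point) -> Tuple[int, int]:
    """Return (entered, left) for one tracked head.

    Assumption: the positive side of the edge is "inside the room".
    """
    entered = 0
    left = 0
    prev: Optional[float] = None  # side of the last position that was off the edge
    for p in track:
        s = side_of_edge(p, a, b)
        if s == 0:
            continue  # exactly on the edge: wait until the head picks a side
        if prev is not None and (prev < 0) != (s < 0):
            if s > 0:
                entered += 1
            else:
                left += 1
        prev = s
    return entered, left


# Hypothetical edge along the x axis; positions with y > 0 count as inside.
edge_a, edge_b = (0.0, 0.0), (10.0, 0.0)
walk_in = [(5.0, -2.0), (5.0, -1.0), (5.0, 1.0), (5.0, 3.0)]
print(count_crossings(walk_in, edge_a, edge_b))
```

Comparing the side of consecutive positions, rather than checking only the final position, means a head that enters and then leaves again is counted in both directions.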
The image processing pipeline is built to be highly configurable and modular, and was developed alongside the main project so that it could be reused in the future.
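As a rough illustration of what a configurable, modular pipeline can look like, the sketch below registers processing steps by name and builds the pipeline from a configuration list. Everything in it is an assumption made for the example: the step names `clip` and `threshold`, their parameters, and the configuration layout are hypothetical and do not describe the project's actual API.

```python
from typing import Any, Callable, Dict, List, Tuple

Image = List[List[int]]  # depth image as rows of depth values (units assumed to be mm)
REGISTRY: Dict[str, Callable[..., Image]] = {}


def step(name: str):
    """Register a processing step under a name usable from configuration."""
    def register(fn: Callable[..., Image]) -> Callable[..., Image]:
        REGISTRY[name] = fn
        return fn
    return register


@step("clip")
def clip(img: Image, lo: int = 0, hi: int = 4000) -> Image:
    """Clamp depth values to the range [lo, hi]."""
    return [[min(max(v, lo), hi) for v in row] for row in img]


@step("threshold")
def threshold(img: Image, below: int = 1500) -> Image:
    """Mark pixels closer to the camera than `below` (candidate heads) with 1."""
    return [[1 if v < below else 0 for v in row] for row in img]


def run_pipeline(
    img: Image, config: List[Dict[str, Any]], keep_intermediate: bool = False
) -> Tuple[Image, List[Tuple[str, Image]]]:
    """Apply the configured steps in order, optionally keeping each intermediate result."""
    stages: List[Tuple[str, Image]] = []
    for entry in config:
        params = {k: v for k, v in entry.items() if k != "step"}
        img = REGISTRY[entry["step"]](img, **params)
        if keep_intermediate:
            stages.append((entry["step"], img))
    return img, stages


config = [
    {"step": "clip", "lo": 0, "hi": 4000},
    {"step": "threshold", "below": 1500},
]
depth = [[3000, 1200], [5000, 900]]
mask, stages = run_pipeline(depth, config, keep_intermediate=True)
print(mask)
```

Keeping steps behind a name-based registry lets the order and parameters change through configuration alone, and retaining intermediate results is one way a front end could show the output of each step.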
The solution is cross-platform in the sense that it works on most UNIX-like platforms as well as on Windows. It supports multiple Kinects for rooms with multiple entrances. It is delivered as a GUI version, supporting calibration and configuration, and a headless version. The GUI version also enables easy debugging: in most cases it explains what is wrong when a problem occurs, and it lets the user display most intermediate steps of the image processing. The latter gives the operator a feel for how the different parameters affect the system, without needing any deeper knowledge of computer vision.
The figures below show examples of tracking and queue detection performed by the system.
![]() |
![]() |
Performance
Our system is very accurate, as can be seen in the table below. Under normal circumstances it counts well over 90 % of all people passing by.