Abstract

Conventional methods for inspection and monitoring using human operators are tedious,
time consuming, and prone to errors. Technological innovations in the fields of image processing
and machine vision have provided opportunities to automate these manual tasks in
applications as diverse as surveillance, medical diagnostics, remote sensing, industrial quality
control, and precision agriculture. Such automation can increase efficiency, productivity,
effectiveness, speed, quality, and yield. Researchers over the years have investigated the
feasibility, applications, and implications of different sensors and image processing algorithms,
and now the use of low-cost sensors and mobile robots for monitoring and inspection
applications is a reality.
In this dissertation I focus on inspection and monitoring tasks that can be automated by applying
image processing techniques to the output obtained from optical sensors (color and
color-depth sensors). Although some of the industrial and military applications involving
vision sensors for automation have already evolved into real-world products, there are many
areas that still require thorough investigation and research for real-world implementation. I
target applications that have societal importance in the areas of safety and precision agriculture.
First, I present QuickBlaze, a flame and smoke detection system based on vision sensors,
aimed at early detection of fire incidents in open or enclosed, indoor or outdoor environments.
We use simple image and video processing techniques to compute motion and color cues,
enabling segmentation of flame and smoke candidates from the background in real time.
QuickBlaze does not require any offline training, although parameters must be adjusted
manually during a calibration phase to suit the particular camera's field of view and the
surrounding environment. In an extensive empirical evaluation benchmarking QuickBlaze
against commercial fire detection software, we find that it responds 2.66 times faster and
localizes fire incidents more accurately. By detecting fire early in the burning process, our
real-time video processing approach has the potential to shorten the critical period from
combustion to human response.
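The combination of motion and color cues described above can be sketched as follows. This is a minimal illustration in Python/NumPy on synthetic frames; the thresholds and the red-dominance color rule are illustrative assumptions, not QuickBlaze's actual parameters.

```python
import numpy as np

def flame_candidates(prev_frame, frame, motion_thresh=25, r_min=180):
    """Combine a motion cue (frame differencing) with a color cue
    (a simple red-dominant rule) to mask flame-candidate pixels.
    Thresholds are illustrative and would be tuned during calibration."""
    # Motion cue: per-pixel absolute difference between consecutive frames.
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16)).max(axis=2)
    motion = diff > motion_thresh
    # Color cue: flame pixels tend to satisfy R > G > B with a bright red channel.
    r = frame[..., 0].astype(int)
    g = frame[..., 1].astype(int)
    b = frame[..., 2].astype(int)
    color = (r > r_min) & (r > g) & (g > b)
    return motion & color

# Synthetic 8x8 RGB frames: a bright flame-like patch appears in the new frame.
prev = np.zeros((8, 8, 3), dtype=np.uint8)
cur = np.zeros((8, 8, 3), dtype=np.uint8)
cur[2:4, 2:4] = (220, 120, 40)   # flame-colored pixels

mask = flame_candidates(prev, cur)
print(mask.sum())  # 4 candidate pixels
```

In a real pipeline, the resulting candidate mask would be cleaned up and tracked over several frames before an alarm is raised.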
Second, I present a novel method for joint localization of a quadcopter pursuer with a
monocular camera and an arbitrary target. Our focus is on mobile robots that are capable
of tracking and monitoring a target in scenarios such as person/child/animal monitoring or
tracking a fugitive. Our method localizes both the pursuer and target with respect to a common
reference frame. We show that predicting and correcting pursuer and target trajectories
simultaneously produces better results than standard approaches to estimating relative target
trajectories in a 3D coordinate system. The effectiveness of the proposed method is demonstrated
by a series of experiments with a real quadcopter pursuing a human. The results
show that the visual tracker can deal effectively with target occlusions and that joint localization
outperforms standard localization methods.
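The idea of jointly predicting and correcting pursuer and target states can be illustrated with a toy Kalman filter over a stacked state vector. This hypothetical 1-D constant-velocity sketch, with assumed noise covariances, is only an illustration of the joint-estimation principle, not the dissertation's actual 3-D formulation.

```python
import numpy as np

dt = 1.0
# Stacked joint state: [pursuer pos, pursuer vel, target pos, target vel].
F = np.array([[1, dt, 0, 0],
              [0, 1,  0, 0],
              [0, 0,  1, dt],
              [0, 0,  0, 1]], dtype=float)
# Joint measurement: pursuer position (e.g. odometry) and the target's
# offset relative to the pursuer (e.g. from the visual tracker).
H = np.array([[ 1, 0, 0, 0],
              [-1, 0, 1, 0]], dtype=float)
Q = np.eye(4) * 1e-4   # process noise covariance (assumed)
R = np.eye(2) * 1e-2   # measurement noise covariance (assumed)

x = np.zeros(4)        # initial joint estimate
P = np.eye(4)

for k in range(1, 21):
    pursuer, target = 1.0 * k, 5.0 + 0.5 * k   # ground-truth motion
    # Predict both trajectories with the shared motion model.
    x = F @ x
    P = F @ P @ F.T + Q
    # Correct with the joint measurement [pursuer pos, target - pursuer].
    z = np.array([pursuer, target - pursuer])
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P

print(np.round(x, 1))  # estimates approach pursuer [20, 1] and target [15, 0.5]
```

Because the relative measurement couples the two sub-states, correcting them jointly lets pursuer odometry and visual target observations constrain each other, which is the intuition behind the improvement over independent relative-trajectory estimation.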
Third, I present a textured fruit segmentation method based on super-pixel over-segmentation,
dense SIFT descriptors, and bag-of-visual-word histogram classification within each super-pixel.
An empirical evaluation of the proposed technique for textured fruit segmentation
yields a 96.67% detection rate, a per-pixel accuracy of 97.657%, and a per-frame false alarm
rate of 0.645%, compared to a detection rate of 90.0%, accuracy of 84.94%, and false alarm
rate of 0.887% for the baseline sparse keypoint-based method. I conclude that super-pixel
over-segmentation, dense SIFT descriptors, and bag-of-visual-word histogram classification
are effective for in-field segmentation of textured green fruits from the background.
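The per-super-pixel bag-of-visual-words step can be sketched as follows: each local descriptor is assigned to its nearest visual word, and the normalized word histogram summarizes the super-pixel for classification. Random vectors stand in here for dense SIFT descriptors and a learned vocabulary.

```python
import numpy as np

def bovw_histogram(descriptors, vocabulary):
    """Quantize local descriptors against a visual-word vocabulary and
    return an L1-normalized bag-of-visual-words histogram.
    In the fruit segmentation pipeline this would be computed per super-pixel
    from dense SIFT descriptors; both inputs here are random stand-ins."""
    # Distance from every descriptor to every visual word.
    d = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    words = d.argmin(axis=1)                      # nearest-word assignment
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()                      # normalized histogram

rng = np.random.default_rng(0)
vocab = rng.normal(size=(50, 128))    # 50 visual words, 128-D (SIFT-sized)
desc = rng.normal(size=(200, 128))    # dense descriptors inside one super-pixel
h = bovw_histogram(desc, vocab)
print(h.shape, round(h.sum(), 6))     # (50,) 1.0
```

Each super-pixel's histogram would then be fed to a classifier (such as an SVM) to label it as fruit or background.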
Fourth, I present two new methods for automated counting of fruit in images of mango
tree canopies, one using texture-based dense segmentation and one using shape-based fruit
detection, and compare these methods with existing techniques. We tested
the robustness of each algorithm on multiple sets of images of mango trees acquired over
a period of three years. These image sets vary in imaging conditions (light and exposure),
distance to the tree, average number of fruit on the tree, orchard, and season. I find that
for fruit-background segmentation, either K-nearest neighbor pixel classification based on
color and smoothness, or pixel classification based on super-pixel over-segmentation, clustering
of dense SIFT (Scale-Invariant Feature Transform) features into visual words, and
bag-of-visual-word super-pixel classification using support vector machines, is more effective
than simple contrast- and color-based segmentation. I find that pixel classification is best
followed by fruit detection using an elliptical shape model or by blob detection using color
filtering and morphological image processing techniques.
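The counting step after blob detection can be sketched as a connected-components count on a binary mask. The toy mask below stands in for the output of color filtering and morphological cleanup; the breadth-first flood fill is a minimal illustration, not the dissertation's actual implementation.

```python
import numpy as np
from collections import deque

def count_blobs(mask):
    """Count 4-connected components in a binary mask via BFS flood fill.
    In a fruit-counting pipeline the mask would come from color filtering
    and morphological cleanup; here it is a hand-made toy example."""
    seen = np.zeros_like(mask, dtype=bool)
    rows, cols = mask.shape
    blobs = 0
    for i in range(rows):
        for j in range(cols):
            if mask[i, j] and not seen[i, j]:
                blobs += 1                       # new component found
                q = deque([(i, j)])
                seen[i, j] = True
                while q:                         # flood-fill the component
                    r, c = q.popleft()
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        rr, cc = r + dr, c + dc
                        if (0 <= rr < rows and 0 <= cc < cols
                                and mask[rr, cc] and not seen[rr, cc]):
                            seen[rr, cc] = True
                            q.append((rr, cc))
    return blobs

mask = np.zeros((6, 6), dtype=bool)
mask[1:3, 1:3] = True      # first fruit blob
mask[4, 4] = True          # second fruit blob
print(count_blobs(mask))   # 2
```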
Fifth, I investigate modeling of natural objects using RGB-D sensors
and a combination of volumetric 3D reconstruction and parametric shape modeling.
We apply the general method to the specific case of detecting and modeling quadric objects
(pineapple fruit) in cluttered agricultural environments, towards applications in fruit health
monitoring and crop yield prediction. Our method first estimates the camera trajectory and then performs
volumetric reconstruction of the scene. Next, we detect fruit and segment out point
clouds that belong to fruit regions. We use two novel methods for robust estimation of a
parametric shape model from the dense point cloud: (i) MSAC-based robust fitting of an
ellipsoid to the 3D point cloud, and (ii) nonlinear least-squares minimization of dense SIFT
descriptor distances between fruit pixels in corresponding
frames. We compare our shape modeling methods with a baseline direct ellipsoid estimation
method. We find that our parametric shape modeling methods are more robust and better
able to estimate the size, shape, and volume of pineapple fruit than the baseline direct
method.
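The MSAC-based robust fitting idea can be sketched as follows. This simplified illustration fits an axis-aligned algebraic quadric to a synthetic, sphere-sampled point cloud with added clutter; the thresholds, iteration count, and axis-aligned restriction are assumptions of the sketch, not the actual method.

```python
import numpy as np

def fit_quadric(points):
    """Least-squares fit of an axis-aligned quadric
    A x^2 + B y^2 + C z^2 + D x + E y + F z = 1."""
    x, y, z = points.T
    M = np.column_stack([x * x, y * y, z * z, x, y, z])
    theta, *_ = np.linalg.lstsq(M, np.ones(len(points)), rcond=None)
    return theta

def residuals(points, theta):
    x, y, z = points.T
    M = np.column_stack([x * x, y * y, z * z, x, y, z])
    return np.abs(M @ theta - 1.0)

def msac_ellipsoid(points, iters=200, thresh=0.05, seed=0):
    """MSAC loop: minimal 6-point fits scored with a truncated quadratic
    loss, followed by a least-squares refit on the best model's inliers."""
    rng = np.random.default_rng(seed)
    best_score, best_theta = np.inf, None
    for _ in range(iters):
        sample = points[rng.choice(len(points), 6, replace=False)]
        theta = fit_quadric(sample)
        r = residuals(points, theta)
        score = (np.minimum(r, thresh) ** 2).sum()  # MSAC truncated loss
        if score < best_score:
            best_score, best_theta = score, theta
    inliers = residuals(points, best_theta) < thresh
    return fit_quadric(points[inliers])

# Synthetic ellipsoid surface: center (1, 2, 3), semi-axes (2, 1, 1.5).
rng = np.random.default_rng(1)
u = rng.normal(size=(300, 3))
u /= np.linalg.norm(u, axis=1, keepdims=True)      # random unit directions
surface = np.array([1, 2, 3]) + u * np.array([2, 1, 1.5])
outliers = rng.uniform(-5, 10, size=(60, 3))       # clutter points

A, B, C, D, E, F = msac_ellipsoid(np.vstack([surface, outliers]))
center = np.array([-D / (2 * A), -E / (2 * B), -F / (2 * C)])
print(np.round(center, 2))   # close to the true center [1, 2, 3]
```

The truncated loss is what distinguishes MSAC from plain RANSAC: inliers contribute their actual squared residual rather than zero, so models are ranked by fit quality as well as inlier count.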
The techniques proposed in this dissertation will aid the development of new, and the
evolution of existing, machine-vision-based civilian applications that are emerging with the
availability of low-cost optical sensors and computing systems.