Team Seven/Final Paper

From Maslab 2012

Team pleit! competed in Maslab 2012 and won first place.

= Robot =
 
== Mechanical design ==
 
Coming soon

== Electrical concerns ==

Coming soon

== Arduino functionality ==

Coming soon

== Vision code ==
The key to our successful vision code was its simplicity.  As computer vision problems go, Maslab is on the easier side -- field elements are colored specifically to help robots detect them.  In particular, red and yellow are very distinctive colors and easy for a computer to identify.
 
=== Design ===
 
The 640-by-480 color image from our Kinect was downsampled to 160-by-120.  In order to minimize noise, we downsampled using area averaging rather than nearest-neighbor interpolation.  The 160-by-120 image was then converted to the HSV color space so that hue and luminosity information could be processed separately.
 
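As a minimal sketch of this step (these are the standard OpenCV calls for area-averaged resizing and HSV conversion; the snippet is an illustration rather than our exact code):

<pre>
import cv2

def preprocess(frame_bgr):
    """Downsample 640x480 -> 160x120 with area averaging, then convert to HSV."""
    small = cv2.resize(frame_bgr, (160, 120), interpolation=cv2.INTER_AREA)
    return cv2.cvtColor(small, cv2.COLOR_BGR2HSV)
</pre>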
Pixels in the HSV image were then classified as red, yellow, or other.  We found that the best metric for color recognition was distance in HSV space (the sum of squares of the component errors) rather than hue thresholding.
 
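To illustrate the metric, a rough sketch -- the reference colors and threshold below are made-up placeholders rather than our tuned constants, and a full implementation would also handle hue wrap-around near red:

<pre>
import numpy as np

RED_HSV = np.array([5, 200, 180], dtype=np.int32)       # placeholder reference color
YELLOW_HSV = np.array([30, 200, 200], dtype=np.int32)   # placeholder reference color
THRESHOLD = 60 ** 2                                     # squared-distance cutoff (placeholder)

def classify(hsv):
    """Label each pixel 0 = other, 1 = red, 2 = yellow by squared distance in HSV space."""
    pixels = hsv.astype(np.int32)
    dist_red = ((pixels - RED_HSV) ** 2).sum(axis=2)
    dist_yellow = ((pixels - YELLOW_HSV) ** 2).sum(axis=2)
    labels = np.zeros(hsv.shape[:2], dtype=np.uint8)
    labels[(dist_red < THRESHOLD) & (dist_red <= dist_yellow)] = 1
    labels[(dist_yellow < THRESHOLD) & (dist_yellow < dist_red)] = 2
    return labels
</pre>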
Flood fill was then used to identify all connected components of each color.  For each connected component, we computed the average and variance of all x coordinates, the average and variance of all y coordinates, and the total number of pixels.
 
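In outline, the flood fill and per-component statistics looked something like this pure-Python sketch (our actual version was Cython-compiled and differed in detail):

<pre>
from collections import deque

def connected_components(labels):
    """Flood-fill a 2-D label image; return statistics for each connected component."""
    height, width = len(labels), len(labels[0])
    seen = [[False] * width for _ in range(height)]
    blobs = []
    for y0 in range(height):
        for x0 in range(width):
            if labels[y0][x0] == 0 or seen[y0][x0]:
                continue
            color = labels[y0][x0]
            # Breadth-first flood fill over 4-connected neighbors of the same color.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            xs, ys = [], []
            while queue:
                x, y = queue.popleft()
                xs.append(x)
                ys.append(y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and not seen[ny][nx] and labels[ny][nx] == color):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            n = len(xs)
            mean_x, mean_y = sum(xs) / n, sum(ys) / n
            blobs.append({
                "color": color,   # 1 = red, 2 = yellow
                "size": n,        # total number of pixels
                "x": mean_x,
                "y": mean_y,
                "var_x": sum((x - mean_x) ** 2 for x in xs) / n,
                "var_y": sum((y - mean_y) ** 2 for y in ys) / n,
            })
    return blobs
</pre>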
When navigating to a ball, we simply used the coordinates of the largest red blob in the frame.  Similarly, when navigating to a wall, we simply used the coordinates of the largest yellow blob in the frame.  We did not attempt to track changes in the location of objects from frame to frame -- we just used the largest object in each frame.
 
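Expressed against the blob records from the sketch above, that selection amounts to little more than:

<pre>
def largest_blob(blobs, color):
    """Return the biggest blob of the given color in this frame, or None if none is visible."""
    candidates = [b for b in blobs if b["color"] == color]
    return max(candidates, key=lambda b: b["size"]) if candidates else None
</pre>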
We used a simple proportional controller to drive to target objects using vision data.  This was necessary because our drive motors used open-loop control.
 
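A proportional steering law of this kind might look like the following; the gain, base speed, and drive interface here are placeholders rather than our real values:

<pre>
IMAGE_WIDTH = 160
KP_TURN = 0.6         # proportional gain (placeholder)
FORWARD_SPEED = 0.4   # base forward speed in normalized motor units (placeholder)

def drive_command(target_x):
    """Steer toward the target's x coordinate while driving forward."""
    # Horizontal offset of the target from the image center, normalized to [-1, 1].
    error = (target_x - IMAGE_WIDTH / 2) / (IMAGE_WIDTH / 2)
    turn = KP_TURN * error
    return FORWARD_SPEED + turn, FORWARD_SPEED - turn   # (left, right) motor powers
</pre>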
=== Implementation ===
 
The bulk of our vision code was written in pure Python, with the most CPU-intensive parts compiled using Cython.  We had fewer than 200 lines of vision code overall.  The image resizing and color space conversion used OpenCV, but all other code (including flood fill) was custom.
 
We wrote a GUI app that displayed an image with the recognized objects marked and had sliders for setting the color detection constants.  We anticipated adjusting these constants before every match -- however, our color detection proved robust enough that we did not adjust the color parameters at all after the second week.
 
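Our tuner was a custom app, but OpenCV trackbars are enough to reproduce the idea; something along these lines (the slider names and ranges are illustrative):

<pre>
import cv2

def nothing(_):
    pass

# One slider per color-detection constant.
cv2.namedWindow("tuner")
cv2.createTrackbar("red hue", "tuner", 5, 179, nothing)
cv2.createTrackbar("distance threshold", "tuner", 60, 255, nothing)

while True:
    hue = cv2.getTrackbarPos("red hue", "tuner")
    threshold = cv2.getTrackbarPos("distance threshold", "tuner")
    # Grab a frame, classify it with the current constants, draw boxes around the
    # recognized blobs, and display the result with cv2.imshow("tuner", ...).
    if cv2.waitKey(30) & 0xFF == 27:   # Esc to quit
        break
</pre>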
We also wrote a speed tester that automatically profiled the code to help us find and eliminate bottlenecks.
 
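The standard library's cProfile module captures the essence of that tool; a sketch, where process_frame and frames stand in for our pipeline and a set of recorded test images:

<pre>
import cProfile
import pstats

def profile_vision(process_frame, frames):
    """Run the vision pipeline over recorded frames and print the ten hottest calls."""
    profiler = cProfile.Profile()
    profiler.enable()
    for frame in frames:
        process_frame(frame)
    profiler.disable()
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)
</pre>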
=== Performance ===
 
The first iteration of our vision code ran at 10 frames per second.  Using the Python profiler to determine which statements were bottlenecks, we increased it to 100 frames per second by the second week.  Since the camera only provided 30 frames per second, we had plenty of time between frames to do other processing.  This eliminated the need to use multiple threads of execution.
 
The most expensive part of our vision code was downsampling the 0.3-megapixel image to 0.02 megapixels.  After that, everything else was very quick.
 
From a programming standpoint, it was very convenient to do everything in a single thread: get image, process image, update state machine, send new drive commands, and repeat.
 
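As a sketch of that loop (all of the object names here are hypothetical):

<pre>
def run(camera, vision, state_machine, arduino):
    """Single-threaded control loop: no locks, no queues, no race conditions."""
    while True:
        frame = camera.get_frame()               # get image
        blobs = vision.process(frame)            # process image
        command = state_machine.update(blobs)    # update state machine
        arduino.send_drive_command(command)      # send new drive commands
</pre>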
=== Effectiveness ===
 
Our vision code was extremely effective at detecting red and yellow objects.  We had very few false negatives and no false positives.  Green detection was easy as well, but we got rid of it since we didn't use green object information at all.  It was difficult, however, to distinguish the carpet, the blue tops of walls, and the white walls from one another.
 
Rather than implementing blue line filtering, we mounted our camera so that the top of its field of view was perfectly horizontal and 5 inches above the floor.  It was physically unable to see over walls.  In this configuration, it was not necessary to identify blue or white.
 
We were one of the teams using a Kinect for vision.  We did not end up using the depth data at all, since it was only reliable for objects two feet or more from the robot.  However, we still feel that it was advantageous to use the Kinect, since its regular camera (designed for video games) had better color and white balance properties than the cheap Logitech webcam.
  
 
== Navigation code ==
 
Coming soon
 