Converting 2D golf swing sequences into 3D models

Wednesday, January 30, 2008

Detection Using a Kind of Hysteresis

We are experimenting with an approach that makes use of two Hough transforms (we'll call them HI and LO) -- one with higher thresholds for line segment length and proximity, and one with lower thresholds. [Our use of HI and LO probably isn't the best choice. HI corresponds to a strict threshold that potentially throws away good signal. LO corresponds to a looser threshold that may let in more bad signal.]

Aside: OpenCV's cvHoughLines2 returns a traditional rho-theta representation if passed the CV_HOUGH_STANDARD flag, and finite line segments (with x-y endpoint coordinates) if passed the CV_HOUGH_PROBABILISTIC flag.
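
To make the difference between the two representations concrete, here is a small sketch in plain Python (not OpenCV code) that turns a rho-theta pair into two drawable endpoints. It relies only on the standard Hough line equation x*cos(theta) + y*sin(theta) = rho; the function name and the half_length parameter are hypothetical, chosen for illustration.

```python
import math

def rho_theta_to_endpoints(rho, theta, half_length=1000.0):
    """Return two points on the infinite line x*cos(theta) + y*sin(theta) = rho.

    half_length is an arbitrary drawing extent (an assumption for illustration);
    a rho-theta line has no endpoints of its own.
    """
    a, b = math.cos(theta), math.sin(theta)
    # Point on the line closest to the origin.
    x0, y0 = a * rho, b * rho
    # (-b, a) is a unit vector running along the line.
    p1 = (x0 - half_length * b, y0 + half_length * a)
    p2 = (x0 + half_length * b, y0 - half_length * a)
    return p1, p2
```

The probabilistic variant skips this step because it already reports finite segments, which is what the tracking described here needs.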

At a high level, our approach keeps line segments from HI when they exist, and when they don't, it keeps line segments from LO that are "near" the last detected club segment.

We make several simple implementation choices, but we still achieve decent results (demonstrated below):
  • We only keep the results from HI when there are exactly 2 segments. For the parameters we have chosen for HI and for the test video we are using, there are often exactly 2 segments corresponding to either edge of the club shaft. We can relax this by also allowing 1 (or 3) good segment(s) to serve as our club hypothesis, or 2 pairs of 2 segments that are co-linear but disconnected (e.g. because of an occlusion).
  • The first time a HI segment is recognized, we assume that this is a good starting position for the club. We compare the two endpoints of the segment and label the higher one as the original hand position. We then use this as an anchor point to determine, in all subsequent segments, which end is the hand and which end is the club head. Of course, if the first HI segment is not a good match for the club shaft, the entire tracking algorithm will suffer.
  • We keep track of the last club segment detected by HI for use when a subsequent frame has no HI signal. When a frame has only LO signal, all of its individual segments are compared to the last HI segment. We choose the segment that minimizes a simple error metric, defined to be the difference in slope and in positions of hand and head points. A more sophisticated approach would be to incorporate a notion of the motion model of the club. This, however, seems like overkill, because the motion model is the thing we want to *learn* in a later phase.
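
The choices above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used for the videos: segments are pairs of (x, y) points in image coordinates (y grows downward, so "higher" means smaller y), the endpoint nearer the anchor is taken to be the hand, and the orientation term uses the angle between segments rather than the raw slope so that vertical segments do not divide by zero. All function names and the angle_weight value are hypothetical.

```python
import math

def initial_hand(segment):
    """Pick the higher endpoint (smaller y in image coordinates) as the hand."""
    p, q = segment
    return p if p[1] <= q[1] else q

def orient(segment, anchor):
    """Return (hand, head); assumes the endpoint nearer the anchor is the hand."""
    p, q = segment
    if math.dist(p, anchor) <= math.dist(q, anchor):
        return p, q
    return q, p

def angle(hand, head):
    """Direction of the segment from hand to head, in radians."""
    return math.atan2(head[1] - hand[1], head[0] - hand[0])

def segment_error(candidate, reference, anchor, angle_weight=100.0):
    """Orientation difference plus hand and head displacement.

    reference is already ordered (hand, head). angle_weight is an arbitrary
    scale that puts radians and pixels on a comparable footing.
    """
    c_hand, c_head = orient(candidate, anchor)
    r_hand, r_head = reference
    d_angle = abs(angle(c_hand, c_head) - angle(r_hand, r_head))
    d_angle = min(d_angle, 2 * math.pi - d_angle)  # wrap around the circle
    return (angle_weight * d_angle
            + math.dist(c_hand, r_hand)
            + math.dist(c_head, r_head))

def closest_segment(candidates, reference, anchor):
    """Choose the LO segment with the smallest error against the reference."""
    return min(candidates, key=lambda s: segment_error(s, reference, anchor))
```

For example, with a vertical reference segment from (100, 50) to (100, 250) and the anchor at (100, 50), a nearly vertical candidate a couple of pixels away beats both a parallel segment 200 pixels to the side and a horizontal segment starting near the hand.
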
Here are results from the approach described above, with green lines representing segments detected by HI, red lines representing the closest LO segments to the last HI segment detected, yellow dots representing our notion of hand location, and white dots representing our notion of club head location.


[Embedded YouTube video]

This does okay, except in long stretches of frames where no HI signal is observed. A simple way to address this problem is to keep track of the last segment detected, whether it be from HI or LO. This approach is demonstrated below, and does much better with long stretches of LO frames.
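
The difference between the two variants comes down to one line in the per-frame loop. Here is a minimal sketch, assuming each frame is given as a pair (hi_segments, lo_segments), that segments are already ordered (hand, head), and that the first of the two HI segments stands in for the club. The error function is a simplified stand-in (summed endpoint distances), and the names track and update_from_lo are hypothetical.

```python
import math

def endpoint_error(segment, reference):
    """Simplified error: summed distances between matching endpoints."""
    return (math.dist(segment[0], reference[0])
            + math.dist(segment[1], reference[1]))

def track(frames, update_from_lo=True):
    """Return one (source, segment) pair per frame.

    update_from_lo=False compares LO segments to the last HI segment only;
    update_from_lo=True compares them to the last segment from either source.
    """
    last = None
    output = []
    for hi_segments, lo_segments in frames:
        if len(hi_segments) == 2:
            # Assumption: use the first of the two shaft-edge segments.
            last = hi_segments[0]
            output.append(("HI", last))
        elif last is not None and lo_segments:
            best = min(lo_segments, key=lambda s: endpoint_error(s, last))
            output.append(("LO", best))
            if update_from_lo:
                last = best  # the only difference between the two variants
        else:
            output.append((None, None))  # nothing usable in this frame
    return output
```

When the club keeps moving through a run of LO-only frames, the reference in the first variant goes stale and a stationary noise segment can end up closer to it than the club itself; updating the reference from LO avoids that, at the cost of letting a bad LO match propagate.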



[Embedded YouTube video]

Notice that in the vast majority of frames, the line segment we output is at least co-linear with the club shaft, even though the length is often incorrect. We should be able to use color information to better hypothesize the length of the shaft.

Although our simple "tracking" approach works well on this test case, one possibility to improve accuracy is to maintain a window of the last k frames instead of just 1.
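
One way the k-frame window might look, as a sketch only: keep the last k accepted segments in a fixed-size queue and compare new candidates against their endpoint-wise mean. Averaging is an assumption made here for illustration (a median or a weighted mean would be equally plausible), and the class name SegmentWindow is hypothetical. Segments are assumed to be ordered (hand, head).

```python
from collections import deque

class SegmentWindow:
    """Holds the last k accepted club segments and summarizes them."""

    def __init__(self, k=5):
        # deque with maxlen drops the oldest segment automatically.
        self.history = deque(maxlen=k)

    def add(self, segment):
        self.history.append(segment)

    def reference(self):
        """Endpoint-wise mean of the stored segments, or None if empty."""
        if not self.history:
            return None
        n = len(self.history)
        hand = (sum(s[0][0] for s in self.history) / n,
                sum(s[0][1] for s in self.history) / n)
        head = (sum(s[1][0] for s in self.history) / n,
                sum(s[1][1] for s in self.history) / n)
        return hand, head
```

With k = 1 this reduces to comparing against the single last segment. Larger k smooths over an occasional bad match but makes the reference lag behind a fast-moving club.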
