Wednesday, February 23, 2011

HOG Descriptor

Excellent paper by Dalal and Triggs. It gives a worked example of how to choose the various modules in a recognition pipeline for human figures (pedestrians).

Much simplified summary
It uses histograms of gradient orientations as a descriptor in a 'dense' setting, meaning that it does not detect keypoints the way sparse detectors such as SIFT do. Each feature vector is computed from a window (64x128) placed across an input image. The vector is made up of histograms of gradient orientations (9 bins from 0-180 degrees; +/- directions count as the same). Each histogram is collected within a cell of pixels (8x8). The contrasts are locally normalized over a block of 2x2 cells (16x16 pixels); normalization is an important enhancement. The block moves in 8-pixel steps, half the block size, so each interior cell contributes to 4 different normalization blocks. A linear SVM is trained to classify whether a window contains a human figure or not. The output of a trained linear SVM is a set of coefficients, one for each element of the feature vector.
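The numbers above determine the length of the feature vector. A minimal sketch of that arithmetic in plain Python follows; the function name and its parameter names are my own, chosen for illustration, and are not taken from OpenCV.

```python
def hog_descriptor_length(win=(64, 128), cell=(8, 8), block_cells=(2, 2),
                          block_stride=(8, 8), nbins=9):
    """Number of values in one HOG feature vector for a detection window.

    Hypothetical helper: sizes are (width, height) in pixels, except
    block_cells, which is the block size measured in cells.
    """
    # Block size in pixels: 2x2 cells of 8x8 pixels gives 16x16.
    block = (cell[0] * block_cells[0], cell[1] * block_cells[1])
    # Number of block positions along each axis as the block slides.
    blocks_x = (win[0] - block[0]) // block_stride[0] + 1
    blocks_y = (win[1] - block[1]) // block_stride[1] + 1
    # Every block contributes one histogram per cell it covers.
    cells_per_block = block_cells[0] * block_cells[1]
    return blocks_x * blocks_y * cells_per_block * nbins


length = hog_descriptor_length()
print(length)  # 7 x 15 blocks * 4 cells * 9 bins = 3780
```

Because the blocks overlap, each cell histogram appears several times in the vector, each time normalized against a different neighbourhood.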

I presume 'linear SVM' means that the kernel is linear, with no projection to a higher dimension. The paper by Hsu et al. suggests that a linear method is enough when the feature dimension is already high.
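With a linear kernel, classifying a window reduces to a weighted sum of the feature vector plus a bias, compared against a threshold. The sketch below illustrates that idea only; the function names, the toy coefficients and the `threshold` parameter are assumptions made up for the example, not values or code from the paper or from OpenCV.

```python
def linear_svm_score(weights, bias, features):
    """Decision value of a linear classifier: bias + sum of w_i * x_i."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return bias + sum(w * x for w, x in zip(weights, features))


def is_positive(weights, bias, features, threshold=0.0):
    """True when the decision value exceeds the (hypothetical) threshold."""
    return linear_svm_score(weights, bias, features) > threshold


# Toy 3-element example; a real HOG vector would be far longer.
toy_weights = [0.5, -1.0, 2.0]
toy_bias = -0.25
toy_features = [1.0, 0.5, 0.25]

score = linear_svm_score(toy_weights, toy_bias, toy_features)
print(score)                                                   # 0.25
print(is_positive(toy_weights, toy_bias, toy_features))        # True
print(is_positive(toy_weights, toy_bias, toy_features, 0.5))   # False
```

Raising the threshold makes the classifier stricter, and a negative threshold makes it more permissive.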

OpenCV implementation (hog.cpp, objdetect.hpp)
The HOGDescriptor class is not found in the API documentation. Here are the notable points, judging by the source code and the sample program (people_detect.cpp):

  • Comes with a default human detector. The file comment says that it is "compatible with the INRIA Object Detection and Localization toolkit". I presume this is a trained linear SVM classifier represented as a vector of coefficients.
  • No need to call SVM code. The HOGDescriptor.detect() function simply applies the coefficients to the input feature vector to compute a weighted sum. If the sum is greater than the user-specified 'hitThreshold' (defaults to 0), then the window is a human figure.
  • 'hitThreshold' argument could be negative.
  • 'winStride' argument (default 8x8) controls how the window is slid across the input image.
  • detectMultiScale() arguments
    • 'groupThreshold' is passed through to the cv::groupRectangles() API - non-max suppression?
    • 'scale0' controls how much down-sampling is performed on the input image before calling 'detect()'. The down-sampling is repeated for up to 'nlevels' levels (default 64). All levels could be done in parallel.
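The way 'scale0' and 'nlevels' interact can be sketched as a list of down-sampling factors. This is a rough illustration of the idea in plain Python, not OpenCV's code: `pyramid_scales` is a hypothetical helper, and I am assuming that levels stop as soon as the shrunken image can no longer hold one detection window.

```python
def pyramid_scales(img_size, win_size=(64, 128), scale0=1.05, nlevels=64):
    """Down-sampling factors at which the detector window would be run.

    Hypothetical helper; sizes are (width, height) in pixels.
    """
    scales = []
    scale = 1.0
    for _ in range(nlevels):
        scaled_w = round(img_size[0] / scale)
        scaled_h = round(img_size[1] / scale)
        # Assumed stopping rule: the window must still fit in the image.
        if scaled_w < win_size[0] or scaled_h < win_size[1]:
            break
        scales.append(scale)
        if scale0 <= 1.0:
            break  # no further shrinking is possible
        scale *= scale0
    return scales


# An image the same size as the window leaves room for one level only.
print(pyramid_scales((64, 128)))            # [1.0]
# A larger image gets many levels, each 5% smaller than the previous one.
print(len(pyramid_scales((640, 480))))
```

A smaller scale factor gives more closely spaced levels, so it needs a larger 'nlevels' to cover the same range of figure sizes.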
Sample (people_detect.cpp)
  • Uses the built-in trained coefficients.
  • The sample needs to eliminate duplicate rectangles from the results of detectMultiScale(). Is that because it matches at multiple scales?
  • detect() returns a list of detected points. The size of each hit is the detector window size.
  • With GrabCut BSDS300 test images - only able to detect one human figure (89072.jpg). The rest could be either too small, too big, or obscured. Interestingly, it detected a few long, narrow upright trees as human figures. It takes about 2 seconds to process each picture.
  • With GrabCut Data_GT test images - able to detect human figure from 3 images: tennis.jpg, bool.jpg (left), person5.jpg (right), _not_ person7.jpg though. An interesting false-positive is from grave.jpg. The cut-off tomb-stone on the right edge is detected. Most pictures took about 4.5 seconds to process.
  • MIT Pedestrian Database (64x128 pedestrian shots):
    • The default HOG detector window (feature-vector) is the same size as the test images.
    • Recognized 72 out of 925 images with detectMultiScale() using default parameters. Takes about 15 ms for each image.
    • Recognized 595 out of 925 images with detect() using default parameters. Takes about 3 ms for each image.
    • Turning off gamma-correction reduces the hits from 595 to 549.
  • INRIA Person images (Test Batch)
    • (First half) Negative samples are smaller, at about 1/4 the size of the positives, and take 800 - 1000 ms each; the others take about 5 seconds.
    • Are the 'bike_and_person' samples there for testing occlusion?
    • Recognized 232 / 288 positive images. Hits on 65 / 453 negative images. Takes 10-20 seconds for each image.
    • Again, cut-off boxes that result in a long vertical shape become false positives.
    • Lamp poles, trees, rounded-top entrances, the top part of a tower, and long windows are typical false positives. Should an upright statue be considered a 'negative' sample?
    • Picked a few false negatives to re-run with changed parameters. I picked those with a large human figure standing mostly upright (crop_00001.jpg, crop001688.jpg, crop001706.jpg, person_107.jpg).
      • Increased nlevels from the default (64) to 256.
      • Decreased 'hitThreshold' to -2: a lot more small hits.
      • Halved the input image size from the original.
      • Decreased the scaleFactor from 1.05 to 1.01.
      • Tried all of the above individually - still unable to recognize the tall figures. I suppose this has something to do with their pose, like how they placed their arms.
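On the duplicate rectangles noted for the sample: a multi-scale search tends to hit the same figure several times at slightly different positions and sizes, so the hits are clustered and each cluster is averaged. The sketch below shows that idea in plain Python. It is a simplification and not cv::groupRectangles() itself: it clusters greedily by overlap (intersection over union), and `iou`, `group_rects` and `min_iou` are hypothetical names. The one behaviour borrowed from the OpenCV function is that clusters with no more than 'group_threshold' members are dropped.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) rectangles."""
    ix = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0


def group_rects(rects, group_threshold=1, min_iou=0.5):
    """Cluster overlapping rectangles and average each large-enough cluster."""
    clusters = []
    for r in rects:
        for c in clusters:
            # Compare against the first member of each existing cluster.
            if iou(r, c[0]) >= min_iou:
                c.append(r)
                break
        else:
            clusters.append([r])
    merged = []
    for c in clusters:
        # Small clusters are treated as unreliable and dropped.
        if len(c) > group_threshold:
            n = len(c)
            merged.append(tuple(round(sum(r[i] for r in c) / n) for i in range(4)))
    return merged


# Three overlapping hits on one figure plus one isolated hit.
hits = [(100, 50, 64, 128), (104, 52, 64, 128), (98, 48, 68, 136),
        (400, 300, 64, 128)]
print(group_rects(hits))  # [(101, 50, 65, 131)]
```

With group_threshold=0 the isolated hit would be kept as well, which trades fewer misses for more false positives.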
Histograms of Oriented Gradients for Human Detection, Dalal & Triggs.
A Practical Guide to Support Vector Classification, Hsu, Chang & Lin.


  1. Hi,
    I am glad you were able to play around with the software. I have one question. How did you make the GroupRectangles(in hog.cpp around line 950) part work. For me it is giving a linking error. Is there a way around it? At the moment I have created my own Rectangle grouping algo.
    Also there is no Max suppression on hitThreshold.
    It does use multiple scales to detect varying sizes of human figures.
    I have the same problem too regarding large humans. I believe that they are too big for detection because the training has been done on very small images and levelling simply makes things worse. If you have solved it please do let me know...

  2. Hello,

    I have not come across linking error on building the people_detect.cpp. I am using OpenCV 2.2 release and building/running with VS2008 Express on Windows XP. What kind of error are you getting? Maybe you could post the details to OpenCV Yahoo Discussion Group so that other people and me could help?
    I am new to CV. And I am in the process of going through the sample programs, trying to build an intuition to all of this. Are you working on some 'real' project?

  3. Thanks. Just sharing my learning experience. Feel free to point out any mistakes I might have made.

  4. Nice post. Have you been able to simply isolate the HoG functionality by itself? Specifically, I am just looking for a way to use the OpenCV Python bindings to compute HoG features for user-specified locations in an image. Is there any OpenCV function which takes as input an image I, a pixel location (x,y), parameters for the orientation angles and bins P, and the window size W, and then outputs the HoG feature in some easy-to-work-with format for that image patch?

    Without this functionality, it makes the OpenCV HoG descriptor kind of useless. *I would also be willing to write the above function in C++ if I can write the HoG output to a file and then work with it in Python.

    Any help in this area would be greatly appreciated.

  5. I have not gone back to study HoG since I posted the entry. There is a HOGDescriptor::compute() member that computes and returns a descriptor from an image. Is this what you are looking for?

  6. Hi,
    I am planning on creating a detector for generic objects. Any idea on how I would be able to train a custom classifier instead of the default one?

    Thank you

  7. This is what I think (I have not done this): since the HOGDescriptor class uses an SVM as its classifier, I presume you could find a set of samples, generate HOG descriptor vectors for them with this HOGDescriptor class, train an SVM classifier with the vectors, obtain the coefficients, and set them on the HOGDescriptor instance. Good luck!

  8. Hi, how can I get the programs to make pedestrian recognition?

  9. pl any one tel me how to train the hog and how to use train data to detect human. pl reply

  10. could u pl tell me how to train hog and how to use training data to detect human , if code are available pl let me know