Vehicle Detection using SSD on FloydHub: Udacity Self-Driving Car Nanodegree

Single Shot MultiBox Detector (SSD) on Keras 1.2.2 and Keras 2
SSD is a deep neural network that achieves 75.1% mAP on VOC2007, outperforming Faster R-CNN while running at a high frame rate (arXiv paper).

The GitHub repo is here.
ssd_download_essentials.ipynb: This notebook runs shell commands that download the code and the model weights file, pip-install the moviepy package, etc.
SSD_car_detection.ipynb: This notebook is based on SSD.ipynb, slightly modified to perform vehicle/lane detection on project_video.mp4.

Results using SSD:
With a GPU (K80), processing the video ran at about 12 frames per second.
Without a GPU, it ran at 0.1 frames per second.
I did not train the model on the car images provided by the Udacity course. Instead, I used only the weights file from the ssd_keras GitHub repo above, which was probably trained on VOC2007.
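Frame rates like these come from timing the per-frame processing over the whole video. A minimal sketch of such a measurement, assuming a hypothetical per-frame function `process_frame` (for example SSD inference plus drawing the boxes); it is not necessarily how the numbers above were obtained.

```python
import time
from typing import Any, Callable, Iterable


def measure_fps(process_frame: Callable[[Any], Any], frames: Iterable[Any]) -> float:
    """Run process_frame on every frame and return the average frames per second."""
    count = 0
    start = time.perf_counter()
    for frame in frames:
        process_frame(frame)  # hypothetical: e.g. SSD inference plus box drawing
        count += 1
    elapsed = time.perf_counter() - start
    if count == 0 or elapsed <= 0:
        raise ValueError("no frames were processed, so FPS is undefined")
    return count / elapsed
```

The frames could come, for example, from moviepy's `VideoFileClip("project_video.mp4").iter_frames()`. Note that 12 versus 0.1 frames per second is a 120x difference between the K80 and the CPU run.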

A fancier version with lane detection and smoothed bounding boxes is shown below (full video). However, it runs at a terrible ONE frame per second, caused by the non-optimized lane detection algorithm.
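The smoothing method used in that video is not described here. One simple way to smooth a box across frames is an exponential moving average of its corner coordinates; the sketch below assumes detections have already been matched to the same car from frame to frame (that association step is not shown), and `BoxSmoother` and `alpha` are hypothetical names.

```python
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) in pixels


class BoxSmoother:
    """Exponential moving average over one car's box across frames.

    Assumes the caller already associates each new detection with this car.
    """

    def __init__(self, alpha: float = 0.3):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha  # weight given to the newest detection
        self.state: Optional[Box] = None

    def update(self, box: Box) -> Box:
        if self.state is None:
            self.state = box  # first detection: nothing to smooth yet
        else:
            a = self.alpha
            # Blend each coordinate: a * new + (1 - a) * previous estimate.
            self.state = tuple(a * new + (1 - a) * old
                               for new, old in zip(box, self.state))
        return self.state
```

A smaller `alpha` gives steadier boxes but makes them lag behind a car that is moving across the image.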

Flow chart


1. HOG features + Linear SVC + sliding window, which seems to be the default approach in the Udacity course (source: YouTube video).
FPS: about 2.5 (according to this Jupyter notebook)
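Part of why this approach is slow is the sliding window itself: every window must be cropped, described with HOG features (for example with `skimage.feature.hog`) and scored by a linear SVM (for example `sklearn.svm.LinearSVC`). A minimal sketch of the window generation only, assuming square windows at a single scale; the window size, overlap and search band are hypothetical parameters, not the ones from the notebook above.

```python
from typing import List, Optional, Tuple


def sliding_windows(img_w: int, img_h: int, win: int = 64, overlap: float = 0.5,
                    y_start: int = 0, y_stop: Optional[int] = None
                    ) -> List[Tuple[int, int, int, int]]:
    """Return (xmin, ymin, xmax, ymax) for every square window that fits."""
    y_stop = img_h if y_stop is None else y_stop
    step = max(1, int(win * (1 - overlap)))  # pixels between window origins
    windows = []
    for y in range(y_start, y_stop - win + 1, step):
        for x in range(0, img_w - win + 1, step):
            windows.append((x, y, x + win, y + win))  # crop to classify later
    return windows
```

For a 1280x720 frame searched only in a hypothetical road band, `sliding_windows(1280, 720, y_start=400, y_stop=656)` yields 7 rows of 39 windows, 273 classifier calls per frame for a single scale, whereas SSD scores all of its default boxes in one forward pass.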

2. Deep learning + heat map approach by Marcus Erbar, from this YouTube video. Trained on car images provided by Udacity.
Obviously, the Udacity training data worsens the model's predictions: the bounding box does not fit the entire car, but rather tends to fit the back/side views. In contrast, the SSD model predicts better results because VOC2007 provides more diverse and higher-quality car images.
FPS: 8 (according to its GitHub repo)
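The heat map idea is to let many overlapping positive detections vote for the pixels they cover, so a real car accumulates heat while isolated false positives stay cold. The exact implementation in that repo is not shown here; the sketch below is a generic pure-Python version, assuming integer pixel boxes and a hypothetical `threshold` parameter. In practice `scipy.ndimage.label` performs the connected-component step in one call.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (xmin, ymin, xmax, ymax), xmax/ymax exclusive


def heatmap_boxes(boxes: List[Box], img_w: int, img_h: int, threshold: int = 1) -> List[Box]:
    """Merge overlapping detections into one box per hot region."""
    heat = [[0] * img_w for _ in range(img_h)]
    for xmin, ymin, xmax, ymax in boxes:  # each detection adds 1 to its pixels
        for y in range(max(0, ymin), min(img_h, ymax)):
            for x in range(max(0, xmin), min(img_w, xmax)):
                heat[y][x] += 1
    hot = [[v > threshold for v in row] for row in heat]  # drop weak evidence
    seen = [[False] * img_w for _ in range(img_h)]
    merged = []
    for y0 in range(img_h):
        for x0 in range(img_w):
            if not hot[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill over one 4-connected hot region.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            xs, ys = [], []
            while queue:
                x, y = queue.popleft()
                xs.append(x)
                ys.append(y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if 0 <= nx < img_w and 0 <= ny < img_h and hot[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            merged.append((min(xs), min(ys), max(xs) + 1, max(ys) + 1))
    return merged
```

With `threshold=1`, a region survives only where at least two detections overlap, which also explains the shrinking effect described above: the merged box covers only the part of the car that several windows agree on.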

Other approaches on YouTube:

3. YOLO

In this video, the YOLO model is more sensitive than SSD: it constantly detects cars in the opposite lane. On the other hand, the size of its bounding boxes is less stable (see 0:25 onward), and it sometimes shows more than one bounding box on a car (the same happens in this video). I wonder whether non-maximum suppression is applied in their implementations.
FPS: 21 (source)
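Greedy non-maximum suppression is the usual fix for several boxes on one car: sort detections by confidence, keep the best one, and discard any lower-scoring box whose intersection over union (IoU) with an already kept box exceeds a threshold. A minimal single-class sketch; the default thresholds are illustrative assumptions, not values taken from the YOLO implementations in those videos.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.45, score_threshold: float = 0.5) -> List[int]:
    """Greedy non-maximum suppression; returns indices of the kept boxes."""
    # Drop low-confidence detections, then visit the rest best-first.
    order = sorted((i for i, s in enumerate(scores) if s >= score_threshold),
                   key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap any higher-scoring kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

Both thresholds matter: a lower `iou_threshold` removes duplicate boxes more aggressively but can merge two genuinely overlapping cars into one detection.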

4. SVM?

5. Faster R-CNN

Although the bounding boxes keep zigzagging, they generally cover the entire car in the video, so I would say the detection is relatively stable. The most impressive part is that the confidence score stays high (>0.95) even when two cars start overlapping. This reminds me of a blog post reporting that Faster R-CNN gave better detection results than SSD in the Kaggle fisheries monitoring competition.

It should be mentioned that the comparisons above are not rigorous in any sense, because I consider neither the hyperparameters, such as the input resolution and the confidence threshold, which are crucial in object detection, nor hacks such as smoothing bounding boxes and compensating for missing bounding boxes. These are just my observations and personal opinions.
However, I would like to recommend a paper from Google that gives a comprehensive investigation of detection architectures. It's readable and inspiring.

1. Apply a Kalman filter to smooth the bounding boxes.
2. Combine lane and vehicle detection (DONE, but video processing runs at a terrible ONE frame per second).
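For item 1, a common starting point is an independent constant-velocity Kalman filter for each box coordinate, with state (position, velocity) and only the position measured. The sketch below assumes detections are already matched to the same car across frames and uses a simplified diagonal process noise; `ScalarKalman`, `BoxKalman`, `q` and `r` are hypothetical names and values, not a finished implementation.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


class ScalarKalman:
    """Constant-velocity Kalman filter for one box coordinate."""

    def __init__(self, x0: float, q: float = 1.0, r: float = 10.0):
        self.x = [x0, 0.0]                 # state estimate [position, velocity]
        self.P = [[r, 0.0], [0.0, 100.0]]  # covariance; velocity unknown at start
        self.q = q  # process noise added to both diagonal entries (simplification)
        self.r = r  # measurement noise variance (detector jitter, pixels^2)

    def step(self, z: float, dt: float = 1.0) -> float:
        # Predict: x <- F x and P <- F P F^T + Q, with F = [[1, dt], [0, 1]].
        p, v = self.x
        p = p + dt * v
        P = self.P
        p00 = P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1] + self.q
        p01 = P[0][1] + dt * P[1][1]
        p10 = P[1][0] + dt * P[1][1]
        p11 = P[1][1] + self.q
        # Update with the measured position z (H = [1, 0]).
        s = p00 + self.r           # innovation variance
        k0, k1 = p00 / s, p10 / s  # Kalman gain
        y = z - p                  # innovation
        self.x = [p + k0 * y, v + k1 * y]
        self.P = [[(1 - k0) * p00, (1 - k0) * p01],
                  [p10 - k1 * p00, p11 - k1 * p01]]
        return self.x[0]


class BoxKalman:
    """Smooths a whole box with one ScalarKalman per coordinate."""

    def __init__(self, box: Box, q: float = 1.0, r: float = 10.0):
        self.filters = [ScalarKalman(c, q, r) for c in box]

    def update(self, box: Box) -> Box:
        return tuple(f.step(z) for f, z in zip(self.filters, box))
```

Compared with a plain moving average, the velocity term lets the filter follow a car that moves steadily across the frame without lagging behind, and a larger `r` trades responsiveness for smoother boxes.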
Lane detection references:

