Raspberry Pi 3 Benchmarks

idata · ‎08-05-2017

I'll keep this short for now, as I spent a week just before my vacation trying to set up proper benchmarking environments. Overall, the experience of getting Caffe running on plain vanilla OSX, Ubuntu and RasPi Jessie is horrible. I will publish my actually working scripts after I get back.

Putting this aside, I wanted to see how long each device takes to recognize the cat in 'cat.jpg' with SqueezeNet.

I grabbed the necessary files from

https://github.com/DeepScale/SqueezeNet/tree/master/SqueezeNet_v1.1

https://raw.githubusercontent.com/rmekdma/SqueezeNet/9d981310f66e5285083123cba364b3efa4a6ff55/SqueezeNet_v1.1/deploy.prototxt

With Caffe on CPU

time ./build/examples/cpp_classification/classification.bin \
models/squeezenet11/deploy.prototxt \
models/squeezenet11/squeezenet_v1.1.caffemodel \
data/ilsvrc12/imagenet_mean.binaryproto \
data/ilsvrc12/synset_words.txt \
examples/images/cat.jpg

gives me the following averages:

0.15s (MacBook Air 13", early 2015, 1.6 GHz Intel Core i5, 8 GB RAM running macOS Sierra)

0.19s (Lenovo Thinkpad T420S running latest Ubuntu)

1.1s (RasPi 3 running RasPi Jessie)
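The post does not say how the averages were taken, so here is a minimal sketch of one way to do it: run the same command several times and take the mean wall-clock time. The helper name `average_runtime` and the stand-in command are assumptions for illustration, not part of the original scripts.

```python
import statistics
import subprocess
import sys
import time


def average_runtime(cmd, runs=5):
    """Run cmd `runs` times and return the mean wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the command fails, so a broken run is not averaged in.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; replace it with the
    # classification.bin invocation (as a list of arguments) to time Caffe.
    demo_cmd = [sys.executable, "-c", "pass"]
    print(f"{average_runtime(demo_cmd):.3f}s average over 5 runs")
```

Like `time`, this measures the whole process, so it includes start-up and model loading as well as the forward pass.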

Now hacking classification_example.py as follows

...

print (str(datetime.now()))

output, userobj = graph.GetResult()

print (str(datetime.now()))

order = output.argsort()[::-1][:6]

print (str(datetime.now()))

...

the Movidius stick, driven from an Ubuntu machine, gives me an average of

0.307s on cat.jpg.
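Printing `datetime.now()` around the call works, but the difference still has to be computed by hand. A sketch of the same measurement with `time.perf_counter()` follows. `fake_get_result` is a hypothetical stand-in for `graph.GetResult()` so the example runs without the stick, and `timed` is a helper name made up here.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


def fake_get_result():
    # Hypothetical stand-in for graph.GetResult(): sleeps briefly and
    # returns an (output, userobj) pair like the call in the snippet above.
    time.sleep(0.01)
    return [0.1, 0.7, 0.2], None


if __name__ == "__main__":
    (output, userobj), ms = timed(fake_get_result)
    # Indices of the highest scores first, roughly what argsort()[::-1][:6] gives.
    order = sorted(range(len(output)), key=lambda i: output[i], reverse=True)[:6]
    print(f"GetResult took {ms:.1f} ms, top classes: {order}")
```

In the real script the call would be `timed(graph.GetResult)`, which times only the wait for the result and not the image loading or the sorting.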

What should I run/test on to get numbers that give me a bigger smile? :)

PS: Does anyone have a working install script for OpenCV on RasPi for Python3 (maybe without virtualenv)? That part is only mentioned in passing in the video, and the Interwebz offer the usual blob of almost-working nothings on the topic. It's especially painful as compiling OpenCV on a RasPi is far from instant.

idata · ‎08-06-2017

This is somewhat disappointing news. 0.307s?

idata · ‎08-06-2017

Been down that path before… On a RasPi the build takes about an hour, if I'm remembering correctly. The OpenCV part:

cd ~

wget -O opencv.zip https://github.com/Itseez/opencv/archive/3.1.0.zip

unzip opencv.zip

wget -O opencv_contrib.zip https://github.com/Itseez/opencv_contrib/archive/3.1.0.zip

unzip opencv_contrib.zip

cd ~/opencv-3.1.0/

mkdir build

cd build

cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local -D INSTALL_PYTHON_EXAMPLES=ON -D OPENCV_EXTRA_MODULES_PATH=~/opencv_contrib-3.1.0/modules -D BUILD_EXAMPLES=ON ..

make -j4

sudo make install

sudo ldconfig

I use the following to get stuff in place prior to installing OpenCV (some of it specific to what I'm doing and some of it not needed for OpenCV, but…):

sudo apt-get update

sudo apt-get upgrade

sudo apt-get install build-essential cmake pkg-config

sudo apt-get install libjpeg-dev libtiff5-dev libjasper-dev libpng12-dev

sudo apt-get install libavcodec-dev libavformat-dev libswscale-dev libv4l-dev

sudo apt-get install libxvidcore-dev libx264-dev

sudo apt-get install libgtk2.0-dev

sudo apt-get install libatlas-base-dev gfortran

sudo apt-get install tesseract-ocr

sudo apt-get install python2.7-dev python3-dev

sudo apt-get install avahi-daemon

sudo pip3 install picamera

sudo pip3 install matplotlib

sudo pip3 install numpy

sudo pip3 install pytesseract
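Once the build and the packages above are in place, a quick way to confirm that Python 3 actually picked up the bindings is to try importing `cv2`. This is only a sketch; the helper name `opencv_version` is made up for illustration. Run it with `python3` so it checks the interpreter the question was about.

```python
import importlib


def opencv_version():
    """Return the cv2 version string, or None if cv2 cannot be imported."""
    try:
        cv2 = importlib.import_module("cv2")
    except ImportError:
        # Either the bindings were not built for this interpreter
        # or a shared library they depend on is missing.
        return None
    return getattr(cv2, "__version__", "unknown")


if __name__ == "__main__":
    version = opencv_version()
    if version is None:
        print("cv2 is not importable from this Python interpreter")
    else:
        print(f"cv2 {version} is available")
```

If it reports that `cv2` is missing, the cmake output from the configure step is the place to look: it lists whether a Python 3 interpreter and NumPy were detected.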

idata · ‎08-08-2017

@soobrosa Recompiling the graph file for SqueezeNet_v1.1 with the -s 12 option may give you a bigger smile.

This requires you to modify the deploy.prototxt file by changing the batch size to 1 (the default value is 10).
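The batch-size change can be made by hand in a text editor. For anyone who wants to script it, here is a minimal sketch. It assumes the input is declared with a `shape: { dim: 10 dim: 3 dim: 227 dim: 227 }` style block whose first `dim` is the batch size; the function name `set_batch_size` is hypothetical, and a prototxt using the older `input_dim:` style would need a different pattern.

```python
import re


def set_batch_size(prototxt_text, batch_size=1):
    """Rewrite the first `dim:` value of the first `shape { ... }` block.

    Assumes that first dim is the batch size, as in an Input layer
    declared with `shape: { dim: N dim: C dim: H dim: W }`.
    """
    pattern = re.compile(r"(shape\s*:?\s*\{\s*dim\s*:\s*)\d+")
    # \g<1> keeps everything up to the number; only the number is replaced.
    new_text, count = pattern.subn(rf"\g<1>{batch_size}", prototxt_text, count=1)
    if count == 0:
        raise ValueError("no 'shape { dim: ... }' block found")
    return new_text


if __name__ == "__main__":
    sample = "input_param { shape: { dim: 10 dim: 3 dim: 227 dim: 227 } }"
    print(set_batch_size(sample))
```

To apply it to a file, read `deploy.prototxt` into a string, pass it through `set_batch_size`, and write the result back (keeping a copy of the original is a good idea).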

You can recompile the graph file by using mvNCCompile.pyc (included with the Movidius NCS SDK) using the squeezenet_v1.1.caffemodel weights and the -s 12 option which will enable usage of all 12 SHAVE vector processors simultaneously.

Afterwards, you can copy the new graph file over to the ncapi/networks/SqueezeNet directory and re-run classification_example.py with the "3" option to specify the SqueezeNet network.

Thank you and let us know your results.

idata · ‎09-06-2017

@Tome_at_Intel @chrispete thanks for weighing in!

-s 12 with SqueezeNet 1.1 came out at 41(!) ms. Impressive!

I made a short write-up here:

https://medium.com/@soobrosa/deep-learning-on-the-edge-first-impressions-of-the-movidius-neural-compute-stick-7de09eeca2d6
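To put the 41 ms in context, here is the arithmetic against the other numbers reported earlier in this thread, as a small sketch. The labels and the `speedup` helper are just for illustration. The comparison is rough: the Caffe figures came from `time` on the whole classification binary, so they likely include start-up and model loading, while the stick figure was taken around `GetResult()`.

```python
# Latencies in milliseconds, as reported earlier in this thread.
latencies_ms = {
    "MacBook Air, Caffe on CPU": 150,
    "ThinkPad T420S, Caffe on CPU": 190,
    "RasPi 3, Caffe on CPU": 1100,
    "Movidius stick, before -s 12": 307,
}
NCS_12_SHAVES_MS = 41


def speedup(baseline_ms, new_ms):
    """How many times faster new_ms is than baseline_ms."""
    return baseline_ms / new_ms


if __name__ == "__main__":
    for name, ms in latencies_ms.items():
        print(f"{name}: {speedup(ms, NCS_12_SHAVES_MS):.1f}x slower than 41 ms")
```

So recompiling with -s 12 made the stick roughly 7.5 times faster than its first result (307 / 41), and about 27 times faster than Caffe on the RasPi 3's CPU (1100 / 41).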

idata · ‎09-06-2017

41 ms is impressive. That was on a Raspberry Pi? Thanks for a great review.

Hope to see something with R-CNN or YOLO type detection speeds.

I read the new Myriad X chipset only has 16 SHAVE cores. Seems like it should get about the same results.

Even though it's on the front page of the Intel site https://www.intel.com/content/www/us/en/homepage.html the specs and release dates are quite sketchy.

idata · ‎09-08-2017

@chicagobob123 technically it was on the Neural Stick, but I tried it from both Ubuntu and Raspbian, and it was the same 41 ms.