Intel® Distribution of OpenVINO™ Toolkit
Community assistance for the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision on Intel® platforms.

Inference performance

idata
Employee

Hi, I successfully ran the ncs-fullcheck example and used it to run inference on several pictures. The performance of AlexNet is around 200 ms and GoogLeNet is around 550 ms. However, when I ran the profiling from the toolkit (make example), it shows both AlexNet and GoogLeNet inference at around 90 ms. There seems to be a gap between the profile data and the real inference time. Does anyone know where this gap comes from (e.g., transferring the image to the stick and retrieving the result), and how do I get performance matching the profiled numbers?
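For reference, the split described above can be measured directly: the "inference time" printed by ncs-fullcheck is the device-side compute time, while the wall-clock round trip also includes the USB transfer of the input image and result plus API overhead. Below is a minimal, hedged Python timing harness; the actual NCS call is stubbed with a sleep (it needs the stick), and on real hardware the stub body would be the mvnc LoadTensor/GetResult pair:

```python
from timeit import default_timer as timer
import time

def run_inference_stub():
    # Stand-in for the device round trip; on a real stick this would be:
    #   graph.LoadTensor(img, None)    # host -> stick transfer + compute
    #   output, _ = graph.GetResult()  # wait + stick -> host transfer
    time.sleep(0.09)  # pretend the device spends ~90 ms computing
    return [0.0] * 1000  # pretend output tensor

start = timer()
output = run_inference_stub()
total_ms = (timer() - start) * 1000.0

# total_ms covers transfer + compute; comparing it against the profiler's
# per-layer sum shows how much of the gap is transfer/API overhead.
print('end-to-end: %.0f ms' % total_ms)
```

Comparing this end-to-end number against the device-reported inference time isolates the host-side overhead per frame.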

 

Another question: the inference result seems different from Caffe running the same caffemodel (using the C++ classifier). How do I get the same result as with Caffe?

 

Caffe: AlexNet

0.3094 - "n02124075 Egyptian cat"
0.1761 - "n02123159 tiger cat"
0.1221 - "n02123045 tabby, tabby cat"
0.1132 - "n02119022 red fox, Vulpes vulpes"
0.0421 - "n02085620 Chihuahua"

NCS: AlexNet

Egyptian cat (69.19%)
tabby, tabby cat (6.59%)
grey fox, gray fox, Urocyon cinereoargenteus (5.42%)
tiger cat (3.93%)
hare (3.52%)
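Part of the spread between the two top-5 lists is expected: the NCS computes in fp16 while Caffe on the host computes in fp32, and any mismatch in preprocessing (mean subtraction, RGB/BGR channel order, resize method) shifts the results further. The logits below are made-up numbers, purely to illustrate that fp16 rounding alone already perturbs softmax probabilities:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Hypothetical fp32 logits for five classes (not from the actual model)
logits32 = np.array([2.11, 1.54, 1.18, 1.10, 0.12], dtype=np.float32)
logits16 = logits32.astype(np.float16).astype(np.float32)  # round-trip via fp16

p32 = softmax(logits32)
p16 = softmax(logits16)
print('max probability shift from fp16 rounding: %.2e' % np.abs(p32 - p16).max())
```

In practice the larger differences usually come from preprocessing; matching the exact mean values and channel order used by the Caffe classifier is the first thing to check.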
5 Replies
idata
Employee

Hi akey,

 

We found an issue with our "ncapi/tools/convert_models.sh" script. You need to add the argument "-s 12" to the mvNCCompile.pyc invocations to enable all the vector engines. Please rerun that script to regenerate the graph files, and you should see performance similar to what you were seeing with "make example01".

 

Thank You

 

Ramana @ Intel

 

Before the change

ubuntu@ubuntu-UP:~/workspace/MvNC_SDK/ncapi/c_examples$ ./ncs-fullcheck ../networks/GoogLeNet/ ../images/512_Amplifier.jpg
OpenDevice 4 succeeded
Graph allocated
radio, wireless (46.97%) CD player (31.79%) tape player (11.16%) cassette player (6.71%) cassette (1.78%)
Inference time: 569.302185 ms, total time 575.650308 ms
radio, wireless (46.97%) CD player (31.79%) tape player (11.16%) cassette player (6.71%) cassette (1.78%)
Inference time: 556.881409 ms, total time 562.636079 ms
Deallocate graph, rc=0
Device closed, rc=0

 

Change

cd ../tools
vi convert_models.sh

** Add -s 12 to all the compiles

#!/bin/sh
NCS_TOOLKIT_ROOT='../../bin'
echo $NCS_TOOLKIT_ROOT
python3 $NCS_TOOLKIT_ROOT/mvNCCompile.pyc ../networks/SqueezeNet/NetworkConfig.prototxt -w ../networks/SqueezeNet/squeezenet_v1.0.caffemodel -o ../networks/SqueezeNet/graph -s 12
python3 $NCS_TOOLKIT_ROOT/mvNCCompile.pyc ../networks/GoogLeNet/NetworkConfig.prototxt -w ../networks/GoogLeNet/bvlc_googlenet.caffemodel -o ../networks/GoogLeNet/graph -s 12
python3 $NCS_TOOLKIT_ROOT/mvNCCompile.pyc ../networks/Gender/NetworkConfig.prototxt -w ../networks/Gender/gender_net.caffemodel -o ../networks/Gender/graph -s 12
python3 $NCS_TOOLKIT_ROOT/mvNCCompile.pyc ../networks/Age/deploy_age.prototxt -w ../networks/Age/age_net.caffemodel -o ../networks/Age/graph -s 12
python3 $NCS_TOOLKIT_ROOT/mvNCCompile.pyc ../networks/AlexNet/NetworkConfig.prototxt -w ../networks/AlexNet/bvlc_alexnet.caffemodel -o ../networks/AlexNet/graph -s 12

Execute the script

./convert_models.sh
cd ../c_examples

 

After the change

ubuntu@ubuntu-UP:~/workspace/MvNC_SDK/ncapi/c_examples$ ./ncs-fullcheck ../networks/GoogLeNet/ ../images/512_Amplifier.jpg
OpenDevice 4 succeeded
Graph allocated
radio, wireless (46.97%) CD player (31.79%) tape player (11.16%) cassette player (6.71%) cassette (1.78%)
Inference time: 108.950851 ms, total time 115.101073 ms
radio, wireless (46.97%) CD player (31.79%) tape player (11.16%) cassette player (6.71%) cassette (1.78%)
Inference time: 88.571877 ms, total time 95.765275 ms
Deallocate graph, rc=0
Device closed, rc=0
idata
Employee

Much faster now. Continuous inference speed from a webcam is about 9.5 FPS for GoogLeNet. Thanks!

idata
Employee

@akey can you tell me how you calculate the FPS for GoogLeNet, please?

idata
Employee

@ibrahimsoliman in python you can use:

 

from timeit import default_timer as timer

time_start = timer()
# CODE — the inference call(s) you want to time
time_end = timer()
print('FPS: %.2f fps' % (1.0 / (time_end - time_start)))

Note that default_timer() returns seconds, so FPS is 1 divided by the elapsed time (dividing 1000 would only be correct if the timer returned milliseconds).
idata
Employee

One thing I don't get, though, about NCS speed is why it is not running at the full 100 GOPS as advertised. For example, in the SqueezeNet profile below (and in all other networks) we can see:

 

     

1. The MFLOPs estimate is 2x the actual op count. Is that because of fp16?
2. MFLOPs are executed at roughly 1/3 of the 100 GOPS. This ratio varies from 1/4 to 1/2 depending on the tensor and convolution type.

Movidius/Intel guys, could you explain this and maybe give some advice on how to increase NCS efficiency?
 

Detailed Per Layer Profile

Layer Name            MFLOPs   Bandwidth (MB/s)   time (ms)
25 fire9/squeeze1x1   12.845   587.19             0.43
26 fire9/expand1x1     6.423   150.65             0.37
27 fire9/expand3x3    57.803   318.67             1.57
28 conv10            200.704   272.92             4.28
29 pool10              0.392   722.59             0.52
30 prob                0.003    10.49             0.18

Total inference time: 26.89 ms
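The ~1/3 figure can be reproduced from the table itself: MFLOPs divided by time in ms gives effective GFLOP/s, directly comparable against the advertised 100 GOPS. A quick sketch using the layer numbers above (small layers land lower, presumably because per-layer overhead dominates them):

```python
# (MFLOPs, time in ms) taken from the per-layer profile above
layers = [
    ('fire9/squeeze1x1', 12.845, 0.43),
    ('fire9/expand1x1', 6.423, 0.37),
    ('fire9/expand3x3', 57.803, 1.57),
    ('conv10', 200.704, 4.28),
]
for name, mflops, ms in layers:
    gflops = mflops / ms  # MFLOPs per ms == GFLOP/s
    print('%-17s %5.1f GFLOP/s = %2.0f%% of 100 GOPS' % (name, gflops, gflops))
```

The big conv10 layer reaches roughly 47 GFLOP/s (about half of 100 GOPS), while the small fire9 layers sit well below that, matching the 1/4-to-1/2 spread described above.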