Intel® Distribution of OpenVINO™ Toolkit
Community assistance for the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision on Intel® platforms.

mvNCProfile Output Not Matching Tests

idata
Employee
818 Views

So I compiled a Caffe network with mvNCCompile, set the number of SHAVE cores to 12, and tested the network running on the NCS in some Python code. I timed it with time.clock() and was getting inference times on the order of 150 ms/image. However, when I run the mvNCProfile tool on the same network, also with "-s 12", it tells me that inference is taking 1280 ms. I don't understand how there could be an order-of-magnitude difference between my Python tests and the results from mvNCProfile.

 

The Python Code is something like this:

 

for test_image in inference_images:
    try:
        t0 = time.clock()
        graph.LoadTensor(test_image.astype(np.float16), 'user object')
    except:
        print("Error Loading Tensor into Movidius Graph")
    try:
        output, userobj = graph.GetResult()
        t1 = time.clock()
    except:
        print("Error getting result from NCS")
    times.append(t1 - t0)

 

Printing times and the average of times yields:

 

[0.1364160000000001, 0.14857699999999996, 0.15344800000000003, 0.14042500000000002, 0.15225599999999995, 0.16213199999999972, 0.1494970000000002, 0.16063800000000006, 0.14944800000000003, 0.1511720000000003] 0.15040090000000003

 

According to the documentation these values are seconds represented as floats, so I'm right around 150 ms/image.

 

However, here is what I see from mvNCProfile:

 

mvNCProfile v02.00, Copyright @ Movidius Ltd 2016
/usr/local/bin/ncsdk/Controllers/FileIO.py:52: UserWarning: You are using a large type. Consider reducing your data sizes for best performance
  "Consider reducing your data sizes for best performance\033[0m")
USB: Transferring Data...
Time to Execute : 1358.95 ms
USB: Myriad Execution Finished
Time to Execute : 1286.72 ms
USB: Myriad Execution Finished
USB: Myriad Connection Closing.
USB: Myriad Connection Closed.

Network Summary

Detailed Per Layer Profile
                                           Bandwidth   time
#    Name                      MFLOPs      (MB/s)      (ms)
===============================================================================
0    data                      0.0         102011.9    0.005
1    conv1_1                   311.0       298.2       15.556
2    conv1_2                   6635.5      668.4       148.043
3    pool1                     5.8         956.9       11.481
4    conv2_1                   3317.8      419.5       59.264
5    conv2_2                   6635.5      482.0       103.162
6    pool2                     2.9         944.1       5.819
7    conv3_1                   3317.8      168.7       76.599
8    conv3_2                   6635.5      176.2       146.712
.
.
.
71   conv9_2_mbox_priorbox     0.0         20.4        0.024
72   mbox_conf_reshape         0.0         171.0       0.390
73   mbox_conf_softmax         0.1         526.9       0.126
74   mbox_conf_flatten         0.0         374.0       0.178
75   detection_out             0.0         38.3        1.738
Total inference time                       1281.14
Generating Profile Report 'output_report.html'...

 

Can anyone give me some insight as to why using the Profile tool yields an inference time about 10x larger than what I see when I actually test the function calls in Python?

 

One thought I had was data type - what data type does mvNCProfile use? FP16, FP32, or FP64? Is there any way to change this and test inference times with different data types?

 

I'm using FP16 in my python code if that is relevant information.

 

Thanks!

0 Kudos
7 Replies
idata
Employee
533 Views

@sgiardi3 The time output that the mvNCProfile tool reports is strictly the time spent processing on the chip. It should typically be less than the time measured around LoadTensor() + GetResult(). The issue you are seeing could be due to a bug. I would like to reproduce this issue on my bench if you can provide a link to your network. mvNCProfile uses FP16 data. Also, by the way, in your code the LoadTensor() line is actually timing the numpy FP16 conversion as well; if you move the FP16 conversion outside the timed code, you may get an even lower result. Thanks for reporting this. I hope you can provide a network for testing purposes.
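For example, the timed section could be restructured roughly like this (a minimal sketch reusing the mvncapi v1 calls and the inference_images, graph, and times variables from the code above; it keeps time.clock() as in the original, and the timer choice itself comes up later in this thread):

import time
import numpy as np

times = []
for test_image in inference_images:
    # Convert to FP16 before starting the timer, so only the
    # LoadTensor() + GetResult() round trip is measured.
    tensor = test_image.astype(np.float16)

    t0 = time.clock()
    graph.LoadTensor(tensor, 'user object')
    output, userobj = graph.GetResult()
    t1 = time.clock()

    times.append(t1 - t0)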

0 Kudos
idata
Employee
533 Views

@Tome_at_Intel Thanks for the response! It seems as though I am having this issue with the SSD networks/weights available on GitHub as well as the ones we fine-tuned. For testing, you can download the Pascal VOC model from the SSD GitHub repo.

 

https://github.com/weiliu89/caffe/tree/ssd

 

If you scroll down to the bottom, you'll find the 07+12+COCO: SSD300* model/weights; that's the one I just verified this issue with.

 

This was the Download Link

 

All I did was remove the following lines from the deploy.prototxt to remove the dependency on the label map:

 

save_output_param {
  label_map_file: "data/VOC0712/labelmap_voc.prototxt"
}

 

I then compiled with:

 

mvNCCompile deploy.prototxt -w VGG_VOC0712_SSD_300x300_ft_iter_120000.caffemodel -s 12 -in conv1_1 -on detection_out -o mov_graph_coco

 

Profiled with:

 

mvNCProfile deploy.prototxt -w VGG_VOC0712_SSD_300x300_ft_iter_120000.caffemodel -s 12 -in conv1_1 -on detection_out

 

Profile Result:

 

mvNCProfile v02.00, Copyright @ Movidius Ltd 2016

 

/usr/local/bin/ncsdk/Controllers/FileIO.py:52: UserWarning: You are using a large type. Consider reducing your data sizes for best performance
  "Consider reducing your data sizes for best performance\033[0m")
USB: Transferring Data...
Time to Execute : 1412.68 ms
USB: Myriad Execution Finished
Time to Execute : 1336.27 ms
USB: Myriad Execution Finished
USB: Myriad Connection Closing.
USB: Myriad Connection Closed.

Network Summary

Detailed Per Layer Profile
                                           Bandwidth   time
#    Name                      MFLOPs      (MB/s)      (ms)
===============================================================================
0    data                      0.0         98093.3     0.005
1    conv1_1                   311.0       298.0       15.569
2    conv1_2                   6635.5      668.3       148.056
3    pool1                     5.8         956.8       11.483
4    conv2_1                   3317.8      420.0       59.199
5    conv2_2                   6635.5      482.3       103.094
6    pool2                     2.9         945.4       5.811
7    conv3_1                   3317.8      169.4       76.295
.
.
.
69   conv9_2_mbox_conf_perm    0.0         7.0         0.023
70   conv9_2_mbox_conf_flat    0.0         22.4        0.014
71   conv9_2_mbox_priorbox     0.0         4.6         0.107
72   mbox_conf_reshape         0.0         152.7       2.290
73   mbox_conf_softmax         0.6         600.7       0.582
74   mbox_conf_flatten         0.0         196.3       1.782
75   detection_out             0.0         19.3        3.468
-------------------------------------------------------------------------------
Total inference time                       1330.09
-------------------------------------------------------------------------------

 

Python Testing code:

 

import mvnc.mvncapi as fx
import os
import cv2
import numpy as np
import time

fx.SetGlobalOption(fx.GlobalOption.LOG_LEVEL, 2)

dim = (300, 300)

ncs_names = fx.EnumerateDevices()
if len(ncs_names) < 1:
    print('Error: No NCS devices detected!')
    quit()

ncs_device = fx.Device(ncs_names[0])

network_graph = # Path to "mov_graph_coco"
test_image_path = # Path to test images

inference_images = []
for image_name in os.listdir(test_image_path):
    image = cv2.imread(os.path.join(test_image_path, image_name))
    image = cv2.resize(image, dim)
    image = image.astype(np.float32)
    inference_images.append(image)
    print('Image')

try:
    ncs_device.OpenDevice()
except:
    print('Error: NCS Device could not be opened')
    quit()

print('Device Opened Normally!')

with open(network_graph, mode='rb') as f:
    blob = f.read()

graph = ncs_device.AllocateGraph(blob)

img_num = 0
times = []
for test_image in inference_images:
    try:
        t0 = time.clock()
        graph.LoadTensor(test_image.astype(np.float16), 'user object')
    except:
        print("Error Loading Tensor into Movidius Graph")
    try:
        output, userobj = graph.GetResult()
        t1 = time.clock()
    except:
        print("Error getting result from NCS")
    times.append(t1 - t0)

    try:
        detection = []
        detections = []
        for i in range(len(output)):
            detection.append(output[i])
            if i > 0 and i % 7 == 0:
                print("Detection ", int(i/7), " for Image: ", img_num, "-------------------")
                print(detection)
                detections.append(detection)
                detection = []
    except:
        print("Detection Processing Error")

try:
    ncs_device.CloseDevice()
except:
    print('Error: Could not close NCS Device!')
    quit()

print('Closed NCS Device! Everything worked properly')
print(times)
print(np.average(times))

 

Python Script Results:

 

[0.16148699999999971, 0.15405300000000022, 0.15758600000000023, 0.1580459999999997, 0.16678099999999985, 0.16459499999999982, 0.16467399999999977, 0.170118, 0.15936899999999987, 0.16317100000000018] 0.16198799999999994

 

And again the mvNCProfile tool is returning an inference time almost 10x larger than my Python test.

 

Thanks for your help!
0 Kudos
idata
Employee
533 Views

@sgiardi3 Thanks for providing the network and the thorough instructions. We were able to reproduce the issue and are investigating. Thank you for your patience.

0 Kudos
idata
Employee
533 Views

@Tome_at_Intel Just wanted to check in and see if there has been any update on this.

0 Kudos
idata
Employee
533 Views

@sgiardi3 Sorry no update for this particular issue yet. As soon as I find something out, I will let you know. Thanks.

0 Kudos
idata
Employee
533 Views

I'm having the same issue. The Profile tool reports 540 ms, but I'm measuring 45 ms in Python (and that includes the time to send and receive the data from the compute stick).

 

An update on this would be very helpful.

0 Kudos
idata
Employee
533 Views

@sgiardi3 - try time.time(). My results match the profiler now.
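A likely explanation for the gap (an assumption; the thread itself doesn't spell it out): on Linux, time.clock() returns CPU time for the process rather than wall-clock time, so it barely advances while the host is blocked waiting on the USB transfer and the on-chip Myriad execution. A wall-clock timer such as time.time() (or time.perf_counter() on Python 3) measures the full round trip and should line up with the profiler. A minimal sketch of the corrected loop, reusing the mvncapi v1 calls and variable names from the script above:

import time
import numpy as np

times = []
for test_image in inference_images:
    tensor = test_image.astype(np.float16)

    # time.time() is wall-clock time, so it includes the time the host
    # spends blocked on the USB transfer and the on-chip execution.
    t0 = time.time()
    graph.LoadTensor(tensor, 'user object')
    output, userobj = graph.GetResult()
    t1 = time.time()

    times.append(t1 - t0)

print(times)
print(np.average(times))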

0 Kudos
Reply