Intel® Distribution of OpenVINO™ Toolkit
Community assistance about the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all things computer-vision-related on Intel® platforms.
6403 Discussions

Incorrect inference results from a minimal tensorflow model

idata
Employee
3,287 Views

Hi,

 

I have a minimal example of a trivial tensorflow (v1.4) conv net that I train (to overfitting) with only two examples, freeze, convert with mvNCCompile, and then test on a compute stick.

 

The code, and steps, are fully described in the github repo movidius_minimal_example

 

No steps have warnings or errors; but the inference results I get on the stick are incorrect.

 

What should be my next debugging step?

 

Thanks,

 

Mat

 

Note: mvNCCheck does fail, but I'm unsure if it's because of the structure of my minimal example…

 

$ mvNCCheck graph.frozen.pb -in imgs -on output
mvNCCheck v02.00, Copyright @ Movidius Ltd 2016
/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py:766: DeprecationWarning: builtin type EagerTensor has no __module__ attribute
  EagerTensor = c_api.TFE_Py_InitEagerTensor(_EagerTensorBase)
/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/tf_inspect.py:45: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() instead
  if d.decorator_argspec is not None), _inspect.getargspec(target))
USB: Transferring Data...
USB: Myriad Execution Finished
USB: Myriad Connection Closing.
USB: Myriad Connection Closed.
Result: (1, 1)
1) 0 0.46216
Expected: (1, 1)
1) 0 0.79395
------------------------------------------------------------
Obtained values
------------------------------------------------------------
Obtained Min Pixel Accuracy: 41.789668798446655% (max allowed=2%), Fail
Obtained Average Pixel Accuracy: 41.789668798446655% (max allowed=1%), Fail
Obtained Percentage of wrong values: 100.0% (max allowed=0%), Fail
Obtained Pixel-wise L2 error: 41.789667896678964% (max allowed=1%), Fail
Obtained Global Sum Difference: 0.331787109375
------------------------------------------------------------
0 Kudos
22 Replies
idata
Employee
2,539 Views

Reducing the model by removing the convolutional layers makes it work. So I've got something to work on; it's something about the convolutions…

0 Kudos
idata
Employee
2,539 Views

If I use tf.layers.conv2d(model, filters=5, kernel_size=3) or slim.conv2d(model, num_outputs=5, kernel_size=3) I get the same results.

 

With padding=SAME they both give incorrect inference results.

 

With padding=VALID they both throw an exception…

 

Traceback (most recent call last):
  File "./test_inference_on_ncs.py", line 29, in <module>
    output, _user_object = graph.GetResult()
  File "/usr/local/lib/python3.5/dist-packages/mvnc/mvncapi.py", line 264, in GetResult
    raise Exception(Status(status))
Exception: mvncStatus.MYRIAD_ERROR

 

I can't see what's fundamentally different between my definition of the conv2d layer compared to the models defined in the slim model zoo ….
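For anyone comparing the two padding modes: for a stride-1 convolution they differ only in the output spatial size, which is why swapping one for the other changes the graph shapes the compiler has to handle. A quick sketch of the arithmetic (plain Python, function name my own, not from the repo):

```python
# Output spatial size of a stride-1 conv for the two TF padding modes:
# SAME pads so the output keeps the input size; VALID shrinks by (kernel - 1).
def conv_out_size(in_size, kernel, padding):
    if padding == "SAME":
        return in_size                 # e.g. 32 -> 32
    elif padding == "VALID":
        return in_size - kernel + 1    # e.g. 32 -> 30 for a 3x3 kernel
    raise ValueError("unknown padding: %s" % padding)

assert conv_out_size(32, 3, "SAME") == 32
assert conv_out_size(32, 3, "VALID") == 30
```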

0 Kudos
idata
Employee
2,539 Views

OK…. so a conv layer with >=8 outputs works (as mentioned in this post)

 

I agree with this comment though: what is the approach we should use for a fully convolutional image-to-image architecture (e.g. U-Net, pix2pix) where we want the last output layer to have either 1 or 3 channels, representing a black-and-white or RGB image?

0 Kudos
idata
Employee
2,539 Views

The method I'm going to use is to just output 8 channels in the final layer and, for training (and inference), slice off the first channel for the loss calculation. It's unneeded work, but it will get me going.
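The workaround can be sketched like this (NumPy stand-in for the actual TF graph; shapes and names are illustrative, not from the repo):

```python
import numpy as np

# Pretend this is the final conv layer's output: 8 channels to satisfy the
# >=8 output constraint, even though only channel 0 carries the real signal.
batch, h, w = 1, 32, 32
net_output = np.random.rand(batch, h, w, 8).astype(np.float32)

# Slice off the first channel for the loss (training) and prediction (inference);
# the other 7 channels are wasted work but keep the compiler happy.
prediction = net_output[..., :1]
assert prediction.shape == (batch, h, w, 1)
```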

0 Kudos
idata
Employee
2,539 Views

@matpalm Can you post your frozen model so I can try to reproduce this issue on my bench?

0 Kudos
idata
Employee
2,539 Views

Sure, I'll do that today. I have a bundle of related cases with repro steps. I'll fork the GitHub repo I mentioned above with test cases for each, and I'll check the models in too.

0 Kudos
idata
Employee
2,539 Views

See this GitHub repo for models, reproduction steps, etc.: https://github.com/matpalm/movidius_bug_reports

 

conv_with_8_filters works

 

conv_with_6_filters (the same model but with 6 channels) fails
0 Kudos
idata
Employee
2,539 Views

also added an example of deconv failing with padding='SAME' under deconv_padding_same

0 Kudos
idata
Employee
2,539 Views

also added an example of the output shape being wrong after a conv -> deconv stack: conv_deconv_output_shape_wrong

0 Kudos
idata
Employee
2,539 Views

At this point I can't get any workaround to work; all the combos I can think of to hack my way around not being able to use num_channels=1 fail. So I'll put this project on hold and either check again next SDK release, or sooner if you have things you'd like me to test further…

0 Kudos
idata
Employee
2,539 Views

@matpalm I'll be reviewing this issue today and I will get back to you if I need/find anything. Thanks.

0 Kudos
idata
Employee
2,539 Views

Great, thanks! No rush from my end; I'm going to be away for a couple of weeks. I have other cases but didn't get time to make test cases for them; I suspect they might be all the same underlying problem…

0 Kudos
idata
Employee
2,539 Views

@matpalm I have been able to reproduce your issues. At the moment, our SDK requires the output from convolution layers to be >= 8, as you've already seen. A possible workaround is to add a conf file to the current working directory, giving it the same name as your pb or meta file. For example, if your model's name is "model.meta", then you create a brand new file called "model.conf".

 

In the conf file, add each convolution layer with an output that is less than 8 and, on the line right below it, add the line "generic_spatial". This selects a generic spatial convolution function. See the example below. Make sure to have an additional empty line as the last line of the conf file, or else the SDK won't parse it correctly. Let me know if this helps. Thanks.

 

conv1
generic_spatial
conv2
generic_spatial
conv3
generic_spatial
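A quick sketch of setting this up from the shell (file and layer names hypothetical; the `printf` ends with two newlines to leave the trailing blank line the parser needs):

```shell
# Model is model.meta, so the conf file must be model.conf in the cwd.
# Each conv layer with <8 outputs is followed by "generic_spatial", and the
# file ends with an extra blank line so the SDK parses it correctly.
printf 'conv1\ngeneric_spatial\nconv2\ngeneric_spatial\n\n' > model.conf
cat model.conf
```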
0 Kudos
idata
Employee
2,539 Views

Thanks @Tome_at_Intel .

 

Got to try this today, but it still doesn't work, sorry. The mode of failure is the same as far as I can see: the output from running the frozen network on the host differs from running it on the compute stick…

 

e.g.

 

host_positive_prediction (1,) [ 1.]
host_negative_prediction (1,) [ 1.38149336e-09]
ncs_positive_prediction (1,) [ 0.99902344]
ncs_negative_prediction (1,) [ 1.]

 

Just to confirm I've done the config correctly though;

 

When I include a conf file ….

 

e1/Conv2D
generic_spatial

 

… and run ./test.sh conv_with_6_filters, the mvNCCompile output includes

 

Spec opt found opt_conv_generic_spatial 1<< 10
Layer (a) e1/Conv2D use the optimisation mask which is: 0x400
0 0x80000000
Layer fully_connected/MatMul use the generic optimisations which is: 0x80000000
0 0x80000000
Layer output use the generic optimisations which is: 0x80000000

 

( whereas when I include an empty conf file I see just

 

0 0x80000000
Layer e1/Conv2D use the generic optimisations which is: 0x80000000
0 0x80000000
Layer fully_connected/MatMul use the generic optimisations which is: 0x80000000
0 0x80000000
Layer output use the generic optimisations which is: 0x80000000

 

)

 

So it appears to be picking up the config, but it still doesn't work?

0 Kudos
idata
Employee
2,539 Views

@matpalm Thanks for reporting this issue. I went back and ran your ./test.sh script multiple times without the conf file. It seems that I am able to get a passing result sometimes from the script using conv_with_6_filters.

 

host_positive_prediction (1,) [0.49975544]
host_negative_prediction (1,) [0.49975544]
ncs_positive_prediction (1,) [0.5]
ncs_negative_prediction (1,) [0.5]
PASS conv_with_6_filters

 

However sometimes I get failing results when running the test.sh script:

 

host_positive_prediction (1,) [0.49975544]
host_negative_prediction (1,) [0.49975544]
ncs_positive_prediction (1,) [0.5]
ncs_negative_prediction (1,) [0.492]
FAIL conv_with_6_filters

 

Not sure why this is happening when the same network is generated each time.

0 Kudos
idata
Employee
2,539 Views

Yeah, apologies on my part; this is a fault of my overly simple reproduction script. To save time I've set things up to do a minimal training run that tries to build a classifier mapping one example to 0.0 and the other to 1.0. The script runs a simple optimiser loop for a very short time, and it's possible for the optimisation to fail; in those cases you see what you've reported here: both host_positive_prediction and host_negative_prediction are 0.5. I see this sometimes when I run the script. The workaround is to rerun the script when this happens until you get host_positive_prediction 1.0 and host_negative_prediction 0.0. I should fix this; even if the optimisation takes longer, it's better for the reproduction to be more reliable…
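The rerun-until-converged workaround can be sketched like this (`train_once` is a stand-in for the repo's short training run, not code from it):

```python
# Retry a short, flaky training run until it actually separates the two
# examples, rather than accepting a collapsed 0.5 / 0.5 fit.
def train_until_separated(train_once, max_tries=10, tol=0.1):
    for _ in range(max_tries):
        pos, neg = train_once()  # returns (positive_prediction, negative_prediction)
        if pos > 1.0 - tol and neg < tol:
            return pos, neg      # converged: ~1.0 and ~0.0
    raise RuntimeError("optimiser never separated the two examples")

# Toy stand-in: collapses to 0.5 twice, then converges on the third run.
runs = iter([(0.5, 0.5), (0.5, 0.5), (0.99, 0.01)])
pos, neg = train_until_separated(lambda: next(runs))
assert (pos, neg) == (0.99, 0.01)
```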

0 Kudos
idata
Employee
2,539 Views

( let me fix this so it's reproducible every run; I'll ping you again when that's done )

0 Kudos
idata
Employee
2,539 Views

Hey @Tome_at_Intel after being away for a bit I've revisited this, thanks for waiting.

 

I've reduced it to an even more minimal example that, for me, trains every time (made more stable by using a network that does a simpler regression now). Can you take another look? https://github.com/matpalm/movidius_bug_reports

 

Running ./test.sh conv_with_regression three times I get …

 

expected positive_prediction [10]
expected negativee_prediction [5]
host_positive_prediction (1,) [ 10.]
host_negative_prediction (1,) [ 5.]
ncs_positive_prediction (1,) [ 4.50390625]
ncs_negative_prediction (1,) [ 4.19921875]

 

expected positive_prediction [10]
expected negativee_prediction [5]
host_positive_prediction (1,) [ 10.]
host_negative_prediction (1,) [ 5.]
ncs_positive_prediction (1,) [ 5.109375]
ncs_negative_prediction (1,) [ 4.82421875]

 

expected positive_prediction [10]
expected negativee_prediction [5]
host_positive_prediction (1,) [ 10.]
host_negative_prediction (1,) [ 4.99999952]
ncs_positive_prediction (1,) [ 5.5]
ncs_negative_prediction (1,) [ 5.03125]
0 Kudos
idata
Employee
2,539 Views

Updated to v2 of the API in the hope it might include a fix, but I'm still getting problems.

 

git clone https://github.com/matpalm/movidius_bug_reports

 

and run run_all_tests.sh, if someone has time to sanity-check what I'm doing wrong…

0 Kudos
idata
Employee
2,052 Views

Hi, I have been running the minimal example on Ubuntu 16.04 with NCSDK 2.05, setting the padding to VALID, which now works. It seems to have fixed the error for all filter counts under 8.

 

I am still having a problem with the conv_with_regression output though; something weird is going on at the flatten layer.

 

I cannot run the other examples (conv 6 filter, conv shape wrong) as the output node BiasAdd doesn't exist. I tried using output/Sigmoid, as that looks like what it should be from the graph file. What is the output node for these examples?

 

WITH FILTERS 5 PADDING SAME:

-ve prediction [[0.00200709]]
+ve prediction [[0.99799144]]
zeros prediction [[0.4813904]]
ones prediction [[0.68682736]]
ncs_negative_prediction (1,) [0.3876953]
ncs_positive_prediction (1,) [0.26342773]
ncs_zeros_prediction (1,) [0.48120117]
ncs_ones_prediction (1,) [0.40063477]

WITH FILTERS 5 PADDING VALID:

-ve prediction [[0.00233787]]
+ve prediction [[0.9975262]]
zeros prediction [[0.5118382]]
ones prediction [[0.43551898]]
ncs_negative_prediction (1,) [0.00232887]
ncs_positive_prediction (1,) [0.9970703]
ncs_zeros_prediction (1,) [0.51171875]
ncs_ones_prediction (1,) [0.43603516]

WITH FILTERS 4 PADDING SAME:

-ve prediction [[0.00228436]]
+ve prediction [[0.9978607]]
zeros prediction [[0.47025326]]
ones prediction [[0.18113402]]
ncs_negative_prediction (1,) [0.45507812]
ncs_positive_prediction (1,) [0.46972656]
ncs_zeros_prediction (1,) [0.49316406]
ncs_ones_prediction (1,) [0.42529297]

WITH FILTERS 4 PADDING VALID:

-ve prediction [[0.00204596]]
+ve prediction [[0.99782455]]
zeros prediction [[0.48026168]]
ones prediction [[0.2394675]]
ncs_negative_prediction (1,) [0.00204659]
ncs_positive_prediction (1,) [0.9980469]
ncs_zeros_prediction (1,) [0.4802246]
ncs_ones_prediction (1,) [0.23840332]

WITH FILTERS 3 PADDING VALID:

-ve prediction [[0.00195853]]
+ve prediction [[0.99770445]]
zeros prediction [[0.5915294]]
ones prediction [[0.6413879]]
ncs_negative_prediction (1,) [0.00197029]
ncs_positive_prediction (1,) [0.9980469]
ncs_zeros_prediction (1,) [0.5917969]
ncs_ones_prediction (1,) [0.64208984]

WITH FILTERS 2 PADDING VALID:

-ve prediction [[0.00195683]]
+ve prediction [[0.99803835]]
zeros prediction [[0.4144791]]
ones prediction [[0.27988973]]
ncs_negative_prediction (1,) [0.00194931]
ncs_positive_prediction (1,) [0.9980469]
ncs_zeros_prediction (1,) [0.41430664]
ncs_ones_prediction (1,) [0.27954102]
0 Kudos