Intel® Distribution of OpenVINO™ Toolkit

Why is the NCS output "nan" or "inf"?

idata
Employee

I have tried several Caffe models on the NCS. Two of them run correctly, but one model's output is unnormalized. The C++ example printed the following log:

 

making run_cpp
cd cpp; ./run_cpp; cd ..
Successfully opened NCS device!
Successfully allocated graph for ../graph
w= 96 ,h= 112
Successfully loaded the tensor for image ../../../data/images/044.jpg
Successfully got the inference result for image ../../../data/images/044.jpg
resultData is 21080 bytes which is 10540 16-bit floats.
Index of top result is: -1
Probability of top result is: 0.000000

 

The probabilities of all 10540 categories are "nan", so the index of the top result is -1 with a probability of 0.000000.
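For reference, an index of -1 with probability 0.000000 is what a plain greater-than search returns when every value is NaN, because any comparison with NaN is false. The following is a minimal Python sketch of that effect, not the code of the C++ example itself. It assumes the result buffer holds 16-bit floats, as the log states, and that the search starts from (-1, 0.0); the names `decode_fp16` and `top_result` are hypothetical and not part of the NCSDK.

```python
import numpy as np

def decode_fp16(buf: bytes) -> np.ndarray:
    """Interpret a raw result buffer as 16-bit floats and widen to float32."""
    return np.frombuffer(buf, dtype=np.float16).astype(np.float32)

def top_result(probs: np.ndarray):
    """Mimic a naive top-1 search: start at (-1, 0.0) and keep larger values."""
    best_idx, best_val = -1, 0.0
    for i, p in enumerate(probs):
        if p > best_val:  # always False when p is NaN
            best_idx, best_val = i, float(p)
    return best_idx, best_val

# 10540 NaN values, 2 bytes each, matching the sizes in the log above
buf = np.full(10540, np.nan, dtype=np.float16).tobytes()
probs = decode_fp16(buf)
print(len(buf), top_result(probs), int(np.isnan(probs).sum()))
```

Counting the NaN entries with `np.isnan` before searching for the top result makes this failure visible instead of silently reporting index -1.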

 

The model has 90 layers with a softmax output, and the image resolution is 96x112. I applied a scaling of 1/128 during training, and I use a single NCS to run the model.

 

I am wondering why the NCS output is "nan".

 

Can someone give me some suggestions? Thanks a lot!
16 Replies
idata
Employee

The fp16 data output by the NCS appears to have overflowed when it is converted to fp32. Could this be caused by training the model in float, or is there another reason?
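To illustrate the suspicion: float16 can only represent finite magnitudes up to 65504, so a value that is unremarkable in float32 can become inf in fp16, and later arithmetic on inf can yield nan. Widening fp16 to fp32 is exact, so any inf or nan seen after the conversion was most likely already present in the fp16 data. The NumPy sketch below emulates this on the host; it is not NCS code, and the sample values are made up.

```python
import numpy as np

FP16_MAX = float(np.finfo(np.float16).max)  # 65504.0

# Values that are unremarkable in float32 ...
x32 = np.array([1.5, 600.0, 70000.0], dtype=np.float32)

# ... but the last one exceeds the float16 range and becomes inf.
with np.errstate(over="ignore"):
    x16 = x32.astype(np.float16)

# Once an inf exists, operations such as inf - inf produce nan.
with np.errstate(invalid="ignore"):
    diff = x16 - x16

print(FP16_MAX, x16, diff)
```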

idata
Employee

@ssliu Thanks for reporting this. I would like to test the networks you tried and see if I can reproduce this issue myself. If you could provide a link to the networks you used, that would be very useful. Thanks again.

idata
Employee

@Tome_at_Intel How can I share the networks with you? Should I send the large model, the deploy.prototxt, and a test image for detection?

idata
Employee

@ssliu GitHub or Dropbox links would work.

idata
Employee

@Tome_at_Intel The GitHub link is https://github.com/sunnySSliu/NCS_comunicate.git. It uses Git LFS.

idata
Employee

@ssliu Thanks for providing your network and images. I tried your network and had no issues running your program, although I did receive very large probabilities. For your provided image file 012.jpg, I received an index of 9309 and a probability of 506.50000000. Please try this again and see whether you get the same result. Thanks.

idata
Employee

@Tome_at_Intel You may need to revise the Makefile. At the moment the output is not taken from the last layer; you can delete the "-on" parameter of mvNCCompile in the Makefile. Thanks.

idata
Employee

The large probabilities seem wrong: the values grow larger and larger from layer to layer until they overflow, so at one of the layers the output becomes "nan" or "inf". I have also run the model on a PC, and part of the output of the fc5 layer is shown below:

 

-1.12684 -0.263021 -1.17812 1.32531 0.319905 -0.0244644 -2.2793 0.140543 0.321131 -0.328076 0.985858 -3.03365 0.615863 -0.919282 -0.639025 1.48831 -1.14018 -0.144667 0.601186 0.142621 -1.43988 0.475403 -1.26686 -0.507238 -0.93005 0.427848 -0.728527 -1.86215 0.448485 -0.182363 -1.65505 -0.790986 -0.065641 -3.27148 0.636745 1.10612 -1.17482 -2.7694 -0.329878 -1.14171 -1.28136 1.32891 -0.511056 -0.557872 0.354566 0.340552 -0.807782 -1.80585 -1.60028 -0.346055 0.224037 -0.297021 -0.0793115 -1.65875 0.320309 -0.0186328 0.607013 0.784389 -0.377752 -1.7018 -2.22441 1.67076 0.251789 0.551989 -1.32332 0.742055 -0.366883 0.578996 0.35062 0.147858 -0.918806 -0.347651 -1.25856 2.46879 0.417314 -1.09436 -1.0119 0.511565 -1.68811 0.5903 2.67939 -1.11716 -2.07182 1.58757 -1.29253 -1.78229 -0.485864 -1.05624 -0.926988 -0.748615 0.904353 -0.1299 -0.541021 1.06192 -1.14407 0.589872 -0.942095 1.28527 1.05734 -0.297605 -1.93476 -0.526176 0.713658 0.37538 -0.156158 0.365746 -0.750868 -0.835334 -0.896935 0.678555 0.0420146 0.716639 -0.000560202 -0.681816 0.856575 -0.132422 -1.92006 1.18765 -0.254363 -0.525324 0.100624 -0.374459 -1.40312 -1.86245 1.22713 -0.293825 2.31105 -0.885568 -1.46361 0.874743 -1.66305 -1.88179 -0.49293 -0.779195 1.70638 1.24512 -0.279403 2.41547 0.582597 -0.406534 -1.8943 -1.15343 -2.33131 -0.339133 -0.386461 -0.943876 -0.981108 0.807584 -0.916267 0.922208 0.369756 -0.396057 -1.15999 1.67288 1.22885 -1.38413 -0.650468

 

These values are not that large. Could this be caused by training the model in float? Or by the 96x112 input, which is not square?
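One way to narrow down where the values blow up is to collect each layer's output from a reference run (for example the PC run mentioned here) and check it against the fp16 range. The sketch below is only an illustration under that assumption: `fp16_range_report` is a hypothetical helper, the layer names other than fc5 are invented, and the activation values are made up apart from a few fc5 numbers quoted above.

```python
import numpy as np

FP16_MAX = float(np.finfo(np.float16).max)  # 65504.0

def fp16_range_report(layer_outputs):
    """Return (name, max_abs, fits_fp16) for each layer, in the given order."""
    report = []
    for name, values in layer_outputs.items():
        arr = np.asarray(values, dtype=np.float32)
        max_abs = float(np.max(np.abs(arr)))
        # nan and inf fail the finite check, oversized values fail the range check
        fits = bool(np.all(np.isfinite(arr)) and max_abs <= FP16_MAX)
        report.append((name, max_abs, fits))
    return report

# Made-up activations for illustration; "fc5" uses a few values quoted above.
outputs = {
    "conv_example": [12.0, -340.5, 7.25],
    "blown_up_example": [1.0e5, -2.0, 3.0],
    "fc5": [-1.12684, -0.263021, -1.17812, 1.32531],
}
for name, max_abs, fits in fp16_range_report(outputs):
    print(f"{name:18s} max|x|={max_abs:12.4f} fits_fp16={fits}")
```

The first layer reported as not fitting is a candidate for where the device output turns into "inf" or "nan", although the device may accumulate intermediate values differently from the reference run.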

idata
Employee

@Tome_at_Intel How is the debugging going on your side?

idata
Employee

@ssliu It would be better to try square images, see whether the same issue appears, and then compare the results.

idata
Employee

@georgievm_cms I have already tried this. Since the model was trained at 96x112, I get the same erroneous output when I give it a square image.

idata
Employee

@ssliu I think the right test in your situation is to train the model at either 96x96 or 112x112 and then use test images of the same size as the training images.

idata
Employee

@georgievm_cms We will train it with square inputs as soon as our server is idle. However, that won't solve the fundamental problem, right? Please help debug the original problem. Thanks a lot.

idata
Employee

Hey, I encountered the same problem. There may be many causes, but one of them, and it was the cause in my case, is that the input tensor must be in float16 format regardless of the network's data type (as far as my experience goes). This is not addressed in any docs or commented in the example code.
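As a rough illustration of that point, the input can be cast to float16 on the host before it is handed to the device. This is a sketch, not the code from the examples: `to_fp16_tensor` is a hypothetical helper, the 1/128 scale is taken from the original post, and whether a mean subtraction or channel reordering is also needed depends on how the model was trained.

```python
import numpy as np

def to_fp16_tensor(image, scale=1.0 / 128.0):
    """Scale an image array and cast it to float16 for the device."""
    tensor = np.asarray(image, dtype=np.float32) * scale
    return tensor.astype(np.float16)

# Dummy 96x112 (width x height) 3-channel image standing in for a real input
image = np.full((112, 96, 3), 128, dtype=np.uint8)
tensor = to_fp16_tensor(image)
print(tensor.dtype, tensor.shape, tensor.nbytes)

# With the NCSDK v1 Python API, the tensor would then be passed to
# graph.LoadTensor(tensor, 'user object') on an allocated graph.
```

Passing float32 or uint8 data unchanged would presumably be read by the device as if it were 16-bit floats, which could explain meaningless or "nan" outputs.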

idata
Employee

@dvpo There is more information about using the NCSDK with the NCS at https://movidius.github.io/ncsdk/. Regarding float16 input tensors, see the C and Python API sections of the documentation at https://movidius.github.io/ncsdk/c_api/mvncLoadTensor.html and https://movidius.github.io/ncsdk/py_api/Graph.LoadTensor.html.

idata
Employee

@Tome_at_Intel I did not realize that there was documentation on this. Thanks for pointing it out.
