I have tried several Caffe models on the NCS. Two of them run correctly; however, one model's output is unnormalized. The C log prints the following:
making run_cpp
cd cpp; ./run_cpp; cd ..
Successfully opened NCS device!
Successfully allocated graph for ../graph
w= 96 ,h= 112
Successfully loaded the tensor for image ../../../data/images/044.jpg
Successfully got the inference result for image ../../../data/images/044.jpg
resultData is 21080 bytes which is 10540 16-bit floats.
Index of top result is: -1
Probability of top result is: 0.000000
All 10540 category probabilities are "nan", so the index of the top result is -1 with probability 0.000000.
The model has 90 layers with a softmax output, and the image resolution is 96*112. I applied scaling (1/128) during training, and I run the model on a single NCS.
I am wondering why the NCS's output is "nan".
Can someone give me some suggestions? Thanks a lot!
- Tags:
- Tensorflow
The fp16 data output from the NCS seems to have overflowed when converted to fp32. Could this be caused by training in float, or by something else?
@ssliu Thanks for reporting this. I would like to test the networks you tried and see if I can reproduce this issue myself. If you could provide a link to the networks you used, that would be very useful. Thanks again.
@Tome_at_Intel How can I share the networks with you? Should I include the full model, the deploy.prototxt, and an image for detection?
@ssliu Github links/dropbox links would work.
@Tome_at_Intel The GitHub link is https://github.com/sunnySSliu/NCS_comunicate.git. It uses Git LFS.
@ssliu Thanks for providing your network and images. I had no issues running your program and network, although I did receive very large probabilities. For your provided image file 012.jpg, I received an index of 9309 and a probability of 506.50000000. Please try this again and see if you get the same result. Thanks.
@Tome_at_Intel You may need to revise the makefile. At the moment the output is not the last layer; you can delete the "-on" parameter of mvNCCompile in the makefile. Thanks.
The large probabilities seem wrong: the values grow larger and larger from layer to layer until they overflow, so at some layer the output becomes "nan" or "inf". I also ran the model on a PC, and part of the output of the fc5 layer looks like this:
-1.12684 -0.263021 -1.17812 1.32531 0.319905 -0.0244644 -2.2793 0.140543 0.321131 -0.328076 0.985858 -3.03365 0.615863 -0.919282 -0.639025 1.48831 -1.14018 -0.144667 0.601186 0.142621 -1.43988 0.475403 -1.26686 -0.507238 -0.93005 0.427848 -0.728527 -1.86215 0.448485 -0.182363 -1.65505 -0.790986 -0.065641 -3.27148 0.636745 1.10612 -1.17482 -2.7694 -0.329878 -1.14171 -1.28136 1.32891 -0.511056 -0.557872 0.354566 0.340552 -0.807782 -1.80585 -1.60028 -0.346055 0.224037 -0.297021 -0.0793115 -1.65875 0.320309 -0.0186328 0.607013 0.784389 -0.377752 -1.7018 -2.22441 1.67076 0.251789 0.551989 -1.32332 0.742055 -0.366883 0.578996 0.35062 0.147858 -0.918806 -0.347651 -1.25856 2.46879 0.417314 -1.09436 -1.0119 0.511565 -1.68811 0.5903 2.67939 -1.11716 -2.07182 1.58757 -1.29253 -1.78229 -0.485864 -1.05624 -0.926988 -0.748615 0.904353 -0.1299 -0.541021 1.06192 -1.14407 0.589872 -0.942095 1.28527 1.05734 -0.297605 -1.93476 -0.526176 0.713658 0.37538 -0.156158 0.365746 -0.750868 -0.835334 -0.896935 0.678555 0.0420146 0.716639 -0.000560202 -0.681816 0.856575 -0.132422 -1.92006 1.18765 -0.254363 -0.525324 0.100624 -0.374459 -1.40312 -1.86245 1.22713 -0.293825 2.31105 -0.885568 -1.46361 0.874743 -1.66305 -1.88179 -0.49293 -0.779195 1.70638 1.24512 -0.279403 2.41547 0.582597 -0.406534 -1.8943 -1.15343 -2.33131 -0.339133 -0.386461 -0.943876 -0.981108 0.807584 -0.916267 0.922208 0.369756 -0.396057 -1.15999 1.67288 1.22885 -1.38413 -0.650468
These values are not very large. Could this be caused by training the model in float? Or by the 96*112 input, which is not square?
@Tome_at_Intel How is the debugging going on your side?
@ssliu It would be better to try square images, see whether the same issue appears, and then compare.
@georgievm_cms I have already tried this. Since the model was trained on 96*112 inputs, I get the same erroneous output if I give it a square image.
@ssliu I think the right test in your situation would be to retrain the model at either 96x96 or 112x112 and then test with images of the same size as the training ones.
@georgievm_cms We will retrain it on square inputs as soon as our server is idle. However, that won't solve the fundamental problem, right? Please help debug the original issue. Thanks a lot.
Hey, I encountered the same problem. There may be many causes, but one of them, and it was the one in my case, is that the input tensor must be in float16 format regardless of the network's data type (as far as I can tell). This is not addressed in any docs or commented on in the example code.
@dvpo There is more information about using the NCSDK with the NCS at https://movidius.github.io/ncsdk/. Regarding using the input tensor with float16 data types, see the C and Python API sections of the documentation at https://movidius.github.io/ncsdk/c_api/mvncLoadTensor.html and https://movidius.github.io/ncsdk/py_api/Graph.LoadTensor.html.
@Tome_at_Intel I did not realize there was a doc on this. Thanks for pointing it out.