Arria 10 SoC Development Kit: Not able to run Intel FPGA AI Suite SoC Design Example

RajanVaja
Beginner

I want to run the "S2M Mode Demonstration Application" from the Intel FPGA AI Suite SoC Design Example User Guide on the Arria 10 SoC Development Kit. I am using the prebuilt image $COREDLA_ROOT/demo/ed4/a10_soc_s2m/sd-card/coredla-image-arria10.wic. However, booting gets stuck in U-Boot SPL during DRAM initialization with the error below:

-----------------------------------------------------------

U-Boot SPL 2022.07 (Jan 06 2023 - 03:46:57 +0000)
FPGA: Checking FPGA configuration setting ...
FPGA: Start to program peripheral/full bitstream ...
FPGA: Early Release Succeeded.
FPGA: Checking FPGA configuration setting ...
FPGA: Start to program peripheral/full bitstream ...
FPGA: Early Release Succeeded.

U-Boot SPL 2022.07 (Jan 06 2023 - 03:46:57 +0000)
DDRCAL: Success
DDR: SDRAM size check failed!
### ERROR ### Please RESET the board ###

-------------------------------------------------------------
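
For triaging a capture like the one above, a minimal sketch in Python could look like the following. The marker strings are copied from the log; the function name `triage_spl_log` and the summary keys are hypothetical. The log shows calibration succeeding while the size check fails, which may point to a mismatch between the configured and the actual memory rather than a calibration problem (an interpretation, not something the log states).

```python
# Failure markers copied verbatim from the SPL output quoted above.
FAILURE_MARKERS = (
    "DDR: SDRAM size check failed!",
    "### ERROR ###",
)


def triage_spl_log(log_text):
    """Summarize an SPL boot log (hypothetical helper, not part of U-Boot)."""
    lines = [line.strip() for line in log_text.splitlines()]
    return {
        # More than one banner means the SPL restarted during boot.
        "spl_banners": sum(1 for line in lines if line.startswith("U-Boot SPL")),
        "ddr_calibrated": "DDRCAL: Success" in lines,
        "failures": [
            line for line in lines
            if any(marker in line for marker in FAILURE_MARKERS)
        ],
    }


SAMPLE = """\
U-Boot SPL 2022.07 (Jan 06 2023 - 03:46:57 +0000)
DDRCAL: Success
DDR: SDRAM size check failed!
### ERROR ### Please RESET the board ###
"""

summary = triage_spl_log(SAMPLE)
print(summary)
```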

I tried the prebuilt images from FPGA AI Suite 2023.2 and 2023.3.1, and the result is the same.

More information on the hardware setup:

- Model Name: DEV KIT DKSOC10AS066SE

- SW1: 1=>OFF, 2=>OFF, 3=>OFF, 4=>OFF

- SW2: 1=>OFF, 2=>OFF, 3=>OFF, 4=>OFF, 5=>ON, 6=>ON, 7=>ON, 8=>ON

- SW3: 1=>OFF, 2=>OFF, 3=>ON, 4=>ON, 5=>ON, 6=>OFF, 7=>OFF, 8=>OFF

- Attached board top image

 

However, the image from https://releases.rocketboards.org/2023.09/gsrd/a10_gsrd/sdimage.tar.gz works fine. What could be the reason that the prebuilt S2M example image does not boot?

khtan
Employee

Hi Rajan

 

Thank you for reaching out to our FPGA community. I'm Kian and I will be looking into this case.

Let me first investigate on my end whether there are any known issues with the prebuilt image, and I will get back to you.

 

Thanks

Regards

Kian

 

khtan
Employee

Hi Rajan 

I apologize for the delay in getting back to you. So far I have checked our cases and knowledge base for known issues, and there appear to be none. Initially I suspected the DIP switch and jumper settings, since yours differ slightly from the configuration on RocketBoards (https://www.rocketboards.org/foswiki/Documentation/Arria10SoCGSRD), but you mentioned that the RocketBoards image runs from the SD card without issues, so that probably rules out the switch and jumper settings.

I am currently setting up a system to test the precompiled image on an Arria 10 board to see whether I can reproduce the issue on my end. I will update you later.

 

Thanks

Regards

Kian

RajanVaja
Beginner

Hi Kian,

We ran some experiments, and the observations below might help root-cause the issue.

In the design example provided with the FPGA AI Suite, the HPS DDR4 memory is interfaced through the following I/O ports:

   // HPS Memory
   input   wire                             emif_ref_clk,
   input   wire                             hps_memory_oct_rzqin,
   output  wire  [0:0]                      hps_memory_mem_ck,
   output  wire  [0:0]                      hps_memory_mem_ck_n,
   output  wire  [16:0]                     hps_memory_mem_a,
   output  wire  [0:0]                      hps_memory_mem_act_n,
   output  wire  [1:0]                      hps_memory_mem_ba,
   output  wire  [0:0]                      hps_memory_mem_bg,
   output  wire  [0:0]                      hps_memory_mem_cke,
   output  wire  [0:0]                      hps_memory_mem_cs_n,
   output  wire  [0:0]                      hps_memory_mem_odt,
   output  wire  [0:0]                      hps_memory_mem_reset_n,
   output  wire  [0:0]                      hps_memory_mem_par,
   input   wire  [0:0]                      hps_memory_mem_alert_n,
   inout   wire  [3:0]                      hps_memory_mem_dqs,
   inout   wire  [3:0]                      hps_memory_mem_dqs_n,
   inout   wire  [31:0]                     hps_memory_mem_dq,
   inout   wire  [3:0]                      hps_memory_mem_dbi_n, 

The signal definitions above do not match the 1 GB DDR4 (256 Mb x 40, single rank) memory on the board.

We updated the memory interface to match the board's 1 GB DDR4 (256 Mb x 40, single rank) memory, using the Arria 10 SoC GSRD golden example as a reference, and we can now boot the Arria 10 SoC board.
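
To make the mismatch concrete, here is a minimal sketch in Python that compares the data-path widths declared in the port list above with what a 40-bit interface would need. It rests on two assumptions that are not confirmed in this thread: that each DQS/DBI group covers 8 DQ bits (implied by 32 DQ and 4 DQS in the port list), and that "256 Mb x 40" means 256 Mi locations of 40 bits each. The helper names are hypothetical.

```python
# Data-path port widths taken from the design-example port list above.
EXAMPLE_PORTS = {"mem_dq": 32, "mem_dqs": 4, "mem_dqs_n": 4, "mem_dbi_n": 4}

# Assumption: 8 DQ bits per DQS/DBI group, as implied by 32 DQ / 4 DQS.
DQ_PER_GROUP = 8


def expected_ports(dq_width, dq_per_group=DQ_PER_GROUP):
    """Port widths a dq_width-bit interface would need under the assumption."""
    groups, remainder = divmod(dq_width, dq_per_group)
    if remainder:
        raise ValueError("DQ width must be a multiple of the group size")
    return {
        "mem_dq": dq_width,
        "mem_dqs": groups,
        "mem_dqs_n": groups,
        "mem_dbi_n": groups,
    }


def width_mismatches(declared, expected):
    """Map each mismatching port name to (declared width, expected width)."""
    return {
        name: (declared.get(name), want)
        for name, want in expected.items()
        if declared.get(name) != want
    }


def capacity_bytes(locations, width_bits):
    """Raw capacity in bytes of a memory with the given depth and width."""
    return locations * width_bits // 8


mismatches = width_mismatches(EXAMPLE_PORTS, expected_ports(40))
for name, (have, want) in mismatches.items():
    print(f"{name}: declared {have}, expected {want}")

# Under the depth assumption, 32 of the 40 bits give exactly 1 GiB.
print(capacity_bytes(256 * 2**20, 32) == 2**30)
```

Under these assumptions all four data-path ports differ (32 versus 40 DQ bits, 4 versus 5 groups), which is consistent with the size check failing in the SPL.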

 

However, we are still unable to run the demo example. When running the demo application, we get a DLA timeout error as shown below (we tried both M2M and S2M, with the same result).

 

root@arria10-a2524a6b645b:~/app# ./dla_benchmark -b=1 -cm $compiled_model -d=HETERO:FPGA,CPU -i $imgdir -niter=5 -plugins_xml_file ./plugins.xml -arch_file $archfile -api=async -groundtruth_loc $imgdir/TF_ground_truth.txt -perf_est -nireq=4 -bgr
[Step 1/12] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[ INFO ] Found 1 compiled graph
[ INFO ] Using custom plugins xml file - ./plugins.xml
[ INFO ] Network is compiled
[ INFO ] Printing summary of arguments being used by dla_benchmark
[ INFO ] API (-api) ........................... async
[ INFO ] Device (-d) .......................... HETERO:FPGA,CPU
[ INFO ] Batch size (-b) ...................... 1
[ INFO ] Compiled model (-cm) ................. /home/root/resnet-50-tf/RN50_Performance_b1.bin
[ INFO ] Num iterations (-niter) .............. 5
[ INFO ] Input images directory (-i) .......... /home/root/resnet-50-tf/sample_images
[ INFO ] Num CPU threads (-nthreads) .......... Not specified
[ INFO ] Architecture file (-arch_file) ....... /home/root/resnet-50-tf/A10_Performance.arch
[ INFO ] Num inference requests (-nireq) ...... 4
[ INFO ] Plugins file (-plugins_xml_file) ..... ./plugins.xml
[ INFO ] Groundtruth file (-groundtruth_loc) .. /home/root/resnet-50-tf/sample_images/TF_ground_truth.txt
[ INFO ] Reverse input image channels (-bgr) .. True
[ INFO ] Reading /home/root/resnet-50-tf/sample_images for graph index 0
[ WARNING ] -nstreams default value is determined automatically for a device. 
        Although the automatic selection usually provides a reasonable performance, 
        but it still may be non-optimal for some cases, for more information look at README.
[Step 2/12] Loading Inference Engine
[ INFO ] OpenVINO: Build ................................. 2022.3.0-9052-9752fafe8eb-HEAD
[ INFO ] 
[Step 3/12] Setting device configuration
[Step 4/12] Reading the Intermediate Representation network
[ INFO ] Skipping the step for compiled network
[Step 5/12] Resizing network to match image sizes and given batch
[ INFO ] Skipping the step for compiled network
[Step 6/12] Configuring input of the model
[ INFO ] Skipping the step for compiled network
[Step 7/12] Loading the model to the device
[ INFO ] Importing model from /home/root/resnet-50-tf/RN50_Performance_b1.bin to HETERO:FPGA,CPU as Graph_0
Runtime arch check is enabled. Check started...
Runtime arch check passed.
Runtime build version check is enabled. Check started...
Runtime build version check passed.
[ INFO ] Import network took 3493.0785 ms
[Step 8/12] Setting optimal runtime parameters
[ WARNING ] Number of iterations was aligned by request number from 5 to 8 using number of requests 4
[Step 9/12] Creating infer requests and filling input blobs with images
[ INFO ] Filling input blobs for network ( Graph_0 )
[ INFO ] Network input 'map/TensorArrayStack/TensorArrayGatherV3' precision U8, dimensions (NCHW): 1 3 224 224 
[ WARNING ] Some image input files will be ignored: only 8 are required from 10
[Step 10/12] Measuring performance (Start inference asyncronously, 4 inference requests using 1 streams for CPU, limits: 8 iterations with each graph)

WaitForDla polling timeout with threadId_0
If inference on one batch is expected to take more than 30 seconds, then increase WAIT_FOR_DLA_TIMEOUT in dlia_plugin.cpp and recompile the runtime.

../src/inference/src/ie_common.cpp:75 FATAL ERROR: inference on FPGA did not complete, jobs finished 0, jobs waited 0
[ ERROR ] Infer failed
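
Two details of the output above can be pulled out with a short Python sketch: the job counters in the fatal error line, and the iteration count being rounded up from 5 to 8. The function names are hypothetical, and the rounding rule (next multiple of the request count) is inferred from the warning text, not taken from the dla_benchmark source.

```python
import math
import re

# Pattern matching the counters in the FATAL ERROR line quoted above.
JOBS_RE = re.compile(r"jobs finished (\d+), jobs waited (\d+)")


def parse_dla_failure(log_text):
    """Extract the timeout flag and job counters from benchmark output."""
    match = JOBS_RE.search(log_text)
    return {
        "timed_out": "WaitForDla polling timeout" in log_text,
        "jobs_finished": int(match.group(1)) if match else None,
        "jobs_waited": int(match.group(2)) if match else None,
    }


def align_iterations(niter, nireq):
    """Round niter up to the next multiple of nireq (inferred rule)."""
    return math.ceil(niter / nireq) * nireq


SAMPLE = (
    "WaitForDla polling timeout with threadId_0\n"
    "../src/inference/src/ie_common.cpp:75 FATAL ERROR: inference on FPGA "
    "did not complete, jobs finished 0, jobs waited 0\n"
)

result = parse_dla_failure(SAMPLE)
print(result)
print(align_iterations(5, 4))
```

With zero jobs finished, the run did not complete even a single inference, so raising WAIT_FOR_DLA_TIMEOUT alone seems unlikely to help; that reading is a guess based on the counters.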

 

Also, the FPGA DDR4 test in the BTS tool reports errors (screenshot attached).

khtan
Employee

Hi Rajan,

I apologize for the delay in responding; I was at offsite training for 3 days. I'm glad you found that the HPS memory I/O was not configured correctly. I set up a system with the FPGA AI Suite and an Arria 10 SoC devkit and also observed that SDRAM calibration failed. I will ask our team to check whether the example design needs to be updated.

As for the new issue, the DLA timeout when running inference: I just want to confirm whether you have gone through the steps in the user guide (https://www.intel.com/content/www/us/en/docs/programmable/768979/2023-3/soc-design-example-user-guide.html), page 14, starting at section 3.5, "Preparing the Intel Arria 10 SX SoC FPGA Development Kit for the Intel FPGA AI Suite SoC Design Example". It describes the settings needed on the board, and you also need to add the models and inference graph to the SD card.

Meanwhile, I will also check on my end for any known issues or solutions for the DLA timeout.

 

Thanks

Regards

Kian

RajanVaja
Beginner

Thanks, Kian, for the response.

Yes, I am following the steps in the document. The board is set up as per the document. The model and other required files were generated as described there and copied to the SD card.

If you get a working image, please share it with us so we can check on our end.

khtan
Employee

Hi Rajan,

Thanks for the reply, and sorry for the delay. I'm still checking with the owner of the design example about the configuration setup. I will let you know as soon as possible.

 

Thanks

Regards

Kian

khtan
Employee

Hi Rajan, 

I'm sorry for the delay; I haven't received a response from the AI Suite team yet. Meanwhile, I discussed this case with a colleague, and we suspect DDR4 stability may still be the problem: the memory might be stable enough to boot but not stable enough to run inference.

Does the RocketBoards image (https://releases.rocketboards.org/2023.09/gsrd/a10_gsrd/sdimage.tar.gz) pass the DDR4 BTS test, while the image you are running for the inference test, which hits the DLA timeout, does not? Does your image use the same memory settings as the RocketBoards design?

Could I have a copy of the changes you made so that I can check the settings with the team?

 

Thanks

Regards

Kian

RajanVaja
Beginner

Hi @khtan ,

Thanks for the reply.

We followed https://www.intel.com/content/www/us/en/docs/programmable/768979/2023-3-1/build-flow.html to create the design. The HPS did not boot with that design, so on top of it we changed the HPS DDR configuration to match the GSRD. The FPGA DDR configuration is unchanged from the example design. We could not find a RocketBoards design that includes FPGA memory (it has only HPS memory).

As I understand it, BTS requires a specific .sof to be loaded, so we cannot use BTS with a custom design.

khtan
Employee

Hi Rajan,

I'm really sorry for the delay. I am currently following up with the FPGA AI Suite team on the design example issue, and I am still seeing the issue on my end with a new Arria 10 SoC dev kit. I will try to expedite this case, as it has already been open for quite a while.

 

Thanks

Regards

Kian
