ELF download behaves differently than initialized hex

Altera_Forum

I have searched the archives but haven't turned up this problem. 

 

I have a NIOS II based system with some custom embedded peripherals, a JTAG UART (for debugging) and a non-JTAG UART (the system I'm designing is using RS-232 communication). I am running a project derived from the "hello_world_small" sample using the NIOS build tools in Eclipse. It uses a minimal amount of HAL functionality, but no operating system. The NIOS currently has two embedded RAM blocks (currently both 8K in size), one for code and one for data. (I say "currently" because originally I had an embedded ROM block for code segments .text and .rodata. I changed it to RAM to enable the test I am describing here.) The code RAM is configured to be initialized with the .HEX version of the system software. 

 

So here's the problem. If I download .SOF of the NIOS system, THEN download the ELF binary of the system software using the Eclipse Run As NIOS II hardware functionality, the software appears to work perfectly. (It seems to send, process and respond to an arbitrary number of packets over RS-232.) 

 

BUT... 

 

If I download the .SOF of the NIOS system and DON'T download the ELF binary of the software, but rather just let it boot from the .HEX version of the software as initialized into the code RAM, the software appears to work perfectly for a while, then hangs. The behaviour is repeatable. For a given build, the system will respond correctly to six packets, or eight, or four, or some other fixed number, then hang. It has the feel of a stack overflow problem. The interrupt service routines are still running (at least the ISR that responds to serial input is definitely running) but the high-level program fails. So, I have two questions: 

 

1) Has anybody encountered something like this before? What could possibly be different between the downloaded ELF and the initialized HEX version of the same software? I speculate that it might be different handling of unused portions of the memory, but I can't imagine why it would matter what is in code space that is never reached. I also can't even hypothesize how the system could respond properly to six copies of the same packet, then fail on the seventh. 

 

2) Does anybody have suggestions of how to debug this? Right now, all I know for sure is what I have written above. The software works, then it doesn't. I can't send debug messages out the JTAG UART because I'm not using it. Is it possible to use the Eclipse NIOS JTAG UART console without downloading the ELF? If so, I haven't figured out how.

Altera_Forum

Are you sure the sof file has been generated with the updated hex files? 

It seems some memory section is not correctly initialized, so the program hangs when execution reaches it. 

Although I believe this is not your problem, you may try adding the following call at the very beginning of your main() for every code/data section that is not mapped to the default linker sections. 

ALT_LOAD_SECTION_BY_NAME(<name>) 

 

 

--- Quote Start ---  

 

2) Does anybody have suggestions of how to debug this? Right now, all I know for sure is what I have written above. The software works, then it doesn't. I can't send debug messages out the JTAG UART because I'm not using it. Is it possible to use the Eclipse NIOS JTAG UART console without downloading the ELF? if so, I haven't figured out how. 

--- Quote End ---  

 

After you have loaded the .sof configuration, open a Nios II Command Shell and run 'nios2-terminal'. 

If you want full debug functionality (step into code, start, stop ...) you can still use Run As... in Eclipse, but you must disable the default Reset and Download features, so you simply attach the debugger to the running program.

Altera_Forum

 

--- Quote Start ---  

Are you sure the sof file has been generated with the updated hex files? 

It seems you have some memory section not correctly initialized: then when the code hits there, it hangs. 

Although I believe this is not your problem, you may try to add the following calls in the very beginning of your main() for every code/data segment which is possibly not mapped to the default linker sections. 

ALT_LOAD_SECTION_BY_NAME(<name>) 

 

--- Quote End ---  

 

 

Thanks for your reply! 

 

I believe it is properly generated with the current code. At least, changes made to the source are reflected in the behaviour of the running system. It certainly looks like what you describe, I just can't figure out how it ever gets to that uninitialized memory. The code just sits there in a loop, responding to serial packets. There's no reason why the path the code follows would be different in response to the Nth packet than the (N+1)st, especially if it's (N+1) copies of the same packet. I'm probably clobbering the stack somewhere, and overwriting a return address, I just can't see how that wouldn't cause disaster the first time, too. Does the ELF contain the opcode for NOP in each uninitialized code location? I suppose that would account for the difference in some deeply pathological case. Hmmm, this is giving me an idea. I'll have to see if NIOS II has an "illegal opcode" trap. 

 

 

--- Quote Start ---  

 

After you loaded the sof configuration, open a Nios shell prompt and write 'nios2-terminal' 

If you want full debug functionality (step into code, start, stop ...) you can still use Run As... in Eclipse, but you must disable the default Reset and Download features, so you simply attach the debugger to the running program. 

--- Quote End ---  

 

 

And thank you for this! I've been banging my head against this for weeks. :)

Altera_Forum

Are you using printf for debug or do you have any other data output on the jtag uart port? 

Please note that if your BSP is configured for reduced device drivers, the JTAG UART driver does not check whether a JTAG host is connected. So if you keep sending data without the JTAG interface attached, the write eventually blocks once the JTAG UART write FIFO is full. Since the printf calls are usually located in main(), the final behaviour is the same as you observed: the program flow seems to hang, but the ISRs are still active.
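
To illustrate why the hang would show up after a fixed number of packets, here is a small host-side model in C. It is not HAL code: the function name is made up, and the 64-byte FIFO depth used in the note below is only an assumption (the depth is a configurable parameter of the JTAG UART component). The idea is simply that with nobody draining the FIFO, each packet's debug output eats a fixed amount of space until a polled write has nowhere left to go.

```c
#include <stddef.h>

/*
 * Host-side model (not HAL code): a write FIFO that nobody drains,
 * as when no JTAG host is attached. The name is hypothetical.
 *
 * Returns the number of packets whose debug output fits completely
 * before a polled write would start spinning forever.
 */
size_t packets_before_block(size_t fifo_depth, size_t debug_bytes_per_packet)
{
    size_t used = 0;     /* bytes sitting in the undrained FIFO */
    size_t packets = 0;  /* packets whose debug output fit completely */

    if (debug_bytes_per_packet == 0) {
        return (size_t)-1;  /* no debug output: the FIFO never fills */
    }
    while (fifo_depth - used >= debug_bytes_per_packet) {
        used += debug_bytes_per_packet;
        packets++;
    }
    return packets;
}
```

With an assumed 64-byte FIFO, 10 bytes of debug output per packet gives six good packets, 8 bytes gives eight, and 16 bytes gives four, so the count would change from build to build as the debug messages change.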

Altera_Forum

This is almost certainly the explanation, and I will verify later today. So the JTAG UART has some kind of handshaking in it? If nobody is listening, it will eventually block? That is good to know. Thanks. I never would have figured that out.

Altera_Forum

 

--- Quote Start ---  

So the JTAG UART has some kind of handshaking in it? If nobody is listening, it will eventually block? That is good to know. Thanks. I never would have figured that out. 

--- Quote End ---  

 

This applies only if you use the "reduced device drivers" option to limit code footprint. 

IIRC the full driver version does not have this problem.  

This is because the reduced driver runs the JTAG UART in polled mode, while the full driver is interrupt driven.
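
If you want to stay with the reduced driver, one possible workaround is to bound the poll so that debug output is dropped instead of stalling main(). The following is only a host-side sketch under stated assumptions: `fake_jtag_uart` and `bounded_putc` are hypothetical names, not part of the HAL. On real hardware the space check would presumably read the WSPACE field of the JTAG UART control register instead of a struct member.

```c
#include <stddef.h>

/* Hypothetical stand-in for the JTAG UART write side; not a HAL type. */
struct fake_jtag_uart {
    size_t write_space;  /* free slots left in the write FIFO */
};

/*
 * Polled write that gives up after max_polls checks instead of
 * spinning forever. Returns 0 if the byte was queued, -1 if dropped.
 */
int bounded_putc(struct fake_jtag_uart *dev, char c, unsigned max_polls)
{
    (void)c;  /* the model only tracks FIFO space, not contents */
    for (unsigned i = 0; i < max_polls; i++) {
        if (dev->write_space > 0) {
            dev->write_space--;  /* byte accepted by the FIFO */
            return 0;
        }
    }
    return -1;  /* FIFO stayed full: drop the byte and carry on */
}
```

The trade-off is that debug characters are silently lost when no host is attached, which is usually preferable to hanging the main loop.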