Nios® V/II Embedded Design Suite (EDS)

eCos kernel suddenly stops booting

Altera_Forum
Honored Contributor II

For months I have been successfully running eCos and testing my application, but suddenly I am no longer able to boot eCos. Things were working in the morning, but after adding a new application feature and rebooting to install a Windows update, I can no longer boot eCos. I have found that if I add or remove random code in my application, it sometimes boots, but usually not. I suspect something in my application is causing an overflow or overwrite while the kernel is booting, but I don't know what that could be. Full details follow...

 

When things stopped working, I had only updated and rebuilt my application. I had not changed the SOPC Builder system or the FPGA design, and I had not rebuilt the eCos library. I backed out the Windows update, but that didn't help. I have tried regenerating the SOPC system, recompiling the FPGA, rebuilding the eCos library, and increasing the idle thread stack size, all without success. I also switched my stack areas from static to dynamic allocation, again without success.

 

I tried running archived applications (previously built and tested, no recompile) and found that I had to go back over a month to find one that would work. Even one from the day before did not boot. If I recompile a working archived version, it no longer boots.

 

I'm running Quartus Web Edition 5.0 and its associated eCos. I have not upgraded to 5.1 yet. I am using the ROMRAM startup option and downloading the application via the Nios II IDE.

 

My application is menu based and has tons of printf calls for menus, help text, etc., so there is a large data section.

 

While debugging, I found that execution runs through cyg_hal_invoke_constructors() but never reaches the cyg_start() call that follows it in vectors.S. Instead, it seems to jump into the on-board flash, which is blank. cyg_hal_invoke_constructors() appears to make 21 calls, but I don't see where the constructor list __CTOR_LIST__ is defined. I have an Altera PIO driver that gets initialized as part of cyg_hal_invoke_constructors(), and that appears to work fine.
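For reference, __CTOR_LIST__ is usually not defined in any C file. It is typically a symbol that the HAL's linker script places at the start of the .ctors section, with a matching __CTOR_END__, so its address shows up in the linker map file rather than in the source. Walkers like cyg_hal_invoke_constructors() commonly call the table entries from the last one back to the first. The sketch below is a hypothetical, host-runnable version of such a walker with one debugging addition: it refuses to call an entry that is null or all ones, because erased NOR flash reads back as all ones. The names invoke_ctors_checked, ctor_entry_looks_valid, and the demo constructors are illustrative only and are not eCos APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*ctor_fn)(void);

/* Erased NOR flash reads back as all ones, so an entry that is all ones
 * (or null) strongly suggests the table or its contents are bad. */
static int ctor_entry_looks_valid(ctor_fn fn)
{
    uintptr_t v = (uintptr_t)fn;
    return v != 0 && v != UINTPTR_MAX;
}

/* Walk the table from the last entry to the first, but stop before calling
 * a suspicious entry. Returns how many constructors were called and sets
 * *bad_index to the first suspicious index, or to n if all were called. */
static size_t invoke_ctors_checked(ctor_fn *table, size_t n, size_t *bad_index)
{
    size_t called = 0;
    *bad_index = n;
    for (size_t i = n; i-- > 0; ) {
        if (!ctor_entry_looks_valid(table[i])) {
            *bad_index = i;
            break;
        }
        table[i]();
        called++;
    }
    return called;
}

/* Hypothetical stand-in constructors for exercising the walker. */
static int ctor_calls;
static void demo_ctor_a(void) { ctor_calls++; }
static void demo_ctor_b(void) { ctor_calls++; }
```

On the target, the same validity check could be dropped temporarily into a copy of the HAL's walker to report the index of the entry that sends execution into flash.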

 

I suspect something is overwriting or overflowing but do not know what or where to proceed from here. If I have a working build, it's not clear to me why adding a single line in my application such as "Index++;" would cause the kernel to stop booting. If I then remove that line it would start booting again.  

 

Any thoughts would be greatly appreciated, 

 

Thanks, 

 

Stefan
Altera_Forum
Honored Contributor II

I cut out pretty much everything and got it down to the point where adding a local variable to cyg_user_start() stops the system from booting, and removing it lets it boot again. How can a local variable within cyg_user_start() prevent cyg_start(), and therefore cyg_user_start() itself, from being called? Both versions follow.

 

With this code snippet, the system does not boot and never reaches cyg_user_start(); I never see the LEDs update:

    #include <cyg/kernel/kapi.h>
    #include <stdio.h>
    #include <cyg/hal/io.h>   // IOWR

    void cyg_user_start(void);

    void cyg_user_start(void)
    {
        FILE *US_LogFp = NULL;

        IOWR(0x901040, 0, 0xAA);   // Update LEDs so we know we got here.

        if (!US_LogFp) { IOWR(0x901040, 0, 0x55); }

        printf("\n\nRev Test, %s.\n", __DATE__);

        // Spin here for the test; normally cyg_user_start() returns so the scheduler can start.
        while (true);

        return;
    }
 

 

If I change cyg_user_start() to comment out the local variable, it does boot, and I see both the LED update and the printed message:

    void cyg_user_start(void)
    {
        // FILE *US_LogFp = NULL;

        IOWR(0x901040, 0, 0xAA);   // Update LEDs so we know we got here.

        // if (!US_LogFp) { IOWR(0x901040, 0, 0x55); }

        printf("\n\nRev Test, %s.\n", __DATE__);

        while (true);

        return;
    }
Altera_Forum
Honored Contributor II

I have had a similar experience with eCos not booting. I have two Nios development boards, one slightly older than the other, and a hardware and software design running eCos that runs fine on one board but not at all on the other. I also have two computers running the Altera tools. With the "good" board and the "good" computer, the eCos system always runs fine. With the "good" board and the "bad" computer, it sometimes works and sometimes does not. The "bad" board and the "bad" computer are fine for all my other projects that don't use eCos.

 

What I have noticed is that sometimes it seems to run an old image that was loaded into flash some time back, even when I am running from SDRAM. I have tried running the eCos app from flash as ROM and ROMRAM, and from SRAM (RAM) and SDRAM (RAM), but nothing works.

 

I tried using the debugger, but it gets jammed and mixes up the image and the source code. It appears that execution never reaches the start of the code.

 

I have searched the Altera site and this forum looking for clues, but have found nothing much to help.

 

My best guess is that there may be a problem with loading the image in the right spot and sorting out the reset and exception vector addresses, etc.

 

While this little text does not help you to solve your problem, let it be known that you are not alone! 

 

Peter Mumford UNSW
Altera_Forum
Honored Contributor II

Hi Stefan, 

 

Sounds like it could be a stack overflow. Try increasing your stack size. If you have enough memory, double or quadruple the default size to see if things get better. If this eliminates the problem, you can then try to fine-tune, or live with the larger stack sizes, or you'll need to review/restructure your app ... and try to eliminate things like large automatic arrays, structs, etc. (which is a good idea anyway).
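Instead of only enlarging stacks, one way to test the stack-overflow theory is to "paint" a stack with a known byte pattern before it is used and later see how much of the pattern was overwritten. The sketch below assumes a downward-growing stack, as on Nios II, so the untouched pattern survives at the low end of the buffer. The fill value and the function names are hypothetical, and the measurement can under-report if real stack data happens to equal the fill byte. Depending on how the kernel is configured, eCos may also provide cyg_thread_measure_stack_usage() when stack measurement is enabled; check the documentation for your version.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fill byte; pick a value unlikely to appear in real stack data. */
#define STACK_PAINT_BYTE 0xA5

/* Fill the whole stack area with the pattern before the thread starts. */
static void stack_paint(unsigned char *base, size_t size)
{
    assert(base != NULL);
    memset(base, STACK_PAINT_BYTE, size);
}

/* For a stack that grows downward, the deepest use is the lowest address
 * whose pattern was overwritten. Count untouched bytes from the low end;
 * what remains is the high-water mark in bytes. */
static size_t stack_high_water(const unsigned char *base, size_t size)
{
    size_t untouched = 0;
    while (untouched < size && base[untouched] == STACK_PAINT_BYTE)
        untouched++;
    return size - untouched;
}
```

On the target, the buffer passed as the stack to cyg_thread_create() would be painted before the thread is created, and the high-water mark checked periodically; a value close to the full size means the stack is nearly exhausted.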

 

--Scott
Altera_Forum
Honored Contributor II

My first thought was also stack overflow, which is why I cut my code down to just what is in post #2. There is no other large static data or anything like that. I'm not sure which stack eCos uses while it is booting, though, and it would have to be that stack which is overflowing BEFORE cyg_start() is called, let alone cyg_user_start(). I see sizes for the interrupt and idle thread stacks; which one is used during startup? I doubled the idle stack size with no effect.

 

It appears that I have similar behaviour to some other reports, in that it does appear that when I fail to boot I am running code from the flash. My flash is currently completely erased but if I halt the debugger I get an address within the flash memory region. 
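When the debugger halts somewhere unexpected, it can help to classify the program counter against the system's memory map. The sketch below is a small host-side helper for that. The region names, bases, and spans are placeholders: only the PIO base address 0x901040 comes from the code in post #2, and the flash and SDRAM values are hypothetical, so substitute the values from your own SOPC Builder system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry per memory or peripheral region in the system. */
struct mem_region {
    const char *name;
    uint32_t base;
    uint32_t span;   /* size in bytes */
};

/* Placeholder memory map: only the PIO base (0x901040) comes from the
 * thread; the flash and SDRAM entries are hypothetical. */
static const struct mem_region memory_map[] = {
    { "ext_flash", 0x00000000u, 0x00800000u },
    { "led_pio",   0x00901040u, 0x00000010u },
    { "sdram",     0x01000000u, 0x01000000u },
};

/* Return the region containing addr, or NULL if it is unmapped. The
 * unsigned subtraction wraps when addr < base, so one compare suffices. */
static const struct mem_region *region_of(uint32_t addr)
{
    for (size_t i = 0; i < sizeof memory_map / sizeof memory_map[0]; i++) {
        const struct mem_region *r = &memory_map[i];
        if ((uint32_t)(addr - r->base) < r->span)
            return r;
    }
    return NULL;
}
```

Feeding it the PC reported at a halt shows at a glance whether the processor is executing from (erased) flash instead of from the image in SDRAM.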

 

Looking at the post #2 code, I don't know whether it is actually the addition of those lines that causes the problem, or just some random problem that shows up depending on the size or layout of the built image. That seems especially likely since that code is never even run.

 

I see a new post with what I believe will be a similar issue as well. 

 

Thanks for all your help, 

 

Stefan
Altera_Forum
Honored Contributor II

It's possible I have identified the cause of this problem. I had a driver for the Altera SOPC PIO module. In testing, I removed this driver in the configtool and was able to build and run without locking up. I did enough building and testing to be confident this driver was the cause.

 

Since multiple threads could access the PIO, I had added a mutex to prevent simultaneous accesses. After some testing, it appears that the cyg_mutex_init() call is the offending line within the driver. I noticed that this function calls new. I've changed the source a bunch of times and haven't had a problem since removing the mutex.

 

I wasn't aware there was a restriction on calls within a driver's initialization.
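Two details may be relevant here, hedged because this thread never confirmed the root cause. First, the "call to new" inside cyg_mutex_init() is, in the eCos kernel's C API, a C++ placement new that constructs the mutex inside the cyg_mutex_t storage you pass in rather than allocating from the heap, so that storage has to outlive the call (a global or static variable, not a local in the init function). Second, start-up code runs in a defined order only where it is told to: eCos attaches priorities to its static constructors with CYG_INIT_PRIORITY, while constructors without an explicit priority can run in an order that depends on link order. The sketch below is a host-side model of priority-ordered initialization, not eCos code; the table, the numeric priorities, and the function names are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of prioritized start-up, loosely analogous to the way
 * eCos orders static constructors by init priority. Not eCos code. */
typedef void (*init_fn)(void);

struct init_entry {
    int priority;   /* lower values run earlier (hypothetical numbering) */
    init_fn fn;
};

/* qsort comparator: ascending priority. */
static int by_priority(const void *a, const void *b)
{
    const struct init_entry *x = a;
    const struct init_entry *y = b;
    return (x->priority > y->priority) - (x->priority < y->priority);
}

/* Sort the table by priority, then call every entry exactly once.
 * qsort is not stable, so equal priorities have no defined order. */
static void run_init_table(struct init_entry *table, size_t n)
{
    qsort(table, n, sizeof table[0], by_priority);
    for (size_t i = 0; i < n; i++)
        table[i].fn();
}

/* Stand-ins for a kernel service and a driver that depends on it. */
static int kernel_ready;
static int driver_saw_kernel;

static void kernel_init(void) { kernel_ready = 1; }

/* The driver records whether the kernel was already initialized, which
 * exposes the problem if the priorities are reversed. */
static void pio_driver_init(void) { driver_saw_kernel = kernel_ready; }
```

If the driver only needs the mutex once threads are running, one way to sidestep the ordering question is to initialize a statically allocated cyg_mutex_t from cyg_user_start(), before creating the threads that use it.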

 

Does anybody have any thoughts? 

 

Stefan