DSR not called

Altera_Forum · ‎11-08-2005

Dear Forum readers

I've been scratching my head over this for some time.

I have a custom board that gives me a periodic interrupt. I've written an ISR/DSR handler pair and they work - sometimes.

I've stripped the code down, and the two examples below are the essence of my problem.

The first one works as expected. In the second the DSR is never called, although the rest of the system performs as it should.

As seen, the ONLY change is that abe++ has been replaced with 3 nops, just to make it easier to compare objdumps.

The objdumps of the complete program are identical apart from the sections shown, so all variables, functions etc. reside in identical places.

The compiler has chosen different registers in the two examples, but I really cannot see anything illegal in this assembler code.

Please have a look - this is beyond me.

-- working --

cyg_uint32 avr_isr_function(cyg_vector_t vector, cyg_addrword_t data)
{
    cyg_interrupt_mask( vector );
    cyg_interrupt_acknowledge( vector );
    
    // Acknowledge
    Data = IORD_32DIRECT(AVRIF_BASE,0);
    
    for (i=0;i<8;i++) { 
        Data = IORD_32DIRECT(AVRIF_BASE,i*4+4);
        IOWR_32DIRECT(AVRIF_BASE,i*4+4,buf1[i]);
        asm("nop;nop;nop");
    } // endfor
    abe += 8;
    return CYG_ISR_CALL_DSR;
}
void avr_dsr_function(cyg_vector_t vector, cyg_ucount32 count, cyg_addrword_t data) 
{
    // Unmask it 
    abe++;
    cyg_interrupt_unmask( vector );
}
objdump 
      Data = IORD_32DIRECT(AVRIF_BASE,i*4+4);
  8004b0:    d3601917  ldw    r13,-32668(gp)
  8004b4:    681890ba  slli    r12,r13,2
  8004b8:    62066104  addi    r8,r12,6532
  8004bc:    42800037  ldwio    r10,0(r8)
        IOWR_32DIRECT(AVRIF_BASE,i*4+4,buf1[i]);
  8004c0:    6197883a  add    r11,r12,r6
  8004c4:    5a400017  ldw    r9,0(r11)
  8004c8:    d2a01a15  stw    r10,-32664(gp)
  8004cc:    42400035  stwio    r9,0(r8)
        asm("nop;nop;nop");
  8004d0:    0001883a  nop
  8004d4:    0001883a  nop
  8004d8:    0001883a  nop
  8004dc:    d1601917  ldw    r5,-32668(gp)
  8004e0:    29000044  addi    r4,r5,1
  8004e4:    d1201915  stw    r4,-32668(gp)
  8004e8:    393ff10e  bge    r7,r4,8004b0 <_Z16avr_isr_functionjj+0x38>

-- dysfunctional --

cyg_uint32 avr_isr_function(cyg_vector_t vector, cyg_addrword_t data)
{
    cyg_interrupt_mask( vector );
    cyg_interrupt_acknowledge( vector );
    
    // Acknowledge
    Data = IORD_32DIRECT(AVRIF_BASE,0);
    
    for (i=0;i<8;i++) { 
        Data = IORD_32DIRECT(AVRIF_BASE,i*4+4);
        IOWR_32DIRECT(AVRIF_BASE,i*4+4,buf1[i]);
        abe++;
    } // endfor
    abe += 8;
    return CYG_ISR_CALL_DSR;
}
void avr_dsr_function(cyg_vector_t vector, cyg_ucount32 count, cyg_addrword_t data) 
{
    // Unmask it 
    abe++;
    cyg_interrupt_unmask( vector );
}
objdump: 
      Data = IORD_32DIRECT(AVRIF_BASE,i*4+4);
  8004b0:    d3e01917  ldw    r15,-32668(gp)
  8004b4:    781c90ba  slli    r14,r15,2
  8004b8:    72866104  addi    r10,r14,6532
  8004bc:    53000037  ldwio    r12,0(r10)
        IOWR_32DIRECT(AVRIF_BASE,i*4+4,buf1[i]);
  8004c0:    719b883a  add    r13,r14,r6
  8004c4:    6ac00017  ldw    r11,0(r13)
  8004c8:    d3201a15  stw    r12,-32664(gp)
  8004cc:    52c00035  stwio    r11,0(r10)
        abe++;
  8004d0:    d2600417  ldw    r9,-32752(gp)
  8004d4:    d2201917  ldw    r8,-32668(gp)
  8004d8:    49400044  addi    r5,r9,1
  8004dc:    41000044  addi    r4,r8,1
  8004e0:    d1600415  stw    r5,-32752(gp)
  8004e4:    d1201915  stw    r4,-32668(gp)
  8004e8:    393ff10e  bge    r7,r4,8004b0 <_Z16avr_isr_functionjj+0x38>

Altera_Forum · ‎11-08-2005

I am confused as to the purpose of 'abe'. But one thing that bothers me is that it is being messed with in two locations. Hopefully, that isn't a problem since the IRQ in question is masked.

Try returning (CYG_ISR_HANDLED | CYG_ISR_CALL_DSR) from your ISR.
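For readers weighing this suggestion, here is a minimal host-side sketch of how the two return flags are usually combined and tested. The numeric values and the helper `dsr_requested` are assumptions made so the sketch builds off-target; on a real system the constants come from `<cyg/kernel/kapi.h>`, and the actual check lives inside the kernel.

```c
/* Host-side sketch only. The values below are assumed stand-ins for the
   eCos constants so that this builds without the eCos headers. */
enum {
    CYG_ISR_HANDLED  = 1,   /* assumed: "interrupt was handled" flag */
    CYG_ISR_CALL_DSR = 2    /* assumed: "please schedule the DSR" flag */
};

/* Hypothetical helper: would a dispatcher that tests the CALL_DSR bit
   post the DSR for this ISR return value? */
static int dsr_requested(unsigned isr_ret)
{
    return (isr_ret & CYG_ISR_CALL_DSR) != 0;
}
```

Under these assumptions, `CYG_ISR_CALL_DSR` alone and `CYG_ISR_HANDLED | CYG_ISR_CALL_DSR` both request the DSR, so adding `CYG_ISR_HANDLED` would not by itself change whether the DSR is posted.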

Altera_Forum · ‎11-09-2005

--- Quote Start ---

Originally posted by mike desimone@Nov 8 2005, 06:32 PM

I am confused as to the purpose of 'abe'. But one thing that bothers me is that it is being messed with in two locations. Hopefully, that isn't a problem since the IRQ in question is masked.

Try returning (CYG_ISR_HANDLED | CYG_ISR_CALL_DSR) from your ISR.

--- Quote End ---

Dear Mike

Isolating the problem has stripped the code of any meaningful purpose. As it stands, it just masks the interrupt, which is then unmasked in the DSR. The abe variable is declared volatile, and some other process prints it with printf.

I moved away from the Nios 1 because I saw some very rare but inexplicable behaviour where the compiler just generated nonsense code. When I inserted debug code the problem vanished. As I have this code in production and want to be able to make minor changes without worrying about the whole thing breaking in some unrelated place, and because of the obvious benefits of Nios II, I decided to try an upgrade.

When I see stuff like this and cannot explain it, I fear the same thing happening again. As I have a huge code base, and porting it from Nios 1 to Nios II and eCos is quite an investment, I'd like the port to solve my problems.

In the examples the return codes are identical, but one works and the other doesn't. The differences are in the ISR code alone, which runs with interrupts disabled, so I have a hard time seeing how a context switch or the like could mess something up. Also, the timing must be close to identical in the two code snippets.

The compiler chooses some other registers, but as far as I can see in the objdumps all these registers should be saved and restored during an ISR.

BTW I've tried returning both values, but it makes no difference.

If the DSR mechanism was just plain broken I could manage.

Hope that some gifted person can correct whatever I do wrong.

Altera_Forum · ‎11-09-2005

If the code looks OK, what about the state of the registers?

To learn more, try explicitly disabling interrupts around the DSR "abe++".

Altera_Forum · ‎11-09-2005

--- Quote Start ---

Originally posted by tns1@Nov 9 2005, 10:42 AM

If the code looks OK, what about the state of the registers?

To learn more, try explicitly disabling interrupts around the DSR "abe++".

--- Quote End ---

The problem has been located.

Stepping through the assembler code, I noticed that the index into hal_interrupt_data is calculated and stored in r15.

Then the ISR is called, and upon return r15 is used again for finding the DSR and calling the interrupt_end function.

If the ISR uses r15 without restoring it, this scenario will obviously crash in many flavours.

In the broken code r15 is used by the compiler and not restored, which is perfectly legal according to the register usage table (Table 7-2) in the Nios II Processor Reference Handbook.

Depending on the kind of C code you write in the ISR, r15 may or may not be corrupted.

I've just modified the vector.s code to use r16 instead of r15, pushing it onto the stack etc., and that did the trick.
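The failure mode described here can be modelled on a host machine. The sketch below is not the vector.s code. It uses a global variable as a stand-in for a caller-saved register such as r15, and every name in it (`scratch_reg`, `call_dsr`, `dispatch_broken`, `dispatch_fixed`) is hypothetical.

```c
/* Host-side model only. scratch_reg stands in for a caller-saved register
   such as r15; every name here is hypothetical, not taken from vector.s. */
#define NUM_VECTORS 4u

static unsigned scratch_reg;            /* models r15 */
static unsigned dsr_runs[NUM_VECTORS];  /* DSR invocations per vector */

/* Models looking up and calling the DSR that belongs to 'index'. */
static void call_dsr(unsigned index) { dsr_runs[index % NUM_VECTORS]++; }

/* An ISR whose compiled code never touches the scratch register. */
static void isr_quiet(void) { }

/* An ISR for which the compiler happened to pick the scratch register. */
static void isr_clobber(void) { scratch_reg = 3u; }

/* Broken dispatcher: trusts the scratch register across the ISR call. */
static void dispatch_broken(unsigned vector, void (*isr)(void))
{
    scratch_reg = vector;
    isr();
    call_dsr(scratch_reg);   /* may no longer be 'vector' */
}

/* Fixed dispatcher: keeps the index where the callee must preserve it
   (the local variable models r16 saved on the stack). */
static void dispatch_fixed(unsigned vector, void (*isr)(void))
{
    unsigned saved = vector;
    scratch_reg = vector;
    isr();
    call_dsr(saved);
}
```

In this model the broken dispatcher reaches the right DSR only when the ISR happens to leave the scratch register alone, which matches the "works sometimes" symptom. On real hardware a clobbered index would more likely point at garbage than at another valid slot.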

In my humble opinion this seems to be a major bug in the Nios port of eCos.

Looking forward to having your opinions.

Altera_Forum · ‎11-21-2005

--- Quote Start ---

Originally posted by jskjoet@Nov 9 2005, 11:56 AM

The problem has been located.

Stepping through the assembler code, I noticed that the index into hal_interrupt_data is calculated and stored in r15.

Then the ISR is called, and upon return r15 is used again for finding the DSR and calling the interrupt_end function.

If the ISR uses r15 without restoring it, this scenario will obviously crash in many flavours.

In the broken code r15 is used by the compiler and not restored, which is perfectly legal according to the register usage table (Table 7-2) in the Nios II Processor Reference Handbook.

Depending on the kind of C code you write in the ISR, r15 may or may not be corrupted.

I've just modified the vector.s code to use r16 instead of r15, pushing it onto the stack etc., and that did the trick.

In my humble opinion this seems to be a major bug in the Nios port of eCos.

Looking forward to having your opinions.

--- Quote End ---

jskjoet,

Thank you for isolating this. We are getting the eCos package for Nios II 5.1 together now and this is one item being looked at.

Altera_Forum · ‎12-04-2005

Hi Jesse

Pleased to be able to help.

Anyway, I think I might have found another, although minor, thing you might want to consider for a new release, unless I've completely misunderstood something.

_interrupt_handler enables interrupts after the ISR has been called but before posting the DSR (the comment wrongly states that the DSR is called, but it is just added to the list, right?).

If we have an interrupt storm, this means that each interrupt takes up approx. 76 bytes on the stack of the thread being interrupted. This might pose a problem.

Why not wait and let the eret instruction re-enable interrupts, to prevent eating stack in case of an interrupt storm?

The penalty would be a small latency increase, but it would allow you to decide on a stack size based only on the thread variables and the functions called from the thread.

As implemented, you'll have to consider whether interrupts could arrive 'back to back' and how many times, each eating approx. 76 bytes.
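The sizing concern can be put into numbers with a small sketch. The 76-byte frame is the approximate figure quoted in this post. The function name `stack_budget` and the idea of budgeting for a fixed worst-case nesting count are assumptions for illustration only.

```c
#include <stddef.h>

/* Approximate per-interrupt frame size quoted in the post above. */
#define IRQ_FRAME_BYTES 76u

/* Hypothetical helper: worst-case stack for one thread if up to 'nested'
   interrupts can each leave a frame on that thread's stack. */
static size_t stack_budget(size_t thread_bytes, unsigned nested)
{
    return thread_bytes + (size_t)nested * IRQ_FRAME_BYTES;
}
```

For example, a thread that needs 1024 bytes on its own would need 1024 + 8 * 76 = 1632 bytes if eight interrupts could stack up, and that margin would have to be added to every thread's stack.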

Altera_Forum · ‎12-06-2005

--- Quote Start ---

Originally posted by jskjoet@Dec 3 2005, 09:42 PM

_interrupt_handler enables interrupts after the ISR has been called but before posting the DSR (the comment wrongly states that the DSR is called, but it is just added to the list, right?).

If we have an interrupt storm, this means that each interrupt takes up approx. 76 bytes on the stack of the thread being interrupted. This might pose a problem.

Why not wait and let the eret instruction re-enable interrupts, to prevent eating stack in case of an interrupt storm?

--- Quote End ---

Correct me if I'm wrong, but isn't this why the ISR masks its interrupt and the DSR unmasks it? If that same interrupt comes in between ISR and DSR, it has to wait for unmasking, at which point the DSR should be mostly done.
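The mask-in-ISR, unmask-in-DSR pattern described here can be sketched as a tiny host-side model. It assumes the interrupt controller latches a request that arrives while the vector is masked and delivers it on unmask. All names (`irq_raise`, `irq_unmask`, `deliver`) are hypothetical and do not come from eCos or the Nios II HAL.

```c
#include <stdint.h>

static uint32_t masked;      /* bit set = vector is masked */
static uint32_t pending;     /* bit set = request latched, not yet delivered */
static unsigned isr_calls;   /* how many times the modelled ISR ran */

/* Runs the modelled ISR if the vector is pending and not masked.
   The ISR masks its own vector, as in the code earlier in the thread. */
static void deliver(unsigned v)
{
    uint32_t bit = (uint32_t)1u << v;
    if ((pending & bit) && !(masked & bit)) {
        pending &= ~bit;
        isr_calls++;
        masked |= bit;       /* what cyg_interrupt_mask() does in the ISR */
    }
}

/* The device raises a request on vector v. */
static void irq_raise(unsigned v)
{
    pending |= (uint32_t)1u << v;
    deliver(v);
}

/* What the DSR does at its end: unmask, which lets a latched request in. */
static void irq_unmask(unsigned v)
{
    masked &= ~((uint32_t)1u << v);
    deliver(v);
}
```

In this model a second request on the same vector waits until the DSR unmasks it, so the same source never nests on itself. Requests from other, unmasked vectors are not covered by this protection.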

Altera_Forum · ‎12-06-2005

Hi Mike

If you use the ISR/DSR setup you can do it that way, and there's no problem. But if you have a fast low-level interrupt you cannot afford to wait for the DSR to re-enable the IRQ.

I do not claim that the approach is an error, just that each IRQ eats 80 bytes of stack and that you'll have to allow for this memory usage in the stack of every thread.

If you have interrupts coming back to back, which is quite normal, the stack pointer will not be restored between them and eventually you'll crash.

Altera_Forum · ‎12-13-2005

--- Quote Start ---

Originally posted by jskjoet@Dec 6 2005, 03:33 PM

Hi Mike

If you use the ISR/DSR setup you can do it that way, and there's no problem. But if you have a fast low-level interrupt you cannot afford to wait for the DSR to re-enable the IRQ.

I do not claim that the approach is an error, just that each IRQ eats 80 bytes of stack and that you'll have to allow for this memory usage in the stack of every thread.

If you have interrupts coming back to back, which is quite normal, the stack pointer will not be restored between them and eventually you'll crash.

--- Quote End ---

jskjoet,

I see your point, but it seems that eCos has this worked out by means of the (configurable) separate interrupt stack. I found this a good read, in case you haven't already seen it: http://ecos.sourceware.org/docs-1.3.1/ref/ecos-ref.c.html

It also seems to me that if this is a genuine problem it may be time to take another look at the system: Is the ISR (not DSR) code taking too long to execute? Can anything be done to slow the rate of interrupts? At some point the processor will be choked no matter what; I think that eCos is trying to be clever here with the separate ISR/DSR architecture, specifically to allow you to make your ISR code as brief as possible. That's just my interpretation though. One big caveat: I'm just getting into eCos now (here at Altera) and haven't yet done what I'd consider a complex design with it.

Altera_Forum · ‎12-14-2005

Hi Jesse

Thank you for the reply.

This is not a real problem in my PDA app, just something I noticed reading the code while finding out why my DSR routines did not work. I thought it might be of interest to the eCos porting team.

As I read the code, and please correct me if I'm wrong, the stack is eaten from whichever thread was active, NOT from the dedicated interrupt stack.

In fact I have a hard time figuring out why the dedicated interrupt stack is worthwhile, as the ISR is called with interrupts disabled, so unless you specifically enable interrupts again in your ISR, maximum ISR stack usage is quite predictable.

My point is that it would make more sense to wait until the stack has been reclaimed before enabling IRQs.
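The difference between the two orderings can be sketched as a worst-case model, assuming every pending interrupt arrives inside the window where interrupts are already re-enabled but the previous frame is still on the stack. The function `peak_stack` and the `FRAME_BYTES` value (taken from the roughly 76-byte figure mentioned earlier in the thread) are illustrative only.

```c
/* Approximate per-interrupt frame size mentioned earlier in the thread. */
#define FRAME_BYTES 76u

/* Hypothetical model: peak stack bytes used while servicing 'pending'
   back-to-back interrupts.
   early_enable != 0: interrupts are re-enabled while the frame is still
                      on the stack, so each pending interrupt nests.
   early_enable == 0: interrupts are re-enabled only on return (as eret
                      would do), after the frame has been popped. */
static unsigned peak_stack(unsigned pending, int early_enable)
{
    unsigned depth = 0u, peak = 0u, i;

    for (i = 0u; i < pending; i++) {
        depth += FRAME_BYTES;          /* push this interrupt's frame */
        if (depth > peak)
            peak = depth;
        if (!early_enable)
            depth -= FRAME_BYTES;      /* frame popped before the next one */
    }
    return peak;
}
```

With five back-to-back interrupts this model gives 380 bytes for the early-enable ordering and 76 bytes when interrupts are re-enabled only after the frame is reclaimed. Real behaviour depends on arrival timing and on which vectors are masked.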