
lwip MicroC/OS-II throughput

Altera_Forum
Honored Contributor II
1,750 Views

Does anyone know if the throughput rates on page 1-31 of the "Using Lightweight IP with the Nios II Processor Tutorial" document were for 100 Mbps Ethernet? 

 

The transmit rate of 5.16 Mbps seems really low unless it is a 10Mbps link. It is not clear what application software they are running to test UDP rates. I am assuming it is not their simple socket server application which uses TCP. 

 

 

My goal is to use TCP to download VGA images captured and processed by the Nios II.
0 Kudos
21 Replies
Altera_Forum
Honored Contributor II
840 Views

Yes, those measurements were on a 100 Mbps link.  

 

I recently measured LWIP throughput on the standard reference design with the clock frequency raised to 70MHz and the figures are: 

UDP TX 820 kbytes/s 

UDP Rx 610 kbytes/s 

 

The software used was a very simple sockets application with the board sending frames as quickly as possible to a PC (or vice versa) and averaging how long it takes to send every 200 kbytes. The problems here are that the LAN91C111 does not have a particularly fast interface (the processor has to copy every word), and that the memory used is SDRAM. 

 

It's hard to say if LWIP is suitable for your application without knowing the rates at which you wish to capture the data. The raw LWIP implementation is mainly suitable for use as a control channel. If you're looking to get rates which approach line rate you need to consider some hardware acceleration.
0 Kudos
Altera_Forum
Honored Contributor II
840 Views

Thanks, 

 

I was hoping that the documented throughput rates were for 10Mbps. This may be a problem.
0 Kudos
Altera_Forum
Honored Contributor II
840 Views

Hi, 

I've finished my lwIP MicroC/OS-II UDP test today and reached these results: 

UDP TX 35.55 Mbit/s (UDP payload rate, packet length 1472 bytes) 

UDP RX 17.66 Mbit/s (UDP payload rate, packet length 1472 bytes) 

I used a Stratix 1S10 development board with a Nios II (f) at a 100 MHz clock. If you are interested, I can tell you more. 

bye
0 Kudos
Altera_Forum
Honored Contributor II
840 Views

Soin, 

 

I for one would be very interested to know more.
0 Kudos
Altera_Forum
Honored Contributor II
840 Views

hi rugbybloke 

I've used the Stratix 1S10 development board with the "full featured" example, with some modifications: 

- Nios II (f) at 100 MHz, instruction cache: 16 Kbytes, data cache: 4 Kbytes 
  (the SDRAM also has a 100 MHz clock, with delay...) 
- System_clk_timer -> Initial period: 10 ms (this is the tick used by MicroC/OS-II) 
- debug module level 1 
- no on-chip RAM 

Moreover, in the IDE, you have to set "optimization level: -O3" in your program and in the associated library. 

My test program uses UNIX-style sockets to only transmit or only receive (TX test or RX test) UDP packets of variable length, continuously. In the TX test I send an incrementing counter to software running on a PC connected to the board through a LAN crossover cable; in the RX test I check this counter. 
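The counter check described above could look roughly like the following. This is only a sketch, not the poster's program: the names `seq_stats`, `put_counter`, and `check_counter` are hypothetical, and it assumes the counter is carried big-endian in the first four payload bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical receiver-side bookkeeping for a counter-stamped UDP stream. */
typedef struct {
    uint32_t expected;  /* counter value the next packet should carry */
    uint32_t received;  /* packets seen so far */
    uint32_t lost;      /* packets inferred missing from forward gaps */
} seq_stats;

/* Sender side: store the counter big-endian in the first four payload bytes. */
static void put_counter(uint8_t *payload, uint32_t counter)
{
    payload[0] = (uint8_t)(counter >> 24);
    payload[1] = (uint8_t)(counter >> 16);
    payload[2] = (uint8_t)(counter >> 8);
    payload[3] = (uint8_t)counter;
}

static uint32_t get_counter(const uint8_t *payload)
{
    return ((uint32_t)payload[0] << 24) | ((uint32_t)payload[1] << 16) |
           ((uint32_t)payload[2] << 8)  |  (uint32_t)payload[3];
}

/* Receiver side: returns 1 if the packet is in sequence, 0 otherwise. */
static int check_counter(seq_stats *s, const uint8_t *payload, size_t len)
{
    uint32_t got;

    if (len < 4)
        return 0;                         /* too short to hold a counter */
    got = get_counter(payload);
    s->received++;
    if (got == s->expected) {
        s->expected++;
        return 1;
    }
    if (got > s->expected)
        s->lost += got - s->expected;     /* forward gap: count the hole */
    s->expected = got + 1;                /* resynchronize on what arrived */
    return 0;
}
```

On the board, `put_counter` would run before each `sendto` in the TX test and `check_counter` after each `recvfrom` in the RX test; the `lost` field then gives a rough drop count for the run.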

bye
0 Kudos
Altera_Forum
Honored Contributor II
840 Views

I know that this is a little bit late, but I have done some detailed speed measurement and tuning that may be relevant. The environment is a 50MHz NiosII on a board with a Cyclone 1c20 and a 91c111 - very similar to the 1c20 demo board. Static RAM is used for program and data during execution. A flash is used for boot only. Builds have been done with the 1.1 tools and the Beta 5.0 tools. 

 

The basic system operation is bulk data collection and transfer to a host PC. Data typically arrives in 4K chunks via DMA. The data is quickly reduced to a 2K chunk using a copy operation. The DMA is transferring some useless bits. Headers are then prepended and the data is sent to the PC via TCP. The data streams out with no application level acknowledgement. 

 

Some of the code optimizations have been in for a long time, and I don't have a good baseline measurement without them. I started this work with toolkit 1.0. The optimizations were as follows: memcpy - increase the level of loop unrolling, inet/chksum - unroll the inner loop, 91c111 driver unroll the inner loop of the transmission algorithm. I am considering also unrolling the 91c111 driver's receive inner loop and getting rid of the rx thread entirely. These latter two steps haven't been taken, yet. 
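As an illustration of the kind of inner-loop unrolling mentioned above, here is a sketch of an Internet checksum (RFC 1071) that sums four 16-bit words per iteration. It is not the lwIP `inet/chksum` code and not the poster's modification; the function name is hypothetical, and it assumes buffers well under 128 KB so the 32-bit accumulator cannot overflow.

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement Internet checksum over a byte buffer, treating the data
 * as big-endian 16-bit words. The main loop is unrolled four words deep. */
static uint16_t inet_checksum_unrolled(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    while (len >= 8) {                    /* unrolled: 4 words per pass */
        sum += ((uint32_t)data[0] << 8) | data[1];
        sum += ((uint32_t)data[2] << 8) | data[3];
        sum += ((uint32_t)data[4] << 8) | data[5];
        sum += ((uint32_t)data[6] << 8) | data[7];
        data += 8;
        len -= 8;
    }
    while (len >= 2) {                    /* leftover whole words */
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)data[0] << 8;    /* odd trailing byte, zero-padded */

    while (sum >> 16)                     /* fold carries back into 16 bits */
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}
```

Whether a deeper unroll pays off depends on the instruction cache and the compiler's own unrolling at -O2/-O3, so it is worth measuring before and after.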

 

I have found that -O3 produces very much the same results as -O2, though it does generate significantly larger code. Space is an issue in this system, so I just use -O2. 

 

Using the 1.1 toolkit, I was able to get about 1.5ms per data chunk, which corresponds to about 11Mbps. With the 5.0 toolkit, I slowed down to about 1.9ms per data chunk, or about 8.5Mbps. Though I bemoan the slowdown with the 5.0 toolkit, I have found that the 1.1.0 lwIP in 5.0 is more robust under packet loss than the 0.7.2 lwIP in the 1.1 toolkit. 
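The conversion from time per chunk to throughput can be checked with a few lines of integer arithmetic. This sketch assumes the "2K chunk" is 2048 payload bytes and ignores the prepended headers; the helper name is made up for illustration.

```c
#include <stdint.h>

/* Payload throughput in kbit/s for `bytes` moved in `usec` microseconds.
 * Bits per microsecond equals Mbit/s, hence the factor of 1000 for kbit/s. */
static uint32_t throughput_kbps(uint32_t bytes, uint32_t usec)
{
    if (usec == 0)
        return 0;                         /* avoid dividing by zero */
    return (uint32_t)(((uint64_t)bytes * 8u * 1000u) / usec);
}
```

With these assumptions, `throughput_kbps(2048, 1500)` gives 10922 kbit/s and `throughput_kbps(2048, 1900)` gives 8623 kbit/s, in line with the "about 11Mbps" and "about 8.5Mbps" figures quoted above.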

 

The timing measurements were taken with a logic analyzer. I modified os_cpu_c.c to put in some outputs to a port that I could observe. I used one port bit per task, so I was able to get a nice waveform showing active task times. I also added bits to track the time spent in memcpy, the checksum, tcp_write, and tcp_output, though these weren't strictly necessary. 
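A sketch of the one-bit-per-task instrumentation described above follows. It is an assumption-laden stand-in, not the poster's change to os_cpu_c.c: `debug_port` is an ordinary variable standing in for a memory-mapped output port, the function names are hypothetical, and it assumes task priorities below 32 map directly to port bits. In MicroC/OS-II the natural call site would be the context-switch hook, passing the outgoing and incoming task priorities.

```c
#include <stdint.h>

/* Stand-in for a memory-mapped output port watched by the logic analyzer. */
static volatile uint32_t debug_port;

/* Call on every context switch: clear the outgoing task's bit and set the
 * incoming task's bit, giving one trace per task on the analyzer. */
static void trace_task_switch(unsigned prio_out, unsigned prio_in)
{
    uint32_t v = debug_port;

    if (prio_out < 32)
        v &= ~(UINT32_C(1) << prio_out);
    if (prio_in < 32)
        v |= (UINT32_C(1) << prio_in);
    debug_port = v;
}

/* Bracket a function of interest (memcpy, checksum, ...) with a marker bit:
 * call with active=1 on entry and active=0 on exit. */
static void trace_mark(unsigned bit, int active)
{
    if (bit >= 32)
        return;
    if (active)
        debug_port |= (UINT32_C(1) << bit);
    else
        debug_port &= ~(UINT32_C(1) << bit);
}
```

Keeping the hook to a read-modify-write of one register keeps the measurement overhead small compared with the intervals being measured.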

 

If you really need speed, plan on doing some tuning. Make provisions for measuring time to guide your tuning efforts. If you can get an ethernet controller that operates as a DMA bus master, you should. It's silly to be transcribing data to a fast ethernet chip the way we do with the 91c111. 

 

If you do plan to use UDP rather than TCP, consider just doing it yourself without involving the stack. It's not a big deal. If the data link is one hop over an ethernet, you could also consider dispensing with the UDP checksum. Since you are protected by the ethernet CRC, the UDP checksum adds little. 
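To show how little is involved in building the UDP part by hand, here is a sketch that writes the 8-byte UDP header (RFC 768) with the checksum field left at zero, which over IPv4 tells the receiver that no checksum was computed. The function name is hypothetical, and the IP and Ethernet headers that would precede it are not shown.

```c
#include <stddef.h>
#include <stdint.h>

#define UDP_HEADER_LEN 8u

/* Write a UDP header into `hdr` (at least 8 bytes), all fields big-endian.
 * Returns the header length, or 0 if the datagram would exceed 65535 bytes. */
static size_t udp_write_header(uint8_t *hdr, uint16_t src_port,
                               uint16_t dst_port, size_t payload_len)
{
    size_t total;

    if (payload_len > 0xFFFFu - UDP_HEADER_LEN)
        return 0;                         /* length field is only 16 bits */
    total = UDP_HEADER_LEN + payload_len;

    hdr[0] = (uint8_t)(src_port >> 8);
    hdr[1] = (uint8_t)src_port;
    hdr[2] = (uint8_t)(dst_port >> 8);
    hdr[3] = (uint8_t)dst_port;
    hdr[4] = (uint8_t)(total >> 8);       /* length covers header + payload */
    hdr[5] = (uint8_t)total;
    hdr[6] = 0;                           /* checksum 0: not computed (IPv4) */
    hdr[7] = 0;
    return UDP_HEADER_LEN;
}
```

Note that the zero-checksum shortcut applies to IPv4 only; UDP over IPv6 requires a checksum.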

 

Avoid transcriptions to the extent possible. Wherever you have a loop processing your bulk data, make sure that it is unrolled. 

 

If you leave time for this in your project, you'll probably enjoy doing it. Speed tuning is kind of fun if you're not under the gun when you're doing it. 

 

Good luck!
0 Kudos
Altera_Forum
Honored Contributor II
840 Views

This is an update on the reply that I posted a couple of days ago. I've found out why I had such different time measurements between the 1.1 toolkit and the 5.0 toolkit. There was one other modification that I had done to the 1.1 environment that I hadn't bothered to move forward. I had no idea that it had speed implications, but in fact it did. 

 

The 1.1 toolkit's lwIP version had problems protecting pbufs in a multitasking environment. The only way to get effective protection was to specify SYS_LIGHTWEIGHT_PROT. That forced the lwIP library to use interrupt disables rather than a semaphore to protect pbufs and its various memory pools. (A couple of other declarations were also required to make this work.) 

 

The 5.0 toolkit's lwIP version resolved the problems in the protection of pbufs, and the default distribution used semaphores rather than the "lightweight" mechanism. Because of this, I didn't bother to pull forward the changes I had made in lwipopts.h to specify and facilitate SYS_LIGHTWEIGHT_PROT. Yesterday, for reasons unrelated to speed tuning, I had to pull this change into the 5.0 toolkit. Much to my surprise, I got back most of the time that had been lost. It amounted to almost 400 us in the handling of two outgoing packets and one incoming ACK. I'm now back in the vicinity of 1.5 ms for these operations. This is just under 11 Mbps for the payloads in question. 

 

It seems unlikely that semaphores would be so costly, but lwIP does load them down with a mechanism of its own for executing specified functions at specified times. Even with this, though, there must be a lot of semaphore calls in the processing of normal packets. 

 

If you are tuning for performance, I strongly recommend changing to the lightweight protection mechanism. This may well be more important than any of the loop unrolling that I did.
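The idea behind the lightweight mechanism can be sketched on a host machine as follows. Everything here is a simulation under stated assumptions: `irq_enabled` is a plain variable standing in for the CPU's interrupt-enable state, and `irq_disable_all`, `irq_restore`, and the `PROT_*` macros are hypothetical stand-ins for the port's real primitives. The point it illustrates is that saving and restoring the previous state lets protected sections nest without an inner section re-enabling interrupts early.

```c
#include <stdint.h>

typedef uint32_t irq_context;            /* stand-in for the port's type */
static irq_context irq_enabled = 1;      /* simulated interrupt-enable flag */

/* Disable "interrupts" and hand back the previous state. */
static irq_context irq_disable_all(void)
{
    irq_context prev = irq_enabled;
    irq_enabled = 0;
    return prev;
}

/* Restore whatever state was in force before the matching disable. */
static void irq_restore(irq_context prev)
{
    irq_enabled = prev;
}

#define PROT_DECL(lev)   irq_context lev
#define PROT_ENTER(lev)  lev = irq_disable_all()
#define PROT_EXIT(lev)   irq_restore(lev)

static int pool_free_count = 4;          /* toy stand-in for a buffer pool */

/* Take one element from the pool inside a short protected section. */
static int pool_alloc(void)
{
    PROT_DECL(lev);
    int ok;

    PROT_ENTER(lev);
    ok = pool_free_count > 0;
    if (ok)
        pool_free_count--;
    PROT_EXIT(lev);
    return ok;
}

/* Nested use: the inner sections restore "disabled", not "enabled". */
static int pool_alloc_pair(void)
{
    PROT_DECL(outer);
    int n;

    PROT_ENTER(outer);
    n = pool_alloc() + pool_alloc();
    PROT_EXIT(outer);
    return n;
}
```

The trade-off is interrupt latency: the protected sections have to stay short, which is the case for pool and pbuf bookkeeping.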
0 Kudos
Altera_Forum
Honored Contributor II
840 Views

When using lwIP + uC/OS, I see there are quite a few tuning options available through the GUI and in the file opt.h. Which changes to these default options are you finding help performance? Is there better documentation (other than the code) to explain what some of these do? 

 

I see that some are using a 10ms timer interval for uCos, despite using a fast clock. I believe the HW timer defaults to 1ms, so was there any performance testing done as a reason for the longer interval? 

 

thanks
0 Kudos
Altera_Forum
Honored Contributor II
840 Views

JimG, 

You mention "other declarations" needed to support the lightweight mechanism. What are these? 

 

thanks
0 Kudos
Altera_Forum
Honored Contributor II
840 Views

I made the following changes which compiled and tested ok, but haven't done any comparative performance measurements yet. 

 

1.) Added the following code to: 

\altera\kits\nios2\components\altera_lwip\UCOSII\inc\lwipopts.h 

 

/*
 * Enable LightWeight Protection.
 * Refer to \altera\kits\nios2\components\altera_lwip\UCOSII\src\downloads\lwip-1.1.0\src\include\lwip\sys.h
 * for changes to the SYS_ARCH_DECL_PROTECT(), SYS_ARCH_PROTECT() and
 * SYS_ARCH_UNPROTECT() macros.
 */
#define SYS_LIGHTWEIGHT_PROT 1

 

2.) Made the following changes in: 

\altera\kits\nios2\components\altera_lwip\UCOSII\src\downloads\lwip-1.1.0\src\include\lwip\sys.h 

 

/** SYS_ARCH_DECL_PROTECT
 * declare a protection variable. This macro will default to defining a variable of
 * type sys_prot_t. If a particular port needs a different implementation, then
 * this macro may be defined in sys_arch.h.
 */
//#define SYS_ARCH_DECL_PROTECT(lev) sys_prot_t lev
#define SYS_ARCH_DECL_PROTECT(lev) alt_irq_context lev

/** SYS_ARCH_PROTECT
 * Perform a "fast" protect. This could be implemented by
 * disabling interrupts for an embedded system or by using a semaphore or
 * mutex. The implementation should allow calling SYS_ARCH_PROTECT when
 * already protected. The old protection level is returned in the variable
 * "lev". This macro will default to calling the sys_arch_protect() function
 * which should be implemented in sys_arch.c. If a particular port needs a
 * different implementation, then this macro may be defined in sys_arch.h
 */
//#define SYS_ARCH_PROTECT(lev) lev = sys_arch_protect()
#define SYS_ARCH_PROTECT(lev) lev = alt_irq_disable_all()

/** SYS_ARCH_UNPROTECT
 * Perform a "fast" set of the protection level to "lev". This could be
 * implemented by setting the interrupt level to "lev" within the MACRO or by
 * using a semaphore or mutex. This macro will default to calling the
 * sys_arch_unprotect() function which should be implemented in
 * sys_arch.c. If a particular port needs a different implementation, then
 * this macro may be defined in sys_arch.h
 */
//#define SYS_ARCH_UNPROTECT(lev) sys_arch_unprotect(lev)
#define SYS_ARCH_UNPROTECT(lev) alt_irq_enable_all(lev)

//sys_prot_t sys_arch_protect(void);
//void sys_arch_unprotect(sys_prot_t pval);

 

3.) Added the following include to: 

\altera\kits\nios2\components\altera_lwip\UCOSII\src\downloads\lwip-1.1.0\src\core\pbuf.c 

\altera\kits\nios2\components\altera_lwip\UCOSII\src\downloads\lwip-1.1.0\src\core\memp.c 

#include "sys/alt_irq.h" 

 

Not sure about any of the other tuning options, haven't got that far yet. 

 

ciao
0 Kudos
Altera_Forum
Honored Contributor II
840 Views

Can the CS8900 chip work with Nios II? 

 

I see the CS8900 in Quartus II SOPC Builder, but the Ethernet chip that everyone here discusses is the LAN91C111.  

That chip is too expensive for me! I made a Nios development kit using the CS8900, but I don't know whether that Ethernet chip will work!
0 Kudos
Altera_Forum
Honored Contributor II
840 Views

Hi Soin 

 

You wrote about a UDP server implementation with MicroC/OS-II. Could you send me an example of how you send UDP packets? I'm trying to do without it, but so far it doesn't work. When I send something, it always goes from source port 0 to destination port 0. 

 

Cheers, 

 

Danny
0 Kudos
Altera_Forum
Honored Contributor II
840 Views

Solved that problem, but now all the PC is receiving is some ascii-code...

0 Kudos
Altera_Forum
Honored Contributor II
840 Views

OK solved that problem too... things are going great... now I've got 37 MBit... compiling another NIOS II now according to the altera ethernet acceleration white paper... let's see how that works out

0 Kudos
Altera_Forum
Honored Contributor II
840 Views

I performed a test using Bob's changes. The test sends 200-byte packets to a PC, which sends them back to the board, and measures the round-trip time. The thread performs the send and then blocks on the receive on a different socket. It runs on a Stratix II at 50 MHz, and the times are averaged over 1000 sends/receives.  

The results are disappointing: 5154 microseconds to complete the loopback, i.e. 620 kbps integrated over transmit and receive. 

Here is a portion of code: 

while (1)
{
    start = alt_timestamp();
    if (sendto(fd_write, buffer, MSG_LEN, 0, (struct sockaddr*)&rem_addr,
               sizeof(struct sockaddr_in)) < 0) {
        fprintf(stderr, "%s: sendto failed: ", __FUNCTION__);
    }
    sent++;
    if ((num_car_letti = recvfrom(fd_listen, str_in, NUM_CAR, 0, &src_addr,
                                  &namelen)) == -1)
    {
        fprintf(stderr, "%s: recvfrom failed: \n", __FUNCTION__);
        exit(1);
    }
    stop = alt_timestamp();
    delay = (stop - start);
    sum += delay;
    if (sent == 1000)
    {
        fprintf(stderr, "delay %lu usec", (sum/1000)/(tps/1000000));
        fprintf(stderr, " %lu\n ", sum);
        sent = stop = delay = sum = 0;
        alt_timestamp_start();
    }
} //while
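One detail in the loop above is the conversion `(sum/1000)/(tps/1000000)`: the integer division `tps/1000000` is exact for a 50 MHz timestamp clock but truncates for a frequency that is not a whole number of MHz. A sketch of a conversion that multiplies before dividing is shown below. The helper names are hypothetical, `tps` is assumed to be the timestamp frequency in ticks per second, and the multiplication is assumed not to overflow 64 bits (true for tick totals below roughly 1.8e13).

```c
#include <stdint.h>

/* Convert a tick count to microseconds, multiplying before dividing so a
 * non-integer-MHz tick rate does not introduce a truncation error. */
static uint64_t ticks_to_usec(uint64_t ticks, uint32_t ticks_per_sec)
{
    if (ticks_per_sec == 0)
        return 0;                         /* no timestamp device available */
    return (ticks * UINT64_C(1000000)) / ticks_per_sec;
}

/* Mean duration in microseconds of `n` measurements totalling `total_ticks`. */
static uint64_t mean_usec(uint64_t total_ticks, uint32_t n, uint32_t tps)
{
    if (n == 0)
        return 0;
    return ticks_to_usec(total_ticks, tps) / n;
}
```

For example, at a tick rate of 33333333 Hz one second of ticks converts to exactly 1000000 us here, whereas dividing by the truncated value 33 would report about 1% too much.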

 


0 Kudos
Altera_Forum
Honored Contributor II
840 Views

I discovered that I had compiled the library for my test with -O0. After changing all optimization settings to -O2, I get 1.715 ms to send and receive 200 bytes. This is still a factor of ~2 above the values in Table 1-3 of the lwIP tutorial. Does anybody have an explanation for this factor?

0 Kudos
Altera_Forum
Honored Contributor II
840 Views

 

Hi soin, 

 

I am interested in your lwIP project. From some PDFs I have read, I understand that the LAN91C111 is slow, no more than 10 Mbps, and I want to speed it up. I saw your results and would like to get your code. By the way, how did you measure the speed of the LAN91C111: with a Nios II timer or with other software? If it was software, please tell me its name. 

Thanks for your help. If you have time, please send the project to wwycoolboy@eyou.com.
0 Kudos
Altera_Forum
Honored Contributor II
840 Views

Hi all, 

 

My testing environment is a Cyclone 1C20 chip, a 50 MHz Nios II CPU, and a LAN91C111 MAC. I am using lwIP 1.1 with Quartus II 6.0 and Nios II IDE 6.0 software. 

 

I am transferring bulk image data from the embedded system to a PC via UDP, and the fastest speed I have ever got is about 9 Mbit/s (1.1 MByte/s). 

 

The problem is that when I use any compiler optimization option when building my Nios II project, it stops functioning. It must be compiled at the -O0 level.  

 

Does anybody have any idea about this problem? Thanks!!
0 Kudos
Altera_Forum
Honored Contributor II
840 Views

Hi heavenscape, 

 

You should be using the lwIP stand-alone version if possible. That way you can reach the highest bandwidth :)  

 

Cheers, 

 

Danny
0 Kudos
Altera_Forum
Honored Contributor II
769 Views

Hi DannyJacobs, 

 

Thank you for your attention and suggestion! I am not quite sure how to use the lwIP stand-alone version. Do I just need to copy the relevant source files into my Nios II project and rebuild it?  

 

When I call an lwIP stack function, how can I tell whether the stand-alone version or the version provided by the Nios II kit is being called? Or is there a tutorial or example that shows how to use the stand-alone version of the stack? 

 

Thank you!! 

 

Regards, 

Heavenscape 

 

 

 


0 Kudos