
Very low speed of HPS after Preloader

Altera_Forum
Honored Contributor II
1,789 Views

I inserted the following code into main() of the Altera examples:

volatile int i; for (i = 0; i < 800*1000*1000; i++) ; 

In the SDRAM variant the loop takes more than 2 minutes. In the HelloWorld example that runs from the FFFF0000 addresses the program hangs, and "Hello Tim\n" never appears.

The count 800*1000*1000 is close to the HPS clock frequency, so this loop should take only a few seconds: in the Disassembly window the compiled loop is roughly 10 assembly instructions.

The BootROM initializes the L1 caches (32 KB on-chip instruction/data) but not the L2 cache. Could this reduce HPS performance?
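The expectation of "a few seconds" can be checked with simple arithmetic. The sketch below is only an estimate: it assumes one instruction retires per clock cycle and an 800 MHz CPU clock, neither of which is confirmed in this thread, and the function name estimate_loop_seconds is made up for illustration.

```c
#include <stdint.h>

/* Rough run-time estimate for a counting loop.
 * Assumption (not measured): one instruction retires per CPU clock cycle.
 * Real loops with a volatile counter do loads and stores, so this is
 * only an order-of-magnitude figure. */
double estimate_loop_seconds(uint64_t iterations,
                             unsigned instr_per_iteration,
                             double cpu_hz)
{
    return (double)iterations * (double)instr_per_iteration / cpu_hz;
}
```

With 800*1000*1000 iterations, 10 instructions per iteration and an assumed 800 MHz clock, estimate_loop_seconds(800000000ULL, 10, 800e6) gives about 10 seconds, so a run time of more than 2 minutes is more than ten times slower than this estimate.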
Altera_Forum
Honored Contributor II
612 Views

I finally ran my test under Linux on the board, adding this function to the HelloWorld example:

void my_chk() { volatile int i = 0; for (; i < 800*1000*1000; i++) ; } 

With -O0 the call takes 8 seconds, with 8 assembly instructions in the main loop; with -O3 it takes 7 seconds, with 6 instructions in the loop body.

The same code on bare metal runs for 245 seconds; with -O3 it is slightly faster, 230 seconds.

Where in HWLIB is the API to speed up the HPS? Which code should I use?
Altera_Forum
Honored Contributor II
612 Views

cv_5v4.pdf ("Cyclone V Device Handbook Volume 3: Hard Processor System Technical Reference") has a section cv_54002, "Clock Manager", which says:

 

--- Quote Start ---  

The Clock Manager offers the following features: 

• Generates and manages clocks in the HPS 

• Contains the following PLL clock groups: 

- Main: contains clocks for the Cortex-A9 microprocessor unit (MPU) subsystem, level 3 (L3) interconnect, level 4 (L4) peripheral bus, and debug

- Peripheral—contains clocks for PLL-driven peripherals 

- SDRAM—contains clocks for the SDRAM subsystem 

... 

--- Quote End ---  

 

The example Altera-SoCFPGA-HardwareLib-ClockManager-CV-GNU prints 25 MHz to the console for the HPS and other clocks. I tried the following code:

my_chk();
{
    uint32_t mu, di;
    status = alt_clk_pll_vco_cfg_get(ALT_CLK_MAIN_PLL, &mu, &di);
    if (status == ALT_E_SUCCESS) {
        status = alt_clk_pll_vco_cfg_set(ALT_CLK_MAIN_PLL, mu*2, di);
        if (status == ALT_E_SUCCESS) {
            my_chk();
        }
    }
}

to increase the HPS speed, but the call to alt_clk_pll_vco_cfg_set() returns an error!

What is more, calling alt_clk_pll_vco_cfg_set(ALT_CLK_MAIN_PLL, mu, di) (without the "*2") also returns an error! The failure comes from alt_clock_manager.c, function alt_clk_pll_vco_chg_methods_get(pll, mult, div), at these lines:

temp = mult * (inputfreq / div);
if ((temp <= freqmax) && (temp >= freqmin)) // are the final values within frequency limits?

So the same values that I read cannot be written back to the Clock Manager!
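To see how such a range check behaves, here is a stand-alone sketch of the quoted test. It is not the hwlib source: the function name vco_in_range, the 64-bit intermediate and the limits used in the usage note are assumptions for illustration only. The 25 MHz input is the value the ClockManager example prints.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-alone imitation of the quoted check (hypothetical helper, not
 * part of hwlib). The VCO frequency is mult * (inputfreq / div) and must
 * lie inside [freqmin, freqmax]. A 64-bit intermediate is used here so
 * the product cannot wrap; the quoted code may use a narrower type. */
bool vco_in_range(uint32_t inputfreq, uint32_t mult, uint32_t div,
                  uint32_t freqmin, uint32_t freqmax)
{
    if (div == 0) {
        return false; /* avoid division by zero */
    }
    uint64_t temp = (uint64_t)mult * (inputfreq / div);
    return (temp <= freqmax) && (temp >= freqmin);
}
```

With an assumed window of 320 MHz to 1600 MHz and a 25 MHz input, vco_in_range(25000000u, 64, 1, 320000000u, 1600000000u) is true, while the same call with mult = 128 is false. If the PLL already runs near its upper limit, that would explain why mu*2 is rejected and mu/2 is accepted. It does not by itself explain why the unmodified read-back values are rejected.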

I have installed the latest updates to SoC EDS 14.0.

If I use mu/2 as the new multiplier, the alt_clk_pll_vco_cfg_set() call succeeds and the my_chk() test runs slightly slower.

Has anybody used the alt_clock_manager.c interface to change the main PLL clocks?
Altera_Forum
Honored Contributor II
612 Views

Help me! Can anyone repeat this simple loop and see the same weirdly slow speed?

It is painful to have such a powerful HPS and use 1/50 of it, like a 20 MHz microcontroller!

I set all Clock Manager registers on bare metal to the same values as under Linux, where the speed is normal, with no result. Enabling or disabling the cache has no effect.

In DS-5 I tried changing the values of all registers in the Registers window. The only evident influence is the M (Mode) field in the CPSR: switching from SVC to USR makes the loop 1.5 times faster.
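For anyone comparing modes in the debugger: the mode field is the low five bits of the CPSR. A small decoding sketch using the ARMv7-A mode encodings follows; the helper name cpsr_mode_name is made up for illustration.

```c
#include <stdint.h>

/* Decode the M[4:0] mode field of an ARMv7-A CPSR value.
 * Only modes available on a Cortex-A9 are listed; any other
 * encoding is reported as "unknown". */
const char *cpsr_mode_name(uint32_t cpsr)
{
    switch (cpsr & 0x1Fu) {
    case 0x10: return "USR"; /* User */
    case 0x11: return "FIQ";
    case 0x12: return "IRQ";
    case 0x13: return "SVC"; /* Supervisor */
    case 0x16: return "MON"; /* Secure Monitor */
    case 0x17: return "ABT"; /* Abort */
    case 0x1B: return "UND"; /* Undefined */
    case 0x1F: return "SYS"; /* System */
    default:   return "unknown";
    }
}
```

For example, a CPSR of 0x600001D3 decodes as SVC and 0x60000010 as USR.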

Under Linux most of the registers that are visible on bare metal are not visible, so comparing and copying them is impossible.

Or which forum or document should I turn to?
Altera_Forum
Honored Contributor II
612 Views

My "hello world" project is very slow too: your code runs for ~60 seconds from the debugger and ~120 seconds from a cold reset (WTF?), but only 4-5 seconds in the Altera-SoCFPGA-HardwareLib-MPL-CV-ARMCC example project.

Altera_Forum
Honored Contributor II
612 Views

I do not have an ARMCC license with my SoC kit, so I cannot compile the MPL and try this example.

If Altera can speed up the HPS in the "new" MPL, maybe this will also be in the usual "free" Preloader in the 14.1 update?

Or can the MPL be compiled with plain arm-altera-eabi-*...

Can anybody debug the MPL execution and find the place where the speed-up happens?
Altera_Forum
Honored Contributor II
612 Views

From my understanding, I think you did not initialize your MMU.

Altera_Forum
Honored Contributor II
612 Views

When you enable the caches, you also have to initialize the MMU.  

 

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <assert.h>
#include "alt_cache.h"
#include "alt_mmu.h"

int __auto_semihosting;

#define N 256
#define ARRAY_SIZE(array) (sizeof(array) / sizeof(array[0]))

/* Defined elsewhere; the implementation is not part of this post. */
void mul(const double *in_a, const double *in_b, unsigned n, double *out);

/* MMU page table: 16 KiB, aligned on a 16 KiB boundary */
static uint32_t __attribute__ ((aligned (0x4000))) alt_pt_storage[4096];

/* Allocator callback: returns the static page table storage passed as context. */
static void *alt_pt_alloc(const size_t size, void *context)
{
    (void)size;
    return context;
}

static void mmu_init(void)
{
    uint32_t *ttb1 = NULL;

    /* Populate the page table with sections (1 MiB regions). */
    ALT_MMU_MEM_REGION_t regions[] = {
        /* Memory area: 1 GiB */
        {
            .va = (void *)0x00000000,
            .pa = (void *)0x00000000,
            .size = 0x40000000,
            .access = ALT_MMU_AP_FULL_ACCESS,
            .attributes = ALT_MMU_ATTR_WBA,
            .shareable = ALT_MMU_TTB_S_NON_SHAREABLE,
            .execute = ALT_MMU_TTB_XN_DISABLE,
            .security = ALT_MMU_TTB_NS_SECURE
        },

        /* Device area: everything else */
        {
            .va = (void *)0x40000000,
            .pa = (void *)0x40000000,
            .size = 0xc0000000,
            .access = ALT_MMU_AP_FULL_ACCESS,
            .attributes = ALT_MMU_ATTR_DEVICE_NS,
            .shareable = ALT_MMU_TTB_S_NON_SHAREABLE,
            .execute = ALT_MMU_TTB_XN_ENABLE,
            .security = ALT_MMU_TTB_NS_SECURE
        }
    };

    /* The calls are made inside assert(), so do not build with NDEBUG. */
    assert(ALT_E_SUCCESS == alt_mmu_init());
    assert(alt_mmu_va_space_storage_required(regions, ARRAY_SIZE(regions)) <= sizeof(alt_pt_storage));
    assert(ALT_E_SUCCESS == alt_mmu_va_space_create(&ttb1, regions, ARRAY_SIZE(regions), alt_pt_alloc, alt_pt_storage));
    assert(ALT_E_SUCCESS == alt_mmu_va_space_enable(ttb1));
}

int main(int argc, char** argv)
{
    static double a[N], b[N], c[N];
    unsigned i, t;

    mmu_init();
    alt_cache_system_enable();

    for (i = 0; i < N; i++) {
        a[i] = (double)rand();
        b[i] = (double)rand();
    }

    *(unsigned volatile *)0xFFFEC600 = 0xFFFFFFFF; /* timer reload value */
    *(unsigned volatile *)0xFFFEC604 = 0xFFFFFFFF; /* current timer value */
    *(unsigned volatile *)0xFFFEC608 = 0x003; /* start timer at 200MHz and automatically reload */

    mul(a, b, N, c);

    t = *(unsigned volatile *)0xFFFEC604;
    printf("used time for %u multiplications = %u ns\n", N, 5 * (0xFFFFFFFF - t)); /* 5 ns per tick at 200 MHz */

    return 0;
}
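The factor 5 in the printf() above comes from the 200 MHz timer clock stated in the code comment: one tick is 1/200 MHz = 5 ns, and the timer counts down from 0xFFFFFFFF. A slightly more general sketch of that conversion is below; the helper name timer_elapsed_ns is made up, and the 200 MHz rate is taken from the comment rather than verified.

```c
#include <stdint.h>

#define TIMER_START_VALUE 0xFFFFFFFFu /* value loaded before the measurement */

/* Convert the current value of a down-counting 32-bit timer into elapsed
 * nanoseconds. Assumes the timer was started at TIMER_START_VALUE and has
 * not wrapped. The 64-bit result avoids the overflow that
 * 5 * (0xFFFFFFFF - t) hits in 32-bit arithmetic after about 4.29 s. */
uint64_t timer_elapsed_ns(uint32_t current_value, uint32_t timer_hz)
{
    uint32_t ticks = TIMER_START_VALUE - current_value;
    return (uint64_t)ticks * 1000000000ull / timer_hz;
}
```

For example, 200 elapsed ticks at 200 MHz give timer_elapsed_ns(0xFFFFFFFFu - 200u, 200000000u) == 1000, that is 1000 ns.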
Altera_Forum
Honored Contributor II
612 Views

For ARM CC: 

Minimum optimization level: over 100;

Maximum optimization level: 12 s;

With status = alt_cache_system_enable(): 1.5 s!

 

rosj91, your code does not compile.
Altera_Forum
Honored Contributor II
612 Views

To rosj: BIG THANKS!

My function my_chk() now runs in 9-10 seconds instead of about 3 minutes!

I attach a reworked Altera Unhosted example with makeHPSfaster(). The first (commented) call of my_chk() takes about 3 min; the second, after makeHPSfaster(), takes at most 10 s.

And please, post your code inside code brackets! :)

 

alt_cache.c and alt_mmu.c do not compile with armcc, only with GNU.

 

To Alex: you got my_chk() down to 1.5 s? Please attach a full example project.

 

P.S. I attach fast_hps.h.txt (this forum ignores .h files) with the "fast" code that speeds up the HPS on bare metal.

The Altera HWLIB sources and alt_pt.c with the needed code are expected to be placed in the subfolder "alt".

Without alt_int_cpu_init(), semihosting hangs and printf() and fopen() do not work.