Intel® Quartus® Prime Software
Intel® Quartus® Prime Design Software, Design Entry, Synthesis, Simulation, Verification, Timing Analysis, System Design (Platform Designer, formerly Qsys)

How to improve QuartusII compile time

Altera_Forum
Honored Contributor II
2,287 Views

Hi,

I am an Altera distributor FAE. One of my customers is suffering from long compile times (5 hours) for a design on a Stratix II 180 at 70% utilization. The Quartus II version is 7.1.

In general, these are the tricks I am aware of:

- Incremental compilation
- More powerful workstations (I would be happy to get detailed suggestions on this point)

Any more suggestions?

Thank you anyway for reading this post.
Altera_Forum

First, do they have physical synthesis on? (Assignments -> Settings -> Fitter/Physical Synthesis.) This can significantly increase compile times, although it gives better results. If they do, try lowering the effort or turning some of the options off. Note that the fitter gives very useful messages for these algorithms, something like "Register Retiming took 40m and improved timing by 30ps". If you see something that takes a long time and gives little gain, turn it off (the gains are approximations, since the design hasn't been routed yet).
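If you copy the per-algorithm runtime and estimated gain out of the fitter messages by hand, the keep-or-disable decision above can be made explicit. A minimal Python sketch, assuming hand-transcribed numbers (it does not parse Quartus reports, whose wording varies by version); the second algorithm name and the picoseconds-per-minute threshold are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PhysSynthStep:
    name: str        # algorithm name as reported by the fitter
    minutes: float   # runtime reported for this step
    gain_ps: float   # estimated timing improvement (a pre-routing estimate)

def steps_to_disable(steps, min_ps_per_minute):
    """Return the names of steps whose estimated gain per minute of runtime
    falls below the chosen threshold (candidates to turn off)."""
    candidates = []
    for step in steps:
        # A zero-runtime step costs nothing, so never flag it.
        rate = step.gain_ps / step.minutes if step.minutes > 0 else float("inf")
        if rate < min_ps_per_minute:
            candidates.append(step.name)
    return candidates

steps = [
    PhysSynthStep("Register Retiming", minutes=40, gain_ps=30),           # example from the post
    PhysSynthStep("Combinational Resynthesis", minutes=5, gain_ps=120),   # hypothetical
]
print(steps_to_disable(steps, min_ps_per_minute=5.0))  # ['Register Retiming']
```

Where to set the threshold is a judgment call that depends on how much slack you still need.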

 

My guess is that this isn't on. Note that 5 hours, especially if you're not on the fastest machine, is not that bad for this device. Yes, it can be faster, but it's not off by orders of magnitude (and we have very good compile times compared with other FPGA vendors' devices of this size). But I understand 5 hours can be long, regardless of whether that is good or not.

 

The fitter should be set to Auto Fit, which makes it run until the design meets timing and is considered routable, and then quit, rather than wasting time trying to outperform your requirements. (Do you meet timing?) A simple way to reduce compile time is to do a Fast Fit. This can halve the fit time, with a performance reduction of only ~10% or so. If you're testing on a board and there's any way this will work (slow down your clock rate, put a faster device on the board, etc.), it can be well worth it. Also note that in the lab you don't have to close timing. If you're off by a small percentage of your slack, remember that you're not dealing with worst-case PVT (process, voltage, temperature) conditions, and the design should still work.
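A quick what-if for the Fast Fit trade-off, using only the rules of thumb quoted above (roughly half the fit time, roughly 10% lower performance). This is a Python sketch under those assumptions; the 4-hour fit time and the clock figures are hypothetical, and real results are design-dependent.

```python
def fast_fit_estimate(fit_hours, fmax_mhz, target_mhz,
                      time_factor=0.5, perf_loss=0.10):
    """Rough what-if for switching to Fast Fit, using the rules of thumb:
    about half the fit time and about 10% lower Fmax.
    Returns (estimated fit hours, estimated Fmax in MHz, meets target?)."""
    est_hours = fit_hours * time_factor
    est_fmax = fmax_mhz * (1.0 - perf_loss)
    return est_hours, est_fmax, est_fmax >= target_mhz

# Production clock: Fast Fit likely misses it.
hours, fmax, ok = fast_fit_estimate(fit_hours=4.0, fmax_mhz=220.0, target_mhz=200.0)
print(f"{hours:.1f} h, {fmax:.0f} MHz, meets target: {ok}")

# Slowed-down lab clock: the same Fast Fit result is good enough.
print(fast_fit_estimate(fit_hours=4.0, fmax_mhz=220.0, target_mhz=180.0)[2])
```

The second call reflects the suggestion above: for board testing, lowering the clock rate can make the faster, lower-performance fit acceptable.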

 

Incremental Compilation can be extremely helpful, but usually requires some floorplanning (LogicLock regions up front). Why? The basic premise of IC is that it preserves the partitions you don't change. The problem is that if you don't floorplan, your placement tends to have areas of overlap: where two different partitions converge, each is fit into the holes of the other, like two different colored marbles thrown in a box. Now, if you lock one set down and try to fit the other set within the "holes", the fit problem becomes much more difficult and performance can get worse. Since you're only 70% full, you may not encounter this, but it's worth reading the handbook sections on Incremental Compilation and LogicLock Regions to get a better grasp of this.
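A toy model of why incremental compilation saves time: only the changed partitions are recompiled, and the rest reuse their preserved results. The partition names, minutes, fixed overhead and the `penalty` factor (standing in for the harder fit around locked, non-floorplanned neighbours described above) are all hypothetical; this is a Python sketch, not how Quartus actually schedules work.

```python
def incremental_estimate(partition_minutes, changed, overhead_minutes=0.0,
                         penalty=1.0):
    """Toy model of an incremental compile: only partitions listed in
    `changed` are re-synthesized and re-fit; the others reuse preserved
    results. `penalty` (>= 1.0) is a hypothetical multiplier for the extra
    difficulty of fitting changed logic around locked-down neighbours."""
    unknown = set(changed) - set(partition_minutes)
    if unknown:
        raise KeyError(f"unknown partitions: {sorted(unknown)}")
    rework = sum(partition_minutes[name] for name in changed)
    return overhead_minutes + rework * penalty

# Hypothetical per-partition compile times in minutes.
full = {"top": 20, "dsp_core": 150, "pcie_if": 60, "ddr_ctrl": 70}
print(sum(full.values()))                                              # full compile: 300
print(incremental_estimate(full, ["dsp_core"], overhead_minutes=15))  # 165.0
print(incremental_estimate(full, ["dsp_core"], overhead_minutes=15, penalty=1.3))
```

The third line shows how a poor floorplan can eat into the savings even when only one partition changed.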

 

Also note that under Tools -> Advisors there is a Compilation Time Advisor. You're going to find that some of its suggestions go directly against the performance suggestions, which makes sense (you usually trade compile time for better performance), so whether you're meeting performance is an important variable in all of this.
Altera_Forum

Thank you, Rysc.

Altera_Forum

 

--- Quote Start ---  

Also note that under Tools -> Advisors there is a Compilation Time Advisor. 

--- Quote End ---  

 

 

 

Also check "Compilation-Time Optimization Techniques" in the QII 7.1 handbook at Volume 2, Section III, Chapter 8, page 8-84. The handbook might have something that isn't in the Advisor.
Altera_Forum

Thank you, Brad

Altera_Forum

Does anyone know what the speed gain is from going from a 32-bit to a 64-bit PC?

Altera_Forum

Going from 32 bits to 64 bits will be slower, as the instructions are longer.

I also use a Stratix II 180, and the best speed (at a reasonable price) you can get at the moment is a PC with 32-bit Windows XP, 3 GB of RAM, two fast SATA hard disk drives in parallel (RAID 0), and a Core 2 Duo CPU (E6600).
Altera_Forum

If you have a multiprocessor machine, take note of the setting "Maximum processors allowed for parallel compilation" under "Assignments / Settings / Compilation Process Settings". This defaults to one, so Quartus II will not take advantage of multiple processors unless you increase this number (up to the available number of processors).
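If you script project setup, the processor count can be capped at what the machine actually has. A Python sketch, assuming the .qsf assignment name `NUM_PARALLEL_PROCESSORS` (the name later Quartus versions write for this GUI setting; check that your version uses it). Note that `os.cpu_count()` reports logical CPUs, which can exceed physical cores on hyper-threaded machines.

```python
import os

def parallel_processors_assignment(requested=None, available=None):
    """Build a .qsf line for the 'Maximum processors allowed' setting.
    Caps the value at the number of logical CPUs so it is never set higher
    than the machine actually has; defaults to using all of them.
    The assignment name is an assumption to verify for your version."""
    if available is None:
        available = os.cpu_count() or 1   # cpu_count() may return None
    n = available if requested is None else max(1, min(requested, available))
    return f"set_global_assignment -name NUM_PARALLEL_PROCESSORS {n}"

print(parallel_processors_assignment())
print(parallel_processors_assignment(requested=8, available=2))  # capped at 2
```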

Altera_Forum

Note that the Maximum Processors setting takes a number. Ideally, it would allow a value like "Auto" or "Maximum", so that if you compile your design on a 2-processor system it automatically uses both of them, and if it's on a 16-processor system (someday), it uses all of them. I believe that's in the works. But if you set the number higher than the number of processors you actually have, it will slow down your compile times (which is why it defaults to 1). As I believe most people know, only the simplest parallel algorithms see their processing time decrease linearly with the number of processors (where 2 processors would halve the total processing time). One of Altera's software engineers was at a multiprocessor conference/tradeshow/something, and software people in other fields were generally maxing out at a square-root improvement, so 2 processors would give about a 40% improvement, 4 processors would give a 2X improvement, etc. So extra processors help, but not to the degree the layperson (like me) often expects.
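To put the square-root rule of thumb next to ideal linear scaling, here is a Python sketch. It also includes Amdahl's law, a different, standard model added here only for comparison (it is not what the engineer quoted above described); the parallel fraction p = 0.5 is an arbitrary example.

```python
import math

def sqrt_rule_speedup(n):
    """Rule of thumb from the post: speedup grows as the square root of the CPU count."""
    return math.sqrt(n)

def amdahl_speedup(n, parallel_fraction):
    """Amdahl's law: only `parallel_fraction` of the work scales with n CPUs."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n)

for n in (1, 2, 4, 16):
    print(f"{n:2d} CPUs: linear {n:5.2f}x  sqrt-rule {sqrt_rule_speedup(n):5.2f}x  "
          f"Amdahl(p=0.5) {amdahl_speedup(n, 0.5):5.2f}x")
```

Under the square-root rule, 2 CPUs give about 1.41x throughput (the "40%" above), which cuts compile time to roughly 71% of the single-CPU run, not 50%.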

Altera_Forum

I made several tests: going from one CPU to two CPUs (in the Quartus option) on a dual-core processor (E6600) only sped things up by 10%, because Quartus only uses both cores at the beginning and at the end of the fitter.

Aside from that, Quartus sometimes runs forever when two CPUs are used (probably a synchronization problem in the multithreaded software code).
Altera_Forum

 

--- Quote Start ---  

I made several tests: going from one CPU to two CPUs (in the Quartus option) on a dual-core processor (E6600) only sped things up by 10%, because Quartus only uses both cores at the beginning and at the end of the fitter.

Aside from that, Quartus sometimes runs forever when two CPUs are used (probably a synchronization problem in the multithreaded software code).

--- Quote End ---  

 

 

 

A 10% speed-up sounds reasonable for version 6.1. The average improvement is larger for version 7.1, but I don't know how design-dependent that is.

 

For Quartus running forever, it would be good to file a service request for that if you are using version 7.1. (If you are using an older version, the problem might already be fixed.) I got the impression recently that the Altera factory thought problems like that were unusual, so they need to know about all the problems that are happening.
Altera_Forum

Another way to improve speed is to make the clocking in your design as simple as possible when you are just testing. I don't know of any other way to improve it. We are suffering from such long waiting times too, especially when the timing requirements of the design are critical.

Altera_Forum

I made a test with Quartus 7.1: the compile time on one core is 1h55, and the compile time on both cores is 1h37.
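The two timings above can be turned into a speedup and, if you assume Amdahl's law (an assumption, not something measured here), into the fraction of the run that benefited from the second core. A Python sketch:

```python
def parse_hm(text):
    """Parse a duration written like '1h55' into minutes."""
    hours, minutes = text.lower().split("h")
    return int(hours) * 60 + int(minutes or 0)

def implied_parallel_fraction(t1, tn, n):
    """Solve Amdahl's law t_n = t_1 * ((1 - p) + p / n) for p."""
    return (1.0 - tn / t1) / (1.0 - 1.0 / n)

one_core = parse_hm("1h55")   # 115 minutes
two_cores = parse_hm("1h37")  # 97 minutes
print(f"speedup {one_core / two_cores:.2f}x, "
      f"time saved {100 * (one_core - two_cores) / one_core:.1f}%")   # 1.19x, 15.7%
print(f"implied parallel fraction p = "
      f"{implied_parallel_fraction(one_core, two_cores, 2):.2f}")      # 0.31
```

With these numbers, roughly a third of the run scaled with the second core, which is consistent with the earlier observation that only parts of the fitter use both cores.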
Altera_Forum

Hi, 

 

 

--- Quote Start ---  

Going from 32 bits to 64 bits will be slower as the instructions are longer. 

 

--- Quote End ---  

 

 

This is only sort of true. If you run 64-bit Quartus on a 64-bit OS (say, Windows), you will see a small increase in compile time (on the order of 10%). This is mostly due to the increase in the working set (active memory) of the program, caused by pointers growing from 32 bits to 64 bits.

 

However, if you run 32-bit Quartus on a 64-bit OS, you will see no slowdown at all. The advantages of doing this are (a) you get access to 4 GB of memory (up from 2 GB in 32-bit Windows and ~3.5 GB in 32-bit Linux), and (b) 32-bit Quartus will use less memory than 64-bit Quartus.
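To illustrate the working-set effect: the same data structure grows when pointers go from 4 to 8 bytes. A Python sketch computing the size of a hypothetical netlist-node record (the field layout is invented for illustration) under 32-bit and 64-bit pointer sizes. How much a real program grows depends on how pointer-heavy its data is, so this does not reproduce the ~10% compile-time figure above.

```python
import ctypes

def record_size(num_pointers, num_int32, pointer_bytes):
    """Size in bytes of a hypothetical C struct holding `num_pointers`
    pointers followed by `num_int32` 32-bit ints, padded to pointer
    alignment (typical for ILP32 and LP64 ABIs)."""
    raw = num_pointers * pointer_bytes + num_int32 * 4
    return -(-raw // pointer_bytes) * pointer_bytes   # round up to a multiple

# Hypothetical netlist node: 4 pointers + 3 ints.
size32 = record_size(4, 3, pointer_bytes=4)   # 28 bytes
size64 = record_size(4, 3, pointer_bytes=8)   # 48 bytes
print(f"32-bit: {size32} B, 64-bit: {size64} B, growth {size64 / size32 - 1:.0%}")
print(f"this Python build uses {ctypes.sizeof(ctypes.c_void_p)}-byte pointers")
```

A larger working set means fewer objects fit in the CPU caches, which is where the slowdown comes from.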

 

Unless you are compiling a full design into the largest Stratix III device with complex timing constraints, you should be fine with a 64-bit OS + 32-bit Quartus.  

 

Regards, 

 

Paul Leventis 

Altera Corp.
Altera_Forum

 

--- Quote Start ---  

I am Altera Dist. FAE. 

--- Quote End ---  

 

 

Nice to hear that even the Altera specialists are limited in Altera knowledge. :)

 

Well, there is not much room for optimization: more CPU power will lead to some 10% more performance, so people will have to wait for the forthcoming full-hardware CPUs, where the Quartus code does not run on PCs but on an array of Stratix devices. ;)

 

In the meantime, designers will have to work the "pipelined" way: work on concurrently acting parts of the design, and have several simulation and synthesis processes running at the same time (on different machines).
Altera_Forum

Don't give up hope yet. Multicore processors are the way of the future (and the present). We are working on parallelizing more and more of Quartus with each release. While it takes time to get appreciable gains, we're reaching the point where the multi-threaded compile gains are substantial.

 

Regards, 

 

Paul Leventis 

Altera Corp.
Altera_Forum

Right, but the improvements in the short term will still be small, so what I was basically saying is that we need to think of working methods to get around the current situation.

 

Anyway, your sentence leads me to a new question: 

 

Paul, are there also plans to support distributed installations across several machines?

 

Apart from electronics, I am also working on graphics and audio, and most of these programs nowadays can run parts of their calculations on another PC/workstation. In most cases, the additional PC works as a kind of "plugin". I would imagine there are also activities, e.g. within the placement process, that could be done completely concurrently. (?)
Altera_Forum

Hi, 

 

The Design Space Explorer tool in Quartus will farm out multiple compilations to networked computers (with a built-in lightweight batch system or with LSF). This is the easiest, coarsest form of parallelism. Any time you can take a task and subdivide it into large, fully independent tasks, parallelism (fine- or coarse-grained) works great and is very scalable. But I don't think this is what you mean.
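Coarse-grained parallelism of the Design Space Explorer kind works because each compile is fully independent. A Python sketch of that pattern, where `fake_compile` is a hypothetical stand-in returning a pseudo-random slack per seed; a real script would launch the Quartus command-line flow for each setting instead, and the worker count should match the CPUs or machines available.

```python
from concurrent.futures import ThreadPoolExecutor
import random

def fake_compile(seed):
    """Hypothetical stand-in for one full, independent compile with a given
    fitter seed. Returns (seed, worst slack in ns)."""
    rng = random.Random(seed)          # deterministic per seed
    return seed, round(rng.uniform(-0.5, 0.5), 3)

def sweep(seeds, workers):
    """Coarse-grained parallelism: each seed is a fully independent job,
    so jobs never need to share data while running."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fake_compile, seeds))
    return max(results, key=lambda r: r[1])   # best (largest) worst-slack

best_seed, best_slack = sweep(seeds=range(1, 9), workers=4)
print(f"best seed {best_seed} with worst slack {best_slack} ns")
```

Threads are enough here because, in a real flow, each worker would mostly wait on an external compile process.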

 

Farming out work to other computers requires coarser-grained parallelism than what is being employed to take advantage of multiple CPUs in the same computer. For example, you could imagine that when performing timing analysis, the CAD software could analyse one clock domain on one CPU and another clock domain on another CPU. This is efficient in a system where the CPUs share memory -- the bandwidth from the CPUs to the memory (which contains the timing graph and other important information) is on the order of GB/s. However, if you were to try the same parallelism across computers, the software would suddenly have to send all that memory content (the timing graph) over a link (1 Gb/s, say) first... and this would probably not be worth the effort. On top of this, it's more of a pain to get a program to dole out work to other computers; you need some sort of batch computing environment or computing client, and one machine must manage the scheduling of tasks across the computing resources.
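The bandwidth argument can be checked with back-of-the-envelope arithmetic. A Python sketch, assuming a hypothetical 1 GB timing graph, an idealized 1 Gb/s link, and roughly 5 GB/s (40 Gb/s) of memory bandwidth; it ignores protocol overhead and latency, which only make the network case worse.

```python
def transfer_seconds(size_bytes, bits_per_second):
    """Ideal time to move `size_bytes` over a link of the given bit rate
    (ignores protocol overhead and latency, so real transfers are slower)."""
    return size_bytes * 8 / bits_per_second

def worth_offloading(work_seconds, size_bytes, link_bps):
    """Splitting `work_seconds` of analysis across two machines saves at most
    work_seconds / 2, but the shared data must be shipped over first."""
    return transfer_seconds(size_bytes, link_bps) < work_seconds / 2

GB = 10**9
graph = 1 * GB                          # hypothetical timing-graph size
print(transfer_seconds(graph, 1e9))     # 1 Gb/s network link: 8.0 s
print(transfer_seconds(graph, 40e9))    # ~5 GB/s memory bus: 0.2 s
print(worth_offloading(work_seconds=10, size_bytes=graph, link_bps=1e9))  # False
```

A 10-second analysis step would lose time if the graph first had to cross the network, while sharing it in memory costs almost nothing.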

 

I could see that perhaps if your design had multiple partitions (for example, in an incremental design flow) a CAD system could farm off synthesis and fitting for the various pieces to multiple machines. This is in effect what you are doing when you are using "team-based" design methodologies -- one engineer is doing one piece on their computer, while another is working on a different piece on another machine. 

 

Regards, 

 

Paul