
QAT Performance Issue

SChan69
Novice

Hi Everyone,

I have a performance issue with QAT.

I ran the test app in \openssl-async\apps\speed and recorded the results in the table below.

As you can see, QAT performance is poor. Can anyone give me some advice?

Here are my test commands:

QAT: ./apps/openssl speed -engine qat -evp aes-128-cbc -elapsed

SW (AES-NI disabled): OPENSSL_ia32cap="~0x200000200000000" ./apps/openssl speed -evp aes-128-cbc -elapsed

SW-Using AES-NI: ./apps/openssl speed -evp aes-128-cbc -elapsed
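The mask in the software-only command clears bits in OpenSSL's OPENSSL_ia32cap capability vector. According to the OpenSSL OPENSSL_ia32cap documentation, bit 57 is AES-NI and bit 33 is PCLMULQDQ, so this mask disables both, not only AES-NI. The small bash sketch below decodes the mask. It assumes bash's 64-bit arithmetic and names only those two bits; any other set bit would be reported by number only.

```bash
#!/usr/bin/env bash
# Decode the capability bits cleared by OPENSSL_ia32cap="~0x200000200000000".
# Bit names follow the OpenSSL OPENSSL_ia32cap documentation; only bits 33
# and 57 are named in this sketch, any other set bit is printed by number.
set -euo pipefail

MASK=0x200000200000000   # the value after "~"; the "~" means "clear these bits"

for ((bit = 0; bit < 64; bit++)); do
  if (( (MASK >> bit) & 1 )); then
    case $bit in
      33) echo "bit 33: PCLMULQDQ (cleared)" ;;
      57) echo "bit 57: AES-NI (cleared)" ;;
      *)  echo "bit $bit: (not named in this sketch)" ;;
    esac
  fi
done
```

It prints one line for bit 33 and one for bit 57. For plain aes-128-cbc, clearing PCLMULQDQ should matter little, since that instruction is mainly used by GCM (GHASH), so the comparison is effectively AES-NI on vs. off.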

HW Spec:

Platform : Rangeley

CPU : Intel(R) Atom(TM) CPU C2758 @ 2.40GHz

CarlosAM_INTEL
Moderator

Hello SamChang,

Thank you for contacting the Intel Embedded Community.

You may find helpful information in the Intel(R) QuickAssist Technology Performance Optimization Guide: https://01.org/sites/default/files/page/330687-003_qat_perf_opt_guide.pdf

Please let us know if this information is useful to you.

Best Regards,

Carlos_A.

SChan69
Novice

Hi Carlos,

Thanks for your response. I'd like to ask you some questions.

I profiled the test and found that qaeCryptoMemV2P takes up most of the CPU time.

1. Can I use zero-copy mode to avoid it?

2. Can I use zero-copy in synchronous mode for aes-128-cbc?

(The document mentions that zero-copy is only supported in asynchronous mode for aes-128-hmac-sha1.)

3. Could you give me an example of using zero-copy mode?

CarlosAM_INTEL
Moderator

Hello SamChang,

Thanks for your reply.

There is definitely a cost associated with offloading crypto operations to the QAT hardware, and it is most visible at smaller packet sizes, because the fixed per-request overhead of submitting a job and retrieving its result is amortized over fewer bytes. We also notice that you are not using OpenSSL's asynchronous mode; with async we see much better performance. The data we measured and the commands used to obtain them are the following:

Details on zero-copy mode are given in the Application Note included with the libcrypto package, available at:

https://01.org/sites/default/files/page/libcrypto_shim_0.4.9-009_withdocumentation.zip

There are a few limitations when zero-copy mode is used. Please refer to section 1.2.2 for details.
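The point that offload cost is most visible at small packet sizes, and that async mode helps, can be illustrated with a toy throughput model. This is a hypothetical sketch, not Intel's performance model: T0_US (fixed per-request overhead), BW_MBPS (accelerator bandwidth) and JOBS (requests kept in flight in async mode) are made-up illustrative values, not QAT measurements, and the async case simply assumes requests overlap perfectly until the engine bandwidth is saturated.

```bash
#!/usr/bin/env bash
# Toy offload-throughput model. All parameters are HYPOTHETICAL
# illustrative values, not QAT measurements.
set -euo pipefail

T0_US=${T0_US:-5}        # hypothetical fixed cost per request, in microseconds
BW_MBPS=${BW_MBPS:-2000} # hypothetical engine bandwidth, MB/s (1 MB/s = 1 byte/us)
JOBS=${JOBS:-8}          # hypothetical requests kept in flight in async mode

# model SIZE: prints "SIZE sync_MB/s async_MB/s" for one buffer size in bytes
model() {
  awk -v s="$1" -v t0="$T0_US" -v bw="$BW_MBPS" -v n="$JOBS" 'BEGIN {
    t = t0 + s / bw                   # microseconds to complete one request
    sync_tp = s / t                   # sync: one request at a time (bytes/us = MB/s)
    async_tp = n * s / t              # async: n requests overlapped
    if (async_tp > bw) async_tp = bw  # overlap cannot exceed engine bandwidth
    printf "%d %.1f %.1f\n", s, sync_tp, async_tp
  }'
}

echo "size_bytes sync_MBps async_MBps"
for size in 16 64 256 1024 8192; do
  model "$size"
done
```

With the default assumptions, a 16-byte request spends almost all of its time in T0_US in both modes, while at 8192 bytes the async case reaches the hypothetical bandwidth cap. The assumptions can be changed from the environment, e.g. `T0_US=2 JOBS=32 ./model.sh`.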

By the way, could you please give us a detailed description of what you are trying to accomplish?

Thanks in advance for your reply with the requested information.

Best Regards,

Carlos_A

SChan69
Novice

Hi Carlos,

Thanks for your response. As you can see, QAT performance is not good with small data sizes.

The throughput is worse than software. Could you tell me in which use cases QAT performs best?

I am using QAT only for my research.

CarlosAM_INTEL
Moderator

Hello SamChang,

Thanks for your reply.

To make sure we are on the same page, could you please explain in more detail what you mean by the question "what kind of use cases that QAT is the best?"?

Thanks in advance for your cooperation in resolving this case.

Best Regards,

Carlos_A.

SChan69
Novice

Hi Carlos,

Sorry, I did not describe my question clearly.

My question is: in what kind of real-world cases can I use QAT?

CarlosAM_INTEL
Moderator

Hello SamChang,

Thanks for your reply.

Based on your clarification, the information already provided is all we have. We would also note that asynchronous mode should be used for best performance, and that zero-copy mode has limitations.

Best Regards,

Carlos_A.

SChan69
Novice

Hi Carlos,

Got it, thanks for your help.

CarlosAM_INTEL
Moderator

Hello SamChang,

Thanks for your update.

We are glad to hear that the information provided was useful to you.

Please do not hesitate to contact us again if you have questions related to Intel(R) Embedded devices.

Best Regards,

Carlos_A.
