Hi there everyone, I'm trying to implement both the TEA and XTEA algorithms to make a comparison. I have a complete working VHDL implementation of TEA, as follows:
Library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
entity TEA_en is
  port(
    clock: in std_logic; --clock input
    input_data: in std_logic_vector (63 downto 0); --input data
    key : in std_logic_vector (127 downto 0); --secret key 127 downto 0--
    encrypted_data: out std_logic_vector (63 downto 0) --output/encrypted data
  );
end entity TEA_en;
architecture behave of TEA_en is
  --declare signals
  signal Key0, Key1, Key2, Key3 : std_logic_vector (31 downto 0);
  signal Z, Y : std_logic_vector (31 downto 0);
  signal count :integer :=0;
begin
  --separate key into four parts
  Key0<=key(127 downto 96);
  Key1<=key(95 downto 64);
  Key2<=key(63 downto 32);
  Key3<=key(31 downto 0);
  Process(Input_data, clock)
    --declare and initialize variable
    Variable delta: std_logic_vector (31 downto 0):=x"9e3779b9";
    Variable sum: std_logic_vector (31 downto 0):=x"00000000";
    Variable Zeq,Yeq,Z,Y: std_logic_vector (31 downto 0);
  Begin
    If(rising_edge(clock)) then
      If (count<1) then --separate input data into two parts
        Z:=input_data(63 downto 32); --part 1 (32bits)
        Y:=input_data(31 downto 0); --part 2 (32bits)
      Else --null;
      End if;
      If (count<32) then
        --Encryption routine algorithms
        sum:=sum+delta;
        --Calculate Y
        Zeq:=( (Z(27 downto 0) & "0000")+Key0) xor --Z shifted left 4 bits, plus Key0
             (Z+ sum) xor --Z plus sum
             (("00000" & Z(31 downto 5))+Key1); --Z shifted right 5 bits, plus Key1
        Y:=Y+Zeq;
        --Calculate Z
        Yeq:=( (Y(27 downto 0) & "0000")+Key2) xor --Y shifted left 4 bits, plus Key2
             (Y+ sum) xor --Y plus sum
             (("00000" & Y(31 downto 5))+Key3); --Y shifted right 5 bits, plus Key3
        Z:=Z+Yeq;
        --Output encrypted data
        Encrypted_data<=Y&Z;
      else
      end if;
      count<=count+1; --increase value of count
    End if;
  End process;
end architecture behave;
signal pipeline_0,pipeline_1,pipeline_2,pipeline_3,pipeline_4,pipeline_5: std_logic_vector(31 downto 0);
pipeline_0<=((Z(27 downto 0) & "0000")+Key0);
pipeline_1<=(Z+ sum);
pipeline_2<=(("00000" & Z(31 downto 5))+Key1);
pipeline_3<=( (Y(27 downto 0) & "0000")+Key2);
pipeline_4<=(Y+ sum);
pipeline_5<=(("00000" & Y(31 downto 5))+Key3);
Zeq:= pipeline_0 xor pipeline_1 xor pipeline_2;
Yeq:= pipeline_3 xor pipeline_4 xor pipeline_5;
Any suggestions? Is there any way to improve resource usage and speed of operation, or am I doing anything redundant? Any experts out there, please do help! Thanks in advance!
You need a major redesign if you want to achieve high speed.
You have a long combinational path, e.g. from input to Z to Zeq to Y to Yeq. Your pipelining is OK but only partial. The change of function (what you call wrong results) needs to be balanced after inserting pipeline registers. Other notes: your counter seems unconstrained, counting up from 0 without limit; your + operator is applied to std_logic_vector (how was it accepted by the compiler?).
Hi kaz, thank you so much for the comment, I appreciate it! You mentioned that I have to balance the pipeline. Sorry, but I don't quite understand how I can do that. I'll redesign the long combinational path as you suggested.
On the other hand, will constraining the counter help achieve faster speed? Also, the code works when I use "use ieee.std_logic_unsigned.all;" but not when I use "use ieee.numeric_std.all". With the latter package the following error appears:
Error (10327): VHDL error at TEA_en.vhd(46): can't determine definition of operator ""+"" -- found 0 possible definitions
You also mentioned that I only have partial pipelining. How do I achieve full pipelining? The system block diagram can be seen here (in fact I used the C code at this link to model my system): http://en.wikipedia.org/wiki/tiny_encryption_algorithm
Again, thank you for your ideas and comments! They are of great help! :)
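On the numeric_std error quoted above: numeric_std defines "+" for the unsigned and signed types, not for std_logic_vector, which is consistent with the "found 0 possible definitions" message. Below is a minimal sketch of one way around it, assuming the ports stay std_logic_vector. The entity name add32 and its ports are made up for illustration and do not come from the thread.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity (not from the thread): a registered 32-bit adder
-- showing how "+" can be used when only numeric_std is visible.
entity add32 is
  port (
    clock : in  std_logic;
    a, b  : in  std_logic_vector(31 downto 0);
    s     : out std_logic_vector(31 downto 0)
  );
end entity add32;

architecture rtl of add32 is
begin
  process (clock)
  begin
    if rising_edge(clock) then
      -- Convert both operands to unsigned, add, then convert the
      -- 32-bit result back to std_logic_vector for the port.
      s <= std_logic_vector(unsigned(a) + unsigned(b));
    end if;
  end process;
end architecture rtl;
```

The same conversion pattern would apply to each addition in the TEA round. The result of the unsigned addition has the width of the operands, so the carry out of bit 31 is dropped, which matches the 32-bit wrap-around arithmetic of the C reference code.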
First, the counter needs to be constrained, otherwise it defaults to the full integer width (32 bits in typical synthesis tools) and wouldn't come back to zero until it had counted that far, which is not what you want.
For pipeline balance you need to match the delay introduced by each register, so that you still add or xor the same data as originally designed, only later. For example, if A = A1 + A2 and C = A + B, then if you delay A you should delay B equally so that A lines up with B and gives the correct C. I said your pipeline is incomplete because the path from input to Z to Zeq to Y is all combinatorial. You only pipelined the computation of Z. Maybe the best way for you is to use signals instead of variables, as that will force a pipeline register per assignment, and then all you need to do is balance the delays.
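The A/B/C example above can be sketched in VHDL as follows. This is only an illustration under assumed names: the entity balance_demo, its 8-bit ports and the signals a_reg and b_reg are hypothetical and are not part of the TEA design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity (not from the thread): C = (A1 + A2) + B with the
-- sum A registered, so B gets a matching register to stay aligned with it.
entity balance_demo is
  port (
    clock     : in  std_logic;
    a1, a2, b : in  unsigned(7 downto 0);
    c         : out unsigned(7 downto 0)
  );
end entity balance_demo;

architecture rtl of balance_demo is
  signal a_reg : unsigned(7 downto 0);
  signal b_reg : unsigned(7 downto 0);
begin
  process (clock)
  begin
    if rising_edge(clock) then
      a_reg <= a1 + a2;        -- stage 1: A is available one clock later
      b_reg <= b;              -- stage 1: delay B by the same one clock
      c     <= a_reg + b_reg;  -- stage 2: both operands belong to the same input sample
    end if;
  end process;
end architecture rtl;
```

Without b_reg, the second stage would add the previous sample's A to the current sample's B, which is the kind of mismatch that shows up as wrong results after inserting registers. With it, c simply appears two clocks after the inputs that produced it.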
Hi kaz! Thank you for your prompt reply! By constraining the counter as you suggested, the throughput is now 14 ns, improved by 1 ns! You're a genius! Hahaha :) As for the delay balancing, does Quartus support this? How can I balance the delay when I don't know the delay of the components? Does Quartus support the "after 3ns" command? Is there any way you know of to implement the delay balancing in code? Thanks for suggesting the use of signals, I'm looking into it. Thanks again for all the advice!
I don't expect constraining the counter to improve speed much, as the main bottleneck is your long combinational path.
By delay I mean one clock period per register, so you don't need Quartus to tell you. If you register a node and it is a signal, it is updated one clock later. But if it is a variable, it is updated without any clock delay. Use of variables is a bit tricky: normally a variable implies no register, i.e. a combinational section within the clocked process. But there is one exception: if the variable is read before it is assigned, the compiler understands that you want to keep its value from the end of the previous pass through the process, so it creates a register and the variable becomes equivalent to a signal... rather confusing.
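The three cases described above (variable written then read, signal, variable read before it is written) can be put side by side in a small sketch. The entity var_vs_sig and all of its names are hypothetical, chosen only to make the timing difference visible in simulation.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity (not from the thread) contrasting variables and
-- signals inside one clocked process.
entity var_vs_sig is
  port (
    clock : in  std_logic;
    d     : in  unsigned(7 downto 0);
    q_var : out unsigned(7 downto 0);
    q_sig : out unsigned(7 downto 0);
    q_acc : out unsigned(7 downto 0)
  );
end entity var_vs_sig;

architecture rtl of var_vs_sig is
  signal s : unsigned(7 downto 0) := (others => '0');
begin
  process (clock)
    variable v   : unsigned(7 downto 0);
    variable acc : unsigned(7 downto 0) := (others => '0');
  begin
    if rising_edge(clock) then
      -- Variable written, then read: v behaves like a wire, so q_var
      -- holds d + 1 one clock after d is sampled.
      v     := d + 1;
      q_var <= v;

      -- Signal: q_sig reads the OLD value of s on this edge, so q_sig
      -- holds d + 1 two clocks after d is sampled.
      s     <= d + 1;
      q_sig <= s;

      -- Variable read before it is written: acc has to remember its value
      -- from the previous clock, so storage is implied for it.
      acc   := acc + d;
      q_acc <= acc;
    end if;
  end process;
end architecture rtl;
```

In the TEA process, Z, Y and sum fall into the third case (they are read before being reassigned in each round), while Zeq and Yeq fall into the first, which is why the whole round ends up as one combinational path between registers.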
Hi again kaz, thanks for your previous reply. I tried to pipeline the bottleneck using the following code, but it doesn't give me the correct results, even after I perform pipeline balancing:
Library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
entity TEA_en is
  port(
    clock: in std_logic; --clock input
    input_data: in std_logic_vector (63 downto 0); --input data
    key : in std_logic_vector (127 downto 0); --secret key 127 downto 0--
    encrypted_data: out std_logic_vector (63 downto 0) --output/encrypted data
  );
end entity TEA_en;
architecture behave of TEA_en is
  --declare signals
  signal Key0, Key1, Key2, Key3 : std_logic_vector (31 downto 0);
  signal Z, Y : std_logic_vector (31 downto 0);
  signal count :integer range 0 to 32:=0; --constrained counter (counts from 0 to 32) improves throughput by 1 ns!
  signal pipeline_0,pipeline_1,pipeline_2,pipeline_3,pipeline_4,pipeline_5,pipeline_6,pipeline_7:std_logic_vector(31 downto 0);
begin
  --separate key into four parts
  Key0<=key(127 downto 96);
  Key1<=key(95 downto 64);
  Key2<=key(63 downto 32);
  Key3<=key(31 downto 0);
  Process(Input_data, clock)
    --declare and initialize variable
    Variable delta: std_logic_vector (31 downto 0):=x"9e3779b9";
    Variable sum: std_logic_vector (31 downto 0):=x"00000000";
    Variable Zeq,Yeq,Z,Y: std_logic_vector (31 downto 0);
  Begin
    If(rising_edge(clock)) then
      If (count<1) then --separate input data into two parts
        Z:=input_data(63 downto 32); --part 1 (32bits)
        Y:=input_data(31 downto 0); --part 2 (32bits)
      Else --null;
      End if;
      If (count<32) then
        --Encryption routine algorithms
        sum:=sum+delta;
        for i in 1 to 8 loop
          case i is
            when 1=> pipeline_0 <=( (Z(27 downto 0) & "0000")+Key0);
            when 2=> pipeline_1 <=(Z+ sum) ;
            when 3=> pipeline_2 <=(("00000" & Z(31 downto 5))+Key1);
            when 4=> pipeline_3 <=( (Y(27 downto 0) & "0000")+Key2) ;
            when 5=> pipeline_4 <=(Y+ sum) ;
            when 6=> pipeline_5 <=(("00000" & Y(31 downto 5))+Key3) ;
            when 7=> pipeline_6 <=pipeline_0 xor pipeline_1 xor pipeline_2;
            when 8=> pipeline_7 <=pipeline_3 xor pipeline_4 xor pipeline_5;
            when others=> null;
          end case;
        end loop;
        Y:=Y+pipeline_6;
        Z:=Z+pipeline_7;
        --Output encrypted data
        Encrypted_data<=Y&Z;
      else null;
      end if;
      count<=count+1; --increase value of count
    End if;
  End process;
end architecture behave;
Is there anything I did wrong in the process? Many thanks again for the help! ^^
This is where a good testbench and time to sit down and debug your code will help.
Indeed, as Tricky said, you need time and testing to finalise.
But I note that your counter, though constrained in its declaration, is not constrained in the actual logic: you must define in the counter logic the value at which it returns to zero. You don't need the loop statement, as it does nothing; each assignment inside the loop is executed once anyway. Maybe you can do your testing this way: instantiate your first working VHDL module and then the new one. Give them the same inputs and check the outputs until they are the same, with only the delay being different.
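The point about constraining the counter in the logic, and not only in the declaration, can be sketched as below. This assumes the counter should return to 0 after reaching 32, which the thread does not actually specify; the entity round_counter and the signal count_i are hypothetical names.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity (not from the thread): a counter whose range is
-- enforced by explicit wrap-around logic, not only by its declaration.
entity round_counter is
  port (
    clock : in  std_logic;
    count : out integer range 0 to 32
  );
end entity round_counter;

architecture rtl of round_counter is
  signal count_i : integer range 0 to 32 := 0;
begin
  process (clock)
  begin
    if rising_edge(clock) then
      if count_i = 32 then
        count_i <= 0;            -- assumed wrap point: start over after 32
      else
        count_i <= count_i + 1;  -- never leaves the declared range
      end if;
    end if;
  end process;

  count <= count_i;
end architecture rtl;
```

With an unconditional count <= count + 1 on a signal declared as range 0 to 32, a simulator would typically stop with a range error once the value reaches 33, so the explicit wrap also keeps simulation and hardware behaviour consistent. If the design restarts on wrap, the running sum would presumably need to be reset at the same point.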