How to Rotate Pipelined Data for FFTs

Often, data is present in a pipeline in a linear order, but a calculation is desired to operate on multiple indexes into the data with a "stride" or increment that needs to step across several pipeline stages. This is the case in a pipeline implementing FFT using radix2 or radix4 butterfly operations, or implementing other types of data transforms.

Delay commutators are useful for this, but they are not always explained in detail. Here's an example that may provide the useful details.

If data in a pipeline is presented sequentially, 4 samples wide, it might look like so:

  0   1   2   3
  4   5   6   7
  8   9  10  11
 12  13  14  15
 16  17  18  19
 20  21  22  23
 24  25  26  27
 28  29  30  31

If we wish to operate on the data with indexes n, n+4, n+8, and n+12, we need to see the data as:
  0   4   8  12
  1   5   9  13
  2   6  10  14
  3   7  11  15
 16  20  24  28
 17  21  25  29
 18  22  26  30
 19  23  27  31
A delay commutator with N=1 would do this job. The structure of the commutator is one stage of delays, a mirror/rotate stage, and a second stage of delays.

In the first stage of delays, the first column is not delayed, the second column is delayed by N cycles, the third is delayed by 2*N cycles, and the fourth column is delayed by 3*N cycles.

In the second stage of delays, the same delays are present, in the opposite order - the first column is delayed by 3*N cycles and the last is not delayed.

The rotator changes it's muxing on each cycle to rearrange the data order.

Let's look at how the data would be arranged after the first delay stage:


  0
  4   1
  8   5   2
 12   9   6   3
 16  13  10   7
 20  17  14  11
 24  21  18  15
 28  25  22  19
     29  26  23
         30  27
             31
Rather than specify what the rotator does, let's rearrange the data in each cycle so that the second delay stage will give us what we want...

  0
  1   4
  2   5   8
  3   6   9  12
 16   7  10  13
 17  20  11  14
 18  21  24  15
 19  22  25  28
     23  26  29
         27  30
             31
We can reverse engineer that - if we have inputs A B C D, we want to rearrange them in a regular pattern that repeats every 4 cycles:

  A   D   C   B
  B   A   D   C
  C   B   A   D
  D   C   B   A
Of course the rotator pattern has to be aligned to when the first line of data enters the pipe stage.

If the increment is different - for example, to get 0/8/16/24 as the data indexes coming out of the pipe, then N would be 2, and the rotator pattern would be held constant for N cycles, like so:


  A   D   C   B
  A   D   C   B
  B   A   D   C
  B   A   D   C
  C   B   A   D
  C   B   A   D
  D   C   B   A
  D   C   B   A
Chuck Benz, 29-Jul-2010