
Let’s embark on a detailed journey through the Fast Fourier Transform (FFT) algorithm, using the C++ code at the end of this post as our guide. We will break down the problem, the brilliant mathematical idea behind its solution, the mechanics of the algorithm itself, and finally, how each line of code implements this theory.

1. The Problem: The Slowness of Polynomial Multiplication

At its heart, the problem is to compute the product of two polynomials, $F(x)$ and $G(x)$.

Let’s represent them in their coefficient form:

$$F(x) = \sum_{i=0}^{n} f_i x^i, \qquad G(x) = \sum_{j=0}^{m} g_j x^j$$

The product polynomial, let’s call it $H(x) = F(x) \cdot G(x)$, will have a degree of $n + m$. Let its coefficients be $h_0, h_1, \dots, h_{n+m}$.

How do we find the coefficient $h_k$? It is the sum of all products $f_i g_j$ where the powers of $x$ add up to $k$, i.e., $i + j = k$. This gives us the formula for the coefficients of the product:

$$h_k = \sum_{i=0}^{k} f_i \, g_{k-i}$$

(assuming $f_i = 0$ for $i > n$ and $g_j = 0$ for $j > m$)

This operation is known as a convolution of the coefficient sequences $(f_i)$ and $(g_j)$.

A straightforward implementation of this formula would involve a nested loop:

```cpp
// Pseudocode for naive polynomial multiplication
for (int k = 0; k <= n + m; ++k) {
    h[k] = 0;
    for (int i = 0; i <= k; ++i) {
        if (i <= n && (k - i) <= m) {
            h[k] += f[i] * g[k - i];
        }
    }
}
```

The outer loop runs about $n + m$ times, and the inner loop runs up to $n + m$ times. This results in a time complexity of roughly $O((n+m)^2)$, or more simply, $O(n^2)$. With large inputs (say $n$ and $m$ around $10^5$ or $10^6$), an $O(n^2)$ approach would require on the order of $10^{10}$ to $10^{12}$ operations, which is far too slow for any modern computer. We need a much faster way.

2. The Core Idea: A Change in Perspective

The difficulty lies in the convolution operation in the coefficient domain. The brilliant insight of the FFT-based approach is to switch to a different representation of polynomials where multiplication is trivial.

This is the point-value representation.

A polynomial of degree $d$ is uniquely determined by its values at any $d + 1$ distinct points. For example, a line (degree 1) is uniquely defined by two points. A parabola (degree 2) is defined by three points.
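
As a tiny worked example (our illustration, not from the original post), the unique degree-1 polynomial through the two points $(0, 1)$ and $(1, 3)$ is recovered directly from those values:

$$F(x) = f_0 + f_1 x, \qquad F(0) = f_0 = 1, \qquad F(1) = f_0 + f_1 = 3 \;\Longrightarrow\; F(x) = 1 + 2x.$$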

So, instead of representing $F(x)$ by its coefficients $(f_0, f_1, \dots, f_n)$, we can represent it by a set of point-value pairs:

$$\{(x_0, F(x_0)), (x_1, F(x_1)), \dots, (x_{N-1}, F(x_{N-1}))\}$$

We choose $N \ge n + m + 1$ points because the resulting product polynomial has degree $n + m$. Let’s say we have the point-value representations for both $F$ and $G$ at the same set of points $x_0, x_1, \dots, x_{N-1}$:

  • $F(x_k)$ for $k = 0, 1, \dots, N-1$
  • $G(x_k)$ for $k = 0, 1, \dots, N-1$

Now, what is the point-value representation of the product $H(x) = F(x) \cdot G(x)$? It’s incredibly simple! For any point $x_k$, the value $H(x_k)$ is just $F(x_k) \cdot G(x_k)$.

  • $H(x_k) = F(x_k) \cdot G(x_k)$ for $k = 0, 1, \dots, N-1$

This point-wise multiplication takes only $O(N)$ time. This is a massive improvement!

This leads to a new three-step plan for polynomial multiplication:

  1. Evaluation (Coefficient to Point-Value): Given the coefficients of $F(x)$ and $G(x)$, pick a set of $N$ points and evaluate both polynomials at these points. This is also called the Discrete Fourier Transform (DFT).
  2. Point-wise Multiplication: Create the point-value representation of $H(x)$ by multiplying the values of $F$ and $G$ at each point. This takes $O(N)$ time.
  3. Interpolation (Point-Value to Coefficient): Convert the point-value representation of $H(x)$ back into its coefficient form. This is the Inverse Discrete Fourier Transform (IDFT).

The challenge now is to make steps 1 and 3 fast. If we pick arbitrary points and evaluate naively (e.g., using Horner’s method), each evaluation takes $O(N)$ time. Doing this for $N$ points would be $O(N^2)$, and we are back where we started. The “Fast” in Fast Fourier Transform comes from a clever choice of evaluation points that allows for a divide-and-conquer algorithm to perform steps 1 and 3 in $O(N \log N)$ time.
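
For reference, here is a minimal sketch of that naive evaluation using Horner’s rule; the helper name `hornerEval` is ours and is not part of the code discussed below:

```cpp
#include <vector>

// Evaluate f(x0) = f[0] + f[1]*x0 + ... + f[n]*x0^n in O(n) time via
// Horner's rule: f(x0) = f[0] + x0*(f[1] + x0*(f[2] + ...)).
double hornerEval(const std::vector<double>& f, double x0) {
    double result = 0.0;
    for (int i = (int)f.size() - 1; i >= 0; --i)
        result = result * x0 + f[i];
    return result;
}
```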

3. The Magic Ingredient: Complex Roots of Unity

The special points we choose are the complex roots of unity.

An $N$-th root of unity is a complex number $\omega$ such that $\omega^N = 1$. There are exactly $N$ distinct $N$-th roots of unity. Using Euler’s formula, $e^{i\theta} = \cos\theta + i\sin\theta$, these roots can be expressed as:

$$\omega_N^k = e^{2\pi i k / N} = \cos\frac{2\pi k}{N} + i\sin\frac{2\pi k}{N}, \qquad k = 0, 1, \dots, N-1.$$

Here, $i$ is the imaginary unit $\sqrt{-1}$. $\omega_N = \omega_N^1 = e^{2\pi i / N}$ is called the principal $N$-th root of unity. All other roots are just powers of this principal root.

These roots have beautiful, symmetric properties that are key to the FFT algorithm:

  1. Periodicity: $\omega_N^{k+N} = \omega_N^k$.
  2. Symmetry: $\omega_N^{k+N/2} = -\omega_N^k$.
  3. Reduction: $\omega_N^{2k} = \omega_{N/2}^k$. This property is the cornerstone of the divide-and-conquer approach, as it connects the roots of order $N$ to the roots of order $N/2$.
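
These identities are easy to check numerically. The following throwaway snippet (our addition, independent of the main code) verifies the symmetry and reduction properties for $N = 8$ using `std::polar`:

```cpp
#include <cmath>
#include <complex>
#include <cstdio>

int main() {
    const double PI = acos(-1.0);
    const int N = 8;
    for (int k = 0; k < N / 2; ++k) {
        std::complex<double> wk   = std::polar(1.0, 2 * PI * k / N);            // w_N^k
        std::complex<double> wk2  = std::polar(1.0, 2 * PI * (k + N / 2) / N);  // w_N^{k+N/2}
        std::complex<double> half = std::polar(1.0, 2 * PI * k / (N / 2));      // w_{N/2}^k
        // Symmetry: w_N^{k+N/2} + w_N^k ~ 0.   Reduction: (w_N^k)^2 - w_{N/2}^k ~ 0.
        printf("k=%d  symmetry error=%.1e  reduction error=%.1e\n",
               k, std::abs(wk2 + wk), std::abs(wk * wk - half));
    }
    return 0;
}
```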

The Discrete Fourier Transform (DFT) of a coefficient vector $(a_0, a_1, \dots, a_{N-1})$ is the vector of values obtained by evaluating the polynomial $A(x) = \sum_j a_j x^j$ at the $N$-th roots of unity:

$$\hat{a}_k = A(\omega_N^k) = \sum_{j=0}^{N-1} a_j \, \omega_N^{jk}, \qquad k = 0, 1, \dots, N-1$$

The Inverse DFT surprisingly has a very similar form. Given the values $\hat{a}_0, \hat{a}_1, \dots, \hat{a}_{N-1}$, we can recover the coefficients by:

$$a_j = \frac{1}{N} \sum_{k=0}^{N-1} \hat{a}_k \, \omega_N^{-jk}, \qquad j = 0, 1, \dots, N-1$$

Notice the only two differences: we use the negative powers of the roots ($\omega_N^{-1}$ in place of $\omega_N$), and we scale the final result by $\frac{1}{N}$. This means the same core algorithm can be used for both DFT and IDFT!
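
To make these two formulas concrete before introducing the fast algorithm, here is a direct $O(N^2)$ implementation of the definitions (a reference sketch of ours; the `op` convention of `1` for DFT and `-1` for IDFT mirrors the FFT code analyzed later):

```cpp
#include <cmath>
#include <complex>
#include <vector>
using cdb = std::complex<double>;

// Direct O(N^2) transform straight from the definitions:
//   op = 1   computes  out[k] = sum_j a[j] * w_N^{jk}
//   op = -1  computes  out[j] = (1/N) * sum_k a[k] * w_N^{-jk}
std::vector<cdb> slowDFT(const std::vector<cdb>& a, int op) {
    const double PI = acos(-1.0);
    const int N = (int)a.size();
    std::vector<cdb> out(N);
    for (int k = 0; k < N; ++k)
        for (int j = 0; j < N; ++j)
            out[k] += a[j] * std::polar(1.0, 2 * PI * j * k / N * op);
    if (op == -1)
        for (cdb& v : out) v /= (double)N;   // the 1/N scaling of the inverse transform
    return out;
}
```

Both loops run $N$ times, hence $O(N^2)$; the FFT reorganizes exactly this computation.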

4. The Algorithm: Fast Fourier Transform (FFT)

The FFT is a divide-and-conquer algorithm to compute the DFT in $O(N \log N)$ time. Let’s see how it works for a polynomial $A(x)$ of degree at most $N - 1$ (we pad with zeros if necessary, so that $N$ is a power of 2).

We split the coefficients of $A(x) = a_0 + a_1 x + \dots + a_{N-1} x^{N-1}$ into two halves: those with even indices and those with odd indices.

Let’s define two new polynomials of degree $N/2 - 1$:

$$A_{even}(x) = a_0 + a_2 x + a_4 x^2 + \dots + a_{N-2} x^{N/2 - 1}, \qquad A_{odd}(x) = a_1 + a_3 x + a_5 x^2 + \dots + a_{N-1} x^{N/2 - 1}$$

Now we can write $A(x)$ in terms of these smaller polynomials:

$$A(x) = A_{even}(x^2) + x \cdot A_{odd}(x^2)$$

The goal is to compute $A(\omega_N^k)$ for $k = 0, 1, \dots, N-1$. Let’s substitute $x = \omega_N^k$ into our new expression:

$$A(\omega_N^k) = A_{even}(\omega_N^{2k}) + \omega_N^k \cdot A_{odd}(\omega_N^{2k})$$

Using the reduction property $\omega_N^{2k} = \omega_{N/2}^k$, this simplifies to:

$$A(\omega_N^k) = A_{even}(\omega_{N/2}^k) + \omega_N^k \cdot A_{odd}(\omega_{N/2}^k)$$
This is a beautiful recurrence. We have expressed the DFT of a size-$N$ problem in terms of two DFTs of size $N/2$. But the real magic happens when we consider the second half of the output, $A(\omega_N^{k+N/2})$ for $k = 0, 1, \dots, N/2 - 1$:

$$A(\omega_N^{k+N/2}) = A_{even}\big((\omega_N^{k+N/2})^2\big) + \omega_N^{k+N/2} \cdot A_{odd}\big((\omega_N^{k+N/2})^2\big) = A_{even}(\omega_{N/2}^{k}) + \omega_N^{k+N/2} \cdot A_{odd}(\omega_{N/2}^{k})$$

since $(\omega_N^{k+N/2})^2 = \omega_N^{2k+N} = \omega_N^{2k} = \omega_{N/2}^k$ by periodicity and reduction. Using the symmetry property $\omega_N^{k+N/2} = -\omega_N^k$:

$$A(\omega_N^{k+N/2}) = A_{even}(\omega_{N/2}^{k}) - \omega_N^k \cdot A_{odd}(\omega_{N/2}^{k})$$

Let’s summarize for $0 \le k < N/2$. Define $E_k = \text{DFT}(A_{even})_k = A_{even}(\omega_{N/2}^k)$, $O_k = \text{DFT}(A_{odd})_k = A_{odd}(\omega_{N/2}^k)$, and $W_k = \omega_N^k$ (often called the “twiddle factor”).

Then:

$$A(\omega_N^k) = E_k + W_k \cdot O_k, \qquad A(\omega_N^{k+N/2}) = E_k - W_k \cdot O_k$$
This is the famous butterfly operation. We take two values from the smaller DFTs ($E_k$ and $O_k$) and combine them to produce two values for the larger DFT ($A(\omega_N^k)$ and $A(\omega_N^{k+N/2})$).

The recurrence relation for the runtime is $T(N) = 2\,T(N/2) + O(N)$, which solves to $T(N) = O(N \log N)$. This is our breakthrough!
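
Before moving on to the iterative version used by the code, here is a minimal recursive sketch of exactly this divide-and-conquer scheme (a textbook-style illustration of ours, assuming the length is a power of two; the final division by $N$ for the inverse is left to the caller):

```cpp
#include <cmath>
#include <complex>
#include <vector>
using cdb = std::complex<double>;

// Recursive radix-2 FFT; a.size() must be a power of two.
// op = 1 for DFT, op = -1 for IDFT (caller divides by N afterwards).
void fftRec(std::vector<cdb>& a, int op) {
    const int N = (int)a.size();
    if (N == 1) return;                          // size-1 DFT is the value itself
    std::vector<cdb> even(N / 2), odd(N / 2);
    for (int i = 0; i < N / 2; ++i) {
        even[i] = a[2 * i];                      // coefficients of A_even
        odd[i]  = a[2 * i + 1];                  // coefficients of A_odd
    }
    fftRec(even, op);
    fftRec(odd, op);
    const double PI = acos(-1.0);
    for (int k = 0; k < N / 2; ++k) {
        cdb w = std::polar(1.0, 2 * PI * k / N * op);  // twiddle factor W_k = w_N^k
        a[k]         = even[k] + w * odd[k];     // E_k + W_k * O_k
        a[k + N / 2] = even[k] - w * odd[k];     // E_k - W_k * O_k
    }
}
```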

5. From Recursion to Iteration: Deconstructing the Code

Recursive FFT is elegant but can be slow due to function call overhead. The provided code uses a more efficient iterative, bottom-up approach. Let’s see how it maps to the theory.

5.1. main() - The Grand Plan

```cpp
int main() {
    // ... Fast I/O ...

    cin >> n >> m;
    int need = n + m + 1;     // Degree of product is n+m, so n+m+1 coefficients
    int N = 1;
    while (N < need) N <<= 1; // Find smallest power of 2 >= need
    revInit(N);               // Prepare for the iterative FFT

    // Create complex vectors, padding with zeros up to size N
    vector<cdb> A(lim, cdb(0.0, 0.0)), B(lim, cdb(0.0, 0.0));
    for (int i = 0; i <= n; ++i)
        cin >> x, A[i] = cdb((double)x, 0.0); // Read F(x)
    for (int i = 0; i <= m; ++i)
        cin >> x, B[i] = cdb((double)x, 0.0); // Read G(x)

    FFT(A, 1); // Step 1: Evaluate F(x)
    FFT(B, 1); // Step 1: Evaluate G(x)

    for (int i = 0; i < lim; ++i)
        A[i] *= B[i]; // Step 2: Point-wise product

    FFT(A, -1); // Step 3: Interpolate H(x)

    for (int i = 0; i < need; ++i)
        cout << (LL)(A[i].real() + 0.5) << ' '; // Output coefficients
    return 0;
}
```

This is a direct implementation of our three-step plan.

  • Padding: The size of the transform, N (called lim in the code), must be a power of two large enough to hold the product polynomial’s coefficients.
  • revInit(N): This pre-computes a permutation that is crucial for the iterative algorithm. We’ll examine it next.
  • cdb: This is a complex<double>. The coefficients are loaded into the real part.
  • FFT(A, 1): The 1 signifies a forward FFT (DFT).
  • FFT(A, -1): The -1 signifies an inverse FFT (IDFT).
  • Rounding: A[i].real() + 0.5 is a standard trick to round a positive floating-point number to the nearest integer, mitigating small precision errors.
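
One small robustness note of our own: `(LL)(x + 0.5)` only rounds correctly for non-negative values. If the coefficients could ever be negative, `std::llround` (available through `<cmath>`, already pulled in by `bits/stdc++.h`) rounds to the nearest integer in both directions and is a drop-in replacement for the output line:

```cpp
cout << llround(A[i].real()) << ' '; // rounds to nearest integer for both signs
```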

5.2. revInit() - The Bit-Reversal Permutation

```cpp
inline void revInit(int n_lim) {
    lim = n_lim;
    L = 0;
    while ((1 << L) < lim) ++L;
    rev.resize(lim);
    for (int i = 0; i < lim; ++i)
        rev[i] = (rev[i >> 1] >> 1) | ((i & 1) << (L - 1));
    return;
}
```

If you trace the recursive divide-and-conquer calls, you’ll notice an interesting pattern. At the base case (when subproblems have size 1), the coefficients are not in their original order. For $N = 8$:

  • Level 0: {a0, a1, a2, a3, a4, a5, a6, a7}
  • Level 1: {a0, a2, a4, a6} and {a1, a3, a5, a7}
  • Level 2: {a0, a4}, {a2, a6}, {a1, a5}, {a3, a7}
  • Level 3 (base): {a0}, {a4}, {a2}, {a6}, {a1}, {a5}, {a3}, {a7}

The final order of indices is {0, 4, 2, 6, 1, 5, 3, 7}. Let’s look at their binary representations (for $L = 3$ bits):

  • 0 (000) -> 0 (000)
  • 1 (001) -> 4 (100)
  • 2 (010) -> 2 (010)
  • 3 (011) -> 6 (110)
  • 4 (100) -> 1 (001)
  • etc.

The final position of coefficient a_i is the index whose binary representation is the reverse of the binary representation of i. This is the bit-reversal permutation.

The iterative FFT starts by reordering the input array according to this permutation. This pre-shuffling allows it to build the solution bottom-up, combining adjacent pairs into blocks of 2, then blocks of 2 into blocks of 4, and so on, mimicking the recursive calls returning.

The line rev[i] = (rev[i >> 1] >> 1) | ((i & 1) << (L - 1)); is a clever dynamic programming way to compute this. The reversal of i is related to the reversal of i >> 1 (i.e., i without its last bit). We take the reversal of i >> 1, shift it right by one to free up the most significant position, and place the last bit of i (i & 1) into that most significant position.
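
A quick way to convince yourself of this recurrence (a throwaway test of ours, separate from the solution) is to print the table for $N = 8$ and compare it with the order {0, 4, 2, 6, 1, 5, 3, 7} derived above:

```cpp
#include <cstdio>

int main() {
    const int lim = 8, L = 3;
    int rev[8] = {0};
    for (int i = 0; i < lim; ++i)
        rev[i] = (rev[i >> 1] >> 1) | ((i & 1) << (L - 1));
    for (int i = 0; i < lim; ++i)
        printf("%d ", rev[i]);   // expected output: 0 4 2 6 1 5 3 7
    printf("\n");
    return 0;
}
```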

5.3. FFT() - The Iterative Engine

```cpp
inline void FFT(vector<cdb>& a, int op) {
    // 1. Apply the bit-reversal permutation
    for (int i = 0; i < lim; ++i)
        if (i < rev[i]) swap(a[i], a[rev[i]]);

    // 2. Iterative butterfly operations
    for (int len = 1; len < lim; len <<= 1) {       // len = size of sub-DFTs being merged
        double ang = PI / len * op;
        cdb wlen(cos(ang), sin(ang));               // Principal root: w_{2*len}^1
        for (int i = 0; i < lim; i += (len << 1)) { // Iterate over blocks of size 2*len
            cdb w(1.0, 0.0);                        // Current twiddle factor w_{2*len}^j, starts at w^0 = 1
            for (int j = 0; j < len; ++j) {         // Butterfly ops within a block
                cdb u = a[i + j];
                cdb v = a[i + j + len] * w;
                a[i + j] = u + v;
                a[i + j + len] = u - v;
                w *= wlen;                          // Update w to w_{2*len}^{j+1}
            }
        }
    }

    // 3. Scale for Inverse FFT
    if (op == -1)
        for (int i = 0; i < lim; ++i)
            a[i] /= lim;
    return;
}
```

Let’s analyze this masterpiece of an algorithm.

  1. Bit-Reversal: The first loop does the pre-shuffling we just discussed. if (i < rev[i]) ensures we swap each pair only once. The array a is now in the order the recursive base cases would be.

  2. Iterative Loops: The algorithm now builds the DFT bottom-up.

    • for (int len = 1; len < lim; len <<= 1): This outer loop controls the size of the subproblems we are combining. len is half the size of the blocks we are forming. It starts with len=1 (combining size-1 DFTs to make size-2 DFTs), then len=2 (combining size-2 to make size-4), and so on, up to len=lim/2.
    • for (int i = 0; i < lim; i += (len << 1)): This loop iterates through the starting indices of the blocks. The block size is 2*len. For len=1, it jumps i=0, 2, 4, .... For len=2, it jumps i=0, 4, 8, ....
    • for (int j = 0; j < len; ++j): This is the core butterfly loop. It iterates len times to perform the len butterfly operations required to merge two sub-blocks of size len into one large block of size 2*len.
      • u = a[i + j]: This is the result from the “even” part sub-DFT, corresponding to our $E_k$.
      • v = a[i + j + len] * w: This is the result from the “odd” part sub-DFT, multiplied by the twiddle factor, corresponding to our $W_k \cdot O_k$.
      • a[i + j] = u + v; and a[i + j + len] = u - v;: This is the butterfly operation itself, storing the results back in place.
  3. Twiddle Factors:

    • wlen: This is the principal root for the current stage, $\omega_{2 \cdot len}^1$. The angle is PI / len because the full circle is $2\pi$, and we are dividing it into $2 \cdot len$ parts. The op parameter neatly handles both DFT (op=1) and IDFT (op=-1, which conjugates the root by flipping the sign of the sine term).
    • w: This is the current twiddle factor, $\omega_{2 \cdot len}^j$. It starts at $1$ and is updated in each step by multiplying with wlen.
  4. Inverse FFT Scaling:

    • As derived from the IDFT formula, if we’re doing an inverse transform (op == -1), we must divide all resulting coefficients by lim (which is $N$). This code does it conveniently at the very end.

6. Standard Code

```cpp
#include <bits/stdc++.h>
#define LL long long
#define cdb complex<double>

using namespace std;

const double PI = acos(-1.0);

int n, m, x;
int lim, L;
vector<int> rev;

inline void revInit(int n_lim) {
    lim = n_lim;
    L = 0;
    while ((1 << L) < lim) ++L;
    rev.resize(lim);
    for (int i = 0; i < lim; ++i)
        rev[i] = (rev[i >> 1] >> 1) | ((i & 1) << (L - 1));
    return;
}

inline void FFT(vector<cdb>& a, int op) {
    for (int i = 0; i < lim; ++i)
        if (i < rev[i]) swap(a[i], a[rev[i]]);
    for (int len = 1; len < lim; len <<= 1) {
        double ang = PI / len * op;
        cdb wlen(cos(ang), sin(ang));
        for (int i = 0; i < lim; i += (len << 1)) {
            cdb w(1.0, 0.0);
            for (int j = 0; j < len; ++j) {
                cdb u = a[i + j], v = a[i + j + len] * w;
                a[i + j] = u + v;
                a[i + j + len] = u - v;
                w *= wlen;
            }
        }
    }
    if (op == -1)
        for (int i = 0; i < lim; ++i)
            a[i] /= lim;
    return;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);
    cout.tie(nullptr);

    cin >> n >> m;
    int need = n + m + 1;
    int N = 1;
    while (N < need) N <<= 1;
    revInit(N);

    vector<cdb> A(lim, cdb(0.0, 0.0)), B(lim, cdb(0.0, 0.0));
    for (int i = 0; i <= n; ++i)
        cin >> x, A[i] = cdb((double)x, 0.0);
    for (int i = 0; i <= m; ++i)
        cin >> x, B[i] = cdb((double)x, 0.0);
    FFT(A, 1);
    FFT(B, 1);
    for (int i = 0; i < lim; ++i)
        A[i] *= B[i];
    FFT(A, -1);

    for (int i = 0; i < need; ++i)
        cout << (LL)(A[i].real() + 0.5) << ' ';
    return 0;
}
```
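
As a sanity check of the transform pair (an ad hoc usage example of ours, assuming the revInit() and FFT() functions from the listing above are in scope, e.g. pasted inside main()), a forward FFT followed by an inverse FFT should reproduce the original coefficients up to floating-point noise:

```cpp
// Assumes revInit() and FFT() from the listing above are available.
vector<cdb> T = {cdb(3, 0), cdb(1, 0), cdb(4, 0), cdb(1, 0)}; // length 4 is already a power of two
revInit(4);
FFT(T, 1);   // forward: coefficients -> values at the 4th roots of unity
FFT(T, -1);  // inverse: values -> coefficients (includes the division by lim)
// T is now approximately {3, 1, 4, 1} again.
```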

7. Conclusion: The Symphony of Code and Math

The FFT algorithm for polynomial multiplication is a testament to the power of algorithmic thinking and mathematical insight. By viewing polynomials not as lists of coefficients but as functions to be evaluated, we unlock a path to immense speedup.

The process is a perfectly choreographed dance:

  1. Setup: We determine the necessary size N, a power of two, to avoid cyclic convolution issues and to fit the algorithm’s structure.
  2. Transformation (DFT): We use the iterative FFT, powered by the bit-reversal permutation and butterfly operations based on the properties of complex roots of unity, to transform our coefficient representations into point-value representations in $O(N \log N)$ time.
  3. The Simple Step: We perform the trivial point-wise multiplication, which corresponds to the costly convolution in the coefficient domain.
  4. Transformation Back (IDFT): We use the exact same FFT algorithm, with a minor tweak to the angle and a final scaling step, to interpolate the resulting point-values back into the final coefficients, again in $O(N \log N)$ time.