Lattice Part 5: Structured Lattices in Practice — Ring-LWE, Module-LWE, NTRU, Kyber

2024-06-21 约 2732 字预计阅读 13 分钟

Reading: Peikert’s survey spine¹, Regev’s lecture-note framing², and the primary RLWE / Module-LWE / Kyber / NTRU papers³⁴⁵⁶ are enough to keep the taxonomy straight.

Part 4 already fixed the baseline: LWE is noisy linear algebra, and the noise is what blocks elimination. Part 5 asks a different question. If the hard core is already there, what do we gain by forcing that core to live inside a quotient ring or a module?

The answer is mainly structural. Structure reduces representation cost, makes multiplication regular, and gives implementers a path to fast polynomial arithmetic. The price is conceptual: once you add algebraic structure, you also constrain the design space and create assumptions that are not identical to plain LWE.

Reader State Before Adding Rings

Part 4 left us with one equation shape:

$$ \mathbf{b}=A\mathbf{s}+\mathbf{e}\pmod q. $$

Part 0-3 left us with the geometric language for interpreting it: a public linear map defines a lattice or a coset, the useful witnesses are short, and exact solving becomes hard once one asks for short integer structure. Part 5 changes the representation of the public linear map. Instead of storing a completely arbitrary matrix, we force the map to come from polynomial multiplication.

The reader should keep one rule in mind:

$$ \text{polynomial multiplication modulo } x^n+1 \quad\text{is a structured matrix-vector multiplication on coefficients}. $$

Everything else in this article is a consequence of that rule.

Why Add Algebraic Structure

What Part 4 Already Stabilized

Plain LWE gives the right cryptographic pattern:

$$ \mathbf{b} = A\mathbf{s} + \mathbf{e} \pmod q. $$

The public data is linear, the secret is hidden, and the error is small enough to make decryption possible but large enough to prevent clean recovery. That is the baseline this article inherits. Nothing below changes the underlying noise mechanism.

Where Plain Matrix Arithmetic Starts To Hurt

The problem is not hardness. It is engineering.

Unstructured LWE uses general matrices and vectors, so key sizes and arithmetic scale with the full matrix description. That is flexible, but expensive. Once we want a real KEM, we care about:

fewer transmitted coefficients,
faster multiplication,
regular memory access,
compact keys and ciphertexts,
and implementations that do not rely on ad hoc linear algebra tricks.

The structured-lattice line answers those pressures by moving from arbitrary matrices to algebraic objects where multiplication has built-in regularity.

Design invariant: structure is not free. It buys speed by restricting the instance family, and that restriction is exactly what must be analyzed cryptographically.

From LWE To Ring-LWE

The Quotient Ring Setting

The first structural move is to replace vectors by polynomials modulo a fixed relation:

$$ R_q = \mathbb{Z}_q[x]/(f(x)). $$

This quotient-ring move from plain LWE to Ring-LWE is the first real structural compression step in the series.

For the structured-lattice constructions that matter here, the common choice is the negacyclic ring

$$ R_q = \mathbb{Z}_q[x]/(x^n + 1). $$

An element of $R_q$ is a polynomial representative of degree $< n$, with coefficient arithmetic done mod $q$ and polynomial arithmetic done mod $x^n + 1$. This quotient is the whole point: multiplication becomes a structured convolution rather than an arbitrary matrix action³.

The coefficient embedding is fixed throughout the article:

$$ a(x)=a_0+a_1x+\cdots+a_{n-1}x^{n-1}\quad\longleftrightarrow\quad \mathbf{a}=(a_0,\ldots,a_{n-1})\in\mathbb{Z}_q^n. $$

When a polynomial is called small, the claim is about this coefficient vector after choosing centered representatives. A small $s(x)$ or $e(x)$ means its coefficients are drawn from a narrow distribution, not that the polynomial is small as a formal expression.

A Four-Coefficient Example

The quotient relation $x^n+1=0$ means $x^n\equiv -1$. For $n=4$,

$$ x^4\equiv -1,\qquad x^5\equiv -x\pmod{x^4+1}. $$

So multiplication wraps around with a sign change. For example,

$$ \begin{aligned} (1+2x)(3+x^3) &=3+6x+x^3+2x^4 \\ &\equiv 1+6x+x^3\pmod{x^4+1}. \end{aligned} $$

This is the first place where the word negacyclic becomes concrete: high-degree terms re-enter the coefficient vector with a minus sign. The coefficient vector of the product above is $(1,6,0,1)$ before reducing coefficients modulo $q$.

What Changes In The Sample Format

Ring-LWE keeps the same noisy-equation picture, but moves it into $R_q$. The sample becomes

$$ b(x) = a(x)s(x) + e(x) \in R_q, $$

with $a(x)$ public, $s(x)$ secret, and $e(x)$ small.

The continuity from plain LWE is direct, but it should be stated carefully. Ring-LWE is not obtained by taking an arbitrary LWE matrix $A$ and merely re-labeling its entries as polynomial coefficients. What changes is the class of allowed linear maps. In plain LWE, $A$ is generic. In Ring-LWE, multiplication by a fixed public ring element $a(x)$ induces a very specific negacyclic linear operator on coefficient vectors³.

The reduction modulo $x^n+1$ is where the sign pattern appears. Since $x^n\equiv -1$, any term $x^{n+i}$ wraps back as $-x^i$. For $c(x)=a(x)b(x)\bmod (x^n+1)$, the coefficient of $x^i$ is

$$ c_i=\sum_{j=0}^{i}a_jb_{i-j}-\sum_{j=i+1}^{n-1}a_jb_{n+i-j}\pmod q. $$

That equation is the negacyclic convolution. The matrix below is just this coefficient rule written as a linear operator.

For $n=4$, the first coordinate already shows the wrap-around terms:

$$ c_0=a_0b_0-a_1b_3-a_2b_2-a_3b_1\pmod q. $$

Without the quotient, only $a_0b_0$ would contribute to the constant term. The extra negative terms in $c_0$ come from products whose exponent is exactly $4$, because $x^4$ returns as $-1$. The later coefficients receive the same kind of wrap-around from $x^5,x^6$, and so on.

The derivation chain is:

$$ \mathbf{b} = A\mathbf{s} + \mathbf{e} \pmod q $$

becomes, after restricting the linear action to multiplication in the quotient ring,

$$ a(x)s(x)+e(x)=b(x)\pmod{(x^n+1,q)}. $$

If we write

$$ a(x)=a_0+a_1x+\cdots+a_{n-1}x^{n-1}, $$

then multiplication by $a(x)$ is represented on coefficients by a negacyclic matrix of the form

$$ \operatorname{Rot}(a)= \begin{pmatrix} a_0 & -a_{n-1} & \cdots & -a_1 \\ a_1 & a_0 & \cdots & -a_2 \\ \vdots & \vdots & \ddots & \vdots \\ a_{n-1} & a_{n-2} & \cdots & a_0 \end{pmatrix}. $$

So the real lift is:

$$ \text{generic matrix action} \longrightarrow \text{structured negacyclic action}. $$

Equivalently, this section is translating vector or matrix LWE notation into polynomial-ring or module notation while keeping the error term small in the coefficient embedding.

The compression point is now visible. A generic $n\times n$ matrix needs $n^2$ coefficients. Multiplication by one polynomial $a(x)$ needs only $n$ coefficients, but it still defines an $n\times n$ linear map through $\operatorname{Rot}(a)$. This is the efficiency gain and the assumption restriction at the same time.

And if we bundle $k$ such ring elements into a module vector, we get the module form later in the article:

$$ \mathbf{b}(x) = \mathbf{A}(x)\mathbf{s}(x) + \mathbf{e}(x). $$

That is the algebraic lift: vector LWE $\to$ polynomial-ring LWE $\to$ module LWE.

Only The NTT We Need

The NTT matters because polynomial multiplication is the expensive step. In coefficient form, multiplying two degree-$<n$ polynomials is a convolution. The NTT diagonalizes that convolution into pointwise products in a transformed basis. This only works because the parameter choice gives suitable roots of unity in $\mathbb{Z}_q$: the transform is not black-box acceleration, but a payoff of choosing the ring and modulus carefully⁵.

The NTT role here is purely structural and arithmetic: it is the mechanism that connects structure and NTT-backed arithmetic to practical KEM efficiency.

For this article, the important point is not the FFT analogy. The important point is that the ring structure lets us precompute and reuse a fast multiplication regime. If the modulus admits the required roots of unity, one can evaluate a polynomial at a fixed set of points, multiply values coordinatewise, and interpolate back. Abstractly,

$$ \operatorname{NTT}(a\cdot b)=\operatorname{NTT}(a)\odot\operatorname{NTT}(b), $$

where $\odot$ denotes pointwise multiplication in the transform domain. Thus

$$ c(x) = a(x)b(x) $$

is implemented as transform, pointwise multiply, inverse transform.

That is why ring structure helps efficiency:

one compact object replaces a full matrix,
multiplication becomes regular and reusable,
and the implementation can exploit the same algebra repeatedly.

The cost is also conceptual: the security proof and the attack surface now depend on the exact ring, the modulus, the available roots of unity, and the distribution of coefficients. You are no longer in the generic LWE world. The instance family is narrower, the algebra is richer, and the reduction framework is no longer literally the same as plain LWE’s fully generic matrix setting¹³.

From Ring-LWE To Module-LWE

Why One Ring Element Is Not The Whole Design Space

Ring-LWE is elegant, but it can be too rigid. A single ring element gives maximal algebraic structure, but that also constrains the parameter choices and can narrow the family of instances too much for some design goals.

Module-LWE interpolates between plain LWE and Ring-LWE. It keeps polynomial arithmetic, but reintroduces a vector dimension⁴:

$$ \mathbf{b} = \mathbf{A}\mathbf{s} + \mathbf{e}, \qquad \mathbf{A}\in R_q^{k\times k},\ \mathbf{s},\mathbf{e},\mathbf{b}\in R_q^k. $$

Here each coordinate is itself a polynomial modulo $x^n+1$. The multiplication $\mathbf{A}\mathbf{s}$ therefore means $k^2$ ring multiplications and additions, not ordinary scalar multiplication over $\mathbb{Z}_q$. In coefficient form, it expands to a structured matrix over $\mathbb{Z}_q$ whose blocks are negacyclic convolution matrices.

If $\mathbf{A}=(a_{ij}(x))$, then the first coordinate is

$$ b_1(x)=a_{11}(x)s_1(x)+\cdots+a_{1k}(x)s_k(x)+e_1(x). $$

After coefficient embedding, this same equation becomes a block row:

$$ \mathbf{b}_1=\operatorname{Rot}(a_{11})\mathbf{s}_1+\cdots+\operatorname{Rot}(a_{1k})\mathbf{s}_k+\mathbf{e}_1. $$

Thus Module-LWE is not abandoning the matrix picture from Part 4. It replaces scalar entries by structured polynomial blocks.

Modules As The Interpolation Layer

This is the point that matters for practice. The module dimension $k$ is a design knob.

$k=1$ gives the ring-style extreme.
Larger $k$ moves toward matrix-LWE behavior.
Intermediate $k$ lets designers balance efficiency, hardness intuition, and implementation constraints.

So Module-LWE is not a historical footnote. It is the compromise point where structured arithmetic survives, but the construction is less brittle than a single ring element.

The conceptual cost is clear: adding structure also introduces another axis of choice. You must now justify not only $q$ and $n$, but the module rank $k$, the ring, and the noise regime together. The assumption is no longer “generic LWE over arbitrary linear maps”; it is “LWE over a chosen module over a chosen ring,” which is exactly why the design is faster and exactly why the assumption has a more specific algebraic surface.

Why This Is The Path To Modern KEMs

This interpolation is exactly why Module-LWE became the practical route.

Plain LWE is too bulky. Pure RLWE can be too rigid. Module-LWE keeps the polynomial machinery while giving enough slack for engineering tradeoffs. That slack matters for:

public-key size,
ciphertext size,
fast arithmetic,
and side-channel-aware implementations.

Peikert’s survey¹ is important here because it frames the structured-lattice family as a sequence of efficiency-driven compromises, not as unrelated inventions.

Where NTRU Fits And Where It Does Not

Why NTRU Lives In The Same Neighborhood

NTRU belongs in the same conversation because it also uses polynomial arithmetic modulo a relation such as $x^n - 1$ or $x^n + 1$, and it also relies on short secrets and fast convolution-style operations.

That shared algebraic neighborhood explains why people mention NTRU alongside RLWE and MLWE. The implementation instincts are similar: compact polynomials, fast multiplication, and quotient-ring style arithmetic.

Why NTRU Is Not Just Another LWE Family Member

But NTRU is not simply “RLWE with different parameters.”

The construction logic is different. NTRU is built around inversion and short polynomial relations, not around the same noisy linear sample model as LWE. In the basic public-key form, one chooses short polynomials $f,g$, lets $f_q$ denote the inverse of $f$ modulo $q$, and computes

$$ h \equiv g f_q \pmod q, $$

where the inverse condition is

$$ f f_q\equiv 1\pmod q. $$

Then encryption has the form

$$ e \equiv p r h + m \pmod q. $$

Decryption works because multiplying by the short secret gives

$$ f e \equiv p r g + f m \pmod q, $$

after which reduction mod $p$ recovers the message. That is a trapdoor-by-inversion construction, not the same sample-distribution model as RLWE or Module-LWE⁶.

The cancellation is different from LWE. In LWE-style encryption, decryption cancels a large linear term and leaves small noise. In this NTRU sketch, multiplication by $f$ cancels the public inverse hidden inside $h$ and moves the expression back to a small polynomial relation.

That distinction matters. We must distinguish NTRU from RLWE or MLWE instead of flattening them into one family. If you collapse NTRU into RLWE, you miss the fact that the algebraic structure is serving a different design purpose and the reduction model is different too.

So the right sentence is:

NTRU is structurally related to the LWE family, but it is not merely another LWE instance.

That is why this article keeps NTRU as a contrast case rather than as a synonym.

Kyber As The Practical Structured-Lattice KEM

Why Kyber Chooses Module-LWE

Kyber sits at the practical landing point of this line because Module-LWE gives it the right balance. It needs enough structure for compactness and speed, but not so much structure that the design becomes overconstrained.

That is also why the module rank is so important: it is the dial that lets Kyber trade off arithmetic cost against flexibility and security margins.

What The Construction Reuses From The Earlier Sections

Kyber reuses everything established above⁵:

the quotient ring $R_q = \mathbb{Z}_q[x]/(x^n+1)$,
polynomial-vector multiplication in module form,
small secret and error distributions,
and NTT-backed multiplication in the implementation path.

The NTT is not the security assumption. It is the arithmetic accelerator. It exists because the chosen ring admits fast convolution, and Kyber exploits that regularity aggressively.

This is also where the cost of structure becomes concrete. The implementation is fast, but only because the algebra is carefully chosen. Change the ring or the module parameters, and the efficiency profile changes with it.

A minimal module-LWE PKE skeleton already shows the landing point. Key generation has the familiar form

$$ \mathbf{t} = \mathbf{A}\mathbf{s} + \mathbf{e}, $$

while encryption samples a short ephemeral vector $\mathbf{r}$ and forms

$$ \mathbf{u} = \mathbf{A}^T\mathbf{r} + \mathbf{e}_1,\qquad v = \mathbf{t}^T\mathbf{r} + e_2 + m. $$

The decryption algebra is the same cancellation pattern as Part 4, now inside the module-ring setting:

$$ \begin{aligned} v-\mathbf{s}^T\mathbf{u} &= (\mathbf{A}\mathbf{s}+\mathbf{e})^T\mathbf{r}+e_2+m-\mathbf{s}^T(\mathbf{A}^T\mathbf{r}+\mathbf{e}_1) \\ &= \mathbf{e}^T\mathbf{r}+e_2-\mathbf{s}^T\mathbf{e}_1+m. \end{aligned} $$

All large structured terms cancel; only the message and accumulated small errors remain. That is exactly the earlier module equation, now turned into a practical encryption interface. The reason Kyber sits here rather than at plain LWE or pure RLWE is that the module rank $k$ gives a concrete trade-off knob: it changes key size, arithmetic cost, and implementation shape all at once.

For a reader coming directly from Part 4, this is the same correctness condition in a different representation. The message is encoded as a ring element, usually by placing coefficients near separated regions such as $0$ and roughly $q/2$. Decryption succeeds if the accumulated error

$$ \mathbf{e}^T\mathbf{r}+e_2-\mathbf{s}^T\mathbf{e}_1 $$

is still too small to cross the decoding boundary. The algebraic structure changes how multiplication is performed; the noise-budget logic is inherited from plain LWE.

Why This Is The Landing Point Of The Article

Kyber is the landing point because it shows the structured-lattice line after all the abstraction has been paid off in engineering terms.

This is not a standards digest. The important fact is not the exact FIPS 203 syntax, but the design lesson:

plain LWE gives hardness, Ring-LWE gives algebraic compression, Module-LWE gives a tunable middle ground, and Kyber is the KEM that makes that middle ground practical.

That is the continuity from Part 4 to Part 5.

Conclusion

Structured lattices do not replace LWE. They refine it.

The quotient ring gives you a controlled algebraic universe. RLWE shows how noisy linear algebra survives inside that universe. Module-LWE adds a dimension knob so the design is not trapped at either extreme. NTRU remains nearby, but as a different construction family, not a disguised LWE sample. Kyber then lands the line in a real KEM.

The main lesson is simple: algebraic structure is an efficiency tool, but it is also a cryptographic constraint. The practical art is choosing just enough structure to make the scheme fast without making the assumption too narrow.

目录