
Reader’s Note: This article is under active development. The benchmark plots below are complete and accurate; I am currently finalizing the text that explains the technical reasons behind Blaze’s efficiency. Please check back soon for the full analysis.

Blaze: A High-Performance Solver for Photonic Crystals
January 8, 2026

Achieving order-of-magnitude speedups through a mixed-precision LOBPCG algorithm and cache-aware architecture.


Photonic band structure calculations rely on the Plane Wave Expansion (PWE) method, most widely implemented in the MIT Photonic Bands (MPB) software. Established as the field’s gold standard, MPB is trusted for its accuracy and has long been assumed to represent the ceiling of computational efficiency for these problems.

We introduce Blaze, a Rust-based solver that modernizes the PWE approach using mixed-precision arithmetic and an improved LOBPCG algorithm. By explicitly targeting memory bandwidth bottlenecks and leveraging batched Level 3 BLAS operations, Blaze offers superior single- and multi-core scaling, and achieves a 95% reduction in memory footprint while maintaining reference accuracy.

Performance benchmarks use the canonical square and hexagonal lattice configurations from Joannopoulos’ seminal 1997 Nature paper.[1] Unless otherwise specified, all data reflects the square lattice configuration at a resolution of 64, calculating 8 bands with 20 k-points per segment.

Single-Core Performance

The computational cost of PWE solvers is dominated by Fast Fourier Transforms (FFTs). Transverse Electric (TE) modes are historically more expensive to solve than Transverse Magnetic (TM) modes, as they require six FFT operations per iteration compared to just two for TM.

This complexity penalty is clearly visible in the legacy solver. Blaze, however, mitigates this through algorithmic optimizations. Even in Full Precision (f64), Blaze outperforms MPB. The decisive leap comes from the Mixed Precision (f32/f64) approach, which reduces memory traffic enough to effectively double the throughput, resulting in an order-of-magnitude total speedup.
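
A minimal sketch of the general mixed-precision pattern (our illustration, not Blaze’s actual kernels): bulk data is stored and streamed as f32, halving the bytes moved per operation, while reductions accumulate in f64 so accuracy is not sacrificed.

```rust
/// Dot product over f32 storage with f64 accumulation: the vectors move
/// through memory at half the width of f64, while the running sum keeps
/// full precision. A sketch of the mixed-precision idea, not Blaze's kernel.
fn dot_mixed(a: &[f32], b: &[f32]) -> f64 {
    a.iter()
        .zip(b)
        .map(|(&x, &y)| x as f64 * y as f64)
        .sum()
}

fn main() {
    let a = vec![1.0e-3_f32; 1 << 20];
    let b = vec![1.0e-3_f32; 1 << 20];
    println!("{}", dot_mixed(&a, &b)); // ~1.048576, accumulated in f64
}
```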

[Chart: single-core runtime, TM and TE, MPB vs. Blaze (f64) and Blaze (mixed precision)]

Multi-Core Performance

The architectural age of legacy solvers is most evident in parallel execution.[2] MPB attempts to parallelize individual operations within an iteration. On modern hardware, where small-to-medium lattice problems fit entirely within the CPU cache, this fine-grained threading introduces synchronization overhead that outweighs the computational gains, causing performance to regress as threads are added.

Blaze avoids this contention by parallelizing entire jobs rather than individual operations, a strategy optimized for large-scale parameter sweeps.
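
A minimal sketch of this strategy using rayon (the solve_k function and the k-point values are placeholders, not Blaze’s API): each k-point runs as an independent single-threaded job, so threads share no state within an iteration.

```rust
use rayon::prelude::*;

/// Placeholder for a full single-threaded band solve at one k-point.
fn solve_k(k: (f64, f64)) -> Vec<f64> {
    // ... run LOBPCG for this k-point and return its band frequencies ...
    vec![k.0.hypot(k.1)] // stand-in result
}

fn main() {
    // Illustrative k-points along one segment; the benchmarks use 20 per segment.
    let kpoints: Vec<(f64, f64)> = (0..20).map(|i| (0.5 * i as f64 / 19.0, 0.0)).collect();

    // One job per k-point: coarse-grained parallelism with no intra-iteration
    // synchronization, in contrast to MPB's fine-grained threading.
    let bands: Vec<Vec<f64>> = kpoints.par_iter().map(|&k| solve_k(k)).collect();
    assert_eq!(bands.len(), 20);
}
```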

The chart below reveals a clear three-tier performance hierarchy. The legacy solver struggles with thread overhead. Blaze in Full Precision scales efficiently but remains sensitive to the higher algebraic load of TE modes (approx. 240–350 ms). In contrast, the Mixed Precision mode hits a “hard floor” at roughly 160 ms: by halving the memory requirement for state vectors, Blaze masks the computational complexity of the TE mode, indicating that the solver has hit the physical memory bandwidth limit of the machine.

[Chart: multi-core scaling across thread counts]

All subsequent benchmarks for Blaze are performed in Mixed Precision mode, unless otherwise specified.

Memory Efficiency

High-performance computing is increasingly defined by data movement. A major limitation of MPB is its static memory management; benchmarks reveal that the legacy solver reserves a large, fixed memory block (approx. 190 MB) regardless of the problem size.

Blaze adopts a dynamic allocation strategy. As shown below, this results in a dramatic reduction in peak memory usage for standard resolutions. This reduction is critical: by keeping the working set small, Blaze allows the CPU to operate almost entirely within its high-speed L3 cache, avoiding the latency penalty of fetching data from main RAM.

Fundamentally, the storage requirements for FFTs and operator workspaces scale directly with the grid resolution (N). Therefore, analyzing memory growth against resolution provides the most critical insight into the architectural efficiency.
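
A back-of-the-envelope estimate of the working set (the buffer count here is our assumption, not Blaze’s actual layout) shows why standard resolutions fit in cache:

```rust
/// Back-of-the-envelope working-set estimate for an N-by-N grid:
/// `fields` complex-f32 buffers (FFT scratch, operator workspace, state).
/// The buffer count of 32 is an illustrative assumption.
fn working_set_bytes(n: usize, fields: usize) -> usize {
    let complex_f32 = 8; // two f32 components
    fields * n * n * complex_f32
}

fn main() {
    for n in [16, 32, 64, 128, 256] {
        let mib = working_set_bytes(n, 32) as f64 / (1024.0 * 1024.0);
        println!("N={:>3}: ~{:.1} MiB", n, mib); // N=64 -> ~1 MiB, well under a typical L3
    }
}
```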

[Chart: peak memory usage vs. resolution]

This efficiency extends to the dimensionality of the search space. In the LOBPCG algorithm, the search space size is determined by the number of bands (3n). While one might expect memory usage to scale with this complexity, both solvers maintain a constant footprint even as the number of bands increases.
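
The flat curves make sense once the subspace storage itself is tallied (a rough estimate under our assumption of complex-f32 storage): LOBPCG keeps 3n block vectors (the eigenvector block X, residuals W, and conjugate directions P), each with N² complex entries.

```rust
/// LOBPCG subspace storage: 3n block vectors (X, W, P) of N*N complex values.
/// The 3n structure follows from the algorithm; the byte count is our estimate.
fn subspace_bytes(n_grid: usize, n_bands: usize) -> usize {
    let complex_f32 = 8;
    3 * n_bands * n_grid * n_grid * complex_f32
}

fn main() {
    // At resolution 64 with 8 bands: 24 vectors of 4096 entries each.
    let mib = subspace_bytes(64, 8) as f64 / (1024.0 * 1024.0);
    println!("~{:.2} MiB", mib); // ~0.75 MiB
}
```

At benchmark sizes this is well under a megabyte, plausibly dwarfed by fixed FFT plans and workspaces, which is consistent with the flat footprint in the chart below.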

[Chart: peak memory usage vs. number of bands]

Memory Scaling Laws

To understand the limits of this efficiency, we analyzed how the relative advantage evolves. At low resolutions, MPB is dominated by its static overhead, giving Blaze a 20× advantage. As the resolution increases, the physical storage requirements for the grid naturally grow, and the ratio asymptotically approaches 1×. As noted above, both solvers maintain constant memory usage across the band sweep, resulting in a flat ratio.

[Chart: memory-usage ratio, MPB relative to Blaze]

For varying resolutions, MPB’s memory usage is effectively constant (N^{0.06}), confirming the pre-allocation hypothesis. In contrast, Blaze follows a near-linear trend (N^{1.09}), scaling predictably with the problem size. Notably, this footprint is identical for both TM and TE polarizations, showing that the storage cost in Blaze is determined strictly by grid topology, independent of the operator’s computational complexity.
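
These exponents are slopes in log-log space; a minimal sketch of recovering such an exponent from two measurements (illustrative numbers, not the benchmark data):

```rust
/// Fitted scaling exponent between two (resolution, memory) samples:
/// the slope of the line through the points in log-log space.
fn scaling_exponent(n1: f64, mem1: f64, n2: f64, mem2: f64) -> f64 {
    (mem2 / mem1).ln() / (n2 / n1).ln()
}

fn main() {
    // Illustrative values only: memory growing 16x from N=16 to N=256
    // corresponds to a linear exponent of 1.0.
    let e = scaling_exponent(16.0, 4.0, 256.0, 64.0);
    println!("exponent ~= {:.2}", e); // 1.00
}
```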

[Chart: fitted memory scaling laws]

Accuracy Validation

The significant reduction in precision and memory footprint raises a critical question: does this compromise physical accuracy? To verify this, we compared the eigenfrequencies calculated by Blaze against high-precision reference scans.

[Band-diagram comparisons: Blaze vs. high-precision reference]

Deviation Analysis

The data reveals a distinct behavior for each polarization. Transverse Magnetic (TM) modes show near-perfect agreement (10^{-4}; Blaze’s internal tolerance), with Blaze often converging to slightly lower eigenvalues for higher bands than the reference, suggesting a more robust minimization in the LOBPCG solver.
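
A minimal sketch of such a comparison (simplified, not Blaze’s actual validation harness): the largest per-band relative deviation from the reference scan, checked against the 10^{-4} tolerance.

```rust
/// Maximum relative deviation between solver and reference eigenfrequencies.
/// A simplified check with made-up values, not the benchmark data.
fn max_rel_dev(blaze: &[f64], reference: &[f64]) -> f64 {
    blaze
        .iter()
        .zip(reference)
        .map(|(&b, &r)| ((b - r) / r).abs())
        .fold(0.0, f64::max)
}

fn main() {
    let blaze = [0.2400, 0.42003, 0.53001];
    let reference = [0.2400, 0.4200, 0.5300];
    assert!(max_rel_dev(&blaze, &reference) < 1e-4);
}
```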

For Transverse Electric (TE) modes, a systematic upward shift is observable. This deviation is not an artifact of mixed precision, but stems from differences in the inverse epsilon tensor smoothing and operator definitions. While the absolute frequencies differ slightly due to this gauge freedom, the qualitative physics, in particular band crossings and topological features, remain intact.

Finally, the deviating trajectories observed in higher bands are an expected characteristic of the mixed-precision approach. The reduced precision may leave some fine-grained degeneracies unlifted; however, these resolve naturally when the search space is expanded to include additional bands.

[Interactive band diagrams at selectable resolutions 16, 32, 64, 128, 256; shown: 64×64]

References

[1] Joannopoulos, J., Villeneuve, P. & Fan, S. Photonic crystals: putting a new twist on light. Nature 386, 143–149 (1997). [DOI]

[2] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, 5th ed. Morgan Kaufmann, 2011, Fig. 2.2. [PDF] Figure 2.2 illustrates the dramatic divergence between processor speed and memory bandwidth trends, showing how memory access has become the dominant bottleneck in modern computing systems.
