PerftStorm is the GPU-native engine lab. It exists because, after enough exact-perft work, the remaining question stops being “how far can the legacy counter be pushed” and becomes “what should a GPU-first chess engine actually look like.”
That makes PerftStorm complementary to GPUPerft, not a replacement for it:
- GPUPerft is the exact-count and export line
- PerftStorm is the engine-architecture line

Alpha status: Work in progress. PerftStorm is experimental engine and move-generation work. Behavior, interfaces, and internal assumptions are still in motion. Run it at your own risk.
Current Release Map
The current downloadable Windows x64 lab builds are:
| Executable | Purpose | Downloads |
|---|---|---|
| `PerftStormLab.exe` | Board-representation and GPU move-discovery lab. | exe / sha256 |
| `PerftStormConveyorLab.exe` | Variable-depth continuation and paired-wave conveyor lab. | exe / sha256 |
What It Focuses On
The live scope is deliberately narrow:
- board representations that fit the hardware
- legal move generation at high throughput
- continuation waves for deeper frontiers
- profiler-driven engine architecture work
The repo is not trying to be a polished engine release yet. It is the place where the GPU-first assumptions are tested before they become architecture.
StormLab
StormLab is the move-generation and representation lab.
The current frontrunner is the nbs bit-sliced NybbleBoard-style layout, and the measured legal-path results are already strong enough to matter:
| Case | Result |
|---|---|
| Start position legal fast path | 10.138B legal boards/s |
| Kiwipete legal fast path | 7.213B legal boards/s |
| 218-move stress FEN legal fast path | 6.370B legal boards/s |
| `nsym-legalb-l01` on easy no-state frontiers | 43.79B fully legal boards/s |
| `nsym-legalb-l01` on mixed no-state frontiers | 26.41B fully legal boards/s |
| Current full-walk keeper on the 1M walk corpus | about 39.2B fully legal boards/s |
Those numbers are useful because they are tied to legal-correctness gates and comparative representation work, not just to one isolated kernel.
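To make the bit-sliced idea concrete, here is a minimal Python sketch of one way a nybble-per-square board could be sliced into four 64-bit planes. The piece codes, the plane order, the a1 = square 0 convention, and every function name are assumptions made for illustration. The actual nbs layout is not described here and may differ.

```python
# Hypothetical 4-bit piece codes (not the real nbs encoding).
# White pieces use 1..6, black pieces use 9..14, so bit 3 doubles as a color bit.
EMPTY = 0
PIECE_CODES = {"P": 1, "N": 2, "B": 3, "R": 4, "Q": 5, "K": 6,
               "p": 9, "n": 10, "b": 11, "r": 12, "q": 13, "k": 14}


def slice_board(squares):
    """Pack 64 four-bit codes into 4 planes; plane i holds bit i of every square."""
    assert len(squares) == 64
    planes = [0, 0, 0, 0]
    for sq, code in enumerate(squares):
        for bit in range(4):
            if (code >> bit) & 1:
                planes[bit] |= 1 << sq
    return planes


def unslice_board(planes):
    """Inverse of slice_board: rebuild the 64 four-bit codes."""
    return [sum(((planes[bit] >> sq) & 1) << bit for bit in range(4))
            for sq in range(64)]


def occupancy(planes):
    """A square is occupied when any of its four code bits is set."""
    return planes[0] | planes[1] | planes[2] | planes[3]


def black_pieces(planes):
    """With the assumed codes, plane 3 is exactly the set of black pieces."""
    return planes[3]


def start_position():
    """Standard start position as 64 codes, with a1 = square 0 (assumed)."""
    squares = [EMPTY] * 64
    for file_index, piece in enumerate("RNBQKBNR"):
        squares[file_index] = PIECE_CODES[piece]               # rank 1
        squares[8 + file_index] = PIECE_CODES["P"]             # rank 2
        squares[48 + file_index] = PIECE_CODES["p"]            # rank 7
        squares[56 + file_index] = PIECE_CODES[piece.lower()]  # rank 8
    return squares
```

The appeal of this kind of layout is that set-wide questions such as "which squares are occupied" or "which squares hold black pieces" become a handful of 64-bit bitwise operations across planes, which tends to map well onto wide hardware.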
StormConveyorLab
StormConveyorLab is the variable-depth continuation bridge. Its purpose is to move odd-depth frontiers forward in explicit paired +2 ply waves instead of treating shallow suffix depth as a permanent architectural boundary.
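The paired-wave idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the lab's implementation: `plan_waves`, `run_wave`, `conveyor_count`, and the ternary toy tree are hypothetical names, and the emit / count / advance split is one plausible reading of how a +2 ply wave could be staged.

```python
def plan_waves(frontier_depth, target_depth, step=2):
    """List the (from_depth, to_depth) waves needed to reach target_depth."""
    remaining = target_depth - frontier_depth
    if remaining < 0 or remaining % step != 0:
        raise ValueError("target must be reachable in whole +%d ply waves" % step)
    return [(d, d + step) for d in range(frontier_depth, target_depth, step)]


def run_wave(frontier, successors, materialize):
    """Advance a frontier by two plies.

    Returns (descriptor_count, counted_nodes, next_frontier).
    """
    # Emit: expand every frontier row by one ply into child descriptors.
    descriptors = [child for row in frontier for child in successors(row)]
    counted = 0
    next_frontier = []
    for desc in descriptors:
        grandchildren = successors(desc)
        # Count: nodes two plies below the incoming frontier.
        counted += len(grandchildren)
        if materialize:
            # Advance: keep the rows only if another wave will consume them.
            next_frontier.extend(grandchildren)
    return len(descriptors), counted, next_frontier


def conveyor_count(frontier, frontier_depth, target_depth, successors):
    """Count nodes at target_depth by chaining +2 ply waves from the frontier."""
    waves = plan_waves(frontier_depth, target_depth)
    counted = len(frontier)
    for index in range(len(waves)):
        is_last = index == len(waves) - 1
        _, counted, frontier = run_wave(frontier, successors,
                                        materialize=not is_last)
    return counted


def ternary_successors(path):
    """Toy successor function: every node has exactly three children."""
    return [path + (i,) for i in range(3)]
```

With these definitions, `plan_waves(3, 7)` yields two waves, d3 -> d5 and d5 -> d7. The final wave only counts and does not materialize a next frontier, since nothing consumes it.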
Current Validation Snapshot
The first vertical slice, dated 2026-05-18, already validates the continuation-wave shape:
| Scenario | Outcome |
|---|---|
| Corpus depth-5 validation | exact on start, kiwi, midgame, endgame, en-passant-like, and check cases |
| Repeated-wave check case | check d7 returned 91,624 |
| Repeated-wave start position | exact d7 count 3,195,901,860 |
| Build cost | all-status CUDA compile took about 52 minutes wall clock |
Measured admitted start-position wave timings:
| Wave | Input Rows | Descriptors | Counted Nodes | Emit (µs) | Count (µs) | Advance (µs) |
|---|---|---|---|---|---|---|
| d3 -> d5 | 8,902 | 197,281 | 4,865,609 | 963.264 | 544.768 | 1,495.46 |
| d5 -> d7 | 4,865,609 | 119,060,324 | 3,195,901,860 | 6,396.03 | 25,787.1 | 0 |
The practical significance is that the continuation-wave shape is exact and reusable. That is a prerequisite for a larger GPU-native engine path.
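One way to see why the start-position timings count as exact: each wave's input rows, descriptors, and counted nodes line up with the well-known start-position perft values at depths d, d+1, and d+2. The Python sketch below re-checks that relationship and derives a rough nodes-per-second figure. The dictionary layout and function names are hypothetical, and the throughput calculation assumes the three phase timings are in microseconds and run back to back, so treat the derived rate as illustrative arithmetic rather than a published result.

```python
# Widely published perft counts for the standard start position, depths 1..7.
START_PERFT = {
    1: 20,
    2: 400,
    3: 8_902,
    4: 197_281,
    5: 4_865_609,
    6: 119_060_324,
    7: 3_195_901_860,
}

# Wave rows transcribed from the start-position timing table (times in microseconds).
WAVES = [
    {"from": 3, "to": 5, "input_rows": 8_902, "descriptors": 197_281,
     "counted": 4_865_609, "emit_us": 963.264, "count_us": 544.768,
     "advance_us": 1_495.46},
    {"from": 5, "to": 7, "input_rows": 4_865_609, "descriptors": 119_060_324,
     "counted": 3_195_901_860, "emit_us": 6_396.03, "count_us": 25_787.1,
     "advance_us": 0.0},
]


def wave_matches_perft(wave):
    """True when the row equals perft(d), perft(d+1), perft(d+2) for d = wave['from']."""
    d = wave["from"]
    return (wave["input_rows"] == START_PERFT[d]
            and wave["descriptors"] == START_PERFT[d + 1]
            and wave["counted"] == START_PERFT[d + 2])


def nodes_per_second(wave):
    """Counted nodes divided by total phase time, assuming sequential phases."""
    total_us = wave["emit_us"] + wave["count_us"] + wave["advance_us"]
    return wave["counted"] / (total_us * 1e-6)


if __name__ == "__main__":
    for wave in WAVES:
        label = "d%d -> d%d" % (wave["from"], wave["to"])
        print(label, "exact:", wave_matches_perft(wave),
              "approx nodes/s: %.3e" % nodes_per_second(wave))
```

The same check also shows the chaining: the counted nodes of the d3 -> d5 wave are exactly the input rows of the d5 -> d7 wave.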
Command-Line Shape
StormLab
.\PerftStormLab.exe --variant all --boards 65536 --iters 8 --json D:\ps_mg\storm_lab.json
Conveyor Smoke
.\PerftStormConveyorLab.exe --scenario check --target-depth 5 --reps 1
Profiling
.\scripts\profile_movegen_lab.ps1 -Out D:\ps_mg -Variant all -Boards 65536 -Iters 8
Why It Matters
PerftStorm is where the project stops assuming that the right long-term engine architecture should resemble the legacy exact counter.
The exact-count work still supplies the validation discipline. What changes here is the design target: board formats, legal move generation, continuation flow, and scheduling choices that are selected because they suit the GPU, not because they preserve old recursive structure.