PerftStorm – AI Chess Staging

PerftStorm is the GPU-native engine lab. It exists because, after enough exact-perft work, the remaining question stops being “how far can the legacy counter be pushed” and becomes “what should a GPU-first chess engine actually look like.”

That makes PerftStorm complementary to GPUPerft, not a replacement for it:

GPUPerft is the exact-count and export line
PerftStorm is the engine-architecture line

Alpha status: work in progress. PerftStorm hosts experimental engine and move-generation work; behavior, interfaces, and internal assumptions are still changing. Run it at your own risk.

Current Release Map

The current downloadable Windows x64 lab builds are:

Executable	Purpose	Downloads
`PerftStormLab.exe`	Board-representation and GPU move-discovery lab.	exe / sha256
`PerftStormConveyorLab.exe`	Variable-depth continuation and paired-wave conveyor lab.	exe / sha256
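
The Downloads column pairs each `exe` with a `sha256` link. A minimal Python sketch for checking a download against that digest follows. It assumes the checksum file begins with the hex digest (the common `sha256sum` layout); the function names are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so a large executable never has to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_digest(sha_file: Path) -> str:
    """ASSUMPTION: the published checksum file starts with the hex digest,
    as in the common sha256sum layout '<hex>  <filename>'.
    utf-8-sig tolerates a BOM; lower() tolerates upper-case hex."""
    return sha_file.read_text(encoding="utf-8-sig").split()[0].lower()


def verify(exe: Path, sha_file: Path) -> bool:
    """True when the local file matches the published SHA-256 digest."""
    return sha256_of(exe) == expected_digest(sha_file)
```

For example, `verify(Path("PerftStormLab.exe"), Path("PerftStormLab.exe.sha256"))` should return `True` for an intact download; the `.sha256` file name here is a placeholder. On Windows, PowerShell's built-in `Get-FileHash -Algorithm SHA256` reports the same digest.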

What It Focuses On

The current scope is deliberately narrow:

board representations that fit the hardware
legal move generation at high throughput
continuation waves for deeper frontiers
profiler-driven engine architecture work

The repo is not trying to be a polished engine release yet. It is the place where the GPU-first assumptions are tested before they become architecture.

StormLab

StormLab is the move-generation and representation lab.

The current frontrunner is the `nbs` bit-sliced NybbleBoard-style layout, and its measured legal-path throughput is already substantial:

Case	Result
Start position legal fast path	`10.138B` legal boards/s
Kiwipete legal fast path	`7.213B` legal boards/s
218-move stress FEN legal fast path	`6.370B` legal boards/s
`nsym-legalb-l01` on easy no-state frontiers	`43.79B` fully legal boards/s
`nsym-legalb-l01` on mixed no-state frontiers	`26.41B` fully legal boards/s
Current full-walk keeper on the 1M walk corpus	about `39.2B` fully legal boards/s

These numbers carry weight because they are tied to legal-correctness gates and comparative representation work, not to one isolated kernel.
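
To make "bit-sliced nybble board" concrete: one reading is 4 bits (a nybble) of piece code per square, stored as four 64-bit planes where plane k holds bit k of every square's code. The Python sketch below illustrates that generic idea only; it is not the `nbs` implementation, and the piece encoding (type in bits 0-2, color in bit 3) and all function names are hypothetical.

```python
MASK64 = (1 << 64) - 1

# HYPOTHETICAL 4-bit piece code: bits 0-2 = piece type, bit 3 = color (1 = black).
EMPTY, PAWN, KNIGHT, BISHOP, ROOK, QUEEN, KING = range(7)
BLACK = 8


def pack(squares):
    """Bit-slice 64 four-bit codes into four 64-bit planes.
    Plane k holds bit k of every square's code; square s maps to bit s (a1 = 0, h8 = 63)."""
    assert len(squares) == 64
    planes = [0, 0, 0, 0]
    for sq, code in enumerate(squares):
        for k in range(4):
            if (code >> k) & 1:
                planes[k] |= 1 << sq
    return planes


def code_at(planes, sq):
    """Reassemble one square's nybble from the four planes."""
    return sum(((planes[k] >> sq) & 1) << k for k in range(4))


def occupied(planes):
    """A square is occupied when any of its three type bits is set."""
    return planes[0] | planes[1] | planes[2]


def pieces_of_type(planes, piece_type):
    """Squares whose type bits equal piece_type, using only AND / NOT on whole planes."""
    mask = MASK64
    for k in range(3):
        wanted = (piece_type >> k) & 1
        mask &= planes[k] if wanted else (~planes[k] & MASK64)
    return mask


def color_set(planes, black):
    """All squares holding a piece of the requested color."""
    occ = occupied(planes)
    return (occ & planes[3]) if black else (occ & ~planes[3] & MASK64)


def start_position():
    back_rank = [ROOK, KNIGHT, BISHOP, QUEEN, KING, BISHOP, KNIGHT, ROOK]
    squares = [EMPTY] * 64
    squares[0:8] = back_rank
    squares[8:16] = [PAWN] * 8
    squares[48:56] = [PAWN | BLACK] * 8
    squares[56:64] = [piece | BLACK for piece in back_rank]
    return squares
```

For the start position, `pieces_of_type(pack(start_position()), PAWN)` equals `0x00FF00000000FF00`. Because each plane operation covers all 64 squares at once, a question such as "where are the pawns?" reduces to a few branch-free AND/NOT operations, which is plausibly part of why layouts in this family suit GPU lanes.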

StormConveyorLab

StormConveyorLab is the variable-depth continuation bridge. Its purpose is to move odd-depth frontiers forward in explicit paired +2-ply waves instead of treating shallow suffix depth as a permanent architectural boundary.
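
The paired-wave idea can be sketched on an abstract game tree. In this hypothetical Python model (`paired_wave`, `wave_perft`, and the toy move generator are illustrative names, not PerftStorm APIs), each wave takes a frontier at ply d, emits ply d+1 descriptors, counts ply d+2 leaves, and materializes ply d+2 only when another wave follows. This loosely mirrors the Input Rows, Descriptors, Counted Nodes, and Advance columns of the wave-timing table in the validation snapshot.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple

Position = Any                                   # opaque to the wave logic
MoveGen = Callable[[Position], Sequence[Position]]


@dataclass
class WaveStats:
    input_rows: int      # frontier positions at ply d
    descriptors: int     # children emitted at ply d+1
    counted_nodes: int   # leaves counted at ply d+2


def expand(frontier: Sequence[Position], children: MoveGen) -> List[Position]:
    """Materialize the next ply: one output row per child of every input row."""
    return [child for pos in frontier for child in children(pos)]


def paired_wave(frontier, children: MoveGen, advance: bool) -> Tuple[WaveStats, List[Position]]:
    """One +2-ply wave over a whole frontier.
    Ply d+1 is emitted as descriptors, ply d+2 is only counted,
    and ply d+2 positions are materialized only if another wave follows."""
    descriptors = expand(frontier, children)
    counted = sum(len(children(desc)) for desc in descriptors)
    stats = WaveStats(len(frontier), len(descriptors), counted)
    next_frontier = expand(descriptors, children) if advance else []
    return stats, next_frontier


def wave_perft(root, children: MoveGen, start_depth: int, target_depth: int):
    """Build a frontier at start_depth, then climb to target_depth in +2-ply waves."""
    if target_depth < start_depth + 2 or (target_depth - start_depth) % 2:
        raise ValueError("target_depth must equal start_depth + 2k with k >= 1")
    frontier = [root]
    for _ in range(start_depth):
        frontier = expand(frontier, children)
    log: List[WaveStats] = []
    for depth in range(start_depth, target_depth, 2):
        is_last = depth + 2 == target_depth
        stats, frontier = paired_wave(frontier, children, advance=not is_last)
        log.append(stats)
    return log[-1].counted_nodes, log


def perft(pos, children: MoveGen, depth: int) -> int:
    """Plain recursive reference count, used here as the correctness gate."""
    if depth == 0:
        return 1
    return sum(perft(child, children, depth - 1) for child in children(pos))


def toy_children(n: int) -> List[int]:
    """Hypothetical stand-in for a legal move generator: 1 to 4 children per node."""
    return [(n * 3 + i) % 97 for i in range(1 + n % 4)]


if __name__ == "__main__":
    count, log = wave_perft(5, toy_children, start_depth=3, target_depth=7)
    assert count == perft(5, toy_children, 7)
    for stats in log:
        print(stats)
```

The recursive `perft` acts as the exactness gate: the final wave's counted nodes must match it. Skipping materialization on the last wave is one plausible reading of the `0` Advance time reported for `d5 -> d7`, but that is an interpretation, not a documented detail.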

Current Validation Snapshot

The first vertical slice, dated 2026-05-18, already validates the continuation-wave shape:

Scenario	Outcome
Corpus depth-5 validation	exact on start, kiwi, midgame, endgame, en-passant-like, and check cases
Repeated-wave check case	`check d7` returned `91,624`
Repeated-wave start position	exact `d7` count `3,195,901,860`
Build cost	all-status CUDA compile took about `52 minutes` wall clock

Measured admitted start-position wave timings:

Wave	Input Rows	Descriptors	Counted Nodes	Emit (µs)	Count (µs)	Advance (µs)
`d3 -> d5`	`8,902`	`197,281`	`4,865,609`	`963.264`	`544.768`	`1,495.46`
`d5 -> d7`	`4,865,609`	`119,060,324`	`3,195,901,860`	`6,396.03`	`25,787.1`	`0`
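
The wave table lines up with the well-known start-position perft values (depths 3 through 7: 8,902; 197,281; 4,865,609; 119,060,324; 3,195,901,860). Each wave's input rows, descriptors, and counted nodes equal perft at d, d+1, and d+2, and one wave's counted nodes become the next wave's input rows. A small Python check of those invariants, with the table rows transcribed by hand, might look like this (`check_waves` is our name, not a project tool):

```python
# Well-known perft counts for the standard start position, keyed by depth.
START_PERFT = {3: 8_902, 4: 197_281, 5: 4_865_609, 6: 119_060_324, 7: 3_195_901_860}

# Rows transcribed from the wave table: (from_depth, to_depth, input_rows, descriptors, counted_nodes).
WAVES = [
    (3, 5, 8_902, 197_281, 4_865_609),
    (5, 7, 4_865_609, 119_060_324, 3_195_901_860),
]


def check_waves(waves, reference):
    """Return a list of human-readable problems; an empty list means every invariant holds."""
    problems = []
    for src, dst, rows, desc, counted in waves:
        label = f"d{src} -> d{dst}"
        if dst != src + 2:
            problems.append(f"{label}: not a paired +2 wave")
        if rows != reference.get(src):
            problems.append(f"{label}: input rows {rows} != perft({src})")
        if desc != reference.get(src + 1):
            problems.append(f"{label}: descriptors {desc} != perft({src + 1})")
        if counted != reference.get(dst):
            problems.append(f"{label}: counted nodes {counted} != perft({dst})")
    # Chaining: one wave's counted nodes become the next wave's input rows.
    for prev, nxt in zip(waves, waves[1:]):
        if prev[4] != nxt[2]:
            problems.append(f"d{prev[1]} output does not feed d{nxt[0]} input")
    return problems


if __name__ == "__main__":
    print(check_waves(WAVES, START_PERFT) or "all wave invariants hold")
```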

In practice, this shows that the continuation-wave shape produces exact counts and can be applied wave after wave. That is a prerequisite for a larger GPU-native engine path.

Command-Line Shape

StormLab

.\PerftStormLab.exe --variant all --boards 65536 --iters 8 --json D:\ps_mg\storm_lab.json

Conveyor Smoke

.\PerftStormConveyorLab.exe --scenario check --target-depth 5 --reps 1

Profiling

.\scripts\profile_movegen_lab.ps1 -Out D:\ps_mg -Variant all -Boards 65536 -Iters 8

Why It Matters

PerftStorm is where the project stops assuming that the right long-term engine architecture should resemble the legacy exact counter.

The exact-count work still supplies the validation discipline. What changes here is the design target: board formats, legal move generation, continuation flow, and scheduling choices that are selected because they suit the GPU, not because they preserve the old recursive structure.