AI Chess Lab: AI chess software, research, lab notes, and durable public pages.

PerftStorm

PerftStorm is the GPU-native engine lab. It exists because, after enough exact-perft work, the remaining question stops being “how far can the legacy counter be pushed” and becomes “what should a GPU-first chess engine actually look like.”

That makes PerftStorm complementary to GPUPerft, not a replacement for it:

  • GPUPerft is the exact-count and export line
  • PerftStorm is the engine-architecture line

Alpha status: Work in progress. PerftStorm is experimental engine and move-generation work. Behavior, interfaces, and internal assumptions are still in motion. Run it at your own risk.

Current Release Map

The current downloadable Windows x64 lab builds are:

Executable                | Purpose                                                   | Downloads
PerftStormLab.exe         | Board-representation and GPU move-discovery lab.          | exe / sha256
PerftStormConveyorLab.exe | Variable-depth continuation and paired-wave conveyor lab. | exe / sha256
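
Each executable is listed next to a sha256 download, so a build can be checked before it is run. The following is a minimal Python sketch, assuming the digest file begins with the hex digest as its first whitespace-separated token (the layout written by tools such as sha256sum). The digest file format and the function names are assumptions, not documented behavior of this project.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(binary_path, digest_path):
    """Compare a file's digest with the first token of a digest file.

    Assumption: the digest file starts with the hex digest, optionally
    followed by a file name.
    """
    published = Path(digest_path).read_text(encoding="utf-8").split()[0]
    return sha256_of(binary_path) == published.strip().lower()
```

A call such as `matches_published_digest("PerftStormLab.exe", "PerftStormLab.exe.sha256")` would return `True` only when the two digests agree; the digest file name in that call is hypothetical.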

What It Focuses On

The live scope is deliberately narrow:

  • board representations that fit the hardware
  • legal move generation at high throughput
  • continuation waves for deeper frontiers
  • profiler-driven engine architecture work

The repo is not trying to be a polished engine release yet. It is the place where the GPU-first assumptions are tested before they become architecture.

StormLab

StormLab is the move-generation and representation lab.

The current frontrunner is the nbs bit-sliced NybbleBoard-style layout. Its measured legal-path throughput already runs to billions of legal boards per second:

Case                                             | Result
Start position legal fast path                   | 10.138B legal boards/s
Kiwipete legal fast path                         | 7.213B legal boards/s
218-move stress FEN legal fast path              | 6.370B legal boards/s
nsym-legalb-l01 on easy no-state frontiers       | 43.79B fully legal boards/s
nsym-legalb-l01 on mixed no-state frontiers      | 26.41B fully legal boards/s
Current full-walk keeper on the 1M walk corpus   | about 39.2B fully legal boards/s

Those numbers are useful because they are tied to legal-correctness gates and comparative representation work, not just to one isolated kernel.
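
For readers unfamiliar with bit-sliced layouts, the Python sketch below shows one way a nybble board can be bit-sliced: every square holds a 4-bit piece code, and plane k is a 64-bit word holding bit k of each square's code. The piece encoding, the square ordering (index 0 is a1), and all function names are illustrative assumptions; the actual nbs layout in PerftStormLab may differ.

```python
# Hypothetical 4-bit piece codes: 0 = empty, 1..6 = white P N B R Q K,
# 9..14 = black p n b r q k (bit 3 doubles as the colour bit).
PIECE_CODES = {".": 0, "P": 1, "N": 2, "B": 3, "R": 4, "Q": 5, "K": 6,
               "p": 9, "n": 10, "b": 11, "r": 12, "q": 13, "k": 14}
CODE_TO_PIECE = {code: piece for piece, code in PIECE_CODES.items()}
MASK64 = (1 << 64) - 1


def pack_bit_sliced(squares):
    """Pack 64 piece characters into four 64-bit planes."""
    if len(squares) != 64:
        raise ValueError("expected 64 squares")
    planes = [0, 0, 0, 0]
    for square, piece in enumerate(squares):
        code = PIECE_CODES[piece]
        for bit in range(4):
            if (code >> bit) & 1:
                planes[bit] |= 1 << square
    return planes


def unpack_bit_sliced(planes):
    """Recover the 64 piece characters from the four planes."""
    squares = []
    for square in range(64):
        code = sum(((planes[bit] >> square) & 1) << bit for bit in range(4))
        squares.append(CODE_TO_PIECE[code])
    return "".join(squares)


def occupied(planes):
    """Squares holding any piece: the OR of all four planes."""
    return (planes[0] | planes[1] | planes[2] | planes[3]) & MASK64


def black_pieces(planes):
    """With this encoding, plane 3 alone marks the black pieces."""
    return planes[3] & MASK64


# Start position, rank 1 first (a1..h1), then ranks 2 through 8.
START = "RNBQKBNR" + "PPPPPPPP" + "." * 32 + "pppppppp" + "rnbqkbnr"
```

The appeal of this kind of layout is that whole-board questions such as "which squares are occupied" reduce to a few bitwise operations on full words, which tends to suit wide parallel hardware. Whether the nbs variant exploits this in the same way is not stated here.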

StormConveyorLab

StormConveyorLab is the variable-depth continuation bridge. Its purpose is to move odd-depth frontiers forward in explicit paired +2 ply waves instead of treating shallow suffix depth as a permanent architectural boundary.
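
The paired-wave idea can be illustrated without any chess logic. The sketch below is a Python toy, not the StormConveyorLab implementation: `children` is a made-up deterministic successor function standing in for legal move generation, and `plan_waves`, `run_wave`, and `conveyor_count` are hypothetical names. It shows that advancing a stored frontier two plies per wave gives the same leaf count as plain recursion.

```python
def children(node):
    """Toy successor function standing in for legal move generation."""
    fanout = 1 + node % 3            # 1 to 3 successors per node
    return [node * 3 + k + 1 for k in range(fanout)]


def recursive_count(node, depth):
    """Reference count: leaves exactly `depth` plies below `node`."""
    if depth == 0:
        return 1
    return sum(recursive_count(child, depth - 1) for child in children(node))


def expand(frontier, plies):
    """Materialise every node `plies` below the given frontier."""
    for _ in range(plies):
        frontier = [child for node in frontier for child in children(node)]
    return frontier


def plan_waves(frontier_depth, target_depth):
    """Split the remaining depth into +2 ply waves, e.g. (3, 5), (5, 7)."""
    remaining = target_depth - frontier_depth
    if remaining < 0 or remaining % 2 != 0:
        raise ValueError("remaining depth must be a non-negative even number")
    return [(d, d + 2) for d in range(frontier_depth, target_depth, 2)]


def run_wave(frontier, keep_output):
    """One wave: emit +1 ply descriptors, then count (or keep) +2 ply nodes."""
    descriptors = [child for node in frontier for child in children(node)]
    if keep_output:
        next_frontier = [g for d in descriptors for g in children(d)]
        return next_frontier, len(next_frontier)
    # Final wave: count the +2 ply nodes without storing them.
    return [], sum(len(children(d)) for d in descriptors)


def conveyor_count(root, frontier_depth, target_depth):
    """Count leaves at target_depth by chaining +2 ply waves."""
    frontier = expand([root], frontier_depth)
    waves = plan_waves(frontier_depth, target_depth)
    counted = len(frontier)
    for index, _wave in enumerate(waves):
        is_last = index == len(waves) - 1
        frontier, counted = run_wave(frontier, keep_output=not is_last)
    return counted
```

In this toy, `conveyor_count(1, 3, 7)` runs the waves (3, 5) and (5, 7) and agrees with `recursive_count(1, 7)`. Only the final wave skips materialising a new frontier, which is a design choice of the sketch rather than a documented property of the lab.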

Current Validation Snapshot

The first vertical slice, dated 2026-05-18, already validates the continuation-wave shape:

Scenario                     | Outcome
Corpus depth-5 validation    | exact on start, kiwi, midgame, endgame, en-passant-like, and check cases
Repeated-wave check case     | check d7 returned 91,624
Repeated-wave start position | exact d7 count 3,195,901,860
Build cost                   | all-status CUDA compile took about 52 minutes wall clock

Measured admitted start-position wave timings:

Wave     | Input Rows | Descriptors | Counted Nodes | Emit (µs) | Count (µs) | Advance (µs)
d3 -> d5 | 8,902      | 197,281     | 4,865,609     | 963.264   | 544.768    | 1,495.46
d5 -> d7 | 4,865,609  | 119,060,324 | 3,195,901,860 | 6,396.03  | 25,787.1   | 0

The practical significance is that the continuation-wave shape is exact and reusable. That is a prerequisite for a larger GPU-native engine path.
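
The row counts in the timing table can be cross-checked against the well-known start-position perft values. The Python sketch below records the two table rows and checks that each wave's input rows, descriptors, and counted nodes match perft at depths d, d+1, and d+2, and that the second wave consumes exactly what the first produced. Reading the three columns as frontier size, +1 ply count, and +2 ply count is an interpretation of the table, not a documented definition, and `check_waves` is a hypothetical helper.

```python
# Standard start-position perft values (depth -> node count).
PERFT_START = {1: 20, 2: 400, 3: 8_902, 4: 197_281, 5: 4_865_609,
               6: 119_060_324, 7: 3_195_901_860}

# Rows copied from the timing table:
# (from_depth, to_depth, input_rows, descriptors, counted_nodes).
WAVES = [
    (3, 5, 8_902, 197_281, 4_865_609),
    (5, 7, 4_865_609, 119_060_324, 3_195_901_860),
]


def check_waves(waves, perft):
    """Return a list of human-readable mismatches (empty when consistent)."""
    problems = []
    previous_counted = None
    for start, end, rows, descriptors, counted in waves:
        if end - start != 2:
            problems.append(f"d{start} -> d{end} is not a +2 ply wave")
        expected = (perft[start], perft[start + 1], perft[end])
        if (rows, descriptors, counted) != expected:
            problems.append(f"d{start} -> d{end} does not match perft")
        if previous_counted is not None and rows != previous_counted:
            problems.append(f"d{start} input differs from prior output")
        previous_counted = counted
    return problems
```

With the values above, `check_waves(WAVES, PERFT_START)` returns an empty list: both waves line up with perft depths 3 through 7, and the d5 output of the first wave is the input of the second.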

Command-Line Shape

StormLab

.\PerftStormLab.exe --variant all --boards 65536 --iters 8 --json D:\ps_mg\storm_lab.json

Conveyor Smoke

.\PerftStormConveyorLab.exe --scenario check --target-depth 5 --reps 1

Profiling

.\scripts\profile_movegen_lab.ps1 -Out D:\ps_mg -Variant all -Boards 65536 -Iters 8

Why It Matters

PerftStorm is where the project stops assuming that the right long-term engine architecture should resemble the legacy exact counter.

The exact-count work still supplies the validation discipline. What changes here is the design target: board formats, legal move generation, continuation flow, and scheduling choices that are selected because they suit the GPU, not because they preserve old recursive structure.