gpt-oss-120B served across 4× RTX 4090 over the open internet — speculative decoding
Ask anything. The 120B model is split across four consumer GPUs in different US states; a small draft model proposes tokens, and the swarm verifies them in one round-trip.
draft (20B) · WA, entry + draft → stage 0 · Kansas → stage 1 · Kansas → stage 2 · Illinois → stage 3 · N.C.
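One way to picture the four-stage chain is as a contiguous split of the 120B model's layers, with each GPU holding one block. The sketch below assumes a near-equal contiguous split. The helper `partition_layers` and the 36-layer count are illustrative assumptions, not details taken from this page.

```python
def partition_layers(num_layers: int, num_stages: int) -> list[range]:
    """Split layer indices 0..num_layers-1 into contiguous, near-equal stages.

    Hypothetical helper: the real deployment's split may differ.
    """
    base, extra = divmod(num_layers, num_stages)
    stages = []
    start = 0
    for s in range(num_stages):
        # The first `extra` stages take one extra layer so every layer is assigned.
        size = base + (1 if s < extra else 0)
        stages.append(range(start, start + size))
        start += size
    return stages


if __name__ == "__main__":
    NUM_LAYERS = 36  # assumption: illustrative layer count; check the model config
    hosts = ["Kansas", "Kansas", "Illinois", "N.C."]  # stage order from the diagram
    for i, (r, host) in enumerate(zip(partition_layers(NUM_LAYERS, 4), hosts)):
        print(f"stage {i} ({host}): layers {r.start}..{r.stop - 1}")
```

Activations would then flow through stages 0 to 3 in order, so each verification pass pays for every network hop in the chain once.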
tokens · tok/s · accept/round · traversals
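The four counters relate to one another through simple ratios. The sketch below assumes that "accept/round" means tokens committed per verification round and that each round is exactly one traversal of the chain. `DecodeStats` and its methods are hypothetical names, not the page's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class DecodeStats:
    """Hypothetical bookkeeping for the tokens, tok/s, accept/round, traversals counters."""
    committed_per_round: list[int] = field(default_factory=list)
    elapsed_s: float = 0.0

    def record_round(self, committed: int, round_time_s: float) -> None:
        # Assumption: one verification round == one traversal of the 4-stage chain.
        self.committed_per_round.append(committed)
        self.elapsed_s += round_time_s

    @property
    def tokens(self) -> int:
        return sum(self.committed_per_round)

    @property
    def traversals(self) -> int:
        return len(self.committed_per_round)

    @property
    def accept_per_round(self) -> float:
        return self.tokens / self.traversals if self.traversals else 0.0

    @property
    def tok_per_s(self) -> float:
        return self.tokens / self.elapsed_s if self.elapsed_s else 0.0
```

For example, three rounds committing 4, 2, and 3 tokens at 0.5 s each give 9 tokens over 3 traversals: 3.0 accepted per round and 6.0 tok/s. Because the round time is dominated by the cross-country hops, raising accept/round is what raises tok/s.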
The draft proposes 4 tokens → the 120B verifies all 4 in one traversal of the chain → the longest matching prefix is committed. Decoding is greedy, so the output is exactly what the 120B would produce on its own.
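A minimal sketch of greedy speculative decoding with k = 4, using toy next-token functions in place of the two models. The single batched traversal is emulated by calling `target` once per prefix. One assumption goes beyond the page: besides the matching prefix, the sketch also commits the target's own token at the first mismatch, or a bonus token if all 4 match. This is the common formulation and guarantees progress on every traversal, but the page does not say whether this deployment does it. All names here are hypothetical.

```python
from typing import Callable, Sequence

# A greedy model reduced to "context -> argmax next token".
NextToken = Callable[[Sequence[int]], int]


def plain_greedy(target: NextToken, prompt: list[int], max_new: int) -> list[int]:
    """Reference: ordinary token-by-token greedy decoding with the target model."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out[len(prompt):]


def speculative_greedy(target: NextToken, draft: NextToken, prompt: list[int],
                       max_new: int, k: int = 4) -> tuple[list[int], int]:
    """Return (generated tokens, number of target traversals)."""
    out = list(prompt)
    traversals = 0
    while len(out) - len(prompt) < max_new:
        # 1) The draft proposes k tokens autoregressively (cheap, runs locally).
        proposal: list[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2) The target predicts the next token at all k+1 positions. In a real
        #    deployment this is one batched forward pass through the chain.
        verified = [target(out + proposal[:i]) for i in range(k + 1)]
        traversals += 1
        # 3) Find the longest prefix on which draft and target agree.
        n = 0
        while n < k and proposal[n] == verified[n]:
            n += 1
        # Commit that prefix plus the target's own token at position n.
        out.extend(proposal[:n] + [verified[n]])
    return out[len(prompt):len(prompt) + max_new], traversals


def toy_target(ctx: Sequence[int]) -> int:
    return (ctx[-1] * 3 + 1) % 10


def toy_draft(ctx: Sequence[int]) -> int:
    # Agrees with the target except after a 7, where it guesses wrong.
    return 0 if ctx[-1] == 7 else toy_target(ctx)
```

Every committed token is one the target itself would have chosen greedily, so `speculative_greedy(toy_target, toy_draft, p, n)[0]` equals `plain_greedy(toy_target, p, n)`. Only the number of traversals changes: it falls as the draft agrees more often.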