Ask anything. The 120B model is split across four consumer GPUs in different US states; a small draft proposes tokens and the swarm verifies them in one round-trip.
0
tokens
0
tok/s
0
accept/round
0
traversals
draft proposes 4 → the 120B verifies all 4 in one traversal of the chain → longest matching prefix is committed. greedy, so the output is the model's own.