Each activation register file holds 64 16-bit activations. With 64 PEs, this is sufficient to accommodate activation vectors of length up to 4K (64 × 64 = 4096). Longer activation vectors can be accommodated with the 2KB activation SRAM. When the activation vector has a length greater than 4K, the M×V is completed in several batches, where each batch is of length 4K or less. All local reduction is done in the register file; the SRAM is read only at the beginning of a batch and written only at its end.
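
The capacity arithmetic and the batching rule can be illustrated with a minimal Python sketch. The constants are taken from the text; the function name `split_into_batches` and the choice of contiguous index ranges are assumptions made for illustration only, not a description of how the hardware actually partitions the vector.

```python
# Figures stated in the text.
NUM_PES = 64             # processing elements
REGS_PER_PE = 64         # activations held by one register file
ACT_BITS = 16            # width of one activation
SRAM_BYTES = 2 * 1024    # size of one activation SRAM (2KB)

# Activations that fit in the register files of all PEs together: 64 * 64.
BATCH_CAPACITY = NUM_PES * REGS_PER_PE

# 16-bit activations that fit in one 2KB activation SRAM.
SRAM_CAPACITY_ACTS = SRAM_BYTES * 8 // ACT_BITS


def split_into_batches(vector_length):
    """Return half-open (start, end) ranges, each at most BATCH_CAPACITY long.

    Hypothetical helper: contiguous ranges are an assumption of this sketch.
    """
    return [
        (start, min(start + BATCH_CAPACITY, vector_length))
        for start in range(0, vector_length, BATCH_CAPACITY)
    ]


if __name__ == "__main__":
    print(BATCH_CAPACITY)             # 4096
    print(SRAM_CAPACITY_ACTS)         # 1024
    print(split_into_batches(10000))  # [(0, 4096), (4096, 8192), (8192, 10000)]
```

A vector of length 4K or less yields a single batch, so the SRAM is touched only once at the start and once at the end; a length-10000 vector needs three batches under this sketch.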