Deep Compression [23] describes a method to compress DNNs without loss of accuracy through a combination of pruning and weight sharing. Pruning makes matrix W sparse, with density D ranging from 4% to 25% for our benchmark layers. Weight sharing replaces each weight W_ij with a four-bit index I_ij into a shared table S of 16 possible weight values.
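The lookup implied by weight sharing can be sketched as follows. This is an illustrative example, not the paper's implementation: the function name `decode_weights` and the toy index matrix and codebook values are assumptions for demonstration.

```python
# Sketch of weight sharing: each weight is stored as a 4-bit index I_ij
# into a shared table S of 16 values; the dense weight W_ij is recovered
# by a table lookup S[I_ij].

def decode_weights(indices, codebook):
    """Recover real-valued weights from 4-bit indices into a shared table."""
    assert len(codebook) == 16  # a 4-bit index addresses 16 shared values
    return [[codebook[i] for i in row] for row in indices]

# Toy example: a 2x3 index matrix and a hypothetical 16-entry codebook.
codebook = [v / 10.0 for v in range(-8, 8)]  # 16 shared weight values
indices = [[0, 15, 7], [3, 3, 12]]
weights = decode_weights(indices, codebook)
# weights is now [[-0.8, 0.7, -0.1], [-0.5, -0.5, 0.4]]
```

Storing a 4-bit index instead of a 32-bit float reduces per-weight storage by 8x before accounting for the sparsity from pruning.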