To achieve the transfer, we remove the output layer FC8 of the pre-trained network and add an adaptation layer formed by two fully connected layers FCa and FCb (see Figure 2, bottom) that use the output vector $Y_7$ of the layer FC7 as input. Note that $Y_7$ is obtained as a complex non-linear function of potentially all input pixels and may capture mid-level object parts as well as their high-level configurations [27, 53]. The FCa and FCb layers compute $Y_a = \sigma(W_a Y_7 + B_a)$ and $Y_b = \psi(W_b Y_a + B_b)$, where $W_a$, $B_a$, $W_b$, $B_b$ are the trainable parameters, $\sigma$ denotes the rectified linear non-linearity, and $\psi$ the softmax output function. In all our experiments, FC6 and FC7 have equal sizes (either 4096 or 6144, see Section 4), FCa has size 2048, and FCb has a size equal to the number of target categories. The parameters of layers C1 \dots\ C5, FC6, and FC7 are first trained on the source task, then transferred to the target task and kept fixed. Only the adaptation layer is trained on the target-task training data, as described next.
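As a concrete illustration, the sketch below expresses this adaptation scheme in PyTorch; this is an assumption for exposition, not the original implementation. The dimensions ($Y_7$ of size 4096, FCa of size 2048) follow the text, while `pretrained_trunk` (standing in for the frozen C1--FC7 stack) and the target-category count of 20 are hypothetical placeholders.

```python
import torch
import torch.nn as nn

class AdaptationLayers(nn.Module):
    """Adaptation layers FCa and FCb, appended in place of the removed FC8.

    Computes Y_a = sigma(W_a Y_7 + B_a) and Y_b = psi(W_b Y_a + B_b),
    with sigma = ReLU; the softmax psi is folded into the training loss.
    """
    def __init__(self, y7_dim: int = 4096, fca_dim: int = 2048,
                 n_target_classes: int = 20):
        super().__init__()
        self.fca = nn.Linear(y7_dim, fca_dim)             # W_a, B_a: trainable
        self.fcb = nn.Linear(fca_dim, n_target_classes)   # W_b, B_b: trainable
        self.sigma = nn.ReLU(inplace=True)

    def forward(self, y7: torch.Tensor) -> torch.Tensor:
        ya = self.sigma(self.fca(y7))   # Y_a = sigma(W_a Y_7 + B_a)
        return self.fcb(ya)             # logits Y_b; softmax applied by the loss

# `pretrained_trunk` is a hypothetical module producing Y_7 (layers C1..C5,
# FC6, FC7), already trained on the source task. Freezing it reproduces the
# "transferred and kept fixed" behaviour described above:
#   for p in pretrained_trunk.parameters():
#       p.requires_grad = False

adapt = AdaptationLayers(y7_dim=4096, n_target_classes=20)
optimizer = torch.optim.SGD(adapt.parameters(), lr=1e-2)  # adaptation params only
criterion = nn.CrossEntropyLoss()  # applies log-softmax (psi) internally
```

Passing only `adapt.parameters()` to the optimizer ensures that gradient updates touch the adaptation layer alone, matching the described training regime.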