An image is decomposed into object proposals using proposal generators, such as EdgeBox [31] or SelectiveSearch [29]. The basic pipeline is to iteratively mine (i.e., localize) objects as training samples using the current detectors and then retrain the detectors with the updated training samples. The detector can be a proposal-level SVM classifier [24, 4, 28] or a modern CNN-based detector [26, 14, 30], such as RCNN [9] or Fast RCNN [8]. Deselaers et al. [5] were the first to propose using the objectness score as a generic object appearance prior for the particular target categories. Cinbis et al. [4] proposed a multi-fold multiple instance learning procedure, which prevents training from prematurely locking onto erroneous object locations. Uijlings et al. [28] proposed using pre-trained detectors as the proposal generator and showed the effectiveness of this choice for knowledge transfer from source to target categories.
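
The alternating mine-and-retrain pipeline described above can be illustrated with a minimal sketch. It is not the implementation of any of the cited methods: the detector here is a toy difference-of-class-means linear scorer standing in for an SVM or CNN, proposals are assumed to be already extracted as feature vectors, and all names (`train_detector`, `mine`, `mine_and_retrain`) as well as the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Bag = List[Vector]  # one image = a bag of proposal feature vectors


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean(vectors: List[Vector]) -> List[float]:
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def train_detector(positives: List[Vector], negatives: List[Vector]) -> List[float]:
    """Toy stand-in for SVM/CNN training: difference of class means."""
    mp, mn = mean(positives), mean(negatives)
    return [p - q for p, q in zip(mp, mn)]


def mine(detector: Vector, bag: Bag) -> int:
    """Localize: index of the highest-scoring proposal in one image."""
    return max(range(len(bag)), key=lambda i: dot(detector, bag[i]))


def mine_and_retrain(pos_bags: List[Bag], neg_bags: List[Bag],
                     rounds: int = 5) -> Tuple[List[float], List[int]]:
    negatives = [p for bag in neg_bags for p in bag]
    # Assumed initialization: every proposal in a positive image is a positive.
    detector = train_detector([p for bag in pos_bags for p in bag], negatives)
    selected: List[int] = []
    for _ in range(rounds):
        # Step 1: mine one proposal per positive image with the current detector.
        new_selected = [mine(detector, bag) for bag in pos_bags]
        if new_selected == selected:  # selections stopped changing
            break
        selected = new_selected
        # Step 2: retrain the detector on the updated training samples.
        detector = train_detector(
            [bag[i] for bag, i in zip(pos_bags, selected)], negatives)
    return detector, selected


# Toy 2-D features: object-like proposals lie near (1, 0), background near (0, 1).
pos_bags = [
    [[0.1, 0.9], [0.9, 0.1], [0.0, 1.0]],
    [[1.0, 0.2], [0.2, 0.8]],
]
neg_bags = [[[0.0, 1.0], [0.1, 0.8]]]

detector, selected = mine_and_retrain(pos_bags, neg_bags)
print(selected)  # indices of the mined proposal in each positive image
```

Because the same detector both selects and is trained on the selected samples, a poor early selection can reinforce itself; this is the failure mode that the multi-fold procedure of Cinbis et al. [4] is designed to mitigate, and this sketch does not address it.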