Deep Neural Networks (DNNs) have emerged as powerful machine learning models that exhibit major differences from traditional approaches for image classification [13, 23]. DNNs with deep ar- chitectures have the capacity to learn complex models and allow for learning powerful object representations without the need to handle designed features. This has been empirically demonstrated on the ImageNet classification task across thousands of classes.Compared to image classification, object detection is a more challenging task that requires more sophisticated methods to solve. In this context, we focus not only on classifying images, but also on estimating the classes and locations of objects within the images precisely. Object detection is one of the hardest problems in com- puter vision and data engineering. Owing to the improvements in object representations and machine learning methods, many major advances in object detection have been achieved, like Faster R-CNN [20], which has achieved excellent object detection accuracy by using DNNs to classify object proposals.In real world applications, it may be required to classify a given image based on the object(s) contained within that image. For ex- ample, we want to classify an image into different categories based on the fish types within the image, as shown in Fig.1. There are some unique practical challenges for this kinds of image recogni- tion problems: Firstly, the objects are very small as compared to the background. Standard CNN based methods like ResNet [7] and Faster R-CNN [20] may learn the feature of the boats (background) but not the fishes (objects). Therefore, it will fail when presented with images containing new boats. Secondly, imbalanced data sets exist widely in real world and they have been providing great chal- lenges for classification tasks. As a result, the CNN models might be biased towards majority classes with large training samples, such that might have trouble classifying those classes with very few training samples. Thirdly, in real-world applications, getting data is expensive that involves time-consuming and labour-intensive process, e.g., ground truth have to be labeled and confirmed by multiple experts in the domain. How to achieve good performance with very limited training dataset remains a big challenge in both academic and industry.In this paper, we address the aforementioned challenges by pre- senting a computational framework based-on the latest develop- ments in deep learning. Firstly, we propose two-stage detection scheme to handle small object recognition. The framework com- bines the advantages of both object detection and image classifica- tion methods. We first use state-of-the-art object detection method