Input scale plays an important role in modern detection frameworks, and empirically there is an optimal training scale for images. However, this optimal scale usually cannot be reached for extremely large images under memory constraints. In this study, we explore the scale effect inside the object detection pipeline and find that feature upsampling that introduces high-resolution information benefits detection. Compared with direct input upscaling, feature upsampling trades a small performance loss for large memory savings. Based on these observations, we propose a self-supervised feature augmentation network, which takes downsampled images as inputs and aims to generate features comparable to those obtained when upscaled images are fed to the network. We present a guided feature upsampling module that learns upscaled feature representations from downsampled images, supervised by the real large features acquired from upscaled images. This self-supervised learning introduces detailed image information to the network. For efficient feature upsampling, we design a residualized sub-pixel convolution block based on a sub-pixel convolution layer, which incorporates considerable information in the upsampling process. Experiments on the Mapillary Vistas Dataset (MVD), Cityscapes, and COCO demonstrate the effectiveness of our method. On the MVD and Cityscapes detection benchmarks, in which the images are extremely large, our method surpasses current approaches. On COCO, the proposed method obtains results comparable to existing methods but with higher efficiency.
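The abstract names a residualized sub-pixel convolution block and a guidance signal from real large features, but gives no layer-level details. The following is only a minimal sketch of the general idea, not the authors' architecture. It assumes the standard sub-pixel (pixel-shuffle) rearrangement, a nearest-neighbour skip path as the "residual", and a mean-squared-error guidance loss; the function names `pixel_shuffle`, `nearest_upsample`, `residual_subpixel_upsample`, and `guidance_loss` are hypothetical, and the convolution that would produce the channel-expanded input is omitted.

```python
# Illustrative sketch only: plain nested lists stand in for feature tensors
# with layout [channels][height][width]. Nothing here is taken from the paper
# beyond the general notion of sub-pixel upsampling with a residual path.

def pixel_shuffle(x, r):
    """Rearrange [C*r*r][H][W] into [C][H*r][W*r] (sub-pixel layout)."""
    c_in = len(x)
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = c_in // (r * r)
    h, w = len(x[0]), len(x[0][0])
    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(h * r):
            for j in range(w * r):
                # Each output pixel reads one of the r*r channels in its group.
                src = c * r * r + (i % r) * r + (j % r)
                out[c][i][j] = x[src][i // r][j // r]
    return out


def nearest_upsample(x, r):
    """Nearest-neighbour upsampling of [C][H][W] by an integer factor r."""
    return [[[row[j // r] for j in range(len(row) * r)]
             for row in chan for _ in range(r)]
            for chan in x]


def residual_subpixel_upsample(low, expanded, r):
    """Assumed residual form: pixel_shuffle(expanded) + nearest_upsample(low).

    `low` is a [C][H][W] feature map; `expanded` is a [C*r*r][H][W] map that a
    (hypothetical, omitted) convolution would compute from `low`.
    """
    shuffled = pixel_shuffle(expanded, r)
    skip = nearest_upsample(low, r)
    return [[[s + k for s, k in zip(s_row, k_row)]
             for s_row, k_row in zip(s_chan, k_chan)]
            for s_chan, k_chan in zip(shuffled, skip)]


def guidance_loss(pred, target):
    """Mean squared error between upsampled features and reference features."""
    diffs = [(p - t) ** 2
             for p_chan, t_chan in zip(pred, target)
             for p_row, t_row in zip(p_chan, t_chan)
             for p, t in zip(p_row, t_row)]
    return sum(diffs) / len(diffs)
```

In this sketch, `target` in `guidance_loss` would play the role of the features extracted from the upscaled image, while `pred` comes from the downsampled image after upsampling; how the paper actually weights or combines such a loss is not stated in the abstract.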
@article{Pan2020SelfSupervisedFeature, author = {X. Pan and F. Tang and W. Dong and Y. Gu and Z. Song and Y. Meng and P. Xu and O. Deussen and C. Xu}, doi = {10.1109/TIP.2020.2993403}, journal = {IEEE Transactions on Image Processing}, pages = {6745--6758}, title = {Self-Supervised Feature Augmentation for Large Image Object Detection}, url = {http://graphics.uni-konstanz.de/publikationen/Pan2020SelfSupervisedFeature}, volume = {29}, year = {2020} }