We include visualizations of nearest neighbors (NN) to qualitatively demonstrate supervision collapse.
For each row in the HTML files, we show the query image (left) along with its top 9 nearest neighbors in feature space, using Euclidean distance.
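As a minimal sketch of the retrieval step (toy 2-D vectors rather than the actual network features), top-k nearest-neighbor lookup under Euclidean distance looks like:

```python
import numpy as np

def top_k_neighbors(query, gallery, k=9):
    """Return indices of the k nearest gallery rows to `query` (Euclidean)."""
    dists = np.linalg.norm(gallery - query, axis=1)
    return np.argsort(dists)[:k]

# Toy example: 5 gallery vectors in 2-D, query at the origin.
gallery = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0], [0.1, 0.1]])
query = np.array([0.0, 0.0])
print(top_k_neighbors(query, gallery, k=3))  # nearest first: [0 4 1]
```

In the actual visualizations the gallery holds one feature vector per retrieval-set image and k is 9.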
Setup
The nearest-neighbor retrieval set contains 10% of the images from both the ImageNet train and test sets (Meta-Dataset's split; specifically, 130 images per class). Images are passed through the network in batches of size 256 (with Batch Norm in train mode) to obtain one feature vector per image. The model is a ResNet-34 Prototypical Net operating on 224x224 images, trained with normalized SGD.
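The batched feature-extraction plumbing can be sketched as follows; `encode_batch` is a hypothetical stand-in for the Prototypical Net backbone's forward pass, not the actual implementation:

```python
import numpy as np

def extract_features(images, encode_batch, batch_size=256):
    """Encode images in fixed-size batches and stack the resulting features.

    `encode_batch` stands in for the network forward pass (hypothetical)."""
    feats = []
    for start in range(0, len(images), batch_size):
        feats.append(encode_batch(images[start:start + batch_size]))
    return np.concatenate(feats, axis=0)

# Dummy encoder (global average over pixels), just to show the plumbing.
dummy_encode = lambda batch: batch.reshape(len(batch), -1).mean(axis=1, keepdims=True)
images = np.random.rand(600, 4, 4)  # 600 fake "images"
features = extract_features(images, dummy_encode, batch_size=256)
print(features.shape)  # (600, 1)
```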
Computing nearest neighbors for Prototypical Net representations proved challenging due to an entirely different source of supervision collapse:
The default implementation of Prototypical Nets produces representations that are only comparable within a single episode.
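This episode dependence is easy to demonstrate with train-mode batch normalization alone (a simplified numpy sketch, omitting the learned scale and shift): the same input normalized with per-batch statistics gets a different representation depending on which other images share its batch.

```python
import numpy as np

def batch_norm_train(x, eps=1e-5):
    """Train-mode batch norm: normalize with statistics of the current batch."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mu) / np.sqrt(var + eps)

sample = np.array([1.0, 2.0])
batch_a = np.stack([sample, np.array([0.0, 0.0]), np.array([2.0, 4.0])])
batch_b = np.stack([sample, np.array([5.0, 5.0]), np.array([9.0, 9.0])])

out_a = batch_norm_train(batch_a)[0]
out_b = batch_norm_train(batch_b)[0]
print(np.allclose(out_a, out_b))  # False: same image, different representation
```

Distances between features from different episodes are therefore not meaningful under the default setup.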
We hypothesize that the reason is twofold:
batch_norm_train_nearest_neigbors.html: This visualizes the NN for ImageNet images from the Meta-Dataset training set. For the reason above, the retrievals are close to random, even though all query images come from the training set, so this file is not particularly useful for analysis.
To fix the above problem, we make two modifications to Prototypical Net training:
layer_norm_train_nearest_neigbors.html: This visualizes the NN after the above fix, again with query images from the training set. The quality of the matches improves substantially, as one would expect from retrievals using a representation trained with standard ImageNet classification.
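The filenames suggest that one of the modifications replaces batch normalization with layer normalization; as a hedged sketch (again omitting the learned scale and shift), layer norm normalizes each sample with its own statistics, so a representation no longer depends on its batchmates:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Layer norm: normalize each sample with its own mean and variance,
    making the output independent of the rest of the batch."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

sample = np.array([1.0, 2.0, 3.0])
batch_a = np.stack([sample, np.zeros(3)])
batch_b = np.stack([sample, np.full(3, 9.0)])

# The same sample gets the same representation regardless of batchmates.
print(np.allclose(layer_norm(batch_a)[0], layer_norm(batch_b)[0]))  # True
```

This batch independence is exactly what makes cross-episode nearest-neighbor comparisons meaningful.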
Finally, we demonstrate supervision collapse in the following file:
layer_norm_test_nearest_neigbors.html: Here the queries are taken from the test set. Due to supervision collapse, the nearest neighbors are once again quite poor.