This paper demonstrates that an analysis relied upon in a previous paper (Nasr et al., 2019) to identify number-sensitive units in a neural network trained for object recognition is flawed: the same network with randomly initialized weights also contains a large number of number-sensitive units. Moreover, the number of units detected depends strongly on the sample size of the statistical test, with larger sample sizes detecting no number-sensitive units at all. The paper additionally performs some analyses on a network trained specifically to predict number.

The reviewers generally felt that the demonstration of Nasr et al.'s flawed analysis was important, with R2 arguing that the work is "imperative to publish" and R1 and R3 finding the experiments in the first part of the paper convincing. However, R1, R3, and R4 all had concerns with the second part of the paper, in which it is claimed that a network trained to classify number (Nu-Net) can learn to subitize.

I feel that the results in the first part of the paper are sufficiently impactful that the paper should be accepted. Echoing R2, I believe the results will be of interest to both computer vision researchers and neuroscientists as a cautionary tale against relying too heavily on the specificity of individual neurons. However, I also agree with R1, R3, and R4 that the second part of the paper feels overclaimed. The analyses are interesting and thought-provoking, but I am unconvinced that they show the network has truly learned to subitize.

So, while I am recommending acceptance, I would like to see the claims regarding subitizing tempered. For example, rather than saying "we present evidence suggesting that DCNNs can learn subitizing" or "Nu-Net has passed the generalization tests on subitizing", it would be more accurate to state that (the specific network used) is more robust to distribution shift for small numbers than for large numbers. Supporting the stronger claim that neural networks actually learn to subitize would require extensive experiments across many architectures and many types of datasets, which this paper does not provide.