A rambling look at definitions of a paper, explicit and hidden costs, and review.
November 15, 2020.
This past week, there was a kerfuffle on academic Twitter about a paper submitted to ICLR and one of its accompanying reviews on OpenReview. The paper uses a commercial software package, MuJoCo. The reviewer felt that since the package required a paid license (there’s a caveat, but I won’t deep dive here), this would create barriers for underrepresented groups and was an ethical concern (here), and the reviewer recommended rejection.
I was surprised how fierce the defense of this reviewer was. I argued that the reviewer was creating their own policies during review; ICLR’s CfP and Author Guide place no restrictions on using commercial software. It is a significant leap from using a commercial software package to “cause new, or enhance existing, inequities” and “disenfranchise or oppress people” in the code of ethics (the reviewer made this leap; I did not). Consequently, this portion of the review should be ignored.
So that’s the backstory.
While this particular review is the backstory, this post is about more than that – what is a paper? What supporting materials are needed? Who gets to decide? Where are hidden costs? And, a theme for a while: what are best practices in review?
A paper describes scientific work and shares it with the community. A paper also makes an argument for how that scientific work is relevant or important to its scientific community. These two items are tangled up in peer review. From my post with Dmytro Mishkin:
We argue that the main point of the current review process is not answering the question: “Are the results in the paper correct?,” instead: “Is the paper worthy to be published at our precious conference?” And while the first question is related to the paper’s technical aspects and is mostly objective, the current one is subjective by definition.
Is a paper scientific work that can be immediately tested with code? Some communities have pushed for this, such as EMNLP’s reproducibility criteria.
Is a paper scientific work that can be immediately tested with code, for no financial cost? This seems to be the implicit argument in the case of this paper. But computers, power, and internet connections cost money. There is also a cost associated with one’s time. Some of these things cannot be acquired with funds. There is always cost.
In the same way that a paper’s value is judged subjectively according to one’s experience, so too is cost. What may be easy for you to accomplish may not be easy for others.
Arguments about costs can become quite arbitrary. For example, many papers with code assume one has access to modern GPUs. If you have access to GPUs through an institution’s cluster, it may not be a cost you see, but someone has invested in that infrastructure, and it is a resource you have that others may not have. Many will say that GPUs are available to those who want them, for free, through interfaces such as Google’s Colaboratory. Read, though:
GPUs and TPUs are sometimes prioritized for users who use Colab interactively rather than for long-running computations, or for users who have recently used less resources in Colab. As a result, users who use Colab for long-running computations, or users who have recently used more resources in Colab, are more likely to run into usage limits and have their access to GPUs and TPUs temporarily restricted. Users with high computational needs may be interested in using Colab’s UI with a local runtime running on their own hardware. Users interested in having higher and more stable usage limits may be interested in Colab Pro.
Resources such as Colaboratory are a gift. They may not be available in the future, and they are not intended for large experiments.
Or you can read Dodge et al. 2019, which discusses computational budgets:
As we show in §4.3, our approach makes clear that the expected-best performing model is a function of the computational budget. In §4.4 we show how our approach can be used to estimate the budget that went into obtaining previous results; in one example, we see a too-small budget for baselines, while in another we estimate a budget of about 18 GPU days was used (but not reported).
The experiment referenced in Dodge et al. 2019 is an NLP one, and from interacting with NLP people I’ve learned that NLP experiments require greater resources than those in computer vision or robotics. (18 GPU days, though! Whether you are in NLP or not, this is a very good paper, and I am not doing it justice with this short mention.)
So this is what I mean about discussions of cost being arbitrary. Why are some costs okay and others not? We have privileges – some, we don’t even recognize as such until confronted with them – and barriers in where we live, employment, and personal circumstances.
I don’t want anyone to get stuck on GPUs as an example – it is just the easiest. Coincidentally, another reviewer for the backstory paper made a similar comparison.
There has been some debate as to whether Open Science itself is totally welcoming to all, or whether it could contribute to gatekeeping. The idea that a contribution must be open, in all ways and all possible definitions, is explored in Bahlai et al. 2019:
Yet because open science can encompass all steps of the scientific process, it is natural that it means different things to different people: One can open the process of data collection, data analysis, computer code, manuscript writing, data publishing, and scholarly publication (to name a few). If you type “open science” into a search engine, you would pull up thousands of hits that do any, all, or none of these things. A side effect of this broad and vague scope, one that has stalled progress in the open-science movement, is that its advocates often become caught up in a detailed checklist of whether a project is “open,” based on tallying whether it hits all the aspects of open science discussed above, rather than focusing on the core goal of accountability and transparency.
Open science seeks to make science accessible to everyone, yet projects that are open in one way but not all ways are often derided by the open-science community, without any acknowledgment of the systematic barriers that make open science more accessible to some scientists than others, nor any respect for the steps taken to overcome some of these barriers by scientists who are not necessarily at the most secure point in their career.
Something bothering me beneath the surface of this debate was the refrain, “just implement everything in the Open Source version.” For those who have time and expertise and do not have funds, it may be hard to imagine having funds and not having programmers, time, or programmers with time to do this type of work. It happens. There are all sorts of places to work, and they each have interesting policies and quirks, especially about the ways in which your time is managed and how money is spent. The COVID-19 pandemic has reduced the available time of researchers (described in Myers et al. 2020; TL;DR: it has been brutal for caregivers). Meanwhile, publications as a part of performance assessment? Still there.
I don’t want you to get the idea that this piece is a “Crap Olympics”: a competition for who has it worse. It isn’t. Some points are:
One way to temper these debates about paperhood and costs is to have the debates early. Defining the community expectations beforehand in a call for papers or guide to authors benefits authors and reviewers, who then have a better idea of the particular conference’s culture and can plan accordingly. EMNLP has the best guide I have seen; for instance, they specified in May which topics were important to the community and not considered grounds for rejection. The guide for authors is also very detailed.
The criteria may need to be specified for each paper type, as suggested in Rogers and Augenstein 2020:
More fine-grained tracks: ACs should never have to decide between different types of papers. If surveys, opinion pieces, resource and analysis papers etc. are all welcome, they should have their own acceptance quotas and best paper awards.
Review forms tailored for different paper types: it does not make much sense to evaluate a reproducibility report for novelty, or a resource paper for SOTA results. COLING 2018 developed review forms taking into account different types of contributions, possibly several per paper.
In the absence of community guidance, reviewers bring their own biases about what topics are important, what hardware/software is good or bad, which types of costs are valid (or not), which supporting materials are necessary and in what form, and all sorts of other things, instead of assessing the science in the paper.
I had a particularly strange experience where three reviewers accepted my paper, and the ACs rejected it because the work had not yet been evaluated on a public dataset. No public dataset existed for that subject area.
You can still try to influence authors as you would a colleague, but you shouldn’t penalize them. The way to do this is with a Suggestions section. (I specifically mention that the authors do not have to respond to these suggestions.) This is the ‘FYI’ section. Important: you have to actually mean that the authors do not have to respond to the material in this section. Don’t throw a fit during the rebuttal period if you used this section and the authors then did not respond to its content.
“Colaboratory – Google.” https://research.google.com/colaboratory/faq.html (accessed Nov. 14, 2020).
Christie Bahlai, Lewis J. Bartlett, Kevin R. Burgio, Auriel M. V. Fournier, Carl N. Keiser, Timothée Poisot, Kaitlin Stack Whitney. 2019. “Open Science Isn’t Always Open to All Scientists”. American Scientist 107:2. DOI: 10.1511/2019.107.2.78.
J. Dodge, S. Gururangan, D. Card, R. Schwartz, and N. A. Smith, “Show Your Work: Improved Reporting of Experimental Results,” in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, Nov. 2019, pp. 2185–2194, DOI: 10.18653/v1/D19-1224.
EMNLP. “How to Write Good Reviews.” 2020. (Accessed Nov. 14, 2020). URL.
EMNLP. “Call for papers.” 2020. (Accessed Nov. 14, 2020). URL
K. R. Myers et al., “Unequal effects of the COVID-19 pandemic on scientists,” Nature Human Behaviour, vol. 4, no. 9, Sep. 2020, DOI: 10.1038/s41562-020-0921-y.
Dmytro Mishkin and Amy Tabb. “What does it mean to publish your scientific paper in 2020?: Benefits to authors of non-anonymous preprints.” 2020. URL.
A. Rogers and I. Augenstein, “What Can We Do to Improve Peer Review in NLP?,” arXiv:2010.03863 [cs], Oct. 2020, Accessed: Oct. 11, 2020. [Online]. Available: http://arxiv.org/abs/2010.03863.
A. Tabb. 2019. “My reviewing style, or how to review technical papers when you’ve not been taught how.” URL.
© Amy Tabb 2018-2021. All rights reserved. The contents of this site reflect my personal perspectives and not those of any other entity.