Research Article

The Assessment of Science: The Relative Merits of Post-Publication Review, the Impact Factor, and the Number of Citations

  • Adam Eyre-Walker

    Affiliation: School of Life Sciences, University of Sussex, Brighton, United Kingdom

  • Nina Stoletzki

    Affiliation: Hannover, Germany

  • Published: October 08, 2013
  • DOI: 10.1371/journal.pbio.1001675

Reader Comments (4)

Impact factor and assessor score (reposted from PubMed Commons)

Posted by Joshua_L_Cherry on 08 Nov 2013 at 15:47 GMT

Reposted comment from PubMed Commons (http://www.ncbi.nlm.nih.g...):

This article claims to have demonstrated that post-publication assessors are strongly influenced by their knowledge of the journal in which a paper was published. Specifically, it is claimed that they "tend to over-rate papers published in journals with high impact factors". Furthermore, it is suggested that "scientists have little ability to judge...the intrinsic merit of a paper".

These conclusions, which are based on coefficients of correlation between article metrics, do not follow from the data. The inferences involved are akin to taking correlation as proof of causation.

The authors first observe that journal impact factor (IF) correlates with assessor score even when citation number is controlled for. They interpret this as evidence that (assessor knowledge of) IF directly influences assessor score. The observed partial correlation is in fact expected for imperfect measures of a latent variable even in the absence of causal effects among the measures. If, for example, all three variables (assessor score, IF, and citation number) are noisy measures of article merit with uncorrelated noise, any two are necessarily correlated with each other even when the third is controlled for. Even if we took citation number as a perfect measure of merit (which we have no reason to do), the correlations would show only that assessor score and IF tend to err in the same way, not that one of them influences the other. Note that controlling for citation number does not eliminate the correlation between the scores given by two assessors, but it would be erroneous to conclude that one affected the other. The authors even tell us that citation number is "a very poor measure of the underlying merit of the science, because the accumulation of citations is highly stochastic". Controlling for such a variable could not possibly eliminate the correlation between assessor score and IF, so the authors' reasoning would suggest a strong assessor bias even if no such bias existed.
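
A minimal simulation may make the latent-variable point concrete. It assumes, purely for illustration, that assessor score, impact factor and citation number are each equal to a latent "merit" plus independent Gaussian noise, with hypothetical noise levels chosen by us. No variable influences any other, so there is no assessor bias by construction. The helper names `pearson`, `partial_corr` and `simulate` are our own.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Correlation of x and y after linearly controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

def simulate(n=20000, seed=1):
    """Three noisy measures of one latent merit; none affects another."""
    rng = random.Random(seed)
    merit = [rng.gauss(0, 1) for _ in range(n)]
    # Hypothetical noise SDs: each measure = merit + independent noise.
    assessor = [m + rng.gauss(0, 1.5) for m in merit]
    impact = [m + rng.gauss(0, 1.0) for m in merit]
    citations = [m + rng.gauss(0, 2.0) for m in merit]
    return assessor, impact, citations

assessor, impact, citations = simulate()
# Partial correlation of assessor score and impact factor, controlling
# for citations: clearly non-zero despite the absence of any bias.
print(round(partial_corr(assessor, impact, citations), 2))
```

With these noise levels the population value of this partial correlation works out to about 0.34, so a sizeable residual correlation appears simply because citation number is an imperfect measure of merit.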

The authors also point out that controlling for IF significantly reduces the correlation between scores given by different assessors. They conclude that much of the correlation between assessors is due to their both being influenced by their knowledge of IF, rather than reflecting assessment based directly on intrinsic merit. This inference, too, is unfounded. Controlling for one of three intercorrelated variables can reduce the correlation between the other two under a range of conditions that do not involve causal connections. In fact, when two positively correlated variables positively correlate to the same extent with a third, as expected for assessor scores (since ordering of assessors is arbitrary), controlling for the third variable necessarily decreases the correlation between the first two. Thus, the observed reduction in correlation will occur whenever assessor scores correlate positively with IF, which certainly does not require the posited causal effect.
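
The algebra behind this point can be checked with the standard formula for a first-order partial correlation. Here x and y stand for the two assessors' scores and z for the impact factor, and, as in the comment, both assessors are assumed to correlate equally with z (r_{xz} = r_{yz} = r); the notation is ours.

```latex
\begin{aligned}
r_{xy\cdot z} &= \frac{r_{xy} - r_{xz}\,r_{yz}}{\sqrt{(1 - r_{xz}^{2})(1 - r_{yz}^{2})}}
  = \frac{r_{xy} - r^{2}}{1 - r^{2}} \qquad (r_{xz} = r_{yz} = r),\\[4pt]
r_{xy} - r_{xy\cdot z} &= \frac{r_{xy}(1 - r^{2}) - (r_{xy} - r^{2})}{1 - r^{2}}
  = \frac{r^{2}\,(1 - r_{xy})}{1 - r^{2}}.
\end{aligned}
```

For |r| < 1 the difference is strictly positive whenever r is non-zero and r_{xy} < 1, so controlling for z lowers the correlation between the assessors without any causal link being involved.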

This is not to say that the claimed effect can be disproven or is implausible, but only that it has not been demonstrated. The observed correlation structure is entirely consistent with the complete absence of such an effect, and in the presence of such an effect the authors' reasoning would likely overestimate it drastically.

No competing interests declared.

RE: Impact factor and assessor score (reposted from PubMed Commons)

AdamEyreWalker replied to Joshua_L_Cherry on 08 Nov 2013 at 15:56 GMT

This reply was also originally posted on PubMed Commons (http://www.ncbi.nlm.nih.g...).

We thank Dr. Cherry for his insightful comments. Unfortunately, he is correct: controlling for merit using a noisy measure such as the number of citations will leave a correlation between assessor score and the impact factor whether or not assessors tend to overrate papers in high-impact journals. Since we did not previously appreciate this problem, and it was not caught by a number of referees who looked at our paper before publication, it may be worth elaborating on why this is the case. Imagine that we consider all papers that have received 100 citations; in our analysis we assumed that these represented papers of equal merit. However, because the number of citations is a noisy measure of merit, some of these papers will be of poor merit and have by chance received more than their fair share of citations, while others will be of good merit and have received fewer than their fair share. As a consequence, there is variation in merit amongst papers that received 100 citations. Hence, if assessors tend to rate better papers more highly, and better journals publish better papers, there will be a correlation between assessor score and the impact factor even when the number of citations is controlled for. Furthermore, the decrease in the correlation between assessor scores, and between assessor score and the number of citations, when the impact factor is controlled for, may simply reflect the lower variance in merit within journals.
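
The thought experiment of papers with 100 citations can be simulated directly. This is a sketch under assumptions of our own: citation counts are log-normally distributed around a value that rises with merit, the impact factor is replaced by a continuous per-paper stand-in equal to merit plus noise, and a band of 90 to 110 citations stands in for "exactly 100" so that the stratum is not too small. All parameter values are hypothetical, and no assessor bias is built in.

```python
import math
import random
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(7)
papers = []
for _ in range(50000):
    merit = rng.gauss(0, 1)
    # Hypothetical: citations are a noisy, log-normal function of merit.
    cites = round(math.exp(3.9 + 0.5 * merit + rng.gauss(0, 0.8)))
    assessor = merit + rng.gauss(0, 1.5)  # noisy assessor score
    journal = merit + rng.gauss(0, 1.0)   # stand-in for impact factor
    papers.append((merit, cites, assessor, journal))

# "Control" for citations by keeping only papers cited roughly 100 times.
band = [p for p in papers if 90 <= p[1] <= 110]
merit_sd = statistics.pstdev([p[0] for p in band])
r_within = pearson([p[2] for p in band], [p[3] for p in band])
print(len(band), round(merit_sd, 2), round(r_within, 2))
```

Within the band the spread of merit remains substantial, and assessor score still correlates positively with the impact-factor proxy, which is exactly the pattern described above.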

So, as Dr. Cherry concludes, there is no evidence from our analysis that assessors overrate science in high-impact journals. This tendency may exist, but our analysis provides no evidence for it. However, most of our conclusions are unaffected by this insight: the correlation between assessor scores, and between assessor score and the number of citations, is rather poor whether or not the impact factor is controlled for. These correlations demonstrate either that assessors do not agree on what constitutes merit or that they are not good at identifying it, and that the accumulation of citations is highly stochastic.

Finally, we note that the correlation between assessor score and impact factor is stronger than either the correlation between assessor scores or the correlation between assessor score and the number of citations. These correlations therefore suggest that the impact factor is the best measure of merit, provided that assessors are not influenced by the journal in which a paper is published.
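
One way to see this logic, assuming a one-factor model with no assessor bias: each standardized measure is X_k = λ_k M + ε_k, where M is latent merit with unit variance and the errors are independent of M and of one another. The two assessors are taken to be exchangeable, sharing loading λ_A, while λ_I and λ_C are the loadings of impact factor and citation number. These symbols are ours, not the authors'.

```latex
\begin{aligned}
r_{A_1A_2} &= \lambda_A^{2}, \qquad r_{AI} = \lambda_A\lambda_I, \qquad
  r_{AC} = \lambda_A\lambda_C, \qquad \operatorname{corr}(X_k, M) = \lambda_k,\\[4pt]
r_{AI} > r_{A_1A_2} &\;\Longrightarrow\; \lambda_I > \lambda_A, \qquad
  r_{AI} > r_{AC} \;\Longrightarrow\; \lambda_I > \lambda_C \qquad (\lambda_A > 0).
\end{aligned}
```

Under these assumptions the impact factor has the largest loading, and hence the highest correlation with merit, of the three measures. If assessors were swayed by the journal, r_{AI} would be inflated, which is why the conclusion is conditional on the absence of such bias.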

There are perhaps two approaches to determining whether assessors overrate papers in high-ranking journals. One is to develop a mathematical model of the relationship between assessor score, the number of citations and the impact factor of a journal; so far our attempts to do this have failed. The other is to have a range of papers assessed independently before and after publication.

No competing interests declared.