# 5 Guide for Reviewers

The current chapter should be considered an extension of the corresponding “Guide for Reviewers” in rOpenSci’s “Dev Guide”. The principles for reviewing packages described there also apply to statistical packages, with this chapter describing additional processes and practices for the review of packages submitted to the statistical software peer review system. Reviews of statistical software should first assess compliance with our standards, and then proceed to a more general review, as described in the following two sub-sections. The template to be used for reviews of statistical software is included in the final sub-section of this chapter. Prior to describing the review process, the following sub-section describes several tools which can be used to aid review.

## 5.1 Tools for Reviewing Statistical Software

Upon initial submission, the ropensci-review-bot generates an automated report summarising aspects of package structure and functionality intended to inform the review process, an example of which can be seen here.

The elements of these reports are described in the Guide for Editors. While the aspects reported on there are primarily intended to help editors initially identify potential issues best addressed prior to review, they nevertheless include a number of insights into package structure which may usefully inform the review process.

Components of these reports intended to aid reviews include a complete report of standards compliance, generated with the srr package, and an interactive diagram of inter-relationships between package functions (and other objects), generated with the pkgstats package. These can be recreated locally by first installing the two packages, running either,

```r
remotes::install_github ("ropensci-review-tools/pkgstats")
remotes::install_github ("ropensci-review-tools/srr")
```

or,

```r
pak::pkg_install ("ropensci-review-tools/pkgstats")
pak::pkg_install ("ropensci-review-tools/srr")
```

Within a local clone of the package being reviewed, the report on statistical standards can be generated by running srr::srr_report(); the sample report links to a version of that report which may be viewed here. The detailed statistical properties of the package, along with the associated interactive diagram of package structure, can be generated by running,

```r
library (pkgstats)
x <- pkgstats () # 'x' has lots of detail on package structure
plot_network (x)
```

This network, the sample version of which may be viewed here, provides immediate visual insight into the relationships between all objects constructed within a package in all languages used, both R itself and any languages used in src/ code such as C or C++. The following section describes the srr report in more detail, and its intended use in assessing compliance with our standards.
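The standards report itself can be generated with a single call; the following sketch assumes the `path` and `view` parameters of current versions of the srr package, with `view = TRUE` opening the rendered report in a browser:

```r
library (srr)
# Generate the standards-compliance report from within a local clone
# of the package under review; this parses all 'srrstats' roclet tags
# and renders a report hyperlinked to the corresponding code locations.
srr_report (path = ".", view = TRUE)
```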

## 5.2 Assessment Against Standards

The entire system for peer review of statistical software is based on sets of general and category-specific standards given in Chapter 6 of this book. The process of assessing software against standards is facilitated by the srr (software review roclets) package, which both authors and reviewers need to install as shown above.

This package is primarily intended to aid authors in documenting both how and where their software complies with each of the relevant general and category-specific standards. The function of the package used to aid reviewers is srr_report(), the output of which is linked from the initial package report described above, and can also be generated locally by simply running that function within a local clone of the package being reviewed. The report contains hyperlinks to all places in the code at which each standard is addressed.

Using this report, reviewers must assess whether they agree with every statement of either compliance with, or non-applicability of, the standards, as reflected in roclet tags of:

1. @srrstats for standards with which software complies;
2. @srrstatsNA for standards which authors have deemed not to be applicable to their software.
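For orientation, these tags are written as roxygen2 blocks within the package source; a minimal sketch follows, in which the standard numbers and explanatory text are purely hypothetical:

```r
#' @srrstats {G1.1} Comparable algorithms and implementations are
#'   documented in the package vignette. (Hypothetical example.)
#' @srrstatsNA {G2.4} This standard is deemed not applicable because
#'   no type coercion is performed. (Hypothetical example.)
#' @noRd
NULL
```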

The srr_report() output is divided into two main sections containing links to the locations in the code where these two types of tags are documented. No action need be taken on standards with which reviewers agree, whether because the software complies and has a tag of @srrstats, or because a standard is not applicable and has a tag of @srrstatsNA. Reviewers are only asked to note any standards with which they disagree, primarily because of either:

1. Disagreement in standards compliance, where authors have used a tag of @srrstats but a reviewer judges either the explanation or associated code to be insufficient for compliance; or
2. Disagreement about non-applicability of a standard, where authors have used a tag of @srrstatsNA, but a reviewer believes that standard ought to apply to the software.

The srr_report() function also returns the same content in markdown format, which may be used by reviewers as an initial checklist against which to assess compliance. All standards for which reviewers agree with authors’ statements of compliance may simply be removed, hopefully reducing an initially extensive checklist to a manageable few items with which reviewers might disagree.
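One possible checklist workflow is sketched below, assuming that the `view = FALSE` parameter of srr_report() suppresses browser rendering so that the generated markdown can instead be edited directly as a checklist:

```r
# Generate the report without opening it in a browser, then edit the
# resulting markdown as a review checklist, deleting each standard as
# agreement with the authors' statements is confirmed.
srr::srr_report (path = ".", view = FALSE)
```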

The following sub-section describes additional procedures required when assessing standards compliance of packages aiming for either silver or gold badges. The general srr procedure is described in the main package vignette, which reviewers are also encouraged to read to familiarise themselves with how the srr package is used to document compliance with standards. The main srr vignette includes code which can be stepped through to generate an example report.

### 5.2.1 Review for Silver and Gold Badges

This system for peer review of statistical software features badges in three categories of bronze, silver, and gold. As described in the corresponding Guide for Authors, a silver badge is granted to software which complies with more than a minimal set of applicable standards, and which extends beyond bronze in at least one notable aspect, while a gold badge is granted to software which complies with all standards which reviewers have deemed potentially applicable, and which extends beyond bronze in several notable aspects. The notable aspects by which software may fulfil the requirements of silver or gold badges are:

• Compliance with a sufficient number of additional standards beyond the minimal number necessary for bronze compliance;
• Demonstrated excellence in compliance with at least two standards from two distinct sub-sections;
• Having a demonstrated generality of usage beyond a single use case; or
• Demonstrated excellence in internal aspects of package design and structure.

The authors will have identified in their initial submission which of these aspects they intend to fulfil. For packages which claim to comply with more than a minimal number of necessary standards, reviewers must additionally consider both which of the standards with which the software complies might be considered minimally necessary, as well as whether any standards which authors have identified as not applicable (through @srrstatsNA tags) could indeed be deemed applicable. Not all standards can be applied to every piece of software. For example, software designed to accept sparse matrix inputs from the Matrix package will be unable to conform with many of the standards for general rectangular input forms.

These three categories of necessary, currently applicable, and potentially applicable standards can then be used by reviewers to roughly assess the quantitative degree by which compliance exceeds the minimally required level. As stated in the Guide for Authors, the first of these four items may be considered fulfilled for software which meets at least one quarter of all potentially applicable standards beyond those minimally required. The minimally required standards may often usefully be identified as those which would be required for the software to meet one specific use case. Any aspects of the software which generalise its usage beyond that single use case may be considered in the second category of potentially, yet not necessarily, applicable standards. Judgement of such categorical distinctions, and of precise amounts, is left to the discretion of reviewers.
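As a purely illustrative calculation (all numbers hypothetical, and presuming one reading of the “one quarter” requirement as applying to standards beyond those minimally required), the quantitative threshold for silver might be sketched as:

```r
# Hypothetical counts for a single package under review:
n_minimal <- 20L   # standards deemed minimally necessary
n_potential <- 40L # all potentially applicable standards
# Silver then presumes compliance with at least one quarter of the
# potentially applicable standards beyond those minimally required:
n_required <- n_minimal + ceiling ((n_potential - n_minimal) / 4)
n_required # 25 under these hypothetical numbers
```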

Packages aiming for gold badges at the end of review will need to comply with all potentially applicable standards, and will also need to fulfil at least three of the four aspects listed above, and described in more detail in the Guide for Authors.

### 5.2.2 Disagreement with Authors’ Intentions

Authors must state on submission the grade of badge they are aiming for. Reviewers may subsequently deem a package to be compliant with a different grade of badge. The review template includes the following two items:

• This package complies with a sufficient number of standards for a (bronze/silver/gold) badge
• This grade of badge is the same as what the authors wanted to achieve

The first item is intended to specify the grade of badge (bronze, silver, gold) which reflects the reviewer’s judgement, and need not necessarily reflect the authors’ intentions. The second item may be checked when a reviewer agrees with the authors that a package is indeed sufficient to achieve their desired badge. Where reviewers do not agree with authors’ assessments of package compliance, the second item should be left unchecked in the submitted review. The editor will then ask the authors for their response, and will advise whether additional rounds of development and review are necessary to obtain the grade of badge desired by the authors.

## 5.3 General Package Review

From a reviewer’s perspective, one of the primary aims of our standards-based system is to provide a highly structured system for addressing the technical aspects of review, leaving the general review process comparatively free of technical details, and therefore more able to consider broader aspects of package design, functionality, and usage.

Following assessment of compliance with standards, reviewers should accordingly proceed with a general descriptive review by following the processes established in rOpenSci’s general software review system, for which the best source of information is provided by reviews themselves, along with the Guide for Reviewers. In formulating a general review of statistical software, we ask reviewers to explicitly consider the following aspects, some of which loosely correspond to sub-sections of the General Standards for Statistical Software:

1. Documentation: Is the documentation sufficient to enable general use of the package beyond one specific use case? Do the various components of documentation support and clarify one another?
2. Algorithms: How well are algorithms encoded? Is the choice of computer language appropriate for that algorithm, and/or the envisioned use of the package? Are aspects of algorithmic scaling sufficiently documented and tested? Are there any aspects of algorithmic implementation which could be improved?
3. Testing: Regardless of actual coverage of tests, are there any fundamental software operations which are not sufficiently expressed in tests? Is there a need for extended tests, or if extended tests exist, have they been implemented in an appropriate way, and are they appropriately documented?
4. Visualisation (where appropriate): Do visualisations aid the primary purposes of statistical interpretation of results? Are there any aspects of visualisations which could risk statistical misinterpretation?
5. Package Design: Is the package well designed for its intended purpose? We ask reviewers to consider the following two aspects of package design:
• External Design: Do exported functions and the relationships between them enable general usage of the package? Do exported functions best serve inter-operability with other packages?
• Internal Design: Are algorithms implemented appropriately in terms of aspects such as efficiency, flexibility, generality, and accuracy? Could ranges of admissible input structures, or form(s) of output structures, be expanded to enhance inter-operability with other packages?

As algorithms form the core of statistical software, we ask reviewers to pay particular attention to the assessment of algorithmic quality. Most category-specific standards include a central “Algorithmic Standards” component which can be used to provide starting points for more general considerations of algorithmic quality. The General Standard G1.1 also requires all similar algorithms or implementations to be documented within the software, so reviewers should also have access to a list of comparable implementations.

Most of the above considerations are explicitly included in the reviewers’ template which follows.

## 5.4 Review Template

The following template is to be used for reviews of statistical software. All checkbox items should be retained, and checked where appropriate, while other lines, notably including questions in the General Review section, may be modified or removed as appropriate.


## Package Review

*Please check off boxes as applicable, and elaborate in comments below.  Your review is not limited to these topics, as described in the reviewer guide*

- Briefly describe any working relationship you may have (had) with the package authors (or otherwise remove this statement)

- [ ] As the reviewer I confirm that there are no [conflicts of interest](https://devguide.ropensci.org/policies.html#coi) for me to review this work (If you are unsure whether you are in conflict, please speak to your editor _before_ starting your review).

---

### Compliance with Standards

- [ ] This package complies with a sufficient number of standards for a (bronze/silver/gold) badge
- [ ] This grade of badge is the same as what the authors wanted to achieve

The following standards currently deemed non-applicable (through tags of @srrstatsNA) could potentially be applied to future versions of this software: (Please specify)

Please also comment on any standards which you consider either particularly well, or insufficiently, documented.

For packages aiming for silver or gold badges:

- [ ] This package extends beyond minimal compliance with standards in the following ways: (please describe)

---

### General Review

#### Documentation

The package includes all the following forms of documentation:

- [ ] **A statement of need** clearly stating problems the software is designed to solve and its target audience in README
- [ ] **Installation instructions:** for the development version of the package and any non-standard dependencies in README
- [ ] **Community guidelines** including contribution guidelines in the README or CONTRIBUTING
- [ ] The documentation is sufficient to enable general use of the package beyond one specific use case

The following sections of this template include questions intended to be used as guides to provide general, descriptive responses. Please remove this, and any subsequent lines that are not relevant or necessary for your final review.

#### Algorithms

- How well are algorithms encoded?
- Is the choice of computer language appropriate for that algorithm, and/or envisioned use of package?
- Are aspects of algorithmic scaling sufficiently documented and tested?
- Are there any aspects of algorithmic implementation which could be improved?

#### Testing

- Regardless of actual coverage of tests, are there any fundamental software operations which are not sufficiently expressed in tests?
- Is there a need for extended tests, or if extended tests exist, have they been implemented in an appropriate way, and are they appropriately documented?

#### Visualisation (where appropriate)

- Do visualisations aid the primary purposes of statistical interpretation of results?
- Are there any aspects of visualisations which could risk statistical misinterpretation?

#### Package Design

- Is the package well designed for its intended purpose?
- In relation to **External Design:** Do exported functions and the relationships between them enable general usage of the package?
- In relation to **External Design:** Do exported functions best serve inter-operability with other packages?
- In relation to **Internal Design:** Are algorithms implemented appropriately in terms of aspects such as efficiency, flexibility, generality, and accuracy?
- In relation to **Internal Design:** Could ranges of admissible input structures, or form(s) of output structures, be expanded to enhance inter-operability with other packages?

---

- [ ] **Packaging guidelines**: The package conforms to the rOpenSci packaging guidelines

Estimated hours spent reviewing:

- [ ] Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.