The current chapter should be considered an extension of the corresponding “Guide for Reviewers” in rOpenSci’s “Dev Guide”. The principles for reviewing packages described there also apply to statistical packages, with this chapter describing additional processes and practices for reviewing packages submitted to the statistical software peer review system. Reviews of statistical software should first assess compliance with our standards, and then proceed to a more general review, as described in the following two sub-sections. The template to be used for reviews of statistical software is included in the final sub-section of this chapter. Prior to describing the review process, the following sub-section describes several tools which can be used to aid review.
Upon initial submission, the `ropensci-review-bot` generates an automated
report summarising aspects of package structure and functionality intended to
inform the review process. The elements of these reports are described in the Guide for Editors. While the aspects reported on there are primarily intended to help editors identify potential issues best addressed prior to review, they nevertheless include a number of insights into package structure which may usefully inform the review process.
Components of these reports intended to aid reviews include a complete report
of standards compliance, generated with the
`srr` package, and an
interactive diagram of inter-relationships between package functions (and other
objects), generated with the
`pkgstats` package. These
can be recreated locally by first installing the two packages, either with `remotes`:

```r
remotes::install_github ("ropensci-review-tools/pkgstats")
remotes::install_github ("ropensci-review-tools/srr")
```

or with `pak`:

```r
pak::pkg_install ("ropensci-review-tools/pkgstats")
pak::pkg_install ("ropensci-review-tools/srr")
```
Within a local clone of the package being reviewed, the report on statistical
standards can be generated by running `srr::srr_report()`, which produces
a local version of the same report linked from the bot's initial summary.
The detailed statistical properties of the package, and the associated
interactive diagram of package structure, can be generated by running:

```r
library (pkgstats)
x <- pkgstats () # 'x' has lots of detail on package structure
plot_network (x)
```
This network provides immediate visual insight into the relationships between
all objects constructed within a package, across all languages used: both
R itself and any languages used in `src/` code, such as C or C++. The
following section describes the `srr` report in more detail, along with its
intended use in assessing compliance with our standards.
The entire system for peer review of statistical software is based on sets of
general and category-specific standards given in Chapter 6 of this
book. The process of assessing software against standards is
facilitated by the
`srr` (software review roclets)
package, which both authors and
reviewers need to install as shown above.
This package is primarily intended to aid authors in documenting both how and
where their software complies with each of the relevant general and
category-specific standards. The package function used to aid reviewers is
`srr_report()`, the output of which is linked from the initial package report
described above, and which can also be generated locally by simply running
that function within a local clone of the package being reviewed. The report
contains hyperlinks to all places in the code at which each standard is
addressed.
Using this report, reviewers must assess their agreement with every statement either of compliance with, or non-applicability of, standards, as reflected in the roclet tags:

- `@srrstats` for standards with which software complies;
- `@srrstatsNA` for standards which authors have deemed not to be applicable to their software.
The report is divided into two main sections containing links to locations in
the code where these two types of tags are documented. No action need be taken
on standards with which reviewers agree, whether because software complies and
has a tag of `@srrstats`, or because a standard is not applicable and has
a tag of `@srrstatsNA`. Reviewers are only asked to note any standards with
which they disagree, primarily because of either:

- Disagreement over standards compliance, where authors have used a tag of `@srrstats` but a reviewer judges either the explanation or the associated code to be insufficient for compliance; or
- Disagreement over the non-applicability of a standard, where authors have used a tag of `@srrstatsNA` but a reviewer believes that standard ought to apply to the software.
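As a minimal sketch of how these tags appear in practice, authors document compliance in `roxygen2` blocks above the relevant code. The function below, and the particular standard numbers attached to it, are hypothetical and chosen purely for illustration:

```r
#' Compute a trimmed mean of a numeric vector.
#'
#' @srrstats {G2.0} Assertions on input type and length are
#'   implemented via the `stopifnot()` call below.
#' @srrstatsNA {G2.14} Handling of missing data is deemed not
#'   applicable, because inputs containing NA values are rejected.
#' @export
trimmed_mean <- function (x, trim = 0.1) {
    stopifnot (is.numeric (x), length (x) > 0L, !anyNA (x))
    mean (x, trim = trim)
}
```

A reviewer assessing this block would judge whether the explanations next to each tag genuinely justify the claimed compliance or non-applicability.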
The same content can also be generated in markdown format, and may be used by
reviewers as an initial checklist against which to assess compliance. All
standards for which reviewers agree with authors' statements of compliance may
simply be removed, hopefully reducing initially extensive checklists down to
a manageable few items with which reviewers might disagree.
The following sub-section describes additional procedures required when
assessing standards compliance of packages aiming for either
silver or gold badges. The general
procedure is described in the main `srr` package vignette,
which reviewers are also encouraged to read to familiarise themselves with how
the `srr` package is used to
document compliance with standards. That vignette
includes code which can be stepped through to generate an example report.
This system for peer review of statistical software features badges in three categories: bronze, silver, and gold. As described in the corresponding Guide for Authors, a silver badge is granted to software which complies with more than a minimal set of applicable standards, and which extends beyond bronze in at least one notable aspect. A gold badge is granted to software which complies with all standards which reviewers have deemed potentially applicable, and which extends beyond bronze in several notable aspects. The notable aspects by which software may fulfil the requirements of silver or gold badges are:
- Compliance with a sufficient number of additional standards beyond the minimal number necessary for bronze compliance;
- Demonstrated excellence in compliance with at least two standards from two distinct sub-sections;
- Having a demonstrated generality of usage beyond a single use case; or
- Demonstrated excellence in internal aspects of package design and structure.
The authors will have identified in their initial submission which of these
aspects they intend to fulfil. For packages which claim to comply with more
than a minimal number of necessary standards, reviewers must additionally
consider both which of the standards with which the software complies might be
considered minimally necessary, and whether any standards which authors
have identified as not applicable (through `@srrstatsNA` tags) could indeed be
deemed applicable. Not all standards can be applied to a given piece of
software. For example, software designed to accept only sparse matrix inputs
will be unable to conform with many of the standards for general rectangular
input data.
These three categories of standards (minimally necessary, currently applicable, and potentially applicable) can then be used by reviewers to roughly assess the quantitative degree by which compliance exceeds the minimally required level. As stated in the Guide for Authors, the first of these four items may be considered fulfilled by software which meets at least one quarter of all potentially applicable standards beyond those minimally required.

A useful working definition of minimally required standards is those which would be required for the software to meet one specific use case. Any aspects of the software which generalise its usage beyond that single use case may be considered in the second category of potentially yet not necessarily applicable standards. Judgement of such categorical distinctions, and of precise numbers, is left to the discretion of reviewers.
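As a rough arithmetic sketch of that first criterion, with all counts hypothetical and chosen only for illustration:

```r
# Hypothetical counts, for illustration only:
n_minimal <- 20    # standards minimally required for one core use case
n_potential <- 40  # further potentially applicable standards beyond those

# The first notable aspect requires compliance with at least one
# quarter of the potentially applicable standards beyond the
# minimally required set:
n_required <- n_minimal + ceiling (n_potential / 4)
n_required # 30
```

In practice the boundary between "minimally required" and "potentially applicable" is itself a judgement call, so such counts serve only as an approximate guide.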
Packages aiming for gold badges at the end of review will need to comply with all potentially applicable standards, and will also need to fulfil at least three of the four aspects listed above, and described in more detail in the Guide for Authors.
From a reviewer’s perspective, one of the primary aims of our standards-based system is to provide a highly structured framework for addressing the technical aspects of review, leaving the general review process comparatively free of technical details, and therefore better able to consider broader aspects of package design, functionality, and usage.
Following assessment of compliance with standards, reviewers should accordingly proceed with a general descriptive review by following the processes established in rOpenSci’s general software review system, for which the best source of information is provided by reviews themselves, along with the Guide for Reviewers. In formulating a general review of statistical software, we ask reviewers to explicitly consider the following aspects, some of which loosely correspond to sub-sections of the General Standards for Statistical Software:
- Documentation: Is the documentation sufficient to enable general use of the package beyond one specific use case? Do the various components of documentation support and clarify one another?
- Algorithms: How well are algorithms encoded? Is the choice of computer language appropriate for the algorithm, and/or the envisioned use of the package? Are aspects of algorithmic scaling sufficiently documented and tested? Are there any aspects of algorithmic implementation which could be improved?
- Testing: Regardless of actual coverage of tests, are there any fundamental software operations which are not sufficiently expressed in tests? Is there a need for extended tests, or, if extended tests exist, have they been implemented in an appropriate way, and are they appropriately documented?
- Visualisation (where appropriate): Do visualisations aid the primary purposes of statistical interpretation of results? Are there any aspects of visualisations which could risk statistical misinterpretation?
- Package Design: Is the package well designed for its intended purpose? We ask reviewers to consider the following two aspects of package design:
    - External Design: Do exported functions and the relationships between them enable general usage of the package? Do exported functions best serve inter-operability with other packages?
    - Internal Design: Are algorithms implemented appropriately in terms of aspects such as efficiency, flexibility, generality, and accuracy? Could ranges of admissible input structures, or form(s) of output structures, be expanded to enhance inter-operability with other packages?
As algorithms form the core of statistical software, we ask reviewers to pay particular attention to the assessment of algorithmic quality. Most category-specific standards include a central “Algorithmic Standards” component which can be used to provide starting points for more general considerations of algorithmic quality. The General Standard G1.1 also requires all similar algorithms or implementations to be documented within the software, so reviewers should also have access to a list of comparable implementations.
Most of the above considerations are explicitly included in the reviewers’ template which follows.
The following template is to be used for reviews of statistical software. All checkbox items should be retained, and checked where appropriate, while other lines, notably including questions in the General Review section, may be modified or removed as appropriate.
## Package Review

*Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide*

- Briefly describe any working relationship you may have (had) with the package authors (or otherwise remove this statement)
- [ ] As the reviewer I confirm that there are no [conflicts of interest](https://devguide.ropensci.org/policies.html#coi) for me to review this work (If you are unsure whether you are in conflict, please speak to your editor _before_ starting your review).

---

### Compliance with Standards

- [ ] This package complies with a sufficient number of standards for a (bronze/silver/gold) badge
- [ ] This grade of badge is the same as what the authors wanted to achieve

The following standards currently deemed non-applicable (through tags of `@srrstatsNA`) could potentially be applied to future versions of this software: (Please specify)

Please also comment on any standards which you consider either particularly well, or insufficiently, documented.

For packages aiming for silver or gold badges:

- [ ] This package extends beyond minimal compliance with standards in the following ways: (please describe)

---

### General Review

#### Documentation

The package includes all the following forms of documentation:

- [ ] **A statement of need** clearly stating problems the software is designed to solve and its target audience in README
- [ ] **Installation instructions:** for the development version of package and any non-standard dependencies in README
- [ ] **Community guidelines** including contribution guidelines in the README or CONTRIBUTING
- [ ] The documentation is sufficient to enable general use of the package beyond one specific use case

The following sections of this template include questions intended to be used as guides to provide general, descriptive responses. Please remove this, and any subsequent lines that are not relevant or necessary for your final review.
#### Algorithms

- How well are algorithms encoded?
- Is the choice of computer language appropriate for that algorithm, and/or envisioned use of package?
- Are aspects of algorithmic scaling sufficiently documented and tested?
- Are there any aspects of algorithmic implementation which could be improved?

#### Testing

- Regardless of actual coverage of tests, are there any fundamental software operations which are not sufficiently expressed in tests?
- Is there a need for extended tests, or, if extended tests exist, have they been implemented in an appropriate way, and are they appropriately documented?

#### Visualisation (where appropriate)

- Do visualisations aid the primary purposes of statistical interpretation of results?
- Are there any aspects of visualisations which could risk statistical misinterpretation?

#### Package Design

- Is the package well designed for its intended purpose?
- In relation to **External Design:** Do exported functions and the relationships between them enable general usage of the package?
- In relation to **External Design:** Do exported functions best serve inter-operability with other packages?
- In relation to **Internal Design:** Are algorithms implemented appropriately in terms of aspects such as efficiency, flexibility, generality, and accuracy?
- In relation to **Internal Design:** Could ranges of admissible input structures, or form(s) of output structures, be expanded to enhance inter-operability with other packages?

---

- [ ] **Packaging guidelines**: The package conforms to the rOpenSci packaging guidelines

Estimated hours spent reviewing:

- [ ] Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.