ESOP and FoSSaCS Artifact Evaluation

Information on submission and evaluation of artifacts for the ESOP and FoSSaCS conferences.

Background

ESOP 2023 and FoSSaCS 2023 will have a joint post-paper-acceptance voluntary artifact evaluation. Authors will be welcome to submit artifacts for evaluation after paper notification. The outcome will not alter the paper acceptance decision.

A paper consists of a constellation of artifacts that extend beyond the document itself: software, proofs, models, test suites, benchmarks, and so on. In some cases, the quality of these artifacts is as important as that of the document itself, yet most of our conferences offer no formal means to submit and evaluate anything but the paper.

Following a trend in our community over the past several years, ESOP'23 and FoSSaCS'23 offer an Artifact Evaluation process, which allows authors of accepted papers to optionally submit supporting artifacts. The goal of artifact evaluation is two-fold: to probe further into the claims and results presented in a paper, and to reward authors who take the trouble to create useful artifacts to accompany the work in their paper. Artifact evaluation is optional, but highly encouraged; authors may choose to submit their artifact for evaluation only after their paper has been accepted.

The evaluation and dissemination of artifacts improves reproducibility, and enables authors to build on top of each other’s work. Beyond helping the community as a whole, the evaluation and dissemination of artifacts confers several direct and indirect benefits to the authors themselves.

The ideal outcome for the artifact evaluation process is to accept every artifact that is submitted, provided it meets the evaluation criteria listed below, and we will strive to stay as close as possible to that ideal. Some artifacts may nevertheless not pass muster and be rejected; in all cases we will evaluate in earnest and make our best attempt to follow the authors’ evaluation instructions.

Evaluation Criteria

The artifact evaluation committee (AEC) will read each artifact’s paper and judge how well the submitted artifact conforms to the expectations set by the paper. The specific artifact evaluation criteria are:

  • Consistency with the paper: the artifact should reproduce the same results, modulo experimental error.
  • Completeness: the artifact should reproduce all the results that the paper reports, and should include everything (code, tools, 3rd party libraries, etc.) required to do so.
  • Well documented: the artifact should be well documented so that reproducing the results is easy and transparent.
  • Ease of reuse: the artifact should provide everything needed to build on top of the original work, including source files together with a working build process that can recreate the binaries provided.

Note that artifacts will be evaluated with respect to the claims and presentation in the submitted version of the paper, not the camera-ready version.

Badges

Authors of papers with accepted artifacts will be assigned official EAPLS artifact evaluation badges, indicating that they have taken the extra time and have undergone the extra scrutiny to prepare a useful artifact. The ESOP/FoSSaCS 2023 AEC will award the Artifacts Functional and Artifacts (Functional and) Reusable badges. Additionally, the Artifacts Available badge may be obtained by making the artifacts associated with the paper permanently available for retrieval on a publicly accessible archival repository which has a declared plan to enable permanent accessibility.

The badges will appear on the first page of the camera-ready version of the paper. The artifact authors will be allowed to revise their camera ready paper after they are notified of their artifact’s publication in order to include a link to the artifact’s DOI.

Process

To maintain the separation of paper and artifact review, authors will only be asked to upload their artifacts after their papers have been accepted. Authors planning to submit to the artifact evaluation should prepare their artifacts well in advance of the artifact submission deadline (see Important Dates below) to ensure adequate time for packaging and documentation.

Throughout the artifact review period, submitted reviews will be (approximately) continuously visible to authors. Reviewers will be able to interact (anonymously) with authors throughout, for clarifications, system-specific patches, and other logistical help to make the artifact evaluable. The goal of this continuous interaction is to avoid rejecting artifacts for minor issues that are unrelated to the research, such as a “wrong library version” type of problem.

Types of Artifacts

The artifact evaluation will accept any artifact that authors wish to submit, broadly defined. A submitted artifact might be:

  • software,
  • mechanized proofs,
  • test suites,
  • data sets,
  • a video of a difficult- or impossible-to-share system in use, or
  • any other artifact described in a paper.

Artifact Evaluation Committee (AEC)

Other than the chairs, the AEC members are senior graduate students, postdocs, or recent PhD graduates, identified with the help of the ESOP PC and recent artifact evaluation committees. Among researchers, experienced graduate students are often in the best position to handle the diversity of systems and expectations that the AEC will encounter. In addition, graduate students represent the future of the community, so involving them in the AEC process early will help push this process forward. The AEC chairs devote considerable attention to both mentoring and monitoring, helping to educate the students on their responsibilities and privileges.

  • Iván Arcuschin Moreno (University of Buenos Aires)
  • Lukas Armborst (University of Twente)
  • Darion Cassel (Carnegie Mellon University)
  • Daniil Frumin (University of Groningen)
  • Léo Gourdin (Verimag)
  • Alejandro Hernández-Cerezo (Complutense University of Madrid)
  • Jonas Kastberg Hinrichsen (Aarhus University)
  • Jules Jacobs (Radboud University)
  • Daniel Kocher (University of Salzburg)
  • Di Long Li (The Australian National University)
  • Orestis Melkonian (University of Edinburgh)
  • Shouvick Mondal (Indian Institute of Technology Gandhinagar)
  • Srinidhi Nagendra (University of Paris, IRIF)
  • Mário Pereira (Universidade NOVA de Lisboa | FCT—NOVA LINCS)
  • Long Pham (Carnegie Mellon University)
  • Goran Piskachev (Amazon Web Services)
  • Ocan Sankur (Univ Rennes, CNRS)
  • Somesh Singh (INRIA and ENS Lyon)
  • Dawit Tirore (IT University of Copenhagen)
  • Niccolò Veltri (chair) (Tallinn University of Technology)
  • Théophile Wallez (Inria Paris)
  • Sebastian Wolff (chair) (New York University)
  • Cheng Zhang (Boston University)

Distinguished Artifacts

Based on the reviews and discussion among the AEC, one or more artifacts will be selected for Distinguished Artifact awards.

Conflicts of Interest

Conflicts of interest for AEC members are handled by the chairs. Conflicts of interest involving one of the AEC chairs are handled by the other AEC chair, or by the PC of the conference if all chairs are conflicted. Artifacts co-authored by an AEC chair must be unambiguously accepted (they may not be borderline), and they may not be considered for the distinguished artifact award.

Call for Artifacts

Submission

Submit your artifact via EasyChair: ESOP/FoSSaCS 2023 Artifact Submission.

When submitting, please provide the following:

  • Title: use the form “[ArtifactNN] <XX>”, where NN is the ID and XX is the title of your accepted ESOP/FoSSaCS paper.
  • Abstract: your paper’s abstract.
  • Keywords: your paper’s keywords.
  • Paper: the submitted version of your paper.
  • Documentation: the artifact’s documentation as described below.
  • (Optional) Supplementary materials: the artifact itself, if it is less than 15 MB in size.

General Info

A well-packaged artifact is more likely to be easily usable by the reviewers, saving them time and frustration, and more clearly conveying the value of your work during evaluation. A great way to package an artifact is as a Docker image or as a virtual machine that runs “out of the box” with very little system-specific configuration. We encourage authors to include pre-built binaries for all their code, so that reviewers can get started with little effort, together with the source and build scripts needed to regenerate those binaries, for maximum transparency. Providing pre-built VMs or Docker containers is preferable to providing scripts (e.g., Docker or Vagrant configurations) that build them, since this reduces reliance on external dependencies.

Submission of an artifact does not imply automatic permission to make its content public. AEC members will be instructed that they may not publicize any part of the submitted artifacts during or after completing evaluation, and they will not retain any part of any artifact after evaluation. Thus, you are free to include models, data files, proprietary binaries, and similar items in your artifact.

Artifact evaluation is single-blind.

Important Dates and Review Phases

  • Paper notification: 22 December 2022
  • Artifact submission: 5 January 2023
  • Review preferences due (for reviewers): 10 January 2023
  • Kick-the-tires phase review due: 17 January 2023
  • Full reviews due: 31 January 2023
  • Author-Reviewer interactions end: 3 February 2023
  • Artifact notification: 9 February 2023

The review proceeds in three phases:

Phase 1: Kick-the-tires phase (5 Jan 2023–17 Jan 2023)

In this phase, reviewers will go through the Getting Started Guide that accompanies each artifact and submit a short review based on the “basic functionality” described in it. These initial reviews will be made available to you, and reviewers will be able to communicate with you directly through EasyChair in order to debug simple issues that arise. The identity of reviewers remains anonymous (“Reviewer A”, “Reviewer B”, etc.).

Phase 2: Main review phase (18 Jan 2023–3 Feb 2023)

In this phase, reviewers will go through the Step-by-Step Instructions of each artifact and submit full reviews, extending and expanding upon the Phase 1 reviews as appropriate. As before, these reviews will be made available to you, and reviewers will be able to communicate with you directly through EasyChair. This phase ends on February 3rd, after which authors and reviewers no longer interact.

Phase 3: Final review phase (4 Feb 2023–10 Feb 2023)

The majority of artifact evaluations are expected to be complete after the previous two phases, requiring no additional back-and-forth with the authors. This third phase is for the reviewers to make their final evaluation of the artifacts based on the interactions from the previous two phases.

Generating the Artifact

Your artifact documentation should be a single PDF file (TXT was originally also allowed, but unfortunately EasyChair will not accept TXT files) containing the following:

  1. the Title of your submission, i.e., “[ArtifactNN] <XX>”,
  2. the URL pointing to a single file containing the artifact,
  3. the md5 hash of the single artifact file (use the md5 or md5sum command-line tool to generate the hash, or see the scripted sketch after this list),
  4. a Getting Started Guide (see details below), and
  5. Step-by-Step Instructions (see details below) for how you propose to evaluate your artifact, with appropriate connections to the relevant sections, figures, and tables of your paper. Results shown only as graphs in the paper should be automatically reproducible, i.e., generated, using free tools.
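
As an alternative to the command-line tools mentioned in item 3, the hash can also be computed with a few lines of Python. The sketch below is only illustrative; the file name artifact42.tgz is a placeholder for your own archive name.

    # Minimal sketch: compute the md5 hash of a (possibly large) artifact archive.
    # "artifact42.tgz" is a placeholder; substitute the name of your own archive.
    import hashlib

    def md5_of_file(path, chunk_size=1 << 20):
        digest = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    if __name__ == "__main__":
        print(md5_of_file("artifact42.tgz"))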

Artifacts do not need to be anonymous; reviewers will be aware of author identities.

The URL must be a Zenodo, Figshare, Google Drive, Dropbox, GitHub, Bitbucket, or (public) GitLab URL, to help protect the anonymity of the reviewers. You may upload your artifact directly as “Supplementary materials” if it is a single file less than 15 MB (please indicate this in the documentation).

The Getting Started Guide should contain setup instructions (including, for example, a pointer to the VM player software, its version, passwords if needed, etc.) and basic testing of your artifact that you expect a reviewer to be able to complete in 30 minutes. Reviewers will follow all the steps in the guide during an initial kick-the-tires phase. The Getting Started Guide should be as simple as possible, and yet it should stress the key elements of your artifact. Anyone who has followed the Getting Started Guide should have no technical difficulties with the rest of your artifact.
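
One way to keep this phase within the 30-minute budget is to ship a small smoke-test script that exercises the basic functionality in a single command. The sketch below is only an illustration: the tool name ./mytool, the example input, and the expected output marker are hypothetical placeholders for whatever your artifact actually provides.

    # Illustrative smoke test for a Getting Started Guide.
    # The command, input file, and expected output marker are placeholders.
    import subprocess
    import sys

    def smoke_test():
        result = subprocess.run(
            ["./mytool", "--verify", "examples/tiny.input"],  # placeholder command
            capture_output=True, text=True, timeout=300,
        )
        if result.returncode != 0:
            sys.exit(f"smoke test failed (exit code {result.returncode}):\n{result.stderr}")
        if "verification succeeded" not in result.stdout:  # placeholder marker
            sys.exit("smoke test failed: expected output marker not found")
        print("smoke test passed")

    if __name__ == "__main__":
        smoke_test()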

The Step-by-Step Instructions explain how to reproduce any experiments or other activities that support the conclusions in your paper. Write this for readers who have a deep interest in your work and are studying it to improve it or compare against it. If your artifact runs for more than a few minutes, point this out and explain how to run it on smaller inputs.

Where appropriate, include descriptions of and links to files (included in the archive) that represent expected outputs (e.g., the log files expected to be generated by your tool on the given inputs); if there are warnings that are safe to be ignored, explain which ones they are.
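
If you do ship expected-output files, a small comparison helper can spare reviewers from eyeballing long logs. The following Python sketch is one possible approach, not a required format; the file names and the list of benign warning prefixes are hypothetical placeholders.

    # Sketch: compare a freshly generated log against an expected log, ignoring
    # lines that begin with warnings the documentation declares safe to ignore.
    # File names and warning prefixes are placeholders.
    BENIGN_PREFIXES = ("Warning: deprecated option", "Warning: locale not detected")

    def significant_lines(path):
        with open(path) as f:
            return [line.rstrip("\n") for line in f
                    if not line.startswith(BENIGN_PREFIXES)]

    def logs_match(expected_path, actual_path):
        return significant_lines(expected_path) == significant_lines(actual_path)

    if __name__ == "__main__":
        ok = logs_match("expected/run1.log", "output/run1.log")
        print("logs match" if ok else "logs differ")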

The artifact’s documentation should include the following:

  • A list of claims from the paper supported by the artifact, and how/why.
  • A list of claims from the paper not supported by the artifact, and how/why. For example, performance claims might not be reproducible in a VM, the authors might not be allowed to redistribute specific benchmarks, etc. Artifact reviewers can then center their evaluation around these specific claims.
  • Expected running time for each of the experiments.

Packaging the Artifact

When packaging your artifact, please keep in mind:

  1. how accessible you are making your artifact to other researchers, and
  2. the fact that the AEC members will have a limited time in which to make an assessment of each artifact.

Your artifact should be provided as a container or a bootable virtual machine image with all the necessary libraries installed. We strongly encourage you to use a container (e.g., https://www.docker.com/). Using a container or a virtual machine image provides an easily reproducible environment that is less susceptible to bit rot. It also helps the AEC have confidence that errors or other problems cannot cause harm to their machines.

You should make your artifact available as a single archive file and use the naming convention <artifactNN>.<suffix>, where the appropriate suffix is used for the given archive format. Please use a widely available compressed archive format such as ZIP (.zip), tar and gzip (.tgz), or tar and bzip2 (.tbz2). Please use open formats for documents.
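
Purely as an illustration, the Python sketch below bundles an artifact directory into a single gzip-compressed tar file following that naming convention; the directory name artifact42 is a hypothetical placeholder.

    # Sketch: bundle the artifact directory into a single .tgz archive named
    # according to the <artifactNN>.<suffix> convention (here, artifact42.tgz).
    import tarfile

    def package_artifact(directory="artifact42", archive="artifact42.tgz"):
        with tarfile.open(archive, "w:gz") as tar:
            # Store the directory itself so the archive unpacks into one folder.
            tar.add(directory, arcname=directory)
        print(f"wrote {archive}")

    if __name__ == "__main__":
        package_artifact()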

Authors submitting machine-checked proof artifacts should consult Marianna Rapoport’s Proof Artifacts: Guidelines for Submission and Reviewing.

FAQ

My artifact requires special hardware (e.g., access to a cluster, GPU, CPU extensions such as TSX or SGX, etc.). Can I still submit it for the AEC?

My artifact uses huge amounts of data and/or my evaluation took a long time to complete (e.g., days or weeks or more). Can I still submit it for the AEC?

Yes. Each artifact should be complete to the extent possible and include everything needed to replicate all the experiments in full, together with instructions about how to do it.

Members of the AEC will spend roughly 8 hours per artifact. During this time, each AEC member will check the completeness of the artifact with regard to the results reported in the paper. Including scaled-down versions of the full experiments, or instructions for how to run them, is highly encouraged to assist AEC members in their task (together with a brief discussion of how the scaled-down experiments are representative of the full experiments).

For datasets that are very large (i.e., hundreds of GB or more), authors can submit a subset of the datasets to be evaluated by the AEC, and make the full datasets available to allow for future research.

Can my artifact be accepted if some of the paper’s claims are not supported by the artifact, for example if some benchmarks are omitted or the artifact does not include tools we experimentally compare against in the paper?

In general yes (if good explanations are provided, as explained above), but if such claims are essential to the overall results of the paper, the artifact will be rejected. As an extreme example, an artifact consisting of a working tool submitted with no benchmarks (e.g., if all benchmarks have source that may not be redistributed) would be rejected.

Why can’t we use GitHub for the Available badge?

Commercial repositories are unreliable, in that there is no guarantee the evaluated artifact will remain available indefinitely. Contrary to popular belief, it is possible to rewrite git commit history in a public repository (see the docs on git rebase and the “--force” option to git push, and note that git tags are mutable). Users can delete public repositories, or their accounts. And in addition to universities deleting departmental URLs over time, hosting companies also sometimes simply delete data: Bidding farewell to Google Code (2015), Sunsetting Mercurial Support in Bitbucket (2019).

Reviewers identified things to fix in documentation or scripts for our artifact, and we’d prefer to publish the fixed version. Can we submit the improved version for the Available badge?

Yes.

Can I get the Available badge without submitting an artifact? I’m still making the thing available!

Yes.

Can I get the Available badge for an artifact that was not judged to be Functional? I’m still making the thing available!

Yes.