FEVER2.0 Shared Task Server

Welcome to the scoring server for the FEVER2.0 shared task. Participants can use this system to submit entries for all three phases of the task.

Task Definition

The FEVER 2.0 Shared Task will build upon work from the first shared task in a Build it, Break it, Fix it setting. The shared task will comprise three phases. In the first phase, Builders build systems for solving the first FEVER shared task dataset. The highest scoring systems from the first shared task will be used as baselines, and we will also invite new participants to develop new systems.

In the second phase, Breakers are tasked with generating adversarial examples to fool the existing systems. We consider only novel claims (i.e. claims not contained in the original FEVER dataset) labelled Supports, Refutes or NotEnoughInfo. Supported or refuted claims should be accompanied by evidence from the Wikipedia dump used in the original task; claims labelled NotEnoughInfo do not require evidence. The Breakers will have access to the systems so that they can generate claims that are challenging for the Builders. Alongside the label and evidence for each claim, Breakers will be asked to provide meta-information about the type of attack they are introducing. Breakers will be invited to submit up to a fixed number of claims as their entry to the shared task. We welcome both manual methods (using our annotation interface) and automated methods for this phase. Half of the claims generated by the Breakers will be retained as a hold-out blind test set, and the remaining half will be released to the participants so that they can fix their systems. The blind set will be manually evaluated by the organisers for quality assurance.
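The requirements on a Breaker entry can be sketched as a simple check. This is only an illustration, not the submission format used by the server: the class name AdversarialClaim, its field names, the representation of evidence as (page, sentence index) pairs, and the random half/half split are all assumptions made for this example.

```python
import random
from dataclasses import dataclass, field

# The three labels named in the task description.
LABELS = {"Supports", "Refutes", "NotEnoughInfo"}


@dataclass
class AdversarialClaim:
    """Hypothetical record for one Breaker claim (field names are assumed)."""
    claim: str
    label: str
    # Assumed shape: a list of (Wikipedia page, sentence index) pairs.
    evidence: list = field(default_factory=list)
    # Meta-information describing the type of attack.
    attack_type: str = ""


def validate(entry, original_claims):
    """Return a list of problems; an empty list means the entry looks valid."""
    problems = []
    if entry.label not in LABELS:
        problems.append("unknown label")
    # Only novel claims are considered.
    if entry.claim in original_claims:
        problems.append("claim is not novel")
    # Supports/Refutes need evidence; NotEnoughInfo does not.
    if entry.label in {"Supports", "Refutes"} and not entry.evidence:
        problems.append("missing evidence")
    if not entry.attack_type:
        problems.append("missing attack type")
    return problems


def split_in_half(entries, seed=0):
    """Split entries into (blind hold-out, released) halves.

    Assumption: the split is random; the task description does not say
    how the two halves are chosen.
    """
    shuffled = list(entries)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]
```

For instance, validate would flag a Refutes claim submitted without evidence, while a NotEnoughInfo claim with an attack type but no evidence would pass.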

In the final phase of the shared task, the original Builders or teams of dedicated Fixers must incorporate the new data generated by the Breakers to improve the systems' classification performance.