Zooniverse Data Aggregation

Hi all, I am Coleman Krawczyk and for the past year I have been working on tools to help Zooniverse research teams work with their data exports. The current version of the code (v1.3.0) supports data aggregation for nearly all the project builder task types, and support will be added for the remaining task types in the coming months.

What does this code do?

This code provides tools to allow research teams to process and aggregate classifications made on their project, or in other words, this code calculates the consensus answer for a given subject based on the volunteer classifications.

The code is written in python, but it can be run completely using three command line scripts (no python knowledge needed) and a project’s data exports.

Configuration

The first script is the uses a project’s workflow data export to auto-configure what extractors and reducers (see below) should be run for each task in the workflow. This produces a series of `yaml` configuration files with reasonable default values selected.

Extraction

Next the extraction script takes the classification data export and flattens it into a series of `csv` files, one for each unique task type, that only contain the data needed for the reduction process. Although the code tries its best to produce completely “flat” data tables, this is not always possible, so more complex tasks (e.g. drawing tasks) have structured data for some columns.

Reduction

The final script takes the results of the data extraction and combine them into a single consensus result for each subject and each task (e.g. vote counts, clustered shapes, etc…). For more complex tasks (e.g. drawing tasks) the reducer’s configuration file accepts parameters to help tune the aggregation algorithms to best work with the data at hand.

A full example using these scripts can be found in the documentation.

Future for this code

At the moment this code is provided in its “offline” form, but we testing ways for this aggregation to be run “live” on a Zooniverse project. When that system is finished a research team will be able to enter their configuration parameters directly in the project builder, a server will run the aggregation code, and the extracted or reduced `csv` files will be made available for download.

Zooniverse

Zooniverse Data Aggregation

What does this code do?

Configuration

Extraction

Reduction

Future for this code

Leave a comment Cancel reply

The Zooniverse Blog. We're the world's largest and most successful citizen science platform and a collaboration between the University of Oxford, The Adler Planetarium, and friends

What does this code do?

Configuration

Extraction

Reduction

Future for this code

Share this:

Related

Leave a comment Cancel reply

The Zooniverse Blog. We're the world's largest and most successful citizen science platform and a collaboration between the University of Oxford, The Adler Planetarium, and friends