
Caesar Subject Rule Effect Vulnerability Report

At the beginning of April 2020, we were notified that subjects from one Zooniverse project were appearing in a subject set of a separate project where they did not belong. Our investigation determined that this behavior was caused by a Caesar configuration mistake that used an incorrect subject set ID, and it exposed a wider vulnerability: project owners using Caesar were able to create Subject Rule Effects that added subjects to collections or subject sets, even without proper subject set editing permissions. We have fixed the issue surrounding Subject Rule Effects and eliminated this vulnerability, and we would like to share the details for anyone who is interested.

The issue was raised by project lead James Perry (@JamesPerry), who reported that subjects that didn’t belong to his project were appearing in his subject sets. An unrelated project had mistyped the subject set ID in a Caesar `add_to_subject_set` effect, so that Subject Rule Effect was sending the unrelated project’s subjects to one of James’s subject sets instead of the intended target.

Our immediate course of action was to fix the project impacted by the vulnerability, and push out a temporary code fix to prevent the vulnerability from being exploited. 

  1. To fix the affected project, we corrected the subject set ID in the misconfigured project’s effect and removed the unwanted subjects from James’s subject set.
  2. On April 3rd we deployed a temporary code fix to disable Subject Rule Effect creation and modification for all but admin users (see PR #1109). We communicated this change to the teams most affected by it, as well as to teams that reached out after seeing our notification banner or encountering a Caesar interface error.

On May 15th we pushed out a permanent fix that checks whether the user has permission to send data to the target subject set or collection. Specifically, the updated validation code checks that the user has update permissions on the project the subject set or collection is linked to (see PRs #1115, #1129 and #1131).
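To make the shape of that check concrete, here is a minimal Python sketch. It is not Caesar’s code (Caesar’s real models differ): the `User`, `Project`, and `TargetSet` classes, the `EDIT_ROLES` role names, and `validate_effect_target` are hypothetical stand-ins. The idea it illustrates is the one described above: an effect may only target a subject set or collection if the user can update the project that set or collection is linked to.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# Hypothetical stand-ins for Caesar/Panoptes records; not the real models.
EDIT_ROLES = {"owner", "collaborator"}  # assumption: roles that grant update access


@dataclass
class User:
    id: int
    is_admin: bool = False


@dataclass
class Project:
    id: int
    roles: Dict[int, Set[str]] = field(default_factory=dict)  # user id -> roles

    def can_update(self, user: User) -> bool:
        return user.is_admin or bool(self.roles.get(user.id, set()) & EDIT_ROLES)


@dataclass
class TargetSet:
    """A subject set or collection that an effect would send subjects to."""
    id: int
    project: Project  # the project this set or collection is linked to


class PermissionDenied(Exception):
    pass


def validate_effect_target(user: User, target: TargetSet) -> None:
    """Reject an add-to-set effect unless the user can update the target's project."""
    if not target.project.can_update(user):
        raise PermissionDenied(
            f"user {user.id} cannot update project {target.project.id}, "
            f"which owns target {target.id}"
        )


# Usage: the owner of project 1 points an effect at a set owned by project 2.
owner = User(id=10)
project_1 = Project(id=1, roles={10: {"owner"}})
project_2 = Project(id=2, roles={20: {"owner"}})
own_set = TargetSet(id=100, project=project_1)
foreign_set = TargetSet(id=200, project=project_2)

validate_effect_target(owner, own_set)  # allowed: passes silently
try:
    validate_effect_target(owner, foreign_set)
    blocked = False
except PermissionDenied:
    blocked = True
```

Validating at the moment an effect is created or modified means a mistyped ID pointing at someone else’s set fails loudly up front, instead of silently sending subjects to the wrong project.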

For anyone running their own hosted copy of Caesar, we recommend pulling these changes as soon as you’re able.

Cross-Post — Lessons from Space: Why Delay a Launch?

Today’s cross-post is from the blog of Chelsea Troy, one of our Zooniverse developers. Chelsea writes code for open source projects like our Zooniverse Citizen Science Mobile App and the NASA Landsat Image Processing Pipeline. She also teaches Mobile Software Development in the Master’s Program in Computer Science at the University of Chicago.

A SpaceX Falcon 9 rocket lifts off from Space Launch Complex 40 at Cape Canaveral Air Force Station in Florida at 11:50 p.m. EST on March 6, 2020, carrying the uncrewed cargo Dragon spacecraft on its journey to the International Space Station for NASA and SpaceX’s 20th Commercial Resupply Services (CRS-20) mission. Dragon will deliver more than 5,600 pounds of science investigations and cargo to the orbiting laboratory. Credit: NASA and

Chelsea was selected as a NASA Social appointee to attend the launch of last week’s CRS-20 cargo resupply mission to the International Space Station. This included attending the launch of the SpaceX Falcon 9 rocket and Dragon spacecraft, meeting with NASA’s social media team, touring NASA facilities at Kennedy, meeting with experts, and more. Check out her posts about the trip on Instagram and Twitter.

This post of Chelsea’s, on why the launch was delayed, resonated with us in particular as a web development team. Its lessons about the role of deadlines, the value of redundancy, and learning from past experiences and mistakes to make better predictions and mitigate risk apply across many fields, including ours.

Check out the full post on Chelsea’s blog. Enjoy!

Panoptes CLI 1.1 now available

I recently released version 1.1 of the Panoptes CLI – the command-line interface for managing Zooniverse projects. This update includes some exciting new features. Here are the highlights.

You can install the update by running pip install -U panoptescli. Any bugs or issues should be raised via GitHub. See the changelog for the full list of changes.

Resuming failed subject uploads

This one adds what is probably the CLI’s most requested feature: the ability to resume a failed upload from where it left off, without duplicating subjects or requiring manual changes to the manifest. I hope this will be a huge help to researchers, especially when uploading large manifests.

If the upload fails for any reason (an issue with our systems, a problem with your internet connection, a bug in the CLI itself, or you deciding to stop the upload by pressing Ctrl-C), the CLI will detect the problem and ask whether you want to be able to resume the upload later. If you say yes, it will save a new manifest in YAML format containing the remaining upload queue along with all of the upload’s command-line options. To resume, you just start a new upload using the YAML manifest instead of the original CSV.
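The resume behaviour can be pictured with a short Python sketch. This is not the CLI’s code: `upload_with_resume`, `load_resume_file`, and the saved file layout are hypothetical. The sketch counts how many rows have fully succeeded, and on any failure (including Ctrl-C, which Python raises as `KeyboardInterrupt`) it writes the unfinished rows plus the original options to a file. It writes JSON because the standard library can, and JSON is also valid YAML 1.2; the real CLI’s YAML layout may differ.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]


def upload_with_resume(rows: List[Row], options: Dict[str, str],
                       upload_one: Callable[[Row], None], resume_path: str) -> bool:
    """Upload rows in order. On failure, save the remaining queue and return False."""
    done = 0
    try:
        for row in rows:
            upload_one(row)  # may raise: network error, server error, Ctrl-C...
            done += 1        # only count a row once it has fully succeeded
    except (Exception, KeyboardInterrupt):
        # A real tool would ask the user first; this sketch always saves.
        # JSON is a subset of YAML 1.2, so this file also parses as YAML.
        with open(resume_path, "w") as f:
            json.dump({"options": options, "queue": rows[done:]}, f, indent=2)
        return False
    return True


def load_resume_file(resume_path: str) -> Tuple[List[Row], Dict[str, str]]:
    with open(resume_path) as f:
        state = json.load(f)
    return state["queue"], state["options"]


# Usage: simulate a connection failure on the third subject.
uploaded: List[str] = []
fail_on = {"3"}


def flaky_upload(row: Row) -> None:
    if row["id"] in fail_on:
        raise ConnectionError("simulated network failure")
    uploaded.append(row["id"])


rows = [{"id": str(i)} for i in range(1, 6)]
path = os.path.join(tempfile.mkdtemp(), "resume.yaml")
first_try = upload_with_resume(rows, {"subject_set": "1357"}, flaky_upload, path)

# Later, once the network is back, resume from the saved queue.
fail_on.clear()
remaining, opts = load_resume_file(path)
second_try = upload_with_resume(remaining, opts, flaky_upload, path)
```

Because a row is only dropped from the queue after it succeeds, resuming never re-uploads a finished subject, which is what avoids duplicates.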

Multithreaded subject uploads

Uploading new subjects can often take a long time. The total upload time depends not only on your internet connection speed, but also on the time it takes for the CLI to talk to the Panoptes API. Creating a new subject typically requires the CLI to make two HTTP requests: one to create the subject and one to upload the subject’s media (the image, or video, or whatever). If the subject has multiple images then that only increases the number of requests. Plus subjects need to be linked to the subject set; this happens in batches, but it can still add up to a lot of requests for large uploads. If you’re uploading 10,000 subjects for example, that means the CLI has to make a minimum of 20,000 requests (probably more), and each of those requests includes some overhead where the CLI is waiting for the server to respond, which is all basically wasted time.

Luckily the Panoptes CLI 1.1 gets around that, by taking advantage of the multithreading features of the Panoptes Client for Python which were released earlier this year. Now, those 20,000 requests will happen five at a time, so for example three of them can be sending data while two of them are waiting for the server, meaning your internet connection is fully utilised the whole time and no time is wasted. In my testing, this substantially sped up subject uploads, potentially saving hours of your time.
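The effect of running requests five at a time can be seen with Python’s standard `concurrent.futures.ThreadPoolExecutor`. This is an illustration, not the CLI’s implementation: `fake_request` is a hypothetical stand-in that just sleeps to mimic waiting on the server, and the counters record how many requests were in flight at once.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List

MAX_WORKERS = 5          # "five at a time", as described above
FAKE_LATENCY_S = 0.05    # pretend each request spends 50 ms waiting on the server

_lock = threading.Lock()
_in_flight = 0
max_in_flight = 0


def fake_request(n: int) -> int:
    """Stand-in for one HTTP request; sleeping releases the GIL, like network I/O."""
    global _in_flight, max_in_flight
    with _lock:
        _in_flight += 1
        max_in_flight = max(max_in_flight, _in_flight)
    time.sleep(FAKE_LATENCY_S)
    with _lock:
        _in_flight -= 1
    return n


def run_requests(count: int) -> List[int]:
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        # map() returns results in input order even though the calls overlap.
        return list(pool.map(fake_request, range(count)))


start = time.perf_counter()
results = run_requests(20)
elapsed = time.perf_counter() - start
# One at a time, 20 requests would take about 20 * 0.05 = 1.0 s;
# with five workers this should take roughly a fifth of that.
```

Threads suit this workload because the time is spent waiting on the network rather than computing, so the waits overlap instead of adding up.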

Adding and removing lists of subjects to and from subject sets

Often project owners need to add large numbers of existing subjects to a new set, or remove subjects from their current set. It was possible to do this with the previous version of the CLI by passing subject IDs on the command-line, but it was often difficult to modify large numbers of subjects this way (it was possible with xargs on Linux and macOS, but this isn’t the most intuitive way to do it).

Now, there’s a new option to pass a list of IDs in a text file rather than having to specify IDs on the command-line. (The old way is still there too if you prefer to do it that way!) Just produce a text file containing the relevant subject IDs, one per line. If you already have the subject information in a spreadsheet, exporting a CSV file with just the subject ID column will produce the right file (just make sure it only contains the one column).

For example, if you have a file called subject_ids.csv containing the following:

1234
5678
9012

You can run:

panoptes subject-set add-subjects -f subject_ids.csv 1357

to add subjects 1234, 5678, and 9012 to subject set 1357.
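If you would rather script the same operation, the Python sketch below reads an ID file in the format described above and hands the IDs to the Panoptes Client for Python. `read_subject_ids` and `add_ids_to_set` are hypothetical helpers; skipping a non-numeric header row (such as one left by a spreadsheet export) is an assumption about what you might want, not documented CLI behaviour. `add_ids_to_set` assumes you have installed `panoptes-client` and already called `Panoptes.connect()`.

```python
import csv
import os
import tempfile
from typing import List


def read_subject_ids(path: str) -> List[int]:
    """Read one subject ID per line from a single-column text or CSV file."""
    ids: List[int] = []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            cells = [c.strip() for c in row if c.strip()]
            if not cells:
                continue  # skip blank lines
            if len(cells) > 1:
                raise ValueError(f"expected one column, got {len(cells)}: {row}")
            if not cells[0].isdigit():
                continue  # assumption: treat a non-numeric line as a header
            ids.append(int(cells[0]))
    return ids


def add_ids_to_set(path: str, subject_set_id: int) -> None:
    """Hypothetical helper: link every ID in `path` to the given subject set."""
    from panoptes_client import SubjectSet  # pip install panoptes-client
    SubjectSet.find(subject_set_id).add(read_subject_ids(path))


# Usage with the example IDs from the post (plus a header row).
path = os.path.join(tempfile.mkdtemp(), "subject_ids.csv")
with open(path, "w") as f:
    f.write("subject_id\n1234\n5678\n9012\n")
ids = read_subject_ids(path)
```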


Edited 29 November 2019: Fixed typo in pip command for upgrade.

Live Coding the Zooniverse

Here at the Zooniverse, we make scientific discovery accessible to the community. Now, we’re incorporating that philosophy into our software engineering.

Our mobile developer, Chelsea Troy, live streams some of her development work on the Zooniverse Mobile App (available on the Apple App Store for iOS and Google Play for Android). This means that you can watch her as she codes, and you can even submit questions and suggestions while she is working!  For an introduction to the App and Chelsea’s code development efforts, check out this YouTube video.

Why did we decide to try out live coding? Chelsea talks a little bit about that decision in this blog post. Among the reasons: live coding videos are a great way to attract and recruit possible open source contributors whose work on the Zooniverse mobile app and other codebases could greatly benefit the Zooniverse.

After each live stream, a recording of the session will remain on YouTube. Chelsea also publishes show notes for each stream that include a link to the video, a link to the pull request created in the video, an outline of what we covered in the video (with timestamps), and a list of the parts of the video that viewers found the most useful.

Sound interesting? Willing to contribute to Zooniverse open source code development? Keep an eye on Chelsea’s Twitter account (@heychelseatroy) and blog for future live stream events.  But go ahead and check out the recording of her first live stream and show notes to get you started.

For more information on the mobile app, see related blog posts:
Blog Entry: Notes on the Zooniverse Mobile App – New Functionality Release
Blog Entry: A First Look at Mobile Usage and Results

Featured Image Credit: Reddit/cavepopcorn

The Zooniverse is Now Powered by Kubernetes

We recently finished the first stage in a pretty big change to our web hosting infrastructure. We’ve moved all of our smaller backend services (everything except Panoptes, Ouroboros, and frontend code) into a Kubernetes cluster. I’m pretty excited about this change, so I wanted to share what we’ve done and what we’ll be doing next.

Kubernetes is a container orchestration system: it lets us run applications on a cluster of servers without having to worry about which specific server each one is running on. There are a few different products out there that do this sort of thing, and before Kubernetes we were using Docker Swarm. We didn’t find Docker Swarm to be a great fit for us, but we’re really pleased with Kubernetes and what it’s letting us do.
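As a rough mental model of what an orchestrator does for us, here is a toy Python scheduler. It is not how Kubernetes is implemented, though it borrows the broad two-step idea the Kubernetes scheduler documents: first filter out nodes without room for a container’s resource requests, then score the remaining nodes and pick the best. The node sizes, service names, and the “most free CPU wins” score are made-up assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    cpu_m: int     # CPU capacity in millicores (1000m = one core)
    mem_mib: int   # memory capacity in MiB
    used_cpu_m: int = 0
    used_mem_mib: int = 0
    pods: List[str] = field(default_factory=list)

    def fits(self, cpu_m: int, mem_mib: int) -> bool:
        return (self.used_cpu_m + cpu_m <= self.cpu_m
                and self.used_mem_mib + mem_mib <= self.mem_mib)


@dataclass
class Pod:
    name: str
    cpu_m: int
    mem_mib: int


def schedule(pod: Pod, nodes: List[Node]) -> Optional[Node]:
    # Filter: keep only nodes with enough spare CPU and memory.
    candidates = [n for n in nodes if n.fits(pod.cpu_m, pod.mem_mib)]
    if not candidates:
        return None  # nothing fits; the pod waits until capacity frees up
    # Score: prefer the node with the most spare CPU (ties go to the first node).
    best = max(candidates, key=lambda n: n.cpu_m - n.used_cpu_m)
    best.used_cpu_m += pod.cpu_m
    best.used_mem_mib += pod.mem_mib
    best.pods.append(pod.name)
    return best


# Usage: four hypothetical services sharing a two-node cluster.
nodes = [Node("node-1", 2000, 4096), Node("node-2", 2000, 4096)]
pods = [Pod("api", 1000, 1024), Pod("worker", 1000, 1024),
        Pod("big-job", 1500, 1024), Pod("cron", 1000, 512)]
placement: Dict[str, Optional[str]] = {}
for p in pods:
    node = schedule(p, nodes)
    placement[p.name] = node.name if node else None
```

Note that when `big-job` is considered, the cluster has 2000m of CPU free in total but no single node has 1500m free, so it stays unscheduled; pooled resources still have to fit on an actual machine.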

As a result of moving to Kubernetes, we’ve been able to fully automate the process of updating our server-side apps when we make changes to the code. This automation is important because deploying updated code is no longer a bottleneck in our development process: any member of our team can easily deploy changes, even in components they haven’t worked on before. This smooths out our development process and should make our jobs a little easier, so we can focus on building the Zooniverse without our infrastructure getting in the way.

Not only has Kubernetes made it easier for us to automate things, but we’ve also found it to be a lot more reliable. So much so, in fact, that we’re now planning to move all of our web services into a Kubernetes cluster, including Panoptes and our main HTTP frontend servers. This is the part I’m really excited about! By making this change, we’ll be making our infrastructure a lot simpler to manage while also saving money by using our cloud computing resources more efficiently (since the cluster’s resources are pooled for everything to share). That should obviously be a huge win, because it will leave more time and money for everything else we do.

Watch this space for updates as we make more improvements to our infrastructure over the coming months!

Panoptes Client for Python 1.1

I’ve just released version 1.1 of the Panoptes Client for Python. The changelog has a full list of what’s new, but there are a few things I wanted to highlight, the first two of which will make it substantially faster to create new subjects:

  • Multithreaded media uploads – the client will automatically use several threads to upload media when you first save a new subject. So, for example, if you create a subject which has three images they will all upload simultaneously (up to five simultaneous uploads, then it will queue them).
  • Multithreaded subject creation – you can also create the subjects themselves simultaneously. That means if you’re creating, say, a thousand subjects, the client can queue them all and create up to five of them at once. This works in conjunction with the media uploads, using one combined queue for subject creation and media uploads, to avoid overloading the network and to make sure subject creation doesn’t get too far ahead of the uploads. This one isn’t automatic – you’ll need to create your subjects inside the new Subject.async_saves() context manager to take advantage of it.
  • Retries for all GET requests – we’re quite proud of how reliable the Zooniverse platform is, but sometimes server-side errors do happen. The client will now automatically retry all GET requests (i.e. the ones that don’t modify any data) if an error occurs, improving reliability.
  • Retries for batch linking operations – similar to above, the client will retry any add/remove operations via the new LinkCollection class, which handles linking groups of objects (e.g. subjects to a subject set, subjects to a collection, etc.). This means you should see far fewer failures when linking thousands of subjects to a subject set, for example.
  • Context manager for multiple connections – the Panoptes class can now act as a context manager, providing a safe way to perform operations as multiple users (for example, in a web app).

You can install the update by running pip install -U panoptes-client. Any bugs or issues should be raised via GitHub.

Fixed Cross-Site Scripting Vulnerability on Project Page’s External Links

We recently fixed a security vulnerability in the external/social links (e.g. to Twitter) on project pages. Prior to this fix, it was possible for project owners to do two things: 1. create external links that ran malicious JavaScript code when clicked (e.g. allowing attackers to capture a user’s login session), and 2. create a link to a malicious website disguised as a “legitimate” link to a known social website (e.g. a Twitter link that actually directed users to a spoof website). Our patches fix both issues, and a follow-up investigation found no indication that this vulnerability was exploited by anyone.

The security issue was discovered on 11 Dec 2018 during an internal security check. The first patch in the series (addressing the major JavaScript injection vulnerability) was deployed within 6 hours, and the final patch (addressing the relatively less harmful issue of spoofable social links) was deployed 2 days later.

The fixes for this vulnerability are contained in pull requests #5141, #5142, and #5148 of the Panoptes Front End project on GitHub. Anyone running their own hosted copy of this should pull these changes as soon as possible.

Additional notes on our investigation are as follows:

  • The vulnerability was introduced on 14 May 2015, in pull request 324.
  • Custom links for projects and organisations allowed project builders to make a user’s browser execute arbitrary JavaScript by entering URLs like `javascript: alert('oh no');`. If a user clicked such a link, the JavaScript would run.
  • Malicious JavaScript executed this way could do whatever it wanted on the site; for example, it could have stolen logged-in users’ API tokens, or logged users out and captured their passwords when they logged back in.
  • Theoretically, passwords may have been exposed if malicious JavaScript captured them on login, though this would only affect users who clicked malicious links. Email addresses may have been exposed, notably if an admin user account was breached.
  • However, we audited the database and found no evidence (other than our own tests) of this having been done by project owners.
  • Our current solution is to sanitise all external/social links, both when taking input from users and when rendering them on webpages, allowing only standard website URLs to pass.
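The allow-list approach in the last point can be sketched in Python with the standard `urllib.parse` module. The actual fix lives in the Panoptes Front End’s JavaScript; `sanitize_link` and the `SOCIAL_HOSTS` table here are hypothetical. Rather than trying to block known-bad schemes like `javascript:`, the sketch accepts only `http`/`https` URLs, and for a social link it also requires the hostname to belong to that service, which catches look-alike hosts.

```python
from typing import Dict, Optional, Set
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "https"}

# Hypothetical table of hosts each social link type may point at.
SOCIAL_HOSTS: Dict[str, Set[str]] = {
    "twitter": {"twitter.com", "www.twitter.com"},
    "facebook": {"facebook.com", "www.facebook.com"},
}


def sanitize_link(url: str, social: Optional[str] = None) -> Optional[str]:
    """Return the cleaned URL if it is safe to render as a link, else None."""
    cleaned = url.strip()
    parsed = urlparse(cleaned)
    # Allow-list: anything that isn't a plain web URL (javascript:, data:, ...) fails.
    if parsed.scheme.lower() not in ALLOWED_SCHEMES or not parsed.netloc:
        return None
    if social is not None:
        # .hostname is lower-cased and excludes any "user@" prefix and port.
        if parsed.hostname not in SOCIAL_HOSTS.get(social, set()):
            return None
    return cleaned
```

An allow-list is the safer design here: a block-list has to anticipate every dangerous scheme and encoding trick, while an allow-list rejects anything it doesn’t explicitly recognise.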

As a side effect of our fixes, project owners can no longer add non-standard URLs to their project’s external links; standard website URLs continue to work fine.

We apologise for any concern this issue may have caused.

Zooniverse Workflow Bug

We recently uncovered a couple of bugs in the Zooniverse code which meant that some volunteers may have been shown the wrong question text while classifying on Zooniverse projects. The bugs were caught, and a fix released, on the same day: 29th November 2018.

The bugs only affected some projects with multiple live workflows from 6th-12th and 20th-29th November.

One of the bugs was difficult to recreate and relied on a complex timing of events, so we think it was rare and probably did not affect a significant fraction of classifications; hopefully it will not have caused major issues with the general consensus on the data. However, we cannot say exactly which classifications were affected while the bug was active.

We have apologised to the relevant science teams for the issues this may cause with their data analysis, but we would also like to extend our apologies to all volunteers who took part in these projects while the bugs were in effect. It is of the utmost importance to us that no effort is wasted on our projects, and when something like this happens the Zooniverse team takes it very seriously. Since discovering these bugs we have worked tirelessly to fix them, and we have taken steps to make sure nothing like this happens in the future.

We hope that you accept our most sincere apologies and continue the amazing work you do on the Zooniverse. If you have any questions, please don’t hesitate to contact us.


The Zooniverse Team

Zooniverse Data Aggregation

Hi all, I am Coleman Krawczyk and for the past year I have been working on tools to help Zooniverse research teams work with their data exports.  The current version of the code (v1.3.0) supports data aggregation for nearly all the project builder task types, and support will be added for the remaining task types in the coming months.

What does this code do?

This code provides tools that allow research teams to process and aggregate the classifications made on their project; in other words, it calculates the consensus answer for each subject based on the volunteer classifications.

The code is written in Python, but it can be run entirely using three command-line scripts (no Python knowledge needed) and a project’s data exports.


The first script uses a project’s workflow data export to auto-configure which extractors and reducers (see below) should be run for each task in the workflow. This produces a series of `yaml` configuration files with reasonable default values selected.


Next the extraction script takes the classification data export and flattens it into a series of `csv` files, one for each unique task type, that only contain the data needed for the reduction process.  Although the code tries its best to produce completely “flat” data tables, this is not always possible, so more complex tasks (e.g. drawing tasks) have structured data for some columns.
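To give a feel for what “flattening” means, here is a much-simplified Python sketch of extracting one question task. It is not the aggregation code’s extractor: it assumes rows shaped like those `csv.DictReader` would produce from a classification export, with the `annotations` column holding a JSON list of `{"task": ..., "value": ...}` objects, and `extract_question` is a hypothetical name.

```python
import json
from typing import Dict, List

Row = Dict[str, str]


def extract_question(classifications: List[Row], task: str) -> List[Dict[str, object]]:
    """Flatten one question task into one row per classification."""
    flat: List[Dict[str, object]] = []
    for c in classifications:
        for annotation in json.loads(c["annotations"]):
            if annotation.get("task") != task:
                continue  # other tasks get their own extractor
            flat.append({
                "classification_id": c["classification_id"],
                "subject_id": c["subject_ids"],
                "task": task,
                "value": annotation.get("value"),
            })
    return flat


# Usage: two classifications of subject 101; only the first also has a drawing task T1.
export = [
    {"classification_id": "1", "subject_ids": "101",
     "annotations": json.dumps([{"task": "T0", "value": "Yes"},
                                {"task": "T1", "value": [{"x": 10, "y": 20}]}])},
    {"classification_id": "2", "subject_ids": "101",
     "annotations": json.dumps([{"task": "T0", "value": "No"}])},
]
rows = extract_question(export, "T0")
```

The drawing task’s nested list of points is a good example of why some extracted columns stay structured: it doesn’t reduce to a single flat value.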


The final script takes the results of the data extraction and combines them into a single consensus result for each subject and each task (e.g. vote counts, clustered shapes, etc.). For more complex tasks (e.g. drawing tasks), the reducer’s configuration file accepts parameters to help tune the aggregation algorithms to work best with the data at hand.
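For a question task, the reduction step amounts to counting votes per subject. The Python sketch below is a hypothetical simplification: `reduce_votes` is not part of the aggregation code, and the “agreement” fraction is added here purely for illustration. It takes flattened rows like those an extractor produces and returns the vote counts plus the plurality answer for each subject.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


def reduce_votes(rows: Iterable[Dict[str, str]]) -> Dict[str, Dict[str, object]]:
    """Count answers per subject and report the plurality (consensus) answer."""
    votes: Dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        votes[row["subject_id"]][row["value"]] += 1
    results: Dict[str, Dict[str, object]] = {}
    for subject_id, counts in votes.items():
        # most_common keeps first-seen order among ties, so ties go to the earlier answer.
        answer, top = counts.most_common(1)[0]
        results[subject_id] = {
            "counts": dict(counts),
            "consensus": answer,
            "agreement": top / sum(counts.values()),
        }
    return results


# Usage: three volunteers saw subject 101, one saw subject 102.
rows = [
    {"subject_id": "101", "value": "Yes"},
    {"subject_id": "101", "value": "Yes"},
    {"subject_id": "101", "value": "No"},
    {"subject_id": "102", "value": "No"},
]
summary = reduce_votes(rows)
```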

A full example using these scripts can be found in the documentation.

Future for this code

At the moment this code is provided in its “offline” form, but we are testing ways for this aggregation to be run “live” on a Zooniverse project. When that system is finished, a research team will be able to enter their configuration parameters directly in the project builder, a server will run the aggregation code, and the extracted or reduced `csv` files will be made available for download.

Panoptes Client for Python 1.0.3

Hot on the heels of last week’s update, I’ve just released version 1.0.3 of the Python Panoptes Client, which fixes a bug introduced in the previous release. If you encounter a TypeError when you try to create subjects, please update to this new version and that should fix it.

This release also updates the default client ID that is used to identify the client to the Panoptes API. This is to ensure that each of our API clients is using a unique ID.

As before, you can install the update by running pip install -U panoptes-client.