We recently finished the first stage in a pretty big change to our web hosting infrastructure. We’ve moved all of our smaller backend services (everything except Panoptes, Ouroboros, and frontend code) into a Kubernetes cluster. I’m pretty excited about this change, so I wanted to share what we’ve done and what we’ll be doing next.
Kubernetes is what’s called a container orchestration system, which is a system that lets us run applications on a cluster of servers without having to worry about which specific server each thing is running on. There are a few different products out there that do this sort of thing, and prior to this we were using Docker Swarm. We didn’t find Docker Swarm to be a great fit for us, but we’re really pleased with Kubernetes and what it’s letting us do.
As a result of moving to Kubernetes, we’ve been able to fully automate the process of updating our server-side apps when we make changes to the code. This automation is important, because it means that the process of deploying updated code is no longer a bottleneck in our development process – it means that any member of our team can easily deploy changes, even in components they haven’t worked on before. This smooths out our development process and it should make our jobs a little easier, meaning we can more easily focus on the job of building the Zooniverse without our infrastructure getting in the way.
Not only has Kubernetes made it easier for us to automate things, but we’ve also found it to be a lot more reliable. So much so, in fact, that we’re now planning to move all of our web services into a Kubernetes cluster, including Panoptes and our main HTTP frontend servers. This is the part I’m really excited about! By making this change, we’ll be making our infrastructure a lot simpler to manage while also saving money by using our cloud computing resources more efficiently (since the cluster’s resources are pooled for everything to share). That should obviously be a huge win, because it will leave more time and money for everything else we do.
Watch this space for updates as we make more improvements to our infrastructure over the coming months!
I’ve just released version 1.1 of the Panoptes Client for Python. The changelog has a full list of what’s new, but there are a few things I wanted to highlight, the first two of which will make it substantially faster to create new subjects:
Multithreaded media uploads – the client will automatically use several threads to upload media when you first save a new subject. So, for example, if you create a subject which has three images they will all upload simultaneously (up to five simultaneous uploads, then it will queue them).
Multithreaded subject creation – you can also simultaneously create the subjects themselves. That means if you’re creating, say, a thousand subjects, the client can queue them all and create up to five of them simultaneously. This works in conjunction with the media uploads, using one combined queue for the subject creation and the media uploads, to avoid overloading the network and to make sure the subject creation doesn’t get too far ahead of the uploads. This one isn’t automatic – you’ll need to create your subjects with the new Subject.async_saves() context manager to take advantage of it.
Retries for all GET requests – we’re quite proud of how reliable the Zooniverse platform is, but sometimes server-side errors do happen. The client will now automatically retry all GET requests (i.e. the ones that don’t modify any data) if an error occurs, improving reliability.
Retries for batch linking operations – similar to above, the client will retry any add/remove operations via the new LinkCollection class, which handles linking groups of objects (e.g. subjects to a subject set or subjects to a collection). This means you should see far fewer failures when linking thousands of subjects to a subject set, for example.
Context manager for multiple connections – the Panoptes class can now act as a context manager, providing a safe way to perform operations as multiple users (for example, in a web app).
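To illustrate the idea behind the combined queue described above, here is a minimal standard-library sketch. It is not the client's actual implementation: `create_subject`, `upload_media`, and `save_all` are hypothetical placeholders, and the sketch only shows how a single pool of five workers can serve both subject creation and media uploads so that no more than five operations run at once.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_WORKERS = 5  # the release notes mention up to five simultaneous operations

_active = 0
_peak = 0
_lock = threading.Lock()


def _tracked(work):
    """Run work() while recording how many tasks are running at once."""
    global _active, _peak
    with _lock:
        _active += 1
        _peak = max(_peak, _active)
    try:
        time.sleep(0.01)  # stand-in for network latency
        return work()
    finally:
        with _lock:
            _active -= 1


def create_subject(name):
    # Hypothetical placeholder for the API call that creates a subject.
    return _tracked(lambda: name)


def upload_media(name, filename):
    # Hypothetical placeholder for uploading one media file.
    return _tracked(lambda: (name, filename))


def save_all(subjects):
    """subjects maps a subject name to its list of media filenames."""
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        creations = [pool.submit(create_subject, name) for name in subjects]
        uploads = []
        for done in as_completed(creations):
            name = done.result()
            # Uploads join the same queue as the remaining creations.
            uploads += [pool.submit(upload_media, name, f) for f in subjects[name]]
        return sorted(u.result() for u in uploads)


subjects = {f"subject-{i}": [f"img-{i}-{j}.jpg" for j in range(3)] for i in range(20)}
uploaded = save_all(subjects)
peak_concurrency = _peak
print(len(uploaded), "uploads, peak concurrency", peak_concurrency)
```

Because both kinds of work share one executor, the five-worker limit applies to the total rather than to each kind separately.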
You can install the update by running pip install -U panoptes-client. Any bugs or issues should be raised via GitHub.
The fixes for this vulnerability are contained in pull requests #5141, #5142, and #5148 of the Panoptes Front End project on GitHub. Anyone running their own hosted copy of this should pull these changes as soon as possible.
Additional notes on our investigation are as follows:
The vulnerability was introduced on 14 May 2015, in pull request 324.
However, we audited the database and could find no evidence (other than our own tests) of this having been done by project owners.
Our current solution is to sanitise all external/social links – both when taking input from users and when rendering them on webpages – and to allow only standard website URLs to pass.
As a side effect of our fixes, project owners are now unable to add non-standard website URLs to their project’s external links – for example, https://example.com continues to work fine, but mailto:firstname.lastname@example.org no longer does.
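The allow-list approach described in these notes can be sketched roughly as follows. This is an illustrative Python version, not the code from the pull requests (the front end itself is not written in Python). The names `is_standard_website_url` and `sanitise_links` are hypothetical, and treating "standard website URLs" as `http`/`https` URLs with a host is an assumption.

```python
from urllib.parse import urlsplit

# Assumption: "standard website URLs" means http and https only.
ALLOWED_SCHEMES = {"http", "https"}


def is_standard_website_url(url):
    """Return True only for http(s) URLs that include a host."""
    try:
        parts = urlsplit(url.strip())
    except ValueError:
        # Malformed input (for example a broken IPv6 host) is rejected.
        return False
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.netloc)


def sanitise_links(links):
    """Keep only the links whose URL passes the allow-list check."""
    return [link for link in links if is_standard_website_url(link["url"])]


links = [
    {"label": "Website", "url": "https://example.com"},
    {"label": "Email", "url": "mailto:firstname.lastname@example.org"},
    {"label": "Bad", "url": "javascript:alert(1)"},
]
safe_links = sanitise_links(links)
print([link["label"] for link in safe_links])
```

Running the same check both on input and at render time, as the notes describe, means a link that slipped into the database before the fix is still filtered out when the page is displayed.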
We apologise for any concern this issue may have caused.
Hi all, I am Coleman Krawczyk and for the past year I have been working on tools to help Zooniverse research teams work with their data exports. The current version of the code (v1.3.0) supports data aggregation for nearly all the project builder task types, and support will be added for the remaining task types in the coming months.
What does this code do?
This code provides tools to allow research teams to process and aggregate classifications made on their project, or in other words, this code calculates the consensus answer for a given subject based on the volunteer classifications.
The code is written in Python, but it can be run completely using three command line scripts (no Python knowledge needed) and a project’s data exports.
The first script uses a project’s workflow data export to auto-configure what extractors and reducers (see below) should be run for each task in the workflow. This produces a series of `yaml` configuration files with reasonable default values selected.
Next the extraction script takes the classification data export and flattens it into a series of `csv` files, one for each unique task type, that only contain the data needed for the reduction process. Although the code tries its best to produce completely “flat” data tables, this is not always possible, so more complex tasks (e.g. drawing tasks) have structured data for some columns.
The final script takes the results of the data extraction and combines them into a single consensus result for each subject and each task (e.g. vote counts, clustered shapes, etc.). For more complex tasks (e.g. drawing tasks) the reducer’s configuration file accepts parameters to help tune the aggregation algorithms to best work with the data at hand.
At the moment this code is provided in its “offline” form, but we are testing ways for this aggregation to be run “live” on a Zooniverse project. When that system is finished a research team will be able to enter their configuration parameters directly in the project builder, a server will run the aggregation code, and the extracted or reduced `csv` files will be made available for download.
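As a rough illustration of what a reduction step does for a simple question task, here is a minimal sketch that turns flat extracts into vote counts and a consensus answer per subject. It is not the aggregation code itself: `reduce_votes` and the `(subject_id, answer)` input format are assumptions made for this example.

```python
from collections import Counter, defaultdict


def reduce_votes(extracts):
    """Combine (subject_id, answer) extracts into one result per subject."""
    votes = defaultdict(Counter)
    for subject_id, answer in extracts:
        votes[subject_id][answer] += 1

    reductions = {}
    for subject_id, counts in votes.items():
        answer, top_votes = counts.most_common(1)[0]
        reductions[subject_id] = {
            "counts": dict(counts),
            "consensus": answer,
            # Fraction of volunteers who chose the consensus answer.
            "agreement": top_votes / sum(counts.values()),
        }
    return reductions


# Hypothetical extracts: three classifications of subject 1, one of subject 2.
extracts = [(1, "yes"), (1, "yes"), (1, "no"), (2, "no")]
reductions = reduce_votes(extracts)
print(reductions[1]["consensus"], reductions[1]["counts"])
```

Drawing tasks need more than counting, since marks have to be clustered before a consensus shape can be reported, which is where the tunable reducer parameters come in.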
Hot on the heels of last week’s update, I’ve just released version 1.0.3 of the Python Panoptes Client, which fixes a bug introduced in the previous release. If you encounter a TypeError when you try to create subjects, please update to this new version and that should fix it.
This release also updates the default client ID that is used to identify the client to the Panoptes API. This is to ensure that each of our API clients is using a unique ID.
As before, you can install the update by running pip install -U panoptes-client.
Last month I gave a talk at the Wetton Workshop in Oxford. Unlike the other talks that week, mine wasn’t about astronomy. I was talking about Docker – a useful tool which has become popular among people who run web services. We use it for practically everything here, and it’s pretty clear that researchers would find it useful if only more of them used it. That’s especially true in fields like astronomy, where a lot of people write their own code to process and analyse their data. If after reading this post you think you’d like to give Docker a try and you’d like some help getting started, just get in touch and I’ll be happy to help.
I’m going to give a brief outline of what Docker is and why it’s useful, but first let’s set the scene. You’re trying to run a script in Python that needs a particular version of NumPy. You install that version but it doesn’t seem to work. Or you already have a different version installed for another project and can’t change it. Or the version it needs is really old and isn’t available to download anymore. You spend hours installing different combinations of packages and eventually you get it working, but you’re not sure exactly what fixed it and you couldn’t repeat the same steps in the future if you wanted to exactly reproduce the environment you’re now working in.
Many projects require an interconnected web of dependencies, so there are a lot of things that can go wrong when you’re trying to get everything set up. There are a few tools that can help with some of these problems. For Python you can use virtual environments or Anaconda. Some languages install dependencies in the project directory to avoid conflicts, which can cause its own problems. None of that helps when the right versions of packages are simply not available any more, though, and none of those options makes it easy to just download and run your code without a lot of tedious setup. Especially if the person downloading it isn’t already familiar with Python, for example.
If people who download your code today can struggle to get it running, how will it be years from now when the version of NumPy you used isn’t around anymore and the current version is incompatible? That’s if there even is a current version after so many years. Maybe people won’t even be using Python then.
Luckily there is now a solution to all of this, and it’s called software containers. Software containers are a way of packaging applications into their own self-contained environment. Everything you need to run the application is bundled up with the application itself, and it is isolated from the rest of the operating system when it runs. You don’t need to install this and that, upgrade some other thing, check the phase of the moon, and hold your breath to get someone’s code running. You just run one command and whether the application was built with Python, Ruby, Java, or some other thing you’ve never heard of, it will run as expected. No setup required!
Docker is the most well-known way of running containers on your computer. There are other options, such as Kubernetes, but I’m only going to talk about Docker here.
Using containers could seriously improve the reproducibility of your research. If you bundle up your code and data in a Docker image, and publish that image alongside your papers, anyone in the world will be able to re-run your code and get the same results with almost no effort. That includes yourself a few years from now, when you don’t remember how your code works and half of its dependencies aren’t available to install any more.
There is a growing movement for researchers to publish not just their results, but also their raw data and the code they used to process it. Containers are the perfect mechanism for publishing both of those together. A search of arXiv shows there have only been 40 mentions of Docker in papers across all fields in the past year. For comparison there have been 474 papers which mention Python, many of which (possibly most, but I haven’t counted) are presenting scripts and modules created by the authors. That’s without even mentioning other programming languages. This is a missed opportunity, given how much easier it would be to run all this code if the authors provided Docker images. (Some of those authors might provide Docker images without mentioning it in the paper, but that number will be small.)
Docker itself is open source, and all the core file formats and designs are standardised by the Open Container Initiative. Besides Docker, other OCI members include tech giants such as Amazon, Facebook, Microsoft, Google, and lots of others. The technology is designed to be future proof and it isn’t going away, and you won’t be locked into any one vendor’s products by using it. If you package your software in a Docker container you can be reasonably certain it will still run years, or decades, from now. You can install Docker for free by downloading the community edition.
So how might Docker fit into your workday? Your development cycle will probably look something like this: First you’ll probably outline an initial version of the code, and then write a Dockerfile containing the instructions for installing the dependencies and running the code. Then it’s basically the same as what you’d normally do. As you’re working on the code, you’d iterate by building an image and then running that image as a container to test it. (With more advanced usage you can often avoid building a new image every time you run it, by mounting the working directory into the container at runtime.) Once the code is ready you can make it available by publishing the Docker image.
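The build-and-run loop described above might look something like the following sketch, which assembles the Docker commands rather than executing them. The image name, the `analyse.py` script, the `requirements.txt` file, and the host path are all hypothetical, and the Dockerfile is just one plausible layout for a small Python project.

```python
import shlex

IMAGE = "example/analysis:1.0"  # hypothetical image name

# A hypothetical Dockerfile for a small Python project.
DOCKERFILE = """\
FROM python:3.11-slim
WORKDIR /work
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "analyse.py"]
"""


def build_command(image, context="."):
    """Command that builds an image from the Dockerfile in `context`."""
    return ["docker", "build", "-t", image, context]


def run_command(image, mount_dir=None, workdir="/work"):
    """Command that runs the image as a throwaway container.

    If mount_dir is given, the host directory is mounted over the
    working directory, so code edits are visible without a rebuild.
    """
    command = ["docker", "run", "--rm"]
    if mount_dir is not None:
        command += ["-v", f"{mount_dir}:{workdir}", "-w", workdir]
    command.append(image)
    return command


for command in (
    build_command(IMAGE),
    run_command(IMAGE),
    run_command(IMAGE, mount_dir="/home/me/project"),
):
    print(shlex.join(command))
```

Each printed line can be pasted into a terminal, or the lists can be passed to `subprocess.run` on a machine with Docker installed.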
There are three approaches to publishing the image: push the image to the Docker Hub or another Docker registry, publish the Dockerfile along with your code, or export the image as a tar file and upload that somewhere. Obviously these aren’t mutually exclusive. You should do at least the first two, and it’s probably also wise to publish the tar file wherever you’d normally publish your data.
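As a small sketch of the registry and tar-file routes, the commands involved can be assembled like this. The image name and tar filename are hypothetical, and pushing assumes you have already logged in to a registry where that image name is yours.

```python
import shlex

IMAGE = "example/analysis:1.0"  # hypothetical image name
TAR_PATH = "analysis-1.0.tar"   # hypothetical archive filename


def push_command(image):
    """Command that uploads the image to its registry."""
    return ["docker", "push", image]


def save_command(image, tar_path):
    """Command that exports the image to a tar file."""
    return ["docker", "save", "-o", tar_path, image]


def load_command(tar_path):
    """Command a reader would use to import the tar file again."""
    return ["docker", "load", "-i", tar_path]


for command in (
    push_command(IMAGE),
    save_command(IMAGE, TAR_PATH),
    load_command(TAR_PATH),
):
    print(shlex.join(command))
```

The tar file is the only one of the three that does not depend on a registry or a rebuild still working in the future, which is why it is worth archiving next to the data.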
The Docker Hub is a free registry for images, so it’s a good place to upload your images so that other Docker users can find them. It’s also where you’ll find a wide selection of ready-built Docker images, both created by the Docker project themselves and created by other users. We at the Zooniverse publish all of the Docker images we use for our own work on the Docker Hub, and it’s an important part of how we manage our web services infrastructure. There are images for many major programming languages and operating system environments.
There are also a few packages which will allow you to run containers in high performance computing environments. Two popular ones are Singularity and Shifter. These will allow you to develop locally using Docker, and then convert your Docker image to run on your HPC cluster. That means the environment it runs in on the cluster will be identical to your development environment, so you won’t run into any surprises when it’s time to run it. Talk to your institution’s IT/HPC people to find out what options are available to you.
Hopefully I’ve made the case for using Docker (or containers in general) for your research. Check out the Docker getting started guide to find out more, and as I said at the beginning, if you’re thinking of using Docker in your research and you want a hand getting started, feel free to get in touch with me and I’ll be happy to help you.
This vulnerability was reported to us on June 20, 2018, by Lacroute Serge. We began testing fixes around three hours later, and they were deployed about 15 hours after the original report, on June 21, 2018.
The fixes for this vulnerability are contained in pull requests #4710 and #4711 for the Panoptes Front End project on GitHub. Anyone running their own hosted copy of this should pull these changes as soon as possible.
We have investigated the cause and assessed the impact of this vulnerability. A summary of what we found follows:
No data was leaked as a result of this vulnerability. The vulnerability was not exploited for any malicious purpose and there was no unauthorised access to any of our systems.
The vulnerability was introduced on September 12, 2017, in a change which was part of our work to allow projects to be translated into multiple languages.
We found three projects that contained exploits for this vulnerability (not including projects created by our own team for testing purposes): two were created before the vulnerability was introduced, so the exploit wouldn’t have worked at the time they were created (it might have worked if the projects were visited between September 12, 2017, and June 21, 2018, but no-one did so); the remaining project was created by the security researcher who reported the vulnerability.
Our audit included previous titles for projects (all changes to projects are versioned, so we were able to audit any project titles which have since been changed).
No users other than the project owner and members of our development team visited any of these projects, so no other users activated any of the exploits.
We’d like to thank Lacroute Serge for reporting this vulnerability to us via the method detailed on our security page, following responsible disclosure by reporting it to us in private to give us the opportunity to fix it.
Part three in a multi-part series exploring the visual and UX changes to the Zooniverse classify interface
Today we’ll be going over a couple of visual changes to familiar elements of the classify interface and new additions we’re excited to premiere. These updates haven’t been implemented yet, so nothing is set in stone. Please use this survey to send me feedback about these or any of the other updates to the Zooniverse.
Many respondents to my 2017 design survey requested that they be able to use the keyboard to make classifications rather than having to click so many buttons. One volunteer actually called the classifier “a carpal-tunnel torturing device”. As a designer, that’s hard to hear – it’s never the goal to actively injure our volunteers.
We actually do support keyboard shortcuts! This survey helped us realize that we need to be better at sharing some of the tools our developers have built. The image above shows a newly designed Keyboard Shortcut information modal. This modal (or “popup”) is a great example of a few of the modals we’re building – you can leave it open and drag it around the interface while you work, so you’ll be able to quickly refer to it whenever you need.
This behavior will be mirrored in a few of the modals that are currently available to you:
Add to Favorites
Add to Collection / Create a New Collection
It will also be applied to a few new ones, including…
Another major finding from the design survey was that users did not have a clear idea where to go when they needed help with a task (see chart below).
We know research teams often put a lot of effort into their help texts, and we wanted to be sure that work was reaching the largest possible audience. Hence, we moved the Field Guide from a small button on the right-hand side of the screen – a place that can become obscured by the browser’s scrollbar – and created a larger, more prominent button in the updated toolbar:
By placing the Field Guide button in a more prominent position and allowing the modal to stay open during classifications, we hope this tool will be taken advantage of more than it currently is.
The layout was the result of the audit of every live project I conducted in spring 2017:
(Table: mode, minimum, and maximum values for item count and label word count.)
Using the mode gave me the basis on which to design; however, there’s quite a disparity between min and max amounts. Because of this disparity, we’ll be giving project owners with currently active projects a lot of warning before switching to the new layout, and they’ll have the option to continue to use the current Field Guide design if they’d prefer.
Another major resource Zooniverse offers its research teams and volunteers is the Tutorial. Often used to explain project goals, welcome new volunteers to the project, and point out what to look for in an image, the current tutorial is often a challenge because of its absolute positioning on top of the subject image.
In this iteration of the classify interface, the tutorial opens once as a modal, just as it does now, and then lives in a tab in the task area where it’s much more easily accessible. You’ll be able to switch to the Tutorial tab in order to compare the example images and information with the subject image you’re looking at, rather than opening and closing the tutorial box many times.
A brand-new statistics section
Another major comment from the survey was that volunteers wanted more ways to interact with the Zooniverse. Thus, you’ll be able to scroll down to find a brand-new section! Features we’re adding will include:
Your previous classifications with Add to Favorites or Add to Collection buttons
Interesting stats, like the number of classifications you’ve done and the number of classifications your community has done
Links to similar projects you might be interested in
Links to the project’s blog and social media to help you feel more connected to the research team
Links to the project’s Talk boards, for a similar purpose
Possibly: A way to indicate that you’re finished for the day, giving you the option to share your experience on social media or find another project you’re interested in.
The statistics we chose were directly related to the responses from the survey:
Respondents were able to choose more than one response; when asked to rank them in order of importance, project-wide statistics were chosen hands-down:
We also heard that volunteers sometimes felt disconnected from research teams and the project’s accomplishments:
“In general there is too less information about the achievement of completed projects. Even simple facts could cause a bit of a success-feeling… how many pictures in this project over all have been classified? How much time did it take? How many hours were invested by all participating citizens? Were there any surprising things for the scientists? Things like that could be reported long before the task of a project is completely fullfilled.”
Research teams often spend hours engaged in dialog with volunteers on Talk, but not everyone who volunteers on Zooniverse is aware of or active on Talk. Adding a module on the classify page showing recent Talk posts will bring more awareness to this amazing resource and hopefully encourage more engagement from volunteers.
Templates for different image sizes and dimensions
When the project builder was created, we couldn’t have predicted the variety of disparate topics that would become Zooniverse projects. Originally, the subject viewer was designed for one common image size, roughly 2×3, and other sizes have since been shoehorned in to fit as well as they can.
Now, we’d like to make it easier for subjects with extreme dimensions, multimedia subjects, and multi-image subjects to fit better within the project builder. By specifically designing templates and allowing project owners to choose the one that best fits their subjects, volunteers and project owners alike will have a better experience.
Very wide subjects will see their toolbar moved to the bottom of the image rather than on the right, to give the image as much horizontal space as possible. Tall subjects will be about the same width as they have been, but the task/tutorial box will stay fixed on the screen as the image scrolls, eliminating the need to scroll up and down as often when looking at the bottom of the subject.
Let’s get started!
I’m so excited for the opportunity to share a preview of these changes with you. Zooniverse is a collaborative project, so if there’s anything you’d like us to address as we implement this update, please use this survey to share your thoughts and suggestions. Since we’re rolling these out in pieces, it will be much easier for us to iterate, test, and make changes.
We estimate that the updates will be mostly in place by early 2019, so there’s plenty of time to make sure we’re creating the best possible experience for everyone.
Thank you so much for your patience and understanding as we move forward. In the future, we’ll be as open and transparent as possible about this process.