Panoptes Client for Python 1.0.3

Hot on the heels of last week’s update, I’ve just released version 1.0.3 of the Python Panoptes Client, which fixes a bug introduced in the previous release. If you encounter a TypeError when you try to create subjects, please update to this new version and that should fix it.
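
If you’re affected, the failure shows up during subject creation, which with the client looks something like this minimal sketch (the credentials, project slug, and file name are all placeholders):

    from panoptes_client import Panoptes, Project, Subject

    # Log in to the Panoptes API (replace with your own credentials)
    Panoptes.connect(username='example', password='example')

    # Find the project the new subject will belong to
    project = Project.find(slug='example-user/example-project')

    # Create a subject, attach an image, and save it; in 1.0.2 this
    # workflow could raise a TypeError, which 1.0.3 fixes
    subject = Subject()
    subject.links.project = project
    subject.add_location('my_image.png')
    subject.metadata['source'] = 'example'
    subject.save()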

This release also updates the default client ID that is used to identify the client to the Panoptes API. This is to ensure that each of our API clients is using a unique ID.

As before, you can install the update by running pip install -U panoptes-client.

Experiments on the Zooniverse

Occasionally we run studies in collaboration with external researchers in order to better understand our community and improve our platform. These can involve methods such as A/B splits, where we show a slightly different version of the site to one group of volunteers and measure how it affects their participation; for example, does it influence how many classifications they make, or how likely they are to return to the project for subsequent sessions?

One example of such a study was the messaging experiment we ran on Galaxy Zoo. We worked with researchers from Ben-Gurion University and Microsoft Research to test whether the specific content and timing of messages presented in the classification interface could help alleviate the issue of volunteers disengaging from the project. You can read more about that experiment and its results in this Galaxy Zoo blog post: https://blog.galaxyzoo.org/2018/07/12/galaxy-zoo-messaging-experiment-results/.

As the Zooniverse has teams based at different institutions in the UK and the USA, the procedure for gaining ethics approval differs depending on who is leading the study. After recent discussions with staff at the University of Oxford’s ethics board, held to check that our procedure was up to date, our Oxford-based team will be changing the way in which we gain approval for, and report the completion of, these types of studies. All future study designs in which Oxford staff take part in the analysis will be submitted to CUREC (the Central University Research Ethics Committee), as they have been for the last few years. From now on, once the data-gathering stage of a study has finished, we will provide all volunteers involved with a debrief message.

The debrief will explain to our volunteers that they have been involved in a study, along with providing information about the exact set-up of the study and what the research goals were. The most significant change is that, before the data analysis is conducted, we will contact all volunteers involved in the study and allow a period of time for them to state that they would like to withdraw their consent to the use of their data. We will then remove all data associated with any volunteer who does not wish to be involved before the data is analysed and the findings are presented. The debrief will also contain contact details for the researchers in the event of any concerns or complaints. You can see an example of such a debrief in our original post about the Galaxy Zoo messaging experiment: https://blog.galaxyzoo.org/2015/08/10/messaging-test/.

As always, our primary focus is the research enabled by our volunteer community on our individual projects. We run experiments like these in order to better understand how to create a more efficient and productive platform that benefits both our volunteers and the researchers we support. Every classification our volunteers make contributes to the science outcomes of our projects, whether or not it was made as part of an A/B split experiment, and we strive never to waste any volunteer time or effort.

We thank you for all that you do, and for helping us learn how to build a better Zooniverse.

Who’s in the Zoo – Sophia Vaughan

Hello, my name is Sophia Vaughan. I’m a physics student from Oxford University, but I don’t spend all of my time studying physics: I also like to knit and go to science fiction conventions (especially Star Trek ones).

I’m posting on this blog because I’ve just started a summer project here at the Zooniverse, and while I’m here I’m hoping to create a new project for the platform. But don’t worry: you don’t need to be a scientist, a student or a professor to create a project; all you need is a Zooniverse account! If all goes well, you’ll see future blog post(s) from me on how to build your own project using the project builder.

Why you should use Docker in your research

Last month I gave a talk at the Wetton Workshop in Oxford. Unlike the other talks that week, mine wasn’t about astronomy. I was talking about Docker – a useful tool which has become popular among people who run web services. We use it for practically everything here, and it’s pretty clear that many researchers would find it useful too, if only more of them knew about it. That’s especially true in fields like astronomy, where a lot of people write their own code to process and analyse their data. If, after reading this post, you think you’d like to give Docker a try and you’d like some help getting started, just get in touch and I’ll be happy to help.

I’m going to give a brief outline of what Docker is and why it’s useful, but first let’s set the scene. You’re trying to run a script in Python that needs a particular version of NumPy. You install that version but it doesn’t seem to work. Or you already have a different version installed for another project and can’t change it. Or the version it needs is really old and isn’t available to download anymore. You spend hours installing different combinations of packages and eventually you get it working, but you’re not sure exactly what fixed it and you couldn’t repeat the same steps in the future if you wanted to exactly reproduce the environment you’re now working in. 

Many projects require an interconnected web of dependencies, so there are a lot of things that can go wrong when you’re trying to get everything set up. There are a few tools that can help with some of these problems. For Python you can use virtual environments or Anaconda. Some languages install dependencies in the project directory to avoid conflicts, which can cause problems of its own. None of that helps when the right versions of packages are simply not available any more, though, and none of those options makes it easy to just download and run your code without a lot of tedious setup, especially if the person downloading it isn’t already familiar with Python.

If people who download your code today can struggle to get it running, how will it be years from now when the version of NumPy you used isn’t around anymore and the current version is incompatible? That’s if there even is a current version after so many years. Maybe people won’t even be using Python then.

Luckily there is now a solution to all of this, and it’s called software containers. Software containers are a way of packaging applications into their own self-contained environment. Everything you need to run the application is bundled up with the application itself, and it is isolated from the rest of the operating system when it runs. You don’t need to install this and that, upgrade some other thing, check the phase of the moon, and hold your breath to get someone’s code running. You just run one command and whether the application was built with Python, Ruby, Java, or some other thing you’ve never heard of, it will run as expected. No setup required!

Docker is the most well-known way of running containers on your computer. There are other container tools, such as Kubernetes (which is aimed more at orchestrating containers across whole clusters of machines), but I’m only going to talk about Docker here.
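
To give you a flavour of the end result, running a program from a published image is a single command (the image here is just one of the official Python images on the Docker Hub):

    # Docker downloads the image the first time, then runs it in an isolated container
    docker run --rm python:3.6 python --version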

Using containers could seriously improve the reproducibility of your research. If you bundle up your code and data in a Docker image, and publish that image alongside your papers, anyone in the world will be able to re-run your code and get the same results with almost no effort. That includes yourself a few years from now, when you don’t remember how your code works and half of its dependencies aren’t available to install any more.

There is a growing movement for researchers to publish not just their results, but also their raw data and the code they used to process it. Containers are the perfect mechanism for publishing both of those together. A search of arXiv shows there have only been 40 mentions of Docker in papers across all fields in the past year. For comparison there have been 474 papers which mention Python, many of which (possibly most, but I haven’t counted) are presenting scripts and modules created by the authors. That’s without even mentioning other programming languages. This is a missed opportunity, given how much easier it would be to run all this code if the authors provided Docker images. (Some of those authors might provide Docker images without mentioning it in the paper, but that number will be small.)

Docker itself is open source, and all the core file formats and designs are standardised by the Open Container Initiative. Besides Docker, OCI members include tech giants such as Amazon, Facebook, Microsoft, and Google. The technology is designed to be future-proof: it isn’t going away, and you won’t be locked into any one vendor’s products by using it. If you package your software in a Docker container you can be reasonably certain it will still run years, or decades, from now. You can install Docker for free by downloading the community edition.

So how might Docker fit into your workday? Your development cycle will look something like this: first, outline an initial version of the code and write a Dockerfile containing the instructions for installing the dependencies and running the code. From there it’s basically what you’d normally do: as you work on the code, you iterate by building an image and then running that image as a container to test it, as in the sketch below. (With more advanced usage you can often avoid building a new image every time you run it, by mounting the working directory into the container at runtime.) Once the code is ready, you can make it available by publishing the Docker image.
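
As a rough sketch, assuming your analysis lives in a script called analyse.py with its dependencies pinned in requirements.txt (both names are just examples), the Dockerfile and the day-to-day commands might look like this:

    # Dockerfile: the recipe for building your image
    FROM python:3.6

    WORKDIR /app

    # Install pinned dependencies first so Docker can cache this layer
    COPY requirements.txt .
    RUN pip install -r requirements.txt

    # Add the analysis code itself
    COPY analyse.py .

    # The default command run inside the container
    CMD ["python", "analyse.py"]

Then, while iterating:

    # Build an image from the Dockerfile, and run it as a container
    docker build -t my-analysis .
    docker run --rm my-analysis

    # Or mount the working directory to test changes without rebuilding
    docker run --rm -v "$(pwd)":/app my-analysis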

There are three approaches to publishing the image: push the image to the Docker Hub or another Docker registry, publish the Dockerfile along with your code, or export the image as a tar file and upload that somewhere. Obviously these aren’t mutually exclusive. You should do at least the first two, and it’s probably also wise to publish the tar file wherever you’d normally publish your data.
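
For the first and third of those options, the commands look roughly like this (the user and image names are placeholders, and you’ll need to run docker login before pushing):

    # Push the image to the Docker Hub (or another registry)
    docker tag my-analysis myusername/my-analysis:1.0
    docker push myusername/my-analysis:1.0

    # Export the image as a tar file to archive alongside your data
    docker save -o my-analysis.tar my-analysis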


The Docker Hub is a free registry for images, so it’s a good place to upload your images so that other Docker users can find them. It’s also where you’ll find a wide selection of ready-built images, created both by the Docker project itself and by other users, including images for many major programming languages and operating system environments. We at the Zooniverse publish all of the Docker images we use for our own work on the Docker Hub, and it’s an important part of how we manage our web services infrastructure.
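
Fetching one of those ready-built images is itself a one-liner; for example:

    # Download an official operating system image from the Docker Hub
    docker pull ubuntu:18.04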

There are also a few packages which will allow you to run containers in high performance computing environments. Two popular ones are Singularity and Shifter. These will allow you to develop locally using Docker, and then convert your Docker image to run on your HPC cluster. That means the environment it runs in on the cluster will be identical to your development environment, so you won’t run into any surprises when it’s time to run it. Talk to your institution’s IT/HPC people to find out what options are available to you.
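
As a sketch (the exact commands depend on which tool and version your cluster provides), converting a published Docker image with Singularity looks something like this:

    # Convert a Docker Hub image into a Singularity image file
    # (the output file name and extension vary between Singularity versions)
    singularity pull docker://myusername/my-analysis:1.0

    # Then run the resulting image file on the cluster
    singularity run my-analysis_1.0.sif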

Hopefully I’ve made the case for using Docker (or containers in general) for your research. Check out the Docker getting started guide to find out more, and as I said at the beginning, if you’re thinking of using Docker in your research and you want a hand getting started, feel free to get in touch with me and I’ll be happy to help you. 

Who’s who in the Zoo – Michele Darrow

In this week’s edition of Who’s who in the Zoo, meet Science Scribbler’s Michele Darrow.

– Helen



Project: Science Scribbler

Researcher: Michele Darrow, Development Scientist

Location: TTP Labtech & Diamond Light Source


What are your main research interests?

I’m interested in various imaging techniques and in how best to process the data collected from them, both to answer biological questions and to develop algorithms that can automate the process in the future.


Who else is in your project team? What are their roles?

Our team spans biologists, imaging technique experts and software engineers.


Tell us more about the data used in your project

Right now, Science Scribbler is working to process a dataset related to Huntington’s Disease. Many of the circular objects marked by volunteers as part of the project are organelles. These are smaller compartments inside of each cell, and each type of compartment has a different job. In Huntington’s Disease, some of the compartments are dysfunctional, but details about how the disease leads to this dysfunction aren’t really known. By comparing a diseased cell to a non-diseased cell we’re hoping to learn about organelle changes due to the disease.


How do Zooniverse volunteers contribute to your research? 

Zooniverse volunteers are amazing! Without them, this project would be moving at a glacial speed!

For this project, volunteers are asked to place marks in the centers of objects and outline the objects. This information will first tell us where all of the organelles are so we can begin to answer our biological question. And second, using this dataset as a standard, we hope to create a computer algorithm that will use just the center points to find the outlines of the objects, making the task easier and faster in the future.


What discoveries, and other outputs, has your project led to so far?

We have retired 50% of the images in this first dataset. Because we’re far enough along, Mark, one of the software developers on the project, has begun to process the data. You can see the code that he is working on and his descriptions of what it does here: https://github.com/DiamondLightSource/zooniverse/blob/master/notebooks/science_scribbler.ipynb


What’s in store for your project in the future?

The problem of segmentation – marking out parts of a dataset in order to analyse it – is a common problem across many imaging techniques and disciplines. One of the benefits of working on these problems at Diamond Light Source is that there are always people and projects that need improved segmentation. Our next project on Science Scribbler will focus on a new imaging technique and a new segmentation problem. This switch in focus will give the original project team some time to process the data, come up with potential computer algorithms to improve the process and think about the next steps. And it will give a new project and new researchers a chance to interact with the Zooniverse to jumpstart their research!


What are your favourite other citizen research projects and why?

I love Wildwatch Kenya! I find it super chill and it’s always fun when you find an animal in the picture!


And finally, when not at work, where are we most likely to find you?

Long walks with my husband and dogs (Jinx and Charm) and reading good books (right now, re-reading the Harry Potter series).


Fixed cross-site scripting vulnerability on project home pages

We recently fixed a security vulnerability in the way project titles are handled on project home pages. Prior to this fix it was possible to create a project which included JavaScript in its name, and thus to inject code into the page. After investigating this incident, we have determined that the vulnerability was not exploited for any malicious purpose: no data was leaked and no users were exposed to injected code.

This vulnerability was reported to us on June 20, 2018, by Lacroute Serge. We began testing fixes around three hours later, and they were deployed about 15 hours after the original report, on June 21, 2018.

The fixes for this vulnerability are contained in pull requests #4710 and #4711 for the Panoptes Front End project on GitHub. Anyone running their own hosted copy of Panoptes Front End should pull these changes as soon as possible.

We have investigated the cause and assessed the impact of this vulnerability. A summary of what we found follows:

  • No data was leaked as a result of this vulnerability. The vulnerability was not exploited for any malicious purpose and there was no unauthorised access to any of our systems.
  • The vulnerability was introduced on September 12, 2017, in a change which was part of our work to allow projects to be translated into multiple languages.
  • We found three projects that contained exploits for this vulnerability (not including projects created by our own team for testing purposes): two were created before the vulnerability was introduced, so the exploit wouldn’t have worked at the time they were created (it might have worked if the projects were visited between September 12, 2017, and June 21, 2018, but no-one did so); the remaining project was created by the security researcher who reported the vulnerability.
  • Our audit included previous titles for projects (all changes to projects are versioned, so we were able to audit any project titles which have since been changed).
  • All three projects contained only benign code to display a JavaScript alert box. None of them attempted to perform any malicious actions.
  • No users other than the project owner and members of our development team visited any of these projects, so no other users activated any of the exploits.

We’d like to thank Lacroute Serge for reporting this vulnerability via the method detailed on our security page, and for following responsible disclosure by contacting us in private, which gave us the opportunity to fix it.