Category Archives: Science

Experiments on the Zooniverse

Occasionally we run studies in collaboration with external researchers in order to better understand our community and improve our platform. These can involve methods such as A/B splits, where we show a slightly different version of the site to one group of volunteers and measure how it affects their participation: for example, does it influence how many classifications they make, or how likely they are to return to the project for subsequent sessions?

One example of such a study was the messaging experiment we ran on Galaxy Zoo. We worked with researchers from Ben-Gurion University and Microsoft Research to test whether the specific content and timing of messages presented in the classification interface could help alleviate the issue of volunteers disengaging from the project. You can read more about that experiment and its results in this Galaxy Zoo blog post: https://blog.galaxyzoo.org/2018/07/12/galaxy-zoo-messaging-experiment-results/.

As the Zooniverse has teams based at different institutions in the UK and the USA, the procedures for ethics approval differ depending on who is leading the study. After recent discussions with staff at the University of Oxford ethics board to check that our procedure was up to date, our Oxford-based team will be changing the way in which we gain approval for, and report the completion of, these types of studies. All future study designs that feature Oxford staff taking part in the analysis will be submitted to CUREC (Oxford’s Central University Research Ethics Committee), as we have been doing for the last few years. From now on, once the data-gathering stage of a study has finished, we will provide all volunteers involved with a debrief message.

The debrief will explain to our volunteers that they have been involved in a study, along with providing information about the exact set-up of the study and what the research goals were. The most significant change is that, before the data analysis is conducted, we will contact all volunteers involved in the study and allow a period of time for them to state whether they would like to withdraw their consent to the use of their data. We will then remove all data associated with any volunteer who does not want to be involved before the data is analysed and the findings are presented. The debrief will also contain contact details for the researchers in the event of any concerns or complaints. You can see an example of such a debrief in our original post about the Galaxy Zoo messaging experiment here: https://blog.galaxyzoo.org/2015/08/10/messaging-test/.

As always, our primary focus is the research being enabled by our volunteer community on our individual projects. We run experiments like these in order to better understand how to create a more efficient and productive platform that benefits both our volunteers and the researchers we support. All classifications made by our volunteers are used in the science outcomes of our projects, whether or not they are part of an A/B split experiment, and we strive never to waste any volunteer’s time or effort.

We thank you for all that you do, and for helping us learn how to build a better Zooniverse.


Why you should use Docker in your research

Last month I gave a talk at the Wetton Workshop in Oxford. Unlike the other talks that week, mine wasn’t about astronomy. I was talking about Docker – a useful tool which has become popular among people who run web services. We use it for practically everything here, and it’s pretty clear that many researchers would find it useful too, if only more of them tried it. That’s especially true in fields like astronomy, where a lot of people write their own code to process and analyse their data. If, after reading this post, you think you’d like to give Docker a try and would like some help getting started, just get in touch and I’ll be happy to help.

I’m going to give a brief outline of what Docker is and why it’s useful, but first let’s set the scene. You’re trying to run a Python script that needs a particular version of NumPy. You install that version, but it doesn’t seem to work. Or you already have a different version installed for another project and can’t change it. Or the version it needs is so old that it isn’t available to download any more. You spend hours installing different combinations of packages and eventually get it working, but you’re not sure exactly what fixed it, and you couldn’t repeat the same steps later if you wanted to reproduce the environment you’re now working in.

Many projects require an interconnected web of dependencies, so there are a lot of things that can go wrong when you’re trying to get everything set up. There are a few tools that can help with some of these problems: for Python you can use virtual environments or Anaconda, and some languages install dependencies in the project directory to avoid conflicts (which can cause problems of its own). None of that helps when the right versions of packages are simply not available any more, though, and none of those options makes it easy for someone to just download and run your code without a lot of tedious setup, especially if they aren’t already familiar with Python, for example.
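To give a concrete flavour of that kind of per-project isolation (the NumPy version below is just an example), a Python virtual environment looks something like this:

  # create and activate an isolated environment for one project
  python3 -m venv env
  source env/bin/activate

  # install the exact versions this project needs, and record them
  pip install numpy==1.14.5
  pip freeze > requirements.txt

This pins the packages for one project, but it still relies on the right versions being installable in the future, which is exactly where containers come in.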

If people who download your code today can struggle to get it running, how will it be years from now when the version of NumPy you used isn’t around anymore and the current version is incompatible? That’s if there even is a current version after so many years. Maybe people won’t even be using Python then.

Luckily there is now a solution to all of this, and it’s called software containers. Software containers are a way of packaging applications into their own self-contained environment. Everything you need to run the application is bundled up with the application itself, and it is isolated from the rest of the operating system when it runs. You don’t need to install this and that, upgrade some other thing, check the phase of the moon, and hold your breath to get someone’s code running. You just run one command and whether the application was built with Python, Ruby, Java, or some other thing you’ve never heard of, it will run as expected. No setup required!

Docker is the most well-known way of running containers on your computer. There are other container tools, and orchestration systems such as Kubernetes for managing containers across many machines, but I’m only going to talk about Docker here.

Using containers could seriously improve the reproducibility of your research. If you bundle up your code and data in a Docker image, and publish that image alongside your papers, anyone in the world will be able to re-run your code and get the same results with almost no effort. That includes yourself a few years from now, when you don’t remember how your code works and half of its dependencies aren’t available to install any more.

There is a growing movement for researchers to publish not just their results, but also their raw data and the code they used to process it. Containers are the perfect mechanism for publishing both of those together. A search of arXiv shows there have been only 40 mentions of Docker in papers across all fields in the past year. For comparison, there have been 474 papers that mention Python, many of which (possibly most, but I haven’t counted) are presenting scripts and modules created by the authors. That’s without even counting other programming languages. This is a missed opportunity, given how much easier it would be to run all this code if the authors provided Docker images. (Some of those authors might provide Docker images without mentioning it in the paper, but that number will be small.)

Docker itself is open source, and all the core file formats and designs are standardised by the Open Container Initiative (OCI). Besides Docker, OCI members include tech giants such as Amazon, Facebook, Microsoft, and Google, along with many others. The technology is designed to be future-proof, it isn’t going away, and using it won’t lock you into any one vendor’s products. If you package your software in a Docker container you can be reasonably certain it will still run years, or decades, from now. You can install Docker for free by downloading the community edition.

So how might Docker fit into your workday? Your development cycle will probably look something like this: first, outline an initial version of the code, then write a Dockerfile containing the instructions for installing the dependencies and running it. After that it’s basically the same as what you’d normally do: as you work on the code, you iterate by building an image and then running that image as a container to test it. (With more advanced usage you can often avoid building a new image every time you run it, by mounting the working directory into the container at runtime.) Once the code is ready you can make it available by publishing the Docker image.
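As a rough sketch of what that looks like in practice (the file names, the my-analysis tag, and the python:3.6 base image are just placeholder examples), a Dockerfile and the matching build and run commands might be:

  # Dockerfile: install the dependencies, copy in the code, say how to run it
  FROM python:3.6
  WORKDIR /app
  COPY requirements.txt .
  RUN pip install -r requirements.txt
  COPY . .
  CMD ["python", "analysis.py"]

  # build the image, then run it as a container
  docker build -t my-analysis .
  docker run --rm my-analysis

  # during development, mount the working directory into the container
  # so you don't need to rebuild the image after every code change
  docker run --rm -v "$(pwd)":/app my-analysis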

There are three approaches to publishing the image: push the image to the Docker Hub or another Docker registry, publish the Dockerfile along with your code, or export the image as a tar file and upload that somewhere. Obviously these aren’t mutually exclusive. You should do at least the first two, and it’s probably also wise to publish the tar file wherever you’d normally publish your data.
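For instance (the image and account names here are hypothetical), pushing to the Docker Hub and exporting a tar archive look something like this:

  # push the image to the Docker Hub (needs a free account and 'docker login')
  docker tag my-analysis myusername/my-analysis:1.0
  docker push myusername/my-analysis:1.0

  # export the image as a tar file to archive alongside your data
  docker save -o my-analysis-1.0.tar myusername/my-analysis:1.0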

 

The Docker Hub is a free registry for images, which makes it a good place to upload your images so that other Docker users can find them. It’s also where you’ll find a wide selection of ready-built Docker images, both official images maintained by the Docker project and images created by other users. We at the Zooniverse publish all of the Docker images we use for our own work on the Docker Hub, and it’s an important part of how we manage our web services infrastructure. There are images for many major programming languages and operating system environments.
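To give a flavour (the python:3.6 tag is just one example), pulling one of those ready-built images and trying it out takes a couple of commands:

  # download an official Python image and start an interactive Python shell inside it
  docker pull python:3.6
  docker run --rm -it python:3.6 python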

There are also a few packages which will allow you to run containers in high performance computing environments. Two popular ones are Singularity and Shifter. These will allow you to develop locally using Docker, and then convert your Docker image to run on your HPC cluster. That means the environment it runs in on the cluster will be identical to your development environment, so you won’t run into any surprises when it’s time to run it. Talk to your institution’s IT/HPC people to find out what options are available to you.
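As a rough sketch (the image name is the same hypothetical one as above, and the exact output file name and extension depend on your Singularity version), converting and running a published Docker image on a cluster can look like this:

  # on the HPC cluster: pull the published Docker image, converting it
  # to a Singularity image, then run it
  singularity pull docker://myusername/my-analysis:1.0
  singularity run my-analysis_1.0.sif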

Hopefully I’ve made the case for using Docker (or containers in general) for your research. Check out the Docker getting started guide to find out more, and as I said at the beginning, if you’re thinking of using Docker in your research and you want a hand getting started, feel free to get in touch with me and I’ll be happy to help you. 

The Universe Inside Our Cells

Below is the first in a series of guest blog posts from researchers working on one of our recently launched biomedical projects, Etch A Cell.

Read on to let Dr Martin Jones tell you about the work they’re doing to further understanding of the universe inside our cells!

– Helen

 

Having trained as a physicist, with many friends working in astronomy, I’ve been aware of Galaxy Zoo and the Zooniverse from the very early days. My early research career was in quantum mechanics, unfortunately not an area where people’s intuitions are much use! However, since finding myself working in biology labs, now at the Francis Crick Institute in London, I have worked on various aspects of microscopy – a much more visual enterprise and one where human analysis is still the gold standard. This is particularly true in electron microscopy, where the busy nature of the images means that many regions inside a cell look very similar. To make sense of the images, a person can draw on a whole range of extra context and prior knowledge in a way that computers, for the most part, are simply unable to do. This makes the analysis a slow and labour-intensive process. As if that wasn’t already a hard enough problem, in recent years it has been compounded by new technologies that mean the microscopes now capture images around 100 times faster than before.

Focused ion beam scanning electron microscope

 

Ten years ago it was more or less possible to manually analyse the images at the same rate as they were acquired, keeping the in-tray and out-tray nicely balanced. Now, however, that’s not the case. To illustrate that, here’s an example of a slice through a group of cancer cells, known as HeLa cells:

A slice through a group of HeLa cells

We capture an image like this and then remove a very thin layer – sometimes as thin as 5 nanometres (one nanometre is a billionth of a metre) – and then repeat… a lot! Building up enormous stacks of these images can help us understand the 3D nature of the cells and the structures inside them. For a sense of scale, this whole image is about the width of a human hair, around 80 millionths of a metre.

Zooming in to one of the cells, you can see many different structures, all of which are of interest to study in biomedical research. For this project, however, we’re just focusing on the nucleus for now. This is the large mostly empty region in the middle, where the DNA – the instruction set for building the whole body – is contained.

A single cell, with the nucleus in the centre

By manually drawing lines around the nucleus on each slice, we can build up a 3D model that allows us to make comparisons between cells, for example understanding whether a treatment for a disease is able to stop its progression by disrupting the cells’ ability to pass on their genetic information.

Animated gif of a 3D model of a nucleus

However, images are now being generated so rapidly that the in-tray is filling too quickly for the standard “single expert” method – one sample can produce up to a terabyte of data, made up of more than a thousand 64 megapixel images captured overnight. We need new tricks!

 

Why citizen science?

With all of the advances in software that are becoming available, you might think that automating image analysis of this kind would be quite straightforward for a computer. After all, people can do it relatively easily. Even pigeons can be trained to do certain image analysis tasks! (http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0141357). However, there is a long history of underestimating just how hard it is to automate image analysis. Back in the very early days of artificial intelligence, in 1966 at MIT, Marvin Minsky (who also invented the confocal microscope) and his colleague Seymour Papert set the “summer vision project”, which they saw as a simple problem to keep their undergraduate students busy over the holidays. Many decades later, we’ve discovered it’s not that easy!

(from https://www.xkcd.com/1425/)

Our project, Etch a Cell, is designed to allow citizen scientists to draw segmentations directly onto our images in the Zooniverse web interface. The first task we have set is to mark the nuclear envelope that separates the nucleus from the rest of the cell – a vital structure where defects can cause serious problems. These segmentations are extremely useful in their own right for helping us understand the structures, but citizen science offers something beyond the already lofty goal of matching the output of an expert. By allowing several people to annotate each image, we can see how the lines vary from user to user. This variability gives insight into how certain we can be that a given pixel or region belongs to a particular object, information that simply isn’t available from a single line drawn by one person. Disagreement between experts is not unheard of, unfortunately!

The images below show preliminary results with the expert analysis on the left and a combination of 5 citizen scientists’ segmentations on the right.

Example of expert vs. citizen scientist annotation

In fact, we can go even further to maximise the value of our citizen scientists’ work. The field of machine learning, in particular deep learning, has burst onto the scene in several sectors in recent years, revolutionising many computational tasks. This new generation of image analysis techniques is much more closely aligned with how animal vision works. The catch, however, is that the “learning” part of machine learning often requires enormous amounts of time and resources (remember, you’ve had a lifetime to train your brain!). To train such a system, you need a huge supply of so-called “ground truth” data, i.e. data that an expert has already analysed to provide the correct answer against which the computer’s attempts are compared. Picture it as the kind of supervised learning you did at school: perhaps working through several old exam papers in preparation for your finals. If the computer is wrong, you tweak the setup a bit and try again. By presenting thousands or even millions of images and ensuring the computer makes the same decisions as the expert, you can become increasingly confident that it will make the correct decision when it sees a new piece of data. Using the power of citizen science will allow us to collect the huge amounts of data that we need to train these deep learning systems, something that would be impossible by virtually any other means.

We are now busily capturing images that we plan to upload to Etch a Cell to allow us to analyse data from a range of experiments. Differences in cell type, sub-cellular organelle, microscope, sample preparation and other factors mean the images can look different across experiments, so analysing cells from a range of different conditions will allow us to build an atlas of information about sub-cellular structure. The results from Etch a Cell will mean that whenever new data arrives, we can quickly extract information that will help us work towards treatments and cures for many different diseases.

Studying the Impact of the Zooniverse

Below is a guest post from a researcher who has been studying the Zooniverse and who just published a paper called ‘Crowdsourced Science: Sociotechnical epistemology in the e-research paradigm’. That being a bit of a mouthful, I asked him to introduce himself and explain – Chris.

My name is David Watson and I’m a data scientist at Queen Mary University of London’s Centre for Translational Bioinformatics. As an MSc student at the Oxford Internet Institute back in 2015, I wrote my thesis on crowdsourcing in the natural sciences. I got in touch with several members of the Zooniverse team, who were kind enough to answer all my questions (I had quite a lot!) and even provide me with an invaluable dataset of aggregated transaction logs from 2014. Combining this information with publication data from a variety of sources, I examined the impact of crowdsourcing on knowledge production across the sciences.

Last week, the philosophy journal Synthese published a (significantly) revised version of my thesis, co-authored by my advisor Prof. Luciano Floridi. We found that Zooniverse projects not only processed far more observations than comparable studies conducted via more traditional methods—about an order of magnitude more data per study on average—but that the resultant papers vastly outperformed others by researchers using conventional means. Employing the formal tools of Bayesian confirmation theory along with statistical evidence from and about Zooniverse, we concluded that crowdsourced science is more reliable, scalable, and connective than alternative methods when certain common criteria are met.

In a sense, this shouldn’t really be news. We’ve known for over 200 years that groups are usually better than individuals at making accurate judgments (thanks, Marie Jean Antoine Nicolas de Caritat, aka the Marquis de Condorcet!). The wisdom of crowds has been responsible for major breakthroughs in software development, event forecasting, and knowledge aggregation. Modern science has become increasingly dominated by large-scale projects that pool the labour and expertise of vast numbers of researchers.

We were surprised by several things in our research, however. First, the significance of the disparity between the performance of publications by Zooniverse and those by other labs was greater than expected. This plot represents the distribution of citation percentiles by year and data source for articles by both groups. Statistical tests confirm what your eyes already suspect—it ain’t even close.

Influence of Zooniverse Articles

We were also impressed by the networks that appear in Zooniverse projects, which allow users to confer with one another and direct expert attention toward particularly anomalous observations. In several instances this design has resulted in patterns of discovery, in which users flag rare data that go on to become the topic of new projects. This structural innovation indicates a difference not just of degree but of kind between so-called “big science” and crowdsourced e-research.

If you’re curious to learn more about our study of Zooniverse and the site’s implications for sociotechnical epistemology, check out our complete article.

Pop-ups on Comet Hunters

Example pop-up message shown in the Comet Hunters classification interface

We’re testing out a new feature of our interface, which means if you’re classifying images on Comet Hunters you may see occasional pop-up messages like the one pictured above.

The messages are designed to give you more information about the project. If you do not want to see them, you can opt out of seeing any future messages by clicking the link at the bottom of the pop-up.

You can have a look at this new feature by contributing some classifications today at www.comethunters.org.

Asteroid Zoo Paused

The AsteroidZoo community has exhausted the data that are available at this time. With all the data examined, we are going to pause the experiment; before volunteers spend more time searching, we want to make sure that we can process your finds through the Minor Planet Center and get highly reliable results.

We understand that it’s frustrating when you’ve put in a lot of work and there isn’t a way to confirm how well you’ve done. But please keep in mind that this was an experiment: how well can humans find asteroids that machines cannot?

Often in science an experiment runs into dead ends or speed bumps; this is just the nature of science. There is no question that the AsteroidZoo community has found several potential asteroid candidates that machines and algorithms simply missed. However, the conversion of these tantalizing candidates into valid results has hit a speed bump.

What’s been difficult is that all the processing needed to make an asteroid find “real” is based on the precision of a machine – for example, the arc of an asteroid must be the correct shape to within a tiny fraction of a pixel to be accepted as a good measurement. The usual process of achieving such precision is hands-on, and might take several humans weeks to get right. On AsteroidZoo, given the large scale of the data, automating the process of going from clicks to precise trajectories has been the challenge.

While we are paused, there will be updates to both the analysis process, and the process of confirming results with the Minor Planet Center. Updates will be posted as they become available.

https://talk.asteroidzoo.org/
http://reporting.asteroidzoo.org/

Thank you for your time.

Sunspotter Citizen Science Challenge: 29th August – 6th September

Calling all Zooniverse volunteers!  As we transition from the dog days of summer to the pumpkin spice latte days of fall (well, in the Northern hemisphere at least) it’s time to mobilize and do science!

Sunspotter Citizen Science Challenge

Our Zooniverse community of over 1.3 million volunteers has the ability to focus efforts and get stuff done. Join us for the Sunspotter Citizen Science Challenge! From August 29th to September 5th, it’s a mad sprint to complete 250,000 classifications on Sunspotter.

Sunspotter needs your help so that we can better understand and predict how the Sun’s magnetic activity affects us on Earth. The Sunspotter science team has three primary goals:

  1. Hone a more accurate measure of sunspot group complexity
  2. Improve how well we are able to forecast solar activity
  3. Create a machine-learning algorithm based on your classifications to automate the ranking of sunspot group complexity
Classifying on Sunspotter

In order to achieve these goals, volunteers like you compare two images of sunspot groups taken by the Solar and Heliospheric Observatory and choose the one you think is more complex. Sunspotter is what we refer to as a “popcorn project”, meaning you can jump right into the project and each classification is quick, taking only about 1-3 seconds.

Let’s all roll up our sleeves and advance our knowledge of heliophysics!

The Science of Citizen Science: Meetings in San Jose This Week

I and other Galaxy Zoo and Zooniverse scientists are looking forward to the Citizen Science Association (CSA) and American Association for the Advancement of Science (AAAS) meetings in San Jose, California this week.

As I mentioned in an earlier post, we’ve organized an AAAS session titled “Citizen Science from the Zooniverse: Cutting-Edge Research with 1 Million Scientists,” which will take place on Friday afternoon. It fits well with the AAAS’s theme this year: “Innovations, Information, and Imaging.” Our excellent line-up includes Laura Whyte (Adler) on Zooniverse, Brooke Simmons (Oxford) on Galaxy Zoo, Alexandra Swanson (U. of Minnesota) on Snapshot Serengeti, Kevin Wood (U. of Washington) on Old Weather, Paul Pharoah (Cambridge) on Cell Slider, and Phil Marshall (Stanford) on Space Warps.

And in other recent Zooniverse news, which you may have heard already, citizen scientists from the Milky Way Project examined infrared images from NASA’s Spitzer Space Telescope and found lots of “yellow balls” in our galaxy. It turns out that these are indications of early stages of massive star formation, such that the new stars heat up the dust grains around them. Charles Kerton and Grace Wolf-Chase have published the results in the Astrophysical Journal.

But let’s get back to the AAAS meeting. It looks like many other talks, sessions, and papers presented there involve citizen science too. David Baker (FoldIt) will give a plenary lecture on post-evolutionary biology and protein structures on Saturday afternoon. Jennifer Shirk (Cornell), Meg Domroese and others from CSA have a session Sunday morning, in which they will describe ways to utilize citizen science for public engagement. (See also this related session on science communication.) Then in a session Sunday afternoon, people from the European Commission and other institutions will speak about global earth observation systems and citizen scientists tackling urban environmental hazards.

Before all of that, we’re excited to attend the CSA’s pre-conference on Wednesday and Thursday. (See their online program.) Chris Filardi (Director of Pacific Programs, Center for Biodiversity and Conservation, American Museum of Natural History) and Amy Robinson (Executive Director of EyeWire, a game to map the neural circuits of the brain) will give the keynote addresses there. For the rest of the meeting, as with the AAAS, there will be parallel sessions.

The first day of the CSA meeting will include many sessions on education and learning at multiple levels; sessions on diversity, inclusion, and broadening engagement; a session on defining and measuring engagement, participation, and motivations; a session on CO2 and air quality monitoring; a session on CS in biomedical research; and sessions on best practices for designing and implementing CS projects, including a talk by Chris Lintott on the Zooniverse and one by Nicole Gugliucci on CosmoQuest. The second day will bring many more talks and presentations along these and related themes, including one by Julie Feldt about educational interventions in Zooniverse projects and one by Laura Whyte about Chicago Wildlife Watch.

I also just heard that the Commons Lab at the Woodrow Wilson Center is releasing two new reports today, and hardcopies will be available at the CSA meeting. One report is by Muki Haklay (UCL) about “Citizen Science and Policy: A European Perspective” and the other is by Teresa Scassa & Haewon Chung (U. of Ottawa) about “Typology of Citizen Science Projects from an Intellectual Property Perspective.” Look here for more information.

In any case, we’re looking forward to these meetings, and we’ll keep you updated!

Penguin Watch: Top images so far

Yesterday we launched our latest project: Penguin Watch. It is already proving to be one of the most popular projects we run, with over one hundred thousand classifications in the first day! The data come from 50 cameras focussed on the nesting areas of penguin colonies around the Southern Ocean. Volunteers are asked to tag adult penguins, chicks, and eggs.

Here are my favourite images uncovered by our volunteers so far: (click on an image to see what people are saying about it on Penguin Watch Talk)

1st Rule of Penguin Watch – You don’t have to count them all. But I dare you to!

 

Living on the edge
Penguins aren’t always only black and white…
I think they want in!
Spot the tourists
We’re saved!
Coming back from a refreshing afternoon swim

 

See what amazing pictures you can find right now at www.penguinwatch.org

Call for Proposals


The Constructing Scientific Communities project (ConSciCom), part of the AHRC’s ‘Science in Culture’ theme, is inviting proposals for citizen science or citizen humanities projects to be developed as part of the Zooniverse platform.

ConSciCom examines citizen science in the 19th and 21st centuries, contrasting and reflecting on engagement with distributed communities of amateur researchers in both the historical record and in contemporary practice.

Between one and four successful projects will be selected from responses to this call, and will be developed and hosted by the Zooniverse in association with the applicants. We hope to include both scientific and historical projects; those writing proposals should review the existing range of Zooniverse projects, which include not only classification projects but also transcription projects. Please note, however, that ConSciCom cannot distribute funds, nor can it support imaging or other digitization in support of the project.

Projects will be selected according to the following criteria:

  1. Merit and usefulness of the data expected to result from the project.
  2. Novelty of the problem; projects which require extending the capability of the Zooniverse platform or serve as case studies for crowdsourcing in new areas or in new ways are welcome.
  3. Alignment with the goals and interests of the Constructing Scientific Communities project. In particular, we wish to encourage projects that:
    1. Have a significant historical dimension, especially in relation to the history of science.
    2. Involve the transcription of text, either in its entirety or for rich metadata.

Note that it is anticipated that some, but not necessarily all, selected projects will meet this third criterion; please do submit proposals on other topics.

The deadline for submissions is July 25th 2014. You can submit a proposal by following this link: http://conscicom.org/proposals/form/