Below is a guest post from a researcher who has been studying the Zooniverse and who just published a paper called ‘Crowdsourced Science: Sociotechnical epistemology in the e-research paradigm’. That being a bit of a mouthful, I asked him to introduce himself and explain – Chris.
My name is David Watson and I’m a data scientist at Queen Mary University of London’s Centre for Translational Bioinformatics. As an MSc student at the Oxford Internet Institute back in 2015, I wrote my thesis on crowdsourcing in the natural sciences. I got in touch with several members of the Zooniverse team, who were kind enough to answer all my questions (I had quite a lot!) and even provide me with an invaluable dataset of aggregated transaction logs from 2014. Combining this information with publication data from a variety of sources, I examined the impact of crowdsourcing on knowledge production across the sciences.
Last week, the philosophy journal Synthese published a (significantly) revised version of my thesis, co-authored by my advisor Prof. Luciano Floridi. We found that Zooniverse projects not only processed far more observations than comparable studies conducted via more traditional methods—about an order of magnitude more data per study on average—but that the resultant papers vastly outperformed others by researchers using conventional means. Employing the formal tools of Bayesian confirmation theory along with statistical evidence from and about Zooniverse, we concluded that crowdsourced science is more reliable, scalable, and connective than alternative methods when certain common criteria are met.
We were surprised by several things in our research, however. First, the significance of the disparity between the performance of publications by Zooniverse and those by other labs was greater than expected. This plot represents the distribution of citation percentiles by year and data source for articles by both groups. Statistical tests confirm what your eyes already suspect—it ain’t even close.
We were also impressed by the networks that appear in Zooniverse projects, which allow users to confer with one another and direct expert attention toward particularly anomalous observations. In several instances this design has resulted in patterns of discovery, in which users flag rare data that go on to become the topic of new projects. This structural innovation indicates a difference not just of degree but of kind between so-called “big science” and crowdsourced e-research.
If you’re curious to learn more about our study of Zooniverse and the site’s implications for sociotechnical epistemology, check out our complete article.
We’re testing out a new feature of our interface, which means if you’re classifying images on Comet Hunters you may see occasional pop-up messages like the one pictured above.
The messages are designed to give you more information about the project. If you do not want to see them, you have the option to opt-out of seeing any future messages. Just click the link at the bottom of the pop-up.
You can have a look at this new feature by contributing some classifications today at www.comethunters.org.
We’re cleaning up our email list to make sure that we do not email anyone who does not want to hear from us. You will have got an email last week asking you if you want to stay subscribed. If you did not click the link in that email, then you will have received one today saying you have been unsubscribed from our main mailing list. Don’t worry! If you still want to receive notifications from us regarding things like new projects, please go to www.zooniverse.org/settings/email and make sure you’re subscribed to general Zooniverse email updates.
NOTE: This has not affected emails you get from individual Zooniverse projects.
The AsteroidZoo community has exhausted the data that are available at this time. With all the data examined we are going to pause the experiment, and before users spend more time we want to make sure that we can process your finds through the Minor Planet Center and get highly reliable results.
We understand that it’s frustrating when you’ve put in a lot of work, and there isn’t a way to confirm how well you’ve done. But please keep in mind that this was an experiment – How well do humans find asteroids that machines cannot?
Often times in science an experiment can run into dead-ends, or speed-bumps; this is just the nature of science. There is no question that the AsteroidZoo community has found several potential asteroid candidates that machines and algorithms simply missed. However, the conversion of these tantalizing candidates into valid results has encountered a speed bump.
What’s been difficult is that all the processing to make an asteroid find “real” has been based on the precision of a machine – for example the arc of an asteroid must be the correct shape to a tiny fraction of a pixel to be accepted as a good measurement. The usual process of achieving such great precision is hands-on, and might take takes several humans weeks to get right. On AsteroidZoo, given the large scale of the data, automating the process of going from clicks to precise trajectories has been the challenge.
While we are paused, there will be updates to both the analysis process, and the process of confirming results with the Minor Planet Center. Updates will be posted as they become available.
We’re getting through the first round of Penguin Watch data- it’s amazing and it’s doing the job we wanted, which is to revolutionise the collection and processing of penguin data from the Southern Ocean – to disentangle the threats of climate change, fishing and direct human disturbance. The data are clearly excellent, but we’re now trying to automate processing them so that results can more rapidly influence policy.
In “PenguinWatch 2.0”, people will be able to see the results of their online efforts to monitor and conserve Antarctica’s penguins colonies. The more alert among you will notice that it’s not fully there yet, but we’re working on it!
We have loads of ideas on how to integrate this with the penguinwatch.org experience so that people are more engaged, learn more and realise what they are contributing to!
For now, we’re doing this the old-fashioned way; anyone such as schools who want to be more engaged, can contact us (firstname.lastname@example.org) and we’ll task you with a specific colony and feedback on that.
We’re sorry to let you know that at 16:29 BST on Wednesday last week we made a change to the Panoptes code which had the unexpected result that it failed to record classifications on six of our newest projects; Season Spotter, Wildebeest Watch, Planet Four: Terrains, Whales as Individuals, Galaxy Zoo: Bar Lengths, and Fossil Finder. It was checked by two members of the team – unfortunately, neither of them caught the fact that it failed to post classifications back. When we did eventually catch it, we fixed it within 10 minutes. Things were back to normal by 20:13 BST on Thursday, though by that time each project had lost a day’s worth of classifications.
To prevent something like this happening in the future we are implementing new code that will monitor the incoming classifications from all projects and send us an alert if any of them go unusually quiet. We will also be putting in even more code checks that will catch any issues like this right away.
It is so important to all of us at the Zooniverse that we never waste the time of any of our volunteers, and that all of your clicks contribute towards the research goals of the project. If you were one of the people whose contributions were lost we would like to say how very sorry we are, and hope that you can forgive us for making this terrible mistake. We promise to do everything we can to make sure that nothing like this happens again, and we thank you for your continued support of the Zooniverse.
Today, we launchAnnoTate, an art history and transcription project made in partnership with Tate museums and archives. AnnoTate was built with the average time-pressed user in mind, by which I mean the person who does not necessarily have five or ten minutes to spare, but maybe thirty or sixty seconds.
AnnoTate takes a novel approach to crowdsourced text transcription. The task you are invited to do is not a page, not sentences, but individual lines. If the kettle boils, the dog starts yowling or the children are screaming, you can contribute your one line and then go attend to life.
The new transcription system is powered by an algorithm that will show when lines are complete, so that people don’t replicate effort unnecessarily. As in other Zooniverse projects, each task (in this case, a line) is done by several people, so you’re not solely responsible for a line, and it’s ok if your lines aren’t perfect.
Of course, if you want trace the progression of an artist’s life and work through their letters, sketchbooks, journals, diaries and other personal papers, you can transcribe whole pages and documents in sequence. Biographies of the artists are also available, and there will be experts on Talk to answer questions.
Every transcription gets us closer to the goal of making these precious documents word searchable for scholars and art enthusiasts around the world. Help us understand the making of twentieth-century British art!
Calling all Zooniverse volunteers! As we transition from the dog days of summer to the pumpkin spice latte days of fall (well, in the Northern hemisphere at least) it’s time to mobilize and do science!
Our Zooniverse community of over 1.3 million volunteers has the ability to focus efforts and get stuff done. Join us for the Sunspotter Citizen Science Challenge! From August 29th to September 5th, it’s a mad sprint to complete 250,000 classifications on Sunspotter.
Sunspotter needs your help so that we can better understand and predict how the Sun’s magnetic activity affects us on Earth. The Sunspotter science team has three primary goals:
Hone a more accurate measure of sunspot group complexity
Improve how well we are able to forecast solar activity
Create a machine-learning algorithm based on your classifications to automate the ranking of sunspot group complexity
In order to achieve these goals, volunteers like you compare two sunspot group images taken by the Solar and Heliospheric Observatory and choose the one you think is more complex. Sunspotter is what we refer to as a “popcorn project”. This means you can jump right in to the project and that each classification is quick, about 1-3 seconds.
Let’s all roll up our sleeves and advance our knowledge of heliophysics!
In the previous post, I described the creation of the Zooniverse Project Success Matrix from Cox et al. (2015). In essence, we examined 17 (well, 18, but more on that below) Zooniverse projects, and for each of them combined 12 quantitative measures of performance into one plot of Public Engagement versus Contribution to Science:
The aim of this post is to answer the questions: What does it mean? And what doesn’t it mean?
Discussion of Results
The obvious implication of this plot and of the paper in general is that projects that do well in both public engagement and contribution to science should be considered “successful” citizen science projects. There’s still room to argue over which is more important, but I personally assert that you need both in order to justify having asked the public to help with your research. As a project team member (I’m on the Galaxy Zoo science team), I feel very strongly that I have a responsibility both to use the contributions of my project’s volunteers to advance scientific research and to participate in open, two-way communication with those volunteers. And as a volunteer (I’ve classified on all the projects in this study), those are the 2 key things that I personally appreciate.
It’s apparent just from looking at the success matrix that one can have some success at contributing to science even without doing much public engagement, but it’s also clear that every project that successfully engages the public also does very well at research outputs. So if you ignore your volunteers while you write up your classification-based results, you may still produce science, though that’s not guaranteed. On the other hand, engaging with your volunteers will probably result in more classifications and better/more science.
Surprises, A.K.A. Failing to Measure the Weather
Some of the projects on the matrix didn’t appear quite where we expected. I was particularly surprised by the placement of Old Weather. On this matrix it looks like it’s turning in an average or just-below-average performance, but that definitely seems wrongto me. And I’m not the only one: I think everyone on the Zooniverse team thinks of the project as a huge success. Old Weather has provided robust and highly useful data to climate modellers, in addition to uncovering unexpected data about important topics such as the outbreak and spread of disease. It has also provided publications for more “meta” topics, including the study of citizen science itself.
Additionally, Old Weather has a thriving community of dedicated volunteers who are highly invested in the project and highly skilled at their research tasks. Community members have made millions of annotations on log data spanning centuries, and the researchers keep in touch with both them and the wider public in multiple ways, including a well-written blog that gets plenty of viewers. I think it’s fair to say that Old Weather is an exceptional project that’s doing things right. So what gives?
There are multiple reasons the matrix in this study doesn’t accurately capture the success of Old Weather, and they’re worth delving into as examples of the limitations of this study. Many of them are related to the project being literally exceptional. Old Weather has crossed many disciplinary boundaries, and it’s very hard to put such a unique project into the same box as the others.
Firstly, because of the way we defined project publications, we didn’t really capture all of the outputs of Old Weather. The use of publications and citations to quantitatively measure success is a fairly controversial subject. Some people feel that refereed journal articles are the only useful measure (not all research fields use this system), while others argue that publications are an outdated and inaccurate way to measure success. For this study, we chose a fairly strict measure, trying to incorporate variations between fields of study but also requiring that publications should be refereed or in some other way “accepted”. This means that some projects with submitted (but not yet accepted) papers have lower “scores” than they otherwise might. It also ignores the direct value of the data to the team and to other researchers, which is pretty punishing for projects like Old Weather where the data itself is the main output. And much of the huge variety in other Old Weather outputs wasn’t captured by our metric. If it had been, the “Contribution to Science” score would have been higher.
Secondly, this matrix tends to favor projects that have a large and reasonably well-engaged user base. Projects with a higher number of volunteers have a higher score, and projects where the distribution of work is more evenly spread also have a higher score. This means that projects where a very large fraction of the work is done by a smaller group of loyal followers are at a bit of a disadvantage by these measurements. Choosing a sweet spot in the tradeoff between broad and deep engagement is a tricky task. Old Weather has focused on, and delivered, some of the deepest engagement of all our projects, which meant these measures didn’t do it justice.
To give a quantitative example: the distribution of work is measured by the Gini coefficient (on a scale of 0 to 1), and in our metric lower numbers, i.e. more even distributions, are better. The 3 highest Gini coefficients in the projects we examined were Old Weather (0.95), Planet Hunters (0.93), and Bat Detective (0.91); the average Gini coefficient across all projects was 0.82. It seems clear that a future version of the success matrix should incorporate a more complex use of this measure, as very successful projects can have high Gini coefficients (which is another way of saying that a loyal following is often a highly desirable component of a successful citizen science project).
Thirdly, I mentioned in part 1 that these measures of the Old Weather classifications were from the version of the project that launched in 2012. That means that, unlike every other project studied, Old Weather’s measures don’t capture the surge of popularity it had in its initial stages. To understand why that might make a huge difference, it helps to compare it to the only eligible project that isn’t shown on the matrix above: The Andromeda Project.
In contrast to Old Weather, The Andromeda Project had a very short duration: it collected classifications for about 4 weeks total, divided over 2 project data releases. It was wildly popular, so much so that the project never had a chance to settle in for the long haul. A typical Zooniverse project has a burst of initial activity followed by a “long tail” of sustained classifications and public engagement at a much lower level than the initial phase.
The Andromeda Project is an exception to all the other projects because its measures are only from the initial surge. If we were to plot the success matrix including The Andromeda Project in the normalizations, the plot looks like this:
Because we try to control for project duration, the very short duration of the Andromeda Project means it gets a big boost. Thus it’s a bit unfair to compare all the other projects to The Andromeda Project, because the data isn’t quite the same.
However, that’s also true of Old Weather — but instead of only capturing the initial surge, our measurements for Old Weather omit it. These measurements only capture the “slow and steady” part of the classification activity, where the most faithful members contribute enormously but where our metrics aren’t necessarily optimized. That unfairly makes Old Weather look like it’s not doing as well.
In fact, comparing these 2 projects has made us realize that projects probably move around significantly in this diagram as they evolve. Old Weather’s other successes aren’t fully captured by our metrics anyway, and we should keep those imperfections and caveats in mind when we apply this or any other success measure to citizen science projects in the future; but one of the other things I’d really like to see in the future is a study of how a successful project can expect to evolve across this matrix over its life span.
Why do astronomy projects do so well?
There are multiple explanations for why astronomy projects seem to preferentially occupy the upper-right quadrant of the matrix. First, the Zooniverse was founded by astronomers and still has a high percentage of astronomers or ex-astronomers on the payroll. For many team members, astronomy is in our wheelhouse, and it’s likely this has affected decisions at every level of the Zooniverse, from project selection to project design. That’s starting to change as we diversify into other fields and recruit much-needed expertise in, for example, ecology and the humanities. We’ve also launched the new project builder, which means we no longer filter the list of potential projects: anyone can build a project on the Zooniverse platform. So I think we can expect the types of projects appearing in the top-right of the matrix to broaden considerably in the next few years.
The second reason astronomy seems to do well is just time. Galaxy Zoo 1 is the first and oldest project (in fact, it pre-dates the Zooniverse itself), and all the other Galaxy Zoo versions were more like continuations, so they hit the ground running because the science team didn’t have a steep learning curve. In part because the early Zooniverse was astronomer-dominated, many of the earliest Zooniverse projects were astronomy related, and they’ve just had more time to do more with their big datasets. More publications, more citations, more blog posts, and so on. We try to control for project age and duration in our analysis, but it’s possible there are some residual advantages to having extra years to work with a project’s results.
Moreover, those early astronomy projects might have gotten an additional boost from each other: they were more likely to be popular with the established Zooniverse community, compared to similarly early non-astronomy projects which may not have had such a clear overlap with the established Zoo volunteers’ interests.
The citizen science project success matrix presented in Cox et al. (2015) is the first time such a diverse array of project measures have been combined into a single matrix for assessing the performance of citizen science projects. We learned during this study that public engagement is well worth the effort for research teams, as projects that do well at public engagement also make better contributions to science.
It’s also true that this matrix, like any system that tries to distill such a complex issue into a single measure, is imperfect. There are several ways we can improve the matrix in the future, but for now, used mindfully (and noting clear exceptions), this is generally a useful way to assess the health of a citizen science project like those we have in the Zooniverse.
What makes one citizen science project flourish while another flounders? Is there a foolproof recipe for success when creating a citizen science project? As part of building and helping others build projects that ask the public to contribute to diverse research goals, we think and talk a lot about success and failure at the Zooniverse.
But while our individual definitions of success overlap quite a bit, we don’t all agree on which factors are the most important. Our opinions are informed by years of experience, yet before this year we hadn’t tried incorporating our data into a comprehensive set of measures — or “metrics”. So when our collaborators in the VOLCROWE project proposed that we try to quantify success in the Zooniverse using a wide variety of measures, we jumped at the chance. We knew it would be a challenge, and we also knew we probably wouldn’t be able to find a single set of metrics suitable for all projects, but we figured we should at least try to write down onepossible approach and note its strengths and weaknesses so that others might be able to build on our ideas.
In this study, we only considered projects that were at least 18 months old, so that all the projects considered had a minimum amount of time to analyze their data and publish their work. For a few of our earliest projects, we weren’t able to source the raw classification data and/or get the public-engagement data we needed, so those projects were excluded from the analysis. We ended up with a case study of 17 projects in all (plus the Andromeda Project, about which more in part 2).