Optimizing for interest: Why people aren’t machines

One of the joys of working in the Zooniverse is the sheer variety of people who are interested in our work, and I spent a happy couple of days toward the end of last year at a symposium about Discovery Informatics – alongside a bunch of AI researchers and their friends who are trying to automate the process of doing science. I don’t think they’d mind me saying that we’re a long, long way from achieving that, but it was a good chance to muse on some of the connections between the work done by volunteers here and by our colleagues who think about machine learning.

In the past we’ve shown that machines can learn from us, but we’ve also talked about the need for a system that can combine the best of human and machine.

These two things are not the same
Robot and human (Thanks to Flickr user NineInchNachosXI)

I’m still convinced that this will be especially needed as the size of datasets produced by scientific surveys continues to increase at a frightening pace. The essential idea is that only the proportion of the data which really needs human attention is passed to human classifiers; an idea that starts off as a no-brainer (wouldn’t it be nice if we could decide in advance which proportion of Galaxy Zoo systems are too faint or fuzzy for sensible decisions to be made?) and then becomes interestingly complex.

This is particularly true when you start thinking of volunteers not as a crowd, but as a set of individuals. We know from looking at the data from past projects that people’s talents are varied – the people who are good at identifying spiral arms, for example, may not be the same people who can spot the faintest signs of a merger. So if we want to be most efficient, what we should be aiming for is passing each and every person the image that they’d be best at classifying.
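As a toy sketch of what that matching might look like – all the names and skill numbers here are invented for illustration, and nothing like this actually runs on the Zooniverse – the idea is simply to route each image type to the available volunteer with the best estimated track record on that type:

```python
# Hypothetical per-task skill estimates (invented numbers): each value is
# an estimated accuracy for that volunteer on that type of classification.
skills = {
    "alice": {"spiral-arms": 0.95, "mergers": 0.60},
    "bob":   {"spiral-arms": 0.70, "mergers": 0.90},
}

def best_classifier(task_type, available):
    """Pick the available volunteer with the highest estimated skill
    for this task type; fall back to 0.5 (a coin flip) if unknown."""
    return max(available, key=lambda who: skills[who].get(task_type, 0.5))

# A merger candidate would go to bob; a spiral-arm question to alice.
print(best_classifier("mergers", ["alice", "bob"]))
print(best_classifier("spiral-arms", ["alice", "bob"]))
```

Of course, as the rest of this post argues, pure efficiency isn’t the only thing such a router should optimise for.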

That in turn is easy to say, but difficult to deliver in practice. Since the days of the original Galaxy Zoo we’ve tended to shun anything that resembles a test before a volunteer is allowed to get going, and in any case a test which thoroughly examined someone’s ability in every aspect of the task (how do they do on bright galaxies? on faint ones? on distant spirals? on nearby ellipticals? on blue galaxies? what about mergers?) wouldn’t be much fun.

One solution is to use the information we already have; after all, every time someone provides a classification we learn something not only about the thing they’re classifying but also about them. This isn’t a new idea – in astronomy, I think it’s essentially the same as the personal equation used by stellar observers to combine results from different people – but things have got more sophisticated recently.

As I’ve mentioned before, a team from the robotics group in the department of engineering here in Oxford took a look at the classifications supplied by volunteers in the Galaxy Zoo: Supernova project and showed that by classifying the classifiers we could make better classifications. During the Discovery Informatics conference I had a quick conversation with Tamsyn Waterhouse, a researcher from Google interested in similar problems, and I was able to share results from Galaxy Zoo 2 with her*.
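The flavour of “classifying the classifiers” can be sketched very simply – this is a made-up toy example, not the actual method used in the Supernova analysis: score each volunteer by how often they agree with a provisional majority vote, then re-decide each subject with those scores as weights.

```python
# Toy "classify the classifiers" sketch with invented data: labels are 0/1,
# and votes[subject] holds (volunteer_id, label) pairs.
from collections import defaultdict

votes = {
    "galaxy-1": [("alice", 1), ("bob", 1), ("carol", 0)],
    "galaxy-2": [("alice", 0), ("bob", 1), ("carol", 0)],
    "galaxy-3": [("alice", 1), ("bob", 0), ("carol", 1)],
}

# Step 1: provisional answer per subject by simple majority vote.
majority = {s: int(sum(l for _, l in vs) * 2 > len(vs)) for s, vs in votes.items()}

# Step 2: score each volunteer by agreement with the provisional answers.
agree, total = defaultdict(int), defaultdict(int)
for s, vs in votes.items():
    for who, label in vs:
        total[who] += 1
        agree[who] += label == majority[s]
accuracy = {who: agree[who] / total[who] for who in total}

# Step 3: re-decide each subject, weighting each vote by its author's score.
def weighted_label(subject):
    score = sum((1 if l == 1 else -1) * accuracy[who] for who, l in votes[subject])
    return int(score > 0)

for s in votes:
    print(s, weighted_label(s))
```

In practice one would iterate steps 2 and 3 until the answers stop changing, but even this single pass shows how a reliable volunteer’s vote can outweigh two less reliable ones.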

We didn’t get time for a long chat, but I was delighted to hear that work on Galaxy Zoo had made it into a paper Tamsyn presented at a different conference. (You can read her paper here, or in Google’s open access repository here.) Her work, which is much wider than our project, develops a method which considers the value of each classification based (roughly) on the amount of information it provides, and then tries to seek the shortest route to a decision. And it works – she’s able to show that by applying these principles we would have been done with Galaxy Zoo 2 faster than we were – in other words, we wasted some people’s time by not being as efficient as we could be.
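The flavour of that “shortest route to a decision” idea can be sketched with a toy stopping rule: update our confidence after each classification and stop showing the subject to volunteers as soon as a threshold is crossed. The votes and per-classifier accuracies below are invented for illustration, and this is only a rough caricature of the approach in the paper.

```python
# Toy sequential decision rule: Bayesian update after each vote, stop early
# once we're confident. All numbers here are made up for illustration.

def decide(votes, accuracies, prior=0.5, threshold=0.99):
    """Return (decision, n_votes_used), or (None, n) if never confident.

    votes: sequence of 0/1 labels; accuracies: per-vote P(vote is correct).
    """
    p = prior  # current P(true label is 1)
    for n, (vote, acc) in enumerate(zip(votes, accuracies), start=1):
        like_1 = acc if vote == 1 else 1 - acc   # P(vote | label = 1)
        like_0 = 1 - acc if vote == 1 else acc   # P(vote | label = 0)
        p = p * like_1 / (p * like_1 + (1 - p) * like_0)
        if p >= threshold:
            return 1, n
        if p <= 1 - threshold:
            return 0, n
    return None, len(votes)

# Three agreeing votes from fairly reliable classifiers settle the question,
# so the last two volunteers never need to be shown this subject.
print(decide([1, 1, 1, 1, 1], [0.9, 0.9, 0.9, 0.9, 0.9]))  # → (1, 3)
```

The “wasted time” in the post is exactly the votes collected after the stopping point, which an adaptive system would have spent on a subject that still needed them.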

A reminder of what Galaxy Zoo 2 looked like!

That doesn’t sound good – not wasting people’s time is one of the fundamental promises we make here at the Zooniverse (it’s why we spend a lot of time selecting projects that genuinely need human classifications). Zoo 2 was a long time in the past, but knowing what we know now should we be implementing a suitable algorithm for all projects from here on in?

Probably not. There are some fun technical problems to solve before we could do that anyway, but even if we could, I don’t think we should. The current state of the art misses, I think, a couple of important factors which distinguish citizen science projects from the other examples considered in Tamsyn’s paper. To state the obvious: volunteer classifiers are different from machines. They get bored. They get inspired. And they make a conscious or unconscious decision to stay for another classification or to go back to the rest of the internet.

The interest a volunteer will have in a project will change as they move (or are moved by the software) from image to image and from task to task, and in a complicated way. Imagine getting a galaxy that’s difficult to classify; on a good day you might be inspired by the challenge and motivated to keep going, on a bad one you might just be annoyed and more likely to leave. We all learn as we go, too, and so our responses to particular images change over time. The challenge is to incorporate these factors into whatever algorithm we’re applying so that we can maximise not only efficiency, but interest. We might want to show the bright, beautiful galaxies to everyone, for example. Or start simple with easy examples and then expand the range of galaxies that are seen to make the task more difficult. Or allow people a choice about what they see next. Or a million different things.

Whatever we do, I’m convinced we will need to do something; datasets are getting larger and we’re already encountering projects where the idea of getting through all the data in our present form is a distant dream. Over the next few years, we’ll be developing the Zooniverse infrastructure to make this sort of experimentation easier, looking at theory with the help of researchers like Tamsyn to see what happens when you make the algorithms more complicated, and talking to our volunteers to find out what they want from these more complicated projects – all in our twin causes of doing as much science as possible, while providing a little inspiration along the way.

* – Just to be clear, in both cases all these researchers got was a table of classifications without any way of identifying individual volunteers except by a number.

Teachers Wanted For Planet Hunters Educators Guide Piloting

We need you and your students to help us craft a top-notch resource for teachers!  Educators at the Adler Planetarium have been hard at work creating an educators guide aimed at helping teachers bring the thrilling hunt for exoplanets into their classroom.  The first draft is nearly ready and we want to know what you think.

We’re looking for US-based 6th–8th grade teachers to try one or more lessons from the Planet Hunters Educators Guide this spring with their students. Each lesson can be taught as a stand-alone activity and takes approximately 45–60 minutes of class time. We want to know what works, what needs to change, and any other feedback you can provide.

Besides, one of your students may just discover a new planet!  You can’t get that in gym class (although physical fitness is very important). 

If you’re interested, please email the following information to education@zooniverse.org:

 

Name:

State:

Grades & Subjects Taught:

Number of Class Sections (if applicable):

Project Workshop Winners

We were delighted by the response to our call for volunteers to attend our project workshop, and we’re pleased to announce that our two winners are Katy Maloney and Janet Bain. Katy is a Planet Hunter from Montreal (you can see her in this recent video about online communities). Janet is well known to those from Old Weather, where she serves as moderator of the very active forum.

As Jules explained in her post, these workshops are a chance for the strange mix of people behind the scenes of the Zooniverse – developers, educators and scientists – to get together to discuss what works and what doesn’t, and to plan the year ahead. We think it’s very important to have volunteers there – and we hope that Katy and Janet (along with Jules, who we’ve invited back) will keep you all informed and involved in the discussions.

There were a few comments in the discussion under that last post from people – particularly locals – who would clearly have dearly loved to come. Unfortunately, it wouldn’t be possible to run the workshop as a public event; both because of the format (which features spontaneously arranged small group discussions) and also to allow everyone to speak freely about often quite difficult issues. What I do find heartening is that we’ve grown a community who want to help us plan and develop for the future, and we need to take that seriously.

I’ll write more over the coming weeks and months about what we’re going to do to be more open, but for now, for those who really wanted to come, we’ll work hard to organise some truly public events. We have a meeting in Oxford on the 22nd June which I hope British Zooites will be able to attend, and we’ll arrange a similar event in Chicago as soon as possible. We’ll also try hard to webcast these events so all can attend.

Chris

Why SciStarter.com is Bad For Citizen Science

Preface: I’d like to begin by saying that I’ve met Darlene Cavalier at conferences in the past and I’m a big supporter of her efforts. Darlene truly is a ‘cheerleader’ for citizen science; her enthusiasm is infectious, and the citizen science domain is clearly a better place with her. I’m writing here about what I consider the bad practice of SciStarter.com and Science For Citizens LLC, their parent organisation. I have no idea whether the issues highlighted here are because of decisions that she has made.

There was a time not so long ago when you needed a new account for pretty much everything you tried out on the web. Want to upload photos to Flickr? Then sign up for a Yahoo! ID. Want a blog? Then give WordPress or Tumblr your details. Feeling social? Then Facebook, Twitter or MySpace would pretty much want the same information. These days there are a number of solutions that allow you to log in to web-based services using things like your Facebook, Twitter or Google account. Under the hood these solutions typically rely on a couple of protocols such as OAuth and OpenID, and they often still request your email address when you sign in, but the days of hundreds of accounts each with their own password to remember are coming to a close.

In many ways a request by an organisation for your email address when signing up for a new service is completely reasonable. In exchange for handing over your email address and a few personal details these tools were often available for free – both parties win. There is of course the discussion around who or what is the product when you use these free services, but let’s not go into that here.

Since launching the original Galaxy Zoo back in 2007 we’ve encouraged our volunteer community to register for an account with us, although for the vast majority of our projects (and all of our recent ones) this login/signup is an optional step. For the Zooniverse there are two main reasons for asking you to create an account:

1) When we publish a paper as a result of your efforts we feel extremely strongly about crediting you. Experience has taught us that attempting to publish a paper with 170,000 authors listed is somewhat frowned upon by the journals, but if you take a look at any of the Zooniverse publications you’ll find a link to an authors page such as here, here and here. We can only credit you if you share some personal information with us when you sign up.

2) For our research methods to work well, identifying an individual ‘classifier’ is pretty important. You can read more about this here (the original Galaxy Zoo paper) or here but in order to produce the best results possible we spend lots of time working out who is ‘best’ at a particular task and weighting their contributions accordingly. Being able to reliably identify an individual throughout the lifetime of a project (and even between projects) is most simple when someone has logged in.

Over the past year or so I’ve become increasingly concerned by the behaviour of SciStarter.com – a website that indexes citizen science projects from across the web. The site does a pretty good job of cataloguing citizen science projects you can contribute to – when you visit the site and search, for example, for ‘bats’, the Zooniverse project Bat Detective is listed in the results. Selecting the result takes you to a brief summary of Bat Detective and a link to ‘get started now!’ – and this is where it goes wrong. Rather than taking you straight to the Bat Detective site, you’re required to sign up for an account first. Sign up for what exactly? Am I signing up to take part in Bat Detective? No. You’re actually just signing up for an account with SciStarter.com so you can get a link to a project that SciStarter.com has nothing to do with.

Additionally, in a recent ‘top 10’ blog post of most successful citizen science projects of 2012, Bat Detective was highlighted. Did the link in this article send you straight to the Bat Detective website? Sadly not, it of course links to SciStarter’s catalogue page about Bat Detective which requires account registration before you can access the URL.

To me this doesn’t seem right; in many ways it’s just exploiting people’s lack of experience and understanding of the web. There’s a reason that ‘Facebook’ is consistently among the most Googled terms – many people just don’t quite understand how the web works, and I think SciStarter.com are exploiting this. Conversely, to those who are a little more web savvy these tactics must seem very clumsy. Perhaps more importantly though, it’s widely recognised that signup forms are a barrier to entry for many people, and so by making people jump through this hoop SciStarter.com are actually holding potential citizen scientists back.

I don’t believe it’s in anyone’s interest other than Scistarter’s to require you to sign up to follow a link through to a project. By mandating this step they are building an index of individuals interested in other people’s projects when they don’t have any of their own and they’re risking confusing new community volunteers about what they have and haven’t signed up for. All of this is made worse by the fact that SciStarter.com is a division of Science for Citizens LLC – a commercial company.

So my challenge to SciStarter.com is this: If you’re so committed to citizen science then why put up this artificial barrier to contribution? Crawling the internet for people’s emails is one of the less tasteful aspects of the web and one I’d hoped we’d seen the end of. So how about it SciStarter?

Zooniverse Workshops

It’s very exciting, and a little bit scary, but we’re going to begin offering online educator workshops! The first two will occur on the 30th of March and the 6th of April and we’re looking for volunteers who would like to participate.

The workshops aim to introduce educators to the range of citizen science projects available on zooniverse.org, our new website ZooTeach, and the new classroom interactive tool for Galaxy Zoo, the Navigator. They are aimed at teachers, but anybody who is interested in using citizen science in an informal education setting – an after-school club, a scout group, or maybe with a home-schooled son or daughter – is more than welcome to join us.

We will be using Google+ Hangouts (http://www.google.com/+/learnmore/hangouts/), which can be freely accessed by anyone with a Google email account. The times of the workshops have yet to be decided, as we were unsure what would work best across time zones, but each will last for approximately 2 hours.

If you’re interested in participating please email education@zooniverse.org, including the date that suits you best and your location, and we will do our best to set up a time that works for as many people as possible! Each workshop will have a maximum of 10 participants, but we may decide to run several smaller groups in different countries.

I hope to meet some of you online in the not too distant future!