Welcome to the Worm Watch Lab

Today we launch a new Zooniverse project in association with the Medical Research Council (MRC) and the Medical Research Foundation: Worm Watch Lab.

We need the public’s help in observing the behaviour of tiny nematode worms. When you classify on wormwatchlab.org you’re shown a video of a worm wriggling around. The aim of the game is to watch and wait for the worm to lay eggs, and to hit the ‘z’ key when they do. It’s very simple and strangely addictive. By watching these worms lay eggs, you’re helping to collect valuable data about genetics that will assist medical research.

Worm Watch Lab

The MRC have built tracking microscopes to record these videos of crawling worms. A USB microscope is mounted on a motorised stage connected to a computer. When the worm moves, the computer analyses the changing image and commands the stage to move to re-centre the worm in the field of view. Because the trackers work without supervision, they can run eight of them in parallel to collect a lot of video! It’s these movies that we need the public to help classify.
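To make the re-centring step concrete, here is a minimal sketch of the idea rather than the MRC's actual tracker code. It assumes a greyscale frame given as a list of rows, a worm that is darker than its background, and a hypothetical `microns_per_pixel` calibration; the function names are made up for illustration.

```python
from typing import List, Tuple


def worm_centroid(frame: List[List[int]], threshold: int = 128) -> Tuple[float, float]:
    """Return the (row, col) centroid of all pixels darker than `threshold`."""
    row_sum = col_sum = count = 0
    for r, line in enumerate(frame):
        for c, value in enumerate(line):
            if value < threshold:  # assumption: the worm is darker than the background
                row_sum += r
                col_sum += c
                count += 1
    if count == 0:
        raise ValueError("no worm pixels found in frame")
    return row_sum / count, col_sum / count


def stage_correction(frame: List[List[int]],
                     microns_per_pixel: float = 1.0,
                     threshold: int = 128) -> Tuple[float, float]:
    """Offset (dy, dx) of the worm from the frame centre, converted to microns."""
    centre_r = (len(frame) - 1) / 2
    centre_c = (len(frame[0]) - 1) / 2
    worm_r, worm_c = worm_centroid(frame, threshold)
    return ((worm_r - centre_r) * microns_per_pixel,
            (worm_c - centre_c) * microns_per_pixel)
```

A real tracker would run this on every frame and send the offset to the motorised stage; whether the stage moves by plus or minus that offset depends on how the stage axes are oriented.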

By watching movies of the nematode worms, we can understand how the brain works and how genes affect behaviour. The idea is that if a gene is involved in a visible behaviour, then mutations that break that gene might lead to detectable behavioural changes. The type of change gives us a hint about what the affected gene might be doing. Although it is small and has far fewer cells than we do, the worm used in these studies (called C. elegans) has almost as many genes as we do! We share a common ancestor with these worms, so many of their genes are closely related to human genes. This presents us with the opportunity to study the function of genes that are important for human brain function in an animal that is easier to handle, great for microscopy and genetics, and has a generation time of only a few days. It’s all quite amazing!

To get started visit www.wormwatchlab.org and follow the tutorial. You can also find Worm Watch Lab on Facebook and on Twitter.

52 Years of Human Effort

At ZooCon last week I spoke about the scale of human attention that the Zooniverse receives. One of my favourite stats in this realm (from Clay Shirky’s book ‘Cognitive Surplus’) is that in the USA, adults cumulatively spend about 200 billion hours watching TV every year. By contrast it took 100 million hours of combined effort for Wikipedia to reach its status as the world’s encyclopaedia.

In the previous year people collectively spent just shy of half a million hours working on Zooniverse projects. Better put, the community invested about 52 years’ worth of effort[1]. That is to say, if an individual sat down and did nothing but classify on Zooniverse sites for 52 years, they’d only just have done the same amount of work as our community did between June 2012 and June 2013. The number is always rising too. Citizen science is amazing!

Another way of thinking about it is to convert this time into Full Time Equivalents (FTEs). One person working 40 hours per week, for 50 weeks a year works for 2000 hours a year – that’s 1 FTE. So our 460,000 hours of Zooniverse effort are equivalent to 230 FTEs. It’s as if we had a building of 230 people who came in every day only to click on Zooniverse projects.
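For anyone who wants to check the arithmetic, here is a small sketch of the two conversions. The only inputs are the figures quoted above; treating a year as 365 days of 24 hours is my assumption about how the 52-year figure was reached.

```python
HOURS_PER_YEAR = 24 * 365       # 8760 hours of non-stop effort (assumed convention)
FTE_HOURS_PER_YEAR = 40 * 50    # 2000 working hours, as defined in the text


def years_of_effort(hours: float) -> float:
    """Round-the-clock years represented by a number of hours."""
    return hours / HOURS_PER_YEAR


def full_time_equivalents(hours: float) -> float:
    """Number of people working full time for one year."""
    return hours / FTE_HOURS_PER_YEAR


zooniverse_hours = 460_000
print(round(years_of_effort(zooniverse_hours), 1))   # about 52.5 years
print(full_time_equivalents(zooniverse_hours))       # 230.0 FTEs
```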

Zooniverse Effort Distribution to June 2013

This amazing investment by the community is not broken down evenly of course, as the above ‘snail’ chart shows. In fact Planet Hunters alone would occupy 62 of the people in our fictional building: the project took up 27% of the effort in the last year. Galaxy Zoo took 17%, which means it had almost 9 years of your effort all to itself. Planet Four had a meteoric launch on the BBC’s Stargazing Live less than six months ago and since that time it has gobbled up just over 5 years of human attention – 10% of the whole for the past year.
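The per-project figures follow directly from the percentages. As a rough sketch, using only the shares quoted in the paragraph above and the rounded totals of 230 FTEs and 52 years:

```python
TOTAL_FTES = 230
TOTAL_YEARS = 52

# Shares of the last year's effort, as quoted in the text.
SHARES = {"Planet Hunters": 0.27, "Galaxy Zoo": 0.17, "Planet Four": 0.10}


def breakdown(shares, total_ftes=TOTAL_FTES, total_years=TOTAL_YEARS):
    """Map each project to its (FTEs, years of effort)."""
    return {name: (share * total_ftes, share * total_years)
            for name, share in shares.items()}


for name, (ftes, years) in breakdown(SHARES).items():
    print(f"{name}: {ftes:.0f} people, {years:.1f} years")
```

This reproduces the 62 people for Planet Hunters, the almost 9 years for Galaxy Zoo and the just over 5 years for Planet Four.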

What’s wonderful is that our 230 metaphorical workers, and the 52 years they represent, are not confined to one building or one crazed click-worker. Our community is made of hundreds of thousands of individuals across the world – 850,000 of whom have signed up through zooniverse.org. Some of them have contributed a single classification, others have given our researchers far, far more of their time and attention. Through clicking on our sites, discussing ideas on Talk, or just spreading the word: Zooniverse volunteers are making a significant contribution to research in areas from astronomy to zoology.

Congratulations to everyone who’s taken part and let’s hope this number increases again by next year!

[1] In my ZooCon talk I incorrectly gave the figure of 35 years. This was wrong for two reasons. Firstly, I had neglected Andromeda Project, Planet Four and Snapshot Serengeti for technical reasons. Secondly, in my rush to get my slides ready, I had calculated the numbers incorrectly and underestimated them all by about 20%.

Anyone Fancy an Asteroid?

This might be the Asteroid Zoo logo

Anyone interested in astronomy on the web will be aware of the fabulous success of Planetary Resources’ fundraising effort to build and launch the ARKYD space telescope. They’ve already raised more than a million dollars – helped in part by a cunning plan to let you take a picture of yourself in space – but they’re not stopping there. With three days to go, we’re delighted to announce that they’re going to try and help us help Zooniverse volunteers hunt for potentially hazardous asteroids.

The latest stretch goal is to support the development by the Zooniverse of a citizen science asteroid hunt. If the new target is hit, we’ll build a system that uses more than 3 million images from the Catalina Sky Survey – the survey responsible for nearly half of the near-Earth asteroid discoveries in the last fifteen years. We know there are asteroids out there waiting to be discovered, and we’re willing to bet that the existing routines used to scan through the survey data didn’t find them all.

Recent discoveries of near-Earth objects; Catalina’s the big purple part.

Anyone who’s followed the Zooniverse over the last few years knows that we believe in doing projects that make authentic contributions to science, and so I’m especially pleased that the project with Planetary Resources is also focused on improving machine learning solutions to asteroid hunting. Rather like our supernova project, an ideal outcome would be to use the classifications provided by volunteers to improve automated searching and suggest new methods by which machines might take up the strain. In the meantime, though, there are new (small) worlds to find – with your help, we’ll be launching the search for them soon.

I’ve put my money where my mouth is already, and if you can afford it then I hope you’ll follow the link and donate so we can all go asteroid hunting. You can also watch their Kickstarter video to see what they’re trying to do.

How the Zooniverse Works: Tools and Technologies

In my last post I described at length the domain model that we use to describe conceptually what the Zooniverse does. That wouldn’t mean much without an implementation of that model and so in this post I’m going to describe some of the tools and technologies that we use to actually run our citizen science projects.

The lifecycle of a Zooniverse project

Let’s think a little more about what happens when you visit a project such as Snapshot Serengeti. Ignoring all of the to-and-fro that your web browser does to work out where the domain name ‘snapshotserengeti.org’ points to, once it’s figured this and a few other details out you basically get sent a website that your browser renders for you. For the website to function as a Zooniverse project a few things are essential:

  1. You need to be able to view images (or listen to audio or watch a video) that we and the science team need your help analysing.
  2. You need to be able to log in with your Zooniverse account.
  3. We need to capture back what you said when doing the citizen science analysis task.
  4. You need to be able to save favourite images to your profile.
  5. You need to be able to view recent images you’ve seen in your profile.
  6. You need to be able to discuss these images with the community.

It turns out that pretty much all of the functionality mentioned above is delivered for us by an application we call Ouroboros, which acts as an API layer, with a website (such as Snapshot Serengeti) talking to it.

Ouroboros – or ‘why the simplest API that works is probably all you need’.

So what is Ouroboros? It provides an API (REST/JSON) that allows you to build a Zooniverse project that has all of the core components (1-6) listed above. Technology-wise it’s a custom Ruby on Rails application (Rails 3.2) that uses MongoDB to store data and Redis as a query cache all running on Amazon Web Services. It’s probably utterly useless to anyone but us but for our needs it’s just about perfect.

At the Zooniverse we’re optimised for a few different things. In no particular order of priority they are:

  1. Volume – we want to be able to build lots of projects.
  2. Science – we want it to be easy to do science with the efforts of our community.
  3. Scale/performance – we want to be able to have millions of people come to our projects and for the sites to stay up.
  4. Availability – we’d prefer our websites to be ‘up’ and not ‘down’.
  5. Cost – we want to keep costs at a manageable level.

Pretty much all of these requirements point to having a shared API (Ouroboros) that serves a large number of projects (I’ll argue #4 in the pub with anyone who really wants to push me on it).

Running a core API that serves many projects makes you take the maintenance and health of that application pretty seriously. Should Ouroboros throw a wobbly then we’d currently take out about 10 Zooniverse projects at once and this is only set to increase. This means we’ve thought a lot about how to scale the application for times when we’re busy and we also spend significant amounts of time monitoring the application performance and tuning code where necessary. I mentioned that cost is a factor – running a central API means that when the Zooniverse is quiet and there aren’t many people about we can scale back the number of servers we’re running (automagically on Amazon Web Services) to a minimal level.

We’ve not always built our projects this way. The original Galaxy Zoo (2007) was an ASP/web forms application, and the projects between Galaxy Zoo 2 and SETI Live were all separate web applications, many of them built using an application called The Juggernaut. Building standalone applications every time not only made it difficult to maintain our projects, but we also found ourselves writing very similar (but subtly different) code many times between projects: code for things like choosing which Subject to show next.

Ouroboros is an evolution of our thinking about how to build projects, what’s important and generalisable and what isn’t. At its most basic it’s a really fast Subject allocator and Classification collector. Our realisation over the last few years was that the vast majority of what’s different about each project is the user experience and classification interface, and this has nothing to do with the API.

Subjects out, Classifications back in.
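To illustrate what "Subject allocator and Classification collector" means, here is a toy in-memory sketch. It is not how Ouroboros is implemented (that is a Rails application backed by MongoDB and Redis); the class name, the `retire_after` limit and the rule of never showing a user the same Subject twice are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set


class SubjectQueue:
    """Hands out Subjects and collects the Classifications that come back."""

    def __init__(self, subject_ids: List[str], retire_after: int = 3) -> None:
        self.subject_ids = list(subject_ids)
        self.retire_after = retire_after  # hypothetical retirement limit
        self.classifications: Dict[str, List[dict]] = defaultdict(list)
        self.seen: Dict[str, Set[str]] = defaultdict(set)

    def next_subject(self, user_id: str) -> Optional[str]:
        """Subjects out: first active Subject this user has not classified yet."""
        for subject_id in self.subject_ids:
            retired = len(self.classifications[subject_id]) >= self.retire_after
            if not retired and subject_id not in self.seen[user_id]:
                return subject_id
        return None  # nothing left for this user

    def classify(self, user_id: str, subject_id: str, annotation: dict) -> None:
        """Classifications back in: record what the user said about a Subject."""
        self.seen[user_id].add(subject_id)
        self.classifications[subject_id].append(
            {"user": user_id, "annotation": annotation}
        )
```

Everything that makes a project feel different (the drawing tools, the questions, the tutorial) lives outside a loop like this one, which is the point being made above.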

The actual projects

The point of having a central API is that when we want to build a new project we’re already working with a very familiar toolset – the way we log people in, do signup forms, ask for a Subject, send back Classifications – all of this is completely standard. In fact if you’re building in JavaScript (which we almost always are these days) then there’s a client library called ‘Zooniverse’ (meta I know) available here on GitHub.

Having a standard API and client library for talking to it meant that we built the Zooniverse project Planet Four in less than 1 week! That’s not to say it’s trivial to build projects, it’s definitely not, but it is getting easier. And having this standardised way of communicating with the core Zooniverse means that the bulk of the effort when building Planet Four was exactly where it should be – the fan drawing tools – the bit that’s different from any of our other projects.

So how do we actually build our projects these days? We build our projects as JavaScript web applications using JavaScript web frameworks such as Spine JS, Backbone or something completely custom. The point being, that all of the logic for how the interface should behave is baked into the JavaScript application – Ouroboros doesn’t try and help with any of this stuff.

Currently the majority of our projects are hosted using the Amazon S3 static website hosting service. The benefits of this are numerous but key ones for us are:

  1. There’s no webserver serving the site content, that is http://www.galaxyzoo.org resolves to an S3 bucket. When you access the Galaxy Zoo site S3 does all of the hard work and we just pay for the bandwidth from S3 to your computer.
  2. Deploying is easy. When we want to put out a new version of any of our sites we just upload new timestamped versions of the files and your browser starts using them instead.
  3. It’s S3 – Amazon S3 is a quite remarkable service –  a significant fraction of the web is using it. Currently hosting more than 2 trillion (yes that’s 12 zeroes) objects and regularly serving more than 1 million requests for data per second the S3 service is built to scale and we get to use it (and so can you).
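The "timestamped versions" idea in point 2 can be sketched as follows. This is only an illustration of the general technique: the naming scheme, the function names and the step of rewriting references in the HTML are assumptions, not a description of our actual deploy scripts, and the upload to S3 itself is left out.

```python
import posixpath
from datetime import datetime


def timestamped_key(path: str, deployed_at: datetime) -> str:
    """Turn 'js/app.js' into 'js/app-20130626082051.js' (hypothetical scheme)."""
    stamp = deployed_at.strftime("%Y%m%d%H%M%S")
    directory, filename = posixpath.split(path)
    stem, extension = posixpath.splitext(filename)
    return posixpath.join(directory, f"{stem}-{stamp}{extension}")


def rewrite_references(html: str, renamed: dict) -> str:
    """Point the page at the new file names so browsers fetch the fresh copies."""
    for old, new in renamed.items():
        html = html.replace(old, new)
    return html


now = datetime(2013, 6, 26, 8, 20, 51)
renamed = {path: timestamped_key(path, now) for path in ["js/app.js", "css/site.css"]}
index = '<link href="css/site.css"><script src="js/app.js"></script>'
print(rewrite_references(index, renamed))
```

Because every deploy produces new file names, a browser can never be stuck with a stale cached copy of the old JavaScript or CSS.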

Amazon S3 is a static webhost (i.e. you can’t have any server-side code running), so how do we make a static website into a Zooniverse project you can log in to when we can’t access database records? The main site functions just fine – these JavaScript applications (such as the current Galaxy Zoo or any recent Zooniverse project) implement what is different about the project’s interface. We then use a small invisible iFrame on each website that actually points to api.zooniverse.org, which is Ouroboros. When you use a login form we actually set a cookie on this domain and then send all of our requests back to the API through this iFrame. This approach is a little unusual, and with browsers tightening up the restrictions on third-party cookies it looks like we might need to swap it out for a different approach, but for now it’s working well.

Summing up

If you’re like me then when you read something you read the opening, look at the pictures and then skip to the conclusions. I’ll summarise here just in case you’re doing that too:

In the Zooniverse there’s a clear separation between the API (Ouroboros) and the citizen science projects that the community interact with. Ouroboros is a custom-built, highly scalable application built in Ruby on Rails, that runs on Amazon Web Services and uses MongoDB, Redis and a few other fancy technologies to do its thing.

The actual citizen science projects that people interact with are these days all pure JavaScript applications that are hosted on Amazon S3 and they’re pretty much all open source. They’re generally still bespoke applications each time but share common code for talking to Ouroboros.

What I didn’t talk about in this post are the hardest bits we’ve solved in Ouroboros – namely all of the logic for finding Subjects for people quickly and other ‘smart stuff’. That’s coming up next.

Putting the ‘citizen’ in ‘citizen science’

I was slightly surprised to see my twitter feed this morning filling up with comments about how the term ‘citizen’ appears in writing about science, and about public engagement with science. This seems to be coming from Roland Jackson’s post in response to the publication of a report called ‘What publics? When?’ from Sciencewise, an organisation that gives advice on science policy to government. Roland’s point is that perhaps the reason we get ourselves in a tangle when talking about public engagement is the word ‘public’, thinking that ‘citizen’ does a better job of breaking down the divide between ‘us’ doing the engagement and the ‘public’ being engaged. (There’s another engaging comparison on Nottingham’s ‘Making Science Public’ blog.)

In such contexts, I reckon ‘citizen’ comes up most often in ‘citizen science’, and I thought it might be interesting to say something about our use of the term. It’s how we describe our projects in papers, and we chose it mostly because we didn’t like the term ‘crowdsourcing’, which never seemed adequate for projects which very quickly demonstrated that they could grow way beyond simple requests for a community to complete a task. We quickly realised we wanted people to make discoveries, to follow them up themselves and to chase down their own research questions, and crowdsourcing just doesn’t describe that. I also liked the fact that anyone – professional or amateur, project designer or participant – could be a citizen scientist.

We clearly weren’t that confident, though. Although the core collaboration that builds and runs the Zooniverse is the Citizen Science Alliance, we’ve mostly reserved that term for grant applications rather than using it in the real world. (Let alone the problems of being a citizen science group which produces humanities projects either deliberately or accidentally.) This reticence isn’t misplaced; it reflects my firm belief that no one in the history of the world has ever sat down at a computer, opened their web browser and thought ‘I’m a citizen scientist. Let’s do some citizen science’. Zooniverse participants are fans of one or more of our projects, and they tend to have stumbled in and then found a comfortable environment where they can do exciting things, rather than started off by looking for a science project. (This is also, I think, reflected in the lack of traffic we get from citizen science portals like SciStarter.)

‘Citizen’ science, from this perspective isn’t any more inclusive than talking about ‘public engagement’. The most common alternative (‘PPSR’ or Public Participation in Scientific Research) doesn’t help either. If names are important, we need a new one for this thing that we’re doing, but as the person who has been most consistently wrong about naming Zooniverse projects (I voted against Galaxy Zoo, for starters!) I’m the last person to ask. Maybe we should crowdsource a solution….

Chris

PS I’m reminded of this slide deck from Arfon which proposes CBSR (Community Based Scientific Research) and PPFCSM (Public Participation as a Fundamental Component of the Scientific Method), although I think he’s kidding on the last one.

Live from ZooCon

Hello from the Martin Wood Lecture Theatre in Oxford’s Department of Physics, which is playing host to a crowd of Zooniverse volunteers and project members for ZooCon13. We’re recording the talks for later broadcast, but as a sneak preview I thought I’d liveblog the event.

Talk 1 – SpaceWarps

We’re kicking off with Aprajita Verma from Oxford and from Space Warps, the newest Zooniverse astronomy project. As is traditional when talking about gravitational lensing (the bending of light by matter), she’s using Phil Marshall’s Galaxy in a Wine Glass video.

Galaxy In a Wine Glass

SpaceWarps is much needed – LSST, the next generation of survey telescope, will produce something like 10000 galaxy scale lenses. It’s designed to map a very wide area of sky, which is perfect for finding rare things like lenses – and this will produce a lot of work as traditional lens hunting is very labour intensive. Not only do they need to be found, but they then need to be modeled.

There are three lenses in this image – one real, and two simulated. Spotting the difference is hard…

Luckily – we have effort! 6 million classifications have been recorded already from over 8000 people. Particularly pleasing for me is that 40% of those people are discussing things on Talk – this is essential as lenses are complicated things and the interesting ones are going to be found through discussion. The team are doing dynamic assessment of the results, retiring images that no longer need classification. I especially liked their division of classifiers into four groups: ‘Optimists’, who get lenses right but also get excited about lots of things that aren’t lenses; ‘Pessimists’, who correctly dismiss non-lenses but get rid of lenses too; the ‘Astute’, who get everything right; and the ‘Obtuse’, who get everything wrong. Luckily, we have lots of astute classifiers and almost none who are obtuse, as evidenced by a sneak preview of the first few discoveries (more on those next week).
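The four labels amount to a two-by-two table: how often someone is right about real lenses, and how often they are right about non-lenses. Here is a rough sketch of that idea, not the Space Warps team's actual analysis. It assumes each classifier has been scored on images whose true answer is known, and the 0.5 cutoff is an arbitrary choice of mine.

```python
def classifier_type(lenses_marked: int, lenses_seen: int,
                    duds_dismissed: int, duds_seen: int,
                    cutoff: float = 0.5) -> str:
    """Label a classifier from their record on images with known answers."""
    if lenses_seen == 0 or duds_seen == 0:
        raise ValueError("need at least one known lens and one known non-lens")
    good_on_lenses = lenses_marked / lenses_seen >= cutoff   # spots real lenses
    good_on_duds = duds_dismissed / duds_seen >= cutoff      # rejects non-lenses
    if good_on_lenses and good_on_duds:
        return "Astute"
    if good_on_lenses:
        return "Optimist"   # finds the lenses, but marks lots of non-lenses too
    if good_on_duds:
        return "Pessimist"  # dismisses non-lenses, but throws out lenses too
    return "Obtuse"


print(classifier_type(9, 10, 9, 10))   # Astute
print(classifier_type(9, 10, 2, 10))   # Optimist
```

A more careful treatment would weight each person's classifications by these two rates rather than sorting people into hard categories.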

Talk 2 – Cosmic Evolution from Galaxy Zoo

Next up is Karen Masters of Portsmouth and Galaxy Zoo, talking about science results from the Zooniverse’s oldest project. It’s already clear there is lots of ground to cover in this conference and Karen’s bounding through a brief history of observational astronomy, noting the conceptual leap required to go from thinking about the Milky Way, our galaxy, to an expanding Universe filled with billions of the blighters. Karen just showed a cool movie showing the parts of the sky that have been mapped by the Sloan Digital Sky Survey, which provided images for the early incarnations of Galaxy Zoo.

A Galaxy in need of classifications.

In going through the history of Galaxy Zoo, Karen reminds me that the original BBC news story on Galaxy Zoo claims that we hope that 30,000 people will eventually take part. We smashed that on day one if I remember correctly. (There’s also a factual error in that news story – if anyone tells me what it is via Twitter (@chrislintott) or in person they can have a pint). While I relive ancient history, Karen’s talking about her work on red spirals: most spirals are blue, but Galaxy Zoo helped us find lots of red ones and Karen says that the Milky Way may even be on its way to becoming one. The work on the red spirals was part of a serious shift in how we think about galaxy formation – a few years back that story was all about mergers but now it’s thought that lots of galaxies form and evolve (including fading from being a blue spiral to a red spiral) in slower, less spectacular ways.

Of course, one of the advantages of citizen science that Galaxy Zoo demonstrated was the ability of classifiers to discover the weird and wonderful. Recent examples include the bulgeless galaxies – spirals which are guaranteed not to have had a merger within the last few billion years – and a set of galaxies (mostly red spirals!) with massive bars at their centre. In even better news, we have time on the Very Large Array (I REPEAT – WE HAVE TIME ON THE VERY LARGE ARRAY!) to follow up on these things.

WE GOT TIME ON THE VERY LARGE ARRAY (this is a picture of some of it)

I’m really quite excited about the VLA. I’ve always wanted to use it.

Talk 3 – New Uses for Old Weather

We’re taking a break from astronomy with Philip Brohan from the Old Weather project – he’s explaining that scientists need historical observations to constrain their models of how the climate behaves. Lacking the ability to stick a weather satellite in the Tardis and head back in time, we need to scrabble around for old records, an idea that dates back to Beaufort of wind scale fame.

Philip in the gloaming, beneath an Old Weather slide.

This is great, but the supercomputers can’t read the 73 million logbook pages we’d like to sort through – hence the need for volunteers. So far more than a million logbook pages have been processed by the project – a small fraction of the total needed but a very useful quantity! Most of these volunteers are attracted by the historical information that the logs fortuitously contain – Philip is currently beneath a slide showing a log book containing both the information that the ship’s company are fitted with seal-skin boots, and that 23 dogs are received on board. (Why? Surely not for food…).

It’s all got a bit gruesome now – six dead bodies are being placed in alcohol. Luckily we’re swiftly on to HMS Tarantula, where their anemometer is infested with ants. The current set of logbooks have more famous events; in particular, the logbooks of the Jeannette show the discovery of the Arctic island now named after it (upon which nothing but ice sheets grow). The fact that we have these logbooks at all is a miracle; the ship was crushed by the ice and the crew (most of whom perished) chose to carry the scientific records with them as they struggled to safety.

Images from and about the Jeannette, including in the bottom left an artist’s impression of the chest of logbooks being saved.

As well as the climate and the history, Phil says, the third important aspect of Old Weather is the people. The project’s made particularly good use of the forum, which has steered the project in new directions and provided a home for discussion of things we never thought to look for, as well as art and verse. The latter was particularly inspired by the tragic loss of the chocolate aboard the HMS Manuta. Before rolling the credits listing his more than 17,000 collaborators, Philip ended these tales by noting that to make a serious dent on the archives we need to speed up by a factor of ten, a challenge the Zooniverse is happy to accept.

Talk 4 – The Future of Galaxy Zoo

Back to the Universe now, and Oxford’s Brooke Simmons is able to start her talk on what’s coming up for Galaxy Zoo by reminding the crowd that the data release paper for the second version of Galaxy Zoo is now with the referee. At about 30 pages, it’s as short as it could possibly be, showing the amount of effort that goes into dealing with classifications received via a large citizen science project.

Brooke’s now explaining the need – with Galaxy Zoo trying to reach back to a time not that long after the Big Bang – for us to use all sorts of tests to understand how our classifications work. Showing images of the same galaxies shifted to higher and higher redshifts (further and further away) it’s clear that classifications will change just because it’s harder to see what’s going on when galaxies get further away. We’re also playing with supercomputer simulations of the evolution of galaxies which shows how things change over time.

It’s not all about simulations, though – we’re thinking about moving beyond the optical range of the spectrum and looking at galaxies in the ultraviolet and infrared. The former, from a satellite called GALEX, shows only the youngest stars; the latter, from a survey called UKIDSS which covers about a third of the Sloan area, shows the dust and older stars. Also on the agenda are more advanced tools, like those which power the Galaxy Zoo Navigator, which allows primarily school groups to look at the statistics of their classifications.

Correction: I wasn’t listening properly to Aprajita; Space Warps got 2 million classifications in the first week, and at the time of ZooCon was over 6 million. I’ve corrected the post. 1st July 2013.

Got An Idea for a Zooniverse Project? Propose One

For more than a year, we’ve been openly accepting proposals for new Zooniverse projects and this has brought to life projects such as Seafloor Explorer, Snapshot Serengeti, Notes from Nature and Space Warps.

Yesterday, five Zooniverse projects were featured in The Biologist’s 10 Great Citizen Science Projects – several of them were ideas proposed by researchers we had never met before they came to us and said ‘hey, I have a cool idea for a project’. We’ve also recently seen articles about how the Zooniverse might be able to help in a crisis and how we provide an excellent avenue for proactive procrastination. Citizen science projects are wide and varied and lots of researchers have great ideas.

So this is a good time to remind everyone that we want to hear from researchers with ideas for Zooniverse projects. If that’s you: propose a project! We have funding from the Alfred P. Sloan Foundation to build your great ideas and work with you to further science. We also have an incredibly talented team of designers, developers, educators and researchers who want to make your idea into an awesome new Zooniverse project.

If you want to know more about this, you can get in touch with any of the team or via our general email address or on Twitter @the_zooniverse. We’re currently working on projects that were proposed earlier this year and we’ll be announcing them soon. Maybe yours will be next?

New Project: Join the Search for ‘Space Warps’

Gravitational lenses – or ‘space warps’ – are created when massive galaxies cause light to bend around them such that they act rather like giant lenses in space. By looking through data that has never been seen by human eyes, our new Space Warps project is asking citizen scientists to help discover some of these incredibly rare objects. We need your help to spot these chance-alignments of galaxies in a huge survey of the night sky. To take part visit www.spacewarps.org.

A Gravitational Lens

Gravitational lenses help us to answer all kinds of questions about galaxies, including how many very low mass stars such as brown dwarfs – which aren’t bright enough to detect directly in many observations – are lurking in distant galaxies. The Zooniverse has always been about connecting people with the biggest questions and now, with Space Warps, we’re taking our first trip to the early Universe. We’re excited to let people be the first to see some of the rarest astronomical objects of all!

The Space Warps project is a lens discovery engine. Joining the search is easy: when you visit the website you are given examples of what space warps look like and are shown how to mark potential candidates on each image. The first set of images to be inspected in this project is from the CFHT (Canada-France-Hawaii Telescope) legacy survey.

Computer algorithms have already scanned the images, but there are likely to be many more space warps that the algorithms have missed. We think that only with human help will we find them all. Realistic simulated lenses are dropped into some images to help you learn how to spot them, and reassure you that you’re on the right track. Previous studies have shown that the human brain is better at identifying complex lenses than computers are, and we know at the Zooniverse that members of the public can be at least as good at spotting astronomical objects as experts! We’re going to use the data from citizen scientists to continuously train computers to become better space warp spotters.

This is a really exciting project and you can read more on the Space Warps blog. As with our other projects it can also be found on Twitter (@SpaceWarps), on Facebook and you can discuss any interesting objects you find on Space Warps Talk. We’re really excited about this project and think you’ll be able to make some amazing discoveries through it.

Galaxy Zoo is Open Source

It’s always a good feeling to be making a codebase open, and today it’s time to push the latest version of Galaxy Zoo into the open. As I talked about in my blog post a couple of months ago, making open source code the default for Zooniverse is good for everyone involved with the project.

One significant benefit of making code open is that from here on out it’s going to be much easier to have Zooniverse projects translated into your favourite language. When we build a new project we typically extract the content into something called a localisation file (or localization if you prefer your en_US) which is basically just a plain text file that our application uses. You can view that file for our (US) English translation file here and it looks a little like this:

[Image: the English (en) localisation file]
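
To make the idea concrete, here is a minimal Python sketch of how an application might read its text from a localisation file. It is not taken from the Galaxy Zoo codebase: the keys, the nested layout, the French strings and the `translate` helper are all assumptions made up for illustration.

```python
# Hypothetical localisation data. In a real project each language would
# live in its own plain text file; the keys below are invented.
EN = {
    "home": {"title": "Welcome", "start": "Begin classifying"},
    "classify": {"smooth": "Smooth", "features": "Features or disk"},
}
FR = {
    "home": {"title": "Bienvenue"},  # only partly translated so far
}


def translate(key, strings, fallback=EN):
    """Look up a dotted key such as 'home.title', falling back to English."""
    for table in (strings, fallback):
        node = table
        for part in key.split("."):
            if not isinstance(node, dict) or part not in node:
                node = None
                break
            node = node[part]
        if isinstance(node, str):
            return node
    return key  # last resort: show the key itself


print(translate("home.title", FR))
print(translate("home.start", FR))
```

Falling back to English for missing keys means a half-finished translation can still be tried out on a working site.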

So how do I translate Galaxy Zoo?

I’m glad you asked… It turns out there’s a feature built into the code-hosting platform we’re using (called GitHub) which allows you to basically make your own copy of the Galaxy Zoo codebase. It’s called ‘forking’ and you can read much more about it here, but all you need to do to contribute is fork the Galaxy Zoo code repository, add in your new translation file (there’s a handy script that will generate a template file based on the English version), translate the English values into the new language and send the changes back up to GitHub.
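
For a feel of what generating a template from the English version could involve, here is a rough Python sketch. It is not the script in the repository: it assumes the strings are held as nested key/value pairs, and the function names `make_template` and `untranslated`, along with the sample keys, are hypothetical.

```python
import json


def make_template(english):
    """Copy the structure of the English strings, blanking every value."""
    if isinstance(english, dict):
        return {key: make_template(value) for key, value in english.items()}
    return ""  # empty placeholder for the translator to fill in


def untranslated(template, prefix=""):
    """List the dotted keys that still have an empty value."""
    missing = []
    for key, value in template.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            missing.extend(untranslated(value, path + "."))
        elif value == "":
            missing.append(path)
    return missing


# Invented sample strings, standing in for the English file.
english = {"home": {"title": "Welcome", "start": "Begin classifying"}}
template = make_template(english)
template["home"]["title"] = "Bienvenue"  # the translator fills in one value

print(json.dumps(template, ensure_ascii=False, indent=2))
print(untranslated(template))
```

Keeping the keys identical to the English file and changing only the values makes it easy to check what is left to translate before sending the changes back.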

Once you’re happy with the new translation and you’d like us to try it out you can send us a ‘pull request’ (details here). We’ll review the changes and, if everything looks good, pull the new translation into the main Galaxy Zoo codebase. You can see an example of a pull request from Robert Simpson that’s been merged in here.

So what next?

This method of translating projects is pretty new for us and so we’re still finding our way a little here. As a bunch of developers it feels great to be using the awesome collaborative toolset that the GitHub platform offers to open up code and translations to you all.

Cheers

Arfon

Optimizing for interest: Why people aren’t machines

One of the joys of working in the Zooniverse is the sheer variety of people who are interested in our work, and I spent a happy couple of days toward the end of last year at a symposium about Discovery Informatics – alongside a bunch of AI researchers and their friends who are trying to automate the process of doing science. I don’t think they’d mind me saying that we’re a long, long way from achieving that, but it was a good chance to muse on some of the connections between the work done by volunteers here and by our colleagues who think about machine learning.

In the past we’ve shown that machines can learn from us, but we’ve also talked about the need for a system that can combine the best of human and machine.

These two things are not the same
Robot and human (Thanks to Flickr user NineInchNachosXI)

I’m still convinced that this will be especially necessary as the size of datasets produced by scientific surveys continues to increase at a frightening pace. The essential idea is that only the proportion of the data which really needs human attention need be passed to human classifiers; an idea that starts off as a no-brainer (wouldn’t it be nice if we could decide in advance which proportion of Galaxy Zoo systems are too faint or fuzzy for sensible decisions to be made?) and then becomes interestingly complex.
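
The no-brainer version of that idea fits in a few lines of Python. This is only a sketch under stated assumptions: it supposes a machine classifier has already attached a probability to each subject, and the thresholds, subject names and the `triage` function are invented for illustration rather than drawn from any Zooniverse system.

```python
def needs_human(machine_probability, low=0.1, high=0.9):
    """Return True when the machine's confidence falls in the uncertain band."""
    return low < machine_probability < high


def triage(subjects, low=0.1, high=0.9):
    """Split (subject_id, probability) pairs into machine and human queues."""
    machine, human = [], []
    for subject_id, probability in subjects:
        queue = human if needs_human(probability, low, high) else machine
        queue.append(subject_id)
    return machine, human


# Invented machine scores for four subjects.
subjects = [("g1", 0.98), ("g2", 0.55), ("g3", 0.03), ("g4", 0.80)]
machine_queue, human_queue = triage(subjects)
print(machine_queue)  # subjects the machine is confident about
print(human_queue)    # subjects passed on to volunteers
```

The interesting complexity starts exactly where this sketch stops: choosing the thresholds, and deciding which person should see each subject in the human queue.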

This is particularly true when you start thinking of volunteers not as a crowd, but as a set of individuals. We know from looking at the data from past projects that people’s talents are varied – the people who are good at identifying spiral arms, for example, may not be the same people who can spot the faintest signs of a merger. So if we want to be most efficient, what we should be aiming for is passing each and every person the image that they’d be best at classifying.

That in turn is easy to say, but difficult to deliver in practice. Since the days of the original Galaxy Zoo we’ve tended to shun anything that resembles a test before a volunteer is allowed to get going, and in any case a test which thoroughly examined someone’s ability in every aspect of the task (how do they do on bright galaxies? on faint ones? on distant spirals? on nearby ellipticals? on blue galaxies? what about mergers?) wouldn’t be much fun.

One solution is to use the information we already have; after all, every time someone provides a classification we learn something not only about the thing they’re classifying but also about them. This isn’t a new idea – in astronomy, I think it’s essentially the same as the personal equation used by stellar observers to combine results from different people – but things have got more sophisticated recently.

As I’ve mentioned before, a team from the robotics group in the department of engineering here in Oxford took a look at the classifications supplied by volunteers in the Galaxy Zoo: Supernova project and showed that by classifying the classifiers we could make better classifications. During the Discovery Informatics conference I had a quick conversation with Tamsyn Waterhouse, a researcher from Google interested in similar problems, and I was able to share results from Galaxy Zoo 2 with her*.
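
As a toy illustration of what classifying the classifiers can buy you, here is a short Python sketch. It is not the method the Oxford team used: it swaps in a much simpler technique, estimating each volunteer’s accuracy on subjects with known answers and then taking a vote weighted by the log-odds of that accuracy. The data, the function names and the yes/no framing are all assumptions; volunteers appear only as anonymous numbers.

```python
import math
from collections import defaultdict


def estimate_accuracy(answers, truth):
    """Per-volunteer accuracy on subjects with known answers (Laplace smoothed)."""
    right, seen = defaultdict(int), defaultdict(int)
    for volunteer, subject, label in answers:
        if subject in truth:
            seen[volunteer] += 1
            right[volunteer] += int(label == truth[subject])
    return {v: (right[v] + 1) / (seen[v] + 2) for v in seen}


def weighted_vote(answers, accuracy, subject):
    """Combine yes/no labels for one subject, weighting by log-odds of accuracy."""
    score = 0.0
    for volunteer, subj, label in answers:
        if subj != subject:
            continue
        a = accuracy.get(volunteer, 0.5)  # unknown volunteers carry no weight
        weight = math.log(a / (1 - a))
        score += weight if label else -weight
    return score > 0


# Invented data: s1 to s3 have known answers, s4 is the subject to decide.
truth = {"s1": True, "s2": False, "s3": True}
answers = [
    (1, "s1", True), (1, "s2", False), (1, "s3", True),
    (2, "s1", False), (2, "s2", False), (2, "s3", True),
    (3, "s1", True), (3, "s2", True), (3, "s3", False),
    (1, "s4", True), (2, "s4", False), (3, "s4", False),
]
accuracy = estimate_accuracy(answers, truth)
print(accuracy)
print(weighted_vote(answers, accuracy, "s4"))
```

In this made-up example a simple majority would say no to s4 by two votes to one, but the weighted vote sides with the one volunteer who has been reliable on the known subjects.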

We didn’t get time for a long chat, but I was delighted to hear that work on Galaxy Zoo had made it into a paper Tamsyn presented at a different conference. (You can read her paper here, or in Google’s open access repository here.) Her work, which is much wider than our project, develops a method which considers the value of each classification based (roughly) on the amount of information it provides, and then tries to seek the shortest route to a decision. And it works – she’s able to show that by applying these principles we would have been done with Galaxy Zoo 2 faster than we were – in other words, we wasted some people’s time by not being as efficient as we could be.
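
To give a flavour of what seeking the shortest route to a decision might mean, here is a deliberately simplified Python sketch. It is not the method from Tamsyn’s paper: it assumes a single yes/no question, every volunteer having the same fixed accuracy, and a made-up confidence threshold, and it simply stops asking for classifications once a Bayesian posterior is decisive.

```python
def update(prior, label, accuracy):
    """Bayes update of P(answer is yes) after one yes/no classification."""
    likelihood_yes = accuracy if label else 1 - accuracy
    likelihood_no = 1 - accuracy if label else accuracy
    evidence = prior * likelihood_yes + (1 - prior) * likelihood_no
    return prior * likelihood_yes / evidence


def classify_until_confident(labels, accuracy=0.8, prior=0.5, threshold=0.95):
    """Consume labels in order; stop once the posterior is decisive either way."""
    posterior = prior
    used = 0
    for label in labels:
        posterior = update(posterior, label, accuracy)
        used += 1
        if posterior >= threshold or posterior <= 1 - threshold:
            break
    return posterior, used


# Six invented classifications of one subject, in the order they arrived.
labels = [True, True, True, False, True, True]
posterior, used = classify_until_confident(labels)
print(posterior, used)
```

With these invented numbers the decision is reached after three of the six classifications, so the remaining three could have been spent on another subject.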

A reminder of what Galaxy Zoo 2 looked like!

That doesn’t sound good – not wasting people’s time is one of the fundamental promises we make here at the Zooniverse (it’s why we spend a lot of time selecting projects that genuinely need human classifications). Zoo 2 was a long time in the past but, knowing what we know now, should we be implementing a suitable algorithm for all projects from here on in?

Probably not. There are some fun technical problems to solve before we could do that anyway, but even if we could, I don’t think we should. The current state of the art of such work misses, I think, a couple of important factors which distinguish citizen science projects from the other examples considered in Tamsyn’s paper in particular. To state the obvious: volunteer classifiers are different from machines. They get bored. They get inspired. And they make a conscious or an unconscious decision to stay for another classification or to go back to the rest of the internet.

The interest a volunteer will have in a project will change as they move (or are moved by the software) from image to image and from task to task, and in a complicated way. Imagine getting a galaxy that’s difficult to classify; on a good day you might be inspired by the challenge and motivated to keep going, on a bad one you might just be annoyed and more likely to leave. We all learn as we go, too, and so our responses to particular images change over time. The challenge is to incorporate these factors into whatever algorithm we’re applying so that we can maximise not only efficiency, but interest. We might want to show the bright, beautiful galaxies to everyone, for example. Or start simple with easy examples and then expand the range of galaxies that are seen to make the task more difficult. Or allow people a choice about what they see next. Or a million different things.

Whatever we do, I’m convinced we will need to do something; datasets are getting larger and we’re already encountering projects where the idea of getting through all the data in our present form is a distant dream. Over the next few years, we’ll be developing the Zooniverse infrastructure to make this sort of experimentation easier, looking at theory with the help of researchers like Tamsyn to see what happens when you make the algorithms more complicated, and talking to our volunteers to find out what they want from these more complicated projects – all in our twin causes of doing as much science as possible, while providing a little inspiration along the way.

* – Just to be clear, in both cases all these researchers got was a table of classifications without any way of identifying individual volunteers except by a number.