Making the Zooniverse Open Source

We’re pleased to announce that the time has come to start making the Zooniverse open source. From today, you’ll be able to see several of our current projects on GitHub (at https://github.com/zooniverse), fork them and contribute to them.

Taking the Zooniverse open source is something we’ve been thinking about for a long time. Even as the field of citizen science expands into ever broader domains, the number of tools available to people who want to start their own projects is still low. Since the launch of Galaxy Zoo 2 we’ve been building tools that allow for code reuse across a number of projects. While the majority(1) of our software has never been ‘officially’ open, behind the scenes we’ve been sharing with pretty much anyone who asked, often talking them through the thought process that led us to design our software in a particular way.

Because of our natural inclination to share with those who approached us, we’ve never really made publishing our code a priority. As with most closed source projects, there are also a number of pretty boring (but sometimes important) reasons for not publishing. We worried about how usable the code we’d written would be to people we didn’t work closely with: as a small team we favour clean code and conversation with other developers over heavy documentation. Some sensitive information around our production environment inevitably slipped into the codebases, which meant lots of work to clean up and security audit our tools. Some of these reasons still hold for legacy applications, but each project we start often comes with a new Git repo and an opportunity to develop in a different way.

What does this mean?

Well, from today you’ll start to see a number of applications appear on the Zooniverse GitHub site. We’re starting with a collection of our most recent projects: Snapshot Serengeti, Bat Detective, Cyclone Center and Seafloor Explorer.

It’s important to say here that we’re not expecting a community of developers to jump in and help us develop new projects (although that would be pretty cool), but if there’s a typo on our site or a really annoying bug that you know exactly how to fix, then fork the repo, send us a pull request and we’ll see what we can do. Significantly for our localisation support (translating sites into multiple languages), we’re proposing that new translations should be submitted in exactly this way (2). There are a huge number of very talented people in the Zooniverse community who until today had no way of contributing to the project other than to help analyse data. That changes today.

We’re releasing our software under a very liberal license – Apache 2.0. In very simple terms this means that the tools we develop can be used for whatever you like provided you follow the rules of the Apache 2.0 license.

What aren’t we open sourcing?

In truth, lots of legacy code for our older projects isn’t likely to make it into the open. A large number of our projects between 2009 (Galaxy Zoo 2) and 2011 (The Milky Way Project) were built upon a shared codebase called The Juggernaut. While we’re not making each of those projects open, we are publishing the common application core, which has been kept up to date and runs on Rails 3.1.

We’re also not opening up our applications that hold sensitive user information and are mission-critical for the operation of the Zooniverse. That’s not to say we won’t ever do this; we’re just not comfortable publishing these applications at this point. This basically means that the application that powers Zooniverse Home (www.zooniverse.org) and an application called Ouroboros (api.zooniverse.org), which serves up images and collects back classifications, aren’t part of our open source strategy.

Why now?

Aside from the reasons mentioned above, there are a number of reasons to make open source our default position. In part it’s about people – developers these days are often hired (or at least shortlisted) on the strength of their GitHub profiles, which show the projects they’ve been working on. As our team grows and we hire talented young developers, we’re doing them a disservice by not allowing them to show off the awesome work they do. It’s also about the way in which we as the Zooniverse do science. We believe citizen science is an inherently open way of doing research: we often work with open datasets (such as SDSS) and ask people to donate their time and efforts to a project that in the end produces open data products for the research community to enjoy (e.g. data.galaxyzoo.org, data.milkywayproject.org). Having a closed codebase for everything we do just feels incompatible with this way of doing research.

What’s next?

To be honest we’re not quite sure. Going forward, our projects will typically become open source as we launch them. If there’s a Zooniverse project that you think you’d like to rework for a different purpose then there’s now nothing stopping you from doing this. If you’re interested in helping us with a new translation for your favourite project then we’d love to talk. Perhaps you’re just interested to see how some of our applications work. Regardless, we invite you to take a look and give us feedback. The Zooniverse has always been about harnessing the crowd to make science happen. From today, there is a new way for people to contribute to that goal.

Cheers
Arfon

Footnotes:
1. Scribe, our open source text transcription framework, grew out of Old Weather and has now been used on a number of projects.
2. A fuller article about language support is coming very soon on this blog.

2 thoughts on “Making the Zooniverse Open Source”

  1. This is great news!

    What can keen zooites do, those who think python is a snake, and ruby a semi-precious gemstone (i.e. those whose greatest coding achievement is writing a simple formula in a spreadsheet)?

    Somewhat related: are there any plans, or thoughts, to develop a more formal way of soliciting zooite feedback, and to enable discussion of such?

    The Zooniverse seems – until now! – to be almost entirely top-down driven [1]: professional scientists work on developing projects, launching them, collecting clicks (data), analyzing results, publishing them, … ordinary zooites play almost no part in any of this (beyond some beta testing, being moderators in forums, and every now and then ending up as co-authors of some papers). I think having more structured two-way communication would lead to a better Zooniverse all round.

    [1] Lens Zoo would seem to be at least a partial exception (though it’s not up and running yet)
