Category Archives: Technical

Introducing VOLCROWE – Volunteer and Crowdsourcing Economics

volcrowe

Hi everyone, I’d like to let you know about a cool new project we are involved with. VOLCROWE is a three year research project funded by the Engineering and Physical Sciences Research Council in the UK, bringing together a team of researchers (some of which are already involved with the Zooniverse, like Karen Masters) from the Universities of Portsmouth, Oxford, Manchester and Leeds. The PI of the project Joe Cox says “Broadly speaking, the team wants to understand more about the economics of the Zooniverse, including how and why it works in the way that it does. Our goal is to demonstrate to the community of economics and management scholars the increasingly amazing things that groups of people can achieve when they work together with a little help from technology. We believe that Zooniverse projects represent a specialised form of volunteering, although the existing literature on the economics of altruism hasn’t yet taken into account these new ways in which people can give their time and energy towards not-for-profit endeavours. Working together with Zooniverse volunteers, we intend to demonstrate how the digital economy is making it possible for people from all over the world to come together in vast numbers and make a contribution towards tackling major scientific problems such as understanding the nature of the Universe, climate change and even cancer.

These new forms of volunteering exemplified by the Zooniverse fundamentally alter the voluntary process as it is currently understood. The most obvious change relates to the ways in which people are able to give their time more flexibly and conveniently; such as contributing during their daily commute using a smart phone! It also opens new possibilities for the social and community aspects of volunteering in terms of creating a digitally integrated worldwide network of contributors. It may also be the case that commonly held motivations and associations with volunteering don’t hold or work differently in this context. For example, religious affiliations and memberships may or may not be as prevalent as they are with more traditional or recognised forms of volunteering. With the help of Zooniverse volunteers, the VOLCROWE team are exploring all of these issues (and more) with the view to establishing new economic models of digital volunteering.

To achieve this aim, we are going to be interacting with the Zooniverse community in a number of ways. First, we’ll be conducting a large scale survey to find out more about its contributors (don’t worry – you do not have to take part in the survey or give any personal information if you do not want to!). The survey data will be used to test the extent to which assumptions made by existing models of volunteering apply and, if necessary, to formulate new ones. We’ll also be taking a detailed look at usage statistics from a variety of projects and will test for trends in the patterns of contributions across the million (and counting) registered Zooniverse volunteers. This larger-scale analysis will be supplemented with a number of smaller sessions with groups of volunteers to help develop a more nuanced understanding of people’s relationships with and within the Zooniverse. Finally, we’ll be using our expertise from the economic and management sciences to study the organisation of the Zooniverse team themselves and analyse the ways and channels they use to communicate and to make decisions. In short, with the help of its volunteers, we want to find out what makes the Zooniverse tick!

In the survey analysis, no information will be collected that could be used to identify you personally. The only thing we will ask for is a Zooniverse ID so that we can match up your responses to your actual participation data; this will help us address some of the project’s most important research questions. The smaller group and one-to-one sessions will be less anonymous by their very nature, but participation will be on an entirely voluntary basis and we will only ever use the information we gather in a way in which you’re comfortable. The team would really appreciate your support and cooperation in helping us to better understand the processes and relationships that drive the Zooniverse. If we can achieve our goals, we may even be able to help to make it even better!”

Keep an eye out for VOLCROWE over the coming weeks and months; they’d love you to visit their website and follow them on Twitter.

Grant and the Zooniverse Team

Not the Premier League : How Zooniverse got blocked by the courts

Anyone browsing the BBC News Technology section last night might have seen an unexpected appearance of a couple of our projects in this story about illegal streaming of Premier League football games. The story started on Saturday with an email from a volunteer pointing out that Virgin Media, a major Internet Service Provider in the UK, were blocking access to Notes From Nature. All is well now, but if you do experience problems please let us know. If you’d like the background, then read on.

Continue reading Not the Premier League : How Zooniverse got blocked by the courts

Zooniverse, GitHub and the future

In case you haven’t noticed I’ve had a pretty busy five years at the Zooniverse. With more than 25 projects launched in fields from astronomy to biodiversity and from climataology all the way to zoology, it’s been an incredible experience to work with so many new science teams hungry for answers to research questions that can only be answered by enlisting the help of a large number of volunteers. This model of citizen science, one where we boil down the often complex analysis task brought to us by a science team to the ‘simplest thing that will work’, build a rich user experience and then ask a bunch of people to help, seems to work pretty well.

For me, one of the best aspects of what I get to do is that I work in a domain that is an inherently open way of doing research. Having joined Zooniverse when we were still ‘just’ Galaxy Zoo, to see the range of projects we host broaden and to watch our community mature has been a remarkable experience. With our latest endeavour – the Galaxy Zoo Quench project – it’s clear that the line between the activites of the ‘science’ team and the ‘volunteers’ is becoming less defined by the day. Citizen-led science in the Zooniverse began with a group of people in the Galaxy Zoo Forum, ‘The Peas Corp’ when they discovered a new class of galaxy, and it continues today with volunteers discovering new types of worms, exotic exoplanets and even, through Quench, analysing and writing a new paper as a group. These of course are just examples I’ve taken from the Zooniverse and there are many more in other projects run by other people, but in each case the result is the same: by enagaing the public in a meaningful way Citizen Science is challenging the centuries old practices of academia and that has to be a good thing.

The opportunity to change the way science is done, whether it’s building software to increase efficiency or developing new collaboration models, is what brought me to the Zooniverse and now it’s what is leading me away. At the end of September this year I’m going to be hanging up my hat as Technical Lead of the Zooniverse and joining GitHub as their ‘science guy’.

As with all big decisions in life this wasn’t an easy one. I feel very fortunate to have had the opportunity to give technical direction to an incredible team of scientists, developers, educators and designers here at the Adler and the wider Zooniverse. But over the past couple of years I’ve also got to know a number of the GitHub folks and I’ve been hugely impressed by their focus on building the very best platform possible for online collaboration. Starting with the very simple idea that ‘it should be easier to work together than alone’ they’ve clearly nailed what it looks like to work on a problem with others in code. But software isn’t the only thing people are sharing on GitHub – legislators are publishing drafts of state law, technicians are documenting scientific laboratory protocols and with tools like the IPython Notebook researchers have defined formats and means of sharing entire research workflows.

The mantra of ‘collaborative versioned science’ has been rattling around my head now for a couple of years. I believe there’s an opportunity for GitHub to be the platform for capturing the process of scientific discovery and I want to help make that happen.

So what does this mean for the Zooniverse? Well, I’m leaving at a pretty good time as the Zooniverse has never been healthier – there’s a first-class web and education team of twelve people I’m going to be leaving behind at the Adler Planetarium in Chicago and we’ve just secured several large grants to expand our sister team at The University of Oxford to ten people (watch this space for job ads).

With all of these people and a number of major development projects in the pipeline we’re going to need a new Technical Lead. If this sounds fun, like you might be a good fit (and you’re able to work in the UK or US) then drop myself and Chris Lintott a line (we’re arfon@zooniverse.org and chris@zooniverse.org) – we’d love to talk. Our software is a mixture of Ruby, Rails and Javascript and we like using technologies like MongoDB, Redis, Amazon Web Services and Hadoop. We get to work on hard data science problems, build custom software for solving crowdsourcing at scale and work with some incredibly smart and creative collaborators.  Whoever takes over is going to have a lot of fun.

Arfon

PS If you’d like to know more about what work looks like as a Technical Lead of the Zooniverse then I’ve written recently about some of the problems we’ve addressed over the past few years herehere and here.

New Talk Feature: Automatic Favourites Collection

Today we’ve added a new feature to all the Zooniverse sites that use the new version of Talk. As you’ll know, most of our projects allow you to save ‘favourites’ – a list of things that are either cool/interesting/worthy of keeping and something to refer back to later. One often asked for feature is for this collection of favourites in the project to be available in Talk as a collection.

Today we’ve added this feature and from now on when you favourite something on the main site (e.g. Galaxy Zoo) it will automagically appear in a collection called ‘Favourites’ on Talk. That means you can discuss, share or even import them as a data source into tools.zooniverse.org

Enjoy!

Zoo Tools: A New Way to Analyze, View and Share Data

Since the very first days of Galaxy Zoo, our projects have seen amazing contributions from volunteers who have gone beyond the main classification tasks. Many of these examples have led to scientific publications, including Hanny’s Voorwerp, the ‘green pea’ galaxies, and the circumbinary planet PH1b.

One common thread that runs through the many positive experiences we’ve had with the volunteers is the way in which they’ve interacted more deeply with the data. In Galaxy Zoo, much of this has been enabled by linking to the Sloan SkyServer website, where you can find huge amounts of additional information about galaxies on the site (redshift, spectra, magnitudes, etc). We’ve put in similar links on other projects now, linking to the Kepler database on Planet Hunters, or data on the location and water conditions in Seafloor Explorer.

The second part of this that we think is really important, however, is providing ways in which users can actually use and manipulate this data. Some users have been already been very resourceful in developing their own analysis tools for Zooniverse projects, or have done lots of offline work pulling data into Excel, IDL, Python, and lots of other programs (see examples here and here). We want to make using the data easier and available to more of our community, which has led to the development of Zoo Tools (http://tools.zooniverse.org). Zoo Tools is still undergoing some development, but we’d like to start by describing what it can do and what sort of data is available.

An Example

Zoo Tools works in an environment which we call the Dashboard – each Dashboard can be thought of as a separate project that you’re working on. You can create new Dashboards yourself, or work collaboratively with other people on the same Dashboard by sharing the URL.

Zoo Tools Main Page

Create a New Dashboard

Within the Dashboard, there are two main functions: selecting/importing data, and then using tools to analyze the data.

The first step for working with the Dashboard is to select the data you’d like to analyze. At the top left of the screen, there’s a tab named “Data”. If you click on this, you’ll see the different databases that Zoo Tools can query. For Galaxy Zoo, for example, it can query the Zooniverse database itself (galaxies that are currently being classified by the project), or you can also analyze other galaxies from the SDSS via their Sky Server website.

Import Data from Zooniverse

Clicking on the “Zooniverse” button, for example, you can select galaxies in one of four ways: a Collection (either your own or someone else’s), looking at your recently classified galaxies, galaxies that you’ve favorited, or specific galaxies via their Zooniverse IDs. Selecting any of these will import them as a dataset, which you can start to look at and analyze. In this example we’ll import 20 recent galaxies.

Import 20 Recents

After importing your dataset, you can use any of the tools in Dashboard (which you can select under “Tools” at the top of the page) on your data. After selecting a tool, you choose the dataset that you’d like to work with from a dropdown menu, and then you can begin using it. For example: if I want to look at the locations of my galaxies on the sky, I can select the “Map” tool. I then select the data source I’d like to plot (in this case, “Zooniverse–1”) and the tool plots the coordinates of each galaxy on a map of the sky. I can select different wavelength options for the background (visible light, infrared, radio, etc), and could potentially use this to analyze whether my galaxies are likely to have more stars nearby based on their position with respect to the Milky Way.

The other really useful part is that the tools can talk to each other, and can pass data back and forth. For example: you could import a collection of galaxies and look at their colour in a scatterplot. You could then select only certain galaxies in that tool, and then plot the positions of those galaxies on the map. This is what we do in the screenshots below:

This slideshow requires JavaScript.

Making Data Analysis Social

You can also share Dashboards with other people. From the Zoo Tools home page you can access your existing dashboards as well as delete them and share them with others. You can share on Twitter and Facebook or just grab the URL directly. For example, the Dashboard above can be found here – with a few more tools added as a demonstration.

Sharing a Dashboard

This means that once you have a Dashboard set up and ready to use, you can send it to somebody else to use too. Doing this will mean that they see the same tools in the same configuration, but on their own account. They can then either replicate or verify your work – or branch off and use what you were doing as a springboard for something new.

What ‘Tools’ Are There?

Currently, there are eight tools available for both regular Galaxy Zoo and the Galaxy Zoo Quench projects:

  • Histogram: makes bar charts of a single data parameter
  • Scatterplot: plot any two data parameters against each other
  • Map: plot the position of objects on the sky, overplotted on maps of the sky at different wavelengths (radio, visible, X-ray, etc.)
  • Statistics: compute some of the most common statistics on your data (eg, mean, minimum, maximum, etc).
  • Subject viewer: examine individual objects, including both the image and all the metadata associated with that object
  • Spectra: for galaxies in the SDSS with a spectrum, download and examine the spectrum.
  • Table: List the metadata for all objects in a dataset. You can also use this tool to create new columns from the data that exists – for example, take the difference between magnitudes to define the color of a galaxy.
  • Color-magnitude: look at how the color and magnitude of galaxies compare to the total population of Galaxy Zoo. A really nice way of visualizing and analyzing how unusual a particular galaxy might be.

We have one tool up and running for Space Warps called Space Warp Viewer. This lets users adjust the color and scale parameters of image to examine potential gravitational lenses in more detail.

Snapshot Serengeti Dashboard

Finally, Snapshot Serengeti has several of the same tools that Galaxy Zoo does, including Statistics, Subject Viewer, Table, and Histogram (aka Bar Graph). There’s also Image Gallery, where you can examine the still images from your datasets, and we’re working on an Image Player. There’s a few very cool and advanced tools we started developing last week – they’re not yet deployed, but we’re really excited to let you follow the activity over many seasons or by focusing on particular cameras. Stay tuned. You can see an example Serengeti Dashboard, showing the distribution of Cheetahs, here (it’s also shown in the screenshot above).

We hope that Zoo Tools will be an important part of all Zooniverse projects in the future, and we’re looking forward to you trying them out. More to come soon!

How the Zooniverse Works: Tools and Technologies

In my last post I described at length the domain model that we use to describe conceptually what the Zooniverse does. That wouldn’t mean much without an implementation of that model and so in this post I’m going to describe some of the tools and technologies that we use to actually run our citizen science projects.

The lifecycle of a Zooniverse project

Let’s think a little more about what happens when you visit a project such as Snapshot Serengeti. Ignoring all of the to-and-fro that your web browser does to work out where the domain name ‘snapshotserengeti.org’ points to, once it’s figured this and a few other details out you basically get sent a website that your browser renders for you. For the website to function as a Zooniverse project a few things are essential:

  1. You need to be able to view images (or listen to audio or watch a video) that we and the science team need your help analysing.
  2. You need to be able to log in with your Zooniverse account.
  3. We need to capture back what you said when doing the citizen science analysis task.
  4. Save out favourite images to your profile.
  5. View recent images you’ve seen in your profile.
  6. Discuss these images with the community.

It turns out that pretty much all of the functionality mentioned above is for us delivered by an application we call Ouroboros as an API layer and a website (such as Snapshot Serengeti) talking to it.

Screen Shot 2013-06-26 at 8.20.51 AM

Ouroboros – or ‘why the simplest API that works is probably all you need’.

So what is Ouroboros? It provides an API (REST/JSON) that allows you to build a Zooniverse project that has all of the core components (1-6) listed above. Technology-wise it’s a custom Ruby on Rails application (Rails 3.2) that uses MongoDB to store data and Redis as a query cache all running on Amazon Web Services. It’s probably utterly useless to anyone but us but for our needs it’s just about perfect.

Screen Shot 2013-06-26 at 8.01.30 AM

At the Zooniverse we’re optimised for a few different things. In no particular order of priority they are:

  1. Volume – we want to be able to build lots of projects.
  2. Science – we want it to be easy to do science with the efforts of our community.
  3. Scale/performance – we want to be able to have millions of people come to our proejcts and them to stay up.
  4. Availability – we’d prefer our websites to be ‘up’ and not ‘down’.
  5. Cost – we want to keep costs at a manageable level.

Pretty much all of these requirements point to having a shared API (Ouroboros) that serves a large number of projects (I’ll argue #4 in the pub with anyone who really wants to push me on it).

Running a core API that serves many projects makes you take the maintenance and health of that application pretty seriously. Should Ouroboros throw a wobbly then we’d currently take out about 10 Zooniverse projects at once and this is only set to increase. This means we’ve thought a lot about how to scale the application for times when we’re busy and we also spend significant amounts of time monitoring the application performance and tuning code where necessary. I mentioned that cost is a factor – running a central API means that when the Zooniverse is quiet and there aren’t many people about we can scale back the number of servers we’re running (automagically on Amazon Web Services) to a minimal level.

We’ve not always built our projects this way. The original Galaxy Zoo (2007) was an ASP/web forms application, projects between Galaxy Zoo 2 and SETI Live were all separate web applications, many of them built using an application called The Juggernaut. Building standalone applications every time not only made it difficult to maintain our projects but we also found ourselves writing very similar (but subtly different) code many times between projects, code for things like choosing which Subject to show next.

Ouroboros is an evolution of our thinking about how to build projects, what’s important and generalisable and what isn’t. At it’s most basic it’s a really fast Subject allocator and Classification collector. Our realisation over the last few years was that the vast majority of what’s different about each project is the user experience and classification interface and this has nothing to do with the API.

Subjects out, Classifications back in.
Subjects out, Classifications back in.

The actual projects

The point of having a central API is that when we want to build a new project we’re already working with a very familiar toolset – the way we log people in, do signup forms, ask for a Subject, send back Classifications – all of this is completely standard. In fact if you’re building in JavaScript (which we almost always are these days) then there’s a client library called ‘Zooniverse’ (meta I know) available here on GitHub.

Having a standard API and client library for talking to it meant that we built the Zooniverse project Planet Four in less than 1 week! That’s not to say it’s trivial to build projects, it’s definitely not, but it is getting easier. And having this standardised way of communicating with the core Zooniverse means that the bulk of the effort when building Planet Four was exactly where it should be – the fan drawing tools – the bit that’s different from any of our other projects.

Screen Shot 2013-06-26 at 8.19.22 AM

So how do we actually build our projects these days? We build our projects as JavaScript web applications using JavaScript web frameworks such as Spine JS, Backbone or something completely custom. The point being, that all of the logic for how the interface should behave is baked into the JavaScript application – Ouroboros doesn’t try and help with any of this stuff.

Currently the majority of our projects are hosted using the Amazon S3 static website hosting service. The benefits of this are numerous but key ones for us are:

  1. There’s no webserver serving the site content, that is http://www.galaxyzoo.org resolves to an S3 bucket. When you access the Galaxy Zoo site S3 does all of the hard work and we just pay for the bandwidth from S3 to your computer.
  2. Deploying is easy. When we want to put out a new version of any of our sites we just upload new timestamped versions of the files and your browser starts using them instead.
  3. It’s S3 – Amazon S3 is a quite remarkable service –  a significant fraction of the web is using it. Currently hosting more than 2 trillion (yes that’s 12 zeroes) objects and regularly serving more than 1 million requests for data per second the S3 service is built to scale and we get to use it (and so can you).

Amazon S3 is a static webhost (i.e. you can’t have any server-side code running) so how do we make a static website into a Zooniverse project you can log in to when we can’t access database records? The main site functions just fine – these JavaScript applications (such as the current Galaxy Zoo or any recent Zooniverse project) implement what is different about the project’s interface. We then use a small invisible iFrame on each website that actually points to api.zooniverse.org which is Ouroboros. When you use a login form we actually set a cookie on this domain and then send all of our requests back to the API through this iFrame. This approach is a little unusual and with browsers tightening up the restrictions on third-party cookies if looks like we might need to swap it out for a different approach but for now it’s working well.

Summing up

If you’re like me then when you read something you read the opening, look at the pictures and then skip to the conclusions. I’ll summarise here just incase you’re doing that too:

In the Zooniverse there’s a clear separation between the API (Ouroboros) and the citizen science projects that the community interact with. Ouroboros is a custom-built, highly scalable application built in Ruby on Rails, that runs on Amazon Web Services and uses MongoDB, Redis and a few other fancy technologies to do its thing.

The actual citizen science projects that people interact with are these days all pure JavaScript applications that are hosted on Amazon S3 and they’re pretty much all open source. They’re generally still bespoke applications each time but share common code for talking to Ouroboros.

What I didn’t talk about in this post are the hardest bits we’ve solved in Ouroboros – namely all of the logic about how to make finding Subjects for people quickly and other ‘smart stuff’. That’s coming up next.

How the Zooniverse Works: The Domain Model

We talk a lot in the Zooniverse about research, whether it’s interesting stories from the community, a new paper based upon the combined efforts of the volunteers and the science teams or conferences we might be going to.

One thing we don’t spend much time talking about is the technology solutions we use to build the Zooniverse sites, the lessons we’ve learned as a team building more than twenty five citizen science projects over the past five years and where we think the technical challenges still remain in building out the Zooniverse into something bigger and better.

There’s a lot to write here so I’m going to break this into three separate blog posts. The first is going to be entirely about the domain model that we use to describe what we do. When it seems relevant I’ll talk a little more about implementation details of these domain entities in our code too. The second will be about technologies and the infrastructure we run the Zooniverse atop of and the third will be about making smarter systems.

Why bother with a domain model?

Firstly it’s worth spending a little time talking about why we need a domain model. In my mind the primary reason for having a domain model is that it gives the team, whether it’s the developers, scientists, educators or designers working on a new project a shared vocabulary to talk about the system we’re building together. It means that when I use the term ‘Classification’ everyone in the team understands that I’m talking about the thing we store in the database that represents a single analysis/interaction of a Zooniverse volunteer with a piece of data (such as a picture of a galaxy), which by the way we call a ‘Subject’.

Technology wise the Zooniverse is split into a core set of web services (or Application Programming Interface, API) that serve up data and collect it back (more about that later) and some web applications (the Zooniverse projects) that talk to these services. The domain model we use is almost entirely a description of the internals of the core Zooniverse API called Ouroboros and this is an application that is designed to support all of the Zooniverse projects which means that some of the terms we use might sound overly generic. That’s the point.

The core entities

The domain model is actually pretty simple. We typically think most about the following entities:

User

People are core to the Zooniverse. When talking publically about the Zooniverse I almost always use the term ‘citizen scientist’ or ‘volunteer’ because it feels like an appropriate term for someone who donates their time to one of our projects. When writing code however, the shortest descriptive term that makes sense is usually selected so in our domain model the term we use is User.

A User is exactly what you’d expect, it’s a person, it has a bunch of information associated with it such as a username, an email address, information about which projects they’ve helped with and a host of other bits and bobs. Crucially though for us, a User is the same regardless of which project they’re working – that is Users are pan-Zooniverse. Whether you’re classiying galaxies over at Galaxy Zoo or identifying animals on Snapshot Serengeti we’re associating your efforts with the same User record each time which turns out to be useful for a whole bunch of reasons (more later).

Subject

Just as people are core, as are the things that they’re analysing to help us do research. In Old Weather it’s a scanned image of a ship log book, in Planet Hunters it’s a light curve but regardless of the project internally we call all of these things Subjects. A Subject is the thing that we present to a User when we want to them to do something.

Subjects are one of the core entities that we want to behave differently in our system depending upon their particular flavour. A log book in Old Weather is only viewed three times before being retired whereas an image in Galaxy Zoo is shown more than 30 times before retiring. This means that for each project we have a specific Subject class (e.g. GalaxyZooSubject) that inherits its core functionality from a parent Subject class but then extends the functionality with the custom behaviour we need for a particular project.

Subjects are then stored in our database with a collection of extra information a particular Subject sub-class can use for each different project. For example in Galaxy Zoo we might store some metadata associated with the survey telescope that imaged the galaxy and in Cyclone Center we store information about the date, time and position the image was recorded.

Workflow/Task

These two entities are grouped together as they’re often used to mean broadly the same thing. When a User is presented with a Subject on one of our projects we ask them to do something. This something is called the Task. These Tasks can be grouped together into a Workflow which is essentially just a grouping entity. To be honest we don’t use it very much as most projects just have a single Workflow but in theory it allows us to group a collection of Tasks into a logical unit. In Notes from Nature each step of the transcription (such as ‘What is the location?’) is a separate Task, in Galaxy Zoo, each step of the decision tree is a Task too.

Classification

It’s no accident that I’ve introduced these three entities, User, Subject and Task first as a combination of these is what we call a Classification. The Classification is the core unit of human effort produced by the Zooniverse community as it represents what a person saw and what they said about it. We collect a lot of these – across all of the Zooniverse projects to date we must be getting close to 500 million Classifications recorded.

I’ll talk more about what we store in a Classification in a followup the next post about technologies suffice to say now that they store a full description of what the User said about the object. In previous versions of the Zoonivese API software we tried to break these records out into smaller units called Annotations but we don’t do that any more – it was an unnecessary step.

Screen Shot 2013-06-20 at 2.36.12 PM

Group

Sometimes we need to group Subjects together for some higher level function. Perhaps it’s to represent a season’s worth of images in Snapshot Serengeti or a particular cell dye staining in Cell Slider. Whatever the reason for grouping, the entity we use to describe this is ‘Group’.

Grouping records is both one of the most useful features Ouroboros offers but also one of the things it took the longest for us to find the right level of abstraction. While a Group can represent an astronomical survey in Galaxy Zoo (such as the Hubble CANDELS survey) or a Ship in Old Weather, it isn’t just enough for a bunch of Subjects to all be associated with each other. There’s often a lots of functionality that goes along with a Group or the Subjects within that is custom for each Zooniverse project. Ultimately we’ve solved this in a similar fashion to Subject – by having per-project subclasses of Groups (i.e. there is a SerengetiGroup that inherits from Group) that can set custom behaviour as required.

Project

Ouroboros (the Zooniverse API) hosts a whole bunch of different Zooniverse projects so it’s probably no surprise that we represent the actual citizen science project within our domain model. No prize for guessing the name of this entity – it’s called Project.

A Project is really just the overarching named entity that Subjects, Classifications and Groups are associated with. Project in Ouroboros does some other cool stuff for us though. It’s the Project that knows about the Groups, its current status (such as how many Subjects are complete) and other adminstrative functions. We also sometimes deliver a slightly different user experience to different Users in what are known as A/B splits – it’s the Project that manages these too.

So that’s about it. There are a few more entities routinely in discussion in the Zooniverse team such as Favourite (something a User favourites when they’re on one of our projects) but they’re really not core to the main operation of a project.

The domain description we’re using today is informed by everthing we’ve learnt over the past five years of building proejcts. It’s also a consequence of how the Zooniverse has been functioning – we try lots of projects in lots of different research domains and so we need a domain model that’s flexible enough to support something like Notes from Nature, Planet Four and Snapshot Seregeti but not so generic that we can’t build rich user experiences.

We’ve also realised that the vast majority of what’s differenct about each project is the user experience and classification interface. We’re always likely to want to put significant effort into developing the best user experience possible and so having an API that abstracts lots of the complexity away and allows us to focus on what’s different about each project is a big win.

Our domain model has also been heavily influenced by the patterns that have emerged working with science teams. In the early years we spent a lot of time abstracting out each step of the User interaction with a Subject into distinct descriptive entities called Annotations. While in theory these were a more ‘complete’ description of what a User did, the science teams rarely used them and almost never in realtime operations. The vast majority of Zooniverse projects to date collect large numbers of Classifications that are write once, read very much later. Realising this has allowed us to worry less about exactly what we’re storing at a given time and focus on storing data structures that are a convenient for the scientists to work with.

Overall the Zooniverse domain model has been a big success. When designing for the Zooniverse we really were developing a new system unlike anything else we knew of. It’s terminology is pervasive in the collaboration and makes conversations much more focussed and efficient which can only be a good thing.

Galaxy Zoo is Open Source

It’s always a good feeling a be making a codebase open and today it’s time to push the latest version of Galaxy Zoo into the open. As I talked about in my blog post a couple of months ago, making open source code the default for Zooniverse is good for everyone involved with the project.

One significant benefit of making code open is that from here on out it’s going to be much easier to have Zooniverse projects translated into your favourite language. When we build a new project we typically extract the content into something called a localisation file (or localization if you prefer your en_US) which is basically just a plain text file that our application uses. You can view that file for our (US) English translation file here and it looks a little like this:

En

So how do I translate Galaxy Zoo?

I’m glad you asked… It turns out there’s a feature built into the code-hosting platform we’re using (called GitHub) which allows you to basically make your own copy of the Galaxy Zoo codebase. It’s called ‘forking’ and you can read much more about it here but all you need to do to contribute is fork the Galaxy Zoo code repository, add in your new translation file and (there’s a handy script that will generate a template file based on the English version), translate the English values into the new language and send the changes back up to GitHub.

Once you’re happy with the new translation and you’d like us to try it out you can send us a ‘pull request’ (details here). If everything looks good then we can review the changes and pull the new translation into the main Galaxy Zoo codebase. You can see an example of a pull request from Robert Simpson that’s been merged in here.

So what next?

This method of translating projects is pretty new for us and so we’re still finding our way a little here. As a bunch of developers it feels great to be using the awesome collaborative toolset that the GitHub platform offers to open up code and translations to you all.

Cheers

Arfon

The New Zooniverse Home

If you visit Zooniverse Home today you may notice that a few things have changed. For a while now we’ve wanted to improve the design of the site to better handle the growing range of projects that are part of the Zooniverse. With the updates released today we’ve completely reworked the look and feel of the site. We’ve organised projects into different categories, starting with Space, Climate and Humanities. There are lots of new categories coming soon.

Today we’re also launching an entirely new type of project as part of ‘Zooniverse Labs’. To date all Zooniverse projects come with the guarantee that your work will contribute directly toward real research output (usually academic papers), however this can make it hard to try new ideas. As the name suggests, Zooniverse Labs is about experimentation and we’re excited about launching new, often smaller-scale websites that continue to stretch the boundaries of science online. More news on this very soon.

Another new part of the redesigned Zooniverse homepage is the ‘My Projects’ section that allows you to see your contributions to the Zooniverse, and what else you might like to try out. This part of the site is very much in beta, and we hope to add to it in the coming months.

We hope you like the new look, and that it helps you explore more of what the Zooniverse has to offer.

Recent downtime at The Zooniverse

Last week the majority of The Zooniverse websites suffered their first major outage for over 2 years. Firstly I wanted to reassure you that the valuable classifications you all provide was always secure but perhaps more importantly I wanted to say sorry – we should have recovered more quickly from this incident and we’ll be working hard this week to put better systems in place to enable us to recover more rapidly in the future.

For those who’d like more of the gory details, read on…

Some background

Many of you will know the story of the launch of the original Galaxy Zoo (in July 2007); just a few hours after launch the website hosted by the SDSS web team and Johns Hopkins University crashed under the strain of thousands of visitors to the site. Thankfully due to the heroic efforts of the JHU team (involving a new web server being built) the Galaxy Zoo site recovered and a community of hundreds of thousands of zooties was born.

Fast-forward to February 2009 and we were planning for the launch of Galaxy Zoo 2. This time we knew that we were going to have a busy launch day – Chris was once again going to be on BBC breakfast. So that we could keep the site running well during the extremely busy periods we looked to commercial solutions for scalable web hosting. There are a number of potential choices in this arena but by far the most popular and reliable service was that offered by Amazon (yes the book store) and their Web Services platform. While I won’t dig into the technical details of Amazon Web Services (AWS) here, the fundamental difference when running web sites on AWS is that you have a collection of ‘virtual machines’ rather than physical servers.

If any of you have ever run Virtual PC or VMWare on your own computer then you’ll already realise that using machine virtualisation it’s possible to run a number of virtual machines on a single physical computer. This is exactly what AWS do except they do it at a massive scale (millions of virtual machines) and have some fantastic tools to help you build new virtual servers. One particularly attractive feature of using these virtual machines is that you can essentially have as many as you want and you only pay by the hour. At one point on the Galaxy Zoo 2 launch day we had 20 virtual machines running the Galaxy Zoo website, API and databases. 2 days later we were running only 3. The ability to scale up (and down) in realtime the number of virtual machines means that we are able to cope with huge variations in the traffic that a particular Zooniverse site may be receiving.

The outage last week

As I write this blog post we currently have 22 virtual servers running on AWS. That includes all of The Zooniverse projects, the database servers, caching servers for Planet Hunters, our blogs, the forums and Talk and much more. Amazon have a number of hosting ‘regions’ that are essentially different geographical locations where they have datacenters. We happen to host in the ‘us-east’ region in Virginia – conveniently placed for both Europe and American traffic.

We have a number of tools in place that monitor the availability of our web sites and last Thursday at about 9am GMT I received a text-message notification that our login server (login.zooiverse.org) was down. We have a rota within the dev team for keeping an eye on the production web servers and last week it was my turn to be on call.

I quickly logged on to our control panel and saw that there was a problem with the virtual machine and attempted a reboot. At this point I also started to receive notifications that a number of the Zooniverse project sites were also unavailable. At this point realising that something rather unusual was going on I checked the Amazon status page which was ‘all green’, i.e. no known issues. Amazon can be a little slow to update this page so I also checked Twitter (https://twitter.com/#!/search/aws). Twitter was awash with people complaining that their sites were down and that they couldn’t access their virtual machines. Although this wasn’t ‘good’ news, it’s always helpful to understand in a situation such as this if the issue is with the code that we run on the servers or the servers themselves.

Waiting for a fix?

At this point we rapidly put up holding pages on our project sites and reviewed the status of each project site and service that we run. As the morning progressed it became clear that the outage was rather serious and actually became significantly worse for The Zooniverse as in turn, each of the database servers that we run became inaccessible. We take great care to execute nightly backups of all of our databases and so when the problems started the oldest backup was 4 hours old. With hindsight when the problems first started we should have immediately moved The Zooniverse servers to a different AWS region (this is actually what we did do with login.zooniverse.org) and booted up new database servers with the backup from the night before however we were reluctant to do this because of the need to reintegrate the classifications made by the community during the outage. But hindsight is always 20-20 and this isn’t what we did. Instead, believing that a fix for the current situation was only a matter of hours away we waited for Amazon to fix the problem for us.

As the day progressed a number of the sites became available for short periods and then inaccessible again. It wasn’t a fun day to be on operations duty with servers continually going up and down. Worse, our blogs were also unavailable so we only had Twitter to communicate what was going on. At about 11pm on the first evening a number of The Zooniverse projects had been up for a number of hours and things looked to be improving. Chris and I spoke on the phone and we agreed that if things weren’t completely fixed by the morning we’d move the web stack early Friday morning.

Friday a.m.

Friday arrived and the situation was slightly improved but the majority of the our projects were still in maintenance mode so I set about rebuilding The Zoonivere web stack. Three out of five of our databases were accessible again and so I took a quick backup of all of the three and booted up replacement databases and web servers. This was all we needed to restore the majority of the projects and by lunchtime on Friday we pretty much had a fully working Zooniverse again. The database server used by the blogs and forums took a little longer to recover and so it was Friday evening before they were back up.

A retrospective

We weren’t alone in having issues with AWS last week. Sites such as Reddit and Foursquare were also affected as were thousands of other users of the service. I think the team at the Q&A site Quora put it best when they said ‘We’d point fingers, but we wouldn’t be where we are today without EC2.’ on their holding page and this is certainly true of The Zooniverse.

Over the past 2 years we’ve only been able to deliver the number of projects that we have and the performance and uptime that we all enjoy due to the power, flexibility and reliability of the AWS platform. Amazon have developed a number of services that mean that it was possible (in theory) to protect against the failure they experienced in the US-east region last week. Netflix was notably absent from the list of AWS hosted sites affected by the AWS downtime and this is because they’ve gone to huge effort (and expense) to protect themselves against such a scenario (http://techblog.netflix.com/2010/12/5-lessons-weve-learned-using-aws.html).

Unfortunately with the limited resources available to The Zooniverse we’re not able to build a resilient web stack as Netflix however there are a number of steps we’ll be taking this to make sure that we’re in a much better position to recover from a similar incident in the future so that we experience downtime of minutes and hours rather than days.

Cheers
Arfon & The Tech Team