Anyone browsing the BBC News Technology section last night might have seen an unexpected appearance of a couple of our projects in this story about illegal streaming of Premier League football games. The story started on Saturday with an email from a volunteer pointing out that Virgin Media, a major Internet Service Provider in the UK, were blocking access to Notes From Nature. All is well now, but if you do experience problems please let us know. If you’d like the background, then read on.
In case you haven’t noticed I’ve had a pretty busy five years at the Zooniverse. With more than 25 projects launched in fields from astronomy to biodiversity and from climataology all the way to zoology, it’s been an incredible experience to work with so many new science teams hungry for answers to research questions that can only be answered by enlisting the help of a large number of volunteers. This model of citizen science, one where we boil down the often complex analysis task brought to us by a science team to the ‘simplest thing that will work’, build a rich user experience and then ask a bunch of people to help, seems to work pretty well.
For me, one of the best aspects of what I get to do is that I work in a domain that is an inherently open way of doing research. Having joined Zooniverse when we were still ‘just’ Galaxy Zoo, to see the range of projects we host broaden and to watch our community mature has been a remarkable experience. With our latest endeavour – the Galaxy Zoo Quench project – it’s clear that the line between the activites of the ‘science’ team and the ‘volunteers’ is becoming less defined by the day. Citizen-led science in the Zooniverse began with a group of people in the Galaxy Zoo Forum, ‘The Peas Corp’ when they discovered a new class of galaxy, and it continues today with volunteers discovering new types of worms, exotic exoplanets and even, through Quench, analysing and writing a new paper as a group. These of course are just examples I’ve taken from the Zooniverse and there are many more in other projects run by other people, but in each case the result is the same: by enagaing the public in a meaningful way Citizen Science is challenging the centuries old practices of academia and that has to be a good thing.
The opportunity to change the way science is done, whether it’s building software to increase efficiency or developing new collaboration models, is what brought me to the Zooniverse and now it’s what is leading me away. At the end of September this year I’m going to be hanging up my hat as Technical Lead of the Zooniverse and joining GitHub as their ‘science guy’.
As with all big decisions in life this wasn’t an easy one. I feel very fortunate to have had the opportunity to give technical direction to an incredible team of scientists, developers, educators and designers here at the Adler and the wider Zooniverse. But over the past couple of years I’ve also got to know a number of the GitHub folks and I’ve been hugely impressed by their focus on building the very best platform possible for online collaboration. Starting with the very simple idea that ‘it should be easier to work together than alone’ they’ve clearly nailed what it looks like to work on a problem with others in code. But software isn’t the only thing people are sharing on GitHub – legislators are publishing drafts of state law, technicians are documenting scientific laboratory protocols and with tools like the IPython Notebook researchers have defined formats and means of sharing entire research workflows.
The mantra of ‘collaborative versioned science’ has been rattling around my head now for a couple of years. I believe there’s an opportunity for GitHub to be the platform for capturing the process of scientific discovery and I want to help make that happen.
So what does this mean for the Zooniverse? Well, I’m leaving at a pretty good time as the Zooniverse has never been healthier – there’s a first-class web and education team of twelve people I’m going to be leaving behind at the Adler Planetarium in Chicago and we’ve just secured several large grants to expand our sister team at The University of Oxford to ten people (watch this space for job ads).
PS If you’d like to know more about what work looks like as a Technical Lead of the Zooniverse then I’ve written recently about some of the problems we’ve addressed over the past few years here, here and here.
Today we’ve added a new feature to all the Zooniverse sites that use the new version of Talk. As you’ll know, most of our projects allow you to save ‘favourites’ – a list of things that are either cool/interesting/worthy of keeping and something to refer back to later. One often asked for feature is for this collection of favourites in the project to be available in Talk as a collection.
Today we’ve added this feature and from now on when you favourite something on the main site (e.g. Galaxy Zoo) it will automagically appear in a collection called ‘Favourites’ on Talk. That means you can discuss, share or even import them as a data source into tools.zooniverse.org
Since the very first days of Galaxy Zoo, our projects have seen amazing contributions from volunteers who have gone beyond the main classification tasks. Many of these examples have led to scientific publications, including Hanny’s Voorwerp, the ‘green pea’ galaxies, and the circumbinary planet PH1b.
One common thread that runs through the many positive experiences we’ve had with the volunteers is the way in which they’ve interacted more deeply with the data. In Galaxy Zoo, much of this has been enabled by linking to the Sloan SkyServer website, where you can find huge amounts of additional information about galaxies on the site (redshift, spectra, magnitudes, etc). We’ve put in similar links on other projects now, linking to the Kepler database on Planet Hunters, or data on the location and water conditions in Seafloor Explorer.
The second part of this that we think is really important, however, is providing ways in which users can actually use and manipulate this data. Some users have been already been very resourceful in developing their own analysis tools for Zooniverse projects, or have done lots of offline work pulling data into Excel, IDL, Python, and lots of other programs (see examples here and here). We want to make using the data easier and available to more of our community, which has led to the development of Zoo Tools (http://tools.zooniverse.org). Zoo Tools is still undergoing some development, but we’d like to start by describing what it can do and what sort of data is available.
Zoo Tools works in an environment which we call the Dashboard – each Dashboard can be thought of as a separate project that you’re working on. You can create new Dashboards yourself, or work collaboratively with other people on the same Dashboard by sharing the URL.
Within the Dashboard, there are two main functions: selecting/importing data, and then using tools to analyze the data.
The first step for working with the Dashboard is to select the data you’d like to analyze. At the top left of the screen, there’s a tab named “Data”. If you click on this, you’ll see the different databases that Zoo Tools can query. For Galaxy Zoo, for example, it can query the Zooniverse database itself (galaxies that are currently being classified by the project), or you can also analyze other galaxies from the SDSS via their Sky Server website.
Clicking on the “Zooniverse” button, for example, you can select galaxies in one of four ways: a Collection (either your own or someone else’s), looking at your recently classified galaxies, galaxies that you’ve favorited, or specific galaxies via their Zooniverse IDs. Selecting any of these will import them as a dataset, which you can start to look at and analyze. In this example we’ll import 20 recent galaxies.
After importing your dataset, you can use any of the tools in Dashboard (which you can select under “Tools” at the top of the page) on your data. After selecting a tool, you choose the dataset that you’d like to work with from a dropdown menu, and then you can begin using it. For example: if I want to look at the locations of my galaxies on the sky, I can select the “Map” tool. I then select the data source I’d like to plot (in this case, “Zooniverse–1”) and the tool plots the coordinates of each galaxy on a map of the sky. I can select different wavelength options for the background (visible light, infrared, radio, etc), and could potentially use this to analyze whether my galaxies are likely to have more stars nearby based on their position with respect to the Milky Way.
The other really useful part is that the tools can talk to each other, and can pass data back and forth. For example: you could import a collection of galaxies and look at their colour in a scatterplot. You could then select only certain galaxies in that tool, and then plot the positions of those galaxies on the map. This is what we do in the screenshots below:
Making Data Analysis Social
You can also share Dashboards with other people. From the Zoo Tools home page you can access your existing dashboards as well as delete them and share them with others. You can share on Twitter and Facebook or just grab the URL directly. For example, the Dashboard above can be found here – with a few more tools added as a demonstration.
This means that once you have a Dashboard set up and ready to use, you can send it to somebody else to use too. Doing this will mean that they see the same tools in the same configuration, but on their own account. They can then either replicate or verify your work – or branch off and use what you were doing as a springboard for something new.
What ‘Tools’ Are There?
Currently, there are eight tools available for both regular Galaxy Zoo and the Galaxy Zoo Quench projects:
- Histogram: makes bar charts of a single data parameter
- Scatterplot: plot any two data parameters against each other
- Map: plot the position of objects on the sky, overplotted on maps of the sky at different wavelengths (radio, visible, X-ray, etc.)
- Statistics: compute some of the most common statistics on your data (eg, mean, minimum, maximum, etc).
- Subject viewer: examine individual objects, including both the image and all the metadata associated with that object
- Spectra: for galaxies in the SDSS with a spectrum, download and examine the spectrum.
- Table: List the metadata for all objects in a dataset. You can also use this tool to create new columns from the data that exists – for example, take the difference between magnitudes to define the color of a galaxy.
- Color-magnitude: look at how the color and magnitude of galaxies compare to the total population of Galaxy Zoo. A really nice way of visualizing and analyzing how unusual a particular galaxy might be.
We have one tool up and running for Space Warps called Space Warp Viewer. This lets users adjust the color and scale parameters of image to examine potential gravitational lenses in more detail.
Finally, Snapshot Serengeti has several of the same tools that Galaxy Zoo does, including Statistics, Subject Viewer, Table, and Histogram (aka Bar Graph). There’s also Image Gallery, where you can examine the still images from your datasets, and we’re working on an Image Player. There’s a few very cool and advanced tools we started developing last week – they’re not yet deployed, but we’re really excited to let you follow the activity over many seasons or by focusing on particular cameras. Stay tuned. You can see an example Serengeti Dashboard, showing the distribution of Cheetahs, here (it’s also shown in the screenshot above).
We hope that Zoo Tools will be an important part of all Zooniverse projects in the future, and we’re looking forward to you trying them out. More to come soon!
In my last post I described at length the domain model that we use to describe conceptually what the Zooniverse does. That wouldn’t mean much without an implementation of that model and so in this post I’m going to describe some of the tools and technologies that we use to actually run our citizen science projects.
The lifecycle of a Zooniverse project
Let’s think a little more about what happens when you visit a project such as Snapshot Serengeti. Ignoring all of the to-and-fro that your web browser does to work out where the domain name ‘snapshotserengeti.org’ points to, once it’s figured this and a few other details out you basically get sent a website that your browser renders for you. For the website to function as a Zooniverse project a few things are essential:
- You need to be able to view images (or listen to audio or watch a video) that we and the science team need your help analysing.
- You need to be able to log in with your Zooniverse account.
- We need to capture back what you said when doing the citizen science analysis task.
- Save out favourite images to your profile.
- View recent images you’ve seen in your profile.
- Discuss these images with the community.
It turns out that pretty much all of the functionality mentioned above is for us delivered by an application we call Ouroboros as an API layer and a website (such as Snapshot Serengeti) talking to it.
Ouroboros – or ‘why the simplest API that works is probably all you need’.
So what is Ouroboros? It provides an API (REST/JSON) that allows you to build a Zooniverse project that has all of the core components (1-6) listed above. Technology-wise it’s a custom Ruby on Rails application (Rails 3.2) that uses MongoDB to store data and Redis as a query cache all running on Amazon Web Services. It’s probably utterly useless to anyone but us but for our needs it’s just about perfect.
At the Zooniverse we’re optimised for a few different things. In no particular order of priority they are:
- Volume – we want to be able to build lots of projects.
- Science – we want it to be easy to do science with the efforts of our community.
- Scale/performance – we want to be able to have millions of people come to our proejcts and them to stay up.
- Availability – we’d prefer our websites to be ‘up’ and not ‘down’.
- Cost – we want to keep costs at a manageable level.
Pretty much all of these requirements point to having a shared API (Ouroboros) that serves a large number of projects (I’ll argue #4 in the pub with anyone who really wants to push me on it).
Running a core API that serves many projects makes you take the maintenance and health of that application pretty seriously. Should Ouroboros throw a wobbly then we’d currently take out about 10 Zooniverse projects at once and this is only set to increase. This means we’ve thought a lot about how to scale the application for times when we’re busy and we also spend significant amounts of time monitoring the application performance and tuning code where necessary. I mentioned that cost is a factor – running a central API means that when the Zooniverse is quiet and there aren’t many people about we can scale back the number of servers we’re running (automagically on Amazon Web Services) to a minimal level.
We’ve not always built our projects this way. The original Galaxy Zoo (2007) was an ASP/web forms application, projects between Galaxy Zoo 2 and SETI Live were all separate web applications, many of them built using an application called The Juggernaut. Building standalone applications every time not only made it difficult to maintain our projects but we also found ourselves writing very similar (but subtly different) code many times between projects, code for things like choosing which Subject to show next.
Ouroboros is an evolution of our thinking about how to build projects, what’s important and generalisable and what isn’t. At it’s most basic it’s a really fast Subject allocator and Classification collector. Our realisation over the last few years was that the vast majority of what’s different about each project is the user experience and classification interface and this has nothing to do with the API.
The actual projects
Having a standard API and client library for talking to it meant that we built the Zooniverse project Planet Four in less than 1 week! That’s not to say it’s trivial to build projects, it’s definitely not, but it is getting easier. And having this standardised way of communicating with the core Zooniverse means that the bulk of the effort when building Planet Four was exactly where it should be – the fan drawing tools – the bit that’s different from any of our other projects.
Currently the majority of our projects are hosted using the Amazon S3 static website hosting service. The benefits of this are numerous but key ones for us are:
- There’s no webserver serving the site content, that is http://www.galaxyzoo.org resolves to an S3 bucket. When you access the Galaxy Zoo site S3 does all of the hard work and we just pay for the bandwidth from S3 to your computer.
- Deploying is easy. When we want to put out a new version of any of our sites we just upload new timestamped versions of the files and your browser starts using them instead.
- It’s S3 – Amazon S3 is a quite remarkable service – a significant fraction of the web is using it. Currently hosting more than 2 trillion (yes that’s 12 zeroes) objects and regularly serving more than 1 million requests for data per second the S3 service is built to scale and we get to use it (and so can you).
If you’re like me then when you read something you read the opening, look at the pictures and then skip to the conclusions. I’ll summarise here just incase you’re doing that too:
In the Zooniverse there’s a clear separation between the API (Ouroboros) and the citizen science projects that the community interact with. Ouroboros is a custom-built, highly scalable application built in Ruby on Rails, that runs on Amazon Web Services and uses MongoDB, Redis and a few other fancy technologies to do its thing.
What I didn’t talk about in this post are the hardest bits we’ve solved in Ouroboros – namely all of the logic about how to make finding Subjects for people quickly and other ‘smart stuff’. That’s coming up next.
We talk a lot in the Zooniverse about research, whether it’s interesting stories from the community, a new paper based upon the combined efforts of the volunteers and the science teams or conferences we might be going to.
One thing we don’t spend much time talking about is the technology solutions we use to build the Zooniverse sites, the lessons we’ve learned as a team building more than twenty five citizen science projects over the past five years and where we think the technical challenges still remain in building out the Zooniverse into something bigger and better.
There’s a lot to write here so I’m going to break this into three separate blog posts. The first is going to be entirely about the domain model that we use to describe what we do. When it seems relevant I’ll talk a little more about implementation details of these domain entities in our code too. The second will be about technologies and the infrastructure we run the Zooniverse atop of and the third will be about making smarter systems.
Why bother with a domain model?
Firstly it’s worth spending a little time talking about why we need a domain model. In my mind the primary reason for having a domain model is that it gives the team, whether it’s the developers, scientists, educators or designers working on a new project a shared vocabulary to talk about the system we’re building together. It means that when I use the term ‘Classification’ everyone in the team understands that I’m talking about the thing we store in the database that represents a single analysis/interaction of a Zooniverse volunteer with a piece of data (such as a picture of a galaxy), which by the way we call a ‘Subject’.
Technology wise the Zooniverse is split into a core set of web services (or Application Programming Interface, API) that serve up data and collect it back (more about that later) and some web applications (the Zooniverse projects) that talk to these services. The domain model we use is almost entirely a description of the internals of the core Zooniverse API called Ouroboros and this is an application that is designed to support all of the Zooniverse projects which means that some of the terms we use might sound overly generic. That’s the point.
The core entities
The domain model is actually pretty simple. We typically think most about the following entities:
People are core to the Zooniverse. When talking publically about the Zooniverse I almost always use the term ‘citizen scientist’ or ‘volunteer’ because it feels like an appropriate term for someone who donates their time to one of our projects. When writing code however, the shortest descriptive term that makes sense is usually selected so in our domain model the term we use is User.
A User is exactly what you’d expect, it’s a person, it has a bunch of information associated with it such as a username, an email address, information about which projects they’ve helped with and a host of other bits and bobs. Crucially though for us, a User is the same regardless of which project they’re working – that is Users are pan-Zooniverse. Whether you’re classiying galaxies over at Galaxy Zoo or identifying animals on Snapshot Serengeti we’re associating your efforts with the same User record each time which turns out to be useful for a whole bunch of reasons (more later).
Just as people are core, as are the things that they’re analysing to help us do research. In Old Weather it’s a scanned image of a ship log book, in Planet Hunters it’s a light curve but regardless of the project internally we call all of these things Subjects. A Subject is the thing that we present to a User when we want to them to do something.
Subjects are one of the core entities that we want to behave differently in our system depending upon their particular flavour. A log book in Old Weather is only viewed three times before being retired whereas an image in Galaxy Zoo is shown more than 30 times before retiring. This means that for each project we have a specific Subject class (e.g. GalaxyZooSubject) that inherits its core functionality from a parent Subject class but then extends the functionality with the custom behaviour we need for a particular project.
Subjects are then stored in our database with a collection of extra information a particular Subject sub-class can use for each different project. For example in Galaxy Zoo we might store some metadata associated with the survey telescope that imaged the galaxy and in Cyclone Center we store information about the date, time and position the image was recorded.
These two entities are grouped together as they’re often used to mean broadly the same thing. When a User is presented with a Subject on one of our projects we ask them to do something. This something is called the Task. These Tasks can be grouped together into a Workflow which is essentially just a grouping entity. To be honest we don’t use it very much as most projects just have a single Workflow but in theory it allows us to group a collection of Tasks into a logical unit. In Notes from Nature each step of the transcription (such as ‘What is the location?’) is a separate Task, in Galaxy Zoo, each step of the decision tree is a Task too.
It’s no accident that I’ve introduced these three entities, User, Subject and Task first as a combination of these is what we call a Classification. The Classification is the core unit of human effort produced by the Zooniverse community as it represents what a person saw and what they said about it. We collect a lot of these – across all of the Zooniverse projects to date we must be getting close to 500 million Classifications recorded.
I’ll talk more about what we store in a Classification in a followup the next post about technologies suffice to say now that they store a full description of what the User said about the object. In previous versions of the Zoonivese API software we tried to break these records out into smaller units called Annotations but we don’t do that any more – it was an unnecessary step.
Sometimes we need to group Subjects together for some higher level function. Perhaps it’s to represent a season’s worth of images in Snapshot Serengeti or a particular cell dye staining in Cell Slider. Whatever the reason for grouping, the entity we use to describe this is ‘Group’.
Grouping records is both one of the most useful features Ouroboros offers but also one of the things it took the longest for us to find the right level of abstraction. While a Group can represent an astronomical survey in Galaxy Zoo (such as the Hubble CANDELS survey) or a Ship in Old Weather, it isn’t just enough for a bunch of Subjects to all be associated with each other. There’s often a lots of functionality that goes along with a Group or the Subjects within that is custom for each Zooniverse project. Ultimately we’ve solved this in a similar fashion to Subject – by having per-project subclasses of Groups (i.e. there is a SerengetiGroup that inherits from Group) that can set custom behaviour as required.
Ouroboros (the Zooniverse API) hosts a whole bunch of different Zooniverse projects so it’s probably no surprise that we represent the actual citizen science project within our domain model. No prize for guessing the name of this entity – it’s called Project.
A Project is really just the overarching named entity that Subjects, Classifications and Groups are associated with. Project in Ouroboros does some other cool stuff for us though. It’s the Project that knows about the Groups, its current status (such as how many Subjects are complete) and other adminstrative functions. We also sometimes deliver a slightly different user experience to different Users in what are known as A/B splits – it’s the Project that manages these too.
So that’s about it. There are a few more entities routinely in discussion in the Zooniverse team such as Favourite (something a User favourites when they’re on one of our projects) but they’re really not core to the main operation of a project.
The domain description we’re using today is informed by everthing we’ve learnt over the past five years of building proejcts. It’s also a consequence of how the Zooniverse has been functioning – we try lots of projects in lots of different research domains and so we need a domain model that’s flexible enough to support something like Notes from Nature, Planet Four and Snapshot Seregeti but not so generic that we can’t build rich user experiences.
We’ve also realised that the vast majority of what’s differenct about each project is the user experience and classification interface. We’re always likely to want to put significant effort into developing the best user experience possible and so having an API that abstracts lots of the complexity away and allows us to focus on what’s different about each project is a big win.
Our domain model has also been heavily influenced by the patterns that have emerged working with science teams. In the early years we spent a lot of time abstracting out each step of the User interaction with a Subject into distinct descriptive entities called Annotations. While in theory these were a more ‘complete’ description of what a User did, the science teams rarely used them and almost never in realtime operations. The vast majority of Zooniverse projects to date collect large numbers of Classifications that are write once, read very much later. Realising this has allowed us to worry less about exactly what we’re storing at a given time and focus on storing data structures that are a convenient for the scientists to work with.
Overall the Zooniverse domain model has been a big success. When designing for the Zooniverse we really were developing a new system unlike anything else we knew of. It’s terminology is pervasive in the collaboration and makes conversations much more focussed and efficient which can only be a good thing.
It’s always a good feeling a be making a codebase open and today it’s time to push the latest version of Galaxy Zoo into the open. As I talked about in my blog post a couple of months ago, making open source code the default for Zooniverse is good for everyone involved with the project.
One significant benefit of making code open is that from here on out it’s going to be much easier to have Zooniverse projects translated into your favourite language. When we build a new project we typically extract the content into something called a localisation file (or localization if you prefer your en_US) which is basically just a plain text file that our application uses. You can view that file for our (US) English translation file here and it looks a little like this:
So how do I translate Galaxy Zoo?
I’m glad you asked… It turns out there’s a feature built into the code-hosting platform we’re using (called GitHub) which allows you to basically make your own copy of the Galaxy Zoo codebase. It’s called ‘forking’ and you can read much more about it here but all you need to do to contribute is fork the Galaxy Zoo code repository, add in your new translation file and (there’s a handy script that will generate a template file based on the English version), translate the English values into the new language and send the changes back up to GitHub.
Once you’re happy with the new translation and you’d like us to try it out you can send us a ‘pull request’ (details here). If everything looks good then we can review the changes and pull the new translation into the main Galaxy Zoo codebase. You can see an example of a pull request from Robert Simpson that’s been merged in here.
So what next?
This method of translating projects is pretty new for us and so we’re still finding our way a little here. As a bunch of developers it feels great to be using the awesome collaborative toolset that the GitHub platform offers to open up code and translations to you all.
If you visit Zooniverse Home today you may notice that a few things have changed. For a while now we’ve wanted to improve the design of the site to better handle the growing range of projects that are part of the Zooniverse. With the updates released today we’ve completely reworked the look and feel of the site. We’ve organised projects into different categories, starting with Space, Climate and Humanities. There are lots of new categories coming soon.
Today we’re also launching an entirely new type of project as part of ‘Zooniverse Labs’. To date all Zooniverse projects come with the guarantee that your work will contribute directly toward real research output (usually academic papers), however this can make it hard to try new ideas. As the name suggests, Zooniverse Labs is about experimentation and we’re excited about launching new, often smaller-scale websites that continue to stretch the boundaries of science online. More news on this very soon.
Another new part of the redesigned Zooniverse homepage is the ‘My Projects’ section that allows you to see your contributions to the Zooniverse, and what else you might like to try out. This part of the site is very much in beta, and we hope to add to it in the coming months.
We hope you like the new look, and that it helps you explore more of what the Zooniverse has to offer.
Last week the majority of The Zooniverse websites suffered their first major outage for over 2 years. Firstly I wanted to reassure you that the valuable classifications you all provide was always secure but perhaps more importantly I wanted to say sorry – we should have recovered more quickly from this incident and we’ll be working hard this week to put better systems in place to enable us to recover more rapidly in the future.
For those who’d like more of the gory details, read on…
Many of you will know the story of the launch of the original Galaxy Zoo (in July 2007); just a few hours after launch the website hosted by the SDSS web team and Johns Hopkins University crashed under the strain of thousands of visitors to the site. Thankfully due to the heroic efforts of the JHU team (involving a new web server being built) the Galaxy Zoo site recovered and a community of hundreds of thousands of zooties was born.
Fast-forward to February 2009 and we were planning for the launch of Galaxy Zoo 2. This time we knew that we were going to have a busy launch day – Chris was once again going to be on BBC breakfast. So that we could keep the site running well during the extremely busy periods we looked to commercial solutions for scalable web hosting. There are a number of potential choices in this arena but by far the most popular and reliable service was that offered by Amazon (yes the book store) and their Web Services platform. While I won’t dig into the technical details of Amazon Web Services (AWS) here, the fundamental difference when running web sites on AWS is that you have a collection of ‘virtual machines’ rather than physical servers.
If any of you have ever run Virtual PC or VMWare on your own computer then you’ll already realise that using machine virtualisation it’s possible to run a number of virtual machines on a single physical computer. This is exactly what AWS do except they do it at a massive scale (millions of virtual machines) and have some fantastic tools to help you build new virtual servers. One particularly attractive feature of using these virtual machines is that you can essentially have as many as you want and you only pay by the hour. At one point on the Galaxy Zoo 2 launch day we had 20 virtual machines running the Galaxy Zoo website, API and databases. 2 days later we were running only 3. The ability to scale up (and down) in realtime the number of virtual machines means that we are able to cope with huge variations in the traffic that a particular Zooniverse site may be receiving.
The outage last week
As I write this blog post we currently have 22 virtual servers running on AWS. That includes all of The Zooniverse projects, the database servers, caching servers for Planet Hunters, our blogs, the forums and Talk and much more. Amazon have a number of hosting ‘regions’ that are essentially different geographical locations where they have datacenters. We happen to host in the ‘us-east’ region in Virginia – conveniently placed for both Europe and American traffic.
We have a number of tools in place that monitor the availability of our web sites and last Thursday at about 9am GMT I received a text-message notification that our login server (login.zooiverse.org) was down. We have a rota within the dev team for keeping an eye on the production web servers and last week it was my turn to be on call.
I quickly logged on to our control panel and saw that there was a problem with the virtual machine and attempted a reboot. At this point I also started to receive notifications that a number of the Zooniverse project sites were also unavailable. At this point realising that something rather unusual was going on I checked the Amazon status page which was ‘all green’, i.e. no known issues. Amazon can be a little slow to update this page so I also checked Twitter (https://twitter.com/#!/search/aws). Twitter was awash with people complaining that their sites were down and that they couldn’t access their virtual machines. Although this wasn’t ‘good’ news, it’s always helpful to understand in a situation such as this if the issue is with the code that we run on the servers or the servers themselves.
Waiting for a fix?
At this point we rapidly put up holding pages on our project sites and reviewed the status of each project site and service that we run. As the morning progressed it became clear that the outage was rather serious and actually became significantly worse for The Zooniverse as in turn, each of the database servers that we run became inaccessible. We take great care to execute nightly backups of all of our databases and so when the problems started the oldest backup was 4 hours old. With hindsight when the problems first started we should have immediately moved The Zooniverse servers to a different AWS region (this is actually what we did do with login.zooniverse.org) and booted up new database servers with the backup from the night before however we were reluctant to do this because of the need to reintegrate the classifications made by the community during the outage. But hindsight is always 20-20 and this isn’t what we did. Instead, believing that a fix for the current situation was only a matter of hours away we waited for Amazon to fix the problem for us.
As the day progressed a number of the sites became available for short periods and then inaccessible again. It wasn’t a fun day to be on operations duty with servers continually going up and down. Worse, our blogs were also unavailable so we only had Twitter to communicate what was going on. At about 11pm on the first evening a number of The Zooniverse projects had been up for a number of hours and things looked to be improving. Chris and I spoke on the phone and we agreed that if things weren’t completely fixed by the morning we’d move the web stack early Friday morning.
Friday arrived and the situation was slightly improved but the majority of the our projects were still in maintenance mode so I set about rebuilding The Zoonivere web stack. Three out of five of our databases were accessible again and so I took a quick backup of all of the three and booted up replacement databases and web servers. This was all we needed to restore the majority of the projects and by lunchtime on Friday we pretty much had a fully working Zooniverse again. The database server used by the blogs and forums took a little longer to recover and so it was Friday evening before they were back up.
We weren’t alone in having issues with AWS last week. Sites such as Reddit and Foursquare were also affected as were thousands of other users of the service. I think the team at the Q&A site Quora put it best when they said ‘We’d point fingers, but we wouldn’t be where we are today without EC2.’ on their holding page and this is certainly true of The Zooniverse.
Over the past 2 years we’ve only been able to deliver the number of projects that we have and the performance and uptime that we all enjoy due to the power, flexibility and reliability of the AWS platform. Amazon have developed a number of services that mean that it was possible (in theory) to protect against the failure they experienced in the US-east region last week. Netflix was notably absent from the list of AWS hosted sites affected by the AWS downtime and this is because they’ve gone to huge effort (and expense) to protect themselves against such a scenario (http://techblog.netflix.com/2010/12/5-lessons-weve-learned-using-aws.html).
Unfortunately with the limited resources available to The Zooniverse we’re not able to build a resilient web stack as Netflix however there are a number of steps we’ll be taking this to make sure that we’re in a much better position to recover from a similar incident in the future so that we experience downtime of minutes and hours rather than days.
Arfon & The Tech Team
It’s that time of year again. Here in the Northern Hemisphere the flowers are starting to pop up, the ground hogs are coming out of hibernation, and signs everywhere indicate that it’s time to invest in the latest styles. Here in the Zooniverse, I’m not sure we can generally be accused of clinging to the newest fashions, but security is always on our mind, and with the release of WordPress 3.1 we did need to do a few updates. While I was inside the guts of our server, I admit I went a little nuts and decided (in conversations with a bunch of people, including maybe you) that it might be nice to give our blogs matching outfits.
The new Zoo theme you see before you comes from one of my favourite WordPress design companies: Woo Themes. The theme is an adaption of their social media ready MyStream, and it’s my hope that all the happy little links to classifying and social media will form a central jumping off point to see what’s new, to share what you like, and to do science..
There are still a few things wanting. The Mendeley pages are, um, empty, but that will get fixed soon. The posts are also missing a lot of comments. This is where you come in. We know there are a ton of people contributing to science through the Zooniverse, and we’d like at least a large chunk of that ton reading this blog. Next time someone asks you about what you’re doing online when you’re clicking away, send them to the blog. Get them involved one blog post and one click at a time.