52 Years of Human Effort

At ZooCon last week I spoke about the scale of human attention that the Zooniverse receives. One of my favourite stats in this realm (from Clay Shirky’s book ‘Cognitive Surplus’) is that in the USA, adults cumulatively spend about 200 billion hours watching TV every year. By contrast it took 100 million hours of combined effort for Wikipedia to reach its status as the world’s encyclopaedia.

In the previous year people collectively spent just shy of half a million hours working on Zooniverse projects. Better put, the community invested about 52 years’ worth of effort[1]. That’s to say that if an individual sat down and did nothing but classify on Zooniverse sites for 52 years, they’d only just have done the same amount of work as our community did between June 2012 and June 2013. The number is always rising too. Citizen science is amazing!

Another way of thinking about it is to convert this time into Full Time Equivalents (FTEs). One person working 40 hours per week, for 50 weeks a year, works for 2000 hours a year – that’s 1 FTE. So our 460,000 hours of Zooniverse effort are equivalent to 230 FTEs. It’s as if we had a building of 230 people who came in every day just to click on Zooniverse projects.

Zooniverse Effort Distribution to June 2013

This amazing investment by the community is not broken down evenly of course, as the above ‘snail’ chart shows. In fact Planet Hunters alone would occupy 62 of the people in our fictional building: the project took up 27% of the effort in the last year. Galaxy Zoo took 17%, which means it had almost 9 years of your effort all to itself. Planet Four had a meteoric launch on the BBC’s Stargazing Live less than six months ago and since that time it has gobbled up just over 5 years of human attention – 10% of the whole for the past year.
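For anyone who wants to check the arithmetic, here is a minimal Python sketch using only the figures quoted in this post. The variable and function names are just for illustration, and the percentages are the rounded ones given above, so the results are approximate.

```python
# Figures quoted in the post; names are illustrative only.
TOTAL_HOURS = 460_000        # Zooniverse effort, June 2012 to June 2013
HOURS_PER_YEAR = 24 * 365    # one person classifying around the clock
FTE_HOURS = 40 * 50          # 40 hours a week for 50 weeks a year

years_of_effort = TOTAL_HOURS / HOURS_PER_YEAR   # roughly 52.5 years
ftes = TOTAL_HOURS / FTE_HOURS                   # 230 full-time equivalents

# Rounded shares of the year's effort, as quoted in the post.
shares = {"Planet Hunters": 0.27, "Galaxy Zoo": 0.17, "Planet Four": 0.10}


def breakdown(share):
    """Convert a share of the total effort into years and FTEs."""
    return {
        "years": round(share * years_of_effort, 1),
        "ftes": round(share * ftes),
    }


for name, share in shares.items():
    print(name, breakdown(share))
```

Because the shares are rounded, the per-project figures can differ slightly from the ones in the text.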

What’s wonderful is that our 230 metaphorical workers, and the 52 years they represent, are not confined to one building or one crazed click-worker. Our community is made of hundreds of thousands of individuals across the world – 850,000 of whom have signed up through zooniverse.org. Some of them have contributed a single classification, others have given our researchers far, far more of their time and attention. Through clicking on our sites, discussing ideas on Talk, or just spreading the word: Zooniverse volunteers are making a significant contribution to research in areas from astronomy to zoology.

Congratulations to everyone who’s taken part and let’s hope this number increases again by next year!

[1] In my ZooCon talk I incorrectly gave the figure of 35 years. This was wrong for two reasons; firstly, I had neglected Andromeda Project, Planet Four and Snapshot Serengeti for technical reasons. Secondly I had calculated the numbers incorrectly, in my rush to get my slides ready, and I underestimated them all by about 20%.

Anyone Fancy an Asteroid?

This might be the Asteroid Zoo logo

Anyone interested in astronomy on the web will be aware of the fabulous success of Planetary Resources’ fundraising effort to build and launch the ARKYD space telescope. They’ve already raised more than a million dollars – helped in part by a cunning plan to let you take a picture of yourself in space – but they’re not stopping there. With three days to go, we’re delighted to announce that they’re going to try and help us help Zooniverse volunteers hunt for potentially hazardous asteroids.

The latest stretch goal is to support the development by the Zooniverse of a citizen science asteroid hunt. If the new target is hit, we’ll build a system that uses more than 3 million images taken from the Catalina Sky Survey – the survey responsible for nearly half of the near-Earth asteroid discoveries in the last fifteen years. We know there are asteroids out there waiting to be discovered, and we’re willing to bet that the existing routines used to scan through the survey data didn’t find them all.

Recent discoveries of near-Earth objects; Catalina’s the big purple part.

Anyone who’s followed the Zooniverse over the last few years knows that we believe in doing projects that make authentic contributions to science, and so I’m especially pleased that the project with Planetary Resources is also focused on improving machine learning solutions to asteroid hunting. Rather like our supernova project, an ideal outcome would be to use the classifications provided by volunteers to improve automated searching and suggest new methods by which machines might take up the strain. In the meantime, though, there are new (small) worlds to find – with your help, we’ll be launching the search for them soon.

I’ve put my money where my mouth is already, and if you can afford it then I hope you’ll follow the link and donate so we can all go asteroid hunting. You can also watch their Kickstarter video to see what they’re trying to do.


How the Zooniverse Works: Tools and Technologies

In my last post I described at length the domain model that we use to describe conceptually what the Zooniverse does. That wouldn’t mean much without an implementation of that model and so in this post I’m going to describe some of the tools and technologies that we use to actually run our citizen science projects.

The lifecycle of a Zooniverse project

Let’s think a little more about what happens when you visit a project such as Snapshot Serengeti. Ignoring all of the to-and-fro that your web browser does to work out where the domain name ‘snapshotserengeti.org’ points to, once it’s figured this and a few other details out you basically get sent a website that your browser renders for you. For the website to function as a Zooniverse project a few things are essential:

  1. You need to be able to view images (or listen to audio or watch a video) that we and the science team need your help analysing.
  2. You need to be able to log in with your Zooniverse account.
  3. We need to capture back what you said when doing the citizen science analysis task.
  4. You need to be able to save your favourite images to your profile.
  5. You need to be able to view the images you’ve recently seen in your profile.
  6. Discuss these images with the community.

It turns out that pretty much all of the functionality mentioned above is delivered by two pieces: an application we call Ouroboros, which acts as an API layer, and a website (such as Snapshot Serengeti) that talks to it.


Ouroboros – or ‘why the simplest API that works is probably all you need’.

So what is Ouroboros? It provides an API (REST/JSON) that allows you to build a Zooniverse project that has all of the core components (1-6) listed above. Technology-wise it’s a custom Ruby on Rails application (Rails 3.2) that uses MongoDB to store data and Redis as a query cache all running on Amazon Web Services. It’s probably utterly useless to anyone but us but for our needs it’s just about perfect.


At the Zooniverse we’re optimised for a few different things. In no particular order of priority they are:

  1. Volume – we want to be able to build lots of projects.
  2. Science – we want it to be easy to do science with the efforts of our community.
  3. Scale/performance – we want to be able to have millions of people come to our projects and for them to stay up.
  4. Availability – we’d prefer our websites to be ‘up’ and not ‘down’.
  5. Cost – we want to keep costs at a manageable level.

Pretty much all of these requirements point to having a shared API (Ouroboros) that serves a large number of projects (I’ll argue #4 in the pub with anyone who really wants to push me on it).

Running a core API that serves many projects makes you take the maintenance and health of that application pretty seriously. Should Ouroboros throw a wobbly then we’d currently take out about 10 Zooniverse projects at once and this is only set to increase. This means we’ve thought a lot about how to scale the application for times when we’re busy and we also spend significant amounts of time monitoring the application performance and tuning code where necessary. I mentioned that cost is a factor – running a central API means that when the Zooniverse is quiet and there aren’t many people about we can scale back the number of servers we’re running (automagically on Amazon Web Services) to a minimal level.

We’ve not always built our projects this way. The original Galaxy Zoo (2007) was an ASP/web forms application; projects between Galaxy Zoo 2 and SETI Live were all separate web applications, many of them built using an application called The Juggernaut. Building standalone applications every time not only made it difficult to maintain our projects, but we also found ourselves writing very similar (but subtly different) code many times between projects: code for things like choosing which Subject to show next.

Ouroboros is an evolution of our thinking about how to build projects, what’s important and generalisable and what isn’t. At its most basic it’s a really fast Subject allocator and Classification collector. Our realisation over the last few years was that the vast majority of what’s different about each project is the user experience and classification interface, and this has nothing to do with the API.

Subjects out, Classifications back in.
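To make the ‘Subject allocator and Classification collector’ idea a bit more concrete, here is a deliberately naive, in-memory Python sketch. It is not the Ouroboros code (that’s a Rails application, and the real selection logic is much smarter); the class name, method names and the retirement rule are all assumptions made up for illustration.

```python
class SubjectQueue:
    """Toy stand-in for 'Subjects out, Classifications back in' (hypothetical)."""

    def __init__(self, subject_ids, retire_after=3):
        self.retire_after = retire_after              # assumed retirement rule
        self.counts = {sid: 0 for sid in subject_ids}  # classifications per Subject
        self.seen_by = {}                             # user id -> Subjects already seen
        self.classifications = []                     # everything that came back in

    def next_subject(self, user_id):
        """Subjects out: pick an active Subject this user has not seen yet."""
        seen = self.seen_by.setdefault(user_id, set())
        candidates = [
            sid for sid, count in self.counts.items()
            if count < self.retire_after and sid not in seen
        ]
        if not candidates:
            return None
        # Prefer the Subject with the fewest classifications so far.
        return min(candidates, key=lambda sid: self.counts[sid])

    def submit(self, user_id, subject_id, answers):
        """Classifications back in: store what the user said and update counts."""
        self.seen_by.setdefault(user_id, set()).add(subject_id)
        self.counts[subject_id] += 1
        self.classifications.append(
            {"user": user_id, "subject": subject_id, "answers": answers}
        )


queue = SubjectQueue(["galaxy-1", "galaxy-2"], retire_after=2)
subject = queue.next_subject("alice")
queue.submit("alice", subject, {"shape": "spiral"})
```

Everything that differs between projects (the interface, the drawing tools, the questions asked) lives outside a component like this, which is the point of keeping it generic.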

The actual projects

The point of having a central API is that when we want to build a new project we’re already working with a very familiar toolset – the way we log people in, do signup forms, ask for a Subject, send back Classifications – all of this is completely standard. In fact if you’re building in JavaScript (which we almost always are these days) then there’s a client library called ‘Zooniverse’ (meta I know) available here on GitHub.

Having a standard API and client library for talking to it meant that we built the Zooniverse project Planet Four in less than 1 week! That’s not to say it’s trivial to build projects, it’s definitely not, but it is getting easier. And having this standardised way of communicating with the core Zooniverse means that the bulk of the effort when building Planet Four was exactly where it should be – the fan drawing tools – the bit that’s different from any of our other projects.


So how do we actually build our projects these days? We build our projects as JavaScript web applications using JavaScript web frameworks such as Spine JS, Backbone or something completely custom. The point being, that all of the logic for how the interface should behave is baked into the JavaScript application – Ouroboros doesn’t try and help with any of this stuff.

Currently the majority of our projects are hosted using the Amazon S3 static website hosting service. The benefits of this are numerous but key ones for us are:

  1. There’s no webserver serving the site content, that is http://www.galaxyzoo.org resolves to an S3 bucket. When you access the Galaxy Zoo site S3 does all of the hard work and we just pay for the bandwidth from S3 to your computer.
  2. Deploying is easy. When we want to put out a new version of any of our sites we just upload new timestamped versions of the files and your browser starts using them instead.
  3. It’s S3 – Amazon S3 is a quite remarkable service –  a significant fraction of the web is using it. Currently hosting more than 2 trillion (yes that’s 12 zeroes) objects and regularly serving more than 1 million requests for data per second the S3 service is built to scale and we get to use it (and so can you).

Amazon S3 is a static webhost (i.e. you can’t have any server-side code running) so how do we make a static website into a Zooniverse project you can log in to when we can’t access database records? The main site functions just fine – these JavaScript applications (such as the current Galaxy Zoo or any recent Zooniverse project) implement what is different about the project’s interface. We then use a small invisible iFrame on each website that actually points to api.zooniverse.org, which is Ouroboros. When you use a login form we actually set a cookie on this domain and then send all of our requests back to the API through this iFrame. This approach is a little unusual, and with browsers tightening up the restrictions on third-party cookies it looks like we might need to swap it out for a different approach, but for now it’s working well.

Summing up

If you’re like me then when you read something you read the opening, look at the pictures and then skip to the conclusions. I’ll summarise here just in case you’re doing that too:

In the Zooniverse there’s a clear separation between the API (Ouroboros) and the citizen science projects that the community interact with. Ouroboros is a custom-built, highly scalable application built in Ruby on Rails, that runs on Amazon Web Services and uses MongoDB, Redis and a few other fancy technologies to do its thing.

The actual citizen science projects that people interact with are these days all pure JavaScript applications that are hosted on Amazon S3 and they’re pretty much all open source. They’re generally still bespoke applications each time but share common code for talking to Ouroboros.

What I didn’t talk about in this post are the hardest bits we’ve solved in Ouroboros – namely all of the logic about how to find Subjects for people quickly and other ‘smart stuff’. That’s coming up next.

Putting the ‘citizen’ in ‘citizen science’

I was slightly surprised to see my twitter feed this morning filling up with comments about how the term ‘citizen’ appears in writing about science, and about public engagement with science. This seems to be coming from Roland Jackson’s post in response to the publication of a report called ‘What publics? When?’ from Sciencewise, an organisation that gives advice on science policy to government. Roland’s point is that perhaps the reason we get ourselves in a tangle when talking about public engagement is the word ‘public’, thinking that ‘citizen’ does a better job of breaking down the divide between ‘us’ doing the engagement and the ‘public’ being engaged. (There’s another engaging comparison on Nottingham’s ‘Making Science Public’ blog.)

In such contexts, I reckon ‘citizen’ comes up most often in ‘citizen science’, and I thought it might be interesting to say something about our use of the term. It’s how we describe our projects in papers, and we chose it mostly because we didn’t like the term ‘crowdsourcing’, which never seemed adequate for projects which very quickly demonstrated that they could grow way beyond simple requests for a community to complete a task. We quickly realised we wanted people to make discoveries, to follow them up themselves and to chase down their own research questions, and crowdsourcing just doesn’t describe that. I also liked the fact that anyone – professional or amateur, project designer or participant – could be a citizen scientist.

We clearly weren’t that confident, though. Although the core collaboration that builds and runs the Zooniverse is the Citizen Science Alliance, we’ve mostly reserved that term for grant applications rather than using it in the real world. (Let alone the problems of being a citizen science group which produces humanities projects either deliberately or accidentally.) This reticence isn’t misplaced; it reflects my firm belief that no one in the history of the world has ever sat down at a computer, opened their web browser and thought ‘I’m a citizen scientist. Let’s do some citizen science’. Zooniverse participants are fans of one or more of our projects, and they tend to have stumbled in and then found a comfortable environment where they can do exciting things, rather than started off by looking for a science project. (This is also, I think, reflected in the lack of traffic we get from citizen science portals like SciStarter.)

‘Citizen’ science, from this perspective isn’t any more inclusive than talking about ‘public engagement’. The most common alternative (‘PPSR’ or Public Participation in Scientific Research) doesn’t help either. If names are important, we need a new one for this thing that we’re doing, but as the person who has been most consistently wrong about naming Zooniverse projects (I voted against Galaxy Zoo, for starters!) I’m the last person to ask. Maybe we should crowdsource a solution….

Chris

PS I’m reminded of this slide deck from Arfon which proposes CBSR (Community Based Scientific Research) and PPFCSM (Public Participation as a Fundamental Component of the Scientific Method), although I think he’s kidding on the last one.

Live from ZooCon

Hello from the Martin Wood Lecture Theatre in Oxford’s Department of Physics, which is playing host to a crowd of Zooniverse volunteers and project members for ZooCon13. We’re recording the talks for later broadcast, but as a sneak preview I thought I’d liveblog the event.

Talk 1 – SpaceWarps

We’re kicking off with Aprajita Verma from Oxford and from Space Warps, the newest Zooniverse astronomy project. As is traditional when talking about gravitational lensing – the bending of light by matter, she’s using Phil Marshall’s Galaxy in a Wine Glass video.

Galaxy In a Wine Glass

SpaceWarps is much needed – LSST, the next generation of survey telescope, will produce something like 10000 galaxy scale lenses. It’s designed to map a very wide area of sky, which is perfect for finding rare things like lenses – and this will produce a lot of work as traditional lens hunting is very labour intensive. Not only do they need to be found, but they then need to be modeled.

There are three lenses in this image – one real, and two simulated. Spotting the difference is hard…

Luckily – we have effort! 6 million classifications have been recorded already from over 8000 people. Particularly pleasing for me is that 40% of those people are discussing things on Talk – this is essential as lenses are complicated things and the interesting ones are going to be found through discussion. The team are doing dynamic assessment of the results, retiring images that no longer need classification. I especially liked their division of classifiers into ‘Optimists’, who get lenses right but also get excited about lots of things that aren’t lenses; ‘Pessimists’, who correctly dismiss non-lenses but get rid of lenses too; the ‘Astute’, who get everything right; and the ‘Obtuse’, who get everything wrong. Luckily, we have lots of astute classifiers and almost none who are obtuse, as evidenced by a sneak preview of the first few discoveries (more on those next week).
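As a rough illustration of that four-way split, here is a small Python sketch. It assumes each classifier has been shown some images with known answers (lenses and non-lenses) and simply thresholds the two success rates; the record fields, the 0.5 threshold and the function name are my own assumptions, not the Space Warps team’s actual analysis.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical tally of one classifier's answers on images with known truth."""
    lenses_seen: int       # known lenses shown to the classifier
    lenses_flagged: int    # of those, how many they marked as lenses
    duds_seen: int         # known non-lenses shown
    duds_dismissed: int    # of those, how many they correctly dismissed


def label(record, threshold=0.5):
    """Place a classifier in one of the four groups (threshold is an assumption)."""
    if record.lenses_seen == 0 or record.duds_seen == 0:
        raise ValueError("need at least one known lens and one known non-lens")
    good_on_lenses = record.lenses_flagged / record.lenses_seen >= threshold
    good_on_duds = record.duds_dismissed / record.duds_seen >= threshold
    if good_on_lenses and good_on_duds:
        return "Astute"
    if good_on_lenses:
        return "Optimist"    # finds lenses, but also flags non-lenses
    if good_on_duds:
        return "Pessimist"   # dismisses non-lenses, but loses lenses too
    return "Obtuse"


print(label(Record(lenses_seen=10, lenses_flagged=9, duds_seen=10, duds_dismissed=2)))
```

The example classifier spots nine of ten known lenses but only dismisses two of ten non-lenses, so under these assumptions they land in the ‘Optimist’ group.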

Talk 2 – Cosmic Evolution from Galaxy Zoo

Next up is Karen Masters of Portsmouth and Galaxy Zoo, talking about science results from the Zooniverse’s oldest project. It’s already clear there is lots of ground to cover in this conference and Karen’s bounding through a brief history of observational astronomy, noting the conceptual leap required to go from thinking about the Milky Way, our galaxy, to an expanding Universe filled with billions of the blighters. Karen just showed a cool movie showing the parts of the sky that have been mapped by the Sloan Digital Sky Survey, which provided images for the early incarnations of Galaxy Zoo.

A Galaxy in need of classifications.

In going through the history of Galaxy Zoo, Karen reminds me that the original BBC news story on Galaxy Zoo claims that we hope that 30,000 people will eventually take part. We smashed that on day one if I remember correctly. (There’s also a factual error in that news story – if anyone tells me what it is via Twitter (@chrislintott) or in person they can have a pint.) While I relive ancient history, Karen’s talking about her work on red spirals: most spirals are blue, but Galaxy Zoo helped us find lots of red ones and Karen says that the Milky Way may even be on its way to becoming one. The work on the red spirals was part of a serious shift in how we think about galaxy formation – a few years back that story was all about mergers but now it’s thought that lots of galaxies form and evolve (including fading from being a blue spiral to a red spiral) in slower, less spectacular ways.

Of course, one of the advantages of citizen science that Galaxy Zoo demonstrated was the ability of classifiers to discover the weird and wonderful. Recent examples include the bulgeless galaxies – spirals which are guaranteed not to have had a merger within the last few billion years – and a set of galaxies (mostly red spirals!) with massive bars at their centre. In even better news, we have time on the Very Large Array (I REPEAT – WE HAVE TIME ON THE VERY LARGE ARRAY!) to follow up on these things.

WE GOT TIME ON THE VERY LARGE ARRAY (this is a picture of some of it)

I’m really quite excited about the VLA. I’ve always wanted to use it.

Talk 3 – New Uses for Old Weather

We’re taking a break from astronomy with Philip Brohan from the Old Weather project – he’s explaining that scientists need historical observations to constrain their models of how the climate behaves. Lacking the ability to stick a weather satellite in the Tardis and head back in time, we need to scrabble around for old records, an idea that dates back to Beaufort of wind scale fame.

Philip in the gloaming, beneath an Old Weather slide.

This is great, but the supercomputers can’t read the 73 million logbook pages we’d like to sort through – hence the need for volunteers. So far more than a million logbook pages have been processed by the project – a small fraction of the total needed but a very useful quantity! Most of these volunteers are attracted by the historical information that the logs fortuitously contain – Philip is currently beneath a slide showing a log book containing both the information that the ship’s company are fitted with seal-skin boots, and that 23 dogs are received on board. (Why? Surely not for food…).

It’s all got a bit gruesome now – six dead bodies are being placed in alcohol. Luckily we’re swiftly on to HMS Tarantula, where their anemometer is infested with ants. The current set of logbooks have more famous events; in particular, the logbooks of the Jeannette show the discovery of the Arctic island now named after it (upon which nothing but ice sheets grow). The fact that we have these logbooks at all is a miracle; the ship was crushed by the ice and the crew (most of whom perished) chose to carry the scientific records with them as they struggled to safety.

Images from and about the Jeannette, including in the bottom left an artist’s impression of the chest of logbooks being saved.

As well as the climate and the history, Phil says, the third important aspect of Old Weather is the people. The project’s made particularly good use of the forum, which has steered the project in new directions and provided a home for discussion of things we never thought to look for, as well as art and verse. The latter was particularly inspired by the tragic loss of the chocolate aboard the HMS Manuta. Before rolling the credits listing his more than 17,000 collaborators, Philip ended these tales by noting that to make a serious dent on the archives we need to speed up by a factor of ten, a challenge the Zooniverse is happy to accept.

Talk 4 – The Future of Galaxy Zoo

Back to the Universe now, and Oxford’s Brooke Simmons is able to start her talk on what’s coming up for Galaxy Zoo by reminding the crowd that the data release paper for the second version of Galaxy Zoo is now with the referee. At about 30 pages, it’s as short as it could possibly be, showing the amount of effort that goes into dealing with classifications received via a large citizen science project.

Brooke’s now explaining the need – with Galaxy Zoo trying to reach back to a time not that long after the Big Bang – for us to use all sorts of tests to understand how our classifications work. Showing images of the same galaxies shifted to higher and higher redshifts (further and further away) it’s clear that classifications will change just because it’s harder to see what’s going on when galaxies get further away. We’re also playing with supercomputer simulations of the evolution of galaxies which shows how things change over time.

It’s not all about simulations, though – we’re thinking about moving beyond the optical range of the spectrum and looking at galaxies in the ultraviolet and infrared. The former, from a satellite called GALEX, shows only the youngest stars; the latter, from a survey called UKIDSS which covers about a third of the Sloan area, shows the dust and older stars. Also on the agenda are more advanced tools, like those which power the Galaxy Zoo Navigator, which allows primarily school groups to look at the statistics of their classifications.

Correction: I wasn’t listening properly to Aprajita; Space Warps got 2 million classifications in the first week, and at the time of ZooCon was over 6 million. I’ve corrected the post. 1st July 2013.

Got An Idea for a Zooniverse Project? Propose One

For more than a year, we’ve been openly accepting proposals for new Zooniverse projects and this has brought to life projects such as Seafloor Explorer, Snapshot Serengeti, Notes from Nature and Space Warps.

Yesterday, five Zooniverse projects were featured in The Biologist’s 10 Great Citizen Science Projects – several of them were ideas proposed by researchers we had never met before they came to us and said ‘hey, I have a cool idea for a project’. We’ve also recently seen articles about how the Zooniverse might be able to help in a crisis and how we provide an excellent avenue for proactive procrastination. Citizen science projects are wide and varied and lots of researchers have great ideas.

So this is a good time to remind everyone that we want to hear from researchers with ideas for Zooniverse projects. If that’s you: propose a project! We have funding from the Alfred P. Sloan Foundation to build your great ideas and work with you to further science. We also have an incredibly talented team of designers, developers, educators and researchers who want to make your idea into an awesome new Zooniverse project.

If you want to know more about this, you can get in touch with any of the team or via our general email address or on Twitter @the_zooniverse. We’re currently working on projects that were proposed earlier this year and we’ll be announcing them soon. Maybe yours will be next?

How the Zooniverse Works: The Domain Model

We talk a lot in the Zooniverse about research, whether it’s interesting stories from the community, a new paper based upon the combined efforts of the volunteers and the science teams or conferences we might be going to.

One thing we don’t spend much time talking about is the technology solutions we use to build the Zooniverse sites, the lessons we’ve learned as a team building more than twenty-five citizen science projects over the past five years, and where we think the technical challenges still remain in building out the Zooniverse into something bigger and better.

There’s a lot to write here so I’m going to break this into three separate blog posts. The first is going to be entirely about the domain model that we use to describe what we do. When it seems relevant I’ll talk a little more about implementation details of these domain entities in our code too. The second will be about technologies and the infrastructure we run the Zooniverse on top of, and the third will be about making smarter systems.

Why bother with a domain model?

Firstly it’s worth spending a little time talking about why we need a domain model. In my mind the primary reason for having a domain model is that it gives the team, whether it’s the developers, scientists, educators or designers working on a new project a shared vocabulary to talk about the system we’re building together. It means that when I use the term ‘Classification’ everyone in the team understands that I’m talking about the thing we store in the database that represents a single analysis/interaction of a Zooniverse volunteer with a piece of data (such as a picture of a galaxy), which by the way we call a ‘Subject’.

Technology wise the Zooniverse is split into a core set of web services (or Application Programming Interface, API) that serve up data and collect it back (more about that later) and some web applications (the Zooniverse projects) that talk to these services. The domain model we use is almost entirely a description of the internals of the core Zooniverse API called Ouroboros and this is an application that is designed to support all of the Zooniverse projects which means that some of the terms we use might sound overly generic. That’s the point.

The core entities

The domain model is actually pretty simple. We typically think most about the following entities:

User

People are core to the Zooniverse. When talking publicly about the Zooniverse I almost always use the term ‘citizen scientist’ or ‘volunteer’ because it feels like an appropriate term for someone who donates their time to one of our projects. When writing code, however, the shortest descriptive term that makes sense is usually selected, so in our domain model the term we use is User.

A User is exactly what you’d expect: it’s a person, and it has a bunch of information associated with it such as a username, an email address, information about which projects they’ve helped with and a host of other bits and bobs. Crucially though for us, a User is the same regardless of which project they’re working on – that is, Users are pan-Zooniverse. Whether you’re classifying galaxies over at Galaxy Zoo or identifying animals on Snapshot Serengeti, we’re associating your efforts with the same User record each time, which turns out to be useful for a whole bunch of reasons (more later).

Subject

Just as people are core, so are the things that they’re analysing to help us do research. In Old Weather it’s a scanned image of a ship log book, in Planet Hunters it’s a light curve, but regardless of the project, internally we call all of these things Subjects. A Subject is the thing that we present to a User when we want them to do something.

Subjects are one of the core entities that we want to behave differently in our system depending upon their particular flavour. A log book in Old Weather is only viewed three times before being retired whereas an image in Galaxy Zoo is shown more than 30 times before retiring. This means that for each project we have a specific Subject class (e.g. GalaxyZooSubject) that inherits its core functionality from a parent Subject class but then extends the functionality with the custom behaviour we need for a particular project.

Subjects are then stored in our database with a collection of extra information a particular Subject sub-class can use for each different project. For example in Galaxy Zoo we might store some metadata associated with the survey telescope that imaged the galaxy and in Cyclone Center we store information about the date, time and position the image was recorded.
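The inheritance pattern described above can be sketched in a few lines. Ouroboros itself is a Rails application, so this Python version is purely illustrative: the attribute names, the default threshold and the metadata keys are assumptions, and the Galaxy Zoo value of 30 simply stands in for ‘more than 30’.

```python
class Subject:
    """Parent class holding the behaviour every project shares (illustrative)."""

    retire_after = 10  # hypothetical default number of views before retirement

    def __init__(self, subject_id, metadata=None):
        self.subject_id = subject_id
        self.metadata = dict(metadata or {})  # per-project extra information
        self.classification_count = 0

    def record_classification(self):
        self.classification_count += 1

    @property
    def retired(self):
        return self.classification_count >= self.retire_after


class OldWeatherSubject(Subject):
    retire_after = 3   # a log book page is viewed three times


class GalaxyZooSubject(Subject):
    retire_after = 30  # stand-in for 'more than 30' views


page = OldWeatherSubject("log-page-1", {"ship": "example ship"})
galaxy = GalaxyZooSubject("galaxy-1", {"survey": "example survey"})
for _ in range(3):
    page.record_classification()
    galaxy.record_classification()

print(page.retired, galaxy.retired)  # the log page is done, the galaxy is not
```

The subclasses only override what differs (here a single number), while the counting and retirement logic stays in the parent.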

Workflow/Task

These two entities are grouped together as they’re often used to mean broadly the same thing. When a User is presented with a Subject on one of our projects we ask them to do something. This something is called the Task. These Tasks can be grouped together into a Workflow which is essentially just a grouping entity. To be honest we don’t use it very much as most projects just have a single Workflow but in theory it allows us to group a collection of Tasks into a logical unit. In Notes from Nature each step of the transcription (such as ‘What is the location?’) is a separate Task, in Galaxy Zoo, each step of the decision tree is a Task too.

Classification

It’s no accident that I’ve introduced these three entities (User, Subject and Task) first, as a combination of them is what we call a Classification. The Classification is the core unit of human effort produced by the Zooniverse community, as it represents what a person saw and what they said about it. We collect a lot of these – across all of the Zooniverse projects to date we must be getting close to 500 million Classifications recorded.

I’ll talk more about what we store in a Classification in the next post, about technologies. Suffice to say for now that they store a full description of what the User said about the object. In previous versions of the Zooniverse API software we tried to break these records out into smaller units called Annotations but we don’t do that any more – it was an unnecessary step.
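One way to picture a Classification stored as a single record, rather than split into separate smaller units, is the sketch below. The field names and the JSON serialisation are assumptions for illustration only; the actual storage format is left for the next post.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Classification:
    """What one User said about one Subject, kept together as one record."""

    user_id: str
    subject_id: str
    project: str
    answers: List[Dict[str, Any]]  # one entry per Task the User completed

    def to_json(self) -> str:
        """Serialise the whole record in one go."""
        return json.dumps(asdict(self), sort_keys=True)


record = Classification(
    user_id="user-1",
    subject_id="subject-42",
    project="galaxy_zoo",
    answers=[
        {"task": "shape", "value": "spiral"},  # hypothetical task keys
        {"task": "arms", "value": 2},
    ],
)

print(record.to_json())
```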

Group

Sometimes we need to group Subjects together for some higher level function. Perhaps it’s to represent a season’s worth of images in Snapshot Serengeti or a particular cell dye staining in Cell Slider. Whatever the reason for grouping, the entity we use to describe this is ‘Group’.

Grouping records is one of the most useful features Ouroboros offers, but it is also one of the things for which it took us longest to find the right level of abstraction. While a Group can represent an astronomical survey in Galaxy Zoo (such as the Hubble CANDELS survey) or a ship in Old Weather, it isn’t enough for a bunch of Subjects simply to be associated with each other. There’s often a lot of functionality that goes along with a Group, or the Subjects within it, that is custom for each Zooniverse project. Ultimately we’ve solved this in a similar fashion to Subject – by having per-project subclasses of Group (e.g. there is a SerengetiGroup that inherits from Group) that can set custom behaviour as required.
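The per-project subclassing might look something like this sketch. SerengetiGroup and Group are named in the text, but everything inside them here (the `season` field, the `label` method, the list of subject IDs) is invented to show the pattern and is not taken from Ouroboros.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Group:
    """Parent class: a named collection of Subjects."""

    name: str
    subject_ids: List[str] = field(default_factory=list)

    def add(self, subject_id: str) -> None:
        self.subject_ids.append(subject_id)

    def label(self) -> str:
        """Default display label; subclasses may override it."""
        return self.name


@dataclass
class SerengetiGroup(Group):
    """Project-specific Group representing a season of images."""

    season: int = 1  # hypothetical extra field

    def label(self) -> str:
        return f"{self.name} (season {self.season})"


group = SerengetiGroup(name="Snapshot Serengeti", season=4)
group.add("subject-1")
group.add("subject-2")

print(group.label(), len(group.subject_ids))
```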

Project

Ouroboros (the Zooniverse API) hosts a whole bunch of different Zooniverse projects so it’s probably no surprise that we represent the actual citizen science project within our domain model. No prize for guessing the name of this entity – it’s called Project.

A Project is really just the overarching named entity that Subjects, Classifications and Groups are associated with. Project in Ouroboros does some other cool stuff for us though. It’s the Project that knows about its Groups, its current status (such as how many Subjects are complete) and other administrative functions. We also sometimes deliver a slightly different user experience to different Users in what are known as A/B splits – it’s the Project that manages these too.
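Here is a small sketch of a Project that tracks its status and hands out A/B split variants. How Ouroboros actually assigns Users to splits isn’t described here, so the hash-based assignment below is purely an assumption: it is one common way to make sure the same User always lands in the same variant. All field and method names are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Project:
    name: str
    total_subjects: int = 0
    complete_subjects: int = 0
    # Split name -> list of variant names, e.g. {"tutorial": ["a", "b"]}.
    splits: Dict[str, List[str]] = field(default_factory=dict)

    def status(self) -> Dict[str, int]:
        """Summarise how far through its Subjects the Project is."""
        return {"total": self.total_subjects, "complete": self.complete_subjects}

    def variant_for(self, split: str, user_id: str) -> str:
        """Pick a variant deterministically from the User's ID."""
        variants = self.splits[split]
        key = f"{self.name}:{split}:{user_id}".encode("utf-8")
        digest = hashlib.sha256(key).hexdigest()
        return variants[int(digest, 16) % len(variants)]


project = Project(
    name="planet_four",
    total_subjects=100,
    complete_subjects=25,
    splits={"tutorial": ["a", "b"]},
)

print(project.status())
print(project.variant_for("tutorial", "user-1"))
```

Because the variant depends only on the Project, split and User ID, nothing extra needs to be stored to keep a User’s experience consistent between visits.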

So that’s about it. There are a few more entities routinely in discussion in the Zooniverse team such as Favourite (something a User favourites when they’re on one of our projects) but they’re really not core to the main operation of a project.

The domain description we’re using today is informed by everything we’ve learnt over the past five years of building projects. It’s also a consequence of how the Zooniverse has been functioning – we try lots of projects in lots of different research domains and so we need a domain model that’s flexible enough to support something like Notes from Nature, Planet Four and Snapshot Serengeti but not so generic that we can’t build rich user experiences.

We’ve also realised that the vast majority of what’s different about each project is the user experience and classification interface. We’re always likely to want to put significant effort into developing the best user experience possible, and so having an API that abstracts lots of the complexity away and allows us to focus on what’s different about each project is a big win.

Our domain model has also been heavily influenced by the patterns that have emerged working with science teams. In the early years we spent a lot of time abstracting out each step of the User interaction with a Subject into distinct descriptive entities called Annotations. While in theory these were a more ‘complete’ description of what a User did, the science teams rarely used them and almost never in realtime operations. The vast majority of Zooniverse projects to date collect large numbers of Classifications that are write once, read very much later. Realising this has allowed us to worry less about exactly what we’re storing at a given time and focus on storing data structures that are convenient for the scientists to work with.

Overall the Zooniverse domain model has been a big success. When designing for the Zooniverse we really were developing a new system unlike anything else we knew of. Its terminology is pervasive in the collaboration and makes conversations much more focussed and efficient, which can only be a good thing.