New Talk Feature: Automatic Favourites Collection

Today we’ve added a new feature to all the Zooniverse sites that use the new version of Talk. As you’ll know, most of our projects allow you to save ‘favourites’ – a list of things that are cool, interesting or worth keeping to refer back to later. One frequently requested feature is for this collection of favourites in the project to be available in Talk as a collection.

Today we’ve added this feature, and from now on when you favourite something on the main site (e.g. Galaxy Zoo) it will automagically appear in a collection called ‘Favourites’ on Talk. That means you can discuss them, share them or even import them as a data source into tools.zooniverse.org.

Enjoy!

How the Zooniverse Works: Keeping It Personal

This is the third post in a series about how, at a high level, the Zooniverse collection of citizen science projects works. In the first post I described the core domain model that we use – something that turns out to be a crucial part of facilitating conversation between scientists and developers. In the second I covered some of the core technologies that keep things running smoothly. In this and the next few posts I’m going to talk about parts of the Zooniverse that are subtle but important optimisations: things such as how we pick which Subject to show to someone next, how we decide when a Subject is complete, and how we measure the quality of a person’s Classifications.

Much of what I’m about to describe probably isn’t obvious to the casual observer but these are some of the pieces of the Zooniverse technical puzzle that as a team we’re most proud of and have taken many iterations over the past five years to get right. This post is about how we decide what to show to you next.

A Quick Refresher

At its most basic, a Zooniverse citizen science project is simply a website that shows you some data (images, audio or plots), asks you to perform some kind of analysis or interpretation on it and collects back what you said. As I described in my previous post, we’ve abstracted most of the data part of that workflow into an API called Ouroboros, which handles functionality such as login, serving up Subjects and collecting back user-generated Classifications.

Keeping it Fast

The ability for our infrastructure to scale quickly and predictably is a major technical requirement for us. We’ve been fortunate over the past few years to receive a fair bit of attention in the press which can result in tens or hundreds of thousands of people coming to our projects in a very short period of time. When you’re dealing with visitor numbers at that scale ideally you want everyone to have a pleasant experience.

Let’s think a little more about what absolutely has to happen when a person visits for example Galaxy Zoo.

  1. We need to show a login/signup form and send the information provided by the individual back to the server.
  2. Once registration/login is complete we need to serve back some personal information (such as a screen name).
  3. We need to pick some Subjects to show.

For many of the operations that happen in the Zooniverse, a record is written to a database somewhere. When trying to improve the performance of code that involves databases, a key strategy is to avoid querying these databases as much as possible, especially if the queries are complex and the databases are large, as these are often the slowest parts of your application.

What counts as ‘complex’ and ‘big’ in database terms varies based upon the types of records that you are storing, the choices you’ve made about how to index them and the resources you provide to the database server, i.e. how much RAM/CPU you have available.

Keeping it personal

If there’s one place that complex queries are guaranteed to reside in a Zooniverse project codebase then it’s the part where we decide what to show to a particular person next. It’s complex, in need of optimisation and potentially slow for a number of reasons:

  1. When selecting a Subject we need to pick one that a particular User hasn’t seen before.
  2. Often Subjects are in Groups (such as a collection of records in Notes from Nature) and so these queries have to happen within a particular scope.
  3. We often want to prioritise a certain subset of the Subjects.
  4. These queries happen a lot, at least n * the total number of Subjects (where n is the number of repeat classifications each Subject receives).
  5. The list of Subjects we’re selecting from is often large (many millions).

On first inspection, writing code to achieve the requirements above might not seem that hard but if you add in the requirement that we’d like to be able to select Subjects hundreds of times per second for many thousands of Users then it starts to get tricky.

A ‘poor man’s’ version of this might look something like this:

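def self.next_original_for_user(user)
  # Every Subject this user has already classified (this list grows with each classification)
  recents = joins(:classifications).where(:classifications => { :zooniverse_user_id => user.id }).select('subjects.id').all
  if recents.any?
    # First Subject whose id is not in the already-seen list
    where(['id NOT IN (?)', recents]).first
  else
    # Nothing classified yet, so any Subject will do
    first
  end
end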

What we’re doing here is finding all the classifications for a given User and grabbing all of the Subject ids for them. Then we do a SQL select to grab the first record that doesn’t have an id matching one of the ones from existing classifications.

While this code is perfectly valid and would work OK for small-scale datasets there are a number of core issues with it:

  1. It’s pretty much guaranteed to get slower over time – as the number of classifications for a user grows, retrieving the recent classifications is going to become a bigger and bigger query.
  2. It’s slow from the start – NOT IN queries are notoriously slow.
  3. It’s wasteful – every time we grab a new Subject for a User we essentially run the same query to grab the recent classification Subject ids.

These factors combined make for some serious potential performance issues if we want to execute code like this frequently, for large numbers of people and across large datasets, all of which are requirements for the Zooniverse.

A better way

It turns out that there are technologies out there designed to help with this sort of scenario. When we select the new Subject for a user there’s no reason why this operation has to actually happen in the database that the Subjects are stored in. Instead we can keep ‘proxy’ records stored in lists or sets. That means that if we have a big list of ids of things that are available to be classified and a list of ids of things that each user has seen so far, then when we want to select a Subject for someone we just subtract the second list from the first, pick randomly from the difference and pluck that record from the database.

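[Diagram: the list of Subjects a user has already seen (green) subtracted from the list of Subjects that still need classifying (blue)]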

In the diagram above, when Rob (in the middle) comes to one of our sites we subtract from the big list of Subjects that still need classifying (in blue) the list of things that he’s already seen (in green) and then pick randomly from the resulting set. Going by this diagram it looks like we must keep a list of available Subjects for each project together with a separate list of Subjects per project per user so that we can do this subtraction, and that’s exactly the case. The database technology that we use to do this is called Redis and it’s designed for operations just like this.
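
As a rough illustration of the subtraction described above, here is a minimal Ruby sketch that uses plain in-memory sets rather than Redis. The ids, the variable names and the next_subject_id helper are all made up for this example and are not taken from the Zooniverse codebase. Redis offers comparable set commands (SDIFF for the difference, SRANDMEMBER for a random member), but how Ouroboros actually calls them isn’t shown here.

```ruby
require 'set'

# Hypothetical stand-ins for the two id lists; in production these would be Redis sets.
available   = Set.new([101, 102, 103, 104, 105]) # Subjects that still need classifying
seen_by_rob = Set.new([102, 104])                # Subjects Rob has already classified

# Return a random Subject id the user has not seen yet, or nil if none are left.
def next_subject_id(available, seen, rng: Random.new)
  candidates = (available - seen).to_a # set difference: available minus seen
  candidates.sample(random: rng)       # sample returns nil for an empty array
end

id = next_subject_id(available, seen_by_rob)
seen_by_rob << id unless id.nil? # record it so Rob is never shown it again
```

The only database query left is fetching the single Subject record for the chosen id, which is a simple primary-key lookup.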

The result

Maturing our codebase to a point where the queries described above are straightforward has been a lot of work, mostly by this guy. What does it look like to actually require this kind of behaviour in code? Just two lines:

class BatDetectiveSubject < Subject
  include SubjectSelector
  include SubjectSelector::Unique
end

This example is selecting ‘unique’ records for each user. We can also select unique grouped records and prioritised unique records for projects like Planet Hunters. Regardless of the selection ‘flavour’ we’re using, it’s now simple for us to implement selection behaviour. Using Redis to perform these selection operations means that everything is insanely quick, typically returning from Redis in ~30ms even for databases with many tens of thousands of Subjects to be classified.
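
For readers wondering how a couple of include lines can add selection behaviour to a class, here is a toy sketch of the general Ruby mixin pattern. To be clear, this is not the real SubjectSelector code: ToySelector, ToyBatSubject, available_ids, seen_ids and next_id_for are all hypothetical names, and ordinary Ruby sets stand in for Redis.

```ruby
require 'set'

# Hypothetical selector mixin, for illustration only.
module ToySelector
  # When the module is included, add the selection methods at class level.
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Ids of Subjects that are still open for classification.
    def available_ids
      @available_ids ||= Set.new
    end

    # One set of already-served ids per user, created on first access.
    def seen_ids
      @seen_ids ||= Hash.new { |hash, user_id| hash[user_id] = Set.new }
    end

    # Pick a random unseen id for this user and remember that it was served.
    def next_id_for(user_id)
      id = (available_ids - seen_ids[user_id]).to_a.sample
      seen_ids[user_id] << id if id
      id
    end
  end
end

class ToyBatSubject
  include ToySelector
end

ToyBatSubject.available_ids.merge([1, 2, 3])
served = 3.times.map { ToyBatSubject.next_id_for(:rob) }
```

After those three calls Rob has been served every available id exactly once, and a fourth call returns nil because nothing unseen remains.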

Making the routinely hard stuff easier is a continual goal for the Zooniverse development team. That way we can focus maximum effort on the front-end experience and what’s different and hard about each new project we build.

Insights for Informal Science Institutions from Citizen Science Projects

Today we have a guest post from Dr. Ryan Cook, Citizen Science Learning Researcher at the Adler Planetarium. Ryan earned his PhD in socio-cultural anthropology from the University of Chicago. His research interests include ethnographic investigations in Mexico and the US on the intersection of science and religion.

It has been my pleasure to be a researcher for Zooniverse, based at the Adler Planetarium in Chicago, since May 2012. This position has exercised my anthropologist’s curiosity about how people understand and engage with science, taking it in an interesting and very productive new direction that I plan to continue. Thus I am pleased to have a chance to share my work on this blog.

At this writing I am close to completing my portion of a federally-funded project studying Zooniverse volunteers.  I have benefitted greatly in this research from the assistance of your esteemed edu-bloggers, Kelly and Laura, as well as my former Adler colleague Jason Reed and former supervisor Karen Carney. Specifically, we tried to determine whether and how much volunteers’ conceptions of and attitudes towards science changed through their participation in virtual citizen science projects.

This week, I presented some of our findings at the Visitor Studies Association’s annual conference in the town that beer built: Milwaukee, Wisconsin. Outfitted with a snazzy poster and a pile of official Zooniverse postcards and stickers, I argued for the relevance of our studies of Zoo volunteers to museums and science centers that want visitors to their websites to learn about science.

To know what could possibly be learned about science in Zooniverse, Karen, Kelly, and I put together a model of understanding science to guide us. We based the model’s criteria on what scholars who theorize, research, and teach science claimed as central characteristics of the sciences — for instance, relying on sense experience, proceeding methodically or logically, and revising knowledge in light of new evidence.

I then spent several months combing through Zooniverse databases and Google Analytics tables, trying to create a quantitative picture of how volunteers engaged with the tasks, blogs, and forums making up each Zoo. Figure 1 shows an example of the data by which we quantified and compared engagement among Zoos.

Fig. 1 – Old Weather visitor flow, Google Analytics

Following the lead of some preliminary statistics, Kelly and I applied our model to mapping out opportunities for learning about science in a subset of mature Zoos (i.e., those launched before the shift to an all-in-one-page design strategy). The Zoos were chosen in pairs with similar tasks but different levels of volunteer engagement:

[*Since the Supernovae Zoo was retired during the course of our project, it was included in the engagement variables but left out of subsequent research stages.]

Upon matching these engagement statistics to the range and type of learning opportunities we identified, three main patterns emerged:

  1. Opportunities for science learning were unevenly distributed within and across Zoos’ webpages. Talk and the Forums, for instance, allow a wide range of volunteers to engage in rich communication with each other and with moderators, administrators, and the science teams regarding the scientific import of the Zoos.
  2. The parts of the Zoos where volunteers went in the greatest numbers and spent the most time were typically those with the fewest, most limited, and least obvious learning opportunities. High-traffic, low-opportunity pages included the classification, marking, and transcription tasks at the core of each Zoo, as we can see in Figure 2.
Fig. 2 – average time on page by page type

  3. Of the more than 700,000 volunteers to visit these Zoos at the time of our analysis, only a small percentage stayed long enough or reached enough pages to encounter many of the learning opportunities we identified.

Each of these findings makes sense if we bear in mind that Zooniverse did not start out as a platform for volunteers to learn about science, but rather as a tool for scientists to carry out certain kinds of data-intensive research.

I contended in my VSA presentation that this mismatch offered museums and science centers some guidance in how to (re)design their websites to improve the chances that visitors would encounter opportunities to learn what the institutions decided was important. Laura, Kelly, and the Zooniverse team have been testing out ways to design more learning opportunities into the “stickiest” parts of the Zoos.

And as for me, I have followed up this quantitative work with a series of in-depth interviews of heavily involved volunteers. By coding their responses based on an extended version of our science learning model, I aim to find out what they feel they learned from their Zooniverse engagement, which in turn helps us determine how one segment of volunteers engaged with the science learning opportunities we identified. This interview material will appear along with the engagement data and the science learning model in my report, which should be completed by late September. Stay tuned: you will hear about it first!

Chasing Storms Online with the New Cyclone Center

Cyclone Center has recorded almost 250,000 classifications from volunteers around the world since its launch in September 2012. We’ve had lots of feedback on the project and have recently made significant changes that we think will make the experience of classifying storms more rewarding.

Patterns in storm imagery are best recognized by the human eye, so the scientists behind Cyclone Center are asking you to help look through 30 years of images of tropical storms. The end product will be a new global tropical cyclone dataset that could not be realistically obtained in any other fashion. We have already found that the pattern matching by our classifiers is doing better in many cases than a computer algorithm on the same images – this is very exciting!

The biggest change to the site is that we’re now targeting storms for classification. We’ve shifted to a system where the whole community will work on particular storms until they’re finished. This produces useful data very quickly – and means we can classify timely and scientifically useful storms as needed. These targeted storms will change frequently as you help us complete each one. You can check a box on the Cyclone Center home page that will mean you get alerted when new targeted storms appear: we hope to recruit a horde of enthusiastic online storm chasers this way.

Cyclone Centre Homepage

We’ve added much more inline classification guidance – gone are the days of clicking on question marks to get help.  For each step in the process, you will be shown information on how to best answer the question. We think this will give you more confidence in what you are doing and hopefully inspire you to do even more!

We’ve improved the tutorial and we’re providing more feedback as you go along – now instead of waiting for several images to see the “Storm Stats” page, you will immediately go there after your first image. We’ve also upgraded Cyclone Center Talk, which allows for better searching and highlights more of the interesting discussions going on between other citizen scientists.

All-in-all it’s a big change for an awesome project. Log in to Cyclone Center today and give the new version a try. Don’t forget to check the box to start getting alerted to new storms as they appear: this will be incredibly useful for the research behind the site, and means you can be the first to classify data on new storms.

[Visit http://www.cyclonecenter.org and see the blog at http://blog.cyclonecenter.org]

Zoo Tools: A New Way to Analyze, View and Share Data

Since the very first days of Galaxy Zoo, our projects have seen amazing contributions from volunteers who have gone beyond the main classification tasks. Many of these examples have led to scientific publications, including Hanny’s Voorwerp, the ‘green pea’ galaxies, and the circumbinary planet PH1b.

One common thread that runs through the many positive experiences we’ve had with the volunteers is the way in which they’ve interacted more deeply with the data. In Galaxy Zoo, much of this has been enabled by linking to the Sloan SkyServer website, where you can find huge amounts of additional information about galaxies on the site (redshift, spectra, magnitudes, etc). We’ve put in similar links on other projects now, linking to the Kepler database on Planet Hunters, or data on the location and water conditions in Seafloor Explorer.

The second part of this that we think is really important, however, is providing ways in which users can actually use and manipulate this data. Some users have already been very resourceful in developing their own analysis tools for Zooniverse projects, or have done lots of offline work pulling data into Excel, IDL, Python, and lots of other programs (see examples here and here). We want to make using the data easier and available to more of our community, which has led to the development of Zoo Tools (http://tools.zooniverse.org). Zoo Tools is still undergoing some development, but we’d like to start by describing what it can do and what sort of data is available.

An Example

Zoo Tools works in an environment which we call the Dashboard – each Dashboard can be thought of as a separate project that you’re working on. You can create new Dashboards yourself, or work collaboratively with other people on the same Dashboard by sharing the URL.

Zoo Tools Main Page

Create a New Dashboard

Within the Dashboard, there are two main functions: selecting/importing data, and then using tools to analyze the data.

The first step for working with the Dashboard is to select the data you’d like to analyze. At the top left of the screen, there’s a tab named “Data”. If you click on this, you’ll see the different databases that Zoo Tools can query. For Galaxy Zoo, for example, it can query the Zooniverse database itself (galaxies that are currently being classified by the project), or you can analyze other galaxies from the SDSS via their SkyServer website.

Import Data from Zooniverse

Clicking on the “Zooniverse” button, for example, you can select galaxies in one of four ways: a Collection (either your own or someone else’s), looking at your recently classified galaxies, galaxies that you’ve favorited, or specific galaxies via their Zooniverse IDs. Selecting any of these will import them as a dataset, which you can start to look at and analyze. In this example we’ll import 20 recent galaxies.

Import 20 Recents

After importing your dataset, you can use any of the tools in Dashboard (which you can select under “Tools” at the top of the page) on your data. After selecting a tool, you choose the dataset that you’d like to work with from a dropdown menu, and then you can begin using it. For example: if I want to look at the locations of my galaxies on the sky, I can select the “Map” tool. I then select the data source I’d like to plot (in this case, “Zooniverse–1”) and the tool plots the coordinates of each galaxy on a map of the sky. I can select different wavelength options for the background (visible light, infrared, radio, etc), and could potentially use this to analyze whether my galaxies are likely to have more stars nearby based on their position with respect to the Milky Way.

The other really useful part is that the tools can talk to each other, and can pass data back and forth. For example: you could import a collection of galaxies and look at their colour in a scatterplot. You could then select only certain galaxies in that tool, and then plot the positions of those galaxies on the map. This is what we do in the screenshots below:

Making Data Analysis Social

You can also share Dashboards with other people. From the Zoo Tools home page you can access your existing dashboards as well as delete them and share them with others. You can share on Twitter and Facebook or just grab the URL directly. For example, the Dashboard above can be found here – with a few more tools added as a demonstration.

Sharing a Dashboard

This means that once you have a Dashboard set up and ready to use, you can send it to somebody else to use too. Doing this will mean that they see the same tools in the same configuration, but on their own account. They can then either replicate or verify your work – or branch off and use what you were doing as a springboard for something new.

What ‘Tools’ Are There?

Currently, there are eight tools available for both regular Galaxy Zoo and the Galaxy Zoo Quench projects:

  • Histogram: makes bar charts of a single data parameter
  • Scatterplot: plot any two data parameters against each other
  • Map: plot the position of objects on the sky, overplotted on maps of the sky at different wavelengths (radio, visible, X-ray, etc.)
  • Statistics: compute some of the most common statistics on your data (e.g. mean, minimum, maximum).
  • Subject viewer: examine individual objects, including both the image and all the metadata associated with that object
  • Spectra: for galaxies in the SDSS with a spectrum, download and examine the spectrum.
  • Table: List the metadata for all objects in a dataset. You can also use this tool to create new columns from the data that exists – for example, take the difference between magnitudes to define the color of a galaxy.
  • Color-magnitude: look at how the color and magnitude of galaxies compare to the total population of Galaxy Zoo. A really nice way of visualizing and analyzing how unusual a particular galaxy might be.
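
To make the Table and Statistics descriptions above a little more concrete, here is a small Ruby sketch of the same kind of calculation done by hand. It is only an illustration, not how Zoo Tools is implemented: the three galaxies and their magnitudes are made up, and the u and r column names are assumed for the example.

```ruby
# Made-up magnitudes for illustration only; this is not real Galaxy Zoo data.
galaxies = [
  { id: 'A', u: 19.2, r: 17.1 },
  { id: 'B', u: 18.4, r: 17.3 },
  { id: 'C', u: 20.0, r: 17.5 }
]

# New column: color defined as the difference between two magnitudes.
with_color = galaxies.map { |g| g.merge(color: (g[:u] - g[:r]).round(2)) }

# Common summary statistics on the new column.
colors = with_color.map { |g| g[:color] }
stats = {
  mean: (colors.sum / colors.size).round(2),
  min: colors.min,
  max: colors.max
}
```

In the Dashboard the same steps would be done by adding a column in the Table tool and then pointing the Statistics tool at it.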

We have one tool up and running for Space Warps called Space Warp Viewer. This lets users adjust the color and scale parameters of an image to examine potential gravitational lenses in more detail.

Snapshot Serengeti Dashboard

Finally, Snapshot Serengeti has several of the same tools that Galaxy Zoo does, including Statistics, Subject Viewer, Table, and Histogram (aka Bar Graph). There’s also Image Gallery, where you can examine the still images from your datasets, and we’re working on an Image Player. There are a few very cool and advanced tools we started developing last week – they’re not yet deployed, but we’re really excited to let you follow the activity over many seasons or by focusing on particular cameras. Stay tuned. You can see an example Serengeti Dashboard, showing the distribution of cheetahs, here (it’s also shown in the screenshot above).

We hope that Zoo Tools will be an important part of all Zooniverse projects in the future, and we’re looking forward to you trying them out. More to come soon!

Galaxy Zoo Quench: A New Kind of Citizen Science

A new ‘mini’ project went live yesterday called Galaxy Zoo Quench. This project involves new images of 6,004 galaxies drawn from the original Galaxy Zoo. As usual, everyone is invited to come and classify these galaxies, but this project has a twist that makes it special! We hope to take citizen science to the next level by providing the opportunity to take part in the entire scientific process – everything from classifying galaxies to analyzing results to collaborating with astronomers to writing a scientific article!

Galaxy Zoo Quench

Galaxy Zoo Quench is examining a sample of galaxies that have recently and abruptly quenched their star formation. These galaxies are aptly named Post-Quenched Galaxies. They provide an ideal laboratory for studying galaxy evolution, so that’s exactly what we want to do, with the help of the Zooniverse community. We hope you’ll join us as we try out a new kind of citizen science project. Visit http://quench.galaxyzoo.org to learn more.

The entire process of classifying, analyzing, discussing, and writing the article will take place over an ~8-12 week period. After classifying the galaxies, Quench volunteers can use tools.zooniverse.org to plot the data and look for trends. We also have a special Quench Talk forum to discuss and identify key results to include in the paper – above you can see examples of some of the cool objects people have already found and discussed.

Have questions about the project? Leave a comment here or ask us on Twitter (@galaxyzoo) or on the Galaxy Zoo Facebook page. In case you’re worried: the regular Galaxy Zoo will continue as normal.

Now go visit http://quench.galaxyzoo.org and start classifying!

Zooniverse Live Chat

A small team of scientists and developers from across the Zooniverse are gathered at Adler Planetarium in Chicago this week to pitch and work on ideas for advanced tools for some of your favorite Zooniverse projects. Our goal is to come up with some tools and experiences that will help the Zooniverse volunteers explore the rich datasets behind the projects in new and different ways, beyond the scope of the main classification interfaces. As part of the three days of hacking, there will be a live chat with representatives from Galaxy Zoo, Planet Hunters, Snapshot Serengeti, and Planet Four (as well as a special guest or two) tomorrow, Thursday July 11th, at 2 pm CDT (3 pm EDT, 8 pm BST). We’ll also give you an inside peek into the US Zooniverse Headquarters on the floor of the Adler Planetarium where much of the coding and development behind the Zooniverse happens.

You can find the video feed here on the blog. If you can’t watch live, the video will be recorded and available to view later. If you have questions for the science teams you can post them in the comments or tweet @the_zooniverse.

 

ZooTeach and Resources for the Classroom

Have you got your students whirling with excitement over Cyclone Center? Are they positively passionate about Planet Four?

Here in Zooniverse HQ, we like nothing better than hearing from teachers and educators about how you’re using Zooniverse projects in your classrooms and other learning environments.  Over the last year we’ve traveled to several conferences and meetings and heard about all kinds of innovative ways that teachers have put Zooniverse projects to use with their students.  We need you to share your amazing ideas!

ZooTeach is a companion website to Zooniverse containing lessons and resources aimed at helping teachers bring Zooniverse projects into their classrooms. Anybody can upload and share activities; you only need a Zooniverse login to contribute. This fall we’ll have several new lessons and activities created as part of the Zooniverse Teacher Ambassadors Workshop to share with you. We hope that you’ll consider sharing some of the ways that you’ve found to bring citizen science into your classroom, or check out ideas from other educators.

Welcome to the Worm Watch Lab

Today we launch a new Zooniverse project in association with the Medical Research Council (MRC) and the Medical Research Foundation: Worm Watch Lab.

We need the public’s help in observing the behaviour of tiny nematode worms. When you classify on wormwatchlab.org you’re shown a video of a worm wriggling around. The aim of the game is to watch and wait for the worm to lay eggs, and to hit the ‘z’ key when they do. It’s very simple and strangely addictive. By watching these worms lay eggs, you’re helping to collect valuable data about genetics that will assist medical research.

Worm Watch Lab

The MRC have built tracking microscopes to record these videos of crawling worms. A USB microscope is mounted on a motorised stage connected to a computer. When the worm moves, the computer analyses the changing image and commands the stage to move to re-centre the worm in the field of view. Because the trackers work without supervision, they can run eight of them in parallel to collect a lot of video! It’s these movies that we need the public to help classify.

By watching movies of the nematode worms, we can understand how the brain works and how genes affect behaviour. The idea is that if a gene is involved in a visible behaviour, then mutations that break that gene might lead to detectable behavioural changes. The type of change gives us a hint about what the affected gene might be doing. Although it is small and has far fewer cells than we do, the worm used in these studies (called C. elegans) has almost as many genes as we do! We share a common ancestor with these worms, so many of their genes are closely related to human genes. This presents us with the opportunity to study the function of genes that are important for human brain function in an animal that is easier to handle, great for microscopy and genetics, and has a generation time of only a few days. It’s all quite amazing!

To get started visit www.wormwatchlab.org and follow the tutorial. You can also find Worm Watch Lab on Facebook and on Twitter.