Galaxy Zoo is Open Source

April 16, 2013 arfon 4 Comments

It’s always a good feeling to be making a codebase open and today it’s time to push the latest version of Galaxy Zoo into the open. As I talked about in my blog post a couple of months ago, making open source code the default for Zooniverse is good for everyone involved with the project.

One significant benefit of making code open is that from here on out it’s going to be much easier to have Zooniverse projects translated into your favourite language. When we build a new project we typically extract the content into something called a localisation file (or localization if you prefer your en_US) which is basically just a plain text file that our application uses. You can view our (US) English translation file here and it looks a little like this:
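To give a flavour of the idea, here is a hedged sketch in Python. It is not the real Galaxy Zoo file: the keys, the strings and the `lookup` helper are all invented for illustration. A localisation file is essentially a nested set of keys and strings that the application looks up, falling back to English when a translation is missing:

```python
# Hypothetical excerpt of an English localisation file, written as a nested
# Python dict. Every key and string here is invented for illustration.
EN_US = {
    "navigation": {"home": "Home", "classify": "Classify"},
    "classify": {"question": "Is the galaxy smooth, or does it have features?"},
}


def lookup(strings, dotted_key, fallback=EN_US):
    """Resolve a dotted key such as 'navigation.home'.

    Tries the translated strings first, then the English fallback, and
    finally returns the key itself so a missing string is easy to spot.
    """
    for table in (strings, fallback):
        node = table
        for part in dotted_key.split("."):
            if not isinstance(node, dict) or part not in node:
                node = None
                break
            node = node[part]
        # An empty string counts as "not translated yet".
        if isinstance(node, str) and node:
            return node
    return dotted_key


# A partial (hypothetical) French translation: only one string is done so far.
FR = {"navigation": {"home": "Accueil", "classify": ""}}

print(lookup(FR, "navigation.home"))      # translated string
print(lookup(FR, "navigation.classify"))  # falls back to English
print(lookup(FR, "no.such.key"))          # falls back to the key itself
```

The point of the fallback is that a half-finished translation still gives a usable site, which matters when translations arrive a few strings at a time.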

So how do I translate Galaxy Zoo?

I’m glad you asked… It turns out there’s a feature built into the code-hosting platform we’re using (called GitHub) which allows you to basically make your own copy of the Galaxy Zoo codebase. It’s called ‘forking’ and you can read much more about it here but all you need to do to contribute is fork the Galaxy Zoo code repository, add in your new translation file (there’s a handy script that will generate a template file based on the English version), translate the English values into the new language and send the changes back up to GitHub.
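As a rough illustration of what generating a template from the English version involves, here is a minimal Python sketch under stated assumptions. It is not the script in the repository; `make_template` and `untranslated` are hypothetical names, and the English strings are made up. The idea is simply to copy the key structure, blank the values, and keep track of what still needs translating:

```python
import json


def make_template(english):
    """Copy the key structure of the English strings, blanking every value."""
    if isinstance(english, dict):
        return {key: make_template(value) for key, value in english.items()}
    return ""


def untranslated(strings, prefix=""):
    """List the dotted keys whose value is still empty."""
    missing = []
    for key, value in strings.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            missing.extend(untranslated(value, path + "."))
        elif not value:
            missing.append(path)
    return missing


# Made-up English strings standing in for the real translation file.
english = {
    "navigation": {"home": "Home", "classify": "Classify"},
    "classify": {"question": "Is the galaxy smooth, or does it have features?"},
}

template = make_template(english)
template["navigation"]["home"] = "Inicio"  # a translator fills in one value

print(json.dumps(template, indent=2, ensure_ascii=False))
print("Still to translate:", untranslated(template))
```

Keeping the keys identical to the English file is what lets a reviewer compare a pull request against the original line by line.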

Once you’re happy with the new translation and you’d like us to try it out you can send us a ‘pull request’ (details here). We’ll review the changes and, if everything looks good, pull the new translation into the main Galaxy Zoo codebase. You can see an example of a pull request from Robert Simpson that’s been merged in here.

So what next?

This method of translating projects is pretty new for us and so we’re still finding our way a little here. As a bunch of developers it feels great to be using the awesome collaborative toolset that the GitHub platform offers to open up code and translations to you all.

Cheers

Arfon

Education

NSTA or Bust

April 3, 2013 The Zooniverse 3 Comments

At this time next week we’ll be rubbing elbows with science teachers and informal educators at the National Science Teachers Association’s annual meeting in San Antonio, Texas. This year’s conference theme is Next Generation Science: Learning, Literacy, and Living. It promises to be four days packed with excitement and science fun (and delicious TexMex food).

Zooniverse education will be out in full force! Laura and I will be at the Zooniverse booth in the exhibition hall (Booth #1444) throughout the conference. We’re also facilitating a workshop entitled Citizen Science Investigations in the Classroom on Saturday April 13th from 12:30 – 1:30pm. If you happen to be attending NSTA, we hope that you’ll stop by and say hello (and score one of our snazzy new Zooniverse stickers). If you’re not attending but want to follow our wacky and sciencey adventures, we’ll be tweeting (@zooteach) throughout the conference.

One of these sweet stickers could be yours!

News

Optimizing for interest : Why people aren’t machines

March 29, 2013 chrislintott 5 Comments

One of the joys of working in the Zooniverse is the sheer variety of people who are interested in our work, and I spent a happy couple of days toward the end of last year at a symposium about Discovery Informatics – alongside a bunch of AI researchers and their friends who are trying to automate the process of doing science. I don’t think they’d mind me saying that we’re a long, long way from achieving that, but it was a good chance to muse on some of the connections between the work done by volunteers here and by our colleagues who think about machine learning.

In the past we’ve shown that machines can learn from us, but we’ve also talked about the need for a system that can combine the best of human and machine.

These two things are not the same — Robot and human (Thanks to Flickr user NineInchNachosXI)

I’m still convinced that that will especially be needed as the size of datasets produced by scientific surveys continues to increase at a frightening pace. The essential idea is that only the proportion of the data which really needs human attention need be passed to human classifiers; an idea that starts off as a no-brainer (wouldn’t it be nice if we could decide in advance which proportion of Galaxy Zoo systems are too faint or fuzzy for sensible decisions to be made?) and then becomes interestingly complex.

This is particularly true when you start thinking of volunteers not as a crowd, but as a set of individuals. We know from looking at the data from past projects that people’s talents are varied – the people who are good at identifying spiral arms, for example, may not be the same people who can spot the faintest signs of a merger. So if we want to be most efficient, what we should be aiming for is passing each and every person the image that they’d be best at classifying.

That in turn is easy to say, but difficult to deliver in practice. Since the days of the original Galaxy Zoo we’ve tended to shun anything that resembles a test before a volunteer is allowed to get going, and in any case a test which thoroughly examined someone’s ability in every aspect of the task (how do they do on bright galaxies? on faint ones? on distant spirals? on nearby ellipticals? on blue galaxies? what about mergers?) wouldn’t be much fun.

One solution is to use the information we already have; after all, every time someone provides a classification we learn something not only about the thing they’re classifying but also about them. This isn’t a new idea – in astronomy, I think it’s essentially the same as the personal equation used by stellar observers to combine results from different people – but things have got more sophisticated recently.

As I’ve mentioned before, a team from the robotics group in the department of engineering here in Oxford took a look at the classifications supplied by volunteers in the Galaxy Zoo: Supernova project and showed that by classifying the classifiers we could make better classifications. During the Discovery Informatics conference I had a quick conversation with Tamsyn Waterhouse, a researcher from Google interested in similar problems, and I was able to share results from Galaxy Zoo 2 with her*.

We didn’t get time for a long chat, but I was delighted to hear that work on Galaxy Zoo had made it into a paper Tamsyn presented at a different conference. (You can read her paper here, or in Google’s open access repository here.) Her work, which is much wider than our project, develops a method which considers the value of each classification based (roughly) on the amount of information it provides, and then tries to seek the shortest route to a decision. And it works – she’s able to show that by applying these principles we would have been done with Galaxy Zoo 2 faster than we were – in other words, we wasted some people’s time by not being as efficient as we could be.
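To make ‘the shortest route to a decision’ a little more concrete, here is a deliberately simplified Python sketch. It is not the method from Tamsyn’s paper: it assumes a yes/no question, a known and fixed accuracy for each volunteer (strictly between 0 and 1), and it simply stops asking once the running probability is decisive. The function names, the 0.95 threshold and the accuracies are all illustrative assumptions.

```python
def update(prior, vote, accuracy):
    """Bayes update of P(answer is yes) after one vote.

    `accuracy` is the assumed probability that this volunteer is right.
    """
    like_yes = accuracy if vote else 1.0 - accuracy
    like_no = 1.0 - accuracy if vote else accuracy
    evidence = prior * like_yes + (1.0 - prior) * like_no
    return prior * like_yes / evidence


def classify_until_confident(votes, prior=0.5, threshold=0.95):
    """Consume (vote, accuracy) pairs until the posterior is decisive.

    Returns (posterior, number_of_votes_used).
    """
    posterior = prior
    used = 0
    for vote, accuracy in votes:
        posterior = update(posterior, vote, accuracy)
        used += 1
        if posterior >= threshold or posterior <= 1.0 - threshold:
            break
    return posterior, used


# Two reliable volunteers agreeing is enough to cross the threshold...
print(classify_until_confident([(True, 0.9)] * 5))
# ...whereas three less reliable ones agreeing is not.
print(classify_until_confident([(True, 0.6)] * 3))
```

Even in this toy version the trade-off is visible: a vote from someone who is usually right carries more information, so fewer votes are needed before the image can be retired.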

A reminder of what Galaxy Zoo 2 looked like!

That doesn’t sound good – not wasting people’s time is one of the fundamental promises we make here at the Zooniverse (it’s why we spend a lot of time selecting projects that genuinely need human classifications). Zoo 2 was a long time in the past, but knowing what we know now should we be implementing a suitable algorithm for all projects from here on in?

Probably not. There are some fun technical problems to solve before we could do that anyway, but even if we could, I don’t think we should. The current state of the art of such work misses, I think, a couple of important factors which distinguish citizen science projects from the other examples considered, particularly in Tamsyn’s paper. To state the obvious: volunteer classifiers are different from machines. They get bored. They get inspired. And they make a conscious or an unconscious decision to stay for another classification or to go back to the rest of the internet.

The interest a volunteer will have in a project will change as they move (or are moved by the software) from image to image and from task to task, and in a complicated way. Imagine getting a galaxy that’s difficult to classify; on a good day you might be inspired by the challenge and motivated to keep going, on a bad one you might just be annoyed and more likely to leave. We all learn as we go, too, and so our responses to particular images change over time. The challenge is to incorporate these factors into whatever algorithm we’re applying so that we can maximise not only efficiency, but interest. We might want to show the bright, beautiful galaxies to everyone, for example. Or start simple with easy examples and then expand the range of galaxies that are seen to make the task more difficult. Or allow people a choice about what they see next. Or a million different things.

Whatever we do, I’m convinced we will need to do something; datasets are getting larger and we’re already encountering projects where the idea of getting through all the data in our present form is a distant dream. Over the next few years, we’ll be developing the Zooniverse infrastructure to make this sort of experimentation easier, looking at theory with the help of researchers like Tamsyn to see what happens when you make the algorithms more complicated, and talking to our volunteers to find out what they want from these more complicated projects – all in our twin causes of doing as much science as possible, while providing a little inspiration along the way.

* – Just to be clear, in both cases all these researchers got was a table of classifications without any way of identifying individual volunteers except by a number.

Education

Teachers Wanted For Planet Hunters Educators Guide Piloting

March 19, 2013 The Zooniverse 2 Comments

We need you and your students to help us craft a top-notch resource for teachers! Educators at the Adler Planetarium have been hard at work creating an educators guide aimed at helping teachers bring the thrilling hunt for exoplanets into their classroom. The first draft is nearly ready and we want to know what you think.

We’re looking for US-based 6th-8th grade teachers to try one or more lessons from the Planet Hunters Educators Guide this spring with their students. Each lesson can be taught as a stand-alone activity and takes approximately 45 – 60 minutes of class time. We want to know what works, what needs to change, and any other feedback you can provide.

Besides, one of your students may just discover a new planet! You can’t get that in gym class (although physical fitness is very important).

If you’re interested, please email the following information to education@zooniverse.org.

Name:

State:

Grades & Subjects Taught:

Number of Class Sections (if applicable):

News

Project Workshop Winners

March 13, 2013 chrislintott 7 Comments

We were delighted by the response to our call for volunteers to attend our project workshop and we’re pleased to announce that our two winners are Katy Maloney and Janet Bain. Katy is a Planet Hunter from Montreal (you can see her in this recent video about online communities). Janet is well known to those from Old Weather where she serves as moderator of the very active forum.

As Jules explained in her post, these workshops are a chance for the strange mix of people behind the scenes of the Zooniverse – developers, educators and scientists – to get together to discuss what works and what doesn’t, and to plan the year ahead. We think it’s very important to have volunteers there – and we hope that Katy and Janet (along with Jules, who we’ve invited back) will keep you all informed and involved in the discussions.

There were a few comments in the discussion under that last post from people – particularly locals – who would clearly have dearly loved to come. Unfortunately, it wouldn’t be possible to run the workshop as a public event; both because of the format (which features spontaneously arranged small group discussions) and also to allow everyone to speak freely about often quite difficult issues. What I do find heartening is that we’ve grown a community who want to help us plan and develop for the future, and we need to take that seriously.

I’ll write more over the next couple of weeks and months about what we’re going to do to be more open, but for now for those who really wanted to come we’ll work hard to organise some truly public events. We have a meeting in Oxford on the 22nd June which I hope British Zooites will be able to attend, and we’ll arrange a similar event in Chicago as soon as possible. We’ll also try hard to webcast these events so all can attend.

Chris

News

ARCHIVE: Why SciStarter.com is Bad For Citizen Science

March 12, 2013 arfon 10 Comments

Since this post was written, SciStarter have changed their policies and now provide direct links to projects. This is a good thing, and I’m happy to acknowledge it here.

Chris – April 2017

Preface: I’d like to begin by saying that I’ve met Darlene Cavalier at conferences in the past and I’m a big supporter of her efforts. Darlene truly is a ‘cheerleader’ for citizen science, her enthusiasm is infectious and the citizen science domain is clearly a better place with her. I’m writing here about what I consider the bad practice of SciStarter.com and Science For Citizens LLC, their parent organisation. I have no idea whether the issues highlighted here are because of decisions that she has made.

There was a time not so long ago when you needed a new account for pretty much everything you tried out on the web. Want to upload photos to Flickr? Then sign up for a Yahoo! ID. Want a blog? Then give WordPress or Tumblr your details. Feeling social? Then Facebook, Twitter or MySpace would pretty much want the same information. These days there are a number of solutions that allow you to log in to web-based services using things like your Facebook, Twitter or Google account. Under the hood these solutions typically rely on a couple of protocols such as OAuth and OpenID and often still request your email address when you sign in but the days of hundreds of accounts each with their own password to remember are coming to a close.

In many ways a request by an organisation for your email address when signing up for a new service is completely reasonable. In exchange for handing over your email address and a few personal details these tools were often available for free – both parties win. There is of course the discussion around who or what is the product when you use these free services but let’s not go into that here.

Since launching the original Galaxy Zoo back in 2007 we’ve encouraged our volunteer community to register for an account with us, although for the vast majority of our projects (and all of our recent ones) this login/signup is an optional step. For the Zooniverse there are two main reasons for asking you to create an account:

1) When we publish a paper as a result of your efforts we feel extremely strongly about crediting you for your efforts. Experience has taught us that attempting to publish a paper with 170,000 authors on is somewhat frowned upon by the journals but if you take a look at any of the Zooniverse publications you’ll find a link to an authors page such as here, here and here. We can only credit you if you share some personal information with us when you sign up.

2) For our research methods to work well, identifying an individual ‘classifier’ is pretty important. You can read more about this here (the original Galaxy Zoo paper) or here but in order to produce the best results possible we spend lots of time working out who is ‘best’ at a particular task and weighting their contributions accordingly. Being able to reliably identify an individual throughout the lifetime of a project (and even between projects) is most simple when someone has logged in.
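To see why identifying an individual classifier matters, here is a simplified Python sketch of weighting contributions. It is not the scheme used in the papers linked above: it assumes a single pass in which each person is weighted by how often they agree with the plain majority, and the function names, user ids and answers are all invented for illustration.

```python
from collections import Counter, defaultdict


def consensus(classifications, weights=None):
    """Return {subject: winning answer} from (user, subject, answer) rows.

    Without `weights` every vote counts as 1; with them, each vote counts
    as that user's weight.
    """
    totals = defaultdict(Counter)
    for user, subject, answer in classifications:
        totals[subject][answer] += 1.0 if weights is None else weights[user]
    return {s: c.most_common(1)[0][0] for s, c in totals.items()}


def user_weights(classifications, reference):
    """Weight each user by the fraction of their answers matching `reference`."""
    agree, seen = Counter(), Counter()
    for user, subject, answer in classifications:
        seen[user] += 1
        agree[user] += answer == reference[subject]
    return {u: agree[u] / seen[u] for u in seen}


# Invented data: a, b and e usually agree; c and d are usually outvoted.
rows = []
for subject, good, bad in [("g1", "spiral", "elliptical"),
                           ("g2", "elliptical", "spiral"),
                           ("g3", "spiral", "elliptical")]:
    rows += [(u, subject, good) for u in ("a", "b", "e")]
    rows += [(u, subject, bad) for u in ("c", "d")]
# On g4 only three people vote, and the usually-outvoted pair form the majority.
rows += [("a", "g4", "merger"), ("c", "g4", "spiral"), ("d", "g4", "spiral")]

majority = consensus(rows)
weights = user_weights(rows, majority)
print(majority["g4"])                 # plain majority answer
print(consensus(rows, weights)["g4"]) # answer after weighting
```

None of this is possible if the rows can’t be tied to a consistent (if anonymous) user id, which is the practical reason a login helps.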

Over the past year or so I’ve become increasingly concerned by the behaviour of SciStarter.com – a website that indexes citizen science projects from across the web. The site does a pretty good job of cataloging citizen science projects you can contribute to – when you visit the site and search for example for ‘bats’ the Zooniverse project Bat Detective is listed in the results. Selecting the result takes you to a brief summary of Bat Detective and offers you a link to ‘get started now!’ and this is where it goes wrong: Rather than taking you straight to the Bat Detective site you have to be ‘logged in’. Sign up for what exactly? Am I signing up to take part in Bat Detective? No. You’re actually just signing up for an account with SciStarter.com just so you can get a link to a project that SciStarter.com has nothing to do with.

Additionally, in a recent ‘top 10’ blog post of most successful citizen science projects of 2012, Bat Detective was highlighted. Did the link in this article send you straight to the Bat Detective website? Sadly not, it of course links to SciStarter’s catalogue page about Bat Detective which requires account registration before you can access the URL.

To me this doesn’t seem right and in many ways this is just exploiting people’s lack of experience and understanding of the web. There’s a reason that Facebook.com is consistently among the most Googled terms – many people just don’t quite understand how the web works and I think SciStarter.com are exploiting this. Conversely, for those who are a little more web savvy these tactics must seem very clumsy. Perhaps more importantly though, it’s widely recognised that signup forms are a barrier to entry for many people and so by having people jump through this hoop SciStarter.com are actually holding potential citizen scientists back.

I don’t believe it’s in anyone’s interest other than SciStarter’s to require you to sign up to follow a link through to a project. By mandating this step they are building an index of individuals interested in other people’s projects when they don’t have any of their own and they’re risking confusing new community volunteers about what they have and haven’t signed up for. All of this is made worse by the fact that SciStarter.com is a division of Science for Citizens LLC – a commercial company.

So my challenge to SciStarter.com is this: If you’re so committed to citizen science then why put up this artificial barrier to contribution? Crawling the internet for people’s emails is one of the less tasteful aspects of the web and one I’d hoped we’d seen the end of. So how about it SciStarter?

Education

Zooniverse Workshops

March 1, 2013 The Zooniverse 1 Comment

It’s very exciting, and a little bit scary, but we’re going to begin offering online educator workshops! The first two will occur on the 30th of March and the 6th of April and we’re looking for volunteers who would like to participate.

The workshops aim to introduce educators to the range of citizen science projects available on zooniverse.org, our new website ZooTeach and the new classroom interactive tool for Galaxy Zoo, the Navigator. They are aimed at teachers, but anybody who is interested in using citizen science in an informal education setting, after school club, scout group or maybe with their home schooled son or daughter, is more than welcome to join us.

We will be using Google+ Hangouts (http://www.google.com/+/learnmore/hangouts/), which can be freely accessed by anyone with a Google email account. The time of the workshops has yet to be decided as we were unsure what would work best with different time zones, but each will last for approximately 2 hours.

If you’re interested in participating please email education@zooniverse.org, including the date that suits you best and your location and we will do our best to set up a time that works for as many people as possible! Each workshop will have a maximum of 10 participants, but we may decide to do several smaller groups in different countries.

I hope to meet some of you online in the not too distant future!

News

Calling all Zooites! Your chance to attend the second Zooniverse Project Workshop in Chicago!

February 27, 2013 juleswilkinson 24 Comments

Meg Schwamb giving the Planethunters presentation in 2012
Photo © Julia Wilkinson

It’s almost a year since I attended the first ever Zooniverse Project Workshop in my role as an advisory board member. In April the second Zooniverse workshop will convene to discuss yet more exciting new projects. I’ll be there and hopefully so will Alice Sheppard (if her exam timetable permits!). This year, however, there is funding available for one more volunteer to attend. This is a responsible role for a dedicated and enthusiastic Zooite and that could be you!

This is a fantastic opportunity to meet the science teams behind projects old and new and to find out just what is involved in getting a project up and running. You will attend some great presentations and have the chance to contribute to some fascinating discussions and workshops. Last year we covered things such as design, how to get the best science out of a project and how to create the best user experience. You need to be prepared to take part in discussions and to talk about your experiences as a Zooniverse volunteer. The more you put in the more rewarding the conference will be and you’ll find that your contribution will be hugely respected and valued. Volunteers can make or break a project and I was certainly made to feel that my input was extremely important.

There is only one place available, however, so to help the team decide who gets to go please tell us in no more than 250 words a little about yourself, why you think you should go and what you can contribute to the discussions as a volunteer. Please add your full name and preferred e-mail address and send this to team@zooniverse.org with the subject line CHICAGO PLEASE. The closing date is 12 noon GMT on Thursday 7 March 2013. The Zooniverse team will choose the successful entry.

The conference will be held over two days at the Adler Planetarium, Chicago on 29 and 30 April 2013. Flight and hotel expenses will be reimbursed in full.

This really is a fantastic opportunity to contribute to citizen science and the future of the Zooniverse – don’t miss out!

For a detailed account of last year’s event have a look at the notes on my blog.

Education

Under the Sea and On the Moon with Third Graders

February 25, 2013 The Zooniverse Leave a comment

We have had a great response from teachers in the Chicago area to our offer of making classroom visits. Yesterday marked our first visit to West Ridge Elementary in Chicago’s Rogers Park neighborhood. After consulting with Ms. Tschaen, the third grade science teacher, we decided to present Seafloor Explorer and Moon Zoo to the students. Apologies in advance for the lack of pictures, we were having way too much fun to think about proper documentation.

One of the challenges while preparing for this school visit was figuring out a quick and easy way to explain crowdsourcing to third graders. I scoured the web for a nifty interactive or video, but in the end decided a low-tech approach, a story, was the best solution. I told the students that when I was a kid my friends and I loved to play soccer. One day we were planning to play but my Mom told me I wasn’t allowed to until my room was clean. When I was a kid, the floor of my room generally resembled a soup of toys, clothes, books, and papers. Cleaning it was no small task and usually entailed the better part of a whole day. I asked the students how they thought my friends and I solved the problem. Every group offered the solution that we could work together to clean my room and then have time to play soccer together. Voila! The principle of crowdsourcing quickly and easily explained. Granted, I settled on a somewhat simplified definition, that crowdsourcing is getting a bunch of people to help solve a bigger problem, but it did the trick and the students “got it”.

Next we were ready to set the stage with how Zooniverse projects utilize the efforts of many to solve problems involving large datasets. With the first two classes, we decided to test a newly developed Seafloor Explorer classroom activity. For time’s sake we modified the activity by focusing on species identification and left out the ground cover identification component. After a 10 minute group discussion of Seafloor Explorer’s science goals and how to identify the different animals we were off and running. Just like with the example of cleaning my room and soccer, the students called out that we needed more people to identify the 30,000,000+ images comprising this project’s dataset. Success! They were challenged to work together as a class to beat the time it took me to identify species on 40 different cards. Working together, each class took about 1/3 of the time it took me to do it alone. Double success!

Laura engaged the third class in lunar adventures using Moon Zoo. Students learned a little bit about the history of moon exploration. Next they discussed craters and the different ways we can find out information about our nearest celestial neighbor. After a brief introduction to the Lunar Reconnaissance Orbiter, they divided into groups to explore individual portions of the moon. Students worked together to mark any craters larger than their thumbprint on their section of the moon. They tallied the total number of craters on their group’s individual moon section and compared them to the other groups’ moon sections. Finally students identified potential sites for a lunar lander to touch down.

So, what did we learn from our adventures with third graders? I’ve long suspected that, while students certainly have the ability to participate in most any Zooniverse project, it sometimes helps to introduce the project “offline”. This may help students feel more secure when they begin participating on a project’s website. Many teachers I’ve spoken with point out that students don’t always feel empowered in their practice of science. By frontloading students with a little bit of a project’s background content and walking through the classification task together, students can easily see that they are more than capable of making an important contribution to current scientific research. Working as a group also fosters a sense of community that we, as a class, are working together to help scientists make important discoveries and maybe even making some important discoveries ourselves. I’m sure that there will be many lessons to learn over our remaining visits to Chicago-area classrooms.

News

Making the Zooniverse Open Source

February 18, 2013 arfon 2 Comments

We’re pleased to announce that the time has come to start making the Zooniverse open source. From today, you’ll be able to see several of our current projects on GitHub (at https://github.com/zooniverse) and will be able to fork them and contribute to them.

Taking the Zooniverse open source is something we’ve been thinking about for a long time. As the field of citizen science expands into ever broader domains the number of tools available to people to start their own projects is still low. Since the launch of Galaxy Zoo 2 we’ve been building tools that allow for code reuse across a number of projects and while the majority(1) of our software has never been ‘officially’ open, behind the scenes we’ve been sharing with pretty much anyone who asked, often talking them through the thought process that led us to design our software in a particular way.

Because of our natural inclination to share with those who approached us, we’ve never really made publishing our code a priority. As with most closed source projects there are also a number of pretty boring (but sometimes important) reasons for not publishing – we worried about how usable the code we’d written was to people we didn’t work closely with – as a small team we favour clean code and conversation with other developers over heavy documentation. Some sensitive information around our production environment inevitably slipped into the codebases which meant lots of work to clean up and security audit our tools. Some of these reasons still hold for legacy applications, but each project we start often comes with a new Git repo and an opportunity to develop in a different way.

What does this mean?

Well, from today you’ll start to see a number of applications appear on the Zooniverse GitHub site. We’re starting with a collection of our most recent projects: Snapshot Serengeti, Bat Detective, Cyclone Center and Seafloor Explorer.

It’s important to say here that we’re not expecting a community of developers to jump in and help us develop new projects (although that would be pretty cool), but if there’s a typo on our site or a really annoying bug that you know exactly how to fix then fork the repo and send us a pull request and we’ll see what we can do. Significantly for our localisation support (translating sites into multiple languages) we’re proposing that new translations should be submitted in exactly this way (2). There are a huge number of very talented people in the Zooniverse community who until today had no way of contributing to the project other than to help analyse data. That changes today.

We’re releasing our software under a very liberal license – Apache 2.0. In very simple terms this means that the tools we develop can be used for whatever you like provided you follow the rules of the Apache 2.0 license.

What aren’t we open sourcing?

In truth lots of legacy code for our older projects isn’t likely to make it into the open. A large number of our projects between 2009 (Galaxy Zoo 2) and 2011 (The Milky Way Project) were all built upon a shared codebase called The Juggernaut. While we’re not making each of the projects open we are publishing the common application core which has been kept up to date and runs on Rails 3.1.

We’re also not opening up our applications that hold sensitive user information and are mission-critical for the operation of the Zooniverse. That’s not to say we won’t ever do this, we’re just not comfortable publishing these applications at this point. This basically means that the application that powers Zooniverse Home (www.zooniverse.org) and an application called Ouroboros (api.zooniverse.org) that serves up images and collects back classifications aren’t part of our open source strategy.

Why now?

Aside from the reasons mentioned above, there are a number of reasons to make open source our default position. In part it’s about people – developers these days are often hired (or at least shortlisted) by their GitHub profiles that show which projects they’ve been working on. As our team grows and we hire talented young developers we’re doing them a disservice by not allowing them to show off the awesome work they do. It’s also about the way in which we as the Zooniverse do science. We believe citizen science is an inherently open way of doing research, we often work with open datasets (such as SDSS) and ask people to donate their time and efforts to a project that in the end produces open data products for the research community to enjoy (e.g. data.galaxyzoo.org, data.milkywayproject.org). Having a closed codebase for everything we do just feels incompatible with this way of doing research.

What’s next?

To be honest we’re not quite sure. Going forward, our projects will typically become open source as we launch them. If there’s a Zooniverse project that you think you’d like to rework for a different purpose then there’s now nothing stopping you from doing this. If you’re interested in helping us with a new translation for your favourite project then we’d love to talk. Perhaps you’re just interested to see how some of our applications work. Regardless, we invite you to take a look and give us feedback. The Zooniverse has always been about harnessing the crowd to make science happen. From today, there is a new way for people to contribute to that goal.

Cheers
Arfon

Footnotes:
1. Scribe, our open source text transcription framework grew out of Old Weather and has been used on a number of projects now.
2. A fuller article about language support is coming very soon on this blog.