Who’s who in the Zoo – Sarah Allen

In this week’s edition of Who’s who in the Zoo, meet Sarah Allen, a front-end web developer in the Zooniverse team. 

– Helen



Name: Sarah Allen

Location: Adler Planetarium, Chicago

 

 

Tell us about your role within the team:

I’m a front-end web developer and have been with the team for three and a half years. I’ve worked on a variety of projects including Chimp & See, Wildcam Gorongosa, Zooniverse Classrooms’ educational tools, Gravity Spy, and day-to-day maintenance of zooniverse.org.

 

What did you do in your life before the Zooniverse?

I originally worked in IT for a couple of medical schools, which involved Windows server management, Google domain management, application management, and general help desk work. I eventually decided to learn to code and went to a code bootcamp when those first started getting popular. I then continued to teach myself and freelanced before I joined the Zooniverse team.

 

What does your typical working day involve?

I usually start by checking Slack, email, and the Zooniverse Talk board for any bug reports. Then I prioritize code reviews and following up on any pull requests I’ve submitted, then move on to new feature development or learning about something new in the afternoon.

 

How would you describe the Zooniverse in one sentence?

We empower researchers and the public to find answers to questions in real data.

 

Tell us about the first Zooniverse project you were involved with

Cyclone Center! My first project was implementing the project redesign and classification challenge.

 

Of all the discoveries made possible by the Zooniverse, which for you has been the most notable and why?

Tabby’s star on Planet Hunters. It’s been one of my go-to examples when explaining what it is that we do.

 

What’s been your most memorable Zooniverse experience?

Building and launching Chimp & See. It was a mostly solo project for me and although there was a learning curve and frustrating times with it, I felt very accomplished when it launched. I learned a lot from the process.

 

What are your top three citizen science projects? 

Chimp & See, Planet Hunters, and Gravity Spy.

 

What advice would you give to a researcher considering creating a Zooniverse project?

Do lots of prototyping and beta testing with the project builder before you launch so you have a solid idea of the data format going in and what the resulting classification data will look like. Have a timely plan for how to process the data and get the results back to the volunteers.

 

When not at work, where are we most likely to find you?

Seeing live music, dining out, playing video or board games, or cooking at home.


Who’s who in the Zoo – Becky Rother

In this week’s edition of our Who’s who in the Zoo series, meet Becky Rother, who is the visual design lead here at the Zooniverse.

– Helen



Name: Becky Rother

Location: Adler Planetarium, Chicago, IL

 

 

 

Tell us about your role within the team: 

I’ve been the Zooniverse designer for a little over a year. In this role, I design custom projects like Scribes of the Cairo Geniza in addition to general Zooniverse.org and public-facing design needs. I also help organize public Zooniverse events here at the Adler Planetarium.

 

What did you do in your life before the Zooniverse?

I actually have a degree in Journalism, and have worked in various roles from newspaper page designer to mobile app designer.

 

What does your typical working day involve?

Being on the US team, there’s usually some catch up to be done from our UK colleagues involving reviewing an implemented design or answering questions on our Slack channel. Besides that, every day is different! I may design a giant banner one day, then the next work on wireframes for a new project we’re just getting started.

 

How would you describe the Zooniverse in one sentence?

Zooniverse is an exceptional group of people working together to positively affect science and the humanities.

 

Tell us about the first Zooniverse project you were involved with

The first custom project I worked on was the Anti-Slavery Manuscripts, a collaborative transcription project in partnership with the Boston Public Library. It’s a really special project for a number of reasons, and a great introduction to the Zooniverse community.

 

What are your top three citizen science projects?

I’m obsessed with our camera trap projects – Chicago Wildlife Watch in particular. It’s SO COOL to get to see actual animals in their native habitats. I also really enjoy Gravity Spy – as a non-astronomer, it’s neat to be able to help scientists study gravitational waves. Lastly, I may be biased but I really enjoy Anti-Slavery Manuscripts. Reading these first-hand accounts from people actually involved in the abolitionist movement during the Civil War really brings history to life.

 

When not at work, where are we most likely to find you?

Chicago has so many great music venues, so I love taking advantage of that and going to indie rock shows. I also love travel and will take any opportunity to explore somewhere new.


 

Focussing effort where it is needed: picking out the Bugs that are harder to Bash

Below is a guest post from Dr Philip Fowler, who leads our award-winning bug-squishing project BashTheBug. This project aims to improve the diagnosis and treatment of tuberculosis, which remains one of the leading causes of death worldwide.

This project has a huge amount of data to get through, so Phil is working hard to make sure this is being done in the most efficient way possible. Read on to find out more. 

– Helen

 



 

BashTheBug has been running for a little over a year now and in that time 11,303 volunteers have classified 834,032 images of the bacterium that causes tuberculosis growing on 14 antibiotics at different strengths. These images correspond to a bit less than 4,000 different samples of M. tuberculosis, since each image is shown, by default, to 15 different volunteers to generate a consensus.

The goal of the larger CRyPTIC project that BashTheBug belongs to is to match all this data with the genomes of each and every sample and thereby produce the most comprehensive and accurate catalogue of what genetic variants confer resistance to specific antibiotics. This is important because there is a shift towards using genomic methods to diagnose which antibiotics would be best to treat individual patient infections because genomics can be faster, cheaper and probably more accurate as well.

 

Too many new images?

The CRyPTIC project has produced a new dataset of 4,286 samples. These have been collected from people with tuberculosis from all over the world.

This dataset alone would need 900,060 classifications if we were to simply require each antibiotic lane to be seen by 15 different volunteers and, unless a lot more people joined the project, would take at least a year. Our problem is that the project is producing around 1,000 samples a month, which would require 210,000 classifications a month, and our volunteers at present could not keep up with that!
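The totals above can be reproduced with a few lines of Python. This is only a sketch: the figures (14 antibiotics per sample, 15 volunteers per image) come from this post, while the function name and the simple multiplication are my own illustration of how they fit together.

```python
# Figures quoted in the post; the function itself is a hypothetical helper.
ANTIBIOTICS_PER_SAMPLE = 14   # each sample is grown on 14 antibiotics
VOLUNTEERS_PER_IMAGE = 15     # each antibiotic lane is shown to 15 volunteers


def classifications_needed(samples: int) -> int:
    """Classifications required if every lane of every sample is seen 15 times."""
    return samples * ANTIBIOTICS_PER_SAMPLE * VOLUNTEERS_PER_IMAGE


print(classifications_needed(4286))  # new dataset: 900060
print(classifications_needed(1000))  # one month of new samples: 210000
```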

Ultimately the CRyPTIC project will collect at least 30,000 samples over the next few years, so we are only at the beginning!

 

Some images are easy…

What might help is that we’ve found some of the images of bacterial growth are easy to classify. For example, all 15 volunteers identify well number 2 as the first well in which there is growth.

If the volunteers find this easy, a computer might also, so we wrote some computer software (called AMyGDA) that tries to measure the growth in each of the wells on the 96-well plate. It does a good job on these simple cases, but is confused by cases where there is little growth, or there are artefacts on the image, like air bubbles, contamination or shadows.

We can identify the “easier” images based on how much growth there is, and whether the computer software agrees with the single reading we have of each plate done by a laboratory scientist. On our new dataset of 4,286 samples, this approach identifies 84% of the antibiotic lanes as easy to classify.

If we only send the remaining 16% of images to the volunteers, that reduces the number of classifications we need to complete this dataset down to 144,000, with a monthly growth rate of 34,000, which is much more achievable!
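As a rough check of those reduced numbers, here is a small Python sketch. It assumes 14 antibiotic lanes per sample and 15 volunteers per lane (both taken from earlier in this post) and simply keeps 16% of the total; the helper name is hypothetical and the post's figures are rounded versions of what it prints.

```python
ANTIBIOTICS_PER_SAMPLE = 14
VOLUNTEERS_PER_IMAGE = 15
HARD_PERCENT = 16  # share of lanes that are not flagged as "easy"


def hard_classifications(samples: int) -> int:
    """Classifications needed if only the harder 16% of lanes go to volunteers."""
    total = samples * ANTIBIOTICS_PER_SAMPLE * VOLUNTEERS_PER_IMAGE
    # Integer arithmetic avoids floating-point rounding surprises.
    return total * HARD_PERCENT // 100


print(hard_classifications(4286))  # 144009, i.e. about 144,000
print(hard_classifications(1000))  # 33600, i.e. about 34,000 a month
```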

 

…and some are hard.

But this means you will all be seeing images that are harder to interpret and classify and therefore should be more of a challenge.

This is an example of an image that is harder to classify.


In our existing dataset, these images have typically elicited a range of answers. Some volunteers might say they cannot classify the image, whilst others would identify a range of wells as being the first with no growth. We can, of course, still form a consensus (I’d say well 5), but the variation is itself telling us something about how and why the image is hard to classify, which is potentially useful (for example, for training a machine learning classifier).
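To make the idea of a consensus plus its spread a little more concrete, here is a minimal Python sketch. It is purely illustrative: this post does not describe how BashTheBug actually combines answers, so the choice of the median, the use of None for "cannot classify", and the made-up answers are all my assumptions.

```python
from statistics import median_low


def summarise_answers(answers):
    """Summarise one image's answers; None means 'cannot classify'."""
    wells = sorted(a for a in answers if a is not None)
    if not wells:
        return None  # nobody could classify this image
    return {
        # median_low always returns a well number somebody actually chose
        "consensus": median_low(wells),
        "lowest": wells[0],
        "highest": wells[-1],
        "cannot_classify": len(answers) - len(wells),
    }


# Fifteen invented answers for a hard image
answers = [5, 5, 4, 6, 5, None, 5, 7, 4, 5, 5, 6, None, 5, 5]
print(summarise_answers(answers))
```

A wide gap between the lowest and highest answers, or a large number of "cannot classify" responses, is the kind of signal that marks an image as hard.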

 

A few things to think about

Because the images should be, on average, more challenging now, you will have to make more frequent judgment calls about whether that blob in well 5 is an artefact or whether it “looks like” growth, and if you think it is, whether or not it is big enough to be relevant. Personally, I’d say for something to be growth it has to look like the growth in the positive control wells. If it is a lot smaller (like a dot) then I personally tend to ignore it. Don’t spend too long on individual images – rely on the collective power of the volunteers to allow us to extract a consensus from all your answers!

 

Focussing your efforts

In summary

– there are a lot of new images available to classify on our Zooniverse project page and
– they should be, on average, a lot more interesting and challenging

 

To get more frequent project updates,

– check for banners on the Zooniverse project page
– follow BashTheBug on Twitter, Instagram and Facebook
– check out our blog

 


Philip W Fowler
6 August 2018

The Zooniverse at the Royal Society


On the 3rd of July the Zooniverse team headed to London to take part in the Royal Society’s Summer Science Exhibition. Several Zooniverse projects were featured, including Galaxy Zoo, Penguin Watch and The Plastic Tide:

Zooniverse


For those of you new to the Zoo, the Zooniverse is an online platform for Citizen Science research. It relies on volunteers to analyse data which then contributes to real research. This often results in new discoveries, publications and data sets useful to the wider research community. At the Royal Society, Zooniverse team members Adam McMaster, Grant Miller, Cam Allen, Jim O’Donnell and Helen Spiers gave visitors a whistle-stop tour of the Zooniverse and answered any questions people had about what we do. Visitors were surprised at the plethora of projects on the platform that they can contribute to, and got to ‘listen to the Zooniverse’ via a web page that plays a note for every classification made! You can listen to it here; it’s not the easiest thing to dance to but I think it makes nice background music.

Galaxy Zoo


Members of the Galaxy Zoo team, Coleman Krawczyk and Jen Gupta, shared the ‘Tactile Universe’ with visitors. The Tactile Universe is a project aimed at making astronomy more accessible to those with visual impairments. The team behind it has developed a way to 3D print galaxy images so that the brighter parts of the picture stick out more. This allows you to feel the shape of the galaxy rather than see it. People really seemed to enjoy trying to match up pictures of the galaxies with what they could feel on the ‘Tactile Universe’ tiles. Also on show was the Galaxy Zoo project, which asks volunteers to identify the type of galaxy in an image.

Penguin Watch


Do you want to count some penguins? If so, then Penguin Watch is the project for you! Fiona Jones, a member of the Penguin Watch team, and I showed this project off to visitors at the exhibition. Visitors were interested to learn how something as simple as clicking on the penguins in an image can make a big difference to not only scientists but also the penguins themselves. This is because scientists can use the information they get from Penguin Watch to monitor and protect the penguin populations. We also had some cold weather gear and a ration pack which the team need to set up the Penguin Watch cameras in the cold Antarctic climate. (I still tried the gear on despite the 25 degree temperatures in sunny London!)

The Plastic Tide


Showing visitors the big problem of plastic pollution in our oceans was The Plastic Tide team: Peter Kohler, Stefan Leutenegger, Karl-Mattias Tepp, and Arturo Castillo. The Plastic Tide project asks volunteers to look at photos of beaches taken by their drone and tag plastic and other rubbish. Visitors to the exhibition were very interested to learn that they could help with this project even from the comfort of their own homes! The team also gave a live demonstration of how they are using the data they get from the Zooniverse to train a computer to identify rubbish on beaches!

 

Visit Zooniverse: www.zooniverse.org

Visit Galaxy Zoo: www.galaxy-zoo.org

Visit Penguin Watch: www.penguinwatch.org

Visit Plastic Tide: www.the-plastic-tide.org

A Zooniverse Spin-Out Company

I wanted to let you know that a new company called 1715 Labs has been set up to make commercial use of the software created by the Zooniverse team. Specifically, the company will explore how other businesses might make use of our tools in order to classify and label images, text, audio and video.

We’ve been approached over the years by a number of companies with such projects in mind, but the Zooniverse policy has always been to accept only projects whose aim is academic research. (See our policy statement at https://www.zooniverse.org/help/lab-policies).

This is not changing. This policy will remain the same for Zooniverse projects, so you can be sure that any project you see at Zooniverse.org will continue to have as its goal the advancement of academic research. Projects developed by 1715 Labs will not appear at Zooniverse.org.

It’s also important that you, the volunteers, know that the Zooniverse will not be handing the new company any of your data or personal information. Indeed, according to the Zooniverse privacy policy we will not be able to. Instead, the company will use the same software as the Zooniverse to reach other crowds who can take part in any projects it creates.

The team who have been working on the Zooniverse will continue to do so, just as they always have. However, the possibility exists that some team members – including myself – may serve as paid consultants for 1715 Labs as the new company gets off the ground. This work will be managed separately from work for us in the Zooniverse.

1715 Labs is formally a spin-out company of the University of Oxford, where a large part of the Zooniverse team have been based from the beginning. It is currently led by Sophie Hackford (We’re currently recruiting a long-term CEO – if you’re interested, send me a CV). Normally, the researchers involved in leading such a spin-out would receive equity in the new company, and benefit financially from it.

However, in this case, we have given up any such rights, passing the shares instead to another new organisation, the 1715 Association. This means – unusually for a spin-out company – no-one involved in Zooniverse owns shares or has a financial stake in the new company. (As noted above, some of the team may end up working for 1715 Labs as consultants.)

The 1715 Association is what’s known as a company limited by guarantee. If it receives money as a result of its ownership of shares in 1715 Labs, it must use it in accordance with its objects. The objects of the 1715 Association are to benefit citizen science research, especially through the Zooniverse. Should the company do well, therefore, the result will be additional funding for our work here at Zooniverse.org and the chance to build new, better, more interesting projects.

This is good news – we want the excellent software our developers have created to be used, and if it can benefit our research, then so much the better. Hopefully, businesses with data that needs labelling will be inspired by this link to the Zooniverse to work with 1715 Labs.

I’m looking forward to seeing what happens with this new venture. In the meantime, I’m happy to answer any questions in the comments below or over on Talk.

Chris Lintott, PI for Zooniverse

Panoptes Client for Python 1.0.3

Hot on the heels of last week’s update, I’ve just released version 1.0.3 of the Python Panoptes Client, which fixes a bug introduced in the previous release. If you encounter a TypeError when you try to create subjects, please update to this new version and that should fix it.

This release also updates the default client ID that is used to identify the client to the Panoptes API. This is to ensure that each of our API clients is using a unique ID.

As before, you can install the update by running pip install -U panoptes-client.
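If you are not sure which version you currently have, a short standard-library check like the sketch below may help. The distribution name panoptes-client and the version 1.0.3 come from this post; the helper names are hypothetical, and the parser assumes a plain numeric version string such as "1.0.2".

```python
from importlib.metadata import PackageNotFoundError, version

FIXED_IN = (1, 0, 3)  # the release that fixes the TypeError


def parse_version(text):
    """Turn '1.0.3' into (1, 0, 3); assumes purely numeric parts."""
    return tuple(int(part) for part in text.split(".")[:3])


def needs_update(installed):
    """True if the given version string is older than the fixed release."""
    return parse_version(installed) < FIXED_IN


def installed_version(dist="panoptes-client"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("panoptes-client is not installed")
    elif needs_update(found):
        print(f"{found} is older than 1.0.3; run: pip install -U panoptes-client")
    else:
        print(f"{found} already includes the fix")
```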

Experiments on the Zooniverse

Occasionally we run studies in collaboration with external researchers in order to better understand our community and improve our platform. These can involve methods such as A/B splits, where we show a slightly different version of the site to one group of volunteers and measure how it affects their participation, e.g. does it influence how many classifications they make or their likelihood to return to the project for subsequent sessions?

One example of such a study was the messaging experiment we ran on Galaxy Zoo. We worked with researchers from Ben-Gurion University and Microsoft Research to test if the specific content and timing of messages presented in the classification interface could help alleviate the issue of volunteers disengaging from the project. You can read more about that experiment and its results in this Galaxy Zoo blog post https://blog.galaxyzoo.org/2018/07/12/galaxy-zoo-messaging-experiment-results/.
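For readers unfamiliar with A/B splits, the sketch below shows one common way to divide people into groups. It is not a description of how the Zooniverse assigns volunteers; the function name, the experiment label and the hashing approach are all assumptions chosen for illustration. Hashing a user ID means the same volunteer always lands in the same group without any assignment table being stored.

```python
import hashlib


def assign_group(user_id: str, experiment: str, groups=("A", "B")) -> str:
    """Deterministically map a user to one of the experiment groups."""
    # Including the experiment name gives each experiment its own split.
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(groups)
    return groups[index]


for user in ["volunteer-1", "volunteer-2", "volunteer-3"]:
    print(user, assign_group(user, "messaging-test"))
```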

As the Zooniverse has different teams based at different institutions in the UK and the USA, the procedures for ethics approval differ depending on who is leading the study. After recent discussions with staff at the University of Oxford ethics board to check that our procedure was up to date, our Oxford-based team will be changing the way in which we gain approval for, and report the completion of, these types of studies. All future study designs which feature Oxford staff taking part in the analysis will be submitted to CUREC, something we’ve been doing for the last few years. From now on, once the data gathering stage of the study has been run we will provide all volunteers involved with a debrief message.

The debrief will explain to our volunteers that they have been involved in a study, along with providing information about the exact set-up of the study and what the research goals were. The most significant change is that, before the data analysis is conducted, we will contact all volunteers involved in the study and allow a period of time for them to state that they would like to withdraw their consent to the use of their data. We will then remove all data associated with any volunteer who would not like to be involved before the data is analysed and the findings are presented. The debrief will also contain contact details for the researchers in the event of any concerns and complaints. You can see an example of such a debrief in our original post about the Galaxy Zoo messaging experiment here https://blog.galaxyzoo.org/2015/08/10/messaging-test/.
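In data terms, the withdrawal step amounts to filtering the experiment data before any analysis runs. The sketch below is a hypothetical illustration only: the record layout and the user_id field name are assumptions, not the format the Zooniverse uses.

```python
def remove_withdrawn(classifications, withdrawn_user_ids):
    """Drop every record belonging to a volunteer who withdrew consent."""
    withdrawn = set(withdrawn_user_ids)
    return [c for c in classifications if c["user_id"] not in withdrawn]


# Invented example records
records = [
    {"user_id": "u1", "answer": "spiral"},
    {"user_id": "u2", "answer": "smooth"},
    {"user_id": "u1", "answer": "smooth"},
]
print(remove_withdrawn(records, ["u1"]))
```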

As always, our primary focus is the research being enabled by our volunteer community on our individual projects. We run experiments like these in order to better understand how to create a more efficient and productive platform that benefits both our volunteers and the researchers we support. All clicks that are made by our volunteers are used in the science outcomes from our projects no matter whether they are part of an A/B split experiment or not. We still strive never to waste any volunteer time or effort.

We thank you for all that you do, and for helping us learn how to build a better Zooniverse.

Who’s who in the Zoo – Sophia Vaughan

Hello, my name is Sophia Vaughan. I’m a Physics student from Oxford University, but I don’t spend all of my time studying physics; I also like to knit and go to science fiction conventions (especially Star Trek ones).

I’m posting on this blog because I’ve just started a summer project here at Zooniverse, and while I’m here I’m hoping to create a new project for the Zooniverse. But don’t worry, you don’t need to be a scientist, student or a professor to create a project; all you need is a Zooniverse account! If all goes well you’ll see future blog post(s) from me on how to build your own project using the project builder.

Why you should use Docker in your research

Last month I gave a talk at the Wetton Workshop in Oxford. Unlike the other talks that week, mine wasn’t about astronomy. I was talking about Docker – a useful tool which has become popular among people who run web services. We use it for practically everything here, and it’s pretty clear that researchers would find it useful if only more of them used it. That’s especially true in fields like astronomy, where a lot of people write their own code to process and analyse their data. If after reading this post you think you’d like to give Docker a try and you’d like some help getting started, just get in touch and I’ll be happy to help.

I’m going to give a brief outline of what Docker is and why it’s useful, but first let’s set the scene. You’re trying to run a script in Python that needs a particular version of NumPy. You install that version but it doesn’t seem to work. Or you already have a different version installed for another project and can’t change it. Or the version it needs is really old and isn’t available to download anymore. You spend hours installing different combinations of packages and eventually you get it working, but you’re not sure exactly what fixed it and you couldn’t repeat the same steps in the future if you wanted to exactly reproduce the environment you’re now working in. 

Many projects require an interconnected web of dependencies, so there are a lot of things that can go wrong when you’re trying to get everything set up. There are a few tools that can help with some of these problems. For Python you can use virtual environments or Anaconda. Some languages install dependencies in the project directory to avoid conflicts, which can cause its own problems. None of that helps when the right versions of packages are simply not available any more, though, and none of those options makes it easy to just download and run your code without a lot of tedious setup. Especially if the person downloading it isn’t already familiar with Python, for example.

If people who download your code today can struggle to get it running, how will it be years from now when the version of NumPy you used isn’t around anymore and the current version is incompatible? That’s if there even is a current version after so many years. Maybe people won’t even be using Python then.

Luckily there is now a solution to all of this, and it’s called software containers. Software containers are a way of packaging applications into their own self-contained environment. Everything you need to run the application is bundled up with the application itself, and it is isolated from the rest of the operating system when it runs. You don’t need to install this and that, upgrade some other thing, check the phase of the moon, and hold your breath to get someone’s code running. You just run one command and whether the application was built with Python, Ruby, Java, or some other thing you’ve never heard of, it will run as expected. No setup required!

Docker is the most well-known way of running containers on your computer. There are other options, such as Kubernetes, but I’m only going to talk about Docker here.

Using containers could seriously improve the reproducibility of your research. If you bundle up your code and data in a Docker image, and publish that image alongside your papers, anyone in the world will be able to re-run your code and get the same results with almost no effort. That includes yourself a few years from now, when you don’t remember how your code works and half of its dependencies aren’t available to install any more.

There is a growing movement for researchers to publish not just their results, but also their raw data and the code they used to process it. Containers are the perfect mechanism for publishing both of those together. A search of arXiv shows there have only been 40 mentions of Docker in papers across all fields in the past year. For comparison there have been 474 papers which mention Python, many of which (possibly most, but I haven’t counted) are presenting scripts and modules created by the authors. That’s without even mentioning other programming languages. This is a missed opportunity, given how much easier it would be to run all this code if the authors provided Docker images. (Some of those authors might provide Docker images without mentioning it in the paper, but that number will be small.)

Docker itself is open source, and all the core file formats and designs are standardised by the Open Container Initiative. Besides Docker, other OCI members include tech giants such as Amazon, Facebook, Microsoft, Google, and lots of others. The technology is designed to be future proof and it isn’t going away, and you won’t be locked into any one vendor’s products by using it. If you package your software in a Docker container you can be reasonably certain it will still run years, or decades, from now. You can install Docker for free by downloading the community edition.

So how might Docker fit into your workday? Your development cycle will probably look something like this: First you’ll probably outline an initial version of the code, and then write a Dockerfile containing the instructions for installing the dependencies and running the code. Then it’s basically the same as what you’d normally do. As you’re working on the code, you’d iterate by building an image and then running that image as a container to test it. (With more advanced usage you can often avoid building a new image every time you run it, by mounting the working directory into the container at runtime.) Once the code is ready you can make it available by publishing the Docker image.
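To make that cycle a bit more concrete, here is a sketch written in Python so it can sit alongside your analysis code. Everything specific in it is an assumption for illustration: the image name my-analysis:1.0, the script name analysis.py, the requirements.txt file and the python:3.11-slim base image. The script only builds the Dockerfile text and the command lines; pass the lists to subprocess.run(cmd, check=True) on a machine with Docker installed to actually run them.

```python
# A minimal Dockerfile for a hypothetical Python analysis script.
DOCKERFILE = """\
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "analysis.py"]
"""


def build_command(image):
    """docker build: turn the Dockerfile in the current directory into an image."""
    return ["docker", "build", "-t", image, "."]


def run_command(image, mount_dir=None):
    """docker run: start a container from the image and remove it on exit."""
    cmd = ["docker", "run", "--rm"]
    if mount_dir is not None:
        # Mount the working directory (absolute path) over /app so code
        # changes are picked up without rebuilding the image.
        cmd += ["-v", f"{mount_dir}:/app"]
    cmd.append(image)
    return cmd


print(DOCKERFILE)
print(" ".join(build_command("my-analysis:1.0")))
print(" ".join(run_command("my-analysis:1.0")))
print(" ".join(run_command("my-analysis:1.0", mount_dir="/home/me/project")))
```

Pinning exact versions in requirements.txt matters here, since the image is only as reproducible as the instructions used to build it.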

There are three approaches to publishing the image: push the image to the Docker Hub or another Docker registry, publish the Dockerfile along with your code, or export the image as a tar file and upload that somewhere. Obviously these aren’t mutually exclusive. You should do at least the first two, and it’s probably also wise to publish the tar file wherever you’d normally publish your data.
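Two of those approaches map directly onto Docker commands, sketched below as Python command lists. The image name and tar file name are placeholders I have made up; the commands themselves (docker push, docker save -o, docker load -i) are standard Docker CLI usage. Pushing requires that the image name matches a repository you are allowed to write to.

```python
def publish_commands(image, tar_path):
    """Command lines for pushing an image and exporting it as a tar file."""
    return {
        # Upload to the Docker Hub (or whichever registry the name points at)
        "push": ["docker", "push", image],
        # Write the image to a tar file that can be archived with your data
        "export": ["docker", "save", "-o", tar_path, image],
        # What a reader would run later to restore the image from the tar file
        "restore": ["docker", "load", "-i", tar_path],
    }


for name, cmd in publish_commands("myname/my-analysis:1.0", "my-analysis-1.0.tar").items():
    print(f"{name}: {' '.join(cmd)}")
```

The third approach, publishing the Dockerfile, needs no command at all: just commit it to the same repository as your code.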

 

The Docker Hub is a free registry for images, so it’s a good place to upload your images so that other Docker users can find them. It’s also where you’ll find a wide selection of ready-built Docker images, both created by the Docker project themselves and created by other users. We at the Zooniverse publish all of the Docker images we use for our own work on the Docker Hub, and it’s an important part of how we manage our web services infrastructure. There are images for many major programming languages and operating system environments.

There are also a few packages which will allow you to run containers in high performance computing environments. Two popular ones are Singularity and Shifter. These will allow you to develop locally using Docker, and then convert your Docker image to run on your HPC cluster. That means the environment it runs in on the cluster will be identical to your development environment, so you won’t run into any surprises when it’s time to run it. Talk to your institution’s IT/HPC people to find out what options are available to you.
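As a rough idea of what the conversion looks like with Singularity, the sketch below builds the command that converts a published Docker image into a Singularity image file. The image and file names are placeholders, and your cluster may use a different tool or version, so treat this as a starting point to check with your HPC support staff.

```python
def singularity_build_command(docker_image, sif_path):
    """Command to convert a Docker image from a registry into a .sif file."""
    # The docker:// prefix tells Singularity to fetch the image from a
    # Docker registry such as the Docker Hub.
    return ["singularity", "build", sif_path, f"docker://{docker_image}"]


print(" ".join(singularity_build_command("myname/my-analysis:1.0", "my-analysis.sif")))
```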

Hopefully I’ve made the case for using Docker (or containers in general) for your research. Check out the Docker getting started guide to find out more, and as I said at the beginning, if you’re thinking of using Docker in your research and you want a hand getting started, feel free to get in touch with me and I’ll be happy to help you.