All posts by Sam Blickhan

IMLS Postdoctoral Fellow for the Zooniverse

Fun with IIIF

In this blog post, I’ll describe a recent prototyping project we (Jim O’Donnell: front-end developer; Sam Blickhan: Project Manager) carried out with our colleagues at the British Library (Mia Ridge, who I’m also collaborating with on the Collective Wisdom project) to explore IIIF compatibility for the Zooniverse Project Builder. You can read Mia’s complementary blog post here.

History & context

While Zooniverse supports projects working with a number of different data formats (aka ‘subjects’), including video and audio, by far the most frequently used data are images. Images are easy enough to drag and drop into our simple uploader (a feature of the Project Builder for adding data to your project) to create groups of subjects, or subject sets. If you want to upload your subjects with their associated metadata, however, things become slightly more complex. A subject manifest is a data table that allows you to list image file names alongside associated metadata. When you include a manifest with the images you upload, the metadata will remain associated with those images within the Zooniverse platform.
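To make the idea of a subject manifest concrete, here is a minimal sketch that builds one as a CSV table. The column names (filename, title, year) and the file names are made up for illustration; your own manifest would use whatever metadata fields suit your project.

```python
import csv
import io


def build_manifest(rows, fieldnames):
    """Return a CSV-formatted manifest: one row per image, one column per field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical images and metadata, for illustration only.
rows = [
    {"filename": "page_001.jpg", "title": "Playbill, page 1", "year": "1820"},
    {"filename": "page_002.jpg", "title": "Playbill, page 2", "year": "1820"},
]

manifest_csv = build_manifest(rows, ["filename", "title", "year"])
print(manifest_csv)
```

Each row ties one image file name to its metadata, which is the association the manifest is there to preserve.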

So, what happens if you already have a manifest? Can you upload any type of manifest into Zooniverse? What if you’re working with a specific set of standards? 

IIIF (pronounced “triple eye eff”) stands for International Image Interoperability Framework. It is a set of standards for image and A/V delivery across the web, from servers to different web environments. It supports viewing of images as well as interaction, and uses manifests as a major structural component. 

If you’re new to IIIF, that’s okay! To understand the work we did, you’ll need three IIIF definitions, all reproduced here from https://iiif.io/get-started/how-iiif-works/:

Manifest: the prime unit in IIIF which lists all the information that makes up a IIIF object. It communicates how to display your digital objects, and what information to display about them, including structure, to varying degrees of complexity as determined by the implementer. (For example, if the object is a book of illustrations, where each illustrated page is a canvas, and there is one specific order to the arrangement of those pages).

Canvas: the frame of reference for the display of your content, both spatial and temporal (just like a painting canvas for two-dimensional materials, or with an added time dimension for a/v content).

Annotation: a standard way to associate different types of content to whatever is on your canvas (such as a translation of a line or the name of a person in a photograph). In the IIIF model, images and other presentation content are also technically annotations onto a canvas. For more detail, see the Web Annotation Data Model.
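If it helps to see how these three pieces nest, here is a rough sketch of a two-page manifest written as a Python dictionary, plus a small function that walks it to find the image attached to each canvas. It assumes the layout of version 3.0 of the IIIF Presentation API (this post doesn't depend on a particular version), and every example.org URL is a placeholder.

```python
def make_canvas(number):
    """Build one canvas with a single image painted onto it (placeholder URLs)."""
    base = "https://example.org/iiif/book1"
    canvas_id = f"{base}/canvas/{number}"
    return {
        "id": canvas_id,
        "type": "Canvas",
        "height": 1800,
        "width": 1200,
        "items": [{
            "id": f"{base}/page/{number}",
            "type": "AnnotationPage",
            "items": [{
                "id": f"{base}/annotation/{number}",
                "type": "Annotation",
                "motivation": "painting",  # the image itself is an annotation
                "body": {
                    "id": f"{base}/page{number}.jpg",
                    "type": "Image",
                    "format": "image/jpeg",
                },
                "target": canvas_id,
            }],
        }],
    }


manifest = {
    "@context": "http://iiif.io/api/presentation/3/context.json",
    "id": "https://example.org/iiif/book1/manifest",
    "type": "Manifest",
    "label": {"en": ["Book of illustrations"]},
    "items": [make_canvas(1), make_canvas(2)],  # canvases, in page order
}


def list_images(manifest):
    """Return (canvas id, image URL) pairs in the order the manifest lists them."""
    images = []
    for canvas in manifest.get("items", []):
        for page in canvas.get("items", []):
            for annotation in page.get("items", []):
                if annotation.get("motivation") == "painting":
                    images.append((canvas["id"], annotation["body"]["id"]))
    return images


for canvas_id, image_url in list_images(manifest):
    print(canvas_id, image_url)
```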

What we did

For this effort, we worked with Mia and her colleagues at the British Library on an exploratory project to see if we could create a proof of concept for Zooniverse image upload and data export which was IIIF compatible. If successful, these two prototypes could then form the basis for an expanded effort. We used the British Library In The Spotlight Zooniverse project as a testing ground.

Data upload

First, we wanted to figure out a way to create a Zooniverse subject set from a IIIF manifest. We figured the easiest approach would be to use the manifest URL, so Jim built a tool that imports IIIF manifests via a URL pasted into the Project Builder (see image below).

This is an experimental feature, so it won’t show up in your Zooniverse project builder ‘Subject Sets’ page by default. If you want to try it out, you can preview the feature by adding subject-sets/iiif?env=production to your project builder URL. For example, if your project number is #xxx, you’d use the URL https://www.zooniverse.org/lab/xxx/subject-sets/iiif?env=production
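If you manage several projects, the preview URL can be put together with a one-line helper. This is only a convenience sketch; the function name and the project number 1234 are made up.

```python
def iiif_import_url(project_id):
    """Return the Project Builder URL for the experimental IIIF import page."""
    return f"https://www.zooniverse.org/lab/{project_id}/subject-sets/iiif?env=production"


print(iiif_import_url(1234))
```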

To create a new subject set, you simply copy/paste the IIIF manifest URL into the box at the top of the image and click ‘Fetch Manifest’. The Zooniverse uploader will present a list of metadata fields from the manifest. The tick box column at the far right allows you to flag certain fields as ‘Hidden’, meaning they won’t be shown to volunteers in your project’s classification interface. Once you’ve marked everything you want to be ‘Hidden’, you click ‘Create a subject set’ to generate the new subject set from the IIIF manifest. 

Export to manifest with IIIF annotations

In the second phase of this experiment, we explored how to export Zooniverse project results as IIIF annotations. This was trickier, because the Zooniverse classification model requires multiple classifications from different volunteers, which are then typically aggregated together after being downloaded from the platform.

To export Zooniverse results as IIIF annotations, therefore, we needed to include a step that runs the appropriate in-house offline aggregation code, then converts the data to the appropriate IIIF annotation format. Because the aggregation step is necessary to produce a single annotation per task, this step is project- and workflow-specific (whereas the IIIF Manifest URL upload works for all project types). For this effort, we tested annotation publishing on the In The Spotlight Transcribe Dates workflow, which uses a simple free-text entry task. The other In The Spotlight workflow has a slightly more complex task structure (rectangle marking task + text entry sub-task), which we’re hoping to be able to add to the technical documentation soon.
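As a rough illustration of the two steps (aggregate, then convert), the sketch below uses a simple majority vote as a stand-in for the aggregation step. It is not our in-house aggregation code, and the identifiers and example transcriptions are made up. The output follows the general shape of the Web Annotation Data Model, with a text body targeting a canvas.

```python
from collections import Counter


def consensus_text(transcriptions):
    """Majority vote over free-text answers; returns (value, share of agreement)."""
    cleaned = [t.strip() for t in transcriptions if t.strip()]
    if not cleaned:
        return None, 0.0
    value, count = Counter(cleaned).most_common(1)[0]
    return value, count / len(cleaned)


def to_web_annotation(annotation_id, canvas_id, text):
    """Wrap one aggregated answer as a Web Annotation pointing at a canvas."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": annotation_id,
        "type": "Annotation",
        "motivation": "commenting",
        "body": {"type": "TextualBody", "value": text, "format": "text/plain"},
        "target": canvas_id,
    }


# Hypothetical classifications of one subject by three volunteers.
answers = ["12 May 1820", "12 May 1820 ", "12 May 1829"]
value, agreement = consensus_text(answers)

annotation = to_web_annotation(
    "https://example.org/annotations/1",         # placeholder identifier
    "https://example.org/iiif/book1/canvas/1",   # placeholder canvas
    value,
)
print(annotation["body"]["value"], round(agreement, 2))
```

A real workflow would need an aggregation method suited to its task type, which is why this step is project- and workflow-specific.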

IIIF Technical Coordinator Glen Robson created a demo for viewing the In The Spotlight annotations in Mirador, which you can explore here: https://glenrobson.github.io/iiif_stuff/zooniverse/partof/ 

Full details and technical documentation are available at https://github.com/zooniverse/iiif-annotations.

Next steps & ways to get involved

Now, we need your feedback! The next steps for this work will include identifying community needs and interest – would you use these tools for your Zooniverse project? What features look useful (or less so)? Your feedback will help us determine our next steps. Mostly, we want to know who our potential audiences are, what task types they would most want to use, and what sort of comfort level they have, particularly when it comes to running the annotations code (from “This is great!” to “I don’t even know where to start!”). There are a lot of possible routes we could take from here, and we want to make sure our future work is in service of our project building community.

Try out the In The Spotlight project and help create real data for testing ingest processes.

Get in touch!

Finally, a massive “Thank you!” to the British Library for funding this experiment, and to Glen Robson and Josh Hadro at IIIF for their feedback on various stages of this experiment.

Engaging Crowds: new options for subject delivery & interaction

Since its founding, a well-known feature of the Zooniverse platform has been that volunteers see (& interact with) image, audio, or video files (known as ‘subjects’ in Zooniverse parlance) in an intentionally random order. A visit to help.zooniverse.org provides this description of the subject selection process:

[T]he process for selecting which subjects get shown to volunteers is very simple: it randomly selects an (unretired, unseen) subject from the linked subject sets for that workflow.

https://help.zooniverse.org/next-steps/subject-selection/
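In code, the selection rule described above might look something like this. It is a sketch of the idea only, not the platform's actual implementation, and the subject records are invented for the example.

```python
import random


def select_subject(subjects, seen_ids, rng=random):
    """Pick a random subject that is neither retired nor already seen."""
    candidates = [
        subject for subject in subjects
        if not subject["retired"] and subject["id"] not in seen_ids
    ]
    return rng.choice(candidates) if candidates else None


# Hypothetical subjects linked to one workflow.
subjects = [
    {"id": 1, "retired": True},
    {"id": 2, "retired": False},
    {"id": 3, "retired": False},
]

choice = select_subject(subjects, seen_ids={2})
print(choice)
```

Here subject 1 is retired and subject 2 has already been seen, so subject 3 is the only candidate left.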

For some project types, this method can help to avoid bias in classification. For other project types, however, random subject delivery can make the task more difficult.

Transcription projects frequently use a single image as the subject-level unit. These images most often depict a single page of text (i.e., 1 subject = 1 image = 1 page of text). Depending on the source material being transcribed, that unit/page is often only part of a multi-page document, such as a letter or manuscript. In these cases, random subject delivery removes the subject (page) from its larger context (document). This can actually make successful transcription more difficult, as seeing additional uses of a word or letter can be helpful for deciphering a particular hand.

Decontextualized transcription can also be frustrating for volunteers who may want greater context for the document they’re working on. It’s more interesting to be able to read or transcribe an entire letter, rather than snippets of a whole.

This is why we’re exploring new approaches to subject delivery on Zooniverse as part of the Engaging Crowds project. Engaging Crowds aims to ‘investigate the practice of citizen research in the heritage sector’ in collaboration with the UK National Archives, the Royal Botanic Garden Edinburgh, and the National Maritime Museum. The project is funded by the UK Arts & Humanities Research Council as one of eight foundational projects in the ‘Towards a National Collection: Opening UK Heritage to the World’ program.

As part of this research project, we have designed and built a new indexing tool that allows volunteers to have more agency around which subject sets—and even which subjects—they want to work on, rather than receiving them randomly.

The indexing tool allows for a few levels of granularity. Volunteers can select what workflow they want to work on, as well as the subject set. These features are currently being used on HMS NHS: The Nautical Health Service, the first of three Engaging Crowds Zooniverse projects that will launch on the platform before the end of 2021.

Subject set selection screen, as seen in HMS NHS: The Nautical Health Service.

Sets that are 100% complete are ‘greyed’ out, and moved to the end of the list — this feature was based on feedback from early volunteers who found it too easy to accidentally select a completed set to work on.

In the most recent iteration of the indexing tool, selection happens at the subject level, too. Scarlets and Blues is the second Engaging Crowds project, featuring an expanded version of the indexing tool seen in HMS NHS. Within a subject set, volunteers can select the individual subject they want to work on based on the metadata fields available. Once they have selected a subject, they can work sequentially through the rest of the set, or return to the index and choose a new subject.

Subject selection screen as seen in Scarlets and Blues.

On all subject index pages, the Status column tells volunteers whether a subject is Available (i.e. not complete and not yet seen); Already Seen (i.e. not complete, but already classified by the volunteer viewing the list); or Finished (i.e. has received enough classifications and no longer needs additional effort).
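The three Status values boil down to two yes/no questions about each subject. Here is a small sketch of that logic; the function and argument names are made up for illustration.

```python
def subject_status(is_finished, seen_by_volunteer):
    """Return the Status label for one subject, from one volunteer's point of view."""
    if is_finished:
        return "Finished"        # has enough classifications already
    if seen_by_volunteer:
        return "Already Seen"    # not complete, but this volunteer classified it
    return "Available"           # not complete and not yet seen


print(subject_status(is_finished=False, seen_by_volunteer=True))
```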

A major new feature of the indexing tool is that completed subjects remain visible, so that volunteers can retain the context of the entire document. When transcribing sequentially through a subject set, volunteers that reach a retired subject will see a pop-up message over the classify interface that notes the subject is finished, and offers available options for how to move on with the classification task, including going directly to the next classifiable subject or returning to the index to choose a new subject to classify.
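The "go directly to the next classifiable subject" option can be pictured as a scan forward through the ordered set, skipping anything that is finished. This is an illustrative sketch with invented data, not the code behind the pop-up.

```python
def next_classifiable(subjects, current_index):
    """Return the index of the next unfinished subject after current_index, if any."""
    for index in range(current_index + 1, len(subjects)):
        if not subjects[index]["retired"]:
            return index
    return None  # nothing left to classify later in this set


# Hypothetical four-page document; pages 2 and 3 are already finished.
pages = [
    {"id": "page-1", "retired": False},
    {"id": "page-2", "retired": True},
    {"id": "page-3", "retired": True},
    {"id": "page-4", "retired": False},
]

print(next_classifiable(pages, 0))
```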

Subject information banner, as seen in Scarlets and Blues.

As noted above, sequential classification can help provide context for classifying images that are part of a group, but until now has not been a common feature of the platform. To help communicate ordered subject delivery to volunteers, we have included information about the subject set–and a given subject’s place within that set–in a banner on top of the image. This subject information banner (shown above) tells volunteers where they are within the order of a specific subject set.

Possible community use cases for the indexing tool might include volunteers searching a subject set in order to work on documents written by a particular author, written within a specific year, or that are written in a certain language. Some of the inspiration for this work came from Talk posts on the Anti-Slavery Manuscripts project, in which volunteers asked how they could find letters written by certain authors whose handwriting they had become particularly adept at transcribing. Our hope is that the indexing tool will help volunteers more quickly access the type of materials in a project that speak to their interests or needs.

If you have any questions, comments, or concerns about the indexing tool, please feel free to post a comment here, or on one of our Zooniverse-wide Talk boards. This feature will not be immediately available in the Project Builder, but project teams who are interested in using the indexing tool on a future project should email contact@zooniverse.org and use ‘Indexing Tool’ in the subject line. We’re keen to continue trying out these new tools on a range of projects, with the ultimate goal of making them freely available in the Project Builder.

Frequently Asked Questions: Indexing Tool + Sequential Classification

“Will all new Zooniverse projects use this method for subject selection and sequential classification?”

No. The indexing tool is an optional feature. Teams who feel that their projects would benefit from this feature can reach out to us for more information about including the indexing tool in their projects. Those who don’t want the indexing tool will be able to carry on with random subject delivery as before.

“Why can’t I refresh the page to get a new subject?”

Projects that use sequential classification do not support loading new subjects on page refresh. If the project is using the indexing tool, you’ll need to return to the index and choose a new page. If the project is not using the indexing tool, you’ll need to classify the image in order to move forward in the sequence. However, the third Engaging Crowds project (a collaboration with the Royal Botanic Garden Edinburgh) will include the full suite of indexing tool features, plus an additional ‘pagination’ option that will allow volunteers to move forwards and backwards through a subject set to decide what to work on (see preview image below). We’ll write a follow-up to this post once that project has launched.

A green banner with the name of the subject set and Previous and Next buttons
Subject information banner, as seen in the forthcoming Royal Botanic Garden Edinburgh project.

“How do I know if I’m getting the same page again?”

The subject information banner will give you information about where you are in a subject set. If you think you’re getting the same subject twice, first start by checking the subject information banner. If you still think you’re getting repeat subjects, send the project team a message on the appropriate Talk board. If possible, include the information from the subject information banner in your post (e.g. “I just received subject 10/30 again, but I think I already classified it!”).

Project completed: The American Soldier in WWII

This is a guest post from the research team behind The American Soldier in WWII.

As challenges press upon all of us in the midst of the pandemic, the team behind The American Soldier in World War II has some good news to share. 

When we initially launched our project on Zooniverse on VE Day 2018, our goal was to have all 65,000 pages of commentaries on war and military service written by soldiers in their own hands transcribed and annotated within a 2-year window – in triplicate, for quality-control purposes. We not only hit that milestone in May 2020, but last week we completed an additional 4th round. 

Attracting 3,000-plus new contributors, this extension of the transcription drive took only six months. Beyond allowing more people to engage with these unique and revealing wartime documents, the added round is improving our final project output. Within the next week or so, our top Zooniverse transcribers will begin final, manual verification of these transcriptions and annotations, which have been cleaned algorithmically. If you are a consistent project contributor and interested in helping with final validation, please do let us know by signing up here.

As we move forward with the project, we have created a Farewell Talk board. Since we have had so many incredible contributors to The American Soldier, we would love to hear any parting words our volunteers would like to share with the team and with fellow contributors about your experiences or most memorable transcriptions. 

We are so incredibly grateful for the international team of researchers, data and computer scientists, designers, educators, and volunteers who have gotten the project to where it is, in spite of the great upheaval. Thanks to their hard work and dedication, the project’s open-access website remains on track for a spring 2021 launch.

We look forward to sharing more news with you soon. Until then, be well and safe. 

The American Soldier in WWII Team

The Zooniverse: A Quick starter guide for research teams

Over the past several months, we’ve welcomed thousands of new volunteers and dozens of new teams into our community.

This is wonderful.

Because there are new people arriving every day, we want to take this opportunity to (re)introduce ourselves, provide an overview of how Zooniverse works, and give you some insight on the folks who maintain the platform and help guide research teams through the process of building and running projects.

Who are we?

The core Zooniverse team is based across three institutions:

  • Oxford University, Oxford UK
  • The Adler Planetarium, Chicago IL
  • The University of Minnesota-Twin Cities, Minneapolis MN

We also have collaborators at many other institutions worldwide. Our team is made up of web developers, research leads, data scientists, and a designer.

How we build projects

Research teams can build Zooniverse projects in two ways.

First, teams can use the Project Builder to create their very own Zooniverse project from scratch, for free. In order to launch publicly and be featured on zooniverse.org/projects, teams must go through beta review, wherein a team of Zooniverse volunteer beta testers give feedback on the project and answer a series of questions that tell us whether the project is 1) appropriate for the platform; and 2) ready to be launched. Anyone can be a beta tester! To sign up, visit https://www.zooniverse.org/settings/email. Note: the timeline from requesting beta review to getting scheduled in the queue to receiving beta feedback is a few weeks. It can then take a few weeks to a few months (depending on the level of changes needed) to improve your project based on beta feedback and be ready to apply for full launch. For more details and best practices around using the Project Builder, see https://help.zooniverse.org/getting-started/.

The second option is for cases where the tools available in the Project Builder aren’t quite right for the research goals of a particular team. In these cases, they can work with us to create new, custom tools. We (the Zooniverse team) work with these external teams to apply for funding to support design, development, project management, and research.

Those of you who have applied for grant funding before will know that this process can take a long time. Once we’ve applied for a grant, it can take 6 months or more to hear back about whether or not our efforts were successful. Funded projects usually require at least 6 months to design, build, and test, depending on the complexity of the features being created. Once new features are created, we then need additional time to generalize (and often revise) them for inclusion in the Project Builder toolkit.

To summarize:

Option 1: Project Builder

  • Free!
  • Quick!
  • Have to work with what’s available (no customization of tools or interface design)

Option 2: Custom Project

  • Funding required
  • Can take a longer time
  • Get the features you need!
  • Supports future teams who may also benefit from the creation of these new tools!

We hope this helps you to decide which path is best for you and your research goals.

Celebrating Citizen Science Day 2019, pt. 1

To celebrate Citizen Science Day 2019, this coming Saturday 13th April, a different member of the Zooniverse team will be posting each day this week to share with you some of our all-time favourite Zooniverse projects. First off in the series is our Digital Humanities Lead, Dr. Samantha Blickhan.

From CitizenScience.org: “Citizen Science Day is an annual event to celebrate and promote all things citizen science: amazing discoveries, incredible volunteers, hardworking practitioners, inspiring projects, and anything else citizen science-related!”

Here at Zooniverse, we’re excited to participate by highlighting a series of projects that we enjoy. I want to kick things off by showing off a current project that does a great job illustrating one of my favorite things about this type of research: its ability to cross typical academic or discipline-specific boundaries.

Reading Nature’s Library is a transcription project, launched in February 2018, that was created by a team at Manchester Museum. The project invites volunteers to help transcribe labels for the museum’s collections, which include everything from Archery to Numismatics to Zoology, so this project has something for everyone! In the 13 months since their project launched, a community of 2,669 registered Zooniverse volunteers have completed over 9,283(!) subjects.

Beyond the wide-ranging contents of their dataset, this project is a great way to show how projects can affect a range of disciplines. The results of this project could be used for research in a range of disciplines within the sciences (as varied as their collections), not to mention studies of history, archives, and collections management. Furthermore, large amounts of transcribed text can be a useful tool for helping to train machine learning models for Handwritten Text Recognition.

Today’s project selection also raises a good point about terminology and models for participatory research. Although this week we are celebrating ‘Citizen Science Day’, not all projects fit into the same ‘Citizen Science’ model, and the use of ‘citizen’ is not intended in a narrow, geographic sense. As we celebrate the efforts by project teams and their communities of volunteers, we also want to acknowledge the work being done to illuminate these differences and to develop models for inclusivity and sustainability. The following article is a great place to start if you’re interested in learning more:

Eitzel, MV et al. (2017) Citizen Science Terminology Matters: Exploring Key Terms: https://theoryandpractice.citizenscienceassociation.org/articles/10.5334/cstp.96/