Wikidata can change the way citizen scientists contribute

If you’ve been following discussions on citizen science, you’ve probably realized that researchers are generating so much data that they need extensive help parsing it and making it more useful. For many projects, citizen scientists have answered the call for help, making enormous contributions. Sure, there was a recent study which found that “Most participants in citizen science projects give up almost immediately”, but as Caren Cooper pointed out:

Just by trying, citizen scientists made important contributions regardless of whether or not they chose to continue.

But I digress...

What does citizen science have to do with Wikidata? For that matter, what the heck is Wikidata?

Most citizen science contributions come in some form of data collection (observations; sample collection; taking measurements, pictures, coordinates, etc.) or classification (identification, data entry, etc.), but few citizen scientists participate in analyzing the data.

From ‘Surveying the Citizen Science Landscape’ by Andrea Wiggins and Kevin Crowston (the paper is open access)

Wikidata (a linked, structured database for open data) may serve to change that. Naturally, Wikidata relies on the contributions of volunteers; however, the data incorporated into Wikidata is open for anyone to use. In fact, Wikidata is begging to be used, and citizen scientists and citizen data scientists are welcome to use it. An international group has already put together a grant proposal (open and crowdsourced, in the true spirit of Wikipedia) to make Wikidata an open virtual research environment. Dubbed Wikidata for Research, the proposal aims to establish “Wikidata as a central hub for linked open research data more generally, so that it can facilitate fruitful interactions at scale between professional research institutions and citizen science and knowledge initiatives.”

As exciting as this all is, there is a lot of work that still needs to be done to make Wikidata more successful. Although it’s open access, it’s still a bit inaccessible because of the lack of clear documentation for new users. It’s not that the information doesn’t exist: there is a ton of information on Wikidata, and a lot of neat tools are already available or in development. You just have to look really hard for them. Fortunately, the Wikidata community is already aware of the key issues that need to be addressed for the project to become more successful.

Researchers have already made considerable efforts to make science more accessible by contributing to science-related articles. Wikipedia already covers over 10,000 genes, thanks (in part) to the Gene Wiki initiative! It makes sense that Wikidata is next. A lot of progress has been made in this arena, but I’ll save that for later.
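Since Wikidata is open for anyone to use, here is a minimal sketch of what querying it for gene data might look like from Python. It assumes the public Wikidata Query Service SPARQL endpoint (https://query.wikidata.org/sparql) and that Wikidata property P351 holds Entrez Gene IDs; the helper names `gene_query` and `query_url` are hypothetical. It only builds the request URL, so nothing is sent over the network. The example ID, 2158, is the Entrez Gene ID for Christmas factor (factor IX) from the Xmas list in this post.

```python
from urllib.parse import urlencode

# Public Wikidata Query Service endpoint (assumed; SPARQL over HTTP GET).
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def gene_query(entrez_id: str) -> str:
    """Build a SPARQL query for the item whose Entrez Gene ID (P351, assumed) matches."""
    return f"""
SELECT ?gene ?geneLabel WHERE {{
  ?gene wdt:P351 "{entrez_id}" .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}"""


def query_url(sparql: str) -> str:
    """URL-encode the query as a GET request that asks for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": sparql, "format": "json"})


# 2158 = Entrez Gene ID for Christmas factor (factor IX)
url = query_url(gene_query("2158"))
print(url)
```

To actually run the query, you could open the URL with `urllib.request.urlopen`; Wikimedia's User-Agent policy asks scripted clients to send a descriptive User-Agent header.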

Merry XMAS and Happy researching!

Here’s just a small sample of various XMAS-named science things:

  • Xmas-1 – A rather mysterious fruit fly gene which may affect fruit fly reproduction.
  • Christmas disease – AKA haemophilia B, named after Stephen Christmas, the first patient identified with the disease. It appears clinically identical to classic haemophilia. It is an X-linked form of haemophilia caused by a deficiency of Christmas factor (factor IX).
  • Christmas Factor – AKA Factor IX, Entrez Gene ID: 2158
  • XMAS: An Experiential Approach for Visualization, Analysis, and Exploration of Time Series Microarray Data
  • XMAS – The Xymmer+Mystrium+Adetomyrma+Stigmatomma clade of ants

What have I missed?

Does your drug target gene have other implications? Enter MORPHIN

MORPHIN: a web tool for human disease research by projecting model organism biology onto a human integrated gene network.

If you missed it, there was another headline reporting the discovery of a new potential gene target, “CREB3L3 May Be Target for Obesity, Diabetes Treatments”, which has shown promise in mice, at least. With new disease-related genes being discovered daily in various model organisms, wouldn’t it be awesome if there were a tool that let us see how genes discovered in model organisms link to human disease? This may seem like a strange thing to want, considering that researchers use model organisms to study human diseases in the first place. After all, one would expect the genes discovered in a model organism while researching a particular disease to link to the disease being modeled, right? Of course! But what if the gene you discover while researching angina actually plays a big role in erectile dysfunction? What if the gene you’re targeting for erectile dysfunction also plays an important role in muscular dystrophy? Viagra (sildenafil) was originally developed to treat angina by targeting phosphodiesterase 5 (PDE5), yet it became a blockbuster treatment for erectile dysfunction. Similarly, Cialis (tadalafil), another PDE5-targeting drug developed for erectile dysfunction, has been explored for the treatment of Becker muscular dystrophy.

Research behind the little blue pill started with angina. By Tim.Reckmann (Own work) [CC BY-SA 3.0], via Wikimedia Commons

The point is…maybe you’ve uncovered a group of genes that may be the key to the disease you’re studying, but there may be other disease indications worth branching off to study in the future. And now there’s a tool to help with that: enter MORPHIN.

I tested MORPHIN using the gene from the aforementioned article, CREB3L3, along with two other genes that have been hyped as targets for curing obesity: leptin (LEP) and the leptin receptor (LEPR). I entered the three genes, selected ‘mouse’ as the model, and submitted the query.

Unfortunately, MORPHIN is very slow. A query can take 3-5 min, or even 20 min if there’s a queue. I’m guessing the delay stems from using not only an overlap-based gene set association measure (Fisher’s exact test) but also a network-based gene set association measure, RIDDLE, to enhance the sensitivity of association mapping. RIDDLE itself takes 2-3 min to run if you’re lucky. I’m sure the mapping of queried animal genes to human orthologs is the fastest part, considering there are really fast services already available like
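For readers curious what an overlap-based gene set association measure computes, here is a rough sketch of a one-sided Fisher’s exact test for the overlap between a query gene set and a disease gene set. This is not MORPHIN’s code, RIDDLE’s network-based measure is not sketched, and the universe size and gene counts in the example are hypothetical.

```python
from math import comb


def overlap_pvalue(universe_size, disease_set_size, query_size, overlap):
    """One-sided Fisher's exact test for gene set overlap.

    Returns the probability of drawing at least `overlap` disease genes when
    `query_size` genes are picked at random from `universe_size` genes, of
    which `disease_set_size` belong to the disease set (the upper tail of the
    hypergeometric distribution).
    """
    N, K, n = universe_size, disease_set_size, query_size
    # Sum the hypergeometric counts for every overlap >= the observed one.
    tail = sum(
        comb(K, i) * comb(N - K, n - i)
        for i in range(overlap, min(n, K) + 1)
    )
    return tail / comb(N, n)


# Hypothetical example: 3 query genes (e.g. human orthologs of mouse genes),
# a 100-gene disease set, a 20,000-gene universe, and 2 genes shared.
p = overlap_pvalue(20_000, 100, 3, 2)
print(f"p = {p:.2e}")
```

For the one-sided “enrichment” alternative, Fisher’s exact test on the 2×2 table (in query or not × in disease set or not) reduces to this hypergeometric tail. If SciPy is installed, `scipy.stats.fisher_exact` with `alternative="greater"` should give the same value.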

Thirty-five minutes later, we had the result (MORPHIN must be really popular!). Of course, the first disease associated with these three genes was ‘Eating disorders.’ The second was ‘HELLP syndrome’. A quick search in PubMed reveals that there are 2000+ articles on HELLP syndrome, four of which are linked to leptin and none to CREB3L3. Interesting. It might be fun to see what a query for various phosphodiesterases returns, but I don’t think I can spare another hour on this.

If you’re curious, see the development time line of Viagra here.

Impact of Social Sciences – The great potential of citizen science: restoring the role of tacit knowledge and amateur discovery.

    “In most citizen science projects today, however, amateurs perform rather mundane tasks like documenting things (see above), outsourcing computing power (e.g. SETI@home) or playing games (e.g. Foldit). You can go to the Scientific American’s citizen science webpage and search for the word ‘help’ and you will find that out of 15 featured projects, 13 are prefaced help scientists do something. The division of roles between citizens and real scientists is evident. Citizen scientists perform honey bee tasks. The analytic capacity remains with real researchers. Citizen science today is often a twofold euphemism.

    That is not to say that collecting, documenting and counting is not a crucial part of research. In many ways the limited task complexity even resembles the day-to-day business of in-person research teams. Citizen scientists, on the other hand, can work when they want to and on what they want to. That being said, citizen science is still a win win in terms of data collection and citizen involvement.” (Impact of Social Sciences – The great potential of citizen science: restoring the role of tacit knowledge and amateur discovery)

Don’t get me wrong, I understand and see the point the author makes about the division of roles between citizen and real scientists in terms of running analyses, but before anyone complains that citizen scientists don’t get to do anything fun and that their tasks are mundane…take a look at this:

'Real' scientists do really boring stuff too

If you’ve never touched one of these in your life, it’s fun for about the first ten minutes. After that, it quickly loses its charm. It basically works like this:


Which gets old pretty quickly when you’re adding different solutions, volumes of solutions, etc. to dozens or hundreds of these:

Minusheet Figure 2

Personally, I’d rather be counting these guys for Penguin Watch:
Penguins walking, Moltke Harbour, South Georgia, British overseas territory, UK

Anyone else feel the same?

Neat Science Thursday – Videogames and violence

News flash: a new study was just published in PLOS ONE examining the effects of violent video games on levels of aggression. Now, before you freak out about whether more arbitrary ratings should be used to label video games, you should actually read the research article first (it’s open access and really interesting, so enjoy!).

I know you don’t play these RPGs in first person to look at the pretty landscape!

Then be sure not to generalize these findings to all video games just yet. This study only compared a first person shooter (Call of Duty: Modern Warfare) with a puzzle-platform game (LittleBigPlanet 2) rather than with a ‘neutral’ first person game, because the authors were unable to identify such a game. Forget all you role-playing gamers who play in first person perspective…I’m sure you only play to kill monsters and don’t care for the long and intricate narrative anyway. I’m sure you ignore all those NPC requests for help, because neutral first person perspective games just aren’t popular. Eat that, Elder Scrolls, you give the player too many options! Forget you, you miscellaneous first person golf games: you’re too boring! There are just no good neutral first person games to compare against a first person shooter. Don’t worry, the researchers were aware of this issue and will be sure to expand on this study once they identify a non-violent first person game.

It’s a bit tricky to define violence, so the clearest and most obvious case of it was used for this study. This explains why more ambiguous games, where killing monsters is optional, were not used for comparison. E.g., my bro once spent hours in a first person RPG just jumping to increase his acrobatic skill.

Because playing a golf video game is just as dull as playing a real life golf game.

The researchers did have a very clever method for studying aggression. Not sure it’s an accurate measure of aggression, but it is definitely very clever.

The researchers used the General Aggression Model (GAM) for this study (this is not where the cleverness comes in). “A widely accepted model for understanding media effects, the GAM posits that cognition, affect and arousal mediate an individual’s perception of a situation. Thus, in the short-term a violent video game may temporarily increase aggression through the activation of one or more of these domains. In the long-term aggressive scripts can develop and become more readily available.”

Here’s the clever part: to measure changes in aggression (or arousal), the researchers measured how much chili sauce the player was willing to subject a non-existent, pepper-sensitive taste-tester to. The more aggressive the player, the more willing they would be to subject a pepper-sensitive person to pain. Indeed, the researchers found that subjects who played the first person shooter allocated more chili sauce than subjects who played the puzzle-platform game. Furthermore, subjects who played the first person shooter online (in a more competitive environment) used even more chili sauce than subjects who played it offline.

The researchers did take some measures of affect (the emotional state of the players), but didn’t see a difference and so didn’t pursue the matter further. It’s unclear whether the FPS and the puzzle-platform game induced different levels of arousal, and whether arousal could be distinguished from aggression in the chili sauce test. It would be interesting to see how someone playing a Kinect running game would compare in this test. Just a little food for thought.

Neat Science Thursday – The Efficacy of Smartphone based Citizen Science Training

Citizen science has contributed greatly to ecological studies and nature surveys for over a century, but is just beginning to make its mark in biomedical research. Part of the problem may be the difficulty of enabling citizen scientists to participate in complex tasks. Foldit did an excellent job of harnessing the problem-solving skills of the gaming community by turning protein folding problems into a game. Eyewire has enabled citizen scientists to help map neurons by creating a challenging and interesting virtual coloring book. Training is extremely important for success at these tasks, so important that the recent PLOS ONE paper comparing three different citizen science training methods deserves more attention.

Although this study was again focused on ecological work, the authors of the paper studied three different modes of training:

  • In-person training: participants receive in-person training along with app-based videos and app-based text/images.
  • Video training: participants receive no in-person training, but do receive app-based video and app-based text/image training.
  • Text/image-only training: participants receive only app-based text/image training (no video or in-person training).

Each training mode started with an equal number of participants; however, removing low-submission participants and participant dropout left the groups with unequal numbers by the time of data analysis. All in all, 56 participants were included in the final analysis: 14 (in-person training), 17 (video training), and 25 (text/image training).

Participants were trained on the identification of specific invasive plant species in Maine and were asked to submit their pictures and locations of the invasive plants in question using the Outsmart app.

Table 1. Percent correctly identified by the five species investigated. doi:10.1371/journal.pone.0111433.t001

After analyzing the results, the authors found that participants did an excellent job of identifying the invasive species in the ‘easy’ category. The biggest differences between training groups were in identifying the invasive species in the ‘difficult’ category. The authors expected participants from the in-person training group to outperform the others, but found that participants in the video training group did just as well. This has important implications for citizen science training: geographic limitations restrict how much in-person training can be done, but video training is streamable and may help participants perform just as well.

It’s unclear whether participant dropout affected the final results, that is, whether less skilled users were dropping out, effectively inflating the percent-correct rates in the in-person and video training groups, since more users dropped out of those two groups than out of the text/image training group. It’s also unclear whether the training method affected dropout rates, but it is interesting that the text/image training group had lower dropout.

Also important to note: regardless of the training medium, the citizen scientists did pretty well overall, leading to two take-home messages:

  1. Citizen scientists can and do contribute high quality data (not a new finding, but certainly worth repeating)
  2. This research group did a pretty awesome job with their training to begin with, and the different media through which they offered it were a bit like icing on the cake.

See the original paper here; it’s open access and quite an enjoyable read.
For more citizen science games, visit this post.