1st BD2K/3rd Network of Biothings Hackathon Recap

Let me clarify one thing from the start: My programming skills are extremely limited.

That said, I was still very curious as to what a hackathon would be like and concerned about whether or not I’d be able to find a project to which I might make some sort of contribution.

2015.05.07
The first night of the hackathon was the meet, greet, and project pitches. Of the 45 or so people who signed up, about 39 showed up. Nearly everyone from the Andrew’s lab was there, and quite a number of people from Peipei Ping’s lab at UCLA as well. Illumina, one of the event sponsors, was there too, but we’ll get to that later.

Opening night introduction

A little after 7:00 PM, Greg gave an introduction to the hackathon and thanked the sponsors profusely before opening the stage up to the project pitches listed below (in bold are the ones which had enough interest to move forward in some form or another):

  • Genome-wide SNP Frequency Visualization
  • Create a Big Data API registry
  • MyGene.Info plugin to Cytoscape
  • Scalable Interactive phylogenetic tree API
  • Revamping the Emperor API
  • Wikidata to Gene Wiki interface
  • PyPI4Bio: The catalogue of Python packages for bioinformatics
  • A Python Flask or Django-based webapp for metaproteomics cluster job submission
  • FastR: Efficient entity property search system
  • Mobile (iOS, Android) Semantic Pathfinder
  • Interactive network browser
  • Edge Evidence Annotator
  • Schema extractor for WikiData
  • Bring KaBOB online using ElasticSearch and REST services
  • PHapL: Protein Haplotype Library
  • 23andUs.com: Dating service that optimizes genetic compatibility between you and your future mate
  • A mobile cooperative running app
  • Illumina pitch about the use of Facebase

Illumina was a great sponsor in that they not only introduced their Facebase platform, but also made some of their staff available to answer questions about working at Illumina and in the industry in general (apparently they’re hiring). IMHO they deserve props for that.

I decided to join the Wikidata to Gene Wiki interface project because I’d been working on the GENE/Gene Wiki partnership for the past nine months. Lively discussions about the projects continued past midnight.

2015.05.08
I was excited to learn that Anders (from Peipei Ping’s lab) and Ryan had joined the project. Anders had also been working on the Gene Wiki project and had already updated 60 cardiac protein entries in Wikipedia. Sorry, Anders, I’m going to keep calling it Gene Wiki for now since it was started as part of this grant: The Gene Wiki: Community intelligence applied to gene annotation. Once we start interweaving Wikidata, we should probably start calling it the BD2K wiki.

Immediately, we ran into issues. Wikipedia had a brand new (less than a week old) way of pulling and displaying information from Wikidata, but it was only usable in a test instance of Wikipedia, and none of the items and properties (data) that we wanted to work with were available in the test instance of Wikidata. In order for Sebastian to run a script that would call Wikidata items and properties into the protein box template used by Gene Wiki, we had to either work on the non-test instance of Wikipedia/Wikidata or mirror some of the existing information onto the test site.

After lunch, the entire team got to work manually copying items and properties as well as Gene Wiki pages from the real wiki sites over to the test site. After a limited number of items and properties were mirrored, Sebastian started writing the necessary code from scratch. Initially, we had hoped to fork an instance of the protein bot box or Gene Wiki Generator; however, after inspecting the code we discovered it utilized a bit of Java (in which none of us had sufficient expertise). Meanwhile, Anders and I continued to mirror additional properties from the Wikidata site to the test instance of Wikidata. We finished a little after 10:00 PM.

2015.05.09
After breakfast, we set out to create our PowerPoint for the demo (which came out quite pretty). We finished right before lunch, and the presentations/demos started a little after 2:00 PM. The schedule of the presentations is available here, and footage of the presentations is available here. Overall, the amount of work participants completed in such a short period of time was staggering and some of the demos were really cool! Congrats to the first place winner, Heart2heart (which had a working smartphone fitness app by the end of the event), and to the SmartAPI team (2nd place) whose work has incredible potential for solving a lot of Big Data fragmentation issues.

Wikipedia as a basic scientific reference

Wikipedia is probably the most current, extensive, and accessible knowledge base available. Currently, there are over 10,000 Wikipedia entries for human genes of interest thanks to the Gene Wiki project and the contributions of the dedicated and altruistic Wikipedia community. Unfortunately, many of these articles are out-of-date or are just stubs in desperate need of content. If you are in science and truly believe in open access, why not contribute?

It may be a bit intimidating to edit a scientific Wikipedia article if you’ve never done it before, but it is actually quite easy! In the interest of encouraging wiki contributions from those in STEM disciplines, here’s a 10-step walk-through for editing a Wikipedia entry:

1. Register/Login – No, it’s not necessary for you to do this step in order to edit a wiki, but you should, just so you can be proud of all the wiki pages you improve in the future.
2. Go to a wiki page in need of an update. Look up your favorite gene in Wikipedia and help improve it!
3. Click on the ‘edit’ tab in the top-right corner of the page.
4. Edit the content of the page. To add a section heading, wrap the title in double equal signs (e.g., ==Section==).
5. To add a journal reference, click on ‘cite’ in the navigation bar, click on the ‘templates’ drop-down menu and select ‘cite journal’. Enter the PMID of the article into the ‘PMID’ field and click on the search (magnifying glass) icon to auto-populate the other fields.
6. If you plan on using this reference more than once, assign it a reference name so you can insert it again later.
7. Click ‘insert’ to insert the reference.
8. When using a previously inserted/named reference again, just use the ‘Named Reference’ (clipboard) icon.
9. Make a note about your changes in the ‘edit summary’ field, and then preview (optional) and save your edits.
10. If you make a mistake, you can easily revert the edits you made. Just go to the ‘view history’ tab. Find the changes you need to revert and click ‘undo’.
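If you’d rather see what the editing toolbar is actually writing into the page, here is a minimal Python sketch of the wikitext behind steps 4 through 8. It assumes the standard ‘cite journal’ template with its ‘pmid’ parameter and the usual named-reference syntax; the function names, the reference name and the PMID are all made up for illustration, and the real toolbar fills in many more fields (title, authors, journal) than this bare-bones version does.

```python
# Sketch of the wikitext produced by steps 4-8 above.
# All function names, the reference name, and the PMID are hypothetical.

def section_heading(title, level=2):
    """Return a wikitext heading; level 2 gives ==Title==."""
    marks = "=" * level
    return marks + title + marks


def cite_journal(pmid):
    """Return a bare-minimum 'cite journal' template holding only a PMID."""
    return "{{cite journal |pmid=" + str(pmid) + "}}"


def named_ref(name, citation):
    """Wrap a citation in a named reference so it can be reused later."""
    return '<ref name="' + name + '">' + citation + "</ref>"


def reuse_ref(name):
    """Return the short form that points back to an earlier named reference."""
    return '<ref name="' + name + '" />'


if __name__ == "__main__":
    # Placeholder PMID, not a real article.
    first_use = named_ref("example2015", cite_journal("12345678"))
    print(section_heading("Function"))
    print("This gene does something interesting." + first_use)
    print("It also does something else." + reuse_ref("example2015"))
```

The first use of a reference carries the full citation, and every later use only needs the short self-closing form, which is why step 6 suggests naming a reference you plan to cite more than once.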

That’s it! What are you waiting for?

Need more editing tips? Check out the Gene Wiki portal and learn more ways to help improve Wikipedia as a knowledge base for human genes of interest.

Wikidata can change the way citizen scientists contribute

If you’ve been following discussions on citizen science, you’ve probably realized that researchers are generating so much data that they need extensive help parsing it and making it more useful. For many projects, citizen scientists have answered the call for help and made enormous contributions. Sure, there was a recent study which found that “Most participants in citizen science projects give up almost immediately”, but as Caren Cooper pointed out:

Just by trying, citizen scientists made important contributions regardless of whether or not they chose to continue.

But I digress...

What does citizen science have to do with Wikidata? For that matter, what the heck is Wikidata?

Most citizen science contributions come in some form of data collection (observations; sample collection; taking measurements, pictures, coordinates, etc.) or classification (identification, data entry, etc.), but few citizen scientists participate in analyzing the data.

Figure from ‘Surveying the Citizen Science Landscape’ by Andrea Wiggins and Kevin Crowston (the paper is open access)

Wikidata (a linked, structured database for open data) may serve to change that. Naturally, Wikidata relies on the contributions of volunteers; however, the data incorporated into Wikidata is open for anyone to use. In fact, Wikidata is begging to be used, and citizen scientists and citizen data scientists are welcome to use it. An international group has already put together a grant proposal (open/crowdsourced in the true spirit of Wikipedia) to make Wikidata an open virtual research environment. Dubbed Wikidata for Research, the proposal aims to establish “Wikidata as a central hub for linked open research data more generally, so that it can facilitate fruitful interactions at scale between professional research institutions and citizen science and knowledge initiatives.”
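As a rough illustration of what “open for anyone to use” can look like, here is a minimal Python sketch that asks Wikidata’s public web API for the English label of an item. It assumes the standard wbgetentities action on www.wikidata.org; the function names and the User-Agent string are made up for this example, and Q42 (the item for Douglas Adams) is just a convenient item to try, not one we worked with at the hackathon.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Public Wikidata API endpoint.
API_URL = "https://www.wikidata.org/w/api.php"


def build_label_url(item_id, language="en"):
    """Build a wbgetentities request URL that asks only for an item's label."""
    params = {
        "action": "wbgetentities",
        "ids": item_id,
        "props": "labels",
        "languages": language,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def extract_label(response, item_id, language="en"):
    """Pull the label text out of a parsed response, or None if it is missing."""
    entity = response.get("entities", {}).get(item_id, {})
    label = entity.get("labels", {}).get(language)
    return label["value"] if label else None


def fetch_label(item_id, language="en"):
    """Make the actual network call (hypothetical helper, needs internet access)."""
    request = Request(
        build_label_url(item_id, language),
        headers={"User-Agent": "wikidata-label-example/0.1"},  # made-up client name
    )
    with urlopen(request, timeout=30) as reply:
        return extract_label(json.load(reply), item_id, language)


if __name__ == "__main__":
    print(fetch_label("Q42"))
```

Keeping the URL building and the response parsing separate from the network call means the pieces can be tried out on a saved response without touching the live site.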

As exciting as this all is, there is a lot of work that still needs to be done to make Wikidata more successful. Although it’s open access, it’s still a bit inaccessible due to the lack of clear documentation for new users. It’s not that the information doesn’t exist: there is a ton of information on Wikidata, and a lot of neat tools are already available or in development. You just have to look really hard for it. Fortunately, the Wikidata community is already aware of the key issues that need to be addressed in order to become more successful.

Researchers have already made considerable effort to make science more accessible by contributing to science-related articles. There are over 10,000 genes already in Wikipedia thanks (in part) to the Gene Wiki initiative! It makes sense that Wikidata is next. A lot of progress has been made in this arena, but I’ll save that for later.

Neat Science Thursday – Crowdsourcing blues

As is evident from a few of the previous posts on crowdsourcing science, Wikipedia, and the GENE/Gene Wiki partnership, I think crowdsourcing science and citizen scientists are awesome! The speed with which a lot of interested non-scientists can sift through data is simply astounding!

In spite of all the positive features of crowdsourcing science and information (like Wikipedia), there are also some interesting drawbacks. For example, wiki entries have been vandalized as part of a joke/challenge started by a comedian to comment on the wisdom of crowds, and more recently, users from government-related IP addresses have been systematically editing pages to reflect a particular political agenda. This kind of vandalism has prompted the banning of government IP addresses in the past.

But issues with crowdsourcing are not limited to just information platforms like Wikipedia. Crowdsourcing competitions meant to foster participation and innovation have also been hijacked, as covered in a recent (and very interesting) post on Science2.0.

According to the original post found at The Conversation about a new study:
“The research, published today in the Journal of the Royal Society Interface, found the openness of crowdsourced competitions, particularly those with a “winner takes all” prize, made them vulnerable to attack.

The researchers used game theory to analyse the trade-off between the potential for increased productivity from crowdsourcing a project, and the possibility of it being set back by malicious behaviour. They cited the DARPA Network Challenge as an example of a hijacked crowdsourcing competition, in which the organisers were left to sort through many fake submissions, including fabricated pictures of people impersonating DARPA officials…” (continue to the original post)

Or visit the actual study publication (and hope your institution has access to it) if you want to read the original study.

Making research accessible with the GENE/Gene Wiki Partnership

Did you know you can update your favorite gene on Wikipedia and get a review article published while you’re at it? Here’s what you need to know about the GENE/Gene Wiki partnership:

  1. What is it? The goal of the Gene Wiki is to create a comprehensive Wikipedia article for every human gene. To incentivize authors to improve Wikipedia content, GENE is now soliciting new gene-specific review articles under a new dual-publication model. Authors are invited to create two separate versions of their review (one for the journal, and one in Wikipedia). More on the partnership here: Gene Wiki Reviews: Marrying crowdsourcing with traditional peer review.
  2. How long should the review article be? The length of the review article is up to you! Since you are the expert on the gene you’re writing about, the length is based on whatever you think is necessary to describe the current state of the field.
  3. How long should the Wikipedia article be? We are targeting a final length of approximately 1200 words (though longer and more detailed articles are certainly welcome).
  4. How are the two versions different? One version is targeted at professional scientists following typical academic and editorial standards. The second version is written for the Wikipedia audience and includes a slightly heavier emphasis on a general audience. Both versions will be peer-reviewed together, but for copyright reasons, these two versions must be separate works that have no substantial similarity. Some examples of review articles and Wikipedia entries published under this model include:
  5. I am busy but intrigued, what is the timeline? We generally suggest a 2-3 month deadline, but since this is an ongoing series in the journal, the timeline is flexible and can be worked around your schedule. Don’t be discouraged from participating because you are busy now. Make the commitment to submit when your schedule permits.
  6. Do I have to go at this alone? Absolutely not! If you have colleagues who would make good co-authors for the review, feel free to solicit their assistance.
  7. Do I have to write the wiki article all at once? Nope. Our goal is to incentivize you, the expert, to make your knowledge about your favorite gene accessible. If it’s easier for you to write the wiki article in pieces, go ahead and do so! As long as the wiki entry is complete by the time you submit your manuscript, we will be happy to accept your review article.
  8. The gene I work on doesn’t make much sense to write about alone, how should I contribute? Genes that work in concert can be tackled as a pair as with this example:
  9. Why should I do this? By publishing a gene-specific review article, you help your scientific colleagues stay abreast of the current literature on your favorite gene. By publishing under the dual-publication model (i.e., on Wikipedia), you help make your favorite gene more accessible to everyone, allowing more people to understand the importance of your field of research. Everyone wins!
  10. How do I get in on this? Check to see whether or not your favorite gene could use some serious contributions on Wikipedia. If so, contact me. Include your gene of interest in the email, and your preferred deadline for the manuscript submission.