Finding buried treasure in shifting sand

The problem of keeping up with scientific literature is not new. In 1986, information scientist Don R. Swanson published an article about mining the wealth of knowledge buried in academic literature. In his article, “Undiscovered public knowledge”, Swanson investigated information that was not readily available simply because individual biomedical research papers were (and in many ways still are today) created “to some degree independently of one another.” By investigating literature that was “logically connected” but otherwise “non-interactive”, Swanson teased out a hypothesis essentially joining two small fields of research.

Since then, researchers have kept trying to develop methods to wade through this ever-growing body of literature, only now about one million new biomedical research articles are published per year, compared to the roughly 350 thousand published in 1986.

Finding the right information is a problem that's only going to get worse unless we do something

Compounding this issue is the growing amount of information that is contained within the biomedical research literature but is not readily accessible because it lacks appropriate annotations.

This post was originally written for Mark2Cure and can be viewed in its entirety here.

Neat Science Thursday – Open Access is Awesome

Twitter is a great resource for scientific inspiration, because the awesome scientists who regularly tweet share a lot of interesting articles, blog posts, and other great content. One particularly humorous piece was the link to the overly honest science methods pictures, especially the one about pay-walls.

Awesome picture from

As hilarious as the image is, it draws attention to several important issues in research. First is the excessive attention paid to citation-based indices such as Impact Factor. How often are highly cited articles cited without being read simply because they’re locked behind pay-walls? Why should the merits of a researcher be judged primarily on their publications in highly cited, pay-walled journals? Let’s not even touch on how tweeting may impact the perceived respectability of a scientist among his or her peers.

If getting people to read the research article is of genuine interest, then open-access takes the cake. Citations may serve as a proxy for the number of people who read an article and found it valuable, if they actually had access to read the article. Otherwise, it’s just another number.

The Research Information Network (RIN) conducted an analysis of the articles published in Nature Communications and found that OA articles are cited more than subscription articles and attracted three times as many views as those only available to subscribers. Read more about this study here, see the report here, or get the full data set here, because it’s open access, SO YOU CAN!

Of course, these issues may not be as important as getting and keeping a job. Unless academic institutions consider other metrics in their hiring practices, less established researchers may face considerable pressures against publishing in OA journals.

As one keen PhD candidate put it,

    “Scientists applying for funding and positions are judged not only according to the quality of their work, but also where it is published. Having a single paper published in any of these high-profile journals can have a transformative effect on a career. If publication requires flashy work in fashionable fields then, so the argument goes, this offers the most reliable path to funding and permanent positions.”

View his insightful post about early career researchers.

And more recently, Erin McKiernan, an early career researcher, posted a compelling call for researchers to stand up for and publish in OA journals.

    At any stage of your career, you have the right to stand up for what you believe in. If you believe in openness, stand up for it. Access to information is a human right, but it is often treated as a privilege. This has to change. And it will take all of us to make it happen.

See the rest of her post here

If these were not reasons enough, then consider the public good. How can grant-funded researchers expect the public to fund science (i.e., grants) and yet not be able to see the results of its investment? One woman went to extreme lengths in order to gain access to research articles about her children’s genetic disease.

    “We spent hours copying articles from bound journals. But fees gate the research libraries of private medical schools. These fees became too costly for us to manage, and we needed to gain access to the material without paying for entry into the library each time. We learned that by volunteering at a hospital associated with a research library, we could enter the library for free. After several months of this, policies changed and we resorted to masking our outdated volunteer badge and following a legitimate student (who would distract the guard) into the library.”

Read Sherry Terry’s article here.

Or consider the story about the woman who deciphered her own genetic mutation. She started by searching for biomedical papers on her disease, and then “scratched around in Google until she found uploaded PDFs of the articles she wanted.”

Convinced? Check out the list of journals to avoid like the plague.

And if you’re really dedicated, and work on a gene, join the Su lab’s efforts in expanding the publicly available knowledge base on human genes: Gene Wiki.

Omics Pipe- computational framework for reproducible multi-omics data analysis

As mentioned on Monday, the researchers at the Su lab have been working diligently on a paper to introduce their new computational framework for reproducible multi-omics data analysis.  Advancements in sequencing technologies have made sequencing cheaper and more efficient, but this means that researchers need a way to handle all that data.  Different groups have generated a variety of tools for analyzing disparate datasets, but pipelines that integrate these different tools can be daunting to use without a computational background, or may be unaffordable due to commercial licensing fees.

And that’s where Omics Pipe comes in.  Its framework is modular and open source, so its functionality can be extended by the research community; it already supports several pipelines; it has built-in version control; and best of all, it’s free!

Check out the preprint article to learn more about Omics Pipe, or visit the Omics Pipe repository.

Here are just some example pipelines of tools that Omics Pipe can automate. Try it out for yourself!

Stipend ready meals- cost-effective fillers

They’re staples for a reason: they fill you up quickly and are readily affordable. I’m talking about staples that people have eaten for ages now… noodles, rice, oatmeal, potatoes, and beans. I probably missed some fancier ones, because this is all about eating on the cheap. We’ll devote a post to each staple, starting with noodles. I briefly mentioned pasta/ramen noodles before because noodles are a very cost-effective (and easy, because who has time?) way of turning a side dish into the main meal.

Any cream-based soup can become a great pasta sauce. Add noodles into broth-based soups for a filling meal. Toss pasta noodles with olive oil, a can of tuna, and some veggies for an instant pasta salad. If you must eat out, don’t get pasta because it’s too damn cheap/easy to make. Cooking pasta is fast and easy, but if you really can’t spare the 15-20 minutes to do it when you’re hungry, make it ahead of time, and throw it in the freezer. That’s what you buy when you get those frozen pasta entrées anyway. When making pasta, the noodles will cost you less than a dollar a pound, so it’s almost like paying $5-$9 for sauce when you buy a pasta entrée. You can do this when your income is higher, but if you’re living off a stipend, save your money and make your own darned pasta.

Here’s a ridiculously easy-to-make pasta recipe, courtesy of Campbell’s soups:
1 can (10 3/4 ounces) Campbell’s Condensed Cream of Chicken and Mushroom Soup (50 cents a can on sale)
1/2 cup milk
2 cups cubed cooked chicken (Hello Costco $5 chicken!)
3 cups medium egg noodles, cooked and drained
Chopped fresh parsley (you can get bunches of these for less than 20 cents on sale at the international markets)
+Whatever random seasonings you like

It’s supposed to make four servings, but I’ve personally found the recipe to be too salty as is, so usually I just use 1 pound (uncooked dry weight) of cooked pasta, which means a lot of frozen pasta entrées afterwards. For variety, you can substitute the cream of chicken with clam chowder and the cubed chicken with canned tuna.

Surrounded by geniuses- part 11 : Erick

The new-genius-who-was-here-before-me-but-not was not mentioned when I first joined the lab because this genius was not (as far as I knew) actually a part of the lab…yet. Sure, his name floated around in the Su Lab chat channels, but he was not physically present, presumably because he was in another lab. I also saw him during some of the Su Lab meetings, but then again, other labs also show up during these meetings, so his status was still unclear to me. What is clear is that the new-genius-who-was-here-before-me-but-not is a genius. Not only did he give a fascinating presentation on one of his research projects, he also has an MHS in Public Health AND an MD, which makes the new-genius-who-was-here-before-me-but-not the perfect person to conduct research on developing open source translational bioinformatics methods to advance personalized medicine.

Specifically, Erick’s research projects include:

  • Developing statistical methods for adaptive enrichment in clinical trials with extensions to post-hoc subgroup analyses.
  • Analyzing phased whole genome sequence data from the Wellderly Study
  • Developing computational tools for a massively-multiplexed and inexpensive gene expression assay (Rnl2-based RASLseq)
The new-genius-who-was-here-before-me-but-not has some impressive credentials
The new-genius-who-was-here-before-me-but-not is an expert in a lot of subjects, including PSB participation

Additionally, the new-genius-who-was-here-before-me-but-not is a pro when it comes to the Pacific Symposium on Biocomputing (PSB) and has dispensed great advice to the genius-in-charge with insane mental organizational skills on how to best enjoy the conference.

On a side note: In case you missed it, the genius-in-charge with insane mental organizational skills and the Young-dad-in-charge-of-too-many-crazy-projects genius along with some others will be helping to chair the session on crowdsourcing and data mining at PSB. Though it’s too late to submit a manuscript on this session topic, it isn’t too late to register for the conference.

Neat Science Thursday – Crowdsourcing blues

As evidenced by a few of the previous posts on crowdsourcing science, Wikipedia, and the GENE/Gene Wiki partnership, I think crowdsourcing science and citizen scientists are awesome! The speed with which a lot of interested non-scientists can sift through data is simply astounding!

In spite of all the positive features of crowdsourcing science and information (like Wikipedia), there are also some interesting drawbacks. For example, wiki entries have been vandalized as part of a challenge started by a comedian as a joke/commentary on the wisdom of crowds, and more recently, users from government-related IP addresses have been systematically editing pages to reflect a particular political agenda. This kind of vandalism has prompted the banning of government IP addresses in the past.

But issues with crowdsourcing are not limited to information platforms like Wikipedia. Crowdsourcing competitions meant to foster participation and innovation have also been hijacked, as covered in a recent (and very interesting) post on Science2.0.

According to the original post found at The Conversation about a new study:
“The research, published today in the Journal of the Royal Society Interface, found the openness of crowdsourced competitions, particularly those with a “winner takes all” prize, made them vulnerable to attack.

The researchers used game theory to analyse the trade-off between the potential for increased productivity from crowdsourcing a project, and the possibility of it being set back by malicious behaviour. They cited the DARPA Network Challenge as an example of a hijacked crowdsourcing competition, in which the organisers were left to sort through many fake submissions, including fabricated pictures of people impersonating DARPA officials…” Continue to the original post.

Or visit the actual study publication (and hope your institution has access to it) if you want to read the original study.