Tuesday, October 7, 2008

Heuristically Beneficial Artifacts

I've recently embarked on a study of the effect that clustering mass spectra has on the design of MRM experiments. Specifically, spectra from the shotgun proteomics discovery phase are clustered to see how that affects the selection of peptide species and the transitions the mass spec looks for during the second, targeted sequencing phase.

While the work is ongoing, a very interesting (at least to me) result has come out of plotting the unfiltered SEQUEST results from the non-clustered and clustered versions of the mass spec data. They say a picture is worth a thousand words, so:


The chart shows the XCORR values of the clustered spectra plotted against the XCORR values of their respective member spectra, independent of which peptide was identified for each. The color coding represents scores for peptides that are part of the decoy database (the logic is sketched in code below the list), where:

  • blue = a decoy hit was scored in the clustered spectra, but not the original
  • green = a decoy hit was scored in the original spectra, but not the clustered spectra
  • red = both spectra scored a decoy peptide
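For concreteness, here is a rough Ruby sketch of that color-coding rule. It is illustrative only: the :decoy and :xcorr fields and the shape of each spectrum pair are made-up names for this example, not the actual analysis code.

    # Illustrative sketch of the decoy color coding described above.
    def decoy_color(original_hit, clustered_hit)
      if original_hit[:decoy] && clustered_hit[:decoy]
        :red    # both the original and the clustered spectrum scored a decoy peptide
      elsif clustered_hit[:decoy]
        :blue   # decoy hit in the clustered spectrum only
      elsif original_hit[:decoy]
        :green  # decoy hit in the original spectrum only
      else
        nil     # neither scored a decoy; plotted without special coloring
      end
    end

    # One point on the plot: x = original XCORR, y = clustered XCORR
    original  = { :xcorr => 2.31, :decoy => false }
    clustered = { :xcorr => 3.05, :decoy => true }
    puts decoy_color(original, clustered)   # => blue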
What you'll note is that the clustered spectra score a different set of peptides highly than do the original spectra, and vice versa. At least for this data set, which is SILAC labeled, each method is identifying a different population of peptides. This goes against all current publications, so I have to be careful in interpreting these results and will need a significant number of validation experiments, but this is exciting stuff!

In particular, it is going to be very interesting to see whether clustering actually helps or hinders ion selection when designing MRM experiments. If it does help ion selection, I have already come up with the catchy name: Heuristically Beneficial Artifacts™. If not, well, we will at least have looked at the real-world effects of a methodology applied to a new type of experimental goal.
More on this as it develops.

Monday, October 6, 2008

Not so much about science...

So in looking back over the last few entries, it seems I have strayed from my original mission, which was to cover issues regarding informatics and how it relates to science. The first part seems to have dominated the posts recently, so I am dropping all pretenses and starting a new blog over at Blogger competitor WordPress.

Why a new platform? I figured I would give WordPress a spin to see how it compares to Blogger. Also, I like the template they had for the blog ;) So far, the interface is similar, but looks a bit cleaner than Blogger's. Overall, it offers the same functionality and workflow, with the addition of one thing that I can see coming in very useful for a site like the one I envision: permanent pages.

In effect, if I write something that I want to be prominently displayed at all times, I create a page. This can come in handy, say, if I post a set of instructions for setting up your development environment, or provide a listing of resources. As regular blog posts these types of items are "discoverable" by searching, but they will eventually go away as new posts drown them out.

Another argument for a page as opposed to a post is to keep a post from becoming stale. How many times have you thought you found the answer to a question in a blog post, only to find that said post is a bit old and the solution no longer applies? I guess stackoverflow seeks to remedy this, but bloggers can do their part by giving "important" and generally applicable posts prominence as pages; that way they are always reminded to update them, since they are always visible.

Anywho, there I go again, talking strictly about web applications. Well, I guess I'll shut up now and you can check AppMecha for more posts related to these issues.

Monday, September 15, 2008

Why Thn.gs should send a chill down every ISV's spine

Independent software vendors (ISVs) have recently been touting some of the most useful applications that I have seen in my entire career. From 37signals to GitHub to CulturedCode, the genius of these applications is their simple interfaces to complex sets of requirements. That last one, CulturedCode, has a problem though. Their flagship, about-to-launch product just got pwned by some Russians.

While 37signals and GitHub are already web applications with a service-based revenue stream, CC's product is an OS X desktop application that works beautifully as a get-things-done task organizer, with a perpetual licensing model. The aforementioned Russians took every piece of this application and made it into what appears to be an almost complete equivalent of the desktop application. And if that were not bad enough, they integrated it with Google Gears for offline use.

What can CC do about this? Not much, unless they already have a huge bankroll for lawyers. And this situation should send chills down the spine of every small software shop releasing small useful tools. The very nature of these shops constrains feature creep, which in turn forces the designers and developers to squeeze the most they can out of what they have in place, which in turn makes the software simple and powerful at the same time. But it also makes these applications vulnerable to xeroxing via the web, (relatively) cheap labor pools, and a robust distribution network with firewall-like immunity to legal action in the form of international borders (a "litigation-wall"?).

Anyone want to take bets on who is next on the feed tray? My guess is a Delicious Library web clone.

So the take-home message is: design software that is not easily replicated, either by feature set or connectivity, or realize that your app really is easy to xerox and have a large marketing scheme to drown out any news of the enemy.

Speaking of Google, there was that little hiccup in relations related to Google releasing a clone of 37signals' Campfire with the launch of App Engine. And then there is Chrome+Gears, the browser-DB combination that makes web applications even more desktop-like. Not that they are alone in this: Adobe (AIR) and Microsoft (Silverlight will certainly "extend" the reach of IE) are walking the same browser-as-platform path, and trying to build their market share on the "everything should be free" web culture, and that really raises my mercury. Fucking piracy enablers.

Friday, September 12, 2008

Licensing makes Cloud Computing ... difficult

I have thought a lot in recent months about how to best leverage cloud computing resources, or utility computing, that are increasingly becoming available to the general development community and one issue in particular makes me cringe: LICENSING.

It just so happens that in my field, proteomics, the open source set of tools gathers a lot of press, but most submissions for publication still use commercial algorithms for the initial data analysis, even though plenty of research has shown (and been published) that results from commercial and open source algorithms are comparable. There is an inherent level of trust that manuscript reviewers have in the commercial offerings that is hard to overcome, hence most researchers still opt to use the commercial algorithms as the gold standard.

Not that this is a bad thing, mind you. As someone who supports the informatics efforts of many researchers, I find that the commercial offerings are much more stable, and consequently easier to support and maintain, than most of the current crop of open source offerings.

The trouble lies with the rigid licensing models of commercial offerings. Specifically, you must purchase perpetual licenses for a certain number of compute cores. Such a model is just not compatible with what I would like to do as a service center, namely provide software-as-a-service billing to researchers. True, the high up-front licensing cost can be amortized over the life of the support contract, but that assumes the computers running the algorithm are already procured and dedicated to the software (not a bad assumption in most cases, since the hardware costs pale in comparison to the licensing).

In effect, there is no way to make a utility computing model, such as the one offered by Amazon Web Services, work with these sorts of license restrictions. The set-up and tear-down cost of a compute job is too high for this to be a viable full-time solution.

What I would like to do is augment my current computing capacity during crunch times. Dedicated licensing prevents this. As does the way most networked algorithms work, but that's another post.



Monday, April 14, 2008

Pharma's Futures

Alternate title: "Bet Big, lose a long long time, then win so big that it makes your losses look like pocket change. Or maybe I'll just lose my shirt. Maybe both simultaneously, who knows?"

This post is motivated by a set of lectures I attended as part of the 3rd Annual ITMAT Symposium on Translational Medicine (Full disclosure, ITMAT is currently my employer). Here are the first two lectures given today (for posterity in case the link above goes to the ether some day) that are the focus of this post:

Nassim Nicholas Taleb, PhD, London Business School, “Errors in the Analyses of Market Potential for Drugs”

Dale Nordenberg, Healthcare Industry Advisory, PricewaterhouseCoopers LLP, “Global Trends and Drug Development”

The first talk was by the author of The Black Swan, a very entertaining read that outlines the very large effects that highly improbable events have on economic markets. The essence of his book, and his talk, was that in markets that are not normally distributed, such as the securities market, a rare event can either capture huge amounts of wealth or lead to huge losses, and the incremental gains/losses reported over the years are really just noise in the data, and hence unimportant in a sense. Dr. Taleb backed up his claim with a few charts of trade earnings over a decade, most notably showing that on two days in which rare occurrences took place, he both gained and lost 98% of his total revenue over the ten years he had data for. Two days in ten years. Yikes.

Taleb went on to claim that these types of events are equally applicable to big pharma. Currently the top 6 or so blockbuster drugs capture over 90% of revenue, and that's nowhere near a normal distribution. Recent high-profile litigation has also shown that rare events can hurt a company's bottom line as well.

The second talk, by Dale Nordenberg, focused on what industry pundits and analysts are calling "Pharma 2020", or the shift in the industry towards providing personalized medicine. This is pharma's equivalent of the so-called "long tail" economics model popularized by Internet sales: selling a larger variety of drugs, each tailored for use by fewer people.

There is no doubt in my mind that the gains in safety and efficacy of future drugs offered to the public would be enormous if pharma did indeed move in this direction, but I am a bit skeptical about the financial incentive for big pharma to follow the piper. The model essentially assumes that pharma can reach a normal earnings distribution, which in turn assumes that big pharma will be insulated from the risks of a non-normally distributed earnings environment.

But if we take Taleb's lecture at face value in the context of current litigation practices in the US, then we already know that the risk model pharma faces is one where a rare event has a very large detrimental effect. Without the equally large and insulating effect of blockbusters on revenue, how would a company survive a class action? If I were a CEO, I would probably pay lip service to the Pharma 2020 vision, in the hope that such actions would lead to protection from Congress, but my horse's name would still be "Blockbuster McGee".

Google App Engine: constraints are good

There has been a lot of rejoicing and jeering over Google's web application deployment offering, Google App Engine. Most of the gripes (that I have seen) have been about the constraints: support for only a single language, limited database capabilities, and no file or OS access of any kind. Frankly, I think there is an element of FUD to all of these.

First, for language choice, App Engine has been receiving a lot of flak over the choice of a hobbled Python over and above all other languages. There have also been a couple of gripes about the lack of real foreign key constraints in the relational layer.

Boo frickin hooo. Stop whining and get coding, and you'll come to the same conclusion that all artists, composers, coders, and generally anybody that ever created anything did: constraints are sometimes the inspiration for the creation. Sometimes it is the constraints that shape the work more than anything else. Lack thereof may sometimes lead to interesting experimentation, but I put forth that actively not following a system of conduct is itself a constraint. Trying to be original is hard work, made all the harder by not framing your work within something familiar.

But I also should stop bitching and get to work. My idea has the potential to reshape the way collaboration science is conducted, but I need to deliver the tool to the audience. Seems like G-Apps would be a perfect test bed.

Thursday, April 10, 2008

If you can't beat 'em, join 'em

Just as Twitter was effectively killing my (and others') blogging fecundity, I saw that they posted a handy-dandy little link on the dash for inserting tweets into your blog. Sweet. Now I can have the best of both worlds.

The Twitter giveth and taketh away.

Thursday, March 13, 2008

Twitter is killing my blog

Twitter is killing my blog initiative. In fact this post can just be a straight copy-paste from some recent tweets (reverse chronological order):

Angel Pizarro delagoya Ah, just saw that on XP you can't see my character cleverness. Sux4U dude
Angel Pizarro delagoya suppose a hash assignment {:foo ➠ :bar }
Angel Pizarro delagoya and also wondering if I can use ruby's syntatic suger with random windings
Angel Pizarro delagoya It's like the evolution of large emails ☞ short emails ☞ IM
Angel Pizarro delagoya notice that twitter is killing my blog initiative. Why blog when a short tweet will do?


Really? It's that easy? Can you beat that ease of brain dump, Blogger.com? I think not. Some Picasa functions came close for posting quick notes on photos, but there isn't really a native Blogger equivalent (that I know of). The whole create post -> edit post -> post post life cycle is too long for a quick set of thoughts you want to jot down.

I could probably research this more, but really why would I when twitter is always there, and I actually have an audience that comments back to my posts. And so back to my original point: Twitter is killing my blog.

Friday, February 15, 2008

Sequel's lackluster to_*

Sequel is a great bare-bones ORM, but the bare-bones quality of Sequel::Model leaves something to be desired. For instance, the obj.to_json method just calls the default Ruby Object#inspect method, which prints out the class name and memory address. Not helpful. There is also no to_xml for easy REST incorporation. Almost makes me want to go back to ActiveRecord, but then what's the point in using Merb?

Anywho, it's not that hard to extend the functionality of Sequel::Model, so I have started writing some gems to make development with Merb + Sequel a little easier. Like real to_xml & to_json methods on the model instances and instance collections. More on this as it develops.
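To make the intent concrete, here is a minimal sketch of the kind of extension I mean; this is not the gem itself. It assumes the json gem is available, leans on Sequel::Model#values for the column data, and the XML rendering is deliberately naive (no escaping, no nested associations). A collection version would simply map over the instances and wrap the result.

    require 'rubygems'
    require 'sequel'
    require 'json'

    class Sequel::Model
      # Serialize the column values instead of the default Object#inspect output.
      def to_json(*args)
        values.to_json(*args)
      end

      # Naive XML rendering: one element per column, wrapped in the model name.
      def to_xml
        root = self.class.name.downcase
        body = values.map { |col, val| "  <#{col}>#{val}</#{col}>" }.join("\n")
        "<#{root}>\n#{body}\n</#{root}>"
      end
    end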

Merb TLS mail plugin gem

UPDATE: Now available on GitHub: merb_tlsmail GitHub page

My previous post on sending mail via a TLS SMTP server on merb covered monkey patching Merb::Mailer.

I took the time to code this up as a gem, using Merb's meta-programming routines to extend Merb::Mailer in a standard way (for Merb, that is). See this open ticket in the Merb Lighthouse issue tracker to download the gem until it is released as a proper plugin.

Tuesday, February 12, 2008

Secure SMTP server (TLS) from merb apps

It seems that Merb's Mailer class expects either a local sendmail client or a non-TLS-enabled SMTP server. This is not a problem unique to Merb, but rather a deficiency in Ruby 1.8.

I took some time to look around and found that Rails has the same problem, and it was fixed via a plugin, not a gem as is the "merb way". There was also a gem that packaged the Net::SMTP classes from Ruby 1.9, which do have TLS support. It isn't hard to guess what I did next.

I monkey patched Merb::Mailer to overwrite the net_smtp method and added two config options to merb_init.rb. See the pastie for the code example here:

http://pastie.caboo.se/151190
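
Since pasties have a habit of disappearing, here is an approximate reconstruction of the approach for posterity. The internals of Merb::Mailer circa 2008 are paraphrased from memory, and the :tls option is just an illustrative name for one of the added settings, so treat this as a sketch rather than a copy of the pastie.

    # In merb_init.rb -- requires a TLS-capable Net::SMTP (Ruby 1.9's, or the backport gem).
    require 'net/smtp'

    class Merb::Mailer
      # Overwrite the stock net_smtp delivery so it issues STARTTLS before
      # authenticating, which servers like Gmail require.
      def net_smtp
        smtp = Net::SMTP.new(config[:host], config[:port].to_i)
        smtp.enable_starttls if config[:tls]
        smtp.start(config[:domain], config[:user], config[:pass],
                   config[:auth] || :plain) do |session|
          session.send_message(@mail.to_s, @mail.from.first,
                               @mail.to.to_s.split(/[,;]/))
        end
      end
    end

    Merb::Mailer.config = {
      :host   => 'smtp.example.com',
      :port   => '587',
      :user   => 'someone@example.com',
      :pass   => 'secret',
      :domain => 'example.com',
      :auth   => :plain,
      :tls    => true      # illustrative extra option to toggle STARTTLS
    }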

Thursday, February 7, 2008

Where has the Semantic Web failed us?

News about the Semantic Web being the "next big thing" has been hitting web application developers over the head for years, like it was news about the iPhone. But where are the products? Who uses it? Except for academic papers, a standards process that nobody pays attention to, and ontology narcissists, nobody uses RDF or OWL or any of those supposedly "next generation" tool sets and languages. OK, maybe Powerset will, but I'll believe that when I see it.

Certainly Swoogle is no Google, although it is starting to address what I see as the most overlooked part of the Semantic Web: usability. It seems that developers and proponents of Web 3.0 think regular users of the web are a lot smarter than they are. Swoogle does actually show a pretty nicely formatted report on the metadata it has for a result, if you know what you are looking at, that is. Yet the main result link leads to the originating ontology, which is an RDF XML file. Yeah, that's helpful. Even if Joe Public is aware enough to click on the metadata link, instead of the big red button that is the main link, I can't ever imagine him making heads or tails of the report, or for that matter caring.

Why is that? Why is usability not even a concern for the majority of ontology & semantic web developers? What makes this situation even more of a disaster is that tagging (and tag clouds) is so widespread and ridiculously easy to understand. How is tagging any different from RDF annotations? A little more text, that's what. Oh, and querying RDF is a bitch, so developers are also affected by the situation, making adoption of this "standard" that much more unlikely.
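
To make the comparison concrete, here is a toy illustration of what I mean by "a little more text". The URIs and predicate below are invented for the example; the point is only the shape of the data.

    photo = "http://example.com/photos/42"

    # What a user actually does on a tagging site:
    tags = { photo => ["sunset", "beach"] }

    # Roughly the same information as RDF-style triples (subject, predicate, object):
    triples = [
      [photo, "http://example.com/ns#taggedWith", "sunset"],
      [photo, "http://example.com/ns#taggedWith", "beach"]
    ]

    # The same "query" in each world: which photos are tagged "sunset"?
    by_tag    = tags.select { |_, t| t.include?("sunset") }.map { |url, _| url }
    by_triple = triples.select { |s, p, o| p =~ /taggedWith/ && o == "sunset" }.map { |s, p, o| s }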

PS: I am not part of, nor do I hold any affiliation with, NG&E, but "the 85%" is one of those stereotypes that rings true to me.

Friday, January 4, 2008

Zed.. very humorous

Zed Shaw's latest rant is a hoot. When I first read it, it was clearly the draft he said it was. I'm glad he posted it as a draft, though, because the next iteration gave DHH a chance to clarify, and also gave Zed a chance to frame the whole rant a bit better with his admission that he himself was the main person responsible for almost going to the poor house.