Tuesday, December 11, 2007

Ask not for what you think you need...

I have been reflecting a lot recently on why researchers think they need grand solutions for relatively small problems. Specifically, there is a perception among benchers who are conducting large-ish experiments that they need some sort of LIMS to manage their data. Frankly, as I've stated many times (at least in conversations with other IT folk, and even with benchers), most researchers don't want a LIMS. What they really want is fancy file storage.

Yet, the town folk keep insisting "we need LIMS, please give...". A LIMS is probably the only thing that potentially fits the bill for experimental data management, so that's what benchers ask for, when most likely a digital asset management application would suffice. Heck, even all those BitTorrent sites can potentially do the job that researchers need.

If you are a coder who is continually asked to provide a LIMS to researchers, or if you are a bencher who is interested in LIMS for data management, here is a set of questions you can ask before going any further down the rough and tumbly road that is LIMS adoption:

  1. Are you in a regulated environment?
  2. Are you willing to mandate use of the LIMS?
  3. Do you have adequate personnel to support the LIMS locally? (E.g. do you have a dedicated person who will actively promote the use of the LIMS, train folks, configure the system, do extensive follow-up, etc.? Vendor support will only be of help at the start of the adoption process.)
  4. Do you have a lot of spare cash? (Think 6 figures to buy an initial bank of licenses)
  5. Will you have a lot of spare cash for the next 3-5 years? (Think 5 figures to keep annual support and maintenance up to date.)
If you answered "No" to any of these questions, seriously reconsider buying what is traditionally known as a LIMS. Instrument control, automatic data acquisition, yada, yada, all those marketing features used to sell a LIMS don't mean squat if no one uses it in the first place.

Thursday, November 29, 2007

Python 25 & MySQL on Leopard

There was a bit of trouble when I tried to use MacPorts to install the excellent Trac project management and issue tracking application. Specifically, the py25-mysql port did not compile correctly.

Trolling through the InterWeb, I found this post about a fix to compile the python module from scratch. The post, however, has more instructions than are necessary, so here is my revised procedure:

  1. Download the source from here.
  2. Unpack the archive and edit the _mysql.c file to comment out lines 37-39:

    // #ifndef uint
    // #define uint unsigned int
    // #endif

  3. Edit the site.cfg file to set the mysql_config path. For me this was:

    mysql_config = /usr/local/mysql/bin/mysql_config

  4. Compile as normal

    python setup.py config
    python setup.py build
    sudo python setup.py install


Now if only someone would make these changes in the port file, then trac installs would be super easy, instead of just easy-ish.

Tuesday, November 27, 2007

X11 on Leopard, auto-launch is awesome

Being used to Tiger's version of X11, I automatically started an X11 session at the start of the day to enable X11 port forwarding from our Linux servers. It has annoyed me that Leopard starts an xterm window every single time, and having looked through the xinitrc files and seen no mention of an xterm on launch, I was being driven batty.

I finally got fed up enough today to start hunting the InterWeb for a solution and found this page that gave me several tidbits of good news. First is that X11 is now an on-demand service, and will start when it is needed. Second, /Applications/Utilities/X11.app is not a link to the service itself, but really a link to starting an xterm via the on-demand launchd X11 service. Hence I was starting that xterm myself! D'oh!

This has two practical implications for my work method. First, I don't need to manually start the X11 service anymore. Second, I must remove the DISPLAY env setting from my bash profile. That last bit is important: if set, the DISPLAY env will basically cause the on-demand service to not work.

I tested out connecting via SSH to our servers and launching an xterm. As advertised, once the DISPLAY setting was removed from my profile, the xterm from the server started the X11 service as needed and displayed correctly.
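
A quick way to hunt down the offending line is a few lines of Ruby. This is only a sketch: it assumes the setting looks like DISPLAY=... or export DISPLAY=... at the start of a line, and display_assignments is a name made up for this example.

```ruby
# Returns [line_number, text] pairs for every line that assigns DISPLAY.
# Commented-out lines and lines that merely read $DISPLAY are ignored.
def display_assignments(profile_text)
  hits = []
  profile_text.each_line.with_index(1) do |line, number|
    hits << [number, line.chomp] if line =~ /^\s*(?:export\s+)?DISPLAY=/
  end
  hits
end

# Check every file named on the command line, e.g.
#   ruby find_display.rb ~/.bash_profile ~/.bashrc
ARGV.each do |path|
  next unless File.file?(path)
  display_assignments(File.read(path)).each do |number, text|
    puts "#{path}:#{number}: #{text}"
  end
end
```

Any line it prints is a candidate for deletion (or at least for commenting out).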

Now if Apple fixes those Spaces bugs, everything will be all good.

Speaking at Philly on Rails Meeting

I will be speaking at the PhillyOnRails December group meeting next week. Topics, previously covered on this blog, will be using R with Ruby to make plots and using Gbarcode.

Wednesday, November 14, 2007

OS X 10.5 Leopard and MySQL

UPDATE: The Mysql.com supplied preferences pane now works. Just use that.

It seems that the once super-simple MySQL on OS X install is no more in Leopard, that is until the MySQL developer community catches up. Now there is a small amount of terminal-fu that must be done to enable a mysql server on OS X, which has been covered in some other blog posts. But I will repeat the procedure here, since I added a script or two to make things a tiny bit simpler for me.
First, if you did a clean install of Leopard, then you need to download and install the MySQL server for OS X from mysql.com. Just install the main server, not the preferences pane or the startup item.
Next download this launchd file and copy it to "/Library/LaunchDaemons/com.mysql.mysqld.plist"
Next change the ownership to root. Might as well set proper permissions while you are at it:

sudo chown root:wheel /Library/LaunchDaemons/com.mysql.mysqld.plist
sudo chmod 644 /Library/LaunchDaemons/com.mysql.mysqld.plist


Next load the file into launchd using launchctl

sudo launchctl load /Library/LaunchDaemons/com.mysql.mysqld.plist

Finally, you can create a shell script to load and unload the server at will and place it where you normally place your binaries (mine is ~/bin/sqlserverctl):

#!/bin/sh
# Load or unload the MySQL launchd job; -w also updates the job's Disabled flag.
case "$1" in
  start)
    sudo launchctl load -w /Library/LaunchDaemons/com.mysql.mysqld.plist
    ;;
  stop)
    sudo launchctl unload -w /Library/LaunchDaemons/com.mysql.mysqld.plist
    ;;
  *)
    echo "USAGE: $0 {start|stop}"
    ;;
esac

UPDATE: There was a small error in my start script, namely an extra space in the case statement; it has been fixed above.

UPDATE: If you install MySQL via MacPorts, then use the "+server" variant and use this launchd file instead:
/Library/LaunchDaemons/org.macports.mysql5.plist

Tuesday, November 13, 2007

Annotated Western with iPhone sketch app

Just tested out the Sketch application for annotation of a western blot on the iPod Touch. Verdict: Not so easy to write text or small symbols. Also it was impossible to do one-handed, as the Touch slid around the desk. Would need some sort of rubber mat to put any small device on to provide enough traction for one-handed operations.

On the bright side, it should be a fairly easy task to alter this app for putting in text and small oft-used symbols (like arrows).

Thursday, November 8, 2007

On The Apple Tablet

The internet is all-a-twitter with rumors of an Apple tablet. Frankly, like others, I am of the opinion that the tablet already exists in some prototype fashion within Apple and was never released because of the so-far lackluster market. Unlike them, however, I do think that there is a market out there for this sort of device, even outside of the "professional" computing environment that they cite, such as doctors and sales reps.

Having owned an iPod touch for several weeks, it has become my go-to device for web surfing and quick tasks. The thick-fingered-centric design of the touchscreen is really very satisfying to use, and would only get better with increasing size. One example application would be music. How cool would a touchscreen mixer board (GarageBand anyone?) be? Or for great DJ effects at a party, like virtual record scratching. Finger Painting. Awesome renditions of boardgames. Or totally new game designs, like the DS rendition of Zelda. Give it a kick stand and prop it up to view movies. Or better yet a dock and stand with keyboard and mouse to drop the tablet into.

Business applications are also pretty amenable to such a device, namely my idea of bench-top LIMS input devices. Experience with the Touch also leads me to believe that note taking, brainstorming, and diagramming would be greatly enhanced on such a device, or at least as easy as on regular laptops.

But before I get my own hopes and dreams up any further, I'll stick with "I'll believe it when I see it."

Friday, October 19, 2007

can't touch this

so been plating with the iPod touch for a few days now and am trying to type out an entry as fast a the keyboard will let me. not too bad word replacement is happening and I can type out reasonably quickly. for posts re rich text we does not work but is fine ss HTML. amy caps you see are actually word replacements.

well a good test to say the least. makes me hopeful that a lims app is not so far fetched.

Wednesday, October 10, 2007

Easy Gbarcode

A few posts ago I had mentioned that my Gbarcode project should really be a bit more user friendly as well as provide an easy way to create PNGs without loading memory hogging libraries like RMagick. I investigated using the Cairo libraries for this here.

Good news for everyone who uses Gbarcode, and for those thinking of using it: I created a small wrapper module that is more Ruby-ish and also uses Cairo to print out to PNG. The functionality is bare-bones, with a fixed height of 150 pixels and a width determined by the length of the encoded barcode. It works well for Code 128 barcodes. Grab the ruby file here.

To use this module, here is an example (note that bad encoding schemes will result in a raised exception, hence the begin/rescue block):

require 'gbc'
require 'gbarcode'
require 'markaby'

begin
  # encode the first command-line argument and write it out as a PNG
  b = GBC::Barcode.new(ARGV[0], Gbarcode::BARCODE_ANY)
  puts b.ascii, b.partial, b.encoding
  b.to_png("out.png")
  # build a tiny HTML page that displays the image
  b = Markaby::Builder.new()
  b.html do
    body do
      img(:src => "out.png")
    end
  end
  f = File.open("out.html","w")
  f.write(b.to_s)
  f.close
rescue Exception => e
  puts e.message
end

Tuesday, October 9, 2007

ITMArT: Design: Part 1.2 "The competition revised"


During the last post, we took a look at two open-source procurement solutions out in the wild. Today I noticed that there is a third project that we should take a look at, Coupa. Coupa is a commercial venture that also provides an express edition of their product (i.e. with somewhat fewer features than the supported product).

Simply put, their mission statement is very similar to ITMArT's, namely that purchasing should be simple, intuitive and serve the needs of both the managers and users of the system, not just the managers. Taking a look at the demos, Coupa blows ITMArT out of the water feature- and UI-wise, while retaining some pretty advanced capabilities. There are very nice reporting modules, web forms, invoice ability, role and group based access to data and actions, very nice email integration, etc. If I were a finance guy at the University level, I would stop all in-house development right now and switch to their platform. It's that nice.

Having said that, I am not at the University level, or even departmental level. ITMArT's audience is a much smaller group, namely the medium to large laboratory, and they rarely need such things as approval hierarchies, approved vendor catalogs, etc. Those needs are usually at a higher organizational level and hence ITMArT development does not stop here.

A bit of good news is that I have lots of good ideas from the Coupa demos that I will put on the feature request list of ITMArT. Another good bit is that Coupa express edition is a Ruby on Rails application, the same as ITMArT! So I can take advantage of some of their codebase for a few nagging concerns of mine, like authentication, group assignments/permission, and role-based authorities.

Monday, October 1, 2007

ITMArT : Design : Part 1 "The competition"

So far my posts have focused on the bencher's quest (to learn code). This is mainly because it is infinitely easier to write about small and helpful scripts than about complex issues like application UI design. But since I introduced the ITMArT application, I might as well go through the design process for it. This will be a multipart post covering a couple of areas of the application and design process. First up will be the history and requirements of the app.


History

ITMArT began as any project does: outlining a set of requirements and then looking to see if some existing application already does what you want to accomplish. In this case, the requirements consisted of:

  1. an application that tracks purchase requests and purchase orders
  2. users would often have recurring requests, so a searchable catalog would be ideal
  3. new types of requests come up often, so adding vendors and items should be easy and flexible
  4. orders can only be placed by a select group of users with access to funds
  5. a monthly report of expenditures would be great
  6. email alerts at each stage of a request, from request to order to receiving
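
To make the list a little more concrete, here is a rough Ruby sketch of the request lifecycle implied by items 1, 4 and 6. It is not ITMArT's actual code: the class, method and field names are all hypothetical, and the block passed to the constructor merely stands in for whatever would send the email.

```ruby
# Hypothetical user record: can_order marks the select group with access to funds.
User = Struct.new(:name, :can_order)

class PurchaseRequest
  STATES = [:requested, :ordered, :received].freeze
  attr_reader :item, :requester, :state

  # The optional block is called on every state change (a stand-in for email alerts).
  def initialize(item, requester, &notifier)
    @item, @requester = item, requester
    @state = :requested
    @notifier = notifier
    notify
  end

  # Only users flagged as able to order may turn a request into an order.
  def order!(user)
    raise ArgumentError, "#{user.name} cannot place orders" unless user.can_order
    advance(:ordered)
  end

  def receive!
    advance(:received)
  end

  private

  # Moves to next_state only if it is the state that directly follows the current one.
  def advance(next_state)
    expected = STATES[STATES.index(@state) + 1]
    raise "cannot go from #{@state} to #{next_state}" unless next_state == expected
    @state = next_state
    notify
  end

  def notify
    @notifier.call(self) if @notifier
  end
end

alerts = []
request = PurchaseRequest.new('pipette tips', 'bencher') { |r| alerts << "#{r.item}: #{r.state}" }
request.order!(User.new('lab manager', true))
request.receive!
puts alerts
```

The point of the sketch is that the requester only ever creates a request; the order and receiving steps hang off that one object instead of being separate forms.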

You'd think there would be some open source ERP system out there that would have these components in it, and there are: Openbravo.com and TinyERP are just two examples. The trouble is that ERP systems generally tend to focus on the needs of the business processes and not so much on the requirements of the end user. I'll get to what that means in a minute, but suffice it to say this type of focus would not have worked for us.

The ERPs

Let's first look at Openbravo, a Zope application and UI nightmare. Pop-ups all over the place, difficult navigation and difficult to use forms. Tons of click-through dialogs and switching between keyboard and mouse, essentially MSDN design taken to the web. But don't just take my word for it, demo it yourself and see.

TinyERP is much better, but still focuses on the business case. For instance their forms (and top menu) make a distinction between creating a purchase order and creating the line items for that order. Which means that the user must know ahead of time that they need to create an order request first, which means that the system is not intuitive enough and will either require documentation (which no one reads) or training (which you have to pay for). Much worse, when creating the order line items, you need to specify a ton of fields and know the purchase order ID! OK, so the AJAX search helps you out for finding the order ID, but one typo and you have a clean up on aisle 9.

The last thing that makes an ERP unsuitable would be my first point, namely the focus on business processes. Usually this is not a problem, since you can make the assumption that the community is relatively stable and business oriented. The user community we have to service is a mix of bench scientists, admin assistants and business office types. There are no resources for training and the usability expectations are high.

This means that we have to shoot for the Lowest Common Denominator in terms of the overhead placed on the ordering process. Making a purchase request should be as simple as creating an email, and if it is not, then there had better be some work savings later to make up for it. Order placement in TinyERP is less than ideal, but not as completely onerous as in Openbravo, and the work savings are that the reporting, catalog searching and order processing are streamlined.

The deal breaker comes in the catalog management aspect. Adding items and vendors is not intuitive or easy enough for the majority of our user base. This is where focusing on the business process of catalog management really takes a pounding on usability. The central assumption here is that catalogs come from a preferred set of vendors and are bulk updated (and not very often). The broad user community means that the vendor list will be constantly changing, that pricing will be variable, and the various catalogs will be fluid. The burden of catalog management must be passed on to the purchase requester for the application to be scalable (e.g. rolled out to other departments or laboratories). Centralized management of the catalog would incur too high a cost otherwise.

Conclusion

Based on these quick reviews of existing systems, I decided that further trolling through open source projects would not be worth my time. If even a simple, easy, and good looking application like TinyERP did not fit the bill, what would? A custom application, that's what.

On the next ITMArT post, we'll discuss the actual design choices made to fulfill the list of requirements that we had mentioned at the start.

Saturday, September 29, 2007

Signs from above

So my earlier idea of using an iPhone or iPod touch as the gateway into an electronic lab notebook for tracking protocols and doing real-time data entry has been given some conflicting signals from the powers that be. One of those powers (my boss) answered my long (but not too long) email proposal with a one word reply. What was the word you ask? ... "cute" ... No signature even.

Another higher power (the internet) quickly followed that crushing monosyllabic bitch-slap with an uplifting rumour from AppleInsider that Apple is developing a PDA device based on the iPhone & Touch UI, but with a bigger screen. Bigger screen (woohoo!) and the potential to install non-Apple third party apps (highly unlikely given Apple's recent activities) would be an ideal device to use as a lab e-notebook tablet.

Oh, whom do I listen to??? Should I drop the project completely? Or should I table the project until a new, (relatively) low-cost, and developer-friendly device pops into existence?

In the end, I decided to listen to my inner voice, you know the impatient one, and just bought myself a touch to play around with for the time being. If this mythical iPAD ever sees the light of day, I'll raise the issue again with the G-man and get him to foot the bill for what will probably be a more expensive device. Either way I win ;) Unless of course he calls my bluff ...

Thursday, September 27, 2007

Cairo for barcodes

Continuing my discussion about Gbarcode 2, I tested my theory of using Cairo for creation of barcodes. It turns out that this is not so hard, but it was a bit of a learning curve, to say the least. I used the ruby DL library to load the cairo shared libs from the system (thanks to the GD2 gem for the code hints here). A few small methods were all that was needed for bare minimum functionality: creating bars and adding text. No fancy formatting here.

Next, I used gnu barcode to get layout information for a barcode, so I could test the drawing methods independently of barcode creation logic.

The result is the picture posted above. Neat huh? Code is posted below, but I think for a production gem, I'll probably not use DL, since I have to wrap the gnu C libs for actually creating barcodes from text strings anyway.

Without further ado, the test script:
require 'dl'
require 'rbconfig'

module BC
  VERSION = '1.5.0'.freeze

  # Name of the cairo shared library on the current platform.
  def self.cairo_library_name
    case Config::CONFIG['arch']
    when /darwin/
      'libcairo.2.dylib'
    when /mswin32/, /cygwin/
      'cairo.dll'
    else
      'libcairo.so.2'
    end
  end

  # Windows exports carry a byte-count suffix computed from the DL signature.
  def self.name_for_symbol(symbol, signature)
    case Config::CONFIG['arch']
    when /mswin32/, /cygwin/
      sum = -4
      signature.each_byte do |char|
        sum += case char
        when ?D: 8
        else     4
        end
      end
      "#{symbol}@#{sum}"
    else
      symbol.to_s
    end
  end

  private_class_method :cairo_library_name, :name_for_symbol

  LIB = DL.dlopen(cairo_library_name)
  # cairo function name => DL type signature (return type first)
  SYM = {
    :cairo_image_surface_create => 'PIII',
    :cairo_create               => 'PP',
    :cairo_get_target           => 'PP',
    :cairo_destroy              => '0P',
    :cairo_surface_destroy      => '0P',
    :cairo_surface_write_to_png => '0PS',
    :cairo_set_source_rgb       => '0PDDD',
    :cairo_move_to              => '0PDD',
    :cairo_line_to              => '0PDD',
    :cairo_set_line_width       => '0PD',
    :cairo_stroke               => '0P',
    :cairo_select_font_face     => '0PSII',
    :cairo_set_font_size        => '0PD',
    :cairo_show_text            => '0PS'
  }.inject({}) { |x, (k, v)| x[k] = LIB[name_for_symbol(k, v), v]; x }

  class LibraryError < StandardError; end  # superclass assumed; it was lost from the listing

  class B
    # DL calls return [result, arguments]; only the result is kept.
    def self.surface w,h
      r,rs = SYM[:cairo_image_surface_create].call(0,w,h)
      return r
    end

    def self.context s
      r,rs = SYM[:cairo_create].call(s)
      puts "S [#{s} => #{s.class}] :: R[#{r} =>  #{r.class}] :: RS[#{rs} =>  #{rs.class}]"
      SYM[:cairo_set_source_rgb].call(r,0.0,0.0,0.0)
      puts "S [#{s} => #{s.class}] :: R[#{r} =>  #{r.class}] :: RS[#{rs} =>  #{rs.class}]"
      return r
    end

    def self.ctx w,h
      context(surface(w,h))
    end

    # Draws one bar as a vertical line of width w from (x,y) to (x,h).
    def self.add_bar ctx,x,y,w,h
      # cairo_move_to(cr,11.0,20.5);
      # cairo_line_to(cr,11.0,70.5);
      # cairo_set_line_width(cr,1.85);
      # cairo_stroke(cr);
      SYM[:cairo_move_to].call(ctx,x,y)
      SYM[:cairo_line_to].call(ctx,x,h)
      SYM[:cairo_set_line_width].call(ctx,w)
      SYM[:cairo_stroke].call(ctx)
    end

    def self.add_text(ctx,txt,x,y)
      # cairo_select_font_face (cr, "serif", CAIRO_FONT_SLANT_NORMAL = 0, CAIRO_FONT_WEIGHT_BOLD = 1);
      # cairo_set_font_size (cr, 12.0);
      # cairo_move_to (cr, 21.0, 90.0);
      # cairo_show_text (cr, "TEST1234");
      SYM[:cairo_select_font_face].call(ctx,"serif",0,1)
      SYM[:cairo_set_font_size].call(ctx,12.0)
      SYM[:cairo_move_to].call(ctx,x,y)
      SYM[:cairo_show_text].call(ctx,txt)
    end

    def self.draw(ctx,fname)
      # surface = cairo_get_target(cr)
      # cairo_destroy(cr);
      # cairo_surface_write_to_png (surface, "hello.png");
      # cairo_surface_destroy (surface);
      r,rs = SYM[:cairo_get_target].call(ctx)
      SYM[:cairo_destroy].call(ctx);
      SYM[:cairo_surface_write_to_png].call(r,fname)
      SYM[:cairo_surface_destroy].call(r);
    end
  end
end

include BC
c = B.ctx 132, 100

B.add_bar(c,  11.00 , 20.00,  1.85, 70.0)
B.add_bar(c,  13.50 , 20.00,  0.85, 70.0)
B.add_bar(c,  16.50 , 20.00,  0.85, 70.0)
B.add_bar(c,  22.00 , 20.00,  1.85, 70.0)
B.add_bar(c,  25.50 , 20.00,  2.85, 70.0)
B.add_bar(c,  30.50 , 20.00,  0.85, 70.0)
B.add_bar(c,  32.50 , 20.00,  0.85, 70.0)
B.add_bar(c,  37.00 , 20.00,  1.85, 70.0)
B.add_bar(c,  39.50 , 20.00,  0.85, 70.0)
B.add_bar(c,  44.00 , 20.00,  1.85, 70.0)
B.add_bar(c,  47.50 , 20.00,  2.85, 70.0)
B.add_bar(c,  50.50 , 20.00,  0.85, 70.0)
B.add_bar(c,  55.00 , 20.00,  1.85, 70.0)
B.add_bar(c,  58.50 , 20.00,  2.85, 70.0)
B.add_bar(c,  63.50 , 20.00,  0.85, 70.0)
B.add_bar(c,  65.50 , 20.00,  0.85, 70.0)
B.add_bar(c,  68.50 , 20.00,  2.85, 70.0)
B.add_bar(c,  73.00 , 20.00,  3.85, 70.0)
B.add_bar(c,  76.50 , 20.00,  0.85, 70.0)
B.add_bar(c,  79.00 , 20.00,  1.85, 70.0)
B.add_bar(c,  83.50 , 20.00,  2.85, 70.0)
B.add_bar(c,  87.50 , 20.00,  0.85, 70.0)
B.add_bar(c,  91.50 , 20.00,  0.85, 70.0)
B.add_bar(c,  94.00 , 20.00,  1.85, 70.0)
B.add_bar(c,  99.50 , 20.00,  2.85, 70.0)
B.add_bar(c, 103.00 , 20.00,  1.85, 70.0)
B.add_bar(c, 105.50 , 20.00,  0.85, 70.0)
B.add_bar(c, 110.00 , 20.00,  1.85, 70.0)
B.add_bar(c, 115.50 , 20.00,  2.85, 70.0)
B.add_bar(c, 118.50 , 20.00,  0.85, 70.0)
B.add_bar(c, 121.00 , 20.00,  1.85, 70.0)


B.add_text(c,"TEST1234", 21.0, 90.0)
B.draw(c,"test_bc.png")

Mad plotz

A recent submission to a journal has caused us a few headaches over the past few weeks, as the editors sent the paper back to us stating that we did not meet the minimal reporting requirements for the type of experiment that was performed. Which is poetic justice in a way, since for the past few years I have been promoting the use of minimal reporting requirements and standard data formats.

I think this particular journal, however, has gone a bit too far in asking for annotated spectra for every identification in the result set. For most low-throughput experiments this is not such a big deal, but we had thousands of identifications and even wrote an algorithm to automatically assign a quality score to those identifications so that such manual validation of spectra should not be necessary.

But I digress. Instead of fighting it, we decided to give the editors what they want, annotated spectra for every hit. It turns out that this is not such a trivial thing to do. Even gathering all of the data was a tough job, since the experiment was performed many years ago on instrumentation that is nearing its end of life. A lot of file parsing and data reorganization had to be done, prior to any development effort to produce the images that the journal wanted.

A bit of background and some numbers will help us understand the enormous task we undertook. The experiment was a proteomics profile of two developmental stages in zebrafish. We used two methodologies, 2D gels and LCMS, to fractionate the samples and ran them through mass spectrometers. The 2D gels gave fewer identifications than the LCMS, but it was still a lot of data. For instance, just these results consisted of 30,000 peptide identifications! You can reduce that to about 2,000 proteins for which the journal has asked for annotated spectra. Needless to say, the brute force method of taking screen shots of each spectrum from the program would not work.

I wrote a few scripts and libraries to parse the raw data and the final result table to come up with the above figure. This brings mzXML, Excel, and MGF files together with Ruby, C libraries, and the R statistical tool to produce the nice picture you see, but it took me two weeks to figure out the specifics. How on earth could a regular bencher do this?
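
For a flavor of the file parsing involved, here is a minimal sketch of reading an MGF peak list in plain Ruby. It is not the code used for the paper. It assumes the usual MGF layout (BEGIN IONS / END IONS blocks, KEY=VALUE header lines, then one "m/z intensity" pair per line), and parse_mgf and the sample values are made up for illustration.

```ruby
# Minimal MGF reader: returns an array of spectra, each a hash with
# :params (the KEY=VALUE header lines) and :peaks ([m/z, intensity] pairs).
def parse_mgf(text)
  spectra = []
  current = nil
  text.each_line do |line|
    line = line.strip
    next if line.empty? || line.start_with?('#')
    case line
    when 'BEGIN IONS'
      current = { :params => {}, :peaks => [] }
    when 'END IONS'
      spectra << current if current
      current = nil
    else
      next unless current               # ignore anything outside a spectrum block
      if line =~ /\A([A-Za-z_]+)=(.*)\z/
        current[:params][$1] = $2       # header line such as TITLE=... or PEPMASS=...
      else
        mz, intensity = line.split
        current[:peaks] << [mz.to_f, intensity.to_f]
      end
    end
  end
  spectra
end

sample = <<MGF
BEGIN IONS
TITLE=example spectrum
PEPMASS=421.76
CHARGE=2+
147.11 1200.5
175.12 830.0
END IONS
MGF

parse_mgf(sample).each do |spectrum|
  puts "#{spectrum[:params]['TITLE']}: #{spectrum[:peaks].length} peaks"
end
```

Getting from a list of peaks like this to an annotated plot is where the rest of the two weeks went.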

I think the journal is in for a rude awakening once the backlash of angry rebuttals from paper submitters starts flowing in. I would also like to see their reaction to the gigantic pile of spectra we are about to send them.

Tuesday, September 11, 2007

Touchscreen Lab

For a while now I have been musing on what a next-generation Lab Information Management System (LIMS) user interface could look like. Thoughts of touchscreen interfaces and ad-hoc text mining have filled my brain, but I have not yet been able to get a clear enough picture to start work on such a project.

Enter the iPhone and iPod touch. How cool would it be to develop a clean and robust LIMS application for them? How would you take advantage of the touchscreen? How easy would filling out notes or changes to protocol templates be with the on-screen keyboard? How would the screen size limit your design process? How effective would it be to integrate SMS for alerts for stop-watch and protocol coordination?

Plus what better excuse can I make to purchase one of these bad boys? Anyway, here is a quick mock-up of the type of interface I envision. Enjoy!


Thursday, August 30, 2007

Now with comments...

OK, OK, so before I said I didn't believe in comments ... but I guess they have a time and place. Like a blog that gives code examples. I have to admit that sometimes it is nice to see some point clarified by the blogger to some reader's question, but for the most part, I still think they are not so useful.

As a compromise, I decided to enable comments to posts, but you'll have to verify that you are indeed a real-live person with captcha for each post.

Also if you do have a blog, or have something more substantial to add, I'd prefer that you post your comment on your own blog and use blogger.com's "link to this post" back-link functionality. I think this would make for a better conversation and also increase Google scores ;)

Tuesday, August 28, 2007

Gbarcode using GD script

BTW, here is the script I used to create the barcode in the previous post using the gbarcode and gd2 gems:


require 'rubygems'
require 'gd2'
require 'gbarcode'

include Gbarcode
include GD2

b = barcode_create("TEST1234567890")
barcode_encode(b,BARCODE_128)

w = 20
h = 100
x = 10

y1 = 10
y2 = h - 20

bars = b.partial.split(//).map {|e| e.to_i}
bars.map {|e| w += e}

i = Image::IndexedColor.new(w,h)
i.palette << ...
c = Canvas.new(i)
...
color = Color::BLACK
...
font = Font::Small
...
f = File.open(...
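
The width bookkeeping in the script above can be sketched on its own, without GD. Like the script, this treats the partial string as a run of single-digit widths. It additionally assumes that the digits alternate space, bar, space, bar, starting with a space; that is an assumption about GNU barcode's output for Code 128, not something spelled out in this post. The method name bar_rectangles and the sample string are made up for illustration.

```ruby
# Turns a width string such as "0211" into [x, width] pairs for the bars,
# assuming the digits alternate space, bar, space, bar, ... starting with a space.
def bar_rectangles(partial, x = 10)
  bars = []
  partial.split(//).map { |e| e.to_i }.each_with_index do |width, index|
    bars << [x, width] if index.odd?   # odd positions are bars, even ones are spaces
    x += width                         # every element moves the pen to the right
  end
  bars
end

p bar_rectangles("0211")   # prints [[10, 2], [13, 1]]
```

Each pair can then be handed to whatever drawing library is in use as a filled rectangle from y1 to y2.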

For the public good

For a while now, I have been working on designing data standards for the research community. Often times, the standards process is not so much based on efficient and useful design, but more on compromises between a large and diverse set of users. So far this process has led to complex standards that I have to take some partial credit for. Frankly I would rather not, but a publication record is a must for any sort of success at academic institutions.

But something good did come out of my dissatisfaction with the public contributions I have made thus far. I was motivated to contribute something to the open source community that was completely unrelated to data standards: a barcode creation library (a gem) for Ruby, Gbarcode. Several items helped in deciding that this would be a good project:
  • I needed to create barcodes for a project ;)
  • the existing ruby barcode gem only produced Code 39
  • the images that it produced were not readable by my scanner
  • the project was dead (last release was in July 2005)
I looked around and the current open source projects were GNU barcode (C) and Barbecue (Java). I decided to try my hand at SWIG wrapping the GNU barcode C API. Long story short, SWIG is not the most intuitive tool, but I was able to make some strides in creating the interface file and passing in Ruby strings to create the barcodes with.

One major hurdle of the project was creating a MS Windows-compatible gem. Gems are notorious for not supporting Windows. On Unix, Linux and Mac OS X, the gems usually install just fine, since they are compiled on install. On Windows, it is not at all straightforward to pre-compile the parts of the library written in C. Since I wanted this to be useful for the widest audience, I looked around for other gems that did support Windows and found that Hpricot has a nice rake task and environment for compiling and packaging the gem for Windows. Thanks to _why I was able to make this work, with a bit of leg work. I wouldn't recommend going to the SVN repos to look at what I did, since it is in an unstable state. Just go to _why's site and look at what he did.

One criticism I have with Gbarcode is that the only supported output format is PostScript. In order to use it for web sites (specifically RoR), you would have to run the output through ImageMagick, or some other image processing software. Much to my pleasant surprise, I looked the other day on Google, and several sites have covered how to do this. Just search for ruby and barcode and it should come right up.

The drawback to the approaches listed in those how-to's is that RMagick (and ImageMagick) are memory hogs. Since people are actually starting to use Gbarcode, I have started thinking about re-coding it to make it more Ruby-ish (currently, since it is a binding of the C lib, it uses C-style method calls) and to use Cairo as the image producing library. I tried GD, but the barcodes come out less than optimal:

I don't know why the bottom part of the barcode merges bars; maybe it happens on the way to encoding the PNG, but nothing I tried fixes this. My hope is that with the Cairo integration, this artifact goes away.

Tuesday, August 21, 2007

A word about comments on blogs...

Bencher #1 also asked why I didn't allow comments on the blog. There is a good reason for this, see here and here.

UPDATE: Plus he can always come down the hall to complain ;)

UPDATE (8.23.07): For the impatient, I summarize:

  • Blogs should be about one voice, it is not a debate forum.
  • If you have something to say, start your own blog

Back two steps...

So bencher #1, who changed the regex in the file rename program to fit his needs, says I come off a bit arrogant on this blog. Yeah, I sorta have to agree, and he makes some other good points:
  1. his role is to do research and write papers
  2. he brings in the grant money
  3. he has been programming since the compy86 came out
Fair enough. But learning a newer and more applicable language never hurts and could save time when I am not available.

UPDATE: I did change the wording a bit on the last post. Happy?

Progress!

It turns out that this blog is not a waste of time ;)

From my previous post, the bencher that requested a script to rename files actually modified it to fit his needs. This is great news and was unexpected.

He downloaded and installed Ruby on his computer, ran the script himself, and found that some filenames did not match a LIMS ID in the input file. He knew these files did have a LIMS ID, so he started investigating the code and found the comments that identified where regular expressions were matching LIMS IDs to file names.

So what if he got stuck on a few files and had to ask for some regular expression help, at least he was trying and that is A Good Thing.

Monday, August 20, 2007

Tag Line Woes

So the tag line for this blog has changed a few times already. Call me fickle, but I have not found the "perfect" one yet, as each has had its ups and downs:

  • Where agile methods and bioinformatics collide
  • Where agile methods and bioinformatics meet
  • For researchers and developers alike
Which brings us to the current "For researchers (learn to code) and developers (learn to speak) alike". It doesn't exactly roll off the tongue. Or get my point across. Anywho, I am sure this won't be the last iteration.

BTW, the title also went through some changes on the first day, since "Agile Science" and a few other related titles were already taken in the blogosphere.

The SNP per Gene count

In my last post I related an example where a scientist came to me to parse an Excel file for the number of SNPs per gene. The simplest solution is to use a hash keyed on the gene symbol, where the value tracks the number of times you have seen that gene symbol. Here is the program:


require 'rubygems'
require 'fastercsv'

# count SNPs per gene: a hash keyed on gene symbol, defaulting to zero
genecount = Hash.new(0)

# headers => id,snp_id,genome_build,chromosome,coordinate,gene_symbol,priority,snp_per_gene
FasterCSV.foreach(ARGV[0], :headers => true) do |row|
  genecount[row["gene_symbol"]] += 1
end

# write one "gene_symbol",snp_count line per gene;
# the block form of File.open closes the file for us
File.open("#{ARGV[0]}.rev.csv", "w") do |output|
  output.puts("gene_symbol,snp_count")
  genecount.each_pair do |g, c|
    output.puts "\"#{g}\",#{c}"
  end
end
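
The counting idea can also be tried without a CSV file at all. Here is a minimal sketch, assuming the gene_symbol column has already been pulled into an array; the method name snps_per_gene and the sample gene symbols are made up for illustration:

```ruby
# Hypothetical helper: count how many times each gene symbol appears.
# Hash.new(0) makes unseen keys start at zero, so no if/else is needed.
def snps_per_gene(gene_symbols)
  gene_symbols.each_with_object(Hash.new(0)) do |gene, counts|
    counts[gene] += 1
  end
end

# made-up gene symbols standing in for the gene_symbol column
symbols = %w[BRCA1 TP53 BRCA1 EGFR TP53 BRCA1]

snps_per_gene(symbols).each_pair do |gene, count|
  puts "#{gene},#{count}"
end
```

Running it on a handful of values like this is a quick way to convince yourself the tally is right before pointing the real script at a file with thousands of rows.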

Friday, August 17, 2007

Hacks Before Code

I often find that when you try to solve two problems at once, you do a poor job of both. Case in point: someone just came into my office asking how they would go about getting the number of SNPs per gene from an Excel file they have. I started to explain set theory and databases, and visible signs of mental shutdown ensued (the slacking jaw, the glazed eyes). Trying a different tactic, I wrote a script in front of them that builds a hash keyed by gene, with the value being the count of SNPs from the file. That got the same result.

So I am trying out Something New. I am going to push researchers to learn to program in a context that is completely separate from science, and hopefully fun enough that they stick with it for more than a few days. Enter Hackety.org, a project spearheaded by _why the lucky stiff that seeks to (insert Fake Steve Jobs "voice") re-instill childlike wonder in learning how to program.

With HacketyHack, I hope that researchers are motivated to learn aspects of programming in an entertaining environment before they have to do any real work, which of course will suck some of the fun out of the activity.

I'll be putting together lessons to augment the existing 7 exercises of HacketyHack in the coming months with real but simple bioinformatics tutorials. So download that hack-box and get coding folks!

ITMArT: A request tracking system

For a few (3-4) months I have been working on a user request and order management tracking system. Most of that time has been spent wrestling with RoR's Ajax functionality and making the UI as intuitive as I possibly can. Basically I took the "Getting Real" book at face value and started with the interface.

What remains, though, is a lot of "under-the-hood" plumbing: small things like getting user accounts to work with the CAS SSO server, access control lists, and group management. Oh, and email alerts... yeeesh. Well, at least it looks pretty.


The search works well and the cart concept seems to be pretty easy to follow. The order processing, though, still leaves something to be desired. Reporting is air-ware at the moment.

I'll keep posting tidbits about this project often (since it currently takes 90% of my time), so stay tuned!

Tut 1: Rename a set of files

Today I had a researcher come to me asking if I can write a script to rename a set of result files following some convention. This article will cover that bit of coding, but first some background:

1) I use Ruby, and Ruby on Rails, for my day-to-day operations. While there are some rough edges in Ruby's library support, it gets most things done efficiently, and of course you can't get much better than RoR for web apps. So any code in this blog will usually be Ruby code.

2) We have a commercial Laboratory Information Management System (LIMS) that creates identifiers for experiments, samples, and result files. The twist here is that most (3/4) of the experiments were completed before the LIMS was introduced. So while the LIMS is capable of outputting queue files that tell the instruments to name the files according to the LIMS' convention, that does not apply here, and we must retrofit the LIMS IDs into the existing result file names.

Why is this important at all? Well, the LIMS can automatically assign the result file to the annotated experiment in the system on file upload if the result file has the correct identifier in the name. While you could do this manually, you would not want to do this for the 1000 result files that were/are going to be produced. See first post on time wasting by researchers that don't know how to code. At least this one is smart enough to know there is a better way.

The good news is that as long as the filename contains the LIMS ID, it does not matter what the rest of the name is, so we only have to figure out a way to relate the existing filename to the proposed LIMS ID. This turns out to be easier than expected, since both contain a sequential number that corresponds to the source sample.

E.g. :
existing file name = 07Aug05_SF_ASA_583.RAW
LIMS ID = APA1742A583MS3
proposed rename = APA1742A583MS3_07Aug05_SF_ASA_583.RAW

Thus a simple regular expression can pull out the sample number from both the result filename and the LIMS ID and do the renaming. Without further ado, the script:

#!/usr/bin/env ruby
require 'rubygems'
require 'fastercsv'

# output a usage message unless both inputs are given
unless ARGV.length == 2
  puts "Need input queue file and directory of RAW files"
  puts "USAGE:"
  puts "ruby rename.rb INPUT_QUEUE_FILE INPUT_DIR"
  exit(1)
end

# define a LIMS ID lookup hash keyed by the sample number
lims_ids = {}

# use FasterCSV to parse the LIMS instrument queue file for the LIMS IDs
# We need the third column for the filename (remember that arrays start with zero ;)
FasterCSV.foreach(ARGV[0]) do |row|
  # the sample number is the run of digits right before "MS3-"
  if (row[2] =~ /(\d+)MS3\-/)
    k = $1
    # the LIMS ID is everything from the start of the field up to "MS3"
    if (row[2] =~ /^(\S+MS3)\-/)
      lims_ids[k] = $1
    end
  end
end

# change to the directory with all of the result files
# and read the files that have a "RAW" extension
Dir.chdir(ARGV[1])
raw_files = Dir.glob("*.{RAW,raw}").uniq

# go through the set of files and rename them
raw_files.each do |f|
  puts f
  # skip files whose name does not end in a sample number
  next unless f =~ /(\d+)\.RAW$/i
  sample = $1
  puts sample
  # File.rename avoids shelling out to mv, so spaces in names are safe
  # and the script does not depend on a Unix shell
  if (lims_ids[sample])
    File.rename(f, "#{lims_ids[sample]}_#{f}")
  end
end
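
To see the matching logic on its own, here is a minimal sketch that applies the same idea to the example names above without touching any files. The constant and method names are hypothetical, and it assumes the LIMS ID is given bare (ending in "MS3") rather than followed by a dash as in the queue file:

```ruby
# Hypothetical patterns: both capture the sample number.
SAMPLE_FROM_LIMS_ID  = /(\d+)MS3\z/      # digits just before a trailing "MS3"
SAMPLE_FROM_FILENAME = /(\d+)\.raw\z/i   # digits just before ".RAW" (any case)

# Return the captured sample number, or nil when the pattern does not match.
def sample_number(text, pattern)
  match = pattern.match(text)
  match && match[1]
end

# Return the new name for a file, or nil when no LIMS ID is known for it.
def proposed_name(filename, lims_ids)
  lims_id = lims_ids[sample_number(filename, SAMPLE_FROM_FILENAME)]
  lims_id ? "#{lims_id}_#{filename}" : nil
end

lims_id = "APA1742A583MS3"
lookup  = { sample_number(lims_id, SAMPLE_FROM_LIMS_ID) => lims_id }

puts proposed_name("07Aug05_SF_ASA_583.RAW", lookup)
p proposed_name("07Aug05_SF_ASA_999.RAW", lookup)   # no LIMS ID for sample 999
```

Returning nil for an unmatched file, instead of renaming blindly, makes it easy to list the files that still need a closer look at the regular expression.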

Thursday, August 16, 2007

It's Alive!

Good premise for a blog ... check.
Catchy title ... check.
Bad first-post title ... check.

OK! Start the blog!

Well, they say that the third time's the charm, and since my other two blogs have gone stale and moldy, this would be the third. And it's not even a New Year's resolution! Actual internal motivation started this one ;)

So what is Def Sci about, you ask? Well, basically I think that bioinformatics applications (and scientific software in general) are over-engineered and do not pay enough attention to UI. Many projects and otherwise good ideas fail to grasp the essential point that agile web development shops have been pushing of late: without an intuitive and simple interface, you'll never get adoption, and thus never get an evangelical early adopter community to beat your drums. I am here to espouse my views on simple and agile development, with a skew towards biomedical informatics.

Open source developers and commercial vendors, lend me your ears, because I have a direct line to actual researchers and deal with them daily! I know what they like and don't like about products! I know their needs! I am the bridge, if you will, between those that would sell me something and those that would use it.

But wait, that's not all! I have a message for the biomedical researcher as well:

"Learn some programming, bub."

I kid you not when I say I have walked into a meeting where someone told me they had spent weeks trawling through protein result lists in Excel, determining which were the same or different across a couple of conditions and experiments, and asked whether I had a better way to do this. My answer: "That's three lines of code." OK, maybe four. It would take me 1 minute to code, and most of that time would be spent typing, since I never did learn to type with more than 6 fingers.
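
For the curious, those few lines might look roughly like this. It is a minimal sketch, assuming each result list has already been boiled down to an array of protein identifiers; the method name and the identifiers are made up:

```ruby
# Hypothetical helper: compare two lists of protein identifiers.
def compare_protein_lists(a, b)
  {
    :shared => a & b,   # seen in both conditions
    :only_a => a - b,   # seen only in the first condition
    :only_b => b - a    # seen only in the second condition
  }
end

# made-up identifiers standing in for two exported result lists
control = %w[protA protB protC protD]
treated = %w[protB protD protE]

compare_protein_lists(control, treated).each_pair do |label, proteins|
  puts "#{label}: #{proteins.join(', ')}"
end
```

Ruby's array intersection (&) and difference (-) operators do all of the set work here, which is why the core really is only three lines.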

So my audience, prepare yourself for a mix of posts ranging from problems folks approach me with (and the solutions I come up with), to reviews of open source software out there in the wild, to comments on projects that I am working on, to just plain old rants (like this one :) )