Showing posts with label programming. Show all posts
Showing posts with label programming. Show all posts

Thursday, September 27, 2007

Cairo for barcodes

Continuing my discussion about Gbarcode 2, I tested my theory of using Cairo for creation of barcodes. It turns out that this is not so hard, but it was a bit of a lerning curve, to say the least. I used the ruby DL library to load the cairo shared libs from the system (thanks to the GD2 gem for the code hints here). A few small methods where all that were needed for bare minimum functionality: creating bars and adding text. No fancy formating here.

Next, I used gnu barcode to get layout information for a barcode, so I could test the drawing methods independently of barcode creation logic.

The result is the picture posted above. Neat huh? Code is posted below, but I think for a production gem, I'll probably not use DL, since I have to wrap the gnu C libs for actually creating barcodes from text strings anyway.

Without further ado, the test script:
require 'dl'
require 'rbconfig'

module BC
VERSION = '1.5.0'.freeze

def self.cairo_library_name
case Config::CONFIG['arch']
when /darwin/
  'libcairo.2.dylib'
when /mswin32/, /cygwin/
  'cairo.dll'
else
  'libcairo.so.2'
end
end
def self.name_for_symbol(symbol, signature)
case Config::CONFIG['arch']
when /mswin32/, /cygwin/
  sum = -4
  signature.each_byte do |char|
    sum += case char
    when ?D: 8
    else     4
    end
  end
  "#{symbol}@#{sum}"
else
  symbol.to_s
end
end

private_class_method :cairo_library_name, :name_for_symbol

LIB = DL.dlopen(cairo_library_name)
SYM = {
:cairo_image_surface_create   => 'PIII',
:cairo_create    => 'PP',
:cairo_get_target    => 'PP',
:cairo_destroy    => '0P',
:cairo_surface_destroy    => '0P',
:cairo_surface_write_to_png    => '0PS',
:cairo_set_source_rgb    => '0PDDD',
:cairo_move_to    => '0PDD',
:cairo_line_to    => '0PDD',
:cairo_set_line_width    => '0PD',
:cairo_stroke    => '0P',
:cairo_select_font_face    => '0PSII',
:cairo_set_font_size    => '0PD',
:cairo_show_text    => '0PS'
}.inject({}) { |x, (k, v)| x[k] = LIB[name_for_symbol(k, v), v]; x }

class LibraryError < rs =" SYM[:cairo_image_surface_create].call(0,w,h)" rs =" SYM[:cairo_create].call(s)"> #{s.class}] :: R[#{r} =>  #{r.class}] :: RS[#{rs} =>  #{rs.class}]"
  SYM[:cairo_set_source_rgb].call(r,0.0,0.0,0.0)
  puts "S [#{s} => #{s.class}] :: R[#{r} =>  #{r.class}] :: RS[#{rs} =>  #{rs.class}]"
  return r
end

def self.ctx w,h
  context(surface(w,h))
end

def self.add_bar ctx,x,y,w,h
  # cairo_move_to(cr,11.0,20.5);
  # cairo_line_to(cr,11.0,70.5);
  # cairo_set_line_width(cr,1.85);
  # cairo_stroke(cr);
  SYM[:cairo_move_to].call(ctx,x,y)
  SYM[:cairo_line_to].call(ctx,x,h)
  SYM[:cairo_set_line_width].call(ctx,w)
  SYM[:cairo_stroke].call(ctx)
end

def self.add_text(ctx,txt,x,y)
  # cairo_select_font_face (cr, "serif", CAIRO_FONT_SLANT_NORMAL = 0, CAIRO_FONT_WEIGHT_BOLD = 1);
  # cairo_set_font_size (cr, 12.0);
  # cairo_move_to (cr, 21.0, 90.0);
  # cairo_show_text (cr, "TEST1234");  
  SYM[:cairo_select_font_face].call(ctx,"serif",0,1)
  SYM[:cairo_set_font_size].call(ctx,12.0)
  SYM[:cairo_move_to].call(ctx,x,y)
  SYM[:cairo_show_text].call(ctx,txt)
end

def self.draw(ctx,fname)
  # surface = cairo_get_target(cr)
  # cairo_destroy(cr);
  # cairo_surface_write_to_png (surface, "hello.png");
  # cairo_surface_destroy (surface);
  r,rs = SYM[:cairo_get_target].call(ctx)
  SYM[:cairo_destroy].call(ctx);
  SYM[:cairo_surface_write_to_png].call(r,fname)
  SYM[:cairo_surface_destroy].call(r);
end
end
end

include BC
c = B.ctx 132, 100

B.add_bar(c,  11.00 , 20.00,  1.85, 70.0)
B.add_bar(c,  13.50 , 20.00,  0.85, 70.0)
B.add_bar(c,  16.50 , 20.00,  0.85, 70.0)
B.add_bar(c,  22.00 , 20.00,  1.85, 70.0)
B.add_bar(c,  25.50 , 20.00,  2.85, 70.0)
B.add_bar(c,  30.50 , 20.00,  0.85, 70.0)
B.add_bar(c,  32.50 , 20.00,  0.85, 70.0)
B.add_bar(c,  37.00 , 20.00,  1.85, 70.0)
B.add_bar(c,  39.50 , 20.00,  0.85, 70.0)
B.add_bar(c,  44.00 , 20.00,  1.85, 70.0)
B.add_bar(c,  47.50 , 20.00,  2.85, 70.0)
B.add_bar(c,  50.50 , 20.00,  0.85, 70.0)
B.add_bar(c,  55.00 , 20.00,  1.85, 70.0)
B.add_bar(c,  58.50 , 20.00,  2.85, 70.0)
B.add_bar(c,  63.50 , 20.00,  0.85, 70.0)
B.add_bar(c,  65.50 , 20.00,  0.85, 70.0)
B.add_bar(c,  68.50 , 20.00,  2.85, 70.0)
B.add_bar(c,  73.00 , 20.00,  3.85, 70.0)
B.add_bar(c,  76.50 , 20.00,  0.85, 70.0)
B.add_bar(c,  79.00 , 20.00,  1.85, 70.0)
B.add_bar(c,  83.50 , 20.00,  2.85, 70.0)
B.add_bar(c,  87.50 , 20.00,  0.85, 70.0)
B.add_bar(c,  91.50 , 20.00,  0.85, 70.0)
B.add_bar(c,  94.00 , 20.00,  1.85, 70.0)
B.add_bar(c,  99.50 , 20.00,  2.85, 70.0)
B.add_bar(c, 103.00 , 20.00,  1.85, 70.0)
B.add_bar(c, 105.50 , 20.00,  0.85, 70.0)
B.add_bar(c, 110.00 , 20.00,  1.85, 70.0)
B.add_bar(c, 115.50 , 20.00,  2.85, 70.0)
B.add_bar(c, 118.50 , 20.00,  0.85, 70.0)
B.add_bar(c, 121.00 , 20.00,  1.85, 70.0)


B.add_text(c,"TEST1234", 21.0, 90.0)
B.draw(c,"test_bc.png")

Mad plotz

A recent submission to a journal has caused us a few headaches over the past few weeks, as the editors sent the paper back to us stating that we did not meet the minimal reporting requirements for the type of experiment that was performed. Which is poetic justice in a way, since for the past few years I have been promoting the use of minimal reporting requirements and standard data formats.

I think this particular journal, however, has gone a bit too far in asking for annotated spectra for every identification in the result set. For most low-throughput experiments this is not such a big deal, but we had thousands of identifications and even wrote an algorithm to automatically assign a quality score to those identifications so that such manual validation of spectra should not be necessary.

But I digress. Instead of fighting it, we decided to give the editors what they want, annotated spectra for every hit. It turns out that this is not such a trivial thing to do. Even gathering all of the data was a tough job, since the experiment was performed many years ago on instrumentation that is nearing its end of life. A lot of file parsing and data reorganization had to be done, prior to any development effort to produce the images that the journal wanted.

A bit of background and some numbers will help us understand the enormous task we undertook. The experiment was a proteomics profile of two developmental stages in zebra fish. We used two methodologies, 2D gels and LCMS, to fractionate the samples and ran them through mass spectrometers. The 2D gels gave fewer identifications than the LCMS, but it was still a lot of data. For instance, just these results contaied of 30,000 peptide identifications! You can reduce that to about 2,000 proteins that the journal has asked for annotated spectra. Needless to say, the brute force method of taking screen shots of each spectra from the program would not work.

I wrote a few scripts and libraries to parse the raw data and the final result table to come up with the above figure. This is bringing mzXML, excel, and MGF files together with Ruby, C libraries, and the R statistical tool to produce the nice picture you see, but it took me two weeks to figure out the specifics. How on earth could a regular bencher do this?

I think the journal is in for a rude awakening once the backlash of angry rebuttals from paper submitters start flowing in. I would also like to see their reaction to the gigantic pile of spectra we are about to send them.

Friday, August 17, 2007

Hacks Before Code

I often find that when you are trying to solve two problems at once, you do a poor job of both. Case in point, someone just came into my office asking how they would go about getting the number of SNPs per gene from some excel file they have. I start to explain set theory and databases and you could see visible signs of mental shutdown ensue (the slacking jaw, the glazed eyes). Trying a different tactic and showing them a script as I wrote it to create a hash keyed by gene and the value being the count of SNPs from the file gave equal results.

So I am trying out Something New. I am going to push that researchers learn to program in a context that is completely separate from science, and is hopefully fun enough that they stick with it for more than a few days. Enter Hackety.org, a project spearheaded by _why the lucky stiff that seeks to (insert Fake Steve Jobs "voice") re-instill the child like wonder back into learning how to program.

With HacketyHack, I hope that researchers are motivated to learn aspects of programming in an entertaining environment before they have to do any real work, which of course will suck some of the fun out of the activity.

I'll be putting together lessons to augment the existing 7 exercises of HacketyHack in the coming months with real but simple bioinformatics tutorials. So download that hack-box and get coding folks!