Showing posts with label ruby. Show all posts

Friday, February 15, 2008

Sequel's lackluster to_*

Sequel is a great bare-bones ORM, but the bare-bones quality of Sequel::Model leaves something to be desired. For instance, the obj.to_json method just calls the default Ruby object inspect method, which prints out the class name and memory address. Not helpful. There is also no to_xml() for easy REST incorporation. Almost makes me want to go back to ActiveRecord, but then what's the point in using Merb?

Anywho, it's not that hard to extend the functionality of Sequel::Model, so I have started writing some gems to make development with Merb + Sequel a little easier. Like real to_xml & to_json methods on the model instances and instance collections. More on this as it develops.
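
As a rough idea of what such an extension could look like, here is a minimal sketch. It assumes only that a model instance exposes its column values as a hash through values (which Sequel::Model does). The Serializable module and the FakeModel stand-in class are hypothetical names, used so the example runs without a database.

```ruby
require 'json'

# Hypothetical mixin: serializes whatever hash the model returns from #values.
module Serializable
  def to_json(*args)
    values.to_json(*args)
  end
end

# Stand-in for a Sequel::Model instance
# (assumption: #values returns a Hash of column => value).
class FakeModel
  include Serializable
  attr_reader :values

  def initialize(values)
    @values = values
  end
end

posts = [FakeModel.new(:id => 1, :title => "Hello"),
         FakeModel.new(:id => 2, :title => "World")]

puts posts.first.to_json   # a single instance
puts posts.to_json         # a collection; Array#to_json calls to_json on each element
```

Because the array serializer delegates to each element, the collection case comes for free once the instance method exists.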

Merb TLS mail plugin gem

UPDATE: Now available on GitHub: merb_tlsmail GitHub page

My previous post on sending mail via a TLS SMTP server on merb covered monkey patching Merb::Mailer.

I took the time to code this up as a gem, using Merb's meta-programming routines to extend Merb::Mailer in a standard way (for Merb, that is). See this open ticket in the Merb Lighthouse issue tracker to download the gem until it is released as a proper plugin.

Tuesday, February 12, 2008

Secure SMTP server (TLS) from merb apps

It seems that Merb's Mailer class can use either a local sendmail client or a non-TLS-enabled SMTP server. This problem is not unique to Merb; it is a deficiency in Ruby 1.8.

I took some time to look around and found that Rails has the same problem, and it was fixed via a plugin, not a gem as is the "merb way". There was also a gem that packaged the Net::SMTP classes from Ruby 1.9, which do have TLS support. It isn't hard to guess what I did next.

I monkey punched Merb::Mailer to overwrite the net_smtp method and added two config options into merb_init.rb. See the pastie for the code example here.

http://pastie.caboo.se/151190
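
For readers on Ruby 1.9 or later, where Net::SMTP has TLS support built in, the core of the idea can be sketched without any monkey patching. This is not the code from the pastie: the build_smtp helper and the :tls option name are made up for illustration, and the host is a placeholder. Nothing is sent; the snippet only configures the connection object.

```ruby
require 'net/smtp'

# Hypothetical helper: build a Net::SMTP object from a config hash,
# requiring STARTTLS when the (made-up) :tls option is set.
def build_smtp(config)
  smtp = Net::SMTP.new(config[:host], config[:port])
  smtp.enable_starttls if config[:tls]
  smtp
end

smtp = build_smtp(:host => "smtp.example.com", :port => 587, :tls => true)
puts smtp.starttls_always? ? "STARTTLS required" : "STARTTLS not required"

# To actually send, something along these lines would follow (not run here):
#   smtp.start("localhost", user, password, :plain) do |s|
#     s.send_message(message, from_address, to_address)
#   end
```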

Friday, January 4, 2008

Zed.. very humorous

Zed Shaw's latest rant is a hoot. When I first read it, it was clearly the draft he mentioned it was. I'm glad he posted it as a draft, though, because the next iteration did give a chance for DHH to clarify, and also gave Zed a chance to frame the whole rant a bit better with his admission that he himself was the main person responsible for almost going to the poor house.

Tuesday, November 27, 2007

Speaking at Philly on Rails Meeting

I will be speaking at the PhillyOnRails December group meeting next week. Topics, previously covered on this blog, will be using R with Ruby to make plots and using Gbarcode.

Wednesday, October 10, 2007

Easy Gbarcode

A few posts ago I mentioned that my Gbarcode project should really be a bit more user friendly, as well as provide an easy way to create PNGs without loading memory-hogging libraries like RMagick. I investigated using the Cairo libraries for this here.

Good news for everyone who uses Gbarcode, and those thinking of using it: I created a small wrapper module that is more Ruby-ish and also uses Cairo to print out to PNG. The functionality is bare-bones, with a fixed height of 150 pixels and a width determined by the length of the encoded barcode. It works well for Code 128 barcodes. Grab the Ruby file here.

To use this module, here is an example (note that bad encoding schemes will result in a raised exception, hence the begin/rescue block):
require 'gbc'
require 'gbarcode'
require 'markaby'

begin
  b = GBC::Barcode.new(ARGV[0], Gbarcode::BARCODE_ANY)
  puts b.ascii, b.partial, b.encoding
  b.to_png("out.png")

  # build a small HTML page that shows the generated image
  page = Markaby::Builder.new()
  page.html do
    body do
      img(:src => "out.png")
    end
  end
  File.open("out.html","w") { |f| f.write(page.to_s) }
rescue Exception => e
  puts e.message
end

Thursday, September 27, 2007

Cairo for barcodes

Continuing my discussion about Gbarcode 2, I tested my theory of using Cairo for the creation of barcodes. It turns out that this is not so hard, but it was a bit of a learning curve, to say the least. I used the Ruby DL library to load the Cairo shared libs from the system (thanks to the GD2 gem for the code hints here). A few small methods were all that was needed for bare minimum functionality: creating bars and adding text. No fancy formatting here.

Next, I used gnu barcode to get layout information for a barcode, so I could test the drawing methods independently of barcode creation logic.

The result is the picture posted above. Neat huh? Code is posted below, but I think for a production gem, I'll probably not use DL, since I have to wrap the gnu C libs for actually creating barcodes from text strings anyway.

Without further ado, the test script:
require 'dl'
require 'rbconfig'

module BC
VERSION = '1.5.0'.freeze

def self.cairo_library_name
case Config::CONFIG['arch']
when /darwin/
  'libcairo.2.dylib'
when /mswin32/, /cygwin/
  'cairo.dll'
else
  'libcairo.so.2'
end
end
def self.name_for_symbol(symbol, signature)
case Config::CONFIG['arch']
when /mswin32/, /cygwin/
  sum = -4
  signature.each_byte do |char|
    sum += case char
    when ?D: 8
    else     4
    end
  end
  "#{symbol}@#{sum}"
else
  symbol.to_s
end
end

private_class_method :cairo_library_name, :name_for_symbol

LIB = DL.dlopen(cairo_library_name)
SYM = {
:cairo_image_surface_create   => 'PIII',
:cairo_create    => 'PP',
:cairo_get_target    => 'PP',
:cairo_destroy    => '0P',
:cairo_surface_destroy    => '0P',
:cairo_surface_write_to_png    => '0PS',
:cairo_set_source_rgb    => '0PDDD',
:cairo_move_to    => '0PDD',
:cairo_line_to    => '0PDD',
:cairo_set_line_width    => '0PD',
:cairo_stroke    => '0P',
:cairo_select_font_face    => '0PSII',
:cairo_set_font_size    => '0PD',
:cairo_show_text    => '0PS'
}.inject({}) { |x, (k, v)| x[k] = LIB[name_for_symbol(k, v), v]; x }

class LibraryError < StandardError; end

class B
def self.surface w,h
  r,rs = SYM[:cairo_image_surface_create].call(0,w,h)
  return r
end

def self.context s
  r,rs = SYM[:cairo_create].call(s)
  puts "S [#{s} => #{s.class}] :: R[#{r} =>  #{r.class}] :: RS[#{rs} =>  #{rs.class}]"
  SYM[:cairo_set_source_rgb].call(r,0.0,0.0,0.0)
  puts "S [#{s} => #{s.class}] :: R[#{r} =>  #{r.class}] :: RS[#{rs} =>  #{rs.class}]"
  return r
end

def self.ctx w,h
  context(surface(w,h))
end

def self.add_bar ctx,x,y,w,h
  # cairo_move_to(cr,11.0,20.5);
  # cairo_line_to(cr,11.0,70.5);
  # cairo_set_line_width(cr,1.85);
  # cairo_stroke(cr);
  SYM[:cairo_move_to].call(ctx,x,y)
  SYM[:cairo_line_to].call(ctx,x,h)
  SYM[:cairo_set_line_width].call(ctx,w)
  SYM[:cairo_stroke].call(ctx)
end

def self.add_text(ctx,txt,x,y)
  # cairo_select_font_face (cr, "serif", CAIRO_FONT_SLANT_NORMAL = 0, CAIRO_FONT_WEIGHT_BOLD = 1);
  # cairo_set_font_size (cr, 12.0);
  # cairo_move_to (cr, 21.0, 90.0);
  # cairo_show_text (cr, "TEST1234");  
  SYM[:cairo_select_font_face].call(ctx,"serif",0,1)
  SYM[:cairo_set_font_size].call(ctx,12.0)
  SYM[:cairo_move_to].call(ctx,x,y)
  SYM[:cairo_show_text].call(ctx,txt)
end

def self.draw(ctx,fname)
  # surface = cairo_get_target(cr)
  # cairo_destroy(cr);
  # cairo_surface_write_to_png (surface, "hello.png");
  # cairo_surface_destroy (surface);
  r,rs = SYM[:cairo_get_target].call(ctx)
  SYM[:cairo_destroy].call(ctx);
  SYM[:cairo_surface_write_to_png].call(r,fname)
  SYM[:cairo_surface_destroy].call(r);
end
end
end

include BC
c = B.ctx 132, 100

B.add_bar(c,  11.00 , 20.00,  1.85, 70.0)
B.add_bar(c,  13.50 , 20.00,  0.85, 70.0)
B.add_bar(c,  16.50 , 20.00,  0.85, 70.0)
B.add_bar(c,  22.00 , 20.00,  1.85, 70.0)
B.add_bar(c,  25.50 , 20.00,  2.85, 70.0)
B.add_bar(c,  30.50 , 20.00,  0.85, 70.0)
B.add_bar(c,  32.50 , 20.00,  0.85, 70.0)
B.add_bar(c,  37.00 , 20.00,  1.85, 70.0)
B.add_bar(c,  39.50 , 20.00,  0.85, 70.0)
B.add_bar(c,  44.00 , 20.00,  1.85, 70.0)
B.add_bar(c,  47.50 , 20.00,  2.85, 70.0)
B.add_bar(c,  50.50 , 20.00,  0.85, 70.0)
B.add_bar(c,  55.00 , 20.00,  1.85, 70.0)
B.add_bar(c,  58.50 , 20.00,  2.85, 70.0)
B.add_bar(c,  63.50 , 20.00,  0.85, 70.0)
B.add_bar(c,  65.50 , 20.00,  0.85, 70.0)
B.add_bar(c,  68.50 , 20.00,  2.85, 70.0)
B.add_bar(c,  73.00 , 20.00,  3.85, 70.0)
B.add_bar(c,  76.50 , 20.00,  0.85, 70.0)
B.add_bar(c,  79.00 , 20.00,  1.85, 70.0)
B.add_bar(c,  83.50 , 20.00,  2.85, 70.0)
B.add_bar(c,  87.50 , 20.00,  0.85, 70.0)
B.add_bar(c,  91.50 , 20.00,  0.85, 70.0)
B.add_bar(c,  94.00 , 20.00,  1.85, 70.0)
B.add_bar(c,  99.50 , 20.00,  2.85, 70.0)
B.add_bar(c, 103.00 , 20.00,  1.85, 70.0)
B.add_bar(c, 105.50 , 20.00,  0.85, 70.0)
B.add_bar(c, 110.00 , 20.00,  1.85, 70.0)
B.add_bar(c, 115.50 , 20.00,  2.85, 70.0)
B.add_bar(c, 118.50 , 20.00,  0.85, 70.0)
B.add_bar(c, 121.00 , 20.00,  1.85, 70.0)


B.add_text(c,"TEST1234", 21.0, 90.0)
B.draw(c,"test_bc.png")

Mad plotz

A recent submission to a journal has caused us a few headaches over the past few weeks, as the editors sent the paper back to us stating that we did not meet the minimal reporting requirements for the type of experiment that was performed. Which is poetic justice in a way, since for the past few years I have been promoting the use of minimal reporting requirements and standard data formats.

I think this particular journal, however, has gone a bit too far in asking for annotated spectra for every identification in the result set. For most low-throughput experiments this is not such a big deal, but we had thousands of identifications and even wrote an algorithm to automatically assign a quality score to those identifications so that such manual validation of spectra should not be necessary.

But I digress. Instead of fighting it, we decided to give the editors what they want, annotated spectra for every hit. It turns out that this is not such a trivial thing to do. Even gathering all of the data was a tough job, since the experiment was performed many years ago on instrumentation that is nearing its end of life. A lot of file parsing and data reorganization had to be done, prior to any development effort to produce the images that the journal wanted.

A bit of background and some numbers will help convey the size of the task we undertook. The experiment was a proteomics profile of two developmental stages in zebrafish. We used two methodologies, 2D gels and LCMS, to fractionate the samples and ran them through mass spectrometers. The 2D gels gave fewer identifications than the LCMS, but it was still a lot of data. For instance, just these results consisted of 30,000 peptide identifications! You can reduce that to about 2,000 proteins for which the journal has asked for annotated spectra. Needless to say, the brute-force method of taking a screenshot of each spectrum from the program would not work.

I wrote a few scripts and libraries to parse the raw data and the final result table to come up with the above figure. This brings mzXML, Excel, and MGF files together with Ruby, C libraries, and the R statistical tool to produce the nice picture you see, but it took me two weeks to figure out the specifics. How on earth could a regular bencher do this?

I think the journal is in for a rude awakening once the backlash of angry rebuttals from paper submitters starts flowing in. I would also like to see their reaction to the gigantic pile of spectra we are about to send them.

Tuesday, August 28, 2007

Gbarcode using GD script

BTW, here is the script I used to create the barcode in the previous post using the gbarcode and gd2 gems:


require 'rubygems'
require 'gd2'
require 'gbarcode'

include Gbarcode
include GD2

b = barcode_create("TEST1234567890")
barcode_encode(b,BARCODE_128)

w = 20
h = 100
x = 10

y1 = 10
y2 = h - 20

bars = b.partial.split(//).map {|e| e.to_i}
# widen the image by the sum of all bar and space widths
bars.each {|e| w += e}

i = Image::IndexedColor.new(w,h)
i.palette << ...
c = Canvas.new(i)
... color = Color::BLACK
... font = Font::Small
f = File.open(...
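
The width arithmetic in the script above is easy to try without GD or Gbarcode installed. The sketch below takes a made-up width string (not real output from the library) and assumes that the digits alternate between space widths and bar widths, starting with a space, which is how GNU barcode describes its partial encoding as far as I can tell. It handles digits only, and the bar_rects helper is a hypothetical name.

```ruby
# Hypothetical helper: turn a string of digit widths into [x, width] pairs
# for the bars only. Assumption: widths alternate space, bar, space, bar...
def bar_rects(partial, x_start = 0)
  x = x_start
  rects = []
  partial.each_char.with_index do |ch, i|
    width = ch.to_i
    rects << [x, width] if i.odd? && width > 0  # odd positions are bars
    x += width                                  # spaces and bars both advance x
  end
  rects
end

sample = "0212113"   # made-up widths, not real Gbarcode output
margin = 10          # same left/right margin idea as w = 20 in the script
rects  = bar_rects(sample, margin)
total  = margin * 2 + sample.each_char.map { |c| c.to_i }.sum

rects.each { |x, w| puts "bar at x=#{x}, width=#{w}" }
puts "image width: #{total}"
```

Each pair could then be handed to whatever drawing call is in use, one filled rectangle per bar.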

Monday, August 20, 2007

The SNP per Gene count

In my last post I related an example where a scientist came to me to get the number of SNPs per gene from an Excel file. The simplest solution is to use a hash keyed on the gene symbol, where the value tracks the number of times you have seen that gene symbol. Here is the program:


require 'rubygems'
require 'fastercsv'

genecount= Hash.new()
FasterCSV.foreach(ARGV[0], :headers => true) do |row|
  # headers => id,snp_id,genome_build,chromosome,coordinate,gene_symbol,priority,snp_per_gene
  if (genecount[row["gene_symbol"]])
    genecount[row["gene_symbol"]] += 1
  else
    genecount[row["gene_symbol"]] = 1
  end
end


output = File.open("#{ARGV[0]}.rev.csv", "w")
output.puts("gene_symbol,snp_count")
genecount.each_pair do |g,c|
  output.puts "\"#{g}\",#{c}"
end
output.close
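
FasterCSV was later folded into Ruby's standard library as CSV (from Ruby 1.9 on), so the same count can be sketched with no extra gem. The data below is invented purely to make the example runnable; only the gene_symbol column name comes from the header comment in the program above.

```ruby
require 'csv'

# Made-up sample data; only the gene_symbol column matters here.
data = <<CSV_TEXT
snp_id,gene_symbol
rs1,BRCA1
rs2,TP53
rs3,BRCA1
CSV_TEXT

# Hash.new(0) makes missing keys start at zero, so no if/else is needed.
genecount = Hash.new(0)
CSV.parse(data, :headers => true) do |row|
  genecount[row["gene_symbol"]] += 1
end

genecount.each_pair { |gene, count| puts "#{gene},#{count}" }
```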

Friday, August 17, 2007

Hacks Before Code

I often find that when you are trying to solve two problems at once, you do a poor job of both. Case in point, someone just came into my office asking how they would go about getting the number of SNPs per gene from some Excel file they have. I started to explain set theory and databases, and you could see the visible signs of mental shutdown ensue (the slacking jaw, the glazed eyes). A different tactic, showing them a script as I wrote it (a hash keyed by gene, with the count of SNPs from the file as the value), got the same result.

So I am trying out Something New. I am going to push researchers to learn to program in a context that is completely separate from science, and is hopefully fun enough that they stick with it for more than a few days. Enter Hackety.org, a project spearheaded by _why the lucky stiff that seeks to (insert Fake Steve Jobs "voice") restore the childlike wonder of learning how to program.

With HacketyHack, I hope that researchers are motivated to learn aspects of programming in an entertaining environment before they have to do any real work, which of course will suck some of the fun out of the activity.

I'll be putting together lessons to augment the existing 7 exercises of HacketyHack in the coming months with real but simple bioinformatics tutorials. So download that hack-box and get coding folks!

ITMArT: A request tracking system

For a few (3-4) months I have been working on a user request and order management tracking system. Most of that time has been spent wrestling with RoR's ajax functionality and making the UI as intuitive as I possibly can. Basically I took the "getting real" book at face value and started with the interface.

What remains, though, is a lot of "under-the-hood" plumbing for small things like getting user accounts to work with the CAS SSO server, access control lists, and group management. Oh, and email alerts... yeeesh. Well, at least it looks pretty.


The search works well and the cart concept seems to be pretty easy to follow. The order processing, though, still leaves something to be desired. Reporting is air-ware at the moment.

I'll keep posting tidbits about this project often (since it currently takes 90% of my time), so stay tuned!

Tut 1: Rename a set of files

Today I had a researcher come to me asking if I can write a script to rename a set of result files following some convention. This article will cover that bit of coding, but first some background:

1) I use Ruby, and Ruby on Rails, for my day-to-day operations. While there are some rough edges in Ruby's library support, it gets most things done efficiently, and of course you can't get much better than RoR for web apps. So any code in this blog will usually be Ruby code.

2) We have a commercial Laboratory Information Management System (LIMS) that creates identifiers for experiments, samples, and result files. The twist here is that most (3/4) of the experiments were already completed before the LIMS was introduced. So while the LIMS is capable of outputting queue files for the instruments to name the files according to the LIMS convention, this does not apply here and we must retrofit the LIMS IDs into the existing result files.

Why is this important at all? Well, the LIMS can automatically assign the result file to the annotated experiment in the system on file upload if the result file has the correct identifier in the name. While you could do this manually, you would not want to do this for the 1000 result files that were/are going to be produced. See first post on time wasting by researchers that don't know how to code. At least this one is smart enough to know there is a better way.

The good news is that as long as the filename contains the LIMS ID, it does not matter what the rest of the name is, so we only have to figure out a way to relate the existing filename to the proposed LIMS ID. This turns out to be easier than expected, since both contain a sequential number that corresponds to the source sample.

E.g. :
existing file name = 07Aug05_SF_ASA_583.RAW
LIMS ID = APA1742A583MS3
proposed rename = APA1742A583MS3_07Aug05_SF_ASA_583.RAW

Thus a simple regular expression can pull out the proper sample number from the result filename and LIMS ID and do the renaming. Without further ado, the script:

#!/usr/bin/env ruby
require 'rubygems'
require 'fastercsv'

# output a usage message unless both inputs are given
unless ARGV[0] && ARGV[1]
  puts "Need input queue file and directory of RAW files"
  puts "USAGE:"
  puts "ruby rename.rb INPUT_QUEUE_FILE INPUT_DIR"
  exit(1)
end

#define a LIMS ID lookup hash keyed by the sample number
lims_ids = {}

# use FasterCSV to parse the LIMS instrument queue file for the LIMS IDs
# We need the third column for the filename (remember that arrays start with zero ;)
FasterCSV.foreach(ARGV[0]) do |row|
  if (row[2] =~ /(\d+)MS3\-/)
    k = $1
    row[2] =~ /^(\S+MS3)\-/
    lims_ids[k] = $1
  end
end

# change to the directory with all of the result files
# and read the files that have a "RAW" extension
Dir.chdir(ARGV[1])
raw_files = Dir.glob("*.{RAW,raw}")

# go through the set of files and rename them
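raw_files.each do |f|
  # skip anything without a trailing sample number
  next unless f =~ /(\d+)\.RAW$/i
  lims_id = lims_ids[$1]
  next unless lims_id
  new_name = "#{lims_id}_#{f}"
  puts "#{f} -> #{new_name}"
  # File.rename avoids shelling out, so spaces in filenames are not a problem
  File.rename(f, new_name)
end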
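
The matching logic can be checked on its own, without touching any files. The sketch below uses the example names from earlier in this post; proposed_name is a hypothetical helper, not part of the script above.

```ruby
# Hypothetical helper: return the new name for a RAW file, or nil when the
# file has no trailing sample number or no LIMS ID is known for that number.
def proposed_name(filename, lims_ids)
  match = filename.match(/(\d+)\.RAW\z/i)
  return nil unless match
  lims_id = lims_ids[match[1]]
  lims_id ? "#{lims_id}_#{filename}" : nil
end

lims_ids = { "583" => "APA1742A583MS3" }

puts proposed_name("07Aug05_SF_ASA_583.RAW", lims_ids).inspect
puts proposed_name("07Aug05_SF_ASA_999.RAW", lims_ids).inspect
```

Returning nil for unmatched files makes a dry run easy: print the pairs first, then rename only once the list looks right.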