Monday, August 20, 2007

The SNP per Gene count

In my last post I described a scientist who came to me wanting the number of SNPs per gene parsed out of an Excel file. The simplest solution is a hash keyed on the gene symbol, whose value tracks the number of times you have seen that symbol. Here is the program:


require 'rubygems'
require 'fastercsv'

# gene symbol => number of SNP rows seen for that gene
genecount = Hash.new
FasterCSV.foreach(ARGV[0], :headers => true) do |row|
  # headers => id,snp_id,genome_build,chromosome,coordinate,gene_symbol,priority,snp_per_gene
  if genecount[row["gene_symbol"]]
    genecount[row["gene_symbol"]] += 1
  else
    genecount[row["gene_symbol"]] = 1
  end
end


# Write one "gene_symbol",count line per gene; the block form closes the file for us.
File.open("#{ARGV[0]}.rev.csv", "w") do |output|
  output.puts("gene_symbol,snp_count")
  genecount.each_pair do |g, c|
    output.puts "\"#{g}\",#{c}"
  end
end
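
If you want to try the counting idea without a CSV file on hand, here is a minimal sketch that runs on in-memory rows. It is a variant of the program above, not a replacement for it: it assumes a hash created with Hash.new(0), so missing keys default to 0 and the if/else goes away. The method name snp_counts and the sample rows (GENE_A, rs0001 and so on) are made up for illustration.

```ruby
# Hypothetical helper: count how many rows mention each gene symbol.
# `rows` is any enumerable of hash-like records with a "gene_symbol" key.
def snp_counts(rows)
  counts = Hash.new(0) # missing keys start at 0, so no if/else is needed
  rows.each { |row| counts[row["gene_symbol"]] += 1 }
  counts
end

# Made-up sample rows, for illustration only.
sample_rows = [
  { "snp_id" => "rs0001", "gene_symbol" => "GENE_A" },
  { "snp_id" => "rs0002", "gene_symbol" => "GENE_B" },
  { "snp_id" => "rs0003", "gene_symbol" => "GENE_A" }
]

counts = snp_counts(sample_rows)

# Print in the same "gene_symbol",count shape the program writes to its output file.
counts.each_pair { |gene, count| puts "\"#{gene}\",#{count}" }
```

Because the helper only asks each row for row["gene_symbol"], it should accept the rows that FasterCSV yields with :headers => true just as well as plain hashes.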