This past week, my good friend and colleague at AWS, Chris Crosbie, published a blog post about analyzing whole genome sequence data with one of our genomic analysis platform partners, Seven Bridges Genomics, importing the resulting VCF files into Amazon Redshift, and doing a simple analysis of the data within R and Bioconductor.
The post is lengthy and technically dense, but it is delightful in that it showcases one of the major advantages of working with genomics data on AWS: namely the rich and diverse ecosystem of tools and platform providers that a researcher can bundle together to fit their exact needs.
I have certainly failed to communicate the fullness of this ecosystem, and Chris’s post is a good reminder that I need to communicate better (and more often) with the broader bioinformatics community about how to effectively pare down their choices when facing an analysis challenge.
PS: If you like Chris’s post, be sure to read the other one we put together with Matt Wood about uploading dbGaP data to Amazon S3.