Microarray analysis

What is this course about?

Start by getting some general background on DNA microarrays and protein microarrays. NCBI covers the scientific applications in more depth, and a scientific overview of the opportunities and challenges of working with microarrays is available, too.

First things first: how do you create a microarray?

Probe design has to follow several principles to optimize the specificity and sensitivity of the probes.

Luckily, these steps are now automated, and designing a microarray is very much a black-box process in which you only have to specify your target sequences. It is still important to understand the underlying principles, however, so that you can give the software the correct data.
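To make "specificity and sensitivity" a little more concrete, here is a minimal sketch in R of two quantities that are commonly screened when choosing oligonucleotide probes: GC content and a rough melting temperature. This is an illustration under stated assumptions, not a description of how Picky or any vendor platform works. The function names and the example sequence are made up, and the Wallace rule used here is only a crude estimate for short oligos.

```r
# Hypothetical helpers for illustration only; not part of Picky or any vendor tool.

# Fraction of G and C bases in a probe sequence (0 to 1).
gc_content <- function(probe) {
  bases <- strsplit(toupper(probe), "")[[1]]
  mean(bases %in% c("G", "C"))
}

# Rough melting temperature in degrees Celsius by the Wallace rule:
# 2 degrees for every A or T, 4 degrees for every G or C.
wallace_tm <- function(probe) {
  bases <- strsplit(toupper(probe), "")[[1]]
  2 * sum(bases %in% c("A", "T")) + 4 * sum(bases %in% c("G", "C"))
}

probe <- "ATGCGCGTATAGCTAGCTAG"  # made-up 20-mer, not a real probe
gc_content(probe)
wallace_tm(probe)
```

Probes on the same array are usually chosen so that values like these fall in a narrow range, which lets all of them hybridize well under the same conditions.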

Most commercial vendors have their own microarray design platform, e.g. Agilent or Affymetrix.

Third-party software, independent of any vendor, is also available. Some of it is free (such as Picky, developed here at Iowa State University; registration required), and some is commercial (such as Array Designer).

In this course, we will use Picky. You can download a trial version for free. If you want to keep using it, your PI will have to request an academic license.

The team has provided excellent tutorials online, so we encourage you to walk through these now.

The original paper on Picky was published in Bioinformatics. It covers additional background and the design criteria for the probes, and it is valuable reading if you want to use Picky for your own projects.

Where can I find microarray data?

Many microarray experiments have been made available to the general public. Several excellent sources are available online to retrieve these datasets:

Preparing to analyze microarray data

Okay, you've created an array and you used it in a project and you have some data... Or you simply downloaded one of the datasets from the above-mentioned databases. Now what?

In this course, we will rely in part on software that was designed to run under R, a widespread, popular, and freely available statistics software package.

This means that you'll need to familiarize yourself with R! If you haven't installed R yet on your system, please do so now. [local copy]

An excellent introduction to R is provided by the development team itself. [local copy]

However, this course is about microarrays! Garrett Dancik, a former graduate student in the BCB program, therefore wrote an excellent tutorial himself, along with exercises. He was nice enough to provide answers, too.

To work efficiently with microarray data in R, you'll also need to install BioConductor on top of R. During the installation, your screen will look somewhat like this. After installing BioConductor, you can check whether everything went well by typing library("affy") at the R command line. If you get an error message such as "Error in library("affy") : there is no package called 'affy'", things didn't go so well and you should try to re-install the software.
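The check can also be wrapped in a small function, so that a failed load produces a readable message instead of stopping your session. This is a minimal sketch: `can_load` is a hypothetical helper name, and the re-install hint assumes a recent R version in which BioConductor packages are installed through the BiocManager package from CRAN. The installation instructions linked from this course may use a different installer.

```r
# Hypothetical helper: returns TRUE if the package loads,
# otherwise prints the error message and returns FALSE.
can_load <- function(pkg) {
  tryCatch({
    library(pkg, character.only = TRUE)
    TRUE
  }, error = function(e) {
    message("Could not load '", pkg, "': ", conditionMessage(e))
    FALSE
  })
}

if (!can_load("affy")) {
  # Assumption: BiocManager is the installer for your R version.
  message("To (re)install affy, try:")
  message('  install.packages("BiocManager")')
  message('  BiocManager::install("affy")')
}
```

The install commands are only printed, not executed, so running this block never changes your system.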

Once you're through with installing BioConductor, you can install the sample data as per instructions.

There are three additional libraries you need: twilight, hgu95av2 and hgu95av2cdf.

If you want to see what libraries you've installed so far, you can use the library() command. Another useful R-tip: Ctrl+L will clear your screen.
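Rather than scanning the output of library() by eye, you can compare the installed packages against the ones this course needs. This is a minimal sketch; `missing_packages` is a hypothetical helper, and the package names are taken verbatim from the text above. Depending on your BioConductor release, annotation data may be distributed under a slightly different package name, so treat the list as an assumption to check.

```r
# Hypothetical helper: which of the needed packages are not installed yet?
missing_packages <- function(needed) {
  installed <- rownames(installed.packages())
  setdiff(needed, installed)
}

# Package names as given in this course; adjust if your release names them differently.
needed <- c("affy", "twilight", "hgu95av2", "hgu95av2cdf")
still_missing <- missing_packages(needed)

if (length(still_missing) == 0) {
  message("All required packages are installed.")
} else {
  message("Still missing: ", paste(still_missing, collapse = ", "))
}
```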

Are we there yet?

Yay! You made it to this part of the course and you're still breathing (although you may experience some slight elevation in blood pressure).

Yes, that was hard. However, the amount of ready-made software that you need to install before you can actually analyze your data is a testament to the extent to which microarrays have become standard tools for unraveling biological and medical mysteries. It also means that you don't have any programming to do: everything has been done for you by other people already. Isn't that great? Unfortunately, it does mean that you have some catching up to do and will occasionally have to grind your way through software installation protocols, as we just did.

Ready to get started with some actual analysis? Try loading the following packages:

Now load the twilight module as well (we're not going to tell you everything).

If you get no error messages, you're ready to start the real work.

A first tutorial can be found here. Make sure to adapt the path names used in the tutorial to your own system, though!

After working through this, you're ready for the second tutorial, on estrogen. It can be found here [local copy].

By the way, .cel files are huge! The sample files that come with the estrogen library are 17243 x 19525 pixels in size. While you can open them in graphics packages such as GIMP, this is not recommended. You're better off leaving them to specialized software such as BioConductor or geWorkbench. The image files are black and white.
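Leaving the files to BioConductor could look like the sketch below. It assumes that the affy and estrogen packages are installed and that the estrogen package keeps its sample .cel files in its extdata directory; if either package is missing, the block only prints a message. Only the first file is read, to keep memory use low.

```r
# Run the example only when both packages are available.
cel_available <- requireNamespace("affy", quietly = TRUE) &&
  requireNamespace("estrogen", quietly = TRUE)

if (cel_available) {
  library(affy)

  # Assumption: the sample .cel files live in the package's extdata directory.
  cel_dir <- system.file("extdata", package = "estrogen")
  cel_files <- list.celfiles(cel_dir, full.names = TRUE)

  # Read a single array into an AffyBatch object.
  abatch <- ReadAffy(filenames = cel_files[1])

  # Rows are individual features on the chip, columns are arrays.
  print(dim(exprs(abatch)))

  # Draw the array intensities as a grey-scale image inside R.
  image(abatch)
} else {
  message("Install the affy and estrogen packages to run this example.")
}
```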

Differentially expressed gene categories

The material in this section was provided by Dr. Dan Nettleton and illustrates how to apply the methodology described in Nettleton, D., Recknor, J., Reecy, J.M. (2007). Identification of differentially expressed gene categories in microarray studies using nonparametric multivariate analysis. Bioinformatics. 24, 192-201. [local copy]

A guided R exercise is available here.

Explorase: advanced microarray data visualization

Explorase is an advanced package for R that was developed by Michael Lawrence at Iowa State University. A paper is available as well. [local copy]

Before we can use Explorase, we'll need to install it. To speed up the process, you can install GTK+ and GGobi individually from local images. After this, please go to the Explorase website, click on the Download link and follow the instructions.

If all went well, close R and restart it. Start Explorase by typing library(explorase), followed by explorase().

Additional tutorials on microarray data analysis