## Instructions for Isolating One Median from the Sea of Data

This document is also available for download in pdf or word format.

## Introduction

A print includes, at a minimum, quadruplicates of each antigen so we need a way of compressing all this information into one numerical value for statistical analyses. Inevitably, compressing the data leads to loss of information so we use a scheme that minimizes the loss of useful information. To this end, we take the median value of a unique antigen. The median is an excellent representation of the average value for data that is not normally distributed and reduces skewing do to outliers in the data.

## Which Column?

GenePix already calculated the median, mean, standard deviation, ratio, and many other statistical parameters during the analysis section of adding a grid to the slide. In addition, GenePix 5.0 outputs information from the GAL file, settings, and flags. In all, it’s a sea of information, and for a person doing their first analysis, swimming may not look like an attractive option.

Fortunately, we use a simple strategy in our statistical analyses. This strategy uses the median value of the fluorescent color used with the secondary antibody. For example, if a cy3-conjugated antibody was used, then we want to collect the F532 Median – B532 column (column AX in Excel). Alternatively, if a cy5-conjugated antibody was used, then we want to collect the F635 Median – B635 column (column AW in Excel).

## Consolidating All the Slides into One Sheet

Launch *Microsoft Excel*

Select *File, Open… (Ctrl+O)*

Select a slide from the GenePix analysis (see *Howto Grid mArray Images*)

Select the **Name** column (column G in Excel)

Select *Edit, Copy (Ctrl+C)*

Select *File, New… (Ctrl+N)*

I’ll call the new spreadsheet, *spreadsheet A*, and the old spreadsheet, *spreadsheet X*.

Select *Edit, Paste (Ctrl+V)* in the first column (column A).

Rename the label Name to something such as Unique ID (later programs don’t like the header Name).

Return to *spreadsheet X*.

Select the appropriate **FXXX Median – BXXX** column (XXX refers to the wavelength of the desired color)

Select *Edit, Copy (Ctrl+C)*

Return to *spreadsheet A*.

Select *Edit, Paste (Ctrl+V)* in the second column (column B).

Rename the label FXXX Median – BXXX to something more descriptive of the sample and slide #.

Select *File, Open… (Ctrl+O)*

Select the next slide from the GenePix analysis

Repeat the procedure for adding the Median column from *spreadsheet X *to* spreadsheet A* until you have one file containing all the slides from the scan/analysis.

Select *File, Save As*

Name the Excel file with a description that represents the state of the analysis. For example, you might name the file

slXX-YY_print_YY_MM_DD_medians.xls

**The first part in the data compression process is complete.**

## Sorting All the Median Values

Select *File, New… (Ctrl+N)*

I’ll call the new spreadsheet, *spreadsheet B*, and the spreadsheet with all the slides, *spreadsheet A*.

Highlight the entire data set *(Ctrl+A)* from *spreadsheet A*.

Select *Edit, Copy (Ctrl+C)*

Return to *spreadsheet B*.

Select *Edit, Paste (Ctrl+V)*

*See the images below for the final product of the next few steps ^{[1]}.*

Delete empty rows above headers.

Insert a row between *1 & 2* (beneath the row of sample/slide names)

Number the empty cells beneath each slide with a number that starts with one and increases to *n*, where *n* is the total number of slides in the experiment.

Move the Unique ID to cell *A2*.

Highlight cells *A3 – MN* (where *M* is the final data column and *N* is the final data row)

Select *Data, Sort…*

Select *Column A*, Check *Ascending*, Check *No Header Row*, and Click *OK*

This step groups all the replicates together (see sorted image).

### Consolidating All the Median Values into One Median

The algorithm to consolidate the one or two thousand total antigens into only the unique antigens uses functions that the average Excel user may or may not be familiar with. I’ll go slowly and clarify each step along the way.

Please refer to the image below because I rely on the cell names and numbers in this explanation. This particular experiment contains 26 samples, spanning columns B – AA. I hid columns E – AA to fit all the relevant information onto a reasonably sized image.

*A1 – AA2*

Select

*Edit, Copy (Ctrl+C)*

Select cell

*AD1*

Select

*Edit, Paste (Ctrl+V)*

Enter the number

*1*in cell

*AB2*

Enter the following algorithm in cell

*AB3*

=IF((EXACT($A3,$A4)),1+AB2,1)

Select cell

*AB3*

Select

*Edit, Copy (Ctrl+C)*

Highlight cells

*AB3 – ABn*

^{[2]}Select

*Edit, Paste (Ctrl+V)*

Select cell

*AC3*

Enter the number 1 in cell

*AC3*

Select cell

*AC4*

Enter the following algorithm in cell

*AC4*

=AC3+1

Select cell

*AC4*

Select

*Edit, Copy (Ctrl+C)*

Highlight cells

*AC4 – ACn*

Select

*Edit, Paste (Ctrl+V)*

Select cell

*AD3*

Enter the following algorithm in cell

*AD3*

=IF((EXACT($A3,$A4))," ",A3)

Select cell

*AD3*

Select

*Edit, Copy (Ctrl+C)*

Highlight cells

*AD3 – ADn*

Select

*Edit, Paste (Ctrl+V)*

Select cell

*AE3*

Enter the following algorithm in cell

*AE3*

=IF((EXACT($A3,$A4))," ",MEDIAN(OFFSET($A$3,($AC3-$AB2),AE$2,1,1):OFFSET($A$3,$AC2,AE$2,1,1)))

Select cell

*AE3*

Select

*Edit, Copy (Ctrl+C)*

Highlight cells

*AE3 – BDn*

Select

*Edit, Paste (Ctrl+V)*

**The second part in the data compression process is complete.**

### Eliminating the Extra Space

Highlight cells *AD1 – BDn*

Select *Edit, Copy (Ctrl+C)*

Select *Sheet2*

Right click on cell *A1* and select *Paste Special…*

Select *Values* and click *OK*

Highlight cells *A3 – AAn*

Select *Data, Sort…*

Select *Column A*, Check *Descending*, Check *No Header Row*, and Click *OK*

Scroll down to row *n* and make sure all the antigens are grouped continuously. If they aren’t, then delete the intervening rows. Save this file.

**The data compression is now complete!**

### Brief Explanation of the Algorithm

There are two main parts to this algorithm. The first part searches through the list of antigen names looking for duplications. The second part determines the number of replicates for a particular antigen then computes the median value of all the replicates.

The first algorithm uses the EXACT function to search for exact string matches. I ask the question, is this name the same as the previous one. If the answer is *yes*, then I add a blank to the cell and move on. If the answer is *no*, then I add the name to the cell and move on.

The second algorithm uses the EXACT function in the same manner as the first algorithm. In addition, the second algorithm also uses the OFFSET and MEDIAN functions. The OFFSET function allows me to include all the cells that particular group of replicates occupy. The MEDIAN function computes the median value of this set.

[1] I reference cells by their row (numbers) and their column (letters). For example, A1 is the cell for the first row and first column.

[2] n represents the final row of the complete data set

*Brian A. Kidd* Ó 2004