Instructions for Isolating One Median from the Sea of Data
This document is also available for download in pdf or word format.
Introduction
A print includes, at a minimum, quadruplicates of each antigen so we need a way of compressing all this information into one numerical value for statistical analyses. Inevitably, compressing the data leads to loss of information so we use a scheme that minimizes the loss of useful information. To this end, we take the median value of a unique antigen. The median is an excellent representation of the average value for data that is not normally distributed and reduces skewing do to outliers in the data.
Which Column?
GenePix already calculated the median, mean, standard deviation, ratio, and many other statistical parameters during the analysis section of adding a grid to the slide. In addition, GenePix 5.0 outputs information from the GAL file, settings, and flags. In all, it’s a sea of information, and for a person doing their first analysis, swimming may not look like an attractive option.
Fortunately, we use a simple strategy in our statistical analyses. This strategy uses the median value of the fluorescent color used with the secondary antibody. For example, if a cy3-conjugated antibody was used, then we want to collect the F532 Median – B532 column (column AX in Excel). Alternatively, if a cy5-conjugated antibody was used, then we want to collect the F635 Median – B635 column (column AW in Excel).
Consolidating All the Slides into One Sheet
Launch Microsoft Excel
Select File, Open… (Ctrl+O)
Select a slide from the GenePix analysis (see Howto Grid mArray Images)
Select the Name column (column G in Excel)
Select Edit, Copy (Ctrl+C)
Select File, New… (Ctrl+N)
I’ll call the new spreadsheet, spreadsheet A, and the old spreadsheet, spreadsheet X.
Select Edit, Paste (Ctrl+V) in the first column (column A).
Rename the label Name to something such as Unique ID (later programs don’t like the header Name).
Return to spreadsheet X.
Select the appropriate FXXX Median – BXXX column (XXX refers to the wavelength of the desired color)
Select Edit, Copy (Ctrl+C)
Return to spreadsheet A.
Select Edit, Paste (Ctrl+V) in the second column (column B).
Rename the label FXXX Median – BXXX to something more descriptive of the sample and slide #.
Select File, Open… (Ctrl+O)
Select the next slide from the GenePix analysis
Repeat the procedure for adding the Median column from spreadsheet X to spreadsheet A until you have one file containing all the slides from the scan/analysis.
Select File, Save As
Name the Excel file with a description that represents the state of the analysis. For example, you might name the file
slXX-YY_print_YY_MM_DD_medians.xls
The first part in the data compression process is complete.
Sorting All the Median Values
Select File, New… (Ctrl+N)
I’ll call the new spreadsheet, spreadsheet B, and the spreadsheet with all the slides, spreadsheet A.
Highlight the entire data set (Ctrl+A) from spreadsheet A.
Select Edit, Copy (Ctrl+C)
Return to spreadsheet B.
Select Edit, Paste (Ctrl+V)
See the images below for the final product of the next few steps[1].
Delete empty rows above headers.
Insert a row between 1 & 2 (beneath the row of sample/slide names)
Number the empty cells beneath each slide with a number that starts with one and increases to n, where n is the total number of slides in the experiment.
Move the Unique ID to cell A2.
Highlight cells A3 – MN (where M is the final data column and N is the final data row)
Select Data, Sort…
Select Column A, Check Ascending, Check No Header Row, and Click OK
This step groups all the replicates together (see sorted image).
Consolidating All the Median Values into One Median
The algorithm to consolidate the one or two thousand total antigens into only the unique antigens uses functions that the average Excel user may or may not be familiar with. I’ll go slowly and clarify each step along the way.
Please refer to the image below because I rely on the cell names and numbers in this explanation. This particular experiment contains 26 samples, spanning columns B – AA. I hid columns E – AA to fit all the relevant information onto a reasonably sized image.
Select Edit, Copy (Ctrl+C)
Select cell AD1
Select Edit, Paste (Ctrl+V)
Enter the number 1 in cell AB2
Enter the following algorithm in cell AB3
=IF((EXACT($A3,$A4)),1+AB2,1)
Select cell AB3
Select Edit, Copy (Ctrl+C)
Highlight cells AB3 – ABn[2]
Select Edit, Paste (Ctrl+V)
Select cell AC3
Enter the number 1 in cell AC3
Select cell AC4
Enter the following algorithm in cell AC4
=AC3+1
Select cell AC4
Select Edit, Copy (Ctrl+C)
Highlight cells AC4 – ACn
Select Edit, Paste (Ctrl+V)
Select cell AD3
Enter the following algorithm in cell AD3
=IF((EXACT($A3,$A4))," ",A3)
Select cell AD3
Select Edit, Copy (Ctrl+C)
Highlight cells AD3 – ADn
Select Edit, Paste (Ctrl+V)
Select cell AE3
Enter the following algorithm in cell AE3
=IF((EXACT($A3,$A4))," ",MEDIAN(OFFSET($A$3,($AC3-$AB2),AE$2,1,1):OFFSET($A$3,$AC2,AE$2,1,1)))
Select cell AE3
Select Edit, Copy (Ctrl+C)
Highlight cells AE3 – BDn
Select Edit, Paste (Ctrl+V)
The second part in the data compression process is complete.
Eliminating the Extra Space
Highlight cells AD1 – BDn
Select Edit, Copy (Ctrl+C)
Select Sheet2
Right click on cell A1 and select Paste Special…
Select Values and click OK
Highlight cells A3 – AAn
Select Data, Sort…
Select Column A, Check Descending, Check No Header Row, and Click OK
Scroll down to row n and make sure all the antigens are grouped continuously. If they aren’t, then delete the intervening rows. Save this file.
The data compression is now complete!
Brief Explanation of the Algorithm
There are two main parts to this algorithm. The first part searches through the list of antigen names looking for duplications. The second part determines the number of replicates for a particular antigen then computes the median value of all the replicates.
The first algorithm uses the EXACT function to search for exact string matches. I ask the question, is this name the same as the previous one. If the answer is yes, then I add a blank to the cell and move on. If the answer is no, then I add the name to the cell and move on.
The second algorithm uses the EXACT function in the same manner as the first algorithm. In addition, the second algorithm also uses the OFFSET and MEDIAN functions. The OFFSET function allows me to include all the cells that particular group of replicates occupy. The MEDIAN function computes the median value of this set.
[1] I reference cells by their row (numbers) and their column (letters). For example, A1 is the cell for the first row and first column.
[2] n represents the final row of the complete data set
Brian A. Kidd Ó 2004