How to: Write data as a tab-delimited text file
Author: Henrik Bengtsson
Created on: 2010-04-22
Last updated: 2010-04-23
Data sets and data files are fundamental concepts in the aroma framework, where a data set contains multiple data files in structured directories. There are several methods for extracting signals, that is, reading signals into memory, from a data set or from individual data files. For more information, see the 'How tos' section. In some cases, however, the data need to be exported as tab-delimited text files so that they can be imported into other software tools. This section describes how to write the data to tab-delimited text files. It is possible to generate either (i) one output file per data file, or (ii) one output file for the whole data set.
The writeDataFrame() method takes either a single data file or a data set as its first argument. Of its additional arguments, the most important is probably columns, which specifies which columns the generated text file should contain.
For example:
dfTxt <- writeDataFrame(ds, columns="*")
will generate a tab-delimited file with one column per signal field (typically one field per file), whereas:
dfTxt <- writeDataFrame(ds, columns=c("unitName", "*"))
will, in addition to the above, insert a first column with unit names, which are obtained from the unit names file (e.g. the CDF file). Similarly, with:
dfTxt <- writeDataFrame(ds, columns=c("unitName", "chromosome", "position", "*"))
the second and third columns will contain the chromosome and position of each unit (locus), which are obtained from the UGP file.
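To make the resulting layout concrete, the following is a minimal sketch in base R that mimics a file with the columns "unitName", "chromosome", "position" and one signal column, and then reads it back. It does not use writeDataFrame() and is not how aroma writes the file; the values are copied from the example output further down this page, and the actual file may contain additional header lines that this sketch does not model.

```r
# Illustration only: mimic the column layout with base R
# (this is NOT aroma's implementation of writeDataFrame()).
# The values are those of units 1010-1012 in the example output on this page.
data <- data.frame(
  unitName        = c("SNP_A-2001589", "SNP_A-2001596", "SNP_A-2001598"),
  chromosome      = c(1L, 1L, 1L),
  position        = c(34110291L, 34119149L, 34119693L),
  "NA06993,total" = c(1022.368, 4317.809, 3229.630),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Write a tab-delimited file with a header row to a temporary location
pathname <- tempfile(fileext = ".txt")
write.table(data, file = pathname, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back; check.names = FALSE keeps column names that contain commas
data2 <- read.delim(pathname, check.names = FALSE, stringsAsFactors = FALSE)
print(colnames(data2))
str(data2)
```

Note that the signal column is named after the full name of the data file, which contains a comma. Tools that sanitize column names on import (as read.delim() does by default) would alter it, hence check.names = FALSE.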
See also
To write annotation data, see how-to page 'Write annotation data as a tab-delimited text file'.
One data file per tab-delimited text file
Example: Export a single data file as a tab-delimited text file with annotation data added
dataSet <- "HapMap270,6.0,CEU,testSet"
tags <- "ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY"
chipType <- "GenomeWideSNP_6"
ds <- AromaUnitTotalCnBinarySet$byName(dataSet, tags=tags, chipType=chipType)
print(ds)
## AromaUnitTotalCnBinarySet:
## Name: HapMap270
## Tags: 6.0,CEU,testSet,ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY
## Full name: HapMap270,6.0,CEU,testSet,ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY
## Number of files: 3
## Names: NA06991, NA06993, NA07000 [3]
## Path (to the first file):
## totalAndFracBData/HapMap270,6.0,CEU,testSet,ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY/GenomeWideSNP_6
## Total file size: 21.53 MB
## RAM: 0.00MB
df <- ds[[2]]
print(df)
## AromaUnitTotalCnBinaryFile:
## Name: NA06993
## Tags: total
## Full name: NA06993,total
## Pathname:
## totalAndFracBData/HapMap270,6.0,CEU,testSet,ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY/GenomeWideSNP_6/NA06993,total.asb
## File size: 7.18 MB (7526121 bytes)
## RAM: 0.00 MB
## Number of data rows: 1881415
## File format: v1
## Dimensions: 1881415x1
## Column classes: double
## Number of bytes per column: 4
## Footer: <createdOn>20100422 17:46:03 CEST</createdOn><platform>Affymetrix</platform><chipType>GenomeWideSNP_6,Full</chipType>
## <srcFile><srcDataSet>HapMap270,6.0,CEU,testSet,ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY</srcDataSet><srcChipType>GenomeWideSNP_6,Full,monocell</srcChipType>
## <srcFullName>NA06993,chipEffects</srcFullName>
## <srcChecksum>1b7625d385394f42f5b31aa988ff43a1</srcChecksum></srcFile>
## Platform: Affymetrix
## Chip type: GenomeWideSNP_6,Full
# Also export columns containing the unit names, chromosomes and positions.
dfTxt <- writeDataFrame(df, columns=c("unitName", "chromosome", "position", "*"))
print(dfTxt)
## TabularTextFile:
## Name: NA06993
## Tags: total
## Full name: NA06993,total
## Pathname:
## totalAndFracBData/HapMap270,6.0,CEU,testSet,ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY/GenomeWideSNP_6/NA06993,total.txt
## File size: 62.35 MB (65376366 bytes)
## RAM: 0.01 MB
## Number of data rows: NA
## Columns [4]: 'unitName', 'chromosome', 'position', 'NA06993,total'
## Number of text lines: NA
data <- readDataFrame(dfTxt, rows=1010:1024)
print(data)
## unitName chromosome position NA06993,total
## 1010 SNP_A-2001589 1 34110291 1022.368
## 1011 SNP_A-2001596 1 34119149 4317.809
## 1012 SNP_A-2001598 1 34119693 3229.630
## 1013 SNP_A-2001642 1 34170728 6060.184
## 1014 SNP_A-2001643 1 34172791 3469.545
## 1015 SNP_A-4268291 1 34179429 1953.738
## 1016 SNP_A-2001684 1 34204360 1353.817
## 1017 SNP_A-4214101 1 34204556 3615.931
## 1018 SNP_A-2001700 1 34211296 1784.901
## 1019 SNP_A-2001835 1 34287073 2973.341
## 1020 SNP_A-2001840 1 34306289 2415.758
## 1021 SNP_A-4214120 1 34357252 2631.183
## 1022 SNP_A-2001896 1 34377866 6363.690
## 1023 SNP_A-4268333 1 34436399 1606.675
## 1024 SNP_A-2002002 1 34519557 1946.391
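The readDataFrame() call above reads only a block of rows from the exported file. When the file is consumed outside aroma, a similar partial read can be sketched in base R with the skip and nrows arguments of read.delim(). The example below is an illustration under stated assumptions: it uses a small synthetic stand-in file (the unit names, the column name "sample,total" and the values are made up), it assumes the file has a single header row and no other header lines, and it only handles a contiguous range of rows.

```r
# Create a small synthetic tab-delimited file (hypothetical content,
# standing in for an exported file with a single header row).
full <- data.frame(
  unitName       = sprintf("unit%02d", 1:20),
  "sample,total" = (1:20) * 100,
  check.names = FALSE,
  stringsAsFactors = FALSE
)
pathname <- tempfile(fileext = ".txt")
write.table(full, file = pathname, sep = "\t", quote = FALSE, row.names = FALSE)

# Rows to read (must be a contiguous range in this sketch)
rows <- 5:8

# Read the column names from the header line
colNames <- strsplit(readLines(pathname, n = 1), split = "\t", fixed = TRUE)[[1]]

# Skip the header line plus the (min(rows) - 1) data rows preceding the block
chunk <- read.delim(pathname, header = FALSE, skip = min(rows),
                    nrows = length(rows), col.names = colNames,
                    check.names = FALSE, stringsAsFactors = FALSE)

# Label the rows by their original row indices
rownames(chunk) <- rows
print(chunk)
```

Reading only the needed block avoids loading the whole file into memory, which matters for exports with millions of rows such as the one above.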
A whole data set per tab-delimited text file
Example: Export all data of a data set to a tab-delimited text file with annotation data added
dataSet <- "HapMap270,6.0,CEU,testSet"
tags <- "ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY"
chipType <- "GenomeWideSNP_6"
ds <- AromaUnitTotalCnBinarySet$byName(dataSet, tags=tags, chipType=chipType)
print(ds)
## AromaUnitTotalCnBinarySet:
## Name: HapMap270
## Tags: 6.0,CEU,testSet,ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY
## Full name: HapMap270,6.0,CEU,testSet,ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY
## Number of files: 3
## Names: NA06991, NA06993, NA07000 [3]
## Path (to the first file):
## totalAndFracBData/HapMap270,6.0,CEU,testSet,ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY/GenomeWideSNP_6
## Total file size: 21.53 MB
## RAM: 0.00MB
# Also export columns containing the unit names, chromosomes and positions.
dfTxt <- writeDataFrame(ds, columns=c("unitName", "chromosome", "position", "*"))
print(dfTxt)
## TabularTextFile:
## Name: HapMap270
## Tags: 6.0,CEU,testSet,ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY
## Full name: HapMap270,6.0,CEU,testSet,ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY
## Pathname:
## totalAndFracBData/HapMap270,6.0,CEU,testSet,ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY/GenomeWideSNP_6/HapMap270,6.0,CEU,testSet,ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY.txt
## File size: 107.86 MB (113103874 bytes)
## RAM: 0.01 MB
## Number of data rows: NA
## Columns [6]: 'unitName', 'chromosome', 'position', 'NA06991,total',
## 'NA06993,total', 'NA07000,total'
## Number of text lines: NA
data <- readDataFrame(dfTxt, rows=1010:1024)
print(data)
## unitName chromosome position NA06991,total NA06993,total NA07000,total
## 1010 SNP_A-2001589 1 34110291 954.6941 1022.368 1352.647
## 1011 SNP_A-2001596 1 34119149 4499.8872 4317.809 4380.319
## 1012 SNP_A-2001598 1 34119693 2138.8340 3229.630 2419.442
## 1013 SNP_A-2001642 1 34170728 5545.6758 6060.184 5707.734
## 1014 SNP_A-2001643 1 34172791 3561.7803 3469.545 3780.201
## 1015 SNP_A-4268291 1 34179429 2454.7314 1953.738 1925.875
## 1016 SNP_A-2001684 1 34204360 1435.8201 1353.817 1715.853
## 1017 SNP_A-4214101 1 34204556 3941.3589 3615.931 4174.944
## 1018 SNP_A-2001700 1 34211296 2232.3728 1784.901 2363.954
## 1019 SNP_A-2001835 1 34287073 3385.6470 2973.341 3188.489
## 1020 SNP_A-2001840 1 34306289 2451.4780 2415.758 3017.298
## 1021 SNP_A-4214120 1 34357252 3204.5381 2631.183 3220.736
## 1022 SNP_A-2001896 1 34377866 7543.6479 6363.690 6853.816
## 1023 SNP_A-4268333 1 34436399 1718.2601 1606.675 1876.243
## 1024 SNP_A-2002002 1 34519557 1620.9423 1946.391 1545.906
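Once a whole-data-set table such as the one above has been loaded, it is often convenient to separate the annotation columns from the signal columns. The following base R sketch shows one way to do that, assuming the annotation columns are exactly "unitName", "chromosome" and "position" and that each signal column is named by the full name of its data file (name followed by comma-separated tags). The data frame is built inline from the first three rows printed above, so the example runs without aroma.

```r
# First three rows (units 1010-1012) of the table printed above
data <- data.frame(
  unitName        = c("SNP_A-2001589", "SNP_A-2001596", "SNP_A-2001598"),
  chromosome      = c(1L, 1L, 1L),
  position        = c(34110291L, 34119149L, 34119693L),
  "NA06991,total" = c(954.6941, 4499.8872, 2138.8340),
  "NA06993,total" = c(1022.368, 4317.809, 3229.630),
  "NA07000,total" = c(1352.647, 4380.319, 2419.442),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Assumption: these are the only annotation columns in the table
annotationCols <- c("unitName", "chromosome", "position")
signalCols <- setdiff(colnames(data), annotationCols)

# Units-by-samples matrix of signals
signals <- as.matrix(data[, signalCols, drop = FALSE])
rownames(signals) <- data$unitName

# Drop the tags (everything from the first comma) to get the sample names
colnames(signals) <- sub(",.*$", "", signalCols)
print(signals)
```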