Re: newbie question about data I/O
On Sep 18, 6:44 pm, "Eric Pruneau" <eric.prun...@cgocable.ca> wrote:
"Seeker" <zhongm...@gmail.com> wrote in message news:
af4c91a8-a601-47e3-b25a-3587cdd7f...@p9g2000vbl.googlegroups.com...
Howdy, gurus
I want to write code to read in a large genomic file. The data look like this:
Marker     location  freq   T    mu     sigma_2  S      p-value
rs2977670  713754    0.925  779  9.604  141.278  2.202  0.02763
rs2977656  719811    0.992  793  9.120  134.796  2.733  0.00627
Here is my code:
#include <cstdio>   // fopen, fgets, fclose
#include <cstdlib>  // atof
#include <cstring>  // strtok
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
using namespace std;

int main(int argc, char** argv)
{
    vector<string> snp_list1, snp_list2, snp_list3;
    vector<int> location1, location2, location3;
    vector<double> freq1, freq2, freq3;
    vector<int> T1, T2, T3;
    vector<double> mu1, mu2, mu3;
    vector<double> sigma_21, sigma_22, sigma_23;
    vector<double> S1, S2, S3;
    vector<double> p1, p2, p3;

    //read in 1st data file;
    FILE *in = fopen(argv[1], "r");
    char line[128];
    fgets(line, 128, in); //skip the 1st row;
    while (fgets(line, 128, in))
    {
        cout << line << endl;
        char *str = strtok(line, "\t"); // "\t" is the tab character
        string marker(str);
        snp_list1.push_back(marker);
        str = strtok(NULL, "\t");
        location1.push_back(atof(str));
        str = strtok(NULL, "\t");
        freq1.push_back(atof(str));
        str = strtok(NULL, "\t");
        T1.push_back(atof(str));
        str = strtok(NULL, "\t");
        mu1.push_back(atof(str));
        str = strtok(NULL, "\t");
        sigma_21.push_back(atof(str));
        str = strtok(NULL, "\t");
        S1.push_back(atof(str));
        str = strtok(NULL, "\t");
        p1.push_back(atof(str));
    }
    fclose(in);

    //verify the vectors
    for (int i = 0; i < snp_list1.size(); ++i)
        cout << snp_list1[i] << endl;
    return 0;
}
I tried to run the code but always got the error "error while
dumping state..(core dumped)". I am new to C++. Thanks a lot for your
input.
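A plausible cause of the crash, though the thread doesn't confirm it: if the file is actually space-separated (as the sample above appears), strtok(line, "\t") returns the whole line as the first token and NULL for the next one, and atof(NULL) dereferences a null pointer. Running the program without a file argument (argv[1] is then a null pointer) or with a file that fopen can't open crashes the same way. The sketch below keeps the poster's C-style approach but checks every pointer before using it. The Row struct and the parse_row/read_rows names are hypothetical, and accepting both tabs and spaces as separators is an assumption about the file.

```cpp
#include <cstdio>   // std::FILE, std::fopen, std::fgets, std::fclose
#include <cstdlib>  // std::atof
#include <cstring>  // std::strtok
#include <string>
#include <vector>

// One data row; member names follow the file's header (hypothetical struct).
struct Row {
    std::string marker;
    int location;
    double freq;
    int T;
    double mu, sigma_2, S, p;
};

// Tokenizes one line in place. Accepts tabs or spaces as separators and
// returns false, instead of calling atof(NULL), when a field is missing.
bool parse_row(char* line, Row& r)
{
    const char* delims = " \t\r\n";
    char* tok = std::strtok(line, delims);
    if (tok == nullptr) return false;           // blank line
    r.marker = tok;

    double v[7];                                // the 7 numeric columns
    for (int i = 0; i < 7; ++i) {
        tok = std::strtok(nullptr, delims);
        if (tok == nullptr) return false;       // too few fields on this line
        v[i] = std::atof(tok);
    }
    r.location = static_cast<int>(v[0]);
    r.freq = v[1];
    r.T = static_cast<int>(v[2]);
    r.mu = v[3];
    r.sigma_2 = v[4];
    r.S = v[5];
    r.p = v[6];
    return true;
}

// Reads all data rows from path, skipping the header. Returns false if no
// path was given (e.g. argv[1] missing) or the file could not be opened.
bool read_rows(const char* path, std::vector<Row>& rows)
{
    if (path == nullptr) return false;
    std::FILE* in = std::fopen(path, "r");
    if (in == nullptr) return false;

    char line[256];
    if (std::fgets(line, sizeof line, in) != nullptr) {   // skip the header
        while (std::fgets(line, sizeof line, in) != nullptr) {
            Row r;
            if (parse_row(line, r)) rows.push_back(r);    // ignore malformed lines
        }
    }
    std::fclose(in);
    return true;
}
```

In the poster's main this could be called as read_rows(argv[1], rows); when no argument is given, argv[1] is argv[argc], which the language guarantees is a null pointer, so the check catches it.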
Here is some simple code that reads numbers from a text file using fstream and
stringstream.
Here is my text file:
SomeString 1 2 3 4 5
Now I can read it like this:
#include <fstream>  // for ifstream
#include <sstream>  // for istringstream
#include <string>
#include <vector>
using namespace std;

int main()
{
    ifstream ifs("file.txt"); // this opens the file in text mode by default
    string strLine;
    vector<int> v;
    getline(ifs, strLine);
    istringstream iss(strLine);
    iss >> strLine; // extract the first element; we assume it is a string
    // now loop until the end of the line and extract every integer;
    // testing the extraction itself (rather than eof() beforehand) avoids
    // storing a value from a read that failed
    int tmp;
    while (iss >> tmp)
        v.push_back(tmp);
    return 0;
}
It should be easy to modify that to read your file. Note that I didn't do
much error checking.
Eric
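Adapting Eric's getline-plus-istringstream approach to the genomic file might look like the following sketch. It assumes every column after the marker can be read as a double (location and T are whole numbers, which a double holds exactly at these sizes); the Line struct and read_lines name are made up for illustration. Because each line gets its own istringstream, a short or malformed line can't pull values from the next row.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One parsed line: the leading marker name and the numbers after it.
struct Line {
    std::string marker;
    std::vector<double> values;
};

// Skips the header, then for each line reads the first token as a string
// and the remaining tokens as numbers until the line runs out.
std::vector<Line> read_lines(std::istream& ifs)
{
    std::vector<Line> result;
    std::string strLine;
    std::getline(ifs, strLine);                 // skip the header row
    while (std::getline(ifs, strLine)) {
        std::istringstream iss(strLine);
        Line l;
        if (!(iss >> l.marker)) continue;       // blank line: nothing to store
        double tmp;
        while (iss >> tmp)                      // stops at end of line or a non-number
            l.values.push_back(tmp);
        result.push_back(l);
    }
    return result;
}
```

With an ifstream opened on the data file, read_lines(ifs) returns one Line per data row, and values[0] through values[6] correspond to location through p-value.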
If none of the columns have blank values, then "input stream
extraction" with the >> operator will read in the data conveniently:
before each value it skips leading whitespace, including line endings.
ifstream input_file("filename");
string Marker;
int location;
double freq;
int T;
double mu;
// ...
while(!!input_file)
{
    input_file >> Marker >> location >> freq >> T >> mu >> ...
}
input_file.close();
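The loop above tests the stream before extracting, so after the last row it makes one more pass whose reads fail. A common alternative, sketched here, is to make the chained extraction itself the loop condition, so the body only runs after a complete row was read. The count_rows function is hypothetical, and it assumes the eight columns from the sample header, using the variable names above plus sigma_2, S and p.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Counts the data rows in a whitespace-separated stream shaped like the
// genomic file: one header line, then eight fields per row.
int count_rows(std::istream& input_file)
{
    std::string header;
    std::getline(input_file, header);   // skip the column names

    std::string Marker;
    int location, T;
    double freq, mu, sigma_2, S, p;
    int rows = 0;
    // The whole chain must succeed for the body to run; the first failed
    // extraction (normally at end of file) ends the loop.
    while (input_file >> Marker >> location >> freq >> T
                      >> mu >> sigma_2 >> S >> p) {
        ++rows;  // every variable holds a value from this row here
    }
    return rows;
}
```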
Then, since you also want to push those values onto vectors, you can
overload operator>> and build up an extractor for a vector of each type.
template <typename T> istream& operator>>(istream& in, vector<T>& vec)
{
    T temp;
    if (in >> temp)        // only store values that were actually read
        vec.push_back(temp);
    return in;
}
Because the extractor is templated on the vector's element type, one
definition serves every column, which makes the reading code more
concise. It relies on string and the built-in types int and double
already having extractors defined.
#include <string>
#include <vector>
#include <iostream>
#include <fstream>

using std::string;
using std::vector;
using std::istream;
using std::ifstream;

vector<string> Markers;
vector<int> locations;
vector<double> freqs;
vector<int> Ts;
vector<double> mus;
// ...

template <typename T> istream& operator>>(istream& in, vector<T>& vec)
{
    T temp;
    if (in >> temp)        // only store values that were actually read
        vec.push_back(temp);
    return in;
}

void read_input_file()
{
    ifstream input_file("file_name");

    // read off the header
    string header;
    std::getline(input_file, header);

    // read off the lines
    while (!!input_file)
    {
        input_file >> Markers >> locations >> freqs >> Ts >> mus /* >> ... */;
    }
    input_file.close();
}

int main()
{
    read_input_file();
    return 0;
}
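The following sketch exercises the vector-extractor approach on an in-memory stream instead of a file. It uses a version of the extractor that only calls push_back when the read succeeded; without that check, the final failed pass of a while(!!input_file) loop appends an extra element to each vector (an empty string, or a number that was never read). The Columns struct and read_columns function are hypothetical, and only the first five columns are read, as in the program above.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Vector extractor: reads one value and appends it only on success.
template <typename T>
std::istream& operator>>(std::istream& in, std::vector<T>& vec)
{
    T temp;
    if (in >> temp)
        vec.push_back(temp);
    return in;
}

// One vector per column (first five columns only).
struct Columns {
    std::vector<std::string> markers;
    std::vector<int> locations;
    std::vector<double> freqs;
    std::vector<int> Ts;
    std::vector<double> mus;
};

// Same loop shape as the post: test the stream, then extract one row.
void read_columns(std::istream& in, Columns& c)
{
    std::string header;
    std::getline(in, header);  // discard the header row
    while (!!in) {
        in >> c.markers >> c.locations >> c.freqs >> c.Ts >> c.mus;
    }
}
```

One limitation of the column-vector design: if a row is cut short, the earlier columns get one more element than the later ones, which a record-based reader avoids.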
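When evaluating input_file: it is an ifstream, ifstream derives from
istream (ifstream : istream), and istream, through its base class
basic_ios, defines the ! operator to return true if an extraction has
failed (failbit), e.g. converting a string to an int, or if the stream
has gone into a bad state (badbit), e.g. a file error. So !!input_file
is true as long as the stream can still be read. Reaching the end of the
file sets eofbit; an extraction that finds nothing left to read also
sets failbit, and that is what ends the loop.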
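To see those state bits directly, here is a small sketch that reads one int from a string and records eof(), fail(), bad() and the result of the ! operator. StreamState and try_read_int are hypothetical helpers. Reading "42" shows that eofbit alone does not make !stream true; only failbit or badbit do.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Snapshot of a stream's state flags.
struct StreamState {
    bool eof, fail, bad, not_op;  // not_op is the value of !stream
};

StreamState state_of(const std::istream& s)
{
    return {s.eof(), s.fail(), s.bad(), !s};
}

// Reads one int from text and reports the resulting state.
StreamState try_read_int(const std::string& text)
{
    std::istringstream iss(text);
    int value;
    iss >> value;
    return state_of(iss);
}
```

Expected outcomes: "42 " leaves every flag clear; "42" sets eofbit only; "abc" sets failbit; "" sets both eofbit and failbit.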
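If some column entries were blank, this approach would break, because
it reads a fixed number of whitespace-separated values per row into
variables of the expected types: a missing entry shifts every later
value into the wrong column, or fails the extraction altogether.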
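If blank entries are possible and the file is tab-separated (an assumption; the original poster's strtok call suggests tabs), one option is to read each line with getline and split it on '\t' yourself, so an empty field shows up as an empty string instead of shifting the following values one column to the left. split_tabs is a hypothetical helper; each non-empty field can then be converted, for example with std::stod.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits one line on tabs, keeping empty fields.
std::vector<std::string> split_tabs(const std::string& line)
{
    std::vector<std::string> fields;
    std::istringstream iss(line);
    std::string field;
    while (std::getline(iss, field, '\t'))  // "a\t\tb" yields "a", "", "b"
        fields.push_back(field);
    if (!line.empty() && line.back() == '\t')
        fields.push_back("");               // getline drops a trailing empty field
    return fields;
}
```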
Now, as for defining the vector extractor: the vectors hold the
columns, but the data is laid out in rows (row-major instead of
column-major), so each call above reads just one value into each column
vector. A different and reasonable overload of the vector extractor
would keep reading until an extraction fails, along the lines of
template <typename T> istream& operator>>(istream& in, vector<T>& vec)
{
    T temp;
    while (in >> temp)   // keep reading until an extraction fails
        vec.push_back(temp);
    return in;
}
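but that would always return (if it returned normally) with failbit or
badbit set, and often eofbit as well, so the stream must be reset with
clear() before anything more can be read from it. Because it swallows
every remaining value of type T, it suits a single-column file rather
than these rows.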
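A sketch of where the greedy extractor does fit: reading a single column of numbers. read_column is a hypothetical helper that uses the extractor and then calls clear() so the stream can be used again afterwards.

```cpp
#include <istream>
#include <sstream>
#include <vector>

// Greedy vector extractor: reads values until an extraction fails.
template <typename T>
std::istream& operator>>(std::istream& in, std::vector<T>& vec)
{
    T temp;
    while (in >> temp)
        vec.push_back(temp);
    return in;
}

// Reads a column of numbers, then resets the error state left by the
// final failed read so the caller can keep using the stream.
std::vector<double> read_column(std::istream& in)
{
    std::vector<double> values;
    in >> values;   // stops at end of input or at the first non-number
    in.clear();     // clear failbit/eofbit
    return values;
}
```

Note that clear() only resets the flags; whatever stopped the loop (here, a non-numeric token) is still the next thing in the stream.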
You might also want to define a record structure, and then define an
extractor for it.
struct record
{
    string Marker;
    int location;
    double freq;
    int T;
    double mu;
    // ...
};
Then define the extractor for the record:
istream& operator>>(istream& in, record& r)
{
    return in >> r.Marker >> r.location >> r.freq >> r.T >> r.mu; //...
}
Then use it in the line-reading loop; the result is a vector of records
instead of a separate vector for each column.
string header;
std::getline(in, header);
vector<record> records;
while(!!in)
{
    in >> records;
}
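A self-contained sketch of the record approach, with two changes from the snippet above: the record is extended to the remaining header columns (sigma_2, S and p, an assumption based on the sample data), and the loop reads into a single record and pushes it only when the whole row was extracted, instead of going through the vector extractor. read_records is a hypothetical name.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One row of the file (sigma_2, S, p added as an assumption).
struct record {
    std::string Marker;
    int location;
    double freq;
    int T;
    double mu;
    double sigma_2;
    double S;
    double p;
};

// Extracts one whole row; if any field fails, the stream's failbit is set.
std::istream& operator>>(std::istream& in, record& r)
{
    return in >> r.Marker >> r.location >> r.freq >> r.T
              >> r.mu >> r.sigma_2 >> r.S >> r.p;
}

// Skips the header, then reads records until an extraction fails.
std::vector<record> read_records(std::istream& in)
{
    std::string header;
    std::getline(in, header);
    std::vector<record> records;
    record r;
    while (in >> r)              // push only complete, successfully read rows
        records.push_back(r);
    return records;
}
```

Keeping each row together in one struct means a short row can't leave the columns with different lengths.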
Now I might have made a mistake in the above but it is hopefully
correct.
Thanks,
Ross F.