Re: std::list Duplicates with Different Data
In article <MPG.2293b77914e2118e989704@news.cox.net>, mrc2323@cox.net
says...
[ ... ]
Okay, here's what I am trying to do:
1. I have a large file with names (in the form "last, first") and
gender codes ('M'/'F') from which I want to parse the "first" name
string and build a std::list (or some other container type) of all
unique (first) names and genders.
2. I intend to use this information in a data entry application that
checks the validity of an inputted first_name against this data
collection. If there's a conflict (e.g. "SUE",'M', or "MARVIN",'F'), I
want the application to pause and let the user decide if that's correct.
3. There are cases ("PAT", "CHRIS", etc.) where the name is valid,
regardless of gender. Therefore, the std::list should contain multiple
objects that have the same "search key" value (e.g. "CHRIS") but have
different gender codes - 2 different objects with identical "find"
values. I am struggling with (1) building the std::list and (2)
searching it for all possible variations of name & gender.
BTW, I appreciate the helpful response, Jerry...
First of all, I would _not_ use a linked list. Second, I'd use a single
entry for each first name, storing the number of males and number of
females with that first name. I believe that should simplify your code
quite a bit. Personally, I'd write the code something like this:
// Warning: all code in the post is UNTESTED!
#include <fstream>
#include <istream>
#include <set>
#include <string>
// data as it's read from the file:
struct person {
std::string fname;
std::string lname;
char gender;
};
// read the data from the file:
std::istream &operator>>(std::istream &is, person &p) {
// assumes file is of form: last_name ',' first_name ',' gender '\n'
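// member getline() only fills char buffers; std::string needs std::getline
std::getline(is, p.lname, ',');
std::getline(is, p.fname, ',');
is >> p.gender;
// consume the newline that must end the record, so the next read starts clean
if (is.get() != '\n')
is.setstate(std::ios::failbit);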
return is;
}
enum gender { MALE, FEMALE };
// holds the data we care about:
struct name_use {
std::string first_name;
// mutable: set elements are const, but the counts aren't part of the ordering
mutable long f_use;
mutable long m_use;
name_use(std::string name, gender g) :
first_name(name), f_use(0), m_use(0)
{
if (g==MALE)
++m_use;
else
++f_use;
}
bool operator<(const name_use &other) const {
return first_name < other.first_name;
}
};
std::set<name_use> names;
std::ifstream input("myfile.hst");
person temp;
while (input>>temp) {
gender g = temp.gender == 'M' ? MALE : FEMALE;
name_use nu(temp.fname, g);
std::set<name_use>::iterator it = names.find(nu);
if (it != names.end()) {
// name found -- increment appropriate count
if (g == MALE)
++it->m_use;
else
++it->f_use;
}
else { // name not present yet
names.insert(nu);
}
}
To put this to use, you'd set a threshold, and check whether the fraction
of people with that name who have the entered gender is below that
threshold:
const double threshold = 0.05;
person p;
get_data(input, p);
// find() needs a name_use; the gender passed here is ignored by operator<
std::set<name_use>::iterator n = names.find(name_use(p.fname, MALE));
if (n == names.end())
// maybe a typo?
warn("Please verify name");
else {
double percent_male = double(n->m_use)/(n->f_use+n->m_use);
double percent_female = 1.0 - percent_male;
if (p.gender == 'M' && (percent_male < threshold))
warn("Please verify gender");
else if (p.gender == 'F' && (percent_female < threshold))
warn("Please verify gender");
}
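If it helps to see the arithmetic on its own, the threshold test above can be pulled out into a small function. This is only a sketch: the name gender_is_unusual and its parameters are made up for illustration, and it treats any gender code other than 'M' as female.

```cpp
// Hypothetical helper, not part of the code above: returns true when the
// entered gender accounts for less than `threshold` of all recorded uses
// of the name.
bool gender_is_unusual(long m_use, long f_use, char gender, double threshold) {
    long total = m_use + f_use;
    if (total == 0)
        return true;   // no recorded uses at all, so nothing supports the entry
    double fraction_male = double(m_use) / total;
    // assumption: anything that isn't 'M' is treated as 'F'
    double fraction = (gender == 'M') ? fraction_male : 1.0 - fraction_male;
    return fraction < threshold;
}
```

With made-up counts of 2 males and 198 females for a name and a threshold of 0.05, an 'M' entry is flagged (2/200 = 0.01), while an 'F' entry is not. With 60 males and 40 females, neither gender is flagged, which is the behavior you want for names like "CHRIS" or "PAT".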
There are, of course, a number of alternatives, such as using an
std::map, with the first name as the key and the usages as the
associated data. This might be a tad cleaner in places, but I doubt the
difference would be particularly major.
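A rough sketch of what the std::map variant might look like follows. Like the rest of the code in this post it should be treated as illustrative; use_counts, name_map, record and lookup are all names invented for the example.

```cpp
#include <map>
#include <string>

// Hypothetical names throughout; only the std::map idea comes from the text.
struct use_counts {
    long m_use;
    long f_use;
    use_counts() : m_use(0), f_use(0) {}
};

typedef std::map<std::string, use_counts> name_map;

// Tally one (first name, gender code) pair read from the file.
void record(name_map &names, const std::string &first_name, char gender) {
    use_counts &c = names[first_name];   // creates a zeroed entry on first use
    if (gender == 'M')
        ++c.m_use;
    else
        ++c.f_use;
}

// Look a name up without inserting it; returns 0 if the name is unknown.
const use_counts *lookup(const name_map &names, const std::string &first_name) {
    name_map::const_iterator it = names.find(first_name);
    return it == names.end() ? 0 : &it->second;
}
```

The main gain is that the first name is the key itself, so a lookup takes a plain std::string and the counts can be modified without any tricks. Note that lookup uses find() rather than operator[], since operator[] would quietly insert every mistyped name into the map.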
--
Later,
Jerry.
The universe is a figment of its own imagination.