Re: Can extra processing threads help in this case?
Peter Olcott wrote:
> I would envision only using anything as heavy weight as
> SQLite for just the financial aspect of the transaction.
SQLite is not "heavy weight"; it's lightweight and only good for
single-accessor applications. It is very popular for storing application
configuration or user records, but only the application itself has access
and no one else.
You can handle multiple accessors, but at the expense of speed. The
SQLite people make no bones about that. SQLite works because its
target market doesn't have any critical speed requirement and can
afford the latency of data-file sharing.
SQLite uses what is called a reader/writer lock, a technique very common
in synchronizing a common resource among threads:

You can have many readers, but only one writer.
If readers are active, a writer must wait until there are no more readers.
If a writer is active, readers must wait until there are no more writers.
If you use OOP with a class-based reader/writer lock, it makes the
programming easier:

   Get
   {
      CReader LOCK()
      get record
   }

   Put
   {
      CWriter LOCK()
      put record
   }
The nice thing is that when you leave local scope, the destructor of
the reader/writer lock will release or decrement the lock reference count.
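For example, here is a minimal sketch of that RAII pattern using standard
C++17 (std::shared_mutex); the Record type and the global store are just
placeholders:

   #include <shared_mutex>

   struct Record { int id; };

   static std::shared_mutex g_lock;  // guards the shared record store
   static Record g_store;            // stand-in for the real data

   Record GetRecord()                // many readers may run concurrently
   {
       std::shared_lock<std::shared_mutex> lock(g_lock);  // reader lock
       return g_store;
   }   // lock's destructor releases the reader lock here

   void PutRecord(const Record &r)   // a writer gets exclusive access
   {
       std::unique_lock<std::shared_mutex> lock(g_lock);  // writer lock
       g_store = r;
   }   // lock's destructor releases the writer lock here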
Now in Windows, thread synchronization is generally done using what are
called kernel objects. These include SEMAPHOREs; a MUTEX is a special
type of semaphore.
For Unix, I am very rusty here, but it MIGHT still use the old-school
method which was also used in DOS, using what I called "file
semaphores." In other words, a FILE is used to signify a LOCK.
So one process will create a temporary file:

   process-id.LCK

and the other processes will wait on that file disappearing; only
the OWNER (the creator of the lock) can release/delete it.
As I understood it, pthreads was a technology and library added to
allow Unix-based applications to begin using threads. I can't tell
you the details, but as I have always understood it, WINDOWS and
UNIX are conceptually the same when it comes to common resource-sharing
models. In other words, you look for the same types of things in both.
> The queue of HTTP requests would use a lighter weight simple
> file.
For you, you can use a single log file or individual *.REQ files, which
might be better/easier using a file-notification event concept. I can't
tell you about *nix, but for Windows:

   FindFirstChangeNotification()
   ReadDirectoryChangesW()

The former might be available under *nix since it's the older idea. The
latter was introduced in NT 3.51, so it's available on all NT-based
OSes. It is usually used with IOCP designs for scalability and
performance.
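A minimal sketch of the notification wait loop (the directory name is
just an example):

   #include <windows.h>

   void WatchRequests(void)
   {
       // Watch for files being created/renamed/deleted in the queue folder.
       HANDLE h = FindFirstChangeNotification("C:\\queue", FALSE,
                                              FILE_NOTIFY_CHANGE_FILE_NAME);
       if (h == INVALID_HANDLE_VALUE) return;
       while (WaitForSingleObject(h, INFINITE) == WAIT_OBJECT_0) {
           // ... scan the folder for new *.REQ files and process them ...
           if (!FindNextChangeNotification(h)) break;  // re-arm the handle
       }
       FindCloseChangeNotification(h);
   }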
In fact, one can use ReadDirectoryChangesW() along with Interlocked
Singly Linked Lists:

http://msdn.microsoft.com/en-us/library/ms684121(v=VS.85).aspx

to give you a highly optimized, high-performance atomic queuing concept.
However, there is a note I see there about 64-bit operations.
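A minimal sketch of the SList API (note that an SList pops in LIFO
order, i.e., it is a stack; a strict FIFO needs a second list or a
drain step):

   #include <windows.h>
   #include <malloc.h>
   #include <stdio.h>

   typedef struct {
       SLIST_ENTRY entry;   // must be the first member; aligned on x64
       int requestId;
   } RequestItem;

   void Demo(void)
   {
       SLIST_HEADER head;
       InitializeSListHead(&head);

       RequestItem *item = (RequestItem *)
           _aligned_malloc(sizeof(RequestItem), MEMORY_ALLOCATION_ALIGNMENT);
       item->requestId = 1;
       InterlockedPushEntrySList(&head, &item->entry);   // atomic push

       PSLIST_ENTRY e = InterlockedPopEntrySList(&head); // atomic pop
       if (e) {
           RequestItem *r = (RequestItem *)e;  // safe: entry is first member
           printf("popped request %d\n", r->requestId);
           _aligned_free(r);
       }
   }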
> I would use some sort of IPC to inform the OCR that a
> request is available to eliminate the need for a polled
> interface. The OCR process would retrieve its jobs form this
> simple file.
See above.
> According the Unix/Linux docs multiple threads could append
> to this file without causing corruption.
So does Windows. However, there could be a dependency on the storage
device and file-system drivers.

In general, as long as you open for append, write, and close, don't
leave the file open, and don't do any file stat reads or seeking on your
own, it works very nicely:
   FILE *fv = fopen("request.log","at");
   if (fv) {
       fprintf(fv,"%s\n",whatever);
       fclose(fv);
   }
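On the *nix side, the equivalent guarantee comes from the O_APPEND open
flag, which makes the seek-to-end and the write a single atomic step per
write() call. A minimal sketch:

   #include <fcntl.h>
   #include <unistd.h>
   #include <string.h>

   void AppendLine(const char *line)
   {
       int fd = open("request.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
       if (fd != -1) {
           write(fd, line, strlen(line));  // atomically positioned at EOF
           close(fd);
       }
   }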
However, if you really wanted a guarantee, then you can use a
critical section, a named kernel object (named so it can be shared
among processes), or sharing-mode open-file functions with a READ-ONLY
sharing attribute. Using CreateFile(), it would look like this:
   BOOL AppendRequest(const TYourData &data)
   {
       HANDLE h = INVALID_HANDLE_VALUE;
       DWORD maxTime = GetTickCount() + 20*1000;  // 20 seconds max wait
       while (1)
       {
           h = CreateFile("request.log",
                          GENERIC_WRITE,
                          FILE_SHARE_READ,  // other writers are locked out
                          NULL,
                          OPEN_ALWAYS,
                          FILE_ATTRIBUTE_NORMAL,
                          NULL);
           if (h != INVALID_HANDLE_VALUE) break;  // we got a good handle
           DWORD err = GetLastError();
           if (err != ERROR_ACCESS_DENIED && err != ERROR_SHARING_VIOLATION) {
               return FALSE;
           }
           if (GetTickCount() > maxTime) {
               SetLastError(err);  // make sure the error is preserved
               return FALSE;
           }
           _cprintf("- waiting: %d ms left\n", (int)(maxTime - GetTickCount()));
           Sleep(50);
       }
       SetFilePointer(h, 0, NULL, FILE_END);
       DWORD dw = 0;
       if (!WriteFile(h, (void *)&data, sizeof(data), &dw, NULL)) {
           // something unexpected happened
           CloseHandle(h);
           return FALSE;
       }
       CloseHandle(h);
       return TRUE;
   }
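A hypothetical caller would then simply be:

   TYourData data;
   // ... fill in the fixed-length record fields ...
   if (!AppendRequest(data))
       printf("append failed, error %lu\n", GetLastError());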
> If this is not the
> case then a single thread could be invoked through some sort
> of FIFO, such as in Unix/Linux is implemented as a named
> pipe, with each of the web server threads writing to the
> FIFO.
If that is all *nix has to offer: historically, using named pipes can
be unreliable, especially under multiple threads.

But since you continue to mix up your engineering designs, and you need
to get that straight (processes vs. threads), that decision will
determine what to use.
Let's say you listen and ultimately design a multi-thread-ready EXE,
and you also want to allow multiple EXEs to run, either on the same
machine or on another machine, and you want to keep this dumb FIFO
design for your OCR; then by definition you need a FILE-BASED sharing
system.

While there are methods to do cross-machine MESSAGING, like named
pipes, they are still fundamentally based on a file concept behind the
scenes; they are just "special files".
You need to trust my 30 years of designing servers with HUGE IPC
requirements. You can write your OWN "messaging queue" with ideas
based on the above AppendRequest(); just change the file name to some
shared resource location:

   \\SERVER_MACHINE\SharedFolder\request.log

and you've got your intra- and inter-process communications: local,
remote, multi-threaded, etc.!

Of course, you could use a shared SQL database with tables like the
above to do the same thing.
Your goal as a good "Software Engineer" is to outline the functional
requirements and also use BLACK-BOX interfacing. You could just
outline this using an abstract OOP class:
   class CRequestHandlerAbstract {
   public:
       struct TYourData {
           ..fields...
       };
       virtual bool Append(const TYourData &yd) = 0;
       virtual bool GetNext(TYourData &yd) = 0;
       virtual bool SetFileName(const char *sz) { sfn = sz; return true; }
   protected:
       virtual bool OpenFile() = 0;
       virtual bool CloseFile() = 0;
       string sfn;
   };
and that is basically all you need to know. The implementation of
this abstract class will be specific to the method and OS you will be
using. What doesn't change is your web server and OCR; they will use
the abstract methods as the interface points.
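For illustration only, a minimal sketch of one concrete implementation,
assuming AppendRequest() from above is reworked to take the file name,
and leaving GetNext()'s read-cursor bookkeeping out:

   class CFileRequestHandler : public CRequestHandlerAbstract {
   public:
       bool Append(const TYourData &yd)
       {
           // Reuse the CreateFile()-based retry loop shown earlier,
           // pointed at whatever file name was set via SetFileName().
           return AppendRequest(sfn.c_str(), yd) ? true : false;
       }
       bool GetNext(TYourData &yd)
       {
           // Open the file, seek to this reader's saved offset, read one
           // fixed-length record, advance the offset. Omitted here.
           return false;
       }
   protected:
       bool OpenFile()  { return true; }  // stubs for the sketch
       bool CloseFile() { return true; }
   };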
> Yes, that is the sort of system that I have been envisioning.
> I still have to have SQL to map the email address login ID
> to customer number.
That will depend on how you wish to define your customer number. If it's
purely numeric and serial, i.e., starting at 1, then you can define an
auto-increment id field in your SQL database table schema, which the
SQL engine will increment for you when you first create the user
account with the INSERT command.
Example: a table "CUSTOMERS" is created in the database:

   CREATE TABLE customers (
       id int auto_increment,
       Name text,
       Email text,
       Password text
   )
When you create the account, the insert will look like this:

   INSERT INTO customers VALUES
       (NULL,'Peter','pete@abc.com','some_hash_value')

By using NULL for the first (id) field, SQL will automatically use
the next ID number.
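If you then need the generated id back (for the sessions and requests
tables below), MySQL-style engines expose it directly; other engines
have their own equivalent:

   SELECT LAST_INSERT_ID();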
In general, a typical SQL table layout uses auto-increment ID fields
as the primary or secondary key of each table; that allows you to avoid
duplicating data. So you can have a SESSIONS table for currently
logged-in users:
   CREATE TABLE sessions (
       id int auto_increment,  <<--- view it as your transaction session id
       cid int,
       StartTime DateTime,
       EndTime DateTime,
       ..
       ..
   )
where the link is Customers.id = Sessions.cid.
WARNING:

One thing to remember is that DBAs (database admins) value their work
and are highly paid. Do not argue or dispute with them as you
normally do; most certainly they will not have the patience shown here
to you. SQL setup is a HIGHLY complex subject, but it can be easy if
you keep it simple. Don't get LOST in optimization until the need
arises; a common-sense table design should be a no-brainer up front.
Also, while there is a standard "SQL language," there are differences
between SQL engines; the above CREATE statements, for example, will
generally be slightly different for different SQL engines. So I advise
you to use common SQL data types and avoid special definitions unless
you have made the final decision to stick with one vendor's SQL engine.
Yours is a standard design; at a minimum, all you will need for tables
are:

   customers    customer table
                auto-increment primary key: cid

   products     customer products, limits, etc., table
                auto-increment primary key: pid
                secondary key: cid

                This would be a one-to-many table:

                    customers.cid <---o products.cid

                    select * from customers, products
                    where customers.cid = products.cid

                You can use a JOIN here too, which a DBA will
                tell you to do (see the JOIN example below), but
                the above is the BASIC concept.

   sessions     session management table
                can serve as a session history log as well
                auto-increment primary key: sid
                secondary key: cid

   requests     your "FIFO"
                auto-increment primary key: rid
                secondary key: cid
                secondary key: sid
Some DBAs might suggest combining tables, using or not using indices
or secondary keys, etc. There is no one right answer, and when it comes
to optimization it depends highly on the SQL engine. So DON'T get lost
in it; you can ALWAYS create indices later if need be.
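For reference, the explicit JOIN form of the customers/products query
above would be:

   SELECT *
   FROM customers
   JOIN products ON products.cid = customers.cid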
> I have been envisioning the primary means of IPC, as a
> single binary file with fixed length records. I have also
> envisioned how to easily split this binary file so that it
> does not grow too large. For example automatically split it
> every day, and archive the older portion.
Well, to do that you have no choice but to implement your own file-sharing
class as shown above. The concept is basically a log rotator.
You can now update the CRequestHandlerAbstract class with one more
method requirement:
   class CRequestHandlerAbstract {
   public:
       struct TYourData {
           ..fields...
       };
       virtual bool Append(const TYourData &yd) = 0;
       virtual bool GetNext(TYourData &yd) = 0;
       virtual bool SetFileName(const char *sz) { sfn = sz; return true; }
       virtual bool RotateLog() = 0;  // << NEW REQUIREMENT
   protected:
       virtual bool OpenFile() = 0;
       virtual bool CloseFile() = 0;
       string sfn;
   };
But you can also achieve rotation by using a special file-naming
nomenclature; this is called log periods. It could be based on
today's date:

   "request-{yyyymmdd}.log"

That will guarantee a daily log, or you can do it for other periods:

   "request-{yyyy-mm}.log"      monthly
   "request-{yyyy-ww}.log"      week number
   "request-{yyyymmddhh}.log"   hourly

and so on; you can also couple it with a maximum size.
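A minimal sketch of generating such a period name with the standard C
time functions (daily period shown; the function name is just an
example):

   #include <ctime>
   #include <cstdio>

   // Build today's log file name, e.g. "request-{yyyymmdd}.log"
   // with the current date filled in.
   void MakeDailyLogName(char *buf, size_t len)
   {
       time_t now = time(NULL);
       struct tm *tmv = localtime(&now);
       strftime(buf, len, "request-%Y%m%d.log", tmv);
   }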
This can be handled by adding LogPeriod, FileNameFormat, and MaxLogSize
variables which OpenFile() can use:
   class CRequestHandlerAbstract {
   public:
       struct TYourData {
           ..fields...
       };
       virtual bool Append(const TYourData &yd) = 0;
       virtual bool GetNext(TYourData &yd) = 0;
       virtual bool SetFileName(const char *sz) { sfn = sz; return true; }
       virtual bool RotateLog() = 0;  // << NEW REQUIREMENT
   protected:
       virtual bool OpenFile() = 0;
       virtual bool CloseFile() = 0;
       string sfn;
   public:
       int LogPeriod;           // none, hourly, daily, weekly, monthly...
       int MaxLogSize;
       string FileNameFormat;
   };
and by using a template idea for the file name, you can do the string
replacements very easily (pseudocode; Has(), Int2Str(),
GetFileSizeByName(), and the rename helper are placeholders):

   GetSystemTime(&st);
   CString logfn = FileNameFormat;
   if (logfn.Has("yyyy")) logfn.Replace("yyyy", Int2Str(st.wYear));
   if (logfn.Has("mm"))   logfn.Replace("mm",   Int2Str(st.wMonth));
   ... etc ...
   if (MaxLogSize > 0) {
       DWORD fs = GetFileSizeByName(logfn, NULL);
       if (fs != -1 && fs >= MaxLogSize) {
           // Rename the file with a unique serial number appended:
           //    "request-yyyymm-1.log"
           //    "request-yyyymm-2.log"
           //    etc., finding the highest #.
           RenameFileWithASerialNumberAppended(logfn);
       }
   }

etc.
--
HLS