Proposed roadmap for SIMPL 3.0

The SIMPL surrogate code is getting a little "long in the tooth" and is in need of some rework and simplification.

Now might be a good time to set our aim at stuff we'd like to hit for SIMPL 3.0.

I propose that the basic structure of a central protocol router and surrogate parents which fork children is sound and should be kept. In addition, the general messaging surrounding a remote name_locate() and around the TCP/IP carrier is also sound and should be kept, i.e. we keep the API into SIMPL proper as is.

I am proposing to introduce a more formal state machine logic to the surrogate_r and surrogate_s processes.

In the process we can hopefully center those state machines around single select() statements instead of the multiple ones which exist now.

In addition, maybe we can get rid of the concept of the ACK and merge its functionality with a simplified keep alive message that is multiplexed onto the same socket that carries the SIMPL message and reply.

On the surrogate_r side:

Possible states are:

  1. STARTING
  2. WAITING_NAME_LOCATE_REPLY
  3. WAITING_FOR_MESSAGE
  4. WAITING_FOR_REPLY

When surrogate_r first comes up it initializes into the STARTING state. It then drops into a straight Receive() (because sockets are not operational yet, so there is nothing to multiplex on).

The only permissible messages to Receive() in the STARTING state would be a SUR_NAME_LOCATE or a proxy to close down.

The receipt of a SUR_NAME_LOCATE drops the surrogate into the WAITING_NAME_LOCATE_REPLY state. At this point a socket is opened to the remote side and the SUR_NAME_LOCATE message is sent down the pipe, followed immediately by a SUR_ALIVE message. Surrogate_r now drops into its main select() loop, which is timed.

There are 3 possible events that can happen in the select():

  1. a dingle comes on the SIMPL fifo
  2. a dingle comes on the TCP/IP socket
  3. a timeout occurs
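
As a rough illustration of the single timed select() described above, here is a minimal C sketch. The function name wait_for_event() and the EV_* codes are made up for this example and are not part of SIMPL; it also assumes the fifo and the socket are both plain readable file descriptors.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical event codes; the names are illustrative, not part of SIMPL. */
typedef enum { EV_FIFO, EV_SOCKET, EV_TIMEOUT, EV_ERROR } sur_event_t;

/*
 * Block in one timed select() on the SIMPL fifo fd and the TCP/IP socket fd
 * and report which of the three events woke us up.
 * If both fds are readable the fifo is reported first; the caller is
 * expected to loop back into select() and pick up the socket next pass.
 * Any select() failure (including EINTR) is reported as EV_ERROR.
 */
sur_event_t wait_for_event(int fifo_fd, int sock_fd, long timeout_sec)
{
    fd_set rfds;
    struct timeval tv;
    int maxfd = (fifo_fd > sock_fd) ? fifo_fd : sock_fd;
    int rc;

    assert(fifo_fd >= 0 && sock_fd >= 0);

    FD_ZERO(&rfds);
    FD_SET(fifo_fd, &rfds);
    FD_SET(sock_fd, &rfds);
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;

    rc = select(maxfd + 1, &rfds, NULL, NULL, &tv);
    if (rc < 0)
        return EV_ERROR;
    if (rc == 0)
        return EV_TIMEOUT;
    if (FD_ISSET(fifo_fd, &rfds))
        return EV_FIFO;
    return EV_SOCKET;
}
```

The timeval is rebuilt on every call because some platforms modify it inside select(). Whether the fifo or the socket should win when both are ready is a design choice this sketch makes arbitrarily.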

Here's how I see the state machine happening.


State: WAITING_NAME_LOCATE_REPLY

Event: dingle from SIMPL fifo
Action: error, because a reply blocked sender cannot Send() anything

Event:  dingle from socket
Action: if it is a SUR_NAME_LOCATE then Reply() back to sender to unplug the name_locate() there and drop into the WAITING_FOR_MESSAGE state.
If it is a SUR_ALIVE then toggle the aliveflag and remain in current state.

NOTE: need to check the socket with a non-blocking read to see if a second message is waiting on this socket. If there is one we need to handle it as well.

Event:  timeout
Action: if aliveflag has not been toggled then initiate shutdown. If aliveflag has been toggled then issue SUR_ALIVE on socket and reset aliveflag.
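
The NOTE above about looking for a second message on the socket could be handled with a peek rather than a real read. A minimal sketch follows, assuming a stream socket and a platform that supports the MSG_DONTWAIT flag (Linux and the BSDs do). The helper name socket_has_pending() is made up for this example.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Return 1 if at least one more byte is already queued on the socket,
 * 0 if nothing is pending right now (or the peer has closed),
 * -1 on a real error.
 * MSG_PEEK leaves the data in place so the normal read path still sees it;
 * MSG_DONTWAIT makes this one call non-blocking without touching fd flags.
 */
int socket_has_pending(int sock_fd)
{
    char probe;
    ssize_t n;

    assert(sock_fd >= 0);

    n = recv(sock_fd, &probe, 1, MSG_PEEK | MSG_DONTWAIT);
    if (n > 0)
        return 1;
    if (n == 0)
        return 0;   /* orderly shutdown by the peer, nothing left to read */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return 0;   /* nothing queued right now */
    return -1;
}
```

After handling one token the surrogate could call this helper in a loop and keep processing tokens until it returns 0, then go back to select(). A real version would probably want to tell "peer closed" apart from "nothing pending".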


State: WAITING_FOR_MESSAGE

Event: dingle on SIMPL
Action: if it is a message then forward this message down the socket and change state to WAITING_FOR_REPLY. If it is a proxy then initiate the closedown sequence.

Event:  dingle on socket
Action: if SUR_ALIVE then toggle the aliveflag. If any other token then do error handling, because nothing other than SUR_ALIVE should be coming down this socket in this state.

Event:  timeout
Action: do aliveflag logic as above


State: WAITING_FOR_REPLY

Event: dingle on SIMPL
Action: if a message then do error logic. If it is a proxy then initiate close down.

Event:  dingle on socket
Action: if SUR_REPLY then Reply() this to sender and switch to WAITING_FOR_MESSAGE state. If SUR_ALIVE then toggle aliveflag.

Event:  timeout
Action: do aliveflag logic as above


The surrogate_s side:

Once the socket is accepted and the child created, you drop into a select() on that socket and the surrogate_s's reply fifo fd.

Once again the states are:

The events are:
  1. dingle on socket
  2. dingle on reply fifo
  3. timeout

The state machine becomes:


State: initial (before the SUR_NAME_LOCATE has arrived)

Event: dingle on socket
Action: if SUR_NAME_LOCATE then issue the local name_locate(), pump a response back down the socket and drop into WAITING_FOR_MESSAGE state. If SUR_ALIVE then respond with a SUR_ALIVE on socket and toggle aliveflag.

Event:   dingle on reply fifo
Action:   error

Event:  timeout
Action: if aliveflag is set then reset it.

NOTE: the first SUR_ALIVE message will only come after the other side has timed out once, so you don't want to die here if the aliveflag hasn't been set yet.


State: WAITING_FOR_MESSAGE

Event: dingle on socket
Action: if SUR_SEND then set the state to WAITING_FOR_REPLY and initiate upper half of a Send(). If SUR_ALIVE then increment aliveflag and respond with a SUR_ALIVE on socket.

Event:  dingle on reply fifo
Action:  error

Event:   timeout
Action: if aliveflag is set then reset, else if aliveflag is > 2 then initiate shutdown sequence


State: WAITING_FOR_REPLY

Event: dingle on socket
Action: if SUR_ALIVE then increment aliveflag and respond with SUR_ALIVE on socket. Anything else is an error.

Event:   dingle on reply fifo
Action: do the bottom half of a Send() to grab the reply, pump it down the socket as a SUR_REPLY and drop back to the WAITING_FOR_MESSAGE state.

Event: timeout
Action: if aliveflag is set then reset, else if aliveflag > 2 initiate shutdown sequence

On the administrative side of the process, I'm proposing to create a temporary side branch in our tree and CVS as:

This subdirectory would not be integrated into the top level build, so that as we continue with SIMPL 2.x bug fix releases the existing surrogate code preempts the new stuff. If someone wants to "flip" over to the new stuff it should be a simple matter of going into this tcp_x subdirectory and manually running:
make clobber
make install
This will overwrite the surrogateTcp executable with the experimental version.

Once we arrive at stable code in tcp_x this will be merged back into a single tcp subdirectory for the 3.0 release.

Any thoughts?

As always, no timetable is proposed because much will depend on the contributions from the developers in the SIMPL community like yourselves. We all own the code in this project.

It will advance as fast as we want to make it happen and no faster.