Talk:Link (directive)

Fault tolerance
(discussion moved from Talk:JSON link --Dchassin 15:47, 11 April 2013 (UTC))

--Dchassin 15:12, 5 April 2013 (UTC)

It may be desirable to have a fault-tolerant link that doesn't stop the simulation when a non-critical error occurs on the connection. Normally, errors cause the simulation to terminate (by returning TS_INVALID as the sync time). If the link instead returned TS_NEVER, the simulation would continue without the sync response. This is normally undesirable because it can leave the two ends of the link in inconsistent states, but in certain situations it may be preferred.
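A minimal sketch of the difference between the two return values, written in Python for illustration only. The constant values, the link_sync function, and the exchange callable are all hypothetical stand-ins and do not reflect the actual GridLAB-D code.

```python
# Stand-in constants: the names come from the discussion above, but the
# numeric values here are placeholders, not GridLAB-D's actual definitions.
TS_INVALID = -1
TS_NEVER = float("inf")


def link_sync(exchange, fault_tolerant):
    """Return the next sync time requested by the link.

    `exchange` is a hypothetical callable that performs the remote exchange
    and returns the next sync time, raising ConnectionError on failure.
    """
    try:
        return exchange()
    except ConnectionError:
        # A fault-tolerant link reports "no pending event" so the simulation
        # continues; otherwise the invalid time makes the simulation stop.
        return TS_NEVER if fault_tolerant else TS_INVALID


def failing_exchange():
    """Hypothetical exchange that always fails, used to exercise both paths."""
    raise ConnectionError("remote host did not reply")
```

Calling link_sync(failing_exchange, False) gives TS_INVALID, while link_sync(failing_exchange, True) gives TS_NEVER.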

Recommend we add a link parameter called "on_error" that can have several values:


 * QUIT : means that no retry is performed and the simulation stops. This is the current behavior and should be the default.
 * DROP : means that no retry is attempted and the connection is closed permanently.
 * delay : means that the connection is retried every delay seconds, indefinitely.
 * delay,max,action : means that the connection is retried every delay seconds, up to max times. The action taken after the link gives up can be QUIT or DROP.
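The four proposed forms could be parsed roughly as follows. This is only a sketch of the proposal: the OnError class, its field names, and parse_on_error are hypothetical, and it assumes the value arrives as a single comma-separated string.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OnError:
    action: Optional[str]       # "QUIT" or "DROP"; None when retrying forever
    delay: Optional[float]      # seconds between retries; None means no retry
    max_retries: Optional[int]  # None means no limit on retries


def parse_on_error(text: str = "QUIT") -> OnError:
    """Parse the proposed on_error value; QUIT is the default."""
    parts = [p.strip() for p in text.split(",")]
    if len(parts) == 1:
        word = parts[0].upper()
        if word in ("QUIT", "DROP"):
            return OnError(word, None, None)
        # A bare number: retry every `delay` seconds, indefinitely.
        return OnError(None, float(parts[0]), None)
    if len(parts) == 3:
        action = parts[2].upper()
        if action not in ("QUIT", "DROP"):
            raise ValueError(f"unknown action: {parts[2]!r}")
        # delay,max,action: bounded retry, then the given action.
        return OnError(action, float(parts[0]), int(parts[1]))
    raise ValueError(f"cannot parse on_error value: {text!r}")
```

For example, parse_on_error("5,3,DROP") describes a link that retries every 5 seconds up to 3 times and then closes the connection.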

--Dchassin 19:40, 9 April 2013 (UTC)

This has been partly done by adding on_error to the link directive. At this point three values are supported: ABORT, RETRY, and IGNORE.

--Bpalmintier 19:54, 9 April 2013 (UTC) ''Note: this comment was being drafted before Dchassin's latest addition. It could be that IGNORE addresses some of these suggestions, though it is still important to specify precisely what values are used instead of the missing reply, and potentially to provide the retry support.''

This is a great idea. In addition to the listed "on_error" options, it would be nice to have an option where the GridLAB-D model continues to run using either the last returned values or (if appropriate) the corresponding local simulation values. This could add something like the following:


 * CONTINUE: means to use the last reported values returned by the remote host
 * SIM: means to use simulated values instead of the missing data from the remote host

These could also be "actions" for the retry-based "on_error" options. As a result, perhaps the "delay" and "max" options could simply be used as modifiers for the core actions, as in [ QUIT | DROP | CONTINUE | SIM ](,(,))
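To make the two suggested actions concrete, here is a small sketch of how the replacement value might be chosen when the remote reply is missing. The function name and its arguments are hypothetical, and nothing here reflects existing GridLAB-D code.

```python
def value_on_error(action, last_remote, local_sim):
    """Pick the value to use in place of a missing remote reply.

    `last_remote` is the last value the remote host reported and `local_sim`
    is the value computed by the local simulation (both hypothetical inputs).
    """
    if action == "CONTINUE":
        return last_remote   # keep using the last reported remote value
    if action == "SIM":
        return local_sim     # fall back to the locally simulated value
    raise ValueError(f"no fallback value defined for action {action!r}")
```

QUIT and DROP are deliberately rejected here because they stop or disconnect the link rather than substitute a value.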

In all cases, GridLAB-D could first retry for the max attempts before taking the specified action. For SIM and CONTINUE, it seems important to define what happens if delay_sec × max_attempt is greater than the model time step. Perhaps the model should pause until the max number of retries times out and then advance. Then it is up to the user to specify a combination of timeout, delay_sec, and max_attempt that ensures a fast enough model timestep for time-critical remote simulations.
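The timing concern can be checked with simple arithmetic. The sketch below assumes that each attempt may block for up to timeout seconds and is followed by a delay_sec pause; that accounting is an assumption made for illustration, and both function names are hypothetical.

```python
def worst_case_wait(delay_sec, max_attempt, timeout=0.0):
    """Upper bound, in seconds, on the time spent retrying before giving up.

    Assumes every attempt blocks for the full `timeout` and is then followed
    by a `delay_sec` pause (an assumption, not stated in the discussion).
    """
    return max_attempt * (timeout + delay_sec)


def fits_in_timestep(delay_sec, max_attempt, timestep_sec, timeout=0.0):
    """True if the worst-case retry time does not exceed the model time step."""
    return worst_case_wait(delay_sec, max_attempt, timeout) <= timestep_sec
```

With delay_sec = 2 and max_attempt = 5, the retries take at most 10 seconds and fit in a 60-second time step. Adding a 15-second timeout per attempt raises the bound to 85 seconds, which no longer fits.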

--Dchassin 20:20, 9 April 2013 (UTC)

I agree this should definitely be on the list, but maybe for the long term.

--Dchassin 23:07, 9 April 2013 (UTC)

One note regarding the provision of fallback values: this would have to be done across the entire schema. It's simply way too complex to do. At this point, the existing value remains untouched when IGNORE is used and an error occurs.
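The IGNORE behavior described here can be sketched as follows. The apply_reply function and the dictionary of properties are hypothetical illustrations, not the actual implementation, and RETRY is left out to keep the example short.

```python
def apply_reply(properties, reply, on_error="ABORT"):
    """Update `properties` in place from `reply`; `reply` is None on error.

    Returns True if the simulation may continue, False if it should stop.
    """
    if reply is not None:
        properties.update(reply)
        return True
    if on_error == "IGNORE":
        return True          # existing values are left exactly as they were
    if on_error == "ABORT":
        return False         # caller is expected to stop the simulation
    raise ValueError(f"on_error value not handled in this sketch: {on_error!r}")
```

After a failed exchange with IGNORE, a property such as {"power": 1.5} keeps the value 1.5 and the simulation carries on.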