
Documentation on the workings of PhysCalc
-----------------------------------------

Physcalc is divided into the following modules:

	PhysMain - Includes the main() function, the command
				parser, and the help routines.

	PhysNode - Routines for allocating, deallocating, and
				manipulating nodes.

	PhysConv - Routines for manipulating dimensioned numbers
				and doing unit conversions.

	PhysSolv - Routines for parsing and evaluating
				expressions.

	PhysOper - Functions which do the actual math;
				PhysCalc's built in operators and functions.

	PhysMLib - Standard routines lifted from some of my
				other programs, put here for completeness.


The Origins and Evolution of PhysCalc

	Initially, PhysCalc was two unrelated programs: a
program to do unit conversion, and a program to evaluate
simple arithmetical expressions using the standard rules of
precedence.
	The program to do unit conversion, called Convert, was a
translation and enhancement of a BASIC program from Byte
magazine (Vol 10/No 3 [March 85] by David Kahn).  It was
fixed at six fundamental dimensions, and came with many of
the units found in PhysCalc.  Before being integrated into
PhysCalc, I added the feature of having it read in the unit
data from a file, instead of just having it be static data
inside the program.
	The program to evaluate expressions (SOLVE.C) was my own
creation, as part of my effort to understand the nature of
compilers. All it did was take expressions involving
parenthesis, addition, subtraction, multiplication, and
division, and computed the result using standard algebraic
precedence of operations.  After some more tinkering, I
created a mechanism whereby instead of recursively calling a
new function for each operator, function, or level of
precedence, I simply had one routine which used a table of
information and a stack to do most of the recursive-descent
parsing.

	Somewhere along the way, I had the notion to combine the
two programs together to compute arithmetic using
dimensioned quantities.  The SOLVE program already had a
generic NUMBER type which could be redefined to be a
dimensioned number.  I soon added user-defined variables and
functions, plus a help feature.
	The program initially fell naturally into three main
components: main, solve, and convert.  I divided the program
up into modules mostly because they were too much to
conveniently handle in one large text file.  The Main module
included the main() function, and the functions for the
commands LET, DEFINE and HELP, and some miscellaneous
functions.	The Solve module contained the routines to parse
and evaluate expressions - there was no distinction made
between parsing and evaluating.
	The next step was to make the distinction between
parsing and evaluating, which required an intermediate tree
structure to hold the information.	I put routines to do the
support work for this into another file, Nodes.  After the
routines to parse and evaluate (in file Solve) became too
big and numerous, I decided to split off the built-in
operator functions and function functions, and the
associated parsing data, into another file, Oper.  When I
decided to make this an entirely stand-alone program that
could be packed up, sent off, and compiled elsewhere, I put
the few functions I was using from my private library into
another separate small module, MLib.
	At this point, the program was pretty powerfull and
flexible, but the source code was slowly turning into mush
as I added features.  I had one large header file
(PhysCalc.h) which contained all the declarations, includes,
and structures common to all the modules, the result of
which 90% of my functions were being declared for use by all
modules, and the program was still essentially one large
heavily-interconnected set of functions with no true
modularity to my grouping of functions.
	So I concentrated on trying to really modularize the
program.  I explicitly declared functions to be LOCAL,
IMPORT, or EXPORT, and did some rearranging to make as many
functions LOCAL as possible.  I tried to narrow down the
scope of as many things as reasonably possible.  I divided
up the header file in two - one for structure declarations,
includes, and constants, and the other for function and
variable declarations.


Internal program structure

	I have tried to separate the source code into several
semi-separate chunks for different aspects of the program:
main program and command handling, parsing and evaluating,
built in functions and operators, definitions and
declarations, unit conversion and handling, node handling.
If major modifications are needed in the future, I will
reorganize it along object-oriented lines.
	Expressions are parsed using a flexible recursive
descent parser.  The program does not use a parse table, but
has prefix/infix/postfix and priority information provided
for each operator, and so does not need separate routines to
parse each operator or level of precedence. Function parse()
in PHYSSOLV.C contains the guts of this mechanism.
	Expressions are parsed into a tree.  Each node is either
a kind of number (integer, real, fractional, dimensional), a
string identifying a variable, or a list of sub-nodes.
Depending on the node type and where the node is being used,
the sub-nodes represent operands for a built in operator, a
function name and parameters, or the information associated
with a user defined variable or function.
	Upon evaluation, the tree is traveled and all variables
are replaced with their associated values and functions are
expanded and evaluated. If the program encounters an
unknown, the expression tree at that point is left
unevaluated.


Documentation for the structures declared in PHYSCALC.H
-------------------------------------------------------

typedef struct d_num {		/* Structure for holding a dimensional number */
	double num;			/* scalar value */
	int di[ MAXDIM ];	/* dimensions */
} DNUM;

	This is the structure for a dimensioned number, containing the scalar
value, "num", and the dimensions "di".  "di" is an array of integers, one for
each possible fundamental dimension.  The integer corresponding to a dimension
indicates the power of that dimension (squared, cubed, ect.).  A power of zero
indicates that the dimensioned number does not have that dimension.  "DNUM" is
a shorthand reference to structure d_num.


struct fractstruct {		/* Fraction structure */
	int numer,denom;
};

	Struct fractstruct represents a fraction.  "numer" and "denom" contain the
values of the numerator and denominator, respectively, of the fraction.

	The above two structures are analagous to the regular C numeric data types
int, double, and so on.  At the time I created them, I did not know much about
object oriented programming and I did not worry about how they were used and
accessed.  If I wrote them again, I would limit access to the members through
a few functions and macros to increase modularity and encapsulation.


typedef struct nodestruct {	/* Holds multiple var types */
	int type;
	union {
		char str[1];				/* String */
		double dbl;					/* double number */
		long lng;					/* long integer */
		struct fractstruct frt;		/* Fraction */
		struct d_num dmn;
		struct nodestruct *list[1];	/* List of ptrs to nodes */
	} *data;	/* Pointer to the above union */
} NODE,*NODEP;	/* sizeof(NODE) should be 4 bytes */

	This structure is the foundation of PhysCalc, allowing functions to pass
around and work on data for which the type cannot be known at compile time.
"NODE" and "NODEP" are shorthand references to this structure.  The compile
time sizeof() this node is correct, but for "*data", it is misleading, since
at run time only as much room as is needed by a particular node is allocated,
and the string and list types are of variable size.  In a way, this is two
structures, a node, and the data it points to.	Since there should be a one to
one correspondence between a node and it's data, it is conceptually one big
structure.	It could probably (and perhaps should) have been defined as one
piece, without the extra pointer to the data.  It ended up this way because it
enable changing the node's type without risking changing the pointer to it.

	Nodes are the building blocks of expressions.  Nodes can point to other
nodes, or contain data.  A node represents an arbitrarily complex expression.
Nodes for functions or operators contain pointers to other nodes which contain
the function name, and the expressions to be passed as parameters.

	"type" contains the type of the node - which way union "data" is to be
interpreted. Type can be set to SNODE, INODE, FNODE, DNODE, NNODE, or LNODE+i,
indicating "str", "lng", "frt", "dbl", "dmn", and "list" respectively.
"*data" then points to the actual contents of the node.
	Types INODE and DNODE indicate usage of components "lng" and "dbl"
respectively, and these are just plain numbers (long or double).
	Types FNODE and NNODE are for "frt" and "dmn", fractions or dimensional
numbers.  These are structures themselves instead of basic types.
	Type SNODE is for string nodes.  The data pointer points to a NULL
terminated array of characters.  A string can be any size, but requires at
least one character (the NULL), so "str" is declared to be one character.  I
would have declared component "str" without a size, but then sizeof() won't
work.  A string node is usually interpreted as referring to a variable, named
in the string.

	Type LNODE indicates that the node is a function.  The contents is a
NULL-terminated list of pointers to nodes.	The first pointer should be to a
string node containing the name of the function, and the rest of the pointers
are to expressions for each parameter.
	Types greater than LNODE are reserved for built-in operators.  Operator
number n has node type (LNODE + n).  The list of pointers should contain one
or two expressions (depending on the kind of operator).

	Module PhysNode.C contains the code for creating, destroying, copying, and
manipulating nodes.

	For example, for node n, the expression (3 + f(x)) would be represented:

ͻ	  ͻ
 Type = (LNODE + 1)       type = INODE	
 data->list[0] =    data->lng = 3	
 data->list[1] = Ŀ	  ͼ
 data->list[2] = NULL  	  ͻ
ͼ   type = SNODE	
ͻ     data->str = "f" 
 type = LNODE		      ͼ
 data->list[0] =   ͻ
 data->list[1] =  Type = SNODE	
 data->list[2] = NULL 	   data->str = "x" 
ͼ	  ͼ


extern struct dimstruct {       /* Holds a dimension definition */
	struct dimstruct *next;		/* ptr to next in list */
	char *name;					/* dimension name */
	int dimension[ MAXDIM ];	/* dimensionality */
};

	This is the structure for the storage of dimension information.  The
information is stored in a linked list, each element containing the name of
the dimension an it's associated dimensions.  "next" is the pointer to the
next dimension in the list.  "name" points to the name of the dimension.
"dimension" is an array of integers representing the dimensions, like in
structure d_num.


extern struct varstruct {   /* Holds a user-variable */
	struct varstruct *next;
	char *name;
	NODEP value;
	int rflag;		/* flag to prevent infinite recursion */
};

	Variables are contained in a linked list.  "next" points to the next
variable in the list.  "name" points to the name of the variaable.  "value"
points to a node containing an expression associated with the variable.
"rflag" is a flag to prevent recursion when evaluating variables.  If "rflag"
is TRUE, the variable is left unevaluated because the variable is already
being evaluated.


extern struct ufstruct {    /* Holds a user-function */
	struct ufstruct *next;
	char *name;
	NODEP dummy;	/* List-node of dummy strings */
	NODEP value;
};

	User defined functions are also held in a linked list, where "next" points
to the next function definition in the list.  "name" points to the name of the
function.  "dummy" points to a special node (type LNODE) which is not a
function, but a list of pointers to string nodes.  These string nodes are the
parameters to the function.  "value" points to the function expression.  Any
variables in that expression matching one of the strings in "dummy" are
replaced with the corresponding expression passed to the function upon
evaluation.



NOTE: PhysCalc was originally written to convert everything
to uppercase and then work with it.  It has been retrofitted
to not convert filenames. Hopefully, the retrofit did not
cause other case-sensitivity problems.

