Carthage README file

What is it?

  Carthage is a yacc/lex-based parser for SGML DTDs which can
delete references to undeclared elements.  It can also do a few
other things, depending on the run-time flags you give it.

  Carthage is unsupported software; it may be used freely without
further permission or royalty, but users who improve it or fix
errors are requested to notify the author so he can also fix them.


How do I use it?

  To clean a DTD in the current directory, just invoke the command
'carthage' with two arguments:  the file name of the input DTD and
the file name for the output DTD.  For example:

  carthage dirty.dtd clean.dtd

  The DTD parser itself is called 'carthago' (that's Latin for 
'Carthage') and can be invoked on its own if you like.  To get a 
list of the options it understands, try invoking it with the
option '-?'.  When I did this just now, this was the result:

  $ carthago -?
  carthago:  expected usage is
    carthago [options] < dtd-file > output-file
    where options are these (* means default):
    --msglevel 5*|n (or --trace --debug --verbose)
    --msdrop* --mskeep (drop/keep marked sections, -s0 -s1)
    --dupentquiet* --dupentwarn (warn if entity declared twice? -e0 -e1)
    --pedrop* --pekeep (drop/keep parameter entity declarations -p0 -p1)
    --commdrop* --commkeep (drop/keep parameter entity declarations -c0 -c1)
    --undeclared* --unused --declared --used (include in report file, any or all)
    --delete <gi> <gi> ... (list of GIs to delete from content models)
    --delenda <filename> (name of delete-list file)
    --output <filename> (name of report file, may reuse as delenda file)


What this means is that carthago reads from stdin and writes to
stdout, and takes the following options:

  --msglevel <integer>
  --trace               (same as --msglevel 0)
  --debug               (same as --msglevel 1)
  --verbose             (same as --msglevel 3)

    These control the emission of error and warning messages.  Trace, 
debugging, and 'verbose' messages are enabled when the message level
is set to 0, 1, or 3 respectively.  The default (--msglevel 5) enables
informative messages; to allow warnings only, set --msglevel 7.  For
error messages only, use --msglevel 10.  Values higher than 10 have no
effect.

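
    The levels above can be collected in a small shell helper, if you
prefer names to numbers.  This is only a sketch:  msglevel_for is a
hypothetical function, not part of carthago, and the labels 'info',
'warnings' and 'errors' are names made up here for the levels 5, 7
and 10 described above.

```shell
# msglevel_for is a hypothetical helper, not part of carthago.  The labels
# info, warnings and errors are illustrative names for levels 5, 7 and 10.
msglevel_for() {
    case "$1" in
        trace)    echo 0 ;;
        debug)    echo 1 ;;
        verbose)  echo 3 ;;
        info)     echo 5 ;;    # the default level
        warnings) echo 7 ;;
        errors)   echo 10 ;;
        *)        echo "unknown message class: $1" >&2; return 1 ;;
    esac
}

# Example (needs carthago on your path): warnings and errors only.
# carthago --msglevel "$(msglevel_for warnings)" < dirty.dtd > clean.dtd
```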

  --msdrop*
  --mskeep

    Control whether marked sections marked IGNORE are suppressed
silently in the output or kept.  By default they are suppressed.

  --dupentquiet* 
  --dupentwarn 

    Control whether a warning is issued if an entity is declared twice.
The default is to say nothing, because the TEI uses multiple
declarations frequently, so double declarations are seldom errors
in my own work.

  --pedrop* 
  --pekeep 

    Control whether to drop or keep parameter entity declarations in
the output.  By default they're dropped, because all references to
such entities are expanded and there seems to be no reason to keep the
declarations around if the references are expanded.

  --commdrop 
  --commkeep*

    Control whether to drop or keep comments in the output.  The
usage message marks --commdrop as the default (and mislabels these
options as 'drop/keep parameter entity declarations'), but I've just
checked the code and by default comments are kept.  Sorry about that.

  --undeclared* 
  --unused 
  --declared 
  --used 

    Control what lists of elements should be included in the report file, 
if any.  By default undeclared elements (i.e. elements which are used
but not declared) are included in the list.  Unused elements (elements
declared but not appearing in content models), declared elements, and
elements referred to in content models (--used) may also be listed.
Comments in the output file should make clear which is which.

  --delete <gi> <gi> ... 
  --delenda <filename>

    Indicate a list of generic identifiers which should be deleted
from content models before element declarations are written out.  The
--delete option allows you to specify generic identifiers on the 
command line; I use this for testing, but it might also be handy for
real work in some cases.  The --delenda option takes the name of a 
file containing the generic identifiers to be deleted.  ('Delenda' is
the Latin word meaning 'things to be deleted'.  Cato the Elder is
traditionally credited with applying the word to Rome's enemy
Carthage -- Carthago delenda est -- which gives this program its name.)

    If the program encounters a reference to a generic identifier
that cannot safely be deleted (e.g. because it's a required 
subelement), an error message is issued and the reference is left
undeleted.  If


  --output <filename> 

    Specify the name of the report file to be generated; the file
thus created may be reused as the delenda file.
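
    Since the report file may be reused as the delenda file, a
two-pass cleanup can be sketched as below.  This is an illustration
under stated assumptions, not necessarily what the carthage script
does:  clean_dtd is a hypothetical function name, the temporary
report file comes from mktemp (assumed to exist on your system), and
carthago is assumed to exit with status zero on success.

```shell
# clean_dtd is a hypothetical helper, not part of the Carthage distribution.
# Usage: clean_dtd input.dtd output.dtd
clean_dtd() {
    src=$1
    dst=$2
    report=$(mktemp) || return 1

    # Pass 1: write the list of undeclared elements to the report file.
    # The DTD that carthago writes to stdout is discarded here.
    # Assumption: carthago exits with status zero on success.
    if ! carthago --undeclared --output "$report" < "$src" > /dev/null; then
        rm -f "$report"
        return 1
    fi

    # Pass 2: reuse the report file as the delenda file.
    carthago --delenda "$report" < "$src" > "$dst"
    status=$?

    rm -f "$report"
    return $status
}
```

With carthago on your path, this would be invoked as

  clean_dtd dirty.dtd clean.dtd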


How do I fetch and compile it?

  1 ftp to ftp-tei.uic.edu and fetch everything from the directories
 
    /pub/tei/grammar/carthage 
    /pub/tei/grammar/carthage/lib

The makefiles assume that these will go into ~/carthage and ~/lib,
respectively, but if you are willing to change the makefiles, you 
can put them where you like.

  2 go to the library directory and type make

N.B. on my AIX system this works fine; on the Sun I have access to,
gcc complains bitterly about line 43 of file msg.c.  The quick hack
is to delete the asterisk from line 43 if you get such a complaint.
The next release will replace this whole routine with a more rational
message function; in the meantime, please let me know if changing it,
or not changing it, leads to crashes.  

  3 go to the carthage directory and type make carthago

  4 (optional) add ~/carthage to your path, or move carthage and 
carthago to your ~/bin directory.

  5 If you want a simple check that carthago compiled correctly, just
invoke it and type some declarations at it.  Here's a transcript of
such an interactive session; I typed the lines labeled u, carthago
typed those labeled c:

u    $ carthago --delete a e i o u --verbose
u    <!-- test of carthago --
u    >
c    <!-- test of carthago -->
u    <!element a - - (#pcData) >
c    
c    <!ELEMENT a - -  (#PCDATA) >
c    
c    ! ! ! <stdin>:2  Element a is marked for deletion, but is declared.! ! !
u    <!element b - o (a?, c, d, e) >
c    ! ! ! <stdin>:3  Model requires hand work:  (c, d, e)! ! !
c    
c    <!ELEMENT b - O  (c, d, e) >
c    
u    <!element c - o (a?, b, c?, d, e*, f+) >
c    
c    <!ELEMENT c - O  (b, c?, d, f+) >
c    
u    <!element d - - ( a | b | c | d | e | f | g | h | i | j)* >
c    
c    <!ELEMENT d - -  (b | c | d | f | g | h | j)* >
c    
    $ 


To test that carthage (the shell script) also works, invoke it on
some existing DTD, e.g. the dirty.dtd included in the carthage
directory.  Check that your output looks like the clean.dtd also
in the directory:

   carthage dirty.dtd new.dtd
   diff clean.dtd new.dtd

N.B. if ftp didn't do it, you will need to set the execute permission
on the carthage script.  If it still doesn't work, make sure it's
pointing to the actual location of the Korn shell on your system.
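
Both checks can be done in one step with something like the
following sketch.  check_script is a hypothetical helper, and it
assumes that the script names its shell on a first line beginning
with '#!'.

```shell
# check_script is a hypothetical helper; pass it the path to the script.
check_script() {
    script=$1

    # Set the execute permission in case ftp did not preserve it.
    chmod +x "$script" || return 1

    # Read the interpreter path from the "#!" line, e.g. /bin/ksh.
    interp=$(sed -n '1s/^#![[:space:]]*\([^[:space:]]*\).*/\1/p' "$script")
    if [ -z "$interp" ]; then
        echo "$script: no #! line found" >&2
        return 1
    fi
    if [ ! -x "$interp" ]; then
        echo "$script: interpreter $interp not found; edit the first line" >&2
        return 1
    fi
    echo "$script: will run under $interp"
}
```

For example:

   check_script ~/carthage/carthage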

Feel free to modify the SGML_PATH and DPP_PATH values in the shell
script; the ones in there now are just those I habitually use; they
should be replaced with the ones you habitually use.

-C. M. Sperberg-McQueen
 University of Illinois at Chicago
 
 17 June 1996