I. Introduction.

This describes how to add a track to the UCSC genome browser.
The two major steps in adding a track are creating a table
containing the track information, and putting a description
of the track in trackDd.  The browser has one mysql database for
each version of each genome that it displays.  Both the track
table and the track description live in this database.  The current
human genome database is hg13, while the current mouse database is
mm2.  


II. MySQL Preliminaries.

Before you get started it is good to look at these databases a little,
and make sure that you have update access to them.  You could do this
directly with the 'mysql' command,  but let's do it instead with the
'hgsql' command, which will keep you from having to type your user name
and password all the time.

Assuming you've got the browser source already installed in ~/kent/src
do the following to create hgsql
	cd ~/kent/src/lib
	make
This make almost always goes smoothly on Linux.  You may need to
remove the '-ggdb' flag in the makefile on other systems, and possibly
set up a MACHTYPE environment variable, and then mkdir $MACHTYPE on
some systems.  Next
	cd ../hg/lib
	make
The main problem that can happen with this make is if the mysql libraries
and include files are not found.   See kent/src/README for details. The
next step is
	cd ../hgsql
	make
	rehash
The hgsql program is just a thin wrapper around mysql.  It looks for the
password and username in the file ~/.hg.conf.   Here's the necessary
parts of .hg.conf:

# db.host is the name of the MySQL host to connect to
db.host=localhost
# db.user is the username is use when connecting to the host
db.user=mysqlUserName
# this is the password to use with the above hostname
db.password=mysqlPassword

This .hg.conf is similar to the cgi-bin/hg.conf file that the
browser uses,  but it need not contain everything that file does.
Also it's advisable to have a read-only user/password in cgi-bin/hg.conf
while you'll want a read-write user/password in ~/.hg.conf.  Setting
this up can involve doing some 'grants' in mysql.  See the documentation
at www.mysql.com for how do set up various users.

Assuming your mysql and .hg.conf are set up, and that you already
have a mirror site going then the command
	hgsql database
(where database is something like hg13 or mm2) should bring you to the
mysql prompt.  Do
	mysql> show tables;
and you should see a large list of tables.  When you've finished adding
a track, the track(s) for your tables will be among them.  Also try
doing:
	mysql> describe trackDb;
This will list the fields of the trackDb table, which has a row for
each track.  Then do
	mysql> select tableName,shortLabel,type from trackDb;
This will show you some of the key fields from this table.  You won't
be updating this table directly, but it can be handy to look at it
sometimes for debugging purposes.   Some other useful mysql commands are
        mysql> select count(*) from sex;
This will count up all the items in the sex table.  You thought there
were only two?  This table reflects the diversity of the sex fields in
genbank.  Try
	mysql> select * from sex;
to see the full diversity.   Well, enough of that non-normalized nightmare.
To get out do:
        mysql> quit


III. Loading the main track table.

The UCSC Genome Browser Database is usually loaded from
a text file of some sort.  The most popular types of
text files are .psl files for blat alignments,  .bed files 
for a wide variety of data, and GTF files for gene predictions.   
See http://genome.ucsc.edu/goldenPath/help/customTrack.html
for further information on these formats.  For now we'll
assume you have a file in one of these formats that you
want to add to the browser.

A) Creating the loader programs
	cd ~/kent/src/hg/makeDb
	cd hgLoadPsl
	make
	cd ../hgLoadBed
	make
	cd ../ldHgGene
	make
	cd ../hgPepPred
   The makeDb directory also contains loaders for a number of more
   specialized tables including hgLoadOut for RepeatMasker data.  There
   is also a .doc file describing in detail how we created each database
   in the files named things like makeHg13.doc and makeMm2.doc.

B) Loading a bed file
   Loading a bed file is the most straightforward.  First decide on
   the name you want to call the table.  Then do
       hgLoadBed database tableName file.bed
   type hgLoadBed with no arguments for further information. 
   Database will be something like hg13 or mm2.

C) Loading a psl file
   Loading a psl file is also easy.  Make sure that the psl file
   is sorted by chromosome (tName) and start position (tStart).  Use
   kent/src/hg/pslSort or just plain Unix sort for this if necessary.
   If the number of alignments is somewhat modest (say less than 
   500,000) then do
   	hgLoadPsl database -table=tableName file.psl -tNameIx
   This will load everything into one big table.  For huge numbers of
   alignments the browser will be faster if you first split up the
   data into one file for each chromosome.  Name these files 
   chr1_tableName.psl  chr2_tableName.psl and so forth.  Then do
        hgLoadPsl database chr*_tableName.psl
   This will end up making a separate table for each chromosome.
   Unfortunately it is still a bit complicated to make the details
   pages for a psl format track to include the alignments themselves.
   Please contact us at UCSC if this is a priority for you and we
   will try to make it easier.

D) Loading a GTF (or GFF) file
   Generally GTF is a much more tightly defined standard than GFF, so
   GTF files are more likely to work without tweaking.  However most
   reasonable GFFs will work as well.  To load do
        ldHgGene database tableName file(s).gtf
   This will make a gene-prediction type table. You often will want
   to create an associated predicted peptide table as well. To do this
   do
        hgPepPred database generic tableNamePep file(s).fa
   The first word after the '>' in the fasta files should use the same
   symbol as the 'group' in GFF files or 'transcript_id' in GTF files.

IV. Updating trackDb

Your data will not display in the browser until you load it
into trackDb. To do this first
	cd ~/kent/src/hg/makeDb/trackDb
and look at the file trackDb.ra,  and read the README file.
Then decide whether your new track should be global, organism
specific, or assembly specific, and edit the corresponding 
trackDb.ra file.  Generally it's good to find an existing
track as similar as possible to the track you want to add,
copy and paste it, and modify the copy.  Then put any explanitory
text you want on the track in trackName.html in the appropriate
directory.  After this do a
	make alpha
to update the trackDb table,  or a
        make
to update trackDb_user (where user is you Unix username).
If multiple engineers are working on the project you can set
up cgi-bin-user directories with hg.conf files that will
tell the browser to use trackDb_user instead of trackDb
to avoid conflicts with other engineers code.   

After the 'make alpha' the browser should show your track.
Congratulations if you've made it this far.  See also
http://www.soe.ucsc.edu/~sugnet/doc/trackHowto/browserTalk.pdf
for Charles Sugnet's description of how to add a track including
some code customization.

-Jim Kent   Feb 14, 2003.

      
	
