This directory is set up to run TGCI1 on the gene-id data.  The output
file is gene-id.data.new, which contains the gene-id data described
using 12 new features.  We found that the old features were not used
when pooled with the new features; therefore, only the new features are
kept.  This new data can then be used by any standard learning algorithm.
We used C4.5 in our experiments.

Thanks to Jude Shavlik's group for providing the original data.
In the original data, each feature represented a nucleotide.   Here
we start with each feature representing a codon (three nucleotides).
See Mark Craven and Jude Shavlik's paper "Investigating the value of 
a good input representation" in Computational Learning Theory and 
Natural Learning Systems for more information.
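
As an illustration of the codon representation, the following is a minimal
Python sketch that groups a nucleotide string into non-overlapping triplets.
It is not part of tgci1; the function name to_codons and the choice to drop
a trailing partial triplet are assumptions made for this example only.

```python
def to_codons(sequence):
    """Split a nucleotide string into consecutive, non-overlapping triplets."""
    seq = sequence.upper()
    # Assumption: a trailing partial triplet (1 or 2 nucleotides) is dropped.
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


if __name__ == "__main__":
    # Ten nucleotides give three codons; the last nucleotide is dropped.
    print(to_codons("atggcctaag"))
```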

No domain theory was available in the gene identification domain; 
therefore, we created an artificial domain theory using the information 
that organisms may favor certain nucleotide triplets over others in 
gene coding.   The domain theory embodies the knowledge that a DNA 
segment is likely to be a gene coding segment if its triplets are 
coding-favoring triplets or if its triplets are not noncoding-favoring 
triplets.    The decision of which triplets were coding-favoring, which 
were noncoding-favoring, and which favored neither, was made empirically 
by analyzing the makeup of 2500 coding and 2500 noncoding sequences.
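
The empirical labelling described above might be sketched as follows.  This
is only an illustration under stated assumptions, not the procedure we
actually used: the frequency-ratio rule, the threshold value 1.5, and all
function names are hypothetical.

```python
from collections import Counter


def codon_frequencies(sequences):
    """Relative frequency of each non-overlapping triplet across sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(0, len(seq) - len(seq) % 3, 3):
            counts[seq[i:i + 3]] += 1
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()} if total else {}


def classify_triplets(coding, noncoding, ratio=1.5):
    """Label each triplet 'coding', 'noncoding', or 'neither'.

    Assumption: a triplet favors one class when its relative frequency
    there exceeds `ratio` times its relative frequency in the other class.
    """
    f_c = codon_frequencies(coding)
    f_n = codon_frequencies(noncoding)
    labels = {}
    for codon in set(f_c) | set(f_n):
        c = f_c.get(codon, 0.0)
        n = f_n.get(codon, 0.0)
        if c > ratio * n:
            labels[codon] = "coding"
        elif n > ratio * c:
            labels[codon] = "noncoding"
        else:
            labels[codon] = "neither"
    return labels


def segment_evidence(segment, labels):
    """Count coding-favoring and noncoding-favoring triplets in a segment."""
    codons = [segment[i:i + 3]
              for i in range(0, len(segment) - len(segment) % 3, 3)]
    n_coding = sum(labels.get(c) == "coding" for c in codons)
    n_noncoding = sum(labels.get(c) == "noncoding" for c in codons)
    return n_coding, n_noncoding
```

Under this sketch, a segment with many coding-favoring triplets and few
noncoding-favoring triplets would count as evidence for a gene coding
segment, which mirrors the two conditions of the domain theory.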

The files in this directory are:

   README - this file

   tgci1-run - shell script for running tgci1.  Takes no arguments.

   footer, header, tgci1-input - used internally by the tgci1-run shell script.

   gene-id.data.orig - original gene-id data.  (This is only a subset of
	all the data available.   Also the features represent codon groups
	rather than individual nucleotides.)

   gene-id.data.new - data produced by tgci1-run

   gene-id.names.new - names file for C4.5

   tgci1.lisp - lisp code for tgci1

   theory.lisp - gene-id theory in a format readable by tgci1.lisp.  Note
	that this is an artificial theory as described in the article.

NOTES:
   1. To run tgci1, run the tgci1-run shell script without any arguments.
   2. In tgci1-run, you will probably need to replace lisp4.1 with the
	name of your own LISP executable.

