Index of /~bkessler/sanskrit-thesis/prog/segments

      Name                    Last modified       Size  Description

[DIR] Parent Directory        25-Jul-2004 18:33      -
[TXT] Makefile                25-Jul-2004 11:06     3k
[TXT] readSegmentsLex.c       25-Jul-2004 11:06    44k
[TXT] readSegmentsLex.l-ep..> 25-Jul-2004 11:06     1k
[TXT] readSegmentsLex.l-pr..> 25-Jul-2004 11:06     1k
[TXT] readSegmentsYacc.h      25-Jul-2004 11:06     1k
[TXT] readSegmentsYacc.y      25-Jul-2004 11:06     4k
[TXT] xduceSegments           25-Jul-2004 11:06    26k
[TXT] xduceSegments.c         25-Jul-2004 11:06     2k

# -*- mode:     Fundamental -*- -------------------------------------------- #
# File:         segments/README
# Description:  Describes xduceSegments
# Author:       Brett Kessler
# Created:       3-May-92
# Modified:     Sun May  3 09:19:19 1992 (Brett Kessler)
# Language:     English
##############################################################################

This directory contains code for building xduceSegments, which reads the
../SEGMENTS data file and produces several files serving as fragments of
code usable by later programmes.  These automatically generated source code
files are deposited in the directory ../frags/:

segmentsLex contains Lex rules for recognizing and tokenizing text that
is coded as per the transcriptions registered in the ASCII column of the
SEGMENTS file.  If there are multiple possible segmentations, the code
chooses the longest possible segments (returning the fewest possible
tokens).  Lex returns the identifier SEGMENT_NAME for any segment, and
sets yylval to an integer uniquely identifying the segment.  These numbers
are assigned sequentially in the order the segments were listed in the
SEGMENTS file, counting from 0, but this has no external significance.
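
The longest-match behaviour described above can be imitated in plain C.
The following is only a sketch, not the generated Lex code: the segment
table, the names demo_segments, longest_segment and tokenize, and the
transcriptions themselves are all hypothetical stand-ins.  As in the
text, a segment's ID is simply its position in the list, counting from 0.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical ASCII transcriptions, in the order they might appear in
   SEGMENTS.  The array index plays the role of the segment ID. */
static const char *const demo_segments[] = { "a", "aa", "k", "kh", "h" };
#define DEMO_COUNT (sizeof demo_segments / sizeof demo_segments[0])

/* Find the longest segment that is a prefix of text.  Returns its ID and
   stores its length in *len, or returns -1 if no segment matches. */
int longest_segment(const char *text, size_t *len)
{
    int best = -1;
    size_t best_len = 0;
    for (size_t i = 0; i < DEMO_COUNT; i++) {
        size_t n = strlen(demo_segments[i]);
        if (n > best_len && strncmp(text, demo_segments[i], n) == 0) {
            best = (int)i;
            best_len = n;
        }
    }
    *len = best_len;
    return best;
}

/* Split text into segment IDs, always taking the longest match.
   Returns the number of IDs written to ids, or -1 if the text contains
   an unrecognized character or needs more than max_ids tokens. */
int tokenize(const char *text, int *ids, int max_ids)
{
    int count = 0;
    while (*text != '\0') {
        size_t len;
        int id = longest_segment(text, &len);
        if (id < 0 || count >= max_ids)
            return -1;
        ids[count++] = id;
        text += len;
    }
    return count;
}
```

With this table, "khaa" is split into the two tokens kh and aa rather
than the four tokens k, h, a, a, which is the preference for the fewest
possible tokens that the paragraph above describes.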

segmentsH is a fragment for inclusion in a C include (.h) file, defining
Segments_COUNT, the total number of segments.

segmentsC is a fragment for inclusion in a C implementation (.c) file,
defining the contents of the array Segments_Facts.  For each segment
in order (i.e., accessible by the numeric ID mentioned above), this
array lists the ASCII representation, the LaTeX representation, and the
features mask.  The first two fields are useful for terminal and LaTeX
display respectively.  The last is the sum of the masks for all the
features for which the segment was declared positive in the SEGMENTS
file.  Since the feature masks are all powers of two, their sum yields
a single number that succinctly and reversibly represents those features.
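
To illustrate why a sum of powers of two is reversible, here is a small
C sketch.  The record layout, the FEAT_ names and the sample segments
are assumptions made for the example; the real Segments_Facts array is
generated from SEGMENTS and may be laid out differently.

```c
/* Hypothetical feature masks: each one is a distinct power of two. */
#define FEAT_VOICED    1UL
#define FEAT_ASPIRATED 2UL
#define FEAT_NASAL     4UL

/* Assumed record layout: ASCII form, LaTeX form, features mask. */
struct demo_fact {
    const char *ascii;
    const char *latex;
    unsigned long features;
};

/* Hypothetical entries; each mask is the sum of the positive features. */
const struct demo_fact demo_facts[] = {
    { "m",  "m",  FEAT_VOICED + FEAT_NASAL },      /* 1 + 4 = 5 */
    { "kh", "kh", FEAT_ASPIRATED },                /* 2 */
    { "gh", "gh", FEAT_VOICED + FEAT_ASPIRATED },  /* 1 + 2 = 3 */
};

/* Nonzero if the given feature is set in mask. */
int has_feature(unsigned long mask, unsigned long feature)
{
    return (mask & feature) != 0;
}

/* Recover the individual feature masks from their sum.  Writes at most
   max_out masks to out, lowest bit first, and returns how many. */
int split_features(unsigned long mask, unsigned long *out, int max_out)
{
    int n = 0;
    for (unsigned long bit = 1; bit != 0 && n < max_out; bit <<= 1)
        if (mask & bit)
            out[n++] = bit;
    return n;
}
```

Because every feature occupies its own bit, adding the masks never
produces a carry, so the sum can always be taken apart again with
bitwise tests, as split_features does.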

attributesLex is a fragment of Lex source code that recognizes the
spelt-out forms of the attribute names declared at the bottom of the
SEGMENTS file, returning the code ATTRIBUTE_NAME and a numeric
identifier.  attributesH declares the value of Attributes_COUNT, and
attributesC defines Attributes_Names, which gives the names again,
for display purposes, when they are looked up by the ID.
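
A lookup by ID might look roughly like the following C sketch.  The
demo_ names, the array type, and the three attribute names are
hypothetical; they merely stand in for Attributes_COUNT and
Attributes_Names, whose actual contents come from the SEGMENTS file.

```c
#include <stddef.h>

/* Hypothetical stand-ins for Attributes_COUNT and Attributes_Names. */
#define DEMO_ATTRIBUTES_COUNT 3
const char *const demo_attribute_names[DEMO_ATTRIBUTES_COUNT] = {
    "length", "tone", "stress"
};

/* Return the display name for an attribute ID, or NULL if the ID is
   outside the range 0 .. DEMO_ATTRIBUTES_COUNT - 1. */
const char *attribute_name(int id)
{
    if (id < 0 || id >= DEMO_ATTRIBUTES_COUNT)
        return NULL;
    return demo_attribute_names[id];
}
```

The bounds check is a choice made for this sketch; code that only ever
receives IDs produced by the lexer could index the array directly.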


The code in this directory has a simple organization.  The Lex file
for tokenizing SEGMENTS is generated automatically by sandwiching
../frags/featuresLex between readSegmentsLex.l-prolog and
readSegmentsLex.l-epilog.  The resulting Lex file is compiled into a C
file, readSegmentsLex.c, which is included at the bottom of
readSegmentsYacc.y, producing a complete Yacc parser for SEGMENTS.  The
parser writes the ../frags/ files as it parses each segment
declaration.  readSegmentsYacc.h simply declares an interface to the
parser, and xduceSegments.c implements the main module for the final
programme, xduceSegments.  Typing `make` will regenerate the ../frags/
files if they are older than the ../SEGMENTS file.

##############################################################################
##                                End of README
##############################################################################