Index of /~bkessler/sanskrit-thesis/prog/features
Name Last modified Size Description
Makefile 25-Jul-2004 11:06 4k
avm.c 25-Jul-2004 11:06 6k
avm.h 25-Jul-2004 11:06 2k
featName.c 25-Jul-2004 11:06 1k
featName.h 25-Jul-2004 11:06 1k
fvPair.c 25-Jul-2004 11:06 1k
fvPair.h 25-Jul-2004 11:06 2k
readFeaturesLex.l 25-Jul-2004 11:06 1k
readFeaturesYacc.h 25-Jul-2004 11:06 1k
readFeaturesYacc.y 25-Jul-2004 11:06 2k
valueType.c 25-Jul-2004 11:06 1k
valueType.h 25-Jul-2004 11:06 2k
xduceFeatures 25-Jul-2004 11:06 26k
xduceFeatures.c 25-Jul-2004 11:06 1k
# -*- mode: Fundamental -*- -------------------------------------------- #
# File: features/README
# Description: Describes sandhi/features directory.
# Author: Brett Kessler
# Created: 2-May-92
# Modified: Sun May 3 09:35:33 1992 (Brett Kessler)
# Language: English
##############################################################################
The code in the features/ directory builds xduceFeatures, which reads
the ../FEATURES file and generates Lex and C source code in ../frag/.
The FEATURES file declares the phonemic features found in Sanskrit.
It uses a sort of feature geometry, where certain features (basic or
leaf features) only have meaning in the domain of some dominant,
composite feature. For readability, this is expressed in an AVM
notation, where each attribute is a feature name, and the value
denotes the type of value the feature can take: "binary" means it is a
leaf feature, and composite features take an AVM as their value. The
program xduceFeatures reads that file and writes it in a form more
directly accessible to other programs. In so doing, it assigns a
unique identifier to each feature, so that other programs only have
to deal with integers. It also computes a bit mask value for each
leaf feature. These are unique powers of two, so that any segment can
be uniquely and reversibly identified as a sum of such masks for each
leaf feature that has a positive value. The presence of a composite
feature can be inferred from the presence of its child features, so
composite features have mask values of 0.
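The mask scheme described above can be sketched in C as follows. This
is only an illustration, not the code in avm.c: the DemoFeature type,
the demo* functions, and the feature names in demoTable are all
hypothetical, and the order in which xduceFeatures actually hands out
powers of two is an assumption.

```c
/* Hypothetical feature record; none of these names come from the
 * thesis sources. */
#define DEMO_NONE (-1)

typedef struct {
    const char *name;
    int parent;          /* ID of the dominating feature, or DEMO_NONE */
    int isLeaf;          /* 1 for a "binary" leaf, 0 for a composite */
    unsigned long mask;  /* filled in by demoAssignMasks */
} DemoFeature;

/* Illustrative table only; not the contents of the real FEATURES file. */
DemoFeature demoTable[] = {
    { "place",   DEMO_NONE, 0, 0UL },
    { "labial",  0,         1, 0UL },
    { "coronal", 0,         1, 0UL },
    { "voiced",  DEMO_NONE, 1, 0UL },
};

/* Give each leaf the next unused power of two and each composite 0.
 * Returns the number of leaves. Assumes there are fewer leaves than
 * bits in an unsigned long. */
int demoAssignMasks(DemoFeature *f, int count)
{
    unsigned long next = 1UL;
    int leaves = 0;
    for (int i = 0; i < count; i++) {
        if (f[i].isLeaf) {
            f[i].mask = next;
            next <<= 1;
            leaves++;
        } else {
            f[i].mask = 0UL;
        }
    }
    return leaves;
}

/* Encode a segment from the IDs of its positive leaf features. Because
 * the masks are distinct powers of two, OR-ing them gives the same
 * result as summing them, as long as no ID is listed twice. */
unsigned long demoEncode(const DemoFeature *f, const int *ids, int n)
{
    unsigned long seg = 0UL;
    for (int i = 0; i < n; i++)
        seg |= f[ids[i]].mask;
    return seg;
}

/* Reverse direction: is leaf feature `id` positive in segment `seg`?
 * Composites have mask 0 and are never reported here. */
int demoHasFeature(const DemoFeature *f, unsigned long seg, int id)
{
    return f[id].mask != 0UL && (seg & f[id].mask) != 0UL;
}
```

With this table, a segment that is positive for "labial" and "voiced"
is encoded as 1 + 4 = 5, and testing that value against each mask
recovers exactly those two features.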
The output of this program consists of the following three files in ../frag/:
featuresLex is a fragment of a Lex source file for reading
feature names from a file and returning their unique ID. It is used by
other programs (xduceSegments and xduceRules) that need to read data
files that mention features by name (SEGMENTS and RULES).
featuresH is a fragment of C code (actually preprocessor code) that
defines Features_COUNT, which tells how many features there are. It
is used by xduceSegments and derive.
featuresC is a fragment of C code that defines arrays which tell
other programs information found in the FEATURES file. It defines
Features_Names, which tells the name of a feature; Features_parent, which
tells its dominating feature (or Features_NONE if top-level); and
Features_mask, which tells its mask. All are accessed by the feature
ID number. In the somewhat rarer case that an ID needs to be inferred
from some value (particularly, when we need to know what children a
composite feature has), the array is simply enumerated, which is not
particularly time-consuming since there are so few features.
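As a rough sketch of how a client program might use these fragments,
the following C code enumerates the arrays to answer the reverse
queries mentioned above. The array names and Features_COUNT and
Features_NONE come from this README, but their element types, the
value of Features_NONE, and the table contents shown here are
assumptions; the demo* functions are hypothetical helpers.

```c
#include <string.h>

/* Stand-ins for what featuresH and featuresC might define. The
 * contents are illustrative, not the real generated tables. */
#define Features_COUNT 4
#define Features_NONE (-1)   /* assumed sentinel value */

const char *const Features_Names[Features_COUNT] = {
    "place", "labial", "coronal", "voiced"
};
const int Features_parent[Features_COUNT] = {
    Features_NONE, 0, 0, Features_NONE
};
const unsigned long Features_mask[Features_COUNT] = {
    0UL, 1UL, 2UL, 4UL
};

/* Find the children of a composite feature by scanning the whole
 * parent array. Writes their IDs to `out` and returns how many. */
int demoChildren(int composite, int out[Features_COUNT])
{
    int n = 0;
    for (int id = 0; id < Features_COUNT; id++)
        if (Features_parent[id] == composite)
            out[n++] = id;
    return n;
}

/* Look up an ID from a name, again by plain enumeration. */
int demoFindByName(const char *name)
{
    for (int id = 0; id < Features_COUNT; id++)
        if (strcmp(Features_Names[id], name) == 0)
            return id;
    return Features_NONE;
}

/* A composite has mask 0, so its presence in a segment is inferred:
 * it is present if any positive leaf has it as an ancestor. */
int demoCompositePresent(unsigned long seg, int composite)
{
    for (int id = 0; id < Features_COUNT; id++) {
        if ((seg & Features_mask[id]) == 0UL)
            continue;
        for (int p = Features_parent[id]; p != Features_NONE;
             p = Features_parent[p])
            if (p == composite)
                return 1;
    }
    return 0;
}
```

Each query is linear in Features_COUNT (the ancestor walk adds at most
the depth of the hierarchy), which is acceptable for a table this
small.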
In this directory, the main module is xduceFeatures.c, which creates the
empty frag/ files and invokes the parser, via the interface described in
readFeaturesYacc.h. The parser is implemented in the Yacc file
readFeaturesYacc.y, which incorporates the tokenizer readFeaturesLex.c,
which is compiled from the Lex file readFeaturesLex.l. The rest of the
files define the major data types (*.h) and implement operations on
objects of those types (*.c): avm.* are attribute-value matrices, lists
of feature-value pairs, themselves defined in fvPair.*. Each fvPair
consists of a feature name (featName.*) and a data type, which in the
case of composite features may be another avm. The parser reads
the FEATURES file into one recursive AVM feature, then invokes
AVM_TransduceToPhonesReader with the empty frag/ files.
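The recursive structure described above might look roughly like the
following in C. This is a guess at the shape of the data, not the
declarations in avm.h or fvPair.h: every Demo* type and demoFlatten
are hypothetical, and the depth-first numbering is an assumption about
how IDs could be assigned during the traversal.

```c
#include <stddef.h>

typedef struct DemoAvm DemoAvm;

/* One feature-value pair: a name plus either "binary" (sub == NULL)
 * or a nested AVM for a composite feature. */
typedef struct {
    const char *name;
    const DemoAvm *sub;
} DemoFvPair;

/* An AVM is a list of feature-value pairs. */
struct DemoAvm {
    const DemoFvPair *pairs;
    size_t count;
};

#define DEMO_MAX 32
#define DEMO_NONE (-1)

/* Flat tables of the kind a transducer could write out. */
typedef struct {
    const char *names[DEMO_MAX];
    int parent[DEMO_MAX];
    int count;
} DemoFlat;

/* Depth-first walk: each pair takes the next free ID, and the pairs
 * of a nested AVM record the enclosing pair's ID as their parent.
 * Pairs beyond DEMO_MAX are silently ignored in this sketch. */
void demoFlatten(const DemoAvm *avm, int parent, DemoFlat *out)
{
    for (size_t i = 0; i < avm->count && out->count < DEMO_MAX; i++) {
        int id = out->count++;
        out->names[id] = avm->pairs[i].name;
        out->parent[id] = parent;
        if (avm->pairs[i].sub != NULL)
            demoFlatten(avm->pairs[i].sub, id, out);
    }
}
```

Calling demoFlatten on the top-level AVM with parent DEMO_NONE and a
DemoFlat whose count is 0 yields one ID per feature, with top-level
features marked as having no parent.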
##############################################################################
## End of README
##############################################################################