syntax.xle – syntaktischer zucker (I)

Posted on May 25, 2021 by na

english title: how to not get mad, yet, parsing

general disclaimer: these lecture notes are personal and might make as little sense to you as they do to me. I will only share a small subset of them, when I think it might be more helpful than confusing and more correct than wrong. Use at your own risk. If something you read here evokes strong emotions (like deep disagreement or amour fou) and you can’t stand it, feel free to contact me, if you know how: narnold@cl…

Since we are already working with xle in the course on formal syntax, I feel some pressure to release a post showing a few ways to not go completely mad while debugging xle grammars. I recently discovered in the xle documentation a useful way to create shortcuts for (partially) testing your grammar.

There will be more posts in the future and probably this post itself will get updated a bunch of times.

xle

brief introduction: what is xle? xle stands for extremely painful lessons in frustration tolerance and has been created to prove that your level of frustration tolerance is not high enough, and also, the idiot is always you.

it also tries to teach you how to be creative and develop strategies when dealing with real world stuff (that is to say, legacy and broken stuff), when you are not looking. so let’s have a look.

you might say… (“the bad”)

…but (“the good”)

on (not) using emacs

If you do not already know emacs, using it means editing source code in an unfamiliar specification language (xle) with an unfamiliar editor, through a possibly unfamiliar and not very comfortable interactive shell.

As is often the case with keyboard-focused editors, it might even be hard for you to do the most trivial and common things (copy/paste, select text, change focus, re-run commands, enter text, exit the program, run a command…). Thus I am suggesting another path here, skipping emacs altogether.

We will not need emacs here; you can use your favourite text editor to edit the .lfg files. If you run into syntax errors, you might want to open emacs just for the syntax highlighting, but it does not do much, and most IDEs provide bracket highlighting even for unknown languages. If you avoid emacs completely, you will also lose the auto-formatting functionality of xlemacs.

But I think it is in general not necessary to use emacs for interacting with xle.

workflow and debugging

Usually you will find yourself debugging a single sentence for several hours, until you find that you forgot to update the lexical entry of some determiner, or that you confused dat with acc, OBJ with OBL, or whatnot. Whenever you make a change to the grammar, you need to re-create the parser and try to parse again. Some examples:

% parse {weil Axel xle mag geht er oft in den Keller .}  ;# that's the goal

% parse {CP: weil Axel xle mag}            ;# let's try parsing subtrees
% parse {I': geht er oft in den Keller}

...

% parse {C: weil}       ;# let's see if at least all the terminals parse correctly
% parse {N: Axel}
% parse {N: xle}
% parse {P: in}
...

Instead of writing about how much time you lose doing all this manually, and even more by mistyping and then habitually reaching for the arrow keys to correct the typo (“Eh eh, no arrow keys here”: press backspace 4 times to get rid of ^[[D, or even ^[[D^[[D^[[C^[[C if you have strength of character!), I will just provide you with a solution, rc files:

first tests

# putting the following line in a text-file with name xlerc in the same folder as
# our .lfg file will automatically run create-parser and try to parse the whole sentence.

create-parser g4.lfg ; parse {weil Axel xle mag geht er oft in den Keller .}


# end of file: xlerc

Check whether this works for you. You can even create the whole file with a single command in the terminal: navigate to the directory xle-exercises/g4/ and create the xlerc file as follows (run this in your terminal, not in the xle shell; copy all 3 lines at once into the terminal and hit Enter):

tee -a ./xlerc <<'EOF'
create-parser g4.lfg ; parse {weil Axel xle mag geht er oft in den Keller .}
EOF
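
Note that tee -a appends to the file, so running the command above a second time leaves two copies of the line in xlerc. A minimal sketch of a variant that writes the file only when it does not exist yet (same file name and content as above; the existence check is an addition of mine, not something xle requires):

```shell
# Create xlerc only if it does not exist yet, so re-running this
# neither duplicates the line nor overwrites existing shortcuts.
if [ ! -e ./xlerc ]; then
  cat > ./xlerc <<'EOF'
create-parser g4.lfg ; parse {weil Axel xle mag geht er oft in den Keller .}
EOF
fi
```

If you do want to replace an existing xlerc, delete it first and run the snippet again.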

That was just to get started; now we come to the shiny stuff…

every key hit counts

xle allows you to define procedures (basically your own commands) that you can then use in the xle shell. Don’t wanna type create-parser all the time? Define a shortcut. In your xlerc file, write:

proc c {FILENAME} {create-parser $FILENAME.lfg}

explanation: From left to right, this defines a procedure named c that takes an argument named FILENAME. The second pair of braces contains the actual commands to be run. If you know Python or almost any other programming language: it’s just like a function in Python, a basic way of grouping a bunch of lines of code and associating them with a name so you can invoke those lines later:

def c(FILENAME):
    # createparser is a stand-in for xle's create-parser command
    createparser(FILENAME + ".lfg")

The $ sign reads the value of the variable FILENAME that was passed to the procedure as an argument, and the file suffix “.lfg” is appended to save us another 4 keystrokes.

Now restart xle. (Ctrl-C sends a signal that terminates the xle process, which is, by the way, the reason why the same key combination can’t be used for copying text from the terminal. Adding Shift, as in Shift-Ctrl-C, works in some cases.)

In xle again, you should then be able to simply run c g4 instead of typing create-parser g4.lfg and have the same effect.

proc r {} {create-parser g4.lfg}

If you hardcode the grammar file you are currently working with, you only need to type r and hit Enter to recompile/reload the grammar.

Sadly, the graphical interface does not let you inspect multiple parses run in a sequence. It does perform all of them but only keeps the windows of the last parse. However, you can still see in the console output whether each parse was successful or not:

proc 1nouns {} {
parse {N: Axel} ;
parse {N: xle}  ;
parse {N: Keller}
}

It’s also possible to use single digits as procedure names:

# to parse the whole sentence 1

proc 1 {} {
r;  # runs our recompile shortcut
parse {weil Axel xle mag geht er oft in den Keller .}
}

# to parse partials of sentence one you might want to use 1a
# or even 11, 12, 13 for sentence 1, partial 1 etc.

proc 1a {} {
parse {CP: weil Axel xle mag}
}

proc 11 {} {
parse {CP: weil Axel xle mag}
}

# in the end, to check if all parses (still) work
# assuming we defined proc 1, proc 2 ... proc 4

proc all {} {1; 2; 3; 4}

So my general advice would be to analyze each sentence separately and, based on that analysis, create shortcut commands for quickly testing parses of substructures, working bottom-up. (You might have to ignore some errors that go away for more complete phrases. Basically, use this method to see whether any c-structure tree is constructed at all, not whether it is entirely without errors.)
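
The terminal checks at the bottom of this bottom-up approach all follow the same parse {CATEGORY: word} pattern, so they can be generated instead of typed. A minimal sketch, to be run in your terminal: the file names terminals.txt and terminals.rc and the procedure name 1terms are hypothetical, and the word list is just the one from the example sentence.

```shell
# Hypothetical word list: one "CATEGORY word" pair per line.
cat > terminals.txt <<'EOF'
C weil
N Axel
N xle
P in
EOF

# Emit a procedure with one parse call per pair into terminals.rc.
{
  echo 'proc 1terms {} {'
  while read -r category word; do
    printf 'parse {%s: %s}\n' "$category" "$word"
  done < terminals.txt
  echo '}'
} > terminals.rc
```

The resulting terminals.rc defines a procedure in the same style as 1nouns above; loading it into xle is then a matter of the source command.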

upcoming chapters…

limitations

custom rc files

You can use the source command to load (and run) any rc file you like; it doesn’t need to be called xlerc. source xlerc is basically what’s run automatically for you, if the file exists.
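
One way to use this is to keep the shortcuts for each sentence in their own rc file and load them from xlerc. A minimal sketch, to be run in your terminal in the grammar folder: the file name s1.rc and the procedure name 1b are hypothetical, and I am assuming that a source line inside xlerc behaves the same as typing it in the xle shell.

```shell
# Hypothetical file name: shortcuts for sentence 1 live in their own rc file.
cat > s1.rc <<'EOF'
proc 1a {} {parse {CP: weil Axel xle mag}}
proc 1b {} {parse {I': geht er oft in den Keller}}
EOF

# Load it from xlerc, adding the source line only if it is not there yet.
grep -qxF 'source s1.rc' xlerc 2>/dev/null || echo 'source s1.rc' >> xlerc
```

Running the snippet again rewrites s1.rc but does not add a second source line to xlerc.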

folder structure (suggestion)

For every grammar you build, use a separate folder with its own xlerc file, specifying the shortcut procedures you want to have.

 .../xle-exs/g4/g4.lfg
               /xlerc
            /g5/g5.lfg
               /xlerc
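
A minimal sketch for setting up this layout from the terminal. The root folder name xle-exs is taken from the listing above; seeding each xlerc with the reload shortcut r is my assumption about a sensible starting point, and the .lfg files themselves are not created.

```shell
# Root folder as in the listing above; adjust to wherever your exercises live.
root="xle-exs"

for g in g4 g5; do
  mkdir -p "$root/$g"
  # Seed an xlerc with a reload shortcut, but never overwrite an existing one.
  if [ ! -e "$root/$g/xlerc" ]; then
    printf 'proc r {} {create-parser %s.lfg}\n' "$g" > "$root/$g/xlerc"
  fi
done
```

Since xlerc is picked up from the folder of the .lfg file, each grammar then gets an r that reloads its own grammar.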

future work: