syntax.xle – syntactic sugar (I)

Posted on May 25, 2021 by na

english title: how to not get mad, yet, parsing

general disclaimer: these lecture notes are personal and might make as little sense to you as they do to me. I will only share a small subset of them, when I think it might be more helpful than confusing and more correct than wrong. Use at your own risk. If something you read here evokes strong emotions (like deep disagreement or amour fou) and you can’t stand it, feel free to contact me, if you know how: narnold@cl…

Since we are already working with xle in the course on formal syntax, I think the pressure is high to release a post showing some ways to not go completely mad while debugging xle grammars. I recently discovered a useful way in the xle documentation to easily create shortcuts for (partially) testing your grammar.

There will be more posts in the future and probably this post itself will get updated a bunch of times.

xle

brief introduction: what is xle? officially, xle is the Xerox Linguistic Environment. unofficially, xle stands for extremely painful lessons in frustration tolerance and has been created to prove that your level of frustration tolerance is not high enough, and also that the idiot is always you.

it also tries to teach you how to be creative and develop strategies when dealing with real world stuff (that is to say, legacy and broken stuff), when you are not looking. so let’s have a look.

you might say… (“the bad”)

xle is old
xle is proprietary (that’s why it is still bad today)
xle is basically unmaintained
there is no modern IDE for xle under active development
the documentation is overloaded and at the same time incomplete
xle syntax is bad, sometimes overly verbose
ux:
- the rendered ui stuff is a pain to work with
- you can’t use arrow-up to get previous commands (as in most modern shells)
- you can’t use arrow-left to move the cursor for modifying a command (as you can just about everywhere else)
- the stdout output and compiler errors are barely legible and even harder to understand

…but (“the good”)

it follows some good old unix philosophy, which we can (ab)use to make our lives much easier when debugging xle grammars.
it works, in the end. (sometimes.)

on (not) using emacs

If you don’t already know emacs, using it means editing the source code of an unfamiliar specification language (xle) in an unfamiliar editor, through a possibly unfamiliar kind of interactive shell, and one of the less comfortable sort at that.

As is often the case with keyboard-focused editors, even the most trivial and common things (copy/paste, select text, change focus, re-run commands, enter text, exit the program, run a command…) might be hard for you. Thus I am suggesting another path here, skipping emacs altogether.

We will not need emacs here: you can just use your favourite text editor to edit the .lfg files. If you have syntax errors, you might want to open emacs just to get the syntax highlighting, but it’s really not doing much, and most IDEs provide bracket highlighting even for unknown languages. You will, however, lose the auto-formatting functionality of xlemacs if you avoid emacs completely.

But I think it is in general not necessary to use emacs for interacting with xle.

workflow and debugging

usually you will find yourself debugging a single sentence for several hours, until you find that you forgot to update the lexical entry of some determiner, or confused dat with acc, OBJ with OBL, or whatnot. Whenever you make a change to the grammar, you need to re-create the parser and try to parse again. Some examples:

% parse {weil Axel xle mag geht er oft in den Keller .}  ;# that's the goal

% parse {CP: weil Axel xle mag}            ;# let's try parsing subtrees
% parse {I': geht er oft in den Keller}

...

% parse {C: weil}       ;# let's see if at least all the terminals parse correctly
% parse {N: Axel}
% parse {N: xle}
% parse {P: in}
...

Instead of writing about how much time you lose doing all this manually, and even more by mistyping and habitually reaching for your arrow keys to correct a typo (but “Eh eh, no arrow keys here”: press backspace 4 times to get rid of ^[[D, or even ^[[D^[[D^[[C^[[C if you have strength of character!), I will just provide you with a solution: rc files.

rc stands for “run commands” and is a typical way of configuring programs under Linux.
it’s basically a script file: on startup of the program, all commands in it will be executed
xle looks for a file with a special name in the current working directory (not only there, but that’s what we’re gonna use), called xlerc or .xlerc. For our purpose it’s a good idea to use the former, as prefixing a file name with a dot usually means the file should be hidden. On Linux, compare the output of ls ~ and ls -a ~ to learn more.
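To try the ls comparison without touching your home directory, here is a small sketch for a throwaway directory (assuming a POSIX shell with the usual mktemp, touch and ls tools; the files are empty, only their names matter):

```shell
# Work in a fresh, empty scratch directory.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Create an empty file under each of the two names xle looks for.
touch xlerc .xlerc

# Plain ls hides names starting with a dot: only xlerc is listed.
ls

# ls -a shows hidden entries too (including . and ..), so .xlerc appears.
ls -a
```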

first tests

let’s update our xle-runcommands file with every new exercise sheet, so it will always fit the current exercises and let’s call the lfg file g4.lfg for the grammar of exercise sheet 4.
the semicolon ; combines commands into a sequence of commands: first create-parser is executed, and then parse {…}, even if create-parser fails due to some syntax error in the .lfg file!

# putting the following line in a text-file with name xlerc in the same folder as
# our .lfg file will automatically run create-parser and try to parse the whole sentence.

create-parser g4.lfg ; parse {weil Axel xle mag geht er oft in den Keller .}


# end of file: xlerc

You should try whether this works for you. You can even create the whole file with a single command in the terminal: just navigate to the directory xle-exercises/g4/ and create the xlerc file as follows (run this in your terminal, not in the xle shell; copy all 3 lines at once into the terminal and hit Enter):

tee -a ./xlerc <<'EOF'
create-parser g4.lfg ; parse {weil Axel xle mag geht er oft in den Keller .}
EOF
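One thing to keep in mind: tee -a appends to xlerc, so running the command above a second time leaves the line in the file twice, and xle will then run it twice on startup. A minimal variant that overwrites the file instead (same assumption: run it in xle-exercises/g4/ in a normal terminal):

```shell
# '>' replaces the whole file, so re-running this never duplicates the line.
# Quoting 'EOF' stops the shell from expanding $ signs or backslashes inside.
cat > ./xlerc <<'EOF'
create-parser g4.lfg ; parse {weil Axel xle mag geht er oft in den Keller .}
EOF

# Print the file to double-check what xle will run on startup.
cat ./xlerc
```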

This was just for getting started, now we will come to the shiny stuff…

every key hit counts

xle allows you to define procedures (basically your own commands) that you can then use in the xle shell. Don’t wanna type create-parser all the time? Define a shortcut. In your xlerc file, write:

proc c {FILENAME} {create-parser $FILENAME.lfg}

explanation: From left to right, this defines the procedure named c, taking one argument named FILENAME. The second pair of braces contains the actual commands to be run. If you know Python or almost any other programming language: it’s just like a function, a basic way of grouping a bunch of lines of code and associating them with a name, so you can invoke those lines later. In Python it would look roughly like this:

def c(FILENAME):
    # create_parser is a stand-in for xle's create-parser command
    create_parser(FILENAME + ".lfg")

Inside the body, the $ sign reads the value of the variable FILENAME that was passed to the procedure as an argument, and the suffix .lfg is appended to it, which saves us another 4 keystrokes.

Now restart xle. (ctrl-c sends an interrupt signal to the xle process, which makes it quit; that, by the way, is the reason why the same key combination can’t be used for copying stuff from the terminal. An additional shift, as in shift-ctrl-c, works in some terminals.)

In xle again, you should then be able to simply run c g4 instead of typing create-parser g4.lfg and have the same effect.

proc r {} {create-parser g4.lfg}

If you hardcode the grammar file you are currently working with, you only need to type r and hit Enter to recompile/reload the grammar.
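If you switch grammars with every exercise sheet, you can keep the grammar name in one place and let r reuse c. This is a sketch, assuming xle’s shell behaves like plain Tcl here (proc and set are standard Tcl commands, and the :: prefix makes the variable global no matter how the file is loaded); the variable name current_grammar is my own, hypothetical choice, not an xle setting:

```shell
# Overwrites xlerc in the current folder (run it next to g4.lfg).
cat > ./xlerc <<'EOF'
# the grammar of the current exercise sheet, without the .lfg suffix
# (current_grammar is just a name I picked, not an xle setting)
set ::current_grammar g4

# c g4  runs  create-parser g4.lfg
proc c {FILENAME} {create-parser $FILENAME.lfg}

# r recompiles whatever grammar current_grammar names
proc r {} {c $::current_grammar}
EOF
```

For the next sheet, only the set line has to change.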

Sadly, the graphical interface does not really support running multiple parses in a sequence: xle will indeed perform all of them, but only keep the windows of the last parse. However, you can still see in the console output whether each parse was successful or not:

proc 1nouns {} {
parse {N: Axel} ;
parse {N: xle}  ;
parse {N: Keller}
}
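As far as I know, the xle shell is a Tcl interpreter (proc is Tcl’s own command), so a list of terminals can also be looped over instead of spelled out. A sketch of an equivalent 1nouns, appended to xlerc from the terminal; the double-quoted "N: $word" builds the same string as {N: Axel} and so on:

```shell
# Append (>>) so the procs already in xlerc stay where they are.
cat >> ./xlerc <<'EOF'
proc 1nouns {} {
    # double quotes let Tcl substitute $word, giving "N: Axel", "N: xle", ...
    foreach word {Axel xle Keller} {
        parse "N: $word"
    }
}
EOF
```

If the three-line version is still in xlerc as well, the definition read last simply replaces the earlier one.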

It’s also possible to use digits, or names starting with digits, as procedure names:

# to parse the whole sentence 1

proc 1 {} {
    r;  # runs our recompile shortcut
    parse {weil Axel xle mag geht er oft in den Keller .}
}

# to parse parts of sentence 1 you might want to use 1a
# or even 11, 12, 13 for sentence 1, part 1 etc.

proc 1a {} {
    parse {CP: weil Axel xle mag}
}

proc 11 {} {
    parse {CP: weil Axel xle mag}
}

# in the end, to check if all parses (still) work,
# assuming we defined proc 1, proc 2 ... proc 4

proc all {} {1; 2; 3; 4}

So my general advice is to analyze each sentence separately and, based on that analysis, create shortcut commands for quickly testing parses of substructures, working bottom-up. (You might have to ignore some errors that go away once the phrases are more complete… basically, use this method to see whether any c-structure tree is constructed at all, not whether it is completely free of errors.)
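Applied to the example sentence from the workflow section, a bottom-up set of shortcuts could look like the following sketch. The names 1t (terminals), 1p (phrases) and 1all are hypothetical, and r is assumed to be the recompile shortcut defined earlier in the same xlerc:

```shell
cat >> ./xlerc <<'EOF'
# step 1: do the terminals parse at all?
proc 1t {} {
    parse {C: weil}
    parse {N: Axel}
    parse {N: xle}
    parse {P: in}
}

# step 2: the two big subtrees
proc 1p {} {
    parse {CP: weil Axel xle mag}
    parse {I': geht er oft in den Keller}
}

# step 3: recompile once, then go from the bottom up to the whole sentence
proc 1all {} {
    r
    1t
    1p
    parse {weil Axel xle mag geht er oft in den Keller .}
}
EOF
```

Since only the last parse keeps its windows, 1all leaves the full sentence open in the gui, while the console output shows how the smaller pieces did.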

upcoming chapters…

limitations

so far, I have discovered one limitation regarding special characters: you cannot parse a word like { xle-Spaß } using rc files. Maybe it’s necessary to specify the encoding explicitly in the rc file to fix that.
update: this has been fixed (see next posting) using set-character-encoding stdio utf-8
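A sketch of one way to apply the update: put the encoding line at the very top of xlerc (to be safe, before any parse command runs), and only once, then add a quick check for a non-ASCII word. The proc name spass is hypothetical, and this assumes xlerc itself is saved as UTF-8, which is what a heredoc typed in a UTF-8 terminal produces:

```shell
# make sure xlerc exists (touch leaves the contents of an existing file alone)
touch ./xlerc

# prepend the encoding setting, but only if it isn't in the file yet
if ! grep -qxF 'set-character-encoding stdio utf-8' ./xlerc; then
    { echo 'set-character-encoding stdio utf-8'; cat ./xlerc; } > ./xlerc.tmp &&
        mv ./xlerc.tmp ./xlerc
fi

# a quick test for a word with a special character
cat >> ./xlerc <<'EOF'
proc spass {} {parse {xle-Spaß}}
EOF
```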

custom rc files

You can use the source command to load (and run) any rc file you like; it doesn’t need to be called xlerc. source xlerc is basically what is run automatically for you if the file exists.
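For example, the sentence shortcuts could live in a file of their own that xlerc pulls in. A sketch; the file name sentences.rc is made up, and the relative path is resolved against the folder xle was started from. Because source simply re-reads a file, typing source sentences.rc in the running xle shell should also pick up edited procs without restarting xle (standard Tcl behaviour: a later proc definition replaces an earlier one):

```shell
# the sentence shortcuts, in a separate file
cat > ./sentences.rc <<'EOF'
proc 1 {} {
    r
    parse {weil Axel xle mag geht er oft in den Keller .}
}
proc 11 {} {parse {CP: weil Axel xle mag}}
proc 12 {} {parse {I': geht er oft in den Keller}}
EOF

# xlerc defines the recompile shortcut and then loads the file above
cat > ./xlerc <<'EOF'
proc r {} {create-parser g4.lfg}
source sentences.rc
EOF
```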

folder structure (suggestion)

For every grammar you build, use a separate folder with its own xlerc file, specifying the shortcut procedures you want to have.

 .../xle-exs/g4/g4.lfg
               /xlerc
            /g5/g5.lfg
               /xlerc
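The folders, plus a starter xlerc for each, can be created in one go. A sketch, run from the folder that should contain xle-exs; it only writes an xlerc where none exists yet, so existing shortcuts are not overwritten, and it does not create the .lfg files themselves:

```shell
base=./xle-exs
for g in g4 g5; do
    mkdir -p "$base/$g"
    # only create a starter xlerc if there is none yet
    if [ ! -e "$base/$g/xlerc" ]; then
        # %s is replaced by the folder name, e.g. create-parser g4.lfg
        printf 'proc r {} {create-parser %s.lfg}\n' "$g" > "$base/$g/xlerc"
    fi
done
```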

future work:

[ ] demo gif of terminal workflow