Sunday, May 13, 2012

Let the hacking commence

In this post, I'd like to sum up all the changes and fixes I've made so far to the CL package of Elisp, as well as introduce separate tools (functions and macros) that can be useful in lexical Elisp.  You can find all of them by running:

svn export
svn export

They seem to work fine in Emacs 24.0.95 at my laptop :) So, let's start.

1. Brief overview


1.1. CL package vs true Common Lisp

The CL package (which was created before the introduction of lexical scoping) doesn't provide full Common Lisp-compatibility for Elisp.  It should be regarded as a set of useful Common Lisp-like facilities to make Elisp programming be higher-level.

When I was making my way through the CL package, it was a great disappointment for me to discover the package to be inconsistent with canonical Common Lisp.

And I sent an email to the Emacs-Devel mailing list containing all the (negative) CL experience I had had by that time.  Since then, I managed to find acceptable workarounds or fix almost all of things I didn't like.  They all will be described below.

To put the main idea short: don't expect Elisp and CL package to be as good as
Common Lisp and its standard library.  These 2 languages serve very different purposes, and belong to different weight categories.


1.2. Influence of lexical scoping on the CL package

Another sort of things I'll mention here is changes that are caused by the introduction of lexical scoping in Emacs 24.  Giving that, a bunch of things CL package offers become either obsolete or unneeded at all.  I'm talking mainly about lexical-let and related stuff that emulates lexical scoping in dynamically-scoped Elisp.  Since now Elisp natively supports lexical scoping, all these things are now obsolete.. they degrade performance and (which is much worse) programmer's efficiency.


2. CL package changes

Before we start, I'd just want to say that I tried to introduce as little changes as possible.  Issues of having to merge all these changes every time a new version of Emacs is released and the problem of cross-system compatibility both do matter.  Not to break anything is still more important thing to take care of.  So I tried to minimize changes in existing code in favor of introducing new functions, where possible.  In those places where changes were inevitable, I did my best to make them as seamless and harmless as possible.


2.1. macroexpand-1

Elisp doesn't have macroexpand-1 facility, contrary to Common Lisp.  By the way, Common Lisp lacks macroexpand-all, whereas Elisp does have it.

macroexpand-1 turns out to be really necessary, as you'll see later on.  So I decided to implement it.  macroexpand is a C subroutine, so there's no other
option as to implement macroexpand-1 in C too.  Fortunately, this was not very difficult, since macroexpand is implemented by just performing single macroexpansions in a while loop.  So I just cut the loop's body and pasted it
into macroexpand-1's definition (this was not as simple as it sounds, though,
but it is essentially what was done), and then I made macroexpand spin in a
loop calling the new routine (macroexpand-1) and performing sequential
macroexpansions, until what we get is not a macro call.  Pretty easy.

2.2. Places

Places are not handled gracefully by the CL package either.  Yes, it is very good that they've been introduced to Elisp.  But they are not quite the same as in true Common Lisp.  You can read about some problems with them in that same
mail by the link above.

Fortunately, it was easy to fix the main inconsistency. get-setf-method has been
changed to call cl-macroexpand-1 in cases it cannot find a setf-expansion for the form.  Previously it used cl-macroexpand for that, expanding all the intermediate macro forms and not examining whether any of them is a setf-able place.

2.3. Symbol macros

Common Lisp has a notion of a symbol macro and a special form named
"symbol-macrolet".  In the CL package, they tried to implement such a thing in Elisp as well.  They should better have not done that..

There's a variable named "cl-macro-environment" which tracks, as its name says, the current macro environment (both when evaluating and when byte compiling code).  Normally it contains conses mapping symbols (macro names) to macro functions.  Authors of the CL package decided to represent symbol macros as cons cells which map strings (symbol-name of macro symbols) to their expansions.  Now comes the most disgusting thing about this: cl-macro-environment is scanned with assq, meaning that strings are compared with eq.  This makes a user care about what actual string is stored in a symbol's symbol-name slot, which is very bad.
It is quite possible that 2 distinct symbols share the same string as their
symbol-name, and they both get expanded with the same symbol macro definition, while only one of them is really a symbol macro.  I'll cite one of examples you can find here:

(defmacro test-macro ()
  (let* ((symbol-1 'sym)
         (symbol-2 (make-symbol (symbol-name symbol-1))))
    `(let ((v1 0)
           (v2 0))
       (symbol-macrolet ((,symbol-1 (incf v1))
                         (,symbol-2 (incf v2)))
         (list v1 v2)))))

This code prints (0 2), when it should print (1 1): that's because the same string object serves the purpose of being the symbol name for 2 distinct symbols.

So I advise to all Elisp programmers: forget about symbol macros.  They're not
handled properly, and Elisp has been unaware of symbol macros for a very long
time before.  (Honestly, the language doesn't need them.)  Moreover, there are
many places in the CL package itself where symbol macros are handled incorrectly (macros setf, psetf, shiftf, rotatef -- they all just check their arguments with the simple symbolp predicate, not even trying to find out whether it is a symbol macro).


2.4. cl-macroexpand-all

This function's consistency is also dubious.  First, in some cases it may return
non-fully expanded expression.  Look at this:

ELISP> (cl-macroexpand-all '(let (((aref x 1) 10)) x))
    (((aref x 1)

Yes, I agree that this ability to treat let like letf (and setq like setf, by the way) was probably devised to be used only internally by the CL package. Nevertheless, I'm sure this behavior is erroneous, anyway.

Now look at the second example:

ELISP> (labels ((double (x) (list x x)))
         (list 'double (double 10)))
   (list x x))
 (10 10))

ELISP> (labels ((double (x) (list x x)))
         (list #'double (double 10)))
   (list x x))
 (10 10))

Do you see the problem ?  Under "labels" all the quoted names of a function
defined are treated as the references to that function.  So there's no way to
just write a quoted symbol 'double.  Also bad.

In order to fix cl-macroexpand-all and not to break anything, I created a separate function named ux-mexp-all (see ux.el).  This is just fixed
cl-macroexpand-all.  It doesn't return anything non-fully expanded, and it correctly deals with quoted function names.  See this:

ELISP> (ux-labels ((double (x) (list x x)))
         (list 'double (double 10) #'double))
 (10 10)
   (list x x)))

ux-labels is labels adjusted for lexical scoping.  See the next section below.

3. flet and others in the lexical world

First, the lexical-let facility is deprecated now.  Likewise, all the macros that expand into lexical-let get also deprecated.  At this point, it seems to me there's only 1 such macro: flet.  Second, all macros that use cl-macroexpand-all
to expand their bodies have also problems (just because of cl-macroexpand-all).  They are: labels and macrolet.  I've created their counterparts: ux-labels and ux-mlet.  These use ux-mexp-all instead of cl-macroexpand-all.

As for lexical-let, the piece of advice is quite simple: just don't use it in lexical-scoping code at all.

4. A small fix to pcase.el

pcase is an Elisp library implementing ML-style pattern matching.  I was very
glad to see such a library appeared in Emacs.  Elisp programming gets more

The only problem I found is that:

(defun pcase--small-branch-p (code)
  (and (= 1 (length code))
       (or (not (consp (car code)))
           (let ((small t))
             (dolist (e (car code))
               (if (consp e) (setq small nil)))

It is very naive to check whether a piece of code is small by analyzing it directly.  It may contain any macro calls, and a single cons "(x)" may expand
into a huge piece of code.  So I changed that function to be this:

(defvar cl-macro-environment)
(declare-function ux-mexp-body "" (m e) t)

(defun pcase--small-branch-p (code)
  (setq code (ux-mexp-body code cl-macro-environment))
  (and (= 1 (length code))
       (or (not (consp (car code)))
           (let ((small t))
             (dolist (e (car code))
               (if (consp e) (setq small nil)))

The new function just performs a macroexpansion before checking anything. ux-mexp-body just mapcars over the body with ux-mexp-all, that is, expands each form with ux-mexp-all, and assembles the resulting expansions in a list (new body).

Saturday, May 12, 2012

Lexical scope

Emacs 24

Emacs 24 is going to get a significant feature: Emacs Lisp will natively support lexical scoping.  This might haven't impressed you much, and to a certain degree you're right: it doesn't imply any visible UI or behavioral improvements, nor it even implies any changes the end-user can really appreciate directly.  Nevertheless, the indirect consequences of introducing lexical scope support are hard to overestimate.

With lexical scope supported, a whole bunch of powerful abstractions become accessible.  Thus programmers get still more powerful instruments for making complicated programs and implementing fantastic features in Emacs.

The 24th version of Emacs is not released yet, but Alpha versions (pretests) have been accessible at GNU FTP server ( since 2011 autumn, and they have lexical scoping implemented.  (At the time of writing this, the latest pretest is 24.0.96.  I use 24.0.95.)  So there's no reason for Elisp programmers to waste time: pick up lexical Elisp right now !

For those who have started to play with lexically-scoped Elisp or are determined to start, I devote this post.

Lexical scoping vs. dynamic scoping

First off, dynamic scoping is pervasive in Emacs and it is not possible to refuse from it completely (which would be anyway unreasonable).  In fact, Emacs 24 supports both lexical and dynamic scoping. How do they coexist then ?

To put it short, there's a buffer-local variable named "lexical-binding" that is essentially a switch between lexical and dynamic scoping.

As for the interpreter, when it evaluates lambdas and the value of lexical-binding is t, it creates something which is called "a closure" by Elisp manual.  What is meant here is just a list whose car is eq to the symbol "closure".  The second element of the list serves the role of a captured lexical environment for that closure, in the form of alist (association list).  This is followed by an ordinary lambda list and a body.  Here's an example of a closure:

ELISP> (let ((x 12))
         #'(lambda (y)
             (* x y)))
 ((x . 12)
 (* x y))

To ensure that closures are nothing more than a list of a special shape, you can do the following:

ELISP> (fset 'func '(closure ((x1 . 1) (x2 . 1)) (x3)
                               (setq x1 x2
                                     x2 x3))))
 ((x1 . 1)
  (x2 . 1))
 (prog1 x1
   (setq x1 x2 x2 x3)))

ELISP> (func 10)
ELISP> (func 10)
ELISP> (func 12)
ELISP> (func 13)
ELISP> (func 22)
ELISP> (func 109)

(Above is something like a finite state transducer that returns its input sequence shifted back by 2 positions.)

As for the byte compiler, this is also the "lexical-binding" variable who chooses the way to go.  When the byte compiler generates code, it looks at that value; if it is t, then it generates code in such a way that closures get created at runtime, in the form of usual byte-compiled function objects.

Look at the following function:

(defun make-multiplier (x)
  (lambda (y)
    (* x y)))

If we try to compile this, here's what we get:

byte code for make-multiplier:
  doc:   ...
  args: 257
0    constant  make-byte-code
1    constant  257
2    constant  "\300 _\207"
3    constant  vconcat
4    constant  vector
5    stack-ref 5
6    call      1
7    constant  []
8    call      2
9    constant  3
10    constant  "\n\n(fn Y)"
11    call      5
12    return     

So make-multiplier calls make-byte-code at runtime to create a fresh byte-compiled function object, and returns it.  Each call to make-multiplier leads to a new distinct function being created and returned.  Each of them will be closed over distinct location named "x" in the code above.  This is exactly how closures and lexical scoping are supposed to work.

In the case when lexical-binding is nil, the old behavior is in effect: simple lambdas are created instead of closures (interpreter) & function objects get compiled at byte-compile time rather than arranged to be created dynamically at runtime (byte-compiler).


When lexical scoping is on, the "defvar" special form can be used to make particular bindings be dynamic instead of lexical (which is default).

I haven't found a thorough explanation of how defvar now works with regards to lexical scoping, that's why I've made some research and will now try to share the results with you.

Value is present

First of all, it must be noted that a new symbol attribute has been introduced; now every symbol has a bit flag indicating whether that symbol names a special variable or not.  You can read that flag with "special-variable-p" predicate.  The latter's source in C is very simple, here it is:

DEFUN ("special-variable-p", Fspecial_variable_p, Sspecial_variable_p, 1, 1, 0,
       doc: /* ...  */)
  (Lisp_Object symbol)
   CHECK_SYMBOL (symbol);
   return XSYMBOL (symbol)->declared_special ? Qt : Qnil;

So it just checks a declared_special bit field.

Second, when you evaluate "(defvar <var> <val>)", the variable <var> receives the property of being special, that is, any symbol you put in place of <var> gets that declared_special bit set.  For example:

ELISP> (defvar my-var 10)
ELISP> lexical-binding
ELISP> (let ((my-var 12))
         (lambda ()
 nil my-var)

See that the closure hasn't captured my-var ?  In fact, let just checks whether my-var has been declared special or not, by looking into its declared_special field.  In this example, my-var is really a special variable, so let binds it the same way it did that in all versions of Emacs prior to 24.

As for byte compilation, the byte compiler behaves similarly: if it sees a "let" form binding a variable which has declared_special bit set, it generates the code which performs dynamic binding of that variable at runtime (with a byte code primitive named "varbind").  Otherwise the binding is lexical, and the code being generated just manipulates the stack storing values there, without any changes to the symbol-value slot of the variable being bound.

Value is not present

That's not all, though.  When you evaluate "(defvar <var>)" without specifying any value, defvar doesn't set the declared_special flag of <var>.

When interpreting Elisp code, defvar just remembers that <var> is special by putting it onto the alist stored in the internal variable Vinterpreter_internal_environment.  (This variable is not exposed to Elisp layer at all; it is just C variable.)  This alist is of the same form and semantics as closures' captured lexical environments mentioned above:  a cons cell element (VAR . VAL) means that VAR is a lexical variable in effect, having the value VAL.  When a plain symbol (non-cons) is encountered in the alist, it means that this symbol is a dynamically-bound variable, and should be dynamically bound by all subsequent lets.

So, now you understand how the interpreter works regarding lexical environments: it just tracks (with Vinterpreter_internal_environment) the environment during the evaluation process by augmenting it within let blocks and in defvar forms.  When the interpreter has to create a closure, it does so by giving it the current value of this Vinterpreter_internal_environment variable.  That value is what is captured by the closure.

Let's see something concrete:

ELISP> (setq x
             (let ((w 10))
               (defvar w)
               #'(lambda (y)
                   (+ w y))))
  (w . 10)
 (+ w y))
ELISP> (funcall x 12)
*** Eval error ***  Symbol's value as variable is void: w

The closure has captured a lexical environment "(w (w . 10) t)".  We'll examine it from end to beginning:

  1. t -- for some reason, all closures contain t there.  It says that the variable "t" is dynamic (not captured).  This is by all evidence just an implementational technique chosen by people who worked on this.  You can simply pay no attention at those t symbols captured.
  2. (w . 10) -- this was pushed onto Vinterpreter_internal_environment by let.
  3. w -- this was added by defvar.  This entry does not shadow the previous one, and the resulting closure captures "w" with the value of 10.
Now see what happens if we move defvar outside of let:

ELISP> (progn
         (defvar w)
         (let ((w 10))
           #'(lambda (y)
               (+ w y))))
 (w t)
 (+ w y))
ELISP> (funcall (progn
                  (defvar w)
                  (let ((w 10))
                    #'(lambda (y)
                        (+ w y))))
*** Eval error ***  Symbol's value as variable is void: w

In this case, the closure's captured environment doesn't have "(w . 10)" at all, since by the moment let is evaluated, the interpreter is already aware that "w" has been declared dynamic (by appearing inside Vinterpreter_internal_environment as a plain symbol). 

Valueless defvar and byte compilation

The byte compiler also treats valueless defvars specially in some cases.  But don't expect it to provide the same level of convenience as Common Lisp does.  After all, that functionality has not yet been released, and Elisp is not a general-purpose language.

To catch the point, let's start with examples:

(defun make-multiplier ()
  (defvar m1-x)
  (let ((m1-x (buffer-size)))
    #'(lambda (m2)
        (* m1-x m2))))

This is compiled into:

byte code for make-multiplier:
  doc:   ...
  args: 0
0    constant  buffer-size
1    call      0
2    varbind   m1-x
3    constant  make-byte-code
4    constant  257
5    constant  "\300 _\207"
6    constant  vconcat
7    constant  vector
8    varref    m1-x
9    call      1
10    constant  []
11    call      2
12    constant  3
13    constant  "\n\n(fn M2)"
14    call      5
15    unbind    1
16    return     

So, when a variable has been declared special with defvar and then bound with let, what we end up with is a dynamic binding ("varbind m1-x").  This behavior is the same as in interpreting case.

Now try to evaluate (M-:) this:

(disassemble (make-multiplier))

You get something like:

byte code:
  doc:   ...
  args: 257
0    constant  349
1    stack-ref 1
2    mult     
3    return

This means that m1-x was captured by the closure resulting from make-multiplier, although it was bound dynamically with the enclosing let.  This behavior is contrary to that in interpreting case.  N'est-ce pas ?

Putting "(defvar m1-x)" at the beginning of lambda body doesn't change anything, nor helps inserting "(defvar m1-x)" at the beginning of let.  In all cases m1-x is captured as a lexical variable.

The conclusion is this: when the byte compiler figures out whether to capture an enclosing variable, it considers only its declared_special attribute.  Try to place a defvar with some value specified, and you will see that m1-x is not captured as lexical any more.


So how to use all that ?  Conclusion

I would recommend everybody to use a couple of approaches in dealing with scoping stuff:

1. If you don't need lexical binding in a particular file (=library), just don't use it and that's all.  Otherwise, see item 2.

2. If you do need lexical binding, then add a file-local variable "lexical-binding" (use "add-file-local-variable-prop-line" for that).  You'll now have lexical scoping in effect by default.

3. If you want to create a (possibly internal within your library) global variable, feel free to do this at file level with defvar.  Your variable will be dynamic throughout your Emacs system: it won't be captured by closures and let-bindings of it will act as dynamic ones.  The same is true about defconst, by the way.

4. Previous 3 items cover most of typical Elisp programming cases and needs.  If you want to be able to use something like:

(let ((y 10))
  (declare (special y))

as you would do in Common Lisp, and get equivalent behavior and sematics, then just forget about this and look for other ways around :)  Keep in mind that defvar won't do the same job for Elisp as the "special" declaration currently does for CL; defvar is much less powerful.