The History of Emacs Lisp

Emacs History

  • Any sufficiently complicated C or Fortran program contains an ad-hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.

Basic Language Design

  • Like MacLisp, Emacs Lisp was (and still is) a Lisp-2 language [Steele and Gabriel 1993], which means that

    1. the namespaces for functions and ordinary values are separate,
    2. to call a function bound to a variable, a program must use funcall.

    In addition, symbols can be used as function values.
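
A minimal sketch of what the separate namespaces look like in practice. The name my-demo-double is hypothetical and exists only for this illustration:

```elisp
;; `my-demo-double' is a hypothetical name that is given both a
;; function binding and a variable binding; the two do not clash.
(defun my-demo-double (x)
  "Return twice X."
  (* 2 x))

(defvar my-demo-double 21
  "A variable that shares its name with the function above.")

;; In head position the symbol names the function; in argument
;; position it names the variable.
(my-demo-double my-demo-double)          ; => 42

;; A function held in a variable must be called through `funcall'.
(let ((f #'my-demo-double))
  (funcall f 5))                         ; => 10

;; A symbol is itself a valid function value.
(funcall 'my-demo-double 5)              ; => 10
```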

Symbols and Dynamic Scoping

Like any Lisp, Emacs Lisp has always had a symbol data type: the expression 'emacs yields a symbol named emacs. One way to look at symbols is that they are immutable strings with a fast equality check and usually fast hashing. This makes them suitable for values representing enumerations or options. Another is that symbols can represent names in an Emacs Lisp program. Symbols are thus a key feature to make Emacs Lisp homoiconic, meaning that each Emacs Lisp form can be represented by a data structure called an S-expression that prints out the same as the form.
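
The two views of symbols can be illustrated with a few standard primitives; this is only a small sketch, not an exhaustive tour:

```elisp
;; Symbols with the same name are the same object, so `eq' suffices.
(eq 'emacs (intern "emacs"))             ; => t
(symbol-name 'emacs)                     ; => "emacs"

;; A quoted form is ordinary list data whose head is a symbol ...
(car '(+ 1 2))                           ; => +
;; ... and a list built at run time can be evaluated as a form.
(eval (list '+ 1 2))                     ; => 3
```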

Richard Stallman chose dynamic scoping to meet extensibility requirements in Emacs [Stallman 1981]. The Emacs code base uses variables to hold configuration options. It would in principle be possible to structure the code base to pass configuration options explicitly, but this would make the code verbose, and require changing many function signatures once a new configuration option gets added to the system.
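
A minimal sketch of the point about configuration options. All my-demo- names are hypothetical; the inner function reads the option directly, so callers can rebind it temporarily without any function signature changing:

```elisp
;; Hypothetical configuration option, dynamically bound because it
;; is defined with `defvar'.
(defvar my-demo-separator ", "
  "String placed between items by `my-demo-join'.")

(defun my-demo-join (items)
  "Join ITEMS, a list of strings, using `my-demo-separator'."
  (mapconcat #'identity items my-demo-separator))

(defun my-demo-report (items)
  "Return a one-line report for ITEMS; knows nothing about the option."
  (concat "items: " (my-demo-join items)))

(my-demo-report '("a" "b"))              ; => "items: a, b"

;; The caller overrides the option for the duration of the `let';
;; `my-demo-report' did not need an extra parameter to pass it along.
(let ((my-demo-separator " | "))
  (my-demo-report '("a" "b")))           ; => "items: a | b"
```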

Using Lexical Binding

Even when lexical binding is enabled, certain variables will continue to be dynamically bound. These are called special variables. Every variable that has been defined with defvar, defcustom or defconst is a special variable (see Defining Variables). All other variables are subject to lexical binding.
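
The difference can be sketched as follows. To keep the example independent of how the surrounding file is loaded, it passes t as the second argument of eval, which evaluates the form with lexical binding enabled; in a real file one would normally set lexical-binding in the first-line cookie instead. The my-demo- names are hypothetical:

```elisp
;; Defined with `defvar', so it is special (always dynamically bound).
(defvar my-demo-special 'global)

(defun my-demo-read-special ()
  "Return the current dynamic value of `my-demo-special'."
  my-demo-special)

;; Even under lexical binding, `let' rebinds a special variable
;; dynamically, so the called function sees the new value.
(eval '(let ((my-demo-special 'rebound))
         (my-demo-read-special))
      t)                                 ; => rebound

;; An ordinary variable is lexical: the closure keeps the binding
;; that was visible where it was created.
(eval '(let ((x 1))
         (let ((f (lambda () x)))
           (let ((x 2))
             (funcall f))))
      t)                                 ; => 1
```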

Backquote

Quote is usually introduced via ' where the reader translates 'SEXP to (quote SEXP), and this expression evaluates to the value SEXP. For example:

'(a (b c d) c)

evaluates to a list whose first element is the symbol a, the second a list consisting of elements b, c, and d, and whose third element is c. It does so in constant time, by always returning the same value already present in the source code returned by the reader.
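
A short sketch contrasting quote with backquote, which builds a fresh list and allows parts of it to be computed. The function name my-demo-constant is hypothetical:

```elisp
;; The quoted list has the same structure as one built by hand.
(equal '(a (b c d) c)
       (list 'a (list 'b 'c 'd) 'c))     ; => t

;; Quote returns the very same object each time it is evaluated.
(defun my-demo-constant ()
  "Return a quoted constant list."
  '(a (b c d) c))
(eq (my-demo-constant) (my-demo-constant)) ; => t

;; Backquote evaluates the parts marked with `,' and splices the
;; parts marked with `,@'.
(let ((n 3))
  `(a (b ,n d) ,@(list 'x 'y) c))        ; => (a (b 3 d) x y c)
```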

Hook

One important aspect of the extensibility Richard Stallman originally conceived for Emacs was the ability to make existing functions run additional code without having to change them, so as to extend their behavior. Emacs supports this at well-defined points called hooks.
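
A minimal sketch of a hook, using the standard add-hook and run-hooks functions. The hook variable and the functions around it are hypothetical:

```elisp
;; Hypothetical hook: a variable holding a list of functions.
(defvar my-demo-save-hook nil
  "Functions run by `my-demo-save' after it has done its work.")

(defvar my-demo-log nil
  "Events recorded by this example, most recent first.")

(defun my-demo-save ()
  "Pretend to save something, then run `my-demo-save-hook'."
  (push 'saved my-demo-log)
  (run-hooks 'my-demo-save-hook))

(defun my-demo-notify ()
  "User-supplied extension that runs after every save."
  (push 'notified my-demo-log))

;; The user extends `my-demo-save' without modifying its definition.
(add-hook 'my-demo-save-hook #'my-demo-notify)
(my-demo-save)
my-demo-log                              ; => (notified saved)
```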

Of course, authors do not always have the foresight to place hooks where users need them, so in 1992, the advice.el package was added to Emacs 19, providing a defadvice macro duplicating a design available in MacLisp and Lisp Machines.

(defadvice eval-region (around cl-read activate)
  "Use the reader::read instead of the original read if cl-read-active."
  (with-elisp-eval-region (not cl-read-active)
                          ad-do-it))

Furthermore, the way the defadvice macro gave access to function arguments did not work with lexical scoping. This was solved in late 2012: as a result of a discussion with a user asking how to put several functions into a single variable, Stefan Monnier developed a new package nadvice.el that can not only combine several functions into a single variable, but uses that to provide the same core features as defadvice but with a much simpler design.

For example, the old advice system had special features to control whether a piece of advice should be compiled, whereas in the new system there is no need for that because a piece of advice is simply a normal function. Similarly, the old advice system had special primitives to get access to the function’s arguments, whereas in the new system, the function’s original arguments are passed as normal arguments to the piece of advice and can hence be accessed without any special construct. The nadvice.el package was released as part of Emacs 24.4. It has proved popular in the sense that it has largely replaced the old defadvice in new code, but rather few packages that used defadvice have been converted to the new system.
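
A minimal sketch of the nadvice.el style, using advice-add and advice-remove. The advised function and the advice are hypothetical; note that the advice is an ordinary function that receives the original function and its arguments as plain parameters:

```elisp
(defun my-demo-greet (name)
  "Return a greeting for NAME."
  (concat "Hello, " name))

;; An :around advice is a normal function: its first argument is the
;; original function, the rest are the original arguments.
(defun my-demo-shout (orig-fun name)
  "Call ORIG-FUN on NAME and upcase the result."
  (upcase (funcall orig-fun name)))

(advice-add 'my-demo-greet :around #'my-demo-shout)
(my-demo-greet "world")                  ; => "HELLO, WORLD"

(advice-remove 'my-demo-greet #'my-demo-shout)
(my-demo-greet "world")                  ; => "Hello, world"
```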

Interactive Functions

(defun forward-symbol (arg)
  "Move point to the next position that is the end of a symbol.
A symbol is any sequence of characters that are in either the word constituent or symbol constituent syntax class.
With prefix argument ARG, do it ARG times if positive, or move backwards ARG times if negative."
  (interactive "^p")
  (if (natnump arg)
      (re-search-forward "\\(\\sw\\|\\s_\\)+" nil 'move arg)
    (while (< arg 0)
      (if (re-search-backward "\\(\\sw\\|\\s_\\)+" nil 'move)
          (skip-syntax-backward "w_"))
      (setq arg (1+ arg)))))

In the above example, p means that the function accepts a prefix argument: the user can type C-u number prior to invoking the function, and the number will be passed to the function as an argument, in this case as arg. The ^ (a more recent addition) has to do with region selection with a pressed shift key; it may lead to the region being activated.
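
A smaller sketch of the same mechanism, with a hypothetical command my-demo-insert-stars. The interactive spec "p" turns the prefix argument into a number, and the function remains callable from Lisp like any other:

```elisp
(defun my-demo-insert-stars (n)
  "Insert N asterisks at point.
Interactively, N is the numeric prefix argument (1 if none is given)."
  (interactive "p")
  (insert (make-string n ?*)))

;; The `interactive' form is what makes the function a command.
(commandp 'my-demo-insert-stars)         ; => t

;; Called from Lisp, the argument is passed explicitly.
(with-temp-buffer
  (my-demo-insert-stars 3)
  (buffer-string))                       ; => "***"

;; Simulating what C-u 4 followed by the command would do.
(with-temp-buffer
  (let ((current-prefix-arg 4))
    (call-interactively #'my-demo-insert-stars))
  (buffer-string))                       ; => "****"
```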

Buffer-local variables

Variables can be both buffer-local and dynamically bound at the same time:

(let ((buffer-file-name "/home/rms/.emacs"))
  (with-current-buffer "some-other-buffer"
    buffer-file-name))

This example will not return /home/rms/.emacs but the buffer-local value of buffer-file-name in the buffer some-other-buffer instead, because with-current-buffer temporarily changes which buffer is current.
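
The same interaction can be reproduced in a self-contained way with temporary buffers. This is a sketch using a hypothetical variable my-demo-label declared with defvar-local, rather than the real buffer-file-name:

```elisp
;; Hypothetical variable that becomes buffer-local whenever it is set.
(defvar-local my-demo-label "default")

(defun my-demo-label-in-other-buffer ()
  "Return the label as seen inside and outside a `let' binding.
The first element is the value in the buffer where the `let' took
place, the second the value seen from a different buffer."
  (with-temp-buffer
    ;; Give this buffer its own value.
    (setq my-demo-label "first")
    ;; `let' temporarily rebinds the value of the current buffer.
    (let ((my-demo-label "let-bound"))
      (list my-demo-label
            ;; Another buffer does not see the let-bound value.
            (with-temp-buffer my-demo-label)))))

(my-demo-label-in-other-buffer)          ; => ("let-bound" "default")
```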

String

Emacs Lisp has, of course, supported string objects from the beginning. Originally, they were just byte arrays. In 1992, during the early development of Emacs 19, this basic type was extended by Joseph Arceneaux with support for text properties.

Each character of a string can be annotated with a set of properties, mapping property names to values, where property names can be any symbol. This can be used to carry information such as color and font to use when displaying the various parts of the string.
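
A brief sketch of text properties on a string, using the standard propertize and get-text-property functions. The helper name is hypothetical:

```elisp
(defun my-demo-inspect-properties ()
  "Build a partly propertized string and inspect it.
Return the face at position 0, the face at position 6, and the
string contents with all properties stripped."
  (let ((s (concat "plain " (propertize "bold" 'face 'bold))))
    (list (get-text-property 0 'face s)
          (get-text-property 6 'face s)
          (substring-no-properties s))))

(my-demo-inspect-properties)             ; => (nil bold "plain bold")
```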

Along with buffer-local variables (Section 4.10), this is one of the cases where the core Emacs Lisp language has been extended with a feature that specifically caters to the needs of the Emacs text editor, to keep track of rendering information. It is more generally useful, of course, and turns strings into a much fancier datatype than in most other languages.

IO

Rather than follow the usual design based on file or stream objects and primitives like open/read/write/close, Emacs Lisp offers only coarser access to files via two primitive functions insert-file-contents and write-region that transfer file contents between a file and a buffer. So all file manipulation takes place by reading the file into a buffer, performing the desired manipulation in this buffer, and then writing the result back into the file.
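
The read, modify, write cycle described above can be sketched as follows. The helper name and the temporary file prefix are hypothetical; the file is created only for the demonstration and deleted afterwards:

```elisp
(defun my-demo-upcase-file-roundtrip ()
  "Write a temporary file, upcase it via a buffer, and return its contents."
  (let ((file (make-temp-file "my-demo-")))
    (unwind-protect
        (progn
          ;; Create the initial contents from a buffer.
          (with-temp-buffer
            (insert "hello\n")
            (write-region (point-min) (point-max) file nil 'silent))
          ;; Read the file into a buffer, edit there, write it back.
          (with-temp-buffer
            (insert-file-contents file)
            (upcase-region (point-min) (point-max))
            (write-region (point-min) (point-max) file nil 'silent))
          ;; Read it once more to check the result.
          (with-temp-buffer
            (insert-file-contents file)
            (buffer-string)))
      (delete-file file))))

(my-demo-upcase-file-roundtrip)          ; => "HELLO\n"
```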

Since this approach does not extend naturally to interaction with external processes or remote hosts, these are handled in a completely different way. Primitive functions spawn a sub-process or open up a connection to a remote host and return a so-called process object. These objects behave a bit like streams, with process-send-string corresponding to the traditional write, but where the traditional read is replaced by execution of a callback whenever data is received from the sub-process or the remote host.
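
A rough sketch of the process-object style, assuming an external cat program is available on the system and an Emacs recent enough to provide make-process. The names my-demo-received and my-demo-echo-through-cat are hypothetical:

```elisp
(defvar my-demo-received ""
  "Output collected from the sub-process by the filter callback.")

(defun my-demo-echo-through-cat ()
  "Send a line to a `cat' sub-process and return what comes back.
Assumes an external `cat' program can be found."
  (setq my-demo-received "")
  (let ((proc (make-process
               :name "my-demo-cat"
               :command '("cat")
               :connection-type 'pipe
               ;; The filter plays the role of `read': Emacs calls it
               ;; whenever the sub-process produces output.
               :filter (lambda (_proc chunk)
                         (setq my-demo-received
                               (concat my-demo-received chunk))))))
    ;; `process-send-string' plays the role of `write'.
    (process-send-string proc "hello\n")
    (process-send-eof proc)
    ;; Let Emacs run the filter until no more output arrives.
    (while (accept-process-output proc 1))
    my-demo-received))

(my-demo-echo-through-cat)   ; expected to be "hello\n" when cat is available
```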

Basic Language Implementation

Byte-code Interpreter

Emacs has two execution engines for Emacs Lisp:

  • One is a very simple interpreter written in C operating directly on the S-expression representation of the source code.
  • The other is a byte-code engine, implemented as a primitive byte-code function that interprets its string argument as a sequence of stack-based byte codes to execute. A compiler, written in Emacs Lisp, translates Emacs Lisp code to that byte-code language.

While Emacs Lisp is basically a "run of the mill" programming language, with some specific functions tailored to the particular use of a text editor, this byte-code language is much less standard since it includes many byte codes corresponding to Emacs Lisp primitives such as forward-char, insert, or current-column.
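
A small sketch of moving a function from one engine to the other with the standard byte-compile function. The function my-demo-square is hypothetical; running M-x disassemble on it afterwards shows the stack-based byte codes:

```elisp
(defun my-demo-square (x)
  "Return X multiplied by itself."
  (* x x))

;; Replace the function definition with its byte-compiled version.
(byte-compile 'my-demo-square)

;; The symbol now holds a byte-code function object ...
(byte-code-function-p (symbol-function 'my-demo-square)) ; => t
;; ... which behaves the same as the original definition.
(my-demo-square 7)                       ; => 49
```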

Tail-Call Optimization

Emacs Lisp does not implement proper tail calls [Clinger 1998]: Each function call consumes stack space.
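
The practical consequence can be sketched with two hypothetical functions. The recursive one is written in tail-call style yet still runs into the nesting limit for large inputs, assuming max-lisp-eval-depth is at or near its default; the loop does not:

```elisp
(defun my-demo-sum-recursive (n acc)
  "Add the integers from 1 to N to ACC using a tail call."
  (if (= n 0)
      acc
    (my-demo-sum-recursive (1- n) (+ acc n))))

(defun my-demo-sum-loop (n)
  "Return the sum of the integers from 1 to N using a loop."
  (let ((acc 0))
    (while (> n 0)
      (setq acc (+ acc n)
            n (1- n)))
    acc))

(my-demo-sum-recursive 100 0)            ; => 5050
(my-demo-sum-loop 100000)                ; => 5000050000

;; The tail call still consumes stack, so a deep recursion signals
;; an error (assuming the default `max-lisp-eval-depth').
(condition-case nil
    (my-demo-sum-recursive 100000 0)
  (error 'nesting-too-deep))             ; => nesting-too-deep
```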

Post-XEmacs period

This section discusses some notable evolutions in the design of Emacs Lisp after 2010 that have not found their way into XEmacs so far.

Lexical Scoping

Dynamic scoping had two main drawbacks in practice:

  • The lack of closures
  • The global visibility of variable names
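
What closures provide can be sketched with a counter generator. The example uses eval with a second argument of t to force lexical binding, so that it behaves the same however the surrounding code is loaded; in a real file one would normally enable lexical-binding in the first-line cookie and use a plain defun. The my-demo- names are hypothetical:

```elisp
;; Define the function from a form evaluated with lexical binding.
(defalias 'my-demo-make-counter
  (eval '(lambda ()
           ;; `count' is private to each closure returned below.
           (let ((count 0))
             (lambda () (setq count (1+ count)))))
        t))

(defun my-demo-counter-values ()
  "Create two counters and return the results of three calls."
  (let ((c1 (my-demo-make-counter))
        (c2 (my-demo-make-counter)))
    (list (funcall c1) (funcall c1) (funcall c2))))

;; Each counter keeps its own state, invisible to other code.
(my-demo-counter-values)                 ; => (1 2 1)
```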

Finally, in 2010, Igor Kuzmin worked on a summer project under the direction of Stefan Monnier, in which he tried a different way of adding lexical scoping to the byte-code compiler. Instead of directly adding support for lexical scoping and closures to the single-pass byte-code compiler code (which required significant changes to the code and was made more complex by the need to fit into a single pass), he implemented a separate pass (itself split into two passes) to perform a traditional closure conversion as a pre-processing step. This freed the closure conversion from the constraints imposed by the design of the single-pass byte-code compiler, making it much easier to implement, and it also significantly reduced the number of changes needed in the byte-code compiler, thus reducing the risk of introducing regressions.

Two years later, Emacs 24.1 was released with support for lexical scoping based on Miles Bader’s lexbind branch combined with Igor’s closure conversion.

Pattern Matching

While working on the lexical-binding feature, Stefan Monnier grew increasingly frustrated with the shape of the code used to traverse the abstract syntax tree, littered with car, cdr carrying too little information, compared to the kind of code he would write for that in statically-typed functional languages with algebraic datatypes.

So the pcase.el package was born, first released as part of Emacs 24.1, and used extensively in the part of the byte-code compiler providing support for lexical scoping.
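
A minimal sketch of the style pcase enables, using a hypothetical evaluator for a tiny expression language. The patterns name the parts of the tree directly instead of reaching for them with car and cdr:

```elisp
(defun my-demo-eval (expr)
  "Evaluate EXPR, a number or a list of the form (add A B) or (neg A)."
  (pcase expr
    ;; A number evaluates to itself.
    ((pred numberp) expr)
    ;; Backquote patterns destructure the list and bind its parts.
    (`(add ,a ,b) (+ (my-demo-eval a) (my-demo-eval b)))
    (`(neg ,a) (- (my-demo-eval a)))
    (_ (error "Unknown expression: %S" expr))))

(my-demo-eval '(add 1 (neg 2)))          ; => -1
```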

CL-Lib

During the development of Emacs 24.3 the issue of better integration of the cl.el package came up again. The main point of pressure was the desire to use cl.el functions within packages bundled with Emacs. Richard Stallman still opposed it, but this time, a compromise was found: replace the cl.el package with a new package cl-lib.el that provides the same facilities, but with names that all use the cl- prefix. This way, the cl-lib.el package does not turn Emacs Lisp into Common Lisp, but instead provides Common Lisp facilities under its own namespace, leaving Emacs Lisp free to evolve in its own way.
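
A short sketch of what the prefixed names look like in use; the functions shown are standard parts of cl-lib.el:

```elisp
(require 'cl-lib)

;; Common Lisp's remove-if-not and evenp, under the cl- prefix.
(cl-remove-if-not #'cl-evenp '(1 2 3 4)) ; => (2 4)

;; The loop macro is available as `cl-loop'.
(cl-loop for x in '(1 2 3)
         collect (* x x))                ; => (1 4 9)
```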