Rethinking Lisp's format-strings

Motivation

Having format strings in C is one thing. C's variadic arguments carry no runtime type information, so passing a format string that tells printf "how to deal with each object I'm sending you next" is at least justifiable. Lisp objects carry their types with them, so Lisp has never had that problem, and therefore never had a real need for format strings.

And let's admit it, Lisp's format strings are byzantine. Every combination of @, :, and directive letter is made to mean something, regardless of how odd and non-orthogonal that thing may be. That sort of ultra-succinct, gotta-memorize-what-it-means syntax is antithetical to Lisp, IMO. Not even C crams meaning together that badly. It's as semantically crowded as Perl.

Format strings have directives that recapitulate mapcar (~{...~}), directives to print Roman numerals (~@R), and directives to print English-language plurals (~P). I could go on, but the point is that format strings are a bloated mess, and we can kick them out of Lisp.

A proposal

In the broad view, the proposal is to handle formatting much the same way Lisp does everything else. Lisp already has the perfect data structure for all that sequential formatting info: Lists. How'd we ever forget about them?

So perhaps calling this a proposal is misleading, because all the elements are already in Lisp. I had no trouble writing proof-of-concept code without changing anything deep. So this is really a proposal to do things by certain conventions, along with some specific functions to support them.

String-trees

A string-tree is recursively defined as either a string or a list of string-trees. It follows that flattening a string-tree always produces a list of strings.
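To make the definition concrete, here is a minimal Common Lisp sketch of a string-tree predicate and a flattener. The names string-tree-p and flatten-string-tree are hypothetical, chosen for illustration; the predicate also rejects dotted lists, which the definition implies but does not say outright.

```lisp
;;; Hypothetical helpers illustrating the string-tree definition.

(defun string-tree-p (object)
  "True if OBJECT is a string, or a proper list whose elements are all string-trees."
  (typecase object
    (string t)
    (null t)                                   ; the empty list of string-trees
    (cons (and (string-tree-p (car object))
               (listp (cdr object))            ; reject dotted lists like ("a" . "b")
               (string-tree-p (cdr object))))  ; the tail is itself a list of string-trees
    (t nil)))

(defun flatten-string-tree (tree)
  "Return a fresh list of the strings in TREE, in left-to-right order."
  (let ((result '()))
    (labels ((walk (node)
               (if (stringp node)
                   (push node result)
                   (dolist (child node)
                     (walk child)))))
      (walk tree))
    (nreverse result)))
```

For example, flattening ("a" ("b" ("c")) "d") gives ("a" "b" "c" "d"), and flattening a bare string gives a one-element list.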

I propose that formatting functions be required only to return string-trees. Any final outputter would flatten the tree to a list of strings and output those strings in order.

This convention makes it simple to write composable object-to-printable behavior: just gather the string-trees in order into a list and return that list. mapcar and some of its friends can play too.
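As a sketch of that composition, the two hypothetical formatters below return string-trees and never concatenate anything; the second builds on the first with mapcar. princ-to-string stands in for the proposed obj-to-string-tree, which this section doesn't define yet.

```lisp
;;; Hypothetical formatters following the string-tree convention.
;;; PRINC-TO-STRING stands in for the proposed OBJ-TO-STRING-TREE.

(defun render-pair (key value)
  "Render KEY = VALUE as a string-tree; nothing is concatenated."
  (list key " = " (princ-to-string value)))

(defun render-alist (alist)
  "Render each (KEY . VALUE) entry of ALIST as one line, using MAPCAR."
  (mapcar (lambda (entry)
            (list (render-pair (car entry) (cdr entry))
                  (string #\Newline)))
          alist))
```

Calling (render-alist '(("width" . 80) ("height" . 24))) returns a nested list whose strings, read left to right, spell out two lines of output; any outputter that flattens string-trees can print it directly.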

The string-tree convention is efficient. Flexibly building output from two or more strings means joining the strings into a common object. Joining two strings in a cons cell is the most efficient general-purpose way of doing that: it allocates one cons and copies no characters, whereas concatenation copies every character of both strings.

The string-tree convention is extensible. If some other type of data should be part of the output stream, e.g. multibyte characters or binary font objects, it is easy to widen the leaf type (currently string) to include it. Nothing that doesn't directly handle the new functionality would have to change.
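One way to get that extensibility, sketched under the assumption that leaf output goes through a generic function, is shown below. write-leaf and output-tree are hypothetical names, and characters stand in for the richer leaf types mentioned above.

```lisp
;;; Sketch: widen the leaf type of a string-tree without touching code that
;;; merely builds trees. WRITE-LEAF and OUTPUT-TREE are hypothetical names.

(defgeneric write-leaf (leaf stream)
  (:documentation "Send one leaf of a string-tree to STREAM."))

(defmethod write-leaf ((leaf string) stream)
  (write-string leaf stream))

;; Adding a new leaf type is one new method; tree builders are unchanged.
(defmethod write-leaf ((leaf character) stream)
  (write-char leaf stream))

(defun output-tree (tree stream)
  "Walk TREE depth-first, left to right, writing every leaf to STREAM."
  (if (listp tree)
      (dolist (child tree)
        (output-tree child stream))
      (write-leaf tree stream)))
```

Only the outputter knows about leaf types, so code that merely gathers subtrees into lists keeps working when a method is added.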

print-string-tree

print-string-tree would take a string-tree, flatten it, and print the resulting strings on a given stream, in order and with nothing added. It would not control formatting.
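A minimal sketch of print-string-tree follows. It walks the tree in place rather than consing up the flattened list first, which prints the same strings in the same order. The optional stream argument defaulting to *standard-output*, and returning the tree the way print returns its object, are assumptions.

```lisp
(defun print-string-tree (tree &optional (stream *standard-output*))
  "Write the strings of TREE to STREAM in order, adding nothing; return TREE."
  (labels ((walk (node)
             (if (stringp node)
                 (write-string node stream)   ; a leaf: emit it verbatim
                 (dolist (child node)         ; a list: visit children left to right
                   (walk child)))))
    (walk tree))
  tree)
```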

obj-to-string-tree

obj-to-string-tree is similar to write-to-string, but returns a string-tree rather than a string. Note that this is an easier contract to meet, because a string is always a string-tree: write-to-string is, in effect, already an implementation of obj-to-string-tree.

obj-to-string-tree takes the same keyword arguments as write (other than :stream), just as write-to-string does.
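Because a string already is a string-tree, the smallest conforming obj-to-string-tree is a thin wrapper that forwards its keyword arguments to write-to-string. This is a sketch, not a proposed final implementation; a real one might return a tree for large objects to avoid building one big string.

```lisp
;;; Minimal sketch: since a string is a string-tree, WRITE-TO-STRING
;;; already satisfies the contract. KEYS are passed through unchanged.

(defun obj-to-string-tree (object &rest keys)
  "Return a string-tree for OBJECT; KEYS are WRITE-TO-STRING's keywords."
  (apply #'write-to-string object keys))
```

For instance, (obj-to-string-tree 5 :base 2 :radix nil) gives "101", and (obj-to-string-tree "hi" :escape nil) gives "hi" without the quotes.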

fmt-lambda

I introduce fmt-lambda first because it is more similar to conventional format strings and it's simpler, even though I find it less useful than format by key.

The first argument to fmt-lambda would be a list of string-trees and lambda forms of one argument. Each lambda form would be applied, in order, to the next of the remaining arguments, much as format directives consume arguments, and the results and string-trees would be collected in order into a list, which will be a string-tree.

It is an error if any lambda form does not accept 1 argument.

When safety is optimized for, it is an error if any lambda form produces a result that is not a string-tree, or if any element of the first argument is neither a lambda form nor a string-tree.
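Here is a sketch of fmt-lambda as an ordinary function, assuming the "lambda forms" arrive as function objects and that each one consumes the next remaining argument. The safety checks described above are omitted, and what happens to leftover arguments is left unspecified here, as in the text.

```lisp
;;; Sketch of FMT-LAMBDA. Functions in TEMPLATE stand in for the proposal's
;;; "lambda forms"; the safety checks on results and elements are omitted.

(defun fmt-lambda (template &rest args)
  "Copy string-trees in TEMPLATE as-is; call each function in TEMPLATE on
the next element of ARGS. Return the results, in order, as a list."
  (loop for item in template
        collect (cond ((not (functionp item)) item)   ; a string-tree: keep it
                      ((null args)
                       (error "FMT-LAMBDA: more functions than arguments"))
                      (t (funcall item (pop args))))))
```

For example, (fmt-lambda (list "x = " #'princ-to-string ", y = " #'princ-to-string) 3 4) returns ("x = " "3" ", y = " "4").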

Format by key

In essence, this is formatting by name rather than by position.

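The function `fmt' would take a list of string-trees and other objects. Objects other than lists and strings would be treated as if obj-to-string-tree had been applied to them. Each value thus sits in the list exactly where it is printed, named by whatever variable or expression produces it, instead of being matched to a directive by position.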

When safety is optimized for, it is an error if any element is a list but not a string-tree.

Note that any element can be formatted in any way simply by passing it to the appropriate object-to-string (or object-to-string-tree) function and putting the result in the list in its place.
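A minimal sketch of fmt follows. It defines obj-to-string-tree as a thin wrapper over write-to-string (an assumption) so the block stands alone, and it skips the safety check on list elements.

```lisp
;;; Sketch of FMT. OBJ-TO-STRING-TREE is assumed to be a thin wrapper over
;;; WRITE-TO-STRING; the safety check on list elements is omitted.

(defun obj-to-string-tree (object &rest keys)
  "Return a string-tree for OBJECT; KEYS are WRITE-TO-STRING's keywords."
  (apply #'write-to-string object keys))

(defun fmt (items)
  "Pass strings and lists in ITEMS through unchanged; replace any other
object with its OBJ-TO-STRING-TREE rendering."
  (mapcar (lambda (item)
            (if (or (stringp item) (listp item))
                item
                (obj-to-string-tree item)))
          items))
```

Usage: with name bound to "Ada" and unread bound to 3, (fmt (list "Dear " name ", you have " unread " new messages.")) returns ("Dear " "Ada" ", you have " "3" " new messages."). To print 5 in binary instead, put (obj-to-string-tree 5 :base 2 :radix nil) in the list in its place.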

Conciseness

One area where this proposal is at a disadvantage is conciseness. Why? Because its function names have to share a namespace with all other Lisp symbols, while format directives live in a private namespace of single characters.

So I propose that the common functions have abbreviated names.

print-string-tree would be abbreviated `prt'.
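An abbreviation like prt can be an ordinary forwarding function. The sketch below includes a minimal print-string-tree so it stands alone; declaiming prt inline, so that a compiler may open-code the call, is an assumption rather than part of the proposal.

```lisp
;;; Minimal PRINT-STRING-TREE, included so this block stands alone.
(defun print-string-tree (tree &optional (stream *standard-output*))
  "Write the strings of TREE to STREAM in order, adding nothing; return TREE."
  (labels ((walk (node)
             (if (stringp node)
                 (write-string node stream)
                 (dolist (child node)
                   (walk child)))))
    (walk tree))
  tree)

;;; The abbreviation is a forwarding function. Declaiming it inline lets a
;;; compiler open-code the call; that choice is an assumption.
(declaim (inline prt))
(defun prt (tree &optional (stream *standard-output*))
  "Abbreviation for PRINT-STRING-TREE."
  (print-string-tree tree stream))
```

A forwarding function, unlike copying the function object into prt's definition, keeps following print-string-tree if that is later redefined.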

obj-to-string-tree would have 3 forwarding functions, named after the defunct format control strings:

Package

Printing and formatting were part of Lisp so early that they were never given a package of their own; they live directly in COMMON-LISP. This change is a good opportunity to package them.

But natural-language-specific features like English plurals would move into their own package specific to that language.
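As an illustration of moving ~P out of the core, here is a sketch of a language-specific package. The package name english-format and the function plural are hypothetical. plural follows ~P's rule (plural unless the number is eql to 1), but unlike ~P it also prints the number, and it returns a string-tree.

```lisp
;;; Hypothetical package for English-specific output helpers.
(defpackage #:english-format
  (:use #:common-lisp)
  (:export #:plural))

(in-package #:english-format)

(defun plural (n singular &optional (plural-form (concatenate 'string singular "s")))
  "Return a string-tree: N, a space, then SINGULAR if N is EQL to 1, else PLURAL-FORM."
  (list (princ-to-string n) " "
        (if (eql n 1) singular plural-form)))

(in-package #:cl-user)
```

Irregular plurals are passed explicitly, e.g. (english-format:plural 2 "mouse" "mice") returns ("2" " " "mice").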