Rethinking package organization

Terminology

As used here, "package" means a unit of software that "goes together": almost anything you'd download and make, or buy shrink-wrapped and install. That covers almost any application you can think of, and all sorts of add-ons. It excludes some boot software, the main system directory tree itself, everything in the home directory tree, the device files in /dev/*, and probably a few other odds and ends that aren't like real software.

Motivation

Let's admit it, Unix systems (and other systems) get incredibly messy. Keeping some semblance of order is a constant chore, and even with the best efforts, /bin, /etc, /usr/doc, etc are fearsome to behold unfiltered. Finding all the files a package owns is a chore, and guaranteeing that you've found them all is practically impossible. The inverse task, finding what package owns a file, is just as bad.

ISTM this situation comes about because there is no overall design for Unix systems. There is, you may object, a scheme where /bin holds binaries, and so on. That would be tolerable for tiny systems with maybe half a dozen packages (still a bit of a chore, but that is offset by not having to look for */bin/* and so forth in multiple places). But it simply does not scale to modern systems.

There are also security issues. Letting a package do as it pleases in shared directories is less safe than confining it to a directory of its own. Admittedly, this is not yet a serious problem.

Uninstalling is also an issue. How many packages provide routines to uninstall themselves? How many miss a few files? Uninstalling could be much easier: "rm -rf PACKAGE-DIR".

Overview

A package would be contained entirely within a directory tree, which would contain nothing but that package. "Install" would disappear. In place of install, packages would simply supply a config file that would indicate what canonical things they provided (executables, info docs, html docs, man pages, libs, source modules, source headers, etc).

The details

How it would work

First, packages must expect to be built this way. For most existing packages, leaving out "make install" will do it.

To ensure that the package doesn't write outside its own tree, none of the building tools (compilers, make, makeinfo etc) would support building anything outside the given directory tree, not even indirectly thru ".." or symbolic links. This is not to suggest that many packages are malicious, but many do try to build outside their given directory, and in any case making your system more secure is good.

However, existing tools don't support this. So this would be ensured by building in an environment that doesn't allow writing outside a given tree: in effect, building in a "sandbox".

This could be ensured by creating a temporary user, `builder', who was not permitted to write outside that tree. Tools run by that user would inherit those permissions and would not be able to exploit loopholes, such as making symbolic links (ln) to other directories and then following them. At the end of the build process, the directory tree would be recursively chown'ed to some other id that properly owned the package.

One loophole that needs plugging is that `builder' should not be able to write even to world-writeable resources it does not own.
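The containment rule described above (nothing gets written outside the package tree, not even indirectly thru ".." or symbolic links) can be illustrated with a small check. This is only a sketch of the rule itself, not of the `builder' user mechanism, which relies on file permissions. The function name is_inside_tree is hypothetical, and Python is used purely for illustration.

```python
import os

def is_inside_tree(tree_root, candidate):
    """Return True if candidate resolves to a location inside tree_root.

    Hypothetical helper: '..' components and symbolic links are resolved
    first, so neither can be used to step outside the tree.
    """
    root = os.path.realpath(tree_root)
    # Relative candidates are interpreted relative to the package tree;
    # absolute candidates are taken as they are.
    target = os.path.realpath(os.path.join(root, candidate))
    # The target is inside the tree only if the tree root is a path prefix.
    return os.path.commonpath([root, target]) == root
```

A build wrapper could refuse any output path for which this returns False. It is a check on names only, so it complements the permission-based sandbox rather than replacing it.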

How the system would find a package's stuff

The natural first question about such a design is how it would handle various search-path type things: path, infopath, man pages, doc files, include files. In the status quo, these things are frequently found because they are put into big directories that serve the entire system, like /usr/bin, /usr/man, and /usr/doc.

Inside each package's directory would be a canonically-named overview file that indicated where various pieces were located: executables, doc files, info files, X resource subfiles, X wmconfig files, etc. The suggested format, for easy parsing and manipulation, is s-expressions:

      (
	  (executables "bin/*")
	  (info-docs   "texi/*.info")
	  (X-resources "config/X/*"))

How this info is used would be up to the host system. EG, one system might build a persisting PATH, /usr/info/dir, and so forth from that information. Another system might read all the overview files every time it was booted. Another system might collect the information in one place in its own format and work from that.
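As a rough illustration of the first approach, here is a sketch that reads an overview in the s-expression format shown above and derives a search path from it. It assumes entries are flat (a kind followed by quoted glob patterns), which is all the example shows. The names parse_overview, files_of_kind and search_path are hypothetical, and Python is used only for illustration.

```python
import glob
import os
import re

# Tokens: an opening paren, a closing paren, a quoted string, or a bare symbol.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')

def parse_overview(text):
    """Parse overview text into {kind: [glob patterns]} (flat entries only)."""
    tokens = TOKEN.findall(text)
    if len(tokens) < 2 or tokens[0] != "(" or tokens[-1] != ")":
        raise ValueError("overview must be one parenthesised list")
    entries = {}
    i, end = 1, len(tokens) - 1
    while i < end:
        if tokens[i] != "(":
            raise ValueError("expected '(' before %r" % tokens[i])
        close = tokens.index(")", i)
        body = tokens[i + 1:close]
        if close >= end or not body or "(" in body:
            raise ValueError("malformed entry")
        patterns = [p.strip('"') for p in body[1:]]
        entries.setdefault(body[0], []).extend(patterns)
        i = close + 1
    return entries

def files_of_kind(package_dir, entries, kind):
    """List the files in package_dir matching the patterns for one kind."""
    base = glob.escape(package_dir)  # the directory name itself is literal
    found = []
    for pattern in entries.get(kind, []):
        found.extend(glob.glob(os.path.join(base, pattern)))
    return sorted(found)

def search_path(packages, kind="executables"):
    """Build a PATH-style string from {package_dir: entries}."""
    dirs = []
    for package_dir, entries in packages.items():
        for name in files_of_kind(package_dir, entries, kind):
            directory = os.path.dirname(name)
            if directory not in dirs:
                dirs.append(directory)
    return os.pathsep.join(dirs)
```

A host system that preferred to read the overview files at boot, or to keep the information in its own format, would reuse only the parsing step.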

This design is easily extensible to non-standard requirements, EG for sub-packages of a package (eg elisp packages for emacs) or for systems with new capabilities (perhaps virtual-reality docs would be common in 5 years, or natural-language interaction docs). One extension is to use a different type of overview file with a different name. Another is to allow extensions to the overview format.

Allocation

Sometimes it could make sense for different "parts" of a package, eg source or logs, to be stored on a particular disk, partition, server, etc. EG, you might want all the binaries to be stored on one partition, because they're frequently read. All the doc files might be on a slower disk: they're not read very frequently, but when they're needed, they have to be available.

The situation is analogous to different types of allocation in a programming language that manages memory manually. ISTM it's possible to handle it in an analogous way.

The partitions etc in question would have "alloc" directories that would indirectly hold the relevant parts of the package. These alloc directories would not be directly installed into. Instead, the build would request a fresh subdirectory in that directory, and would symbolically link to it. In the normal course of system-use, nothing would ever directly read or write to any alloc subdirectory, but would only access it thru the symbolic link from the package directory tree.

NB, this is a deviation from the current build process. A build would need to somehow indicate that it was building that sort of resource. The most backward-compatible way of supporting alloc is to look at the overview file after the build proper is finished, move the relevant files or tree to the target partition, and replace them with symbolic links.
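The backward-compatible approach can be sketched as a post-build step: move one part of the package into a fresh subdirectory of an alloc directory and leave a symbolic link in its place. The function name relocate_to_alloc is hypothetical, the fresh subdirectory is simply given a unique generated name, and how the relevant part is chosen from the overview file is left out. Python is used only for illustration.

```python
import os
import shutil
import tempfile

def relocate_to_alloc(package_dir, relative_part, alloc_dir):
    """Move package_dir/relative_part under alloc_dir, leaving a symlink.

    Hypothetical helper. Returns the new physical location of the part.
    """
    relative_part = os.path.normpath(relative_part)
    source = os.path.join(package_dir, relative_part)
    if os.path.islink(source):
        raise ValueError("already relocated: %s" % source)
    # "Requesting a fresh subdirectory" is modelled as creating a uniquely
    # named directory inside the alloc directory.
    slot = os.path.abspath(tempfile.mkdtemp(dir=alloc_dir))
    target = os.path.join(slot, os.path.basename(relative_part))
    # shutil.move falls back to copy-and-delete across filesystems.
    shutil.move(source, target)
    # Everything else keeps using the path inside the package tree.
    os.symlink(target, source)
    return target
```

After this step, readers of the package still go thru the package directory tree. Only the link knows where the files physically live.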

Per-user resources

Another natural question about this design is, what about per-user resources? Obviously most or all per-user things (config files, etc) belong in the users' home directories or subtrees thereof. Is this an exception to the "no outside building" rule and if it is, is it a good one?

It may be an issue for systems with just one or a handful of users. When such systems want to erase a package wholly (not just upgrade), its resources aren't totally reclaimed if it leaves configs etc in each home directory. But that's extremely rare and not burdensome, so IMO it's best to just make the exception.

However, it would make sense for packages' overview files to also indicate exactly what home-directory resources they "own". If nothing else, this could support a uniform "what-owns FILE" command. So the overview file (above) could contain a line like:

      (per-user    "foo/*")

which would indicate that it used the subdirectory "foo" in a user's home directory.
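A uniform "what-owns FILE" command for home directories could then be little more than a pattern match against the per-user entries. The sketch below assumes the per-user patterns have already been collected into a mapping from package name to patterns. The function name what_owns and the package names in the usage note are hypothetical, and Python is used only for illustration.

```python
import fnmatch

def what_owns(relative_path, per_user_patterns):
    """Return the packages whose per-user patterns match a path.

    relative_path is relative to a user's home directory, eg "foo/config".
    per_user_patterns maps a package name to its list of per-user patterns.
    Note that fnmatch lets "*" match across "/" as well, so "foo/*" covers
    the whole "foo" subtree.
    """
    owners = []
    for package, patterns in per_user_patterns.items():
        if any(fnmatch.fnmatchcase(relative_path, p) for p in patterns):
            owners.append(package)
    return sorted(owners)
```

With a mapping such as {"foo-pkg": ["foo/*"]}, asking about "foo/config" would name foo-pkg. If two packages claimed the same subdirectory, both would be reported.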

This would allow name clashes, but ideas to defeat name clashes basically involve mimicking the physical or logical layout of the overall package organization. This means that packages mustn't be moved, that packages must know how the system lays them out, that users must know how the system lays packages out, and that the user must do more navigating in their own home directory. The drawbacks are too serious and the name clashes too rare.

Loading the basics

Of course, a system has to bootstrap somehow. Some parts of the system would have to be loaded before the system could be controlled in this manner.

However, even most of the code to control the system could be arranged in this format, and once it was running, could obey its own rules.

What bootstrapping code absolutely must execute before package subtrees can be respected? A boot image. A path to find the start-up binaries. Not much else.

Would where to unpack src files etc be an issue?

It would be "clean" to unpack into the same directory tree that one builds in. But ISTM it isn't necessary for this design to work.

One approach is to consider unpacking a stage of building, and thus allow it to request allocations as above. The unpack and make stages could be done at the same time or at separate times.

Issue: Shared support

Suppose that two packages share some extensions. EG, two packages both use the Guile scripting language, and sometimes new Guile files are useful for both.

We definitely don't want to put shared stuff into one package's directory. Part of the idea with this design is that what a package owns is coterminous with a single directory tree, and may be deleted without affecting any other package. For similar reasons, we don't want to simply make a shared link to that directory, which also risks accidentally deleting the shared support.

So shared support would have to go into its own directory tree. That makes some inroads on the idea that a package is coterminous with its support. But a single package can't realistically be in too many pieces anyway. If a package had a dozen types of shared resources, all of which might realistically be shared with other packages of equal standing, someone would have split it up already, as well they should.

So ISTM there should be no special structural provision for shared resources. In the worst case, it's no great burden. In general, it conforms to the actual structure of the total software better.

Shared resources present more of an issue wrt finding them when building than wrt placing them, but that is really an autoconf / resource-class / "what-provides X" sort of issue, not a package structure issue. Placing packages in a more organized manner can only help this.