Zachary Weinberg
CS 260
31 January 2002

Project proposal

This project will investigate novel methods for visualizing file system hierarchies (file trees) together with their revision history.

The problem

The source code for a software application is commonly stored as a complex tree of files, with the history of each file recorded by a version control system; CVS is one of several such systems. If the application has been under continuous development for even a few years, its history becomes too large to grasp all at once, and machine assistance is needed to pick out the interesting events.

Existing software for sorting through revision history is fairly weak. In the simplest case, most systems have built-in tools to display the change log entries for a single file. These show only the comments written by the person who applied each change; seeing the change itself takes more work. There is often no way to associate related changes that were made to several files at once, and no way to filter out changes that are not currently of interest.
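Where a system records only per-file log entries, related changes can sometimes be reassociated after the fact. A common heuristic treats per-file revisions as one logical change when they share an author and log message and were committed within a short time of each other. The sketch below illustrates that heuristic; the `FileRevision` record, the field names, and the five-minute window are all assumptions for illustration, not the format of any particular system.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class FileRevision:
    # Hypothetical record for one per-file log entry.
    path: str
    revision: str
    author: str
    message: str
    timestamp: int  # seconds since the epoch


def group_changesets(revisions, window=300):
    """Group per-file revisions into logical changesets.

    Revisions join the same changeset when they share an author and log
    message and each one falls within `window` seconds of the previous one.
    """
    # Sorting by (author, message, time) makes matching entries adjacent.
    ordered = sorted(revisions, key=lambda r: (r.author, r.message, r.timestamp))
    changesets = []
    for _, group in groupby(ordered, key=lambda r: (r.author, r.message)):
        current = []
        for rev in group:
            # A long gap means the same message was reused for a later change.
            if current and rev.timestamp - current[-1].timestamp > window:
                changesets.append(current)
                current = []
            current.append(rev)
        changesets.append(current)
    # Present changesets in order of their earliest revision.
    changesets.sort(key=lambda cs: cs[0].timestamp)
    return changesets


# Hypothetical log data.
log = [
    FileRevision("src/parse.c", "1.12", "alice", "Fix overflow", 1000),
    FileRevision("src/parse.h", "1.4", "alice", "Fix overflow", 1030),
    FileRevision("src/lex.c", "1.7", "bob", "Speed up lexer", 1100),
    FileRevision("src/parse.c", "1.13", "alice", "Fix overflow", 9000),
]
for cs in group_changesets(log):
    print([(r.path, r.revision) for r in cs])
# [('src/parse.c', '1.12'), ('src/parse.h', '1.4')]
# [('src/lex.c', '1.7')]
# [('src/parse.c', '1.13')]
```

The heuristic is only a guess: two unrelated commits with the same author and message inside the window will be merged, which is one reason it is better for the system to record changesets directly.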

When the history is not a simple linear progression of changes but has been "branched" for whatever reason, the problem becomes harder. For instance, one frequently wants to know whether a specific change appears on each of several branches, but some systems keep no record at all of when changes were copied between branches; even those that do may not record whether just one modification was copied or the entire branch was merged back in.
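If a system did record both kinds of event, the "which branches contain this change" question would become a graph search. The sketch below assumes, hypothetically, that each commit lists its parent commits (a merge has more than one parent) and that a copied change carries a `copied_from` link to the original; CVS records neither. A change is then on a branch if it is an ancestor of the branch head, or if some ancestor was copied from it.

```python
from collections import deque


def reachable(parents, head):
    """Return every commit reachable from `head` by following parent links."""
    seen = {head}
    queue = deque([head])
    while queue:
        commit = queue.popleft()
        for parent in parents.get(commit, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def branches_containing(change, heads, parents, copied_from=None):
    """Report, for each branch, how `change` reached it, if at all.

    Maps each branch name to "ancestor" (the change itself is in the branch's
    history, e.g. via a full merge), "copied" (some commit in the history
    records that it was copied from the change), or None.
    """
    copied_from = copied_from or {}
    result = {}
    for branch, head in heads.items():
        history = reachable(parents, head)
        if change in history:
            result[branch] = "ancestor"
        elif any(copied_from.get(c) == change for c in history):
            result[branch] = "copied"
        else:
            result[branch] = None
    return result


# Hypothetical history: trunk a-b-c; "feature" merges c back in at y;
# "release" receives only a copy of c as r2.
parents = {"a": [], "b": ["a"], "c": ["b"], "x": ["a"], "y": ["x", "c"],
           "r1": ["a"], "r2": ["r1"]}
copied_from = {"r2": "c"}
heads = {"trunk": "c", "feature": "y", "release": "r2", "old": "a"}
print(branches_containing("c", heads, parents, copied_from))
# {'trunk': 'ancestor', 'feature': 'ancestor', 'release': 'copied', 'old': None}
```

Keeping merges and copies as distinct kinds of link is what lets the answer distinguish "the whole branch was merged" from "just this modification was copied".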

Finally, it would be desirable to integrate history browsing with cross-reference utilities such as LXR or cscope. The only tool I am aware of that even begins to attempt this is ClearCase.

Existing partial solutions

The Subversion project attempts to address some of the more egregious failings of CVS, including its poor handling of branching and its nonexistent handling of file renames. As yet, it has no visualization tools.

Another CVS replacement, BitKeeper, does have some impressive visualization tools, making it easy to walk back in time and to associate changes across files. Unfortunately, it does not support branches (branch support has been "six months away" for three years).

The Mozilla project built an interesting tool known as Bonsai, which can perform sophisticated queries against a CVS repository.

Two research papers propose to ease the difficulty of finding changes by recording changes at a much finer "grain": Magnusson et al., Fine-Grained Revision Control for Collaborative Software Development; Wagner and Graham, Efficient Self-Versioning Documents.

Ideas from other domains

The problem of displaying the interconnections between files of source code is analogous to the problem of displaying the interconnections in hypertext. Ted Nelson has analyzed this in depth; see his article on what he calls xanalogical structure. He has also proposed a system called ZigZag, which offers a clever way to represent a multidimensional graph, and a revision tree is exactly such a graph.

David Gelernter's "lifestreams" concept is intended as a replacement for the whole notion of hierarchical file systems, but it could also provide a better way to represent the history of a source tree. In particular, the "substreaming" mechanism offers a natural interface for finding changes that match given criteria, and the notion of an explicit, adjustable "current time" could inspire a better way to browse through a single file's history.
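To make the analogy concrete, here is a toy sketch of how substreams and a movable "current time" might apply to change history. It is not Gelernter's design; the `Event` record and the `Stream` class with its `substream`, `seek`, and `visible` methods are hypothetical names chosen for illustration. A substream is simply a filtered view that keeps the time ordering, and moving "now" changes which events are visible without changing the filter.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical record for one change to one file.
    timestamp: int
    path: str
    author: str
    message: str


class Stream:
    """A time-ordered sequence of change events with an adjustable 'now'."""

    def __init__(self, events, now=None):
        self.events = sorted(events, key=lambda e: e.timestamp)
        # "now" defaults to the timestamp of the latest event.
        if now is None:
            now = self.events[-1].timestamp if self.events else 0
        self.now = now

    def substream(self, predicate):
        """Return a new stream holding only the events matching `predicate`."""
        return Stream([e for e in self.events if predicate(e)], now=self.now)

    def seek(self, timestamp):
        """Move the current time; the set of events is unaffected."""
        self.now = timestamp
        return self

    def visible(self):
        """Events at or before the current time, oldest first."""
        stamps = [e.timestamp for e in self.events]
        return self.events[:bisect_right(stamps, self.now)]


# Hypothetical history: browse parse.c's changes as of time 150.
history = Stream([
    Event(100, "parse.c", "alice", "Fix overflow"),
    Event(200, "lex.c", "bob", "Speed up lexer"),
    Event(300, "parse.c", "alice", "Handle EOF"),
])
parse_c = history.substream(lambda e: e.path == "parse.c")
print([e.message for e in parse_c.seek(150).visible()])  # ['Fix overflow']
```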

The proposal

In this project I will research the partial solutions and ideas above in more detail, and investigate other possibilities. I will then design and mock up an interface that incorporates these ideas, and subject it to a first pass of user testing to see how well it works in practice. Time permitting, I will complete an implementation.