Missing History

From 2017.06.28 onward, this Fossil repository has the full checkin-by-checkin history of the MySQL++ project, plus material, such as wiki pages like this one, that was never available in the old Subversion repository.

Prior to that date, the repository contains every released version of MySQL++ from the project's founding in 1998 with version 0.64.1.1.a up until the 3.2.3 release made at the end of 2016.

What Happened?

Our prior version control host ceased operations without notifying me via any channel I pay attention to, and they didn't offer a method to download the archived Subversion repository after the fact. That left me with only the release tarballs and a few local version checkouts from which to reconstruct the project history.

I wrote a conversion script that unpacked each release tarball in turn, removed all generated files according to a set of ignore rules, then checked the differences relative to the prior tarball into a fresh Fossil repository. This gives us a release-by-release commit history prior to 2017, with a single exception: the version 1.7.33 tarball I have archived is damaged, so I elected not to check that partial release into this Fossil repository, causing the revision history to skip from version 1.7.32 to 1.7.34.
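
The filtering and checkin steps of such a conversion can be sketched roughly as follows. This is a minimal illustration, not the actual conversion script: the glob patterns, the function names, and the commit message format are all assumptions made up for the example, and the Fossil commands are only built as lists, never run.

```python
import fnmatch

# Hypothetical ignore rules for illustration; the real list is the
# project's own ignore rule list, which is not reproduced here.
IGNORE_GLOBS = ["*.o", "Makefile", "doc/html/*", "*~"]


def is_ignored(path, globs=IGNORE_GLOBS):
    """Return True if the path or its base name matches any ignore glob."""
    name = path.rsplit("/", 1)[-1]
    return any(
        fnmatch.fnmatchcase(path, g) or fnmatch.fnmatchcase(name, g)
        for g in globs
    )


def files_to_check_in(paths, globs=IGNORE_GLOBS):
    """Drop generated and junk files, keeping only source material."""
    return sorted(p for p in paths if not is_ignored(p, globs))


def checkin_commands(version):
    """Fossil commands to record one release; returned, not executed.

    The commit message format is an assumption for this sketch.
    """
    return [
        ["fossil", "addremove"],
        ["fossil", "commit", "-m", f"Version {version}"],
    ]


unpacked = [
    "lib/query.cpp", "lib/query.o", "Makefile", "lib/Makefile",
    "doc/html/index.html", "README.txt", "mysql++.bkl",
]
kept = files_to_check_in(unpacked)
```

Running the filter once per tarball, in release order, and then issuing the two commands inside an open checkout would leave each commit holding only the differences from the prior release.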

Because Fossil is a distributed version control system designed around a philosophy of ensuring that each clone diverges as little as is practical from the other clones, the only way this can ever happen again is if everyone who has a clone of MySQL++'s repository deletes all their copies. So long as there is at least one person on the planet with an up-to-date clone, the MySQL++ development history will be preserved.

The Value of History

If I'd chosen to migrate the MySQL++ project to Fossil before losing access to the old Subversion repo, I'd have a checkin-by-checkin conversion instead of this present release-level conversion, the primary benefit of which would be to get all of the pre-2017 checkin comments. The best we're left with is the ChangeLog.md file, which was distilled from that comment stream.

The thing is, that commit comment stream is largely a matter of historical curiosity. If I could wave a magic wand and go back in time to do a commit-level conversion before the old repo host went down, the resulting Fossil repository would be larger, but it wouldn't be appreciably more useful. MySQL++ long since became stable enough that old versions are of very little practical interest. Current versions build and work on pretty much every major OS released in the past 10 years or more, and older OSes often have a contemporary MySQL++ package built for them. There simply is very little call for digging into past versions via this version control system.

Having confronted that reality, the question then came down to how much history to try and preserve in the conversion.

I considered checking in only a handful of epochal versions and leaving it at that. Say, 1.7.9, 1.7.40, plus a selection of 2.x and 3.x releases showing off major stages of the library's development, capped by the current release at the time of conversion, 3.2.3. I rejected this because in all the time of watching the project's old mailing list, I did not often see people going back to one of these important historical versions.

I therefore decided that what was most important was to scatter the project's complete release history far and wide, so as to avoid a calamity like this again. I wanted everyone who cloned this Fossil repository to have the ability to replace the current project home page if I disappeared like Gna did. That's the beauty of DVCSes, and it matters because we have plenty of other examples of source code repository hosts disappearing, necrotizing, or becoming evil.

So Why Is It Smaller?

As of this writing, a clone of the repository is about the same size as the latest release tarball: each is about 4 MiB. How can this be if the repository contains every released version? There are a couple of major sources of savings.

File Removal

The primary reason for the size savings is that the conversion script removed all the files that can be generated from another file also checked in. For the most part, these generated files are included in release tarballs only as a convenience to the end user of the MySQL++ library, since generating them is a rather involved process, requiring tools that aren't always easily available.

In rough order of size, the sources of bloat in the release tarballs relative to this Fossil repository are:

Documentation: The user and reference manuals are currently generated from the DocBook and Doxygen source material, respectively. The Fossil repository only has that source material checked in, not the HTML, PDF, PostScript, DVI, PNG, GIF, and other files generated from these sources.
Build System Files: The Fossil repository does not contain any of the Makefiles or IDE project files generated from mysql++.bkl by Bakefile. Similarly, past release checkins are missing the outputs from the GNU Autotools called by the contemporaneous versions of the bootstrap script.
Generated Headers: There are a couple of Perl scripts in MySQL++ that generate header files that would be tedious to maintain by hand. This repository includes only the Perl scripts, not the headers they generate. The release tarballs contain the headers as well, so the end user of the library doesn't have to have Perl. (This is a particular bother on Windows, which is one reason we recommend bootstrapping the library on a POSIX type system.)
.in File Outputs: The build system generates several files included in release tarballs from a file with the same name plus a *.in extension.
Junk: The release tarballs sometimes included files that never should have been included, such as editor temporaries, old backup files, and files trivially generated from another that we were already going to include.

See the ignore rule list if you want to know what all is purposely missing from the Fossil repository as compared to the release tarballs.

There are two processes that generate most of the above: the bootstrap process and make doc. End users of the library are not expected to be able to do either easily, whereas we do burden users of the Fossil repository with that.

Compression

There is a second reason for the svelte Fossil clone size: Fossil employs a 2-level compression scheme that ruthlessly squishes out redundancy between checkins.

The first level is delta compression, which stores only the differences between each file that exists in each sibling checkin pair. If a file only had a few characters added to it between releases, only those few characters and a bit of metadata are stored in the repository.
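
To make the idea concrete, here is a toy delta encoder. It is only a sketch of the general technique built on Python's difflib, not Fossil's actual delta format, and the names make_delta and apply_delta are invented for the example.

```python
import difflib


def make_delta(old, new):
    """Encode new as copy-from-old ranges plus literal inserted text."""
    ops = []
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged text: record only where it lives in the old version.
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # Changed or added text must be stored literally.
            ops.append(("insert", new[j1:j2]))
        # "delete" needs no op: that old range is simply never copied.
    return ops


def apply_delta(old, delta):
    """Rebuild the new version from the old version plus the delta."""
    parts = []
    for op in delta:
        if op[0] == "copy":
            parts.append(old[op[1]:op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)


old = "Version 3.2.2\n" + "unchanged line\n" * 50
new = old.replace("3.2.2", "3.2.3")
delta = make_delta(old, new)
literal_chars = sum(len(op[1]) for op in delta if op[0] == "insert")
```

Here literal_chars counts the text that has to be stored verbatim; everything else in the new version is recovered by copying ranges out of the old one.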

On top of that, Fossil applies zlib (DEFLATE) compression to each stored artifact. With a primarily text-based repository like this one, that can result in huge savings.
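
The effect on repetitive text is easy to see with Python's standard zlib module, used here as a stand-in for Fossil's per-artifact compression. The sample input is made up for the illustration, and real source files are far less repetitive, so the ratio this produces says nothing about the repository's actual figures.

```python
import zlib

# Artificially repetitive sample text; an assumption for illustration only.
text = ("#include <mysql++.h>\n" * 500).encode("utf-8")

packed = zlib.compress(text)

# Compression is lossless: decompressing returns the original bytes.
restored = zlib.decompress(packed)

ratio = len(text) / len(packed)
```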

As of this writing, this Fossil repository enjoys an overall 11:1 compression ratio as compared to checking out each version on disk separately.