BuildSystem's Subversion Repository Improvements

From Mandriva Community Wiki

Our current Subversion layout has one main problem: it grows forever, with no clean and easy way to control it. As a result, backups and reloads also become harder and harder to manage when they are needed.

Because of this, free disk space on the server is very tight, so we should act quickly.

First we list the current problems we are aware of, then we propose some ideas. Please feel free to add to the lists if you have new suggestions.

Current problems

  • Ever-growing repository: its size tends to infinity. It becomes hard to manage (especially backups and restores) and risky.
  • Development tarballs uploaded as test packages during cooker development are immortal

Proposals

Split packages/ even more

Currently, we use the layout described here, with all distro versions included in one big main repository.

The idea is to split this packages/ repository into several smaller ones, one for each distro version. We would create a new repository each time we start working on a new distribution version. The layout would be something like:

http://svn.mandriva.com/svn/packages/2008.1/
|- cooker/
|- release/
|- misc/
|- updates/
`- updates_releases/

Here cooker/ would hold the cooker development until the distro is released; it would then be frozen by making a read-only copy of it in release/. misc/ stays the same (for changelogs), as do the usual updates* directories.

Pros

  • Easy to "turn off" an old repository
  • Smaller repositories, easy to check, back up, and recover
  • Slightly better control of the commits present/submitted during the final distro release days

Cons

  • Direct changelogs only go back 6 months: beyond that you have to check another repository
  • We have to svn switch all checkouts when a distro is released
  • Backports might get confusing

Do not store tarballs into it

There is an idea floating around to not store upstream tarballs in the repository: they would be pulled from a storage pool or downloaded from upstream sites when they are wanted, via the repsys/mdvsys getsrpm <package> command. Bogdano said he can easily adapt repsys to work with this scheme.

That said, we would not store any upstream tarballs in svn. SOURCES/ would get much smaller, giving a considerable overall size reduction: by now something like 90% or more of the content is source tarballs.

For this, we would have to a) adapt repsys/mdvsys and the developers' checkout procedure, as a plain svn checkout may not get what you want anymore (checkouts would be based on the repsys/mdvsys checkout command) and b) think of a way to "upload" these source tarballs to a local pool if they can't be downloaded on the fly from upstream servers (on-the-fly downloads may not be wanted at all anyway).
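
As a rough illustration of the lookup order described above (pool first, upstream as a fallback), here is a minimal Python sketch. It is not how repsys/mdvsys works today: the function name locate_tarball, the pool directory layout, and the download callback are all hypothetical, and the actual transfer is left to whatever the tools would use.

```python
from pathlib import Path
from typing import Callable, Optional


def locate_tarball(
    name: str,
    pool_dir: Path,
    upstream_url: Optional[str],
    download: Callable[[str, Path], None],
) -> Path:
    """Return a local path to `name`, preferring the pool over upstream.

    `download(url, dest)` is a hypothetical callback that fetches `url`
    into `dest`; it stands in for the real transfer code.
    """
    pooled = pool_dir / name
    if pooled.is_file():
        # Already in the local pool: no network access needed.
        return pooled
    if upstream_url is None:
        raise FileNotFoundError(f"{name}: not in pool and no upstream URL known")
    pool_dir.mkdir(parents=True, exist_ok=True)
    # Download under a temporary name so a partial file never looks valid.
    partial = pool_dir / (name + ".part")
    download(upstream_url, partial)
    partial.replace(pooled)
    return pooled
```

A freshly downloaded tarball ends up in the pool, so the next request for the same file does not touch the upstream server.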

Pros

  • From the packager's POV, easy to live with
  • Faster new upstream commits
  • Once in place, easier to administer than the split repository
  • Easy to release new distros
  • No changes to repository layouts
  • Svn mirrors become very lightweight

Cons

  • Direct svn checkout commands may not get what you want anymore
  • repsys/mdvsys tools become even more mandatory (is this really a con? :)
  • no way to directly rebuild an older version of the package if it's not in the pool
    • I think tarballs of released versions should be kept "forever", plus we should have a policy on development tarballs lifetime (maybe "all tarballs which were used in a release cycle") --spuk
  • we move the space problem from svn to this new pool. The only additional control we get is that we can limit the size of this pool, that is, limit how many older versions we want to keep
    • plus we can use external xdelta to keep space usage very low --spuk
  • as noted by Andreas, there are security implications on trusting external tarballs
    • for this proposal, we should keep hashes/signatures in SVN and verify them pre-build. I believe http://rpm5.org/ has some new spec syntax on the way for specifying signatures/hashes to be checked at build; we could also keep a per-package file, like $package.{md5,sha256} (Fedora has a sources file in pkgdir that contains the hashes --AnssiHannula)
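
The pre-build verification suggested in the last point could look roughly like the Python sketch below. It assumes a per-package hash file with lines of the form "<sha256 hex digest>  <file name>" (the layout written by the sha256sum tool); that format, and the function names, are assumptions for illustration and not something repsys, rpm5 or Fedora is known to use in exactly this shape.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large tarballs are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_hash_file(hash_file: Path) -> dict:
    """Parse '<hex digest>  <file name>' lines (assumed format)."""
    expected = {}
    for line in hash_file.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        digest, name = line.split(None, 1)
        # sha256sum marks binary mode with a leading '*' on the name.
        expected[name.lstrip("*")] = digest.lower()
    return expected


def verify_sources(sources_dir: Path, hash_file: Path) -> list:
    """Return the names of tarballs that are missing or do not match."""
    bad = []
    for name, digest in read_hash_file(hash_file).items():
        tarball = sources_dir / name
        if not tarball.is_file() or sha256_of(tarball) != digest:
            bad.append(name)
    return bad
```

A build wrapper would refuse to start whenever verify_sources returns a non-empty list, whether the tarball came from the pool or from an upstream site.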

Store tarballs in another repository

[pixel] Explanation by example: after a "repsys co patch", you would get:

SOURCES/patch-2.5.4-destdir.patch
SOURCES/patch-2.5.4-unreadable_to_readable.patch
SOURCES/patch-2.5.8-sigsegv.patch
SOURCES/patch-2.5.8-stderr.patch
SOURCES/patch-2.5.9.tar.bz2 -> ../UPSTREAM/patch-2.5.9.tar.bz2
UPSTREAM/patch-2.5.9.tar.bz2
SPECS/patch.spec

where SOURCES and SPECS are hosted on the current svn repository:

% svn info .
URL: svn+ssh://svn.mandriva.com/svn/packages/cooker/patch/current
% svn info SOURCES
URL: svn+ssh://svn.mandriva.com/svn/packages/cooker/patch/current/SOURCES

UPSTREAM is hosted on another repository:

% svn info UPSTREAM
URL: svn+ssh://svn.mandriva.com/svn/tarballs/cooker/patch/current

repsys/mdvsys will handle the creation of the symlink and ensure no tarballs are committed in SOURCES/
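
To make the two jobs above concrete, here is a small Python sketch of what the tools might do after a checkout: create the SOURCES/ symlinks pointing into UPSTREAM/, and list any real tarball sitting in SOURCES/ so a commit can be refused. The function names and the list of tarball suffixes are assumptions for illustration; this is not existing repsys/mdvsys code.

```python
import os
from pathlib import Path

# Assumed set of file name endings that count as upstream tarballs.
TARBALL_SUFFIXES = (".tar.gz", ".tgz", ".tar.bz2", ".tar.xz", ".zip")


def link_upstream_tarballs(checkout: Path) -> list:
    """Create SOURCES/<tarball> -> ../UPSTREAM/<tarball> for each file in UPSTREAM/."""
    sources = checkout / "SOURCES"
    upstream = checkout / "UPSTREAM"
    created = []
    for tarball in sorted(upstream.iterdir()):
        link = sources / tarball.name
        if link.is_symlink() or link.exists():
            continue  # never overwrite something that is already there
        # Relative target, so the checkout can be moved around freely.
        os.symlink(os.path.join("..", "UPSTREAM", tarball.name), link)
        created.append(link.name)
    return created


def stray_tarballs(checkout: Path) -> list:
    """List real (non-symlink) tarballs in SOURCES/, which should block a commit."""
    sources = checkout / "SOURCES"
    return sorted(
        entry.name
        for entry in sources.iterdir()
        if entry.name.endswith(TARBALL_SUFFIXES) and not entry.is_symlink()
    )
```

A commit wrapper would call stray_tarballs first and abort when the list is not empty, asking the packager to move the file to UPSTREAM/.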

Pros

  • almost all the 'pros' from the 'Do not store tarballs into it' proposal
  • we don't care about the tarballs svn server history: at every release, we can create the svn repository from latest tarballs (but how costly is that?)
  • not far from current way to do things
  • I believe it is easier on the tools (repsys/mdvsys/etc.): there is no "out of SVN" problem to handle, and simple regular use of SVN should work
  • backups of the big tarballs repository can be done with a simple 'svn export' command, no need to back up history
  • backups of the more important package development repository become very light, quick, and generally "cheap" (just like in the 'Do not store tarballs into it' proposal)

Cons

  • we wouldn't be able to use external xdelta to save even more space on tarballs


Use svn externals

Same as 'Store tarballs in another repository', except that UPSTREAM is an svn external:

% svn pg svn:externals .
UPSTREAM svn+ssh://svn.mandriva.com/svn/tarballs/cooker/patch/current

Note: as discussed, since svn:externals seems limited and not really that good in general, maybe even bad in some cases, an alternative is to have the split packages+tarballs repositories without using svn:externals. SVN should then work mostly as usual, provided the developer checks out the proper pieces (i.e. does the extra checkout from the tarballs repository into the UPSTREAM subdir), which would be done automatically by the repository tools. This way we also do not have to set up another mechanism for tarball availability (ftp, http, ...) and we keep using a single access control scheme.
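
The "extra checkout done by the tools" from the note above could be sketched in Python as follows. It only builds the two svn checkout command lines, using the repository URLs shown in the examples on this page; the function name checkout_commands and its parameters are hypothetical.

```python
from typing import Optional

# Base URL taken from the 'svn info' examples on this page.
BASE = "svn+ssh://svn.mandriva.com/svn"


def checkout_commands(package: str, distro: str = "cooker",
                      dest: Optional[str] = None) -> list:
    """Build the two 'svn checkout' invocations for a split packages+tarballs setup."""
    dest = dest or package
    packages_url = f"{BASE}/packages/{distro}/{package}/current"
    tarballs_url = f"{BASE}/tarballs/{distro}/{package}/current"
    return [
        # 1) the usual package checkout (SPECS/, SOURCES/)
        ["svn", "checkout", packages_url, dest],
        # 2) the tarballs, as a separate working copy in the UPSTREAM subdir
        ["svn", "checkout", tarballs_url, f"{dest}/UPSTREAM"],
    ]
```

Each command could then be run in order with subprocess.run(cmd, check=True). Since UPSTREAM would be a separate working copy nested in the package checkout, a commit of new tarballs would presumably have to be issued from inside UPSTREAM by the tools as well.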

Pros

  • regular svn checkout works

Cons

  • svn commit will not commit the external UPSTREAM. repsys/mdvsys commit could be used to commit both; this implies people should start using repsys/mdvsys commit instead of plain svn commit (only required when committing new tarballs)
  • externals do not cope well with mixing svn+ssh and http, nor with repsys mirrors

Other projects

How other projects handle this.

Fedora

Check the Package Maintainers webpage and their repo. As spotted by Bogdano, it seems Fedora keeps a dual packages+tarballs (CVS) repository scheme.

FreeBSD

AFAIK, FreeBSD ports are kept in a CVS repository, along with their local patches. The tarballs (distfiles) are pulled at build time directly from upstream projects, or from any place listed as a download source for the port. The FreeBSD project keeps (at least) some tarballs on their own site as a fallback. The downloaded distfiles are checked for matching MD5 *and* SHA256 *and* size before building.

Suse

Suse uses the openSUSE Build Service. The sources of their build service are available here.
