Development/Packaging/BuildSystem/Analysis
From Mandriva Community Wiki
This summarizes the differences between distribution build systems.
Introduction
This document presents an overview of the Conectiva Linux and Mandrakelinux build system architectures, highlighting interesting features that could be used to improve future versions of the Mandriva Linux build system.
What we call the "build system" comprises the software responsible for managing every step between the package maintainer's work of creating or updating a package and the availability of that package to users of the unstable distribution. It includes sanity checks, policy compliance tests, regression tests, binary and source package building, and revision control systems.
The current Mandriva buildsystem is based on the Mandrakelinux buildsystem, and will be referred to in this document as the "Mandrakelinux buildsystem". The analysis of this environment is based on the author's current knowledge about the system architecture and may contain errors, omissions or inaccuracies. If this is the case, please let the author know.
Buildsystems
Conectiva
How it works
The Conectiva buildsystem is centered on a database that stores source data and metadata under revision control. Developers commit packages to revision control and submit them for building. The system retrieves the necessary data from the database, builds a source RPM package, stamps it with a serial number (called "stardate") and submits it to the autotest box. The autotester lints the specfile, builds the package and installs it in its environment. Should any error occur, the package is rejected and the developer is notified. If it passes cleanly, it is queued for approval or modification by the buildmaster on the final build machine, where it is rebuilt in a strictly controlled environment and stored in the binary package repository.
This multi-arch/multi-distro system has been used to build the UnitedLinux-based distribution, the PPC port of Conectiva Linux and several custom distribution branches, and to maintain previous releases.
The system was designed around the following premises:
- Official packages are to be built in a trusted environment by a trusted user.
- Only extensively tested binaries are to be released (i.e. no massive rebuild prior to release).
- Only policy compliant packages are allowed to enter the distribution.
- Packages must upgrade cleanly.
- Changes initiated from a compromised developer workstation can be audited and rolled back.
Detailed information can be found at the RepositorySystem page.
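The submission flow described at the start of this section can be modelled as a chain of gates. The following Python sketch is illustrative only: the `Submission` class, the stage names and `run_pipeline` are hypothetical and do not reflect the actual Repsys or autotester code. Each autotest stage is a callable that either lets the package through or rejects it; packages that clear every stage are handed to the buildmaster, who may approve them for the final rebuild or hold them for modification.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Submission:
    """A source package handed to the pipeline (hypothetical model)."""
    name: str
    stardate: int  # serial number stamped on the SRPM
    log: List[str] = field(default_factory=list)


# A check receives a submission and returns True if it passes.
Check = Callable[[Submission], bool]


def run_pipeline(sub: Submission,
                 autotests: List[Tuple[str, Check]],
                 buildmaster_approves: Check) -> str:
    """Walk a submission through the stages and return its final state."""
    for stage, check in autotests:
        if not check(sub):
            # Any autotester failure rejects the package.
            sub.log.append(f"{stage}: failed, developer notified")
            return "rejected"
        sub.log.append(f"{stage}: ok")
    # Packages that pass cleanly wait for the buildmaster.
    if not buildmaster_approves(sub):
        sub.log.append("buildmaster: held for modification")
        return "held"
    # Approved packages are rebuilt on the final build machine.
    sub.log.append("final build: rebuilt and stored in repository")
    return "released"


if __name__ == "__main__":
    stages = [
        ("lint", lambda s: True),
        ("build", lambda s: True),
        ("install", lambda s: s.name != "broken"),
    ]
    print(run_pipeline(Submission("foo", 1234), stages, lambda s: True))     # released
    print(run_pipeline(Submission("broken", 1235), stages, lambda s: True))  # rejected
```

Keeping each stage as an independent callable mirrors the idea that new checks (for example regression tests) can be added to the autotester without touching the buildmaster step.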
Why it is good
- All packages are built by a trusted user (the buildmaster) in a strictly controlled environment (the main build machine) using a package assembled by a trusted source (repsys).
- No need for interactive remote logins/shells. The developer deals directly with the RCS and just signals it to send a specific revision of an SRPM package to the build machine.
- Ensures consistency between source and binary packages and the rest of the environment since everything originates with the source package (binary packages are never submitted).
- Package build, installation and upgrade are all tested automatically. If the installation or upgrade fails, for example, the package is rejected.
- Several different distributions and architectures can be maintained at the same time by submitting packages to different autotesters and build machines.
- Any released package can be quickly retrieved from the RCS.
- Stardate stamping allows package age comparison on quick inspection and can be used to implement rebuild-on-dependency.
- The autotester can continually rebuild distribution packages to ensure that old packages are not broken.
- Regression tests can be performed by the autotester.
- Changes in source files can be backtracked and audited.
- Easy management of updates for previous releases.
- Easy management of custom branches.
- Work-in-progress (WIP) packages can be stored on the system, so there is no need to resubmit a package to fix minor specfile problems.
- More than one maintainer can work concurrently in the same package.
What could be improved
The current build system lacks a mechanism to automatically rebuild packages affected by changes in a recently submitted package (e.g. library ABI changes). In addition, the autotester is implemented as a jumble of old and new scripts and could benefit from a better architecture.
The buildmaster could be a bottleneck, although this has rarely been the case because more than one buildmaster can work in parallel.
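The missing rebuild-on-dependency mechanism could work roughly as follows: given a map of which source packages build-depend on which, find every package transitively affected by a change and order the rebuilds so that dependencies come first. This Python sketch is an assumption about how such a feature might be built, not an existing tool; the `build_requires` map (assumed to be extracted from specfile BuildRequires) and the function names are hypothetical.

```python
from collections import deque
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def affected_packages(changed: str,
                      build_requires: Dict[str, Set[str]]) -> Set[str]:
    """Return every package that (transitively) build-depends on `changed`."""
    # Invert the map: dependency -> packages that require it.
    reverse: Dict[str, Set[str]] = {}
    for pkg, deps in build_requires.items():
        for dep in deps:
            reverse.setdefault(dep, set()).add(pkg)
    # Breadth-first walk over reverse dependencies.
    seen: Set[str] = set()
    queue = deque([changed])
    while queue:
        for dependent in reverse.get(queue.popleft(), set()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


def rebuild_order(changed: str,
                  build_requires: Dict[str, Set[str]]) -> List[str]:
    """Order the affected packages so each is rebuilt after its dependencies."""
    todo = affected_packages(changed, build_requires)
    # Keep only edges between packages that actually need rebuilding.
    graph = {pkg: build_requires.get(pkg, set()) & todo for pkg in todo}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    deps = {"gtk": {"glib"}, "gimp": {"gtk"}, "vim": {"ncurses"}}
    print(rebuild_order("glib", deps))  # ['gtk', 'gimp']
```

A dependency cycle among the affected packages makes `static_order()` raise `graphlib.CycleError`, which is a reasonable point at which to fall back to the buildmaster.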
Mandrakelinux
How it works
The "Kenobi" architecture used in the Mandrakelinux buildsystem gives the developer direct shell access to a build host. On the build host the developer builds the packages and uploads them to Cooker. There is no buildmaster: the binary and source packages built by the developer are sent on after passing Rpmlint checks. Specfiles are committed to revision control when they pass Rpmlint.
A brief explanation of the system can be found in the Maintainer Howto.
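The only automated gate in the Kenobi flow is Rpmlint. A minimal Python sketch of such a gate, assuming rpmlint's usual `<package>.<arch>: <E|W>: <tag>` output format (some versions insert a line number after the file name) and assuming that errors block an upload while warnings do not; `parse_rpmlint` and `may_upload` are hypothetical helpers, not part of uplftp.

```python
import re
from typing import List, Tuple

# Assumed rpmlint line format, e.g. "foo.i586: E: no-signature" or
# "foo.spec:12: W: some-tag"; summary lines do not match.
LINE_RE = re.compile(
    r"^(?P<pkg>[^:\s]+):(?:\d+:)?\s+(?P<level>[EW]):\s+(?P<tag>\S+)")


def parse_rpmlint(output: str) -> List[Tuple[str, str, str]]:
    """Extract (package, level, tag) triples from rpmlint output."""
    results = []
    for line in output.splitlines():
        m = LINE_RE.match(line)
        if m:
            results.append((m["pkg"], m["level"], m["tag"]))
    return results


def may_upload(output: str) -> bool:
    """Hypothetical policy: block on any error, tolerate warnings."""
    return all(level != "E" for _, level, _ in parse_rpmlint(output))


if __name__ == "__main__":
    sample = ("foo.i586: W: summary-ended-with-dot\n"
              "foo.src: E: no-signature\n")
    print(may_upload(sample))  # False
```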
Why it is good
- Already there.
- A simple system with easy maintenance.
- Saves CPU cycles by not rebuilding every package sent to the system.
What could be improved
The trusted-environment rebuild approach could be used to ensure consistency between source and binary packages, avoid intentional or unintentional inclusion of extraneous code, prevent dependency problems in binary packages and improve the overall reliability of the system. (Even bona fide developers could be tricked into introducing malicious code that cannot be audited without a matching source package.)
Developers overseas or elsewhere far from the build machines suffer in this process because they have to log in interactively to the build machine to build their packages, which is painful over high-latency Internet links.
PLF
The PLF build system consists of a set of chroots on different architectures, where maintainers build their packages manually and then upload the result to the central repository. There is a lead distribution/architecture (cooker/i586) to which packages must be uploaded first; all other distributions/architectures are optional. Uploads enforce a set of mandatory rules, whereas later QA processes run a broader array of checks.
All the tools used are developed as part of an independent project, Youri.
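The "lead target first" upload rule can be expressed as a small check. This Python sketch is hypothetical (it is not Youri code): it assumes the rule applies per package version and keeps its bookkeeping in a plain dictionary; `check_upload` and the `2006.0`/`x86_64` target are illustrative names.

```python
from typing import Dict, Set, Tuple

Target = Tuple[str, str]           # (distribution, architecture)
LEAD: Target = ("cooker", "i586")  # lead target named in the text


def check_upload(pkg: str, version: str, target: Target,
                 uploaded: Dict[Tuple[str, str], Set[Target]]) -> None:
    """Record an upload, or raise ValueError if it breaks the lead-first rule.

    `uploaded` maps (package, version) to the targets that already carry
    that build; it is a hypothetical bookkeeping structure.
    """
    done = uploaded.get((pkg, version), set())
    if target != LEAD and LEAD not in done:
        raise ValueError(f"{pkg}-{version}: upload to {'/'.join(LEAD)} first")
    uploaded.setdefault((pkg, version), set()).add(target)


if __name__ == "__main__":
    state: Dict[Tuple[str, str], Set[Target]] = {}
    try:
        check_upload("foo", "1.0", ("2006.0", "x86_64"), state)
    except ValueError as err:
        print(err)  # foo-1.0: upload to cooker/i586 first
    check_upload("foo", "1.0", LEAD, state)
    check_upload("foo", "1.0", ("2006.0", "x86_64"), state)  # now accepted
```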
The tools
Repsys tools
The Repsys architecture relies on a set of user tools for comfortable operation. Although not essential, the tools make for a productive development environment.
- repsys: a command-line interface to the RCS. It can be used to retrieve, submit and create packages, assemble an SRPM from the stored data, and get changelog and maintainer email information. Used mainly by the developer. (These actions can alternatively be performed through a web interface or directly with Subversion tools.)
- bm: the build manager wraps rpm functionality, allowing package building with arbitrary topdirs, private buildspace, logging, step building, etc. Used mainly by the developer.
- updrpm: a tool used by the buildmaster to approve, reject, modify and build packages that have passed the autotester. The current implementation is a legacy OPC shell script that could benefit from a revamp and integration with bm.
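Two of bm's features, arbitrary topdirs and step building, map onto standard rpmbuild options: the `_topdir` macro can be overridden with `--define`, and the `-b` flag takes a stage letter. The following Python sketch only illustrates that idea; it is an assumption about how such a wrapper could work, not bm's actual implementation, and `make_private_topdir`/`rpmbuild_command` are hypothetical names.

```python
import os
import tempfile
from typing import List, Optional

# Standard subdirectories of an rpm build tree (topdir).
SUBDIRS = ("BUILD", "RPMS", "SOURCES", "SPECS", "SRPMS")


def make_private_topdir(base: Optional[str] = None) -> str:
    """Create a throwaway topdir so builds do not share the default tree."""
    topdir = tempfile.mkdtemp(prefix="bm-", dir=base)
    for sub in SUBDIRS:
        os.makedirs(os.path.join(topdir, sub))
    return topdir


def rpmbuild_command(specfile: str, topdir: str, stage: str = "a") -> List[str]:
    """Return an rpmbuild argv that builds `specfile` inside `topdir`.

    `stage` picks the -b step: p (prep), c (compile), i (install),
    b (binary), s (source) or a (all), which allows step building.
    """
    if len(stage) != 1 or stage not in "pcibsa":
        raise ValueError(f"unknown build stage: {stage!r}")
    return ["rpmbuild", f"-b{stage}", "--define", f"_topdir {topdir}", specfile]
```

In use, the returned list would be passed to `subprocess.run(cmd, check=True)` with its output redirected to a log file, which would also cover the logging feature.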
Kenobi tools
Kenobi also has a set of tools used by the developer to send packages to Cooker:
- rpmlint: a utility to check package sanity and policy compliance.
- uplftp: dispatches the package down the pipeline after the developer declares it ready. It can be called through aliases such as ftpcooker, ftpcontrib, etc.
- rpmmon: a tool to track upstream releases, query maintainership, ...
- ue: install or remove a package from all hosts of the build cluster
Youri
Youri is a set of tools for managing an RPM-based distribution. It focuses on genericity, extensibility and coherency. So far it is used by both the PLF and JPackage projects.
For more details, see the project web site.
Revision control
The role of the RCS in both architectures is not the same. Repsys stores any and every file that is part of the source package (both data and metadata using Subversion), while Kenobi stores only metadata using CVS. The repsys approach allows for generation of source and binary RPM packages with no contact with human hands, ensuring a matched pair of source and binary packages. Any binary package can be backtracked to the sources used to generate it using a tracking number, allowing for source code auditing should it become necessary.
Subversion has a number of features that make it especially interesting for usage in Repsys, such as:
- Good handling of binary files.
- Supports file and directory renaming (unlike CVS).
- Revision numbering suitable to use as "stardate" (so we don't need an ancillary stardate assignment subsystem).
- Cheap copies provide a straightforward branching system.
Repsys generates RPM specfile changelogs on the fly using the RCS changelog.
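Generating the specfile changelog from the RCS log amounts to reformatting commit entries into RPM's `%changelog` layout (a `* Day Mon DD YYYY Name <email>` header followed by `- ` items, newest first). This Python sketch assumes a hypothetical `LogEntry` shape and a user-name-to-email map; it is not the actual repsys code.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class LogEntry:
    """One commit as reported by the RCS log (hypothetical shape)."""
    revision: int
    author: str  # RCS user name
    day: date
    message: str


def specfile_changelog(entries: List[LogEntry], people: Dict[str, str]) -> str:
    """Render RCS log entries as an RPM %changelog section, newest first.

    `people` maps RCS user names to "Full Name <email>" strings; unknown
    users fall back to the bare user name.
    """
    blocks = []
    for e in sorted(entries, key=lambda entry: entry.revision, reverse=True):
        who = people.get(e.author, e.author)
        body = [f"* {e.day:%a %b %d %Y} {who}"]
        for raw in e.message.strip().splitlines():
            text = raw.strip()
            if text:
                # RPM changelog items conventionally start with "- ".
                body.append(text if text.startswith("-") else f"- {text}")
        blocks.append("\n".join(body))
    return "%changelog\n" + "\n\n".join(blocks) + "\n"
```

Sorting by revision rather than by date keeps the order stable when several commits land on the same day.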
Policy and testing
Policy enforcement
Policy regarding specfiles and packaged sources and binaries can easily be linted by inspecting the package files. Enforcing clean installation and upgrade, however, requires installing the binary packages (possibly upgrading an earlier version) prior to approval.
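Before running an actual upgrade test, an autotester can cheaply reject a package whose epoch-version-release does not sort after the one already in the distribution, since such a package would never be installed as an upgrade. The Python sketch below uses a deliberately simplified re-implementation of rpm's version comparison written for illustration (it ignores the `~` and `^` operators of newer rpm releases); a real checker should rely on rpm itself, e.g. `rpm.labelCompare` from rpm's Python bindings. `vercmp` and `is_upgrade` are hypothetical names.

```python
import re
from typing import List, Tuple

_SEGMENT = re.compile(r"[0-9]+|[A-Za-z]+")


def vercmp(a: str, b: str) -> int:
    """Simplified rpmvercmp-style comparison; returns -1, 0 or 1."""
    sa: List[str] = _SEGMENT.findall(a)
    sb: List[str] = _SEGMENT.findall(b)
    for x, y in zip(sa, sb):
        x_num, y_num = x.isdigit(), y.isdigit()
        if x_num and y_num:
            if int(x) != int(y):
                return 1 if int(x) > int(y) else -1
        elif x_num != y_num:
            # A numeric segment is considered newer than an alphabetic one.
            return 1 if x_num else -1
        elif x != y:
            return 1 if x > y else -1
    # All shared segments are equal: the version with more segments wins.
    if len(sa) != len(sb):
        return 1 if len(sa) > len(sb) else -1
    return 0


EVR = Tuple[int, str, str]  # (epoch, version, release)


def is_upgrade(new: EVR, old: EVR) -> bool:
    """True if `new` would replace `old` on upgrade (epoch, then V, then R)."""
    if new[0] != old[0]:
        return new[0] > old[0]
    by_version = vercmp(new[1], old[1])
    if by_version:
        return by_version > 0
    return vercmp(new[2], old[2]) > 0
```

For example, `is_upgrade((0, "1.2", "1mdk"), (0, "1.10", "1mdk"))` is false because version segments are compared numerically, so 1.2 sorts before 1.10.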
Package tests
Security concerns
Authentication
Authentication is the process by which the build system verifies that users are who they claim to be. In the Mandrakelinux build system, authentication takes place when the developer logs into the build cluster through a secure shell. In the Conectiva build system, authentication is handled by Subversion. Neither method transmits plaintext passwords or discloses sensitive information. The main difference is that Repsys does not require a system user account (with shell access) to be created for the developer.
Authorization
Once authenticated, a user of the Mandrakelinux build system is limited only by the permissions of his or her account. A full shell is made available for the user to build his/her package, and a number of actions requiring superuser privileges are executed using sudo(8). An internal developer can send packages to the main or contributed trees, for Cooker or for other stable distributions. Users can install or remove packages with the ue tool, which gives them ample control over the environment (and can incidentally affect other developers sharing that environment).
In Repsys, the user can perform normal revision control operations and submit packages to the build system. Permissions to submit packages to different targets can be granted on a per-user basis (e.g. only the maintenance team can submit packages to the "updates" tree, and so on).
If the developer workstation is compromised, Repsys limits the extent of potential damage to changes in the RCS, which can be rolled back to a previous state (the last good stardate) with little effort, regardless of the amount of changes. The Kenobi system is more permissive, since it grants a shell account with data transfer and the ability to build and install packages as the superuser, making it easy for an attacker to take full control of the package build environment.
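The per-user submission permissions described for Repsys boil down to a deny-by-default lookup. In this Python sketch the permission table, the user names and the `snapshot`/`updates` target names are hypothetical; Repsys's real configuration format is not described here.

```python
from typing import Dict, Set

# Hypothetical per-user permission table: user -> targets he or she may
# submit to. Only the maintenance team member gets the "updates" tree.
PERMISSIONS: Dict[str, Set[str]] = {
    "alice": {"snapshot", "updates"},
    "bob": {"snapshot"},
}


def may_submit(user: str, target: str,
               permissions: Dict[str, Set[str]] = PERMISSIONS) -> bool:
    """Deny by default: unknown users and unlisted targets are refused."""
    return target in permissions.get(user, set())


if __name__ == "__main__":
    print(may_submit("alice", "updates"))  # True
    print(may_submit("bob", "updates"))    # False
```

Because the check only gates submissions, a stolen developer credential can at worst push RCS changes to the targets listed for that user, which matches the rollback argument above.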
Data transfer
The Mandrakelinux build system relies on ssh (scp) and FTP for data transfer. Conectiva uses standard Subversion HTTPS transfers, coupled with some copies over NFS to pass packages along the pipeline (this can, and should, be changed to something better). Only files signed by the SRPM building agent should be accepted by the autotester, and only files signed by the autotester should be accepted by the main build machine. Only files signed by the buildmaster should be allowed into the package repository.
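The signing chain above means each stage verifies a signature from the stage immediately before it. The following Python sketch uses HMAC from the standard library purely as a stand-in for real signatures; an actual deployment would use public-key signatures (e.g. GPG-signed packages) so that verifiers never hold signing keys. The stage names, `UPSTREAM` map and helper functions are hypothetical.

```python
import hashlib
import hmac
from typing import Dict

# Which agent must have signed what each stage accepts (from the text).
UPSTREAM: Dict[str, str] = {
    "autotester": "srpm-builder",
    "build-machine": "autotester",
    "repository": "buildmaster",
}


def sign(key: bytes, payload: bytes) -> str:
    """HMAC-SHA256 stand-in for a real package signature."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def accept(stage: str, payload: bytes, signature: str,
           keys: Dict[str, bytes]) -> bool:
    """Accept `payload` at `stage` only if its upstream agent signed it."""
    expected = sign(keys[UPSTREAM[stage]], payload)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, signature)
```

For example, an SRPM signed with the SRPM builder's key passes `accept("autotester", ...)` but fails `accept("build-machine", ...)`, because the main build machine only trusts the autotester.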
Auditing
Should security-related or other concerns be raised on a certain binary package, Repsys allows its tracking number (stardate) to be used to retrieve the sources and metadata used to build that binary for auditing. The clean-room build approach ensures that binaries are built using strictly the data and metadata associated with that tracking number, with no extraneous elements added to the source set or specfile.
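With the Subversion revision used as stardate, retrieving the exact sources behind a binary for an audit is a single `svn export`. This Python sketch only builds the command; the one-directory-per-package repository layout, the example URL and the `audit_export_command` name are assumptions.

```python
from typing import List


def audit_export_command(repo_url: str, package: str, stardate: int,
                         dest: str) -> List[str]:
    """Build an `svn export` argv that fetches `package` as of `stardate`.

    Assumes one directory per package directly under `repo_url` and that
    the stardate is the Subversion revision number (both hypothetical).
    """
    if stardate <= 0:
        raise ValueError("stardate must be a positive revision number")
    url = f"{repo_url.rstrip('/')}/{package}@{stardate}"
    return ["svn", "export", url, dest]


if __name__ == "__main__":
    print(audit_export_command("https://svn.example.com/packages",
                               "foo", 1234, "/tmp/audit-foo"))
```

The sketch uses a peg revision (the `@` suffix) rather than `-r` so that the export still works if the package directory was later renamed or removed; with `-r` alone on a URL, Subversion first looks the path up in HEAD.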
Summary
Features of Repsys and Kenobi build systems can be summarized in the following table:
Feature | Conectiva | Mandriva |
---|---|---|
Who builds packages | Buildmaster | Developer |
Sanity/Policy check | Test machine builds package | Rpmlint |
Errors and warnings | Specfile, build, binary, installation, upgrade | Specfile, binary |
Custom tools | repsys, bm, updrpm | ftpcooker, ftpcontrib |
Revision control | Subversion, data and metadata | CVS, metadata only |
Who approves packages | Buildmaster reviews and modifies if needed | ? |
Source code audit | Source can be tracked using stardate | Source and binary may be inconsistent |
Distributed compilation | Only in test machine | Yes |
Multi-arch | Yes (requires native hardware) | Yes (w/ manual handling?) |
Issue tracking | Bugzilla | Bugzilla |
Issue tracking integration | Warns about open tickets; closes tickets automatically if referenced in changelog | ? |
Current hardware (ia32) | Athlon XP 2600 (autotester), 700MHz 8-way Xeon (build) | 4 x 2.8GHz 4-way Xeon (build cluster) |
Users | ~20 | ~100? |
Unstable distribution | Yes, Snapshot | Yes, Cooker |
Other features | Cyclic testing | ? |
Pros | Stronger policy enforcement, clean-room build, auditing | It's already there |
Cons | Requires lots of storage space, no autobuild based on dependencies | Vulnerable to malicious binaries, can lead to inconsistencies |
- Image:Repsys.fig: Repsys architecture diagram source file (xfig)
- Image:Kenobi.fig: Kenobi architecture diagram source file (xfig)