Development/Packaging/BuildSystem/Improvements

Build System Improvements

Status: work in progress

Analysis of problems in the build system and ways to improve it.

Build system overview

Interface

Fault tolerance

Network problems

Image:machines_small.png

Critical network operations in the build system

  • Network outage:
  • Name resolution problems: the build node fails silently if it cannot resolve the name of the cluster master (kenobi).
  • NIS server down: the build user data is provided by NIS, so packages would be enqueued for build but never scheduled, interrupting the build process.
  • NFS server down: this would eventually prevent the creation of chroot filesystems. rpm is very sensitive to stalled NFS filesystems and can block when it tries to stat one.
  • HTTP server down:

System problems

  • Missing directory under uploads/: Ulri fails and retries the same package in an infinite loop.
  • Out of disk space in input queue:
  • Out of disk space in cluster node:

Hardware problems

  • Transient failure in kenobi:
  • Permanent failure in kenobi:
  • Transient failure in cluster node:
  • Permanent failure in cluster node:
  • Transient failure in ken:
  • Permanent failure in ken:

Legacy issues

Cluster usage

The cluster was initially designed for direct package builds by users and is still shared between users and build bots. Users have their home directories on a specific node, which exports them to the other nodes over NFS. Authentication is based on NIS. The home directory of the mandrake user is always local, but its authentication is still performed by NIS. Currently nodes n2 and n4 are not used by bots.

Possible improvements

Arcane interface

Package checkout

In Repository System Quickstart, Andreas shows that the command to check out a package is:

$ svn co svn+ssh://svn.mandriva.com/svn/packages/cooker/bm/current bm
A    bm/SOURCES
A    bm/SOURCES/bm-2.1-rpmbuild.patch.bz2
A    bm/SOURCES/bm-2.1.tar.bz2
A    bm/SPECS
A    bm/SPECS/bm.spec
Checked out revision 975.

This could be wrapped as

$ mdvsys co bm
Checked out revision 975.

with cooker as the default, or mdvsys co 2007.1/bm to check out the package for a specific release.
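
A minimal sketch of how such a wrapper could build the repository URL. The function names mdvsys_url and mdvsys_co are hypothetical, and the assumption that other releases use the same directory layout as cooker is not confirmed by the checkout command above.

```sh
#!/bin/sh
# Base URL taken from the svn checkout command shown above.
SVN_BASE="svn+ssh://svn.mandriva.com/svn/packages"

# mdvsys_url PKG | RELEASE/PKG
# Print the repository URL, defaulting to cooker when no release is given.
# Assumption: every release follows the cooker layout (<release>/<pkg>/current).
mdvsys_url() {
    case "$1" in
        */*) release=${1%%/*}; pkg=${1#*/} ;;
        *)   release=cooker;   pkg=$1 ;;
    esac
    printf '%s/%s/%s/current\n' "$SVN_BASE" "$release" "$pkg"
}

# Hypothetical "co" subcommand: check out into a directory named after the package.
mdvsys_co() {
    url=$(mdvsys_url "$1") || return 1
    svn co "$url" "${1#*/}"
}
```

With these definitions, mdvsys_co bm would run the same svn co command as above, and mdvsys_co 2007.1/bm would only swap the release component of the URL.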

Progress report

The current interface reports exactly what it is doing (e.g. Executing sudo -H -u mandrake perl -I/usr/local/lib/perl/ /usr/local/bin/youri-upload --config /etc/youri/upload.devel.conf --define user=andreas cooker /home/andreas/@63201:krb5-1.4.3-7mdv2006.0.src.rpm (sudo_user andreas)), which is of little use to the end user. This should be replaced by a more descriptive message such as Sending package to the input queue.
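
One possible way to do this, sketched in shell: show the user a short description and move the exact command line to a log file. The run_step helper and the STEP_LOG location are hypothetical and not part of the current tools.

```sh
#!/bin/sh
# Hypothetical log file that keeps the full command lines for debugging.
STEP_LOG=${STEP_LOG:-/tmp/build-steps.log}

# run_step DESCRIPTION COMMAND [ARGS...]
# Show DESCRIPTION to the user; record the exact command and its output in the log.
run_step() {
    description=$1
    shift
    printf '%s... ' "$description"
    printf 'Executing %s\n' "$*" >>"$STEP_LOG"
    if "$@" >>"$STEP_LOG" 2>&1; then
        echo "done"
    else
        status=$?
        echo "failed (exit $status, details in $STEP_LOG)"
        return "$status"
    fi
}
```

A call such as run_step "Sending package to the input queue" followed by the sudo command line would then print only the description and the outcome, while the full command stays available in the log.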

Diagnostics

Monitoring

  • Use mrtg, rrdtool or a similar tool to monitor CPU, network and disk usage in the build cluster. Buchan Milne (ranger) suggested using hobbit instead and volunteered to set it up.
  • Overwrite argv[0] to show a description of what Iurt is doing. DONE
  • Wrapper scripts should use exec to reuse the same process and unclutter ps(1) output.
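
The effect of exec in a wrapper can be illustrated with a small sketch. Both functions below are made up for demonstration: the inner sh -c stands in for the real program, and each "wrapper" prints its PID before starting it.

```sh
#!/bin/sh
# Without exec: the wrapper shell stays alive as the parent of the program,
# so ps(1) shows two processes. The trailing "true" keeps the shell from
# optimizing the last command into an implicit exec.
without_exec() {
    sh -c 'echo "wrapper $$"; sh -c "echo program \$\$"; true'
}

# With exec: the program replaces the wrapper shell and keeps its PID,
# so ps(1) shows a single process.
with_exec() {
    sh -c 'echo "wrapper $$"; exec sh -c "echo program \$\$"'
}
```

Running with_exec prints the same PID on both lines, while without_exec prints two different PIDs.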

Job tracking

Robustness

Environmental sanity

  • Don't hardcode UID 501 for the builder user; use a UID in the system range instead. DONE
  • Create missing directories under uploads/ instead of failing and falling into an infinite loop. DONE
  • Add traps to Iurt to umount /proc and /dev/pts in the chroot on exit.
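
The trap idea from the last item can be sketched in shell as follows. This only illustrates the pattern and is not Iurt's code; the CHROOT path is a hypothetical placeholder, and mountpoint(1) from util-linux is assumed to be available.

```sh
#!/bin/sh
# Hypothetical chroot location; the real layout used by Iurt may differ.
CHROOT=${CHROOT:-/var/tmp/iurt-chroot}

# Unmount the chroot's pseudo-filesystems, skipping anything not mounted.
cleanup_chroot() {
    for dir in "$CHROOT/dev/pts" "$CHROOT/proc"; do
        if mountpoint -q "$dir" 2>/dev/null; then
            umount "$dir" || echo "warning: could not umount $dir" >&2
        fi
    done
}

# Run the cleanup on every exit path. Signals are turned into a normal
# exit so that the EXIT trap also fires when the build is interrupted.
trap cleanup_chroot EXIT
trap 'exit 129' HUP
trap 'exit 130' INT
trap 'exit 143' TERM
```

Checking each mount point before calling umount keeps the cleanup safe to run more than once.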

Network server failure

  • Name server: adding the name of the cluster master to /etc/hosts could help.
  • NFS: It would be safer to avoid NFS whenever possible and create chroot filesystems using HTTP. NFS-based homes can be replaced by local-only accounts and distributed compiling. Even if NFS homes stay, the home of user mandrake on each node can be changed to /export/home/mandrake; otherwise the scheduler won't be able to log in to the build node and run Iurt remotely during an NFS outage.
  • NIS: Using distributed compiling to prevent building on NFS-mounted directories would make NIS unnecessary. The mandrake user account can be local to each node.
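
For the name server item, a node could be checked for a static entry with a small helper like the one below. The has_host_entry function is hypothetical, and the address in the usage note is a documentation placeholder, not the real address of kenobi.

```sh
#!/bin/sh
# has_host_entry NAME [FILE]
# Succeed if FILE (default /etc/hosts) maps some address to NAME.
has_host_entry() {
    awk -v name="$1" '
        { sub(/#.*/, "") }    # ignore comments
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit !found }
    ' "${2:-/etc/hosts}"
}
```

A node whose /etc/hosts contains a line such as "192.0.2.10 kenobi" (placeholder address) would pass has_host_entry kenobi and could still find the cluster master while the name server is down.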

Performance

Don't use NFS homes

Distributed compiling

More build bots per node

Improve current youri steps:

  • youri on kenobi:

Image:youri-kenobi.png

  • youri on sandbox:

Image:youri-sandbox.png

  • youri on kenobi with an improved get_files function:

Image:youri-kenobi-getfiles.png

Maintainability

Source code cleanup

  • Create log functions instead of repeating print $run{LOG} "message" if $run{verbose} > n at each log message. DONE
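
The same pattern, sketched in shell rather than in the Perl quoted above: a single function carries the verbosity check so that callers do not repeat it. The names plog, VERBOSE and LOG are hypothetical and do not reflect the functions actually added to the code.

```sh
#!/bin/sh
# Hypothetical settings standing in for $run{verbose} and $run{LOG}.
VERBOSE=${VERBOSE:-0}
LOG=${LOG:-/dev/stderr}

# plog LEVEL MESSAGE...
# Append MESSAGE to $LOG only when VERBOSE is greater than LEVEL,
# mirroring the "if $run{verbose} > n" test quoted above.
plog() {
    level=$1
    shift
    if [ "$VERBOSE" -gt "$level" ]; then
        printf '%s\n' "$*" >>"$LOG"
    fi
}
```

Each call site then shrinks to something like plog 1 "message", and the threshold logic lives in one place.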

Script defaults
