Development/Resources/Cluster/Admin

From Mandriva Community Wiki

Common tasks regarding cluster administration

A quick reference of various cluster admin tasks

Platform overview

The platform for the build system is running on multiple machines and is composed of 2 main layers:

  • the logical "cooker" layer
  • the physical layer (technical system infrastructure)

Troubleshooting the cooker layer

The logical cooker layer is made of chroots running on multiple nodes (n1..n5 for x86, seggie and deborah for x86_64). To access these machines, log in as usual, for example with "ssh n5". From there, you can use sudo to perform the administration tasks described below. If you would like to help with daily platform administration, ask on the distrib-admin mailing list. To get help, log in to IRC and contact the distrib-admin members who have sudo privileges on this layer.

Troubleshooting the physical layer

This layer is formed by the real system running on the nodes; some of them run 2007, others a mix of 2006 and cooker. Normally, as a build engineer or a contributor, you do not need to access this part.

If a problem cannot be solved from the logical cooker layer, you (as a registered member of distrib-admin) can contact the IS Team to get help. The current SLA for the physical layer is that hardware problems are handled during the week as part of the IS Team's normal duties. Hardware problems occurring during the weekend are only fixed on a best-effort basis.

Access to the root shell

If you want to become root, just use sudo bash.

Cluster configuration

Some configuration files are managed by cfengine; the configuration is in /var/lib/config/ on kenobi.

A cron task updates each node of the cluster. Managed files are updated every hour or so. To expedite or force an update, run the /etc/cron.hourly/config script on each managed machine.

These are managed via LDAP:

  • user and group accounts (service accounts are local for each machine, i.e., id < 500). Group accounts follow RFC2307bis, that is, group membership is via the member attribute using a full DN.
  • sudo rights
  • automount maps
  • password policies (not in effect yet)
  • svn login to user+email mapping ([users] section in repsys.conf)
  • administration privileges (ou=System Accounts and ou=System Groups)
  • svnperms groups for ACLs (same as posix groups)
  • mandriva.org email aliases for contributors
  • ssh public keys are also stored in the user's entry in LDAP

The tree layout is based on Projects/OpenLDAP_DIT

The "producer" LDAP server (i.e., "master" and R/W) is at svn.mandriva.com, and kenobi.mandriva.com hosts one "consumer" (i.e., "slave" and R/O). So direct any write operations to the producer or else you will get back an error and a referral to the right server.

LDAP

There are two LDAP servers: the one on svn.mandriva.com is the producer one, i.e., where all writes go to. The other one is at kenobi.mandriva.com and is a read-only consumer. All cluster machines point to these two servers, using the closest one first. So if one goes down, the other is used.

This configuration means that all write operations have to be directed to svn.mandriva.com. Read operations can go to either server: replication is quite fast, so the data is consistent between them.

So, how does one deal with this change? User accounts and groups creation is handled by script. All the rest is, for now, handled via LDAP commands. So, dear admin, fire up your LDAP client of choice. Some suggestions: luma, gq, openldap-clients.

One last thing: use your own account to perform admin tasks. Because of the use of group memberships, members of the Account Admins system group (see cn=Account Admins,ou=System Groups,dc=mandriva,dc=com) have such rights. If you are not a member of that group and think this is an error, please contact distrib-admin@mandrivalinux.org.

Sudoers in LDAP

Sudoers configuration is stored under the ou=sudoers,dc=mandriva,dc=com branch. Any change to this branch is immediately visible in all cluster nodes, so be careful.

Sudo defaults are managed under the cn=defaults entry. All other entries represent sudo roles, which are similar to a line in /etc/sudoers.

Here is an example of a sudo role called youri-submit:

dn: cn=youri-submit,ou=sudoers,dc=mandriva,dc=com
objectClass: sudoRole
cn: youri-submit
sudoUser: %packager
sudoRunAs: mandrake
sudoCommand: /usr/local/bin/mdv-youri-submit.wrapper
sudoOption: !authenticate
sudoHost: n1.mandriva.com
sudoHost: n2.mandriva.com
sudoHost: n3.mandriva.com
sudoHost: n4.mandriva.com
sudoHost: n5.mandriva.com
sudoHost: seggie.mandriva.com
sudoHost: deborah.mandriva.com
sudoHost: kenobi.mandriva.com

This role has these characteristics:

  • is about the command /usr/local/bin/mdv-youri-submit.wrapper
  • can be run only on the listed hosts (n1 through n5, seggie, deborah and kenobi)
  • has to be run as the mandrake user (i.e., sudo -u mandrake)
  • can be run by any member of the packager group
  • no authentication needed (equivalent of NOPASSWD in sudoers)
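To make the comparison with /etc/sudoers concrete, here is a minimal Python sketch that turns a sudoRole entry into a roughly equivalent sudoers line. It is only an illustration: the function names are made up for this page, the LDIF parsing is deliberately naive (no line continuations, no base64 values), and sudo itself reads the entry straight from LDAP without any such conversion.

```python
from collections import defaultdict


def parse_ldif_entry(text):
    """Parse one simple LDIF entry into {attribute: [values]}.

    Simplification: continuation lines and base64 ("::") values are not handled.
    """
    attrs = defaultdict(list)
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        name, _, value = line.partition(":")
        attrs[name.strip()].append(value.strip())
    return dict(attrs)


def sudoers_lines(entry):
    """Render a sudoRole entry as roughly equivalent /etc/sudoers lines."""
    hosts = ", ".join(entry.get("sudoHost", ["ALL"]))
    runas = ", ".join(entry.get("sudoRunAs", ["root"]))
    # "!authenticate" plays the role of the NOPASSWD tag in sudoers
    tag = "NOPASSWD: " if "!authenticate" in entry.get("sudoOption", []) else ""
    commands = ", ".join(entry["sudoCommand"])
    return [f"{user} {hosts} = ({runas}) {tag}{commands}"
            for user in entry["sudoUser"]]


# Shortened copy of the youri-submit role (only two hosts kept)
EXAMPLE_ROLE = """\
dn: cn=youri-submit,ou=sudoers,dc=mandriva,dc=com
objectClass: sudoRole
cn: youri-submit
sudoUser: %packager
sudoRunAs: mandrake
sudoCommand: /usr/local/bin/mdv-youri-submit.wrapper
sudoOption: !authenticate
sudoHost: n1.mandriva.com
sudoHost: kenobi.mandriva.com
"""

if __name__ == "__main__":
    for line in sudoers_lines(parse_ldif_entry(EXAMPLE_ROLE)):
        print(line)
```

Each sudoUser value yields one line, which mirrors how an extra sudoUser attribute extends the role to another user or group.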

If we want to add another command to this role, just add it via another sudoCommand attribute. Like this:

dn: cn=youri-submit,ou=sudoers,dc=mandriva,dc=com
(...)
sudoCommand: /usr/local/bin/mdv-youri-submit.wrapper
sudoCommand: /usr/local/bin/youri-submit
(...)

Want another user or group in the list of authorized entities? Just add another sudoUser attribute. Another option? Add sudoOption. Host? sudoHost. And so on.

User and group accounts in LDAP

User and group accounts are stored in LDAP with the following characteristics:

  • using RFC2307bis schema (i.e., groups use groupOfNames as structural object class)
  • using cn=unixIdPool entry to store free user and group global numeric identifiers
  • group membership is automatically handled by the OpenLDAP server whenever a user is removed
  • primary group for all user accounts is users (gidNumber=100)

The svn/git machine uses a little trick to prevent users from using interactive shells: on that machine only, loginShell from LDAP is overridden locally to point to a wrapper script which only allows git and svn commands. This is done in /etc/ldap.conf.

For now there are two scripts: one to add users and another to add groups. Both have similar syntax and behaviour; see their usage text for more details.

NOTE: anything that touches the userPassword attribute has to be protected by encryption. This means that any non-anonymous authentication has to use ldaps:// or ldap:// + START_TLS. For the OpenLDAP client command-line tools, add -ZZ as a parameter. For other LDAP clients, check the respective documentation.

WARNING: there is a race condition in adding users and groups because the increment+modify operation (RFC 4525 and RFC 4527 for pre/post-read) in LDAP is not being used at the moment. This means that if more than one admin is using the add user/group script at the same time, the new entries may get the same uidNumber or gidNumber. uidNumber is protected from duplication on the server side and the script would fail if it happened, but not gidNumber. So, if such a problem arises, please check these attributes for duplication and, if necessary, increment the values in the cn=unixIdPool,dc=mandriva,dc=com entry.
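The duplication check mentioned in the warning can be sketched in Python as follows. This is an illustration only, not one of the existing scripts: it assumes the group entries were already fetched from LDAP (for example with ldapsearch) and are passed in as (dn, attributes) pairs, and the function names and the 500 lower bound (taken from the "service accounts have id < 500" convention) are choices made for this example.

```python
from collections import defaultdict


def find_duplicate_ids(entries, attribute="gidNumber"):
    """Return {number: [dns]} for every numeric id used by more than one entry."""
    seen = defaultdict(list)
    for dn, attrs in entries:
        for value in attrs.get(attribute, []):
            seen[int(value)].append(dn)
    return {number: dns for number, dns in seen.items() if len(dns) > 1}


def next_free_id(entries, attribute="gidNumber", minimum=500):
    """Smallest value above every id in use (and not below `minimum`).

    This is the kind of value cn=unixIdPool should hold after a collision.
    """
    used = {int(v) for _, attrs in entries for v in attrs.get(attribute, [])}
    return max(used | {minimum - 1}) + 1
```

For example, if two groups both ended up with gidNumber 5001, find_duplicate_ids reports their DNs, and next_free_id gives a value that is safe to store back in the pool entry once the duplicate has been renumbered.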

Adding a user

Use this script: cluster-adduser.sh

Currently these are the roles available for user accounts. A role is represented by a set of groups:

Roles explained
Role        Privileges                Groups        Use for...
apprentice  shell,iurt                apprentice    people who may become maintainers in the future
svn-only    commit,shell              svn           basic svn access
translator  commit,shell              svn,po        only translator related work
packager    commit,shell,iurt,upload  svn,packager  maintainers who commit and upload packages

(nss_ldap has support for nested groups, i.e., groups within groups, but I don't trust it yet)
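Since a role is just a set of groups, checking what a user is still missing for a given role is a simple set comparison. The sketch below encodes the table above as a Python dictionary; the function names are made up for this example, and the user's current groups are assumed to come from something like the output of id.

```python
# Role -> groups, copied from the "Roles explained" table
ROLE_GROUPS = {
    "apprentice": ["apprentice"],
    "svn-only": ["svn"],
    "translator": ["svn", "po"],
    "packager": ["svn", "packager"],
}


def missing_groups(role, user_groups):
    """Groups the user still has to join to fulfil the given role."""
    current = set(user_groups)
    return [group for group in ROLE_GROUPS[role] if group not in current]


def roles_satisfied(user_groups):
    """Roles whose groups are all present in the user's group list."""
    current = set(user_groups)
    return [role for role, needed in ROLE_GROUPS.items()
            if set(needed) <= current]
```

A user in users and svn, for instance, already satisfies svn-only and only needs the packager group to become a packager.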

To add a new user account, please use the above mentioned cluster-adduser.sh script. WARNING: this script is still lacking some features:

  • subscribing the user to maintainers@ mailing list in the case of a packager role
  • create home directory on specified cluster node and svn machine

So, after creating the user in LDAP, these tasks need to be performed manually:

  • for packagers, ask the user to subscribe himself to the maintainers list by sending an email to sympa@mandrivalinux.org with the body "subscribe maintainers"
  • for packagers, create the account on https://maint.mandriva.com/
  • ssh into the node and run /var/lib/config/bin/new-account <loginname> users

Adding a group

Use this script: cluster-addgroup.sh

This script also has the option of specifying an owner for the group. Owners are like group admins: they can manage the group membership at will by adding or removing members.

Note it's currently not possible to add initial members to the group with the script. This has to be performed later with standard LDAP operations.

Modifying a group

Here is a script to add user(s) to a group: http://svn.mandriva.com/svn/soft/build_system/account_management/cluster-adduser2group.sh

Alternatively, you can use standard LDAP operations on the group entry. For example, to make jsmith part of the packager group, run this:

$ ldapmodify -x -ZZ -D uid=<your-login>,ou=people,dc=mandriva,dc=com -W -h svn.mandriva.com
Enter LDAP Password: secret
dn: cn=packager,ou=Group,dc=mandriva,dc=com
changetype: modify
add: member
member: uid=jsmith,ou=People,dc=mandriva,dc=com

modifying entry "cn=packager,ou=Group,dc=mandriva,dc=com"
^D

Or just use your LDAP client of choice to add that new member attribute.

To remove a user from a group, it's almost the same operation. Just be careful to specify which member you are removing: if left blank, all members would be removed!

$ ldapmodify -x -ZZ -D uid=<your-login>,ou=people,dc=mandriva,dc=com -W -h svn.mandriva.com
Enter LDAP Password: secret
dn: cn=packager,ou=Group,dc=mandriva,dc=com
changetype: modify
delete: member
member: uid=jsmith,ou=People,dc=mandriva,dc=com

modifying entry "cn=packager,ou=Group,dc=mandriva,dc=com"
^D

The command above removed jsmith from the packager group.
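If you generate the LDIF from a script instead of typing it, it is worth refusing to build a delete without an explicit member, for the reason given above. Here is a minimal Python sketch of that idea; the function name is hypothetical, the DN layout is the one used in the examples on this page, and no escaping of special characters in DNs is attempted.

```python
def member_change_ldif(group, login, action, base="dc=mandriva,dc=com"):
    """Build the LDIF that adds or deletes one member of a group.

    Raises ValueError rather than emitting a "delete: member" block without
    a value, because that would remove every member of the group.
    """
    if action not in ("add", "delete"):
        raise ValueError("action must be 'add' or 'delete'")
    if not login:
        raise ValueError("refusing to build a change without an explicit member")
    return "\n".join([
        f"dn: cn={group},ou=Group,{base}",
        "changetype: modify",
        f"{action}: member",
        f"member: uid={login},ou=People,{base}",
        "",  # trailing newline ends the record
    ])


if __name__ == "__main__":
    print(member_change_ldif("packager", "jsmith", "delete"), end="")
```

The output has the same shape as the input typed in the transcripts above, so it can be fed to ldapmodify on standard input with the same options.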

Modifying a user

Again, standard LDAP operations should be used to perform modifications on a user entry. For example, to change the alias email of jsmith to js22@gmail.com, run this:

$ ldapmodify -x -ZZ -D uid=<your-login>,ou=people,dc=mandriva,dc=com -W -h svn.mandriva.com
Enter LDAP Password: secret
dn: uid=jsmith,ou=People,dc=mandriva,dc=com
changetype: modify
replace: mailForwardingAddress
mailForwardingAddress: js22@gmail.com

modifying entry "uid=jsmith,ou=People,dc=mandriva,dc=com"
^D

The manager of a user, defined by the manager attribute if present, has some additional permissions over that entry compared to the user him/herself. For example, the manager can upload a photo (jpegPhoto), change the mailForwardingAddress value and edit some other attributes, which are not listed here because the list may change as this feature is used more (or less).

For example, if jsmith had this entry:

dn: uid=jsmith,ou=people,dc=mandriva,dc=com
(...)
manager: uid=peter,ou=people,dc=mandriva,dc=com
mailForwardingAddress: peter21@gmail.com

This means that uid=peter,ou=people,dc=mandriva,dc=com could change, among other things, the mailForwardingAddress of this jsmith entry.

The manager attribute also indicates who the tutor of a user is in the case of an apprentice, so you know whom to contact if needed.

Removing a group

To remove a group, just delete its entry in LDAP with a standard LDAP operation. Like this:

$ ldapdelete -x -ZZ -D uid=yourname,ou=People,dc=mandriva,dc=com -W -h svn.mandriva.com cn=<group>,ou=Group,dc=mandriva,dc=com
Enter LDAP Password: secret

Notice that any filesystem objects which had this group in their permissions will stop recognizing the group name, showing the numeric identifier instead.

Removing/Disabling a user

In most cases, it's better to disable a user account instead of removing it.

To disable an account, just make sure it has the shadowExpire: 1 attribute/value in it. For example, to disable the account of the jsmith user:

$ ldapmodify -x -ZZ -D uid=<yourlogin>,ou=People,dc=mandriva,dc=com -W -h svn.mandriva.com
Enter LDAP Password: secret
dn: uid=jsmith,ou=People,dc=mandriva,dc=com
changetype: modify
replace: shadowExpire
shadowExpire: 1

modifying entry "uid=jsmith,ou=People,dc=mandriva,dc=com"
^D

This will prevent that user from logging in to the cluster and from using authenticated svn sessions. To re-enable an account, just remove that attribute. For example, in the case of the same jsmith user:

$ ldapmodify -x -ZZ -D uid=<yourlogin>,ou=People,dc=mandriva,dc=com -W -h svn.mandriva.com
Enter LDAP Password: secret
dn: uid=jsmith,ou=People,dc=mandriva,dc=com
changetype: modify
delete: shadowExpire

modifying entry "uid=jsmith,ou=People,dc=mandriva,dc=com"
^D

Alternatively, you can use these scripts: cluster-enableuser.sh and cluster-disableuser.sh.
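Why does shadowExpire: 1 disable the account? The attribute counts days since 1970-01-01, so the value 1 is a date far in the past. The Python sketch below shows the arithmetic; it is an illustration only (the function names are made up, and treating a negative value as "never expires" is an assumption borrowed from the shadow file convention), not the check performed by the PAM/NSS stack on the nodes.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)


def expiry_date(shadow_expire):
    """Convert a shadowExpire value (days since 1970-01-01) to a date."""
    return EPOCH + timedelta(days=int(shadow_expire))


def is_disabled(attrs, today=None):
    """True if the entry's shadowExpire date has already been reached."""
    values = attrs.get("shadowExpire", [])
    if not values or int(values[0]) < 0:
        # attribute absent (or negative, by assumption): never expires
        return False
    return expiry_date(values[0]) <= (today or date.today())
```

With shadowExpire set to 1 the expiry date is 2 January 1970, so the account is expired on any realistic date; removing the attribute makes is_disabled return False again.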

Removing a user is a bit more tricky, because of the many places where the user is referenced. Some are automatically handled, but others are not:

  • group membership: automatically handled by the OpenLDAP server. The user is removed from all groups to which he/she belongs.
  • automount maps: have to be manually removed from ou=Mounts,dc=mandriva,dc=com
  • sudo rules: have to be manually updated in ou=sudoers,dc=mandriva,dc=com unless groups are used (search for sudoUser=name)
  • bugzilla: not affected (bugzilla's database is independent of LDAP)
  • email alias: automatically dropped, because it's in the user entry itself
  • maintainers@ ml subscription: has to be dealt with manually
  • home directory: has to be dealt with manually

We will try to come up with a script to do this all.

Automount maps in LDAP

Automount maps are stored in LDAP under ou=Mounts,dc=mandriva,dc=com. These maps are automatically created by the cluster-adduser.sh script. Here is an example:

dn: cn=andreas,ou=auto.home,ou=Mounts,dc=mandriva,dc=com
objectClass: automount
automountInformation: -rw,nfs,soft,intr,nosuid,rsize=8192,wsize=8192 n5.mandriva.com:/export/home/&
cn: andreas
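In this entry, the cn is the automount key and the & at the end of automountInformation is replaced by that key when the map is resolved. The following Python sketch shows how the value breaks down into mount options and an NFS location; the function name is made up for this example and the parsing only covers the simple "-options location" form shown above.

```python
def resolve_automount(key, information):
    """Split an automountInformation value into (options, location).

    The "&" wildcard in the location is replaced by the map key.
    """
    options = []
    location = information.strip()
    if location.startswith("-"):
        option_text, _, location = location.partition(" ")
        options = option_text[1:].split(",")
    return options, location.strip().replace("&", key)


if __name__ == "__main__":
    info = ("-rw,nfs,soft,intr,nosuid,rsize=8192,wsize=8192 "
            "n5.mandriva.com:/export/home/&")
    print(resolve_automount("andreas", info))
```

For the andreas entry this gives the location n5.mandriva.com:/export/home/andreas, mounted read-write over NFS with the listed options.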

Password policies in LDAP

These are currently ignored. The documentation is here only for completeness.

It is possible now to store password policies in LDAP, and use different policies for different users. For example, here is a fictional policy called "cluster":

dn: cn=cluster,ou=Password Policies,dc=mandriva,dc=com
pwdExpireWarning: 604800
cn: cluster
objectClass: pwdPolicy
objectClass: namedObject
pwdMinLength: 6
pwdCheckQuality: 1
pwdAttribute: userPassword
pwdMaxAge: 5184000
pwdMustChange: TRUE
pwdInHistory: 2

Policies will only be used after some more testing with ssh interaction, especially the password change feature, unless we decide to use ssh-key authentication only (which would be better).
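The time-based attributes of a policy are expressed in seconds, which is not very readable. As a small worked example, the Python sketch below converts the two time values of the fictional "cluster" policy to days; the dictionary simply repeats values from the entry above and the function name is made up for this page.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Values copied from the fictional "cluster" policy entry
CLUSTER_POLICY = {
    "pwdMaxAge": "5184000",
    "pwdExpireWarning": "604800",
    "pwdMinLength": "6",
    "pwdInHistory": "2",
}


def policy_in_days(policy):
    """Return the time-based attributes of a pwdPolicy entry, in days."""
    timed = ("pwdMaxAge", "pwdExpireWarning")
    return {name: int(policy[name]) / SECONDS_PER_DAY
            for name in timed if name in policy}


if __name__ == "__main__":
    print(policy_in_days(CLUSTER_POLICY))
```

So this policy would expire passwords after 60 days and start warning 7 days before expiry.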

Contributor email aliases in LDAP

Contributor email aliases are also handled in LDAP. Periodically, a script scans the user entries and generates a postfix-compatible virtual alias file that is sent to the mandriva.org MTA. Two attributes are used to construct the alias: mail and mailForwardingAddress (which can also be called mailAlternateAddress).

For example, this user entry:

dn: uid=jsmith,ou=People,dc=mandriva,dc=com
(...)
mail: jsmith@mandriva.org
mail: jsmith@mandriva.com
mailForwardingAddress: jsmith26@gmail.com
(...)

Would generate this line for a postfix virtual alias (note how @mandriva.com was ignored):

jsmith@mandriva.org      jsmith26@gmail.com

So, the rules are:

  • mail: has to be a @mandriva.org address
  • mailForwardingAddress: if mail is a @mandriva.org address, then this attribute contains the aliased address, i.e., the final destination
  • any mail other than @mandriva.org is ignored by the script

The current script is here: generate-aliases.py
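The rules above can be sketched in a few lines of Python. This is not the generate-aliases.py script itself, only an illustration of the same rules: user entries are assumed to be already fetched from LDAP and given as dictionaries of attribute lists, and the function name is made up for this example.

```python
def virtual_alias_lines(entries, domain="mandriva.org"):
    """Build postfix virtual alias lines from user entries.

    Only mail values in `domain` produce an alias; entries without a
    mailForwardingAddress produce nothing.
    """
    lines = []
    for attrs in entries:
        targets = attrs.get("mailForwardingAddress", [])
        if not targets:
            continue
        for address in attrs.get("mail", []):
            if address.lower().endswith("@" + domain):
                lines.append(f"{address} {', '.join(targets)}")
    return lines


if __name__ == "__main__":
    jsmith = {
        "mail": ["jsmith@mandriva.org", "jsmith@mandriva.com"],
        "mailForwardingAddress": ["jsmith26@gmail.com"],
    }
    print("\n".join(virtual_alias_lines([jsmith])))
```

Run on the jsmith entry, it emits a single alias for jsmith@mandriva.org and skips the @mandriva.com address, as in the example above.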

svnperms.conf groups in LDAP

The svnperms.conf has fine grained ACLs for write access to different paths of a repository. One of its sections is the [group] one, which defines a group and its members. This definition was moved to LDAP and shares the posix groups, i.e., svn groups are the same as posix groups and all posix groups are available to be used as svn groups.

So, for example, to add a user to a group mentioned in an svn ACL, just add this user to that posix group.

To add permissions for a new project, a new entry has to be added in the matching repository section (for example [projects]) of svnperms.conf:

 myproject/.* = *() @mygroup(add,remove,update)

ssh public keys in LDAP

The SSH public keys are centralized in LDAP, inside each user's entry. We are not yet, however, using openssh with the LPK patch, which means that SSH still looks for these keys in a file on disk.

So there is a cron job which periodically fetches the keys from LDAP and stores them in a local file, outside the user's home directory. This has the added bonus that the user will still be able to log in even if his/her homedir is not available due to nfs problems.

The following scripts are available to deal with SSH keys in LDAP:

  • send-sshkey-ldap.py: this script is used to add or replace keys in the user's entry
  • ldap-sshkey2file.py: this is the script that runs via cron and stores the keys in /var/lib/pubkeys/loginname/authorized_keys
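To illustrate the second step, here is a minimal Python sketch of writing a user's keys to the per-login authorized_keys file. It is not the ldap-sshkey2file.py script: fetching the keys from LDAP is left out, the function name and the 0644 mode are assumptions made for this example, and the root directory is a parameter so the sketch can be tried outside /var/lib/pubkeys.

```python
import os
from pathlib import Path


def write_authorized_keys(login, keys, root="/var/lib/pubkeys"):
    """Write `keys` to <root>/<login>/authorized_keys and return its path.

    The file is written under a temporary name and then renamed, so sshd
    never sees a half-written key file.
    """
    target_dir = Path(root) / login
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "authorized_keys"
    tmp = target_dir / "authorized_keys.tmp"
    tmp.write_text("".join(key.strip() + "\n" for key in keys if key.strip()))
    os.chmod(tmp, 0o644)  # assumed mode: readable by sshd, writable by owner only
    os.replace(tmp, target)
    return target
```

Because the file lives outside the home directory, the keys stay usable even when the NFS-mounted home is unavailable, which is the point made above.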

The cluster-adduser.sh script already adds the ssh public key to the user it is creating in LDAP. In fact, supplying the ssh key is mandatory now.

There are ACLs protecting access to these keys:

  • the user can update his/her keys at will using LDAP commands or the send-sshkey-ldap.py script as long as he/she has a password in LDAP;
  • admins can update keys;
  • keys can only be read by the user or admins in authenticated sessions. So, an anonymous search of the LDAP tree won't display the SSH keys. This may need to be changed if we start to use the LPK patch.

Granting svn commit rights to a user who already has ssh access

There is more than one SVN repository available. The example below is for the packages one:

  • check that the user is part of at least the svn group (run id <user>)
  • add the user to the group that already has the needed rights
  • if some ACL change is needed (usually it's not), do the following:
    • checkout svn+ssh://svn.mandriva.com/svn/config/svn/packages/conf
    • edit the svnperms.conf file if needed, and commit. Note that groups are defined in LDAP.
    • update the checkout on svn.mandriva.com: cd /svn/packages/conf; svn up (a checkout-on-commit feature would be nice, but there are some privilege problems at the moment, and no clean way to handle them)

Groups used in ACLs in the svnperms.conf file are defined in LDAP as regular posixGroup/groupOfNames. So, if all that is needed is to add a user to a group, do it in LDAP.

For example, if one wants to add user jsmith to the drakx group, this would do it if you are an admin:

$ ldapmodify -x -ZZ -D uid=<your-login>,ou=people,dc=mandriva,dc=com -W -h svn.mandriva.com
Enter LDAP Password: secret
dn: cn=drakx,ou=Group,dc=mandriva,dc=com
changetype: modify
add: member
member: uid=jsmith,ou=People,dc=mandriva,dc=com

modifying entry "cn=drakx,ou=Group,dc=mandriva,dc=com"
^D

The change is effective immediately for all repositories.

Adding a buildhost to the upload system ACL, based on architecture

To declare another buildhost to the upload system, you need to edit the file /etc/youri/hosts.conf, i.e., edit the copy under /var/lib/config on kenobi. The format is simple:

host-regexp arch-regexp

The check comes from Youri::Upload::Check::Host (/usr/local/lib/perl/Youri/Upload/Check/Host.pm)
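As an illustration of the "host-regexp arch-regexp" format, the Python sketch below parses such lines and tests whether a buildhost may upload a given architecture. It does not reproduce the Perl code in Youri::Upload::Check::Host: the sample lines are hypothetical, and matching the whole host and arch strings (rather than searching inside them) is an assumption of this example.

```python
import re

# Hypothetical hosts.conf content, for illustration only
EXAMPLE_HOSTS_CONF = r"""
n[1-5]\.mandriva\.com i586|noarch
(seggie|deborah)\.mandriva\.com x86_64|noarch
"""


def parse_hosts_conf(text):
    """Parse 'host-regexp arch-regexp' lines into compiled pattern pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        host_re, arch_re = line.split(None, 1)
        rules.append((re.compile(host_re), re.compile(arch_re)))
    return rules


def host_allowed(rules, buildhost, arch):
    """True if some rule matches both the buildhost and the architecture."""
    return any(host.fullmatch(buildhost) and archs.fullmatch(arch)
               for host, archs in rules)
```

With these sample rules, n3.mandriva.com is accepted for i586 but rejected for x86_64, so a new buildhost needs a line whose first pattern matches its name and whose second matches the architectures it builds.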

Adding full privileges to someone on bugzilla

Ask vdanen@mandriva.com

Access to the real system outside of chroot

To make recovery possible in case of big problems, each cluster node runs inside a chroot. The real system can be accessed on port 12, like this:

ssh n5 -p 12

You can also mount the real partition (/dev/hda5) and use chroot to get outside of the first chroot.

Cleaning iurt process that does not respond

If the build is too old (i.e., it has been running for more than one day and does nothing), iurt should be killed. Please take a look at the log file first (its path is shown by ps aux) to see whether another method would work. rpm/urpmi locking problems are known; these require a kill.

RPM build failing in weird ways

Builds with bm work, but builds with rpm don't. Check if nscd is running. If not, start it with:

sudo /sbin/service nscd restart (allowed for everybody in the packagers group)

This is a bug in nss_ldap (http://bugzilla.padl.com/show_bug.cgi?id=273) that can be worked around by using nscd.

This was fixed with nss_ldap-257, which should be installed in all cluster nodes by now.

System doesn't show changes in accounts/groups

We use nscd on cluster machines, which is a cache of user and group information. So when one changes some group membership, for example, it can take some time to show up for tools like getent and id. To speed it up, just invalidate the cache. For example, to invalidate the group cache, run:

sudo nscd -i group

To invalidate the passwd cache, the command is:

sudo nscd -i passwd

Problem with autofs?

In case autofs is not working, here is a quick summary on how things are set up: autofs is the one from 2006.0, because we run a 2006.0 kernel outside of the cooker chroot (there are various incompatibilities with the cooker version at the moment).

To reinstall, just run:

rpm -Uvh /mnt/BIG/dis/community/2006.0/i586/media/main/autofs-4.1.4-4.2.20060mdk.i586.rpm

Autofs is run from inside the cooker chroot. The config files are /etc/auto.home and /etc/auto.master, managed by cfengine.

Kenobi uses autofs5, because it has a newer kernel. The config files are stored in /etc/autofs instead of /etc directly.

Upload is stopped on kenobi

Sometimes, a mail is sent to signal a problem on kenobi or ken:

Subject: [Maintainers] kenobi.mandriva.com filesystem is full
Only 4878812 bytes available.

Stopping upload and mirroring processes.

On kenobi, this mail is sent by the script /etc/cron.hourly/stop_if_full, which runs /home/mandrake/bin/stop_if_full. The script checks available disk space on /export/home/ and /mnt/BIG/ and stops crond if space is running out.

So if this happens, it usually means that something is taking too much space. Among the usual suspects is /mnt/BIG/dis/uploads/failure/cooker/{contrib,main}/release/, which can fill up pretty quickly. Using find and rm to remove the old logs is the usual way to clean it.
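Before removing anything, it helps to list what would go. The Python sketch below is a rough equivalent of a find on modification time: it only lists files older than a given number of days and deletes nothing. The function name and the age threshold are choices made for this example; pick the threshold according to how long failure logs need to be kept.

```python
import os
import time


def old_files(root, max_age_days, now=None):
    """Yield paths of regular files under `root` not modified for `max_age_days`."""
    reference = time.time() if now is None else now
    cutoff = reference - max_age_days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # lstat: do not follow symlinks into other trees
            if os.lstat(path).st_mtime < cutoff:
                yield path


if __name__ == "__main__":
    import sys
    # usage: python old_files.py <directory> <days>
    for path in old_files(sys.argv[1], float(sys.argv[2])):
        print(path)
```

Review the printed list first, then remove the files with rm (or os.remove) once you are sure nothing in it is still needed.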

Once this is done, crond must be started again.

service crond start

(re)move a package

The package repository reference machine is ken. If you have the right access to it, whatever is done there in terms of package move, removal, etc. is reflected in the rest of the world. For example, to move a package from 2007 main/backports to 2007 contrib backports one could do this:

for m in /mnt/BIG/dis/2007.0/{SRPMS,*/media}; do mv $m/main/backports/*warzone* $m/contrib/backports; done

Checking the logs of buildsystem

Since buildsystem is mainly using cron, you can get the logs of the job by running "mutt" on kenobi, as the user mandrake.

Cluster is broken, what should be checked?

  • First, check disk space on every node.
  • Check if the time is correct. Beware, ntpdate will refuse to sync if the gap between kenobi and the cluster is too wide.
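The disk space check is easy to script. Here is a minimal Python sketch, to be run on each node, that reports the mount points with less free space than a threshold; the function name is made up and the 1 GiB threshold is an arbitrary example value, while the two paths are the ones watched by stop_if_full on kenobi.

```python
import shutil


def low_space(paths, min_free_bytes):
    """Return {path: free_bytes} for every path with less than min_free_bytes free."""
    report = {}
    for path in paths:
        free = shutil.disk_usage(path).free
        if free < min_free_bytes:
            report[path] = free
    return report


if __name__ == "__main__":
    # Example threshold of 1 GiB; adjust the paths to the node being checked
    for path, free in low_space(["/export/home/", "/mnt/BIG/"], 1024 ** 3).items():
        print(f"{path}: only {free} bytes available")
```

An empty result means the checked filesystems are above the threshold, so the problem is more likely elsewhere (for example the clock).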

I modified a group/user but the changes don't show up!

All cluster nodes use nscd, which means, the posix data from LDAP is cached. To see the real data, stop nscd and check again. If you are in a hurry, you can remove the cache in /var/db/nscd/* and restart the daemon.

rpmlint configuration tips for kenobi

Kenobi runs rpmlint on each submitted package.

  • RPM groups: /etc/rpmlint/config, managed via cfengine. So, edit /var/lib/config/etc/rpmlint/config instead
  • extra checks are loaded from /usr/local/bin/rpmlint/

Todo

  • script to create user accounts. Mostly done:
    • support rfc2307bis groups
    • support cn, sn, givenName and email domain
    • better support for being called in a script (i.e., needs command line parameters for some stuff)
    • support for uid/gid pool instead of enumerating all users/groups in order to find out what number to use
    • possibly not depend on nss_ldap configured
    • create home dir on node and svn.mandriva.com
    • upload ssh key to hosts (home host + svn.mandriva.com (and patch it with command in the svn host))
    • subscribe to maintainers list
    • use employeeType and manager attributes. For example, for the apprentice role we could have:
      • employeeType: apprentice
      • manager: the DN of the user who is tutoring this apprentice (for example, uid=peroyvind,ou=People,dc=mandriva,dc=com)
    • give bugzilla permissions (I think this could be done automatically by bugzilla via an email regexp)
  • change OpenLDAP ACLs to allow the manager of a user to write to some of his/her attributes (still need to define which ones). For example, fix email, reset password, add photo, etc.
  • autofs: test autofs with these maps, come up with a configuration
  • fix emails: contributor vs employee (.org vs .com) (based on repsys.conf)
  • fix aliases: need a script to dump forwarding email from ldap into aliases format and rsync to postfix server
  • replication: setup slave/consumer, decide on which machine(s)
  • nested groups: patch nss_ldap to disable nested group support (patch done, but not applied by default, it's not "upstream quality")
  • svnperms: patch svnperms.py to support groups in LDAP instead of svnperms.conf
  • script to remove and/or disable users
  • to help admins not familiar with LDAP, script to:
    • manage group membership (add users to groups, remove users from groups)