Format of synthesis.hdlist.cz index
From Mandriva Community Wiki
This page describes the format of the synthesis.hdlist.cz index file generated by genhdlist.
Note: on mirrors where hdlists and synthesis files are built, make sure that media/media_info/hdlist_cz is hard-linked to media/main/media_info/hdlist.cz.
So if you rebuild your hdlist, don't forget to recreate the hard link. The best way to regenerate the hdlists for all media is to use gendistrib.
Parsing a synthesis file is easy: I wrote a parser in Python in half a day, without any documentation on the format and without special Python libraries. See the attachment.
Format
First, although the name does not suggest it, synthesis.hdlist.cz is compressed with gzip. So the first step is to decompress it with something like Perl's IO::Gzip or Python's gzip module.
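A minimal Python sketch of this first step, using only the standard gzip module. The helper name read_synthesis_lines is hypothetical (it is not part of perl-URPM or any Mandriva tool), and decoding as UTF-8 while replacing undecodable bytes is an assumption about the text encoding.

```python
import gzip
from typing import Iterator


def read_synthesis_lines(path: str) -> Iterator[str]:
    """Yield the non-empty lines of a gzip-compressed synthesis file.

    Hypothetical helper: the file is assumed to be plain gzip, and bytes
    that are not valid UTF-8 are replaced instead of raising an error.
    """
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            line = line.rstrip("\n")
            if line:  # skip blank lines
                yield line
```

Because it is a generator, the whole file never has to be held in memory at once.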
The format is easy to understand, even though it is not documented at all. Most of the parsing is done by a Perl XS library, and the only "description" of the format is the perl-URPM source code.
Here is a sample entry from the file:
@provides@openldap1[== 1.2.12-4mdk]
@requires@libldap1[== 1.2.12-4mdk]@rpm-helper[*]@/bin/sh[*]@/bin/sh[*]@bash@libc.so.6@libc.so.6(GLIBC_2.0)@libc.so.6(GLIBC_2.1)@libc.so.6(GLIBC_2.3)@libcrypt.so.1@libcrypt.so.1(GLIBC_2.0)@libnsl.so.1@libpthread.so.0@libpthread.so.0(GLIBC_2.0)@libpthread.so.0(GLIBC_2.1)@libpthread.so.0(GLIBC_2.3.2)@libresolv.so.2@libtermcap.so.2
@summary@LDAP servers and sample clients.
@info@openldap1-1.2.12-4mdk.i586@0@2054148@System/Servers
The lines always appear in the same order; at the very least, the @info@ line always marks the end of an entry.
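The grouping rule can be sketched as follows, assuming each line has the shape @tag@field@field... and that the @info@ line closes an entry. group_entries is a hypothetical helper; it keeps the summary text whole and splits every other line on @.

```python
from typing import Dict, Iterable, Iterator, List


def group_entries(lines: Iterable[str]) -> Iterator[Dict[str, List[str]]]:
    """Yield one {tag: fields} dict per package entry."""
    entry: Dict[str, List[str]] = {}
    for line in lines:
        if not line.startswith("@"):
            continue  # not a synthesis field line; skip it
        # "@requires@a@b" -> tag "requires", rest "a@b"
        tag, _, rest = line[1:].partition("@")
        if tag == "summary":
            entry[tag] = [rest]  # free text: keep any "@" it contains
        else:
            entry[tag] = rest.split("@") if rest else []
        if tag == "info":
            yield entry  # @info@ closes the current entry
            entry = {}
    # Lines after the last @info@ (an incomplete entry) are dropped.
```

Any iterable of lines works, for example the decompressed lines of the file.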
The first field is always the type of the line. So far, it can be one of:
- provides
- requires
- obsoletes
- conflict
- summary
- info
The first four tags (provides, requires, obsoletes, and conflict) use the same scheme. They are followed by one or more package names separated by @, each optionally carrying a version restriction such as package[== version]. As far as I have seen, the restriction operator can be <=, >= or ==.
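A sketch of parsing one item of such a list, under stated assumptions: the operator pattern also accepts <, > and = in case they occur, and the bare [*] suffix that appears in the sample entry (for example rpm-helper[*]) is only recorded as a flag, not interpreted. Dependency, parse_dependency and parse_dependency_field are hypothetical names.

```python
import re
from typing import List, NamedTuple, Optional

# name, then an optional "[*]", then an optional "[op version]".
_DEP_RE = re.compile(
    r"(?P<name>[^\[]+)"
    r"(?P<star>\[\*\])?"
    r"(?:\[(?P<op>[<>=]=?)\s*(?P<version>[^\]]+)\])?"
)


class Dependency(NamedTuple):
    name: str
    op: Optional[str]       # "==", "<=", ">=", ... or None if unrestricted
    version: Optional[str]  # e.g. "1.2.12-4mdk", or None
    starred: bool           # True if the item carried a "[*]" suffix


def parse_dependency(item: str) -> Dependency:
    """Parse one item such as "openldap1[== 1.2.12-4mdk]"."""
    m = _DEP_RE.fullmatch(item)
    if m is None:
        # Unrecognised shape: keep the raw text as the name.
        return Dependency(item, None, None, False)
    return Dependency(m.group("name"), m.group("op"), m.group("version"),
                      m.group("star") is not None)


def parse_dependency_field(rest: str) -> List[Dependency]:
    """Parse the "@"-separated list that follows a dependency tag."""
    return [parse_dependency(item) for item in rest.split("@") if item]
```

Names such as libc.so.6(GLIBC_2.0) contain no brackets, so they come back unchanged as the name with no restriction.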
The summary line is simple too: the tag is followed only by the package summary, on one line.
The last line is info, split like this:
@info@name-version-release.arch@epoch@size@group
As most of these names are self-explanatory, I will not explain them in detail. arch is src for a source RPM (src.rpm), or an architecture such as i586, ppc or noarch. size is in bytes, and group is the RPM group, as listed in Mandriva Groups.
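The info line can be decoded as below. This sketch assumes the architecture is everything after the last dot, and that neither version nor release contains a hyphen (RPM does not allow one there), so package names with hyphens such as rpm-helper still split correctly. PackageInfo and parse_info are hypothetical names, and any fields after group are ignored.

```python
from typing import NamedTuple


class PackageInfo(NamedTuple):
    name: str
    version: str
    release: str
    arch: str   # "src" for a source RPM, otherwise e.g. "i586" or "noarch"
    epoch: int
    size: int   # in bytes
    group: str  # RPM group, e.g. "System/Servers"


def parse_info(rest: str) -> PackageInfo:
    """Parse the text that follows "@info@" on an info line."""
    fields = rest.split("@")
    # name-version-release.arch, epoch, size, group; extra fields are ignored.
    nvra, epoch, size, group = fields[:4]
    nvr, _, arch = nvra.rpartition(".")          # arch follows the last "."
    name, version, release = nvr.rsplit("-", 2)  # name itself may contain "-"
    return PackageInfo(name, version, release, arch, int(epoch), int(size), group)
```

For the sample entry, parse_info("openldap1-1.2.12-4mdk.i586@0@2054148@System/Servers") yields name openldap1, version 1.2.12, release 4mdk, arch i586, epoch 0, size 2054148 and group System/Servers.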
Problem with synthesis
However, one problem remains.
What if an RPM contains an @ in its name or in its description?
Right now, nothing guards against this: genhdlist will crash, and most of the tools that read synthesis files will be broken by this bug.
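The following sketch, with a made-up summary, shows the failure mode: a parser that splits every line on @ cuts the summary in two. Because the summary is the only field on its line, taking everything after the @summary@ prefix avoids the problem there, but an @ inside a package name in a provides or requires list stays ambiguous, since @ is also the item separator. naive_fields and summary_text are hypothetical helpers.

```python
from typing import List


def naive_fields(line: str) -> List[str]:
    """Split a synthesis line on every "@", as a simple parser might."""
    return line.split("@")[1:]


def summary_text(line: str) -> str:
    """Return everything after "@summary@", keeping any "@" in the text."""
    prefix = "@summary@"
    if not line.startswith(prefix):
        raise ValueError("not a summary line: %r" % line)
    return line[len(prefix):]


# Made-up summary containing an "@".
line = "@summary@Mail client for user@host addresses"
print(naive_fields(line))  # ['summary', 'Mail client for user', 'host addresses']
print(summary_text(line))  # Mail client for user@host addresses
```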
- synthesis.py: a sample Python class implementing a synthesis parser

