- With KDE (as described in http://www.kdedevelopers.org/node/1664 and http://www.kdedevelopers.org/node/1663), the major parts of startup time are now spent in:
  - kernel (~33%) - whatever "kernel" means exactly in sysprof
  - X (~20%)
  - fonts (~15%) - still
  - KConfig parsing (~10%) - it seems reading everything into QMap is not such a good idea after all
  - dynamic linker (~10%) - that's with prelink though; it's ~33% if we use only kdeinit and nothing else. On the other hand, I have a patch for ld.so that avoids relocation processing completely for dlopen of prelinked libraries, which kind of fakes a prelinked dlopen; then there's still 2% spent in other library processing.

This again makes the dynamic linker a noticeable cost in KDE performance. Even with prelink we spend 10% of the time in relocation processing, and since KDE startup doesn't use dlopen as extensively as e.g. Konqueror startup does, the cost may be even higher for heavily plugin-based applications. Prelink currently doesn't work with dlopen. Possible solutions include adding such support to prelink or using Michael Meeks' -Bdirect when it becomes stable.

Strictly speaking, prelink is a technically better solution than -Bdirect. The costs that come with relocation processing are the time needed to perform the processing and the dirty pages caused by COW when the relocations are written. -Bdirect doesn't avoid relocation processing; it only makes it significantly faster by changing the standard ELF symbol lookup rules and avoiding searching all libraries in the normal search scope. The full search is only rarely needed anyway, and some would probably consider lookup done the -Bdirect way more reasonable. Making prelink work with dlopen would make its symbol lookup work more like the way -Bdirect does it; however, AFAIK there's no work going on with prelink WRT dlopen. While -Bdirect cannot give the same performance savings as prelink, it should be able to get quite close.

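
To make the lookup difference concrete, here is a toy model, not how ld.so is actually implemented. It assumes that standard lookup probes each library in search-scope order until one defines the symbol, while a -Bdirect-style lookup uses a recorded hint naming the providing library. All library names, symbol names, and function names here are hypothetical placeholders.

```python
# Toy model only: the real dynamic linker uses per-library hash tables,
# but it still probes libraries in scope order until one defines the symbol.


def lookup_global(symbol, scope):
    """Standard ELF-style lookup: walk the libraries in scope order.

    Returns (name of defining library, number of libraries probed).
    """
    for probes, (name, exports) in enumerate(scope, start=1):
        if symbol in exports:
            return name, probes
    raise KeyError(symbol)


def lookup_direct(symbol, hint, libraries):
    """-Bdirect-style lookup: only the library named by the hint is probed."""
    if symbol in libraries[hint]:
        return hint, 1
    raise KeyError(symbol)


# Hypothetical search scope (placeholder names, not real KDE libraries).
SCOPE = [
    ("libA.so", {"a_init"}),
    ("libB.so", {"b_run"}),
    ("libC.so", {"c_draw"}),
    ("libD.so", {"d_parse"}),
]
LIBRARIES = dict(SCOPE)

# Hypothetical relocations: (symbol, library recorded at link time).
RELOCATIONS = [("d_parse", "libD.so"), ("c_draw", "libC.so"), ("b_run", "libB.so")]


def total_probes(relocations):
    """Return (probes with global lookup, probes with direct lookup)."""
    global_probes = sum(lookup_global(sym, SCOPE)[1] for sym, _ in relocations)
    direct_probes = sum(
        lookup_direct(sym, hint, LIBRARIES)[1] for sym, hint in relocations
    )
    return global_probes, direct_probes


if __name__ == "__main__":
    print(total_probes(RELOCATIONS))
```

In this model the gap grows with the number of libraries in the search scope, which is why heavily library-based applications would feel it most. Note that in both cases every relocation is still processed and written; only the cost of each lookup changes.
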
Since -Bdirect doesn't avoid relocation processing, it doesn't avoid the unshared memory caused by writing the relocations, while prelink could theoretically avoid almost all the dirty pages, because prelinked relocations don't result in writes, as the binaries already contain the proper values. For an average KDE application the amount of such memory is somewhere between 0.5 and 1 MiB. Since 2.6 kernels apparently can't report dirty pages for a process, it needs to be estimated.

For K3b, the total for dirty pages caused by relocations here is ~1089 KiB; if the .data section is excluded, it's ~614 KiB. However, the vast majority of the .data size comes from libraries and we have almost no real global data, so this should be mostly what didn't make it into .data.rel.ro for some reason, and in practice it probably wouldn't be dirty. With prelink only ~786 KiB (~405 KiB without .data) is dirty, which is still pretty high due to the large number of conflicts; if it were possible to avoid all conflicts, the numbers could probably be as low as 200 KiB. I also tried with ksysguard, but the numbers are almost the same (slightly smaller).

Many of the conflicts are caused by either copy relocations or duplicated symbols. Copy relocations could be avoided by using PIC even for binaries, which we already do anyway because of kdeinit. #pragma interface/implementation could hopefully help with most duplicated symbols (if we don't go with RTLD_GLOBAL we'll need to avoid them anyway for exceptions/typeinfo to work properly).

However, kdeinit should do the job as well WRT memory. E.g. for the same K3b case, if k3b were a kdeinit module, the unshared memory should be only about 100 KiB. When I tried starting just plain KDE, the memory usage was 28 MiB with KDE_IS_PRELINKED=1 but 22 MiB with the normal kdeinit setup.

To sum it up a bit, if we don't want to spend 1/3 of the time in KDE in ld.so, we need either prelink or -Bdirect, whichever is found worthy.
If -Bdirect proves to be usable, it should be good enough for both application startup and dlopen, but we'd still need to keep kdeinit because of the memory usage. It's questionable whether prelink in its present form is worth using - it currently results in higher memory usage than kdeinit, and when used together with kdeinit there's the problem that it doesn't work with dlopen. If, however, -Bdirect or kdeinit is found unsuitable for some reason, I believe prelink could be improved to work with dlopen and to avoid most conflicts.
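
The dirty-page figures quoted earlier had to be estimated, and the post doesn't say how. One plausible approach, sketched here purely as an assumption, is to collect the addresses that relocations write to and count the distinct pages they touch, since the first write to a page makes COW unshare the whole page. The function name, the 4 KiB page size, and the sample addresses are all hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def dirty_kib(reloc_addresses, page_size=PAGE_SIZE):
    """Estimate memory dirtied by relocation writes, in KiB.

    Counts each distinct page that contains at least one relocation target.
    Simplification: a write that straddles a page boundary is attributed
    only to the page holding its first byte.
    """
    pages = {addr // page_size for addr in reloc_addresses}
    return len(pages) * page_size // 1024


# Hypothetical relocation target addresses, not measured data.
EXAMPLE = [0x1000, 0x1008, 0x1FF8, 0x2004, 0x5000]

if __name__ == "__main__":
    print(dirty_kib(EXAMPLE), "KiB")
```

Under this model, relocations that prelink has already resolved contribute nothing, because they are never written; only the remaining conflicts would be fed into the estimate.
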