[Rpm-devel] rpm-4.4.6 on solaris9-sparc64
n3npq.jbj at gmail.com
Wed Jun 28 13:36:28 EDT 2006
On Jun 28, 2006, at 1:07 PM, Frank Cusack wrote:
> On June 28, 2006 10:50:49 AM -0400 Jeff Johnson <n3npq at mac.com> wrote:
>> On Jun 28, 2006, at 10:05 AM, Frank Cusack wrote:
>>> On June 28, 2006 6:49:15 AM -0400 Jeff Johnson
>>> <n3npq.jbj at gmail.com> wrote:
>>>> On Jun 28, 2006, at 1:11 AM, Frank Cusack wrote:
>>>>> On June 26, 2006 4:04:18 PM -0400 Jeff Johnson
>>>>> <n3npq.jbj at gmail.com> wrote:
>>>>>> The historical issue that is forcing this horrendously
>>>>>> complicated mess is that
>>>>>> apiary (at one time the replacement for beehive) insisted on
>>>>>> running rpmlib
>>>>>> on a thread, and posix signals do not have SIGCHLD delivery to
>>>>>> that thread.
>>>>> Sorry for butting in, but why is rpm threaded?
>>>> Note that any library can be put on a thread, threaded or not, by
>>>> loading a module in bindings on a
>>>> rpmlib itself is not thread safe, but there are ways to run
>>>> libraries on a single thread safely.
>>>> However, SIGCHLD instantly ceases to be delivered to any signal
>>>> handler on that thread.
>>> How's that? Signal handlers are per-process, not per thread. If an
>>> async signal (like SIGCHLD) is delivered, any thread not blocking it
>>> might get it, but if there is a handler it probably doesn't matter
>>> too much which thread takes it.
>> Sure signal handlers were per-process once, but posix signal
>> delivery is not per-process (one of
>> many benefits established by NPTL, yes, benefits), so per-thread
>> signal handlers can/do exist and
>> are absolutely necessary for per-thread signal delivery.
> That's incorrect. POSIX signal delivery is *either* process-directed
> or thread-directed; this depends on the type of signal. e.g. SEGV
> always goes to the thread that caused it, however an external SIGHUP
> is process-directed and goes to a random thread depending on per-
> signal mask.
> That's delivery. Signal HANDLERS are still per-process. There can
> be only one handler per signal. If an async signal (SIGCHLD) is
> it can still be *delivered* to any thread not blocking it, but the
> handler runs and then returns. Different threads CANNOT set up
> thread-specific handlers. If a thread sets up a handler, the previous
> handler is no longer used.
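The distinction drawn above (one handler per signal for the whole process, but a separate signal mask for each thread) can be illustrated with a small Python sketch. This is only an illustration under assumptions, not anything taken from rpm: handler_a, handler_b, displaced_handler and masks_by_thread are hypothetical names, SIGUSR1 is an arbitrary choice of signal, and the code assumes a POSIX system and that it runs on the main thread (CPython only allows signal.signal() there).

```python
import signal
import threading


def handler_a(signum, frame):
    """Hypothetical first handler; does nothing."""


def handler_b(signum, frame):
    """Hypothetical second handler; does nothing."""


def displaced_handler():
    """Install two SIGUSR1 handlers in turn; return what the second displaced."""
    original = signal.signal(signal.SIGUSR1, handler_a)
    try:
        # Installing handler_b replaces handler_a for the whole process.
        return signal.signal(signal.SIGUSR1, handler_b)
    finally:
        # Put back whatever was installed before the demonstration.
        if original is not None:
            signal.signal(signal.SIGUSR1, original)


def masks_by_thread():
    """Block SIGUSR1 in a worker thread only and report both threads' masks."""
    masks = {}

    def worker():
        signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGUSR1})
        # SIG_BLOCK with an empty set changes nothing and returns the
        # calling thread's current mask.
        masks["worker"] = signal.pthread_sigmask(signal.SIG_BLOCK, set())

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    masks["caller"] = signal.pthread_sigmask(signal.SIG_BLOCK, set())
    return masks
```

If SIGUSR1 is not already blocked in the calling thread, displaced_handler() should return handler_a, and masks_by_thread() should show SIGUSR1 in the worker's mask but not in the caller's.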
While I'm sure your statements are correct, and I appreciated being
problem remains the same:
If applications insist on running rpmlib (specifically the
python ts.run()) on a thread, then
SIGCHLD (as currently implemented) must be able to waken that
thread when blocked.
>> The issue is that SIGCHLD has delivery only to main thread.
>> So if you put rpmlib on a thread, then rpm's internal lazy signal
>> handler is per-thread, not per-process,
>> and SIGCHLD is never delivered, so scripts never wake up. There
>> are ways to instantiate the
>> on the main thread, but do the ts.run() on a different thread,
>> and there may well be additional
>> on how the handlers are set up.
> I didn't follow that, but having rpmlib on a different thread does not
> affect whether or not SIGCHLD is delivered/handled. If there is a
> handler, and at least one thread is not blocking the signal, it will
> be delivered and handled.
You are incorrect, the problem of thread not awakening at completion
of scriptlet was easily reproducible
when the code was written 3+ years ago.
> Now, RH's NPTL (not kernel.org) does get process-directed vs
> thread-directed wrong, so maybe that affects things, but I think Tim is saying he sees
> this on Solaris.
Wrong or right is immaterial: rpmlib's reaping must be reliable in
both solaris and linux (both RH and other) cases.
>> So an alternative implementation is to clock the SIGCHLD delivery
>> from the main thread to the
>> thread on which ts.run() was called/blocked using mutexes and
>> condvar, waiting for a scriptlet
>> to finish.
>> A 4th alternative might seem to be using waitpid() rather than
>> SIGCHLD, but graceful exit
>> on ^C et al to avoid stale bdb locks forces a signal handler
>> anyways, SIGCHLD delivery is just
>> another annoying wrinkle.
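The mutex-and-condvar relay described above might look roughly like the following Python sketch. It is an illustration under assumptions, not rpm's implementation: run_scriptlet is a hypothetical stand-in for ts.run(), the "scriptlet" is just a trivial child Python process, and relay_demo, on_sigchld, child_exited and child_cond are made-up names. The sketch leans on the fact that CPython runs Python-level signal handlers on the main thread, and it blocks SIGCHLD in the worker so that delivery lands on the main thread.

```python
import signal
import subprocess
import sys
import threading

child_exited = False                # predicate guarded by the condition below
child_cond = threading.Condition()  # mutex and condvar in one object


def on_sigchld(signum, frame):
    """Runs on the main thread; relays the event to the waiting thread."""
    global child_exited
    with child_cond:
        child_exited = True
        child_cond.notify_all()


def run_scriptlet(result):
    """Hypothetical stand-in for ts.run(): start a child, wait for the relay."""
    # Block SIGCHLD in this thread so that only the main thread takes delivery.
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGCHLD})
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    with child_cond:
        # The predicate covers the case where the relay fired before we waited.
        result["woken"] = child_cond.wait_for(lambda: child_exited, timeout=30)
    result["status"] = child.wait()  # reap the child


def relay_demo():
    """Must be called from the main thread."""
    global child_exited
    child_exited = False
    previous = signal.signal(signal.SIGCHLD, on_sigchld)
    result = {}
    worker = threading.Thread(target=run_scriptlet, args=(result,))
    worker.start()
    while worker.is_alive():
        worker.join(0.05)  # short joins give the handler a chance to run
    signal.signal(signal.SIGCHLD,
                  previous if previous is not None else signal.SIG_DFL)
    return result
```

Calling relay_demo() from the main thread returns a dict whose "woken" entry says whether the worker was released by the relay and whose "status" entry is the child's exit status.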
> I can't tell if you're barking up the wrong tree here without looking
> at the code, but I suspect the problem is that SIGCHLD is being posted
> to the wrong thread and it isn't being taken into account that some
> thread other than the one that starts scriptlets is receiving it.
So look at the code rather than guessing, *please*.
> If that's the case, one fix (probably the cleanest, again, if what I'm
> saying even makes sense considering I haven't looked at this code) is
> simply to block SIGCHLD before doing anything else (importantly:
> spawning any threads), then in ts.run() (I guess this starts the
> scriptlets) do a blocking sigwait(). I assume ts.run() already just
> blocks waiting for a pipe read.
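The block-then-sigwait() approach suggested above can be sketched in Python as follows. This is a rough illustration under assumptions and has not been checked against the rpm code: run_scriptlet is a hypothetical stand-in for ts.run(), the child is a trivial Python process, and prepare_signals and sigwait_demo are made-up names. The no-op handler is an extra precaution (POSIX leaves it unspecified whether a blocked signal whose action is to ignore it gets discarded), not something from the proposal itself.

```python
import signal
import subprocess
import sys
import threading


def prepare_signals():
    """Call before spawning any threads so every thread inherits the mask."""
    if threading.current_thread() is threading.main_thread():
        # A no-op handler keeps a blocked SIGCHLD from being discarded on
        # systems that drop signals whose action is "ignore".
        signal.signal(signal.SIGCHLD, lambda signum, frame: None)
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGCHLD})


def run_scriptlet(result):
    """Hypothetical stand-in for ts.run(): start a child, then sigwait()."""
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    # Returns once SIGCHLD is pending, even if the child exited before
    # this call was reached, because the signal stays pending while blocked.
    result["signal"] = signal.sigwait({signal.SIGCHLD})
    result["status"] = child.wait()  # reap the child


def sigwait_demo():
    prepare_signals()
    result = {}
    worker = threading.Thread(target=run_scriptlet, args=(result,))
    worker.start()
    worker.join()
    return result
```

Because SIGCHLD is blocked in every thread, it cannot be taken by the "wrong" thread; it simply stays pending until the thread that started the child collects it with sigwait().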
If you haven't looked at the code, you can't possibly understand the
And if you haven't tested the code, then any signal fixing is
suspect. Signal handlers are extremely
difficult to get correct, and while the current implementation is
more complex than I would
like, the implementation has provably been working sufficiently
reliably to install gazillions
of packages. That's all I really care about.
>> All of which starts to answer your question why rpm is -lpthread,
>> another reason being Berkeley
>> DB concurrent access
>> unifying thread and process locks, yet another reason being
>> -lbeecrypt having a threaded entropy
>> gatherer (not currently used by rpm, necessary if rpm is to sign,
>> not just verify, signatures or
>> use per-session https keys).
> Right. In my version I build beecrypt without thread support,
> for bdb. (concurrent access with bdb works fine without threads as
> long as
> all users of bdb for a given db agree on the lock type -- might not
> work for
> a distro like RH where the system bdb uses threads and the rpm db
> might be
> accessed outside of rpmlib; but if you insist that all access to
> rpm db
> go through rpmlib it's ok to have different locking than the system
So fork away, you're on your own.
I don't insist on anything. An rpmdb is just a Berkeley DB, can be
accessed by any and all
implementations outside of rpm, whether by system bdb, or internal
bdb, that can do
an implementation that interacts with other applications that use an
I have chosen a concurrent access model because that is a simpler API
for most applications imho,
and is demonstrably high performing. Show me a higher performing
shared db and I will use
that instead. Hint: sqlite3 is only 2.4 times slower than Berkeley DB
when I measured.
But, by all means, if you want full database shared/exclusive
locking, then have at it through whatever means
floats your boat.
Meanwhile, you asked a question which I answered. You're welcome.
We now return to trying to figger why Tim Mooney is seeing
scriptlets hang on a platform that I have no reasonable access to.
73 de Jeff