By default, Bugzilla does not search the list of RESOLVED bugs.
You can force it to do so by putting the upper-case word ALL in front of your search query, e.g.: ALL tdelibs
We recommend searching for bugs this way, as you may discover that your bug has already been resolved and fixed in a later release.
Bug 2422 - [Regression] kdesktop hang in SaverEngine::waitForLockEngage
Summary: [Regression] kdesktop hang in SaverEngine::waitForLockEngage
Status: RESOLVED FIXED
Alias: None
Product: TDE
Classification: Unclassified
Component: tdebase (show other bugs)
Version: R14.1.x [Trinity]
Hardware: All Linux
: P5 critical
Assignee: Timothy Pearson
URL:
Depends on:
Blocks: R14.0.1
  Show dependency treegraph
 
Reported: 2015-04-12 13:14 CDT by Slávek Banko
Modified: 2015-04-13 19:16 CDT (History)
3 users (show)

See Also:
Compiler Version:
TDE Version String:
Application Version:
Application Name:


Attachments
Fix ommited call fullyOnline() in kdesktop_lock (357 bytes, patch)
2015-04-12 18:32 CDT, Slávek Banko
Details | Diff
Kdesktop hang in threads with commit 85126bf (46.21 KB, text/plain)
2015-04-12 20:12 CDT, Slávek Banko
Details
Kdesktop hang in threads 1 (49.28 KB, text/plain)
2015-04-13 12:50 CDT, Slávek Banko
Details
Kdesktop hang in threads 2 (49.11 KB, text/plain)
2015-04-13 12:51 CDT, Slávek Banko
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Slávek Banko 2015-04-12 13:14:42 CDT
I tried to backport commits 755776b, e80c2ba, a17bfb0 and 751c96f to R14.0.1 and I discovered regression. When is disabled Delay saver start after lock, after lock the screen (Ctrl+Alt+L) kdesktop hangs in SaverEngine::waitForLockEngage. Reproducibility is 100%.

#0  0xb778ad6c in __kernel_vsyscall ()
#1  0xb5cf1a37 in do_sigsuspend (set=<optimized out>)
    at ../sysdeps/unix/sysv/linux/sigsuspend.c:63
#2  *__GI___sigsuspend (set=0xb5e26ff4) at ../sysdeps/unix/sysv/linux/sigsuspend.c:74
#3  0xb76e454d in SaverEngine::waitForLockEngage (this=0xbf94e470)
    at /tmp/buildd/tdebase-trinity-14.0.1~pre22/kdesktop/lockeng.cc:828
#4  0xb76bfa68 in KRootWm::slotLock (this=0x9206c50)
    at /tmp/buildd/tdebase-trinity-14.0.1~pre22/kdesktop/krootwm.cc:828
#5  0xb76c7db0 in KRootWm::tqt_invoke (this=0x9206c50, _id=19, _o=0xbf94dc98)
    at /tmp/buildd/tdebase-trinity-14.0.1~pre22/obj-i486-linux-gnu/kdesktop/krootwm.moc:181
#6  0xb69fb3db in TDEGlobalAccelPrivate::activate (this=0x9207788, pAction=0x9235f98, seq=...)
    at /tmp/buildd/tdelibs-trinity-14.0.1~pre19/tdecore/kglobalaccel_x11.cpp:436
#7  0xb69fbc4f in TDEGlobalAccelPrivate::x11KeyPress (this=0x9207788, pEvent=0xbf94e170)
    at /tmp/buildd/tdelibs-trinity-14.0.1~pre19/tdecore/kglobalaccel_x11.cpp:404
#8  0xb69fbf54 in TDEGlobalAccelPrivate::x11Event (this=0x9207788, pEvent=0xbf94e170)
    at /tmp/buildd/tdelibs-trinity-14.0.1~pre19/tdecore/kglobalaccel_x11.cpp:267
#9  0xb694fa29 in publicx11Event (e=<optimized out>, this=<optimized out>)
    at /tmp/buildd/tdelibs-trinity-14.0.1~pre19/tdecore/tdeapplication.cpp:1911
#10 TDEApplication::x11EventFilter (this=0x90edb78, _event=0xbf94e170)
    at /tmp/buildd/tdelibs-trinity-14.0.1~pre19/tdecore/tdeapplication.cpp:2189
#11 0xb770d041 in non-virtual thunk to XAutoLock::~XAutoLock() ()
    at /tmp/buildd/tdebase-trinity-14.0.1~pre22/kdesktop/kdesktopapp.cpp:90
#12 0xb62d2900 in qt_x11EventFilter (ev=0xbf94e170) at kernel/qapplication_x11.cpp:422
#13 0xb62e1a60 in TQApplication::x11ProcessEvent (this=0x90edb78, event=0xbf94e170)
    at kernel/qapplication_x11.cpp:3435
#14 0xb6321754 in TQEventLoop::processX11Events (this=0x9119c28)
    at kernel/qeventloop_x11_glib.cpp:353
#15 0xb6322a76 in TQEventLoop::gsourceDispatch (this=0x9119c28, gs=0x9114580)
    at kernel/qeventloop_x11_glib.cpp:614
#16 0xb6322bbc in qt_gsource_dispatch (source=0x9114580, callback=0, user_data=0x0)
    at kernel/qeventloop_x11_glib.cpp:123
#17 0xb5acb6d3 in g_main_context_dispatch () from /lib/i386-linux-gnu/libglib-2.0.so.0
#18 0xb5acba70 in ?? () from /lib/i386-linux-gnu/libglib-2.0.so.0
#19 0xb5acbb51 in g_main_context_iteration () from /lib/i386-linux-gnu/libglib-2.0.so.0
#20 0xb63217de in TQEventLoop::processEvents (this=0x9119c28, flags=4)
    at kernel/qeventloop_x11_glib.cpp:279
#21 0xb63605d0 in TQEventLoop::enterLoop (this=0x9119c28) at kernel/qeventloop.cpp:227
#22 0xb63606ad in TQEventLoop::exec (this=0x9119c28) at kernel/qeventloop.cpp:174
#23 0xb634786f in TQApplication::exec (this=0x90edb78) at kernel/qapplication.cpp:3015
#24 0xb76bed1e in kdemain (argc=1, argv=0xbf94e954)
    at /tmp/buildd/tdebase-trinity-14.0.1~pre22/kdesktop/main.cc:293
#25 0x080488fb in main (argc=1, argv=0xbf94e954)
    at /tmp/buildd/tdebase-trinity-14.0.1~pre22/obj-i486-linux-gnu/kdesktop/kdesktop_tdeinit_executable.cpp:2
#26 0xb5cdde46 in __libc_start_main (main=0x80488e0 <main(int, char**)>, argc=1,
    ubp_av=0xbf94e954, init=0x8048910 <__libc_csu_init>, fini=0x8048900 <__libc_csu_fini>,
    rtld_fini=0xb779b590, stack_end=0xbf94e94c) at libc-start.c:244
#27 0x08048841 in _start ()
Comment 1 Timothy Pearson 2015-04-12 16:03:56 CDT
I am aware of the fault and am working to correct it.
Comment 2 Timothy Pearson 2015-04-12 16:35:54 CDT
This should be fixed in GIT hash 85126bf (tdebase).  Can you test and verify?

Thanks!
Comment 3 Slávek Banko 2015-04-12 18:32:52 CDT
Created attachment 2480 [details]
Fix ommited call fullyOnline() in kdesktop_lock

I'm sorry - before I noticed that you're working on the problem, I had already prepared another patch.

Because this problem reminded me a situation that I already fixed => kdesktop_lock never sent that signal - see http://git.trinitydesktop.org/cgit/tdebase/commit/?id=cbbc7ad0 - and because in your commit e80c2bae was moved call fullyOnline() I focused on examining the situations where call fullyOnline() could be omitted.

I found such a situation, prepared a one-line patch and tested - successfully! Therefore I believe that the problem does not lie in incorrect handling of the signal, but that the signal was never sent. What do you think?

Now I'll have to test your patch...
Comment 4 Slávek Banko 2015-04-12 19:16:31 CDT
(In reply to Timothy Pearson from comment #2)
> This should be fixed in GIT hash 85126bf (tdebase).  Can you test and verify?
> 
> Thanks!

I tried your commit 85126bf (without my additional patch) and receive disturbing results! The situation is now reversed and a little worsened.

When Delay saver start after lock is turned off, the behavior seems to be okay.

When saver Delay start after lock is turned on, after lock appears unlock screen window, but will remain visible content of desktop! After unlocking the kdesktop hang.

I note - on the machine where I have my one-line fix from attachment 2480 [details] and no your commit 85126bf everything is fine.
Comment 5 Slávek Banko 2015-04-12 20:12:55 CDT
Created attachment 2481 [details]
Kdesktop hang in threads with commit 85126bf

Note - with commmit 85126bf my additional one-line patch does not help.
Information and backtrace attached.
Comment 6 Timothy Pearson 2015-04-12 20:46:21 CDT
(In reply to Slávek Banko from comment #5)
> Created attachment 2481 [details]
> Kdesktop hang in threads with commit 85126bf
> 
> Note - with commmit 85126bf my additional one-line patch does not help.
> Information and backtrace attached.

That deadlock is exactly what I was trying to prevent by forcing the signal handler to execute in an auxiliary thread.  Obviously I did something wrong!

Let me analyse this further.  The whole situation is complicated by the fact that when I originally wrote this code I was unaware of the fact that Linux signal handers can sporadically deadlock in malloc()/free() due to attempted reentrancy; this same problem actually hit the Chromium developers so apparently it isn't common knowledge.
Comment 7 Timothy Pearson 2015-04-12 20:59:00 CDT
(In reply to Slávek Banko from comment #4)
> (In reply to Timothy Pearson from comment #2)
> > This should be fixed in GIT hash 85126bf (tdebase).  Can you test and verify?
> > 
> > Thanks!
> 
> I tried your commit 85126bf (without my additional patch) and receive
> disturbing results! The situation is now reversed and a little worsened.
> 
> When Delay saver start after lock is turned off, the behavior seems to be
> okay.
> 
> When saver Delay start after lock is turned on, after lock appears unlock
> screen window, but will remain visible content of desktop! After unlocking
> the kdesktop hang.
> 
> I note - on the machine where I have my one-line fix from attachment 2480 [details]
> [details] and no your commit 85126bf everything is fine.

We will need your one line fix; its omission appears to have been in error.  Can you go ahead and commit that while I track down the threading deadlock?

Thanks!
Comment 8 Timothy Pearson 2015-04-12 21:24:53 CDT
(In reply to Slávek Banko from comment #5)
> Created attachment 2481 [details]
> Kdesktop hang in threads with commit 85126bf
> 
> Note - with commmit 85126bf my additional one-line patch does not help.
> Information and backtrace attached.

I'm not sure this is related to my recent commit.  My analysis indicates the deadlock is sporadic and is occurring in kdesktop_lock, which naturally causes kdesktop to wait forever for a response from the hung kdesktop_lock process.

Curiously the kdesktop_lock hang is in LockProcess::~LockProcess, stuck on deleting one of the timers.  Furthermore at the time of the hang only one thread is active.
Comment 9 Timothy Pearson 2015-04-12 21:47:41 CDT
This is my backtrace from kdesktop_lock.  The only thing I can figure is that the original kdesktop_lock process did not exit due to the recently terminated helper thread not releasing the TQApplication::tqt_mutex global lock when it terminated.

(gdb) thread apply all bt

Thread 1 (Thread 0x7f5179eb37c0 (LWP 17024)):
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f51726084b9 in _L_lock_909 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f51726082e0 in __GI___pthread_mutex_lock (mutex=0xe08748) at ../nptl/pthread_mutex_lock.c:79
#3  0x00007f5176f96450 in TQRecursiveMutexPrivate::lock (this=0xe08740) at tools/qmutex_unix.cpp:251
#4  0x00007f5176d29b71 in TQApplication::internalNotify (this=<optimized out>, receiver=0xe36a20, e=0x7fff89590840) at kernel/qapplication.cpp:2875
#5  0x00007f5176d2a068 in TQApplication::notify (this=this@entry=0xdbd850, receiver=receiver@entry=0xe36a20, e=e@entry=0x7fff89590840)
    at kernel/qapplication.cpp:2726
#6  0x00007f5177bc5d0b in TDEApplication::notify (this=0xdbd850, receiver=0xe36a20, event=0x7fff89590840)
    at /build/buildd/tdelibs-trinity-14.1.0~r1241/tdecore/tdeapplication.cpp:660
#7  0x00007f5176d7ac7c in TQObject::removeChild (this=0xe36a20, obj=0xe38260) at kernel/qobject.cpp:1682
#8  0x00007f5176d7d698 in TQObject::~TQObject (this=0xe08748, __in_chrg=<optimized out>) at kernel/qobject.cpp:737
#9  0x00007f5176d9a649 in TQTimer::~TQTimer (this=0xe38260, __in_chrg=<optimized out>) at kernel/qtimer.cpp:131
#10 0x0000000000412eb2 in LockProcess::~LockProcess (this=0xe36a20, __in_chrg=<optimized out>)
    at /root/TEMP2/tdebase-trinity-14.0.0-r1664/kdesktop/lock/lockprocess.cc:326
#11 0x00000000004132d6 in LockProcess::~LockProcess (this=0xe36a20, __in_chrg=<optimized out>)
    at /root/TEMP2/tdebase-trinity-14.0.0-r1664/kdesktop/lock/lockprocess.cc:350
#12 0x000000000042eb5d in main (argc=3, argv=0x7fff89592d68) at /root/TEMP2/tdebase-trinity-14.0.0-r1664/kdesktop/lock/main.cc:534
#13 0x00007f5174b36b45 in __libc_start_main (main=0x42db26 <main(int, char**)>, argc=3, argv=0x7fff89592d68, init=<optimized out>, fini=<optimized out>,
    rtld_fini=<optimized out>, stack_end=0x7fff89592d58) at libc-start.c:287
#14 0x0000000000411d39 in ?? ()
Comment 10 Timothy Pearson 2015-04-12 22:34:18 CDT
I traced this to the use of pthread_cancel() in kdesktop_lock.  Fixed in GIT hash e569dd4 (tdebase).

Can you please test and verify?

Thanks!
Comment 11 Timothy Pearson 2015-04-12 22:41:37 CDT
(In reply to Timothy Pearson from comment #10)
> I traced this to the use of pthread_cancel() in kdesktop_lock.  Fixed in GIT
> hash e569dd4 (tdebase).
> 
> Can you please test and verify?
> 
> Thanks!

On further testing I'm still seeing some issues.  The interactions between kdesktop and kdesktop_lock are very complex and difficult to get right, not least of all due to the dcop calls.
Comment 12 Timothy Pearson 2015-04-12 23:08:14 CDT
(In reply to Timothy Pearson from comment #11)
> (In reply to Timothy Pearson from comment #10)
> > I traced this to the use of pthread_cancel() in kdesktop_lock.  Fixed in GIT
> > hash e569dd4 (tdebase).
> > 
> > Can you please test and verify?
> > 
> > Thanks!
> 
> On further testing I'm still seeing some issues.  The interactions between
> kdesktop and kdesktop_lock are very complex and difficult to get right, not
> least of all due to the dcop calls.

The kdesktop_lock failure I was seeing due to incorrect signal reception should be fixed in GIT hash 37bc5cd.  As always please test and verify; the interactions are very complex and any race conditions can take time to manifest.

Thanks!
Comment 13 Slávek Banko 2015-04-13 11:33:19 CDT
Comment on attachment 2480 [details]
Fix ommited call fullyOnline() in kdesktop_lock

Pushed to GIT in hash c1b62c1c (tdebase).
Comment 14 Slávek Banko 2015-04-13 12:15:16 CDT
There is improvement - on lock the screen is properly hidden content of desktop. But as I tested, regardless of the Delay saver start after lock, on first locking is kdesktop okay but after the second locking kdesktop remains hang.
Comment 15 Timothy Pearson 2015-04-13 12:45:12 CDT
(In reply to Slávek Banko from comment #14)
> There is improvement - on lock the screen is properly hidden content of
> desktop. But as I tested, regardless of the Delay saver start after lock, on
> first locking is kdesktop okay but after the second locking kdesktop remains
> hang.

I am currently unable to reproduce this issue.  Can you post:
1.) The [ScreenSaver] section of your kdesktoprc file
2.) A backtrace ("thread apply all bt") of both kdesktop and kdesktop_lock during the hang?

Thanks!
Comment 16 Slávek Banko 2015-04-13 12:49:09 CDT
(In reply to Timothy Pearson from comment #15)
> (In reply to Slávek Banko from comment #14)
> > There is improvement - on lock the screen is properly hidden content of
> > desktop. But as I tested, regardless of the Delay saver start after lock, on
> > first locking is kdesktop okay but after the second locking kdesktop remains
> > hang.
> 
> I am currently unable to reproduce this issue.  Can you post:
> 1.) The [ScreenSaver] section of your kdesktoprc file
> 2.) A backtrace ("thread apply all bt") of both kdesktop and kdesktop_lock
> during the hang?
> 
> Thanks!

[ScreenSaver]
DelaySaverStart=true
Enabled=true
HideActiveWindowsFromSaver=true
HideCancelButton=false
Lock=false
LockGrace=20000
Saver=KClock.desktop
Timeout=60
UseTDESAK=true
UseUnmanagedLockWindows=false
Comment 17 Slávek Banko 2015-04-13 12:50:47 CDT
Created attachment 2482 [details]
Kdesktop hang in threads 1

This should be the case when DelaySaverStart = false.
Comment 18 Slávek Banko 2015-04-13 12:51:24 CDT
Created attachment 2483 [details]
Kdesktop hang in threads 2

This should be the case when DelaySaverStart = true.
Comment 19 Slávek Banko 2015-04-13 12:56:17 CDT
(In reply to Timothy Pearson from comment #15)
> (In reply to Slávek Banko from comment #14)
> > There is improvement - on lock the screen is properly hidden content of
> > desktop. But as I tested, regardless of the Delay saver start after lock, on
> > first locking is kdesktop okay but after the second locking kdesktop remains
> > hang.
> 
> I am currently unable to reproduce this issue.  Can you post:
> 1.) The [ScreenSaver] section of your kdesktoprc file
> 2.) A backtrace ("thread apply all bt") of both kdesktop and kdesktop_lock
> during the hang?
> 
> Thanks!

Now I have checked that kdesktop_lock is no longer running when kdesktop hangs => backtrace for kdesktop_lock can not be delivered.
Comment 20 Timothy Pearson 2015-04-13 13:10:47 CDT
(In reply to Slávek Banko from comment #19)
> (In reply to Timothy Pearson from comment #15)
> > (In reply to Slávek Banko from comment #14)
> > > There is improvement - on lock the screen is properly hidden content of
> > > desktop. But as I tested, regardless of the Delay saver start after lock, on
> > > first locking is kdesktop okay but after the second locking kdesktop remains
> > > hang.
> > 
> > I am currently unable to reproduce this issue.  Can you post:
> > 1.) The [ScreenSaver] section of your kdesktoprc file
> > 2.) A backtrace ("thread apply all bt") of both kdesktop and kdesktop_lock
> > during the hang?
> > 
> > Thanks!
> 
> Now I have checked that kdesktop_lock is no longer running when kdesktop
> hangs => backtrace for kdesktop_lock can not be delivered.

Thank you for the backtraces.  It looks like we are dealing with a deadlock that I have not experienced on my test systems at all.

At this point I need to know what is holding the TQt global lock.  Can you attach gdb to that process and obtain the owner thread TID for each thread via the method described here? http://stackoverflow.com/a/3491304

Also attaching the output of "info threads" from that same gdb session would be helpful.

Thanks!
Comment 21 Timothy Pearson 2015-04-13 13:23:27 CDT
(In reply to Timothy Pearson from comment #20)
> Thank you for the backtraces.  It looks like we are dealing with a deadlock
> that I have not experienced on my test systems at all.
> 
> At this point I need to know what is holding the TQt global lock.  Can you
> attach gdb to that process and obtain the owner thread TID for each thread
> via the method described here? http://stackoverflow.com/a/3491304
> 
> Also attaching the output of "info threads" from that same gdb session would
> be helpful.
> 
> Thanks!

On further analysis it appears that somehow we are reentering a libX11 function, likely due to TQt3 not delaying the lockProcessWaiting() call to the main thread for an as of yet unknown reason.  I no longer need the TID information I requested above.
Comment 22 Timothy Pearson 2015-04-13 15:25:52 CDT
(In reply to Timothy Pearson from comment #21)
> On further analysis it appears that somehow we are reentering a libX11
> function, likely due to TQt3 not delaying the lockProcessWaiting() call to
> the main thread for an as of yet unknown reason.  I no longer need the TID
> information I requested above.

Based on the backtrace I think I see a potential problem, specifically an internal xcb poll() in the main thread is being interrupted by the incoming signal, which is then (somehow) confusing libX11 and causing reentrancy with subsequent deadlock.

I have attempted to lock out signal reception in the main thread in GIT hash ab8dfba (tdebase).  Can you please test and see if this resolves the issue?

Thanks!
Comment 23 Slávek Banko 2015-04-13 18:34:42 CDT
(In reply to Timothy Pearson from comment #22)
> (In reply to Timothy Pearson from comment #21)
> > On further analysis it appears that somehow we are reentering a libX11
> > function, likely due to TQt3 not delaying the lockProcessWaiting() call to
> > the main thread for an as of yet unknown reason.  I no longer need the TID
> > information I requested above.
> 
> Based on the backtrace I think I see a potential problem, specifically an
> internal xcb poll() in the main thread is being interrupted by the incoming
> signal, which is then (somehow) confusing libX11 and causing reentrancy with
> subsequent deadlock.
> 
> I have attempted to lock out signal reception in the main thread in GIT hash
> ab8dfba (tdebase).  Can you please test and see if this resolves the issue?
> 
> Thanks!

Great! On my test machine now works perfeclty!
Comment 24 Timothy Pearson 2015-04-13 19:16:54 CDT
(In reply to Slávek Banko from comment #23)
> (In reply to Timothy Pearson from comment #22)
> > (In reply to Timothy Pearson from comment #21)
> > > On further analysis it appears that somehow we are reentering a libX11
> > > function, likely due to TQt3 not delaying the lockProcessWaiting() call to
> > > the main thread for an as of yet unknown reason.  I no longer need the TID
> > > information I requested above.
> > 
> > Based on the backtrace I think I see a potential problem, specifically an
> > internal xcb poll() in the main thread is being interrupted by the incoming
> > signal, which is then (somehow) confusing libX11 and causing reentrancy with
> > subsequent deadlock.
> > 
> > I have attempted to lock out signal reception in the main thread in GIT hash
> > ab8dfba (tdebase).  Can you please test and see if this resolves the issue?
> > 
> > Thanks!
> 
> Great! On my test machine now works perfeclty!

Glad to hear it!  The entire kdesktop/kdesktop_lock system should be much more reliable at this point; I have gone through and removed all the signal-unsafe code paths that I could find, and blocked signals to the unsafe code paths that remain.