| Summary: | [Regression] kdesktop hang in SaverEngine::waitForLockEngage | ||
|---|---|---|---|
| Product: | TDE | Reporter: | Slávek Banko <slavek.banko> |
| Component: | tdebase | Assignee: | Timothy Pearson <kb9vqf> |
| Status: | RESOLVED FIXED | ||
| Severity: | critical | CC: | bugwatch, kb9vqf, slavek.banko |
| Priority: | P5 | ||
| Version: | R14.1.x [Trinity] | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Compiler Version: | TDE Version String: | ||
| Application Version: | Application Name: | ||
| Bug Depends on: | |||
| Bug Blocks: | 2246 | ||
| Attachments: |
Fix ommited call fullyOnline() in kdesktop_lock
Kdesktop hang in threads with commit 85126bf Kdesktop hang in threads 1 Kdesktop hang in threads 2 |
||
I am aware of the fault and am working to correct it. This should be fixed in GIT hash 85126bf (tdebase). Can you test and verify? Thanks! Created attachment 2480 [details] Fix ommited call fullyOnline() in kdesktop_lock I'm sorry - before I noticed that you're working on the problem, I had already prepared another patch. Because this problem reminded me a situation that I already fixed => kdesktop_lock never sent that signal - see http://git.trinitydesktop.org/cgit/tdebase/commit/?id=cbbc7ad0 - and because in your commit e80c2bae was moved call fullyOnline() I focused on examining the situations where call fullyOnline() could be omitted. I found such a situation, prepared a one-line patch and tested - successfully! Therefore I believe that the problem does not lie in incorrect handling of the signal, but that the signal was never sent. What do you think? Now I'll have to test your patch... (In reply to Timothy Pearson from comment #2) > This should be fixed in GIT hash 85126bf (tdebase). Can you test and verify? > > Thanks! I tried your commit 85126bf (without my additional patch) and receive disturbing results! The situation is now reversed and a little worsened. When Delay saver start after lock is turned off, the behavior seems to be okay. When saver Delay start after lock is turned on, after lock appears unlock screen window, but will remain visible content of desktop! After unlocking the kdesktop hang. I note - on the machine where I have my one-line fix from attachment 2480 [details] and no your commit 85126bf everything is fine. Created attachment 2481 [details]
Kdesktop hang in threads with commit 85126bf
Note - with commmit 85126bf my additional one-line patch does not help.
Information and backtrace attached.
(In reply to Slávek Banko from comment #5) > Created attachment 2481 [details] > Kdesktop hang in threads with commit 85126bf > > Note - with commmit 85126bf my additional one-line patch does not help. > Information and backtrace attached. That deadlock is exactly what I was trying to prevent by forcing the signal handler to execute in an auxiliary thread. Obviously I did something wrong! Let me analyse this further. The whole situation is complicated by the fact that when I originally wrote this code I was unaware of the fact that Linux signal handers can sporadically deadlock in malloc()/free() due to attempted reentrancy; this same problem actually hit the Chromium developers so apparently it isn't common knowledge. (In reply to Slávek Banko from comment #4) > (In reply to Timothy Pearson from comment #2) > > This should be fixed in GIT hash 85126bf (tdebase). Can you test and verify? > > > > Thanks! > > I tried your commit 85126bf (without my additional patch) and receive > disturbing results! The situation is now reversed and a little worsened. > > When Delay saver start after lock is turned off, the behavior seems to be > okay. > > When saver Delay start after lock is turned on, after lock appears unlock > screen window, but will remain visible content of desktop! After unlocking > the kdesktop hang. > > I note - on the machine where I have my one-line fix from attachment 2480 [details] > [details] and no your commit 85126bf everything is fine. We will need your one line fix; its omission appears to have been in error. Can you go ahead and commit that while I track down the threading deadlock? Thanks! (In reply to Slávek Banko from comment #5) > Created attachment 2481 [details] > Kdesktop hang in threads with commit 85126bf > > Note - with commmit 85126bf my additional one-line patch does not help. > Information and backtrace attached. I'm not sure this is related to my recent commit. My analysis indicates the deadlock is sporadic and is occurring in kdesktop_lock, which naturally causes kdesktop to wait forever for a response from the hung kdesktop_lock process. Curiously the kdesktop_lock hang is in LockProcess::~LockProcess, stuck on deleting one of the timers. Furthermore at the time of the hang only one thread is active. This is my backtrace from kdesktop_lock. The only thing I can figure is that the original kdesktop_lock process did not exit due to the recently terminated helper thread not releasing the TQApplication::tqt_mutex global lock when it terminated.
(gdb) thread apply all bt
Thread 1 (Thread 0x7f5179eb37c0 (LWP 17024)):
#0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007f51726084b9 in _L_lock_909 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f51726082e0 in __GI___pthread_mutex_lock (mutex=0xe08748) at ../nptl/pthread_mutex_lock.c:79
#3 0x00007f5176f96450 in TQRecursiveMutexPrivate::lock (this=0xe08740) at tools/qmutex_unix.cpp:251
#4 0x00007f5176d29b71 in TQApplication::internalNotify (this=<optimized out>, receiver=0xe36a20, e=0x7fff89590840) at kernel/qapplication.cpp:2875
#5 0x00007f5176d2a068 in TQApplication::notify (this=this@entry=0xdbd850, receiver=receiver@entry=0xe36a20, e=e@entry=0x7fff89590840)
at kernel/qapplication.cpp:2726
#6 0x00007f5177bc5d0b in TDEApplication::notify (this=0xdbd850, receiver=0xe36a20, event=0x7fff89590840)
at /build/buildd/tdelibs-trinity-14.1.0~r1241/tdecore/tdeapplication.cpp:660
#7 0x00007f5176d7ac7c in TQObject::removeChild (this=0xe36a20, obj=0xe38260) at kernel/qobject.cpp:1682
#8 0x00007f5176d7d698 in TQObject::~TQObject (this=0xe08748, __in_chrg=<optimized out>) at kernel/qobject.cpp:737
#9 0x00007f5176d9a649 in TQTimer::~TQTimer (this=0xe38260, __in_chrg=<optimized out>) at kernel/qtimer.cpp:131
#10 0x0000000000412eb2 in LockProcess::~LockProcess (this=0xe36a20, __in_chrg=<optimized out>)
at /root/TEMP2/tdebase-trinity-14.0.0-r1664/kdesktop/lock/lockprocess.cc:326
#11 0x00000000004132d6 in LockProcess::~LockProcess (this=0xe36a20, __in_chrg=<optimized out>)
at /root/TEMP2/tdebase-trinity-14.0.0-r1664/kdesktop/lock/lockprocess.cc:350
#12 0x000000000042eb5d in main (argc=3, argv=0x7fff89592d68) at /root/TEMP2/tdebase-trinity-14.0.0-r1664/kdesktop/lock/main.cc:534
#13 0x00007f5174b36b45 in __libc_start_main (main=0x42db26 <main(int, char**)>, argc=3, argv=0x7fff89592d68, init=<optimized out>, fini=<optimized out>,
rtld_fini=<optimized out>, stack_end=0x7fff89592d58) at libc-start.c:287
#14 0x0000000000411d39 in ?? ()
I traced this to the use of pthread_cancel() in kdesktop_lock. Fixed in GIT hash e569dd4 (tdebase). Can you please test and verify? Thanks! (In reply to Timothy Pearson from comment #10) > I traced this to the use of pthread_cancel() in kdesktop_lock. Fixed in GIT > hash e569dd4 (tdebase). > > Can you please test and verify? > > Thanks! On further testing I'm still seeing some issues. The interactions between kdesktop and kdesktop_lock are very complex and difficult to get right, not least of all due to the dcop calls. (In reply to Timothy Pearson from comment #11) > (In reply to Timothy Pearson from comment #10) > > I traced this to the use of pthread_cancel() in kdesktop_lock. Fixed in GIT > > hash e569dd4 (tdebase). > > > > Can you please test and verify? > > > > Thanks! > > On further testing I'm still seeing some issues. The interactions between > kdesktop and kdesktop_lock are very complex and difficult to get right, not > least of all due to the dcop calls. The kdesktop_lock failure I was seeing due to incorrect signal reception should be fixed in GIT hash 37bc5cd. As always please test and verify; the interactions are very complex and any race conditions can take time to manifest. Thanks! Comment on attachment 2480 [details]
Fix ommited call fullyOnline() in kdesktop_lock
Pushed to GIT in hash c1b62c1c (tdebase).
There is improvement - on lock the screen is properly hidden content of desktop. But as I tested, regardless of the Delay saver start after lock, on first locking is kdesktop okay but after the second locking kdesktop remains hang. (In reply to Slávek Banko from comment #14) > There is improvement - on lock the screen is properly hidden content of > desktop. But as I tested, regardless of the Delay saver start after lock, on > first locking is kdesktop okay but after the second locking kdesktop remains > hang. I am currently unable to reproduce this issue. Can you post: 1.) The [ScreenSaver] section of your kdesktoprc file 2.) A backtrace ("thread apply all bt") of both kdesktop and kdesktop_lock during the hang? Thanks! (In reply to Timothy Pearson from comment #15) > (In reply to Slávek Banko from comment #14) > > There is improvement - on lock the screen is properly hidden content of > > desktop. But as I tested, regardless of the Delay saver start after lock, on > > first locking is kdesktop okay but after the second locking kdesktop remains > > hang. > > I am currently unable to reproduce this issue. Can you post: > 1.) The [ScreenSaver] section of your kdesktoprc file > 2.) A backtrace ("thread apply all bt") of both kdesktop and kdesktop_lock > during the hang? > > Thanks! [ScreenSaver] DelaySaverStart=true Enabled=true HideActiveWindowsFromSaver=true HideCancelButton=false Lock=false LockGrace=20000 Saver=KClock.desktop Timeout=60 UseTDESAK=true UseUnmanagedLockWindows=false Created attachment 2482 [details]
Kdesktop hang in threads 1
This should be the case when DelaySaverStart = false.
Created attachment 2483 [details]
Kdesktop hang in threads 2
This should be the case when DelaySaverStart = true.
(In reply to Timothy Pearson from comment #15) > (In reply to Slávek Banko from comment #14) > > There is improvement - on lock the screen is properly hidden content of > > desktop. But as I tested, regardless of the Delay saver start after lock, on > > first locking is kdesktop okay but after the second locking kdesktop remains > > hang. > > I am currently unable to reproduce this issue. Can you post: > 1.) The [ScreenSaver] section of your kdesktoprc file > 2.) A backtrace ("thread apply all bt") of both kdesktop and kdesktop_lock > during the hang? > > Thanks! Now I have checked that kdesktop_lock is no longer running when kdesktop hangs => backtrace for kdesktop_lock can not be delivered. (In reply to Slávek Banko from comment #19) > (In reply to Timothy Pearson from comment #15) > > (In reply to Slávek Banko from comment #14) > > > There is improvement - on lock the screen is properly hidden content of > > > desktop. But as I tested, regardless of the Delay saver start after lock, on > > > first locking is kdesktop okay but after the second locking kdesktop remains > > > hang. > > > > I am currently unable to reproduce this issue. Can you post: > > 1.) The [ScreenSaver] section of your kdesktoprc file > > 2.) A backtrace ("thread apply all bt") of both kdesktop and kdesktop_lock > > during the hang? > > > > Thanks! > > Now I have checked that kdesktop_lock is no longer running when kdesktop > hangs => backtrace for kdesktop_lock can not be delivered. Thank you for the backtraces. It looks like we are dealing with a deadlock that I have not experienced on my test systems at all. At this point I need to know what is holding the TQt global lock. Can you attach gdb to that process and obtain the owner thread TID for each thread via the method described here? http://stackoverflow.com/a/3491304 Also attaching the output of "info threads" from that same gdb session would be helpful. Thanks! (In reply to Timothy Pearson from comment #20) > Thank you for the backtraces. It looks like we are dealing with a deadlock > that I have not experienced on my test systems at all. > > At this point I need to know what is holding the TQt global lock. Can you > attach gdb to that process and obtain the owner thread TID for each thread > via the method described here? http://stackoverflow.com/a/3491304 > > Also attaching the output of "info threads" from that same gdb session would > be helpful. > > Thanks! On further analysis it appears that somehow we are reentering a libX11 function, likely due to TQt3 not delaying the lockProcessWaiting() call to the main thread for an as of yet unknown reason. I no longer need the TID information I requested above. (In reply to Timothy Pearson from comment #21) > On further analysis it appears that somehow we are reentering a libX11 > function, likely due to TQt3 not delaying the lockProcessWaiting() call to > the main thread for an as of yet unknown reason. I no longer need the TID > information I requested above. Based on the backtrace I think I see a potential problem, specifically an internal xcb poll() in the main thread is being interrupted by the incoming signal, which is then (somehow) confusing libX11 and causing reentrancy with subsequent deadlock. I have attempted to lock out signal reception in the main thread in GIT hash ab8dfba (tdebase). Can you please test and see if this resolves the issue? Thanks! (In reply to Timothy Pearson from comment #22) > (In reply to Timothy Pearson from comment #21) > > On further analysis it appears that somehow we are reentering a libX11 > > function, likely due to TQt3 not delaying the lockProcessWaiting() call to > > the main thread for an as of yet unknown reason. I no longer need the TID > > information I requested above. > > Based on the backtrace I think I see a potential problem, specifically an > internal xcb poll() in the main thread is being interrupted by the incoming > signal, which is then (somehow) confusing libX11 and causing reentrancy with > subsequent deadlock. > > I have attempted to lock out signal reception in the main thread in GIT hash > ab8dfba (tdebase). Can you please test and see if this resolves the issue? > > Thanks! Great! On my test machine now works perfeclty! (In reply to Slávek Banko from comment #23) > (In reply to Timothy Pearson from comment #22) > > (In reply to Timothy Pearson from comment #21) > > > On further analysis it appears that somehow we are reentering a libX11 > > > function, likely due to TQt3 not delaying the lockProcessWaiting() call to > > > the main thread for an as of yet unknown reason. I no longer need the TID > > > information I requested above. > > > > Based on the backtrace I think I see a potential problem, specifically an > > internal xcb poll() in the main thread is being interrupted by the incoming > > signal, which is then (somehow) confusing libX11 and causing reentrancy with > > subsequent deadlock. > > > > I have attempted to lock out signal reception in the main thread in GIT hash > > ab8dfba (tdebase). Can you please test and see if this resolves the issue? > > > > Thanks! > > Great! On my test machine now works perfeclty! Glad to hear it! The entire kdesktop/kdesktop_lock system should be much more reliable at this point; I have gone through and removed all the signal-unsafe code paths that I could find, and blocked signals to the unsafe code paths that remain. |
I tried to backport commits 755776b, e80c2ba, a17bfb0 and 751c96f to R14.0.1 and I discovered regression. When is disabled Delay saver start after lock, after lock the screen (Ctrl+Alt+L) kdesktop hangs in SaverEngine::waitForLockEngage. Reproducibility is 100%. #0 0xb778ad6c in __kernel_vsyscall () #1 0xb5cf1a37 in do_sigsuspend (set=<optimized out>) at ../sysdeps/unix/sysv/linux/sigsuspend.c:63 #2 *__GI___sigsuspend (set=0xb5e26ff4) at ../sysdeps/unix/sysv/linux/sigsuspend.c:74 #3 0xb76e454d in SaverEngine::waitForLockEngage (this=0xbf94e470) at /tmp/buildd/tdebase-trinity-14.0.1~pre22/kdesktop/lockeng.cc:828 #4 0xb76bfa68 in KRootWm::slotLock (this=0x9206c50) at /tmp/buildd/tdebase-trinity-14.0.1~pre22/kdesktop/krootwm.cc:828 #5 0xb76c7db0 in KRootWm::tqt_invoke (this=0x9206c50, _id=19, _o=0xbf94dc98) at /tmp/buildd/tdebase-trinity-14.0.1~pre22/obj-i486-linux-gnu/kdesktop/krootwm.moc:181 #6 0xb69fb3db in TDEGlobalAccelPrivate::activate (this=0x9207788, pAction=0x9235f98, seq=...) at /tmp/buildd/tdelibs-trinity-14.0.1~pre19/tdecore/kglobalaccel_x11.cpp:436 #7 0xb69fbc4f in TDEGlobalAccelPrivate::x11KeyPress (this=0x9207788, pEvent=0xbf94e170) at /tmp/buildd/tdelibs-trinity-14.0.1~pre19/tdecore/kglobalaccel_x11.cpp:404 #8 0xb69fbf54 in TDEGlobalAccelPrivate::x11Event (this=0x9207788, pEvent=0xbf94e170) at /tmp/buildd/tdelibs-trinity-14.0.1~pre19/tdecore/kglobalaccel_x11.cpp:267 #9 0xb694fa29 in publicx11Event (e=<optimized out>, this=<optimized out>) at /tmp/buildd/tdelibs-trinity-14.0.1~pre19/tdecore/tdeapplication.cpp:1911 #10 TDEApplication::x11EventFilter (this=0x90edb78, _event=0xbf94e170) at /tmp/buildd/tdelibs-trinity-14.0.1~pre19/tdecore/tdeapplication.cpp:2189 #11 0xb770d041 in non-virtual thunk to XAutoLock::~XAutoLock() () at /tmp/buildd/tdebase-trinity-14.0.1~pre22/kdesktop/kdesktopapp.cpp:90 #12 0xb62d2900 in qt_x11EventFilter (ev=0xbf94e170) at kernel/qapplication_x11.cpp:422 #13 0xb62e1a60 in TQApplication::x11ProcessEvent (this=0x90edb78, event=0xbf94e170) at kernel/qapplication_x11.cpp:3435 #14 0xb6321754 in TQEventLoop::processX11Events (this=0x9119c28) at kernel/qeventloop_x11_glib.cpp:353 #15 0xb6322a76 in TQEventLoop::gsourceDispatch (this=0x9119c28, gs=0x9114580) at kernel/qeventloop_x11_glib.cpp:614 #16 0xb6322bbc in qt_gsource_dispatch (source=0x9114580, callback=0, user_data=0x0) at kernel/qeventloop_x11_glib.cpp:123 #17 0xb5acb6d3 in g_main_context_dispatch () from /lib/i386-linux-gnu/libglib-2.0.so.0 #18 0xb5acba70 in ?? () from /lib/i386-linux-gnu/libglib-2.0.so.0 #19 0xb5acbb51 in g_main_context_iteration () from /lib/i386-linux-gnu/libglib-2.0.so.0 #20 0xb63217de in TQEventLoop::processEvents (this=0x9119c28, flags=4) at kernel/qeventloop_x11_glib.cpp:279 #21 0xb63605d0 in TQEventLoop::enterLoop (this=0x9119c28) at kernel/qeventloop.cpp:227 #22 0xb63606ad in TQEventLoop::exec (this=0x9119c28) at kernel/qeventloop.cpp:174 #23 0xb634786f in TQApplication::exec (this=0x90edb78) at kernel/qapplication.cpp:3015 #24 0xb76bed1e in kdemain (argc=1, argv=0xbf94e954) at /tmp/buildd/tdebase-trinity-14.0.1~pre22/kdesktop/main.cc:293 #25 0x080488fb in main (argc=1, argv=0xbf94e954) at /tmp/buildd/tdebase-trinity-14.0.1~pre22/obj-i486-linux-gnu/kdesktop/kdesktop_tdeinit_executable.cpp:2 #26 0xb5cdde46 in __libc_start_main (main=0x80488e0 <main(int, char**)>, argc=1, ubp_av=0xbf94e954, init=0x8048910 <__libc_csu_init>, fini=0x8048900 <__libc_csu_fini>, rtld_fini=0xb779b590, stack_end=0xbf94e94c) at libc-start.c:244 #27 0x08048841 in _start ()