| Summary: | TDE exits too slow | ||
|---|---|---|---|
| Product: | TDE | Reporter: | Darrell <darrella> |
| Component: | tdebase | Assignee: | Timothy Pearson <kb9vqf> |
| Status: | RESOLVED FIXED | ||
| Severity: | blocker | CC: | bugwatch, darrella, fatzer2, kb9vqf, slavek.banko |
| Priority: | P1 | ||
| Version: | R14.0.0 [Trinity] | ||
| Hardware: | Other | ||
| OS: | All | ||
| See Also: | http://bugs.pearsoncomputing.net/show_bug.cgi?id=922 | ||
| Compiler Version: | TDE Version String: | ||
| Application Version: | Application Name: | ||
| Bug Depends on: | 258, 922 | ||
| Bug Blocks: | |||
| Attachments: |
A patch to work around a possible race condition.
Patch to provide cmake configure option to build ksmserver timing marks Konqueror backtrace Ksmserver backtrace ksmserver backtrace when logging out |
||
|
Description
Darrell
2011-12-22 16:22:30 CST
Some new information: Basically, I see an excessive slow exit whenever I enable the KDED System Base URL Notifier daemon. With that daemon enabled I see an exit time of about 17 seconds. When disabled I see an exit time of about 7 seconds. That latter time is still slow, but much better than the other time. When I see the excessive slow exit, I might see any of the following error messages in my xsession log: klauncher: Fatal IO error: client killed kdeinit: Fatal IO error: client killed kded: Fatal IO error: client killed kxdglauncher: cannot connect to X server :0 ICE default IO error handler doing an exit (), pid 17202, errno = 32 None of those errors consistently appear with the slow exit time, but they do appear. One error message in my xsession log that always appears when I see the excessive slow exit: kdeinit: Got EXEC_NEW 'kconf_update' from launcher That last message is similar to the problems reported in bug report 688. However, for this report I temporarily configured the vesa X driver and every time I still saw the excessive slow exit time and the same kconf_update error message. There definitely is something wrong with kconf_update. Possibly the problem in bug report 688 is not related to Nvidia drivers at all but is solely caused by kconf_update. I'm bumping this bug report a notch from Minor to Major and changing the report description. Today I tested a fresh Trinity install from GIT. I am impressed with how responsive the entire desktop is compared to KDE3. Unfortunately, exiting the session is horribly long. The session exit speed tarnishes an otherwise stellar experience. Once again, the "Got EXEC_NEW 'kconf_update' from launcher" message appears in the tail end of the xsession log. I won't say this is the sole cause of the slow exit speed, but there is a relationship with KDED System Base URL Notifier daemon. Here is a typical xsession log with stopping the KDED System Base URL Notifier daemon before ending the session: [starttde] Shutting down Trinity... [tdeinit] Terminating Trinity. [klauncher] Exiting on signal 1 [starttde] Running Trinity shutdown scripts... [starttde] Trinity shutdown complete. Here is a typical xsession log without stopping the KDED System Base URL Notifier daemon before ending the session: [starttde] Shutting down Trinity... [tdeinit] Terminating Trinity. [klauncher] Exiting on signal 1 [tdeinit] Launched DCOPServer, pid = 11111 result = 0 [tdeinit] Launched KLauncher, pid = 11115 result = 0 [tdeinit] opened connection to :0.0 [tdeinit] Launched KDED, pid = 11116 result = 0 [tdeinit] Got EXT_EXEC 'kbuildsycoca' from launcher. [kbuildsycoca] kbuildsycoca running... [tdeinit] PID 11118 terminated. [tdeinit] Got EXEC_NEW 'kconf_update' from launcher. [tdeinit] PID 11119 terminated. [tdeinit] PID 11116 terminated. [starttde] Running Trinity shutdown scripts... [starttde] Trinity shutdown complete. end from FAM server connection [tdeinit] Fatal IO error: client killed [klauncher] Exiting on signal 15 kded: Fatal IO error: client killed kxdglauncher: cannot connect to X server :0 [KDE-ICE error] ICE default IO error handler doing an exit(), pid = 9790, errno = 32 There is a bunch of things going on with that daemon running than without. The message "kded: Fatal IO error: client killed" likely is related to the daemon. And why are kbuildsycoca and kconf_update being restarted? Leaving the daemon running in KDE3 does not impact the exit time. Exit time with KDE3 is about 4.5 seconds. Exit time in TDE with the daemon stopped is about 7 seconds, which is still too slow. Is there a way to trace what is happening during a session shutdown? I have an interesting update with useful information. I can reduce the exit time from about 17 seconds to about 7, without disabling the KDED System Base URL Notifier daemon. All I need do is trigger a device event. That is, mount a partition, insert a USB stick, insert a CD/DVD. I do not have to mount the new device, just cause Trinity to become aware of the new device. For example, I can insert a CD/DVD and then cancel the notification popup dialog. That is sufficient to cause the KDED System Base URL Notifier daemon to not stall when logging out. This behavior is very repeatable. Without any device event the exit time is always about 17 seconds. With any device event the exit time is always about 7 seconds. With the latter the xsession log is always clean as mentioned in Comment 3. Part of the solution then to resolving this bug report with respect to the KDED System Base URL Notifier daemon is connected to resolving the following bug reports: 372 storage media applet doesn't show usb stick at kde startup 385 Unmounted removable device icons do not appear on the desktop 578 media:/ doesn't show files, remounts unit 634 CDs not correctly recognized by TDE The patch that resolved bug report 372 and 385 did not have any effect on this bug report. I still need to initiate a device change to witness the improved exit time. :( Therefore the KDED System Base URL Notifier daemon is still not working quite right. My hunch is that the KDED System Base URL Notifier daemon is unaware that a shutdown is in progress, and is somehow kicked into doing startup tasks by something that happens during the shutdown sequence. That sounds exactly like what is happening, based upon the xsession log (kconf_update, etc.). How do we slap it upside the head? (In reply to comment #7) > That sounds exactly like what is happening, based upon the xsession log > (kconf_update, etc.). > > How do we slap it upside the head? I can't replicate this on my Ubuntu machine, which will make debugging more difficult. For reference, the offending service is present here: tdebase/kioslave/system/kdedmodule/ What happens if you navigate to the system:/ URL in Konqueror before logging out? Does the slow shutdown still occur? Typing system:/ in Konqueror makes no difference. I tried several different scenarios. The only method I found to avoid the slower exit speed is initiating a device change event. Mounting is not required, just the change. Having a CD/DVD in the tray before starting the session has no effect. Ejecting the disk before ending the sessions works. Summary: only a change event works. I cannot replicate the slower exit speed in a virtual machine. Only in an actual physical machine. I use the same machine for both TDE and KDE3 but don't experience the problem in KDE3 (different partitions, no cross contamination). I start X/TDE from the command line through startx and not through TDM. Yet testing through TDM reveals the same times and behavior. (In reply to comment #9) > Typing system:/ in Konqueror makes no difference. > > I tried several different scenarios. The only method I found to avoid the > slower exit speed is initiating a device change event. Mounting is not > required, just the change. > > Having a CD/DVD in the tray before starting the session has no effect. Ejecting > the disk before ending the sessions works. > > Summary: only a change event works. > > I cannot replicate the slower exit speed in a virtual machine. Only in an > actual physical machine. I use the same machine for both TDE and KDE3 but don't > experience the problem in KDE3 (different partitions, no cross contamination). > > I start X/TDE from the command line through startx and not through TDM. Yet > testing through TDM reveals the same times and behavior. Having to have a device change status before shutdown occurs normally could be due to any number of problems, and is therefore not very helpful for narrowing down the exact location of the problem. Another potential cause of this would be the system:/ URL being accessed during the shutdown sequence on fast computers. Side note: the device change does not have to be initiated from within the session. I can toggle to a different console, mount a partition, and then the TDE session exits with the faster speed. What can I do to troubleshoot further? Or even experiment with code changes? (In reply to comment #11) > Side note: the device change does not have to be initiated from within the > session. I can toggle to a different console, mount a partition, and then the > TDE session exits with the faster speed. > > What can I do to troubleshoot further? Or even experiment with code changes? I would start by putting printf("<something>"); fflush(); debug markers in the system kioslave constructor, init(), and destructor methods. I would also put a debug statement in the directory listing method that prints out the requested URL. This is all so that I could get a feel for what types of access are being requested when. If you throw the printf();fflush(); pairs into the kioslave, the debug messages will show up in your ~/.xsession_errors log in the correct temporal locations. Posting that log here after you inserted the debug statements would provide valuable information. Tim Ok. I understand what you're trying to accomplish. I've done that many times in my other programming ventures. I just hope I can get the c++ syntax correct --- building tdebase takes almost a half hour these days. > Another potential cause of this would be the system:/ URL being accessed during
> the shutdown sequence on fast computers.
Please explain how I could investigate this further.
I tested on a 350 MHz PII machine. I could not replicate the stall or the xsession errors. Exit times were consistent with/without a device event and the xsessionlog was always clean.
Along with being unable to replicate the problem on a virual machine as well, I think there might be something to your statement.
I can't slow my fast "problem" machine any more than I now witness the problem, which with CPU governing is 1 GHz.
(In reply to comment #14) > > Another potential cause of this would be the system:/ URL being accessed during > > the shutdown sequence on fast computers. > > Please explain how I could investigate this further. With the debug statements I mentioned earlier--when they are inserted into the ~/.xsession_errors logfile I can make a pretty good guess as to what is going wrong. After adding the printf statements I no longer can replicate the stall. I added numerous printf statements in the two kdedmodule cpp files. Probably too many. However, not directly related to my intentions, but possibly related to the logout stall, I found four typos in tdelibs/dcop/KDE-ICE. There were four occurrences of IceClosetStatus that I changed to IceCloseStatus I'm guessing the IceClosetStatus typo is a remnant from IceClosetqStatus, most of which was cleaned up after purging inadvertent tqStatus changes. Whether or not I force a device event now makes no difference. The exit to the command line now is always about the same 6 to 7 seconds, as opposed to the 16 to 17 seconds previously. At this point I don't know what serves as the cure. The tdelibs typo corrections or the possibility that, with the printf/stdout requirements, the kded system module has something "to do" during a logout that no longer causes a stall. I say that because there a few places in the two cpp files where there is nothing between braces. The next thing is build tdelibs again without the typo patch and only change that one aspect. If nothing changes then I'll have to build several versions of tdebase with differing locations of the printf statements. One way or another after this, I'll push the tdelibs patch to GIT. Possible success!
This required much trial and error. Regardless of where I inserted printf statements, the stalls disappeared. Posting any xsession log would have been futile.
Finally I narrowed the problem to one location: tdebase/kioslave/system/kdedmodule/systemdirnotify.cpp, SystemDirNotify::init.
After removing all other printf statements, the one printf always was required. Here is the final patch I used:
diff -urN tdebase/kioslave/system/kdedmodule/systemdirnotify.cpp tdebase.new/kioslave/system/kdedmodule/systemdirnotify.cpp
--- tdebase/kioslave/system/kdedmodule/systemdirnotify.cpp 2012-03-23 19:54:25.000000000 -0500
+++ tdebase.new/kioslave/system/kdedmodule/systemdirnotify.cpp 2012-04-04 17:50:53.000000000 -0500
@@ -36,6 +36,7 @@
void SystemDirNotify::init()
{
if( mInited )
+ kdDebug() << "SystemDirNotify::init(" << mInited << ")" << endl;
return;
mInited = true;
KGlobal::dirs()->addResourceType("system_entries",
The kdDebug statement does not clutter the xsession output like the printf statement yet seems to serve the same purpose as the printf statement.
I don't know why this work-around succeeds or whether there is a cleaner or more robust solution.
Now that the cause is isolated, perhaps an improved solution is possible. If not then I ask for a peer review and signoff of the above patch before pushing to GIT.
If this particular patch is approved and pushed, then perhaps to avoid future deletions I should modify the patch with a comment why the kdDebug statement exists.
What you have done is to work around a race condition. I would add a comment like this: // FIXME Work around race condition by inserting delay before following code is executed--the root cause of the race should be investigated and fixed I'll add the comment and push to GIT. Curious --- to learn --- how you discerned that. :) After reading your response I notice homedirnotify.cpp uses the exact same code for initialization. Should we push a similar "temporary" patch for that too? I haven't yet noticed anything directly related, but --- who knows. (In reply to comment #19) > I'll add the comment and push to GIT. > > Curious --- to learn --- how you discerned that. :) Educated guess. :-) I've been around the innards of multithreaded programs long enough to recognize a race condition from the symptoms you describe. > > After reading your response I notice homedirnotify.cpp uses the exact same code > for initialization. Should we push a similar "temporary" patch for that too? I > haven't yet noticed anything directly related, but --- who knows. I wouldn't. This is a temporary hack only and the real root of the race condition may not cause a problem within homedirnotify.cpp. Tim Okay. Pushed as is with comment in GIT hash 9e172fa7a1e93cc77e09616eb793b823d29ebaec. By the way, the xsession log is nice and clean again with this work-around. Although relieved to have found a work-around to the extensive delay, I want to leave this bug report open. Seven seconds to exit (no session saving) still seems too long. Thanks for your help! (In reply to comment #21) > Okay. Pushed as is with comment in GIT hash > 9e172fa7a1e93cc77e09616eb793b823d29ebaec. > > By the way, the xsession log is nice and clean again with this work-around. > > Although relieved to have found a work-around to the extensive delay, I want to > leave this bug report open. Seven seconds to exit (no session saving) still > seems too long. > > Thanks for your help! No problem; glad to be of assistance! Regarding the 7 second delay, I seem to remember hardcoded delays in ksmserver. Might be worth checking into to make sure none of them are being triggered. Another potential source of delay is due to the time it takes to close down dozens of running programs. /dev/ammo42 mentioned the delays in the mailing list (read original problem description). I'll try to learn more. I manually close all apps when I exit, but reflecting on that statement, I realize I run a few apps in the system tray: kalarm, kteatime, akregator, klippy, kwikdisk, kmix, most of the kde related services. I will experiment with manually terminating those apps before starting the exit and see how much difference there is. At least with the serious "race" delay out of the way we can tinker more clearly. The "race condition" patch broke a few things. The feedback dialog no longer appears, Confirm logout no longer works, and there is no grade fading effect. After rebuilding tdebase with a reversed patch, those functions return. I reversed the patch in GIT. Back to square one. :( (In reply to comment #24) > The "race condition" patch broke a few things. > > The feedback dialog no longer appears, Confirm logout no longer works, and > there is no grade fading effect. > > After rebuilding tdebase with a reversed patch, those functions return. I > reversed the patch in GIT. > > Back to square one. :( Out of curiosity do the ill effects occur if you use printf() instead of kdDebug()? Created attachment 525 [details]
A patch to work around a possible race condition.
I had to rebuild, but curiously, using printf works whereas kdebug did not. No problems with the dialogs appearing.
More curious, the printf message never appears in the xsession log, which means that snippet of code is not being executed. Yet adding the printf avoids the exit stall and I don't need a device change event either.
I am attaching the updated patch to this bug report in case others experience the same problem, but I won't push to GIT for now. Further, I can devise work-arounds and live with the "improved" 7 second exit time --- I'd rather see team energies focused on many other bug reports.
Thanks much for your help!
Another reason for not pushing to GIT for now is that leaves open the possibility that others might experience the bug. Additional data might help resolve the problem in a meaningful manner. (In reply to comment #26) > Created attachment 525 [details] > A patch to work around a possible race condition. > > I had to rebuild, but curiously, using printf works whereas kdebug did not. No > problems with the dialogs appearing. kdDebug works very differently than printf() and requires access to the TDE core IIRC, so I could easily see it interfering with the shutdown dialogs. (In reply to comment #28) > (In reply to comment #26) > > Created attachment 525 [details] [details] > > A patch to work around a possible race condition. > > > > I had to rebuild, but curiously, using printf works whereas kdebug did not. No > > problems with the dialogs appearing. > > kdDebug works very differently than printf() and requires access to the TDE > core IIRC, so I could easily see it interfering with the shutdown dialogs. Just FYI kddebug apparently requests access to the TDE configuration system and should NOT be used in shutdown-related code. 1. Regarding slow exit culprits, do your comments mean a patch is forthcoming? :-) 2. Do we want to push the existing patch attachment (changing kdedebug -> printf)? I haven't noticed any problems after changing to printf. (In reply to comment #30) > 1. Regarding slow exit culprits, do your comments mean a patch is forthcoming? > :-) No, sorry. As stated before I have yet to see this on any of my systems. > 2. Do we want to push the existing patch attachment (changing kdedebug -> > printf)? I haven't noticed any problems after changing to printf. Go ahead. Just make sure that the printf is preceded by a comment explaining its hacky purpose and a warning not to remove it. (In reply to comment #31) > (In reply to comment #30) > > 1. Regarding slow exit culprits, do your comments mean a patch is forthcoming? > > :-) > > No, sorry. As stated before I have yet to see this on any of my systems. > > > 2. Do we want to push the existing patch attachment (changing kdedebug -> > > printf)? I haven't noticed any problems after changing to printf. > > Go ahead. Just make sure that the printf is preceded by a comment explaining > its hacky purpose and a warning not to remove it. Actually, just for fun try replacing the hacky printf(); with sleep(1); and see if that also cures the problem. Tim Race condition patch pushed in GIT hash d41f5217. I read your statement again about kdDebug and now realize you were commenting about why using printf is better than using kdDebug, and not proposing possible solutions. There are a handful of kdDebug statements in tdebase/ksmserver/shutdown.cpp. If that good or bad? Should those statements be converted to printf statements? fprintf(stderr...)? What about using kdDebug in tdebase/ksmserver/startup.cpp or tdebase/ksmserver/server.cpp? Regarding the bug report description, Xfce exits in about 2.5 seconds. With the race condition patch, TDE still exits in about 6.5 seconds and that is without saving the session. To debug, would be nice if somehow we could temporarily insert timing marks and printf statements. (In reply to comment #33) > Race condition patch pushed in GIT hash d41f5217. > > I read your statement again about kdDebug and now realize you were commenting > about why using printf is better than using kdDebug, and not proposing possible > solutions. > > There are a handful of kdDebug statements in tdebase/ksmserver/shutdown.cpp. If > that good or bad? Should those statements be converted to printf statements? > fprintf(stderr...)? It might be good to convert them to printf(), as this will prevent potential reloading of ksycoca on fast multicore systems. > What about using kdDebug in tdebase/ksmserver/startup.cpp or > tdebase/ksmserver/server.cpp? Won't matter, I would leave it as-is with kdDebug being used. > Regarding the bug report description, Xfce exits in about 2.5 seconds. With the > race condition patch, TDE still exits in about 6.5 seconds and that is without > saving the session. To debug, would be nice if somehow we could temporarily > insert timing marks and printf statements. You can do that in ksmserver/shutdown.cpp. ;-) Just use printf and you should be fine. I still wager much of that delay is ksmserver waiting for various apps to save their state (this is different from saving a session, and always happens BTW). Tim Do you see the slow logout time (with a locally reverted hash d41f5217) if you set showFancyLogout=false? I ask because in Bug 922 you mention a very similar slowdown to the one reported here, although Bug 922 is not the appropriate place to discuss this topic. ;-) The only way to test the patch for bug report 922 is with showFancyLogout=true. When showFancyLogout=false, then apps intercept the session exit request and the exit time is normal. The ICE error does not appear. But perhaps I stumbled across the cause. As yet there is no GUI check to control the status dialog. Aargh --- too darned late at night.... The only way to test the patch for bug report 922 is with showFancyLogout=true. When showFancyLogout=false, then apps intercept the session exit request and the exit time is normal. The ICE error does not appear. Perhaps I stumbled across the cause. Well, not really. I rebooted and now can't repeat the problem. I witnessed the slow exit only once (kolourpaint) after many session restarts. With that said, when showFancyLogout=true and I select Discard from the save dialog, I see this message in my xsession log: [dcopserver] DCOP aborting while waiting for answer from 'knotify' I see the message whether I save the document or don't save. The error message does not appear when there are no unsaved documents. At this point I'm too tired to troubleshoot further. (In reply to comment #37) > I rebooted and now can't > repeat the problem. I witnessed the slow exit only once (kolourpaint) after > many session restarts. Was this with showFancyLogout=true or showFancyLogout=false? The problem reported in bug report 922 applies only when showFancyLogout=true. When showFancyLogout=false then the status dialog never appears and apps with unsaved files always works. No sense in testing the latter. I was up way too late last night trying to test this and find a pattern. Possibly my exhaustion was part of the problem, although as originally reported, I witnessed the slow exit earlier in the evening. I know I saw what I saw and repeatibility was perfect each time. I am rebuilding a new package set because of the ABI breakage in the latest tdelibs patch. Like the Konqueror SIGABRT problem reported in bug report 293, a rebuild of everything is necessary. Possibly the new slow exit is related. I should know more later today after the run completes. I found the root cause for the latest slow exit issue. Some good news is I can replicate this bug in 3.5.10 too. Therefore this bug has existed a long time. In all that time few people noticed and probably even fewer investigated, if any. This might sound crazy, but the cause is not having an event sound enabled for warning dialogs. I don't have an event sound enabled for warning messages. I have event sounds disabled for most message dialogs. When I have an event sound enabled for warning dialogs, then the delay is unlikely to occur. I write unlikely because on one occasion I witnessed the delay despite having the event sound enabled. For certain, when an event sound for warning messages is not enabled, then the delay always happens. Open KControl, Sound & Multimedia, System Notifications, Event source: System Notifications, Event: "A warning message is being shown." Uncheck "Play a sound" and select the Apply button. This is the most common error message: * ICE default IO error handler doing an exit(), pid = 27431, errno = 11 The same error message appears in 3.5.10 too. The error message is from from libICE error.c. I'm thinking there is breakage somewhere in interprocess communications. When I close an app that has an unsaved file and Discard the changes, but wait 10 to 20 seconds before exiting TDE, the exit time is normal with no abnormal delay. There seems to be a timeout of some kind. The only time I see the ICE error message is discarding changes while trying to exit TDE. I tested this in both virtual and real machines. I have seen three other error messages, but I want to focus on one thing at a time. Hopefully a patch will cause all error messages to disappear. Tim, Regarding the previous Comment 40 that not having an event sound enabled for warning dialogs causes a significant session exit delay, do we want a separate bug report for that? Or will we resolve that problem through this bug report? The work-around is to enable a sound event, but that is not a great solution for users who do not want a sound for that dialog. Let me know. Thanks. FYI: KDE4 exits real fast (although startup is a bit slower than Trinity). Perhaps some code snippets can be massaged from KDE4. Possible connection is bug report 1425. Regarding comment 40, this anomaly seems to have disappeared. The exit time remains the same for me now whether or not I have a warning sound enabled. Exit time is about 5 seconds for me. Thus there have been nominal optimizations over the months since starting this report, although I don't know exactly what. Typically I have several apps in the system tray, but closing them all before timing the exit results in saving only less than a half second. Five seconds is not horrible but remains noticeable for anybody familiar to other environments, such as xfce or kde4. Interesting update while trying to help test bug report 1464. The testing account I use needs about 17 seconds to exit Trinity. So there remains something awry about exiting Trinity. I'm adding this note for reference --- I'll try to get back to this at a later time. I added some profiling code to ksmserver, and while this is not yet pushed to GIT, I did notice that around 3 seconds are spent waiting for logoutSoundFinished to fire on my test box. It would be somewhat embarrassing if this was the root cause of the slow TDE logout! The logout sound delay only occurs if the system does not have any audio hardware installed. I am seeing a second problem in that timeoutQuit is triggered due to knotify not exiting on prior request; this reliably adds 10 seconds to the logout process. An interesting bit is the testing account does not always use 17 seconds to exit Trinity. Sometimes I see about 5 to 6 seconds, which is about the same as my normal account (which to me is too long). Note: I don't define a login or logout sound in any of my accounts. Possibly part of the delay is related to not having a sound defined. That you mention this seems potentially related to my older post about not having a warning sound defined, although I no longer can replicate that exact delay. (In reply to comment #47) > An interesting bit is the testing account does not always use 17 seconds to > exit Trinity. Sometimes I see about 5 to 6 seconds, which is about the same as > my normal account (which to me is too long). > > Note: I don't define a login or logout sound in any of my accounts. Possibly > part of the delay is related to not having a sound defined. That you mention > this seems potentially related to my older post about not having a warning > sound defined, although I no longer can replicate that exact delay. Likely knotify is still not exiting on demand, even if the 10 second hard limit is not triggered. That agrees a little with the message I shared in comment 37. (In reply to comment #48) > (In reply to comment #47) > > An interesting bit is the testing account does not always use 17 seconds to > > exit Trinity. Sometimes I see about 5 to 6 seconds, which is about the same as > > my normal account (which to me is too long). > > > > Note: I don't define a login or logout sound in any of my accounts. Possibly > > part of the delay is related to not having a sound defined. That you mention > > this seems potentially related to my older post about not having a warning > > sound defined, although I no longer can replicate that exact delay. > > Likely knotify is still not exiting on demand, even if the 10 second hard limit > is not triggered. Still working on the knotify exit delay, but wanted to note that this has been around for a LONG time, even in KDE4. In fact, KDE4 may still exhibit this problem, but not on your system for whatever reason: https://bugs.kde.org/show_bug.cgi?id=152611 Interesting. True, on my system KDE4 exits immediately. So fast that I could not help but notice. Xfce exits immediately too. I am not surprised if this problem has been around since the 3.x days. We've seen that more than once around here. Not assigning blame, just saying that when a bug exists for some people but not all that debugging gets far more challenging. Corner case issues are a devil. I'm guessing that is at least partially the case here. Then there is social conditioning. Users grow accustomed to a several second delay and presume that is normal. So few people challenge that presumed status quo. :) (In reply to comment #51) > Interesting. True, on my system KDE4 exits immediately. So fast that I could > not help but notice. > > Xfce exits immediately too. > > I am not surprised if this problem has been around since the 3.x days. We've > seen that more than once around here. Not assigning blame, just saying that > when a bug exists for some people but not all that debugging gets far more > challenging. Corner case issues are a devil. I'm guessing that is at least > partially the case here. > > Then there is social conditioning. Users grow accustomed to a several second > delay and presume that is normal. So few people challenge that presumed status > quo. :) I believe I have it traced properly now, though a fix is not yet ready. And yes, this is a bit of a corner case. :-) Essentially, most available computers are now so fast (especially with more than one CPU) that the running knotify instance exits almost immediately. Unfortunately, ksmserver is not yet finished using knotify, so it launches another instance, which then has to be forcibly terminated after a 10-second wait before logout can complete. The fix is likely to delay knotify termination until after twin has terminated. If I manually delay knotify termination with gdb, my "slow" test system logs out in under 5 seconds, with the profiling code indicating that most of that time is spent waiting for running applications (such as Firefox) to save their session state. knotify heisenbug fixed in GIT hash 1c7bdfd. If TDE now exits quickly enough for you, please close this bug report. If not, please enable the new compile-time shutdown profiling code (#define PROFILE_SHUTDOWN 1) in tdebase and attach the resultant ~/.xsession_errors file to this report. For reference, on my test VM using a NFS root, TDE exits in a little over 4 seconds with several applications (such as Firefox) running when logout was requested. I'll test and report. Looking at the patch, would a cmake configure option be suitable for enabling the shutdown spew? Something like -DWITH_VERBOSE_SHUTDOWN? (In reply to comment #54) > I'll test and report. > > Looking at the patch, would a cmake configure option be suitable for enabling > the shutdown spew? Something like -DWITH_VERBOSE_SHUTDOWN? Might be a good idea. Such an option should be off by default (i.e. not use the value of BUILD_ALL as its default like most other build options do). I ran some tests with the latest patch. I ran a set of five timed exits on all of my machines using an unscientific stopwatch method. Apparently I'm not afflicted by the corner case delay previously described and patched and can't help test that. :-( On my two dual cores, without the patch, exit times averaged about 5.5 to 5.9 seconds. On my PI and PII, exit times averaged about 10 seconds (they are horribly old machines but provide for some interesting testing). After the patch the exit times were consistently the same. I added time markers to my starttde and xinitrc. Interestingly, the biggest delay on my systems is with tdeinit_shutdown, launched in starttde. The delay time is about two seconds (Look at the times at the end at "Shutting down Trinity"). As can be seen from the xsession-errors log, the corner case delay with ksmserver does not happen on my systems and is not a cause of any delay. The time markers show ksmserver terminating quickly. I don't know whether a 5.5 to 5.9 second exit time on a dual core system is normal. Still seems long to me. I can manually terminate all apps in the system tray before exiting Trinity and the exit time is reduced only by a fraction of a second. I have more information to add but I'm still assembling the data. ====================================== [ksmserver] 'Shutdown initiated' [23:06:54:382] [ksmserver] 'Legacy save complete' [23:06:54:394] [ksmserver] 'completeShutdownOrCheckpoint' [23:06:54:396] [ksmserver] 'completeShutdownOrCheckpoint' [23:06:54:401] [ksmserver] 'completeShutdownOrCheckpoint' [23:06:54:406] [ksmserver] 'completeShutdownOrCheckpoint' [23:06:54:407] [ksmserver] 'completeShutdownOrCheckpoint' [23:06:54:407] [ksmserver] 'completeShutdownOrCheckpoint' [23:06:54:407] [ksmserver] 'completeShutdownOrCheckpoint' [23:06:54:408] [ksmserver] 'completeShutdownOrCheckpoint' [23:06:54:409] [ksmserver] 'completeShutdownOrCheckpoint' [23:06:54:409] [ksmserver] 'completeShutdownOrCheckpoint' [23:06:54:409] [ksmserver] 'completeShutdownOrCheckpoint' [23:06:54:409] [ksmserver] 'completeShutdownOrCheckpoint' [23:06:54:409] [ksmserver] 'completeShutdownOrCheckpoint' [23:06:54:409] [ksmserver] 'completeShutdownOrCheckpoint' [23:06:54:410] [ksmserver] 'Phase 2 complete' [23:06:54:410] [ksmserver] 'Session discarded' [23:06:54:410] [ksmserver] 'notifySlot' [23:06:54:412] [ksmserver] 'startKilling' [23:06:54:412] [ksmserver] 'completeKilling' [23:06:54:483] [ksmserver] 'completeKilling' [23:06:54:485] [ksmserver] 'completeKilling' [23:06:54:486] [ksmserver] 'completeKilling' [23:06:54:486] [ksmserver] 'completeKilling' [23:06:54:486] [ksmserver] 'completeKilling' [23:06:54:486] [tdeinit] PID 6196 terminated. X Error: BadWindow (invalid Window parameter) 3 Major opcode: 7 Minor opcode: 0 Resource id: 0x2800008 X Error: BadWindow (invalid Window parameter) 3 Major opcode: 6 Minor opcode: 0 Resource id: 0x2800008 [ksmserver] 'completeKilling' [23:06:54:498] [tdeinit] PID 6170 terminated. X Error: BadWindow (invalid Window parameter) 3 Major opcode: 7 Minor opcode: 0 Resource id: 0x2600008 X Error: BadWindow (invalid Window parameter) 3 Major opcode: 6 Minor opcode: 0 Resource id: 0x2600008 X Error: BadWindow (invalid Window parameter) 3 Major opcode: 7 Minor opcode: 0 Resource id: 0x2000008 X Error: BadWindow (invalid Window parameter) 3 Major opcode: 6 Minor opcode: 0 Resource id: 0x2000008 [ksmserver] 'completeKilling' [23:06:54:525] [ksmserver] 'completeKilling' [23:06:54:528] [ksmserver] 'completeKilling' [23:06:54:591] [ksmserver] 'completeKilling' [23:06:54:602] [tdeinit] PID 6192 terminated. [ksmserver] 'completeKilling' [23:06:54:632] [ksmserver] 'completeKilling' [23:06:54:973] [ksmserver] 'killWM' [23:06:54:973] [ksmserver] 'completeKillingWM' [23:06:54:973] [ksmserver] 'completeKillingWM' [23:06:54:974] [ksmserver] 'completeKillingWM' [23:06:54:981] [ksmserver] 'completeKillingWM' [23:06:54:999] [ksmserver] 'killingCompleted' [23:06:54:999] [tdeinit] PID 6138 terminated. [starttde] Shutting down Trinity... [23:06:55:683412249] [tdeinit] Terminating Trinity. [tdelauncher] Exiting on signal 1 [starttde] tdeinit_shutdown. [23:06:57:710935496] [starttde] dcopserver_shutdown. [23:06:57:832070939] [starttde] artsshell. [23:06:57:878495134] [starttde] kdeinit4_shutdown. [23:06:57:896654119] [starttde] Running Trinity shutdown scripts... [23:06:57:900248900] [starttde] Trinity shutdown complete. [23:06:58:175146092] xinitrc: Exiting X: 2013-04-24 23:06:58 Created attachment 1185 [details]
Patch to provide cmake configure option to build ksmserver timing marks
Tim,
Attached is a patch as discussed in comments 54 and 55. I tested the patch both ways (OFF/ON) and in each case the xsession-errors log looked as expected.
Please provide a sanity inspection and if ok I'll push to GIT.
Question:
When I view the xsession-errors log in mcedit, I see a bunch of ^M characters preceding all of the [ksmserver] markers. Second, all of the ksmserver comments have a blank line feed after the output. Can this be cleaned?
I have some observations that might be related: During testing, I installed kdesvn-trinity and kdesvn-kio-plugins-trinity (yes v3.5.13-sru branch). And beyond that I set in kdmrc: Compositor=kompmgr During logout in .xsession-errors I see: [kwin-workspace] Using '/home/axis/.kompmgr.pid' as kompmgr pidfile [startkde] Shutting down Trinity... [klauncher] Exiting on signal 1 [25398:25464:61868684793:ERROR:zygote_host_linux.cc(398)] write: Roura přerušena (SIGPIPE) The Trinity ltdl loader was unable to dlopen() the shared library '/usr/lib64/trinity/kded_kdesvnd.so' : '/usr/lib64/trinity/kded_kdesvnd.so: sdílený objektový soubor nelze otevřít: Adresář nebo soubor neexistuje' [startkde] Running Trinity shutdown scripts... [startkde] Trinity shutdown complete. I leave aside that /usr/lib64/trinity/kded_kdesvnd.so is wrong path (I'll deal with this separately). It seems suspicious, why in lougout appears message: [kwin-workspace] Using '/home/axis/.kompmgr.pid' as kompmgr pidfile And why in lougout is effort to load library: The Trinity ltdl loader was unable to dlopen() the shared library It looks as it trying to start something, instead of peaceful shutdown. (In reply to comment #57) > Created attachment 1185 [details] > Patch to provide cmake configure option to build ksmserver timing marks > > Tim, > > Attached is a patch as discussed in comments 54 and 55. I tested the patch both > ways (OFF/ON) and in each case the xsession-errors log looked as expected. > > Please provide a sanity inspection and if ok I'll push to GIT. > > Question: > > When I view the xsession-errors log in mcedit, I see a bunch of ^M characters > preceding all of the [ksmserver] markers. Second, all of the ksmserver comments > have a blank line feed after the output. Can this be cleaned? Patch looks OK, go ahead and clean up the timing mark #define near the top of shutdown.cpp if you want to. :-) Please, this is a case where rather than BUILD_ should be used WITH_? In this case WITH_PROFILE_SHUTDOWN rather than BUILD_PROFILE_SHUTDOWN? Patch pushed to GIT in commit 62d43d23.
>Please, this is a case where rather than BUILD_ should be used WITH_?
I don't know. Exactly what is the difference between BUILD and WITH?
(In reply to comment #56) <snip> > I added time markers to my starttde and xinitrc. Interestingly, the biggest > delay on my systems is with tdeinit_shutdown, launched in starttde. The delay > time is about two seconds (Look at the times at the end at "Shutting down > Trinity"). There are actually three commands run in that time slot: $TDEDIR/bin/tdeinit_shutdown $TDEDIR/bin/dcopserver_shutdown --wait $TDEDIR/bin/artsshell -q terminate Out of those three, I vote dcopserver_shutdown as the likely delay culprit. Is there any way you could insert a time marker in your starttde script between each of those three commands so that I can get a better idea of which one is the true cause of the delay on your systems? Thanks! (Odpověď na komentář #61)
> Patch pushed to GIT in commit 62d43d23.
>
> >Please, this is a case where rather than BUILD_ should be used WITH_?
>
> I don't know. Exactly what is the difference between BUILD and WITH?
I understand that BUILD_ affects whether will be builded binary or library - and it usually corresponds to the folder in the sources. While WITH_ determines with such properties would be these binaries or libraries builded. Therefore, in this case, I tend to use WITH_.
> Is there any way you could insert a time marker in your starttde Already done. :-) Look at the log output in comment 56. [starttde] Shutting down Trinity... [23:06:55:683412249] [tdeinit] Terminating Trinity. [tdelauncher] Exiting on signal 1 [starttde] tdeinit_shutdown. [23:06:57:710935496] [starttde] dcopserver_shutdown. [23:06:57:832070939] [starttde] artsshell. [23:06:57:878495134] [starttde] kdeinit4_shutdown. [23:06:57:896654119] [starttde] Running Trinity shutdown scripts... [23:06:57:900248900] [starttde] Trinity shutdown complete. [23:06:58:175146092] xinitrc: Exiting X: 2013-04-24 23:06:58 > I understand that BUILD_ affects whether will be builded binary or library - and it usually corresponds to the folder in the sources. While WITH_ determines with such properties would be these binaries or libraries builded.
Is this defined anywhere? I'm asking because I'd like to know.
(Odpověď na komentář #65)
> > I understand that BUILD_ affects whether will be builded binary or library - and it usually corresponds to the folder in the sources. While WITH_ determines with such properties would be these binaries or libraries builded.
>
> Is this defined anywhere? I'm asking because I'd like to know.
I do not know about that rule was written.
I assume from the way that are currently defined.
If the distinction is a local project guideline then perhaps we should have that guideline defined someplace? (In reply to comment #64) > > Is there any way you could insert a time marker in your starttde > > Already done. :-) Look at the log output in comment 56. > > [starttde] Shutting down Trinity... [23:06:55:683412249] > [tdeinit] Terminating Trinity. > [tdelauncher] Exiting on signal 1 > [starttde] tdeinit_shutdown. [23:06:57:710935496] > [starttde] dcopserver_shutdown. [23:06:57:832070939] > [starttde] artsshell. [23:06:57:878495134] > [starttde] kdeinit4_shutdown. [23:06:57:896654119] > [starttde] Running Trinity shutdown scripts... [23:06:57:900248900] > [starttde] Trinity shutdown complete. [23:06:58:175146092] > xinitrc: Exiting X: 2013-04-24 23:06:58 Thanks, I must have been blind earlier. ;-) Fix for 2 second delay committed to GIT in hash a9e5bed29204900ce96e9bd89dc4a3444512ca83. Essentially, the original developer entered a blocking read on the tdeinit socket for an unknown reason, likely relying on the dying X11 server or tdeinit process (I'm not sure which) to ensure eventual tdeinit_shutdown termination. I stress tested logout/immediate login in a while loop with no ill effects, so I suspect that the wait was not needed in the first place. With this change, TDE logs out in 3 seconds on my real (non-virtual) test machine. Good enough? ;-) Sounds encouraging. Let me rebuild and test. :-) I have no idea what a blocking read is but wow. I'm getting about 3 seconds now with my normal account. I have to add --- the lack of the additional 2.5 seconds FEELS different. Trinity feels snappier, although we are only discussing the exit time. No weird messages in xsession-errors either. Nice job! :-) (In reply to comment #70) > I have no idea what a blocking read is but wow. I'm getting about 3 seconds now > with my normal account. I have to add --- the lack of the additional 2.5 > seconds FEELS different. Trinity feels snappier, although we are only > discussing the exit time. No weird messages in xsession-errors either. Nice > job! :-) Great! In a nutshell, the blocking read was there to make absolutely sure that tdeinit was not running before tdeinit_shutdown would allow starttde to complete execution. It worked by sitting on the tdeinit socket and attempting a read, essentially waiting for that read to error out when the tdeinit socket was closed. However, this introduced a significant delay, as my tests indicated the read *never* returned, meaning the code did not do what it was supposed to do and only *seemed* to work due to some interaction of shutdown processes that I did not bother fully tracing. Furthermore, there is no reason to wait for tdeinit shutdown before executing the last steps in the shutdown process; tdeinit will close down very rapidly after receiving the shutdown request, even on slow system, as its shutdown path is only a few lines of C++ code. If TDE's logout time is now in line with other environments, can we mark this bug resolved? :-) Tagging as resolved. :-) Hopefully bug report 1477 is something altogether different --- although I did not add 1477 to the R14 etherpad. I will run a quick check now that you've eliminated about 2.5 seconds. Due to commit 8f8dce36 ([tdebase] Fix premature termination of applications during shutdown on slow machines) logout is incredibly long. After selecting logout from T-Menu opens the dialog for selecting shutdown mode. I will select Close the current session and confirm. Dialog for selecting shutdown mode closes but then for a minute nothing happens. After one minute waiting opens logout status dialog and begins closing programs. During closing programs, then I watch another minute waiting for completion of closing programs. During testing, I found that for mad more than 120s session termination is sufficient to have an open Konqueror with uri "locate:*.ogg". Another Konqueror windows that I open in the time between logout confirmation and real start to logout, are not saved in the session. Just sharing out loud, any chance bug report 1477 is related? (Odpověď na komentář #74)
> Just sharing out loud, any chance bug report 1477 is related?
I think not - on the test machine I have no remote mounts.
Other than the NFS report, I'm not seeing any delays with exiting. Then again, I don't use confirmation, the logout status dialog, or sessions. I will enable those for a day or two to see what happens. This should partially be resolved in GIT hash 08390bd (tdebase). The logout stall is fundamentally caused by an application (or applications) not responding to the X11 SM SaveYourself signal; the prior behavior timed out in 10 seconds which somewhat masked applications not responding, but also made it so that a slow machine might prematurely terminate an application that was saving its settings. Therefore, the aforementioned GIT hash adds the ability for the user to forcibly skip the SaveYourself step if needed. Changing status to NEEDINFO as I consider the above GIT commit a valid solution to this report, but I would like other TDE developers (and users!) to weigh in on this concept and its interface/implementation. How to test? Is there a new dialog config option? (In reply to comment #79) > How to test? Is there a new dialog config option? There is no configuration option. If the SaveYourself step stalls for more than 1 second, two buttons will appear in the logout status dialog. Ways to test include what Slavek mentioned earlier (Konqueror with URI "locate:*.ogg"), or leaving Kate open with an unsaved text document when you log out. The patch in the attachment 525 [details] is malformed!
You forgot the brackets around the "if", so the function unconditionally returns now, I haven't tested what it breaks but I'm sure there is something.
if( mInited ) {
// FIXME bla-bla-bla
printf("[systemdirnotify] SystemDirNotify::init(mInited)");
return;
}
(In reply to comment #81) > The patch in the attachment 525 [details] is malformed! > > You forgot the brackets around the "if", so the function unconditionally > returns now, I haven't tested what it breaks but I'm sure there is something. > > if( mInited ) { > // FIXME bla-bla-bla > printf("[systemdirnotify] SystemDirNotify::init(mInited)"); > return; > } Good catch! Darrell, can you fix this and see if the printf() still has the same effect as before? Thanks! I will push the missing brackets to GIT. I don't know about the testing other than verifying nothing new happens. I browsed the original comments and my eyes got blurry trying to remember exactly what I observed before and after the patch. (Odpověď na komentář #83) > I will push the missing brackets to GIT. I don't know about the testing other > than verifying nothing new happens. I browsed the original comments and my eyes > got blurry trying to remember exactly what I observed before and after the > patch. Last patches respond to the problem described in the comment 73 I'm running a full build set now (I started before the hard freeze announcment). When the build run completes I'll try to test the latest patch against Comment 73. I don't know whether I can test properly because I haven't been able to replicate the latest stall report. If nothing happens then I'll test a patch with the missing brackets. Regarding commit 8f8dce36, the shutdown dialog now appears despite having kcontrol TDE Components->Session Manager->Show logout status dialog disabled/unchecked. The patch should be tweaked to honor that setting. I understand that the dialog should appear when something is delaying the logout process, but only then. right now the dialog appears all the time. Side note: I can't get locate:*.ogg to work. I have the tdeio-locate package installed. Typing locate:*.ogg: An error occurred while loading locate:*.ogg: Could not start process Unable to create io-slave: tdelauncher said: Error loading 'tdeio_locate'. . When I open kate with an unsaved document and attempt to log out of Trinity, the shutdown dialog appears (whether configured or not, as noted previously). After a couple of seconds the two new button options appear inside the dialog. The first time I tested the Skip Notification button the entire desktop froze. I had to use Ctrl-Alt-Backspace to exit. I haven't been able to replicate again. I hate when events like that happen only once. :-( Regarding Comment 87, refer to bug report 1285. The problem is incomplete support for hidden visibility. Any chance we can get the shutdown dialog appearance fixed to abide by the user's configuration in kcontrol TDE Components->Session Manager->Show logout status dialog? That is, when the option is disabled, the dialog does not appear unless the two second timeout occurs? I have fixed the regression of the dialog not respecting the users' settings in GIT hash c060a81. However, if the logout dialog is disabled, it will not appear at all, ever. My best advice is just to Ctrl+Alt+Backspace the X server if you have disabled the logout dialog and the logout hangs; this is the way it had to be done in KDE 3.x. :-) Until/unless someone notes a hang that cannot be aborted while the logout dialog is enabled, I am going to provisionally mark this RESOLVED FIXED. The situation has not improved, perhaps contrary deteriorated. The initial condition is still the same: running Konqueror with URI "locate *. ogg". With enabled logout status dialog: Too fast to see buttons Skip Notification + Abort Logout => 2 seconds is too little for a slower machine. After pressing Skip Notification => long wait => in ssh console I see the Konqueror crash => waiting drkonqi hidden under logout status dialog. Requires manually kill drkonqi process, or not completed logout at all! With disabled logout status dialog: After a minute waiting occurs a crash konqueror and ksmserver - opens with 2 × drkonqi. After processing drkonqi, a few seconds waiting for the completion of logout. I note that before commit 8f8dce36 I did not notice such problems. I will send a both backtraces asap. Created attachment 1416 [details]
Konqueror backtrace
Created attachment 1417 [details]
Ksmserver backtrace
Note: Tested with kdebase GIT c060a817. OK, thanks for the info. This bug report just refuses to die!! Is this a regression from KDE 3.5.10? I seem to remember KDE 3.x hanging on logout occasionally, but it has been so long that I can't be sure. If hanging on logout is not a regression then I may need to demote this from Blocker status. If it is a regression, then a bisect may be needed. Tim I also see a ksmserver crash when trying to logout of the session. Has to be related with something very recent. I built a package set Monday and did not see this crash. Created attachment 1418 [details]
ksmserver backtrace when logging out
I don't have the shutdown dialog or shutdown confirmations enabled. Just a straight logout. I updated tdelibs and tdebase as of this afternoon and rebuilt those two packages after running a full package set rebuild this morning.
The ksmserver crash occurs on three different machines. Okay, I see now a patch has been pushed to tdebase. Rebuilding... I don't know whether this is related to comment 82, but I did test applying the patch with the braces. As I wait for tdebase to rebuild with the ksmserver crash, I notice the following in my xsession-error log: SystemDirNotify::init(mInited)[systemdirnotify] SystemDirNotify::init(mInited)[systemdirnotify] TQClipboard: cannot transfer data, no data available (Odpověď na komentář #95) > OK, thanks for the info. This bug report just refuses to die!! > > Is this a regression from KDE 3.5.10? I seem to remember KDE 3.x hanging on > logout occasionally, but it has been so long that I can't be sure. If hanging > on logout is not a regression then I may need to demote this from Blocker > status. If it is a regression, then a bisect may be needed. > > Tim I think that bisect is not needed. As I mentioned in a comment 73 and reminded in comment 91, I began to noticed slow logout after commit: 8f8dce36 - [tdebase] Fix premature termination of applications during shutdown on slow machines. http://git.trinitydesktop.org/cgit/tdebase/commit/?id=8f8dce36 Incidentally, this is why I do not backport this commit into 3.5.13-sru branch. When I tried backport this commit, I noticed a significant slowdown on logout. That's why I know exactly when the problem started. So yes, it's definitely a regression. (In reply to comment #101) > (Odpověď na komentář #95) > > OK, thanks for the info. This bug report just refuses to die!! > > > > Is this a regression from KDE 3.5.10? I seem to remember KDE 3.x hanging on > > logout occasionally, but it has been so long that I can't be sure. If hanging > > on logout is not a regression then I may need to demote this from Blocker > > status. If it is a regression, then a bisect may be needed. > > > > Tim > > I think that bisect is not needed. As I mentioned in a > comment 73 and reminded in comment 91, > I began to noticed slow logout after commit: > > 8f8dce36 - [tdebase] Fix premature termination of applications during shutdown > on slow machines. > http://git.trinitydesktop.org/cgit/tdebase/commit/?id=8f8dce36 > > Incidentally, this is why I do not backport this commit into 3.5.13-sru branch. > When I tried backport this commit, I noticed a significant slowdown on logout. > That's why I know exactly when the problem started. > > So yes, it's definitely a regression. OK. The problem is that the code that was altered has a nasty habit of force-closing applications on slow machines, leading to data loss. Is that something we want to ship in R14? I understand that we do not wish to revert this commit. But simple reproducibility with Konqueror and uri "locate:*.ogg" would be helpful to detect bug or symptoms that may be common to other cases which, during our testing so far remained undetected. Rebuilt tdebase: no ksmserver crash anymore. (In reply to comment #103) > I understand that we do not wish to revert this commit. > > But simple reproducibility with Konqueror and uri "locate:*.ogg" would be > helpful to detect bug or symptoms that may be common to other cases which, > during our testing so far remained undetected. Sure. I don't mean to come across irritated at anything other than this particular bug that I can't seem to fix. :-) Can you enable the shutdown dialog on your system and see what X11 phase the shutdown is stalling on? Thanks! Tim I confirm when konqueror is open using the locate protocol that the logout process freezes hard. Sounds like a problem with tdeio-locate. dcop? Idea: I don't know what is up with tdeio-locate hanging the logout (commit 5c26e7a in bug report 1285 made no difference here), but when I enable the shutdown dialog and the two buttons appear, I had the idea that perhaps the dialog could list which process is acting persnickety that invokes the two buttons to appear? Another cosmetic observation with the shutdown dialog's behavior, seems on my system the two buttons always appear somewhere in the logout process, even if the dialog does not actually halt the logout to prompt user intervention. Seems to me the two buttons should never appear unless user intervention is required. When I use the locate: protocol in konqueror I notice two tdeio-locate processes appear rather than one. I see the following in my xsession-error log: tdelauncher: Fatal IO error: client killed kded: Fatal IO error: client killed [tdeinit] Fatal IO error: client killed [tdeinit] sending SIGHUP to children. [tdeinit] sending SIGTERM to children. [tdeinit] Exit. [tdeinit] Fatal IO error: client killed [tdeinit] sending SIGHUP to children. Hangup /opt/trinity/bin/starttde: line 1: [starttde]: command not found [starttde] Shutting down Trinity... [tdeinit wrapper] Warning: socket connection failed: : No such file or directory [tdeinit wrapper] Error: Can't contact tdeinit! [starttde] Running Trinity shutdown scripts... The logout process never gets to actually run the shutdown scripts. When I manually kill the two tdeio-locate processes then the logout process exits with no hangups. Missing braces noticed in comment 81 pushed to git in commit bac7ea9a. I opened Konqueror with locate://*.ogg and logged out. When the action buttons appeared in the logout status dialog, I clicked "Skip Notification" and the logout process completed without error. I don't see a problem here (except possibly in tdeio-locate not responding to the SaveYourself request?) Interesting you don't see the hang and we do. :-) I don't understand the whole SaveYourself process but your description sounds appropriate. I select the Skip Notification button and then the status dialog proceeds but hangs and never reaches the end the "X/X" status line. On my primary system the hang usually occurs at about 14/17 (processes). Let me know what information to grab to help. I posted some notes in comment 109. (In reply to comment #112) > Interesting you don't see the hang and we do. :-) > > I don't understand the whole SaveYourself process but your description sounds > appropriate. I select the Skip Notification button and then the status dialog > proceeds but hangs and never reaches the end the "X/X" status line. On my > primary system the hang usually occurs at about 14/17 (processes). > > Let me know what information to grab to help. I posted some notes in comment > 109. The information you just posted is quite helpful. My logout stalled at a similar stage for a while, but eventually (a couple of minutes) completed without error. I suspect that tdeio-locate is not only failing to respond to the SaveYourself request, but is probably also failing to respond to the SmsDie request. I just tested on my laptop and HTPC. With both the logout does finally terminate but there is a stall of about one minute. I can confirm small change: While earlier I watched Konqueror crash in 100% of cases, now I can rarely watch successful logout. Note: Actually, I have in Konqueror opened two tabs - the first with "locate:*.ogg", and second with "locate:zip". Logout takes about three minutes. In most cases with Konqueror crash. Note 1: With enabled logout status dialog can be waiting indefinitely because Konqueror crash remains hidden under the empty screen from logout status dialog. Note 2: Since Skip Notification is designed especially for a slow computers, 2 seconds is impossibly short period => on slow computers buttons will appear almost immediately, which is nonsense. I propose to extend the period to 10 seconds I have some true slow machines here: PI and PII class. Unlike the 1990s, neither runs any modern software snappily anymore, including Trinity. I'm in the middle of a package build run, which should finish this evening. I'll update the old machines with the latest packages and run some tests. Which cpp and line number contains the two second delay? Should we make the delay user-definable in kcontrol, possibly a spinner widget in increments of whole seconds, under the 'Show logout status dialog' check box? I found them in ksmserver/shutdown.cpp: #define KSMSERVER_SHUTDOWN_CLIENT_UNRESPONSIVE_TIMEOUT 60000 #define KSMSERVER_NOTIFICATION_MANUAL_OPTIONS_TIMEOUT 1000 The 60000 (60 seconds) roughly matches what I saw on the laptop and HTPC in comment 114. The 1000 (1 second) roughly matches what I described in comment 108. The 1 second period seems too fast for most modern hardware. That period might work fine on really old hardware. We'll find out when I later test on my PI and PII. The 60 second timeout seems to test any user's patience. Granted the real problem is the specific process not responding, but nonetheless causes a genuine WTF moment. :-) I like the idea of both timeouts being user-definable. I don't know what might be adequate defaults. :-) What if a Force Termination button were made available once the logout has stalled for a few seconds in Phase 2 (i.e. no activity from terminating applications)? This might produce a bit of a "hall of shame" effect that could motivate authors of stall-inducing applications to actually fix those applications... Interesting idea, but the "hall of shame" most often will be Trinity apps. :-) If a non Trinity app causes the hang, most users likely just want to exit. Many systems come with Ctrl-Alt-Backspace disabled and they need a way to force the logout to continue. Possibly the force button is nothing more than Ctrl-Alt-Backspace in disguise. Because I am helping with testing, I actually waited to observe the eventual logout after about 60 seconds. Most users are not obligated and will get frustrated. If I wasn't part of the testing group I'd revert to Ctrl-Alt-Backspace and say nasty things about Trinity. :-) Possibly a separate enhancement request? For now I think we resolve why tdeio-locate is acting persnickety? Whether or not a Force Termination button is added, I like the idea of adding more verbosity in the status dialog. When the logout hangs the users sees exactly which process is not responding. That would help with bug reports. The dialog seems to support that to some degree because I have seen actual app/process names scream by, but every time I see the hang I only see "X/X" and not the app/process name. (Odpověď na komentář #118)
> I found them in ksmserver/shutdown.cpp:
>
> #define KSMSERVER_SHUTDOWN_CLIENT_UNRESPONSIVE_TIMEOUT 60000
> #define KSMSERVER_NOTIFICATION_MANUAL_OPTIONS_TIMEOUT 1000
>
> The 60000 (60 seconds) roughly matches what I saw on the laptop and HTPC in
> comment 114. The 1000 (1 second) roughly matches what I described in comment
> 108.
>
> The 1 second period seems too fast for most modern hardware. That period might
> work fine on really old hardware. We'll find out when I later test on my PI and
> PII.
>
> The 60 second timeout seems to test any user's patience. Granted the real
> problem is the specific process not responding, but nonetheless causes a
> genuine WTF moment. :-)
>
> I like the idea of both timeouts being user-definable. I don't know what might
> be adequate defaults. :-)
On a slow machine is 1 second very little period - unfade effect of logout screen may take longer time :) Therefore, I believe that the interval for displaying buttons should be longer.
I'll post some data when I get the PI and PII updated. I created a patch for testing that changed both numbers: 60000->30000 and 1000->3000. I'll post data from that testing too. Right now I hope we can resolve why tdeio-locate is being a bad critter. :-) > What if a Force Termination button were made available
I don't use Windows much but I recall something similar to that idea being available on Windows NT4. The default timeout was 20 seconds but the dialog basically allowed terminating the waiting period and force killed the hung process.
Regarding more verbosity when the shutdown dialog hangs, I notice in shutdown.cpp, lines 697 - 761, there are if/else tests whether the dialog notification shows "%1/%2" or "%1/%2, %3." Why don't we just make "%1/%2, %3" the default all the time?
Some testing observations: Changing KSMSERVER_NOTIFICATION_MANUAL_OPTIONS_TIMEOUT 1000->3000 ================================================================= On my dual core systems the two buttons never appeared in the status dialog during normal no-hang logouts. That resolved the cosmetic issue mentioned in comment 108. The PI and PII liked the longer time too, although once on the PI the buttons still appeared during a normal no-hang logout. I did not repeat the appearance of the buttons on the PI but I restarted Trinity only a half dozen times. Changing KSMSERVER_NOTIFICATION_MANUAL_OPTIONS_TIMEOUT 1000->2000 ================================================================= No difference on the dual core systems. The two buttons never appeared during a normal no-hang logout. The PI and PII sometimes showed the two buttons during a normal no-hang logout and sometimes didn't. Conclusion, the default time should be no less than 2000 milliseconds, although 3000 probably is better for cosmetic reasons. KSMSERVER_SHUTDOWN_CLIENT_UNRESPONSIVE_TIMEOUT 60000->30000 =========================================================== On all of my systems this still seemed an awful long time. KSMSERVER_SHUTDOWN_CLIENT_UNRESPONSIVE_TIMEOUT 60000->20000 =========================================================== Still seemed like a long time but seems like a reasonable compromise too, even on the slower PI/PII systems. Comments: ========= The user does not know why the two buttons suddenly appear. When the reason is kate/kwrite with an unsaved document, the user might deduce what is happening because an additional dialog appears. With a hung app such as the currently broken tdeio-locate, there are no clues at all. When the two buttons appear the dialog status message should change to inform the user what is happening and the status message should list the unresponsive process that invoked the two buttons. Listing both the process ID and name would be a helpful debugging tool. When the status message displays the process name, such as kmix, kicker, etc. (parameter %3 in the sources), the status message does not refresh the process name when the number sequence changes, such as 8/12, kmix -> 9/12, kmix -> 10/12, kmix, etc. The process name remains static despite the number sequence changing. When the process does not return a name to the status message, then the %3 parameter should be blank. When the status dialog encounters an unresponsive process, the status message should provide the user notice why the logout stalled. Even if a third "Kill Now" button suddenly appears in the dialog, the status message should inform the user. When the "Kill Now" button appears, the status message changes to something like "Process XYZ is not responding." Running Trinity on the PI and PII is a severe test of patience. Not recommended! :-) Anybody using a PI or PII with a modern Linux based graphical environment probably needs a reality check. I would not treat such Trinity related issues as corner cases but nut cases. That said, I think for now we can live with hard-coded timeouts of 20000 and 2000, but eventually a slider or spinner widget to let the user define the timeouts would be a nice enhancement. I hope this helps! I have changed the button delay time to 3000, thanks for the work on that! While adding the code to show which application is stalling the logout process, I found a small glitch in the service handling code that *may* have been prolonging the stall to absurd levels. Can you re-test tdebase with GIT hash 7c366d7 and let me know if the logout stall remains? Thanks! By testing do you mean to leave KSMSERVER_SHUTDOWN_CLIENT_UNRESPONSIVE_TIMEOUT=60000? (In reply to comment #127) > By testing do you mean to leave > KSMSERVER_SHUTDOWN_CLIENT_UNRESPONSIVE_TIMEOUT=60000? For this test, yes. I still need to decide what to do with the value of KSMSERVER_SHUTDOWN_CLIENT_UNRESPONSIVE_TIMEOUT. Tim Ok. I rebuilt without patching KSMSERVER_SHUTDOWN_CLIENT_UNRESPONSIVE_TIMEOUT and left the default at 60000. This is a preliminary report. I ran only one one dual core system. I have not tested on the older PI/PII systems. When I ran locate:*.ogg, the two buttons eventually appeared at about 3 seconds. When selecting Skip Notification the logout continued MUCH faster than previously. When Trinity exits and the screen turns black, the black period is longer than during a non-stall logout. There are no clues in the shutdown dialog or the xsession-error log as to what invoked the two buttons to appear. I think we weill needs something with that to help with debugging. I will run further tests on the PI/PII later this afternoon. Follow up from my previous post: Notes: I login from the command line using startx. I do not have Trinity logout confirmations enabled. I have the shutdown dialog enabled for these tests. On my dual core systems, a non-stall logout averages about 3.7 seconds to return fully to the command line. When I use locate:*.ogg to produce a stall, from the moment I select the Skip Notification button the logout time averages about 8.5 seconds. Better than the test results in comment 125, which was about 50 seconds, but a longer than a non-stall logout. On the slow PI/PII systems, a non-stall logout averages about 12.5 seconds to return fully to the command line. When I use locate:*.ogg to produce a stall, from the moment I select the Skip Notification button the logout time averages about 16.5 seconds. I realize a process has stalled the logout. Perhaps I misunderstand but seems to me when the user selects the Skip Notification button the logout process should continue about as fast as a normal non-stall logout. There is no information in the status dialog or the xsession-error log as to the cause of the stall that invokes the two buttons. The status dialog status message offers no help as to why the buttons appear. Comment on attachment 1416 [details] Konqueror backtrace I found that the Konquror crash not related to logging out. That's why I brought it to a separate bug report - see bug 1605. Latest tests based upon patches from 7/31. The latest set of patches look good. When I create a stall condition with tdeio-locate, the status dialog shows the two buttons. The status message: Notifying applications of logout request (11/13).... When I create an unsaved document in Kate I see the following: Notifying applications of logout request (11/12, kate).... Seems tdeio-locate is broken in two ways: causing a stall and not providing the status dialog process information. When I don't select either button, the status dialog times out at approximately the hard-coded 60000 milliseconds. I still think this time is too long. Twenty seconds is a reasonable compromise but a user-defined timeout is better. Regardless of the actual timeout value, one idea is when the two buttons appear, to add a digital countdown timer. That would provide the user some idea what will happen and when. Possibly something like: Forced termination: 60 seconds With the number changing each second. When I tested with an unsaved document in kate, the 60 second timeout never activated. Apparently the kate dialog overrides the status dialog timeout. When I responded to the kate dialog, the status dialog proceeds with the logout in normal fashion because nothing is truly "stalling" the process. After responding to the kate dialog, the logout period is about 3.7 seconds, the same as a normal logout. Summary: ======== * tdeio-locate is broken in at least two ways. Do we want to resolve those bugs in this report or in a new report? * A countdown timer would be helpful. * 60 seconds still seems to long and 20 seconds seems a reasonable compromise. * During a true stall as with tdeio-locate, the logout period after selecting the Skip Notification button remains about 8.5 seconds, which is about 5 seconds longer than a normal non-stall logout. * I don't see any related messages in xsession-errors but I don't know whether the recent patches added such support. (I do have -DBUILD_PROFILE_SHUTDOWN=ON.) Many of the mentioned improvements have been included in GIT. I have also reduced the protection timeout from 60 seconds to 20 seconds as recommended by Darrell above, as I see no reason for it to be longer given the new graphical feedback. At this point I believe the only outstanding issue is kio_locate not responding to session management requests. Can you please test and confirm? Thanks! Tim, thanks for the work you've put into this bug report. I updated the tdebase sources and rebuilt. Here is what I see: * The countdown timer works nicely. :-) * As previously mentioned, when I create a stall with tdeio-locate, there is no process name to help with debugging. This is not the fault of the status dialog, only that tdeio-locate is broken in such a manner that information is unavailable. Ignoring that tdeio-locate needs repair, I don't know how to remedy this lack of information. As all other processes provide a process app name to the status dialog, how should the dialog handle stubborn apps? A process ID? Replace the empty app name with "Unknown process" in the status dialog message string? I think something would be helpful, not just for debugging for developers, but for users too. * When I create a stall with kate, the status dialog begins the 20 second countdown. At the end of the 20 seconds nothing happens and both the kate and status dialog remain on screen. In the case of kate I don't think users will get confused. Most users should figure out what is happening. Yet that the status dialog does not disappear after the 20 second timeout does seem funky. I wonder whether the status dialog should appear at all because kate already has its own dialog. Should the status dialog perhaps remain idle/off screen until the kate dialog is acknowledged? That would seem to make more sense. * As previously mentioned, during a stall, the time from when the status dialog disappears and the logout is fully completed is about 8.5 seconds on my dual core system. A normal non-stall logout is about 3.7 seconds. I recognize that stalled logouts will be the exception corner case rather than the norm, but seems odd that the process notification and killing does not proceed in about the same 3.7 second period after the status dialog disappears. This extra delay could be solely tdeio-locate all on its own because I see a 3.7 second time when I acknowledge the kate dialog. For testing all of this, tdeio-locate seems to be a classic Frankenstein corner case. I'm curious to see the results after resolving the tdeio-locate problems. I hope this helps! A trivial observation: the status dialog is not wide enough to always display the full text of the status message, which means the process names get clipped. Could we make the status dialog wider? Is the width defined in tdebase/kcontrol/smserver/smserverconfigdlg.ui at 325 pixels? If yes then I could tinker a bit with that. (In reply to comment #134) > Tim, thanks for the work you've put into this bug report. No problem. :-) I want this fixed for R14.0.0, as this is one area where TDE can easily display an interface consistent with the project's stated goals regarding keeping the user informed and in control at all times. > I updated the tdebase sources and rebuilt. Here is what I see: > > * The countdown timer works nicely. :-) Good! > * As previously mentioned, when I create a stall with tdeio-locate, there is no > process name to help with debugging. This is not the fault of the status > dialog, only that tdeio-locate is broken in such a manner that information is > unavailable. Ignoring that tdeio-locate needs repair, I don't know how to > remedy this lack of information. As all other processes provide a process app > name to the status dialog, how should the dialog handle stubborn apps? A > process ID? Replace the empty app name with "Unknown process" in the status > dialog message string? I think something would be helpful, not just for > debugging for developers, but for users too. I tried and tried to extract something useful from the session manager, but to no avail. All I can reliably give is something like "Unknown client (ID %d)" where client ID is not even a PID, it is an X11 session management ID. Would that be an improvement or simply add confusion? The client IDs are not helpful from a system admin point of view as they cannot (easily) be mapped to a process ID. > * When I create a stall with kate, the status dialog begins the 20 second > countdown. At the end of the 20 seconds nothing happens and both the kate and > status dialog remain on screen. In the case of kate I don't think users will > get confused. Most users should figure out what is happening. Yet that the > status dialog does not disappear after the 20 second timeout does seem funky. I > wonder whether the status dialog should appear at all because kate already has > its own dialog. Should the status dialog perhaps remain idle/off screen until > the kate dialog is acknowledged? That would seem to make more sense. I have added code to hide the dialog during client interaction, and it does make the logout seem cleaner than before. Good idea! However, I have only been able to replicate the timeout showing up during interaction once. There is code in place that should be not only hiding the dialog but also stopping the timer and displaying "logout paused" during client interaction. Can you see if there is a reliable method to trigger this bug in the latest tdebase code from GIT? > * As previously mentioned, during a stall, the time from when the status dialog > disappears and the logout is fully completed is about 8.5 seconds on my dual > core system. A normal non-stall logout is about 3.7 seconds. I recognize that > stalled logouts will be the exception corner case rather than the norm, but > seems odd that the process notification and killing does not proceed in about > the same 3.7 second period after the status dialog disappears. This extra delay > could be solely tdeio-locate all on its own because I see a 3.7 second time > when I acknowledge the kate dialog. > > For testing all of this, tdeio-locate seems to be a classic Frankenstein corner > case. I'm curious to see the results after resolving the tdeio-locate problems. tdeio-locate should now be fixed in GIT, but it will require a recompile of tqt3, tdelibs, and tdeio-locate to test. Basically, there is no reason for tdeioslaves to use X11 session management; I have therefore disabled session management in tdeio-locate which appears to fix the logout stall. If no ill effects are observed from this change, I would like to disconnect all the other tdeioslaves from the X11 session manager as well with similar patches. > I hope this helps! Yes it does! Thanks! Where is the status dialog width defined? I want to test that. I believe "unknown client" or "unknown process" is the best option. Adding other information that is not useful works against our project goals as well as confuse everybody. Let's accept that some apps/processes are not going to play nice with the new features. We live with that. :-) I will update my sources and rebuild. Initially I will keep the older tdeio-locate package as that provides a good mechanism for testing the features. I'll minimally rebuild tdebase to test the latest interaction with kate. Then later I'll build a complete new package set since several modules are affected to get tdeio-locate to function correctly. This methodical approach will take a day or two to complete. :-) That said, I don't understand your description of the timeout problem, but hopefully I'll understand better when I rebuild and test. I only rebuilt tdebase and tested kate. Upon logging out, the kate dialog appears immediately. The shutdown dialog appears about 3 seconds later. I was expecting the shutdown dialog not to appear at all when kate's dialog appeared. Perhaps I'm confused about that. From a user's perspective, I think most users will figure out what is happening when both dialogs appear. Yet I think when kate's dialog appears the shutdown dialog should stay out of the way for a cleaner interface. A cleaner interface will be less confusing to most users. The countdown timer starts at 18 seconds rather than 20. This seems repeatable although I did not test extensively. Ignoring both dialogs results in the shutdown dialog countdown timer resetting to 21 seconds. The shutdown dialog does not disappear and the countdown timer remains fixed at 21 seconds. The kate dialog remains on screen as well. I expected 20 seconds. Possibly an +1 increment glitch in the code. After ignoring both dialogs, selecting the Do Not Save button in the kate dialog results in the logout proceeding as normal. When testing the current buggy tdeio-locate, again the shutdown timer started at 18 seconds. As there is no additional dialog to acknowledge, the logout proceeds after the timer expires. As I already know the skip notification timer is 20 seconds, I'm expecting to see the countdown timer always start at 20. I believe the shutdown dialog is including the delay timer of 3 seconds in the math. With some imagination, the 21 seconds could be a fixable incorrect +1 increment glitch. As the delay timer is defined as 3 seconds, the 18 seconds likely is a result of 21 - 3. If the +1 increment glitch is remedied, then I'd expect the shutdown timer to start at 17 seconds (20 - 3). If the skip notification section is patched to ignore the 3 second delay timer, then I'd expect the shutdown dialog to always begin at 20 seconds. As non developer users will not be familiar with the code timer definitions, I think the countdown timer should start at 20 and not 20 - 3 (21 - 3). From an interface perspective, 20 seems like a "whole" number that is "intentional" whereas 18 seconds seems off the wall and strange. I understand the math but non developers users won't. :-) My thinking: the shutdown dialog does not appear for 3 seconds and only appears when a process delays the logout. Therefore the 20 second skip notification timer should not include the 3 second delay timer. Both should be separate timers. This seems to be partially the case already because when the shutdown countdown timer terminated and reset, the timer reset to 21 (20) seconds rather than 18 seconds. So seems only the intialization of the countdown timer needs attention. I'm going to update my entire source tree and rebuild a new package set. That will mean not testing tdeio-locate as we have been the past many days, but will reveal whether the latest X session changes work as intended. (In reply to comment #138) > I only rebuilt tdebase and tested kate. > > Upon logging out, the kate dialog appears immediately. The shutdown dialog > appears about 3 seconds later. This is useful information. This does not happen on my test machine, and it should not be happening at all. I will need to look into this problem further. For comparison sake, what happens at your end? I built a new package set with the latest changes as of this afternoon. tdeio-locate no longer stalls the logout! Short of temporarily reverting the patches to tdeio-locate, I don't know of any way to replicate the previous behavior. As the logout now proceeds normally, I was unable to see whether the status dialog message actually notified tdeio-locate or subsequently killed the process. The status messages scroll too fast. I noted the process IDs of tdeio-locate but did not find those IDs listed anywhere in the xsession-error log with respect to twin or ksmserver. The status dialog still appears when the kate dialog appears. The timeout numbers (math) need attention (comment 138). The status dialog width needs widening. A short list, really, considering everything else. :-) Is there some way I can test further with respect to tdeio slaves and X11 session fixes? I don't pretend to understand all that you fixed, but I can still test. (In reply to comment #141) > The status dialog still appears when the kate dialog appears. Should be fixed. > The timeout numbers (math) need attention (comment 138). Should be fixed via the dialog no longer appearing when a client is interacting. > The status dialog width needs widening. My only concern with this is 640x480 screens (yes, they still exist on servers and such!). How much wider does it need to be? Could I shorten the status dialog text instead? > A short list, really, considering everything else. :-) Yep! I hope we can close this bug for good very soon! So that this doesn't get overlooked, the other tdeioslaves should be patched the same way tdeio-locate was. This would a one-line patch for all tdeioslaves and *shouldn't* cause any problems unless the slave saves some kind of data on session close (highly unlikely, but one never knows!). I rebuilt tdebase. Attempting to logout results in the normal status dialog. Attempting to logout with an unsaved kate document results in the kate dialog appearing. The status dialog does not appear. When I acknowledge the kate dialog the logout proceeds without any status dialog ever appearing. Am I supposed to see the status dialog after acknowledging the kate dialog? Regarding the status dialog width, I'm dumbfounded that 640x480 still dictates design concerns. :-) That said, I don't think shortening the status dialog message is a good idea. If I knew where the width was defined I could tinker at my end (within the 640x480 limitation). Along that thought, would be nice if I could temporarily patch a 1/4 or 1/2 second delay in the status dialog messages so I can read them one-by-one. How would I patch that? This would be a test patch only. I try to read the status messages but they fly by too fast. A temporary 1/4 or 1/2 second delay would allow me to read them. Once in a while, not always, I can see "notificati" or something like that and I believe that process is kdbusnotification. Yet most of the time I can't read anything at all. Yes, a testament to the snappy exit speed :-), but for "quality assurance purposes" I'd like to actually validate/read the status text messages and see the closing parentheses. Regarding the remaining tdeio slaves you want to patch, I saw the tdeio-locate commit and the patch was a one-liner. I say go ahead and patch them all now. If something breaks, which I doubt, somebody will let us know soon enough. :-) Note: one element we did not test is logging out when the user does not have the status dialog enabled. The previously broken tdeio-locate would have been good for that test. Without the status dialog I suspect the desktop just hangs with no notice or clue provided to the user. In response to your queries regarding the location of the code setting the dialog width, I am pretty sure that the dialog width is dynamically set based on the TDE banner/header. It's been a while since I worked on that code. :-) IIRC, Windows used to insert ellipses if the text ran more than the width of some status dialogs. Would that work here? I don't want to tinker too much with the dialog as I like the look of it (artistically) the way it is now. I don't know how the status dialog is sized. All I see in the dialog title bar is "Please wait..." and the dialog is much wider than that text snippet. The truncated message text is cosmetic. I suppose reading all the text is meaningful only when a stall occurs. When a stall occurs the user might not be able to read the full string but should be able to read most of the string. Most of the strings are readable, but some longer process names are truncated at the right edge of the dialog. I'm rebuilding a full package set. I'll report later on the recent patches with interaction of the dialog and kate. Is there anything we can do when users do not enable the status dialog and a logout stall occurs? Does the 20 second timeout still occur? Will temporarily reversing commit 704ef698 2013-08-06 cause tdeio-locate to stall as previously experienced? We need a simple mechanism to test all of the recent patches as well as testing when the status dialog is not enabled. * I tested the latest patches against kate with an unsaved document. For a brief moment I see the status dialog, the status dialog then disappears, then the kate dialog appears. I checked that the status dialog was not cleverly hidden behind the kate dialog, etc. :-) When I acknowledge the kate dialog the status dialog reappears and the logout proceeds. The status dialog is on screen less time that when there is no stall. I'm guessing the status dialog appears, kate stalls the process, the status dialog disappears but retains memory of previous notifications, and when reappearing resumes the process, which would explain why the dialog is on screen less time than normal. Just guessing. :-) * I reversed the patch in tdeio-locate but that did not create a stall. Thus I no longer can test the two buttons. * I don't know of any way to test a hidden stall as done with the previously broken tdeio-locate. * A testing patch that adds a 1/4 to 1/2 second pause after each status dialog message would be a nice way to slow the logout process to be readable. Everything flies by so fast I don't know how we can really verify anything. (In reply to comment #148) > * I reversed the patch in tdeio-locate but that did not create a stall. Thus I > no longer can test the two buttons. That should have worked. Did you log out and log in after building and installing the modified tdeio-locate? > * I don't know of any way to test a hidden stall as done with the previously > broken tdeio-locate. > > * A testing patch that adds a 1/4 to 1/2 second pause after each status dialog > message would be a nice way to slow the logout process to be readable. > Everything flies by so fast I don't know how we can really verify anything. You would need to add a "usleep(250000);" after each call to "static_cast<KSMShutdownIPDlg*>(shutdownNotifierIPDlg)->setStatusMessage(...);" in shutdown.cpp. I would like to close this report if possible, and have you file a new report with a lower priority for the truncated text problem. This report is an R14 blocker, whereas I don't think the truncated text should block R14. :-) Durnit. My bad. I downloaded the tdeio-locate patch but forgot to actually apply. Reversing the patch does indeed restore the stall behavior. With that specially compiled package I now can test/observe the buttons again. I'll tinker with a pause patch on my own. I filed bug report 1618 for the remaining minor issues. |