By default, Bugzilla does not search the list of RESOLVED bugs.
You can force it to do so by putting the upper-case word ALL in front of your search query, e.g.: ALL tdelibs
We recommend searching for bugs this way, as you may discover that your bug has already been resolved and fixed in a later release.

Bug 760

Summary: TDE exits too slow
Product: TDE Reporter: Darrell <darrella>
Component: tdebaseAssignee: Timothy Pearson <kb9vqf>
Status: RESOLVED FIXED    
Severity: blocker CC: bugwatch, darrella, fatzer2, kb9vqf, slavek.banko
Priority: P1    
Version: R14.0.0 [Trinity]   
Hardware: Other   
OS: All   
See Also: http://bugs.pearsoncomputing.net/show_bug.cgi?id=922
Compiler Version: TDE Version String:
Application Version: Application Name:
Bug Depends on: 258, 922    
Bug Blocks:    
Attachments: A patch to work around a possible race condition.
Patch to provide cmake configure option to build ksmserver timing marks
Konqueror backtrace
Ksmserver backtrace
ksmserver backtrace when logging out

Description Darrell 2011-12-22 16:22:30 CST
Exit times from TDE vary but I have seen delays up to 15 to 17 seconds before I am returned to the command line (I use startx). This is on a dual core machine. Typical exit times are about 7 to 8 seconds, which is still too long.

Per request of project leader Tim, I am filing this bug report to track future troubleshooting and testing of possible causes. "/dev/ammo42" noticed some timeouts in the code that could be part of the problem.
Comment 1 Darrell 2011-12-23 23:31:14 CST
Some new information: Basically, I see an excessive slow exit whenever I enable the KDED System Base URL Notifier daemon. With that daemon enabled I see an exit time of about 17 seconds. When disabled I see an exit time of about 7 seconds. That latter time is still slow, but much better than the other time.

When I see the excessive slow exit, I might see any of the following error messages in my xsession log:

klauncher: Fatal IO error: client killed
kdeinit: Fatal IO error: client killed
kded: Fatal IO error: client killed
kxdglauncher: cannot connect to X server :0
ICE default IO error handler doing an exit (), pid 17202, errno = 32

None of those errors consistently appear with the slow exit time, but they do appear.

One error message in my xsession log that always appears when I see the excessive slow exit:

kdeinit: Got EXEC_NEW 'kconf_update' from launcher

That last message is similar to the problems reported in bug report 688. However, for this report I temporarily configured the vesa X driver and every time I still saw the excessive slow exit time and the same kconf_update error message. There definitely is something wrong with kconf_update. Possibly the problem in bug report 688 is not related to Nvidia drivers at all but is solely caused by kconf_update.
Comment 2 Darrell 2012-03-22 01:35:47 CDT
I'm bumping this bug report a notch from Minor to Major and changing the report description.

Today I tested a fresh Trinity install from GIT. I am impressed with how responsive the entire desktop is compared to KDE3. Unfortunately, exiting the session is horribly long. The session exit speed tarnishes an otherwise stellar experience.

Once again, the "Got EXEC_NEW 'kconf_update' from launcher" message appears in the tail end of the xsession log.
Comment 3 Darrell 2012-03-22 17:52:24 CDT
I won't say this is the sole cause of the slow exit speed, but there is a relationship with KDED System Base URL Notifier daemon.

Here is a typical xsession log with stopping the KDED System Base URL Notifier daemon before ending the session:

[starttde] Shutting down Trinity...
[tdeinit] Terminating Trinity.
[klauncher] Exiting on signal 1
[starttde] Running Trinity shutdown scripts...
[starttde] Trinity shutdown complete.

Here is a typical xsession log without stopping the KDED System Base URL Notifier daemon before ending the session:

[starttde] Shutting down Trinity...
[tdeinit] Terminating Trinity.
[klauncher] Exiting on signal 1
[tdeinit] Launched DCOPServer, pid = 11111 result = 0
[tdeinit] Launched KLauncher, pid = 11115 result = 0
[tdeinit] opened connection to :0.0
[tdeinit] Launched KDED, pid = 11116 result = 0
[tdeinit] Got EXT_EXEC 'kbuildsycoca' from launcher.
[kbuildsycoca] kbuildsycoca running...
[tdeinit] PID 11118 terminated.
[tdeinit] Got EXEC_NEW 'kconf_update' from launcher.
[tdeinit] PID 11119 terminated.
[tdeinit] PID 11116 terminated.
[starttde] Running Trinity shutdown scripts...
[starttde] Trinity shutdown complete.
end from FAM server connection
[tdeinit] Fatal IO error: client killed
[klauncher] Exiting on signal 15
kded: Fatal IO error: client killed
kxdglauncher: cannot connect to X server :0
[KDE-ICE error] ICE default IO error handler doing an exit(), pid = 9790, errno = 32

There is a bunch of things going on with that daemon running than without. The message "kded: Fatal IO error: client killed" likely is related to the daemon.

And why are kbuildsycoca and kconf_update being restarted?

Leaving the daemon running in KDE3 does not impact the exit time. Exit time with KDE3 is about 4.5 seconds. Exit time in TDE with the daemon stopped is about 7 seconds, which is still too slow.

Is there a way to trace what is happening during a session shutdown?
Comment 4 Darrell 2012-03-23 12:13:46 CDT
I have an interesting update with useful information.

I can reduce the exit time from about 17 seconds to about 7, without disabling the KDED System Base URL Notifier daemon. All I need do is trigger a device event.

That is, mount a partition, insert a USB stick, insert a CD/DVD. I do not have to mount the new device, just cause Trinity to become aware of the new device. For example, I can insert a CD/DVD and then cancel the notification popup dialog. That is sufficient to cause the KDED System Base URL Notifier daemon to not stall when logging out.

This behavior is very repeatable. Without any device event the exit time is always about 17 seconds. With any device event the exit time is always about 7 seconds. With the latter the xsession log is always clean as mentioned in Comment 3.

Part of the solution then to resolving this bug report with respect to the KDED System Base URL Notifier daemon is connected to resolving the following bug reports:

372 storage media applet doesn't show usb stick at kde startup
385 Unmounted removable device icons do not appear on the desktop
578 media:/ doesn't show files, remounts unit
634 CDs not correctly recognized by TDE
Comment 5 Darrell 2012-03-28 23:23:05 CDT
The patch that resolved bug report 372 and 385 did not have any effect on this bug report. I still need to initiate a device change to witness the improved exit time. :( Therefore the KDED System Base URL Notifier daemon is still not working quite right.
Comment 6 Timothy Pearson 2012-03-29 00:32:36 CDT
My hunch is that the KDED System Base URL Notifier daemon is unaware that a shutdown is in progress, and is somehow kicked into doing startup tasks by something that happens during the shutdown sequence.
Comment 7 Darrell 2012-03-29 00:47:29 CDT
That sounds exactly like what is happening, based upon the xsession log (kconf_update, etc.).

How do we slap it upside the head?
Comment 8 Timothy Pearson 2012-03-29 12:39:59 CDT
(In reply to comment #7)
> That sounds exactly like what is happening, based upon the xsession log
> (kconf_update, etc.).
> 
> How do we slap it upside the head?

I can't replicate this on my Ubuntu machine, which will make debugging more difficult.

For reference, the offending service is present here: tdebase/kioslave/system/kdedmodule/

What happens if you navigate to the system:/ URL in Konqueror before logging out?  Does the slow shutdown still occur?
Comment 9 Darrell 2012-03-29 17:44:34 CDT
Typing system:/ in Konqueror makes no difference.

I tried several different scenarios. The only method I found to avoid the slower exit speed is initiating a device change event. Mounting is not required, just the change.

Having a CD/DVD in the tray before starting the session has no effect. Ejecting the disk before ending the sessions works.

Summary: only a change event works.

I cannot replicate the slower exit speed in a virtual machine. Only in an actual physical machine. I use the same machine for both TDE and KDE3 but don't experience the problem in KDE3 (different partitions, no cross contamination).

I start X/TDE from the command line through startx and not through TDM. Yet testing through TDM reveals the same times and behavior.
Comment 10 Timothy Pearson 2012-03-30 02:40:20 CDT
(In reply to comment #9)
> Typing system:/ in Konqueror makes no difference.
> 
> I tried several different scenarios. The only method I found to avoid the
> slower exit speed is initiating a device change event. Mounting is not
> required, just the change.
> 
> Having a CD/DVD in the tray before starting the session has no effect. Ejecting
> the disk before ending the sessions works.
> 
> Summary: only a change event works.
> 
> I cannot replicate the slower exit speed in a virtual machine. Only in an
> actual physical machine. I use the same machine for both TDE and KDE3 but don't
> experience the problem in KDE3 (different partitions, no cross contamination).
> 
> I start X/TDE from the command line through startx and not through TDM. Yet
> testing through TDM reveals the same times and behavior.

Having to have a device change status before shutdown occurs normally could be due to any number of problems, and is therefore not very helpful for narrowing down the exact location of the problem.

Another potential cause of this would be the system:/ URL being accessed during the shutdown sequence on fast computers.
Comment 11 Darrell 2012-03-30 09:48:25 CDT
Side note: the device change does not have to be initiated from within the session. I can toggle to a different console, mount a partition, and then the TDE session exits with the faster speed.

What can I do to troubleshoot further? Or even experiment with code changes?
Comment 12 Timothy Pearson 2012-03-30 14:24:00 CDT
(In reply to comment #11)
> Side note: the device change does not have to be initiated from within the
> session. I can toggle to a different console, mount a partition, and then the
> TDE session exits with the faster speed.
> 
> What can I do to troubleshoot further? Or even experiment with code changes?

I would start by putting printf("<something>"); fflush(); debug markers in the system kioslave constructor, init(), and destructor methods.  I would also put a debug statement in the directory listing method that prints out the requested URL.  This is all so that I could get a feel for what types of access are being requested when.

If you throw the printf();fflush(); pairs into the kioslave, the debug messages will show up in your ~/.xsession_errors log in the correct temporal locations.  Posting that log here after you inserted the debug statements would provide valuable information.

Tim
Comment 13 Darrell 2012-03-30 17:53:22 CDT
Ok. I understand what you're trying to accomplish.  I've done that many times in my other programming ventures. I just hope I can get the c++ syntax correct --- building tdebase takes almost a half hour these days.
Comment 14 Darrell 2012-03-31 13:37:17 CDT
> Another potential cause of this would be the system:/ URL being accessed during
> the shutdown sequence on fast computers.

Please explain how I could investigate this further.

I tested on a 350 MHz PII machine. I could not replicate the stall or the xsession errors. Exit times were consistent with/without a device event and the xsessionlog was always clean.

Along with being unable to replicate the problem on a virual machine as well, I think there might be something to your statement.

I can't slow my fast "problem" machine any more than I now witness the problem, which with CPU governing is 1 GHz.
Comment 15 Timothy Pearson 2012-04-03 13:43:13 CDT
(In reply to comment #14)
> > Another potential cause of this would be the system:/ URL being accessed during
> > the shutdown sequence on fast computers.
> 
> Please explain how I could investigate this further.

With the debug statements I mentioned earlier--when they are inserted into the ~/.xsession_errors logfile I can make a pretty good guess as to what is going wrong.
Comment 16 Darrell 2012-04-04 01:04:32 CDT
After adding the printf statements I no longer can replicate the stall.

I added numerous printf statements in the two kdedmodule cpp files. Probably too many.

However, not directly related to my intentions, but possibly related to the logout stall, I found four typos in tdelibs/dcop/KDE-ICE. There were four occurrences of

IceClosetStatus

that I changed to

IceCloseStatus

I'm guessing the IceClosetStatus typo is a remnant from IceClosetqStatus, most of which was cleaned up after purging inadvertent tqStatus changes.

Whether or not I force a device event now makes no difference. The exit to the command line now is always about the same 6 to 7 seconds, as opposed to the 16 to 17 seconds previously.

At this point I don't know what serves as the cure. The tdelibs typo corrections or the possibility that, with the printf/stdout requirements, the kded system module has something "to do" during a logout that no longer causes a stall. I say that because there a few places in the two cpp files where there is nothing between braces.

The next thing is build tdelibs again without the typo patch and only change that one aspect. If nothing changes then I'll have to build several versions of tdebase with differing locations of the printf statements.

One way or another after this, I'll push the tdelibs patch to GIT.
Comment 17 Darrell 2012-04-04 18:43:46 CDT
Possible success!

This required much trial and error. Regardless of where I inserted printf statements, the stalls disappeared. Posting any xsession log would have been futile.

Finally I narrowed the problem to one location: tdebase/kioslave/system/kdedmodule/systemdirnotify.cpp, SystemDirNotify::init.

After removing all other printf statements, the one printf always was required. Here is the final patch I used:

diff -urN tdebase/kioslave/system/kdedmodule/systemdirnotify.cpp tdebase.new/kioslave/system/kdedmodule/systemdirnotify.cpp
--- tdebase/kioslave/system/kdedmodule/systemdirnotify.cpp  2012-03-23 19:54:25.000000000 -0500
+++ tdebase.new/kioslave/system/kdedmodule/systemdirnotify.cpp  2012-04-04 17:50:53.000000000 -0500
@@ -36,6 +36,7 @@
 void SystemDirNotify::init()
 {
  if( mInited )
+   kdDebug() << "SystemDirNotify::init(" << mInited << ")" << endl;
    return;
  mInited = true;
  KGlobal::dirs()->addResourceType("system_entries",

The kdDebug statement does not clutter the xsession output like the printf statement yet seems to serve the same purpose as the printf statement.

I don't know why this work-around succeeds or whether there is a cleaner or more robust solution.

Now that the cause is isolated, perhaps an improved solution is possible. If not then I ask for a peer review and signoff of the above patch before pushing to GIT.

If this particular patch is approved and pushed, then perhaps to avoid future deletions I should modify the patch with a comment why the kdDebug statement exists.
Comment 18 Timothy Pearson 2012-04-04 19:56:45 CDT
What you have done is to work around a race condition.  I would add a comment like this:

// FIXME Work around race condition by inserting delay before following code is executed--the root cause of the race should be investigated and fixed
Comment 19 Darrell 2012-04-04 21:04:27 CDT
I'll add the comment and push to GIT.

Curious --- to learn --- how you discerned that. :)

After reading your response I notice homedirnotify.cpp uses the exact same code for initialization. Should we push a similar "temporary" patch for that too? I haven't yet noticed anything directly related, but --- who knows.
Comment 20 Timothy Pearson 2012-04-04 21:50:02 CDT
(In reply to comment #19)
> I'll add the comment and push to GIT.
> 
> Curious --- to learn --- how you discerned that. :)

Educated guess. :-)  I've been around the innards of multithreaded programs long enough to recognize a race condition from the symptoms you describe.

> 
> After reading your response I notice homedirnotify.cpp uses the exact same code
> for initialization. Should we push a similar "temporary" patch for that too? I
> haven't yet noticed anything directly related, but --- who knows.

I wouldn't.  This is a temporary hack only and the real root of the race condition may not cause a problem within homedirnotify.cpp.

Tim
Comment 21 Darrell 2012-04-04 22:11:34 CDT
Okay. Pushed as is with comment in GIT hash 9e172fa7a1e93cc77e09616eb793b823d29ebaec.

By the way, the xsession log is nice and clean again with this work-around.

Although relieved to have found a work-around to the extensive delay, I want to leave this bug report open. Seven seconds to exit (no session saving) still seems too long.

Thanks for your help!
Comment 22 Timothy Pearson 2012-04-04 23:00:50 CDT
(In reply to comment #21)
> Okay. Pushed as is with comment in GIT hash
> 9e172fa7a1e93cc77e09616eb793b823d29ebaec.
> 
> By the way, the xsession log is nice and clean again with this work-around.
> 
> Although relieved to have found a work-around to the extensive delay, I want to
> leave this bug report open. Seven seconds to exit (no session saving) still
> seems too long.
> 
> Thanks for your help!

No problem; glad to be of assistance!

Regarding the 7 second delay, I seem to remember hardcoded delays in ksmserver.  Might be worth checking into to make sure none of them are being triggered.  Another potential source of delay is due to the time it takes to close down dozens of running programs.
Comment 23 Darrell 2012-04-04 23:10:35 CDT
/dev/ammo42 mentioned the delays in the mailing list (read original problem description). I'll try to learn more.

I manually close all apps when I exit, but reflecting on that statement, I realize I run a few apps in the system tray: kalarm, kteatime, akregator, klippy, kwikdisk, kmix, most of the kde related services.

I will experiment with manually terminating those apps before starting the exit and see how much difference there is. At least with the serious "race" delay out of the way we can tinker more clearly.
Comment 24 Darrell 2012-04-06 19:02:33 CDT
The "race condition" patch broke a few things.

The feedback dialog no longer appears, Confirm logout no longer works, and there is no grade fading effect.

After rebuilding tdebase with a reversed patch, those functions return. I reversed the patch in GIT.

Back to square one. :(
Comment 25 Timothy Pearson 2012-04-07 14:33:07 CDT
(In reply to comment #24)
> The "race condition" patch broke a few things.
> 
> The feedback dialog no longer appears, Confirm logout no longer works, and
> there is no grade fading effect.
> 
> After rebuilding tdebase with a reversed patch, those functions return. I
> reversed the patch in GIT.
> 
> Back to square one. :(

Out of curiosity do the ill effects occur if you use printf() instead of kdDebug()?
Comment 26 Darrell 2012-04-07 16:17:38 CDT
Created attachment 525 [details]
A patch to work around a possible race condition.

I had to rebuild, but curiously, using printf works whereas kdebug did not. No problems with the dialogs appearing.

More curious, the printf message never appears in the xsession log, which means that snippet of code is not being executed. Yet adding the printf avoids the exit stall and I don't need a device change event either.

I am attaching the updated patch to this bug report in case others experience the same problem, but I won't push to GIT for now. Further, I can devise work-arounds and live with the "improved" 7 second exit time --- I'd rather see team energies focused on many other bug reports.

Thanks much for your help!
Comment 27 Darrell 2012-04-07 16:19:19 CDT
Another reason for not pushing to GIT for now is that leaves open the possibility that others might experience the bug. Additional data might help resolve the problem in a meaningful manner.
Comment 28 Timothy Pearson 2012-04-08 20:16:44 CDT
(In reply to comment #26)
> Created attachment 525 [details]
> A patch to work around a possible race condition.
> 
> I had to rebuild, but curiously, using printf works whereas kdebug did not. No
> problems with the dialogs appearing.

kdDebug works very differently than printf() and requires access to the TDE core IIRC, so I could easily see it interfering with the shutdown dialogs.
Comment 29 Timothy Pearson 2012-05-15 14:16:36 CDT
(In reply to comment #28)
> (In reply to comment #26)
> > Created attachment 525 [details] [details]
> > A patch to work around a possible race condition.
> > 
> > I had to rebuild, but curiously, using printf works whereas kdebug did not. No
> > problems with the dialogs appearing.
> 
> kdDebug works very differently than printf() and requires access to the TDE
> core IIRC, so I could easily see it interfering with the shutdown dialogs.

Just FYI kddebug apparently requests access to the TDE configuration system and should NOT be used in shutdown-related code.
Comment 30 Darrell 2012-05-15 14:31:32 CDT
1. Regarding slow exit culprits, do your comments mean a patch is forthcoming? :-)

2. Do we want to push the existing patch attachment (changing kdedebug -> printf)? I haven't noticed any problems after changing to printf.
Comment 31 Timothy Pearson 2012-05-15 14:41:53 CDT
(In reply to comment #30)
> 1. Regarding slow exit culprits, do your comments mean a patch is forthcoming?
> :-)

No, sorry.  As stated before I have yet to see this on any of my systems.

> 2. Do we want to push the existing patch attachment (changing kdedebug ->
> printf)? I haven't noticed any problems after changing to printf.

Go ahead.  Just make sure that the printf is preceded by a comment explaining its hacky purpose and a warning not to remove it.
Comment 32 Timothy Pearson 2012-05-15 14:42:43 CDT
(In reply to comment #31)
> (In reply to comment #30)
> > 1. Regarding slow exit culprits, do your comments mean a patch is forthcoming?
> > :-)
> 
> No, sorry.  As stated before I have yet to see this on any of my systems.
> 
> > 2. Do we want to push the existing patch attachment (changing kdedebug ->
> > printf)? I haven't noticed any problems after changing to printf.
> 
> Go ahead.  Just make sure that the printf is preceded by a comment explaining
> its hacky purpose and a warning not to remove it.

Actually, just for fun try replacing the hacky printf(); with sleep(1); and see if that also cures the problem.

Tim
Comment 33 Darrell 2012-05-16 14:18:31 CDT
Race condition patch pushed in GIT hash d41f5217.

I read your statement again about kdDebug and now realize you were commenting about why using printf is better than using kdDebug, and not proposing possible solutions.

There are a handful of kdDebug statements in tdebase/ksmserver/shutdown.cpp. If that good or bad? Should those statements be converted to printf statements? fprintf(stderr...)?

What about using kdDebug in tdebase/ksmserver/startup.cpp or tdebase/ksmserver/server.cpp?

Regarding the bug report description, Xfce exits in about 2.5 seconds. With the race condition patch, TDE still exits in about 6.5 seconds and that is without saving the session. To debug, would be nice if somehow we could temporarily insert timing marks and printf statements.
Comment 34 Timothy Pearson 2012-05-16 22:23:47 CDT
(In reply to comment #33)
> Race condition patch pushed in GIT hash d41f5217.
> 
> I read your statement again about kdDebug and now realize you were commenting
> about why using printf is better than using kdDebug, and not proposing possible
> solutions.
> 
> There are a handful of kdDebug statements in tdebase/ksmserver/shutdown.cpp. If
> that good or bad? Should those statements be converted to printf statements?
> fprintf(stderr...)?

It might be good to convert them to printf(), as this will prevent potential reloading of ksycoca on fast multicore systems.

> What about using kdDebug in tdebase/ksmserver/startup.cpp or
> tdebase/ksmserver/server.cpp?

Won't matter, I would leave it as-is with kdDebug being used.

> Regarding the bug report description, Xfce exits in about 2.5 seconds. With the
> race condition patch, TDE still exits in about 6.5 seconds and that is without
> saving the session. To debug, would be nice if somehow we could temporarily
> insert timing marks and printf statements.

You can do that in ksmserver/shutdown.cpp. ;-)  Just use printf and you should be fine.

I still wager much of that delay is ksmserver waiting for various apps to save their state (this is different from saving a session, and always happens BTW).

Tim
Comment 35 Timothy Pearson 2012-05-17 00:38:29 CDT
Do you see the slow logout time (with a locally reverted hash d41f5217) if you set showFancyLogout=false?

I ask because in Bug 922 you mention a very similar slowdown to the one reported here, although Bug 922 is not the appropriate place to discuss this topic. ;-)
Comment 36 Darrell 2012-05-17 01:02:44 CDT
The only way to test the patch for bug report 922 is with showFancyLogout=true.

When showFancyLogout=false, then apps intercept the session exit request and the exit time is normal. The ICE error does not appear.

But perhaps I stumbled across the cause. As yet there is no GUI check to control the status dialog.
Comment 37 Darrell 2012-05-17 01:42:05 CDT
Aargh --- too darned late at night....

The only way to test the patch for bug report 922 is with showFancyLogout=true.

When showFancyLogout=false, then apps intercept the session exit request and the exit time is normal. The ICE error does not appear.

Perhaps I stumbled across the cause. Well, not really. I rebooted and now can't repeat the problem. I witnessed the slow exit only once (kolourpaint) after many session restarts.

With that said, when showFancyLogout=true and I select Discard from the save dialog, I see this message in my xsession log:

[dcopserver] DCOP aborting while waiting for answer from 'knotify'

I see the message whether I save the document or don't save. The error message does not appear when there are no unsaved documents.

At this point I'm too tired to troubleshoot further.
Comment 38 Timothy Pearson 2012-05-17 02:19:18 CDT
(In reply to comment #37)
> I rebooted and now can't
> repeat the problem. I witnessed the slow exit only once (kolourpaint) after
> many session restarts.

Was this with showFancyLogout=true or showFancyLogout=false?
Comment 39 Darrell 2012-05-17 11:16:03 CDT
The problem reported in bug report 922 applies only when showFancyLogout=true. When showFancyLogout=false then the status dialog never appears and apps with unsaved files always works. No sense in testing the latter.

I was up way too late last night trying to test this and find a pattern. Possibly my exhaustion was part of the problem, although as originally reported, I witnessed the slow exit earlier in the evening. I know I saw what I saw and repeatibility was perfect each time.

I am rebuilding a new package set because of the ABI breakage in the latest tdelibs patch. Like the Konqueror SIGABRT problem reported in bug report 293, a rebuild of everything is necessary. Possibly the new slow exit is related. I should know more later today after the run completes.
Comment 40 Darrell 2012-05-20 18:42:43 CDT
I found the root cause for the latest slow exit issue. Some good news is I can replicate this bug in 3.5.10 too. Therefore this bug has existed a long time. In all that time few people noticed and probably even fewer investigated, if any.

This might sound crazy, but the cause is not having an event sound enabled for warning dialogs.

I don't have an event sound enabled for warning messages. I have event sounds disabled for most message dialogs.

When I have an event sound enabled for warning dialogs, then the delay is unlikely to occur. I write unlikely because on one occasion I witnessed the delay despite having the event sound enabled.

For certain, when an event sound for warning messages is not enabled, then the delay always happens.

Open KControl, Sound & Multimedia, System Notifications, Event source: System Notifications, Event: "A warning message is being shown." Uncheck "Play a sound" and select the Apply button.

This is the most common error message:

* ICE default IO error handler doing an exit(), pid = 27431, errno = 11

The same error message appears in 3.5.10 too. The error message is from from libICE error.c. I'm thinking there is breakage somewhere in interprocess communications.

When I close an app that has an unsaved file and Discard the changes, but wait 10 to 20 seconds before exiting TDE, the exit time is normal with no abnormal delay. There seems to be a timeout of some kind.

The only time I see the ICE error message is discarding changes while trying to exit TDE.

I tested this in both virtual and real machines.

I have seen three other error messages, but I want to focus on one thing at a time. Hopefully a patch will cause all error messages to disappear.
Comment 41 Darrell 2012-06-13 11:44:37 CDT
Tim,

Regarding the previous Comment 40 that not having an event sound enabled for
warning dialogs causes a significant session exit delay, do we want a separate bug report for that? Or will we resolve that problem through this bug report?

The work-around is to enable a sound event, but that is not a great solution for users who do not want a sound for that dialog. Let me know. Thanks.
Comment 42 Darrell 2013-04-03 22:31:27 CDT
FYI: KDE4 exits real fast (although startup is a bit slower than Trinity). Perhaps some code snippets can be massaged from KDE4. Possible connection is bug report 1425.
Comment 43 Darrell 2013-04-23 11:33:25 CDT
Regarding comment 40, this anomaly seems to have disappeared. The exit time remains the same for me now whether or not I have a warning sound enabled.

Exit time is about 5 seconds for me. Thus there have been nominal optimizations over the months since starting this report, although I don't know exactly what. Typically I have several apps in the system tray, but closing them all before timing the exit results in saving only less than a half second.

Five seconds is not horrible but remains noticeable for anybody familiar to other environments, such as xfce or kde4.
Comment 44 Darrell 2013-04-24 14:59:26 CDT
Interesting update while trying to help test bug report 1464. The testing account I use needs about 17 seconds to exit Trinity. So there remains something awry about exiting Trinity. I'm adding this note for reference --- I'll try to get back to this at a later time.
Comment 45 Timothy Pearson 2013-04-24 15:58:10 CDT
I added some profiling code to ksmserver, and while this is not yet pushed to GIT, I did notice that around 3 seconds are spent waiting for logoutSoundFinished to fire on my test box.  It would be somewhat embarrassing if this was the root cause of the slow TDE logout!
Comment 46 Timothy Pearson 2013-04-24 16:18:32 CDT
The logout sound delay only occurs if the system does not have any audio hardware installed.  I am seeing a second problem in that timeoutQuit is triggered due to knotify not exiting on prior request; this reliably adds 10 seconds to the logout process.
Comment 47 Darrell 2013-04-24 16:28:57 CDT
An interesting bit is the testing account does not always use 17 seconds to exit Trinity. Sometimes I see about 5 to 6 seconds, which is about the same as my normal account (which to me is too long).

Note: I don't define a login or logout sound in any of my accounts. Possibly part of the delay is related to not having a sound defined. That you mention this seems potentially related to my older post about not having a warning sound defined, although I no longer can replicate that exact delay.
Comment 48 Timothy Pearson 2013-04-24 16:32:09 CDT
(In reply to comment #47)
> An interesting bit is the testing account does not always use 17 seconds to
> exit Trinity. Sometimes I see about 5 to 6 seconds, which is about the same as
> my normal account (which to me is too long).
> 
> Note: I don't define a login or logout sound in any of my accounts. Possibly
> part of the delay is related to not having a sound defined. That you mention
> this seems potentially related to my older post about not having a warning
> sound defined, although I no longer can replicate that exact delay.

Likely knotify is still not exiting on demand, even if the 10 second hard limit is not triggered.
Comment 49 Darrell 2013-04-24 16:35:30 CDT
That agrees a little with the message I shared in comment 37.
Comment 50 Timothy Pearson 2013-04-24 19:48:47 CDT
(In reply to comment #48)
> (In reply to comment #47)
> > An interesting bit is the testing account does not always use 17 seconds to
> > exit Trinity. Sometimes I see about 5 to 6 seconds, which is about the same as
> > my normal account (which to me is too long).
> > 
> > Note: I don't define a login or logout sound in any of my accounts. Possibly
> > part of the delay is related to not having a sound defined. That you mention
> > this seems potentially related to my older post about not having a warning
> > sound defined, although I no longer can replicate that exact delay.
> 
> Likely knotify is still not exiting on demand, even if the 10 second hard limit
> is not triggered.

Still working on the knotify exit delay, but wanted to note that this has been around for a LONG time, even in KDE4.  In fact, KDE4 may still exhibit this problem, but not on your system for whatever reason: https://bugs.kde.org/show_bug.cgi?id=152611
Comment 51 Darrell 2013-04-24 20:22:00 CDT
Interesting. True, on my system KDE4 exits immediately. So fast that I could not help but notice.

Xfce exits immediately too.

I am not surprised if this problem has been around since the 3.x days. We've seen that more than once around here. Not assigning blame, just saying that when a bug exists for some people but not all that debugging gets far more challenging. Corner case issues are a devil. I'm guessing that is at least partially the case here.

Then there is social conditioning. Users grow accustomed to a several second delay and presume that is normal. So few people challenge that presumed status quo. :)
Comment 52 Timothy Pearson 2013-04-24 20:34:26 CDT
(In reply to comment #51)
> Interesting. True, on my system KDE4 exits immediately. So fast that I could
> not help but notice.
> 
> Xfce exits immediately too.
> 
> I am not surprised if this problem has been around since the 3.x days. We've
> seen that more than once around here. Not assigning blame, just saying that
> when a bug exists for some people but not all that debugging gets far more
> challenging. Corner case issues are a devil. I'm guessing that is at least
> partially the case here.
> 
> Then there is social conditioning. Users grow accustomed to a several second
> delay and presume that is normal. So few people challenge that presumed status
> quo. :)

I believe I have it traced properly now, though a fix is not yet ready.  And yes, this is a bit of a corner case. :-)

Essentially, most available computers are now so fast (especially with more than one CPU) that the running knotify instance exits almost immediately.  Unfortunately, ksmserver is not yet finished using knotify, so it launches another instance, which then has to be forcibly terminated after a 10-second wait before logout can complete.

The fix is likely to delay knotify termination until after twin has terminated.  If I manually delay knotify termination with gdb, my "slow" test system logs out in under 5 seconds, with the profiling code indicating that most of that time is spent waiting for running applications (such as Firefox) to save their session state.
Comment 53 Timothy Pearson 2013-04-24 20:48:28 CDT
knotify heisenbug fixed in GIT hash 1c7bdfd.

If TDE now exits quickly enough for you, please close this bug report.  If not, please enable the new compile-time shutdown profiling code (#define PROFILE_SHUTDOWN 1) in tdebase and attach the resultant ~/.xsession_errors file to this report.

For reference, on my test VM using a NFS root, TDE exits in a little over 4 seconds with several applications (such as Firefox) running when logout was requested.
Comment 54 Darrell 2013-04-24 21:12:01 CDT
I'll test and report.

Looking at the patch, would a cmake configure option be suitable for enabling the shutdown spew? Something like -DWITH_VERBOSE_SHUTDOWN?
Comment 55 Timothy Pearson 2013-04-24 21:28:27 CDT
(In reply to comment #54)
> I'll test and report.
> 
> Looking at the patch, would a cmake configure option be suitable for enabling
> the shutdown spew? Something like -DWITH_VERBOSE_SHUTDOWN?

Might be a good idea.  Such an option should be off by default (i.e. not use the value of BUILD_ALL as its default like most other build options do).
Comment 56 Darrell 2013-04-25 12:59:58 CDT
I ran some tests with the latest patch.

I ran a set of five timed exits on all of my machines using an unscientific stopwatch method.

Apparently I'm not afflicted by the corner case delay previously described and patched and can't help test that. :-(

On my two dual cores, without the patch, exit times averaged about 5.5 to 5.9 seconds. On my PI and PII, exit times averaged about 10 seconds (they are horribly old machines but provide for some interesting testing).

After the patch the exit times were consistently the same.

I added time markers to my starttde and xinitrc. Interestingly, the biggest delay on my systems is with tdeinit_shutdown, launched in starttde. The delay time is about two seconds (Look at the times at the end at "Shutting down Trinity").

As can be seen from the xsession-errors log, the corner case delay with ksmserver does not happen on my systems and is not a cause of any delay. The time markers show ksmserver terminating quickly.

I don't know whether a 5.5 to 5.9 second exit time on a dual core system is normal. Still seems long to me. I can manually terminate all apps in the system tray before exiting Trinity and the exit time is reduced only by a fraction of a second.

I have more information to add but I'm still assembling the data.

======================================
[ksmserver] 'Shutdown initiated' [23:06:54:382]
[ksmserver] 'Legacy save complete' [23:06:54:394]
[ksmserver] 'completeShutdownOrCheckpoint' [23:06:54:396]
[ksmserver] 'completeShutdownOrCheckpoint' [23:06:54:401]
[ksmserver] 'completeShutdownOrCheckpoint' [23:06:54:406]
[ksmserver] 'completeShutdownOrCheckpoint' [23:06:54:407]
[ksmserver] 'completeShutdownOrCheckpoint' [23:06:54:407]
[ksmserver] 'completeShutdownOrCheckpoint' [23:06:54:407]
[ksmserver] 'completeShutdownOrCheckpoint' [23:06:54:408]
[ksmserver] 'completeShutdownOrCheckpoint' [23:06:54:409]
[ksmserver] 'completeShutdownOrCheckpoint' [23:06:54:409]
[ksmserver] 'completeShutdownOrCheckpoint' [23:06:54:409]
[ksmserver] 'completeShutdownOrCheckpoint' [23:06:54:409]
[ksmserver] 'completeShutdownOrCheckpoint' [23:06:54:409]
[ksmserver] 'completeShutdownOrCheckpoint' [23:06:54:409]
[ksmserver] 'completeShutdownOrCheckpoint' [23:06:54:410]
[ksmserver] 'Phase 2 complete' [23:06:54:410]
[ksmserver] 'Session discarded' [23:06:54:410]
[ksmserver] 'notifySlot' [23:06:54:412]
[ksmserver] 'startKilling' [23:06:54:412]
[ksmserver] 'completeKilling' [23:06:54:483]
[ksmserver] 'completeKilling' [23:06:54:485]
[ksmserver] 'completeKilling' [23:06:54:486]
[ksmserver] 'completeKilling' [23:06:54:486]
[ksmserver] 'completeKilling' [23:06:54:486]
[ksmserver] 'completeKilling' [23:06:54:486]
[tdeinit] PID 6196 terminated.
X Error: BadWindow (invalid Window parameter) 3
  Major opcode:  7
  Minor opcode:  0
  Resource id:  0x2800008
X Error: BadWindow (invalid Window parameter) 3
  Major opcode:  6
  Minor opcode:  0
  Resource id:  0x2800008
[ksmserver] 'completeKilling' [23:06:54:498]
[tdeinit] PID 6170 terminated.
X Error: BadWindow (invalid Window parameter) 3
  Major opcode:  7
  Minor opcode:  0
  Resource id:  0x2600008
X Error: BadWindow (invalid Window parameter) 3
  Major opcode:  6
  Minor opcode:  0
  Resource id:  0x2600008
X Error: BadWindow (invalid Window parameter) 3
  Major opcode:  7
  Minor opcode:  0
  Resource id:  0x2000008
X Error: BadWindow (invalid Window parameter) 3
  Major opcode:  6
  Minor opcode:  0
  Resource id:  0x2000008
[ksmserver] 'completeKilling' [23:06:54:525]
[ksmserver] 'completeKilling' [23:06:54:528]
[ksmserver] 'completeKilling' [23:06:54:591]
[ksmserver] 'completeKilling' [23:06:54:602]
[tdeinit] PID 6192 terminated.
[ksmserver] 'completeKilling' [23:06:54:632]
[ksmserver] 'completeKilling' [23:06:54:973]
[ksmserver] 'killWM' [23:06:54:973]
[ksmserver] 'completeKillingWM' [23:06:54:973]
[ksmserver] 'completeKillingWM' [23:06:54:974]
[ksmserver] 'completeKillingWM' [23:06:54:981]
[ksmserver] 'completeKillingWM' [23:06:54:999]
[ksmserver] 'killingCompleted' [23:06:54:999]
[tdeinit] PID 6138 terminated.
[starttde] Shutting down Trinity... [23:06:55:683412249]
[tdeinit] Terminating Trinity.
[tdelauncher] Exiting on signal 1
[starttde] tdeinit_shutdown. [23:06:57:710935496]
[starttde] dcopserver_shutdown. [23:06:57:832070939]
[starttde] artsshell. [23:06:57:878495134]
[starttde] kdeinit4_shutdown. [23:06:57:896654119]
[starttde] Running Trinity shutdown scripts... [23:06:57:900248900]
[starttde] Trinity shutdown complete. [23:06:58:175146092]
xinitrc: Exiting X: 2013-04-24 23:06:58
Comment 57 Darrell 2013-04-25 18:43:19 CDT
Created attachment 1185 [details]
Patch to provide cmake configure option to build ksmserver timing marks

Tim,

Attached is a patch as discussed in comments 54 and 55. I tested the patch both ways (OFF/ON) and in each case the xsession-errors log looked as expected.

Please provide a sanity inspection and if ok I'll push to GIT.

Question:

When I view the xsession-errors log in mcedit, I see a bunch of ^M characters preceding all of the [ksmserver] markers. Second, all of the ksmserver comments have a blank line feed after the output. Can this be cleaned?
Comment 58 Slávek Banko 2013-04-25 19:01:54 CDT
I have some observations that might be related: During testing, I installed kdesvn-trinity and kdesvn-kio-plugins-trinity (yes v3.5.13-sru branch). And beyond that I set in kdmrc: Compositor=kompmgr

During logout in .xsession-errors I see:

[kwin-workspace] Using '/home/axis/.kompmgr.pid' as kompmgr pidfile

[startkde] Shutting down Trinity...
[klauncher] Exiting on signal 1
[25398:25464:61868684793:ERROR:zygote_host_linux.cc(398)] write: Roura přerušena (SIGPIPE)
The Trinity ltdl loader was unable to dlopen() the shared library '/usr/lib64/trinity/kded_kdesvnd.so' : '/usr/lib64/trinity/kded_kdesvnd.so: sdílený objektový soubor nelze otevřít: Adresář nebo soubor neexistuje'
[startkde] Running Trinity shutdown scripts...
[startkde] Trinity shutdown complete.

I leave aside that /usr/lib64/trinity/kded_kdesvnd.so is wrong path (I'll deal with this separately).

It seems suspicious, why in lougout appears message:

[kwin-workspace] Using '/home/axis/.kompmgr.pid' as kompmgr pidfile

And why in lougout is effort to load library:

The Trinity ltdl loader was unable to dlopen() the shared library

It looks as it trying to start something, instead of peaceful shutdown.
Comment 59 Timothy Pearson 2013-04-25 20:15:21 CDT
(In reply to comment #57)
> Created attachment 1185 [details]
> Patch to provide cmake configure option to build ksmserver timing marks
> 
> Tim,
> 
> Attached is a patch as discussed in comments 54 and 55. I tested the patch both
> ways (OFF/ON) and in each case the xsession-errors log looked as expected.
> 
> Please provide a sanity inspection and if ok I'll push to GIT.
> 
> Question:
> 
> When I view the xsession-errors log in mcedit, I see a bunch of ^M characters
> preceding all of the [ksmserver] markers. Second, all of the ksmserver comments
> have a blank line feed after the output. Can this be cleaned?

Patch looks OK, go ahead and clean up the timing mark #define near the top of shutdown.cpp if you want to. :-)
Comment 60 Slávek Banko 2013-04-26 19:29:42 CDT
Please, this is a case where rather than BUILD_ should be used WITH_? In this case WITH_PROFILE_SHUTDOWN rather than BUILD_PROFILE_SHUTDOWN?
Comment 61 Darrell 2013-04-26 21:20:37 CDT
Patch pushed to GIT in commit 62d43d23.

>Please, this is a case where rather than BUILD_ should be used WITH_?

I don't know. Exactly what is the difference between BUILD and WITH?
Comment 62 Timothy Pearson 2013-04-27 02:26:11 CDT
(In reply to comment #56)
<snip>
> I added time markers to my starttde and xinitrc. Interestingly, the biggest
> delay on my systems is with tdeinit_shutdown, launched in starttde. The delay
> time is about two seconds (Look at the times at the end at "Shutting down
> Trinity").

There are actually three commands run in that time slot:
$TDEDIR/bin/tdeinit_shutdown
$TDEDIR/bin/dcopserver_shutdown --wait
$TDEDIR/bin/artsshell -q terminate

Out of those three, I vote dcopserver_shutdown as the likely delay culprit.  Is there any way you could insert a time marker in your starttde script between each of those three commands so that I can get a better idea of which one is the true cause of the delay on your systems?

Thanks!
Comment 63 Slávek Banko 2013-04-27 06:20:11 CDT
(Odpověď na komentář #61)
> Patch pushed to GIT in commit 62d43d23.
> 
> >Please, this is a case where rather than BUILD_ should be used WITH_?
> 
> I don't know. Exactly what is the difference between BUILD and WITH?

I understand that BUILD_ affects whether will be builded binary or library - and it usually corresponds to the folder in the sources. While WITH_ determines with such properties would be these binaries or libraries builded. Therefore, in this case, I tend to use WITH_.
Comment 64 Darrell 2013-04-27 10:25:11 CDT
> Is there any way you could insert a time marker in your starttde

Already done. :-) Look at the log output in comment 56.

[starttde] Shutting down Trinity... [23:06:55:683412249]
[tdeinit] Terminating Trinity.
[tdelauncher] Exiting on signal 1
[starttde] tdeinit_shutdown. [23:06:57:710935496]
[starttde] dcopserver_shutdown. [23:06:57:832070939]
[starttde] artsshell. [23:06:57:878495134]
[starttde] kdeinit4_shutdown. [23:06:57:896654119]
[starttde] Running Trinity shutdown scripts... [23:06:57:900248900]
[starttde] Trinity shutdown complete. [23:06:58:175146092]
xinitrc: Exiting X: 2013-04-24 23:06:58
Comment 65 Darrell 2013-04-27 10:28:13 CDT
> I understand that BUILD_ affects whether will be builded binary or library - and it usually corresponds to the folder in the sources. While WITH_ determines with such properties would be these binaries or libraries builded.

Is this defined anywhere? I'm asking because I'd like to know.
Comment 66 Slávek Banko 2013-04-27 12:23:06 CDT
(Odpověď na komentář #65)
> > I understand that BUILD_ affects whether will be builded binary or library - and it usually corresponds to the folder in the sources. While WITH_ determines with such properties would be these binaries or libraries builded.
> 
> Is this defined anywhere? I'm asking because I'd like to know.

I do not know about that rule was written.
I assume from the way that are currently defined.
Comment 67 Darrell 2013-04-27 13:06:00 CDT
If the distinction is a local project guideline then perhaps we should have that guideline defined someplace?
Comment 68 Timothy Pearson 2013-04-28 04:16:38 CDT
(In reply to comment #64)
> > Is there any way you could insert a time marker in your starttde
> 
> Already done. :-) Look at the log output in comment 56.
> 
> [starttde] Shutting down Trinity... [23:06:55:683412249]
> [tdeinit] Terminating Trinity.
> [tdelauncher] Exiting on signal 1
> [starttde] tdeinit_shutdown. [23:06:57:710935496]
> [starttde] dcopserver_shutdown. [23:06:57:832070939]
> [starttde] artsshell. [23:06:57:878495134]
> [starttde] kdeinit4_shutdown. [23:06:57:896654119]
> [starttde] Running Trinity shutdown scripts... [23:06:57:900248900]
> [starttde] Trinity shutdown complete. [23:06:58:175146092]
> xinitrc: Exiting X: 2013-04-24 23:06:58

Thanks, I must have been blind earlier. ;-)

Fix for 2 second delay committed to GIT in hash a9e5bed29204900ce96e9bd89dc4a3444512ca83.  Essentially, the original developer entered a blocking read on the tdeinit socket for an unknown reason, likely relying on the dying X11 server or tdeinit process (I'm not sure which) to ensure eventual tdeinit_shutdown termination.  I stress tested logout/immediate login in a while loop with no ill effects, so I suspect that the wait was not needed in the first place.

With this change, TDE logs out in 3 seconds on my real (non-virtual) test machine.  Good enough? ;-)
Comment 69 Darrell 2013-04-28 12:37:19 CDT
Sounds encouraging. Let me rebuild and test. :-)
Comment 70 Darrell 2013-04-28 16:53:27 CDT
I have no idea what a blocking read is but wow. I'm getting about 3 seconds now with my normal account. I have to add --- the lack of the additional 2.5 seconds FEELS different. Trinity feels snappier, although we are only discussing the exit time. No weird messages in xsession-errors either. Nice job! :-)
Comment 71 Timothy Pearson 2013-04-28 16:59:13 CDT
(In reply to comment #70)
> I have no idea what a blocking read is but wow. I'm getting about 3 seconds now
> with my normal account. I have to add --- the lack of the additional 2.5
> seconds FEELS different. Trinity feels snappier, although we are only
> discussing the exit time. No weird messages in xsession-errors either. Nice
> job! :-)

Great!

In a nutshell, the blocking read was there to make absolutely sure that tdeinit was not running before tdeinit_shutdown would allow starttde to complete execution.  It worked by sitting on the tdeinit socket and attempting a read, essentially waiting for that read to error out when the tdeinit socket was closed.  However, this introduced a significant delay, as my tests indicated the read *never* returned, meaning the code did not do what it was supposed to do and only *seemed* to work due to some interaction of shutdown processes that I did not bother fully tracing.  Furthermore, there is no reason to wait for tdeinit shutdown before executing the last steps in the shutdown process; tdeinit will close down very rapidly after receiving the shutdown request, even on slow system, as its shutdown path is only a few lines of C++ code.

If TDE's logout time is now in line with other environments, can we mark this bug resolved? :-)
Comment 72 Darrell 2013-04-28 17:08:26 CDT
Tagging as resolved. :-) Hopefully bug report 1477 is something altogether different --- although I did not add 1477 to the R14 etherpad. I will run a quick check now that you've eliminated about 2.5 seconds.
Comment 73 Slávek Banko 2013-06-29 21:14:10 CDT
Due to commit 8f8dce36 ([tdebase] Fix premature termination of applications during shutdown on slow machines) logout is incredibly long. After selecting logout from T-Menu opens the dialog for selecting shutdown mode. I will select Close the current session and confirm. Dialog for selecting shutdown mode closes but  then for a minute nothing happens. After one minute waiting opens logout status dialog and begins closing programs. During closing programs, then I watch another minute waiting for completion of closing programs.

During testing, I found that for mad more than 120s session termination is sufficient to have an open Konqueror with uri "locate:*.ogg". Another Konqueror windows that I open in the time between logout confirmation and real start to logout, are not saved in the session.
Comment 74 Darrell 2013-07-02 13:11:06 CDT
Just sharing out loud, any chance bug report 1477 is related?
Comment 75 Slávek Banko 2013-07-02 13:23:53 CDT
(Odpověď na komentář #74)
> Just sharing out loud, any chance bug report 1477 is related?

I think not - on the test machine I have no remote mounts.
Comment 76 Darrell 2013-07-02 13:38:51 CDT
Other than the NFS report, I'm not seeing any delays with exiting. Then again, I don't use confirmation, the logout status dialog, or sessions. I will enable those for a day or two to see what happens.
Comment 77 Timothy Pearson 2013-07-21 18:05:58 CDT
This should partially be resolved in GIT hash 08390bd (tdebase).  The logout stall is fundamentally caused by an application (or applications) not responding to the X11 SM SaveYourself signal; the prior behavior timed out in 10 seconds which somewhat masked applications not responding, but also made it so that a slow machine might prematurely terminate an application that was saving its settings.  Therefore, the aforementioned GIT hash adds the ability for the user to forcibly skip the SaveYourself step if needed.
Comment 78 Timothy Pearson 2013-07-21 21:42:38 CDT
Changing status to NEEDINFO as I consider the above GIT commit a valid solution to this report, but I would like other TDE developers (and users!) to weigh in on this concept and its interface/implementation.
Comment 79 Darrell 2013-07-21 23:11:59 CDT
How to test? Is there a new dialog config option?
Comment 80 Timothy Pearson 2013-07-22 01:23:34 CDT
(In reply to comment #79)
> How to test? Is there a new dialog config option?

There is no configuration option.  If the SaveYourself step stalls for more than 1 second, two buttons will appear in the logout status dialog.

Ways to test include what Slavek mentioned earlier (Konqueror with URI "locate:*.ogg"), or leaving Kate open with an unsaved text document when you log out.
Comment 81 Alexander Golubev (Fat-Zer) 2013-07-22 15:05:13 CDT
The patch in the attachment 525 [details] is malformed!

You forgot the brackets around the "if", so the function unconditionally returns now, I haven't tested what it breaks but I'm sure there is something.

if( mInited ) {
    // FIXME bla-bla-bla
    printf("[systemdirnotify] SystemDirNotify::init(mInited)");
    return;
}
Comment 82 Timothy Pearson 2013-07-22 16:29:07 CDT
(In reply to comment #81)
> The patch in the attachment 525 [details] is malformed!
> 
> You forgot the brackets around the "if", so the function unconditionally
> returns now, I haven't tested what it breaks but I'm sure there is something.
> 
> if( mInited ) {
>     // FIXME bla-bla-bla
>     printf("[systemdirnotify] SystemDirNotify::init(mInited)");
>     return;
> }

Good catch!  Darrell, can you fix this and see if the printf() still has the same effect as before?

Thanks!
Comment 83 Darrell 2013-07-22 19:24:55 CDT
I will push the missing brackets to GIT. I don't know about the testing other than verifying nothing new happens. I browsed the original comments and my eyes got blurry trying to remember exactly what I observed before and after the patch.
Comment 84 Slávek Banko 2013-07-22 19:31:28 CDT
(Odpověď na komentář #83)
> I will push the missing brackets to GIT. I don't know about the testing other
> than verifying nothing new happens. I browsed the original comments and my eyes
> got blurry trying to remember exactly what I observed before and after the
> patch.

Last patches respond to the problem described in the comment 73
Comment 85 Darrell 2013-07-22 19:39:36 CDT
I'm running a full build set now (I started before the hard freeze announcment). When the build run completes I'll try to test the latest patch against Comment 73. I don't know whether I can test properly because I haven't been able to replicate the latest stall report. If nothing happens then I'll test a patch with the missing brackets.
Comment 86 Darrell 2013-07-22 22:24:26 CDT
Regarding commit 8f8dce36, the shutdown dialog now appears despite having kcontrol TDE Components->Session Manager->Show logout status dialog disabled/unchecked.

The patch should be tweaked to honor that setting.

I understand that the dialog should appear when something is delaying the logout process, but only then. right now the dialog appears all the time.
Comment 87 Darrell 2013-07-22 22:27:44 CDT
Side note: I can't get locate:*.ogg to work. I have the tdeio-locate package installed.

Typing locate:*.ogg:

An error occurred while loading locate:*.ogg:
Could not start process Unable to create io-slave:
 tdelauncher said: Error loading 'tdeio_locate'.
 .
Comment 88 Darrell 2013-07-22 22:37:07 CDT
When I open kate with an unsaved document and attempt to log out of Trinity, the shutdown dialog appears (whether configured or not, as noted previously). After a couple of seconds the two new button options appear inside the dialog.

The first time I tested the Skip Notification button the entire desktop froze. I had to use Ctrl-Alt-Backspace to exit. I haven't been able to replicate again. I hate when events like that happen only once. :-(
Comment 89 Darrell 2013-07-25 06:18:11 CDT
Regarding Comment 87, refer to bug report 1285. The problem is incomplete support for hidden visibility.

Any chance we can get the shutdown dialog appearance fixed to abide by the user's configuration in kcontrol TDE Components->Session Manager->Show logout status dialog? That is, when the option is disabled, the dialog does not appear unless the two second timeout occurs?
Comment 90 Timothy Pearson 2013-07-25 13:09:56 CDT
I have fixed the regression of the dialog not respecting the users' settings in GIT hash c060a81.  However, if the logout dialog is disabled, it will not appear at all, ever.  My best advice is just to Ctrl+Alt+Backspace the X server if you have disabled the logout dialog and the logout hangs; this is the way it had to be done in KDE 3.x. :-)

Until/unless someone notes a hang that cannot be aborted while the logout dialog is enabled, I am going to provisionally mark this RESOLVED FIXED.
Comment 91 Slávek Banko 2013-07-25 14:13:04 CDT
The situation has not improved, perhaps contrary deteriorated.

The initial condition is still the same:
running Konqueror with URI "locate *. ogg".


With enabled logout status dialog:

Too fast to see buttons Skip Notification + Abort Logout => 2 seconds is too little for a slower machine.
After pressing Skip Notification => long wait => in ssh console I see the Konqueror crash => waiting drkonqi hidden under logout status dialog. Requires manually kill drkonqi process, or not completed logout at all!


With disabled logout status dialog:

After a minute waiting occurs a crash konqueror and ksmserver - opens with 2 × drkonqi. After processing drkonqi, a few seconds waiting for the completion of logout.


I note that before commit 8f8dce36 I did not notice such problems.
I will send a both backtraces asap.
Comment 92 Slávek Banko 2013-07-25 14:15:24 CDT
Created attachment 1416 [details]
Konqueror backtrace
Comment 93 Slávek Banko 2013-07-25 14:15:52 CDT
Created attachment 1417 [details]
Ksmserver backtrace
Comment 94 Slávek Banko 2013-07-25 14:17:19 CDT
Note: Tested with kdebase GIT c060a817.
Comment 95 Timothy Pearson 2013-07-25 15:07:47 CDT
OK, thanks for the info.  This bug report just refuses to die!!

Is this a regression from KDE 3.5.10?  I seem to remember KDE 3.x hanging on logout occasionally, but it has been so long that I can't be sure.  If hanging on logout is not a regression then I may need to demote this from Blocker status.  If it is a regression, then a bisect may be needed.

Tim
Comment 96 Darrell 2013-07-25 16:43:13 CDT
I also see a ksmserver crash when trying to logout of the session. Has to be related with something very recent. I built a package set Monday and did not see this crash.
Comment 97 Darrell 2013-07-25 16:46:13 CDT
Created attachment 1418 [details]
ksmserver backtrace when logging out

I don't have the shutdown dialog or shutdown confirmations enabled. Just a straight logout. I updated tdelibs and tdebase as of this afternoon and rebuilt those two packages after running a full package set rebuild this morning.
Comment 98 Darrell 2013-07-25 16:47:49 CDT
The ksmserver crash occurs on three different machines.
Comment 99 Darrell 2013-07-25 16:48:55 CDT
Okay, I see now a patch has been pushed to tdebase. Rebuilding...
Comment 100 Darrell 2013-07-25 16:59:52 CDT
I don't know whether this is related to comment 82, but I did test applying the patch with the braces. As I wait for tdebase to rebuild with the ksmserver crash, I notice the following in my xsession-error log:

SystemDirNotify::init(mInited)[systemdirnotify] SystemDirNotify::init(mInited)[systemdirnotify] 

TQClipboard: cannot transfer data, no data available
Comment 101 Slávek Banko 2013-07-25 18:37:44 CDT
(Odpověď na komentář #95)
> OK, thanks for the info.  This bug report just refuses to die!!
> 
> Is this a regression from KDE 3.5.10?  I seem to remember KDE 3.x hanging on
> logout occasionally, but it has been so long that I can't be sure.  If hanging
> on logout is not a regression then I may need to demote this from Blocker
> status.  If it is a regression, then a bisect may be needed.
> 
> Tim

I think that bisect is not needed. As I mentioned in a 
comment 73 and reminded in comment 91,
I began to noticed slow logout after commit:

8f8dce36 - [tdebase] Fix premature termination of applications during shutdown on slow machines.
http://git.trinitydesktop.org/cgit/tdebase/commit/?id=8f8dce36

Incidentally, this is why I do not backport this commit into 3.5.13-sru branch. When I tried backport this commit, I noticed a significant slowdown on logout. That's why I know exactly when the problem started.

So yes, it's definitely a regression.
Comment 102 Timothy Pearson 2013-07-25 18:42:48 CDT
(In reply to comment #101)
> (Odpověď na komentář #95)
> > OK, thanks for the info.  This bug report just refuses to die!!
> > 
> > Is this a regression from KDE 3.5.10?  I seem to remember KDE 3.x hanging on
> > logout occasionally, but it has been so long that I can't be sure.  If hanging
> > on logout is not a regression then I may need to demote this from Blocker
> > status.  If it is a regression, then a bisect may be needed.
> > 
> > Tim
> 
> I think that bisect is not needed. As I mentioned in a 
> comment 73 and reminded in comment 91,
> I began to noticed slow logout after commit:
> 
> 8f8dce36 - [tdebase] Fix premature termination of applications during shutdown
> on slow machines.
> http://git.trinitydesktop.org/cgit/tdebase/commit/?id=8f8dce36
> 
> Incidentally, this is why I do not backport this commit into 3.5.13-sru branch.
> When I tried backport this commit, I noticed a significant slowdown on logout.
> That's why I know exactly when the problem started.
> 
> So yes, it's definitely a regression.

OK.  The problem is that the code that was altered has a nasty habit of force-closing applications on slow machines, leading to data loss.  Is that something we want to ship in R14?
Comment 103 Slávek Banko 2013-07-25 18:52:08 CDT
I understand that we do not wish to revert this commit.

But simple reproducibility with Konqueror and uri "locate:*.ogg" would be helpful to detect bug or symptoms that may be common to other cases which, during our testing so far remained undetected.
Comment 104 Darrell 2013-07-25 19:30:25 CDT
Rebuilt tdebase: no ksmserver crash anymore.
Comment 105 Timothy Pearson 2013-07-25 19:44:40 CDT
(In reply to comment #103)
> I understand that we do not wish to revert this commit.
> 
> But simple reproducibility with Konqueror and uri "locate:*.ogg" would be
> helpful to detect bug or symptoms that may be common to other cases which,
> during our testing so far remained undetected.

Sure.  I don't mean to come across irritated at anything other than this particular bug that I can't seem to fix. :-)

Can you enable the shutdown dialog on your system and see what X11 phase the shutdown is stalling on?

Thanks!

Tim
Comment 106 Darrell 2013-07-28 18:52:27 CDT
I confirm when konqueror is open using the locate protocol that the logout process freezes hard. Sounds like a problem with tdeio-locate. dcop?
Comment 107 Darrell 2013-07-28 19:40:14 CDT
Idea: I don't know what is up with tdeio-locate hanging the logout (commit 5c26e7a in bug report 1285 made no difference here), but when I enable the shutdown dialog and the two buttons appear, I had the idea that perhaps the dialog could list which process is acting persnickety that invokes the two buttons to appear?
Comment 108 Darrell 2013-07-28 19:49:32 CDT
Another cosmetic observation with the shutdown dialog's behavior, seems on my system the two buttons always appear somewhere in the logout process, even if the dialog does not actually halt the logout to prompt user intervention. Seems to me the two buttons should never appear unless user intervention is required.
Comment 109 Darrell 2013-07-28 20:05:57 CDT
When I use the locate: protocol in konqueror I notice two tdeio-locate processes appear rather than one.

I see the following in my xsession-error log:

tdelauncher: Fatal IO error: client killed
kded: Fatal IO error: client killed
[tdeinit] Fatal IO error: client killed
[tdeinit] sending SIGHUP to children.
[tdeinit] sending SIGTERM to children.
[tdeinit] Exit.
[tdeinit] Fatal IO error: client killed
[tdeinit] sending SIGHUP to children.
Hangup
/opt/trinity/bin/starttde: line 1: [starttde]: command not found
[starttde] Shutting down Trinity...
[tdeinit wrapper] Warning: socket connection failed: : No such file or directory
[tdeinit wrapper] Error: Can't contact tdeinit!
[starttde] Running Trinity shutdown scripts...

The logout process never gets to actually run the shutdown scripts.

When I manually kill the two tdeio-locate processes then the logout process exits with no hangups.
Comment 110 Darrell 2013-07-29 12:49:46 CDT
Missing braces noticed in comment 81 pushed to git in commit bac7ea9a.
Comment 111 Timothy Pearson 2013-07-29 15:36:11 CDT
I opened Konqueror with locate://*.ogg and logged out.  When the action buttons appeared in the logout status dialog, I clicked "Skip Notification" and the logout process completed without error.  I don't see a problem here (except possibly in tdeio-locate not responding to the SaveYourself request?)
Comment 112 Darrell 2013-07-29 16:01:27 CDT
Interesting you don't see the hang and we do. :-)

I don't understand the whole SaveYourself process but your description sounds appropriate. I select the Skip Notification button and then the status dialog proceeds but hangs and never reaches the end the "X/X" status line. On my primary system the hang usually occurs at about 14/17 (processes).

Let me know what information to grab to help. I posted some notes in comment 109.
Comment 113 Timothy Pearson 2013-07-29 16:04:01 CDT
(In reply to comment #112)
> Interesting you don't see the hang and we do. :-)
> 
> I don't understand the whole SaveYourself process but your description sounds
> appropriate. I select the Skip Notification button and then the status dialog
> proceeds but hangs and never reaches the end the "X/X" status line. On my
> primary system the hang usually occurs at about 14/17 (processes).
> 
> Let me know what information to grab to help. I posted some notes in comment
> 109.

The information you just posted is quite helpful.  My logout stalled at a similar stage for a while, but eventually (a couple of minutes) completed without error.  I suspect that tdeio-locate is not only failing to respond to the SaveYourself request, but is probably also failing to respond to the SmsDie request.
Comment 114 Darrell 2013-07-29 16:19:49 CDT
I just tested on my laptop and HTPC. With both the logout does finally terminate but there is a stall of about one minute.
Comment 115 Slávek Banko 2013-07-29 16:32:10 CDT
I can confirm small change: While earlier I watched Konqueror crash in 100% of cases, now I can rarely watch successful logout.

Note: Actually, I have in Konqueror opened two tabs - the first with "locate:*.ogg", and second with "locate:zip". Logout takes about three minutes. In most cases with Konqueror crash.
Comment 116 Slávek Banko 2013-07-29 16:41:49 CDT
Note 1: With enabled logout status dialog can be waiting indefinitely because Konqueror crash remains hidden under the empty screen from logout status dialog.

Note 2: Since Skip Notification is designed especially for a slow computers, 2 seconds is impossibly short period => on slow computers buttons will appear almost immediately, which is nonsense. I propose to extend the period to 10 seconds
Comment 117 Darrell 2013-07-29 16:51:06 CDT
I have some true slow machines here: PI and PII class. Unlike the 1990s, neither runs any modern software snappily anymore, including Trinity. I'm in the middle of a package build run, which should finish this evening. I'll update the old machines with the latest packages and run some tests.

Which cpp and line number contains the two second delay?

Should we make the delay user-definable in kcontrol, possibly a spinner widget in increments of whole seconds, under the 'Show logout status dialog' check box?
Comment 118 Darrell 2013-07-29 17:01:56 CDT
I found them in ksmserver/shutdown.cpp:

#define KSMSERVER_SHUTDOWN_CLIENT_UNRESPONSIVE_TIMEOUT 60000
#define KSMSERVER_NOTIFICATION_MANUAL_OPTIONS_TIMEOUT 1000

The 60000 (60 seconds) roughly matches what I saw on the laptop and HTPC in comment 114. The 1000 (1 second) roughly matches what I described in comment 108.

The 1 second period seems too fast for most modern hardware. That period might work fine on really old hardware. We'll find out when I later test on my PI and PII.

The 60 second timeout seems to test any user's patience. Granted the real problem is the specific process not responding, but nonetheless causes a genuine WTF moment. :-)

I like the idea of both timeouts being user-definable. I don't know what might be adequate defaults. :-)
Comment 119 Timothy Pearson 2013-07-29 17:04:50 CDT
What if a Force Termination button were made available once the logout has stalled for a few seconds in Phase 2 (i.e. no activity from terminating applications)?  This might produce a bit of a "hall of shame" effect that could motivate authors of stall-inducing applications to actually fix those applications...
Comment 120 Darrell 2013-07-29 17:20:30 CDT
Interesting idea, but the "hall of shame" most often will be Trinity apps. :-) If a non Trinity app causes the hang, most users likely just want to exit. Many systems come with Ctrl-Alt-Backspace disabled and they need a way to force the logout to continue. Possibly the force button is nothing more than Ctrl-Alt-Backspace in disguise.

Because I am helping with testing, I actually waited to observe the eventual logout after about 60 seconds. Most users are not obligated and will get frustrated. If I wasn't part of the testing group I'd revert to Ctrl-Alt-Backspace and say nasty things about Trinity. :-)

Possibly a separate enhancement request? For now I think we resolve why tdeio-locate is acting persnickety?
Comment 121 Darrell 2013-07-29 17:28:28 CDT
Whether or not a Force Termination button is added, I like the idea of adding more verbosity in the status dialog. When the logout hangs the users sees exactly which process is not responding. That would help with bug reports. The dialog seems to support that to some degree because I have seen actual app/process names scream by, but every time I see the hang I only see "X/X" and not the app/process name.
Comment 122 Slávek Banko 2013-07-29 17:37:44 CDT
(Odpověď na komentář #118)
> I found them in ksmserver/shutdown.cpp:
> 
> #define KSMSERVER_SHUTDOWN_CLIENT_UNRESPONSIVE_TIMEOUT 60000
> #define KSMSERVER_NOTIFICATION_MANUAL_OPTIONS_TIMEOUT 1000
> 
> The 60000 (60 seconds) roughly matches what I saw on the laptop and HTPC in
> comment 114. The 1000 (1 second) roughly matches what I described in comment
> 108.
> 
> The 1 second period seems too fast for most modern hardware. That period might
> work fine on really old hardware. We'll find out when I later test on my PI and
> PII.
> 
> The 60 second timeout seems to test any user's patience. Granted the real
> problem is the specific process not responding, but nonetheless causes a
> genuine WTF moment. :-)
> 
> I like the idea of both timeouts being user-definable. I don't know what might
> be adequate defaults. :-)

On a slow machine is 1 second very little period - unfade effect of logout screen may take longer time :) Therefore, I believe that the interval for displaying buttons should be longer.
Comment 123 Darrell 2013-07-29 18:20:06 CDT
I'll post some data when I get the PI and PII updated. I created a patch for testing that changed both numbers: 60000->30000 and 1000->3000. I'll post data from that testing too. Right now I hope we can resolve why tdeio-locate is being a bad critter. :-)
Comment 124 Darrell 2013-07-29 22:22:53 CDT
> What if a Force Termination button were made available

I don't use Windows much but I recall something similar to that idea being available on Windows NT4. The default timeout was 20 seconds but the dialog basically allowed terminating the waiting period and force killed the hung process.

Regarding more verbosity when the shutdown dialog hangs, I notice in shutdown.cpp, lines 697 - 761, there are if/else tests whether the dialog notification shows "%1/%2" or "%1/%2, %3." Why don't we just make "%1/%2, %3" the default all the time?
Comment 125 Darrell 2013-07-30 00:18:53 CDT
Some testing observations:

Changing KSMSERVER_NOTIFICATION_MANUAL_OPTIONS_TIMEOUT 1000->3000
=================================================================

On my dual core systems the two buttons never appeared in the status dialog during normal no-hang logouts. That resolved the cosmetic issue mentioned in comment 108. The PI and PII liked the longer time too, although once on the PI the buttons still appeared during a normal no-hang logout. I did not repeat the appearance of the buttons on the PI but I restarted Trinity only a half dozen times.

Changing KSMSERVER_NOTIFICATION_MANUAL_OPTIONS_TIMEOUT 1000->2000
=================================================================

No difference on the dual core systems. The two buttons never appeared during a normal no-hang logout. The PI and PII sometimes showed the two buttons during a normal no-hang logout and sometimes didn't.

Conclusion, the default time should be no less than 2000 milliseconds, although 3000 probably is better for cosmetic reasons.

KSMSERVER_SHUTDOWN_CLIENT_UNRESPONSIVE_TIMEOUT 60000->30000
===========================================================

On all of my systems this still seemed an awful long time.

KSMSERVER_SHUTDOWN_CLIENT_UNRESPONSIVE_TIMEOUT 60000->20000
===========================================================

Still seemed like a long time but seems like a reasonable compromise too, even on the slower PI/PII systems.

Comments:
=========

The user does not know why the two buttons suddenly appear. When the reason is kate/kwrite with an unsaved document, the user might deduce what is happening because an additional dialog appears. With a hung app such as the currently broken tdeio-locate, there are no clues at all. When the two buttons appear the dialog status message should change to inform the user what is happening and the status message should list the unresponsive process that invoked the two buttons. Listing both the process ID and name would be a helpful debugging tool.

When the status message displays the process name, such as kmix, kicker, etc. (parameter %3 in the sources), the status message does not refresh the process name when the number sequence changes, such as 8/12, kmix -> 9/12, kmix -> 10/12, kmix, etc. The process name remains static despite the number sequence changing. When the process does not return a name to the status message, then the %3 parameter should be blank.

When the status dialog encounters an unresponsive process, the status message should provide the user notice why the logout stalled. Even if a third "Kill Now" button suddenly appears in the dialog, the status message should inform the user. When the "Kill Now" button appears, the status message changes to something like "Process XYZ is not responding."

Running Trinity on the PI and PII is a severe test of patience. Not recommended! :-) Anybody using a PI or PII with a modern Linux based graphical environment probably needs a reality check. I would not treat such Trinity related issues as corner cases but nut cases. That said, I think for now we can live with hard-coded timeouts of 20000 and 2000, but eventually a slider or spinner widget to let the user define the timeouts would be a nice enhancement.

I hope this helps!
Comment 126 Timothy Pearson 2013-07-31 11:23:51 CDT
I have changed the button delay time to 3000, thanks for the work on that!

While adding the code to show which application is stalling the logout process, I found a small glitch in the service handling code that *may* have been prolonging the stall to absurd levels.  Can you re-test tdebase with GIT hash 7c366d7 and let me know if the logout stall remains?

Thanks!
Comment 127 Darrell 2013-07-31 11:55:54 CDT
By testing do you mean to leave KSMSERVER_SHUTDOWN_CLIENT_UNRESPONSIVE_TIMEOUT=60000?
Comment 128 Timothy Pearson 2013-07-31 11:58:26 CDT
(In reply to comment #127)
> By testing do you mean to leave
> KSMSERVER_SHUTDOWN_CLIENT_UNRESPONSIVE_TIMEOUT=60000?

For this test, yes.  I still need to decide what to do with the value of KSMSERVER_SHUTDOWN_CLIENT_UNRESPONSIVE_TIMEOUT.

Tim
Comment 129 Darrell 2013-07-31 12:34:23 CDT
Ok. I rebuilt without patching KSMSERVER_SHUTDOWN_CLIENT_UNRESPONSIVE_TIMEOUT and left the default at 60000.

This is a preliminary report. I ran only one one dual core system. I have not tested on the older PI/PII systems.

When I ran locate:*.ogg, the two buttons eventually appeared at about 3 seconds. When selecting Skip Notification the logout continued MUCH faster than previously.

When Trinity exits and the screen turns black, the black period is longer than during a non-stall logout.

There are no clues in the shutdown dialog or the xsession-error log as to what invoked the two buttons to appear. I think we weill needs something with that to help with debugging.

I will run further tests on the PI/PII later this afternoon.
Comment 130 Darrell 2013-07-31 17:38:54 CDT
Follow up from my previous post:

Notes: I login from the command line using startx. I do not have Trinity logout confirmations enabled. I have the shutdown dialog enabled for these tests.

On my dual core systems, a non-stall logout averages about 3.7 seconds to return fully to the command line. When I use locate:*.ogg to produce a stall, from the moment I select the Skip Notification button the logout time averages about 8.5 seconds. Better than the test results in comment 125, which was about 50 seconds, but a longer than a non-stall logout.

On the slow PI/PII systems, a non-stall logout averages about 12.5 seconds to return fully to the command line. When I use locate:*.ogg to produce a stall, from the moment I select the Skip Notification button the logout time averages about 16.5 seconds.

I realize a process has stalled the logout. Perhaps I misunderstand but seems to me when the user selects the Skip Notification button the logout process should continue about as fast as a normal non-stall logout.

There is no information in the status dialog or the xsession-error log as to the cause of the stall that invokes the two buttons. The status dialog status message offers no help as to why the buttons appear.
Comment 131 Slávek Banko 2013-08-01 13:53:10 CDT
Comment on attachment 1416 [details]
Konqueror backtrace

I found that the Konquror crash not related to logging out. That's why I brought it to a separate bug report - see bug 1605.
Comment 132 Darrell 2013-08-01 17:20:57 CDT
Latest tests based upon patches from 7/31.

The latest set of patches look good.

When I create a stall condition with tdeio-locate, the status dialog shows the two buttons. The status message:

Notifying applications of logout request (11/13)....

When I create an unsaved document in Kate I see the following:

Notifying applications of logout request (11/12, kate)....

Seems tdeio-locate is broken in two ways: causing a stall and not providing the status dialog process information.

When I don't select either button, the status dialog times out at approximately the hard-coded 60000 milliseconds. I still think this time is too long. Twenty seconds is a reasonable compromise but a user-defined timeout is better.

Regardless of the actual timeout value, one idea is when the two buttons appear, to add a digital countdown timer. That would provide the user some idea what will happen and when. Possibly something like:

Forced termination: 60 seconds

With the number changing each second.

When I tested with an unsaved document in kate, the 60 second timeout never activated. Apparently the kate dialog overrides the status dialog timeout. When I responded to the kate dialog, the status dialog proceeds with the logout in normal fashion because nothing is truly "stalling" the process. After responding to the kate dialog, the logout period is about 3.7 seconds, the same as a normal logout.

Summary:
========

* tdeio-locate is broken in at least two ways. Do we want to resolve those bugs in this report or in a new report?

* A countdown timer would be helpful.

* 60 seconds still seems to long and 20 seconds seems a reasonable compromise.

* During a true stall as with tdeio-locate, the logout period after selecting the Skip Notification button remains about 8.5 seconds, which is about 5 seconds longer than a normal non-stall logout.

* I don't see any related messages in xsession-errors but I don't know whether the recent patches added such support. (I do have -DBUILD_PROFILE_SHUTDOWN=ON.)
Comment 133 Timothy Pearson 2013-08-04 14:58:25 CDT
Many of the mentioned improvements have been included in GIT.  I have also reduced the protection timeout from 60 seconds to 20 seconds as recommended by Darrell above, as I see no reason for it to be longer given the new graphical feedback.

At this point I believe the only outstanding issue is kio_locate not responding to session management requests.  Can you please test and confirm?

Thanks!
Comment 134 Darrell 2013-08-04 19:12:21 CDT
Tim, thanks for the work you've put into this bug report.

I updated the tdebase sources and rebuilt. Here is what I see:

* The countdown timer works nicely. :-)

* As previously mentioned, when I create a stall with tdeio-locate, there is no process name to help with debugging. This is not the fault of the status dialog, only that tdeio-locate is broken in such a manner that information is unavailable. Ignoring that tdeio-locate needs repair, I don't know how to remedy this lack of information. As all other processes provide a process app name to the status dialog, how should the dialog handle stubborn apps? A process ID? Replace the empty app name with "Unknown process" in the status dialog message string? I think something would be helpful, not just for debugging for developers, but for users too.

* When I create a stall with kate, the status dialog begins the 20 second countdown. At the end of the 20 seconds nothing happens and both the kate and status dialog remain on screen. In the case of kate I don't think users will get confused. Most users should figure out what is happening. Yet that the status dialog does not disappear after the 20 second timeout does seem funky. I wonder whether the status dialog should appear at all because kate already has its own dialog. Should the status dialog perhaps remain idle/off screen until the kate dialog is acknowledged? That would seem to make more sense.

* As previously mentioned, during a stall, the time from when the status dialog disappears and the logout is fully completed is about 8.5 seconds on my dual core system. A normal non-stall logout is about 3.7 seconds. I recognize that stalled logouts will be the exception corner case rather than the norm, but seems odd that the process notification and killing does not proceed in about the same 3.7 second period after the status dialog disappears. This extra delay could be solely tdeio-locate all on its own because I see a 3.7 second time when I acknowledge the kate dialog.

For testing all of this, tdeio-locate seems to be a classic Frankenstein corner case. I'm curious to see the results after resolving the tdeio-locate problems.

I hope this helps!
Comment 135 Darrell 2013-08-04 19:29:12 CDT
A trivial observation: the status dialog is not wide enough to always display the full text of the status message, which means the process names get clipped. Could we make the status dialog wider?

Is the width defined in tdebase/kcontrol/smserver/smserverconfigdlg.ui at 325 pixels? If yes then I could tinker a bit with that.
Comment 136 Timothy Pearson 2013-08-06 09:57:58 CDT
(In reply to comment #134)
> Tim, thanks for the work you've put into this bug report.

No problem. :-)  I want this fixed for R14.0.0, as this is one area where TDE can easily display an interface consistent with the project's stated goals regarding keeping the user informed and in control at all times.

> I updated the tdebase sources and rebuilt. Here is what I see:
> 
> * The countdown timer works nicely. :-)

Good!

> * As previously mentioned, when I create a stall with tdeio-locate, there is no
> process name to help with debugging. This is not the fault of the status
> dialog, only that tdeio-locate is broken in such a manner that information is
> unavailable. Ignoring that tdeio-locate needs repair, I don't know how to
> remedy this lack of information. As all other processes provide a process app
> name to the status dialog, how should the dialog handle stubborn apps? A
> process ID? Replace the empty app name with "Unknown process" in the status
> dialog message string? I think something would be helpful, not just for
> debugging for developers, but for users too.

I tried and tried to extract something useful from the session manager, but to no avail.  All I can reliably give is something like "Unknown client (ID %d)" where client ID is not even a PID, it is an X11 session management ID.  Would that be an improvement or simply add confusion?  The client IDs are not helpful from a system admin point of view as they cannot (easily) be mapped to a process ID.

> * When I create a stall with kate, the status dialog begins the 20 second
> countdown. At the end of the 20 seconds nothing happens and both the kate and
> status dialog remain on screen. In the case of kate I don't think users will
> get confused. Most users should figure out what is happening. Yet that the
> status dialog does not disappear after the 20 second timeout does seem funky. I
> wonder whether the status dialog should appear at all because kate already has
> its own dialog. Should the status dialog perhaps remain idle/off screen until
> the kate dialog is acknowledged? That would seem to make more sense.

I have added code to hide the dialog during client interaction, and it does make the logout seem cleaner than before.  Good idea!

However, I have only been able to replicate the timeout showing up during interaction once.  There is code in place that should be not only hiding the dialog but also stopping the timer and displaying "logout paused" during client interaction.  Can you see if there is a reliable method to trigger this bug in the latest tdebase code from GIT?
 
> * As previously mentioned, during a stall, the time from when the status dialog
> disappears and the logout is fully completed is about 8.5 seconds on my dual
> core system. A normal non-stall logout is about 3.7 seconds. I recognize that
> stalled logouts will be the exception corner case rather than the norm, but
> seems odd that the process notification and killing does not proceed in about
> the same 3.7 second period after the status dialog disappears. This extra delay
> could be solely tdeio-locate all on its own because I see a 3.7 second time
> when I acknowledge the kate dialog.
> 
> For testing all of this, tdeio-locate seems to be a classic Frankenstein corner
> case. I'm curious to see the results after resolving the tdeio-locate problems.

tdeio-locate should now be fixed in GIT, but it will require a recompile of tqt3, tdelibs, and tdeio-locate to test.  Basically, there is no reason for tdeioslaves to use X11 session management; I have therefore disabled session management in tdeio-locate which appears to fix the logout stall.  If no ill effects are observed from this change, I would like to disconnect all the other tdeioslaves from the X11 session manager as well with similar patches.
 
> I hope this helps!

Yes it does!  Thanks!
Comment 137 Darrell 2013-08-06 12:47:35 CDT
Where is the status dialog width defined? I want to test that.

I believe "unknown client" or "unknown process" is the best option. Adding other information that is not useful works against our project goals as well as confuse everybody.

Let's accept that some apps/processes are not going to play nice with the new features. We live with that. :-)

I will update my sources and rebuild. Initially I will keep the older tdeio-locate package as that provides a good mechanism for testing the features. I'll minimally rebuild tdebase to test the latest interaction with kate. Then later I'll build a complete new package set since several modules are affected to get tdeio-locate to function correctly. This methodical approach will take a day or two to complete. :-)

That said, I don't understand your description of the timeout problem, but hopefully I'll understand better when I rebuild and test.
Comment 138 Darrell 2013-08-06 14:51:16 CDT
I only rebuilt tdebase and tested kate.

Upon logging out, the kate dialog appears immediately. The shutdown dialog appears about 3 seconds later.

I was expecting the shutdown dialog not to appear at all when kate's dialog appeared. Perhaps I'm confused about that. From a user's perspective, I think most users will figure out what is happening when both dialogs appear. Yet I think when kate's dialog appears the shutdown dialog should stay out of the way for a cleaner interface. A cleaner interface will be less confusing to most users.

The countdown timer starts at 18 seconds rather than 20. This seems repeatable although I did not test extensively.

Ignoring both dialogs results in the shutdown dialog countdown timer resetting to 21 seconds. The shutdown dialog does not disappear and the countdown timer remains fixed at 21 seconds. The kate dialog remains on screen as well.

I expected 20 seconds. Possibly an +1 increment glitch in the code.

After ignoring both dialogs, selecting the Do Not Save button in the kate dialog results in the logout proceeding as normal.

When testing the current buggy tdeio-locate, again the shutdown timer started at 18 seconds. As there is no additional dialog to acknowledge, the logout proceeds after the timer expires.

As I already know the skip notification timer is 20 seconds, I'm expecting to see the countdown timer always start at 20. I believe the shutdown dialog is including the delay timer of 3 seconds in the math.

With some imagination, the 21 seconds could be a fixable incorrect +1 increment glitch. As the delay timer is defined as 3 seconds, the 18 seconds likely is a result of 21 - 3. If the +1 increment glitch is remedied, then I'd expect the shutdown timer to start at 17 seconds (20 - 3). If the skip notification section is patched to ignore the 3 second delay timer, then I'd expect the shutdown dialog to always begin at 20 seconds.

As non developer users will not be familiar with the code timer definitions, I think the countdown timer should start at 20 and not 20 - 3 (21 - 3). From an interface perspective, 20 seems like a "whole" number that is "intentional" whereas 18 seconds seems off the wall and strange. I understand the math but non developers users won't. :-)

My thinking: the shutdown dialog does not appear for 3 seconds and only appears when a process delays the logout. Therefore the 20 second skip notification timer should not include the 3 second delay timer. Both should be separate timers. This seems to be partially the case already because when the shutdown countdown timer terminated and reset, the timer reset to 21 (20) seconds rather than 18 seconds. So seems only the intialization of the countdown timer needs attention.

I'm going to update my entire source tree and rebuild a new package set. That will mean not testing tdeio-locate as we have been the past many days, but will reveal whether the latest X session changes work as intended.
Comment 139 Timothy Pearson 2013-08-06 15:20:20 CDT
(In reply to comment #138)
> I only rebuilt tdebase and tested kate.
> 
> Upon logging out, the kate dialog appears immediately. The shutdown dialog
> appears about 3 seconds later.

This is useful information.  This does not happen on my test machine, and it should not be happening at all.  I will need to look into this problem further.
Comment 140 Darrell 2013-08-06 16:26:37 CDT
For comparison sake, what happens at your end?
Comment 141 Darrell 2013-08-06 22:47:58 CDT
I built a new package set with the latest changes as of this afternoon. tdeio-locate no longer stalls the logout!

Short of temporarily reverting the patches to tdeio-locate, I don't know of any way to replicate the previous behavior. As the logout now proceeds normally, I was unable to see whether the status dialog message actually notified tdeio-locate or subsequently killed the process. The status messages scroll too fast. I noted the process IDs of tdeio-locate but did not find those IDs listed anywhere in the xsession-error log with respect to twin or ksmserver.

The status dialog still appears when the kate dialog appears.

The timeout numbers (math) need attention (comment 138).

The status dialog width needs widening.

A short list, really, considering everything else. :-)

Is there some way I can test further with respect to tdeio slaves and X11 session fixes? I don't pretend to understand all that you fixed, but I can still test.
Comment 142 Timothy Pearson 2013-08-08 00:50:43 CDT
(In reply to comment #141)
> The status dialog still appears when the kate dialog appears.

Should be fixed.

> The timeout numbers (math) need attention (comment 138).

Should be fixed via the dialog no longer appearing when a client is interacting.

> The status dialog width needs widening.

My only concern with this is 640x480 screens (yes, they still exist on servers and such!).  How much wider does it need to be?  Could I shorten the status dialog text instead?

> A short list, really, considering everything else. :-)

Yep!  I hope we can close this bug for good very soon!

So that this doesn't get overlooked, the other tdeioslaves should be patched the same way tdeio-locate was.  This would a one-line patch for all tdeioslaves and *shouldn't* cause any problems unless the slave saves some kind of data on session close (highly unlikely, but one never knows!).
Comment 143 Darrell 2013-08-08 15:54:47 CDT
I rebuilt tdebase.

Attempting to logout results in the normal status dialog.

Attempting to logout with an unsaved kate document results in the kate dialog appearing. The status dialog does not appear. When I acknowledge the kate dialog the logout proceeds without any status dialog ever appearing. Am I supposed to see the status dialog after acknowledging the kate dialog?

Regarding the status dialog width, I'm dumbfounded that 640x480 still dictates design concerns. :-) That said, I don't think shortening the status dialog message is a good idea.

If I knew where the width was defined I could tinker at my end (within the 640x480 limitation). Along that thought, would be nice if I could temporarily patch a 1/4 or 1/2 second delay in the status dialog messages so I can read them one-by-one. How would I patch that? This would be a test patch only.

I try to read the status messages but they fly by too fast. A temporary 1/4 or 1/2 second delay would allow me to read them. Once in a while, not always, I can see "notificati" or something like that and I believe that process is kdbusnotification. Yet most of the time I can't read anything at all. Yes, a testament to the snappy exit speed :-), but for "quality assurance purposes" I'd like to actually validate/read the status text messages and see the closing parentheses.

Regarding the remaining tdeio slaves you want to patch, I saw the tdeio-locate commit and the patch was a one-liner. I say go ahead and patch them all now. If something breaks, which I doubt, somebody will let us know soon enough. :-)
Comment 144 Darrell 2013-08-08 15:59:12 CDT
Note: one element we did not test is logging out when the user does not have the status dialog enabled. The previously broken tdeio-locate would have been good for that test. Without the status dialog I suspect the desktop just hangs with no notice or clue provided to the user.
Comment 145 Timothy Pearson 2013-08-08 23:33:48 CDT
In response to your queries regarding the location of the code setting the dialog width, I am pretty sure that the dialog width is dynamically set based on the TDE banner/header.  It's been a while since I worked on that code. :-)

IIRC, Windows used to insert ellipses if the text ran more than the width of some status dialogs.  Would that work here?  I don't want to tinker too much with the dialog as I like the look of it (artistically) the way it is now.
Comment 146 Darrell 2013-08-09 16:23:39 CDT
I don't know how the status dialog is sized. All I see in the dialog title bar is "Please wait..." and the dialog is much wider than that text snippet.

The truncated message text is cosmetic. I suppose reading all the text is meaningful only when a stall occurs. When a stall occurs the user might not be able to read the full string but should be able to read most of the string.

Most of the strings are readable, but some longer process names are truncated at the right edge of the dialog.

I'm rebuilding a full package set. I'll report later on the recent patches with interaction of the dialog and kate.

Is there anything we can do when users do not enable the status dialog and a logout stall occurs? Does the 20 second timeout still occur?
Comment 147 Darrell 2013-08-09 16:25:44 CDT
Will temporarily reversing commit 704ef698 2013-08-06 cause tdeio-locate to stall as previously experienced? We need a simple mechanism to test all of the recent patches as well as testing when the status dialog is not enabled.
Comment 148 Darrell 2013-08-09 23:23:03 CDT
* I tested the latest patches against kate with an unsaved document. For a brief moment I see the status dialog, the status dialog then disappears, then the kate dialog appears. I checked that the status dialog was not cleverly hidden behind the kate dialog, etc. :-) When I acknowledge the kate dialog the status dialog reappears and the logout proceeds. The status dialog is on screen less time that when there is no stall. I'm guessing the status dialog appears, kate stalls the process, the status dialog disappears but retains memory of previous notifications, and when reappearing resumes the process, which would explain why the dialog is on screen less time than normal. Just guessing. :-)

* I reversed the patch in tdeio-locate but that did not create a stall. Thus I no longer can test the two buttons.

* I don't know of any way to test a hidden stall as done with the previously broken tdeio-locate.

* A testing patch that adds a 1/4 to 1/2 second pause after each status dialog message would be a nice way to slow the logout process to be readable. Everything flies by so fast I don't know how we can really verify anything.
Comment 149 Timothy Pearson 2013-08-10 14:10:38 CDT
(In reply to comment #148)
> * I reversed the patch in tdeio-locate but that did not create a stall. Thus I
> no longer can test the two buttons.

That should have worked.  Did you log out and log in after building and installing the modified tdeio-locate?

> * I don't know of any way to test a hidden stall as done with the previously
> broken tdeio-locate.
> 
> * A testing patch that adds a 1/4 to 1/2 second pause after each status dialog
> message would be a nice way to slow the logout process to be readable.
> Everything flies by so fast I don't know how we can really verify anything.

You would need to add a "usleep(250000);" after each call to "static_cast<KSMShutdownIPDlg*>(shutdownNotifierIPDlg)->setStatusMessage(...);" in shutdown.cpp.

I would like to close this report if possible, and have you file a new report with a lower priority for the truncated text problem.  This report is an R14 blocker, whereas I don't think the truncated text should block R14. :-)
Comment 150 Darrell 2013-08-10 17:16:59 CDT
Durnit. My bad. I downloaded the tdeio-locate patch but forgot to actually apply. Reversing the patch does indeed restore the stall behavior. With that specially compiled package I now can test/observe the buttons again.

I'll tinker with a pause patch on my own.

I filed bug report 1618 for the remaining minor issues.