By default, Bugzilla does not search the list of RESOLVED bugs.
You can force it to do so by putting the upper-case word ALL in front of your search query, e.g.: ALL tdelibs
We recommend searching for bugs this way, as you may discover that your bug has already been resolved and fixed in a later release.
Bug 2998 - kdesktop uses 100% cpu and becomes unresponsive with multiple logins
Summary: kdesktop uses 100% cpu and becomes unresponsive with multiple logins
Status: RESOLVED FIXED
Alias: None
Product: TDE
Classification: Unclassified
Component: tdebase (show other bugs)
Version: R14.0.x [Trinity]
Hardware: amd64 Debian Stretch
: P5 critical
Assignee: Timothy Pearson
URL:
Depends on:
Blocks: R14.0.6
  Show dependency treegraph
 
Reported: 2019-02-06 10:07 CST by Russell Brown
Modified: 2019-02-11 10:46 CST (History)
3 users (show)

See Also:
Compiler Version:
TDE Version String:
Application Version:
Application Name:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Russell Brown 2019-02-06 10:07:13 CST
Logging in (via XDMCP) multiple times as the same user/uid is causing
kdesktop to do weird refresh stuff then get stuck in a loop (100% CPU
usage).  I have seen this once-in-a-blue-moon under 14.0.5/Debian Jessie
but 14.0.6/Debian Stretch seems to make it fail every time.

Yes, I know it's bad practise to login multiple times as the same user
but people will do these things despite being told not to :-(

Anyway, to recreate this; login as a.n.user via XDMCP/tdm (I use physical X-Terminals or VNC sessions).

Fire up another X-Terminal/VNC Session and login again as the same user.

The mouse cursor will now flicker in a weird 'I'm Busy' kind of fashion and the desktop will refresh sparodically.

You'll also see kdestops for the specific user consuming high amounts of CPU.

Login again with a third session (or wait a while with two sessions) and you'll and up with a kdesktop using 100% CPU and being unresponsive.

The looping kdesktop thread seems to be stuck in a polling loop on a pipe connected to the parent kdesktop process (verbose notes below).

As a matter of interest, if you 'strace' the first kdesktop and watch as you login on the second session; kdesktop will start to constantly call access(1) and lstat(1) on /opt/trinity/bin/kdesktop_lock and creates and closes pipes. This doesn't seem sane to me.

Here's an extract from such an strace:

read(10, "\0", 16)                      = 1
wait4(31621, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGKILL}], WNOHANG, NULL) = 31621
access("/opt/trinity/bin/kdesktop_lock", F_OK) = 0
access("/opt/trinity/bin/kdesktop_lock", X_OK) = 0
lstat("/opt/trinity/bin/kdesktop_lock", {st_mode=S_IFREG|0755, st_size=323520, ...}) = 0
getpid()                                = 30005
geteuid()                               = 1001
pipe([18, 19])                          = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f376887bbd0) = 31624
close(19)                               = 0
read(18, "", 1)                         = 0
close(18)                               = 0

Is this bit related to Bug 2775?

Anyway, here's the stuff I found once kdesktop gets itself into the 100% CPU loop.

------  
  PID   USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  10285 utest     20   0  258196  23212  19128 S 100.0  0.0  45:48.54 kdesktop
  21584 lls       20   0   44608   4636   2804 R  26.3  0.0   0:00.07 top
      1 lls       20   0  205688   7984   5312 S   0.0  0.0   3:50.32 systemd

Looking at the 100% kdesktop process, it has a thread (10287) running in
a constant poll timeout loop (strace streams the lines below):

poll([{fd=13, events=POLLIN}, {fd=15, events=POLLIN}], 2, 0) = 0 (Timeout)
poll([{fd=13, events=POLLIN}, {fd=15, events=POLLIN}], 2, 0) = 0 (Timeout)
poll([{fd=13, events=POLLIN}, {fd=15, events=POLLIN}], 2, 0) = 0 (Timeout)
poll([{fd=13, events=POLLIN}, {fd=15, events=POLLIN}], 2, 0) = 0 (Timeout)

Looking at /proc/10287/fd I see:

lr-x------ 1 utest tract 64 Feb  6 11:17 0 -> pipe:[11722107]
l-wx------ 1 utest tract 64 Feb  6 11:17 1 -> /home/utest/.xsession-errors
lr-x------ 1 utest tract 64 Feb  6 11:17 10 -> pipe:[11687858]
l-wx------ 1 utest tract 64 Feb  6 11:17 11 -> pipe:[11687858]
lrwx------ 1 utest tract 64 Feb  6 11:17 12 -> socket:[11667305]
lr-x------ 1 utest tract 64 Feb  6 11:17 13 -> pipe:[11688700]
l-wx------ 1 utest tract 64 Feb  6 11:17 14 -> pipe:[11688700]
lrwx------ 1 utest tract 64 Feb  6 11:17 15 -> anon_inode:[eventfd]
lr-x------ 1 utest tract 64 Feb  6 11:17 16 -> /var/tmp/tdecache-utest/tdesycoca
l-wx------ 1 utest tract 64 Feb  6 11:17 2 -> /home/utest/.xsession-errors
lrwx------ 1 utest tract 64 Feb  6 11:17 3 -> socket:[11746354]
lrwx------ 1 utest tract 64 Feb  6 11:17 4 -> socket:[11687855]
lrwx------ 1 utest tract 64 Feb  6 11:17 5 -> socket:[11687856]
lr-x------ 1 utest tract 64 Feb  6 11:17 6 -> pipe:[11687857]
l-wx------ 1 utest tract 64 Feb  6 11:17 7 -> pipe:[11687857]
lrwx------ 1 utest tract 64 Feb  6 11:17 8 -> anon_inode:[eventfd]
lrwx------ 1 utest tract 64 Feb  6 11:17 9 -> anon_inode:[eventfd]

The pipe on fd 13 for the thread seems to be connected to it's parent:

# (find /proc -type l | xargs ls -l | grep 11688700) 2>/dev/null
lr-x------ 1 utest tract 64 Feb  6 11:57 /proc/10285/fd/12 -> pipe:[11688700]
l-wx------ 1 utest tract 64 Feb  6 11:57 /proc/10285/fd/13 -> pipe:[11688700]
lr-x------ 1 utest tract 64 Feb  6 11:58 /proc/10285/task/10285/fd/12 -> pipe:[11688700]
l-wx------ 1 utest tract 64 Feb  6 11:58 /proc/10285/task/10285/fd/13 -> pipe:[11688700]
lr-x------ 1 utest tract 64 Feb  6 11:58 /proc/10285/task/10287/fd/12 -> pipe:[11688700]
l-wx------ 1 utest tract 64 Feb  6 11:58 /proc/10285/task/10287/fd/13 -> pipe:[11688700]
#

-------

I'm very happy to test, log, check out ideas as I have a test server available.
Comment 1 Russell Brown 2019-02-06 10:28:28 CST
Hmmm... just for a laugh, I deleted kdesktop_lock on my test server and I can no longer get kdesktop into a 100% CPU loop.

Obviously removing the facility to lock one's screen isn't a Real World solution but perhaps it'll help track down the issue.

HTH
Comment 2 Russell Brown 2019-02-06 11:44:00 CST
More random flailing around (please tell me to stop if this is unhelpful).

I've reinstated kdesktop_lock and noticed that when I login, TDE creates /tmp/tde-<username>/kdesktop_lock_lockfile.

This contains the PID of the kdesktop_lock for the session.

Logging in a second time as the same username, the PID at the top of this file is replaced by a different number and the users .xsession-errors starts filling up with:

[tdecore] Deleting stale lockfile /tmp/tde-xtest/kdesktop_lock_lockfile
[tdecore] Deleting stale lockfile /tmp/tde-xtest/kdesktop_lock_lockfile
[tdecore] Deleting stale lockfile /tmp/tde-xtest/kdesktop_lock_lockfile
[tdecore] Deleting stale lockfile /tmp/tde-xtest/kdesktop_lock_lockfile

The original kdesktop_lock process no longer exists and a new one is being constantly respawned (presumably by the access/lstat/pipe chunk of code mentioned in my previous comment).

As something in kdesktop/kdesktop_lock seems to be relying on this file, shouldn't it use the DISPLAY name rather than the username so that it's unique and refers to a specific display?

Oh yes, looking at tdebase/kdesktop/lock/main.cc around line 332 (in the 14.0.5 source), there's a comment about it terminating the existing (stale) process.

Hmmm... I'll try pulling the latest from GIT and compiling. If that works I'll alter lock/main.cc to include the DISPLAY name and see if that fixes it.
Comment 3 Russell Brown 2019-02-07 05:19:49 CST
As hoped, adding the $DISPLAY to the lockfile name appears to fix the problem.

Here's the code I put into kdesktop/lock/main.cc. Yes it's not right (not sure what the official TDE mechanism for mallocing space is) but it proves the point.

//TDELockFile lock(locateLocal("tmp", "kdesktop_lock_lockfile"));
        char lockfilename[1024];
        strcpy(lockfilename,"kdesktop_lock_lockfile.");
        strcat(lockfilename,getenv("DISPLAY"));
        TDELockFile lock(locateLocal("tmp", lockfilename));
Comment 4 Russell Brown 2019-02-07 05:23:50 CST
Now why did that submit the comment before I'd finished typing?

Anyway, to continue....

This creates:

# ls -l /tmp/tde-xtest/
total 8
-rw-r--r-- 1 xtest tract 31 Feb  7 11:07 kdesktop_lock_lockfile.192.168.0.1:0
-rw-r--r-- 1 xtest tract 31 Feb  7 11:07 kdesktop_lock_lockfile.192.168.0.1:1
lrwxrwxrwx 1 xtest tract 33 Feb  7 11:07 tdesycoca -> /var/tmp/tdecache-xtest/tdesycoca
# 

which stops kdeskop/kdesktop_lock fighting over control of the two different displays.

Can someone who understands all of this pretty it up and include the fix?
Comment 5 Slávek Banko 2019-02-11 10:46:04 CST
Resolved by commit d3834ad4f5 (master) and 5f7a92ca0a (r14.0.x).