| Summary: | kdesktop uses 100% cpu and becomes unresponsive with multiple logins | ||
|---|---|---|---|
| Product: | TDE | Reporter: | Russell Brown <russell> |
| Component: | tdebase | Assignee: | Timothy Pearson <kb9vqf> |
| Status: | RESOLVED FIXED | ||
| Severity: | critical | CC: | bugwatch, russell, slavek.banko |
| Priority: | P5 | ||
| Version: | R14.0.x [Trinity] | ||
| Hardware: | amd64 | ||
| OS: | Debian Stretch | ||
| Compiler Version: | TDE Version String: | ||
| Application Version: | Application Name: | ||
| Bug Depends on: | |||
| Bug Blocks: | 2885 | ||
Hmmm... just for a laugh, I deleted kdesktop_lock on my test server and I can no longer get kdesktop into a 100% CPU loop. Obviously removing the facility to lock one's screen isn't a Real World solution but perhaps it'll help track down the issue. HTH More random flailing around (please tell me to stop if this is unhelpful). I've reinstated kdesktop_lock and noticed that when I login, TDE creates /tmp/tde-<username>/kdesktop_lock_lockfile. This contains the PID of the kdesktop_lock for the session. Logging in a second time as the same username, the PID at the top of this file is replaced by a different number and the users .xsession-errors starts filling up with: [tdecore] Deleting stale lockfile /tmp/tde-xtest/kdesktop_lock_lockfile [tdecore] Deleting stale lockfile /tmp/tde-xtest/kdesktop_lock_lockfile [tdecore] Deleting stale lockfile /tmp/tde-xtest/kdesktop_lock_lockfile [tdecore] Deleting stale lockfile /tmp/tde-xtest/kdesktop_lock_lockfile The original kdesktop_lock process no longer exists and a new one is being constantly respawned (presumably by the access/lstat/pipe chunk of code mentioned in my previous comment). As something in kdesktop/kdesktop_lock seems to be relying on this file, shouldn't it use the DISPLAY name rather than the username so that it's unique and refers to a specific display? Oh yes, looking at tdebase/kdesktop/lock/main.cc around line 332 (in the 14.0.5 source), there's a comment about it terminating the existing (stale) process. Hmmm... I'll try pulling the latest from GIT and compiling. If that works I'll alter lock/main.cc to include the DISPLAY name and see if that fixes it. As hoped, adding the $DISPLAY to the lockfile name appears to fix the problem.
Here's the code I put into kdesktop/lock/main.cc. Yes it's not right (not sure what the official TDE mechanism for mallocing space is) but it proves the point.
//TDELockFile lock(locateLocal("tmp", "kdesktop_lock_lockfile"));
char lockfilename[1024];
strcpy(lockfilename,"kdesktop_lock_lockfile.");
strcat(lockfilename,getenv("DISPLAY"));
TDELockFile lock(locateLocal("tmp", lockfilename));
Now why did that submit the comment before I'd finished typing? Anyway, to continue.... This creates: # ls -l /tmp/tde-xtest/ total 8 -rw-r--r-- 1 xtest tract 31 Feb 7 11:07 kdesktop_lock_lockfile.192.168.0.1:0 -rw-r--r-- 1 xtest tract 31 Feb 7 11:07 kdesktop_lock_lockfile.192.168.0.1:1 lrwxrwxrwx 1 xtest tract 33 Feb 7 11:07 tdesycoca -> /var/tmp/tdecache-xtest/tdesycoca # which stops kdeskop/kdesktop_lock fighting over control of the two different displays. Can someone who understands all of this pretty it up and include the fix? Resolved by commit d3834ad4f5 (master) and 5f7a92ca0a (r14.0.x). |
Logging in (via XDMCP) multiple times as the same user/uid is causing kdesktop to do weird refresh stuff then get stuck in a loop (100% CPU usage). I have seen this once-in-a-blue-moon under 14.0.5/Debian Jessie but 14.0.6/Debian Stretch seems to make it fail every time. Yes, I know it's bad practise to login multiple times as the same user but people will do these things despite being told not to :-( Anyway, to recreate this; login as a.n.user via XDMCP/tdm (I use physical X-Terminals or VNC sessions). Fire up another X-Terminal/VNC Session and login again as the same user. The mouse cursor will now flicker in a weird 'I'm Busy' kind of fashion and the desktop will refresh sparodically. You'll also see kdestops for the specific user consuming high amounts of CPU. Login again with a third session (or wait a while with two sessions) and you'll and up with a kdesktop using 100% CPU and being unresponsive. The looping kdesktop thread seems to be stuck in a polling loop on a pipe connected to the parent kdesktop process (verbose notes below). As a matter of interest, if you 'strace' the first kdesktop and watch as you login on the second session; kdesktop will start to constantly call access(1) and lstat(1) on /opt/trinity/bin/kdesktop_lock and creates and closes pipes. This doesn't seem sane to me. Here's an extract from such an strace: read(10, "\0", 16) = 1 wait4(31621, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGKILL}], WNOHANG, NULL) = 31621 access("/opt/trinity/bin/kdesktop_lock", F_OK) = 0 access("/opt/trinity/bin/kdesktop_lock", X_OK) = 0 lstat("/opt/trinity/bin/kdesktop_lock", {st_mode=S_IFREG|0755, st_size=323520, ...}) = 0 getpid() = 30005 geteuid() = 1001 pipe([18, 19]) = 0 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f376887bbd0) = 31624 close(19) = 0 read(18, "", 1) = 0 close(18) = 0 Is this bit related to Bug 2775? Anyway, here's the stuff I found once kdesktop gets itself into the 100% CPU loop. ------ PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 10285 utest 20 0 258196 23212 19128 S 100.0 0.0 45:48.54 kdesktop 21584 lls 20 0 44608 4636 2804 R 26.3 0.0 0:00.07 top 1 lls 20 0 205688 7984 5312 S 0.0 0.0 3:50.32 systemd Looking at the 100% kdesktop process, it has a thread (10287) running in a constant poll timeout loop (strace streams the lines below): poll([{fd=13, events=POLLIN}, {fd=15, events=POLLIN}], 2, 0) = 0 (Timeout) poll([{fd=13, events=POLLIN}, {fd=15, events=POLLIN}], 2, 0) = 0 (Timeout) poll([{fd=13, events=POLLIN}, {fd=15, events=POLLIN}], 2, 0) = 0 (Timeout) poll([{fd=13, events=POLLIN}, {fd=15, events=POLLIN}], 2, 0) = 0 (Timeout) Looking at /proc/10287/fd I see: lr-x------ 1 utest tract 64 Feb 6 11:17 0 -> pipe:[11722107] l-wx------ 1 utest tract 64 Feb 6 11:17 1 -> /home/utest/.xsession-errors lr-x------ 1 utest tract 64 Feb 6 11:17 10 -> pipe:[11687858] l-wx------ 1 utest tract 64 Feb 6 11:17 11 -> pipe:[11687858] lrwx------ 1 utest tract 64 Feb 6 11:17 12 -> socket:[11667305] lr-x------ 1 utest tract 64 Feb 6 11:17 13 -> pipe:[11688700] l-wx------ 1 utest tract 64 Feb 6 11:17 14 -> pipe:[11688700] lrwx------ 1 utest tract 64 Feb 6 11:17 15 -> anon_inode:[eventfd] lr-x------ 1 utest tract 64 Feb 6 11:17 16 -> /var/tmp/tdecache-utest/tdesycoca l-wx------ 1 utest tract 64 Feb 6 11:17 2 -> /home/utest/.xsession-errors lrwx------ 1 utest tract 64 Feb 6 11:17 3 -> socket:[11746354] lrwx------ 1 utest tract 64 Feb 6 11:17 4 -> socket:[11687855] lrwx------ 1 utest tract 64 Feb 6 11:17 5 -> socket:[11687856] lr-x------ 1 utest tract 64 Feb 6 11:17 6 -> pipe:[11687857] l-wx------ 1 utest tract 64 Feb 6 11:17 7 -> pipe:[11687857] lrwx------ 1 utest tract 64 Feb 6 11:17 8 -> anon_inode:[eventfd] lrwx------ 1 utest tract 64 Feb 6 11:17 9 -> anon_inode:[eventfd] The pipe on fd 13 for the thread seems to be connected to it's parent: # (find /proc -type l | xargs ls -l | grep 11688700) 2>/dev/null lr-x------ 1 utest tract 64 Feb 6 11:57 /proc/10285/fd/12 -> pipe:[11688700] l-wx------ 1 utest tract 64 Feb 6 11:57 /proc/10285/fd/13 -> pipe:[11688700] lr-x------ 1 utest tract 64 Feb 6 11:58 /proc/10285/task/10285/fd/12 -> pipe:[11688700] l-wx------ 1 utest tract 64 Feb 6 11:58 /proc/10285/task/10285/fd/13 -> pipe:[11688700] lr-x------ 1 utest tract 64 Feb 6 11:58 /proc/10285/task/10287/fd/12 -> pipe:[11688700] l-wx------ 1 utest tract 64 Feb 6 11:58 /proc/10285/task/10287/fd/13 -> pipe:[11688700] # ------- I'm very happy to test, log, check out ideas as I have a test server available.