By default, Bugzilla does not search the list of RESOLVED bugs.
You can force it to do so by putting the upper-case word ALL in front of your search query, e.g. ALL tdelibs
We recommend searching for bugs this way, as you may discover that your bug has already been resolved and fixed in a later release.
Bug 1508 - Qt3/TQt3 contains numerous threading data races
Summary: Qt3/TQt3 contains numerous threading data races
Status: RESOLVED FIXED
Alias: None
Product: TDE
Classification: Unclassified
Component: qt3
Version: R14.0.0 [Trinity]
Hardware: Other Linux
Importance: P1 blocker
Assignee: Timothy Pearson
URL:
Depends on:
Blocks: 1404 1453
Reported: 2013-05-11 18:59 CDT by Timothy Pearson
Modified: 2013-06-08 02:19 CDT
CC List: 4 users

See Also:
Compiler Version:
TDE Version String:
Application Version:
Application Name:


Attachments
Fix a number of major threading failures in Qt3/TQt3 (53.22 KB, patch)
2013-05-13 02:46 CDT, Timothy Pearson

Description Timothy Pearson 2013-05-11 18:59:26 CDT
Running Qt3/TQt3 applications under Valgrind's DRD tool reveals numerous threading data races in the Qt3/TQt3 libraries.  These data races are likely the root cause of many subtle, hard-to-reproduce TDE bugs.

I am currently working on introducing the required thread mutexes into the Qt3/TQt3 libraries.  This report exists primarily so that the various threading bugs in the tracker can be marked as depending on it.
Comment 1 Darrell 2013-05-12 14:20:49 CDT
First, is there an estimated timeline for completion? Second, is this the type of issue that halts the nightly builds until further notice?
Comment 2 Timothy Pearson 2013-05-12 15:51:19 CDT
No timeline yet, as this is a complex issue that needs careful, meticulous work to avoid introducing deadlocks and other "strange"/random failures.

Regarding the nightly builds, I recently worked through the remaining barriers to automation.  Therefore, the nightly builds are, for the first time, actually run every night as needed, and do not need to be halted as they were in the past.
Comment 3 Timothy Pearson 2013-05-13 02:46:43 CDT
Created attachment 1274
Fix a number of major threading failures in Qt3/TQt3

I believe I have isolated a majority of the larger threading failures in TQt3.  A patch is attached which should resolve those threading failures, but I need volunteers to test this patch on single-core and multicore systems to ensure that no regressions were accidentally introduced.  For example, locked-up applications and noticeable performance drops would be considered regressions.

If no regressions are reported, I will merge this patch into both Qt3 and TQt3.  Please note that there may be other threading failures in Qt3/TQt3 even with this patch; however, most of the major ones should be fixed.

While a full TDE rebuild is not needed, the following modules (at a minimum) should be rebuilt, as they are either used by or contain threaded applications:
tdelibs
tdebase
tdeutils
tdenetwork
amarok
Comment 4 Darrell 2013-05-13 12:59:16 CDT
What kind of failures/problems should one see before applying the patch? Basic crashing and slowdowns?

I see bug report 1404 listed as a dependency report to this bug report. Testing will require several days because I don't use ark every five minutes; perhaps not at all for days and as reported in 1404, the bug is not repeatable. I presume, however, the backtraces in 1404 led to the patch in attachment 1274?

I don't use TDM either (bug report 1453).

On a side note, despite the great work in bug report 760, I have of late been witnessing complete stalls when exiting Trinity. Not repeatable but the stall is sufficiently long (black screen of death) that I use Ctrl-Alt-Backspace to kill X altogether. Could this be related?

I'm just looking for specific details for testing. I get the feeling this bug report is important. :-)
Comment 5 Timothy Pearson 2013-05-13 13:43:58 CDT
(In reply to comment #4)
> What kind of failures/problems should one see before applying the patch? Basic
> crashing and slowdowns?

Correct.

> I see bug report 1404 listed as a dependency report to this bug report. Testing
> will require several days because I don't use ark every five minutes; perhaps
> not at all for days and as reported in 1404, the bug is not repeatable. I
> presume, however, the backtraces in 1404 led to the patch in attachment 1274?

No, they did not.  For me, Ark was misbehaving by using 100% CPU after initially loading an archive, but I could not get it to crash.  Therefore, I suspected a threading problem and ran Ark through Helgrind and DRD, both of which revealed numerous data races in TQt3.  Eliminating those data races resulted in the patch attached to this report.  Interestingly, the Ark high-CPU bug is NOT resolved by this patch, even though Helgrind reports no errors from Ark.

> I don't use TDM either (bug report 1453).
> 
> On a side note, despite the great work in bug report 760, I have of late been
> witnessing complete stalls when exiting Trinity. Not repeatable but the stall
> is sufficiently long (black screen of death) that I use Ctrl-Alt-Backspace to
> kill X altogether. Could this be related?

I don't know.  Try enabling shutdown profiling to see which application is stalling the shutdown.  Unfortunately, due to the way the X session management system is designed, any faulty application (even a non-TDE application!) can stall the shutdown process for a long period of time.

> I'm just looking for specific details for testing. I get the feeling this bug
> report is important. :-)

Yes it is. :-)
Comment 6 Darrell 2013-05-13 13:55:03 CDT
What is shutdown profiling?
Comment 7 Darrell 2013-05-13 14:06:36 CDT
Never mind --- I remember: -DBUILD_PROFILE_SHUTDOWN=ON in tdebase configure options.
Comment 8 Darrell 2013-05-13 20:09:39 CDT
I rebuilt a fresh package set including the patch in attachment 1274.

I am not seeing any CPU hogging with any apps. A few spot checks with ark and I am not seeing 100% usage (except when opening large archives) and no crashing (bug report 1404). Probably will require a few days of observation before trying to conclude anything.

Monitoring processes with top reveals kded and ksmserver both consume a consistent 4% to 5% CPU time. Not sure what that is about.
Comment 9 Timothy Pearson 2013-05-14 19:36:17 CDT
Comment on attachment 1274
Fix a number of major threading failures in Qt3/TQt3

This patch, along with thread termination improvements, was pushed to GIT in hashes 4eba9b8 (Qt3) and 9a4765a (TQt3).
Comment 10 Alex Couture 2013-05-18 15:39:50 CDT
(In reply to comment #8)
> I rebuilt a fresh package set including the patch in attachment 1274.
> 
> I am not seeing any CPU hogging with any apps. A few spot checks with ark and I
> am not seeing 100% usage (except when opening large archives) and no crashing
> (bug report 1404). Probably will require a few days of observation before
> trying to conclude anything.
> 
> Monitoring processes with top reveals kded and ksmserver both consume a
> consistent 4% to 5% CPU time. Not sure what that is about.

Yes, I'm seeing this too: kded and ksmserver on my Asus EEE (first-gen) with Ubuntu 13.04 and TDE R14 nightlies. With the desktop idle, in Konsole (with top) I see kded taking 23.6% of the CPU and ksmserver 23.4% of the CPU. Please note that it is an 800 MHz Celeron CPU, so it's higher than your 4-5% CPU use, but it really slows down the laptop to an almost unusable state.

Is there a way I can give debug info on this problem?

-Alexandre
Comment 11 Timothy Pearson 2013-05-18 16:13:21 CDT
(In reply to comment #10)
> (In reply to comment #8)
> > I rebuilt a fresh package set including the patch in attachment 1274.
> > 
> > I am not seeing any CPU hogging with any apps. A few spot checks with ark and I
> > am not seeing 100% usage (except when opening large archives) and no crashing
> > (bug report 1404). Probably will require a few days of observation before
> > trying to conclude anything.
> > 
> > Monitoring processes with top reveals kded and ksmserver both consume a
> > consistent 4% to 5% CPU time. Not sure what that is about.
> 
> Yes, I'm seeing this too, kded and ksmserver on my Asus EEE (first-gen) with
> Ubuntu 13.04 and TDE R14 nightlies, with the desktop on idle, in Konsole (with
> top) I see kded taking 23.6% of the CPU and ksmserver 23.4% of the CPU. Please
> note that it is an 800 MHz Celeron CPU, so it's higher than your 4-5% CPU use,
> but it really slows down the laptop to an almost unusable state.
> 
> Is there a way I can give debug info on this problem?
> 
> -Alexandre

Yes, though I do not think this information belongs in this bug report, as ksmserver and kded are not multithreaded.

Please file a new report with the following information, after ensuring that both the tdelibs and tdebase debugging symbols are installed and that all packages are up to date.

1.) Is this a regression (i.e. is the relatively high CPU use absent with TDE 3.5.13.x)?
2.) Break into kded with gdb (gdb --pid `pidof kded`) and execute 'bt', then post the output
3.) Break into ksmserver with gdb (gdb --pid `pidof ksmserver`) and execute 'bt', then post the output

Thanks!

With respect to this bug report, I am changing its status to NEEDINFO, as the threading fixes have been in GIT for some time now and Helgrind now reports no data races in applications such as Ark.  I primarily need to know whether any regressions are traceable to this report; if no one reports any regressions within the next couple of weeks, I will close this report as fixed.
Comment 12 Darrell 2013-05-18 16:54:48 CDT
I'll file a new bug report for kded/ksmserver.

With respect to this report, leaving it open for a few weeks does not impact the R14.0.0 release plan. If nobody reports anything new in a few weeks, we close the report and strike the item from the road map. :-)

Because of the related bug reports I have been updating my full package set every few days. I'm not seeing any crashes, slowdowns, stalls, etc. Ark has been stable. We have bug report 1404 to specifically address Ark, and we can leave that open for a few more weeks too, to allow me time to use Ark under normal usage habits.

At the moment for this report, thumbs up. :-)
Comment 13 Timothy Pearson 2013-06-08 02:19:30 CDT
After three weeks, no new crashes have been observed and the system seems slightly more stable, so I am marking this one as RESOLVED FIXED.  If Valgrind or a similar tool positively traces a data race into Qt3/TQt3, please reopen this report and append detailed information about the new data race.

Thanks all for testing the patches!