[00:18:08] Continuous-Integration, Upstream: Change force merged cause a deadlock in Zuul gate-and-submit pipeline - https://phabricator.wikimedia.org/T93812#1208500 (Legoktm) Is there a link for an upstream ticket?
[02:23:16] Release-Engineering, Gerrit-Migration, Gitblit-Deprecate: Import all gerrit.wikimedia.org repositories with Diffusion - https://phabricator.wikimedia.org/T616#1208588 (demon) This is basically almost all done. {P349}
[05:25:49] Release-Engineering, MediaWiki-Debug-Logging, MediaWiki-General-or-Unknown, MediaWiki-Tarball-Backports, and 3 others: Create a minimal backport of PSR-3 logging to MediaWiki 1.23 LTS - https://phabricator.wikimedia.org/T91653#1208701 (Legoktm) Open>Resolved
[05:29:03] legoktm@deployment-bastion:~$ mwscript ../../../../home/legoktm/listUnattached.php --wiki=enwiki
[05:29:03] The MediaWiki script file "/mnt/srv/mediawiki-staging/php-master/../../../../home/legoktm/listUnattached.php" does not exist.
[05:29:09] it does exist!
[05:34:56] and deployment-mediawiki01 doesn't even have mwscript
[05:36:04] > The last Puppet run was at Sun Apr 12 00:28:36 UTC 2015 (4627 minutes ago).
[05:39:42] ok, you didn't see anything.
[05:44:46] > ERROR - File not found: /home/legoktm/unattached.txt
[05:44:48] um what
[05:44:59] does my home just not exist or something?
[05:47:54] I put it in /tmp for now and it's working
[05:47:55] weird
[06:31:27] Beta-Cluster, MediaWiki-API-Team, SUL-Finalization: Finalize SUL on beta cluster - https://phabricator.wikimedia.org/T96075#1208729 (Legoktm) Open>Resolved The following 13 users were renamed: ``` mysql> select * from users_to_rename; +--------+-------------------------------+------------------+...
[08:01:12] zeljkof: fighting with hangout
[08:01:22] hashar: :D
[08:05:05] hashar: https://github.com/zeljkofilipin/selenium-vagrant
[08:11:56] Browser-Tests, Patch-For-Review: IE Browser tests job have no test being run due to a mistake in cucumber tag - https://phabricator.wikimedia.org/T95398#1208843 (hashar) Waiting for @zeljkofilipin to confirm the jobs are working just fine.
[08:13:28] Browser-Tests: Transfer the main Sauce Labs account to a generic WMF account - https://phabricator.wikimedia.org/T94191#1208845 (hashar) Please document on Office Wiki the SauceLabs account and email alias being used :)
[08:15:58] !log Jenkins process went wild taking all CPU busy on gallium
[08:16:04] Logged the message, Master
[08:17:46] !log CPU spike started around 6:20am UTC
[08:19:52] !log Exception in thread "RequestHandlerThread[#2]" java.lang.OutOfMemoryError: Java heap space
[08:19:56] stupid bot
[08:20:12] Logged the message, Master
[08:22:54] !log restarted Jenkins
[08:22:59] Logged the message, Master
[09:06:35] Continuous-Integration-Isolation, Phabricator: Create a yellow project for 'nodepool' - https://phabricator.wikimedia.org/T95965#1208899 (hashar) Self remember, should tag P513 and P514 with it.
[09:08:38] Continuous-Integration, Continuous-Integration-Isolation, operations: install/setup/deploy cobalt as replacement for gallium - https://phabricator.wikimedia.org/T95959#1208904 (hashar) Should we start drawing a network diagram representing the different lan / vlan we have and the traffic flows between...
[09:26:32] Continuous-Integration, Continuous-Integration-Isolation, operations, Patch-For-Review, Upstream: Create a Debian package for NodePool on Debian Jessie - https://phabricator.wikimedia.org/T89142#1208920 (hashar) From upstream at http://lists.openstack.org/pipermail/openstack-infra/2015-April/0026...
[09:57:04] Continuous-Integration, Continuous-Integration-Isolation, operations, Patch-For-Review: Provide Debian package python-pymysql for jessie-wikimedia - https://phabricator.wikimedia.org/T96131#1208959 (hashar) NEW
[10:11:23] Continuous-Integration-Isolation, Phabricator: Create a yellow project for 'nodepool' - https://phabricator.wikimedia.org/T95965#1208969 (Aklapper) p:Triage>Normal Feel free to just paste a project description here (like in the Zuul link that you pasted) so I don't have to come up with one. ;) See ht...
[10:16:04] Continuous-Integration, Continuous-Integration-Isolation, operations: Provide Debian package python-pymysql for jessie-wikimedia - https://phabricator.wikimedia.org/T96131#1208971 (hashar)
[10:25:49] Continuous-Integration, Continuous-Integration-Isolation, operations, Patch-For-Review, Upstream: Create a Debian package for NodePool on Debian Jessie - https://phabricator.wikimedia.org/T89142#1208974 (hashar)
[10:26:33] Continuous-Integration, Continuous-Integration-Isolation, operations, Patch-For-Review, Upstream: Create a Debian package for NodePool on Debian Jessie - https://phabricator.wikimedia.org/T89142#1028174 (hashar)
[10:26:58] Continuous-Integration, Continuous-Integration-Isolation, operations, Patch-For-Review, Upstream: Create a Debian package for NodePool on Debian Jessie - https://phabricator.wikimedia.org/T89142#1028174 (hashar) I have updated the dependency table in the task details to take in account Jessie ins...
[11:41:12] Browser-Tests, Patch-For-Review: IE Browser tests job have no test being run due to a mistake in cucumber tag - https://phabricator.wikimedia.org/T95398#1209023 (zeljkofilipin) All done!
{icon ship}
[11:41:24] Browser-Tests, Patch-For-Review: IE Browser tests job have no test being run due to a mistake in cucumber tag - https://phabricator.wikimedia.org/T95398#1209024 (zeljkofilipin) Open>Resolved
[12:26:27] Browser-Tests, Patch-For-Review: IE Browser tests job have no test being run due to a mistake in cucumber tag - https://phabricator.wikimedia.org/T95398#1209040 (hashar) \o/
[12:27:57] Release-Engineering, Labs-Vagrant, MediaWiki-General-or-Unknown, MediaWiki-Vagrant, Wikimedia-Git-or-Gerrit: Create ability to trivially spin up MediaWiki instance of a particular Gerrit changeset - https://phabricator.wikimedia.org/T76245#1209044 (hashar) a:hashar>None
[12:28:17] Release-Engineering, Labs-Vagrant, MediaWiki-General-or-Unknown, MediaWiki-Vagrant, Wikimedia-Git-or-Gerrit: Create ability to trivially spin up MediaWiki instance of a particular Gerrit changeset - https://phabricator.wikimedia.org/T76245#793630 (hashar) Unassigning myself since that is merel...
[12:33:24] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: Name or service not known
[12:35:01] Continuous-Integration, Regression, Upstream: ERROR: Failed to notify endpoint 'HTTP:http://127.0.0.1:8001/jenkins_endpoint' - https://phabricator.wikimedia.org/T93321#1209053 (hashar) Found an upstream bug report: https://issues.jenkins-ci.org/browse/JENKINS-27323
[12:39:23] (PS4) Zfilipin: Move WB_REPO_PASSWORD environment variable to Jenkins Credentials plugin store [integration/config] - https://gerrit.wikimedia.org/r/203309 (https://phabricator.wikimedia.org/T89343)
[12:39:53] (CR) Zfilipin: [C: 2] "Got 2 +1s, rebasing and merging."
[integration/config] - https://gerrit.wikimedia.org/r/203309 (https://phabricator.wikimedia.org/T89343) (owner: Zfilipin)
[12:42:45] Continuous-Integration, Regression, Upstream: ERROR: Failed to notify endpoint 'HTTP:http://127.0.0.1:8001/jenkins_endpoint' - https://phabricator.wikimedia.org/T93321#1209066 (hashar) I have downgraded the plugin from 1.9 to previous 1.7.
[12:42:51] (Merged) jenkins-bot: Move WB_REPO_PASSWORD environment variable to Jenkins Credentials plugin store [integration/config] - https://gerrit.wikimedia.org/r/203309 (https://phabricator.wikimedia.org/T89343) (owner: Zfilipin)
[12:43:11] hashar: argh! jenkins is behaving again :'(
[12:43:21] zeljkof-lunch: what do you mean?
[12:43:35] * zeljkof has forgot to change the nick back
[12:44:05] hashar: I am trying to save https://integration.wikimedia.org/ci/configure after deleting WB_REPO_PASSWORD
[12:44:18] and getting 503
[12:44:23] ganglia shows a spike of IO load http://ganglia.wikimedia.org/latest/?c=Miscellaneous%20eqiad&h=gallium.wikimedia.org&m=cpu_report&r=hour&s=descending&hc=4&mc=2
[12:45:49] hashar: anyway, looks like the env var is deleted
[12:47:28] seems it was a short spike
[12:48:07] oh
[12:48:08] Handling POST /ci/configSubmit from 109.60.85.134 : RequestHandlerThread[#122]
[12:48:17] that is a stuck thread for 350 seconds
[12:48:19] seen at https://integration.wikimedia.org/ci/monitoring
[12:48:25] RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 27705 bytes in 0.547 second response time
[12:48:39] killed it
[12:56:05] Browser-Tests: Transfer the main Sauce Labs account to a generic WMF account - https://phabricator.wikimedia.org/T94191#1209089 (zeljkofilipin) > Renata Santillan > Sauce Labs > > Hi Željko, > > I have moved all the accounts to be a subaccount of wikimedia. Please check it out and let me know if anything...
[13:01:25] Continuous-Integration-Isolation, Phabricator: Create a yellow project for 'nodepool' - https://phabricator.wikimedia.org/T95965#1209095 (hashar) There is no documentation written yet, the project is still in its early phase. Copy pasting from Zuul I came up with: ---- Nodepool is a python daemon which s...
[13:03:42] Browser-Tests: Transfer the main Sauce Labs account to a generic WMF account - https://phabricator.wikimedia.org/T94191#1209104 (zeljkofilipin) > zfilipin > Wikimedia > > Hi Renata, > > everything looks fine. > > Thanks! > > Željko > > April 15, 2015, 3:02 PM
[13:12:17] Browser-Tests: Transfer the main Sauce Labs account to a generic WMF account - https://phabricator.wikimedia.org/T94191#1209121 (zeljkofilipin) >>! In T94191#1208845, @hashar wrote: > Please document on Office Wiki the SauceLabs account and email alias being used :) [[ https://office.wikimedia.org/w/index.p...
[13:13:32] Continuous-Integration, Upstream: Change force merged cause a deadlock in Zuul gate-and-submit pipeline - https://phabricator.wikimedia.org/T93812#1209124 (hashar) That was from a discussion on #openstack-infra http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2015-04-14.log -...
[13:16:12] Browser-Tests: Create Sauce Labs account for Andrew Russell Green - https://phabricator.wikimedia.org/T94192#1209127 (zeljkofilipin)
[13:16:13] Browser-Tests, Release-Engineering: Things to do after Chris leaves - https://phabricator.wikimedia.org/T94032#1209126 (zeljkofilipin)
[13:16:15] Browser-Tests: Transfer the main Sauce Labs account to a generic WMF account - https://phabricator.wikimedia.org/T94191#1209125 (zeljkofilipin) Open>Resolved
[13:16:56] Browser-Tests, Release-Engineering: Things to do after Chris leaves - https://phabricator.wikimedia.org/T94032#1209129 (zeljkofilipin) stalled>Resolved a:zeljkofilipin
[13:17:25] Browser-Tests: Create Sauce Labs account for Andrew Russell Green - https://phabricator.wikimedia.org/T94192#1209131 (zeljkofilipin) a:zeljkofilipin
[13:20:21] Browser-Tests: Create Sauce Labs account for Andrew Russell Green - https://phabricator.wikimedia.org/T94192#1209132 (zeljkofilipin) Sent invitation to agreen@wikimedia.org.
[13:20:39] Browser-Tests: Create Sauce Labs account for Andrew Russell Green - https://phabricator.wikimedia.org/T94192#1209136 (zeljkofilipin) a:zeljkofilipin>AndyRussG
[13:22:13] Browser-Tests: Create Sauce Labs account for Andrew Russell Green - https://phabricator.wikimedia.org/T94192#1157154 (zeljkofilipin) @andyrussg: please mark the task as resolved when you create the account. Apologies for the delay, the main account was created today.
[13:22:40] zeljkof: thanks!
[13:23:22] AndyRussG: apologies for the delay, the main account was created earlier today :(
[13:23:33] zeljkof: no problem at all! :)
[13:23:40] Really appreciate the help
[13:27:12] AndyRussG: I am glad I could help, I just wish it was done a week or two ago :)
[13:53:05] zeljkof: it's cool! I'm glad to see browser tests working again BTW
[13:53:34] Now that they're back up, I'll make a patch to send a message to fr-tech whenever they fail--that's OK, I guess?
[13:54:31] AndyRussG: that is actually preferred :)
[14:10:57] zeljkof: K will do! Gotta run for now, thanks, cya :)
[14:32:33] Continuous-Integration, Ops-Access-Requests, operations: Add user wmde-fisch to LDAP group wmde - https://phabricator.wikimedia.org/T95546#1209309 (hashar) The Jenkins account shows up with the 'wmde' group at https://integration.wikimedia.org/ci/user/wmde-fisch/ @WMDE-Fisch should thus be able to co...
[14:42:05] Continuous-Integration, Wikidata, Wikidata-Sprint-2015-04-07: the changed job configuration extension-unittests -> extension-unittests-generic for Wikidata.git makes it not run all tests and fail - https://phabricator.wikimedia.org/T95897#1209321 (hashar) To test extensions we load them from their PH...
[15:00:45] Continuous-Integration: Dequeue changes from 'test' when they enter 'gate-and-submit' pipeline - https://phabricator.wikimedia.org/T78328#1209356 (hashar)
[15:01:18] Continuous-Integration: Dequeue changes from 'test' when they enter 'gate-and-submit' pipeline - https://phabricator.wikimedia.org/T78328#842598 (hashar) I have rephrased the task detail, the idea would be to have Zuul to dequeue from other pipelines when a change enter one.
[15:05:20] Continuous-Integration: php-composer-validate job should not be triggered if a composer.json file is removed from the repository - https://phabricator.wikimedia.org/T89263#1209368 (hashar) In Zuul, the job has a file filter applied to it: > **files (optional)** > This job should only be run if at least one...
[15:07:18] Continuous-Integration, MediaWiki-extensions-ContentTranslation, Patch-For-Review: QUnit tests for ContentTranslation fail - unable to merge commits - https://phabricator.wikimedia.org/T93510#1209373 (hashar) Is that still an issue?
[15:08:38] Browser-Tests, Continuous-Integration: Have wmf-insecte use color to make the reading of the scrollback easier - https://phabricator.wikimedia.org/T64573#1209374 (hashar)
[15:11:16] Continuous-Integration, MediaWiki-extensions-ContentTranslation, Patch-For-Review: QUnit tests for ContentTranslation fail - unable to merge commits - https://phabricator.wikimedia.org/T93510#1209381 (Nikerabbit) I haven't seen that error in a while, if that is what you are asking.
[15:12:42] Browser-Tests, Continuous-Integration, Upstream: Have wmf-insecte use color to make the reading of the scrollback easier - https://phabricator.wikimedia.org/T64573#1209384 (hashar) Renamed wmf-selenium-bot to wmf-insecte. The messages are sent by the [[ https://wiki.jenkins-ci.org/display/JENKINS/IRC...
[15:12:51] Browser-Tests, Continuous-Integration, Upstream: Have wmf-insecte use color to make the reading of the scrollback easier - https://phabricator.wikimedia.org/T64573#1209388 (hashar)
[15:13:09] Browser-Tests, Release-Engineering, Upstream: Do not say "< wmf-insecte> Yippee, build fixed!" - https://phabricator.wikimedia.org/T95395#1188546 (hashar)
[15:13:17] Browser-Tests, Release-Engineering, Upstream: Do not say "< wmf-insecte> Yippee, build fixed!" - https://phabricator.wikimedia.org/T95395#1188546 (hashar)
[15:13:40] Browser-Tests, Release-Engineering, Upstream: Do not say "< wmf-insecte> Yippee, build fixed!" - https://phabricator.wikimedia.org/T95395#1188546 (hashar) Related is {T64573}
[15:15:58] Browser-Tests, Continuous-Integration: Cucumber linter should run for all repositories that contain Cucumber code - https://phabricator.wikimedia.org/T58251#1209406 (hashar) Don't you need to use bundler to run it ? Supposedly page object would be provided so: bundle install bundle exec cucumber -d W...
[15:16:16] I am processing my phabricator emails queue
[15:26:22] Continuous-Integration, Wikimedia-Hackathon-2015, Upstream: All new extensions should be setup automatically with Zuul - https://phabricator.wikimedia.org/T92909#1209434 (hashar)
[15:48:01] (PS1) Aude: Update Wikidata branch to wmf/1.26wmf2 [tools/release] - https://gerrit.wikimedia.org/r/204286
[15:51:10] (PS1) JanZerebecki: Remove Wikidata jslint job [integration/config] - https://gerrit.wikimedia.org/r/204287
[16:01:53] hashar: hey, coming, waiting for the computer to boot
[16:03:56] hashar: when you've time, https://gerrit.wikimedia.org/r/#/c/202689
[16:11:12] Release-Engineering: Make qunit test failures rcontain useful and readable information about where does it come from, how did you get there, etc - https://phabricator.wikimedia.org/T96072#1209539 (EBernhardson)
[16:12:06] hashar: I'm going to go up to the room where the QR is going to be, so, hit me on email if you want :)
[16:12:57] Release-Engineering: Make qunit test failures contain useful and readable information about where does it come from, how did you get there, etc - https://phabricator.wikimedia.org/T96072#1207596 (EBernhardson)
[16:13:21] (CR) 20after4: [C: 2] Update Wikidata branch to wmf/1.26wmf2 [tools/release] - https://gerrit.wikimedia.org/r/204286 (owner: Aude)
[16:23:25] (Merged) jenkins-bot: Update Wikidata branch to wmf/1.26wmf2 [tools/release] - https://gerrit.wikimedia.org/r/204286 (owner: Aude)
[16:24:27] (PS1) 20after4: ignore local.conf [tools/release] - https://gerrit.wikimedia.org/r/204289
[16:43:49] Browser-Tests: Create Sauce Labs account for Andrew Russell Green - https://phabricator.wikimedia.org/T94192#1209656 (AndyRussG) Open>Resolved
[16:44:35] Browser-Tests: Create Sauce Labs account for Andrew Russell Green - https://phabricator.wikimedia.org/T94192#1157154 (AndyRussG) @zeljkofilipin thanks again!
[17:13:55] (PS1) AndyRussG: E-mail fr-tech for CentralNotice browser tests [integration/config] - https://gerrit.wikimedia.org/r/204295
[17:29:42] hashar: greg-g: does this small patch look right? https://gerrit.wikimedia.org/r/204295
[17:32:59] AndyRussG: to my barely trained eye, yep!
[17:33:33] btw, done with our section of the quarterly review, went well I think
[17:41:32] greg-g: cool, congrats
[17:42:05] FR was Monday
[18:00:35] ^d: so in looking at the checkoutMediaWiki patch, and doing some scaps around staging, it seems like l10nupdate-1 does all the skins, extensions, and even core checkout if need be - does this seem right?
[18:02:22] <^d> l10nupdate-1 is a black box to me
[18:09:24] this is the main piece I'm looking at: https://github.com/wikimedia/operations-puppet/blob/production/modules/scap/files/l10nupdate-1#L40-L67
[18:10:02] now that I'm looking at it though, it won't work since core already has a skins and extension directory all it will do is try to run a git pull on those :\
[18:14:01] hmm actually it seems like it's cloning everything again to /var/lib/l10nupdate/mediawiki, huh.
[18:17:33] Browser-Tests, MediaWiki-extensions-MultimediaViewer, Mobile-Web, MobileFrontend, Multimedia: Should be possible in browser tests to use images with meta data or without meta data - https://phabricator.wikimedia.org/T67274#1209973 (phuedx) Should this task be closed? @Jdlrobson: At the very le...
[18:17:40] Browser-Tests, Fundraising Sprint House of Pain, MediaWiki-extensions-CentralNotice, Patch-For-Review: CentralNotice bucket improvements, step 3 bis part 2: moar better cross-browser tests - https://phabricator.wikimedia.org/T86092#1209974 (AndyRussG) Now closing this card, since we have a full hou...
[18:17:59] <^d> thcipriani: Yeah it only does that to fetch i18n changes
[18:18:35] Browser-Tests, Fundraising Sprint House of Pain, MediaWiki-extensions-CentralNotice, Patch-For-Review: CentralNotice bucket improvements, step 3 bis part 2: moar better cross-browser tests - https://phabricator.wikimedia.org/T86092#1209975 (AndyRussG) Open>Resolved
[18:19:45] <^d> thcipriani: /var/lib/ is a funny place to stash that :)
[18:19:57] <^d> It should really be /srv or /tmp
[18:20:52] Browser-Tests, Fundraising Sprint House of Pain, MediaWiki-extensions-CentralNotice, Patch-For-Review: CentralNotice bucket improvements, step 3 bis part 2: moar better cross-browser tests - https://phabricator.wikimedia.org/T86092#1209977 (AndyRussG) (The idea of running QUnit on many browsers [se...
[18:21:18] ^d: is the process of setting up the mediawiki-staging dir documented somewhere? Right now, I'm just syncing to staging-mw01, seeing what breaks, tracking down the scripts to fix, repeat.
[18:21:53] <^d> Nope, no documented process.
[18:21:57] <^d> Not yet, at least
[18:22:10] <^d> We should puppetize what's not
[18:22:32] <^d> So far it gets you to at least a working host, but you have to run checkoutMW yourself
[18:22:53] heh, oh good.
[18:23:19] yeah, after checkoutMW there are a handful of errors you get still when you try to run from php-master on staging-mw01
[18:23:44] I'm documenting errors and fixes as I go, I'll put them on the bug when I have anything working to show for it.
[18:26:08] Browser-Tests, MediaWiki-extensions-MultimediaViewer, Mobile-Web, MobileFrontend, Multimedia: Should be possible in browser tests to use images with meta data or without meta data - https://phabricator.wikimedia.org/T67274#1210005 (Jdlrobson) It sounds like this is no longer needed by anyone so...
[18:26:14] Browser-Tests, MediaWiki-extensions-MultimediaViewer, Multimedia: Should be possible in browser tests to use images with meta data or without meta data - https://phabricator.wikimedia.org/T67274#1210006 (Jdlrobson)
[18:26:23] Browser-Tests: Should be possible in browser tests to use images with meta data or without meta data - https://phabricator.wikimedia.org/T67274#712014 (Jdlrobson)
[18:41:07] Continuous-Integration, VisualEditor: Concurrent builds using local Chromium/Firefox browsers on Linux host fail - https://phabricator.wikimedia.org/T90673#1210074 (Jdforrester-WMF)
[18:43:01] Continuous-Integration, VisualEditor: Concurrent builds using local Chromium/Firefox browsers on Linux host fail - https://phabricator.wikimedia.org/T90673#1064651 (Jdforrester-WMF) At the weekly triage meeting today, we decided that this wasn't urgent for this quarter, so declined to accept it.
[18:46:25] Continuous-Integration, MediaWiki-Unit-tests, JavaScript: Apache on Jenkins slave can take over 30s to respond - https://phabricator.wikimedia.org/T95971#1210083 (Krinkle) Nope, this still happens several times a day. https://integration.wikimedia.org/ci/job/mwext-VisualEditor-qunit/15050/consoleFull...
[18:49:09] hashar: ^
[18:49:15] hashar: This is getting worse.
[18:49:26] Something happened causing our CI slave apache to be much much slower than it used to be.
[19:00:30] Browser-Tests, Collaboration-Team, Flow, Patch-For-Review: Fix failed Flow browsertests Jenkins jobs - https://phabricator.wikimedia.org/T94153#1210134 (DannyH)
[19:09:53] Continuous-Integration-Isolation, hardware-requests, operations: eqiad: 2 hardware access request for CI isolation on labsnet - https://phabricator.wikimedia.org/T93076#1210155 (hashar) labnodepool1001 has been installed and is ready for service implementation scandium (zuul mergers) should land in la...
[19:19:48] hashar: ping
[19:22:11] hashar: It's happening like every other build.
Everything is backlogged behind failures. This is gonna cause severe delays.
[19:24:05] !log Aborting beta-scap-eqiad. Has been stuck for 2 hours on "Notifying IRC" after "Connection time out" from scap.
[19:24:12] Logged the message, Master
[19:24:50] !log Aborting browser tests jobs. Stuck for over 5 hours.
[19:24:53] Logged the message, Master
[19:27:03] It's probably suck on the next step after IRC notif, which is e-mail
[19:27:08] It won't abort.
[19:27:25] Anyway, beta and browsertests are less important right now. Pre-merge jobs are timing out because of Apache somehow.
[19:28:26] Continuous-Integration, MediaWiki-extensions-ContentTranslation, Patch-For-Review: QUnit tests for ContentTranslation fail - unable to merge commits - https://phabricator.wikimedia.org/T93510#1210192 (hashar) Sorry I have missed the fact this task has been closed by Santhosh back on March 24th
[19:30:10] Krinkle: do you mean jobs are stuck ?
[19:30:30] Pre-merge jobs time out because of Jenkins. Beta and browser tests are stuck for 6 hours and Jenkins won't abort.
[19:30:37] bah
[19:30:39] because of Apache*
[19:30:47] https://phabricator.wikimedia.org/T95971#1210083
[19:30:59] UTC 14:47:43 IRC notifier plugin: Sending notification to: #wikimedia-releng
[19:31:00] https://integration.wikimedia.org/ci/job/beta-scap-eqiad/49113/ will not abort
[19:31:14] yeah same deal there
[19:31:15] If yu look in configure, the step after IRC is Email
[19:31:19] let me look at threads
[19:31:23] cool
[19:31:32] https://www.mediawiki.org/wiki/Continuous_integration/Jenkins#Debugging
[19:31:38] -- > https://integration.wikimedia.org/ci/monitoring?part=threadsDump
[19:32:34] "Executor #0 for deployment-bastion.eqiad : executing beta-scap-eqiad #49113" prio=5 BLOCKED
[19:32:34] hudson.plugins.im.IMConnectionProvider.currentConnection(IMConnectionProvider.java:83)
[19:32:35] hudson.plugins.ircbot.IrcPublisher.getIMConnection(IrcPublisher.java:102)
[19:32:36] Interesting
[19:33:29] https://issues.jenkins-ci.org/browse/JENKINS-9233
[19:33:34] similar issue
[19:33:46] "There seems to be no way to recover besides killing the whole Jenkins service and restarting."
[19:34:53] yuouuuuu
[19:35:00] it happened last week
[19:35:04] PROBLEM - Puppet failure on integration-slave-jessie-1001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[19:35:32] must be related to some recent plugin upgrade :/
[19:37:26] Continuous-Integration, Jenkins, Upstream: Jenkins: Builds (for beta cluster and browser tests) are stuck forever if IRC notification failed - https://phabricator.wikimedia.org/T96183#1210216 (Krinkle) NEW
[19:38:45] hashar: I noticed that the relevant browser tests jobs were reconfigured by zeljkof earlier today. This is the first build since then
[19:39:20] hashar: Anyhow, beta/browserests are less important.
[19:39:31] less important than what ?
[19:39:33] People are blocked on merging due to Apache timeouts
[19:39:42] does it prevent them from working ?
[19:39:46] Yes.
[19:39:50] Deployments, tests. everything.
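Editor's note: the thread dump excerpt pasted at 19:32 shows an executor thread in state BLOCKED inside the IRC plugin. A minimal sketch of how such a dump could be grouped by thread state follows. It assumes the header shape seen in that excerpt (quoted thread name, then prio=N, then the state); real dumps from the monitoring page may carry more fields, and the function name and the second sample thread are hypothetical.

```python
import re
from collections import defaultdict

# Assumed header shape, copied from the excerpt in the log:
#   "<thread name>" prio=<n> <STATE>
HEADER = re.compile(r'^"(?P<name>.+)" prio=\d+ (?P<state>[A-Z_]+)$')


def group_threads_by_state(dump_text):
    """Return {state: [(thread_name, [stack frames]), ...]}."""
    threads = defaultdict(list)
    frames = None  # frame list of the thread currently being read
    for raw in dump_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            frames = []
            threads[match.group("state")].append((match.group("name"), frames))
        elif frames is not None:
            # Any non-header line belongs to the most recent thread.
            frames.append(line)
    return dict(threads)


# First thread is taken from the log; the second one is made up for contrast.
SAMPLE = '''\
"Executor #0 for deployment-bastion.eqiad : executing beta-scap-eqiad #49113" prio=5 BLOCKED
    hudson.plugins.im.IMConnectionProvider.currentConnection(IMConnectionProvider.java:83)
    hudson.plugins.ircbot.IrcPublisher.getIMConnection(IrcPublisher.java:102)
"Hypothetical idle thread" prio=5 WAITING
    sun.misc.Unsafe.park(Native Method)
'''

if __name__ == "__main__":
    for name, frames in group_threads_by_state(SAMPLE).get("BLOCKED", []):
        print(name)
        print("  top frame:", frames[0] if frames else "(none)")
```

Listing only the BLOCKED group and reading each top frame is one way to spot which plugin holds things up, which is how the IRC plugin is identified in this log.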
[19:40:12] False negatives on main pre-merge builds are important.
[19:40:47] Especially because Zuul does not report back the failures until everything in the queue has finished.
[19:40:57] So you'll get it back like 2 hours after submission and then try again.
[19:41:06] This has been slowing down development adn productivity for 2 days now
[19:41:23] If James were here he'd probably make some 1000s dollar figure about much much this is costing us.
[19:41:52] so why nobody have pressed the red button yet until now?
[19:42:05] hashar: I filed the bug yesterday first thing after I saw it
[19:42:13] You presumed it was related to labs outage
[19:42:20] People just self-merge
[19:42:23] that's what happens
[19:42:27] this kind of slow ness is unacceptbale.
[19:42:43] Most people are not like us, slow tests means deploy anyway, who fucking cares right?
[19:42:47] Well, unfortunately, that's the reality.
[19:43:23] tcp6 160 0 208.80.154.135:39429 193.219.128.49:7000 CLOSE_WAIT 1519/java
[19:43:41] that is the Jenkins connection to sendak.freenode.net.
[19:43:58] Anyway, the beta jobs being stuck is not related because htis is happening on the slaves. and since yesterday when the beta jobs were not stuck.
[19:44:54] Continuous-Integration, Jenkins, Upstream: Jenkins: Builds (for beta cluster and browser tests) are stuck forever if IRC notification failed - https://phabricator.wikimedia.org/T96183#1210240 (hashar) tcp6 160 0 208.80.154.135:39429 193.219.128.49:7000 CLOSE_WAIT 1519/java That i...
[19:47:11] looking at the irc plugin global conf
[19:51:24] Continuous-Integration, Jenkins: Jobs stuck indefinitely after they are finished - https://phabricator.wikimedia.org/T91430#1210266 (Krinkle) Open>declined a:Krinkle Can't reproduce and this task is too generic. Something specific like T96183 should be reported instead.
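Editor's note: the line pasted at 19:43 looks like `netstat -tnp` output (protocol, Recv-Q, Send-Q, local address, remote address, state, PID/program). CLOSE_WAIT means the remote end has closed its side of the connection while the local process has not yet closed its socket. The sketch below filters such lines out of captured netstat text. The column layout is an assumption based on that single line, and the function name and the ESTABLISHED sample row are hypothetical.

```python
from typing import List, NamedTuple


class Conn(NamedTuple):
    proto: str
    recv_q: int
    send_q: int
    local: str
    remote: str
    state: str
    owner: str  # "PID/program", as printed when netstat is run with -p


def find_close_wait(netstat_text: str) -> List[Conn]:
    """Return TCP connections in CLOSE_WAIT from netstat-style text."""
    found = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip column headers and anything that is not a TCP row.
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue
        if not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        conn = Conn(fields[0], int(fields[1]), int(fields[2]),
                    fields[3], fields[4], fields[5], fields[6])
        if conn.state == "CLOSE_WAIT":
            found.append(conn)
    return found


# First row is from the log; the second row is made up for contrast.
SAMPLE = """\
tcp6 160 0 208.80.154.135:39429 193.219.128.49:7000 CLOSE_WAIT 1519/java
tcp6 0 0 127.0.0.1:8080 127.0.0.1:51234 ESTABLISHED 1519/java
"""

if __name__ == "__main__":
    for conn in find_close_wait(SAMPLE):
        print(conn.owner, conn.remote, "unread bytes:", conn.recv_q)
```

A nonzero Recv-Q on a CLOSE_WAIT socket, as in the pasted line, suggests the process never read the remaining data or closed the socket, which fits a deadlocked plugin thread.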
[19:52:24] Continuous-Integration, MediaWiki-Unit-tests, JavaScript: Apache on Jenkins slave takes over 30s to respond (QUnit/AJAX "Test timed out") - https://phabricator.wikimedia.org/T95971#1210273 (Krinkle)
[19:54:05] <^d> thcipriani: I started a patch to put the dsh group file in hiera, will let us unbreak scap & co
[19:54:12] Krinkle: so yeah the IRC plugin is deadlocked :/
[19:55:55] ^d: I started looking at that with create_resources, then realized it was a bad idea, is you patch on staging-palladium?
[19:56:11] <^d> I haven't tried on staging-palladium yet
[19:56:16] hashar: Let's wait with restarting until after deployments
[19:56:20] <^d> It's still WIP, checking out in puppet compiler
[19:56:23] there's still enough executors
[20:03:37] ^d: looks like a success. Also seems like the only righteous way to hieraize that.
[20:04:07] <^d> :)
[20:07:05] Krinkle: if only we could terminate the stuck connection
[20:07:12] will ask ops
[20:09:11] cool
[20:22:25] Krinkle: have you done anything?
[20:22:41] the close wait is gone
[20:22:43] hacked it
[20:26:31] so the CLOSE_WAIT is gone
[20:26:38] but that does not unlock jenkins anyway
[20:27:13] thcipriani: ^d I’m in the remoties meeting, and also doing some migrations for tools. I’ll look at your patches right after :)
[20:27:56] Continuous-Integration, Jenkins, Upstream: Jenkins: Builds (for beta cluster and browser tests) are stuck forever if IRC notification failed - https://phabricator.wikimedia.org/T96183#1210319 (hashar) I finished the CLOSE_WAIT connection by injecting an ACK packet pretending to be from freenode server:...
[20:34:09] !log hard restarting Jenkins
[20:34:17] Logged the message, Master
[20:36:19] hashar: I did nothing
[20:37:53] Continuous-Integration, Jenkins, Upstream: Jenkins: Builds (for beta cluster and browser tests) are stuck forever if IRC notification failed - https://phabricator.wikimedia.org/T96183#1210331 (hashar) I have stopped Jenkins then `kill -9` it and started it back. The jobs and config saves were blocked...
[20:38:02] Krinkle: yeah just wanted to be sure
[20:38:15] so I managed to get rid of the CLOES_WAIT connection
[20:38:22] but that did not release the lock in the irc plugin
[20:38:40] so Jenkins back anyway
[20:38:52] it came back in roughly 4 minutes!
[20:39:07] I've started deployment-bastion again
[20:39:16] oh and there's a job
[20:39:41] so in short
[20:39:46] java is painful
[20:39:51] cause I have no idea how the code work
[20:39:57] nor I have any idea how to hook / debug it
[20:40:58] Krinkle: so I am more or less in the position you are: clueless
[20:41:13] with /monitoring/ we can have some clue about what is being blocked though
[20:41:20] by looking at the threads and sort them by status
[20:41:32] BLOCKED ones are usually good candidates
[20:42:39] hashar: Yeah
[20:42:47] It's good to be able to see that is actually is blocked, not just slow
[20:43:12] Continuous-Integration, Jenkins, Upstream: Jenkins: Builds (for beta cluster and browser tests) are stuck forever if IRC notification failed - https://phabricator.wikimedia.org/T96183#1210332 (hashar) After restarting the `JenkinsIsBusyListener-thread` looks like: ``` sun.misc.Unsafe.park(Native Method...
[20:44:20] one positive note
[20:44:34] I found out the irc plugin global configuration as a check box to enable color output in irc message
[20:44:39] and we have a task for that :)
[20:46:58] hashar: Hm.. okay :)
[20:48:58] hashar: I'm trying to debug the slowness issue, but not sure what to debug. The jenkins log just stalls for 30 seconds.
there's nothing in the mediawiki debug or error log. Nothing in apache error log.
[20:49:08] Browser-Tests, Continuous-Integration, Upstream: Have wmf-insecte use color to make the reading of the scrollback easier - https://phabricator.wikimedia.org/T64573#1210347 (hashar) We had Jenkins deadlocking on IRC notifications for most of the day (T96183). While looking at the global Jenkins configu...
[20:49:23] color option is https://phabricator.wikimedia.org/T64573#1210347 :D
[20:49:46] Krinkle: that is jobs hitting the local qunit virtual host right?
[20:49:53] Yes
[20:50:07] Checking mysql logs now in case that's where its slow
[20:53:51] all empty. mysql.log, mysql.err, mysql/error.log
[20:53:54] literally
[20:54:04] (CR) Hashar: "Almost right! Need to include qa-alerts@lists.wikimedia.org as well." (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/204295 (owner: AndyRussG)
[20:54:14] (CR) Hashar: [C: -1] E-mail fr-tech for CentralNotice browser tests [integration/config] - https://gerrit.wikimedia.org/r/204295 (owner: AndyRussG)
[20:54:30] Krinkle: do you have an example build failing ?
[20:54:41] https://integration.wikimedia.org/ci/job/mwext-VisualEditor-qunit/15050
[20:55:16] https://integration.wikimedia.org/ci/job/mediawiki-core-qunit/40383/
[20:55:34] https://integration.wikimedia.org/ci/job/mediawiki-core-qunit/40370/
[20:55:35] etc.
[20:55:40] so unfinished ajax request
[20:55:49] is because some request to the mediawiki api is still going on right?
[20:55:50] No, that's a side effect
[20:55:56] and it is expected to have terminated already?
[20:56:03] It's making a request to load.php for language.data module
[20:56:05] That request times out
[20:56:08] ah
[20:56:13] see full log and look at first error
[20:56:16] the rest is cascading errors
[20:56:43] 18:48:17 ................................................................................
[20:56:43] 18:48:47 Chromium 41.0.2272 (Ubuntu) mediawiki.jqueryMsg Match PHP parser FAILED Test timed out [21:01:01] bah [21:01:33] on integration-slave-trusty-1011 looking at fgrep jenkins-mwext-VisualEditor-qunit-15050 /var/log/apache2/qunit_access.log [21:01:41] localhost:80 127.0.0.1 - - [15/Apr/2015:17:46:51 +0000] "GET /jenkins-mwext-VisualEditor-qunit-15050/load.php?skin=fallback&lang=hi&debug=false&modules=mediawiki.language.data%7Cmediawiki.language&only=scripts HTTP/1.1" 200 6319 "http://localhost:9876/context.html" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/41.0.2272.76 Chrome/41.0.2272.76 Safari/537.36" [21:01:44] that is the last query [21:03:20] Krinkle: could it be some of those queries ending up triggering a huge process on the backend [21:03:28] such as rebuilding the whole localization cache ? [21:04:49] the www log indeed has nothing https://integration.wikimedia.org/ci/job/mwext-VisualEditor-qunit/15050/artifact/log/mw-debug-www.log/*view*/ [21:04:49] :( [21:05:16] hashar: It's a new install, so for each language the test uses, the first time it will build the localisation cache yeah [21:05:21] But we've been doing that since forever [21:05:24] same in phpunit [21:05:30] I have sent a patch that should get us timestamps in debug logs $wgDebugTimestamps = true ==> https://gerrit.wikimedia.org/r/#/c/203806/ [21:05:41] yeah I agree [21:05:51] but then the code base being tested is different [21:06:02] OK [21:06:08] so maybe there is a regression in core [21:06:30] another possibility is that the requests take too long because several jobs are running in parallel and mysql can't keep up [21:06:36] but then, there is nothing specific in the log [21:07:47] I'm logged in on trusty-1012 (depooled) gonna try and reproduce [21:08:42] I am not sure why the request is not logged though [21:08:51] It is logged, the request itself [21:09:00] Start request GET
/jenkins-mwext-VisualEditor-qunit-15050/load.php?skin=fallback&lang=ml&debug=false&modules=mediawiki.language.data%7Cmediawiki.language&only=scripts [21:09:03] https://integration.wikimedia.org/ci/job/mwext-VisualEditor-qunit/15050/artifact/log/mw-debug-www.log/*view*/ [21:09:04] oh [21:09:21] which is seen on apache side just fine [21:09:24] And 10 others for different language codes [21:09:50] (not in parallel, the test does one at a time and waits for tests to finish) [21:12:54] Krinkle: so before I get to bed: if in doubt stop jenkins / kill -9 the process and start it again :) [21:13:12] and sorry to have no further idea/clues about the timeout you are fighting with [21:13:30] Okay. thanks :) [21:13:39] maybe enabling mediawiki profiling can yield some more info [21:15:14] !log Jenkins browser test jobs sometimes deadlock because of the IRC notification plugin https://phabricator.wikimedia.org/T96183 [21:15:22] Logged the message, Master [21:18:18] have a good night! [21:18:49] good night! [21:24:59] 6Release-Engineering, 6Engineering-Community, 6Team-Practices, 10Wikimedia-Hackathon-2015, 3ECT-April-2015: RelEng team offsite - May 2015 - Pre Wikimedia Hackathon - https://phabricator.wikimedia.org/T89036#1210506 (10Qgil) This task was part of #ECT-March-2015, and it is still open and assigned. Assumi... [21:32:12] 10Browser-Tests, 6Mobile-Web: Issue with Chrome driver with resizing window - https://phabricator.wikimedia.org/T88288#1210559 (10Jdlrobson) Is this still an issue @zeljkofilipin ? [21:36:02] Hi, can I get access (and sudo) to deployment-eventlogging02.eqiad.wmflabs? [21:43:55] yes [21:44:02] what is your wikitech account name? [21:44:13] Krenair: already added him (backscroll in -analytics) [21:44:20] although this is the last time I’m doing that hopefully :) [21:44:52] but not as an admin? [21:45:05] oh right, that's broken and always follows in the next edit to the page. urgh.
[21:46:13] Krenair: you don’t need to be admin to have sudo [21:46:49] "have sudo" means ALL as ALL or specific commands? [21:47:01] (we never say) [21:47:28] ALL [21:47:38] then it's more like "have root" [21:47:38] ALL as ALL [21:47:44] potato, potato [21:48:02] mmmm, potatoes! [21:48:04] * YuviPanda goes to find food [21:48:09] how does that work without being project admin? [21:51:42] <^d> Custom sudo groups [21:59:35] (03CR) 10Krinkle: [C: 032] mediawiki: set $wgDebugTimestamps [integration/jenkins] - 10https://gerrit.wikimedia.org/r/203806 (owner: 10Hashar) [22:00:41] (03Merged) 10jenkins-bot: mediawiki: set $wgDebugTimestamps [integration/jenkins] - 10https://gerrit.wikimedia.org/r/203806 (owner: 10Hashar) [22:00:47] 10Continuous-Integration, 10MediaWiki-Unit-tests, 7JavaScript: Apache on Jenkins slave takes over 30s to respond (QUnit/AJAX "Test timed out") - https://phabricator.wikimedia.org/T95971#1210658 (10Krinkle) a:3Krinkle [22:02:22] 10Continuous-Integration: Zuul status page should show the pipelines "window" value - https://phabricator.wikimedia.org/T93701#1210665 (10Krinkle) The Zuul status page is pretty stable right now. Let's do enhancements like this upstream instead? 
[22:02:33] 10Continuous-Integration: Zuul status page should show the pipelines "window" value - https://phabricator.wikimedia.org/T93701#1210667 (10Krinkle) p:5Triage>3Normal [22:13:26] (03PS1) 10Krinkle: mw-set-env: Remove core dump file size allowance of 2GB [integration/jenkins] - 10https://gerrit.wikimedia.org/r/204395 (https://phabricator.wikimedia.org/T96025) [22:13:34] (03CR) 10Krinkle: [C: 032] mw-set-env: Remove core dump file size allowance of 2GB [integration/jenkins] - 10https://gerrit.wikimedia.org/r/204395 (https://phabricator.wikimedia.org/T96025) (owner: 10Krinkle) [22:14:54] (03PS2) 10AndyRussG: E-mail fr-tech for CentralNotice browser tests [integration/config] - 10https://gerrit.wikimedia.org/r/204295 [22:18:57] (03Merged) 10jenkins-bot: mw-set-env: Remove core dump file size allowance of 2GB [integration/jenkins] - 10https://gerrit.wikimedia.org/r/204395 (https://phabricator.wikimedia.org/T96025) (owner: 10Krinkle) [22:20:14] 10Continuous-Integration: Disable core dumps generation on CI labs slaves - https://phabricator.wikimedia.org/T96025#1210720 (10Krinkle) 5Open>3Resolved a:3Krinkle [22:29:20] (03CR) 10AndyRussG: "Thanks!! :)" [integration/config] - 10https://gerrit.wikimedia.org/r/204295 (owner: 10AndyRussG) [22:34:52] 10Continuous-Integration, 7I18n: banana checker for i18n files must run by default for all MediaWiki extensions - https://phabricator.wikimedia.org/T94547#1210784 (10Krinkle) To avoid more abandoned test infrastructure, we're no longer adding "global" anything. This doesn't scale and causes issues to be nobody... 
[22:35:38] 10Continuous-Integration: Upgrade Zuul server to latest upstream - https://phabricator.wikimedia.org/T94409#1210788 (10Krinkle) [22:36:00] 10Continuous-Integration, 10Wikimedia-Fundraising: wikimedia/fundraising/dash.git should pass jshint - https://phabricator.wikimedia.org/T67053#1210789 (10Krinkle) [22:39:43] 10Continuous-Integration, 6translatewiki.net: puppetlint for translatewiki should be green and voting - https://phabricator.wikimedia.org/T95090#1210799 (10Krinkle) Please submit a patch against the integration/config repository to make the desired changed. [22:53:50] 10Continuous-Integration, 10MediaWiki-Unit-tests, 7JavaScript: Apache on Jenkins slave takes over 30s to respond (QUnit/AJAX "Test timed out") - https://phabricator.wikimedia.org/T95971#1210822 (10Krinkle) This also affects builds for MediaWiki extensions (e.g. Flow, VisualEditor, Gather, ..), and MediaWiki... [23:15:05] Krinkle: OK so [23:15:25] Krinkle: Ctrl+F the log at https://integration.wikimedia.org/ci/job/mediawiki-core-qunit/40370/artifact/log/mw-debug-www.log/*view*/ for "start request" [23:15:41] Yes [23:16:07] Actually [23:16:09] Look at this [23:16:14] LocalisationCache::isExpired(nl): cache missing, need to make one [23:16:16] Connected to database 0 at localhost [23:16:18] LocalisationCache::isExpired(jp): cache missing, need to make one [23:16:19] Connected to database 0 at localhost [23:16:38] The jp request starts before the nl request has finished [23:16:44] Hm.. [23:17:03] RoanKattouw: I've gone over the unit tests that makes the request twice in the last year to make sure they are not concurrent. 
[23:17:11] And the zh request overlaps a bit with the jp one too [23:17:18] https://github.com/wikimedia/mediawiki/blob/master/tests/qunit/suites/resources/mediawiki/mediawiki.jqueryMsg.test.js [23:17:23] getMwLanguage [23:17:30] https://github.com/wikimedia/mediawiki/blob/master/tests/qunit/suites/resources/mediawiki/mediawiki.jqueryMsg.test.js#L325 [23:17:39] Must've missed something then [23:17:54] Well either that or the log writer is doing really strange things [23:17:56] RoanKattouw: Well, if it takes more than 30 seconds, it would run concurrently of course [23:17:59] But for now I'll trust the logs [23:18:03] Because it timed out [23:18:09] which is not a fatal error [23:18:25] Yeah, me too. [23:18:45] I fixed the concurrency in https://github.com/wikimedia/mediawiki/commit/365b6f3af90e05be13c63d9e5ac8223c7e4f344b [23:18:53] a few months ago [23:21:01] !log beta-update-databases-eqiad stuck waiting for executors on a node that has plenty of executors available [23:21:06] Logged the message, Master [23:27:43] 10Continuous-Integration, 6Mobile-Web: Jenkins: Set up jsduck test and publish jobs for MobileFrontend - https://phabricator.wikimedia.org/T66374#698417 (10Jdlrobson) So this is done and https://phabricator.wikimedia.org/T74794 is no longer blocked? [23:30:30] 10Continuous-Integration, 6Release-Engineering: Rewrite beta-update-databases to not use unstable Configuration Matrix - https://phabricator.wikimedia.org/T96199#1210871 (10Krinkle) 3NEW [23:30:58] Krinkle: So can you explain to me how this code you linked to is for sure not concurrent? I don't know what process() does [23:31:13] Oh I see its definition here [23:32:09] 10Continuous-Integration, 6Mobile-Web: Jenkins: Set up jsduck test and publish jobs for MobileFrontend - https://phabricator.wikimedia.org/T66374#1210884 (10Krinkle) >>! In T66374#1210863, @Jdlrobson wrote: > So this is done and https://phabricator.wikimedia.org/T74794 is no longer blocked? Indeed. Submit a p...
[23:32:49] Krinkle: Out of paranoia one thing I could suggest is changing the .done(foo).fail(bar).always(baz) chain to .then(foo,bar).always(baz) to ensure order [23:33:02] Although the done callback doesn't actually do anything async [23:33:40] But even then [23:33:47] The ajax requests should totally be chained [23:34:57] I read the implementation of getMwLanguage() and how it's used and it looks watertight to me [23:35:35] 10Continuous-Integration, 6Mobile-Web: Jenkins: Set up jsduck test and publish jobs for MobileFrontend - https://phabricator.wikimedia.org/T66374#1210898 (10Jdlrobson) 5Open>3Resolved a:3Jdlrobson Okay we have https://gerrit.wikimedia.org/r/181693 open I guess we need to re-examine that now. [23:35:37] 10Continuous-Integration, 6Mobile-Web, 5Patch-For-Review, 7Technical-Debt: Publish MobileFrontend JS Documentation - https://phabricator.wikimedia.org/T74794#1210901 (10Jdlrobson) [23:40:57] RoanKattouw: Lost connection for a while, did you say something? [23:43:23] No, I was just SWATting [23:43:27] Ahm, hold on [23:43:35] [16:32] RoanKattouw Krinkle: Out of paranoia one thing I could suggest is changing the .done(foo).fail(bar).always(baz) chain to .then(foo,bar).always(baz) to ensure order [23:43:36] [16:33] RoanKattouw Although the done callback doesn't actually do anything async [23:43:38] [16:33] RoanKattouw But even then [23:43:39] [16:33] RoanKattouw The ajax requests should totally be chained [23:43:41] [16:34] RoanKattouw I read the implementation of getMwLanguage() and how it's used and it looks watertight to me [23:45:33] RoanKattouw: Changing done().fail().always() to then().always() wouldn't change order in this case, since always() just adds an extra done() and fail() to the list [23:45:44] Right [23:45:49] and It's okay to call Qunit.start before or after the last assertion. [23:45:58] It just has to be called somewhere after async has finished. 
[23:45:59] Because we know jQuery's implementation we know that it'll run in that order [23:46:04] Assertions can either go before or after it [23:46:27] I'm baffled at how we manage to have parallel requests though [23:46:31] It's not like node-assert's .end() [23:46:35] What is the list of languages it tries, in what order? [23:46:54] Yeah, beats me. I think the concurrency happens only because of the test timing out [23:46:59] i.e. what is mw.libs.phpParserData.tests [23:47:01] so the request is still pending when the next one starts [23:47:17] that's after the first one times out so let's forget about those for now I think? [23:47:45] at least, assuming the chaining logic in the test suite is, as you say, watertight, that's the only conclusion I can think of why the server would have concurrency for those [23:48:10] What is the timeout value for $.ajax() here? [23:48:12] RoanKattouw: https://github.com/wikimedia/mediawiki/blob/master/tests/qunit/data/mediawiki.jqueryMsg.data.js [23:48:22] It's not specified explicitly [23:48:26] RoanKattouw: there isn't any. QUnit itself terminates the test after 30 seconds of inactivity. [23:48:34] Right [23:48:36] By default, requests do not time out. [23:48:38] And there's still a request pending then [23:48:46] But then how do we have parallel requests in the log [23:48:49] And because it's not an XHR, abort() wouldn't do anything anyway (other than ignoring the callback) [23:49:01] It would be nice if we could have a version of https://integration.wikimedia.org/ci/job/mediawiki-core-qunit/40370/artifact/log/mw-debug-www.log/*view*/ with timestamps [23:49:03] So we can see gaps in time [23:49:35] RoanKattouw: test 1 passes, test 2 passes, test 3 is cut off (time out observed), test 4 starts (at this point the server may have 2 requests concurrent) [23:49:53] Does that make sense?
[23:49:59] RoanKattouw: I enabled timestamps already [23:50:07] but it doesn't help much [23:50:16] MediaWiki logs them in relative time to the individual request [23:50:33] Here is a more recent failure: https://integration.wikimedia.org/ci/job/mediawiki-core-qunit/40415/artifact/log/mw-debug-www.log/*view*/ [23:50:51] Ahm OK [23:50:52] with timestamps [23:50:55] Look at this though [23:51:05] You remember the ones we saw parallelized were nl and jp right? [23:51:21] Yeah [23:51:47] So, look at the list of php parser tests [23:51:55] The order of languages matches those in the logs [23:51:57] Except nl isn't in there [23:52:13] That's because nl is here: https://github.com/wikimedia/mediawiki/blob/master/tests/qunit/suites/resources/mediawiki/mediawiki.jqueryMsg.test.js#L579 [23:52:17] Which in turn doesn't have jp [23:53:11] The order of requests is: en, fr, ar, nl, jp, zh, ml [23:53:20] With nl and jp mostly overlapping and jp and zh somewhat overlapping [23:53:44] Oh and ml overtakes zh; and I forgot hi at the end [23:53:49] Now look at the languages in the test data [23:54:19] RoanKattouw: also https://github.com/wikimedia/mediawiki/blob/master/tests/qunit/suites/resources/mediawiki/mediawiki.jqueryMsg.test.js#L545-L558 [23:54:20] phpParserTests has en, fr, ar, jp, zh [23:54:29] There's a second set of sample data that does contain nl [23:54:35] For a separate test [23:54:40] formatnum has ar, nl, ml, hi [23:55:26] So the order we should be seeing in the logs is either [en fr ar jp zh] [ar nl ml hi] or [ar nl ml hi] [en fr ar jp zh] [23:55:36] But what we actually see is en fr ar nl jp zh ml hi [23:55:53] So I guess 'ar' is timing out, then the format num tests start [23:55:58] So the two tests are running concurrently [23:56:06] 'fr' or 'ar' [23:56:30] 17:46:47 afterEach failed on Match PHP parser: Unfinished AJAX requests: 1 [23:56:53] That corresponds to the [en fr ar jp zh] set [23:56:55] RoanKattouw: The requests are serial, so there would only ever be one
unfinished ajax request [23:57:02] Yeah [23:57:07] My process() function is not aborted [23:57:12] when it times out [23:57:18] So it just keeps firing new ones, when it times out later [23:57:31] so once the format num tests start, it does eventually (either from done or fail, who knows) continue [23:57:36] Yeah [23:57:38] That's bad [23:58:06] Maybe your process() could stop on failure? [23:58:17] The test will already have failed [23:58:25] define failure [23:58:38] Either because QUnit gave up on it, or because the .fail() handler asserted false [23:58:41] I'm thinking of changing the testrunner in mediawiki core to abort any unfinished requests, like we also abort unfinished animations. [23:58:48] Oh crap I guess you could have success after a long time [23:58:51] then it will synchronously collapse before starting the next test [23:58:58] Yeah that would work too [23:59:04] Except you'd still have to abort process() on failure [23:59:13] Because .abort() on an XHR causes the fail handler to run [23:59:17] Right, because it would only abort the first one [23:59:29] Which in the current implementation would cause another request to be fired [23:59:30] and then process() spawns another one [23:59:40] Alternative suggestion: [23:59:47] explicitly set a timeout in the $.ajax() calls [23:59:54] And set it much lower than 30s, like 10s [23:59:59] I guess that's not watertight though
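A minimal sketch of the "stop process() on failure" idea from the discussion above, not the actual helper in mediawiki.jqueryMsg.test.js. The names `createSerialLoader`, `load`, `run`, `cancel` and `started` are hypothetical, and `load(lang, onDone, onFail)` is only a stand-in for the real request to load.php. It shows a serial queue that ends the chain when a request fails and that test teardown can cancel, so a test that timed out cannot keep firing requests into the next test.

```javascript
// Hypothetical sketch: none of these names come from MediaWiki core.
// `load(lang, onDone, onFail)` stands in for the request to load.php.
function createSerialLoader(load) {
  var cancelled = false;
  var started = [];

  function run(queue, onFinish) {
    var remaining = queue.slice();

    function next() {
      // Stop when the queue is empty or the owning test was torn down.
      if (cancelled || remaining.length === 0) {
        onFinish(cancelled ? 'cancelled' : 'done');
        return;
      }
      var lang = remaining.shift();
      started.push(lang);
      load(
        lang,
        function () {
          // Only a successful response moves on to the next language.
          next();
        },
        function () {
          // A failure ends the chain instead of firing the next request.
          cancelled = true;
          onFinish('failed');
        }
      );
    }

    next();
  }

  return {
    run: run,
    // Meant to be called from test teardown, e.g. after the runner gave up.
    cancel: function () { cancelled = true; },
    // Languages for which a request was actually fired, in order.
    started: started
  };
}
```

With something like this, a late success or failure of the pending request no longer starts the next language once the test has been torn down. An explicit, shorter `timeout` on the request would be a complementary measure rather than a replacement, since the chain still has to stop when that timeout fires.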