[00:06:09] 10Differential, 5Gerrit-Migration: Broadcast Differential activity to IRC - https://phabricator.wikimedia.org/T116330#1747152 (10greg) Never said it did (send notifications to irc) ;) . It's OK, it'll be fixed :) [00:06:30] 10Deployment-Systems, 3Scap3: Scap3's checks.yaml file should be optional - https://phabricator.wikimedia.org/T116204#1747154 (10dduvall) p:5Triage>3Normal a:3dduvall [00:14:29] 3Scap3, 10Wikimedia-General-or-Unknown: Special:Version on Wikimedia wikis shows outdated commit hashes - https://phabricator.wikimedia.org/T116345#1747169 (10bd808) Scap generated this gitinfo file: ``` { "head": "b00d06a08fc62053e54f99bd234780f2ce0a07a0", "remoteURL": "https://gerrit.wikimedia.org/r/p/me... [00:15:54] 10Deployment-Systems: Special:Version on Wikimedia wikis shows outdated commit hashes for submodules - https://phabricator.wikimedia.org/T116345#1747172 (10bd808) [00:17:32] 10Differential, 5Gerrit-Migration: Broadcast Differential activity to IRC - https://phabricator.wikimedia.org/T116330#1747176 (10greg) Also, fwiw, you can still use Gerrit for scap dev, the scap3 team is just dogfooding differential to find the things we don't yet know. (don't take as excuse, just giving more... [00:18:58] 10Deployment-Systems: Special:Version on Wikimedia wikis shows outdated commit hashes for submodules - https://phabricator.wikimedia.org/T116345#1747177 (10bd808) p:5Triage>3Low [00:21:37] 10Differential, 5Gerrit-Migration: Broadcast Differential activity to IRC - https://phabricator.wikimedia.org/T116330#1747185 (10mmodell) https://phabricator.wikimedia.org/feed/query/all/ shows differential stories. That's where wikibugs gets it's data so presumably it could easily be modified to notify IRC ab... 
[00:24:14] 10Differential, 5Gerrit-Migration, 10Wikibugs: Broadcast Differential activity to IRC - https://phabricator.wikimedia.org/T116330#1747187 (10mmodell) [00:32:20] 10Deployment-Systems: Special:Version on Wikimedia wikis shows outdated commit hashes for submodules - https://phabricator.wikimedia.org/T116345#1747209 (10mmodell) I'm not sure why there aren't remote tracking branches, but it seems like that is the best way to tell what the upstream commit is. I wonder why th... [00:48:55] 10Deployment-Systems, 3Scap3: Scap3's checks.yaml file should be optional - https://phabricator.wikimedia.org/T116204#1747221 (10dduvall) [00:50:32] 10Deployment-Systems, 3Scap3: Scap3's checks.yaml file should be optional - https://phabricator.wikimedia.org/T116204#1747222 (10dduvall) I wasn't able to reproduce the bad behavior without a `checks.yaml` file entirely, but I've fixed the other cases. [00:51:02] 10Continuous-Integration-Config, 10MediaWiki-Codesniffer: Make mw-tools-codesniffer-mwcore-testrun job more like the actual mediawiki-core-phpcs job - https://phabricator.wikimedia.org/T116348#1747223 (10Legoktm) 3NEW [01:11:54] (03CR) 10Legoktm: [C: 032] "Looks good, thanks :D" [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/237902 (https://phabricator.wikimedia.org/T92744) (owner: 10TasneemLo) [01:12:32] (03Merged) 10jenkins-bot: Sniff to check assignment in while & if [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/237902 (https://phabricator.wikimedia.org/T92744) (owner: 10TasneemLo) [01:13:25] 10MediaWiki-Codesniffer, 5Patch-For-Review: Add sniff for "if/while ( $a = foo() )" constructs in phpcs - https://phabricator.wikimedia.org/T92744#1747261 (10Legoktm) 5Open>3Resolved [01:15:30] (03CR) 10Legoktm: "I like where this is going :)" [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/237899 (https://phabricator.wikimedia.org/T92751) (owner: 10TasneemLo) [01:16:45] (03Abandoned) 10Legoktm: Sniff to check assignment expressions in while,if 
[tools/codesniffer] - 10https://gerrit.wikimedia.org/r/196873 (https://phabricator.wikimedia.org/T92744) (owner: 10Sumit) [01:20:33] (03PS4) 10Legoktm: Handle multiple # comments in Space Before Comment [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/243619 (https://phabricator.wikimedia.org/T114633) (owner: 10TasneemLo) [01:23:57] (03CR) 10Legoktm: [C: 032] Handle multiple # comments in Space Before Comment [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/243619 (https://phabricator.wikimedia.org/T114633) (owner: 10TasneemLo) [01:25:38] 10Continuous-Integration-Config, 10MediaWiki-Codesniffer: Make mw-tools-codesniffer-mwcore-testrun job more like the actual mediawiki-core-phpcs job - https://phabricator.wikimedia.org/T116348#1747276 (10Legoktm) I'm thinking... * zuul-cloner to fetch mw/tools/codesniffer and mw/core * cd mw/core * composer up... [01:32:05] (03Merged) 10jenkins-bot: Handle multiple # comments in Space Before Comment [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/243619 (https://phabricator.wikimedia.org/T114633) (owner: 10TasneemLo) [03:00:28] 10Deployment-Systems: Special:Version on Wikimedia wikis shows outdated commit hashes for submodules - https://phabricator.wikimedia.org/T116345#1747370 (10bd808) >>! In T116345#1747209, @mmodell wrote: > I'm not sure why there aren't remote tracking branches, but it seems like that is the best way to tell what... [03:26:48] 10Deployment-Systems: Special:Version on Wikimedia wikis shows outdated commit hashes for submodules - https://phabricator.wikimedia.org/T116345#1747377 (10bd808) The branches are all set in the top level .gitconfig as hoped for. I'm guessing this is because @mmodell is using a not-freaking-ancient version of gi... [03:31:27] 10Deployment-Systems: Special:Version on Wikimedia wikis shows outdated commit hashes for submodules - https://phabricator.wikimedia.org/T116345#1747386 (10mmodell) @bd808: I run make-wmf-branch on my laptop for that reason, and a few others. 
Tin is ridiculously outdated and it manages to annoy me frequently. [03:34:32] 10Beta-Cluster-Infrastructure, 6operations, 7HHVM: Convert work machines (tin, terbium) to Trusty and hhvm usage - https://phabricator.wikimedia.org/T87036#1747389 (10bd808) [03:34:33] 10Deployment-Systems: Special:Version on Wikimedia wikis shows outdated commit hashes for submodules - https://phabricator.wikimedia.org/T116345#1747388 (10bd808) [03:36:46] 10Beta-Cluster-Infrastructure, 6operations, 7HHVM: Convert work machines (tin, terbium) to Trusty and hhvm usage - https://phabricator.wikimedia.org/T87036#1747392 (10bd808) >>! In T87036#1640693, @hashar wrote: > Following on @Dzahn comment, should probably use Jessie instead of Trusty. I don't think that... [03:50:02] 10MediaWiki-Codesniffer, 5Patch-For-Review: Handle comments with multiple # - https://phabricator.wikimedia.org/T114633#1747402 (10Legoktm) 5Open>3Resolved [05:15:23] PROBLEM - Free space - all mounts on deployment-bastion is CRITICAL: CRITICAL: deployment-prep.deployment-bastion.diskspace._var.byte_percentfree (<11.11%) [05:32:32] 10Deployment-Systems, 3Scap3: Scap3's checks.yaml file should be optional - https://phabricator.wikimedia.org/T116204#1747466 (10mmodell) I had to create checks.yaml in order to get it to work. 
[05:55:26] (03PS1) 1020after4: add .arcconfig [tools/scap] - 10https://gerrit.wikimedia.org/r/248287 [05:56:46] (03CR) 1020after4: [C: 032] add .arcconfig [tools/scap] - 10https://gerrit.wikimedia.org/r/248287 (owner: 1020after4) [05:57:30] (03Merged) 10jenkins-bot: add .arcconfig [tools/scap] - 10https://gerrit.wikimedia.org/r/248287 (owner: 1020after4) [06:33:36] 10Beta-Cluster-Infrastructure, 6operations, 7Blocked-on-Operations, 7HHVM: Convert work machines (tin, terbium) to Trusty and hhvm usage - https://phabricator.wikimedia.org/T87036#1747484 (10ori) p:5Normal>3High [06:45:24] RECOVERY - Free space - all mounts on deployment-bastion is OK: OK: All targets OK [07:51:42] (03PS1) 10Legoktm: Release 0.5.0, update HISTORY and README [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/248293 [07:54:32] (03CR) 10Polybuildr: [C: 032] Release 0.5.0, update HISTORY and README [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/248293 (owner: 10Legoktm) [07:54:41] waa [07:55:00] I was gonna go to sleep :v [07:55:14] (03Merged) 10jenkins-bot: Release 0.5.0, update HISTORY and README [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/248293 (owner: 10Legoktm) [08:02:17] (03CR) 10Legoktm: "That was faster than I expected :) Tagged and announced: https://lists.wikimedia.org/pipermail/wikitech-l/2015-October/083670.html" [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/248293 (owner: 10Legoktm) [08:03:57] 10MediaWiki-Codesniffer, 7Documentation: "How to install" directions in MW-CS README are out of date - https://phabricator.wikimedia.org/T116359#1747537 (10Legoktm) 3NEW [08:10:38] 10Continuous-Integration-Config, 10MediaWiki-Codesniffer: Make mw-tools-codesniffer-mwcore-testrun job more like the actual mediawiki-core-phpcs job - https://phabricator.wikimedia.org/T116348#1747567 (10polybuildr) >>! In T116348#1747276, @Legoktm wrote: > That doesn't handle cases where we change the phpcs v... 
[08:24:02] PROBLEM - Puppet failure on deployment-tmh01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [08:27:56] PROBLEM - Puppet failure on deployment-mx is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [08:31:49] RECOVERY - Puppet staleness on deployment-restbase01 is OK: OK: Less than 1.00% above the threshold [3600.0] [08:34:49] Yippee, build fixed! [08:34:49] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce build #760: 09FIXED in 24 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce/760/ [08:41:02] 10Beta-Cluster-Infrastructure: +Sysop for User:Mww113 - https://phabricator.wikimedia.org/T116364#1747601 (10Mww113) 3NEW [08:44:04] RECOVERY - Puppet failure on deployment-tmh01 is OK: OK: Less than 1.00% above the threshold [0.0] [09:02:51] RECOVERY - Puppet failure on deployment-mx is OK: OK: Less than 1.00% above the threshold [0.0] [09:18:32] 10Beta-Cluster-Infrastructure: +Sysop for User:Mww113 - https://phabricator.wikimedia.org/T116364#1747628 (10Aklapper) Could you elaborate why would this require Sysop? What does "some MediaWiki: pages" mean in this case? [09:34:32] 5Gerrit-Migration: Proof of concept of code review in Phabricator - https://phabricator.wikimedia.org/T560#1747651 (10Aklapper) >>! From T94167#1746714: > Differential is available in an opt-in basis. RelEng is using it for #scap3 work (see: https://phabricator.wikimedia.org/differential/query/Bl4g_4o5nAT8/ ) [09:37:22] 6Release-Engineering-Team, 15User-greg: Create weekly Diffusion/erential migration meeting - https://phabricator.wikimedia.org/T116037#1747668 (10Aklapper) @greg: May I get added, as as optional attendee? [12:02:27] 10Deployment-Systems, 3Scap3: Deploy RESTBase with scap3 - https://phabricator.wikimedia.org/T116335#1747812 (10mobrovac) >>! 
In T116335#1746874, @greg wrote: > Tested on Beta Cluster on Wednesday Oct 21st, successful test (with followup fixups, tasks already created). Notes available [here](https://www.media... [12:30:54] 10Beta-Cluster-Infrastructure: +Sysop for User:Mww113 - https://phabricator.wikimedia.org/T116364#1747840 (10Luke081515) a:3Luke081515 Sysop access, ok. So Question: Which wiki? And which MediaWiki Pages would you edit? [12:32:55] 10Beta-Cluster-Infrastructure: +Sysop for User:Mww113 - https://phabricator.wikimedia.org/T116364#1747844 (10Glaisher) Note that changes will reach the beta cluster only *after* getting merged at Gerrit, so you might want to instead have a local test instance and then test there before submitting to Gerrit. [12:56:38] Project beta-scap-eqiad build #75540: 04FAILURE in 2 min 8 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/75540/ [13:06:25] Yippee, build fixed! [13:06:25] Project beta-scap-eqiad build #75541: 09FIXED in 1 min 58 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/75541/ [13:06:42] interesting [14:35:49] Project browsertests-MobileFrontend-SmokeTests-linux-chrome-sauce build #301: 04FAILURE in 7 min 49 sec: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-SmokeTests-linux-chrome-sauce/301/ [15:07:26] 10Differential, 5Gerrit-Migration, 15User-greg: Uploading a new Differential Revision does not reset “Accept” states in commit messages - https://phabricator.wikimedia.org/T164#1748066 (10scfc) The problem I see is that currently from time to time someone accidentally screws up rebases or does not implement... [15:15:42] ostriches, looks like https://phabricator.wikimedia.org/diffusion/OPUP/ needs to be updated to have the #operations tag instead [15:16:08] same for https://phabricator.wikimedia.org/diffusion/OMWC/ [15:19:44] can someone in the acl*operations group do it? [15:20:44] nope, I think it requires a repository-admin person [15:21:14] i.e. 
twentyafterfour, Catrope, chasemp or ostriches [15:24:17] gotcha [15:25:29] OMWC gets an ops acl? [15:26:57] does the projects list control ACL? [15:28:19] I don't believe so [15:35:56] ostriches? [15:39:52] 5Continuous-Integration-Scaling, 7Tracking: Investigate using Drydock for CI - https://phabricator.wikimedia.org/T116038#1748171 (10JanZerebecki) That seems to mean it is able to run some build steps like checkout of a working copy before other build steps. But the parent task is not about allowing to declare... [15:44:39] Krenair: Heck if I know, I just saw "acl" [15:44:40] :) [15:44:58] So this was called #operations [15:45:10] And then they changed all those references to #acl*operationsteam [15:45:17] a couple of things, like this, need to be switched back I think [15:47:06] yeah, see discussion on ops@ [15:47:33] they were treating #operations as an ACL group, but that wasn't nice to others who wanted to join/watch #ops :) [15:48:44] It wasn't so terrible after herald was opened up [15:49:05] I just had a rule that emailed me whenever someone touched an operations task (without CCing me, unlike certain other people) [15:49:38] *cough* matanya and aklapper *cough* ;) [15:50:22] hehe [15:50:24] yeah [15:53:41] I didn't know it was an acl at all, I just assumed it was for tagging ops-related things. [15:54:11] the associated project on repos isn't, afaik [15:54:30] but ops was using #operations for acl things, which was problematic [16:03:14] PROBLEM - Puppet failure on integration-slave-jessie-1001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [16:03:14] PROBLEM - Puppet failure on deployment-parsoidcache02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [16:37:07] Yippee, build fixed! 
[16:37:07] Project browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #667: 09FIXED in 2 min 26 sec: https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/667/ [16:38:33] Yippee, build fixed! [16:38:34] Project browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #650: 09FIXED in 1 min 16 sec: https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/650/ [16:41:09] Krenair: at one point I tagged the repo but iirc it generated a bunch of email on merge updates [16:41:18] that were in addition to gerrit and diffusion was considerably less official then [16:41:20] etc [16:41:29] I have no pref now, at some point we have to go this route [16:43:02] it emails project watchers upon commits going into the repo, right? [16:43:27] ah I see, at some point it was added anyways and now it's the wrong thing, fixed [16:43:32] ....yes I think that's right [17:09:53] 10Beta-Cluster-Infrastructure: +Sysop for User:Mww113 - https://phabricator.wikimedia.org/T116364#1748515 (10Luke081515) p:5Triage>3Normal [17:28:49] (03PS1) 10Dduvall: Fix undefined `last_session_ids=` method exception [selenium] - 10https://gerrit.wikimedia.org/r/248372 (https://phabricator.wikimedia.org/T114368) [17:30:34] (03PS2) 10Dduvall: Fix undefined `last_session_ids=` method exception [selenium] - 10https://gerrit.wikimedia.org/r/248372 (https://phabricator.wikimedia.org/T114368) [18:12:56] (03CR) 10Sbisson: [C: 032] "Thanks Dan!" 
[selenium] - 10https://gerrit.wikimedia.org/r/248372 (https://phabricator.wikimedia.org/T114368) (owner: 10Dduvall) [18:14:10] (03Merged) 10jenkins-bot: Fix undefined `last_session_ids=` method exception [selenium] - 10https://gerrit.wikimedia.org/r/248372 (https://phabricator.wikimedia.org/T114368) (owner: 10Dduvall) [18:15:23] PROBLEM - Free space - all mounts on deployment-db2 is CRITICAL: CRITICAL: deployment-prep.deployment-db2.diskspace._mnt.byte_percentfree (<11.11%) [18:20:59] (03PS1) 10Dduvall: Release patch version 1.6.2 [selenium] - 10https://gerrit.wikimedia.org/r/248392 [18:24:09] Project beta-update-databases-eqiad build #3855: 04FAILURE in 4 min 9 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/3855/ [18:26:14] PROBLEM - App Server Main HTTP Response on deployment-mediawiki02 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:26:46] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:31:07] Project browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #479: 04FAILURE in 1 min 7 sec: https://integration.wikimedia.org/ci/job/browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/479/ [18:34:45] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:35:09] PROBLEM - App Server Main HTTP Response on deployment-mediawiki03 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:39:37] RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 38951 bytes in 0.470 second response time [18:39:59] RECOVERY - App Server Main HTTP Response on deployment-mediawiki03 is OK: HTTP OK: HTTP/1.1 200 OK - 38645 bytes in 1.207 second response time [18:41:05] RECOVERY - App Server Main HTTP Response on deployment-mediawiki02 is OK: HTTP OK: HTTP/1.1 200 OK - 38636 bytes in 0.530 second response time [18:41:21] PROBLEM - Free 
space - all mounts on deployment-bastion is CRITICAL: CRITICAL: deployment-prep.deployment-bastion.diskspace._var.byte_percentfree (<60.00%) [18:41:37] RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 30385 bytes in 1.020 second response time [18:45:21] ostriches: this gives me an error, but also the data I want (I think: all not merged changes to any .*wmf\.3 branch): https://gerrit.wikimedia.org/r/#/q/branch:%255E.*wmf%255C.3,n,z [18:45:41] I'm curious if there's a way of doing that in Phab/Differential [18:45:46] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:45:59] ok, well... what's going on with beta? [18:46:28] thcipriani: marxarelli twentyafterfour help ^ [18:46:40] uhm [18:46:53] re beta cluster, not differential :) [18:47:14] PROBLEM - App Server Main HTTP Response on deployment-mediawiki02 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:47:46] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:48:50] is it a general problem with labs infrastructure? I don't see a pattern ... [18:49:12] https://logstash-beta.wmflabs.org/#/dashboard/elasticsearch/fatalmonitor [18:49:26] lot of db errors [18:50:58] "Error: 1021 Disk full (/mnt/tmp/#sql_6f5_1); waiting for someone to free some space..." [18:51:06] :( [18:51:08] PROBLEM - App Server Main HTTP Response on deployment-mediawiki03 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:52:43] deployment-db2 [18:52:46] has full /mnt [18:53:46] and there isn't much I see to delete [18:53:50] I can delete the error log file [18:54:11] :/ [18:54:13] looks like a massive temp disk table [18:54:18] at /mnt/tmp/#sql_6f5_0.MAD [18:54:21] 77G [18:54:40] so someone ran a very naughty query? 
[18:54:48] likely [18:55:04] Project browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #480: 04FAILURE in 2 min 4 sec: https://integration.wikimedia.org/ci/job/browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/480/ [18:55:14] can we kill said query? [18:55:32] I don't know how to get root on mysql [18:55:43] "| 15840440 | wikiadmin | 10.68.16.127:60495 | enwiki | Query | 3775 | Copying to tmp table | SELECT /* Flow\Formatter\ContributionsQuery::queryRevisions Luke081515 */ * F" [18:55:54] is the password stored somewhere I can look it up? [18:56:04] twentyafterfour: you can sudo, then just `mysql` [18:56:15] i'm guessing it's stored in root's my.cnf [18:56:44] oh I tried sudo mysql; didn't work, but sudo su; mysql; did work [18:57:00] Project UploadWizard-api-commons.wikimedia.beta.wmflabs.org build #2831: 04FAILURE in 10 min: https://integration.wikimedia.org/ci/job/UploadWizard-api-commons.wikimedia.beta.wmflabs.org/2831/ [18:57:04] Luke081515|AFK: ! ;) [18:57:39] RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 30411 bytes in 1.905 second response time [18:58:13] so should we kill it? it's almost certainly not going to be able to complete since the disk can't be enlarged to accommodate ;) [18:58:22] !log Killed mysql process 15840440 on account of its gargantuan temp file filling up /mnt [18:58:29] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [18:58:35] marxarelli: nice work [18:59:07] !log deleted atop.log.* files on deployment-bastion. when are we going to enlarge /var on this instance. grr [18:59:11] take that, fiend! [18:59:14] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [18:59:28] twentyafterfour: rebuild it, unfortunately [18:59:54] greg-g: I know. 
and that's apparently difficult [18:59:55] (03PS18) 10Paladox: Allow skins to also be tested like extensions can [integration/config] - 10https://gerrit.wikimedia.org/r/228470 (https://phabricator.wikimedia.org/T68926) [19:00:00] which really is troubling. [19:00:01] just annoying, I think [19:00:16] did we find that root mysql pass? [19:00:17] but I guess "difficult" and "annoying" are really just two sides to the same thing ;) [19:00:18] time consuming, or we'd have done it already [19:00:30] Krenair: it's stored somewhere, no need to enter the password [19:00:31] I vaguely recall it's 'secret' on one of the boxes, I think it was eventlogging02 or something [19:00:34] ok [19:00:35] RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 38951 bytes in 0.520 second response time [19:00:42] (03CR) 10jenkins-bot: [V: 04-1] Allow skins to also be tested like extensions can [integration/config] - 10https://gerrit.wikimedia.org/r/228470 (https://phabricator.wikimedia.org/T68926) (owner: 10Paladox) [19:00:43] just sudo su; mysql [19:00:52] ah yes, isn't that a relatively new thing? [19:00:57] RECOVERY - App Server Main HTTP Response on deployment-mediawiki03 is OK: HTTP OK: HTTP/1.1 200 OK - 38652 bytes in 0.647 second response time [19:01:07] Krenair: I didn't know about it so I guess maybe so [19:01:21] RECOVERY - Free space - all mounts on deployment-bastion is OK: OK: All targets OK [19:01:30] can we get a separate volume for mariadb temp files? [19:01:41] krenair@deployment-db1:~$ sudo mysql [19:01:41] ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: NO) [19:01:45] marxarelli: would that solve anything? 
[19:01:51] and yet it works if I `sudo -i` and then `mysql` [19:01:55] (03PS19) 10Paladox: Allow skins to also be tested like extensions can [integration/config] - 10https://gerrit.wikimedia.org/r/228470 (https://phabricator.wikimedia.org/T68926) [19:02:05] Krenair: that's what got me as well [19:02:05] RECOVERY - App Server Main HTTP Response on deployment-mediawiki02 is OK: HTTP OK: HTTP/1.1 200 OK - 38638 bytes in 0.512 second response time [19:02:24] twentyafterfour: it usually helps performance and prevents data corruption from full data partitions [19:02:30] sudo command; is different from sudo su; command; it must be setting up some environment in one of root's shell rc files [19:02:47] (03CR) 10jenkins-bot: [V: 04-1] Allow skins to also be tested like extensions can [integration/config] - 10https://gerrit.wikimedia.org/r/228470 (https://phabricator.wikimedia.org/T68926) (owner: 10Paladox) [19:03:44] (03PS20) 10Paladox: Allow skins to also be tested like extensions can [integration/config] - 10https://gerrit.wikimedia.org/r/228470 (https://phabricator.wikimedia.org/T68926) [19:04:46] (03CR) 10jenkins-bot: [V: 04-1] Allow skins to also be tested like extensions can [integration/config] - 10https://gerrit.wikimedia.org/r/228470 (https://phabricator.wikimedia.org/T68926) (owner: 10Paladox) [19:07:55] (03PS21) 10Paladox: Allow skins to also be tested like extensions can [integration/config] - 10https://gerrit.wikimedia.org/r/228470 (https://phabricator.wikimedia.org/T68926) [19:08:57] (03CR) 10jenkins-bot: [V: 04-1] Allow skins to also be tested like extensions can [integration/config] - 10https://gerrit.wikimedia.org/r/228470 (https://phabricator.wikimedia.org/T68926) (owner: 10Paladox) [19:10:27] RECOVERY - Free space - all mounts on deployment-db2 is OK: OK: All targets OK [19:10:42] (03PS22) 10Paladox: Allow skins to also be tested like extensions can [integration/config] - 10https://gerrit.wikimedia.org/r/228470 (https://phabricator.wikimedia.org/T68926) 
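For readers following the 18:55-18:58 exchange, spotting the runaway query can be sketched in Python. This is only an illustration, not what was run on deployment-db2: the row format is the pipe-delimited process list quoted in the channel, while `ProcessRow`, `parse_processlist_row`, `kill_candidates`, the 600-second threshold and the state filter are hypothetical names and values chosen for the example. `KILL <id>` is the standard MySQL/MariaDB statement for ending a connection's thread.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ProcessRow:
    thread_id: int
    user: str
    db: str
    command: str
    seconds: int
    state: str
    info: str


def parse_processlist_row(line: str) -> ProcessRow:
    """Parse one pipe-delimited process list row as printed by the mysql client."""
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    # Column order: Id, User, Host, db, Command, Time, State, Info
    thread_id, user, _host, db, command, seconds, state = cells[:7]
    # The Info column may itself contain "|", so rejoin whatever is left.
    info = "|".join(cells[7:]).strip()
    return ProcessRow(int(thread_id), user, db, command, int(seconds), state, info)


def kill_candidates(rows: Iterable[ProcessRow],
                    min_seconds: int = 600,  # hypothetical threshold
                    state: str = "Copying to tmp table") -> List[str]:
    """Return KILL statements for long-running queries stuck in the given state."""
    return [
        f"KILL {row.thread_id};"
        for row in rows
        if row.command == "Query" and row.state == state and row.seconds >= min_seconds
    ]
```

Fed the row pasted at 18:55:43, the sketch yields a single statement naming thread 15840440, the process that was logged as killed at 18:58:22.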
[19:11:08] (03CR) 10Florianschmidtwelzow: "Wow, you don't give reviewers any chance to review your changes ;)" (0310 comments) [integration/config] - 10https://gerrit.wikimedia.org/r/228470 (https://phabricator.wikimedia.org/T68926) (owner: 10Paladox) [19:11:15] (03CR) 10Florianschmidtwelzow: [C: 04-1] Allow skins to also be tested like extensions can [integration/config] - 10https://gerrit.wikimedia.org/r/228470 (https://phabricator.wikimedia.org/T68926) (owner: 10Paladox) [19:12:03] (03CR) 10Paladox: "Hi @Florianschmidtwelzow it was failing tests because I didn't change some things so I fixed the errors. sorry didn't realise you were rev" [integration/config] - 10https://gerrit.wikimedia.org/r/228470 (https://phabricator.wikimedia.org/T68926) (owner: 10Paladox) [19:12:57] 10MediaWiki-Codesniffer: Bump mw code sniffer to version 0.4.1 - https://phabricator.wikimedia.org/T115498#1749103 (10Legoktm) We released 0.5.0 instead. [19:13:04] 10MediaWiki-Codesniffer: Bump mw code sniffer to version 0.4.1 - https://phabricator.wikimedia.org/T115498#1749105 (10Legoktm) 5Open>3declined a:3Legoktm [19:20:56] (03CR) 10Sbisson: [C: 032] Release patch version 1.6.2 [selenium] - 10https://gerrit.wikimedia.org/r/248392 (owner: 10Dduvall) [19:23:38] (03Merged) 10jenkins-bot: Release patch version 1.6.2 [selenium] - 10https://gerrit.wikimedia.org/r/248392 (owner: 10Dduvall) [19:35:47] (03PS23) 10Paladox: Enable zuul/jenkins to run unit tests in skin projects [integration/config] - 10https://gerrit.wikimedia.org/r/228470 (https://phabricator.wikimedia.org/T68926) [19:38:18] deployment.wikimedia.beta.wmflabs.org says: [19:38:29] The database is currently locked to new entries and other modifications, probably for routine database maintenance, after which it will be back to normal. [19:38:32] The administrator who locked it offered this explanation: The database has been automatically locked while the slave database servers catch up to the master. [19:38:35] Can someone help? 
[19:49:29] Luke081515: you broke the db :P [19:49:49] see backscroll above [19:50:23] With which action? [19:50:31] I don't know why [19:51:21] 18:55 <+marxarell> "| 15840440 | wikiadmin | 10.68.16.127:60495 | enwiki | Query | 3775 | Copying to tmp table | SELECT /* Flow\Formatter\ContributionsQuery::queryRevisions Luke081515 */ * F" [19:52:05] hm, ok, but which action was that? I can't see the action :-/ [19:52:32] (03PS24) 10Paladox: Enable zuul/jenkins to run unit tests in skin projects [integration/config] - 10https://gerrit.wikimedia.org/r/228470 (https://phabricator.wikimedia.org/T68926) [19:52:33] I just blocked an IP address today, so was it an action, a few days ago? [19:53:30] * greg-g shrugs [19:53:34] maybe it was flow's fault? [19:54:45] I wonder, I don't make something with flow today [19:55:07] (03PS25) 10Paladox: Enable zuul/jenkins to run unit tests in skin projects [integration/config] - 10https://gerrit.wikimedia.org/r/228470 (https://phabricator.wikimedia.org/T68926) [19:55:15] oh, wait: I give selenium user 2 autopatrol an hour or two ago, but that was nothing with flow [19:55:16] State: Master has sent all binlog to slave; waiting for binlog to be updated [19:55:41] but on db2: [19:55:42] State: Waiting for master to send event [19:56:08] (03CR) 10jenkins-bot: [V: 04-1] Enable zuul/jenkins to run unit tests in skin projects [integration/config] - 10https://gerrit.wikimedia.org/r/228470 (https://phabricator.wikimedia.org/T68926) (owner: 10Paladox) [19:56:27] I wonder if that's correct.. [19:56:35] (03PS26) 10Paladox: Enable zuul/jenkins to run unit tests in skin projects [integration/config] - 10https://gerrit.wikimedia.org/r/228470 (https://phabricator.wikimedia.org/T68926) [19:59:23] hm. 
If this error was produced by my last action, it could be only this one: http://en.wikipedia.beta.wmflabs.org/w/index.php?title=Special%3ALog&type=rights&user=&page=User%3ASelenium+Echo+user+2&year=&month=-1&tagfilter= [19:59:35] But I don't know, why this action can make a db crash [20:00:42] Luke081515, sounds like it was you loading a user's contributions [20:01:06] db in cause of a read action? o.O [20:01:32] sure [20:03:34] I looked up the contributions of selenium user 2 and selenium user. selenium user later, and I loaded the pages of him with Special:Nuke, but don't did an action [20:03:45] maybe an error there? [20:04:24] or it [20:04:39] *or at http://en.wikipedia.beta.wmflabs.org/wiki/Talk:Flow_QA, this is a very big flow board [20:05:34] I hope that helps you [20:09:36] twentyafterfour: greg-g: I don't think this is externally blocked? https://phabricator.wikimedia.org/T106732 [20:11:53] 10Deployment-Systems, 7I18n: i18n cache vs resourceloader race condition (RL message key empty) - https://phabricator.wikimedia.org/T68543#1749302 (10Krinkle) [20:12:40] 10Deployment-Systems: i18n cache corruption for enwiki message (possibly others?) meant code displayed instead of labels - https://phabricator.wikimedia.org/T67230#1749305 (10Krinkle) [20:16:34] 10Deployment-Systems: Expose php warnings in mediawiki-config more visibly - https://phabricator.wikimedia.org/T87447#1749311 (10Krinkle) Logstash now tracks notices and warnings as well. And it even has a more verbose debug channel for testwiki. Bringing back the practice of first deploying to mw1017 (testwik... [20:22:20] greg-g: Good, that this happens only at beta. Imagine that would happend at production.... 
[20:22:46] PROBLEM - Puppet staleness on deployment-restbase01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [43200.0] [20:27:07] Luke081515, probably someone would have figured out what's wrong by now if it were production [20:29:13] in a bit I'll file a task, if no one beats me, for at least the flow team to look at [20:35:51] Why is Seconds_Behind_Master NULL... [20:36:44] seems that means replication broke [20:37:35] 10Deployment-Systems, 3Scap3: Scap3 targets should use a config file rather than `key:value` arguments - https://phabricator.wikimedia.org/T116432#1749363 (10thcipriani) 3NEW [20:40:20] Krinkle: not blocked that I know of [20:46:38] Krinkle: I think it's only blocked in the sense that I don't know how to move it forward. I don't feel comfortable poking the production apache config [20:47:09] twentyafterfour: Well, right now it's executing random php files when accessing supposedly static content. [20:47:40] because we switched to HHVM and didn't do feature parity, afaict [20:47:49] A regression introduced in the last few months. Possibly caused by the standardisation of apache configs between prod and meta [20:47:56] Or that. [20:48:58] it should be fixed but I'd like someone from operations, or someone more familiar with apache config, to help out because I'm slightly lost - looking through https://github.com/wikimedia/operations-puppet/tree/production/modules/mediawiki/files/apache/sites there are a lot of things going on in there. [20:49:37] it actually looks like a lot could be simplified / refactored to reuse more includes or more puppet templating [20:49:39] 10Browser-Tests, 5Patch-For-Review, 3reading-web-sprint-58-The-Sixth-Sense: Investigate QuickSurveys browser tests failures - https://phabricator.wikimedia.org/T113534#1749403 (10atgo) [20:50:13] i'm seeing readonly errors from api calls to bc [20:50:20] to beta? 
[20:50:21] :(
[20:50:26] yeah
[20:50:28] yes, it's read-only at the moment
[20:50:32] ah, ok
[20:50:34] replication from deployment-db1 to deployment-db2 is broken
[20:50:37] repairing things?
[20:50:39] I have no idea how to fix it
[20:50:40] no
[20:50:41] got it
[20:50:58] hmm broken by that query we killed earlier?
[20:51:11] likely, if it ran out of disk space
[20:51:17] I used to know how to fix mysql replication but it's been a long time since I've done it
[20:51:23] it wouldn't be able to write to the relay logs
[20:51:25] it's actually quite involved
[20:51:34] i can give it a shot
[20:51:36] I sent a message to -databases about it
[20:51:57] I tried to make it restart replication but it stopped again
[20:54:05] !log deployment-db2 shows slave io but slave sql failed on duplicate key
[20:54:14] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[21:13:49] !log deployment-db1 binlog deployment-db1-bin.000062 appears corrupt
[21:13:54] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[21:14:19] marxarelli: need help from an opsen?
[21:15:01] * marxarelli has never heard that phrase before. music ...
[21:16:10] just throwing it out there, I need to go afk/irc for a short time
[21:16:11] greg-g: it depends on whether locking db1 is a problem
[21:16:12] Project browsertests-QuickSurveys-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #43: FAILURE in 11 sec: https://integration.wikimedia.org/ci/job/browsertests-QuickSurveys-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/43/
[21:16:42] marxarelli: nah, we can lock beta cluster for a while to fix it, at this point I'm just curious if something just happened that might happen in prod on Tuesday
[21:16:52] i can fix it, but not gracefully (a full dump from db1 to db2 which will lock everything)
[21:17:07] yeah, 'tis fine, it's Friday afternoon
[21:17:22] contrary to production, us breaking on a friday is OK ;)
[21:17:26] i'm fairly certain the lack of diskspace caused the binlog corruption
[21:18:08] but was it that query that caused the huge diskspace use, or was that irrelevant (in that regard)?
[21:18:18] which is why i suggest we look at putting binlogs and/or tmpdir on a separate volume/partition
[21:18:27] * greg-g nods
[21:18:27] i have no idea about that
[21:18:31] kk
[21:18:46] I can over-worry until I'm wrong :)
[21:19:04] bbs
[21:24:24] you following http://stackoverflow.com/a/3229580 ?
[21:26:09] that's more or less what i was planning, yeah
[21:26:27] i don't think you strictly need to reset/flush the master logs because you can start replication at a later point
[21:26:36] the important thing is to capture the log position with the dump
[21:26:50] and make sure the fs is fully synced
[21:27:31] but i'm pulling this from my aging memory, so i appreciate the reference :)
[21:32:40] Krenair: so i think i'm going to do all databases except for `mysql` in case privileges are set up differently
[21:32:45] does that sound reasonable?
[21:33:10] and skipping information/performance_schema
[21:33:12] I doubt privileges will be different
[21:33:35] but sure
[21:34:32] thank you, btw, marxarelli
[21:35:52] sure thing. i'm kind of fun when shits not on the line
[21:35:58] *it's*
[21:36:02] i'm no fun
[21:37:18] !log dumping databases on deployment-db1 for restore of deployment-db2
[21:37:24] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[21:39:27] marxarelli: fun is after
[22:09:32] does beta cluster have an extension1 db cluster?
[22:09:41] (totally unrelated to above)
[22:10:47] hmm, nope.
[22:16:22] legoktm: What should i do for $wgEnabledTranscodeSet in https://gerrit.wikimedia.org/r/#/c/210176/ ? I added EnabledTranscodeSet to extension.json, but it is now failing the jenkins test, saying undefined index.
[22:18:30] !log dump of deployment-db1 failed due to "View 'labswiki.bounce_records' references invalid table(s)"
[22:18:35] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[22:23:16] Beta-Cluster-Infrastructure, MediaWiki-extensions-UrlShortener, Wikimedia-Extension-setup: Set up UrlShortener extension on the beta cluster - https://phabricator.wikimedia.org/T116444#1749684 (Legoktm) NEW
[22:28:13] marxarelli, wtf we have a labswiki db in beta?
[22:29:12] do we have a beta as well? :o
[22:29:12] VMs on VMs to test if OSM works ;)
[22:29:43] Krenair: guess so
[22:29:52] Something's wrong here
[22:30:37] Ohh....
[22:31:18] no wait, it still doesn't make sense
[22:33:13] mysql> explain bounce_records;
[22:33:13] ERROR 1356 (HY000): View 'labswiki.bounce_records' references invalid table(s) or column(s) or function(s) or definer/invoker of view lack rights to use them
[22:33:35] (PS1) Gergő Tisza: Add checks for mediawiki/oauthclient-php [integration/config] - https://gerrit.wikimedia.org/r/248551
[22:33:37] even as root
[22:33:44] still, labswiki? wtf?
[22:34:08] When I read that I thought the wikitech DB had leaked
[22:34:20] It's not wikitech
[22:34:23] I just don't know what it is yet
[22:34:35] test instance of wikitech?
[22:35:30] i have no idea why it's there, but please don't do any modifications while i'm re-dumping :)
[22:35:32] no
[22:35:39] I'm not modifying anything
[22:35:49] :)
[22:35:55] Looks like a copy of deploymentwiki?
[22:36:26] Beta-Cluster-Infrastructure, Release-Engineering-Team: Beta Cluster outage: deployment-db2 disk filled up, locked db replication - https://phabricator.wikimedia.org/T116447#1749727 (greg) NEW
[22:36:44] Beta-Cluster-Infrastructure, Release-Engineering-Team: Beta Cluster outage: deployment-db2 disk filled up, locked db replication - https://phabricator.wikimedia.org/T116447#1749736 (greg) a:dduvall
[22:36:52] just for the record ^
[22:37:54] Okay
[22:38:03] So it looks like labswiki is a DB containing a load of views pointing to deploymentwiki
[22:38:58] at least 4 of which are broken according to mysqldump so far
[22:39:24] bounce_records, hitcounter, page, site_stats
[22:39:28] select `deploymentwiki`.`bounce_records`.`br_id` AS `br_id`,`deploymentwiki`.`bounce_records`.`br_user` AS `br_user`,`deploymentwiki`.`bounce_records`.`br_timestamp` AS `br_timestamp`,`deploymentwiki`.`bounce_records`.`br_reason` AS `br_reason` from `deploymentwiki`.`bounce_records`
[22:39:53] so who created this and why
[22:41:39] https://tools.wmflabs.org/sal/releng?p=0&q=db&d= doesn't help me much, maybe others? :)
[22:43:48] Krenair: bouncehandler stuff is probably me. But I don't remember ever touching labswiki.
[22:44:37] i see a table called 'bug_54847_password_resets' if that's any clue
[22:45:04] that's old stuff
[22:45:09] it's just another view pointing to deploymentwiki
[22:45:23] all the tables are views in labswiki
[22:45:31] was http://bugs.wmflabs.org/54847
[22:45:52] (CR) Legoktm: [C: -1] "PHP libraries should use the 'composer-test-package' template.
composer-test is for MW extensions, and this library doesn't have any npm/j" [integration/config] - https://gerrit.wikimedia.org/r/248551 (owner: Gergő Tisza)
[22:49:27] (PS2) Gergő Tisza: Add checks for mediawiki/oauthclient-php [integration/config] - https://gerrit.wikimedia.org/r/248551
[23:00:20] (PS3) Legoktm: Add checks for mediawiki/oauthclient-php [integration/config] - https://gerrit.wikimedia.org/r/248551 (owner: Gergő Tisza)
[23:01:51] (CR) Legoktm: [C: 2] Add checks for mediawiki/oauthclient-php [integration/config] - https://gerrit.wikimedia.org/r/248551 (owner: Gergő Tisza)
[23:02:53] (Merged) jenkins-bot: Add checks for mediawiki/oauthclient-php [integration/config] - https://gerrit.wikimedia.org/r/248551 (owner: Gergő Tisza)
[23:03:58] !log deploying https://gerrit.wikimedia.org/r/248551
[23:04:03] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[23:04:27] Project browsertests-Gather-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #302: FAILURE in 7 min 27 sec: https://integration.wikimedia.org/ci/job/browsertests-Gather-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/302/
[23:05:08] !log restoring deployment-db2 from dump
[23:05:17] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[23:09:59] (PS1) Gergő Tisza: Add Doxygen and test coverage for mediawiki/oauthclient-php [integration/config] - https://gerrit.wikimedia.org/r/248562
[23:47:15] Beta-Cluster-Infrastructure: +Sysop for User:Mww113 - https://phabricator.wikimedia.org/T116364#1749854 (Mww113) I believe I would need it on en.wikipedia.beta and meta.wikimedia.beta. I will be testing interface pages related to the title blacklist and antispoof. Specifically: MediaWiki:antispoof-conflict-...
[23:56:51] Beta-Cluster-Infrastructure: +Sysop for User:Mww113 - https://phabricator.wikimedia.org/T116364#1749862 (Luke081515) Open>stalled We have to wait for T116447; at the moment the whole beta cluster is read-only, so I cannot modify user rights, and I wanted other users to be able to say something too, if they...