[00:06:52] (03CR) 10JanZerebecki: [C: 031] Have php-composer-test job pass even if no composer.json exists [integration/config] - 10https://gerrit.wikimedia.org/r/264990 (owner: 10Legoktm) [00:07:44] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:08:47] 6Release-Engineering-Team, 6Phabricator, 10Traffic, 6operations, 5Patch-For-Review: Phabricator needs to expose ssh - https://phabricator.wikimedia.org/T100519#1950585 (10mmodell) @dzahn: awesome, thanks! [00:26:30] (03CR) 10Paladox: "Hi would this be possible to do it too npm since we want to switch all repos from jslint to npm with jshint and jsonlint left as a fallbac" [integration/config] - 10https://gerrit.wikimedia.org/r/264990 (owner: 10Legoktm) [00:26:39] (03CR) 10Paladox: [C: 031] Have php-composer-test job pass even if no composer.json exists [integration/config] - 10https://gerrit.wikimedia.org/r/264990 (owner: 10Legoktm) [00:28:20] (03CR) 10Legoktm: "Yes, I'd like to do it with composer-test as a test first before updating npm, which is more widely used." [integration/config] - 10https://gerrit.wikimedia.org/r/264990 (owner: 10Legoktm) [00:42:02] (03CR) 10Paladox: "Ok thanks." 
[integration/config] - 10https://gerrit.wikimedia.org/r/264990 (owner: 10Legoktm) [00:57:17] !log jenkins-deploy@deployment-bastion:/srv/mediawiki-staging/php-master/extensions$ git reset HEAD SpellingDictionary [00:57:21] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [01:00:13] !log jenkins-deploy@deployment-bastion:/srv/mediawiki-staging/php-master/extensions$ git pull && git submodule update --init --recursive [01:00:19] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [01:00:56] oops it seems the --recursive was never before done *sigh* [01:07:40] hopefully that fixed https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/ [01:14:56] 10Continuous-Integration-Infrastructure, 5Continuous-Integration-Scaling, 6operations: install/setup/deploy cobalt as replacement for gallium - https://phabricator.wikimedia.org/T95959#1950862 (10Dzahn) maybe we could go ahead with the same setup we had on gallium, or let's make an actual blocker for the net... [01:15:19] 10Continuous-Integration-Infrastructure, 5Continuous-Integration-Scaling, 10netops, 6operations: install/setup/deploy cobalt as replacement for gallium - https://phabricator.wikimedia.org/T95959#1950863 (10Dzahn) [01:15:48] 10Continuous-Integration-Infrastructure, 5Continuous-Integration-Scaling, 10netops, 6operations: install/setup/deploy cobalt as replacement for gallium - https://phabricator.wikimedia.org/T95959#1204862 (10Dzahn) added @netops please specify which VLAN to use for cobalt [01:19:09] PROBLEM - App Server Main HTTP Response on deployment-mediawiki03 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:26:38] I’m seeing some 'Cannot initiate the connection to webproxy.eqiad.wmnet:8080’ messages in deployment-prep (specifically, on deployment-eventlogging03) [01:26:54] that makes me think that puppet is broken there, or the deployment-prep puppet repo is way out of date. 
[01:29:00] RECOVERY - App Server Main HTTP Response on deployment-mediawiki03 is OK: HTTP OK: HTTP/1.1 200 OK - 39399 bytes in 1.962 second response time [01:29:29] yeah, puppet is broken on deployment-eventlogging03. [01:29:37] anyone here to care? [01:42:49] mh my intervention for beta-code-update-eqiad didn't work as it's running with --recursive and failing [01:59:01] !log jenkins-deploy@deployment-bastion:/srv/mediawiki-staging/php-master/extensions/SpellingDictionary$ rm -r modules/jquery.uls && git rm modules/jquery.uls [01:59:05] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [02:04:28] Yippee, build fixed! [02:04:28] Project beta-code-update-eqiad build #89404: 09FIXED in 1 min 27 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/89404/ [02:09:07] 10Beta-Cluster-Infrastructure, 10Spelling-Dictionary: beta-code-update fails because of partial submodule in SpellingDictionary - https://phabricator.wikimedia.org/T124266#1951027 (10JanZerebecki) 3NEW [02:11:31] 10Beta-Cluster-Infrastructure, 10Spelling-Dictionary, 5Patch-For-Review: beta-code-update fails because of partial submodule in SpellingDictionary - https://phabricator.wikimedia.org/T124266#1951039 (10JanZerebecki) ``` !log jenkins-deploy@deployment-bastion:/srv/mediawiki-staging/php-master/extensions$ git... 
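Note on the submodule incident above: the log suggests nested submodules had never been initialised before `--recursive` was added. One way to spot that state is to read the one-character prefix that `git submodule status --recursive` puts in front of each entry. The following is a minimal sketch, not what was run on deployment-bastion; the function name and the sample paths are hypothetical, and paths containing spaces are not handled.

```python
from typing import Dict, List

# Prefix characters printed by `git submodule status`:
#   '-' submodule not initialised, '+' checked-out commit differs from the
#   recorded one, 'U' merge conflicts, ' ' (space) in sync.
STATE_BY_PREFIX = {
    "-": "uninitialized",
    "+": "commit-mismatch",
    "U": "merge-conflict",
    " ": "ok",
}


def classify_submodules(status_output: str) -> Dict[str, List[str]]:
    """Group submodule paths by state, given `git submodule status` output."""
    result: Dict[str, List[str]] = {state: [] for state in STATE_BY_PREFIX.values()}
    for line in status_output.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        state = STATE_BY_PREFIX.get(line[0])
        fields = line[1:].split()
        if state is None or len(fields) < 2:
            continue  # not a status line this sketch understands
        # fields[0] is the commit hash, fields[1] the submodule path
        result[state].append(fields[1])
    return result


if __name__ == "__main__":
    # Hypothetical sample output, not taken from the log.
    sample = (
        "-1111111111111111111111111111111111111111 extensions/Foo\n"
        " 2222222222222222222222222222222222222222 extensions/Bar (heads/master)\n"
    )
    print(classify_submodules(sample))
```

Anything listed under `uninitialized` would be a candidate for `git submodule update --init --recursive`, the command logged at 01:00:13.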
[03:39:33] Project browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #949: 04FAILURE in 57 min: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-firefox-sauce/949/ [04:02:08] PROBLEM - App Server Main HTTP Response on deployment-mediawiki03 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:06:58] RECOVERY - App Server Main HTTP Response on deployment-mediawiki03 is OK: HTTP OK: HTTP/1.1 200 OK - 39413 bytes in 0.638 second response time [06:13:10] PROBLEM - App Server Main HTTP Response on deployment-mediawiki03 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:23:34] 10Continuous-Integration-Infrastructure, 5Continuous-Integration-Scaling, 10netops, 6operations: install/setup/deploy cobalt as replacement for gallium - https://phabricator.wikimedia.org/T95959#1951302 (10akosiaris) So, gallium is in `public1-b-eqiad` (208.80.154.128/26). The story behind a public IP is a... [08:26:19] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce build #851: 04FAILURE in 16 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce/851/ [09:33:52] 10Beta-Cluster-Infrastructure: wgCentralAuthCheckLoggedInURL in Beta Cluster should be https - https://phabricator.wikimedia.org/T124275#1951364 (10Bugreporter) 3NEW [09:34:32] 6Release-Engineering-Team, 6Phabricator, 10Traffic, 6operations, 5Patch-For-Review: Phabricator needs to expose ssh - https://phabricator.wikimedia.org/T100519#1951371 (10faidon) @mmodell can you please fix IPv6 instead or explain why it is difficult to do so? FWIW, IPv6 penetration is > 10% globally and... [09:35:03] 10Beta-Cluster-Infrastructure: wgCentralAuthCheckLoggedInURL in Beta Cluster should be https - https://phabricator.wikimedia.org/T124275#1951373 (10Legoktm) Beta cluster doesn't support HTTPS... 
[09:35:20] 10Beta-Cluster-Infrastructure, 6Labs, 10Labs-Infrastructure, 6operations: beta: Get SSL certificates for *.{projects}.beta.wmflabs.org - https://phabricator.wikimedia.org/T50501#1951375 (10hashar) [09:35:22] 10Beta-Cluster-Infrastructure: wgCentralAuthCheckLoggedInURL in Beta Cluster should be https - https://phabricator.wikimedia.org/T124275#1951374 (10hashar) [09:35:49] 10Beta-Cluster-Infrastructure: wgCentralAuthCheckLoggedInURL in Beta Cluster should be https - https://phabricator.wikimedia.org/T124275#1951377 (10hashar) 5Open>3stalled There is no HTTPS on beta cluster, hence stalling this task and marking it as being blocked by {T50501}. [09:39:02] Project browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #742: 04FAILURE in 2 min 1 sec: https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/742/ [09:43:48] 10Beta-Cluster-Infrastructure, 6Labs, 10Labs-Infrastructure, 6operations: beta: Get SSL certificates for *.{projects}.beta.wmflabs.org - https://phabricator.wikimedia.org/T50501#1951388 (10faidon) >>! In T50501#527689, @Krinkle wrote: > Would it be an option to flatten our subdomains? > > We'd only need b... [10:16:58] 10Continuous-Integration-Infrastructure, 5Continuous-Integration-Scaling, 10netops, 6operations: install/setup/deploy cobalt as replacement for gallium - https://phabricator.wikimedia.org/T95959#1951435 (10hashar) `gallium.wikimedia.org` has a bunch of services which are exposed publicly via the misc-web v... [10:28:27] hashar: when you have some time, could you take a look at https://gerrit.wikimedia.org/r/264990 and see if you like the concept? [10:33:15] 6Release-Engineering-Team, 6Phabricator, 10Traffic, 6operations, 5Patch-For-Review: Phabricator needs to expose ssh - https://phabricator.wikimedia.org/T100519#1951457 (10Reedy) >>! In T100519#1951371, @faidon wrote: > @mmodell can you please fix IPv6 instead or explain why it is difficult to do so? 
FWIW... [10:48:43] legoktm: will do :) [10:55:20] (03CR) 10Hashar: "Sounds wise and can be generalized to the npm job as well." [integration/config] - 10https://gerrit.wikimedia.org/r/264990 (owner: 10Legoktm) [11:11:06] 6Release-Engineering-Team, 6Phabricator, 10Traffic, 6operations, 5Patch-For-Review: Phabricator needs to expose ssh - https://phabricator.wikimedia.org/T100519#1951490 (10mmodell) @faidon: I don't have any idea how to fix ipv6. I have zero experience with the systems involved and I don't even have ipv6... [11:16:41] Am testing https://gerrit.wikimedia.org/r/#/c/264978/ in beta [11:16:52] please let me know if you see any issues with the portals there, or any other apache weirdness [11:20:54] 6Release-Engineering-Team, 6Phabricator, 10Traffic, 6operations, 5Patch-For-Review: Phabricator needs to expose ssh - https://phabricator.wikimedia.org/T100519#1951505 (10faidon) >>! In T100519#1951490, @mmodell wrote: > @faidon: I don't have any idea how to fix ipv6. I have zero experience with the sys... [11:32:12] Krenair: you are probably on your own :-} [11:41:33] 6Release-Engineering-Team, 6Phabricator, 10Traffic, 6operations, 5Patch-For-Review: Phabricator needs to expose ssh - https://phabricator.wikimedia.org/T100519#1951540 (10mmodell) @faidon: I was only summarizing the discussion we (myself, @reedy, @dzahn and @chasemp) had in IRC. Please don't shoot the me... [12:06:46] 6Release-Engineering-Team, 6Phabricator, 10Traffic, 6operations, 5Patch-For-Review: Phabricator needs to expose ssh - https://phabricator.wikimedia.org/T100519#1951552 (10Reedy) >>! In T100519#1951505, @faidon wrote: > In any case, please approach "X is broken and I don't know how to fix it" with "can so... 
[12:07:42] 6Release-Engineering-Team, 6Phabricator, 10Traffic, 6operations, 5Patch-For-Review: Phabricator needs to expose ssh - https://phabricator.wikimedia.org/T100519#1951553 (10hashar) The DNS IPv6 entry has been dropped yesterday because there is no ssh service listening there to serve the git repositories.... [12:45:08] Project browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #917: 15ABORTED in 13 min: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/917/ [13:25:50] Project browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #919: 04STILL FAILING in 26 min: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/919/ [13:37:05] (03PS12) 10Hashar: castor: package managers cache storage [integration/config] - 10https://gerrit.wikimedia.org/r/264327 (https://phabricator.wikimedia.org/T112560) [13:37:20] (03PS13) 10Hashar: castor: package managers cache storage [integration/config] - 10https://gerrit.wikimedia.org/r/264327 (https://phabricator.wikimedia.org/T112560) [13:58:57] RECOVERY - App Server Main HTTP Response on deployment-mediawiki03 is OK: HTTP OK: HTTP/1.1 200 OK - 39391 bytes in 0.610 second response time [14:04:26] ryasmeen|Away: ready for the meeting? [14:05:07] PROBLEM - App Server Main HTTP Response on deployment-mediawiki03 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:50:38] 6Release-Engineering-Team, 6Phabricator, 10Traffic, 6operations, 5Patch-For-Review: Phabricator needs to expose ssh - https://phabricator.wikimedia.org/T100519#1951773 (10BBlack) I'm putting together 3x commits for review that I think will resolve this, they should show up below... 
[15:38:08] (03PS14) 10Hashar: castor: package managers cache storage [integration/config] - 10https://gerrit.wikimedia.org/r/264327 (https://phabricator.wikimedia.org/T112560) [15:47:01] greg-g: i see you have reserved r32 for a meeting with zeljkof [15:47:16] greg-g: if you're not coming to the office, can i steal the room? [15:47:17] :P [15:48:46] (03PS15) 10Hashar: castor: package managers cache storage [integration/config] - 10https://gerrit.wikimedia.org/r/264327 (https://phabricator.wikimedia.org/T112560) [15:49:03] (03CR) 10Hashar: [C: 032] "Unleash time" [integration/config] - 10https://gerrit.wikimedia.org/r/264327 (https://phabricator.wikimedia.org/T112560) (owner: 10Hashar) [15:49:56] (03CR) 10Hashar: [C: 04-2] "grrrr" [integration/config] - 10https://gerrit.wikimedia.org/r/264327 (https://phabricator.wikimedia.org/T112560) (owner: 10Hashar) [15:50:44] (03CR) 10jenkins-bot: [V: 04-1] castor: package managers cache storage [integration/config] - 10https://gerrit.wikimedia.org/r/264327 (https://phabricator.wikimedia.org/T112560) (owner: 10Hashar) [15:53:26] (03CR) 10Hashar: "recheck" [integration/config] - 10https://gerrit.wikimedia.org/r/264327 (https://phabricator.wikimedia.org/T112560) (owner: 10Hashar) [15:53:41] mobrovac: yup [15:54:08] mobrovac: r32 removed from invite [15:54:24] cool! thnx greg-g! [15:55:34] * mobrovac r32 stolen :) [15:57:36] (03CR) 10Hashar: [C: 032] "castor-save now actually skip (exit 1) unless it is gate-and-submit." 
[integration/config] - 10https://gerrit.wikimedia.org/r/264327 (https://phabricator.wikimedia.org/T112560) (owner: 10Hashar) [15:57:50] (03PS16) 10Hashar: castor: package managers cache storage [integration/config] - 10https://gerrit.wikimedia.org/r/264327 (https://phabricator.wikimedia.org/T112560) [15:58:01] (03CR) 10Hashar: [C: 032] castor: package managers cache storage [integration/config] - 10https://gerrit.wikimedia.org/r/264327 (https://phabricator.wikimedia.org/T112560) (owner: 10Hashar) [15:59:59] (03Merged) 10jenkins-bot: castor: package managers cache storage [integration/config] - 10https://gerrit.wikimedia.org/r/264327 (https://phabricator.wikimedia.org/T112560) (owner: 10Hashar) [16:08:45] (03PS1) 10Hashar: Enable castor on {name}-tox-{toxenv}-jessie jobs [integration/config] - 10https://gerrit.wikimedia.org/r/265502 (https://phabricator.wikimedia.org/T112560) [16:16:27] 10Deployment-Systems, 3Scap3: Bug in scap3 git submodule url rewriting - https://phabricator.wikimedia.org/T121884#1952300 (10mmodell) [16:22:25] (03CR) 10Hashar: [C: 032] Enable castor on {name}-tox-{toxenv}-jessie jobs [integration/config] - 10https://gerrit.wikimedia.org/r/265502 (https://phabricator.wikimedia.org/T112560) (owner: 10Hashar) [16:23:31] 5Continuous-Integration-Scaling, 7Tracking: Investigate using Drydock for CI - https://phabricator.wikimedia.org/T116038#1952317 (10hashar) 5Open>3declined a:3hashar I went with a lame central rsync server. >>! In T112560#1938443, @hashar wrote: > Did a first pass using a cache store/restore system base... [16:23:33] 5Continuous-Integration-Scaling, 5Patch-For-Review, 7Tracking: [tracking] Disposable VMs need a cache for package managers - https://phabricator.wikimedia.org/T112560#1952322 (10hashar) [16:23:46] 5Continuous-Integration-Scaling: Investigate using gemstash - https://phabricator.wikimedia.org/T119196#1952323 (10hashar) 5Open>3declined a:3hashar I went with a lame central rsync server. >>! 
In T112560#1938443, @hashar wrote: > Did a first pass using a cache store/restore system based on rsync. Investi... [16:23:48] 5Continuous-Integration-Scaling, 5Patch-For-Review, 7Tracking: [tracking] Disposable VMs need a cache for package managers - https://phabricator.wikimedia.org/T112560#1638442 (10hashar) [16:24:31] 5Continuous-Integration-Scaling, 5Patch-For-Review, 7Tracking: [tracking] Disposable VMs need a cache for package managers - https://phabricator.wikimedia.org/T112560#1638442 (10hashar) [16:24:33] 5Continuous-Integration-Scaling, 7Tracking, 7WorkType-NewFunctionality: Investigate using a Squid based man in the middle proxy to cache package manager SSL connections - https://phabricator.wikimedia.org/T116015#1952330 (10hashar) 5Open>3declined Declining this for now. I went with a lame central rsync... [16:24:55] 5Continuous-Integration-Scaling, 5Patch-For-Review, 7Tracking: [tracking] Disposable VMs need a cache for package managers - https://phabricator.wikimedia.org/T112560#1638442 (10hashar) [16:25:00] (03Merged) 10jenkins-bot: Enable castor on {name}-tox-{toxenv}-jessie jobs [integration/config] - 10https://gerrit.wikimedia.org/r/265502 (https://phabricator.wikimedia.org/T112560) (owner: 10Hashar) [16:26:47] 5Continuous-Integration-Scaling, 5Patch-For-Review, 7Tracking: [tracking] Disposable VMs need a cache for package managers - https://phabricator.wikimedia.org/T112560#1952342 (10hashar) So I went with a rsync based approach which I have nicknamed CASTOR for CAche STORage. It is implemented via Jenkins `jjb/... 
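Note on the castor messages above: the review comment says the save step is skipped unless the job runs in gate-and-submit, and the task comment describes a central rsync server. A minimal sketch of that gating follows, under stated assumptions: the `ZUUL_PIPELINE` variable name, the `castor.example` host, and the `caches/<namespace>` layout are all hypothetical, since the real job definition is truncated in the log. The command is only built, never executed.

```python
import os
from typing import List

# Per the review comment in the log, only this pipeline refreshes the cache.
SAVING_PIPELINE = "gate-and-submit"


def should_save_cache(pipeline: str) -> bool:
    """Return True only for the pipeline that is allowed to save the cache."""
    return pipeline == SAVING_PIPELINE


def rsync_save_command(local_dir: str, server: str, namespace: str) -> List[str]:
    """Build (but do not run) an rsync call that pushes local_dir to the server.

    `server` and `namespace` are hypothetical; the real layout is not in the log.
    """
    source = local_dir.rstrip("/") + "/"  # trailing slash: copy directory contents
    destination = f"rsync://{server}/caches/{namespace}/"
    return ["rsync", "--archive", "--compress", source, destination]


if __name__ == "__main__":
    pipeline = os.environ.get("ZUUL_PIPELINE", "")  # assumed variable name
    if should_save_cache(pipeline):
        print(" ".join(rsync_save_command("/tmp/cache", "castor.example", "tox-jessie")))
    else:
        print(f"pipeline {pipeline!r}: skipping cache save")
```

Restricting writes to the gating pipeline means only changes that are about to merge can alter the shared cache; other pipelines would only read from it.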
[16:31:39] 6Release-Engineering-Team, 6Phabricator, 10Traffic, 6operations, 5Patch-For-Review: Phabricator needs to expose ssh - https://phabricator.wikimedia.org/T100519#1952355 (10greg) Thanks @bblack [17:05:42] PROBLEM - Host deployment-sca02 is DOWN: CRITICAL - Host Unreachable (10.68.16.173) [17:12:31] PROBLEM - Host integration-slave-jessie-1001 is DOWN: CRITICAL - Host Unreachable (10.68.16.72) [17:12:45] Hi does anyone know when ssh will be enabled on phabricator for things like uploading patches. [17:12:53] PROBLEM - Host deployment-logstash2 is DOWN: CRITICAL - Host Unreachable (10.68.16.147) [17:13:11] PROBLEM - Host deployment-fluorine is DOWN: CRITICAL - Host Unreachable (10.68.16.198) [17:16:43] RECOVERY - Host deployment-sca02 is UP: PING OK - Packet loss = 0%, RTA = 0.61 ms [17:17:55] RECOVERY - Host deployment-logstash2 is UP: PING OK - Packet loss = 0%, RTA = 1.22 ms [17:18:13] RECOVERY - Host deployment-fluorine is UP: PING OK - Packet loss = 0%, RTA = 2.63 ms [17:20:49] legoktm: Do you know if a patch has been uploaded that fixes the merging of arrays for groupermission. [17:22:29] RECOVERY - Host integration-slave-jessie-1001 is UP: PING OK - Packet loss = 0%, RTA = 0.64 ms [17:34:02] 10Continuous-Integration-Config, 10QuickSurveys: QuickSurveys doesn't run qunit on commit basis - https://phabricator.wikimedia.org/T124309#1952542 (10Jdlrobson) 3NEW [17:39:39] marxarelli: do you have a plan for rolling wikis forward today? anomie (not in this channel!) seems to think that trying .11 on group2 again should be safe and then all the way to group3 if errors don't spike. [17:39:48] (03PS1) 10Paladox: [QuickSurveys] Add QUnit tests [integration/config] - 10https://gerrit.wikimedia.org/r/265513 (https://phabricator.wikimedia.org/T124309) [17:39:58] group3? 
:o [17:40:09] if they do spike then we'd need to either try to patch again or roll back depending on how bad things look [17:40:19] err group1 and group2 [17:40:27] off by one error [17:40:37] 10Continuous-Integration-Config, 10QuickSurveys, 5Patch-For-Review: QuickSurveys doesn't run qunit on commit basis - https://phabricator.wikimedia.org/T124309#1952571 (10Paladox) @Jdlrobson until the patch is merged you can use check experimental which will run the qunit tests. [17:40:44] bd808: that was my thinking as well [17:41:04] being this is my first deployment week, i'm open to suggestions from those more experienced :) [17:41:09] sweet. I'm going to check on logstash and see if it's better now [17:41:32] marxarelli: well to earn your shirt you need to act in good faith and still screw everything up [17:42:12] (then you need to convince someone to print more shirts) [17:45:56] twentyafterfour: Could you create a repo in diffusion for mediawiki extension QuickSurveys https://git.wikimedia.org/summary/mediawiki%2Fextensions%2FQuickSurveys.git [17:47:13] paladox: ok [17:48:33] twentyafterfour: Thanks. do you know when the next round of adding missing repos that were created in gerrit and not in diffusion because since early janaury anyone requesting a repo on gerrit was also created on diffusion but before that repos are still waiting to be created [17:49:23] twentyafterfour: Also could you review my patches at https://gerrit.wikimedia.org/r/#/q/status:open+project:phabricator/extensions+branch:master+owner:%22Paladox+%253Cthomasmulhall410%2540yahoo.com%253E%22,n,z they add more repos to the redirection script. [17:49:41] paladox: https://phabricator.wikimedia.org/diffusion/EQS/ [17:49:57] twentyafterfour: Tahnks. [17:50:00] thanks. 
[17:50:57] paladox: the redirect script is a generated list - no need to manually patch that list, ostriches can re-generate the list and then we do bulk updates to it [17:51:00] bd808: i feel like i did a pretty good job of that on tuesday [17:51:14] maybe not shirt worthy [17:51:30] twentyafterfour: Oh ok. [17:51:50] marxarelli: don't worry, you can always do something shirt worthy next time :P [17:52:17] (03CR) 10Jdlrobson: [C: 031] [QuickSurveys] Add QUnit tests [integration/config] - 10https://gerrit.wikimedia.org/r/265513 (https://phabricator.wikimedia.org/T124309) (owner: 10Paladox) [17:53:42] thanks paladox [17:53:51] i'd +2 but don't have that in this repository [17:54:25] jdlrobson: Your welcome. For now until that is merged you can do check experimental which should run the qunit tests. [17:56:07] PROBLEM - App Server Main HTTP Response on deployment-mediawiki03 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:01:55] PROBLEM - App Server bits response on deployment-mediawiki03 is CRITICAL: Connection refused [18:02:05] PROBLEM - App Server Main HTTP Response on deployment-mediawiki02 is CRITICAL: Connection refused [18:02:39] PROBLEM - App Server bits response on deployment-mediawiki02 is CRITICAL: Connection refused [18:09:32] PROBLEM - Host deployment-salt is DOWN: CRITICAL - Host Unreachable (10.68.16.99) [18:11:38] PROBLEM - Host deployment-parsoid05 is DOWN: CRITICAL - Host Unreachable (10.68.16.120) [18:11:47] 10Continuous-Integration-Config, 10QuickSurveys, 5Patch-For-Review: QuickSurveys doesn't run qunit on commit basis - https://phabricator.wikimedia.org/T124309#1952642 (10bd808) p:5Triage>3High [18:12:56] PROBLEM - Host deployment-memc03 is DOWN: CRITICAL - Host Unreachable (10.68.16.15) [18:14:03] PROBLEM - Host integration-slave-trusty-1013 is DOWN: CRITICAL - Host Unreachable (10.68.18.28) [18:14:05] Project beta-scap-eqiad build #87139: 04FAILURE in 6 min 29 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/87139/ 
[18:15:07] (03PS1) 10Paladox: [OpenLayers] Add jenkins tests [integration/config] - 10https://gerrit.wikimedia.org/r/265520 [18:16:53] RECOVERY - App Server bits response on deployment-mediawiki03 is OK: HTTP OK: HTTP/1.1 200 OK - 3896 bytes in 0.002 second response time [18:17:05] RECOVERY - App Server Main HTTP Response on deployment-mediawiki02 is OK: HTTP OK: HTTP/1.1 200 OK - 39391 bytes in 0.603 second response time [18:17:39] RECOVERY - App Server bits response on deployment-mediawiki02 is OK: HTTP OK: HTTP/1.1 200 OK - 3896 bytes in 0.003 second response time [18:17:57] RECOVERY - Host deployment-memc03 is UP: PING OK - Packet loss = 0%, RTA = 0.82 ms [18:18:29] RECOVERY - Host deployment-parsoid05 is UP: PING OK - Packet loss = 0%, RTA = 0.90 ms [18:18:55] RECOVERY - Host deployment-salt is UP: PING OK - Packet loss = 0%, RTA = 0.91 ms [18:19:03] RECOVERY - Host integration-slave-trusty-1013 is UP: PING OK - Packet loss = 0%, RTA = 0.69 ms [18:20:58] RECOVERY - App Server Main HTTP Response on deployment-mediawiki03 is OK: HTTP OK: HTTP/1.1 200 OK - 39399 bytes in 0.695 second response time [18:42:23] 10Continuous-Integration-Infrastructure, 10Analytics: Add json linting test for schemas in mediawiki/event-schemas - https://phabricator.wikimedia.org/T124319#1952779 (10bd808) 3NEW [18:48:13] PROBLEM - Host integration-dev is DOWN: CRITICAL - Host Unreachable (10.68.16.227) [18:49:44] (03PS1) 10Paladox: [mediawiki/event-schemas] Add jenkins tests [integration/config] - 10https://gerrit.wikimedia.org/r/265534 (https://phabricator.wikimedia.org/T124319) [18:49:55] PROBLEM - Host deployment-mathoid is DOWN: CRITICAL - Host Unreachable (10.68.17.222) [18:50:44] (03CR) 10jenkins-bot: [V: 04-1] [mediawiki/event-schemas] Add jenkins tests [integration/config] - 10https://gerrit.wikimedia.org/r/265534 (https://phabricator.wikimedia.org/T124319) (owner: 10Paladox) [18:51:03] PROBLEM - Host deployment-mediawiki03 is DOWN: CRITICAL - Host Unreachable (10.68.17.55) 
[18:53:41] PROBLEM - Host integration-slave-precise-1014 is DOWN: CRITICAL - Host Unreachable (10.68.17.16) [18:53:57] PROBLEM - Host deployment-pdf02 is DOWN: CRITICAL - Host Unreachable (10.68.16.129) [18:54:28] (03PS2) 10Paladox: [mediawiki/event-schemas] Add jenkins tests [integration/config] - 10https://gerrit.wikimedia.org/r/265534 (https://phabricator.wikimedia.org/T124319) [18:54:40] PROBLEM - Host deployment-memc04 is DOWN: CRITICAL - Host Unreachable (10.68.17.69) [18:55:30] PROBLEM - Host deployment-elastic06 is DOWN: CRITICAL - Host Unreachable (10.68.17.186) [18:56:00] PROBLEM - Host deployment-eventlogging03 is DOWN: CRITICAL - Host Unreachable (10.68.18.111) [18:56:36] PROBLEM - Host integration-puppetmaster is DOWN: CRITICAL - Host Unreachable (10.68.16.42) [18:56:54] PROBLEM - Host integration-saltmaster is DOWN: CRITICAL - Host Unreachable (10.68.18.24) [18:57:14] RECOVERY - Host deployment-mediawiki03 is UP: PING OK - Packet loss = 0%, RTA = 0.61 ms [18:57:22] PROBLEM - Host integration-slave-trusty-1016 is DOWN: CRITICAL - Host Unreachable (10.68.18.34) [18:57:58] RECOVERY - Host integration-dev is UP: PING OK - Packet loss = 0%, RTA = 0.76 ms [18:58:45] RECOVERY - Host integration-slave-precise-1014 is UP: PING OK - Packet loss = 0%, RTA = 0.83 ms [18:58:55] RECOVERY - Host deployment-pdf02 is UP: PING OK - Packet loss = 0%, RTA = 0.75 ms [18:59:39] RECOVERY - Host deployment-mathoid is UP: PING OK - Packet loss = 0%, RTA = 2.07 ms [18:59:41] RECOVERY - Host deployment-memc04 is UP: PING OK - Packet loss = 0%, RTA = 1.05 ms [19:00:29] RECOVERY - Host deployment-elastic06 is UP: PING OK - Packet loss = 0%, RTA = 1.32 ms [19:00:59] RECOVERY - Host deployment-eventlogging03 is UP: PING OK - Packet loss = 0%, RTA = 1.10 ms [19:01:35] RECOVERY - Host integration-puppetmaster is UP: PING OK - Packet loss = 0%, RTA = 1.15 ms [19:01:53] RECOVERY - Host integration-saltmaster is UP: PING OK - Packet loss = 0%, RTA = 1.11 ms [19:02:23] RECOVERY - Host 
integration-slave-trusty-1016 is UP: PING OK - Packet loss = 0%, RTA = 0.78 ms [19:14:50] (03PS1) 10Ricordisamoa: [labs/tools/ptable] tox [integration/config] - 10https://gerrit.wikimedia.org/r/265543 [19:22:11] (03PS1) 10Legoktm: Skizzerz changed his email address [integration/config] - 10https://gerrit.wikimedia.org/r/265546 [19:22:26] Project beta-update-databases-eqiad build #5955: 04FAILURE in 2 min 25 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/5955/ [19:22:39] (03PS2) 10Legoktm: Skizzerz changed his email address [integration/config] - 10https://gerrit.wikimedia.org/r/265546 [19:22:48] (03CR) 10Legoktm: [C: 032] Skizzerz changed his email address [integration/config] - 10https://gerrit.wikimedia.org/r/265546 (owner: 10Legoktm) [19:24:34] (03CR) 10Ricordisamoa: "labs/tools/ptable patch at https://gerrit.wikimedia.org/r/265544" [integration/config] - 10https://gerrit.wikimedia.org/r/265543 (owner: 10Ricordisamoa) [19:24:38] (03Merged) 10jenkins-bot: Skizzerz changed his email address [integration/config] - 10https://gerrit.wikimedia.org/r/265546 (owner: 10Legoktm) [19:25:04] !log deploying https://gerrit.wikimedia.org/r/265546 [19:25:09] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [19:28:54] (03PS2) 10Legoktm: Have php-composer-test job pass even if no composer.json exists [integration/config] - 10https://gerrit.wikimedia.org/r/264990 [19:29:44] (03CR) 10Legoktm: "When we move to nodepool, I envision that we have one mega job that will run composer, npm, MW phpunit, and MW qunit in one instance inste" [integration/config] - 10https://gerrit.wikimedia.org/r/264990 (owner: 10Legoktm) [19:32:07] problem at betacluster: https://phabricator.wikimedia.org/T124333 Special:Preferences says "Database error - Error: 145 Table './centralauth/globalnames' is marked as crashed and should be repaired (10.68.16.193)" [19:34:11] o.O [19:39:10] marxarelli: ^? 
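Note on the database error quoted above: the message names the damaged table as a data-directory path, `./centralauth/globalnames`, meaning database `centralauth`, table `globalnames`. A minimal sketch of extracting both and composing the matching `REPAIR TABLE` statement follows. The helper names are hypothetical, and the pattern only covers the message shape seen in this log.

```python
import re
from typing import Optional, Tuple

# Matches the "marked as crashed" message as quoted in the log.
CRASHED_TABLE = re.compile(
    r"Error: 145 Table '\./(?P<db>[^/']+)/(?P<table>[^/']+)' is marked as crashed"
)


def parse_crashed_table(message: str) -> Optional[Tuple[str, str]]:
    """Return (database, table) from an error-145 message, or None if absent."""
    match = CRASHED_TABLE.search(message)
    if match is None:
        return None
    return match.group("db"), match.group("table")


def repair_statement(table: str) -> str:
    """Compose a REPAIR TABLE statement for a plain identifier."""
    if not re.fullmatch(r"[A-Za-z0-9_]+", table):
        raise ValueError(f"unexpected table name: {table!r}")
    return f"REPAIR TABLE `{table}`;"


if __name__ == "__main__":
    error = (
        "Database error - Error: 145 Table './centralauth/globalnames' "
        "is marked as crashed and should be repaired (10.68.16.193)"
    )
    parsed = parse_crashed_table(error)
    if parsed is not None:
        database, table = parsed
        print(f"database: {database}")
        print(repair_statement(table))
```

The statement has to be run while connected to the named database, with an account that has sufficient privileges.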
[19:39:29] !log deploying jjb changes for https://gerrit.wikimedia.org/r/264990 [19:39:34] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [19:40:32] legoktm: fun times. deploying right now [19:40:45] thcipriani, ostriches, twentyafterfour: ^ [19:41:06] oh, I see. [19:41:25] Hm [19:47:37] where be the root password for mysql on beta? [19:48:23] !log deploying https://gerrit.wikimedia.org/r/265552 [19:48:28] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [19:55:22] ostriches: I got in using the 'sql' shell script [19:55:33] Ah didn't know that'd have enough for REPAIR [19:56:11] not too sure it does yet, haven't run it :P [19:56:37] ostriches: i believe the mysql root is in /root/.my.cnf [19:57:23] !log ran REPAIR TABLE globalnames; on centralauth db [19:57:28] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [19:57:37] seems to have worked. Now there is a redis connection error. [19:59:44] looking in a weird spot I guess. Special:Preferences seems to work now. [20:00:11] login and special:preferences work again now. Should I close the bug and paste the IRC notes from above in as explanation? [20:00:37] quiddity: I can close, I'll make a note as to what I did. [20:00:45] nod. thanks :) [20:00:56] thanks for the report! [20:04:18] 10Beta-Cluster-Infrastructure: Beta Cluster Database error - Error: 145 Table './centralauth/globalnames' is marked as crashed and should be repaired (10.68.16.193) - https://phabricator.wikimedia.org/T124333#1953130 (10thcipriani) 5Open>3Resolved a:3thcipriani Had to repair the `globalnames` table for `ce... [22:42:07] bd808: should be pretty easy. Just create a new svn repo in phab and import the dump over the bare repo [22:42:19] That's what I did for the other 3 [23:10:10] Yippee, build fixed! 
[23:10:10] Project browsertests-Gather-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #393: 09FIXED in 13 min: https://integration.wikimedia.org/ci/job/browsertests-Gather-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/393/ [23:11:39] ostriches: how did you get phabricator to import the commits after you copied the repos over? [23:12:48] * twentyafterfour claimed the ticket and created the svn repo but not sure about the import part [23:12:52] I can't remember. [23:13:05] Command line took? [23:13:08] Tool [23:15:13] probably ;) [23:16:33] why are there both Ops and operations tabs in jenkins [23:17:20] * ostriches shrugs [23:23:08] greg-g: have you had any luck replicating that issue? [23:23:19] jdlrobson: I haven't :/ [23:23:19] i've been hitting random on cswiktionary forever and not found anything [23:23:32] just to be clear, he's getting it when clicking on a section header [23:23:37] section edit link that is [23:25:57] jdlrobson: lego just got a repro [23:26:00] see -operations [23:27:07] bd808: that svn archive isn't just one repo, it's a bunch of repos [23:27:26] ostriches: ^ [23:27:51] 582 of them. [23:29:29] Ummm [23:29:31] Ew [23:30:37] 10Deployment-Systems, 3Scap3, 7WorkType-NewFunctionality: Build a dependency graph resolver for deployment stages and tasks - https://phabricator.wikimedia.org/T120684#1953996 (10mmodell) p:5High>3Low [23:33:03] maybe we can merge them into one somehow? It's been a long time since I used svn, I'm quite rusty [23:33:54] I did that before [23:34:26] can't you just dump one, and import several into the same? [23:35:23] I'm not sure... /me searches stackoverflow [23:36:10] https://stackoverflow.com/questions/267256/combining-multiple-svn-repositories-into-one [23:40:20] hmm. https://integration.wikimedia.org/ci/job/operations-puppet-tox-py27/17669/console is this a jenkins issue?
[23:47:06] 6Release-Engineering-Team, 10Browser-Tests-Infrastructure, 10Reading-Web, 5Patch-For-Review: Failed Jenkins job sets Sauce Labs job to passed - https://phabricator.wikimedia.org/T105589#1954088 (10dduvall) @jdlrobson, try upgrading to `mediawiki_selenium` 1.6.3 and see if this still occurs—check the [[ htt...