[01:15:06] (PS5) 20after4: First attempt at phabricator/harbormaster job templates [integration/config] - https://gerrit.wikimedia.org/r/295396 (https://phabricator.wikimedia.org/T130950)
[01:39:28] (PS6) 20after4: WIP: first attempt at phabricator/harbormaster job templates [integration/config] - https://gerrit.wikimedia.org/r/295396 (https://phabricator.wikimedia.org/T130950)
[01:40:39] (CR) 20after4: "PS6 changes:" [integration/config] - https://gerrit.wikimedia.org/r/295396 (https://phabricator.wikimedia.org/T130950) (owner: 20after4)
[03:28:20] Project browsertests-Wikidata-WikidataTests-Group0-linux-chrome-sauce build #85: FAILURE in 2 hr 16 min: https://integration.wikimedia.org/ci/job/browsertests-Wikidata-WikidataTests-Group0-linux-chrome-sauce/85/
[04:51:29] Continuous-Integration-Infrastructure, Gerrit-Migration, releng-201617-q1, Differential: Build glue from Differential to Nodepool - https://phabricator.wikimedia.org/T130950#2398151 (mmodell)
[07:08:56] Release-Engineering-Team, releng-201617-q1, User-greg: Perform a technical debt analysis of software and services maintained by WMF Release Engineering - https://phabricator.wikimedia.org/T138225#2398192 (greg)
[07:11:21] Release-Engineering-Team, releng-201617-q1, User-greg: Perform a technical debt analysis of software and services maintained by WMF Release Engineering - https://phabricator.wikimedia.org/T138225#2398195 (greg)
[08:02:16] Continuous-Integration-Infrastructure, Phabricator: Integrate Jenkins with Phabricator with Harbormaster - https://phabricator.wikimedia.org/T89714#2398214 (mmodell)
[08:02:19] Continuous-Integration-Infrastructure, Gerrit-Migration, releng-201617-q1, Differential: Build glue from Differential to Nodepool - https://phabricator.wikimedia.org/T130950#2398217 (mmodell)
[08:03:12] Gerrit-Migration, releng-201617-q1, Documentation: Document workflow and creation of CI jobs in Differential - https://phabricator.wikimedia.org/T130952#2398223 (mmodell)
[08:03:15] Continuous-Integration-Infrastructure, Gerrit-Migration, releng-201516-q2, releng-201516-q3, and 2 others: [keyresult] Connect Differential code review with continuous integration - https://phabricator.wikimedia.org/T31#2398224 (mmodell)
[08:03:18] Continuous-Integration-Infrastructure, Gerrit-Migration, releng-201617-q1, Differential: Build glue from Differential to Nodepool - https://phabricator.wikimedia.org/T130950#2151799 (mmodell) Open>Resolved
[08:03:48] Gerrit-Migration: Identify features Gerrit users would miss in Phabricator - https://phabricator.wikimedia.org/T23#2398227 (mmodell)
[08:03:52] Gerrit-Migration: Plan to migrate code review from Gerrit to Phabricator - https://phabricator.wikimedia.org/T18#2398228 (mmodell)
[08:03:55] Continuous-Integration-Infrastructure, Gerrit-Migration, releng-201516-q2, releng-201516-q3, and 2 others: [keyresult] Connect Differential code review with continuous integration - https://phabricator.wikimedia.org/T31#477 (mmodell) Open>Resolved a:mmodell
[08:05:47] Gerrit-Migration, releng-201617-q1, Documentation: Document workflow and creation of CI jobs in Differential - https://phabricator.wikimedia.org/T130952#2398230 (mmodell) >>! In T130952#2393797, @Paladox wrote: > Oh, but what happens if we don't want for example the npm test to run since we have pac...
[08:56:50] Gerrit-Migration, releng-201617-q1, Documentation: Document workflow and creation of CI jobs in Differential - https://phabricator.wikimedia.org/T130952#2398342 (hashar) >>! In T130952#2393716, @mmodell wrote: > @hashar: Why is CI documentation on mediawiki.org instead of wikitech? We have all the d...
[09:04:54] Deployment-Systems, scap, Wikimedia-Logstash, Scap3 (Scap3-Adoption-Phase1), User-bd808: Deploy kibana with scap3 - https://phabricator.wikimedia.org/T129138#2096388 (fgiunchedi) I haven't reviewed the deb packages for kibana, though we're already importing elastic.co packages for ES and logs...
[09:05:06] Gerrit-Migration, releng-201617-q1, Documentation: Document workflow and creation of CI jobs in Differential - https://phabricator.wikimedia.org/T130952#2398368 (Paladox) @mmodell what about creating more tags that each do one job, like we do in Zuul? Since running one whole test with multiples being test...
[09:25:19] Gerrit-Migration, releng-201617-q1, Documentation: Document workflow and creation of CI jobs in Differential - https://phabricator.wikimedia.org/T130952#2398382 (mmodell) >>! In T130952#2398368, @Paladox wrote: > @mmodell what about creating more tags that does one job like we do in Zuul. Since runni...
[09:30:34] Gerrit-Migration, releng-201617-q1, Documentation: Document workflow and creation of CI jobs in Differential - https://phabricator.wikimedia.org/T130952#2398384 (Paladox) @mmodell since I know some repos that are mw extensions and so they need to run the mwext-testextension test with this new tag bu...
[09:31:34] Gerrit-Migration, releng-201617-q1, Documentation: Document workflow and creation of CI jobs in Differential - https://phabricator.wikimedia.org/T130952#2398387 (hashar) One can have a look at the old {T111181} which was a random discussion with Jan From {bf1168939ab4b2ab79a98f9c5bbb91317db2d314} | h...
[09:36:45] Gerrit-Migration, releng-201617-q1, Documentation: Document workflow and creation of CI jobs in Differential - https://phabricator.wikimedia.org/T130952#2398392 (Paladox) @hashar and @mmodell maybe we can create a new file maybe called .ci-entrypoint.yaml and then we can add what tests we want to be...
[09:38:58] Continuous-Integration-Infrastructure, Gerrit-Migration, releng-201516-q4, Differential: Spec out needed glue for Differential to Gearman to Nodepool - https://phabricator.wikimedia.org/T130949#2398398 (hashar) The reason I wanted jobs to be triggered via Gearman (instead of Jenkins REST API) was...
[09:57:24] Continuous-Integration-Infrastructure, Labs, Labs-Infrastructure: Nodepool has trouble taking snapshots on OpenStack labs - https://phabricator.wikimedia.org/T138106#2398447 (hashar) p:Triage>Low I can't remember whether I managed to reproduce manually using the openstack CLI. But in case it...
[10:00:37] Continuous-Integration-Infrastructure, Zuul: Investigate Zuul 2.1.0-151-g30a433b that stops processing Gerrit events - https://phabricator.wikimedia.org/T137525#2398474 (hashar) Yeah they are all potential candidates. But I want to reproduce the issue on labs first, take trace and from there we can test...
[10:08:01] Gerrit-Migration, releng-201617-q1, Documentation: Document workflow and creation of CI jobs in Differential - https://phabricator.wikimedia.org/T130952#2398492 (mmodell) {F4191180}
[10:08:54] Continuous-Integration-Infrastructure, Zuul: Investigate Zuul 2.1.0-151-g30a433b that stops processing Gerrit events - https://phabricator.wikimedia.org/T137525#2398507 (Paladox) @hashar ok. :)
[10:09:48] Gerrit-Migration, releng-201617-q1, Documentation: Document workflow and creation of CI jobs in Differential - https://phabricator.wikimedia.org/T130952#2398522 (Paladox) Oh ok.
[10:19:08] hashar_ Hi, i created a gerrit test instance. Website is http://gerrit-test.wmflabs.org/
[10:19:42] hashar_ i also created test instances for jenkins and zuul
[10:19:43] http://gerrit-jenkins.wmflabs.org/
[10:19:52] http://gerrit-zuul.wmflabs.org/
[10:19:59] But they run on the same instance.
[10:49:00] Beta-Cluster-Infrastructure, DBA, Flow, Collab-Team-2016-Apr-Jun-Q4: Run Flow External Store migration in dry-run mode on Beta - https://phabricator.wikimedia.org/T119567#2398659 (jmatazzoni) Open>Resolved
[10:51:02] Beta-Cluster-Infrastructure, DBA, Flow, Collab-Team-2016-Apr-Jun-Q4: Run Flow External Store migration in dry-run mode on Beta - https://phabricator.wikimedia.org/T119567#2398664 (jmatazzoni)
[10:51:49] Beta-Cluster-Infrastructure, DBA, Flow, Collab-Team-2016-Apr-Jun-Q4: Run Flow External Store migration in dry-run mode on Beta - https://phabricator.wikimedia.org/T119567#1829837 (jmatazzoni)
[10:51:52] Beta-Cluster-Infrastructure, Flow, Collab-Team-2016-Apr-Jun-Q4, Patch-For-Review: Set up Flow-specific External Store cluster on Beta (secondary to the main one) - https://phabricator.wikimedia.org/T128417#2398665 (jmatazzoni) Open>Resolved
[12:18:29] ryasmeen, stephanebisson: are you at wikimania? want to pair on javascript and selenium?
[12:26:45] zeljkof: I'm not
[12:27:37] stephanebisson: :(
[12:27:45] next time then :)
[12:34:36] Project beta-code-update-eqiad build #109676: ABORTED in 1 min 36 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/109676/
[12:43:45] PROBLEM - jenkins_service_running on gallium is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/java .*-jar /usr/share/jenkins/jenkins.war
[12:43:45] PROBLEM - jenkins_zmq_publisher on gallium is CRITICAL: Connection refused
[12:50:19] Project UploadWizard-api-commons.wikimedia.beta.wmflabs.org build #4162: SUCCESS in 19 sec: https://integration.wikimedia.org/ci/job/UploadWizard-api-commons.wikimedia.beta.wmflabs.org/4162/
[12:50:45] RECOVERY - jenkins_service_running on gallium is OK: PROCS OK: 1 process with regex args ^/usr/bin/java .*-jar /usr/share/jenkins/jenkins.war
[12:50:53] RECOVERY - jenkins_zmq_publisher on gallium is OK: TCP OK - 0.000 second response time on port 8888
[13:04:49] Project selenium-Math » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #59: SUCCESS in 39 sec: https://integration.wikimedia.org/ci/job/selenium-Math/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/59/
[13:05:02] Project selenium-Math » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #59: SUCCESS in 56 sec: https://integration.wikimedia.org/ci/job/selenium-Math/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/59/
[13:06:01] Continuous-Integration-Infrastructure, Patch-For-Review: Figure out paths that needs to be backed up on gallium - https://phabricator.wikimedia.org/T65938#2398902 (hashar) I had the build records migrated under `/var/lib/jenkins/builds/` so we can easily exclude them entirely in the bacula configuration....
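The PROBLEM/RECOVERY pair for jenkins_service_running (12:43:45 and 12:50:45) comes from a process-count check: it goes CRITICAL when no running process has arguments matching `^/usr/bin/java .*-jar /usr/share/jenkins/jenkins.war`. The wording resembles the Monitoring Plugins `check_procs` output, but that is an inference from the message text. Below is a minimal Python sketch of the same decision, assuming the process argument strings have already been collected; `check_jenkins_running` and its `minimum` threshold are hypothetical names, not the real check's configuration.

```python
import re

# Pattern copied from the alert text. Matching uses re.search ("contains a match");
# the leading ^ already pins it to the start of the argument string.
JENKINS_ARGS = re.compile(r"^/usr/bin/java .*-jar /usr/share/jenkins/jenkins.war")


def check_jenkins_running(cmdlines, minimum=1):
    """Return (status, message) for a list of process argument strings.

    Hypothetical helper: CRITICAL when fewer than `minimum` processes match.
    """
    count = sum(1 for args in cmdlines if JENKINS_ARGS.search(args))
    status = "OK" if count >= minimum else "CRITICAL"
    noun = "process" if count == 1 else "processes"
    return status, f"PROCS {status}: {count} {noun} with regex args {JENKINS_ARGS.pattern}"


if __name__ == "__main__":
    running = [
        "/usr/sbin/sshd -D",
        "/usr/bin/java -Xmx8g -jar /usr/share/jenkins/jenkins.war --httpPort=8080",
    ]
    print(check_jenkins_running(running)[1])               # OK case, as at 12:50:45
    print(check_jenkins_running(["/usr/sbin/sshd -D"])[1])  # CRITICAL case, as at 12:43:45
```

Counting matches rather than returning a boolean keeps the sketch close to the alert wording ("0 processes", "1 process") and would let a threshold catch duplicate Jenkins processes as well.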
[13:19:08] (PS1) Hashar: logrotate castor-save after 5 days [integration/config] - https://gerrit.wikimedia.org/r/295491
[13:21:21] (CR) Hashar: [C: 2] logrotate castor-save after 5 days [integration/config] - https://gerrit.wikimedia.org/r/295491 (owner: Hashar)
[13:22:32] (Merged) jenkins-bot: logrotate castor-save after 5 days [integration/config] - https://gerrit.wikimedia.org/r/295491 (owner: Hashar)
[13:26:28] hashar hi, could you review https://gerrit.wikimedia.org/r/295471 please
[13:31:06] hashar it fixes a composer issue for flow
[13:31:11] on branch REL1_26
[13:32:08] paladox: hi. I guess Flow folks will do ? :)
[13:32:17] oh lego got it merged
[13:32:39] hashar Oh, thanks legoktm
[13:32:46] hashar i got gerrit installed
[13:32:48] on labs
[13:32:51] :)
[13:33:02] hashar http://gerrit-test.wmflabs.org/#/q/status:open
[13:36:34] :)
[13:42:23] (PS2) Paladox: [wikidata/query/gui-deploy] make npm test voting [integration/config] - https://gerrit.wikimedia.org/r/291736
[13:46:58] Project selenium-VisualEditor » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #58: FAILURE in 2 min 56 sec: https://integration.wikimedia.org/ci/job/selenium-VisualEditor/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/58/
[13:48:21] (PS3) Paladox: [timeline] Update Jenkins tests [integration/config] - https://gerrit.wikimedia.org/r/291670
[13:49:35] (PS4) Paladox: [timeline] Update Jenkins tests [integration/config] - https://gerrit.wikimedia.org/r/291670
[13:51:51] (PS5) Paladox: [CentralAuth] Add composer-test test [integration/config] - https://gerrit.wikimedia.org/r/288819
[13:52:28] (PS4) Paladox: In node-4.3 clone under src [integration/config] - https://gerrit.wikimedia.org/r/290702 (https://phabricator.wikimedia.org/T130208)
[13:59:45] (PS3) Paladox: Migrate composer-validate to trusty nodepool [integration/config] - https://gerrit.wikimedia.org/r/291505
[14:05:11] Continuous-Integration-Infrastructure, Jenkins: Some Jenkins jobs tend to be stuck and never times out - https://phabricator.wikimedia.org/T138281#2399185 (hashar)
[14:05:13] Continuous-Integration-Infrastructure: PHPUnit job hangs for two hours after SlowTimer warning - https://phabricator.wikimedia.org/T136349#2399183 (hashar)
[14:05:55] Continuous-Integration-Infrastructure: PHPUnit job hangs for two hours after SlowTimer warning - https://phabricator.wikimedia.org/T136349#2331486 (hashar) I noticed some build being stuck. I did debug it recently on T138281 and it seems to be the instance having an I/O deadlock of some sort apparently due...
[14:06:42] (PS8) Paladox: Migrate mediawiki core jsduck test to jessie [integration/config] - https://gerrit.wikimedia.org/r/290791 (https://phabricator.wikimedia.org/T119143)
[14:10:07] (PS9) Paladox: Migrate mediawiki core jsduck test to jessie [integration/config] - https://gerrit.wikimedia.org/r/290791 (https://phabricator.wikimedia.org/T119143)
[14:10:36] hashar jenkins 2.0 is really backwards compatible, since it works the same as jenkins 1.x
[14:15:02] (PS2) Paladox: Migrate mw-tools-codesniffer-mwcore-testrun to trusty nodepool [integration/config] - https://gerrit.wikimedia.org/r/291501
[14:16:45] (PS2) Paladox: Migrate composer-package-validate to trusty nodepool [integration/config] - https://gerrit.wikimedia.org/r/291506
[14:19:08] (PS16) Paladox: Fix dirty VisualEditor submodule [integration/config] - https://gerrit.wikimedia.org/r/262432 (https://phabricator.wikimedia.org/T121479)
[14:19:26] (PS8) Paladox: Migrate mediawiki-phpunit-phpflavour-composer to nodepool [integration/config] - https://gerrit.wikimedia.org/r/290806 (https://phabricator.wikimedia.org/T135001)
[14:19:50] (PS5) Paladox: Migrate wikimedia/fundraising/dash node-0.10 test to node-4.3 test [integration/config] - https://gerrit.wikimedia.org/r/291603
[14:19:58] (PS15) Paladox: Migrate mwext-testextension-{phpflavor} to trusty nodepool [integration/config] - https://gerrit.wikimedia.org/r/292509 (https://phabricator.wikimedia.org/T137199)
[14:33:50] Project selenium-WikiLove » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #59: SUCCESS in 1 min 49 sec: https://integration.wikimedia.org/ci/job/selenium-WikiLove/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/59/
[14:35:01] hi zeljkof
[14:35:13] hi aharoni!
[14:40:08] https://phabricator.wikimedia.org/diffusion/GMALU/
[14:40:13] https://gerrit.wikimedia.org/r/#/c/256404/
[14:40:23] Tsk! Five character short-code! Kill it! ;-)
[15:00:33] Release-Engineering-Team, Gerrit, Operations, WMF-Legal, and 2 others: Gerrit seemingly violates data retention guidelines - https://phabricator.wikimedia.org/T114395#1694145 (Mpaulson) Has this been adjusted so that it deletes the logs after 30 days?
[15:10:26] hey, releng people: the beta cluster's sca01 node has no RAM left, so I can't run ORES on it. We need a node as similar as possible to scb in prod. scb02 is in use and would probably run out of RAM if I added ORES; can I build a sca03?
[15:19:26] thcipriani: ^
[15:19:56] Amir1: one minute SWATting...
[15:20:02] sorry
[15:20:07] sure
[15:38:32] Project selenium-MobileFrontend » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #54: SUCCESS in 16 min: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/54/
[15:45:06] Project selenium-MobileFrontend » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #54: FAILURE in 23 min: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/54/
[16:01:29] Project selenium-CentralNotice » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #62: SUCCESS in 28 sec: https://integration.wikimedia.org/ci/job/selenium-CentralNotice/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/62/
[16:01:36] Project selenium-CentralNotice » chrome,beta,Windows 7,contintLabsSlave && UbuntuTrusty build #62: SUCCESS in 35 sec: https://integration.wikimedia.org/ci/job/selenium-CentralNotice/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Windows%207,label=contintLabsSlave%20&&%20UbuntuTrusty/62/
[16:01:40] Project selenium-CentralNotice » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #62: SUCCESS in 39 sec: https://integration.wikimedia.org/ci/job/selenium-CentralNotice/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/62/
[16:01:41] Project selenium-CentralNotice » firefox,beta,Windows 7,contintLabsSlave && UbuntuTrusty build #62: SUCCESS in 40 sec: https://integration.wikimedia.org/ci/job/selenium-CentralNotice/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Windows%207,label=contintLabsSlave%20&&%20UbuntuTrusty/62/
[16:02:10] Project selenium-CentralNotice » chrome,beta,OS X 10.9,contintLabsSlave && UbuntuTrusty build #62: SUCCESS in 1 min 8 sec: https://integration.wikimedia.org/ci/job/selenium-CentralNotice/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=OS%20X%2010.9,label=contintLabsSlave%20&&%20UbuntuTrusty/62/
[16:02:53] Project selenium-CentralNotice » firefox,beta,OS X 10.9,contintLabsSlave && UbuntuTrusty build #62: SUCCESS in 1 min 52 sec: https://integration.wikimedia.org/ci/job/selenium-CentralNotice/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=OS%20X%2010.9,label=contintLabsSlave%20&&%20UbuntuTrusty/62/
[16:49:05] In the instructions here: https://wikitech.wikimedia.org/wiki/How_to_deploy_code#Pre-deployment_testing_in_production
[16:49:13] sync-common should now be "scap sync-common", correct?
[16:49:25] scap pull
[16:50:43] Just scap pull, that's it?
[16:51:06] yeah. it will sudo internally
[16:51:11] I updated that section.
[16:51:14] Thanks!
[16:51:15] thanks
[16:51:56] I updated a bunch of those sync-common usage examples on wikitech but I'm sure I missed a few more
[16:53:24] blerg. documentation update: the long tail of modifying subcommands :(
[17:08:58] at the WMF, documentation is always a very long tail
[18:50:12] Project selenium-Wikidata » firefox,test,Linux,contintLabsSlave && UbuntuTrusty build #32: FAILURE in 11 sec: https://integration.wikimedia.org/ci/job/selenium-Wikidata/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=test,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/32/
[19:12:49] Release-Engineering-Team, Gerrit, Operations, WMF-Legal, and 2 others: Gerrit seemingly violates data retention guidelines - https://phabricator.wikimedia.org/T114395#1694145 (greg) >>! In T114395#2399437, @Mpaulson wrote: > Has this been adjusted so that it deletes the logs after 30 days? Based...
[19:31:26] (CR) Paladox: ":)" [integration/zuul] (debian/jessie-wikimedia) - https://gerrit.wikimedia.org/r/293269 (https://phabricator.wikimedia.org/T137279) (owner: Hashar)
[19:48:48] Continuous-Integration-Infrastructure, Gerrit-Migration, releng-201617-q2, Differential: Build glue from Differential to Nodepool - https://phabricator.wikimedia.org/T130950#2400267 (greg)
[19:49:00] Gerrit-Migration, releng-201617-q2, Documentation: Document workflow and creation of CI jobs in Differential - https://phabricator.wikimedia.org/T130952#2400268 (greg)
[19:49:41] Gerrit-Migration, releng-201617-q3: Phase 2 repository migrations to Differential (goal - end of September 2016) - https://phabricator.wikimedia.org/T130420#2400269 (greg)
[19:50:01] Gerrit-Migration, releng-201617-q3, Documentation: Update Code Review related documentation on wiki pages from Gerrit to Differential - https://phabricator.wikimedia.org/T207#2400271 (greg)
[19:50:21] Gerrit-Migration, releng-201617-q4: Phase 3 repository migrations to Differential (goal - end of December 2016) - https://phabricator.wikimedia.org/T130421#2400274 (greg)
[19:50:30] PM-spam ^
[19:51:41] releng-201516-q3, releng-201617-q2, scap, Scap3 (Scap3-MediaWiki-MVP), WorkType-NewFunctionality: [keyresult] Migrate the MW weekly train deploy to scap3 - https://phabricator.wikimedia.org/T114313#2400278 (greg)
[19:52:08] releng-201617-q3, scap, Operations, Performance-Team, and 2 others: Make scap able to depool/repool servers via the conftool API - https://phabricator.wikimedia.org/T104352#2400279 (greg)
[20:07:08] (CR) Paladox: "recheck" [integration/config] - https://gerrit.wikimedia.org/r/269653 (owner: JanZerebecki)
[20:07:28] (CR) jenkins-bot: [V: -1] [WIP] Add a sqlite variant of extension-unittests-* [integration/config] - https://gerrit.wikimedia.org/r/269653 (owner: JanZerebecki)
[20:14:32] (CR) Paladox: "recheck" [integration/config] - https://gerrit.wikimedia.org/r/173831 (owner: Hashar)
[20:19:37] Continuous-Integration-Infrastructure, Zuul: Investigate Zuul 2.1.0-151-g30a433b that stops processing Gerrit events - https://phabricator.wikimedia.org/T137525#2400328 (Paladox) @hashar zuul is dropping jenkins support in v3.
[20:24:37] paladox: he knows :) ^
[20:24:58] greg-g: Oh, sorry.
[20:28:31] I'm going to play with sca nodes in beta cluster a little bit to bring the real ores beta back online
[20:41:51] Project selenium-Echo » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #63: SUCCESS in 50 sec: https://integration.wikimedia.org/ci/job/selenium-Echo/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/63/
[20:41:57] Project selenium-Echo » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #63: SUCCESS in 56 sec: https://integration.wikimedia.org/ci/job/selenium-Echo/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/63/
[20:42:23] Release-Engineering-Team, User-greg, User-zeljkofilipin: Determine location of 2016 RelEng team offsite - https://phabricator.wikimedia.org/T137721#2400363 (greg) Suggestions from a computer on where to go: https://teleport.org/flock/#!/131a3d128f18418dc
[20:43:49] Release-Engineering-Team, User-greg: Determine timing of 2016 RelEng team offsite - https://phabricator.wikimedia.org/T137720#2400364 (greg)
[20:44:14] Release-Engineering-Team, User-greg: Determine timing of 2016 RelEng team offsite - https://phabricator.wikimedia.org/T137720#2376537 (greg) (doodle didn't give me a "week of" option, so I made a dumb spreadsheet)
[20:45:51] Release-Engineering-Team, User-greg: Create agenda outline for 2016 RelEng team offsite - https://phabricator.wikimedia.org/T138437#2400369 (greg)
[20:48:17] PROBLEM - Host deployment-ores-web is DOWN: CRITICAL - Host Unreachable (10.68.21.158)
[21:01:47] Release-Engineering-Team, User-greg: Create agenda outline for 2016 RelEng team offsite - https://phabricator.wikimedia.org/T138437#2400475 (greg) Reminders to include: * explicit hacking time * excursion (half)day * ....
[21:16:31] thcipriani: around?
[21:16:40] Amir1: yup, what's up
[21:16:44] ?
[21:17:03] oh, just remembered your question from this morning ;)
[21:17:23] I want to talk about the ores service in beta cluster. It was down, and after talking with ops, they told me it's better to choose puppet configs for beta
[21:17:48] that are the same as the prod cluster (ores in beta used the ores configs from labs)
[21:18:01] so I tried to do that in sca01
[21:18:08] ah, yeah, ideally beta would use the same puppet roles
[21:18:20] (if not too onerous)
[21:18:24] it does now; the only difference is the hiera settings
[21:18:46] but RAM ran out. so I deleted deployment-ores-web and made deployment-sca03
[21:18:54] I enabled roles
[21:19:50] (because sca01 and sca02 are small and they share the system with so many other services)
[21:20:07] sure, makes sense.
[21:20:26] so now, sca03 works just fine but the only service there is ores
[21:21:05] thcipriani: I will document this somewhere, but before making patches I need to fix an issue that is not resolved yet
[21:21:19] when I want to deploy, it can't connect to sca03
[21:21:32] from deployment-tin?
[21:21:40] (I deployed it internally via scap3 and it worked just fine)
[21:21:47] yup, it's not possible to log in
[21:22:01] * thcipriani catches up
[21:24:38] Project selenium-Wikidata » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #32: FAILURE in 2 hr 34 min: https://integration.wikimedia.org/ci/job/selenium-Wikidata/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/32/
[21:26:03] Amir1: I was able to log in from deployment-tin as the deploy-service user. Is that what you're having problems with?
[21:27:06] thcipriani: can you try deploying via scap3? it gives me a connection error
[21:27:34] what was the error message?
[21:27:51] (you might need to accept the host key as your user before you can deploy in beta)
[21:27:52] 20:51:44 ['/usr/bin/scap', 'deploy-local', '-v', '--repo', 'ores/deploy', '-g', 'worker', 'fetch'] on deployment-sca03.deployment-prep.eqiad.wmflabs returned [255]: Host key verification failed.
[21:28:21] thcipriani: ^
[21:29:00] ah, yeah, hang on I can fix that manually (should add this to our technical debt backlog)
[21:30:15] thanks
[21:30:16] :)
[21:30:34] Amir1: give deployment a try now
[21:30:46] k
[21:31:01] successful
[21:31:07] thcipriani: thanks :)
[21:31:13] cool. I ran: ssh-keyscan -H deployment-sca03.deployment-prep.eqiad.wmflabs >> /etc/ssh/ssh_known_hosts as root
[21:32:11] just needed to accept the host key from the new host before it'll let you log in. It doesn't prompt in batch mode (which scap uses). Non-issue in prod, need to fix for beta.
[21:32:21] I put it somewhere so we won't forget, but a more sophisticated method would be nice, probably by puppet?
[21:33:19] I think that's how it's done in prod, I looked into this at one point and gave up somewhere along the way...
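The fix at 21:31:13 works because `ssh-keyscan -H` appends the new host's key to the system known_hosts file with the hostname hashed, so scap's batch-mode SSH finds a matching entry instead of failing with "Host key verification failed." The Python sketch below shows how such a hashed entry is formed and matched, assuming OpenSSH's documented `|1|base64(salt)|base64(HMAC-SHA1(salt, hostname))` layout. `hash_hostname` and `host_is_known` are hypothetical helpers, not part of scap or OpenSSH, and the sketch ignores markers such as `@cert-authority` and `[host]:port` forms.

```python
import base64
import hashlib
import hmac
import os


def hash_hostname(hostname, salt=None):
    """Return a hashed known_hosts host field: |1|base64(salt)|base64(HMAC-SHA1)."""
    if salt is None:
        salt = os.urandom(20)  # 20-byte random salt, the length of a SHA-1 digest
    digest = hmac.new(salt, hostname.encode(), hashlib.sha1).digest()
    return "|1|{}|{}".format(base64.b64encode(salt).decode(),
                             base64.b64encode(digest).decode())


def host_is_known(known_hosts_text, hostname):
    """True if any hashed or plain-text entry in known_hosts names `hostname`."""
    for line in known_hosts_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        hosts_field = line.split()[0]
        if hosts_field.startswith("|1|"):
            # Re-hash the candidate with the entry's own salt and compare.
            _, _, salt_b64, _ = hosts_field.split("|")
            candidate = hash_hostname(hostname, base64.b64decode(salt_b64))
            if hmac.compare_digest(candidate, hosts_field):
                return True
        elif hostname in hosts_field.split(","):
            return True
    return False


if __name__ == "__main__":
    host = "deployment-sca03.deployment-prep.eqiad.wmflabs"
    entry = hash_hostname(host) + " ssh-ed25519 AAAA-placeholder-key"
    print(host_is_known(entry, host))
```

Hashing hides hostnames from anyone who reads the file, which is why a lookup has to re-hash each candidate name with every entry's salt rather than searching for the name directly.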
[21:42:45] Beta-Cluster-Infrastructure, ORES, Revision-Scoring-As-A-Service: Resurrect ores-beta with production roles - https://phabricator.wikimedia.org/T138445#2400541 (Ladsgroup)
[21:43:11] Beta-Cluster-Infrastructure, ORES, Revision-Scoring-As-A-Service: Resurrect ores-beta with production roles - https://phabricator.wikimedia.org/T138445#2400557 (Ladsgroup) ores-beta.wmflabs.org
[21:46:20] Beta-Cluster-Infrastructure, ORES, Revision-Scoring-As-A-Service, Patch-For-Review, Wikimania-Hackathon-2016: Resurrect ores-beta with production roles - https://phabricator.wikimedia.org/T138445#2400566 (Ladsgroup)
[21:46:25] Amir1: prod handles the host key thing using resource collection done by Puppet. For various reasons resource collection doesn't work in the beta cluster
[21:47:08] bd808: thanks for the explanation, I hope we can find a workaround
[21:47:36] I don't know about resource collection but it might be possible to enable it in beta as well
[21:48:06] 20after4 looked at doing that at one point and failed
[21:48:32] that's not proof that it is impossible, but it's not trivially easy
[21:49:02] SSH stuff always looks more complicated than it should be
[21:49:03] we "fixed" it for normal scap at some point by disabling host key verification with a local patch
[21:49:26] (which no longer works because of Debian packaging)
[21:50:45] thcipriani: it wouldn't be hard to make it a config option. I may have even had a patch to do that at one point
[21:51:09] this is true.
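bd808's suggestion at 21:50:45 to make host-key verification a config option could look roughly like the Python sketch below. The flag name `verify_host_keys` and the function `build_ssh_command` are hypothetical; the log gives no name for a scap option. `BatchMode`, `StrictHostKeyChecking` and `UserKnownHostsFile` are standard OpenSSH client options.

```python
def build_ssh_command(host, command, user=None, verify_host_keys=True):
    """Return an argv list for a non-interactive ssh call.

    `verify_host_keys` is a hypothetical config flag, not an existing scap setting.
    """
    argv = ["ssh", "-oBatchMode=yes"]  # never prompt; fail instead (as scap needs)
    if not verify_host_keys:
        # Accept unknown host keys without prompting and don't persist them.
        argv += ["-oStrictHostKeyChecking=no", "-oUserKnownHostsFile=/dev/null"]
    target = f"{user}@{host}" if user else host
    # ssh treats everything after the destination as the remote command.
    return argv + [target] + list(command)


if __name__ == "__main__":
    print(build_ssh_command(
        "deployment-sca03.deployment-prep.eqiad.wmflabs",
        ["/usr/bin/scap", "deploy-local", "-v", "--repo", "ores/deploy", "-g", "worker", "fetch"],
        user="deploy-service",
        verify_host_keys=False,
    ))
```

Keeping verification on by default and relaxing it only where a flag says so matches the trade-off in the discussion: prod keeps strict checking (its keys are distributed by Puppet), while a beta-only setting would avoid the failure seen at 21:27:52, at the cost of no longer detecting a spoofed host.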
[21:58:25] Project selenium-Core » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #64: SUCCESS in 6 min 24 sec: https://integration.wikimedia.org/ci/job/selenium-Core/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/64/
[21:58:59] Project selenium-PageTriage » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #61: SUCCESS in 58 sec: https://integration.wikimedia.org/ci/job/selenium-PageTriage/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/61/
[21:59:02] Project selenium-PageTriage » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #61: SUCCESS in 1 min 1 sec: https://integration.wikimedia.org/ci/job/selenium-PageTriage/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/61/
[22:02:14] Release-Engineering-Team: Proposal: Add a European mid-day SWAT window - https://phabricator.wikimedia.org/T137970#2400589 (hashar) #releng is working on a proposal that slightly adjusts how we do SWAT and would include a European SWAT window. Most probably in the morning so it also covers India afternoon....
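Each selenium-* notification links to one Jenkins matrix-configuration build, and the URLs in this log share a visible pattern: the job name, then the axis values as comma-separated `NAME=value` pairs (spaces percent-encoded as `%20`, `&&` left literal), then the build number. The Python sketch below rebuilds such a URL. It assumes that pattern, which is inferred from the URLs here rather than from Jenkins documentation, and `matrix_build_url` is a hypothetical helper.

```python
from urllib.parse import quote


def matrix_build_url(job, axes, build, base="https://integration.wikimedia.org/ci/job"):
    """Build the URL of one matrix-configuration build.

    `axes` is an ordered list of (name, value) pairs, in the order seen in the log.
    """
    # quote() encodes spaces as %20; safe='&' keeps '&&' literal as in the log's URLs.
    combo = ",".join(f"{name}={quote(value, safe='&')}" for name, value in axes)
    return f"{base}/{job}/{combo}/{build}/"


if __name__ == "__main__":
    print(matrix_build_url(
        "selenium-Core",
        [("BROWSER", "firefox"), ("MEDIAWIKI_ENVIRONMENT", "beta"),
         ("PLATFORM", "Linux"), ("label", "contintLabsSlave && UbuntuTrusty")],
        64,
    ))
```

With the selenium-Core axes above, the sketch reproduces the 21:58:25 URL exactly. The same call with `PLATFORM` set to `OS X 10.9` yields the `OS%20X%2010.9` segment seen in the CentralNotice builds.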
[22:21:46] Project selenium-CentralAuth » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #62: SUCCESS in 1 min 45 sec: https://integration.wikimedia.org/ci/job/selenium-CentralAuth/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/62/
[22:28:11] Today I decided to find out how our unpuppetised upload in deployment-prep works
[22:53:16] upload.beta.wmflabs.org points to deployment-cache-upload04, which runs varnish and is configured to point to deployment-upload by IP
[22:53:57] deployment-upload is a precise host running nginx with config at /etc/nginx/sites-enabled/media
[22:54:50] docroot is /data/project/upload7, and nginx calls out to PHP to run stuff like /scripts/thumb-handler.php and /scripts/404.php - it also prevents access to /private
[22:55:08] # missing thumb: ask a renderer to generate it
[22:55:08] location ~ \/thumb\/ {
[22:55:09] error_page 404 = /scripts/thumb-handler.php;
[22:55:09] }
[22:55:09] # catch all the other cases with this
[22:55:17] error_page 404 /scripts/404.php;
[22:59:40] thumb-handler.php has a hardcoded reference to deployment-cache-text04's IP, also a commented out proxy reference to an IP that is apparently rendering.svc.codfw.wmnet, but I think it's more likely this was rendering.svc.pmtpa.wmnet at the time
[23:02:06] sounds like a twisty maze of configuration.
[23:03:54] Beta-Cluster-Infrastructure, Operations, Patch-For-Review: /mnt/upload7 does not exist anywhere, yet it is referenced in multiple places in wmf-config - https://phabricator.wikimedia.org/T129586#2400679 (Krenair) `/mnt/upload7` does not appear to exist anywhere within the beta cluster, it appears to...
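Krenair's reading of /etc/nginx/sites-enabled/media (22:53:57 to 22:55:17) amounts to a small decision procedure for each request to the beta upload host. The Python sketch below models it. It assumes that files present under the docroot are served directly and that the /private rule is checked first; the `route` and `exists` names and the example paths are hypothetical, and the real config's ordering may differ.

```python
import re

# Same test as nginx's `location ~ \/thumb\/`: "/thumb/" anywhere in the URI.
THUMB_RE = re.compile(r"/thumb/")


def route(path, exists):
    """Return which handler would answer `path` on the beta upload host.

    `exists(path)` reports whether the file is present under the docroot
    (/data/project/upload7 per the log). Hypothetical model, not the real config.
    """
    if path == "/private" or path.startswith("/private/"):
        return "deny"                        # access to /private is blocked
    if exists(path):
        return "static"                      # nginx serves the file itself
    if THUMB_RE.search(path):
        return "/scripts/thumb-handler.php"  # missing thumb: ask a renderer
    return "/scripts/404.php"                # every other miss


if __name__ == "__main__":
    present = {"/wikipedia/commons/a/a9/Example.jpg"}
    for p in ("/wikipedia/commons/a/a9/Example.jpg",
              "/wikipedia/commons/thumb/a/a9/Example.jpg/120px-Example.jpg",
              "/wikipedia/commons/b/b0/Missing.png"):
        print(p, "->", route(p, lambda q: q in present))
```

One detail the sketch leaves out: the `=` in `error_page 404 = /scripts/thumb-handler.php;` tells nginx to pass through whatever status the PHP handler returns (for example 200 after a successful render). The catch-all `error_page 404 /scripts/404.php;` has no `=`, so the client still receives a 404.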
[23:05:52] (PS7) 20after4: Phabricator/harbormaster job templates [integration/config] - https://gerrit.wikimedia.org/r/295396 (https://phabricator.wikimedia.org/T130950)
[23:14:59] Beta-Cluster-Infrastructure, Commons, MediaWiki-File-management, Multimedia: Thumbnail 404s get cached - https://phabricator.wikimedia.org/T69056#2400693 (Krenair) I'm wondering about moving this from #beta-cluster-infrastructure to #beta-cluster-reproducible... But I did spot this in /data/proje...
[23:18:33] Krenair: tgr has wanted to fix all of that for a long time. What we really need is a swift cluster in deployment-prep
[23:18:45] nobody ever seems to want to work on that though
[23:19:46] So we use nginx + a custom PHP script which contacts app servers which pull images from NFS in beta
[23:19:52] maybe gilles can be tricked into doing it as part of his thumbor project
[23:20:10] in prod we have Varnish trying to pull the images from Swift, and failing that asking MW image scaling servers to render them?
[23:20:30] yeah via middleware in the swift server itself
[23:20:47] it's a different twisty maze
[23:21:06] Is that puppetised?
[23:21:14] I believe so, yes
[23:21:25] Apparently we have a Swift server in beta now
[23:21:43] faidon built it all in prod so I'm sure it's all nicely in Puppet
[23:22:32] https://phabricator.wikimedia.org/T112421#2102752
[23:23:22] a miss in swift calls thumb.php on an imagescaler which calls to swift to get the original and then the response is saved by swift and returned to the requesting user
[23:24:02] and indeed we have deployment-ms-fe01, -ms-be01 and -ms-be02
[23:27:28] I wonder how Swift is configured at the moment given we have no dedicated imagescalers
[23:29:35] Scap3: Make scap3 config deployment awesome - https://phabricator.wikimedia.org/T138452#2400757 (mmodell)
[23:30:09] Scap3, scap, WorkType-NewFunctionality: Need a way to see config diffs in Scap - https://phabricator.wikimedia.org/T118206#2400773 (mmodell)
[23:30:11] Scap3: Make scap3 config deployment awesome - https://phabricator.wikimedia.org/T138452#2400774 (mmodell)
[23:30:24] Scap3: Make scap3 config deployment awesome - https://phabricator.wikimedia.org/T138452#2400757 (mmodell)
[23:30:25] Scap3, scap, Documentation, User-mobrovac: Document Scap3 config-deploy - https://phabricator.wikimedia.org/T116634#2400775 (mmodell)
[23:30:39] Scap3: Make scap3 config deployment awesome - https://phabricator.wikimedia.org/T138452#2400757 (mmodell) p:Triage>High
[23:32:41] Continuous-Integration-Infrastructure, Gerrit-Migration, Differential: Determine method of getting changes (diffs) to the nodepool instances - https://phabricator.wikimedia.org/T131378#2400781 (mmodell) Open>Resolved Conclusion: 1. Works for now 3. I believe this is now possible without clut...
[23:58:46] bd808, know anything about debugging jenkins? :)
[23:59:23] maybe... what's busted?
[23:59:45] I don't think Jenkins is correct about https://gerrit.wikimedia.org/r/295598
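The production thumbnail path summarized at 23:23:22 is a read-through cache in which a Swift miss triggers rendering. The Python sketch below models that control flow. A dictionary stands in for Swift and a callback stands in for thumb.php on an image scaler; `fetch_thumbnail`, `scale` and the object names are hypothetical. This models the sequence described in the log, not Swift's actual middleware, and it leaves out the Varnish layer in front.

```python
from typing import Callable, MutableMapping


def fetch_thumbnail(thumb_name: str, original_name: str, width: int,
                    swift: MutableMapping[str, bytes],
                    scale: Callable[[bytes, int], bytes]) -> bytes:
    """Read-through model of the prod thumbnail flow described in the log."""
    if thumb_name in swift:
        return swift[thumb_name]      # hit: Swift serves the stored thumbnail
    original = swift[original_name]   # miss: the scaler fetches the original from Swift
    thumb = scale(original, width)    # ...and renders the requested size
    swift[thumb_name] = thumb         # Swift saves the rendered thumbnail
    return thumb                      # and returns it to the requesting user


if __name__ == "__main__":
    store = {"orig/Example.jpg": b"0123456789"}
    fake_scale = lambda data, width: data[:width]  # stand-in for real image scaling
    print(fetch_thumbnail("thumb/Example.jpg/4px", "orig/Example.jpg", 4, store, fake_scale))
    print(sorted(store))
```

The design point the sketch highlights is that only the first request for a given size pays the rendering cost; later requests are served from Swift. Beta, as described at 23:19:46, instead routes misses through nginx and a custom PHP script to app servers reading from NFS.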