[00:17:06] Yippee, build fixed!
[00:17:06] Project selenium-Flow » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #164: FIXED in 1 min 5 sec: https://integration.wikimedia.org/ci/job/selenium-Flow/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/164/
[00:17:17] Yippee, build fixed!
[00:17:18] Project selenium-Flow » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #164: FIXED in 1 min 17 sec: https://integration.wikimedia.org/ci/job/selenium-Flow/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/164/
[00:45:28] Continuous-Integration-Config, MediaWiki-extensions-SubPageList: SubPageList testextensionfails - https://phabricator.wikimedia.org/T147393#2691506 (Reedy)
[00:46:35] Continuous-Integration-Config, MediaWiki-extensions-SubPageList: SubPageList testextension fails - https://phabricator.wikimedia.org/T147393#2691518 (Reedy)
[00:49:31] (PS1) Reedy: Make SubPageList depend on ParserHooks [integration/config] - https://gerrit.wikimedia.org/r/314211 (https://phabricator.wikimedia.org/T147393)
[02:16:41] Yippee, build fixed!
[02:16:42] Project selenium-QuickSurveys » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #176: FIXED in 3 min 41 sec: https://integration.wikimedia.org/ci/job/selenium-QuickSurveys/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/176/
[02:19:33] PROBLEM - Puppet staleness on deployment-pdfrender is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [43200.0]
[02:33:32] Release-Engineering-Team (Long-Lived-Branches), Scap3 (Scap3-MediaWiki-MVP), Patch-For-Review: Create `scap swat` command to automate patch merging & testing during a swat deployment - https://phabricator.wikimedia.org/T142880#2691593 (mmodell)
[02:41:52] Yippee, build fixed!
[02:41:53] Project selenium-CirrusSearch » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #175: FIXED in 52 sec: https://integration.wikimedia.org/ci/job/selenium-CirrusSearch/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/175/
[03:44:50] (PS1) BryanDavis: ThrottleOverride: run composer tests [integration/config] - https://gerrit.wikimedia.org/r/314228
[04:05:52] Yippee, build fixed!
[04:05:53] Project selenium-MultimediaViewer » safari,beta,OS X 10.9,contintLabsSlave && UbuntuTrusty build #162: FIXED in 9 min 52 sec: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=safari,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=OS%20X%2010.9,label=contintLabsSlave%20&&%20UbuntuTrusty/162/
[04:18:23] Yippee, build fixed!
[04:18:24] Project selenium-MultimediaViewer » chrome,beta,OS X 10.9,contintLabsSlave && UbuntuTrusty build #162: FIXED in 22 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=OS%20X%2010.9,label=contintLabsSlave%20&&%20UbuntuTrusty/162/
[04:26:08] Gerrit: Update site CSS customizations for the new change screen in Gerrit 2.12 - https://phabricator.wikimedia.org/T141286#2691639 (PleaseStand) >>! In T141286#2507172, @Paladox wrote: > @PleaseStand hi but we do > > .com-google-gerrit-client-diff-DiffTable_BinderImpl_GenCss_style-difftable .CodeMirror pre...
[06:29:21] PROBLEM - Puppet run on integration-slave-trusty-1006 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[07:09:23] RECOVERY - Puppet run on integration-slave-trusty-1006 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:39:40] Gerrit, Patch-For-Review, Upstream: Gerrit in Microsoft Edge doesn't display the git commands in the download box - https://phabricator.wikimedia.org/T145130#2691756 (Paladox) According to them the bug does not exist currently in the public released version, ie the annerversery update but does in the...
[07:55:31] Scap3, scap, Operations, Patch-For-Review, User-mobrovac: Scap::server::sources is out of sync with the repositories actually present on tin/mira - https://phabricator.wikimedia.org/T143692#2691774 (Joe) This is mostly done; next step would be to make scap_source verify the origin and change...
[07:58:33] Continuous-Integration-Infrastructure, puppet-compiler, User-Joe: puppet-compiler should not link to not existing change.*.pson file - https://phabricator.wikimedia.org/T126796#2691782 (Joe) a:Joe>None
[08:00:21] Scap3, scap, Operations, Patch-For-Review, and 2 others: Scap::server::sources is out of sync with the repositories actually present on tin/mira - https://phabricator.wikimedia.org/T143692#2691784 (Joe)
[09:06:13] hashar hi, the patch for copying and pasting upstream got merged :)
[09:38:31] Continuous-Integration-Infrastructure, Monitoring, Operations, Patch-For-Review, Technical-Debt: Remove Ganglia Jenkins plugin from gallium - https://phabricator.wikimedia.org/T147065#2692065 (hashar) Open>Resolved a:hashar @Dzahn merged the change and double checked the cleanup o...
[09:44:47] paladox: yeah I got the mail notification :}
[09:45:15] Oh
[09:45:19] I got ie11
[09:45:23] offically supported now
[09:45:25] by upstream
[09:45:36] with https://gerrit-review.googlesource.com/#/c/86245/
[09:45:37] :)
[09:47:32] (CR) Hashar: [C: 2] Make SubPageList depend on ParserHooks [integration/config] - https://gerrit.wikimedia.org/r/314211 (https://phabricator.wikimedia.org/T147393) (owner: Reedy)
[09:48:34] (Merged) jenkins-bot: Make SubPageList depend on ParserHooks [integration/config] - https://gerrit.wikimedia.org/r/314211 (https://phabricator.wikimedia.org/T147393) (owner: Reedy)
[09:55:49] hmm, no dice :/
[09:58:29] Continuous-Integration-Config, MediaWiki-extensions-SubPageList, Patch-For-Review: SubPageList testextension fails - https://phabricator.wikimedia.org/T147393#2692091 (hashar) p:Triage>Normal a:Reedy CI is updated and the jobs now have ParserHooks extension injected. There is still a gotc...
[09:58:59] Reedy: ah hello :)
[09:59:26] Reedy: I have commented on the task. Wanna move the exception throwing to $wgExtensionFunctions
[09:59:44] Aha
[09:59:44] I think I once used the setupAfterCache hook but I remember some review said to use wgExtensionFunctions instead
[09:59:53] I can't remember how CI load the extensions
[09:59:58] but there is some kind of fixed order
[10:00:08] Wonder if it it's worth extension registration both too
[10:00:10] so the load fails from time to time but AFAIK we have fixed almost all exts
[10:00:22] so folks can just wfLoad/include in whatever order and that should work
[10:00:24] which is quite ideal
[10:00:34] extension registration I am not sure
[10:00:46] I guess the migration is being done on a best effort basis
[10:00:59] mostly by me it seems atm :P
[10:01:05] our extensions is a burden though
[10:01:21] there are a lot of them, and I am pretty sure a good chunk are no more used anywhere
[10:01:29] Yup, and aren't maintained
[10:01:40] People write them, dump them and then leave them
[10:01:53] No proactive maintenance
[10:02:11] So we have hundreds using code deprecated in 3 or 4 or more versions ago
[10:02:38] we should get some kind of grade for extensions
[10:02:46] like grade A : actively maintained and used on wikimedia
[10:02:54] grade B: actively maintained, not on Wikimedia
[10:02:59] grace C : best effort
[10:03:06] grade D: deprecated/abandoned whatever
[10:03:22] and I guess we would want to move them to an attic when we stop maintaining them
[10:03:31] Mmm
[10:03:38] I don't mind working on one for people that care
[10:03:47] yeah
[10:03:50] But if they haven't been bothered to make sure the CI works for them....
[10:03:53] maybe I am too strict :}
[10:05:22] heh
[10:05:40] hashar wikibase sqlite test seems to run mysql not sqlite https://integration.wikimedia.org/ci/job/mwext-Wikibase-repo-tests-sqlite-php55/1074/console
[10:05:45] I know it's been discussed before... But did we ever get anywhere with periodically running the tests on all (or just dormant) extensions?
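The grading idea floated above (grades A to D, with an attic for abandoned extensions) could be sketched roughly as follows. This is only an illustration: the `Extension` fields, the `grade` function and the `attic_candidates` helper are hypothetical names, and how "actively maintained" would actually be determined is left as an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Grade(Enum):
    """Grades as proposed in the chat; the wording is taken from the messages."""
    A = "actively maintained and used on Wikimedia"
    B = "actively maintained, not on Wikimedia"
    C = "best effort"
    D = "deprecated/abandoned"


@dataclass
class Extension:
    # All fields are hypothetical inputs; the chat does not say how they would be gathered.
    name: str
    actively_maintained: bool
    on_wikimedia: bool
    abandoned: bool = False


def grade(ext: Extension) -> Grade:
    """Map an extension onto the A-D scale."""
    if ext.abandoned:
        return Grade.D
    if ext.actively_maintained:
        return Grade.A if ext.on_wikimedia else Grade.B
    return Grade.C


def attic_candidates(extensions):
    """Names of extensions that would move to the attic (grade D)."""
    return [e.name for e in extensions if grade(e) is Grade.D]
```

A periodic job could then run tests only for grades A to C and skip anything listed by `attic_candidates`, though that policy is also an assumption rather than something agreed in the discussion.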
[10:05:48] 08:37:47 + /srv/deployment/integration/slave-scripts/bin/mw-run-update-script.sh
[10:05:48] 08:37:48 MediaWiki 1.28.0-alpha Updater
[10:05:48] 08:37:48
[10:06:01] So we find these sorts of breakages before people come across them?
[10:06:19] Oh wait
[10:06:19] 08:37:48 Going to run database updates for jenkins_u1_mw
[10:06:20] 08:37:48 Using SQLite file: '/srv/home/jenkins-deploy/tmpfs/jenkins-1/jenkins_u1_mw.sqlite'
[10:06:20] 08:37:48 Depending on the size of your database this may take a while!
[10:06:54] (PS2) Hashar: ThrottleOverride: run composer tests [integration/config] - https://gerrit.wikimedia.org/r/314228 (owner: BryanDavis)
[10:06:56] but then that should have picked up this bug https://phabricator.wikimedia.org/T147300
[10:07:33] (CR) Hashar: [C: 2] "Thanks :) Some code sniffer rules are complaining but no big deal." [integration/config] - https://gerrit.wikimedia.org/r/314228 (owner: BryanDavis)
[10:08:08] (Merged) jenkins-bot: ThrottleOverride: run composer tests [integration/config] - https://gerrit.wikimedia.org/r/314228 (owner: BryanDavis)
[10:08:26] Reedy: yeah a daily/weekly run of tests on all extensions would be nice
[10:08:35] that might be doable via Zuul
[10:08:52] Do we have a task for it?
[10:09:00] it has support for some scheduled tasks and reporting over email to bunch of folks / qa-alerts
[10:09:11] well given you just had the idea... no there is no task yet hehe
[10:11:41] (CR) Hashar: "PHPCS fixed via https://gerrit.wikimedia.org/r/314243" [integration/config] - https://gerrit.wikimedia.org/r/314228 (owner: BryanDavis)
[10:13:02] Reedy: got a simple one that fix PHP CS rules https://gerrit.wikimedia.org/r/#/c/314243/ :D
[10:29:58] hashar: bad bd808
[10:31:03] I'd say the stuff between the {} are over indented too
[10:31:06] * Reedy fixes
[10:32:05] Reedy: just hijack my patch
[10:32:09] change author has needed :}
[10:32:10] Yeah, I was gonna :P
[10:32:12] as
[10:32:29] git commit --amend --author "Sam Reed "
[10:32:29] :D
[10:34:54] \O/
[11:34:27] Release-Engineering-Team (Deployment-Blockers), Release: MW-1.28.0-wmf.21 deployment blockers - https://phabricator.wikimedia.org/T145220#2692221 (Matanya)
[12:01:36] Yippee, build fixed!
[12:01:36] Project selenium-RelatedArticles » chrome,beta-desktop,Linux,contintLabsSlave && UbuntuTrusty build #167: FIXED in 35 sec: https://integration.wikimedia.org/ci/job/selenium-RelatedArticles/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta-desktop,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/167/
[12:01:48] Yippee, build fixed!
[12:01:49] Project selenium-RelatedArticles » chrome,beta-mobile,Linux,contintLabsSlave && UbuntuTrusty build #167: FIXED in 48 sec: https://integration.wikimedia.org/ci/job/selenium-RelatedArticles/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta-mobile,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/167/
[12:15:23] Release-Engineering-Team (Deployment-Blockers), Release: MW-1.28.0-wmf.21 deployment blockers - https://phabricator.wikimedia.org/T145220#2692290 (Reedy)
[12:22:49] Yippee, build fixed!
[12:22:50] Project selenium-GettingStarted » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #167: FIXED in 48 sec: https://integration.wikimedia.org/ci/job/selenium-GettingStarted/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/167/
[12:41:56] (PS2) Jean-Frédéric: Publish code coverage post-merge in labs/tools/heritage [integration/config] - https://gerrit.wikimedia.org/r/314171
[12:47:39] Scap3: Unhandled(?) exceptions in scap3 - https://phabricator.wikimedia.org/T147334#2689369 (mobrovac) The last line of the log tells it all, @Yurik: > IOError: [Errno 28] No space left on device > 17:28:21 ERROR - deploy-local failed: [Errno 28] No space left on device
[13:04:32] Yippee, build fixed!
[13:04:33] Project selenium-Math » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #166: FIXED in 32 sec: https://integration.wikimedia.org/ci/job/selenium-Math/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/166/
[13:04:34] Yippee, build fixed!
[13:04:34] Project selenium-Math » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #166: FIXED in 33 sec: https://integration.wikimedia.org/ci/job/selenium-Math/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/166/
[13:09:39] Beta-Cluster-Infrastructure, Labs, Patch-For-Review: Replace all class imports on Labs with role imports - https://phabricator.wikimedia.org/T147233#2692479 (Andrew)
[13:34:02] (PS3) Hashar: Revert "Temporarily move composer-hhvm/php5 jobs off of nodepool" [integration/config] - https://gerrit.wikimedia.org/r/306727
[13:35:21] (CR) jenkins-bot: [V: -1] Revert "Temporarily move composer-hhvm/php5 jobs off of nodepool" [integration/config] - https://gerrit.wikimedia.org/r/306727 (owner: Hashar)
[13:37:50] known
[13:40:19] (PS4) Hashar: Revert "Temporarily move composer-hhvm/php5 jobs off of nodepool" [integration/config] - https://gerrit.wikimedia.org/r/306727
[13:46:29] Yippee, build fixed!
[13:46:30] Project selenium-VisualEditor » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #169: FIXED in 2 min 29 sec: https://integration.wikimedia.org/ci/job/selenium-VisualEditor/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/169/
[13:49:38] (PS5) Hashar: Revert "Temporarily move composer-hhvm/php5 jobs off of nodepool" [integration/config] - https://gerrit.wikimedia.org/r/306727 (https://phabricator.wikimedia.org/T143938)
[14:07:10] Continuous-Integration-Infrastructure, Nodepool, Patch-For-Review: Bring back jobs to Nodepool - https://phabricator.wikimedia.org/T143938#2692630 (hashar) I have rebased the patch https://gerrit.wikimedia.org/r/#/c/306727/ //Revert "Temporarily move composer-hhvm/php5 jobs off of nodepool"// That r...
[14:08:41] Beta-Cluster-Infrastructure, Labs, Patch-For-Review: Replace all class imports on Labs with role imports - https://phabricator.wikimedia.org/T147233#2692635 (Andrew)
[14:38:13] is there a tracking task for the general "phabricator search" doesn't work as expected? I could find only T146843
[14:38:23] case in point: searching "lithium" doesn't yield https://phabricator.wikimedia.org/T143307
[14:38:58] godog yes
[14:40:21] godog https://phabricator.wikimedia.org/T146673 and https://phabricator.wikimedia.org/T146843
[14:42:18] twentyafterfour ^^
[14:44:00] paladox: thanks, subscribed both
[14:44:07] Your welcome
[14:46:42] PROBLEM - Puppet run on deployment-mathoid is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[14:51:51] Deployment-Systems, Operations: Have fallback communication channel when freenode has problems - https://phabricator.wikimedia.org/T127904#2057999 (Luke081515) What we can do too: I setup a complete indipendent ircd at the past (for ircd related tests) at labs. Currently it allows at least 1000 users (I...
[15:14:28] PROBLEM - Puppet run on deployment-cache-text04 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[15:16:50] (PS1) Hashar: Recreate jobs for composer-hhvm/php on Nodepool [integration/config] - https://gerrit.wikimedia.org/r/314278 (https://phabricator.wikimedia.org/T143938)
[15:18:27] (CR) Hashar: [C: 2] "Jobs created:" [integration/config] - https://gerrit.wikimedia.org/r/314278 (https://phabricator.wikimedia.org/T143938) (owner: Hashar)
[15:19:10] (PS2) Hashar: Recreate jobs for composer-hhvm/php on Nodepool [integration/config] - https://gerrit.wikimedia.org/r/314278 (https://phabricator.wikimedia.org/T143938)
[15:19:30] (CR) Hashar: Recreate jobs for composer-hhvm/php on Nodepool [integration/config] - https://gerrit.wikimedia.org/r/314278 (https://phabricator.wikimedia.org/T143938) (owner: Hashar)
[15:19:35] (CR) Hashar: [C: 2] Recreate jobs for composer-hhvm/php on Nodepool [integration/config] - https://gerrit.wikimedia.org/r/314278 (https://phabricator.wikimedia.org/T143938) (owner: Hashar)
[15:21:41] RECOVERY - Puppet run on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0]
[15:22:04] (Merged) jenkins-bot: Recreate jobs for composer-hhvm/php on Nodepool [integration/config] - https://gerrit.wikimedia.org/r/314278 (https://phabricator.wikimedia.org/T143938) (owner: Hashar)
[15:35:01] can anyone explain to me the use of base::firewall on deployment-prep instances? It looks haphazard to me (for example it's present on deployment-eventlogging04 but not on deployment-eventlogging03)
[15:37:56] Yippee, build fixed!
[15:37:56] Project selenium-MobileFrontend » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #183: FIXED in 15 min: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/183/
[15:54:30] RECOVERY - Puppet run on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:01:31] (PS6) Hashar: Revert "Temporarily move composer-hhvm/php5 jobs off of nodepool" [integration/config] - https://gerrit.wikimedia.org/r/306727 (https://phabricator.wikimedia.org/T143938)
[16:02:37] (CR) Hashar: [C: -1] "Jenkins jobs have been reintroduced in commit 515a6d36 so this change is now just about Zuul workflow. From the task we run 1000 builds p" [integration/config] - https://gerrit.wikimedia.org/r/306727 (https://phabricator.wikimedia.org/T143938) (owner: Hashar)
[16:09:35] (CR) Hashar: "Added Chase merely for info." [integration/config] - https://gerrit.wikimedia.org/r/306727 (https://phabricator.wikimedia.org/T143938) (owner: Hashar)
[16:10:38] andrewbogott: can you ask on the task? I can investigate / reply later in the day
[16:10:43] gotta rush out now :(
[16:10:55] but in the end i think it was just to bring in ferm with the default rules
[16:11:03] when some role:: add ferm rules or something like that
[16:11:09] bbl
[16:11:09] which task?
[16:11:36] andrewbogott: either the big tasks listing all non role class
[16:11:40] or a new sub task :}
[16:11:43] ok
[16:11:53] just make sure I am subscribed
[16:12:10] I am sure I can find some investigation but gotta dig in puppet logs / sal etc
[16:12:31] Continuous-Integration-Config, Analytics-Cluster: Jenkins tests for analytics/refinery? - https://phabricator.wikimedia.org/T147072#2693456 (madhuvishy) My patch is unrelated to this, and is specifically only for a step in the refinery deployment process. Not sure if the rest of the repo needs tests - Pi...
[16:14:52] andrewbogott: deployment-eventlogging04 has role::eventbus::eventbus that defines a ferm::service
[16:14:53] ACCEPT tcp -- 10.0.0.0/8 0.0.0.0/0 tcp dpt:8085
[16:15:03] Scap3 (Scap3-MediaWiki-MVP), scap: Scap3 needs a way to handle large binary file transport - https://phabricator.wikimedia.org/T119443#2693512 (thcipriani) Open>Resolved a:dduvall Scap now supports git-fat the same as trebuchet.
[16:15:04] so maybe nowadays ferm::service automatically include base::firewall
[16:15:14] but maybe it does not, and thus everything would be disabled but that specific rule
[16:15:37] maybe not, I'm not sure. I take it that eventlogging** are different from one another despite the common prefix?
[16:16:13] yeah eventlogging03 is the legacy one
[16:16:23] Scap3, scap: Allow scap3 to read target host list from stdin - https://phabricator.wikimedia.org/T122913#2693540 (thcipriani)
[16:16:25] eventlogging04 is using the new system nicknamed eventbuss
[16:16:36] Scap3, scap: Allow scap3 to read target host list from stdin - https://phabricator.wikimedia.org/T122913#1916643 (thcipriani) p:Normal>Low
[16:16:39] so o eventlogging03 , maybe role::eventlogging does not bring in any ferm rule
[16:16:47] though it has beta::deployaccess applied
[16:16:58] but eventlogging03 has no entries in iptables
[16:17:10] so I guess there is no ferm::* classes applied
[16:17:30] so in short, if some role add a ferm::* , we probably have to add base::firewall
[16:17:54] and maybe we can make it so all deployment-prep instances include base::firewall by default
[16:18:02] that is doable via some hiera variable
[16:18:22] which would enforce people to create ferm:** rules in their puppet manifest if they want their system to work on beta
[16:18:26] just quick notes. I am rushing out
[16:18:45] so eventbus and eventlogging are two separate services that uses a common framework (called eventlogging to cause some confusion)
[16:20:35] Deployment-Systems, Scap3: Deploy mediawiki release tools repo (rMREL) with scap3 - https://phabricator.wikimedia.org/T142588#2693611 (thcipriani)
[16:20:45] Deployment-Systems, Scap3: Deploy mediawiki release tools repo (rMREL) with scap3 - https://phabricator.wikimedia.org/T142588#2540290 (thcipriani) p:Low>Lowest
[16:22:17] Scap3 (Scap3-MediaWiki-MVP), releng-201617-q4, Tracking: Use scap3's canary deploys for MediaWiki - https://phabricator.wikimedia.org/T131120#2693649 (thcipriani)
[16:24:08] Scap3 (Scap3-MediaWiki-MVP), releng-201617-q4: Use scap3's canary deploys for MediaWiki - https://phabricator.wikimedia.org/T131120#2156589 (thcipriani)
[16:24:31] Scap3 (Scap3-MediaWiki-MVP), releng-201516-q3, releng-201617-q2, scap, and 2 others: [keyresult] Migrate the MW weekly train deploy to scap3 - https://phabricator.wikimedia.org/T114313#2693676 (thcipriani)
[16:25:06] Scap3, scap: scap3 configuration selection is confusing - https://phabricator.wikimedia.org/T120410#2693682 (thcipriani)
[16:27:20] Beta-Cluster-Infrastructure, Labs, Patch-For-Review: Replace all class imports on Labs with role imports - https://phabricator.wikimedia.org/T147233#2693700 (Andrew)
[16:28:31] Scap3 (Scap3-MediaWiki-MVP), scap, Security-General: Scap should apply security patches - https://phabricator.wikimedia.org/T118478#2693723 (mmodell) a:mmodell
[16:29:42] Scap3 (Scap3-MediaWiki-MVP), scap, Operations: Move scap target configuration to etcd - https://phabricator.wikimedia.org/T115899#2693727 (thcipriani) p:Normal>Low
[16:41:04] Scap3 (Scap3-MediaWiki-MVP): Unify scap2 and scap3 - https://phabricator.wikimedia.org/T147477#2693756 (thcipriani)
[16:41:29] Beta-Cluster-Infrastructure, Labs, Patch-For-Review: Replace all class imports on Labs with role imports - https://phabricator.wikimedia.org/T147233#2693773 (Andrew)
[16:43:54] Scap3 (Scap3-MediaWiki-MVP): Flatten MediaWiki config, all MediaWiki versions, and extensions into a unified git repo - https://phabricator.wikimedia.org/T147478#2693801 (thcipriani)
[16:44:49] Scap3 (Scap3-MediaWiki-MVP): Unify scap2 and scap3 - https://phabricator.wikimedia.org/T147477#2693820 (thcipriani)
[16:47:17]
[16:51:26] love it
[16:51:29] Beta-Cluster-Infrastructure, Labs, Patch-For-Review: Replace all class imports on Labs with role imports - https://phabricator.wikimedia.org/T147233#2693852 (Andrew)
[16:59:49] Release-Engineering-Team, Scap3 (Scap3-MediaWiki-MVP), Technical-Debt: What is /fonts/ in wmf-config and is it important? - https://phabricator.wikimedia.org/T147481#2693883 (demon)
[16:59:57] thcipriani: Heh ^
[17:00:49] ostriches: nice :)
[17:01:02] Scap3 (Scap3-MediaWiki-MVP), Fundraising-Backlog, MediaWiki-extensions-ContributionTracking: Clean up Contribution Tracking settings in main wmf config repo - https://phabricator.wikimedia.org/T147479#2693901 (demon)
[17:02:23] 125 MB of font files. There is a readme...with a note from 2008.
[17:02:40] good spot to dig :)
[17:03:28] Ah, so maybe timeline stuff
[17:03:31] Ancient.
[17:03:39] ostriches Hi, im wondering could you drop user Paladox2014 from gerrit please?
[17:03:41] Thats me
[17:03:58] but i accidentaly did that a long time ago and i doint know why i created two users in gerrit
[17:04:08] Beta-Cluster-Infrastructure, Labs, Patch-For-Review: Replace all class imports on Labs with role imports - https://phabricator.wikimedia.org/T147233#2693916 (Andrew)
[17:04:26] No, I don't like deleting things from databases, it creates inconsistencies.
[17:04:41] Oh
[17:04:41] ok
[17:07:49] paladox: We can set it to inactive so nobody could try to use it
[17:08:01] Ok, yes please :)
[17:08:50] Done
[17:10:10] thcipriani: hi, are you planning on pushing group1 today ?
[17:10:13] Thanks
[17:10:40] matanya: yes, although I haven't checked to see if there are any blocking tasks just yet.
[17:11:09] looks clear so far https://phabricator.wikimedia.org/T145220
[17:11:21] thcipriani: i opened an unbreak now earlier and sam reed already solved it
[17:11:39] matanya: ah, that's good :)
[17:11:55] but i think the part of moving stuff to /libs is going to break stuff
[17:13:10] thcipriani: also https://phabricator.wikimedia.org/T147422 and https://phabricator.wikimedia.org/T147414
[17:13:26] which i don't think are blocker, but should get attention
[17:14:43] those are certainly troubling
[17:15:10] the logspam is pretty bad
[17:15:35] thcipriani: i have a few others i'd like someone to peak at before you go further, if that is fair to request
[17:16:20] matanya: definitely, if you want to file any of these as blockers it seems reasonable to me.
[17:16:47] thcipriani: I prefer not to scream wolf if i don't have to :)
[17:18:02] understood
[17:18:13] the logspam one definitely gives me pause.
[17:18:21] boy, would shell access be useful for me
[17:20:18] fwiw, we started working on a draft page for things that could hold the train https://wikitech.wikimedia.org/wiki/Deployments/Holding_the_train
[17:21:39] cool
[17:22:41] I'm too lazy to file a bug about this, but scap's hhvm-graceful and sync commands call tasks.restart_hhvm and that code seems to be gone from the tasks module. It never worked right but it should be cleaned up properly (or better put back and made to work with etcd/pybal).
[17:23:37] bd808: i see some session stack traces on auth manager
[17:24:12] matanya: you should talk to someone who works on auth manager :) tgr and anomie would probably be interested
[17:24:23] thanks bd808
[17:24:25] * bd808 works in tool labs these days
[17:25:29] bd808: ugh, weird. Will look. Thanks for the heads up.
[17:25:38] PROBLEM - Puppet run on deployment-kafka05 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[17:34:39] thcipriani: also : https://phabricator.wikimedia.org/T147484
[17:37:03] thcipriani: and i also wanted to pass by you the fact you closed https://phabricator.wikimedia.org/T147359 but it is still happening
[17:40:04] Release-Engineering-Team (Deployment-Blockers), Release: MW-1.28.0-wmf.21 deployment blockers - https://phabricator.wikimedia.org/T145220#2694047 (thcipriani)
[17:40:23] matanya: good catch.
[17:40:49] that error initially exploded as soon as I moved wmf.21 to the testwiki.
[17:41:29] the current rate of that error is much lower, but definitely not eliminated :(
[17:41:48] in addition some stack traces too:
[17:41:49] Warning: Destructor threw an object exception: exception 'InvalidArgumentException' with message 'LoadBalancer::reuseConnection: connection not found, has the connection been freed already?' in /srv/mediawiki/php-1.28.0-wmf.21/includes/libs/rdbms/loadbalancer/LoadBalancer.php:620 Stack trace: #0 /srv/mediawiki/php-1.28.0-wmf.21/includes/libs/rdbms/database/DBConnRef.php(599): LoadBalancer->reuseConnection() #1 (): DBConnRef->__destruct() #2 {
[17:41:49] main}
[17:42:09] touching the same code area
[17:44:27] Beta-Cluster-Infrastructure, Labs, Patch-For-Review: Replace all class imports on Labs with role imports - https://phabricator.wikimedia.org/T147233#2694080 (Andrew)
[17:45:37] matanya: indeed. it seems much of this is related to rdbms loadbalancer issues.
[17:46:30] thcipriani: are you aware of the redis connection issues from wmf.20 ?
[17:47:28] redis connection issues are persistent log noise, but I wasn't aware of them at any elevated levels.
[17:48:17] thcipriani, matanya: the train is supposed to be in just over an hour :/
[17:50:48] thcipriani: I mean Warning: timed out after 0.3 seconds when connecting to rdb1007.eqiad.wmnet [110]: Connection timed out for 5,164 times
[17:52:14] Krenair: matanya indeed. I'm not clear if the train should actually be blocked or not. There is clearly some elevated logging that should be addressed in the short-term, but is there an impact of this increased logging or is it mostly noise? If so is there a code that should be pulled out of wmf.21 or patches that can be applied to get it working again?
[17:52:19] greg-g: ^
[17:52:34] I've seen some exceptions
[17:52:53] DBConnectionError from line 882 of /srv/mediawiki/php-1.28.0-wmf.21/includes/libs/rdbms/loadbalancer/LoadBalancer.php: Cannot access the database: No working replica DB server: Unknown error {"exception_id":"V-U6aApAMFwAACXw2usAAAAM"}
[17:53:47] yeah, I just reopened that tasks (as matanya pointed out that it was still happening)
[17:54:35] which was already a blocker for the train, so I suppose that means we're halted for the time being :(
[17:54:39] I three 3 open issues: the DB connections, timedmediahandler issue and Oauth - last one which should not hold train
[17:55:11] * thcipriani adds to blockers
[17:56:07] Release-Engineering-Team (Deployment-Blockers), Release: MW-1.28.0-wmf.21 deployment blockers - https://phabricator.wikimedia.org/T145220#2694122 (thcipriani)
[17:58:07] matanya: where you referring to: https://phabricator.wikimedia.org/T147414 ?
[17:58:17] yes thcipriani
[17:58:27] ok
[17:59:08] PROBLEM - Puppet run on deployment-tin is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[17:59:25] Release-Engineering-Team (Deployment-Blockers), Release: MW-1.28.0-wmf.21 deployment blockers - https://phabricator.wikimedia.org/T145220#2694126 (thcipriani)
[18:00:36] RECOVERY - Puppet run on deployment-kafka05 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:01:15] I don't think that oauth one should block, right?
[18:01:19] thcipriani: matanya ^
[18:01:36] I agree
[18:01:41] it is a simple bug
[18:01:42] no data loss, user already wanted to cancel the operation
[18:01:53] but he can't
[18:02:00] that is the bug
[18:02:10] That fatal in TMH is a blocker.
[18:02:17] oh... I see, I misunderstood
[18:02:22] * greg-g nods
[18:02:57] matanya: oh, I think I'm still right, the authorization never happens, so that's OK
[18:03:18] greg-g: yes, that is why i agree it is not a blocker
[18:03:23] Release-Engineering-Team (Deployment-Blockers), Release: MW-1.28.0-wmf.21 deployment blockers - https://phabricator.wikimedia.org/T145220#2694134 (greg)
[18:03:24] * greg-g nods
[18:03:42] matanya: so, it was just my internal knowledge changing, ignore me :)
[18:03:57] I think i did my weekly, will go block some vandals :)
[18:04:06] Fix is easy for TMH
[18:04:08] Doing it now
[18:05:41] oh greg-g if you are around, legal is wanting me to sign stuff again, i am quite annoyed that every now and then i need to send all my private info and sign stuff
[18:06:34] can you please figure out why legal troubles me from time to time? i can drop my access if they are so worried about it but then i will not be able to help
[18:07:21] matanya: no clue. It's a black box to me, sadly.
[18:07:31] sigh
[18:07:34] yeah :/
[18:08:06] what do they want you to sign?
[18:08:21] cobblestone? phab
[18:08:30] you're not a wmf contractor are you?
[18:08:33] send them my address, full legal name
[18:08:37] no
[18:08:42] then what are they doing?
[18:08:48] who knows
[18:08:49] what is it that they want you to sign inside cobblestone?
[18:08:59] i don't know, didn't get the link
[18:09:11] but being threatened to lose access
[18:09:23] legal can't actually remove any of your rights, so if you've already signed something I wouldn't worry too much yet
[18:09:39] it happens about every 6 months or so for diff system/right
[18:10:10] Da heck?
[18:10:23] I don't worry, it is not like i going to loose my job or something :)
[18:10:42] you have access to what, the nda ldap group?
[18:11:08] yes
[18:11:19] cobblestone is the new contract management software, it'd be great if it was used for this (at least there'd be a place I could look, for once)
[18:11:21] which you presumably already signed for
[18:11:29] greg-g, no, we have phab for this.
[18:11:37] Krenair: sadly, not according to legal
[18:11:59] what do you mean no?
[18:12:04] i sign a scan in 2011 (?) then i signed on meta, then i signed phab
[18:12:15] now they want cobblestone
[18:12:28] and probably forgot some stuff in between
[18:12:30] you're signing the same agreement each time or different texts?
[18:12:43] mostly the same, some wording changes
[18:13:06] sorry man that's a hassle
[18:13:26] is it made clear whether one supersedes a previous one?
[18:13:45] it always has the title "updated" if that counts :)
[18:13:48] Beta-Cluster-Infrastructure, Labs, Patch-For-Review: Replace all class imports on Labs with role imports - https://phabricator.wikimedia.org/T147233#2694183 (Andrew)
[18:14:00] no offense, i just don't know why i need to share my info over and over, that is the troubling part
[18:14:09] yeah, that part defintely sucks
[18:14:35] trust and saftey has a copy, travel has one, ops have one, and legal has one
[18:14:55] who gave travel access and why?
[18:14:59] i did [18:15:07] probably for wikimania or something [18:15:07] i had to travel [18:15:11] FDC [18:15:11] :) [18:15:48] I've travelled as a WMF-funded volunteer before with no NDA [18:16:01] we need LaaS (Legal as a service) [18:16:03] legaloid [18:16:03] it would really be better if one source of truth would be there to look up my info and not ask for it over and over [18:16:28] Though at that time I couldn't've signed one whether they wanted it or not [18:16:29] Krenair: I don't think travel needed an NDA, just his details (passport etc) [18:16:39] matanya: yeah, really :( :( [18:16:53] so two complaints combined : too many signs, and too many details sharing :) [18:17:02] I wouldn't sign another if I were you, matanya [18:17:10] but IANAL [18:17:12] so [18:18:03] 03Scap3: scap l10n-purge broken - https://phabricator.wikimedia.org/T147349#2694193 (10thcipriani) 05Open>03Resolved [18:18:32] thcipriani: patch for that fatal in TimedMediaHandler [18:18:41] https://gerrit.wikimedia.org/r/#/c/314315/ [18:19:22] brion: thanks! I think ostriches was looking at that one, too... [18:19:36] ah crap i forgot to put it on master [18:19:38] adding :D [18:19:58] heh [18:19:59] 10Beta-Cluster-Infrastructure, 06Labs, 13Patch-For-Review: Replace all class imports on Labs with role imports - https://phabricator.wikimedia.org/T147233#2694202 (10Andrew) [18:19:59] LOL [18:20:07] Ostriches already done that [18:20:09] too many branches man [18:20:10] \o/ [18:20:21] brion https://gerrit.wikimedia.org/r/#/c/314311/ [18:20:42] nice [18:21:00] brion: cookie for you too [18:21:08] om nom nom [18:23:58] I already got it on master and backported.
[18:24:00] brion: Thx tho [18:24:01] :) [18:24:21] PROBLEM - Host integration-puppetmaster is DOWN: CRITICAL - Host Unreachable (10.68.16.42) [18:24:21] \o/ [18:25:24] thx ostriches [18:25:32] yw [18:25:39] * brion goes back to sorta making tmh and mmv talk to each other [18:27:13] PROBLEM - Puppet run on deployment-ores-redis is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [18:27:38] brion: thanks for looking/helping! [18:28:20] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.28.0-wmf.21 deployment blockers - https://phabricator.wikimedia.org/T145220#2694239 (10demon) [18:29:14] 10Beta-Cluster-Infrastructure, 06Labs, 13Patch-For-Review: Replace all class imports on Labs with role imports - https://phabricator.wikimedia.org/T147233#2694242 (10Andrew) [18:37:33] greg-g: did anyone create grafana dashboards for group0/1 ? [18:39:06] RECOVERY - Puppet run on deployment-tin is OK: OK: Less than 1.00% above the threshold [0.0] [18:42:02] matanya: I dunno about grafana, but there's logstash dashboards for group0 and 1. [18:44:23] 10Beta-Cluster-Infrastructure, 06Labs, 13Patch-For-Review: Replace all class imports on Labs with role imports - https://phabricator.wikimedia.org/T147233#2694316 (10Andrew) [18:50:22] ostriches: i am aware, thanks, i was thinking something similar to the auth graph [18:50:55] not to my knowledge [18:52:25] yeah, not that I know of [18:59:55] thcipriani: have time to answer more puppet questions? [19:00:15] andrewbogott: sure, I can try to answer puppet questions. [19:00:32] there are three boxes in deployment prep that look like this: [19:00:36] https://www.irccloud.com/pastebin/8H6JYpZ6/ [19:00:57] I'd like to consolidate ::conftool (and possibly ::firewall) into one of the other classes there.
[19:01:17] But I'm not sure that makes sense [19:01:30] * thcipriani catches up [19:01:41] ::appserver includes ::webserver which /sometimes/ includes ::conftool but it maybe doesn't in this case [19:02:14] RECOVERY - Puppet run on deployment-ores-redis is OK: OK: Less than 1.00% above the threshold [0.0] [19:03:32] I created role::beta::mediawiki instead of role::beta::deployaccess so we'd have a place to stash stuff as this work progressed. Where the weird worts go was my idea, so it *might* make sense to just shove things in there. [19:04:37] I saw your questions about base::firewall earlier, but I really have no idea about that one. [19:04:52] i.e., why on some machines and not others [19:04:54] it looks like everywhere with base::firewall also has beta::mediawiki [19:05:07] but, not the reverse. So cramming base::firewall into beta::mediawiki would apply it on new instances [19:05:17] which might be good? Unclear :) [19:05:38] 06Release-Engineering-Team, 03Scap3 (Scap3-MediaWiki-MVP), 07Technical-Debt: What is /fonts/ in wmf-config and is it important? - https://phabricator.wikimedia.org/T147481#2693883 (10hashar) When we have lost the deployment servers content back in February ([[ https://wikitech.wikimedia.org/wiki/Incident_do... [19:05:49] what a tangled puppet we weave :) [19:05:58] ostriches: GDFONTPATH that is old school :] [19:06:10] Yeah :p [19:06:10] thcipriani: in theory the firewall class is applied on (almost) all prod hosts. [19:06:15] I replied on the task and pointed to the old one https://phabricator.wikimedia.org/T39968 [19:06:17] So inasmuch as this is meant to mirror prod... [19:07:08] hashar: Thx [19:07:21] andrewbogott: right, it likely *should* be on all these hosts, but applying it right now would undoubtedly break things. Which hosts have beta::mediawiki but not base::firewall? [19:07:44] hashar: Are they all in packages already? If so, we could write up a trivial puppet class to install them + symlink. 
[19:07:51] If not, we could toss them all in a package or something [19:08:29] ostriches: I havent closely verified. I gave up when I found that .deb fonts package install in different path while GDFONTPATH accept a single path [19:08:30] thcipriani: deployment-jobrunner02, deployment-tmh01 [19:08:32] that's it [19:08:58] hashar: That's easily worked around. I guess the next step is to just find out :D [19:09:01] * thcipriani digs [19:09:15] role::beta::mediawiki that looks like a hack really [19:09:37] indeed it is a hack [19:09:53] (03PS8) 10Hashar: Add job that allows for updating analytics refinery artifacts with latest source jars [integration/config] - 10https://gerrit.wikimedia.org/r/290640 (https://phabricator.wikimedia.org/T130123) (owner: 10Madhuvishy) [19:10:09] 10Continuous-Integration-Config, 06Analytics-Kanban, 13Patch-For-Review: Get jenkins to update refinery with deploy of new jars {hawk} - https://phabricator.wikimedia.org/T130123#2694414 (10hashar) [19:10:33] (03CR) 10Hashar: [C: 032] "Deployed by madhuvishy" [integration/config] - 10https://gerrit.wikimedia.org/r/290640 (https://phabricator.wikimedia.org/T130123) (owner: 10Madhuvishy) [19:10:42] thcipriani: those are also the boxes that have role::beta::mediawiki but mediawiki::conftool [19:11:35] (03Merged) 10jenkins-bot: Add job that allows for updating analytics refinery artifacts with latest source jars [integration/config] - 10https://gerrit.wikimedia.org/r/290640 (https://phabricator.wikimedia.org/T130123) (owner: 10Madhuvishy) [19:11:47] 10Continuous-Integration-Config, 06Analytics-Kanban, 13Patch-For-Review: Get jenkins to update refinery with deploy of new jars {hawk} - https://phabricator.wikimedia.org/T130123#2694433 (10hashar) @madhuvishy I don't scale anymore :] Best way is to add #continuous-integration-config to the task and poke fo... 
[19:14:25] thcipriani: so this is my (probably very unsafe) proposal: https://gerrit.wikimedia.org/r/#/c/314328/ [19:16:03] hrm, I was just looking at tmh01 and jobrunner02 the only ports that are open that are different than the ones on mw06 frex are for rpc.statd [19:16:20] so base::firewall *may be* safe mostly [19:17:39] conftool just seems to install conftool, so that's an island unto itself. [19:19:01] and we dont use conftool on beta [19:19:12] afaik it is only to track apps servers to which varnish should send traffic [19:19:23] cant remember what _joe_ said but I guess it can be dropped [19:19:57] base::firewall, I am pretty sure I added it when we started to add to roles something like: include ferm ferm::rule [19:20:09] which ended up with only the iptables rule for that service and nothing else [19:20:15] then ferm got changed to DROP by default [19:20:35] so the hack was to allow all the common flows, which are provided via base::firewall [19:20:45] but all of the above is probably fairly outdated [19:21:18] 10Continuous-Integration-Config, 06Analytics-Kanban, 13Patch-For-Review: Add JJB support for Jenkins Maven Release Plugin {hawk} - https://phabricator.wikimedia.org/T132175#2694529 (10madhuvishy) 05Open>03Resolved [19:21:55] 10Continuous-Integration-Config, 06Analytics-Kanban, 13Patch-For-Review: Get jenkins to update refinery with deploy of new jars {hawk} - https://phabricator.wikimedia.org/T130123#2694531 (10madhuvishy) 05Open>03Resolved Thanks @hashar! I'll verify that. [19:23:14] PROBLEM - Puppet run on deployment-ores-redis is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [19:23:30] IIRC we got conftool on beta after we talked about scap making calls to/through it and we needed some place to test it. 
[19:24:11] sounds about right [19:25:24] madhuvishy: kudos :) [19:25:39] hashar: thank you :) [19:25:58] madhuvishy: if you get ideas about new jobs / CI ideas please be bold :D [19:26:11] there are too many things going on for me to track everything though [19:26:16] but this channel should be able to offer support [19:27:45] thcipriani: it sounds like I should remove ::conftool from those instances entirely, and then modify my patch to only add ::firewall. Sound right? [19:28:00] andrewbogott: which machines have role::meta::mediaiwki but don't have conftool? [19:28:10] *beta::mediawiki [19:29:17] thcipriani: the same two. deployment-tmh01 and deployment-jobrunner02 [19:30:03] andrewbogott: +1'd [19:31:03] thcipriani: you think that conftool is still good for something? [19:31:15] (I didn't really follow that part of the conversation) [19:32:00] I'm torn about it, honestly. Removing it might be best on second thought. [19:32:22] I don't think it's going to hurt anything, but at the same time, it's just more debt. Yeah, rip it out. [19:32:39] ok, amending my patch... [19:33:54] !log removing mediawiki::conftool from deployment-mediawiki04, deployment-mediawiki06, deployment-mediawiki05 [19:33:57] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [19:34:09] thcipriani: the DB issue seems to all come from resource loader, as far as i can see [19:34:22] oh? 
[19:34:38] https://logstash.wikimedia.org/app/kibana#/dashboard/resourceloader?_g=(refreshInterval:(display:Off,pause:!f,value:0),time:(from:now-6h,mode:quick,to:now))&_a=(filters:!(('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'logstash-*',key:_type,negate:!f,value:mediawiki),query:(match:(_type:(query:mediawiki)))),('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'logstash-*',key:channel,negate:!f,value:resourceloader),query:( [19:34:38] match:(channel:(query:resourceloader)))),('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'logstash-*',key:normalized_message.raw,negate:!f,value:'ResourceLoaderModule::saveFileDependencies:%20failed%20to%20update%20DB:%20exception%20!'DBConnectionError!'%20with%20message%20!'Cannot%20access%20the%20database:%20No%20working%20replica%20DB%20server:%20Unknown%20error!'%20in%20%2Fsrv%2Fmediawiki%2Fphp-1.28.0-wmf.21%2Fincludes%2Flibs% [19:34:38] 2Frdbms%2Floadbalancer%2FLoad'),query:(match:(normalized_message.raw:(query:'ResourceLoaderModule::saveFileDependencies:%20failed%20to%20update%20DB:%20exception%20!'DBConnectionError!'%20with%20message%20!'Cannot%20access%20the%20database:%20No%20working%20replica%20DB%20server:%20Unknown%20error!'%20in%20%2Fsrv%2Fmediawiki%2Fphp-1.28.0-wmf.21%2Fincludes%2Flibs%2Frdbms%2Floadbalancer%2FLoad',type:phrase))))),options:(darkTheme:!f),panels:!(( [19:34:38] col:1,id:Events-Over-Time,panelIndex:1,row:1,size_x:12,size_y:2,type:visualization),(col:1,id:Trending-Messages,panelIndex:2,row:3,size_x:6,size_y:5,type:visualization),(col:7,id:Event-Level,panelIndex:3,row:3,size_x:6,size_y:2,type:visualization),(col:7,id:Top-20-Hosts,panelIndex:4,row:5,size_x:6,size_y:3,type:visualization),(col:1,id:Top-Domains,panelIndex:5,row:8,size_x:12,size_y:3,type:visualization),(col:1,columns:!(type,level,wiki,host, [19:34:38] 
message),id:Default-Events-List,panelIndex:6,row:11,size_x:12,size_y:23,sort:!('@timestamp',desc),type:search)),query:(query_string:(analyze_wildcard:!t,query:'*')),title:resourceloader,uiState:(P-1:(vis:(legendOpen:!f)),P-3:(vis:(legendOpen:!f)),P-4:(vis:(legendOpen:!f)))) [19:34:47] oh, crap [19:34:52] heh [19:34:56] :) [19:35:01] https://logstash.wikimedia.org/goto/c74530445505b1d79793eea6c74d4e02 [19:38:59] thcipriani: ok, there's one more base::firewall straggler, in deployment-eventlogging04.deployment-prep.eqiad.wmflabs [19:40:21] matanya: could you update the task with this info? [19:40:54] i am not sure about it, it might be coming from other places too [19:41:57] yeah, hard to tell in logstash since group0 doesn't see much traffic. [19:42:31] yes [19:44:01] thcipriani: I'm going to step away in a few minutes… seems like nothing's broken so far, right? [19:44:31] andrewbogott: yup, so far, nothing that screams is broken. [19:44:52] cool [19:45:35] I'll give it another couple. If you (or hashar) feels inclined to figure out about deployment-eventlogging04.deployment-prep.eqiad.wmflabs… I'll get to make a checkmark on a list if we remove base::firewall from that instance. [19:45:51] (Mostly I'm delaying the inevitable 'make a role class that just installs the firewall') [19:50:08] hrm, I mean it looks like eventbus::eventlogging calls out to ferm, mostly expects base::firewall to exist it seems. I don't know anything about it, really, but everything in site.pp that includes eventlogging::eventbus also includes base::firewall so delta from production, etc. Probably best to just make a roll class for it :\ [19:50:24] thcipriani: also, seen this on special pages only so far, but again, too little traffic to conclude [19:50:49] thcipriani: or include it in eventlogging::eventbus? 
[19:51:44] doesn't seem like there's an instance where eventlogging::eventbus is included without base::firewall, but you should probably reach out to someone who knows more about that role than I :) [19:52:49] thcipriani: and one relatively new: [19:52:50] "type":"Flow\\Exception\\InvalidDataException","file":"/srv/mediawiki/php-1.28.0-wmf.21/extensions/Flow/includes/Model/AbstractRevision.php","line":366,"message":"Failed to load the content","code":"default","url":null,"ba [19:53:01] will open a task [19:55:02] thcipriani: are we talking about role::eventbus::eventbus or something else? (I don't think eventbus::eventlogging is a real thing) [19:56:12] andrewbogott: sorry, role::eventbus::eventbus :) [19:56:26] conflating servername with the role I was looking at [19:59:15] ok! [19:59:17] * andrewbogott out for now [20:03:15] RECOVERY - Puppet run on deployment-ores-redis is OK: OK: Less than 1.00% above the threshold [0.0] [20:05:39] 03Scap3 (Scap3-MediaWiki-MVP): Flatten MediaWiki config, all MediaWiki versions, and extensions into a unified git repo - https://phabricator.wikimedia.org/T147478#2694777 (10thcipriani) A good starting point for a plan was written in @bd808's comment T101023#1349229 This comment covers a larger scope than this... [20:06:13] https://phabricator.wikimedia.org/T147507 <-- thcipriani [20:07:32] that one is a dup [20:07:36] * bd808 looks for it [20:08:53] matanya: merged into T138310 [20:09:15] * bd808 forgets that stashbot is not loved here [20:09:16] https://phabricator.wikimedia.org/T138310 [20:09:40] thanks bd808 [20:22:50] 03Scap3: Unhandled(?) exceptions in scap3 - https://phabricator.wikimedia.org/T147334#2689369 (10Gehel) yep, maps-test2* are still not functional. This should be resolved once T147194 is done. 
[20:57:48] PROBLEM - Puppet run on integration-slave-trusty-1004 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [21:02:39] 06Release-Engineering-Team, 10Wikimedia-Developer-Summit, 06Developer-Relations (Oct-Dec-2016): Developer Summit 2017: Work with TPG and RelEng on solution to event documenting - https://phabricator.wikimedia.org/T132400#2695071 (10ksmith) @Qgil and @Rfarrand: This isn't exactly in my wheelhouse, so I'm not... [21:25:53] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.28.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T147517#2695117 (10greg) [21:26:09] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.28.0-wmf.22 deployment blockers - https://phabricator.wikimedia.org/T146998#2677385 (10greg) [21:32:50] RECOVERY - Puppet run on integration-slave-trusty-1004 is OK: OK: Less than 1.00% above the threshold [0.0] [21:40:50] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.28.0-wmf.21 deployment blockers - https://phabricator.wikimedia.org/T145220#2695194 (10thcipriani) [21:51:13] 10Beta-Cluster-Infrastructure, 06Labs, 13Patch-For-Review: Replace all class imports on Labs with role imports - https://phabricator.wikimedia.org/T147233#2695228 (10Andrew) [21:52:26] 10Beta-Cluster-Infrastructure, 06Operations, 07Puppet: Make deployment-prep puppetmaster more similar to Production puppetmaster - https://phabricator.wikimedia.org/T146627#2695230 (10AlexMonk-WMF) Will setting up puppetdb fix T72792? Was anything brought up during the ops offsite that should be added to th... 
[22:10:49] 10Beta-Cluster-Infrastructure, 06Collaboration-Team-Triage, 10Flow: Use custom $wgFlowCluster and $wgFlowDefaultWikiDb on Beta Cluster - https://phabricator.wikimedia.org/T147523#2695266 (10Mattflaschen-WMF) [22:13:32] 10Beta-Cluster-Infrastructure, 06Collaboration-Team-Triage, 10Flow: Use custom $wgFlowCluster and $wgFlowDefaultWikiDb on Beta Cluster - https://phabricator.wikimedia.org/T147523#2695266 (10Mattflaschen-WMF) [22:13:44] 10Beta-Cluster-Infrastructure, 06Collaboration-Team-Triage, 10Flow: Use custom $wgFlowCluster and $wgFlowDefaultWikiDb on Beta Cluster - https://phabricator.wikimedia.org/T147523#2695266 (10Mattflaschen-WMF) [22:16:23] 10Beta-Cluster-Infrastructure, 06Collaboration-Team-Triage, 10Flow: Use dedicated $wgFlowCluster and $wgFlowDefaultWikiDb on Beta Cluster - https://phabricator.wikimedia.org/T147523#2695312 (10Mattflaschen-WMF) [22:21:32] 10Continuous-Integration-Infrastructure, 07Tracking: Have unit tests of all wmf deployed extensions pass when installed together, in both PHP-Zend and HHVM (tracking) - https://phabricator.wikimedia.org/T69216#2695343 (10Krinkle) [22:43:37] PROBLEM - Keyholder status on deployment-tin is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [22:46:50] PROBLEM - Keyholder status on deployment-mira is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [22:54:01] 06Release-Engineering-Team, 10ArchCom-RfC, 06Developer-Relations, 06WMF-Legal, and 2 others: Create formal process for CREDITS files - https://phabricator.wikimedia.org/T139300#2695443 (10RobLa-WMF) @Jdlrobson: ArchCom discussed this as a possible topic for next week's IRC meeting (E316 2016-10-12, Wednes... 
[22:58:12] 10Deployment-Systems, 06Operations, 06Services, 10service-runner, 15User-mobrovac: Automate compiling service dependencies using production Jessie libraries - https://phabricator.wikimedia.org/T94611#2695447 (10mobrovac) 05Open>03Resolved a:03mobrovac The established practice is > - create a WMF J... [23:03:55] PROBLEM - Puppet run on deployment-db03 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [23:06:42] gerrit is really unhappy today :( [23:22:50] yurik: first I heard, can you say more? otherwise I'll just have to ignore the comment [23:34:13] 06Release-Engineering-Team, 10Wikimedia-Developer-Summit, 06Developer-Relations (Oct-Dec-2016): Developer Summit 2017: Work with TPG and RelEng on solution to event documenting - https://phabricator.wikimedia.org/T132400#2695520 (10RobLa-WMF) [23:43:55] RECOVERY - Puppet run on deployment-db03 is OK: OK: Less than 1.00% above the threshold [0.0] [23:52:14] seems fine to me yurik [23:52:46] greg-g, Krenair, sorry, just saw your replies. For some reason it took ~4-5 min for "git review" to go throuw [23:52:48] though [23:53:17] might have been just my connection, but IRC and other sites seem to be okayish [23:54:05] actually just checked - git pull takes considerable time, even though gerrit.wikimedia.org opens pretty fast