[00:03:46] 10Release-Engineering-Team (Watching / External), 10Developer-Relations, 10MediaWiki-API, 10WMF-Product-Development-Process, 10User-notice: Standardise procedures for deprecating public-facing code - https://phabricator.wikimedia.org/T114384#3714076 (10Krinkle) [01:32:59] (03PS1) 10Legoktm: Add script to manipulate clover.xml files [integration/jenkins] - 10https://gerrit.wikimedia.org/r/386765 [01:33:45] (03CR) 10Legoktm: "I ended up writing Change-Id: I1275e1a50c5992581b914a058159a5c84386d893 to shrink the size of clover.xml to something much more manageable" [integration/config] - 10https://gerrit.wikimedia.org/r/386580 (owner: 10Legoktm) [01:44:35] (03PS1) 10Legoktm: Add support for OpenCV 3.0+ with autotools [integration/uprightdiff] - 10https://gerrit.wikimedia.org/r/386766 [01:44:43] (03CR) 10jerkins-bot: [V: 04-1] Add support for OpenCV 3.0+ with autotools [integration/uprightdiff] - 10https://gerrit.wikimedia.org/r/386766 (owner: 10Legoktm) [01:46:54] (03PS2) 10Legoktm: Add support for OpenCV 3.0+ with autotools [integration/uprightdiff] - 10https://gerrit.wikimedia.org/r/386766 [01:47:02] (03CR) 10jerkins-bot: [V: 04-1] Add support for OpenCV 3.0+ with autotools [integration/uprightdiff] - 10https://gerrit.wikimedia.org/r/386766 (owner: 10Legoktm) [01:59:26] PROBLEM - Puppet errors on castor02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [02:39:29] RECOVERY - Puppet errors on castor02 is OK: OK: Less than 1.00% above the threshold [0.0] [03:54:17] PROBLEM - Puppet errors on deployment-ores-redis-01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [03:54:51] PROBLEM - Puppet errors on deployment-urldownloader is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [04:00:15] Project selenium-MultimediaViewer » firefox,mediawiki,Linux,BrowserTests build #559: 04FAILURE in 4 min 14 sec: 
https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=mediawiki,PLATFORM=Linux,label=BrowserTests/559/ [04:01:57] 10Release-Engineering-Team, 10Operations: upload problem for parsoid release - https://phabricator.wikimedia.org/T179134#3714186 (10Dzahn) [04:09:21] Project selenium-MultimediaViewer » firefox,beta,Linux,BrowserTests build #559: 04FAILURE in 13 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/559/ [04:29:48] RECOVERY - Puppet errors on deployment-urldownloader is OK: OK: Less than 1.00% above the threshold [0.0] [04:34:15] RECOVERY - Puppet errors on deployment-ores-redis-01 is OK: OK: Less than 1.00% above the threshold [0.0] [05:25:14] PROBLEM - Puppet errors on deployment-ores-redis-01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [05:33:37] 10Release-Engineering-Team, 10Operations, 10Patch-For-Review: upload problem for parsoid release - https://phabricator.wikimedia.org/T179134#3714378 (10Dzahn) http://puppet-compiler.wmflabs.org/8493/naos.codfw.wmnet/ http://puppet-compiler.wmflabs.org/8493/tin.eqiad.wmnet/ ``` - identityfile ~/.ssh/id... [05:53:52] 10Release-Engineering-Team, 10Operations, 10Patch-For-Review: upload problem for parsoid release - https://phabricator.wikimedia.org/T179134#3714448 (10Dzahn) ``` root@tin:/tmp# deb-upload /tmp/parsoid_0.8.0all_amd64.changes Trying to upload package to releases1001.eqiad.wmnet Uploading to releases1001.eqiad... 
[05:54:17] 10Release-Engineering-Team, 10Operations, 10Patch-For-Review: upload problem for parsoid release - https://phabricator.wikimedia.org/T179134#3714449 (10Dzahn) 05Open>03Resolved a:03Dzahn [06:00:16] RECOVERY - Puppet errors on deployment-ores-redis-01 is OK: OK: Less than 1.00% above the threshold [0.0] [06:02:54] 10Continuous-Integration-Config, 10BlueSpice, 10Patch-For-Review: Enable unit tests on BlueSpice* repos - https://phabricator.wikimedia.org/T130811#3714470 (10Osnard) My team is working on splitting up BlueSpiceExtensions repo in seperate repos. We already moced BlueSpiceExtensions/Readers to BlueSpiceReader... [06:15:08] PROBLEM - Puppet errors on deployment-mathoid is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [06:50:06] RECOVERY - Puppet errors on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0] [06:57:19] PROBLEM - Puppet errors on deployment-kafka01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [07:51:16] PROBLEM - Puppet errors on deployment-ores-redis-01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [08:10:35] jan_drewniak: good morning :] I finally feel better today! 
[08:31:14] RECOVERY - Puppet errors on deployment-ores-redis-01 is OK: OK: Less than 1.00% above the threshold [0.0] [08:36:20] 10Browser-Tests-Infrastructure, 10Release-Engineering-Team (Kanban), 10User-zeljkofilipin: Add createAccount method to nodemw - https://phabricator.wikimedia.org/T173505#3714809 (10zeljkofilipin) Work in progress: https://github.com/zeljkofilipin/nodemw/commits/createAccount [09:02:03] PROBLEM - Puppet errors on deployment-trending01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [09:04:08] (03PS1) 10Hashar: operations/debs/gerrit is no more [integration/config] - 10https://gerrit.wikimedia.org/r/386789 [09:06:40] (03CR) 10Hashar: [C: 032] operations/debs/gerrit is no more [integration/config] - 10https://gerrit.wikimedia.org/r/386789 (owner: 10Hashar) [09:07:26] (03CR) 10Hashar: [C: 04-2] operations/debs/gerrit is no more [integration/config] - 10https://gerrit.wikimedia.org/r/386789 (owner: 10Hashar) [09:08:15] (03PS2) 10Hashar: operations/debs/gerrit is no more [integration/config] - 10https://gerrit.wikimedia.org/r/386789 [09:08:40] (03CR) 10Hashar: [C: 032] operations/debs/gerrit is no more [integration/config] - 10https://gerrit.wikimedia.org/r/386789 (owner: 10Hashar) [09:10:22] (03Merged) 10jenkins-bot: operations/debs/gerrit is no more [integration/config] - 10https://gerrit.wikimedia.org/r/386789 (owner: 10Hashar) [10:13:05] (03CR) 10Hashar: "aclocal.m4 / configure / config.status are auto generated by autoconf. Makefile.in by automake. 
So I don't think they should be committed" [integration/uprightdiff] - 10https://gerrit.wikimedia.org/r/386766 (owner: 10Legoktm) [10:21:31] 10Browser-Tests-Infrastructure, 10Release-Engineering-Team (Kanban), 10User-zeljkofilipin: Add createAccount method to nodemw - https://phabricator.wikimedia.org/T173505#3715004 (10zeljkofilipin) Sent pull request: https://github.com/macbre/nodemw/pull/135 [10:44:11] 10Release-Engineering-Team, 10ORES, 10Operations, 10Scoring-platform-team, and 2 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3715032 (10hoo) [10:44:19] 10Release-Engineering-Team (Kanban), 10Release, 10Train Deployments: 1.31.0-wmf.5 deployment blockers - https://phabricator.wikimedia.org/T174361#3559324 (10hoo) [10:44:22] 10Release-Engineering-Team, 10ORES, 10Operations, 10Scoring-platform-team, and 2 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3715052 (10hoo) [10:47:08] 10Release-Engineering-Team, 10ORES, 10Operations, 10Scoring-platform-team, and 3 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3715061 (10Ladsgroup) [10:58:05] 10Release-Engineering-Team (Kanban), 10Readers-Web-Backlog, 10RelatedArticles, 10Browser-Tests, and 4 others: Automated browser tests cannot create pages on the Beta Cluster as anonymous user in RelatedArticles tests - https://phabricator.wikimedia.org/T176315#3715068 (10zeljkofilipin) >>! In T176315#37074... 
[11:23:42] (03CR) 10Esanders: Show code coverage percent on index page (031 comment) [integration/docroot] - 10https://gerrit.wikimedia.org/r/386565 (https://phabricator.wikimedia.org/T146970) (owner: 10Legoktm) [11:33:58] 10Release-Engineering-Team, 10ORES, 10Operations, 10Scoring-platform-team, and 3 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3715115 (10Ladsgroup) @hoo and I sampled x-cache from wberequest dataset for the hour of 19 yesterday for two... [11:38:00] 10Beta-Cluster-Infrastructure: Changes to beta cluster - https://phabricator.wikimedia.org/T179157#3715118 (10zeljkofilipin) [11:39:22] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban), 10User-zeljkofilipin: Changes to beta cluster - https://phabricator.wikimedia.org/T179157#3715132 (10zeljkofilipin) p:05Triage>03Low a:03zeljkofilipin [11:51:51] 10Release-Engineering-Team, 10ORES, 10Operations, 10Scoring-platform-team, and 3 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3715170 (10hoo) This has some potentially interesting patterns: `watchlist, recentchanges, contributions, log... [11:53:31] 10MediaWiki-Codesniffer, 10Wikidata: Review rules in wikibase/wikibase-codesniffer and see which are appropriate for MW-CS - https://phabricator.wikimedia.org/T164653#3715177 (10thiemowmde) p:05Triage>03Normal [12:04:49] 10Browser-Tests-Infrastructure, 10Release-Engineering-Team (Kanban), 10Developer-Relations (Oct-Dec 2017), 10User-zeljkofilipin: Tech talk: Selenium tests in Node.js - https://phabricator.wikimedia.org/T171852#3715217 (10zeljkofilipin) Blog post: https://phabricator.wikimedia.org/phame/post/view/78/tech_ta... 
[12:06:31] 10Release-Engineering-Team, 10ORES, 10Operations, 10Scoring-platform-team, and 3 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3715032 (10Marostegui) From those two masters's (s4 and s5) graphs, we can see that whatever happened, happene... [12:08:48] 10Release-Engineering-Team, 10ORES, 10Operations, 10Scoring-platform-team, and 3 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3715242 (10thiemowmde) p:05Triage>03High The tasks description talks about ongoing investigation. Is there... [12:12:31] 10Release-Engineering-Team, 10ORES, 10Operations, 10Scoring-platform-team, and 3 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3715253 (10hoo) p:05High>03Triage >>! In T179156#3715242, @thiemowmde wrote: > The tasks description talks... [12:22:12] 10Release-Engineering-Team, 10MediaWiki-Platform-Team, 10Patch-For-Review, 10Performance-Team (Radar): Support multi-instance hosts on mediawiki-config - https://phabricator.wikimedia.org/T178553#3715279 (10Marostegui) [12:25:42] (03PS1) 10Gehel: Add publication of maven site for wikidata-query-rdf [integration/config] - 10https://gerrit.wikimedia.org/r/386815 [12:26:27] (03CR) 10Gehel: "It is probably possible to reduce duplication in job-templates.yaml, if you know how, I'll take the explanation!" [integration/config] - 10https://gerrit.wikimedia.org/r/386815 (owner: 10Gehel) [12:26:48] hashar: ^ if you have a few minutes... 
(low priority) [12:28:58] 10Continuous-Integration-Infrastructure (shipyard), 10Release-Engineering-Team (Kanban): CI docker slaves: Could not find data item profile::ci::docker::settings - https://phabricator.wikimedia.org/T179160#3715292 (10hashar) [12:31:56] 10Continuous-Integration-Infrastructure (shipyard), 10Release-Engineering-Team (Kanban): CI docker slaves: Could not find data item profile::ci::docker::settings - https://phabricator.wikimedia.org/T179160#3715306 (10hashar) Caused by puppet patch https://gerrit.wikimedia.org/r/#/c/386166/ adfd61425570121f6cec... [12:34:36] 10Release-Engineering-Team, 10ORES, 10Operations, 10Scoring-platform-team, and 3 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3715319 (10BBlack) Copying this in from etherpad (this is less awful than 6 hours of raw IRC+SAL logs, but sti... [12:34:37] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban), 10User-zeljkofilipin: Changes to beta cluster - https://phabricator.wikimedia.org/T179157#3715321 (10zeljkofilipin) [12:35:40] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban), 10User-zeljkofilipin: Did something change in beta cluster configuration around September 16 2017? - https://phabricator.wikimedia.org/T179157#3715118 (10zeljkofilipin) [12:36:39] 10Continuous-Integration-Infrastructure (shipyard), 10Release-Engineering-Team (Kanban), 10Patch-For-Review: CI docker slaves: Could not find data item profile::ci::docker::settings - https://phabricator.wikimedia.org/T179160#3715332 (10hashar) p:05Triage>03High a:03hashar [12:36:56] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban), 10User-zeljkofilipin: Did something change in beta cluster configuration around September 16 2017? 
- https://phabricator.wikimedia.org/T179157#3715334 (10zeljkofilipin) [12:37:32] 10Release-Engineering-Team, 10ORES, 10Operations, 10Scoring-platform-team, and 3 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3715335 (10BBlack) My gut instinct remains what it was at the end of the log above. I think something in the... [12:42:46] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban), 10User-zeljkofilipin: Did something change in beta cluster configuration around September 16 2017? - https://phabricator.wikimedia.org/T179157#3715341 (10zeljkofilipin) Sent e-mail to [[ https://lists.wikimedia.org/pipermail/wikitech-l/2017-... [12:47:51] 10Release-Engineering-Team, 10ORES, 10Operations, 10Scoring-platform-team, and 3 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3715381 (10BBlack) Unless anyone objects, I'd like to start with reverting our emergency `varnish max_connecti... [12:48:35] RECOVERY - Puppet errors on integration-slave-docker-1003 is OK: OK: Less than 1.00% above the threshold [0.0] [12:49:14] RECOVERY - Puppet errors on integration-slave-docker-1006 is OK: OK: Less than 1.00% above the threshold [0.0] [12:50:03] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban), 10User-zeljkofilipin: Did something change in beta cluster configuration around September 16 2017? - https://phabricator.wikimedia.org/T179157#3715118 (10Bawolff) It would be helpful here to include the names of the pages that the script tri... [12:54:44] gehel: looking [12:56:32] 10Release-Engineering-Team, 10ORES, 10Operations, 10Scoring-platform-team, and 4 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3715432 (10hoo) I think I found the root cuase now, seems it's actually related to the WikibaseQualityConstra... 
[12:56:55] gehel: one can use YAML merging feature [12:57:16] hashar: yeah, I saw you do something like that... [12:58:13] gehel: alternatively in the JJB project named "wikidata", you can use the already existing template '{name}-{project}-maven-site-publish' [12:58:22] and pass it project: query-rdf [12:58:32] but that is a bit messy [12:58:48] since there is already a jjb project named "wikidata-query-rdf" [12:58:53] yeah, I was trying to avoid that second option [13:00:11] 10Release-Engineering-Team, 10ORES, 10Operations, 10Scoring-platform-team, and 4 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3715446 (10BBlack) >>! In T179156#3715432, @hoo wrote: > I think I found the root cuase now, seems it's actu... [13:00:48] hashar: ok, I'll try to make that work and get back to you if it does not [13:00:54] no_justification upstream are looking to remove reviewdb support in the next 6 months. [13:01:01] no concrete plans from them yet [13:01:05] "That said I honestly don't think it would be that much worse to do with ReviewDb & GWT, since it's mostly just search and replace. We're looking at at least another 6 months or so of ReviewDb code existing, which would be a long time to wait." [13:01:23] 10Release-Engineering-Team, 10ORES, 10Operations, 10Scoring-platform-team, and 4 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3715447 (10hoo) >>! In T179156#3715446, @BBlack wrote: > >>>! In T179156#3715432, @hoo wrote: >> I think I fo... 
[13:01:36] (03PS2) 10Hashar: Add publication of maven site for wikidata-query-rdf [integration/config] - 10https://gerrit.wikimedia.org/r/386815 (owner: 10Gehel) [13:01:40] gehel: ^^ [13:01:59] I have renamed the existing one to '{name}-maven-site-publish' [13:02:07] (i.e. I have removed the {project} variable) [13:02:10] so that is the canonical template [13:02:23] gave it a YAML alias name with &job_template_maven_site_publish [13:02:49] then it creates a template that has the same content ( !!merge : *foo ) [13:02:54] hashar: the doctest property was different in the 2 templates, so that has to be reintroduced in some way [13:03:04] and override the doc-publish part that has {project} [13:03:16] 10Continuous-Integration-Infrastructure (shipyard), 10Release-Engineering-Team (Kanban), 10Patch-For-Review: CI docker slaves: Could not find data item profile::ci::docker::settings - https://phabricator.wikimedia.org/T179160#3715471 (10hashar) 05Open>03Resolved [13:03:35] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban), 10User-zeljkofilipin: Did something change in beta cluster configuration around September 16 2017? - https://phabricator.wikimedia.org/T179157#3715472 (10Bawolff) Perhaps related: * 16:47, 21 September 2017 Greg (WMF) (talk | contribs) unbl... [13:03:39] gehel: yeah that is lines 105-108 of https://gerrit.wikimedia.org/r/#/c/386815/2/jjb/job-templates.yaml [13:03:46] Oh right, you already took care of that! [13:03:53] think of !!merge as inheriting from another class [13:03:54] great! What would I do without you! [13:03:56] 10Gerrit: Migrate to NoteDb - https://phabricator.wikimedia.org/T174034#3715473 (10Paladox) p:05Lowest>03Low Upstream are looking to remove ReviewDB in the next 6 months. Won't affect any stable release. [13:04:03] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban), 10User-zeljkofilipin: Did something change in beta cluster configuration around September 16 2017? 
- https://phabricator.wikimedia.org/T179157#3715476 (10Bawolff) As an aside, the fact it took over a month to notice suggests something about... [13:04:09] * gehel needs to send some chocolate to hashar [13:04:13] and this way the difference between the templates is immediately obvious [13:04:21] triggers: - zuul is re-added because of a bug in jjb [13:04:31] (03CR) 10Gehel: [C: 031] "LGTM" [integration/config] - 10https://gerrit.wikimedia.org/r/386815 (owner: 10Gehel) [13:05:04] (03CR) 10Hashar: [C: 032] "INFO:jenkins_jobs.builder:Creating jenkins job wikidata-query-rdf-maven-site-publish" [integration/config] - 10https://gerrit.wikimedia.org/r/386815 (owner: 10Gehel) [13:05:05] RECOVERY - Puppet errors on integration-slave-docker-1002 is OK: OK: Less than 1.00% above the threshold [0.0] [13:05:06] well done! [13:06:13] hashar: I'll send you the email where I explain why I did all that. You might be mildly interested... [13:06:14] (03Merged) 10jenkins-bot: Add publication of maven site for wikidata-query-rdf [integration/config] - 10https://gerrit.wikimedia.org/r/386815 (owner: 10Gehel) [13:06:53] !log zuul enqueue --trigger gerrit --pipeline postmerge --project wikidata/query/rdf --change 383791,15 [13:06:56] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [13:08:11] RECOVERY - Puppet errors on integration-slave-docker-1005 is OK: OK: Less than 1.00% above the threshold [0.0] [13:08:26] gehel: and it is running at https://integration.wikimedia.org/ci/job/wikidata-query-rdf-maven-site-publish/1/console [13:08:36] based on the last merged change on that repo https://gerrit.wikimedia.org/r/#/c/383791/ [13:09:03] and fails :( [13:10:32] hashar: expected... 
there is one more change to merge on that repo to make it work [13:11:02] https://gerrit.wikimedia.org/r/#/c/386803/ (but I'm waiting for Stas to merge it) [13:11:30] !log provision deployment-redis{03,04} with stretch - T148637 [13:11:34] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [13:11:34] T148637: Port redis statistics to Prometheus - https://phabricator.wikimedia.org/T148637 [13:16:23] gehel: also madhuvishy made the release to mavencentral or archive automatic (when a tag is pushed) [13:16:32] but I haven't quite looked at the setup [13:17:01] gehel: also on https://doc.wikimedia.org/search-highlighter/experimental/ [13:17:04] so we store credentials for that in jenkins? [13:17:09] the links fail and link to /index.html :( [13:17:16] yeah there is some credential in jenkins [13:17:17] RECOVERY - Puppet errors on integration-slave-docker-1001 is OK: OK: Less than 1.00% above the threshold [0.0] [13:17:44] yep, there are some broken links between projects, I might even fix them at some point, but they are not really interesting... [13:18:04] https://doc.wikimedia.org/search-highlighter/experimental/experimental-highlighter-core/dependency-updates-report.html [13:18:06] oh that is lovely [13:19:03] hashar: my next step is to see if we can publish to sonarcloud on postmerge, to get reports over time (https://sonarcloud.io/dashboard?id=org.jmxtrans%3Ajmxtrans-parent as an example) [13:26:16] If anyone is into phan stuff, I'd love some feedback on whether https://gerrit.wikimedia.org/r/386830 is useful or not [13:26:59] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban), 10User-zeljkofilipin: Did something change in beta cluster configuration around September 16 2017? - https://phabricator.wikimedia.org/T179157#3715759 (10zeljkofilipin) >>! In T179157#3715476, @Bawolff wrote: > As an aside, the fact it took... 
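[Editor's note] The template refactoring hashar describes between 13:01 and 13:04 (anchor the canonical template, merge it into a second one, override only what differs) can be sketched roughly as follows. This is a minimal illustration, not the content of the actual jjb/job-templates.yaml: only the two template names, the anchor name, the `!!merge` key, the `triggers: - zuul` entry and the `doc-publish` name come from the chat. The node label, the build step and the `docsrc`/`docdest` parameter names and values are assumptions made up for the example.

```yaml
# Hypothetical sketch: every value marked "placeholder" is an assumption,
# not copied from the real integration/config repository.

# Canonical template. The "&..." marker is a YAML anchor that lets the
# whole mapping be referenced again further down.
- job-template: &job_template_maven_site_publish
    name: '{name}-maven-site-publish'
    node: example-node-label            # placeholder
    triggers:
      - zuul
    builders:
      - shell: 'mvn site'               # placeholder build step
    publishers:
      - doc-publish:                    # parameter names are placeholders
          docsrc: 'target/site'
          docdest: '{name}'

# Derived template: take every key from the anchored mapping, then
# override only the keys that differ.
- job-template:
    !!merge : *job_template_maven_site_publish
    name: '{name}-{project}-maven-site-publish'
    triggers:                           # restated explicitly, as in the chat
      - zuul
    publishers:
      - doc-publish:
          docsrc: 'target/site'
          docdest: '{name}/{project}'   # the part that depends on {project}
```

`!!merge : *anchor` is the explicit spelling of the more common `<<: *anchor` merge key. The merge is shallow: a key restated in the derived mapping replaces the inherited value wholesale, so a list such as `publishers` has to be written out in full rather than patched entry by entry. That also keeps the difference between the two templates easy to see, which is the benefit mentioned at 13:04:13.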
[13:27:20] wikibugs is a little slow today ;) [13:31:53] oh, that's not my original comment, never mind [13:31:55] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban), 10User-zeljkofilipin: Did something change in beta cluster configuration around September 16 2017? - https://phabricator.wikimedia.org/T179157#3715832 (10zeljkofilipin) >>! In T179157#3715392, @Bawolff wrote: > It would be helpful here to in... [13:32:13] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban), 10User-zeljkofilipin: Did something change in beta cluster configuration around September 16 2017? - https://phabricator.wikimedia.org/T179157#3715833 (10zeljkofilipin) [13:35:03] zeljkof: regardless, I would still question the usefulness of tests where it takes over a month for people to figure out why it's broken. If something really was broken, it'd be way too late at this point to start figuring out what; the damage to user experience would already be done [13:35:35] bawolff: the tests were not broken for a month, I think (would have to check) [13:35:47] and they were broken only in one environment (beta cluster) [13:35:59] they were working fine on per-commit job [13:36:27] and yes, ideally a broken test would be an unbreak-now type event [13:36:41] but then, different teams have different priorities [13:38:29] in this case, the test needs to run in at least 3 different environments (mediawiki-vagrant, jenkins slave, beta cluster) [13:38:59] and a team might decide that a broken test in one environment is not a big deal [13:39:34] anyway, nobody is forcing teams to write (selenium) tests, it's up to them to decide if they have value for the team or not [13:39:48] Anyways, should that bug be closed given we know what changed on Sept 17? [13:40:03] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban), 10User-zeljkofilipin: Did something change in beta cluster configuration around September 16 2017? 
- https://phabricator.wikimedia.org/T179157#3715926 (10zeljkofilipin) >>! In T179157#3715472, @Bawolff wrote: > * 03:32, 17 September 2017 Sau... [13:40:24] bawolff: probably, I have just replied [13:40:45] I will leave it open over the weekend, in case anybody else has a comment [13:40:52] and I will close it on Monday [13:41:07] thanks a lot, I think deleting the page on beta cluster caused the problem [13:43:12] Looks like that guy caused a bunch of problems. Blocking all the testing users and whatnot [13:45:46] probably trying to help [13:45:57] but did not work as expected :) [13:52:59] (03PS1) 10Hashar: LinkedWiki requires npm install -> non voting [integration/config] - 10https://gerrit.wikimedia.org/r/386839 [13:53:10] (03PS2) 10Hashar: LinkedWiki requires npm install -> non voting [integration/config] - 10https://gerrit.wikimedia.org/r/386839 [13:54:27] (03PS1) 10Hashar: Add extensions MetaMaster PhabTaskGraph [integration/config] - 10https://gerrit.wikimedia.org/r/386841 [13:54:29] (03CR) 10Umherirrender: [C: 031] LinkedWiki requires npm install -> non voting [integration/config] - 10https://gerrit.wikimedia.org/r/386839 (owner: 10Hashar) [13:55:18] (03PS1) 10Hashar: Add npm to BlueSpiceSubPageTree [integration/config] - 10https://gerrit.wikimedia.org/r/386842 [13:56:11] (03PS1) 10Hashar: CharRangeSpan pass tests -> voting [integration/config] - 10https://gerrit.wikimedia.org/r/386843 [13:57:32] (03PS1) 10Hashar: SemanticGenealogy fails tests -> non voting [integration/config] - 10https://gerrit.wikimedia.org/r/386845 [14:04:21] (03PS1) 10Hashar: Update CustomPage non voting reasons [integration/config] - 10https://gerrit.wikimedia.org/r/386850 (https://phabricator.wikimedia.org/T154803) [14:06:16] (03PS17) 10Hashar: Update tests for BlueSpice [integration/config] - 10https://gerrit.wikimedia.org/r/380790 (owner: 10Umherirrender) [14:06:40] (03CR) 10Hashar: "I have split this change in several smaller ones. 
That is easier to handle this way :]" [integration/config] - 10https://gerrit.wikimedia.org/r/380790 (owner: 10Umherirrender) [14:07:09] (03CR) 10Hashar: [C: 032] "Maybe we will want to consider adding support for npm install one day." [integration/config] - 10https://gerrit.wikimedia.org/r/386839 (owner: 10Hashar) [14:07:18] (03CR) 10Hashar: [C: 032] Add extensions MetaMaster PhabTaskGraph [integration/config] - 10https://gerrit.wikimedia.org/r/386841 (owner: 10Hashar) [14:07:26] (03CR) 10Hashar: [C: 032] Add npm to BlueSpiceSubPageTree [integration/config] - 10https://gerrit.wikimedia.org/r/386842 (owner: 10Hashar) [14:07:47] (03CR) 10Umherirrender: "There are new bluespice extensions in the meantime" [integration/config] - 10https://gerrit.wikimedia.org/r/380790 (owner: 10Umherirrender) [14:08:08] (03PS2) 10Hashar: CharRangeSpan pass tests -> voting [integration/config] - 10https://gerrit.wikimedia.org/r/386843 [14:08:12] (03Merged) 10jenkins-bot: LinkedWiki requires npm install -> non voting [integration/config] - 10https://gerrit.wikimedia.org/r/386839 (owner: 10Hashar) [14:08:25] (03CR) 10Hashar: [C: 032] SemanticGenealogy fails tests -> non voting [integration/config] - 10https://gerrit.wikimedia.org/r/386845 (owner: 10Hashar) [14:08:36] (03CR) 10Hashar: [C: 031] Update CustomPage non voting reasons [integration/config] - 10https://gerrit.wikimedia.org/r/386850 (https://phabricator.wikimedia.org/T154803) (owner: 10Hashar) [14:09:08] (03Merged) 10jenkins-bot: Add extensions MetaMaster PhabTaskGraph [integration/config] - 10https://gerrit.wikimedia.org/r/386841 (owner: 10Hashar) [14:09:10] (03Merged) 10jenkins-bot: Add npm to BlueSpiceSubPageTree [integration/config] - 10https://gerrit.wikimedia.org/r/386842 (owner: 10Hashar) [14:11:22] (03CR) 10Hashar: [C: 032] CharRangeSpan pass tests -> voting [integration/config] - 10https://gerrit.wikimedia.org/r/386843 (owner: 10Hashar) [14:14:12] (03Merged) 10jenkins-bot: CharRangeSpan pass tests -> voting 
[integration/config] - 10https://gerrit.wikimedia.org/r/386843 (owner: 10Hashar) [14:14:38] (03CR) 10Hashar: [C: 032] SemanticGenealogy fails tests -> non voting [integration/config] - 10https://gerrit.wikimedia.org/r/386845 (owner: 10Hashar) [14:15:17] (03PS2) 10Hashar: SemanticGenealogy fails tests -> non voting [integration/config] - 10https://gerrit.wikimedia.org/r/386845 [14:15:23] (03CR) 10Hashar: [C: 032] SemanticGenealogy fails tests -> non voting [integration/config] - 10https://gerrit.wikimedia.org/r/386845 (owner: 10Hashar) [14:17:04] PROBLEM - Host deployment-redis03 is DOWN: CRITICAL - Host Unreachable (10.68.21.36) [14:17:04] PROBLEM - Host deployment-redis04 is DOWN: CRITICAL - Host Unreachable (10.68.16.190) [14:17:19] (03Merged) 10jenkins-bot: SemanticGenealogy fails tests -> non voting [integration/config] - 10https://gerrit.wikimedia.org/r/386845 (owner: 10Hashar) [14:18:04] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban), 10User-zeljkofilipin: Did something change in beta cluster configuration around September 16 2017? - https://phabricator.wikimedia.org/T179157#3715118 (10Jdlrobson) @Bawolff we noticed the failures within a day. Whats taken long is fixing th... [14:18:06] 10Release-Engineering-Team (Kanban): Video recording for Selenium tests in Node.js - https://phabricator.wikimedia.org/T179188#3716038 (10zeljkofilipin) [14:18:54] 10Release-Engineering-Team (Kanban), 10User-zeljkofilipin: Video recording for Selenium tests in Node.js - https://phabricator.wikimedia.org/T179188#3716038 (10zeljkofilipin) p:05Triage>03High [14:20:06] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban), 10User-zeljkofilipin: Did something change in beta cluster configuration around September 16 2017? - https://phabricator.wikimedia.org/T179157#3716064 (10Jdlrobson) If this is the case : https://phabricator.wikimedia.org/T179157#3715926 We... 
[14:22:25] 10Release-Engineering-Team (Kanban), 10User-zeljkofilipin: Video recording for Selenium tests in Node.js - https://phabricator.wikimedia.org/T179188#3716071 (10zeljkofilipin) [14:24:45] 10Release-Engineering-Team, 10Operations, 10Patch-For-Review: upload problem for parsoid release - https://phabricator.wikimedia.org/T179134#3716072 (10ssastry) Thanks! :) [14:27:15] (03CR) 10Umherirrender: [C: 031] Update CustomPage non voting reasons [integration/config] - 10https://gerrit.wikimedia.org/r/386850 (https://phabricator.wikimedia.org/T154803) (owner: 10Hashar) [14:27:16] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban), 10User-zeljkofilipin: Did something change in beta cluster configuration around September 16 2017? - https://phabricator.wikimedia.org/T179157#3716074 (10zeljkofilipin) enwiki at beta cluster allows anonymous edits, but not anonymous page cr... [14:27:47] (03CR) 10Hashar: [C: 032] Update CustomPage non voting reasons [integration/config] - 10https://gerrit.wikimedia.org/r/386850 (https://phabricator.wikimedia.org/T154803) (owner: 10Hashar) [14:28:57] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban), 10User-zeljkofilipin: Did something change in beta cluster configuration around September 16 2017? - https://phabricator.wikimedia.org/T179157#3716077 (10Jdlrobson) Ah. Thanks for this distinction. Then yes.. 
mystery solved :) [14:31:04] (03PS2) 10Hashar: Update CustomPage non voting reasons [integration/config] - 10https://gerrit.wikimedia.org/r/386850 (https://phabricator.wikimedia.org/T154803) [14:31:11] (03CR) 10Hashar: [C: 032] Update CustomPage non voting reasons [integration/config] - 10https://gerrit.wikimedia.org/r/386850 (https://phabricator.wikimedia.org/T154803) (owner: 10Hashar) [14:33:43] (03Merged) 10jenkins-bot: Update CustomPage non voting reasons [integration/config] - 10https://gerrit.wikimedia.org/r/386850 (https://phabricator.wikimedia.org/T154803) (owner: 10Hashar) [14:34:39] PROBLEM - Puppet errors on deployment-redis05 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [14:36:01] PROBLEM - Puppet errors on deployment-redis06 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [14:41:34] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban), 10User-zeljkofilipin: Did something change in beta cluster configuration around September 16 2017? - https://phabricator.wikimedia.org/T179157#3716097 (10zeljkofilipin) Probably. I will leave the task open until Monday, if case somebody else... [14:43:37] 10Continuous-Integration-Config, 10TestMe: fix or mark as inactive extensions currently failing CI - https://phabricator.wikimedia.org/T134090#3716102 (10Umherirrender) [14:46:21] hashar, thcipriani|afk, greg-g, other beta-interested people: I'm merging some changes to the code that automatically rebases puppet for local puppetmasters. It fixes a known bug and is well-tested, but if you encounter unexpected divergence from what you'd expect in beta's puppet repo it's the first thing to look at. 
[14:46:24] context in T177944 [14:46:25] T177944: k8s nodes sometimes getting bad token value from hiera - https://phabricator.wikimedia.org/T177944 [14:51:40] 10Browser-Tests-Infrastructure, 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Wikidata, and 3 others: Run Wikibase daily browser tests on Jenkins - https://phabricator.wikimedia.org/T167432#3716110 (10zeljkofilipin) >>! In T167432#3709235, @WMDE-leszek wrote: > @zeljkofili... [14:56:08] 10Continuous-Integration-Config, 10Composer: Run `composer install` on Jenkins for AbuseFilter & AntiSpoof - https://phabricator.wikimedia.org/T178452#3716120 (10Umherirrender) 05Open>03declined Was addd to mediawiki/vendor, because there are wmf deployed extensions For reference zuul/layout.yaml in integ... [15:03:33] 10Release-Engineering-Team, 10ORES, 10Operations, 10Scoring-platform-team, and 4 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3716144 (10Lucas_Werkmeister_WMDE) > (Permalink: https://grafana.wikimedia.org/dashboard/db/wikidata-quality?p... [15:06:28] 10Release-Engineering-Team (Kanban), 10User-zeljkofilipin: Run Cucumber Selenium tests in Node.js - https://phabricator.wikimedia.org/T179190#3716147 (10zeljkofilipin) [15:07:03] 10Release-Engineering-Team (Kanban), 10User-zeljkofilipin: Run Cucumber+Selenium+Node.js in CI - https://phabricator.wikimedia.org/T179190#3716162 (10zeljkofilipin) [15:07:15] 10Release-Engineering-Team (Kanban), 10User-zeljkofilipin: Run Cucumber+Selenium+Node.js in CI - https://phabricator.wikimedia.org/T179190#3716147 (10zeljkofilipin) p:05Triage>03High [15:12:27] 10Browser-Tests-Infrastructure, 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 6 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3716188 (10zeljkofilipin) [15:15:35] andrewbogott: I just switched to this window to give folks the warning you gave 30 minutes ago. 
:) great minds think alike I guess [15:15:48] :) [15:25:34] 10Browser-Tests-Infrastructure, 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 6 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3716237 (10zeljkofilipin) [15:26:23] 10Browser-Tests-Infrastructure, 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 6 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3279727 (10zeljkofilipin) [15:29:34] 10Browser-Tests-Infrastructure, 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 6 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3716258 (10zeljkofilipin) [15:30:46] 10Browser-Tests-Infrastructure, 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 6 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3294583 (10zeljkofilipin) [15:34:04] PROBLEM - Free space - all mounts on deployment-kafka01 is CRITICAL: CRITICAL: deployment-prep.deployment-kafka01.diskspace.root.byte_percentfree (<100.00%) [15:36:52] 10Browser-Tests-Infrastructure, 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 6 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3716282 (10zeljkofilipin) [15:39:38] RECOVERY - Puppet errors on deployment-redis05 is OK: OK: Less than 1.00% above the threshold [0.0] [15:40:50] well here's some new gerrit docs for new users https://gerrit-review.googlesource.com/c/homepage/+/129931 [15:40:50] heh [15:41:59] (03PS3) 10Hashar: Add support for OpenCV 3.0+ with autotools [integration/uprightdiff] - 10https://gerrit.wikimedia.org/r/386766 (owner: 10Legoktm) [15:43:09] 10Browser-Tests-Infrastructure, 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 6 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3716294 (10zeljkofilipin) [15:44:20] 
(03CR) 10Hashar: "I took the liberty to update Kunal patch. I merely removed all the artifacts resulting from aclocal/autoconf/automake in favor of just run" [integration/uprightdiff] - 10https://gerrit.wikimedia.org/r/386766 (owner: 10Legoktm) [15:44:30] 10Browser-Tests-Infrastructure, 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 6 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3294699 (10zeljkofilipin) [15:44:32] (03CR) 10jerkins-bot: [V: 04-1] Add support for OpenCV 3.0+ with autotools [integration/uprightdiff] - 10https://gerrit.wikimedia.org/r/386766 (owner: 10Legoktm) [15:45:20] legoktm: I have slightly cleaned up your uprightdiff patch ( https://gerrit.wikimedia.org/r/#/c/386766/3 ) but I am not familiar with GNU auto* tool chain [15:45:21] :( [15:45:44] https://en.wikipedia.org/wiki/GNU_Build_System is scary :] [15:46:01] RECOVERY - Puppet errors on deployment-redis06 is OK: OK: Less than 1.00% above the threshold [0.0] [15:49:37] (03PS18) 10Umherirrender: Update tests for BlueSpice [integration/config] - 10https://gerrit.wikimedia.org/r/380790 [15:49:52] 10Browser-Tests-Infrastructure, 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 6 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3716317 (10zeljkofilipin) [15:49:57] (03CR) 10Umherirrender: "Patch Set 18: Added the mention repos" [integration/config] - 10https://gerrit.wikimedia.org/r/380790 (owner: 10Umherirrender) [15:53:10] 10Browser-Tests-Infrastructure, 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 6 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3716333 (10zeljkofilipin) [15:54:27] 10Browser-Tests-Infrastructure, 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 6 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3294735 (10zeljkofilipin) [15:55:47] 
legoktm: alternative is to switch to CMake :D [15:56:35] 10Browser-Tests-Infrastructure, 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 6 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3716343 (10zeljkofilipin) [15:58:56] 10Browser-Tests-Infrastructure, 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 6 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3716351 (10zeljkofilipin) [16:00:54] 10Browser-Tests-Infrastructure, 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 6 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3294757 (10zeljkofilipin) [16:01:16] 10Browser-Tests-Infrastructure, 10Release-Engineering-Team (Kanban), 10releng-201718-q1, 10MediaWiki-General-or-Unknown, and 7 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3716356 (10zeljkofilipin) a:03zeljkofilipin [16:02:04] Project selenium-CentralNotice » chrome,beta,Windows 7,BrowserTests build #561: 09SUCCESS in 1 min 3 sec: https://integration.wikimedia.org/ci/job/selenium-CentralNotice/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Windows%207,label=BrowserTests/561/ [16:02:54] Project selenium-CentralNotice » firefox,beta,Windows 7,BrowserTests build #561: 09SUCCESS in 1 min 53 sec: https://integration.wikimedia.org/ci/job/selenium-CentralNotice/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Windows%207,label=BrowserTests/561/ [16:03:05] 10Browser-Tests-Infrastructure, 10Release-Engineering-Team (Kanban), 10releng-201718-q1, 10MediaWiki-General-or-Unknown, and 7 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3716358 (10zeljkofilipin) [16:04:26] 10Release-Engineering-Team (Kanban), 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 3 others: Port Selenium tests from Ruby to Node.js - 
https://phabricator.wikimedia.org/T139740#3294807 (10zeljkofilipin) [16:05:29] 10Browser-Tests-Infrastructure, 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Wikidata, and 3 others: Run Wikibase daily browser tests on Jenkins - https://phabricator.wikimedia.org/T167432#3716363 (10zeljkofilipin) ``` 01:13:27.065 212 scenarios (84 failed, 128 passed) 01... [16:57:47] 10Release-Engineering-Team, 10ORES, 10Patch-For-Review, 10Scoring-platform-team (Current): Wheels built on ores-misc-01 are incompatible with ores* and scb* - https://phabricator.wikimedia.org/T179095#3716445 (10awight) [17:14:59] 10Release-Engineering-Team, 10ORES, 10Patch-For-Review, 10Scoring-platform-team (Current): Wheels built on ores-misc-01 are incompatible with ores* and scb* - https://phabricator.wikimedia.org/T179095#3716483 (10awight) Comparing environments: | Host | OS | Python | pip | | scb* | Jessie 8.6 | 3.4.2 | 1.5... [17:25:40] (03CR) 10Legoktm: "Thanks, I wasn't sure whether all of those should be committed. Doing a bit more research it seems like people say you shouldn't commit th" [integration/uprightdiff] - 10https://gerrit.wikimedia.org/r/386766 (owner: 10Legoktm) [18:05:36] twentyafterfour: K, I’m able to reproduce the gitmodules rewriting problem. [18:06:21] It’s actually stranger than I expected… aaah needs a submodule sync perhaps [18:07:02] yup. [18:07:57] wat. I’ve synced and it’s still messing with me, trying to load from an SSH URL that isn’t even in .gitmodules. I’m not sure why. [18:07:58] awight: I added code to do the submodule sync automatically [18:08:16] where should I look to see what you are seeing? [18:08:20] If you have time, feel free to take a look [18:08:25] ty [18:08:25] deployment-sca03.deployment-prep.eqiad.wmflabs [18:08:28] /srv/deployment/ores/deploy [18:08:45] or further upstream might be of interest, [18:09:00] deployment-tin.eqiad.wmflabs /srv/deployment/ores/deploy [18:09:05] try to scap that out if you would... 
[18:10:22] If you don’t mind be destroying the evidence, maybe I’ll just rm -rf /srv/deployment/ores/deploy-cache/revs/185170ffd169f73fd6383669f92e399f417e5170 [18:10:22] hmm the repos on disk look like they have the right remotes [18:11:45] I blew away the dir and re-cloned. Same problem [18:13:02] hmm so it's got the git-ssh urls in .git [18:13:07] in .git/config [18:13:22] and you want to see the deployment-tin urls for submodules, right? [18:14:18] AIUI that would be best [18:15:05] On tin, .git/config and .gitmodules both have the https urls [18:15:20] So this is caused by rewriting? [18:15:52] awight: yeah it's supposed to rewrite the urls and run git submodule sync [18:16:44] All I really care about in this case is that it’s rewriting the URLs as ssh to a .wmo machine, so it fails to pull the submodules. [18:19:21] I don't actually understand how/why it's doing that [18:20:17] seriously, me neither! Well I’d call it “normal” priority so no rush, and let me know how I can help. [18:23:57] 10Beta-Cluster-Infrastructure, 10Operations, 10Traffic, 10Patch-For-Review, and 2 others: Beta cluster is down - https://phabricator.wikimedia.org/T178841#3716644 (10greg) 05Open>03Resolved a:03greg Well, let's close this and make a follow-up task. 
[18:26:29] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban), 10Operations, 10Traffic, and 2 others: Investigate what caused the the unattended varnish upgrade in Beta Cluster - https://phabricator.wikimedia.org/T179197#3716649 (10greg) [18:26:58] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Backlog), 10Operations, 10Traffic, 10User-greg: Investigate what caused the the unattended varnish upgrade in Beta Cluster - https://phabricator.wikimedia.org/T179197#3716668 (10greg) [18:27:30] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Backlog), 10Operations, 10Traffic: Investigate what caused the the unattended varnish upgrade in Beta Cluster - https://phabricator.wikimedia.org/T179197#3716670 (10greg) a:05greg>03None [18:28:07] 10Beta-Cluster-Infrastructure, 10Operations, 10Traffic, 10Patch-For-Review, and 2 others: Beta cluster is down - https://phabricator.wikimedia.org/T178841#3704733 (10greg) a:05greg>03hashar [18:31:59] PROBLEM - Puppet errors on deployment-aqs01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [18:39:22] PROBLEM - Puppet errors on deployment-mx is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [18:43:44] 10Release-Engineering-Team (Kanban), 10Scap (Tech Debt Sprint FY201718-Q2): Scap failing to rewrite submodule urls in beta - https://phabricator.wikimedia.org/T179013#3716723 (10mmodell) [18:43:52] awight: I think I figured it out [18:43:59] https://phabricator.wikimedia.org/D854 [18:44:18] no_justification: awight, care to review ^ [18:44:29] sure! 
[18:45:16] so what was happening is it forgot to update .gitmodules in the revision dir, it was only doing it on the -cache repo [18:46:04] Random Phab annoyance: if you're not logged in you get UTC time but no timezone and its in 12h [18:46:21] 6pm at 11am is weird haha [18:46:48] heh [18:48:53] twentyafterfour: D852 should be good now [18:48:58] D852: Stop using xrange() for python 3 compat - https://phabricator.wikimedia.org/D852 [18:49:32] D854 accepted [18:49:33] D854: Sync submodules for the cache AND the rev directory - https://phabricator.wikimedia.org/D854 [18:50:31] twentyafterfour: Neat, ok lmk when I should try scapping again. [19:05:41] * paladox reeding times will be easier on sunday [19:05:53] heh my time will be in utc again [19:09:01] 10Release-Engineering-Team (Watching / External), 10Operations, 10Patch-For-Review: upload problem for parsoid release - https://phabricator.wikimedia.org/T179134#3716774 (10greg) [19:11:43] jjb question if anyone knows, i have project named search with sub-project xgboost. It creates a job {name}-{project}-maven. I want to override this specific job, but creating a template for search-xgboost-maven doesn't get used, and the templated {name}-{project}-maven is used. Is there any good way to override that? [19:12:00] RECOVERY - Puppet errors on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0] [19:13:29] it seems like perhaps the template named {name}-{project}-maven isn't a template, but a literal string match? [19:15:00] no_justification: https://phabricator.wikimedia.org/D855 [19:15:23] * twentyafterfour self-accepts [19:15:27] specifically i need to set job-template: maven: root-pom: to jvm-packages/pom.xml for this specific job [19:16:20] ebernhardson: hmm, my jjb is a lot rusty but I can take a look [19:16:43] ebernhardson so i understand you, you want to create a new template that uses the same as {name}-{project}-maven but additional things? 
[19:18:01] it sounds like he just needs to override some variables, not necessarily need a new template [19:19:15] paladox: the end result i need is that the job created by the template sets a specific variable. So for 99% of {name}-{project}-maven jobs the maven.root-pom should be 'pom.xml' (default). But for this one project it needs to be 'jvm-packages/pom.xml' [19:19:43] maybe there is some sort of templating variables that i've missed, poking at docs some more [19:20:05] https://docs.openstack.org/infra/jenkins-job-builder/definition.html#default-values-for-template-variables [19:20:25] twentyafterfour: ahh that looks plausible. lemme try [19:20:28] you should be able to define any arbitrary variables [19:20:29] hmm [19:21:09] and override them in the project: job: definition [19:23:57] awight: ok the new scap is deployed on deployment-sca03 and deployment-tin [19:24:16] twentyafterfour: awesome, smoke testing... [19:27:59] twentyafterfour: Looks great—however, the submodule URLs weren’t rewritten at all. That works for me, but probably not best in the long run, since it puts load on phabricator [19:30:27] Tbh your putting load on something regardless awight [19:31:28] (03PS1) 10EBernhardson: Enable maven builds for search/xgboost repository [integration/config] - 10https://gerrit.wikimedia.org/r/386938 [19:33:12] Zppix: sure, but phabricator is phragile and critical to everyone’s workflow. The alternative is to map to the deployment server itself, which is a scalable approach. [19:33:46] Github or gerrit links could be viable too no? [19:45:01] awight: it actually reads the submodule objects from the cache so it shouldn't actually hit phabricator with much additional load [19:45:39] twentyafterfour@deployment-sca03:/srv/deployment/ores/deploy-cache/revs/185170ffd169f73fd6383669f92e399f417e5170$ du -hs .git/modules [19:45:41] 1.9M .git/modules [19:45:45] cool! The rewriting is at your discretion, so as long as it works I’m happy. 
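Editor's sketch of the submodule URL rewriting discussed above (D854: "Sync submodules for the cache AND the rev directory"). This is not scap's actual code. The function names, the sample `.gitmodules` content, and both URL prefixes are hypothetical, chosen only to illustrate applying one rewrite to every checkout instead of just the cache.

```python
import re

# Matches a `url = <value>` line inside .gitmodules, keeping the indentation.
_URL_LINE = re.compile(r"^(\s*url\s*=\s*)(\S+)\s*$")


def rewrite_gitmodules(text, old_prefix, new_prefix):
    """Return .gitmodules text with old_prefix swapped for new_prefix in URLs.

    Hypothetical helper: only lines of the form `url = ...` are touched.
    """
    out = []
    for line in text.splitlines():
        match = _URL_LINE.match(line)
        if match and match.group(2).startswith(old_prefix):
            rest = match.group(2)[len(old_prefix):]
            line = match.group(1) + new_prefix + rest
        out.append(line)
    return "\n".join(out) + "\n"


def rewrite_all(texts_by_dir, old_prefix, new_prefix):
    """Apply the same rewrite to every directory (cache AND rev), not just one."""
    return {
        name: rewrite_gitmodules(text, old_prefix, new_prefix)
        for name, text in texts_by_dir.items()
    }


# Hypothetical sample content and hosts, for illustration only.
SAMPLE = (
    '[submodule "wheels"]\n'
    "\tpath = wheels\n"
    "\turl = https://upstream.example/ores/wheels\n"
)

if __name__ == "__main__":
    print(rewrite_all(
        {"cache": SAMPLE, "rev": SAMPLE},
        "https://upstream.example/",
        "http://deploy.example/",
    )["rev"])
```

After a rewrite like this, `git submodule sync` still has to run in each directory so that `.git/config` picks up the new URLs, which matches the behaviour described in the conversation.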
[19:46:40] well it _should_ rewrite the .gitmodules file but it isn't really fetching much from upstream because it uses git submodule update --init --reference ../../cache/ [19:48:12] I need to get more efficient at that stuff when working locally… these repos are huenormous [19:48:50] --reference is awesome as long as you _never_ remove the referenced repository [19:49:12] keep it as a read-only backing store, essentially [19:52:53] oof [19:53:03] how could that possibly ever go wrong ;-) [19:55:44] twentyafterfour: hey, while I’m dragging you through the bowels of scap, do you know if there’s a config setting to make a certain group of hosts the default target? Currently, scap without the “-l” flag will deploy to both of our clusters (which is incredibly time-consuming) [20:02:33] awight: in the scap.cfg, you can specify a filename in "dsh_targets" which contains the list of nodes [20:02:54] we have some plans to greatly improve the target selection mechanism [20:03:45] twentyafterfour: Weird! We already have dsh_targets, but it isn’t performing as advertised. [20:04:29] It points to the current production cluster, but when I deploy it pushes to both production (scap/ores) and the new cluster (scap/ores-cluster) [20:04:56] because of groups, I presume [20:06:21] https://doc.wikimedia.org/mw-tools-scap/scap3/repo_config.html#available-configuration-variables [20:06:54] "any hosts defined in both will be deployed with the first group" [20:08:37] awight: you could use environments to specify different lists of hosts... one environment per cluster [20:08:42] $(pwd)/scap/environments//scap.cfg [20:09:58] twentyafterfour: owow, thanks for the docs! 
[20:11:53] you can specify an environment: key in scap.cfg which should allow you to specify a default environment which is used if not overridden on the command line with an --environment argument [20:12:10] so that way your default hosts are in scap/environments/default/dsh_targets [20:12:28] and alternate cluster of hosts is in scap/environments/alt/dsh_targets [20:12:55] awight: ^ I think that's probably how you should set it up, rather than groups [20:13:30] groups are used mainly to ensure that certain servers get updated before others, or just to break a deploy up into chunks of N servers in a batch [20:13:45] environments are for separate clusters which are mostly independant of eachother [20:13:51] that sounds great—if we need to maintain the two for much longer I’ll do it that way. Play “A” is probably to deprecate the current prod cluster ASAP. [20:14:02] :) [20:24:49] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban), 10Operations, 10Patch-For-Review: nutcracker fails to start due to lack of /var/run/nutcracker (ex: deployment-videoscaler01 has memcached failures) - https://phabricator.wikimedia.org/T178457#3716891 (10hashar) 05Open>03Resolved a:03... [20:40:36] 10Continuous-Integration-Infrastructure (shipyard), 10Release-Engineering-Team (Kanban), 10Release Pipeline: Switch CI Docker Storage Driver to devicemapper - https://phabricator.wikimedia.org/T178663#3716910 (10hashar) p:05Triage>03Low low priority since right now `overlay2` does not seem to be an issue. 
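Editor's sketch of the environment layout described above, assuming a default environment named in `scap.cfg` and an optional `--environment` override on the command line. It is not scap's implementation. `targets_file` and its parameters are hypothetical, and the fallback to `scap/dsh_targets` when no environment is set is an assumption rather than something stated in the conversation.

```python
from pathlib import Path


def targets_file(repo_root, cli_env=None, cfg_env=None, filename="dsh_targets"):
    """Pick which targets file a deploy would read.

    cli_env models the --environment argument; cfg_env models the
    `environment:` key in scap.cfg. The command line wins when both are set.
    """
    env = cli_env or cfg_env
    scap_dir = Path(repo_root) / "scap"
    if env is None:
        # Assumption: with no environment at all, use scap/<filename>.
        return scap_dir / filename
    return scap_dir / "environments" / env / filename


if __name__ == "__main__":
    # Default cluster, taken from the scap.cfg default environment.
    print(targets_file("repo", cfg_env="default").as_posix())
    # Alternate cluster, selected explicitly on the command line.
    print(targets_file("repo", cli_env="alt", cfg_env="default").as_posix())
```

With one environment per cluster, a plain deploy only reaches the default cluster's hosts, and the second cluster is deployed only when its environment is named explicitly.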
[20:41:12] 10Continuous-Integration-Infrastructure (shipyard): Document minimum required version of docker to build CI images - https://phabricator.wikimedia.org/T178821#3704114 (10hashar) p:05Triage>03Normal [20:42:11] 10Continuous-Integration-Infrastructure (shipyard), 10Release-Engineering-Team (Kanban), 10Patch-For-Review: build CI images after a merge on integration/config - https://phabricator.wikimedia.org/T178594#3716914 (10hashar) a:05hashar>03None I did a basic prototype, it takes a while to build though (50 m... [20:46:31] 10Continuous-Integration-Infrastructure (shipyard): CI docker build should use a git cache - https://phabricator.wikimedia.org/T175968#3716921 (10hashar) p:05Triage>03Low Can be done by: * having some bare repositories on the build host * expose them via a volume: -v /srv/git:/srv/git * then clone from /srv/... [20:54:58] 10Continuous-Integration-Infrastructure (shipyard): Port castor to support docker container - https://phabricator.wikimedia.org/T179208#3716926 (10hashar) [20:56:53] (03CR) 10Hashar: [C: 04-2] "Probably need to fork Castor for the Docker support." [integration/config] - 10https://gerrit.wikimedia.org/r/385390 (owner: 10Hashar) [20:57:33] (03PS9) 10Hashar: Castor for Docker jobs [integration/config] - 10https://gerrit.wikimedia.org/r/385390 (https://phabricator.wikimedia.org/T179208) [20:57:45] 10Continuous-Integration-Infrastructure (shipyard), 10Release-Engineering-Team (Kanban), 10Patch-For-Review: Port castor to support docker container - https://phabricator.wikimedia.org/T179208#3716942 (10hashar) a:03hashar [20:58:14] 10Continuous-Integration-Infrastructure (shipyard), 10Release-Engineering-Team (Kanban), 10Patch-For-Review: Port castor to support docker container - https://phabricator.wikimedia.org/T179208#3716944 (10hashar) p:05Triage>03High high because that blocks further progress to migrate jobs to containers. 
[21:03:30] NoteDB will be the default backend in 2.15 https://gerrit-review.googlesource.com/#/c/gerrit/+/137111/ [21:03:32] heh [21:15:49] paladox: what is notedb? [21:16:04] hashar it's the new backend replacing reviewdb. [21:16:11] gerrit will no longer support a db [21:16:13] soon [21:16:20] it is moving everything to a repo [21:16:27] ohhhh [21:16:35] so all would be in some kind of git repository? [21:16:39] yes [21:16:42] All-users [21:16:43] just like /refs/meta/config ? [21:16:56] yes, but it will use for example [21:17:02] refs/users/01/1/ [21:17:08] and numbers go up [21:17:46] only users can read there own refs/users/* with the admin being allowed to view all users. [21:19:18] all http passwords will be hashed too, to prevent exposure. So if you forgot the password, the gui will no longer tell you it, so you will have to regenerate it. [21:19:43] NoteDB, also allows users to be cc'ed without being reviewers, it also allows you to be cc'ed without having an account. [21:20:01] though the last part is disabled by default for obvous reasons :) [21:20:55] ohh [21:20:57] while you are around [21:21:01] yep [21:21:10] there is some gerrit plugin to use custom avatars [21:21:15] yep [21:21:28] I think by default it uses gravatar which we cant enable due to privacy reasons [21:21:44] but I think there is one that allows to point to a custom URL [21:21:49] yeh [21:21:51] google uses it too [21:21:55] * paladox looks for it [21:22:01] and maybe (maybe) we could uses the avatar from phabricator :] [21:22:12] hashar https://gerrit.googlesource.com/plugins/avatars-external/ [21:22:14] eg given a Gerrit user Foobar [21:22:16] heh yeh [21:22:19] that would be a good idea [21:22:40] if there is a Phabricator account that is attached to LDAP account Foobar : send the avatar :] [21:22:47] heh yeh :) [21:22:53] or fallback to a predefine one [21:23:04] polygerrit makes avatar's look really nice [21:23:18] but I guess we would need some lookup functionality in phabricator 
[21:23:25] yes [21:23:36] maybe [21:23:37] hashar polygerrit's getting a new redesgn too https://docs.google.com/presentation/d/17q-ygGioZi_5DITLyELa8oaOr22e15AHy8cq6XTZ0nY/edit#slide=id.g27f16618ec_0_139 [21:23:55] nice [21:24:17] * paladox looks up how the avatar thing will work [21:24:47] so polygerrit is actively developed by Google isn't it ? [21:25:01] (given those slides have @google.com folks listed ) [21:25:26] yes [21:25:30] it is replacing GWTUI [21:25:41] GWTUI is being removed soon too [21:25:45] there's already a patch [21:26:11] hashar i helped to get most of the admin functionailty implemented :) [21:26:23] ie some of projects / all of groups / all of plugins [21:26:40] 10Release-Engineering-Team, 10MediaWiki-Platform-Team, 10Patch-For-Review, 10Performance-Team (Radar): Support multi-instance hosts on mediawiki-config - https://phabricator.wikimedia.org/T178553#3716994 (10Reedy) >>! In T178553#3715246, @Volans wrote: > I'm not very familiar with the beta-side of wmf-conf... [21:28:02] paladox: nice !!!! [21:28:09] Yep :) [21:28:27] paladox: really, I recommend you take note of the major things you are doing here and there [21:28:41] that will prove useful later on when looking for employment or creating your own company :D [21:28:43] heh thanks :) [21:28:47] heh [21:29:06] i think we can use commons [21:29:10] for the avatar thing [21:29:17] as phabricator numbers everything [21:29:31] and dosen't use a custom named url. [21:29:39] Unless there's a conduit we could use? 
[21:33:02] PROBLEM - Puppet errors on deployment-aqs01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [21:33:47] hashar Maybe we could implement our own avatar plugin [21:34:16] that allows users to choose there own url for it but fix it to https://phabricator.wikimedia.org/file/F [21:35:20] paladox: most probably we would need a conduit [21:35:29] that would take as input the Gerrit username (eg: ldap) [21:35:29] yeh [21:35:35] yeh :) [21:35:57] and if an account is found in phabricator yield the user avatar (else some random default one) [21:36:05] yeh :) [21:36:17] it will at least go along with the new status field in polygerrit [21:36:20] maybe twentyafterfour can code that in a few seconds, or maybe it is a TON of work [21:36:22] and the new user page in gerrit [21:36:23] I have no clue really [21:36:51] anyway, thanks for the polygerrit link. Very informative [21:36:59] your welcome :) [21:37:00] and I guess I am going to sleep a bit earlier than usual [21:37:04] ok [21:37:19] i guess what we could do is have an anon conduit, and then get from the conduit and match the username which will give us the phab url for the file [21:37:37] hashar notedb supports emojies too [21:37:38] potentially [21:37:41] haha [21:37:55] that goes along with the new status field too [21:37:56] well I am disappearing. Have a good week-end paladox ! [21:37:56] heh [21:38:02] ok and you too :) [21:41:06] might not be to hard ... [21:42:08] twentyafterfour i guess all it requires is a new conduit method, and for it to map the username to the avatar [21:42:20] seems fairly easy [21:42:26] ie like we do with gerrit's, but probaly easy. 
[21:42:27] yeh [21:43:06] 10Release-Engineering-Team (Watching / External), 10ORES, 10Operations, 10Scoring-platform-team, and 4 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3717006 (10greg) [21:43:20] i can try to create the gerrit plugin once we have the conduit :) [21:43:33] as i can follow how we do it for its-phabricator [21:44:08] should it use gravitar? [21:44:50] nope [21:45:00] it should use the phab's file thingy [21:49:00] or i wonder should it use commons? You can specify a size there. [22:02:01] twentyafterfour i guess we could base it on https://phabricator.wikimedia.org/api/user.ldapquery [22:02:11] it returns the profile image [22:02:41] but it should probaly not be an array [22:03:06] though it seems some users are mw [22:05:01] I guess users can link there ldap account [22:05:13] * paladox creates a new gerrit plugin named repo [22:13:02] RECOVERY - Puppet errors on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0] [22:20:36] 10Gerrit, 10Phabricator: Create avatar plugin for gerrit that uses phab's conduit to get users profile image - https://phabricator.wikimedia.org/T179212#3717013 (10Paladox) [22:20:47] 10Gerrit, 10Phabricator: Create avatar plugin for gerrit that uses phab's conduit to get users profile image - https://phabricator.wikimedia.org/T179212#3717025 (10Paladox) p:05Triage>03Normal a:03Paladox [22:22:03] 10Gerrit, 10Phabricator: Create avatar plugin for gerrit that uses phab's conduit to get users profile image - https://phabricator.wikimedia.org/T179212#3717013 (10Paladox) Though the only problem would be, gerrit needs to make the image smaller or bigger in some cases. Is that supported in phab? [22:34:00] PROBLEM - Puppet errors on deployment-aqs01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [23:13:58] RECOVERY - Puppet errors on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0]
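Editor's sketch of the avatar lookup proposed in T179212: take a Gerrit (LDAP) username, ask Phabricator for a matching account, and fall back to a default image when none is found. Everything here is an assumption made for illustration. The `lookup` callable stands in for whatever Conduit method is eventually used, the list-of-records result shape and the `image` key are guesses, and the default URL is a placeholder.

```python
# Hypothetical placeholder, not a real Wikimedia URL.
DEFAULT_AVATAR = "https://example.org/default-avatar.png"


def avatar_for(gerrit_username, lookup, default=DEFAULT_AVATAR):
    """Return the avatar URL for gerrit_username, or default if none is found.

    lookup is any callable that takes a username and returns a list of
    account records (assumed here to be dicts with an optional "image" key).
    """
    records = lookup(gerrit_username) or []
    for record in records:
        image = record.get("image")
        if image:
            return image
    return default


if __name__ == "__main__":
    # Fake in-memory lookup used instead of a real Conduit call.
    fake_accounts = {"Foobar": [{"image": "https://example.org/foobar.png"}]}
    print(avatar_for("Foobar", lambda name: fake_accounts.get(name, [])))
    print(avatar_for("Nobody", lambda name: fake_accounts.get(name, [])))
```

Injecting the lookup as a callable keeps the fallback logic testable without a network call. Resizing the image, raised as an open question on the task, is not covered here.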