[01:00:18] Gerrit, Release-Engineering-Team (Someday), Wikimedia-Logstash, Patch-For-Review, Technical-Debt: Look into shoving gerrit logs into logstash - https://phabricator.wikimedia.org/T141324#3780045 (Dzahn) I applied the changes first on gerrit2001 and later on cobalt. There seemed to be no issue...
[01:43:15] Gerrit, Developer-Relations, GitHub-Mirrors, Repository-Admins, and 3 others: Add CODE_OF_CONDUCT.md to Wikimedia repositories - https://phabricator.wikimedia.org/T165540#3780061 (Tgr)
[01:58:28] PROBLEM - Puppet errors on deployment-tin is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[02:38:26] RECOVERY - Puppet errors on deployment-tin is OK: OK: Less than 1.00% above the threshold [0.0]
[03:58:33] Project selenium-MultimediaViewer » firefox,mediawiki,Linux,BrowserTests build #585: FAILURE in 2 min 32 sec: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=mediawiki,PLATFORM=Linux,label=BrowserTests/585/
[03:58:53] Beta-Cluster-Infrastructure, Release-Engineering-Team (Kanban), Deployments: cxserver deployment on beta is broken - https://phabricator.wikimedia.org/T181037#3780180 (KartikMistry) Open>Resolved a:KartikMistry Thanks @hashar !
[04:00:50] Beta-Cluster-Infrastructure, Release-Engineering-Team (Kanban), Deployments: cxserver deployment on beta is broken - https://phabricator.wikimedia.org/T181037#3780183 (KartikMistry) Added instructions for Beta at: https://www.mediawiki.org/wiki/Content_translation/Deployments/How-to#Beta
[04:01:02] Beta-Cluster-Infrastructure, Release-Engineering-Team (Kanban), Deployments, Language-2017-Oct-Dec: cxserver deployment on beta is broken - https://phabricator.wikimedia.org/T181037#3780185 (KartikMistry)
[05:11:48] MediaWiki-Releasing, MediaWiki-Release-Tools: MediaWiki release patch files should be based off of the previous tarball - https://phabricator.wikimedia.org/T181116#3780204 (Legoktm)
[05:34:30] Looks like Beta Cluster is still using wiki@wikimedia.org for its notifications (e.g. "Congrats on 100th edit")
[06:01:19] Continuous-Integration-Config, Proton, Readers-Web-Backlog (Tracking): Set up Jenkins for chromium-render and chromium-render-deploy repositories - https://phabricator.wikimedia.org/T179552#3780262 (phuedx)
[06:45:41] PROBLEM - Puppet errors on deployment-zotero01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[07:09:01] RECOVERY - Free space - all mounts on deployment-fluorine02 is OK: OK: All targets OK
[07:25:43] RECOVERY - Puppet errors on deployment-zotero01 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:34:59] PROBLEM - Free space - all mounts on integration-slave-jessie-1001 is CRITICAL: CRITICAL: integration.integration-slave-jessie-1001.diskspace._mnt.byte_percentfree (No valid datapoints found) integration.integration-slave-jessie-1001.diskspace._srv.byte_percentfree (<10.00%)
[08:22:18] Continuous-Integration-Config, MediaWiki-Debian: jenkins-debian-glue job should support UNRELEASED - https://phabricator.wikimedia.org/T181120#3780327 (hashar)
[08:33:59] (PS1) Hashar: debian-glue: use prev distro instead of UNRELEASED [integration/config] - https://gerrit.wikimedia.org/r/392789 (https://phabricator.wikimedia.org/T181120)
[08:34:36] Continuous-Integration-Config, Release-Engineering-Team (Kanban), MediaWiki-Debian, Patch-For-Review: jenkins-debian-glue job should support UNRELEASED - https://phabricator.wikimedia.org/T181120#3780374 (hashar) a:hashar I have already updated the debian-glue-non-voting job
[08:37:39] (CR) Hashar: "I have already updated the debian-glue-non-voting job." [integration/config] - https://gerrit.wikimedia.org/r/392789 (https://phabricator.wikimedia.org/T181120) (owner: Hashar)
[08:44:30] Gerrit, Release-Engineering-Team (Someday), Wikimedia-Logstash, Patch-For-Review, Technical-Debt: Look into shoving gerrit logs into logstash - https://phabricator.wikimedia.org/T141324#3780386 (Paladox) @gehel hi, we are wondering could you help us with why logstash is not showing the logs p...
[08:53:52] RECOVERY - Free space - all mounts on integration-slave-jessie-1004 is OK: OK: All targets OK
[09:16:28] (PS1) Hashar: Migrate conftool to Docker [integration/config] - https://gerrit.wikimedia.org/r/392798
[09:17:05] (CR) Hashar: "flake8 fails until https://gerrit.wikimedia.org/r/#/c/392793/ get merged." [integration/config] - https://gerrit.wikimedia.org/r/392798 (owner: Hashar)
[09:19:39] (CR) Hashar: "I guess the reason is the test suite doesn't start etcd :]" [integration/config] - https://gerrit.wikimedia.org/r/392798 (owner: Hashar)
[09:21:25] hashar: woohoo, thank you for making the CI work :)
[09:25:05] legoktm: yeah UNRELEASED fails utterly :)
[09:25:18] seems like taking the previous distribution might be sufficient
[09:27:39] (PS1) Hashar: Add debian-glue (non-voting) to conftool [integration/config] - https://gerrit.wikimedia.org/r/392799 (https://phabricator.wikimedia.org/T180330)
[09:28:49] (CR) jerkins-bot: [V: -1] Add debian-glue (non-voting) to conftool [integration/config] - https://gerrit.wikimedia.org/r/392799 (https://phabricator.wikimedia.org/T180330) (owner: Hashar)
[09:29:50] (PS2) Hashar: Add debian-glue (non-voting) to conftool [integration/config] - https://gerrit.wikimedia.org/r/392799 (https://phabricator.wikimedia.org/T180330)
[09:34:45] (CR) Hashar: [C: 2] Add debian-glue (non-voting) to conftool [integration/config] - https://gerrit.wikimedia.org/r/392799 (https://phabricator.wikimedia.org/T180330) (owner: Hashar)
[09:35:48] (Merged) jenkins-bot: Add debian-glue (non-voting) to conftool [integration/config] - https://gerrit.wikimedia.org/r/392799 (https://phabricator.wikimedia.org/T180330) (owner: Hashar)
[09:44:20] (PS1) Hashar: Make conftool debian glue voting [integration/config] - https://gerrit.wikimedia.org/r/392804
[09:45:50] (CR) Hashar: [C: 2] Make conftool debian glue voting [integration/config] - https://gerrit.wikimedia.org/r/392804 (owner: Hashar)
[09:47:02] (Merged) jenkins-bot: Make conftool debian glue voting [integration/config] - https://gerrit.wikimedia.org/r/392804 (owner: Hashar)
[09:53:16] Beta-Cluster-Infrastructure, Release-Engineering-Team (Kanban), Deployments, Language-2017-Oct-Dec: cxserver deployment on beta is broken - https://phabricator.wikimedia.org/T181037#3780631 (hashar) >>! In T181037#3780183, @KartikMistry wrote: > Added instructions for Beta at: https://www.mediawi...
[12:22:49] Yippee, build fixed!
[12:22:49] Project selenium-GettingStarted » firefox,beta,Linux,BrowserTests build #594: FIXED in 47 sec: https://integration.wikimedia.org/ci/job/selenium-GettingStarted/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/594/
[12:57:06] Gerrit, Upstream: Git 2.15: "git review -d" is broken since "-set-upstream" got removed - https://phabricator.wikimedia.org/T180548#3781100 (EddieGP) Open>Resolved a:EddieGP git-review 1.26 (containing the fix) was released a few days ago. The new version is available from pypi (https://pypi....
[13:12:59] MediaWiki-Releasing, Security: Consider using a single MediaWiki releases key instead of individual keys - https://phabricator.wikimedia.org/T181019#3781174 (MoritzMuehlenhoff) My proposal would be the following: - Create a "mediawiki releases 2017 key" (with an expiration date of maybe 2-3 years) - Dis...
[13:20:51] Gerrit, Upstream: Git 2.15: "git review -d" is broken since "-set-upstream" got removed - https://phabricator.wikimedia.org/T180548#3781186 (Paladox) Thanks. Please do try it and report back that it’s fixed :)
[13:26:39] Gerrit, Upstream: Git 2.15: "git review -d" is broken since "-set-upstream" got removed - https://phabricator.wikimedia.org/T180548#3781226 (EddieGP) >>! In T180548#3781186, @Paladox wrote: > Thanks. Please do try it and report back that it’s fixed :) It is, that's why I closed the task ;-) ```lang=ba...
[14:23:13] Release-Engineering-Team (Watching / External), Operations, Release Pipeline: Update Debian package for Blubber - https://phabricator.wikimedia.org/T179984#3781320 (akosiaris) OK, I git pulled and refreshed tags. Unfortunately we are still at a no-go state. I now have ``` dh_auto_build -O--buildsyst...
[15:25:30] Gerrit, Operations, Traffic, Patch-For-Review: Switch on http/2 in apache for gerrit - https://phabricator.wikimedia.org/T180978#3775204 (MoritzMuehlenhoff) >>! In T180978#3777053, @elukey wrote: > 2) Since the experimental tag has been removed only recently I strongly suggest to use a recent ver...
[16:27:36] Hey folks! We're getting some requests to re-enable ORES for ruwiki. I've been assuming that there's zero chance of getting a config change through until next week. But I thought I'd ask. Currently, we have the ORES service rolled back to a known good state, so in theory (haha), re-enabling ORES on ruwiki should be OK.
[16:28:05] All we'd need is a MediaWiki config change to re-enable the ORES extension.
[16:28:12] halfak: We can trial the change on beta, of course...
[16:28:16] right.
[16:30:44] FWIW, a "no. Enabling ORES the day before a holiday is crazy" is a perfectly good answer.
[16:30:56] I'm doing my due diligence to ask here ^_^
[16:58:46] !log deploying ores-prod-deploy:5084251 T181168
[16:58:50] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[16:58:50] T181168: Replicate RC/WL failures in Beta - https://phabricator.wikimedia.org/T181168
[17:01:20] thcipriani: Where would a scap log be if not under scap/log ?
[17:02:20] ...nowhere? I don't know. Which repo are you looking at?
[17:02:56] d’oh. I’m trying to write an incident report for an ORES dust-up on Monday
[17:02:57] I think scap/log is hard coded in for the scap3 repos
[17:03:05] ah
[17:03:07] I deployed 4 times that day, but there’s only one logfile
[17:04:19] on the 20th? I see 3 log files that were modified that day
[17:04:23] they're named...weirdly
[17:04:38] scap-sync-2017-11-08-0002-1-g5084251.log scap-sync-2017-11-08-0003.log scap-sync-2017-11-20-0001.log
[17:04:47] also -rw-rw-r-- 1 awight wikidev 7161 Nov 20 23:19 95cd523.log
[17:04:50] wat.
[17:05:15] I was digging with scap deploy-log, does it open r/w ?
[17:05:23] oh and that one :)
[17:05:49] scap deploy-log should be read only view of the log files
[17:06:42] scap deploy-log -v -f scap/log/95cd523.log
[17:06:44] and the like
[17:07:46] you can also do some fancy filtering, (show logs from only one target, etc) but I honestly haven't played with that feature since it was being developed.
[17:08:15] thcipriani: Those “11-08” files are in fact 11-20 actions. Filing a bug...
[17:08:30] Want me to preserve any of this info for the bug?
[17:08:51] Gerrit, Operations, Traffic, Patch-For-Review: Switch on http/2 in apache for gerrit - https://phabricator.wikimedia.org/T180978#3781833 (demon) p:Triage>Lowest gerrit2001 is running stretch, but we haven't reimaged the master cobalt yet (cf T176774). Given that, plus the fact that this...
[17:09:12] Gerrit, Release-Engineering-Team (Someday), Operations: Reimage cobalt as stretch - https://phabricator.wikimedia.org/T176774#3636304 (demon)
[17:09:15] Gerrit, Operations, Traffic, Patch-For-Review: Switch on http/2 in apache for gerrit - https://phabricator.wikimedia.org/T180978#3781837 (demon)
[17:10:04] eh, just the info about filenames along side the fact that they were deployed on 2017-11-20 should be good enough to start digging
[17:10:24] I'd bet we're doing some rev parsing to determine file names that's going wrong somewhere...
[17:10:57] multiple tags pointed to the same commit kinda deal.
[17:11:18] thanks for the bug!
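Note on the log names listed at 17:04: all four have the shape of `git describe --always` output, which is `<tag>-<commits since tag>-g<short sha>` when the checked-out commit is past the nearest tag, the bare tag name on an exact match, and only the short hash when no tag is found. A minimal sketch of that reading follows. `parse_describe` is a hypothetical helper written for this note, not scap code, and it assumes the date in a file name is simply part of the tag name, which would explain an "11-08" name on a file written on 11-20.

```python
import re

# `git describe` prints <tag>-<commits ahead>-g<abbreviated sha> when the
# commit is not itself tagged.
_DESCRIBE_RE = re.compile(r"^(?P<tag>.+)-(?P<ahead>\d+)-g(?P<sha>[0-9a-f]{4,40})$")
# With --always and no tag found, only the abbreviated hash is printed.
_BARE_SHA_RE = re.compile(r"^[0-9a-f]{4,40}$")


def parse_describe(name):
    """Split a describe-style log name into (tag, commits_ahead, sha).

    Hypothetical helper for illustration. A tag whose name is pure hex
    would be misread as a bare hash; that ambiguity is ignored here.
    """
    if name.endswith(".log"):
        name = name[: -len(".log")]
    match = _DESCRIBE_RE.match(name)
    if match:
        return (match.group("tag"), int(match.group("ahead")), match.group("sha"))
    if _BARE_SHA_RE.match(name):
        return (None, None, name)
    # Anything else is treated as an exact tag match.
    return (name, 0, None)


for log_name in (
    "scap-sync-2017-11-08-0002-1-g5084251.log",
    "scap-sync-2017-11-08-0003.log",
    "scap-sync-2017-11-20-0001.log",
    "95cd523.log",
):
    print(log_name, "->", parse_describe(log_name))
```

Read this way, the first file is one commit past a tag created on 2017-11-08, the next two sit exactly on a tag, and `95cd523.log` had no tag to describe from. This is only an interpretation of the names, not a diagnosis of the bug.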
[17:12:03] Scap: scap logs sometimes have incorrect date - https://phabricator.wikimedia.org/T181171#3781844 (awight)
[17:14:02] Scap: scap logs sometimes have incorrect date - https://phabricator.wikimedia.org/T181171#3781861 (awight) More context: I was attempting rollbacks, here are the SAL lines from that day: * 21:37 Started deploy [ores/deploy@5084251]: Updating ORES to revscoring 2.0.10, T179711 * 22:54 git describe --always is I guess how we name the log file, which explains the funky names :)
[17:15:52] /o\
[17:18:25] Release-Engineering-Team (Kanban), Scap: scap logs sometimes have incorrect date - https://phabricator.wikimedia.org/T181171#3781884 (thcipriani) p:Triage>Normal Looks like we name the log file with `git describe --always` which could make for some funky names indeed: https://github.com/wikimedia...
[17:30:24] Scap, Scoring-platform-team: Need to make the number of cached revisions configurable - https://phabricator.wikimedia.org/T181176#3781945 (awight)
[17:32:52] I swear we have a task for that one somewhere...
[17:33:43] thcipriani: I know we at least chatted about it...
[17:34:09] I don’t think I made the bug tho
[17:35:13] yeah, last time we chatted about it wasn't the first time it's been talked about. The task would be pretty old if it exists...
[17:35:38] but maybe it only existed in my head
[17:39:50] RECOVERY - Free space - all mounts on deployment-sca03 is OK: OK: All targets OK
[17:43:52] (PS1) Hashar: Move operations-puppet-wmf-style-guide to perm slave [integration/config] - https://gerrit.wikimedia.org/r/392872
[17:48:43] (PS2) Hashar: Move operations-puppet-wmf-style-guide to perm slave [integration/config] - https://gerrit.wikimedia.org/r/392872
[17:50:48] (CR) Hashar: [C: 2] Move operations-puppet-wmf-style-guide to perm slave [integration/config] - https://gerrit.wikimedia.org/r/392872 (owner: Hashar)
[17:51:58] (Merged) jenkins-bot: Move operations-puppet-wmf-style-guide to perm slave [integration/config] - https://gerrit.wikimedia.org/r/392872 (owner: Hashar)
[17:59:06] Beta-Cluster-Infrastructure, WMF-Legal, Privacy, Security: Require email address to register on Beta Cluster - https://phabricator.wikimedia.org/T181034#3782054 (Bawolff) Adding legal in case they have any thoughts on this.
[18:05:20] halfak: given that the shutoff was an emergency action, I think it's okay to re-enable as long as everything has been tested, etc., and someone is going to be closely watching it/IRC/village pumps to revert in case it breaks again
[18:05:46] but it's greg-g's call (and no_justification too)
[18:06:05] Um? Day before holidays? Nope
[18:11:24] MediaWiki-Releasing, Security: Consider using a single MediaWiki releases key instead of individual keys - https://phabricator.wikimedia.org/T181019#3782098 (Legoktm) Would we be creating a new key every year then?
[18:14:54] MediaWiki-Releasing, Security: Consider using a single MediaWiki releases key instead of individual keys - https://phabricator.wikimedia.org/T181019#3782121 (demon) As a complete idiot when it comes to this: are subkeys an option? Like, could we have a master key that basically nobody has the password to...
[18:14:56] (PS1) Hashar: Tweak operations-puppet-wmf-style-guide thresholds [integration/config] - https://gerrit.wikimedia.org/r/392878
[18:18:21] MediaWiki-Releasing, Security: Consider using a single MediaWiki releases key instead of individual keys - https://phabricator.wikimedia.org/T181019#3782146 (dpatrick) p:Triage>Normal I like this idea, as Moritz laid it out above. I think this would make sense moving forward, and fits well with t...
[18:18:34] (CR) Hashar: [C: 2] Tweak operations-puppet-wmf-style-guide thresholds [integration/config] - https://gerrit.wikimedia.org/r/392878 (owner: Hashar)
[18:19:32] Release-Engineering-Team (Watching / External), Scoring-platform-team (Current): Improvements to ORES deployment documentation and process - https://phabricator.wikimedia.org/T181183#3782151 (awight)
[18:21:08] (Merged) jenkins-bot: Tweak operations-puppet-wmf-style-guide thresholds [integration/config] - https://gerrit.wikimedia.org/r/392878 (owner: Hashar)
[18:21:14] Release-Engineering-Team (Watching / External), Scoring-platform-team (Current), Wikimedia-Incident: Improvements to ORES deployment documentation and process - https://phabricator.wikimedia.org/T181183#3782151 (awight)
[18:23:18] Release-Engineering-Team (Watching / External), Operations, Patch-For-Review, Scoring-platform-team (Current), Wikimedia-Incident: Cache ORES virtualenv within versioned source - https://phabricator.wikimedia.org/T181071#3782175 (awight)
[18:24:17] Release-Engineering-Team (Watching / External), Operations, Release Pipeline: Update Debian package for Blubber - https://phabricator.wikimedia.org/T179984#3782184 (thcipriani) >>! In T179984#3781320, @akosiaris wrote: > Which means it complains about not finding https://github.com/docker/distributio...
[18:32:16] legoktm, https://phabricator.wikimedia.org/T72249#3780636
[18:35:39] Continuous-Integration-Config, Librarization, Composer, Security, Security-General: Expand our usage of FriendsOfPHP/security-advisories - https://phabricator.wikimedia.org/T180278#3782265 (Bawolff) There was a comment on the github about https://packagist.org/packages/roave/security-advisori...
[18:39:22] PROBLEM - Puppet errors on deployment-mx is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[18:39:54] MediaWiki-Releasing, Security: Consider using a single MediaWiki releases key instead of individual keys - https://phabricator.wikimedia.org/T181019#3782273 (greg) >>! In T181019#3782146, @dpatrick wrote: > as releases could be signed by the Jenkins instance. Just so everyone's on the same page, by "the...
[18:43:22] MarcoAurelio: I think it would be a good GCI task, but I don't think I know the archivebot code well enough to be a primary mentor
[18:47:00] Continuous-Integration-Infrastructure, Release-Engineering-Team, Jenkins, Upstream: Jenkins Gearman plugin has deadlock on executor threads (was: Beta Cluster stopped receiving code updates (beta-update-databases-eqiad hung) - https://phabricator.wikimedia.org/T72597#3782293 (greg) >>! In T72597#...
[18:48:29] !log hung beta updates, doing the monthly dance: https://www.mediawiki.org/wiki/Continuous_integration/Jenkins#Hung_beta_code/db_update
[18:48:33] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[18:55:09] !log beta update jobs are back
[18:55:12] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[18:55:36] Release-Engineering-Team (Watching / External), Scoring-platform-team (Current): ORES beta cluster config should be as close to production as possible - https://phabricator.wikimedia.org/T181187#3782311 (awight)
[19:02:56] (PS1) Thcipriani: Pipeline changes [integration/config] - https://gerrit.wikimedia.org/r/392892
[19:03:48] Release-Engineering-Team (Watching / External), Global-Collaboration, MediaWiki-extensions-ORES, Scoring-platform-team (Current): Make ORES-consuming pages more robust to ORES errors - https://phabricator.wikimedia.org/T181191#3782397 (awight)
[19:05:01] mostly randomly curious, how far away is running extensions tests in docker? Mostly thinking about how to build an image on top of that that installs elasticsearch, extra wikis (commons, specific languages), to run our browser test suite in nodejs
[19:05:27] we will need multiwiki, and the ability to run maintenance scripts for those wikis (to build elasticsearch indices and such)
[19:07:09] I was waiting for all the docker-pkg changes to land before taking a stab at that
[19:07:15] Project selenium-MinervaNeue » chrome,beta,Linux,BrowserTests build #211: FAILURE in 18 min: https://integration.wikimedia.org/ci/job/selenium-MinervaNeue/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/211/
[19:07:53] I think qunit/browser tests are much farther away because we haven't figured out how to get chromium and xvfb in a docker image
[19:08:00] legoktm: so likely to be working in december? Or pushed out to next year?
[19:08:06] legoktm ah
[19:08:10] i know how to do that :)
[19:08:22] upstream gerrit have chrome in the docker
[19:08:26] legoktm: hm, doesn't chrome headless not need that stuff? I'm already able to run the chrome headless nodejs tests in labs instances
[19:09:05] (might be chromium, i would have to double check)
[19:09:10] I'm not making any commitments right now :p
[19:09:42] I'm not sure about chrome(ium) headless, I was just trying to do a 1:1 port of the current infra, maybe that was the mistake
[19:09:42] looks like what i have in labs (through mwv+lxc) is using chrome, not chromium
[19:10:04] ?? chrome can't be installed in labs, it's non-free software
[19:10:44] legoktm: it's not serving up stuff to anyone, it's code that's manually called by me typing things
[19:11:04] still :)
[19:11:05] it's still not allowed to have any non-free software in cloud services
[19:11:33] hmm, well i can try and figure out how to get it running under chromium i suppose
[19:12:07] apt-get install chromium
[19:12:16] it should mostly be the same as doing it for chrome
[19:13:02] i have to find appropriate debian repos for the new versions, debian doesn't have headless
[19:13:17] they have chrome 57 and i need 59+ iirc
[19:13:30] s/chrome/chromium/
[19:14:20] https://tracker.debian.org/pkg/chromium-browser stable has 61
[19:15:04] not for jessie, which is what mwv runs
[19:15:43] maybe newer mwv runs stretch? i haven't checked this labs instance has been running for year+
[19:16:47] nope, latest mwv still runs debian/contrib-jessie64
[19:17:38] but anyways the whole idea is to replace this labs instance with CI, but i'll need multiwiki docker images to get it going
[19:18:20] Project selenium-MinervaNeue » firefox,beta,Linux,BrowserTests build #211: FAILURE in 29 min: https://integration.wikimedia.org/ci/job/selenium-MinervaNeue/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/211/
[19:19:06] you could set up a wiki without mwv?
[19:19:27] that's a giant pain in the arse to get multiwiki going with 3+ wikis without mwv :P
[19:20:35] (and pdfhandler, and timedmediahandler, and sitematrix, and all the elasticsearch plugins, and etc. etc.)
[19:24:09] Release-Engineering-Team (Watching / External), Global-Collaboration, MediaWiki-extensions-ORES, Scoring-platform-team (Current): Make ORES-consuming pages more robust to ORES errors - https://phabricator.wikimedia.org/T181191#3782477 (awight)
[19:26:26] Release-Engineering-Team (Watching / External), Global-Collaboration, MediaWiki-extensions-ORES, Scoring-platform-team (Current), Wikimedia-Incident: Make ORES-consuming pages more robust to ORES errors - https://phabricator.wikimedia.org/T181191#3782486 (awight)
[19:26:33] Release-Engineering-Team (Watching / External), Scoring-platform-team (Current), Wikimedia-Incident: ORES beta cluster config should be as close to production as possible - https://phabricator.wikimedia.org/T181187#3782487 (awight)
[19:29:18] Release-Engineering-Team (Watching / External), ORES, Operations, Scoring-platform-team, and 4 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3782495 (awight) @hoo Wondering if you wrote an incident report, that I can add to wit...
[19:38:13] Release-Engineering-Team (Watching / External), ORES, Operations, Scoring-platform-team, and 4 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3782508 (BBlack) No, we never made an incident rep on this one, and I don't think it...
[19:40:59] Release-Engineering-Team (Watching / External), ORES, Operations, Scoring-platform-team, and 4 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3782516 (awight) @BBlack Thanks for the detailed notes! All I was going to add was my...
[19:42:15] Release-Engineering-Team (Watching / External), ORES, Operations, Scoring-platform-team, and 4 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3715229 (Zoranzoki21) Does it made problem with high sleep times in pywiki?
[19:45:55] Release-Engineering-Team (Watching / External), ORES, Operations, Scoring-platform-team, and 4 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3782522 (demon) >>! In T179156#3782516, @awight wrote: > @BBlack Thanks for the detail...
[19:50:25] Release-Engineering-Team (Watching / External), Scoring-platform-team (Current): Write reports about why Ext:ORES is helping cause server 500s and alternatives to fix - https://phabricator.wikimedia.org/T181010#3782540 (Halfak)
[19:50:26] (CR) Thcipriani: [C: 2] "Deployed, working" [integration/config] - https://gerrit.wikimedia.org/r/392892 (owner: Thcipriani)
[19:51:33] (Merged) jenkins-bot: Pipeline changes [integration/config] - https://gerrit.wikimedia.org/r/392892 (owner: Thcipriani)
[19:55:29] Release-Engineering-Team (Watching / External), Scoring-platform-team (Current): Write reports about why Ext:ORES is helping cause server 500s and alternatives to fix - https://phabricator.wikimedia.org/T181010#3782546 (Halfak) In T181168, I capture the error message we wanted to see in Beta. In order t...
[20:21:32] Gerrit, Operations, Traffic, Patch-For-Review: Switch on http/2 in apache for gerrit - https://phabricator.wikimedia.org/T180978#3782577 (Dzahn) >>! In T180978#3781833, @demon wrote: >> I'm proposing we lower the priority on this and let another service (preferably one with less depending on it)...
[20:25:57] Gerrit, Release-Engineering-Team (Someday), Wikimedia-Logstash, Patch-For-Review, Technical-Debt: Look into shoving gerrit logs into logstash - https://phabricator.wikimedia.org/T141324#3782580 (Paladox) Thanks to @EBernhardson we figured out our problem. It's due to it needing to add a es ta...
[20:32:39] Release-Engineering-Team (Watching / External), Scoring-platform-team (Current), Wikimedia-Incident: ORES beta cluster config should be as close to production as possible - https://phabricator.wikimedia.org/T181187#3782311 (Addshore) @awight I saw in the incident report that "Not all wikis are availa...
[20:36:19] Release-Engineering-Team (Watching / External), Scoring-platform-team (Current), Wikimedia-Incident: Write reports about why Ext:ORES is helping cause server 500s and alternatives to fix - https://phabricator.wikimedia.org/T181010#3782602 (greg)
[20:42:44] Release-Engineering-Team (Watching / External), Scoring-platform-team (Current), Wikimedia-Incident: ORES beta cluster config should be as close to production as possible - https://phabricator.wikimedia.org/T181187#3782611 (awight) @Addshore good idea! Here's the matrix of wikis that either have ORE...
[21:07:14] (PS1) Hashar: dib: contint::hhvm is now a profile [integration/config] - https://gerrit.wikimedia.org/r/392926
[21:08:46] (CR) jerkins-bot: [V: -1] dib: contint::hhvm is now a profile [integration/config] - https://gerrit.wikimedia.org/r/392926 (owner: Hashar)
[21:15:04] (PS2) Hashar: dib: contint::hhvm is now a profile [integration/config] - https://gerrit.wikimedia.org/r/392926
[21:34:44] Gerrit, Release-Engineering-Team (Someday), Wikimedia-Logstash, Patch-For-Review, Technical-Debt: Look into shoving gerrit logs into logstash - https://phabricator.wikimedia.org/T141324#3782727 (Paladox) https://gerrit.wikimedia.org/r/#/c/392943/ adds a reconnection param to make sure gerrit...
[21:55:13] (PS1) Hashar: contint::browsers is now a profile [integration/config] - https://gerrit.wikimedia.org/r/392977
[21:56:24] hasharAway: should we switch mediawiki/debian to the non-voting version now?
[21:57:27] (CR) Legoktm: [C: 1] "I think using the previous distribution in the changelog is the best behavior here. With this we should be able to switch mediawiki/debian" [integration/config] - https://gerrit.wikimedia.org/r/392789 (https://phabricator.wikimedia.org/T181120) (owner: Hashar)
[21:58:50] Yippee, build fixed!
[21:58:51] Project selenium-PageTriage » chrome,beta,Linux,BrowserTests build #584: FIXED in 49 sec: https://integration.wikimedia.org/ci/job/selenium-PageTriage/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/584/
[21:59:02] Yippee, build fixed!
[21:59:02] Project selenium-PageTriage » firefox,beta,Linux,BrowserTests build #584: FIXED in 1 min 1 sec: https://integration.wikimedia.org/ci/job/selenium-PageTriage/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/584/
[22:01:05] (CR) Hashar: "Yeah I think it is good enough :] I haven't looked at all our debian/changelog to find out some other examples of UNRELEASED." [integration/config] - https://gerrit.wikimedia.org/r/392789 (https://phabricator.wikimedia.org/T181120) (owner: Hashar)
[22:22:43] Gerrit, Release-Engineering-Team (Someday), Wikimedia-Logstash, Patch-For-Review, Technical-Debt: Look into shoving gerrit logs into logstash - https://phabricator.wikimedia.org/T141324#3782795 (Dzahn) https://logstash.wikimedia.org/ now shows the first log lines from cobalt :)) Chad star...
[22:23:09] Gerrit, Release-Engineering-Team (Someday), Wikimedia-Logstash, Patch-For-Review, Technical-Debt: Look into shoving gerrit logs into logstash - https://phabricator.wikimedia.org/T141324#3782796 (Dzahn) Open>Resolved a:Dzahn
[22:23:40] Gerrit, Release-Engineering-Team (Someday), Wikimedia-Logstash, Patch-For-Review, Technical-Debt: Look into shoving gerrit logs into logstash - https://phabricator.wikimedia.org/T141324#2494353 (Dzahn) a:Dzahn>Paladox
[22:23:50] :)
[22:32:24] Jenkins starved for nodes for the https://integration.wikimedia.org/ci/view/Beta/job/beta-scap-eqiad/ job?
[22:38:47] ah crap. I thought we just fixed that earlier
[22:38:49] * thcipriani fixes
[22:44:39] hopefully fixed/stays fixed: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/183234/console
[22:46:07] thcipriani: Thanks!
[22:48:19] thcipriani: has everything been ported over to docker-pkg now or is build.py still in use?
[22:48:58] legoktm: tox is kind of a moving target, so I've been letting that settle. We're currently using both.
[22:49:30] ok
[22:49:43] and the docker-pkg images have to be built on contint1001?
[22:50:27] no you can build them locally. To push to the internal repo the creds are on contint1001
[22:50:43] I wrote up local install/use of docker-pkg on: https://www.mediawiki.org/wiki/Continuous_integration/Docker#Images_using_docker-pkg
[22:51:41] sorry I meant to *publish* them to the wikimedia docker registry
[22:52:07] ah, yes, to publish them use contint1001. All CI admins should be able to do so IIRC.
[22:52:23] is docker-pkg our special sauce thcipriani?
[22:53:04] yeah, it's a built-here thing
[22:53:21] https://github.com/wikimedia/operations-docker-images-docker-pkg
[22:53:53] I will play with it later then, thanks
[23:00:47] Continuous-Integration-Config, Continuous-Integration-Infrastructure (Little Steps Sprint), Release-Engineering-Team (Someday): Get rid of Zend 5.5 tests for wmf branches - https://phabricator.wikimedia.org/T94149#3782871 (tstarling)
[23:30:57] PROBLEM - Puppet errors on deployment-kafka-jumbo-2 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[23:31:25] Release-Engineering-Team, Wikidata, Epic, User-Addshore: [Epic] Kill the Wikidata build step - https://phabricator.wikimedia.org/T173818#3782939 (demon)
[23:31:28] Release-Engineering-Team (Kanban), Release, Train Deployments: 1.31.0-wmf.8 deployment blockers - https://phabricator.wikimedia.org/T178635#3782938 (demon) Open>Resolved
[23:31:55] legoktm: I did some minor refactors to make it easier to pop phpcs into scap: https://phabricator.wikimedia.org/D891
[23:32:11] Basically just moved stuff around so it's cleaner. I'll do more over the holiday, probably
[23:36:10] PROBLEM - Puppet errors on deployment-netbox is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[23:37:26] PROBLEM - Puppet errors on deployment-kafka-jumbo-1 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[23:59:27] no_justification upstream git-review have finally fixed http with a base url. So users will be happy now :).
[23:59:31] * paladox doesn't use it.