[00:01:46] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10MediaWiki-Core-Tests, 10Patch-For-Review, and 2 others: selenium-*-jessie Jenkins jobs failing with `Error: You might be using an older PHP version` - https://phabricator.wikimedia.org/T195830 (10MBinder_WMF) [00:41:55] 10Phabricator-Sprint-Extension, 10Technical-Debt: Deprecate Phabricator Sprint extension - https://phabricator.wikimedia.org/T90906 (10mmodell) Thanks @epriestley, I'll get it patched. [04:42:48] !log on deployment-puppetmaster03 running puppet-merge [04:42:51] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [05:21:41] did not succeed [05:25:25] seems like puppet is broken on deployment-prep in multiple ways [07:45:45] Hello releng! [07:46:16] I'm helping onboarding onimisionipe and I'm getting lost in gerrit access (I have done that 3 years ago and don't remember anything about it) [07:46:52] I see "[2018-08-23 07:27:01,735 +0000] a94637d9 mathew.onipe - AUTH FAILURE FROM 41.217.207.83 user-not-found" in the gerrit logs when onimisionipe is trying to log in. [07:47:39] ok, disregard the request, we found the correct username [07:57:48] 10Phabricator: CURLE_COULDNT_CONNECT when trying to use Conduit - https://phabricator.wikimedia.org/T201746 (10Tgr) Seems like it started working again today. [08:06:54] 10Phabricator: CURLE_COULDNT_CONNECT when trying to use Conduit - https://phabricator.wikimedia.org/T201746 (10mmodell) Could be related to {T201139}? [08:09:20] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Patch-For-Review, 10Wikimedia-log-errors (Shared Build Failure): mediawiki-quibble jobs fails due to disk full (sql insert failed) - https://phabricator.wikimedia.org/T202457 (10hashar) > Quibble configure the MySQL data using... 
[08:47:33] (03PS1) 10Hashar: rm wikimedia-fundraising-civicrm-jessie [integration/config] - 10https://gerrit.wikimedia.org/r/454768 (https://phabricator.wikimedia.org/T183512) [08:56:16] (03CR) 10Hashar: [C: 032] rm wikimedia-fundraising-civicrm-jessie [integration/config] - 10https://gerrit.wikimedia.org/r/454768 (https://phabricator.wikimedia.org/T183512) (owner: 10Hashar) [09:05:27] 10MediaWiki-Codesniffer, 10VPS-project-libraryupgrader: Drop "php5,inc" from MediaWiki-CodeSniffer ruleset of extensions - https://phabricator.wikimedia.org/T200956 (10Umherirrender) Would be nice to unify the encoding to "UTF-8", lowercase (utf-8) or without - (utf8) may not work on all systems. I would assu... [09:12:10] (03Merged) 10jenkins-bot: rm wikimedia-fundraising-civicrm-jessie [integration/config] - 10https://gerrit.wikimedia.org/r/454768 (https://phabricator.wikimedia.org/T183512) (owner: 10Hashar) [09:22:25] 10Release-Engineering-Team (Watching / External), 10Patch-For-Review, 10Scoring-platform-team (Current), 10User-Ladsgroup: Another round of discussion about wiki-ai's GitHub->gerrit mirroring - https://phabricator.wikimedia.org/T194212 (10Nikerabbit) [09:24:42] 10Release-Engineering-Team (Watching / External), 10Patch-For-Review, 10Scoring-platform-team (Current), 10User-Ladsgroup: Another round of discussion about wiki-ai's GitHub->gerrit mirroring - https://phabricator.wikimedia.org/T194212 (10Nikerabbit) [09:40:01] (03PS1) 10Umherirrender: Make seccheck for Thanks voting [integration/config] - 10https://gerrit.wikimedia.org/r/454776 [10:41:39] (03CR) 10Thiemo Kreuz (WMDE): [C: 031] "I, personally, are happy with this addition." [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/454702 (owner: 10Umherirrender) [10:44:21] (03CR) 10Thiemo Kreuz (WMDE): [C: 031] "Fine for me. The addition of the "ignoreNewlines" property with PHPCS version 2.7 seems to have fixed the issue mentioned in the comment." 
[tools/codesniffer] - 10https://gerrit.wikimedia.org/r/454718 (owner: 10Umherirrender) [10:59:59] 10MediaWiki-Codesniffer, 10VPS-project-libraryupgrader: Drop "php5,inc" from MediaWiki-CodeSniffer ruleset of extensions - https://phabricator.wikimedia.org/T200956 (10MarcoAurelio) I wonder if we could do it for {rTSTW} `.phpcs.xml` file; although I'm not sure if we use up-to-date PHP. [11:03:53] <_joe_> !log cherry-picking https://gerrit.wikimedia.org/r/c/operations/puppet/+/453093 on deployment-prep [11:03:55] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [11:06:10] <_joe_> !log reverting my cherry-pick [11:06:13] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [11:17:40] (03PS1) 10Hashar: Migrate BlueSpiceSubPageTree to Quibble [integration/config] - 10https://gerrit.wikimedia.org/r/454795 (https://phabricator.wikimedia.org/T130811) [11:17:59] 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure (shipyard), 10BlueSpice, 10Patch-For-Review: Enable unit tests on BlueSpice* repos - https://phabricator.wikimedia.org/T130811 (10hashar) [11:18:25] (03CR) 10Hashar: [C: 032] Migrate BlueSpiceSubPageTree to Quibble [integration/config] - 10https://gerrit.wikimedia.org/r/454795 (https://phabricator.wikimedia.org/T130811) (owner: 10Hashar) [11:20:16] (03Merged) 10jenkins-bot: Migrate BlueSpiceSubPageTree to Quibble [integration/config] - 10https://gerrit.wikimedia.org/r/454795 (https://phabricator.wikimedia.org/T130811) (owner: 10Hashar) [13:47:26] <_joe_> !log cherry-picking https://gerrit.wikimedia.org/r/c/operations/puppet/+/453093 on deployment-prep again [13:47:29] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [13:57:12] addshore: if you are around. 
WikibaseMediaInfo mysteriously fails in PHPUnit due to a dataprovider :-\ https://phabricator.wikimedia.org/T198192 [13:57:21] addshore: not sure whether you are familiar with that repo though [14:00:27] HI [14:00:35] *looks* [14:03:36] hashar: https://phabricator.wikimedia.org/T198192#4526794 i guess it probably is the provider [14:03:59] should get the media info team to look at it :) [14:04:07] i guess they will be waking up soon! [14:04:55] addshore: bah and CI fails due to some other reason now (a selenium test explodes) ... plus I can't reproduce it anymore locally :-\ [14:06:52] ooh, what's the selenium test failure? [14:07:28] addshore: https://integration.wikimedia.org/ci/job/quibble-vendor-sqlite-php70-docker/180/console [14:07:38] 1) Special:RecentChanges shows lemmas in title links to lexemes: [14:07:43] false == true [14:08:04] pretty sure that is because the recentchange update job has not been run yet by the time the suite hits the page [14:08:07] hmm, not seen that before [14:08:51] that would be https://phabricator.wikimedia.org/T199446 [14:09:40] Special:RecentChanges is updated by a background job but on CI the jobs are not necessarily run immediately (in fact, they are not) [14:09:54] or maybe that is something else entirely [14:10:12] * addshore is in another meeting now [16:06:27] (03CR) 10Dduvall: "> Patch Set 4: Code-Review+1" (031 comment) [integration/config] - 10https://gerrit.wikimedia.org/r/454706 (https://phabricator.wikimedia.org/T202457) (owner: 10Dduvall) [16:07:03] marxarelli: that change looks good to me :) [16:07:09] (03CR) 10Hashar: [C: 031] Use a volume under workspace for /tmp in docker containers [integration/config] - 10https://gerrit.wikimedia.org/r/454706 (https://phabricator.wikimedia.org/T202457) (owner: 10Dduvall) [16:07:11] well I mean [16:07:24] it is not ideal, but that is a good hack :] [16:09:05] haha, not exactly a hack but not ideal that's for sure [16:09:30] i was just looking at the related "change docker storage
driver to devicemapper" task [16:09:37] seems like the right long-term solution [16:09:42] we should adjust Quibble to just assume all paths are relative (eg ./cache ./src ./data ./log ./tmp etc ) [16:09:59] and get the jenkins job to just cleanup the whole workspace instead of individual directories [16:10:07] but shrinking the vg that we use for the jenkins workspace does not seem ideal :\ [16:11:05] i like that idea. the docker builders could then be simplified [16:11:43] there are quite a few of them and the differences between them seem minute [16:12:33] yup [16:12:39] anddddd [16:13:04] on CI slaves Docker should be configured to point to /srv (the extended dir) instead of /var/lib/docker which is the "small" / [16:13:17] but to do so, we need slaves with more disk (40 GB instances will not be enough) [16:13:50] there is 20G for / 20 G for /srv and the docker images take a huge chunk of disk (like 10+ G maybe) [16:14:05] 10Phabricator, 10Release-Engineering-Team (Kanban): Unable to log in to Phabricator via MediaWiki on mobile (CSP error) - https://phabricator.wikimedia.org/T201460 (10mmodell) 05Open>03Resolved The fix was deployed last night and I'm not able to reproduce currently by following @krinkle's instructions. I t... [16:14:46] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Patch-For-Review, 10Wikimedia-log-errors (Shared Build Failure): mediawiki-quibble jobs fails due to disk full (sql insert failed) - https://phabricator.wikimedia.org/T202457 (10dduvall) >>! In T202457#4525849, @hashar wrote: >... [16:15:10] hashar: can we attach an additional pv instead?
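The Quibble idea floated above, make every working path relative to a single workspace root (./cache ./src ./data ./log ./tmp) so the Jenkins job can wipe one directory instead of several, might look roughly like this. This is an editorial sketch of the idea, not actual Quibble code; the directory names come from the log, everything else is assumption:

```python
from pathlib import Path
import shutil

# Hypothetical sketch: all of Quibble's working directories hang off one
# workspace root, so cleanup is a single recursive delete of that root.
QUIBBLE_DIRS = ("cache", "src", "data", "log", "tmp")

def workspace_layout(root):
    """Return the per-purpose directories, all under one root."""
    root = Path(root)
    return {name: root / name for name in QUIBBLE_DIRS}

def clean_workspace(root):
    """One rm -rf of the root replaces per-directory cleanup in the job."""
    shutil.rmtree(root, ignore_errors=True)

print(workspace_layout("workspace")["tmp"])  # workspace/tmp on POSIX
```

With a layout like this, the Jenkins builders no longer need to know about individual directories, which is why the log notes they "could then be simplified".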
[16:16:21] hmm [16:16:27] ideally yeah [16:16:33] using the labs_lvm module in puppet [16:16:39] but that would require a bunch of puppet work [16:17:51] marxarelli: ohhhhhhhhhhhhh [16:19:27] marxarelli: so yeah we blindly allocate 100% of the free disk space [16:19:28] via [16:19:29] modules/role/manifests/ci/slave/labs/common.pp: require ::profile::labs::lvm::srv [16:20:00] so I guess a new profile could be made that would allocate X GB for /srv/docker/ and Y GB for /srv/jenkins/ [16:32:55] andrewbogott: ^ would this work? i.e. declaring two labs_lvm::volume resources, one with x% size and one with 100-x% size? [16:33:44] Probably! [16:35:12] good enough for me :) [16:36:35] (03PS1) 10Hashar: Migrate MOOC to Quibble [integration/config] - 10https://gerrit.wikimedia.org/r/454856 (https://phabricator.wikimedia.org/T199032) [16:36:53] hashar: ^. and i'm going to deploy my jjb change in the interim unless you have objections [16:37:32] i want to get that bigmem instance back up to 8 executors so we can get some good data about utilization and job execution timing [16:38:09] 10Continuous-Integration-Infrastructure (shipyard), 10Release-Engineering-Team (Kanban), 10releng-201718-q3, 10Epic, 10Patch-For-Review: [EPIC] Migrate Mediawiki jobs from Nodepool to Docker - https://phabricator.wikimedia.org/T183512 (10hashar) [16:39:45] marxarelli: the JJB hack is rad :] [16:39:59] I actually thought about how to easily fix the /tmp issue overnight, but could not find a solution [16:40:10] until this morning when I read your change and I was like "hoooo that is smart" [16:40:11] ;] [16:40:33] marxarelli: try updating one job and see how it behaves [16:40:37] then update the others [16:41:33] hashar: cool. i forget, is there a way to have the jenkins-jobs command only update jobs that have changed? [16:41:41] or is it only by pattern?
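One way to answer the question just asked, deploy only the jobs whose rendered output actually changed, is to render the job XML twice (before and after the config change) and diff the two output trees. This is an editorial sketch, not a jenkins-jobs feature; the function and directory layout are assumptions:

```python
import filecmp
from pathlib import Path

# Sketch: given two directories of rendered Jenkins job definitions
# (e.g. from two `jenkins-jobs test ... -o <dir>` runs), report which
# jobs differ or are new, so only those need `jenkins-jobs update`.
def changed_jobs(old_dir, new_dir):
    old, new = Path(old_dir), Path(new_dir)
    old_names = {p.name for p in old.iterdir() if p.is_file()}
    new_names = {p.name for p in new.iterdir() if p.is_file()}
    changed = {n for n in old_names & new_names
               if not filecmp.cmp(old / n, new / n, shallow=False)}
    return sorted(changed | (new_names - old_names))
```

This mirrors the "diff of the output/ directory" idea that comes up a little later in the log.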
[16:43:02] !log Deploying https://gerrit.wikimedia.org/r/c/integration/config/+/454706 for mediawiki-quibble-vendor-mysql-php70-docker only [16:43:05] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [16:45:22] i suppose i could do a diff of the output/ directory... [17:07:32] 10Phabricator: Embedded Commons videos are broken - https://phabricator.wikimedia.org/T200757 (10Aklapper) [17:16:11] James_F: MOOC that is such a pity :((( the extension looks great [17:17:46] but yeah maybe I should archive it [17:23:50] 10Continuous-Integration-Config, 10Wikimedia-General-or-Unknown, 10phan-taint-check-plugin, 10MW-1.32-release-notes (WMF-deploy-2018-08-21 (1.32.0-wmf.18)), 10Patch-For-Review: Enable phan-taint-check-plugin on all Wikimedia-deployed repositories where it is curr... - https://phabricator.wikimedia.org/T201219 [17:28:29] !log Deploying https://gerrit.wikimedia.org/r/c/integration/config/+/454706 for 270 affected jobs [17:28:31] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [17:33:00] hasharAway: Eh. [17:56:52] https://gerrit-review.googlesource.com/q/status:open [17:56:58] They made the header blue heh [18:00:25] hey yall, [18:00:34] including a role on anew host that uses scap::target [18:00:45] not sure why, but it is trying to clone from tin.eqiad.wmnet [18:00:50] where would that be configured? [18:00:58] it isn't in my scap config, afaict it isn't in puppet [18:01:40] thcipriani: ? ^ [18:01:59] ottomata: what's it trying to download. My guess is it might in /srv/deployment[repo]/.git/DEPLOY_HEAD on deploy1001 [18:03:22] oh hmm [18:03:24] ottomata: the git_server setting in DEPLOY_HEAD [18:03:43] sure is [18:03:55] how do I change that? just manually? 
[18:04:04] if you just manually edit that file, I would bet it would fix puppet on the target [18:04:08] k [18:05:02] hm not quite /usr/bin/scap deploy-local --repo analytics/superset/deploy -D log_json:False' returned 70: 18:04:48 Fetch from: http://tin.eqiad.wmnet/analytics/superset/deploy/.git [18:05:21] add --refresh-config to that command [18:05:30] it already cached the config locally [18:05:46] looking better [18:05:46] 18:05:37 Fetch from: http://deployment.eqiad.wmnet/analytics/superset/deploy/.git [18:09:52] thanks thcipriani i think it sbetter, i got other problems now but i'll figure those out :) [18:10:21] ottomata: sure thing, lemme know if anything else comes up [18:14:37] (03CR) 10Dduvall: "Change was deployed for 270 jobs. I haven't seen a failure due to this change yet, though I did noticed this:" [integration/config] - 10https://gerrit.wikimedia.org/r/454706 (https://phabricator.wikimedia.org/T202457) (owner: 10Dduvall) [18:45:39] 10Release-Engineering-Team (Kanban), 10Release, 10Train Deployments: 1.32.0-wmf.18 deployment blockers - https://phabricator.wikimedia.org/T191064 (10thcipriani) [19:38:29] 10Release-Engineering-Team (Kanban), 10Release, 10Train Deployments: 1.32.0-wmf.18 deployment blockers - https://phabricator.wikimedia.org/T191064 (10thcipriani) [20:43:16] I'd like to set up mw-vagrant with database replication and a separate cluster for Flow, has anyone done this, and if so, do you have notes? :) [20:44:35] 10Beta-Cluster-Infrastructure: Close es.wikibooks beta - https://phabricator.wikimedia.org/T202665 (10MarcoAurelio) [20:48:22] 10Beta-Cluster-Infrastructure: Close es.wikibooks beta - https://phabricator.wikimedia.org/T202665 (10MarcoAurelio) Curiously, while https://github.com/wikimedia/operations-mediawiki-config/blob/master/dblists/closed-labs.dblist lists eswiki as closed, https://es.wikipedia.beta.wmflabs.org/wiki/Especial:Todos_lo... 
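The scap fix worked through above (manually editing the `git_server` setting in `DEPLOY_HEAD` on the deploy host, then re-running `scap deploy-local` with `--refresh-config` on the target so the cached config is refetched) amounts to a one-key rewrite. A minimal sketch, assuming a simple `key: value` file format; the helper function is illustrative, not part of scap:

```python
# Illustrative rewrite of the git_server key in a DEPLOY_HEAD file,
# as in the log: tin.eqiad.wmnet -> deployment.eqiad.wmnet.
# The "key: value" line format here is an assumption.
def rewrite_git_server(text, new_server):
    out = []
    for line in text.splitlines():
        if line.startswith("git_server:"):
            line = "git_server: " + new_server
        out.append(line)
    return "\n".join(out)

head = "git_repo: analytics/superset/deploy\ngit_server: tin.eqiad.wmnet"
print(rewrite_git_server(head, "deployment.eqiad.wmnet"))
```

After the edit, the log shows the target still fetching from the old server until `--refresh-config` is added, since `deploy-local` had cached the old config locally.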
[20:54:12] 10Continuous-Integration-Config, 10Growth-Team, 10MediaWiki-extensions-CentralAuth, 10Notifications, 10MW-1.29-release: Echo tests on REL1_29 fail due to CentralAuth - https://phabricator.wikimedia.org/T202667 (10Reedy) [20:57:52] 10Beta-Cluster-Infrastructure: Close es.wikibooks beta - https://phabricator.wikimedia.org/T202665 (10Krenair) Not here though, that's {T115584} [20:58:05] 10Continuous-Integration-Config, 10Growth-Team, 10MediaWiki-extensions-CentralAuth, 10Notifications, and 2 others: Echo tests on REL1_29 fail due to CentralAuth - https://phabricator.wikimedia.org/T202667 (10Reedy) [20:59:10] 10Continuous-Integration-Config, 10Growth-Team, 10Quibble, 10Thanks: Thanks REL1_31 tests failing due to `unknown class Pimple\Container` - https://phabricator.wikimedia.org/T202668 (10Reedy) [20:59:14] 10Continuous-Integration-Config, 10Growth-Team, 10Quibble, 10Thanks: Thanks REL1_31 tests failing due to `unknown class Pimple\Container` - https://phabricator.wikimedia.org/T202668 (10Reedy) [21:00:48] 10Continuous-Integration-Config, 10Growth-Team, 10Quibble, 10Thanks: Thanks REL1_30 failing with session token errors - https://phabricator.wikimedia.org/T202669 (10Reedy) [21:01:09] Krenair: and why is aawiki marked as closed and not on any list? [21:01:13] * Hauskatze is lost [21:01:45] inheriting the value from closed.dblist? 
[21:01:55] aa.* are all closed on production [21:02:05] i just know that "aawiki" is historically always the wiki you specify with maintenance scripts when there is no specific wiki but you have to provide one [21:02:29] 10Continuous-Integration-Config, 10Growth-Team, 10MediaWiki-extensions-CentralAuth, 10Quibble, 10Thanks: Thanks REL1_29 failing due to CentralAuth tables being missing - https://phabricator.wikimedia.org/T202670 (10Reedy) [21:02:34] 10Continuous-Integration-Config, 10Growth-Team, 10MediaWiki-extensions-CentralAuth, 10Quibble, 10Thanks: Thanks REL1_29 failing due to CentralAuth tables being missing - https://phabricator.wikimedia.org/T202670 (10Reedy) p:05Triage>03Lowest [21:03:06] mutante: yeah, but that's not it; we're investigating the mystery of wikis that are closed but not really [21:03:21] ok, *nod* [21:03:28] how do you define closed? [21:03:45] can we torch beta cluster and make it anew? :P [21:03:52] mutante: read-only wikis [21:03:53] yea:) [21:04:09] Hauskatze: ok, and what makes some "not really"?
[21:04:21] 10Continuous-Integration-Config, 10Growth-Team, 10Quibble, 10Thanks: Thanks REL1_30 and REL1_27 failing with session token errors - https://phabricator.wikimedia.org/T202669 (10Reedy) [21:04:37] eswiki beta is closed, but the 'closed' tag isn't being applied despite being on closed-labs.dblist [21:05:07] aawiki beta is not on any closed-labs.dblist, yet sitematrix list it as closed [21:05:22] I suspect this is because aawiki *is* on closed.dblist [21:05:44] oh hehe, ok [21:05:48] 10Continuous-Integration-Config, 10AntiSpoof, 10Equivset, 10Growth-Team, 10Thanks: Thanks REL1_27 failing due to "Your requirements could not be resolved to an installable set of packages" - https://phabricator.wikimedia.org/T202671 (10Reedy) [21:06:04] but closed.dblist should be ignored on labs and, instead, listen to their own xx-labs.dblist files [21:06:38] however Alex mentions something that happened in a patch and I can't really follow after that [21:07:17] * Hauskatze eval.php eswiki@beta [21:09:01] var_dump ( $wikiTags ); [21:09:03] NULL [21:09:11] great [21:13:12] 10Beta-Cluster-Infrastructure: Close es.wikibooks beta - https://phabricator.wikimedia.org/T202665 (10MarcoAurelio) ``` maurelio@deployment-deploy01:~$ mwscript eval.php eswiki > var_dump( $wikiTags ); NULL ``` Doesn't look right I guess. [21:27:09] 10Continuous-Integration-Config, 10Wiki-Loves-Monuments-Database, 10Patch-For-Review: Add Shell linting to heritage repo - https://phabricator.wikimedia.org/T175906 (10Lokal_Profil) @JeanFred This one is fully resolved right? [21:46:47] thcipriani: So I have a proposed fix for the FlaggedRevs train blocker up at https://gerrit.wikimedia.org/r/c/mediawiki/extensions/FlaggedRevs/+/455013 , but I don't have a local testing setup. I'm instead going to test it on mwdebug1001 if you don't mind [21:47:10] RoanKattouw: fine with me, thanks for digging into this! 
[21:48:24] I love having a responsible way to test in production :D [21:49:48] enwiki or gtfo [21:58:09] OK I can confirm the fix on enwiki [21:58:16] Required a bit of messing around but I got there [22:01:28] RoanKattouw: nice! I usually refrain from merging commits in master, but I'd be fine to backport that change if you can find someone to give it a once over/+2 [22:02:01] I'll try to find someone [22:03:08] thanks [22:37:22] 10Release-Engineering-Team (Kanban), 10Release, 10Train Deployments: 1.32.0-wmf.18 deployment blockers - https://phabricator.wikimedia.org/T191064 (10thcipriani) [22:42:09] (03PS1) 10QChris: Allow “Gerrit Managers” to import history [wikimedia/fundraising/FRUEC] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/455025 [22:42:12] (03CR) 10QChris: [V: 032 C: 032] Allow “Gerrit Managers” to import history [wikimedia/fundraising/FRUEC] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/455025 (owner: 10QChris) [22:42:43] (03PS1) 10QChris: Import done. Revoke import grants [wikimedia/fundraising/FRUEC] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/455026 [22:42:46] (03CR) 10QChris: [V: 032 C: 032] Import done. Revoke import grants [wikimedia/fundraising/FRUEC] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/455026 (owner: 10QChris) [23:02:19] thcipriani: I don't know what the threshold should be for non-error volume in Logstash worthy of train block, but this one is pretty spammy (#1 message on mediawiki-error). – https://phabricator.wikimedia.org/T202686 [23:04:26] there is no absolute threshold it's mostly just when logs get hard to read [23:04:51] I hadn't seen that one since it's a NOTICE [23:05:07] maybe a good rule of thumb would be that if a few hours of mwversion:{next} is more than double of mwversion:{prev}, it's blocking? [23:05:21] it's on mediawiki-errors. 
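The rule of thumb just proposed, block the train if a few hours of `mwversion:{next}` log volume is more than double the same window of `mwversion:{prev}`, reduces to a one-line comparison. The function name and threshold handling below are illustrative, not an existing tool:

```python
# Illustrative check for the proposed train-blocking rule of thumb:
# compare error counts for the new and previous MediaWiki versions
# over the same Logstash window.
def should_block_train(next_count, prev_count, factor=2.0):
    """True if the new version logs more than `factor` times as much."""
    return next_count > factor * prev_count

print(should_block_train(9000, 4000))  # volume more than doubled -> True
```

As the log notes, there is no absolute threshold in practice; volume matters mostly once logs get hard to read or Logstash starts dropping packets.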
[23:05:53] we only exclude debug (all channels), and info (from non-error channels) [23:06:11] I plan to fix the 'error' channel to not use level:INFO so that we can remove that exclude as well. [23:07:12] where does this notice fall on the proposed rule of thumb? [23:07:19] * thcipriani fiddles in logstash [23:08:12] well, too much and logstash will start dropping packets [23:08:28] there is a limit to how much it can take. channel/type doesn't matter ultimately [23:08:53] it's just that I'm excluding non-mediawiki mainly, and debug to exclude mwdebug users explicitly logging all channels for one-off noise. [23:09:14] This shows the last 4 hours of plain mediawiki-errors [23:09:15] https://logstash.wikimedia.org/goto/bc5e1222783ab220a14297a1ce0a5f31 [23:09:26] Volume doubled, possibly tripled [23:11:36] 10Release-Engineering-Team, 10Structured-Data-Commons, 10Wikidata, 10User-Urbanecm, 10Wiki-Setup (Create): Create a production test wiki in group0 to parallel Wikimedia Commons - https://phabricator.wikimedia.org/T197616 (10Abit) @MarkTraceur , any movement on this after you talked to Greg? Anything I c... [23:13:08] Krinkle: I feel like I'm seeing something different: I see the count line in the 6k-9k range, a spike starting 22:50, and now normalish [23:14:37] am I looking at the wrong thing? Or interpreting the count incorrectly? [23:15:41] spike looks to be something about memcached afaict [23:16:13] I need to increase the range because I see the first deploy was around 4 hours ago [23:16:16] looking at 8 hours now [23:16:54] how is puppet meant to work on deployment-prep? [23:17:06] I tried running puppet-merge on it but it is broken due to a minor detail in conftool [23:17:28] it's fixable, but I just wanted to check that it's the right thing to do [23:17:36] it looks like puppet hasn't been updated on deployment-prep since June [23:18:32] Krenair: ^ [23:18:47] ugh.
There's a cron on deployment-puppetmaster03 that's supposed to keep it up-to-date, but it bails on some merge conflicts. [23:19:00] thcipriani: yeah, doesn't look that bad – https://grafana.wikimedia.org/dashboard/db/production-logging [23:19:40] TimStarling, what. [23:19:53] TimStarling, puppet was running as of a few days ago [23:20:09] and was up to date as of a few days ago [23:20:11] well, it might be that I'm looking at some old system that hasn't been updated since june [23:20:13] I have no idea what puppet-merge does [23:20:28] is this documented? [23:20:46] is what exactly documented? [23:21:04] the cron job etc. [23:21:08] Krinkle: I'm fine rolling back if you think that notice should be more than a notice or if you think the notice could be drowning out something else. [23:21:28] I'm sure it's a mistake, that I am sure of. [23:21:41] But low volume enough that it's not currently causing any serious issue. [23:21:48] TimStarling, pretty sure that's puppetised [23:21:51] anyway [23:21:53] But should be fixed, and ideally not more than a month in the future. [23:22:05] the puppetmaster has commits up to two days ago [23:22:16] anyway, the problem is that people writing and merging code to a first approximation don't look at the logs coming out of that code, only a handful do. [23:22:40] indeed [23:22:54] I imagine a rebase conflict has broken it [23:22:56] so it becomes a "am I (personally) affected and willing to block others", probably not. My experience isn't that important. [23:23:18] but I don't know where you're seeing stuff only up to june [23:23:39] Krinkle: I can send out a "this week in logspam" update tomorrow. There have been a number of spammy log messages recently. [23:24:04] hmm [23:24:27] I'm going to see if I can fix whatever has caused it to break two days ago [23:25:45] are there a *lot* of locally cherry-picked changes?
[23:26:10] like 19 [23:26:27] It looks like an old version of https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/444610/ was cherry-picked which caused problems when it got merged [23:26:28] yes [23:26:47] 24 [23:27:22] ok, this explains some things [23:27:55] when I said it hadn't been updated since june I meant git log -1 on /var/lib/git/operations/puppet showed a change made in june [23:27:59] !log Fixed puppet rebasing on deployment-puppetmaster03 by removing old cherry-picked version as of https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/444610/ - it broke puppet rebasing since Wednesday [23:28:02] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [23:29:16] I think I did git log -10 and still didn't find the bottom of it, it was all June, so I assumed it must not have been updated [23:29:38] TimStarling, the PS1 shows you where we are with cherry-picks [23:29:48] earlier it was [23:29:49] root@deployment-puppetmaster03:/var/lib/git/operations/puppet(production u+25-41)# [23:30:07] indicating 25 cherry-picks and missing 41 commits from origin/production [23:30:30] yeah, makes sense [23:31:04] actually one of these can go [23:31:08] I ran puppet-merge, which does a git fetch and then shows a diff for the changes that need to be applied [23:31:17] and it showed many things, I suppose 41 [23:31:53] I can see how that would be useful on a production puppetmaster but not here [23:32:25] 10Phabricator: CURLE_COULDNT_CONNECT when trying to use Conduit - https://phabricator.wikimedia.org/T201746 (10ayounsi) Maybe related to the work done by @elukey in T198623. To help troubleshot it, can we know the source host/IP, destination host/IP/port? If HTTP(S) is the script configured to use the proxies (... 
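The prompt decoration explained a little further on in this log ("u+25-41" on deployment-puppetmaster03, meaning 25 local cherry-picks on top of and 41 commits behind origin/production) is the kind of figure `git rev-list --left-right --count HEAD...origin/production` reports as a tab-separated pair. The formatting helper below is a guess at how such a prompt might be built, not the actual code on that host:

```python
# Hypothetical helper: turn the "ahead<TAB>behind" pair printed by
#   git rev-list --left-right --count HEAD...origin/production
# into the "u+25-41" style decoration described in the log.
def prompt_suffix(rev_list_output):
    ahead, behind = (int(n) for n in rev_list_output.split())
    if ahead == 0 and behind == 0:
        return ""  # in sync with upstream, no decoration
    return "u+%d-%d" % (ahead, behind)

print(prompt_suffix("25\t41"))  # u+25-41
```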
TimStarling: relevant - https://gerrit.wikimedia.org/r/#/q/hashtag:beta-picked [23:33:15] the number of cherry-picks is the subject of at least two tasks [23:33:34] Lots of changes pending review, and others with an outcome that doesn't require the local change [23:34:05] my plan was to cherry-pick my change https://gerrit.wikimedia.org/r/c/operations/puppet/+/454741, then to test it and immediately self-merge it if it works [23:36:02] and it's not something that can just be solved, because that's a ton of different people involved, the people actually looking after deployment-prep can't merge there, some of the cherry-picks are for experimental stuff, and some are just to prevent puppet in beta breaking over prod assumptions [23:36:19] 'Presumably temporary' :( [23:39:43] 9 people plus two called "root" according to git shortlog -23 -s [23:41:09] almost half of the changes are yours, Krenair [23:41:16] current cherry-pick authors? yeah [23:42:01] so you do git cherry-pick as root? [23:42:13] I noticed 8000 files were owned by gitpuppet and 2000 were owned by root [23:42:31] gitpuppet is a role account used by puppet-merge [23:42:34] yeah [23:44:10] I try to prevent puppet from being left to fail across the deployment-prep hosts [23:45:01] nice that someone is doing that [23:45:17] along with other tasks going on in beta [23:45:27] so I end up writing puppet commits [23:45:48] they take a while to get merged so they often sit as cherry-picks for a while [23:46:40] you don't have +2? [23:46:45] in puppet.git?
yes [23:47:22] I don't even have deployment access anymore, never had ops [23:48:51] nobody who maintains beta has +2 in ops/puppet [23:49:10] kind of sucks [23:50:40] makes for a lot of cherry-picks at least [23:51:07] you should have a branch, with +2 rights in the branch [23:51:32] then the production branch can actually be called that for a reason [23:51:45] then the stuff going into production wouldn't go to beta [23:52:01] could have a cron job to rebase it [23:52:28] :| [23:52:39] I mean, local cherry-picks are a branch from git's point of view [23:55:09] Alternatively, we could all be like you, that is: test and maintain stuff in beta like in prod. So less releng involvement and more ops involvement. [23:56:08] this is (I'm a bit embarrassed to say) the first time I've tried testing a puppet change in beta [23:56:13] are there any extremely simple uncontroversial changes that I can +2 right now? [23:57:11] I don't think we have any right now [23:58:04] Krenair: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/316512/ ? [23:58:05] For webperf, I treat beta like a third DC cluster. In terms of Hiera etc., I always pick and test there, then +2 from ops “for prod”. Result is, it doesn’t go to prod unless it works in beta. [23:58:22] when labs was brand new and Ryan Lane started it, there used to be a labs branch to go with the production branch, and then we stopped using that a little while later [23:58:41] I view https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/439791/ as fairly straightforward but it got turned down from puppet swat IIRC, waiting for a specific ops person to take a look [23:59:27] thcipriani, no point with that one right now I don't think [23:59:39] My estimate is it will take 1 month before the branches diverge in ways that can’t be auto-rebased, and differ by more than 100 commits the month after. [23:59:43] never got LVS to work on the labs network