[08:37:18] ema: you here?
[14:25:15] Traffic, Discovery, Discovery-Analysis, Maps, and 2 others: What is a reasonable per-IP ratelimit for maps - https://phabricator.wikimedia.org/T169175#3505968 (Gehel) Over the last 30 days, backend requests [[ https://grafana-admin.wikimedia.org/dashboard/db/maps-performances?panelId=4&fullscreen...
[14:35:42] Domains, Traffic, Operations, Wikimedia Resource Center, Patch-For-Review: Create resources.wikimedia.org as a redirect - https://phabricator.wikimedia.org/T172417#3505985 (mcruzWMF) >>! In T172417#3504034, @Reedy wrote: > I don't disagree with Timo above, and I'm guessing #operations will ag...
[14:58:07] Domains, Traffic, Operations, Wikimedia Resource Center, Patch-For-Review: Create resources.wikimedia.org as a redirect - https://phabricator.wikimedia.org/T172417#3506077 (mcruzWMF) @Reedy @Krinkle Would it be possible to implement this by tomorrow (Tuesday August 8), because if so we would...
[15:38:48] ema, bblack, cp1099 is intermittently alerting for more than a week now
[15:42:12] XioNoX: intermittently alerting what?
[15:42:21] bblack: mailbox lag
[15:42:48] ok
[15:43:10] that alert is tricky, it can be pointlessly-spammy (sometimes) and it can self-resolve in a couple of different wants
[15:43:13] *ways
[15:43:39] some hosts will reach the warning level sometimes, but then back off and recover
[15:43:54] sometimes they'll hit the critical level and self-recover not long after either, kinda depends on timing in the daily load cycle
[15:44:09] and all of them restart their backends once a week via randomized-cron, which will also reset all related things
[15:44:46] bblack: I saw that same alert here and there all week, that's why I'm mentioning it
[15:44:46] the mailbox lag itself isn't a problem, but it's a leading indicator. often if it stays in the CRITICAL range, it will start causing intermittent 503s on cache_upload
[15:45:04] (only warnings though)
[15:45:37] so usually if we happen to see it alerting in -ops as CRIT and not busy, we go ahead and do an early restart (same as weekly one) to get it back under control. Or if there are upload 503s alerting as well, then it definitely needs restarting (whichever is mailbox-alerting)
[15:45:53] the restart operation being (as root): "run-no-puppet varnish-backend-restart"
[16:32:13] bblack: now it's critical
[16:32:56] running the restart
[16:33:24] Traffic, Discovery, Discovery-Analysis, Maps, and 2 others: What is a reasonable per-IP ratelimit for maps - https://phabricator.wikimedia.org/T169175#3506554 (BBlack) It's just per-IP. So yes that sounds fine: if you're peaking at 80/s total, then lets put an upper sanity bound at 100/s misses...
[16:33:33] bblack, uh https://www.irccloud.com/pastebin/QCMcj4oo/
[16:34:08] ah, sudo -i, not sudo -s
[16:34:51] ah, yes
[16:35:25] there have been many such issues with various parts of our tooling over the years (some of them, since resolved). So it's just kind of become habit that I always use "sudo -i" shell for everything.
[16:35:38] (except cumin, because I know it wants "sudo cumin" execution from my shell)
[16:36:27] XioNoX: also, please !log for those too if you're doing the manual restarts
[16:36:54] (in general that applies to just about anything non-trivial / non-readonly done manually on the caches)
[16:37:08] done :)
[16:37:25] yeah, I don't automatically think about !log yet
[16:37:36] don't hesitate to bash it into my head :)
[16:38:45] :)
[16:39:06] Domains, Traffic, Operations, Wikimedia Resource Center, Patch-For-Review: Create resources.wikimedia.org as a redirect - https://phabricator.wikimedia.org/T172417#3506569 (Reedy) It's not on either of us, at this point, it's on #operations to do the review/merging/deployment Though, as you...
[16:40:32] really, we should try to slip some tooling into all the prod hosts that does auto-!log of anything executed from sudo / rootshell (well at least when the network is reachable and normal/sane). And maybe some optional way to turn it off if you're going to be doing a bunch of typing and plan to manually log (e.g. CLI alias: stopirclog "I am working on blah and doing blah" which logs that message and stops logging of that shell until it exits)
[16:41:08] although there's lots of "yeah but..." about that
[16:41:12] like sensitive commands, etc :P
[16:42:40] that would be really cool, I did some similar type mechanisms in the past with a wrapper that disabled puppet and silenced alerting and did a SAL type thing at my last gig. having a tag oriented SAL where you could see commands by user across all nodes and such was infrequently useful but when it was...probably saved us hours of digging
[16:43:16] it was a script called 'maint' that had hooks for nagios-api, setting MOTD based on puppet status and other things and then a SAL
[16:44:03] maybe a better starting point would be something not automatic
[16:44:26] e.g. a command alias "logme" that logs the rest of your commandline and then executes it
[16:45:39] less friction between doing adhoc things and tracking them would be welcome
[16:45:56] honestly maybe cumin should SAL things that get run
[16:46:13] but without some curation and ability to separate out automatic things it gets busy
[16:46:13] Traffic, Operations, Phabricator, Release-Engineering-Team (Kanban): Verify that the codfw lvs is configured correctly for Phabricator - https://phabricator.wikimedia.org/T168699#3506599 (mmodell) phab2001 web works, git-ssh still unknown.
[16:46:39] "Did someone just run puppet across the env without limits?" etc
[16:52:30] yeah I just worry about the (hopefully rare!) case that someone has to put some sensitive key on the commandline, usually with HISTCONTROL=ignoreboth or whatever
[16:52:43] I guess we could have tools look for things like that env var and other related ones, too
[16:53:22] that does get ugly
[16:53:26] (or look for the prefixed whitespace that's used with HISTCONTROL, or lack of HISTFILE, etc)
[16:53:47] bash does a thing where if the command has leading space history ignores it
[16:53:51] we could follow suit?
[16:53:57] yeah that's the above stuff
[16:54:02] HISTCONTROL controls that
[16:54:03] oh right :)
[16:54:33] maybe it would be easier to hook up the logging via root's history logging, too, then it's sort of automatic
[16:54:54] that makes sense to me, less recreating the wheel
[16:55:20] this is all complicated by the fact that bash doesn't save history in realtime of course
[16:55:27] it logs to HISTFILE when the shell exits
[16:55:47] it's too bad there's not a HISTPIPE option or something to log them out to some other socket/command in realtime
[16:56:07] and immediately is usually when you want to know a $BAD_THING has been done
[16:56:37] I once had to deploy https://linux.die.net/man/1/rootsh for PCI tracking
[16:56:42] it was a terrible horrible exp
[16:56:56] maybe it's better now
[16:59:08] Domains, Traffic, Operations, Wikimedia Resource Center, Patch-For-Review: Create resources.wikimedia.org as a redirect - https://phabricator.wikimedia.org/T172417#3506650 (Reedy) Just a heads up, `www.wikimedia.org/resources` will not work, but `wikimedia.org/resources` will So please put `...
[17:11:19] Traffic, Discovery, Discovery-Analysis, Maps, and 2 others: What is a reasonable per-IP ratelimit for maps - https://phabricator.wikimedia.org/T169175#3506699 (Gehel) @BBlack I'm probably the one who should be around. I can be available any time from 10am to 11pm CEST (1am to 2pm PT). Just let me...
[19:47:48] Traffic, Android-app-feature-Compilations, Operations, Reading-Infrastructure-Team-Backlog, Wikipedia-Android-App-Backlog: Determine how to upload Zim files to Swift infrastructure - https://phabricator.wikimedia.org/T172123#3507727 (Mholloway)
[20:42:29] Traffic, AbuseFilter, Operations, Zero: user_wpzero doesn't always work - https://phabricator.wikimedia.org/T169907#3412425 (zhuyifei1999) The fact: uploaders are not always in WP0 ranges, but downloaders are nearly always in WP0 ranges (Z591)
[21:59:32] Traffic, Operations, Patch-For-Review, User-notice: Removing support for DES-CBC3-SHA TLS cipher (drops IE8-on-XP support) - https://phabricator.wikimedia.org/T147199#2684468 (Pigsonthewing) > Users which cannot move off of the underlying Windows XP operating system can install the latest Firefox...
[22:35:03] Traffic, Operations, Patch-For-Review, User-notice: Removing support for DES-CBC3-SHA TLS cipher (drops IE8-on-XP support) - https://phabricator.wikimedia.org/T147199#3508328 (MaxSem) If a corporation is insane enough to still run XP and force their users to run IE, we can only hope that yet anot...
[23:58:21] Domains, Traffic, Operations, Wikimedia Resource Center, Patch-For-Review: Create wikimedia.org/resources redirect for Wikimedia Resource Center - https://phabricator.wikimedia.org/T172417#3508519 (Krinkle)