[13:20:10] PROBLEM - check_swap on frdb2001 is CRITICAL: SWAP CRITICAL - 0% free (0 MB out of 625 MB)
[13:25:11] RECOVERY - check_swap on frdb2001 is OK: SWAP OK - 100% free (7627 MB out of 7627 MB)
[13:45:33] hey cwd!
[14:26:54] (PS2) Mepps: Fix Amazon token timeout [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/379900 (owner: Ejegg)
[14:26:57] (CR) Mepps: [C: 2] Fix Amazon token timeout [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/379900 (owner: Ejegg)
[14:27:43] (PS2) Mepps: Restore good country data from session [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/379694 (https://phabricator.wikimedia.org/T176450) (owner: Ejegg)
[14:27:55] (CR) Mepps: [C: 2] "Looks good, maybe a test in the future?" [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/379694 (https://phabricator.wikimedia.org/T176450) (owner: Ejegg)
[14:29:18] (Merged) jenkins-bot: Fix Amazon token timeout [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/379900 (owner: Ejegg)
[14:30:36] mepps: o/
[14:30:54] how goes it cwd?
[14:31:03] (Merged) jenkins-bot: Restore good country data from session [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/379694 (https://phabricator.wikimedia.org/T176450) (owner: Ejegg)
[14:32:06] not bad... sadly reading the news
[14:34:33] yes that is really sad
[14:35:09] i'm getting a hug from my cat which is comforting
[14:36:22] :)
[14:49:57] Fundraising-Backlog, fundraising-tech-ops, Operations: Port fundraising stats off Ganglia - https://phabricator.wikimedia.org/T152562#3650967 (Jgreen)
[14:50:00] fundraising-tech-ops, Operations, netops: remove fundraising firewall rules related to ganglia - https://phabricator.wikimedia.org/T176319#3650965 (Jgreen) Open>Resolved this is done
[14:51:08] Jeff_Green: crap, looking into the postfix thing
[14:51:15] cool
[14:57:19] Jeff_Green: confusing, that crontab is not on 1002
[14:57:49] i wonder if i changed it so fast it didn't land on every box
[14:58:02] could be
[14:58:42] easiest thing would be for me to just remove the prometheus ones by hand
[14:58:57] Jeff_Green: how did you notice? error log somewhere?
[14:59:52] i just happened to be on that box looking at whether or not we'd already done one of the prometheus metric gatherers
[15:05:52] Jeff_Green: just pushed a puppetish fix
[15:06:01] cool
[15:06:53] i got the syslog collector going, it's graphing here: https://grafana.wikimedia.org/dashboard/db/fundraising-overview?refresh=1m&orgId=1
[15:07:09] augh crap i broke it
[15:07:20] it looks like it came back though?
[15:08:06] i think i reverted before breakage
[15:08:21] * cwd running w/ scissors
[15:08:38] maybe I'm the one that broke it? for a few minutes the syslog thing was reporting duplicate metrics
[15:08:49] nah it was me
[15:08:57] stupid mistake, shouldn't have made it on prod
[15:09:03] ok
[15:09:11] that being said it's going to be annoying to remove that crontab with puppet
[15:09:49] it's just the same crontab under a different user?
[15:11:41] yeah
[15:11:47] prometheus can't call postqueue
[15:12:04] i can easily just go around and remove it from all the machines
[15:12:06] if you don't mind
[15:12:39] it's easy, just add the cron{} next to the correct one, and adjust the user and set ensure=>'absent' for it
[15:13:48] so that's what i tried but it complained about a duplicate entry
[15:13:54] with the same name
[15:14:13] ahh, yeah
[15:36:03] fundraising-tech-ops: Create alerts for rsyslog rate limiting - https://phabricator.wikimedia.org/T176924#3651109 (Jgreen)
[15:36:06] fundraising-tech-ops, monitoring: overhaul fundraising cluster monitoring - https://phabricator.wikimedia.org/T91508#3651108 (Jgreen)
[15:55:18] fundraising-tech-ops: Create alerts for rsyslog rate limiting - https://phabricator.wikimedia.org/T176924#3641538 (Jgreen) As of this AM we're collecting syslog total message count, and dropped message count, to prometheus. They're on the main fundraising dashboard here: https://grafana.wikimedia.org/dashboa...
[15:56:46] fundraising-tech-ops: Create alerts for rsyslog rate limiting - https://phabricator.wikimedia.org/T176924#3651262 (Jgreen) >>! In T176924#3651256, @Jgreen wrote: > As of this AM we're collecting syslog total message count, and dropped message count, to prometheus. They're on the main fundraising dashboard he...
[16:00:58] Fundraising-Backlog: Update postal address in Thank You email - https://phabricator.wikimedia.org/T177230#3651273 (Pcoombe)
[16:08:18] Fundraising-Backlog: Update postal address in Thank You email - https://phabricator.wikimedia.org/T177230#3651273 (Ejegg) Sure, if the updates are already on meta it'll be pretty quick to get those deployed.
[16:09:55] (PS2) Ejegg: Report average consumed message age [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/381263 (https://phabricator.wikimedia.org/T176920)
[16:09:57] (PS2) Ejegg: Report average thank you mail delay [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/381264 (https://phabricator.wikimedia.org/T176920)
[16:51:39] Fundraising-Backlog, Recurring-Donations: Send different TY letter to recurring donors - https://phabricator.wikimedia.org/T88574#3651534 (CCogdill_WMF) p:Low>High
[17:05:18] Fundraising Sprint Synchronized Screaming, Fundraising-Backlog, FR-Email, Patch-For-Review: Add field to Silverpop export: first donation date - https://phabricator.wikimedia.org/T150467#3651578 (Ejegg) Open>Resolved p:Triage>Normal
[17:05:28] Fundraising Sprint Synchronized Screaming, Fundraising-Backlog, Patch-For-Review: Add currency symbol to email export file - https://phabricator.wikimedia.org/T156410#3651580 (Ejegg) Open>Resolved
[17:06:02] Fundraising Sprint Synchronized Screaming, Fundraising-Backlog, Patch-For-Review: Add currency symbol to email export file - https://phabricator.wikimedia.org/T156410#3651586 (CCogdill_WMF) Confirming these look good. Thanks so much!
[17:12:33] Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM: Pre-requeue message manipulation should be a hook - https://phabricator.wikimedia.org/T177237#3651621 (Ejegg)
[17:39:13] Seddon: hey! :) Would you have the chance perhaps to give a summary of the WMDE and German WLM banner impression issues? It's just that since I'm investigating enwiki impressions in japan, it'd be helpful to know what other impressions issues there are, since they might be related... Thx in advance!!
[17:39:32] (or if there's e-mails maybe u could forward them to fr-tech?)
[17:40:09] What I've discovered about the Japan enwiki issue is that it's cyclical: impression rate varies by time of day, pointing to some sort of proxy with a datacenter in Japan
[17:52:44] fundraising-tech-ops: Create alerts for rsyslog rate limiting - https://phabricator.wikimedia.org/T176924#3651805 (Ejegg) Darn, I'd hoped the Civi host would be fine after cutting the message rate in half, but it looks like there are still spikes of rate-limiting
[18:23:45] Fundraising-Backlog, fundraising-tech-ops: civicrm mail settings vs outcome snafu - https://phabricator.wikimedia.org/T177244#3651911 (Jgreen)
[18:26:34] Fundraising-Backlog, fundraising-tech-ops: civicrm mail settings vs outcome snafu - https://phabricator.wikimedia.org/T177244#3651927 (Jgreen)
[18:29:44] fundraising-tech-ops: prometheus collector or exporter for postfix metrics - https://phabricator.wikimedia.org/T176495#3651930 (Jgreen) For posterity, we now have delivery rates added by log-scraping: class { 'prometheus::collector::syslog': jobs => { '/var/log/mail.log' => {...
[18:46:21] fundraising-tech-ops: prometheus collector or exporter for postfix metrics - https://phabricator.wikimedia.org/T176495#3651961 (Jgreen) Timing-wise, it looks like this broke on Jul 12, 2017 between 15:59:01 and 16:02:29 UTC.
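The log-scraping approach mentioned in the T176495 comment above can be illustrated with a small sketch. This is not the prometheus::collector::syslog code, whose job configuration is truncated in the log; it is a hypothetical Python example that assumes only postfix's usual `status=<word>` field on delivery lines in mail.log. The function name and the sample log lines are invented for illustration.

```python
import re
from collections import Counter

# Assumption: postfix delivery lines carry a "status=<word>" field,
# e.g. status=sent, status=deferred, status=bounced.
STATUS_RE = re.compile(r"\bstatus=(\w+)")


def count_delivery_statuses(lines):
    """Count delivery outcomes in an iterable of mail.log lines.

    Lines without a status= field (queue manager notices, connects,
    and so on) are ignored.
    """
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines, shaped like postfix output.
    sample = [
        "Oct  2 18:00:01 mx postfix/smtp[101]: A1B2: to=<a@example.org>, status=sent (250 ok)",
        "Oct  2 18:00:02 mx postfix/smtp[102]: C3D4: to=<b@example.org>, status=deferred (timeout)",
        "Oct  2 18:00:03 mx postfix/qmgr[103]: A1B2: removed",
    ]
    for status, total in sorted(count_delivery_statuses(sample).items()):
        print(status, total)
```

A real collector would also need to remember its read position between runs and handle log rotation, which this sketch leaves out.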
[19:36:25] !log turned on CiviMail record creation
[19:36:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:37:35] Fundraising Sprint Synchronized Screaming, Fundraising-Backlog, fundraising-tech-ops: civicrm mail settings vs outcome snafu - https://phabricator.wikimedia.org/T177244#3652101 (XenoRyet) p:Triage>High
[19:39:41] Fundraising Sprint Synchronized Screaming, Fundraising-Backlog: Update postal address in Thank You email - https://phabricator.wikimedia.org/T177230#3652108 (XenoRyet)
[19:40:05] Fundraising Sprint Synchronized Screaming, Fundraising-Backlog: Update postal address in Thank You email - https://phabricator.wikimedia.org/T177230#3652111 (mepps) a:mepps
[19:42:36] Fundraising Sprint Kickstopper, Fundraising Sprint Loose Lego Carpeting, Fundraising Sprint Murphy's Lawyer, Fundraising Sprint Navel Warfare, and 4 others: Reflect all unsubscribes via Silverpop in CiviCRM - https://phabricator.wikimedia.org/T161760#3652114 (Eileenmcnaughton)
[19:42:38] Fundraising Sprint RadioActivewear, Fundraising Sprint Synchronized Screaming, Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM, and 2 others: Reset on_hold in wmf_civicrm_message_email_update - https://phabricator.wikimedia.org/T170350#3652113 (Eileenmcnaughton) Open>Resolved
[19:43:10] Fundraising-Backlog, MediaWiki-extensions-ContributionTracking: DB errors from missing ContributionTracking table - https://phabricator.wikimedia.org/T176229#3652115 (XenoRyet) p:Triage>Low
[19:44:48] Fundraising-Backlog, MediaWiki-extensions-ContributionTracking: DB errors from missing ContributionTracking table - https://phabricator.wikimedia.org/T176229#3618108 (Ejegg) Oh hey, I bet that's related to this obsolete fundraising page: https://wikimediafoundation.org/wiki/L11_1128_Rinfo_short/en/US I...
[19:47:13] Fundraising Sprint Synchronized Screaming, Fundraising-Backlog: adyen cookie error after entering information on Safari - https://phabricator.wikimedia.org/T176913#3652125 (XenoRyet) p:Triage>High
[20:01:28] Fundraising Sprint Synchronized Screaming, Fundraising-Backlog, Unplanned-Sprint-Work: adyen cookie error after entering information on Safari - https://phabricator.wikimedia.org/T176913#3652155 (XenoRyet)
[20:01:43] Fundraising Sprint Synchronized Screaming, Fundraising-Backlog, fundraising-tech-ops, Unplanned-Sprint-Work: civicrm mail settings vs outcome snafu - https://phabricator.wikimedia.org/T177244#3652157 (XenoRyet)
[20:08:14] ejegg: if you grep civi base for verpSeparator
[20:08:15] you find the places where Civi constructs it - no obvious nice function we can re-use
[20:29:37] I'm just going to be away for a bit driving Luke to a play date (yay) and getting coffee (much bigger yay)
[20:49:47] (PS1) Ejegg: Report consumer run time and drush startup time [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/381856
[20:52:59] (PS2) Ejegg: Report consumer run time and drush startup time [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/381856
[21:53:03] Fundraising Sprint Synchronized Screaming, Fundraising-Backlog, fundraising-tech-ops, Unplanned-Sprint-Work: civicrm mail settings vs outcome snafu - https://phabricator.wikimedia.org/T177244#3651911 (Ejegg) Update: this seems to be related to the PHPMailer security patch we deployed way back in...
[21:53:31] Fundraising Sprint Synchronized Screaming, Fundraising-Backlog, fundraising-tech-ops, Unplanned-Sprint-Work: civicrm mail settings vs outcome snafu - https://phabricator.wikimedia.org/T177244#3652407 (Ejegg) Does anyone know what is reading the qmail style VERP records from the bounce mailbox? Th...
[21:58:54] Fundraising Sprint Synchronized Screaming, Fundraising-Backlog, Unplanned-Sprint-Work: adyen cookie error after entering information on Safari - https://phabricator.wikimedia.org/T176913#3641192 (Ejegg) One possibility: we do client-side browser detection and then for Safari do full redirects rather...
[22:19:15] oh hey, something's putting some decent strain on the DBs
[22:23:52] ejegg: I did run some mail update but it ran fairly quickly
[22:24:33] eileen ah, the writes have just fallen off. did you run it in the last 20 min or so?
[22:24:43] yep
[22:25:04] looks like I logged on at 6 past the hour
[22:25:09] I can run more....
[23:35:42] (PS1) Ejegg: Use **kwargs for JobRunner.run() [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/381898
[23:38:06] (CR) Cdentinger: [C: 2] Allow slow-starting jobs [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/368217 (https://phabricator.wikimedia.org/T171873) (owner: Ejegg)
[23:38:48] thanks cwd!
[23:39:03] np
[23:39:06] (Merged) jenkins-bot: Allow slow-starting jobs [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/368217 (https://phabricator.wikimedia.org/T171873) (owner: Ejegg)
[23:49:23] (CR) Cdentinger: [C: 2] Use **kwargs for JobRunner.run() [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/381898 (owner: Ejegg)
[23:52:30] (Merged) jenkins-bot: Use **kwargs for JobRunner.run() [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/381898 (owner: Ejegg)
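For readers unfamiliar with the pattern named in the last patch title, here is a minimal sketch of a run() method that accepts **kwargs. It is not the process-control code: the class body, the `launch` helper, and the option names `timeout` and `slow_start` are all hypothetical and chosen only to show how keyword options can be forwarded without changing every caller's signature.

```python
class JobRunner:
    """Hypothetical stand-in; not the class from process-control."""

    def __init__(self, command):
        self.command = command

    def run(self, **kwargs):
        # Optional settings arrive as keyword arguments, so new options
        # can be added later without touching existing call sites.
        timeout = kwargs.get("timeout", 60)          # hypothetical option
        slow_start = kwargs.get("slow_start", False)  # hypothetical option
        return {
            "command": self.command,
            "timeout": timeout,
            "slow_start": slow_start,
        }


def launch(command, **kwargs):
    # The wrapper forwards whatever options it was given, unchanged.
    return JobRunner(command).run(**kwargs)


if __name__ == "__main__":
    print(launch("echo hi"))
    print(launch("echo hi", timeout=5, slow_start=True))
```

The trade-off with this style is that misspelled option names are silently ignored unless run() validates the keys it receives.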