[03:30:06] PROBLEM - check_procs on frdev1001 is CRITICAL: PROCS CRITICAL: 1179 processes
[04:05:15] PROBLEM - check_procs on frdev1001 is CRITICAL: PROCS CRITICAL: 1240 processes
[04:15:06] PROBLEM - check_procs on frdev1001 is CRITICAL: PROCS CRITICAL: 1124 processes
[04:25:15] RECOVERY - check_procs on frdev1001 is OK: PROCS OK: 450 processes
[04:35:15] PROBLEM - check_procs on frdev1001 is CRITICAL: PROCS CRITICAL: 1236 processes
[05:08:09] !log killed some dedupe queries on staging that were causing alerts
[05:08:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:15:15] RECOVERY - check_procs on frdev1001 is OK: PROCS OK: 283 processes
[13:48:17] good morning jgleeson!
[13:48:29] morning mepps :)
[13:48:36] how's it going?
[13:49:45] how was your conference?
[13:50:17] oops I got disconnected
[13:50:31] mepps, how was the conference :)
[13:50:41] jgleeson...a little weird
[13:51:45] really? how so
[14:44:19] Fundraising-Backlog: Help to deploy new Thank you email in Dutch (with updated tax copy) - https://phabricator.wikimedia.org/T193228#4164069 (Pcoombe) I believe the tax deductibility link will work as is. However afaik there isn't currently an option to change the email for specific countries, so we can add...
[14:50:49] Wikimedia-Fundraising-Banners: Fix Lint errors for current fundraising banners - https://phabricator.wikimedia.org/T192480#4164097 (Pcoombe) They contain wikitext for simple links, templates etc.
[15:04:20] fr-tech ok, so we had a bunch more donations hung up with constraint violations last night
[15:04:31] I'm requeueing them all now
[15:09:16] is that the 42 REMOVAL emails ejegg ?
[15:12:41] yeah
[15:12:58] is this the issue that you and eileen were discussing yesterday ejegg?
[15:13:10] again it was in the middle of a huge storm of donation imports
[15:13:18] from the ingenico audit
[15:13:23] mepps yeah, exactly
[15:13:36] So, we've gotta do something about this, just not sure what
[15:13:44] what did you discover yesterday ejegg?
[15:14:04] very little :(
[15:14:26] neither of us have been able to recreate the issue
[15:14:33] we can cause deadlocks locally
[15:14:51] but not in a way that mimics the error messages we see in prod
[15:15:35] where it seems like a contact ID is returned, but then is invalid when we go to insert the contribution
[15:15:51] and yet despite the 'contact_id is invalid' message
[15:15:53] in the meantime ejegg my crm install has totally broken :(
[15:16:03] ah weird, have we been able to find that contact_id?
[15:16:11] we get a contribution ID back and go on to insert the wmf_contribution_extra row
[15:16:14] did we recently set up any deduping on import?
[15:16:30] which THEN causes the constraint vio
[15:16:38] mepps only for manual CSV imports
[15:17:03] the big change to queue imports (and all contrib writes)
[15:17:18] is the new triggers for updating wmf_donor
[15:17:35] seems like doing that in the DB should have been less intensive than doing it in hooks
[15:17:45] (and more reliable)
[15:18:02] but I guess it's making that one update too likely to hit deadlocks
[15:18:30] so... if we were afraid of everything falling down this weekend we COULD disable those triggers
[15:18:35] but I'm not too worried about that
[15:18:59] especially since it turns out the recurring charge job has some good defensive logic
[15:19:14] to update the next_charge_date before it tries inserting the contribution
[15:19:37] I still think this would be a good idea, though: https://gerrit.wikimedia.org/r/429118
[15:20:23] mepps want to debug your local install via hangout?
[15:20:48] ejegg, we can try! but we also have the advancement talk in 10
[15:21:12] oh right, well, let's look at it for 5 min at least?
[15:21:35] i'm in the usual hangout
[15:26:43] mepps, re:your local problems, try out vagrant! it's building the fr-tech stack without issue currently
[15:26:57] actually I remember you had some issues with your version of vagrant?
[15:27:13] the windowsy error
[15:27:16] :(
[15:31:55] yeah :(
[16:54:58] Wikimedia-Fundraising-Banners: Fix Lint errors for current fundraising banners - https://phabricator.wikimedia.org/T192480#4164378 (Pcoombe) Open>Resolved Okay, I've fixed our template banners. Diffs: * [Desktop large](https://meta.wikimedia.org/w/index.php?title=MediaWiki:Centralnotice-template-B17...
[17:19:49] fundraising-tech-ops, Operations, ops-eqiad: rack frbast1001 - https://phabricator.wikimedia.org/T187363#4164448 (Jgreen) Open>Resolved Casey's got this host up and running, closing task!
[17:34:19] cwd, Jeff_Green, Grafana is showing request errors when trying to hit 'frack.codfw prometheus' as a data source. The response being sent back contains:
[17:34:21] Error
[17:34:21] Our servers are currently under maintenance or experiencing a technical problem.
[17:34:21] Please try again in a few minutes.
[17:34:46] you can inspect the full response here, https://grafana.wikimedia.org/dashboard/db/fundraising-overview?refresh=5s&orgId=1&from=now-7d&to=now&panelId=20&fullscreen
[17:34:59] click on the red icon, and click the response tab
[17:35:07] ok, that sounds like a grafana issue, I'll ask the SRE team
[17:35:28] thanks Jeff_Green
[17:35:38] thanks for the heads up
[17:36:51] Jeff_Green: the thing is, we can see results for all the rest
[17:37:04] I think it's a problem with the data for those specific graphs
[17:37:10] ejegg, from what I can see, the others use a difference data source
[17:37:15] different*
[17:37:25] but grafana is hiding the error message
[17:37:39] ^^
[17:37:47] jgleeson oh hey, does it show anything more when logged in to the admin URL?
[17:37:59] iirc grafana has to be able to pull that data from our collectors at codfw, so it could be a firewall or other config issue
[17:38:23] ejegg, when you browse here do you see the red icon top left? https://grafana.wikimedia.org/dashboard/db/fundraising-overview?refresh=5s&orgId=1&from=now-7d&to=now&panelId=20&fullscreen
[17:38:30] if you click that, can you see the response tab?
[17:38:41] yeah, it shows that generic error
[17:38:44] that shows a html error page
[17:38:57] with some server related failure
[17:38:58] If you report this error to the Wikimedia System Administrators, please include the details below. Request from 2a02:c7f:940c:ec00:b4c7:cbe0:e549:5dea via cp1051 cp1051, Varnish XID 155486545 Error: 502, Bad Gateway at Fri, 27 Apr 2018 17:38:32 GMT
[17:39:07] ah, still generic when I look via grafana-admin
[17:39:35] well, I'mma grab some lunch, back in a bit!
[17:39:46] enjoy!
[17:44:04] fundraising-tech-ops, Patch-For-Review: DNS name for new fundraising bastion - https://phabricator.wikimedia.org/T193178#4164474 (Jgreen)
[18:00:53] sounds like grafana, hitting the cache proxy servers
[18:01:03] we don't have those in fundraising
[18:02:06] interesting
[18:02:11] varnish?
[18:02:21] yep
[18:02:36] most of WP is served from the varnish cache
[18:03:01] only updates etc will hit php
[18:50:38] Fundraising-Backlog: 2 fundraising tech grafana dashboards broken - https://phabricator.wikimedia.org/T193056#4164587 (jgleeson) @Jeff_G has informed SRE
[18:50:53] hmm Jeff_Green, what's your Phabricator handle
[18:51:08] Jeff_G looks a little different
[18:51:45] yeah that's some other dude, I think maybe it's maybe jgreen?
[18:52:04] got it
[18:52:49] Fundraising Sprint Ivory and eggshell white are the same color, Fundraising-Backlog: 2 fundraising tech grafana dashboards broken - https://phabricator.wikimedia.org/T193056#4164608 (jgleeson)
[18:53:11] hehehe
[19:05:07] Fundraising Sprint Ivory Tower Defense Games, Fundraising Sprint Unbreaking Now, Fundraising Sprint Value Subtracting, Fundraising-Backlog, and 2 others: Adyen Capture Success - any way to retrieve donor details if missing pending message? - https://phabricator.wikimedia.org/T193266#4164664 (Ejegg...
[19:07:17] Fundraising Sprint Ivory Tower Defense Games, Fundraising Sprint Ivory and eggshell white are the same color, Fundraising Sprint Unbreaking Now, Fundraising Sprint Value Subtracting, and 4 others: Capture Adyen payments without pending messages - https://phabricator.wikimedia.org/T149861#4164677 (...
[19:30:30] (PS15) Mepps: Capture Adyen payments missing pending messages [wikimedia/fundraising/SmashPig] - https://gerrit.wikimedia.org/r/319489 (https://phabricator.wikimedia.org/T149861) (owner: Ejegg)
[19:30:35] (CR) Mepps: [C: 2] Capture Adyen payments missing pending messages [wikimedia/fundraising/SmashPig] - https://gerrit.wikimedia.org/r/319489 (https://phabricator.wikimedia.org/T149861) (owner: Ejegg)
[19:30:40] :) thanks mepps!
[19:31:00] (Merged) jenkins-bot: Capture Adyen payments missing pending messages [wikimedia/fundraising/SmashPig] - https://gerrit.wikimedia.org/r/319489 (https://phabricator.wikimedia.org/T149861) (owner: Ejegg)
[20:13:43] (PS1) Ejegg: WIP associate tokenized payments with recur id [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/429475
[20:21:30] (PS2) Ejegg: WIP associate tokenized payments with recur id [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/429475
[20:27:54] (PS1) Ejegg: WIP add payment processor id to recur record [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/429519
[21:12:51] (PS9) Ejegg: Civix stub for SmashPig payment processor extension [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/426068 (https://phabricator.wikimedia.org/T1888678)
[23:07:35] Fundraising-Backlog, Fr-CiviCRM-dedupe-FY2017/18: Civi: open Merge & View Result in new tab - https://phabricator.wikimedia.org/T193307#4165552 (MBeat33)
[23:48:44] (CR) Ejegg: "This looks great! Some minor comments, but we could totally use this as-is." (4 comments) [wikimedia/fundraising/SmashPig] - https://gerrit.wikimedia.org/r/427198 (owner: Jgleeson)