[00:12:01] Fundraising Sprint Synchronized Screaming, Fundraising Sprint turtles that are robotic that destroy the whole world with their foot, Fundraising-Backlog, Unplanned-Sprint-Work: donatewiki_counts and banner impressions missing in pgehres database - https://phabricator.wikimedia.org/T177331#3678404 (...
[00:25:12] Fundraising-Backlog, fundraising-tech-ops: fundraising database replication lag master thread - https://phabricator.wikimedia.org/T173472#3678445 (Jgreen) >>! In T173472#3677982, @Eileenmcnaughton wrote: > We are directly dropping those tables in the code - so session cleanup isn't the pattern For today...
[00:40:22] Fundraising Sprint Synchronized Screaming, Fundraising Sprint turtles that are robotic that destroy the whole world with their foot, Fundraising-Backlog, Unplanned-Sprint-Work: donatewiki_counts and banner impressions missing in pgehres database - https://phabricator.wikimedia.org/T177331#3678491 (...
[00:44:01] Fundraising-Backlog, fundraising-tech-ops: fundraising database replication lag master thread - https://phabricator.wikimedia.org/T173472#3678507 (Eileenmcnaughton) I added a subtask T178020 for that change - & it's in review now. Looking at this I think the spikes are around about every 30 mins & that i...
[00:44:34] Fundraising Sprint turtles that are robotic that destroy the whole world with their foot, Fundraising-Backlog, fundraising-tech-ops, Patch-For-Review, Unplanned-Sprint-Work: Eliminate creating of temp tables from Omnimailing.load task - https://phabricator.wikimedia.org/T178020#3678119 (Eileen...
[13:59:03] (PS11) Mepps: Handle payment not initiated [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/382513
[14:00:58] (CR) jerkins-bot: [V: -1] Handle payment not initiated [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/382513 (owner: Mepps)
[14:03:59] (PS12) Mepps: Handle payment not initiated [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/382513
[14:54:56] (CR) Mepps: [C: 2] Add sqlite3 to requre-dev [wikimedia/fundraising/SmashPig] - https://gerrit.wikimedia.org/r/383388 (owner: Ejegg)
[14:57:33] (Merged) jenkins-bot: Add sqlite3 to requre-dev [wikimedia/fundraising/SmashPig] - https://gerrit.wikimedia.org/r/383388 (owner: Ejegg)
[15:01:18] (CR) Mepps: [C: 1] "This looks pretty good to me, but maybe XenoRyet should look too?" [wikimedia/fundraising/dash] - https://gerrit.wikimedia.org/r/383708 (owner: Ejegg)
[15:19:10] /nick ejegg
[15:19:48] derp
[15:19:56] hi mepps!
[15:20:02] taking a look at PS12
[15:20:16] hi ejegg, great!
[15:21:15] oh cool, looks like some logic towards T176376
[15:21:15] T176376: Generic orphan rectifier needs to handle contribution tracking - https://phabricator.wikimedia.org/T176376
[15:21:55] Fundraising-Backlog: Minimum amount changes from CA donation form to Paypal - https://phabricator.wikimedia.org/T177415#3679936 (Pcoombe) Open>Resolved a:Pcoombe Great, thanks. I've updated the minimum amounts in donatewiki and banners.
[15:22:54] yup!
[15:23:15] also ejegg do you know why do not solicit in civi doesn't have a default value?
[15:23:42] Huh, I can't think of a specific reason!
[15:39:48] it screws me up whenever i'm trying to create sample data locally--i wonder if this is a pain point for anyone else
[15:43:47] mepps yeah, it's a bit annoying
[15:47:07] Fundraising-Backlog: Ingenico audit issue: transactions not in Civi - https://phabricator.wikimedia.org/T178081#3680036 (MBeat33)
[15:50:51] (CR) Ejegg: [C: 1] "OK, adding the redirect looks like a good way to solve the problem. In that case, can you omit the changes to PaymentResult? Adding an ass" [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/382513 (owner: Mepps)
[15:51:25] mepps that patch looks deployable, just a couple requests ^^^
[15:52:41] PaymentResult seems like an attempt to hide the details of the FinalStatus, so it's a bit odd to add that back on top of the isFailed() type methods
[16:24:12] (CR) Ejegg: "recheck" [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/381499 (owner: Ejegg)
[16:39:34] Fundraising Sprint Synchronized Screaming, Fundraising Sprint turtles that are robotic that destroy the whole world with their foot, Fundraising-Backlog, Unplanned-Sprint-Work: donatewiki_counts and banner impressions missing in pgehres database - https://phabricator.wikimedia.org/T177331#3680205 (...
[16:41:12] (CR) Ejegg: [C: 2] Add editorconfig to various drupal dirs with drupal whitespace standard [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/383300 (https://phabricator.wikimedia.org/T177725) (owner: Eileen)
[16:45:06] (Merged) jenkins-bot: Add editorconfig to various drupal dirs with drupal whitespace standard [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/383300 (https://phabricator.wikimedia.org/T177725) (owner: Eileen)
[16:47:01] Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM: Overlong gateway_txn_id made it to QC - https://phabricator.wikimedia.org/T178086#3680237 (Ejegg)
[16:58:14] Fundraising Sprint Synchronized Screaming, Fundraising Sprint turtles that are robotic that destroy the whole world with their foot, Fundraising-Backlog, Unplanned-Sprint-Work: donatewiki_counts and banner impressions missing in pgehres database - https://phabricator.wikimedia.org/T177331#3680278 (...
[17:10:26] (CR) XenoRyet: [C: 2] Fixes to client-side error logging [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/383650 (https://phabricator.wikimedia.org/T121800) (owner: Ejegg)
[17:11:20] (CR) XenoRyet: [C: 2] Add an off switch for client-side error logging [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/383712 (https://phabricator.wikimedia.org/T121800) (owner: Ejegg)
[17:16:18] (Merged) jenkins-bot: Fixes to client-side error logging [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/383650 (https://phabricator.wikimedia.org/T121800) (owner: Ejegg)
[17:16:36] (Merged) jenkins-bot: Add an off switch for client-side error logging [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/383712 (https://phabricator.wikimedia.org/T121800) (owner: Ejegg)
[17:23:03] (PS13) Mepps: Handle payment not initiated [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/382513
[17:24:13] ejegg changes made ^
[18:11:01] (CR) Mepps: [C: 1] "This looks good to me but I'm having an access error locally with the dedupe page so can't test out the bug or the fix." [wikimedia/fundraising/crm/civicrm] - https://gerrit.wikimedia.org/r/383501 (https://phabricator.wikimedia.org/T177873) (owner: Eileen)
[18:12:02] more donations queue consumer failures :S
[18:12:17] XenoRyet: any luck with your investigation?
[18:12:37] This is starting to feel pretty serious
[18:12:48] I'll see if I can replicate locally
[18:13:44] Yea, I was on the other thing this morning, but I'll switch back over. This is getting serious.
[18:15:48] (PS14) Ejegg: Handle payment not initiated [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/382513 (owner: Mepps)
[18:16:10] ejegg XenoRyet, are you guys discussing the unknown_error?
[18:16:57] (CR) Ejegg: [C: 2] "Looks good! If you want, the test could look at the PaymentResult returned from processDonorReturn instead of getTransactionResponse etc." [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/382513 (owner: Mepps)
[18:17:38] mepps: We're on about the donations queue consume failed with code 1 failmail. Though that unknown_error one looks pretty serious as well.
[18:18:01] XenoRyet, i can look at unknown_error, just didn't want to duplicate work
[18:18:12] Yea, go for it.
[18:23:11] OK, I've got a message queued up with country='AN' and a fake state_province
[18:23:24] let's see what happens when I try importing that
[18:23:42] (Merged) jenkins-bot: Handle payment not initiated [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/382513 (owner: Mepps)
[18:28:33] so, there was no problem locally, let's see what the db tables look like
[18:29:32] The incoming message looks normal too, nothing amiss with the way it came in the door.
[18:30:38] so, no address got created in my local import
[18:31:32] but it kept on going with inserting the contribution
[18:31:49] what kind of settings would cause that?
[18:32:01] I mean, would cause the thing to fail
[18:32:14] Yea, I don't know.
[18:32:49] are the country tables the same locally?
[18:33:45] well, I get the same 'Cannot find country' locally
[18:34:12] let's see what happens next
[18:34:53] Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM: Weird contribution merge in Civi - https://phabricator.wikimedia.org/T178021#3680693 (LeanneS) Thanks @Eileenmcnaughton. Does this issue occur just by having multiple tabs open or when having edit pages open in both tabs?
[18:35:52] Fundraising Sprint Synchronized Screaming, Fundraising Sprint turtles that are robotic that destroy the whole world with their foot, Fundraising-Backlog, Unplanned-Sprint-Work: donatewiki_counts and banner impressions missing in pgehres database - https://phabricator.wikimedia.org/T177331#3680704 (...
[18:37:22] XenoRyet: ah, my local test data needs to have more data than just country
[18:37:39] it's getting thrown out by wmf_civicrm_is_address_valid
[18:37:52] Fundraising Sprint Synchronized Screaming, Fundraising Sprint turtles that are robotic that destroy the whole world with their foot, Fundraising-Backlog, Unplanned-Sprint-Work: donatewiki_counts and banner impressions missing in pgehres database - https://phabricator.wikimedia.org/T177331#3680706 (...
[18:40:42] ooh, that's throwing a thing that's not catchable!
[18:41:06] rolls everything back and does a CiviExit
[18:41:26] worryingly, it looks like our at-least-once processing isn't doing its thing
[18:41:39] that is, we're dropping the message from the queue
[18:41:47] so that's another thing to examine
[18:42:33] for the current crop, I think we insert the missing country and make country_id mandatory in wmf_civicrm_is_address_valid
[18:43:11] XenoRyet: let me know what you think of that solution ^^^
[18:44:12] Sounds reasonable on its face. Let me look.
[18:51:20] Oh hey, Netherlands Antilles is no longer a country
[18:51:34] Well, that's a thing.
[18:51:41] as of 2010 it's part of Netherlands proper again
[18:51:57] I was gonna say though, looks like we are already requiring country_id in _is_address_valid
[18:52:14] XenoRyet: that's an 'any', not an 'all' check
[18:52:32] oh, whoops, misread. You're right.
[18:52:41] and because there was street_address data in there, it was calling it valid
[18:52:58] yea, confused myself for a second there.
[18:53:42] so.. there's another country code to add to the unstaging
[18:57:54] (PS1) Ejegg: PayPal: map Netherlands Antilles to Netherlands [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/383897 (https://phabricator.wikimedia.org/T177803)
[19:04:10] Fundraising-Backlog, fundraising-tech-ops: fundraising database replication lag master thread - https://phabricator.wikimedia.org/T173472#3680762 (Jgreen) >>! In T173472#3678507, @Eileenmcnaughton wrote: > ... Looking at this I think the spikes are around about every 30 mins.... That lines up with the p...
[19:05:15] (PS1) Ejegg: Drop addresses without country [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/383900 (https://phabricator.wikimedia.org/T177803)
[19:06:39] OK, so... I think we need to reconstruct these
[19:08:32] and fix the php-queue/predis popAtomic call to actually leave the message on exit()
[19:09:06] something's not quite right here: https://github.com/CoderKungfu/php-queue/blob/master/src/PHPQueue/Backend/Predis.php#L104
[19:10:13] ah crud, it's deeper, something in the predis driver?
[19:10:20] or maybe we're using the transaction wrong
[19:16:28] Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM: PHP-queue predis backend is losing data when exit() is call in popAtomic callback - https://phabricator.wikimedia.org/T178104#3680821 (Ejegg)
[19:21:12] so... maybe we can just re-parse the latest PayPal audit files?
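The 'any' versus 'all' distinction raised at 18:52 can be illustrated with a minimal Python sketch. This is not the actual wmf_civicrm_is_address_valid code, which is PHP and is not shown in this log. The function names and the field list are assumptions made for illustration; only country_id and street_address are mentioned in the discussion.

```python
# Hypothetical illustration only: not the real wmf_civicrm_is_address_valid.
# The field list is an assumption; only country_id and street_address
# appear in the chat above.
ADDRESS_FIELDS = ("street_address", "city", "postal_code", "country_id")


def is_address_valid_any(address: dict) -> bool:
    """'any' check: a single non-empty field is enough to accept the address."""
    return any(address.get(field) for field in ADDRESS_FIELDS)


def is_address_valid_with_country(address: dict) -> bool:
    """Stricter variant: country_id is mandatory on top of the 'any' check."""
    return bool(address.get("country_id")) and is_address_valid_any(address)


# A message whose country could not be resolved but which still has a street.
unresolved = {"street_address": "1 Example Street", "country_id": None}

print(is_address_valid_any(unresolved))           # accepted by the 'any' check
print(is_address_valid_with_country(unresolved))  # rejected once country is required
```

Under these assumptions, an address with street data but no resolvable country passes the 'any' check, which matches the behaviour described at 18:52:41, and fails once country_id is made mandatory.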
[19:32:26] (CR) XenoRyet: [C: 2] PayPal: map Netherlands Antilles to Netherlands [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/383897 (https://phabricator.wikimedia.org/T177803) (owner: Ejegg)
[19:33:58] (CR) XenoRyet: [C: 2] Drop addresses without country [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/383900 (https://phabricator.wikimedia.org/T177803) (owner: Ejegg)
[19:35:52] (Merged) jenkins-bot: PayPal: map Netherlands Antilles to Netherlands [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/383897 (https://phabricator.wikimedia.org/T177803) (owner: Ejegg)
[19:35:55] Fundraising Sprint turtles that are robotic that destroy the whole world with their foot, Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM, Patch-For-Review: Donation queue consumer crash on unknown country - https://phabricator.wikimedia.org/T177803#3680896 (XenoRyet) a:XenoRyet>Ejegg
[19:42:17] (Merged) jenkins-bot: Drop addresses without country [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/383900 (https://phabricator.wikimedia.org/T177803) (owner: Ejegg)
[19:43:50] (PS1) Ejegg: Merge branch 'master' into deployment [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/383905
[19:44:10] (CR) Ejegg: [C: 2] Merge branch 'master' into deployment [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/383905 (owner: Ejegg)
[19:45:43] (Merged) jenkins-bot: Merge branch 'master' into deployment [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/383905 (owner: Ejegg)
[19:47:59] !log updated CiviCRM from 47de9768872243cbe763acd4184d767ddf69daba to 2d6668b5669118cba2f837e722c09499495fbc60
[19:48:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:49:00] AndyRussG: meeting?
[19:50:42] Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM: Add link to contact summary to assist in finding & deduping the contact - https://phabricator.wikimedia.org/T177999#3677467 (DStrine)
[19:51:00] Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM: Improve major gifts email links when a contact has already been merged so there is not a findy-game - https://phabricator.wikimedia.org/T178000#3680954 (DStrine)
[19:59:33] Fundraising Sprint turtles that are robotic that destroy the whole world with their foot, Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM: Overlong gateway_txn_id made it to QC - https://phabricator.wikimedia.org/T178086#3680963 (DStrine)
[19:59:56] Fundraising Sprint turtles that are robotic that destroy the whole world with their foot, Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM: Overlong gateway_txn_id made it to QC - https://phabricator.wikimedia.org/T178086#3680966 (XenoRyet) a:XenoRyet
[20:04:11] Fundraising-Backlog: Ingenico audit issue: transactions not in Civi - https://phabricator.wikimedia.org/T178081#3680978 (DStrine) p:Triage>High
[20:06:40] Fundraising-Backlog, FR-Adyen, FR-Smashpig: Adyen jobs should retry at least once on connect failure - https://phabricator.wikimedia.org/T177893#3680991 (DStrine)
[20:15:12] Fundraising Sprint turtles that are robotic that destroy the whole world with their foot, Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM, MW-1.31-release-notes (WMF-deploy-2017-10-17 (1.31.0-wmf.4)), Patch-For-Review: Donation queue consumer... - https://phabricator.wikimedia.org/T177803#3670769
[20:16:17] Fundraising Sprint turtles that are robotic that destroy the whole world with their foot, Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM, MW-1.31-release-notes (WMF-deploy-2017-10-17 (1.31.0-wmf.4)), Patch-For-Review: Donation queue consumer... - https://phabricator.wikimedia.org/T177803#3681016
[20:33:30] ejegg: if you get a chance to look at this one https://gerrit.wikimedia.org/r/#/c/383734/ - it might help / give us useful info on replag
[20:33:52] k, that's next on my list!
[20:35:31] Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM: Weird contribution merge in Civi - https://phabricator.wikimedia.org/T178021#3681063 (Eileenmcnaughton) It need to be an editable page in each tab & it confuses which tab the form submission is for :-(
[20:36:59] ejegg: I know it's a long list!
[20:37:09] :)
[20:41:26] Fundraising Sprint turtles that are robotic that destroy the whole world with their foot, Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM: Coinbase import error - https://phabricator.wikimedia.org/T177806#3681098 (Ejegg) a:Eileenmcnaughton>LeanneS Ooh, more constraint violations! We had an...
[20:47:05] (PS3) Ejegg: Fix Omnimailing.load job to bypass creating recipients & to use internal replace, add test. [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/383734 (https://phabricator.wikimedia.org/T178020) (owner: Eileen)
[20:47:30] (CR) Ejegg: [C: 2] "_skip_evil_ indeed! The code looks good." [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/383734 (https://phabricator.wikimedia.org/T178020) (owner: Eileen)
[20:52:22] (Merged) jenkins-bot: Fix Omnimailing.load job to bypass creating recipients & to use internal replace, add test. [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/383734 (https://phabricator.wikimedia.org/T178020) (owner: Eileen)
[21:36:27] !log disabled omnimail jobs
[21:36:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:46:32] (PS1) XenoRyet: Handle additional type of failed recurrance. [wikimedia/fundraising/SmashPig] - https://gerrit.wikimedia.org/r/383949 (https://phabricator.wikimedia.org/T178086)
[23:28:59] (PS1) Eileen: Merge branch 'master' of https://gerrit.wikimedia.org/r/wikimedia/fundraising/crm into deployment [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/383954
[23:29:37] cwd I'm ready to deploy https://gerrit.wikimedia.org/r/#/c/383954/ which bypasses those specific temp tables we were discussing the other day
[23:29:43] (CR) Eileen: [C: 2] Merge branch 'master' of https://gerrit.wikimedia.org/r/wikimedia/fundraising/crm into deployment [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/383954 (owner: Eileen)
[23:30:32] Fundraising-Backlog, fundraising-tech-ops: fundraising database replication lag master thread - https://phabricator.wikimedia.org/T173472#3681659 (Eileenmcnaughton) I am ready to deploy a patch the affects that half hourly job & should eliminate temp tables from it https://gerrit.wikimedia.org/r/#/c/3839...
[23:30:56] eileen1: i actually have all the omnimail jobs off right now
[23:30:57] (Merged) jenkins-bot: Merge branch 'master' of https://gerrit.wikimedia.org/r/wikimedia/fundraising/crm into deployment [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/383954 (owner: Eileen)
[23:31:01] as of 1.5 hours ago or so
[23:31:17] https://grafana.wikimedia.org/dashboard/db/fundraising-database?panelId=28&fullscreen&orgId=1&from=now-12h&to=now&edit&refresh=1m
[23:31:27] cwd yep - I saw that - that's why I wanted to let you know first
[23:31:47] if you click on the slaves (besides 1003) you will see a pattern
[23:32:19] cwd shall I deploy & then we can turn that job back on?
[23:32:22] anyway yes please feel free to deploy!
[23:32:24] yeah
[23:32:32] and we can see if the increase stays stopped
[23:32:38] ok - I just didn't want to cause confusion if you were testing something
[23:32:45] thanks
[23:32:49] all good now
[23:33:05] still speculation as to whether this is even the cause of lag but it's something
[23:34:01] well it's interesting because the job that turns out to have been creating temp tables is fairly low volume so had not been a focus until jeff dug up that stuff from the bin log 2 days ago
[23:34:32] eileen1: is it a specific omnimail job? i turned off all 5
[23:35:50] !log update from 2d6668b5669118cba2f837e722c09499495fbc60 to b953064ec27228f158c3d0e5aa9cdf043cb20256
[23:35:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:36:06] cwd well several of them are the same but with different frequencies
[23:36:15] anyway - I just deployed that patch
[23:36:31] (PS1) Ejegg: Merge branch 'master' into deployment [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/383957
[23:36:36] (CR) Ejegg: [C: 2] Merge branch 'master' into deployment [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/383957 (owner: Ejegg)
[23:37:02] cwd the situation in our bin log actually sounds like that old bug report - with the transactions being written out of order - although not necessarily same underlying
[23:37:15] oops, beat me to it!
[23:37:28] (Abandoned) Ejegg: Merge branch 'master' into deployment [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/383957 (owner: Ejegg)
[23:37:45] i'll turn the job back on now
[23:38:34] cwd there's a (zombie) process of that job running since yesterday
[23:39:08] ah weird
[23:39:20] 27701
[23:39:38] or rather 27711
[23:39:46] lotta process redirection going on there
[23:40:21] want me to kill it?
[23:40:27] I thought it was an artifact of my 'running jobs' prometheus script, but nope, it's actually been running all that time
[23:40:36] cwd yes please
[23:40:48] makes for a cleaner test of the new patch
[23:40:59] ok gone
[23:41:21] cwd the magic line is number 78 in this patch
[23:41:22] https://gerrit.wikimedia.org/r/#/c/383734/3/sites/default/civicrm/extensions/org.wikimedia.omnimail/api/v3/Omnimailing/Load.php
[23:42:08] eileen1: in the before or after?
[23:42:22] the replace one?
[23:42:32] this line
[23:42:32] '_skip_evil_bao_auto_recipients_' => 1,
[23:42:51] passing that parameter in skips…. evil
[23:43:15] evil temp tables?
[23:44:49] autopopulated stuffs
[23:45:08] that don't make any sense for synthetic mailing records like we're inserting
[23:46:13] eileen1: the 'transactional' extension you pointed to doesn't seem to use any of those tricks. looks like they could learn something from that latest commit!
[23:48:18] ejegg: yep probably !
[23:48:44] ejegg: I haven't really looked at that extension yet - it's on my radar
[23:48:50] but I haven't examined it
[23:49:17] so did we re-enable those jobs?
[23:49:56] it's pretty simple, mostly just inserting a MailingEventQueue and Activity for the transactional emails
[23:50:17] I haven't re-enabled them. cwd?
[23:50:18] eileen1: yep
[23:50:26] !log re-enabled omnimail jobs
[23:50:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:51:00] so we should see or not see a spike in temp tables in 10-15 minutes
[23:52:21] well there were 5 jobs
[23:52:27] & I think 4 are the same job
[23:52:33] & that is the one that got fixed
[23:52:38] & the other doesn't use temp tables
[23:52:47] so my money says they stop climbing
[23:53:02] if they stop climbing I vote we try to get a restart on services
[23:53:17] so we have a clean baseline to track the re-emergence of replag from
[23:53:21] and see if it all stays at -?
[23:53:24] and by - i meant 0