[00:07:36] !log re-enabled pending queue consumer
[00:07:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:10:15] !log re-enabled fundraising audit jobs
[00:10:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:11:46] fr-tech intentionally failmail incoming
[00:12:04] cool, old behavior still works for lock file contention
[00:12:04] cool ejegg-you're working late!
[00:12:17] mepps it's even later for you!
[00:12:44] * ejegg is really looking forward to that new hire
[00:12:54] ejegg: are you central time there?
[00:13:15] cwd one hour away from eastern, i think that's central
[00:13:19] GMT-5
[00:14:09] yepyep
[00:14:19] well we've got all the timezones covered now
[00:14:23] in the US
[00:14:33] need hawaii :)
[00:14:39] touche
[00:15:30] but jeff sessions says it's just some island in the pacific!
[00:15:37] haha
[00:16:04] still better treatment than some island in the caribbean
[00:16:25] !log re-enabled omnimail import jobs
[00:16:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:16:51] ok, this time re-running the recipient load SHOULDN'T failmail
[00:16:56] you could totally tell that this was the first time the orange man learned that the US supports puerto rico
[00:17:33] oops, the last run was actually done
[00:19:16] (CR) Mepps: [C: 2] Fix UI batch merge mishandling context paramter. [wikimedia/fundraising/crm/civicrm] - https://gerrit.wikimedia.org/r/381920 (https://phabricator.wikimedia.org/T176256) (owner: Eileen)
[00:19:31] (CR) Mepps: [C: 2] CRM-21202, set retrieve = true so that its not reinitialized again and just want the DataTable instance to be returned [wikimedia/fundraising/crm/civicrm] - https://gerrit.wikimedia.org/r/381912 (https://phabricator.wikimedia.org/T176256) (owner: Eileen)
[00:20:02] !log re-enabled fundraising stats and export jobs
[00:20:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:21:14] (CR) Mepps: [C: 2] CRM--21248 Fix merge screen conflict listing [wikimedia/fundraising/crm/civicrm] - https://gerrit.wikimedia.org/r/381921 (https://phabricator.wikimedia.org/T176256) (owner: Eileen)
[00:22:15] (CR) Mepps: [C: 2] CRM-21224 get LIMIT out of the where string. [wikimedia/fundraising/crm/civicrm] - https://gerrit.wikimedia.org/r/381922 (owner: Eileen)
[00:24:07] !log re-enabled civicrm cron and ingenico jobs
[00:24:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:26:20] (Merged) jenkins-bot: Fix UI batch merge mishandling context paramter. [wikimedia/fundraising/crm/civicrm] - https://gerrit.wikimedia.org/r/381920 (https://phabricator.wikimedia.org/T176256) (owner: Eileen)
[00:26:22] (Merged) jenkins-bot: CRM-21202, set retrieve = true so that its not reinitialized again and just want the DataTable instance to be returned [wikimedia/fundraising/crm/civicrm] - https://gerrit.wikimedia.org/r/381912 (https://phabricator.wikimedia.org/T176256) (owner: Eileen)
[00:26:39] (Merged) jenkins-bot: CRM--21248 Fix merge screen conflict listing [wikimedia/fundraising/crm/civicrm] - https://gerrit.wikimedia.org/r/381921 (https://phabricator.wikimedia.org/T176256) (owner: Eileen)
[00:27:18] (Merged) jenkins-bot: CRM-21224 get LIMIT out of the where string. [wikimedia/fundraising/crm/civicrm] - https://gerrit.wikimedia.org/r/381922 (owner: Eileen)
[00:29:16] !log re-enabled CiviCRM de-dupe jobs
[00:29:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:29:55] (CR) Mepps: Fix a couple base test case things (1 comment) [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/381499 (owner: Ejegg)
[00:33:34] !log re-enabled fundraising queue consumers
[00:33:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:45:13] (CR) Eileen: Fix a couple base test case things (1 comment) [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/381499 (owner: Ejegg)
[00:57:27] Fundraising Sprint RadioActivewear, Fundraising Sprint Synchronized Screaming, Fundraising-Backlog, Patch-For-Review: process-control should make slow-starting jobs easier - https://phabricator.wikimedia.org/T171873#3478936 (Ejegg) Open>Resolved This works!
[00:58:22] Fundraising Sprint RadioActivewear, Fundraising Sprint Synchronized Screaming, Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM, Patch-For-Review: CiviCRM dedupe jobs should gracefully time out - https://phabricator.wikimedia.org/T172303#3494111 (Ejegg) still sending failmail, at least when...
[01:10:52] thanks mepps wasn't expecting you to be working at this time
[01:35:12] RECOVERY - check_swap on civi1001 is OK: SWAP OK - 96% free (7267 MB out of 7623 MB)
[01:55:23] PROBLEM - check_swap on mintaka is CRITICAL: SWAP CRITICAL - 46% free (3489 MB out of 7627 MB)
[02:00:32] PROBLEM - check_swap on mintaka is CRITICAL: SWAP CRITICAL - 11% free (802 MB out of 7627 MB)
[02:05:22] RECOVERY - check_swap on mintaka is OK: SWAP OK - 96% free (7286 MB out of 7627 MB)
[02:56:21] (PS5) Eileen: Remove duplicate spaces, html & and odd whitespace in name fields [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/382062 (https://phabricator.wikimedia.org/T175744)
[02:56:23] (PS1) Eileen: Remove attempt to set sort_name & display_name. [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/382106
[03:45:33] (PS1) Eileen: Resolve conflicts on asymetrical whitespace when merging [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/382108 (https://phabricator.wikimedia.org/T175746)
[03:47:53] Fundraising-Backlog: Ask GlobalCollect to let us cancel status 600 transactions - https://phabricator.wikimedia.org/T114205#1687915 (DStrine) I just moved this back to the analysis column to be reconsidered soonish.
[04:05:26] Fundraising-Backlog: Impressions from Big English tests - https://phabricator.wikimedia.org/T177328#3654890 (Jksamra) @jrobell @Pcoombe @AndyRussG Attached is a table of the ratio of large impressions to 1st BH banners for these two tests. There is a ~60% drop in the ratios from 9/20 to 9/27 . This is driven...
[04:11:02] (PS2) Eileen: Resolve conflicts on asymetrical whitespace when merging [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/382108 (https://phabricator.wikimedia.org/T175746)
[04:11:04] (PS1) Eileen: Resolve conflicts on asymetrical whitespace when merging [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/382109 (https://phabricator.wikimedia.org/T175748)
[04:38:57] (PS1) Eileen: Function extraction [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/382110 (https://phabricator.wikimedia.org/T175748)
[04:48:34] (PS2) Eileen: Resolve conflicts on asymetrical whitespace when merging [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/382109 (https://phabricator.wikimedia.org/T175748)
[04:48:36] (PS2) Eileen: Function extraction [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/382110 (https://phabricator.wikimedia.org/T175748)
[04:48:38] (PS1) Eileen: Numbers are not people [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/382112 (https://phabricator.wikimedia.org/T175747)
[04:52:43] (PS3) Eileen: Resolve conflicts on asymetrical whitespace when merging [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/382108 (https://phabricator.wikimedia.org/T175746)
[05:17:02] (PS3) Eileen: Resolve conflicts on asymetrical punctuation when merging [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/382109 (https://phabricator.wikimedia.org/T175748)
[05:17:04] (PS3) Eileen: Function extraction [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/382110 (https://phabricator.wikimedia.org/T175748)
[05:17:06] (PS2) Eileen: Numbers are not people [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/382112 (https://phabricator.wikimedia.org/T175747)
[05:19:33] (PS4) Eileen: Resolve conflicts on asymetrical punctuation when merging [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/382109 (https://phabricator.wikimedia.org/T175748)
[05:20:38] (PS4) Eileen: Function extraction [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/382110 (https://phabricator.wikimedia.org/T175748)
[05:20:40] (PS3) Eileen: Numbers are not people [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/382112 (https://phabricator.wikimedia.org/T175747)
[05:24:02] Fundraising Sprint Prank Seatbelt, Fundraising Sprint Quill Pencil, Fundraising Sprint RadioActivewear, Fundraising Sprint Synchronized Screaming, and 3 others: Drop IDs on cache tables - https://phabricator.wikimedia.org/T174404#3656156 (Eileenmcnaughton) ejegg - shall we pull this out of the sp...
[08:14:35] Fundraising Sprint Synchronized Screaming, Fundraising-Backlog, Unplanned-Sprint-Work: Email click data not tracking properly - https://phabricator.wikimedia.org/T177331#3656374 (jrobell) Thank you for flagging and working on this @Ejegg //The lines are all still in the main Hive web logs, so nothi...
[09:37:40] Fundraising-Backlog: Impressions from Big English tests - https://phabricator.wikimedia.org/T177328#3656615 (jrobell) Thank you @Jksamra. Those ratios look very low to me indeed. @AndyRussG I am sending you the full reports from these tests over email as well. It would be great to get your help to dig int...
[15:27:37] ejegg why don't we save the donor_id in the paypal message? is it available when the message is saved?
[15:31:37] mepps you mean the payer_id ?
[15:31:44] I guess we didn't think we needed it
[15:32:03] Where are you getting errors for it being missing?
[15:43:52] ooh, donation queue consume failure?
[15:46:46] weird, DB Error: Unknown error, but it killed the whole queue consumer
[15:47:27] seems to have been transient
[15:48:44] happening right now?
[15:48:56] at :39 past the hr
[15:49:12] db is under load... think we can dig the actual error up?
[15:49:27] i'll check the mysql log
[15:49:45] cwd the error itself was at 2017-10-04 15:40:14,261
[15:50:07] (it was the run that started at 15:39)
[15:51:45] looks like a deadlock
[15:52:42] inserting into civicrm_email
[15:53:58] i can probably determine from this log which txn was allowed to finish
[15:54:01] shoot, those are pretty common, but they never used to kill the whole consumer
[15:55:16] PROBLEM - check_redis on frqueue1001 is CRITICAL: CRITICAL: donations is 11654 10000, payments-init is 1299 1000 - REDIS 2.8.17 on 127.0.0.1:6379 has 1 databases (db0) with 8 keys, up 7 days 23 hours - memory use is 14.54M (peak 14.59M, 0.21% of max, fragmentation 1.19%), connected_slaves is 2, jobs is 0, jobs-adyen is 55, jobs-paypal is 731, payments-antifraud is 912, pending is 410, recurring is 107, refund is 0, unsubscribe
[15:55:29] uhm
[15:55:34] yoiks
[15:55:44] what kind of speed are we getting here
[15:57:01] is something wrong with the donation consumer?
[15:57:06] averaging above 600 every run
[15:58:23] perhaps we just got an unprecedented amount of donations
[15:59:54] that's a possibility
[15:59:59] going to turn off dedupe
[16:00:07] PROBLEM - check_redis on frqueue1001 is CRITICAL: CRITICAL: donations is 12594 10000 - REDIS 2.8.17 on 127.0.0.1:6379 has 1 databases (db0) with 8 keys, up 7 days 23 hours - memory use is 14.70M (peak 14.97M, 0.23% of max, fragmentation 1.30%), connected_slaves is 2, jobs is 0, jobs-adyen is 22, jobs-paypal is 764, payments-antifraud is 567, payments-init is 869, pending is 476, recurring is 132, refund is 0, unsubscribe is 0
[16:01:48] !log disabled CiviCRM dedupe jobs
[16:01:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:05:07] PROBLEM - check_redis on frqueue1001 is CRITICAL: CRITICAL: donations is 13811 10000 - REDIS 2.8.17 on 127.0.0.1:6379 has 1 databases (db0) with 8 keys, up 7 days 23 hours - memory use is 14.83M (peak 15.41M, 0.23% of max, fragmentation 1.28%), connected_slaves is 2, jobs is 0, jobs-adyen is 79, jobs-paypal is 635, payments-antifraud is 234, payments-init is 397, pending is 499, recurring is 38, refund is 0, unsubscribe is 0
[16:10:07] PROBLEM - check_redis on frqueue1001 is CRITICAL: CRITICAL: donations is 14654 10000, payments-init is 1191 1000 - REDIS 2.8.17 on 127.0.0.1:6379 has 1 databases (db0) with 8 keys, up 7 days 23 hours - memory use is 16.56M (peak 16.66M, 0.24% of max, fragmentation 1.19%), connected_slaves is 2, jobs is 0, jobs-adyen is 65, jobs-paypal is 693, payments-antifraud is 829, pending is 478, recurring is 59, refund is 0, unsubscribe
[16:10:42] Damn, stopping de-dupe didn't speed it up much
[16:12:09] frdb1002 is about to start yalping about replag
[16:12:17] yup
[16:12:31] what's with the high reads on frdb1003?
[16:12:39] is that frdb-read right now?
[16:13:06] no idea
[16:13:08] Fundraising-Backlog: Minimum amount changes from CA donation form to Paypal - https://phabricator.wikimedia.org/T177415#3657827 (MBeat33)
[16:13:09] it's 1002 again
[16:14:53] ejegg: 1003 is a different OS (stretch) so it's possible that mysql config/behavior would be a little different
[16:15:17] PROBLEM - check_redis on frqueue1001 is CRITICAL: CRITICAL: donations is 15612 10000 - REDIS 2.8.17 on 127.0.0.1:6379 has 1 databases (db0) with 8 keys, up 7 days 23 hours - memory use is 16.79M (peak 16.85M, 0.26% of max, fragmentation 1.26%), connected_slaves is 2, jobs is 0, jobs-adyen is 17, jobs-paypal is 672, payments-antifraud is 502, payments-init is 753, pending is 233, recurring is 82, refund is 0, unsubscribe is 0
[16:20:07] PROBLEM - check_redis on frqueue1001 is CRITICAL: CRITICAL: donations is 16820 10000 - REDIS 2.8.17 on 127.0.0.1:6379 has 1 databases (db0) with 9 keys, up 7 days 23 hours - memory use is 17.24M (peak 17.89M, 0.27% of max, fragmentation 1.29%), connected_slaves is 2, jobs is 0, jobs-adyen is 51, jobs-paypal is 698, payments-antifraud is 198, payments-init is 339, pending is 427, recurring is 129, refund is 1, unsubscribe is 0
[16:25:17] PROBLEM - check_redis on frqueue1001 is CRITICAL: CRITICAL: donations is 17558 10000, payments-init is 1116 1000 - REDIS 2.8.17 on 127.0.0.1:6379 has 1 databases (db0) with 9 keys, up 7 days 23 hours - memory use is 18.50M (peak 18.57M, 0.28% of max, fragmentation 1.24%), connected_slaves is 2, jobs is 0, jobs-adyen is 61, jobs-paypal is 599, payments-antifraud is 777, pending is 459, recurring is 69, refund is 1, unsubscribe
[16:30:07] PROBLEM - check_redis on frqueue1001 is CRITICAL: CRITICAL: donations is 18282 10000 - REDIS 2.8.17 on 127.0.0.1:6379 has 1 databases (db0) with 9 keys, up 7 days 23 hours - memory use is 18.89M (peak 19.03M, 0.29% of max, fragmentation 1.24%), connected_slaves is 2, jobs is 0, jobs-adyen is 12, jobs-paypal is 644, payments-antifraud is 563, payments-init is 760, pending is 350, recurring is 103, refund is 1, unsubscribe is 0
[16:35:16] PROBLEM - check_redis on frqueue1001 is CRITICAL: CRITICAL: donations is 19380 10000 - REDIS 2.8.17 on 127.0.0.1:6379 has 1 databases (db0) with 9 keys, up 7 days 23 hours - memory use is 19.37M (peak 19.77M, 0.30% of max, fragmentation 1.27%), connected_slaves is 2, jobs is 0, jobs-adyen is 73, jobs-paypal is 682, payments-antifraud is 213, payments-init is 378, pending is 249, recurring is 130, refund is 1, unsubscribe is 0
[16:40:07] PROBLEM - check_redis on frqueue1001 is CRITICAL: CRITICAL: donations is 19780 10000, payments-init is 1222 1000 - REDIS 2.8.17 on 127.0.0.1:6379 has 1 databases (db0) with 9 keys, up 7 days 23 hours - memory use is 20.85M (peak 20.94M, 0.30% of max, fragmentation 1.20%), connected_slaves is 2, jobs is 0, jobs-adyen is 60, jobs-paypal is 643, payments-antifraud is 894, pending is 482, recurring is 150, refund is 1, unsubscribe
[16:40:08] PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1210
[16:45:07] PROBLEM - check_redis on frqueue1001 is CRITICAL: CRITICAL: donations is 21413 10000 - REDIS 2.8.17 on 127.0.0.1:6379 has 1 databases (db0) with 9 keys, up 7 days 23 hours - memory use is 20.92M (peak 21.13M, 0.32% of max, fragmentation 1.24%), connected_slaves is 2, jobs is 0, jobs-adyen is 16, jobs-paypal is 638, payments-antifraud is 551, payments-init is 760, pending is 475, recurring is 34, refund is 1, unsubscribe is 0
[16:45:08] PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1278
[16:50:07] PROBLEM - check_redis on frqueue1001 is CRITICAL: CRITICAL: donations is 21960 10000 - REDIS 2.8.17 on 127.0.0.1:6379 has 1 databases (db0) with 9 keys, up 8 days 4 minutes - memory use is 21.40M (peak 21.93M, 0.32% of max, fragmentation 1.23%), connected_slaves is 2, jobs is 0, jobs-adyen is 58, jobs-paypal is 620, payments-antifraud is 246, payments-init is 381, pending is 385, recurring is 55, refund is 1, unsubscribe is 0
[16:50:08] PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1324
[16:55:07] PROBLEM - check_redis on frqueue1001 is CRITICAL: CRITICAL: donations is 22636 10000, payments-init is 1179 1000 - REDIS 2.8.17 on 127.0.0.1:6379 has 1 databases (db0) with 9 keys, up 8 days 9 minutes - memory use is 22.58M (peak 22.76M, 0.33% of max, fragmentation 1.19%), connected_slaves is 2, jobs is 0, jobs-adyen is 61, jobs-paypal is 621, payments-antifraud is 835, pending is 452, recurring is 79, refund is 1, unsubscribe
[16:55:08] PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1407
[16:58:01] Fundraising Sprint Synchronized Screaming, Fundraising-Backlog, Patch-For-Review: Update postal address in Thank You email - https://phabricator.wikimedia.org/T177230#3658013 (jrobell) Hi @Ejegg I checked all the languages and these ones are the ones missing: Catalan Danish German Spanish Hebr...
[17:00:17] PROBLEM - check_redis on frqueue1001 is CRITICAL: CRITICAL: donations is 23598 10000 - REDIS 2.8.17 on 127.0.0.1:6379 has 1 databases (db0) with 9 keys, up 8 days 14 minutes - memory use is 22.46M (peak 22.76M, 0.34% of max, fragmentation 1.23%), connected_slaves is 2, jobs is 0, jobs-adyen is 17, jobs-paypal is 616, payments-antifraud is 511, payments-init is 713, pending is 455, recurring is 109, refund is 1, unsubscribe is
[17:00:17] PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1456
[17:05:07] PROBLEM - check_redis on frqueue1001 is CRITICAL: CRITICAL: donations is 23718 10000 - REDIS 2.8.17 on 127.0.0.1:6379 has 1 databases (db0) with 8 keys, up 8 days 19 minutes - memory use is 21.05M (peak 23.12M, 0.34% of max, fragmentation 1.33%), connected_slaves is 2, jobs is 0, jobs-adyen is 58, jobs-paypal is 430, payments-antifraud is 89, payments-init is 173, pending is 73, recurring is 34, refund is 0, unsubscribe is 0
[17:05:08] PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1510
[17:09:11] oops, these failed to send: ejegg why don't we save the donor_id in the paypal message? is it available when the message is saved?
[17:09:11] 1:07 PM ejegg in paypal_express/var_map it looks like it's donor_id
[17:09:11] 1:08 PM and it's causing issues in the orphan recity function
[17:10:16] PROBLEM - check_redis on frqueue1001 is CRITICAL: CRITICAL: donations is 22356 10000 - REDIS 2.8.17 on 127.0.0.1:6379 has 1 databases (db0) with 8 keys, up 8 days 24 minutes - memory use is 19.42M (peak 23.12M, 0.34% of max, fragmentation 1.44%), connected_slaves is 2, jobs is 0, jobs-adyen is 42, jobs-paypal is 164, payments-antifraud is 203, payments-init is 346, pending is 120, recurring is 50, refund is 0, unsubscribe is 0
[17:10:17] PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1605
[17:15:16] PROBLEM - check_redis on frqueue1001 is CRITICAL: CRITICAL: donations is 21704 10000 - REDIS 2.8.17 on 127.0.0.1:6379 has 1 databases (db0) with 7 keys, up 8 days 29 minutes - memory use is 18.44M (peak 23.12M, 0.34% of max, fragmentation 1.51%), connected_slaves is 2, jobs is 0, jobs-adyen is 27, jobs-paypal is 149, payments-antifraud is 102, payments-init is 190, pending is 0, recurring is 59, refund is 0, unsubscribe is 0
[17:15:17] PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1648
[17:20:07] PROBLEM - check_redis on frqueue1001 is CRITICAL: CRITICAL: donations is 19970 10000 - REDIS 2.8.17 on 127.0.0.1:6379 has 1 databases (db0) with 7 keys, up 8 days 34 minutes - memory use is 17.24M (peak 23.12M, 0.33% of max, fragmentation 1.58%), connected_slaves is 2, jobs is 0, jobs-adyen is 52, jobs-paypal is 165, payments-antifraud is 45, payments-init is 85, pending is 0, recurring is 82, refund is 0, unsubscribe is 0
[17:20:07] PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1699
[17:25:07] PROBLEM - check_redis on frqueue1001 is CRITICAL: CRITICAL: donations is 18476 10000 - REDIS 2.8.17 on 127.0.0.1:6379 has 1 databases (db0) with 8 keys, up 8 days 39 minutes - memory use is 16.21M (peak 23.12M, 0.33% of max, fragmentation 1.68%), connected_slaves is 2, jobs is 0, jobs-adyen is 53, jobs-paypal is 129, payments-antifraud is 123, payments-init is 243, pending is 90, recurring is 23, refund is 0, unsubscribe is 0
[17:25:08] PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1785
[17:25:09] mepps So what's the error exactly?
[17:25:26] also, fr-tech, any news for scrum of scrums? besides all that ^^^
[17:25:33] Express Checkout PayerID is missing.
[17:25:59] i'm not sure if there's a way to make a parameter optional on our end?
[17:26:04] i think we might have required it, not paypal
[17:30:07] PROBLEM - check_redis on frqueue1001 is CRITICAL: CRITICAL: donations is 17201 10000 - REDIS 2.8.17 on 127.0.0.1:6379 has 1 databases (db0) with 8 keys, up 8 days 44 minutes - memory use is 15.15M (peak 23.12M, 0.33% of max, fragmentation 1.80%), connected_slaves is 2, jobs is 0, jobs-adyen is 12, jobs-paypal is 135, payments-antifraud is 80, payments-init is 176, pending is 107, recurring is 30, refund is 0, unsubscribe is 0
[17:30:08] PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1824
[17:31:07] mepps it looks like we do log an info-level 'Notice missing PayerID in PaypalExpressAdapater::ProcessDonorReturn'
[17:31:17] but that shouldn't stop things from proceding
[17:31:26] right i'm getting that warning too
[17:31:33] but then getting this error after it proceeds
[17:31:47] when calling GetExpressCheckoutDetails ?
[17:32:02] oops, gotta meeting for a bit!
[17:32:07] okay!
[17:35:16] PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1797
[17:35:17] PROBLEM - check_redis on frqueue1001 is CRITICAL: CRITICAL: donations is 15870 10000 - REDIS 2.8.17 on 127.0.0.1:6379 has 1 databases (db0) with 8 keys, up 8 days 49 minutes - memory use is 14.11M (peak 23.12M, 0.33% of max, fragmentation 1.92%), connected_slaves is 2, jobs is 0, jobs-adyen is 55, jobs-paypal is 116, payments-antifraud is 32, payments-init is 80, pending is 18, recurring is 46, refund is 0, unsubscribe is 0
[17:40:16] PROBLEM - check_redis on frqueue1001 is CRITICAL: CRITICAL: donations is 14050 10000 - REDIS 2.8.17 on 127.0.0.1:6379 has 1 databases (db0) with 9 keys, up 8 days 54 minutes - memory use is 13.07M (peak 23.12M, 0.32% of max, fragmentation 2.03%), connected_slaves is 2, jobs is 0, jobs-adyen is 56, jobs-paypal is 126, payments-antifraud is 147, payments-init is 199, pending is 1, recurring is 146, refund is 2, unsubscribe is 0
[17:40:17] PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1759
[17:45:04] PROBLEM - check_redis on frqueue1001 is CRITICAL: CRITICAL: donations is 13237 10000 - REDIS 2.8.17 on 127.0.0.1:6379 has 1 databases (db0) with 9 keys, up 8 days 59 minutes - memory use is 12.05M (peak 23.12M, 0.29% of max, fragmentation 2.00%), connected_slaves is 2, jobs is 0, jobs-adyen is 17, jobs-paypal is 130, payments-antifraud is 99, payments-init is 151, pending is 26, recurring is 10, refund is 2, unsubscribe is 0
[17:45:04] PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1667
[17:50:13] PROBLEM - check_redis on frqueue1001 is CRITICAL: CRITICAL: donations is 11346 10000 - REDIS 2.8.17 on 127.0.0.1:6379 has 1 databases (db0) with 9 keys, up 8 days 1 hours - memory use is 10.67M (peak 23.12M, 0.30% of max, fragmentation 2.26%), connected_slaves is 2, jobs is 0, jobs-adyen is 56, jobs-paypal is 133, payments-antifraud is 27, payments-init is 69, pending is 89, recurring is 15, refund is 2, unsubscribe is 0
[17:50:13] PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1580
[17:50:30] queue dropping, lag dropping
[17:51:27] mepps OK, so where exactly does the error happen? do you have a stack trace or an error API response?
[17:55:12] PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1407
[18:00:12] PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1242
[18:11:08] mepps let me know if you want to work this out via vidchat
[18:13:16] okay ejegg, i might poke around for another half an hour then reach out
[18:13:23] cool
[18:15:12] RECOVERY - check_redis on frqueue1001 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6379 has 1 databases (db0) with 9 keys, up 8 days 1 hours - memory use is 5.58M (peak 23.12M, 0.27% of max, fragmentation 3.97%), connected_slaves is 2, donations is 4372, jobs is 0, jobs-adyen is 17, jobs-paypal is 145, payments-antifraud is 94, payments-init is 143, pending is 82, recurring is 29, refund is 0, unsubscribe is 2
[18:15:24] cwd / Jeff_Green: what is the 'resolution: 1/2' tab in grafana admin?
[18:16:07] no idea!
[18:17:02] i find it extremely interesting that the replag chart more or less followed the pattern of the other load today
[18:17:07] instead of spiraling out of control
[18:21:45] d'oh, is someone editing the fundraising overview right now?
[18:22:03] I'm trying to add process-control jobs and it's telling me there's an edit conflict
[18:22:12] i might have, but if I did you can kill it
[18:22:17] thanks!
[18:35:11] RECOVERY - check_mysql on frdb1002 is OK: Uptime: 696727 Threads: 40 Questions: 66300780 Slow queries: 52154 Opens: 8304 Flush tables: 1 Open tables: 640 Queries per second avg: 95.160 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 450
[19:19:43] Fundraising Sprint Synchronized Screaming, Fundraising-Backlog, MediaWiki-extensions-DonationInterface: PayPal EC cURL timeouts - https://phabricator.wikimedia.org/T177438#3658664 (Ejegg)
[19:22:41] ejegg got sucked into a rabbit hole but this is my issue: https://developer.paypal.com/docs/classic/api/errors/#10419
[19:26:56] mepps ooh, so, we should be getting the payer ID back from the GetExpressCheckoutDetails call
[19:27:02] can you tell if that's happening?
[19:27:26] so i think we have to send to them in the first place, which we aren't
[19:27:36] but i'll take a look at that
[19:27:44] GetExpressCheckoutDetails should only require the token
[19:44:40] ejegg i don't see PAYERID being returned but i also dont' think the code looks for it
[19:45:07] mepps it's in the 'response' keys for that transaction
[19:45:21] it should be mapped to donor_id during unstaging
[19:45:39] hmm
[19:45:47] so i also notice this: // Incoming parameters after returning from the PayPal workflow
[19:45:47] $this->transactions['ProcessReturn'] = array(
[19:45:47] 'request' => array(
[19:45:47] 'token',
[19:45:48] 'PayerID',
[19:45:48] ),
[19:45:48] );
[19:45:58] which is what i'd searched for before with that case
[19:46:12] whereas in the other places it's PAYERID
[19:46:44] shoot, that might be a red herring
[19:46:51] it definitely wasn't returned though
[19:47:06] from the GetExpressCheckoutDetails ?
[19:47:19] OK, let's debug through that - I was pretty sure we should get that back
[19:47:33] fr-tech are we standupping?
[19:47:39] joining now
[19:48:24] looking for phone as google chucked me out
[19:48:46] eileen what did you do to google?
[20:06:44] (PS6) Eileen: Remove duplicate spaces, html & and odd whitespace in name fields [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/382062 (https://phabricator.wikimedia.org/T175744)
[20:12:56] (CR) jerkins-bot: [V: -1] Remove duplicate spaces, html & and odd whitespace in name fields [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/382062 (https://phabricator.wikimedia.org/T175744) (owner: Eileen)
[20:32:29] mepps actually I've gotta grab some food too, sorry
[20:37:34] okay ejegg!
[20:48:44] mepps ok, got a bit of food in me, want to video chat?
[20:49:44] ejegg sure!
[20:50:05] k, getting back into the hangout
[20:50:42] cool, i'm in there
[21:11:00] ejegg i'm on the call but i'm not sure if you can hear me
[21:21:21] Fundraising Sprint Synchronized Screaming, Fundraising-Backlog, Unplanned-Sprint-Work: Email click data not tracking properly - https://phabricator.wikimedia.org/T177331#3658980 (Jgreen) >>! In T177331#3656374, @jrobell wrote: > Thank you for flagging and working on this @Ejegg > > //The lines are...
[21:24:19] Fundraising Sprint Synchronized Screaming, Fundraising-Backlog, Unplanned-Sprint-Work: Email click data not tracking properly - https://phabricator.wikimedia.org/T177331#3658981 (Jgreen) >>! In T177331#3658980, @Jgreen wrote: >>>! In T177331#3656374, @jrobell wrote: >> Thank you for flagging and work...
[22:08:27] ejegg: I just realised that OnNotSuccesful was a recent addittion - I have had to hack local to not get errors in strict mode https://gerrit.wikimedia.org/r/#/c/380668/
[22:09:03] someone commented suggesting the protected vs public is a thing - but the change above is what make it not error for me (phpunit 4 I think)
[22:19:10] (PS5) Eileen: Fix ImportMessageTest to actually test contact. [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/381923 (https://phabricator.wikimedia.org/T175744)
[22:19:10] (PS7) Eileen: Remove duplicate spaces, html & and odd whitespace in name fields [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/382062 (https://phabricator.wikimedia.org/T175744)
[22:25:15] eileen oh, odd, I'll check it out
[22:26:27] I ccogdill are there more emails going out today?
[22:30:51] ejegg: I'm just starting to read the code related to paypal timeout - it was payments-wiki/LocalSettings.php?
[22:32:30] no SmashPig/local-config/paypal/main.yaml
[22:33:50] eileen: it's the LocalSettings file
[22:34:07] I just tried changing the timeout for all gateways
[22:34:26] but it's probably not worth your time digging into the DonationInterface code
[22:34:48] ejegg: ok - let me know if you think I can help
[22:35:04] Will do! If the thing I did just now doesn't work, I may be stumped
[22:35:07] I'm always a bit torn because I know if I work on the dedupe then I pile up MORe review work
[22:35:24] !log changed DonationInterface cURL timeout for all processors to 12 sec
[22:35:32] Heh, bring it on!
[22:35:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:35:50] on that - if I pull a phab into the sprint but not because it's externally driven does that get unplanned sprint work flag?
[22:36:02] or is that flag just to show we were having to react
[22:36:04] sure, sounds right
[22:36:19] Might as well flag it
[22:36:23] ok
[22:36:47] cdw / Jeff_Green is there a way to get tabular data out of prometheus
[22:36:54] ?
[22:36:57] Fundraising Sprint Synchronized Screaming, Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM, Fr-CiviCRM-dedupe-FY2017/18, Unplanned-Sprint-Work: Improve dedupe handling of Country only addresses - https://phabricator.wikimedia.org/T176699#3659075 (Eileenmcnaughton)
[22:37:05] I wonder if there's a problem with the timestamps I added
[22:37:18] the docs say they should be milliseconds since the epoch
[22:37:19] Fundraising Sprint Synchronized Screaming, Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM, Fr-CiviCRM-dedupe-FY2017/18, Unplanned-Sprint-Work: Improve dedupe handling of Country only addresses - https://phabricator.wikimedia.org/T176699#3634083 (Eileenmcnaughton) Bringing this in / forwar...
[22:37:23] which I think I got right
[22:37:32] but nothing shows on the time graph
[22:46:52] When you select table, it still limits by time, huh?
[22:49:57] hmm, prometheus seems to send itself mail every minute
[23:35:15] PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1276
[23:35:15] PROBLEM - check_mysql on frdev1001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1201
[23:40:15] PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1436
[23:40:15] PROBLEM - check_mysql on frdev1001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1369
[23:45:12] PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1596
[23:45:12] PROBLEM - check_mysql on frdev1001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1532
[23:50:12] PROBLEM - check_mysql on frdev1001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1685
[23:50:12] PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1746
[23:55:12] PROBLEM - check_mysql on frdev1001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1884
[23:55:12] PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1911
[23:55:14] !log update process_control to 10161f8fc0669b900919ca643bf3987e93820065
[23:55:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log