[00:02:48] (Merged) jenkins-bot: Submodule update [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/377673 (owner: Eileen) [00:02:50] (CR) jerkins-bot: [V: -1] Update Omnimail plugin to allow setting offset [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/377663 (owner: Eileen) [00:02:52] (CR) jerkins-bot: [V: -1] Update Omnimail to use offset [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/377666 (https://phabricator.wikimedia.org/T175665) (owner: Eileen) [00:05:15] PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1478 [00:07:49] hrm [00:08:56] (CR) Eileen: "recheck" [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/377663 (owner: Eileen) [00:08:58] (CR) jerkins-bot: [V: -1] Update Omnimail plugin to allow setting offset [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/377663 (owner: Eileen) [00:09:21] (CR) jerkins-bot: [V: -1] Update Omnimail to use offset [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/377666 (https://phabricator.wikimedia.org/T175665) (owner: Eileen) [00:09:57] (CR) Eileen: "recheck" [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/377663 (owner: Eileen) [00:10:06] cwd :-( [00:10:15] RECOVERY - check_mysql on frdb2001 is OK: Uptime: 987900 Threads: 1 Questions: 32695137 Slow queries: 20801 Opens: 11977 Flush tables: 1 Open tables: 603 Queries per second avg: 33.095 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0 [00:10:20] what is happening on the server at the moment - just that minor donation flood? 
[00:10:38] (PS4) Eileen: Update Omnimail plugin to allow setting offset [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/377663 [00:14:39] (CR) jerkins-bot: [V: -1] Update Omnimail plugin to allow setting offset [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/377663 (owner: Eileen) [00:16:10] (PS1) Eileen: Update Omnimail plugin to allow setting offset [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/377675 [00:17:45] (CR) Eileen: [C: 2] "Having some merge issues - change was approved by Elliott here https://gerrit.wikimedia.org/r/#/c/377663/" [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/377675 (owner: Eileen) [00:18:35] (Merged) jenkins-bot: Update Omnimail plugin to allow setting offset [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/377675 (owner: Eileen) [00:19:37] (PS5) Eileen: Update Omnimail plugin to allow setting offset [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/377663 [00:19:39] (PS4) Eileen: Update Omnimail to use offset [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/377666 (https://phabricator.wikimedia.org/T175665) [00:19:49] (Abandoned) Eileen: Update Omnimail plugin to allow setting offset [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/377663 (owner: Eileen) [00:21:21] weird, unknown extension omnimail? [00:22:00] (PS1) Eileen: Update Omnimail plugin to allow setting offset [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/377678 [00:22:21] ejegg: now I'm in a real mess - I accidentially redid against deployment :-( [00:22:33] oh shoot... [00:22:36] https://gerrit.wikimedia.org/r/#/c/377675/ [00:22:48] I thought I was replacing the one on master & carried over your +2 [00:22:59] I think once merged into master too it will be ok [00:23:34] if zuul ever cares about https://gerrit.wikimedia.org/r/#/c/377678/ [00:23:47] can you un-abandon the other change? 
[00:23:49] (Restored) Eileen: Update Omnimail plugin to allow setting offset [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/377663 (owner: Eileen) [00:23:55] (PS7) Ejegg: Update Omnimail plugin to allow setting offset [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/377663 (owner: Eileen) [00:24:11] ok, I cherry-picked the one from deployment back [00:24:23] yeah - tried unabandoning - they are breeding now! [00:24:33] not sure if that helped matters! [00:24:47] (Abandoned) Eileen: [WIP] CiviCRM 4.7.23 [wikimedia/fundraising/crm/civicrm] - https://gerrit.wikimedia.org/r/366186 (owner: Eileen) [00:25:14] go zuulgo [00:25:31] (CR) jerkins-bot: [V: -1] Update Omnimail plugin to allow setting offset [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/377663 (owner: Eileen) [00:26:42] failed again - this is doing my head in [00:28:29] lemme log on to the ci box and see if it's getting the updated plugin ok [00:29:33] (PS2) Eileen: Update Omnimail plugin to allow setting offset [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/377678 [00:31:45] ugh, what the heck is the matter? [00:32:04] the source looks fine [00:33:04] (CR) Eileen: [C: 2] "approved already - copying over - zul issues" [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/377678 (owner: Eileen) [00:33:34] (PS6) Ejegg: Drop autoincrement ID and FKs on group_contact_cache [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/374588 (https://phabricator.wikimedia.org/T174404) [00:33:44] I'm trying this one instead - the other had that extra composer further down - maybe that's an issue? [00:33:46] https://gerrit.wikimedia.org/r/#/c/377678/ [00:34:19] that would be weird... 
[00:34:31] on the ci server the vendor dir got updated just fine [00:34:56] go zuul go [00:35:10] (CR) jerkins-bot: [V: -1] Drop autoincrement ID and FKs on group_contact_cache [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/374588 (https://phabricator.wikimedia.org/T174404) (owner: Ejegg) [00:36:16] OK, so it's not the patch's fault [00:36:31] well I suppose that's a good thing [00:36:33] let's see if the CI config got changed lately [00:36:49] doe a vendor dir update matter? [00:36:55] ie. the submodule [00:37:07] not on the master branch [00:37:12] only deploy [00:38:27] huh, there are a ton of commits to the CI repo today and yesterday [00:38:39] but nothing that looks like it should have touched the crm build [00:40:04] this one just passed https://gerrit.wikimedia.org/r/#/c/377678/ [00:40:29] fweaky [00:40:35] (Abandoned) Eileen: Update Omnimail plugin to allow setting offset [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/377663 (owner: Eileen) [00:40:40] it's on a different CI instance... [00:40:52] (PS5) Eileen: Update Omnimail to use offset [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/377666 (https://phabricator.wikimedia.org/T175665) [00:41:10] hmm - interesting [00:41:36] i can delete the source dir on the one that was failing [00:46:29] deleted [00:46:43] good luck with the rest of it eileen, I'm off for the night [00:46:56] (CR) Eileen: "recheck" [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/377666 (https://phabricator.wikimedia.org/T175665) (owner: Eileen) [00:47:06] thanks! 
[00:47:10] just waiting on https://gerrit.wikimedia.org/r/#/dashboard/self [00:49:10] (CR) jerkins-bot: [V: -1] Update Omnimail to use offset [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/377666 (https://phabricator.wikimedia.org/T175665) (owner: Eileen) [01:02:35] (PS1) Eileen: Update Omnimail [wikimedia/fundraising/crm/vendor] - https://gerrit.wikimedia.org/r/377684 [01:03:17] (CR) Eileen: [C: 2] "submodule update" [wikimedia/fundraising/crm/vendor] - https://gerrit.wikimedia.org/r/377684 (owner: Eileen) [01:04:58] (CR) jerkins-bot: [V: -1] Update Omnimail [wikimedia/fundraising/crm/vendor] - https://gerrit.wikimedia.org/r/377684 (owner: Eileen) [01:05:33] (PS6) Eileen: Update Omnimail to use offset [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/377666 (https://phabricator.wikimedia.org/T175665) [01:14:45] (CR) jerkins-bot: [V: -1] Update Omnimail to use offset [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/377666 (https://phabricator.wikimedia.org/T175665) (owner: Eileen) [01:27:11] (CR) Eileen: "recheck" [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/377666 (https://phabricator.wikimedia.org/T175665) (owner: Eileen) [01:35:15] PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1319 [01:35:31] (CR) jerkins-bot: [V: -1] Update Omnimail to use offset [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/377666 (https://phabricator.wikimedia.org/T175665) (owner: Eileen) [01:37:01] Fundraising Sprint Far Beer, Fundraising Sprint Gondwanaland Reunification Engine, Fundraising Sprint Homebrew Hadron Collider, Fundraising Sprint Ivory Tower Defense Games, and 9 others: Errors in CiviCRM dedupe screen - https://phabricator.wikimedia.org/T160571#3602972 (Eileenmcnaughton) @Leann... 
[01:37:49] Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM, Fr-CiviCRM-dedupe-FY2017/18, Epic: Epic: Dedupe V2: resolve top conflicts - https://phabricator.wikimedia.org/T143057#3602975 (Eileenmcnaughton) [01:37:51] Fundraising Sprint Quill Pencil, Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM: Deal with diacritic conflicts on name checks - https://phabricator.wikimedia.org/T149763#3602973 (Eileenmcnaughton) Open>Resolved I guess I'm the only one tracking this & the task I intended is done. It might... [01:38:02] Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM, Fr-CiviCRM-dedupe-FY2017/18, Epic: Epic: Dedupe V2: resolve top conflicts - https://phabricator.wikimedia.org/T143057#2555439 (Eileenmcnaughton) [01:38:04] Fundraising Sprint Quill Pencil, Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM, Fr-CiviCRM-dedupe-FY2017/18, Patch-For-Review: Update wmf_civicrm import normalisation to set city to NULL when it is ‘0’ or ‘City/Town’ or ‘NoCity’ - https://phabricator.wikimedia.org/T174980#3602976 (Eileenm... 
[01:40:15] RECOVERY - check_mysql on frdb2001 is OK: Uptime: 993300 Threads: 1 Questions: 33712885 Slow queries: 20979 Opens: 12084 Flush tables: 1 Open tables: 603 Queries per second avg: 33.940 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0 [01:40:21] (CR) Eileen: "Should be picking up commit from composer -" [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/377666 (https://phabricator.wikimedia.org/T175665) (owner: Eileen) [01:40:26] (CR) Eileen: "recheck" [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/377666 (https://phabricator.wikimedia.org/T175665) (owner: Eileen) [01:42:11] (CR) jerkins-bot: [V: -1] Update Omnimail to use offset [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/377666 (https://phabricator.wikimedia.org/T175665) (owner: Eileen) [02:23:50] (PS1) Eileen: Another attempt - Update Omnimail plugin to allow setting offset [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/377692 [02:24:12] (CR) Eileen: [C: 2] "previously approved - just playing zuul games" [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/377692 (owner: Eileen) [02:25:00] (PS2) Eileen: Another attempt - Update Omnimail plugin to allow setting offset [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/377692 [02:25:02] (PS7) Eileen: Update Omnimail to use offset [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/377666 (https://phabricator.wikimedia.org/T175665) [02:28:49] (CR) jerkins-bot: [V: -1] Update Omnimail to use offset [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/377666 (https://phabricator.wikimedia.org/T175665) (owner: Eileen) [02:34:15] (CR) Eileen: "recheck" [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/377666 (https://phabricator.wikimedia.org/T175665) (owner: Eileen) [02:41:22] (CR) Eileen: "recheck" [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/377666 (https://phabricator.wikimedia.org/T175665) (owner: Eileen) [02:43:22] eileen: howdy 
[02:44:09] cwd hi [02:44:24] only just got the change to the job merged so am about to deploy [02:44:38] righteous [02:45:08] i am around if you want to try anything [02:45:25] or be regaled with tales of what we have been testing [02:46:22] (PS1) Eileen: Merge branch 'master' of https://gerrit.wikimedia.org/r/wikimedia/fundraising/crm into deployment [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/377696 [02:46:54] thanks cwd [02:50:08] (PS1) Eileen: Update vendor submodule, latest Omnimail [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/377698 [02:50:17] (CR) Eileen: [C: 2] Merge branch 'master' of https://gerrit.wikimedia.org/r/wikimedia/fundraising/crm into deployment [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/377696 (owner: Eileen) [02:50:37] (CR) jerkins-bot: [V: -1] Merge branch 'master' of https://gerrit.wikimedia.org/r/wikimedia/fundraising/crm into deployment [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/377696 (owner: Eileen) [02:53:17] (PS2) Eileen: Update vendor submodule, latest Omnimail [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/377698 [02:54:06] (CR) Eileen: [C: 2] Update vendor submodule, latest Omnimail [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/377698 (owner: Eileen) [02:54:27] eileen: are you having trouble with CI? 
[02:54:36] :-( [02:54:51] (Merged) jenkins-bot: Update vendor submodule, latest Omnimail [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/377698 (owner: Eileen) [02:54:59] i have hardly even dealt with jenkins lately [02:54:59] I think I'm near the end of it - there were also some mistakes in commits & order & I'm a bit confused [02:55:13] (Abandoned) Eileen: Merge branch 'master' of https://gerrit.wikimedia.org/r/wikimedia/fundraising/crm into deployment [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/377696 (owner: Eileen) [02:56:16] (PS1) Eileen: Merge branch 'master' of https://gerrit.wikimedia.org/r/wikimedia/fundraising/crm into deployment [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/377699 [02:57:02] the thing I'm mostly confused about is this commit https://gerrit.wikimedia.org/r/#/c/377673/ - I can't see where I merged to deployment but it seems merged [02:57:14] (CR) Eileen: [C: 2] Merge branch 'master' of https://gerrit.wikimedia.org/r/wikimedia/fundraising/crm into deployment [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/377699 (owner: Eileen) [02:57:51] anyway I'm good to roll I think [02:59:00] i trust your judgment :) [02:59:05] the submodules are confusing [02:59:21] very much [02:59:51] I was going to NOT deploy that yet - in case it helps the symptoms / makes it harder to replicate them. But not worrying about that now [03:00:32] !log update civicrm from ee7dda38ed7071d3100907d11c2a77e82a2a852c to 187238fccb2b59378dececac7055a5320ce0269b [03:00:35] eileen: the omnimail offset? [03:00:41] or did i miss something [03:00:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [03:00:55] no the civicrm submodule commit which is in the mix [03:01:08] (it reduces clearing of cache tables - but probably not that much) [03:02:57] eileen: is it from core or special for us? [03:03:05] can you link me? 
[03:03:51] cwd https://gerrit.wikimedia.org/r/#/c/377386/ [03:04:10] it is mostly code clean up but some reduction in cache clearing [03:04:27] notably a possible reduction on the add to group & remove from group actions [03:09:07] eileen: cool, seems worth trying to me [03:09:54] (CR) Legoktm: "Yay! Congrats!" [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/371770 (https://phabricator.wikimedia.org/T170314) (owner: Mepps) [03:11:27] cwd it's running now - if you do drush cvapi setting.get return=omnimail_omnigroupmembers_load you can now see an offset [03:11:52] once [03:11:53] select count(*) FROM civicrm_group_contact WHERE group_id = 310; [03:12:08] returns more than 335605 we'll be at the part of the file with new names [03:15:16] & there is a wee way to go [03:21:50] eileen: heh db load seems to be going down? [03:23:42] cwd -no contacts added the group yet - I think it's just re-processing old rows [03:23:51] but, in theory next time it won't start from 0 [03:23:58] Actually I'll kill it now & test that! [03:24:05] ah right on [03:24:14] oh it disconnected that terminal.. [03:24:19] mandatory test :-) [03:24:47] heh yeah the ssh timeout has gotten more aggressive lately [03:26:23] ok def incrementing still [03:26:35] & not creating new rows yet [03:30:50] & still not … 89231 rows processed [03:31:02] (which is obviously WHY we are doing this) [03:32:15] eileen: is it keeping a marker of where it is at? [03:33:04] yep 98658 rows in [03:33:30] drush cvapi setting.get return=omnimail_omnigroupmembers_load gives the offset [03:33:54] eileen: sorry i meant, is that the new change? will it start from the same spot next time? [03:34:04] yep - that's what I was trying to deploy [03:34:13] groovy [03:34:34] i am assuming it spends a lot of time searching vs updating? 
[03:34:35] since our biggest issue seemed to be the apples vs oranges issue [03:34:53] no - I think it's just that we have a csv that is say 500k rows [03:34:59] & we have processed 300k [03:35:03] without tracking offset [03:35:19] so, we now need to establish the offset before we can usefully proceed [03:35:49] but what is the biggest time sink? [03:35:52] Well - yes - establishing the offset will require lots of lookups [03:35:59] like if it was just inserting it should take no time [03:36:13] but it has to see if there is a duplicate in the db, something like that? [03:36:48] So, the process is that it does an email look up & then calls the civicrm api to create a contact, and email & add to a group [03:36:59] if the email already exists it skips to the next row [03:37:21] (currently it's still re-processing previously processed so each row is a search + a skip) [03:37:22] gotcha, but all that stuff is cpu time consuming [03:37:36] yeah I expect so - are you seeing load? [03:37:53] I mean similar time consuming to our other jobs I would imagine [03:38:36] yeah [03:38:42] no not really abnormal load [03:39:27] 123369 rows processed now & they are all ones previously done it seems.... [03:39:44] heh we did run this thing for awhile [03:40:05] :-) [03:41:06] they put up the english banners tomorrow morning, presumably we will see more lag [03:42:04] but i have turned off dedupe et al over the last few weeks and not seen a significant reduction in replag [03:46:13] no it doesn't seem to be the thing does it? [03:46:23] my gut is that there is some mysql tuning parameter that will help - but I know not what [03:46:55] Where is our config documented? puppet I guess? [03:47:29] 154956.... [03:47:42] yep generally you should see everything in the puppet repo [03:48:12] what is the link to that repo? [03:49:01] eileen: other channel... 
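(Editorial sketch, not from the channel.) The resume-from-offset idea discussed above can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the Omnimail extension's actual code: `import_rows`, the in-memory `known_emails` set, and the `settings` dict are hypothetical stand-ins for the CiviCRM email lookup, the contact/group API calls, and the stored offset setting.

```python
import csv
import io


def import_rows(csv_text, known_emails, settings, batch_limit=None):
    """Process CSV rows starting at the stored offset.

    Hypothetical stand-ins: `known_emails` plays the role of the email
    lookup, and `settings["offset"]` plays the role of the persisted offset.
    Returns the number of contacts created in this run.
    """
    start = settings.get("offset", 0)
    created = 0
    reader = csv.DictReader(io.StringIO(csv_text))
    for index, row in enumerate(reader):
        if index < start:
            continue  # handled on an earlier run: no lookup needed
        if batch_limit is not None and index - start >= batch_limit:
            break  # stop here; the next run resumes from the stored offset
        email = row["email"].strip().lower()
        if email not in known_emails:
            # Stand-in for "create contact, email, add to group".
            known_emails.add(email)
            created += 1
        # Persist progress after every row so an interrupted run can resume.
        settings["offset"] = index + 1
    return created


rows = "email\na@x.org\nb@x.org\nc@x.org\nd@x.org\n"
known = {"a@x.org"}
state = {}
first = import_rows(rows, known, state, batch_limit=2)  # interrupted run
second = import_rows(rows, known, state)                # resumes at row 2
```

Without the stored offset, the second run would repeat a lookup and skip for every row already processed, which is the "search + a skip" cost described above.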
[04:04:46] 222180 rows processed [04:15:15] PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1554 [04:20:15] RECOVERY - check_mysql on frdb2001 is OK: Uptime: 1002900 Threads: 1 Questions: 36316522 Slow queries: 21282 Opens: 12264 Flush tables: 1 Open tables: 603 Queries per second avg: 36.211 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 101 [04:23:57] cwd - above doesn't obviously relate to anything :-( [04:24:18] heh nope [04:24:31] haven't had any success creating lag this week [04:24:48] it's been a frustratingly inconsistent problem [04:25:17] anyway i gotta go to sleep [04:25:22] night! [04:25:55] email me if you find anything interesting? otherwise we'll be monitoring the chaos USA time :) [04:26:02] :-) [04:26:30] g'night! [14:36:11] (PS1) Brian Wolff: Make CentralNotice work on sqlite [extensions/CentralNotice] - https://gerrit.wikimedia.org/r/377775 [14:45:22] (PS7) Mepps: WIP getHostedCheckoutStatus [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/366167 (https://phabricator.wikimedia.org/T163948) (owner: Ejegg) [15:49:27] (CR) Mepps: WIP getHostedCheckoutStatus (2 comments) [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/366167 (https://phabricator.wikimedia.org/T163948) (owner: Ejegg) [15:50:53] Hi cwd, MBeat : we're going to extend today's Big English test by 30 minutes as we had reduced traffic at the start due to clashes with WLM campaigns [15:51:04] thanks pcoombe [15:51:16] New end time is 16:30 UTC [15:51:47] pcoombe: groovy, thanks [15:55:13] PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1219 [15:57:01] welp [15:57:11] parallel replication doesn't seem to make a difference [16:00:13] PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1519 [16:05:13] PROBLEM - check_mysql on frdb2001 is CRITICAL: 
SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1819 [16:10:13] PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2120 [16:10:54] ejegg: here is evidence that parallel replication does not help, might even hurt [16:11:15] for our purposes that is [16:15:13] PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2419 [16:17:11] cwd oh huh [16:17:26] ejegg: my guess is that there must be overhead from whatever supervisor it uses to see what queries can be batched [16:17:36] so it would take a specific load profile for it to be faster [16:17:44] yeah, that's got to be pretty complicated [16:18:34] my biggest takeaway about dbs so far is it all depends on how you are using them [16:18:40] right, and the load profile we've got, where a few tables see most of the action, is probably terrible for parallelizing [16:18:55] yeah with a lot of lock contention [16:18:57] sounds right [16:19:13] this board is getting more useful: https://grafana.wikimedia.org/dashboard/db/frack-db?orgId=1&from=1505265662588&to=now [16:19:48] so, I know we've merged a couple preliminaries to making the cache tables less crazy [16:20:05] oh yeah? deployed? 
[16:20:13] PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2720 [16:20:23] I think it's just laying the groundwork [16:20:54] i.e., adding a parameter to API calls that lets you say "don't mess with the cache tables yet, I'm doing 100 things" [16:21:05] and we haven't actually changed any of our jobs to use that [16:21:17] ah interesting [16:21:42] It may be applicable to the recipient import job [16:22:14] Looks like if you call the api directly from the command line it turns off cache flapping, does the call, then refreshes the caches [16:23:15] k, i'mma figure out why my last attempt to retry deadlocked inserts didn't seem to do anything [16:23:17] fwiw there are different degrees of parallelism for replication, we're experimenting with the least disruptive which is to have parallel workers on the slaves. it's also possible to replicate different "content" separately from the master, so for example you'd replicate the civi+drupal db's separate from the web statistics db, since they can be out of sync, and that can help if the problem is one of those blocking the other [16:23:59] ah, cool [16:24:50] the general feedback I'm getting from the dba's is that it doesn't look like something we can tune our way out of from the database side [16:25:13] PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3019 [16:26:30] well, there was the idea of artificially slowing down the master actually :-P [16:27:50] i just bumped slave worker threads from 2 to 4 on frdb2001, to see if it helps or hurts [16:30:13] PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1611 [16:32:16] i may be eating my words ^^^ [16:32:37] holy cow [16:32:54] woo! 
this is exciting [16:33:36] https://grafana.wikimedia.org/dashboard/db/frack-db?orgId=1&from=1505265662588&to=now <- notice the drop in lag, and corresponding spike in queries on 2001 [16:35:13] RECOVERY - check_mysql on frdb2001 is OK: Uptime: 5405 Threads: 2 Questions: 2361195 Slow queries: 174 Opens: 88 Flush tables: 1 Open tables: 231 Queries per second avg: 436.853 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0 [16:35:17] it could be that 2 workers wasn't enough to make whatever performance ding from threading worthwhile [16:36:34] yeah maybe so [16:37:34] well damn, i'm guardedly optimistic [16:38:14] now i hope another server lags and the same fix works [16:44:13] whoa, cool! [16:48:42] Jeff_Green: replag is climbing again... and we have had some success just restarting the process before [16:49:22] i didn't restart mysql though, just the stopped/started replication [16:49:47] o rly [16:49:52] yup [17:00:13] PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1221 [17:04:59] damn it [17:05:13] PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1520 [17:05:25] (CR) Umherirrender: Add phpcs script (2 comments) [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/371770 (https://phabricator.wikimedia.org/T170314) (owner: Mepps) [17:07:46] Fundraising-Backlog, MediaWiki-extensions-DonationInterface: Update DonationInterface code style for php5.5 - https://phabricator.wikimedia.org/T166613#3605036 (Umherirrender) [17:07:49] it's a weird state, there must be some hidden process blocking replicaiton, I'm tracking it down [17:10:13] PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1821 [17:15:10] PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2120 [17:20:10] PROBLEM - 
check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2420 [17:25:10] PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2720 [17:28:42] fr-tech any news for scrum of scrums [17:28:43] ? [17:30:10] PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3020 [17:35:20] PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3320 [17:40:20] PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2997 [17:45:13] RECOVERY - check_mysql on frdb2001 is OK: Uptime: 9606 Threads: 3 Questions: 3011795 Slow queries: 455 Opens: 159 Flush tables: 1 Open tables: 233 Queries per second avg: 313.532 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0 [18:20:20] fundraising-tech-ops, Operations: Enumerate remaining unported stats - https://phabricator.wikimedia.org/T175850#3605313 (cwdent) [18:42:04] Fundraising-Backlog, fundraising-tech-ops, Operations: Enumerate remaining unported stats - https://phabricator.wikimedia.org/T175850#3605447 (DStrine) [19:00:10] PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1262 [19:01:43] * cwd shakes fist at sky [19:01:47] hmm weird, my updates on https://gerrit.wikimedia.org/r/#/c/375454/ didn't seem to sync here, but XenoRyet, I suggested you might want to take a look at that one too? 
[19:02:08] cwd let me know if you want to talk out any of the mysql stuff [19:03:39] Thanks for the rebase mepps - I'll rebase the following one [19:05:20] PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1562 [19:06:12] (PS1) Ejegg: Fix case on AstroPay and PayPal legacy UI modules [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/377822 (https://phabricator.wikimedia.org/T173869) [19:08:43] cool ejegg, i +2ed that but for some reaosn my updates aren't translating to irc [19:09:02] weird! [19:09:11] yeah! [19:09:30] I feel like a ghost [19:10:10] PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1862 [19:11:03] huh, looks like it's doing the same thing to me. [19:11:16] oh hey, looks like phpcs isn't failing things even with Errors [19:11:25] is it even running in CI ? [19:11:56] mepps: sure! [19:12:02] https://grafana-admin.wikimedia.org/dashboard/db/frack-jeff-db?orgId=1&from=now-1h&to=now [19:12:25] ahh, needs to be added to phpcs [19:12:30] I mean to test [19:12:32] check out how frdb2001 seems to quit doing anything at the same time as the lag starts [19:13:01] there has got to be an option to consistently color things across the charts [19:13:46] if that doesn't work take off the -admin part [19:15:08] PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2163 [19:15:25] oh i do see what you mean about the timing [19:16:59] and the coloring! [19:17:28] Jeff_Green: mepps this is also interesting: https://grafana.wikimedia.org/dashboard/db/frack-db?orgId=1&from=1505265662588&to=now [19:17:40] look how dirty pages is actually climbing *after* the lag is going [19:18:04] would something like index recalculation cause that? 
[19:18:37] pause in innodb activity + pause of replication + filling up memory [19:19:53] no idea [19:20:09] PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2462 [19:21:38] cwd i might be reading this wrong but it looks like the dirty pages start climbing 15 minutes before the lag starts--not sure if that means anything [19:25:01] mepps: you are right, the colors got me again [19:25:18] PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2762 [19:25:35] yeah so it's like it's not doing anything [19:29:22] cwd i'm about to be in retro for the rest of my day but happy to jump on a call about this tomorrow--sometimes i know it can be helpful to speak out loud :) [19:30:08] PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3062 [19:31:55] mepps: sounds great [19:32:10] have a good retro! [19:35:18] PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3362 [19:40:08] PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2283 [19:45:16] RECOVERY - check_mysql on frdb2001 is OK: Uptime: 16805 Threads: 2 Questions: 3954111 Slow queries: 2232 Opens: 347 Flush tables: 1 Open tables: 242 Queries per second avg: 235.293 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0 [19:45:31] cwd replag- what is happening on server? [19:45:37] https://grafana.wikimedia.org/dashboard/db/frack-db?orgId=1 down [19:45:46] ah sorry i've been messing with it [19:45:52] eileen: check this out though [19:45:56] https://grafana.wikimedia.org/dashboard/db/frdb2001?orgId=1 [19:46:05] it seems to be doing nothing for long periods [19:46:41] does it just go to sleep when nothing's using it? 
[19:47:08] are there any notable forms of load [19:49:11] Fundraising Sprint Quill Pencil, Fundraising-Backlog, Patch-For-Review: how to get back to A/B testing? - https://phabricator.wikimedia.org/T173869#3605705 (DStrine) [19:56:59] cwd invited you to a talk when eileen's up tomorrow and with XenoRyet wants to be involved, but i can also talk earlier in the day [19:59:48] cool! [19:59:56] hopefully have some more data by then [19:59:59] Fundraising Sprint Quill Pencil, Fundraising-Backlog, MediaWiki-extensions-CentralNotice, Unplanned-Sprint-Work: Investigate email: BH data storage/transfer issue for iPad donations - https://phabricator.wikimedia.org/T174719#3605740 (DStrine) [20:00:36] It's doing my head in - why now? Is silverpop export running? [20:03:36] Fundraising Sprint Quill Pencil, Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM: Review reducing smartGroupCache clearing per https://issues.civicrm.org/jira/browse/CRM-21109 - https://phabricator.wikimedia.org/T174408#3560702 (Eileenmcnaughton) Open>Resolved [20:07:50] cwd mysql Slave_parallel_workers is 0 on prod but I assume not 0 on the slaves? [20:09:00] Fundraising Sprint Prank Seatbelt, Fundraising Sprint Quill Pencil, Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM, FR-2016-17-Q2-Campaign-Support: Periodically run Civi contact import performance tests, note trends - https://phabricator.wikimedia.org/T146338#3605755 (DStrine) Open>... 
[20:09:46] eileen: some of the slaves have >0 [20:10:10] the one that has been lagging has 4 :-\ [20:10:16] PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1302 [20:10:51] when jeff first turned it on it seemed to knock the lag down [20:10:55] and we were like yay [20:10:58] but then it came back [20:11:21] Fundraising Sprint Navel Warfare, Fundraising Sprint Outie Inverter, Fundraising Sprint Prank Seatbelt, Fundraising Sprint Quill Pencil, and 4 others: Are PayPal refunds for recurring donations incorrectly being tagged as EC or vice versa? - https://phabricator.wikimedia.org/T171351#3605769 (XenoR... [20:12:08] :- [20:12:45] cwd ok so there are slightly different settings on the slaves? or is that the only one? [20:13:02] that is an older machine than the others [20:13:22] so the settings must be at least a little different, i'll try to find out how much [20:13:46] Fundraising-Backlog: Clean up Damaged queue - https://phabricator.wikimedia.org/T175862#3605778 (XenoRyet) [20:14:24] Fundraising Sprint Outie Inverter, Fundraising Sprint Prank Seatbelt, Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM, FR-Ingenico: Deal with recurring donations stuck in 'In Progress' status - https://phabricator.wikimedia.org/T171868#3605797 (Ejegg) p:Triage>Normal [20:15:16] PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1602 [20:15:17] cwd I feel like it would be good to have a table comparing the boxes with config + any other differences [20:16:10] Fundraising-Backlog, fundraising-tech-ops, Operations, Spike: Spike: Enumerate remaining unported stats - https://phabricator.wikimedia.org/T175850#3605804 (DStrine) [20:19:22] Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM, FR-Email: Omnimail recipient load (silently) broken - https://phabricator.wikimedia.org/T175394#3592379 (DStrine) [20:20:16] PROBLEM - check_mysql on frdb2001 is 
CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1902 [20:23:40] Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM, FR-Email: Reset on_hold in wmf_civicrm_message_email_update - https://phabricator.wikimedia.org/T170350#3427980 (DStrine) [20:24:12] Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM: CiviCRM dedupe jobs should gracefully time out - https://phabricator.wikimedia.org/T172303#3605829 (Ejegg) The other possibility is to add an setting like ignore_overtime to the process-control job description, which would suppress failmail when the pre... [20:24:30] Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM, Fr-CiviCRM-dedupe-FY2017/18: Update wmf_civicrm import normalisation to replace htmlampersand with & - https://phabricator.wikimedia.org/T175744#3605830 (Eileenmcnaughton) a:Eileenmcnaughton [20:25:16] RECOVERY - check_mysql on frdb2001 is OK: Uptime: 262 Threads: 1 Questions: 21065 Slow queries: 9 Opens: 194 Flush tables: 1 Open tables: 220 Queries per second avg: 80.400 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0 [20:25:55] Fundraising Sprint Kickstopper, Wikimedia-Fundraising-CiviCRM, FR-Paypal, FR-WMF-Audit: Follow up with Paypal on audit regeneration, enable parser - https://phabricator.wikimedia.org/T167828#3605831 (Ejegg) [20:26:10] eileen: i kicked mysql and it recovered again [20:26:54] So in terms of analysis this time ONLY frdb2001 lagged & a server restart worked? 
[20:27:02] Fundraising-Backlog, fundraising-tech-ops, Operations, Spike: Spike: Enumerate remaining unported stats - https://phabricator.wikimedia.org/T175850#3605833 (DStrine) a:cwdent>None [20:27:59] Fundraising Sprint Quill Pencil, Fundraising Sprint R 2017, Fundraising-Backlog, Patch-For-Review: Establish methodology for creating load to replicate replag - https://phabricator.wikimedia.org/T175665#3605842 (DStrine) [20:28:01] Fundraising Sprint R 2017, Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM, FR-Email: Omnimail recipient load (silently) broken - https://phabricator.wikimedia.org/T175394#3605843 (DStrine) [20:28:03] Fundraising Sprint Quill Pencil, Fundraising Sprint R 2017, Fundraising-Backlog, MediaWiki-extensions-CentralNotice, and 2 others: CN Campaign Suppression prior to scheduled start time - https://phabricator.wikimedia.org/T175358#3605845 (DStrine) [20:28:05] Fundraising Sprint Quill Pencil, Fundraising Sprint R 2017, Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM, and 2 others: Find source of unlimited dedupe queries, prevent them - https://phabricator.wikimedia.org/T175382#3605844 (DStrine) [20:28:06] cwd I did read something that suggested there is an ordering thing that affects seconds behind & it might be there are 800 things & 700 have been processed but the first one might not be so it shows as 800 behind [20:28:07] Fundraising Sprint Quill Pencil, Fundraising Sprint R 2017, Fundraising-Backlog, MediaWiki-extensions-CentralNotice, Unplanned-Sprint-Work: Investigate email: BH data storage/transfer issue for iPad donations - https://phabricator.wikimedia.org/T174719#3605846 (DStrine) [20:28:09] Fundraising Sprint Quill Pencil, Fundraising Sprint R 2017, Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM: Create lists of CiviCRM groups to allow MG & DS to review with a view to tidy up - https://phabricator.wikimedia.org/T174407#3605847 (DStrine) [20:28:11] Fundraising Sprint Quill Pencil, Fundraising Sprint R 2017, 
Fundraising-Backlog, MW-1.30-release-notes (WMF-deploy-2017-09-19 (1.30.0-wmf.19)), Patch-For-Review: how to get back to A/B testing? - https://phabricator.wikimedia.org/T173869#3605849 (DStrine) [20:28:14] Fundraising Sprint Prank Seatbelt, Fundraising Sprint Quill Pencil, Fundraising Sprint R 2017, Fundraising-Backlog, and 2 others: Drop IDs on cache tables - https://phabricator.wikimedia.org/T174404#3605848 (DStrine) [20:28:16] Fundraising Sprint Prank Seatbelt, Fundraising Sprint Quill Pencil, Fundraising Sprint R 2017, Fundraising-Backlog, MediaWiki-extensions-CentralNotice: CentralNotice: Prevent duplicate campaign names due to race condition - https://phabricator.wikimedia.org/T173866#3605850 (DStrine) [20:28:18] Fundraising Sprint R 2017, Fundraising-Backlog, MediaWiki-extensions-DonationInterface: Message with blank payment_method and amount = 0 sent to payments-init - https://phabricator.wikimedia.org/T173347#3605851 (DStrine) [20:28:22] Fundraising Sprint R 2017, Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM: CiviCRM dedupe jobs should gracefully time out - https://phabricator.wikimedia.org/T172303#3605852 (DStrine) [20:28:23] Fundraising Sprint Navel Warfare, Fundraising Sprint Outie Inverter, Fundraising Sprint Prank Seatbelt, Fundraising Sprint Quill Pencil, and 4 others: Create orphan rectifier for PayPal Express Checkout - https://phabricator.wikimedia.org/T172202#3605853 (DStrine) [20:28:26] Fundraising Sprint Outie Inverter, Fundraising Sprint Prank Seatbelt, Fundraising Sprint R 2017, Fundraising-Backlog, and 2 others: Deal with recurring donations stuck in 'In Progress' status - https://phabricator.wikimedia.org/T171868#3605854 (DStrine) [20:28:29] Fundraising Sprint R 2017, Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM, FR-Email: Reset on_hold in wmf_civicrm_message_email_update - https://phabricator.wikimedia.org/T170350#3605858 (DStrine) [20:28:31] Fundraising Sprint Navel Warfare, Fundraising Sprint Outie Inverter, Fundraising 
Sprint Prank Seatbelt, Fundraising Sprint Quill Pencil, and 4 others: Are we losing transactions witih repeated ct_id? - https://phabricator.wikimedia.org/T171349#3605856 (DStrine) [20:28:32] Fundraising Sprint Navel Warfare, Fundraising Sprint Outie Inverter, Fundraising Sprint Prank Seatbelt, Fundraising Sprint Quill Pencil, and 4 others: Populate country column when creating c_t rows during offline import - https://phabricator.wikimedia.org/T171658#3605855 (DStrine) [20:28:34] Fundraising Sprint Kickstopper, Fundraising Sprint Loose Lego Carpeting, Fundraising Sprint Murphy's Lawyer, Fundraising Sprint Navel Warfare, and 6 others: process-control should not crash on bad utf-8 from stdout or stderr - https://phabricator.wikimedia.org/T167849#3605861 (DStrine) [20:28:36] Fundraising Sprint Quill Pencil, Fundraising Sprint R 2017, Fundraising-Backlog, MediaWiki-extensions-CentralNotice, and 3 others: WMDE banners failing to save - Timing out on save - https://phabricator.wikimedia.org/T170591#3605857 (DStrine) [20:28:40] Fundraising Sprint Murphy's Lawyer, Fundraising Sprint Navel Warfare, Fundraising Sprint Outie Inverter, Fundraising Sprint Prank Seatbelt, and 7 others: Implement Ingenico Connect API calls to get payment status - https://phabricator.wikimedia.org/T163948#3605862 (DStrine) [20:28:42] Fundraising Sprint Loose Lego Carpeting, Fundraising Sprint Murphy's Lawyer, Fundraising Sprint Navel Warfare, Fundraising Sprint Outie Inverter, and 6 others: CentralNotice: Add controls to purge banner content in Varnish for a specific language - https://phabricator.wikimedia.org/T168673#3605860 (... 
[20:28:46] Fundraising Sprint Navel Warfare, Fundraising Sprint Outie Inverter, Fundraising Sprint Prank Seatbelt, Fundraising Sprint Quill Pencil, and 5 others: Import email-only contacts from 'remind me later' links into CiviCRM - https://phabricator.wikimedia.org/T160949#3605863 (DStrine) [20:28:48] Fundraising Sprint Quill Pencil, Fundraising Sprint R 2017, Fundraising-Backlog, FR-Ingenico, and 3 others: spike: investigate creating an ingenico form with no city and state - https://phabricator.wikimedia.org/T151769#3605864 (DStrine) [20:33:42] cwd where can I see "Ganglia metrics make it look like something happened in late June that has had all the DB servers basically out of memory since then." [20:36:39] Eileen: ah that was my mistake, a kernel update confused me [20:37:42] it started using free memory for cache [20:40:00] cwd ok - but do we have a graph indication this issue over time or only once it became an issue [20:40:17] (which is tricky because it arose when we started hammering the db) [21:06:07] (PS1) Ejegg: Fix remaining phpcs warnings, add to composer test [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/377895 [21:07:26] (PS1) Ejegg: Fix case on AstroPay and PayPal legacy UI modules [extensions/DonationInterface] (deployment) - https://gerrit.wikimedia.org/r/377896 (https://phabricator.wikimedia.org/T173869) [21:07:31] (CR) Ejegg: [C: 2] Fix case on AstroPay and PayPal legacy UI modules [extensions/DonationInterface] (deployment) - https://gerrit.wikimedia.org/r/377896 (https://phabricator.wikimedia.org/T173869) (owner: Ejegg) [21:08:36] (Merged) jenkins-bot: Fix case on AstroPay and PayPal legacy UI modules [extensions/DonationInterface] (deployment) - https://gerrit.wikimedia.org/r/377896 (https://phabricator.wikimedia.org/T173869) (owner: Ejegg) [21:11:53] (PS2) Ejegg: Fix remaining phpcs warnings, add to composer test [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/377895 [21:14:45] (PS1) Ejegg: Update 
DonationInterface submodule [core] (fundraising/REL1_27) - https://gerrit.wikimedia.org/r/377898 [21:14:51] (CR) Ejegg: [C: 2] Update DonationInterface submodule [core] (fundraising/REL1_27) - https://gerrit.wikimedia.org/r/377898 (owner: Ejegg) [21:20:11] (Merged) jenkins-bot: Update DonationInterface submodule [core] (fundraising/REL1_27) - https://gerrit.wikimedia.org/r/377898 (owner: Ejegg) [21:23:50] !log updated payments-wiki from ed2e4811067ad3d74433e8d7f4944ae838af6e28 to 26b16eadc119e3edddec83ac0e680ac60c43ecaf [21:24:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:31:31] (PS1) Ejegg: Standardize case of ext.donationInterface [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/377899 [21:33:08] (CR) jerkins-bot: [V: -1] Fix remaining phpcs warnings, add to composer test [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/377895 (owner: Ejegg) [21:58:51] fundraising-tech-ops: Sync 'variants' directory from settings repo in f_c_u -p payments-wiki - https://phabricator.wikimedia.org/T175871#3606127 (Ejegg) [22:03:57] (PS3) Ejegg: Fix remaining phpcs warnings, add to composer test [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/377895 [22:10:49] (PS1) Ejegg: Retry deadlocked inserts, take two [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/377903 (https://phabricator.wikimedia.org/T118487) [22:11:50] Fundraising Sprint Quill Pencil, Fundraising Sprint RadioActivewear, Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM, Patch-For-Review: Deadlock should result in requeueing the message - https://phabricator.wikimedia.org/T118487#1801561 (Ejegg) [22:12:50] ejegg: one thing about ^ [22:13:02] deadlocks can be avoided by auditing txn usage [22:13:11] Fundraising Sprint Quill Pencil, Fundraising Sprint RadioActivewear, Fundraising-Backlog, FR-Ingenico, and 3 others: spike: investigate creating an ingenico form with no city and state - 
https://phabricator.wikimedia.org/T151769#3606160 (Ejegg) The 'variant' code is deployed to prod, just needs... [22:14:08] cwd well, we definitely want the transaction when we're adding the user + contribution [22:15:05] I guess the select part of the merge job might not need to be in a transaction [22:15:15] yeah i imagine in the context of civi it would be really complicated [22:15:28] srsly [22:15:38] but when you see deadlocks it indicates that some transactions are bumping into each other [22:15:56] the only real "fix" would be identifying and fixing them [22:16:14] yeah, and I wouldn't be surprised if a lot of core code just wrapped everything in a transaction [22:16:26] heh yeah it's a very easy thing to armchair program [22:16:44] my specialty [22:17:13] also that metaphor fails with programming [22:17:17] easily done from an armchair [22:17:18] haha [22:19:03] ejegg: on a practical note i seem to remember that in a deadlock situation it is not deterministic which thread will be killed and which will be allowed to succeed [22:19:26] how do you know what to retry? [22:21:39] if we're trying to insert a contact, and the db comes back with a lock wait or a deadlock, we drop the message into the damaged db with a retry_date [22:22:23] when hopefully the thing that was locking up civicrm_email will be done [22:23:57] ah yeah it is probably safe to assume only the killed thread receives the error [22:24:00] obvs [22:24:11] oh yeah, I guess so [22:34:43] Fundraising Sprint RadioActivewear, Fundraising-Backlog, MediaWiki-extensions-DonationInterface: Message with blank payment_method and amount = 0 sent to payments-init - https://phabricator.wikimedia.org/T173347#3606276 (Ejegg) Looks like two different windows: * xxxx:19:09 Donor starts a CC donation... [22:36:13] see you later, folks! [22:38:28] cwd regarding deadlock retries - civi retries 3 times I think [22:38:52] we could change that.
I note a lot are on the email table which I expect is lots of small inserts [22:42:21] eileen: i doubt it's causing a ton of problems [22:42:39] would be nice to fix the "right" way but probably nigh impossible [22:43:54] cwd I don't know what the right way is TBH [22:44:15] I mean deadlocks are kinda there to handle too much traffic for the db to keep up with in this case [22:46:58] if you are careful enough with transactions from the get-go you can avoid creating situations where deadlocks can occur [22:47:24] but in anything civi-size that ship has probably sailed :) [22:50:13] also there would probably be a performance hit [23:11:14] eileen: frdb2001 seems to be operating normally since restarting mysql: https://grafana.wikimedia.org/dashboard/db/frdb2001?orgId=1 [23:11:18] i do not know what to make of that [23:11:53] cwd It kinda feels like a pattern - that we get a sweet spot after a restart [23:12:29] sure seems that way [23:12:45] think of it like the kids' bedrooms. After a big spring clean there is a period where most things get put away. but then as more & more things don't there is an increasing momentum to un-openable chaos [23:17:24] heheh [23:17:38] except i expect my mysql housekeeper to be on time
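
Note on the 22:21 deadlock exchange: the requeue idea described there (on a lock wait or deadlock, drop the message into the damaged db with a retry_date) could look roughly like the sketch below. This is only an illustration, not the actual crm code. `DbError`, `DamagedRow`, `import_message`, `due_for_retry` and the 5-minute delay are all made-up names and values for the example; the only real things are the two MySQL error codes, 1205 (lock wait timeout) and 1213 (deadlock).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# MySQL server error codes for the two lock failures worth retrying.
ER_LOCK_WAIT_TIMEOUT = 1205
ER_LOCK_DEADLOCK = 1213
RETRYABLE = {ER_LOCK_WAIT_TIMEOUT, ER_LOCK_DEADLOCK}


class DbError(Exception):
    """Hypothetical stand-in for a driver exception carrying a MySQL error code."""

    def __init__(self, code, text=""):
        super().__init__(text)
        self.code = code


@dataclass
class DamagedRow:
    """Hypothetical row in the 'damaged' store: the message plus when to retry it."""
    message: dict
    error_code: int
    retry_date: datetime


def import_message(message, insert_contact, damaged, now,
                   delay=timedelta(minutes=5)):  # delay is an assumed value
    """Try the insert; on a deadlock or lock wait, park the message for later."""
    try:
        insert_contact(message)
        return True
    except DbError as err:
        if err.code not in RETRYABLE:
            raise  # anything else is a real failure, not contention
        damaged.append(DamagedRow(message, err.code, now + delay))
        return False


def due_for_retry(damaged, now):
    """Return parked messages whose retry_date has arrived."""
    return [row for row in damaged if row.retry_date <= now]
```

Only the statement that the server picked as the victim gets the error, so the caller that sees 1213 is the one whose message is parked; the other transaction proceeds and needs no retry.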