[02:07:40] (PS1) Eileen: Towards CRM-20155 clean up form code in order to consolidate function use. [wikimedia/fundraising/crm/civicrm] - https://gerrit.wikimedia.org/r/373173
[02:11:31] (PS2) Eileen: Add ability to find duplicates for selected contacts. [wikimedia/fundraising/crm/civicrm] - https://gerrit.wikimedia.org/r/373155 (https://phabricator.wikimedia.org/T151270)
[02:39:27] (PS3) Eileen: Add ability to find duplicates for selected contacts. [wikimedia/fundraising/crm/civicrm] - https://gerrit.wikimedia.org/r/373155 (https://phabricator.wikimedia.org/T151270)
[02:59:20] (CR) Eileen: "OK - this is working now on staging" [wikimedia/fundraising/crm/civicrm] - https://gerrit.wikimedia.org/r/373155 (https://phabricator.wikimedia.org/T151270) (owner: Eileen)
[03:02:20] Fundraising Sprint Prank Seatbelt, Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM: civi dedupe: offer dedupe option in a regular search - https://phabricator.wikimedia.org/T151270#3543669 (Eileenmcnaughton) Ok - working on staging now - search for a contact where you know there are duplicate emai...
[03:34:23] (PS1) Eileen: CRM-20658: Fatal error on Dedupe rule for > 1 match [wikimedia/fundraising/crm/civicrm] - https://gerrit.wikimedia.org/r/373176 (https://phabricator.wikimedia.org/T160571)
[04:19:16] (PS2) Eileen: CRM-20658: Fatal error on Dedupe rule for > 1 match [wikimedia/fundraising/crm/civicrm] - https://gerrit.wikimedia.org/r/373176 (https://phabricator.wikimedia.org/T160571)
[04:24:45] Fundraising Sprint Far Beer, Fundraising Sprint Gondwanaland Reunification Engine, Fundraising Sprint Homebrew Hadron Collider, Fundraising Sprint Ivory Tower Defense Games, and 8 others: Errors in CiviCRM dedupe screen - https://phabricator.wikimedia.org/T160571#3543696 (Eileenmcnaughton)
[04:25:32] Fundraising Sprint Far Beer, Fundraising Sprint Gondwanaland Reunification Engine, Fundraising Sprint Homebrew Hadron Collider, Fundraising Sprint Ivory Tower Defense Games, and 8 others: Errors in CiviCRM dedupe screen - https://phabricator.wikimedia.org/T160571#3103876 (Eileenmcnaughton) I'm pu...
[14:46:40] cwd: heads up that we've got a Big English test launching in 15 minutes
[14:46:59] pcoombe: thanks!
[14:47:03] we'll be watching
[14:47:19] cool. It lasts 1 hour, let me know if you see any issues
[14:47:58] sounds good
[14:48:18] i expect to see replag spam in here, will probably fiddle some settings and see what happens
[14:52:10] let me know when you're around ejegg|away
[15:54:51] hi mepps!
[15:55:19] back in Bogota, just sorting out a couple of account things
[15:55:30] hi ejegg! are you working today?
[15:55:45] yep!
[15:56:49] great
[15:57:04] i'm still sick but i'd really like to close out the orphan rectifier stuff today if possible
[15:59:01] i have to take a 10 minute baby break then want to hangout?
[15:59:14] i have my checkin with katie at 12:30
[15:59:46] sure, sounds good!
[16:15:54] ejegg queenmary?
[16:16:07] mepps one sec!
[16:20:15] (PS9) Mepps: WIP Orphan Slayer Module [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/370225
[16:22:00] ejegg: does the donation consumer update c_t?
[16:27:10] cwd yep
[16:28:03] i bet that fights with the front end some
[16:28:05] over the write lock
[16:28:35] we'll do the guid thing some day
[16:30:14] PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2817
[16:30:26] here it is
[16:30:34] damn
[16:30:47] * cwd silences phone
[16:32:10] (CR) jerkins-bot: [V: -1] WIP Orphan Slayer Module [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/370225 (owner: Mepps)
[16:35:14] ejegg: to disable a job you just comment out the schedule?
[16:35:14] PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2217
[16:38:18] !log disabled all dedupe
[16:38:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:38:31] cwd yep, sorry
[16:38:41] np, just checked git
[16:38:46] we'll see if that does it
[16:40:15] PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1763
[16:43:20] there are still some dedupes running
[16:43:47] ejegg, Jeff_Green - opinions on killing em?
[16:44:16] multiple?
[16:44:42] ah nm there are just 2 lines for it
[16:44:49] but it's been running almost an hour
[16:44:50] aside: something heavy happens every 4h 30m
[16:44:55] oh really
[16:45:04] i think it's probably this
[16:45:14] PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1237
[16:46:14] which PID are you talking about? 32641?
[16:47:00] or its parent
[16:47:02] 32633
[16:47:38] is that the sudo wrapper?
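[Editor's note: the check_mysql alerts above fire on the Seconds Behind Master value. As an illustration only (this is not the actual icinga plugin), here is a minimal sketch of how such a check might parse `SHOW SLAVE STATUS\G` output and map lag to an alert state; the threshold values are hypothetical.]

```python
import re

def parse_lag(show_slave_status_text):
    """Extract Seconds_Behind_Master from `SHOW SLAVE STATUS\\G` output.

    Returns None when the value is NULL (replication stopped)."""
    m = re.search(r"Seconds_Behind_Master:\s*(\S+)", show_slave_status_text)
    if m is None or m.group(1) == "NULL":
        return None
    return int(m.group(1))

def classify(lag, warn=300, crit=1800):
    """Map a lag value to a Nagios-style state (thresholds are made up)."""
    if lag is None:
        return "CRITICAL"  # replication not running at all
    if lag >= crit:
        return "CRITICAL"
    if lag >= warn:
        return "WARNING"
    return "OK"

# Sample output shaped like the 16:30:14 alert above
sample = """\
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
        Seconds_Behind_Master: 2817
"""
print(classify(parse_lag(sample)))
```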
[16:48:17] i think 32633 is the wrapper
[16:48:31] who knows, at any rate let's look at the debug log
[16:49:55] the p-c logs don't say much
[16:50:14] RECOVERY - check_mysql on frdb2001 is OK: Uptime: 571814 Threads: 1 Questions: 12784911 Slow queries: 3383 Opens: 8085 Flush tables: 1 Open tables: 601 Queries per second avg: 22.358 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0
[16:52:58] sorry i'm stupid it has not been running for almost an hour
[16:53:04] almost 20 minutes
[16:53:54] but clearly dedupe chokes when donations are hot
[16:54:35] and i can certainly imagine why that would cause replag
[16:54:51] it usually runs every 5 minutes
[16:55:02] and takes however much less than that so we don't see mails about overlap
[16:55:13] but it's taking 20+ minutes when banners are up
[16:58:21] something ran around these times: 2:35AM, 7:00AM, 11:30AM, 4:00PM that spiked load on the master db
[17:00:22] argh
[17:00:31] and it ran at the same time as these banners
[17:00:47] which messes up the results
[17:01:13] however the dedupe thing still applies in that we don't see that mail normally
[17:38:56] spatton: are banners still up?
[17:39:57] i think they are not
[17:45:14] PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2779
[17:48:13] whyyyyyyyyy
[17:50:14] RECOVERY - check_mysql on frdb2001 is OK: Uptime: 575413 Threads: 1 Questions: 13221331 Slow queries: 3383 Opens: 8105 Flush tables: 1 Open tables: 601 Queries per second avg: 22.977 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0
[18:00:49] ejegg, meet on the call in 5? i just need to grab some water
[18:01:27] sounds good mepps!
[18:07:16] ejegg meet in queenmary or in the meeting call?
[18:08:20] mepps oops, i'm in queenmary
[18:08:27] forgot there was a separate meeting call
[18:08:35] okay joining there!
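[Editor's note: the 16:55 exchange above describes a job scheduled every 5 minutes that overruns its interval under banner load, so runs pile up. A common guard for that is a non-blocking file lock: if a previous invocation still holds the lock, the new one skips instead of stacking. This sketch shows only the general `flock` pattern; the lock path and the way the job is wired in are hypothetical, not how process-control actually does it.]

```python
import fcntl
import os

def run_exclusively(job, lock_path="/tmp/dedupe-job.lock"):
    """Run `job()` only if no earlier invocation still holds the lock.

    Returns True if the job ran, False if it was skipped because a
    previous run (e.g. one slowed down by banner traffic) is still going."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR)
    try:
        try:
            # Non-blocking exclusive lock: fail fast instead of queueing up
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False  # earlier run still in progress: skip this tick
        job()
        return True
    finally:
        # Closing the descriptor releases any lock this call acquired
        os.close(fd)
```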
[18:21:21] (PS10) Mepps: WIP Orphan Slayer Module [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/370225
[18:42:57] (PS11) Mepps: WIP Orphan Slayer Module [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/370225
[19:00:01] mepps sorry, rejoining
[19:05:10] (PS12) Mepps: WIP Orphan Slayer Module [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/370225
[19:10:54] (CR) jerkins-bot: [V: -1] WIP Orphan Slayer Module [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/370225 (owner: Mepps)
[19:19:56] (PS1) Ejegg: Return inserted IDs for pending and payments_init [wikimedia/fundraising/SmashPig] - https://gerrit.wikimedia.org/r/373347
[19:24:47] (PS13) Mepps: WIP Orphan Slayer Module, getting expected error message [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/370225
[19:31:10] (CR) jerkins-bot: [V: -1] WIP Orphan Slayer Module, getting expected error message [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/370225 (owner: Mepps)
[19:47:46] AndyRussG: meeting?
[19:52:48] Fundraising-Backlog, Wikimedia-Fundraising, MediaWiki-extensions-CentralNotice, Documentation: Banner and donatewiki style guide documentation needs updating - https://phabricator.wikimedia.org/T119821#3546874 (Pcoombe) Open>Resolved There is now some updated donatewiki documentation on c...
[20:11:38] ejegg do you have any reviews you want me to look at?
[20:11:57] also can you take a look at the orphan rectifier stuff in donationinterface?
[20:15:14] mepps there's the mastercard stuff for review
[20:15:44] and that little smashpig one if you want to get the pending db IDs back from storeMessage: https://gerrit.wikimedia.org/r/373347
[20:15:57] i'll definitely look at the donationinterface bits!
[20:16:06] (PS2) Mepps: Update Mastercard logo [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/373109 (https://phabricator.wikimedia.org/T166795) (owner: Ejegg)
[20:16:11] (CR) Mepps: [C: 2] Update Mastercard logo [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/373109 (https://phabricator.wikimedia.org/T166795) (owner: Ejegg)
[20:20:46] (Merged) jenkins-bot: Update Mastercard logo [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/373109 (https://phabricator.wikimedia.org/T166795) (owner: Ejegg)
[20:36:21] eileen, ejegg - either of you have any idea what these load spikes might be? https://ganglia.wikimedia.org/latest/graph.php?r=day&z=xlarge&c=Fundraising+eqiad&h=frdb1001.frack.eqiad.wmnet&jr=&js=&v=1.09&m=load_one&vl=+&ti=One+Minute+Load+Average
[20:36:48] i do not see any p-c jobs with a 6 hour schedule unless i'm misreading
[20:36:59] wow, those are pretty chunky
[20:37:18] yep and one was right in the middle of the test today
[20:40:07] doesn't seem to be the audit parsers
[20:40:15] (PS2) Mepps: Update MasterCard -> Mastercard [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/373110 (https://phabricator.wikimedia.org/T166795) (owner: Ejegg)
[20:40:20] (CR) Mepps: [C: 2] Update MasterCard -> Mastercard [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/373110 (https://phabricator.wikimedia.org/T166795) (owner: Ejegg)
[20:41:09] cwd silverpop export starts at 06:00, but there's no spike for another hour
[20:42:17] (Merged) jenkins-bot: Update MasterCard -> Mastercard [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/373110 (https://phabricator.wikimedia.org/T166795) (owner: Ejegg)
[20:42:19] and that should just be once a day right?
[20:42:25] yeah
[20:42:36] was wondering if this was multiple things
[20:42:45] but the spikes do look pretty regular
[20:43:06] yeah
[20:43:15] the last one is bigger presumably cause of the banners
[20:43:21] hmmmm
[20:43:44] do they start on the half hour?
[20:45:51] not exactly
[20:46:00] the distribution is a little funky
[20:47:52] hmm, do they last a half an hour each?
[20:48:12] what sort of spike is it?
[20:48:21] could it be some huge report?
[20:48:37] processor or ram or network or writes?
[20:49:10] (CR) Mepps: Support srcset for card logos (1 comment) [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/373112 (https://phabricator.wikimedia.org/T166795) (owner: Ejegg)
[20:49:35] eileen: that graph is cpu
[20:49:56] on the db server I guess
[20:51:15] (CR) Ejegg: Support srcset for card logos (1 comment) [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/373112 (https://phabricator.wikimedia.org/T166795) (owner: Ejegg)
[20:51:21] note the silverpop export has a bunch of queries within it & one or more of them might be more intensive. Also I wonder about cache flushing causing it?
[20:53:13] ejegg see comment above
[20:53:29] thinking those load spikes are related to ossec
[20:53:36] fs scans
[20:54:39] eileen: isn't silverpop running on staging anyway?
[20:54:43] i think this is that: https://ganglia.wikimedia.org/latest/graph.php?r=week&z=xlarge&c=Fundraising+eqiad&h=frdev1001.frack.eqiad.wmnet&jr=&js=&v=1.09&m=load_one&vl=+&ti=One+Minute+Load+Average
[21:07:37] mepps responded!
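[Editor's note: the "do they start on the half hour?" question above is about whether the spikes have a regular cadence. Using the spike times read off the master-db graph earlier (2:35AM, 7:00AM, 11:30AM, 4:00PM), the intervals can be checked directly; this is just an illustrative calculation over those four observations.]

```python
from datetime import datetime

# Spike times quoted in the discussion of the frdb1001 load graph
spikes = ["02:35", "07:00", "11:30", "16:00"]

times = [datetime.strptime(t, "%H:%M") for t in spikes]
gaps = [b - a for a, b in zip(times, times[1:])]
for gap in gaps:
    print(gap)
# Intervals clustered around 4h30m suggest one recurring job
# (e.g. a scheduled scan) rather than several unrelated things.
```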
[21:07:59] or rather, mepps: I responded
[21:08:22] hehe
[21:08:24] cwd yep, it's hitting staging
[21:08:45] cool
[21:09:05] but the actual process runs on civi1001
[21:11:16] (CR) Eileen: [C: 2] Update list of processors in Gateway Reconciliation report [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/370979 (owner: Ejegg)
[21:17:37] (Merged) jenkins-bot: Update list of processors in Gateway Reconciliation report [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/370979 (owner: Ejegg)
[21:36:25] cwd am I right in believing there were no notable issues with replag over the one hour test, or Caitlin's 60k email
[21:36:50] no, email not gone yet I guess
[21:37:52] eileen1: there was some lag, not as bad as last time
[21:41:25] (PS1) Ejegg: Merge branch 'master' into deployment [extensions/DonationInterface] (deployment) - https://gerrit.wikimedia.org/r/373385
[21:41:59] (CR) Ejegg: [C: 2] Merge branch 'master' into deployment [extensions/DonationInterface] (deployment) - https://gerrit.wikimedia.org/r/373385 (owner: Ejegg)
[21:42:51] (Merged) jenkins-bot: Merge branch 'master' into deployment [extensions/DonationInterface] (deployment) - https://gerrit.wikimedia.org/r/373385 (owner: Ejegg)
[21:47:00] cwd hmm lag on a one hour test doesn't bode well for BE
[21:47:15] you ain't kiddin
[21:47:57] i pulled some numbers yesterday and last week's test was around 1/10 the traffic of a busy big english day
[21:49:00] so what's the best way to figure out the source?
[21:50:03] we could turn off all the jobs, wait for a big chunk of donations to build up in the queue, then run just the queue consumer at full tilt, to see if that gets us lag
[21:51:47] yeah we should test donations without dedupe & silverpop.fetch on - because that will be the case for the first few days of BE
[21:53:27] that's not a bad idea
[21:53:51] we turned off a regular security scan, that may have had something to do with it
[21:54:05] ossec-syscheckd
[21:54:25] it's mostly doing duplicated work at this point
[23:48:49] eileen1: sorry I've been tied up with a bunch of other stuff. What is on staging to be reviewed?
[23:49:39] dstrine: if you do a contact search now you will have another action 'Find duplicate contacts' I think
[23:50:04] try for a contact you know there is an email dupe for
[23:50:42] (eg. pick a name from this link civicrm/contact/dedupefind?reset=1&rgid=13&gid=268&limit=500000000&action=update)
[23:51:00] can do hangout / screen share if easier
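[Editor's note: the experiment proposed at 21:50 — let the queue fill, then run only the consumer at full tilt and see whether replication lag appears — needs the lag recorded while the backlog drains. A minimal poller might look like this; `get_lag` is hypothetical glue for whatever returns Seconds Behind Master (e.g. a SHOW SLAVE STATUS query), and the poll counts are illustrative.]

```python
import time

def peak_lag(get_lag, polls, interval=10, sleep=time.sleep):
    """Poll replication lag while the consumer drains the backlog.

    `get_lag` returns the slave's current Seconds Behind Master.
    Returns the peak lag observed, so runs with and without
    dedupe/silverpop enabled can be compared with one number."""
    peak = 0
    for _ in range(polls):
        peak = max(peak, get_lag())
        sleep(interval)
    return peak
```

The `sleep` parameter is injected only so the loop can be exercised without real waiting; in use it defaults to `time.sleep`.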