[01:51:18] <wikibugs>	 (CR) AndyRussG: Controls to purge banner content from front-end cache for a language (6 comments) [extensions/CentralNotice] - https://gerrit.wikimedia.org/r/364910 (https://phabricator.wikimedia.org/T168673) (owner: AndyRussG)
[03:07:12] <wikibugs>	 (PS1) Eileen: Add filters to mailing report. [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/367847 (https://phabricator.wikimedia.org/T161758)
[03:30:54] <wikibugs>	 (PS1) Eileen: Omnimailing - extendeded mailing report - add suppressed [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/367848 (https://phabricator.wikimedia.org/T161758)
[03:56:00] <wikibugs>	 Fundraising Sprint Gondwanaland Reunification Engine, Fundraising Sprint Homebrew Hadron Collider, Fundraising Sprint Ivory Tower Defense Games, Fundraising Sprint Judgement Suspenders, and 8 others: retrieve the text/ html and statistics data for m... - https://phabricator.wikimedia.org/T161758#3473641
[03:56:33] <wikibugs>	 Fundraising Sprint Loose Lego Carpeting, Fundraising Sprint Murphy's Lawyer, Fundraising Sprint Navel Warfare, Fundraising-Backlog, and 2 others: Add ability for MG to import to Primary address type - https://phabricator.wikimedia.org/T169025#3473642 (Eileenmcnaughton)
[03:56:41] <wikibugs>	 Fundraising Sprint Loose Lego Carpeting, Fundraising Sprint Murphy's Lawyer, Fundraising Sprint Navel Warfare, Fundraising-Backlog, and 2 others: Add ability for MG to import to Primary address type - https://phabricator.wikimedia.org/T169025#3384910 (Eileenmcnaughton) Open>Resolved
[04:01:37] <wikibugs>	 Fundraising Sprint Judgement Suspenders, Fundraising Sprint Kickstopper, Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM, and 2 others: Silverpop Figure out how to deal with merged contacts mailing records - https://phabricator.wikimedia.org/T171703#3473652 (Eileenmcnaughton)
[04:21:19] <wikibugs>	 (PS1) Eileen: Fix typo causing enotice & suppressed not to populate [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/367851 (https://phabricator.wikimedia.org/T161758)
[04:48:44] <wikibugs>	 Fundraising Sprint Gondwanaland Reunification Engine, Fundraising Sprint Homebrew Hadron Collider, Fundraising Sprint Ivory Tower Defense Games, Fundraising Sprint Judgement Suspenders, and 8 others: Drush not handling spaces in quotes / schedule Si... - https://phabricator.wikimedia.org/T171435#3473673
[04:55:17] <icinga-wm>	 PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2186
[04:59:13] <eileen1>	 Hmm that alert might be just the volume of mailing data being loaded
[05:00:08] <icinga-wm>	 RECOVERY - check_mysql on frdb2001 is OK: Uptime: 1260470 Threads: 1 Questions: 46313750 Slow queries: 6781 Opens: 9557 Flush tables: 1 Open tables: 608 Queries per second avg: 36.743 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0
[09:32:02] <wikibugs>	 Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM, Patch-For-Review: Populate country column when creating c_t rows during offline import - https://phabricator.wikimedia.org/T171658#3474004 (Pcoombe)
[10:37:44] <wikibugs>	 (Abandoned) Hashar: Jenkins job validation (DO NOT SUBMIT) [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/324066 (owner: Hashar)
[11:08:18] <wikibugs>	 (CR) Hashar: "recheck" [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/363141 (owner: Hashar)
[11:09:49] <wikibugs>	 (CR) jerkins-bot: [V: -1] CI: install CiviCRM with a fake sendmail [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/363141 (owner: Hashar)
[11:12:01] <wikibugs>	 (PS2) Hashar: CI: install CiviCRM with a fake sendmail [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/363141
[11:13:32] <wikibugs>	 (CR) jerkins-bot: [V: -1] CI: install CiviCRM with a fake sendmail [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/363141 (owner: Hashar)
[11:13:51] <wikibugs>	 (CR) Hashar: "The mysterious failure only occurs on integration-slave-jessie-1001 while -1002 works fine bah" [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/363141 (owner: Hashar)
[11:17:39] <wikibugs>	 Wikimedia-Fundraising-CiviCRM, Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban): wikimedia-fundraising-civicrm fails with Call to a member function getDriver() on null in phar:///srv/jenkins-workspace/workspace/wikimedia-fundrais... - https://phabricator.wikimedia.org/T171724#3474287
[12:07:24] <wikibugs>	 Wikimedia-Fundraising-CiviCRM, Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban): wikimedia-fundraising-civicrm fails with Call to a member function getDriver() on null in phar:///srv/jenkins-workspace/workspace/wikimedia-fundrais... - https://phabricator.wikimedia.org/T171724#3474393
[12:09:15] <wikibugs>	 (CR) Hashar: "it fails on both hosts :(" [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/363141 (owner: Hashar)
[12:32:02] <wikibugs>	 (Restored) Hashar: Jenkins job validation (DO NOT SUBMIT) [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/151840 (owner: Hashar)
[12:32:08] <wikibugs>	 (PS4) Hashar: Jenkins job validation (DO NOT SUBMIT) [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/151840
[12:51:44] <wikibugs>	 (CR) Hashar: "Found it:" [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/363141 (owner: Hashar)
[12:55:05] <wikibugs>	 Wikimedia-Fundraising-CiviCRM, Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban): wikimedia-fundraising-civicrm fails with Call to a member function getDriver() on null in phar:///srv/jenkins-workspace/workspace/wikimedia-fundrais... - https://phabricator.wikimedia.org/T171724#3474628
[12:56:59] <wikibugs>	 (PS1) Hashar: (DO NOT SUBMIT) Strict and verbose ci-populate-dbs [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/367886 (https://phabricator.wikimedia.org/T171724)
[12:58:36] <wikibugs>	 (CR) jerkins-bot: [V: -1] (DO NOT SUBMIT) Strict and verbose ci-populate-dbs [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/367886 (https://phabricator.wikimedia.org/T171724) (owner: Hashar)
[13:03:44] <wikibugs>	 (CR) Hashar: [C: -1] "sendmail_path is php.ini setting not an amp one. Found out via travis.yaml file:" [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/363141 (owner: Hashar)
[13:05:49] <pcoombe>	 Good morning MBeat! Big English banners just went up
[13:06:03] <MBeat>	 thanks, pcoombe !
[13:06:37] <MBeat>	 I like the earlier start time
[13:34:04] <wikibugs>	 (PS2) Hashar: Make CI scripts more stricts [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/367886 (https://phabricator.wikimedia.org/T171724)
[13:45:10] <icinga-wm>	 PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3394
[13:50:06] <wikibugs>	 (PS3) Hashar: Make CI scripts more stricts [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/367886 (https://phabricator.wikimedia.org/T171724)
[13:50:10] <icinga-wm>	 PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3390
[13:51:10] <wikibugs>	 (PS1) Hashar: Fix misc bash oddities in the CI scripts [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/367893
[13:55:11] <icinga-wm>	 PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3376
[14:00:10] <icinga-wm>	 PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3375
[14:05:10] <icinga-wm>	 PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3369
[14:05:47] <wikibugs>	 (PS3) Hashar: CI: install CiviCRM with a fake sendmail [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/363141 (https://phabricator.wikimedia.org/T161724)
[14:06:27] <pcoombe>	 MBeat: Big English test just finished. All looks good from my end :)
[14:06:39] <wikibugs>	 (PS4) Hashar: CI: install CiviCRM with a fake sendmail [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/363141 (https://phabricator.wikimedia.org/T161724)
[14:06:44] <MBeat>	 great, thank you! nothing glaring in Zendesk
[14:10:10] <icinga-wm>	 PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3419
[14:11:44] <wikibugs>	 (PS5) Hashar: CI: install CiviCRM with a fake sendmail [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/363141 (https://phabricator.wikimedia.org/T161724)
[14:13:11] <wikibugs>	 (CR) jerkins-bot: [V: -1] CI: install CiviCRM with a fake sendmail [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/363141 (https://phabricator.wikimedia.org/T161724) (owner: Hashar)
[14:13:32] <wikibugs>	 Wikimedia-Fundraising-CiviCRM, Continuous-Integration-Infrastructure, Patch-For-Review, Release-Engineering-Team (Kanban): wikimedia-fundraising-civicrm fails with Call to a member function getDriver() on null in phar:///srv/jenkins-workspace/worksp... - https://phabricator.wikimedia.org/T171724#3474819
[14:15:10] <icinga-wm>	 PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3449
[14:15:11] <icinga-wm>	 PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1215
[14:20:10] <icinga-wm>	 PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1301
[14:20:11] <icinga-wm>	 PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3458
[14:25:10] <icinga-wm>	 PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1394
[14:25:15] <icinga-wm>	 PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3454
[14:30:10] <icinga-wm>	 PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1479
[14:30:10] <icinga-wm>	 PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3458
[14:35:10] <icinga-wm>	 PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1555
[14:35:20] <icinga-wm>	 PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3460
[14:40:10] <icinga-wm>	 PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3450
[14:40:11] <icinga-wm>	 PROBLEM - check_mysql on payments2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[14:45:10] <icinga-wm>	 RECOVERY - check_mysql on frdb1002 is OK: Uptime: 1296929 Threads: 1 Questions: 69285377 Slow queries: 7992 Opens: 10298 Flush tables: 1 Open tables: 610 Queries per second avg: 53.422 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0
[14:45:20] <icinga-wm>	 PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3470
[14:45:21] <icinga-wm>	 PROBLEM - check_mysql on payments2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[14:47:24] <wikibugs>	 (CR) Mepps: "One quick question but overall looks good" (1 comment) [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/365543 (owner: Eileen)
[14:50:10] <icinga-wm>	 PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3498
[14:50:11] <icinga-wm>	 PROBLEM - check_mysql on payments2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[14:51:21] <ejegg|away>	 oh wow, those silverpop activity imports are brutal on the db...
[14:55:20] <icinga-wm>	 PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3505
[14:55:20] <icinga-wm>	 PROBLEM - check_mysql on payments2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[14:57:58] <cwd>	 hopefully we can move some of these type of alerts to prometheus and have more of a sliding scale
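cwd's "sliding scale" idea could mean graded severities plus a trend, instead of the single CRITICAL threshold behind the alerts above. The sketch below is minimal and assumes the check output keeps the `Seconds Behind Master: N` format shown in these icinga-wm lines. The WARN/CRIT thresholds and all function names are hypothetical, and the sketch does not reflect how the real Icinga or Prometheus configuration works.

```python
import re
from typing import Optional, Sequence

# Hypothetical policy thresholds (seconds of replica lag); not the real Icinga config.
WARN_SECONDS = 300
CRIT_SECONDS = 1800

# Matches the lag field as printed by check_mysql in the alerts above.
_LAG_RE = re.compile(r"Seconds Behind Master: (\d+|\(null\))")


def parse_lag(check_output: str) -> Optional[int]:
    """Return replica lag in seconds, or None when it is missing or '(null)'."""
    match = _LAG_RE.search(check_output)
    if match is None or match.group(1) == "(null)":
        return None
    return int(match.group(1))


def severity(lag: Optional[int]) -> str:
    """Map a lag value onto graded states instead of a single cut-off."""
    if lag is None:
        return "UNKNOWN"  # e.g. 'Slave IO: Connecting', so no lag is reported
    if lag >= CRIT_SECONDS:
        return "CRITICAL"
    if lag >= WARN_SECONDS:
        return "WARNING"
    return "OK"


def avg_change_per_check(samples: Sequence[int]) -> float:
    """Average lag change between consecutive checks; > 0 means falling behind."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    return (samples[-1] - samples[0]) / (len(samples) - 1)
```

Applied to the readings above, frdb1002's samples from 14:15 to 14:35 (1215 to 1555) average +85 seconds per 5-minute check, so that replica is falling behind. frdb2001's samples from 13:45 to 14:05 (3394 to 3369) average -6.25, so it is slowly catching up. A trend like this separates those two cases, which a flat threshold cannot.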
[15:00:10] <icinga-wm>	 PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3530
[15:00:20] <icinga-wm>	 PROBLEM - check_mysql on payments2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[15:03:33] <wikibugs>	 Wikimedia-Fundraising-CiviCRM, Continuous-Integration-Infrastructure, Patch-For-Review, Release-Engineering-Team (Kanban): wikimedia-fundraising-civicrm fails with Call to a member function getDriver() on null in phar:///srv/jenkins-workspace/worksp... - https://phabricator.wikimedia.org/T171724#3474977
[15:05:10] <icinga-wm>	 PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3570
[15:05:11] <icinga-wm>	 PROBLEM - check_mysql on payments2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[15:05:49] <ejegg|semi>	 cwd ah, also this is an initial load of all of last year's bulk mailings
[15:06:15] <ejegg|semi>	 k, rain has lessened, gonna bike the rest of the way to the office
[15:10:10] <icinga-wm>	 PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3596
[15:10:11] <icinga-wm>	 PROBLEM - check_mysql on payments2001 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[15:15:10] <icinga-wm>	 RECOVERY - check_mysql on frdb2001 is OK: Uptime: 1297370 Threads: 1 Questions: 68625910 Slow queries: 7270 Opens: 10231 Flush tables: 1 Open tables: 608 Queries per second avg: 52.896 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0
[15:15:13] <mepps>	 AndyRussG my head is in the clouds today, do you want to meet?
[15:15:20] <icinga-wm>	 RECOVERY - check_mysql on payments2001 is OK: Uptime: 1789429 Threads: 4 Questions: 18090 Slow queries: 0 Opens: 17 Flush tables: 1 Open tables: 80 Queries per second avg: 0.010 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0
[15:26:42] <ejegg>	 hi mepps and AndyRussG
[15:26:52] <mepps>	 hi ejegg!
[15:27:46] <cwd>	 ejegg: hey, got a second to talk about the mysql issue?
[15:27:55] <ejegg>	 cwd sure
[15:28:18] <ejegg>	 afaik it should dissipate soon
[15:28:24] <cwd>	 basically we want to avoid excessive replag
[15:28:40] <cwd>	 if anything happened to the master at this point we'd lose data permanently
[15:28:41] <ejegg>	 lemme see how far through the past year it's gotten
[15:28:55] <cwd>	 plus i have 250+ text messages from icinga
[15:29:26] <ejegg>	 yeah, so we should do something special when we need to run huge initial data loads like this one
[15:29:45] <cwd>	 yeah
[15:29:54] <cwd>	 what was the actual process? a drush command?
[15:30:10] <icinga-wm>	 PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1932
[15:30:18] <ejegg>	 yep, it's the new process-control job omnimail_recipient_load;
[15:30:50] <cwd>	 gotcha
[15:31:29] <cwd>	 ejegg: would you mind writing a note to tech explaining what happened so we can have a conversation about how to handle it more generally?
[15:31:41] <cwd>	 i'm sure there's no catch-all fix to long running queries
[15:31:47] <cwd>	 but we could discuss different approaches
[15:32:15] <ejegg>	 cwd it might not even be a long-running query, just a sustained volume of inserts
[15:33:11] <ejegg>	 it's added 22M rows to civicrm_mailing_provider_data over the past 10 hrs, and 6M to civicrm_mailing_recipients
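To put ejegg's numbers in perspective, here is a quick back-of-the-envelope rate calculation. It treats "22M", "6M" and "10 hrs" as exact, even though they are rounded figures from chat.

```python
ELAPSED_SECONDS = 10 * 60 * 60  # "over the past 10 hrs"

# Row counts quoted in chat (rounded).
rows_added = {
    "civicrm_mailing_provider_data": 22_000_000,  # "22M rows"
    "civicrm_mailing_recipients": 6_000_000,      # "6M"
}

# Average insert rate per table, and combined, in rows per second.
rates = {table: rows / ELAPSED_SECONDS for table, rows in rows_added.items()}
total_rate = sum(rows_added.values()) / ELAPSED_SECONDS

for table, rate in rates.items():
    print(f"{table}: ~{rate:.0f} rows/s")
print(f"combined: ~{total_rate:.0f} rows/s")
```

This prints roughly 611, 167 and 778 rows per second. That fits ejegg's point that a sustained insert volume, rather than any single long query, could be enough to push the replicas behind.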
[15:34:33] <AndyRussG>	 mepps: oooops!!!!!
[15:34:43] <AndyRussG>	 aaaarg
[15:34:45] <AndyRussG>	 sorrrryy!
[15:34:49] <AndyRussG>	 Totally spaced out
[15:35:20] <icinga-wm>	 PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1289
[15:35:26] <Jeff_Green>	 ejegg: let's stop the job and evaluate
[15:35:40] <AndyRussG>	 how about in 1/2 hour?
[15:40:10] <icinga-wm>	 PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1259
[15:40:49] <ejegg>	 cwd wrote that note
[15:41:19] <cwd>	 thanks!
[15:41:19] <wikibugs>	 (Abandoned) Awight: WIP: DonationForm [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/61929 (owner: Awight)
[15:41:28] <wikibugs>	 (Abandoned) Awight: WIP device filtering in GlobalAllocation [extensions/CentralNotice] - https://gerrit.wikimedia.org/r/63100 (owner: Awight)
[15:41:30] <wikibugs>	 (Abandoned) Awight: [WIP] Minor payment_submethod cleanup [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/64236 (owner: Awight)
[15:41:33] <wikibugs>	 (Abandoned) Awight: WIP Adapter is not always initialized with data [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/64345 (owner: Awight)
[15:41:35] <wikibugs>	 (Abandoned) Awight: [WIP] GatewayAdapter::isSupported [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/64872 (owner: Awight)
[15:41:39] <ejegg>	 ooh, ghosts!
[15:41:41] <wikibugs>	 (Abandoned) Awight: WIP tests for the return_value_map [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/86790 (owner: Awight)
[15:41:50] <wikibugs>	 (Abandoned) Awight: WIP dedupe report [wikimedia/fundraising/tools] - https://gerrit.wikimedia.org/r/93429 (owner: Awight)
[15:41:54] <mepps>	 AndyRussG, that should work, I'll be here until your time 11:45/12:45 EST
[15:41:57] <AndyRussG>	 somebody doesn't want to be haunted anymore
[15:42:08] <AndyRussG>	 mepps: ok thanks!!!! many apologies
[15:42:17] <ejegg>	 the power of gerrit compels you!
[15:42:18] <mepps>	 AndyRussG, it's okay I spaced too
[15:42:45] <wikibugs>	 (Abandoned) Awight: [WIP] protect findAccount in case there is no -AccountInfo [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/95873 (owner: Awight)
[15:42:49] <wikibugs>	 (Abandoned) Awight: WIP example worldpay audit conf [wikimedia/fundraising/tools] - https://gerrit.wikimedia.org/r/129212 (owner: Awight)
[15:42:51] <wikibugs>	 (Abandoned) Awight: [WIP] opt_out preferences apply to email addresses separately [wikimedia/fundraising/tools] - https://gerrit.wikimedia.org/r/133181 (owner: Awight)
[15:42:55] <wikibugs>	 (Abandoned) Awight: [WIP] Transparent workaround for access control [extensions/DonationInterface] (php54_test_adapter_collapse) - https://gerrit.wikimedia.org/r/133509 (owner: Awight)
[15:43:37] <awight>	 lol
[15:43:45] * awight rattles chains in the wings
[15:43:50] <AndyRussG>	 awight: boo!
[15:44:10] * awight faints at potentially running into a real ghost
[15:45:18] <icinga-wm>	 PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1235
[15:49:45] <ejegg>	 Jeff_Green: ok, rescheduled it
[15:49:54] <ejegg>	 oops, yaml error
[15:49:57] <Jeff_Green>	 to never?
[15:50:08] <icinga-wm>	 PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1213
[15:50:09] <ejegg>	 to after work hrs
[15:50:31] <Jeff_Green>	 please just shut it off until we can evaluate
[15:50:57] <Jeff_Green>	 it really should have been shut off as soon as it was evident it was causing the replag
[15:51:04] <ejegg>	 ah, ok
[15:51:33] <Jeff_Green>	 i spent a chunk of this AM looking for it but because it was small queries I didn't connect the lag to the job
[15:52:48] <Jeff_Green>	 i don't know enough about what it's doing to try to look for a better option, but it might be better to schedule downtime and do it as a bulk insert
[15:54:34] <cwd>	 i would agree that if the lag is totally unavoidable we should schedule downtime to do it
[15:54:37] <ejegg>	 ok, it's out of the schedule
[15:54:51] <cwd>	 but maybe there's an easier way to throttle it on the way in
[15:55:15] <cwd>	 i imagine it'll be different for every case
[15:55:57] <ejegg>	 if those API calls are each wrapped in a big txn, we could at least break those down
[15:56:31] <cwd>	 yeah, although there could be adverse effects to that too
[15:56:44] <cwd>	 like if multiple txns caused excessive index recalculation
[15:56:58] <ejegg>	 oh?
[15:57:14] <cwd>	 pure conjecture
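The "throttle it on the way in" and "break those down" ideas can be combined: commit in small batches, and pause whenever the replicas report lag above a limit. The sketch below is minimal and uses Python's built-in sqlite3 as a stand-in for the MySQL connection. The column names, batch size, lag limit and the `get_replica_lag` callback are all hypothetical; in practice the callback might read Seconds_Behind_Master from each replica. This is not how omnimail_recipient_load actually works.

```python
import sqlite3
import time
from typing import Callable, Iterable, Iterator, List, Tuple

Row = Tuple[int, str]  # hypothetical (event_id, event_type) columns


def batched(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield lists of at most `size` rows."""
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def throttled_load(
    conn: sqlite3.Connection,
    rows: Iterable[Row],
    get_replica_lag: Callable[[], float],
    batch_size: int = 1000,
    max_lag_seconds: float = 60.0,
    pause_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Insert rows in short transactions, waiting while replica lag is too high.

    Returns the number of rows submitted (duplicates skipped by OR IGNORE included).
    """
    submitted = 0
    for batch in batched(rows, batch_size):
        # Back off until the replicas have caught up enough.
        while get_replica_lag() > max_lag_seconds:
            sleep(pause_seconds)
        # `with conn` commits on success and rolls back on error: one small txn per batch.
        with conn:
            conn.executemany(
                "INSERT OR IGNORE INTO civicrm_mailing_provider_data "
                "(event_id, event_type) VALUES (?, ?)",
                batch,
            )
        submitted += len(batch)
    return submitted
```

Smaller transactions trade some throughput for bounded replica lag. cwd's caveat about index maintenance across many small transactions still applies and would need measuring on the real tables.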
[15:57:23] <Jeff_Green>	 what's the end result in the db of the API call, does it just add a row to one table or is it touching a bunch of tables?
[15:57:24] <ejegg>	 so, if a txn runs for 20 minutes, that means 20 minutes of replag, right?
[15:57:51] <cwd>	 hmm, probably depends on the isolation level of the txn
[15:58:18] <ejegg>	 Jeff Green looks like inserting to two tables, both only used for this 3rd party mailer data
[15:58:44] <ejegg>	 hmm, I see the inserts to one table in the code, maybe the other table is populated by a different call?
[16:00:48] <Jeff_Green>	 that's a good question re. replag, I'm a little surprised it was so bad for single row inserts
[16:01:37] <Jeff_Green>	 i wonder if it would be better to do something like temporarily disable indexes, "load data infile" or similar, and reenable indexes
[16:02:36] <Jeff_Green>	 is this a once-a-year kind of thing? or is this the beginning of something more frequent?
[16:02:59] <ejegg>	 Jeff_Green: this is a once-in-a-lifetime initial load of a year's worth of data
[16:03:12] <Jeff_Green>	 ok
[16:03:13] <ejegg>	 once we're up to date, we'll be loading the past half hour's worth each time
[16:03:47] <Jeff_Green>	 how much is left to do?
[16:04:09] <ejegg>	 lemme see, we're up to at least the 10th of December
[16:04:23] <ejegg>	 which should be the majority of the mailings
[16:04:36] <ejegg>	 I'll see if I can get stats in the Silverpop console
[16:04:41] <Jeff_Green>	 k
[16:05:32] <AndyRussG>	 mepps: anytime now is cool!
[16:07:58] <mepps>	 AndyRussG, great, meet in queenmary?
[16:08:16] <AndyRussG>	 mepps: K!
[16:08:28] * Jeff_Green afk for lunch, biab
[16:53:44] <wikibugs>	 Wikimedia-Fundraising-CiviCRM, Continuous-Integration-Infrastructure, Patch-For-Review, Release-Engineering-Team (Kanban): wikimedia-fundraising-civicrm fails with Call to a member function getDriver() on null in phar:///srv/jenkins-workspace/worksp... - https://phabricator.wikimedia.org/T171724#3475326
[16:55:12] <icinga-wm>	 PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1233
[16:56:44] <cwd>	 hmmm
[16:56:52] <ejegg>	 hmm indeed
[16:57:06] <ejegg>	 checking on the job
[16:58:35] <ejegg>	 oh hey, the report in the silverpop console came back
[17:00:12] <icinga-wm>	 PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1297
[17:00:48] <cwd>	 ejegg: INSERT IGNORE INTO civicrm_mailing_provider_dat
[17:00:56] <cwd>	 did that thing restart itself or something?
[17:01:47] <ejegg>	 cwd still running the job from almost 2 hours ago
[17:02:09] <ejegg>	 no idea how it decides on the batch size
[17:03:12] <ejegg>	 ah, it's tracking click throughs, opens, and bounces as well
[17:05:12] <icinga-wm>	 PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1370
[17:08:45] <awight>	 cool!  silverpop -> Civi syncing?
[17:09:52] <ejegg>	 awight: yeppers!
[17:10:05] <ejegg>	 so far just dumping into an unconnected table
[17:10:12] <icinga-wm>	 PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1444
[17:10:13] <ejegg>	 but there's a flag for 'processed into civi'
[17:10:53] <ejegg>	 meaning we'll be making actual civimail records at some point, i think
[17:11:03] <ejegg>	 fr-tech any news or requests for scrum of scrums?
[17:11:56] <awight>	 nice way to do it.
[17:12:37] <cwd>	 ejegg: you could mention that we are starting the prometheus transition, i think other teams will be happy about that
[17:12:44] <cwd>	 ganglia->prometheus
[17:12:49] <ejegg>	 sure thing!
[17:13:12] <cwd>	 ty!
[17:15:12] <icinga-wm>	 PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1519
[17:20:12] <icinga-wm>	 PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1593
[17:23:30] <awight>	 cwd: How does that relate to grafana?  Is the foundation also trying to go grafana -> prometheus?
[17:23:46] <awight>	 lol, I’m still trying to sort out ganglia -> grafana
[17:24:54] <cwd>	 awight: prometheus provides the data, grafana provides the graphs
[17:25:01] <cwd>	 so they work in concert
[17:25:08] <awight>	 aha, cool thanks
[17:25:12] <icinga-wm>	 PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1667
[17:25:23] <awight>	 yipes—and grafana provides the… something for icinga
[17:25:38] <awight>	 provides the extra layer of complexity ;-)
[17:25:45] <cwd>	 does it? i thought icinga remained its own thing
[17:25:55] <cwd>	 but i am only just starting this Great Adventure
[17:27:07] <AndyRussG>	 ejegg: thx, nothing here for now!
[17:27:21] <awight>	 haha the first step is for us to leave behind our egos, which tell us that we know anything at all
[17:29:26] <cwd>	 amen brother
[17:29:45] <cwd>	 another thing i don't know is why the replag on 1002 continues to grow
[17:29:57] <cwd>	 i can't find any drush procs
[17:30:12] <icinga-wm>	 PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1747
[17:30:13] <icinga-wm>	 RECOVERY - check_mysql on frdb2001 is OK: Uptime: 1305470 Threads: 1 Questions: 78869929 Slow queries: 7270 Opens: 10721 Flush tables: 1 Open tables: 608 Queries per second avg: 60.414 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0
[17:30:56] <cwd>	 Jeff_Green: maybe i should restart replication there?
[17:31:22] <cwd>	    State: closing tables
[17:31:23] <ejegg>	 cwd huh, the last run of the big recipient load finished about 20 min ago
[17:31:27] <cwd>	 i wonder if something hung up
[17:31:37] <Jeff_Green>	 cwd: I think it's just slow queries
[17:32:07] <cwd>	 isn't it weird that 2001 caught up?
[17:32:37] <Jeff_Green>	 you shouldn't need to restart replication if it's reporting that it's connected and whatnot
[17:33:27] <Jeff_Green>	 i suspect what's happening on 2001 is that the degraded RAID means disk IO is reduced
[17:33:51] <Jeff_Green>	 i haven't confirmed that but I've seen it happen before whenever hardware RAID is degraded
[17:34:11] <cwd>	 that's the weird part, 2001 is the one that caught up, 1002 is still hollering
[17:34:18] <Jeff_Green>	 orlly
[17:34:33] <Jeff_Green>	 1002 has other jobs that 2001 doesn't
[17:34:45] <cwd>	 ah yeah good point
[17:35:02] <cwd>	 it is the read-db fqdn yeah?
[17:35:12] <icinga-wm>	 PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1824
[17:35:48] <cwd>	 processlist looks unexciting except for that "closing tables" thing that seems to be hung
[17:37:08] <Jeff_Green>	 it is the read db yeah
[17:37:34] <Jeff_Green>	 i saw a long-running "closing tables" earlier on frdb2001
[17:38:12] <Jeff_Green>	 it would be nice to know ~which~ tables it's closing :-)
[17:38:19] <cwd>	 srsly
[17:38:28] <cwd>	 this seems like a common problem without a common solution
[17:40:12] <icinga-wm>	 RECOVERY - check_mysql on frdb1002 is OK: Uptime: 1307429 Threads: 1 Questions: 79100687 Slow queries: 7998 Opens: 10777 Flush tables: 1 Open tables: 610 Queries per second avg: 60.500 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0
[17:40:31] <cwd>	 welp
[17:40:36] <cwd>	 i guess it was just really slow
[17:41:35] <cwd>	 Jeff_Green: you didn't turn any knobs to fix that did you?
[17:41:46] <Jeff_Green>	 nope
[17:42:12] <Jeff_Green>	 and the next time I checked after that process ended, there's a new one this time with the query
[17:43:47] <cwd>	 yeah i see that
[18:03:56] <wikibugs>	 (CR) Ejegg: [C: 2] Fix typo causing enotice & suppressed not to populate [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/367851 (https://phabricator.wikimedia.org/T161758) (owner: Eileen)
[18:08:35] <wikibugs>	 (CR) Mepps: [C: 2] Update SmashPig and DonationInterface [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/365069 (owner: Ejegg)
[18:09:22] <ejegg|food>	 woohoo, thanks mepps!
[18:09:58] <ejegg|food>	 cwd once i deploy ^^^ we can get rid of the legacy SmashPig.yaml
[18:11:40] <mepps>	 fr-tech i have to take john to the doctor during standup today, will catch up on work a bit later this evening after james goes to sleep
[18:11:54] <wikibugs>	 (Merged) jenkins-bot: Fix typo causing enotice & suppressed not to populate [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/367851 (https://phabricator.wikimedia.org/T161758) (owner: Eileen)
[18:11:54] <mepps>	 fr-tech i'm still here until 3
[18:18:06] <wikibugs>	 (Merged) jenkins-bot: Update SmashPig and DonationInterface [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/365069 (owner: Ejegg)
[18:33:42] <AndyRussG>	 mepps: hope everything's OK!
[18:34:10] <mepps>	 AndyRussG, it is, it's a foot injury, not a critical illness
[18:39:03] <ejegg>	 hope it heals up quick! foot injuries are still a bummer
[18:44:29] <mepps>	 ejegg true! yes i'm hoping it's nothing too major before all our travels coming up!
[18:45:46] <AndyRussG>	 mepps: ah... hope it gets better in a skip, hop and a jump :)
[18:54:04] <AndyRussG>	 ejegg: mepps: am I crazy, or is there something fundamentally wrong about this? https://github.com/wikimedia/mediawiki-extensions-CentralNotice/blob/e2a4ff9f87e9ee5a9daf5886e2bf0b7a64c8000f/special/SpecialCentralNotice.php#L820-L829
[18:54:13] <mepps>	 AndyRussG, that might make it worse ;)
[18:54:42] <AndyRussG>	 awww :( sorry didn't mean it that way!
[18:56:06] <AndyRussG>	 about the CN code ^ seems that after a banner save, it displays many fields in the form just based on what was sent in the post request to save, not what was actually saved in the DB
[18:56:19] <AndyRussG>	 so users might think that settings were saved correctly, even if they weren't!
[18:56:43] <mepps>	 AndyRussG ahh yeah that's a good catch
[18:58:12] <AndyRussG>	 maybe it was a "workaround" for DB replication lag... though it shouldn't be an issue, in theory, because something behind the scenes is supposed to make sure a user gets an up-to-date snapshot after they save
[18:58:25] <AndyRussG>	 (something in our DB infrastructure, IIRC)
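AndyRussG's concern is that after a save, the form is rendered from the submitted POST data, so a setting that failed to save still looks saved. The sketch below shows the alternative of re-reading from storage after saving. It is written in Python rather than CentralNotice's PHP, and `BannerStore`, its field filtering and every other name are hypothetical; none of them reflect CentralNotice's actual classes. As AndyRussG notes, a real implementation would also need that read to see the just-written data rather than a lagged replica.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet


@dataclass
class BannerStore:
    """Hypothetical in-memory stand-in for the banner settings table."""
    allowed_fields: FrozenSet[str]
    rows: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def save(self, name: str, submitted: Dict[str, str]) -> None:
        # Simulates a save that silently drops fields it does not handle.
        self.rows[name] = {k: v for k, v in submitted.items() if k in self.allowed_fields}

    def load(self, name: str) -> Dict[str, str]:
        return dict(self.rows.get(name, {}))


def form_values_after_save(store: BannerStore, name: str, submitted: Dict[str, str]) -> Dict[str, str]:
    """Save, then re-read from storage so the form shows what was actually persisted."""
    store.save(name, submitted)
    return store.load(name)
```

With this approach, a field that was dropped during the save disappears from the redisplayed form instead of appearing to have been saved.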
[19:04:04] <AndyRussG>	 hmmm not sure that code ever runs, now
[19:08:07] <mepps>	 hmm that seems problematic too
[19:09:40] <AndyRussG>	 welcome to CentralNotice!
[19:10:10] <wikibugs>	 (PS1) Ejegg: Update SmashPig, DonationInterface, and dependencies [wikimedia/fundraising/crm/vendor] - https://gerrit.wikimedia.org/r/367943
[19:10:16] * AndyRussG tries to escape silly bitterness
[19:10:25] <wikibugs>	 (CR) Ejegg: [C: 2] Update SmashPig, DonationInterface, and dependencies [wikimedia/fundraising/crm/vendor] - https://gerrit.wikimedia.org/r/367943 (owner: Ejegg)
[19:11:35] <wikibugs>	 (PS1) Ejegg: Merge branch 'master' into deployment [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/367944
[19:11:55] <AndyRussG>	 aaarg, no, it does run
[19:13:18] <wikibugs>	 (CR) Ejegg: [C: 2] Merge branch 'master' into deployment [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/367944 (owner: Ejegg)
[19:19:38] <wikibugs>	 (Merged) jenkins-bot: Update SmashPig, DonationInterface, and dependencies [wikimedia/fundraising/crm/vendor] - https://gerrit.wikimedia.org/r/367943 (owner: Ejegg)
[19:19:40] <wikibugs>	 (Merged) jenkins-bot: Merge branch 'master' into deployment [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/367944 (owner: Ejegg)
[19:23:37] <wikibugs>	 Fundraising-Backlog, MediaWiki-extensions-CentralNotice: CentralNotice: On saving a banner, form shows values from save request without checking DB - https://phabricator.wikimedia.org/T171774#3475902 (AndyRussG)
[19:23:59] <AndyRussG>	 ejegg: mepps: ^ task for the abovementioned shiew
[19:24:08] <AndyRussG>	 *ishiew
[19:24:56] <ejegg>	 !log disabled queue consumers for CiviCRM update
[19:25:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:25:06] <ejegg>	 eschew?
[19:35:04] <wikibugs>	 (PS1) Ejegg: hack out php54 polyfill stuff [wikimedia/fundraising/crm/vendor] - https://gerrit.wikimedia.org/r/367948
[19:35:14] <wikibugs>	 (CR) Ejegg: [C: 2] hack out php54 polyfill stuff [wikimedia/fundraising/crm/vendor] - https://gerrit.wikimedia.org/r/367948 (owner: Ejegg)
[19:35:45] <wikibugs>	 (PS1) Ejegg: Update vendor (get rid of php54 polyfill includes) [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/367950
[19:35:53] <wikibugs>	 (CR) Ejegg: [C: 2] Update vendor (get rid of php54 polyfill includes) [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/367950 (owner: Ejegg)
[19:43:09] <wikibugs>	 (Merged) jenkins-bot: hack out php54 polyfill stuff [wikimedia/fundraising/crm/vendor] - https://gerrit.wikimedia.org/r/367948 (owner: Ejegg)
[19:43:11] <wikibugs>	 (Merged) jenkins-bot: Update vendor (get rid of php54 polyfill includes) [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/367950 (owner: Ejegg)
[19:46:30] <wikibugs>	 (PS1) Ejegg: Fixes for SmashPig update [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/367953
[19:47:11] <ejegg>	 oh hey, is fr-tech standup even happening today?
[20:07:52] <wikibugs>	 (PS4) Ejegg: Make CI scripts more stricts [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/367886 (https://phabricator.wikimedia.org/T171724) (owner: Hashar)
[20:08:40] <wikibugs>	 (CR) Ejegg: "Thanks, hashar!" [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/367886 (https://phabricator.wikimedia.org/T171724) (owner: Hashar)
[20:09:55] <wikibugs>	 (CR) Eileen: Update Omnimail GET to add rml fields (1 comment) [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/365543 (owner: Eileen)
[20:10:18] <wikibugs>	 (CR) Ejegg: [C: 2] Make CI scripts more stricts [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/367886 (https://phabricator.wikimedia.org/T171724) (owner: Hashar)
[20:16:35] <wikibugs>	 (Merged) jenkins-bot: Make CI scripts more stricts [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/367886 (https://phabricator.wikimedia.org/T171724) (owner: Hashar)
[20:16:37] <wikibugs>	 Fundraising-Backlog, fundraising-tech-ops: process-control repeated failure handling - https://phabricator.wikimedia.org/T161567#3476087 (cwdent) p:Normal>High Today's mailstrom (ha ha) warrants re-prioritizing this issue.  p-c should stop jobs at a fail mail threshold, something like 5 mails in...
[20:16:53] <cwd>	 dstrine: i hope it is ok that i moved this to high priority ^
[20:17:39] <cwd>	 we have succumbed to "alert fatigue" on a few fronts and are seeing too many meaningless ones at this point
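The process-control task above proposes stopping a job once it crosses a failmail threshold ("something like 5 mails in..."; the time window is cut off in the log). Here is a minimal Python sketch of a sliding-window failure counter that could drive that decision. It is not process-control's actual API. `FailureBreaker` is hypothetical, and the 3600-second window used in the example is an assumed value, because the real one is not stated.

```python
from collections import deque
from typing import Deque


class FailureBreaker:
    """Hypothetical sketch: stop a job once it fails too often within a time window."""

    def __init__(self, max_failures: int, window_seconds: float) -> None:
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._failures: Deque[float] = deque()  # timestamps of recent failures
        self.tripped = False  # once True, the job stays stopped until reset by hand

    def record_failure(self, timestamp: float) -> bool:
        """Record one failure; return True if the job should now be stopped."""
        self._failures.append(timestamp)
        # Forget failures that have aged out of the sliding window.
        while self._failures and timestamp - self._failures[0] > self.window_seconds:
            self._failures.popleft()
        if len(self._failures) >= self.max_failures:
            self.tripped = True
        return self.tripped


# 5 failures comes from the task text; the one-hour window is an assumption.
breaker = FailureBreaker(max_failures=5, window_seconds=3600)
print([breaker.record_failure(t) for t in (0, 60, 120, 180, 240)])
# [False, False, False, False, True]
```

Keeping the breaker tripped once it fires matches the goal of sending one alert and then staying quiet, rather than sending hundreds of pages.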
[20:21:51] <ejegg>	 cwd I was just talking about that in standup
[20:22:08] <ejegg>	 at least as concerns paypal audit parsing
[20:22:36] <cwd>	 awesome
[20:22:57] <cwd>	 seems like some annoying but predictable errors in the audit files?
[20:23:03] <wikibugs>	 (PS4) Eileen: Update Omnimail GET to add rml fields [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/365543
[20:24:13] <ejegg>	 cwd yeah, it's a recurring thing where recurring payments are missing the subscription identifier
[20:24:22] <ejegg>	 meta-recurring bug
[20:25:28] <cwd>	 think it's safe to catch it and bail in a way that doesn't send mail?
[20:25:45] <cwd>	 don't want to make it too accepting of bad data or anything
[20:25:50] <cwd>	 but seems like it happens regularly
[20:25:59] <cwd>	 and i doubt we are going to get them to fix it
[20:26:44] <ejegg>	 cwd we need to email them to get corrected files whenever it happens
[20:27:01] <ejegg>	 and to yell at them some more about fixing the underlying bug
[20:27:36] <cwd>	 aah sure
[20:27:43] <cwd>	 so we don't want to just mask the failure
[20:27:54] <ejegg>	 right
[20:29:18] <wikibugs>	 (PS2) Ejegg: Fix misc bash oddities in the CI scripts [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/367893 (owner: Hashar)
[20:29:40] <wikibugs>	 (CR) Ejegg: [C: 2] "Oh hey, that's a really handy tool!" [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/367893 (owner: Hashar)
[20:30:28] <cwd>	 ejegg: is a general purpose mechanism to shut p-c jobs off on excessive failure better for this case?
[20:31:29] <ejegg>	 that would be great for this morning's "aborting, still running" errors
[20:32:07] <ejegg>	 for the audit parsing, I think we'd want the parser itself to decide how many bad lines are too many and to shut itself down
[20:32:19] <ejegg>	 at a predictable point rather than just being killed
[20:32:34] <cwd>	 sounds reasonable
[20:32:39] <cwd>	 catch that exception in a loop?
[20:33:04] <hashar>	 ejegg: thank you for the civicrm reviews / +2
[20:33:14] <ejegg>	 yeah, it's actually caught in a loop right now, there's just nothing accumulating the failures for a batch failmail or an action on too many
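The loop described above catches per-line errors but does not accumulate them. Here is a minimal Python sketch of what accumulating could look like: collect the bad lines for a single batch failmail, and have the parser stop itself at a predictable limit instead of being killed. It is not the actual audit parser. `parse_audit_lines`, `TooManyBadLines`, and the toy "txn_id,subscr_id" line format are hypothetical. The toy format loosely mirrors the missing-subscription-identifier case mentioned earlier.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple

Failure = Tuple[int, str]  # (line number, error message)


class TooManyBadLines(Exception):
    """Raised when the parser decides to shut itself down."""

    def __init__(self, failures: List[Failure]) -> None:
        super().__init__(f"{len(failures)} bad lines; giving up")
        self.failures = failures  # kept so the caller can still send one failmail


@dataclass
class ParseResult:
    records: List[dict] = field(default_factory=list)
    failures: List[Failure] = field(default_factory=list)


def parse_audit_lines(
    lines: Iterable[str], parse_line: Callable[[str], dict], max_bad_lines: int
) -> ParseResult:
    result = ParseResult()
    for lineno, line in enumerate(lines, start=1):
        try:
            result.records.append(parse_line(line))
        except ValueError as exc:
            # Remember the failure instead of mailing about it immediately.
            result.failures.append((lineno, str(exc)))
            if len(result.failures) > max_bad_lines:
                raise TooManyBadLines(result.failures)
    return result


def parse_line(line: str) -> dict:
    """Toy line parser: 'txn_id,subscr_id', where subscr_id must be present."""
    txn_id, _, subscr_id = line.partition(",")
    if not subscr_id:
        raise ValueError(f"recurring payment {txn_id} missing subscription id")
    return {"txn_id": txn_id, "subscr_id": subscr_id}


result = parse_audit_lines(["t1,S-1", "t2,", "t3,S-3"], parse_line, max_bad_lines=2)
print(len(result.records), result.failures)
# 2 [(2, 'recurring payment t2 missing subscription id')]
```

The caller sends one failmail from `result.failures`, or from `exc.failures` after catching `TooManyBadLines`. This avoids one mail per bad line and still surfaces the problem, so someone can ask the provider for corrected files.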
[20:33:15] <hashar>	 ejegg: sorry for the ton of spam I have emitted earlier today :(
[20:33:25] <ejegg>	 hashar: thank you for the improvements!
[20:33:47] <ejegg>	 heh, and whatever spam CI emitted was drowned out by fundraising's own monitoring spam
[20:34:03] <hashar>	 ejegg: all of that to move the civicrm job to Nodepool instances (and ensure jobs start with a fresh env on every build)
[20:34:07] <hashar>	 ahah
[20:35:02] <ejegg>	 oh cool, so they'll be able to run on nodepool now!
[20:35:31] <wikibugs>	 (Merged) jenkins-bot: Fix misc bash oddities in the CI scripts [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/367893 (owner: Hashar)
[20:36:37] <ejegg>	 hashar: oh yeah, looks like that 'move to nodepool' patch has been in the works since before we updated things to request php5.6
[20:37:00] <ejegg>	 which should still be fine on nodepool, right?
[20:43:16] <hashar>	 ejegg: hopefully :]
[20:43:32] <hashar>	 I will probably craft a transient job to test it is working all fine
[20:43:34] <hashar>	 then switch
[20:43:54] <ejegg>	 cool
[20:43:54] <hashar>	 I dont want you people to be blocked by CI randomly voting -1 on everything
[20:44:05] <ejegg>	 heh, that would indeed be an impediment
[20:45:09] <wikibugs>	 (PS6) Hashar: CI: install CiviCRM with a fake sendmail [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/363141 (https://phabricator.wikimedia.org/T161724)
[20:47:20] <wikibugs>	 (CR) Hashar: "Untested. That is meant to let me move the wikimedia-fundraising-civicrm job toward Nodepool instances and thus ensure a clean env on ever" [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/363141 (https://phabricator.wikimedia.org/T161724) (owner: Hashar)
[20:47:42] <hashar>	 ejegg: and that last one, I havent tested it at all. I  think I will take care of it tomorrow
[20:47:52] <ejegg>	 k, have a good evening!
[20:48:06] <hashar>	 if the CI pass on an instance that lacks sendmail, I think I will +2 it
[20:48:11] <hashar>	 and then migrate the jenkins job
[20:48:25] <hashar>	 but yeah tomorrow. I dont want to break anything when developers are active :]
[20:48:29] <hashar>	 thanks for all the reviews!
[20:48:46] <ejegg>	 thanks again for keeping our stuff up to date
[21:08:37] <wikibugs>	 (CR) Ejegg: "Looks great, just a few questions inline" (4 comments) [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/365543 (owner: Eileen)
[21:13:35] <dstrine>	 ejegg:  cwd  just catching up here. RE: T161567  does this need to be done soon? Please note I don't expect a lot to get done next sprint cause it's wikimania time
[21:13:36] <stashbot>	 T161567: process-control repeated failure handling - https://phabricator.wikimedia.org/T161567
[21:13:53] <ejegg>	 dstrine: it's another nice-to-have
[21:14:01] <dstrine>	 ok
[21:14:04] <ejegg>	 but even if we code it, I'm not sure how long it'll take to get out
[21:14:09] <dstrine>	 hmm
[21:14:16] <dstrine>	 ok
[21:14:27] <ejegg>	 since deployment of that particular tool seems to be a huge headache for ops
[21:22:37] <cwd>	 heh, we can make time
[21:22:43] <cwd>	 saves headaches like this morning
[21:23:04] <ejegg>	 I'm more concerned about that die-silently-on-bad-utf8 bug that's still out there
[21:23:36] <cwd>	 that's cause you didn't get 250 text messages at 7am today :)
[21:23:53] <cwd>	 but srsly we can roll up a new p-c soon
[21:24:21] <cwd>	 hopefully knock out all those things
[21:27:49] <eileen>	 guys just retrying the silverpop get with a shorter time period (from command line not scheduled) to see if that gets through without hurting stuff
[21:29:15] <wikibugs>	 (PS10) Ejegg: Unify queue message handling with SmashPig [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/355453 (https://phabricator.wikimedia.org/T95647)
[21:29:54] <cwd>	 eileen: cool, yeah doing it in smaller bits that don't cause the replag is a fine solution too
[21:30:45] <eileen>	 cwd right  - the issue is to find the tolerance point - because unless I magically find it first go there will be a few rounds of causing annoyance while I figure it out
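One way to search for that tolerance point is to make the window size the only knob: split the full date range into fixed-size windows and fetch them one at a time, shrinking the step if replication lag alerts fire. Here is a minimal Python sketch of the splitting step. It is not the Omnimail/Silverpop job code. `time_windows` is hypothetical, and the start date is arbitrary.

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple


def time_windows(
    start: datetime, end: datetime, step: timedelta
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive (window_start, window_end) pairs covering start..end."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)  # last window may be shorter
        yield cursor, window_end
        cursor = window_end


start = datetime(2017, 1, 1)  # arbitrary example date
for a, b in time_windows(start, start + timedelta(hours=30), timedelta(hours=12)):
    print(a, "->", b)
# 2017-01-01 00:00:00 -> 2017-01-01 12:00:00
# 2017-01-01 12:00:00 -> 2017-01-02 00:00:00
# 2017-01-02 00:00:00 -> 2017-01-02 06:00:00
```

Each window would be fetched as a separate run, like the 12-hour command-line runs below. As cwd notes, the safe size can shift when other jobs are loading the database at the same time.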
[21:31:01] <wikibugs>	 (PS3) Ejegg: Add country to c_t rows created during imports [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/367806 (https://phabricator.wikimedia.org/T171658)
[21:31:09] <eileen>	 btw - I didn't get the emails about the delay - I thought I used to?
[21:31:17] <cwd>	 it will be a bit of a moving target too in case it coincides with some other job
[21:31:32] <cwd>	 eileen: i can check on that
[21:31:53] <eileen>	 cwd yeah - possibly - although I think for most other jobs the sore point is not the db replication
[21:32:05] <cwd>	 my basic instructions at this point are to kill any process or query that creates replag warnings
[21:32:15] <eileen>	 ah ok
[21:32:26] <eileen>	 well hopefully my 12 hour one will sneak through
[21:32:32] <cwd>	 :)
[21:36:00] <cwd>	 eileen: as far as emails, i'm pretty sure it's a prod puppet thing so i'll have to file a ticket, also pretty sure it's just me and jeff right now cause other folks were getting annoyed
[21:37:26] <cwd>	 gotta run, back in a bit
[21:44:23] <wikibugs>	 (PS4) Ejegg: Add country to c_t rows created during imports [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/367806 (https://phabricator.wikimedia.org/T171658)
[21:46:58] <wikibugs>	 (CR) Eileen: "My head hurts - my reply was quite long & it disappeared!" (3 comments) [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/365543 (owner: Eileen)
[21:53:32] <eileen>	 cwd - my 12 hour run survived without any emails it seems!
[21:55:51] <eileen>	 what I noticed is that on staging the number of rows in that table is still going up even though it's finished on live - so I guess there is some replication lag - but it hasn't triggered the concern yet
[21:57:52] <wikibugs>	 (PS1) Ejegg: clean up insert_contribution_tracking signature [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/368103
[21:59:43] <wikibugs>	 (PS5) Ejegg: Add country to c_t rows created during imports [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/367806 (https://phabricator.wikimedia.org/T171658)
[22:00:00] <eileen>	 hmm - not fully caught up yet - I guess it could still cross into the dreaded ichinga terrory
[22:00:18] <eileen>	 (meant to say territory but I kinda like terrory)
[22:01:01] <ejegg>	 hehe
[22:07:54] <eileen>	 oh - it's stopped updating on dev - survived - will try another 12 hours from the command line
[22:09:11] <ejegg>	 eileen: ack, just realized the queue consumers are still off from my update attempt earlier!
[22:09:41] <ejegg>	 would you mind blessing https://gerrit.wikimedia.org/r/367953 ?
[22:09:51] <ejegg>	 then I'll re-deploy and turn stuff back on slowly
[22:10:42] <eileen>	 ok looking now
[22:10:45] <ejegg>	 Thanks!
[22:11:49] <wikibugs>	 (PS6) Ejegg: Add country to c_t rows created during imports [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/367806 (https://phabricator.wikimedia.org/T171658)
[22:13:17] <wikibugs>	 Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM: Direct Mail Appeal not reflecting in contribution records for event - https://phabricator.wikimedia.org/T171794#3476345 (LeanneS)
[22:21:27] <wikibugs>	 (CR) Eileen: [C: 2] "This seems consistent with other changes around the queue. The change is only really changing" [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/367953 (owner: Ejegg)
[22:26:13] <ejegg>	 thanks eileen !
[22:27:28] <wikibugs>	 (Merged) jenkins-bot: Fixes for SmashPig update [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/367953 (owner: Ejegg)
[22:29:07] <wikibugs>	 (PS1) Ejegg: Merge branch 'master' into deployment [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/368105
[22:29:15] <wikibugs>	 (CR) Ejegg: [C: 2] Merge branch 'master' into deployment [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/368105 (owner: Ejegg)
[22:29:41] <eileen>	 trying another silverpop job
[22:29:46] <eileen>	 12 hours
[22:29:50] <ejegg>	 cool cool
[22:30:02] <wikibugs>	 (Merged) jenkins-bot: Merge branch 'master' into deployment [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/368105 (owner: Ejegg)
[22:30:07] <eileen>	 will nurse it through peak big english & then schedule again
[22:30:32] <wikibugs>	 (PS1) Eileen: Update silverpopXmlConnector [wikimedia/fundraising/crm/vendor] - https://gerrit.wikimedia.org/r/368106
[22:30:36] <ejegg>	 eileen: I'm about to deploy that update again
[22:30:47] <ejegg>	 unless you think it'll disrupt your running job
[22:31:51] <eileen>	 I don't think it will - the bottleneck seems to just be communicating db transactions to the other db
[22:32:00] <eileen>	 & that won't be a massive volume will it?
[22:33:06] <wikibugs>	 (CR) jerkins-bot: [V: -1] Unify queue message handling with SmashPig [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/355453 (https://phabricator.wikimedia.org/T95647) (owner: Ejegg)
[22:34:10] <ejegg>	 !log updated CiviCRM from 461900edc1e6f2443894b41c4bfa1c88160f9096 to fb83798f068ba3365a286e7f131eb5eb5b0e7aae
[22:34:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:34:59] <ejegg>	 ok, that seemed not to break everything
[22:35:16] <eileen>	 phew
[22:35:32] <eileen>	 I feel like it would be a hard fail if it were one
[22:35:59] <ejegg>	 yeah, we're only actually touching SmashPig stuff in the queue consumers and a few other places
[22:36:15] <ejegg>	 I just tested the damaged message db re-queueing, and that worked
[22:36:21] <eileen>	 great!
[22:36:29] <ejegg>	 so I'm going to turn on the queue consumers, starting with antifraud/init
[22:36:36] <eileen>	 I think I need to understand smash pig better
[22:36:52] <eileen>	 just want to get silverpop all dusted though at the moment
[22:37:12] <ejegg>	 yeah, good call. smashpig is still kind of a basket of functionality
[22:38:07] <eileen>	 oh you merged that suppressed fix! I should get that deployed
[22:38:13] <ejegg>	 !log reactivated antifraud / payment-init queue consumer
[22:38:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:38:25] <ejegg>	 eileen that just went out along with the smashpig update!
[22:38:39] <ejegg>	 sorry, should have mentioned it
[22:39:39] <eileen>	 ah great
[22:39:47] <eileen>	 I'll need to rerun some grabs
[22:40:15] <eileen>	 although it won't show on the report I want to send caitling to review without this https://gerrit.wikimedia.org/r/#/c/367848/
[22:40:39] <eileen>	 failmail on queue just now?
[22:43:59] <ejegg>	 eileen: oops, that was my fault - I ran the thing manually cause I was impatient waiting for the cronjob to fire
[22:44:25] <ejegg>	 yep, looking at the filters and that one right now!
[22:44:37] <ejegg>	 ok, queue consumers look fine, I'll turn the rest back on
[22:46:20] <ejegg>	 !log reactivated remaining fundraising queue consumers
[22:46:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:48:35] <wikibugs>	 (PS1) Ejegg: Fix blank i18n message added by TranslateWiki [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/368108
[22:48:40] <wikibugs>	 (CR) Ejegg: [C: 2] Fix blank i18n message added by TranslateWiki [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/368108 (owner: Ejegg)
[22:49:16] <wikibugs>	 (PS11) Ejegg: Unify queue message handling with SmashPig [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/355453 (https://phabricator.wikimedia.org/T95647)
[22:51:18] <wikibugs>	 (CR) jerkins-bot: [V: -1] Fix blank i18n message added by TranslateWiki [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/368108 (owner: Ejegg)
[22:51:56] <wikibugs>	 (PS2) Ejegg: Fix blank i18n message added by TranslateWiki [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/368108
[22:52:13] <wikibugs>	 (PS12) Ejegg: Unify queue message handling with SmashPig [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/355453 (https://phabricator.wikimedia.org/T95647)
[22:52:34] <ejegg>	 cd
[22:52:37] <ejegg>	 derp
[22:53:19] * ejegg searches for the irssi plugin which asks for confirmation when you enter a valid command line
[23:01:32] <wikibugs>	 (CR) Ejegg: "filters work great!" (2 comments) [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/367847 (https://phabricator.wikimedia.org/T161758) (owner: Eileen)
[23:01:36] <wikibugs>	 (PS2) Ejegg: Add filters to mailing report. [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/367847 (https://phabricator.wikimedia.org/T161758) (owner: Eileen)
[23:01:42] <wikibugs>	 (CR) Ejegg: [C: 2] Add filters to mailing report. [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/367847 (https://phabricator.wikimedia.org/T161758) (owner: Eileen)
[23:01:59] <wikibugs>	 (PS2) Ejegg: Omnimailing - extendeded mailing report - add suppressed [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/367848 (https://phabricator.wikimedia.org/T161758) (owner: Eileen)
[23:02:06] <wikibugs>	 (CR) Ejegg: [C: 2] Omnimailing - extendeded mailing report - add suppressed [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/367848 (https://phabricator.wikimedia.org/T161758) (owner: Eileen)
[23:03:53] <wikibugs>	 (CR) Ejegg: [C: 2] "not sure why composer decided to shuffle installed.json, but this looks fine!" [wikimedia/fundraising/crm/vendor] - https://gerrit.wikimedia.org/r/368106 (owner: Eileen)
[23:05:57] <wikibugs>	 (PS5) Ejegg: Update Omnimail GET to add rml fields [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/365543 (owner: Eileen)
[23:06:53] <wikibugs>	 (CR) Ejegg: [C: 2] "Looks ready for a road test!" [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/365543 (owner: Eileen)
[23:07:16] <ejegg>	 ok eileen, i'm heading out for now
[23:07:34] <eileen>	 ejegg: thanks - I've still got some tweaks to do on that groupmember get, but I think having it merged up to date is cleaner
[23:08:05] <wikibugs>	 (Merged) jenkins-bot: Add filters to mailing report. [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/367847 (https://phabricator.wikimedia.org/T161758) (owner: Eileen)
[23:14:15] <wikibugs>	 (Merged) jenkins-bot: Omnimailing - extendeded mailing report - add suppressed [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/367848 (https://phabricator.wikimedia.org/T161758) (owner: Eileen)
[23:15:10] <icinga-wm>	 PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2758
[23:18:10] <wikibugs>	 (Merged) jenkins-bot: Update silverpopXmlConnector [wikimedia/fundraising/crm/vendor] - https://gerrit.wikimedia.org/r/368106 (owner: Eileen)
[23:20:10] <icinga-wm>	 PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2733
[23:20:21] <wikibugs>	 (Merged) jenkins-bot: Update Omnimail GET to add rml fields [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/365543 (owner: Eileen)
[23:24:00] <wikibugs>	 (PS1) Eileen: Merge branch 'master' of https://gerrit.wikimedia.org/r/wikimedia/fundraising/crm into deployment [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/368111
[23:25:20] <icinga-wm>	 PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1947
[23:25:39] <wikibugs>	 (CR) Eileen: [C: 2] Merge branch 'master' of https://gerrit.wikimedia.org/r/wikimedia/fundraising/crm into deployment [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/368111 (owner: Eileen)
[23:26:27] <wikibugs>	 (Merged) jenkins-bot: Merge branch 'master' of https://gerrit.wikimedia.org/r/wikimedia/fundraising/crm into deployment [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/368111 (owner: Eileen)
[23:27:10] <wikibugs>	 (PS1) Eileen: Update vendor submodule e2f13e9 Update silverpopXmlConnector [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/368112
[23:27:26] <wikibugs>	 (CR) Eileen: [C: 2] Update vendor submodule e2f13e9 Update silverpopXmlConnector [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/368112 (owner: Eileen)
[23:28:04] <wikibugs>	 (Merged) jenkins-bot: Update vendor submodule e2f13e9 Update silverpopXmlConnector [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/368112 (owner: Eileen)
[23:30:10] <icinga-wm>	 PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1923
[23:30:11] <icinga-wm>	 PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1210
[23:33:39] <eileen>	 !log civicrm update from fb83798f068ba3365a286e7f131eb5eb5b0e7aae to e83c012581305012145eae45495e7e8ea6f4e249
[23:33:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:35:10] <icinga-wm>	 PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1355
[23:35:10] <icinga-wm>	 PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1891
[23:40:10] <icinga-wm>	 PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1506
[23:40:11] <icinga-wm>	 PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1872
[23:45:10] <icinga-wm>	 PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1662
[23:45:11] <icinga-wm>	 PROBLEM - check_mysql on frdev1001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1320
[23:45:11] <icinga-wm>	 PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1836
[23:50:10] <icinga-wm>	 PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1810
[23:50:11] <icinga-wm>	 RECOVERY - check_mysql on frdev1001 is OK: Uptime: 1329089 Threads: 1 Questions: 86941942 Slow queries: 21378 Opens: 12265 Flush tables: 1 Open tables: 1009 Queries per second avg: 65.414 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0
[23:50:12] <icinga-wm>	 PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1805
[23:55:11] <icinga-wm>	 PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1962
[23:55:12] <icinga-wm>	 RECOVERY - check_mysql on frdb2001 is OK: Uptime: 1328570 Threads: 1 Questions: 84756143 Slow queries: 7270 Opens: 11624 Flush tables: 1 Open tables: 608 Queries per second avg: 63.795 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0