[00:05:58] Fundraising Sprint CAPS LOCK CULTS, Fundraising Sprint Dampness, Fundraising Sprint Evil Twins For Everyone, Fundraising-Backlog, and 2 others: My testing suggests that REMOVING the contribution_status_id (& low-cardinality indexes like is_deleted) will spe... - https://phabricator.wikimedia.org/T247489 [00:07:23] (CR) jerkins-bot: [V: -1] Re-apply WMF patches [wikimedia/fundraising/crm/drupal] - https://gerrit.wikimedia.org/r/622235 (owner: Ejegg) [00:07:49] hmm, that one usually doesn't fail [00:08:06] oh, just from the parent patch failing before [00:08:15] (CR) Ejegg: [V: +2 C: +2] Re-apply WMF patches [wikimedia/fundraising/crm/drupal] - https://gerrit.wikimedia.org/r/622235 (owner: Ejegg) [00:10:40] Fundraising Sprint CAPS LOCK CULTS, Fundraising Sprint Dampness, Fundraising Sprint Evil Twins For Everyone, Fundraising-Backlog, and 2 others: My testing suggests that REMOVING the contribution_status_id (& low-cardinality indexes like is_deleted) will spe... - https://phabricator.wikimedia.org/T247489 [00:11:12] ejegg: so I commented on https://phabricator.wikimedia.org/T247489 about what I think we should drop [00:11:49] given I haven't done the work to be confident about others these seem like ones I've investigated before & which still seem to make sense [00:16:03] (I'll try to figure out the code but it might be easier just to agree the indexes & Dallas can run them as once we make it 'dev friendly' (not break on our locals) we wind up with multiple sql which takes longer [00:21:00] looking at the phab eileen [00:52:59] eileen OK, I'd be happy removing those two at the top (is_deleted, contribution_status_id) + the empty civicrm_activity ones! [00:53:09] thanks for all the testing on that [00:53:10] ! 
[00:53:51] ejegg: cool - dwisehaupt are you ok working off https://phabricator.wikimedia.org/T247489#6407943 [00:54:04] I haven't actually tested that sql yet but I can do on staging [00:54:33] (PS1) Ejegg: Update drupal submodule [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/622246 [00:54:33] (CR) Ejegg: [C: +2] Update drupal submodule [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/622246 (owner: Ejegg) [00:56:35] sorry. looking. [00:57:47] Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM: Review CiviCRM indexes in order to reduce disk space & improve performance - https://phabricator.wikimedia.org/T126388 (Eileenmcnaughton) [00:57:59] that looks ok to me. [00:58:04] we want to run those soon? [00:58:33] (just trying to plan food since i was silly and forgot to eat a proper lunch today. [01:01:33] (Merged) jenkins-bot: Update drupal submodule [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/622246 (owner: Ejegg) [01:01:35] yeah sure [01:01:56] it's all down & we just run them in the console? [01:03:06] well. i moved us back out of maint about an hour ago, but we can easily put it back and do these. [01:03:19] donations will queue like they were before so it's not a big deal. [01:04:57] i've been trying to track down what caused the issue with pcoombe's script [01:05:08] ok sure - I thought you were doing frdev stuff [01:05:46] tell me when to run the first query! [01:06:27] oh yeah. frdev update was smooth. this odd query is the only thing so far. [01:06:39] give me just a sec and i'll push the maint mode. [01:09:19] ok. civi1001 and it's p-c jobs are in maint mode. [01:09:36] first one was instant [01:10:14] second one too - I guess because you already rebuilt? [01:11:26] hmmmm... [01:11:41] i didn't do any table rebuilds. just a version upgrade. 
[01:12:30] MariaDB [civicrm]> ALTER TABLE civicrm_activity DROP INDEX index_is_deleted; [01:12:30] Query OK, 0 rows affected (0.025 sec) [01:12:30] Records: 0 Duplicates: 0 Warnings: 0 [01:12:32] ah, well, we did the alters a few weeks/months ago. time is still fluid to me right now. [01:13:22] nice eileen [01:14:10] it definitely deleted it. [01:14:37] (PS1) Ejegg: Merge branch 'master' into deployment [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/622247 [01:14:38] so those are all done (super quickly) - maybe more efficient on latest mariadb? [01:14:44] (CR) Ejegg: [C: +2] Merge branch 'master' into deployment [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/622247 (owner: Ejegg) [01:14:54] so the other thing was to enable rpow [01:15:43] quite possible [01:16:25] so to enable rpow we need to add some lines to civicrm.settings.php - that's in localsettings? [01:17:24] yes. i believe so. [01:17:36] that's where i put the DSN next to the rest of them. [01:18:51] ejegg: just updating now - can you review localsettings in a few mins? [01:19:10] sure thing eileen [01:19:26] I put in one commit just now but still doing rpow [01:19:36] I just +2ed the drupal update & was thinking of pushing that out at the same time [01:20:15] kinda torn on this check_bannerimpressions script issue. seems like a change that isn't handling the read_default_file option correctly and thus not passing the password on. [01:24:39] ejegg: OK - I think it's right.... [01:24:47] looking [01:27:11] looks right to me eileen ! [01:27:19] ok - let's try [01:30:11] PROBLEM - check_mysql on frdev1001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1236 [01:30:22] !log civicrm revision is ce28723709, config revision is 54c8c7abf2 [01:30:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:30:41] checking that frdev replication. should be ok.
[01:31:24] hmm that didn't work DB Error: connect failed [01:31:49] heh. the alter apparently was quick everywhere but frdev. [01:31:55] acking that alert [01:32:45] ACKNOWLEDGEMENT - check_mysql on frdev1001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1236 Dwisehaupt known - alter catchup [01:33:33] ok. sorry. did you need me to verify something with rpow? [01:33:48] I'm just trying to see what's wrong :-( [01:35:02] ok. just verified mysql connectivity works as that user from civi1001 to fundraisingdb-read.wmnet [01:35:50] the error is [nativecode=php_network_getaddresses: getaddrinfo failed: Name or service no...", "php_network_getaddresses: getaddrinfo failed: Name or service not known") [01:36:48] ohh I think I see it [01:36:52] i would think that points to dns/name resolution [01:37:52] just fyi, we'll probably have another 52+ mins of catchup on frdev1001 on the alter. at least for the one that is running now. [01:38:12] funny - it was instant on prod [01:38:25] yeah. can't really explain that. [01:38:54] and prod has a mix of stretch and buster among the regular frdb hosts. [01:39:31] !log civicrm revision is ce28723709, config revision is 96839009f1 [01:39:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:40:17] ok - it seems to be working now - what shall we test? [01:40:19] ejegg: [01:40:25] sure! [01:40:39] slow-starting some QCs? [01:41:02] yeah & also dwisehaupt if I try a query can you see if it goes on the r/o db [01:41:14] sure. i'll take a look. [01:42:09] ok. have innotop up, it'll be better if it's a longer query. [01:43:54] (PS1) Eileen: Merge branch 'master' of https://gerrit.wikimedia.org/r/wikimedia/fundraising/crm into deployment [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/622249 [01:45:05] eileen and dwisehaupt ok, a donation QC slow start just worked! [01:45:20] eileen: are there new changes?
I just +2ed a merge to deployment [01:45:47] ejegg: nope - you might have just done the same merge? [01:46:02] oh right, but there's no auto-submit on deploy still [01:46:13] dwisehaupt: I might need the UI to do a decent slow query :-) [01:47:09] ha. [01:47:27] yeah. i'm seeing growth in connections, queries, and handlers. [01:47:45] https://frmon.frdev.wikimedia.org/d/000000273/mysql?orgId=1&from=1598319084385&to=1598319984385&var-dc=Prometheus&var-server=frdb1001.frack.eqiad.wmnet&var-port=9004 [01:48:49] eileen: i just submitted that one [01:48:58] want to redeploy [01:49:45] ? [01:49:52] !log civicrm revision changed from ce28723709 to 0f195c6cca, config revision is 96839009f1 [01:49:54] done [01:49:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:50:18] dwisehaupt: can you re-enable front end? [01:50:22] sure thing. [01:52:55] dwisehaupt: OK slow query coming up (you'll need to kill this one I think) [01:52:56] ok. maint mode removed and pushed out. we should be back to normal. [01:53:17] are you user 192? [01:53:20] yep [01:53:21] cause i see that. :) [01:53:34] on the read only connect? [01:54:05] yeah. it's on read only, not the master [01:54:15] nice [01:54:20] wanna kill it then? [01:54:23] sure. [01:54:35] so that would have been one query that didn't kill master :-) [01:54:48] killed. [01:54:51] is everything back on? [01:54:56] yes it is. [01:55:00] we are live. :) [01:56:04] queue backlog just about clear too. [01:56:16] great [01:56:21] gonna take a minute and have a quick dinner. back in a few. [01:56:30] I'm gonna test that dedupe code [01:58:58] cool cool [01:59:17] Fundraising Sprint CAPS LOCK CULTS, Fundraising Sprint Dampness, Fundraising Sprint Evil Twins For Everyone, Fundraising-Backlog, and 2 others: My testing suggests that REMOVING the contribution_status_id (& low-cardinality indexes like is_deleted) will spe... 
- https://phabricator.wikimedia.org/T247489 [02:00:07] Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM, fundraising Sprint Q 2020 for real: Wow let's look at that contribution_status_id index again - https://phabricator.wikimedia.org/T257132 (Eileenmcnaughton) [02:02:44] ejegg: are the queues back on? [02:04:41] I think so - didn't dwisehaupt say the backlog was just about done? [02:06:45] yep, they're running eileen [02:07:03] ok - I've spotted a couple of weird things. [02:07:23] errors? [02:07:27] or slowness? [02:07:38] UI inconsistencies? [02:07:40] I expect it was the upgrade - but the change log is not showing the proper report [02:08:06] oh hmm, that's not something I'd expect [02:08:23] ok. back. [02:08:27] yeah - upgrade related I think. I'll take a few mins break & then look [02:08:42] hmm, frdev replication lag is steadily increasing [02:08:50] dwisehaupt: I've found 1 or 2 things but related to civi upgrade rather than mtce I think [02:09:11] dwisehaupt: looks like frdev1001 stopped replicating about an hour ago [02:09:25] ejegg: it hasn't stopped replicating, it's just chewing on the alter still. [02:09:31] ohhhh ok [02:10:05] it's on the second alter. [02:10:40] odd it was so quick on prod! [02:11:05] yeah. can't explain that. i expected it to be slow everywhere. :) [02:13:52] only real hanging bit i have for the buster upgrade is the bannerimpressions script. i have a feeling we may need to alter it for python3 and possibly a new mysql module. [02:14:12] well, python3 isn't really needed, but if we are going to be in there anyway may as well do that.
[02:16:12] (PS1) Eileen: Add general settings file [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/622251 [02:17:08] (PS2) Eileen: Add general settings file [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/622251 [02:18:03] (PS3) Eileen: Add general settings file [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/622251 [02:18:05] (PS1) Eileen: Enable rpow on new dev sites & CI [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/622252 [02:19:30] ejegg: I have a theory about the change log - maybe I missed defining the logging dsn [02:22:21] I've just pushed a commit that adds it - I wonder if that means read queries aren't on the r/o db [02:30:43] hmm - that made it worse :-( [02:30:59] i do see the occasional query coming to read. this one is fr_stats [02:32:24] sorry eileen, was eating & doing dishes [02:32:31] looking now [02:32:36] ejegg: no worries - it didn't work :-( [02:32:40] I pushed it [02:33:26] so is frdb-read actually reading from the super-lagged version? [02:33:33] that would be a good way to test I guess [02:34:04] well before that change it was concluding logging was not enabled & giving the change log (not logging log) [02:34:12] but with - I'm getting a 500 error [02:34:21] civicrm/report/instance/221?reset=1&force=1&snippet=json&section=2&altered_contact_id_op=eq&altered_contact_id_value=478&cid=478 [02:35:32] ooh, in a squinched up little iframe even [02:35:35] seeing some civi_read selects on frdb1001 [02:35:51] *258 upstream sent too big header while reading response header from upstream [02:36:15] oh, that looks like a legit json response to me [02:36:25] in ff anyway [02:36:48] huh - your 258 error sounds like maybe nginx unhappy?
[02:36:54] if you go to contact 478 & go to the change log [02:37:02] that's where it should load [02:37:40] ok, now I see the error in the UI [02:38:06] that error 258 is from syslog [02:39:55] dwisehaupt: frdev isn't loading - is that 'known' [02:42:35] that is not known. it should be back up. [02:42:36] looking. [02:44:24] dwisehaupt: also the error on live - seems nginxy- it's a bit obscure but I find https://talk.plesk.com/threads/nginx-error-upstream-sent-too-big-header.338232/ [02:44:55] of course the result might just be uberweird & large... [02:45:07] just verified that the connection string actually works and lets us select from a bunch of random tables [02:45:29] eileen: i think there must be an underlying error that the nginx thing is masking, no? [02:45:37] nothing more in Civi logs? [02:45:37] probably [02:45:46] but no not in the civi logs [02:45:55] & can't look on staging just yet [02:46:14] + local rebuilding & bit borked [02:46:44] hmm, dwisehaupt would any SQL errors on frdb-read be logged on the DB side? [02:46:49] or could they be? [02:47:11] it's a quick fail so likely the connection is the problem [02:48:23] ok. dash is working and that's hosted on frdev also. [02:51:18] so it's handing off to apache, now to figure out what apache is doing wrong. [02:51:20] is it possible we're steering a write query to frdb-read by mistake? [02:51:33] so to be clear 2 things [02:51:33] 1) 500 on homepage of frdev [02:51:33] 2) 502 error on live on above url [02:51:41] can we log those on the db side? [02:51:57] i'm chasing #1 right now. let me know if you'd prefer i look at #2 instead first. [02:51:57] (log write queries on the read connection, that is) [02:51:57] the triggers should be fine - it's the db query to read [02:53:37] so.
when hitting the apache layer, it's returning a 500 for / [02:53:39] ejegg: tangental but relevant to me as my local is blocking me digging there - I got https://gerrit.wikimedia.org/r/c/wikimedia/fundraising/crm/+/622251 to run right through & enable logging with the triggers locally [02:53:50] dwisehaupt: yes - on staging [02:54:16] yeah. [02:54:18] ejegg: also https://gerrit.wikimedia.org/r/c/wikimedia/fundraising/crm/+/622252/1 (on the replicate locally front) is passing - yay [02:54:45] previously that would be followed with the css and all. so to figure out what went awry. [02:56:58] yeah! [02:57:48] ah. it's not logged in the apache error log. [02:58:16] PHP Fatal error: Uncaught Error: Call to undefined function cache_get() in /srv/org.wikimedia.civicrm/drupal/includes/module.inc:762 [02:59:10] yerp. that's it. [02:59:46] grep apache /var/log/syslog if you want to see the full error [03:00:39] ooh, crap, drupal upgrade issue? [03:00:52] ohh [03:00:59] good eye dwisehaupt [03:01:12] is that staging or prod? [03:01:33] staging. [03:01:45] i can move on to the prod 502 next. [03:02:15] hmm, does staging have the drupal update from today? [03:02:25] module.inc hasn't changed lately [03:02:45] it definitely has a lot of calls to cache_get [03:03:01] defined in includes/cache.inc [03:03:23] I didn't deploy it [03:03:25] it should be up to date. it went to buster, but other than that, no changes were made by me. [03:03:27] and cache.inc is required in bootstrap.inc [03:03:43] dwisehaupt: I don't think staging has the drupal patch then [03:03:52] we only just now merged that to gerrit [03:05:55] ejegg: ok the change log loads locally now locally is happy. The change in civicrm.settings.php for local dsn still seems right too [03:07:30] adding that LOGGING_DSN? [03:09:12] eileen: i have test i run for that 502. give me just a sec. [03:09:22] ejegg: yep [03:09:45] ok, looks like it should work! [03:11:47] eileen: try it now. 
[03:11:56] i've temp increased the proxy buffers. [03:12:57] dwisehaupt: yes - that fixes 502 on live [03:13:07] but the contents don't seem silly huge [03:13:38] maybe 2-3 pages [03:13:39] ok. i can codify that. the default buffer sizes aren't that big. only 4k [03:15:03] i bumped them up to 128k which is a big bump, but not unreal. we could tune for that query if we'd really like to min/max it. [03:16:21] hmm - I just hit it again - but on a page it had been OK for just before [03:16:46] it's possible puppet fixed my temp fix. :) [03:17:24] yeah, it did. [03:17:32] weird, why would any of that have changed just now though? [03:17:53] new nginx version. possibly new defaults. [03:18:19] i can codify this but it'll take a few since they are new config elements. [03:18:40] i am running towards the end of my usefulness since i've been at the console for over 13 hours. [03:20:39] oops, silverpop export just failed [03:20:43] checking that [03:21:14] oh. civicrm has it's own template file for this. that's nicer. [03:21:55] uhhh weird, pymysql.err.DataError: (1406, "Data too long for column 'subsidiaries' at row 325") [03:22:19] we shouldn't be exporting that column with the matching gifts data, should we? [03:22:27] looking [03:22:49] dwisehaupt: can we leave it up for now & tune later? [03:23:57] oh weird, they ARE exporting the subsidiaries [03:24:02] ugh, that's not useful [03:24:08] ejegg: I wish matching gifts were it's own job - but ignoring that for now... we could just make the column longer.... 
[03:24:23] yep, will do [03:25:21] oooh, nasty - it's defined as a varchar(5000) just to make the tests work [03:25:35] since sqlite doesn't support blobs [03:25:37] boooooo [03:25:46] i just wanna delete it [03:25:54] oh, but that'll break the silverpop-side import [03:26:03] well, I can just insert blanks for now [03:26:17] they really really really can't be using it to fill our mail templates [03:28:15] (PS1) Ejegg: Blank out subsidiaries column in matching gifts data export [wikimedia/fundraising/tools] - https://gerrit.wikimedia.org/r/622255 [03:28:36] eileen: yes. i have left it large. just pushed the change so it should be good now and stay that way. [03:28:54] eileen: want to peek at ^^^ ? [03:28:55] cool - well it should be OK for users for now [03:29:25] (CR) Eileen: [C: +2] "Yep - will get us past today!" [wikimedia/fundraising/tools] - https://gerrit.wikimedia.org/r/622255 (owner: Ejegg) [03:29:27] ok. [03:29:54] ejegg: done - plus I hit submit before it had a chance to unverify [03:30:04] ejegg: was there something related to the drupal patch that i need to do? [03:30:10] RECOVERY - check_mysql on frdev1001 is OK: Uptime: 11155 Threads: 12 Questions: 80111 Slow queries: 6382 Opens: 385338 Flush tables: 1 Open tables: 200 Queries per second avg: 7.181 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0 [03:30:25] oh yay, the alters finished. 
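The stopgap discussed above (blank the oversized `subsidiaries` values but keep the column, so the silverpop-side import still sees the layout it expects) could be sketched roughly as follows. This is only an illustration and not the contents of the actual patch at https://gerrit.wikimedia.org/r/622255; the helper name `blank_column` and the sample rows are hypothetical.

```python
def blank_column(rows, column):
    """Yield copies of each row with `column` emptied but still present.

    Keeping the key means a downstream import that expects a fixed set
    of columns keeps working, while long values can no longer overflow
    a fixed-width staging column.
    """
    for row in rows:
        cleaned = dict(row)          # copy so the caller's data is untouched
        if column in cleaned:
            cleaned[column] = ""     # blank the value, keep the column
        yield cleaned


# Hypothetical sample rows; only the 'subsidiaries' name comes from the log.
sample = [
    {"id": "1", "name": "Example Corp", "subsidiaries": "A, B, C"},
    {"id": "2", "name": "Other Org", "subsidiaries": "x" * 6000},
]
result = list(blank_column(sample, "subsidiaries"))
```

Dropping the column outright would be cleaner, but as noted in the discussion that needs a matching change on the import side, so blanking is the smaller short-term change.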
[03:35:16] (Abandoned) Eileen: Merge branch 'master' of https://gerrit.wikimedia.org/r/wikimedia/fundraising/crm into deployment [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/622249 (owner: Eileen) [03:38:47] Fundraising-Backlog, fundraising-tech-ops: check_bannerimpressions.py failing after frdev1001 upgrade to buster - https://phabricator.wikimedia.org/T261180 (Dwisehaupt) [03:44:45] (PS1) Ejegg: Merge branch 'master' into deploy [wikimedia/fundraising/tools] (deploy) - https://gerrit.wikimedia.org/r/622256 [03:44:53] (CR) Ejegg: [C: +2] Merge branch 'master' into deploy [wikimedia/fundraising/tools] (deploy) - https://gerrit.wikimedia.org/r/622256 (owner: Ejegg) [03:45:24] (Merged) jenkins-bot: Merge branch 'master' into deploy [wikimedia/fundraising/tools] (deploy) - https://gerrit.wikimedia.org/r/622256 (owner: Ejegg) [03:48:32] dwisehaupt: frdev is still down but nothing that can't wait for tomorrow! [03:48:49] ok. that's just the drupal update bit, yes? [03:49:09] not sure - I did the git pull - but don't let it keep you on line [03:49:25] you want me to include that in my close out email? that it's know and being worked on? [03:49:32] i'm not sure how much it's used. [03:49:45] go for it -mostly frtech use it [03:49:53] ok. [03:50:03] I was gonna send an email about read-only or triggers [03:50:15] or I can write something for you to include [03:50:58] i can just pop something in saying that it's currently unavailable but a known issue. [03:51:12] unless you want to include more detail. in which case, sure. [03:51:59] basically [03:51:59] 1) we removed some indexes that were slowing down some queries. 
A small number of queries will be slower now (notably searching on Cancelled donations or any other non-completed status) but more will be faster [03:51:59] 2) we enabled some read only connection work the CiviCRM core team did on our behalf which will re-direct some queries off our main server & reduce our risk of an awol query locking it up [03:52:24] cool. i'll add that in. [03:52:30] & 3) staging copy of civi is not accessible ATM but being worked on [03:52:52] not accessible via the staging website (db connections still OK) [03:54:12] pushing that tools update out [04:00:51] ok. i'm going to head out now. [04:01:03] have a good rest of your day/night folks. [04:02:30] !log updated fundraising python tools from 305f2a4438 to dcad0bfe75 [04:02:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [04:07:41] ok, that's the silverpop daily job kicked off again [04:07:50] gonna head to bed. night eileen! [04:07:55] or have a good day... [04:08:02] night - [05:55:37] (PS1) Eileen: Enable shoreditch in our settings [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/622262 [09:53:04] Fundraising-Backlog, fundraising-tech-ops: MySQLdb python scripts failing after frdev1001 upgrade to buster - https://phabricator.wikimedia.org/T261180 (Pcoombe) [09:54:34] Fundraising-Backlog, fundraising-tech-ops: MySQLdb python scripts failing after frdev1001 upgrade to buster - https://phabricator.wikimedia.org/T261180 (Pcoombe) I'm getting the same error for other more important scripts (e.g. banner_test) which use the same Python module :/ Will look into it. [13:47:19] Fundraising-Backlog, fundraising-tech-ops, Patch-For-Review: MySQLdb python scripts failing after frdev1001 upgrade to buster - https://phabricator.wikimedia.org/T261180 (Pcoombe) Open→Resolved a:Pcoombe Okay, these are fixed. The upgrade meant it was no longer automatically expanding out... 
[13:49:48] Fundraising-Backlog, fundraising-tech-ops: MySQLdb python scripts failing after frdev1001 upgrade to buster - https://phabricator.wikimedia.org/T261180 (Pcoombe) [14:32:46] (CR) Mepps: [C: +2] Paypal Audit: Move the check for missing subscription ids to a later stage in the audit process to avoid overthrowing exceptions. [wikimedia/fundraising/tools] - https://gerrit.wikimedia.org/r/620123 (https://phabricator.wikimedia.org/T243005) (owner: Jgleeson) [14:33:31] (Merged) jenkins-bot: Paypal Audit: Move the check for missing subscription ids to a later stage in the audit process to avoid overthrowing exceptions. [wikimedia/fundraising/tools] - https://gerrit.wikimedia.org/r/620123 (https://phabricator.wikimedia.org/T243005) (owner: Jgleeson) [14:38:20] Fundraising-Backlog: [Enhancement] Allow for updating header copy on the Adyen payments page - https://phabricator.wikimedia.org/T261209 (jbolorinos-ctr) [14:43:33] Fundraising-Backlog: Remove cc selection on Adyen payments page - https://phabricator.wikimedia.org/T261210 (jbolorinos-ctr) [14:48:14] Fundraising Sprint Octopus hugs, Fundraising Sprint Pseudopretzels, Fundraising-Backlog, fundraising Sprint Q 2020 for real: Fr-tech chores list - https://phabricator.wikimedia.org/T258527 (mepps) In taking this on, I'm definitely feeling failmail fatigue. I'm glad we'll be turning off the duplic... [14:52:28] Fundraising-Backlog: Bug: Silverpop nightly jobs failed due to session limits - https://phabricator.wikimedia.org/T261211 (mepps) [14:53:29] Fundraising-Backlog: [Enhancement] Include a Continue button at the bottom of the wiki payments page - https://phabricator.wikimedia.org/T261212 (jbolorinos-ctr) [14:55:33] Fundraising-Backlog, FR-Q2-FY2020-21-cleanup-list: Adyen message in queue without pending entry - https://phabricator.wikimedia.org/T259253 (mepps) Also 80506377.4 failed for the same reason last night. 
[14:58:06] Fundraising-Backlog, FR-Q2-FY2020-21-cleanup-list: Adyen message in queue without pending entry - https://phabricator.wikimedia.org/T259253 (mepps) Since yesterday, I see 7 possible instances of this in the damaged messages. [15:05:34] fr-tech: Was there a ticket for Ingenico's data center problems? Grant is a bit curious what went on there. [15:06:57] what are his questions XenoRyet? i'm looking at a couple email threads [15:07:27] i don't see any tickets like this [15:08:44] XenoRyet there's an email thread with them in fr-tech with subject Re: Timeouts. ***urgent*** [15:09:01] Thanks [15:10:39] i don't think we opened a ticket. pretty early on we determined there wasn't action we could take to address it. just had to wait out the dns caching. [15:11:14] from reading the threads it looks like we just routed all cc processing to adyen [15:11:55] also ejegg dwisehaupt we've now had two messages rejected due to unknown db errors this morning [15:12:20] i just went in to the db and i see a couple queries taking 30+ seconds, should i be concerned about a deadlock there? [15:12:53] could be mepps - do the queries look like big reports, or something more routine [15:12:56] ? [15:13:03] all of them seem to have finished now [15:13:14] yeah they were reports [15:14:21] ok, 30 seconds sounds about right then [15:14:54] though I guess we would hope those could run on the read-only copy [15:14:57] now [15:15:16] did we get that turned on ejegg? [15:15:28] yes! 
[15:15:34] well, in theory anyway [15:15:44] ooh that's great [15:15:55] I'm not sure if we've observed the queries hitting the read copy [15:16:05] it was pretty late at that point [15:16:09] my time [15:16:13] so i used drush cvsqlc which by default gets the read db [15:16:30] so i might have wanted to set the config value to the write db [15:16:50] Fundraising Sprint Octopus hugs, Fundraising Sprint Pseudopretzels, Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM, and 2 others: Readonly DB connection available to turn on - https://phabricator.wikimedia.org/T237350 (mepps) Apparently this is now turned on! [15:23:34] ohh, interesting [15:24:33] mepps do you want to review the config to see that it's got the right values for that new var you added to the RODB extension? [15:27:12] yeah the config is set up correctly [15:27:19] i'm not sure why this isn't working like i'd expect... [15:27:41] thanks for the review on the paypal audit patch mepps! I'll push that out [15:29:11] i wonder about the way we're trying to set it dynamically... [15:30:03] Fundraising-Backlog: Bug: Investigate whether rpow drush cvsqlc is working as expected - https://phabricator.wikimedia.org/T261213 (mepps) [15:33:54] ejegg: how come we're blanking out the subsidiaries field on the matching gifts stuff? https://gerrit.wikimedia.org/r/c/wikimedia/fundraising/tools/+/622255 [15:34:04] I don't see a ticket attached to the patch [15:34:16] jgleeson: there was an error when we ran it last night - that field was too long [15:34:28] ah [15:34:30] and I couldn't imagine any use they'd have for that up in silverpop [15:34:38] so I figured we'd delete the column today [15:34:58] but just to get the thing working last night, blanking was the easiest way [15:36:37] yeah I also can't see them using that data. parsing it alone would probably be a big task [15:36:46] in the acoustic world [15:37:39] mepps: i'm not sure about the drush setup. 
we were definitely seeing read connections going to the read db vs the write one. but i was just focused on the db and queries coming from civi. [15:37:55] i'm ready to help if you need any. [15:38:11] (PS1) Jgleeson: Merge branch 'master' into deploy [wikimedia/fundraising/tools] (deploy) - https://gerrit.wikimedia.org/r/622383 [15:41:31] Fundraising-Backlog, FR-Adyen: [Enhancement] Allow for updating header copy on the Adyen payments page - https://phabricator.wikimedia.org/T261209 (DStrine) [15:41:43] Fundraising-Backlog, FR-Adyen: Remove cc selection on Adyen payments page - https://phabricator.wikimedia.org/T261210 (DStrine) [15:41:53] Fundraising-Backlog, FR-Adyen: [Enhancement] Include a Continue button at the bottom of the wiki payments page - https://phabricator.wikimedia.org/T261212 (DStrine) [15:42:05] (CR) Jgleeson: [C: +2] Merge branch 'master' into deploy [wikimedia/fundraising/tools] (deploy) - https://gerrit.wikimedia.org/r/622383 (owner: Jgleeson) [15:42:39] (Merged) jenkins-bot: Merge branch 'master' into deploy [wikimedia/fundraising/tools] (deploy) - https://gerrit.wikimedia.org/r/622383 (owner: Jgleeson) [15:44:39] !log fundraising-tools updated from dcad0bfe75 to 3fe3a23114 [15:44:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:45:58] Fundraising-Backlog: [Mobile] Card Number and CVV fields allow non-numerical input - https://phabricator.wikimedia.org/T261216 (jbolorinos-ctr) [15:46:15] (PS2) Jgleeson: Revert "Add in custom http_request function for OANDA" [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/621494 (https://phabricator.wikimedia.org/T260092) [15:49:44] Fundraising-Backlog: Bug: Investigate whether rpow drush cvsqlc is working as expected - https://phabricator.wikimedia.org/T261213 (mepps) This is likely connected to my own patch: https://github.com/totten/rpow/pull/7/commits/c59e3766dbd9be9ee48e96d98db8c81336449a06. 
[15:53:11] Fundraising-Backlog, FR-Adyen: [Mobile] Card Number and CVV fields allow non-numerical input - https://phabricator.wikimedia.org/T261216 (DStrine) [15:59:37] Fundraising-Backlog: "Previous" navigation button does not work on Adyen payment form - https://phabricator.wikimedia.org/T260749 (jbolorinos-ctr) There's not an error message page available for the FR banners {F32194607} [16:13:00] fundraising-tech-ops, DC-Ops, Operations: RAID controller failing on frdb1002.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T261221 (Jgreen) [16:13:33] fundraising-tech-ops, DC-Ops, Operations: RAID controller failing on frdb1002.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T261221 (Jgreen) [16:21:20] fundraising-tech-ops, DC-Ops, Operations: RAID controller failing on frdb1002.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T261221 (Jgreen) [16:28:30] cstone I think I figured out the issue with the l10n function in the new mustache [16:28:56] newer mustache no longer passes all the args to helper functions in a single array [16:29:04] nice ejegg I was just looking at that too [16:29:14] now it passes them as separate args [16:29:21] was trying to figure out if the template had changed [16:29:29] so we need l10n to take variable arguments [16:29:31] er the way to pass them changed [16:33:14] cstone fortunately it's a very small change [16:33:30] are you familiar with the ellipses syntax in php? [16:34:03] I know it in JS is it similiar [16:34:19] function(...$blah) will take all the args and stuff them in array $blah [16:34:27] is that how it works in JS? [16:37:35] hmm not sure if exactly the same but the same general thing [16:39:05] oh, so we seem to always pop the first arg off [16:39:16] https://www.php.net/manual/en/functions.arguments.php#functions.variable-arg-list then? [16:39:16] so we'd want $key, ...$params here [16:39:29] yep cstone that's exactly what we want [16:40:47] oops dstrine just saw your meeting! 
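For readers less familiar with the variadic syntax discussed above (`$key, ...$params` in PHP), here is the same idea sketched in Python, where `*params` plays the role of `...$params`. The message table, the function bodies, and the name `l10n_legacy` are made up for illustration; this is not the DonationInterface helper itself.

```python
# Hypothetical message table, standing in for real translations.
MESSAGES = {
    "donate-amount": "Donate {0} {1}",
    "thank-you": "Thank you!",
}


def l10n(key, *params):
    """Variadic helper: the first argument is the message key and any
    further positional arguments are collected into the tuple `params`."""
    template = MESSAGES.get(key, key)   # fall back to the key itself
    return template.format(*params)


def l10n_legacy(args):
    """Older calling convention: every argument arrives in a single list,
    and the first element is popped off as the key."""
    key, *params = args
    return l10n(key, *params)
```

With the old convention a caller passes `["donate-amount", "5", "USD"]` as one list; with the variadic form it passes `"donate-amount", "5", "USD"` as separate arguments, and both produce the same string.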
[16:45:16] fundraising-tech-ops, fr-tech-ops-okr: Update staging to match production (php, buster) - https://phabricator.wikimedia.org/T260629 (Dwisehaupt) This was completed in the maintenance window last night. Staging is now running buster and updated to the latest package versions. [16:48:43] fundraising-tech-ops: Epic: Upgrade fundraising servers to buster - https://phabricator.wikimedia.org/T254198 (Dwisehaupt) [16:53:30] Fundraising-Backlog, fundraising-tech-ops: Plan for FY20-21 Q1 fundraising database maintenance window - https://phabricator.wikimedia.org/T257919 (Dwisehaupt) During the maint window we removed some indexes as specified in T247489. They were almost instantaneous on the db cluster but took time to replic... [16:53:43] Fundraising-Backlog, fundraising-tech-ops: Plan for FY20-21 Q1 fundraising database maintenance window - https://phabricator.wikimedia.org/T257919 (Dwisehaupt) [17:06:04] cstone: want to do a quick video call to make those updates? [17:06:30] sure just a sec let me relocate the computer [17:09:10] in the mid sprint meet? [17:11:00] sure cstone, be there in a sec [17:11:08] looks like we also need a partialresolver: https://zordius.github.io/HandlebarsCookbook/9902-lcop-partialresolver.html [17:12:38] fundraising-tech-ops, DC-Ops, Operations: RAID controller failing on frdb1002.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T261221 (Jgreen) [17:13:00] fundraising-tech-ops, DC-Ops, Operations: RAID controller failing on frdb1002.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T261221 (Jgreen) [17:13:31] fundraising-tech-ops, DC-Ops, Operations: RAID controller failing on frdb1002.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T261221 (Jgreen) [17:23:23] Fundraising-Backlog, FR-Adyen: Remove cc selection on Adyen payments page - https://phabricator.wikimedia.org/T261210 (Pcoombe) It's not actually a selection interface, they are just images of the accepted card logos. But agree this is kind of confusing. 
[17:32:53] fundraising-tech-ops, DC-Ops, Operations: RAID controller failing on frdb1002.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T261221 (Jgreen) [17:34:10] Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM, FR-Email: Matching Gifts database data cleanup - https://phabricator.wikimedia.org/T260935 (MNoorWMF) From my understanding, the only usable columns we have that can be pulled into the email automation are: column B: id (should be named employer_id... [17:34:38] (PS1) Cstone: WIP for 1_35: Update Mustache for LightnCandy 1.2.5 [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/622401 (https://phabricator.wikimedia.org/T260621) [17:34:56] fundraising-tech-ops, DC-Ops, Operations: RAID controller failing on frdb1002.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T261221 (Jgreen) [17:36:17] (CR) jerkins-bot: [V: -1] WIP for 1_35: Update Mustache for LightnCandy 1.2.5 [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/622401 (https://phabricator.wikimedia.org/T260621) (owner: Cstone) [17:36:32] (CR) Ejegg: [V: +2 C: +2] Remove non fundraising extensions and skins. [core] (fundraising/REL1_35) - https://gerrit.wikimedia.org/r/619814 (owner: Cstone) [17:37:03] (CR) jerkins-bot: [V: -1] Remove non fundraising extensions and skins. 
[core] (fundraising/REL1_35) - https://gerrit.wikimedia.org/r/619814 (owner: Cstone) [17:38:13] (PS2) Cstone: WIP for 1_35: Update Mustache for LightnCandy 1.2.5 [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/622401 (https://phabricator.wikimedia.org/T260621) [17:40:33] fundraising-tech-ops, DC-Ops, Operations: RAID controller failing on frdb1002.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T261221 (Jgreen) [17:40:38] (CR) jerkins-bot: [V: -1] WIP for 1_35: Update Mustache for LightnCandy 1.2.5 [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/622401 (https://phabricator.wikimedia.org/T260621) (owner: Cstone) [17:59:31] fundraising-tech-ops, DC-Ops, Operations: RAID controller failing on frdb1002.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T261221 (Jgreen) [18:00:57] fundraising-tech-ops, DC-Ops, Operations: RAID controller failing on frdb1002.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T261221 (Jgreen) Note that fr-tech-ops intends to schedule a firmware upgrade, but our concern is that the upgrade is likely to surface an underlying hardware issue rather... [18:09:46] darn cstone I should have merged your fundraising/REL1_35 patches earler [18:09:49] *earlier [18:10:11] I'm trying to merge them now and it looks like other people have done some updates to the branch [18:10:16] that make them unmergeable [18:10:38] I just tried a local rebase but now git-review is giving me tons of 'remote: ERROR: Implicit Merge of ...' [18:10:57] will try just cherry-picking them one at a time onto a fresh checkout [18:13:16] ah boo ejegg [18:14:18] im going on a quick caterpillar food run but ill be back in like 10 if I need to redo them [18:15:44] very hungry caterpillars? [18:28:59] (PS2) Ejegg: Remove non fundraising extensions and skins. [core] (fundraising/REL1_35) - https://gerrit.wikimedia.org/r/619814 (owner: Cstone) [18:29:07] ok, that seems to be happier [18:29:58] (PS2) Ejegg: Add logos.
[core] (fundraising/REL1_35) - https://gerrit.wikimedia.org/r/619815 (owner: Cstone) [18:30:00] (PS9) Ejegg: Add payments-wiki extensions. [core] (fundraising/REL1_35) - https://gerrit.wikimedia.org/r/619817 (owner: Cstone) [18:30:02] (PS2) Ejegg: Update vendor for merged fundraising deps [core] (fundraising/REL1_35) - https://gerrit.wikimedia.org/r/620118 (owner: Cstone) [18:31:16] it was just one caterpillar but i acquired another on the food run but yes extremely hungry [18:33:29] (CR) Ejegg: [V: +2 C: +2] Remove non fundraising extensions and skins. [core] (fundraising/REL1_35) - https://gerrit.wikimedia.org/r/619814 (owner: Cstone) [18:46:58] fundraising-tech-ops: clamav-freshclam failing to start on civi1001 - https://phabricator.wikimedia.org/T260562 (Dwisehaupt) Open→Resolved a:Dwisehaupt We have decided to remove the UpdateLogFile config element for now as we will still get logs to syslog. We can pull more detailed logs in the cas... [18:51:06] (CR) Ejegg: [V: +2 C: +2] Add logos. [core] (fundraising/REL1_35) - https://gerrit.wikimedia.org/r/619815 (owner: Cstone) [19:07:25] fundraising-tech-ops: set up backup cycle for fran1001 - https://phabricator.wikimedia.org/T261003 (Jgreen) [19:27:00] (PS2) Jgleeson: Add stats counter to contribution tracking qc so that we can record duplicate ct exceptions and processing rates. [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/622196 (https://phabricator.wikimedia.org/T256037) [19:28:11] Wikimedia-Fundraising-Banners: QA for enUS pre-test on August 26 - https://phabricator.wikimedia.org/T261147 (jbolorinos-ctr) [19:28:14] Wikimedia-Fundraising-Banners: 2020-08-26 enUS pre-test: desktop large batch - https://phabricator.wikimedia.org/T261148 (jbolorinos-ctr) Open→Resolved a:jbolorinos-ctr Test donation completed, everything looks good here. Closing this now as QA has completed and these banners are now READY TO TEST! 
[19:29:07] (PS3) Cstone: WIP for 1_35: Update Mustache for LightnCandy 1.2.5 [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/622401 (https://phabricator.wikimedia.org/T260621) [19:30:10] (PS3) Jgleeson: Add stats counter to contribution tracking qc so that we can record duplicate ct exceptions and processing rates. [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/622196 (https://phabricator.wikimedia.org/T256037) [19:30:51] (CR) jerkins-bot: [V: -1] WIP for 1_35: Update Mustache for LightnCandy 1.2.5 [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/622401 (https://phabricator.wikimedia.org/T260621) (owner: Cstone) [19:37:08] (CR) jerkins-bot: [V: -1] Add stats counter to contribution tracking qc so that we can record duplicate ct exceptions and processing rates. [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/622196 (https://phabricator.wikimedia.org/T256037) (owner: Jgleeson) [19:48:31] Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM, fundraising Sprint Q 2020 for real: Wow let's look at that contribution_status_id index again - https://phabricator.wikimedia.org/T257132 (mepps) As a user, I would like indexes on contribution_status_id not to be a barrier to exporting or searching... [20:07:41] Wikimedia-Fundraising-Banners: Orphan text visible on Japanese dsk lg Monthly Convert step - https://phabricator.wikimedia.org/T259535 (jbolorinos-ctr) Thanks @Pcoombe ! Here's what it looks like now: {F32194753} The orphan text is gone but now the second blue button looks like it could maybe use a little m... [20:25:11] (PS4) Jgleeson: Add stats counter to contribution tracking qc so that we can record duplicate ct exceptions and processing rates. 
[wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/622196 (https://phabricator.wikimedia.org/T256037) [22:45:48] Fundraising-Backlog, SRE Program Management, fr-donorservices, Performance Issue: IN donors receiving DNS error on Wikipedia - https://phabricator.wikimedia.org/T260563 (MBeat33) Open→Resolved a:MBeat33 None of the donors followed up with any additional feedback, so it makes sense to... [22:46:51] Fundraising-Backlog: Unclear error message for missing expiration date on payments wiki page - https://phabricator.wikimedia.org/T261256 (jbolorinos-ctr) [23:02:44] cstone so everything looks good on my local except that the Template:LanguageSwitch seems to have been damaged on update - not related to DonationInterface [23:04:00] weird, says it doesn't exist but when I try to create it it keeps saying there's an edit conflict [23:10:53] phab created per discussion https://phabricator.wikimedia.org/T261257