[00:24:27] 10DBA, 06Operations, 13Patch-For-Review: Set up TLS for MariaDB replication - https://phabricator.wikimedia.org/T111654#2223067 (10jcrespo)
[00:24:30] 10DBA, 06Operations, 13Patch-For-Review: Restart pending mysql hosts with old TLS cert - https://phabricator.wikimedia.org/T152188#3007856 (10jcrespo) 05Open>03Resolved All hosts with the old expiring cert have been reimaged or (if scheduled for decommission) restarted: ``` sudo salt --output=txt -C 'G...
[00:28:33] 10DBA, 06Operations, 13Patch-For-Review: Set up TLS for MariaDB replication - https://phabricator.wikimedia.org/T111654#3007873 (10jcrespo) After resolving T152188, pending hosts: ``` $ sudo salt --output=txt -C 'G@cluster:mysql' cmd.run 'mysql -BN --skip-ssl -e "SELECT @@ssl_ca"' | grep NULL db1020.eqiad....
[07:27:16] 10DBA, 06Labs, 13Patch-For-Review: Add and sanitize s2, s4, s5, s6 and s7 to sanitarium2 and new labsdb hosts - https://phabricator.wikimedia.org/T153743#3008324 (10Marostegui) All the tables have been imported to labsdb1009. Stopping and starting mysql worked fine, doing a SELECT over all the tables worked...
[07:34:10] 10DBA, 06Labs, 13Patch-For-Review: Add and sanitize s2, s4, s5, s6 and s7 to sanitarium2 and new labsdb hosts - https://phabricator.wikimedia.org/T153743#3008328 (10Marostegui) All the tables have been imported to db1095. Stopping and starting mysql worked fine, doing a SELECT over all the tables worked and...
[12:21:32] db1037 created a small spike in errors
[12:21:36] *depool
[12:22:33] 100 errors, not 100 000
[12:22:39] when was it depooled?
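The truncated salt one-liner above reports `NULL` for any host whose `@@ssl_ca` is unset, i.e. hosts still pending TLS. A minimal sketch of that filtering step, assuming salt's `--output=txt` format of `hostname: value` per line (host names and CA path below are illustrative, not taken from the actual fleet output):

```python
# Hypothetical helper mirroring the salt check above: collect hosts
# whose "SELECT @@ssl_ca" came back NULL, meaning TLS is not yet
# configured there. Input format assumed: "host: value" per line.
def hosts_without_tls(salt_txt_output: str) -> list:
    pending = []
    for line in salt_txt_output.splitlines():
        host, _, value = line.partition(": ")
        if value.strip() == "NULL":
            pending.append(host)
    return pending


sample = (
    "db1020.eqiad.wmnet: NULL\n"
    "db1052.eqiad.wmnet: /etc/ssl/certs/Puppet_Internal_CA.pem"  # illustrative path
)
print(hosts_without_tls(sample))
```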
[12:22:41] it seems to be cooler now
[12:22:49] ah
[12:22:55] it is not db1037 they complain about
[12:23:05] but large queries running on db1023
[12:23:23] we may need to put a second rc there (s6)
[12:24:12] I will repool it as soon as possible
[12:28:21] 10DBA, 13Patch-For-Review: Deploy gtid_domain_id flag in our mysql hosts - https://phabricator.wikimedia.org/T149418#3008849 (10Marostegui) I have added some more comments to the bug report, as I have tested it with all 10.1 versions involved.
[12:32:11] jynus: once you are done with your deploys, let me know so I can push this: https://gerrit.wikimedia.org/r/#/c/336612/ no rush
[12:32:24] yes
[12:32:27] give me 2 minutes
[12:32:31] sure, no rush at all
[12:32:46] just wanted to mention it before you head out for lunch :)
[12:33:06] you may need to manually rebase
[12:40:58] things are going back to normal
[12:41:05] good
[12:41:14] so we do need another host there then
[12:41:14] although around ~30 queries still have to time out
[12:41:39] queries are stuck in "Sorting result"
[12:41:54] it could be specific to db1023
[12:42:05] and not the rc/contributions role
[12:42:33] but we needed a restart there anyway
[12:42:36] at some point
[12:43:18] it was only 181 errors anyway
[12:43:28] so probably just one specific query
[12:44:12] https://tendril.wikimedia.org/report/slow_queries?host=^db1023&user=wikiuser&schema=wik&qmode=eq&query=&hours=1
[12:44:21] 10DBA, 06Operations, 13Patch-For-Review: Move db1073 to B3 - https://phabricator.wikimedia.org/T156126#3008868 (10Marostegui) Server has been depooled and downtimed - ready to shut it down whenever @Cmjohnson is at the DC.
[12:47:13] confirmed ok, the current errors are jobqueue noise, not user requests
[12:48:00] :)
[13:37:09] 10DBA, 06Labs, 13Patch-For-Review: Add and sanitize s2, s4, s5, s6 and s7 to sanitarium2 and new labsdb hosts - https://phabricator.wikimedia.org/T153743#3009048 (10Marostegui) data has been sanitized in all the hosts (sanitarium2, labsdb1009,10,11).
Triggers have been enabled on commonswiki on sanitarium2...
[13:40:40] 10DBA, 06Labs, 13Patch-For-Review: Add and sanitize s2, s4, s5, s6 and s7 to sanitarium2 and new labsdb hosts - https://phabricator.wikimedia.org/T153743#3009080 (10jcrespo) > create the views - @chasemp / @yuvipanda is that something you will take care of? > add the grants for commonswiki BTW, we said that...
[13:44:08] ah, that heartbeat issue again on db1095 with the new replication channel :)
[13:44:33] that is easy to fix, just insert anything with the right PK
[13:44:50] it is slightly more strict with ROW
[13:45:19] talking about ROW
[13:46:53] https://mariadb.com/kb/en/mariadb/flashback/
[13:47:16] the actual useful staff will be on 10.3
[13:47:23] *stuff
[13:48:37] need help?
[13:50:01] I will get you to review the insert for db1095 :)
[13:50:54] INSERT server_id=171970589 + INSERT server_id=180359175
[13:51:44] ROW doesn't care about the rest of the fields (and neither do we)
[13:52:15] ah, I was getting it to be consistent XD
[13:53:13] yeah, I paid the price one, then I just started being efficient
[13:53:16] *once
[13:53:18] haha
[13:54:04] are those server_ids fake?
[13:54:08] nope
[13:54:15] I got them from its immediate master
[13:54:23] they are the real s4 masters (or should be)
[13:54:56] just did SELECT server_id, ts FROM heartbeat ORDER BY ts DESC LIMIT 2;
[13:55:31] yep, just checked, they are the eqiad and codfw masters
[13:56:45] replication looks good now :)
[13:56:46] thanks
[13:58:06] 10DBA, 06Labs, 13Patch-For-Review: Add and sanitize s2, s4, s5, s6 and s7 to sanitarium2 and new labsdb hosts - https://phabricator.wikimedia.org/T153743#3009126 (10Marostegui) Replication has been started on db1095 and db1064 and so far it is looking good (they are both catching up after 2 days stopped)
[14:20:17] 10DBA, 06Labs, 13Patch-For-Review: Add and sanitize s2, s4, s5, s6 and s7 to sanitarium2 and new labsdb hosts - https://phabricator.wikimedia.org/T153743#3009194 (10Marostegui) We also need to compress commonswiki on all the hosts.
[14:21:20] that is another reason to do it on db1095 first, then copy
[14:21:27] yep :)
[14:23:53] :wq
[14:23:57] :(
[14:35:48] 10DBA, 06Labs, 13Patch-For-Review: Add and sanitize s2, s4, s5, s6 and s7 to sanitarium2 and new labsdb hosts - https://phabricator.wikimedia.org/T153743#3009252 (10chasemp) >>! In T153743#3009048, @Marostegui wrote: > data has been sanitized in all the hosts (sanitarium2, labsdb1009,10,11). > Triggers have...
[14:42:44] I have disabled db1089 query logging
[14:43:31] how much did it generate in terms of "G"? just out of curiosity
[14:43:39] 4.2
[14:43:46] 0.1 GB less than calculated
[14:43:52] :-D
[14:43:54] in 24h?
[14:43:58] yes
[14:44:06] normally I like to have 10 GB
[14:45:00] that will be like 300MB compressed
[14:45:13] note I have it sampled 1/100
[14:45:27] I have documented: "I found that a rate of 20 for regular hosts, and 100 for busy hosts, for 24 hours is a good value."
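The heartbeat fix discussed above exploits how ROW-based replication works: the master's UPDATE to the heartbeat table needs a row with a matching primary key (`server_id`) on the replica, and once that row exists the next heartbeat overwrites every other column. A hedged sketch of generating those placeholder INSERTs — the `heartbeat.heartbeat` table name and `(ts, server_id)` columns follow the usual pt-heartbeat schema and are assumptions here, while the server_id values come from the log above:

```python
# Sketch (not the actual commands run): under ROW replication only the
# primary key has to match, so a minimal row per master server_id is
# enough to unbreak the replica; the subsequent heartbeat UPDATE fills
# in the real values. Schema names are assumed pt-heartbeat defaults.
def heartbeat_fix_statements(server_ids):
    return [
        "INSERT INTO heartbeat.heartbeat (ts, server_id) "
        "VALUES (NOW(6), {});".format(sid)
        for sid in server_ids
    ]


# server_ids taken from the conversation above (eqiad and codfw s4 masters)
for stmt in heartbeat_fix_statements([171970589, 180359175]):
    print(stmt)
```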
[14:45:41] at https://wikitech.wikimedia.org/wiki/MariaDB/query_performance
[14:46:20] 466 MB actually, which compresses in a few seconds
[14:48:24] jynus: quick question on a labsdb view outcome, for the revision table we have ' if(rev_deleted&1,null,rev_text_id) as rev_text_id'
[14:48:25] https://phabricator.wikimedia.org/diffusion/OPUP/browse/production/modules/role/templates/labs/db/views/maintain-views.yaml;7678fccb897fd6912a75e3a162d338eb01a31193$323
[14:48:36] which analytics is saying always results in rev_text_id being a literal 0
[14:48:47] I'm not sure I understand if that's intentional or not
[14:49:14] well, I cannot say about the intentions
[14:49:36] :)
[14:49:51] but maybe it should be rev_deleted = 1 then null ?
[14:50:11] what values can rev_deleted be?
[14:50:27] only 0 and 1?
[14:50:44] I'm not sure of the answer there either
[14:50:58] this bit of logic could be from eons ago
[14:51:12] SELECT 0&1;
[14:51:14] is 0
[14:51:20] but SELECT 1&1;
[14:51:22] is 1
[14:51:26] so that is not true
[14:51:58] yeah, it can probably be 2
[14:52:05] based on the second part
[14:52:38] I would tell them to send a CR
[14:52:45] then refer to security and legal
[14:52:47] to ok it
[14:53:02] they are normally more strict than us
[14:53:29] right
[14:53:31] ok
[14:53:45] I do not see a problem at first glance
[14:54:05] but sincerely, you make more of a mediawiki question than a mysql question
[14:54:08] *made
[14:54:19] sure, no doubt the context is all MW
[14:54:31] I can generate, however
[14:54:38] a list of interesting things
[14:54:47] with a new logic before implementing it
[14:54:55] so tell them to ask for a specific logic
[14:55:03] basically, send a CR
[14:55:13] ok, I'll ask analytics to split out a different ticket to address this field
[14:55:16] and *then* we examine, consult with legal, etc
[14:55:18] yes
[14:55:24] cool
[14:55:27] I hope you are ok with me
[14:55:33] totally, appreciate the insight
[14:55:39] that we should be very careful
[14:55:42] 
when changing those
[14:55:48] normally things look strange
[14:55:52] but there is a reason
[14:56:02] I was pretty much there as well but I was a bit confused on whether the actual view setup was maybe just poor and never noticed or something
[14:56:05] so open to change
[14:56:08] a small note otherwise
[14:56:11] if it is an error
[14:56:27] labsdb1009 and labsdb1011 are gtg on views but 1010 tells me 'pymysql.err.OperationalError: (1142, "CREATE VIEW command denied to user 'maintainviews'@'localhost' for table 'abuse_filter_action'")
[14:56:42] oh, again?
[14:57:00] yeah via
[14:57:00] labsdb1010:~# maintain-views --all-databases --replace-all --debug
[14:57:01] something is problematic that reoccurs
[14:57:14] maybe after a restart
[14:57:31] is it on commons?
[14:57:48] looks like
[14:57:49] maybe the load we do doesn't reset the grants
[14:57:49] yiwikisource
[14:57:53] oh?
[14:57:57] that's new
[14:58:06] actually, that's just the first wiki hit I think
[14:58:11] and it's probably a red herring
[14:58:12] last time, I just did flush grants and it magically worked
[14:58:21] let me try again
[14:58:42] the thing is, if it happened on all
[14:58:48] but just on one makes no sense
[14:58:55] yep
[14:59:43] try once again
[15:00:25] yeah, it is working now
[15:00:26] I can confirm the exact same grants for maintainviews
[15:00:30] mystery?
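The `SELECT 0&1` / `SELECT 1&1` exchange above is the key point: `rev_deleted` is a bitfield in MediaWiki (per the commonly documented constants `DELETED_TEXT = 1`, `DELETED_COMMENT = 2`, `DELETED_USER = 4`, `DELETED_RESTRICTED = 8`), so the view's bitwise `rev_deleted&1` test is correct and an equality test would be wrong. A small illustrative sketch of the view's conditional, assuming those constants:

```python
# Mirrors the labsdb view expression if(rev_deleted&1, null, rev_text_id):
# hide rev_text_id whenever the DELETED_TEXT bit is set, regardless of
# which other bits are set alongside it. Constants assumed from the
# usual MediaWiki revision-deletion bitfield.
DELETED_TEXT = 1  # other bits: DELETED_COMMENT=2, DELETED_USER=4, DELETED_RESTRICTED=8

def visible_rev_text_id(rev_deleted, rev_text_id):
    return None if rev_deleted & DELETED_TEXT else rev_text_id


print(visible_rev_text_id(0, 42))  # nothing deleted: text id visible
print(visible_rev_text_id(2, 42))  # only comment hidden: text id still visible
print(visible_rev_text_id(3, 42))  # text bit set (1|2): hidden -- an
# equality test (rev_deleted == 1) would have wrongly exposed this row
```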
[15:00:32] something is fishy
[15:00:42] I believe it is a bug
[15:00:49] maybe when tables are imported
[15:00:50] fwiw I literally just ran that command each time, nothing else
[15:00:54] grants get weird
[15:01:07] that could line up
[15:01:09] no, but you do it when something changes
[15:01:16] yep
[15:01:22] which normally is us restarting or importing
[15:01:40] but it is not very reproducible
[15:02:03] I think your user can run that
[15:02:17] so I am thinking of adding a flush at the start of the script
[15:02:28] and documenting it as a bug
[15:02:55] sorry for the extra work
[15:03:08] as I said, maybe we can automate it more in the future
[15:03:34] at least people will be happy to have commons now :-)
[15:03:43] thanks to manuel's work
[15:04:30] :) it's all good, at least it's consistently buggy?
[15:04:50] heh
[15:05:31] it could also be related to roles
[15:05:37] it is the only other thing special there
[15:06:23] I would have thought so too, but 1 in 3 or sometimes 2 in 3 makes no sense
[15:06:29] yeah
[15:06:58] especially 1010
[15:07:12] last and first servers can be special
[15:07:36] because of human error, almost never the middle ones
[15:08:05] 10DBA, 06Labs, 13Patch-For-Review: Add and sanitize s2, s4, s5, s6 and s7 to sanitarium2 and new labsdb hosts - https://phabricator.wikimedia.org/T153743#3009402 (10chasemp) > maintain-views --all-databases --replace-all --debug on all three, small note labsdb1010 had that same issue which needed a flush pr...
[15:10:23] if only we could rid ourselves of all these humans
[15:25:17] chase, remember we are humans too, and we do human things, and we totally do not want to kill all humans
[15:30:43] dang it, good call
[15:32:18] jynus: analytics folks were hoping to see the logic for triggers at the sanitarium level, where does that live? It's in a repo I can't remember
[15:32:32] I moved it to puppet
[15:33:06] gotcha
[15:33:27] I am searching for it
[15:34:23] jynus: modules/role/files/mariadb/redact_sanitarium.sh ?
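The "flush at the start of the script" workaround floated above can be sketched as a one-shot retry: if CREATE VIEW is denied with MySQL error 1142 (the intermittent stale-grant symptom on labsdb1010), issue `FLUSH PRIVILEGES` and try once more. This is illustrative only, not the actual maintain-views code; `execute` stands in for a real cursor call (e.g. a pymysql cursor's `execute`), and the exception class is a stand-in for `pymysql.err.OperationalError` with errno 1142:

```python
# Hypothetical shape of the workaround: one FLUSH PRIVILEGES retry when
# a view creation is denied by a stale grant cache.
class CommandDenied(Exception):
    """Stand-in for pymysql.err.OperationalError with errno 1142."""

def create_view_with_retry(execute, create_view_sql):
    try:
        execute(create_view_sql)
    except CommandDenied:
        execute("FLUSH PRIVILEGES")   # reload the grant tables
        execute(create_view_sql)      # retry once; re-raises if still denied


# Demo with a fake executor that fails until privileges are flushed.
calls = []
def fake_execute(sql):
    calls.append(sql)
    if sql.startswith("CREATE VIEW") and "FLUSH PRIVILEGES" not in calls:
        raise CommandDenied()

create_view_with_retry(fake_execute, "CREATE VIEW v AS SELECT 1")
print(calls)
```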
[15:34:40] https://phabricator.wikimedia.org/diffusion/OPUP/browse/production/modules/role/manifests/mariadb.pp;f8482375022ba1eee1aee518a58fb11e79a49299$705
[15:35:05] https://phabricator.wikimedia.org/diffusion/OPUP/browse/production/modules/role/manifests/mariadb.pp;f8482375022ba1eee1aee518a58fb11e79a49299$743
[15:35:18] got it, thanks jynus
[15:38:35] hey chasemp sorry I was not paying attention, I am busy with something else at the moment. Just saw this: https://phabricator.wikimedia.org/T153743#3009402
[15:38:38] is that still happening?
[15:38:58] marostegui: jynus did another flush which fixed it but we are full of head shaking as to why it's needed
[15:39:14] yeah, it is weird :|
[15:39:38] so all the views are in place now? :)
[15:40:03] I think the grants may be missing
[15:40:07] for the role
[15:40:28] yeah, those for sure (or at least I didn't create them so..)
[15:41:16] marostegui: views are ok afaik :)
[15:48:13] chasemp: I am going to add the grant
[15:48:53] we should have a tools just for testing connectivity
[15:48:56] *tool
[15:49:08] jynus: I am looking around puppet and that grant isn't defined, am I right?
[15:49:32] nope, because it is WIP
[15:49:39] sure sure
[15:49:44] and the whole labs-production separation
[15:49:46] just double checking
[15:49:51] I will add it on the three hosts then
[15:50:04] remember the \_
[15:50:18] I think I may have added meta_p incorrectly
[15:51:34] hehe yeah :)
[15:52:52] ah, I did not do the meta_p thing
[15:52:54] I can do that
[15:52:57] (forgot)
[15:57:06] let's throw things into T153058 as TODO
[15:57:06] T153058: LabsDB infrastructure pending work - https://phabricator.wikimedia.org/T153058
[16:00:38] 10DBA, 06Labs, 13Patch-For-Review: Add and sanitize s2, s4, s5, s6 and s7 to sanitarium2 and new labsdb hosts - https://phabricator.wikimedia.org/T153743#3009591 (10Marostegui) >>!
In T153743#3009402, @chasemp wrote: >> maintain-views --all-databases --replace-all --debug > > on all three, small note labsdb...
[16:15:52] 10DBA, 06Operations, 13Patch-For-Review: Move db1073 to B3 - https://phabricator.wikimedia.org/T156126#3009621 (10Marostegui) a:03Marostegui db1073 has been moved. DNS updated db-eqiad,codfw files updated mysql and replication started fine. tendril updated I will pool it back in slowly once it is wa...
[16:51:38] 10DBA, 06Operations, 13Patch-For-Review: Set up TLS for MariaDB replication - https://phabricator.wikimedia.org/T111654#3009760 (10jcrespo) TLS is now deployed on all core servers: ``` root@neodymium:~$ sudo salt --output=txt -C 'G@cluster:mysql and G@mysql_group:core' cmd.run 'mysql -BN --skip-ssl -e "SELEC...
[17:17:48] 10DBA, 06Operations, 13Patch-For-Review: Set up TLS for MariaDB replication - https://phabricator.wikimedia.org/T111654#3009812 (10jcrespo) Enabled everywhere except on db1034 and db2057, which probably require a package upgrade. ``` $ sudo salt --output=txt -C 'G@cluster:mysql and G@mysql_group:core' cmd....
[17:25:48] 10DBA, 06Operations, 13Patch-For-Review: Set up TLS for MariaDB replication - https://phabricator.wikimedia.org/T111654#3009867 (10Marostegui) >>! In T111654#3009760, @jcrespo wrote: > TLS is now deployed on all core servers: Congratulations, that was a massive and tedious effort.
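The "remember the \_" reminder earlier refers to a MySQL GRANT gotcha: the database name in a grant is a pattern, where `_` matches any single character and `%` matches any run of characters, so a grant meant for one database (e.g. `meta_p`) must escape the pattern characters or it silently covers other names too. A small illustrative helper, assuming plain backslash escaping as MySQL expects:

```python
# Escape MySQL GRANT database-name pattern characters so the grant
# applies only to the literal database name. Database names below are
# illustrative (meta_p is the one mentioned in the conversation).
def grant_db_pattern(dbname):
    return dbname.replace("_", r"\_").replace("%", r"\%")


print("GRANT SELECT ON `{}`.* TO 'labsdbuser';".format(grant_db_pattern("meta_p")))
# Without the escape, `meta_p`.* would also match e.g. a "metaxp" database.
```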
[17:32:41] 10DBA, 10MediaWiki-Database, 13Patch-For-Review, 07PostgreSQL, 07Schema-change: Some tables lack unique or primary keys, may allow confusing duplicate data - https://phabricator.wikimedia.org/T17441#3009878 (10Krinkle)
[17:34:15] 10DBA, 10MediaWiki-Database, 13Patch-For-Review, 07PostgreSQL, 07Schema-change: Some tables lack unique or primary keys, may allow confusing duplicate data - https://phabricator.wikimedia.org/T17441#197618 (10Krinkle)
[17:34:48] 10DBA, 10MediaWiki-Database, 13Patch-For-Review, 07PostgreSQL, 07Schema-change: Some tables lack unique or primary keys, may allow confusing duplicate data - https://phabricator.wikimedia.org/T17441#3009907 (10Krinkle)
[18:13:36] 10DBA, 06Labs, 10Labs-Infrastructure, 10Tool-Labs: Make (redacted) log_search table available on ToolLabs - https://phabricator.wikimedia.org/T85756#3010092 (10zhuyifei1999)
[19:07:52] 10DBA, 06Operations, 13Patch-For-Review: Set up TLS for MariaDB replication - https://phabricator.wikimedia.org/T111654#3010354 (10jcrespo) db1034 is left, pending the reimage marked above^. Of the non core hosts, only the following are left: db1020.eqiad.wmnet: NULL - **m2 master** db1009.eqiad.wmnet:...
[22:13:04] Hello?
[22:13:04] Is anyone here
[22:13:15] Yes
[22:13:24] There's 42 people here
[22:13:52] RainbowSprinkles: sorry. My bouncer is stuck. I sent those messages a while ago.
[22:14:01] :)
[22:14:17] And looks like it's working again
[23:58:36] 10DBA, 07Epic, 13Patch-For-Review: Decouple roles from mariadb.pp into their own file - https://phabricator.wikimedia.org/T150850#3011669 (10Dzahn) I have added 2 special comment lines to make it ignore the lint-warnings for the remaining ones until this is resolved - https://gerrit.wikimedia.org/r/#/c/33673...