[00:14:03] 10DBA, 10wikitech.wikimedia.org: Rename database 'labswiki' to 'wikitechwiki' - https://phabricator.wikimedia.org/T171570 (10bd808) 05Stalled→03Declined The amount of work needed to track down and change all references to the labswiki name are not worth the pain, especially if the main reason is getting ri...
[06:12:44] 10DBA, 10Operations, 10ops-codfw: (codfw):rack/setup/install db213[2-5] - https://phabricator.wikimedia.org/T237702 (10Marostegui) >>! In T237702#5647039, @jcrespo wrote:
> @Papaul Yes, the rack proposal seems ok.
> @Marostegui Let's consider installing buster on new hosts starting now, even if that means in...
[06:33:59] 10DBA, 10Operations, 10ops-codfw: Upgrade db2072 firmware and bios - https://phabricator.wikimedia.org/T237905 (10Marostegui)
[06:34:12] 10DBA, 10Operations, 10ops-codfw: Upgrade db2072 firmware and bios - https://phabricator.wikimedia.org/T237905 (10Marostegui) p:05Triage→03Normal
[06:45:04] 10DBA, 10Schema-change: Remove globalblocks tables from wikis - https://phabricator.wikimedia.org/T230055 (10Marostegui) 05Open→03Resolved Thanks for noticing: `
root@db1123.eqiad.wmnet[napwikisource]> select count(*) from globalblocks;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 ro...
[06:45:10] 10DBA, 10Epic, 10Tracking-Neverending: Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking) - https://phabricator.wikimedia.org/T54921 (10Marostegui)
[06:51:56] 10DBA, 10Data-Services, 10cloud-services-team (Kanban): Prepare and check storage layer for mnwwiki - https://phabricator.wikimedia.org/T235743 (10Marostegui) The steps that are needed before running the views script:
* Create the `_p` database on labsdb1009-labsdb1012
* Add the database to the user role gra...
[07:28:39] 10DBA, 10Operations: Decommission db2043-db2069 - https://phabricator.wikimedia.org/T228258 (10Marostegui) >>!
In T228258#5614366, @jcrespo wrote:
> I noticed db2062 isn't set on m1, is that on purpose because it is going to be decommissioned? Or because we didn't want it to alert? Or something else? CC @Maros...
[08:08:42] 10DBA, 10Operations: Decommission db2048.codfw.wmnet - https://phabricator.wikimedia.org/T237913 (10Marostegui)
[08:09:01] 10DBA, 10Operations: Decommission db2048.codfw.wmnet - https://phabricator.wikimedia.org/T237913 (10Marostegui) p:05Triage→03Normal
[08:29:02] 10DBA, 10Operations: Decommission db2048.codfw.wmnet - https://phabricator.wikimedia.org/T237913 (10Marostegui)
[08:39:02] 10DBA, 10wikitech.wikimedia.org: Move databases for wikitech (labswiki) and labstestwiki to a main cluster section (s5?) - https://phabricator.wikimedia.org/T167973 (10jcrespo) 05Open→03Stalled Just to clarify, does that mean there are blockers to move wikitech to production mw application servers, but the...
[08:39:22] 10DBA, 10wikitech.wikimedia.org: Move databases for wikitech (labswiki) and labstestwiki to a main cluster section (s5?) - https://phabricator.wikimedia.org/T167973 (10jcrespo) 05Stalled→03Open
[08:39:44] 10DBA, 10Wikidata, 10Patch-For-Review, 10User-Ladsgroup: Populate term_full_entity_id on www.wikidata.org - https://phabricator.wikimedia.org/T171460 (10Lea_Lacroix_WMDE)
[08:43:48] 10DBA, 10Operations, 10ops-codfw: (codfw):rack/setup/install db213[2-5] - https://phabricator.wikimedia.org/T237702 (10jcrespo) I really meant 10.1, as a stop-gap measure until a final decision for database on buster is done, but to prevent a second reimage later on (upgrading just the package or copying it...
[08:45:15] 10DBA, 10Operations, 10ops-codfw: (codfw):rack/setup/install db213[2-5] - https://phabricator.wikimedia.org/T237702 (10Marostegui) >>! In T237702#5652295, @jcrespo wrote:
> I really meant 10.1, as a stop-gap measure until a final decision for database on buster is done, but to prevent a second reimage later...
[08:47:43] 10DBA, 10Operations: Decommission db2043-db2069 - https://phabricator.wikimedia.org/T228258 (10jcrespo) What about db1135, then (probably others)? Is it on purpose or WIP, or something else? I didn't want to break anything, sorry.
[08:49:50] 10DBA, 10Operations, 10ops-codfw: (codfw):rack/setup/install db213[2-5] - https://phabricator.wikimedia.org/T237702 (10Marostegui) @Papaul remember that I already included those hosts into their correct partman recipe and spare role for now: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/543748/...
[08:50:02] 10DBA, 10Operations, 10ops-codfw: (codfw):rack/setup/install db213[2-5] - https://phabricator.wikimedia.org/T237702 (10jcrespo) So, any package would be cool to me- I just would like to spare you the extra work of an extra reimage.
[08:50:51] jynus: morning! ^ so what's the default package now if we install buster? 101 or 103? do you remember off the top of your head?
[08:51:58] I don't really know
[08:52:03] class role::mariadb::core {
[08:52:03] if os_version('debian >= buster') {
[08:52:03] $default_package = 'wmf-mariadb103'
[08:52:03] } else {
[08:52:12] but it can be set up on hiera too
[08:52:21] e.g. with a package= variable, I think
[08:52:43] let me see
[08:53:00] remember those hosts will go to misc
[08:53:16] I am not sure we have 10.1 packages or 10.3 packages for buster up to date
[08:53:48] did you see my email about the meeting?
[08:53:51] I think we only have 10.3 for buster, but I am not sure
[08:54:16] Oh, just saw it
[08:54:25] let's do 15:15?
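The role snippet pasted above cuts off at the `else` branch. A minimal sketch of the full selection logic it implies might look like this; note that the `else` body and the `mariadb::packages_wmf` class name are assumptions (the latter guessed from the `package_wmf.pp` mention later in the log), not the real production manifest:

```puppet
# Hedged reconstruction of the package-selection logic discussed above.
# Only the buster branch is confirmed by the pasted snippet; the rest
# is assumed for illustration.
class role::mariadb::core {
    if os_version('debian >= buster') {
        $default_package = 'wmf-mariadb103'
    } else {
        # Assumed fallback: the conversation says stretch hosts run 10.1.
        $default_package = 'wmf-mariadb101'
    }

    # Assumed consumer of the variable (class name is hypothetical):
    class { 'mariadb::packages_wmf':
        package => $default_package,
    }
}
```

This is the pattern the chat then works around: because `$default_package` is computed from the OS version, installing 10.1 on a buster host requires either a hiera override or an edit to this selection logic.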
[08:54:29] well, it would be easy to create new packages for 10.3 or 10.1 or even other options
[08:54:50] I can do it with whatever you think is best
[08:54:57] let's do 15:15
[08:55:12] we even have percona-server 8.0 running right now on db1114
[08:55:18] oh nice :)
[08:55:41] with protocolx, myrocks, and replicating enwiki for a week
[08:56:15] marostegui: https://grafana.wikimedia.org/d/000000273/mysql?var-dc=eqiad%20prometheus%2Fops&var-server=db1114
[08:56:17] from what I can see on my lab, I only did 10.3 for buster
[08:56:42] so the proposal of 10.1 was just because it may be easier, idk
[08:56:50] I don't care about the package
[08:57:01] it is just that I don't want to reimage new servers twice
[08:57:10] because of the package
[08:57:28] yep, it makes sense to go to buster
[08:57:42] at this point, I may even suggest 10.4 as a goal if we stay on mariadb
[08:57:48] hehe
[08:57:50] because support for older versions
[08:57:55] is getting worse and worse
[08:57:58] let's do 10.1 packages for buster then?
[08:57:59] we can talk about it later
[08:58:04] yeah
[08:58:11] do you want me to help, or will you do it?
[08:58:21] if you do it, let me update the repo
[08:58:30] with the latest 10.1.42 changes
[08:58:52] if you can help with that, that'd be good; if not, I can do it later this week or next week (depending on how long it takes to catch up after the holidays)
[08:58:56] so I know this is far from ideal
[08:59:02] I do have the lab for buster+10.3 packages
[08:59:18] and you have stretch+10.1 XDDD
[08:59:32] but I am just getting worried about the larger and larger back catalogue
[09:00:06] which under normal circumstances we should have started working on months ago :-S
[09:00:38] so trying to minimize technical debt, I hope you understand
[09:00:55] if we find out that we don't have time for buster + 10.1 now or before the end of the Q, we can just install those 4 hosts (they are only 4) with stretch+10.1 to get that out of the way and then revisit the package on buster once we are out of the Q
[09:01:11] up to you
[09:01:16] I would understand that
[09:01:43] Let's try to have 10.1 for buster, and if not, fall back to stretch again (they are only 4 hosts)
[09:01:45] but I hope you understand my concern about technical debt too :-D
[09:01:51] of course I do :)
[09:02:04] I can have buster 10.1 packages easily, I think
[09:02:05] I am just trying to think of a fallback in case we run into issues when creating 10.1 for buster
[09:02:15] if not, I'll report back
[09:02:17] if you do, that'd be great :)
[09:02:20] thanks :*
[09:02:45] and reassess
[09:02:50] Is there anything urgent I need to check/review/tackle? Or can I go back to reading emails?
[09:03:02] nothing urgent, unless you woke up to red alerts
[09:03:05] but
[09:03:17] there are a few key decisions I stalled for your ok
[09:03:34] please look at the SRE weekly meeting for those
[09:03:43] ok!
[09:04:04] stalled as in "I think X, but because you are here next week, let's wait for manuel, ok"
[09:04:28] https://phabricator.wikimedia.org/T228258#5652306 what's the issue with db1135?
[09:04:40] it has no section on hiera
[09:04:49] so no replication checks
[09:04:53] I think
[09:05:01] it is the m1 master
[09:05:14] let me see
[09:05:24] I know something is weird, but I may not remember how
[09:05:29] sure :)
[09:05:46] yes, it lacks m1 replication checks
[09:06:03] I think more important on the replica
[09:06:16] maybe it is on purpose, but the multi-instance one has it
[09:06:30] but that host is the m1 master and not multi-instance
[09:06:42] https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/534386/
[09:06:44] https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=Lag%3A+m1
[09:06:50] ^ with this you will understand
[09:06:59] https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/537316/1/hieradata/hosts/db1063.yaml
[09:07:22] there may be a puppet bug
[09:08:07] I would expect the check "MariaDB Slave Lag: m1" to be on all m1 hosts (or on none)
[09:08:30] probably on all, even with a large lag threshold and no paging
[09:08:34] We've never had lag checks on misc hosts (no idea why)
[09:08:46] ah, then not on purpose by me
[09:08:46] So that probably needs to be fixed
[09:08:57] so, let's file a ticket
[09:09:00] low prio
[09:09:03] I thought it was done on purpose (as it was always like that)
[09:09:05] yep
[09:09:07] but I got confused with the decom hosts
[09:09:21] interesting, the multi-instance hosts do have the check haha
[09:09:23] because I thought it was removed from those on purpose
[09:09:29] see, it is weird
[09:09:30] ?
[09:09:32] yeah
[09:10:24] I will create a task
[09:11:10] m3 is correct
[09:11:23] see db2065, which is not multisource
[09:11:42] so weird
[09:12:22] so it could be that something was lost on upgrade? or maybe it was always wrong?
[09:12:31] I think it was always wrong, check this
[09:12:32] I just pointed out the thing, which confused me a bit
[09:12:44] https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/537316/ -> db1063 (old m1 master)
[09:13:05] but why is db2065 ok?
[09:13:07] https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/518458/ -> new m1 master
[09:13:43] they have different roles (db2062 and db2065)
[09:13:50] ah
[09:13:53] so that's it
[09:13:57] node 'db2065.codfw.wmnet' {
[09:13:57] it got lost on role change
[09:13:57] role(mariadb::misc::phabricator)
[09:13:57] }
[09:14:05] node 'db2062.codfw.wmnet' {
[09:14:05] class { '::role::mariadb::misc':
[09:14:18] Yeah, but the roles have not been touched for years I think
[09:14:26] so I think this has been like that for years
[09:14:37] sure, I am not telling you!
[09:14:40] :-D
[09:14:45] 10DBA: Add replication lag (and other checks) to misc all hosts - https://phabricator.wikimedia.org/T237927 (10Marostegui)
[09:14:58] 10DBA: Add replication lag (and other checks) to misc all hosts - https://phabricator.wikimedia.org/T237927 (10Marostegui) p:05Triage→03Normal
[09:15:03] it just needs some review with the new roles
[09:17:03] yeah, phabricator.pp is quite broken
[09:17:13] yeah, misc roles need lots of love
[09:17:18] probably misc too
[09:17:45] just sent you the new invitation for today's meeting
[09:18:07] ok, accepted
[09:18:14] <3
[09:18:26] read the SRE meeting notes for past meetings
[09:18:31] yep
[09:18:33] will do!
[09:22:20] apparently, they just released 10.1.43
[09:22:35] oh, that was fast
[09:22:42] they released .42 past the 25th, no?
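The node definitions pasted above explain the missing lag check: db2065 is declared via a role, while db2062 declares the misc class directly. A hypothetical side-by-side sketch (parameter names are assumptions; the monitoring wiring is inferred from the conversation, not from the real manifests):

```puppet
# Illustration only: per the discussion, the role() path is what ends
# up attaching the "MariaDB Slave Lag" Icinga check, so db2065 has it.
node 'db2065.codfw.wmnet' {
    role(mariadb::misc::phabricator)
}

# db2062 bypasses the role and declares the class directly, so the
# check is never attached. The shard parameter here is an assumption.
node 'db2062.codfw.wmnet' {
    class { '::role::mariadb::misc':
        shard => 'm1',
    }
}
```

This is why the check was "lost on role change" rather than removed on purpose: any host still using the bare class declaration silently misses whatever monitoring the role later gained.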
[09:23:01] no, 3 days before .42
[09:23:06] no, 3 days before .43 I mean
[09:23:23] wow
[09:24:27] The big download button now leads you to mariadb.com
[09:24:39] And in small size, "Alternate download from mariadb.org"
[09:24:51] hehe so they changed it :)
[09:25:24] Ah, I see they finally released .42 on the 5th instead of the 25th Oct as they previously planned
[09:25:27] and .43 on the 8th
[09:25:31] weird
[09:29:07] so 10.1 doesn't compile on buster because it doesn't have openssl 1.1 support
[09:29:23] there is openssl 1.0 on buster, but not with development libraries
[09:30:30] we could test the stretch package, I believe it worked
[09:33:53] not with development libraries? that's weird, no?
[09:34:43] it is supposed to be kinda deprecated on buster
[09:34:53] aaah
[09:34:56] I see
[09:38:27] buster only has OpenSSL 1.1, there's no 1.0.x anymore
[09:38:49] ah, then it may be a wikimedia backport
[09:38:52] or something else
[09:38:56] stretch has both in separate source packages (https://packages.qa.debian.org/o/openssl1.0.html and https://packages.qa.debian.org/o/openssl.html)
[09:39:43] marostegui: so up to you, I am compiling 10.1.43 for stretch in any case
[09:40:16] jynus: cool, I can try it for buster on those hosts
[09:40:47] I mean, being misc hosts on codfw, I think there is room for experimentation
[09:41:13] unlike mw, which requires tighter versions
[09:41:15] yeah
[09:44:46] I am trying to see if there is a hiera flag for package: wmf-mariadb101 or something
[09:45:09] to force a buster host to install 101 instead of having to change the package_wmf.pp line for it
[09:45:40] maybe it only exists on core hosts?
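The per-host hiera flag being looked for here would, if it exists, look something like the following sketch. The key name and its support on misc roles are assumptions; the chat only suggests such a parameter may exist for core hosts, and the earlier db1063.yaml gerrit link shows per-host yaml files are the mechanism used:

```yaml
# hieradata/hosts/db2132.yaml — hypothetical override, to be verified
# with the puppet compiler as suggested below: pin a buster host to
# the 10.1 package instead of the default 10.3.
mariadb::package: 'wmf-mariadb101'
```

If the lookup key turns out to exist only in the core role, the alternative discussed is editing the package selection in the manifest itself, or falling back to stretch+10.1 for these four hosts.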
[09:45:50] yeah, I don't know
[09:47:06] it can be easily tested when setting up one of the hosts; we just need to try the line in the .yaml and see if it breaks :)
[09:47:33] well, we have a puppet compiler for that
[09:48:18] that's what I meant
[09:48:28] But I think it is only in core, from what I can quickly see
[09:49:26] but misc is so spread out that who knows XD
[10:18:42] ok for me to test 10.1.43 on db2102?
[10:18:59] (test-s1 host on codfw)
[10:20:43] go for it!
[10:44:57] https://grafana.wikimedia.org/d/000000273/mysql?var-dc=codfw%20prometheus%2Fops&var-server=db2102&orgId=1&panelId=39&fullscreen&var-port=9104&from=1573458290405&to=1573469090406
[10:45:12] :)
[10:45:22] leaving for some hours
[10:45:50] we'll talk later
[10:45:58] yep
[10:46:04] (hopefully not many hours)
[10:49:24] you may want to upgrade the source backups and dbprov hosts, too: https://github.com/MariaDB/server/commit/5164f8c206
[11:01:13] 10Blocked-on-schema-change, 10DBA, 10Wikidata: Schema change on production for increase the size of wbt_text_in_lang.wbxl_language - https://phabricator.wikimedia.org/T237120 (10Marostegui) I can try to start with this next week to make sure the deletion of wb_terms doesn't get (more) blocked. I have a huge...
[11:37:52] 10Blocked-on-schema-change, 10DBA, 10Wikidata: Schema change on production for increase the size of wbt_text_in_lang.wbxl_language - https://phabricator.wikimedia.org/T237120 (10Ladsgroup) >>! In T237120#5652812, @Marostegui wrote: > I can try to start with this next week to make sure the deletion of wb_term...
[11:43:30] btw, we reduced the size of PC (the fingerprint of wikidata in wikipedia PC entries: https://phabricator.wikimedia.org/T236749#5638907)
[11:55:36] 10DBA, 10Wikidata, 10Patch-For-Review, 10User-Ladsgroup: Populate term_full_entity_id on www.wikidata.org - https://phabricator.wikimedia.org/T171460 (10Ladsgroup)
[12:07:49] jynus: can I make the maintenance script faster?
[12:12:23] For when you have some time: https://gerrit.wikimedia.org/r/c/operations/puppet/+/550113
[13:02:21] hey folks, any idea about T237971 ?
[13:02:22] T237971: Cron /usr/local/bin/mwscript extensions/TorBlock/maintenance/loadExitNodes.php --wiki=labswiki --force > /dev/null - https://phabricator.wikimedia.org/T237971
[13:02:43] it seems labswiki is somehow confused about a DB in s10
[13:03:04] not sure if that's a simple hiera setting for that host
[13:03:28] andrew has been working on that server (and wiki) lately, but he is off today
[13:52:58] See my comment
[13:53:16] other than that, I don't know any more; andrew may have done some things on his side?
[13:53:31] Amir1: manuel or I will check it later
[13:53:50] thanks
[13:54:16] thanks jynus, yeah that feels related. I live-hacked the server to avoid the cronspam and andrew will hopefully follow up tomorrow
[14:00:13] arturo: No idea about that, but I guess it is related to https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/547596/ (even though it is not merged?)
[14:00:26] Ah, I see Jaime already commented :)
[14:00:34] yup
[14:01:15] I am sure andrew knows a lot more about where this could come from
[14:01:25] * arturo nods
[14:16:15] jynus: meeting?
[15:24:08] arturo, marostegui, jynus, for context about labtestwiki: It is currently in a very broken state, which will be resolved if/when James's patches are merged. I have made a bunch of in-place hacks to work around many (but not all) of the current issues.
[15:24:09] In parallel, Bryan and I opened a bunch of tasks about moving (production) wikitech to the prod wiki cluster, which may mean that we can get rid of labtestwiki entirely.
[15:24:46] I sort of understand the James patches but will need help getting them actually merged and applied
[15:25:19] 10DBA, 10wikitech.wikimedia.org: Move databases for wikitech (labswiki) and labstestwiki to a main cluster section (s5?) - https://phabricator.wikimedia.org/T167973 (10bd808) @jcrespo Once the next MediaWiki deployment train runs, Wikitech's OpenStackManager extension will no longer interact with OpenStack API...