[04:49:05] DBA, Epic, Patch-For-Review: Upgrade WMF database-and-backup-related hosts to buster - https://phabricator.wikimedia.org/T250666 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin2001.codfw.wmnet for hosts: ` ['db2108.codfw.wmnet'] ` The log can be found in `/var/log/wmf...
[05:29:49] DBA, Epic: Upgrade WMF database-and-backup-related hosts to buster - https://phabricator.wikimedia.org/T250666 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db2108.codfw.wmnet'] ` and were **ALL** successful.
[05:45:16] DBA, Epic, Patch-For-Review: Upgrade WMF database-and-backup-related hosts to buster - https://phabricator.wikimedia.org/T250666 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin2001.codfw.wmnet for hosts: ` ['db2111.codfw.wmnet', 'db2075.codfw.wmnet'] ` The log can be...
[05:56:05] DBA, Core Platform Team: text table still has old_* fields and indexes on some hosts - https://phabricator.wikimedia.org/T250066 (Marostegui) Only s3 master pending, I will do it on Monday as for s3 master, any schema change even if it is a small one generates quite a lot of IO (as it has 900 wikis) so it...
[06:09:29] DBA, Epic, Patch-For-Review: Upgrade WMF database-and-backup-related hosts to buster - https://phabricator.wikimedia.org/T250666 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db2111.codfw.wmnet', 'db2075.codfw.wmnet'] ` and were **ALL** successful.
[06:21:00] DBA, Patch-For-Review: Upgrade m1 to Buster and Mariadb 10.4 - https://phabricator.wikimedia.org/T254556 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin2001.codfw.wmnet for hosts: ` ['db2132.codfw.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/202006190620_m...
[06:44:38] DBA, Patch-For-Review: Upgrade m1 to Buster and Mariadb 10.4 - https://phabricator.wikimedia.org/T254556 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db2132.codfw.wmnet'] ` and were **ALL** successful.
[06:50:58] DBA, Patch-For-Review: Upgrade m1 to Buster and Mariadb 10.4 - https://phabricator.wikimedia.org/T254556 (Marostegui)
[08:05:58] DBA, Operations: refactor mariadb puppet code to have single mapping of multiinstance section to port numbers - https://phabricator.wikimedia.org/T255849 (Kormat)
[08:06:04] DBA, Operations: refactor mariadb puppet code to have single mapping of multiinstance section to port numbers - https://phabricator.wikimedia.org/T255849 (Kormat) p:Triage→Medium
[08:44:30] DBA: Switchover es5 master from es1023 to es1024 - https://phabricator.wikimedia.org/T255755 (Marostegui) As per the chat yesterday on -databases, I am going to assume we cannot do it and we need to go via MW to disable es5 momentarily, make everything write to es4, do the switchover and then revert that cha...
[09:00:21] DBA: Switchover es5 master from es1023 to es1024 - https://phabricator.wikimedia.org/T255755 (Marostegui)
[09:00:33] DBA: Switchover es5 master from es1023 to es1024 - https://phabricator.wikimedia.org/T255755 (Marostegui)
[09:16:06] jynus: hey. you mentioned that `charset="utf8mb4"` for pymysql was a bad idea. i've been doing a bit of digging into this, and i'm not sure what to do about it
[09:16:16] if i don't set a charset, the default _is_ utf8
[09:16:23] it's not possible to use `charset="binary"`
[09:18:27] let me see
[09:21:13] so it looks like that is the "query charset", which is not very relevant
[09:21:48] you will get binary data no matter what you configure there if you query binary data, so not really a relevant comment from me
[09:22:23] ahh, ok.
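(Editor's sketch on the charset exchange above: if binary columns come back as bytes whatever connection charset is configured, the calling code has to cope with mixed text and non-text values itself. The helper names `to_text` and `normalize_row` are hypothetical and not part of pymysql; rows are assumed to arrive as tuples that may mix `str`, `bytes`, numbers and `None`.)

```python
def to_text(value, encoding="utf-8"):
    """Decode bytes-like values when possible; return anything else unchanged.

    Hypothetical helper, not a pymysql API. Bytes that are not valid in the
    given encoding are returned as bytes so the caller can treat them as
    opaque binary data instead of failing.
    """
    if isinstance(value, (bytes, bytearray)):
        try:
            return bytes(value).decode(encoding)
        except UnicodeDecodeError:
            return bytes(value)
    return value


def normalize_row(row):
    """Apply to_text to every column of one result row (assumed to be a tuple)."""
    return tuple(to_text(column) for column in row)
```

(Whether undecodable values should be kept as bytes, as here, or rejected outright depends on the caller; the log does not say which behaviour was chosen.)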
[09:22:57] the warning is relevant, but not for that piece of code
[09:23:03] +1
[09:23:10] but later for when data is retrieved
[09:23:29] that you will have to handle, independently of that config, getting non-text strings
[09:25:56] marostegui: we have a first version of the transfer.py documentation
[09:26:08] I saw some a few days ago I think
[09:26:12] we will put it on doc.wikimedia.org and then will ask for feedback
[09:26:13] On one of the CR urls
[09:26:16] nice
[09:26:32] as I am too "in" to realize some mistakes
[10:22:48] ;]
[10:23:43] jynus: shouldn't i just deploy the CI change to publish the transferpy doc ?
[10:23:58] if you are happy with it, please do
[10:24:00] or do you explicitly want to wait for privacybatm ?
[10:24:12] we can later ask for changes, but they are likely to be sent to our repo, not ci
[10:24:19] yeah
[10:24:20] as in, doc changes
[10:24:24] lets break CI
[10:24:31] hopefully not
[10:24:50] I tested doc building and it worked flawlessly locally
[10:25:05] but you never now depending on os versions or whatever
[10:25:09] *know
[10:26:16] Job transferpy-tox-publish not defined
[10:26:38] race condition or real?
[10:28:13] I forgot to deploy the new jobs! - what should I have done?
[10:31:28] nothing ;d
[10:31:46] it's like merging a puppet patch and forgetting to run 'puppet-merge'
[10:32:01] ok
[10:32:35] Published at https://doc.wikimedia.org/transferpy/master/
[10:32:38] it worked!
[10:32:41] nice
[10:33:04] how many do I owe you already?
[10:33:08] DBA, Patch-For-Review: Add more information to --help option of transfer.py - https://phabricator.wikimedia.org/T253219 (hashar) * Published at https://doc.wikimedia.org/transferpy/master/ \o/
[10:34:00] DBA, Patch-For-Review: Add more information to --help option of transfer.py - https://phabricator.wikimedia.org/T253219 (jcrespo) yay: https://doc.wikimedia.org/transferpy/master/
[10:35:04] this is relevant also for kormat, as https://doc.wikimedia.org/transferpy/master/transferpy/transferpy.html#module-transferpy.MariaDB is even a new class for mariadb abstraction
[10:35:34] the beauty is that I don't even know what that transferpy thing is about ;]
[10:35:41] he he
[10:35:57] it is what makes backups work for mysql
[10:36:26] oh. I just cp -a the innodb file that seems to work for me
[10:37:18] more seriously, I should look at having the Pamiko doc included
[10:37:26] Paramiko
[10:37:45] those other things, paramiko, ssh, etc. we don't really use them
[10:38:03] after wmf sres decided to go with cumin
[10:38:27] think of mw on oracle- technically it works, but there is very little usage
[10:42:17] DBA, Patch-For-Review: Add more information to --help option of transfer.py - https://phabricator.wikimedia.org/T253219 (jcrespo) We can close this, but let's remember to keep the help up-to-date with the new features implemented, as well as everything that is currently missing as it has not yet been ful...
[10:43:48] well that was easy
[10:44:17] well, unlike me, my student is a good developer!
[10:44:40] and he is doing a fantastic job
[10:46:13] ^ marostegui: obviously not urgent, but we would like your feedback to make the docs more useful
[10:46:27] ah, I will take a look!
[10:46:49] feel free to file tickets to him directly
[10:47:34] I think we will move basic usage and api doc there and keep on wiki only the wmf installation and examples
[10:48:09] DBA, Patch-For-Review: Add more information to --help option of transfer.py - https://phabricator.wikimedia.org/T253219 (Privacybatm) :-D :-D \o/ >>! In T253219#6237417, @jcrespo wrote: > We can close this, but let's remember to keep the help up-to-date with the new features implemented, as well as eve...
[10:48:39] DBA, Patch-For-Review: Add more information to --help option of transfer.py - https://phabricator.wikimedia.org/T253219 (jcrespo) Open→Resolved
[10:48:43] DBA, Google-Summer-of-Code (2020), Patch-For-Review: GSoC 2020 Proposal: Improve the framework to transfer files over the LAN - https://phabricator.wikimedia.org/T248256 (jcrespo)
[10:49:48] DBA, Patch-For-Review: Switchover es5 master from es1023 to es1024 - https://phabricator.wikimedia.org/T255755 (Marostegui)
[10:53:14] jynus: I have just fixed it https://gerrit.wikimedia.org/r/#/c/606664/
[10:53:20] we can mock some modules!
[10:53:52] oh, cool
[10:54:19] stackoverflow is magic
[10:54:24] enter a problem. Get a solution.
[10:54:28] +1 but I will wait for him to ack it as it is on his repo
[10:54:34] yeah ;-]
[10:55:36] I don't see a problem to install the dep for doc generation purposes, it doesn't have to be a runtime dep, can be a test/doc-generation dep
[10:57:45] https://www.youtube.com/watch?v=ussCHoQttyQ
[10:58:36] lol
[10:59:11] did you look at the like/dislike ratio of that video?
[10:59:34] wow 1:!
[10:59:35] 1:1
[10:59:48] soo funny
[10:59:56] they have no strong feelings either
[11:40:14] Hey, did anything happen this morning on s8?
https://logstash.wikimedia.org/goto/6609c2c11d7f61670e56f15e6f315fcb
[11:40:32] the code is reaching paths that are basically impossible now
[11:42:02] Amir1: not that I know of
[11:42:25] I mean, we are not doing any maintenance there
[11:42:59] we haven't deployed anything this morning either. Suddenly 100 cases an hour are failing
[11:43:42] nothing obvious here either https://grafana.wikimedia.org/d/000000278/mysql-aggregated?orgId=1&from=now-24h&to=now
[11:45:32] Actually the rows written has elevated around the same time the errors went kaboom https://grafana.wikimedia.org/d/000000278/mysql-aggregated?panelId=7&fullscreen&orgId=1&from=now-24h&to=now
[11:45:34] but it's not s8
[11:45:52] it is an alter table on s1
[11:45:56] on a host that is depooled
[11:46:06] db1134
[11:46:46] https://grafana.wikimedia.org/d/000000273/mysql?panelId=3&fullscreen&orgId=1&from=now-24h&to=now&var-dc=eqiad%20prometheus%2Fops&var-server=db1134&var-port=9104
[11:47:34] oh okay, thanks
[11:49:17] Nothing super obvious here either: https://logstash.wikimedia.org/goto/9a7fabf1339d7511fb45647f7eedac14
[11:50:05] DBA, Epic, Patch-For-Review: Upgrade WMF database-and-backup-related hosts to buster - https://phabricator.wikimedia.org/T250666 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin2001.codfw.wmnet for hosts: ` ['db2116.codfw.wmnet', 'db2119.codfw.wmnet', 'db2130.codfw.wmn...
[11:52:03] Amir1: Why do you suspect it is a DB issue?
[11:53:11] it's the term store code and it happens on race conditions (when a row is being read, can't find it, can't insert it either, can't read it again too even outside of scope repeatable read)
[11:54:20] Amir1: I am checking all s8 hosts and they are looking ok for now
[11:54:31] Amir1: can you give me a hostname or IP?
[11:54:49] sure
[11:55:02] I have a feeling that it might be opcache corruption
[11:55:33] https://logstash.wikimedia.org/goto/ab3489acf83a5695d656dbcc87ae7393
[11:56:40] 10.64.48.219 (mw1377) at 2020-06-19T11:54:48+00:00
[11:56:56] maybe bin log would help? They are reading master AFAIK
[11:58:04] it happens only on 20 mw nodes, are we in the middle of a migration of some sort for mw nodes?
[11:58:24] to buster, envoy, etc.
[11:58:59] Amir1: I don't know about that, maybe in -sre someone from serviceops knows or maybe akosiaris
[11:59:07] sure
[11:59:13] Amir1: I can check binlogs, but you'd need to tell me what to look for :)
[12:00:35] anything on wbt_* tables on this timestamp: 2020-06-19T11:54:48 (plus minus ten seconds?) on master of s8
[12:00:55] ok, let me see
[12:00:59] especially if you can filter out by client IP
[12:01:34] no, I cannot do that
[12:02:01] INSERT /* Wikibase\Lib\Store\Sql\Terms\DatabaseItemTermStoreWriter::acquireAndInsertTerms */ IGNORE INTO `wbt_item_terms` (wbit_item_id,wbit_term_in_lang_id) VALUES (80160131,'471473560') that for instance?
[12:02:49] wait, that is the wrong timestamp
[12:05:57] Amir1:
[12:06:09] #200619 11:54:48
[12:06:09] INSERT /* Wikibase\Lib\Store\Sql\Terms\Util\ReplicaMasterAwareRecordIdsAcquirer::insertNonExistingRecordsIntoMaster */ IGNORE INTO `wbt_term_in_lang` (wbtl_text_in_lang_id,wbtl_type_id) VALUES ('500003426','1')
[12:06:12] Amir1: not that I know of
[12:06:55] marostegui: yup, that's it, thanks. I'll check
[12:06:58] akosiaris: Thanks!
[12:19:55] DBA, Epic: Upgrade WMF database-and-backup-related hosts to buster - https://phabricator.wikimedia.org/T250666 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db2116.codfw.wmnet', 'db2119.codfw.wmnet', 'db2130.codfw.wmnet'] ` and were **ALL** successful.
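(Editor's sketch of the binlog search requested above: statements touching `wbt_*` tables within ten seconds of a given timestamp. It assumes the binlog has already been decoded to text, that each event is preceded by a header line starting with `#YYMMDD HH:MM:SS` as in the excerpt pasted at 12:06:09, and that each statement sits on a single line. The function name `find_statements` and its parameters are hypothetical; this is not the command that was actually run.)

```python
import re
from datetime import datetime, timedelta

# Assumed header shape, taken from the pasted excerpt: "#200619 11:54:48".
HEADER = re.compile(r"^#(\d{6}) +(\d{1,2}:\d{2}:\d{2})")


def find_statements(lines, center, window_seconds=10, table_prefix="wbt_"):
    """Yield (timestamp, statement) for statements near `center`.

    A statement is reported when the most recent header timestamp is within
    `window_seconds` of `center` and the line names a backtick-quoted table
    whose name starts with `table_prefix`.
    """
    window = timedelta(seconds=window_seconds)
    current = None
    for raw in lines:
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            current = datetime.strptime(" ".join(match.groups()), "%y%m%d %H:%M:%S")
            continue
        # Skip blank lines, other comment lines, and anything before the first header.
        if current is None or not line or line.startswith("#"):
            continue
        if abs(current - center) <= window and f"`{table_prefix}" in line:
            yield current, line
```

(Usage would look like `list(find_statements(open(path), datetime(2020, 6, 19, 11, 54, 48)))`, where `path` points at the decoded text. Filtering by client IP is not attempted, matching the "no, I cannot do that" reply above.)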
[12:52:06] DBA, Patch-For-Review: Switchover es5 master from es1023 to es1024 - https://phabricator.wikimedia.org/T255755 (CDanis) Looks like you already figured this out, but just commenting to confirm: a config deploy editing `$wgDefaultExternalStore` and `'templateOverridesByCluster' =>` in db-eqiad.php will be...
[12:55:45] DBA: Add more information to --help option of transfer.py - https://phabricator.wikimedia.org/T253219 (hashar) Something I forgot, although the documentation is now published at https://doc.wikimedia.org/transferpy/ , it is not listed on the main page: https://doc.wikimedia.org/ An entry can be added by edi...
[12:56:36] DBA, Patch-For-Review: Switchover es5 master from es1023 to es1024 - https://phabricator.wikimedia.org/T255755 (Marostegui) >>! In T255755#6237692, @CDanis wrote: > Looks like you already figured this out, but just commenting to confirm: a config deploy editing `$wgDefaultExternalStore` and `'templateOve...
[13:01:55] DBA: Add more information to --help option of transfer.py - https://phabricator.wikimedia.org/T253219 (jcrespo) I will have a look.
[14:17:34] marostegui: I am not sure, but that previous change involved swapping out a replica, not a primary. i'm not sure what happens to the lag reported by the replicas when the primary is being switched but i could imagine that having impacts at the db level if you don't set `is static`
[14:46:54] did you see the root spam by prometheus-puppet-agent-stats on db servers?
[14:47:07] does anyone know where that comes from?
[14:51:00] I am guessing buster, will create a ticket
[15:16:22] jynus: those are "old" probably when they got reimaged
[15:16:29] haven't happened for a few hours now
[15:16:37] they got reimaged today around that time
[15:17:48] ok, then I will wait, maybe they will go away
[15:20:47] the last email was from two hours ago
[15:22:07] don't worry, I thought it was more frequent
[17:08:30] DBA, MediaWiki-extensions-Linter, Patch-For-Review: Display count of remaining content space errors - https://phabricator.wikimedia.org/T173943 (Jonesey95) The maintainer of Fireflytools has abandoned updates. It would be useful for this tool to be maintained centrally.
[19:18:39] DBA, Gerrit, Patch-For-Review: Make sure both `reviewdb-test` (used forgerrit upgrade testing) and `reviewdb` (formerly production) databases get torn down - https://phabricator.wikimedia.org/T255715 (Dzahn)