[00:11:30] DBA, Analytics, MediaWiki-Database, Research, Wikidata: Improve interlingual links across wikis through Wikidata IDs - https://phabricator.wikimedia.org/T215616 (Tbayer) The [[https://www.mediawiki.org/wiki/Manual:Page_props_table |page_props table]] contains `wikibase_item` values for a give...
[00:16:54] DBA, Analytics, MediaWiki-Database, Research, Wikidata: Improve interlingual links across wikis through Wikidata IDs - https://phabricator.wikimedia.org/T215616 (diego) @Tbayer , great. Thanks.
[01:50:26] DBA, Wikimedia-Site-requests: Global rename of The_Photographer → Wilfredor: supervision needed - https://phabricator.wikimedia.org/T215107 (Wilfredor) Friday is ok to me, that way I'll have the weekend to fix something that needs to be changed
[01:55:41] DBA, CheckUser: Provide a strategy for testing the performance of queries needed to show the list of user-agents for each IP - https://phabricator.wikimedia.org/T212092 (Huji) Open→Resolved Thanks @jcrespo
[01:55:43] DBA, CheckUser, Patch-For-Review: The "show ip" action should also provide a distinct list of user-agents for each IP - https://phabricator.wikimedia.org/T170508 (Huji)
[01:56:50] DBA, CheckUser, Patch-For-Review: The "show ip" action should also provide a distinct list of user-agents for each IP - https://phabricator.wikimedia.org/T170508 (Huji) Open→Declined I am going to decline this Task, per the analysis done in T212092 An alternative strategy using APIs is discu...
[01:59:08] DBA, CheckUser, Patch-For-Review: Create index for cu_agents in cu_changes table - https://phabricator.wikimedia.org/T147894 (Huji) @jcrespo Given the analysis you already did on T212092#4934152 would you recommend that we go ahead with creating this index? (If I understand your comment there, the in...
[06:40:36] DBA, Operations: Predictive failures on disk S.M.A.R.T. status - https://phabricator.wikimedia.org/T208323 (Marostegui)
[07:05:14] DBA, Data-Services, Datasets-General-or-Unknown, User-notice: Archive and drop education program (ep_*) tables on all wikis - https://phabricator.wikimedia.org/T174802 (Marostegui)
[07:12:58] DBA, Data-Services, Datasets-General-or-Unknown, User-notice: Archive and drop education program (ep_*) tables on all wikis - https://phabricator.wikimedia.org/T174802 (Marostegui) I have renamed all the tables on enwiki on db1089: ` root@db1089.eqiad.wmnet[enwiki]> show tables like 'T174%'; +---...
[07:13:42] DBA, Data-Services, Datasets-General-or-Unknown, User-notice: Archive and drop education program (ep_*) tables on all wikis - https://phabricator.wikimedia.org/T174802 (Marostegui) a:Marostegui
[07:16:22] DBA, Data-Services, Datasets-General-or-Unknown, User-notice: Archive and drop education program (ep_*) tables on all wikis - https://phabricator.wikimedia.org/T174802 (Marostegui)
[07:28:52] DBA, Analytics, Analytics-Kanban, Patch-For-Review, User-Banyek: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 (elukey) @ayounsi modified a bit analytics-in4 term mysql-dbstore: ` [edit firewall family inet filter analytics-in4 t...
[08:07:59] Blocked-on-schema-change, MediaWiki-Change-tagging, Patch-For-Review, User-Ladsgroup: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713 (Marostegui)
[09:16:57] DBA, Analytics, MediaWiki-Database, Research, Wikidata: Improve interlingual links across wikis through Wikidata IDs - https://phabricator.wikimedia.org/T215616 (jcrespo) > the API works good for query specific pages/entities, not for example to know which pages that existing in X_wiki are mi...
[09:25:22] DBA, CheckUser, Patch-For-Review: Create index for cu_agents in cu_changes table - https://phabricator.wikimedia.org/T147894 (jcrespo) @Huji I would need to see the code changed, the proposals I gave at T212092 are different from the one shown at https://gerrit.wikimedia.org/r/#/c/mediawiki/extension...
[15:31:21] is one of the affected DB hosts with reboot issues straightforward to reboot? we could try installing the 144 kernel from the upcoming 9.9 stretch release to see whether it makes a difference
[15:31:45] let me check
[15:32:04] DBA, Operations, ops-eqiad, Patch-For-Review, User-Marostegui: rack/setup/install db11[26-38].eqiad.wmnet - https://phabricator.wikimedia.org/T211613 (Marostegui)
[15:32:11] moritzm: yeah, I can depool db2085 and we can use that one
[15:33:46] moritzm: I am depooling it now
[15:34:49] ack, pulling the kernel from s-p-u in the mean time
[15:34:56] thanks
[15:35:02] I will stop mysql once depooled and we can play around
[15:35:31] we might need to reboot a few times to see if it is solved or not
[15:35:40] ack
[15:41:05] moritzm: db2085 is fully ready for a reboot
[15:41:11] depooled and mysql stopped
[15:42:26] I've installed the new kernel on it, let me connect to the mgmt before we reboot, ok?
[15:42:33] go for it!
[15:43:28] I suggest to reboot a few times if it works the first time XD
[15:44:38] ack, setting downtime, gonna reboot shortly
[15:44:45] I downtimed it :)
[15:44:47] for 24h
[15:44:54] ah, good :-)
[15:45:47] it's rebooting
[15:48:30] let's see..
[15:48:37] with 144 it boots just fine, let me reboot again to rule out some Heisenbug
[15:48:59] yeah
[15:49:22] if it works, leave it with me and I will reboot it a few more times and ping you if something goes bad
[15:50:20] it's rebooting again, I'm looking at the 4.9.130-4.9.144 interdiff in the mean time
[15:50:29] thank you
[15:53:00] it is up again
[15:53:31] Do you want me to take over and reboot a few more times?
[15:54:12] sure, I'll look at the changelogs next trying to find a change which would explain this
[15:54:16] great!
[15:54:25] is 4.9.144 default?
[15:55:00] it will be by next weekend (when the stretch 9.9 release happens), but we can also go ahead and upgrade the batch of servers beforehand
[15:55:10] no, I mean on this host :)
[15:55:21] which other host was affected in eqiad? (just want to look at the kernel modules loaded, not reboot it :-)
[15:55:26] ah, ok
[15:55:27] db1106
[15:55:35] yes, it will pick 4.9.144 by default
[15:55:37] https://phabricator.wikimedia.org/T214840
[15:55:38] great
[15:55:48] console: Serial Device 2 is currently in use
[15:55:54] give me some room!
[15:55:55] :)
[15:56:03] of the serial console now :-)
[15:56:08] thanks
[15:56:10] going to reboot 5 times
[15:56:24] so we'll see what we get!
[15:56:33] ack
[16:02:54] moritzm: keep in mind that db1106 was rebooted with a kernel that works XD
[16:04:06] yeah, just looking at lsmod mostly and that should be identical with 4.9.x
[16:04:16] ah ok ok
[16:05:11] after the 3rd reboot…..it is now stuck
[16:06:12] it got rebooted automatically
[16:06:28] let's see what it does now
[16:06:49] Not sure if you saw: https://phabricator.wikimedia.org/T214840#4944412
[16:07:45] the 5th reboot also failed, it is now stuck
[16:08:41] got rebooted automatically, let's see what it does now
[16:09:29] this time it went thru
[16:09:44] I will sum up this on the task
[16:11:52] DBA, Operations, Packaging, Patch-For-Review: db2085/db1106 don't boot with 4.9.0-8-amd64 - https://phabricator.wikimedia.org/T214840 (Marostegui) @MoritzMuehlenhoff has installed 4.9.144-3 on db2085. Out of 8 reboots, two of them got stuck (in a row). 1st reboot by @MoritzMuehlenhoff OK 2nd rebo...
[16:12:51] it could also be entirely unrelated to the kernel version, if this affects every 3rd reboot or so and most of them were still running the default kernel we might simply not have seen it by now
[16:13:31] what we could do to narrow this down is to remove the -8 kernel and reboot several times with the -7 kernel which was previously in use
[16:13:57] to see if that happens with -7 kernel too?
[16:14:59] keep in mind that db1106 and db2085 were rebooted just once when the new kernel was installed
[16:18:02] yeah, that's what I meant, maybe this is more generic hw issue and also applies to the -7 kernel and simply didn't notice it yet since that batch of servers hasn't seen any reboots beyond the initial install
[16:18:37] no, we do reboot them for every kernel upgrade
[16:18:46] what I mean is, we do reboot them "often"
[16:19:25] but yeah, let's leave -7 only there and I can do another 8 times reboot test
[16:19:35] let me know when I can reboot
[16:23:41] db2085 currently has -8 running and that prevents the removal of the kernel, we'll need to manually pick -7 in grub menu, then we can remove the -8 one
[16:24:07] ah true haha
[16:24:09] let me do that
[16:24:16] or I can force the removal, but dpkg prints a strong warning not to do that :-)
[16:24:23] nah
[16:24:25] let me do that
[16:24:28] I will reboot
[16:28:21] ok, booting -7 now
[16:28:39] which looks stuck too :|
[16:29:48] apparently jaime had the same issue yesterday with -6, which didn't boot at first
[16:29:58] so it might be indeed not something from this version specifically
[16:30:15] -7 didn't boot, and got rebooted
[16:30:19] let me try again with -7
[16:30:47] this time it went thru
[16:32:02] DBA, Operations, Packaging, Patch-For-Review: db2085/db1106 don't boot with 4.9.0-8-amd64 - https://phabricator.wikimedia.org/T214840 (Marostegui) After restarting with the previous kernel 4.9.0-7-amd64, the first time it didn't boot up, the second time it did.
[16:32:04] moritzm: do you want to remove -8?
[16:35:58] ack, on it
[16:36:25] once done I will reboot 5 times with -7
[16:37:24] done, -8 removed
[16:37:30] ok, let me reboot then
[16:39:32] ack
[16:48:35] 4th reboots ok…going for the 5th
[16:48:45] I am updating the task in a bit
[16:49:16] cause the first time we rebooted with -7 (when -8 was still installed) it didn't work, which is similar to what happened to jaime with -6 yesterday
[16:57:15] interesting, the 6th reboot, -7 failed (same thing as -8 did, the 6th reboot failed)
[17:02:40] DBA, Operations, Packaging, Patch-For-Review: db2085/db1106 don't boot with 4.9.0-8-amd64 - https://phabricator.wikimedia.org/T214840 (Marostegui) @MoritzMuehlenhoff has removed -8 kernel from db2085 and I have rebooted it 8 times with -7 now 1st reboot: OK 2nd reboot: OK 3rd reboot: OK 4th rebo...
[17:02:44] moritzm: I have added my thoughts there ^
[17:02:48] I am going to start mysql and call it a day
[17:02:56] host running -7 now
[17:03:50] ack, let's pick this up tomorrow
[17:03:56] thanks!
[23:28:04] DBA, Datasets-General-or-Unknown, Patch-For-Review, Wikimedia-Incident, WorkType-NewFunctionality: Automate the check and fix of object, schema and data drifts between mediawiki HEAD, production masters and slaves - https://phabricator.wikimedia.org/T104459 (greg)