[04:51:38] DBA, Wikimedia-Apache-configuration, Patch-For-Review, Release-Engineering-Team (Kanban): Cleanup remaining WikipediaMobileFirefoxOS references - https://phabricator.wikimedia.org/T187850 (Marostegui)
[04:52:48] DBA, Wikimedia-Apache-configuration, Patch-For-Review, Release-Engineering-Team (Kanban): Cleanup remaining WikipediaMobileFirefoxOS references - https://phabricator.wikimedia.org/T187850 (Marostegui) Open>Resolved a:chasemp If the only hosts pending were db1095 and db1102 this can be...
[05:01:24] DBA, Patch-For-Review, Schema-change: Schema change to drop default from externallinks.el_index_60 - https://phabricator.wikimedia.org/T197891 (Marostegui)
[05:01:38] DBA, Patch-For-Review, Schema-change: Schema change: Add unique index on archive.ar_rev_id - https://phabricator.wikimedia.org/T196379 (Marostegui)
[05:01:43] Blocked-on-schema-change, MediaWiki-Database, Patch-For-Review: Add a primary key to l10n_cache - https://phabricator.wikimedia.org/T146591 (Marostegui)
[06:30:21] DBA, Patch-For-Review, Schema-change: Schema change to drop default from externallinks.el_index_60 - https://phabricator.wikimedia.org/T197891 (Marostegui)
[06:30:35] DBA, Patch-For-Review, Schema-change: Schema change: Add unique index on archive.ar_rev_id - https://phabricator.wikimedia.org/T196379 (Marostegui)
[06:30:52] Blocked-on-schema-change, MediaWiki-Database, Patch-For-Review: Add a primary key to l10n_cache - https://phabricator.wikimedia.org/T146591 (Marostegui)
[06:33:10] DBA, Patch-For-Review, Schema-change: Schema change to drop default from externallinks.el_index_60 - https://phabricator.wikimedia.org/T197891 (Marostegui) s6 eqiad progress [] labsdb1009 [] labsdb1010 [] labsdb1011 [] db1125 [] dbstore1002 [] db1085 [] db1088 [] db1093 [] db1096 [] db1098 [] db1113...
[06:33:12] DBA, Patch-For-Review, Schema-change: Schema change: Add unique index on archive.ar_rev_id - https://phabricator.wikimedia.org/T196379 (Marostegui) s6 eqiad progress [] labsdb1009 [] labsdb1010 [] labsdb1011 [] db1125 [] dbstore1002 [] db1085 [] db1088 [] db1093 [] db1096 [] db1098 [] db1113 [] db1061
[06:33:19] Blocked-on-schema-change, MediaWiki-Database, Patch-For-Review: Add a primary key to l10n_cache - https://phabricator.wikimedia.org/T146591 (Marostegui) s6 eqiad progress [] labsdb1009 [] labsdb1010 [] labsdb1011 [] db1125 [] dbstore1002 [] db1085 [] db1088 [] db1093 [] db1096 [] db1098 [] db1113 []...
[07:02:54] DBA, Wikimedia-Apache-configuration, Patch-For-Review, Release-Engineering-Team (Kanban): Cleanup remaining WikipediaMobileFirefoxOS references - https://phabricator.wikimedia.org/T187850 (jcrespo) As I said, I fixed this at https://phabricator.wikimedia.org/T187850#4078495
[07:22:47] DBA, Epic, Wikimedia-Incident: Improve regular production database backups handling - https://phabricator.wikimedia.org/T138562 (jcrespo)
[07:22:50] DBA, Patch-For-Review: Setup database logical backups on eqiad - https://phabricator.wikimedia.org/T192358 (jcrespo) Open>Resolved s1 is being backed up, resolving- we will setup the other sections when we have the hardware available.
[07:27:29] DBA: Setup database on tendril hosts to gather backup statistics - https://phabricator.wikimedia.org/T198937 (jcrespo) p:Triage>Normal
[07:28:09] I think tendril is leaking memory, I will need to restart it- tell me when I can do that securely
[07:28:14] anytime
[07:28:28] that looks related to the crash we had some months ago then
[07:28:32] we were suspecting a leak
[07:29:01] yeah, at least it leaks every few months
[07:29:06] while before it was every day
[07:29:50] also please have a look at my proposal up here and its ticket
[07:29:59] Yeah, give me a minute :)
[07:30:02] (I meant gerrit patch)
[07:59:27] i commented a few minutes ago on it
[07:59:31] we can follow up on the patch
[08:49:53] DBA: Optimize logging table - https://phabricator.wikimedia.org/T197459 (Marostegui)
[09:23:24] apparently there is officewiki on db1115
[09:24:28] tendril is back, btw
[09:28:06] :(
[09:39:05] but it is being used?
[09:39:09] that is weird
[10:39:28] marostegui: I would like to dump it and delete it and its associated grants
[10:39:36] I don't think that is there for a real reason
[10:39:42] officewiki?
[10:39:45] yeah
[10:39:51] yeah, I am surprised it is there
[10:39:54] maybe some emergency in the past
[10:39:55] there is an officewiki in s3
[11:05:05] Blocked-on-schema-change, MediaWiki-Database, Patch-For-Review: Add a primary key to l10n_cache - https://phabricator.wikimedia.org/T146591 (Marostegui)
[11:05:13] DBA, Patch-For-Review, Schema-change: Schema change: Add unique index on archive.ar_rev_id - https://phabricator.wikimedia.org/T196379 (Marostegui)
[11:05:22] DBA, Patch-For-Review, Schema-change: Schema change to drop default from externallinks.el_index_60 - https://phabricator.wikimedia.org/T197891 (Marostegui)
[11:12:29] DBA: Optimize logging table - https://phabricator.wikimedia.org/T197459 (Marostegui)
[11:25:42] hi, I'm working on automating service restarts for stateless services T135991; there's a script which detects if the given services uses an updated library and if that's the case restarts it. most of DBA services are obviously not in scope, but prometheus-mysqld-exporter seems like a viable candidate. ok to test service restarts for it on a few hosts?
[11:25:43] T135991: Automated service restarts for common low-level system services - https://phabricator.wikimedia.org/T135991
[11:26:07] moritzm: fine by me
[11:27:04] DBA: Optimize logging table - https://phabricator.wikimedia.org/T197459 (Marostegui)
[11:32:29] ack, thanks
[11:35:30] +1
[11:36:53] haproxy is stateless, but I would also not restart it either (we can reload it easily, but that probably doesn't restart the actual process)
[11:37:31] yeah, haproxy seems too risky, it's rather something we'd restart in a controlled manner
[11:38:56] let me find what else we could include
[11:40:13] mostly everything else (cron, ssh,...) but not nagios or ferm
[11:40:32] diamond +1
[11:40:55] smartd +1
[11:42:52] most of that is already covered
[11:43:45] on e.g. db1076 we're covering the following via system-wide defaults:
[11:44:16] systemd-timesyncd, diamond, prometheus-node-exporter, exim4, lldpd, cron, systemd-journald, smartd, nagios-nrpe-server and mcelog
[11:44:29] a few more which are system-wide are in preparation/testing:
[11:45:03] systemd-logind, ssh, potentially rsyslog
[11:46:13] ferm isn't a service in a restartable sense as it only deals with the existing iptables rules and those are all running in the kernel, so unaffected by library changes
[11:47:06] nrpe is enabled for quite some time, if you have a specific concern we can make an exception for dc hosts, but the restart window is in the millisecond range and Icinga should retry if it tries to connect during that tiny window
[11:47:15] "db hosts"
[11:47:28] Icinga itself ofc isn't auto-restarted, just the local NRPE agent
[12:22:29] DBA, Patch-For-Review, Schema-change: Schema change to drop default from externallinks.el_index_60 - https://phabricator.wikimedia.org/T197891 (Marostegui)
[12:22:32] DBA, Patch-For-Review, Schema-change: Schema change: Add unique index on archive.ar_rev_id - https://phabricator.wikimedia.org/T196379 (Marostegui)
[12:22:44] Blocked-on-schema-change, MediaWiki-Database, Patch-For-Review: Add a primary key to l10n_cache - https://phabricator.wikimedia.org/T146591 (Marostegui)
[12:25:53] DBA, Patch-For-Review, Schema-change: Schema change to drop default from externallinks.el_index_60 - https://phabricator.wikimedia.org/T197891 (Marostegui) s2 eqiad progress [] labsdb1009 [] labsdb1010 [] labsdb1011 [] db1125 [] dbstore1002 [] db1074 [] db1076 [] db1090 [] db1103 [] db1105 [] db1122...
[12:25:57] DBA, Patch-For-Review, Schema-change: Schema change: Add unique index on archive.ar_rev_id - https://phabricator.wikimedia.org/T196379 (Marostegui) s2 eqiad progress [] labsdb1009 [] labsdb1010 [] labsdb1011 [] db1125 [] dbstore1002 [] db1074 [] db1076 [] db1090 [] db1103 [] db1105 [] db1122 [] db1066
[12:26:09] Blocked-on-schema-change, MediaWiki-Database, Patch-For-Review: Add a primary key to l10n_cache - https://phabricator.wikimedia.org/T146591 (Marostegui) s2 eqiad progress [] labsdb1009 [] labsdb1010 [] labsdb1011 [] db1125 [] dbstore1002 [] db1074 [] db1076 [] db1090 [] db1103 [] db1105 [] db1122 []...
[12:47:54] during my tests I noticed that on mariadb::core_multiinstance hosts there's a prometheus-mysqld-exporter.service running in addition to the @instances defined via puppet (it's the systemd unit shipped in the prometheus-mysqld-exporter package). it should be non-functional as there's no ferm rules for this service in /etc/ferm/conf.d
[12:48:32] it doesn't cause any issues, just something which irritated me as I first suspected my patch was wrong when I checked https://puppet-compiler.wmflabs.org/compiler02/11709/db1103.eqiad.wmnet/
[12:49:25] we could mask the prometheus-mysqld-exporter.service on multi instance hosts, or we simply ignore it as it's not causing any real issue, but I can prepare a patch to mask it if desired
[12:50:22] moritzm: what?
[12:50:38] there is ferm rules for those services
[12:51:11] oh
[12:51:14] I get what you mean
[12:51:23] so there is prometheus-mysqld-exporter.service
[12:51:33] and prometheus-mysqld-exporter@.service
[12:51:56] yeah, this is specific to multi instance, it's fine on the others
[12:52:04] but why mask it?
[12:52:36] oh, in those running it may be a leftover
[12:52:50] when migrating from single instance to multi
[12:53:01] but ferm should just block those
[12:53:02] yeah, it's not a real issue, just something that I stumbled upon
[12:53:17] yep, ferm will prevent hius
[12:53:18] this
[12:53:29] you can just kill those if they bother you
[12:53:42] it happens many times that a host is setup with 1 instance
[12:53:48] and then it grows to 2
[12:54:04] but we dont like to touch the state of systemd
[12:54:13] I think this also happens to hosts which have two instances from the start
[12:54:23] prometheus-mysqld-exporter.service is coming from the deb
[12:54:30] and the @ services from puppet
[12:54:34] I see
[12:54:46] so technically, it is a problem with the .deb
[12:55:02] the .deb should not autostart it
[12:55:38] that's the default for virtually anything packaged in Debian, users would revolt if it were different :-)
[12:55:54] but, let's simply ignore, it doesn't cause any real issue
[12:55:56] or we can force stopping it on puppet
[12:56:01] the problem is that
[12:56:10] there is a separation of concerns
[12:56:23] I just ran into it as I thought the auto-restart patch would have missed an instance
[12:56:25] multi-instances should not touch single instance services
[12:57:02] makes sense, I see the point
[12:59:04] Blocked-on-schema-change, MediaWiki-Database, Patch-For-Review: Add a primary key to l10n_cache - https://phabricator.wikimedia.org/T146591 (Lyrixn) Whatever's going on here, it's spamming my email. Might want to figure out a better way of doing what you're doing, just an advice.
[13:01:23] Blocked-on-schema-change, MediaWiki-Database, Patch-For-Review: Add a primary key to l10n_cache - https://phabricator.wikimedia.org/T146591 (Marostegui) >>! In T146591#4403390, @Lyrixn wrote: > Whatever's going on here, it's spamming my email. Might want to figure out a better way of doing what you'r...
[13:04:35] Blocked-on-schema-change, MediaWiki-Database, Patch-For-Review: Add a primary key to l10n_cache - https://phabricator.wikimedia.org/T146591 (jcrespo) But don't add him/her again :-)
[13:05:34] This was an accident, but better not piss people :-) https://phabricator.wikimedia.org/T146591#4403406
[13:06:42] Blocked-on-schema-change, MediaWiki-Database, Patch-For-Review: Add a primary key to l10n_cache - https://phabricator.wikimedia.org/T146591 (Marostegui) I didn't realize that replying and quoting the comment would add him/her back
[13:20:21] jynus: as I said I didn't realize I was getting him added back.
[13:20:29] :-)
[13:51:15] DBA: switchover es1014 to es1017 - https://phabricator.wikimedia.org/T197073 (Marostegui) We'd need to start thinking a date for this failover.
[18:20:21] DBA, Goal: Monitor backup generation for failure or incorrect generation - https://phabricator.wikimedia.org/T198447 (jcrespo)
[18:20:23] DBA, Patch-For-Review: Setup database on tendril hosts to gather backup statistics - https://phabricator.wikimedia.org/T198937 (jcrespo) Open>Resolved a:jcrespo Finally it is being worked on the same instances as tendril.
[18:23:16] DBA: Gather statistics about the backups on a database - https://phabricator.wikimedia.org/T198987 (jcrespo) p:Triage>Normal
[18:23:53] DBA: Gather statistics about the backups on a database - https://phabricator.wikimedia.org/T198987 (jcrespo) ``` root@neodymium:~$ ./section s1 db1052.eqiad.wmnet 3306 db1067.eqiad.wmnet 3306 db1080.eqiad.wmnet 3306 db1083.eqiad.wmnet 3306 db1089.eqiad.wmnet 3306 db1099.eqiad.wmnet...
[18:26:36] DBA: Gather statistics about the backups on a database - https://phabricator.wikimedia.org/T198987 (jcrespo) ```lines=10 -- -- Table structure for table `instances` -- DROP TABLE IF EXISTS `instances`; /*!40101 SET @saved_cs_client = @@character_set_client */; /*!40101 SET character_set_client = utf8 */...