[08:49:14] I have created https://github.com/Atoptool/atop/issues/27 too
[08:49:42] Ah, great!!
[08:57:00] meanwhile it should be disabled on all core hosts
[08:57:18] while https://gerrit.wikimedia.org/r/#/c/428579/ is reviewed
[09:04:46] jynus: I've also marked the Debian bug as forwarded upstream with a reference to your github issue
[09:05:00] thanks
[09:05:24] Thanks!
[09:05:27] moritzm: what is the policy regarding that?
[09:05:40] do they wait X time and if not a patch is done?
[09:05:50] or not wait at all if it is trivial/config?
[09:06:32] depends, at the discretion of the maintainer. most packages are following upstream closely, so if upstream changes the default Debian will follow along
[09:06:59] yeah, but imagine no upstream response is going to happen?
[09:07:08] it will sit like that forever?
[09:07:47] no. often a change might be applied in Debian and if it works well, it can be a convincing argument for upstream to merge it e.g.
[09:08:00] I think this change is uncontroversial and just an oversight
[09:08:39] we've also not noticed on the > 200 existing stretch hosts so far
[09:09:04] I think that is a gap in our monitoring coverage
[09:09:14] so it's very likely that upstream simply didn't estimate that the fallout of changing the default would be as drastic as it was for us
[09:09:15] as we have 1 minute granularity
[09:09:26] true that
[09:09:30] and this is a 1 second issue every 10 minutes
[09:09:50] so we are not noticing micro-spikes
[09:10:16] ironically, atop may have helped with that :-)
[09:11:11] indeed :-)
[09:11:13] DBA, Operations-Software-Development: Debmonitor: request for misc DB allocation - https://phabricator.wikimedia.org/T192875#4152620 (Volans) p:Triage>Normal
[09:29:03] DBA, Operations-Software-Development: Debmonitor: request for misc DB allocation - https://phabricator.wikimedia.org/T192875#4152683 (jcrespo) I think m2 would be the right place to set it up, more so after the database there is upgraded. How are you going to handle dc monitoring, will you store things o...
[09:31:01] DBA, Operations-Software-Development: Debmonitor: request for misc DB allocation - https://phabricator.wikimedia.org/T192875#4152684 (Marostegui) How would you handle a DC fail? What would happen if the active master dies? Will things pile up from your side? What if you need to read from codfw?
[09:47:23] DBA, Operations-Software-Development: Debmonitor: request for misc DB allocation - https://phabricator.wikimedia.org/T192875#4152719 (Volans) Sorry, I didn't mention the multi-DC setup :) The idea for now is to install DebMonitor server on two hosts, one per main DC in an active/passive setup. The webUI...
[12:12:32] jynus: https://gerrit.wikimedia.org/r/#/c/428613/ I didn't understand what you said about 25M unpurged rows but I wanted to say this thing already deleted 360M rows
[12:17:53] the question is, can you wait?
[12:18:11] and can you stop deleting things on s8?
[12:18:23] until we stabilize all servers again
[13:01:50] jynus: yeah of course, I already made the patch :)
[13:03:09] thanks deploying now, will try to fix dbstore2001 and the others (cannot do today because backups) and will ping you when we are ready
[13:30:59] I am playing around with dbstore2001:s8
[13:31:16] ok
[13:31:49] problem is that if I make purging faster, it tends to lag
[13:33:19] it is funny to see how if I block all sql operations, innodb io increases
[13:35:20] DBA, Operations, ops-eqiad, Patch-For-Review: Rack and setup db1116 - db1123 - https://phabricator.wikimedia.org/T191792#4153597 (Marostegui) @Cmjohnson I have enabled those hosts to get installed with the db.cfg recipe, so as soon as they start PXE booting they should get the correct installatio...
[14:24:51] DBA, Operations-Software-Development: Debmonitor: request for misc DB allocation - https://phabricator.wikimedia.org/T192875#4153775 (Marostegui) From where will you be querying this DB? (just to see which (new) grants you might need?
[15:07:02] ERROR 1062 (23000): Duplicate entry '2-47909092-0-2113' for key 'linter_cat_page_position' on table compression, sounds familiar?
[15:07:09] yeah
[15:07:11] it does
[15:07:20] we have done lots of linter reimports, but not surprising
[15:07:35] how did you solve it?
[15:07:54] reimporting it
[15:08:03] mysqldump, drop, reimport
[15:08:06] but this was just reimported
[15:08:16] fresh from .sql files
[15:08:22] that's surprising then
[15:08:28] is mariadb alter broken?
[15:08:40] or its innodb compressed engine?
[15:08:46] can you try it again?
[15:08:50] like do the mysqldump again
[15:08:54] and compress it?
[15:09:02] it should be quite small, so worth trying
[15:09:14] it is 1 GB, not that small
[15:09:27] but I am doing other alters at the same time
[15:09:27] That should be pretty fast to do I think
[15:10:01] I can try again with the slave stopped
[15:10:11] Yeah
[15:10:12] to confirm it is not a broken alter
[15:10:43] I encountered lots of those when doing the compression for the rc slaves
[15:10:56] And drop + mysqldump fixed it
[15:11:04] but if this is a fresh imported slave…that's new
[15:11:08] I would do that test
[15:12:29] if it works with the slave stopped, we may have a broken alter table, which is scary
[15:12:57] so you are going to do the same alter but with the slave stopped?
[15:13:06] yes, there is nothing to lose
[15:13:09] what if you just try it again
[15:13:11] with the slave enabled
[15:13:15] to confirm it crashes again
[15:13:30] I just ran it with it stopped, it is at 93% already
[15:14:16] it would have been good to test it with it enabled, just to see if it crashes too
[15:14:24] or if for some reason it was just a one time thing
[15:14:47] it could be some weirdness with compression changing the size of indexes or something
[15:15:27] it has a blob
[15:15:47] which may be a factor
[15:16:01] but the complaint is on the unique key
[15:23:08] it worked with the slave stopped
[15:25:33] DBA, Operations-Software-Development: Debmonitor: request for misc DB allocation - https://phabricator.wikimedia.org/T192875#4154037 (Volans) >>! In T192875#4153775, @Marostegui wrote: > From where will you be querying this DB? (just to see which (new) grants you might need? @Marostegui only from debmon...
[15:39:23] DBA, Multi-Content-Revisions, Schema-change: Schema change to drop archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T192926#4154116 (Anomie) p:Triage>Normal
[15:39:49] DBA, Multi-Content-Revisions, Schema-change: Schema change to drop archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T192926#4154132 (Anomie)
[15:45:28] DBA, Multi-Content-Revisions, Schema-change: Schema change to drop archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T192926#4154116 (Marostegui) I will do this change along with T191316 once the current 3 schema changes we have ongoing are done.
[15:47:27] jynus: if it happens again, let's try twice with the slave enabled
[16:47:01] DBA, Operations, ops-eqiad, Patch-For-Review: Rack and setup db1116 - db1123 - https://phabricator.wikimedia.org/T191792#4154547 (Cmjohnson)
[17:48:50] DBA, MediaWiki-extensions-WikibaseRepository, Wikidata, User-Ladsgroup, Wikidata-Ministry-Of-Magic: Apply schema changes to an isolated database and examine the results - https://phabricator.wikimedia.org/T191391#4154763 (Ladsgroup) Yeah, Some of my changes are done now. what do you think of...
[19:08:51] DBA, Operations, ops-eqiad, Patch-For-Review: Rack and setup db1116 - db1123 - https://phabricator.wikimedia.org/T191792#4155054 (Cmjohnson) @Marostegui @jcrespo All of the db's with the exception of db1120 (c6) are installed and ready for you to take over. Most likely I have a bad cable or a lo...
[21:30:41] DBA, Operations, hardware-requests, ops-eqiad, Patch-For-Review: Decommission db1034 - https://phabricator.wikimedia.org/T182556#4155612 (Cmjohnson)
[21:32:09] DBA, Operations, hardware-requests, ops-eqiad, Patch-For-Review: Decommission db1029 and db1031 - https://phabricator.wikimedia.org/T184054#3870892 (Cmjohnson)