[01:07:32] <wikibugs>	 DBA, MediaWiki-General-or-Unknown, Operations, MW-1.31-release-notes (WMF-deploy-2017-10-10 (1.31.0-wmf.3)), and 2 others: Regularly purge expired temporary userrights from DB tables - https://phabricator.wikimedia.org/T176754#3970110 (TBolliger)
[06:19:28] <wikibugs>	 DBA, Operations, ops-eqiad, Patch-For-Review: Rack and setup db1115 (tendril replacement database) - https://phabricator.wikimedia.org/T185788#3970735 (Marostegui) Chris, can you hold this task? There are ongoing discussions about the hostname, I marked the task as stalled but I should've sai...
[06:46:05] <jynus>	 we have today 10.1.31 and 10.0.34
[06:46:40] <marostegui>	 \o/
[06:46:53] <marostegui>	 I will upgrade db1096 then (s5 multi-instance)
[06:47:17] <marostegui>	 I had to upgrade its kernel anyways, so...:-)
[06:47:22] <jynus>	 and s6
[06:47:34] <marostegui>	 yeah
[07:11:28] <jynus>	 tell me when you repool db1096 fully, I will restart db1088
[07:12:01] <marostegui>	 it will take a while (I am running an alter there)
[07:12:09] <jynus>	 ah, one thing: I may have broken the 10.1 package on jessie when uploading those packages
[07:12:18] <jynus>	 do not install that
[07:12:33] <marostegui>	 I will install 10.1.34 on stretch
[07:12:51] <jynus>	 I do not think we have any host like that, and I will fix them soon, but just in case
[08:31:05] <jynus>	 touching db1104 in any way? it seems very overloaded
[08:31:26] <jynus>	 https://grafana.wikimedia.org/dashboard/db/mysql?orgId=1&from=now-24h&to=now&var-dc=eqiad%20prometheus%2Fops&var-server=db1104&var-port=9104
[08:32:50] <jynus>	 I think it is wikidata api
[08:34:12] <jynus>	 I am going to change weights on s8 eqiad
[08:43:12] <jynus>	 actually is it api or is it main traffic?
[08:55:03] <marostegui>	 No, I know what it is
[08:55:05] <marostegui>	 it was me
[08:55:25] <marostegui>	 I was running a mydumper from that host to db1089 and db1067 (as I normally do from that host, but I thought I had depooled it)
[08:55:29] <marostegui>	 So my bad
[08:58:48] <jynus>	 so, actually running mysql on an active host is not bad, but it should be throttled
[08:58:52] <jynus>	 *dump
[09:05:31] <Reedy>	 yay, more schema indexes that don't match... https://phabricator.wikimedia.org/T187295
[09:28:56] <wikibugs>	 Blocked-on-schema-change, DBA: Apply AbuseFilter patch-fix-index - https://phabricator.wikimedia.org/T187295#3971090 (Reedy)
[09:52:13] <jynus>	 no problems with db1096 upgrade?
[09:52:24] <marostegui>	 nope
[09:52:50] <jynus>	 I had only tested on production on labsdb1010
[10:16:35] <godog>	 FYI https://gerrit.wikimedia.org/r/c/410412/ when you get a chance
[10:18:15] <wikibugs>	 DBA, Data-Services, cloud-services-team (Kanban): Review m5 backups - https://phabricator.wikimedia.org/T186585#3948422 (jcrespo)
[10:19:24] <jynus>	 consider a virtual +1 here without actual reviewing
[10:19:36] <jynus>	 I can give you a real one, but later
[10:20:14] <jynus>	 I am not sure RAID hosts generate smart alerts
[10:20:28] <jynus>	 hw raid, I mean
[10:21:03] <jynus>	 I mean, they do, but have to be accessed through raid controller, which is technically already monitored
[10:39:13] <jynus>	 misc backups are running right now, in case something goes wrong
[10:40:36] <marostegui>	 thanks
[10:45:20] <wikibugs>	 DBA, Epic: Meta ticket: Migrate multi-source database hosts to multi-instance - https://phabricator.wikimedia.org/T159423#3971271 (elukey) The Analytics team's vision is to move all our data to HDFS and eventually move all our users away from mysql, both for the log database and for all the ones on dbsto...
[10:50:10] <wikibugs>	 DBA, Data-Services, cloud-services-team (Kanban): Review m5 backups - https://phabricator.wikimedia.org/T186585#3971296 (jcrespo) Open>Resolved I have not dropped percona, will want to examine the checksums later. Will do at another time.
[10:56:26] <jynus>	 special warning that dbstore2001 is at 83% disk usage and the OTRS backups could be large
[10:57:52] <marostegui>	 Anything pending to be deleted apart from the small 13G backup at /srv/tmp?
[10:58:32] <wikibugs>	 Blocked-on-schema-change, DBA: Apply AbuseFilter patch-fix-index - https://phabricator.wikimedia.org/T187295#3971317 (MarcoAurelio) @Reedy Can you apply that patch on Beta Cluster? Thanks.
[10:58:33] <jynus>	 not really
[10:59:11] <jynus>	 in the short future we could delete more sections, generate them on es2*
[10:59:28] <jynus>	 in the far future, dumps and dbs will be on separate new hosts
[11:01:06] <wikibugs>	 Blocked-on-schema-change, DBA: Apply AbuseFilter patch-fix-index - https://phabricator.wikimedia.org/T187295#3971330 (Reedy) >>! In T187295#3971317, @MarcoAurelio wrote: > @Reedy Can you apply that patch on Beta Cluster? Thanks.  No, there's no need. Pretty sure all the wikis were created after that was...
[11:05:49] <wikibugs>	 Blocked-on-schema-change, DBA: Apply AbuseFilter patch-fix-index - https://phabricator.wikimedia.org/T187295#3971332 (MarcoAurelio) Ok. Thanks for the explanation :)
[11:25:16] <marostegui>	 db1106 is now running 10.0.34
[11:41:32] <jynus>	 cool
[11:42:18] <Hauskatze>	 'funny' thing that schema-change for abusefilter... seems that the indexes differ on every db{} :/
[11:43:05] <jynus>	 that is what happens when there is no good maintenance: commits are done, but server maintainers are normally not notified
[11:43:16] <jynus>	 so they end up in the state they were created
[11:44:18] <Hauskatze>	 well, I'm just a watcher; I don't even understand what's the difference between db1001 and db1002, etc
[11:45:20] <jynus>	 one of the first things I did was to document and enforce a process: https://wikitech.wikimedia.org/wiki/Schema_changes#Workflow_of_a_schema_change
[11:53:21] <Hauskatze>	 is that fixable?
[11:53:33] <Hauskatze>	 I mean, does that have an 'easy' or feasible solution?
[11:53:35] <jynus>	 the process or the indexes?
[11:53:48] <Hauskatze>	 what is requested in that task in general
[11:53:59] <jynus>	 sure, just not easy
[11:54:07] <Hauskatze>	 as for the script optimization, shall I open a new task?
[11:54:21] <jynus>	 the answer to "is X doable?" in computer science is always yes
[11:54:25] <jynus>	 :-)
[11:54:43] <jynus>	 that is the task, no need for a separate one, that should fix the script automatically
[11:54:58] <Hauskatze>	 heh, tell that to the dbas of my former university, they always said 'impossible' when we proposed some sort of program change
[11:55:23] <jynus>	 what they meant is "it is possible, but not advisable"
[11:55:44] <Hauskatze>	 what they really meant was 'I don't want to work'
[11:55:53] <jynus>	 and things are possible, the question is when?
[12:00:30] <wikibugs>	 DBA, Cloud-Services, Toolforge: Allow self-serve database credential and permissions management for Tool Labs projects - https://phabricator.wikimedia.org/T136335#3971465 (jcrespo) I will not be working on this, but will help if someone else wants in the future (add us back). Not sure if it is still...
[12:01:23] <jynus>	 marostegui: I have moved labsdb1010 fix and x1-codfw fix to next, as I do not think they are the current priority any longer- feel free to disagree, and just change them back, etc.
[12:01:55] <marostegui>	 sounds good yeah
[12:02:07] <marostegui>	 I wanted to work on x1-codfw failover this week, but not sure I will at this point :(
[12:02:13] <jynus>	 exactly
[12:02:42] <jynus>	 not that it cannot be moved ever, just trying to reflect what I think is current status
[12:02:50] <marostegui>	 I know :)
[12:03:28] <jynus>	 otrs backups still ongoing
[12:05:14] <jynus>	 good news is m1,m3 and m5 are done and reviewed as doing what they should
[12:05:23] <marostegui>	 great!!
[12:06:09] <jynus>	 now I only have to pick all the pieces and put them in a script and puppetize them
[12:06:24] <jynus>	 "only"!
[12:06:53] <marostegui>	 haha yeah Xd
[12:06:55] <marostegui>	 "just that"
[12:06:56] <marostegui>	 haha
[12:07:21] <jynus>	 wanna fight later about T159423 and T186123?
[12:07:21] <stashbot>	 T186123: rack/setup/install tendril2001 - https://phabricator.wikimedia.org/T186123
[12:07:21] <stashbot>	 T159423: Meta ticket: Migrate multi-source database hosts to multi-instance - https://phabricator.wikimedia.org/T159423
[12:07:58] <marostegui>	 haha
[12:08:02] <marostegui>	 yeah, we can do it later
[12:08:27] <jynus>	 T159423 this is not a clear ticket, if I was pedantic, I would send it back to elukey
[12:08:42] <jynus>	 but probably we have enough to create a ticket that is clear
[12:09:30] <elukey>	 ?
[12:09:34] <marostegui>	 I think it is up to us now, really, and we have different visions hehe
[12:09:35] <jynus>	 just kidding
[12:09:56] <jynus>	 elukey: I asked you to create a ticket with "we need to purchase X, this is what we know"
[12:10:04] <jynus>	 it is ok, we will do that
[12:10:40] <jynus>	 I was just teasing you
[12:10:51] <elukey>	 ah I might have misread, I understood to put that thought in the task in which we were discussing dbstore1002
[12:11:16] <jynus>	 yeah, but that is quite unrelated- it is blocking that, but the point is a purchase, needs its own ticket
[12:11:27] <jynus>	 we will take care of it, don't worry
[12:11:27] <elukey>	 sure
[12:11:33] <elukey>	 thanks!
[12:11:55] <jynus>	 when marostegui onboarded I indoctrinated on being super-clear on tickets
[12:12:10] <jynus>	 he remembers the torture still
[12:12:33] <jynus>	 I will have to send you to the torture room too, elukey :-D
[12:12:47] * elukey runs away from Jaime
[12:12:49] <jynus>	 indoctrinated him
[12:24:51] <jynus>	 https://www.youtube.com/watch?v=gI7bnzUzJpA
[12:27:00] <volans>	 jynus: lol, the two sides of the tree seem to fight each other :D
[12:29:16] <jynus>	 they are the tests
[12:29:22] <jynus>	 vs the actual code
[12:29:35] <jynus>	 and at the other side, the storage engines
[12:29:37] <moritzm>	 I'm pretty sure that if you run gource against our puppet tree, it'll visualise a portal to hell opening up
[12:29:50] <jynus>	 I literally LOL
[12:35:35] <elukey>	 ahahahhaha
[12:38:26] <Hauskatze>	 probably
[12:39:06] <wikibugs>	 DBA, Phabricator (2018-02-xx), Release: Upcoming phabricator upgrade requires unusually long database migrations - https://phabricator.wikimedia.org/T187143#3971548 (mmodell) Here is the one that I expect to be slow: https://secure.phabricator.com/source/phabricator/browse/master/resources/sql/autopa...
[12:44:52] <wikibugs>	 DBA, Phabricator (2018-02-xx), Release: Upcoming phabricator upgrade requires unusually long database migrations - https://phabricator.wikimedia.org/T187143#3971553 (jcrespo) I do not see any schema change, only updates using ids, is there an actual schema change done before that? https://secure.phab...
[12:46:01] <wikibugs>	 DBA, Phabricator (2018-02-xx), Release: Upcoming phabricator upgrade requires unusually long database migrations - https://phabricator.wikimedia.org/T187143#3971554 (Marostegui) >>! In T187143#3971548, @mmodell wrote: > Here is the one that I expect to be slow: https://secure.phabricator.com/source/p...
[12:46:08] <marostegui>	 haha you were faster
[12:47:03] <wikibugs>	 DBA, Phabricator (2018-02-xx), Release: Upcoming phabricator upgrade requires unusually long database migrations - https://phabricator.wikimedia.org/T187143#3971555 (mmodell) https://secure.phabricator.com/source/phabricator/browse/master/resources/sql/autopatches/20180208.maniphest.01.close.sql
[12:48:42] <wikibugs>	 DBA, Phabricator (2018-02-xx), Release: Upcoming phabricator upgrade requires unusually long database migrations - https://phabricator.wikimedia.org/T187143#3971556 (mmodell) The slow one is backfilling the data, the alter table is in a separate migration which should be much quicker than the populat...
[12:49:04] <wikibugs>	 DBA, Phabricator (2018-02-xx), Release: Upcoming phabricator upgrade requires unusually long database migrations - https://phabricator.wikimedia.org/T187143#3971557 (jcrespo) Is it easy to trick phabricator into skipping migrations?- we could do that safely and online, and it seems backwards compatible.
[12:49:20] <wikibugs>	 DBA, Phabricator (2018-02-xx), Release: Upcoming phabricator upgrade requires unusually long database migrations - https://phabricator.wikimedia.org/T187143#3971558 (Marostegui) >>! In T187143#3971555, @mmodell wrote: > https://secure.phabricator.com/source/phabricator/browse/master/resources/sql/aut...
[12:50:02] <jynus>	 small big means medium ? :-)
[12:50:13] <marostegui>	 just corrected it haha :)
[12:50:35] <jynus>	 I am not worried that much about size, but about metadata locking
[12:50:55] <jynus>	 although if we do the migration, we will not avoid that
[12:50:56] <marostegui>	 There shouldn't be many people playing around with phab at this time I reckon
[12:51:32] <wikibugs>	 DBA, Phabricator (2018-02-xx), Release: Upcoming phabricator upgrade requires unusually long database migrations - https://phabricator.wikimedia.org/T187143#3971560 (mmodell) @jcrespo: it's possible but I'd have to hack around some detection scheme, phabricator normally refuses to even serve pages wh...
[12:51:34] <marostegui>	 In fact I believe around 6AM UTC is better than the original proposal of midnight, as I believe there will be a lot less people
[12:52:09] <jynus>	 I was thinking the following
[12:52:14] <jynus>	 even if not technically needed
[12:52:25] <jynus>	 apply the schema change to the replicas
[12:52:39] <jynus>	 take the possible downtime to finally do the db master failover
[12:52:51] <jynus>	 apply it to the new replica (the old master)
[12:52:55] <jynus>	 then continue the migration
[12:53:04] <jynus>	 so do the upgrade and the pending failover at the same time
[12:53:10] <wikibugs>	 DBA, Phabricator (2018-02-xx), Release: Upcoming phabricator upgrade requires unusually long database migrations - https://phabricator.wikimedia.org/T187143#3971567 (mmodell) @marostegui: that's one of the largest `maniphest_task` tables in the world, despite its size in raw bytes ;)  I'm surprised...
[12:53:18] <jynus>	 lol
[12:53:24] <marostegui>	 xDDDDDD
[12:53:31] <jynus>	 we should show him the 500 GB revision table
[12:53:36] <marostegui>	 or wb_terms
[12:53:37] <marostegui>	 XDDD
[12:53:46] <jynus>	 so check my plan :P
[12:53:55] <jynus>	 I know it mixes things
[12:54:09] <jynus>	 but if we don't do it, I don't know if we will find the time
[12:54:12] <marostegui>	 yeah, the problem is replication
[12:54:25] <jynus>	 they are new columns
[12:54:32] <jynus>	 should be ok
[12:54:47] <jynus>	 not sure if I changed m3 to be row
[12:54:51] <marostegui>	 Don't know, I am fine with it but we need to check with him
[12:54:53] <jynus>	 we can check
[12:54:54] <marostegui>	 if it is doable
[12:55:07] <jynus>	 the thing is, failing over m* hosts is getting behind
[12:55:29] <marostegui>	 yeah, but mixing it with a migration I am not completely sure about it
[12:55:32] <jynus>	 and we need to start replacing those old m* hosts
[12:55:54] <jynus>	 technically, we can migrate only one host, set read only for extended period of time
[12:55:55] <marostegui>	 We should create a task to arrange a failover for phab
[12:55:56] <jynus>	 rever
[12:56:03] <marostegui>	 and start ehre
[12:56:05] <marostegui>	 there
[12:56:23] <jynus>	 revert to the original host if there is any problem
[12:56:30] <jynus>	 then the original one will be decommed eventually
[12:57:25] <marostegui>	 yeah, I get it. But we need to check with him if that is doable, migrating just one host (including all the updates and all that)
[12:57:38] <marostegui>	 I would prefer just to do the migration, and between us arrange a failover day and do it
[12:57:42] <marostegui>	 To avoid mixing more things
[12:58:16] <jynus>	 without him being around?
[12:58:18] <marostegui>	 But again, we need to check with him first of all
[12:58:31] <jynus>	 honestly, we do not have the time to do things well
[12:58:31] <marostegui>	 No, he needs to be around of course, as we are for his migration
[12:58:42] <jynus>	 so we should do things fast
[12:59:02] <marostegui>	 Planning all this for something that will be done at 6AM UTC tomorrow is going too fast I think :)
[12:59:09] <jynus>	 m3 is db1043 -> db1059
[12:59:16] <jynus>	 it was planned a long time ago
[12:59:21] <jynus>	 I did the planning :-)
[12:59:32] <marostegui>	 Along with a data migration?
[13:00:01] <jynus>	 T175679
[13:00:01] <stashbot>	 T175679: Decommission db1048 (was Move m3 slave to db1059) - https://phabricator.wikimedia.org/T175679
[13:00:16] <jynus>	 Sep 12 2017
[13:00:30] <marostegui>	 I know, but what is not planned is the failover
[13:00:49] <marostegui>	 Do you feel comfortable doing it tomorrow along with a data migration?
[13:00:58] <jynus>	 what data migration?
[13:01:19] <marostegui>	 the one that we are discussing in the ticket
[13:01:41] <jynus>	 if you are comfortable with that, which is the new stuff, I am comfortable with the switchover
[13:01:51] <jynus>	 in fact, I think it is safer to do a switchover
[13:01:58] <jynus>	 than doing it without a switchover
[13:02:01] <marostegui>	 The thing is that we are doing both at the same time :)
[13:02:06] <jynus>	 ^
[13:02:23] <marostegui>	 And there are many questions in the air, starting with: can we have a replica with different schema? can he run the migration there first?
[13:02:23] <jynus>	 marostegui: my plan has a rollback plan
[13:02:47] <jynus>	 applying it to the current master does not have a rollback option
[13:02:59] <marostegui>	 it does if we stop the slave before the migration
[13:03:33] <marostegui>	 again, we need to ask him if the migration can be run on the slave first
[13:04:35] <jynus>	 I still do not understand your fear- we already have automatic failover on the servers
[13:04:46] <jynus>	 I can trigger it right now
[13:05:50] <marostegui>	 I don't have any fear, I have questions: can we have a replica with different schema? can he run the migration there first?
[13:06:01] <jynus>	 we do the alter- if replication breaks on the replica, we undo it
[13:06:07] <jynus>	 if it doesn't, we fail over
[13:06:16] <jynus>	 please tell me where is the problem?
[13:06:59] <marostegui>	 The problem is that we don't even have things clear and the migration is happening in a few hours and we haven't even talked to mukunda about this so we don't even know how to run the migration on a single host
[13:07:30] <jynus>	 marostegui: don't worry, I will take care, ok
[13:07:42] <marostegui>	 ok
[13:08:39] <marostegui>	 I will be online at 7am to help
[13:09:41] <jynus>	 well, technically, it would be in 2 days
[13:10:05] <marostegui>	 I believe it is thursday 6AM UTC, that was my understanding from the ticket
[13:10:47] <marostegui>	 https://phabricator.wikimedia.org/T187143#3968928
[13:12:37] <jynus>	 let me run the later right now
[13:12:41] <jynus>	 *the alter
[13:12:46] <jynus>	 so I can tell you
[13:17:06] <marostegui>	 ok
[13:31:38] <wikibugs>	 DBA, Phabricator (2018-02-xx), Release: Upcoming phabricator upgrade requires unusually long database migrations - https://phabricator.wikimedia.org/T187143#3971656 (jcrespo) The alter is done on the slave already- try to see if you can cleanly skip that migration- if there are other alters backwards...
[13:32:47] <wikibugs>	 DBA, Phabricator (2018-02-xx), Release: Upcoming phabricator upgrade requires unusually long database migrations - https://phabricator.wikimedia.org/T187143#3971668 (jcrespo) The switchover will probably require your help restarting apache/phab at the time.
[14:13:53] <Amir1>	 Hey, I enabled a feature in svwiki, itwiki, nlwiki (s2), hywiki, zhwiki, bewiki, glwiki (s3), and frwiki (s6) that might increase the size of wbc_entity_usage a little. In some cases it actually reduces the size drastically (like cawiki, ruwiki, hywiki), but one thing is for sure: the recentchanges table in all of these wikis is starting to shrink. You might need to optimize it. In particular, please do optimize the wbc_entity_usage table in
[14:13:53] <Amir1>	 ruwiki, cawiki, and hywiki next week. It will free up some space
[14:28:07] <jynus>	 that is too much info- copy and paste to a ticket :-)
[14:28:25] <jynus>	 I got lost on "I"
[14:28:42] <Amir1>	 :))))
[14:28:43] <Amir1>	 Sure
[14:29:04] <jynus>	 nothing fancy, FYI with apparently low priority
[14:29:20] <jynus>	 a "FYI ticket", I mean
[14:30:52] <Amir1>	 Will do
[14:31:44] <jynus>	 it looks like low because it doesn't seem urgent of blocking you at first
[14:31:49] <jynus>	 *or blocking
[15:24:13] <wikibugs>	 DBA, Operations, ops-codfw: db2048: RAID with predictive failure - https://phabricator.wikimedia.org/T187328#3972007 (Marostegui) p:Triage>Normal
[15:54:17] <marostegui>	 Nice, this will be fixed in 10.3 (apparently) https://jira.mariadb.org/browse/MDEV-10568
[15:55:30] <wikibugs>	 DBA, Operations, ops-eqiad, Patch-For-Review: Rack and setup db1115 (tendril replacement database) - https://phabricator.wikimedia.org/T185788#3972137 (RobH) a:Cmjohnson>Marostegui @Marostegui:  I'm assigning this to you for stalled until you provide feedback as requested.  Please assign...
[15:55:48] <jynus>	 is that transportable tablespaces for partitioned tables?
[15:56:10] <marostegui>	 yeah
[15:56:29] <jynus>	 maybe too late?
[15:57:06] <marostegui>	 yeah, indeed, especially because it's been there since 5.7 in mysql :(
[16:14:12] <jynus>	 we can always go to #mariadb and troll monty :-)
[16:14:12] <jynus>	 as he does on my talks
[16:31:17] <jynus>	 I am looking at s6 on eqiad
[16:31:41] <jynus>	 right now there is 1 api host and 2 main servers with large servers
[16:31:56] <jynus>	 I was thinking of having 2 api and 2 main
[16:32:19] <jynus>	 by having one with 50/50
[18:02:32] <wikibugs>	 DBA, Data-Services, Goal, Patch-For-Review, cloud-services-team (FY2017-18): Migrate all users to new Wiki Replica cluster and decommission old hardware - https://phabricator.wikimedia.org/T142807#3972846 (madhuvishy)
[18:02:46] <wikibugs>	 DBA, Data-Services, cloud-services-team (Kanban): Review m5 backups - https://phabricator.wikimedia.org/T186585#3972847 (madhuvishy) Thanks for this work @jcrespo!