[00:28:26] <wikibugs>	 10DBA, 10Data-Services, 10Patch-For-Review: Expose ar_content_format and ar_content_model columns of archive table on Labs replicas - https://phabricator.wikimedia.org/T89741#3423430 (10Bawolff) I'm fine with it, provided that its null when ar_deleted&1=1.  This may be mildly paranoia, but I'd like to be str...
[01:07:42] <wikibugs>	 10DBA, 10Data-Services, 10Patch-For-Review: Expose ar_content_format and ar_content_model columns of archive table on Labs replicas - https://phabricator.wikimedia.org/T89741#3423529 (10APalmer_WMF) 05Open>03Resolved
[01:08:00] <wikibugs>	 10DBA, 10Data-Services, 10Patch-For-Review: Expose ar_content_format and ar_content_model columns of archive table on Labs replicas - https://phabricator.wikimedia.org/T89741#1044220 (10APalmer_WMF) 05Resolved>03Open
[05:01:10] <wikibugs>	 10DBA, 10Operations, 10ops-eqiad, 10Patch-For-Review: Reset db1070 idrac - https://phabricator.wikimedia.org/T160392#3423679 (10Marostegui) Nice catch faidon!! Thanks for fixing this and specially thanks for fixing dbstore1001, which is a critical host for us!
[06:55:30] <madhuvishy>	 hey y'all :) cya at 1400 for T168584! have a nice day <3
[06:55:30] <stashbot>	 T168584: Labsdb* servers need to be rebooted - https://phabricator.wikimedia.org/T168584
[06:56:13] <marostegui>	 Sleep weel madhuvishy! :)
[06:56:16] <marostegui>	 well
[07:29:49] <wikibugs>	 10DBA, 10RESTBase-API, 10Reading List Service, 10ArchCom-RfC (ArchCom-Approved), and 4 others: RfC: Reading List service - https://phabricator.wikimedia.org/T164990#3423906 (10jcrespo) It is my intention to, at some point, to convert titles into first-class entities, give them some ids and that way reduce...
[07:47:02] <wikibugs>	 10DBA, 10Mail, 10Operations: Setup database for dmarc service - https://phabricator.wikimedia.org/T170158#3423956 (10jcrespo) a:03herron So we need: db name, account name, grants needed, ips/dns of the origin of the connections.
[07:59:01] <wikibugs>	 10DBA: Drop localisation and localisation_file_hash tables, l10nwiki databases too - https://phabricator.wikimedia.org/T119811#3423982 (10Marostegui) Dropped from s6 (frwiki and jawiki)
[08:18:28] <wikibugs>	 10DBA: Drop localisation and localisation_file_hash tables, l10nwiki databases too - https://phabricator.wikimedia.org/T119811#3424047 (10Marostegui) Dropped from s5 (dewiki)
[08:41:30] <wikibugs>	 10DBA, 10MediaWiki-extensions-WikibaseClient, 10Operations, 10Performance-Team, and 6 others: Cache invalidations coming from the JobQueue are causing lag on several wikis - https://phabricator.wikimedia.org/T164173#3424125 (10aaron) >>! In T164173#3420723, @daniel wrote: > @aaron another question: does Re...
[09:23:30] <wikibugs>	 10DBA, 10RESTBase-API, 10Reading List Service, 10ArchCom-RfC (ArchCom-Approved), and 4 others: RfC: Reading List service - https://phabricator.wikimedia.org/T164990#3424163 (10daniel) >>! In T164990#3421869, @Fjalapeno wrote: > @Daniel we went back and forth on this a bit. I originally proposed ids, but in...
[09:30:48] <wikibugs>	 10DBA, 10Analytics-Kanban, 10Operations, 10Patch-For-Review, 10User-Elukey: Puppetize Piwik's Database and set up periodical backups - https://phabricator.wikimedia.org/T164073#3424171 (10elukey) I want to observe how the patch that I merged behaves during the next days before closing.
[09:35:33] <elukey>	 jynus: o/ - when you have some time can I ask you a couple of questions about the eventlogging_cleaner user on dbstore1002/db1047 ?
[09:39:52] <jynus>	 elukey sure
[09:39:57] <jynus>	 can I call you?
[09:40:12] <jynus>	 I hate my keyboard
[09:40:24] <jynus>	 new one arrives tomorrow
[09:47:48] <elukey>	 jynus: sorry I was grabbing a coffee, sure! 
[09:48:13] <elukey>	 (grabbing my headphones)
[10:12:14] <wikibugs>	 10DBA: Drop localisation and localisation_file_hash tables, l10nwiki databases too - https://phabricator.wikimedia.org/T119811#3424257 (10Marostegui) Dropped from s4 (commonswiki)
[10:50:47] <wikibugs>	 10Blocked-on-schema-change, 10DBA, 10Patch-For-Review: Convert unique keys into primary keys for some wiki tables on s1 - https://phabricator.wikimedia.org/T166204#3424434 (10Marostegui) db1067 is done: ``` root@neodymium:/home/marostegui# for i in `cat s1_tables`; do echo $i; mysql --skip-ssl -hdb1067 enwik...
[10:50:50] <wikibugs>	 10Blocked-on-schema-change, 10DBA, 10Patch-For-Review: Convert unique keys into primary keys for some wiki tables on s1 - https://phabricator.wikimedia.org/T166204#3424435 (10Marostegui)
[10:54:53] <wikibugs>	 10Blocked-on-schema-change, 10DBA, 10Patch-For-Review: Convert unique keys into primary keys for some wiki tables on s1 - https://phabricator.wikimedia.org/T166204#3424437 (10Marostegui)
[11:09:57] <jynus>	 s1 seems to have replication issues
[11:10:06] <jynus>	 as in more lag than usualk
[11:10:16] <jynus>	 on some host on eqiad and codfw 
[11:12:11] <marostegui>	 you seeing that on the aggregated?
[11:13:44] <jynus>	 I am seein everywhere, tendril, icinga, logstash
[11:13:59] <jynus>	 https://logstash.wikimedia.org/goto/2c8c67d873b9b4f7b092951474ede9ef
[11:17:10] <marostegui>	 I was checking my last scap deployment and it was db1067 just being depooled to pooled with weight 0, at 10:56 as perl SAL, so I don't think it could be related
[11:18:03] <jynus>	 pools and depools rarely cause lag
[11:18:19] <jynus>	 either the whole whole thing is down or not
[11:18:20] <marostegui>	 yeah, I was just checking if it could be in anyway related
[11:18:39] <jynus>	 normally when affecting multiple hosts it is a software cause
[12:31:48] <wikibugs>	 10DBA, 10Analytics-Kanban: Purge all old data from EventLogging master - https://phabricator.wikimedia.org/T168414#3424745 (10mforns)
[13:05:35] <wikibugs>	 10DBA, 10RESTBase-API, 10Reading List Service, 10ArchCom-RfC (ArchCom-Approved), and 4 others: RfC: Reading List service - https://phabricator.wikimedia.org/T164990#3424855 (10Tgr) ID change on page undeletion was fixed a while ago (T28123). Deletion and normal recreation still changes the ID (and attempts...
[14:01:43] <madhuvishy>	 jynus: morning :) labsdb1004 reboot?
[14:05:18] <madhuvishy>	 halfak and me are around
[14:05:35] <halfak>	 \o/
[14:07:33] <jynus>	 I was waiting for chris to be around
[14:07:53] <madhuvishy>	 jynus: for 1004? I thought that was only for 1001 and 1003
[14:08:00] <jynus>	 ok
[14:08:05] <jynus>	 let's do it, then
[14:08:23] <jynus>	 any blockers or anything you have to do before hand?
[14:08:39] <madhuvishy>	 jynus: icinga I think
[14:08:45] <jynus>	 yes
[14:08:49] <madhuvishy>	 i can set up downtime
[14:08:59] <jynus>	 let's disable alerts
[14:09:08] <jynus>	 downtime get lost very frequently
[14:09:25] <jynus>	 lately
[14:10:17] <madhuvishy>	 jynus: okay, done
[14:11:01] <jynus>	 lets move to operations-
[14:11:16] <madhuvishy>	 jynus: okay!
[14:22:23] <halfak>	 Looks like we're down?
[14:24:57] <volans>	 halfak: see progress in #wikimedia-operations ;)
[14:25:15] <halfak>	 :) Yeah.  Just saw that after I posted. 
[14:27:38] <halfak>	 All was successful. 
[14:27:40] <halfak>	 Thanks folks
[14:27:48] * halfak retreats to his usually scheduled channels. 
[14:27:50] <halfak>	 o/
[14:28:42] <wikibugs>	 10Blocked-on-schema-change, 10DBA, 10Patch-For-Review: Apply schema change to add 3D filetype for STL files - https://phabricator.wikimedia.org/T168661#3425136 (10Marostegui) #cloud-services-team I am about to start an alter table on sanitarium2's master (db1064) which once done will replicate to sanitarium2...
[14:28:50] <madhuvishy>	 jynus: o/ 
[14:28:50] <jynus>	 so postgres uses default packages
[14:29:04] <jynus>	 which means everything is automatic
[14:29:09] <jynus>	 start, stop
[14:29:12] <madhuvishy>	 right
[14:29:20] <jynus>	 I manually in this case did
[14:29:32] <jynus>	 systemctl stop postgresql
[14:29:35] <jynus>	 but just to be safe
[14:29:47] <madhuvishy>	 ah before rebooting?
[14:29:51] <jynus>	 mysql is a bit more involved because we use the same ideas than on production
[14:30:05] <jynus>	 yes, but almost 100% sure it is not needed
[14:30:05] <madhuvishy>	 but when it came up it automatically started postgres - okay
[14:30:08] <jynus>	 yes
[14:30:18] <jynus>	 mysql, we like to handle it manually
[14:30:32] <jynus>	 that means managed => false on puppet
[14:30:43] <jynus>	 except on trivial services
[14:30:44] <madhuvishy>	 right, okay
[14:30:53] <jynus>	 and automatic stuff is deleted from the package
[14:31:11] <jynus>	 so in this case there are 2 important stuff
[14:31:26] <jynus>	 STOP ALL SLAVES before stop
[14:31:45] <jynus>	 and manually doing /etc/init.d/mysql stop 
[14:31:53] <jynus>	 the reason is that
[14:32:08] <jynus>	 it can take 30 minutes to shutdown a server
[14:32:16] <jynus>	 and the os won't wait
[14:32:24] <jynus>	 creating in some cases corruption
[14:32:24] <madhuvishy>	 ah right
[14:32:46] <jynus>	 so init.d is still used on jessie
[14:33:00] <jynus>	 stretch finally gives us systemd support
[14:33:12] <jynus>	 (mariadb, actually, but you get the idea)
[14:33:30] <jynus>	 so systemctl mariadb stop on strech+
[14:33:52] <madhuvishy>	 right, makes sense
[14:33:54] <jynus>	 the only extra thing is that  I upgraded mysql
[14:34:09] <jynus>	 probably something you will not be doing on your own etc
[14:34:17] <jynus>	 but just out of completeness
[14:34:30] <jynus>	 I started mysql with slave stopped
[14:34:40] <madhuvishy>	 right - may be i'll be a dba someday ;) go on etc!
[14:34:50] <jynus>	  /etc/init.d/mysql start --skip-slave-start
[14:34:59] <jynus>	 then mysql_upgrade --skip-ssl
[14:35:21] <jynus>	 and then mysql --skip-ssl -> START SLAVE
[14:35:33] <jynus>	 on stretch, the skip-slave start
[14:35:37] <jynus>	 will be
[14:35:51] <madhuvishy>	 jynus: you ran all this post-reboot?
[14:35:57] <jynus>	 yes
[14:36:08] <jynus>	 or after mysql start if reboot
[14:36:40] <jynus>	 importatant before pooling it back/m*rk it as active
[14:37:08] <jynus>	 systemctl set-environment MYSQLD_OPTS="--skip-slave-start"
[14:37:19] <jynus>	 mysql_upgrade --skip-ssl
[14:37:26] <jynus>	 systemctl unset-environment MYSQLD_OPTS
[14:37:37] <jynus>	 then start the slave the same way
[14:37:46] <jynus>	 this should be all on the documenation for mariadb
[14:37:53] <madhuvishy>	 so, you stop all slaves before reboot, reboot, and then start back mysql with --skip-slave-start, run the upgrade, and finally start the slaves
[14:37:54] <jynus>	 and if it is not, I will add it soon
[14:38:01] <jynus>	 exactly
[14:38:07] <jynus>	 same here as in production
[14:38:07] <madhuvishy>	 cool
[14:38:12] <jynus>	 so the idea
[14:38:30] <jynus>	 is that it requires a bit more of work as a conscient decision
[14:38:40] <jynus>	 normally the package does that every time
[14:38:47] <jynus>	 but that is dangereous in some cases
[14:39:05] <jynus>	 here we control when to do it, what kind of maintenance to do, etc.
[14:39:20] <jynus>	 not start it automatically
[14:39:29] <madhuvishy>	 oh okay understood
[14:39:31] <jynus>	 because if it crashed, we want to check it first
[14:39:50] <jynus>	 just wanted to share what I just did- maybe not useful
[14:40:04] <madhuvishy>	 definitely useful :) thank you
[14:40:50] <jynus>	 I will now undo the alerts disabling
[14:41:04] <madhuvishy>	 jynus: awesome thanks
[14:41:23] <madhuvishy>	 jynus: for the upcoming 1001 and 1003 reboots, Chris is back, do you think we should do one of them thursday?
[14:41:34] <jynus>	 mmm
[14:41:44] <jynus>	 is manuel around?
[14:41:48] <madhuvishy>	 i'm happy to wait till next week too
[14:42:05] <madhuvishy>	 to make sure the dust settles after tomorrow's reboot
[14:42:06] <jynus>	 my biggest concern is what if they do not come back
[14:42:26] <jynus>	 I think manuel was almost finishing the new hosts
[14:42:33] <jynus>	 and don't want to pressure him
[14:42:41] <jynus>	 but that would be a great plan B
[14:42:41] <marostegui>	 Yeah, "almost"
[14:42:43] <jynus>	 plus
[14:42:50] <jynus>	 announcement
[14:43:19] <jynus>	 in the ideal world, we should be able to send thing with 1 day in advance
[14:43:20] <moritzm>	 if these are being replaced with new hosts and if there's a real risk of those not coming back, let's rather avoid the reboots of 1001/1003
[14:43:32] <jynus>	 moritzm: but if we keep doing that
[14:43:37] <jynus>	 we will never know
[14:43:42] <jynus>	 plus, the replacements
[14:43:54] <marostegui>	 I am "almost" done with the new hosts, but it can take some days until they have imported the rest 3 shards
[14:43:57] <jynus>	 are not going to be 100% ready
[14:44:01] <jynus>	 yeah
[14:44:11] <moritzm>	 ok, I thought this was a drop-in replacement of some sort
[14:44:14] <jynus>	 so I wanted a colective decision
[14:44:17] <jynus>	 moritzm: not at all
[14:44:19] <jynus>	 same service
[14:44:21] <jynus>	 but many changes
[14:44:39] <jynus>	 eventual replacement, but the user side of things is problematic
[14:45:13] <jynus>	 I don't know, maybe I am too negative
[14:45:20] <chasemp>	 if we can wait on reboot we should 
[14:45:34] <jynus>	 what are the chances also
[14:45:37] <chasemp>	 404 and 950 days of uptime on 5-ish year old servers?
[14:45:42] <jynus>	 of both not coming back?
[14:45:53] <marostegui>	 Yeah chasemp, that is the thing :(
[14:46:01] <madhuvishy>	 if they haven't been rebooted in that long, we have no idea I guess
[14:46:05] <marostegui>	 If I had to reboot one, I would try the 404 uptime one first
[14:46:17] <chasemp>	 and we can't lose a single disk iirc it's raid 0 for data
[14:46:24] <jynus>	 not only that
[14:46:29] <jynus>	 before we run out of spave
[14:46:48] <jynus>	 we enlarged the disk with a plain parition
[14:46:53] <chasemp>	 disks are the most likely thing to fail on a reboot after that long 
[14:46:57] <jynus>	 yeah
[14:47:01] <jynus>	 that is my fear
[14:47:09] <chasemp>	 If we can help it I vote wait
[14:47:10] <jynus>	 that is my feari do not want to be a jerk
[14:47:17] <chasemp>	 in a few weeks we would hve a viable plan b even if painful?
[14:47:18] <madhuvishy>	 it seems like we should hold off to me too
[14:47:21] <jynus>	 but this is the part where we give you the facts
[14:47:34] <jynus>	 and you kind of value them and take a decision
[14:47:38] <chasemp>	 as long as moritzm says we can afford to hold off we should
[14:47:56] <chasemp>	 we are already walking the tight rope let's not also juggle chainsaws :)
[14:47:57] <jynus>	 being a devils advocate
[14:48:01] <jynus>	 on the other side
[14:48:18] <jynus>	 they are the most common ones to be breached for security issues
[14:48:25] <jynus>	 and most needing an upgrade
[14:48:39] <chasemp>	 understood
[14:49:11] <jynus>	 when we finish setting up 9-11
[14:49:22] <chasemp>	 marostegui: are we something like +/- a month from all replicas on 9-11?
[14:49:39] <jynus>	 we should announce them right away
[14:49:43] <jynus>	 as beta
[14:49:46] <jynus>	 or something
[14:49:51] <jynus>	 opt-in
[14:49:57] <chasemp>	 agreed
[14:50:01] <jynus>	 and try to aleviate some load
[14:50:06] <marostegui>	 chasemp: I would say less than a month
[14:50:10] <jynus>	 that will help
[14:50:28] <moritzm>	 so, these servers have the immediate attack vectors plugged (glibc ld.so, exim and sudo), the reboot is required to fix the underlying problem on the kernel level
[14:51:13] <madhuvishy>	 right, okay
[14:51:44] <moritzm>	 if there's a risk of some data loss (and raid0 for the data partion sounds a bit like it), we can also hold this back, but it would be good if we setup teh new servers in a way that it allows us to also reboot these servers in the future
[14:52:09] <chasemp>	 moritzm: sounds good, thank you
[14:52:17] <jynus>	 moritzm: the new ones
[14:52:21] <jynus>	 have that
[14:52:40] <jynus>	 we have haproxy in the middle, and it is extremely simple
[14:52:47] <moritzm>	 lets avoid trouble, then and fix this by migrating to the new servers
[14:52:50] <jynus>	 to the point that it was done weeks ago
[14:53:26] <jynus>	 but we need to compromise to push that
[14:53:33] <jynus>	 once it is ready
[14:54:05] <jynus>	 which is when we give cloud the keys to the porche and wish you good luck :-D
[14:54:18] <marostegui>	 hahahaha
[14:54:29] <chasemp>	 then we crash it and ask our parents for another 
[14:54:33] <madhuvishy>	 ok great: summary - We wait on labsdb1001 and 1003 reboots for now, wait for labsdb1009-11 to be available as functional alternatives for users, and then when the usage for 1001 and 1003 has dropped somewhat, we schedule reboots for 1 and 3.
[14:54:34] <jynus>	 he he
[14:54:36] <madhuvishy>	 :D
[14:55:09] <chasemp>	 madhuvishy: that is my understanding as well, and that should be able to happen within the next 6 weeks it sounds like
[14:55:24] <madhuvishy>	 right, I'll update on task :)
[14:55:42] <jynus>	 chasemp: it is also a good reason
[14:55:49] <jynus>	 to push people to adopt them
[14:55:56] <wikibugs>	 10DBA, 10Operations, 10Scoring-platform-team, 10cloud-services-team: Labsdb* servers need to be rebooted - https://phabricator.wikimedia.org/T168584#3425238 (10madhuvishy)
[14:56:11] <jynus>	 "they will be decomissioned no longer than X due to hardware and security concerns"
[14:56:17] <chasemp>	 definitely, stick and carrot 
[14:56:39] <jynus>	 you should definitely start thinking about the user dbs
[14:56:52] <jynus>	 maybe with the new VM hosts
[14:57:06] <jynus>	 (not sure how that is going)
[14:57:16] <chasemp>	 I think teh policy decision has to be made before the technical ones on the user dbs there
[14:57:21] <chasemp>	 and we haven't had a real final thought on it
[14:57:28] <chasemp>	 jynus: thosee are ordered and that's about it
[14:57:30] <jynus>	 I know
[14:57:37] <jynus>	 just a reminder
[14:57:49] <jynus>	 so we do not leave those decisions for the last minute
[14:57:56] <chasemp>	 everything in me wants to say no surviving user dbs on replicas
[14:57:58] <chasemp>	 right
[14:58:02] <jynus>	 yeah
[14:58:08] <jynus>	 but....
[14:58:21] <jynus>	 (elpisis, just elipsis)
[14:58:33] <chasemp>	 this is where we make bryan teh bad guy and stand behind him while the arrows fly
[14:58:34] <chasemp>	 :)
[14:58:45] <chasemp>	 j/k idk what we'll do
[14:59:00] <wikibugs>	 10DBA, 10Operations, 10Scoring-platform-team, 10cloud-services-team: Labsdb* servers need to be rebooted - https://phabricator.wikimedia.org/T168584#3425254 (10madhuvishy) Status: labsdb1005 reboot is scheduled for July 12 at 1400 UTC.  We've decided to wait on labsdb1001 and 1003 reboots for now - given t...
[14:59:30] <chasemp>	 jynus: will you two be in montreal by chance?
[14:59:39] <chasemp>	 good time to sit down and talk since it's so nuanced
[14:59:49] <marostegui>	 I will be around
[14:59:52] <marostegui>	 Jaime won't
[15:00:28] <chasemp>	 let's plan on bouncing some ideas and reporting back marostegui
[15:00:39] <marostegui>	 sure!
[15:00:40] <chasemp>	 if jynus is ok w/ that
[15:00:44] <jynus>	 I would involve some prominent users
[15:00:49] <jynus>	 as in
[15:00:58] <jynus>	 asking how they use it
[15:01:04] <chasemp>	 kinda what I'm thinking too, we can set up a table to take concerns possibly
[15:01:07] <jynus>	 and maybe 90% are trivial changes
[15:01:20] <chasemp>	 we have a few "posters" and designated appearances we can use to recruit some use cases 
[15:01:26] <jynus>	 like people using replicas becase they do not know better
[15:01:44] <jynus>	 or one-time usages
[15:01:51] <chasemp>	 halfak is a good proxy for advanced users too so I want to bend his ear a bit
[15:01:55] <jynus>	 which could be maybe solved
[15:03:01] <chasemp>	 thanks jynus marostegui moritzm madhuvishy :) brb before a meeting
[15:03:39] <madhuvishy>	 yup, thanks y'all :) /me -> meeting too
[15:05:29] <jynus>	 marostegui: https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?host=db1096
[15:06:17] <marostegui>	 nice!!!
[15:11:49] <wikibugs>	 10DBA, 10RESTBase-API, 10Reading List Service, 10ArchCom-RfC (ArchCom-Approved), and 4 others: RfC: Reading List service - https://phabricator.wikimedia.org/T164990#3425302 (10Fjalapeno) @tgr thanks… I forgot to mention the summary lookup in Cassandra which doesn't support ids
[15:34:48] <wikibugs>	 10DBA, 10Operations, 10Patch-For-Review: eqiad rack/setup 11 new DB servers - https://phabricator.wikimedia.org/T162233#3425420 (10Cmjohnson) @jcrespo: the issue should be resolved. The cable was in the wrong eth port.  Confirmed MAC cmjohnson@asw-b-eqiad> ... ethernet-switching table brief |grep ge-5/0/5...
[15:41:00] <wikibugs>	 10DBA, 10Operations, 10Patch-For-Review: eqiad rack/setup 11 new DB servers - https://phabricator.wikimedia.org/T162233#3425465 (10jcrespo) May I ask you to check db1100, db1104 and db1105- probably the same issue.
[15:42:54] <wikibugs>	 10DBA, 10Operations, 10Patch-For-Review: eqiad rack/setup 11 new DB servers - https://phabricator.wikimedia.org/T162233#3425480 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by jynus on neodymium.eqiad.wmnet for hosts: ``` ['db1098.eqiad.wmnet'] ``` The log can be found in `/var/log/wmf-auto-re...
[15:49:21] <wikibugs>	 10DBA, 10Patch-For-Review: Refactor puppet mariadb class to support multi-instance hosts - https://phabricator.wikimedia.org/T169514#3400308 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by jynus on neodymium.eqiad.wmnet for hosts: ``` ['db1096.eqiad.wmnet'] ``` The log can be found in `/var/log/...
[15:50:37] <wikibugs>	 10DBA, 10Operations, 10Patch-For-Review: eqiad rack/setup 11 new DB servers - https://phabricator.wikimedia.org/T162233#3425524 (10jcrespo)
[16:11:48] <wikibugs>	 10DBA, 10Patch-For-Review: Refactor puppet mariadb class to support multi-instance hosts - https://phabricator.wikimedia.org/T169514#3425594 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['db1096.eqiad.wmnet'] ```  and were **ALL** successful.
[16:31:58] <wikibugs>	 10DBA, 10Operations, 10Patch-For-Review: eqiad rack/setup 11 new DB servers - https://phabricator.wikimedia.org/T162233#3425748 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['db1098.eqiad.wmnet'] ```  and were **ALL** successful.
[16:44:40] <wikibugs>	 10DBA, 10Analytics-EventLogging, 10Analytics-Kanban, 10Contributors-Analysis, and 5 others: Drop tables with bad data: mediawiki_page_create_1 mediawiki_revision_create_1 - https://phabricator.wikimedia.org/T169781#3425892 (10kaldari) 05Open>03Resolved Looks good to me!
[16:45:01] <wikibugs>	 10DBA, 10Analytics-EventLogging, 10Analytics-Kanban, 10Contributors-Analysis, and 4 others: Drop tables with bad data: mediawiki_page_create_1 mediawiki_revision_create_1 - https://phabricator.wikimedia.org/T169781#3425895 (10kaldari)
[16:59:02] <wikibugs>	 10DBA, 10Operations, 10Patch-For-Review: eqiad rack/setup 11 new DB servers - https://phabricator.wikimedia.org/T162233#3425966 (10jcrespo)
[16:59:11] <wikibugs>	 10DBA, 10Operations, 10Patch-For-Review: eqiad rack/setup 11 new DB servers - https://phabricator.wikimedia.org/T162233#3156297 (10jcrespo)
[18:42:35] <wikibugs>	 10DBA, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Create a user for the eventlogging_cleaner script on the analytics slaves - https://phabricator.wikimedia.org/T170118#3426549 (10Nuria)
[19:00:23] <wikibugs>	 10DBA, 10Analytics, 10Analytics-EventLogging: dbstore1002 crashed - https://phabricator.wikimedia.org/T170308#3426597 (10jcrespo)
[19:06:24] <wikibugs>	 10DBA, 10Operations, 10Patch-For-Review: eqiad rack/setup 11 new DB servers - https://phabricator.wikimedia.org/T162233#3426638 (10Cmjohnson) @jcrespo db1100, 1105 were the same issue db1104 is something else. I will update once I figure it out
[19:19:00] <wikibugs>	 10DBA, 10Analytics, 10Analytics-EventLogging: dbstore1002 crashed - https://phabricator.wikimedia.org/T170308#3426698 (10Marostegui) Yeah, big alter on s1 tables (adding PK) was running at the time :-(
[19:20:58] <wikibugs>	 10DBA, 10Analytics, 10Analytics-EventLogging: dbstore1002 crashed - https://phabricator.wikimedia.org/T170308#3426713 (10Marostegui) I will try the alters tomorrow to see if they go thru or if this host cannot cope with such big ones (which will be worrying)
[19:37:52] <wikibugs>	 10DBA, 10Analytics, 10Analytics-EventLogging: dbstore1002 crashed - https://phabricator.wikimedia.org/T170308#3426812 (10jcrespo) At least x1 broke- no time to reimport now.
[19:38:59] <wikibugs>	 10DBA, 10Operations, 10Patch-For-Review: eqiad rack/setup 11 new DB servers - https://phabricator.wikimedia.org/T162233#3426824 (10Cmjohnson) @jcrespo db1104 is fixed, vlan conflict.
[19:40:13] <wikibugs>	 10DBA, 10Epic, 10Tracking: Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking) - https://phabricator.wikimedia.org/T54921#3426839 (10jcrespo)
[19:42:03] <wikibugs>	 10DBA, 10Mail, 10Operations: Setup database for dmarc service - https://phabricator.wikimedia.org/T170158#3426863 (10herron)
[19:43:18] <wikibugs>	 10DBA, 10Analytics, 10Security, 10Wikimedia-Incident: MySQL password for research@analytics-store.eqiad.wmnet publicly revealed - https://phabricator.wikimedia.org/T170066#3426873 (10Legoktm)
[20:05:51] <wikibugs>	 10DBA, 10Analytics, 10Analytics-EventLogging: dbstore1002 crashed - https://phabricator.wikimedia.org/T170308#3427081 (10Marostegui) Oh if a shard at least broke, then I won't try this alter again as it could corrupt another shard and we might need to even reimport it. We will need to skip this host and leav...
[20:24:48] <wikibugs>	 10DBA, 10Patch-For-Review: Refactor puppet mariadb class to support multi-instance hosts - https://phabricator.wikimedia.org/T169514#3427276 (10jcrespo) So after lots of changes, db1096 is running right now with 7 mysql instances (they are empty), usable and with icinga monitoring.   `systemctl start mariadb@s...
[21:04:08] <wikibugs>	 10DBA, 10Operations, 10Wikimedia-Site-requests, 10Patch-For-Review: Create CoC committee private wiki - https://phabricator.wikimedia.org/T165977#3427683 (10Dereckson) a:03Dereckson Wiki scheduled for creation 2017-07-12 10:00–13:00 UTC.
[21:54:31] <wikibugs>	 10DBA: db2019 has performance issues, replace disk or switchover s4 master elsewhere - https://phabricator.wikimedia.org/T170351#3428001 (10jcrespo)