[06:03:34] <wikibugs>	 DBA, Data-Services, Goal, cloud-services-team (FY2017-18): Create method for accessing user watchlists in database queries - https://phabricator.wikimedia.org/T182948#3839509 (Dispenser) p:Triage>High
[06:04:09] <wikibugs>	 DBA, Data-Services, Goal, cloud-services-team (FY2017-18): Create method for accessing user watchlists in database queries - https://phabricator.wikimedia.org/T182948#3839521 (Dispenser) p:High>Normal
[07:48:52] <wikibugs>	 DBA, Patch-For-Review: run pt-tablechecksum on s5/s8 - https://phabricator.wikimedia.org/T161294#3839619 (Marostegui) wikidatawiki.archive has been fixed.
[08:38:24] <wikibugs>	 DBA, Operations, Patch-For-Review: Decommission db1034 - https://phabricator.wikimedia.org/T182556#3839663 (Marostegui)
[08:42:56] <wikibugs>	 DBA, Operations, hardware-requests, ops-eqiad, Patch-For-Review: Decommission db1034 - https://phabricator.wikimedia.org/T182556#3839665 (Marostegui) a:Marostegui>Cmjohnson @Cmjohnson this host is fully ready to be decommissioned
[09:04:07] <wikibugs>	 DBA, Data-Services, Toolforge: I can't connect to DB replica on Toolforge due to TLS-related failure - https://phabricator.wikimedia.org/T182892#3837792 (jcrespo) > I wonder if there is some signal that is coming from the new db cluster that is triggering your client to attempt TLS protection on the...
[09:30:40] <wikibugs>	 DBA, Operations, Performance-Team, Availability (Multiple-active-datacenters): Perform testing for TLS effect on connection rate - https://phabricator.wikimedia.org/T171071#3839728 (jcrespo) Do you have a set of instructions you run, so I can reproduce and I can check TLS is effectively enabled w...
[09:40:04] <wikibugs>	 DBA, Data-Services, Goal, cloud-services-team (FY2017-18): Create method for accessing user watchlists in database queries - https://phabricator.wikimedia.org/T182948#3839751 (jcrespo) "accessing user watchlists in database queries" could you elaborate what do you want to query? watchlist is a pr...
[09:55:50] <wikibugs>	 DBA, Operations, Patch-For-Review: Rack and setup db1111 and db1112 - https://phabricator.wikimedia.org/T180788#3839764 (Marostegui)
[10:03:38] <jynus>	 I will deploy this now https://gerrit.wikimedia.org/r/398246
[10:03:51] <jynus>	 did you do changes to labsdb1009?
[10:03:55] <jynus>	 I would depool it next
[10:05:54] <marostegui>	 Nope, not touched 1009
[10:05:57] <marostegui>	 you can proceed as you wish
[10:07:54] <jynus>	 do you want to go first or should I? how long does it take?
[10:08:11] <jynus>	 mine should not take more than 30 minutes or so
[10:10:37] <marostegui>	 no, you can go
[10:10:39] <marostegui>	 no problem :)
[11:00:07] <jynus>	 I double checked that failover is soft
[11:00:23] <jynus>	 2017-12-15 10:59:25     labsdb1010
[11:00:24] <jynus>	 2017-12-15 10:59:26     labsdb1011
[11:00:29] <marostegui>	 so it is confirmed that it is soft?
[11:00:34] <jynus>	 but a long-running query
[11:00:59] <jynus>	 finished as expected, having started before failover and finished after
[11:01:28] <jynus>	 I have not done formal testing, but I believe so
[11:01:42] <jynus>	 I've just done a quick test
[11:02:19] <marostegui>	 I wonder about those connections with 0 seconds we saw yesterday after the failover
[11:02:44] <jynus>	 yeah, those are probably long-running connections
[11:02:55] <jynus>	 from persistent connections (pools)
[11:03:11] <jynus>	 the connections don't fail hard
[11:03:24] <jynus>	 so that means they stayed open/running unless reopened
[11:04:17] <jynus>	 that is consistent with my test
[11:04:27] <jynus>	 soft failover means no hard failover
[11:04:40] <jynus>	 which means no killed open connections
[11:06:50] <marostegui>	 probably yeah
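The soft/hard distinction discussed above can be modelled in a few lines. This is a hypothetical toy router, not the actual wiki replica proxy or DNS setup: after a soft failover only new connections go to the new host, so a pooled persistent connection (or a long-running query) opened against labsdb1010 keeps working there until the client reconnects, whereas a hard failover would kill it. The `Router`/`Connection` names and the `hard` flag are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    """A client connection pinned to the backend it was opened against."""
    backend: str
    open: bool = True


@dataclass
class Router:
    """Hypothetical router: new connections go to `active`; a failover either
    leaves existing connections alone (soft) or kills them (hard)."""
    active: str
    connections: list = field(default_factory=list)

    def connect(self) -> Connection:
        conn = Connection(self.active)
        self.connections.append(conn)
        return conn

    def failover(self, new_backend: str, hard: bool = False) -> None:
        self.active = new_backend  # only affects connections opened from now on
        if hard:
            for conn in self.connections:
                conn.open = False  # hard failover: open sessions are killed


router = Router("labsdb1010")
long_query = router.connect()   # started before the failover
router.failover("labsdb1011")   # soft failover
new_conn = router.connect()     # opened after the failover

print(long_query.backend, long_query.open)  # labsdb1010 True
print(new_conn.backend)                     # labsdb1011
```

This matches the observation in the log: a query that started before the switch finishes normally on the old host, and pooled connections keep showing up there until they are reopened.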
[12:09:45] <wikibugs>	 DBA, Operations, hardware-requests, ops-eqiad, Patch-For-Review: Decommission db1021 - https://phabricator.wikimedia.org/T181378#3787781 (MoritzMuehlenhoff) Still showing in servermon, also seems like a missing "puppet node deactivate"
[12:10:35] <wikibugs>	 DBA, Operations, Phabricator, hardware-requests, ops-eqiad: Decommission db1048 (was Move m3 slave to db1059) - https://phabricator.wikimedia.org/T175679#3599964 (MoritzMuehlenhoff) Still showing in servermon, also seems like a missing "puppet node deactivate"
[12:22:00] <jynus>	 marostegui: important https://gerrit.wikimedia.org/r/398450
[12:22:24] <marostegui>	 let's see
[12:22:34] <marostegui>	 oh!
[13:09:27] <wikibugs>	 DBA, Operations, Patch-For-Review: Rack and setup db1111 and db1112 - https://phabricator.wikimedia.org/T180788#3840331 (Marostegui) a:Cmjohnson>Marostegui db1111 has now commonswiki and eowiki there as requested by @daniel. It is replicating s4 (but as spoke, we will remove replication once...
[13:42:09] <wikibugs>	 DBA, Data-Services, Goal, cloud-services-team (FY2017-18): Create method for accessing user watchlists in database queries - https://phabricator.wikimedia.org/T182948#3840406 (Dispenser) I mean literally just that. You can try out the [[http://dispenser.info.tm/~dispenser/cgi-bin/watchlist_point...
[13:59:12] <wikibugs>	 DBA, Data-Services, Goal, cloud-services-team (FY2017-18): Create method for accessing user watchlists in database queries - https://phabricator.wikimedia.org/T182948#3840439 (jcrespo) @bd808 Isn't dumping the *private watchlist* of a user into a public database something we would frown upon? Wou...
[16:22:44] <bd808>	 debezium is a thing I heard about in passing at KubeCon last week that seems interesting -- http://debezium.io/docs/faq/ -- it reads MySQL binlogs and produces events to a Kafka topic describing what changed so that other apps can consume them. Basically distributed on-row-change triggers.
[16:23:07] <bd808>	 It's a FLOSS project from Red Hat
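To make the "distributed on-row-change triggers" description concrete: Debezium emits one change event per row, carrying `before` and `after` row images, a `source` block and an `op` code. The sketch below only parses such an event from a JSON string and prints a trigger-like summary. It assumes Debezium's JSON converter format, skips the Kafka consumer entirely (in practice the string would be read from a Kafka topic), and the `page` table and its values are made up for illustration.

```python
import json

# Debezium op codes: c = create (insert), u = update, d = delete, r = read (snapshot).
OPS = {"c": "insert", "u": "update", "d": "delete", "r": "snapshot read"}


def describe_change(raw: str) -> str:
    """Summarise one Debezium-style change event (JSON text) like a row trigger would."""
    event = json.loads(raw)
    # With the JSON converter's schemas enabled, the row data sits under "payload";
    # with schemas disabled, the message is the payload itself.
    payload = event.get("payload", event)
    code = payload["op"]
    label = OPS.get(code, code)
    table = payload.get("source", {}).get("table", "?")
    before, after = payload.get("before"), payload.get("after")
    if code == "u" and before is not None and after is not None:
        # Report only the columns whose value changed: {column: (old, new)}.
        changed = {k: (before.get(k), v) for k, v in after.items() if before.get(k) != v}
        return f"{label} on {table}: {changed}"
    # Inserts and snapshot reads carry "after"; deletes carry only "before".
    row = after if after is not None else before
    return f"{label} on {table}: {row}"


# Hypothetical event: an update to a row in a `page` table.
sample = json.dumps({
    "payload": {
        "op": "u",
        "source": {"table": "page"},
        "before": {"page_id": 1, "page_len": 100},
        "after": {"page_id": 1, "page_len": 120},
    }
})
print(describe_change(sample))  # update on page: {'page_len': (100, 120)}
```

As jynus points out right after, when you control the application (MediaWiki) emitting events there is usually more efficient than tailing the binlog; a binlog reader like this mainly helps for databases whose application you cannot change.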
[16:24:05] <jynus>	 but why use the database layer if mediawiki (app layer) could do that 1000 times better?
[16:24:38] <bd808>	 yeah, if you have control of the source at the app side that is probably more efficient
[16:24:53] <jynus>	 did you have a usage in mind?
[16:25:08] <bd808>	 I just liked the idea of being able to tap any mysql db using row-based binlogs as a distributed event source
[16:26:01] <bd808>	 For MediaWiki I'm not sure there is an application of it
[16:26:19] <bd808>	 maybe something in our Analytics space
[16:26:22] <jynus>	 doesn't changepropagation do that?
[16:26:37] <jynus>	 or some of those kafka-based streams I do not know about?
[16:27:02] <jynus>	 it has been rewritten so many times I could not keep track of it
[16:27:08] <bd808>	 yeah I think changeprop is a similar idea from the app side of the data stream
[16:27:24] <bd808>	 which as you say is a more efficient place to tap in
[16:27:57] <jynus>	 for other things... it depends on what the intended usage is
[16:28:11] <jynus>	 there are several conversion tools for data event formats
[16:28:28] <jynus>	 e.g. to send mysql data to hadoop
[16:28:36] <jynus>	 or to other datastores
[16:29:44] <jynus>	 we checked some for the filtering, but the truth is in practice a 99% compatibility is not good enough
[16:29:56] <jynus>	 not even a 99.99%
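Why even 99.99% is not good enough becomes clearer in absolute numbers. Assuming a hypothetical table of one billion rows (an illustrative figure, not a measured Wikimedia one), the residual mismatch is still large:

```python
def mismatched_rows(rows: int, compatibility: float) -> int:
    """Rows expected to differ if only a fraction `compatibility` converts correctly."""
    return round(rows * (1 - compatibility))


ROWS = 1_000_000_000  # hypothetical table size, for illustration only

for compatibility in (0.99, 0.9999):
    print(f"{compatibility:.2%} compatible -> {mismatched_rows(ROWS, compatibility):,} rows differ")
```

This prints 10,000,000 differing rows at 99% and still 100,000 at 99.99%: far too many for a replica that people expect to match production.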
[16:30:12] <bd808>	 *nod* people freak out if there is any delta
[16:30:17] <jynus>	 ha
[16:30:21] <jynus>	 I disagree
[16:30:47] <jynus>	 there is a huge data drift between production servers
[16:31:03] <jynus>	 and last word I had from mediawiki core was "not my problem"
[16:31:07] <bd808>	 interesting
[16:31:40] <bd808>	 mostly because of the weird delete stuff where rows get shuffled between tables?
[16:31:56] <jynus>	 so here we are, manuel and I, keeping the ship afloat without even a bucket
[16:32:40] <jynus>	 bd808: I was told that was solved, but existing problems created in the past are "not our concern"
[16:33:16] <jynus>	 then I think there are some smaller-scope issues in some extensions: tag change and others
[16:35:36] <jynus>	 bd808: literally volunteers are way more interested on data inconsistencies than core developers
[16:36:01] <bd808>	 jynus: yeah, that's the group I was thinking of
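The drift between servers discussed above is what a checksum pass such as pt-table-checksum (the `pt-tablechecksum` task earlier in this log) is meant to surface. As a rough sketch of the idea only, not that tool's actual queries: split each table into primary-key ranges, compute a row count plus an order-independent checksum per range on each host, and compare. The chunk size, the `#` column separator and the in-memory row lists are assumptions for illustration.

```python
import zlib
from typing import Dict, Iterable, List, Tuple

Row = Tuple  # (integer primary key, other column values...)


def chunk_checksums(rows: Iterable[Row], chunk_size: int) -> Dict[int, Tuple[int, int]]:
    """Map each primary-key chunk to (row count, XOR of per-row CRC32)."""
    result: Dict[int, Tuple[int, int]] = {}
    for row in rows:
        chunk = row[0] // chunk_size  # group rows by primary-key range
        count, acc = result.get(chunk, (0, 0))
        crc = zlib.crc32("#".join(map(str, row)).encode("utf-8"))
        # XOR is order-independent, so each host may return rows in any order.
        result[chunk] = (count + 1, acc ^ crc)
    return result


def drifted_chunks(primary: Iterable[Row], replica: Iterable[Row], chunk_size: int = 1000) -> List[int]:
    """Return the chunk numbers whose (count, checksum) differ between two hosts."""
    a = chunk_checksums(primary, chunk_size)
    b = chunk_checksums(replica, chunk_size)
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


# Hypothetical rows: the replica has a drifted value in chunk 0 only.
primary = [(1, "Foo", 100), (2, "Bar", 200), (1500, "Baz", 300)]
replica = [(1, "Foo", 100), (2, "Bar", 250), (1500, "Baz", 300)]
print(drifted_chunks(primary, replica))  # [0]
```

Only the chunks reported as drifted then need a row-by-row comparison, which keeps the expensive part of the check small.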
[16:36:21] <jynus>	 anyway, for filtering, the problem is not as much the technology
[16:36:29] <jynus>	 but its administration
[16:37:42] <jynus>	 things will get better: I spent 2 years rebuilding what was already there, but well done
[16:37:57] <jynus>	 now we can think of actually changing things for better
[16:38:32] <jynus>	 labs data was a mess when I entered, now it is literally better than some production hosts
[16:39:08] <chasemp>	 jynus: I mentioned to someone yesterday that we sat down to make the wikireplica plan in Barcelona well over a year ago. But hey, it's there :)
[16:39:25] <marostegui>	 haha indeed, I still remember that meeting
[16:39:31] <jynus>	 chasemp: I thought about it 1 year prior to it
[16:40:07] <jynus>	 I think the new machines were bought in the summer of that year
[16:41:05] <chasemp>	 yes, I believe so 
[16:41:35] <jynus>	 bd808, chasemp, do we need to meet to check pending tasks regarding labsdb1001/2/3?
[16:41:40] <jynus>	 not now, I mean soon
[16:41:57] <jynus>	 I think there is a lot of small backlog?
[16:42:39] <chasemp>	 do you mean figuring out if any of them are still valid in a labsdb10[09|10|11] world?
[16:42:42] <jynus>	 not that we both have a lot of time, but to make sure we are not waiting for each other?
[16:43:02] <jynus>	 well, things like user databases, datasets and other tickets like those
[16:43:32] <bd808>	 jynus: I think it would be good for us to meet and "solve" the user curated data + replication issue
[16:44:04] <jynus>	 well, more than solving (maybe we can meet in person for that):
[16:44:33] <jynus>	 are you ok with everything as it is happening now? are you waiting for us to do some things? etc
[16:44:43] <jynus>	 just that
[16:44:58] <chasemp>	 let's make a plan to chat right after the new year?
[16:45:06] <jynus>	 are you "I know there are things pending but I do not have time"?
[16:45:11] <chasemp>	 and I'll try to go through and figure out what seems like an open question
[16:45:19] <jynus>	 I just want to figure out where we are
[16:45:24] <chasemp>	 definitely
[16:45:29] <jynus>	 we do not have to technically meet
[16:45:38] <jynus>	 email works for me
[16:45:41] <chasemp>	 would like to chat about the toolsdb issue
[16:45:43] <chasemp>	 ok
[16:45:48] <jynus>	 oh, yes
[16:45:50] <jynus>	 that, too
[16:45:59] <jynus>	 machines are installed?
[16:46:15] <jynus>	 so that kind of thing, in which we each have part of the answers
[16:46:30] <chasemp>	 we have physical machines which cloud ppl need to put large VMs on, and figure out how to make it as friendly for you as possible, i.e. not caring that it's an instance if at all possible
[16:46:36] <chasemp>	 right
[16:46:45] <jynus>	 or more likely, we have half* of the answers together and will have to make up the rest :-D
[16:47:03] <jynus>	 I was discussing with manuel the other day
[16:47:21] <jynus>	 to maybe purchase proxies like the ones for labsdbs
[16:47:32] <jynus>	 to make failover easier
[16:47:52] * chasemp nods
[16:47:57] <jynus>	 so as you see, lots of pending topics
[16:48:07] <jynus>	 I have to say, labsdb100[9,10,11]
[16:48:17] <jynus>	 is working nicely in terms of administration
[16:48:22] <chasemp>	 that's really awesome to hear
[16:48:25] <jynus>	 we can upgrade easily, etc.
[16:49:30] <jynus>	 one other thing: whether we will have to create backups of some old databases
[16:49:50] <jynus>	 because "you deleted my data and didn't warn me 2 years in advance"
[16:50:32] <jynus>	 so let's sync at some point by email or in person
[16:50:51] <chasemp>	 I'm putting it on my list to send the initial email to kick off a conversation
[16:51:01] <chasemp>	 and we'll know quickly if a hangout followup makes sense
[16:51:26] <jynus>	 ah, one thing
[16:51:33] <jynus>	 check goal drafts
[16:51:45] <jynus>	 we included labsdb1001/2/3 in the goal to decommission in Q3
[16:52:36] <chasemp>	 that seems reasonable, bd808 can speak to whether that makes sense in our world but I'm sure it does
[17:35:32] <wikibugs>	 DBA, Data-Services, Toolforge: I can't connect to DB replica on Toolforge due to TLS-related failure - https://phabricator.wikimedia.org/T182892#3841104 (bd808) Thanks for the clarification @jcrespo. I'll try to find a place on https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database to highlight...
[17:44:48] <wikibugs>	 DBA, Data-Services, Toolforge: I can't connect to DB replica on Toolforge due to TLS-related failure - https://phabricator.wikimedia.org/T182892#3841116 (bd808) https://wikitech.wikimedia.org/w/index.php?title=Help:Toolforge/Database&diff=1778547&oldid=1777977
[18:10:02] <wikibugs>	 DBA, Data-Services, Goal, cloud-services-team (FY2017-18): Create method for accessing user watchlists in database queries - https://phabricator.wikimedia.org/T182948#3841224 (bd808) I have to agree with @jcrespo that the new state seems more like a feature than a bug. The use case for @Dispenser...
[18:10:24] <wikibugs>	 DBA, Data-Services: Create method for accessing user watchlists in database queries - https://phabricator.wikimedia.org/T182948#3841225 (bd808)
[20:06:41] <wikibugs>	 DBA, Data-Services, Goal, cloud-services-team (FY2017-18): Stop managing account creation for labsdb1001 and 1003 through the maintain-dbusers script - https://phabricator.wikimedia.org/T183029#3841538 (madhuvishy) p:Triage>High
[21:13:36] <wikibugs>	 DBA, Data-Services, Goal, Patch-For-Review, cloud-services-team (FY2017-18): Migrate all users to new Wiki Replica cluster and decommission old hardware - https://phabricator.wikimedia.org/T142807#3841790 (jcrespo) @bd808 We really would need the proposed change now, before the "web" server e...