[03:48:45] <wikibugs>	 10DBA, 10Beta-Cluster-Infrastructure, 06Release-Engineering-Team, 07WorkType-Maintenance: Upgrade mariadb in deployment-prep from Precise/MariaDB 5.5 to Jessie/MariaDB 5.10 - https://phabricator.wikimedia.org/T138778#2409864 (10Nikita13311331) Parent Tasks T139044: enable gtid on the beta maria...
[06:26:04] <_joe_>	 jynus: whenever you're around/available, I would like to discuss puppetdb
[06:59:48] <wikibugs>	 10DBA, 10Flow, 03Collab-Team-Q1-July-Sep-2016, 13Patch-For-Review: Cleanup ptwikibooks conversion - https://phabricator.wikimedia.org/T119509#2535876 (10matthiasmullie) @jcrespo Sure. Does 10AM CEST work for you to start the backups? I'll be around.
[07:13:49] <jynus>	 _joe_, let me restart first
[07:23:22] <_joe_>	 jynus: /win 19
[07:23:24] <_joe_>	 argh
[07:23:41] <_joe_>	 jynus: so, let me know when you're available
[07:24:13] <wikibugs>	 10DBA, 10Flow, 03Collab-Team-Q1-July-Sep-2016, 13Patch-For-Review: Cleanup ptwikibooks conversion - https://phabricator.wikimedia.org/T119509#2535920 (10jcrespo) Yes, contact me on IRC and I will tell you about its progress and when they finish.
[07:30:39] <jynus>	 _joe_, tell me
[07:31:12] <_joe_>	 jynus: so my basic idea for postgres/puppetdb is:
[07:31:30] <_joe_>	 1) have a master postgres server in eqiad, replicated to codfw
[07:31:54] <_joe_>	 2) have puppetdb (the application) connect to the master from all datacenters where it's installed
[07:32:31] <_joe_>	 3) in case of a failure, I'd just point the application to the slave database, and we can then find out how to recover from failure
[07:33:05] <_joe_>	 keep in mind that a failure is something we want to be able to recover from, not necessarily in less than half an hour
[07:33:13] <volans>	 _joe_: which replication method do you plan to use for postgres? 
[07:33:29] <jynus>	 it is not as direct as mysql, yes
[07:33:57] <_joe_>	 volans: not sure which one we're using in postgresql::master, let me check
[07:34:47] <_joe_>	 hot_standby = on
[07:36:54] <_joe_>	 jynus: what I'd like to get is a) a general sanity check of the idea b) AFAICT, there is no backup mechanism in place for postgres
[07:37:44] <jynus>	 well, I do not think puppet requires a lot of safety, does it?
[07:38:11] <_joe_>	 not a /lot/, we just need to be sure we don
[07:38:15] <jynus>	 current puppet db could be wiped and nothing would happen, probably
[07:38:16] <_joe_>	 t lose everything
[07:38:25] <_joe_>	 we'd lose a few things
[07:38:33] <_joe_>	 specifically icinga
[07:38:41] <_joe_>	 and some other things that are pretty important
[07:38:49] <jynus>	 I am unsure about cross-dc queries
[07:39:56] <_joe_>	 all via ssl
[07:40:01] <jynus>	 do we use TLS? would it work?
[07:40:05] <_joe_>	 replication supports ssl too
[07:40:16] <_joe_>	 yes TLS is supported
[07:40:24] <_joe_>	 by puppetdb
[07:40:48] <jynus>	 currently, we failover to a server on the same dc
[07:40:59] <jynus>	 that simplifies things
[07:41:53] <volans>	 in theory it would be better to have the clients use SSL too, not only the replication if the clients are doing cross-dc queries
[07:42:06] <_joe_>	 volans: read above
[07:42:11] <_joe_>	 09:40 < _joe_> yes TLS is supported
[07:42:11] <_joe_>	 09:41 < _joe_> by puppetdb
[07:42:13] <jynus>	 while having a replica on the remote dc but no client, I think
[07:42:31] <jynus>	 do we have/will we have a client on codfw?
[07:42:35] <volans>	 oops, I missed that line :)
[07:42:54] <_joe_>	 yes
[07:43:06] <_joe_>	 we will have puppetdb running both in codfw and eqiad
[07:43:27] <_joe_>	 and in the standard config, local puppetmasters will communicate with the local puppetdb
[07:44:14] <jynus>	 ?
[07:44:26] <jynus>	 that will not work with the slave
[07:44:28] <jynus>	 ah
[07:44:32] <_joe_>	 puppetdb will then point to the master postgres
[07:44:33] <jynus>	 you mean puppet db the app
[07:44:36] <jynus>	 ok ok
[07:44:37] <_joe_>	 yes
[07:44:47] <_joe_>	 the piece-of-crap java/clojure thing
[07:44:56] <jynus>	 looks good, just doesn't look easy
[07:45:26] <jynus>	 for example, in case of eqiad down, you have to turn the replication direction
[07:45:42] <jynus>	 which is what I tried to avoid on all our mysql machines
[07:45:55] <_joe_>	 I know
[07:46:00] <jynus>	 but you seemed happy about it for redis
[07:46:05] <_joe_>	 that's why I am asking you
[07:47:59] <akosiaris>	 ehlo
[07:48:23] <_joe_>	 jynus: if you have better options, I'm all ears
[07:48:41] <_joe_>	 keep in mind we can't modify puppetdb-the-app :P
[07:49:07] <akosiaris>	 we can, we just won't
[07:49:14] <akosiaris>	 at least I won't
[07:49:20] <akosiaris>	 and I'll advise anyone against it
[07:50:03] <_joe_>	 come on let's be realistic, no one is ever doing that :P
[07:50:06] <jynus>	 can I ask why puppet took such a radical change?
[07:50:15] <akosiaris>	 er...
[07:50:24] <akosiaris>	 not sure how to answer that politely
[07:50:27] <_joe_>	 jynus: yes you can ask
[07:50:27] <jynus>	 suddenly using a separate piece for the storage
[07:50:49] <_joe_>	 not suddenly, puppetdb has been around for 6 years i think
[07:50:50] <akosiaris>	 lemme find where they justify it
[07:50:59] <_joe_>	 we just avoided it until it was possible
[07:51:01] <jynus>	 instead of supporting directly postgres/mysql/sqlite
[07:51:32] <akosiaris>	 Actually, PuppetDB isn’t written in Java at all! It’s written in a language called Clojure, which is a dialect of Lisp that runs on the Java Virtual Machine (JVM). Several other languages were prototyped, including Ruby and JRuby, but they lacked the necessary performance. We chose to use a JVM language because of its excellent libraries and high performance. Of the available JVM languages, we used Clojure because of its expre
[07:51:46] <_joe_>	 akosiaris: you keep pasting that :P
[07:51:51] <akosiaris>	 I just love it!!!!
[07:52:20] <akosiaris>	 so jynus the answer is expressiveness, performance and prior experience
[07:52:27] <jynus>	 so they have a ruby-based app
[07:52:31] <_joe_>	 akosiaris: and in the next iteration, puppet-server!!!1!!1!
[07:52:36] <_joe_>	 jynus: not anymore
[07:52:42] <akosiaris>	 now as to why they decided to ditch the rails based framework and write their own
[07:52:50] <jynus>	 what?
[07:52:59] <akosiaris>	 puppet 5 is not ruby
[07:53:04] <_joe_>	 puppet 4.0 has a clojure server
[07:53:05] <jynus>	 WHAT?
[07:53:08] <akosiaris>	 it's clojure
[07:53:11] <_joe_>	 which is the recommended one
[07:53:17] <jynus>	 what about the custom functions?
[07:53:18] <_joe_>	 it still reads ruby via jruby
[07:53:26] <_joe_>	 that's a kind of magic!
[07:53:37] <akosiaris>	 my take is we're screwed
[07:53:43] <akosiaris>	 that thing sounds magic already on paper
[07:53:45] <akosiaris>	 imagine reality
[07:54:07] <volans>	 and they say that they will never support mysql for lack of features like recursive queries
[07:54:09] <akosiaris>	 thankfully they have stdlib to support so it might just work
[07:54:18] <akosiaris>	 volans: or any other DB more or less
[07:54:26] <akosiaris>	 well they support HSQLDB
[07:54:43] <akosiaris>	 but that's like saying "I support sqlite!!!"
[07:55:06] <akosiaris>	 not really useful for anything that needs high performance
[07:56:20] <jynus>	 I do not really have much to add, other than I would prefer within-dc failover
[07:57:16] <jynus>	 1 active; 2 passives on each dc
[07:57:36] <jynus>	 but if you are short on resource go on with your plan
[07:57:51] <_joe_>	 we can add local slaves easily, yes
[07:58:07] <_joe_>	 but for starters, I think this is enough
[07:58:11] <jynus>	 it is that usually you have to do maintenance
[07:58:19] <_joe_>	 yes
[07:58:28] <jynus>	 and doing a dc failover just for an upgrade is annoying
[07:58:32] <akosiaris>	 well, changing the master in a postgres env is a mess
[07:58:44] <akosiaris>	 it's just barely supported in reality
[07:58:50] <volans>	 agree with akosiaris 
[07:59:04] <_joe_>	 thank you puppetlabs
[07:59:17] <jynus>	 I am thinking more about the client, but ok
[07:59:19] <akosiaris>	 recursive queries man.. recursive queries!
[07:59:25] <_joe_>	 yo
[07:59:48] <akosiaris>	 there is ofc the fact that jaime has a very valid point 
[07:59:54] <jynus>	 recursive queries are nice if you know what you are doing
[08:00:05] <jynus>	 especially in tree-like structures that I suppose puppet handles
[08:00:20] <akosiaris>	 codfw clients would be using the local puppetDB but that puppetDB will have to be using the EQIAD postgres
[08:00:34] <akosiaris>	 which is gonna be causing a slowdown for codfw clients
[08:00:41] <jynus>	 first thing I would do is test that
[08:00:42] <akosiaris>	 but I see no solution tbh
[08:00:49] <jynus>	 ok
[08:00:55] <jynus>	 there is one possibility
[08:01:06] <jynus>	 but I do not know if possible
[08:01:22] <akosiaris>	 actually there are solutions... I just hate them all
[08:01:23] <jynus>	 dividing the farm?
[08:01:30] <akosiaris>	 things like pgpool and pgpool 2
[08:01:35] <_joe_>	 jynus: alas nope
[08:01:37] <jynus>	 nah
[08:02:38] <_joe_>	 I have to be honest, I know nothing about postgres failover
[08:02:56] <akosiaris>	 _joe_: read the uber blog now
[08:02:59] <jynus>	 what about having only 1 active puppet db?
[08:03:00] <_joe_>	 been using mysql for way too long and completely lost track of postgres around 2006 or so
[08:03:01] <akosiaris>	 you will understand why it is a mess
[08:03:27] <akosiaris>	 TL;DR postgres replication is basically on-disk data
[08:03:30] <_joe_>	 akosiaris: if we assume we have weak consistency requirements, it's still a mess?
[08:03:31] <jynus>	 does puppet db handle connections?
[08:03:48] <jynus>	 the app, I mean
[08:03:49] <akosiaris>	 _joe_: yes. cause postgres does not allow that
[08:03:59] <akosiaris>	 you either have everything or you don't
[08:04:01] <_joe_>	 jynus: connections to postgres?
[08:04:05] <jynus>	 yes
[08:04:09] <_joe_>	 jynus: yes
[08:04:14] <akosiaris>	 there is practically nothing to configure about consistency in postgres
[08:04:36] <_joe_>	 akosiaris: so replication is not going to help us in any way?
[08:04:37] <akosiaris>	 that WALs need to be applied as is
[08:04:39] <jynus>	 what about puppetdb communication to the rest of the app
[08:04:49] <jynus>	 is it separated or is it part of the app?
[08:05:00] <akosiaris>	 jynus: the puppetmasters talk to puppetDB over SSL
[08:05:01] <_joe_>	 puppetdb is contacted from puppet via a REST API
[08:05:10] <akosiaris>	 it's a REST API
[08:05:10] <volans>	 akosiaris, _joe_ how/from where/who will write to this DB?
[08:05:11] <_joe_>	 over ssl, yes
[08:05:23] <akosiaris>	 volans: puppetmasters per DC
[08:05:24] <_joe_>	 volans: only puppetdb will write to the db
[08:05:27] <jynus>	 have you thought about not separating puppet DB, but talking to a single puppet db instance?
[08:05:43] <_joe_>	 jynus: then the problem is it's a SPOF
[08:05:52] <jynus>	 no, we have another passive
[08:05:56] <_joe_>	 the whole replication thing is trying to avoid a SPOF
[08:06:25] <jynus>	 so 1 puppet - 1 puppetdb - 1 postgres on each dc
[08:06:33] <_joe_>	 that is the plan
[08:06:38] <jynus>	 but only puppet is active on both
[08:06:43] <_joe_>	 ok
[08:06:43] <akosiaris>	 more like 2 puppetmasters but yes
[08:06:46] <_joe_>	 that's possible
[08:06:47] <jynus>	 instead of puppet AND puppetdb
[08:06:48] <akosiaris>	 it does not matter
[08:06:56] <_joe_>	 jynus: it's possible ofc
[08:07:09] <_joe_>	 what's the advantage?
[08:07:17] <_joe_>	 apart from making the config a bit simpler
[08:07:29] <jynus>	 well, I mention it because one or the other may be preferred
[08:07:36] <jynus>	 depending on the connection pattern
[08:08:12] <_joe_>	 actually I am thinking of going that route.
[08:08:17] <jynus>	 whatever has or can have persistent connections will be preferred for crossdc communication
[08:08:43] <jynus>	 if it does not have built in persistent connections; proxies can be used to trick those
[08:10:33] <jynus>	 so, I really do not have much to say about your plan
[08:10:56] <jynus>	 I am just shocked the more I learn about puppet
[08:11:10] <_joe_>	 you tell me about that
[08:12:28] <jynus>	 it seems that the "Is puppet the future?" session that I wrote as a mere discussion starter may not be so far-fetched
[08:13:21] <jynus>	 having a couple of machines for postgres may not be that bad, maybe we can reuse them for more services in the future
[08:14:19] <volans>	 do we have postgres for our OSM?
[08:14:33] <jynus>	 yes, but it is separate
[08:14:41] <jynus>	 and that is an application on its own
[08:14:52] <jynus>	 (I am talking about the dbs)
[08:15:14] <jynus>	 we do not want to mix puppet and maps
[08:15:21] <volans>	 yes, I was thinking what replication was chosen there
[08:32:58] <jynus>	 if jessie has a higher version of mariadb than you, maybe it is time to upgrade
[08:38:41] <wikibugs>	 10DBA, 06Operations: dbstore2002 stopped providing mysql service despite the process being running - https://phabricator.wikimedia.org/T142273#2536017 (10jcrespo) 05Open>03Resolved Disk replacement is handled on a separate task; no more to do here.
[08:45:43] <wikibugs>	 10DBA, 10Analytics-EventLogging, 10ImageMetrics: Drop EventLogging tables for ImageMetricsLoadingTime and ImageMetricsCorsSupport - https://phabricator.wikimedia.org/T141407#2536049 (10jcrespo) @Jdforrester-WMF Metrics are still coming in as recent as `20160808090453`, and thus these tables are being recreat...
[10:36:42] <wikibugs>	 07Blocked-on-schema-change, 10DBA: Deploy I2b042685 to all databases - https://phabricator.wikimedia.org/T139090#2536153 (10jcrespo) This affects almost all servers, all wikis; it will be easier to apply the change to all wikis rather than selectively.
[10:37:14] <wikibugs>	 07Blocked-on-schema-change, 10DBA: Deploy I2b042685 to all databases - https://phabricator.wikimedia.org/T139090#2536154 (10jcrespo) This affects almost all servers, all wikis; it will be easier to apply the change to all wikis rather than selectively.
[13:35:02] <wikibugs>	 10DBA, 06Labs, 10Tool-Labs, 07Tracking: Certain tools users create multiple long running queries that take all memory from labsdb hosts, slowing it down and potentially crashing (tracking) - https://phabricator.wikimedia.org/T119601#2536473 (10jcrespo)
[15:06:52] <wikibugs>	 10DBA, 06Labs, 10Tool-Labs, 07Tracking: Certain tools users create multiple long running queries that take all memory from labsdb hosts, slowing it down and potentially crashing (tracking) - https://phabricator.wikimedia.org/T119601#2536650 (10jcrespo)
[15:44:54] <wikibugs>	 10DBA, 06Labs, 10Tool-Labs, 07Tracking: Certain tools users create multiple long running queries that take all memory from labsdb hosts, slowing it down and potentially crashing (tracking) - https://phabricator.wikimedia.org/T119601#2536810 (10jcrespo)
[15:46:12] <wikibugs>	 10DBA, 10Analytics-EventLogging, 10ImageMetrics: Drop EventLogging tables for ImageMetricsLoadingTime and ImageMetricsCorsSupport - https://phabricator.wikimedia.org/T141407#2536814 (10Jdforrester-WMF) Isn't client-side code caching great?  OK, wait a month and then try?
[15:48:47] <wikibugs>	 10DBA, 10Analytics-EventLogging, 10ImageMetrics: Drop EventLogging tables for ImageMetricsLoadingTime and ImageMetricsCorsSupport - https://phabricator.wikimedia.org/T141407#2536822 (10jcrespo) :-)  To be fair- I understand why those would be generated. But shouldn't we have a way to discard those on server...
[15:54:44] <wikibugs>	 10DBA, 10Analytics-EventLogging, 10ImageMetrics: Drop EventLogging tables for ImageMetricsLoadingTime and ImageMetricsCorsSupport - https://phabricator.wikimedia.org/T141407#2497306 (10Jdforrester-WMF) >>! In T141407#2536822, @jcrespo wrote: > :-) >  > To be fair- I understand why those would be generated. B...
[16:03:52] <wikibugs>	 10DBA, 06Labs, 10Tool-Labs, 07Tracking: Certain tools users create multiple long running queries that take all memory from labsdb hosts, slowing it down and potentially crashing (tracking) - https://phabricator.wikimedia.org/T119601#2536903 (10jcrespo)
[16:10:07] <wikibugs>	 10DBA, 10Analytics-EventLogging, 10ImageMetrics: Drop EventLogging tables for ImageMetricsLoadingTime and ImageMetricsCorsSupport - https://phabricator.wikimedia.org/T141407#2536923 (10jcrespo) Let's wait, then. It is not that important, it was just annoying.
[17:37:20] <wikibugs>	 10DBA, 06Community-Tech, 07Schema-change: Add local_user_id and global_user_id to localuser table in centralauth database - https://phabricator.wikimedia.org/T141951#2537215 (10kaldari)
[17:37:46] <wikibugs>	 10DBA, 06Community-Tech, 07Schema-change: Add local_user_id and global_user_id to localuser table in centralauth database - https://phabricator.wikimedia.org/T141951#2517589 (10kaldari)
[17:40:11] <wikibugs>	 10DBA, 06Community-Tech: Create a maintenance script for populating the local_user_id and global_user_id fields in the centralauth localuser table - https://phabricator.wikimedia.org/T142503#2537232 (10kaldari)
[17:40:19] <wikibugs>	 10DBA, 06Community-Tech, 07Schema-change: Add local_user_id and global_user_id to localuser table in centralauth database - https://phabricator.wikimedia.org/T141951#2517589 (10bd808) Current CA schema: https://phabricator.wikimedia.org/diffusion/ECAU/browse/master/central-auth.sql  New columns would be adde...
[17:41:13] <wikibugs>	 10DBA, 06Community-Tech, 07Schema-change: Add local_user_id and global_user_id to localuser table in centralauth database - https://phabricator.wikimedia.org/T141951#2537250 (10DannyH) p:05Triage>03Normal
[17:41:20] <wikibugs>	 10DBA, 06Community-Tech: Create a maintenance script for populating the local_user_id and global_user_id fields in the centralauth localuser table - https://phabricator.wikimedia.org/T142503#2537232 (10DannyH) p:05Triage>03Normal
[17:52:28] <wikibugs>	 10DBA, 03Community-Tech-Sprint, 07Schema-change: Add local_user_id and global_user_id to localuser table in centralauth database - https://phabricator.wikimedia.org/T141951#2537319 (10DannyH)
[23:59:32] <wikibugs>	 10DBA, 06Community-Tech-Tool-Labs, 10Striker: Create production database and users for Striker - https://phabricator.wikimedia.org/T142545#2538784 (10bd808)