[03:48:45] DBA, Beta-Cluster-Infrastructure, Release-Engineering-Team, WorkType-Maintenance: Upgrade mariadb in deployment-prep from Precise/MariaDB 5.5 to Jessie/MariaDB 5.10 - https://phabricator.wikimedia.org/T138778#2409864 (Nikita13311331) Parent tasks T139044: enable gtid on beta maria...
[06:26:04] <_joe_> jynus: whenever you're around/available, I would like to discuss puppetdb
[06:59:48] DBA, Flow, Collab-Team-Q1-July-Sep-2016, Patch-For-Review: Cleanup ptwikibooks conversion - https://phabricator.wikimedia.org/T119509#2535876 (matthiasmullie) @jcrespo Sure. Does 10AM CEST work for you to start the backups? I'll be around.
[07:13:49] _joe_, let me restart first
[07:23:22] <_joe_> jynus: /win 19
[07:23:24] <_joe_> argh
[07:23:41] <_joe_> jynus: so, let me know when you're available
[07:24:13] DBA, Flow, Collab-Team-Q1-July-Sep-2016, Patch-For-Review: Cleanup ptwikibooks conversion - https://phabricator.wikimedia.org/T119509#2535920 (jcrespo) Yes, contact me on IRC and I will tell you about its progress and when they finish.
[07:30:39] _joe_, tell me
[07:31:12] <_joe_> jynus: so my basic idea for postgres/puppetdb is:
[07:31:30] <_joe_> 1) have a master postgres server in eqiad, replicated to codfw
[07:31:54] <_joe_> 2) have puppetdb (the application) connect to the master from all datacenters where it's installed
[07:32:31] <_joe_> 3) in case of a failure, I'd just point the application to the slave database, and we can then find out how to recover from failure
[07:33:05] <_joe_> keep in mind that a failure is something we want to be able to recover from, not necessarily in less than half an hour
[07:33:13] _joe_: which replication method do you plan to use for postgres?
[07:33:29] it is not as direct as mysql, yes
[07:33:57] <_joe_> volans: not sure which one we're using in postgresql::master, let me check
[07:34:47] <_joe_> hot_standby = on
[07:36:54] <_joe_> jynus: what I'd like to get is a) a general sanity check of the idea b) AFAICT, there is no backup mechanism in place for postgres
[07:37:44] well, I do not think puppet requires a lot of safety, does it?
[07:38:11] <_joe_> not a /lot/, we just need to be sure we don
[07:38:15] current puppet db could be wiped and nothing would happen, probably
[07:38:16] <_joe_> t lose everything
[07:38:25] <_joe_> we'd lose a few things
[07:38:33] <_joe_> specifically icinga
[07:38:41] <_joe_> and some other things that are pretty important
[07:38:49] I am unsure about cross-dc queries
[07:39:56] <_joe_> all via ssl
[07:40:01] do we use TLS? would it work?
[07:40:05] <_joe_> replication supports ssl too
[07:40:16] <_joe_> yes TLS is supported
[07:40:24] <_joe_> by puppetdb
[07:40:48] currently, we failover to a server on the same dc
[07:40:59] that simplifies things
[07:41:53] in theory it would be better to have the clients use SSL too, not only the replication if the clients are doing cross-dc queries
[07:42:06] <_joe_> volans: read above
[07:42:11] <_joe_> 09:40 < _joe_> yes TLS is supported
[07:42:11] <_joe_> 09:41 < _joe_> by puppetdb
[07:42:13] while having a replica on the remote dc but no client, I think
[07:42:31] do we have/will we have a client on codfw?
[07:42:35] ops, I missed that line :)
[07:42:54] <_joe_> yes
[07:43:06] <_joe_> we will have puppetdb running both in codfw and eqiad
[07:43:27] <_joe_> and in the standard config, local puppetmasters will communicate with the local puppetdb
[07:44:14] ?
[07:44:26] that will not work with the slave
[07:44:28] ah
[07:44:32] <_joe_> puppetdb will then point to the master postgres
[07:44:33] you mean puppet db the app
[07:44:36] ok ok
[07:44:37] <_joe_> yes
[07:44:47] <_joe_> the piece-of-crap java/clojure thing
[07:44:56] looks good, just doesn't look easy
[07:45:26] for example, in case of eqiad down, you have to turn the replication direction
[07:45:42] which is what I tried to avoid on all our mysql machines
[07:45:55] <_joe_> I know
[07:46:00] but you seemed happy about it for redis
[07:46:05] <_joe_> that's why I am asking you
[07:47:59] ehlo
[07:48:23] <_joe_> jynus: if you have better options, I'm all ears
[07:48:41] <_joe_> keep in mind we can't modify puppetdb-the-app :P
[07:49:07] we can, we just won't
[07:49:14] at least I won't
[07:49:20] and I'll advise anyone against it
[07:50:03] <_joe_> come on let's be realistic, no one is ever doing that :P
[07:50:06] can I ask why puppet took such a radical change?
[07:50:15] er...
[07:50:24] not sure how to answer that politely
[07:50:27] <_joe_> jynus: yes you can ask
[07:50:27] suddenly using a separate piece for the storage
[07:50:49] <_joe_> not suddenly, puppetdb has been around for 6 years i think
[07:50:50] lemme find where they justify it
[07:50:59] <_joe_> we just avoided it until it was possible
[07:51:01] instead of supporting directly postgres/mysql/sqlite
[07:51:32] Actually, PuppetDB isn’t written in Java at all! It’s written in a language called Clojure, which is a dialect of Lisp that runs on the Java Virtual Machine (JVM). Several other languages were prototyped, including Ruby and JRuby, but they lacked the necessary performance. We chose to use a JVM language because of its excellent libraries and high performance. Of the available JVM languages, we used Clojure because of its expre
[07:51:46] <_joe_> akosiaris: you keep pasting that :P
[07:51:51] I just love it!!!!
[07:52:20] so jynus the answer is expressiveness, performance and prior experience
[07:52:27] so they have a ruby-based app
[07:52:31] <_joe_> akosiaris: and in the next iteration, puppet-server!!!1!!1!
[07:52:36] <_joe_> jynus: not anymore
[07:52:42] now as to why they decided to ditch the rails based framework and write their own
[07:52:50] what?
[07:52:59] puppet 5 is not ruby
[07:53:04] <_joe_> puppet 4.0 has a clojure server
[07:53:05] WHAT?
[07:53:08] it's clojure
[07:53:11] <_joe_> which is the recommended one
[07:53:17] what about the custom functions?
[07:53:18] <_joe_> it still reads ruby via jruby
[07:53:26] <_joe_> that's a kind of magic!
[07:53:37] my take is we're screwed
[07:53:43] that thing sounds magic already on paper
[07:53:45] imagine reality
[07:54:07] and they say that they will never support mysql for lack of features like recursive queries
[07:54:09] thankfully they have stdlib to support so it might just work
[07:54:18] volans: or any other DB more or less
[07:54:26] well they support HSQLDB
[07:54:43] but that's like saying "I support sqlite!!!"
[07:55:06] not really useful for anything that needs high performance
[07:56:20] I do not really have much to add, other than that I would prefer within-dc failover
[07:57:16] 1 active; 2 passives on each dc
[07:57:36] but if you are short on resources go on with your plan
[07:57:51] <_joe_> we can add local slaves easily, yes
[07:58:07] <_joe_> but for starters, I think this is enough
[07:58:11] it is that usually you have to do maintenance
[07:58:19] <_joe_> yes
[07:58:28] and doing a dc failover just for an upgrade is annoying
[07:58:32] well, changing the master in a postgres env is a mess
[07:58:44] it's just barely supported in reality
[07:58:50] agree with akosiaris
[07:59:04] <_joe_> thank you puppetlabs
[07:59:17] I am thinking more about the client, but ok
[07:59:19] recursive queries man.. recursive queries!
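Editor's note: for readers unfamiliar with the "recursive queries" mentioned at 07:54:07, the following is a minimal sketch of one walking a tree-like structure. It uses SQLite only because it ships with Python; it is not what PuppetDB uses, and the `node` table, its columns, and the class names in the sample data are hypothetical and not taken from PuppetDB's schema.

```python
import sqlite3


def descendants(conn, root):
    """Return (name, depth) for every node below `root`, nearest first."""
    rows = conn.execute(
        """
        WITH RECURSIVE subtree(name, depth) AS (
            -- anchor: direct children of the root
            SELECT name, 1 FROM node WHERE parent = ?
            UNION ALL
            -- recursive step: children of rows already found
            SELECT node.name, subtree.depth + 1
            FROM node JOIN subtree ON node.parent = subtree.name
        )
        SELECT name, depth FROM subtree ORDER BY depth, name
        """,
        (root,),
    )
    return rows.fetchall()


# Hypothetical sample data: a small hierarchy of class names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (name TEXT PRIMARY KEY, parent TEXT)")
conn.executemany(
    "INSERT INTO node VALUES (?, ?)",
    [
        ("base", None),
        ("role::db", "base"),
        ("role::web", "base"),
        ("role::db::master", "role::db"),
    ],
)
```

Calling `descendants(conn, "base")` returns the whole subtree in one statement. Without a recursive common table expression, the application would have to issue one query per level of the tree.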
[07:59:25] <_joe_> yo
[07:59:48] there is ofc the fact that jaime has a very valid point
[07:59:54] recursive queries are nice if you know what you are doing
[08:00:05] especially in tree-like structures that I suppose puppet handles
[08:00:20] codfw clients would be using the local puppetDB but that puppetDB will have to be using the EQIAD postgres
[08:00:34] which is gonna be causing a slowdown for codfw clients
[08:00:41] first thing I would do is test that
[08:00:42] but I see no solution tbh
[08:00:49] ok
[08:00:55] there is one possibility
[08:01:06] but I do not know if possible
[08:01:22] actually there are solutions... I just hate them all
[08:01:23] dividing the farm?
[08:01:30] things like pgpool and pgpool 2
[08:01:35] <_joe_> jynus: alas nope
[08:01:37] nah
[08:02:38] <_joe_> I have to be honest, I know nothing about postgres failover
[08:02:56] _joe_: read the uber blog now
[08:02:59] what about having only 1 active puppet db?
[08:03:00] <_joe_> been using mysql for way too long and completely lost track of postgres around 2006 or so
[08:03:01] you will understand why it is a mess
[08:03:27] TL;DR postgres replication is basically on-disk data
[08:03:30] <_joe_> akosiaris: if we assume we have weak consistency requirements, it's still a mess?
[08:03:31] does puppet db handle connections?
[08:03:48] the app, I mean
[08:03:49] _joe_: yes. cause postgres does not allow that
[08:03:59] you either have everything or you don't
[08:04:01] <_joe_> jynus: connections to postgres?
[08:04:05] yes
[08:04:09] <_joe_> jynus: yes
[08:04:14] there is practically nothing to configure about consistency in postgres
[08:04:36] <_joe_> akosiaris: so replication is not going to help us in any way?
[08:04:37] that WALs need to be applied as is
[08:04:39] what about puppetdb communication to the rest of the app
[08:04:49] is it separated or is it part of the app?
[08:05:00] jynus: the puppetmasters talk to puppetDB over SSL
[08:05:01] <_joe_> puppetdb is contacted from puppet via a REST API
[08:05:10] it's a REST API
[08:05:10] akosiaris, _joe_ how/from where/who will write to this DB?
[08:05:11] <_joe_> over ssl, yes
[08:05:23] volans: puppetmasters per DC
[08:05:24] <_joe_> volans: only puppetdb will write to the db
[08:05:27] have you thought about not separating puppet DB, but talking to a single puppet db instance?
[08:05:43] <_joe_> jynus: then the problem is it's a SPOF
[08:05:52] no, we have another passive
[08:05:56] <_joe_> the whole replication thing is trying to avoid a SPOF
[08:06:25] so 1 puppet - 1 puppetdb - 1 postgres on each dc
[08:06:33] <_joe_> that is the plan
[08:06:38] but only puppet is active on both
[08:06:43] <_joe_> ok
[08:06:43] more like 2 puppetmasters but yes
[08:06:46] <_joe_> that's possible
[08:06:47] instead of puppet AND puppetdb
[08:06:48] it does not matter
[08:06:56] <_joe_> jynus: it's possible ofc
[08:07:09] <_joe_> what's the advantage?
[08:07:17] <_joe_> apart from making the config a bit simpler
[08:07:29] well, I mention it because one or the other may be preferred
[08:07:36] depending on the connection pattern
[08:08:12] <_joe_> actually I am thinking of going that route.
[08:08:17] whatever has or can have persistent connections will be preferred for crossdc communication
[08:08:43] if it does not have built in persistent connections; proxies can be used to trick those
[08:10:33] so, I really do not have much to say about your plan
[08:10:56] I am just shocked the more I learn about puppet
[08:11:10] <_joe_> you tell me about that
[08:12:28] it seems that the "Is puppet the future?" session that I wrote as a mere discussion started may not be so farfetched
[08:12:39] *started
[08:13:21] having a couple of machines for postgres may not be that bad, maybe we can reuse them for more services in the future
[08:14:19] do we have postgres for our OSM?
[08:14:33] yes, but it is separate
[08:14:41] and that is an application on its own
[08:14:52] (I am talking about the dbs)
[08:15:14] we do not want to mix puppet and maps
[08:15:21] yes, I was thinking what replication was chosen there
[08:32:58] if jessie has a higher version of mariadb than you, maybe it is time to upgrade
[08:38:41] DBA, Operations: dbstore2002 stopped providing mysql service despite the process being running - https://phabricator.wikimedia.org/T142273#2536017 (jcrespo) Open>Resolved Disk replacement is handled on a separate task; no more to do here.
[08:45:43] DBA, Analytics-EventLogging, ImageMetrics: Drop EventLogging tables for ImageMetricsLoadingTime and ImageMetricsCorsSupport - https://phabricator.wikimedia.org/T141407#2536049 (jcrespo) @Jdforrester-WMF Metrics are still coming in as recent as `20160808090453`, and thus these tables are being recreat...
[10:36:42] Blocked-on-schema-change, DBA: Deploy I2b042685 to all databases - https://phabricator.wikimedia.org/T139090#2536153 (jcrespo) This affects almost all servers, all wikis; it will be easier to apply the change to all wikis rather than selectively.
[10:37:14] Blocked-on-schema-change, DBA: Deploy I2b042685 to all databases - https://phabricator.wikimedia.org/T139090#2536154 (jcrespo) This affects almost all servers, all wikis; it will be easier to apply the change to all wikis rather than selectively.
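Editor's note: the failover preference from the puppetdb/postgres discussion above (07:31 to 08:08) can be sketched as a small selection function. This is a sketch under stated assumptions, not anyone's actual tooling. The hostnames and the `Server` and `pick_target` names are hypothetical. It only chooses which postgres server puppetdb should be pointed at: the single primary while it is healthy, then a replica in the primary's own datacenter (the within-dc failover preferred at 07:56:20), then a replica in the other datacenter. A hot standby is read-only, so the chosen replica would still have to be promoted before puppetdb could write to it.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Server:
    host: str      # hypothetical hostname
    dc: str        # datacenter name, e.g. "eqiad" or "codfw"
    primary: bool  # True for the single writable postgres server


def pick_target(servers: Iterable[Server],
                healthy: Callable[[Server], bool]) -> Server:
    """Choose the postgres server puppetdb should be pointed at."""
    servers = list(servers)
    # Assumes exactly one server is marked as primary.
    primary = next(s for s in servers if s.primary)
    if healthy(primary):
        return primary
    # sorted() is stable and False sorts before True, so healthy replicas
    # in the primary's own DC come first, then those in the other DC.
    replicas = sorted(
        (s for s in servers if not s.primary and healthy(s)),
        key=lambda s: s.dc != primary.dc,
    )
    if not replicas:
        raise RuntimeError("no healthy postgres server left")
    return replicas[0]


# Hypothetical inventory: one primary and one local replica in eqiad,
# plus one remote replica in codfw.
SERVERS = [
    Server("pg-eqiad-1", "eqiad", primary=True),
    Server("pg-codfw-1", "codfw", primary=False),
    Server("pg-eqiad-2", "eqiad", primary=False),
]
```

With every server healthy, `pick_target(SERVERS, lambda s: True)` returns the eqiad primary. If only the primary is down, it returns the eqiad replica, and the codfw replica is chosen only when nothing in eqiad is healthy.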
[13:35:02] DBA, Labs, Tool-Labs, Tracking: Certain tools users create multiple long running queries that take all memory from labsdb hosts, slowing it down and potentially crashing (tracking) - https://phabricator.wikimedia.org/T119601#2536473 (jcrespo)
[15:06:52] DBA, Labs, Tool-Labs, Tracking: Certain tools users create multiple long running queries that take all memory from labsdb hosts, slowing it down and potentially crashing (tracking) - https://phabricator.wikimedia.org/T119601#2536650 (jcrespo)
[15:44:54] DBA, Labs, Tool-Labs, Tracking: Certain tools users create multiple long running queries that take all memory from labsdb hosts, slowing it down and potentially crashing (tracking) - https://phabricator.wikimedia.org/T119601#2536810 (jcrespo)
[15:46:12] DBA, Analytics-EventLogging, ImageMetrics: Drop EventLogging tables for ImageMetricsLoadingTime and ImageMetricsCorsSupport - https://phabricator.wikimedia.org/T141407#2536814 (Jdforrester-WMF) Isn't client-side code caching great? OK, wait a month and then try?
[15:48:47] DBA, Analytics-EventLogging, ImageMetrics: Drop EventLogging tables for ImageMetricsLoadingTime and ImageMetricsCorsSupport - https://phabricator.wikimedia.org/T141407#2536822 (jcrespo) :-) To be fair- I understand why those would be generated. But shouldn't we have a way to discard those on server...
[15:54:44] DBA, Analytics-EventLogging, ImageMetrics: Drop EventLogging tables for ImageMetricsLoadingTime and ImageMetricsCorsSupport - https://phabricator.wikimedia.org/T141407#2497306 (Jdforrester-WMF) >>! In T141407#2536822, @jcrespo wrote: > :-) > > To be fair- I understand why those would be generated. B...
[16:03:52] DBA, Labs, Tool-Labs, Tracking: Certain tools users create multiple long running queries that take all memory from labsdb hosts, slowing it down and potentially crashing (tracking) - https://phabricator.wikimedia.org/T119601#2536903 (jcrespo)
[16:10:07] DBA, Analytics-EventLogging, ImageMetrics: Drop EventLogging tables for ImageMetricsLoadingTime and ImageMetricsCorsSupport - https://phabricator.wikimedia.org/T141407#2536923 (jcrespo) Let's wait, then. It is not that important, it was just annoying.
[17:37:20] DBA, Community-Tech, Schema-change: Add local_user_id and global_user_id to localuser table in centralauth database - https://phabricator.wikimedia.org/T141951#2537215 (kaldari)
[17:37:46] DBA, Community-Tech, Schema-change: Add local_user_id and global_user_id to localuser table in centralauth database - https://phabricator.wikimedia.org/T141951#2517589 (kaldari)
[17:40:11] DBA, Community-Tech: Create a maintenance script for populating the local_user_id and global_user_id fields in the centralauth localuser table - https://phabricator.wikimedia.org/T142503#2537232 (kaldari)
[17:40:19] DBA, Community-Tech, Schema-change: Add local_user_id and global_user_id to localuser table in centralauth database - https://phabricator.wikimedia.org/T141951#2517589 (bd808) Current CA schema: https://phabricator.wikimedia.org/diffusion/ECAU/browse/master/central-auth.sql New columns would be adde...
[17:41:13] DBA, Community-Tech, Schema-change: Add local_user_id and global_user_id to localuser table in centralauth database - https://phabricator.wikimedia.org/T141951#2537250 (DannyH) p:Triage>Normal
[17:41:20] DBA, Community-Tech: Create a maintenance script for populating the local_user_id and global_user_id fields in the centralauth localuser table - https://phabricator.wikimedia.org/T142503#2537232 (DannyH) p:Triage>Normal
[17:52:28] DBA, Community-Tech-Sprint, Schema-change: Add local_user_id and global_user_id to localuser table in centralauth database - https://phabricator.wikimedia.org/T141951#2537319 (DannyH)
[23:59:32] DBA, Community-Tech-Tool-Labs, Striker: Create production database and users for Striker - https://phabricator.wikimedia.org/T142545#2538784 (bd808)