[05:26:51] DBA, Analytics: hi.wikisource added to labs replicas? - https://phabricator.wikimedia.org/T227030 (Marostegui) As @Reedy points out, hi.wiksource isn't created yet, not even its database {T219374}. As the wiki is marked as a public wiki, the process is as follows: - Database created - DBAs take over {T2...
[05:27:46] DBA, Analytics, Data-Services: Prepare and check storage layer for hi.wikisource - https://phabricator.wikimedia.org/T219374 (Marostegui) Adding #analytics as they are interested in knowing when this wiki finally gets created so they can sqoop data from it {T227030}
[05:28:06] DBA, Analytics: hi.wikisource added to labs replicas? - https://phabricator.wikimedia.org/T227030 (Marostegui) Open→Declined
[07:30:59] any objection to try to schedule s8 db master failover for 30th July? so we can unblock the wb_terms migration second phase and also decommision db1071?
[07:41:29] ok
[07:41:52] I will create the task(s)
[08:08:15] for the prometheus/zarcillo change, I don't want to pollute the CR with a lot of forth and back, so asking here
[08:08:28] what is the problem for the secrets? I might miss some context there :)
[08:08:49] just zarcillo credentials?
[08:12:57] I believe so
[08:13:22] and where are those stored right now?
[08:13:57] nowhere
[08:14:09] the account doesn't exist, it will be a new account
[08:14:25] DBA: Failover s8 (wikidatawiki) db primary master db1071 to db1104 (read-only required) - https://phabricator.wikimedia.org/T227062 (Marostegui)
[08:14:26] and we don't use hiera for any other mysql account
[08:14:40] DBA: Failover s8 (wikidatawiki) db primary master db1071 to db1104 (read-only required) - https://phabricator.wikimedia.org/T227062 (Marostegui) p:Triage→Normal
[08:15:06] So I am asking what is the best way to go over it
[08:15:29] you don't but I'm sure other things are there, for example debmonitor's db password is there
[08:15:38] profile::debmonitor::server::django_mysql_db_password
[08:15:40] specially thinking it is "not our puppet class" #notmyclass
[08:16:13] (again, your class #notmyclass) :-D
[08:16:33] git grep pass | grep db
[08:16:39] in the private repo gives you an idea ;)
[08:17:13] I don't see any problem to store it there
[08:17:20] I actually don't like it
[08:17:32] where do you want to store it?
[08:17:38] host and password seem to be on secret
[08:17:56] but what about user, port?
[08:18:06] and database
[08:18:40] DBA: Failover s8 (wikidatawiki) db primary master db1071 to db1104 (read-only required) - https://phabricator.wikimedia.org/T227062 (Marostegui)
[08:18:41] you can have most things in the public hiera and just password and maybe username in the private one I guess
[08:19:00] we requested ports to be configurable, because we will multiplex proxies soon
[08:19:19] that is the part I am not convinced, separate public and private configuration
[08:19:34] so you'd rather store everything related to db on private?
[08:19:38] in a random, not agreed fashion
[08:19:56] no, I just want everybody to do the same everywhere
[08:19:59] what we've done in some module is to add the private keys to the public one too, commented out
[08:20:03] saying that are on the private one
[08:20:23] but what is private?
sure password
[08:20:26] but user?
[08:20:28] host?
[08:20:33] DBA, Operations: Decommission db1061-db1073 - https://phabricator.wikimedia.org/T217396 (Marostegui)
[08:20:48] I would like to have a guidliness for databases
[08:20:56] Host is definitely not private
[08:21:00] you understand my question now, volans?
[08:21:12] I want to agree on a method
[08:21:12] then you have to define one :) I don't think we have it
[08:21:18] not for this, but in general
[08:21:21] sure
[08:21:29] so we don't have 20 different methods
[08:21:59] I like the mixed config with comments on what's in private
[08:22:03] things ike
[08:22:03] is the public ssl CA a secret?
[08:22:05] # specified in the private repo
[08:22:05] #k8s_infrastructure_users:
[08:22:47] and seems used already in quite few places
[08:22:54] to be considered a semi-standard
[08:23:09] other example
[08:23:10] #profile::ores::redis::password: actually in the private repo
[08:23:13] also want something a bit more structured than "just add random hiera keys"
[08:23:44] similar names, not "pass" "password" "mysql_pass" "db_pass" "database"
[08:23:49] _password
[08:24:56] DBA: Failover s8 (wikidatawiki) db primary master db1071 to db1104 (read-only required) - https://phabricator.wikimedia.org/T227062 (Marostegui)
[08:25:18] also sometimes people mix service passwords
[08:25:27] with database that that service use
[08:26:22] and I need easy, pseudo-standarized access to those in case we have to change host, port, pass, etc.
[08:28:49] seems material for a small design doc/RFC to introduce
[08:29:47] sure, but it would be rude for me to just made up it on my own
[08:30:16] and before you think I am overthinking, not thinking was what led to current mw db password handilng
[08:30:25] which you may be already familiar with
[08:31:40] yeah, but an RFC is for that, you don't impose it, you propose it and seek shared agreement
[08:32:02] for things in hiera I personally like the "full config" to be in public hiera with commented keys marked as private
[08:32:14] becuase it gives you the full picture from one place (public)
[08:32:20] so, should I send a patch, you review it and then we convert it to a standard?
[08:32:22] and just the private bits are private
[08:32:26] as in RFC?
[08:32:28] which bits... TBD
[08:32:47] that is another conversation
[08:32:54] what is secret?
[08:32:57] yeah
[08:33:19] there are arguments of security vs obscurity
[08:33:40] [10:32:02] for things in hiera I personally like the "full config" to be in public hiera with commented keys marked as private
[08:33:40] -> I like that idea
[08:34:10] jynus: For me I think private should only be password I think, the user I think it is fine to be public
[08:34:23] There are arguments for both, of course, but I rather go for public for the user
[08:34:26] in the longer term I hope we'll introduce some secret management software so we'll have to change something anyway probably :)
[08:34:32] indeed
[08:34:42] for the user I'm ok with both public or private, hosts seems public to me
[08:34:50] I have a project not related to secrets
[08:34:55] but to grants
[08:34:56] also most of our users are guessable
[08:34:56] +1 to host public
[08:35:19] but it would only handle accounts from where to where
[08:35:32] and an identifier so the password is stored encrypted somewhere else
[08:35:47] because we do currently a poor dependency management
[08:35:57] of X hosts need to acceess Y hosts
[08:36:11] sorry I've to step out for a
bit for an errand that cannot wait, will read backlog later
[08:36:17] ok, thanks
[10:49:10] marostegui: I am thinking of enabling report-hosts on all databases
[10:49:32] for reliable replica identification (and not relying on server id)
[10:50:04] so that SHOW SLAVE HOSTS shows the string configured (fqdn)
[10:50:12] or the ip
[12:21:39] yeah
[12:21:51] I think we actually discussed that a year ago
[12:21:59] but we never did it
[12:22:02] so -1
[12:22:04] +1
[12:47:08] https://phabricator.wikimedia.org/P8698
[12:48:29] nice
[12:48:56] are you planning to add some checks as in: do not continue if you find an error (like a slave doesn't start replicating)?
[12:49:37] that happens already
[12:49:52] Ah sweeet
[12:49:52] although the logic is not always clear
[12:50:00] And how sort of checks do you do after the move?
[12:50:03] for example, one common error
[12:50:05] Check if the position advances?
[12:50:16] is that on change master, the connection fails
[12:50:32] I give the error but do not touch anything else
[12:50:42] yeah, great
[12:51:17] 'success': False, 'errno': -1, 'errmsg': "error connecting to master 'replication@127.0.0.1:3308' - retry-time: 60 maximum-retries: 86400 message: Access denied for user 'replication'@'localhost' (using password: YES)"}
[12:51:25] {'success': False, 'errno': -1, 'errmsg': "error connecting to master 'replication@127.0.0.1:3308' - retry-time: 60 maximum-retries: 86400 message: Access denied for user 'replication'@'localhost' (using password: YES)"}
[12:53:30] db1r.move(db2) -> {'success': False, 'errno': -1, 'errmsg': 'The host is not configured as a replica'}
[12:54:08] and when the slave gets moved, what checks are you planning on doing to say: ok, this worked fine?
[12:54:45] 4.
Restart replication on both hosts, wait a bit, make sure replication continues and both caught up
[12:55:04] if current_status['success'] and sibling_status['success'] and current_status['slave_io_running'] == 'Yes' and current_status['slave_sql_running'] == 'Yes' and sibling_status['slave_io_running'] == 'Yes' and sibling_status['slave_sql_running'] == 'Yes' and self.lag() < self.timeout and sibling_replication.lag() < self.timeout
[12:55:24] nice!
[12:55:59] it can always improve
[12:56:11] for example, right now, wait is static based on timeout
[12:56:24] (like repl.pl)
[12:56:39] but maybe it could wait only lag() seconds + 1
[12:56:59] already start_slave()
[12:57:31] does: while (slave_thread == '' and (slave_status['slave_io_running'] != 'Yes'
[12:57:41] Ah I see
[12:57:42] Nice
[12:57:58] Are you planning to add the other two features of repl.pl?
[12:58:03] time.sleep(0.1)
[12:58:09] so I run START SLAVE
[12:58:18] and wait every 0.1 seconds until it says yes
[12:58:37] up to timeout seconds, where I declare it "It didn't start well"
[12:58:39] ah cool
[12:58:49] the other two features of repl.pl ?
[12:58:58] move() does that
[12:59:11] it just identifys the case and apply it automatically
[12:59:13] or
[12:59:17] I mean the stopping slaves in sync and the move slave up
[12:59:36] you can call WMFReplication.stop_in_sync_with_sibling(self, sibling)
[12:59:46] move_sibling_to_child(self, sibling)
[12:59:53] move_child_to_sibling(self, current_master, new_master)
[13:00:22] but if you run move(new_master) it does everything for you
[13:00:29] for the 4 methods implemented
[13:00:32] lovely! :)
[13:00:40] it doesn't still do arbitrary topology changes
[13:00:54] I was just asking because the moving slaves up and stopping in sync is something we do use quite often
[13:00:58] but it does the 2 repl.pl ones
[13:01:06] yes, that is available
[13:01:31] plus both are stopped at the same corodinate
[13:01:43] plus the master's master is stopped
[13:02:21] e.g.
A -> B, B-> C, B-> D, B replication is stopped
[13:02:47] Yeah I se
[13:02:48] see
[13:02:49] Cool
[13:03:09] eventually move will work with arbitrary hosts
[13:03:13] but that needs more work
[13:03:25] sure, but this is a good start!
[13:03:27] to cover for repl.pl
[16:32:48] DBA, Operations, ops-codfw: rack/setup/install db21[21-30].codfw.wmnet - https://phabricator.wikimedia.org/T227113 (RobH) p:Triage→Normal
[16:33:00] DBA, Operations, ops-codfw: rack/setup/install db21[21-30].codfw.wmnet - https://phabricator.wikimedia.org/T227113 (RobH)
[19:16:07] DBA, Core Platform Team Backlog, MediaWiki-General-or-Unknown: Investigate query planning in MariaDB 10 - https://phabricator.wikimedia.org/T85000 (WDoranWMF) p:Triage→Low
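
Editor's sketch, not part of the log: the start-and-wait loop described around 12:57-12:58 and the post-move check quoted at 12:55:04 could look roughly like the following. This is a minimal illustration under assumptions, not the actual WMFReplication code. The names `wait_until_replicating`, `move_succeeded`, and `make_fake_status` are hypothetical, and the status source is a stand-in callable rather than a real database connection. Only the dict keys (`success`, `slave_io_running`, `slave_sql_running`) and the 0.1 second polling interval come from the conversation.

```python
import time


def wait_until_replicating(get_status, timeout=5.0, interval=0.1):
    """Poll get_status() until both replication threads report 'Yes'.

    get_status is a hypothetical callable returning a dict shaped like the
    status dicts quoted in the log. Returns True once both threads run,
    or False if that has not happened by the time `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if (status.get('slave_io_running') == 'Yes'
                and status.get('slave_sql_running') == 'Yes'):
            return True
        if time.monotonic() >= deadline:
            return False  # "It didn't start well"
        time.sleep(interval)


def move_succeeded(current_status, sibling_status,
                   current_lag, sibling_lag, timeout):
    """Same condition as the check quoted at 12:55:04, with the two lag
    values passed in as plain numbers instead of being read from a host."""
    for status in (current_status, sibling_status):
        if not status.get('success'):
            return False
        if status.get('slave_io_running') != 'Yes':
            return False
        if status.get('slave_sql_running') != 'Yes':
            return False
    return current_lag < timeout and sibling_lag < timeout


def make_fake_status(ready_after):
    """Test helper (assumption): threads report 'Yes' from call number
    `ready_after` onwards, and 'No' before that."""
    calls = {'n': 0}

    def get_status():
        calls['n'] += 1
        running = 'Yes' if calls['n'] >= ready_after else 'No'
        return {'success': True,
                'slave_io_running': running,
                'slave_sql_running': running}

    return get_status


if __name__ == '__main__':
    print(wait_until_replicating(make_fake_status(3), timeout=2.0))
```

Polling with a deadline rather than sleeping for the full timeout is in line with the improvement floated at 12:56 (waiting only as long as needed), although the lag-based wait mentioned there is not modelled here.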