[07:39:04] good morning
[07:40:22] sorry about pinging you yesterday, I didn't think I would easily find the slow query on our monitoring
[07:41:22] no problem at all
[07:42:02] it turned out to be a serious problem
[07:42:29] (which, on the positive side, means it should not be very common)
[07:42:41] yeah it looked a bit strange
[08:07:07] volans, I want to reduce our dependency on events - while they have numerous advantages
[08:07:31] they have very little visibility, and that creates issues in the long term
[08:07:57] do we have a full list of all the ones that are in place?
[08:08:05] that doesn't mean deleting them all at once
[08:08:21] of course, but starting to move away from them
[08:08:25] but stop creating them, and maybe substitute the ones that need rewriting
[08:08:54] I am going to start by migrating analytics purging to a cron job inside mariadb::maintenance
[08:09:08] as it needs rewriting
[08:09:12] ok
[08:09:30] the other thing that I thought is to stop using events for delayed slaves
[08:09:51] and either backport or use a different server version for it
[08:10:20] having to disable events every time there is maintenance in order to stop a slave is a pain
[08:10:33] I want your input (agree, disagree, etc.)
[08:11:47] we do not need another ticket for a TODO kind of task
[08:13:08] there are still watchdog-like events on the servers, which are more complex, as we do not want too much overhead on connection handling
[08:13:44] but those can and do create issues like large error logs
[08:14:06] for the delayed replication, it seems they didn't include it in 10.1 either: https://jira.mariadb.org/browse/MDEV-7145
[08:14:18] yeah, saw that
[08:14:41] which is understandable because of multi-source replication
[08:14:58] also, for the same reasons, pt-slave-delay segfaults
[08:15:49] so maybe not something to think about now, but if/when dbstore replacements arrive
[08:16:30] and pt-slave-delay has the same issue as pt-heartbeat... it is an outside process, could die, etc.
[08:16:53] yes
[08:16:59] if it worked :-)
[08:17:08] yeah :)
[08:17:10] assuming that
[08:17:14] it is not a win-win
[08:17:43] but after being here for some time, maintainability and visibility are my number 1 priority
[08:19:05] unlike heartbeat, if that dies it is not a huge issue, assuming proper monitoring
[08:19:19] but at least you know it died
[08:19:31] how about events that fail to execute?
[08:20:04] issues? just kill the process/comment the cron
[08:20:06] agree, as long as the delayed monitor is done on both sides (i.e. 23h < lag < 25h)
[08:20:15] in fact
[08:20:24] now it is not done on the short side
[08:20:38] because it usually goes below that
[08:21:05] a better method would allow more exact tuning of that (per instruction)
[08:24:36] based on the timestamp of the next query?
[08:25:12] yes, but that is done automatically by the MASTER_DELAY functionality
[08:25:36] are you worried about DBs that are not written very often?
[08:25:51] ?
[08:26:47] I agree that per-instruction is the exact way, but for databases that get written all the time the delay should be pretty much accurate too
[08:26:49] the delay? Not much. I am worried about having to handle events every single time I do maintenance
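For context, the MASTER_DELAY functionality mentioned at [08:25:12] is the MySQL 5.6+ built-in delayed replication that MariaDB 10.0/10.1 lacked (MDEV-7145). A minimal sketch of that alternative to the event-based delay, assuming a single replication connection and a purely illustrative 24h value:

```sql
-- Sketch only: MySQL 5.6+ built-in delayed replication, the alternative to the
-- event-based delay discussed above. The 86400s (24h) value is illustrative.
STOP SLAVE SQL_THREAD;
CHANGE MASTER TO MASTER_DELAY = 86400;
START SLAVE SQL_THREAD;
-- SHOW SLAVE STATUS then exposes SQL_Delay / SQL_Remaining_Delay, which would
-- make a 23h < lag < 25h check straightforward on both sides.
```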
[08:27:11] or not knowing what is running on the databases
[08:27:27] of course, I was comparing the basic delay method vs per-instruction
[08:27:52] I fully agree that events are hidden
[08:28:00] I do not care much about that exactitude, but commenting that 23h < lag < 25h is not possible now
[08:28:13] but it could be
[08:28:39] why does it go below the short side?
[08:28:52] check the event logic
[08:29:11] ok :)
[08:29:13] it is basically a cron, so it has limitations
[08:31:32] yep, every 5 seconds if I read it correctly
[08:31:51] which for x1 is enough to catch up
[08:32:42] see also m3
[08:33:36] not worried - those are small enough to recover from backups
[08:33:57] what about using START SLAVE UNTIL? (probably what pt-slave-delay does?)
[08:34:31] I do not want any of these, I do not have an issue with that
[08:34:44] my issue is with events
[08:35:29] I think this is way more sane: https://dev.mysql.com/doc/refman/5.6/en/replication-delayed.html
[08:36:09] absolutely, as long as someone will merge it into MariaDB too :)
[08:36:16] they
[08:36:18] or us
[08:36:24] or a third way
[08:37:01] Percona does support Toku and that
[08:37:36] sadly, multi-source replication is only in 5.7, which is not stable at all
[08:38:30] not stable the multi-source on 5.7, or 5.7 in general still not stable?
[08:38:41] I would say neither
[08:39:16] it also does not have to be percona if we go all-innodb, as I want
[08:39:45] but same issue
[08:40:06] and we do not want to maintain 1 single different server just for that
[08:40:30] the ideal solution is to backport the feature, but with multi-source, it is not trivial
[08:42:46] given that MySQL 5.7 has both (delayed replication and multisource), assuming delayed works with multi-source too, should not the code be pretty much already there?
[08:42:55] no
[08:43:08] multisource replication is completely different on both
[08:43:20] totally different implementation
[08:43:29] true, I forgot that
[08:43:51] to the point that it is the reason MariaDB has delayed it, even if it is a very much required feature
[08:44:36] not worried too much, I think it can stay like this for now
[08:45:02] maybe trying to fix pt-s-d is more feasible short-term
[08:45:29] probably
[08:45:47] or
[08:45:57] consider what's after 10.0
[08:46:29] if 10.1, or if it makes sense to have a pool of 5.7s for specific roles
[08:59:57] OT: there was quite a spike on wfLogDBError around 22 UTC last night, all from db1034 (s7 slave, watchlist & co)
[09:00:11] I'm taking a look at slow queries
[09:28:15] I might have some culprit, do you think it is worth chasing?
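As a reference for the START SLAVE UNTIL idea raised at [08:33:57] (roughly what an external delayer such as pt-slave-delay automates), a rough sketch with made-up binlog coordinates:

```sql
-- Rough sketch: delay a slave by letting the SQL thread advance only to a
-- binlog position known to be ~24h old. File name and position are placeholders.
START SLAVE SQL_THREAD UNTIL
  MASTER_LOG_FILE = 'mysql-bin.000123',
  MASTER_LOG_POS = 4;
-- An outside process has to periodically pick a newer "24h old" position and
-- reissue the statement, which is exactly the fragility discussed above.
```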
[09:28:39] sure
[09:28:49] unless it is a known and reported issue
[09:28:51] :-)
[09:29:05] good luck to me finding it on phab :D
[09:29:17] well, send me the link and I will tell you
[09:30:03] watchlist has some issues with very large watchlists on editing
[09:30:18] they are known, and there is already a plan
[09:30:30] I "think" it was this:
[09:30:35] https://tendril.wikimedia.org/report/slow_queries_checksum?checksum=452a75a017b7a9219cc1d22dc3458067&host=db1034&user=&schema=&hours=12
[09:32:26] usually I file them with the module name (ApiQueryContributions)
[09:32:36] https://phabricator.wikimedia.org/T131065
[09:32:43] around ~50 of them were executed around that time and the explain is not very comforting
[09:33:35] the problem is the FORCE + indexes
[09:33:49] actually I tried without it and it uses the same index
[09:34:03] not separately
[09:34:10] and not for every query
[09:34:20] just add a comment there
[09:34:32] same ticket?
[09:34:46] yes, of course
[09:34:52] :)
[09:35:26] find, if you can, the corresponding API call in the logs
[09:35:39] but do not publicly disclose IPs or users
[10:04:44] running
[10:04:49] yes | for host in db2036 db2043 db2050 db2057 dbstore2001 dbstore2002; do ./osc_host.sh --host=$host.codfw.wmnet --db=ruwiktionary --table=categorylinks --method=ddl --no-replicate "DROP INDEX cl_collation, ADD INDEX cl_collation_ext (cl_collation, cl_to, cl_type, cl_from)"; done
[11:03:12] where to find the API call logs?
[11:03:26] kibana
[11:03:37] it will have created a db error
[11:03:52] limit the timestamp
[11:04:05] yes I did, was looking there, let me recheck
[11:04:21] didn't you start your research precisely there :-)
[11:04:47] the raw mediawiki logs are in fluorine
[11:05:12] but there should not be a difference with the ones on logstash
[11:05:18] yes, but there are only connection errors on kibana
[11:05:25] I didn't see the API request
[11:05:43] only /w/api.php
[11:05:53] there is a board on long running queries
[11:06:01] as complained by HHVM too
[11:06:17] they are POST, so maybe that is why they are not in kibana
[11:07:07] then maybe they are not in the logs at all
[11:08:15] most of the queries were on cawiki, there are just a few errors from cawiki, most of them are spread across many wikis
[11:08:28] hard to say which call generated that query from wfLogDBError
[11:09:48] check near here: https://logstash.wikimedia.org/#dashboard/temp/AVR2TwhOO3D718AOAAX2
[11:10:54] so do not spend too much time on that, as I said, it is a known issue
[11:12:04] ok, thanks
[11:12:47] I would be more interested in restarting semisync
[11:12:48] found one
[11:13:21] yeah, for sure, it was already on my list for today, sorry for the distraction
[11:40:48] I was looking at db1040, you reimaged it yesterday right?
[11:41:37] ah, but with the data copied away and then back I guess
[11:42:13] this would explain why it has rpl_semi_sync_master loaded too...
[11:42:34] yep
[11:43:23] great, for a moment I thought my change did something wrong
[11:54:08] * volans getting lunch, all commands ready to do semisync just after
[12:54:52] I'll start with s2
[12:55:52] good, log once
[13:01:18] metadata lock issues on s3 are nonexistent, so it is ok to run online DDL/pt-osc there
[13:02:24] I think they only happen on large read-wikis like commons/enwiki/dewiki/wikidata/etc.
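The osc_host.sh loop pasted at [10:04:49] wraps a plain DDL statement per host; a hedged sketch of what the --method=ddl --no-replicate combination presumably boils down to when run by hand on one host (the script's internals are assumed here, only the table and index names are taken from the command):

```sql
-- Assumed equivalent of the schema change above when run directly on one host.
SET SESSION sql_log_bin = 0;          -- assumed meaning of --no-replicate
ALTER TABLE ruwiktionary.categorylinks
  DROP INDEX cl_collation,
  ADD INDEX cl_collation_ext (cl_collation, cl_to, cl_type, cl_from),
  ALGORITHM = INPLACE, LOCK = NONE;   -- error out instead of blocking if the
                                      -- change cannot be done online
```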
[13:03:16] good, I was just looking at this during lunch :) http://code.openark.org/blog/mysql/solving-the-facebook-osc-non-atomic-table-swap-problem
[13:06:07] not really our problem
[13:06:18] we do not have issues with table swapping
[13:06:33] but with creating and dropping the triggers
[13:07:14] I know, it was just in today's feeds, I laughed at the coincidence
[13:07:22] on the topic
[13:07:23] oh, is it new?
[13:07:29] today's
[13:07:32] I thought I had read it before
[13:07:57] lol
[13:08:19] "Rumor has it that we at GitHub are developing a new, triggerless, Online Schema Change tool. It is rumored to be based off binary logs and is rumored to have lots of interesting rumored implications."
[13:08:22] I like that
[13:08:53] although I would like to see the consistency, especially in an env like ours
[13:09:18] and multi-source
[13:10:06] bah, for external apps it is easy to patch
[13:10:31] it usually just means default_master_connection
[13:11:17] they mention the possibility that it could be based on binlog reading
[13:11:23] I wonder
[13:11:57] how to achieve consistency, because applying binlogs out of order scares me
[13:12:03] especially STATEMENT ones
[13:12:17] without GTIDs
[13:12:35] my goal is to solve that with brute force
[13:12:50] pool & depool automatically
[13:12:56] including the masters
[13:13:02] masters, in plural
[13:14:15] mostly because it solves more than osc at the same time
[13:14:41] and our masters have generally low load
[13:14:49] it solves most of the issues: TLS, upgrades, etc...
[13:15:02] "solve" is too strong a word
[13:15:07] mitigate?
[13:15:28] make it easier
[13:15:52] and if it is 100% depooled, it can be done unattended
[13:16:00] which means it can be automated
[13:16:44] all those other options are ok, but still increase load, etc.
[13:17:16] plus most rely on primary keys being there, which is a goal too, but not reality
[13:17:56] ok, running the schema change on the master, we will see if the upgrade is worth it
[13:18:07] which shard?
[13:18:25] s3 ok
[13:18:26] s3
[13:19:30] no issues for now, hopefully also not when it finishes
[13:20:04] (it is obviously run with the binlog disabled, but just in case)
[13:33:27] so far so good, the s2 master already had the plugin but turned off, now it is on, stats are good
[13:34:13] the only one that I'm looking at, to be sure it is not something to worry about, is rpl_semi_sync_master_wait_pos_backtraverse
[13:34:17] https://mariadb.com/kb/en/mariadb/semisynchronous-replication-plugin-status-variables/#rpl_semi_sync_master_wait_pos_backtraverse
[13:35:38] it is slightly increasing, it's at 29 so far in ~23 minutes
[13:36:16] I have no idea what that is, looking
[13:36:41] but semisync making the master slower is actually a good thing
[13:36:48] that is intended
[13:37:11] in fact, I think in 5.7 there is the option to wait for > 1 server
[13:37:22] and we would use that if we could
[13:37:53] but why should the order be inverted?
[13:38:12] Quote: "Occurs if the order in which transactions start waiting for a reply is different from the order in which their binary log events were written"
[13:39:15] as far as I know we do not have parallel replication enabled
[13:39:22] or group commit, etc.
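A read-only way to check the semisync state and the counters discussed here (including rpl_semi_sync_master_wait_pos_backtraverse and the average wait time quoted just below); this is only a monitoring sketch, no thresholds are implied:

```sql
-- Monitoring sketch: plugin state plus all Rpl_semi_sync_master_* counters.
SHOW GLOBAL VARIABLES LIKE 'rpl_semi_sync_master_enabled';
SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_master%';
```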
[13:40:53] btw the average wait time of a transaction is ~620 microseconds
[13:41:05] that is good
[13:42:14] 1612 QPS (assuming serial transactions, which we of course do not do)
[13:43:05] do not overthink semisync too much, it has been on for years already, it is not a new setup
[13:43:15] it was just misconfigured and forgotten
[13:44:47] BTW, I am closing some tickets harder than usual, because I realized that having a ticket open that has no clear actionables creates noise
[13:45:13] that doesn't mean they should not be reopened, etc.
[13:45:29] I will probably make mistakes
[13:45:31] ok, I'll skip s3 for now and proceed to s4... until your OSC finishes
[13:45:53] it finished already: T130692
[13:45:53] T130692: Add new indexes from eec016ece6d2b30addcdf3d3efcc2ba59b10e858 to production databases - https://phabricator.wikimedia.org/T130692
[13:46:38] oops, I missed it, great
[13:47:36] s3 then
[14:22:01] * andrewbogott waves at jynus
[14:22:31] so, andrewbogott, at some point let's test changing it to point to localhost there
[14:22:41] and that way we will know
[14:23:25] jynus: can do it right now if you're not in the thick of other things.
[14:23:43] it is ok
[14:23:53] Also I gave you all the wrong hostnames just now, they should be *2001
[14:24:00] which I'm sure you knew already
[14:24:03] ?
[14:24:17] ah, yes, I understood you
[14:24:18] labtestservices2001
[14:24:35] I basically followed at all times what the role did for the install
[14:25:06] so if that is really a test thing, just change the host to localhost (if it has logs, even better)
[14:25:16] then I will add the grants
[14:25:22] if it fails
[14:25:44] once it works, I will puppetize and add the right grants
[14:26:11] ok, my test case is dig @labtestservices2001.wikimedia.org control.labtestproject.codfw.labtest
[14:26:18] which returns 10.196.16.17 currently
[14:26:24] I'll change the config now
[14:26:47] maybe it does the right thing and switches to socket
[14:27:00] hang on, that was the 'before I change something' test
[14:27:05] now I've changed the config and restarted...
[14:27:09] looks like pdns won't start
[14:27:27] do you have any kind of log?
[14:27:36] looking, one second...
[14:27:50] gmysql Connection failed: Unable to connect to database: Access denied for user 'pdns'@'localhost' (using password: YES)
[14:28:02] oh, that is both good and bad
[14:28:29] good because it uses the right thing, bad because it has the incorrect pass
[15:17:50] jynus: according to config x1 should have semi-sync too, but the eqiad slave is db1029, which is still on 5.5; will the plugin be compatible?
[15:18:19] yes, it should
[15:18:33] in any case, I have to reimage the slave soon
[15:18:39] so far I skipped old 5.5 masters because they will be reimaged soon
[15:18:59] ok, I'll enable it on x1 too
[15:20:02] seems to work :)
[15:42:19] jynus: in addition to pdns needing to r/w that database from 'localhost', we also need designate to be able to r/w
[15:42:48] Are those grants already on your radar?
[15:43:52] no
[15:44:09] doesn't designate have its own database?
[15:44:27] yes — this is messy :(
[15:44:34] clarify that
[15:44:39] Designate has the canonical representation of domains and records
[15:44:52] Records are synced with pdns via AXFR (which means that pdns is writing them itself)
[15:44:53] that still stays on m5?
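The 'Access denied' error at [14:27:50] points at the kind of grant/password pair the pdns gmysql backend needs on the local DB server; a hypothetical sketch (the actual fix in the conversation was a password mismatch handled through hiera/puppet, and the database name and privilege list are assumptions):

```sql
-- Hypothetical shape of the local grant for the pdns gmysql backend.
-- Database name, privilege list and the password placeholder are assumptions.
GRANT SELECT, INSERT, UPDATE, DELETE ON pdns.*
  TO 'pdns'@'localhost' IDENTIFIED BY '<password from hiera>';
```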
[15:45:02] but AXFR doesn't support domain syncing
[15:45:07] no, I do not care about the why
[15:45:13] so to tell pdns about newly created domains, designate has to write directly to the pdns database
[15:45:23] I'll figure out what the grants should look like and update the bug
[15:45:23] what databases and users must go where?
[15:45:39] please comment on the bug
[15:46:08] I only moved the pdns db and user
[15:57:27] what about the pdns_admin account?
[15:58:50] andrewbogott, see https://gerrit.wikimedia.org/r/286671
[15:59:00] there is some confusion there
[16:13:57] jynus: added comments but I'm not sure I'm addressing your confusion
[16:15:01] so I have the pdns user, AFAIK, correctly being added using the same hiera key as on the client
[16:15:19] there was another user on the old db called pdns_admin
[16:15:52] oh, I see your comment now
[16:16:01] that was the missing key
[16:16:41] and then there is the other user you mentioned, designate something
[16:43:45] jynus: weirdly, designate appears to use the 'pdns' user to access the pdns database.
[16:43:55] It works, but maybe I should refactor that?
[16:44:32] well, I think that is the sane option
[16:44:40] ok, will write a patch...
[16:44:42] the issue is from where and to which db
[16:44:46] no, I mean
[16:44:51] leaving it as it is
[16:44:55] oh, ok!
[16:45:10] my question is which host it tries to connect to and from where
[16:45:12] let me figure out where in hiera the host is coming from...
[16:45:37] because if I remember correctly, designate has its own db, on m5, not replicated
[16:45:52] that is the only thing that should be clarified
[16:46:08] the designate service uses two databases — 'designate' which will remain unchanged
[16:46:12] because you only asked me to move pdns
[16:46:15] and 'pdns' which is what we're changing
[16:46:30] ok, just make sure it uses m5 for designate
[16:46:34] yep
[16:46:38] and localhost for pdns
[16:46:58] It can't use localhost though
[16:47:11] https://gerrit.wikimedia.org/r/286671 should do it
[16:47:36] There's only one designate process. It needs to talk to /both/ new pdns databases.
[16:47:40] So it can't refer to them as localhost
[16:47:53] so that is a grant that is missing
[16:47:53] My current patch names them by IP, but I can do it by fqdn instead if that's better
[16:48:01] no, ip is required
[16:48:23] just add a comment on the grants (the grants are for the clients from the server perspective)
[16:48:26] and it will be coming in from labservices1001, but let me figure out what the right hiera name is for that
[16:48:43] server==db server
[16:50:12] volans, https://phabricator.wikimedia.org/T131368#2260608
[16:51:02] where are our DBs? :(
[16:51:24] lost in translation...
[16:51:41] coming on the 25th
[16:51:58] jynus: I commented with what we should need. Hope it makes sense
[16:52:34] get prepared for the fun of the 16 of T133398
[16:52:34] T133398: Install, configure and provision recently arrived db core machines - https://phabricator.wikimedia.org/T133398
[16:54:04] yeah!
[16:54:08] andrewbogott, that makes total sense
[16:54:33] just that with so many combinations, it will require some care on failover
[16:55:09] when you have a moment I'd like you to have a look at https://phabricator.wikimedia.org/T133584 (just the last few posts)
[16:55:20] T133584
[16:55:20] T133584: Reduce stress caused by IABot on the DB - https://phabricator.wikimedia.org/T133584
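Since grants are defined on the DB server per client host ("the grants are for the clients from the server perspective") and designate connects over the network, the missing grant discussed above would look roughly like this; the client IP, database name and privilege list are placeholders, not the production values:

```sql
-- Hypothetical remote grant for the designate process, which reuses the 'pdns'
-- user but connects from the designate host rather than from localhost.
GRANT SELECT, INSERT, UPDATE, DELETE ON pdns.*
  TO 'pdns'@'<designate host IP>' IDENTIFIED BY '<password from hiera>';
-- Verify what the server ends up with:
SHOW GRANTS FOR 'pdns'@'<designate host IP>';
```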