[07:54:11] <_joe_> Have you seen https://vitess.io/ ? It's cloud scale mysql!!1!
[07:54:21] yeah
[07:54:32] using it on google and slack
[07:54:50] <_joe_> it looks like a modern tungsten replicator to me
[07:54:52] but we prefer not to do sharding at the moment
[07:55:32] <_joe_> also, their CTO just told us we should call you people DBEs, not DBAs
[07:55:34] it is just a framework for sharding, so it requires code changes; we are not there yet, we are not bottlenecked by writes
[07:55:45] <_joe_> jynus: I agree
[07:56:19] <_joe_> it seems a nice project, but not something for us, tbh
[07:56:31] <_joe_> also they say they rewrite queries
[07:57:01] well, they have to, to convert queries to send them to multiple hosts
[07:57:21] <_joe_> indeed
[07:57:34] this is a sharding framework, it is on the radar, but not something we are invested in right now
[07:57:45] e.g. query rewrite is something that the dbs can do right now
[07:58:06] and we are looking at other solutions for pooling, caching, etc.
[07:58:12] high availability
[07:58:35] what does the E in DBE stand for? experts?
[07:58:36] <_joe_> I saw github is evaluating it, it makes sense for them probably
[07:58:40] <_joe_> Engineers
[07:58:44] ah
[07:58:50] nah, they don't need it either
[07:59:02] they have less traffic than we do too
[07:59:03] <_joe_> moritzm: it's like "dba is overused, let's invent a new term, like SRE"
[07:59:19] <_joe_> jynus: but the advantage is it's supposed to help run mysql on kubernetes
[07:59:28] <_joe_> from their perspective, I guess
[07:59:33] sure, on cloud I see it
[07:59:42] I was hoping for something more elusive, like "database expressionist" or so
[07:59:52] but we are not going that way, at least for now
[07:59:57] <_joe_> moritzm: ahahah
[08:00:11] <_joe_> jynus: yes, it doesn't make sense for us at all
[08:00:28] I was thinking the other day
[08:00:46] etcd is not for high bandwidth
[08:00:57] but would it make sense for sessions?
[08:01:10] given it is not really that high for us
[08:01:21] <_joe_> no, etcd doesn't make sense for sessions
[08:01:32] <_joe_> cassandra or anything else natively multi-dc does
[08:01:46] but it is like 10 writes/s
[08:01:58] <_joe_> so we don't need a big cassandra cluster
[08:02:00] <_joe_> :)
[09:14:42] DBA, Patch-For-Review: Productionize 8 eqiad hosts - https://phabricator.wikimedia.org/T192979#4173877 (jcrespo)
[10:10:16] jynus: marostegui Hey, do you think it's okay to deploy this: https://gerrit.wikimedia.org/r/#/c/427202/
[10:11:20] what does the script do?
[10:13:10] jynus: it sets term_search_key in the wb_terms table to an empty string
[10:13:28] clearing the table (making it smaller)
[10:16:07] why head?
[10:17:21] tac
[10:18:11] that is more complex than it needs to be
[10:18:14] but ok if it works
[10:21:12] my suggestion would be to print all values, sort by natural order, and take the largest; much easier
[10:33:03] jynus: yeah, it is copy-pasted from an older script we used for wb_terms
[10:33:12] let me apply Lucas' note
[10:33:28] anyway, please follow the requested workflow
[10:33:34] add me as a reviewer
[10:33:54] and ask me to deploy with a bit of background (given there is no text on the commit)
[10:34:04] that will be faster than pinging me here
[10:39:09] yeah sure
[10:39:22] DBA, Operations, ops-eqiad, Patch-For-Review: db1098 crashed and got rebooted - https://phabricator.wikimedia.org/T193331#4174122 (jcrespo) s6 main_tables.txt have been checked, no errors found, now checking s7 instance: ``` $ cat s7.dblist | while read db; do cat main_tables.txt | while read t...
[10:39:41] can you also share some example logs on terbium on the commit?
[10:39:44] thanks
[10:40:34] Actually, now that I read it, Lucas' suggestion was basically the same as mine
[10:42:00] Amir1: this is not specifically for you, but you may want to ping someone from wikidata
[10:42:50] jynus: I don't get it
[10:42:53] most of the warnings right now for T191282 seem to come from wikibase
[10:42:54] T191282: Wikimedia\Rdbms\LoadBalancer::{closure}: found writes pending - https://phabricator.wikimedia.org/T191282
[10:43:06] (I was searching the ticket :-P)
[10:43:39] not sure if wikibase or the new job queue is to blame
[10:43:48] but they may be able to help debugging
[10:44:41] okay, I will bring this up in the sprint planning :)
[10:44:53] I will add the wikidata tag there
[10:44:56] and an explanation
[10:46:23] okay
[10:46:45] I don't think it is urgent, as in things breaking, but it seems to be causing 400K logs per hour
[10:47:35] that sounds pretty bad
[10:47:43] I will take a look ASAP
[10:47:55] btw, regarding the logs for the script, unfortunately I can't make the file, but this is a sample
[10:48:03] https://www.irccloud.com/pastebin/bBITcPNM/
[10:48:11] well you can maybe fake one
[10:48:13] :-)
[10:48:31] better, I ran it for a minute and got the output :D
[10:48:37] that is ok
[10:48:50] I can move that to the right place
[10:48:57] to bootstrap it
[10:49:16] let me finish what I am doing now first
[10:49:39] may merge after lunch, will ping you first
[10:49:58] DBA, Continuous-Integration-Infrastructure, MediaWiki-Database, Quibble, and 2 others: Enable MariaDB/MySQL strict mode on CI slaves - https://phabricator.wikimedia.org/T119371#1824701 (hashar) I have used the wrong task number (T118371). Logs: >>! In T118371#4174152, @gerritbot wrote: > Change...
[10:51:07] DBA, Continuous-Integration-Infrastructure, MediaWiki-Database, Quibble, and 2 others: Enable MariaDB/MySQL strict mode on CI db hosts - https://phabricator.wikimedia.org/T119371#4174200 (jcrespo)
[10:52:12] thanks!
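A minimal sketch of jynus's 10:21 suggestion for the wb_terms script: instead of reading the log backwards with tac and taking the first hit with head, collect every ID the script printed, sort them numerically ("natural order"), and take the largest. It assumes the script uses head/tac to find the last processed ID in its own output; that output format is not shown in this log, so the `term_row_id=<n>` pattern and the function name `last_processed_id` are hypothetical.

```python
import re

# Hypothetical log line format: the real script's output is not shown in the chat.
ID_PATTERN = re.compile(r"term_row_id=(\d+)")


def last_processed_id(lines):
    """Return the largest ID found in the log lines, or None if there is none.

    Rather than reading the log backwards (tac) and taking the first hit (head),
    collect every ID, sort numerically ("natural order"), and take the largest.
    """
    ids = sorted(
        int(match.group(1))
        for line in lines
        if (match := ID_PATTERN.search(line))
    )
    return ids[-1] if ids else None


if __name__ == "__main__":
    sample = ["updated term_row_id=9", "updated term_row_id=120", "updated term_row_id=35"]
    # Numeric sort picks 120; a plain string sort would have picked "9".
    print(last_processed_id(sample))
```

The numeric conversion matters: sorting the raw strings would rank "9" above "120". Since only the maximum is needed, `max()` with a default would be equivalent; the sort is kept to mirror the suggestion literally.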
[11:59:07] DBA, AbuseFilter, Patch-For-Review: Move AbuseFilter slow filters data from Logstash to per-filter profiling - https://phabricator.wikimedia.org/T179604#4174370 (Daimona) The table is now prettier, and so is data storing. However, there's a doubt for which I'd like to hear some thoughts. In https://g...
[13:58:02] hi, I was looking at the list of reported smart failures and db1063
[13:58:38] db1064, db1066, db1073 are also in the list; are those going to be decommissioned, and/or is it known that their disks are not healthy according to smart?
[14:01:06] well, most of those are going to be moved to less important roles
[14:01:30] but they are for the most part in use
[14:01:35] there are spare disks, though
[14:05:56] then there is dbstore1002, which is analytics' and is scheduled for decom, but next quarter
[14:06:43] could the messages be cut from cluster=mysql device=megaraid,5 instance=dbstore1002:9100 job=node site=eqiad
[14:07:06] to device=X / device={X} ?
[14:07:29] the rest, on the ones that are failing, is not really interesting @ icinga (will never change)
[14:08:19] or even just "X"
[14:23:58] we could add an option to ignore certain tags per check, yeah
[14:24:14] but is it really different on the others?
[14:24:22] why is the cluster relevant there?
[14:24:32] is it to copy and paste?
[14:25:16] those are the tags on the metric that is in alert, not necessarily to copy and paste, no
[14:25:32] bbiab
[14:25:46] you mean that it is based on a generic alert with prometheus?
[14:26:06] if it is, I understand
[14:39:39] yeah, so the smart metric we are alerting on has those tags
[14:42:57] jynus: we can add an option to display only certain tags tho, or hide some, do you mind filing a task?
[14:44:33] I will
[14:45:14] although I may need other examples of how that is being used, for context
[14:45:29] but I guess we do not have many alerting right now
[14:48:35] indeed, not many
[14:59:46] jynus: for the db hosts, would you like tasks for smart failures? ditto for dbstore and labsdb
[15:00:04] not sure
[15:00:25] if they bother you, I will ack them for a day and wait for manuel to see what we do
[15:00:50] jynus: no bother, no, I'm ok to wait for you and manuel to decide
[15:01:23] those hosts are likely to be moved
[15:01:36] from their current role
[15:08:12] ok! I've silenced them until tomorrow, just to avoid noise/confusion
[15:18:26] DBA, Operations, ops-eqiad, Patch-For-Review: db1098 crashed and got rebooted - https://phabricator.wikimedia.org/T193331#4175144 (jcrespo) No consistency issues found on s6 and s7, repooling.
[15:26:53] DBA, Operations, ops-eqiad, Patch-For-Review: db1098 crashed and got rebooted - https://phabricator.wikimedia.org/T193331#4175165 (jcrespo) @robh the specific incident for this host has been taken care, should we centralize the recurring issue into a separate task? If yes, I would close this as r...
[15:44:30] DBA, Operations, ops-eqiad, Patch-For-Review: db1098 crashed and got rebooted - https://phabricator.wikimedia.org/T193331#4175204 (jcrespo) p:High>Normal
[15:48:57] DBA, Operations, ops-codfw, Patch-For-Review: db2081 crashed/rebooted, probably due to hardware failure - https://phabricator.wikimedia.org/T193325#4175216 (jcrespo) I am leaving a check ongoing on wikidatawiki on some codfw hosts to proof no data was lost.
[16:51:03] DBA, Operations, ops-eqiad, Patch-For-Review: db1098 crashed and got rebooted - https://phabricator.wikimedia.org/T193331#4175365 (Marostegui) I vote for closing this and if it happens again on any other host. Open a general task and a case with the vendor. db1100 crashed half a year ago and ne...
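The per-check label option floated between 14:23 and 14:42 (display only certain tags on an alert, or hide some) could look roughly like this sketch. `format_alert_labels` and its `include`/`exclude` parameters are hypothetical and are not an existing Icinga or Prometheus option. The example labels are the ones quoted for the dbstore1002 smart alert.

```python
def format_alert_labels(labels, include=None, exclude=()):
    """Render alert labels as space-separated key=value pairs.

    include: if given, only these keys are shown, in this order.
    exclude: keys to drop from the output.
    Both parameters are hypothetical; they sketch the option discussed, not an existing one.
    """
    keys = include if include is not None else sorted(labels)
    return " ".join(
        f"{key}={labels[key]}"
        for key in keys
        if key in labels and key not in exclude
    )


if __name__ == "__main__":
    # Labels quoted in the dbstore1002 smart alert.
    labels = {
        "cluster": "mysql",
        "device": "megaraid,5",
        "instance": "dbstore1002:9100",
        "job": "node",
        "site": "eqiad",
    }
    print(format_alert_labels(labels))
    print(format_alert_labels(labels, include=["device"]))
    print(format_alert_labels(labels, exclude={"cluster", "job", "site"}))
```

With `include=["device"]` the alert text shrinks to `device=megaraid,5`, close to the `device=X` form asked for at 14:07; `exclude` covers the "hide some" variant.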
[20:18:50] jynus: when I use "describe user" on metawiki_p, the user_editcount field says it's NULLed, but in fact it's not, and it should not be
[20:19:48] https://phabricator.wikimedia.org/P7068
[20:48:58] DBA: Drop flaggedrevs tables on wikis where it is not enabled - https://phabricator.wikimedia.org/T174801#4176392 (MarcoAurelio)
[20:50:02] DBA: Drop flaggedrevs tables at eswikibooks - https://phabricator.wikimedia.org/T193676#4176393 (MarcoAurelio)
[20:52:34] DBA: Drop flaggedrevs tables at eswikibooks - https://phabricator.wikimedia.org/T193676#4176405 (MarcoAurelio) Strangely the tables have data, but we never used the extension on es.wikibooks (and it's not installed there either).
[20:56:45] DBA: Drop flaggedrevs tables on wikis where it is not enabled - https://phabricator.wikimedia.org/T174801#4176423 (MarcoAurelio)
[20:57:29] DBA: Drop flaggedrevs tables at eswiki - https://phabricator.wikimedia.org/T193678#4176424 (MarcoAurelio)
[21:31:40] DBA, Epic, Tracking: Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking) - https://phabricator.wikimedia.org/T54921#3464591 (MarcoAurelio) With regards to one of the tables mentioned in the task description (arbcom1_vote.ibd) please note that Extension:BoardVote is mar...
[21:33:44] DBA, Epic, Tracking: Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking) - https://phabricator.wikimedia.org/T54921#4176562 (MarcoAurelio)
[21:33:46] DBA: Drop flaggedrevs tables from mediawikiwiki - https://phabricator.wikimedia.org/T186865#4176561 (MarcoAurelio)