[07:54:11] <_joe_>	 Have you seen https://vitess.io/ ? It's cloud scale mysql!!1!
[07:54:21] <jynus>	 yeah
[07:54:32] <jynus>	 it is in use at google and slack
[07:54:50] <_joe_>	 it looks like a modern tungsten replicator to me
[07:54:52] <jynus>	 but we prefer not to do sharding at the moment
[07:55:32] <_joe_>	 also, their CTO just told us we should call you people DBEs, not DBAs
[07:55:34] <jynus>	 it is just a framework for sharding, so it requires code changes; we are not there yet, and we are not bottlenecked by writes
[07:55:45] <_joe_>	 jynus: I agree
[07:56:19] <_joe_>	 it seems a nice project, but not something for us, tbh
[07:56:31] <_joe_>	 also they say they rewrite queries
[07:57:01] <jynus>	 well, they have to, to convert queries to send them to multiple hosts
[07:57:21] <_joe_>	 indeed
[07:57:34] <jynus>	 this is a sharding framework, it is on the radar, but not something we are invested in right now
[07:57:45] <jynus>	 e.g. query rewrite is something that the dbs can do right now
[07:58:06] <jynus>	 and we are looking at other solutions for pooling, caching, etc.
[07:58:12] <jynus>	 high availability
[07:58:35] <moritzm>	 what does the E in DBE stand for? experts?
[07:58:36] <_joe_>	 I saw github is evaluating it, it makes sense for them probably
[07:58:40] <_joe_>	 Engineers
[07:58:44] <moritzm>	 ah
[07:58:50] <jynus>	 nah, they don't need it either
[07:59:02] <jynus>	 they have less traffic than we do too
[07:59:03] <_joe_>	 moritzm: it's like "dba is overused, let's invent a new term, like SRE"
[07:59:19] <_joe_>	 jynus: but the advantage is it's supposed to help run mysql on kubernetes
[07:59:28] <_joe_>	 from their perspective, I guess
[07:59:33] <jynus>	 sure, on cloud I see it
[07:59:42] <moritzm>	 I was hoping for something more elusive, like "database expressionist" or so
[07:59:52] <jynus>	 but we are not going that way, at least for now
[07:59:57] <_joe_>	 moritzm: ahahah
[08:00:11] <_joe_>	 jynus: yes, it doesn't make sense for us at all
[08:00:28] <jynus>	 I was thinking the other day
[08:00:46] <jynus>	 etcd is not for high bandwidth
[08:00:57] <jynus>	 but would it make sense for sessions?
[08:01:10] <jynus>	 given it is not really that high for us
[08:01:21] <_joe_>	 no, etcd doesn't make sense for sessions
[08:01:32] <_joe_>	 cassandra or anything else natively multi-dc does
[08:01:46] <jynus>	 but it is like 10 writes/s
[08:01:58] <_joe_>	 so we don't need a big cassandra cluster
[08:02:00] <_joe_>	 :)
[09:14:42] <wikibugs_>	 DBA, Patch-For-Review: Productionize 8 eqiad hosts - https://phabricator.wikimedia.org/T192979#4173877 (jcrespo)
[10:10:16] <Amir1>	 jynus: marostegui Hey, do you think it's okay to deploy this: https://gerrit.wikimedia.org/r/#/c/427202/
[10:11:20] <jynus>	 what does the script do?
[10:13:10] <Amir1>	 jynus: it sets term_search_key in wb_terms table with empty string
[10:13:28] <Amir1>	 clearing the table (making it smaller)
[10:16:07] <jynus>	 why head?
[10:17:21] <jynus>	 tac
[10:18:11] <jynus>	 that is more complex than it needs to be
[10:18:14] <jynus>	 but ok if it works
[10:21:12] <jynus>	 my suggestion would be to print all values, sort by natural order, and take the largest - much easier
[10:33:03] <Amir1>	 jynus: yeah, it is the copy paste from an older script we used for wb_terms
[10:33:12] <Amir1>	 let me apply Lucas' note
[10:33:28] <jynus>	 anyway, please follow the requested workflow
[10:33:34] <jynus>	 you add me as a reviewer
[10:33:54] <jynus>	 and ask me to deploy with a bit of background (given there is no text on the commit)
[10:34:04] <jynus>	 that will be faster than pinging me here
[10:39:09] <Amir1>	 yeah sure
[10:39:22] <wikibugs_>	 DBA, Operations, ops-eqiad, Patch-For-Review: db1098 crashed and got rebooted - https://phabricator.wikimedia.org/T193331#4174122 (jcrespo) s6 main_tables.txt have been checked, no errors found, now checking s7 instance:  ```  $ cat s7.dblist | while read db; do cat main_tables.txt | while read t...
[10:39:41] <jynus>	 can you also share some example logs on terbium on the commit ?
[10:39:44] <jynus>	 thanks
[10:40:34] <jynus>	 Actually, now that I read it, Lucas' suggestion was basically the same as mine
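The suggestion above (print all values, sort by natural order, take the largest) could be sketched roughly as follows. This is only an illustration, not the script under review: the helper names `natural_key` and `largest` and the sample values are hypothetical, and it assumes the values are plain strings that may contain numeric runs.

```python
import re


def natural_key(value):
    # Split into digit and non-digit runs so that "10" sorts after "9".
    # Each part is tagged (0, text) or (1, number) so that strings and
    # integers are never compared with each other directly.
    parts = [p for p in re.split(r"(\d+)", value) if p != ""]
    return [(1, int(p)) if p.isdigit() else (0, p) for p in parts]


def largest(values):
    # Hypothetical helper: return the largest value in natural order.
    return max(values, key=natural_key)


if __name__ == "__main__":
    # Made-up sample values, only to show the ordering.
    sample = ["batch-9", "batch-10", "batch-2"]
    print(sorted(sample, key=natural_key))
    print(largest(sample))
```

A plain lexical sort would put "batch-10" before "batch-2"; the natural key avoids that without needing head or tac.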
[10:42:00] <jynus>	 Amir1: this is not specifically for you, but you may want to ping someone from wikidata
[10:42:50] <Amir1>	 jynus: I don't get it
[10:42:53] <jynus>	 most of the warnings right now for T191282 seem to come from wikibase
[10:42:54] <stashbot>	 T191282: Wikimedia\Rdbms\LoadBalancer::{closure}: found writes pending - https://phabricator.wikimedia.org/T191282
[10:43:06] <jynus>	 (I was searching the ticket :-P)
[10:43:39] <jynus>	 not sure if wikibase or the new job queue is to blame
[10:43:48] <jynus>	 but they may be able to help debugging
[10:44:41] <Amir1>	 okay, I will bring this up in the sprint planning :)
[10:44:53] <jynus>	 I will add the wikidata tag there
[10:44:56] <jynus>	 and an explanation
[10:46:23] <Amir1>	 okay
[10:46:45] <jynus>	 I don't think it is urgent as in, things are breaking, but it seems to be causing 400K logs per hour
[10:47:35] <Amir1>	 that sounds pretty bad
[10:47:43] <Amir1>	 I will take a look ASAP
[10:47:55] <Amir1>	 btw, regarding the logs for the script, unfortunately I can't make the file, but this is a sample
[10:48:03] <Amir1>	 https://www.irccloud.com/pastebin/bBITcPNM/
[10:48:11] <jynus>	 well you can maybe fake one
[10:48:13] <jynus>	 :-)
[10:48:31] <Amir1>	 better, I ran it for a minute and got the output :D
[10:48:37] <jynus>	 that is ok
[10:48:50] <jynus>	 I can move that to the right place
[10:48:57] <jynus>	 to bootstrap it
[10:49:16] <jynus>	 let me finish what I am doing now first
[10:49:39] <jynus>	 may merge after lunch, will ping you first
[10:49:58] <wikibugs_>	 DBA, Continuous-Integration-Infrastructure, MediaWiki-Database, Quibble, and 2 others: Enable MariaDB/MySQL strict mode on CI slaves - https://phabricator.wikimedia.org/T119371#1824701 (hashar) I have used the wrong task number (T118371). Logs:  >>! In T118371#4174152, @gerritbot wrote: > Change...
[10:51:07] <wikibugs_>	 DBA, Continuous-Integration-Infrastructure, MediaWiki-Database, Quibble, and 2 others: Enable MariaDB/MySQL strict mode on CI db hosts - https://phabricator.wikimedia.org/T119371#4174200 (jcrespo)
[10:52:12] <Amir1>	 thanks!
[11:59:07] <wikibugs_>	 DBA, AbuseFilter, Patch-For-Review: Move AbuseFilter slow filters data from Logstash to per-filter profiling - https://phabricator.wikimedia.org/T179604#4174370 (Daimona) The table is now prettier, and so is data storing. However, there's a doubt for which I'd like to hear some thoughts. In https://g...
[13:58:02] <godog>	 hi, I was looking at the list of reported smart failures and db1063
[13:58:38] <godog>	 db1064, db1066, db1073 are also in the list, are those going to be decommissioned, and/or is it known that the disks are not healthy according to smart?
[14:01:06] <jynus>	 well, most of those are going to be moved to less important roles
[14:01:30] <jynus>	 but they are for the most part in use
[14:01:35] <jynus>	 there are spare disks, though
[14:05:56] <jynus>	 then there is dbstore1002 which is analytics' and is scheduled for decom, but next quarter
[14:06:43] <jynus>	 could the messages be cut from cluster=mysql device=megaraid,5 instance=dbstore1002:9100 job=node site=eqiad
[14:07:06] <jynus>	 to device=X / device={X} ?
[14:07:29] <jynus>	 the rest, on the ones where it is failing, is not really interesting @ icinga (will never change)
[14:08:19] <jynus>	 or even just "X"
[14:23:58] <godog>	 we could add an option to ignore certain tags per check yeah
[14:24:14] <jynus>	 but is it really different on the others?
[14:24:22] <jynus>	 why is the cluster relevant there?
[14:24:32] <jynus>	 is it to copy and paste?
[14:25:16] <godog>	 those are the tags on the metric that is in alert, not necessarily to copy and paste no
[14:25:32] <godog>	 bbiab
[14:25:46] <jynus>	 you mean that it is based on a generic alert with prometheus?
[14:26:06] <jynus>	 if it is, I understand
[14:39:39] <godog>	 yeah, so the smart metric we are alerting on has those tags
[14:42:57] <godog>	 jynus: we can add an option to display only certain tags tho, or hide some, do you mind filing a task?
[14:44:33] <jynus>	 I will
[14:45:14] <jynus>	 although I may need other examples on how that is being used, for context
[14:45:29] <jynus>	 but I guess we do not have many alerts right now
[14:48:35] <godog>	 indeed, not many
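The option discussed above (display only certain tags, or hide some) might look roughly like the sketch below. It is not the actual check code: `parse_labels` and `render_labels` and their `show`/`hide` parameters are hypothetical names, and it assumes the alert text is a space-separated list of `name=value` pairs like the one quoted by jynus.

```python
def parse_labels(text):
    # Turn "a=1 b=2" into {"a": "1", "b": "2"}, keeping the original order.
    labels = {}
    for token in text.split():
        name, _, value = token.partition("=")
        labels[name] = value
    return labels


def render_labels(labels, show=None, hide=()):
    # show=None means "all tags"; hide removes tags from whatever is shown.
    names = [
        name
        for name in labels
        if (show is None or name in show) and name not in hide
    ]
    return " ".join(f"{name}={labels[name]}" for name in names)


if __name__ == "__main__":
    raw = "cluster=mysql device=megaraid,5 instance=dbstore1002:9100 job=node site=eqiad"
    labels = parse_labels(raw)
    print(render_labels(labels, show={"device"}))
    print(render_labels(labels, hide={"cluster", "job", "site"}))
```

With `show={"device"}` the message shrinks to the one tag that actually differs between failing disks, while `hide` keeps everything except the tags that never change for a given host.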
[14:59:46] <godog>	 jynus: for the db hosts would you like tasks for smart failures ? ditto for dbstore and labsdb
[15:00:04] <jynus>	 not sure
[15:00:25] <jynus>	 if they bother you, I will ack them for a day and wait for manuel to see what we do
[15:00:50] <godog>	 jynus: no bother no, I'm ok to wait for you and manuel to decide
[15:01:23] <jynus>	 those hosts are likely to be moved
[15:01:36] <jynus>	 from their current role
[15:08:12] <godog>	 ok! I've silenced them until tomorrow, just to avoid noise/confusion
[15:18:26] <wikibugs_>	 DBA, Operations, ops-eqiad, Patch-For-Review: db1098 crashed and got rebooted - https://phabricator.wikimedia.org/T193331#4175144 (jcrespo) No consistency issues found on s6 and s7, repooling.
[15:26:53] <wikibugs_>	 DBA, Operations, ops-eqiad, Patch-For-Review: db1098 crashed and got rebooted - https://phabricator.wikimedia.org/T193331#4175165 (jcrespo) @robh the specific incident for this host has been taken care of, should we centralize the recurring issue into a separate task? If yes, I would close this as r...
[15:44:30] <wikibugs_>	 DBA, Operations, ops-eqiad, Patch-For-Review: db1098 crashed and got rebooted - https://phabricator.wikimedia.org/T193331#4175204 (jcrespo) p:High>Normal
[15:48:57] <wikibugs_>	 DBA, Operations, ops-codfw, Patch-For-Review: db2081 crashed/rebooted, probably due to hardware failure - https://phabricator.wikimedia.org/T193325#4175216 (jcrespo) I am leaving a check ongoing on wikidatawiki on some codfw hosts to prove no data was lost.
[16:51:03] <wikibugs_>	 DBA, Operations, ops-eqiad, Patch-For-Review: db1098 crashed and got rebooted - https://phabricator.wikimedia.org/T193331#4175365 (Marostegui) I vote for closing this and if it happens again on any other host. Open a general task and a case with the vendor.   db1100 crashed half a year ago and ne...
[20:18:50] <Hauskatze>	 jynus: when I use "describe user" on metawiki_p, the user_editcount field says it's NULLed, but in fact it's not, and should not be
[20:19:48] <Hauskatze>	 https://phabricator.wikimedia.org/P7068
[20:48:58] <wikibugs_>	 DBA: Drop flaggedrevs tables on wikis where it is not enabled - https://phabricator.wikimedia.org/T174801#4176392 (MarcoAurelio)
[20:50:02] <wikibugs_>	 DBA: Drop flaggedrevs tables at eswikibooks - https://phabricator.wikimedia.org/T193676#4176393 (MarcoAurelio)
[20:52:34] <wikibugs_>	 DBA: Drop flaggedrevs tables at eswikibooks - https://phabricator.wikimedia.org/T193676#4176405 (MarcoAurelio) Strangely the tables have data, but we never used the extension on es.wikibooks (and it's not installed there either).
[20:56:45] <wikibugs_>	 DBA: Drop flaggedrevs tables on wikis where it is not enabled - https://phabricator.wikimedia.org/T174801#4176423 (MarcoAurelio)
[20:57:29] <wikibugs_>	 DBA: Drop flaggedrevs tables at eswiki - https://phabricator.wikimedia.org/T193678#4176424 (MarcoAurelio)
[21:31:40] <wikibugs_>	 DBA, Epic, Tracking: Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking) - https://phabricator.wikimedia.org/T54921#3464591 (MarcoAurelio) With regards to one of the tables mentioned in the task description (arbcom1_vote.ibd) please note that Extension:BoardVote is mar...
[21:33:44] <wikibugs_>	 DBA, Epic, Tracking: Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking) - https://phabricator.wikimedia.org/T54921#4176562 (MarcoAurelio)
[21:33:46] <wikibugs_>	 DBA: Drop flaggedrevs tables from mediawikiwiki - https://phabricator.wikimedia.org/T186865#4176561 (MarcoAurelio)