[01:04:51] 10DBA, 10MediaWiki-API, 07Performance: Slow query in API list=tags - https://phabricator.wikimedia.org/T164552#3237462 (10Catrope)
[01:23:49] 07Blocked-on-schema-change, 10DBA, 10MediaWiki-extensions-ORES, 06Scoring-platform-team: Deploy uniqueness constraints on ores_classification table - https://phabricator.wikimedia.org/T164530#3236866 (10Catrope) >>! In T164530#3236896, @jcrespo wrote: > Thanks. One thing that I have been suggesting lately...
[01:28:53] 10DBA, 10MediaWiki-API, 10MediaWiki-Change-tagging, 13Patch-For-Review, 07Performance: Slow query in API list=tags - https://phabricator.wikimedia.org/T164552#3237504 (10TTO)
[05:43:47] 10DBA, 10Wikidata, 13Patch-For-Review, 07Schema-change: Drop the useless wb_terms keys "wb_terms_entity_type" and "wb_terms_type" on "wb_terms" table - https://phabricator.wikimedia.org/T163548#3237677 (10Marostegui) db2059 is done: ``` root@neodymium:~# mysql --skip-ssl -hdb2059.codfw.wmnet wikidatawiki -...
[05:44:45] 07Blocked-on-schema-change, 10DBA, 10Wikidata, 13Patch-For-Review, 03Wikidata-Sprint: Deploy schema change for adding term_full_entity_id column to wb_terms table - https://phabricator.wikimedia.org/T162539#3237678 (10Marostegui) db2059 is done: ``` root@neodymium:~# mysql --skip-ssl -hdb2059.codfw.wmnet...
[07:23:49] 10DBA: Run pt-table-checksum on s3 - https://phabricator.wikimedia.org/T164488#3237753 (10Marostegui) s3 is going to be interesting to run the pt-table-checksum on...as not all the tables exist on all the wikis as per my last checks some months ago. I would start by doing a list of the most important tables we w...
[07:33:53] 07Blocked-on-schema-change, 10DBA, 10Expiring-Watchlist-Items, 10MediaWiki-Watchlist, and 3 others: Add wl_id to watchlist tables on production dbs - https://phabricator.wikimedia.org/T130067#3237779 (10Marostegui) a:03Marostegui
[08:23:34] I have started to load db1015 again
[08:23:41] minus cebwiki
[08:23:45] good!
[08:23:52] we will see how long it takes
[08:24:15] I can also start testing more on dbstore2001
[08:24:42] https://grafana.wikimedia.org/dashboard/file/server-board.json?refresh=1m&orgId=1&var-server=db1015&var-network=eth0
[08:25:34] Are you loading in parallel then? With how many threads if yes?
[08:25:55] 8 threads
[08:26:09] nice
[08:26:17] so 8 databases in parallel?
[08:26:25] no
[08:26:37] it can do intra-db and intra-table parallelism, too
[08:26:44] nice!
[08:27:13] I have disabled all alerts on db1015
[08:27:23] in case something goes wrong again
[08:27:38] you think it will finish today?
[08:27:41] yes
[08:27:52] cebwiki is a very large db
[08:28:00] and should be migrated away from s3
[08:28:05] is that the one with a huge templatelinks table?
[08:28:10] yes
[08:28:23] * marostegui worried that he knows that from memory
[08:28:23] migrating away is the first step
[08:28:36] the real way to fix this is to avoid:
[08:36:56] | 2617034 | 10 | ; | 0 |
[08:37:05] repeated 100M
[08:37:08] https://ceb.wikipedia.org/w/index.php?title=Espesyal:WhatLinksHere/Plantilya:;&hidelinks=1&hideredirs=1
[08:37:36] 10DBA, 06Operations: Decomissions old s2 eqiad hosts (db1018, db1021, db1024, db1036) - https://phabricator.wikimedia.org/T162699#3237844 (10Marostegui) You think db1024 can go away now? It was the old old master (it is depooled) and as per T154485#3171631 we should be good to go.
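For context, the reload discussed at [08:23:34]-[08:26:37] maps roughly onto a myloader invocation like the sketch below. Only the 8-thread figure comes from the log; the dump directory, connection details and remaining options are illustrative assumptions, not the command actually used.

```
# Hypothetical myloader run for a parallel logical reload of db1015, assuming a
# mydumper export directory under /srv/tmp (path is illustrative). Because large
# tables are dumped as several chunk files, the loader threads can also work on
# chunks of the same table concurrently - the intra-db and intra-table
# parallelism mentioned in the conversation.
myloader \
    --directory=/srv/tmp/export-s3 \
    --threads=8 \
    --queries-per-transaction=1000 \
    --overwrite-tables \
    --host=localhost \
    --user=root
# Credentials are omitted on purpose; supply them however is appropriate for
# the environment. This is a sketch, not the production procedure.
```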
[08:38:24] 10DBA, 06Operations: Decomissions old s2 eqiad hosts (db1018, db1021, db1024, db1036) - https://phabricator.wikimedia.org/T162699#3237847 (10jcrespo) Yes
[08:39:35] 10DBA, 06Operations: Decomissions old s2 eqiad hosts (db1018, db1021, db1024, db1036) - https://phabricator.wikimedia.org/T162699#3237850 (10Marostegui) >>! In T162699#3237847, @jcrespo wrote: > Yes Great, will prepare the patches and merge them next week and let Chris know so he can do his part too.
[08:48:22] oh, you are running alter table there
[08:48:24] I will wait
[08:48:51] Yes, in 2001 and 1001
[08:48:53] 2002 is free
[08:49:05] yeah, but I want a place with plenty of space
[08:49:12] Ah
[08:49:14] I will wait
[08:49:16] np
[08:49:41] mydumper detected the long-running query and warned
[08:51:18] Sorry then, it will take almost the whole day
[08:52:06] no sorry!
[08:52:30] as if I didn't have plenty of things pending to do!
[08:53:47] haha
[08:53:48] yeah
[09:02:51] or
[09:03:07] I can try remote dumping from an inactive codfw server
[09:03:36] for example db2071
[09:10:16] Sure, not touching that one
[09:16:24] 10DBA, 06Operations, 13Patch-For-Review: Decomissions old s2 eqiad hosts (db1018, db1021, db1024, db1036) - https://phabricator.wikimedia.org/T162699#3237920 (10Marostegui) All the patches to decommission db1024 are ready now. I will merge next week and create an specific task for Chris for that host so he d...
[09:54:31] is there any database pending to clone?
[09:55:06] pending to clone for what?
[09:55:16] like pending to fix
[09:55:20] like broken that needs cloning?
[09:55:20] ah
[09:55:24] or to install for the first time
[09:55:36] mmm, no, because db1022 was the only one, and we decided not to care about it
[09:55:41] I saw a new method of cloning that could potentially be faster
[09:55:52] and supports multi-hosts
[09:56:06] To install for the first time, maybe you can take one of the new ones and check the future db-eqiad.php and place it somewhere where it will be needed
[09:56:32] maybe I can clone 2 of those, even if later we discard them
[09:56:41] sure
[09:56:58] But in order not to waste your time, if there is a clear place where it will go in the future, just do it I would say
[09:57:06] sure
[09:57:17] I mean, of course, that is why I asked
[09:57:21] yeah yeah
[09:57:37] I haven't reviewed that file in a while, but maybe there are some obvious cases
[09:57:41] so you can install those already
[10:00:52] I think aside from other changes, most of the new hosts will be the multi-instance ones
[10:01:21] I know, I can clone s5 to future s8
[10:01:28] :)
[10:01:35] and leave it for now as s5 with everything
[10:03:27] oh, lots of servers not booting
[10:03:36] yeah
[10:03:37] :(
[10:04:04] this is getting complex - with current state, desired state and everything in-between
[10:05:37] also, the earlier we use them and crash them the earlier we can complain
[10:05:43] haha
[10:05:44] yes
[10:06:17] I am going to update the table on top of https://phabricator.wikimedia.org/T162233
[10:06:25] with the latest status of each one
[10:06:30] that must be updated
[10:06:34] I think I did it
[10:06:41] no, not the racks
[10:06:45] ah
[10:06:51] but the lifecycle
[10:07:15] the ones that are not strikethrough are the ones that are failing
[10:07:36] ok, I didn't get that
[10:07:38] thanks
[10:07:44] maybe it is not clear
[10:07:56] and I should specify it on the original table?
[10:07:57] I will make it explicit
[10:08:02] don't worry
[10:08:05] ok thanks!
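A remote dump from an idle codfw replica, as floated at [09:02:51]-[09:03:36], could look roughly like the sketch below. db2071 is named in the log; the thread count, chunk size, output path and credentials handling are assumptions rather than the command that was run.

```
# Sketch of a remote mydumper run against an inactive codfw replica (db2071 is
# mentioned in the log; every option value here is illustrative). Dumping from
# another host keeps the dump I/O away from servers busy running ALTER TABLEs.
mydumper \
    --host=db2071.codfw.wmnet \
    --user=dump \
    --threads=8 \
    --compress \
    --rows=20000000 \
    --outputdir=/srv/tmp/export-$(date +%Y%m%d-%H%M%S)
# Credentials are omitted on purpose; pass them however is appropriate for the
# environment. --rows splits large tables into chunk files that can later be
# loaded in parallel.
```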
[10:08:16] sorry for the confusion :)
[10:16:41] 10DBA, 06Operations, 13Patch-For-Review: eqiad rack/setup 11 new DB servers - https://phabricator.wikimedia.org/T162233#3238142 (10jcrespo)
[10:21:10] I am going to try to provision db1099 and db1103 at the same time from db1049
[10:22:42] is es2019 still in "probation"? all notifications are disabled in icinga
[10:22:53] yes
[10:22:59] it is not pooled
[10:23:06] and I have to run a script to check it still
[10:23:43] how much time have we spent on that guy all together? :D
[10:23:45] Yeah, we left it there
[10:23:49] ETOOMUCH IMHO
[10:24:26] volans, do you want to be responsible for losing wikipedia content?
[10:24:35] I think worth it every time
[10:24:53] in money and all that? clearly not
[10:25:02] jynus: it would be nice to run your compare.py there, to see if it behaves well with load
[10:25:06] or it crashes again
[10:25:07] buying a new server would have cost less
[10:25:19] marostegui, that's the plan
[10:25:29] with all the other pending tasks
[10:25:32] :-)
[10:25:52] I was referring to getting a replacement one from the manufacturer, given that it crashes all the time
[10:26:01] not that the time spent was not worth it
[10:26:02] ;)
[10:26:21] like it would have been so much better if we were able to get it replaced after the 2nd or 3rd crash
[10:26:30] 10DBA, 06Operations, 10ops-codfw: es2019 crashed again - https://phabricator.wikimedia.org/T149526#3238169 (10jcrespo) 05Resolved>03Open Pending to run compare.py around the ids obtained before.
[10:26:32] yeah, and not get it wiped too
[10:26:33] :p
[10:26:50] volans I would love to have had that
[10:26:54] but don't tell me
[10:27:01] tell papaul or robh
[10:27:18] and they will explain to you why they didn't/couldn't do that
[10:27:28] i wonder what is needed to get a full replacement, I guess you really need to probe it
[10:27:38] otherwise change main board and good luck
[10:27:42] but it is the same we had with db2034
[10:27:51] it would have been way easier to get it replaced
[10:27:53] that one was fun
[10:29:29] 53m28.379s to backup enwiki
[10:29:35] nice!
[10:29:44] despite the alter
[10:29:53] although I suspect a remote copy may be faster
[10:30:04] 93G
[10:31:05] enwiki.revision.00000.sql.gz - enwiki.revision.00006.sql.gz + enwiki.revision-schema.sql.gz
[10:32:24] so, one file with the table structure and then 6 files with the data itself?
[10:32:30] yes
[10:32:33] for large tables
[10:32:33] very nice
[10:32:43] those can be loaded in parallel
[10:32:55] for the same table? very nice
[10:33:13] basically, pagelinks, revision, text and templatelinks
[10:34:13] we may want to even separate backup place from lagged slave
[10:34:25] in the future
[10:35:13] you can see the structure at dbstore2001:/srv/tmp/export-20170505-090626 if curious
[10:35:32] including a metadata file
[10:36:55] So you define at which point you want to chunk the data? like: tables bigger than 100M rows and then it automatically does that?
[10:37:05] either that or by size
[10:37:12] Or can you also say: for this table I want 10 chunks
[10:37:13] I thought by row was better
[10:37:23] thinking on parallel load
[10:37:28] even if they are thinner
[10:38:37] the slices are not perfect because it uses heuristics to be able to do parallelism
[10:39:49] with this we can have more flexible backups in 8 hours
[10:39:58] yeah, it is a nice discovery
[10:40:37] well, I was going to reimplement it because I thought this older version wouldn't work for us
[10:40:53] the version in stretch works better
[10:40:58] regarding locks
[10:41:25] also no gtid registration, only binlog position
[10:42:05] my aim with this is to create a cron + script on dbstore2001, test it
[10:42:21] and once we are happy, run it on dbstore1001, too
[10:44:26] 10DBA, 06Operations: es2019 crashed again - https://phabricator.wikimedia.org/T149526#3238218 (10jcrespo) a:05Marostegui>03jcrespo
[10:50:57] 10DBA, 06Operations: Create less overhead on bacula jobs when dumping production databases - https://phabricator.wikimedia.org/T162789#3238231 (10jcrespo) I am recovering db1015 again minus cebwiki. Better that leaving db1015 broken and doing nothing. I have disabled notifications on db1015 just in case. Crea...
[10:52:02] I think I can create backups on /srv/tmp, and once they finish, package them (eg. tar per database on s3) and move them to /srv/backups
[10:55:00] 10DBA, 06Operations: Create less overhead on bacula jobs when dumping production databases - https://phabricator.wikimedia.org/T162789#3238240 (10jcrespo) a:03jcrespo
[10:55:11] Remember dbstore2001 might have old backups on /srv/backups
[10:55:17] I would say delete them anyways
[10:59:01] yeah
[10:59:10] as long as they are localizable
[10:59:18] and more than enough space
[10:59:24] they can be left there
[11:00:02] I will organize things with puppet soon
[11:07:50] jynus: re your comment @ https://phabricator.wikimedia.org/T164407#3237950 it doesn't look like that is cognate related
[11:08:20] I was just pasting anything anomalous I can find there
[11:08:47] the background being that now there is almost no innodb contention on x1
[11:12:05] interesting, and no idea as to the cause?
[11:12:12] nope
[11:12:17] traffic change?
[11:12:32] so the interesting part is how variable stuff is in x1
[11:12:44] that was my main motivation for that comment
[11:14:43] Innodb_mutex_spin_rounds dropped at 1:50 also (whatever that is)
[11:15:06] yeah, just sometimes innodb does active wait
[11:15:26] normal that if blocking goes down, spins go down too
[11:15:44] that is for fine-tuning innodb, and sincerely, we are not yet there :-)
[11:30:13] 10DBA, 06Operations: es2019 crashed again - https://phabricator.wikimedia.org/T149526#3238274 (10jcrespo) Left running on neodymium (I did some optimizations to ignore values below and beyond max id respectively): ``` while read db id; do echo "./compare.py es1019.eqiad.wmnet es2019.codfw.wmnet $db blobs_clust...
[11:30:49] ^the thing with this is less "do you trust gtid" and more "do you trust the hw controller" to do its job
[11:45:13] addshore, your answer on the ticket was exactly what I had in mind
[11:45:50] decrease in query execution time? or?
[11:45:51] something like bad performance created by a separate process creating overhead on cognate and making the write slow
[11:47:10] any ideas on ways to track it down? is there a full list of things using x1 somewhere?
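The packaging idea at [10:52:02] (dump into /srv/tmp, tar per database, move the result to /srv/backups) could be implemented along the lines of the sketch below; the paths, file-name conventions and loop are assumptions for illustration, not the script that was later puppetized.

```
#!/bin/bash
# Hedged sketch of the dump-then-package flow from [10:52:02]. Assumes mydumper
# wrote compressed per-table files named <db>.<table>[.NNNNN].sql.gz plus a
# <db>-schema-create.sql.gz per database into $SRC; everything else here is
# illustrative.
set -euo pipefail

SRC="/srv/tmp/export-$(date +%Y%m%d)"   # temporary dump area
DST="/srv/backups"                      # final area, later picked up by bacula
mkdir -p "$DST"

# Package one tarball per database (useful on s3, which hosts many wikis); the
# temporary files can be removed once the tarballs are in place.
for createfile in "$SRC"/*-schema-create.sql.gz; do
    db="$(basename "$createfile" -schema-create.sql.gz)"
    ( cd "$SRC" && tar -czf "$DST/${db}-$(date +%Y%m%d).tar.gz" \
        "${db}-schema-create.sql.gz" "${db}."* )
done
```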
[12:34:49] 10DBA, 10Wikidata: Migrate wb_terms to using prefixed entity IDs instead of numeric IDs - https://phabricator.wikimedia.org/T114903#3238485 (10Lydia_Pintscher)
[12:34:51] 10DBA, 10Wikidata, 07Schema-change, 03Wikidata-Sprint: Evaluate how to best add a column for full entity ID to wb_terms without affecting wikidata.org users - https://phabricator.wikimedia.org/T159718#3238480 (10Lydia_Pintscher) 05Open>03Resolved a:03Lydia_Pintscher Further work in T162539.
[13:15:02] 10DBA: Drop pre_ tables - https://phabricator.wikimedia.org/T118859#1811055 (10Marostegui) For the record, they exist on: s2 s3 s4 s5 s6 s7
[13:16:52] 10DBA: Drop pre_ tables - https://phabricator.wikimedia.org/T118859#1811055 (10jcrespo) We should backup the backup, I normally archive them for some weeks on es2001.
[14:05:39] BTW, I assume you saw the new checkboxes on tendril :-)
[14:06:27] I did :)
[14:06:29] hehe
[14:06:36] Very useful XD
[14:06:55] But I believe I saw you did that during the weekend or the bank holiday and I was like: grrrr!!!
[14:06:58] but thanks!
[14:13:07] 10DBA, 05MW-1.30-release-notes, 10MediaWiki-API, 10MediaWiki-Change-tagging, and 2 others: Slow query in API list=tags - https://phabricator.wikimedia.org/T164552#3238830 (10Anomie) 05Open>03Resolved The slow query isn't in ApiQueryTags anymore, now it's in ChangeTags and hidden behind a 5-minute cache.
[15:12:26] 07Blocked-on-schema-change, 10DBA, 10Wikidata, 13Patch-For-Review, 03Wikidata-Sprint: Deploy schema change for adding term_full_entity_id column to wb_terms table - https://phabricator.wikimedia.org/T162539#3239057 (10Marostegui) db2052 is done: ``` root@neodymium:/home/marostegui/git/software/dbtools# m...
[15:13:07] 10DBA, 10Wikidata, 13Patch-For-Review, 07Schema-change: Drop the useless wb_terms keys "wb_terms_entity_type" and "wb_terms_type" on "wb_terms" table - https://phabricator.wikimedia.org/T163548#3239058 (10Marostegui) db2052 is done: ``` root@neodymium:/home/marostegui/git/software/dbtools# mysql --skip-ssl...
[15:48:05] 10DBA, 06Operations, 10ops-eqiad, 13Patch-For-Review: Reset db1070 idrac - https://phabricator.wikimedia.org/T160392#3239219 (10Marostegui) @Cmjohnson just checking if in the end you updated the idrac firmware? No pushing by any means, just checking if I need to powercycle this host next week or not. Thank...
[17:10:11] 10DBA, 10Monitoring, 07Puppet: Document performance optimization of servermon and/or puppet reporting tools - https://phabricator.wikimedia.org/T164604#3239514 (10jcrespo)
[17:11:19] 10DBA, 10Monitoring, 07Puppet: Document performance optimization of servermon and/or puppet reporting tools - https://phabricator.wikimedia.org/T164604#3239534 (10jcrespo)
[17:13:53] 10DBA, 06Operations, 10ops-eqiad, 13Patch-For-Review: Reset db1070 idrac - https://phabricator.wikimedia.org/T160392#3239556 (10Cmjohnson) I updated the firmware on db1070 and ipmitool is still not working, I compared the idrac settings via the gui with db1068 (ipmi works) and not differences between the t...
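For the ipmitool issue in the last comment above, a quick side-by-side check against the known-good host could look like the sketch below; the management hostnames and interface flags are assumptions based on common naming conventions, not commands taken from the task.

```
# Hypothetical comparison between the failing host (db1070) and the working one
# (db1068). The .mgmt.eqiad.wmnet names and the lanplus interface are
# assumptions; -a prompts for the iDRAC password instead of putting it on the
# command line.
ipmitool -I lanplus -H db1070.mgmt.eqiad.wmnet -U root -a chassis power status
ipmitool -I lanplus -H db1068.mgmt.eqiad.wmnet -U root -a chassis power status
```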
[17:16:35] 10DBA, 10Monitoring, 07Puppet: Document performance optimization of servermon and/or puppet reporting tools - https://phabricator.wikimedia.org/T164604#3239573 (10jcrespo)
[17:45:02] 10DBA, 10Monitoring, 07Documentation, 07Puppet: Document performance optimization of servermon and/or puppet reporting tools - https://phabricator.wikimedia.org/T164604#3239694 (10Reedy)
[19:34:16] 10DBA, 06Operations: Create less overhead on bacula jobs when dumping production databases - https://phabricator.wikimedia.org/T162789#3239953 (10jcrespo) It took a bit more than 11 hours to reload logically db1015 (minus cebwiki) - that is 1.3 TB (out of a total of 1.5TB for all of s3). It is now back replica...