[06:02:03] Blocked-on-schema-change, DBA, Patch-For-Review: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509#3718538 (Marostegui) s3 master - db1075 is done. All the shards got the redundant indexes removed.
[06:02:13] Blocked-on-schema-change, DBA, Patch-For-Review: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509#3718539 (Marostegui)
[06:04:33] Blocked-on-schema-change, DBA, Patch-For-Review: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509#3718540 (Marostegui) >>! In T174509#3715077, @Marostegui wrote: > The index has been removed from all s3 hosts. The only pending one is the master (...
[06:09:52] elukey: db1108 still importing the last failed table, (which is huge), the rest went well, so only one pending!
[06:34:07] DBA, Patch-For-Review: Migrate some s4 hosts to file per table - https://phabricator.wikimedia.org/T161088#3718553 (Marostegui) a:Marostegui
[06:58:14] DBA, Data-Services, cloud-services-team (Kanban): Identify tools hosting databases on labsdb100[13] and notify maintainers - https://phabricator.wikimedia.org/T175096#3718568 (Marostegui) I have made a backup of that directory and left it on: ``` [root@labsdb1001 06:23 /srv/tmp] # pwd /srv/tmp [roo...
[07:38:02] DBA, Patch-For-Review: Support multi-instance on core hosts - https://phabricator.wikimedia.org/T178359#3718587 (Marostegui)
[07:38:52] DBA, Patch-For-Review: Productionize 22 new codfw database servers - https://phabricator.wikimedia.org/T170662#3718588 (Marostegui)
[07:46:30] DBA, Patch-For-Review: Support multi-instance on core hosts - https://phabricator.wikimedia.org/T178359#3718610 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts: ``` db2087.codfw.wmnet ``` The log can be found in `/var/log/wmf-auto-reimage/2017103...
[08:03:04] DBA, Patch-For-Review: Productionize 22 new codfw database servers - https://phabricator.wikimedia.org/T170662#3718616 (Marostegui)
[08:03:21] DBA, Patch-For-Review: Productionize 22 new codfw database servers - https://phabricator.wikimedia.org/T170662#3668334 (Marostegui)
[08:05:52] DBA, Patch-For-Review: Support multi-instance on core hosts - https://phabricator.wikimedia.org/T178359#3718618 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['db2087.codfw.wmnet'] ``` and were **ALL** successful.
[08:30:18] Blocked-on-schema-change, DBA, Patch-For-Review: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509#3718640 (Marostegui) Optimized s5 master: dewiki.pagelinks (10G) dewiki.templatelinks (30G) wikidatawiki.templatelinks (300M) pending: wikidatawiki...
[08:30:28] marostegui: \o/
[08:30:47] luckily we dropped some huge tables before doing this import
[09:11:09] Blocked-on-schema-change, DBA, MediaWiki-extensions-WikibaseRepository, Wikidata, and 3 others: Deploy dropping wb_entity_per_page table - https://phabricator.wikimedia.org/T177601#3718698 (Marostegui) Open>Resolved Table has been dropped everywhere
[09:11:50] Blocked-on-schema-change, DBA, MediaWiki-extensions-WikibaseRepository, Wikidata, and 3 others: Deploy dropping wb_entity_per_page table - https://phabricator.wikimedia.org/T177601#3718703 (Marostegui)
[09:42:41] marostegui: as FYI I merged a while ago https://gerrit.wikimedia.org/r/#/c/386636/
[09:42:50] (like earlier on today)
[09:43:18] this should raise the consumption of some jobs like refreshlink and htmlcacheupdate
[09:43:18] ah, thanks for the heads up
[09:43:29] the job queue is still 10M+
[09:43:32] :(
[09:50:05] DBA, Patch-For-Review: Support multi-instance on core hosts - https://phabricator.wikimedia.org/T178359#3718747 (Marostegui) db2087:s7 is done
[11:04:01] DBA, Operations, Ops-Access-Requests, Patch-For-Review, cloud-services-team (Kanban): Access to raw database tables on labsdb* for wmcs-admin users - https://phabricator.wikimedia.org/T178128#3718906 (Marostegui) @madhuvishy please review this: https://gerrit.wikimedia.org/r/#/c/387214/ and i...
[11:12:49] DBA: Database error on es.wikipedia.org in function "Revision::insertOn": "Duplicate entry for key 'PRIMARY'" - https://phabricator.wikimedia.org/T48047#532824 (Marostegui) @Platonides I was considering closing this ticket as it has been almost 4 years and looked like a one time thing?
[11:17:03] DBA, Patch-For-Review: Support multi-instance on core hosts - https://phabricator.wikimedia.org/T178359#3718930 (Marostegui)
[11:17:12] DBA, Patch-For-Review: Support multi-instance on core hosts - https://phabricator.wikimedia.org/T178359#3692976 (Marostegui) db2087 is fully ready after importing s6
[11:17:43] DBA, Patch-For-Review: Productionize 22 new codfw database servers - https://phabricator.wikimedia.org/T170662#3718934 (Marostegui)
[12:17:31] marostegui: Poke
[13:27:56] DBA, Operations, Ops-Access-Requests, Patch-For-Review, cloud-services-team (Kanban): Access to raw database tables on labsdb* for wmcs-admin users - https://phabricator.wikimedia.org/T178128#3719383 (madhuvishy) Open>Resolved
[13:30:32] marostegui: chasemp: It's 13:30 and I'm here, ready to merge https://gerrit.wikimedia.org/r/#/c/386660/ to switchover dns from 1001 to 1003
[13:31:07] madhuvishy: I am here too, but I thought the shutdown was in one hour
[13:31:15] That is what my calendar says :)
[13:31:45] confirmed atm
[13:31:45] marostegui: yes it is, I just want to switch dns over to the other box so some of the user connections can drop from 1001
[13:31:46] root@tools-bastion-03:~# host enwiki.labsdb
[13:31:46] enwiki.labsdb has address 10.64.4.11
[13:31:47] root@tools-bastion-03:~# host 10.64.4.11
[13:31:49] 11.4.64.10.in-addr.arpa domain name pointer labsdb1001.eqiad.wmnet.
[13:31:53] madhuvishy: ah sure :-)
[13:32:19] chasemp: cool thanks, merging now.
[13:36:09] root@tools-bastion-03:~# host enwiki.labsdb
[13:36:09] enwiki.labsdb has address 10.64.37.5
[13:36:09] root@tools-bastion-03:~# host 10.64.37.5
[13:36:12] 5.37.64.10.in-addr.arpa domain name pointer labsdb1003.eqiad.wmnet.
[13:36:42] after 'nscd -i hosts'
[13:36:49] oh cool
[13:37:53] puppet didn't log restarting pdns-recursor so was wondering if that happened
[13:38:42] madhuvishy: for something like this I would normally disable puppet there just to see it all happen in console
[13:39:25] yeah I did run it myself on 1002
[13:44:45] chasemp: okay I ran some checks over clush, things seem to look good on tools
[13:45:13] cool
[13:45:25] clushing the `nscd -i hosts` purge would be good if not done already
[13:45:34] there are non-Tools things that use the replicas...even legitimately
[13:45:45] i'll check quary
[13:45:47] maybe we should do the same clush from the labpuppetmaster clush masters
[13:45:48] quarry
[13:45:51] hmmm
[13:45:52] sure
[13:45:58] cumin
[13:46:05] sorry :D
[13:46:06] yes
[13:46:18] :-P
[13:46:54] volans you definitely have a highlight for the word cumin
[13:46:57] * marostegui 100% sure now
[13:47:16] I just came back and was reading backlogs
[13:47:36] let's start a daily recipe sharing group and spice up volans' life :)
[13:48:10] marostegui: you cannot prove it without a reasonable doubt
[13:49:07] * bd808 pings on "vagrant" still which can lead to a lot of interruptions
[13:49:25] madhuvishy: did you check idrac access on labsdb1001 in the end?
[13:49:33] marostegui: yeah I did
[13:49:37] (it doesn't work for me with the normal idrac password)
[13:49:50] marostegui: you have to log in as admin instead of root
[13:50:08] ah right
[13:50:11] https://wikitech.wikimedia.org/wiki/Cisco_UCS_C250_M1#Connecting
[13:50:11] works now
[13:50:11] thank you!
[13:50:28] yeah, i forgot those two are special...
[13:50:32] can't wait to get rid of them :)
[13:50:36] thanks
[13:51:41] marostegui: he he np, the date is coming soon :)
[14:00:39] chasemp: cumin-ed out to 715 hosts ;) dns looks good
[14:09:58] madhuvishy: when do you want me to stop replication and start shutting down mysql?
[14:10:57] marostegui: let's start in 20 minutes? 
That was the time we announced
[14:11:04] sounds good
[14:11:11] Cool
[14:28:33] DBA, Operations, ops-eqiad: decommission db1018 - https://phabricator.wikimedia.org/T176215#3719649 (Cmjohnson)
[14:28:45] DBA, Operations, ops-eqiad: decommission db1018 - https://phabricator.wikimedia.org/T176215#3617573 (Cmjohnson) Server has been wiped and removed from rack...racktables updated
[14:29:03] DBA, Operations, ops-eqiad: decommission db1018 - https://phabricator.wikimedia.org/T176215#3719653 (Cmjohnson) Open>Resolved
[14:29:06] DBA, Operations, Patch-For-Review: Decomissions old s2 eqiad hosts (db1018, db1021, db1024, db1036) - https://phabricator.wikimedia.org/T162699#3719654 (Cmjohnson)
[15:15:56] DBA, Operations, cloud-services-team, Patch-For-Review, Scoring-platform-team (Current): Labsdb* servers need to be rebooted - https://phabricator.wikimedia.org/T168584#3719840 (Marostegui)
[15:54:30] elukey: still importing the big big big table
[15:54:44] uffff
[15:55:45] it is a massive table
[15:55:54] let's hope it will be done by tomorrow
[15:56:23] i started its import yesterday around 2pm i think, so a bit over 24h now for that table (it is the last one)
[16:49:24] is the name of the table in the screen?
[16:49:35] no, you can see it on
[16:49:39] (give me a sec)
[16:49:47] mysql --skip-ssl -e "show processlist;" -S /run/mysqld/mysql.sock
[16:50:00] you will see the root line inserting
[16:50:02] ahh those are the ones in /srv/tmp/db1047_failed_tables
[16:50:05] yeah
[16:50:07] but only one pending
[16:50:13] INSERT INTO `MediaViewer_10867062_15423246`
[16:50:25] That one was huge
[16:50:50] there were worse ones :P
[16:50:52] Rows: 1147639343
[16:50:55] like 500GB ones
[16:51:07] that is a show table status, so not super accurate
[16:51:09] from db1047
[16:53:12] marostegui: I just discovered the pagelinks table
[16:53:18] uh? 
[16:53:24] and what refreshlink does
[16:53:29] ah
[16:53:30] XDDDD
[16:53:51] now I get why you guys need to be pinged when a huge amount of those jobs start
[16:54:08] yeah, it is a pain
[16:55:09] I thought it was more related to wikitext etc.. (the refreshlinks)
[16:55:20] but no, my understanding was completely wrong
[17:02:51] marostegui: chasemp bd808 labsdb1001 seems to still be doing okay, my test queries from tools all did fine. I'm going to revert my dns patch
[17:03:05] kk
[17:24:25] DBA, Operations, cloud-services-team, Patch-For-Review, Scoring-platform-team (Current): Labsdb* servers need to be rebooted - https://phabricator.wikimedia.org/T168584#3720347 (madhuvishy) The 1001 reboot is all done. Notes from my planning etherpad: labsdb1001 (Planned for Oct 30 2017 14:3...