[01:03:05] 10DBA, 10Wikimedia-Site-requests, 07Tracking: Database table cleanup (tracking) - https://phabricator.wikimedia.org/T18660#3219862 (10Krinkle) [01:04:56] 10DBA, 06Labs, 10MediaWiki-General-or-Unknown: MW database: user.user_editcount shows a wrong value - https://phabricator.wikimedia.org/T134359#3219865 (10Krinkle) [03:08:23] 10DBA, 10AbuseFilter, 06Performance-Team, 05MW-1.29-release (WMF-deploy-2017-04-25_(1.29.0-wmf.21)), 13Patch-For-Review: AFComputedVariable::compute query timeouts - https://phabricator.wikimedia.org/T116557#3219907 (10Krinkle) 05Open>03Resolved a:03Krinkle Looking at the last 7 days and 24 hours i... [05:03:26] 10DBA, 07Epic, 13Patch-For-Review, 05codfw-rollout: Database maintenance scheduled while eqiad datacenter is non primary (after the DC switchover) - https://phabricator.wikimedia.org/T155099#3219946 (10Marostegui) [05:03:29] 10DBA, 13Patch-For-Review: Network maintenance on row D (databases) - https://phabricator.wikimedia.org/T162681#3219943 (10Marostegui) 05Open>03Resolved a:03Marostegui This was all done and nothing else is pending [05:03:36] 10DBA, 13Patch-For-Review: Network maintenance on row D (databases) - https://phabricator.wikimedia.org/T162681#3219948 (10Marostegui) a:05Marostegui>03None [05:05:49] 10DBA, 06Operations, 10ops-eqiad: Move masters away from D1 in eqiad? - https://phabricator.wikimedia.org/T163895#3219949 (10Marostegui) Yes but that's only changing db-eqiad and db-codfw as we normally do when moving a server, as far as I know. [05:51:59] 10DBA, 06Operations, 10Phabricator, 10ops-eqiad: db1048 BBU Faulty - slave lagging - https://phabricator.wikimedia.org/T160731#3219986 (10Marostegui) @Cmjohnson thanks. Let me coordinate this and we will arrange one day to do the swap. @mmodell is there any problem if we take db1048 down for a few minutes... 
[05:53:07] 10DBA, 06Operations, 10ops-eqiad: db1047 BBU RAID issues (was: Investigate db1047 replication lag) - https://phabricator.wikimedia.org/T159266#3219987 (10Marostegui) @Cmjohnson it was supposed to be killed soon, but @Ottomata believes it will take a bit longer, so maybe it is worth replacing the BBU. @Ottom... [06:24:23] 07Blocked-on-schema-change, 10DBA, 10Wikidata, 03Wikidata-Sprint: Deploy schema change for adding term_full_entity_id column to wb_terms table - https://phabricator.wikimedia.org/T162539#3219991 (10Marostegui) db1049 is done: ``` root@neodymium:~# mysql --skip-ssl -hdb1049 wikidatawiki -e "show create tabl... [06:26:22] 10DBA, 10Wikidata, 13Patch-For-Review, 07Schema-change: Drop the useless wb_terms keys "wb_terms_entity_type" and "wb_terms_type" on "wb_terms" table - https://phabricator.wikimedia.org/T163548#3219992 (10Marostegui) db1045 is done: ``` root@neodymium:/home/marostegui/git/software/dbtools# mysql --skip-ssl... [06:26:51] 10DBA, 06Operations, 13Patch-For-Review: eqiad rack/setup 11 new DB servers - https://phabricator.wikimedia.org/T162233#3219993 (10Marostegui) Hi Chris, We will take it from here yes. Thanks for getting all this sorted for us! [06:51:56] 10DBA, 13Patch-For-Review: Rampant differences in indexes on enwiki.revision across the DB cluster - https://phabricator.wikimedia.org/T132416#3220047 (10Marostegui) labsdb1011 is done: ``` root@labsdb1011:~# mysql --skip-ssl enwiki -e "show create table revision\G" *************************** 1. row *********... [06:53:03] 10DBA, 13Patch-For-Review: Rampant differences in indexes on enwiki.revision across the DB cluster - https://phabricator.wikimedia.org/T132416#3220048 (10Marostegui) This task is almost done. All the core + labs hosts are done on both dcs. The only pending host is dbstore1001 which will be done after the dc sw... 
[07:45:10] 10DBA, 06Operations, 13Patch-For-Review: eqiad rack/setup 11 new DB servers - https://phabricator.wikimedia.org/T162233#3220057 (10Marostegui) Hey @Cmjohnson I have tried to install 3 servers to just make sure they worked fine and we didn't miss anything. And also to make sure we at least have 3 for the swi... [07:48:02] 10DBA, 06Operations, 13Patch-For-Review: eqiad rack/setup 11 new DB servers - https://phabricator.wikimedia.org/T162233#3220059 (10Marostegui) [08:07:40] thanks for fixing 1003 yesterday night [08:17:54] 10DBA, 06Operations, 10Phabricator, 10ops-eqiad: db1048 BBU Faulty - slave lagging - https://phabricator.wikimedia.org/T160731#3220083 (10jcrespo) There are some reports running on the slave- We should point the slave to the master to avoid activity there thought the dns alias. [08:31:34] 10DBA, 10AbuseFilter, 06Performance-Team, 05MW-1.29-release (WMF-deploy-2017-04-25_(1.29.0-wmf.21)), 13Patch-For-Review: AFComputedVariable::compute query timeouts - https://phabricator.wikimedia.org/T116557#3220140 (10jcrespo) Indeed: {F7807936} [08:33:21] all s6 slaves seem to be up to date [08:33:52] the old ones on s3, however, are very slow [08:34:25] it will take more than a day for then? [08:34:33] because you started yesterday morning? [08:34:44] I started 2 days ago [08:35:02] oh wow, which wiki are they doing now? [08:35:04] it just finished in 24 hours, 1 day ago [08:35:29] so it was a good decision to not do everthing [08:36:02] mlwiki [08:36:08] yeah, i agree, as long as we do the master, then it is a matter of pool/depool [08:36:13] oh, still ml! [08:36:18] so another day maybe to finish? 
[08:36:20] if we had one extra week ok [08:36:26] in the middle [08:36:30] more or less [08:36:38] so yes, 1 extra day + catchup [08:36:42] in the end we only have 7 days [08:37:14] I did not start the masters yesterday because I was too tired plus the upcoming move [08:37:27] that was a good decision [08:37:31] in fact i woke up at 4am [08:37:33] and checked tendril [08:37:37] to see if it was running there [08:37:43] because i thought of the move [08:37:56] yeah [08:38:10] but you shouldn't be worried about those things [08:38:25] the conservative option is to wait [08:38:44] yesterday night when I saw the ticket about the move I was like: oh crap we have to reconfigure all the slaves, and then 20 minutes later I realised we don't, I was too tired to even remember that we do not replicate from IPs [08:39:06] yeah, while dns is not ok for connecting [08:39:21] replication only uses the dns one, and keeps the connection up [08:39:33] that is why dns and ssl are ok [08:39:41] yeah, I was afraid we'd go late in the evening again today reconfiguring everything [08:39:46] but thankfully not [08:39:55] just change the php files and we should be good to [08:39:57] to go [08:40:13] which makes me realize that mediawikis using the tls certs will not be ok :-/ [08:40:30] because they connect to ips, not domains [08:40:49] yeah, but they should not connect to any eqiad server [08:40:50] no? [08:40:56] no no [08:41:04] I am not talking about now [08:41:09] ah in general right [09:31:37] what do you think of db1040 backup? [09:32:14] did it finish the md5? [09:32:17] should we dump it? Is it ok for it to get stale? [09:32:18] yes [09:32:30] it got transferred well [09:32:38] do you know what we can do? 
place it on one of the new hosts [09:32:42] (that got installed) [09:32:56] ok [09:32:59] I can do that [09:33:12] so we leave it replicating [09:33:14] in case we need it [09:33:26] and we actually could have that host ready for s4 :) [09:33:31] and we decomission db1040 for real [09:33:34] yeah [09:33:45] without losing any data from it [09:35:34] 10DBA: Decommission db1040 - https://phabricator.wikimedia.org/T164057#3220228 (10jcrespo) [09:36:33] 10DBA, 06Operations, 13Patch-For-Review: Decommission old coredb machines (<=db1050) - https://phabricator.wikimedia.org/T134476#2266762 (10jcrespo) [09:36:35] 10DBA, 05codfw-rollout: db1022 broke while changing topology on s6- evaluate if to fix or directly decomission - https://phabricator.wikimedia.org/T163778#3220243 (10jcrespo) [09:37:00] do you have a server ready for me? [09:37:08] yes I do [09:37:10] db1097 [09:38:26] taking it [09:43:51] so what do you think, we should go for multi-instance rather than multi source for the places we can? 
[09:44:16] sanitarium, dbstore1001, dbstore2001 [09:44:27] i think so yes [09:44:44] i still believe it is easier to manage or to isolate issues [09:44:55] easier to clone [09:45:22] yeah, and if we corrupt innodb, it is only one instance [09:46:06] we can start trying that for dbstore1001 [09:46:15] not now or soon [09:47:09] yeah [09:47:12] or 2002 [10:20:04] 10DBA: Convert unique keys into primary keys for some wiki tables on s6 - https://phabricator.wikimedia.org/T163979#3220355 (10jcrespo) [10:20:05] 07Blocked-on-schema-change, 10DBA: Convert unique keys into primary keys for some wiki tables on s3 - https://phabricator.wikimedia.org/T163912#3220356 (10jcrespo) [10:20:08] 10DBA, 07Epic, 13Patch-For-Review, 05codfw-rollout: Database maintenance scheduled while eqiad datacenter is non primary (after the DC switchover) - https://phabricator.wikimedia.org/T155099#3220354 (10jcrespo) [10:21:45] 10DBA, 07Epic, 13Patch-For-Review, 05codfw-rollout: Database maintenance scheduled while eqiad datacenter is non primary (after the DC switchover) - https://phabricator.wikimedia.org/T155099#2933574 (10jcrespo) [10:21:49] 10DBA, 10MediaWiki-Database, 13Patch-For-Review, 07PostgreSQL, 07Schema-change: Some tables lack unique or primary keys, may allow confusing duplicate data - https://phabricator.wikimedia.org/T17441#3220364 (10jcrespo) [10:21:51] 10DBA: Convert unique keys into primary keys for some wiki tables on s6 - https://phabricator.wikimedia.org/T163979#3216957 (10jcrespo) 05Open>03Resolved This is now done, no errors, no replication lag (only db1093 was skipped some events as it as already deployed there). [10:22:36] 10DBA, 06Operations, 10ops-eqiad: Move masters away from D1 in eqiad? 
- https://phabricator.wikimedia.org/T163895#3220367 (10jcrespo) [10:22:38] 10DBA, 07Epic, 13Patch-For-Review, 05codfw-rollout: Database maintenance scheduled while eqiad datacenter is non primary (after the DC switchover) - https://phabricator.wikimedia.org/T155099#3220366 (10jcrespo) [10:22:53] 10DBA, 06Operations, 10ops-eqiad: Move masters away from D1 in eqiad - https://phabricator.wikimedia.org/T163895#3213695 (10jcrespo) [10:39:54] 10DBA, 06Labs: Prepare and check storage layer for wbwikimedia - https://phabricator.wikimedia.org/T162513#3220405 (10jcrespo) @chasemp @Andrew This has been redacted on sanitarium (you have to be blocked by me first doing that before touching new wikis), it is stil pending on sanitarium2. Hold for now. [10:41:07] 10DBA, 10Wikimedia-Site-requests, 13Patch-For-Review: Recreate a wiki for Wikimedia Portugal - https://phabricator.wikimedia.org/T126832#3220407 (10jcrespo) @Dereckson- is this a public wiki, should it be replicated to labs? [10:42:12] 10DBA, 06Operations, 13Patch-For-Review: eqiad rack/setup 11 new DB servers - https://phabricator.wikimedia.org/T162233#3220411 (10Marostegui) So I did another round: db1096 -> installed The ones that we would need @Cmjohnson to check (no need to be done this week!) db1098 -> after attempting pxe boot: bl... [10:42:31] 10DBA, 06Operations, 13Patch-For-Review: eqiad rack/setup 11 new DB servers - https://phabricator.wikimedia.org/T162233#3220412 (10Marostegui) [10:46:33] 10DBA, 06MediaWiki-Platform-Team, 10MediaWiki-extensions-Linter, 10Wikimedia-Extension-setup, and 4 others: Review and deploy Linter extension to Wikimedia wikis - https://phabricator.wikimedia.org/T148609#3220416 (10jcrespo) [11:07:22] 10DBA, 06Operations, 10ops-eqiad: Move masters away from D1 in eqiad - https://phabricator.wikimedia.org/T163895#3220434 (10Marostegui) I have downtimed all the slaves in s5,s6 and s7 for 10 hours. 
[11:08:25] I will start the pk alter on s1, s2 and s4 masters, then [11:08:44] great [11:10:36] db1097 is decompressing [11:10:56] go SSDs go!! [11:11:11] I think it will take 3 hours [11:11:18] because of that "nice" compression I did [11:12:12] what did you use? [11:13:16] https://jnovy.fedorapeople.org/pxz/ [11:14:01] ah that one! [11:14:19] it is quite popular, isn't it? I have never tried it myself [11:14:30] And given your results I doubt I will [12:59:12] xz is a great format [12:59:40] well, the format doesn't matter, the algorithm is great [12:59:55] but it is not for real-time processing and streaming, I suppose [13:00:13] at least not for our needs i guess [13:00:18] https://dbahire.com/which-compression-tool-should-i-use-for-my-database-backups/ [13:00:25] it could be ok for long-term backups [13:01:19] probably pxz doesn't work ok for streaming [13:01:26] I didn't see many cores being used [13:01:40] so it may require local only to be truly efficient [13:04:38] i am reading your article [13:04:46] don't waste your time [13:04:49] :-) [13:05:00] it is interesting that the compression ratio, if you exclude the l* tools, is pretty pretty similar [13:05:04] there is not much gain/loss [13:05:48] note that it is new_size/original_size [13:06:13] compare sizes [13:06:35] a 1% difference can be huge: between 1 and 2% [13:06:40] it is half the size [13:08:41] and for example for backups, compression time is not a huge deal [13:08:55] decompression time is: http://dbahire.com/which-compression-tool-should-i-use-for-my-database-backups-part-ii-decompression/ [13:09:54] haha pigz vs bzip2 [13:10:18] no parallel decompression apparently for bzip2 [13:11:30] that is when I started being a fan of pigz, a nice speed/ratio trade-off [13:23:20] 10DBA, 05codfw-rollout: db1022 broke while changing topology on s6- evaluate if to fix or directly decomission - https://phabricator.wikimedia.org/T163778#3220727 (10Marostegui) I would place one of the new servers and decommission this 
one as soon as we can. [13:52:45] 10DBA, 06Operations, 10ops-eqiad, 13Patch-For-Review: Move masters away from D1 in eqiad - https://phabricator.wikimedia.org/T163895#3220785 (10Marostegui) [14:30:26] 10DBA, 06Analytics-Kanban: Puppetize Piwik's Database and set up periodical backups - https://phabricator.wikimedia.org/T164073#3220920 (10elukey) [14:30:55] 10DBA, 06Operations, 10ops-eqiad: db1047 BBU RAID issues (was: Investigate db1047 replication lag) - https://phabricator.wikimedia.org/T159266#3220924 (10Ottomata) Not difficult at all. I think this server is not used often, only really when there are issues with dbstore1002. @cmjohnson, let me know what d... [14:34:31] hello people, I added the DBA tag for my task about Piwik's database (that currently needs some love) [14:35:07] lol [14:35:14] yeah I know :) [14:35:16] after you told me to "go away" [14:35:27] now you have to wait in the queue :-) [14:36:13] this is me reaching out to DBA, doing it by myself :) [14:37:14] my point would be to set up something that follows best practices, without impacting your ops load [14:37:27] link [14:37:29] I'd rather ask for help than make a big mistake [14:37:38] https://phabricator.wikimedia.org/T164073 [14:37:51] atm I am exploring options [14:37:54] ah [14:37:59] it is a different ticket [14:38:12] we don't do that [14:38:13] yes the other one was probably "too many 500s for piwik" [14:38:37] there is a standardized way to do backups, which is bacula [14:38:48] and a script that does everything for you [14:38:55] so you only need to add a certain class [14:39:12] (forgot about bacula you are right) [14:39:23] don't set up your own system [14:40:06] jynus: but this system is 60G and single host... I guess the "standard" way will create downtime [14:40:17] backups, downtime? [14:40:21] lock [14:40:27] do you live in 2001? 
[14:40:34] lol [14:40:35] rotfl [14:40:40] ahhahahaah [14:40:53] depends what kind of backup your magic script does :D [14:41:12] there are 2 options- I would recommend mysqldump, it takes longer to create and recover [14:41:19] but smaller to store and more flexible [14:41:48] we still do mysqldump for 2TB databases, and now it is at the limit of usefulness [14:42:00] but it probably will work well for you [14:42:19] it is mostly lock-free, if you use InnoDB [14:42:45] if you need instant recovery, go for a slave [14:43:19] atm we don't have instant recovery, but I'd need to save data once in a while somewhere [14:43:31] yeah, that is bacula [14:43:59] now, we are a bit low on space (there is an upgrade coming next year) [14:44:14] so make sure you sync with alex too for requirements [14:44:26] but as I said, the classes do everything for you [14:44:30] super, I'll add Operations and Alex to the task [14:44:39] look at the backup slides from ops sessions [14:44:47] and have a look at dbstore1001 [14:44:53] * elukey takes notes [14:45:08] which, with a single class, runs a backup every week [14:45:48] 10DBA, 07Epic, 13Patch-For-Review, 05codfw-rollout: Database maintenance scheduled while eqiad datacenter is non primary (after the DC switchover) - https://phabricator.wikimedia.org/T155099#3220986 (10Marostegui) [14:45:51] 10DBA, 06Operations, 10ops-eqiad, 13Patch-For-Review: Move masters away from D1 in eqiad - https://phabricator.wikimedia.org/T163895#3220983 (10Marostegui) 05Open>03Resolved a:03Cmjohnson This has all been completed. The masters have now slaves connected to them again: ``` root@neodymium:/home/maro... 
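(Editor's sketch, not part of the log.) The mysqldump advice above holds only for transactional tables, so a first step could be to list any table that is not InnoDB and then build a dump command that uses a consistent snapshot. The schema name `piwik` and both helper names are assumptions made for illustration, and nothing here reflects the actual WMF backup classes; `information_schema.TABLES` and `mysqldump --single-transaction` are standard MySQL features.

```python
import re
import shlex
from typing import List


def non_transactional_tables_sql(schema: str) -> str:
    """Return SQL listing base tables in `schema` that are not InnoDB."""
    # Hypothetical helper: accept only simple names, since the value is inlined.
    if not re.fullmatch(r"[A-Za-z0-9_]+", schema):
        raise ValueError("unexpected schema name: %r" % schema)
    return (
        "SELECT TABLE_NAME, ENGINE FROM information_schema.TABLES "
        "WHERE TABLE_SCHEMA = '%s' "
        "AND TABLE_TYPE = 'BASE TABLE' AND ENGINE <> 'InnoDB'" % schema
    )


def mysqldump_command(schema: str) -> List[str]:
    """Build a dump command that reads from a single consistent snapshot."""
    return ["mysqldump", "--single-transaction", schema]


if __name__ == "__main__":
    # 'piwik' is an assumed schema name, used only as an example.
    print(non_transactional_tables_sql("piwik"))
    print(shlex.join(mysqldump_command("piwik")))
```

If the query returns no rows, every table is InnoDB and the dump should not need table locks; timing the dump and a test restore elsewhere would still be worth doing before relying on it.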
[14:48:07] 10DBA, 06Analytics-Kanban, 06Operations: Puppetize Piwik's Database and set up periodical backups - https://phabricator.wikimedia.org/T164073#3220992 (10elukey) [14:49:18] 10DBA, 06Analytics-Kanban, 06Operations: Puppetize Piwik's Database and set up periodical backups - https://phabricator.wikimedia.org/T164073#3220845 (10elukey) @akosiaris: After a chat with Jaime I'd like to explore the possibility of using bacula, but I was told to double check with you requirements. Do yo... [14:50:40] 07Blocked-on-schema-change, 10DBA, 07Schema-change: Drop externallinks.el_from_namespace on wmf databases - https://phabricator.wikimedia.org/T114117#3220997 (10Umherirrender) [14:50:55] elukey, as a test [14:51:09] check that all your tables are transactional [14:51:53] and you can run mysqldump --single-transaction to check it really is block-free and how much time it takes to be done, space, and to be recovered elsewhere [14:52:06] 07Blocked-on-schema-change, 06Collaboration-Team-Triage, 10MediaWiki-extensions-PageCuration, 07Schema-change: Drop ptrl_comment in production - https://phabricator.wikimedia.org/T157762#3221003 (10Umherirrender) [14:56:16] jynus: ack thanks [14:56:37] 07Blocked-on-schema-change, 10DBA, 07Schema-change: Dropping rc_moved_to_title/rc_moved_to_ns on wmf databases - https://phabricator.wikimedia.org/T51191#3221025 (10Umherirrender) [15:01:33] 07Blocked-on-schema-change, 10DBA, 07Schema-change: Apply enum changes to (img|oi|fa)_major_mime on production - https://phabricator.wikimedia.org/T72005#3221027 (10Umherirrender) [15:04:17] 07Blocked-on-schema-change, 10DBA, 06Collaboration-Team-Triage, 10Flow, and 3 others: Drop flow_subscription table - https://phabricator.wikimedia.org/T149936#3221045 (10Umherirrender) [15:08:24] 10DBA, 06Collaboration-Team-Triage, 10Flow, 05MW-1.29-release (WMF-deploy-2016-11-29_(1.29.0-wmf.4)), and 2 others: Drop flow_subscription table - https://phabricator.wikimedia.org/T149936#3221097 (10jcrespo) Those are not 
considered schema changes for us, at least not for now. We use tracking task T54921... [15:12:38] 10DBA, 07Schema-change: Dropping rc_moved_to_title/rc_moved_to_ns on wmf databases - https://phabricator.wikimedia.org/T51191#3221114 (10jcrespo) [15:12:48] 10DBA, 07Schema-change: Dropping rc_moved_to_title/rc_moved_to_ns on wmf databases - https://phabricator.wikimedia.org/T51191#553101 (10jcrespo) [15:12:51] 10DBA, 07Epic, 07Tracking: Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking) - https://phabricator.wikimedia.org/T54921#3221129 (10jcrespo) [15:13:24] 10DBA, 06Analytics-Kanban, 06Operations: Puppetize Piwik's Database and set up periodical backups - https://phabricator.wikimedia.org/T164073#3221132 (10akosiaris) Depends on how often you want it backed up and the rate of growth. So mysql needs to be dumped in some way before it is backed up as backing up... [15:13:38] 07Blocked-on-schema-change, 10DBA, 07Schema-change: Dropping rc_moved_to_title/rc_moved_to_ns on wmf databases - https://phabricator.wikimedia.org/T51191#553101 (10jcrespo) Sorry, I was wrong on my latest edit, this is an index drop. 
[15:13:45] 07Blocked-on-schema-change, 10DBA, 07Schema-change: Dropping rc_moved_to_title/rc_moved_to_ns on wmf databases - https://phabricator.wikimedia.org/T51191#3221135 (10jcrespo) [15:13:48] 10DBA, 07Epic, 07Tracking: Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking) - https://phabricator.wikimedia.org/T54921#2428858 (10jcrespo) [15:16:42] 07Blocked-on-schema-change, 10DBA, 07Schema-change: Apply enum changes to (img|oi|fa)_major_mime on production - https://phabricator.wikimedia.org/T72005#3221141 (10jcrespo) p:05Normal>03Low > Not exactly a rush [15:18:07] 10DBA, 06Collaboration-Team-Triage, 10Flow, 05MW-1.29-release (WMF-deploy-2016-11-29_(1.29.0-wmf.4)), and 2 others: Drop flow_subscription table - https://phabricator.wikimedia.org/T149936#3221143 (10jcrespo) p:05Triage>03Low > It's unused [15:23:47] 10DBA, 06Analytics-Kanban, 06Operations: Puppetize Piwik's Database and set up periodical backups - https://phabricator.wikimedia.org/T164073#3220845 (10jcrespo) I agree with most of things said, and I actually mentioned some of those to luka on IRC. BTW, for the record- the best way to move forward regardi... [16:12:07] 10DBA, 06Operations, 10Phabricator, 10ops-eqiad: db1048 BBU Faulty - slave lagging - https://phabricator.wikimedia.org/T160731#3221343 (10mmodell) @Marostegui: correct, phabricator isn't currently querying the slave, other than the reports mentioned by @jcrespo. [16:17:28] 10DBA, 06Operations, 10ops-eqiad: db2062 (s7 master eqiad) in a reboot cycle - https://phabricator.wikimedia.org/T164092#3221380 (10jcrespo) [16:22:32] 10DBA, 06Operations, 10ops-eqiad: db2062 (s7 master eqiad) in a reboot cycle - https://phabricator.wikimedia.org/T164092#3221420 (10jcrespo) Moving eqiad master service back to db1041. 
[16:24:50] 10DBA, 06Operations, 10ops-eqiad: db2062 (s7 master eqiad) in a reboot cycle - https://phabricator.wikimedia.org/T164092#3221433 (10jcrespo) [18:18:16] jynus: raid battery [18:23:13] jynus: i have a spare [18:23:19] swapping it now [16:30:24] 10DBA, 06Community-Tech, 10MediaWiki-User-blocking: Do test queries for range contributions to gauge performance of using different tables - https://phabricator.wikimedia.org/T156318#3221478 (10MusikAnimal) @jcrespo Sorry to bug you. I'm guessing I will not be given the necessary rights to `FLUSH STATUS`on d... [16:31:52] 10DBA, 06Operations, 10ops-eqiad: db1047 BBU RAID issues (was: Investigate db1047 replication lag) - https://phabricator.wikimedia.org/T159266#3221486 (10Cmjohnson) @ottomata Let's schedule for Wednesday next week @10am EST. [16:32:25] 10DBA, 06Operations, 10ops-eqiad: db1062 (s7 master eqiad) in a reboot cycle - https://phabricator.wikimedia.org/T164092#3221491 (10Cmjohnson) [16:32:37] 10DBA, 06Community-Tech, 10MediaWiki-User-blocking: Do test queries for range contributions to gauge performance of using different tables - https://phabricator.wikimedia.org/T156318#3221492 (10jcrespo) you do not need flush status, as long as you run them as the first query of the connection- just connect,... [16:34:38] 10DBA, 06Operations, 10ops-eqiad: db1062 (s7 master eqiad) in a reboot cycle - https://phabricator.wikimedia.org/T164092#3221499 (10jcrespo) No errors on the last boot, but I would like to confirm by restarting it once more. I am doing that. [16:34:41] 10DBA, 10Wikimedia-Site-requests, 13Patch-For-Review: Recreate a wiki for Wikimedia Portugal - https://phabricator.wikimedia.org/T126832#3221498 (10Dereckson) Yes, this is a public-facing wiki, to be replicated, yes (if we replicate .wikimedia.org chapter wikis too — check if bewikimedia is replicated). 
[16:35:46] 10DBA, 10Wikimedia-Site-requests, 13Patch-For-Review: Recreate a wiki for Wikimedia Portugal - https://phabricator.wikimedia.org/T126832#3221503 (10jcrespo) We do it normally as long as it is not private (the text written there is open to the internet). [16:42:54] 10DBA, 06Operations, 10Phabricator, 10ops-eqiad: db1048 BBU Faulty - slave lagging - https://phabricator.wikimedia.org/T160731#3221522 (10Marostegui) Great, we can change the DNS and that's should be it! Thanks! [16:54:01] 10DBA, 13Patch-For-Review, 07Performance, 07Wikimedia-Incident: Reduce max execution time of interactive queries or a better detection and killing of bad query patterns - https://phabricator.wikimedia.org/T160984#3221535 (10greg) [17:12:18] 10DBA, 06Operations, 10ops-eqiad: db1062 (s7 master eqiad) in a reboot cycle - https://phabricator.wikimedia.org/T164092#3221587 (10Marostegui) >>! In T164092#3221420, @jcrespo wrote: > Moving eqiad master service back to db1041. This might be confusing, should we specify that it was never done? [17:13:16] 10DBA, 06Operations, 10ops-eqiad: db1062 (s7 master eqiad) in a reboot cycle - https://phabricator.wikimedia.org/T164092#3221589 (10jcrespo) You already did- I was doing it when Chris asked me to wait on IRC. [18:11:23] 10DBA, 13Patch-For-Review, 07Performance, 07Wikimedia-Incident: Reduce max execution time of interactive queries or a better detection and killing of bad query patterns - https://phabricator.wikimedia.org/T160984#3221722 (10jcrespo) 05Open>03Resolved This has been slowly deployed to all mediawiki produ... [18:26:22] 10DBA, 10Wikimedia-Site-requests, 13Patch-For-Review: Recreate a wiki for Wikimedia Portugal - https://phabricator.wikimedia.org/T126832#3221746 (10Dereckson) Okay, so yes, we can replicate ptwikimedia. 
[18:46:40] 10DBA, 10Wikimedia-Site-requests, 13Patch-For-Review: Recreate a wiki for Wikimedia Portugal - https://phabricator.wikimedia.org/T126832#3221800 (10jcrespo) done, views will be created at: T164103 [18:48:25] 10DBA, 06Labs, 13Patch-For-Review: Prepare and check storage layer for pa.wikisource - https://phabricator.wikimedia.org/T160859#3221813 (10jcrespo) [18:49:04] 10DBA, 06Labs: Prepare and check storage layer for dty.wikipedia.org - https://phabricator.wikimedia.org/T162102#3221818 (10jcrespo) [18:49:36] 10DBA, 06Labs: Prepare and check storage layer for wbwikimedia - https://phabricator.wikimedia.org/T162513#3221826 (10jcrespo) [18:53:51] 10DBA, 06Community-Tech, 10MediaWiki-User-blocking: Do test queries for range contributions to gauge performance of using different tables - https://phabricator.wikimedia.org/T156318#3221834 (10MusikAnimal) >>! In T156318#3221492, @jcrespo wrote: > you do not need flush status, as long as you run them as the... [18:56:17] jynus: thanks for https://phabricator.wikimedia.org/T164103 -- I'm not sure if these all a security team sign off somewhere or not [18:56:48] up to you [18:57:20] I have summarized them for you because I have done the job to see which where pending [18:57:28] which caused issues yesterday [18:57:34] * chasemp nods [18:57:35] thanks [19:13:59] 10DBA, 06Operations: db1063 io (s5 master eqiad) performance is bad - https://phabricator.wikimedia.org/T164107#3221912 (10jcrespo) [19:15:32] 10DBA, 06Operations: db1063 io (s5 master eqiad) performance is bad - https://phabricator.wikimedia.org/T164107#3221926 (10jcrespo) [19:16:23] 10DBA, 06Operations: db1063 io (s5 master eqiad) performance is bad - https://phabricator.wikimedia.org/T164107#3221912 (10jcrespo) ``` SET GLOBAL innodb_flush_log_at_trx_commit=0; SET GLOBAL sync_binlog=0; ``` Seems to be helping. I had tried disabling semi_sync replication, but that didn't work. 
[19:20:09] 10DBA, 06Operations: db1063 io (s5 master eqiad) performance is bad - https://phabricator.wikimedia.org/T164107#3221935 (10jcrespo) Oh, I got it: ``` Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Ba... [19:26:41] 10DBA, 06Operations: db1063 io (s5 master eqiad) performance is bad - https://phabricator.wikimedia.org/T164107#3221962 (10jcrespo) The only reason I can see is: ``` Temperature: 78 C Temperature : High ``` while on db1062 I see: ``` Temperature: 47 C Temperature... [19:46:34] 10DBA, 06Community-Tech, 10MediaWiki-User-blocking: Do test queries for range contributions to gauge performance of using different tables - https://phabricator.wikimedia.org/T156318#3222081 (10jcrespo) > so I'm hoping you can copy the existing I can do that, but not probably in the next 2 weeks- there is... [19:53:20] 10DBA, 06Operations: db1063 io (s5 master eqiad) performance is bad - https://phabricator.wikimedia.org/T164107#3222087 (10jcrespo) On boot: ``` megacli -AdpBbuCmd -GetBbuStatus -a0 | grep Temperature Temperature: 64 C Temperature : OK ``` [19:53:32] 10DBA, 06Operations: db1063 io (s5 master eqiad) performance is bad - https://phabricator.wikimedia.org/T164107#3222088 (10jcrespo) ``` $ cat /sys/class/thermal/thermal_zone*/temp 61000 60000 ``` [20:00:18] 10DBA, 06Operations: db1063 io (s5 master eqiad) performance is bad - https://phabricator.wikimedia.org/T164107#3222112 (10jcrespo) This is now ok, but it is getting hotter: ``` $ megacli -LDInfo -L0 -a0 | grep "Cache Policy:" Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU Cu... 
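(Editor's sketch, not part of the log.) The diagnosis above rests on comparing the "Default Cache Policy" and "Current Cache Policy" lines printed by `megacli -LDInfo -L0 -a0 | grep "Cache Policy:"`. A small parser could flag the case where the controller has silently dropped from WriteBack to WriteThrough. The function names and the abbreviated sample text are assumptions for illustration; only the two line labels and the policy names come from the output quoted in the log.

```python
from typing import Dict


def parse_cache_policies(ldinfo_text: str) -> Dict[str, str]:
    """Map 'Default'/'Current' to the write policy (first comma-separated field)."""
    policies: Dict[str, str] = {}
    for line in ldinfo_text.splitlines():
        label, sep, value = line.partition("Cache Policy:")
        label = label.strip()
        if sep and label in ("Default", "Current"):
            policies[label] = value.split(",")[0].strip()
    return policies


def fell_back_to_write_through(ldinfo_text: str) -> bool:
    """True when the configured policy is WriteBack but WriteThrough is in effect."""
    policies = parse_cache_policies(ldinfo_text)
    return (
        policies.get("Default") == "WriteBack"
        and policies.get("Current") == "WriteThrough"
    )


# Abbreviated sample in the shape quoted in the log (trailing fields shortened).
SAMPLE = """\
Default Cache Policy: WriteBack, ReadAheadNone, Direct
Current Cache Policy: WriteThrough, ReadAheadNone, Direct
"""

if __name__ == "__main__":
    print(parse_cache_policies(SAMPLE))
    print(fell_back_to_write_through(SAMPLE))
```

A check like this only reports the fallback; it does not say why it happened, which in this incident turned out to be the BBU temperature.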
[20:03:11] 10DBA, 06Operations: db1063 io (s5 master eqiad) performance is bad - https://phabricator.wikimedia.org/T164107#3222123 (10Marostegui) labsdb1011 which is in the same rack: ``` Controller Temperature (C): 60 ``` [20:05:00] 10DBA, 06DC-Ops, 06Operations: db1063 thermal issues (was: db1063 io (s5 master eqiad) performance is bad) - https://phabricator.wikimedia.org/T164107#3222137 (10jcrespo) [20:10:15] 10DBA, 06DC-Ops, 06Operations: db1063 thermal issues (was: db1063 io (s5 master eqiad) performance is bad) - https://phabricator.wikimedia.org/T164107#3222162 (10jcrespo) I have forced: ``` megacli -LDSetProp -ForcedWB -Immediate -Lall -aAll ``` The server will get fried, but at least we won't have lag. [20:25:52] 10DBA, 06DC-Ops, 06Operations: db1063 thermal issues (was: db1063 io (s5 master eqiad) performance is bad) - https://phabricator.wikimedia.org/T164107#3222187 (10Marostegui) Fans and the other sensors look fine though: ``` 12 | Fan1 RPM | Fan | 3960.00 | RPM | 'OK' 13 | Fa... [20:56:49] 10DBA, 06DC-Ops, 06Operations: db1063 thermal issues (was: db1063 io (s5 master eqiad) performance is bad) - https://phabricator.wikimedia.org/T164107#3222282 (10Marostegui) There is nothing on the controllers' log apart from the automatic switch to WriteThrough when it first detected the BBU temp was high:... [21:33:00] 10DBA, 06DC-Ops, 06Operations: db1063 thermal issues (was: db1063 io (s5 master eqiad) performance is bad) - https://phabricator.wikimedia.org/T164107#3222417 (10jcrespo) ```lines=10 MCE 0 CPU 2 THERMAL EVENT TSC 3f67e99385dbc7 TIME 1492766490 Fri Apr 21 09:21:30 2017 Processor 2 heated above trip temperatu... 
[21:33:42] 10DBA, 06DC-Ops, 06Operations: db1063 thermal issues (was: db1063 io (s5 master eqiad) performance is bad) - https://phabricator.wikimedia.org/T164107#3222422 (10jcrespo) [21:45:34] 10DBA, 06DC-Ops, 06Operations: db1063 thermal issues (was: db1063 io (s5 master eqiad) performance is bad) - https://phabricator.wikimedia.org/T164107#3222474 (10Marostegui) >>! In T164107#3222417, @jcrespo wrote: > ```lines=10 > MCE 0 > CPU 2 THERMAL EVENT TSC 3f67e99385dbc7 > TIME 1492766490 Fri Apr 21 09...