[05:58:17] DBA, Labs, Labs-Infrastructure, Patch-For-Review: Migrate existing labs users from the old servers, if possible using roles and start maintaining users on the new database servers, too - https://phabricator.wikimedia.org/T149933#2892062 (yuvipanda) This has been deployed now, and all tools have a...
[05:58:28] jynus: marostegui this is deployed and tested to work fine, btw.
[05:58:40] let me know when I can test the proxy :)
[06:01:38] DBA, Labs, Labs-Infrastructure, Patch-For-Review: Migrate existing labs users from the old servers, if possible using roles and start maintaining users on the new database servers, too - https://phabricator.wikimedia.org/T149933#2892079 (yuvipanda) I still need to figure out best way of making th...
[06:39:59] DBA, Labs, Tool-Labs, Regression: Tool Labs: Add skin, language, and variant to user_properties_anon - https://phabricator.wikimedia.org/T152043#2892193 (Krinkle)
[07:23:21] DBA: Defragment db1015 - https://phabricator.wikimedia.org/T153739#2892272 (Marostegui) It finished optimizing the tables under 1G. I will now run the loop to optimize the biggest revision tables (around 5G or so the average, but there are 53 of them). There have been lo delays during the whole process: https...
[07:54:41] DBA: Defragment db1044 - https://phabricator.wikimedia.org/T153826#2892289 (Marostegui)
[08:00:25] DBA: Defragment db1015 - https://phabricator.wikimedia.org/T153739#2892314 (Marostegui) >>! In T153739#2892271, @Stashbot wrote: > {nav icon=file, name=Mentioned in SAL (#wikimedia-operations), href=https://tools.wmflabs.org/sal/log/AVkgQOoblCyyDMEPu8m5} [2016-12-21T07:20:42Z] Running optimize t...
[08:51:04] DBA: Defragment db1015 - https://phabricator.wikimedia.org/T153739#2889198 (jcrespo) See: T110504
[08:51:21] DBA: Defragment db1044 - https://phabricator.wikimedia.org/T153826#2892289 (jcrespo) See: T110504
[08:52:55] DBA: Defragment db1015 - https://phabricator.wikimedia.org/T153739#2892507 (Marostegui) Thanks! - I thought just about defragmenting a couple of tables across the board to give us enough space just for the holidays given that we cannot depool slaves during this week. But this is indeed the long/medium soluti...
[09:53:33] DBA, Labs, Tool-Labs: Provisioning MySQL replica users fails on tool labs - https://phabricator.wikimedia.org/T151014#2892615 (Marostegui) @yuvipanda do you think we can kill these two users? (after holidays period)
[09:58:40] DBA, Editing-Analysis, MediaWiki-Database, Chinese-Sites: Some revisions on Chinese Wikisource have timestamps from before the wiki was created - https://phabricator.wikimedia.org/T123313#2892650 (Aklapper)
[10:11:24] I am going to drop the unknown databases from labsdb1001
[10:12:58] that will free needed space there
[10:15:27] sounds good
[10:15:37] will you backup them first or not even?
[10:15:50] yes, every time
[10:20:28] how would you feel about enabling innodb_strict_mode everywhere at the same time as Barracuda? http://dev.mysql.com/doc/refman/5.7/en/innodb-parameters.html#sysvar_innodb_strict_mode
[10:20:56] I hate doing a multi-hour alter later to find it was worthless
[10:21:28] totally in favour.
at booking we only had it on RBR hosts
[10:23:10] DBA: Set barracuda InnoDB file format as the default configuration everywhere - https://phabricator.wikimedia.org/T150949#2892826 (jcrespo)
[10:23:27] ^I have added it here, as it is related to Barracuda/compression
[10:24:11] DBA: Set barracuda InnoDB file format as the default configuration everywhere - https://phabricator.wikimedia.org/T150949#2892827 (Marostegui) See: https://gerrit.wikimedia.org/r/#/c/321638/
[10:25:12] one comment
[10:25:22] to avoid doing 10 patches
[10:25:32] (even if we want to do it slowly)
[10:25:55] I started working on: https://gerrit.wikimedia.org/r/324915
[10:26:53] give it a look, probably will simplify such a task
[10:27:18] i see
[10:27:41] i would prefer to deploy: https://gerrit.wikimedia.org/r/#/c/321638/ first, as a "test" and if it works, we can work together on that task maybe?
[10:27:43] I do not want to edit 10 times the same parameters
[10:28:00] and we can still use variables to control which ones get which options
[10:28:57] but that is ok
[10:29:14] but add the new config I mention, too, at the same time
[10:29:58] ah yes
[12:24:44] DBA, Labs, Tool-Labs: Provisioning MySQL replica users fails on tool labs - https://phabricator.wikimedia.org/T151014#2893147 (yuvipanda) @Marostegui yup!
[12:25:29] DBA, Labs, Tool-Labs: Provisioning MySQL replica users fails on tool labs - https://phabricator.wikimedia.org/T151014#2893148 (Marostegui) Great - thanks! I will get rid of them after freeze
[12:34:26] DBA, Labs, Tool-Labs: Provisioning MySQL replica users fails on tool labs - https://phabricator.wikimedia.org/T151014#2893169 (Marostegui) This is what will be done ``` root@neodymium:~# host labstore1001 labstore1001.eqiad.wmnet has address 10.64.37.6 root@neodymium:~# host labstore1002 labstore100...
[13:18:17] https://phabricator.wikimedia.org/P4662
[13:18:34] Our xmas gift!!!
[13:18:45] nice job!!
[13:19:13] I can create a cron job now
[13:19:35] but the script doesn't work on db1069 (multiple instances)
[13:20:27] i did a very quick thing yesterday: https://gerrit.wikimedia.org/r/#/c/328352/
[13:20:40] (as you mentioned the cronjob)
[13:24:00] looks good, but there is the problem I mentioned
[13:24:14] (doesn't work on db1069)
[13:24:19] yeah :(
[13:24:35] and we can fix it or having some kind of disabled parameter on puppet
[13:24:44] DBA: Defragment db1015 - https://phabricator.wikimedia.org/T153739#2893296 (Marostegui) ``` root@db1015:/srv/sqldata# df -hT /srv/ Filesystem Type Size Used Avail Use% Mounted on /dev/mapper/tank-data xfs 1.6T 1.4T 236G 86% /srv ``` I have started the templatelinks (bigger than 1G) tables...
[13:25:10] we can make the script to detect the hostname and if it is db1069 do a different thing
[13:25:51] the other thing I would change is the grep '-' for something like egrep/grep -e '^-- '
[13:26:03] not sure that is right
[13:26:15] or probably, much better
[13:26:25] just fix the script properly
[13:26:34] with parameters
[13:26:46] yes, it was a quick thingy
[13:26:52] that is why i didn't even add you yet :)
[13:26:55] to review
[13:27:03] no, I mean the python
[13:27:08] aaah
[13:27:12] the patch is good
[13:28:23] have something like -v (verbose output, adds comments), --socket and --repo-dir
[13:29:44] how would you like to have an option to allow a run, and not just a dry-run (which should be the one by default I think)
[13:29:49] something like --drop-the-stuff
[13:31:47] I do not understand "allow a run"
[13:32:19] Sorry: to allow the script to run what it suggests
[13:32:37] let's say: I run it, i check that all looks good, and now I want to force it to run and drop whatever it suggests
[13:32:47] or you just prefer script.py | mysql
[13:32:49] no, I do not like that
[13:32:49] ?
[13:33:02] you can always run mysql
[13:33:10] and you should not blindly delete things
[13:33:32] what could be added
[13:33:41] is the redact functionality
[13:33:52] because most of it is already there
[13:33:59] ˜/icinga-wm 14:32> PROBLEM - Disk space on db1035 is CRITICAL: DISK CRITICAL - free space: /srv 63672 MB (3% inode=99%) :|
[13:34:08] yes, that is me
[13:34:13] aah fine
[13:34:17] pheeew
[16:23:10] * yuvipanda waves
[16:23:32] jynus: marostegui we are testing the labsdb-web.eqiad.wmnet endpoint and yuvi cannot see the enwiki_p view (seems like missing grant?)
[16:23:32] echo 'show grants' | mysql --defaults-file=replica.my.cnf -h labsdb-web.eqiad.wmnet | grep enwiki
[16:23:39] does not show enwiki_p
[16:23:57] shows enwikibooks, enwikivoyage, enwikisource, enwikinews, enwikiversity
[16:24:03] let me see
[16:26:31] yes, I see something missing
[16:26:54] I may have added the 800 from s3 and not enwiki
[16:28:36] yuvipanda, test now
[16:29:05] probably we tested it on a single machine, and when I added all, I forgot about the one we first tested on
[16:29:18] jynus: nope, still same
[16:29:35] have you restarted the connection?
[16:30:44] I can reproduce
[16:31:24] hm
[16:32:12] ah, sorry
[16:32:21] I granted them twice to labsdb1010
[16:32:30] instead of 1010 and 1011
[16:32:42] 1011 is currently the backend for web
[16:32:47] fixing now
[16:33:56] echo 'show grants' | mysql --defaults-file=replica.my.cnf -h labsdb-web.eqiad.wmnet | grep enwiki
[16:33:58] works now
[16:34:02] mistakes were made
[16:34:15] :-)
[16:34:26] jynus: chasemp yup, can confirm
[16:34:38] jynus: thanks man, a side question
[16:34:39] root@neodymium:~# mysql --skip-ssl -hlabsdb1010
[16:34:40] works
[16:34:47] root@neodymium:~# mysql --skip-ssl -hlabsdb1009
[16:34:49] doesn't
[16:34:59] I'm probably doing something odd
[16:34:59] strange
[16:35:08] maybe a firewall problem?
[16:35:29] ERROR 1045 (28000): Access denied for user 'root'@'10.64.32.20' (using password: YES)
[16:35:34] seems perms?
[16:35:45] ok, root should not work there
[16:35:59] I have yet to create a remote account
[16:36:09] for now, for admin purposes
[16:36:20] go to localhost and run mysql
[16:36:30] yuvipanda: can you poke at the analytics end point just for due diligence?
[16:36:31] neodymium is WIP
[16:36:43] yuvipanda: labsdb-analytics.eqiad.wmnet
[16:36:47] because I have to package a proper client package
[16:36:54] jynus: cool man, no worries just worth a mention
[16:36:56] yes
[16:37:11] but think that the error is that it works now, rather than it doesn't work
[16:37:13] chasemp: yeah, I did that first and that seemed to work fine
[16:37:16] nice
[16:37:24] jynus: ah that makes sense actually
[16:37:26] I will check the account and make sure to delete it
[16:37:44] it will eventually work, but it requires setting up a new password
[16:38:14] as an aside, gents I'm going to send a quick email and close a few tasks :)
[16:39:35] DBA, Labs, Labs-Infrastructure, Epic, Tracking: Labs databases rearchitecture (tracking) - https://phabricator.wikimedia.org/T140788#2893764 (chasemp)
[16:39:38] DBA, Labs, Labs-Infrastructure, Patch-For-Review: Provide at least 2 separate service endpoints: one for slow, long running queries; and another for quick, web requests - https://phabricator.wikimedia.org/T147051#2893761 (chasemp) Open>Resolved a:chasemp The service endpoints establis...
[16:40:25] chasemp: jynus awesome.
[16:40:30] DBA, Labs, Tool-Labs: enwiki_p replica on s1 is corrupted - https://phabricator.wikimedia.org/T134203#2893769 (chasemp)
[16:40:34] DBA, Labs, Labs-Infrastructure, Patch-For-Review: Provision with data the new labsdb servers and provide replica service with at least 1 shard from a sanitized copy from production - https://phabricator.wikimedia.org/T147052#2893765 (chasemp) Open>Resolved a:chasemp enwiki_p is now fu...
[16:41:31] yuvipanda, chasemp now neodymium access is prohibited
[16:42:00] I will setup proper remote access later, enforcing ssl for the admin account
[16:42:21] for now, mysql --skip-ssl from localhost only
[16:42:39] (firewall is still open)
[16:43:56] DBA, Labs, Labs-Infrastructure, Epic, Tracking: Labs databases rearchitecture (tracking) - https://phabricator.wikimedia.org/T140788#2893778 (chasemp)
[16:43:59] DBA, Labs, Labs-Infrastructure, Patch-For-Review: Implement a frontend failover solution for labsdb replicas - https://phabricator.wikimedia.org/T141097#2893774 (chasemp) Open>Resolved We have settled on DNS based failover with haproxy for the time being between the two dbproxy hosts (101...
[16:44:13] kk
[16:45:14] chasemp, I will add some milestones for the dba side related to the goal
[16:45:26] to the etherpad
[16:45:33] thanks man
[16:45:45] even if they are not part of the goals, we did a lot of refactoring and fixing
[16:45:58] and at least making them known somewhere
[16:53:43] jynus: nice idea there we should do the same w/ rewriting maintain-views, maintain_meta-p, and create-dbusers and such
[16:54:00] we can laugh about this over beers in a few weeks, gotta afk for a bit :)
[17:09:39] DBA, Labs, Tool-Labs: enwiki_p replica on s1 is corrupted - https://phabricator.wikimedia.org/T134203#2893829 (jcrespo) @scfc This is technically solved on the new servers, enwiki is fine there, but we most likely will not fix them on the current servers.
When run on enwiki production master I get:...
[17:10:22] DBA, Labs, Labs-Infrastructure, Patch-For-Review: Provision with data the new labsdb servers and provide replica service with at least 1 shard from a sanitized copy from production - https://phabricator.wikimedia.org/T147052#2893833 (jcrespo)
[17:10:26] DBA, Labs, Labs-Infrastructure, Patch-For-Review: Provision sanitized data on labsdb1009, labsdb1010, labsdb1011 with from db1095 - https://phabricator.wikimedia.org/T152194#2893831 (jcrespo) Open>Resolved a:jcrespo
[17:17:14] DBA, Labs: Missing data on labs replica database - https://phabricator.wikimedia.org/T133715#2893851 (jcrespo) Open>Resolved The revision is available on the new labsdb servers (which will fix now and in the future any drift with production). This is difficult to solve on the current (old) server...
[17:17:17] DBA, Labs, Operations, Tracking: Database replication problems - production and labs (tracking) - https://phabricator.wikimedia.org/T50930#2893853 (jcrespo)
[17:21:55] DBA, Labs, Tool-Labs: Tool Labs queries die - https://phabricator.wikimedia.org/T127266#2893866 (jcrespo) Open>Resolved a:jcrespo Dispenser, we just finished setting up 3 new labsdb servers with 2 separate entry points- one for fast webrequests, and another for analytics-like queries. On...
[17:30:59] DBA, Labs, Operations, Tracking: Database replication problems - production and labs (tracking) - https://phabricator.wikimedia.org/T50930#2893895 (jcrespo)
[17:31:03] DBA, Labs: Data missing from June 11/12 on s3.labsdb - https://phabricator.wikimedia.org/T115517#2893892 (jcrespo) Open>Resolved a:jcrespo These are the results on the new replica service servers: ``` root@localhost[srwiki_p]> select max(rev_timestamp) as max_rev, min(rev_timestamp) as min_r...
[17:35:03] DBA, Labs: Make watchlist table available as curated foo_p.watchlist_count on labsdb - https://phabricator.wikimedia.org/T59617#2893932 (jcrespo) This is almost done (the table is available on the current labsdb servers), but to close it we need to: * Puppetize thinking there may be other tables like th...
[17:35:36] DBA: LabsDB infrastructure pending work - https://phabricator.wikimedia.org/T153058#2893935 (jcrespo)
[17:35:38] DBA, Labs, Labs-Infrastructure, Epic, Tracking: Labs databases rearchitecture (tracking) - https://phabricator.wikimedia.org/T140788#2893936 (jcrespo)
[17:46:29] DBA, Labs, Labs-Infrastructure: Explore 'Analyze' statement as substitute for Explain - https://phabricator.wikimedia.org/T141095#2893989 (jcrespo) 10.1 Analyze requires also SELECT grants for the underlying tables, so it cannot be used. This doesn't take away from the fact that there is already a wo...
[18:13:48] jynus, chasemp hey guys, sorry I missed the meeting, but it took longer than expected in the hospital :(
[18:14:31] no problem
[18:14:36] hope you saw the mail
[18:14:41] and the link within it
[18:15:06] I am going thru email now
[18:15:18] we can talk tomorrow
[18:15:34] I am reading the etherpad one
[18:15:40] I guess that is what you mean
[18:17:04] the goal is completed! \o/
[18:34:20] DBA: Defragment db1015 - https://phabricator.wikimedia.org/T153739#2894085 (Marostegui) It finished and now running across all the smaller tables over night. ``` root@db1015:~# df -hT /srv/ Filesystem Type Size Used Avail Use% Mounted on /dev/mapper/tank-data xfs 1.6T 1.4T 254G 85% /srv ```
[18:35:56] DBA: Defragment db1044 - https://phabricator.wikimedia.org/T153826#2894100 (Marostegui) Thanks! - I thought just about defragmenting a couple of tables across the board to give us enough space just for the holidays given that we cannot depool slaves during this week. But this is indeed the long/medium soluti...
[20:13:40] DBA, Labs, Labs-Infrastructure: Explore 'Analyze' statement as substitute for Explain - https://phabricator.wikimedia.org/T141095#2894390 (yuvipanda) @jcrespo what do you think of a http service that is hosted on hardware somewhere and maintained by the labs team, that lets you send it a query and gi...