[00:11:56] Labs, Operations, Patch-For-Review, Tracking: Migrate misc to secondary labstore HA cluster - https://phabricator.wikimedia.org/T154336#2947783 (madhuvishy) All the data will be migrated over and shouldn't need any prior action. If any of the services that are writing to /home or /data/project do...
[01:00:01] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Hidayatsrf was created, changed by Hidayatsrf link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Hidayatsrf edit summary: Created page with "{{Tools Access Request |Justification=Run a bot |Completed=false |User Name=Hidayatsrf }}"
[01:21:41] !log labs Disabling puppet across labs instances with nfs mounted T154336
[01:21:42] Unknown project "labs"
[01:21:42] T154336: Migrate misc to secondary labstore HA cluster - https://phabricator.wikimedia.org/T154336
[01:37:01] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[02:47:02] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:50:26] Labs, Operations, Patch-For-Review, Tracking: Migrate misc to secondary labstore HA cluster - https://phabricator.wikimedia.org/T154336#2948310 (madhuvishy)
[02:51:14] Labs, Operations, Patch-For-Review, Tracking: Migrate misc to secondary labstore HA cluster - https://phabricator.wikimedia.org/T154336#2907785 (madhuvishy) Removed wikidata-dev from list of affected projects - It has nfs-mount turned off explicitly via hiera - https://wikitech.wikimedia.org/wiki...
[02:54:10] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Hidayatsrf was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=1339825 edit summary:
[03:41:20] Labs, Tool-Labs: Simplify and reduce the amount of options jsub supports - https://phabricator.wikimedia.org/T134846#2279594 (scfc) Ceterum censeo: I think this is not necessary and harmful. `jsub` is intended to be used with SGE, and there is no reason to handicap it to be compatible with another syste...
[05:08:01] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[06:13:02] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:35:41] PROBLEM - Puppet run on tools-services-01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[06:39:03] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[07:30:18] Labs, Operations, Patch-For-Review, Tracking: Migrate misc to secondary labstore HA cluster - https://phabricator.wikimedia.org/T154336#2948556 (madhuvishy)
[07:40:41] RECOVERY - Puppet run on tools-services-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:44:03] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:06:41] PROBLEM - Puppet run on tools-services-01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[09:10:03] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[09:14:34] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Codedev was created, changed by Codedev link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Codedev edit summary: Created page with "{{Tools Access Request |Justification=Hi would love to help out with some smaller low hanging fruit first and then perhaps get involved in the tools part. |Completed=false |Us..."
[09:58:13] Labs, Labs-Infrastructure: Deprecate precise instances in Labs by 03/31/2017 - https://phabricator.wikimedia.org/T143349#2948794 (Qgil) >>! In T143349#2946955, @Acs wrote: > We are using http://korma.wmflabs.org/browser/ which is 208.80.155.156 IP. So 208.80.155.168 from my point of view its not used any...
[10:10:03] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:11:43] RECOVERY - Puppet run on tools-services-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:37:41] PROBLEM - Puppet run on tools-services-01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[10:39:05] samtar: heya! I can reset it for the lta account right now if you want, but user accounts are going to take a bit longer
[10:41:01] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[11:23:04] yuvipanda: Sounds good! :) when will I be able to use the tools.lta's db login?
[11:23:24] samtar: I can do that now, moment
[11:25:27] samtar: should be there now, can you verify?
[11:26:13] yuvipanda: I'm afraid I don't have access to my ssh key at work, so can't verify
[11:26:23] samtar: ok! I can see it tho :)
[11:27:04] yuvipanda: Well that's good enough for me :) thank you kindly! I believe them not being generated randomly is a known issue?
[11:27:52] samtar: yeah, the process was 'stuck' from 12 Jan
[11:32:32] yuvipanda: For normal lusers, it's not possible to see meaningful show create table, right?
[11:32:43] nope
[11:33:01] I just use the https://www.mediawiki.org/wiki/Manual:Database_layout documentation page instead
[11:33:16] I do most of the time, or my own copy of mediaWiki
[11:34:47] yuvipanda: Can you do a SHOW CREATE table for categorylinks on Commons?
[11:35:02] multichill: sure. any specific reason?
[11:35:15] https://phabricator.wikimedia.org/T155529
[11:36:55] multichill: pasted
[11:37:32] yuvipanda: Thanks, so `cl_sortkey_prefix` varbinary(255) NOT NULL DEFAULT means it's the UTF-8 in LATIN-1 shit
[11:37:50] I think
[11:42:42] RECOVERY - Puppet run on tools-services-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[11:43:11] multichill: maybe :) I've no idea
[11:45:30] yuvipanda: https://www.mediawiki.org/wiki/Toolserver:Code_snippets#Fix_UTF-8_encoded_as_latin-1 blast from the past :-(
[11:46:03] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[12:38:40] PROBLEM - Puppet run on tools-services-01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[13:03:19] Labs, Tool-Labs, DBA: Provisioning MySQL replica users fails on tool labs - https://phabricator.wikimedia.org/T151014#2949041 (Marostegui) @yuvipanda let me know when you want to do this Thanks
[13:35:59] Labs, Labs-Infrastructure, DBA, Operations, Patch-For-Review: Migrate labsdb1005/1006/1007 to jessie - https://phabricator.wikimedia.org/T123731#2949078 (faidon) Ping! Jan 25 is a week away from now, not a lot of time left for an announcement :)
[13:37:54] Labs, Labs-Infrastructure, DBA, Operations, Patch-For-Review: Migrate labsdb1005/1006/1007 to jessie - https://phabricator.wikimedia.org/T123731#2949082 (mark) p:Normal>High
[13:38:34] Labs, Labs-Infrastructure, DBA, Operations, Patch-For-Review: Migrate labsdb1005/1006/1007 to jessie - https://phabricator.wikimedia.org/T123731#2949083 (yuvipanda) I didn't manage to send out the announcement due to unforeseen personal issues. I'll send it out now after checking with jynus.
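Editor's note on the "UTF-8 in LATIN-1" exchange (11:37:32 to 11:45:30): when UTF-8 bytes have been decoded as latin-1, re-encoding the text as latin-1 and decoding it as UTF-8 usually recovers the original. The following is a minimal Python sketch of that general technique, not the code from the linked Toolserver page. The function name `fix_mojibake` and the sample string are hypothetical, and whether `cl_sortkey_prefix` actually needs this treatment is only multichill's guess in the log.

```python
def fix_mojibake(text: str) -> str:
    """Undo UTF-8 bytes that were wrongly decoded as latin-1.

    Hypothetical helper for illustration only. Note that MySQL's
    "latin1" is closer to cp1252 than to ISO-8859-1, so real data
    may need a different codec name here.
    """
    try:
        # latin-1 maps code points 0-255 back to the same byte values,
        # so this recovers the original byte string exactly.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside latin-1, or the bytes
        # are not valid UTF-8: treat it as not garbled and return as-is.
        return text


# Simulate the damage: correct UTF-8 bytes read with the wrong codec.
garbled = "München".encode("utf-8").decode("latin-1")
repaired = fix_mojibake(garbled)
```

The `try`/`except` makes the helper safe to run over mixed data: text that was never garbled fails one of the two conversions and is returned unchanged.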
[13:41:00] Labs, Tool-Labs, DBA: Provisioning MySQL replica users fails on tool labs - https://phabricator.wikimedia.org/T151014#2949085 (Marostegui) This has been executed on the following hosts: ``` labsdb1001 labsdb1003 labsdb1004 labsdb1005 ``` ``` set session sql_log_bin=0; drop user 'labsdbadmin'@'10.64...
[15:12:09] Labs, Tool-Labs: Webservice outages and/or issues - https://phabricator.wikimedia.org/T155494#2949190 (Giftpflanze) I'm not aware of any script errors. Maybe there are db queries or other things that cause it to hang. I see two possibilities: * Let us run the control script on a tools bastion (would the...
[15:43:42] RECOVERY - Puppet run on tools-services-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:48:56] PROBLEM - Puppet run on tools-webgrid-generic-1402 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[15:59:32] Labs: Increase resource quota for dwl - https://phabricator.wikimedia.org/T152456#2949316 (Andrew) For reference: An xlarge and a small instance would be an increase from 8 cores to 9 cores, and from 16Gb to 18Gb. I would probably raise the quota more than that to allow transition to the new instance...
[16:07:03] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[16:25:37] Labs: Increase resource quota for dwl - https://phabricator.wikimedia.org/T152456#2949359 (Giftpflanze) Considering my earlier comment (bigram instance), the needed numbers would be 1+8=9 cores and 2+36=38GB. But otherwise you're right. And I actually planned to do the transition without additional temporary...
[16:28:55] RECOVERY - Puppet run on tools-webgrid-generic-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:39:41] PROBLEM - Puppet run on tools-services-01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[17:09:20] !log shinken Silencing shinken for nfs misc migration T154336
[17:09:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Shinken/SAL
[17:09:45] T154336: Migrate misc to secondary labstore HA cluster - https://phabricator.wikimedia.org/T154336
[17:24:06] Labs: Request creation of wikidata-federation labs project - https://phabricator.wikimedia.org/T154659#2949492 (WMDE-leszek) @chasemp Will you be able to estimate when the new project could be created? Just so that I know how to plan my work in next days. Thanks!
[19:21:35] Labs, Labs-Infrastructure: Empty default security group for newly created project - https://phabricator.wikimedia.org/T136871#2350514 (Krenair) >>! In T136871#2949633, @Andrew wrote: > As best I can tell, though, there's no way to add a group rule to that. So I can't add the rule which allows all traffi...
[19:44:09] Labs, Labs-Infrastructure: Empty default security group for newly created project - https://phabricator.wikimedia.org/T136871#2950224 (Krenair) Ugh, right, it's the first rule here: ```krenair@silver:~$ nova --os-tenant-name admin secgroup-list +-----+---------+-------------+ | Id | Name | Descriptio...
[19:47:28] !log ores restarted precached and uwsgi-ores on ores-web-03. Memory usage issues.
[19:47:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Ores/SAL
[19:47:58] Labs, Tool-Labs, Tool-Labs-tools-LTA-Knowledgebase: tools.lta missing replica.my.cnf - https://phabricator.wikimedia.org/T155317#2950248 (Samtar)
[19:48:00] Tool-Labs-tools-LTA-Knowledgebase: Create LTA table - https://phabricator.wikimedia.org/T155342#2950247 (Samtar) Open>Resolved
[19:48:04] Tool-Labs-tools-LTA-Knowledgebase: Create user table - https://phabricator.wikimedia.org/T155340#2950249 (Samtar) Open>Resolved
[19:48:06] Labs, Tool-Labs, Tool-Labs-tools-LTA-Knowledgebase: tools.lta missing replica.my.cnf - https://phabricator.wikimedia.org/T155317#2940251 (Samtar)
[19:48:31] Labs, Tool-Labs, Tracking: Tool Labs users missing replica.my.cnf (tracking) - https://phabricator.wikimedia.org/T135931#2950257 (Samtar)
[19:48:33] Labs, Tool-Labs, Tool-Labs-tools-LTA-Knowledgebase: tools.lta missing replica.my.cnf - https://phabricator.wikimedia.org/T155317#2940251 (Samtar) Open>Resolved a:Samtar @yuvipanda All sorted, thank you so much for your help! I'll mark this resolved
[19:49:03] yuvipanda: ^
[20:16:45] bd808: feel like CR me?
[20:17:22] matanya: I can give you a bit of time. what patch?
[20:17:36] bd808: https://gerrit.wikimedia.org/r/#/c/332812/1
[20:17:51] simple copy edit of english wording
[20:20:00] thanks bd808
[20:20:11] easy enough. I need to get a deploy window to update the app later this week
[20:20:41] isn't there one today ?
[20:20:48] we've got a new batch of translations queued up too
[20:20:52] "swat"
[20:21:02] swat is MW only
[20:21:08] this is a whole other mess
[20:21:11] oh, ok
[20:21:48] it would take longer to explain to a SWATer than to just do :)
[20:50:34] hey yall
[20:50:39] i'm getting an error I think i've seen before:
[20:50:39] Could not retrieve catalog from remote server: Error 400 on SERVER: invalid byte sequence in US-ASCII at /etc/puppet/modules/druid/manifests/coordinator.pp:1
[20:50:43] running puppet on a labs instance
[20:50:51] i don't see anything wrong on line 1 there, as it says
[20:51:04] i'm not using self hosted puppet
[20:51:12] so i can't really troubleshoot
[20:51:52] it looks like bd808 restarted deployment-puppetmaster to 'fix' this last year: https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL#2016-02-22
[20:53:14] andrewbogott: ^ ?
[20:53:19] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Md Hashim azmi shaikh was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=1344684 edit summary:
[20:53:39] ottomata: on 'a' labs instance?
[20:53:41] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/$traight-$hoota was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=1344686 edit summary:
[20:54:04] on druid201.analytics.eqiad.wmflabs
[20:54:24] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Md Hashim azmi shaikh was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=1344688 edit summary:
[20:54:30] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/$traight-$hoota was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=1344690 edit summary:
[20:56:10] andrewbogott: can i restart puppetmaster on labcontrol1001?
[20:56:14] is that the correct place?
[20:56:49] ottomata: hey! can you hold off? we're dealing with an outage
[20:56:49] ottomata we are in the middle of a bit of an emergency, can we hold off for a bit?
[20:56:51] :)
[20:56:56] oh! sure :)
[21:01:52] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Wurgl was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=1344777 edit summary:
[21:02:24] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Wurgl was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=1344784 edit summary:
[21:04:50] ottomata: you are getting that from the shared labs puppetmaster? :/
[21:07:04] bd808: yes
[21:07:05] otto@druid201:~$ cat /etc/puppet/puppet.conf | grep server
[21:07:05] server = labs-puppetmaster-eqiad.wikimedia.org
[21:08:00] it was supposedly fixed with https://gerrit.wikimedia.org/r/#/c/301071/4 but apparently not really
[21:09:21] (PS1) Jean-Frédéric: Fix name of CommonsCat field in fr_(fr) [labs/tools/heritage] - https://gerrit.wikimedia.org/r/332824
[21:10:25] chasemp hi, when i ssh into gerrit-mysql, im getting this error Could not chdir to home directory /home/paladox: No such file or directory
[21:10:51] but that was created when i created the instance but today i am seeing that error.
[21:10:55] paladox we have a known issue that is being fixed
[21:11:00] oh
[21:11:02] thanks
[21:11:03] it's fallout from the in progress maint
[21:11:08] oh
[21:12:43] hmm bd808
[21:12:44] [@labcontrol1001:/etc/puppet] $ cat /etc/default/puppetmaster | grep LANG
[21:13:30] oh but
[21:13:33] that's for self hosted
[21:14:39] assuming labcontrol uses passenger / apache
[21:14:49] i don't see the env in /etc/apache2/env-*
[21:14:50] ottomata: _joe_ left a review comment on a similar patch that I abandoned about the LANG maybe needing to be set in the Apache config. I've lost track of how we actually run the puppetmaster process
[21:15:04] https://gerrit.wikimedia.org/r/#/c/301071/4/modules/puppetmaster/manifests/passenger.pp
[21:15:05] OH
[21:15:11] labcontrol is trusty
[21:16:45] yeah, bd808 that looks like the problem
[21:17:10] heh. guard conditions are evil for so many reasons in Puppet
[21:18:11] bd808: labcontrol is apache/passenger, same as (pretty much) everywhere these days
[21:18:52] right. it was only different before the last round of self-hosted puppetmaster cleanup by yuvipanda
[21:19:18] * andrewbogott nods
[21:19:30] indeed. now it's just fucked up in the same way, rather than fucked up in an entirely different way
[21:20:10] https://gerrit.wikimedia.org/r/#/c/332853/
[21:20:17] _joe_: bd808^
[21:21:05] seems sane to me ottomata
[21:21:25] <_joe_> ottomata: actually, -1
[21:21:44] <_joe_> that is sane only on apache 2.4/puppet >=3.7
[21:21:48] oh
[21:21:56] <_joe_> IIRC
[21:22:00] hm
[21:22:03] <_joe_> but it's 10:30 pm
[21:22:06] <_joe_> don't trust me
[21:22:27] it's 3AM, trust me and merge!
[21:22:52] (jk pls don't)
[21:23:04] haha
[21:25:12] well hm, i'd love to just restart puppetmaster and then abandon this patch and hold my nose and hope it works...
[21:25:19] yall lemme know when things are cool to do that
[21:28:44] chasemp does that mean my home dir is deleted or was a backup done? I had some things i was testing for gerrit / jenkins.
[21:30:04] paladox: it'll be restored soon, paladox,
[21:30:10] thanks :)
[21:32:39] please restore taxonbot.dwl.eqiad.wmflabs:/home/taxonbot too, thank you
[21:33:02] will do, doctaxon
[21:33:55] <_joe_> ottomata: do you have a trusty puppetmaster with puppet 3.8?
[21:34:03] <_joe_> in that case, you need that patch ofc
[21:37:32] _joe_
[21:37:33] yup
[21:37:33] Version: 3.8.5-2~bpo8trusty+2
[21:37:37] on labcontrol1001
[22:18:55] Tool-Labs-tools-LTA-Knowledgebase: Create password change function - https://phabricator.wikimedia.org/T155675#2950725 (Samtar)
[22:21:53] Tool-Labs-tools-LTA-Knowledgebase: Create password change function - https://phabricator.wikimedia.org/T155675#2950725 (Legoktm) Is there a reason this tool doesn't use OAuth?
[22:39:43] yuvipanda chasemp : there has been a little bit restored only, I hope, the rest is coming back, too?
[22:41:00] doctaxon, the process is ongoing so can't say till it's finished
[22:41:17] thank you, I hope, nothing will be lost
[22:46:39] doctaxon: which instance?
[22:46:56] taxonbot.dwl.eqiad.wmflabs:/home/taxonbot
[22:54:51] doctaxon: okay - all the files are there - still getting restored etc
[22:59:10] madhuvishy: all the files? Thank you, I will wait till all has come in
[23:05:30] Labs, wikitech.wikimedia.org, MW-1.29-release-notes, Patch-For-Review, WMF-deploy-2017-01-17_(1.29.0-wmf.8): LinksUpdate::acquirePageLock error with SMW enabled - https://phabricator.wikimedia.org/T153618#2950983 (scfc) If I understand https://wikitech.wikimedia.org/wiki/Special:Version corre...
[23:10:29] question, why did labs migrate to NFS
[23:14:36] Zppix: we didn't, we have always been on NFS but it was a single server prone to failure and now we are moving to a cluster
[23:17:29] chasemp i would hate to be the server a few months ago (you may want to put ice on the cpu of it) anyway thanks i was wondering
[23:17:52] Labs, wikitech.wikimedia.org, MW-1.29-release-notes, Patch-For-Review, WMF-deploy-2017-01-17_(1.29.0-wmf.8): LinksUpdate::acquirePageLock error with SMW enabled - https://phabricator.wikimedia.org/T153618#2951004 (bd808) ``` silver:~ bd808$ mwscript showJobs.php --wiki=labswiki --group htmlCa...
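Editor's note on the "invalid byte sequence in US-ASCII" error reported at 20:50:39: the discussion suggests the puppetmaster process was running without a UTF-8 `LANG`, so a manifest containing non-ASCII bytes could not be read. The Python sketch below only illustrates that general failure mode and one way to locate the offending bytes. It is not Puppet or Ruby code, the manifest content is invented (the real `coordinator.pp` is not shown in the log), and `decodes_as` and `first_non_ascii` are hypothetical helper names.

```python
from typing import Optional, Tuple

# Hypothetical manifest content: a comment containing an en dash (U+2013).
MANIFEST_BYTES = "# coordinator \u2013 manages segments\n".encode("utf-8")


def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if the bytes are valid in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


def first_non_ascii(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (line number, byte offset in line) of the first non-ASCII byte."""
    for lineno, line in enumerate(data.splitlines(), start=1):
        for offset, byte in enumerate(line):
            if byte > 0x7F:
                return lineno, offset
    return None


ascii_ok = decodes_as(MANIFEST_BYTES, "ascii")   # fails under an ASCII-only locale
utf8_ok = decodes_as(MANIFEST_BYTES, "utf-8")    # succeeds once UTF-8 is assumed
location = first_non_ascii(MANIFEST_BYTES)
```

A scan like `first_non_ascii` can help when the reported line number looks clean, since the offending byte may be an easily overlooked character such as a typographic dash or quote.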