[03:59:51] <wikibugs>	 Labs, Labs-Infrastructure: Copy labmon data to new SSDs - https://phabricator.wikimedia.org/T137924#2403791 (RobH)
[04:02:40] <wikibugs>	 Labs, Labs-Infrastructure: Copy labmon data to new SSDs - https://phabricator.wikimedia.org/T137924#2403792 (RobH) All data has been restored from USB disk and services resumed.  However, there are errors.  syslog is spamming with Failed value conversion (but i also saw those before the migration)  permi...
[04:02:52] <wikibugs>	 Labs, Labs-Infrastructure: Copy labmon data to new SSDs - https://phabricator.wikimedia.org/T137924#2403793 (RobH)
[08:26:13] <wikibugs>	 Labs, Tool-Labs, User-bd808: task not run via crontab - https://phabricator.wikimedia.org/T138178#2403888 (WikedKentaur) Open>Resolved Task runs now via cron. Thanks.
[08:34:03] <wm-bot>	 Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Bonnedav was created, changed by Bonnedav link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Bonnedav edit summary: Created page with "{{Tools Access Request |Justification=I plan to use Tools to begin work on various projects that I have been thinking about for awhile. |Completed=false |User Name=Bonnedav }}"
[09:57:21] <wikibugs>	 Labs, Labs-Infrastructure: Copy labmon data to new SSDs - https://phabricator.wikimedia.org/T137924#2404129 (yuvipanda) So I looked at uwsgi logs and it looked like a possible race between uwsgi starting and graphite being fully installed. It looks ok now - I just restarted it! \o/  I'm going to throw mo...
[10:40:47] <hashar>	 yuvipanda: labmon might well have a ton of useless metrics to garbage collect.  The deployment-prep statsd metrics come to mind
[11:27:39] <wikibugs>	 Tool-Labs-tools-Other, Wikisource, Wikimania-Hackathon-2016: OCR scripts need updating at tools labs by updating the "tesseract-ben" package - https://phabricator.wikimedia.org/T117711#2404334 (Bodhisattwa) Tpt updated during Wikimania hackathon
[11:27:52] <wikibugs>	 Tool-Labs-tools-Other, Wikisource, Wikimania-Hackathon-2016: OCR scripts need updating at tools labs by updating the "tesseract-ben" package - https://phabricator.wikimedia.org/T117711#2404335 (Bodhisattwa) Open>Resolved a:Bodhisattwa
[11:32:25] <wikibugs>	 Tool-Labs-tools-Other, Wikisource, Wikimania-Hackathon-2016: OCR scripts need updating at tools labs by updating the "tesseract-ben" package - https://phabricator.wikimedia.org/T117711#2404358 (Bodhisattwa) a:Bodhisattwa>None
[14:03:51] <wikibugs>	 Labs, MediaWiki-Vagrant, MediaWiki-extensions-Newsletter: Cannot enable/list roles in newsletter-test instance - https://phabricator.wikimedia.org/T131460#2404786 (Tgr) Resolved>Open This seems to be triggered when the encoding environment variables (`LC_*`) are changed.
[14:15:23] <wikibugs>	 Labs, MediaWiki-Vagrant, MediaWiki-extensions-Newsletter: Cannot enable/list roles in newsletter-test instance - https://phabricator.wikimedia.org/T131460#2404846 (Tgr) The two files at fault are probably `puppet/modules/role/manifests/echo.pp` and `puppet/modules/role/manifests/contenttranslation.pp...
[15:01:55] <wikibugs>	 Labs, Tool-Labs, Tool-Labs-tools-Database-Queries: Get access to an old database on tools-db - https://phabricator.wikimedia.org/T101709#1345755 (TTO) Was this database found?
[15:13:06] <chasemp>	 hashar: yeah I'm sure you are right, we'll do some cleanup early next week I imagine
[15:13:40] <hashar>	 chasemp: hi! I am not sure though how one can determine a given metric is no more updated / used :(
[15:13:50] <hashar>	 maybe via the file modified time 
[15:13:59] <chasemp>	 yeah that's how I've done it before
[15:14:21] <chasemp>	 there is another way w/ whisper tools to look at last updated value if it needs extra context
[15:14:36] <chasemp>	 but probably not needed
[15:23:06] <wikibugs>	 Labs, MediaWiki-Vagrant, MediaWiki-extensions-Newsletter: Cannot enable/list roles in newsletter-test instance - https://phabricator.wikimedia.org/T131460#2405092 (Tgr) Not sure what encoding these files use but it's not UTF8. Someone probably used their system encoding when creating them and setting...
[15:31:30] <hashar>	 chasemp: and we can probably have a shorter retention. 
[15:31:44] <hashar>	 I am sure on beta we can live with just a few months of history
[16:13:31] <chasemp>	 hashar: yeah I bet we piggy back on prod durations and we could easily do a year default in labs at a reasonable interval 
[16:18:50] <robh>	 hrmm, so the labmon1001 data is migrated and coming in successfully but seems the actual portal for graphite is still down.
[16:19:07] <robh>	 its serving a dms file rather than content?
[16:19:30] <robh>	 chasemp: ^ i already left yuvi a pm about it, but i imagine since data is indeed coming into the system the frontend portal not working is far lower priority
[16:20:44] <bd808>	 robh: I think the prod graphite portal is broken too. ori filed a bug about it yesterday
[16:21:12] <bd808>	 T138541
[16:21:12] <stashbot>	 T138541: "unexpected error" on graphite-web - https://phabricator.wikimedia.org/T138541
[16:21:38] <robh>	 huh, different error, but odd they both break around same time, heh
[16:21:50] <bd808>	 Looks like yuvipanda may have temporarily reverted the change based on the task
[16:21:59] <robh>	 i think labmon1001 is a config error since its not serving the proper page at all, not even an error page
[16:22:19] <robh>	 but, both yuvi and i confirmed we saw data streaming to the system, though when i had confirmed there were other errors he fixed his AM
[16:23:20] <chasemp>	 robh: what page are you trying?
[16:23:26] <bd808>	 robh: *nod* grafana boards are looking better -- https://grafana.wikimedia.org/dashboard/db/labs-project-board
[16:23:27] <robh>	 nothing yet
[16:23:54] <robh>	 chasemp: I havent dug back in yet
[16:24:12] <chasemp>	 http://graphite.wmflabs.org/ pops up to open in ...a text editor so there is some issue there, apache/django seems like
[16:24:40] <robh>	 yeah its serving a .dms file rather than content
[16:24:52] <robh>	 but its not normal apache sites, apache just feeds into the graphite software
[16:25:10] <bd808>	 graphite's UI is a django app
[16:25:29] <bd808>	 (a horribly written one too)
[16:25:33] <robh>	 so it appears to be a graphite app issue, not a stack issue for apache.  the permissions for all the graphite data is also correct (afaict)
[16:39:53] <chasemp>	 huh man I'm not sure
[16:39:56] <chasemp>	 I will take a look again in a efw
[16:39:57] <chasemp>	 few
[17:20:18] <yuvipanda>	 Hey
[17:20:19] <yuvipanda>	 Check HTTPS
[17:20:24] <yuvipanda>	 HTTP is doing strange things
[17:20:51] <yuvipanda>	 robh chasemp bd808
[17:23:44] <robh>	 yuvipanda: same thing
[17:24:20] <robh>	 firefox tries to download a dms file
[17:24:28] <robh>	 the filename for .dms changes each time.
[17:25:07] <chasemp>	 I'm looking for a runaway cat at the moment unfortunately, she seems to have flown the coop but isn't prepared to be outside at all
[17:25:12] <robh>	 chrome tries to download some file of download.gz, so different browsers different behaviors
[17:25:32] <robh>	 chasemp: that sucks, hopefully has front claws?  (helps if they are outside, as they can climb down things with front claws)
[17:25:45] <chasemp>	 she doesn't that's part of the issue yeah
[17:25:53] <chasemp>	 and she is an idiot so there is that
[17:27:52] <chasemp>	 last time this happened she was in a dresser drawer but seems like a real code red this time
[17:33:57] <yuvipanda>	 Robh strange, it worked fine in the morning...
[17:35:09] <robh>	 i assumed something odd bliped since you checked historical data with it, heh
[17:35:29] <robh>	 gotta run afk for 20, laundry swap.
[17:39:52] <chasemp>	 yuvipanda: so you tested it this morn and all was well?...ook then wth
[17:41:17] <yuvipanda>	 Yeah
[17:41:24] <yuvipanda>	 (on and off - wikimania)
[17:56:14] <halfak>	 I'm having issues with ores-worker-07, ores-worker-09 and ores-worker-10.  Can someone help me figure out what's up?
[17:56:26] <halfak>	 I tried rebooting ores-worker-07 and it won't come back. 
[17:56:40] <halfak>	 I left 09 and 10 alone in case someone wants to look at their state. 
[17:57:08] <halfak>	 Oh wait.  looks like 09 is accessible
[17:57:17] <halfak>	 07 is definitely derped and won't turn back on
[17:58:13] <halfak>	 maybe chasemp ^ 
[17:59:08] <wikibugs>	 Labs, MediaWiki-Vagrant, MediaWiki-extensions-Newsletter, Patch-For-Review: Cannot enable/list roles in newsletter-test instance - https://phabricator.wikimedia.org/T131460#2167726 (ori) I'd rather figure out how to get Puppet reliably run from a UTF-8 locale. Has that been explored and deemed im...
[18:04:35] <halfak>	 Yeah.  OK.  looks like only -07 is struggling
[18:09:00] <chasemp>	 sure I can look at 07 halfak
[18:09:33] <halfak>	 Thanks :) 
[18:20:35] <chasemp>	 halfak: it seems like it shut itself down, possibly as the virt host is overloaded
[18:20:47] <chasemp>	 there has been some known issues that look like this afaik, but I can't get it to come back yet
[18:20:47] <halfak>	 Gotcha.  I did try to restart it a bit ago
[18:20:51] <chasemp>	 though it says it will
[18:21:00] <chasemp>	 yeah it gets into like an administrative shutdown state I think
[18:21:05] <chasemp>	 I told it to start and it said ok
[18:21:05] <halfak>	 Gotcha.
[18:21:08] <chasemp>	 so I'm giving it a minute
[18:21:12] <halfak>	 Thanks
[18:21:22] <chasemp>	 I can try to cold migrate this to another virt host and I will if it doesn't come back but
[18:21:31] <chasemp>	 never done it before and no one else is around so even money that works out :)
[18:21:42] <chasemp>	 but def not anything you're doing
[18:24:40] <halfak>	 Good to know.  Happily I didn't even notice that the darn node depooled because the rest of the nodes just took over. 
[18:42:13] <halfak>	 chasemp, doesn't look like it's back online
[18:42:27] <chasemp>	 yeah nova is a liar pffff
[18:42:49] <chasemp>	 I think it's the 1001 virt host halfak, you could try spinning up a new one while I look into migration
[18:42:57] <chasemp>	 I'll leave a note for andrew on where I get with this
[18:43:23] <halfak>	 chasemp, so delete this instance and just make a new one?
[18:43:38] <halfak>	 I'm at quota :/
[18:43:43] <chasemp>	 gah ok
[18:44:19] <chasemp>	 ok then yeah let's delete and recreate and see if that works
[18:44:26] <halfak>	 OK will do!
[18:48:57] * halfak deletes and starts recreating ores-worker-07
[18:50:58] <halfak>	 chasemp, is there a way to configure puppet through horizon or is that still wikitech-only?
[18:51:18] <chasemp>	 this coming quarter we hope to port to horizon but still wikitech atm :)
[18:51:22] <halfak>	 kk
[18:53:16] * halfak forces the puppet run.  
[18:53:19] <halfak>	 Almost ready :) 
[19:05:03] <chasemp>	 thanks halfak for your patience, not the ideal way to handle it but it's a weird week 
[19:05:20] <halfak>	 No worries.  Labs is awesome.  It's a small price to pay. 
[19:05:27] <halfak>	 :) 
[19:06:02] <halfak>	 OK.  Working on re-pooling now. :) 
[19:07:22] <tom29739>	 (don't mean to barge in here) Is there a way to completely disable puppet on a labs instance, and what problems might it cause?
[19:07:24] <halfak>	 IT'S ALIVE
[19:09:45] <chasemp>	 tom29739: well...you can disable puppet but I'm not sure if that persists through a reboot and it will cause configuration drift and instance staleness that will eventually break the instance guaranteed
[19:09:56] <chasemp>	 it's really a short term thing for some specific end to be viable
[19:10:01] <chasemp>	 puppet agent --disable 'reason'
[19:10:10] <tom29739>	 How can I stop it overwriting my files?
[19:10:23] <tom29739>	 It keeps overwriting my salt config. :/
[19:11:06] <chasemp>	 tom29739: well, are you using a project specific salt master or something?  we would have to puppetize your config 
[19:11:27] <tom29739>	 I'm trying to.
[19:12:35] <tom29739>	 I was using https://wikitech.wikimedia.org/wiki/Help:Project_hosted_salt_master and it worked, but it's a 2 year old version of salt.
[19:12:44] <chasemp>	 hmm beta does this so there has to be a way to do this
[19:13:00] <chasemp>	 tom29739: what version?
[19:13:07] <chasemp>	 and what distro?
[19:13:15] <chasemp>	 because prod salt is probably jessie but I can check
[19:13:33] <tom29739>	 When I use that puppet role I get salt master 2014.7.5
[19:13:37] <tom29739>	 I'm using jessie
[19:14:02] <tom29739>	 I want to use salt 2016.3.1 because it has some features that I want to use.
[19:14:52] <tom29739>	 I did try to use the external saltstack repo, but the wikimedia repo overrides it for some reason, and it installs the old version.
[19:15:00] <chasemp>	 hm I see salt-master 2014.7.5 (Helium)
[19:15:03] <chasemp>	 in prod for labs
[19:15:12] <chasemp>	 so we may not support the newer version
[19:15:43] <chasemp>	 the package being the same name along w/ repo priority yeah
[19:15:48] <chasemp>	 probably nukes the 3rd party
[19:15:56] <tom29739>	 How can I override the wikimedia repo, so I can use the 3rd party repo? 
[19:16:20] <tom29739>	 (without puppet overwriting whatever I do)
[19:16:42] <chasemp>	 I'm surprised it would mess w/ an installed package
[19:16:47] <chasemp>	 puppet isn't usually that smart
[19:17:20] <tom29739>	 It runs apt each time and overwrites it. With the specific salt-master package in the apt command.
[19:18:45] <chasemp>	 so not sure why the labs specific role there but
[19:18:55] <chasemp>	 the main manifest has a version flag
[19:18:57] <chasemp>	 class salt::master(
[19:18:57] <chasemp>	     $salt_version='installed',
[19:19:16] <chasemp>	 so in theory we can finagle it for hiera maybe if that's all it is
[19:19:53] <tom29739>	 The salt minion would need to be upgraded too.
[19:19:59] <tom29739>	 They share salt-common
[19:20:03] <tom29739>	 (the package)
[19:20:41] <tom29739>	 And I'd need to upgrade salt-minion too to take advantage of the new features.
[19:20:43] <chasemp>	 yeah it's going to be tracking down and/or creating settings to match things and making it hiera friendly
[19:21:02] <chasemp>	 but using version of sw in labs newer or different from prod is difficult
[19:22:06] <tom29739>	 Can a custom role be created in puppet? Then I could just put that on my instance.
[19:22:41] <tom29739>	 role::salt::masters::labs::project_master is the current role, so maybe something could be created off of that?
[19:34:17] <chasemp>	 having a manifest for a specific version seems bad, I'm not sure how that's functionally different from allowing version specification in a common manifest
[19:34:33] <chasemp>	 and dual code paths for something like this will always lead to trouble
[19:35:13] <bd808>	 using salt will always lead to trouble ;)
[19:36:06] <tom29739>	 Puppet wouldn't even work when I tried it.
[19:36:45] <tom29739>	 The project puppetmaster just refused to work and then puppet refused to run on the other instances.
[19:39:49] <bd808>	 I know you had a lot of issues setting up that cluster which is always lame. I have used self-hosted and project wide puppetmasters a lot in Labs though, so I think it was all bad luck and/or bad documentation.
[19:41:46] <tom29739>	 I don't really understand puppet that much, which I'm sure is helping. ;)
[19:42:51] <bd808>	 switching puppetmasters requires some hand holding or getting the hiera settings just right before you start. There are ssl client certs that you have to remove in various places
[19:43:54] <tom29739>	 It wasn't accepting the certs if I recall. It was moaning, and then I couldn't ssh to the instances because puppet wouldn't run, and then it just fell over.
[19:44:40] <bd808>	 that was mostly bad luck I think though because you kept having issues aster you ditched the local puppet setup too right?
[19:44:47] <bd808>	 *after
[19:44:53] <tom29739>	 I think so.
[19:45:27] <tom29739>	 At least salt appears to be working, in that I can ping the minions.
[19:46:00] <chasemp>	 tom29739: what is the feature you are looking for in newer salt?
[19:46:33] <tom29739>	 I've got it on a tab somewhere.. I'll find it.
[19:47:14] <wikibugs>	 Labs, Labs-Infrastructure: Copy labmon data to new SSDs - https://phabricator.wikimedia.org/T137924#2405716 (RobH) a:RobH>None
[19:49:12] <tom29739>	 chasemp, it was spm.
[19:49:36] <tom29739>	 I remember trying it on the saltmaster, and it didn't work because it was that old version.
[19:50:13] <chasemp>	 hm interesting 
[19:51:36] <wikibugs>	 Labs, Labs-Infrastructure: Copy labmon data to new SSDs - https://phabricator.wikimedia.org/T137924#2405720 (RobH) Updating from ongoing work and chat with @yuvipanda and @chasemp   I've updated the task description to reflect that since Yuvi fixed this server in his AM, it is now experiencing a new issu...
[20:16:02] <tom29739>	 salt 2016.3.1 (Boron) :)
[20:17:07] <tom29739>	 Wonder what will happen when I re-enable puppet...
[21:26:43] <tom29739>	 And it's talking to the minions :) \o/
[21:27:51] <chasemp>	 so you installed the version of salt you want and puppet is back to normal
[21:28:01] <chasemp>	 but since puppet just checks for installed state
[21:28:04] <chasemp>	 it all works out
[21:28:05] <chasemp>	 ?
[21:28:23] <chasemp>	 my imagining is that is ok but if someone ever pins a version you're going to feel it :)
[22:48:19] <Lokal_Profil>	 !log tools.heritage Recreated source tables (T138606)
[22:48:20] <stashbot>	 T138606: updating monuments_all fails due to wd_item - https://phabricator.wikimedia.org/T138606
[22:48:23] <labs-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.heritage/SAL, Master