[01:02:47] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:06:47] RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 908295 bytes in 4.300 second response time [01:12:41] milimetric: What are you doing on druid1 atm? [01:12:55] milimetric: that tail is - literally - eating up all the NFS I/O bandwidth. [01:13:01] ugh [01:13:03] sorry! :( [01:13:08] it's this insane task [01:13:13] I have to tail -n +2 a file [01:13:16] and it's 35GB [01:13:32] because the first line and only the first line is causing this huge indexing job to fail (not my code) [01:13:54] is there a better way to copy in place on there? [01:14:07] it should be done relatively soon... [01:14:31] You might want to insert a pv with a -L limit? [01:14:32] Coren: ^ [01:14:47] But if it's over soon, then you should be okay. [01:15:10] !ping [01:15:11] !pong [01:15:40] Pro tip: when you have a very expensive I/O operation we (any opsen) can usually do it easier and faster directly on the storage server instead. [01:15:44] :/ I started it about 30 minutes ago, and if I'm right it should only take about 35 minutes [01:15:59] oh, i had no idea i had rights to log into the store server [01:16:02] milimetric: Well, the output file size should give you a good idea. [01:16:04] how do I do that [01:16:43] I'd ls -l the output file but i didn't wanna kill the NFS even more [01:16:58] labstore1002.eqiad.wmnet. The actual storage is under /srv [01:17:07] You'll know for the next time. :-) [01:17:50] thx and sorry, if it's not over in 10 minutes I'll kill it [01:17:57] When you do an expensive thing there, just making it 'ionice -c idle' means you won't kill I/O. [01:18:23] lag on tools, anyone? [01:18:32] milimetric: That's all right, but since it's user-impacting you need to do an outage report now. :-) [01:21:07] lol [01:21:08] Negative24: Yeah; should be over soon. [01:21:09] * Negative24 is reading scrollback [01:21:09] ok, my oh my - https://wikitech.wikimedia.org/wiki/Incident_documentation right? [01:21:10] Ayup. [01:21:10] (Just to clarify, /you/ couldn't have logged in to labstore1002 to do the fix, afaik, but anyone in ops can and would have been happy to help) [01:22:22] oh, I see [01:22:45] How much does tools rely on NFS? [01:22:55] Coren: I'm gonna kill it, mind helping me then? [01:23:16] milimetric: Sure thing. Just tell me what the file is and I'll start it now [01:24:20] k, it's /data/project/milimetric/1day.pageviews.2015.10.14.json and I just need everything but the 1st line in a file in the same folder, doesn't matter what you call it [01:24:21] Negative24: Quite a bit - it's the project that most depends on it. That said, the normal case (as we see now) is performance impact not brokenness. [01:24:21] sorry everyone, lag should be ok now, I killed it [01:24:22] ok, makes sense [01:24:26] milimetric: thanks [01:25:25] milimetric: It was about 2/3 done - good thing you killed it. [01:25:33] yes [01:25:39] i'll delete that partially done one [01:26:14] oh... does *that* use NFS bandwidth too? [01:26:43] milimetric: It'll have an impact, but very briefly. [01:27:25] * milimetric is spooked and sorry, will look at /data/project with fear from now on [01:28:06] Heh. No need for fear but just remember that manipulating 100s of G of data has a cost. :-) [01:28:23] Ugh [01:28:27] Got a page about nfs [01:28:35] YuviPanda: handled now. [01:28:36] A bit away from my laptop...
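A minimal sketch of the rate-limited, idle-priority version of this copy, combining the pv -L and ionice tips from the conversation above; the input path and the 20 MB/s cap come from the chat, and the output filename is arbitrary (the chat notes the name doesn't matter):

    # strip the first line of the 35GB file without saturating NFS
    # (class 3 is the 'idle' scheduling class, i.e. 'ionice -c idle')
    ionice -c 3 tail -n +2 /data/project/milimetric/1day.pageviews.2015.10.14.json \
        | pv -L 20m \
        > /data/project/milimetric/1day.pageviews.2015.10.14.noheader.json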
[01:28:38] Ah ok [01:28:39] Luke081515|away: oh I didn't know you had an away nick. I was trying to reply to your privmsg [01:28:40] Cook [01:28:42] Cool [01:28:47] I'll go away then. Thanks [01:28:53] np [01:29:18] milimetric: I'll wait for everything to catch up then start your tail [01:32:46] https://wikitech.wikimedia.org/wiki/Incident_documentation/20151023-LabsNFS-Lag [01:33:04] (feel free to change or ask me to put any other details in) [01:33:24] milimetric: I started the tail capped at 20MB/s; will take 28m [01:33:34] thx much [01:34:01] milimetric: Only thing missing is a timeline. [01:34:07] this is for the pageview API, btw, we're trying to figure out better ways to store the data in less space so we can give people more historical data [01:34:18] milimetric: (when you started the operation, when you killed it) [01:34:30] k [01:39:35] added the timeline and moved to 10-24 because UTC: https://wikitech.wikimedia.org/wiki/Incident_documentation/20151024-LabsNFS-Lag [01:42:34] milimetric: Thanks. [01:48:41] milimetric: It may take a while longer than pv told me at first since it's on Idle ionice - other users get priority. [01:59:01] is something broken again? (tools webservices don't respond to me) [01:59:28] hm, works again [02:01:07] but nfs seems to be full [02:05:01] Coren: ^? [02:05:01] "full"? [02:05:01] no bandwidth left [02:05:02] sorry [02:05:02] Yeah, looking. [02:05:03] can't log in, connections frozen [02:05:03] I actually see no NFS traffic at all - that seems wrong. [02:05:17] ... and it's back. [02:05:20] yup [02:05:24] * Coren boggles a bit. [02:07:21] strange [02:07:21] hmm [02:07:22] Coren: how do you check the NFS stats? [02:07:22] For a short while, there was absolutely no network traffic between labs and the storage server [02:07:22] Negative24: Depends. Right now I'm doing iptraf on the server proper, but my normal "keep an eye on thing" is in grafana [02:07:23] Ok. Grafana is pretty cool [02:07:23] Hm. Not seeing any outliers on traffic either, so nothing is going rogue that I can see. [02:09:15] Well, except someone making a tarball on tools-bastion but it's not /too/ bad. [02:18:05] It's doing it again. [02:18:17] yeah :\ [02:18:29] afaict, network connections just... stop. [02:18:36] Then start again. [02:18:43] someone plugging some cables? [02:19:29] I'm not seeing anything broken on the labstore; and the server's network appears to be up. [02:19:31] ... and, traffic returns. [02:19:35] * Coren looks closer at routing between labs and not-labs. [02:20:26] and problems again [02:20:36] and up [02:20:40] wow [02:21:03] gifti: I think what you're seeing is a side effect of exponential backoff though - when the connection returns it's going to pick up slowly. [02:21:27] gifti: Because the clients don't know it's not just network congestion [02:21:51] mhm [02:24:49] * Coren looks at labnet1002 with suspicion. [02:28:18] * Coren wonders why wm-bots is doing so much I/O [02:36:34] Augh. I hate heisenbugs. [02:51:14] so I wanted to just read that big file from /data/project now, to process it, now that it's ready [02:51:41] just making sure that's ok. There's no real way to pv it because it's not my code that's reading it [02:52:06] (because it looks like there are other problems going on) [02:53:31] I guess the process reading it will be rate-limited anyway, by how much it can process [02:53:47] so probably won't kill anything. Just ping me if I'm using up too much bandwidth again [02:54:09] kk [02:54:44] Reading, as a rule, is lighter than writing.
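On the "how do you check the NFS stats?" question above: besides iptraf on the server (as Coren describes) and grafana, a few standard client-side views would show this kind of stall; the interface name below is an assumption:

    # client side: NFS RPC/operation counters and per-mount detail
    nfsstat -c
    grep -A 3 'fstype nfs' /proc/self/mountstats
    # server side: live per-interface traffic, roughly what iptraf was used for here
    sudo iptraf-ng -d eth0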
[03:12:21] when it says 80Gb allocated storage here: https://wikitech.wikimedia.org/wiki/Nova_Resource:Druid1.analytics.eqiad.wmflabs [03:12:39] is that available somewhere I'm not looking? It looks like / has only 20G on druid1 [03:17:43] milimetric: Extra space is not partitioned by default. [03:18:14] oh, is there a way to get that? Then I can pv copy that big file and end my nightmares [03:18:21] it failed again, some weird characters in the middle of it [03:19:03] labs_lvm::volume [03:19:39] sweet [03:20:30] hm, I see role::labs::lvm::mnt and role::labs::lvm::srv [03:20:44] role::labs::lvm::srv is an easily-usable role you can simply apply that basically just mounts "everything else" on /srv [03:21:04] Both just use labs_lvm::volume with reasonable values. [03:21:34] and that won't be nfs-mounted, so I can copy to it and then do whatever I want on /srv? [03:22:06] Not NFS, though still virtual resource. So, "Yes, for various values of 'whatever I want'". :-) [03:22:13] Much more forgiving than NFS though. [03:22:34] cool, I'm just going to be piping that file through regexes until I get it to validate [03:22:43] thx [04:48:47] YuviPanda: The Grafana instance in labs is a bit weird (each user has their own org, dashboards made inside that are not publicly visible). But here's something to consider: http://i.imgur.com/teEggEj.png [04:49:20] The new Grafana 2.0 has an important feature for this to work: Repeatable panels, and repeatable rows - based on available values in a variable. [04:49:27] So it can repeat a row for each server in a project. [04:51:50] Once we can host this dashboard somewhere that doesn't require logging in, it'll feel like it's time to decommission my Nagf tool - https://tools.wmflabs.org/nagf/?project=cvn - which is now fully obsoleted by Grafana functionality with this latest release. [04:52:10] We can probably also use production grafana, since it just queries graphite. It can take multiple graphite installs. [08:18:46] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1410 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [08:58:45] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1410 is OK: OK: Less than 1.00% above the threshold [0.0] [09:46:37] PROBLEM - Puppet staleness on tools-k8s-bastion-01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0] [11:50:45] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1410 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [12:25:53] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1410 is OK: OK: Less than 1.00% above the threshold [0.0] [14:18:48] 10Wikibugs: Wrong message, wikibugs desplayed normaly users, if herald does an action - https://phabricator.wikimedia.org/T116477#1750540 (10Luke081515) 3NEW [16:43:18] 10Wikibugs, 6Phabricator: Wrong message, wikibugs desplayed normaly users, if herald does an action - https://phabricator.wikimedia.org/T116477#1750681 (10Legoktm) wikibugs is just echoing the information it received from Phabricator: ``` 2015-10-24 14:10:59,941 - wikibugs.wb2-phab - DEBUG - Processing {"class... [16:49:44] 10Wikibugs, 6Phabricator: Wrong message, wikibugs desplayed normaly users, if herald does an action - https://phabricator.wikimedia.org/T116477#1750692 (10valhallasw) redis2irc.log: ``` 2015-10-24 14:11:00,579 - irc3.wikibugs - DEBUG - > PRIVMSG #wikimedia-releng :10Beta-Cluster-Infrastructure, 10CirrusSearch,...
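A sketch of the role::labs::lvm::srv route discussed above, once the role has been applied via Special:NovaPuppetGroup: run puppet, confirm the extra allocation landed on /srv, then do the rate-limited copy off NFS onto local disk. Paths and the 20 MB/s cap are carried over from earlier in the log; the output name is arbitrary:

    sudo puppet agent -tv    # picks up the lvm role and mounts the leftover space on /srv
    df -h /srv               # should now show the local (non-NFS) volume
    ionice -c 3 pv -L 20m /data/project/milimetric/1day.pageviews.2015.10.14.json \
        > /srv/1day.pageviews.2015.10.14.json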
[16:49:51] * valhallasw`cloud eyes legoktm [16:49:52] :> [16:57:39] 6Labs, 10Tool-Labs: Provide centralized logging (logstash) - https://phabricator.wikimedia.org/T97861#1750706 (10intracer) [16:59:18] 10Wikibugs, 6Phabricator: Wrong message, wikibugs desplayed normaly users, if herald does an action - https://phabricator.wikimedia.org/T116477#1750711 (10Luke081515) I wonder, normaly herald don't does something, when someone just reads. [16:59:52] 6Labs, 10Tool-Labs: Cannot start java processes using the grid engine - https://phabricator.wikimedia.org/T69588#1750712 (10intracer) [17:02:13] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:06:49] RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 908345 bytes in 3.345 second response time [18:21:46] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1410 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [18:56:45] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1410 is OK: OK: Less than 1.00% above the threshold [0.0] [19:03:43] 6Labs, 6Phabricator: phabricator puppet at labs broken - https://phabricator.wikimedia.org/T116442#1750815 (10Negative24) Which project and node is this? [19:06:53] Luke081515|AFK: I'm going to try and replicate ^ [19:09:43] Negative24: the error tells you [19:10:02] It's rcm-3 in the rcm project [19:10:20] JohnFLewis: oh I see [19:10:40] nm then [19:12:27] I'm trained to just see what I want to see in an error (because I see them so much :P) [19:13:06] Mkay :P [19:28:38] 6Labs, 6Phabricator: phabricator puppet at labs broken - https://phabricator.wikimedia.org/T116442#1750835 (10Negative24) Hmm, it works for me on phab-t116442.performance.eqiad.wmflabs. @Luke081515, how did you configure the puppet group on the rcm project? [19:42:32] twentyafterfour: was there a ticket created for the phab security extension not being cloned by puppet on install? [19:52:50] 6Labs, 6Phabricator: phabricator puppet at labs broken - https://phabricator.wikimedia.org/T116442#1750851 (10Luke081515) I used role::phabricator::main, and then tried to get it with sudo puppet agent -tv. The whole console output is this: luke081515@rcm-2:~$ sudo puppet agent -tv Info: Retrieving plugin... [19:52:58] Negative24: I wrote it down at the task [19:53:56] 6Labs, 6Phabricator, 7Puppet: phabricator puppet at labs broken - https://phabricator.wikimedia.org/T116442#1750852 (10Krenair) [19:54:18] Luke081515: I was asking how not which class [19:54:54] Negative24: with sudo puppet agent -tv [19:55:32] Did you add it through Special:NovaPuppetGroup? [19:55:42] and then configure the VM? [19:56:04] Luke081515: and you want role::phabricator::labs not role::phabricator::main [19:56:04] Negative24: Yeah, that's true [19:56:10] hm, ok [19:56:14] try that [19:56:45] ok, wait a moment [19:57:58] Luke081515: it will fail again but I'll walk you from there [19:58:32] phabricator takes a few steps. The puppet role isn't a turnkey configure-er [19:59:21] ok, thanks. 
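A quick way to confirm which role actually got applied on the instance (role::phabricator::labs rather than role::phabricator::main); the classes.txt path is the usual puppet 3 agent state file and is an assumption for this setup:

    sudo puppet agent -tv
    grep -i phabricator /var/lib/puppet/state/classes.txt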
At the moment one error, and a lot of notices [20:00:38] Luke081515: post the error here [20:00:50] Error: /Stage[main]/Apache::Logrotate/Augeas[Apache2 logs]: Could not evaluate: Save failed with return code false, see debug [20:01:01] and two more: [20:01:03] 6Labs, 10wikitech.wikimedia.org: Interacting with table generated by {{#ask}} throws Uncaught TypeError - https://phabricator.wikimedia.org/T101642#1750860 (10Aklapper) [20:01:05] 6Labs, 6operations, 10wikitech.wikimedia.org: distribution upgrade for wikitech-static instance - https://phabricator.wikimedia.org/T94585#1750861 (10Aklapper) [20:01:06] Error: Could not start Service[phd]: Execution of '/usr/sbin/service phd start --force' returned 255: [20:01:07] 6Labs, 10wikitech.wikimedia.org: Project Bastion has service groups - https://phabricator.wikimedia.org/T64537#1750862 (10Aklapper) [20:01:09] 6Labs, 10wikitech.wikimedia.org: Upgrade SMW to 1.9 or later - https://phabricator.wikimedia.org/T62886#1750863 (10Aklapper) [20:01:09] Error: /Stage[main]/Phabricator/Service[phd]/ensure: change from stopped to running failed: Could not start Service[phd]: Execution of '/usr/sbin/service phd start --force' returned 255: [20:01:11] 6Labs, 10wikitech.wikimedia.org: Discrepancies in public IP instance lists between different wikitech UIs - https://phabricator.wikimedia.org/T62883#1750865 (10Aklapper) [20:01:13] 6Labs, 10wikitech.wikimedia.org: Can't reset password on wikitech (Unicode passwords not accepted), due to LDAP/opendj? - https://phabricator.wikimedia.org/T58114#1750867 (10Aklapper) [20:01:15] 6Labs, 10wikitech.wikimedia.org, 7Regression: [Regression] Editing "Documentation" page for labs project not working - https://phabricator.wikimedia.org/T47519#1750869 (10Aklapper) [20:01:16] that are all errors [20:01:17] 6Labs, 10wikitech.wikimedia.org: Hostnames assigned to floating IP persist when deallocated - https://phabricator.wikimedia.org/T55816#1750870 (10Aklapper) [20:01:19] 6Labs, 10wikitech.wikimedia.org: Cleanup and enable UserFunctions extension on wikitech - https://phabricator.wikimedia.org/T47455#1750873 (10Aklapper) [20:01:21] 6Labs, 10LabsDB-Auditor, 10Tool-Labs: Make labsdb views fully column-whitelist based - https://phabricator.wikimedia.org/T86218#1750874 (10Aklapper) [20:01:31] 6Labs, 10LabsDB-Auditor: Generate simple HTML interface to view reports generated by labsdb-auditor - https://phabricator.wikimedia.org/T78723#1750881 (10Aklapper) [20:01:49] 6Labs, 10Tool-Labs: deduplicate compute::general and compute::dedicated roles - https://phabricator.wikimedia.org/T99131#1750882 (10Aklapper) [20:01:51] 6Labs, 10Tool-Labs: Investigate alternatives to dedicated exec node for gifti's tools - https://phabricator.wikimedia.org/T99130#1750883 (10Aklapper) [20:01:53] 6Labs, 10Tool-Labs: Shinken: make sure 'Free space - all mounts' can handle no-longer-existing mounts - https://phabricator.wikimedia.org/T99077#1750884 (10Aklapper) [20:01:54] 6Labs, 10Tool-Labs: Fix shinken config to remove tools-webproxy-test - https://phabricator.wikimedia.org/T99073#1750885 (10Aklapper) [20:01:56] 6Labs, 10Tool-Labs, 5Patch-For-Review: Deprecate #no-default-php in .lighttpd.conf - https://phabricator.wikimedia.org/T98818#1750886 (10Aklapper) [20:01:58] 6Labs, 10Tool-Labs: Investigate / get rid of http debian repo from toollabs - https://phabricator.wikimedia.org/T98575#1750887 (10Aklapper) [20:02:00] 6Labs, 10Tool-Labs: Add shinken admin accounts for tools ops - https://phabricator.wikimedia.org/T97862#1750889 (10Aklapper) [20:02:02] 6Labs, 
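When phd fails to start with exit code 255 as above, the Phabricator CLI usually reports the underlying error more clearly than the init script; a sketch, assuming the /srv/phab/phabricator layout used later in this walkthrough (an uninitialized database, fixed below with bin/storage upgrade, is one common cause):

    cd /srv/phab/phabricator
    sudo ./bin/phd status    # list daemons and their state
    sudo ./bin/phd start     # prints the actual failure instead of just returning 255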
10Tool-Labs: Convert tomcat-starter to python - https://phabricator.wikimedia.org/T98442#1750888 (10Aklapper) [20:02:06] 6Labs, 10Tool-Labs, 3ToolLabs-Goals-Q4: Make tools-mail redundant - https://phabricator.wikimedia.org/T96967#1750897 (10Aklapper) [20:02:08] 6Labs, 10Tool-Labs: toolsbeta: set up puppet-compiler / temporary-apply - https://phabricator.wikimedia.org/T97081#1750896 (10Aklapper) [20:02:31] wow the batch editing Andre [20:02:50] Luke081515: you can ignore the apache error [20:02:56] yeah, I found the batch edit job: 78 tasks [20:03:09] but the phd service is what I needed [20:03:13] 6Labs, 10Tool-Labs, 10Internet-Archive: Document how to install Python modules in a tool's home directory/virtual environment - https://phabricator.wikimedia.org/T63824#1750973 (10Aklapper) [20:03:13] 6Labs, 10Tool-Labs, 7Tracking: Toolserver migration to Tools (tracking) - https://phabricator.wikimedia.org/T60788#1750975 (10Aklapper) [20:03:15] 6Labs, 10Tool-Labs, 7Documentation: add basic expectations management to docs - https://phabricator.wikimedia.org/T56701#1750984 (10Aklapper) [20:03:16] ok [20:03:17] 6Labs, 10Tool-Labs, 7Documentation: Wikimedia Labs system admin (sysadmin) documentation sucks - https://phabricator.wikimedia.org/T57946#1750982 (10Aklapper) [20:03:19] 6Labs, 10Tool-Labs: Status page should automatically refresh data - https://phabricator.wikimedia.org/T54275#1750983 (10Aklapper) [20:03:21] 6Labs, 10Tool-Labs: Clean up list of projects on Tool Labs home page and add Tomcat tools - https://phabricator.wikimedia.org/T51937#1750987 (10Aklapper) [20:04:10] Luke081515: cd to /srv/phab/libext and run `sudo git clone https://github.com/wikimedia/phabricator-extensions-security.git security` [20:04:10] hm, ok, so what's the next step? [20:04:17] ok, wait a moment [20:04:42] 6Labs, 10LabsDB-Auditor, 10MediaWiki-extensions-OpenStackManager, 10Tool-Labs, and 7 others: Labs' Phabricator tags overhaul - https://phabricator.wikimedia.org/T89270#1750993 (10Aklapper) 5Open>3Resolved a:3Aklapper >>! In T89270#1374717, @Aklapper wrote: > * This will **NOT** retroactively add exis... [20:04:42] done [20:05:06] Luke081515: now cd to phabricator/ and run `sudo bin/storage upgrade` and answer yes to the prompts [20:06:08] Negative24: not that I know of [20:06:30] Negative24: Done [20:06:42] twentyafterfour: ok because we have another user that I'm walking through the phab lab install process ^ [20:07:02] Luke081515: then rerun puppet and you should be good to go [20:08:32] 6Labs, 10wikitech.wikimedia.org: Hostnames assigned to floating IP persist when deallocated - https://phabricator.wikimedia.org/T55816#1751008 (10Krenair) Is {T115194} an instance of this bug? [20:08:37] Negative24: Notice: Finished catalog run in 16.54 seconds [20:08:40] successful [20:08:57] 6Labs, 6Phabricator, 7Puppet: phabricator puppet at labs broken - https://phabricator.wikimedia.org/T116442#1751011 (10Negative24) 5Open>3Resolved a:3Negative24 `role::phabricator::main` isn't the right Puppet class to use in Labs. I'm pretty sure the error had to do with the site variables. I walked @... [20:09:28] Luke081515: did you configure a web proxy? [20:09:47] Negative24: Which port? 8080 or 80? [20:09:54] Luke081515: 80 [20:10:00] ok, then I got one [20:10:25] then you should be able to reach the instance and configure a default admin account [20:12:48] 504 Gateway Time-out :( [20:12:59] Negative24: Not possible at the moment [20:13:10] Luke081515: what the web address [20:13:13] Or do I need to configure the domain first?
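The manual steps from this exchange, collected in order; the cd into /srv/phab/phabricator assumes libext and phabricator sit side by side under /srv/phab, as the chat implies:

    cd /srv/phab/libext
    sudo git clone https://github.com/wikimedia/phabricator-extensions-security.git security
    cd /srv/phab/phabricator
    sudo bin/storage upgrade    # answer yes at the prompts
    sudo puppet agent -tv       # rerun puppet; it should now finish cleanly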
[20:13:22] s/what's/what [20:13:23] I chose http://luke081515.wmflabs.org/ [20:14:04] make sure the web proxy is pointed at the right instance [20:14:12] https://wikitech.wikimedia.org/wiki/Special:NovaProxy [20:14:49] the instance is the right [20:14:58] *is correct [20:16:01] but it still times out [20:16:47] oh I know [20:16:54] security policies [20:17:53] Luke081515: what security groups is your instance in? (check Special:NovaInstance under Security groups) [20:18:29] Negative24: Do you mean Special:NovaSecurityGroup? [20:18:48] Luke081515: yeah [20:19:15] you need to add port 80 to one of the security groups your instance is in [20:19:51] Negative24: At the moment, I got these: https://phabricator.wikimedia.org/F2764696 [20:20:19] ok, only one security group. [20:20:34] click add rule next to the default rule [20:20:37] yeah, I don't change it yet, since the project exists [20:20:54] ok, which ports, and which protocol? [20:21:19] eh, why do you have a TaxonBot folder [20:21:58] Luke081515: beginning and end port range: 80, protocol: tcp, CIDR range: 0.0.0.0/0 [20:22:53] Negative24: And which source group? I only have default, so choose this? [20:23:26] Luke081515: no don't select anything there. That's for creating another type of rule [20:23:37] ok [20:23:46] I created the rule now [20:23:47] Luke081515: only fill in the section "Individual rule" [20:24:11] yeah, I've done that [20:24:40] hmm, it's still not working. At least it hit the instance [20:25:17] give me one sec [20:25:30] yeah, but I have now another error, an exception [20:25:36] This request asked for "/" on host "luke081515.wmflabs.org", but no site is configured which can serve this request. [20:26:01] yeah I saw that [20:30:10] Luke081515: ah ok. So puppet sets the phabricator url to .wmflabs.org so you're going to have to use rcm-3.wmflabs.org [20:30:36] ok, so I have to change the proxy config? [20:30:47] yes, create a new proxy [20:31:15] YEAH, it works! [20:31:17] https://rcm-2.wmflabs.org/ [20:31:59] is that the new phab? [20:32:07] you have two instances? rcm-2 and rcm-3? [20:32:16] yeah :) [20:32:37] ok [20:32:56] Negative24: so puppet imported the config to my instance, and I can change it? [20:33:27] Luke081515: No. It pulls the config from the puppetmaster. I don't think you can change it [20:33:36] hm [20:34:31] but changing local config works :) [20:34:46] which config? [20:34:58] the config at phabricator, like require login [20:34:58] are you trying to change the base-uri? [20:35:11] no, I first put the instance to login required ;) [20:35:24] and that works [20:35:56] but I got still mysql errors [20:36:01] yeah, those configs are in the database [20:36:06] ok [20:37:27] Negative24: Can you help me? I got MySQL May Run Slowly [20:37:30] Can I change that? [20:37:41] you can ignore those [20:37:48] just click ignore issue [20:37:51] ok [20:40:46] Negative24: Then, thank you much for your help :) [20:41:03] Luke081515: you're welcome [20:43:23] Negative24: is it possible to migrate the data of the old phab in this new one now? [20:43:52] doctaxon: I'm not sure. I believe that would require a database migration [20:43:53] tasks, diffs and gits, projects, workboards, all [20:44:07] yes, but how to [20:44:29] doctaxon: I guess you need access to the old database, and that's not possible [20:44:54] i think, Negative24 has a plan, right?
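Once the tcp/80 rule (0.0.0.0/0) is in the security group and the proxies point at the right instance, both the working case and the base-uri mismatch seen above can be checked from outside; the hostnames are the ones from the conversation and assume both proxy entries exist:

    # hostname matching the puppet-configured base-uri: answers with 200 or a redirect
    curl -sI http://rcm-2.wmflabs.org/ | head -n 1
    # mismatched hostname: Phabricator's "no site is configured" error
    curl -sI http://luke081515.wmflabs.org/ | head -n 1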
[20:45:25] I had no idea what you're trying to do [20:45:27] *have [20:45:51] i need the data of the old phab in this new one now to work on my tasks for example [20:46:15] there are many projects on the old [20:46:28] i need it on the new now [20:46:41] what old phab? phabricator.wikimedia.org? [20:47:06] https://luke081515.phoreplay.com [20:47:06] I don't even know what this is for [20:48:04] i am user on this phab and i need the new data migrated on this new phab of Luke081515 now [20:48:23] so ask Luke081515 to dump and import the databases? [20:48:51] oh ya, may I know, how to do this? [20:48:57] the problem is, that I can't access the old database, so I will have to do this manually, but it is not so much :) [20:49:20] it doesn't work manually [20:49:26] I does [20:49:29] *it [20:49:36] Luke081515: you're going to have to bring that up with phoreplay [20:49:46] ask the phoreplay people for a database dump, then, I guess? [20:49:55] or write a bot to migrate everything, but that's not going to be a lot of fun [20:50:05] no that won [20:50:12] *won't [20:50:45] valhallasw`cloud: Was flimport not a bot, which migrates from another phabricator? Maybe a script like this exists? [20:51:43] yes, this is a good question? [20:53:02] flimport was bz -> phab, right? [20:53:21] no, I think flimport was phab01.wmflabs -> phab.wm.o [20:53:27] but it did a pretty crappy job at it [20:53:40] all messages are 'from: flimport' etc: https://phabricator.wikimedia.org/T111 [20:53:56] it's probably mentioned somewhere in the migration docs [20:54:32] and rtimport, was this also a phab->phab bot? [20:54:50] rt->phab, as the name suggests. [20:55:10] and we've had a jira->phab bot, which also did a reasonable-but-not-very-good job [20:55:17] hm, ok. I don't know what rt is, but this is another question :) [20:55:22] Luke081515: any bot that you resurface is going to need serious overhaul work to get it to work with what you are trying to do [20:56:07] hm, ok, so it could be faster to do this manually, we have just 50 open tasks :) [20:56:48] and while you're at it, why not just move to the wikimedia phabricator instance...? [20:58:30] Luke081515: I'm pretty sure phoreplay would be happy to help. They already do database backups from what I see. [21:01:03] ok, then thanks for your help [21:01:12] and valhallasw has a good point [21:02:06] We got some private tasks, and mainly don't use English to communicate, and normally I would use a different configuration [21:03:56] Neither of those should really be an issue. There is no strict requirement for people to use English on wm-phab (although it's common) and private tasks can just be handled the same way security issues are. [21:04:00] But it's your choice in the end. [21:05:19] (and I would do some testing at my instance too). Thank you, for your help :) [21:10:26] 6Labs, 10wikitech.wikimedia.org, 7Regression: [Regression] Editing "Documentation" page for labs project not working - https://phabricator.wikimedia.org/T47519#1751057 (10Krinkle) 5Open>3Resolved a:3Krinkle [21:29:30] 6Labs, 6Phabricator, 5Patch-For-Review, 7Puppet: On labs phabricator references security extension even though it isn't present - https://phabricator.wikimedia.org/T104904#1751092 (10Negative24) 5Resolved>3Open Those two commits ensure the directory is created but doesn't install the security extension...
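A minimal sketch of the dump-and-import route suggested above, assuming the phoreplay side can provide a MySQL dump; Phabricator keeps its data in phabricator_* databases by default, and credentials/hosts are placeholders:

    # on the old instance: dump every phabricator_* database
    mysqldump --databases $(mysql -N -e "SHOW DATABASES LIKE 'phabricator\_%'") \
        | gzip > phab-dump.sql.gz
    # on the new instance: load the dump, then let Phabricator adjust the schema
    zcat phab-dump.sql.gz | sudo mysql
    cd /srv/phab/phabricator && sudo bin/storage upgrade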
[21:32:33] 6Labs, 6Phabricator, 5Patch-For-Review, 7Puppet: On labs phabricator references security extension even though it isn't present - https://phabricator.wikimedia.org/T104904#1751095 (10mmodell) I think we want the security extension in labs. At least until we deprecate it's use. I'm in the process of develop... [21:56:43] 6Labs, 10Beta-Cluster-Infrastructure: Figure out why wikipedia requires an extra DNS entry that the other sites do not - https://phabricator.wikimedia.org/T111661#1751131 (10Krenair) a:5Andrew>3Krenair [21:59:17] 6Labs, 10Beta-Cluster-Infrastructure: beta-hhvm.wmflabs.org? - https://phabricator.wikimedia.org/T111657#1751137 (10Krenair) Have also got cloudadmin and killed these entries in the labs global DNS config: ```beta-hhvm: beta-hhvm.wmflabs.org wikipedia-beta-hhvm: wikipedia.beta-hhvm.wmflabs.org``` [22:02:36] 6Labs, 10Beta-Cluster-Infrastructure: Figure out why wikipedia requires an extra DNS entry that the other sites do not - https://phabricator.wikimedia.org/T111661#1751138 (10Krenair) 5Open>3Resolved I got cloudadmin, removed the weird NovaAddress entry, and then removed the `wikipedia-beta: wikipedia.beta.... [22:27:48] 6Labs, 3Labs-Sprint-107, 3Labs-Sprint-108, 3Labs-Sprint-109: Switch NFS server back to labstore1001 - https://phabricator.wikimedia.org/T107038#1751148 (10yuvipanda) p:5Triage>3High Changing priority to 'high' since labstore1002 is increasingly having more hardware issues. [22:27:58] 6Labs, 10Labs-Team-Backlog, 3Labs-Sprint-107, 3Labs-Sprint-108, 3Labs-Sprint-109: Switch NFS server back to labstore1001 - https://phabricator.wikimedia.org/T107038#1751150 (10yuvipanda) [22:32:29] Negative24: Are you still here? [22:32:39] yea [22:32:59] The instance doesn't send any mails at the moment, can you help? [22:34:35] I don't think I figured that out. Did you check the settings? [22:35:14] yeah, it doesn't send welcome mails, and those should override the normal settings [22:36:27] overrides as in it should send a welcome email no matter what the settings are set to? [22:37:04] give me a few minutes to spin up another test instance [22:42:39] Luke081515: sorry, I have to go now but I'll send you a message if I find out something [22:43:41] Negative24: Thanks, but I guess I can resolve this on my own at this time :) [22:49:29] YuviPanda: ping [22:50:23] Krinkle: pong maybe [22:50:31] I saw the grafana link, I had no idea there was a grafana.wmflabs.org [22:50:34] no idea where it's hosted even [22:50:39] YuviPanda: yeah, it's okay [22:50:44] YuviPanda: I'm using production grafana now [22:50:49] I added labs graphite as data source [22:51:02] it's over HTTP client-side only anyway [22:51:54] It's coming along nicely but the overwhelming amount of garbage from no-longer existent instances is making it unusable [22:52:07] as it relies on property discovery essentially [22:52:21] kind of like how earlier versions of Nagf worked before I made that use the API instead [22:52:33] Can we bring the archiver back and/or do a purge once?
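For the welcome-mail problem above, Phabricator's CLI can exercise outbound mail independently of any UI setting; a sketch, with the recipient username as a placeholder:

    cd /srv/phab/phabricator
    echo "labs phab mail test" | sudo ./bin/mail send-test --to Luke081515
    sudo ./bin/mail list-outbound    # check whether the message was queued and sent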
[22:52:47] no idea where it's hosted even [22:52:50] -> https://phabricator.wikimedia.org/T115752 [22:52:51] I can't find the task for it [22:52:52] Krinkle: yeah godog bought the archiver back [22:52:56] but haven't merged it [22:53:03] I can probably merge and babysit it next week [22:53:31] It's coming along nicely but the overwhelming amount of garbage from no-longer existent instances is making it unusable [22:53:32] -> https://gerrit.wikimedia.org/r/#/c/248317/ [22:53:46] yup [22:53:47] that's the patch [22:54:04] I can possibly merge it now if it's blocking you [22:57:18] not blocking me [22:57:32] I made the bug because of all the silly alerts it creates in shinken [22:58:05] heh always fun when both Krenair and Krinkle are talking [22:58:10] at least there's no more krrrit-wm [22:58:27] (it's still on kubernetes, just not called krrrit-wm anymore) [23:05:12] What's the difference between bytes_avail and bytes_free in diamond diskspace? [23:05:21] https://github.com/BrightcoveOS/Diamond/blob/6ea198f3ebe58473467c6dc38b20e683c278192c/src/collectors/diskspace/diskspace.py#L211-L212 [23:05:25] weird stuff [23:06:52] avail tends to be smaller than free [23:08:02] Ah, Ubuntu forums to the rescue [23:10:13] 6Labs, 10Tool-Labs: Initial Deployment of Kubernetes to Tool Labs (Tracking) - https://phabricator.wikimedia.org/T111885#1751185 (10yuvipanda) [23:10:15] 6Labs, 10Tool-Labs, 3Labs-Sprint-114, 3Labs-Sprint-115, and 2 others: Add support to dynamicproxy for kubernetes based web services - https://phabricator.wikimedia.org/T111916#1751184 (10yuvipanda) 5Open>3Resolved [23:17:46] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1410 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [23:25:57] 6Labs, 10Tool-Labs: Enforce that containers from a user run with the uid assigned to that user - https://phabricator.wikimedia.org/T116504#1751201 (10yuvipanda) 3NEW [23:26:05] heh always fun when both Krenair and Krinkle are talking [23:26:14] Re: Username coloring! [23:57:46] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1410 is OK: OK: Less than 1.00% above the threshold [0.0]
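On bytes_avail vs bytes_free in the diamond diskspace collector: both come from statvfs, where 'free' counts every unused block and 'avail' excludes the blocks reserved for root (typically ~5% on ext filesystems), so avail is at most free and df's Avail column reports the smaller figure. A quick way to see both on a mount:

    stat -f -c 'free blocks: %f   available to non-root: %a   block size: %S' /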