[01:02:47] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:06:47] RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 908295 bytes in 4.300 second response time [01:12:41] milimetric: What are you doing on druid1 atm? [01:12:55] milimetric: that tail is - literally - eating up all the NFS I/O bandwidth. [01:13:01] ugh [01:13:03] sorry! :( [01:13:08] it's this insane task [01:13:13] I have to tail -n +2 a file [01:13:16] and it's 35GB [01:13:32] because the first line and only the first line is causing this huge indexing job to fail (not my code) [01:13:54] is there a better way to copy in place on there? [01:14:07] it should be done relatively soon... [01:14:31] You might want to insert a pv with a -L limit? [01:14:32] Coren: ^ [01:14:47] But if it's over soon, then you should be okay. [01:15:10] !ping [01:15:11] !pong [01:15:40] Pro tip: when you have a very expensive I/O operation we (any opsen) can usually do it easier and faster directly on the storage server instead. [01:15:44] :/ I started it about 30 minutes ago, and if I'm right it should only take about 35 minutes [01:15:59] oh, i had no idea i had rights to log into the store server [01:16:02] milimetric: Well, the output file size should give you a good idea. [01:16:04] how do I do that [01:16:43] I'd ls -l the output file but i didn't wanna kill the NFS even more [01:16:58] labstore1002.eqiad.wmnet. The actual storage is under /srv [01:17:07] You'll know for the next time. :-) [01:17:50] thx and sorry, if it's not over in 10 minutes I'll kill it [01:17:57] When you do an expensive thing there, just making it 'ionice -c idle' means you won't kill I/O. [01:18:23] lag on tools, anyone? [01:18:32] milimetric: That's all right, but since it's user-impacting you need to do an outage report now. :-) [01:21:07] lol [01:21:08] Negative24: Yeah; should be over soon. [01:21:09] * Negative24 is reading scrollback [01:21:09] ok, my oh my - https://wikitech.wikimedia.org/wiki/Incident_documentation right? [01:21:10] Ayup. [01:21:10] (Just to clarify, /you/ couldn't have logged in to labstore1002 to do the fix, afaik, but anyone in ops can and would have been happy to help) [01:22:22] oh, I see [01:22:45] How much does tools rely on NFS? [01:22:55] Coren: I'm gonna kill it, mind helping me then? [01:23:16] milimetric: Sure thing. Just tell me what the file is and I'll start it now [01:24:20] k, it's /data/project/milimetric/1day.pageviews.2015.10.14.json and I just need everything but the 1st line in a file in the same folder, doesn't matter what you call it [01:24:21] Negative24: Quite a bit - it's the project that most depends on it. That said, the normal case (as we see now) is performance impact not brokenness. [01:24:21] sorry everyone, lag should be ok now, I killed it [01:24:22] ok, makes sense [01:24:26] milimetric: thanks [01:25:25] milimetric: It was about 2/3 done - good thing you killed it. [01:25:33] yes [01:25:39] i'll delete that partially done one [01:26:14] oh... does *that* use NFS bandwidth too? [01:26:43] milimetric: It'll have an impact, but very briefly. [01:27:25] * milimetric is spooked and sorry, will look at /data/project with fear from now on [01:28:06] Heh. No need for fear but just remember that manipulating 100s of G of data has a cost. :-) [01:28:23] Ugh [01:28:27] Got a page about nfs [01:28:35] YuviPanda: handled now. [01:28:36] A bit away from my laptop...
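A minimal sketch of the rate-limited, idle-priority version of this copy, combining the pv -L and ionice tips from the conversation above; the input path and the 20 MB/s cap come from the chat, and the output filename is arbitrary (the chat notes the name doesn't matter):

    # strip the first line of the 35GB file without saturating NFS
    # (class 3 is the 'idle' scheduling class, i.e. 'ionice -c idle')
    ionice -c 3 tail -n +2 /data/project/milimetric/1day.pageviews.2015.10.14.json \
        | pv -L 20m \
        > /data/project/milimetric/1day.pageviews.2015.10.14.noheader.json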
[01:28:38] Ah ok [01:28:39] Luke081515|away: oh I didn't know you had an away nick. I was trying to reply to your privmsg [01:28:40] Cook [01:28:42] Cool [01:28:47] I'll go away then. Thanks [01:28:53] np [01:29:18] milimetric: I'll wait for everything to catch up then start your tail [01:32:46] https://wikitech.wikimedia.org/wiki/Incident_documentation/20151023-LabsNFS-Lag [01:33:04] (feel free to change or ask me to put any other details in) [01:33:24] milimetric: I started the tail capped at 20MB/s; will take 28m [01:33:34] thx much [01:34:01] milimetric: Only thing missing is a timeline. [01:34:07] this is for the pageview API, btw, we're trying to figure out better ways to store the data in less space so we can give people more historical data [01:34:18] milimetric: (when you started the operation, when you killed it) [01:34:30] k [01:39:35] added the timeline and moved to 10-24 because UTC: https://wikitech.wikimedia.org/wiki/Incident_documentation/20151024-LabsNFS-Lag [01:42:34] milimetric: Thanks. [01:48:41] milimetric: It may take a while longer than pv told me at first since it's on Idle ionice - other users get priority. [01:59:01] is something broken again? (tools webservices don't respond to me) [01:59:28] hm, works again [02:01:07] but nfs seems to be full [02:05:01] Coren: ^? [02:05:01] "full"? [02:05:01] no bandwidth left [02:05:02] sorry [02:05:02] Yeah, looking. [02:05:03] can't log in, connections frozen [02:05:03] I actually see no NFS traffic at all - that seems wrong. [02:05:17] ... and it's back. [02:05:20] yup [02:05:24] * Coren boggles a bit. [02:07:21] strange [02:07:21] hmm [02:07:22] Coren: how do you check the NFS stats? [02:07:22] For a short while, there was absolutely no network traffic between labs and the storage server [02:07:22] Negative24: Depends. Right now I'm doing iptraf on the server proper, but my normal "keep an eye on thing" is in grafana [02:07:23] Ok. Grafana is pretty cool [02:07:23] Hm. Not seeing any outliers on traffic either, so nothing is going rogue that I can see. [02:09:15] Well, except someone making a tarball on tools-bastion but it's not /too/ bad. [02:18:05] It's doing it again. [02:18:17] yeah :\ [02:18:29] afaict, network connections just... stop. [02:18:36] Then start again. [02:18:43] someone plugging some cables? [02:19:29] I'm not seeing anything broken on the labstore; and the server's network appears to be up. [02:19:31] ... and, traffic returns. [02:19:35] * Coren looks closer at routing between labs and not-labs. [02:20:26] and problems again [02:20:36] and up [02:20:40] wow [02:21:03] gifti: I think what you're seeing is a side effect of exponential backoff though - when the connection returns it's going to pick up slowly. [02:21:27] gifti: Because the clients don't know it's not just network congestion [02:21:51] mhm [02:24:49] * Coren looks at labnet1002 with suspicion. [02:28:18] * Coren wonders why wm-bots is doing so much I/O [02:36:34] Augh. I hate heisenbugs. [02:51:14] so I wanted to just read that big file from /data/project now, to process it, now that it's ready [02:51:41] just making sure that's ok. There's no real way to pv it because it's not my code that's reading it [02:52:06] (because it looks like there are other problems going on) [02:53:31] I guess the process reading it will be rate-limited anyway, by how much it can process [02:53:47] so probably won't kill anything. Just ping me if I'm using up too much bandwidth again [02:54:09] kk [02:54:44] Reading, as a rule, is lighter than writing.
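On the "how do you check the NFS stats?" question above: besides iptraf on the server (as Coren describes) and grafana, a few standard client-side views would show this kind of stall; the interface name below is an assumption:

    # client side: NFS RPC/operation counters and per-mount detail
    nfsstat -c
    grep -A 3 'fstype nfs' /proc/self/mountstats
    # server side: live per-interface traffic, roughly what iptraf was used for here
    sudo iptraf-ng -d eth0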
[03:12:21] when it says 80Gb allocated storage here: https://wikitech.wikimedia.org/wiki/Nova_Resource:Druid1.analytics.eqiad.wmflabs [03:12:39] is that available somewhere I'm not looking? It looks like / has only 20G on druid1 [03:17:43] milimetric: Extra space is not partitioned by default. [03:18:14] oh, is there a way to get that? Then I can pv copy that big file and end my nightmares [03:18:21] it failed again, some weird characters in the middle of it [03:19:03] labs_lvm::volume [03:19:39] sweet [03:20:30] hm, I see role::labs::lvm::mnt and role::labs::lvm::srv [03:20:44] role::labs::lvm::srv is an easily-usable role you can simply apply that basically just mounts "everything else" on /srv [03:21:04] Both just use labs_lvm::volume with reasonable values. [03:21:34] and that won't be nfs-mounted, so I can copy to it and then do whatever I want on /srv? [03:22:06] Not NFS, though still virtual resource. So, "Yes, for various values of 'whatever I want'". :-) [03:22:13] Much more forgiving than NFS though. [03:22:34] cool, I'm just going to be piping that file through regexes until I get it to validate [03:22:43] thx [04:48:47] YuviPanda: The Grafana instance in labs is a bit weird (each user has their own org, dashboards made inside that are not publicly visible). But here's something to consider: http://i.imgur.com/teEggEj.png [04:49:20] The new Grafana 2.0 has an important feature for this to work: Repeatable panels, and repeatable rows - based on available values in a variable. [04:49:27] So it can repeat a row for each server in a project. [04:51:50] Once we can host this dashboard somewhere that doesn't require logging in, it'll feel like it's time to decommission my Nagf tool - https://tools.wmflabs.org/nagf/?project=cvn - which is now fully obsoleted by Grafana functionality with this latest release. [04:52:10] We can probably also use production grafana, since it just queries graphite. It can take multiple graphite installs. [08:18:46] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1410 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [08:58:45] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1410 is OK: OK: Less than 1.00% above the threshold [0.0] [09:46:37] PROBLEM - Puppet staleness on tools-k8s-bastion-01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0] [11:50:45] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1410 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [12:25:53] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1410 is OK: OK: Less than 1.00% above the threshold [0.0] [14:18:48] 10Wikibugs: Wrong message, wikibugs desplayed normaly users, if herald does an action - https://phabricator.wikimedia.org/T116477#1750540 (10Luke081515) 3NEW [16:43:18] 10Wikibugs, 6Phabricator: Wrong message, wikibugs desplayed normaly users, if herald does an action - https://phabricator.wikimedia.org/T116477#1750681 (10Legoktm) wikibugs is just echoing the information it received from Phabricator: ``` 2015-10-24 14:10:59,941 - wikibugs.wb2-phab - DEBUG - Processing {"class... [16:49:44] 10Wikibugs, 6Phabricator: Wrong message, wikibugs desplayed normaly users, if herald does an action - https://phabricator.wikimedia.org/T116477#1750692 (10valhallasw) redis2irc.log: ``` 2015-10-24 14:11:00,579 - irc3.wikibugs - DEBUG - > PRIVMSG #wikimedia-releng :10Beta-Cluster-Infrastructure, 10CirrusSearch,...
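A sketch of the role::labs::lvm::srv route discussed above, once the role has been applied via Special:NovaPuppetGroup: run puppet, confirm the extra allocation landed on /srv, then do the rate-limited copy off NFS onto local disk. Paths and the 20 MB/s cap are carried over from earlier in the log; the output name is arbitrary:

    sudo puppet agent -tv    # picks up the lvm role and mounts the leftover space on /srv
    df -h /srv               # should now show the local (non-NFS) volume
    ionice -c 3 pv -L 20m /data/project/milimetric/1day.pageviews.2015.10.14.json \
        > /srv/1day.pageviews.2015.10.14.json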
[16:49:51] * valhallasw`cloud eyes legoktm [16:49:52] :> [16:57:39] 6Labs, 10Tool-Labs: Provide centralized logging (logstash) - https://phabricator.wikimedia.org/T97861#1750706 (10intracer) [16:59:18] 10Wikibugs, 6Phabricator: Wrong message, wikibugs desplayed normaly users, if herald does an action - https://phabricator.wikimedia.org/T116477#1750711 (10Luke081515) I wonder, normaly herald don't does something, when someone just reads. [16:59:52] 6Labs, 10Tool-Labs: Cannot start java processes using the grid engine - https://phabricator.wikimedia.org/T69588#1750712 (10intracer) [17:02:13] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:06:49] RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 908345 bytes in 3.345 second response time [18:21:46] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1410 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [18:56:45] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1410 is OK: OK: Less than 1.00% above the threshold [0.0] [19:03:43] 6Labs, 6Phabricator: phabricator puppet at labs broken - https://phabricator.wikimedia.org/T116442#1750815 (10Negative24) Which project and node is this? [19:06:53] Luke081515|AFK: I'm going to try and replicate ^ [19:09:43] Negative24: the error tells you [19:10:02] It's rcm-3 in the rcm project [19:10:20] JohnFLewis: oh I see [19:10:40] nm then [19:12:27] I'm trained to just see what I want to see in an error (because I see them so much :P) [19:13:06] Mkay :P [19:28:38] 6Labs, 6Phabricator: phabricator puppet at labs broken - https://phabricator.wikimedia.org/T116442#1750835 (10Negative24) Hmm, it works for me on phab-t116442.performance.eqiad.wmflabs. @Luke081515, how did you configure the puppet group on the rcm project? [19:42:32] twentyafterfour: was there a ticket created for the phab security extension not being cloned by puppet on install? [19:52:50] 6Labs, 6Phabricator: phabricator puppet at labs broken - https://phabricator.wikimedia.org/T116442#1750851 (10Luke081515) I used role::phabricator::main, and then tried to get it with sudo puppet agent -tv. The whole console output is this: luke081515@rcm-2:~$ sudo puppet agent -tv Info: Retrieving plugin... [19:52:58] Negative24: I wrote it down at the task [19:53:56] 6Labs, 6Phabricator, 7Puppet: phabricator puppet at labs broken - https://phabricator.wikimedia.org/T116442#1750852 (10Krenair) [19:54:18] Luke081515: I was asking how not which class [19:54:54] Negative24: with sudo puppet agent -tv [19:55:32] Did you add it through Special:NovaPuppetGroup? [19:55:42] and then configure the VM? [19:56:04] Luke081515: and you want role::phabricator::labs not role::phabricator::main [19:56:04] Negative24: Yeah, that's true [19:56:10] hm, ok [19:56:14] try that [19:56:45] ok, wait a moment [19:57:58] Luke081515: it will fail again but I'll walk you from there [19:58:32] phabricator takes a few steps. The puppet role isn't a turnkey configure-er [19:59:21] ok, thanks. 
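A quick way to confirm which role actually got applied on the instance (role::phabricator::labs rather than role::phabricator::main); the classes.txt path is the usual puppet 3 agent state file and is an assumption for this setup:

    sudo puppet agent -tv
    grep -i phabricator /var/lib/puppet/state/classes.txt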
At the moment one error, and a lot of notices [20:00:38] Luke081515: post the error here [20:00:50] Error: /Stage[main]/Apache::Logrotate/Augeas[Apache2 logs]: Could not evaluate: Save failed with return code false, see debug [20:01:01] and two more: [20:01:03] 6Labs, 10wikitech.wikimedia.org: Interacting with table generated by {{#ask}} throws Uncaught TypeError - https://phabricator.wikimedia.org/T101642#1750860 (10Aklapper) [20:01:05] 6Labs, 6operations, 10wikitech.wikimedia.org: distribution upgrade for wikitech-static instance - https://phabricator.wikimedia.org/T94585#1750861 (10Aklapper) [20:01:06] Error: Could not start Service[phd]: Execution of '/usr/sbin/service phd start --force' returned 255: [20:01:07] 6Labs, 10wikitech.wikimedia.org: Project Bastion has service groups - https://phabricator.wikimedia.org/T64537#1750862 (10Aklapper) [20:01:09] 6Labs, 10wikitech.wikimedia.org: Upgrade SMW to 1.9 or later - https://phabricator.wikimedia.org/T62886#1750863 (10Aklapper) [20:01:09] Error: /Stage[main]/Phabricator/Service[phd]/ensure: change from stopped to running failed: Could not start Service[phd]: Execution of '/usr/sbin/service phd start --force' returned 255: [20:01:11] 6Labs, 10wikitech.wikimedia.org: Discrepancies in public IP instance lists between different wikitech UIs - https://phabricator.wikimedia.org/T62883#1750865 (10Aklapper) [20:01:13] 6Labs, 10wikitech.wikimedia.org: Can't reset password on wikitech (Unicode passwords not accepted), due to LDAP/opendj? - https://phabricator.wikimedia.org/T58114#1750867 (10Aklapper) [20:01:15] 6Labs, 10wikitech.wikimedia.org, 7Regression: [Regression] Editing "Documentation" page for labs project not working - https://phabricator.wikimedia.org/T47519#1750869 (10Aklapper) [20:01:16] that are all errors [20:01:17] 6Labs, 10wikitech.wikimedia.org: Hostnames assigned to floating IP persist when deallocated - https://phabricator.wikimedia.org/T55816#1750870 (10Aklapper) [20:01:19] 6Labs, 10wikitech.wikimedia.org: Cleanup and enable UserFunctions extension on wikitech - https://phabricator.wikimedia.org/T47455#1750873 (10Aklapper) [20:01:21] 6Labs, 10LabsDB-Auditor, 10Tool-Labs: Make labsdb views fully column-whitelist based - https://phabricator.wikimedia.org/T86218#1750874 (10Aklapper) [20:01:31] 6Labs, 10LabsDB-Auditor: Generate simple HTML interface to view reports generated by labsdb-auditor - https://phabricator.wikimedia.org/T78723#1750881 (10Aklapper) [20:01:49] 6Labs, 10Tool-Labs: deduplicate compute::general and compute::dedicated roles - https://phabricator.wikimedia.org/T99131#1750882 (10Aklapper) [20:01:51] 6Labs, 10Tool-Labs: Investigate alternatives to dedicated exec node for gifti's tools - https://phabricator.wikimedia.org/T99130#1750883 (10Aklapper) [20:01:53] 6Labs, 10Tool-Labs: Shinken: make sure 'Free space - all mounts' can handle no-longer-existing mounts - https://phabricator.wikimedia.org/T99077#1750884 (10Aklapper) [20:01:54] 6Labs, 10Tool-Labs: Fix shinken config to remove tools-webproxy-test - https://phabricator.wikimedia.org/T99073#1750885 (10Aklapper) [20:01:56] 6Labs, 10Tool-Labs, 5Patch-For-Review: Deprecate #no-default-php in .lighttpd.conf - https://phabricator.wikimedia.org/T98818#1750886 (10Aklapper) [20:01:58] 6Labs, 10Tool-Labs: Investigate / get rid of http debian repo from toollabs - https://phabricator.wikimedia.org/T98575#1750887 (10Aklapper) [20:02:00] 6Labs, 10Tool-Labs: Add shinken admin accounts for tools ops - https://phabricator.wikimedia.org/T97862#1750889 (10Aklapper) [20:02:02] 6Labs, 
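When phd fails to start with exit code 255 as above, the Phabricator CLI usually reports the underlying error more clearly than the init script; a sketch, assuming the /srv/phab/phabricator layout used later in this walkthrough (an uninitialized database, fixed below with bin/storage upgrade, is one common cause):

    cd /srv/phab/phabricator
    sudo ./bin/phd status    # list daemons and their state
    sudo ./bin/phd start     # prints the actual failure instead of just returning 255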
10Tool-Labs: Convert tomcat-starter to python - https://phabricator.wikimedia.org/T98442#1750888 (10Aklapper) [20:02:06] 6Labs, 10Tool-Labs, 3ToolLabs-Goals-Q4: Make tools-mail redundant - https://phabricator.wikimedia.org/T96967#1750897 (10Aklapper) [20:02:08] 6Labs, 10Tool-Labs: toolsbeta: set up puppet-compiler / temporary-apply - https://phabricator.wikimedia.org/T97081#1750896 (10Aklapper) [20:02:31] wow the batch editing Andre [20:02:50] Luke081515: you can ignore the apache error [20:02:56] yeah, I found the batch edit job: 78 tasks [20:03:09] but the phd service is what I needed [20:03:13] 6Labs, 10Tool-Labs, 10Internet-Archive: Document how to install Python modules in a tool's home directory/virtual environment - https://phabricator.wikimedia.org/T63824#1750973 (10Aklapper) [20:03:13] 6Labs, 10Tool-Labs, 7Tracking: Toolserver migration to Tools (tracking) - https://phabricator.wikimedia.org/T60788#1750975 (10Aklapper) [20:03:15] 6Labs, 10Tool-Labs, 7Documentation: add basic expectations management to docs - https://phabricator.wikimedia.org/T56701#1750984 (10Aklapper) [20:03:16] ok [20:03:17] 6Labs, 10Tool-Labs, 7Documentation: Wikimedia Labs system admin (sysadmin) documentation sucks - https://phabricator.wikimedia.org/T57946#1750982 (10Aklapper) [20:03:19] 6Labs, 10Tool-Labs: Status page should automatically refresh data - https://phabricator.wikimedia.org/T54275#1750983 (10Aklapper) [20:03:21] 6Labs, 10Tool-Labs: Clean up list of projects on Tool Labs home page and add Tomcat tools - https://phabricator.wikimedia.org/T51937#1750987 (10Aklapper) [20:04:10] Luke081515: cd to /srv/phab/libext and run `sudo git clone https://github.com/wikimedia/phabricator-extensions-security.git security` [20:04:10] hm, ok, so what's the next step? [20:04:17] ok, wait a moment [20:04:42] 6Labs, 10LabsDB-Auditor, 10MediaWiki-extensions-OpenStackManager, 10Tool-Labs, and 7 others: Labs' Phabricator tags overhaul - https://phabricator.wikimedia.org/T89270#1750993 (10Aklapper) 5Open>3Resolved a:3Aklapper >>! In T89270#1374717, @Aklapper wrote: > * This will **NOT** retroactively add exis... [20:04:42] done [20:05:06] Luke081515: now cd to phabricator/ and run `sudo bin/storage upgrade` and answer yes to the prompts [20:06:08] Negative24: not that I know of [20:06:30] Negative24: Done [20:06:42] twentyafterfour: ok because we have another user that I'm walking through the phab lab install process ^ [20:07:02] Luke081515: then rerun puppet and you should be good to go [20:08:32] 6Labs, 10wikitech.wikimedia.org: Hostnames assigned to floating IP persist when deallocated - https://phabricator.wikimedia.org/T55816#1751008 (10Krenair) Is {T115194} an instance of this bug? [20:08:37] Negative24: Notice: Finished catalog run in 16.54 seconds [20:08:40] successful [20:08:57] 6Labs, 6Phabricator, 7Puppet: phabricator puppet at labs broken - https://phabricator.wikimedia.org/T116442#1751011 (10Negative24) 5Open>3Resolved a:3Negative24 `role::phabricator::main` isn't the right Puppet class to use in Labs. I'm pretty sure the error had to do with the site variables. I walked @... [20:09:28] Luke081515: did you configure a web proxy? [20:09:47] Negative24: Which port? 8080 or 80? [20:09:54] Luke081515: 80 [20:10:00] ok, then I got one [20:10:25] then you should be able to reach the instance and configure a default admin account [20:12:48] 504 Gateway Time-out :( [20:12:59] Negative24: Not possible at the moment [20:13:10] Luke081515: what the web address [20:13:13] Or do I need to configure the domain first?
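The manual steps from this exchange, collected in order; the cd into /srv/phab/phabricator assumes libext and phabricator sit side by side under /srv/phab, as the chat implies:

    cd /srv/phab/libext
    sudo git clone https://github.com/wikimedia/phabricator-extensions-security.git security
    cd /srv/phab/phabricator
    sudo bin/storage upgrade    # answer yes at the prompts
    sudo puppet agent -tv       # rerun puppet; it should now finish cleanly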
[20:13:22] s/what's/what [20:13:23] I chose http://luke081515.wmflabs.org/ [20:14:04] make sure the web proxy is pointed at the right instance [20:14:12] https://wikitech.wikimedia.org/wiki/Special:NovaProxy [20:14:49] the instance is the right [20:14:58] *is correct [20:16:01] but it still times out [20:16:47] oh I know [20:16:54] security policies [20:17:53] Luke081515: what security groups is your instance in? (check Special:NovaInstance under Security groups) [20:18:29] Negative24: Do you mean Special:NovaSecurityGroup? [20:18:48] Luke081515: yeah [20:19:15] you need to add port 80 to one of the security groups your instance is in [20:19:51] Negative24: At the moment, I got these: https://phabricator.wikimedia.org/F2764696 [20:20:19] ok, only one security group. [20:20:34] click add rule next to the default rule [20:20:37] yeah, I don't change it yet, since the project exists [20:20:54] ok, which ports, and which protocol? [20:21:19] eh, why do you have a TaxonBot folder [20:21:58] Luke081515: beginning and end port range: 80, protocol: tcp, CIDR range: 0.0.0.0/0 [20:22:53] Negative24: And which source group? I only have default, so choose this? [20:23:26] Luke081515: no don't select anything there. That's for creating another type of rule [20:23:37] ok [20:23:46] I created the rule now [20:23:47] Luke081515: only fill in the section "Individual rule" [20:24:11] yeah, I've done that [20:24:40] hmm, it's still not working. At least it hit the instance [20:25:17] give me one sec [20:25:30] yeah, but I have now another error, an exception [20:25:36] This request asked for "/" on host "luke081515.wmflabs.org", but no site is configured which can serve this request. [20:26:01] yeah I saw that [20:30:10] Luke081515: ah ok. So puppet sets the phabricator url to .wmflabs.org so you're going to have to use rcm-3.wmflabs.org [20:30:36] ok, so I have to change the proxy config? [20:30:47] yes, create a new proxy [20:31:15] YEAH, it works! [20:31:17] https://rcm-2.wmflabs.org/ [20:31:59] is that the new phab? [20:32:07] you have two instances? rcm-2 and rcm-3? [20:32:16] yeah :) [20:32:37] ok [20:32:56] Negative24: so puppet imported the config to my instance, and I can change it? [20:33:27] Luke081515: No. It pulls the config from the puppetmaster. I don't think you can change it [20:33:36] hm [20:34:31] but changing local config works :) [20:34:46] which config? [20:34:58] the config at phabricator, like require login [20:34:58] are you trying to change the base-uri? [20:35:11] no, I first put the instance to login required ;) [20:35:24] and that works [20:35:56] but I got still mysql errors [20:36:01] yeah, those configs are in the database [20:36:06] ok [20:37:27] Negative24: Can you help me? I got MySQL May Run Slowly [20:37:30] Can I change that? [20:37:41] you can ignore those [20:37:48] just click ignore issue [20:37:51] ok [20:40:46] Negative24: Then, thank you much for your help :) [20:41:03] Luke081515: you're welcome [20:43:23] Negative24: is it possible to migrate the data of the old phab in this new one now? [20:43:52] doctaxon: I'm not sure. I believe that would require a database migration [20:43:53] tasks, diffs and gits, projects, workboards, all [20:44:07] yes, but how to [20:44:29] doctaxon: I guess you need access to the old database, and that's not possible [20:44:54] i think, Negative24 has a plan, right?
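Once the tcp/80 rule (0.0.0.0/0) is in the security group and the proxies point at the right instance, both the working case and the base-uri mismatch seen above can be checked from outside; the hostnames are the ones from the conversation and assume both proxy entries exist:

    # hostname matching the puppet-configured base-uri: answers with 200 or a redirect
    curl -sI http://rcm-2.wmflabs.org/ | head -n 1
    # mismatched hostname: Phabricator's "no site is configured" error
    curl -sI http://luke081515.wmflabs.org/ | head -n 1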
[20:45:25] I had no idea what you're trying to do [20:45:27] *have [20:45:51] i need the data of the old phab in this new one now to work on my tasks for example [20:46:15] there are many projects on the old [20:46:28] i need it on the new now [20:46:41] what old phab? phabricator.wikimedia.org? [20:47:06] https://luke081515.phoreplay.com [20:47:06] I don't even know what this is for [20:48:04] i am user on this phab and i need the new data migrated on this new phab of Luke081515 now [20:48:23] so ask Luke081515 to dump and import the databases? [20:48:51] oh ya, may I know, how to do this? [20:48:57] the problem is, that I can't access the old database, so I will have to do this manually, but it is not so much :) [20:49:20] it doesn't work manually [20:49:26] I does [20:49:29] *it [20:49:36] Luke081515: you're going to have to bring that up with phoreplay [20:49:46] ask the phoreplay people for a database dump, then, I guess? [20:49:55] or write a bot to migrate everything, but that's not going to be a lot of fun [20:50:05] no that won [20:50:12] *won't [20:50:45] valhallasw`cloud: Was flimport not a bot, which migrates from another phabricator? Maybe a script like this exists? [20:51:43] yes, this is a good question? [20:53:02] flimport was bz -> phab, right? [20:53:21] no, I think flimport was phab01.wmflabs -> phab.wm.o [20:53:27] but it did a pretty crappy job at it [20:53:40] all messages are 'from: flimport' etc: https://phabricator.wikimedia.org/T111 [20:53:56] it's probably mentioned somewhere in the migration docs [20:54:32] and rtimport, was this also a phab->phab bot? [20:54:50] rt->phab, as the name suggests. [20:55:10] and we've had a jira->phab bot, which also did a reasonable-but-not-very-good job [20:55:17] hm, ok. I don't know what rt is, but this is another question :) [20:55:22] Luke081515: any bot that you resurface is going to need serious overhaul work to get it to work with what you are trying to do [20:56:07] hm, ok, so it could be faster to do this manually, we have just 50 open tasks :) [20:56:48] and while you're at it, why not just move to the wikimedia phabricator instance...? [20:58:30] Luke081515: I'm pretty sure phoreplay would be happy to help. They already do database backups from what I see. [21:01:03] ok, then thanks for your help [21:01:12] and valhallasw has a good point [21:02:06] We got some private tasks, and mainly don't use English to communicate, and normally I would use a different configuration [21:03:56] Neither of those should really be an issue. There is no strict requirement for people to use English on wm-phab (although it's common) and private tasks can just be handled the same way security issues are. [21:04:00] But it's your choice in the end. [21:05:19] (and I would do some testing at my instance too). Thank you, for your help :) [21:10:26] 6Labs, 10wikitech.wikimedia.org, 7Regression: [Regression] Editing "Documentation" page for labs project not working - https://phabricator.wikimedia.org/T47519#1751057 (10Krinkle) 5Open>3Resolved a:3Krinkle [21:29:30] 6Labs, 6Phabricator, 5Patch-For-Review, 7Puppet: On labs phabricator references security extension even though it isn't present - https://phabricator.wikimedia.org/T104904#1751092 (10Negative24) 5Resolved>3Open Those two commits ensure the directory is created but doesn't install the security extension...
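A minimal sketch of the dump-and-import route suggested above, assuming the phoreplay side can provide a MySQL dump; Phabricator keeps its data in phabricator_* databases by default, and credentials/hosts are placeholders:

    # on the old instance: dump every phabricator_* database
    mysqldump --databases $(mysql -N -e "SHOW DATABASES LIKE 'phabricator\_%'") \
        | gzip > phab-dump.sql.gz
    # on the new instance: load the dump, then let Phabricator adjust the schema
    zcat phab-dump.sql.gz | sudo mysql
    cd /srv/phab/phabricator && sudo bin/storage upgrade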
[21:32:33] 6Labs, 6Phabricator, 5Patch-For-Review, 7Puppet: On labs phabricator references security extension even though it isn't present - https://phabricator.wikimedia.org/T104904#1751095 (10mmodell) I think we want the security extension in labs. At least until we deprecate it's use. I'm in the process of develop... [21:56:43] 6Labs, 10Beta-Cluster-Infrastructure: Figure out why wikipedia requires an extra DNS entry that the other sites do not - https://phabricator.wikimedia.org/T111661#1751131 (10Krenair) a:5Andrew>3Krenair [21:59:17] 6Labs, 10Beta-Cluster-Infrastructure: beta-hhvm.wmflabs.org? - https://phabricator.wikimedia.org/T111657#1751137 (10Krenair) Have also got cloudadmin and killed these entries in the labs global DNS config: ```beta-hhvm: beta-hhvm.wmflabs.org wikipedia-beta-hhvm: wikipedia.beta-hhvm.wmflabs.org``` [22:02:36] 6Labs, 10Beta-Cluster-Infrastructure: Figure out why wikipedia requires an extra DNS entry that the other sites do not - https://phabricator.wikimedia.org/T111661#1751138 (10Krenair) 5Open>3Resolved I got cloudadmin, removed the weird NovaAddress entry, and then removed the `wikipedia-beta: wikipedia.beta.... [22:27:48] 6Labs, 3Labs-Sprint-107, 3Labs-Sprint-108, 3Labs-Sprint-109: Switch NFS server back to labstore1001 - https://phabricator.wikimedia.org/T107038#1751148 (10yuvipanda) p:5Triage>3High Changing priority to 'high' since labstore1002 is increasingly having more hardware issues. [22:27:58] 6Labs, 10Labs-Team-Backlog, 3Labs-Sprint-107, 3Labs-Sprint-108, 3Labs-Sprint-109: Switch NFS server back to labstore1001 - https://phabricator.wikimedia.org/T107038#1751150 (10yuvipanda) [22:32:29] Negative24: Are you still here? [22:32:39] yea [22:32:59] The instance doesn't send any mails at the moment, can you help? [22:34:35] I don't think I figured that out. Did you check the settings? [22:35:14] yeah, it doesn't send welcome mails, and those should override the normal settings [22:36:27] overrides as in it should send a welcome email no matter what the settings are set to? [22:37:04] give me a few minutes to spin up another test instance [22:42:39] Luke081515: sorry, I have to go now but I'll send you a message if I find out something [22:43:41] Negative24: Thanks, but I guess I can resolve this on my own at this time :) [22:49:29] YuviPanda: ping [22:50:23] Krinkle: pong maybe [22:50:31] I saw the grafana link, I had no idea there was a grafana.wmflabs.org [22:50:34] no idea where it's hosted even [22:50:39] YuviPanda: yeah, it's okay [22:50:44] YuviPanda: I'm using production grafana now [22:50:49] I added labs graphite as data source [22:51:02] it's over HTTP client-side only anyway [22:51:54] It's coming along nicely but the overwhelming amount of garbage from no-longer existent instances is making it unusable [22:52:07] as it relies on property discovery essentially [22:52:21] kind of like how earlier versions of Nagf worked before I made that use the API instead [22:52:33] Can we bring the archiver back and/or do a purge once?
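For the welcome-mail problem above, Phabricator's CLI can exercise outbound mail independently of any UI setting; a sketch, with the recipient username as a placeholder:

    cd /srv/phab/phabricator
    echo "labs phab mail test" | sudo ./bin/mail send-test --to Luke081515
    sudo ./bin/mail list-outbound    # check whether the message was queued and sent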
[22:52:47] no idea where it's hosted even [22:52:50] -> https://phabricator.wikimedia.org/T115752 [22:52:51] I can't find the task for it [22:52:52] Krinkle: yeah godog bought the archiver back [22:52:56] but haven't merged it [22:53:03] I can probably merge and babysit it next week [22:53:31] It's coming along nicely but the overwhelming amount of garbage from no-longer existent instances is making it unusable [22:53:32] -> https://gerrit.wikimedia.org/r/#/c/248317/ [22:53:46] yup [22:53:47] that's the patch [22:54:04] I can possibly merge it now if it's blocking you [22:57:18] not blocking me [22:57:32] I made the bug because of all the silly alerts it creates in shinken [22:58:05] heh always fun when both Krenair and Krinkle are talking [22:58:10] at least there's no more krrrit-wm [22:58:27] (it's still on kubernetes, just not called krrrit-wm anymore) [23:05:12] What's the difference between bytes_avail and bytes_free in diamond diskspace? [23:05:21] https://github.com/BrightcoveOS/Diamond/blob/6ea198f3ebe58473467c6dc38b20e683c278192c/src/collectors/diskspace/diskspace.py#L211-L212 [23:05:25] weird stuff [23:06:52] avail tends to be smaller than free [23:08:02] Ah, Ubuntu forums to the rescue [23:10:13] 6Labs, 10Tool-Labs: Initial Deployment of Kubernetes to Tool Labs (Tracking) - https://phabricator.wikimedia.org/T111885#1751185 (10yuvipanda) [23:10:15] 6Labs, 10Tool-Labs, 3Labs-Sprint-114, 3Labs-Sprint-115, and 2 others: Add support to dynamicproxy for kubernetes based web services - https://phabricator.wikimedia.org/T111916#1751184 (10yuvipanda) 5Open>3Resolved [23:17:46] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1410 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [23:25:57] 6Labs, 10Tool-Labs: Enforce that containers from a user run with the uid assigned to that user - https://phabricator.wikimedia.org/T116504#1751201 (10yuvipanda) 3NEW [23:26:05] heh always fun when both Krenair and Krinkle are talking [23:26:14] Re: Username coloring! [23:57:46] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1410 is OK: OK: Less than 1.00% above the threshold [0.0]
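On bytes_avail vs bytes_free in the diamond diskspace collector: both come from statvfs, where 'free' counts every unused block and 'avail' excludes the blocks reserved for root (typically ~5% on ext filesystems), so avail is at most free and df's Avail column reports the smaller figure. A quick way to see both on a mount:

    stat -f -c 'free blocks: %f   available to non-root: %a   block size: %S' /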