[00:14:00] PROBLEM - Puppet staleness on tools-exec-15 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [43200.0]
[00:14:06] RECOVERY - Host tools-webproxy-jessie is UP: PING OK - Packet loss = 0%, RTA = 0.66 ms
[00:17:15] (03PS1) 10Tim Landscheidt: Fix lintian warning for jQuery minified source [labs/toollabs] - 10https://gerrit.wikimedia.org/r/193003 (https://phabricator.wikimedia.org/T90790)
[00:20:37] 6Labs: Upgrade labs cluster to Trusty - https://phabricator.wikimedia.org/T90821#1068320 (10Andrew) 3NEW a:3Andrew
[00:22:40] 6Labs: Upgrade Labs Compute nodes to Trusty - https://phabricator.wikimedia.org/T90822#1068332 (10Andrew) 3NEW a:3Andrew
[00:24:55] 6Labs: Upgrade Labs Compute nodes to Trusty - https://phabricator.wikimedia.org/T90822#1068342 (10Andrew) Before upgrade, virt1005 was running icehouse: ii nova-compute 1:2014.1.3-0ubuntu1~cloud0 post upgrade, still icehouse: ii nova-compute 1:2014.1.3-0ubuntu2...
[00:25:47] 6Labs: Upgrade Labs Compute nodes to Trusty - https://phabricator.wikimedia.org/T90822#1068349 (10Andrew) So, the only pain point here is the reboot, which necessitates a reboot of labs instances as well. Is it possible to avoid or defer this reboot?
[00:27:36] 6Labs: Upgrade labs network node to trusty - https://phabricator.wikimedia.org/T90823#1068354 (10Andrew) 3NEW a:3Andrew
[00:31:15] 6Labs: Upgrade labs controller to Trusty - https://phabricator.wikimedia.org/T90824#1068376 (10Andrew) 3NEW a:3Andrew
[00:31:52] 6Labs: Move wikitech web interface to a dedicated server - https://phabricator.wikimedia.org/T88300#1068386 (10Andrew) 5Open>3Resolved
[00:34:34] (03CR) 10Tim Landscheidt: [C: 032] Fix lintian warning for jQuery minified source [labs/toollabs] - 10https://gerrit.wikimedia.org/r/193003 (https://phabricator.wikimedia.org/T90790) (owner: 10Tim Landscheidt)
[00:37:46] (03CR) 10Tim Landscheidt: "recheck" [labs/toollabs] - 10https://gerrit.wikimedia.org/r/192765 (owner: 10Tim Landscheidt)
[00:38:15] (03PS2) 10Tim Landscheidt: Import htmlpurifier 4.5.0 [labs/toollabs] - 10https://gerrit.wikimedia.org/r/192765
[00:44:26] (03CR) 10Tim Landscheidt: [C: 032] Import htmlpurifier 4.5.0 [labs/toollabs] - 10https://gerrit.wikimedia.org/r/192765 (owner: 10Tim Landscheidt)
[01:50:25] 6Labs: Upgrade labs cluster to Trusty - https://phabricator.wikimedia.org/T90821#1068829 (10scfc) 5Open>3Resolved
[01:52:01] 6Labs: Upgrade labs cluster to Trusty - https://phabricator.wikimedia.org/T90821#1068832 (10scfc) 5Resolved>3Open (Sorry, wrong reference.)
[01:53:00] 10Tool-Labs, 5Patch-For-Review: Job labs-toollabs-debian-glue fails - https://phabricator.wikimedia.org/T90790#1068838 (10scfc) 5Open>3Resolved
[02:32:11] Hey.
[02:32:31] I was trying to use Magnus Manske's quick intersection tool, but I'm getting a "no webservice" error.
[02:32:47] I know absolutely nothing about how this type of stuff works. Can someone explain this to me? https://tools.wmflabs.org/quick-intersection/index.php?lang=en&project=wikipedia&cats=Canada+road+transport+articles+without+KML%0D%0AB-Class+Canada+road+transport+articles&ns=1&depth=12&max=30000&start=0&format=html&callback=
[02:33:53] TCN7JMinnesota: The tool is down. Give me a minute, I can give it a quick kick in the diodes.
[02:34:00] Okay
[02:34:37] Try now?
[02:34:52] Seems to be working. Thanks.
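The "quick kick" Coren applies to quick-intersection here is a webservice restart, run as the tool's service user on tools-login. A minimal sketch, assuming the 2015-era sudo convention for becoming a tool account (the tool name comes from the log above; the exact commands Coren ran are not shown):

    $ sudo su - tools.quick-intersection   # become the tool's service user (needs admin sudo rights)
    $ webservice restart                   # stop and resubmit the tool's lighttpd webservice job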
[03:04:05] andrewbogott: What are the options to increase main disk space for instances?
[03:04:10] https://phabricator.wikimedia.org/T74011#1068912
[03:04:18] To give us at least a little bit more headroom
[03:04:41] 2GB for /var and /tmp (of the 7GB disk, of which 5 is taken by the base CI slave image)
[03:04:44] is not enough
[03:05:01] it's enough for a day, but a few things do build up.
[03:05:08] Can we re-arrange that in some way?
[03:05:16] Krinkle: what instance?
[03:05:21] andrewbogott: all
[03:05:28] andrewbogott: in this context, integration slaves
[03:05:50] every day Jenkins depools instances as slaves because its healthcheck notices space left is < 1GB
[03:06:05] When you create a new instance you can select a size. All sizes start with a default 20G /
[03:06:08] causing me to have to manually get rid of a few random gerrit repos from the cache to clear up space.
[03:06:20] andrewbogott: Was that increased recently?
[03:06:29] but larger sizes have allocated unpartitioned space. That space can be partitioned and mounted with puppet classes such as labs::lvm::srv
[03:06:36] Yeah, we mount that on /mnt
[03:06:39] for jenkins workspaces
[03:06:41] 70GB
[03:06:47] But that doesn't apply to /var and /tmp
[03:07:07] Without that extra partition we couldn't even run a single mw job
[03:07:09] Yes, previously instances had a smaller / and /var and left more space to be custom allocated.
[03:07:33] andrewbogott: Is there a way to "upgrade" instances?
[03:07:39] Rebuilding will take me a day for all 12 instances
[03:07:47] no, instances cannot be resized.
[03:07:51] Could do, but would prefer not to, even if it's an unkosher method
[03:08:23] If it takes you hours to recreate an instance then you probably need to write some puppet code :(
[03:08:36] andrewbogott: I have
[03:08:36] https://wikitech.wikimedia.org/wiki/Nova_Resource:Integration/Setup
[03:08:39] I can help with that, if necessary.
[03:08:39] It's not good enough
[03:08:48] That's what I go through
[03:08:54] the manual steps at the end take 2 minutes
[03:08:57] the time is in the rest
[03:09:19] on average I take an hour
[03:09:21] sometimes 2
[03:09:34] It's terrible. I understand why, but it's really unworkable
[03:09:57] and then the race conditions sometimes, starting over again
[03:10:24] the main puppet run of course alone takes like 30 minutes, but nothing we can do about that
[03:10:31] Hm, probably you can have the local puppetmaster set project-wide using hiera (maybe, I've had spotty luck with hiera in labs)
[03:10:42] wait, which is the 'main' puppet run?
[03:10:49] None of these things take me more than two or three minutes.
[03:11:13] andrewbogott: after enabling the ci slave role
[03:11:19] which installs a shitload of packages and stuff
[03:11:23] ah, I see.
[03:11:32] Well, you don't have to create all your instances in sequence, at least.
[03:11:35] basically turning it into a glorified app server
[03:11:59] andrewbogott: Yeah, I'd need some additional quota to work with though. So that I can create a new pool while the current one stays live
[03:12:14] otherwise I'd have to do it rolling one by one
[03:12:19] that's easy. You'll probably need that anyway if you're making bigger instances.
[03:12:45] andrewbogott: Do precise instances still work properly if newly created?
[03:12:53] yes
[03:13:05] integration-slave1010 is m1.large like the others
[03:13:11] I'm not sure if yuvi changed the partition scheme for precise, though, you'd have to try and see.
[03:13:11] but created more recently
[03:13:21] it has the bigger main disk (18 GB it seems)
[03:13:22] nice
[03:13:37] ok, good.
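A minimal sketch of how that unpartitioned space gets claimed, assuming the labs_lvm Puppet module that backs classes like labs::lvm::srv; the resource title and parameter names here are illustrative, not confirmed against ops/puppet:

    include labs_lvm                 # creates a volume group over the instance's unpartitioned space

    labs_lvm::volume { 'extradisk':  # hypothetical resource title
        mountat => '/mnt',           # where the Jenkins workspaces get mounted, per the discussion above
    }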
[03:14:00] larger than m1.large is not needed. Just need the main disk to be upgraded to the new default, basically
[03:14:15] 4 cores, 80GB partition, 8GB RAM, that's all great
[03:14:42] ok
[03:14:57] this is project 'integration' right?
[03:15:00] Right now we have 9 instances. 4x precise, 5x trusty. All m1.large
[03:15:02] andrewbogott: yeah
[03:15:16] andrewbogott: could you add quota so that I can create 9 m1.large instances tomorrow?
[03:15:47] 9 slaves. There are more instances but those don't matter right now
[03:17:36] I can, as long as you promise to clean up afterwards :)
[03:18:21] * Coren attempts to understand which timezone(s) andrewbogott is synced with and fails.
[03:18:42] I'm in SF right now, will be in MSP next week :)
[03:18:43] andrewbogott: absolutely
[03:18:58] But I'm working a goofy schedule partly due to late-night outage on Monday
[03:19:17] andrewbogott: btw, here's a dump of current disk usage on slaves. https://gist.githubusercontent.com/Krinkle/be7fdc62001314ce128f/raw/ - anything in particular that stands out or you find odd?
[03:19:47] hm, how many cores in an m1.large?
[03:19:50] * andrewbogott looks for himself
[03:19:59] 4 I think
[03:20:50] Krinkle: ok, want to double-check the quotas?
[03:22:11] nothing looks especially strange about that disk usage
[03:23:20] andrewbogott: looks good (quota)
[03:23:55] Krinkle: let me know if you get stuck on anything tomorrow.
[03:24:01] Thanks
[03:30:57] andrewbogott: ah, and I see /var no longer has its own mount? I guess in the new instances this was folded back into the main disk
[03:31:16] cool
[03:31:25] right — we debated it long and hard and finally just making one big partition won out
[03:31:47] It's not my favorite since it means that filling up /var/log also fills up /. But, for most cases it should be better.
[03:42:52] Krinkle: If you want to be extra safe, you might want to put /var/log on its own partition. You can do it with the biglogs class
[03:44:26] * Krinkle goes and creates them now. Might as well. So they're ready to go.
[03:50:59] Coren: andrewbogott: Thanks!
[03:51:00] o/
[04:15:37] (03PS1) 10Tim Landscheidt: Disallow XoviBot in robots.txt [labs/toollabs] - 10https://gerrit.wikimedia.org/r/193043 (https://phabricator.wikimedia.org/T90636)
[04:19:32] (03CR) 10Tim Landscheidt: [C: 032] Disallow XoviBot in robots.txt [labs/toollabs] - 10https://gerrit.wikimedia.org/r/193043 (https://phabricator.wikimedia.org/T90636) (owner: 10Tim Landscheidt)
[04:20:35] 10Tool-Labs: Block xovibot user-agent - https://phabricator.wikimedia.org/T90636#1069013 (10scfc)
[04:27:07] (03PS1) 10Tim Landscheidt: Move htmlpurifier to the correct location [labs/toollabs] - 10https://gerrit.wikimedia.org/r/193044
[04:27:53] (03CR) 10Tim Landscheidt: [C: 032] "Doh!" [labs/toollabs] - 10https://gerrit.wikimedia.org/r/193044 (owner: 10Tim Landscheidt)
[04:51:03] 10Tool-Labs: Block xovibot user-agent - https://phabricator.wikimedia.org/T90636#1069030 (10scfc) According to http://www.xovibot.net/, they honour `robots.txt` so I added an entry there. There are some disgruntled reports on the web about them actually //not// reading `robots.txt`, but as they will not read it...
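The robots.txt entry merged in Gerrit change 193043 above amounts to the standard two-line stanza; the exact patch text lives in the change itself, so treat this as a sketch of the form:

    User-agent: XoviBot
    Disallow: /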
[04:56:19] (03PS2) 10Ricordisamoa: Initial commit [labs/tools/faces] - 10https://gerrit.wikimedia.org/r/192096
[05:05:28] (03CR) 10Ricordisamoa: "The basics of logging in and opting in/out are now working. Thank you!" [labs/tools/faces] - 10https://gerrit.wikimedia.org/r/192096 (owner: 10Ricordisamoa)
[05:08:50] (03CR) 10Ricordisamoa: "Still missing:" [labs/tools/faces] - 10https://gerrit.wikimedia.org/r/192096 (owner: 10Ricordisamoa)
[05:11:52] (03CR) 10Ricordisamoa: "PS:" [labs/tools/faces] - 10https://gerrit.wikimedia.org/r/192096 (owner: 10Ricordisamoa)
[05:58:02] 6Labs, 10Tool-Labs: Migrate tools-redis to a bigger instance - https://phabricator.wikimedia.org/T87107#1069081 (10scfc) a:3coren I must have missed the last line: ``` root@tools-redis:~# df -h Filesystem Size Used Avail Use% Mounted on /dev/vda1...
[05:58:13] 6Labs, 10Tool-Labs: Migrate tools-redis to a bigger instance - https://phabricator.wikimedia.org/T87107#1069083 (10scfc) 5Open>3Resolved
[06:00:14] 6Labs, 6operations: Wikitech registration for prior SVN user - https://phabricator.wikimedia.org/T90658#1069090 (10Dragons_flight) @chasemp: You mention changing where such requests are sent. Do I need to do anything else to ensure that this request is seen by the appropriate people? I'm guessing that it is...
[06:17:37] 6Labs, 6operations: Wikitech registration for prior SVN user - https://phabricator.wikimedia.org/T90658#1069103 (10chasemp) The request ended up in the right place, so no worries.
[06:28:57] tools.wmflabs.org index broken :(
[06:31:18] what do you mean?
[06:32:16] maybe it's something on my end, but just getting a blank page on https://tools.wmflabs.org/ right now
[06:32:40] oh, only the index, maybe proxy is down.
[06:32:48] yeah
[06:33:21] ^^ Coren YuviPanda|zzz when you're not Zzzing
[06:34:44] hey Eloquence
[06:34:46] looking into it
[06:34:56] (a bit strange, since other tools are resolving fine)
[06:35:07] thanks for poking
[06:54:58] Eloquence: back up
[06:56:17] \o/
[06:56:39] investigating underlying causes now.
[06:59:40] 6Labs, 10Tool-Labs, 7Tracking: Make toollabs reliable enough (Tracking) - https://phabricator.wikimedia.org/T90534#1069127 (10yuvipanda) >>! In T90534#1067877, @Pine wrote: > Just some comments about why this is important: > > * Tool Labs is supposed to be more stable than Beta Labs. Beta Labs and Tool Labs a...
[07:11:26] (03CR) 10Yuvipanda: "This caused the index to fail, since the code was still relying on the standalone generated php file which wasn't in the git repo. I've mo" [labs/toollabs] - 10https://gerrit.wikimedia.org/r/192765 (owner: 10Tim Landscheidt)
[07:18:02] 6Labs, 10Tool-Labs: Set up sufficient monitoring for toollabs - https://phabricator.wikimedia.org/T90845#1069140 (10yuvipanda) 3NEW
[07:18:34] 6Labs, 10Tool-Labs: Monitor toollabs home page to make sure it is up - https://phabricator.wikimedia.org/T90847#1069152 (10yuvipanda) 3NEW
[07:24:42] 6Labs, 10Tool-Labs: Set up sufficient monitoring for toollabs - https://phabricator.wikimedia.org/T90845#1069164 (10yuvipanda) See also T69879
[07:29:55] 6Labs, 10Tool-Labs: Set up sufficient monitoring for toollabs - https://phabricator.wikimedia.org/T90845#1069173 (10yuvipanda)
[07:29:56] 6Labs, 10Tool-Labs, 5Patch-For-Review: Monitor toollabs home page to make sure it is up - https://phabricator.wikimedia.org/T90847#1069170 (10yuvipanda) 5Open>3Resolved a:3yuvipanda http://shinken.wmflabs.org/service/toollabs/ToolLabs%20Home%20Page
[07:30:04] Eloquence: fixed and set up a monitor for it (http://shinken.wmflabs.org/service/toollabs/ToolLabs%20Home%20Page).
[07:30:07] thanks for reporting!
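Shinken consumes Nagios-style object definitions, so the home-page monitor linked above takes roughly this shape; the host name, interval, and check_http arguments are assumptions, not a copy of the real config:

    define service {
        host_name              toollabs
        service_description    ToolLabs Home Page
        check_command          check_http!-H tools.wmflabs.org -u / -S    # -S forces HTTPS
        check_interval         5
    }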
[07:32:25] 6Labs, 10Tool-Labs, 7Tracking: Make dumps syncing to Labs NFS reliable enough (Tracking) - https://phabricator.wikimedia.org/T90848#1069174 (10yuvipanda) 3NEW
[07:33:04] 10Wikimedia-Labs-Infrastructure: Create -latest alias for dumps - https://phabricator.wikimedia.org/T47646#1069181 (10yuvipanda)
[07:33:05] 10Tool-Labs: enwiki database dumps missing - https://phabricator.wikimedia.org/T89537#1069182 (10yuvipanda)
[07:33:06] 6Labs, 10Tool-Labs, 7Tracking: Make dumps syncing to Labs NFS reliable enough (Tracking) - https://phabricator.wikimedia.org/T90848#1069174 (10yuvipanda)
[07:33:21] 6Labs, 10Tool-Labs, 7Tracking: Make sure that toollabs can function fully even with one virt* host fully down - https://phabricator.wikimedia.org/T90542#1069183 (10scfc) The "limping along" bit is what I am afraid of :-). Planning for catastrophes is a hard problem. Instead of depending on that a fire will...
[07:33:26] 10Wikimedia-Labs-Infrastructure: Create -latest alias for dumps - https://phabricator.wikimedia.org/T47646#481243 (10yuvipanda)
[07:33:27] 6Labs, 10Tool-Labs, 7Tracking: Make toollabs reliable enough (Tracking) - https://phabricator.wikimedia.org/T90534#1069185 (10yuvipanda)
[07:33:28] 10Tool-Labs: enwiki database dumps missing - https://phabricator.wikimedia.org/T89537#1038760 (10yuvipanda)
[07:37:45] 6Labs, 10Tool-Labs, 7Tracking: Make sure that toollabs can function fully even with one virt* host fully down - https://phabricator.wikimedia.org/T90542#1069195 (10yuvipanda) @scfc: hmm, fair enough. I guess we should make sure to avoid any PR that says Tools is now 'fire-proof'. Allowing one virt host to di...
[07:40:49] 10Tool-Labs: enwiki database dumps missing - https://phabricator.wikimedia.org/T89537#1069197 (10yuvipanda) p:5High>3Unbreak! This is still the case, despite newer dumps being available on dumps.wikimedia.org. @Coren @arielglenn?
[08:17:46] 6Labs, 10Tool-Labs: Monitor bigbrother - https://phabricator.wikimedia.org/T90850#1069215 (10scfc) 3NEW
[08:23:10] (03CR) 10Tim Landscheidt: "Sure that you didn't witness some temporary glitch until I merged and deployed https://gerrit.wikimedia.org/r/#/c/193044/?" [labs/toollabs] - 10https://gerrit.wikimedia.org/r/192765 (owner: 10Tim Landscheidt)
[08:24:10] (03CR) 10Tim Landscheidt: "(In other words: This change was incorrect, but I fixed it in the other change.)" [labs/toollabs] - 10https://gerrit.wikimedia.org/r/192765 (owner: 10Tim Landscheidt)
[09:33:54] YuviPanda: Hi
[09:34:02] hi
[09:34:37] YuviPanda: during your vacation I have created documentation for the mwoffliner VMs - a first step towards the puppet scripts. https://wikitech.wikimedia.org/wiki/Nova_Resource_Talk:Mwoffliner
[09:35:19] YuviPanda: I have also created a new VM and I want to ask if you could again make the public IP/rsync conf on it (mwoffliner2) https://wikitech.wikimedia.org/wiki/Nova_Resource_Talk:Mwoffliner#Virtual_machine_creation
[09:35:22] nice!
[09:35:38] Kelson: so you want a public IP for rsync?
[09:35:59] YuviPanda: yes, AFAIK here are the 3 points I can not do myself: Ask admin to configure a public IP
[09:35:59] Ask admin to put a DNS resolver mwofflinerX.wmflabs.org
[09:35:59] Ask admin to allow rsync from the outside to that server
[09:36:24] there's part 0 which I need to do ('allocate an IP address to project mwoffliner')
[09:36:30] after that you can do all these things yourself
[09:36:35] look at 'manage addresses' in wikitech sidebar
[09:36:38] let me allocate an ip
[09:36:55] YuviPanda: that's great
[09:38:29] YuviPanda: hm, wikitech seems to be broken, currently I have nothing in my list of instances (should be 2 instances: mwoffliner1, mwoffliner2) https://wikitech.wikimedia.org/wiki/Special:NovaInstance
[09:39:00] Kelson: try logging out and back in
[09:39:46] YuviPanda: better thx
[09:41:35] !log mwoffliner increased public ip quota to 2 per request from Kelson
[09:41:41] Logged the message, Master
[09:41:45] Kelson: ^
[09:41:52] you should be able to allocate that how you please now.
[09:42:46] 6Labs, 10Tool-Labs, 10Beta-Cluster, 6operations: Investigate and do incident report for strange virt1012 issues - https://phabricator.wikimedia.org/T90566#1069340 (10hashar) 5Open>3Resolved The incident report has been written and published. Thank you @Andrew!
[09:42:47] 6Labs, 10Tool-Labs, 10Beta-Cluster, 6operations: A virt host seems down, taking down all instances with it - https://phabricator.wikimedia.org/T90530#1069342 (10hashar)
[09:44:28] YuviPanda: thx, IP stuff done. To allow rsync, I have to add a role?
[09:44:46] Kelson: 'manage security groups', open up the rsync port to whatever?
[09:45:57] YuviPanda: might it be that these rules are for the whole project, so if you have already configured this for mwoffliner1, it applies to mwoffliner2 automatically?
[09:48:04] Kelson: yup
[09:48:14] security groups apply project-wide
[09:48:40] YuviPanda: then, let us test it :))) Merci for your help.
[09:50:11] Kelson: yw!
[09:50:33] YuviPanda: with these two VMs, I'm able to create once a month a ZIM file for all wikivoyage, wikiquote, wikinews, wikibooks, wikisource, wikiversity and wikispecies. Now starts the big job with Wikipedia/wiktionary.
[09:50:42] Kelson: wooot! :)
[09:50:53] Kelson: you can add more VMs if that'll increase your speed
[09:51:37] :)
[09:52:47] GerardM-: you're aware of https://phabricator.wikimedia.org/T90534
[09:53:25] YuviPanda: IMO, the speed is ok. But I'll definitely need more VMs, in particular for the big Wikipedias. It also seems that I reach the 130GB limit with projects > 1.5 M articles. Currently trying to figure out if I can still optimize something before asking if there is any solution on the wmflabs side. Will probably come back later with this topic because WPEN is more than a little over this 1.5 M article limit.
[09:53:54] Kelson: right. you can also put huge files in /data/project, which has a *lot* more space (15TB atm), but is NFS so will be slower
[09:54:56] YuviPanda: great, I also have our server here in Switzerland... let's try to postpone this problem as much as possible :)
[09:55:02] cool :)
[09:56:00] 10Tool-Labs, 10Datasets-General-or-Unknown, 6operations: enwiki database dumps missing - https://phabricator.wikimedia.org/T89537#1069357 (10yuvipanda) a:5coren>3ArielGlenn
[09:56:15] thanks for working on this, Kelson! :)
[09:59:14] Kelson: an email to labs-l about your experiences would be nice, when you have the time :)
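Opening the rsync port can be done from wikitech's 'manage security groups' page or, equivalently, with the nova CLI of that era; the group name and source range below are illustrative:

    # allow rsyncd (873/tcp) into the project's default security group from anywhere
    nova secgroup-add-rule default tcp 873 873 0.0.0.0/0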
[10:00:34] GerardM-: Hi Gerard, remember this? https://meta.wikimedia.org/wiki/Offline_Projects/Library/Wikipublish ? In my mind, this is still in the pipeline, and finishing this work to get fresh ZIM files every month, backed in a WM datacenter, is a prerequisite. Bringing ZIM back into "collection" is another one.
[10:01:09] YuviPanda: OK, I will do it
[10:02:19] Kelson: ty
[10:02:34] Great
[10:03:02] it is important that WMF management knows about these things ... THIS is one way to realise the goal of the WMF
[10:04:37] 6Labs, 10Tool-Labs: Monitor bigbrother - https://phabricator.wikimedia.org/T90850#1069381 (10yuvipanda) Hmm, I wonder how exactly we would do this, since we don't really have active checks atm.
[10:04:39] GerardM-: You are right. I try. Last concrete effort in that direction: https://www.mediawiki.org/w/index.php?title=Wikimedia_MediaWiki_Core_Team%2FBacklog%2FImprove_dumps&diff=1417187&oldid=1415717
[10:22:47] YuviPanda: I have another question. When setting a hostname for the public IP, I put "mwoffliner2" and chose "wmflabs" in the select box. Unfortunately the result is "mwoffliner2.153.80.208.in-addr.arpa" instead of "mwoffliner2.wmflabs.org"
[11:01:01] 10Tool-Labs, 5Patch-For-Review: Setup (and document) an easy way to run nodejs based tools - https://phabricator.wikimedia.org/T1102#1069452 (10yuvipanda) 5Open>3Resolved a:3yuvipanda All done now!
[11:04:41] 10Tool-Labs: Make webservice2 default webservice implementation - https://phabricator.wikimedia.org/T90855#1069456 (10yuvipanda) 3NEW a:3yuvipanda
[11:05:43] Kelson: checking
[11:06:41] Kelson: ugh, looks like a bug :(
[11:06:46] Kelson: let me file a bug report
[11:07:32] YuviPanda: ok :( it worked for mwoffliner1, 2 months ago... strange.
[11:07:48] Kelson: yeah. is a bug in wikitech, probably
[11:09:19] 6Labs, 10Wikimedia-Labs-wikitech-interface: Wikitech doesn't allow to associate a hostname with a public ip address - https://phabricator.wikimedia.org/T90856#1069466 (10yuvipanda) 3NEW
[11:09:36] Kelson: ^
[11:09:48] YuviPanda: subscribed...
[11:10:02] Kelson: ty. the IP itself should work, however
[11:10:14] YuviPanda: yes, this is not "critical" to me.
[11:10:20] Kelson: right
[11:13:42] 6Labs, 10Wikimedia-Labs-wikitech-interface: Wikitech doesn't allow to associate a hostname with a public ip address - https://phabricator.wikimedia.org/T90856#1069490 (10yuvipanda) Confirmed to be not isolated to just mwoffliner
[11:52:59] !log deployment-prep created deployment-parsoid01-test to test patch to use role::parsoid on labs
[11:53:04] Logged the message, Master
[12:00:06] 6Labs: mwyaml backend isn't tried at all - https://phabricator.wikimedia.org/T90466#1069550 (10yuvipanda) Right, I think what happens is: if your project already has hiera files in ops/puppet and then you try to override them on wikitech, *that* doesn't show up.
[12:00:44] 6Labs, 6operations, 7Puppet: Values from mwyaml backend don't override values from ops/puppet yaml files in hieradata/labs - https://phabricator.wikimedia.org/T90466#1069551 (10yuvipanda)
[12:05:49] 6Labs, 6operations, 7Puppet: Values from mwyaml backend don't override values from ops/puppet yaml files in hieradata/labs - https://phabricator.wikimedia.org/T90466#1069563 (10Joe) Uhm, this is the case because you specifically asked for the values in puppet/hieradata to be authoritative :) We can change that...
[12:09:10] 6Labs, 6operations, 7Puppet: Values from mwyaml backend don't override values from ops/puppet yaml files in hieradata/labs - https://phabricator.wikimedia.org/T90466#1069564 (10yuvipanda) oh, right. I guess that's a miscommunication somewhere. I'd definitely want wikitech values to override ops/puppet values.
[12:09:21] 6Labs, 6operations, 7Puppet: Values from mwyaml backend don't override values from ops/puppet yaml files in hieradata/labs - https://phabricator.wikimedia.org/T90466#1069565 (10yuvipanda) p:5Triage>3Normal
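The override problem in T90466 comes down to Hiera backend ordering: the backend listed first wins for any key both define. A sketch of the arrangement yuvipanda is asking for, in Hiera 1 hiera.yaml syntax, assuming the custom mwyaml (wikitech) backend the task describes; the real layout in ops/puppet may differ:

    # list mwyaml first so per-project wikitech values override ops/puppet values
    :backends:
      - mwyaml   # values edited on wikitech
      - yaml     # hieradata/labs/... files in the ops/puppet repo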
[12:47:28] hi
[12:48:55] can someone help me set up a GUI client on Linux Mint in order to manage files on my tool?
[12:49:08] I'm a complete noob in this area
[12:54:48] 6Labs, 10Tool-Labs, 7Tracking: Make dumps syncing to Labs NFS reliable enough (Tracking) - https://phabricator.wikimedia.org/T90848#1069613 (10ArielGlenn)
[12:54:49] 10Tool-Labs, 10Datasets-General-or-Unknown, 6operations: enwiki database dumps missing - https://phabricator.wikimedia.org/T89537#1069611 (10ArielGlenn) 5Open>3Resolved The copy script is running regularly (as evidenced by the fact that it's picking up the other dumps). I guess that at some point the en...
[13:33:28] I have a question regarding the databases on tool labs
[13:34:43] there are tables like revision_userindex, which implies that there is some kind of index
[13:34:50] but
[13:34:54] MariaDB [wikidatawiki_p]> show index from revision_userindex;
[13:34:54] Empty set (0.00 sec)
[13:47:24] I need to start working during the night. The last four outages that were reported here happened while I was sleeping!
[13:47:39] * Coren grumbles.
[13:48:03] YuviPanda: Can you tell me what was up with tools.wmflabs.org?
[13:54:12] lbenedix: Those are views and mysql doesn't like to be clear about the underlying tables.
[13:55:27] lbenedix: https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Database#Tables_for_revision_or_logging_queries_involving_user_names_and_IDs explains their use
[14:32:06] 6Labs: Increase storage available to labs NFS server - https://phabricator.wikimedia.org/T85607#1069784 (10coren) There are issues with the Precise version of LVM on labstore1001 that (obviously) were not picked up by tests on labstore2001 - which is at Trusty. Investigation points to problems in the lvm2 packag...
[14:33:47] Coren:
[14:33:54] ?
[14:33:57] are you sure it's userspace issues?
[14:34:25] paravoid: Yes; the lack of a thin_check executable is something that was fixed upstream in 2.0.99
[14:34:36] okay
[14:34:56] I'm first checking if it's in backports though. Might be lucky.
[14:40:26] we should really plan for an upgrade at some point
[14:42:45] paravoid: Very much so. But honestly, given the number of labs downtimes in the last month because of random hardware failure, I'd really rather not hit the users again for a while.
[14:45:42] paravoid: That's odd. Are we special-handling security.ubuntu.com at the network level? I can hit it from labstore but not ports.ubuntu.com
[14:45:50] (No issue with either from my home network)
[14:48:16] Ah, nevermind, it's not backported.
[14:48:30] it's special-handled in the apt config
[14:48:41] yes, I agree to not hit the users again for a while
[14:48:53] but we should find a way where one machine is not a Labs SPOF :)
[14:49:51] I could update and set 1002 up though - switchover between the two is a 5m downtime or so.
[14:50:23] Then update 1001 and switch back.
[14:51:35] That'd be as close to painless as possible with NFS
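Back on lbenedix's earlier question: per the Help page Coren links, the _userindex views exist because the plain replica views are defined in a way that keeps MariaDB from using the user-related indexes, while revision_userindex (which omits revisions whose user fields are suppressed) can use them. A worked example against the replicas, with an illustrative user name:

    -- last 10 edits by a user; the _userindex view lets the (rev_user_text, rev_timestamp) index apply
    SELECT rev_id, rev_timestamp, rev_page
    FROM revision_userindex
    WHERE rev_user_text = 'ExampleUser'
    ORDER BY rev_timestamp DESC
    LIMIT 10;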
[16:48:26] I am having problems with one of my instances. I keep getting 503 errors when I try to use pywikibot to upload pages. Here is a pastebin: http://pastebin.com/XiUh5SsC
[16:48:51] This is the instance: http://drmf-beta.wmflabs.org/wiki/Main_Page
[16:49:02] Does anyone have any idea what might be going on?
[16:54:21] Howie_: What do you see? I see a mw main page with a unicorn logo
[16:57:23] @Coren. that's correct
[16:57:40] But I get the 503 errors when I try to use pywikibot. Did you look at the pastebin?
[16:58:12] Ah, no, sorry - didn't notice that.
[16:59:48] That said, api.php seems to work fine as well; but the pywikibot stack trace doesn't tell us what it actually tried to do.
[18:05:58] PROBLEM - Puppet failure on tools-webproxy-02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[18:12:04] PROBLEM - Puppet failure on tools-webproxy-01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[18:37:42] Could an admin restart the webservice for drtrigonbot on tools? It has been down for a day, but the maintainer hasn't been active in months. No hurry, thanks in advance
[18:40:39] 6Labs, 10Tool-Labs: Provide webservice bigbrotherrc for actively used tools - https://phabricator.wikimedia.org/T90569#1070713 (10Dzahn) here's another one: < sitic> Could a admin restart the webservice for drtrigonbot on tools? Has been down for a day, but the maintainer hasn't been active in months. No hurr...
[18:42:04] sitic: Should be up shortly.
[18:42:12] thanks :-)
[18:50:08] 10MediaWiki-extensions-OpenStackManager, 10Echo: No "View notifications" link in an empty Echo notification - https://phabricator.wikimedia.org/T55771#1070823 (10EBernhardson)
[18:50:30] 10MediaWiki-extensions-OpenStackManager, 10Echo: No "View notifications" link in an empty Echo notification - https://phabricator.wikimedia.org/T55771#1070826 (10EBernhardson) p:5Triage>3Normal This looks like a misconfiguration of the openstack notifications for echo.
[19:03:51] 10MediaWiki-extensions-OpenStackManager, 10Echo: No "View notifications" link in an empty Echo notification - https://phabricator.wikimedia.org/T55771#1070894 (10Quiddity)
[19:38:42] 6Labs, 10Tool-Labs: create bigbrotherrc for drtrigonbot - https://phabricator.wikimedia.org/T90912#1070988 (10valhallasw) 3NEW
[19:45:47] Hi. Can I request deletion of two of my now obsolete and empty tools? Thank you
[19:46:20] ebraminio: Sure, which are they?
[19:47:37] Coren: I've PMd their names to you. Thank you
[19:48:33] They be dead
[19:49:13] Coren: Thank you. Also I wanted to thank you and YuviPanda|zzz for the new node.js webservice2 capability, it is very nice to have :)
[19:49:35] ebraminio: The thanks goes mostly to Yuvi, he was the node.js champion. :-)
[19:57:54] Coren: PMd another one for deletion, thank you again :)
[19:57:56] 6Labs, 6operations, 7Puppet: Values from mwyaml backend don't override values from ops/puppet yaml files in hieradata/labs - https://phabricator.wikimedia.org/T90466#1071085 (10thcipriani) Since Ifa5c79bc10f1147518fbf352d75c9fcd3019bd72 I don't think mwyaml will read from Hiera: since the regex spe...
[20:10:39] hi. I'm trying to access a tool I built a few months ago at http://tools.wmflabs.org/grantmaking/geo-data-prototype.html but I keep getting "no webservice". I've confirmed that the html file is still there though.
[20:12:25] do we have news on beta labs having issues? I am getting intermittent "DB connection error: Can't connect to MySQL server on '10.68.16.193'" errors
[20:12:27] HaithamS: Is your webservice actually running? You can see if it is with 'qstat' and restart it with 'webservice start' if it isn't.
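Spelled out, Coren's suggestion is run from the tool's own account on tools-login; the job-name pattern in the comment is illustrative:

    $ qstat              # lists the tool's grid jobs; no lighttpd-<tool> entry means the webservice is down
    $ webservice start   # submits a fresh webservice job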
[20:12:50] and other intermittent weirdness over the last 24 hours or so
[20:13:03] chrismcmahon: First I hear of it. Do you know if Sean is aware?
[20:13:44] Coren: it's really flaky and not reproducible (yet)
[20:14:17] I'm guessing no one is aware but me and some Mobile folks who had some tests fail
[20:17:29] chrismcmahon: I see no smoking gun in the logs (although there are a LOT of warnings you* may want to look into) *someone from the team
[20:18:35] Coren: OK. something funky seems to be afoot, I had intermittent 503s from beta labs overnight and today at least 2 db errors, plus some connection weirdness that might be beta but might be SauceLabs, hard to tell
[20:18:53] so nothing specific, and just a few general failures
[20:19:15] 6Labs, 10Tool-Labs: Provide webservice bigbrotherrc for actively used tools - https://phabricator.wikimedia.org/T90569#1071177 (10Dzahn) 12:10 < HaithamS> hi. I'm trying to access a tool I built a few months ago at http://tools.wmflabs.org/grantmaking/geo-data-prototype.html but I keep getting no webservice. I...
[20:19:49] Thanks Coren, will check and let you know
[20:20:05] Hi, anyone know what I need to write in the lighttpd config if I want to make https://tools.wmflabs.org/otrsreports rather than just https://tools.wmflabs.org/otrsreports/ redirect to https://tools.wmflabs.org/otrsreports/index.html? Right now we're using
[20:20:06] url.rewrite-once = ( "^(.*)/$" => "$1/")
[20:20:06] url.rewrite-if-not-file = ( "^([^?]*)(\?.*)?$" => "$1.html$2" )
[20:20:10] You may be reaching capacity? I see no cries for help from the DB proper, though I see a database crash maybe some 9 days ago.
[20:22:04] pajz: I'm not sure what "^(.*)/$" => "$1/" was intended to do, but that's basically a noop.
[20:22:39] pajz: Because what that does is replace "anything followed by a slash" with "that thing, followed by a slash".
[20:23:56] url.rewrite-once = ( "^([^/]*)$" => "$1/" ) is probably what you wanted to do.
[20:24:23] Hm, more likely url.rewrite-once = ( "^/([^/]*)$" => "/$1/" )
[20:24:43] Basically I'd just like two things: .../otrsreports should redirect to /otrsreports/index.html, and every sub-page, whenever it ends in .html, should be accessible without the .html
[20:26:16] Do you think url.rewrite-once = ( "^/([^/]*)$" => "/$1/" ) will do the trick?
[20:27:30] It should.
[20:28:14] Thank you, I'll try :)
[20:30:05] Coren, it worked. Thanks!!
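For reference, the pair of rules pajz ended up with: the first adds the trailing slash to a bare tool path (so /otrsreports serves /otrsreports/ and thus index.html), the second maps extensionless paths onto their .html files:

    url.rewrite-once        = ( "^/([^/]*)$" => "/$1/" )
    url.rewrite-if-not-file = ( "^([^?]*)(\?.*)?$" => "$1.html$2" )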
[21:39:30] hi there, quick question for Coren or another labs admin: I'm helping someone set up their labs account so they can run queries against the replica dbs. She has created an account and is able to SSH in to tools-login. But I don't see a replica.my.cnf file (or .my.cnf file) in her home directory. Is there a step we missed? She is wikitech user:wubwubwub
[21:39:58] J-Mo: How long ago was the account created?
[21:40:33] she says 5 days ago, Coren
[21:41:02] J-Mo: Might be related to the outage earlier this week. Lemme check.
[21:41:58] thanks!
[21:43:22] J-Mo: There were indeed a couple of credentials that were missed in the shuffle. fix't
[21:47:49] got 'em! Thanks, Coren. I appreciate the quick turn-around.
[22:05:45] 6Labs, 10hardware-requests, 6operations: Buy at least one more virt server for eqiad - https://phabricator.wikimedia.org/T90783#1071699 (10RobH)
[22:05:51] 6Labs, 10hardware-requests, 6operations: New hp servers for labs - https://phabricator.wikimedia.org/T89752#1071702 (10RobH)
[22:19:08] 6Labs, 10hardware-requests, 6operations: New hp servers for labs - https://phabricator.wikimedia.org/T89752#1071759 (10RobH)
[22:19:18] 6Labs, 10hardware-requests, 6operations: Buy at least one more virt server for eqiad - https://phabricator.wikimedia.org/T90783#1071762 (10RobH)
[22:24:52] 6Labs, 10hardware-requests, 6operations: eqiad: (4) virt nodes - https://phabricator.wikimedia.org/T89752#1071798 (10RobH) 5Open>3stalled a:3RobH
[22:25:17] 6Labs, 10hardware-requests, 6operations: eqiad: (1) virt node - https://phabricator.wikimedia.org/T90783#1071803 (10RobH) 5Open>3stalled
[22:31:04] 10MediaWiki-extensions-OpenStackManager, 10Wikimedia-Labs-Infrastructure: Switch to using nova internal DNS - https://phabricator.wikimedia.org/T90289#1071840 (10Andrew) 5Open>3Invalid this looks like a dead end.
[22:35:39] andrewbogott: Is there some auto-updating principle as far as you know for self puppetmasters?
[22:36:00] 6Labs: Install OpenStack Horizon for production labs - https://phabricator.wikimedia.org/T87279#1071875 (10Andrew) californium is tapped for this job
[22:36:12] Krinkle: I believe there's a flag that makes the puppet master auto-update.
[22:36:20] This might be set up by Antoine, but lacking documentation I'm hoping maybe you know. It seems every hour at :16 the /var/lib/git/operations/puppet on integration-puppetmaster is updated.
[22:36:25] Sometimes causing conflicts naturally
[22:36:46] Wait, you /don't/ want it to update?
[22:36:46] we used to do it manually once a week
[22:36:56] Yeah, that seems sensible, no?
[22:37:15] I don't want random stuff to break or changes to get deployed
[22:37:35] especially with the practice of ops randomly breaking stuff that we don't use in prod. e.g. duplicate package definitions are hard to detect.
[22:37:53] of course I'm annoyed because I ran into it while building the instances earlier today.
[22:38:03] the instances wouldn't provision because of a syntax error (!) on the puppet master
[22:38:09] from an unresolved merge conflict
[22:38:16] <<< in the middle of the file
[22:38:18] dirty git status
[22:39:15] i don't have the bandwidth to have puppet break and then to have to update to latest ops/puppet upstream when things break.
[22:39:31] prefer it happen on my terms when I have the time for it
[22:39:38] unless it's maintained by ops entirely.
[22:40:17] There is a 'puppetmaster_autoupdate' flag
[22:40:22] It is set to 'true' for your puppetmaster
[22:40:28] I don't know when it was enabled or how. But it broke twice in one day today. And wasn't there 2 months ago.
[22:40:30] I didn't implement the flag or set it
[22:42:35] My experience is that those who plan to update manually never do, but maybe you're the exception :)
[22:43:16] well, we sure did lag behind several weeks.
[22:43:25] But that's what you get with 2 part-time people maintaining it.
[22:43:42] It only broke once over the past year when default values changed for ldap or something like that
[22:43:43] Why does it need to diverge from master puppet at all? Is that one of the things it tests?
[22:43:45] and we rebased immediately.
But I'd rather lag behind and break once a year when ops/labs drop support and we forget to update, than have it break continuously without notice.
[22:44:48] andrewbogott: We average about a dozen unmerged patches.
[22:45:00] Which sometimes get merged modified by the merger,
[22:45:02] as happened today
[22:45:09] so it conflicts
[22:45:30] and the autoupdate flag apparently is implemented in such a way that it leaves the rebase behind in a broken state
[22:45:49] But I know where to look now :)
[23:11:08] andrewbogott: Okay, almost done with provisioning. Doing a quick check and noticed that the newly created trusty instances show up weird in their memory statistics. https://tools.wmflabs.org/nagf/?project=integration#h_integration-slave1402_memory The existing precise and trusty nodes and the new precise nodes have like 6/10 memory used. But the new trusty nodes have 10/8 memory used?
[23:14:06] Krinkle: I'm not able to load that page (probably my dumb ISP's fault.) does running 'free' on the instance return sensible numbers?
[23:14:47] total used free shared buffers cached
[23:14:47] Mem: 8176828 7911944 264884 19868 214972 5505876
[23:15:46] for the new integration-slave1402. Never used. Freshly provisioned (finished 6 hours ago, idle since)
[23:16:42] That has to be something that puppet did to them — my new instances don't look like that at all
[23:17:27] andrewbogott: http://bit.ly/1Evzeo2
[23:18:04] 1010 is half like that with many days uptime.
[23:18:06] Strange.
[23:18:09] OK. Will keep looking
[23:20:17] Aye, puppet failing again. Tried 3 times in a row, but consistently failing: Warning: Error 400 on SERVER: cannot generate tempfile `/var/lib/puppet/yaml/node/i-000008ce.eqiad.wmflabs.yaml20150226-16857-1arpl16-9'
[23:20:23] Aye, syslog on puppetmaster shows disk is full
[23:20:23] ugh
[23:21:23] hi guys! I was wondering if anyone knows how to extract article coordinates from mediawiki databases in tool labs.
[23:22:13] /var/log/puppet/reports: 900M of 1.8GB /var/log
[23:22:16] on puppetmaster
[23:23:14] * Krinkle goes on removal hunt
[23:25:00] /var disk is full (1.8 of 1.9GB) - /var/log/puppet/reports is 1.1GB - purging
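The purge targets puppet's stored run reports, which grow without bound by default. A hedged sketch of the cleanup plus a guard against recurrence; the seven-day window is arbitrary, and whether any puppet.conf change was actually applied here isn't in the log:

    # prune report YAMLs older than a week (path taken from the log above)
    sudo find /var/log/puppet/reports -name '*.yaml' -mtime +7 -delete

    # or stop storing report files entirely, in puppet.conf on the master:
    #   [master]
    #   reports = log   # use the 'log' report processor instead of 'store'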