[00:01:51] (PS36) Ricordisamoa: Initial commit [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296
[00:06:51] (CR) Ricordisamoa: "PS36 avoids overwriting the 'next' builtin function in app.py" [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296 (owner: Ricordisamoa)
[00:20:58] (PS37) Ricordisamoa: Initial commit [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296
[00:23:50] (CR) Ricordisamoa: "PS37 uses url_for() also for the login link" [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296 (owner: Ricordisamoa)
[00:30:40] (PS38) Ricordisamoa: Initial commit [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296
[00:34:48] (CR) Ricordisamoa: "PS38 uses url_for() for 3 other links" [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296 (owner: Ricordisamoa)
[01:32:27] UBN: Can someone take a look for the LDAP? He don't let my set up working instnaces
[01:35:24] Labs, Labs-Infrastructure: LDAP is not working - https://phabricator.wikimedia.org/T122757#1913815 (Luke081515)
[03:03:55] (PS39) Ricordisamoa: Initial commit [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296
[03:11:04] (CR) Ricordisamoa: "PS39 makes formatters singletons instead of static classes" [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296 (owner: Ricordisamoa)
[03:32:08] (PS40) Ricordisamoa: Initial commit [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296
[03:34:15] (CR) Ricordisamoa: "PS40 removes unused quote_plus import" [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296 (owner: Ricordisamoa)
[03:36:11] (PS41) Ricordisamoa: Initial commit [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296
[03:37:54] (CR) Ricordisamoa: "PS41 makes StringFormatter extend ValueFormatter directly instead of through SimpleTermFormatter"
[labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296 (owner: Ricordisamoa)
[03:41:19] (PS42) Ricordisamoa: Initial commit [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296
[03:51:59] (CR) Ricordisamoa: "PS42 always passes the mainsnak to statement formatters, thus making StringFormatter work again" [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296 (owner: Ricordisamoa)
[08:41:12] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Dst2015 was created, changed by Dst2015 link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Dst2015 edit summary: Created page with "{{Tools Access Request |Justification=I would like to create a tool to automatically classify Wikipedia articles (FA, GA, A-Class, B-Class, etc.). After that I would also like..."
[10:13:47] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Dst2015 was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=247049 edit summary:
[10:51:25] Tool-Labs-tools-Other, Community-Tech, Community-Tech-fixes, Tracking: Improving Magnus' tools (tracking) - https://phabricator.wikimedia.org/T115537#1914092 (Ricordisamoa) Tool Labs volunteer Tim Landscheidt has published [[ //lists.wikimedia.org/pipermail/labs-l/2015-December/004193.html | a list...
[11:50:53] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:55:52] RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 974411 bytes in 6.102 second response time
[12:18:22] Labs, Tool-Labs: labs NFS slowness / high load - https://phabricator.wikimedia.org/T122743#1914154 (valhallasw) ``` 12:50 PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: CRITICAL - Socket timeout after 10 seconds 12:55 RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP O...
[12:24:27] Labs, Tool-Labs, DBA, Stewards-and-global-tools: Throttling linkwatcher tool user as it is consuming 100% CPU - https://phabricator.wikimedia.org/T121094#1914161 (Beetstra) Working on it again. Some of the new counting mechanisms were not performing as requested, but that has now been updated....
[13:14:28] Labs, Tool-Labs, DBA, Stewards-and-global-tools: Throttling linkwatcher tool user as it is consuming 100% CPU - https://phabricator.wikimedia.org/T121094#1914267 (jcrespo) @Beetstra Having a lot of transactions per second is not a problem, that can be handled by the existing resources, and if it wa...
[13:22:01] Labs, Continuous-Integration-Infrastructure, Labs-Infrastructure, Release-Engineering-Team, and 3 others: rake-jessie jobs stuck due to no ci-jessie-wikimedia slaves being attached to Jenkins - https://phabricator.wikimedia.org/T122731#1914298 (hashar) Nodepool relies on the wmflabs OpenStack API....
[13:45:33] Labs, Continuous-Integration-Infrastructure, Labs-Infrastructure, Release-Engineering-Team, and 4 others: rake-jessie jobs stuck due to no ci-jessie-wikimedia slaves being attached to Jenkins - https://phabricator.wikimedia.org/T122731#1914303 (hashar) a:hashar https://gerrit.wikimedia.org/r/#/c...
[13:46:00] Labs, Continuous-Integration-Infrastructure, Labs-Infrastructure, operations, Patch-For-Review: rake-jessie jobs stuck due to no ci-jessie-wikimedia slaves being attached to Jenkins - https://phabricator.wikimedia.org/T122731#1914305 (hashar)
[13:46:47] Labs, Continuous-Integration-Infrastructure, Labs-Infrastructure, operations, Patch-For-Review: Nodepool deadlocks when querying unresponsive OpenStack API (was: rake-jessie jobs stuck due to no ci-jessie-wikimedia slaves being attached to Jenkins) - https://phabricator.wikimedia.org/T122731#1914308...
[14:24:43] Labs, Tool-Labs: tools-checker-01 denies access with ssh - https://phabricator.wikimedia.org/T122470#1905220 (scfc) (I split off T122802 for toolsbeta-webproxy-01 and [[https://wikitech.wikimedia.org/w/index.php?title=Nova_Resource:Tools/Admin&diff=247086&oldid=242222|documented]] the project root key pro...
[14:39:32] Labs, Tool-Labs: toolsbeta-webproxy-01 not accessible - https://phabricator.wikimedia.org/T122802#1914344 (scfc)
[14:59:05] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Jarallah was created, changed by Jarallah link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Jarallah edit summary: Created page with "{{Tools Access Request |Justification=I would to create articles via the bot for the Arabic Wikipedia and need for this Tools. |Completed=false |User Name=Jarallah }}"
[16:04:05] Is anyone allowed to create in the template namespace on the Wikitech wiki?
[16:31:22] Labs, Tool-Labs: toolsbeta-webproxy-01 not accessible - https://phabricator.wikimedia.org/T122802#1914449 (yuvipanda) Yeah, the problem is that puppet hasn't run there in so long, mostly due to my fault (I added a bunch of required hiera params to the proxy role and did not add them to toolsbeta).
[16:45:09] Labs, Tool-Labs: labs NFS slowness / high load - https://phabricator.wikimedia.org/T122743#1914462 (yuvipanda) Hmm, it is using ` metric => "servers.${::hostname}.loadavg.01",` so that, I guess?
[16:55:10] TerraCodes: I think so?
[16:55:37] ok
[17:02:45] Labs, Tool-Labs: labs NFS slowness / high load - https://phabricator.wikimedia.org/T122743#1914467 (valhallasw) Ah, yes. Found them -- `servers.labstore1001.loadavg.*`. This is the graph of the last week: {F3201383} so it does seem to be a mostly transient load issue.
[17:03:24] valhallasw`cloud: oh, I didn't know you had graphite prod access but makes sense
[17:03:33] valhallasw`cloud: I'm trying to think how to track *what* is causing load
[17:03:36] YuviPanda: anyone with NDA has access
[17:03:39] (am at WM Dev SUpport)
[17:03:42] valhallasw`cloud: yeah, forgot...
[17:04:11] YuviPanda: so I think we're probably fine for now -- as long as it's transient, it's not a huge issue. I'm a bit concerned about the icebot thing though
[17:04:33] valhallasw`cloud: what's the icebot stuff? I haven't caught up on backlog :(
[17:04:57] YuviPanda: icebot is basically doing querying using bash + sql, causing like 1k sql invocations per minute
[17:05:02] ah
[17:05:08] on tools-bastion-02
[17:05:12] valhallasw`cloud: yeah, I think we can safely disable them right now?
[17:05:31] valhallasw`cloud: I also never swatted the maintainers of the other issues (gridengine floods :|)
[17:05:46] yeah, it's too hard to contact maintainers
[17:05:53] yeah
[17:05:53] that's my goal with tools.contact
[17:06:00] * YuviPanda nods
[17:06:05] write bug report, go there, ente rbug number and it will spam talk pages
[17:08:58] YuviPanda: and just CCing people in phab is not enough to reach them, it seems (see faes email, he was cc'ed on the task, but apparently did not receive an email about it, or missed it)
[17:09:43] valhallasw`cloud: I also emailed him and he responded too, so that email confused me
[17:10:00] hm, maybe he just forgot about it then
[17:11:55] Labsadmins: Can you take a look at T122757?
This bug don't let me create any working trusty/ürecise instance
[17:11:59] *precise
[17:13:51] (PS43) Ricordisamoa: Initial commit [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296
[17:15:43] Luke081515: I'm taking a look
[17:15:56] thanks
[17:17:17] (CR) Ricordisamoa: "PS43 removes the hard-coded demo and API machinery from EditDispatcher, moved into DemoEditDispatcher and RealEditDispatcher respectively" [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296 (owner: Ricordisamoa)
[17:20:26] Luke081515: I created rcm-1001, and it seems to work
[17:21:16] Labs, Labs-Infrastructure: LDAP is not working - https://phabricator.wikimedia.org/T122757#1914497 (yuvipanda) p:Unbreak!>Triage I created rcm-1001 and it works. Did you delete and recreate instances with the same name earlier? sometimes they get DNS cached and name-reuse causes this problem...
[17:21:38] YuviPanda: It's jessie. Jessie worked yesterday, but in trusty/precise I can't login after creation
[17:22:28] Labs: Rebuild jessie image - https://phabricator.wikimedia.org/T122812#1914501 (yuvipanda) NEW
[17:22:37] Luke081515: ok, I'll try trusty too.
[17:22:42] Luke081515: I'm going to delete rcm
[17:22:44] err
[17:22:46] rcm-1001
[17:23:19] ok
[17:23:31] YuviPanda: I found this in rcm-1001 output:
[17:23:38] 2016-01-04T17:16:21.469709+00:00 rcm-1001 nslcd[1025]: [2da5b5] ldap_start_tls_s() failed (uri=ldap://ldap-codfw.wikimedia.org:389): Connect error: TLS: hostname does not match CN in peer certificate
[17:23:49] Luke081515: yup, that looks like puppet didn't run
[17:24:03] Luke081515: did you have another instance with the same name before?
[17:24:41] rcm-1001?
I just have rcm-1-6 more times, but I wait more than a half day, before recreating
[17:27:03] Luke081515: ok I can reproduce it
[17:27:06] looks like puppet is failing
[17:27:10] because it can't install 'arcconf'
[17:27:48] looks like this is the phabricator puppet
[17:28:57] so: disable the phabricator role, let the machine build, and only after you can login enable it again to debug?
[17:31:22] valhallasw`cloud: The problem is: I have only one role active, and till the next deploy, where this is fixed, I can not remove this role
[17:31:40] This is https://phabricator.wikimedia.org/T122733
[17:31:44] Luke081515: activate some other role?
[17:32:00] you can go to 'manage puppet groups' and add a harmless role
[17:32:06] like role::labs::instance
[17:33:07] ok, i will try it
[17:38:12] Seems like this solved the issue
[17:38:24] \o/
[17:38:26] ok
[17:38:40] Luke081515: in general, when cerating a new instance, it's good to wait until first puppet run completes before applying your own code
[17:38:50] *own roles
[17:39:24] YuviPanda: The thing is: I didn't attach any roles
[17:39:34] Luke081515: the phab role?
[17:39:36] before the creation finished, but I get the errors
[17:39:38] Luke081515: actually yeah, that's right too
[17:39:42] Not to rcm-6 and rcm-5
[17:39:45] too early in the day my brain to work
[17:39:49] you're right
[17:39:52] yeah
[17:41:11] Luke081515: are you unblocked now, btw?
[17:41:22] Yeah, thanks
[17:41:31] Luke081515: thanks!
[17:45:05] Labs, Labs-Infrastructure, Patch-For-Review: LDAP is not working - https://phabricator.wikimedia.org/T122757#1914575 (yuvipanda) Can confirm it's happening with trusty instances...
[18:16:20] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Jarallah was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=247227 edit summary:
[18:16:53] hi all - is this a good time to request a password reset for -labs ?
(I see LDAP Problems in the header there)
[18:17:02] YuviPanda: Trusty builds work. They are very slow due, I think, to the changes you made to apt proxy stuff on carbon, but that’s not urgent.
[18:17:20] andrewbogott: hmm, interesting.
[18:17:25] darkblue_b: you should be able to do a password reset on wikitech
[18:17:32] I don't think there are LDAP problems
[18:17:39] oh ok - looking thx
[18:18:03] heh
[18:18:20] haha
[18:18:22] ok
[18:18:23] yours is better
[18:18:36] andrewbogott: I tried rcm-1002 and it failed
[18:18:43] ok, will look
[18:18:51] precise seems broken in any case
[18:19:11] right
[18:21:09] YuviPanda: I just logged in to rcm-1002
[18:21:51] andrewbogott: oh
[18:21:53] andrewbogott: I see
[18:21:55] ok
[18:22:02] Luke081515: ^
[18:23:17] Labs, Labs-Infrastructure: Setup an apt proxy for labs - https://phabricator.wikimedia.org/T122819#1914685 (yuvipanda) NEW
[18:23:19] andrewbogott: ^ there.
[18:23:22] ok
[18:23:50] Labs, Labs-Infrastructure: Setup an apt proxy for labs - https://phabricator.wikimedia.org/T122819#1914697 (yuvipanda) I also want us to not use squid for it - @JGreen was talking about using nginx for this in FR maybe we should do that too.
[18:25:23] YuviPanda: why can't we use carbon?
[18:25:28] new firewall settings?
[18:25:49] valhallasw`cloud: because of https://phabricator.wikimedia.org/T122368
[18:26:12] I can't see that, but I get the point :-p
[18:26:23] valhallasw`cloud: I added you to the policy
[18:26:54] Labs, Labs-Infrastructure: Setup an apt proxy for labs - https://phabricator.wikimedia.org/T122819#1914711 (yuvipanda) Can't use carbon because of T122368
[18:28:50] still doesn't work :/
[18:29:50] valhallasw`cloud: try now
[18:30:47] aaah.
[18:38:36] Tool-Labs-tools-Other, Epic: Convert all Labs tools to use cdnjs for static libraries and fonts - https://phabricator.wikimedia.org/T103934#1914725 (Ricordisamoa)
[18:41:31] So quick question, is it all cool to use tool labs as a proxy, because wifi at Wikidev is crap and blocking everything?
[18:41:51] And furthermore, is it ok if I publically post instructions on wiki how to do that
[18:44:59] bawolff: I'm going to create a separate public bastion instance that people can use for it
[18:45:16] cool :)
[18:50:05] bawolff: use mwds-proxy.wmflabs.org
[18:50:10] bawolff: I'll delete it after tomorrow
[18:50:15] thanks
[18:50:39] RECOVERY - Puppet failure on tools-docker-builder-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:52:34] bawolff, YuviPanda, it's against the TOS, but I suppose an exception for wikidev makes sense
[18:52:54] valhallasw`cloud: yup, which is why I made a separate instance
[18:57:05] Well I just added instructions to https://www.mediawiki.org/wiki/Wikimedia_Developer_Summit_2016#Discussion_session_resources
[18:58:20] bawolff: I edited it. it just needs someone to have a labs account (is in the bastion project)
[18:58:31] thanks
[18:58:51] Labs, operations, Patch-For-Review: Increase timeout for tools-home check - https://phabricator.wikimedia.org/T122615#1914796 (Dzahn) a:Dzahn
[19:06:36] Labs, operations, Patch-For-Review: Increase timeout for tools-home check - https://phabricator.wikimedia.org/T122615#1914817 (Dzahn) p:Triage>Normal With the changes above the timeout has been raised from 10 to 20 and made configurable. It can be adjusted by editing `check_command => 'check_http...
[19:07:06] Labs, operations, Patch-For-Review: Increase timeout for tools-home check - https://phabricator.wikimedia.org/T122615#1914823 (Dzahn) Open>Resolved
[19:07:31] Labs, operations, Patch-For-Review: Increase timeout for tools-home check - https://phabricator.wikimedia.org/T122615#1914827 (yuvipanda) \o/ Thanks for doing that! I think we can re-enable the SMS notification now too
[19:10:17] YuviPanda: tools-login is still flapping every now and then because of nfs
[19:10:28] auuughhhh
[19:11:11] valhallasw`cloud: there's nothing killing it as such
[19:11:15] so I suspect it's LDAP being terrible
[19:11:28] would that spike load on the server?
[19:12:31] valhallasw`cloud: depends, but it'll definitely make file access hang
[19:13:16] (PS1) Legoktm: Generate a gitinfo.json to be included in tarballs [labs/tools/extdist] - https://gerrit.wikimedia.org/r/262366 (https://phabricator.wikimedia.org/T122769)
[19:13:19] valhallasw`cloud: I'm going to try bugging moritz and paravoid in person to see if they can take a look at this
[19:13:28] LDAP has been super frustrating
[19:13:44] ok
[19:16:30] Labs, operations, Patch-For-Review: Increase timeout for tools-home check - https://phabricator.wikimedia.org/T122615#1914888 (Dzahn) >>! In T122615#1914827, @yuvipanda wrote: > \o/ Thanks for doing that! I think we can re-enable the SMS notification now too alright, done with the revert above. will be...
[19:18:15] Labs, operations: Increase timeout for tools-home check - https://phabricator.wikimedia.org/T122615#1914892 (Dzahn)
[19:28:38] (PS2) Legoktm: Generate a gitinfo.json to be included in tarballs [labs/tools/extdist] - https://gerrit.wikimedia.org/r/262366 (https://phabricator.wikimedia.org/T122769)
[19:56:59] Labs, Continuous-Integration-Infrastructure, Labs-Infrastructure, operations, Patch-For-Review: Nodepool deadlocks when querying unresponsive OpenStack API (was: rake-jessie jobs stuck due to no ci-jessie-wikimedia slaves being attached to Jenkins) - https://phabricator.wikimedia.org/T122731#1914967...
[20:10:27] kaldari, ping
[20:10:53] kaldari, I figure IRC communication would be easier than email.
[20:24:00] hi Cyberpower678 :)
[20:30:11] myrcx, hi
[22:39:28] Labs, Tool-Labs: toolsbeta-webproxy-01 not accessible - https://phabricator.wikimedia.org/T122802#1915363 (scfc) Puppet staleless in Toolsbeta is usually more often caused by local hacks on the puppetmaster that cannot be rebased automatically. The puppetmaster then stops auto-updating. But as it isn't...
[23:07:29] Labs, LDAP: Restore ldaplist -l passwd - https://phabricator.wikimedia.org/T122595#1915417 (MoritzMuehlenhoff) I have a patch for that which is working fine, but there's been an API change in python-ldap 2.3 (precise) and 2.4 (trusty, jessie). Do we also need this in precise?`Then I would need to implement...
[23:29:24] YuviPanda: a) I see you b) can you make a ticket about that thing from last night for me? (: