[00:56:22] (CR) Tim Landscheidt: "This would need to be done similarly to handling "-l h_vmem=" & "-l virtual_free=", and I can do that, but (IIRC) there is the other possi" [labs/toollabs] - https://gerrit.wikimedia.org/r/282975 (owner: Yuvipanda)
[00:58:05] (CR) Yuvipanda: "Indeed, that was what was being discussed when I was drafting https://etherpad.wikimedia.org/p/deprecate-precise." [labs/toollabs] - https://gerrit.wikimedia.org/r/282975 (owner: Yuvipanda)
[00:58:42] (CR) Yuvipanda: "And I'll very happily accept someone who actually knows perl to rewrite this patch to fit in better :)" [labs/toollabs] - https://gerrit.wikimedia.org/r/282975 (owner: Yuvipanda)
[01:56:52] I keep getting ECDSA key errors when I log in now.
[01:56:54] Is this normal?
[01:58:11] enterprisey: we changed the bastion on tools. check out the /topic
[01:58:26] d'oh
[01:58:32] thanks
[01:58:34] enterprisey: np!
[04:12:48] Labs, Tool-Labs, Patch-For-Review: Puppet fails on all Precise execution nodes - https://phabricator.wikimedia.org/T132282#2202084 (yuvipanda) Open>Resolved a:yuvipanda Fixed! I made a legacy class with the fonts that are still available in precise, and used the mediawiki one for trusty.
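Tim's review comment is about how the jsub wrapper treats `-l` resource requests such as `h_vmem=` and `virtual_free=`, and jsub is slated for a Python rewrite (T132475). As a minimal sketch only, assuming grid-engine-style `-l name=value[,name=value...]` arguments, this is one way such requests could be collected into a dict before deciding what to pass through; `parse_resource_requests` is a hypothetical helper, not part of jsub.

```python
def parse_resource_requests(argv):
    """Collect grid-engine-style `-l name=value[,name=value...]` requests.

    Hypothetical helper for illustration. Returns (resources, remaining_args);
    a later request for the same resource overrides an earlier one.
    """
    resources = {}
    remaining = []
    args = iter(argv)
    for arg in args:
        if arg == "-l":
            # The resource list is the next argument on the command line.
            spec = next(args, None)
            if spec is None:
                raise ValueError("-l requires an argument")
            for item in spec.split(","):
                name, sep, value = item.partition("=")
                if not sep or not name:
                    raise ValueError(f"malformed resource request: {item!r}")
                resources[name] = value
        else:
            remaining.append(arg)
    return resources, remaining


if __name__ == "__main__":
    res, rest = parse_resource_requests(
        ["-l", "h_vmem=2G,virtual_free=1G", "-N", "myjob", "script.sh"]
    )
    print(res)   # {'h_vmem': '2G', 'virtual_free': '1G'}
    print(rest)  # ['-N', 'myjob', 'script.sh']
```

Keeping the parsed requests in a dict makes it easy to treat `h_vmem` and `virtual_free` the same way, which is the kind of uniform handling the review comment asks for.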
[04:21:00] RECOVERY - Puppet run on tools-exec-1201 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:21:14] RECOVERY - Puppet run on tools-webgrid-lighttpd-1210 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:21:15] RECOVERY - Puppet run on tools-redis-1001 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:21:55] RECOVERY - Puppet run on tools-exec-1217 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:23:31] RECOVERY - Puppet run on tools-exec-1208 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:23:45] RECOVERY - Puppet run on tools-exec-1212 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:23:45] RECOVERY - Puppet run on tools-exec-1214 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:26:05] RECOVERY - Puppet run on tools-webgrid-lighttpd-1209 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:27:39] RECOVERY - Puppet run on tools-exec-1216 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:28:47] RECOVERY - Puppet run on tools-exec-1209 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:30:25] RECOVERY - Puppet run on tools-webgrid-lighttpd-1202 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:32:10] RECOVERY - Puppet run on tools-exec-1215 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:33:44] RECOVERY - Puppet run on tools-webgrid-lighttpd-1203 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:33:56] RECOVERY - Puppet run on tools-exec-1219 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:34:14] RECOVERY - Puppet run on tools-webgrid-lighttpd-1204 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:36:28] RECOVERY - Puppet run on tools-webgrid-lighttpd-1208 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:37:36] RECOVERY - Puppet run on tools-exec-1204 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:37:36] RECOVERY - Puppet run on tools-exec-1205 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:42:02] RECOVERY - Puppet run on tools-exec-1221 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:43:33] RECOVERY - Puppet run on tools-exec-1213 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:43:59] RECOVERY - Puppet run on tools-exec-1202 is OK: OK: Less than 1.00% above the threshold [0.0]
[05:55:50] Labs, Tool-Labs, labs-sprint-119, Community-Tech-Tool-Labs, Diffusion: Figure out a git hosting solution for tools/kubernetes - https://phabricator.wikimedia.org/T117071#2202145 (mmodell) This all sounds reasonable and achievable, though not completely trivial. It will be a little tricky get...
[06:05:01] Labs, Operations, Puppet: Implement role based hiera lookups for labs - https://phabricator.wikimedia.org/T120165#1847021 (mmodell) Is this really difficult to do? I'm very interested in fixing this but not at all sure where to start.
[06:37:44] RECOVERY - Puppet run on tools-grid-shadow is OK: OK: Less than 1.00% above the threshold [0.0]
[06:44:08] RECOVERY - Puppet run on tools-exec-1203 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:44:22] RECOVERY - Puppet run on tools-exec-1207 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:05:39] RECOVERY - Puppet run on tools-webgrid-lighttpd-1207 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:32:43] RECOVERY - Puppet run on tools-webgrid-lighttpd-1206 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:07:33] Tool-Labs-tools-Other: Gadget for Article Monitor from RENDER is broken - https://phabricator.wikimedia.org/T132161#2202589 (kai.nissen) Weird, today I cannot reproduce this anymore. This might have been caused by labs outage. Both the Article Monitor and the Article List Generator work fine.
[10:23:25] Labs, Labs-Infrastructure: I/O on labmon1001 is very slow - https://phabricator.wikimedia.org/T127957#2202606 (fgiunchedi) I don't think I could anytime soon, though thinking back I think getting SSDs is guaranteed to fix the issue so I'd recommend going for that
[11:48:24] Labs: username case mismatch in keystone totp plugin - https://phabricator.wikimedia.org/T132455#2202762 (coren) @Krenair I believe it was, yes.
[14:58:26] Tool-Labs-tools-stewardbots, Stewards-and-global-tools: Unified and centralized CSS and JS for all tools in the project - https://phabricator.wikimedia.org/T130030#2203410 (MarcoAurelio) Open>Resolved a:MarcoAurelio Done in the /resources directory for all tools but hat-web-tool, which partly...
[14:58:38] Tool-Labs-tools-stewardbots, Stewards-and-global-tools: Unified and centralized CSS and JS for all tools in the project - https://phabricator.wikimedia.org/T130030#2203413 (MarcoAurelio) p:Triage>Normal
[15:30:30] Labs, Tool-Labs, labs-sprint-119, Community-Tech-Tool-Labs, Diffusion: Figure out a git hosting solution for tools/kubernetes - https://phabricator.wikimedia.org/T117071#2203598 (bd808) After talking through some related things with @yuvipanda and @Krenair yesterday on irc, I think we have a...
[15:31:21] (PS85) Ricordisamoa: Initial commit [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296
[15:48:55] PROBLEM - Puppet run on tools-webgrid-lighttpd-1405 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[15:49:42] (CR) Ricordisamoa: "PS85 adds a 'value' property of type Object and a 'getValue' method to DraggableElement and uses that instead of a stringified 'data-value" [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296 (owner: Ricordisamoa)
[15:57:17] Labs, Tool-Labs, Community-Tech-Tool-Labs, User-bd808: Rewrite jsub in python - https://phabricator.wikimedia.org/T132475#2203717 (bd808) a:bd808
[15:57:50] Labs, Tool-Labs, Community-Tech-Tool-Labs, User-bd808: Rewrite jsub in python - https://phabricator.wikimedia.org/T132475#2199735 (bd808) p:Triage>Normal
[15:58:01] (PS86) Ricordisamoa: Initial commit [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296
[16:02:33] (CR) Ricordisamoa: "PS86 makes Section.prototype.extractSingleValue clone the DraggableElement's value to restore the previous behaviour" [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296 (owner: Ricordisamoa)
[16:02:43] Labs, Tool-Labs, Community-Tech-Tool-Labs, User-bd808: Rewrite jsub in python - https://phabricator.wikimedia.org/T132475#2203732 (bd808) Step #1 of this is going to be to document the current script so that we can audit the existing functionality and properly document the requirements. After an...
[16:11:37] Labs: username case mismatch in keystone totp plugin - https://phabricator.wikimedia.org/T132455#2203786 (Krenair) Did you ever commit using SVN? I couldn't find you in mediawikiwiki.code_authors
[16:18:59] RECOVERY - Puppet run on tools-webgrid-lighttpd-1405 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:24:43] Labs: Put a firewall on labtestcontrol2001 - https://phabricator.wikimedia.org/T132598#2203840 (Andrew)
[16:27:55] Labs, Tool-Labs, Community-Tech-Tool-Labs, Epic: Tools web interface for tool authors (Brainstorming ticket) - https://phabricator.wikimedia.org/T128158#2203850 (bd808) >>! In T128158#2178001, @chasemp wrote: > @bd808 what would you think about using https://phabricator.wikimedia.org/ponder/ for...
[16:34:52] Labs: Put a firewall on labtestcontrol2001 - https://phabricator.wikimedia.org/T132598#2203828 (Krenair) node 'labcontrol1001.wikimedia.org' contains "include base::firewall" labtestcontrol has it commented out other things are missing as well
[16:50:54] PROBLEM - Puppet run on tools-webgrid-generic-1402 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[17:20:02] PROBLEM - Puppet run on tools-bastion-03 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[17:20:54] PROBLEM - Host tools-worker-1011 is DOWN: PING CRITICAL - Packet loss = 100%
[17:22:04] PROBLEM - Puppet run on tools-webgrid-lighttpd-1410 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[17:26:18] Tool-Labs-tools-Other: Gadget for Article Monitor from RENDER is broken - https://phabricator.wikimedia.org/T132161#2204090 (jeblad) This works for me now, after flushing out the cache. Going to close the bug.
[17:27:09] Tool-Labs-tools-Other: Gadget for Article Monitor from RENDER is broken - https://phabricator.wikimedia.org/T132161#2204098 (jeblad) Open>Invalid
[17:41:42] RECOVERY - Puppet run on tools-webgrid-lighttpd-1201 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:41:47] here's an example of something a wiki is really not very good at -- https://meta.wikimedia.org/wiki/Tool_Labs -- list of tools that was created 2014-07-13 and edited twice since
[17:49:50] YuviPanda, hey, is there some account I can use in shinken that allows me to use commands?
[17:55:01] RECOVERY - Puppet run on tools-bastion-03 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:55:40] * Krenair will brb
[17:56:55] RECOVERY - Puppet run on tools-webgrid-lighttpd-1410 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:13:14] Labs, Tool-Labs, Community-Tech-Tool-Labs, User-bd808: Develop vision and roadmap for Tool Labs enhancements - https://phabricator.wikimedia.org/T132610#2204284 (bd808)
[18:13:26] Labs, Tool-Labs, Community-Tech-Tool-Labs, User-bd808: Develop vision and roadmap for Tool Labs enhancements - https://phabricator.wikimedia.org/T132610#2204320 (bd808) p:Triage>High
[18:32:59] * Krenair is back
[18:38:33] hey Krenair. no, but you can create an account on the machine since you have ssh access...
[18:38:44] Krenair: there should be one in /etc/shinken for valhallasw`cloud
[18:39:58] YuviPanda, Ah in /etc/shinken/customconfig/private-contacts.cfg - okay
[18:40:19] Krenair: yup. not puppetized though. oh well
[18:40:53] yeah well it contains sensitive data
[18:42:50] YuviPanda: I rewrote a bit. https://wikitech.wikimedia.org/w/index.php?title=Tools_Precise_deprecation&type=revision&diff=435442&oldid=435307
[18:43:06] to be more explicit in the 'you need to do X' department
[18:43:54] valhallasw`cloud: <3 awesome!
[18:44:06] it really doesn't want me to stay in https :/
[18:44:31] think I wrote a patch for this at some point but upstream was moving things about
[18:56:31] YuviPanda, does it have some web interface I can access somewhere?
[18:56:39] labmon1001.eqiad.wmnet
[18:56:47] Krenair: graphite.wmflabs.org?
[18:56:51] ah yes
[19:08:37] YuviPanda, is "UNKNOWN: execution of the check script exited with exception timed out" from check_graphite normal?
[19:13:00] Krenair: hm, no - usually that means either the shinken box is overloaded, or the graphite box is overloaded
[19:13:21] Krenair: https://phabricator.wikimedia.org/T127957 might be related
[19:13:34] krenair@shinken-01:~$ uptime
[19:13:34] 19:13:25 up 20 days, 23:23, 1 user, load average: 0.33, 0.24, 0.20
[19:14:10] so the latter maybe.
[19:14:50] yep, that sounds like the problem
[19:17:50] Krenair: it doesn't seem to have any easy quick fixes though...
[19:18:49] I'm still not sure things are working like I expect
[19:19:06] I run echo -n "deployment-prep.deployment.tin.keyholder.status:2|g" | nc -w 1 -u labmon1001.eqiad.wmnet 8125
[19:19:30] Krenair: it takes almost a minute sometimes for new metrics to be created. it is throttled.
[19:19:34] then on shinken-01: /usr/lib/nagios/plugins/check_graphite -U http://labmon1001.eqiad.wmnet -T 1 check_threshold 'deployment-prep.deployment-tin.keyholder.status' -W 0 -C 0 --from 1min --perc 100 --over
[19:19:39] yeah this is far over a minute
[19:19:49] Krenair: hmm, I see.
[19:20:13] Krenair: I usually just use the python statsd library to check these things, since I am never sure if I'm missing some aspect of the protocol
[19:20:16] which returns either "UNKNOWN: execution of the check script exited with exception timed out" (which I believe is explained now) or "UNKNOWN: No valid datapoints found" (wat?)
[19:21:25] ohhhh, hang on
[19:21:38] there's a typo
[19:22:45] :)
[19:22:52] got a "." in deployment-tin in the echo command
[19:25:00] also my -T to the check_graphite call was way too low
[19:32:15] YuviPanda, I think I'm still missing something about how Graphite works
[19:32:34] http://graphite.wmflabs.org/render/?width=586&height=308&_salt=1460575913.514&target=deployment-prep.deployment-tin.keyholder.status&from=-60minutes
[19:32:44] those gaps
[19:32:48] I think those are my problem
[19:33:10] those would be periods that no data was received
[19:33:10] how often does it have to report to keep the line going?
[19:33:28] depends on the resolution of the rrd
[19:33:37] probably once a minute?
[19:33:46] I think it's once a minute yeah
[19:34:08] depends though, I think gauges don't need to be refreshed all the time? Not fully sure.
[19:35:33] the difference between my check and the puppetmaster cherry-pick thing is that uses --from 48h, I use 10min
[19:36:31] as for the scripts that do the reporting..
[19:37:36] puppetmaster cherry-pick data is reported by a script run by cron every 10 minutes
[19:38:53] http://graphite.wmflabs.org/render/?width=586&height=308&_salt=1460576312.686&from=-60hours&target=deployment-prep.deployment-puppetmaster.puppetmaster.cherrypicked_commits.ops-puppet
[19:39:52] maybe when you're looking at 48h, data every 10 minutes is enough, but not when you're looking at 10min?
[19:40:27] probably.
[19:48:47] Krenair: oh, yeah that would make a difference. I don't remember what our default time series resolution is for graphite, but generally it stores data at several different sample frequencies and depending on the duration and point in time you query for you get the highest resolution data that covers the requested interval.
[19:50:49] * YuviPanda is afk for food
[20:18:22] guys, any idea why i'd get pubkey denied on deployment-cxserver03.eqiad.wmflabs ?
[20:18:35] (given i'm an admin of deployment-prep)
[20:26:14] mobrovac, working on it
[20:26:31] cheers!
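The keyholder exchange above pushes a gauge with `echo ... | nc -u ... 8125` and then reads it back through Graphite. A minimal Python sketch of both halves follows, assuming the plain statsd line format `name:value|g` sent as a UDP datagram, and Graphite's render API with `format=json`, which returns a list of series whose `datapoints` are `[value, timestamp]` pairs with `null` for empty slots. The host names and the corrected metric path (`deployment-tin`, without the stray ".") come from the log; `send_gauge`, `render_url`, and `count_valid_datapoints` are hypothetical helpers, and the python statsd library YuviPanda mentions would stand in for `send_gauge`.

```python
import json
import socket
from urllib.parse import urlencode


def send_gauge(name, value, host="labmon1001.eqiad.wmnet", port=8125):
    """Send one statsd gauge as a single fire-and-forget UDP datagram."""
    payload = f"{name}:{value}|g".encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return payload


def render_url(target, window="-10min", base="http://graphite.wmflabs.org"):
    """Build a Graphite render API URL that asks for JSON instead of a PNG."""
    return f"{base}/render/?" + urlencode(
        {"target": target, "from": window, "format": "json"}
    )


def count_valid_datapoints(render_json):
    """Count non-null datapoints per series in a render API JSON response.

    Slots that received no data come back as null; these are the "gaps"
    visible on the rendered graph.
    """
    return {
        series["target"]: sum(
            1 for value, _ts in series["datapoints"] if value is not None
        )
        for series in json.loads(render_json)
    }
```

If the storage step is one minute (the guess in the conversation, not confirmed), a script reporting every 10 minutes leaves at most one non-null slot in a 10-minute window, and often none depending on alignment. That is consistent with "No valid datapoints found" for `--from 10min` while the 48-hour window used by the cherry-pick check always contains data.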
[20:26:35] !log deployment-prep corrected deployment-cxserver03:/etc/puppet/puppet.conf puppetmaster to use .deployment-prep as part of dns name
[20:26:42] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL, Master
[20:27:00] had to log in as root :/
[20:27:22] mobrovac, I thought I cleaned up all the instances with this particular broken puppet issue
[20:27:30] Maybe salt decided to miss cxserver03
[20:27:46] heheh
[20:27:53] that seems like a likely explanation
[20:28:07] I should probably add myself to the beta cluster root keys at some point
[20:28:20] kk, i can log in now
[20:28:22] thnx Krenair!
[20:28:59] bd808, it's a pretty simple edit to https://wikitech.wikimedia.org/wiki/Hiera:Deployment-prep and in a few months it'll probably prove useful
[20:30:21] amazing how i keep forgetting about the hiera page
[20:30:48] {{done}} https://wikitech.wikimedia.org/w/index.php?title=Hiera:Deployment-prep&diff=435861&oldid=427474
[20:32:22] bd808, and I just saw it apply to cxserver03
[20:33:16] it even works :)
[20:37:34] !log deployment-prep doing the same with -redis02
[20:37:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL, Master
[20:38:14] hey, where's stashbot?
[20:38:19] grr
[20:43:47] @seen stashbot
[20:43:47] bd808: Last time I saw stashbot they were quitting the network with reason: Ping timeout: 250 seconds N/A at 4/13/2016 10:41:11 AM (10h2m36s ago)
[20:45:03] !log stashbot Bot AWOL since 2016-04-13T10:41 and not attaching to channels when restarted
[20:45:04] stashbot is not a valid project.
[20:45:14] !log tools.stashbot Bot AWOL since 2016-04-13T10:41 and not attaching to channels when restarted
[20:45:19] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stashbot/SAL, Master
[20:53:40] Labs, Labs-Infrastructure, Beta-Cluster-Infrastructure, Tracking: Log files on labs instance fill up disk (/var is only 2GB) (tracking) - https://phabricator.wikimedia.org/T71601#2204928 (Krenair)
[20:54:17] Labs, Labs-Infrastructure, Beta-Cluster-Infrastructure, Tracking: Log files on labs instance fill up disk (/var is only 2GB) (tracking) - https://phabricator.wikimedia.org/T71601#727373 (Krenair)
[20:55:50] stashbot: hello!
[20:58:41] Labs, Labs-Infrastructure, Beta-Cluster-Infrastructure, Tracking: Log files on labs instance fill up disk (/var is only 2GB) (tracking) - https://phabricator.wikimedia.org/T71601#2204947 (Krenair) ```lang=irc Apr 12 20:39:13 Krenair: on deployment-mediawiki01 it looks like the biggest dis...
[21:06:53] bd808: not sure if my message got through earlier but - didn't we add you to overall labs roots? You have cloudadmin, so if you aren't in labs roots you can make a patch and I'll merge
[21:07:17] I don't think I am, no.
[21:07:41] I got cloudadmin accidentally :)
[21:22:52] (PS87) Ricordisamoa: Initial commit [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296
[21:25:38] (CR) Ricordisamoa: "PS87 adds word-wrap: break-word to .draggable .panel-body" [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296 (owner: Ricordisamoa)
[21:36:13] hi all! I'm writing a tool, http://tools.wmflabs.org/oabot/
[21:36:53] but I'm not sure I've fully understood how the tools labs work
[21:38:00] my app is a python WSGI app that needs to make calls to the mediawiki api (via pywikibot) as well as an external api (http://dev.dissem.in/api.html)
[21:38:46] is it ok to do these calls from the webapp itself? or do I need to submit a job so that the requests are done from the grid?
[21:40:21] pintoch, are they just simple web requests?
[21:40:35] tom29739: yes, HTTP requests
[21:41:29] pintoch, you could probably just do them straight from the webapp.
[21:41:49] It's not very CPU-intensive, so it'll probably be fine.
[21:42:05] tom29739: ok, then I don't understand why the webapp stalls
[21:42:29] How are you running the app?
[21:42:58] webservice uwsgi-python start
[21:43:21] pintoch, there should be a log file somewhere, have you checked it?
[21:44:01] I think it's '~/error.log'
[21:44:05] I should add more logging indeed
[21:44:38] Because that file should store any errors, like tracebacks and the like.
[21:46:07] pintoch, I can't really do any problem-solving because I'm not a member of your tool.
[21:46:20] sure sure, thanks a lot anyway! :-)
[21:46:35] (but I'm happy to add you to the tool if you want to join)
[21:47:27] No problem :)
[22:05:07] tom29739: I've done some checks: it's the pywikibot request that stalls
[22:05:23] more specifically, getting the text of a page
[22:05:53] it works fine from bastion but not from the web server
[22:06:40] is there any special way to setup pywikibot on wmflabs?
[22:27:03] Labs, Tool-Labs, labs-sprint-119, Community-Tech-Tool-Labs, Diffusion: Figure out a git hosting solution for tools/kubernetes - https://phabricator.wikimedia.org/T117071#2205327 (mmodell) @thcpriani reminded me of https://try.gogs.io/ ... might be worth looking into also?
[22:29:56] RECOVERY - Puppet run on tools-exec-1206 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:41:21] Labs, Tool-Labs, labs-sprint-119, Community-Tech-Tool-Labs, Diffusion: Figure out a git hosting solution for tools/kubernetes - https://phabricator.wikimedia.org/T117071#2205378 (bd808) >>! In T117071#2205327, @mmodell wrote: > @thcpriani reminded me of https://try.gogs.io/ ... might be worth...
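pintoch traced the stall to the pywikibot call that fetches a page's text. The sketch below does not reproduce pywikibot. It is a minimal, standard-library-only illustration of what tom29739 suggested: make the HTTP request straight from the webapp and add logging. It also puts an explicit timeout on the request, so a hung upstream shows up as a logged error rather than a stalled worker. The API URL, timeout, user agent, and function names are hypothetical. The log goes to stderr; whether uWSGI's stderr ends up in `~/error.log` is tom29739's guess and is not verified here.

```python
import logging
import sys
import time
import urllib.request
from urllib.parse import parse_qs, urlencode

# Log to stderr; the uWSGI webservice is expected to capture this in its error log.
logging.basicConfig(stream=sys.stderr, level=logging.INFO)
log = logging.getLogger("wsgi-fetch-sketch")

API_URL = "https://en.wikipedia.org/w/api.php"  # assumption: the wiki being queried
TIMEOUT_SECONDS = 10  # assumption: fail fast instead of hanging the worker
USER_AGENT = "example-tool/0.1 (hypothetical contact address)"


def fetch_page_text(title):
    """Return the raw MediaWiki API JSON for a page's wikitext, with a timeout."""
    query = urlencode({"action": "parse", "page": title, "prop": "wikitext",
                       "format": "json", "formatversion": "2"})
    request = urllib.request.Request(f"{API_URL}?{query}",
                                     headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=TIMEOUT_SECONDS) as resp:
        return resp.read().decode("utf-8")


def application(environ, start_response, fetch=fetch_page_text):
    """WSGI entry point; `fetch` is injectable so the handler can be tested offline."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    title = params.get("title", ["Main Page"])[0]
    started = time.monotonic()
    try:
        body, status = fetch(title), "200 OK"
    except Exception:  # includes timeouts, which are OSError subclasses
        log.exception("fetching %r failed", title)
        body, status = "upstream request failed", "502 Bad Gateway"
    finally:
        # Logged on success and failure, so slow calls are visible either way.
        log.info("fetch of %r took %.2fs", title, time.monotonic() - started)
    data = body.encode("utf-8")
    start_response(status, [("Content-Type", "text/plain; charset=utf-8"),
                            ("Content-Length", str(len(data)))])
    return [data]
```

The timing line is logged from `finally`, so it appears whether the request succeeds, fails, or times out. That makes it easier to compare the web server with the bastion, where pintoch saw the same call work.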
[23:20:36] Tool-Labs-tools-Other, Community-Tech, Category, Community-Wishlist-Survey: Pageview Stats tool - https://phabricator.wikimedia.org/T120497#2205455 (DannyH)
[23:21:00] Tool-Labs-tools-Other, Community-Tech, Category, Community-Wishlist-Survey: Pageview Stats tool - https://phabricator.wikimedia.org/T120497#1959020 (DannyH)