[01:40:35] [nagf] Krinkle pushed 1 new commit to master: https://github.com/wikimedia/nagf/commit/edff0da5be54132159a8de8e4d2ff3bcc6ecadf9
[01:40:35] nagf/master edff0da Timo Tijhof: readme: Add Travis CI badge
[01:42:05] wikimedia/nagf#18 (master - edff0da: Timo Tijhof) The build passed. - http://travis-ci.org/wikimedia/nagf/builds/38002746
[03:14:39] PROBLEM - ToolLabs: Low disk space on /var on labmon1001 is CRITICAL: CRITICAL: tools.tools.diskspace._var.byte_avail.value (11.11%)
[03:22:59] RECOVERY - ToolLabs: Low disk space on /var on labmon1001 is OK: OK: All targets OK
[12:18:21] ***********
[12:18:52] Coren: uhm...
[12:19:04] It's a low security one, and is already being replaced. :-)
[12:19:09] :)
[12:28:52] paravoid: Good eye though; or do you stalk password-like regexes? :-)
[12:29:10] hah
[12:34:00] Huh. I just had a (tiny) credit on the statement of one of my cards that's labeled "class action".
[13:18:20] Coren: my favorite class action lawsuit was for the ZipDisk ‘click of death’ error
[13:18:29] as a settlement, I got a coupon for more ZipDisks
[13:18:56] Yeah, I heard of that. It was... amusing. I'm sure the lawyers didn't get paid in zip disks though.
[13:19:07] i recall thinking just that at the time :D
[13:20:05] That said, the credit was $27 rather than a coupon. I dunno how Visa wronged me though. :-)
[13:21:36] sweet! $27 can be exchanged for many peanuts
[13:54:44] I will let you know when I see andrewbogott around here
[13:54:44] @notify andrewbogott
[14:22:37] ACKNOWLEDGEMENT - ToolLabs: Puppet failure events on labmon1001 is CRITICAL: CRITICAL: tools.tools-trusty.puppetagent.failed_events.value (100.00%) Coren Trusty still has a few minor manifest issues
[16:16:22] * ^demon|away finds a pointy stick to stab salt with a few times
[16:16:26] <^demon|away> Error: Execution of '/usr/bin/salt-call --out=json grains.append deployment_target elasticsearch' returned 2: [ERROR ] The Salt Master has cached the public key for this node, this salt minion will wait for 10 seconds before attempting to re-authenticate
[16:16:26] <^demon|away> Minion failed to authenticate with the master, has the minion key been accepted?
[16:17:24] <^demon|away> (i already did puppet cert sign )
[16:39:09] Wikimedia Labs / Infrastructure: source group field is confusing - https://bugzilla.wikimedia.org/67759#c1 (Chase) test comment, pay no mind
[16:45:14] Coren, petan: ping! :) How's it going? I'm having trouble accessing the replica database for metawiki on tool labs. I get: SELECT command denied to user 'u3609'@'10.68.16.7' for table 'cn_notices'
[16:46:15] Hm. Lemme see.
[16:46:51] thanks ... on wikitech I'm andyrussg
[16:47:26] Hm. Wait, I don't know that table. Was it added recently?
[16:48:03] No... It's part of CentralNotice
[16:48:38] I'm logging into tools-login.wmflabs.org
[16:48:55] Then doing mysql --defaults-file="${HOME}"/replica.my.cnf -h metawiki.labsdb metawiki
[16:49:26] From there, it appears when I do show tables; which is what I expect
[16:49:26] AndyRussG: Ah, hmmm. You know, nobody ever requested those tables be made available.
[16:49:35] Ah hmmmm
[16:49:44] Interesting
[16:49:58] The reason you get a permission denied is because you use metawiki where you should use metawiki_p; but then the tables wouldn't be there at all.
[16:50:14] Ah right, yeah I didn't see anything in metawiki_p
[16:50:15] (Because they aren't made available)
[16:50:21] Ah I see
[16:50:28] There's no reason they couldn't be.
[16:50:45] They just were never considered for some reason.
[16:51:10] Could you open a bz requesting them? I'll have a look at them to make sure there is no data to redact and will add them to the replication.
[16:51:45] Probably not a corner of the databases that many people need to look at. I just want to do some row counts to make sure a CentralNotice patch will scale correctly
[16:52:39] If it's just a one-shot deal, I can look at it for you now; but still open the bz because I expect someone else might want that data someday anyways and there is no reason to not make it available.
[16:53:17] OK! sure that'd be fantastic
[16:53:34] "select count(*) from cn_notices;" sez "481"
[16:53:43] :)
[16:54:10] Excellent....
[16:55:06] Could I maybe bug you to do the same for cn_known_devices, cn_notice_languages, cn_notice_countries and cn_notice_projects?
[16:56:08] 5, 39875, 19791 and 3071 respectively.
[16:58:54] Thanks!! Humm the last three are a lot bigger than I thought, let me check if I shouldn't be excluding some rows :/
[17:00:11] Ah OK I see one sec...
[17:05:30] Rather it should be:
[17:05:45] select count(distinct nl_language) from cn_notice_languages;
[17:06:15] select count(distinct nc_country) from cn_notice_countries;
[17:06:33] select count(distinct np_project) from cn_notice_projects;
[17:06:42] And that'd be it :)
[17:06:51] Coren: ^ ... and thanks again
[17:06:59] 399, 280 and 14. Makes more sense from the names, too. :-)
[17:07:28] Yeah silly me
[17:07:35] * AndyRussG seeks more coffee
[17:08:15] K I'll file the bug report (so I won't have to _bug_ you again)
[17:49:26] YuviPanda: would you like me to merge https://gerrit.wikimedia.org/r/#/c/165360/ or are you working on a new version?
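AndyRussG noticed a gap between 39875 rows in cn_notice_languages and 399 distinct languages. That gap is the difference between count(*) and count(distinct ...). The minimal sketch below illustrates it with Python's built-in sqlite3 module and a made-up table. Only the table name and the nl_language column come from the log. The nl_notice_id column and every row are hypothetical, and the labsdb replicas run MySQL, not SQLite.

```python
import sqlite3

# In-memory toy table. nl_notice_id is an assumed column; only
# cn_notice_languages and nl_language appear in the log.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cn_notice_languages (nl_notice_id INTEGER, nl_language TEXT)"
)

# Hypothetical data: three notices that target overlapping languages.
rows = [
    (1, "en"), (1, "de"), (1, "fr"),
    (2, "en"), (2, "de"),
    (3, "en"),
]
conn.executemany("INSERT INTO cn_notice_languages VALUES (?, ?)", rows)

# count(*) counts every (notice, language) pair...
total = conn.execute("SELECT count(*) FROM cn_notice_languages").fetchone()[0]

# ...while count(DISTINCT ...) counts each language only once.
distinct = conn.execute(
    "SELECT count(DISTINCT nl_language) FROM cn_notice_languages"
).fetchone()[0]

print(total, distinct)  # prints: 6 3
```

A link table grows with notices × languages, so a row count tracks how many notices exist rather than how many languages are in use.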
[19:06:43] It depends on if you've re-logged in since https://gerrit.wikimedia.org/r/#/c/165501/
[19:06:54] andrewbogott: ah, probably not.
[19:07:00] andrewbogott: maybe we should expire all sessions?
[19:07:08] well, probably too late for that, but would have been justified
[19:07:53] is that easy?
[19:08:46] yeah, you can clear redis
[20:29:07] Tool Labs tools / [other]: Magnus Treeviews not processing - https://bugzilla.wikimedia.org/54009#c7 (Sarah Stierch) This is still a problem. I have been unable to get anything to read, despite letting it sit for hours on end or trying it on different web browsers or internet connections. This time I...
[20:33:09] Wikimedia Labs / tools: Grid engine "swallows" quotation marks (double and single quotation marks) and does not recognize pages at cs.wiki (more at "Additional Information") - https://bugzilla.wikimedia.org/72092 (Wesalius) UNCO p:Unprio s:major a:Marc A. Pelletier Intention: I was trying t...
[20:35:22] Wikimedia Labs / tools: Grid engine "swallows" quotation marks (double and single quotation marks) and does not recognize pages at cs.wiki (more at "Additional Information") - https://bugzilla.wikimedia.org/72092#c1 (Wesalius) the problem with quotation marks appeared before (in May) when trying to sub...
[21:04:24] Wikimedia Labs / tools: Grid engine "swallows" quotation marks (double and single quotation marks) and does not recognize pages at cs.wiki (more at "Additional Information") - https://bugzilla.wikimedia.org/72092#c2 (Marc A. Pelletier) UNCO>RESO/WON The problem is that gridengine will perform an...
[21:06:33] would it be possible to run an X (tk) application on labs?
[21:33:35] YuviPanda: ping
[21:33:41] Krinkle: pong
[21:33:44] YuviPanda: icinga is generally working for labs, right?
[21:33:54] welllllll :)
[21:33:57] Does it have irc notifications?
[21:34:01] graphite based checks are working, yes
[21:34:04] Krinkle: yes, we do
[21:34:07] would be nice to have notifications if puppet fails on cvn
[21:34:20] Krinkle: sure, that's not too hard...
[21:34:23] for integration we have labsmon/prod, but for other less proddy projects would be nice
[21:34:36] it still won't catch one particular type of puppet failure (syntax errors) but works fine for others
[21:34:41] Where do they go for regular projects?
[21:34:53] Krinkle: -operations, and for betalabs to -qa
[21:35:23] Krinkle: file a bug and I'll do that tomorrow? right now working on finishing up the script to archive deleted instances
[21:38:21] YuviPanda: Hm.. interesting. Didn't we have a catch-all at some point?
[21:38:51] Krinkle: well, catch-all for IRC is -operations, but we don't have it reporting puppet failures on all instances, since that'll probably be a *lot*
[21:39:18] YuviPanda: No, labs projects don't report to -operations
[21:39:40] there's beta/tools/integration via graphite/labsmon/icinga, but in general nothing goes to -operations
[21:39:43] (and shouldn't since it's labs)
[21:40:00] I mean the stuff of labs's own icinga instance
[21:40:10] https://icinga.wmflabs.org/cgi-bin/icinga/status.cgi?hostgroup=cvn&style=detail
[21:40:13] Krinkle: so, looking at the code, toollabs comes here, betalabs goes to -qa, contint goes to -qa and -ops (since hashar requested that)
[21:40:18] is there an irc pipeline for that?
[21:40:41] Krinkle: you should not count icinga.wmflabs.org as even existing. it's dead and dusted and deaaaaaaad. icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=labmon is what you're looking for
[21:41:11] icinga wmflabs seems to be working fine?
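At 21:40:13 YuviPanda summarizes where alerts currently go, and Krinkle later suggests that adding a project should be "just an array item". Together these describe a lookup table, sketched hypothetically in Python below. The project keys are the informal names used in the chat. Mapping "here" to #wikimedia-labs and "-ops" to -operations are assumptions. ##my-channel is the placeholder Krinkle uses later in the discussion. The real routing was defined in Puppet manifests such as the toollabs monitoring.pp, not in Python.

```python
from typing import Dict, List

# Hypothetical routing table: Labs project -> IRC channels for alerts.
# Keys and channel strings follow the informal names used in the chat.
ALERT_CHANNELS: Dict[str, List[str]] = {
    "toollabs": ["#wikimedia-labs"],    # "toollabs comes here" (assumed channel)
    "betalabs": ["-qa"],                # "betalabs goes to -qa"
    "contint": ["-qa", "-operations"],  # "-qa and -ops", at hashar's request
}

# Krinkle's proposal: onboarding a project is one more entry.
ALERT_CHANNELS["cvn"] = ["##my-channel"]  # placeholder channel from the log


def channels_for(project: str) -> List[str]:
    """Return the IRC channels that should receive a project's alerts.

    Unlisted projects get no IRC alerts at all; per the discussion,
    Labs projects should not fall back to -operations.
    """
    return ALERT_CHANNELS.get(project, [])


print(channels_for("contint"))   # ['-qa', '-operations']
print(channels_for("unlisted"))  # []
```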
[21:41:40] ok
[21:42:14] so we'll need to add another one like these
[21:42:14] https://github.com/wikimedia/operations-puppet/blob/production/modules/toollabs/manifests/monitoring.pp
[21:42:18] Krinkle: nobody in particular looks at it, it doesn't have IRC integration, and it's not fully accurate. Code is also not puppetized.
[21:42:21] Krinkle: yeah, pretty much.
[21:42:30] Krinkle: only question is, since cvn isn't puppetized, where to put it
[21:42:38] We should improve those queries before we start duplicating it
[21:42:48] agreed
[21:43:04] and maybe abstract it to a simple class, not related to contint/ or toollabs/ etc.
[21:43:18] unless there is like special queries ideally adding a new labs project would just be an array item
[21:43:44] Krinkle: well, I'm working on actual labs monitoring and when that happens these will all go away...
[21:43:45] YuviPanda: for Icinga, ops have no point in handling beta labs issues indeed. But definitely should be informed of CI related problems since that is a production service
[21:44:03] YuviPanda: yeah
[21:44:48] YuviPanda: well, I was going to recommend abstracting that monitoring.pp and having labs project request adding them to it and irc notify to a channel of their choosing (NOT -operations, of course)
[21:45:06] but it's not that important, I'll wait for the generic one
[21:45:33] ok
[21:45:34] YuviPanda: yield; Can you describe roughly what your plan for it is?
[21:45:43] Krinkle: sure!
[21:45:53] Krinkle: basically, http://shinken.wmflabs.org/ is what we'll be using for it.
[21:46:06] Krinkle: the code to get info about instances/projects has been merged and is live
[21:46:32] Krinkle: we'll write 'rules' in some form that 'match' instance info (project, name, roles applied) and generate appropriate service checks in shinken
[21:46:33] (is there a meeting going on ?)
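YuviPanda's rule-matching plan could look roughly like the Python sketch below. He outlines it here and fills it in over the next few messages: default checks on every host, extra checks when a role mounts /srv or /var/log, and per-project opt-in. This is not the actual rules file, which was still being designed during this conversation. Everything except the default check names is hypothetical: the Instance class, the role names mounts_srv and mounts_var_log, the other check names, and the cvn rule. The sketch stops at producing a list of check names and does not emit real Shinken configuration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Instance:
    """Instance info the rules match on: project, name, roles applied."""
    project: str
    name: str
    roles: List[str] = field(default_factory=list)


# A rule pairs a predicate over instance info with the checks it adds.
Rule = Tuple[Callable[[Instance], bool], List[str]]

# Checks every host gets by default (names as given in the log).
DEFAULT_CHECKS = ["check_ping", "check_root_space", "check_var_space"]

# Hypothetical opt-in rules; role names and extra check names are made up.
RULES: List[Rule] = [
    (lambda i: "mounts_srv" in i.roles, ["check_srv_space"]),
    (lambda i: "mounts_var_log" in i.roles, ["check_var_log_space"]),
    (lambda i: i.project == "cvn", ["check_puppet_failures"]),
]


def checks_for(instance: Instance, rules: List[Rule]) -> List[str]:
    """Return the default checks plus the checks of every matching rule."""
    checks = list(DEFAULT_CHECKS)
    for matches, extra in rules:
        if matches(instance):
            for check in extra:
                if check not in checks:  # avoid duplicate service checks
                    checks.append(check)
    return checks


host = Instance(project="cvn", name="cvn-app1", roles=["mounts_srv"])  # hypothetical host
print(checks_for(host, RULES))
# ['check_ping', 'check_root_space', 'check_var_space', 'check_srv_space', 'check_puppet_failures']
```

The sketch leaves out two things: generic checks not tied to a single host, and the passive checks that YuviPanda says he has not figured out yet.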
[21:46:58] Krinkle: for example, every host would have 'check_ping, check_root_space, check_var_space' checks defined by default
[21:47:09] Krinkle: and if /srv is mounted (via the role) we'll check that as well, and same for /var/log
[21:47:24] YuviPanda: cool
[21:47:37] Krinkle: projects will have to opt-in to this system (via a puppet patch), and write rules matching the roles applied vs checks desired
[21:47:48] Krinkle: can also have 'generic' checks that aren't associated with a single host
[21:47:54] YuviPanda: what is its output? like graphite/ganglia (history) or more like icinga (alerts)
[21:48:04] Krinkle: http://shinken.wmflabs.org/all username: guest, password: guest.
[21:48:11] Krinkle: like icinga. alerts + dashboard
[21:48:16] alerts will just be email + IRC
[21:49:41] Krinkle: we picked shinken instead of icinga because icinga is kinda deadish now, and also is a very 'monolithic' application (hard to scale horizontally), shinken isn't
[21:49:47] and we might in the future also use shinken in prod
[21:49:53] cool
[21:50:05] YuviPanda: config is web-driven?
[21:50:34] Krinkle: nope, ops/puppet driven. Although it is probably going to be just a simple python file defining rules, so could be in any repo.
[21:51:32] YuviPanda: ok. So it would be relatively trivial (once set up) to add generic checks for all instances in project X and have alerts go to ##my-channel
[21:53:17] Krinkle: yup.
[21:53:48] Krinkle: only unsolved question is 'passive' checks that require something on the host to submit a check result to shinken, since that requires running something on the host in question as well. I haven't fully figured that out.
[21:56:42] hi all - Brian in Berkeley here.. I was curious about this page.. http://commons.wikimedia.org/wiki/Category:SVG_locator_maps_of_countries_in_European_Union_(green_and_grey_scheme)
[21:57:13] darkblue_b: hey! you'd probably get better answers about that at #wikimedia-commons
[21:57:24] oh ah hm
[21:57:54] what are the wikimedia IRC channels .. is there a list ?
[21:58:24] if I was to set up a project VM, it might generate things like that
[21:58:27] darkblue_b: https://meta.wikimedia.org/wiki/IRC/Channels :)
[22:02:11] Hi guys. What happened to WIWOSM and mapnik?
[22:02:49] putnik: that sort of thing is why I am here....
[22:03:34] I build a linux distribution of mapping sfwr, along with a few others...
[22:03:51] we have reference installs of mapnik and many many other things
[22:04:02] completely applicable to build VMs with....
[22:07:23] darkblue_b: if you want to play around with a VM, I suggest requesting a new project (https://wikitech.wikimedia.org/wiki/Help:Contents). We only have vanilla ubuntu available tho
[22:07:50] YuviPanda: it looked like ... you just got a 14.04 image... this week ?
[22:08:05] no, we got it a few months ago
[22:08:09] ok
[22:08:24] *toollabs*, which is more akin to a shared hosting environment, got its first supported trusty instance this week
[22:08:31] ah
[22:26:22] re identity.. I created an account when I first arrived, and received email confirming.. etc.. but now I used that id/pass to login to a Talk: page on commons.wikimedia.org, and it seems that auth does not know the identity
[22:26:44] is this my mistake somehow? or are there multiple, separate worlds that each need an identity ...
[22:27:48] darkblue_b: so wikitech.wikimedia.org is a separate world from... everything else
[22:27:59] everything else (*.wikimedia.org, *.wikipedia.org, wikidata.org, etc) are one world
[22:28:03] and wikitech is a separate one
[22:28:10] I see, thx
[22:28:21] yw
[22:30:56] !log xtools - looks down, users asked on -tech http://tools.wmflabs.org/xtools/
[22:31:10] !log tools - xtools down, users asked on -tech http://tools.wmflabs.org/xtools/
[22:31:38] hello
[22:31:41] is Xtools down?
[22:32:39] Diego_Queiroz: if you go here https://tools.wmflabs.org/ and then find xtools in the list, see the links to the maintainers?
[22:32:54] unfortunately they don't seem to be here at the moment, but i think that's just a timezone thing
[22:33:04] you could ping them on the wiki talk pages
[22:33:14] oh great
[22:33:15] :P
[22:33:41] !seen hedonil
[22:33:41] you probably wanted to use @seen
[22:33:46] @seen hedonil
[22:33:46] mutante: Last time I saw hedonil they were leaving the channel #wikimedia-labs at 9/22/2014 10:02:49 PM (23d30m57s ago)
[22:33:56] @seen cyberpower678
[22:33:56] mutante: Last time I saw cyberpower678 they were quitting the network with reason: Quit: Leaving N/A at 9/27/2014 6:25:07 PM (18d4h8m48s ago)
[22:34:24] they don't appear to be here for a while
[22:34:29] yea..hmm
[22:34:32] i'll try to contact them on the user page
[22:34:40] Diego_Queiroz: there is another way, we can open a Bugzilla bug
[22:34:45] but talk page is good, yea
[22:36:34] well. if it is down it would not be a case for a bug, i guess...
[22:44:46] (PS2) BearND: Update build script for Gradle [labs/tools/wikipedia-android-builds] - https://gerrit.wikimedia.org/r/165375
[22:44:48] (PS2) BearND: Need JAVA_HOME for Gradle [labs/tools/wikipedia-android-builds] - https://gerrit.wikimedia.org/r/165376
[22:44:50] (PS2) BearND: Expand wildcards when copying apk [labs/tools/wikipedia-android-builds] - https://gerrit.wikimedia.org/r/165377
[22:44:52] (PS2) BearND: Output the current time stamp to stdout and stderr [labs/tools/wikipedia-android-builds] - https://gerrit.wikimedia.org/r/165520
[22:45:20] (CR) BearND: [C: 2] Update build script for Gradle [labs/tools/wikipedia-android-builds] - https://gerrit.wikimedia.org/r/165375 (owner: BearND)
[22:45:43] (CR) BearND: [C: 2] Need JAVA_HOME for Gradle [labs/tools/wikipedia-android-builds] - https://gerrit.wikimedia.org/r/165376 (owner: BearND)
[22:46:32] (CR) BearND: [C: 2] Expand wildcards when copying apk [labs/tools/wikipedia-android-builds] - https://gerrit.wikimedia.org/r/165377 (owner: BearND)
[22:46:46] (CR) BearND: [C: 2] Output the current time stamp to stdout and stderr [labs/tools/wikipedia-android-builds] - https://gerrit.wikimedia.org/r/165520 (owner: BearND)
[22:47:22] Krinkle: btw, https://gerrit.wikimedia.org/r/#/c/166902/ adds an archiver script that archives deleted instances. I just ran it manually and it deleted 24 dead host metrics
[22:53:49] !log deployment-prep cleaned up acct and atop logs in deployment-bastion
[23:00:49] Diego_Queiroz: if user pages fail i'd still bug report it, the bug being that it's down
[23:01:09] a service that is unexpectedly not working is a bug to me
[23:01:23] TParis replied to me
[23:01:24] https://en.wikipedia.org/w/index.php?title=User_talk%3ATParis&diff=629775769&oldid=629770123
[23:01:29] ah, quick, nice
[23:01:38] Apparently, there is a problem with wmflabs
[23:02:03] well, if it is really all of toollabs.. then this is the best channel to report it as well
[23:02:07] but is it?
[23:02:32] i see other tools that look normal
[23:02:39] me too
[23:02:57] meh, -> Bugzilla :)
[23:03:03] but I really don't understand the minor workings of tool labs to tell
[23:03:21] that way it doesn't need realtime replies and we can check on it
[23:04:13] it even has a separate component already for XTools :)
[23:05:09] Tool Labs tools / X!'s tools: Xtools offline - https://bugzilla.wikimedia.org/72104 (Diego Queiroz) UNCO p:Unprio s:normal a:None Currently, Xtools is not accessible at all. When I try to access https://tools.wmflabs.org/xtools I get an 504 error: Gateway Time-out from nginx/1.5.0.
[23:05:09] Diego_Queiroz: https://bugzilla.wikimedia.org/show_bug.cgi?id=72105
[23:05:12] ^
[23:05:22] heh
[23:05:24] Tool Labs tools / X!'s tools: http://tools.wmflabs.org/xtools/ down - https://bugzilla.wikimedia.org/72105 (Daniel Zahn) NEW p:Unprio s:normal a:None http://tools.wmflabs.org/xtools/ keeps loading forever currently but nothing appears to happen. as reported by Diego Queiroz on IRC
[23:05:52] done :)
[23:05:58] done as well, duplicate :)
[23:06:11] haha
[23:06:38] Tool Labs tools / X!'s tools: http://tools.wmflabs.org/xtools/ down - https://bugzilla.wikimedia.org/72105#c1 (Daniel Zahn) arr.. ok. duplicate of 72104 :)
[23:07:38] added cyberpower to it
[23:08:27] removed duplicate
[23:08:37] Tool Labs tools / X!'s tools: http://tools.wmflabs.org/xtools/ down - https://bugzilla.wikimedia.org/72105#c2 (Daniel Zahn) NEW>RESO/DUP *** This bug has been marked as a duplicate of bug 72104 ***
[23:08:37] Tool Labs tools / X!'s tools: Xtools offline - https://bugzilla.wikimedia.org/72104#c1 (Daniel Zahn) *** Bug 72105 has been marked as a duplicate of this bug. ***