[00:21:04] Labs, Tool-Labs, Diffusion, Internet-Archive: Copy contents of https://svn.toolserver.org/ to Wikimedia Diffusion - https://phabricator.wikimedia.org/T60801#1960838 (jayvdb) >>! In T60801#1959192, @mmodell wrote: > All but 4 of the repositories imported successfully into subdirectories on rTSVN. I...
[00:22:24] Labs, Tool-Labs: Figure out a way to support java 1.8 on tool labs (Merl's bot) - https://phabricator.wikimedia.org/T121279#1960839 (intracer) Upcoming version of my [[ https://commons.wikimedia.org/wiki/Commons:WLX_Jury_Tool | Commons:WLX Jury Tool ]] will use Play 2.4 which requires Java 8. So I won't b...
[00:25:58] (PS16) Ricordisamoa: Initial commit [labs/tools/faces] - https://gerrit.wikimedia.org/r/192096
[00:26:36] YuviPanda: I've got a bunch more admin changes queued up but I kind of don't want to submit them to gerrit and endure the jenkins -2 spam :/
[00:27:17] * bd808 tries to grok https://phabricator.wikimedia.org/T124618
[00:29:39] (CR) Ricordisamoa: "PS16 uses a specific Exception class in ImageMappedFilePage#parse()" [labs/tools/faces] - https://gerrit.wikimedia.org/r/192096 (owner: Ricordisamoa)
[00:32:32] Labs, Tool-Labs: toollabs dh_test (debian-glue) fails - https://phabricator.wikimedia.org/T124618#1960842 (bd808) So we apparently need to either stub out `/usr/local/bin/log-command-invocation` during the test runs or add a way to suppress calling it entirely.
[00:55:08] (PS1) BryanDavis: Don't call log-command-invocation under Jenkins [labs/toollabs] - https://gerrit.wikimedia.org/r/266166 (https://phabricator.wikimedia.org/T124618)
[00:56:56] (CR) jenkins-bot: [V: -1] Don't call log-command-invocation under Jenkins [labs/toollabs] - https://gerrit.wikimedia.org/r/266166 (https://phabricator.wikimedia.org/T124618) (owner: BryanDavis)
[01:02:11] (CR) Yuvipanda: "hmm, I guess it should just not actually fail when the logging call fails, unconditionally, I guess..." [labs/toollabs] - https://gerrit.wikimedia.org/r/266166 (https://phabricator.wikimedia.org/T124618) (owner: BryanDavis)
[01:12:48] (CR) Tim Landscheidt: [C: -1] "This would make the test suite fail everywhere but in Jenkins. You need to invert the logic: In the test suite, do not log. In the test " [labs/toollabs] - https://gerrit.wikimedia.org/r/266166 (https://phabricator.wikimedia.org/T124618) (owner: BryanDavis)
[01:23:44] (PS2) BryanDavis: Don't call log-command-invocation when testing jsub [labs/toollabs] - https://gerrit.wikimedia.org/r/266166 (https://phabricator.wikimedia.org/T124618)
[01:25:12] (CR) jenkins-bot: [V: -1] Don't call log-command-invocation when testing jsub [labs/toollabs] - https://gerrit.wikimedia.org/r/266166 (https://phabricator.wikimedia.org/T124618) (owner: BryanDavis)
[01:27:29] "No space left on device" blerg
[01:35:49] (CR) BryanDavis: "recheck" [labs/toollabs] - https://gerrit.wikimedia.org/r/266166 (https://phabricator.wikimedia.org/T124618) (owner: BryanDavis)
[01:40:09] (PS1) BryanDavis: Add default styling to error pages [labs/toollabs] - https://gerrit.wikimedia.org/r/266171
[01:40:11] (PS1) BryanDavis: Fix font-size on tool.php and list.php properly [labs/toollabs] - https://gerrit.wikimedia.org/r/266172
[01:40:13] (PS1) BryanDavis: Prevent content/common.inc.php from being viewed [labs/toollabs] - https://gerrit.wikimedia.org/r/266173
[01:40:15] (PS1) BryanDavis: Add support for proper error page testing [labs/toollabs] - https://gerrit.wikimedia.org/r/266174
[01:40:17] (PS1) BryanDavis: Upgrade HTML Purifier to 4.7.0 [labs/toollabs] - https://gerrit.wikimedia.org/r/266175
[01:40:19] (PS1) BryanDavis: Use HTML5 instead of XHTML [labs/toollabs] - https://gerrit.wikimedia.org/r/266176
[01:40:21] (PS1) BryanDavis: Add support for toolinfo.json to tool page [labs/toollabs] - https://gerrit.wikimedia.org/r/266177
[01:40:23] (PS1) BryanDavis: Replace array_key_exists with isset [labs/toollabs] - https://gerrit.wikimedia.org/r/266178
[03:49:27] Quarry: Show desktop notification when a query is done - https://phabricator.wikimedia.org/T124625#1960941 (APerson) NEW
[08:30:22] Labs, MediaWiki-Vagrant, VisualEditor: VisualEditor not working in labs-vagrant - https://phabricator.wikimedia.org/T124575#1961143 (Yurik) Possible - i only created a webproxy vem3.wmflabs.org:80 to my instance:8080, without any additional port forwarding. The docker instance added a number of por...
[09:45:53] Tool-Labs-tools-Other, DBA: tools.tools-info credentials are not functioning - https://phabricator.wikimedia.org/T105911#1961249 (jcrespo) @valhallasw poke labs admins.
[09:47:15] Labs, Tool-Labs, Tool-Labs-tools-Other, DBA: tools.tools-info credentials are not functioning - https://phabricator.wikimedia.org/T105911#1961259 (valhallasw) @yuvipanda?
[10:32:34] Labs, Tool-Labs, Tool-Labs-tools-Other, DBA: tools.tools-info credentials are not functioning - https://phabricator.wikimedia.org/T105911#1961366 (yuvipanda) a:yuvipanda I'll regen it and document the process this week.
[11:37:27] Labs, Tool-Labs: lighttpd does not correctly close connections (CLOSE_WAIT) - https://phabricator.wikimedia.org/T104799#1961507 (Nemo_bis) This killed erwin85 tools today. ``` 2016-01-21 18:47:39: (log.c.166) server started 2016-01-21 19:31:10: (log.c.166) server started 2016-01-21 19:31:44: (server.c.15...
[11:52:35] Labs, Labs-Infrastructure, operations, Patch-For-Review: mail from testlabs to ops list - https://phabricator.wikimedia.org/T124516#1961519 (akosiaris) p:Triage>Normal
[11:52:42] Labs, Labs-Infrastructure, operations, Patch-For-Review: mail from testlabs to ops list - https://phabricator.wikimedia.org/T124516#1958596 (akosiaris) p:Normal>High
[14:11:44] forgot to mention i was running quite heavy select queries one after the other on the db replica ; if anybody notices any slowdown, please tell me and i'll stop them immediately - i'd have to find another way, which would prove quite difficult, but they're not particularly urgent
[14:18:03] Why is this tool loading shit from Google and CloudFlare? https://tools.wmflabs.org/family/ancestors.php?q=P1733
[14:21:58] Alphos: which tool or db user are you using?
[14:22:53] jynus wikidatawiki_p and one other per query, depending on the wiki i'm exploring
[14:23:10] trying to get items pointing to pages that redirect to other pages which relate to another item
[14:23:35] https://github.com/alaefin/wikidata-redirects-conflicts-reports/blob/master/sprintf.sql the query looks something like that
[14:24:03] "%1$s" would be replaced by "bgwiki", or "frwiki"...
[14:24:40] the tool itself is wikidata-redirects-conflicts-reports (sorry, never been great at finding funny names)
[14:26:43] i've created a scatterplot of number of pages on that wiki / time taken in seconds http://alphos.fr/wikidata/reportstimescatter.html the biggest one so far is arwiki, started about 4 hours ago, which took about 69 minutes
[14:27:34] yeah, it is ok, it is within the limits of acceptable - more so if you are proactive announcing it here
[14:27:43] thank you!
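Editor's sketch (Python): the per-wiki report described above puts a wiki name such as `bgwiki` or `frwiki` into the `%1$s` placeholder of a single SQL template and records how long each run takes. The loop below is a minimal version of that under a few assumptions. `execute` is a hypothetical caller-supplied function, for example a thin wrapper around a MySQL cursor on the replica. The template in the test is made up and is not the contents of the linked sprintf.sql. The loop runs one wiki at a time, so only one heavy query is on the replica at any moment.

```python
import time


def run_reports(wikis, template, execute):
    """Run the report query for each wiki, strictly one after another.

    `template` contains the sprintf-style placeholder %1$s where the wiki
    name goes. `execute` is a hypothetical callable that sends one SQL
    string to the replica and returns its rows.
    Returns {wiki: (rows, seconds_taken)}.
    """
    results = {}
    for wiki in wikis:
        # Python's % operator does not understand the positional %1$s form,
        # so substitute the placeholder literally.
        sql = template.replace("%1$s", wiki)
        start = time.monotonic()
        rows = execute(sql)
        results[wiki] = (rows, time.monotonic() - start)
    return results
```

Pairing the returned timings with each wiki's page count gives the data for a pages-vs-time scatterplot like the one linked above.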
[14:27:48] jynus no, thank you :)
[14:27:53] talking of which
[14:28:58] i intend to run that reporting tool about once every month or so, possibly in a cronjob, on all wikis with wikilinks (including enwiki and commonswiki) ; would that still be within the limits of acceptable ?
[14:29:53] as i said, i could find another, slower way for bigger wikis ; not sure it'd need less resources in total, but it would spread their consumption
[14:29:54] that depends on how they are done
[14:30:03] if possible, serialize the execution
[14:30:32] that would be the slower way - and it would likely take a lot of time :/
[14:31:06] that is the point- not consuming all resources of the server at the same time with 30 threads :-)
[14:31:23] arwiki was a bit of a test, but so far, for wikis up to 100k pages, it's much faster
[14:32:59] heh, i get that ^^ point is it will then require quite a lot of back-and-forth between (likely php) and mysql, possibly with a prepare() that wouldn't take much less time - that's part of the issue, since none of the queries technically share structure, as each request prompts a different db
[14:33:28] much less time than the current straight query, i mean
[14:34:15] it's obviously worth exploring - if i hit the db for more than an hour with the current setup, i'll sure kill it for bigger ones
[14:41:51] (PS3) Tim Landscheidt: Don't call log-command-invocation when testing jsub [labs/toollabs] - https://gerrit.wikimedia.org/r/266166 (https://phabricator.wikimedia.org/T124618) (owner: BryanDavis)
[14:43:34] (CR) Tim Landscheidt: "PS3: Added debian/changelog entry and used the same condition in both instances." [labs/toollabs] - https://gerrit.wikimedia.org/r/266166 (https://phabricator.wikimedia.org/T124618) (owner: BryanDavis)
[14:45:05] (CR) Tim Landscheidt: [C: 2] Don't call log-command-invocation when testing jsub [labs/toollabs] - https://gerrit.wikimedia.org/r/266166 (https://phabricator.wikimedia.org/T124618) (owner: BryanDavis)
[14:57:44] i can't connect to replicas db from my instance with mysql commandline
[14:58:05] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string 'Magnus' not found on 'http://tools.wmflabs.org:80/' - 323 bytes in 0.003 second response time
[14:59:46] grid seems down _again_
[15:03:08] grid sucks
[15:06:59] YuviPanda Coren petan whichever of you guys are online
[15:07:09] hello
[15:07:45] $ qstat
[15:07:45] error: unable to send message to qmaster using port 6444 on host "tools-grid-master.tools.eqiad.wmflabs": can't resolve host name
[15:07:56] could you fix that ^
[15:08:20] and lots of webservices broken
[15:08:30] good grief
[15:08:30] I am sorry but I am really busy atm... I am afraid this isn't anything that can be fixed in 10 sec or something like that
[15:08:37] ok
[15:09:13] let me check if there isn't some dns issue
[15:09:44] hmm looks like that
[15:09:48] where is valhallasw
[15:10:00] I think there is some dns issue petan, what are you doing to check?
[15:10:21] chasemp: I am just trying random servers to see if they can resolve or not
[15:10:26] (PS2) Tim Landscheidt: Add default styling to error pages [labs/toollabs] - https://gerrit.wikimedia.org/r/266171 (owner: BryanDavis)
[15:10:29] I just noticed that ORES is down and I can't ssh to the ores-web-01.eqiad.wmflabs.
[15:10:34] Happened a minute ago.
[15:10:38] Some restarts happening?
[15:10:43] (CR) jenkins-bot: [V: -1] Add default styling to error pages [labs/toollabs] - https://gerrit.wikimedia.org/r/266171 (owner: BryanDavis)
[15:11:12] halfak: looks like a possible dns issue not sure yet, I would imagine it's not ores's fault tho
[15:11:20] labs dns is clearly down
[15:11:25] OK. Will hold tight for a bit.
[15:11:27] unfortunately nothing I have access to
[15:11:57] * halfak is super happy any time he gets a page and is already at his desk
[15:12:17] working on balanced class issues and score consistency in ORES this morning.
[15:12:30] petan, quick Q if you have a minute.
[15:12:51] Do you set any thresholds for ORES scores in Huggle or just sort/report the score?
[15:13:11] RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 772579 bytes in 2.970 second response time
[15:13:25] And we're back!
[15:13:35] \o/
[15:13:47] halfak: what sorts of thresholds you mean? like don't display pages below score N?
[15:13:53] Yeah.
[15:14:04] Or flag revisions with score > N with a color
[15:14:15] I restarted pdns entirely on labservices
[15:14:22] so if things just came back that was clearly the culprit
[15:14:38] Labs, Tool-Labs, User-bd808: toollabs dh_test (debian-glue) fails - https://phabricator.wikimedia.org/T124618#1961783 (scfc) Open>Resolved a:scfc>bd808
[15:14:39] Thanks chasemp
[15:15:01] halfak: no, I don't, in fact it could be possible (in theory) that a page would have so low score (<= -800) that editor would be whitelisted, in that case all next edits by same editor would be
[15:15:01] ignored (unless user chooses not to disable whitelist)
[15:15:35] but that can't happen right now because "amplifier" for ores scores is so low that ores can't classify edits by such huge score
[15:16:07] petan, OK. I'm currently working on a change that will make our probability estimates make much more sense, but it will also make the estimates much lower in general.
[15:16:09] edits with very high score would however always have different icon in queue (with exclamation mark) if that is what you meant by color
[15:16:19] E.g. an edit that would have scored 92% might now score 33%.
[15:16:33] for that reason I implemented offset constant that can be increased to shift the value
[15:16:50] there are 2 variables in huggle config that can change ores scores
[15:16:53] amplifier and offset
[15:17:14] petan, gotcha. So we might need to change the amplifier if the range of the probability estimates suddenly shifted substantially.
[15:17:23] amplifier changes the scale, eg higher the amplifier higher the score, offset shifts the "zero"
[15:17:31] (CR) Tim Landscheidt: "recheck" [labs/toollabs] - https://gerrit.wikimedia.org/r/266171 (owner: BryanDavis)
[15:17:46] Yeah. Seems like we'd need to do both then.
[15:18:10] The new zero would be X% where X is the percentage of vandalism over the entire set of edits.
[15:18:28] the problem here is that these two constants are stored in local config and there is no way to change them remotely for users :P, but I suppose nobody modified them by hand so they use default and
[15:18:29] with next release the default would change if I change it in source
[15:18:39] have to go now
[15:19:04] OK. Will think about this more and let you know anything well in advance.
[15:19:06] Thanks petan
[15:19:07] o/
[15:22:39] yw
[15:22:58] you can always send me a mail if you need to discuss it more deeply
[15:26:44] (CR) Tim Landscheidt: [C: 2] "Did not test this, but looks good enough to me." [labs/toollabs] - https://gerrit.wikimedia.org/r/266171 (owner: BryanDavis)
[15:27:13] (PS2) Tim Landscheidt: Fix font-size on tool.php and list.php properly [labs/toollabs] - https://gerrit.wikimedia.org/r/266172 (owner: BryanDavis)
[15:32:24] (CR) Tim Landscheidt: [C: 2] "Did not test this, but looks good enough to me." [labs/toollabs] - https://gerrit.wikimedia.org/r/266172 (owner: BryanDavis)
[15:33:30] (PS2) Tim Landscheidt: Prevent content/common.inc.php from being viewed [labs/toollabs] - https://gerrit.wikimedia.org/r/266173 (owner: BryanDavis)
[15:34:01] chasemp: still erroring: $ qstat
[15:34:01] error: commlib error: got select error (Connection refused)
[15:34:02] error: unable to send message to qmaster using port 6444 on host "tools-grid-master.tools.eqiad.wmflabs": got send error
[15:34:30] $ ssh tools-bastion-01.eqiad.wflabs
[15:34:30] ssh_exchange_identification: read: Connection reset by peer
[15:34:55] now?
[15:34:56] oops the second one was a typo
[15:35:09] yeah first one now
[15:35:23] or unstable
[15:35:28] (PS2) Tim Landscheidt: Add support for proper error page testing [labs/toollabs] - https://gerrit.wikimedia.org/r/266174 (owner: BryanDavis)
[15:35:50] like a few seconds ago erroring
[15:36:29] (CR) Tim Landscheidt: [C: 2] "Did not test this, but looks good enough to me." [labs/toollabs] - https://gerrit.wikimedia.org/r/266173 (owner: BryanDavis)
[15:37:47] (CR) Tim Landscheidt: [C: 2] "Did not test this, but looks good enough to me." [labs/toollabs] - https://gerrit.wikimedia.org/r/266174 (owner: BryanDavis)
[15:39:21] (PS2) Tim Landscheidt: Upgrade HTML Purifier to 4.7.0 [labs/toollabs] - https://gerrit.wikimedia.org/r/266175 (owner: BryanDavis)
[15:39:35] (CR) Tim Landscheidt: [C: 2] "Did not test this, but looks good enough to me." [labs/toollabs] - https://gerrit.wikimedia.org/r/266175 (owner: BryanDavis)
[15:42:16] (PS2) Tim Landscheidt: Use HTML5 instead of XHTML [labs/toollabs] - https://gerrit.wikimedia.org/r/266176 (owner: BryanDavis)
[15:43:08] (PS2) Tim Landscheidt: Add support for toolinfo.json to tool page [labs/toollabs] - https://gerrit.wikimedia.org/r/266177 (owner: BryanDavis)
[15:44:34] (CR) Tim Landscheidt: [C: 2] "Did not test this, but looks good enough to me." [labs/toollabs] - https://gerrit.wikimedia.org/r/266176 (owner: BryanDavis)
[15:45:51] (CR) Tim Landscheidt: [C: 2] "Did not test this, but looks good enough to me." [labs/toollabs] - https://gerrit.wikimedia.org/r/266177 (owner: BryanDavis)
[15:46:58] Labs, MediaWiki-Vagrant, VisualEditor: VisualEditor not working in labs-vagrant - https://phabricator.wikimedia.org/T124575#1961886 (mobrovac)
[15:59:08] (CR) Tim Landscheidt: "I'm not a huge fan of this, the reason concisely expressed in the commit message: "The difference between isset() and array_key_exists() i" [labs/toollabs] - https://gerrit.wikimedia.org/r/266178 (owner: BryanDavis)
[16:43:47] jynus can i try hitting wikis > 250k one at a time with the same query now ? i think i've done all < 250k, with the longest of them being bewiki at 112k and 10min7sec ; and the biggest being euwiki at 230k, 2min26sec http://alphos.fr/wikidata/reportstimescatter.html ; reminding that arwiki (406k) took 70 minutes
[16:44:33] question is, if you do only need one query each month, you can wait, right?
[16:44:52] one per wiki ;)
[16:45:21] but yes, it can definitely wait - it has this far :)
[16:45:37] petan: during the dns outage, were you resolving public or internal names? tools.wmflabs.org or tools-bastion01.eqiad.wmflabs or tools-bastion01?
[16:54:15] i honestly think arwiki was a fluke ; not sure why it took so long relatively to what appears to be much less complex for most other wikis (which seem to take between O(log) and O(n^2) on their respective pagecount) ; other apparent flukes include tlwiki, hiwiki, bewiki, and possibly thwiki and simplewiki ; i suspected bytes/char used for page names, but simplewiki and tlwiki clearly can't make that true (both mostly ascii) ; maybe item count ?
[16:54:15] or *conflicting* item count (or I/O to write the reports, measured together with the sql) ?
[17:12:13] chasemp: andrewbogott thanks for taking care of it!
[17:17:12] Tool-Labs-tools-Other, Community-Tech, Community-Wishlist-Survey, Milestone: Pageview Stats tool - https://phabricator.wikimedia.org/T120497#1962139 (Milimetric) >>! In T120497#1959020, @Lokal_Profil wrote: > In addition to any programming the students are tasked with doing various rounds of technic...
[17:24:33] Labs, Labs-Infrastructure, operations, Patch-For-Review: mail from testlabs to ops list - https://phabricator.wikimedia.org/T124516#1962167 (Dzahn) also let's stop using ops@ wherever we see it and use ops@lists , the proper list address
[17:29:00] Labs, Tool-Labs, Patch-For-Review, User-bd808: Make error pages mobile friendly - https://phabricator.wikimedia.org/T119830#1962172 (bd808) Open>Resolved a:bd808 I think I've fixed things up quite a bit for the error pages. See https://tools.wmflabs.org/this_is_an_intentional_404/ as an ex...
[18:01:23] Labs, Labs-Infrastructure, operations, Patch-For-Review: mail from testlabs to ops list - https://phabricator.wikimedia.org/T124516#1962320 (Andrew) > ops@lists @dzahn, can you clarify? What is the full email address I should be using here?
[18:04:27] Labs, Labs-Infrastructure, operations, Patch-For-Review: mail from testlabs to ops list - https://phabricator.wikimedia.org/T124516#1962345 (Dzahn) >>! In T124516#1962320, @Andrew wrote: > @dzahn, can you clarify? What is the full email address I should be using here? ops@lists.wikimedia.org plea...
[18:06:04] Labs: Figure out how to deal with SSL cert issues for kubernetes masters - https://phabricator.wikimedia.org/T119814#1962357 (yuvipanda) I should just setup SANs for the kubernetes domains into the certificate used by the k8s master.
[18:08:08] Labs: Periodic internal labs dns outages - https://phabricator.wikimedia.org/T124680#1962377 (Andrew) NEW a:Andrew
[18:09:41] Labs, Labs-Infrastructure, operations, Patch-For-Review: mail from testlabs to ops list - https://phabricator.wikimedia.org/T124516#1962395 (Andrew) a:Andrew
[18:21:39] Labs, wikitech.wikimedia.org, Patch-For-Review: Decide on future of Semantic extensions on Wikitech - https://phabricator.wikimedia.org/T123599#1962453 (Bugreporter) Is this a duplicate of {T53642}?
[19:03:06] I'm trying to check my security groups, but I can't see the “Labs Projectadmins” section on wikitech.wikimedia.org. Apparently I need to be a projectadmin - does it mean an admin of a project on labs? If so, I am one (my project is alkamidbot), but I still can't see the link
[19:06:48] !log tools kill python merge/merge-unique.py tools-exec-1213 as it seemed to be overwhelming nfs
[19:06:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[19:09:56] oh, apparently a projectadmin is something else. Can I set up a mediawiki-vagrant as an ordinary user then?
[19:32:55] andrewbogott: I am back, during outage I just tried resolving stuff from labs (not to labs) I tried internal names and also normal records for 3rd sites. Internal didn't work, others did work but
[19:32:55] took long time to resolve.
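Editor's sketch (Python): petan reports that during the outage internal names failed outright, while external names still resolved but slowly. The probe below checks for both symptoms by recording success and elapsed time for each name. It is an illustrative helper, not an existing Labs tool.

```python
import socket
import time


def check_resolution(hostnames):
    """Resolve each name once and return {name: (resolved_ok, seconds_taken)}.

    Latency is recorded alongside success because a resolver that is
    struggling may still answer, only slowly.
    """
    results = {}
    for name in hostnames:
        start = time.monotonic()
        try:
            socket.getaddrinfo(name, None)
            ok = True
        except socket.gaierror:
            ok = False
        results[name] = (ok, time.monotonic() - start)
    return results
```

For example, `check_resolution(["tools-grid-master.tools.eqiad.wmflabs", "tools.wmflabs.org"])` uses names taken from this log. Run from an instance during an outage like this one, it would be expected to show the first name as unresolved and the second as resolved but slow.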
[19:48:53] alkamid: nope, you can't setup mediawiki-vagrant on tools
[19:49:14] https://wikitech.wikimedia.org/wiki/Help:MediaWiki-Vagrant_in_Labs
[19:49:29] andrewbogott: chasemp am going to switch cron to new hosts unless you guys object...
[19:49:36] should have no user facing impact
[20:07:46] no objection :)
[20:15:22] kkk am doin it now
[20:15:22] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:15:28] I didn't do anything yet
[20:15:33] I just ssh'd into places and NFS is hung
[20:15:39] is it still hung?
[20:15:41] no
[20:15:43] it just came back
[20:15:55] still a bit slow
[20:15:56] I restarted the nfsd I !log'd it in -ops
[20:16:15] need a way to !log from here and hit SAL when it's more labs relevant
[20:16:21] hard to keep a real close eye on -ops
[20:16:27] ah
[20:16:30] ok
[20:16:33] that might have been it
[20:16:48] it took like a few secs for all clients to reconnect
[20:16:51] seemingly
[20:17:01] yeah
[20:17:03] seems ok now
[20:17:11] sorry I should have waited / coordinated that better
[20:17:15] I was in my own little world
[20:17:24] should I go on with the cron stuff or do you want me to hold off till you're done?
[20:17:30] np :D
[20:17:31] nope I'm sitting for a bit go for it
[20:17:37] I won't change any more things out from under you :)
[20:17:54] kkk :D
[20:19:00] petan: ok, thanks, that’s consistent with what I was seeing as well
[20:19:36] sigh, a 'vim filename' takes like 10s now, but that's probably because of my .vimrc
[20:19:42] but it did use to be instant
[20:20:08] RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 771985 bytes in 3.282 second response time
[20:24:28] Labs: Periodic internal labs dns outages - https://phabricator.wikimedia.org/T124680#1963233 (Andrew) Three times in the last week I've gotten alert storms like this: tools-exec-1409 : Jan 25 14:59:15 : diamond : unable to resolve host tools-exec-1409 This seems to be due to a temporary failure of the inter...
[20:26:40] YuviPanda: it's possible there is a regression? it was a pretty trivial bump since say early friday but it's not impossible
[20:26:59] my own testing says I get around the same performance from a client but it seems maybe more stable across clients
[20:27:00] idk yet
[20:27:59] there has been a small side effect of upping the thread pool which is clients are succeeding more in getting their I/O scheduled
[20:28:01] chasemp: yeah, it was this slow last week too
[20:28:11] so probably not whatever you did on friday
[20:28:13] and thus the contention for I/O has increased some under heavy load
[20:28:15] ok
[20:30:32] !log tools switched over cron host to tools-cron-01, manually copied all old cron files from tools-submit to tools-cron-01
[20:30:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[20:33:08] switchover done!
[20:33:14] things seem to be firing from tools-cron-01
[20:33:31] sweet
[20:38:44] nope it's all good
[20:38:51] * YuviPanda calls that task done
[20:38:58] I'll keep tools-submit around for a few more days
[20:39:36] Labs, Tool-Labs, Patch-For-Review: Migrate tools-submit to tools-cron-01/-02 - https://phabricator.wikimedia.org/T123873#1963301 (yuvipanda) Open>Resolved This is all done now! \o/
[20:39:38] Labs, Tool-Labs: tools.taxonbot and tools.giftbot cronjobs not firing - https://phabricator.wikimedia.org/T123186#1963303 (yuvipanda)
[20:50:39] Labs, Tool-Labs, Patch-For-Review: Migrate tools-submit to tools-cron-01/-02 - https://phabricator.wikimedia.org/T123873#1963368 (yuvipanda) I've left tools-submit as is (with crontabs moved to crontabs.bak so they don't accidentally fire) in case we need to recover anything in the next few days.
[20:52:14] What's happened
[20:52:25] Cron daemon on tool lab is spamming my inbox
[20:52:40] YuviPanda: ^
[20:53:36] uh oh
[20:53:42] liangent: what's it spamming you with
[20:54:18] .
[20:54:48] I'm copy pasting the error mail from gmail but it's not working
[20:54:50] Why
[20:55:05] heh, are project owners getting the cron mail now :?:))
[20:55:10] https://etherpad.wikimedia.org/p/cronerror
[20:55:14] liangent : your hardware is after you, it's the uprising :p
[20:55:18] instead of root@.. that would be really cool
[20:55:34] Anyway I'm away from computers and reading mails on my phone
[20:56:25] hmm
[20:56:34] mutante: chasemp do you know how I can read contents of outgoing mail?
[20:56:37] http://pastebin.com/3sQe5M8P
[20:56:37] if at all?
[20:56:41] I see mail going out to a bunch of tools
[20:56:43] thanks liangent
[20:56:44] YuviPanda: ^
[20:57:02] YuviPanda: ngrep will dump the contents I think it will be messy-ish
[20:57:10] ngrep -W byline port 25
[20:57:12] maybe
[20:57:16] what host?
[20:57:24] chasemp: tools-cron-01
[20:57:49] hmm
[20:57:52] so
[20:57:54] /data/project/liangent-php/php
[20:57:56] indeed
[20:57:58] does not exist
[20:58:10] But I dunno why it's calling that
[20:58:49] yeah
[20:58:55] am looking at your crontab now
[20:58:59] it should only be calling jsub
[20:58:59] is it trying to launch the cron now
[20:59:04] and not doing jsub
[20:59:05] :)
[20:59:16] I see another
[20:59:19] data/project/wahrani/php: No such file or directory.
[20:59:22] so it may be a thing
[20:59:33] ooh
[20:59:35] data/project/cluebot/php: No such file or directory
[20:59:37] uh oh :)
[20:59:40] yeah
[20:59:46] and in fact
[20:59:51] that error does seem to be coming from jsub
[20:59:57] since if I run it from commandline
[20:59:59] I get the same
[21:00:01] hmm
[21:00:20] except it's fine on tools-bastion-01
[21:00:30] hm nfs seems ok
[21:00:33] aaaah
[21:00:35] I know the problem
[21:00:52] some path alias thing or something?
[21:01:00] sooooo
[21:01:03] it looks like
[21:01:08] jsub checks to see if it can find executable
[21:01:09] on current host
[21:01:14] before submitting job
[21:01:16] and php wasn't installed
[21:01:18] on tools-cron-01
[21:01:25] well that's kinda odd
[21:01:27] while the previous exec node had *all* the packages
[21:01:37] that exec nodes had
[21:01:41] liangent: your cronspam should be fixed now
[21:01:45] chasemp: yeah, I agree
[21:01:53] chasemp: since it's submitting them to the grid, not running it locally
[21:01:58] I mean it doesn't prove anything since nothing should be run there....
[21:01:59] yeah
[21:02:07] so I suppose we rip it out of jsub
[21:02:17] I've installed php right now to get around this
[21:02:26] maybe it was a testing relic idk
[21:02:38] yeah
[21:02:44] but I think ripping it out is the right thing to do
[21:02:51] I'm cool w/ it
[21:03:06] yeah
[21:03:14] esp. the alt may be running down every env or interpreter for every job
[21:03:17] for really no value
[21:03:22] YuviPanda: I'm still getting new mails
[21:03:34] The same content
[21:04:35] liangent: whoops, apparently installing php5-common isn't enough
[21:04:37] installing php5 now
[21:04:42] this ofc installs apache
[21:04:49] liangent: should stop *now*
[21:04:52] or not
[21:04:58] apparently that doesn't give you 'php' either
[21:04:59] YuviPanda php5-cli not part of php5-common, is it ?
[21:05:13] yeah
[21:05:16] it isn't
[21:05:18] am installing it now
[21:05:23] now it works
[21:05:26] guessing this might cause that :D
[21:05:29] yeah
[21:05:33] ^^
[21:05:46] chasemp: ugh, there's a lotta complex-ish logic in jsub that assumes that you can use which to find the full path of things
[21:05:53] i think i've spent way too much time helping people in ##php :D
[21:05:54] and I don't know what changing that will break.
[21:05:56] Alphos: :D
[21:05:59] Still not fixed
[21:06:07] YuviPanda: I'm also fine w/ installing things
[21:06:09] liangent: are they with new timestamps?
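Editor's sketch (Python): the cronspam above happened because jsub resolves a job's executable on the host that submits it (tools-cron-01), not on the grid node that will run it. A missing `php` binary on the cron host was therefore enough to reject jobs. The function below re-creates that check under stated assumptions. `shutil.which` stands in for the `which` lookup mentioned in the log. The fallback to a path relative to the tool's directory is inferred from errors like `/data/project/liangent-php/php: No such file or directory` and is not taken from jsub's actual Perl source. `resolve_executable` is a hypothetical name.

```python
import os
import shutil


def resolve_executable(command, cwd):
    """Resolve `command` to an absolute path on the *submitting* host.

    Hypothetical re-sketch of the behaviour seen in the log:
    1. look the command up on PATH, as `which` does;
    2. otherwise treat it as relative to `cwd` (inferred, not confirmed);
    3. fail before anything is sent to the grid if neither exists.
    """
    found = shutil.which(command)
    if found is not None:
        return found
    candidate = os.path.join(cwd, command)
    if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    raise FileNotFoundError(f"{candidate}: No such file or directory")
```

Removing this check from the submit path, as discussed above, would defer the "is the interpreter installed?" question to the exec node that actually runs the job.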
[21:06:09] Last mail 16:05 ET
[21:06:14] right
[21:06:16] I mean in theory this should be installed via puppet but I guess not
[21:06:35] it's not terrible to be able to quick test a job
[21:06:39] but it's a weird model
[21:06:53] chasemp: it was being installed via puppet, I just didn't put it into the new class since I was like 'this is probably a leftover from an earlier time, surely it is not needed now'
[21:07:02] ahhhh
[21:07:05] so I can just add it into puppet and feel sad about installing the whole world
[21:07:32] liangent: yeah, it was fixed past that, let me know if you get any mails newer than :06
[21:07:59] chasemp: so I'm tempted to just install that via puppet again, and just keep in mind to kick this when we redo jsub to not be perl
[21:08:09] cool w/ me
[21:08:16] it's not harmful per se just odd
[21:08:52] yeah
[21:08:53] well
[21:09:02] it does increase puppet run times but that is ok I guess
[21:10:32] we'll tear it all out as we can :)
[21:10:46] yeah
[21:12:00] so YuviPanda what was jlocal?
[21:12:02] see labs-l
[21:12:51] chasemp: so remember how crontab script forces you to always use jsub?
[21:12:53] that's a lie
[21:12:57] ha
[21:12:58] if you prepend 'jlocal' to your command
[21:12:59] ok
[21:13:00] it lets it be
[21:13:03] jlocal is just...
[21:13:17] exec $@
[21:13:19] literally
[21:13:20] under what circumstances were people using jlocal?
[21:13:27] they have a shell script
[21:13:30] that does jsubs for them
[21:13:33] based off some criteria
[21:13:46] oic
[21:14:11] array jobs >.> because you would not implement that in jsub!!!
[21:15:04] YuviPanda: seems fixed
[21:15:12] No more mails in last 10 minutes
[21:15:13] liangent: \o/ thanks for reporting
[21:15:30] And thanks for fixing
[21:16:03] BTW can you configure http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-labs/20160125.txt
[21:16:12] To be served with content type text/plain
[21:16:22] let this be a reminder to all who should install php5-cli ! \o/
[21:16:29] So at least it can be viewed easier on phones
[21:17:51] that's already the case
[21:18:15] the weird chars you see are the ones the bots use to make their messages more colourful
[21:18:50] chasemp: I think the person who is seeing a missing jlocal
[21:18:52] is seeing it because
[21:18:57] old host had it both on /usr/bin
[21:19:00] and in /usr/local/bin
[21:19:02] while new one
[21:19:05] only has latter
[21:19:19] don't shoot me when I ask why the dupe paths on old :)
[21:19:22] liangent : there's a freenode channel mode to strip them off channel-wide, but i kinda think non-coloured bot messages would be quite difficult to read, and it won't fix the previous logs :/
[21:19:35] chasemp go ahead, ask, make my day ! :p
[21:19:51] chasemp: I don't actually know. it was unpuppetized
[21:19:55] root@tools-cron-01:/var/spool/cron/crontabs# grep 'jlocal' * | wc -l
[21:19:58] 60
[21:20:00] * YuviPanda sobs a bit
[21:20:28] maybe we force jlocal to nice out and put a timeout on them for sanity
[21:20:31] chasemp: hmm, I guess maybe /usr/local/bin is not in cron PATH
[21:20:35] Alphos: nope I mean returning content-type: text/plain in http header
[21:20:53] chasemp: yeah, that's stuff I wanted to do. and also enforce that they can't spawn any processes other than jsub
[21:20:55] liangent yes, that's already the case, i just checked
[21:20:56] (or qsub)
[21:21:00] just never got to it
[21:21:02] hmm
[21:21:09] ok thanks man
[21:21:35] I think I'll have to just say fuck it and put jlocal in /usr/bin too
[21:21:37] HTTP/1.1 200 OK /// Server: nginx/1.9.4 /// Date: Mon, 25 Jan 2016 21:17:27 GMT /// Content-Type: text/plain /// etc
[21:21:38] so CRON can find it
[21:22:02] Alphos: huh but why chrome insists on downloading it
[21:22:24] YuviPanda: ln -s ?
[21:22:40] what does nice out mean?
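Editor's sketch (Python): in this log `jlocal` is literally `exec $@`, and chasemp suggests forcing it to "nice out" and adding a timeout. Running a process under `nice` raises its nice value, which lowers its CPU scheduling priority so that other work on the host is served first. Below is a hypothetical hardened wrapper along those lines. The increment of 10 and the 300-second limit are illustrative values, not Tool Labs settings.

```python
import os
import subprocess

# Illustrative defaults, not actual Tool Labs values.
NICE_INCREMENT = 10
TIMEOUT_SECONDS = 300
TIMEOUT_EXIT_STATUS = 124  # the status coreutils `timeout` uses


def _lower_priority():
    # Runs in the child after fork() and before exec(): raising the nice
    # value lowers the child's CPU priority, not the wrapper's.
    os.nice(NICE_INCREMENT)


def jlocal(argv, timeout=TIMEOUT_SECONDS):
    """Run argv on the local host at reduced priority with a hard time limit.

    Returns the command's exit status, or TIMEOUT_EXIT_STATUS if it ran
    longer than `timeout` seconds and had to be killed.
    """
    try:
        completed = subprocess.run(argv, preexec_fn=_lower_priority, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed and reaped the child here.
        return TIMEOUT_EXIT_STATUS
    return completed.returncode
```

A small script on the cron host could then call `jlocal(sys.argv[1:])` in place of the bare `exec $@`. Restricting which programs such a wrapper may start, for example only `jsub` or `qsub`, would be a separate check on `argv[0]`.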
[21:22:41] chasemp: yeah, am doing that but via puppet
[21:22:45] liangent that's between you and your browser :/ could be chrome doesn't like the non-UTF-8 non-ASCII chars
[21:23:09] YuviPanda: tx
[21:23:51] chasemp: k no
[21:23:53] err
[21:23:55] np
[21:24:22] Alphos: hmm okay
[21:25:37] (well, technically they are ASCII and thus UTF-8, but meh)
[21:27:34] (control chars, U+0002, U+0003, U+001D, U+001F)
[21:36:50] * anomie wishes for review on https://gerrit.wikimedia.org/r/#/c/264440/
[21:38:16] sorry about the delay, anomie
[21:38:18] merged
[21:38:34] Yay, one less obstacle to making AnomieBOT use OAuth
[21:41:26] starting deploy of OCG on labs.
[21:42:02] cscott: I presume you mean betacluster :)
[21:42:11] uh, yeah.
[21:42:13] usually logs for that go into #wikimedia-releng
[21:42:16] err
[21:42:18] deployment-pdf01.deployment-prep.eqiad.wmflabs
[21:42:21] !logs for that I mean
[21:42:26] yeah. betacluster :)
[21:43:13] oh, https://wikitech.wikimedia.org/wiki/OCG#Deploying_the_latest_version_of_OCG says !log here. we can correct that if that's not the best place.
[21:44:04] let me edit
[21:44:07] thanks for pointing that out
[21:44:55] cscott: updated!
[21:44:55] i think the Parsoid deploy instructions matched that, once upon a time, but now that we have automatic deploys to beta that text probably doesn't live there anymore. i'll double-check to be sure.
[21:45:34] (well, git deploy is persistently telling me "0/2 minions completed fetch", so maybe i'm not deploying to beta right now after all.)
[21:47:20] or maybe you are!11
[21:47:24] the magic of git deploy
[21:49:40] YuviPanda: deployment-pdf01 also says "The last Puppet run was at Fri Jan 22 21:55:09 UTC 2016 (4298 minutes ago)."
[21:49:43] which doesn't seem right
[21:50:15] it is entirely possible puppet has been broken there for a few days
[21:50:19] git-deploy status is "last-return: 72 mins ago", which is well before I started to run git-deploy.
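The "weird chars" in the wm-bot logs are IRC formatting control codes. The four listed above are bold (U+0002), colour (U+0003), italic (U+001D) and underline (U+001F). A colour code is normally followed by one or two digits for the foreground and, optionally, a comma and one or two digits for the background. A filter that drops only the U+0003 byte therefore leaves stray digits glued to the text. The following is a minimal Python sketch for cleaning a log line before reading it. As an assumption beyond the chat, it also drops U+000F (reset) and U+0016 (reverse), which belong to the same mIRC-style scheme. It does not handle hex colours (U+0004).

```python
import re

# \x03 plus optional "fg" or "fg,bg" colour numbers (1-2 digits each),
# or any single bold/reset/reverse/italic/underline toggle.
IRC_FORMATTING = re.compile(r"\x03(?:\d{1,2}(?:,\d{1,2})?)?|[\x02\x0f\x16\x1d\x1f]")


def strip_irc_formatting(line: str) -> str:
    """Return `line` with mIRC-style formatting codes removed."""
    return IRC_FORMATTING.sub("", line)


if __name__ == "__main__":
    raw = "[C: \x03032\x0f V: \x03032\x0f] \x02merged\x02"
    print(strip_irc_formatting(raw))  # [C: 2 V: 2] merged
```

Because the regex consumes at most two digits after U+0003, a colour code directly followed by a number ("\x03032") keeps the third digit, which is how mIRC clients interpret it too.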
[21:50:26] but also well *after* the last puppet run
[21:50:30] d
[21:50:43] YuviPanda: any ideas on how to kick it?
[21:50:46] asking in -releng is probably going to give you a better chance of finding someone who knows what the issues with that are.
[21:51:02] unfortunately no. I've managed to mostly shield myself from such dangerous knowledge.
[21:51:16] YuviPanda: i just did, but there are just echoes in there. you seemed more responsive & knowledgeable. ;)
[21:51:37] :P I think they're rolling out a pretty intense mediawiki deploy
[21:53:49] How can I fetch an external webpage from Tool Labs and have it load into a Wikipedia template?
[21:55:17] load into ?
[21:55:53] tom29739 : are you trying to parse the page and get useful data from it, then write that data in a mediawiki template call ?
[21:58:13] if so, i assume the same way you would if you weren't on tool labs : request that page, parse it, concatenate in your template, send an edit request to the wikipedia api
[21:59:04] Like I have https://tools.wmflabs.org/plaintexteditcounter/test.html and I want to insert a template to fetch that page. The tool is an edit counter so the template would fetch the webpage and then return the value into the page that the template is transcluded in.
[22:00:20] (PS1) Andrew Bogott: One more dummy cert [labs/private] - https://gerrit.wikimedia.org/r/266404
[22:01:41] cscott: sorry to make you repeat yourself… what is the actual/original problem?
[22:01:55] (CR) Andrew Bogott: [C: 2 V: 2] One more dummy cert [labs/private] - https://gerrit.wikimedia.org/r/266404 (owner: Andrew Bogott)
[22:02:20] Quarry, Patch-For-Review: Cannot download data from a query with Unicode characters in its title - https://phabricator.wikimedia.org/T123031#1963675 (Krenair) Open>Resolved
[22:02:25] tom29739 : you can't call local templates with values from an external source, or external templates, in mediawiki ; you'll have to write a few bits of javascript, some of which will have to reside on your tool, some of which either in your own common.js, or in the global common.js ; but why would you need to do what the NavPopups gadget already does ?
[22:04:14] andrewbogott: thcipriani is helping me over in -releng. puppet hasn't run on deployment-pdf01 in labs since friday, that seems to be the root cause.
[22:04:27] o
[22:04:27] ok
[22:04:28] ...possibly :)
[22:04:31] tom29739 and so we're clear, yes, the js will have some bits of ajax calls in it
[22:04:33] let me know if you need me to bang on puppet
[22:04:37] andrewbogott: git-deploy/salt isn't responding either, but hopefully the puppet problem is the root cause.
[22:04:56] Quarry, Patch-For-Review: Query counter increases but draft query is not accessible when window is closed and query doesn't have a title - https://phabricator.wikimedia.org/T101394#1963690 (Krenair) Open>Resolved
[22:05:32] andrewbogott: one weird thing is that pdf01 seems to think its hostname is deployment-pdf01.eqiad.wmflabs (missing the deployment-prep bit)
[22:05:53] when you say “thinks”...
[22:06:28] it also says, "deployment-pdf01 is a Puppet client of deployment-puppetmaster.eqiad.wmflabs" on boot (again, missing the deployment-prep bit)
[22:06:32] thanks alphos
[22:06:34] Quarry: Number of queries shown in profile is wrong - https://phabricator.wikimedia.org/T86512#1963694 (Krenair) Sounds like T101394 to me - for some reason sometimes 'None' was shown, sometimes an empty string (which meant you couldn't use it). It no longer shows empty strings.
[22:06:39] hostname and hostname -d both return reasonable things
[22:06:41] well, I guess that hostname -f is correct. it's a problem with both salt and puppet
[22:06:46] tom29739 really, go for navpopups
[22:07:03] server = deployment-puppetmaster.eqiad.wmflabs
[22:07:36] how does it work to get a user's editcount
[22:07:50] andrewbogott: same with /etc/salt/minion for whatever reason
[22:07:59] thcipriani: it’s because that’s what you set it to
[22:08:00] https://wikitech.wikimedia.org/wiki/Hiera:Deployment-prep
[22:08:07] for some value of ‘you’
[22:08:38] tom29739 it requests the local api https://www.mediawiki.org/wiki/API:Users , which in turn requests the local db
[22:08:40] well, the salt master at least
[22:09:07] YuviPanda: you can only easily inspect emails that are queued on tools-mail but that only happens if they can't be delivered directly
[22:09:39] any idea what changed on friday to make this stop working?
[22:11:05] well, fwiw, changing the puppet.conf got puppet working, seemingly
[22:11:26] andrewbogott: ^ thanks!
[22:11:49] YuviPanda: chasemp we could kill jlocal once submitting jobs from exec hosts is allowed
[22:11:55] thcipriani: probably running puppet will rewrite puppet.conf back the way it was...
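Alphos' answer is that the edit count comes from the MediaWiki action API (API:Users), not from scraping a tool's HTML page. A template cannot fetch an external URL, so the request has to come from a tool, a bot or a user script. The following is a minimal Python sketch that assumes English Wikipedia's api.php as the endpoint. `editcount_url`, `parse_editcount`, `fetch_editcount` and the User-Agent string are illustrative names, not part of NavPopups. `list=users` with `usprop=editcount` is the documented query, and the API flags a nonexistent user as `missing`.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: any MediaWiki api.php endpoint works the same way.
API_URL = "https://en.wikipedia.org/w/api.php"


def editcount_url(username, api_url=API_URL):
    """Build an API:Users query asking only for the edit count."""
    params = {
        "action": "query",
        "list": "users",
        "ususers": username,
        "usprop": "editcount",
        "format": "json",
        "formatversion": "2",
    }
    return f"{api_url}?{urlencode(params)}"


def parse_editcount(payload):
    """Return the edit count from a decoded response, or None if no such user."""
    user = payload["query"]["users"][0]
    if "missing" in user or "invalid" in user:
        return None
    return user["editcount"]


def fetch_editcount(username, api_url=API_URL):
    """Perform the live request (placeholder User-Agent, replace before use)."""
    req = Request(
        editcount_url(username, api_url),
        headers={"User-Agent": "editcount-sketch/0.1 (replace with contact info)"},
    )
    with urlopen(req, timeout=10) as resp:
        return parse_editcount(json.load(resp))
```

Wikimedia asks API clients to send a descriptive User-Agent with contact details, so the placeholder above is not meant for real use. On-wiki, a gadget would send the same query string from JavaScript instead.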
[22:11:59] or, at least, it’s supposed to :)
[22:12:14] valhallasw`cloud: ah, ok
[22:12:33] valhallasw`cloud: and we migrate all the current users to something else, although I think chasemp might've had some objections to having exec hosts be submit hosts
[22:13:16] I'm also OK with just having jlocal if it's puppetized
[22:13:30] valhallasw`cloud: yeah, it's puppetized now
[22:13:39] After all, less time spent on SGE is better ;-)
[22:14:01] * YuviPanda agrees vehemently :D
[22:14:03] cool
[22:32:25] YuviPanda, re: mediawiki-vagrant. Thanks. It seems I'm confusing Labs with Tools. Is the need for a virtual environment to develop user scripts for pl.wikt a valid reason to apply for a new project on Labs? Or is there a project that I can join and set up my vagrant instance?
[22:35:34] alkamid: is there a reason you can't just do it on test.wikipedia.org?
[22:37:08] YuviPanda, I don't know, is that something people use to develop .js scripts?
[22:37:32] yes, I think that's a valid place to test your userscripts
[22:38:13] I was thinking of importing a dump (it's small) so I could simulate the wiki, but I guess it's not necessary
[22:38:32] that's probably too much work for little benefit I think :)
[22:38:32] OK, I'll go to test then. Thanks!
[22:38:38] np
[22:42:11] alkamid if you're anything like me and you want to test your scripts when they interfere with other people's edits, feel free to ask for people to edit some pages :D
[22:43:33] will do Alphos (: but first let's get all that rust off my poor JS
[22:44:08] been there, done that XD
[22:45:24] alkamid : i'll give you ~20 backs-and-forths between your editor and ?action=edit before you clean your common.js and just paste "javascript:" in the address bar instead :p
[22:52:12] andrewbogott: are labs jessie instances good to go again?
[22:52:18] YuviPanda: yes
[22:52:24] the kernel is mislabeled but otherwise fine
[22:52:28] cool
[22:52:28] (I’m told)
[22:52:36] did you re-enable jessie creation?