[01:28:53] Hello world!
[01:31:49] I have a question
[01:32:15] Hi robot wm-bot :)
[01:33:29] Ok, can I run an IRC bot to join Wikipedia channels on Tool Labs?
[01:34:34] yes
[01:42:13] Also, can I use it for a trivial game only for Wikipedians? (Sorry for my English)
[03:07:31] Wikimedia Labs / tools: lighttpd redirects URLs of directories without a trailing slash from https to http - https://bugzilla.wikimedia.org/64627#c9 (Tim Landscheidt) (In reply to metatron from comment #8) > Hmm, this needs further knowledge on the subject. > 1. Lighttpd is configured without ssl engin...
[03:19:00] Wikimedia Labs / tools: Support SPDY/3 on the proxy - https://bugzilla.wikimedia.org/65134#c2 (Tim Landscheidt) Was this related/dependent on nginx being updated? (And thus would need to be reopened as we downgraded nginx IIRC.)
[05:14:08] !log deployment-prep Restarted logstash on deployment-logstash1; Last event logged at 2014-06-01T0722:56
[05:14:10] Logged the message, Master
[05:16:48] @replag
[05:16:58] !replag
[05:17:13] * a930913 hmms.
[05:17:29] @info
[05:18:33] @help
[05:19:06] bd808: have you tested https://gerrit.wikimedia.org/r/#/c/136620/ ?
[05:19:19] it looks good, i can just merge it if you did
[05:19:52] I ran it with --dry-run and it looked like it would do the right things
[05:20:18] it's lame that there's really no way to test that "for real"
[05:22:36] ori: I sent an email to Antoine asking about how to set up the vagrant dir for jenkins runs. I think it will be pretty easy. I'll set it up on beta tomorrow too.
[05:23:13] bd808: woo, that'd be pretty nice
[05:24:16] My blocker for jenkins right now is that the repo isn't replicated on zuul.eqiad.wmnet and I don't know how to set that up
[05:26:31] for the longest time i thought "contint" was "continental", because, you know, hashar
[05:26:47] nice
[05:44:32] bd808: is deployment::target { 'mediawiki': } used by anything?
[05:45:12] ori: I don't think so. I've thought about ripping all of that out of the puppet config.
[05:45:36] cool, i'm on it
[07:56:50] Hmm, individually, these queries run quickly, but combined... select rev_user_text as user, count(1) as count from revision_userindex where (rev_user_text in (select substring_index(page_title,"/",1) from page where page_namespace=2 and page_title like "%/vada.js")) and rev_comment like "% ([[WP:Vada|Vada]])" group by rev_user_text order by count desc;
[07:58:38] Any way to enhance the efficiency?
[08:32:01] Wikimedia Labs / Infrastructure: Virtual image for Ubuntu Trusty (14.04) - https://bugzilla.wikimedia.org/60684#c4 (Antoine "hashar" Musso) NEW>RESO/FIX Andrew announced the availability of Ubuntu Trusty images in labs a few days ago. That solves this bug. Thank you Andrew!
[08:37:53] Help request: I can't get http://tools.wmflabs.org/dewkin/ working. Tried deleting and recreating public_html, putting an index.html file, restarting the webserver...
[08:39:56] Ricordisamoa: tail ~/error.log
[08:43:51] a930913: error.log does not show useful information
[08:50:10] Ricordisamoa: webservice start, then wait ten seconds and do webservice status.
[08:52:45] Nevermind, figured it out! Thanks
[14:09:45] Wikimedia Labs / tools: Please remove project local-maps - https://bugzilla.wikimedia.org/65250#c3 (Marc A. Pelletier) NEW>RESO/FIX The DNs aren't actually the same (note how they live in two different projects), and those OU aren't actually used anymore either. The status in LDAP is unrelated....
[14:11:17] Wikimedia Labs / tools: Replicate the Gerrit mysql database to labsdb - https://bugzilla.wikimedia.org/50422#c1 (Marc A. Pelletier) p:Normal>Low This may not be all that relevant given the likelihood of switching to phabricator in the medium term.
[14:12:55] hello Coren ! I could use two more users on the labs NFS server ( jenkins and zuul ) so I can put files in /data/project hehe
[14:12:58] https://bugzilla.wikimedia.org/show_bug.cgi?id=64868
[14:13:01] Wikimedia Labs / tools: Install ogc on tools - https://bugzilla.wikimedia.org/60449 (Marc A. Pelletier) NEW>ASSI
[14:13:54] hashar: I'm in bugzilla triage mode, so I'll get to that soon.
[14:14:06] great!
[14:14:41] another one is upgrading python-six package but I don't think it is in bugzilla
[14:14:53] was a mail sent around the Zurich hackathon iirc
[14:23:15] Wikimedia Labs / tools: user_password_expires column is missing - https://bugzilla.wikimedia.org/64369#c3 (Marc A. Pelletier) PATC>RESO/FIX Merged and applied to track the change to production.
[14:54:01] Wikimedia Labs / tools: Please remove project local-maps - https://bugzilla.wikimedia.org/65250#c4 (Tim Landscheidt) (In reply to Marc A. Pelletier from comment #3) > The DNs aren't actually the same (note how they live in two different > projects), [...] Ah! Missed that.
[15:01:16] Wikimedia Labs / tools: Replicate the Gerrit mysql database to labsdb - https://bugzilla.wikimedia.org/50422#c2 (Kunal Mehta (Legoktm)) In that case we can re-purpose the bug to make Phabricator's database available on labs after the switch happens.
[15:43:41] paravoid: should we schedule a weekly swift check in so that we're forced to actually budget time and think about that on a regular basis? (Since I have thought about it ~ not at all so far)
[15:44:19] um… huh, wrong channel (they're in a different order in my gui today for some reason?)
[16:03:49] Cyberpower678: yay for receiving e-mails ;-)
[16:04:05] :D
[16:28:02] phe: You still want wscredits to be removed, I assume?
[16:28:53] Wikimedia Labs / (other): "Deprecated" error on devwiki/pagemigrationwiki (http://pagemigration.wmflabs.org/wiki/Special:RecentChanges) - https://bugzilla.wikimedia.org/66040 (Southparkfan) NEW p:Unprio s:major a:None Created attachment 15551 --> https://bugzilla.wikimedia.org/attachment...
[16:29:08] Feel free to move when needed
[17:09:22] scfc_de, yes
[17:24:43] phe: Done. I've moved the tool's directory to your home directory (~phe/wscredits) for you to archive/use/delete it.
[18:02:38] "/usr/local/bin/lighttpd-starter: line 45: cannot create temp file for here-document: No space left on device"
[18:02:49] scfc_de: ^
[18:08:29] ali_: are you still facing this issue?
[18:08:53] yup :(
[18:09:26] can't start my webservice
[18:11:22] YuviPanda: it's fixed now. thanks
[18:11:30] ali_: just clearing out some files :)
[18:16:45] Coren: tools-webgrid-01 ran out of space from zoomviewer again.
[18:19:00] andrewbogott: around?
[18:19:06] YuviPanda: Unfortunately, dschwen hasn't replied to my suggestion at https://de.wikipedia.org/wiki/Benutzer_Diskussion:Dschwen#zoomviewer_hatte_.2Ftmp_auf_tools-webgrid-01_geflutet yet. I'll see if I can set TMPDIR for him.
[18:19:11] YuviPanda: yes, but in a meeting
[18:19:15] (as is Coren)
[18:19:18] andrewbogott: ok. makes sense.
[18:19:31] scfc_de: we could apply the lvm class on the exec hosts to give people more space
[18:19:40] YuviPanda: but if tools is broken...?
[18:19:52] andrewbogott: I fixed it for now, but looking for more permanent solutions
[18:19:59] ok, thanks
[18:20:42] andrewbogott: :)
[18:22:54] YuviPanda: I filed a bug to enlarge /tmp IIRC. But applying that will be disruptive (disable Puppet on both webgrids, disable queue grid-01, reschedule all jobs on that host, enable & run Puppet, move files from the old /tmp to the new, reboot, enable queue, repeat the process for the other host).
[18:24:08] scfc_de: hmm, we can just set up /srv, and have those use it in the meantime
[18:27:23] YuviPanda: What do you mean by that? The problem (AFAICS) is that (all) webservices set up sockets in /tmp, so you can't move that without restarting the webservices.
[18:27:39] scfc_de: right, so you set up /srv/tmp and have zoomviewer and other apps use that.
[18:27:45] actually that's a terrible idea
[18:27:51] also shouldn't apps be setting up sockets in /run?
[18:27:54] or somesuch?
[18:29:04] scfc_de: but yeah, looks like we do need to schedule some downtime for this
[18:29:17] YuviPanda: No clue. I suggested to dschwen for zoomviewer to use /data/project/zoomviewer/..., and that's what I'm trying out right now.
[18:29:28] scfc_de: yeah, but that's a temp fix, I guess.
[18:31:54] andrewbogott: any ETA on when the meeting will end?
[18:32:10] It's pretty much always from 16:00 to 17:00, give or take a few minutes
[18:32:22] andrewbogott: ah, alright then!
[18:32:30] YuviPanda: No, it should last :-). But it will only fix zoomviewer and not other apps that fill up /tmp. IIRC we now had two cases where someone tried joe with a large file (log?) which left behind a core in /tmp. So, while not making such incidents avoidable, I think /tmp should have more (double?) the space than main memory + swap.
[18:32:55] scfc_de: yeah, agreed. no reason we can't have more space on the nodes themselves, I think
[18:33:03] You have funny clocks. For me, it's 18:32Z :-).
[18:33:31] scfc_de: I didn't fully process the times mentioned, but I figured it's another 27 mins :D
[18:35:16] Even more interesting: Where is it 16:34L now? :-) US east coast should be 14:34L, and Iceland has UTC as well. Greenland? :-)
[18:35:36] scfc_de: IIRC they are on a cruise ship in the middle of the atlantic
[18:35:50] scfc_de: some offsite. did you not see the labs-l thread?
[18:36:12] YuviPanda: Missed the memo :-).
[18:36:16] scfc_de: :P
[18:36:35] scfc_de: putting all our opsen in one ship in the middle of the atlantic does sound like a great idea. no more TZ issues!
[18:52:24] Now if I only knew how to trigger zoomviewer into creating some temporary file ... Either it doesn't create those for all pictures, or it discards them afterwards so fast, that my "while true; do find ...; done" doesn't pick them up. Heisenbugs.
[18:57:48] *argl* Where does TMPDIR=/tmp/1312570.1.webgrid-lighttpd come from?
[18:58:47] Okay, that lighttpd does only set for its own process. Children get the correct setting.
[19:00:07] !log tools zoomviewer: Set TMPDIR to /data/project/zoomviewer/var/tmp and ./webwatcher.sh; cannot see *any* temporary files being created anywhere, though. iipsrv.fcgi however has TMPDIR set as planned.
[19:00:10] Logged the message, Master
[19:25:19] scfc_de: want to mail out about TMPDIR?
[19:25:32] andrewbogott: meeting still on? :)
[19:26:08] YuviPanda: no, what's up?
[19:26:14] Or should I just read the scroll?
[19:26:38] YuviPanda: You mean labs-l? Yeah, why not.
[19:26:41] andrewbogott: just wondering if there's a negative effect in applying say a 40G LVM volume to all the exec nodes.
[19:26:54] andrewbogott: is that disk space at a premium?
[19:27:31] YuviPanda: disk space can be overprovisioned, so unless it's filled it doesn't cost us anything.
[19:27:42] andrewbogott: ah cool then. no reason to not give them space then
[19:27:45] But there might not be 40G available to partition, as each instance has a pre-set available amount.
[19:27:51] Based on the image flavor.
[19:27:54] andrewbogott: ah, right. whichever that is.
[19:27:54] YuviPanda: BTW, if you apply that LVM volume /tmp to all exec nodes, you'll have to jump through the same hoops (restart if a process uses /tmp) there as well.
[19:27:56] You can't partition more than it has.
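The TMPDIR change logged at 19:00:07 relies on child processes inheriting the variable and on the program honouring it. As a minimal sketch of that mechanism, assuming a Python child process rather than the real iipsrv.fcgi, the snippet below starts a child with TMPDIR pointing at a scratch directory and asks it where it would put temporary files. The helper name `child_tempdir` and the scratch directory are hypothetical stand-ins, not anything from the tool itself.

```python
import os
import subprocess
import sys
import tempfile


def child_tempdir(tmpdir):
    """Return the temp directory a child Python process selects when TMPDIR=tmpdir."""
    # Copy the current environment and override only TMPDIR for the child.
    env = dict(os.environ, TMPDIR=tmpdir)
    result = subprocess.run(
        [sys.executable, "-c", "import tempfile; print(tempfile.gettempdir())"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Hypothetical stand-in for a per-tool directory such as
# /data/project/zoomviewer/var/tmp; it must exist and be writable,
# otherwise tempfile falls back to another location.
tool_tmp = tempfile.mkdtemp(prefix="tool-tmp-")
reported = child_tempdir(tool_tmp)
print("child uses the per-tool directory:", reported == tool_tmp)
```

This only covers programs that consult TMPDIR; anything with a hard-coded /tmp path would be unaffected by the variable.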
[19:28:08] scfc_de: yeah, we will need to figure that out
[19:28:23] scfc_de: needs downtime
[19:29:03] It would be nice if the partitioning could be done "softer", i. e. just one large partition and then quotas for individual directories/users/etc.
[19:33:12] Individually, these queries run quickly, but combined are slow. select rev_user_text as user, count(1) as count from revision_userindex where (rev_user_text in (select substring_index(page_title,"/",1) from page where page_namespace=2 and page_title like "%/vada.js")) and rev_comment like "% ([[WP:Vada|Vada]])" group by rev_user_text order by count desc;
[19:33:20] Any way to enhance the efficiency?
[19:39:56] andrewbogott: think you can look at https://gerrit.wikimedia.org/r/#/c/135442/ and its dependency https://gerrit.wikimedia.org/r/#/c/135499/ when you have the time?
[19:40:22] * YuviPanda puts those at the bottom of ori's queue as well
[19:42:45] !log deployment-prep Updated scap to a7da355
[19:42:47] Logged the message, Master
[19:45:22] YuviPanda: i'm not allowed to merge other people's puppet changes; i can only +/- 1
[19:45:36] ori: oh, didn't realize.
[19:45:40] ori: I'll bug andrewbogott then
[19:50:50] YuviPanda: I don't understand… won't this mongo patch break existing stats boxes?
[19:51:02] Since it introduces a requirement on a package not in apt?
[19:51:10] andrewbogott: gah, hmm, you are right.
[19:51:44] andrewbogott: how hard is it to get it into apt? these are packages signed by mongo themselves
[19:52:05] Is your overall goal to get the newfangled mongo onto stats, or into toollabs?
[19:52:09] or elsewhere?
[19:52:28] Typically a package needs to be audited (e.g. by Ubuntu) before it goes into our repo
[19:53:25] andrewbogott: toollabs
[19:53:36] andrewbogott: and for some stupid reason they have different names.
[19:53:48] In that case maybe make a different class for it…
[19:53:56] But there will still be the issue with the unofficial package
[19:54:05] andrewbogott: on tools I guess it's going to be ok.
[19:54:07] You need the bleeding-edge mongo rather than the stable version?
[19:54:30] andrewbogott: for some definition of bleeding edge :D the two year old version is *very* outdated
[19:55:10] andrewbogott: trusty is better, but 12.04 is just too old
[19:55:50] Is there an official trusty release for that mongo version? Maybe it'll work on precise?
[19:57:14] andrewbogott: 2.4 is the official release. problem is the config file format changed between 2.4 and 2.6, and the older style is deprecated.
[19:57:52] 2.4 is official on trusty?
[20:01:29] a930913: Rephrasing (I think :-)) the query to "select rev_user_text as user, count(1) as count from revision_userindex join (select substring_index(page_title,"/",1) as rev_user_text from page where page_namespace=2 and page_title like "%/vada.js") as s using (rev_user_text) where rev_comment like "% ([[WP:Vada|Vada]])" group by rev_user_text order by count desc;" returns in < 2 s, but I don't know if the results are correct.
[20:02:10] andrewbogott: yeah
[20:02:14] andrewbogott: 2.4 is official on trusty
[20:02:55] YuviPanda: so that means we'll be running 2.4 in production for years to come, seems like a safer bet than using the newer version on tools. Unless there's a feature that you're truly desperate for :)
[20:03:03] safer bet, also easier :)
[20:04:51] andrewbogott: hmm, I'll investigate
[20:05:35] andrewbogott: so I can just set up a new VM with trusty for tools-mongo
[20:05:51] scfc_de: Ah, that almost works. Much better. Thanks.
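The rewrite at 20:01:29 swaps an IN subquery for a join against a derived table. The two forms are not automatically equivalent: a join repeats a revision once per matching subquery row, while IN only tests membership. Whether that explains the "almost works" above is not stated in the log; it is one possible difference. The sketch below illustrates it with Python's sqlite3 as a stand-in for the replica database. The sample rows are invented, and `substr`/`instr` substitutes for MySQL's `substring_index`, which SQLite lacks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (page_namespace INTEGER, page_title TEXT);
CREATE TABLE revision_userindex (rev_user_text TEXT, rev_comment TEXT);
-- Invented sample data: Alice has two pages ending in /vada.js.
INSERT INTO page VALUES
  (2, 'Alice/vada.js'),
  (2, 'Alice/old/vada.js'),
  (2, 'Bob/vada.js'),
  (0, 'Carol/vada.js');
INSERT INTO revision_userindex VALUES
  ('Alice', 'Reverted edit ([[WP:Vada|Vada]])'),
  ('Alice', 'Warned user ([[WP:Vada|Vada]])'),
  ('Bob',   'Reverted edit ([[WP:Vada|Vada]])'),
  ('Bob',   'manual edit'),
  ('Carol', 'Reverted edit ([[WP:Vada|Vada]])');
""")

# Stand-in for substring_index(page_title, "/", 1): text before the first slash.
OWNER = "substr(page_title, 1, instr(page_title, '/') - 1)"
USERS = (f"SELECT {{distinct}} {OWNER} AS user_name FROM page "
         "WHERE page_namespace = 2 AND page_title LIKE '%/vada.js'")
COMMENT = "r.rev_comment LIKE '% ([[WP:Vada|Vada]])'"
TAIL = "GROUP BY r.rev_user_text ORDER BY edits DESC, user_name"

in_query = (
    "SELECT r.rev_user_text AS user_name, count(1) AS edits "
    "FROM revision_userindex AS r "
    f"WHERE r.rev_user_text IN ({USERS.format(distinct='')}) AND {COMMENT} {TAIL}"
)


def join_query(distinct):
    """Build the join form; 'distinct' is '' or 'DISTINCT' for the derived table."""
    return (
        "SELECT r.rev_user_text AS user_name, count(1) AS edits "
        "FROM revision_userindex AS r "
        f"JOIN ({USERS.format(distinct=distinct)}) AS s "
        f"ON s.user_name = r.rev_user_text WHERE {COMMENT} {TAIL}"
    )


in_rows = conn.execute(in_query).fetchall()
naive_rows = conn.execute(join_query("")).fetchall()
distinct_rows = conn.execute(join_query("DISTINCT")).fetchall()

print("IN subquery:  ", in_rows)
print("plain join:   ", naive_rows)
print("DISTINCT join:", distinct_rows)
```

With this sample data the plain join counts Alice's revisions once per matching page, so adding DISTINCT to the derived table is what brings the join back in line with the IN form.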
[20:06:46] andrewbogott: in that case I'll just bin my redo to new yaml style config patch
[20:06:58] ok
[20:07:12] andrewbogott: hmm, also, if we can decommission the mongo in production, does that make things better?
[20:07:35] Maybe, but I'm still overall prejudiced against packages that don't come straight from Ubuntu
[20:08:31] andrewbogott: why exactly, btw? I've been trying to find that out for a while :D
[20:08:36] andrewbogott: we run a hand-rolled nginx...
[20:09:18] nginx isn't in apt anyplace, it's just a seat-of-pants package local to that project. We needed features that weren't available elsewhere...
[20:09:19] because debian / ubuntu have a pretty robust quality control process
[20:09:21] and, it's caused us a ton of trouble.
[20:09:24] Whereas… ^^
[20:09:38] hmm, alright then.
[20:09:48] * YuviPanda checks nginx version on trusty
[20:10:04] YuviPanda: it has SPDY
[20:10:23] YuviPanda: yeah, I just thought of that -- would be great to build our proxy on a stock Trusty box if it works
[20:10:26] ori: trusty has 1.4.6, I guess it does have SPDY.
[20:10:26] *rebuild
[20:10:35] yep
[20:10:36] ori: not sure if it has the appropriate things in nginx-extras we use, though
[20:10:42] dunno about that
[20:10:47] ori: yeah, checking
[20:10:51] But of course it might fail in all the same ways that the official nginx package failed :(
[20:11:18] andrewbogott: yeah.
[20:11:40] andrewbogott: no harm trying, though.
[20:11:56] andrewbogott: unsure how to build a box inside toollabs that won't pick up from the local repo, though
[20:11:58] yep, I agree -- would be a lot better.
[20:12:06] and also a more likely candidate for production if it runs from stock packages.
[20:12:18] andrewbogott: let me just do it now :D
[20:12:30] YuviPanda: ah, hm. I think it only picks up the local repo if you have the labs repo package.
[20:12:41] So should be easy
[20:12:57] Oh, except maybe that's a dependency in the class… so I guess it'll require a couple lines of puppet
[20:13:04] I think YuviPanda was referring to the nginx package by WMF?
[20:13:15] andrewbogott: it is, yeah
[20:13:19] andrewbogott: I can probably hand install though?
[20:13:33] yeah, install the package first, then apply the puppet package second and...
[20:13:44] if it doesn't have ensure->latest (which I think it doesn't…)
[20:13:50] andrewbogott: doesn't
[20:14:11] scfc_de: The problem isn't the sockets, it's that zoomviewer makes *huge* tempfiles in /tmp and doesn't clean up after itself very well.
[20:15:03] Coren: The problem with moving /tmp to a new partition is that the sockets live there, so you need to restart the webservices.
[20:15:42] !log tools create instance tools-trusty-test to test nginx proxy on trusty
[20:15:45] Logged the message, Master
[20:15:45] scfc_de: I don't want to just grow /tmp anyways; that'll only postpone the same issue. Tools will have to be more disciplined about their tempfiles.
[20:17:56] Coren: I know, but at the moment it is smaller than the RAM which can make a simple core dump fill it up. So I would support making it 10 GBytes to have a bit more margin.
[20:18:49] andrewbogott: can you log into tools-trusty-test? I'm getting permission denied
[20:18:50] * Coren ponders.
[20:19:33] YuviPanda: nope :(
[20:19:56] There shouldn't be core files in there in the first place; I think it's better to set aside room elsewhere and set TMPDIR there instead.
[20:20:11] andrewbogott: baah
[20:20:23] So that the root issue is addressed -- tools' temp files can't wedge the system up.
[20:20:24] andrewbogott: should I delete and re-create?
[20:20:30] andrewbogott: might do that tomorrow instead then. grr
[20:21:01] YuviPanda: maybe? I don't know why it's broken :(
[20:21:14] andrewbogott: hmm, ok. I wonder if it is just trusty packages being broken?
[20:21:22] andrewbogott: as in, the trusty image
[20:21:31] was it a fresh box? They've been working for me lately.
[20:21:38] Unless something has changed with the upstream puppet stuff
[20:23:23] andrewbogott: hmm, ok. I'll try again tomorrow
[20:23:32] Doesn't it always take a while until the first login is possible?
[20:23:41] (I. e., till after the first Puppet run.)
[20:24:25] scfc_de: oh yeah, works now :D
[20:24:55] andrewbogott: works now, took a bit longer
[20:25:51] err: /Stage[main]/Role::Labs::Instance/Mount[/data/project]: Could not evaluate: Execution of '/bin/mount -o rw,vers=4,bg,hard,intr,sec=sys,proto=tcp,port=0,noatime,nofsc /data/project' returned 32: mount.nfs: mounting labstore.svc.eqiad.wmnet:/project/tools/project failed, reason given by server: No such file or directory
[20:25:51] hmm
[20:25:52] yeah, 10 mins or so
[20:25:54] Actually, scfc_de is right, I can log in now
[20:26:01] I didn't realize that was a brand-new instance
[20:26:49] andrewbogott: yeah, me too
[20:27:44] andrewbogott: I was getting that error with the nfs mount
[21:16:50] What is wrong... Reasonator et al are gone
[21:21:23] ... andrewbogott the tools are unbearably slow
[21:21:27] what is the matter
[21:24:12] Coren: ^ ?
[21:27:33] Reasonator seems now to be back
[21:27:39] the Wikidata game is a blank
[21:28:06] http://tools.wmflabs.org/wikidata-game/
[21:29:18] * Coren checks.
[21:30:05] Coren, are there tools that assess the overall health of the grid? All I know to do is look at CPU usage on individual nodes
[21:30:28] andrewbogott: No, but making some graphite metrics for that is on my summer to-do.
[21:31:04] So you locate runaway processes via 'top' and ps?
[21:31:16] GerardM-: That's odd, I've seen the wikidata-game webservice restarting with no action on my part. Your doing?
[21:31:27] I am just a user
[21:31:37] no special right
[21:31:38] s
[21:31:56] andrewbogott: Yeah, top, ps, iostat. The usual.
[21:32:05] GerardM-: Perhaps one of the maintainers is working on it?
[21:32:11] .. ok it is back
[21:32:33] Specifically, /the/ maintainer.
[21:33:06] Magnus
[21:33:19] GerardM-: It's been started explicitly by someone ~2 minutes ago; clearly Magnus saw something wrong with it and fixed it.
[21:34:10] I saw some tools with watchdogs in crontabs.
[21:34:38] scfc_de: True, that might also be it, but magnus is online atm.
[21:35:01] And active as of <1min ago.
[21:35:29] Sounds more plausible, then :-).
[21:35:30] andrewbogott: As a rule, the grid will work around runaway nodes, but that doesn't help the jobs that were already there.
[21:36:48] magnus did restart the webservive
[21:36:50] ce
[21:37:18] didn't know about Wikidata Game.. interesting
[21:39:25] it has an impact on the missing statements it tackles :)
[21:40:03] Wikimedia Labs / tools: Unable to explain queries on replicated databases - https://bugzilla.wikimedia.org/48875#c14 (Marc A. Pelletier) Yeah, at best this partially populated schema offers an approximation; but that's arguably better than /no/ information. @metatron: I'm okay with making this availab...
[22:33:11] (PS1) Andrew Bogott: Add fake swift passwords for testing [labs/private] - https://gerrit.wikimedia.org/r/136928
[22:33:35] (CR) Andrew Bogott: [C: 2] Add fake swift passwords for testing [labs/private] - https://gerrit.wikimedia.org/r/136928 (owner: Andrew Bogott)
[22:34:31] (CR) Andrew Bogott: [V: 2] Add fake swift passwords for testing [labs/private] - https://gerrit.wikimedia.org/r/136928 (owner: Andrew Bogott)
[22:38:13] (PS1) Andrew Bogott: Revert "Add fake swift passwords for testing" [labs/private] - https://gerrit.wikimedia.org/r/136931
[22:38:45] (CR) Andrew Bogott: [C: 2 V: 2] Revert "Add fake swift passwords for testing" [labs/private] - https://gerrit.wikimedia.org/r/136931 (owner: Andrew Bogott)
[22:59:05] hedonil, hi