[00:43:46] (CR) Krinkle: [C: 1] Add some more repos to #mediawiki-visualeditor [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/163322 (owner: Jforrester)
[00:47:22] (CR) Jforrester: [C: 2] "Per Timo." [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/163322 (owner: Jforrester)
[00:47:25] (Merged) jenkins-bot: Add some more repos to #mediawiki-visualeditor [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/163322 (owner: Jforrester)
[00:50:52] !log lolrrit-wm Restart to deploy to latest master
[00:50:52] lolrrit-wm is not a valid project.
[00:50:55] Bah.
[00:52:06] (How do I log something for a Tools log?)
[02:22:36] (PS1) Jforrester: No need to broadcast Citoid service deploy repo changes twice [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/163795
[04:46:56] James_F|Away: I think the syntax is !log tools. whatever
[06:35:31] RECOVERY - ToolLabs: Low disk space on /var on labmon1001 is OK: OK: All targets OK
[07:19:02] Wikimedia Labs / deployment-prep (beta): deployment-rsync01 20GB hard drive is too small - https://bugzilla.wikimedia.org/71431#c1 (Antoine "hashar" Musso) deployment-rsync01.eqiad.wmflabs ( https://wikitech.wikimedia.org/wiki/Nova_Resource:I-000002f4.eqiad.wmflabs ) is a m1 small with 20GB disk alloca...
[11:17:51] !log wikidata-build wikidata-builder3.eqiad.wmflabs apt-get dist-upgrade
[11:17:53] Logged the message, Master
[11:18:42] !log wikidata-build wikidata-jenkins1.eqiad.wmflabs apt-get dist-upgrade and puppet run with current operations/puppet.git
[11:18:43] Logged the message, Master
[11:32:57] Hmm, I have access to tool-labs, so I should be able to receive mail from revi(shell username)@tools.wmflabs.org, but when I send it, labs mailserver says email not available (also crontab email is not arriving)
[12:00:25] Silke_WMDE: Sorry to bother you again, I think you have received another email concerning Platonides' stuff in Toolserver, could you tell me how is the subject going?
[12:01:27] jem Sorry, no idea - the tools are all maintained by volunteers who generally don't "report" what's going on.
[12:01:32] That is...
[12:01:42] I suppose I saw a mail to nosy
[12:01:57] asking for more files from that directory
[12:02:46] what exactly do you need to know?
[12:03:56] I explain
[12:04:31] The problems is with a bot used for the WLM contest that supposedly is (was) under /home/platonides
[12:04:37] problem*
[12:04:46] i know
[12:04:59] you or someone from your group asked for access
[12:05:02] and got it
[12:05:22] Yes, but the previous file apparently was only the public_html contents, and the bot wasn't there
[12:05:37] I haven't seen that file, but that is what I have been told
[12:05:52] So I warned about it and Ecemaml sent another email yesterday
[12:06:04] I don't have access to the files
[12:06:06] it's nosy
[12:06:24] (aka marlen.caemmerer@wikimedia.de)
[12:06:38] please poke her again if you haven't heard anything yet
[12:06:53] I'm reviewing the email
[12:07:10] I'm in the middle of a meeting jem. I'll be back online later.
[12:07:12] Yes, it was sent to her, yesterday 10:41 CEST
[12:07:18] Ok, thanks, Silke_WMDE
[12:52:31] Tool Labs tools / [other]: Catscan2 offline - https://bugzilla.wikimedia.org/71402 (Fæ) a:Fæ
[13:00:00] Tool Labs tools / [other]: Catscan2 offline - https://bugzilla.wikimedia.org/71402#c1 (Fæ) NEW>RESO/INV Hurrah, I have no explanation but Catscan appears to be responding today.
[14:49:45] godog: monitoring: filippo-test-trusty <- ?
[14:51:10] andrewbogott: yep, I replied to your email, should be running an updated puppet
[14:51:30] its /etc/ldap.conf is still bad -- my test looks for ldap-eqiad in the ldap server list.
[14:51:40] which suggests that puppet is failing somehow.
[14:51:52] gah, ok I'll take a look now
[14:52:23] thanks
[14:54:00] -ldapserver = virt1000.wikimedia.org
[14:54:00] +ldapserver = ldap-eqiad.wikimedia.org
[14:54:02] andrewbogott: ^
[14:54:17] thanks
[14:54:47] I like how in this particular instance the last thing that puppet does is restart the puppetmaster, which doesn't restart and hence commits seppuku
[14:55:42] yeah, that seems to happen 100% of the time. It's a new problem, I don't know why
[15:04:54] hey I have a problem ssh-ing to wikidata-jenkins2.eqiad.wmflabs and wikidata-jenkins3.eqiad.wmflabs and wdjenkins.eqiad.wmflabs
[15:05:09] jzerebecki: I'll check, one moment...
[15:05:56] rarely it worked on 2 but then i could not do sudo -i
[15:06:08] try now?
[15:06:20] yep works on 2
[15:06:25] what did you do?
[15:06:54] service nslcd restart
[15:07:10] It's my fault -- when I updated ldap on a bunch of instances something went wrong with nslcd -- I'm not sure what.
[15:07:17] A reboot would fix things as well.
[15:07:26] are all three behaving OK now?
[15:08:00] the other two are still the same
[15:08:07] still broken? OK, let me look.
[15:08:15] wdjenkins is fine
[15:08:36] andrewbogott: ok all are fine now
[15:08:37] thx
[15:08:41] oh, great.
[15:12:50] bd808: Aha, thanks.
[15:13:46] jzerebecki: are you logging in to those boxes to update puppet? Looks like all three still have the old ldap settings
[15:17:36] bd808, hashar, ^d, someone who cares about beta: deployment-mediawiki02 and deployment-mediawiki03 aren't puppetizing properly and thus in danger of losing ldap
[15:17:43] Can one of y'all straighten that out?
[15:18:39] andrewbogott: Puppet has been disabled by someone there (_joe_?)
[15:18:57] I can turn puppet on, but I'm not sure what it will break. :(
[15:19:00] andrewbogott: ah sorry I forgot to fill a bug for them and poke related people
[15:19:48] _joe_: Did you disable puppet on deployment-mediawiki0[12]? If so do you remember why?
[15:23:24] So the only thing I see in SAL is "2014-09-04 13:54 _joe_: stopped puppet on the appservers but mw03, testing an apache change"
[15:23:52] andrewbogott: yes i'm updating puppet on them for the new ldap settings
[15:23:57] great!
[15:23:58] thanks
[15:25:19] Wikimedia Labs / deployment-prep (beta): deployment-rsync01 20GB hard drive is too small - https://bugzilla.wikimedia.org/71431#c2 (Greg Grossmeier) p:Unprio>Normal Let's not make the Jenkins beta-scap-eqiad job very divergent from prod (at all). Let's make the Beta Cluster like prod...
[15:28:21] !log deployment-prep enabling puppet and forcing run on deployment-mediawiki01
[15:28:24] Logged the message, Master
[15:30:37] andrewbogott: When I forced a puppet run on deployment-mediawiki01 it didn't show any changes. Seems odd
[15:30:54] bd808: if it's self-hosted then you'd need to rebase as well...
[15:31:23] it's from the beta self host...
[15:31:34] hm, weird
[15:31:53] What files can I spot check?
[15:32:14] my test looks at /etc/ldap.conf
[15:32:23] if it refers to virt1000, that's old, if ldap-eqiad, that's new
[15:32:26] ldap-eqiad.wikimedia.org
[15:32:33] weird
[15:32:51] so it did update, and just didn't tell you about it
[15:32:59] <_joe_> bd808: no I reenabled it on 01 actually in the past
[15:33:35] 02 still has the old stuff
[15:34:04] !log deployment-prep enabling puppet and forcing run on deployment-mediawiki02
[15:34:06] Logged the message, Master
[15:34:49] jeremyb: do you have anything to do with the 'planet' project these days?
[15:35:40] * bd808 sees that andrewbogott was pointing out 02 and 03 as the hosts needing help
[15:36:03] bd808: yep, everything else looks caught up
[15:36:11] Oh! Yeah, that explains why 01 was already working...
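(Editor's note: the /etc/ldap.conf spot check described at 15:32 can be written down as a few lines of Python. This is only a sketch of the rule as stated in the channel, "virt1000 is old, ldap-eqiad is new"; the function name `ldap_conf_status` is hypothetical, and the substring match and comment skipping are assumptions, since the log does not show how the actual test parses the file.)

```python
def ldap_conf_status(conf_text):
    """Classify ldap.conf contents as 'new', 'old', or 'unknown'.

    Assumption: a plain substring match on non-comment lines is enough,
    mirroring the rule quoted in the log.
    """
    # Drop blank lines and '#' comments so a commented-out server does not count.
    active = [
        line.strip()
        for line in conf_text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]
    if any("ldap-eqiad" in line for line in active):
        return "new"
    if any("virt1000" in line for line in active):
        return "old"
    return "unknown"
```

(On a host it could be called as `ldap_conf_status(open("/etc/ldap.conf").read())`; a result of "old" would suggest that puppet has not applied the LDAP change yet.)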
[15:36:17] it's because it was already working
[15:36:20] yeah :)
[15:36:24] 02 got a lot of changes
[15:36:48] !log deployment-prep enabling puppet and forcing run on deployment-mediawiki03
[15:36:51] Logged the message, Master
[15:37:37] YuviPanda|zzzz: I would like to kill the instance 'verpverpverp.' That or get you to fix it...
[15:43:35] !log wikidata-build wikidata-jenkins3.eqiad.wmflabs apt-get dist-upgrade and puppet run with current operations/puppet.git
[15:43:37] Logged the message, Master
[15:43:44] !log quarry: enabled puppet on quarry-runner-test, updated, installed a bunch of maria stuff, rebooted
[15:43:45] quarry: is not a valid project.
[15:43:59] !log quarry enabled puppet on quarry-runner-test, updated, installed a bunch of maria stuff, rebooted
[15:44:01] Logged the message, dummy
[15:44:02] !log wikidata-build wikidata-test.eqiad.wmflabs apt-get dist-upgrade and puppet run with current operations/puppet.git
[15:44:04] Logged the message, Master
[15:45:20] !log wikidata-build wikidata-jenkins2.eqiad.wmflabs apt-get dist-upgrade and puppet run with current operations/puppet.git
[15:45:22] Logged the message, Master
[15:47:14] Wikimedia Labs / deployment-prep (beta): deployment-rsync01 20GB hard drive is too small - https://bugzilla.wikimedia.org/71431#c3 (Sam Reed (reedy)) How long did it take to break? I deleted a weird tmp dir, killed the whole cache dir, and re-ran sync-common. Which gave ~2G free space. I'm wondering...
[16:00:51] !log wikidata-build wdjenkins.eqiad.wmflabs apt-get dist-upgrade and puppet run with current operations/puppet.git
[16:00:53] Logged the message, Master
[16:01:53] andrewbogott: All of the beta hosts should be updated for your ldap change now. Thanks for poking us.
[16:02:19] bd808: cool, thanks
[16:05:17] Hello
[16:05:41] Could I have some help with the use of crontab on Tool Labs? Doesn't work for me. Thanks by advance.
[16:07:16] of course I read https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help#Scheduling_jobs_at_regular_intervals_with_cron
[16:16:13] Automatik: I don't know anything more than that page says, but maybe if you explain your problem someone lurking will be able to give some suggestions.
[16:19:55] I mean when I execute the program directly, it works (but with the error "fatal: ambiguous argument 'HEAD': both revision and filename") whereas when I do a cron (for 12:28 UTC I tried: "28 12 * * * /usr/bin/jsub -N cron-tools.botomatik-1 -once -quiet python navig_mensuel.py"), it doesn't work (does anything).
[16:21:47] Wikimedia Labs / deployment-prep (beta): Unable to connect to redis server - https://bugzilla.wikimedia.org/71415#c7 (Greg Grossmeier) p:Highes>Immedi s:major>blocke a:Ori Livneh Ori: can you please take a look at this ASAP? Redis dependency is breaking Beta/it's unasable for testing now...
[16:23:55] jzerebecki or Chris-J_WMDE, is one of you planning to work on wbdocs and/or phab08?
[16:24:30] Wikimedia Labs / deployment-prep (beta): Unable to connect to redis server - https://bugzilla.wikimedia.org/71415#c8 (Greg Grossmeier) (Actually, I might just ask out on [Ops] for some (SWAT) deployer to help out.)
[16:41:59] hmm, I don't get crontab email from tool labs...
[16:42:06] though it runs well.
[16:47:30] maybe you used the -quiet option
[17:06:01] Wikimedia Labs / deployment-prep (beta): Unable to connect to redis server - https://bugzilla.wikimedia.org/71415#c9 (Greg Grossmeier) REOP>RESO/FIX andrewbogott: i think Chris wanted to, but not today anymore
[18:16:56] <^d> wtf is up with wikitech?
[18:17:01] <^d> getting monospaced font everywhere.
[18:19:11] im not able to load my tool's web interface http://tools.wmflabs.org/recitation-bot/cgi-bin/add_doi.py
[18:19:25] oops, i see: status: DNS issues. ill be patient
[18:25:22] ^d: it looks ok to me...
[18:25:25] screenshot?
[18:26:22] andrewbogott, it does load now, thanks
[18:26:42] notconfusing_: howdy!
[18:26:48] ragesoss, hey
[18:26:52] I met one of your collaborators in Seattle the other week.
[18:27:05] :)
[18:27:21] <^d> andrewbogott: It's only one page. Must be some borked text.
[18:27:33] <^d> [[Search]]
[18:28:18] Please, how long a modified cron file to be considered?
[18:28:28] oh yeah, the sidebar is messed up there. neato
[18:28:45] <^d> andrewbogott: Missing
[18:28:55] <^d> Actually, double open.
[18:29:01] <^d> It had

foo
[18:29:02] <^d>	 :)
[18:29:22] 	 ragesoss, Thomas Maillart
[18:32:01] 	 andrewbogott: For some reason the office wi-fi DNS still hasn't picked the DNS for tools-login.wmflabs.org: "server can't find tools-login.wmflabs.org: NXDOMAIN". Everywhere else seems to have picked it up now though.
[18:32:30] 	 andrewbogott: Who should I bug about that?
[18:32:47] 	 mutante maybe?
[18:32:52] 	 If it's really an office IT thing, then probably Joel.  Weird though
[19:14:21] 	 Does anybody have a rough idea of how large an empty MediaWiki implementation is?
[19:14:34] 	 I just want an estimate.
[19:14:55] 	 Howie_: development or from a release tarball?
[19:15:08] 	 What's the difference?
[19:15:14] 	 Howie_: development has git history
[19:15:33] 	 Howie_: the git history for just mediawiki/core is ~500M  and takes up most of the initial space
[19:15:36] 	 How much does the git history contribute?
[19:15:48] 	 and then the extensions each have their own, some big some small(but none as big as core)
[19:15:53] 	 cal it a gig?
[19:16:01] 	 sweet
[19:16:02] 	 thanks!
[20:26:19] 	 !log deployment-prep Converted deployment-rsync02 to use local puppet & salt masters
[20:26:22] 	 Logged the message, Master
[21:14:48] 	 Wikimedia Labs / Infrastructure: Prevent puppet from creating local user when they are defined in LDAP - https://bugzilla.wikimedia.org/71480 (Antoine "hashar" Musso) NEW p:Unprio s:normal a:None We had a few LDAP rolling upgrades over the past few days. When puppet realize a User type, it...
[21:20:57] 	 is there a logstash for beta labs?
[21:21:02] 	 or just /data/project/logs/*
[21:21:17] 	 ebernhardson: https://logstash-beta.wmflabs.org/
[21:21:21] 	 \o/
[21:21:25] 	 bd808: thanks
[21:21:45] 	 If you need the password, let me know
[21:22:03] 	 bd808: turns out i do, it doesn't like my ldap(which makes sense)
[21:33:47] 	 Tool Labs tools / Erwin's tools: shared.css gives a 404 breaking layout. - https://bugzilla.wikimedia.org/71482 (Andre Koopal) NEW p:Unprio s:normal a:None Via Common.css the following css is included:  @import "//bits.wikimedia.org/static-current/skins/common/shared.css";  After advice on...
[21:40:07] 	 bd808: i dunno who to ask,  but it appears all runJobs logging in beta logstash stopped on 9-29
[21:40:21] 	 there is about 24-36 hours of no logs
[21:40:31] 	 hmm... is the job runner running?
[21:40:37] 	 which machine? i can check
[21:40:52] 	 deployment-jobrunner01
[21:40:54] 	 oh, duh
[21:41:42] 	 It may be dead. It likes to die
[21:41:46] 	 bd808: there is an active redisJobRunnerService process
[21:42:45] 	 ahha,
[21:42:49] 	 2014-09-30T21:42:32+0000: Could not connect to Redis server 10.68.16.146:.
[21:42:49] 	 mwscript showJobs.php --wiki=enwiki --list
[21:42:52] 	 I think it's dead
[21:42:58] 	 from /var/log/mediwiki/jobrunner.log
[21:43:13] 	 Ah a casualty of the redis01 dns problem
[21:43:26] 	 sudo service jobrunner restart
[21:43:55] 	 bd808: same output
[21:44:05] 	 bd808: i think its because the port is '.'
[21:44:09] 	 looking now where that comes from
[21:44:27] 	 empty port does seem not right
[21:45:08] 	 well, unless Redis treats null as default? i dont have the redis client source handy
[21:47:22] 	 from deployment-jobrunner01 i can't telnet to 10.68.16.146 6379, its filtered at the FW level most likely(no deny, just hang)
[21:47:38] 	 assuming we have redis on 6379, not sure how to check that either
[21:47:59] 	 I think it has a bad ip cached. I think it should be 10.68.16.177
[21:48:15] 	 There was a whole debacle with this yesterday
[21:48:50] 	 Ori accidentally made a second instance with the deployment-redis01 host name
[21:49:02] 	 which apparently wikitech is happy to let you do
[21:49:21] * bd808 double checks ip
[21:50:09] 	 ebernhardson: Yeah. Ip should be 10.68.16.177
[21:50:11] 	 i can verify /etc/jobrunner/jobrunner.conf still points to 10.68.16.146
[21:50:19] 	 hmm, so somewhere in puppet :)  looking
[21:50:37] 	 Oh. not using host name maybe
[21:51:05] 	 yup, no mediawiki-redis01, just raw ip
[21:51:44] 	 :(
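(Editor's note: the manual telnet test at 21:47 tells a hang, which points at a filtered port or a dead IP, apart from an outright refusal. A minimal Python sketch of the same probe follows. The helper name `probe` is hypothetical, and port 6379 is only Redis's usual default, which the channel itself was not sure applied here.)

```python
import socket


def probe(host, port, timeout=3.0):
    """Classify one TCP connection attempt as 'open', 'refused', or 'timeout'."""
    try:
        # create_connection raises if nothing answers within `timeout` seconds.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered but nothing is listening on that port.
        return "refused"
    except socket.timeout:
        # No answer at all: typical of a firewall drop or a stale IP.
        return "timeout"
    except OSError as exc:
        # Anything else (no route, DNS failure, ...) is reported verbatim.
        return "error: %s" % exc
```

(Running `probe("10.68.16.146", 6379)` and `probe("10.68.16.177", 6379)` from deployment-jobrunner01 would show which of the two addresses mentioned above actually answers.)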
[21:57:33] 	 PROBLEM - ToolLabs: Puppet failure events on labmon1001 is CRITICAL: CRITICAL: tools.tools-exec-02.puppetagent.failed_events.value (22.22%) tools.tools-webgrid-03.puppetagent.failed_events.value (33.33%)  
[22:00:06] 	 !log deployment-prep Cleaned deleted instances out of salt and trebuchet redis
[22:00:09] 	 Logged the message, Master
[22:07:33] 	 bd808: https://gerrit.wikimedia.org/r/163973 to switch from ip to deployment-redis01 (i also don't know anything about the puppet standards or how to deploy those)
[22:10:02] 	 ebernhardson: I'll check it out. I can cherry-pick to beta 
[22:10:40] 	 thanks
[22:14:52] 	 (CR) Catrope: [C: 2] No need to broadcast Citoid service deploy repo changes twice [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/163795 (owner: Jforrester)
[22:14:55] 	 (Merged) jenkins-bot: No need to broadcast Citoid service deploy repo changes twice [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/163795 (owner: Jforrester)
[22:16:50] 	 !log tools.lolrrit-wm Restart to fix Citoid duplicate definition
[22:16:52] 	 Logged the message, Master
[22:19:44] 	 (CR) Jforrester: "(And deployed.)" [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/163795 (owner: Jforrester)
[22:25:00] 	 RECOVERY - ToolLabs: Puppet failure events on labmon1001 is OK: OK: All targets OK  
[23:56:41] 	 ebernhardson: Job queue is almost empty in beta again thanks to your patch :)