[00:43:46] (CR) Krinkle: [C: 1] Add some more repos to #mediawiki-visualeditor [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/163322 (owner: Jforrester)
[00:47:22] (CR) Jforrester: [C: 2] "Per Timo." [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/163322 (owner: Jforrester)
[00:47:25] (Merged) jenkins-bot: Add some more repos to #mediawiki-visualeditor [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/163322 (owner: Jforrester)
[00:50:52] !log lolrrit-wm Restart to deploy to latest master
[00:50:52] lolrrit-wm is not a valid project.
[00:50:55] Bah.
[00:52:06] (How do I log something for a Tools log?)
[02:22:36] (PS1) Jforrester: No need to broadcast Citoid service deploy repo changes twice [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/163795
[04:46:56] James_F|Away: I think the syntax is !log tools. whatever
[06:35:31] RECOVERY - ToolLabs: Low disk space on /var on labmon1001 is OK: OK: All targets OK
[07:19:02] Wikimedia Labs / deployment-prep (beta): deployment-rsync01 20GB hard drive is too small - https://bugzilla.wikimedia.org/71431#c1 (Antoine "hashar" Musso) deployment-rsync01.eqiad.wmflabs ( https://wikitech.wikimedia.org/wiki/Nova_Resource:I-000002f4.eqiad.wmflabs ) is a m1 small with 20GB disk alloca...
[11:17:51] !log wikidata-build wikidata-builder3.eqiad.wmflabs apt-get dist-upgrade
[11:17:53] Logged the message, Master
[11:18:42] !log wikidata-build wikidata-jenkins1.eqiad.wmflabs apt-get dist-upgrade and puppet run with current operations/puppet.git
[11:18:43] Logged the message, Master
[11:32:57] Hmm, I have access to tool-labs, so I should be able to receive mail from revi(shell username)@tools.wmflabs.org, but when I send it, labs mailserver says email not available (also crontab email is not arriving)
[12:00:25] Silke_WMDE: Sorry to bother you again, I think you have received another email concerning Platonides' stuff in Toolserver, could you tell me how is the subject going?
[12:01:27] jem Sorry, no idea - the tools are all maintained by volunteers who generally don't "report" what's going on.
[12:01:32] That is...
[12:01:42] I suppose I saw a mail to nosy
[12:01:57] asking for more files from that directory
[12:02:46] what exactly do you need to know?
[12:03:56] I explain
[12:04:31] The problems is with a bot used for the WLM contest that supposedly is (was) under /home/platonides
[12:04:37] problem*
[12:04:46] i know
[12:04:59] you or someone from your group asked for access
[12:05:02] and got it
[12:05:22] Yes, but the previous file apparently was only the public_html contents, and the bot wasn't there
[12:05:37] I haven't seen that file, but that is what I have been told
[12:05:52] So I warned about it and Ecemaml sent another email yesterday
[12:06:04] I don't have access to the files
[12:06:06] it's nosy
[12:06:24] (aka marlen.caemmerer@wikimedia.de)
[12:06:38] please poke her again if you haven't heard anything yet
[12:06:53] I'm reviewing the email
[12:07:10] I'm in the middle of a meeting jem. I'll be back online later.
[12:07:12] Yes, it was sent to her, yesterday 10:41 CEST
[12:07:18] Ok, thanks, Silke_WMDE
[12:52:31] Tool Labs tools / [other]: Catscan2 offline - https://bugzilla.wikimedia.org/71402 (Fæ) a:Fæ
[13:00:00] Tool Labs tools / [other]: Catscan2 offline - https://bugzilla.wikimedia.org/71402#c1 (Fæ) NEW>RESO/INV Hurrah, I have no explanation but Catscan appears to be responding today.
[14:49:45] godog: monitoring: filippo-test-trusty <- ?
[14:51:10] andrewbogott: yep, I replied to your email, should be running an updated puppet
[14:51:30] its /etc/ldap.conf is still bad -- my test looks for ldap-eqiad in the ldap server list.
[14:51:40] which suggests that puppet is failing somehow.
[14:51:52] gah, ok I'll take a look now
[14:52:23] thanks
[14:54:00] -ldapserver = virt1000.wikimedia.org
[14:54:00] +ldapserver = ldap-eqiad.wikimedia.org
[14:54:02] andrewbogott: ^
[14:54:17] thanks
[14:54:47] I like how in this particular instance the last thing that puppet does is restart the puppetmaster, which doesn't restart and hence commits seppuku
[14:55:42] yeah, that seems to happen 100% of the time. It's a new problem, I don't know why
[15:04:54] hey I have a problem ssh-ing to wikidata-jenkins2.eqiad.wmflabs and wikidata-jenkins3.eqiad.wmflabs and wdjenkins.eqiad.wmflabs
[15:05:09] jzerebecki: I'll check, one moment...
[15:05:56] rarely it worked on 2 but then i could not do sudo -i
[15:06:08] try now?
[15:06:20] yep works on 2
[15:06:25] what did you do?
[15:06:54] service nslcd restart
[15:07:10] It's my fault -- when I updated ldap on a bunch of instances something went wrong with nslcd -- I'm not sure what.
[15:07:17] A reboot would fix things as well.
[15:07:26] are all three behaving OK now?
[15:08:00] the other two are still the same
[15:08:07] still broken? OK, let me look.
[15:08:15] wdjenkins is fine
[15:08:36] andrewbogott: ok all are fine now
[15:08:37] thx
[15:08:41] oh, great.
[15:12:50] bd808: Aha, thanks.
[15:13:46] jzerebecki: are you logging in to those boxes to update puppet? Looks like all three still have the old ldap settings
[15:17:36] bd808, hashar, ^d, someone who cares about beta: deployment-mediawiki02 and deployment-mediawiki03 aren't puppetizing properly and thus in danger of losing ldap
[15:17:43] Can one of y'all straighten that out?
[15:18:39] andrewbogott: Puppet has been disabled by someone there (_joe_?)
[15:18:57] I can turn puppet on, but I'm not sure what it will break. :(
[15:19:00] andrewbogott: ah sorry I forgot to fill a bug for them and poke related people
[15:19:48] _joe_: Did you disable puppet on deployment-mediawiki0[12]? If so do you remember why?
[15:23:24] So the only thing I see in SAL is "2014-09-04 13:54 _joe_: stopped puppet on the appservers but mw03, testing an apache change"
[15:23:52] andrewbogott: yes i'm updating puppet on them for the new ldap settings
[15:23:57] great!
[15:23:58] thanks
[15:25:19] Wikimedia Labs / deployment-prep (beta): deployment-rsync01 20GB hard drive is too small - https://bugzilla.wikimedia.org/71431#c2 (Greg Grossmeier) p:Unprio>Normal Let's not make the Jenkins beta-scap-eqiad job very divergent from prod (at all). Let's make the Beta Cluster like prod...
[15:28:21] !log deployment-prep enabling puppet and forcing run on deployment-mediawiki01
[15:28:24] Logged the message, Master
[15:30:37] andrewbogott: When I forced a puppet run on deployment-mediawiki01 it didn't show any changes. Seems odd
[15:30:54] bd808: if it's self-hosted then you'd need to rebase as well...
[15:31:23] it's from the beta self host...
[15:31:34] hm, weird
[15:31:53] What files can I spot check?
[15:32:14] my test looks at /etc/ldap.conf
[15:32:23] if it refers to virt1000, that's old, if ldap-eqiad, that's new
[15:32:26] ldap-eqiad.wikimedia.org
[15:32:33] weird
[15:32:51] so it did update, and just didn't tell you about it
[15:32:59] <_joe_> bd808: no I reenabled it on 01 actually in the past
[15:33:35] 02 still has the old stuff
[15:34:04] !log deployment-prep enabling puppet and forcing run on deployment-mediawiki02
[15:34:06] Logged the message, Master
[15:34:49] jeremyb: do you have anything to do with the 'planet' project these days?
[15:35:40] * bd808 sees that andrewbogott was pointing out 02 and 03 as the hosts needing help
[15:36:03] bd808: yep, everything else looks caught up
[15:36:11] Oh! Yeah, that explains why 01 was already working...
[15:36:17] it's because it was already working
[15:36:20] yeah :)
[15:36:24] 02 got a lot of changes
[15:36:48] !log deployment-prep enabling puppet and forcing run on deployment-mediawiki03
[15:36:51] Logged the message, Master
[15:37:37] YuviPanda|zzzz: I would like to kill the instance 'verpverpverp.' That or get you to fix it...
[15:43:35] !log wikidata-build wikidata-jenkins3.eqiad.wmflabs apt-get dist-upgrade and puppet run with current operations/puppet.git
[15:43:37] Logged the message, Master
[15:43:44] !log quarry: enabled puppet on quarry-runner-test, updated, installed a bunch of maria stuff, rebooted
[15:43:45] quarry: is not a valid project.
[15:43:59] !log quarry enabled puppet on quarry-runner-test, updated, installed a bunch of maria stuff, rebooted
[15:44:01] Logged the message, dummy
[15:44:02] !log wikidata-build wikidata-test.eqiad.wmflabs apt-get dist-upgrade and puppet run with current operations/puppet.git
[15:44:04] Logged the message, Master
[15:45:20] !log wikidata-build wikidata-jenkins2.eqiad.wmflabs apt-get dist-upgrade and puppet run with current operations/puppet.git
[15:45:22] Logged the message, Master
[15:47:14] Wikimedia Labs / deployment-prep (beta): deployment-rsync01 20GB hard drive is too small - https://bugzilla.wikimedia.org/71431#c3 (Sam Reed (reedy)) How long did it take to break? I deleted a weird tmp dir, killed the whole cache dir, and re-ran sync-common. Which gave ~2G free space. I'm wondering...
[16:00:51] !log wikidata-build wdjenkins.eqiad.wmflabs apt-get dist-upgrade and puppet run with current operations/puppet.git
[16:00:53] Logged the message, Master
[16:01:53] andrewbogott: All of the beta hosts should be updated for your ldap change now. Thanks for poking us.
[16:02:19] bd808: cool, thanks
[16:05:17] Hello
[16:05:41] Could I have some help with the use of crontab on Tool Labs? Doesn't work for me. Thanks by advance.
[16:07:16] of course I read https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help#Scheduling_jobs_at_regular_intervals_with_cron
[16:16:13] Automatik: I don't know anything more than that page says, but maybe if you explain your problem someone lurking will be able to give some suggestions.
[16:19:55] I mean when I execute the program directly, it works (but with the error "fatal: ambiguous argument 'HEAD': both revision and filename") whereas when I do a cron (for 12:28 UTC I tried: "28 12 * * * /usr/bin/jsub -N cron-tools.botomatik-1 -once -quiet python navig_mensuel.py"), it doesn't work (does anything).
[16:21:47] Wikimedia Labs / deployment-prep (beta): Unable to connect to redis server - https://bugzilla.wikimedia.org/71415#c7 (Greg Grossmeier) p:Highes>Immedi s:major>blocke a:Ori Livneh Ori: can you please take a look at this ASAP? Redis dependency is breaking Beta/it's unasable for testing now...
[16:23:55] jzerebecki or Chris-J_WMDE, is one of you planning to work on wbdocs and/or phab08?
[16:24:30] Wikimedia Labs / deployment-prep (beta): Unable to connect to redis server - https://bugzilla.wikimedia.org/71415#c8 (Greg Grossmeier) (Actually, I might just ask out on [Ops] for some (SWAT) deployer to help out.)
[16:41:59] hmm, I don't get crontab email from tool labs...
[16:42:06] though it runs well.
[16:47:30] maybe you used the -quiet option
[17:06:01] Wikimedia Labs / deployment-prep (beta): Unable to connect to redis server - https://bugzilla.wikimedia.org/71415#c9 (Greg Grossmeier) REOP>RESO/FIX andrewbogott: i think Chris wanted to, but not today anymore
[18:16:56] <^d> wtf is up with wikitech?
[18:17:01] <^d> getting monospaced font everywhere.
[18:19:11] im not able to load my tool's web interface http://tools.wmflabs.org/recitation-bot/cgi-bin/add_doi.py
[18:19:25] oops, i see: status: DNS issues. ill be patient
[18:25:22] ^d: it looks ok to me...
[18:25:25] screenshot?
[18:26:22] andrewbogott, it does load now, thanks
[18:26:42] notconfusing_: howdy!
[18:26:48] ragesoss, hey
[18:26:52] I met one of your collaborators in Seattle the other week.
[18:27:05] :)
[18:27:21] <^d> andrewbogott: It's only one page. Must be some borked text.
[18:27:33] <^d> [[Search]]
[18:28:18] Please, how long a modified cron file to be considered?
[18:28:28] oh yeah, the sidebar is messed up there. neato
[18:28:45] <^d> andrewbogott: Missing
[18:28:55] <^d> Actually, double open.
[18:29:01] <^d> It had foo
[18:29:02] <^d> :)
[18:29:22] ragesoss, Thomas Maillart
[18:32:01] andrewbogott: For some reason the office wi-fi DNS still hasn't picked the DNS for tools-login.wmflabs.org: "server can't find tools-login.wmflabs.org: NXDOMAIN". Everywhere else seems to have picked it up now though.
[18:32:30] andrewbogott: Who should I bug about that?
[18:32:47] mutante maybe?
[18:32:52] If it's really an office IT thing, then probably Joel. Weird though
[19:14:21] Does anybody have a rough idea of how large an empty MediaWiki implementation is?
[19:14:34] I just want an estimate.
[19:14:55] Howie_: development or from a release tarball?
[19:15:08] What's the difference?
[19:15:14] Howie_: development has git history
[19:15:33] Howie_: the git history for just mediawiki/core is ~500M and takes up most of the initial space
[19:15:36] How much does the git history contribute?
[19:15:48] and then the extensions each have their own, some big some small(but none as big as core)
[19:15:53] cal it a gig?
[19:16:01] sweet
[19:16:02] thanks!
[20:26:19] !log deployment-prep Converted deployment-rsync02 to use local puppet & salt masters
[20:26:22] Logged the message, Master
[21:14:48] Wikimedia Labs / Infrastructure: Prevent puppet from creating local user when they are defined in LDAP - https://bugzilla.wikimedia.org/71480 (Antoine "hashar" Musso) NEW p:Unprio s:normal a:None We had a few LDAP rolling upgrades over the past few days.
When puppet realize a User type, it...
[21:20:57] is there a logstash for beta labs?
[21:21:02] or just /data/project/logs/*
[21:21:17] ebernhardson: https://logstash-beta.wmflabs.org/
[21:21:21] \o/
[21:21:25] bd808: thanks
[21:21:45] If you need the password, let me know
[21:22:03] bd808: turns out i do, it doesn't like my ldap(which makes sense)
[21:33:47] Tool Labs tools / Erwin's tools: shared.css gives a 404 breaking layout. - https://bugzilla.wikimedia.org/71482 (Andre Koopal) NEW p:Unprio s:normal a:None Via Common.css the following css is included: @import "//bits.wikimedia.org/static-current/skins/common/shared.css"; After advice on...
[21:40:07] bd808: i dunno who to ask, but it appears all runJobs logging in beta logstash stopped on 9-29
[21:40:21] there is about 24-36 hours of no logs
[21:40:31] hmm... is the job runner running?
[21:40:37] which machine? i can check
[21:40:52] deployment-jobrunner01
[21:40:54] oh, duh
[21:41:42] It may be dead. It likes to die
[21:41:46] bd808: there is an active redisJobRunnerService process
[21:42:45] ahha,
[21:42:49] 2014-09-30T21:42:32+0000: Could not connect to Redis server 10.68.16.146:.
[21:42:49] mwscript showJobs.php --wiki=enwiki --list
[21:42:52] I think it's dead
[21:42:58] from /var/log/mediwiki/jobrunner.log
[21:43:13] Ah a casualty of the redis01 dns problem
[21:43:26] sudo service jobrunner restart
[21:43:55] bd808: same output
[21:44:05] bd808: i think its because the port is '.'
[21:44:09] looking now where that comes from
[21:44:27] empty port does seem not right
[21:45:08] well, unless Redis treats null as default? i dont have the redis client source handy
[21:47:22] from deployment-jobrunner01 i can't telnet to 10.68.16.146 6379, its filtered at the FW level most likely(no deny, just hang)
[21:47:38] assuming we have redis on 6379, not sure how to check that either
[21:47:59] I think it has a bad ip cached.
I think it should be 10.68.16.177
[21:48:15] There was a whole debacle with this yesterday
[21:48:50] Ori accidentally made a second instance with the deployment-redis01 host name
[21:49:02] which apparently wikitech is happy to let you do
[21:49:21] * bd808 double checks ip
[21:50:09] ebernhardson: Yeah. Ip should be 10.68.16.177
[21:50:11] i can verify /etc/jobrunner/jobrunner.conf still points to 10.68.16.146
[21:50:19] hmm, so somewhere in puppet :) looking
[21:50:37] Oh. not using host name maybe
[21:51:05] yup, no mediawiki-redis01, just raw ip
[21:51:44] :(
[21:57:33] PROBLEM - ToolLabs: Puppet failure events on labmon1001 is CRITICAL: CRITICAL: tools.tools-exec-02.puppetagent.failed_events.value (22.22%) tools.tools-webgrid-03.puppetagent.failed_events.value (33.33%)
[22:00:06] !log deployment-prep Cleaned deleted instances out of salt and trebuchet redis
[22:00:09] Logged the message, Master
[22:07:33] bd808: https://gerrit.wikimedia.org/r/163973 to switch from ip to deployment-redis01 (i also don't know anything about the puppet standards or how to deploy those)
[22:10:02] ebernhardson: I'll check it out. I can cherry-pick to beta
[22:10:40] thanks
[22:14:52] (CR) Catrope: [C: 2] No need to broadcast Citoid service deploy repo changes twice [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/163795 (owner: Jforrester)
[22:14:55] (Merged) jenkins-bot: No need to broadcast Citoid service deploy repo changes twice [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/163795 (owner: Jforrester)
[22:16:50] !log tools.lolrrit-wm Restart to fix Citoid duplicate definition
[22:16:52] Logged the message, Master
[22:19:44] (CR) Jforrester: "(And deployed.)" [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/163795 (owner: Jforrester)
[22:25:00] RECOVERY - ToolLabs: Puppet failure events on labmon1001 is OK: OK: All targets OK
[23:56:41] ebernhardson: Job queue is almost empty in beta again thanks to your patch :)
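
Note on the Redis endpoint discussed above: the jobrunner.log message names the server as 10.68.16.146: with nothing after the colon, and the config held a raw IP rather than the deployment-redis01 host name. The following is a minimal sketch, not the job runner's actual code, of how a "host:port" string with an empty port could be handled and how a raw-IP entry could be spotted. The function names parse_endpoint and uses_raw_ip are hypothetical, and falling back to 6379 (Redis's conventional default port) for an empty port is an assumption about sensible behaviour, not a statement of what the real client does. IPv6 literals are not handled.

```python
import ipaddress

DEFAULT_REDIS_PORT = 6379  # Redis's conventional default port


def parse_endpoint(endpoint, default_port=DEFAULT_REDIS_PORT):
    """Split 'host[:port]' into (host, port).

    Hypothetical helper: an empty or missing port falls back to
    default_port (an assumption, not the real job runner's behaviour).
    """
    host, _, port_text = endpoint.strip().partition(":")
    if not host:
        raise ValueError(f"missing host in {endpoint!r}")
    if not port_text:
        return host, default_port
    if not port_text.isdecimal() or not 0 < int(port_text) < 65536:
        raise ValueError(f"bad port in {endpoint!r}")
    return host, int(port_text)


def uses_raw_ip(host):
    """Return True when host is an IP literal instead of a host name."""
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    # The endpoint string as it appears in the log, minus the sentence period.
    host, port = parse_endpoint("10.68.16.146:")
    print(host, port, "raw ip" if uses_raw_ip(host) else "host name")
```

A check like uses_raw_ip could flag config entries that will go stale when an instance is recreated under the same host name with a new address, which is the situation described in the log.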