[00:18:13] 10MediaWiki-extensions-OpenStackManager, 10wikitech.wikimedia.org: Wikitech often loses track of internal openstack/nova session - https://phabricator.wikimedia.org/T101199#1332480 (10Krinkle) 3NEW [00:18:46] 6Labs, 10Labs-Infrastructure: labsconsole: Empty instance list - https://phabricator.wikimedia.org/T73731#1332493 (10Krinkle) [00:18:51] 10MediaWiki-extensions-OpenStackManager, 10wikitech.wikimedia.org: Wikitech often loses track of internal openstack/nova session - https://phabricator.wikimedia.org/T101199#1332496 (10Krinkle) [00:19:17] 6Labs, 10wikitech.wikimedia.org: Unable to see or delete existing web proxy - https://phabricator.wikimedia.org/T90391#1332498 (10Krinkle) 5Open>3Resolved a:3Krinkle [01:51:49] I applied for an OAuth token about a week and a half ago on Mediawiki.org and I haven't heard anything. Am I doing it wrong? [03:24:15] Magog_the_Ogre: yeah, you need to bug someone with staff rights on IRC :P [03:24:53] thanks legoktm [03:25:39] 6Labs, 10Labs-Infrastructure, 7Regression: `hostname -f` on cvn-app5.eqiad.wmflabs returning "error: Name or service not known" - https://phabricator.wikimedia.org/T101215#1332818 (10Krinkle) 3NEW [03:50:06] (03PS2) 10Mattflaschen: basic setup for the application [labs/tools/flow-oauth-demo] - 10https://gerrit.wikimedia.org/r/213590 (https://phabricator.wikimedia.org/T101217) (owner: 10Rjain) [06:33:33] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1207 is CRITICAL 70.00% of data above the critical threshold [0.0] [06:58:32] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1207 is OK Less than 1.00% above the threshold [0.0] [07:38:59] 6Labs, 5Patch-For-Review, 7database: Get Labs openstack service dbs on a proper db server - https://phabricator.wikimedia.org/T92693#1333065 (10jcrespo) a:5jcrespo>3None [08:54:31] YuviPanda: *poke* :) [08:54:37] hi addshore [08:54:38] 'sup [08:54:54] hows the cold north of the UK? :) [08:55:04] addshore: VERY [08:55:09] Also, any idea which files I should poke for https://phabricator.wikimedia.org/T100885 ? [08:55:31] loooking [08:55:57] addshore: hmm, apergos would know, I think. it's an nfs mount on teh dataset hosts, let me poke around [08:55:59] I would guess in https://github.com/wikimedia/operations-puppet/tree/production/modules/dataset/files/labs ? [08:56:08] kk :) [08:56:37] I mean, they are currently in /data/scratch wikibase and wikidata [08:56:41] addshore: looks like https://github.com/wikimedia/operations-puppet/blob/production/modules/dataset/files/labs/labs-rsync-cron.sh [08:56:43] but best make things consistent and easy :P [08:56:56] addshore: yes, and /public is in a different mount with less outages :) [08:57:01] xD [08:58:57] 10Tool-Labs, 3Labs-Sprint-100: Move toollabs to designate - https://phabricator.wikimedia.org/T100023#1303692 (10yuvipanda) This kind of blew up and caused a gridengine outage, but is all done now. Awaiting more details + incident report from @COren. [09:05:04] 6Labs, 10Datasets-General-or-Unknown, 10Labs-Infrastructure, 10Wikidata, 3Wikidata-Sprint-2015-06-02: Add Wikidata json dumps to labs in /public/dumps - https://phabricator.wikimedia.org/T100885#1333173 (10Addshore) a:3Addshore [09:06:52] awesome, patch up! 
;) [09:18:46] !log wdq-mm deleted wdq-mm-02 [09:18:51] Logged the message, Master [09:41:15] 6Labs, 10Datasets-General-or-Unknown, 10Labs-Infrastructure, 10Wikidata, and 2 others: Add Wikidata json dumps to labs in /public/dumps - https://phabricator.wikimedia.org/T100885#1333206 (10Addshore) So everyone watching this ticket has some idea of a timescale for this! ``` 10:06 AM (PS1) A... [10:28:06] 6Labs: Upgrade postgres on labsdb1004 / 1005 to 9.4 - https://phabricator.wikimedia.org/T101233#1333262 (10yuvipanda) 3NEW [10:32:44] 6Labs: Upgrade postgres on labsdb1004 / 1005 to 9.4, and PostGis 2.1 - https://phabricator.wikimedia.org/T101233#1333272 (10Yurik) [10:52:12] 6Labs: Cannot ssh into wdq-mm-02.eqiad.wmflabs - https://phabricator.wikimedia.org/T101102#1333304 (10yuvipanda) Recreated it, but I wdq-mm doesn't start yet. I guess the package needs to be updated and the data file copied over. [11:09:59] 6Labs: Cannot ssh into wdq-mm-02.eqiad.wmflabs - https://phabricator.wikimedia.org/T101102#1333322 (10Magnus) On wdq-mm-01, the file is /srv/wdq/latest.wdq I created that directory on 02, but can't scp the file over from 01 for some reason. [11:10:47] 6Labs: Cannot ssh into wdq-mm-02.eqiad.wmflabs - https://phabricator.wikimedia.org/T101102#1333323 (10yuvipanda) I usually just cp the file to /data/scratch and copy it back the other side :D [11:23:10] 10Tool-Labs-tools-Other, 10Phragile, 6TCB-Team: Deploy Phragile on tool-labs - https://phabricator.wikimedia.org/T100192#1333343 (10Tobi_WMDE_SW) It is deployed on http://phragile.wmflabs.org now but not jet puppetized. I'm closing this task now and there is a separate one for Puppet: T101235 [11:34:53] 6Labs: milimetric and halfak would like postgresql database access - https://phabricator.wikimedia.org/T91267#1333410 (10yuvipanda) 5Open>3Resolved halfak has access for a while now. [11:47:41] 6Labs: Cannot ssh into wdq-mm-02.eqiad.wmflabs - https://phabricator.wikimedia.org/T101102#1333486 (10Magnus) Thanks, that worked. File is copied. But the service still won't run. :-( [12:02:32] 6Labs, 6operations: labvirt1005 doesn't boot up - https://phabricator.wikimedia.org/T100030#1333509 (10faidon) What's the status of this? Is it blocked on someone outside the Labs team? [12:09:12] 6Labs, 6operations: labvirt1005 doesn't boot up - https://phabricator.wikimedia.org/T100030#1333520 (10yuvipanda) Ugh, this fell through the cracks :| Ideally, someone will investigate ways to get this machine booting up on a kernel that's new enough to not have the memory issues that @bblack pointed out - an... [12:17:14] 6Labs, 6operations, 3ToolLabs-Goals-Q4: Investigate kernel issues on labvirt** hosts - https://phabricator.wikimedia.org/T99738#1333536 (10yuvipanda) T100030 is related [12:21:30] 6Labs, 6operations: labvirt1005 doesn't boot up - https://phabricator.wikimedia.org/T100030#1333543 (10yuvipanda) I've asked for help in the ops@ list again. [12:28:24] 6Labs, 6operations: labvirt1005 doesn't boot up - https://phabricator.wikimedia.org/T100030#1333554 (10yuvipanda) @andrew says that similar issues had cropped up in another machine before, and a rollback to an older kernel fixed it. [12:47:14] YuviPanda: regarding the boot problems of virt* [12:47:21] what is the OS ? [12:47:25] ubuntu ? 
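The /data/scratch relay mentioned at 11:10 above is a two-hop copy through the shared scratch mount, useful when a direct scp between instances fails. A minimal sketch, using the file and paths named in that exchange:
```
# On wdq-mm-01: stage the data file on the shared scratch mount
cp /srv/wdq/latest.wdq /data/scratch/

# On wdq-mm-02: copy it from the shared mount into place
mkdir -p /srv/wdq
cp /data/scratch/latest.wdq /srv/wdq/
```
The relay works because /data/scratch is NFS-mounted on both instances, unlike instance-local storage.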
[13:20:53] 6Labs, 10Labs-Infrastructure, 6operations, 3Labs-Sprint-100: Make a block-level copy of the codfw mirror of labstore1001 to eqiad - https://phabricator.wikimedia.org/T101010#1333661 (10coren) The copy is progressing nicely (if not as fast as hoped); the bottleneck appears to be the ssh channel window size... [13:45:05] 6Labs, 5Patch-For-Review, 7database: Get Labs openstack service dbs on a proper db server - https://phabricator.wikimedia.org/T92693#1333697 (10Andrew) Is this done, or still pending the backup work? [14:14:16] 6Labs, 5Patch-For-Review, 7database: Get Labs openstack service dbs on a proper db server - https://phabricator.wikimedia.org/T92693#1333782 (10jcrespo) @Andrew Blocked by the above patches (backups). [14:15:55] 6Labs, 6operations: labvirt1005 doesn't boot up - https://phabricator.wikimedia.org/T100030#1333783 (10BBlack) The memory issues were just a random guess, not real evidence. I do think getting on newer kernels is probably a win in general, though. The alerts about not finding disks.... is this generic to all... [14:53:04] andrewbogott: YuviPanda: Anything to add to http://etherpad.wikimedia.org/p/Labs-20150602 [14:53:27] Not sure what I could put in Actionables though. [14:54:17] ‘fix gridengine’ :( [14:55:13] Well yeah, having the alias bypass the check is arguably a bug in gridengine but in fairness it's not /supposed/ to support renaming exec nodes at all in the first place. [14:55:35] Yeah, I guess renaming a running server isn’t exactly standard practice. [14:55:55] The "correct" thing to do would have been to create new nodes with the new naming scheme; we avoided that because disruptive outage for running jobs (which we did avoid) [14:57:06] "Host name will not change while the system is live" is not an entirely unreasonable presumption. :-) [14:57:35] yeah [14:57:59] I added a few actionable [14:57:59] s [14:59:10] andrewbogott: just for some impact feeling: https://commons.wikimedia.org/w/index.php?title=Special:ListFiles/Matanya&ilshowall=1 <- those videos were produced by the video project in labs, thank you! [14:59:29] YuviPanda: That second one would be an interesting exercise worthy of writing a whole book, not a recipe. I don't think it's plausible to write a tutorial on "how to do time recovery of a BDB including manually editing a dump for newbies" :-) [14:59:41] matanya: lots! [15:00:47] YuviPanda: Beyond "have had to recover broken BDB before lots in panicky situations and learn to read and edit the outbut of db_dump and friends" :-) [15:20:27] 6Labs, 10Labs-Infrastructure: missing database entries at categorylinks table on dewiki db - https://phabricator.wikimedia.org/T72711#1334028 (10Merl) And currently again: ``` $ mysql -hs5.labsdb -vvve "select page_id, page_latest, cl_to from dewiki_p.page left join dewiki_p.categorylinks on page_id=cl_from wh... [15:23:39] 6Labs, 10Labs-Infrastructure: missing database entries at categorylinks table on dewiki db - https://phabricator.wikimedia.org/T72711#1334042 (10coren) @merl: That query produces exactly the same result in production - whatever the issue you expect may be, it is not related to replication. 
[15:25:55] (03PS1) 10Sitic: logevent support [labs/tools/crosswatch] - 10https://gerrit.wikimedia.org/r/215639 [15:26:14] Coren: i am expecting the categories also shown at the bottom of http://de.wikipedia.org/w/index.php?title=Leopold_Mozart&oldid=142745685 [15:26:20] (03CR) 10Sitic: [C: 032 V: 032] logevent support [labs/tools/crosswatch] - 10https://gerrit.wikimedia.org/r/215639 (owner: 10Sitic) [15:27:10] to which project must the task moved then? [15:27:15] Merlissimo: That may be an issue related to the job runners, but I can tell you that production databases give exactly that result for that query. [15:27:37] Merlissimo: I *think* there is a related ticket already. Give me a minute. [15:30:31] Merlissimo: I can't find a specific bug. I might suggest MediaWiki-JobQueue as a first attempt, if it's not queue related, they're more likely to know where the issue is. Definitely not DB though [15:34:11] how long will the dns changes last? [15:34:48] doctaxon: They're intended to be permanent. [15:35:15] permanent until when? [15:36:23] Wait, I'm not sure I understand the context to your question, then. [15:38:22] My sessions on bastion, trusty and login shut down every minute [15:39:25] Hm. That has nothing to do with DNS (I expect you thought so because of the /topic which refers to something else and which is, in fact, done) [15:39:30] with connection abort [15:39:42] doctaxon: Lemme see if I can see why in the logs. [15:40:24] doctaxon: What is you username? [15:41:33] Coren - Can you access to the logs? [15:41:50] the bot username is taxonbot [15:44:32] doctaxon: I see nothing in the log except the connection closing on your end; nor do I see anyone else with suspiciously brief sessions. Perhaps there is a network issue on your side? Have you tried to connect elsewhere? [15:45:24] There is no possibility to connect elsewhere right now [15:46:16] You may try one of the general bastions. (bastion.wmflabs.org) [15:54:48] (03PS1) 10Ricordisamoa: Initial commit [labs/tools/translatemplate] - 10https://gerrit.wikimedia.org/r/215644 [15:55:06] Coren - general bastion does not support become [15:55:23] Well, no, this was only about testing your ssh issues. :-) [16:04:55] (03PS2) 10Ricordisamoa: Initial commit [labs/tools/translatemplate] - 10https://gerrit.wikimedia.org/r/215644 [16:14:24] 6Labs, 6Analytics-Kanban: LabsDB problems negatively affect analytics tools like Wikimetrics, Vital Signs, Quarry, etc. {mole} - https://phabricator.wikimedia.org/T76075#1334179 (10kevinator) 5Open>3declined a:3kevinator I'm closing this task because it is very broad and there are no clear next steps to... [16:38:12] 10Tool-Labs: Install "internetarchive" python module - https://phabricator.wikimedia.org/T100977#1334291 (10Nemo_bis) I don't understand, was the package actually installed? [16:49:09] 6Labs: Upgrade postgres on labsdb1004 / 1005 to 9.4, and PostGis 2.1 - https://phabricator.wikimedia.org/T101233#1334317 (10Yurik) [16:49:29] 10Tool-Labs: Install "internetarchive" python module - https://phabricator.wikimedia.org/T100977#1334320 (10yuvipanda) 5Resolved>3declined [16:50:13] 10Tool-Labs: Install "internetarchive" python module - https://phabricator.wikimedia.org/T100977#1325330 (10yuvipanda) That's better :) [17:11:00] thcipriani: how did staging hold up through the change? [17:11:58] so, changed the use_dnsmasq=false, which changed /etc/resolv.conf to use 208.80.154.12 as a nameserver, then it seems, it refuses to connect to ldap [17:12:15] .12? 
[17:12:19] It should be .20 I think [17:12:22] yeah, seems like it should be .20 [17:12:28] Sounds like your puppet repo is out of date [17:12:39] ah, lemme check that [17:13:06] I think there’s a class you can set that will automatically rebase on a cron. I don’t know how well it works though [17:16:52] andrewbogott: staging was indeed behind, rebased re-running puppet now [17:17:11] great. Should help, although you might have to hand-tune resolv.conf to get puppet runs [17:18:29] kk, so it is .20 now, and I can still hit ldap, so hooray :) [17:18:49] cool [17:20:20] andrewbogott: so, it seems, the master certname has updated in puppet.conf on staging-palladium, but not the agent certname. Even on subsequent runs. [17:21:01] Ok, let me have a look... [17:23:57] thcipriani: that’s the puppetmaster, right? [17:24:01] right [17:26:28] where is the name of the puppetmaster defined? [17:26:42] also, fun thing, since I set use_dnsmasq to false, and the nameserver was incorrect for whatever reason, the other instances in that project can connect up to the puppet master to receive the update and (with no ldap) I have no sudo privileges to update :( [17:27:01] s/can connect/can't connect/ [17:27:19] 6Labs, 10Labs-Infrastructure: unstable puppet runs on holmium - https://phabricator.wikimedia.org/T101281#1334458 (10Andrew) 3NEW a:3Andrew [17:27:21] andrewbogott: you mean in /etc/puppet/puppet.conf? [17:27:42] thcipriani: I mean, where is the puppet setting that tells the clients what the master is named. [17:27:48] I’m confused by hiera vs. ldap [17:28:28] ah, so it should be in hiera in the case of staging [17:28:42] it's definied in role::puppet::self as an argument ot that class [17:29:01] so if it's not defined in hiera, then it uses the top level variable retrieved from ldap [17:29:32] * andrewbogott looks at puppet code [17:30:05] 10Tool-Labs: Install "internetarchive" python module - https://phabricator.wikimedia.org/T100977#1334470 (10Sitic) @yuvipanda Nemos problem is actually that trusty has no pip installed (precise had): https://wikitech.wikimedia.org/wiki/Help_talk:Tool_Labs/Python_application_stub (virtualenv may be a bit overwhe... [17:32:06] thcipriani: I am going to rm -rf /etc/puppet/puppet.conf.d/10-self.conf to force it to regenerate. [17:32:41] kk [17:33:33] 10Tool-Labs: Install "internetarchive" python module - https://phabricator.wikimedia.org/T100977#1334483 (10Nemo_bis) Thanks both. >>! In T100977#1334470, @Sitic wrote: > @Nemo_bis Yuvi is basically asking you to run: > > ``` > virtualenv ~/env > source $HOME/env/bin/activate > echo "source $HOME/env/bin/activ... [17:33:43] ok, looks the same as before. To you too? [17:33:58] certname is i-0000094c.staging.eqiad.wmflabs which seems right to me [17:35:38] okie doke. Looks like the puppet master also generated private keys for that name at some point [17:37:17] 6Labs, 10Labs-Infrastructure: unstable puppet runs on holmium - https://phabricator.wikimedia.org/T101281#1334495 (10yuvipanda) This might be causing ocassional DNS outages, evident from some gridengine failures and some diamond error messages [17:38:28] andrewbogott: so the name staging-palladium.eqiad.wmflabs still resolves, will that always be true/should that be the case? [17:40:25] yes, for the near (and possibly distant) future the new dns maintains those names as well as the more correct .project.eqiad.wmflabs names. [17:40:45] It’s sort of bad practice to rely on them since they can be hijacked, but… that’s a memo for another day. 
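Pulling together the steps discussed above and applied later in this conversation (rebase the puppet repo, hand-tune the resolver, regenerate the agent certificate), a hedged sketch of the client-side recovery; the resolver IP and master name are the ones quoted here, and the ssl path assumes a stock Puppet 3 layout:
```
# On a stuck client: point the resolver at the new recursor by hand
# (normally puppet-managed, which is exactly what cannot run yet).
sudo sed -i 's/^nameserver .*/nameserver 208.80.154.20/' /etc/resolv.conf

# Drop the client's old certificates so a fresh one is requested.
sudo rm -rf /var/lib/puppet/ssl

# On the self-hosted puppetmaster, clean the stale client cert first, e.g.:
#   sudo puppet cert clean <client-fqdn>

# Then re-run the agent against the (still-resolvable) master name.
sudo puppet agent --test --server staging-palladium.eqiad.wmflabs
```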
[17:41:09] kk, just wondering if the salt minion conf would have to change [17:41:31] doesn't seem like it _has_ to, but probably should [17:41:46] andrewbogott: yes, I was hoping to start naming new toollabs hosts just 'exec-01' instead of 'tools-exec-01' but that's still problematic [17:42:05] thcipriani: yeah [17:42:11] So are you getting good puppet runs on clients now? [17:42:56] checking... [17:45:20] 6Labs, 7Blocked-on-Operations: Upgrade postgres on labsdb1004 / 1005 to 9.4, and PostGis 2.1 - https://phabricator.wikimedia.org/T101233#1334561 (10Yurik) [17:45:31] 6Labs, 10Maps, 7Blocked-on-Operations: Upgrade postgres on labsdb1004 / 1005 to 9.4, and PostGis 2.1 - https://phabricator.wikimedia.org/T101233#1333262 (10Yurik) [17:45:48] some are still in the state where they can't contact ldap :\ others are complaining about the cert from puppet master not matching. [17:45:54] others are fine [17:48:03] thcipriani: ok… you saw the step where you have to rename the puppetmaster on the client in puppet.conf? [17:49:43] missed that, just tried it, problem now is the old dns can't resolv the new hostname. [17:49:58] 10Tool-Labs: Install "internetarchive" python module - https://phabricator.wikimedia.org/T100977#1334616 (10Sitic) I've added it to https://wikitech.wikimedia.org/wiki/Help:Tool_Labs#My_tool_requires_a_package_that_is_not_currently_installed_in_Tool_Labs._How_can_I_add_it.3F I'm not sure what to do with https://... [17:50:32] andrewbogott: I'm on staging-sca01, FYI [17:50:46] hm, I wonder why that didn’t happen to me in my tests [17:51:03] I wonder if we could just delete the certs on the client instead of changing the hostname in puppet.conf [17:51:19] yes, that should be fine. [17:51:52] but, then, will it grab the cert for the old hostname again? Well, let's find out. [17:52:25] 6Labs, 10Labs-Infrastructure, 5Patch-For-Review: unstable puppet runs on holmium - https://phabricator.wikimedia.org/T101281#1334628 (10Andrew) https://gerrit.wikimedia.org/r/#/c/215673/ fixed the main problem. runs are still unstable due to ferm rules. [17:54:27] 10Tool-Labs: Install "internetarchive" python module - https://phabricator.wikimedia.org/T100977#1334657 (10yuvipanda) https://merlijn.vandeen.nl/2015/flask-mwoauth-on-tools.html has good info on setting up a flask app. [17:56:34] andrewbogott: hmm, so removed certs on client, cleaned the cert from master, requested new cert and it seems to have fetched the old one :\ [17:56:50] Server hostname 'staging-palladium.eqiad.wmflabs' did not match server certificate; [17:57:24] * andrewbogott will log in and look [17:57:52] maybe just manually wrangling resolv.conf or adding the master to /etc/hosts and renaming on the client [17:58:16] easiest things I can think of off the top of my head :\ [17:58:54] because the wrong dns ip got sent out everywhere… we’re in unexplored territory [17:59:53] right, this did resolve itself on other machines, oddly [18:01:14] Hey, there is a big issue, Can you install requests library in labs (both on tools and grid engine) [18:01:19] andrewbogott: ^ [18:01:33] pywikibot now is using requests [18:01:45] thcipriani: well, also puppet doesn’t run cleanly on your clients. [18:01:51] I mean, it doesn’t compile. [18:01:57] So nothing much is going to happen on those systems :( [18:02:04] have a look on staging-sca01 [18:02:17] Amir1: hi. you should use a virtualenv. 
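A minimal sketch of the per-tool virtualenv being recommended here, assuming a tool account on a Trusty host and requests as the dependency in question; the paths and the job command are illustrative:
```
# One-time setup inside the tool account (virtualenv bundles its own pip)
virtualenv ~/venv
source ~/venv/bin/activate
pip install requests

# Jobs then run against the venv's interpreter, e.g.:
#   jsub ~/venv/bin/python ~/mybot/bot.py
```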
[18:02:36] hey [18:03:06] It's possible but it would make things easier for people [18:03:08] hmm, bitrot of some of these patches, but the fact that it's contacting the server is good enough in that instance [18:03:33] Amir1: not in the long term no. with a virtualenv, you control all the depdencies yourselves. [18:03:37] Amir1: I was worried about that too when I saw the sudden switch, but requests seems to be installed (at least the exec nodes) [18:03:39] Amir1: also, requests is already installed anyway. [18:03:58] but in the future (we should write this down somewhere), I think we should be encouraging more people to use virtualenvs [18:04:15] hmm [18:04:16] thcipriani: so, I just did what you suggested… pasted the new resolver ip into resolv.conf and edited the server name. [18:04:16] ok [18:04:29] Amir1: in the case of pywikibot, we could probably provide a venv. Not completely sure. [18:04:44] also, requests has the complete mess there's various versions used [18:04:48] andrewbogott: kk, noted if we run into it with deployment-prep. [18:05:08] yes, and global packages have the problem that some people depend on one version and others on another [18:05:09] YuviPanda: is there a reason to prefer virtualenv and not pip install --user? (except having several virtualenvs) [18:05:39] sitic: vitualenvs are per-app [18:05:45] sitic: and dependencies are per-app, so that makes more sense [18:06:08] YuviPanda or andrewbogott: I'm having problems adding a user to the suggestbot project through wikitech. Is that a known issue? [18:06:08] One noteworthy thing before moving into deployment prep: I think the ldap-yaml-enc.py update has to merge first [18:06:11] valhallasw: I mean that pip is not installed in trusty, so that you have to use virtualenv [18:06:18] Nettrom: have you tried logging out and logging back in? [18:06:26] sitic: that's just ubuntu retardedness [18:06:33] ok [18:06:40] idk, the old tools-login someone might have bypassed puppet and installed pip? [18:06:57] I think python-pip should just be installed [18:07:02] YuviPanda: ok, have you checked the issue about dump reading? [18:07:06] YuviPanda: yeah, didn't help. the user I'm trying to add isn't in the dropdown list [18:07:07] It's a big blocker for me [18:07:21] Amir1: can you file a bug? [18:07:28] I already did [18:07:43] YuviPanda: is it the case that labs ldap info now comes from that ldap-yaml-enc.py script? [18:07:43] alright. I haven't gotten around to be able to look at it yet, sadly :( [18:07:46] Nettrom: um… drop down list? Sounds like you’re trying to add an admin [18:07:48] rather than a normal user [18:07:54] thcipriani: only for staging, actually. [18:08:06] andrewbogott: so I should not be doing this from https://wikitech.wikimedia.org/w/index.php?title=Special:NovaServiceGroup&action=managemembers&projectname=tools&servicegroupname=tools.suggestbot ? [18:08:11] YuviPanda: https://phabricator.wikimedia.org/T100227 [18:08:15] ah, ok, I can see the script in puppet.conf on other instances, but wasn't sure what it's role was [18:08:17] valhallasw: it's going to error for people installing global scripts anyway, I guess. I don't mind getting it installed. [18:08:24] Nettrom: there are two ‘add user’ links. One adds them to a role, the one to the right adds them to the project... [18:08:36] s/add user/add member/ [18:08:41] Nettrom: see what I mean? 
[18:09:02] Whoah, wait, you’re talking about service groups and not projects… [18:09:08] * andrewbogott pretty confused [18:09:12] andrewbogott: yeah, sorry [18:09:39] what user are you trying to add? [18:09:46] andrewbogott: kjschiroo [18:10:37] 10Tool-Labs: make https://dumps.wikimedia.org/other/wikidata/ available on tool labs - https://phabricator.wikimedia.org/T98655#1334758 (10Sitic) [18:10:40] 6Labs, 10Datasets-General-or-Unknown, 10Labs-Infrastructure, 10Wikidata, and 2 others: Add Wikidata json dumps to labs in /public/dumps - https://phabricator.wikimedia.org/T100885#1334759 (10Sitic) [18:11:20] Nettrom: it… works fine for me. You’re seeing a million names in the drop-down but not that one? [18:11:53] andrewbogott: exactly! [18:12:31] Nettrom: well, I added them. I can’t explain what you’re seeing. [18:13:02] andrewbogott: I couldn't figure it out either, since his user talk page indicated he's already got all the access needed [18:13:21] andrewbogott: thanks for the help! [18:15:22] YuviPanda: looking at deployment_salt puppet.conf, somehow node_terminus is exec and not ldap. I may be misunderstanding something... [18:15:53] thcipriani: oh, is deployment-salt is also using ENC? [18:16:04] I _think_ so [18:17:29] yes, it is: https://wikitech.wikimedia.org/wiki/Hiera:Deployment-prep [18:17:45] role::puppet::self::enc: yaml+ldap [18:18:12] same with Integration: https://wikitech.wikimedia.org/wiki/Hiera:Integration [18:19:07] which, if we're sticking with that, we'll need to merge https://gerrit.wikimedia.org/r/#/c/202790/ before the new dns rolls [18:19:07] thcipriani: ugh, sorry, in like, 4 different conversations at once :) [18:19:28] thcipriani: am merging that now [18:19:45] YuviPanda: kk, thanks [18:19:59] thcipriani: has staging been swiched already? [18:20:04] YuviPanda: yup [18:20:50] thcipriani: done [18:20:52] andrewbogott: that enc change will have to roll out before we can switch use_dnsmasq to false on any projects using node_terminus: exec in puppet [18:20:56] YuviPanda: thanks :) [18:21:13] thcipriani: ah, true. [18:21:25] But step 1) update puppet will roll it out, right? [18:21:39] there's also only 3 projects that use it I think [18:21:58] yeah, puppet update should roll it, I think [18:22:29] 6Labs, 10Labs-Infrastructure, 5Patch-For-Review: unstable puppet runs on holmium - https://phabricator.wikimedia.org/T101281#1334852 (10Andrew) 5Open>3Resolved [18:23:27] andrewbogott: ok, ready to break deployment-prep? :) [18:24:04] thcipriani: yep! No time like the present. [18:24:20] alright, jumping on deployment salt to update puppet [18:24:26] logging in -releng [18:30:12] 10Tool-Labs, 10Incident-20150602-gridengine-dns-failure: Tools: puppetize the alias_hosts workaround for mismatching DNS node names - https://phabricator.wikimedia.org/T101296#1334896 (10coren) 3NEW a:3coren [18:32:15] ok, deployment-salt puppet updated, running once manually before changing use_dnsmasq [18:49:03] YuviPanda: you told me about release and how I can use it in jsub but I forgot :P [18:50:36] Amir1: ah. 
-l release=trusty [18:50:53] valhallasw: ^ we should make that the default for tools that don't already have a job running in precise at some point [18:50:55] not sure how to best do that [18:57:03] thanks :) [18:57:34] PROBLEM - Puppet failure on tools-bastion-01 is CRITICAL 50.00% of data above the critical threshold [0.0] [18:58:08] PROBLEM - Puppet failure on tools-exec-1210 is CRITICAL 22.22% of data above the critical threshold [0.0] [18:59:32] PROBLEM - Puppet failure on tools-exec-wmt is CRITICAL 30.00% of data above the critical threshold [0.0] [18:59:42] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1407 is CRITICAL 50.00% of data above the critical threshold [0.0] [18:59:45] PROBLEM - Puppet failure on tools-webgrid-generic-1402 is CRITICAL 60.00% of data above the critical threshold [0.0] [18:59:45] PROBLEM - Puppet failure on tools-trusty is CRITICAL 50.00% of data above the critical threshold [0.0] [19:00:11] PROBLEM - Puppet failure on tools-redis-02 is CRITICAL 66.67% of data above the critical threshold [0.0] [19:00:11] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1208 is CRITICAL 66.67% of data above the critical threshold [0.0] [19:00:11] PROBLEM - Puppet failure on tools-exec-1204 is CRITICAL 55.56% of data above the critical threshold [0.0] [19:00:23] andrewbogott: ^ this is us and not you, do not worry :) [19:01:19] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1202 is CRITICAL 55.56% of data above the critical threshold [0.0] [19:01:29] YuviPanda: thanks :) [19:02:19] PROBLEM - Puppet failure on tools-checker-02 is CRITICAL 50.00% of data above the critical threshold [0.0] [19:02:41] PROBLEM - Puppet failure on tools-exec-1202 is CRITICAL 20.00% of data above the critical threshold [0.0] [19:03:07] PROBLEM - Puppet failure on tools-exec-1403 is CRITICAL 33.33% of data above the critical threshold [0.0] [19:03:23] PROBLEM - Puppet failure on tools-exec-1201 is CRITICAL 60.00% of data above the critical threshold [0.0] [19:03:41] PROBLEM - Puppet failure on tools-exec-1402 is CRITICAL 40.00% of data above the critical threshold [0.0] [19:04:13] PROBLEM - Puppet failure on tools-webgrid-generic-1404 is CRITICAL 55.56% of data above the critical threshold [0.0] [19:05:46] PROBLEM - Puppet failure on tools-exec-1207 is CRITICAL 50.00% of data above the critical threshold [0.0] [19:05:52] PROBLEM - Puppet failure on tools-mail is CRITICAL 30.00% of data above the critical threshold [0.0] [19:06:22] PROBLEM - Puppet failure on tools-exec-1216 is CRITICAL 22.22% of data above the critical threshold [0.0] [19:06:30] PROBLEM - Puppet failure on tools-master is CRITICAL 20.00% of data above the critical threshold [0.0] [19:06:42] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1405 is CRITICAL 40.00% of data above the critical threshold [0.0] [19:06:46] PROBLEM - Puppet failure on tools-bastion-02 is CRITICAL 40.00% of data above the critical threshold [0.0] [19:07:06] PROBLEM - Puppet failure on tools-exec-1212 is CRITICAL 33.33% of data above the critical threshold [0.0] [19:07:18] PROBLEM - Puppet failure on tools-shadow is CRITICAL 40.00% of data above the critical threshold [0.0] [19:07:36] PROBLEM - Puppet failure on tools-precise-dev is CRITICAL 30.00% of data above the critical threshold [0.0] [19:08:02] PROBLEM - Puppet failure on tools-exec-1407 is CRITICAL 20.00% of data above the critical threshold [0.0] [19:08:47] PROBLEM - Puppet failure on tools-exec-1208 is CRITICAL 60.00% of data above the critical threshold [0.0] [19:08:57] PROBLEM - Puppet 
failure on tools-exec-catscan is CRITICAL 60.00% of data above the critical threshold [0.0] [19:09:01] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1204 is CRITICAL 50.00% of data above the critical threshold [0.0] [19:09:13] PROBLEM - Puppet failure on tools-exec-1401 is CRITICAL 66.67% of data above the critical threshold [0.0] [19:09:21] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1409 is CRITICAL 60.00% of data above the critical threshold [0.0] [19:09:33] YuviPanda: not sure how to do that without breaking stuff [19:09:35] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1207 is CRITICAL 22.22% of data above the critical threshold [0.0] [19:09:42] valhallasw: yeah [19:09:49] PROBLEM - Puppet failure on tools-webgrid-generic-1403 is CRITICAL 20.00% of data above the critical threshold [0.0] [19:09:58] YuviPanda: I think we should move to the 'set in manifest, start/stop with webservice' model, and then we can just change all manifests [19:10:04] valhallasw: +1 [19:10:31] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1206 is CRITICAL 60.00% of data above the critical threshold [0.0] [19:10:45] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1406 is CRITICAL 50.00% of data above the critical threshold [0.0] [19:10:50] valhallasw: that should be part of the 'one script runs as the user and does things' that you mentioned for tools-manifest [19:10:53] PROBLEM - Puppet failure on tools-exec-1405 is CRITICAL 60.00% of data above the critical threshold [0.0] [19:10:53] PROBLEM - Puppet failure on tools-webgrid-generic-1401 is CRITICAL 50.00% of data above the critical threshold [0.0] [19:10:58] *nod* [19:11:01] PROBLEM - Puppet failure on tools-services-01 is CRITICAL 50.00% of data above the critical threshold [0.0] [19:11:33] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1403 is CRITICAL 40.00% of data above the critical threshold [0.0] [19:11:35] valhallasw: also, https://gerrit.wikimedia.org/r/#/c/215505/ :) [19:11:57] PROBLEM - Puppet failure on tools-checker-01 is CRITICAL 30.00% of data above the critical threshold [0.0] [19:11:59] PROBLEM - Puppet failure on tools-redis-01 is CRITICAL 30.00% of data above the critical threshold [0.0] [19:12:05] YuviPanda: oh, cool [19:12:11] don't have time to look at it atm [19:12:18] PROBLEM - Puppet failure on tools-exec-cyberbot is CRITICAL 44.44% of data above the critical threshold [0.0] [19:12:19] PROBLEM - Puppet failure on tools-exec-1211 is CRITICAL 40.00% of data above the critical threshold [0.0] [19:12:26] valhallasw: cool. do you think you'll have time anytime this week? 
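A worked example of the -l release=trusty answer given at 18:50 above; the job name and script path are hypothetical:
```
# Submit a grid job pinned to a Trusty exec node instead of the Precise default
jsub -l release=trusty -N my-bot python ~/mybot/bot.py
```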
[19:12:43] not sure, but I can probably give it a glance-over [19:13:02] valhallasw: yeah, I will try to nitpick to your quality before merging, but there are some arch changes I'd love to have a look over [19:13:52] PROBLEM - Puppet failure on tools-exec-1406 is CRITICAL 50.00% of data above the critical threshold [0.0] [19:14:00] PROBLEM - Puppet failure on tools-exec-1209 is CRITICAL 40.00% of data above the critical threshold [0.0] [19:22:33] RECOVERY - Puppet failure on tools-bastion-01 is OK Less than 1.00% above the threshold [0.0] [19:24:41] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1407 is OK Less than 1.00% above the threshold [0.0] [19:24:45] RECOVERY - Puppet failure on tools-webgrid-generic-1402 is OK Less than 1.00% above the threshold [0.0] [19:25:09] RECOVERY - Puppet failure on tools-redis-02 is OK Less than 1.00% above the threshold [0.0] [19:25:11] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1208 is OK Less than 1.00% above the threshold [0.0] [19:26:19] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1202 is OK Less than 1.00% above the threshold [0.0] [19:27:11] RECOVERY - Puppet failure on tools-checker-02 is OK Less than 1.00% above the threshold [0.0] [19:28:07] RECOVERY - Puppet failure on tools-exec-1210 is OK Less than 1.00% above the threshold [0.0] [19:28:25] RECOVERY - Puppet failure on tools-exec-1201 is OK Less than 1.00% above the threshold [0.0] [19:29:14] RECOVERY - Puppet failure on tools-webgrid-generic-1404 is OK Less than 1.00% above the threshold [0.0] [19:29:34] RECOVERY - Puppet failure on tools-exec-wmt is OK Less than 1.00% above the threshold [0.0] [19:29:46] RECOVERY - Puppet failure on tools-trusty is OK Less than 1.00% above the threshold [0.0] [19:30:10] RECOVERY - Puppet failure on tools-exec-1204 is OK Less than 1.00% above the threshold [0.0] [19:32:42] RECOVERY - Puppet failure on tools-exec-1202 is OK Less than 1.00% above the threshold [0.0] [19:32:52] has anyone got wikidata toolkit running on tools-labs, or any other remote server? [19:32:52] i would like to do this so that downloading the data dump is fast, or perhaps ever take from local storage? [19:33:06] RECOVERY - Puppet failure on tools-exec-1403 is OK Less than 1.00% above the threshold [0.0] [19:33:38] RECOVERY - Puppet failure on tools-exec-1402 is OK Less than 1.00% above the threshold [0.0] [19:33:49] RECOVERY - Puppet failure on tools-exec-1208 is OK Less than 1.00% above the threshold [0.0] [19:33:58] RECOVERY - Puppet failure on tools-exec-catscan is OK Less than 1.00% above the threshold [0.0] [19:34:00] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1204 is OK Less than 1.00% above the threshold [0.0] [19:34:05] notconfusing: emailing labs-l is probably going to get a better answer. 
there's also wdq.wmflabs.org [19:34:12] RECOVERY - Puppet failure on tools-exec-1401 is OK Less than 1.00% above the threshold [0.0] [19:34:23] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1409 is OK Less than 1.00% above the threshold [0.0] [19:35:42] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1406 is OK Less than 1.00% above the threshold [0.0] [19:35:47] RECOVERY - Puppet failure on tools-exec-1207 is OK Less than 1.00% above the threshold [0.0] [19:35:51] RECOVERY - Puppet failure on tools-mail is OK Less than 1.00% above the threshold [0.0] [19:35:51] RECOVERY - Puppet failure on tools-webgrid-generic-1401 is OK Less than 1.00% above the threshold [0.0] [19:35:53] RECOVERY - Puppet failure on tools-exec-1405 is OK Less than 1.00% above the threshold [0.0] [19:35:59] RECOVERY - Puppet failure on tools-services-01 is OK Less than 1.00% above the threshold [0.0] [19:36:21] RECOVERY - Puppet failure on tools-exec-1216 is OK Less than 1.00% above the threshold [0.0] [19:36:33] RECOVERY - Puppet failure on tools-master is OK Less than 1.00% above the threshold [0.0] [19:36:39] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1405 is OK Less than 1.00% above the threshold [0.0] [19:36:45] RECOVERY - Puppet failure on tools-bastion-02 is OK Less than 1.00% above the threshold [0.0] [19:37:09] RECOVERY - Puppet failure on tools-exec-1212 is OK Less than 1.00% above the threshold [0.0] [19:37:19] RECOVERY - Puppet failure on tools-shadow is OK Less than 1.00% above the threshold [0.0] [19:37:23] RECOVERY - Puppet failure on tools-exec-cyberbot is OK Less than 1.00% above the threshold [0.0] [19:37:35] RECOVERY - Puppet failure on tools-precise-dev is OK Less than 1.00% above the threshold [0.0] [19:38:03] RECOVERY - Puppet failure on tools-exec-1407 is OK Less than 1.00% above the threshold [0.0] [19:38:53] RECOVERY - Puppet failure on tools-exec-1406 is OK Less than 1.00% above the threshold [0.0] [19:39:33] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1207 is OK Less than 1.00% above the threshold [0.0] [19:39:47] YuviPanda, that's a good idea [19:39:49] RECOVERY - Puppet failure on tools-webgrid-generic-1403 is OK Less than 1.00% above the threshold [0.0] [19:40:33] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1206 is OK Less than 1.00% above the threshold [0.0] [19:41:38] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1403 is OK Less than 1.00% above the threshold [0.0] [19:41:58] RECOVERY - Puppet failure on tools-checker-01 is OK Less than 1.00% above the threshold [0.0] [19:42:00] RECOVERY - Puppet failure on tools-redis-01 is OK Less than 1.00% above the threshold [0.0] [19:42:16] RECOVERY - Puppet failure on tools-exec-1211 is OK Less than 1.00% above the threshold [0.0] [19:43:58] RECOVERY - Puppet failure on tools-exec-1209 is OK Less than 1.00% above the threshold [0.0] [19:55:07] are the Wikidata json dumps available from tools-labs, it seems that there are only the xml ones? [19:55:07] well those are in the dumps/public/wikidatawiki folder, but I see that their public URL is http://dumps.wikimedia.org/other/wikidata/ [19:57:01] notconfusing: addshore was looking into them earlier, I think they're available somewhere atm not in the usual place and he's working on making them available from the usual place [19:57:23] notconfusing: https://phabricator.wikimedia.org/T100885 [19:58:27] YuviPanda, you just have all the answers! 
:) [19:58:31] Fantastic [20:11:55] PROBLEM - Puppet failure on tools-webgrid-generic-1401 is CRITICAL 60.00% of data above the critical threshold [0.0] [20:17:25] YESSS http://tools.wmflabs.org/lolrrit-wm/ :) [20:17:45] (I kille it, but still) [20:22:37] where's the tool i remember from the past that showed me global edit count of a user across all wikis [20:23:37] mutante: https://tools.wmflabs.org/guc/ ? [20:26:22] valhallasw: yes, thank you! [20:29:28] works, just odd that it claims it found all the edits in "1" project [20:34:14] YuviPanda: No moar failures that I can see. [20:34:31] YuviPanda: But also, not an unheard of issue: https://bugzilla.mozilla.org/show_bug.cgi?id=214625 [20:35:30] There's definitely a bug in the libc resolver, but it's little known because who *has* kilobyte-long lines in their hosts file? :-) [20:36:51] RECOVERY - Puppet failure on tools-webgrid-generic-1401 is OK Less than 1.00% above the threshold [0.0] [20:44:43] Coren: I haven't been able to access beta labs from the office all day. JonR says he can get to it from home though. Any idea what might be wrong? [20:45:07] kaldari: Hm, that seems odd. What host are you trying to reach exactly? [20:45:18] I can reach it okay, but then again I'm also home. [20:45:29] en.wikipedia.beta.wmflabs.org which resolves to 208.80.155.135 [20:45:48] or en.m.wikipedia.beta.wmflabs.org [20:46:45] Yeah, wfm here and from my colo server in ohio [20:46:59] I think you need to talk to OIT [20:47:36] Coren: thanks for checking. I'll ask OIT [20:47:38] Have you checked if others in the office have the issue? [20:55:43] PROBLEM - Puppet failure on tools-webgrid-generic-1402 is CRITICAL 20.00% of data above the critical threshold [0.0] [20:56:55] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1408 is CRITICAL 30.00% of data above the critical threshold [0.0] [20:59:05] PROBLEM - Puppet failure on tools-exec-1210 is CRITICAL 44.44% of data above the critical threshold [0.0] [20:59:21] PROBLEM - Puppet failure on tools-exec-1201 is CRITICAL 20.00% of data above the critical threshold [0.0] [21:00:13] PROBLEM - Puppet failure on tools-webgrid-generic-1404 is CRITICAL 22.22% of data above the critical threshold [0.0] [21:00:33] PROBLEM - Puppet failure on tools-exec-wmt is CRITICAL 40.00% of data above the critical threshold [0.0] [21:00:39] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1407 is CRITICAL 60.00% of data above the critical threshold [0.0] [21:00:45] PROBLEM - Puppet failure on tools-trusty is CRITICAL 60.00% of data above the critical threshold [0.0] [21:01:10] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1208 is CRITICAL 66.67% of data above the critical threshold [0.0] [21:01:10] PROBLEM - Puppet failure on tools-exec-1204 is CRITICAL 66.67% of data above the critical threshold [0.0] [21:02:18] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1202 is CRITICAL 55.56% of data above the critical threshold [0.0] [21:03:14] PROBLEM - Puppet failure on tools-checker-02 is CRITICAL 60.00% of data above the critical threshold [0.0] [21:03:42] PROBLEM - Puppet failure on tools-exec-1202 is CRITICAL 20.00% of data above the critical threshold [0.0] [21:04:06] PROBLEM - Puppet failure on tools-exec-1403 is CRITICAL 55.56% of data above the critical threshold [0.0] [21:04:38] PROBLEM - Puppet failure on tools-exec-1402 is CRITICAL 50.00% of data above the critical threshold [0.0] [21:04:58] PROBLEM - Puppet failure on tools-exec-catscan is CRITICAL 20.00% of data above the critical threshold [0.0] [21:06:45] 
PROBLEM - Puppet failure on tools-exec-1207 is CRITICAL 50.00% of data above the critical threshold [0.0] [21:06:50] PROBLEM - Puppet failure on tools-exec-1405 is CRITICAL 20.00% of data above the critical threshold [0.0] [21:06:51] PROBLEM - Puppet failure on tools-mail is CRITICAL 40.00% of data above the critical threshold [0.0] [21:07:00] PROBLEM - Puppet failure on tools-services-01 is CRITICAL 20.00% of data above the critical threshold [0.0] [21:07:23] PROBLEM - Puppet failure on tools-exec-1216 is CRITICAL 33.33% of data above the critical threshold [0.0] [21:07:39] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1405 is CRITICAL 50.00% of data above the critical threshold [0.0] [21:07:43] PROBLEM - Puppet failure on tools-bastion-02 is CRITICAL 50.00% of data above the critical threshold [0.0] [21:07:53] PROBLEM - Puppet failure on tools-webgrid-generic-1401 is CRITICAL 30.00% of data above the critical threshold [0.0] [21:08:07] PROBLEM - Puppet failure on tools-exec-1212 is CRITICAL 44.44% of data above the critical threshold [0.0] [21:08:23] PROBLEM - Puppet failure on tools-shadow is CRITICAL 50.00% of data above the critical threshold [0.0] [21:08:37] PROBLEM - Puppet failure on tools-precise-dev is CRITICAL 40.00% of data above the critical threshold [0.0] [21:09:45] PROBLEM - Puppet failure on tools-exec-1208 is CRITICAL 60.00% of data above the critical threshold [0.0] [21:10:01] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1204 is CRITICAL 60.00% of data above the critical threshold [0.0] [21:10:13] PROBLEM - Puppet failure on tools-exec-1401 is CRITICAL 66.67% of data above the critical threshold [0.0] [21:10:21] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1409 is CRITICAL 55.56% of data above the critical threshold [0.0] [21:12:40] uhh, is this an ldap thing that's happening? [21:12:49] or a network thing [21:13:11] I guess, what I'm saying is, I can't hit ldap in deployment-prep [21:13:19] thcipriani: I’ll look [21:13:29] YuviPanda: and Coren: trained me earlier to ignore those alerts today [21:13:56] kaldari: any difference between using guest wifi, other wifi or cabled network? because i think there is more than 1 provider for office and some goes via ULSFO and some does not [21:14:22] I'll try the others... [21:17:29] thcipriani: I have no issues reaching ldap. What exactly are you trying? [21:18:29] Coren: just doing some manual puppet runs on deployment-prep instances [21:18:40] thcipriani, kaldari i am getting an ldap error trying to use `become` from the office guest wifi [21:19:00] sudo: ldap_start_tls_s(): Connect error [21:19:01] sudo: a password is required [21:19:07] also: /usr/local/bin/ldap-yaml-enc.py deployment-salt.eqiad.wmflabs is returning Unable to connect to LDAP host: ldap://ldap-eqiad.wikimedia.org:389 ldap://ldap-codfw.wikimedia.org:389 [21:19:07] RECOVERY - Puppet failure on tools-exec-1210 is OK Less than 1.00% above the threshold [0.0] [21:19:08] Ah. [21:19:20] andrewbogott: I can see it now. LDAP seems actually down. [21:19:42] Coren: yes, it was refusing connections from more and more instances. I’m restarting, we’ll see what happens [21:19:52] of course, restarting ldap causes public dns outage, always [21:19:58] So I will restart that too! [21:20:04] * andrewbogott can’t wait to rip out that dns server [21:20:16] mutante, Coren: The network doesn't matter, but the browser does. In Firefox http://en.wikipedia.beta.wmflabs.org/wiki/Main_Page tries to redirect to the https site and fails. 
In Safari http://en.wikipedia.beta.wmflabs.org/wiki/Main_Page loads fine, but https://en.wikipedia.beta.wmflabs.org/wiki/Main_Page fails. I don't have any special Firefox extensions [21:20:16] installed. [21:21:47] kaldari: i can confirm the https site fails, that is a known issue, but i do not get the redirect from http->https (Iceweasel/Firefox) [21:22:57] Coren: /someone/ is able to talk to opendj because the log is whirring [21:23:21] do you guys know why https was disabled completely ? [21:23:26] as opposed to the cert error [21:23:31] that people could skip over [21:23:46] i saw it on phab somewhere alreayd the other day [21:24:23] kaldari: the https part is https://phabricator.wikimedia.org/T70387 [21:24:27] PROBLEM - Puppet failure on tools-webproxy-01 is CRITICAL 100.00% of data above the critical threshold [0.0] [21:25:02] kaldari: the redirect is maybe cached somewhere [21:25:40] no wildcard cert for that: https://phabricator.wikimedia.org/T50501 [21:25:42] it's not httpseverywhere? [21:25:43] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1407 is OK Less than 1.00% above the threshold [0.0] [21:25:45] RECOVERY - Puppet failure on tools-trusty is OK Less than 1.00% above the threshold [0.0] [21:26:10] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1208 is OK Less than 1.00% above the threshold [0.0] [21:26:10] RECOVERY - Puppet failure on tools-exec-1204 is OK Less than 1.00% above the threshold [0.0] [21:26:16] sitic: yes, but that task exists longer than the other one. there was a cert error but later it stopped listening altogether [21:26:58] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1408 is OK Less than 1.00% above the threshold [0.0] [21:27:04] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1410 is CRITICAL 37.50% of data above the critical threshold [0.0] [21:27:04] ah ok, never remembered it working [21:27:16] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1202 is OK Less than 1.00% above the threshold [0.0] [21:27:26] mutante: hmm, cleared my browser cache and OS DNS cache, but it still tries to redierct :P [21:28:10] oh well, I'll just use another browser for now [21:28:12] RECOVERY - Puppet failure on tools-checker-02 is OK Less than 1.00% above the threshold [0.0] [21:28:22] PROBLEM - Puppet failure on tools-exec-cyberbot is CRITICAL 33.33% of data above the critical threshold [0.0] [21:28:24] the puppet failures are due to DNS being down? [21:28:32] PROBLEM - Puppet failure on tools-bastion-01 is CRITICAL 20.00% of data above the critical threshold [0.0] [21:28:48] I can't look at puppet logs because I can't sudo because can't connect to dns... [21:28:51] kaldari: you can try "about:config" and look for "network.dnsCacheExpiration" [21:28:54] and set it to 0 [21:29:00] PROBLEM - Puppet failure on tools-exec-1407 is CRITICAL 50.00% of data above the critical threshold [0.0] [21:29:07] andrewbogott: Hah. SSL ldap is broken, cleartext isn't. [21:29:23] RECOVERY - Puppet failure on tools-exec-1201 is OK Less than 1.00% above the threshold [0.0] [21:29:28] paravoid: I saw a number of SSL changes go by from you earlier; any chance that may have tickled the labs ldap? 
[21:29:37] mutante: ^ same question [21:29:52] PROBLEM - Puppet failure on tools-exec-1406 is CRITICAL 30.00% of data above the critical threshold [0.0] [21:30:00] PROBLEM - Puppet failure on tools-exec-1209 is CRITICAL 20.00% of data above the critical threshold [0.0] [21:30:03] mutante: no luck with that either [21:30:14] RECOVERY - Puppet failure on tools-webgrid-generic-1404 is OK Less than 1.00% above the threshold [0.0] [21:30:32] RECOVERY - Puppet failure on tools-exec-wmt is OK Less than 1.00% above the threshold [0.0] [21:30:33] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1207 is CRITICAL 44.44% of data above the critical threshold [0.0] [21:30:39] Coren: there’s also 51e0469b94bfb64ad3ca27fde6b3372d8ce5c559 [21:30:44] andrewbogott: eh.. 14:24 < mutante> do you guys know why https was disabled completely ? [21:30:45] RECOVERY - Puppet failure on tools-webgrid-generic-1402 is OK Less than 1.00% above the threshold [0.0] [21:30:48] PROBLEM - Puppet failure on tools-webgrid-generic-1403 is CRITICAL 40.00% of data above the critical threshold [0.0] [21:31:03] mutante: you’re talking about ldap? [21:31:16] I thought you were discussing beta with kaldari [21:31:18] kaldari: you could add it as a new value http://en.kioskea.net/faq/555-disabling-the-dns-cache-in-mozilla-firefox [21:31:45] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1406 is CRITICAL 60.00% of data above the critical threshold [0.0] [21:32:21] RECOVERY - Puppet failure on tools-exec-1216 is OK Less than 1.00% above the threshold [0.0] [21:32:33] PROBLEM - Puppet failure on tools-exec-1410 is CRITICAL 20.00% of data above the critical threshold [0.0] [21:32:35] PROBLEM - Puppet failure on tools-exec-1215 is CRITICAL 22.22% of data above the critical threshold [0.0] [21:32:35] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1403 is CRITICAL 60.00% of data above the critical threshold [0.0] [21:32:37] andrewbogott: yes, that was about beta. i don't know about LDAP changes [21:32:43] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1405 is OK Less than 1.00% above the threshold [0.0] [21:32:43] PROBLEM - Puppet failure on tools-exec-1213 is CRITICAL 40.00% of data above the critical threshold [0.0] [21:32:45] RECOVERY - Puppet failure on tools-bastion-02 is OK Less than 1.00% above the threshold [0.0] [21:32:57] PROBLEM - Puppet failure on tools-checker-01 is CRITICAL 40.00% of data above the critical threshold [0.0] [21:33:01] PROBLEM - Puppet failure on tools-redis-01 is CRITICAL 50.00% of data above the critical threshold [0.0] [21:33:01] PROBLEM - Puppet failure on tools-exec-1214 is CRITICAL 50.00% of data above the critical threshold [0.0] [21:33:07] RECOVERY - Puppet failure on tools-exec-1212 is OK Less than 1.00% above the threshold [0.0] [21:33:19] PROBLEM - Puppet failure on tools-exec-1211 is CRITICAL 60.00% of data above the critical threshold [0.0] [21:33:23] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1210 is OK Less than 1.00% above the threshold [0.0] [21:33:23] RECOVERY - Puppet failure on tools-shadow is OK Less than 1.00% above the threshold [0.0] [21:33:38] is there a certificate error in the puppet run? [21:34:03] mutante: I’m checking that now [21:34:09] PROBLEM - Puppet failure on tools-submit is CRITICAL 22.22% of data above the critical threshold [0.0] [21:34:23] mutante: is 208.80.155.135 the correct IP it should resolve to? 
[21:34:23] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1401 is CRITICAL 55.56% of data above the critical threshold [0.0] [21:34:25] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1201 is CRITICAL 60.00% of data above the critical threshold [0.0] [21:34:29] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1209 is CRITICAL 30.00% of data above the critical threshold [0.0] [21:34:31] PROBLEM - Puppet failure on tools-webproxy-02 is CRITICAL 30.00% of data above the critical threshold [0.0] [21:34:36] mutante: nope [21:34:50] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1203 is CRITICAL 40.00% of data above the critical threshold [0.0] [21:34:56] RECOVERY - Puppet failure on tools-exec-catscan is OK Less than 1.00% above the threshold [0.0] [21:34:57] andrewbogott: I'm definitely unable to get anything through SSl from ldap [21:35:00] andrewbogott: any error when trying to re-start opendj? [21:35:12] RECOVERY - Puppet failure on tools-exec-1401 is OK Less than 1.00% above the threshold [0.0] [21:35:12] is it still opendj? [21:35:20] mutante: Sadliy. [21:35:22] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1409 is OK Less than 1.00% above the threshold [0.0] [21:35:34] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1402 is CRITICAL 50.00% of data above the critical threshold [0.0] [21:35:44] Coren: https://dpaste.de/8uyz [21:35:45] kaldari: yea, i get the same IP when resolving from over here [21:35:58] PROBLEM - Puppet failure on tools-exec-1219 is CRITICAL 50.00% of data above the critical threshold [0.0] [21:36:08] PROBLEM - Puppet failure on tools-redis-02 is CRITICAL 22.22% of data above the critical threshold [0.0] [21:36:15] andrewbogott: maybe a previous puppet run re-created the ssl cert already [21:36:20] mutante: probably [21:36:22] mutante: The response I get from that IP is http://pastebin.com/PJvLybjc, which has the Location header pointing to HTTPS [21:36:38] PROBLEM - Puppet failure on tools-exec-gift is CRITICAL 40.00% of data above the critical threshold [0.0] [21:36:39] andrewbogott: which host is it on? 
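For the pdns bind failure logged and diagnosed just below (21:38-21:39, "Cannot assign requested address"), two quick checks that map to the hypotheses raised there (the listen address not being configured on the host, or something else already holding the port), as a sketch:
```
# Is the address pdns is trying to bind actually configured on this host?
ip addr show | grep -F '208.80.154.19'

# Is another process already listening on UDP port 53?
sudo ss -lunp | grep ':53 '
```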
[21:36:42] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1407 is CRITICAL 40.00% of data above the critical threshold [0.0] [21:36:46] PROBLEM - Puppet failure on tools-webgrid-generic-1402 is CRITICAL 20.00% of data above the critical threshold [0.0] [21:36:46] PROBLEM - Puppet failure on tools-trusty is CRITICAL 20.00% of data above the critical threshold [0.0] [21:36:50] neptunium and nembus [21:36:52] RECOVERY - Puppet failure on tools-exec-1405 is OK Less than 1.00% above the threshold [0.0] [21:37:08] PROBLEM - Puppet failure on tools-exec-1408 is CRITICAL 33.33% of data above the critical threshold [0.0] [21:37:10] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1208 is CRITICAL 33.33% of data above the critical threshold [0.0] [21:37:11] PROBLEM - Puppet failure on tools-exec-1204 is CRITICAL 30.00% of data above the critical threshold [0.0] [21:37:22] PROBLEM - Puppet failure on tools-static-02 is CRITICAL 66.67% of data above the critical threshold [0.0] [21:37:26] PROBLEM - Puppet failure on tools-static-01 is CRITICAL 55.56% of data above the critical threshold [0.0] [21:37:35] PROBLEM - Puppet failure on tools-exec-1409 is CRITICAL 70.00% of data above the critical threshold [0.0] [21:37:45] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1205 is CRITICAL 60.00% of data above the critical threshold [0.0] [21:37:55] RECOVERY - Puppet failure on tools-webgrid-generic-1401 is OK Less than 1.00% above the threshold [0.0] [21:37:57] PROBLEM - Puppet failure on tools-mailrelay-01 is CRITICAL 60.00% of data above the critical threshold [0.0] [21:37:59] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1408 is CRITICAL 50.00% of data above the critical threshold [0.0] [21:38:01] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1404 is CRITICAL 60.00% of data above the critical threshold [0.0] [21:38:05] PROBLEM - Puppet failure on tools-exec-1217 is CRITICAL 66.67% of data above the critical threshold [0.0] [21:38:35] Jun 3 21:38:26 neptunium pdns[26165]: Fatal error: Unable to bind to UDP socket [21:38:39] PROBLEM - Puppet failure on tools-exec-1404 is CRITICAL 40.00% of data above the critical threshold [0.0] [21:38:57] andrewbogott: coren: how about that one up there, not SSL but pdns [21:39:07] binding UDP socket to '208.80.154.19' port 53: Cannot assign requested address [21:39:17] mutante: Huh. [21:39:27] RECOVERY - Puppet failure on tools-webproxy-01 is OK Less than 1.00% above the threshold [0.0] [21:39:33] mutante: isn’t that just pdns complaining that ldap is down, because I restarted it? [21:39:57] pdns keeps trying to bind a socket but can't [21:40:07] PROBLEM - Puppet failure on tools-exec-1210 is CRITICAL 55.56% of data above the critical threshold [0.0] [21:40:11] respawning every couple seconds [21:40:23] PROBLEM - Puppet failure on tools-exec-1201 is CRITICAL 22.22% of data above the critical threshold [0.0] [21:40:27] mutante: pdns is running /on/ neptunium? [21:40:29] andrewbogott: any SSL cert errors on opendj restart? [21:40:33] Hm… I can’t remember if that’s right or wrong. [21:40:38] andrewbogott: it is trying to [21:40:41] PROBLEM - Puppet failure on tools-services-02 is CRITICAL 50.00% of data above the critical threshold [0.0] [21:40:52] mutante: no. 
See link above for output from an opendj restart [21:41:13] PROBLEM - Puppet failure on tools-webgrid-generic-1404 is CRITICAL 33.33% of data above the critical threshold [0.0] [21:41:31] PROBLEM - Puppet failure on tools-exec-wmt is CRITICAL 50.00% of data above the critical threshold [0.0] [21:41:52] mutante: Good catch. I'm guessing there is a moribund pdns holding the port. [21:42:09] yikes [21:42:11] so, it is trying [21:42:14] binding UDP socket to '208.80.154.19' port 53: [21:42:23] but that IP is not on neptunium [21:42:25] ok, I’m killing pdns and restarting opendj [21:42:31] on neptunium [21:42:32] so restarting doesnt help [21:42:43] can someone give me an example of what's failing? [21:43:07] the messages in syslog _just_ stopped now [21:43:09] before it was: [21:43:18] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1202 is CRITICAL 66.67% of data above the critical threshold [0.0] [21:43:20] mutante: that’s me, killing it. [21:43:21] Jun 3 21:41:43 neptunium pdns[26526]: binding UDP socket to '208.80.154.19' port 53: Cannot assign requested address [21:43:24] Jun 3 21:41:43 neptunium pdns[26526]: Fatal error: Unable to bind to UDP socket [21:43:40] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1405 is CRITICAL 30.00% of data above the critical threshold [0.0] [21:43:44] PROBLEM - Puppet failure on tools-bastion-02 is CRITICAL 20.00% of data above the critical threshold [0.0] [21:43:52] mutante: Indeed, that IP isn't on that box. [21:44:04] wonders how restarting it changed things [21:44:14] PROBLEM - Puppet failure on tools-checker-02 is CRITICAL 70.00% of data above the critical threshold [0.0] [21:44:20] PROBLEM - Puppet failure on tools-shadow is CRITICAL 30.00% of data above the critical threshold [0.0] [21:44:22] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1210 is CRITICAL 22.22% of data above the critical threshold [0.0] [21:44:25] That's virt1000 [21:44:31] I’m restarting it [21:44:32] or trying [21:44:52] paravoid: I don’t have any more specific info other than ‘nothing in labs can connect to ldap’. [21:45:00] And, dammit, now I can’t even start opendj. [21:45:05] * andrewbogott fixes things by breaking them worse [21:45:58] PROBLEM - Puppet failure on tools-exec-catscan is CRITICAL 50.00% of data above the critical threshold [0.0] [21:46:14] PROBLEM - Puppet failure on tools-exec-1401 is CRITICAL 44.44% of data above the critical threshold [0.0] [21:46:22] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1409 is CRITICAL 30.00% of data above the critical threshold [0.0] [21:46:32] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1206 is CRITICAL 30.00% of data above the critical threshold [0.0] [21:46:35] paravoid: Unable to connect to LDAP host: ldap://ldap-eqiad.wikimedia.org:389 ldap://ldap-codfw.wikimedia.org:389 [21:46:50] got it [21:46:52] fixing [21:47:02] ok. Whatever I broke is now unbroken; opendj is running and responding to cleartext traffic just fine. 
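For reference, a minimal sketch of how the pdns bind failure above could be narrowed down. These are generic Linux commands shown for illustration, not necessarily what was actually run on neptunium; the IP address is the one quoted in the log.

```
# Sketch only: checks for the "Unable to bind to UDP socket" error above (run as root).
# "Cannot assign requested address" usually means the IP is not configured locally;
# "Address already in use" would instead point at a stale process holding the port.

# Is anything already listening on UDP/53, and which process owns it?
ss -lunp 'sport = :53'

# Is 208.80.154.19 actually configured on any interface of this host?
ip -o addr show | grep -F '208.80.154.19' || echo "IP not present on this host"

# If a leftover pdns is holding the socket, find it explicitly before restarting.
pgrep -a pdns
```

The "Cannot assign requested address" wording matches the observation above that 208.80.154.19 belongs to virt1000 rather than neptunium, so a restart alone cannot succeed there.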
[21:47:03] not server related, do not mess with opendj [21:47:06] Or so I judge from the log [21:47:10] ok, I will stop messing :) [21:47:34] PROBLEM - Puppet failure on tools-master is CRITICAL 20.00% of data above the critical threshold [0.0] [21:47:52] PROBLEM - Puppet failure on tools-exec-1405 is CRITICAL 50.00% of data above the critical threshold [0.0] [21:48:20] PROBLEM - Puppet failure on tools-exec-1216 is CRITICAL 55.56% of data above the critical threshold [0.0] [21:48:24] ffs [21:48:55] PROBLEM - Puppet failure on tools-webgrid-generic-1401 is CRITICAL 50.00% of data above the critical threshold [0.0] [21:49:07] PROBLEM - Puppet failure on tools-exec-1212 is CRITICAL 66.67% of data above the critical threshold [0.0] [21:50:27] PROBLEM - Puppet failure on tools-webproxy-01 is CRITICAL 60.00% of data above the critical threshold [0.0] [21:53:29] kaldari: i still don't get the https redirect like you do, also tried with curl and chrome. sure it's not httpseverywhere? [21:54:18] puppet fixed, salt running it everywhere [21:54:25] this is pretty broken in general [21:55:03] paravoid: What was the issue? [21:55:17] PROBLEM - Puppet failure on tools-exec-1205 is CRITICAL 22.22% of data above the critical threshold [0.0] [21:55:23] ldap.conf was pointing to /etc/ssl/certs/GlobalSign_CA.pem as the CA [21:55:27] which is pretty broken [21:55:35] that's a single intermediate CA, not a certificate store [21:56:04] also, OpenDJ seems to serve just its certificate, not a chain up to a root [21:56:07] also quite broken [21:56:09] mutante: Yeah, works fine for me in Chrome too, no idea why Firefox is screwed up. I don't have httpseverywhere or anything special installed as far as I can tell. Oh well :P [21:56:25] PROBLEM - Puppet failure on tools-exec-1206 is CRITICAL 40.00% of data above the critical threshold [0.0] [21:56:52] paravoid: coren: confirmed it works again on neptunium [21:56:55] paravoid: ldap looks to be working now — thanks for your quick response. [21:57:04] Should I open a task to sort out the cert situation? [21:57:12] I see everything working from instance-level as well. [21:57:15] PROBLEM - Puppet failure on tools-exec-1218 is CRITICAL 66.67% of data above the critical threshold [0.0] [21:57:27] modules/labs_bootstrapvz/files/labs-jessie.manifest.yaml also has a GlobalSign_CA reference, so it will fail [21:57:51] PROBLEM - Puppet failure on tools-exec-1203 is CRITICAL 50.00% of data above the critical threshold [0.0] [21:57:58] I need to restart pdns now, of course [21:58:01] like clockwork [21:58:04] there is no easy way to fix this, the easiest one would be to fix OpenDJ to serve the full chain [21:58:32] kaldari: look at your cookies. There is some cookie that gets set for some people some times that tries to force https to beta cluster. Once it's there you have to delete it manually or the frontend nginx/varnish will keep redirecting you to the non-existent https endpoint [21:58:37] or switch to a different ldap implementation [21:58:47] or that, that's a longer project :) [21:58:58] the puppet classes for ldap are also quite the mess [22:00:01] bd808: YES! that fixed it [22:01:58] we had lots of reports of that problem >1 year ago when we moved to eqiad [22:02:18] because in pmtpa https to beta cluster actually worked [22:03:18] paravoid: when you say ‘fix OpenDJ to serve the full chain’ do you mean fix opendj source, or is it just a config issue?
[22:03:26] config issue :) [22:04:07] I seem to have missed this storm [22:04:13] (Saw the catch point alerts) [22:04:17] paravoid: ok, that’s not terrible then. [22:04:59] andrewbogott: Coren did you also get the catch point alerts? [22:05:07] RECOVERY - Puppet failure on tools-exec-1210 is OK Less than 1.00% above the threshold [0.0] [22:05:46] yes, although I didn’t pay much attention since I was already into it [22:05:54] YuviPanda: I did; though I was sitting on a box when things started to melt so I didn't need to be told. :-) [22:06:24] andrewbogott: Coren yeah cool :) just wanted to check if the addition done earlier worked :) [22:06:46] RECOVERY - Puppet failure on tools-trusty is OK Less than 1.00% above the threshold [0.0] [22:06:48] RECOVERY - Puppet failure on tools-webgrid-generic-1402 is OK Less than 1.00% above the threshold [0.0] [22:07:10] RECOVERY - Puppet failure on tools-exec-1204 is OK Less than 1.00% above the threshold [0.0] [22:08:20] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1202 is OK Less than 1.00% above the threshold [0.0] [22:08:36] RECOVERY - Puppet failure on tools-bastion-01 is OK Less than 1.00% above the threshold [0.0] [22:09:16] RECOVERY - Puppet failure on tools-checker-02 is OK Less than 1.00% above the threshold [0.0] [22:10:24] RECOVERY - Puppet failure on tools-exec-1201 is OK Less than 1.00% above the threshold [0.0] [22:10:40] RECOVERY - Puppet failure on tools-services-02 is OK Less than 1.00% above the threshold [0.0] [22:11:16] RECOVERY - Puppet failure on tools-webgrid-generic-1404 is OK Less than 1.00% above the threshold [0.0] [22:11:32] RECOVERY - Puppet failure on tools-exec-wmt is OK Less than 1.00% above the threshold [0.0] [22:11:46] RECOVERY - Puppet failure on tools-exec-1207 is OK Less than 1.00% above the threshold [0.0] [22:12:44] coren, yuvi, I’m about to go out for dinner. I apologize in advance for whatever misfortune results :( [22:12:57] andrewbogott: it's ok, you can buy me alcohol next time we meet :) [22:13:02] * ArchaeologistPan accepts andrewbogott's apologies [22:13:08] ArchaeologistPan: Oh, there you are :) [22:13:13] Heh. Go ahead. Sometimes things don't break when you are away. [22:13:15] :-) [22:13:18] andrewbogott: :) I found a floppy disk! [22:13:19] anyway [22:13:22] RECOVERY - Puppet failure on tools-exec-1216 is OK Less than 1.00% above the threshold [0.0] [22:13:29] Coren: can you email labs-l? a few people have opened bugs :) [22:13:36] RECOVERY - Puppet failure on tools-precise-dev is OK Less than 1.00% above the threshold [0.0] [22:13:36] * ArchaeologistPan disappears again, mostly [22:13:44] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1405 is OK Less than 1.00% above the threshold [0.0] [22:13:44] RECOVERY - Puppet failure on tools-exec-1202 is OK Less than 1.00% above the threshold [0.0] [22:13:46] RECOVERY - Puppet failure on tools-bastion-02 is OK Less than 1.00% above the threshold [0.0] [22:13:53] the good news is, the switch to new DNS went OK today; beta and staging are migrated now and seem OK. [22:14:01] ArchaeologistPan: Yeah, I have two email incoming. I have to grab a bite first. 
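To make paravoid's diagnosis above concrete, here is a minimal sketch of testing StartTLS against the LDAP servers named in the channel with a full CA bundle instead of a single intermediate. The bundle path assumes a Debian/Ubuntu layout and is an assumption, not taken from the incident.

```
# Sketch only: StartTLS check against the servers mentioned above.
# The old ldap.conf pointed TLS_CACERT at a lone intermediate (GlobalSign_CA.pem),
# which only works if the server sends its full chain; the system bundle is safer:
#   /etc/ldap/ldap.conf:
#     TLS_CACERT /etc/ssl/certs/ca-certificates.crt

# Require StartTLS (-ZZ) and do an anonymous rootDSE query as a connectivity test:
LDAPTLS_CACERT=/etc/ssl/certs/ca-certificates.crt \
  ldapsearch -ZZ -x -H ldap://ldap-eqiad.wikimedia.org -b '' -s base '(objectclass=*)' namingContexts

# (Newer openssl releases can also dump the chain the server actually serves:
#  openssl s_client -starttls ldap -connect ldap-eqiad.wikimedia.org:389 -showcerts)
```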
[22:14:06] RECOVERY - Puppet failure on tools-exec-1212 is OK Less than 1.00% above the threshold [0.0] [22:14:06] RECOVERY - Puppet failure on tools-exec-1403 is OK Less than 1.00% above the threshold [0.0] [22:14:22] RECOVERY - Puppet failure on tools-shadow is OK Less than 1.00% above the threshold [0.0] [22:14:22] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1210 is OK Less than 1.00% above the threshold [0.0] [22:14:36] RECOVERY - Puppet failure on tools-exec-1402 is OK Less than 1.00% above the threshold [0.0] [22:14:46] RECOVERY - Puppet failure on tools-exec-1208 is OK Less than 1.00% above the threshold [0.0] [22:15:00] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1204 is OK Less than 1.00% above the threshold [0.0] [22:15:18] ok [22:15:26] RECOVERY - Puppet failure on tools-webproxy-01 is OK Less than 1.00% above the threshold [0.0] [22:15:59] RECOVERY - Puppet failure on tools-exec-catscan is OK Less than 1.00% above the threshold [0.0] [22:16:12] RECOVERY - Puppet failure on tools-exec-1401 is OK Less than 1.00% above the threshold [0.0] [22:16:21] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1409 is OK Less than 1.00% above the threshold [0.0] [22:16:33] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1206 is OK Less than 1.00% above the threshold [0.0] [22:16:44] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1406 is OK Less than 1.00% above the threshold [0.0] [22:16:54] RECOVERY - Puppet failure on tools-mail is OK Less than 1.00% above the threshold [0.0] [22:17:02] RECOVERY - Puppet failure on tools-services-01 is OK Less than 1.00% above the threshold [0.0] [22:17:04] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1410 is OK Less than 1.00% above the threshold [0.0] [22:17:32] RECOVERY - Puppet failure on tools-master is OK Less than 1.00% above the threshold [0.0] [22:17:53] RECOVERY - Puppet failure on tools-exec-1405 is OK Less than 1.00% above the threshold [0.0] [22:18:19] RECOVERY - Puppet failure on tools-exec-cyberbot is OK Less than 1.00% above the threshold [0.0] [22:18:21] RECOVERY - Puppet failure on tools-exec-1211 is OK Less than 1.00% above the threshold [0.0] [22:18:53] RECOVERY - Puppet failure on tools-webgrid-generic-1401 is OK Less than 1.00% above the threshold [0.0] [22:18:59] RECOVERY - Puppet failure on tools-exec-1407 is OK Less than 1.00% above the threshold [0.0] [22:19:21] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1401 is OK Less than 1.00% above the threshold [0.0] [22:19:51] RECOVERY - Puppet failure on tools-exec-1406 is OK Less than 1.00% above the threshold [0.0] [22:19:59] RECOVERY - Puppet failure on tools-exec-1209 is OK Less than 1.00% above the threshold [0.0] [22:20:33] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1207 is OK Less than 1.00% above the threshold [0.0] [22:20:49] RECOVERY - Puppet failure on tools-webgrid-generic-1403 is OK Less than 1.00% above the threshold [0.0] [22:22:16] RECOVERY - Puppet failure on tools-exec-1218 is OK Less than 1.00% above the threshold [0.0] [22:22:22] RECOVERY - Puppet failure on tools-static-02 is OK Less than 1.00% above the threshold [0.0] [22:22:28] RECOVERY - Puppet failure on tools-static-01 is OK Less than 1.00% above the threshold [0.0] [22:22:32] RECOVERY - Puppet failure on tools-exec-1215 is OK Less than 1.00% above the threshold [0.0] [22:22:34] RECOVERY - Puppet failure on tools-exec-1410 is OK Less than 1.00% above the threshold [0.0] [22:22:36] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1403 is OK Less than 1.00% 
above the threshold [0.0] [22:22:36] RECOVERY - Puppet failure on tools-exec-1409 is OK Less than 1.00% above the threshold [0.0] [22:22:44] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1205 is OK Less than 1.00% above the threshold [0.0] [22:22:56] RECOVERY - Puppet failure on tools-mailrelay-01 is OK Less than 1.00% above the threshold [0.0] [22:22:58] RECOVERY - Puppet failure on tools-checker-01 is OK Less than 1.00% above the threshold [0.0] [22:23:01] RECOVERY - Puppet failure on tools-exec-1214 is OK Less than 1.00% above the threshold [0.0] [22:23:03] RECOVERY - Puppet failure on tools-redis-01 is OK Less than 1.00% above the threshold [0.0] [22:23:05] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1404 is OK Less than 1.00% above the threshold [0.0] [22:23:05] RECOVERY - Puppet failure on tools-exec-1217 is OK Less than 1.00% above the threshold [0.0] [22:24:27] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1201 is OK Less than 1.00% above the threshold [0.0] [22:24:28] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1209 is OK Less than 1.00% above the threshold [0.0] [22:24:32] RECOVERY - Puppet failure on tools-webproxy-02 is OK Less than 1.00% above the threshold [0.0] [22:24:49] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1203 is OK Less than 1.00% above the threshold [0.0] [22:25:15] RECOVERY - Puppet failure on tools-exec-1205 is OK Less than 1.00% above the threshold [0.0] [22:25:34] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1402 is OK Less than 1.00% above the threshold [0.0] [22:25:58] RECOVERY - Puppet failure on tools-exec-1219 is OK Less than 1.00% above the threshold [0.0] [22:26:08] RECOVERY - Puppet failure on tools-redis-02 is OK Less than 1.00% above the threshold [0.0] [22:26:26] RECOVERY - Puppet failure on tools-exec-1206 is OK Less than 1.00% above the threshold [0.0] [22:26:38] RECOVERY - Puppet failure on tools-exec-gift is OK Less than 1.00% above the threshold [0.0] [22:26:42] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1407 is OK Less than 1.00% above the threshold [0.0] [22:27:08] RECOVERY - Puppet failure on tools-exec-1408 is OK Less than 1.00% above the threshold [0.0] [22:27:10] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1208 is OK Less than 1.00% above the threshold [0.0] [22:27:50] RECOVERY - Puppet failure on tools-exec-1203 is OK Less than 1.00% above the threshold [0.0] [22:27:57] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1408 is OK Less than 1.00% above the threshold [0.0] [22:28:35] RECOVERY - Puppet failure on tools-exec-1404 is OK Less than 1.00% above the threshold [0.0] [22:29:21] shinken-wm: sorry, gotta ignore you now [22:36:55] hmm. sudo works on my labs node, but also prints an error: [22:36:57] $ sudo echo test [22:36:57] sudo: ldap_start_tls_s(): Connect error [22:36:57] test [22:39:02] hmm i restarted nscd and it got worse :( [22:39:03] $ sudo echo test [22:39:03] sudo: unknown uid 4177: who are you? 
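A minimal sketch of the usual NSS/nscd checks for the "unknown uid" error above, run on the affected instance. The commands are standard tooling shown for illustration; the uid and username are the ones quoted in the log.

```
# Sketch only: can NSS (via the LDAP backend) resolve the numeric uid and the user?
getent passwd 4177
id gage

# Flush nscd's cached passwd/group tables, or restart the daemon outright.
nscd --invalidate=passwd
nscd --invalidate=group
service nscd restart
```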
[22:40:02] [terbium:~] $ ldaplist -l passwd gage | grep uidNumber uidNumber: 4177 [22:40:09] terbium knows you [22:41:16] jgage: that's like the LDAP issue that paravoid fixed [22:41:22] but works for me [22:41:49] i just logged into an instance i hadn't touched since before the problem, and i get the same ldap_start_tls_s() error [22:42:41] RECOVERY - Puppet failure on tools-exec-1213 is OK Less than 1.00% above the threshold [0.0] [22:42:51] i wonder if i dare checking whether a reboot fixes this [22:42:55] i don't want to get locked out [22:44:09] RECOVERY - Puppet failure on tools-submit is OK Less than 1.00% above the threshold [0.0] [22:44:51] could not resolve hostname bastion-restricted-eqiad.wmflabs.org: [22:45:10] does this use an up-to-date puppet? [22:45:15] or is it a ::self instance? [22:45:19] ::self [22:45:24] well there's your problem :) [22:45:27] heh [22:45:34] the recent email said the changes would happen next monday :\ [22:45:43] I salt-fixed this, but puppet undid the fix [22:45:47] rebase off master [22:45:49] k [22:46:47] mutante: yeah that dns lookup fails for me too :( [22:47:16] yea, so can't connect to my instance either but other error [22:47:21] hm it works from public [22:48:09] not for me right now [22:48:11] i get the right answer from labs-ns1 but not from ns0 [22:49:01] jgage: confirmed that [22:49:07] ns1 yes, ns0 no [22:53:34] PROBLEM - Puppet failure on tools-exec-1410 is CRITICAL 40.00% of data above the critical threshold [0.0] [22:53:34] PROBLEM - Puppet failure on tools-exec-1409 is CRITICAL 22.22% of data above the critical threshold [0.0] [22:55:48] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1203 is CRITICAL 50.00% of data above the critical threshold [0.0] [22:55:56] andrewbogott_afk: T101317 is fixed [22:56:42] wikibugs appears to be silent, and in need of a gentle percussive maintenance. (no activity in -dev for a few hours) [23:15:17] "Things are down?" = {tools-login,bastion}.wmflabs.org does not resolve? [23:15:39] scfc_de: yes, ns0 fails, ns1 does not [23:16:02] this isnt just about restarting wikibugs [23:22:59] mutante: Is there someone working on that or a Phabricator task? [23:23:34] RECOVERY - Puppet failure on tools-exec-1410 is OK Less than 1.00% above the threshold [0.0] [23:23:34] RECOVERY - Puppet failure on tools-exec-1409 is OK Less than 1.00% above the threshold [0.0] [23:24:08] scfc_de: no [23:25:48] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1203 is OK Less than 1.00% above the threshold [0.0] [23:30:30] Hm. ns0 having issues then? [23:35:28] scfc_de: I have no issues resolving from here. [23:36:04] Coren: i do [23:36:14] but only ns0 [23:36:18] and pdns is running [23:37:47] So I see. [23:39:44] Jun 3 22:24:51 virt1000 pdns[21218]: Database module reported condition which prevented lookup (LDAP server unreachable) sending out servfail [23:40:08] Though that is some time ago. [23:40:11] * Coren digs [23:40:44] So the hiccup of the LDAP server earlier (?) caused pdns to remove that from its "source list"? [23:41:01] scfc_de: It might. [23:41:17] Jun 3 23:40:56 virt1000 pdns[19125]: [LdapBackend] Ldap connection succeeded [23:41:32] Giving it a swift kick in the diodes seems to have helped. [23:41:50] Ah, and indeed ns0 is now responsive again. [23:41:51] labs-ns0 is resolving again, it seems like. [23:42:31] Hey guys I just finished a Python script I'd like to host for on-demand use on Labs. [23:42:46] And I can log in again. Thanks! 
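A minimal sketch of comparing the two labs nameservers directly, as was done by hand above. The labs-ns0/labs-ns1.wikimedia.org names are assumed to correspond to the ns0/ns1 shorthand used in the channel, and the test hostname is one reported as failing.

```
# Sketch only: query both labs nameservers and compare the answers.
for ns in labs-ns0.wikimedia.org labs-ns1.wikimedia.org; do
  echo "== $ns =="
  dig @"$ns" tools-login.wmflabs.org A +short +time=2 +tries=1
done
# A SERVFAIL or empty answer from ns0 alongside a normal answer from ns1 matches the
# "Database module reported condition which prevented lookup (LDAP server unreachable)"
# error pdns logged on virt1000.
```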
[23:44:33] It's a script that constructs a draft of the featured content report in the Signpost. Scrubs information on pages, nominations, and nominators, saving ~20 minutes of manual work per week. [23:44:40] PROBLEM - Puppet failure on tools-exec-1202 is CRITICAL 30.00% of data above the critical threshold [0.0] [23:45:08] How difficult would it be to set it up on Labs? Some docs to read on the subject, perhaps? [23:45:22] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1210 is CRITICAL 40.00% of data above the critical threshold [0.0] [23:46:36] ResMar: tools has some issues right now, but start by taking a look at https://wikitech.wikimedia.org/wiki/Help:Tool_Labs#Quick_start [23:47:01] Uh to be honest [23:47:14] Last I heard there wasn't a time that Labs didn't have an issue. [23:47:28] Thanks. [23:47:46] PROBLEM - Puppet failure on tools-exec-1207 is CRITICAL 66.67% of data above the critical threshold [0.0] [23:47:50] afaict, there are no issues left atm. [23:47:52] maybe you only hear from it when it has an issue ;-) [23:48:29] That is definitely true in general.
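For anyone following the quick-start link above, a minimal sketch of what hosting such a script on Tool Labs looked like at the time, using the grid engine tooling the documentation describes. The tool name "fcreport" and the script path are hypothetical, chosen only for illustration.

```
# Sketch only: hypothetical tool "fcreport" and script path; see the
# Help:Tool_Labs quick start linked above for the authoritative steps.

# On tools-login, switch to the tool account:
become fcreport

# One-off run on the job grid (output lands in ~/fcreport.out and ~/fcreport.err):
jsub -once -N fcreport python "$HOME/featured_report.py"

# Or schedule it weekly from the tool's crontab (Mondays at 06:00 UTC):
#   crontab -e
#   0 6 * * 1 jsub -once -N fcreport python $HOME/featured_report.py
```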