[00:00:12] it's not hooked up to prod centralauth
[00:00:15] I use a unique password, but my usual username.
[00:00:25] *nod*
[00:01:12] quiddity: works for me logged in. Do you have any weird gadgets or anything?
[00:01:34] I was experimenting with the mobile-frontend-experimental build, at the time...
[00:02:52] quiddity: may be a real bug. #wikimedia-releng would be a good place to show for help finding log messages
[00:03:05] s/show/shop/
[00:03:12] will do, thanks.
[00:19:21] (all fixed. jdlrobson reset my preferences, which fixed it)
[03:18:10] legoktm: is redis still dead?
[03:20:43] YuviPanda: I haven't tried restarting wikibugs or grrrit-wm
[03:55:10] legoktm: hmm
[03:56:32] !log tools restarted redis server, it had OOM-killed
[03:56:41] Logged the message, Master
[03:57:10] ls
[03:59:09] <^demon|away> YuviPanda: Can you shoot something on iridium for me again?
[03:59:10] <^demon|away> https://phabricator.wikimedia.org/T92360
[03:59:19] ^demon|away: sure
[03:59:24] <^demon|away> thx
[04:03:28] ^demon|away: done yo
[04:03:46] <^demon|away> thx bro
[04:12:42] YuviPanda: do you know about https://phabricator.wikimedia.org/T92313?workflow=create ?
[04:13:03] legoktm: I’m redoing redis, i’ll brb in a bit
[04:13:28] PROBLEM - Host tools-redis is DOWN: CRITICAL - Host Unreachable (10.68.17.132)
[04:14:38] yeah, I know...
[04:14:47] !log tools kill tools-redis instance, upgrade to trusty while it is down anyway
[04:14:52] Logged the message, Master
[04:15:52] YuviPanda: oh wat I just started grrrit and wikibugs
[04:15:53] lol
[04:17:26] RECOVERY - Host tools-redis is UP: PING OK - Packet loss = 0%, RTA = 0.87 ms
[04:28:18] !log tools tools-redis is back now, as trusty and hopefully slightly more fortified
[04:28:19] legoktm: ^
[04:28:22] Logged the message, Master
[04:28:59] PROBLEM - Puppet failure on tools-redis is CRITICAL: CRITICAL: 12.50% of data above the critical threshold [0.0]
[04:31:01] !log tools.lolrrit-wm restarted grrrit-wm and gerrit-to-redis
[04:31:05] Logged the message, Master
[04:31:24] legoktm: this also moves us to a much more recent version of redis...
[04:33:58] RECOVERY - Puppet failure on tools-redis is OK: OK: Less than 1.00% above the threshold [0.0]
[04:36:29] YuviPanda: woot
[04:36:57] !log tools.wikibugs restarted both wb2-phab and wb2-irc
[04:37:01] Logged the message, Master
[06:17:30] legoktm: so wikibugs is fine, grrrit-wm is still dead
[07:55:02] (PS1) Yuvipanda: Directly interface with gerrit stream-events [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/195861
[07:55:08] (CR) jenkins-bot: [V: -1] Directly interface with gerrit stream-events [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/195861 (owner: Yuvipanda)
[07:56:02] (PS2) Yuvipanda: Directly interface with gerrit stream-events [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/195861
[08:20:22] Tool-Labs, Patch-For-Review: Puppetize LVM extension for tools redis - https://phabricator.wikimedia.org/T91370#1108662 (yuvipanda) Done differently in I976f4c29d6730bd563ae6fb7a33c86b6249705d2, has redis data in /srv instead.
[08:20:52] Tool-Labs, Patch-For-Review: Puppetize LVM extension for tools redis - https://phabricator.wikimedia.org/T91370#1108664 (yuvipanda) Open>Resolved a:yuvipanda
[10:36:45] hey YuviPanda, do I have to do anything to get access to beta.wmflabs.org?
[10:37:17] werdna: hey! I
[10:37:22] werdna: I guess you could ask the releng people...
[10:37:44] werdna: by ‘access’ you mean ssh or?
[10:37:53] something like that
[10:38:04] the user research people want me to import something onto en.wikipedia.wmflabs.org
[10:38:08] ah
[10:38:16] you… probably should co-ordinate that with RelEng
[10:38:22] I can add you already, however.
[10:38:34] I'll ask
[10:38:41] alright.
[10:38:50] * YuviPanda desists
[12:12:21] Can someone tell me how to disable big brother. I cannot for the life of me get it to stop.
[12:12:50] It's going on a wild rampage on xTools. I think something about big brother got borked.
[12:14:10] YuviPanda,
[12:14:41] hi Cyberpower678
[12:14:44] what do you mean by wild rampage
[12:14:49] I see only one job on https://tools.wmflabs.org/?status
[12:14:51] for it
[12:15:34] It keeps rebooting xTools. Recently we've had our mailbox spammed with 100+ emails from big brother.
[12:16:42] We want to switch to our own web watcher which is a little more advanced than big brother.
[12:16:52] So I tried disabling bigbrother
[12:17:02] YuviPanda, ^ but it won't go away.
[12:17:04] Cyberpower678: oh, hmm. if it isn’t an active issue, I suggest filing a bug and poking Coren when he’s around.
[12:17:14] so that sounds like a bug (can’t opt-out of bigbrother)
[12:17:20] (and a terrible one)
[12:17:27] Cyberpower678: however, I also thought you were moving xtools to its own VM?
[12:17:40] I tried deleting the file, and then simply blanking it, but it just keeps coming back.
[12:18:00] We are, but we're slow with setting it up.
[12:18:05] fair enough
[12:18:18] We are learning the environment.
[12:18:20] Cyberpower678: alright, so 1. delete the file, and I’ll restart bigbrother itself. and we’ll see if that helps?
[12:18:32] I have no access right now
[12:18:55] alright. I’ll wait till you do have access :)
[12:18:58] Could you do it for me for xtools, xtools-ec, and xtools-articleinfo
[12:19:03] Ok.
[12:19:06] sorry, neck deep in some betalabs stuff
[13:35:23] Tool-Labs: Fix and clean up generation of ssh_known_keys - https://phabricator.wikimedia.org/T92379#1109287 (scfc) NEW a:scfc
[14:14:35] Tool-Labs: Tool Labs: Provide anonymized view of the user_properties table - https://phabricator.wikimedia.org/T60196#1109425 (coren) a:coren
[14:24:20] Tool-Labs: Tool Labs: Provide anonymized view of the user_properties table - https://phabricator.wikimedia.org/T60196#1109452 (coren) The one thing I am missing to implement this at this time is either (a) the whitelist of properties to collate or (b) the method by which that whitelist is generated.
[14:25:30] Tool-Labs: Audit redis usage on toollabs - https://phabricator.wikimedia.org/T91979#1109463 (coren) a:coren
[14:25:38] Tool-Labs: Audit redis usage on toollabs - https://phabricator.wikimedia.org/T91979#1100333 (coren) p:Triage>High
[14:28:56] YuviPanda: After data collection, the biggest source of redis use is now known. It's not clear, because of limited data, whether it's also the bigger user of keys but it's likely (takes roughly 50% of usage on its own)
[14:29:36] Coren: ah. who is it?
[14:29:43] YuviPanda: anomiebot
[14:29:47] Coren: oooh, interesting.
[14:29:59] Coren: btw, I re-imaged the box to be trusty + set limit at 12G, so OOM killer won’t get to redis
[14:30:06] * YuviPanda pokes anomie
[14:30:11] That'll help.
[14:30:11] Coren: do they have ttl set?
[14:30:38] * anomie didn't think anomiebot was that big a user of redis
[14:30:55] YuviPanda: Not that I can see.
[14:31:00] Coren: how are you measuring this, btw?
[14:31:02] anomie: There might be a bug?
[14:31:04] I also switched from aof to rdb
[14:31:21] YuviPanda: Monitoring wire traffic. Turns out to have been the most reliable way. :-)
[14:31:26] ah
[14:31:35] As far as keys, all AnomieBOT's keys should be prefixed with "AnomieBOT:" + a random-looking string.
[14:31:48] anomie: We can't actually /list/ keys. :-)
[14:32:14] Coren: There's no special admin access to list them?
[14:32:31] anomie: But the biggest simple change you could make is set a TTL on the keys if they have limited lifetime.
[14:33:35] (I emailed a reminder to labs-l)
[14:53:04] Coren, YuviPanda: I just checked AnomieBOT's code, and it turns out a timeout already is being set on just about everything where a timeout would make sense. Mostly 86400 seconds, 7200s for redirect info (i.e. title=>targets and target=>title), and interwiki/namespace prefix lists for 604800s (7 days). The only thing that's not already setting a timeout that might deserve one is the cache of what aliases there are for "#REDIRECT".
[14:53:46] anomie: hmm, fair enough.
[14:54:06] Coren: we should look at key sizes (via rdb parsing, I think?) since our problem isn’t throughput but storage...
[14:54:09] well, ‘size’ rather.
[14:54:25] The other things that don't are the keys behind https://tools.wmflabs.org/anomiebot
[14:54:27] anomie: how many places do you have where it doesn’t make sense to set a timeout?
[14:54:35] YuviPanda: Heh, I just answered that
[14:54:39] heh
[14:55:15] anomie: nice :) have you switched to trusty already, btw? :)
[14:55:37] YuviPanda: For web stuff, yes. For grid jobs, probably not.
[15:01:01] anomie: hmm, ok. it doesn’t make as much of a difference in grid jobs yet. Maybe in several more months...
[15:02:07] I'm not specifying any release, and I don't anticipate anything breaking since it hasn't when I run stuff locally for testing.
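The per-key timeouts anomie lists at 14:53:04 can be collected into one small table. What follows is only a minimal Python sketch, not AnomieBOT's actual code: the category names, `make_key`, `ttl_for`, and the in-memory `ExpiringCache` (a stand-in that mimics the behaviour of the Redis `SETEX` command so the example runs without a server) are all hypothetical. Only the "AnomieBOT:" prefix and the three durations come from the log.

```python
import time

# Durations in seconds, as quoted in the channel.
TTL_DEFAULT = 86400         # "mostly 86400 seconds" (1 day)
TTL_REDIRECT = 7200         # redirect info (title=>targets, target=>title)
TTL_PREFIX_LISTS = 604800   # interwiki/namespace prefix lists (7 days)

# Hypothetical category names; the log does not show the real ones.
TTL_BY_KIND = {
    "redirect": TTL_REDIRECT,
    "prefix_list": TTL_PREFIX_LISTS,
}


def ttl_for(kind):
    """Return the timeout for a cache category, falling back to one day."""
    return TTL_BY_KIND.get(kind, TTL_DEFAULT)


def make_key(prefix, token, name):
    """Build a namespaced key: prefix, a random-looking token, then the name."""
    return f"{prefix}:{token}:{name}"


class ExpiringCache:
    """In-memory stand-in for a key/value store with per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, absolute expiry time)

    def setex(self, key, ttl_seconds, value):
        """Store value under key and expire it ttl_seconds from now."""
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        """Return the value, or None if the key is missing or has expired."""
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # drop expired entries lazily on read
            return None
        return value
```

Against a real server the same table would feed whatever expiring-write call the client library provides; the point Coren makes is simply that every key with a limited useful lifetime should carry one.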
[15:03:56] anomie: cool
[15:03:59] anomie: thanks :)
[15:06:56] legoktm: https://gerrit.wikimedia.org/r/#/c/195861/ gets rid of redis dependency for grrrit-wm :)
[15:08:39] legoktm: I’m going to merge now, and hopefully it won’t break
[15:08:52] (CR) Yuvipanda: [C: 2] Directly interface with gerrit stream-events [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/195861 (owner: Yuvipanda)
[15:12:26] !log tools.lolrrit-wm yuvipanda: Deployed 163bac22f003c83443c69d03efed65d8c358ab8c Directly interface with gerrit stream-events
[15:12:30] Logged the message, Master
[15:12:42] grrrit-wm: why haven’t you restarted?!
[15:12:43] aaah
[15:12:44] :D
[15:12:45] there we go
[15:14:47] YuviPanda: ooh!
[15:15:16] legoktm: aargah, it isn’t actually working...
[15:15:19] worked fine when I was testing...
[15:15:20] oh
[15:15:20] wait
[15:15:22] config change
[15:20:21] YuviPanda: Having fun?
[15:20:27] James_F: yup
[15:20:40] James_F however, once it comes back up, it should be more reliable because it has a *lot* less moving parts
[15:20:47] Nice.
[15:21:24] YuviPanda: Feel like pushing https://gerrit.wikimedia.org/r/#/c/195569/ too whilst you're there?
[15:21:40] James_F: I shall, after I bring it back to life
[15:21:45] Ta.
[15:38:04] (CR) Yuvipanda: "It was precise's fault, wasn't it?" [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/195861 (owner: Yuvipanda)
[15:38:12] of course it was :D
[15:39:11] (PS1) Yuvipanda: Run on trusty nodes [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/195921
[15:39:55] (CR) Yuvipanda: [C: 2] Run on trusty nodes [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/195921 (owner: Yuvipanda)
[15:39:58] (Merged) jenkins-bot: Run on trusty nodes [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/195921 (owner: Yuvipanda)
[15:40:11] (PS3) Yuvipanda: Add some obvious repos to -releng (and scap to -operations too) [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/195569 (owner: Jforrester)
[15:42:54] (CR) Yuvipanda: [C: 2] Add some obvious repos to -releng (and scap to -operations too) [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/195569 (owner: Jforrester)
[15:42:58] (Merged) jenkins-bot: Add some obvious repos to -releng (and scap to -operations too) [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/195569 (owner: Jforrester)
[15:43:08] James_F: ^ done
[15:44:09] !log tools.lolrrit-wm yuvipanda: Deployed 4d0fbb28c9ddf9dec60c03e1a1084fc3ad42cd60 Add some obvious repos to -releng (and scap to -operations too)
[15:44:13] Logged the message, Master
[16:05:51] YuviPanda: Thanks!
[16:48:25] Tool-Labs: Provide a link to source code of tools.wmflabs.org - https://phabricator.wikimedia.org/T92394#1109831 (Petrb) NEW
[16:49:25] Tool-Labs: Provide a link to source code of tools.wmflabs.org - https://phabricator.wikimedia.org/T92394#1109839 (Petrb)
[17:02:42] YuviPanda: BugBro is doing it again. I'm getting a flood of restart emails.
[17:02:59] Coren: ^
[17:03:17] And it should technically be off.
[17:03:18] I still see only one set of jobs though
[17:03:39] I blanked the .bigbrotherrc files
[17:06:40] It still shouldn't be fidgeting with xTools.
[17:07:17] Wait, flood of restart email or flood of "I couldn't start your tool" email? Because it shouldn't even be /trying/ more than three times in a 24h window
[17:07:29] Coren: he’s trying to opt out of bigbrother and it won’t let him
[17:08:37] Which tool is that?
[17:10:33] Both and what YuviPanda said. For xTools, xTools-ec, and xTools-articleinfo
[17:10:56] I'm also getting "failed to start services" messages.
[17:12:00] I have to go for now.
[17:13:23] Coren: ^^
[17:13:41] CP678|iPhone: I'm gonna be looking into it. It's clearly a bug.
[17:14:19] :-)
[17:14:23] Cya
[17:23:38] YuviPanda: I was wondering if you'd like to be a GSoC mentor for the upcoming round.
[17:24:20] Niharika: probably, assuming I can find the right project...
[17:24:27] Niharika: actually, I’ll be moving continents at that time, so maybe not?
[17:25:02] YuviPanda: Ah. Okay. I am mentor-hunting for https://phabricator.wikimedia.org/T5525 Any ideas who'd be interested?
[17:25:56] Niharika: ah, probably not me. legoktm or someone in mediawiki-core maybe. It also might be too big a project for GSoC
[17:26:22] YuviPanda: By the way, congratulations! :D
[17:26:28] Niharika: :) ty
[17:26:34] yeah, that might be a little large for a GSoC project...
[17:26:52] maybe if you scrap the UI part of it and just do backend implementation
[17:27:59] legoktm: Okay. We could do that. There's interest in pursuing this, AFAICS.
[17:28:22] Niharika: there's definitely interest, but it's a *hard* problem.
[17:29:38] legoktm: Right.
[18:00:45] YuviPanda: The redis pubsub stuff doesn't fill the server does it?
[18:01:00] a930913: pubsub shouldn’t, no.
[18:01:11] a930913: although me and legoktm are thinking of setting up a tools-redis-pubsub just for pubsub
[18:01:14] (and queues)
[18:02:25] YuviPanda: But migration :p Is there such a specification difference?
[18:02:27] Tool-Labs-tools-Other: Non technical: "Database reports" - status query - https://phabricator.wikimedia.org/T92353#1110265 (Aklapper)
[18:02:57] a930913: no. mostly we’ll disallow normal key activities on the pubsub / queue redis, so it won’t die because of overloaded caches :D
[18:03:06] but that might not be necessary. let’s see how the latest fortifications hold up
[18:03:57] YuviPanda: redis-stable and redis-experimental? :p
[18:04:06] nope :P
[18:04:14] redis-cache and redis-queue maybe
[18:04:49] :D
[18:27:14] Tool-Labs-tools-Other: Non technical: "Database reports" - status query - https://phabricator.wikimedia.org/T92353#1110357 (Legoktm) https://github.com/mzmcbride/database-reports/issues/13 and https://github.com/mzmcbride/database-reports/issues/14 covers why reports aren't running currently.
[18:45:24] Tool is down: https://tools.wmflabs.org/kmlexport/?project=de&linksfrom=1&article=Cham&redir=bing
[18:47:07] !log deployment-prep created deployment-mediawiki03
[18:47:12] Logged the message, Master
[18:47:50] Weiroutsi: I just rebooted it
[18:56:20] @YuviPanda: Thank you!
[18:56:56] yw :)
[19:28:07] Tool-Labs: Provide a link to source code of tools.wmflabs.org - https://phabricator.wikimedia.org/T92394#1110600 (scfc)
[19:28:08] Tool-Labs: Provide source/repository link on https://tools.wmflabs.org - https://phabricator.wikimedia.org/T86431#1110601 (scfc)
[19:39:54] Labs, Wikimedia-Labs-wikitech-interface: Use a Puppet ENC to define which classes are included in which nodes (in Labs) - https://phabricator.wikimedia.org/T85279#1110728 (yuvipanda) If we make it query LDAP, we can actually have this be an addition to the LDAP terminus for labs. Maybe even parameterize i...
[19:40:16] Labs, Wikimedia-Labs-wikitech-interface: Use a Puppet ENC to define which classes are included in which nodes (in Labs) - https://phabricator.wikimedia.org/T85279#1110729 (yuvipanda) a:yuvipanda
[20:05:45] Tool-Labs-tools-Other: Non technical: "Database reports" - status query - https://phabricator.wikimedia.org/T92353#1110813 (Haruth) Thanks for the info. Will leave well alone on the basis that things are happening behind the scenes :)
[20:57:23] Labs, Puppet: Puppet Trebuchet provider compares refname with commit sha1 and does NOT refresh the git repo! - https://phabricator.wikimedia.org/T77002#1111012 (chasemp) p:High>Normal
[20:58:06] did the process to create instances change recently?
[20:58:18] I see no admin links on Special:NovaInstance
[20:58:21] (PS1) Yuvipanda: Use ssh::userkey for root as well [labs/private] - https://gerrit.wikimedia.org/r/196019
[20:58:55] tgr: nope. if you are seeing blanks, try logging out and back in
[20:59:54] YuviPanda: that worked, thanks
[21:00:00] tgr: yw!
[21:00:13] * YuviPanda goes back to his IT Crowd-esque office
[21:08:35] YuviPanda: Any idea why queries would be much much slower?
[21:08:55] https://tools.wmflabs.org/multichill/queries/wikidata/noclaims_nlwiki.txt took 43m48.979s, it's usually a couple of minutes
[21:09:08] multichill: ugh, no idea...
[21:09:25] Do we keep graphs of the load of the db servers?
[21:09:31] yeah...
[21:09:38] looking at them now
[21:10:11] I ran this yesterday evening
[21:11:17] multichill: hmm, so it’s 3AM and my brain isn’t working :(
[21:11:26] can you file a bug? maybe Coren can take a look (cc springle as well)
[21:11:35] and I’ll poke around tomorrow if nobody else gets to it first
[21:11:36] sorry
[21:11:49] Just wondering
[21:12:12] I run that query every once in a while, I'll see how long it takes next time
[21:12:14] multichill: the graphs are public at ganglia.wikimedia.org
[21:12:15] heh
[21:12:20] Right
[21:16:19] Take it easy YuviPanda
[21:17:18] I should :)
[22:01:02] * Coren reads scrollback
[22:01:09] (Out to dinner, sorry)
[22:15:41] are nfs failures in labs puppet runs normal?
[22:15:43] "Execution of '/bin/mount /data/project' returned 32: mount.nfs: mounting labstore.svc.eqiad.wmnet:/project/multimedia/project failed, reason given by server: No such file or directory"
[22:15:54] or is that project misconfigured somehow?
[22:17:37] tgr: I believe mounting can fail on the first run. A puppet re-run (or reboot) should fix it iirc
[22:18:05] this happens on every run (did not try rebooting)
[22:18:14] not really a problem, just wondering
[22:18:32] reboot I think?
[22:29:33] There are two possibilities: Either your instance was booted faster than NFS server was ready - can happen on new instances - or you forgot to actually turn 'project storage' on in the project settings.
[22:29:48] But the latter, I think, was changed to on by default some time ago.
[22:38:06] must have been a timing issue
[22:38:11] rebooting helped, thanks