[00:53:12] (PS1) Sitic: Log and ignore partial ORES errors [labs/tools/crosswatch] - https://gerrit.wikimedia.org/r/229295
[00:53:14] (PS1) Sitic: Fix de translations [labs/tools/crosswatch] - https://gerrit.wikimedia.org/r/229296
[00:53:16] (PS1) Sitic: Add ability to blacklist wikis [labs/tools/crosswatch] - https://gerrit.wikimedia.org/r/229297
[00:53:18] (PS1) Sitic: Fix edge case bug in flaggedrevs task [labs/tools/crosswatch] - https://gerrit.wikimedia.org/r/229298
[01:11:28] Hi. Does anyone know how I can query commons tables from labs?
[01:12:02] Specifically, I have a query run against a WP table which gives me a list of images, and I want to know which ones are hosted on Commons and which ones are non-existent
[01:36:17] Labs, VisualEditor: Investigate and potentially move off NFS in the 'visualeditor' project - https://phabricator.wikimedia.org/T102688#1509386 (Jdforrester-WMF)
[01:37:15] Labs, VisualEditor, VisualEditor-MediaWiki, wikitech.wikimedia.org, Regression: Pages can't be saved with VisualEditor on wikitech (Uncaught ReferenceError: attrId is not defined) - https://phabricator.wikimedia.org/T104360#1509392 (Jdforrester-WMF)
[03:02:10] huji: `sql commonswiki` should work from tools-dev. See https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Database for details and instructions for how to get access from another Labs host.
[06:27:28] Labs, Labs-Sprint-107, Labs-Sprint-108: Evaluate a 'cluster solution' for use on Tool Labs - https://phabricator.wikimedia.org/T106475#1509788 (Joe) A few points: - Do you and the users really like the current interface? Do we want users to still ssh into the system? I don't think that's a good idea, a...
[06:41:15] Labs, Labs-Sprint-107, Labs-Sprint-108: Evaluate a 'cluster solution' for use on Tool Labs - https://phabricator.wikimedia.org/T106475#1509797 (yuvipanda) So the actual quarterly goal is to make an alternate way to run webservices available.
Currently webservices are run as gridengine jobs in precise or...
[06:44:11] Labs, Labs-Sprint-107, Labs-Sprint-108: Evaluate a 'cluster solution' for use on Tool Labs - https://phabricator.wikimedia.org/T106475#1509800 (yuvipanda) >>! In T106475#1509788, @Joe wrote: > - Do you and the users really like the current interface? Do we want users to still ssh into the system? I don'...
[06:44:12] Labs, Labs-Sprint-107, Labs-Sprint-108: Evaluate a 'cluster solution' for use on Tool Labs - https://phabricator.wikimedia.org/T106475#1509801 (yuvipanda) >>! In T106475#1509788, @Joe wrote: > - Do you and the users really like the current interface? Do we want users to still ssh into the system? I don'...
[06:49:11] Labs, Labs-Sprint-107, Labs-Sprint-108: Evaluate a 'cluster solution' for use on Tool Labs - https://phabricator.wikimedia.org/T106475#1509802 (MoritzMuehlenhoff) Another point that should be evaluated along (maybe it's covered by the "Monitoring" part in the spreadsheet, but I thought I should add it)...
[06:50:15] Labs, Labs-Sprint-107, Labs-Sprint-108: Evaluate a 'cluster solution' for use on Tool Labs - https://phabricator.wikimedia.org/T106475#1509804 (yuvipanda) @MoritzMuehlenhoff what kind of logs do you expect this to have? Just accounting of which jobs ran from which users at what times on what hosts?
[07:06:26] Labs, Labs-Sprint-107, Labs-Sprint-108: Evaluate a 'cluster solution' for use on Tool Labs - https://phabricator.wikimedia.org/T106475#1509816 (MoritzMuehlenhoff) >>! In T106475#1509804, @yuvipanda wrote: > @MoritzMuehlenhoff what kind of logs do you expect this to have? Just accounting of which jobs ra...
[07:25:11] Labs, Labs-Sprint-107, Labs-Sprint-108: Evaluate kubernetes for use on Tool Labs - https://phabricator.wikimedia.org/T107993#1509843 (yuvipanda) NEW
[07:30:34] Labs, Labs-Sprint-108: Simple method to have a per-project debian repository - https://phabricator.wikimedia.org/T104194#1509849 (yuvipanda) Only requirement is that it's easy to add new packages and has super low overhead.
[07:31:54] Labs, Tracking: Create k8s-eval project - https://phabricator.wikimedia.org/T107994#1509851 (yuvipanda) NEW
[07:32:03] Labs, Tracking: New Labs project requests (Tracking) - https://phabricator.wikimedia.org/T76375#1509862 (yuvipanda)
[07:32:05] Labs, Tracking: Create k8s-eval project - https://phabricator.wikimedia.org/T107994#1509851 (yuvipanda) Open>Resolved a:yuvipanda Done
[09:19:22] Labs, Labs-Sprint-107, Labs-Sprint-108: Evaluate kubernetes for use on Tool Labs - https://phabricator.wikimedia.org/T107993#1510078 (yuvipanda) I've created the k8s-eval project and am in the process of setting up a 3 node etcd cluster to start with. See https://wikitech.wikimedia.org/wiki/Hiera:K8s-ev...
[09:21:45] Labs, Labs-Sprint-107, Labs-Sprint-108: Evaluate kubernetes for use on Tool Labs - https://phabricator.wikimedia.org/T107993#1510080 (yuvipanda) Ah! https://wikitech.wikimedia.org/w/index.php?title=Hiera%3AK8s-eval&type=revision&diff=173110&oldid=173108 made it work, which might make sense considering t...
[09:27:25] Labs, Labs-Sprint-107, Labs-Sprint-108: Evaluate a 'cluster solution' for use on Tool Labs - https://phabricator.wikimedia.org/T106475#1510084 (scfc) >>! In T106475#1509797, @yuvipanda wrote: > So the actual quarterly goal is to make an alternate way to run webservices available. […] Why? ("Goal" vs....
[09:32:13] Labs, Labs-Sprint-107, Labs-Sprint-108: Evaluate a 'cluster solution' for use on Tool Labs - https://phabricator.wikimedia.org/T106475#1510100 (yuvipanda) I don't think a brand new environment will work - we had that opportunity during the toolserver migration but didn't take it (IMO). The only way to a...
[09:39:45] Labs, Labs-Sprint-107, Labs-Sprint-108: Evaluate a 'cluster solution' for use on Tool Labs - https://phabricator.wikimedia.org/T106475#1510110 (yuvipanda) a deprecation schedule would be: # SGE for webservices # webservicemonitor # SGE for continuous jobs # bigbrother # SGE (all of it!) # NFS (eventually)
[09:55:11] Labs, Labs-Sprint-107, Labs-Sprint-108: Evaluate kubernetes for use on Tool Labs - https://phabricator.wikimedia.org/T107993#1510175 (yuvipanda) Ok, after some false starts there's a 3 node etcd cluster in there now \o/ I've clarified https://wikitech.wikimedia.org/wiki/Etcd a little bit, should add a b...
[10:06:46] Labs, Labs-Sprint-107, Labs-Sprint-108: Evaluate kubernetes for use on Tool Labs - https://phabricator.wikimedia.org/T107993#1510199 (yuvipanda) (Docs have been edited with a note of caution + more info). So that was super simple. Next step is to try to setup flannel.
[10:30:09] Labs, operations, wikitech.wikimedia.org: Turn on Cirrus replicas for labswiki (wikitech) - https://phabricator.wikimedia.org/T83760#1510257 (Krenair)
[10:37:06] Labs, operations, wikitech.wikimedia.org: Turn on Cirrus replicas for labswiki (wikitech) - https://phabricator.wikimedia.org/T83760#1510263 (Krenair) @manybubbles, @ottomata: It doesn't look like the other wikis set this... Since wikitech is now part of the normal system, is there still anything to do...
[11:04:33] Labs, Tool-Labs, Labs-Sprint-100: Clean up huge logs on toollabs - https://phabricator.wikimedia.org/T98652#1510324 (scfc)
[11:51:18] Labs, Labs-Sprint-107, Labs-Sprint-108: Evaluate a 'cluster solution' for use on Tool Labs - https://phabricator.wikimedia.org/T106475#1510426 (scfc) You get pitchforks if you force (however gently) all users to migrate to yet another new system with no reasoning and no compensation for the disturbance....
[12:53:26] PROBLEM - Puppet staleness on tools-exec-cyberbot is CRITICAL 55.56% of data above the critical threshold [43200.0]
[13:57:33] Labs, Tool-Labs, Labs-Sprint-101, Labs-Sprint-102, and 3 others: Puppetize toolserver.org redirect configuration - https://phabricator.wikimedia.org/T85165#1510673 (coren) It was, indeed, never applied after the tests. D'oh. Fixed. As to whether make it a module or a flat role; it was made a flat...
[14:18:43] !log deployment-prep update deployment-restbase01 to openjdk8 T104887
[14:18:46] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL, Master
[14:23:28] Hi I think Quarry servers are faulty again. Queries from 12 hours have not been processed
[14:23:36] It happened a few days ago and YuviPanda fixed it
[14:24:15] YuviPanda: do we need to drain any tools instances before the 1009 reboot, or is it just spares running there?
[14:27:15] In other news, I have found a page which was deleted on July 12, but in the replica database it is still in the page and revision table (rather than the archive table)
[14:27:57] Make that 3 pages. Did we have any replication error then? How can this be fixed?
[14:33:12] !log deployment-prep update deployment-restbase02 to openjdk8 T104887
[14:33:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL, Master
[14:36:43] andrewbogott: will you restart tools-exec-gift also?
:\
[14:38:44] huji, create a ticket on phabricator with all details
[14:43:45] jynus: will do
[14:44:17] paste the number here when finished
[14:45:01] ok, i didn't read the whole email
[14:48:25] Labs: Replication issue with Fa WP replica - https://phabricator.wikimedia.org/T108032#1510733 (Huji) NEW
[14:49:27] jynus: ^
[14:50:29] do you have the full sql?
[14:50:46] it will make the investigation easier
[14:50:55] of course I do
[14:51:25] basically, the SQL with what you get and what you would expect would be very helpful
[14:52:05] Labs: Replication issue with Fa WP replica - https://phabricator.wikimedia.org/T108032#1510776 (Huji)
[14:52:14] jynus: I pasted it there
[14:52:38] thank you, huji, will try to take a look at it, but I cannot right now
[14:52:57] jynus: not an urgent matter; hence setting importance to "lowest"
[14:57:34] gifti: remind me later, please? I’m in the middle of things :)
[14:57:52] andrewbogott: nvm
[15:01:29] YuviPanda: @ grid engine replacement: I think scfc is also wondering why it's important to do this Right Now(TM) and not when we actually switch to a platform without SGE such as jessie
[15:15:36] $ ssh bastion.wmflabs.org -- ssh: connect to host bastion.wmflabs.org port 22: Connection refused
[15:15:44] down for everyone or just me ^ ??
[15:16:23] bd808: works for me
[15:16:26] bd808: it’s back now
[15:16:49] andrewbogott: cool and thanks
[16:08:55] YuviPanda: I can't see my results http://quarry.wmflabs.org/query/894 (There should be ~3000 rows)
[16:26:48] (CR) Sitic: [C: 2 V: 2] Log and ignore partial ORES errors [labs/tools/crosswatch] - https://gerrit.wikimedia.org/r/229295 (owner: Sitic)
[16:27:07] (CR) Sitic: [C: 2 V: 2] Fix de translations [labs/tools/crosswatch] - https://gerrit.wikimedia.org/r/229296 (owner: Sitic)
[16:27:19] (CR) Sitic: [C: 2 V: 2] Add ability to blacklist wikis [labs/tools/crosswatch] - https://gerrit.wikimedia.org/r/229297 (owner: Sitic)
[16:27:33] (CR) Sitic: [C: 2 V: 2] Fix edge case bug in flaggedrevs task [labs/tools/crosswatch] - https://gerrit.wikimedia.org/r/229298 (owner: Sitic)
[16:27:50] (CR) Sitic: [C: 2 V: 2] Set red as theme accent color [labs/tools/crosswatch] - https://gerrit.wikimedia.org/r/229406 (owner: Sitic)
[16:29:07] Hi. When I get a list of images used in articles, is there a way with ONE query to determine which ones are uploaded on the wiki, which ones on Commons, and which ones are nonexistent?
[16:29:34] I can write a query that can tell if they are uploaded locally or not. But the "or not" group would include both nonexistent and Commons.
[16:32:49] huji: yes, you can join with the commons tables
[16:46:10] huji: they seem ok? (quarry)
[16:46:37] valhallasw`cloud: it isn't really 'right now' - the plan is to start providing a non-default alternative to webservices this quarter, and nothing more
[16:47:36] Dispenser: when did you last run it?
some results were lost during the big NFS outage
[16:47:49] Aug 1
[16:47:49] valhallasw`cloud: but it also will be running on jessie from scratch
[16:54:45] (CR) Sitic: [C: 2 V: 2] Add expand/collapse button for diff preview [labs/tools/crosswatch] - https://gerrit.wikimedia.org/r/229407 (https://phabricator.wikimedia.org/T107735) (owner: Sitic)
[17:08:40] Labs, operations, Labs-Sprint-107, Labs-Sprint-108, ToolLabs-Goals-Q4: Investigate kernel issues on labvirt** hosts - https://phabricator.wikimedia.org/T99738#1511353 (Andrew) labvirt1009 is now running 3.16.0-45-generic. A few tentative suspend/resumes suggest that all is well. If labvirt1009 i...
[17:13:19] Dispenser: hmm, I'm not sure what's happening. Can you file a bug?
[17:13:46] andrewbogott: yeah, I picked that host because it had no tools impact; they were all spares
[17:14:14] Labs, Labs-Sprint-107, Labs-Sprint-108: allow routing between labs instances and public labs ips (done, document) - https://phabricator.wikimedia.org/T96924#1511370 (Andrew) Currently, a subset of floating IPs are properly aliased to their internal IPs by the labs pdns recursor. The list of ips affecte...
[17:14:22] YuviPanda: it was halted for about 12 hours or so. I guess it choked?
[17:14:27] hmm
[17:14:30] it shouldn't have
[17:14:33] valhallasw`cloud: can you show me how to join with commons tables?
[17:14:34] it's a bit strange
[17:14:38] I'll try to take a look later
[17:14:48] YuviPanda: thanks
[17:15:03] Labs, Labs-Sprint-107, Labs-Sprint-108: Automate population of floating->internal IP aliasing in labs pdns recursor. - https://phabricator.wikimedia.org/T108063#1511379 (Andrew) NEW
[17:15:25] YuviPanda: yep, the reboot went fine with a minimum of screaming :)
[17:15:30] yeah
[17:18:23] Labs, Labs-Sprint-107, Labs-Sprint-108: Evaluate a 'cluster solution' for use on Tool Labs - https://phabricator.wikimedia.org/T106475#1511422 (yuvipanda) >>!
In T106475#1510426, @scfc wrote: > You get pitchforks if you force (however gently) all users to migrate to yet another new system with no reason...
[17:20:45] Labs, Labs-Sprint-107, Labs-Sprint-108: Evaluate a 'cluster solution' for use on Tool Labs - https://phabricator.wikimedia.org/T106475#1511435 (yuvipanda)
[17:22:22] leila: hey!
[17:22:32] hellooo YuviPanda.
[17:22:41] valhallasw`cloud: just suggested that we add a question about how people actually use toollabs. do they develop locally and scp? git? develop remotely?
[17:22:51] I think that'll be super useful and allow us to support workflows better
[17:22:57] I wonder if that can just be a freeform text field
[17:24:17] We can do free form for sure, if we can capture it as a couple of structured questions it will be easier on us later though. We can also do a combo.
[17:24:46] Will you be in the office YuviPanda? Maybe we can brainstorm for 10 minutes how we can break it down to a couple of questions?
[17:25:01] leila: yes I'll be in the office in the afternoon
[17:25:48] ooki, I have a 2-5pm hacking session, if you're in the office before that, we can chat then, or come and poke me while I'm in that session, YuviPanda.
[17:26:16] leila: hah. ok!
[17:26:19] I'll try to be there before
[17:26:23] depends on how my laundry goes
[17:26:37] sounds good. you don't have that much stuff, I think you can do it, YuviPanda. ;-)
[17:26:48] hahaha :D
[17:26:48] ok
[17:26:49] true
[17:52:47] Labs, Labs-Sprint-107, Labs-Sprint-108: Automate population of floating->internal IP aliasing in labs pdns recursor.
- https://phabricator.wikimedia.org/T108063#1511601 (Andrew)
[17:52:49] Labs, Labs-Sprint-107, Labs-Sprint-108: allow routing between labs instances and public labs ips (done, document) - https://phabricator.wikimedia.org/T96924#1511600 (Andrew)
[17:52:55] Labs, Tool-Labs: Move tools to designate - https://phabricator.wikimedia.org/T96641#1511604 (Andrew)
[17:52:57] Labs, Labs-Sprint-107, Labs-Sprint-108: allow routing between labs instances and public labs ips (done, document) - https://phabricator.wikimedia.org/T96924#1511602 (Andrew) Open>Resolved a:Andrew
[17:54:18] Labs: Automate population of floating->internal IP aliasing in labs pdns recursor. - https://phabricator.wikimedia.org/T108063#1511379 (Andrew)
[18:17:33] YuviPanda or Coren: Probably you’ve already explained this a couple of times but… can you tell me why the tools nfs stuff is under project/tools/project but most other projects (e.g. testlabs) are elsewhere, under other/testlabs/project
[18:17:34] ?
[18:18:48] andrewbogott: Sharding. We spun off tools and maps (the bigger ones) so that we don't end up with all I/O to a single filesystem, nor a single huge filesystem that is long and complicated to repair/back up/etc
[18:19:04] andrewbogott: If another project becomes big and unwieldy, it'll be spun off too.
[18:19:12] Coren: ok. So in terms of my archive script… is it safe to assume that any ex-project-in-need-of-archiving is under...
[18:19:13] oh, dang
[18:19:19] so I need to check both locations I guess?
[18:19:52] No, we can keep the presumption that any sharded project is both big and active. You can deal with others only and you're all set.
[18:19:57] yup
[18:20:00] you can just deal with others
[18:21:42] If we ever end up having to archive a sharded project, then it's completely different anyways because it's a whole fs and not just a directory. If it ever happens (very unlikely) then we'll do it as a special case.
[18:24:21] ok, thanks
[18:24:53] YuviPanda: while you’re here… I’m slightly confused by your steps 2 and 3 on https://phabricator.wikimedia.org/T104857
[18:25:11] I assume you just mean ‘get a list of volumes that are present but not in the yaml file’?
[18:25:39] (I guess I’m just confused about whether when you say ‘active’ you mean ‘present’ or something subtler.)
[18:25:49] andrewbogott: ah, yes, I just mean present
[18:26:01] so everything not present in the yaml file should be archived
[18:26:11] ok, thanks
[18:36:13] Labs, operations: audit labs versus production ssh keys - https://phabricator.wikimedia.org/T108078#1511839 (RobH) NEW a:RobH
[18:37:19] Coren: would you have time to review the SPF & rDNS changes for tools-mailrelay-02 this week?
[18:38:24] valhallasw`cloud: I'm pretty sure I will. I'm almost done with the tests I'm running on labstore1001 now, if you want I'll take a look at 'em right afterwards since it's not going to be very long I expect.
[18:39:19] Coren: ok!
[18:41:13] YuviPanda: no nfs for bastion anymore?
[18:41:24] (The list of orphans is LONG which is making me nervous :) )
[18:41:41] andrewbogott: yes, that was the first to be removed during the outage
[18:41:46] heh
[18:41:57] andrewbogott: I did remove NFS from close to 100 projects during and right after the outage
[18:42:25] ok… my logic looks right, it’s just intimidating
[18:43:34] :D
[18:51:02] Quarry: Large queries results not displaying (Aug 2015) - https://phabricator.wikimedia.org/T108084#1511927 (Dispenser) NEW
[18:51:36] YuviPanda: output from a dry run here: https://dpaste.de/kLEX
[18:54:18] andrewbogott: nothing bad pops out - most of the ones I checked seem ones I killed
[18:54:21] Dispenser: thanks
[18:55:06] I kinda blank out when writing reports. Is that enough detail?
[18:55:26] valhallasw`cloud: I am going to email you my question so hopefully you can get to it when you have time
[18:55:28] Dispenser: yup!
[18:55:38] huji: you should just email the labs-l list so others might be able to help too
[18:55:45] huji: eh
[18:55:45] good idea
[18:55:50] ^ what YuviPanda said
[18:56:19] joining with commons is not different from joining to other tables; just use 'commonswiki_p.page' as table name instead of 'page'
[18:56:21] huji: :) in general it's a good idea to mail a list or something rather than an individual, unless it's got private info or security things
[18:56:36] but I don't know where exactly file info is in the database
[18:56:55] YuviPanda: the trade-off is to annoy many people versus just one, when you have a dumb question!
[18:57:37] you won't annoy people with questions on mailing lists -- that's what they are for
[18:57:41] +1
[18:57:55] not with questions; but with "dumb" questions!
[18:58:00] sending personal mails, on the other hand, is frowned upon
[18:58:09] and don't give me the "there is no such thing .." response!
[18:58:12] I don't believe in it
[18:58:19] because you're basically forcing that specific person to respond
[18:58:20] :P
[18:58:25] huji: yeah, such questions to labs-l > such questions to individuals
[18:58:29] Old cliche (old because true): The only dumb question is the one you don't ask. :-)
[18:58:50] and valhallasw`cloud you just answered my Q! commonswiki_p is the answer. thanks a lot
[19:03:38] Labs, Release-Engineering: Cleanup quaity-assurance labs project - https://phabricator.wikimedia.org/T108087#1511967 (yuvipanda) NEW
[19:08:46] Labs, Release-Engineering: Cleanup quaity-assurance labs project - https://phabricator.wikimedia.org/T108087#1511987 (dduvall) I don't //think// it's used, but @zeljkofilipin might know better.
[19:22:22] Labs, Analytics, Labs-Infrastructure, Labs-Sprint-108, Patch-For-Review: Set up cron job on labstore to rsync data from stat* boxes into labs.
- https://phabricator.wikimedia.org/T107576#1512028 (Ottomata) @mark or @akosiaris or @faidon, we'll need a hole punched in the Analytics VLAN ACL for th...
[19:49:09] Labs, operations: audit labs versus production ssh keys - https://phabricator.wikimedia.org/T108078#1512121 (Krenair) According to my script, both @mvolz and @jdouglas have labs keys in production.
[20:45:36] Labs, operations: audit labs versus production ssh keys - https://phabricator.wikimedia.org/T108078#1512302 (Mvolz) Whoops. You can revoke that key, I haven't needed production access in a while. https://gerrit.wikimedia.org/r/190405 was the change.
[21:07:32] Labs, operations, Patch-For-Review: audit labs versus production ssh keys - https://phabricator.wikimedia.org/T108078#1512371 (Krenair) Per request, here's the script. Stick it in modules/ldap/files/scripts (operations/puppet clone) on a machine which is connected to labs LDAP, and run it from that dire...
[21:15:38] Labs, operations, Patch-For-Review: audit labs versus production ssh keys - https://phabricator.wikimedia.org/T108078#1512384 (RobH)
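
Editor's note on the 16:29 / 18:56 question about sorting images into local, Commons, and nonexistent with one query: the advice in the log is to reference the Commons tables through the `commonswiki_p.` prefix. A minimal sketch of that idea follows. It is not run against the real replicas. It uses Python's `sqlite3` with an attached in-memory database standing in for `commonswiki_p`, and it assumes the standard MediaWiki `imagelinks` (`il_to`) and `image` (`img_name`) tables hold the relevant data, which nobody in the log confirmed. The sample rows and the `classify_images` helper are hypothetical.

```python
import sqlite3

# Stand-in for the replicas (assumption): "main" plays the local wiki database,
# and an attached in-memory database plays commonswiki_p.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS commonswiki_p")

# Hypothetical sample data using MediaWiki-style table and column names.
conn.executescript("""
    CREATE TABLE main.imagelinks (il_from INTEGER, il_to TEXT);
    CREATE TABLE main.image (img_name TEXT PRIMARY KEY);
    CREATE TABLE commonswiki_p.image (img_name TEXT PRIMARY KEY);
    INSERT INTO main.imagelinks VALUES
        (1, 'Local.png'), (1, 'Shared.jpg'), (2, 'Missing.svg');
    INSERT INTO main.image VALUES ('Local.png');
    INSERT INTO commonswiki_p.image VALUES ('Shared.jpg');
""")

QUERY = """
    SELECT DISTINCT il.il_to,
           CASE
               WHEN loc.img_name IS NOT NULL THEN 'local'
               WHEN com.img_name IS NOT NULL THEN 'commons'
               ELSE 'nonexistent'
           END AS location
    FROM main.imagelinks AS il
    LEFT JOIN main.image AS loc ON loc.img_name = il.il_to
    LEFT JOIN commonswiki_p.image AS com ON com.img_name = il.il_to
    ORDER BY il.il_to
"""


def classify_images(connection):
    """Return (file name, location) pairs for every file used in a page."""
    return connection.execute(QUERY).fetchall()


for name, location in classify_images(conn):
    print(name, location)
```

The two LEFT JOINs keep every used file name even when neither table has it, so the CASE expression can tell the three groups apart in a single pass. A local upload is checked first because a local file shadows a Commons file of the same name.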