[00:59:48] hey, humans
[01:00:05] there's a problem with the GLAMOROUS tool
[01:00:45] specifically, when it's supposed to link to wikidata, it links to "wikidata.wikipedia.org"
[02:10:26] Labs: Process for user backups - https://phabricator.wikimedia.org/T85608#1143928 (coren) Jessie saves. Snapshots are back, and working, but not yet user-accessible (design work will be needed, perhaps automount?) At the very least, once we turn the feature on, admins can recover user files.
[02:53:30] Coren: security question about user backups, is access limited to the projects that a user has access to?
[02:54:20] When it'll be automated, you can only see the part of the snapshots that matches the project yes.
[03:27:52] !install
[04:16:03] Coren: I also usually add legoktm to help me catch python style issues. I can help too :)
[04:16:14] hello
[06:35:34] PROBLEM - Puppet failure on tools-trusty is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[06:41:55] RECOVERY - Free space - all mounts on tools-dev is OK: OK: All targets OK
[06:42:41] PROBLEM - Puppet failure on tools-exec-01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[07:17:37] RECOVERY - Puppet failure on tools-exec-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:42:42] (PS1) Steinsplitter: Adding new tags for #wikimedia-commons-tech [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/199242
[12:18:44] Labs: Storage capacity & redundancy expansion (tracking) - https://phabricator.wikimedia.org/T85604#1144764 (coren) So here is the current picture: * The new filesystem on thin volumes is in place and contains a copy of the live filesystem, but rsync is unable to keep up with the rate of change so actual dow...
[12:20:59] YuviPanda|zz: Sorry about the alert. I ended up finishing my day well past 0h to make sure all my ducks are in order for a meeting with Mark today. :-)
[12:21:20] I have a few hours now, I'll use them for that.
[12:26:02] Labs: Upgrade labstore2001 to Jessie - https://phabricator.wikimedia.org/T93740#1144787 (coren) NEW a:coren
[12:30:21] Labs: Replicate data between codfw and eqiad - https://phabricator.wikimedia.org/T85606#1144798 (coren) This is ready to start; the replicated copy will not be the live one until the filesystem switch needed for T85608 is done but it does not depend on it. What //is// a dependency is to finish tracking down...
[12:31:21] Labs: Process for user backups - https://phabricator.wikimedia.org/T85608#950693 (coren) This is now working on the new (not live) filesystem. We are pending only the switch.
[12:41:04] PROBLEM - Puppet failure on tools-bastion-01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[13:01:08] RECOVERY - Puppet failure on tools-bastion-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:44:17] Coren: Is there a way I can see the db load graphs?
[13:46:50] a930913: Our graphite is restricted to staff (or maybe NDA signers also, I'd need to check). Would you like me to pull some numbers out for you?
[13:48:20] Coren: Are there any obvious loads? Like a rising edge and a falling edge?
[13:48:30] * Coren checks
[13:48:36] In, say, the last half day.
[13:48:49] <^d> Coren: nda also, yes
[13:49:00] <^d> (also: ganglia might be useful here?)
[13:49:29] ^d: I couldn't see anything that looked like it would be the db.
[13:50:33] a930913: In Miscellaneous quiad, you want labsdb*
[13:50:39] eqiad*
[13:50:47] Depending on which you connect to.
[13:51:22] Wait, that has all but 1001-1003. They must be elsewhere.
[13:51:26] * Coren hunts them down.
[13:52:51] Ah! In mysql eqiad.
[13:53:31] http://ganglia.wikimedia.org/latest/?r=day&cs=&ce=&m=cpu_report&c=MySQL+eqiad&h=labsdb1001.eqiad.wmnet&tab=m&vn=&hide-hf=false&mc=2&z=small&metric_group=ALLGROUPS
[13:53:38] a930913: ^^ Might be all you need
[13:56:28] Coren: \o/
[13:57:04] Coren: 100{1,3} are balanced between?
[13:59:22] a930913: No, which you hit depends on what DB you are working with
[13:59:40] Coren: wikidatawiki
[14:00:38] a930913: 1003 then
[14:01:47] Coren: ^d: Danke.
[14:01:56] <^d> yw
[14:02:07] np
[14:03:13] Now to see if I can see myself :p
[14:04:01] Coren: The theory is, if I can't see myself affecting it, I don't need to worry about overloading, right? :p
[14:05:01] a930913: It's not optimal, but it's a good starting point.
[14:05:31] Coren: I'm running other diagnostic too.
[14:05:45] Merged a whole load of query too.
[14:06:18] Instead of each Q, it grabs everything for the item in one hit.
[14:06:34] Sounds like a good optimization.
[14:06:35] Not sure where the balance is though.
[14:07:24] I.e. do I grab more data that I don't need in one go, or do I get many datas of what I want?
[14:07:34] And Springle hasn't been on here :(
[14:08:26] In theory, I could make a super mega large query to grab everything I needed in one go.
[14:09:00] But would that be optimal.
[14:10:16] I... don't know. In the absence of our DBA you can always just measure. :-)
[14:11:32] Coren: I have a queue system that buffers the requests which means I can limit them. How do I work out how many jobs I can submit at once? (Assuming much CPU/memory/disk bound.)
[14:12:20] As a sysadmin, my answer must be "as little as will still get you your results in a reasonable amount of time". :-)
[14:12:32] Be conservative at first.
[14:12:48] Coren: These are web requests.
[14:13:21] "Conservative at first" I.e. increase each day until complaint? :D
[14:13:55] No, it means start with something small (like 2-3 at most) and increase when you see requests piling up regularly. :-)
[14:15:00] :)
[14:17:11] The good news is they take a fraction of the memory I thought they would :)
[14:24:00] Labs: known_host key updating on virt* (and possibly elsewhere) - https://phabricator.wikimedia.org/T93748#1144980 (Andrew) NEW
[15:37:37] Coren: hey! cool :) did you manage to get it done in the meantime?
[15:37:54] YuviPanda: In meeting with Mark, will talk to you shortly.
[15:38:00] Coren: ah cool
[15:46:40] Coren: When you finish your meeting, what server do I look at for the /data/project/ nfs load?
[15:46:54] Unless YuviPanda, do you know?
[15:46:55] a930913: labstore1001
[15:46:59] :)
[15:48:35] a930913: That one is in "Labs NFS cluster eqiad"
[15:48:48] Yeah, found it :)
[15:49:54] Good news then. I can't see me.
[15:50:32] Though someone is really doing some IO every five minutes :o
[15:54:06] a930913: fwiw, I noticed IO on labs was really bad yesterday around this time. wa of like 95 doing a bunch of apt-get installs...
[15:56:29] thcipriani: Not me, I was SQL bound then :p
[15:56:56] Now I'm CPU bound. parsing JSON.
[16:05:55] Coren: should I start reviewing https://gerrit.wikimedia.org/r/199267 now or wait for you to take out the WIP tag?
[16:13:31] YuviPanda: Hi
[16:13:59] Did not find andrewbogott online for some days now.
[16:14:43] Vivek: It’s because I’m in North America :)
[16:14:58] But I’m in a meeting now, and for a long time into the future :(
[16:15:17] andrewbogott: always a reason :p
[16:16:13] Vivek: I can multitask as long as you don’t mind my having a very short attention span
[16:16:32] andrewbogott: So I found you, good.
[16:16:39] :)
[16:17:16] Vivek: also, my bouncer is always here so you can leave questions or pm me when I’m AFK
[16:18:48] sure.
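The advice given at 14:12-14:13 above (start with a small number of concurrent jobs, "2-3 at most", and raise it only when requests are seen piling up regularly) can be sketched roughly as follows. This is a minimal illustration, not the queue system a930913 actually runs: `run_throttled`, `handler`, and the default limit of 3 are hypothetical names and values chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumption: 3 mirrors the "2-3 at most" starting point from the chat.
# Raise it only after observing requests queueing up regularly.
MAX_CONCURRENT = 3


def run_throttled(jobs, handler, max_concurrent=MAX_CONCURRENT):
    """Run handler(job) for every job with a bounded number in flight.

    The thread pool never runs more than max_concurrent handlers at the
    same time; remaining jobs wait in the pool's internal queue.
    Results are returned in the same order as the input jobs.
    """
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(handler, jobs))
```

Measuring how long jobs wait before they start, as Coren suggests ("you can always just measure"), would be the signal for raising the limit; that measurement is left out of the sketch.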
[16:35:28] (PS1) Southparkfan: Replace absolute paths with relative paths [labs/tools/WMT] - https://gerrit.wikimedia.org/r/199281
[16:44:06] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Glaisher was created, changed by Glaisher link https://wikitech.wikimedia.org/wiki/Nova+Resource%3aTools%2fAccess+Request%2fGlaisher edit summary: Created page with "{{Tools Access Request |Justification=Host some tools |Completed=false |User Name=Glaisher }}"
[16:45:56] YuviPanda: Go ahead and start reviewing now - I need to do a notification before I take some bandwidth to make the new filesystem the live one for several days anyways.
[16:48:07] * Coren braces for the scathing review. "Your python reads like perl!"
[16:48:08] :-)
[17:00:00] Why is the Phabricator Diffusion puppet repo behind Gitblit and Gerrit?
[17:00:59] Negative24: I am not sure, I think mostly because nobody uses Diffusion...
[17:01:42] Nobody should because Diffusion is referencing Gerrit they should be exactly the same
[17:01:51] heh
[17:02:06] They aren't even mirrors they have the same storage
[17:02:16] github?
[17:02:18] oh
[17:02:25] no github is a mirror
[17:02:32] Negative24: ^d is the man you are looking for, maaabe
[17:02:55] just a little confusing referencing files that are different all around
[17:03:00] <^d> Diffusion polls on its own, it's not a push system
[17:03:22] exactly. Its setup so that Phabricator only looks for new changes
[17:03:32] its read-only to the gerrit repos
[17:04:05] at least that's how its supposed to be setup until #gerrit-migration
[17:07:27] its also not helpful that Diffusion uses phab usernames and Gitblit uses real names and then Gerrit uses LDAP usernames :)
[17:11:58] Labs, Patch-For-Review: Clarify public/private role for holmium (aka labs-ns2) - https://phabricator.wikimedia.org/T93639#1145566 (Andrew)
[17:16:42] <^d> Negative24: My suggestion is to forget gitblit :p
[17:18:57] ^d: i only use it in these type of situations. its been acting real slow anyways
[17:19:14] <^d> it's always slow
[17:20:22] github also reports that Diffusion is behind :(
[17:21:07] just use github :)
[17:21:11] that’s what I do
[17:21:13] gitblit is terrible
[17:27:49] YuviPanda: i'm not worrying about gitblit because i know how bad it is but I'm trying to figure out phabricator
[17:28:03] aaaaah, I see. that makes sense.
[17:28:06] thats what i have been working with
[17:28:10] Negative24: chasemp might also know
[17:30:38] anybody knows what could cause "DB connection error: Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (111) ()" on labs machine in the middle of a script run? mysql is fine now, but looks like something killed it in the middle of a run...
[17:30:53] that's a permission issue
[17:30:53] diffusion: in general terms it's behind because it hasn't pulled in new things :)
[17:31:05] in practical terms it does a rolling pull scheduled based on how active it sees a repo
[17:31:14] and we haven't worried about a bit of delay
[17:31:31] chasemp: Is that related to this: https://secure.phabricator.com/book/phabricator/article/diffusion_updates/
[17:31:42] yes basically
[17:41:45] Labs, Patch-For-Review, Puppet: puppet-run is confused by stale lock files - https://phabricator.wikimedia.org/T92766#1145775 (BBlack) Open>Resolved a:BBlack I'm assuming the fixup from a week ago worked for labs as well, closing. Re-open if not! :)
[17:44:39] Negative24: also I noticed a bug so thanks :)
[17:46:42] chasemp: Which?
[17:47:34] a permission issue not an upstream issue
[17:48:02] CC me on it. I also found a permission issue this morning so I wonder if it is the same thing
[17:50:28] Negative24: https://phabricator.wikimedia.org/rOPUP5b535a7a0915a9687d99057010c843c279f6e597
[17:51:20] chasemp: Yes that is what I found but not just scripts/repository the whole scripts dir needs to be owned by phd
[17:51:42] can you point me to why say setup tools needs to be owned by phd?
[17:51:45] you only cleared up the repo management files but there are many more that still error out
[17:52:14] yeah I was looking for doc but the phab setup docs aren't very clear
[17:52:50] it goes along the lines of "git clone and be happy. oh and here are some docs about how to use it :)"
[17:52:58] some things like util and user in scripts I haven't seen a reason for what you are suggesting
[17:53:08] opposed at this point to doing that without knowing why
[17:53:18] but yes it's all loosely documented
[17:55:08] I guess its fine for the moment but as we go forward exploring more phab apps we may find more but yes the repo management was all that I was impacted by for the moment
[17:55:31] the only other thing I am aware of is the ssh / repo hosting
[17:55:36] but that will be a big deal either way
[17:59:32] chasemp: https://gerrit.wikimedia.org/r/#/c/198769/
[18:03:28] gtg
[18:08:26] chasemp: Looks like it resolved itself (or did you do something?)
[18:08:43] to what are you referring?
[18:14:18] I've added a section to Help:Tool Labs: https://wikitech.wikimedia.org/wiki/Help:Tool_Labs#Setting_up_code_review_and_version_control Complain if anything looks wrong
[18:14:53] (new section being: https://wikitech.wikimedia.org/wiki/Help:Tool_Labs#Enabling_simple_public_HTTP_access_to_local_Git_repository)
[18:17:55] Labs: Make a labs_storage module - https://phabricator.wikimedia.org/T93781#1146005 (coren) NEW
[18:18:26] Labs: Make a labs_storage module - https://phabricator.wikimedia.org/T93781#1146015 (yuvipanda) This counts for labstore1003 too, right?
[18:22:17] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Glaisher was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=149856 edit summary:
[18:27:16] * Negative24 hates Comcast
[18:28:14] Who doesn't
[18:28:36] If I get disconnected for no apparent reason blame xfinity
[18:30:26] Labs, Patch-For-Review: Replicate data between codfw and eqiad - https://phabricator.wikimedia.org/T85606#1146053 (coren)
[18:30:27] Labs, Patch-For-Review: Process for user backups - https://phabricator.wikimedia.org/T85608#1146052 (coren)
[18:30:43] Labs, Patch-For-Review: Process for user backups - https://phabricator.wikimedia.org/T85608#950693 (coren)
[18:30:44] Labs, Patch-For-Review: Replicate data between codfw and eqiad - https://phabricator.wikimedia.org/T85606#950673 (coren)
[18:39:25] Labs, MediaWiki-extensions-OpenStackManager, Patch-For-Review: Wikitech 'manage instances' displays "PHP Fatal error: Call to a member function getImageName() on a non-object" - https://phabricator.wikimedia.org/T89856#1146101 (Andrew) Open>Resolved Patch is merged -- now instances that refer to...
[19:13:20] Labs, Patch-For-Review: Make a labs_storage module - https://phabricator.wikimedia.org/T93781#1146209 (coren) There will be a class for labstore1003 too, yes, though my first pass will be [12]00[12]
[19:17:27] Labs: Sync up the new labs NFS project filesystem with the live one - https://phabricator.wikimedia.org/T93792#1146217 (coren) NEW a:coren
[19:18:04] Labs: Sync up the new labs NFS project filesystem with the live one - https://phabricator.wikimedia.org/T93792#1146217 (coren) (As a note, this will be running in a screen session so that it can be supervised)
[19:18:57] Labs: Sync up the new labs NFS project filesystem with the live one - https://phabricator.wikimedia.org/T93792#1146240 (yuvipanda) oh, so is the new backedup file system going to be on /mnt?
[19:20:47] Labs: Sync up the new labs NFS project filesystem with the live one - https://phabricator.wikimedia.org/T93792#1146245 (coren) No, it will take the old volume's place at /srv/project. /mnt is only used during the copy process because they (obviously) need to both be mounted.
[19:21:17] Labs: Sync up the new labs NFS project filesystem with the live one - https://phabricator.wikimedia.org/T93792#1146247 (yuvipanda) Ah fair enough :)
[19:33:45] Labs, hardware-requests, operations: Replace virt1000 with a newer warrantied server - https://phabricator.wikimedia.org/T90626#1146302 (RobH) virt1000 is a dual X5647 @ 2.93GHz w/ 32GB. Also if the replacement has to be under warranty, it'll be slightly more challenging. Is there a particular entry...
[19:38:45] (CR) John F. Lewis: [C: 2 V: 2] Replace absolute paths with relative paths [labs/tools/WMT] - https://gerrit.wikimedia.org/r/199281 (owner: Southparkfan)
[19:45:43] Labs, Patch-For-Review: Move to a new dns scheme for labs: hostname.projectname.eqiad.wmflabs - https://phabricator.wikimedia.org/T93087#1146331 (Andrew) Every labs instance now has puppetVar: use_dnsmasq=true in ldap.
[19:46:59] Labs: Make a fact for project_id on labs instances - https://phabricator.wikimedia.org/T93684#1146340 (Andrew)
[19:47:01] Labs, Patch-For-Review: Move to a new dns scheme for labs: hostname.projectname.eqiad.wmflabs - https://phabricator.wikimedia.org/T93087#1146339 (Andrew)
[20:04:33] Labs: Make a fact for project_id on labs instances - https://phabricator.wikimedia.org/T93684#1146406 (coren) Isn't that what ${instanceproject} is?
[20:45:45] (PS1) Southparkfan: Use new double-hashed channel namespace [labs/tools/WMT] - https://gerrit.wikimedia.org/r/199325
[20:46:48] (CR) Alpha: [C: 2 V: 2] Use new double-hashed channel namespace [labs/tools/WMT] - https://gerrit.wikimedia.org/r/199325 (owner: Southparkfan)
[21:00:38] Labs: Sync up the new labs NFS project filesystem with the live one - https://phabricator.wikimedia.org/T93792#1146529 (coren) This has now started.
[21:02:53] Coren: that probably is what instanceproject is — I can find references to it but can’t tell where it comes from… can you?
[21:08:32] (PS1) John F. Lewis: feed #wmt to new channel [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/199329
[21:09:12] oh, from ldap of course
[21:09:22] so it’s there but not a fact
[21:09:58] (PS1) John F. Lewis: feed #wmt to new channel [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/199330
[21:48:30] YuviPanda: You could have just imported the private ssh key for the server and nobody would have noticed your change ;-)
[22:32:06] Tool-Labs: Memory Exhausted Near / Tool labs error while querying with Python - https://phabricator.wikimedia.org/T93074#1146898 (Springle) For the error 1064: https://bugs.mysql.com/bug.php?id=69383 . Check the size of your largest prepare statements. The second error, "Commands out of sync; you can't run...
[23:13:51] chasemp: Did you see my comment on https://phabricator.wikimedia.org/rOPUP5b535a7a0915a9687d99057010c843c279f6e597
[23:14:37] ah I don't think anyone is tracking diffusion comments but in essence, can you give me a reason to make the change?
[23:16:27] all the other permission configs set the group to phd
[23:16:48] i guess i should be commenting on gerrit
[23:16:58] i'm so wrapped up in phab at the moment
[23:18:09] well repository tools can be managed via sudo by some users as root
[23:18:21] but if I user and group them as phd I have to allow those users to sudo as phd
[23:18:28] not sure I want to mix it up that way
[23:18:49] I'm ok with how it is for now at least for as long as we are shaking things out and we have a reason to make another change
[23:19:00] ok that's fine
[23:20:35] cli perms for phab are an emerging field :)