[00:06:45] Labs, operations: Make morebots run on a production host - https://phabricator.wikimedia.org/T94638#1168850 (Andrew) IRC bots in general, and morebots in particular, are of questionable security. I don't think it's a good idea to let them run on production.
[00:08:07] Labs, operations: Make morebots run on a production host - https://phabricator.wikimedia.org/T94638#1168854 (yuvipanda) In that case shouldn't we maybe actually security audit them and then make them run on prod? Or use something else, like logstash?
[00:08:23] Labs, operations: Make morebots run on a production host - https://phabricator.wikimedia.org/T94638#1168855 (yuvipanda) !log even when no bot is available is useful still, because a lot of people have IRC logs...
[00:12:15] RECOVERY - SSH on tools-webgrid-03 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0)
[00:19:03] Labs, ToolLabs-Q4-Sprint-1: Create a simple checklist to follow for announcing / doing planned maintenance (on labs) - https://phabricator.wikimedia.org/T94608#1168889 (yuvipanda)
[00:20:52] PROBLEM - Puppet staleness on tools-webgrid-03 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [43200.0]
[00:22:59] Labs, Wikimedia-Labs-Infrastructure, Patch-For-Review, ToolLabs-Q4-Sprint-1: Alert when conntrack table is full on labnet1001 - https://phabricator.wikimedia.org/T90437#1168893 (yuvipanda) Cool! :) Now this just needs an alert.
[00:25:44] RECOVERY - Puppet staleness on tools-webgrid-03 is OK: OK: Less than 1.00% above the threshold [3600.0]
[00:33:08] Labs, Tool-Labs: Planned labs maintenance on tools-db: Puppetization + log file change - https://phabricator.wikimedia.org/T94643#1168907 (yuvipanda) NEW
[00:33:24] Labs, ToolLabs-Q4-Sprint-1: Create a simple checklist to follow for announcing / doing planned maintenance (on labs) - https://phabricator.wikimedia.org/T94608#1168917 (yuvipanda) Example at T94643
[00:37:32] Labs, Tool-Labs: Planned labs maintenance on tools-db: Puppetization + log file change - https://phabricator.wikimedia.org/T94643#1168943 (yuvipanda)
[00:42:12] Tool-Labs: Register labs-announce at Gmane - https://phabricator.wikimedia.org/T94647#1168971 (scfc) NEW a:scfc
[00:42:44] Tool-Labs: Register labs-announce at Gmane - https://phabricator.wikimedia.org/T94647#1168982 (yuvipanda) I didn't realize this wasn't automatic.
[00:44:37] Labs, Tool-Labs: Planned labs maintenance on tools-db: Puppetization + log file change - https://phabricator.wikimedia.org/T94643#1168985 (yuvipanda)
[00:54:20] Tool-Labs: Register labs-announce at Gmane - https://phabricator.wikimedia.org/T94647#1169017 (scfc) Filled out the form; waiting for confirmation.
[00:55:15] Tool-Labs: Register labs-announce at Gmane - https://phabricator.wikimedia.org/T94647#1169018 (scfc) If Gmane would automatically monitor all the mailing lists in the world, I probably wouldn't use it anymore :-).
[00:58:13] Labs, operations: Make morebots run on a production host - https://phabricator.wikimedia.org/T94638#1169025 (bd808) >>! In T94638#1168854, @yuvipanda wrote: > In that case shouldn't we maybe actually security audit them and then make them run on prod? Or use something else, like logstash? Getting !logs in...
[01:03:18] Labs, Tool-Labs: Planned labs maintenance on tools-db: Puppetization + log file change - https://phabricator.wikimedia.org/T94643#1169036 (yuvipanda)
[01:10:19] RECOVERY - Puppet failure on tools-exec-15 is OK: OK: Less than 1.00% above the threshold [0.0]
[01:38:51] Labs, Tool-Labs: Planned labs maintenance on tools-db: Puppetization + log file change - https://phabricator.wikimedia.org/T94643#1169076 (yuvipanda)
[01:38:54] Labs, Tool-Labs, Patch-For-Review, ToolLabs-Goals-Q4, ToolLabs-Q4-Sprint-1: Puppetize & fix tools-db - https://phabricator.wikimedia.org/T88234#1169075 (yuvipanda)
[01:41:02] wikibugs: you are not in all channels
[01:41:05] if anyone here has an idea...
[01:41:05] I am looking for https://tools.wmflabs.org/not-in-the-other-language/ in Magnus' git repository but it doesn't seem to be there...
[01:47:41] harej: what does it do?
[01:48:01] > This tool looks for Wikidata items that have a page in one language but not in the other.
[01:48:07] and you can choose which wikipedias to compare
[01:48:14] and which category tree to populate the list with
[01:48:58] harej: Ah, I have something similar
[01:49:12] harej: http://tools.wmflabs.org/betacommand-dev/cgi-bin/suggested_articles.py
[01:49:38] does that still work in the era of wikidata?
[01:50:04] harej: I just wrote it about two months ago
[01:50:08] ah!
[01:50:46] use qqx as a lang code if you just want to see the total number of lang links for articles in a given category
[01:51:47] harej: that's cross language and works on all projects
[01:52:32] qqx, metasyntaxese ;)
[01:52:52] yep :P
[01:53:27] harej: it only takes about 3-5 minutes for CAT:LIVING
[01:53:42] which is over half a million articles
[01:58:30] Tool-Labs, ToolLabs-Goals-Q4: Show replication lags in Graphite - https://phabricator.wikimedia.org/T50694#1169089 (yuvipanda) This would also allow us to alert based on it.
[01:59:13] harej: Like I said, I often already have most things already written :)
[01:59:53] so I am not sure it works in the same exact way as magnus' thing
[01:59:54] compare:
[02:00:21] https://tools.wmflabs.org/not-in-the-other-language/?lang1=fr&proj1=wiki&lang2=en&proj2=wiki&cat=Barack+Obama&depth=9&starts_with=&doit=Do+it
[02:00:25] https://tools.wmflabs.org/betacommand-dev/cgi-bin/suggested_articles.py?title=Barack+Obama&iw=en&project=wikipedia&lang=fr&recurse=9&articles=NS0
[02:02:31] harej: that's because apparently not all lang links from frwiki are in wikidata
[02:02:48] That's... possible?
[02:03:30] Yes
[02:03:50] that's why you're getting different results, my tool looks at all lang links
[02:04:05] but your tool gives me *fewer* results
[02:04:33] ...because the language links already exist, just not in wikidata
[02:04:34] ah.
[02:05:04] Yep
[02:05:06] I think
[02:06:40] harej, yeah, there are a few things that require lang links to remain in articles. E.g. when an article combines 2 things that are separate elsewhere. E.g. #11. at https://www.wikidata.org/wiki/Help:FAQ#Editing
[02:06:58] harej: the other limiting factor is that a result must have at least one lang link to be reported
[02:19:35] Labs, Wikimedia-Labs-Infrastructure, Patch-For-Review, ToolLabs-Q4-Sprint-1: Alert when conntrack table is full on labnet1001 - https://phabricator.wikimedia.org/T90437#1169124 (coren) Not quite, I can't seem to find the putatively collected metrics in graphite.
[02:20:12] Tool-Labs: separate /tmp and /var/tmp volumes - https://phabricator.wikimedia.org/T66697#1169126 (yuvipanda) Open>Resolved We have a bigger / now, and lvm can be used as necessary.
[03:27:12] What is the best place to test puppet changes?
[03:28:12] Coren, why is the cyberbot queue stuck?
[03:47:51] wikibugs: y u keep leaving other channels?
[03:49:22] greg-g: ask legoktm :P
[03:50:07] I want it to speak for itself.
[03:52:16] Just like icinga responded last friday :)
[04:08:06] Tool-Labs, ToolLabs-Goals-Q4, ToolLabs-Q4-Sprint-1: Make webservice2 default webservice implementation - https://phabricator.wikimedia.org/T90855#1169308 (yuvipanda) So it should default to precise (still) if called as 'webservice' but trusty if called as webservice2. Also it should be ok with being c...
[04:30:34] Tool-Labs, ToolLabs-Goals-Q4, ToolLabs-Q4-Sprint-1: Make webservice2 default webservice implementation - https://phabricator.wikimedia.org/T90855#1169324 (yuvipanda) *also* should check with bigbrother to make sure that *that* is doing things ok
[05:52:52] PROBLEM - Puppet failure on tools-exec-gift is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [0.0]
[05:56:14] Hi! I've noticed my PHP web tools often load forever these last few weeks, even when they're just displaying an input form with no significant data queries. The pages have some simple profiling which shows load times under one second (even when it took >30 seconds to load), so it seems to be hanging before the page script is called.
[05:56:40] This seems to happen periodically; uptime monitoring looks like this: http://i.imgur.com/308HlQU.png
[05:57:27] Could this be due to a connection/other limit? Is there any documentation on how to troubleshoot this sort of issue?
[05:58:41] Uptime (within reasonable load times) for the last 24 hours is 49%. :|
[06:01:34] For reference, a sample tool is http://tools.wmflabs.org/meta/stalktoy/ with source code at https://github.com/Pathoschild/Wikimedia-contrib
[06:07:59] hi Pathoschild
[06:08:21] Hi YuviPanda.
[06:08:51] Pathoschild: I guess the only way is for you to add more logging to your tool and see how that goes? Depending on what you’re doing it might be slow queries / too much CPU / whatever….
[06:09:13] I was going to suggest moving to trusty and using bigbrother but I see you already do
[06:19:42] YuviPanda: I just wrapped the script with a timer (with no DB queries / data processing outside the timer) and refreshed until the issue happened again. The request took over two minutes (time to first byte 2.2 minutes), but the script timer shows <1 second load time so it's not blocked by a DB query or heavy CPU processing.
[06:20:41] Pathoschild: hmm, interesting. is this in PHP? or python?
[06:21:05] It's PHP using webservice2.
[06:22:24] Could it be a connection limit, with the request queued for 2 minutes waiting on other connections?
[06:22:49] RECOVERY - Puppet failure on tools-exec-gift is OK: OK: Less than 1.00% above the threshold [0.0]
[06:23:22] Pathoschild: shouldn’t be, esp if you think your script is executing in less than 1s
[06:24:14] Pathoschild: hmm, when I try to use the stalktoy it does seem to work for me in under 1-2s?
[06:25:00] I’m getting PHP warnings, though
[06:25:01] ( ! ) Notice: Undefined variable: hasGlobalGroups in /data/project/meta/git/wikimedia-contrib/tool-labs/stalktoy/index.php on line 762
[06:25:06] but outside of that things seem fast?
[06:25:07] Pathoschild: ^
[06:25:23] Yep, its normal load time with no query is <1 second. If you keep refreshing occasionally it will suddenly load forever for a while.
[06:26:10] Sometimes it becomes slow for long periods; I pointed an uptime monitor at it, and the longest downtime is over two hours unresponsive (with checks every 5 minutes).
[06:29:08] Pathoschild: hmm, I’ve been unable to make it slow :(
[06:29:20] I did enable more detailed logging in lighttpd, but I don’t think that’s going to be of much use...
[06:29:32] Pathoschild: can you file a bug and / or email labs-l?
It’s close to midnight and I think I’ve to go to bed
[06:31:26] Pathoschild: sorry :(
[06:35:52] Tool-Labs-tools-Other: Fix tool kmlexport - https://phabricator.wikimedia.org/T92963#1169390 (Thgoiter) 6 h down right now.
[06:42:13] Tool-Labs-tools-Other: Fix tool kmlexport - https://phabricator.wikimedia.org/T92963#1169391 (yuvipanda) Have started it back again. Not sure why bigbrother didn't do that. Hopefully we'll have a longer term solution in the next few weeks.
[06:46:59] Is it possible for any of you guys to restart this webservice (https://tools.wmflabs.org/citations/doibot.php) remotely? Ir has been down since the recent outage.
[06:47:06] It*
[06:47:19] Josve05a: sure
[06:47:45] :) Thanks
[06:48:02] Josve05a: can you try now?
[06:50:20] Yay :D
[06:50:54] Seems to work now https://en.wikipedia.org/w/index.php?title=Bob_Fothergill&diff=prev&oldid=654455963
[06:52:32] PROBLEM - Puppet failure on tools-exec-06 is CRITICAL: CRITICAL: 75.00% of data above the critical threshold [0.0]
[07:13:15] Labs, Continuous-Integration, Puppet: Fix Could not find data item ganglia_class in any Hiera data file and no default supplied at /manifests/ganglia.pp:22" - https://phabricator.wikimedia.org/T94669#1169440 (Krinkle) NEW
[07:16:13] Labs, Continuous-Integration, Puppet: Fix "Could not find data item ganglia_class in any Hiera data file and no default supplied at /manifests/ganglia.pp:22" - https://phabricator.wikimedia.org/T94669#1169449 (Krinkle)
[07:17:32] RECOVERY - Puppet failure on tools-exec-06 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:52:44] Wikimedia-Labs-wikistats: MediaWiki - wiki registry - https://phabricator.wikimedia.org/T39062#1169650 (Nemo_bis) RobiH's subscription was lost in the phabricator migration... [[https://meta.wikimedia.org/w/index.php?title=User_talk%3ARobiH&action=historysubmit&diff=11725084&oldid=5699398|notified him]] of t...
[09:08:58] Labs, operations: Make morebots run on a production host - https://phabricator.wikimedia.org/T94638#1169711 (fgiunchedi) p:Triage>Normal
[09:55:07] Labs, Continuous-Integration, Patch-For-Review, Puppet: Fix "Could not find data item ganglia_class in any Hiera data file and no default supplied at /manifests/ganglia.pp:22" - https://phabricator.wikimedia.org/T94669#1169838 (hashar) Open>Resolved a:hashar I have confirmed on integration-...
[10:33:03] Important: Consider your security scheme before you create an instance, you can not remove or add security groups to an instance once it has been created.
[10:33:07] hrrrr seriously?
[10:36:13] Labs, REFLEX: Public IP and Wildcard DNS for REFLEX project - https://phabricator.wikimedia.org/T92273#1169889 (werdna) Yes, the wildcard domain would be really helpful!
[10:48:30] Wikimedia-Labs-wikitech-interface, operations: distribution upgrade for wikitech-static instance - https://phabricator.wikimedia.org/T94585#1169907 (fgiunchedi) p:Triage>Normal
[10:54:29] Wikimedia-Labs-wikitech-interface, operations, Regression: Some wikitech.wikimedia.org thumbnails broken (404) - https://phabricator.wikimedia.org/T93041#1169928 (Aklapper)
[10:54:38] Wikimedia-Labs-wikitech-interface, operations, Regression: Some wikitech.wikimedia.org thumbnails broken (404) - https://phabricator.wikimedia.org/T93041#1127368 (Aklapper) First example seems to work again now
[13:12:51] Coren, can you have a look at qstat for cyberbot and tell me what is happening? The bot isn't doing a thing on wiki.
[13:14:17] CP678: It's got some 12 jobs running, but some 16 errored out.
[13:14:37] Do you have a script I can use to kick them?
[13:15:20] Which is likely a result of labs doing the NFS switchover. The question is, why am I restricted to 16?
[13:15:37] Isn't that the entire point of the cyberbot node, to remove that limit?
[13:15:44] With 4GB and 2 CPUs?
[13:15:50] You aren't - they aren't queued, they are errored out.
[13:16:02] Do you want me to reschedule the lot?
[13:16:41] Coren, Eqw means they errored when queued, correct?
[13:16:51] What does Rr mean?
[13:17:05] And yes please. Can you restart all of them?
[13:17:28] CP678: No, in general it means it errored out when starting so they never once ran. Rr only means "it was restarted at some point in the past"
[13:17:58] If they errored out, why are they still in the job list?
[13:18:15] CP678: Because you might - as I'm about to do - reschedule them. :-)
[13:18:20] In the past, they typically disappeared and allowed the crontab to restart an instance of the job.
[13:18:38] How do you do that?
[13:19:09] That depends on how they errored out. You can clear the error state with qmod -cj
[13:19:49] But the scripts with Rr have stopped operating since the switch. What happened there?
[13:20:09] CP678: That's a bug.
[13:20:16] In?
[13:20:48] Infrastructure. That one's entirely on me. Precise instances (which your node is) did not handle the switch well.
[13:20:59] Oh.
[13:21:12] * CP678 doesn't trust trusty.
[13:21:41] But... it's trusty! :-)
[13:22:04] Every time we tried, it blew up.
[13:22:40] There /are/ a lot of new versions of things. Depending on what languages you use, that does mean a disruptive upgrade.
[13:23:03] .
[13:23:04] did usa covertly supply isis with weapons like they did with al-qaeda to justify creating wars?
[13:23:04] did usa execute the creative mess in the middle east like they said they will, does the creative mess include explosions with uncertain responsibles to create wars?
[13:23:04] plz, send my qs to help limiting usa&israel aggression against others.
[13:23:29] !kick kyugyi
[13:24:29] Well that was disruptive.
[13:25:24] CP678: Everything is restarting. Not all at once because load, but about 2/3 done atm
[13:26:28] Coren, I know trusty isn't supposed to be a disruptive upgrade, but since I'm dependent on PHP 5.3, upgrading to 5.5 seems to destroy many things. :/
[13:26:38] Thank you. :-)
[13:26:46] I note you have a bot named 'obama'. :-)
[13:27:11] CP678: afaict, everything is running now.
[13:28:18] LoL
[13:36:46] RECOVERY - Puppet staleness on tools-exec-cyberbot is OK: OK: Less than 1.00% above the threshold [3600.0]
[13:50:48] Coren, cyberbot has kicked into overdrive.
[13:50:54] :-)
[15:42:01] Hi
[15:42:23] I wish to inform that my server i-00000336.eqiad.wmflabs can't boot up
[15:42:51] I tried to reboot but it remains in stale state
[15:43:27] please can you fix the issue?
[15:43:59] Sbiribizio: what project is that from?
[15:44:16] also, keep in mind that ‘stale’ refers to the puppet status, not the run state
[15:44:37] osmit project
[15:46:50] Sbiribizio: try now?
[15:46:51] I also tried to access by ssh first logging to bastion host and then making ssh to server
[15:47:12] it was ‘SHUTOFF’ so clearly not accessible via ssh :)
[15:47:34] It works now.
[15:47:41] great!
[15:47:52] I tried to restart it, but I can't
[15:48:06] Yeah, I started it from the commandline. I’m not sure why reboot didn’t work.
[15:48:48] Power of the sysadmin!
[15:48:49] thanks
[16:25:40] <^d> Something up with ldap?
[16:25:52] <^d> I just tried creating a new instance, console is spamming things like ldap_start_tls_s() failed: Connect error: No such file or directory (uri="ldap://ldap-eqiad.wikimedia.org:389")
[16:25:55] <^d> And can't login
[16:26:24] ^d: I have a meeting shortly, but I’ll look
[16:26:49] ^d: the console of the new instance?
[16:26:53] <^d> Yep
[16:26:55] <^d> i-00000a27.eqiad.wmflabs / staging-test-tin
[16:31:19] ^d: I just created a new instance and it worked just fine. So, given that I’m distracted, maybe just delete and try again :)
[16:31:29] * ^d nukes
[16:43:45] <^d> andrewbogott: Same result
[16:43:54] <^d> (also, there's entries for codfw, not just eqiad)
[16:44:10] ^d: ok, I’ll have to look after my call
[16:44:21] If you have a moment, see if you can reproduce the issue in a different project
[16:44:44] <^d> Actually, nvm. I think it just had to run puppet a second time. Race condition?
[16:47:33] I suspect hiera settings for your project, if there are any
[16:47:59] <^d> There's lots :)
[16:48:07] <^d> It's all good. Sorry to waste your time
[16:48:50] no worries, I definitely want to know if ldap breaks :)
[17:07:12] Coren:
[17:07:22] network saturation checks for both labstores are in an UNKNOWN state
[17:07:34] Hm.
[17:07:40] * Coren looks.
[17:08:16] I see both at ok...
[17:08:25] OH! 1002. Doesn't have a bond0!
[17:08:54] Bleh. Means the test has to change the check depending on server. Uglies.
[17:08:59] * Coren fixes that.
[17:29:33] Wikimedia-Labs-wikistats: MediaWiki - wiki registry - https://phabricator.wikimedia.org/T39062#1170907 (RobiH) Not at this time.
[17:33:01] https://tools.wmflabs.org/magnustools/multistatus.html is down
[17:45:15] GerardM-: Restarted it.
[17:46:00] paravoid: The test should get okay as puppet propagates.
[17:56:50] Labs, OpenStreetMap: Please create an "osm" labs project - https://phabricator.wikimedia.org/T94718#1170964 (MaxSem) NEW
[18:10:48] Labs, OpenStreetMap: Please create an "osm" labs project - https://phabricator.wikimedia.org/T94718#1171039 (MaxSem)
[18:14:14] Labs, OpenStreetMap: Please create an "osm" labs project - https://phabricator.wikimedia.org/T94718#1171052 (Andrew) there's already 'maps,' 'osmit,' and 'osm4wiki.'
I'm happy to create a new project, but a more distinctive name might be nice :)
[18:23:53] Labs, Patch-For-Review, ToolLabs-Goals-Q4: dhclient overwrites /etc/resolv.conf - https://phabricator.wikimedia.org/T93691#1171101 (scfc) Open>Resolved a:scfc>Andrew
[18:23:54] Labs, Wikimedia-Labs-Infrastructure, ToolLabs-Goals-Q4: Move LabsDB aliases to DNS - https://phabricator.wikimedia.org/T63897#1171104 (scfc)
[18:24:52] Labs, Patch-For-Review, ToolLabs-Goals-Q4: dhclient overwrites /etc/resolv.conf - https://phabricator.wikimedia.org/T93691#1143549 (scfc) Verified on `tools-exec-13` (Precise), `tools-exec-21` (Trusty) and `toolsbeta-jessie-test2` (Jessie) that `ifdown eth0; ifup eth0` does not overwrite `/etc/resolv.co...
[18:38:33] Labs, Phabricator, Patch-For-Review, Puppet: Disable by default Phabricator alternate file domain on Labs - https://phabricator.wikimedia.org/T93837#1171179 (Negative24) Open>Resolved
[19:09:03] Labs, OpenStreetMap: Please create an "osm" labs project - https://phabricator.wikimedia.org/T94718#1171395 (MaxSem) @Andrew, let's make it 'maps-team'.
[19:45:16] andrewbogott: Quick sanity check for https://gerrit.wikimedia.org/r/#/c/201203/2 ?
[19:50:58] Coren: andrewbogott I’m back!
[19:51:01] yes, SSN morning
[19:51:30] I was about to ask how your battle with bureaucracy went. I take it from the spoils that you emerged victorious? :-)
[19:52:38] Coren: well, I do have a receipt saying I’ll get my SSN in 2 weeks...
[19:52:42] so I guess that’s some form of victory?
[19:52:56] delayed-action victory!
[19:53:11] :)
[19:53:26] mutante: Okay to push that contactgroup change?
[19:53:36] Merge "add jmm to icinga SMS contact group" into production (b494869)
[19:53:44] Labs, OpenStreetMap: Please create an "osm" labs project - https://phabricator.wikimedia.org/T94718#1171685 (Andrew) OK, project created with MaxSem as founding and sole projectadmin. Max, you can add other members and admins as you see fit.
[19:53:52] Labs, OpenStreetMap: Please create an "osm" labs project - https://phabricator.wikimedia.org/T94718#1171686 (Andrew) Open>Resolved a:Andrew
[19:53:53] Labs, Tracking: New Labs project requests (Tracking) - https://phabricator.wikimedia.org/T76375#1171689 (Andrew)
[19:54:03] Coren: yes please, sorry for leaving it there
[19:54:12] 'shappens. :-)
[20:00:56] Coren: OGE question - what’s Rr vs r in status?
[20:01:13] The uppercase R only means the job was restarted some time in the past.
[20:01:37] I see
[20:02:20] Coren: webservice2 checks for state = r, I guess it should instead check if state *contains* r
[20:02:41] It should indeed.
[20:10:49] Labs, Wikimedia-Labs-Infrastructure, Patch-For-Review: Internal DNS look-ups fail every once in a while - https://phabricator.wikimedia.org/T72076#1171756 (coren)
[20:10:50] Labs, Wikimedia-Labs-Infrastructure, Patch-For-Review, ToolLabs-Q4-Sprint-1: Alert when conntrack table is full on labnet1001 - https://phabricator.wikimedia.org/T90437#1171755 (coren) Open>Resolved
[20:11:32] Labs, Wikimedia-Labs-Infrastructure, ToolLabs-Q4-Sprint-1: Alert when conntrack table is full on labnet1001 - https://phabricator.wikimedia.org/T90437#1058808 (coren)
[20:13:46] Wikimedia-Labs-wikistats: MediaWiki - wiki registry - https://phabricator.wikimedia.org/T39062#1171775 (Dzahn) oh, thanks @Nemo_bis for this, and welcome to Phabricator @RobiH
[20:14:52] mutante: https://phabricator.wikimedia.org/tag/wikimedia-labs-wikistats/ should probably be named ‘Wikistats'
[20:15:00] Wikimedia-Labs-wikistats: MediaWiki - wiki registry - https://phabricator.wikimedia.org/T39062#1171777 (Dzahn) @RobiH So the idea was if you (or others) have wikis to add you can just paste them here and that would be simplest form of "registry". Separately you should still be able to login to the labs insta...
[20:15:10] mutante: mind if I rename?
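A minimal sketch of the state check discussed at 20:02, not the actual webservice2 code. Going by the explanation in the log, a state such as `Rr` is a running job that was restarted at some point, so comparing the state to `r` exactly would wrongly treat it as not running. The function name `is_running` and the sample states in the comments are assumptions for illustration only.

```python
def is_running(state: str) -> bool:
    """Return True if a grid engine job state string includes the running flag.

    Hypothetical helper: the exact-match check (state == "r") misses "Rr",
    where the uppercase R only records an earlier restart.
    """
    # The comparison is case-sensitive on purpose: "R" (restarted) alone
    # does not count as running, while "r" and "Rr" do.
    return "r" in state


# States mentioned in the log: "r", "Rr" (restarted, running), "Eqw" (errored).
for sample in ("r", "Rr", "Eqw", "qw"):
    print(sample, is_running(sample))
```

A substring test is the smallest change that matches what was agreed in the channel; a stricter version might also reject states that carry an error flag.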
[20:17:54] YuviPanda: it used to make sense in Bugzilla, it was a component for tools within labs. i don't know about the naming conventions in phab
[20:18:28] mutante: there’s no hierarchy in phab, so just use whatever makes sense, mostly. wikibugs used to be under labs, but is its own thing now, for example
[20:18:57] i think we are supposed to create tickets for naming projects though
[20:19:01] i don't mind personally
[20:19:16] it just got imported that way
[20:19:40] mutante: https://www.mediawiki.org/wiki/Phabricator/Creating_and_renaming_projects#Renaming_projects
[20:19:44] andre_ doesn't like the "whatever makes sense" part
[20:19:53] mutante: so if you’re ok, I’ll rename.
[20:20:32] well, it also breaks my notifications, but go ahead
[20:20:40] thanks :)
[20:21:37] am i supposed to request a separate channel for the tool?
[20:21:48] is it about the notifications?
[20:21:52] if you want to?
[20:22:08] not really, mostly about the fact that wikistats isn’t really related to labs except for where it’s hosted in?
[20:22:20] I don’t mind the notifications coming here.
[20:22:44] I think partly it’s that my eye looks at ‘Wikimedia-Labs-‘ and I imagine something’s up and then ‘bam’ it is just wikistats :D
[20:23:07] yea, that will apply to anything you are not personally working on
[20:23:12] YuviPanda: How is SF?
[20:23:16] but i get it, it's ok
[20:23:21] Seen any of it or just fixing stuff? :P
[20:23:32] multichill: paperwork, some firefighting :)
[20:26:47] so what are other tools doing about this? nobody notifies this channel?
[20:28:29] mutante: notifying this channel is fine :) I just wanted a different name :)
[20:28:49] ok
[20:38:39] Labs, Tool-Labs, ToolLabs-Goals-Q4, ToolLabs-Q4-Sprint-1: Make sure tools-db is replicated somewhere - https://phabricator.wikimedia.org/T88718#1171866 (coren)
[20:38:40] Labs, Tool-Labs, Patch-For-Review, ToolLabs-Goals-Q4, ToolLabs-Q4-Sprint-1: Puppetize & fix tools-db - https://phabricator.wikimedia.org/T88234#1171865 (coren)
[20:38:50] Labs, Tool-Labs: Planned labs maintenance on tools-db: Puppetization + log file change - https://phabricator.wikimedia.org/T94643#1171867 (coren)
[20:38:51] Labs, Tool-Labs, ToolLabs-Goals-Q4, ToolLabs-Q4-Sprint-1: Make sure tools-db is replicated somewhere - https://phabricator.wikimedia.org/T88718#1018775 (coren)
[20:54:18] (PS1) Legoktm: HACK: Always join channels on privmsg [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/201347
[20:54:33] valhallasw`cloud: ^
[20:54:37] (CR) Legoktm: [C: 2] HACK: Always join channels on privmsg [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/201347 (owner: Legoktm)
[20:54:45] Tool-Labs, Patch-For-Review, ToolLabs-Goals-Q4, ToolLabs-Q4-Sprint-1: Make webservice2 activities blocking - https://phabricator.wikimedia.org/T93334#1171969 (yuvipanda) Open>Resolved Done!
[20:54:46] Tool-Labs, Patch-For-Review, ToolLabs-Goals-Q4, ToolLabs-Q4-Sprint-1: Make webservice2 default webservice implementation - https://phabricator.wikimedia.org/T90855#1171971 (yuvipanda)
[20:55:02] (Merged) jenkins-bot: HACK: Always join channels on privmsg [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/201347 (owner: Legoktm)
[20:56:14] legoktm: yeah, might work
[20:56:42] !log tools.wikibugs legoktm: Deployed f09815aee08458b7fb283db7c7e0aed49e3b149d HACK: Always join channels on privmsg wb2-irc
[20:56:45] !log tools.wikibugs Updated channels.yaml to: f09815aee08458b7fb283db7c7e0aed49e3b149d HACK: Always join channels on privmsg
[20:56:47] Logged the message, Master
[20:56:51] Logged the message, Master
[20:56:59] * legoktm pats wikibugs
[20:56:59] Coren: how do I test bigbrother?
[20:57:05] wat
[20:57:09] 2?
[20:57:18] okay
[20:58:18] Wikibugs: wikibugs test bug - https://phabricator.wikimedia.org/T1152#1172008 (Legoktm) !
[20:59:15] valhallasw`cloud: looks to be working
[20:59:44] legoktm: I guess so
[20:59:58] maybe we'll be killed for sending too much data to the server at some point
[21:00:12] * YuviPanda kills
[21:00:21] but we'll see ;-)
[21:11:08] RECOVERY - Puppet failure on tools-redis is OK: OK: Less than 1.00% above the threshold [0.0]
[21:25:16] Wikibugs: Ignore sprints for default channel - https://phabricator.wikimedia.org/T94761#1172089 (yuvipanda) NEW
[21:33:59] !log tools.xmlfeed killed 8779979 on request from Mjbmr
[21:34:03] Logged the message, Master
[21:36:36] Tool-Labs: Register labs-announce at Gmane - https://phabricator.wikimedia.org/T94647#1172114 (scfc) I received the confirmation; now waiting for the next announcement to create the actual group, afterwards import the initial message to Gmane for completeness.
[21:59:35] YuviPanda: Easiest way is with a webservice - just start the default lighttpd and kill it
[22:00:29] ah, fair enough
[22:00:44] I’m very close to killing the old ‘webservice'
[22:01:19] wfm. Plz to symlink to webservice2. :-)
[22:02:07] Coren: yeah :) Am wondering if we should keep a symlink at the *old* /usr/bin spot too, since the new one is from puppet and in /usr/local/bin
[22:02:49] Hm. I don't like the idea of puppet putting stuff outside local, but that would be the path of least breakage.
[22:03:15] Coren: well, we could put the symlink in the toollabs package
[22:04:32] That too.
[22:07:03] Coren: we should also rename the ‘misctools’ package to something else :)
[22:07:06] but one step at a time, I guess
[22:07:19] toolstools
[22:07:36] Coren: Can you co-ordinate with Sean for the tools-db downtime? Next tuesday, maybe?
[22:17:00] YuviPanda: Sure thing.
[22:17:07] Coren: \o/ sweet
[22:19:46] so... API / user_id / OAuth question... is there a way to find a user's global id from their local id?
[22:19:59] or vice versa?
[22:21:14] ragesoss: wooo, I know this. moment
[22:21:19] quarry does this
[22:21:46] my app currently stores users by their en.wiki user_id, but with OAuth login, we'll immediately know only the global id. And we want a reliable way to know whether we already have a user object for them, without going through username (since we can't guarantee that hasn't changed onwiki)
[22:22:39] ragesoss: https://github.com/wikimedia/analytics-quarry-web/blob/master/quarry/web/app.py#L141
[22:23:55] YuviPanda: Is there a magical place where I can test puppet?
[22:24:28] Negative24: https://wikitech.wikimedia.org/wiki/Help:Self-hosted_puppetmaster basically
[22:25:20] Should I do it in a specific lab project or just my own project?
[22:25:38] Negative24: this is for phabricator, no? Just use the phabricator project, spin up an instance and use that I guess
[22:26:05] Yes it is. How'd you guess?
;)
[22:27:22] :P
[22:27:29] ragesoss: so that should help.
[22:27:41] ragesoss: oh, wait. That didn’t actually answer your question :|
[22:27:52] thanks YuviPanda. checking it out now, and still confused.
[22:27:54] :)
[22:27:55] ragesoss: and I don’t actually know the answer to your question, I think. ask legoktm or csteipp?
[22:28:16] will do, YuviPanda
[22:40:30] I got an answer from csteipp, YuviPanda.
[22:41:09] (for the record, there's no way to directly get local_id from global_id, without going through username)
[22:42:08] !log phabricator created phab-pup for temporary puppet testing (only about a week)
[22:42:12] Logged the message, Master
[22:46:09] Tool-Labs-tools-Other: Map Warper - No space left on device - https://phabricator.wikimedia.org/T73604#1172418 (jeblad) I believe this is fixed?
[23:01:52] ragesoss: ah, cool
[23:23:22] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:26:10] baah
[23:26:12] what now
[23:26:18] Coren: andrewbogott ^
[23:26:20] oh
[23:26:21] hmm
[23:26:28] nevermind, it seems up...
[23:26:44] not sure why shinken thinks otherwise
[23:27:03] Networking burp?
[23:27:09] yeah
[23:27:13] my ssh is also very slow
[23:27:15] well
[23:27:16] was
[23:27:17] is fine now
[23:27:21] I guess next check will find it up?
[23:27:34] Yeah, I see a burst of I/O on labstore1001.
[23:27:44] Looks like it's going away.
[23:27:53] RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 757806 bytes in 3.016 second response time
[23:28:08] There was a large, unusual spike in writes.
[23:28:41] So I guess everything slowed down for a minute or two. What's the timeout on that check?
[23:28:45] 10s
[23:28:49] Oh, nvm, 10 secs. Says so right there.
[23:29:10] That might be a mite too paranoid for a critical.
[23:29:10] Coren: there’s an NFS diamond collector, I wonder if that will be of any use at all
[23:29:26] idk, I think 10s is definitely a long time for a web page response :D
[23:29:32] You mean, for things like that?
[23:29:59] the collector? It collected a bunch of stats including use on a per-instance basis, IIRC
[23:30:02] It's a big page, and is dependent on DB and IO a lot.
[23:30:14] yeah, that’s what makes it a good test, I think
[23:30:26] because it goes down whenever one of a lot of things goes down
[23:30:45] I suppose, but a single check at 10s is a hair trigger.
[23:31:25] Coren: 10s is just the default timeout.
[23:31:39] Coren: and it triggers only after 2 failures, IIRC
[23:43:56] YuviPanda: So to get my self puppet master instance to use the phabricator role I just configure it to use it as normal and it'll use itself as the puppetmaster?
[23:44:21] Negative24: following those self-puppetmaster instructions and then configuring it to use phab, yeah
[23:44:34] Good
[23:46:35] Hey YuviPanda: I just tried to ssh into tools-login.wmflabs.org and it alerted me that the remote host identification has changed. Is that normal/can I ignore that?
[23:49:07] * phe summon Coren about /data/project/phetools/cache recovery
[23:53:29] bearND: yup!
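A rough sketch of the alerting behaviour described at 23:31, where a check is said to go critical only after two failures rather than on a single 10-second timeout. This is not Shinken's implementation; the function name `alert_states`, the `max_attempts` parameter, and the simplification that a not-yet-confirmed failure is still reported as OK are all assumptions for illustration.

```python
from typing import Iterable, List


def alert_states(results: Iterable[bool], max_attempts: int = 2) -> List[str]:
    """Map a sequence of check results (True = check passed) to reported states.

    Hypothetical model: a failure is only reported as CRITICAL once it has
    been seen max_attempts times in a row; any success resets the count.
    """
    states = []
    failures = 0
    for ok in results:
        failures = 0 if ok else failures + 1
        states.append("CRITICAL" if failures >= max_attempts else "OK")
    return states


# One isolated timeout stays quiet; two in a row raise the alert.
print(alert_states([True, False, True, False, False, True]))
```

Under this model a short I/O spike that causes one slow response would not page anyone, which is the trade-off being debated in the channel.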