[00:22:02] oh weird. I can't connect to any new instances now O_O. just created an instance and seeing this in the log:
[00:22:09] https://www.irccloud.com/pastebin/s9CU0h2y/
[00:23:13] bd808: that...seems not good?
[00:23:48] bearloga: yuck
[00:24:09] does your project have a local puppetmaster?
[00:26:48] bd808: I'm not sure :\ ebernhardson?
[00:27:00] bd808: this is the project: https://wikitech.wikimedia.org/wiki/Nova_Resource:Shiny-r
[00:27:04] I had that issue on a new instance a week ish ago.
[00:27:28] The fix is to delete and recreate unfortunately :/
[00:28:16] looks unlikely that there is a local puppetmaster.
[00:28:42] tom29739: oof. this may or may not be my 3rd delete-recreate attempt to fix that issue :\
[00:29:11] bearloga: are you using the same hostname each time? That can cause issues sometimes
[00:29:41] ^ just don't do it
[00:29:53] I've found it best to go with the 01, 02, 03, ... naming thing
[00:31:01] hmm, i don't think shiny has a local puppetmaster
[00:31:16] bd808 tom29739: oh. will try that.
[00:32:29] bearloga: you can at least feel better that you aren't alone in these new instance bugs. They even happen to the roots from time to time. Not sure that is a big comfort though
[00:32:58] m.adhu had to make like 7 instances in a row to get one that worked a few weeks ago
[00:33:10] I seem to encounter new bugs most of the time when I set up new instances
[00:34:05] I go through loads of instances, but I guess that's a benefit of OpenStack - instances can be disposable
[00:42:49] bd808: oh wow. okay. new issue (sorry! it seems to be a rough day for me and labs) is that I can't connect to an existing instance that I was able to before a few hours ago. now I just get "channel 0: open failed: administratively prohibited: open failed" tried rebooting the instance
[00:46:51] whyyyyyyyy https://www.irccloud.com/pastebin/uvgMVQrn/
[00:56:48] bearloga: that's a problem that has been going around the Labs cluster like a damn virus.
:/
[00:57:05] the only known fix is to keep rebooting until it comes back up
[00:57:17] there's a tracking bug about it somewhere...
[00:58:47] T141673
[00:58:48] T141673: Track labs instances hanging - https://phabricator.wikimedia.org/T141673
[00:59:35] bd808: oh wow. That actually kinda does make me feel better :)
[00:59:51] bd808: thanks!
[01:00:34] Labs, Labs-Infrastructure, Patch-For-Review: Track labs instances hanging - https://phabricator.wikimedia.org/T141673#2679133 (bd808) I rebooted tools-elasticsearch-02 today for this. I was trying to get stashbot/sal back up and running so I didn't debug any farther than ssh was hung both as my user...
[01:10:42] (CR) BryanDavis: "Add note about emails actually being fully public (at least functionally)." (1 comment) [labs/striker] - https://gerrit.wikimedia.org/r/313139 (https://phabricator.wikimedia.org/T144710) (owner: BryanDavis)
[02:11:31] Labs, Tool-Labs, Collaboration-Team-Triage, Community-Tech-Tool-Labs, and 5 others: Enable Flow on wikitech (labswiki and labtestwiki), then turn on for Tool talk namespace - https://phabricator.wikimedia.org/T127792#2679212 (Mattflaschen-WMF)
[02:12:04] Labs, Tool-Labs, Collaboration-Team-Triage, Community-Tech-Tool-Labs, and 5 others: Enable Flow on wikitech (labswiki and labtestwiki), then turn on for Tool talk namespace - https://phabricator.wikimedia.org/T127792#2054159 (Mattflaschen-WMF)
[06:52:40] PROBLEM - Puppet run on tools-exec-cyberbot is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[07:21:02] PROBLEM - Puppet run on tools-worker-1011 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[07:32:39] RECOVERY - Puppet run on tools-exec-cyberbot is OK: OK: Less than 1.00% above the threshold [0.0]
[07:45:53] PROBLEM - Puppet staleness on tools-k8s-master-01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0]
[07:46:59] PROBLEM - Host secgroup-lag-102 is DOWN:
CRITICAL - Host Unreachable (10.68.17.218)
[09:30:59] RECOVERY - Puppet run on tools-worker-1011 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:35:50] Labs, Labs-Infrastructure, DBA: Implement a frontend failover solution for labsdb replicas - https://phabricator.wikimedia.org/T141097#2679433 (Marostegui)
[09:37:40] Labs, Operations: cronspam from labscontrol1001, labstore1001, labnet1002.eqiad.wmnet, labsdb1003.eqiad.wmnet - https://phabricator.wikimedia.org/T132422#2679434 (elukey) Summary after today's hacking with @Andrew: 1) logrotate errors while zipping should be resolved via https://gerrit.wikimedia.org/r/#...
[09:39:40] Labs, Labs-Infrastructure, DBA: Provide at least 2 separate service endpoints: one for slow, long running queries; and another for quick, web requests - https://phabricator.wikimedia.org/T147051#2679435 (Marostegui)
[09:40:40] Labs, Labs-Infrastructure, DBA: Provision with data the new labsdb servers and provide replica service with at least 1 shard from a sanitized copy from production - https://phabricator.wikimedia.org/T147052#2679448 (Marostegui)
[11:22:01] PROBLEM - Puppet run on tools-worker-1011 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[11:36:08] Labs, Labs-Infrastructure, DBA: Provision with data the new labsdb servers and provide replica service with at least 1 shard from a sanitized copy from production - https://phabricator.wikimedia.org/T147052#2679448 (Krenair) To actually have users connect to new labsdb servers we're going to need vie...
[12:40:41] Labs, Labs-Infrastructure, Puppet: Investigate usage of hiera_hash in our puppet repo - https://phabricator.wikimedia.org/T146621#2679673 (Andrew) Open>Resolved I just talked this over with Chase and I'm back to being convinced that this does just what we want. Specifically -- first match st...
[12:47:31] Labs, Labs-Infrastructure: Default source group (security group) allowances do not update properly - https://phabricator.wikimedia.org/T142165#2679696 (Andrew) p:High>Low This is sort of resolved by the timeout fix, but I'm still hoping that upstream will merge the proper fix into Liberty.
[13:32:02] RECOVERY - Puppet run on tools-worker-1011 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:23:00] PROBLEM - Puppet run on tools-worker-1011 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[14:26:13] yuvipanda: bah, how the f do lighttpd.conf files work...
[14:40:39] Labs, Labs-Infrastructure, DBA, Operations: Migrate labsdb1005/1006/1007 to jessie - https://phabricator.wikimedia.org/T123731#2680059 (fgiunchedi) Open>stalled Setting as stalled, though next steps look like this: [] Flip tools master from labsdb1005 to labsdb1004 [] Decommission labsdb...
[15:03:24] addshore: they are dark magic. What are you trying to get lighttpd to do for you?
[15:04:32] redirect requests to https://tools.wmflabs.org/grafana-json-datasource/annotations to call annotations.php!
[15:04:50] It should be super simple right? :D but my handful of attempts just failed outright..
[15:08:37] addshore: what about -- url.rewrite-if-not-file += ( "/grafana-json-datasource/annotations" => "/grafana-json-datasource/annotations.php" )
[15:10:32] bd808: no joy :/ https://tools.wmflabs.org/grafana-json-datasource/annotations
[15:10:55] hmm....
[15:12:43] that's what I thought!
[15:17:03] addshore: what about this magic? -- url.rewrite-if-not-file += ( "^(.*)$" => "$0.php" )
[15:17:15] That works for me at https://tools.wmflabs.org/bd808-test2/foo
[15:17:54] hmm, still 404..
[15:17:55] it's kind of a poor man's router
[15:18:06] the file should be called "lighttpd.conf" right?
[15:18:32] .lighttpd.conf
[15:18:41] Bah
[15:20:09] but, hmm, that still doesn't seem to work, what on earth am I doing wrong..
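For readers following the lighttpd discussion, here is a rough Python sketch of what the catch-all rule url.rewrite-if-not-file += ( "^(.*)$" => "$0.php" ) quoted above appears to do: requests for files that already exist are left alone, and everything else gets ".php" appended. The function name, the existing_files set, and the reading of $0 as the whole regex match are assumptions made for illustration. This is not lighttpd's implementation, and it ignores query strings.

```python
import re


def rewrite_if_not_file(path, existing_files, pattern=r"^(.*)$", suffix=".php"):
    """Hypothetical model of the catch-all rewrite rule from the chat.

    path           -- request path, e.g. "/bd808-test2/foo"
    existing_files -- set of paths that exist on disk (assumed stand-in
                      for the web server's own file check)
    """
    # "if-not-file": requests that map to a real file are not rewritten.
    if path in existing_files:
        return path

    match = re.match(pattern, path)
    if match is None:
        # Pattern did not match, so the request is passed through unchanged.
        return path

    # "$0" is modelled here as the whole match (an assumption).
    return match.group(0) + suffix


# A path with no matching file is rewritten to the .php script.
print(rewrite_if_not_file("/bd808-test2/foo", set()))

# A path that exists as a file is served as-is.
print(rewrite_if_not_file("/bd808-test2/style.css", {"/bd808-test2/style.css"}))
```

Under this model a request for /grafana-json-datasource/annotations would be rewritten to /grafana-json-datasource/annotations.php, which is the behaviour being asked for in the chat.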
[15:20:52] ls: cannot access /data/project/grafana-json-datasource/.lighttpd.conf: No such file or directory
[15:20:59] still not in the right place
[15:21:09] where are you putting the config?
[15:22:02] oh wait.... it also doesn't go in public_html... hah
[15:22:49] addshore: :) it's not an .htaccess file
[15:23:01] https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Web#Configuring_the_web_server
[15:23:29] hah, working now, apparently I missed some of the key points when I read those docs.... somehow.....
[15:23:57] edit welcome! maybe we need a "so if you are used to Apache..." section
[15:24:54] yeh :D thanks for the help!
[15:40:54] RECOVERY - Puppet staleness on tools-k8s-master-01 is OK: OK: Less than 1.00% above the threshold [3600.0]
[15:59:36] (CR) Lokal Profil: [C: 2] Update India base category [labs/tools/heritage] - https://gerrit.wikimedia.org/r/313388 (owner: Jean-Frédéric)
[16:02:11] (CR) Lokal Profil: "IS it possible to use the harvesting in the bot development to populate the database needed for API development (since the dumps are a pro" [labs/tools/heritage] - https://gerrit.wikimedia.org/r/313452 (owner: Jean-Frédéric)
[16:03:44] (Merged) jenkins-bot: Update India base category [labs/tools/heritage] - https://gerrit.wikimedia.org/r/313388 (owner: Jean-Frédéric)
[16:06:23] yuvipanda: andrewbogott: Is Horizon going to get fixed soon? It's been down
[16:07:02] works for me - what happens when you use it?
[16:07:24] Error: Unable to retrieve usage information.
[16:07:50] hm, maybe you have a project selected that you're not a member of?
[16:08:22] am I understanding correctly that you're logged in but that it doesn't display any content?
[16:08:47] (CR) Lokal Profil: "What was the reason for reverting the other patch?" (2 comments) [labs/tools/heritage] - https://gerrit.wikimedia.org/r/313451 (owner: Jean-Frédéric)
[16:08:56] andrewbogott: ah wait. Just my stupidity at work here.
Had the wrong project selected
[16:09:18] I make that same mistake, frequently :) It's not a great interface.
[16:09:24] glad you're unstuck!
[16:09:30] :-)
[16:09:59] andrewbogott: so maybe you can help me out.
[16:10:17] I've never created a webservice before. How would I go about that?
[16:11:55] I've never made one from scratch either…
[16:11:58] (and I have to go shortly)
[16:12:19] Oh. :p
[16:12:20] but I think it's pretty simple - I'd use python and flask, and use tools so that the network stuff is taken care of automatically
[16:12:29] That should get you to proof-of-concept at least
[16:12:48] It's simple if you understand it. I have yet to touch it. :p
[16:16:26] andrewbogott: wait what happened to my available quota?
[16:17:12] I had 20 CPUs and 50 GB of RAM at my disposal. The unallocated ones seem to have vanished.
[16:18:31] We lowered the default new project quota (due to running out of resources) - if you need an increase you can file a request here: https://phabricator.wikimedia.org/T140904
[16:19:38] New projects yes, but my project already had those available.
[16:19:47] Now they're gone
[16:19:59] all unused quota over the new defaults was taken back too
[16:20:08] I was going to create an instance for a webservice.
[16:20:12] we were way way over allocated
[16:20:49] we're also a lot more strict now about allocating quota for things that should be in tools now, I think.
[16:20:50] CP678|Laptop: what's the webservice going to do? Does it really need a whole instance to itself?
[16:21:13] tool labs is pretty nice for running most webservices
[16:22:54] bd808: I like keeping things together. Since I have a cyberbot project, I was going to create a small instance for the web service.
[16:23:59] a whole instance for a low request volume web interface is a waste of donor funds honestly
[16:24:18] bd808: low request volume
[16:24:19] ?
[16:24:48] It's supposed to give users some control over the IABot DB. I anticipate more than just low.
[16:25:15] really? like thousands of people per minute will want to tweak IABot data?
[16:26:06] bd808: It's quite possible. I can create a tool labs interface, but I would get annoyed if it ends up crashing and needs to be moved.
[16:26:15] geohack hits about 600 req/min on avg on tools and does perfectly fine.
[16:26:33] yuvipanda: really?
[16:26:37] yes
[16:26:38] really
[16:27:10] anyway, the bottom line is we do not have unlimited resources, and you need to provide a fairly strong rationale for increasing your quota. Please file a ticket and we'll discuss that in our team meeting on Monday.
[16:27:59] RECOVERY - Puppet run on tools-worker-1011 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:28:08] see https://phabricator.wikimedia.org/T144623 and https://phabricator.wikimedia.org/T143020 for examples of how this discussion has gone and what kind of resolutions we've had in the recent past
[16:39:02] (CR) Jean-Frédéric: "> What was the reason for reverting the other patch?" [labs/tools/heritage] - https://gerrit.wikimedia.org/r/313451 (owner: Jean-Frédéric)
[16:44:55] yuvipanda: so I guess the smart thing in this case is to create the interface on tool labs, while doing the heavy processing on my project, which at some point in the future will inevitably need to have more resources allocated.
[16:54:02] PROBLEM - Puppet run on tools-worker-1011 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[17:15:42] anomie: ping
[17:16:36] CP678|Laptop: ?
[17:16:46] anomie: can you approve IABotManagementConsole consumer?
[17:17:18] CP678|Laptop: Did you try asking at https://meta.wikimedia.org/wiki/Steward_requests/Miscellaneous ?
[17:17:46] Why would I ask there?
[17:18:15] Is this some new thing?
[17:22:44] Apparently it is. anomie: I used to just come to you. Didn't know there were procedures in development.
[17:23:30] CP678|Laptop: I don't know if it ever got made official, but a while back the stewards were given the ability to handle OAuth consumers and quietly started doing so.
[17:25:00] That page looks rather dead TBH
[17:29:02] RECOVERY - Puppet run on tools-worker-1011 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:35:02] Labs, Tool-Labs, Collaboration-Team-Triage, Community-Tech-Tool-Labs, and 5 others: Enable Flow on wikitech (labswiki and labtestwiki), then turn on for Tool talk namespace - https://phabricator.wikimedia.org/T127792#2680437 (Mattflaschen-WMF) >>! In T127792#2618122, @Catrope wrote: > For consist...
[18:49:41] !log phabricator adding EBernhardson to project to help me with elasticsearch testing
[18:49:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Phabricator/SAL, Master
[19:11:46] PAWS, Jupyter-Hub: I can't login my bot in JUPYTER - https://phabricator.wikimedia.org/T135306#2680642 (Aklapper) One month later: @Yuvipanda, @Maathavan, any news here? And who should this be assigned to?
[19:22:46] !log tools.heritage Deployed latest from Git: 08ea28d, fb856b4, 867a229, a454375
[19:22:49] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.heritage/SAL, Master
[20:11:29] yuvipanda: Anything particularly exciting in Kubernetes 1.4 for us?
[20:11:36] Was just reading http://blog.kubernetes.io/2016/09/kubernetes-1.4-making-it-easy-to-run-on-kuberentes-anywhere.html?utm_source=webopsweekly&utm_medium=email