[01:18:42] moin
[09:16:21] Hello, I'm having some serious issues on Tool Labz. My tool "fengtools" is being bombarded by requests from geohack.
[09:16:24] 10.68.16.4 tools.wmflabs.org - [03/Nov/2014:08:53:52 +0000] "GET /para/geo/worldadmin98?tsv&lat=53.465833&lon
[09:16:27] g=16.494167 HTTP/1.1" 404 345 "-" "Geohack (+http://tools.wmflabs.org/geohack)"
[09:17:16] There are so many requests like this in my access.log: /data/project/fengtools/access.log
[09:17:28] It's taking my tools down.
[09:17:52] *Labs
[09:18:17] Coren: ping?
[09:22:40] I've blocked Geohack in my .lighttpd.conf, as a note
[09:24:00] Please ping Zhaofeng_Li in your replies, thanks.
[10:09:41] Wikimedia Labs: /var/log/diamond is not logrotated - https://bugzilla.wikimedia.org/72859#c2 (Filippo Giunchedi) not rotating diamond log files used to be the case a while ago, however that should be fixed in production and labs now, is the instance running an updated puppet? how long has diamond been run...
[10:44:11] Wikimedia Labs: /var/log/diamond is not logrotated - https://bugzilla.wikimedia.org/72859#c3 (Yuvi Panda) They are rotated by diamond itself - it uses https://docs.python.org/2/library/logging.handlers.html#rotatingfilehandler to accomplish this. Is specified in diamond.conf.erb. The older files are just...
[14:18:56] Wikimedia Labs: /var/log/diamond is not logrotated - https://bugzilla.wikimedia.org/72859#c4 (Tim Landscheidt) NEW>RESO/WOR So this means that a daemon needs to run, or otherwise the log files are not rotated? *argl* (Not really WORKSFORME, but ...)
[14:43:40] Wikimedia Labs: /var/log/diamond is not logrotated - https://bugzilla.wikimedia.org/72859#c5 (Filippo Giunchedi) not sure what you mean, the daemon that does the writing is the same that does the rotation, if it isn't running then no log files are written to either
[14:53:29] Zhaofeng_Li: geohack has a lot of traffic. Perhaps you should talk with its maintainers?
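Yuvi's comment says diamond rotates its own files with Python's `logging.handlers.RotatingFileHandler` rather than relying on logrotate. A minimal standard-library sketch of that mechanism follows; the file name, 1 KiB size limit and backup count are illustrative assumptions, not the values in diamond.conf.erb.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler


def make_rotating_logger(path, max_bytes=1024, backup_count=3):
    """Return (logger, handler) writing to `path` with size-based rotation.

    Before each record is written, the handler checks whether it would push
    the file to `max_bytes` or beyond; if so it renames path -> path.1
    (shifting older copies to .2, .3, ...), drops anything past
    `backup_count`, and starts a fresh file.
    """
    # One logger per path so repeated calls don't share handlers.
    logger = logging.getLogger("rotation-demo." + path)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep demo output out of the root logger
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backup_count)
    handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
    logger.addHandler(handler)
    return logger, handler


if __name__ == "__main__":
    log_dir = tempfile.mkdtemp()
    log_path = os.path.join(log_dir, "diamond.log")  # illustrative name only
    logger, handler = make_rotating_logger(log_path)
    for i in range(200):
        logger.info("collected metric batch %d", i)
    handler.close()
    # Expect the live file plus at most backup_count rotated copies.
    print(sorted(os.listdir(log_dir)))
```

The size check runs inside the handler's `emit()`, just before each record is written, so rotation only ever happens when the process logs something. That is the point of Filippo's reply: if diamond is not running, nothing is written and there is nothing to rotate.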
[15:50:13] andrewbogott: Coren: Openstack has a project for puppetization https://wiki.openstack.org/wiki/Puppet-openstack :)
[15:50:31] apparently supposed to be reusable by third parties
[15:52:07] hashar: yeah, last time I looked those modules were terrifyingly complex
[15:52:15] heh
[15:52:17] OS is
[15:52:25] https://etherpad.openstack.org/p/puppet-openstack-paris-agenda
[15:52:49] andrewbogott: https://gerrit.wikimedia.org/r/#/c/170699/ (OSM patch)
[15:53:03] I guess they want to manage all possible use cases
[15:53:04] andrewbogott: also https://gerrit.wikimedia.org/r/#/c/168133/ and https://gerrit.wikimedia.org/r/#/c/168134/
[15:53:38] YuviPanda: does it work? Seems like we shouldn't publish the interface until it does something :)
[15:55:45] andrewbogott: hmm?
[15:56:14] andrewbogott: it does work, just isn't used by any puppet code
[15:56:39] (PS1) Alexandros Kosiaris: Add labsadmin postgresql password [labs/private] - https://gerrit.wikimedia.org/r/170715
[16:00:02] andrewbogott: I should also get +2...
[16:00:12] need to poke Mark
[16:00:34] YuviPanda: I'll sort out your permissions in a bit.
[16:01:14] andrewbogott: oh, ok! :)
[16:06:58] (CR) Alexandros Kosiaris: [C: 2] Add labsadmin postgresql password [labs/private] - https://gerrit.wikimedia.org/r/170715 (owner: Alexandros Kosiaris)
[16:07:04] (CR) Alexandros Kosiaris: [V: 2] Add labsadmin postgresql password [labs/private] - https://gerrit.wikimedia.org/r/170715 (owner: Alexandros Kosiaris)
[16:25:11] Wikimedia Labs: Replication behind or missing records - https://bugzilla.wikimedia.org/72908 (jeremyb)
[16:40:31] YuviPanda: why would a Quarry query be "queued" for days without running?
[16:40:41] Wikimedia Labs: Project-wide Puppet classes and variables - https://bugzilla.wikimedia.org/64980#c2 (Tim Landscheidt) Giuseppe, with Gerrit change #168984 merged, I suppose this is now (almost?) possible?
Just need to add "hiera_include('classes')" to role::labs::instance or something like that?
[16:40:45] YuviPanda: http://quarry.wmflabs.org/query/852
[16:41:20] ragesoss: hit 'submit' again?
[16:41:40] YuviPanda: that worked.
[16:41:51] I'm confused about what happened in the first place.
[16:41:57] ragesoss: yeah, is a weird bug I haven't been able to trace a lot
[16:41:58] was that the intended behavior?
[16:42:06] ragesoss: trace fairly
[16:42:07] k
[17:42:07] Hi
[17:42:41] We have a problem with our ToolLabs project https://tools.wmflabs.org/wm-metrics/
[17:42:56] « appears to be non-functional at this time. »
[17:43:12] Everything seems right as far as I can see
[17:43:34] JeanFred: webservice start?
[17:43:45] Although the access.log is filled up with requests from Geohac o_ô
[17:43:48] *Geohack
[17:43:53] as in 10.68.16.4 tools.wmflabs.org - [03/Nov/2014:17:42:57 +0000] "GET /para/geo/worldadmin98?tsv&lat=39.978333&long=-75.208583 HTTP/1.1" 404 345 "-" "Geohack (+http://tools.wmflabs.org/geohack)"
[17:44:19] YuviPanda: a webservice restart was the first thing I tried :)
[17:44:26] ah, :)
[17:44:36] But done again
[17:45:03] (The error message is actually different when the webservice is not started [not serviced blah blah])
[17:45:07] It was kind of a predictable issue. You'd think with people operating a much larger site Labs would've turned out better.
[17:46:44] is my tool getting routed some unrelated stuff?
[17:57:40] JeanFred: "Appears to be non-functional" is a 500. Check your error.log
[18:00:54] 10.68.16.4 tools.wmflabs.org - [03/Nov/2014:17:58:07 +0000] "GET /wm-metrics/ HTTP/1.1" 500 291 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.94 Safari/537.36"
[18:00:58] Indeed, it 500s.
[18:04:03] https://tools.wmflabs.org/wm-metrics/static/toolinfo.json works, though, so your FCGI itself is running okay.
[18:04:17] JeanFred: ^^
[18:04:33] If your app makes logs, I'd check them.
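Both Zhaofeng_Li and JeanFred found their access.log flooded with Geohack lookups for /para/geo/worldadmin98, and the wm-metrics 500 only became obvious from the status field of the same log. A quick way to see what dominates a log is to tally lines by status code and user agent. A minimal sketch, assuming every line follows the layout of the lines pasted above (client, vhost, ident, timestamp, request, status, size, referer, user agent); the regex, the `tally` function and the script name are illustrative, not part of Tool Labs.

```python
import re
import sys
from collections import Counter

# Matches lines like the ones pasted in the channel, e.g.
# 10.68.16.4 tools.wmflabs.org - [03/Nov/2014:17:42:57 +0000] "GET /... HTTP/1.1" 404 345 "-" "Geohack (...)"
LINE_RE = re.compile(
    r'^(?P<client>\S+) (?P<host>\S+) (?P<ident>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def tally(lines):
    """Count requests per (status, user agent); unparsable lines are counted separately."""
    counts = Counter()
    unparsed = 0
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            unparsed += 1
            continue
        counts[(m.group("status"), m.group("agent"))] += 1
    return counts, unparsed


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python tally.py /data/project/fengtools/access.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        counts, unparsed = tally(f)
    for (status, agent), n in counts.most_common(10):
        print(f"{n:8d}  {status}  {agent}")
    print(f"{unparsed} unparsed lines")
```

A flood like the one above would show up as a single (404, Geohack) entry with a very large count at the top of the list, while a broken app shows up as 500s against ordinary browser user agents.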
[18:24:42] Wikimedia Labs: Project-wide Puppet classes and variables - https://bugzilla.wikimedia.org/64980#c3 (Giuseppe Lavagetto) NEW>ASSI p:Unprio>Normal a:Giuseppe Lavagetto Tim: what are you *really* trying to achieve? I can imagine 2 scenarios: 1) You want to set some project-wide configuratio...
[18:28:39] andrewbogott: shit, I just merged a gerrit patch, but didn't puppet merge...
[18:28:41] andrewbogott: can you do that?
[18:28:49] andrewbogott: https://gerrit.wikimedia.org/r/#/c/168133/1
[18:28:55] YuviPanda: ok!
[18:29:00] andrewbogott: oh, I didn't actually...
[18:29:02] andrewbogott: so nevermind.
[18:29:11] I'll just wait till I've access...
[18:29:14] ok :)
[18:50:21] Coren: Thanks for looking
[18:51:49] Coren: We do log: access.log & error.log (for all debug stuff in the Flask webserver)
[18:52:21] JeanFred: Well, it 500s for some URLs, but not all - but I didn't see anything in error.log
[18:54:12] I don't get it. Have not changed anything AFAIK
[18:57:55] Coren: I really don't see what the problem is
[18:58:34] I'm really surprised there isn't more debugging output from Flask itself, that's what would help.
[19:05:23] Coren: I can see how to get Flask to output more stuff
[19:06:20] Ah f**k
[19:06:32] The Flask error handling was rerouted to the usage.log file
[19:06:37] for some stupid reason
[19:06:59] Flask BuildError:
[19:07:10] * JeanFred blames Krinkle|detached
[19:13:45] Coren: got it
[19:13:57] Thanks for looking into this
[19:29:44] JeanFred: Did you find the root issue?
[19:58:56] debug output from flask --> logging.basicConfig(level=logging.DEBUG) might help
[20:00:32] Coren: I want to add some tool stats to the monthly report -- probably 'number of tools' and 'number of tool maintainers.' ('Number of /active/ tools' would be better but that seems like a much harder question)
[20:00:39] Do you already gather any stats like that?
[20:09:49] Wow, JeanFred spoke on IRC.
[20:09:51] I missed this rare event
[20:10:25] Waaat?
I don't even speak Python nor use Flask. :P
[20:30:51] andrewbogott: number of tools: 'getent group|fgrep tools.|wc -l' number of maintainers: 'getent group|cut -d : -f 4|sed -e s/,/\\n/|fgrep -v .|sort -u|wc -l'
[20:31:09] :-)
[20:31:20] Respectively 947 and 737 atm
[20:31:33] Coren: ok then :) Thanks!
[20:31:45] "Active" though, is more complicated. We'd need a definition to start with. :-)
[20:32:05] * Coren ponders.
[20:35:48] "Has a crontab or is running jobs" seems reasonable for "active tool"
[20:38:22] Yeah, although some single tools probably have multiple jobs
[20:38:52] Sure, but it's still an "active tool"
[20:39:19] yep
[20:39:36] BTW, maintainers 550 after all; I counted all projects not just tools.
[20:41:17] 751 users with access to tools; interesting - that means that 170 or so users are not listed as maintainer of any tool.
[20:41:32] 200 even
[20:45:01] Of those, 608 have logged in at least once, 423 in the past year.
[20:45:21] Oh, hm. "logged in at least once in eqiad"
[20:46:54] you could have a tool that isn't maintained anymore, no admin logs in, but still a lot of people use it and open URLs of that tool
[20:51:58] mutante: Sure, but then it'd have jobs running
[20:57:17] Coren: what was the query you ran to get the 550?
[20:58:11] 524 after refining: getent group|fgrep tools.|cut -d : -f 4|sed -e s/,/\\n/g|fgrep -v .|sort -u|wc -l
[20:58:37] I originally forgot the /g in the sed, which overcounted some maintainers as distinct.
[21:00:36] Thanks!
[21:01:09] While we're on the subject, any highlights you'd like me to add to the monthly report?
[21:01:39] (by default I'll just dig through the weekly meeting notes)
[21:02:05] I've been coasting on maintenance, user support and bug fixing - there's pithy little actual new things to report from my side. Perhaps the availability of trusty bastion and grid nodes if it hasn't been mentioned in the past?
[21:02:31] Now that I think of it, that's last month so it wouldn't have been.
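Coren's refined pipeline keeps `getent group` lines containing `tools.`, splits the member field (field 4) on every comma, drops names containing a dot, and counts the distinct remainder. The same count can be sketched with Python's standard `grp` module. This is a rough equivalent under stated assumptions, not what was actually run: it assumes tool groups are named `tools.<toolname>` and that any member name containing a dot is a tool account rather than a person (the `fgrep -v .` step). Matching on the group name is slightly stricter than grepping the whole line, so the result may differ a little from 947 and 524. The function name `tool_stats` is hypothetical.

```python
def tool_stats(groups):
    """Return (number of tool groups, number of distinct human maintainers).

    `groups` is an iterable of (group_name, member_names) pairs, e.g. built
    from grp.getgrall() or from parsed `getent group` output.
    """
    tool_count = 0
    maintainers = set()
    for name, members in groups:
        # Assumption: tool groups are named "tools.<toolname>".
        if not name.startswith("tools."):
            continue
        tool_count += 1
        # Like `fgrep -v .`: names containing a dot are treated as tool
        # accounts (e.g. a hypothetical "tools.sometool"), not people.
        maintainers.update(m for m in members if "." not in m)
    return tool_count, len(maintainers)


if __name__ == "__main__":
    import grp  # Unix-only; enumerates groups via the C library, like `getent group`

    entries = ((g.gr_name, g.gr_mem) for g in grp.getgrall())
    tools, people = tool_stats(entries)
    print(f"{tools} tools, {people} distinct maintainers")
```

Because `gr_mem` is already a list, every member is counted individually. The shell version needed the `/g` flag for the same effect: without it, sed split only at the first comma of each line, leaving joined names like `b,c` that were counted as extra distinct maintainers.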
[21:03:10] 'k
[21:05:00] YuviPanda: if you are up (or when you are) can you give me a one-sentence summary of your progress with labs monitoring?
[21:22:40] andrewbogott: hey!
[21:22:44] andrewbogott: yes, am writing email now.
[21:23:05] andrewbogott: but essentially, 1. shinken.wmflabs.org has lists of instances and up/down monitoring (based on pings), 2. Not fully sure how to tackle service configs
[21:24:04] What I wrote was "Yuvi has made a good start on setting up instance-specific performance monitoring"
[21:24:07] should I be more specific?
[21:24:19] I guess shinken isn't performance monitoring, for one thing :)
[21:25:04] andrewbogott: heh :)
[21:25:18] andrewbogott: instance state monitoring would be more accurate
[21:27:13] yep, that's better.
[21:28:05] thanks
[22:06:05] Hey guys, am I able to use trusty for fcgi now?
[22:07:03] !log deployment-prep updated OCG to version 5834af97ae80382f3368dc61b9d119cef0fe129b
[22:07:08] Logged the message, Master