[08:51:25] Analytics-Kanban, User-Elukey: Restart Analytics JVM daemons for open-jdk security updates - https://phabricator.wikimedia.org/T179943#3822225 (elukey)
[10:08:23] Analytics, Analytics-Cluster, Patch-For-Review, User-Elukey: Enable more accurate smaps based RSS tracking by yarn nodemanager - https://phabricator.wikimedia.org/T182276#3822333 (elukey)
[10:21:59] as promised: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Druid#Safe_restart_of_MiddleManagers_when_running_Real_time_Indexing_jobs
[10:22:14] nuria_: --^ :)
[10:23:34] Thanks elukey :)
[10:34:07] Analytics, Analytics-EventLogging, Icinga, Operations, Patch-For-Review: eventlog2001 - CRITICAL status of defined EventLogging jobs - https://phabricator.wikimedia.org/T119930#3822391 (elukey) Thanks a lot @Dzahn, opened https://phabricator.wikimedia.org/T182397
[10:34:31] Analytics, Operations, ops-eqiad: Decomission eventlog2001 - https://phabricator.wikimedia.org/T182397#3822392 (elukey)
[10:38:47] Analytics-Kanban, Patch-For-Review, Services (watching): Add action api counts to graphite-restbase job - https://phabricator.wikimedia.org/T176785#3822398 (JAllemandou) @Pchelolo: It has indeed happened. The task has been moved to done on our kanban, we'll resolve it after we finalize the discussion :)...
[10:41:20] joal: this is basically the change for the druid_exporter that I was talking about https://gerrit.wikimedia.org/r/#/c/396051
[10:42:43] the historical segment metrics have incorrect labels now uffff
[10:42:53] pfff
[10:43:38] elukey: Just added a nit-picky comment ...
[10:44:00] (PS2) Thiemo Mättig (WMDE): Record metrics for Wikidata task priorities (via color) [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/396065
[10:44:26] (CR) Thiemo Mättig (WMDE): Record metrics for Wikidata task priorities (via color) (1 comment) [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/396065 (owner: Thiemo Mättig (WMDE))
[10:44:34] ah thanks!
[10:47:45] (CR) Lucas Werkmeister (WMDE): [C: 1] Record metrics for Wikidata task priorities (via color) [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/396065 (owner: Thiemo Mättig (WMDE))
[10:49:04] !log Update wmf.mediawiki_history as explained in email (rename current table to old, create new one)
[10:49:06] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:49:18] joal: just updated the cr
[10:49:23] also renamed one metric
[10:52:04] hopefully now there are better tests and more flexibility
[10:52:28] elukey: Looks great :)
[11:15:09] !log Start mediawiki-history oozie jobs new-version
[11:15:10] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[11:54:49] joal: updated the prometheus-druid exporter, metrics look good now!
[11:54:55] will double check on Monday their values
[11:55:02] but the new release seems gooood
[12:01:31] hehehe :)
[12:01:59] joal: how urgent is the clickstream patch ?
[12:02:32] elukey: no emergency - Should actually be reviewed by ottomata and possibly milimetric and nuria as well (adding HTML for doc etc)
[12:02:32] today is technically a bank holiday for me but I got nerd-sniped into druid-prometheus (the famous "oh I'll work only 30 mins")
[12:02:47] elukey: WHAT THE HELL ARE YOU STILL DOING HERE ?
[12:02:49] :D
[12:03:38] * elukey runs
[12:03:47] o/
[12:28:33] (CR) Fdans: Add pageviews by country endpoint (2 comments) [analytics/aqs] - https://gerrit.wikimedia.org/r/393591 (https://phabricator.wikimedia.org/T181520) (owner: Fdans)
[12:45:06] taking a break team - will be here for meeting later
[13:18:33] (PS5) Fdans: Add pageviews by country endpoint [analytics/aqs] - https://gerrit.wikimedia.org/r/393591 (https://phabricator.wikimedia.org/T181520)
[13:56:29] Analytics, Discovery, EventBus, Wikidata, and 4 others: Create reliable change stream for specific wiki - https://phabricator.wikimedia.org/T161731#3822862 (Ottomata) > Seems like the mirroring is done by 0.9 MirrorMaker and timestamp handling was added only in 0.10 MirrorMaker. Hm, ya but I ha...
[14:29:35] joal, yt?
[14:29:52] i need some help testing superset stuff, and I kinda need another LDAP user to do it!
[14:30:03] Hi ottomata
[14:30:06] I'm here :)
[14:30:09] in meeting, but here
[14:32:11] ok, i can test this in prod or labs, probably better to test in labs, so let's do that
[14:32:27] joal, do the thing where you edit /etc/hosts and set superset.wikimedia.org to localhost
[14:32:31] then
[14:32:32] ssh -N j1.analytics.eqiad.wmflabs -L 9080:j1.analytics.eqiad.wmflabs:80
[14:32:34] and
[14:32:39] localhost:9080
[14:32:46] i think it isn't going to httpsify you
[14:32:48] i hope
[14:32:49] :)
[14:33:50] ottomata: If I go to localhost, why would I need the /etc/hosts set
[14:33:51] ?
[14:34:35] because you are proxying to :80 and the apache virtualhost expects the Host: header to be superset.wikimedia.org
[14:34:36] oh right
[14:34:36] sorry
[14:34:37] i mean
[14:34:37] go to
[14:34:38] ottomata: Same error as this morning with the real cname: ERR_TOO_MANY_REDIRECTS
[14:34:42] superset.wikimedia.org:9080
[14:34:43] sorry
[14:34:53] oh
[14:34:56] actually
[14:35:00] you don't need the /etc/hosts thing
[14:35:03] you can go to localhost
[14:35:08] because there is only one virtualhost on the labs machine
[14:35:10] so it'll default to it
[14:35:36] ok, joal do again now
[14:35:37] am watching
[14:35:45] ok makes sense
[14:35:47] this is what I need to test
[14:35:52] cool :)
[14:35:57] you are actually authenticating fine (no https redirect, right?)
[14:36:00] it's just lots of redirects.
[14:36:08] it's because superset is not auto-creating your superset account after auth
[14:36:09] correct - auth worked (AFAIK)
[14:36:11] which i think it should
[14:36:40] i can't test this myself, because i'm the admin user with an account
[14:36:43] so it'll work for me
[14:36:53] and i can't create a fake admin user when remote/ldap auth is on
[14:36:58] because it'll only auth with ldap
[14:40:33] wow doh
[14:40:36] joal i just read code
[14:40:38] growl
[14:40:45] this won't work with AUTH_REMOTE_USER.
[14:40:51] so. yesterday I got AUTH_REMOTE_LDAP to work
[14:40:54] except
[14:40:57] not with LDAP groups
[14:41:04] i could let any LDAP user authenticate
[14:41:08] but not restrict to groups
[14:41:23] so, I switched to AUTH_REMOTE_USER, which uses the HTTP auth
[14:41:29] with an LDAP auth http proxy in front
[14:41:34] that works with ldap groups
[14:41:48] but, i just read code.... LDAP_AUTH will auto 'register' you with superset
[14:41:52] AUTH_REMOTE_USER will not.
[14:41:54] growl.
[14:42:02] we'd have to make manual accounts in superset
[14:42:05] grrrr
[14:42:45] :(
[14:44:17] i know how to fix flask_appbuilder...
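The behaviour described above can be sketched in plain Python. This is an illustrative model only, not Flask-AppBuilder's actual code: `UserStore`, `login_remote_user` and the `auto_register` flag are hypothetical names, and "Alpha" is used as the default role only because it is a Superset role name. The point it shows is that with remote-user auth the proxy has already authenticated the person, so the app merely looks the name up, and an unknown name is rejected unless something creates the account. (The chat names the auth modes loosely; the Flask-AppBuilder constants are `AUTH_LDAP` and `AUTH_REMOTE_USER`.)

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class User:
    username: str
    role: str
    login_count: int = 0


@dataclass
class UserStore:
    """Hypothetical stand-in for the application's user table."""
    users: Dict[str, User] = field(default_factory=dict)

    def find(self, username: str) -> Optional[User]:
        return self.users.get(username)

    def add(self, username: str, role: str) -> User:
        user = User(username=username, role=role)
        self.users[username] = user
        return user


def login_remote_user(
    store: UserStore,
    remote_user: str,
    auto_register: bool = False,
    default_role: str = "Alpha",
) -> Optional[User]:
    """Log in a user whose identity was already verified by a proxy.

    Returns None when the name has no local account and auto-registration
    is off, which is the situation that leaves an authenticated person
    unable to get into the app.
    """
    if not remote_user:
        return None
    user = store.find(remote_user)
    if user is None:
        if not auto_register:
            return None  # authenticated upstream, but no app account
        user = store.add(remote_user, default_role)
    user.login_count += 1
    return user
```

With `auto_register=False` an operator has to create each account by hand; with `auto_register=True` the first successful proxy login creates one with the default role.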
[14:44:26] we could fork it or something
[14:44:28] maybe upstream a fix
[14:44:31] to support ldap groups.
[14:45:25] and/or fix to support user reg with REMOTE_USER
[14:45:26] grr
[14:54:27] joal: will you try again? same thing?
[14:54:31] ottomata: sure
[14:54:52] 500
[14:54:58] ok hang on...
[14:54:59] AttributeError: 'bool' object has no attribute 'login_count'
[14:54:59] i see it
[14:55:37] ok try again joal
[14:56:55] You probably know what happens :)
[14:56:58] great!
[14:57:00] I'm in, but invalid login
[14:57:04] invalid login?
[14:57:18] probably a leftover - now looks good
[14:57:37] Can't look at my profile :)
[14:58:04] anyway, I'm in!
[14:58:04] can't look at your profile?
[14:58:08] what does it do?
[14:59:05] It tells me restricted access
[15:00:01] that's weird, you are an "Alpha" user by default
[15:00:05] lemme make you an admin..
[15:00:34] joal: ok what does it do now?
[15:00:49] Works for me ottomata :)
[15:01:03] hm strange
[15:01:26] ok i'm going to make a patch to flask_appbuilder and see if we can get it upstreamed.
[15:01:41] Thanks a lot for that mate :)
[15:01:51] ottomata: Can I start adding cluster and so?
[15:01:56] or is it in labs?
[15:02:14] that's labs
[15:02:18] i can get you in prod if you like
[15:02:30] but druid seems...sluggish, in that it won't succeed updating cluster metadata
[15:02:31] you can try
[15:02:39] lemme make you an account
[15:02:42] and you can do what you want in prod
[15:02:43] cool :)
[15:03:05] joal your ldap username
[15:03:05] is
[15:03:06] Joal?
[15:03:15] joal
[15:03:20] k
[15:03:31] ok joal
[15:03:36] https://superset.wikimedia.org
[15:03:37] same long
[15:03:39] login*
[15:03:44] that is prod
[15:04:21] I didn't even have to login ottomata - Weird (I tried this morning)
[15:05:24] your ldap creds for that site were probably saved in your browser session
[15:05:33] right
[15:05:37] so you can authenticate with the proxy
[15:05:42] but you didn't have an account in superset
[15:05:45] i just made one for you
[15:05:49] makes sense
[15:05:58] It linked to it straight
[15:06:00] the hacky thing i did in labs was make superset auto-create an account if authenticated.
[15:06:12] ok
[15:08:36] OO, joal it makes sense why you can't auto register with AUTH_REMOTE_USER.
[15:08:43] if so, anyone could do
[15:08:56] --header 'X-Remote-User: joal' and log in
[15:09:03] hm, but not really
[15:09:09] because they have to go through the proxy
[15:09:11] You wouldn't pass auth, right?
[15:09:13] if they aren't logged into thorium
[15:09:16] well, if you hit the app
[15:09:17] you would.
[15:09:21] the proxy does the ldap auth
[15:09:23] yes
[15:09:24] and then sets a header
[15:09:31] that's how superset knows who you are
[15:09:37] but with AUTH_REMOTE_USER
[15:09:44] all it does is compare the REMOTE_USER header
[15:09:47] to what it has in the db
[15:10:05] hm, in our setup that's fine.
[15:10:12] because you can't hit the app directly
[15:10:17] unless you are logged into thorium
[15:10:27] or a $CACHE_SERVER.
[15:10:29] wait no
[15:10:30] not true
[15:10:43] yeah, superset app port only open to localhost
[15:10:49] but still, a little weird.
[15:10:53] i was going to make an upstream patch
[15:11:04] but it means that if someone enables auto reg with AUTH_REMOTE_USER
[15:11:18] anyone can register...i guess that is the point...
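The trust argument above (the identity header is only safe because nothing but the local proxy can reach the app port) can be made explicit in code. The following is a hedged sketch, not how Superset or Flask-AppBuilder implement it: `trusted_remote_user` is a hypothetical helper, the header name is taken from the `X-Remote-User` example in the chat, and the loopback networks are an assumption standing in for "app port only open to localhost".

```python
import ipaddress
from typing import Mapping, Optional, Sequence


def trusted_remote_user(
    peer_ip: str,
    headers: Mapping[str, str],
    trusted_networks: Sequence[str] = ("127.0.0.0/8", "::1/128"),
    header_name: str = "X-Remote-User",
) -> Optional[str]:
    """Return the proxy-supplied username only for trusted peers.

    A client that reaches the app directly could set the header itself,
    so the header is ignored unless the connection comes from a network
    where only the authenticating proxy lives.
    Note: `headers` is treated as a plain mapping here, so the lookup is
    case-sensitive; real HTTP frameworks normalise header names.
    """
    addr = ipaddress.ip_address(peer_ip)
    networks = [ipaddress.ip_network(net) for net in trusted_networks]
    if not any(addr in net for net in networks):
        return None  # untrusted source: never believe its identity header
    value = headers.get(header_name, "").strip()
    return value or None
```

Binding the app to localhost, as in the setup discussed here, gets the same effect at the network layer; the check above is only a second line of defence if the port is ever exposed.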
[15:11:18] hm
[15:15:57] Analytics, Discovery, Discovery-Analysis, Discovery-Search: UDF for language detection - https://phabricator.wikimedia.org/T182352#3823175 (TJones) The CLD3 page says it is intended to run in a browser and relies on Chromium... that's kinda weird. And I don't see a list of supported/identified la...
[15:24:37] ottomata: Superset behavior when updating druid datasources is super weird
[15:28:28] Analytics-Kanban, Patch-For-Review: Productionize Superset - https://phabricator.wikimedia.org/T166689#3823217 (Ottomata) Alright! https://superset.wikimedia.org is up and running. I had a bit of trouble with authentication. Here's the skinny: superset uses Flask-AppBuilder as its web framework. Fla...
[15:28:30] yeah....
[15:28:40] i haven't totally looked at that yet
[15:28:44] can you figure out what it's doing?
[15:28:47] is it configured properly?
[15:29:28] joal: yesterday when i was looking a little bit, it seemed like it was hanging trying to insert a datasource into the mysql d.
[15:29:29] db.
[15:29:45] ottomata: maybe mysql write right issue?
[15:30:11] maybe..
[15:30:25] it can write to other tables though...
[15:30:33] INSERT INTO datasources (created_on, changed_on, description, default_endpoint, is_featured, filter_select_enabled, offset, cache_timeout, params, perm, datasource_name, is_hidden, fetch_values_from, cluster_name, user_id, changed_by_fk, created_by_fk) VALUES ('2017-12-08 15:29:40.386500', '2017-12-08 15:29:40.386519', NULL, NULL, 0, 0, 0, NULL, NULL, NULL, 'banner_activity_minutely', 0, NULL, 'analytics-eqiad', NULL, 2, 2)
[15:30:38] just never finishes...
[15:30:55] oh wait, it did finish one.
[15:30:58] oh it's a transactional thing
[15:30:59] hmm
[15:31:14] i think it's timing out
[15:31:21] it can't insert all the datasources in some timeout
[15:31:26] so fails the transaction.
[15:32:18] ottomata: Could be related to having 2 clusters?
[15:32:35] it did the same with one
[15:32:38] k
[15:32:38] i'm going to delete the analytics one though
[15:32:46] Why?
[15:32:46] and try just public
[15:32:53] the one i had before was analytics
[15:32:57] and it has more datasources?
[15:33:16] we can re-add later
[15:34:04] it takes forever for mysql to finish this write
[15:34:17] Weirdoh
[15:35:02] yeah
[15:35:03] Lock wait timeout exceeded; try restarting transaction
[15:35:03] ...
[15:40:51] oohhh for public we got firewall probs... :)
[15:41:41] so, removing public I guess :)
[15:41:42] huh, coordinator needs to be open as well
[15:41:49] no i tried with analytics yesterday
[15:41:52] i want to try with public
[15:41:59] to see if it finishes querying the fewer datasources there
[15:46:08] ok team - leaving to catch Naé at creche - Will be back later on tonight
[15:59:12] Analytics-Kanban, Operations: Allow access to druid public-eqiad cluster ports 8081 from analytics VLAN - https://phabricator.wikimedia.org/T182443#3823299 (Ottomata) p:Triage>Normal
[16:01:25] Analytics-Kanban, Patch-For-Review: Productionize Superset - https://phabricator.wikimedia.org/T166689#3823316 (Ottomata) a:Ottomata
[16:09:53] Analytics-Kanban, Operations: Allow access to druid public-eqiad cluster ports 8081 from analytics VLAN - https://phabricator.wikimedia.org/T182443#3823331 (Ottomata)
[16:18:35] Analytics-Kanban, Patch-For-Review: Productionize Superset - https://phabricator.wikimedia.org/T166689#3823388 (mark)
[16:18:37] Analytics-Kanban, Operations: Allow access to druid public-eqiad cluster ports 8081 from analytics VLAN - https://phabricator.wikimedia.org/T182443#3823385 (mark) Open>Resolved a:mark Added port 8081 to the existing term for druid on cr1-eqiad and cr2-eqiad.
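After a firewall change like the one above, a quick TCP check from the client side confirms whether the port is actually reachable. This is a minimal sketch using only the Python standard library; `port_is_reachable` is a hypothetical helper name, and the hostname in the usage comment is a placeholder, not a real Druid host.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    A refused connection, a timeout (typical for a firewall that drops
    packets) and a failed DNS lookup all raise OSError subclasses, so
    they are reported uniformly as "not reachable".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example (placeholder hostname, to be replaced with a real coordinator host):
#   port_is_reachable("druid-coordinator.example", 8081)
```

A successful connect only shows that the network path is open; it says nothing about whether the service behind the port answers queries correctly.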
[16:30:12] I'm gonna get lunch and drive home now, be back in an hour and a half or so
[16:40:13] Analytics, Discovery, Discovery-Analysis, Discovery-Search: UDF for language detection - https://phabricator.wikimedia.org/T182352#3823433 (Nuria) >The CLD3 page says it is intended to run in a browser and relies on Chromium... that's kinda weird. Seems that you can install it as a library with...
[18:28:14] Analytics, Research: Make HTML dumps available - https://phabricator.wikimedia.org/T182351#3821170 (leila) a:leila>None
[18:28:57] hey joal. are you still around?
[18:34:53] lzia: joal is gone for the day, can i help you with something?
[18:35:02] ow hi nuria_. :)
[18:36:04] nuria_: joal and I had a chat a couple of days ago about HTML dumps. we did some homework on our end and created T182351 to follow up discussions with Analytics. I just wanted to tell him that the ticket is now ready for your review. :)
[18:36:04] T182351: Make HTML dumps available - https://phabricator.wikimedia.org/T182351
[18:36:34] lzia: yes i talked to joal about it but dumps is a project now being moved to cloud infrastructure by bd808 team
[18:37:25] lzia: so that might be something that they might have thought about (or not) but in any case it will be pending on the move happening
[18:37:36] lzia: cc bd808 on ticket so he knows
[18:38:59] nuria_: ok. thanks for the advice. adding bd808
[18:40:55] Analytics, Research: Make HTML dumps available - https://phabricator.wikimedia.org/T182351#3821170 (leila) @bd808 I learned from Nuria that we should have a chat with you regarding dumps and this ticket. So, here I am. :) I'm bringing this up now as I know we're all planning for next quarter, and I don't...
[18:49:01] Hi lzia - Thanks nuria_ for the follow up :)
[18:58:53] Analytics, Discovery, EventBus, Wikidata, and 5 others: Create reliable change stream for specific wiki - https://phabricator.wikimedia.org/T161731#3823680 (Ottomata) Woot, that did it ^.
We need topics to default to LogAppendTime. ``` [@stat1005:/home/otto] $ ./kafkacat -Q -b kafka-jumbo1001...
[19:06:12] milimetric, nuria_ - Just added you as reviewers for the patch to rsync clickstream to dumps page - I also modified the analytics-dumps-page accordingly
[19:06:22] joal: k
[19:06:29] joal: will look at this monday
[19:06:41] I've also a change in clickstream page, but I wait for data to be available before applying
[19:06:44] sure
[19:07:11] Actually - I'll make the change on page, like that it is reviewa
[19:07:16] reviewable
[19:08:35] Analytics, Discovery, EventBus, Wikidata, and 5 others: Create reliable change stream for specific wiki - https://phabricator.wikimedia.org/T161731#3823684 (Nuria) Nice, Can @Smalyshev check whether consuming from these topics as set would work for his purposes?
[19:16:14] Analytics, Discovery, EventBus, Wikidata, and 5 others: Create reliable change stream for specific wiki - https://phabricator.wikimedia.org/T161731#3823695 (Ottomata) So, FYI, the timestamps as they are now are the timestamp that the kafka jumbo-eqiad cluster received the messages. These are rep...
[19:16:54] joal: , i merge that?
[19:17:02] oh
[19:17:09] i'll just +1 sorry just read backscroll
[19:28:01] thanks ottomata :)
[20:13:44] Analytics-Kanban, Patch-For-Review: Productionize Superset - https://phabricator.wikimedia.org/T166689#3823874 (Ottomata) Oof, Something is not happy with MySQL + superset and druid metadata refresh. ``` Dec 8 20:10:01 thorium superset[5227]: 2017-12-08 20:10:01,479:ERROR:root:(_mysql_exceptions.Operat...
[20:22:59] Analytics-Kanban, Analytics-Wikistats: Bug: Bar Chart disappears - https://phabricator.wikimedia.org/T182461#3823937 (Milimetric)
[20:26:12] (PS1) Milimetric: Fix loading sparse data into widgets [analytics/wikistats2] - https://gerrit.wikimedia.org/r/396469 (https://phabricator.wikimedia.org/T182224)
[21:21:33] Analytics-Kanban, Patch-For-Review: Productionize Superset - https://phabricator.wikimedia.org/T166689#3824206 (Ottomata) Hm, might not be a MySQL related problem after all. I switched the database to a local sqlite db, and I get a very similar problem: ``` Dec 8 21:21:05 thorium superset[27367]: Oper...
[21:45:29] Analytics-Kanban, Patch-For-Review: Productionize Superset - https://phabricator.wikimedia.org/T166689#3824271 (Ottomata) Hm, looks to be a threading/async/gevent issue. I switched the gunicorn worker class back to sync, and it works now. I bumped up to 8 workers. If 8 people run long queries at once,...
[21:46:04] Analytics, EventBus, MW-1.31-release-notes (WMF-deploy-2017-11-14 (1.31.0-wmf.8)), Patch-For-Review, Services (next): Timeouts on event delivery to EventBus - https://phabricator.wikimedia.org/T180017#3824272 (Pchelolo) I've been digging it a bit more to figure out the reason for the remainin...
[22:20:35] Analytics-Kanban, Analytics-Wikistats: Bug: Bar Chart disappears - https://phabricator.wikimedia.org/T182461#3824334 (Milimetric)
[22:21:55] (PS1) Milimetric: Fix bar chart not re-rendering [analytics/wikistats2] - https://gerrit.wikimedia.org/r/396537 (https://phabricator.wikimedia.org/T182461)