[05:27:06] 10Analytics: Investigate requests flagged as pageview in analytics header coming from bots - https://phabricator.wikimedia.org/T135251#3097090 (10Tbayer) See also T117631#1783069
[06:53:24] !log re-run mediacounts-archive-wf-2017-03-13 from Hue (OOMs in the stderr)
[06:53:25] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[07:06:12] helloooo
[07:13:48] o/
[07:15:42] 10Analytics, 10Analytics-Cluster, 15User-Elukey: Apply Xms Java Heap settings to all the Hadoop daemons - https://phabricator.wikimedia.org/T159219#3097155 (10elukey)
[07:15:51] 06Analytics-Kanban, 15User-Elukey: Review the recent Varnishkafka patches - https://phabricator.wikimedia.org/T158854#3097156 (10elukey)
[07:16:02] 06Analytics-Kanban, 15User-Elukey: AQS: Verify that node not being able to restart logs locally to errorlog not to logstash - https://phabricator.wikimedia.org/T155791#3097157 (10elukey)
[07:16:44] 10Analytics-Cluster, 06Analytics-Kanban, 13Patch-For-Review, 15User-Elukey: Add jmxtrans metrics from Hadoop yarn-mapreduce-historyserver - https://phabricator.wikimedia.org/T156272#3097158 (10elukey) a:03elukey
[08:48:47] 10Analytics, 03Interactive-Sprint: Report updater should support Graphite mapping plugins - https://phabricator.wikimedia.org/T152257#3097255 (10Nemo_bis)
[08:49:19] 10Analytics, 06Discovery, 06Discovery-Analysis, 07RfC: RFC: Requirements for analytics stats processor - https://phabricator.wikimedia.org/T150028#2772066 (10Nemo_bis) > we use report updater that creates tsv files "Report updater" is described at https://wikitech.wikimedia.org/wiki/Analytics/Reportupdate...
[10:18:09] * elukey afk for ~one hour!
[10:25:11] (03PS2) 10Mforns: Add new endpoint for legacy pageviews [analytics/aqs] - 10https://gerrit.wikimedia.org/r/342417 (https://phabricator.wikimedia.org/T156391)
[10:25:20] (03CR) 10Mforns: Add new endpoint for legacy pageviews (034 comments) [analytics/aqs] - 10https://gerrit.wikimedia.org/r/342417 (https://phabricator.wikimedia.org/T156391) (owner: 10Mforns)
[10:34:24] (03PS1) 10Joal: Correct per-project endpoint date check [analytics/aqs] - 10https://gerrit.wikimedia.org/r/342601 (https://phabricator.wikimedia.org/T160311)
[10:34:31] 10Analytics, 13Patch-For-Review: Sort inconsistency in AQS timestamp behavior - https://phabricator.wikimedia.org/T160311#3097461 (10JAllemandou) I did a quick check: - Per Project - Hourly - **BUG** - Returns multiple hours results only for full days: - https://wikimedia.org/api/res...
[10:35:20] (03PS3) 10Mforns: Add script to generate WSC abbrevs to domain map [analytics/refinery] - 10https://gerrit.wikimedia.org/r/338786 (https://phabricator.wikimedia.org/T158330)
[10:35:28] fdans: Hi ! Please have a quick look at the patch I just submitted
[10:40:41] (03CR) 10Mforns: "Fixed the issue, see inline comment." (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/338786 (https://phabricator.wikimedia.org/T158330) (owner: 10Mforns)
[10:43:49] 10Analytics: Refactor monthly banner oozie job to use already indexed daily data - https://phabricator.wikimedia.org/T159727#3097465 (10JAllemandou)
[10:45:13] 10Analytics, 10DBA, 06Labs: Discuss labsdb visibility of rev_text_id and ar_comment - https://phabricator.wikimedia.org/T158166#3097467 (10JAllemandou) I discussed this with @ArielGlenn. He told me he would investigate. Ping @ArielGlenn?
[11:11:20] joal: on it!
[11:16:43] (03CR) 10Fdans: [V: 031] "Looks good, and thanks for the test! :)" [analytics/aqs] - 10https://gerrit.wikimedia.org/r/342601 (https://phabricator.wikimedia.org/T160311) (owner: 10Joal)
[11:26:58] joal: o/ - feeling better?
[11:29:27] Hi elukey :)
[11:30:30] elukey: a full day of not thinking about hadoop, cassandra, or spark was very much what I needed :)
[11:30:45] joal: we should merge https://gerrit.wikimedia.org/r/#/c/342205/ and https://gerrit.wikimedia.org/r/#/c/342486/ to be done with the fix
[11:31:00] joal: I don't believe that you did, but surely you had some rest :D
[11:31:01] (sorry to bring back cassandra to your life Joal)
[11:31:54] fdans: I suggest removing old project code in aqs
[11:32:07] fdans: is it a good idea?
[11:32:35] hmm
[11:32:53] it makes sense since it's just a timestamp change
[11:33:08] it doesn't really make sense to keep it for historical reasons joal :)
[11:33:22] also fdans, we made a mistake (without consequences by chance) when we started jobs to continuously update per project v2
[11:33:38] oh no!
[11:33:46] fdans: we forgot to patch the monthly query
[11:34:01] fdans: But by chance, no month has finished, so no code has run
[11:36:30] joal: damn you're totally right!
[11:36:45] many moving parts, I'm sorry
[11:37:21] so joal the procedure would be to kill the monthly job and relaunch it with the query patched?
[11:37:40] fdans: Don't worry, there were two of us on it and we both forgot :)
[11:38:17] fdans: it would, but since we plan to actually merge and deploy the prod thing soon, no need to do it on the non-prod jobs
[11:39:54] so hold on, this would only affect the monthly job we're running to continuously populate the keyspace
[11:40:22] fdans: we currently have 2 jobs running: one for old keyspace, one for new keyspace
[11:40:29] but since we're now merging and deploying the switch to v2, and the month isn't going to end before then, we can just kill it
[11:40:46] the one for the new keyspace is temporary since it uses code in your folder
[11:40:52] correct
[11:41:06] cool cool
[11:41:40] joal: shall we merge those two changes then?
[11:41:53] oh sorry, yeah, remove the old code :)
[11:42:06] fdans: And, let's wait for milimetric to review as well :)
[11:45:21] * elukey lunch!
[12:28:40] joal: stopped yarn and hdfs on 1043, preparing for reimage
[12:28:46] k elukey
[12:28:50] (the containers are still working \o/)
[12:28:56] :)
[12:29:10] will wait for the containers to finish and then I'll start
[12:29:34] elukey: however when you reimage the machine, they'll die (or you have powers I don't even suspect)
[12:33:21] joal: mmm if the appmaster is on 1043 right?
[12:33:32] or is it the case that it is always on the same host?
[12:33:33] elukey: I meant when you
[12:33:37] restart the node
[12:35:00] joal: mmm sorry I didn't get it (I know you are right but I'd need to understand it :)
[12:35:38] elukey: it was a bad joke - like you'd keep containers running even after stopping the computer
[12:38:23] joal: I don't get your jokes immediately because usually you tell me where my knowledge holes are, I am used to it so I immediately start reviewing what I did :D
[12:39:00] elukey: hm, this needs to change, you actually know more than me now ;)
[12:39:16] ahahhaha suuuuure
[12:40:35] 10Analytics, 10Analytics-Cluster, 06Operations, 13Patch-For-Review, 15User-Elukey: Reimage all the Hadoop worker nodes to Debian Jessie - https://phabricator.wikimedia.org/T160333#3097658 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by elukey on neodymium.eqiad.wmnet for hosts: ``` ['analy...
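For reference, "stopped yarn and hdfs on 1043" boils down to something like the following on the worker itself; this is a sketch, and the service names assume the CDH packaging used on the analytics workers:

```
# on analytics1043: stop the YARN NodeManager first so no new containers
# are scheduled here, then the HDFS DataNode (CDH service names assumed)
sudo service hadoop-yarn-nodemanager stop
sudo service hadoop-hdfs-datanode stop
```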
[12:49:52] 06Analytics-Kanban, 10ChangeProp, 06Operations, 10Reading-Web-Trending-Service, 06Services (watching): Upgrade librdkafka 0.9.4 on SCB and Varnishes - https://phabricator.wikimedia.org/T159379#3097663 (10faidon) Just as a general comment, as long as the package name remains the same (`librdkafka1`), the...
[12:51:50] ufff oozie
[12:51:55] * elukey restarts jobs
[12:53:20] !log restarted webrequest-load-wf-text-2017-3-14-11
[12:53:21] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:53:53] !log restarted webrequest-load-wf-upload-2017-3-14-11
[12:53:53] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:55:52] Thanks elukey
[12:56:37] joal: we should have metrics for the Job History Server soon
[12:56:50] even if the MBeans for jmx are not that rich
[12:56:55] but we'll have JVM metrics
[12:57:01] awesome
[12:57:14] elukey: You see, when I tell you you know more than I do
[12:57:37] I created https://wikitech.wikimedia.org/wiki/User:Elukey/Ops/Jconsole if people want to use Jconsole without going crazy
[12:58:52] (03CR) 10Joal: Update sqoop and namespace_map scripts for versioning (034 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/341586 (https://phabricator.wikimedia.org/T160152) (owner: 10Joal)
[12:59:02] (03PS3) 10Joal: Update sqoop and namespace_map scripts for versioning [analytics/refinery] - 10https://gerrit.wikimedia.org/r/341586 (https://phabricator.wikimedia.org/T160152)
[13:06:45] 10Analytics, 10Analytics-Cluster, 06Operations, 13Patch-For-Review, 15User-Elukey: Reimage all the Hadoop worker nodes to Debian Jessie - https://phabricator.wikimedia.org/T160333#3097694 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['analytics1043.eqiad.wmnet'] ``` and were **ALL** succe...
[13:09:37] (03CR) 10Joal: Add oozie jobs for mw history denormalized (036 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/341030 (https://phabricator.wikimedia.org/T160074) (owner: 10Joal)
[13:11:23] taking a break a-team, later
[13:37:37] mforns: have you tested the aqs change in beta? I can batcave with you to do that if you want :)
[13:39:52] fdans, I'd be pleased :]
[13:42:01] mforns: give me a couple minutes, finishing a change...
[13:45:46] (03PS2) 10Fdans: Switch table pointer to v2 in per project endpoint [analytics/aqs] - 10https://gerrit.wikimedia.org/r/342486 (https://phabricator.wikimedia.org/T156312)
[13:53:15] (03CR) 10Mforns: [C: 031] "LGTM! One minor subjective nitpick, see inline comments. Agree to merge anyway!" (032 comments) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/325312 (https://phabricator.wikimedia.org/T144717) (owner: 10Joal)
[13:56:48] (03CR) 10Mforns: [C: 031] "I think you forgot to push your latest changes :]" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/341030 (https://phabricator.wikimedia.org/T160074) (owner: 10Joal)
[13:59:12] (03CR) 10Fdans: "lgtm pending successful test in beta" (033 comments) [analytics/aqs] - 10https://gerrit.wikimedia.org/r/342417 (https://phabricator.wikimedia.org/T156391) (owner: 10Mforns)
[14:08:52] (03CR) 10Mforns: [C: 031] "LGTM! One comment not related to the change :P" (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/341586 (https://phabricator.wikimedia.org/T160152) (owner: 10Joal)
[14:12:39] mforns: want to do that?
[14:12:46] fdans, yes!
[14:12:55] to the batcave!
[14:13:10] yes sir :]
[14:14:03] !log analytics1043 back in service
[14:14:04] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:25:46] elukey: mind jumping into the batcave with us for a second?
[14:27:07] fdans: I am in the middle of something, do you mind if I join in ~10 mins?
[14:27:15] sure! :)
[14:27:20] thanks :)
[14:28:08] phew dudes, it is nasty round here
[14:37:33] ottomata: o/
[14:37:39] how is the weather looking?
[14:37:46] fdans: joining
[14:38:10] elukey: there's no need anymore, thank you luca, sorry :)
[14:38:20] (unless you really want to see us!)
[14:39:59] nastay! the snow isn't too deep, but it is sleeting hard
[14:53:20] 06Analytics-Kanban, 13Patch-For-Review: Spike: Split unique devices data for Asiacell and non-Asiacell traffic in Iraq - https://phabricator.wikimedia.org/T158237#3098020 (10Nuria) p:05Triage>03Lowest
[15:01:59] joal: standduppp?
[15:05:48] ping joal
[15:05:58] (03CR) 10Milimetric: [C: 032] Correct per-project endpoint date check [analytics/aqs] - 10https://gerrit.wikimedia.org/r/342601 (https://phabricator.wikimedia.org/T160311) (owner: 10Joal)
[15:08:51] 06Analytics-Kanban, 13Patch-For-Review: Create hive tables and queries for standard metrics computation out of mediawiki denormalized history - https://phabricator.wikimedia.org/T160155#3098088 (10Nuria) a:05Milimetric>03JAllemandou
[15:09:26] 06Analytics-Kanban, 10ChangeProp, 06Operations, 10Reading-Web-Trending-Service, 06Services (watching): Upgrade librdkafka 0.9.4 on SCB and Varnishes - https://phabricator.wikimedia.org/T159379#3098089 (10Ottomata) Ya, I expect things to be fine on the varnishes. I was also hoping that our currently deplo...
[15:16:26] ottomata: thanks for the review! Fixed, I forgot those occurrences :)
[15:17:23] https://phabricator.wikimedia.org/T159883
[15:23:58] mforns: instead of hdfs dfs -rm to overwrite file: hdfs dfs -put -f
[15:24:47] joal, yes! I overlooked that option yesterday!
[15:24:55] waaay better
[15:24:57] ;0
[15:25:01] :)
[15:25:26] * joal wants a keyboard that types what it thinks
[15:25:40] Krinkle: gonna merge your puppet things today, s'ok?
[15:25:41] you around?
[15:26:18] joal, wait! there's no -f option: https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/FileSystemShell.html#put
[15:26:42] just tried on stat1004 mforns
[15:27:13] joal, awesome, but disclaimer -> not my fault if docs are wrong
[15:27:14] :]
[15:27:20] :D
[15:31:23] 10Analytics: Spike: Spark 2.x as cluster default (working with oozie) - https://phabricator.wikimedia.org/T159962#3098172 (10Ottomata) Need a place to park some notes: ``` spark build your own deb export JAVA_HOME - add openjdk into build depends - export http proxies - alter .m2/settings.xml to add proxy
elukey, fdans: About aqs on beta - How do we proceed ?
[15:39:38] joal: I am going to check scap configs today/tomorrow, I am pretty sure that the issue is there
[15:39:56] elukey: ok - About having new machines for cassandra ?
[15:40:47] joal: second step, but it shouldn't be a big issue (need to ask hashar if it is ok to spin up two new instances, then we'll need to re-build the cassandra cluster etc.)
[15:41:19] elukey: Should we wait for this to happen to move forward with the new changes we have?
[15:41:35] milimetric fdans mforns feedback on updated language selector? https://www.dropbox.com/sh/lfrn4lcjyqhou7o/AAAmzec_63b1UwaZCGFDw1gea?dl=0
[15:42:26] at bottom of mock
[15:43:11] joal: maybe we can sync tomorrow and see if I fixed the issue and decide?
[15:43:16] wdyt?
[15:44:17] (03PS13) 10Joal: Port standard metrics to reconstructed history [analytics/refinery] - 10https://gerrit.wikimedia.org/r/322103 (https://phabricator.wikimedia.org/T160155) (owner: 10Milimetric)
[15:44:35] elukey: works for me - fdans, any concern?
[15:44:57] (03PS8) 10Joal: Add oozie jobs for mw history denormalized [analytics/refinery] - 10https://gerrit.wikimedia.org/r/341030 (https://phabricator.wikimedia.org/T160074)
[15:45:37] elukey, joal sounds good to me (sorry, was batcavin with marcel)
[15:45:50] (03PS4) 10Joal: Update sqoop and namespace_map scripts for versioning [analytics/refinery] - 10https://gerrit.wikimedia.org/r/341586 (https://phabricator.wikimedia.org/T160152)
[15:45:54] btw Batcave in Catalan is Batcova
[15:47:57] ottomata: let's merge the puppet UA changes right? cc Krinkle
[15:48:41] yes, does Krinkle need to be around to babysit?
[15:49:29] (03CR) 10Joal: Update sqoop and namespace_map scripts for versioning (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/341586 (https://phabricator.wikimedia.org/T160152) (owner: 10Joal)
[15:49:35] nuria: ^^
[15:51:53] (03CR) 10Joal: [C: 031] "LGTM. let's merge in sync with AQS." [analytics/refinery] - 10https://gerrit.wikimedia.org/r/342205 (https://phabricator.wikimedia.org/T156312) (owner: 10Fdans)
[15:52:49] ottomata: mmm, for the metrics stuff we can look at it, the idea is that browsers reported here would not change: https://grafana.wikimedia.org/dashboard/db/navigation-timing-by-browser
[15:53:02] ottomata: there is the other change about not consuming from zmq
[15:53:13] nuria: there are 3 changes, afaik, 2 are puppet
[15:53:18] https://gerrit.wikimedia.org/r/#/c/341723/4
[15:53:20] https://gerrit.wikimedia.org/r/#/c/341724/4
[15:53:22] ottomata: right
[15:53:27] not sure which are related to the UA stuff
[15:53:33] ottomata: we will deploy the puppet first
[15:53:45] does Krinkle want to be around for the merge?
[15:53:47] or should we just do it?
[15:54:21] ottomata: I think we can do it, i have a meeting until an hour from now, we can probably deploy puppet changes after and ping Krinkle
[15:54:33] ottomata: https://gerrit.wikimedia.org/r/#/c/341724/4 -> this changes consumption
[15:54:59] changes consumption?
[15:55:17] ja, nuria we can do it whenever, in one hour is fine
[15:55:18] if you like
[15:55:49] ottomata: consumption because the performance consumer was consuming zmq and now it is going to be consuming kafka
[15:55:55] no
[15:55:56] it will still be consuming mq
[15:56:00] zmq
[15:56:07] just using eventlogging instead of zmq client directly
[15:56:14] right?
[15:56:15] yeah
[15:56:16] ottomata: ahhh
[15:56:18] endpoint => "tcp://${eventlogging_host}:8600",
[15:56:22] i suggested we do kafka too
[15:56:27] but krinkle wanted to do one thing at a time
[15:56:28] which is fine
[15:56:44] ottomata: right!, i thought we did both at the same time, should have looked
[15:57:19] ottomata: and this is teh third change: https://gerrit.wikimedia.org/r/#/c/337158/
[15:57:22] *the
[15:57:33] ottomata: let me see if we can send a meeting req to tio
[15:57:41] *meeting req to Krinkle
[15:57:53] meeting req? ok
[15:58:04] nuria: i follow your lead, when you are ready to merge, just lemme know
[16:00:33] ashgrigas: how come some of the languages have stars?
[16:00:52] that's the current component, a way to favorite them
[16:01:01] we can remove that if we don't need to support favoriting languages
[16:01:13] if it's likely not common in wikistats vs on wikis
[16:01:26] oh, I see, so the languages you favorite show up by default without expanding the "more languages" thing?
[16:02:04] yeah
[16:02:21] i think I misunderstood the component though after just hearing back from Pau
[16:02:34] i think all we need is the button and the overlay search ui
[16:02:44] and don't need to expose common languages across the bottom
[16:02:51] unless we think it's more useful to do so
[16:03:43] milimetric: mind taking a look at this one? sorry https://gerrit.wikimedia.org/r/#/c/342486/
[16:06:08] (03CR) 10Milimetric: [C: 032] Switch table pointer to v2 in per project endpoint [analytics/aqs] - 10https://gerrit.wikimedia.org/r/342486 (https://phabricator.wikimedia.org/T156312) (owner: 10Fdans)
[16:06:54] 06Analytics-Kanban, 10ChangeProp, 06Operations, 10Reading-Web-Trending-Service, 06Services (watching): Upgrade librdkafka 0.9.4 on SCB and Varnishes - https://phabricator.wikimedia.org/T159379#3098349 (10Pchelolo) @faidon The `librdkafka` [[ https://github.com/edenhill/librdkafka/releases | changelog ]]...
[16:07:15] joal: you ok with this one? ^
[16:07:23] reading fdans
[16:08:08] fdans: Yes, looks good
[16:08:19] fdans: we should make a cobo with my patch as well
[16:10:03] 06Analytics-Kanban, 10ChangeProp, 06Operations, 10Reading-Web-Trending-Service, 06Services (watching): Upgrade librdkafka 0.9.4 on SCB and Varnishes - https://phabricator.wikimedia.org/T159379#3098364 (10faidon) Oh, I hadn't realized node-rdkafka was using the C++ API. Yes, the C++ ABI is unstable, cf. h...
[16:10:23] job-history metrics added to https://grafana.wikimedia.org/dashboard/db/analytics-hadoop
[16:10:29] joal,ottomata --^
[16:11:17] nice elukey! :)
[16:11:56] joal: cobo?
[16:12:41] 10Analytics-Cluster, 06Analytics-Kanban, 13Patch-For-Review, 15User-Elukey: Add jmxtrans metrics from Hadoop yarn-mapreduce-historyserver - https://phabricator.wikimedia.org/T156272#3098390 (10elukey) Metrics are flowing to graphite, added graphs to https://grafana.wikimedia.org/dashboard/db/analytics-hadoop
[16:14:10] fdans: meant combo, but my fingers are fat today
[16:16:40] milimetric more like this https://www.dropbox.com/s/uni6i9uchuyw1uj/Dashboard%20Overview%20lang%20selector%20open.png?dl=0
[16:19:44] joal yeah but I'm not sure what you mean by combo :)
[16:19:51] hey yall, still waiting for review on this: https://gerrit.wikimedia.org/r/#/c/341922/ if anyone finds a moment
[16:20:01] fdans: We should merge and deploy the 2 patches together
[16:20:16] Ahhh cool cool
[16:24:33] ashgrigas: I think I'm confused about that too, how come there are two columns where you pick stuff? My understanding of the component was that you use a mini-world map or other country selector to get the languages usually spoken in those countries, then you pick among those languages. And there are also some "worldwide" languages that always show up.
[16:33:23] joal,fdans - can I make some deployments from tin-deployment-prep
[16:33:27] ?
[16:33:51] y
[16:34:04] (03PS5) 10Joal: Update sqoop and namespace_map scripts for versioning [analytics/refinery] - 10https://gerrit.wikimedia.org/r/341586 (https://phabricator.wikimedia.org/T160152)
[16:34:22] sorry elukey i'd check with mforns
[16:34:53] mmmm not online
[16:35:28] (03CR) 10Fdans: [V: 032] Switch table pointer to v2 in per project endpoint [analytics/aqs] - 10https://gerrit.wikimedia.org/r/342486 (https://phabricator.wikimedia.org/T156312)
[16:35:37] the deployment was fine this time, weird
[16:36:12] ah no with --force it failed
[16:36:40] in analytics/aqs/deploy: promote and restart_service stage(s): 100% (ok: 0; fail: 1; left: 0)
[16:36:45] that is what I remembered
[16:37:31] 16:36:00 [deployment-aqs01.deployment-prep.eqiad.wmflabs] Executing check 'depool'
[16:37:34] 16:36:00 [deployment-aqs01.deployment-prep.eqiad.wmflabs] Unhandled error:
[16:37:43] yeah I am pretty sure it is the depool
[16:38:14] fdans: did you get all green in your last deployment?
[16:38:29] elukey: yes
[16:46:58] (03PS2) 10Joal: Update static wiki projects list [analytics/refinery] - 10https://gerrit.wikimedia.org/r/342030 (https://phabricator.wikimedia.org/T160153)
[16:48:28] 06Analytics-Kanban: Synchronise changes for productionisation of mediawiki history jobs - https://phabricator.wikimedia.org/T160154#3098533 (10JAllemandou)
[16:48:32] wikimedia/mediawiki-extensions-EventLogging#639 (wmf/1.29.0-wmf.16 - 838abb7 : Translation updater bot): The build has errored.
[16:48:32] Change view : https://github.com/wikimedia/mediawiki-extensions-EventLogging/compare/wmf/1.29.0-wmf.16
[16:48:32] Build details : https://travis-ci.org/wikimedia/mediawiki-extensions-EventLogging/builds/211019219
[16:55:40] (03PS1) 10Elukey: Configure no pool/depool checks for AQS Beta [analytics/aqs/deploy] - 10https://gerrit.wikimedia.org/r/342649
[16:57:02] (03CR) 10Thcipriani: [C: 031] Configure no pool/depool checks for AQS Beta [analytics/aqs/deploy] - 10https://gerrit.wikimedia.org/r/342649 (owner: 10Elukey)
[16:58:00] 06Analytics-Kanban, 10ChangeProp, 06Operations, 10Reading-Web-Trending-Service, 06Services (watching): Upgrade librdkafka 0.9.4 on SCB and Varnishes - https://phabricator.wikimedia.org/T159379#3098592 (10Ottomata) Included 0.9.4 in [[ https://apt.wikimedia.org/wikimedia/pool/backports/libr/librdkafka/ |...
[17:00:27] (03CR) 10Elukey: [V: 032 C: 032] Configure no pool/depool checks for AQS Beta [analytics/aqs/deploy] - 10https://gerrit.wikimedia.org/r/342649 (owner: 10Elukey)
[17:03:07] joal, fdans --^
[17:03:15] much better now
[17:03:21] I tried scap-deploy --force
[17:03:28] it fails, but only for this:
[17:03:34] 17:01:53 [deployment-aqs01.deployment-prep.eqiad.wmflabs] Check 'endpoints' failed: /legacy-pageviews/per-project/{project}/{access-site}/{granularity}/{start}/{end} (Get legacy pageviews) is CRITICAL: Test Get legacy pageviews returned the unexpected status 404 (expecting: 200)
[17:03:49] mforns: --^
[17:04:00] so the issue with the scap deployment should be gone
[17:04:15] now it is an application issue :D
[17:04:24] elukey, oh!
[17:04:48] elukey, but... what is that test?
[17:05:05] so it says 17:01:52 [deployment-aqs01.deployment-prep.eqiad.wmflabs] Executing check 'endpoints'
[17:05:32] command: check_endpoints_aqs
[17:06:22] it must be a script deployed somewhere
[17:06:37] ???
[17:06:44] OK I will look for it
[17:07:03] I am looking for it too but can't find it
[17:07:09] thanks for digging this out!
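For context, scap3 drives these checks from a checks.yaml shipped in the deploy repo. Based on the "promote and restart_service stage(s)" output and the check names above, the stanza probably looks roughly like this - a hedged sketch, not the actual file, and the type values are assumptions:

```yaml
checks:
  depool:
    type: command
    stage: promote
    command: depool
  endpoints:
    type: nrpe
    stage: restart_service
    command: check_endpoints_aqs
```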
[17:07:17] don't worry, will look for it tomorrow
[17:07:34] thank you for looking at that elukey
[17:07:39] milimetric: had to reschedule our meeting with research
[17:11:25] yep, I saw, no problem
[17:11:45] irccloud seems to have decided to randomly disable notifications, sorry if I miss pings
[17:17:06] mforns: it seems to be an nrpe check that services provides by adding "command: check_endpoints_" to the scap config
[17:17:42] but I am not sure how it works :(
[17:17:46] elukey, oh wow
[17:18:14] am relocating, bbs
[17:18:37] elukey, by the error message it seems that it is expecting data to be returned, no data means the endpoint returns 404
[17:20:18] elukey, oh, I think I know where it comes from
[17:20:30] (03CR) 10Fdans: [V: 032] Correct per-project endpoint date check [analytics/aqs] - 10https://gerrit.wikimedia.org/r/342601 (https://phabricator.wikimedia.org/T160311) (owner: 10Joal)
[17:21:03] k will try to sort it out
[17:21:24] joal: both changes merged. Deploy?
[17:21:53] (03CR) 10Nuria: ">We could have an oozie workflow, that executes this script and then the hql with the creation >of the table." (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/338786 (https://phabricator.wikimedia.org/T158330) (owner: 10Mforns)
[17:23:02] (afk 5min)
[17:24:09] mforns: Marko explained to me what to do, fixing it in a sec
[17:24:55] elukey, I think it's related to the code in v1/legacy-pageviews.yaml
[17:25:12] in the x-amples section
[17:25:42] (03PS1) 10Elukey: Disable the scap endpoint check in beta (not working) [analytics/aqs/deploy] - 10https://gerrit.wikimedia.org/r/342652
[17:25:49] mforns: yeah but that check is not supposed to work in beta :(
[17:26:33] I'm not sure it works anywhere...
[17:26:39] the unique devices test requests data for 1970-01-01 and expects a 200
[17:26:47] ahahaha
[17:26:59] I'm sure production cassandra doesn't hold data for that year
[17:27:16] (03CR) 10Elukey: [V: 032 C: 032] Disable the scap endpoint check in beta (not working) [analytics/aqs/deploy] - 10https://gerrit.wikimedia.org/r/342652 (owner: 10Elukey)
[17:27:28] :] thanks elukey
[17:28:32] working! :)
[17:28:44] fdans: I deployed your last changes too (I think) - /srv/deployment/analytics/aqs/deploy-cache/revs/bfe22774b31ea16388fe2fc825462d676e9421de
[17:28:56] can you double check on aqs01 that everything is fine?
[17:29:13] so now it restarts the service without doing depool/pool and the service check
[17:32:06] awesome elukey
[17:32:21] I need to leave now, but I'll check tomorrow
[17:32:33] see you team tomorrow! byeee
[17:32:36] o/
[17:34:19] going afk as well, let me know if the aqs beta deployments are good now people :)
[17:34:22] byeeee
[17:34:34] (03CR) 10Nuria: Add mediawiki history spark jobs to refinery-job (031 comment) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/325312 (https://phabricator.wikimedia.org/T144717) (owner: 10Joal)
[17:36:29] (03CR) 10Nuria: "We will need to add this job to the documentation about historical project counts so we know what code was run in case we find future oddi" (031 comment) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/337593 (https://phabricator.wikimedia.org/T156388) (owner: 10Mforns)
[17:37:10] (03CR) 10Nuria: "Anyone that knows more scala than me should merge though." [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/337593 (https://phabricator.wikimedia.org/T156388) (owner: 10Mforns)
[17:39:50] (03CR) 10Nuria: Update static wiki projects list (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/342030 (https://phabricator.wikimedia.org/T160153) (owner: 10Joal)
[17:39:55] nuria: I'll be there in a few minutes
[17:40:07] Let's sync over IRC?
[17:43:58] back
[17:44:31] Krinkle: okeis, everyone here ? cc ottomata
[17:45:10] Krinkle: we have 3 puppet changes to merge, 2 apply to the setup of the webperf consumer, the third one applies to what it actually does.
[17:45:13] cc ottomata
[17:45:17] fdans: too late for deploy IMO - elukey ?
[17:45:32] thus, let's deploy the changes that have to do with setup first, sounds good?
[17:45:39] yeah thought so joal, let's do tomorrow morning?
[17:45:42] ^ ottomata Krinkle ?
[17:45:56] Sounds good
[17:45:59] elukey: ok for a deploy of AQS + refinery tomorrow morning?
[17:46:05] eventlogging/de-zmq related first?
[17:46:15] Krinkle: ok let's wait for ottomata, he accepted the meeting so he's around
[17:47:18] ya
[17:47:24] i'm here
[17:47:45] Krinkle: ok let's deploy these changes 1st:
[17:47:46] Krinkle: this one first?
[17:47:46] https://gerrit.wikimedia.org/r/#/c/341724/4
[17:48:22] oh https://gerrit.wikimedia.org/r/#/c/341723/ is parent of that
[17:48:23] https://gerrit.wikimedia.org/r/#/c/341723/
[17:48:28] Krinkle: can we do ^ first ?
[17:48:49] (03CR) 10Joal: Add mediawiki history spark jobs to refinery-job (032 comments) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/325312 (https://phabricator.wikimedia.org/T144717) (owner: 10Joal)
[17:49:05] ottomata: I'd do them in one go if you don't mind
[17:49:09] But yeah, parent first
[17:49:11] ok fine with me
[17:49:11] k
[17:49:20] proceeding
[17:49:35] ve.py local eventlogging, navtiming zmq>el, navtiming ua_parse
[17:49:51] will do navtiming as separate, cause we gotta deploy anyway i thin...
[17:49:52] Krinkle: best dashboard to watch is this one right? https://grafana.wikimedia.org/dashboard/db/navigation-timing-by-browser
[17:50:02] nuria: Yes
[17:50:15] Especially the rate limit graph on the bottom of that one will help
[17:50:24] rate, not rate limit
[17:50:24] running puppet on hafnium..
[17:50:48] sorry, 'ua_parse as separate...' ^^^
[17:51:27] Krinkle: that is the rate of data coming into grafana?
[17:51:55] Yeah, it's the number of metrics reported per browser per minute into statsd>Graphite>Grafana
[17:52:02] https://grafana.wikimedia.org/dashboard/db/navigation-timing-by-browser?panelId=6&fullscreen&var-metric=mediaWikiLoadComplete
[17:52:10] ok merged, run, it restarted
[17:52:11] firstPaint is default, but only in IE/Chrome. mwLoadComplete will show all browsers
[17:52:11] look ok?
[17:52:17] k, i see, so you can use it for alarming too
[17:52:37] ottomata: the same ua parser debian install we have in the eventlogging machine should be on hafnium
[17:53:45] nuria: this required an el code change, right?
[17:53:55] ?
[17:54:00] oh, ok nm
[17:54:00] https://grafana.wikimedia.org/dashboard/db/visualeditor-load-save?panelId=12&fullscreen&from=now-30m&to=now-3m
[17:54:01] ottomata: no
[17:54:01] i get it
[17:54:01] https://grafana.wikimedia.org/dashboard/db/navigation-timing-by-browser?panelId=6&fullscreen&var-metric=mediaWikiLoadComplete&from=now-15m&to=now
[17:54:29] this is actually making the webperf consumer able to consume our upcoming changes
[17:54:33] ottomata: Yeah, it's not using ua_parse in navtiming :)
[17:54:51] ok, so, this is fine to merge then?
[17:54:52] https://gerrit.wikimedia.org/r/#/c/337158/22
[17:54:54] We had our own regexes on userAgent raw, and we're adding future-compat for it being an object preparsed by EL
[17:55:01] ottomata: so all code we are deploying now should make no difference but install should not fail as ua_parse should be available as a debian package
[17:55:25] ya, python-ua-parser - Python port of Browserscope's user agent parser - it is installed
[17:55:27] on hafnium
[17:55:36] Why would install even look at ua_parse?
[17:55:38] yeah
[17:55:47] it shouldn't, right, cause the EL processor on eventlog1001 is doing the parsing
[17:55:55] Yep
[17:55:59] this is just adapting the consumer on hafnium to deal with the change in format
[17:56:07] ottomata: ah yes, you guys are right
[17:56:21] Krinkle: as it is not used for anything
[17:56:24] right right
[17:56:35] sorry for the confusion
[17:56:43] ok, gimme +1 to merge https://gerrit.wikimedia.org/r/#/c/338044/ and https://gerrit.wikimedia.org/r/#/c/337158/
[17:57:18] VE and navtiming look good. Both were restarted, right?
[17:57:25] (03PS3) 10Joal: Update static wiki projects list [analytics/refinery] - 10https://gerrit.wikimedia.org/r/342030 (https://phabricator.wikimedia.org/T160153)
[17:57:28] from the consumer refactor we just merged
[17:57:30] ya puppet restarted them
[17:57:31] Info: /Stage[main]/Webperf::Navtiming/File[/srv/webperf/navtiming.py]: Scheduling refresh of Service[navtiming]
[17:57:41] Info: /Stage[main]/Webperf::Ve/File[/lib/systemd/system/ve.service]: Scheduling refresh of Service[ve]
[17:57:48] ottomata: I'll re-run tests on 337158 one more time
[17:58:02] k
[17:58:27] OK. passing :)
[17:58:42] k, so merge em?
[17:58:49] Krinkle: of course they pass!
[17:59:03] :) - just checking since Jenkins doesn't run them and there were a couple rebases
[17:59:10] Krinkle: ah yes yes
[17:59:12] oh, see your +1, merging...
[18:00:32] running puppet
[18:01:29] ok, navtiming updated and restarted
[18:03:04] ottomata: k
[18:03:46] ottomata, Krinkle let's babysit for a bit no?
[18:04:00] Yeah, 5min should suffice
[18:04:09] ottomata, Krinkle; the change here we will see with our EL deployment which is coming next
[18:04:10] cool
[18:04:34] Ah okay, so that's happening now as well. Alrighty then
[18:04:42] ottomata: i am happy to do that one but if both of you are around it will be great
[18:06:40] Krinkle: ya, no?
[18:06:48] Krinkle: makes sense since we are all here
[18:06:55] Yep
[18:07:00] Krinkle: if there is an issue we will revert the EL change
[18:07:09] and fix perf consumer if needed
[18:07:15] cc ottomata makes sense?
[18:08:04] 06Analytics-Kanban, 13Patch-For-Review: Provide 2 static files to differentiate prod and labs projects to sqoop in - https://phabricator.wikimedia.org/T160153#3098901 (10JAllemandou) p:05Normal>03Triage
[18:08:07] i'm ready
[18:08:08] ya
[18:08:14] nuria: ya let's do it
[18:08:20] ok, merging
[18:08:39] ottomata: i have tested this on beta so i hope there are no surprises
[18:08:51] k
[18:09:25] leaving for tonight a-team, see you tomorrow for another day of CRs :)
[18:09:51] laters!
[18:11:07] ottomata: k drunning deploy
[18:11:11] *running
[18:11:16] 06Analytics-Kanban, 15User-Elukey: Ongoing: Give me permissions in LDAP - https://phabricator.wikimedia.org/T150790#2796693 (10ovasileva) Hi - I am requesting access to pivot by being added to the wmf group (I am the product manager for the reading-web team - https://meta.wikimedia.org/wiki/User:OVasileva_(WMF...
[18:12:39] ottomata: now my doubt is...
[18:12:52] ottomata: do we need to build anything on the eventlogging machine?
[18:12:56] build anything?
[18:14:03] ottomata: python build the eventlogging package on the machine?
[18:14:16] ottomata: or just restart?
[18:14:46] build? naw just restart, if code is deployed, just gotta restart it
[18:15:34] ottomata: k
[18:16:26] ottomata: k restarted let me see if UAs appear parsed
[18:17:22] nuria:
[18:17:22] "userAgent": "{\"os_minor\": \"11\", \"os_major\": \"10\", \"device_family\": \"Other\", \"os_family\": \"Mac OS X\", \"browser_minor\": \"0\", \"wmf_app_version\": \"-\", \"browser_major\": \"56\", \"browser_family\": \"Chrome\"}"
[18:17:24] looks good ja?
[18:17:26] json string?
[18:17:32] yes, it must be a string
[18:17:35] cool
[18:17:38] ya then its working
[18:17:57] ottomata: let me look at several though
[18:18:22] k
[18:20:48] ottomata: i see some UAs truncated when inserting into db
[18:21:49] uh oh
[18:21:52] string too long?
[18:22:07] ottomata: yes
[18:22:20] hmmm
[18:22:24] ottomata: but not all of them, just some
[18:23:08] yeah the long ones... :)
[18:23:15] nuria: what table are you looking at?
[18:23:49] I was looking at PageContentSaveComplete_5588433
[18:24:03] varchar(191)
[18:24:06] userAgent
[18:24:52] nuria: https://github.com/wikimedia/eventlogging/blob/master/eventlogging/jrm.py#L76-L79
[18:25:21] ua will never be an index column
[18:25:21] hm
[18:25:38] ottomata: right
[18:26:29] are any varchars ever an index column for EL?
[18:26:52] ottomata: tables do not have indexes by default unless they have been added
[18:27:25] ottomata: but the answer could be yes, i guess
[18:27:25] yea
[18:27:57] i dunno, what should we do?
[18:28:23] ottomata: let's try to quantify how many go over that limit
[18:28:41] ok
[18:28:53] ottomata: we can do it using kafkacat so it should not be hard
[18:29:25] ottomata: to be fair i see data truncated errors for other columns too
[18:29:44] ottomata: but UA is the most prevalent one maybe
[18:30:00] ottomata: ahem .. maybe not
[18:30:58] ottomata:
[18:30:58] at which layer is it truncated? Before or after mysql?
[18:31:04] before our changes:
[18:31:08] https://www.irccloud.com/pastebin/ZGNzOSvy/
[18:31:12] Krinkle: it's mysql
[18:31:22] Krinkle: but plenty of those before too
[18:31:30] If we distribute invalid JSON in the userAgent property, then we should revert and maybe change it to store null instead of a truncated string
[18:31:34] Krinkle: let me see if the number has increased significantly
[18:32:09] Krinkle: ya, it's a shortcoming of the EL consumer for all columns that go over 191 though
[18:32:20] Krinkle: not just this one
[18:32:33] nuria: that means kafka/zmq consumers are affected?
[18:32:42] Krinkle: no
[18:32:52] Krinkle: data is all good in kafka
[18:33:05] Krinkle: as length limits are only on the db
[18:33:12] Okay
[18:34:24] nuria: i did a quick consume of 1000 lines
[18:34:42] and counted ua strings < 192 and > 192
[18:34:46] i got about 1/3 are > 192
[18:34:54] ottomata: ya
[18:34:54] that is with escaped " though
[18:34:59] so, probably smaller than that
[18:35:05] but it is still pretty significant
[18:35:49] ottomata: i looked and in every log rotation of the consumer there are about ~6500 errors of > length for UA prior to our change
[18:36:04] oh ya, that makes sense
[18:36:13] i bet the strings were longer before
[18:36:15] since they had much more data
[18:36:17] right?
[18:36:23] although, now its json so its got field names too
[18:36:28] could even out in the wash?
[18:36:28] but
[18:36:35] but, truncated json is pretty bad
[18:36:37] it won't parse
[18:36:41] ottomata: ya, could be a wash
[18:36:50] ottomata: ya, that is why it's worse
[18:37:07] i think this 191 char limit is a bad idea
[18:37:09] can we just increase it?
[18:37:14] do we know of indexes?
[18:37:23] wouldn't really help, as we'd have to alter all the existing tables
[18:37:24] engh
[18:37:25] ungh
[18:37:38] ottomata: i do not see why we could not change it but alters would be expensive
[18:37:54] ya
[18:38:03] ottomata: but the number of errors is much greater now
[18:38:04] we could change EL code for future tables
[18:38:05] but meh
[18:38:06] ottomata: than before
[18:38:12] about 5 times larger
[18:38:18] ottomata: so it is not a wash
[18:38:43] aye
[18:38:45] ottomata: so we should probably revert
[18:38:52] k
[18:38:54] ottomata: but still the fix is not trivial
[18:38:57] ya
[18:39:03] as we can 1) make this column greater
[18:39:16] ottomata: ya, let's revert
[18:39:20] ottomata: doing
[18:41:10] ottomata: this is the revert: https://gerrit.wikimedia.org/r/#/c/342663/1
[18:41:29] nuria: i wonder if this indexing limit is even true anymore
[18:41:34] now that the tables are tokudb
[18:41:47] ottomata: i bet not, let's ask jaime
[18:41:53] nuria: merged
[18:43:05] !log rolling back prior eventlogging deployment, userAgent column is restricted to 191 chars, needs to be bigger or UAs are truncated
[18:43:05] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:45:06] 06Analytics-Kanban, 10ChangeProp, 06Operations, 10Reading-Web-Trending-Service, 06Services (watching): Upgrade librdkafka 0.9.4 on SCB and Varnishes - https://phabricator.wikimedia.org/T159379#3099128 (10Ottomata) librdkafka has been upgraded to 0.9.4 on all cache hosts, and varnishkafka has been restart...
[18:45:33] ottomata: deployed revert
[18:45:41] ottomata: all back to how it used to be
[18:45:56] ottomata: let's ask jaime or manu if they are still there no?
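The quick count described above ("consume 1000 lines, count ua strings < 192 and > 192") can be done in one pipeline; this is a sketch, the broker and topic names are illustrative, and the jq access assumes userAgent sits at the top level of each event:

```
# consume 1000 events, pull out the userAgent JSON string, and count
# how many exceed the varchar(191) column limit discussed above
kafkacat -C -b kafka1012.eqiad.wmnet:9092 -t eventlogging-valid-mixed -c 1000 -e \
  | jq -r '.userAgent' \
  | awk 'length > 191 { long++ } length <= 191 { short++ }
         END { printf "<=191: %d  >191: %d\n", short, long }'
```

Note the caveat from the log: escaped quotes inflate the raw lengths, so this slightly overcounts versus what would actually land in the column.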
[18:46:08] Krinkle: so no UA changes, so metrics should be unaffected
[18:46:29] Krinkle: we need to sort out the issue with the length of the column
[18:47:13] Okay
[18:56:37] 06Analytics-Kanban, 10DBA: Change length of userAgent column on EL tables - https://phabricator.wikimedia.org/T160454#3099166 (10Nuria)
[18:58:07] 06Analytics-Kanban, 10DBA: Change length of userAgent column on EL tables - https://phabricator.wikimedia.org/T160454#3099215 (10Nuria) p:05Triage>03High
[18:58:33] 06Analytics-Kanban, 10DBA: Change length of userAgent column on EL tables - https://phabricator.wikimedia.org/T160454#3099219 (10jcrespo) Why not deploy a new schema instead? Are the already inserted user agents going to change?
[18:59:10] 06Analytics-Kanban, 10DBA: Change length of userAgent column on EL tables - https://phabricator.wikimedia.org/T160454#3099221 (10Nuria)
[18:59:46] 06Analytics-Kanban, 10DBA: Change length of userAgent column on EL tables - https://phabricator.wikimedia.org/T160454#3099166 (10Nuria)
[19:00:28] 10Analytics-EventLogging, 06Analytics-Kanban, 13Patch-For-Review: Change userAgent field to user_agent_map in EventCapsule - https://phabricator.wikimedia.org/T153207#3099228 (10Nuria) We had to revert these changes due to an issue with database columns that we did not detect in beta: https://phabricator.wiki...
[19:12:22] 06Analytics-Kanban, 10DBA: Change length of userAgent column on EL tables - https://phabricator.wikimedia.org/T160454#3099300 (10Nuria) >Are the already inserted user agents going to change? no >Why not deploy a new schema instead? A new schema would not change length of column as that is hardcoded on db so...
[19:15:37] (03CR) 10Ottomata: "Any thoughts on 'version' partition name?" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/341586 (https://phabricator.wikimedia.org/T160152) (owner: 10Joal)
[19:16:08] 06Analytics-Kanban: Create cron job in puppet sqooping prod and labs DBs - https://phabricator.wikimedia.org/T160083#3099318 (10Ottomata) Waiting for https://gerrit.wikimedia.org/r/#/c/341586/ to land before I proceed with this.
[19:17:01] 10Analytics, 10Analytics-Cluster, 06Operations, 10hardware-requests: EQIAD: stat1003 replacement - https://phabricator.wikimedia.org/T159839#3099320 (10Ottomata) Bump @robh
[19:17:03] 10Analytics, 10Analytics-Cluster, 06Operations, 10hardware-requests: EQIAD: stat1002 replacement - https://phabricator.wikimedia.org/T159838#3099321 (10Ottomata) Bump @robh
[19:17:54] 10Analytics, 10Analytics-Cluster: Enable hyperthreading on analytics100[12] - https://phabricator.wikimedia.org/T159742#3099327 (10Ottomata) @elukey, can we do this when we reimage these as part of T157807 ?
[19:26:44] 06Analytics-Kanban, 10DBA: Change length of userAgent column on EL tables - https://phabricator.wikimedia.org/T160454#3099400 (10jcrespo) > other than alter would need to run in a smaller set of tables, newly created. For me that would be a huge win- it could be deployed in seconds, rather than weeks. > Cha...
[19:38:50] 10Analytics, 10DBA, 06Operations: Prep to decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#3099433 (10jcrespo) To be clear: there are 3 machines with eventlogging/analytics stuff (among other)- db1046, db1047 and dbstore1002 (this last one should still be ok for a c...
[19:47:11] what domains are supported by the Pageviews API?
[19:47:31] https://github.com/wikimedia/analytics-refinery/blob/master/static_data/pageview/whitelist/whitelist.tsv
[19:49:07] thx musikanimal
[19:49:16] so at a glance, SUL wikis + foundationwiki?
[19:49:51] and some chapter wikis, I think
[19:50:13] all the *.wikimedia.org's
[19:52:05] no nyc.wikimedia though :(
[19:55:43] 06Analytics-Kanban, 10DBA: Change length of userAgent column on EL tables - https://phabricator.wikimedia.org/T160454#3099462 (10Nuria) >other than alter would need to run in a smaller set of tables, newly created. Sorry, I did not explain this well enough. A capsule change would help little in this case...
[19:56:53] tgr: but we can support what you need most likely, it is a whitelist for a reason.
[19:57:08] cc musikanimal
[19:57:57] tgr|away: if you need nyc you can send us a CR or phab ticket cc musikanimal
[19:58:18] tgr|away: and ... now that i look at the code this might be a bug
[19:58:31] no I'm not really complaining about nyc.wikimedia, all the relevant pages go to enwiki anyway hah =P
[19:58:46] it might be cool to see how many hits the main page gets, though
[19:59:04] nuria: just trying to figure out what wikis to enable the PageViewInfo extension on
[19:59:26] tgr: ah ok
[20:01:05] tgr: but yes, when it comes to chapters we only added the ones that asked us, so for nyc this needs to be changed: https://github.com/wikimedia/analytics-refinery-source/commit/2d96d1ac02e44f09dda2857491a6b43486968484#diff-618a01f5939249ef8a87b40c4d508011R51
[20:03:43] I can ask Pharos if he's interested, but don't change it just for me :)
[20:10:45] 10Analytics, 10DBA, 06Operations: Prep to decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#3099522 (10Ottomata) Oh yeah, rats, I totally forgot to put this in our budget request. Hm. do db1046 and db1047 host just EL data, or also wiki dbs?
[20:25:57] 10Analytics, 10Analytics-EventLogging: Research Spike: Better support for Eventlogging data on hive - https://phabricator.wikimedia.org/T153328#2877030 (10Ottomata) Kite (+Morphlines?) seems to make this way easier. I didn't get it working with partitions yet, but http://kitesdk.org/docs/1.1.0/Using-the-Kit...
[20:32:59] musikanimal: will do not change
[20:33:41] musikanimal: sorry, we will not change it but it's easy to do, file a CR or phab ticket
[20:34:07] 10Analytics, 10Analytics-EventLogging: Find an alternative query interface for eventlogging Mariadb storage - https://phabricator.wikimedia.org/T159170#3099656 (10Ottomata)
[20:34:09] 10Analytics: Project: migrate mysql eventlogging access to hadoop - https://phabricator.wikimedia.org/T145527#3099658 (10Ottomata)
[20:34:27] 10Analytics, 10Analytics-EventLogging: Research Spike: Better support for Eventlogging data on hive - https://phabricator.wikimedia.org/T153328#3099660 (10Ottomata) https://github.com/kite-sdk/kite-examples/blob/master/json/README.md
[20:39:40] nuria: ok thanks!
[21:28:58] 10Analytics, 10Analytics-EventLogging, 10DBA, 10ImageMetrics: Drop EventLogging tables for ImageMetricsLoadingTime and ImageMetricsCorsSupport - https://phabricator.wikimedia.org/T141407#3099979 (10TerraCodes)
[22:26:53] Krinkle: nuria: How does access to Hue work? I don't know exactly what it offers, but if it offers the ability to query like I can do on stat1002, then I'd be interested in trying that. I've seen various references to it, but not sure how to log in.
[22:39:51] 06Analytics-Kanban, 10DBA: Change length of userAgent column on EL tables - https://phabricator.wikimedia.org/T160454#3099166 (10Nuria) p:05High>03Triage
[22:41:09] Krinkle: http://hue.wikimedia.org
[22:41:24] Krinkle: have you tried pivot too? http://pivot.wikimedia.org
[22:41:56] nuria: Yeah, I see links to it in various places, but none of the logins I know work
[22:41:58] Krinkle: hue lets you build queries but ui is slow, you shall suffer, and being a perf person probably you will suffer more than others
[22:42:08] pivot works for me
[22:42:10] great
[22:42:15] but not sure how to log in to hue
[22:42:30] Krinkle: ok, pivot is all visual but it is likely that for your use cases it might be sufficient
[22:42:46] Krinkle: is it browser and pageview data that you are interested in?
[22:43:12] Krinkle: if pivot works but hue does not, do file a ticket, there must be an issue
[22:43:15] nuria: Not sure yet, just wondered what's available in Hue, and if it overlaps with what I have access to, it might be a new way to explore and worth trying
[22:43:21] Okay, will do.
[22:43:27] nuria: Which access group is it supposed to be using?
[22:43:38] Krinkle: nda users and wmf employees
[22:43:48] Using LDAP/Labs for authentication?
[22:44:16] Krinkle: ldap yes, labs ... not so sure, i'd say no
[22:44:39] Krinkle: ah, you mean a wikitech user when you say labs?
[22:44:44] Yeah
[22:44:52] same as gerrit/grafana-admin/pivot etc.
[22:45:01] Krinkle: yes, it is that user but after we have verified you have an nda
[22:45:05] OK Works now. Not sure why it didn't earlier. I assumed I had to be whitelisted somehow, but works now
[22:45:19] Krinkle: or that you are an employee of wikimedia de /us.. etc
[22:45:29] Krinkle: hue and pivot ?
[22:45:36] yep
[22:45:39] Pivot I had used before already
[22:46:58] k
[22:50:08] Krinkle: ok, let me know how you find things
[22:50:41] nuria: I primarily query Kafka or Hive from stat1002 using kafkacat or hive
[22:51:00] Krinkle: right cause just want to see NavigationTiming data?
[22:51:13] Krinkle: or do you query webrequest stream directly?
[22:51:16] I've been using Pivot a bit recently for pageview stuff, which has been very nice. And for the most part I don't query pageview in any way since analytics.wikimedia.org gives me most of what I need already.
[22:52:02] nice! will make sure teams know that
[22:52:14] But every once in a while when working on operations/mediawiki-config I'll query raw webrequest to debug how and if certain deprecated endpoints are still used.
[22:52:31] Krinkle: I see, kind of to see user requests
[22:52:34] Krinkle: so funny
[22:52:42] e.g. something very specific like domain=wikimediafoundation.org and path LIKE '/logos/%' or some such
[22:52:48] https://github.com/wikimedia/operations-mediawiki-config/tree/master/docroot/foundation/logos
[22:53:03] Krinkle: yeah, that is the easiest way to see the incoming stream of user requests
[22:53:18] And let it run for a long time on the past 2 months and end up getting 3 results and then deciding to delete the files
[22:53:39] I opened Hue for the first time just now, and was greeted with a world map of some query ottomata created a year or so ago. That looked rather impressive.
[22:54:06] Hm.. actually it's from 2017-01
[22:55:23] nuria: See https://phabricator.wikimedia.org/T128115#2439788 and further comments for some of the queries I did recently
[22:55:32] Those were a lot of work to do, and not sure if there's an easier way to do so
[22:55:37] they all involve statsv
[22:56:11] That we did I think as a PoC to make sure hue was running ok when we upgraded
[22:56:24] Krinkle: i think they are super slow but not super hard
[22:56:38] basically I just open up a giant kafkacat pipe to the last 2 weeks of statsv messages and then grep it to just the metrics I was interested in. And using the fact that it contains a userAgent to extract additional information to learn more about the individual metrics. It's like a lightweight version of EventLogging.
[22:59:41] In this case I had to manually parse the user agents, which took a long time to do. (used xargs with parallel python processes all invoking ua-parser to speed it up, but still..)
[23:16:29] Krinkle: if we ever deploy these changes you will not need to do the parsing anymore
[23:16:49] nuria: even for statsv/kafka?
[23:17:34] I guess the one that currently happens when feeding hive will happen earlier at kafka level and then be re-used everywhere?
[23:17:36] Krinkle: no, sorry, only for EL, true, the webrequest with processed urls is accessible via hive but not kafka
[23:17:48] and processed user_agents
[23:18:01] ah okay, the plan is to keep the raw UA in kafka
[23:18:20] is that a feature, or is it for efficiency/stability?
[23:18:58] Krinkle: it is scale i'd say, if we processed pageviews via streaming (not batch) we could easily do that
[23:19:15] Krinkle: live stream| some | processed stream
[23:19:48] Krinkle: and we are going to do a poc about that but it has many caveats: no reruns/how do you account for data loss?
[23:20:21] Yeah, if the ua parser has problems or needs to be updated (which it does need, periodically, to account for new browsers)
[23:20:27] then you'd like to repopulate any known bad data
[23:20:34] Krinkle: so to be operationally sound we would need batch processing (delayed), and real time would be a bonus that could be used for your purpose and others
[23:20:53] Krinkle: which reminds me we have not updated it in a while
[23:21:11] Krinkle: i still have on my queue the opera turbo change
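A sketch of the parallel ua-parser trick mentioned above, assuming the user agents have already been grepped out of the kafkacat output into a file (the file name and batch sizes are illustrative; python-ua-parser is the Debian package referenced earlier in the log):

```
# eight python workers in parallel, 50 user agents per batch,
# one raw user agent string per input line in user_agents.txt
xargs -a user_agents.txt -d '\n' -n 50 -P 8 python3 -c '
import sys
from ua_parser import user_agent_parser  # from the python-ua-parser package
for ua in sys.argv[1:]:
    parsed = user_agent_parser.Parse(ua)
    print(parsed["user_agent"]["family"], parsed["user_agent"]["major"])'
```

Each worker receives its batch of user agents as argv entries, so the per-process Python startup and regex-compilation cost is amortized over 50 parses instead of being paid once per user agent.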