[07:54:17] 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Add a prometheus metric exporter to all the Druid daemons - https://phabricator.wikimedia.org/T177459#3807906 (10Jason821) Hi all, I'm trying to run this program, but haven't figured out how. I'm running 'python exporter.py -l xxxx:8000' to run it on my... [08:55:35] 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Add a prometheus metric exporter to all the Druid daemons - https://phabricator.wikimedia.org/T177459#3807947 (10elukey) Hi! Can you give me a bit more detail about the "doesn't work well"? https://github.com/wikimedia/operations-software-druid_exporte... [09:15:14] morning! [09:15:23] I am going to restart the hadoop reboots [09:15:36] analytics1040 -> analytics1068 [09:26:56] 10Analytics-Kanban, 10User-Elukey: Restart Analytics JVM daemons for open-jdk security updates - https://phabricator.wikimedia.org/T179943#3807997 (10elukey) >>! In T179943#3807993, @Stashbot wrote: > {nav icon=file, name=Mentioned in SAL (#wikimedia-operations), href=https://tools.wmflabs.org/sal/log/AWAg10Op... [09:59:35] helloooo team [09:59:59] o/ [10:01:26] hi elukey how was last week? [10:02:09] mforns: good! What about you?? [10:05:23] 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Add a prometheus metric exporter to all the Druid daemons - https://phabricator.wikimedia.org/T177459#3808125 (10Jason821) Hi, I've made it work already! But I didn't see JVM/GC related metrics collected in prometheus, which display in your demo -> htt... [10:08:31] elukey, good! a bit stressed out, but everything starts to cool down now [10:08:39] 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Add a prometheus metric exporter to all the Druid daemons - https://phabricator.wikimedia.org/T177459#3808141 (10elukey) >>! In T177459#3808125, @Jason821 wrote: > Hi, > > I've made it work already! But I didn't see JVM/GC related metrics collected in... [10:13:15] mforns: how's the weather? 
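The exporter invocation being debugged above (`python exporter.py -l xxxx:8000`) boils down to binding an HTTP listener that serves Prometheus text-format metrics. A minimal stdlib-only sketch of that idea, for orientation only — the metric name here is invented and this is not the real druid_exporter code:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class MetricsHandler(BaseHTTPRequestHandler):
    """Serve Prometheus text-format metrics on /metrics."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        # Hypothetical metric; the real druid_exporter emits druid_* series
        # parsed from the daemons' emitted events.
        body = (
            "# HELP demo_requests_total Demo counter.\n"
            "# TYPE demo_requests_total counter\n"
            "demo_requests_total 1\n"
        ).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Keep the demo quiet.
        pass

def serve(port=0):
    """Bind like `exporter.py -l host:port`; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

if __name__ == "__main__":
    srv = serve()
    url = "http://127.0.0.1:%d/metrics" % srv.server_port
    print(urllib.request.urlopen(url).read().decode())
```

If this page is reachable with curl on the configured port, Prometheus can scrape it; that is the "make it run" half of the exchange above, separate from the later question of which metrics actually show up.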
[10:14:08] elukey, weather is nice, not so hot, a bit rainy, bit cold at night [10:28:01] :) [10:28:22] mforns: not sure if you've read the updates about the analytics db but this is the summary [10:28:50] 1) I backed up the remaining "unowned" dbs from db1047 to my home dir on stat1006 [10:29:08] aha [10:29:11] 2) assigned role::spare::system (and shut down mysql) on db104[67] [10:29:30] 3) dropped the log database from dbstore1002 [10:29:50] cool [10:30:32] 10Quarry, 10Discovery, 10VPS-Projects, 10Wikidata, and 2 others: Setup sparqly service at https://sparqly.wmflabs.org/ (like Quarry) - https://phabricator.wikimedia.org/T104762#3808229 (10Lucas_Werkmeister_WMDE) I don't think that's a good fit. Query results aren't necessarily notable for Commons, nor are... [10:31:27] awesome elukey, I've seen the EL purging task resolved :D [10:33:40] mforns: it runs daily and afaics it should be good, but if you want to check on db1108 [10:41:22] 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Add a prometheus metric exporter to all the Druid daemons - https://phabricator.wikimedia.org/T177459#3808258 (10Jason821) Druid.io doesn't seem to have a jmx interface, how can I collect those JVM metrics with this jmx_exporter? I only see JVM/GC metric... [10:44:58] 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Add a prometheus metric exporter to all the Druid daemons - https://phabricator.wikimedia.org/T177459#3808262 (10elukey) >>! In T177459#3808258, @Jason821 wrote: > Druid.io doesn't seem to have a jmx interface, how can I collect those JVM metrics with th...
[11:14:33] PROBLEM - Hadoop NodeManager on analytics1056 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager [11:14:54] PROBLEM - Hadoop NodeManager on analytics1057 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager [11:15:31] I am draining them but it is taking ages --^ [11:15:34] downtime expired [11:25:03] RECOVERY - Hadoop NodeManager on analytics1057 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager [11:25:34] RECOVERY - Hadoop NodeManager on analytics1056 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager [11:32:52] The failures from oozie are due to my reboots [11:33:02] will restart jobs as soon as I finish [11:44:06] Does anyone know what is the limit on the FetchResults method in Hive? Some of my queries return org.apache.thrift.TException: Error in calling method FetchResults Error: org.apache.thrift.TException: Error in calling method CloseOperation (state=08S01,code=0) Error: Error while cleaning up the server resources (state=,code=0) and some: Exception in thread "main" java.lang.OutOfMemoryError: Java heap space. Of course I've reduced my appetite [11:45:21] GoranSM: how many rows were you trying to fetch? [11:45:31] (even a time window is fine) [11:46:26] elukey: Hi elukey. The LIMIT in the query was set to 10,000,000 - and it is possible that at least some of the queries hit it indeed. [11:48:16] not sure what the Xmx settings are, but they might not be enough for such a big result set [11:49:21] elukey: I thought so. Thanks. It is inessential for what I need, I've reduced the number of rows being fetched and that will do. For now.
See: https://phabricator.wikimedia.org/T181965 [11:50:24] will try to check, 10M rows are a lot though regardless of the Xmx settings :) [11:51:16] 10Analytics-Kanban, 10Operations, 10ops-eqiad: kafka1018 fails to boot - https://phabricator.wikimedia.org/T181518#3808499 (10mark) >>! In T181518#3800473, @elukey wrote: > Just spoke to Chris and Faidon on IRC, and with my team. The best option seems to be repurpose notebook1002.eqiad.wmnet to kafka1023.eqi... [11:53:30] elukey: no worries, the constraint is not pressing at all at the moment. The thing is that if we decide to have a system that tracks the frequency of usage of Wikidata items **for all items** - something that I'm working on - then we will need to see whether we can increase the limits set in the Xmx settings. Thanks a lot! [11:54:20] elukey: I will also take a look at the docs to see if anything like an incremental fetch from Hive is possible, or what the known workarounds are, if any. [12:04:24] !log re-run webrequest-load-check_sequence_statistics-wf-upload-2017-12-4-7 (failed due to reboots) [12:04:26] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [12:04:44] ahhahah lol [12:04:52] !log re-run webrequest-load-wf-upload-2017-12-4-8 (failed due to reboots) [12:04:54] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [12:09:14] * elukey lunch! [12:19:49] GoranSM: With a system tracking **all items**, you'd probably go for either storing results in HDFS or something similar, not fetch them as results of a hive query I assume [12:39:21] joal: Right. With a system going for all items I guess your favorite Druid pal will have to catch them up from Hive and act as a serving layer somewhere. [12:39:57] GoranSM: probably not druid, no - items dimension has too high of a cardinality - You'd go for cassandra I assume [12:40:52] joal: Thanks for letting me know.
In any case, it will be another database to learn to work with for a guy who fell in love with statistical modeling once :) [12:40:52] GoranSM: And this is with the idea of having to serve them through an api with low latency - if you only want to make offline analysis, even if slow, hive could be a good friend :) [12:41:19] GoranSM: hive or spark, obviously :) [12:42:09] joal: it seems possible to use the {SparkR} R package from stat1005 since the installation of Spark 2. Implies: you can probably expect many questions from me this month :) [12:43:09] GoranSM: I've never used sparkR, so I might not be a good helper, but I'll take questions :) [12:44:32] joal: :) Don't worry, just kidding, the {SparkR} docs are pretty elaborate [12:44:38] cool [12:47:19] helloooo elukey, are we still having the same ol trouble with the COPY command in cqlsh? [13:17:54] (nvm used insert statements like there's no tomorrow) [13:20:44] :D [13:20:58] I don't remember the COPY issue [13:21:03] buuut if you already solved it, good :) [13:21:22] joal: o/ - if we want we can test the druid middlemanager restart [13:22:25] fdans: I'm sorry to have to answer you that: tomorrow never dies (https://www.youtube.com/watch?v=9BelIZssK1k) [13:22:38] Heya elukey - Let's go for it [13:25:17] 10Analytics-Kanban, 10Patch-For-Review: Remove AppInstallIId from EventLogging purging white-list - https://phabricator.wikimedia.org/T178174#3808836 (10mforns) Hi @Tbayer and all, Sorry for not responding to T178174#3684426, didn't catch it. > Which schemas are affected exactly? Yes, searching for "appInstal... [13:27:28] joal: so I'd proceed with curl -X POST http://druid1002.eqiad.wmnet:8091/druid/worker/v1/disable [13:27:50] elukey: we have druid100[23] to restart correct?
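For the incremental fetch from Hive that GoranSM asks about further up, the usual DB-API workaround is to pull rows in fixed-size batches with `fetchmany` instead of one giant `fetchall`, so the client never holds the whole result set in its heap. A sketch of that pattern, demonstrated with sqlite3 so it is self-contained — the assumption (mine, not stated in the log) is that a DB-API Hive client such as pyhive exposes the same cursor interface:

```python
import sqlite3

def iter_rows(cursor, batch_size=10_000):
    """Yield rows in fixed-size batches via DB-API fetchmany,
    so the client never materializes the full result set."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield from batch

# Demo with sqlite3; a DB-API Hive cursor would be used the same way.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(25)])
cur = conn.execute("SELECT n FROM t ORDER BY n")
total = sum(n for (n,) in iter_rows(cur, batch_size=10))
print(total)  # 300 (= sum of 0..24)
```

This bounds client memory per batch; the server-side Xmx discussion above is a separate constraint that batching does not touch.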
[13:27:56] correct [13:27:59] ok [13:28:07] go for it [13:28:38] {"druid1002.eqiad.wmnet:8091":"disabled"} [13:30:15] (03PS21) 10Mforns: Add core class and job to import EL hive tables to Druid [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/386882 (https://phabricator.wikimedia.org/T166414) [13:30:53] the middlemanager is executing the banner task [13:31:15] elukey: it should drain - meaning, next hour, not contain any task [13:31:15] so I am wondering if I can stop the middle manager now or need to wait the hour [13:31:29] let's wait for 3pm [13:33:02] I think that we could already proceed, but there is no rush [13:34:50] (03CR) 10jerkins-bot: [V: 04-1] Add core class and job to import EL hive tables to Druid [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/386882 (https://phabricator.wikimedia.org/T166414) (owner: 10Mforns) [13:40:36] 10Analytics-Kanban, 10Operations, 10ops-eqiad, 10Patch-For-Review: kafka1018 fails to boot - https://phabricator.wikimedia.org/T181518#3808893 (10elukey) My idea is to: 1) set notebook1002 as spare::system and clean it up from running services. 2) use wmf-auto-reimage to rename the host to kafka1023 if po... [14:05:17] joal, days/months in the top metric with a single digit have to be filled by zeroes in the URL right? like /2015/5/5 should be /2015/05/05 [14:05:53] fdans: --verbose ? 
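The disable call above (`curl -X POST .../druid/worker/v1/disable`, answered with `{"druid1002.eqiad.wmnet:8091":"disabled"}`) targets the middlemanager's worker admin endpoint. A small Python helper that builds the same URLs — the `enable` counterpart is my assumption by symmetry, and the live POST is left commented out rather than executed:

```python
from urllib.parse import urlunsplit
import urllib.request  # only needed for the commented-out live call

def worker_admin_url(host, port, action):
    """Build the Druid middlemanager worker admin URL, e.g.
    http://druid1002.eqiad.wmnet:8091/druid/worker/v1/disable"""
    if action not in ("disable", "enable"):
        raise ValueError("unknown worker action: %r" % action)
    return urlunsplit(("http", "%s:%d" % (host, port),
                       "/druid/worker/v1/%s" % action, "", ""))

url = worker_admin_url("druid1002.eqiad.wmnet", 8091, "disable")
print(url)
# Equivalent of `curl -X POST <url>` (not executed in this sketch):
# urllib.request.urlopen(urllib.request.Request(url, method="POST"))
```

After disabling, the worker takes no new tasks, which is why the drain discussed below takes up to the rest of the hour: already-assigned hourly tasks still run to completion.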
[14:06:51] joal: when using the top pageviews endpoint [14:07:41] you shouldn't be able to use dates like https://wikimedia.org/api/rest_v1/metrics/pageviews/top/sq.wikipedia.org/all-access/2015/1/1 [14:07:56] ah sorry, I guess I just answered my question [14:08:05] never mind joal [14:08:20] :) [14:08:20] * fdans and his monday of stoopid questions [14:12:39] (03PS3) 10Fdans: Add pageviews by country endpoint [analytics/aqs] - 10https://gerrit.wikimedia.org/r/393591 (https://phabricator.wikimedia.org/T181520) [14:16:01] joal: just tested that change in aqs beta with data 👌🏼👌🏼👌🏼 [14:16:32] I'm going to apply the 1000 pageview limit in the queries now [14:17:38] !log re-run pageview-druid-hourly-wf-2017-12-4-11 in Hue (failed due to reboots) [14:17:39] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [14:17:54] joal: druid1002's middlemanager drained [14:18:04] so now I'd proceed in this way [14:18:20] 1) stop zookeeper on druid1002 and see what happens to the overlords [14:18:32] 2) stop the druid1002's daemons and reboot [14:19:23] 10Analytics, 10Discovery, 10EventBus, 10Wikidata, and 3 others: Create reliable change stream for specific wiki - https://phabricator.wikimedia.org/T161731#3809028 (10Ottomata) Unfortunately not yet! We are very close...the cluster is up and running, but [[ https://phabricator.wikimedia.org/T175461 | port... [14:19:49] 10Analytics-Cluster, 10Analytics-Kanban: Port Clients to new jumbo cluster - https://phabricator.wikimedia.org/T175461#3809037 (10Ottomata) [14:19:53] 10Analytics, 10Discovery, 10EventBus, 10Wikidata, and 3 others: Create reliable change stream for specific wiki - https://phabricator.wikimedia.org/T161731#3809036 (10Ottomata) [14:20:12] (03PS10) 10Fdans: Add pageview by country oozie jobs [analytics/refinery] - 10https://gerrit.wikimedia.org/r/394062 (https://phabricator.wikimedia.org/T181521) [14:20:41] ah!
Just stopped zookeeper and the overlord changed to druid1001 [14:20:44] what the hell [14:22:00] 2017-12-04T14:20:17,570 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0xe95e328c024806dc, [14:22:03] likely server has closed socket, closing socket connection and attempting reconnect [14:22:06] this is on druid1003 [14:23:15] now I am curious to know what happens in this state to the next job submitted [14:23:32] which job do you mean? [14:23:39] task sorry [14:23:48] either realtime or backend? [14:23:53] we should see soon [14:24:12] no preference, just want to see if a new task fails or not [14:24:19] elukey: at least disabling worked as expected [14:24:52] yep seems so [14:26:53] index_hadoop_pageviews-hourly_2017-12-04T14:26:16.530Z started [14:27:10] scheduled not on druid1002 and it seems working [14:27:25] that's great :) [14:27:33] elukey: are you still rebooting machines? [14:27:41] hadoop machines I mean sorry elukey [14:28:23] yep [14:28:36] I am doing the 106* series [14:28:45] so I should be done soon [14:29:42] ok so I am going to shutdown overlord, middlemanager, coordinator, historical and broker on druid1002 joal [14:29:47] ok [14:31:44] rebooting [14:36:19] druid1002 back up and running [14:36:19] elukey: from what I see on pivot, no holes - I like that procedure :) [14:36:44] so let's see now if the next tasks are going to be ok [14:37:01] if so, the theory of the rogue middlemanager is probably the root cause [14:37:06] of the previous issue [14:37:25] elukey: I assume so yes [14:37:41] elukey: Have you re-enabled the middlemanager on druid1002?
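Back to fdans's earlier date question on the top pageviews endpoint: single-digit months and days need zero padding in the URL (`/2015/05/05`, not `/2015/5/5`). A one-function sketch using format-width specifiers — the path is copied from the URL quoted in the log, but the padding convention is my reading of the exchange, not an authoritative statement of the AQS spec:

```python
def top_pageviews_url(project, access, year, month, day):
    """Build a pageviews/top REST URL with zero-padded month and day,
    e.g. .../top/sq.wikipedia.org/all-access/2015/01/01"""
    return (
        "https://wikimedia.org/api/rest_v1/metrics/pageviews/top/"
        "%s/%s/%04d/%02d/%02d" % (project, access, year, month, day)
    )

print(top_pageviews_url("sq.wikipedia.org", "all-access", 2015, 1, 1))
```

Building the path through one helper like this keeps the padding rule in a single place instead of scattered across callers.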
[14:37:59] elukey: cluster now has 3 nodes on coord UI [14:38:00] as far as I've read once you restarted it gets back into the game [14:38:15] mforns: i wonder if we should update https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging/Schema_Guidelines to remove the examples of time spent/since measures [14:38:20] since they don't work so well for aggregations [14:38:28] joal: never checked the coord ui, only the overlord one, [14:38:46] elukey: coord is actually much fancier ;) [14:38:59] ottomata, makes sense... [14:39:06] elukey: currently on druid1003:8081 [14:39:45] ahhh nice [14:40:00] elukey: you can check datasources, segments etc [14:40:05] ottomata, time measures make total sense to EL I think, but the problem is we didn't find an easy enough way to show them no? [14:40:17] in pivot I mean [14:40:25] but we might get there... [14:40:50] 10Analytics, 10MediaWiki-API: There is not an easy way to tag API requests by application for analytics - https://phabricator.wikimedia.org/T181862#3809131 (10dbarratt) >>! In T181862#3805021, @Legoktm wrote: > Can you not use a unique user-agent for your application? The [[ https://developer.mozilla.org/en-U... [14:40:53] oh it's just a pivot problem? ok then [14:40:53] nm [14:42:40] ottomata, AFAIK hive supports percentile calculation [14:42:46] (03PS11) 10Fdans: Add pageview by country oozie jobs [analytics/refinery] - 10https://gerrit.wikimedia.org/r/394062 (https://phabricator.wikimedia.org/T181521) [14:42:53] rebooting the last 106* nodes people [14:42:55] ottomata: o/ [14:43:20] hey all [14:43:29] hi mforns!! [14:43:36] hi milimetric :] [14:43:40] hiii [14:43:56] I'm curious what you think of India now, after the crazy transformation in the last few month [14:43:58] s [14:44:21] milimetric, I didn't know... what happened?
[14:44:37] they took out all that cash from circulation [14:45:21] oh, this happened also at the beginning of this year [14:45:35] right, but now I understand the effects are stronger [14:45:43] after having some time [14:45:58] aha, I'm not aware really [14:46:14] (03PS12) 10Fdans: Add pageview by country oozie jobs [analytics/refinery] - 10https://gerrit.wikimedia.org/r/394062 (https://phabricator.wikimedia.org/T181521) [14:46:36] I think the situation is vastly better now compared to when they did the demonetization (a year ago). [14:46:43] It was total chaos then. [14:46:47] Also, hi! [14:46:57] milimetric, I thought it was to avoid falsifications [14:47:20] hey! :] [14:47:35] whatuuuup mforns [14:47:44] whazaaa fdans :] [14:47:52] milimetric: can I start reviewing your patch on wikistats? [14:48:09] fdans: yeah! I moved it to in-review and took off the [WIP], that signals it's all yours [14:48:18] cool [14:48:23] mforns: if you wanna look at this thing too, I'll add you [14:48:39] milimetric, ok! [14:48:52] omg, sorry fdans I didn't add you as a reviewer [14:48:56] I thought you were already on there [14:49:07] yeah me too! [14:49:30] fdans: check out the line count [14:49:37] * milimetric is so proud of net negative code changes [14:50:16] you are become death, the destroyer of code [14:50:26] :P [14:50:33] this is... the nicest thing anyone's ever said to me [14:51:28] elukey: finished with hadoop nodes? [14:51:43] elukey: I want to restart my spark job that failed miserably all weekend long :( [14:54:14] (03CR) 10Mforns: "LGTM overall! Left a comment about naming, and a couple minor typos." (035 comments) [analytics/aqs] - 10https://gerrit.wikimedia.org/r/393591 (https://phabricator.wikimedia.org/T181520) (owner: 10Fdans) [14:54:27] joal: nope still need a bit of time, sorry! [14:54:42] thank you for the review mforns !!!
[14:54:44] np prob elukey - I'm just fed up at spark currently [14:55:13] np fdans :] [15:07:31] elukey: Shall I restart upload refine jobs? [15:07:51] sure [15:08:19] !log Rerunning 15:47:35 < fdans> whatuuuup mforns [15:08:20] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [15:08:32] roooooh - My copy-paste doesn't work as expected [15:08:35] Again [15:08:36] 10Analytics, 10MediaWiki-API: There is not an easy way to tag API requests by application for analytics - https://phabricator.wikimedia.org/T181862#3804738 (10Anomie) >>! In T181862#3809131, @dbarratt wrote: > I was under the impression that API requests where not really logged anywhere? I don't know why you... [15:08:39] hahah excellent [15:09:04] !log Rerun webrequest-load-wf-upload-2017-12-4-12 and webrequest-load-wf-upload-2017-12-4-13 [15:09:05] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [15:09:23] I mean, that logged sentence was also very important fdans :) [15:12:04] 10Analytics, 10MediaWiki-API: There is not an easy way to tag API requests by application for analytics - https://phabricator.wikimedia.org/T181862#3809221 (10dbarratt) @Anomie whoops... sorry 'bout that, I do remember now. Yes I remember that I created this because it seemed much more complicated than "taggin... [15:13:10] joal: druid1002 is serving traffic! 
[15:13:26] Hey elukey :) [15:13:32] This is a success :) [15:13:43] elukey: it's long, but works [15:14:20] oh yes [15:14:22] \o/ [15:21:29] (03PS7) 10Fdans: [wip] Map component and Pageviews by Country metric [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/392661 (https://phabricator.wikimedia.org/T181529) [15:24:10] (03CR) 10jerkins-bot: [V: 04-1] [wip] Map component and Pageviews by Country metric [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/392661 (https://phabricator.wikimedia.org/T181529) (owner: 10Fdans) [15:26:52] 10Analytics, 10MediaWiki-API: There is not an easy way to tag API requests by application for analytics - https://phabricator.wikimedia.org/T181862#3809269 (10dbarratt) @Anomie Also... side note... since EventLogging is part of MediaWiki (and, if I am reading this correctly, to be used on a MediaWiki page) how... [15:34:33] (03PS9) 10Joal: Upgrade scala to 2.11.7 and Spark to 2.1.1 [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/348207 [15:35:46] Maaaaaaan - Unbelievable - Spark 1.6 dataframes don't check for nulls, even if defined as non-nullable in schemas [15:35:55] * joal is grumpy [15:38:36] (03PS1) 10Joal: Add performance tests for scala JSON libs [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/395018 [15:38:59] ebernhardson: --^ Added that patch, for you to double check py perf tests of json libs in scala [15:46:56] (03CR) 10Fdans: "This is a great change.
Left a few comments but none super high-priority :)" (0315 comments) [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/391490 (https://phabricator.wikimedia.org/T180556) (owner: 10Milimetric) [15:47:39] milimetric is in killimetric mode [15:48:21] heh [15:49:21] hey a-team: I'm going to try to join the batcave 10 minutes early before standup, to have time to hang out and shoot the shit [15:51:38] (03PS4) 10Fdans: Add pageviews by country endpoint [analytics/aqs] - 10https://gerrit.wikimedia.org/r/393591 (https://phabricator.wikimedia.org/T181520) [16:12:54] !log restart webrequest-load-wf-upload-2017-12-4-13 (failed due to hadoop reboots) [16:12:56] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [16:15:16] 10Analytics-Kanban, 10Analytics-Wikistats: Add link to new wikistats 2.0 to wikistats 1.0 pages - https://phabricator.wikimedia.org/T182001#3809460 (10Nuria) [16:29:30] !log restart webrequest-load-wf-upload-2017-12-4-12 (failed due to hadoop reboots) [16:29:31] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [16:34:46] joal: thanks! [16:35:33] joal: i spent some more time playing around with it over the weekend and found another 10x improvement, bringing the new code up to 300x faster than python :) [16:35:49] * ebernhardson now wonders if the python could have been optimized as well .. but whatever [16:35:51] ebernhardson: do you mind explaining? [16:36:48] joal: mostly the next 10x came from assigning each query's url ids starting at 0, instead of having a global url id counter, and then replacing two occurrences of Array[Map[Int, UrlRel]] with Array[Array[UrlRel]] [16:37:18] of course ebernhardson ! a local map instead of a global one !
How cleber :) [16:37:31] s/cleber/clever [16:38:09] now performance profiling shows no code is invoked outside of those functions (no support code), and memory allocation is completely flat (no allocation in the tight loop) [16:38:42] That's great :) With serde improvement, your code will fly :) [16:39:27] yup, before was taking 30 minutes with 200+executors (and 2000 partitions). curious how this will run now :) i'm sure there will be some overhead from spark but should be much better [16:39:53] ebernhardson: Please let us know :) I always love to know stuff goes fast :D [16:45:36] 10Analytics-Cluster, 10Analytics-Kanban, 10Analytics-Wikistats: Add map component to Wikistats 2 - https://phabricator.wikimedia.org/T181529#3809607 (10fdans) [16:45:50] 10Analytics, 10Analytics-Kanban: Privacy pageview threshold for map report - https://phabricator.wikimedia.org/T181508#3809610 (10fdans) [16:47:12] elukey: i kinda want to work on kafka ssl stuff this afternoon [16:47:18] got some mins before ops meeting to discuss? [16:49:48] ottomata: even after if you want [16:49:57] i got more meetings after [16:50:05] 10Analytics, 10Discovery, 10EventBus, 10Wikidata, and 3 others: Create reliable change stream for specific wiki - https://phabricator.wikimedia.org/T161731#3809640 (10Nuria) @Ottomata Could @Smalyshev do a test on consuming from the new cluster though with teh understanding it is not yet productionized to... 
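The shape of ebernhardson's 10x trick above — per-query ids assigned densely from 0, so per-url state can live in a flat array indexed by id instead of a `Map[Int, UrlRel]` — transposed from the Scala being discussed into a small Python sketch (names and data are invented for illustration):

```python
def assign_local_ids(urls):
    """Give each query's URLs dense ids starting at 0 (instead of using
    a global counter), so per-url state fits in a plain list indexed by
    id rather than a hash map."""
    ids = {}
    for url in urls:
        if url not in ids:
            ids[url] = len(ids)
    return ids

query_urls = ["a", "b", "a", "c"]
ids = assign_local_ids(query_urls)
# Dense ids let per-url counters live in a flat list, no map lookups
# in the tight loop:
counts = [0] * len(ids)
for url in query_urls:
    counts[ids[url]] += 1
print(ids, counts)  # {'a': 0, 'b': 1, 'c': 2} [2, 1, 1]
```

Keeping the id space local and dense is also what makes the allocation profile flat: the arrays are sized once per query instead of growing hash tables keyed by a global counter.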
[16:50:09] lemme join batcave [16:50:25] oh it's full [16:50:25] ah sorry batcave-2 [16:50:27] ya [16:53:17] 10Analytics-Kanban, 10WMDE-Analytics-Engineering, 10User-GoranSMilovanovic: Please review the WDCM public datasets and allow them to access published datasets on stat1005 - https://phabricator.wikimedia.org/T181871#3809662 (10fdans) a:03mforns [16:57:18] 10Analytics, 10MediaWiki-API: There is not an easy way to tag API requests by application for analytics - https://phabricator.wikimedia.org/T181862#3804738 (10fdans) @dbarratt please ping us on IRC on #wikimedia-analytics to get a better picture of what you want to do [16:58:09] fdans ping [16:58:31] a-team ˆ [16:59:46] Hi davidwbarratt - We're looking for context / information on the above ticket you wrote [17:00:17] hi davidwbarratt! we are in the backlog grooming meeting, do you want to chime in? https://hangouts.google.com/hangouts/_/wikimedia.org/a-batcave [17:01:26] so... we have a client-side application (JavaScript) and... eventually we'd like to be able to get analytics on the users using it... there's a couple ways this could be done, we could just track the API requests which would tell us everything we want to know, but there are potentially other solutions too (like piwik, etc.) [17:01:59] 10Analytics, 10Analytics-EventLogging, 10MediaWiki-API, 10Easy, and 2 others: ApiJsonSchema implements ApiBase::getCustomPrinter for no good reason - https://phabricator.wikimedia.org/T91454#3809705 (10fdans) [17:03:01] davidwbarratt, yes, at first glance, I would think of either piwik or EventLogging, do you have a use case that these systems do not support? [17:03:28] mforns well it's not within MediaWiki.. so I don't think EventLogging will work [17:03:29] davidwbarratt, https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging [17:03:56] mforns but I think piwik would work well, but I'm not sure a random Toolforge tool can use our piwik?
EventLogging also has a client for JS [17:04:42] mforns uhhh where would I find the client? [17:05:00] looking [17:06:21] davidwbarratt, we're talking about this in our meeting, and yes, EventLogging would not be a fit for this case, you're right [17:06:32] but piwik would [17:06:51] we already use piwiki in labs a lot [17:06:59] *piwik [17:07:15] mforns yeah I _think_ we could just log the queries with piwik before sending to the API [17:07:23] aha [17:07:27] mforns how would we add a new site to Piwik? [17:08:36] 10Analytics-Kanban, 10Analytics-Wikistats: Privacy pageview threshold for map report - https://phabricator.wikimedia.org/T181508#3809755 (10fdans) [17:08:50] davidwbarratt: can you tell us a bit more about your app? [17:08:51] davidwbarratt: I can add it for you, I'll ping you for details after our meeting [17:08:59] davidwbarratt: a link to it so we can see what it does? [17:09:17] https://tools.wmflabs.org/interaction-timeline/?wiki=testwiki&user=Derby%20pie&user=Sweets%20lover [17:09:23] davidwbarratt: once we see that we can tell you the best fit, piwik is a good fit for website analytics [17:09:42] davidwbarratt: web uis used by not so-many-users [17:09:46] and here's the code https://github.com/wikimedia/interactiontimeline [17:09:55] 10Analytics, 10Analytics-Wikimetrics, 10Epic: EPIC: Productionizing Wikimetrics {dove} - https://phabricator.wikimedia.org/T76726#3809766 (10fdans) [17:09:58] 10Analytics, 10Analytics-Wikimetrics: change wikimetrics in labs to use debian packages [13 pts] {dove} - https://phabricator.wikimedia.org/T76770#3809765 (10fdans) 05Open>03declined [17:09:59] davidwbarratt: where is it deployed? [17:10:01] I mean another solution is that we make a proxy for the API [17:10:10] davidwbarratt: is it deployed in labs?
nuria_ yes https://tools.wmflabs.org/interaction-timeline/?wiki=testwiki&user=Derby%20pie&user=Sweets%20lover [17:10:23] 10Analytics, 10Analytics-Wikimetrics, 10Epic: EPIC: Productionizing Wikimetrics {dove} - https://phabricator.wikimedia.org/T76726#818961 (10fdans) [17:10:28] 10Analytics, 10Analytics-Wikimetrics: upgrade wikimetrics to trusty [8 pts] {dove} - https://phabricator.wikimedia.org/T76769#3809768 (10fdans) 05Open>03declined [17:10:40] davidwbarratt: mmm.. probably that is not a good solution to learn about your users [17:10:41] nuria_ but it seems kinda... idk... unnecessary to make a whole API proxy just to get analytics [17:10:49] davidwbarratt: what are you interested in knowing? [17:11:05] 10Analytics, 10Analytics-Dashiki: weekly/monthly granularity for pageviews in Dashiki - https://phabricator.wikimedia.org/T76092#3809794 (10fdans) 05Open>03Resolved a:03fdans [17:11:18] nuria_ that hasn't been strictly defined yet but loosely https://phabricator.wikimedia.org/T180088 [17:11:30] davidwbarratt: looking [17:11:44] i.e. what queries they are running (i.e. what are they putting into the form field, etc.)
[17:11:52] how often people are using it [17:12:32] (03PS11) 10Milimetric: Simplify and fix breakdowns and other data [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/391490 (https://phabricator.wikimedia.org/T180556) [17:13:29] "to know if the Timeline is providing acceptable results for our users" I guess summarizes the problem [17:14:01] this ticket isn't pressing, but it is in the pipeline [17:14:14] so I wanted to at least know _how_ we would do it [17:15:03] (03CR) 10Milimetric: "pushed a new patch and replied" (0313 comments) [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/391490 (https://phabricator.wikimedia.org/T180556) (owner: 10Milimetric) [17:15:12] 10Analytics, 10Discovery, 10EventBus, 10Wikidata, and 3 others: Create reliable change stream for specific wiki - https://phabricator.wikimedia.org/T161731#3809821 (10Ottomata) Sure, I suppose! You can connect to it with a Kafka client now. The Kafka brokers are kafka-jumbo100[1-6].eqiad.wmnet:9092 I th... [17:19:11] so afaik... the solutions are 1) Use Piwik or 2) Log the API requests (via the API itself, or via a Proxy, but imho, should be fixed in the API) [17:19:17] 10Analytics-Kanban, 10User-Elukey: Restart Analytics JVM daemons for open-jdk security updates - https://phabricator.wikimedia.org/T179943#3809888 (10elukey) [17:20:26] signing off team, byeeeee [17:21:36] 10Analytics, 10ChangeProp, 10EventBus, 10MediaWiki-JobQueue, and 5 others: Select candidate jobs for transferring to the new infrastucture - https://phabricator.wikimedia.org/T175210#3809898 (10mobrovac) [17:21:48] davidwbarratt: ok, from what i see you want some vanilla web analytics on a tool on labs, i would look to piwik , you do not need anything fancier than that [17:22:15] davidwbarratt: how many users are you estimating your app will have? [17:22:29] nuria_ great! uhh, very few, it's really just an admin tool [17:22:36] nuria_ if there's more than a few something is wrong. 
:) [17:22:44] davidwbarratt: then, i would not log too much [17:23:04] davidwbarratt: you will not likely get any signal at all that you cannot get w/o asking your users [17:23:19] davidwbarratt: so plain piwik usage would be fine [17:24:25] nuria_ great! uhh, what do we need to do to set that up on our instance (doesn't need to be done now, I can just put that in the ticket) [17:24:30] (03CR) 10Fdans: [C: 032] Simplify and fix breakdowns and other data [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/391490 (https://phabricator.wikimedia.org/T180556) (owner: 10Milimetric) [17:24:52] milimetric: feel free to release/I can do that tomorrow morning [17:24:57] davidwbarratt: once you are ready you can open us a ticket and we will give you a piwik snippet you can copy [17:25:08] nuria_ perfect, thanks [17:25:08] milimetric: maybe 2.1.0 since it's the beta release? [17:25:15] alpha* [17:25:35] davidwbarratt: you can close the other ticket you opened, it mixes mw api plus eventlogging which are not of relevance here [17:25:52] nuria_ kk, will do [17:26:37] (03Merged) 10jenkins-bot: Simplify and fix breakdowns and other data [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/391490 (https://phabricator.wikimedia.org/T180556) (owner: 10Milimetric) [17:30:01] gone for dinner a-team, will be back after [17:32:00] o/ [17:33:32] 2.1.0 sounds good fdans, I think that's how semver works [17:34:32] nuria_: so that change is merged, I've gotta cook lunch but ping me if you want to talk webpack [17:34:38] I'm going to deploy later today [17:46:02] ok, updated our ticket with your recommendations https://phabricator.wikimedia.org/T180088#3809941 [18:12:26] davidwbarratt: ok, let us know when your tool is deployed [18:13:35] nuria_ no problem! I mean technically it's already usable on Toolforge https://tools.wmflabs.org/interaction-timeline/?wiki=testwiki&user=Derby%20pie&user=Sweets%20lover but it isn't widely marketed to admins yet.
:) [18:19:19] 10Analytics, 10Analytics-EventLogging, 10MediaWiki-API, 10Easy, and 2 others: ApiJsonSchema implements ApiBase::getCustomPrinter for no good reason - https://phabricator.wikimedia.org/T91454#3810080 (10Legoktm) @fdans This is an issue with the EventLogging MediaWiki extension. [19:01:17] 10Analytics-Kanban: Geowiki stopped updating on October 24th - DATA LOSS (read comments) - https://phabricator.wikimedia.org/T179952#3810295 (10Ijon) Thanks. It is not a critical issue, and no need to spend any time recovering it. The availability of this data to our community is the important thing. Are we s... [19:07:50] ottomata: o/ [19:08:00] do you have a sec for a puppet suggestion/brain bounce? [19:10:44] sure! [19:10:50] bc? [19:11:07] even here is fine! [19:11:30] so I created https://gerrit.wikimedia.org/r/#/c/395054 to add the prometheus config for hadoop master/standby [19:11:30] ok [19:11:39] same trick used for the worker [19:11:52] this time though it doesn't work - https://puppet-compiler.wmflabs.org/compiler02/9152/analytics1001.eqiad.wmnet/ [19:11:58] +YARN_RESOURCEMANAGER_OPTS="$YARN_RESOURCEMANAGER_OPTS -Xms2048m -javaagent:/usr/share/java/prometheus/jmx_prometheus_javaagent.jar=10.64.36.118::" [19:12:14] see the last ::? it should contain a path and a port number [19:14:09] looking.. [19:14:33] elukey: not sure, never tried what you are trying before! [19:14:45] but, i would put include ::profile::hadoop::common at the top [19:14:48] before the variable assignments [19:14:53] unlikely that it matters, i know [19:14:55] but worth a try [19:15:05] oh wait [19:15:06] worry [19:15:06] sorry [19:15:09] still confused hang on [19:15:39] elukey: i'm not sure how this would work [19:15:56] so, you have master.yaml, which gets passed to the master.pp during role hiera param lookup stuff [19:16:41] profile::hadoop::common pulls profile::hadoop::common::hadoop_namenode_opts etc..
[19:17:02] and that key contains a lookup for variables defined in the class [19:17:11] that worked perfectly for worker.pp [19:17:21] so I thought that it was an allowed trick [19:17:40] that's a crazy trick! i wouldn't expect it to work [19:17:51] didn't know you could pull puppet defined variables into hiera [19:18:28] my understanding is that puppet would have looked them up when using them, so no hiera involvement [19:18:37] oh really? [19:18:44] oh when the hiera() function runs [19:18:50] they'll be looked up from whatever scope? [19:18:52] hm [19:19:04] but I am probably wrong, since this is not working [19:19:37] it could be a class ordering problem [19:19:41] role master does [19:19:45] include ::profile::hadoop::mysql_password [19:19:46] which does [19:19:50] require ::profile::hadoop::common [19:20:09] haha, wait, no this is confusing! [19:20:39] yes, luca, ok i think this might be it [19:20:53] "Luca you are stupid" [19:20:59] is that it? :D [19:21:03] profile::hadoop::common is required by profile::hadoop::mysql_password, so i betcha that it is being evaled first somehow [19:21:20] profile::hadoop::common gets required, which calls all the hiera() functions for the params [19:21:28] looks up the value of profile::hadoop::common::hadoop_namenode_opts for the role [19:21:40] but profile::hadoop::master hasn't been included/required yet [19:21:54] so profile::hadoop::master::namenode_jmx_exporter_config_file is undef [19:22:31] let's see with pcc [19:24:08] I used that trick since after using %{::ipaddress} I wondered "Why not?" [19:25:15] ipaddress works for sure [19:25:18] because it's defined by facter [19:25:19] not puppet [19:25:40] 10Analytics, 10ChangeProp, 10EventBus, 10MediaWiki-JobQueue, and 4 others: Migrate htmlCacheUpdate job to Kafka - https://phabricator.wikimedia.org/T182023#3810367 (10Pchelolo) [19:25:44] ahhh [19:26:04] elukey: !
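The broken render above ("...jmx_prometheus_javaagent.jar=10.64.36.118::") is missing the port and config path of the JMX exporter's `[host:]port:/path/to/config.yaml` agent argument, because the hiera lookups came back undef. A small Python sketch that builds the flag and rejects empty components; the YAML path and port below are hypothetical, only the jar path and host IP come from the log:

```python
def jmx_agent_flag(jar, config, port, host=None):
    """Build the -javaagent JVM flag for the Prometheus JMX exporter.

    The agent argument format is [host:]port:/path/to/config.yaml; an
    empty port or config path (as in the broken puppet render) is an error.
    """
    if not config or not port:
        raise ValueError("port and config path must both be set")
    listen = f"{host}:{port}" if host else str(port)
    return f"-javaagent:{jar}={listen}:{config}"

flag = jmx_agent_flag(
    "/usr/share/java/prometheus/jmx_prometheus_javaagent.jar",
    "/etc/prometheus/yarn_resourcemanager_jmx_exporter.yaml",  # hypothetical path
    10088,                                                     # hypothetical port
    host="10.64.36.118",
)
```
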
gah I removed super.users from my test instances in labs [19:26:07] and the cluster is still functioning [19:26:11] i never granted any ACLs though! [19:26:39] so you can produce/etc.. ? [19:27:21] yeah, and no errors for broker replication [19:27:51] mmmm strange [19:29:25] Current ACLs for resource `Cluster:kafka-cluster`: [19:29:27] nothin. [19:31:14] what do the logs say? [19:31:17] AHHH [19:31:20] because I set [19:31:29] Current ACLs for resource `Cluster:kafka-cluster`: [19:31:30] oops [19:31:41] allow.everyone.if.no.acl.found=true [19:31:49] ahh ok now I feel better :D [19:41:30] milimetric: is there any other geowiki data besides: srv/geowiki/data-private [19:41:35] on 1006? [19:44:20] ottomata: so, going back to prometheus, I am going to be less DRY and put some plain strings in hiera for the moment [19:44:33] +1 [19:44:47] thanks for the brain bounce! [19:45:41] 10Analytics-Kanban: Get access to geowiki data - https://phabricator.wikimedia.org/T182027#3810473 (10Nuria) [19:45:45] nuria_: that folder is the source for everything, but it's processed and put in different forms in the databases, limn-compatible files, and maybe other places I'm not sure about [19:49:13] 10Analytics-Kanban: Geowiki stopped updating on October 24th - DATA LOSS (read comments) - https://phabricator.wikimedia.org/T179952#3810503 (10Milimetric) Yes, @Ijon, on track. Thank you for the response. [19:49:35] 10Analytics-Kanban: Get access to geowiki data - https://phabricator.wikimedia.org/T182027#3810504 (10Tbayer) Specifically, it looks like we will need to be able to access the /srv/geowiki/data-private folder on stat1006. (The documentation https://wikitech.wikimedia.org/wiki/Analytics/Systems/Geowiki#Getting_ac... [19:53:17] going offline people! [19:53:18] o/ [19:53:23] * elukey off!
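The `allow.everyone.if.no.acl.found=true` setting explains the mystery above: when a resource has no ACLs at all, that broker setting decides the outcome, so the cluster kept working with no ACLs granted. A deliberately simplified Python model of that decision (an assumption for illustration, not Kafka's actual authorizer code, which also handles Deny precedence and wildcards):

```python
def authorize(acls, resource, principal, operation,
              allow_everyone_if_no_acl_found=False):
    """Simplified model of a Kafka-style authorizer decision:
    if the resource has no ACLs at all, fall back to the
    allow.everyone.if.no.acl.found setting; otherwise require a
    matching Allow entry."""
    matching = [a for a in acls if a["resource"] == resource]
    if not matching:
        return allow_everyone_if_no_acl_found
    return any(a["principal"] == principal
               and a["operation"] == operation
               and a["type"] == "Allow"
               for a in matching)

# No ACLs on the cluster resource, as in the `Current ACLs ...:` output
# above, yet produces/replication still succeed when the fallback is true:
ok = authorize([], "Cluster:kafka-cluster", "User:broker", "ClusterAction",
               allow_everyone_if_no_acl_found=True)
```
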
Bye elukey [19:57:52] 10Analytics-Kanban: Hadoop expansion Hardware - https://phabricator.wikimedia.org/T182029#3810524 (10Nuria) [20:00:12] 10Analytics, 10EventBus, 10MW-1.31-release-notes (WMF-deploy-2017-11-14 (1.31.0-wmf.8)), 10Patch-For-Review, 10Services (next): Timeouts on event delivery to EventBus - https://phabricator.wikimedia.org/T180017#3810549 (10Pchelolo) An interesting but probably completely useless and unrelated piece of inf... [20:00:56] 10Analytics-Kanban, 10Analytics-Wikistats: Privacy pageview threshold for map report - https://phabricator.wikimedia.org/T181508#3810551 (10Milimetric) @ezachte the trouble comes if there is so little reading traffic that it coincides with editing traffic. If that's the case, any pageview data could be used i... [20:04:02] 10Analytics-Kanban: Get access to geowiki data - https://phabricator.wikimedia.org/T182027#3810573 (10Ottomata) You both now have a geowiki-pw.txt file in your home directories on stat1006. You should be able to use that to access https://stats.wikimedia.org/geowiki-private (for reference: https://wikitech.wi... [20:04:11] 10Analytics-Kanban: Get access to geowiki data - https://phabricator.wikimedia.org/T182027#3810574 (10Ottomata) oh [20:05:40] 10Analytics-Kanban: Get access to geowiki data - https://phabricator.wikimedia.org/T182027#3810575 (10Ottomata) Tilman, I think the web interface just lets you access the same files. [20:14:47] elukey: i take that back, my cluster still works fine, even with that allow.all thing=false [20:17:06] joal: Hello! I'm looking at your mediawiki_wikitext table. Is the revision_id in it equal to rev_id or rev_text_id in the revision table? Thanks! :) [20:17:44] chelsyx: I think it's rev_id [20:18:02] joal: Got it. Thanks! [20:18:19] np [20:22:27] Hi joal: Have a second for a question in the batcave? [20:22:33] sure Shilad [20:22:46] Thanks!
[20:44:21] Gone for tonight a-team [21:05:43] (03PS1) 10Milimetric: Release 2.1.0 [analytics/wikistats2] (release) - 10https://gerrit.wikimedia.org/r/395087 [21:06:53] (03CR) 10Milimetric: [V: 032 C: 032] "updated the docs a little: https://wikitech.wikimedia.org/wiki/Analytics/Wikistats2.0#Contributing_and_Deployment" [analytics/wikistats2] (release) - 10https://gerrit.wikimedia.org/r/395087 (owner: 10Milimetric) [21:08:07] (03PS1) 10Milimetric: Prepare for release 2.1.0 [analytics/wikistats2] (release) - 10https://gerrit.wikimedia.org/r/395089 [21:08:13] (03CR) 10jerkins-bot: [V: 04-1] Prepare for release 2.1.0 [analytics/wikistats2] (release) - 10https://gerrit.wikimedia.org/r/395089 (owner: 10Milimetric) [21:08:40] (03Abandoned) 10Milimetric: Prepare for release 2.1.0 [analytics/wikistats2] (release) - 10https://gerrit.wikimedia.org/r/395089 (owner: 10Milimetric) [21:09:48] (03PS1) 10Milimetric: Prepare for release 2.1.0 [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/395090 [21:10:03] (03CR) 10Milimetric: [C: 032] Prepare for release 2.1.0 [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/395090 (owner: 10Milimetric) [21:16:20] (03CR) 10Milimetric: [V: 032 C: 032] Prepare for release 2.1.0 [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/395090 (owner: 10Milimetric) [22:13:08] 10Analytics-Kanban: Get access to geowiki data - https://phabricator.wikimedia.org/T182027#3810926 (10Tbayer) Thanks! I was able to access the files successfully using the web interface. Depending on how our usage of this data is going to turn out exactly, it would be preferable to also being able to access the... [23:56:46] hola nuria_. soo, my question is: do you think you all will have some bandwidth to help with the bot detection problem if I find a student and advisor who're up for helping with it? [23:57:06] I would be so happy to see this problem resolved. 
;) [23:58:11] I /think/ for the first quarter we will work on it, a 10-20% of someone's time in your team would be great, so we can bounce questions, run our findings by you, etc.