[02:42:55] Analytics-Kanban: Create small sample mediawiki-history table in MariaDB - https://phabricator.wikimedia.org/T165309#3836500 (Milimetric) I'm spinning in place on this task. Somehow hive corrupted data when I tried to export a sample. I have wiki_db == '{{Information|' in some records!!!?
[03:52:43] Analytics-Kanban: Create small sample mediawiki-history table in MariaDB - https://phabricator.wikimedia.org/T165309#3836524 (Milimetric) This is now available for querying. cc @Neil_P._Quinn_WMF, @ezachte, and @Tbayer . Feel free to play with it and let us know how useful it is. It's a bit of a pain to d...
[09:06:28] joal: o/
[09:06:35] did a bit of a revolution in https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&from=now-6h&to=now
[09:06:58] it may be too extreme now but it is incredibly fast to load and see only what you need
[09:09:50] also added some metrics for ebernhardson's task (sum of running containers, failed/killed, etc..)
[09:10:23] my idea is to restart nodemanagers on 103* today to apply the new RSS calculation settings
[09:10:33] and then to the rest next week if everything looks good
[09:13:28] Sounds good to me :)
[09:13:33] elukey: o/ --^
[09:43:16] elukey: question for you on graphite metrics
[09:43:45] sure
[09:44:20] elukey: To plot traffic for AQS on pageviews-per-article, you use the .sample_rate metric
[09:44:28] Would you mind explaining it to me?
[09:45:45] well if you are talking about the aqs-elukey dashboard, I copied over and modified the aqs one so probably there can be mistakes
[09:46:28] I actually don't know what the sample_rate thing does, and when I try to replicate it for the new WKS2 endpoints, well I get strange results
[09:47:27] ?
[09:48:56] we could use the "rate" metric, just realized it
[09:49:01] wow and it looks like way more traffic
[09:49:15] elukey: count?
[09:50:22] well then you have to rate it no?
[09:50:27] we already have the rate
[09:50:45] elukey: why 'rate' it?
[09:51:26] the count afaik is always increasing, so there is not much point in plotting it if you don't use a per second rate
[09:51:41] but if it is the number of datapoints it might be good
[09:51:46] maybe I am confusing it with prometheus
[09:51:49] checking
[09:52:47] aqs-elukey now uses count
[09:52:53] if you want to check
[09:53:07] at some point we'll need to review the main aqs dashboard
[09:56:26] elukey: Checking numbers vs hourly hive data
[09:57:26] elukey: We got 2625623 requests for the per-article endpoint on hour 7 today
[09:57:42] on average ~730 req/s
[10:01:24] * elukey just ran the webrequest_joal_checker.hql script to verify false positives
[10:01:59] :)
[10:03:30] wow the query is really nice
[10:05:25] elukey: Ah, just realized that what we see in graphite is un-cached responses only !!!
[10:10:33] elukey: I don't understand :( For an hour of data, I have more 200 than 2xx :S So weird
[10:11:40] missing a bit of context
[10:11:53] what data are you checking atm? :)
[10:12:35] I'm checking data in grafana, using sumSeries(restbase.*.v1_metrics_pageviews_per-article_-project-_-access-_-agent-_-article-_-granularity-_-start-_-end-.*.200.sum)
[10:14:14] ah ok
[10:15:08] why .sum in the end? maybe .count ?
[10:15:11] elukey: And I can't manage to replicate the numbers I have in hive
[10:15:31] but numbers in hive are from varnish right?
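(A minimal sketch of the cross-check joal and elukey are attempting above: pull one hour of the Graphite series out of the render API and sum it, then compare with the request count Hive reports for the same hour. The metric path is the one quoted in the log; which leaf to read (.count, .sum, .rate) and what each one aggregates is exactly the open question in this exchange, so the choice below is an assumption, not a recommendation.)

```python
# Hypothetical cross-check: sum one hour of a Graphite counter and compare
# with the Hive-side request count. Assumes access to graphite.wikimedia.org
# and that the '.count' leaf holds per-interval deltas, not a cumulative total.
import requests

GRAPHITE = 'https://graphite.wikimedia.org/render'
TARGET = ('sumSeries(restbase.*.v1_metrics_pageviews_per-article_'
          '-project-_-access-_-agent-_-article-_-granularity-_'
          '-start-_-end-.*.200.count)')

resp = requests.get(GRAPHITE, params={
    'target': TARGET,
    'from': '07:00_20171214',    # hour 7, the hour checked against Hive
    'until': '08:00_20171214',
    'format': 'json',
})
resp.raise_for_status()

# Render API returns [{'target': ..., 'datapoints': [[value, ts], ...]}]
series = resp.json()[0]['datapoints']
total = sum(v for v, ts in series if v is not None)
print('graphite total for hour 7: %d (hive: 2625623 total, 95698 misses)' % total)
```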
[10:15:42] correct elukey - I now take only cache-misses
[10:16:12] 95698 requests in hour 7 had cache-miss
[10:17:16] I find, using sum, 72584
[10:17:28] Which is not perfect, but order of magnitude is ok
[10:19:24] what cache status filter do you use? I am asking because now we don't have simply "miss" but iirc also frontend-miss or something similar
[10:19:34] so the misses that we want are the varnish backend ones
[10:19:48] Cache statuses for that hour:
[10:19:52] cache_status _c1
[10:19:52] hit-front 2516044
[10:19:52] hit-local 13495
[10:19:52] hit-remote 385
[10:19:52] miss 95698
[10:19:55] pass 1
[10:20:11] from all the caches right?
[10:20:30] so maybe we only need eqiad
[10:20:42] say you have a miss on esams
[10:21:02] * elukey wonders
[10:21:12] elukey: this is the cache_status value provided in webrequest
[10:26:12] I am asking the traffic masters if 'miss' in this context means "it landed on the backend"
[10:30:03] not sure joal, I tried to think about any Varnish-related reason that could affect the numbers but I didn't find one
[10:30:22] maybe restbase.*.v1_metrics_pageviews_per-article_-project-_-access-_-agent-_-article-_-granularity-_-start-_-end-.*.200.sum is not enough?
[10:30:26] elukey: I think rate is the way to go for realtime values
[10:30:45] elukey: But for count, it's a bit strange
[10:33:20] not sure if rate is precomputed or not, need to check that
[10:33:56] Analytics-Kanban, Analytics-Wikistats: Wikistats Bug: Menu to select projects doesn't work (sometimes?) - https://phabricator.wikimedia.org/T179530#3727714 (Elitre) Right now (Safari, MacBook Air), when I select Wikipedia and am prompted to choose a language, the list won't go beyond Bhojpuri (although I...
[10:43:55] need to run errand + early lunch, ttl!
[10:44:17] elukey: https://grafana-admin.wikimedia.org/dashboard/db/aqs-wikistats-2-traffic?orgId=1
[10:44:22] For when you're back ;)
[10:44:38] ack! :)
[11:16:03] Analytics, Analytics-Wikistats: Wikistats 2.0 Bug at swwiki - https://phabricator.wikimedia.org/T182859#3836943 (Kipala)
[11:31:55] Analytics, Analytics-Wikistats: Wikistats 2.0 Bug at swwiki - https://phabricator.wikimedia.org/T182859#3836943 (JAllemandou) Thanks @Kipala :), good catch! Seems related to dates of the data. In Wikistats 2 data is pulled from 2015-10. The time selector at the back doesn't actually mean anything for...
[11:49:38] Analytics, Analytics-Wikistats: Wikistats 2.0 Bug at swwiki - https://phabricator.wikimedia.org/T182859#3837043 (Kipala) Ok, that is the reason. Is that for all numbers? (a history page 0.2 instead of 2.0?)
[12:13:39] Analytics, Analytics-Wikistats: Wikistats 2.0 Bug at swwiki - https://phabricator.wikimedia.org/T182859#3837083 (JAllemandou) >>! In T182859#3837043, @Kipala wrote: > a history page 0.2 instead of 2.0? I don't understand this bit ... The time selector works well on any other page (with charts). The "top...
[12:13:52] Hi fdans, are you around?
[12:14:17] helloooo joseph, what up! joal
[12:15:10] fdans: What would be a good format for points in lines for you to display?
[12:17:36] joal: we can declare 404 as a new hidden metric of line chart type, that gets one file as source
[12:17:55] the perfect format would be imitating a time series response from aqs like in pageviews
[12:19:37] fdans: makes sense :)
[12:19:40] fdans: will do !
[12:19:48] awesome!!!
[12:19:57] fdans: scale management is ok I imagine?
[12:20:35] joal: the ideal situation would be to span it over 2 years, as that's the default view for line graphs
[12:20:46] ok fdans
[12:20:58] and, format-wise - pageview or edits?
[12:21:05] values don't matter as graphs auto-scale
[12:21:33] joal: the value name doesn't matter because we specify it in the metric's config
[12:21:39] fdans: edits feels easier as I only serve ts and value, not other content
[12:21:45] ok makes sense
[12:21:56] yeah just need the timestamp and the value joal
[12:22:00] So, a json array with a timestamp and a value :)
[12:22:38] exactly joal :D
[12:39:03] Analytics-Kanban, Analytics-Wikistats: Wikistats Bug: Menu to select projects doesn't work (sometimes?) - https://phabricator.wikimedia.org/T179530#3837170 (fdans) @Elitre yes, by design we limit the number of results on the wiki selector, so that the scroll bar remains usable and to encourage the user t...
[12:45:19] fdans: Do you chart a continuous line if there are missing days?
[12:45:25] Or do I need to fill the holes
[12:45:26] ?
[12:51:12] fdans: First try - https://gist.github.com/jobar/bcd9758d9dece925ac8907bb491205cf
[12:55:38] (CR) Fdans: "Tested in Mac Chrome 62 and Safari 10.1.1, in both looks smooth and it feels less laggy!" [analytics/wikistats2] - https://gerrit.wikimedia.org/r/397989 (https://phabricator.wikimedia.org/T182601) (owner: Nuria)
[12:56:18] joal: I'm going to get some lunch now, I'll test this after that, thank you joseph!
[12:56:25] :)
[13:54:16] Analytics-Kanban, Patch-For-Review, User-Elukey: Add the prometheus jmx exporter to all the Hadoop daemons - https://phabricator.wikimedia.org/T177458#3837454 (elukey) We are now in the process of refining Yarn/HDFS worker configurations. The first thing that I'd blacklist for the nodemanager are all...
[14:00:34] mforns: o/
[14:00:36] pymysql.err.InternalError: (1054, "Unknown column 'event_action.loaded.timing' in 'field list'")
[14:00:42] this is eventlogging cleaner
[14:00:46] hey elukey :]
[14:00:56] oh...
[14:01:29] (that is awesome since we caught an error and it got sent to the analytics-alerts ml)
[14:01:32] elukey, is that the no-whitelist cleaner on master? or is it the sanitizer on slave?
[14:01:44] only the slave one, didn't add anything to the master
[14:01:44] yea :]
[14:01:48] ok
[14:01:56] maybe a whitelist item is missing?
[14:02:08] or a wrong item
[14:02:13] mmmm
[14:02:19] full stack trace in the email
[14:02:42] elukey, it doesn't look though like an error thrown directly by the cleaner, no?
[14:02:57] more like an unexpected thing
[14:03:11] yep just noticed it, more a pymysql thing
[14:03:35] elukey, I think I did not receive the email, what is the subject?
[14:04:24] Cron /usr/bin/flock --verbose -n /var/lock/eventlogging_cleaner /usr/local/bin/eventlogging_cleaner --whitelist /etc/eventlogging/whitelist.tsv --older-than 90 --start-ts-file /var/run/eventlogging_cleaner --batch-size 10000 --sleep-between-batches 2 >> /var/log/eventlogging/eventlogging_cleaner.log
[14:05:01] ah there you go
[14:05:02] ok, yea I got that
[14:05:03] INFO: line 121: Executing command UPDATE `Edit_17541122` SET dt = NULL,userAgent = NULL,event_action.loaded.timing = NULL WHERE timestamp >= %(start_ts)s AND timestamp <= %(end_ts)s with params {'end_ts': '20170915110001', 'start_ts': '20170914110001'}
[14:05:07] this is the last line in the log
[14:05:18] event_action.loaded.timing = NULL
[14:05:18] aa
[14:06:05] maybe this is the only field with dots in its name that we're purging???
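(For reference, a sketch of the 404-metric format joal and fdans settle on above: a JSON array of timestamp/value points shaped like an AQS time-series response. The field names here are assumed by analogy with the pageviews API; joal's actual first try is the gist linked at 12:51.)

```python
# Hypothetical sketch of the agreed format: a JSON array of
# {timestamp, value} points, shaped like an AQS time-series response.
# Field names and the YYYYMMDDHH timestamp format are assumed by
# analogy with the AQS pageviews API, not taken from joal's gist.
import json

daily_404_counts = [
    ('2016-01-01', 1200),
    ('2016-01-02', 980),
    # missing days may need filling, per the question at 12:45 above
]

items = [
    {'timestamp': day.replace('-', '') + '00', 'value': count}
    for day, count in daily_404_counts
]
print(json.dumps({'items': items}, indent=2))
```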
[14:06:43] so event_action.loaded.timing is in the table
[14:06:50] (I did show columns in mysql)
[14:06:58] is Edit_17541122 a new schema?
[14:09:40] so the last one of https://meta.wikimedia.org/wiki/Schema:Edit
[14:12:27] https://meta.wikimedia.org/w/index.php?title=Schema:Edit&action=history was added yesterday
[14:15:30] Analytics-Kanban, Patch-For-Review, User-Elukey: Add the prometheus jmx exporter to all the Hadoop daemons - https://phabricator.wikimedia.org/T177458#3837513 (fgiunchedi) I took a quick look at the metrics, some observations: datanode: 1. `Hadoop_DataNode_DatanodeNetworkCounts{name="DataNodeInfo",...
[14:21:30] now that change was done yesterday evening, but the cleaner started today
[14:21:48] so it shouldn't be a race condition of some sort
[14:22:04] maybe it is the dot
[14:23:40] yes I don't find any "set something.with.dots" in previous logs
[14:26:20] mforns: we need to quote or add `` to the values with dots
[14:26:40] or we can always add ``
[14:27:04] sorry elukey, was finishing a paragraph in the docs.
[14:27:19] elukey, yes, Edit_17541122 is a new schema
[14:28:19] elukey, IIRC Edit table is the only one with dots in its fields
[14:28:38] yep yep confirmed
[14:28:49] elukey, and Edit table is fully whitelisted since we consider editing data as public data
[14:29:09] so, adding a new dotted field to it would be the only example of a dotted field set to NULL
[14:29:51] elukey, we should totally fix the dotted thing by adding `...`
[14:29:59] but also add that field to the white-list I guess
[14:30:30] I wonder if the premise "it's edit related data -> it's not sensitive" is true in this case, I will re-review the schema for privacy
[14:34:39] https://usercontent.irccloud-cdn.com/file/7ZLUDsl4/Screen%20Shot%202017-12-14%20at%2015.34.06.png
[14:34:40] hehe joal
[14:35:29] mforns: https://gerrit.wikimedia.org/r/#/c/398264/1/modules/profile/files/mariadb/misc/eventlogging/eventlogging_cleaner.py ?
[14:36:24] fdans, xDDDDD
[14:36:53] almost fdans :)
[14:37:13] * joal loves our 4o4
[14:37:30] hate those dots.
[14:46:10] +1
[14:47:11] fdans: the 0 could be more round, otherwise this is so great
[14:47:14] what dots?
[14:47:26] oh!
[14:47:49] joal: what do you think about adding a page-history API? To get all the titles a page has had?
[14:49:13] not sure I understand what you suggest milimetric
[14:50:45] aqs/metrics/page-history/23456 -> { 2001-01-01: 'Romania', 2007-05-01: 'Roumania', 2014-12-02: 'Rominia' }
[14:50:46] that kind of thing
[14:51:00] ottomata: hiiiii - I've spammed you earlier on
[14:51:40] HIIII earlier on,...
[14:51:45] oh emails
[14:51:47] yes i see that, getting to it!
[14:51:57] the lawyers wrote a lot of emails last night i guess
[14:51:59] they write long ones...
[14:52:10] :D
[14:52:15] sure sure, whenever you have time
[14:54:21] joal: aqs/metrics/page-history/23456 -> { 2001-01-01: 'Romania', 2007-05-01: 'Roumania', 2014-12-02: 'Rominia' }
[14:56:00] elukey: haha yayyyy puppet 4 broke stuff!
[14:56:07] will look into it!
[14:56:43] I was so looking forward to adding the new vk TLS instance :D
[15:08:36] fdans: with a round zero: https://gist.github.com/jobar/5f7ac014ea64a08058bf609e1783e84f
[15:08:39] :D
[15:09:21] mforns: running the cleaner now manually
[15:10:21] elukey,
[15:10:44] is that a good thing? :D
[15:11:21] elukey, it depends, it can go to -> \o/
[15:11:35] or...
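(The fix elukey points at in the gerrit change above, quoting dotted column names with backticks so MySQL stops parsing them as database.table.column references, presumably looks something along these lines. A sketch of the idea, not the actual patch.)

```python
# Sketch of backtick-quoting column identifiers so that MySQL treats
# 'event_action.loaded.timing' as a single column name rather than a
# database.table.column reference. Illustrative only; see the gerrit
# change linked above for the real eventlogging_cleaner.py fix.
def quote_identifier(name):
    # Escape any literal backticks, then wrap the whole name.
    return '`' + name.replace('`', '``') + '`'

columns = ['dt', 'userAgent', 'event_action.loaded.timing']
set_clause = ','.join('%s = NULL' % quote_identifier(c) for c in columns)
query = ('UPDATE `Edit_17541122` SET ' + set_clause +
         ' WHERE timestamp >= %(start_ts)s AND timestamp <= %(end_ts)s')
print(query)
# UPDATE `Edit_17541122` SET `dt` = NULL,`userAgent` = NULL,
#   `event_action.loaded.timing` = NULL WHERE ...
```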
[15:12:02] :]
[15:44:40] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Support multi DC statsv - https://phabricator.wikimedia.org/T179093#3837784 (Ottomata) I'm going to bring this discussion back here from gerrit, since it will be easier to keep track of and find later. > Reading further, it sounds like this is a...
[15:45:00] Analytics-Kanban, Patch-For-Review, User-Elukey: Add the prometheus jmx exporter to all the Hadoop daemons - https://phabricator.wikimedia.org/T177458#3837787 (elukey) So added the following on the analytics1029's yarn nodemanager and got: ``` blacklistObjectNames: - 'Hadoop:service=NodeManager,na...
[15:55:30] elukey: ok, i verified volans' puppet-sign-cert solution
[15:55:46] and generated and signed the varnishkafka cert on the puppet master
[15:56:14] i'm going to commit it to private
[15:57:02] \o/
[15:57:25] i did that with a local copy of cergen in my homedir, so i'm going to make a new deb package with the fix and upgrade
[15:59:16] ottomata: so we are ready for https://gerrit.wikimedia.org/r/#/c/397765/
[16:01:02] ping ottomata
[16:01:56] cominngng
[16:02:37] elukey: ya think so
[16:11:07] (CR) Fdans: [C: 2] Splitting webpack bundle [analytics/wikistats2] - https://gerrit.wikimedia.org/r/397989 (https://phabricator.wikimedia.org/T182601) (owner: Nuria)
[16:12:06] Analytics-Kanban, Patch-For-Review, User-Elukey: Add the prometheus jmx exporter to all the Hadoop daemons - https://phabricator.wikimedia.org/T177458#3837852 (elukey) >>! In T177458#3837513, @fgiunchedi wrote: > I took a quick look at the metrics, some observations: > > datanode: > > 1. `Hadoop_Da...
[16:13:40] (Merged) jenkins-bot: Splitting webpack bundle [analytics/wikistats2] - https://gerrit.wikimedia.org/r/397989 (https://phabricator.wikimedia.org/T182601) (owner: Nuria)
[16:15:37] Analytics, Analytics-Wikistats: Wikistats 2.0 Bug at swwiki - https://phabricator.wikimedia.org/T182859#3836943 (Milimetric) The bug is like this: All the charts use a time *range* like "from 2015 until 2017". So the time selector works fine for those. Ranked lists use a *single* date like "2017". In...
[16:17:31] Analytics-Kanban, Patch-For-Review, User-Elukey: Add the prometheus jmx exporter to all the Hadoop daemons - https://phabricator.wikimedia.org/T177458#3837870 (elukey) New Datanode metrics: ``` elukey@analytics1029:/var/log/hadoop-hdfs$ curl http://analytics1029.eqiad.wmnet:51010/metrics -s | grep -...
[16:26:08] mforns: awesome doc on privacy, again I'm so glad you stepped in on this issue, I was mega lost :)))
[16:26:46] fdans, thanks, but we got it all together, the power of a-team
[16:37:21] ping a-team
[16:38:00] ping mforns joal
[16:40:03] ottomata: something weird in https://gerrit.wikimedia.org/r/#/c/398255/3 from your point of view? pcc breaks :(
[16:40:12] https://puppet-compiler.wmflabs.org/compiler02/9354/kafka1023.eqiad.wmnet/change.kafka1023.eqiad.wmnet.err
[16:40:21] ping mforns joal
[16:41:32] soooooorry!
[16:45:26] a-team, sorry, I had an issue here, a bug in my wife's ear
[16:45:28] solved now
[16:45:48] aaaah! :o
[16:50:20] Analytics-Kanban, Analytics-Wikistats: Wikistats Bug: Menu to select projects doesn't work (sometimes?) - https://phabricator.wikimedia.org/T179530#3837937 (Elitre) well... I wonder if others would consider that as "broken" as I did, then. TY!
[16:51:26] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Support multi DC statsv - https://phabricator.wikimedia.org/T179093#3837941 (fgiunchedi) >>! In T179093#3837784, @Ottomata wrote: > I'm going to bring this discussion back here from gerrit, since it will be easier to keep track of and find later....
[16:57:07] Analytics, Analytics-Wikistats: Wikistats 2.0 Bug at swwiki - https://phabricator.wikimedia.org/T182859#3837966 (Kipala) I do not think it adds up 2015 - 2017. I pick a few lemmata from 1 Oct 2015 to today. OLD TOOL: https://tools.wmflabs.org/pageviews/?project=sw.wikipedia.org&platform=all-access&agent=...
[16:59:00] Analytics, Analytics-Wikistats: Wikistats 2.0 Bug at swwiki - https://phabricator.wikimedia.org/T182859#3837970 (Kipala) So where do the readers for Laozi (=Lao Tse) come from? If it was not a bunch of automatic scripts in 2015 - that figure looks impossible on this wikipedia.
[17:18:02] Hey analytics-engineering: would you mind if I use something like `hive -hiveconf mapreduce.map.memory.mb=4096 -hiveconf mapreduce.reduce.memory.mb=5120` in my Hive queries from stat1005?
[17:18:38] I am running into some `Exception in thread "main" java.lang.OutOfMemoryError: Java heap space` problems
[17:19:01] So I thought a solution along the following lines: https://stackoverflow.com/questions/29673053/java-lang-outofmemoryerror-java-heap-space-with-hive would do? Please advise. Thanks.
[17:19:59] GoranSM: sure, try it! :)
[17:20:34] ottomata: Thank you. Could you suggest what values to use: I am not aware of the precise constraints on the analytics cluster?
[17:20:44] GoranSM: The thing to know would be if the OOM happens in the workers or in the hive client
[17:21:06] the solution you suggest would work for the first (workers), not if the problem occurs in the client
[17:21:10] GoranSM: --^
[17:21:26] ottomata: joal; the thing started happening when I switched from using beeline to hive in my R scripts
[17:21:45] And I had to do that because crontab wouldn't run beeline system calls from R under my username on stat1005
[17:22:00] when I do beeline, nothing similar happens
[17:22:02] GoranSM: seems not related to workers then - related to client, the solution you suggest won't solve the problem I think
[17:22:27] Also, this thing pops up every now and then: log4j:WARN No such property [maxBackupIndex] in org.apache.log4j.DailyRollingFileAppender.
[17:23:15] The only change made with respect to the queries that run successfully was a switch from beeline to the hive command, for the reasons explained above (crontab).
[17:24:20] joal: when you say the problem is related to the client, what exactly do you mean: a stat1005 related constraint, a crontab related constraint..?
[17:24:49] I am sorry to bother you about this my dear colleagues but these queries are a must have, unfortunately
[17:25:03] when you launch hive or beeline, there is a local JVM process launched
[17:25:10] that process then submits mapreduce jobs to the hadoop cluster
[17:25:18] 'client' here refers to the local JVM process
[17:26:29] ottomata: thanks for the clarification. So, turns out that the JVM process has different constraints/settings depending on whether it was run by a beeline or by a hive command?
[17:26:57] I can't recall how we solved that last time it came up
[17:27:07] ottomata: Because when I use beeline (which I cannot do from R on crontab, for whatever reason) everything works perfectly.
[17:27:20] joal: go on. I'm all ears.
[17:27:45] hmm, just checked, it looks like both hive and beeline launch the jvm with -Xmx256m
[17:27:46] arf - I can't - looking now
[17:27:58] but, they are different codebases, so probably handle query results differently
[17:28:10] let me help
[17:28:13] i'm trying to figure out how to launch the hive jvm with more mem
[17:28:55] the problem might be related to the size of the queries themselves; they send a LOT of characters to the cluster (Wikidata item IDs), and they're already cut into nice batches for beeline to be able to handle them
[17:29:10] Found it GoranSM - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Queries#Out_of_Memory_Errors_on_Client
[17:29:18] So it might be that what ottomata says is the only thing we need: more memory for the JVM process launched by the hive command
[17:29:36] AHH i just found that by experiment joal :)
[17:29:49] yes GoranSM try setting HADOOP_HEAPSIZE when you launch hive or beeline
[17:30:14] ottomata: that solution (the URL) would work if the query fails in the fetch phase (--incremental=true)
[17:30:42] ottomata: However, these queries seem to be failing even before they're delivered to the workers, as you say
[17:31:35] ottomata: setting HADOOP_HEAPSIZE to how much? Please, I don't want to play with parameters whose constraints I am not aware of (if any reading is recommended, I will gladly go for it)
[17:32:36] Also GoranSM - While I understand you need data to be computed, hive queries run from R are not something we have done
[17:33:13] GoranSM: try bigger amounts, maybe start with 1024 (1G)
[17:33:13] We usually go for oozie to run hadoop side queries, or reportupdater when queries are for external reports and data is small enough
[17:33:17] then try 2048 (2G)
[17:33:20] etc.
[17:33:28] joal: Well there's a first time for everything :) I can run beeline by system calls from R nicely, however that - for some reason - fails under crontab.
[17:33:41] GoranSM: what's the error when it fails from cron?
[17:33:45] ottomata: thanks.
[17:33:47] i assume it doesn't start running the hive job, right?
[17:34:17] ottomata: when i run beeline by system calls from R under crontab: "no current connection" is reported back.
[17:34:59] oh i betcha i know
[17:35:06] GoranSM: how do you launch beeline from R/cron?
[17:35:10] GoranSM: beeline needs to be given the hive server to connect to
[17:35:20] ya, we have a sneaky CLI wrapper when you use it from the cli
[17:35:22] that does that for you
[17:35:30] when you call 'beeline' from the cli
[17:35:34] you are actually running /usr/local/bin/beeline
[17:35:39] I just do: beeline -e something
[17:35:47] which ends up doing os.execv('/usr/bin/beeline', ['/usr/bin/beeline'] + ARGS)
[17:35:54] so, probably, your R script is running /usr/bin/beeline
[17:36:00] try changing your R script to launch with
[17:36:04] /usr/local/bin/beeline -e something
[17:36:54] @ottomata just to check if I understand correctly, you are suggesting to simply replace all `beeline -e *` calls with `/usr/local/bin/beeline -e *` calls?
[17:37:04] yes, in your R script
[17:37:24] without being qualified, i betcha your R script is running /usr/bin/beeline
[17:37:27] instead of /usr/local/bin/beeline
[17:37:30] Ok. Testing. You guys rock, thank you very much. Be back to report.
[17:37:36] you can use /usr/bin/beeline, but you'd need to specify the hive server hostname
[17:37:45] look at /usr/local/bin/beeline to see how it works
[17:37:47] ottomata: Don't know, could be.
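(A sketch of what a wrapper like that could plausibly look like. It is not the real /usr/local/bin/beeline: the default options, the hive server URL, and the USER lookup are inferred from this conversation and from the traceback GoranSM pastes just below.)

```python
#!/usr/bin/env python
# Hypothetical sketch of a /usr/local/bin/beeline wrapper: fill in the
# hive server and username defaults, then exec the real beeline.
# Defaults are inferred from the log, not copied from the real script.
import os
import sys

DEFAULT_OPTIONS = {
    # Raises KeyError if USER is unset, e.g. in a bare cron environment,
    # which is exactly the failure shown in the traceback below.
    '-n': os.environ['USER'],
    # Assumed hive server, from ottomata's s/localhost/... note later on.
    '-u': 'jdbc:hive2://analytics1003.eqiad.wmnet:10000',
}

ARGS = sys.argv[1:]
# Only supply a default if the caller didn't pass that flag explicitly.
for flag, value in DEFAULT_OPTIONS.items():
    if flag not in ARGS:
        ARGS = [flag, value] + ARGS

os.execv('/usr/bin/beeline', ['/usr/bin/beeline'] + ARGS)
```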
[17:37:49] it's just a little wrapper
[17:38:08] I also want to reiterate GoranSM: for production systems, calling hive from R is not what we recommend :)
[17:41:28] joal: O:-)
[17:43:50] GoranSM: yeah, maybe you can just use RJDBC in R itself?
[17:43:52] rather than shelling out?
[17:43:55] https://stackoverflow.com/a/33020370/555565
[17:44:02] see the UPDATEd example there
[17:45:00] s/"jdbc:hive2://localhost:10000/mydb"/"jdbc:hive2://analytics1003.eqiad.wmnet:10000/mydb"
[17:57:18] Analytics-Kanban: Add link to wikistats 2 from analytics.wikimedia.org - https://phabricator.wikimedia.org/T182904#3838173 (Nuria)
[17:57:37] Analytics-Kanban: Add link to wikistats 2 from analytics.wikimedia.org - https://phabricator.wikimedia.org/T182904#3838173 (Nuria)
[18:01:43] ottomata: get ready
[18:01:50] ottomata: Here it goes:
[18:01:58] Traceback (most recent call last):
[18:01:59]   File "/usr/local/bin/beeline", line 21, in <module>
[18:01:59]     DEFAULT_OPTIONS = {'-n': os.environ['USER'],
[18:01:59]   File "/usr/lib/python2.7/UserDict.py", line 40, in __getitem__
[18:01:59]     raise KeyError(key)
[18:01:59] KeyError: 'USER'
[18:03:11] GoranSM: how do you launch R?
[18:03:19] from what crontab?
[18:03:21] user?
[18:03:26] nice -10 Rscript
[18:03:29] user: goransm
[18:03:40] crontab: goransm's crontab
[18:04:45] dunno what Rscript does or why you'd lose the USER env
[18:04:45] ottomata: nice -10 Rscript, where Rscript is a command for system runs of R scripts
[18:04:58] in your cron, try adding
[18:05:03] USER=goransm ...
[18:05:06] at the beginning
[18:05:12] oh maybe it's nice..
[18:05:17] try
[18:05:26] export USER=goransm && nice -10 Rscript ..
[18:05:37] ottomata: will do, thanks. let me try
[18:16:35] ottomata: https://gerrit.wikimedia.org/r/#/c/398301/1
[18:16:42] :)
[18:16:53] just +1ed :)
[18:18:12] elukey: FYI, just upgraded cergen to 0.2.0 on puppetmaster1001 with the puppet-sign-cert fix
[18:18:18] super
[18:18:27] I think we are ready to go for kafka1023
[18:18:29] shall I?
[18:18:44] yes do it!
[18:26:03] ottomata: it seems to be working with (A) beeline calls from /usr/local/bin/beeline and passing the username in crontab as (B) export USER=goransm && nice etc etc.
[18:26:49] great!
[18:29:23] ottomata: I can confirm now, the cluster processes are running.
[18:29:32] ottomata: thanks a lot!!!
[18:29:48] milimetric: i do not understand how piwik does not show visits from spain cause fdans should show up every day
[18:30:08] milimetric: that he deploys at least, right?
[18:30:16] I guess I should be sending one beer to San Francisco (@ottomata) and one to France (@joal) :)
[18:31:03] haha ottomata is brooklyn, but i'll catch the beer midair on its way over
[18:31:12] ottomata: 1023 is bootstrapping now
[18:31:47] nuria_: ublock might be blocking piwik on my browser
[18:33:03] (PS1) Fdans: Replaces hardcoded URL with dynamic date values [analytics/wikistats2] - https://gerrit.wikimedia.org/r/398306 (https://phabricator.wikimedia.org/T182859)
[18:33:32] Analytics-Kanban: Initial Launch of new Wikistats 2.0 website - https://phabricator.wikimedia.org/T160370#3838248 (Nuria)
[18:33:35] Analytics, Analytics-Wikistats: Design new UI for Wikistats 2.0 - https://phabricator.wikimedia.org/T140000#3838249 (Nuria)
[18:33:37] Analytics-Kanban, Analytics-Wikistats: Backend for wikistats 2.0 - https://phabricator.wikimedia.org/T156384#3838247 (Nuria) Open>Resolved
[18:34:27] ottomata: :) I thought it was San Francisco. Anyways, thanks a lot.
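(Putting the fixes from this thread together: call the wrapper rather than bare beeline, export USER since cron does not set it, and raise HADOOP_HEAPSIZE for the client JVM as ottomata suggested earlier. A hypothetical shell-out mirroring what an R system() call or cron entry would do; the query and values are placeholders.)

```python
# Hypothetical sketch combining the fixes discussed above:
#  - call the /usr/local/bin/beeline wrapper (not bare 'beeline'),
#  - make sure USER is set, since cron environments don't export it,
#  - raise HADOOP_HEAPSIZE for the local client JVM (start at 1024,
#    then 2048, etc., per ottomata's suggestion).
import os
import subprocess

env = dict(os.environ)
env.setdefault('USER', 'goransm')    # cron environments lack USER
env['HADOOP_HEAPSIZE'] = '1024'      # MB for the hive/beeline client JVM

subprocess.check_call(
    ['/usr/local/bin/beeline', '-e', 'SELECT 1'],  # placeholder query
    env=env,
)
```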
[18:37:35] milimetric nuria_ if you can okay that fix I'd like to deploy it asap
[18:38:14] hello kafka1023! https://grafana.wikimedia.org/dashboard/db/kafka?refresh=5m&orgId=1&from=now-3h&to=now
[18:38:51] fdans: i have not looked at it, conceding to milimetric
[18:43:16] ottomata: everything looks good, I downtimed 1023 for two hours just to be sure
[18:43:26] but I don't see anything weird
[18:43:40] ok if I hand it over to you?
[18:52:26] going afk now but will check later people :)
[18:52:29] * elukey afk!
[19:00:22] fdans: sorry, had lunch, don't worry, I'll look it over and deploy if you're gone
[19:00:29] (metrics meeting started)
[19:00:40] awesome, thank you milimetric
[19:01:15] great elukey i'm watching it
[19:01:16] thanks
[19:01:20] (also in meetings....)
[19:04:18] elukey: it's totally working! syncing everything over nicely :)
[20:01:21] (PS2) Milimetric: Replaces hardcoded URL with dynamic date values [analytics/wikistats2] - https://gerrit.wikimedia.org/r/398306 (https://phabricator.wikimedia.org/T182859) (owner: Fdans)
[20:02:04] Dario giving us some wikilove RIGHT NOW for wikistats so you know!
[20:02:28] fdans: you around still?
[20:02:42] k, too late, deploying :)
[20:02:44] milimetric: yep, looking at your patch
[20:02:52] ah, go ahead then
[20:02:54] oh ok :)
[20:02:56] nono, all yours
[20:03:09] I just didn't want special casing in aqs, otherwise it was cool
[20:04:29] milimetric: I thought the opposite
[20:04:57] I thought it was aqs who should know about particularities in the metrics, not the components
[20:05:34] my first approach was like yours but I moved it to aqs for that reason milimetric
[20:06:34] fdans: mmm, maybe, but the aqs client is dumb otherwise, it just takes parameters. So if we do something like that, we should maybe make a second layer
[20:06:45] that layer would know about the config and the inputs, and make the connection
[20:07:00] I think we're both right, neither of those things should know about the other
[20:07:12] we're already doing a special case for tops in aqs, though, right?
[20:07:21] the formatTops method
[20:07:30] for output, yeah, didn't like that either
[20:07:38] because it's replicated in GraphModel
[20:07:42] (the special casing)
[20:07:51] so it's really the config / GraphModel that should do all this work
[20:08:46] like let { common, unique } = graphModel.getParameters(userInput);
[20:08:49] fdans: ^
[20:08:57] I think that's what we can get to from both MetricWidget and Detail
[20:09:07] so this just seemed like a closer step to that
[20:10:47] anyway fdans, feel free to deploy either way
[20:13:02] ok, let's deploy this
[20:13:25] (CR) Fdans: [C: 2] Replaces hardcoded URL with dynamic date values [analytics/wikistats2] - https://gerrit.wikimedia.org/r/398306 (https://phabricator.wikimedia.org/T182859) (owner: Fdans)
[20:18:30] (PS1) Fdans: Release 2.1.2 [analytics/wikistats2] - https://gerrit.wikimedia.org/r/398321
[20:21:42] Gone for tonight lads
[20:21:56] Thanks for the patch and deploy milimetric and fdans !!
[20:22:18] (PS2) Fdans: Release 2.1.2 [analytics/wikistats2] - https://gerrit.wikimedia.org/r/398321
[20:24:02] (CR) Fdans: [V: 2 C: 2] Release 2.1.2 [analytics/wikistats2] - https://gerrit.wikimedia.org/r/398321 (owner: Fdans)
[20:25:01] thanks fdans, and let's keep in mind the refactor, it's needed for more than just this reason
[20:26:17] (PS1) Fdans: Release 2.1.2 [analytics/wikistats2] (release) - https://gerrit.wikimedia.org/r/398322
[20:26:42] (CR) Fdans: [V: 2 C: 2] Release 2.1.2 [analytics/wikistats2] (release) - https://gerrit.wikimedia.org/r/398322 (owner: Fdans)
[20:27:48] ok a-team patches deployed, and I just tested the hive query with the bucketed views and the K and it's looking good!
[20:27:56] see you on Monday
[20:30:53] Bye fdans_onholiday :)
[20:40:31] anyone have some info on how hue.wikimedia.org login is controlled?
[20:40:46] I have an access request for someone requesting shell access for analytics-privatedata-users in regards to it
[20:40:48] https://phabricator.wikimedia.org/T182908
[20:41:02] Seems odd they are asking for shell access, and then requesting access to a web service.
[20:48:56] byye
[20:49:00] robh, yeah
[20:49:20] they don't need shell access to log into hue, but hue interacts with hadoop, and for that they need a shell account
[20:49:35] What controls login to the hue portal?
[20:49:37] robh: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Access#HTTP_Access
[20:49:42] It seems they want both, heh
[20:49:44] it's ldap, but the account needs to be manually synced
[20:50:06] ahhhh
[20:50:12] ok, let me correct my reply
[20:50:22] so, once they have a shell account, then we can create their hue account
[20:51:10] so they need privatedata-users or just any stat shell?
[20:51:29] the task lacks details on exactly what they plan to do with the shell access side so i asked them to clarify
[20:51:35] but perhaps it's perfectly clear to you? =]
[20:51:47] he emailed me before, so a little~!
[20:52:12] i actually don't know for sure if he needs analytics-users or analytics-privatedata-users, but usually people using hadoop want access to webrequest or other private data, so usually it's analytics-privatedata-users
[20:52:28] if they want hadoop access, it's always going to be one of the analytics-* groups
[20:53:10] robh, also, i think he already has a shell account defined in data.yaml, just not in any groups(?)
[20:53:14] nope
[20:53:17] he has an ldap account
[20:53:18] no shell
[20:53:29] so he has the ldap nda flag but no login to systems yet
[20:53:47] (so he is defined in data.yaml yes)
[20:53:56] but in the ldap users only section for now
[20:54:18] ah ok
[20:54:48] all sorted, just need some feedback, thx for posting on task
[20:56:10] Finally got a shell request and those are typically the most rewarding parts of clinic duty (people are very happy when you give them access to things)
[20:56:21] feel bad making them explain more but thems the breaks.
[21:04:12] haha
[21:04:13] :)
[23:02:41] dear analytics, congrats on Wikistats 2.
[23:03:44] question: I assume that Edits in https://stats.wikimedia.org/v2/#/commons.wikimedia.org show the edits in namespace 0 and not the uploads to Commons. Is this correct? And if yes, when do you think this will be fixed? :)
[23:48:57] Analytics: Refresh zookeeper nodes on equiad - https://phabricator.wikimedia.org/T182924#3838887 (Nuria)
[23:49:59] Analytics: Refresh zookeeper nodes on eqiad - https://phabricator.wikimedia.org/T182924#3838896 (Nuria) a: elukey
[23:50:53] Analytics, Analytics-Kanban: Hardware refresh eventlogging box - https://phabricator.wikimedia.org/T182925#3838900 (Nuria)
[23:55:56] Analytics, Analytics-Kanban: Put in service 8 new hadoop nodes - https://phabricator.wikimedia.org/T182926#3838922 (Nuria)
[23:59:20] Analytics, Analytics-Kanban, User-Elukey: Refactor analytics cronjobs to alarm on failure with (maybe) cronic - https://phabricator.wikimedia.org/T172532#3838935 (Nuria)