[02:42:55] Analytics-Kanban: Create small sample mediawiki-history table in MariaDB - https://phabricator.wikimedia.org/T165309#3836500 (Milimetric) I'm spinning in place on this task. Somehow hive corrupted data when I tried to export a sample. I have wiki_db == '{{Information|' in some records!!!?
[03:52:43] Analytics-Kanban: Create small sample mediawiki-history table in MariaDB - https://phabricator.wikimedia.org/T165309#3836524 (Milimetric) This is now available for querying. cc @Neil_P._Quinn_WMF, @ezachte, and @Tbayer . Feel free to play with it and let us know how useful it is. It's a bit of a pain to d...
[09:06:28] joal: o/
[09:06:35] did a bit of a revolution in https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&from=now-6h&to=now
[09:06:58] it may be too extreme now but it is incredibly fast to load and see only what you need
[09:09:50] also added some metrics for ebernhardson's task (sum of running containers, failed/killed, etc..)
[09:10:23] my idea is to restart nodemanagers on 103* today to apply the new RSS calculation settings
[09:10:33] and then to the rest next week if everything looks good
[09:13:28] Sounds good to me :)
[09:13:33] elukey: o/ --^
[09:43:16] elukey: question for you on graphite metrics
[09:43:45] sure
[09:44:20] elukey: To plot traffic for AQS on pageviews-per-article, you use the .sample_rate metric
[09:44:28] Would you mind explaining it to me?
[09:45:45] well if you are talking about the aqs-elukey dashboard, I copied over and modified the aqs one so probably there can be mistakes
[09:46:28] I actually don't know what the sample_rate thing does, and when I try to replicate it for the new WKS2 endpoints, well I get strange results
[09:47:27] ?
[09:48:56] we could use the "rate" metric, just realized it
[09:49:01] wow and it looks like way more traffic
[09:49:15] elukey: count?
[09:50:22] well then you have to rate it no?
[09:50:27] we already have the rate
[09:50:45] elukey: why 'rate' it?
[09:51:26] the count afaik is always increasing, so there is not much point in plotting it if you don't use a per second rate
[09:51:41] but if it is the number of datapoints it might be good
[09:51:46] maybe I am confusing it with prometheus
[09:51:49] checking
[09:52:47] aqs-elukey now uses count
[09:52:53] if you want to check
[09:53:07] at some point we'll need to review the main aqs dashboard
[09:56:26] elukey: Checking numbers vs hourly hive data
[09:57:26] elukey: We got 2625623 requests for the per-article endpoint on hour 7 today
[09:57:42] on average ~730 req/s
[10:01:24] * elukey just ran the webrequest_joal_checker.hql script to verify false positives
[10:01:59] :)
[10:03:30] wow the query is really nice
[10:05:25] elukey: Ah, just realized that what we see in graphite is un-cached responses only !!!
[10:10:33] elukey: I don't understand :( For an hour of data, I have more 200 than 2xx :S So weird
[10:11:40] missing a bit of context
[10:11:53] what data are you checking atm? :)
[10:12:35] I'm checking data in grafana, using sumSeries(restbase.*.v1_metrics_pageviews_per-article_-project-_-access-_-agent-_-article-_-granularity-_-start-_-end-.*.200.sum)
[10:14:14] ah ok
[10:15:08] why .sum in the end? maybe .count ?
[10:15:11] elukey: And I can't manage to replicate the numbers I have in hive
[10:15:31] but numbers in hive are from varnish right?
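(A minimal sketch of the cross-check joal and elukey are attempting above: pull one hour of the Graphite series out of the render API and sum it, then compare with the request count Hive reports for the same hour. The metric path is the one quoted in the log; which leaf to read (.count, .sum, .rate) and what each one aggregates is exactly the open question in this exchange, so the choice below is an assumption, not a recommendation.)

```python
# Hypothetical cross-check: sum one hour of a Graphite counter and compare
# with the Hive-side request count. Assumes access to graphite.wikimedia.org
# and that the '.count' leaf holds per-interval deltas, not a cumulative total.
import requests

GRAPHITE = 'https://graphite.wikimedia.org/render'
TARGET = ('sumSeries(restbase.*.v1_metrics_pageviews_per-article_'
          '-project-_-access-_-agent-_-article-_-granularity-_'
          '-start-_-end-.*.200.count)')

resp = requests.get(GRAPHITE, params={
    'target': TARGET,
    'from': '07:00_20171214',    # hour 7, the hour checked against Hive
    'until': '08:00_20171214',
    'format': 'json',
})
resp.raise_for_status()

# Render API returns [{'target': ..., 'datapoints': [[value, ts], ...]}]
series = resp.json()[0]['datapoints']
total = sum(v for v, ts in series if v is not None)
print('graphite total for hour 7: %d (hive: 2625623 total, 95698 misses)' % total)
```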
[10:15:42] correct elukey - I now take only cache-misses
[10:16:12] 95698 requests in hour 7 had cache-miss
[10:17:16] I find, using sum, 72584
[10:17:28] Which is not perfect, but order of magnitude is ok
[10:19:24] what cache status filter do you use? I am asking because now we don't have simply "miss" but iirc also frontend-miss or something similar
[10:19:34] so the misses that we want are the varnish backend ones
[10:19:48] Cache statuses for that hour:
[10:19:52] cache_status _c1
[10:19:52] hit-front 2516044
[10:19:52] hit-local 13495
[10:19:52] hit-remote 385
[10:19:52] miss 95698
[10:19:55] pass 1
[10:20:11] from all the caches right?
[10:20:30] so maybe we only need eqiad
[10:20:42] say you have a miss on esams
[10:21:02] * elukey wonders
[10:21:12] elukey: this is the cache_status value provided in webrequest
[10:26:12] I am asking the traffic masters if 'miss' in this context means "it landed on the backend"
[10:30:03] not sure joal, I tried to think about any Varnish-related reason that could affect the numbers but I didn't find one
[10:30:22] maybe restbase.*.v1_metrics_pageviews_per-article_-project-_-access-_-agent-_-article-_-granularity-_-start-_-end-.*.200.sum is not enough?
[10:30:26] elukey: I think rate is the way to go for realtime values
[10:30:45] elukey: But for count, it's a bit strange
[10:33:20] not sure if rate is precomputed or not, need to check that
[10:33:56] Analytics-Kanban, Analytics-Wikistats: Wikistats Bug: Menu to select projects doesn't work (sometimes?) - https://phabricator.wikimedia.org/T179530#3727714 (Elitre) Right now (Safari, MacBook Air), when I select Wikipedia and am prompted to choose a language, the list won't go beyond Bhojpuri (although I...
[10:43:55] need to run errand + early lunch, ttl!
[10:44:17] elukey: https://grafana-admin.wikimedia.org/dashboard/db/aqs-wikistats-2-traffic?orgId=1
[10:44:22] For when you're back ;)
[10:44:38] ack! :)
[11:16:03] Analytics, Analytics-Wikistats: Wikistats 2.0 Bug at swwiki - https://phabricator.wikimedia.org/T182859#3836943 (Kipala)
[11:31:55] Analytics, Analytics-Wikistats: Wikistats 2.0 Bug at swwiki - https://phabricator.wikimedia.org/T182859#3836943 (JAllemandou) Thanks @Kipala :), good catch! Seems related to dates of the data. In Wikistats 2 data is pulled from 2015-10. The time selector at the back doesn't actually mean anything for...
[11:49:38] Analytics, Analytics-Wikistats: Wikistats 2.0 Bug at swwiki - https://phabricator.wikimedia.org/T182859#3837043 (Kipala) Ok, that is the reason. Is that for all numbers? (a history page 0.2 instead of 2.0?)
[12:13:39] Analytics, Analytics-Wikistats: Wikistats 2.0 Bug at swwiki - https://phabricator.wikimedia.org/T182859#3837083 (JAllemandou) >>! In T182859#3837043, @Kipala wrote: > a history page 0.2 instead of 2.0? I don't understand this bit ... The time selector works well on any other page (with charts). The "top...
[12:13:52] Hi fdans, are you around?
[12:14:17] helloooo joseph, what up! joal
[12:15:10] fdans: What would be a good format for points in lines for you to display?
[12:17:36] joal: we can declare 404 as a new hidden metric of line chart type, that gets one file as source
[12:17:55] the perfect format would be imitating a time series response from aqs like in pageviews
[12:19:37] fdans: makes sense :)
[12:19:40] fdans: will do !
[12:19:48] awesome!!!
[12:19:57] fdans: scale management is ok I imagine?
[12:20:35] joal: the ideal situation would be to span it over 2 years, as that's the default view for line graphs
[12:20:46] ok fdans
[12:20:58] and, format-wise - pageview or edits?
[12:21:05] values don't matter as graphs auto-scale
[12:21:33] joal: the value name doesn't matter because we specify it in the metric's config
[12:21:39] fdans: edits feels easier as I only serve ts and value, not other content
[12:21:45] ok makes sense
[12:21:56] yeah just need the timestamp and the value joal
[12:22:00] So, a json array with a timestamp and a value :)
[12:22:38] exactly joal :D
[12:39:03] Analytics-Kanban, Analytics-Wikistats: Wikistats Bug: Menu to select projects doesn't work (sometimes?) - https://phabricator.wikimedia.org/T179530#3837170 (fdans) @Elitre yes, by design we limit the number of results on the wiki selector, so that the scroll bar remains usable and to encourage the user t...
[12:45:19] fdans: Do you chart a continuous line if there are missing days?
[12:45:25] Or do I need to fill the holes
[12:45:26] ?
[12:51:12] fdans: First try - https://gist.github.com/jobar/bcd9758d9dece925ac8907bb491205cf
[12:55:38] (CR) Fdans: "Tested in Mac Chrome 62 and Safari 10.1.1, in both looks smooth and it feels less laggy!" [analytics/wikistats2] - https://gerrit.wikimedia.org/r/397989 (https://phabricator.wikimedia.org/T182601) (owner: Nuria)
[12:56:18] joal: I'm going to get some lunch now, I'll test this after that, thank you joseph!
[12:56:25] :)
[13:54:16] Analytics-Kanban, Patch-For-Review, User-Elukey: Add the prometheus jmx exporter to all the Hadoop daemons - https://phabricator.wikimedia.org/T177458#3837454 (elukey) We are now in the process of refining Yarn/HDFS worker configurations. The first thing that I'd blacklist for the nodemanager are all...
[14:00:34] mforns: o/
[14:00:36] pymysql.err.InternalError: (1054, "Unknown column 'event_action.loaded.timing' in 'field list'")
[14:00:42] this is eventlogging cleaner
[14:00:46] hey elukey :]
[14:00:56] oh...
[14:01:29] (that is awesome since we caught an error and it got sent to the analytics-alerts ml)
[14:01:32] elukey, is that the no-whitelist cleaner on master? or is it the sanitizer on slave?
[14:01:44] only the slave one, didn't add anything to the master
[14:01:44] yea :]
[14:01:48] ok
[14:01:56] maybe a whitelist item is missing?
[14:02:08] or a wrong item
[14:02:13] mmmm
[14:02:19] full stack trace in the email
[14:02:42] elukey, it doesn't look though like an error thrown directly by the cleaner, no?
[14:02:57] more like an unexpected thing
[14:03:11] yep just noticed it, more a pymysql thing
[14:03:35] elukey, I think I did not receive the email, what is the subject?
[14:04:24] Cron /usr/bin/flock --verbose -n /var/lock/eventlogging_cleaner /usr/local/bin/eventlogging_cleaner --whitelist /etc/eventlogging/whitelist.tsv --older-than 90 --start-ts-file /var/run/eventlogging_cleaner --batch-size 10000 --sleep-between-batches 2 >> /var/log/eventlogging/eventlogging_cleaner.log
[14:05:01] ah there you go
[14:05:02] ok, yea I got that
[14:05:03] INFO: line 121: Executing command UPDATE `Edit_17541122` SET dt = NULL,userAgent = NULL,event_action.loaded.timing = NULL WHERE timestamp >= %(start_ts)s AND timestamp <= %(end_ts)s with params {'end_ts': '20170915110001', 'start_ts': '20170914110001'}
[14:05:07] this is the last line in the log
[14:05:18] event_action.loaded.timing = NULL
[14:05:18] aa
[14:06:05] maybe this is the only field with dots in its name that we're purging???
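(For reference, a sketch of the 404-metric format joal and fdans settle on above: a JSON array of timestamp/value points shaped like an AQS time-series response. The field names here are assumed by analogy with the pageviews API; joal's actual first try is the gist linked at 12:51.)

```python
# Hypothetical sketch of the agreed format: a JSON array of
# {timestamp, value} points, shaped like an AQS time-series response.
# Field names and the YYYYMMDDHH timestamp format are assumed by
# analogy with the AQS pageviews API, not taken from joal's gist.
import json

daily_404_counts = [
    ('2016-01-01', 1200),
    ('2016-01-02', 980),
    # missing days may need filling, per the question at 12:45 above
]

items = [
    {'timestamp': day.replace('-', '') + '00', 'value': count}
    for day, count in daily_404_counts
]
print(json.dumps({'items': items}, indent=2))
```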
[14:06:43] so event_action.loaded.timing is in the table
[14:06:50] (I did show columns in mysql)
[14:06:58] is Edit_17541122 a new schema?
[14:09:40] so the last one of https://meta.wikimedia.org/wiki/Schema:Edit
[14:12:27] https://meta.wikimedia.org/w/index.php?title=Schema:Edit&action=history was added yesterday
[14:15:30] Analytics-Kanban, Patch-For-Review, User-Elukey: Add the prometheus jmx exporter to all the Hadoop daemons - https://phabricator.wikimedia.org/T177458#3837513 (fgiunchedi) I took a quick look at the metrics, some observations: datanode: 1. `Hadoop_DataNode_DatanodeNetworkCounts{name="DataNodeInfo",...
[14:21:30] now that change was done yesterday evening, but the cleaner started today
[14:21:48] so it shouldn't be a race condition of some sort
[14:22:04] maybe it is the dot
[14:23:40] yes I don't find any "set something.with.dots" in previous logs
[14:26:20] mforns: we need to quote or add `` to the values with dots
[14:26:40] or we can always add ``
[14:27:04] sorry elukey, was finishing a paragraph in the docs.
[14:27:19] elukey, yes, Edit_17541122 is a new schema
[14:28:19] elukey, IIRC Edit table is the only one with dots in its fields
[14:28:38] yep yep confirmed
[14:28:49] elukey, and Edit table is fully whitelisted since we consider editing data as public data
[14:29:09] so, adding a new dotted field to it would be the only example of a dotted field set to NULL
[14:29:51] elukey, we should totally fix the dotted thing by adding `...`
[14:29:59] but also add that field to the white-list I guess
[14:30:30] I wonder if the premise "it's edit related data -> it's not sensitive" is true in this case, I will re-review the schema for privacy
[14:34:39] https://usercontent.irccloud-cdn.com/file/7ZLUDsl4/Screen%20Shot%202017-12-14%20at%2015.34.06.png
[14:34:40] hehe joal
[14:35:29] mforns: https://gerrit.wikimedia.org/r/#/c/398264/1/modules/profile/files/mariadb/misc/eventlogging/eventlogging_cleaner.py ?
[14:36:24] fdans, xDDDDD
[14:36:53] almost fdans :)
[14:37:13] * joal loves our 4o4
[14:37:30] hate those dots.
[14:46:10] +1
[14:47:11] fdans: the 0 could be more round, otherwise this is so great
[14:47:14] what dots?
[14:47:26] oh!
[14:47:49] joal: what do you think about adding a page-history API? To get all the titles a page has had?
[14:49:13] not sure I understand what you suggest milimetric
[14:50:45] aqs/metrics/page-history/23456 -> { 2001-01-01: 'Romania', 2007-05-01: 'Roumania', 2014-12-02: 'Rominia' }
[14:50:46] that kind of thing
[14:51:00] ottomata: hiiiii - I've spammed you earlier on
[14:51:40] HIIII earlier on,...
[14:51:45] oh emails
[14:51:47] yes i see that, getting to it!
[14:51:57] the lawyers wrote a lot of emails last night i guess
[14:51:59] they write long ones...
[14:52:10] :D
[14:52:15] sure sure, whenever you have time
[14:54:21] joal: aqs/metrics/page-history/23456 -> { 2001-01-01: 'Romania', 2007-05-01: 'Roumania', 2014-12-02: 'Rominia' }
[14:56:00] elukey: haha yayyyy puppet 4 broke stuff!
[14:56:07] will look into it!
[14:56:43] I was so looking forward to adding the new vk TLS instance :D
[15:08:36] fdans: with a round zero: https://gist.github.com/jobar/5f7ac014ea64a08058bf609e1783e84f
[15:08:39] :D
[15:09:21] mforns: running the cleaner now manually
[15:10:21] elukey,
[15:10:44] is that a good thing? :D
[15:11:21] elukey, it depends, it can go to -> \o/
[15:11:35] or...
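(The fix elukey points at in the gerrit change above, quoting dotted column names with backticks so MySQL stops parsing them as database.table.column references, presumably looks something along these lines. A sketch of the idea, not the actual patch.)

```python
# Sketch of backtick-quoting column identifiers so that MySQL treats
# 'event_action.loaded.timing' as a single column name rather than a
# database.table.column reference. Illustrative only; see the gerrit
# change linked above for the real eventlogging_cleaner.py fix.
def quote_identifier(name):
    # Escape any literal backticks, then wrap the whole name.
    return '`' + name.replace('`', '``') + '`'

columns = ['dt', 'userAgent', 'event_action.loaded.timing']
set_clause = ','.join('%s = NULL' % quote_identifier(c) for c in columns)
query = ('UPDATE `Edit_17541122` SET ' + set_clause +
         ' WHERE timestamp >= %(start_ts)s AND timestamp <= %(end_ts)s')
print(query)
# UPDATE `Edit_17541122` SET `dt` = NULL,`userAgent` = NULL,
#   `event_action.loaded.timing` = NULL WHERE ...
```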
[15:12:02] :]
[15:44:40] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Support multi DC statsv - https://phabricator.wikimedia.org/T179093#3837784 (Ottomata) I'm going to bring this discussion back here from gerrit, since it will be easier to keep track of and find later. > Reading further, it sounds like this is a...
[15:45:00] Analytics-Kanban, Patch-For-Review, User-Elukey: Add the prometheus jmx exporter to all the Hadoop daemons - https://phabricator.wikimedia.org/T177458#3837787 (elukey) So added the following on the analytics1029's yarn nodemanager and got: ``` blacklistObjectNames: - 'Hadoop:service=NodeManager,na...
[15:55:30] elukey: ok, i verified volans' puppet-sign-cert solution
[15:55:46] and generated and signed the varnishkafka cert on the puppet master
[15:56:14] i'm going to commit it to private
[15:57:02] \o/
[15:57:25] i did that with a local copy of cergen in my homedir, so i'm going to make a new deb package with the fix and upgrade
[15:59:16] ottomata: so we are ready for https://gerrit.wikimedia.org/r/#/c/397765/
[16:01:02] ping ottomata
[16:01:56] cominngng
[16:02:37] elukey: ya think so
[16:11:07] (CR) Fdans: [C: 2] Splitting webpack bundle [analytics/wikistats2] - https://gerrit.wikimedia.org/r/397989 (https://phabricator.wikimedia.org/T182601) (owner: Nuria)
[16:12:06] Analytics-Kanban, Patch-For-Review, User-Elukey: Add the prometheus jmx exporter to all the Hadoop daemons - https://phabricator.wikimedia.org/T177458#3837852 (elukey) >>! In T177458#3837513, @fgiunchedi wrote: > I took a quick look at the metrics, some observations: > > datanode: > > 1. `Hadoop_Da...
[16:13:40] (Merged) jenkins-bot: Splitting webpack bundle [analytics/wikistats2] - https://gerrit.wikimedia.org/r/397989 (https://phabricator.wikimedia.org/T182601) (owner: Nuria)
[16:15:37] Analytics, Analytics-Wikistats: Wikistats 2.0 Bug at swwiki - https://phabricator.wikimedia.org/T182859#3836943 (Milimetric) The bug is like this: All the charts use a time *range* like "from 2015 until 2017". So the time selector works fine for those. Ranked lists use a *single* date like "2017". In...
[16:17:31] Analytics-Kanban, Patch-For-Review, User-Elukey: Add the prometheus jmx exporter to all the Hadoop daemons - https://phabricator.wikimedia.org/T177458#3837870 (elukey) New Datanode metrics: ``` elukey@analytics1029:/var/log/hadoop-hdfs$ curl http://analytics1029.eqiad.wmnet:51010/metrics -s | grep -...
[16:26:08] mforns: awesome doc on privacy, again I'm so glad you stepped in on this issue, I was mega lost :)))
[16:26:46] fdans, thanks, but we got it all together, the power of a-team
[16:37:21] ping a-team
[16:38:00] ping mforns joal
[16:40:03] ottomata: something weird in https://gerrit.wikimedia.org/r/#/c/398255/3 from your point of view? pcc breaks :(
[16:40:12] https://puppet-compiler.wmflabs.org/compiler02/9354/kafka1023.eqiad.wmnet/change.kafka1023.eqiad.wmnet.err
[16:40:21] ping mforns joal
[16:41:32] soooooorry!
[16:45:26] a-team, sorry, I had an issue here, a bug in my wife's ear
[16:45:28] solved now
[16:45:48] aaaah! :o
[16:50:20] Analytics-Kanban, Analytics-Wikistats: Wikistats Bug: Menu to select projects doesn't work (sometimes?) - https://phabricator.wikimedia.org/T179530#3837937 (Elitre) well... I wonder if others would consider that as "broken" as I did, then. TY!
[16:51:26] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Support multi DC statsv - https://phabricator.wikimedia.org/T179093#3837941 (fgiunchedi) >>! In T179093#3837784, @Ottomata wrote: > I'm going to bring this discussion back here from gerrit, since it will be easier to keep track of and find later....
[16:57:07] Analytics, Analytics-Wikistats: Wikistats 2.0 Bug at swwiki - https://phabricator.wikimedia.org/T182859#3837966 (Kipala) I do not think it adds up 2015 - 2017. I pick a few lemmata from 1 Oct 2015 to today. OLD TOOL: https://tools.wmflabs.org/pageviews/?project=sw.wikipedia.org&platform=all-access&agent=...
[16:59:00] Analytics, Analytics-Wikistats: Wikistats 2.0 Bug at swwiki - https://phabricator.wikimedia.org/T182859#3837970 (Kipala) So where do the readers for Laozi (=Lao Tse) come from? If it was not a bunch of automatic scripts in 2015 - that figure looks impossible on this wikipedia.
[17:18:02] Hey analytics-engineering: would you mind if I use something like `hive -hiveconf mapreduce.map.memory.mb=4096 -hiveconf mapreduce.reduce.memory.mb=5120` in my Hive queries from stat1005?
[17:18:38] I am running into some `Exception in thread "main" java.lang.OutOfMemoryError: Java heap space` problems
[17:19:01] So I thought a solution along the following lines: https://stackoverflow.com/questions/29673053/java-lang-outofmemoryerror-java-heap-space-with-hive would do? Please advise. Thanks.
[17:19:59] GoranSM: sure, try it! :)
[17:20:34] ottomata: Thank you. Could you suggest what values to use: I am not aware of the precise constraints on the analytics cluster?
[17:20:44] GoranSM: The thing to know would be if the OOM happens in the workers or in the hive client
[17:21:06] the solution you suggest would work for the first (workers), not if the problem occurs in the client
[17:21:10] GoranSM: --^
[17:21:26] ottomata: joal; the thing started happening when I switched from using beeline to hive in my R scripts
[17:21:45] And I had to do that because crontab wouldn't run beeline system calls from R under my username on stat1005
[17:22:00] when I do beeline, nothing similar happens
[17:22:02] GoranSM: seems not related to workers then - related to client, the solution you suggest won't solve the problem I think
[17:22:27] Also, this thing pops up every now and then: log4j:WARN No such property [maxBackupIndex] in org.apache.log4j.DailyRollingFileAppender.
[17:23:15] The only change made with respect to the queries that run successfully was a switch from beeline to the hive command, for the reasons explained above (crontab).
[17:24:20] joal: when you say the problem is related to the client, what exactly do you mean: a stat1005 related constraint, a crontab related constraint..?
[17:24:49] I am sorry to bother you about this my dear colleagues but these queries are a must have, unfortunately
[17:25:03] when you launch hive or beeline, there is a local JVM process launched
[17:25:10] that process then submits mapreduce jobs to the hadoop cluster
[17:25:18] 'client' here refers to the local JVM process
[17:26:29] ottomata: thanks for the clarification. So, turns out that the JVM process has different constraints/settings depending on whether it was run by a beeline or by a hive command?
[17:26:57] I can't recall how we solved that last time it came up
[17:27:07] ottomata: Because when I use beeline (which I cannot do from R on crontab, for whatever reason) everything works perfectly.
[17:27:20] joal: go on. I'm all ears.
[17:27:45] hmm, just checked, it looks like both hive and beeline launch the jvm with -Xmx256m
[17:27:46] arf - I can't - looking now
[17:27:58] but, they are different codebases, so probably handle query results differently
[17:28:10] let me help
[17:28:13] i'm trying to figure out how to launch the hive jvm with more mem
[17:28:55] the problem might be related to the size of the queries themselves; they send a LOT of characters to the cluster (Wikidata item IDs), and they're already cut into nice batches for beeline to be able to handle them
[17:29:10] Found it GoranSM - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Queries#Out_of_Memory_Errors_on_Client
[17:29:18] So it might be that what ottomata says is the only thing we need: more memory for the JVM process launched by the hive command
[17:29:36] AHH i just found that by experiment joal :)
[17:29:49] yes GoranSM try setting HADOOP_HEAPSIZE when you launch hive or beeline
[17:30:14] ottomata: that solution (the URL) would work if the query fails in the fetch phase (--incremental=true)
[17:30:42] ottomata: However, these queries seem to be failing even before they're delivered to the workers, as you say
[17:31:35] ottomata: setting HADOOP_HEAPSIZE to how much? Please, I don't want to play with parameters whose constraints I am not aware of (if any reading is recommended, I will gladly go for it)
[17:32:36] Also GoranSM - While I understand you need data to be computed, hive queries run from R are not something we have done
[17:33:13] GoranSM: try bigger amounts, maybe start with 1024 (1G)
[17:33:13] We usually go for oozie to run hadoop side queries, or reportupdater when queries are for external reports and data is small enough
[17:33:17] then try 2048 (2G)
[17:33:20] etc.
[17:33:28] joal: Well there's a first time for everything :) I can run beeline by system calls from R nicely, however that - for some reason - fails under crontab.
[17:33:41] GoranSM: what's the error when it fails from cron?
[17:33:45] ottomata: thanks.
[17:33:47] i assume it doesn't start running the hive job, right?
[17:34:17] ottomata: when i run beeline by system calls from R under crontab: "no current connection" is reported back.
[17:34:59] oh i betcha i know
[17:35:06] GoranSM: how do you launch beeline from R/cron?
[17:35:10] GoranSM: beeline needs to be given the hive server to connect to
[17:35:20] ya, we have a sneaky CLI wrapper when you use it from the cli
[17:35:22] that does that for you
[17:35:30] when you call 'beeline' from the cli
[17:35:34] you are actually running /usr/local/bin/beeline
[17:35:39] I just do: beeline -e something
[17:35:47] which ends up doing os.execv('/usr/bin/beeline', ['/usr/bin/beeline'] + ARGS)
[17:35:54] so, probably, your R script is running /usr/bin/beeline
[17:36:00] try changing your R script to launch with
[17:36:04] /usr/local/bin/beeline -e something
[17:36:54] @ottomata just to check if I understand correctly, you are suggesting to simply replace all `beeline -e *` calls with `/usr/local/bin/beeline -e *` calls?
[17:37:04] yes, in your R script
[17:37:24] without being qualified, i betcha your R script is running /usr/bin/beeline
[17:37:27] instead of /usr/local/bin/beeline
[17:37:30] Ok. Testing. You guys rock, thank you very much. Be back to report.
[17:37:36] you can use /usr/bin/beeline, but you'd need to specify the hive server hostname
[17:37:45] look at /usr/local/bin/beeline to see how it works
[17:37:47] ottomata: Don't know, could be.
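(A sketch of what a wrapper like that could plausibly look like. It is not the real /usr/local/bin/beeline: the default options, the hive server URL, and the USER lookup are inferred from this conversation and from the traceback GoranSM pastes just below.)

```python
#!/usr/bin/env python
# Hypothetical sketch of a /usr/local/bin/beeline wrapper: fill in the
# hive server and username defaults, then exec the real beeline.
# Defaults are inferred from the log, not copied from the real script.
import os
import sys

DEFAULT_OPTIONS = {
    # Raises KeyError if USER is unset, e.g. in a bare cron environment,
    # which is exactly the failure shown in the traceback below.
    '-n': os.environ['USER'],
    # Assumed hive server, from ottomata's s/localhost/... note later on.
    '-u': 'jdbc:hive2://analytics1003.eqiad.wmnet:10000',
}

ARGS = sys.argv[1:]
# Only supply a default if the caller didn't pass that flag explicitly.
for flag, value in DEFAULT_OPTIONS.items():
    if flag not in ARGS:
        ARGS = [flag, value] + ARGS

os.execv('/usr/bin/beeline', ['/usr/bin/beeline'] + ARGS)
```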
[17:37:49] it's just a little wrapper
[17:38:08] I also want to reiterate GoranSM: for production systems, calling hive from R is not what we recommend :)
[17:41:28] joal: O:-)
[17:43:50] GoranSM: yeah, maybe you can just use RJDBC in R itself?
[17:43:52] rather than shelling out?
[17:43:55] https://stackoverflow.com/a/33020370/555565
[17:44:02] see the UPDATEd example there
[17:45:00] s/"jdbc:hive2://localhost:10000/mydb"/"jdbc:hive2://analytics1003.eqiad.wmnet:10000/mydb"
[17:57:18] Analytics-Kanban: Add link to wikistats 2 from analytics.wikimedia.org - https://phabricator.wikimedia.org/T182904#3838173 (Nuria)
[17:57:37] Analytics-Kanban: Add link to wikistats 2 from analytics.wikimedia.org - https://phabricator.wikimedia.org/T182904#3838173 (Nuria)
[18:01:43] ottomata: get ready
[18:01:50] ottomata: Here it goes:
[18:01:58] Traceback (most recent call last):
[18:01:59]   File "/usr/local/bin/beeline", line 21, in <module>
[18:01:59]     DEFAULT_OPTIONS = {'-n': os.environ['USER'],
[18:01:59]   File "/usr/lib/python2.7/UserDict.py", line 40, in __getitem__
[18:01:59]     raise KeyError(key)
[18:01:59] KeyError: 'USER'
[18:03:11] GoranSM: how do you launch R?
[18:03:19] from what crontab?
[18:03:21] user?
[18:03:26] nice -10 Rscript
[18:03:29] user: goransm
[18:03:40] crontab: goransm's crontab
[18:04:45] dunno what Rscript does or why you'd lose the USER env
[18:04:45] ottomata: nice -10 Rscript, where Rscript is a command for system runs of R scripts
[18:04:58] in your cron, try adding
[18:05:03] USER=goransm ...
[18:05:06] at the beginning
[18:05:12] oh maybe it's nice..
[18:05:17] try
[18:05:26] export USER=goransm && nice -10 Rscript ..
[18:05:37] ottomata: will do, thanks. let me try
[18:16:35] ottomata: https://gerrit.wikimedia.org/r/#/c/398301/1
[18:16:42] :)
[18:16:53] just +1ed :)
[18:18:12] elukey: FYI, just upgraded cergen to 0.2.0 on puppetmaster1001 with the puppet-sign-cert fix
[18:18:18] super
[18:18:27] I think we are ready to go for kafka1023
[18:18:29] shall I?
[18:18:44] yes do it!
[18:26:03] ottomata: it seems to be working with (A) beeline calls from /usr/local/bin/beeline and passing the username in crontab as (B) export USER=goransm && nice etc etc.
[18:26:49] great!
[18:29:23] ottomata: I can confirm now, the cluster processes are running.
[18:29:32] ottomata: thanks a lot!!!
[18:29:48] milimetric: i do not understand how piwik does not show visits from spain cause fdans should show up every day
[18:30:08] milimetric: that he deploys at least, right?
[18:30:16] I guess I should be sending one beer to San Francisco (@ottomata) and one to France (@joal) :)
[18:31:03] haha ottomata is brooklyn, but i'll catch the beer midair on its way over
[18:31:12] ottomata: 1023 is bootstrapping now
[18:31:47] nuria_: ublock might be blocking piwik on my browser
[18:33:03] (PS1) Fdans: Replaces hardcoded URL with dynamic date values [analytics/wikistats2] - https://gerrit.wikimedia.org/r/398306 (https://phabricator.wikimedia.org/T182859)
[18:33:32] Analytics-Kanban: Initial Launch of new Wikistats 2.0 website - https://phabricator.wikimedia.org/T160370#3838248 (Nuria)
[18:33:35] Analytics, Analytics-Wikistats: Design new UI for Wikistats 2.0 - https://phabricator.wikimedia.org/T140000#3838249 (Nuria)
[18:33:37] Analytics-Kanban, Analytics-Wikistats: Backend for wikistats 2.0 - https://phabricator.wikimedia.org/T156384#3838247 (Nuria) Open>Resolved
[18:34:27] ottomata: :) I thought it was San Francisco. Anyways, thanks a lot.
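(Putting the fixes from this thread together: call the wrapper rather than bare beeline, export USER since cron does not set it, and raise HADOOP_HEAPSIZE for the client JVM as ottomata suggested earlier. A hypothetical shell-out mirroring what an R system() call or cron entry would do; the query and values are placeholders.)

```python
# Hypothetical sketch combining the fixes discussed above:
#  - call the /usr/local/bin/beeline wrapper (not bare 'beeline'),
#  - make sure USER is set, since cron environments don't export it,
#  - raise HADOOP_HEAPSIZE for the local client JVM (start at 1024,
#    then 2048, etc., per ottomata's suggestion).
import os
import subprocess

env = dict(os.environ)
env.setdefault('USER', 'goransm')    # cron environments lack USER
env['HADOOP_HEAPSIZE'] = '1024'      # MB for the hive/beeline client JVM

subprocess.check_call(
    ['/usr/local/bin/beeline', '-e', 'SELECT 1'],  # placeholder query
    env=env,
)
```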
[18:37:35] milimetric nuria_ if you can okay that fix I'd like to deploy it asap
[18:38:14] hello kafka1023! https://grafana.wikimedia.org/dashboard/db/kafka?refresh=5m&orgId=1&from=now-3h&to=now
[18:38:51] fdans: i have not looked at it, conceding to milimetric
[18:43:16] ottomata: everything looks good, I downtimed 1023 for two hours just to be sure
[18:43:26] but I don't see anything weird
[18:43:40] ok if I hand it over to you?
[18:52:26] going afk now but will check later people :)
[18:52:29] * elukey afk!
[19:00:22] fdans: sorry, had lunch, don't worry, I'll look it over and deploy if you're gone
[19:00:29] (metrics meeting started)
[19:00:40] awesome, thank you milimetric
[19:01:15] great elukey i'm watching it
[19:01:16] thanks
[19:01:20] (also in meetings....)
[19:04:18] elukey: it's totally working! syncing everything over nicely :)
[20:01:21] (PS2) Milimetric: Replaces hardcoded URL with dynamic date values [analytics/wikistats2] - https://gerrit.wikimedia.org/r/398306 (https://phabricator.wikimedia.org/T182859) (owner: Fdans)
[20:02:04] Dario giving us some wikilove RIGHT NOW for wikistats so you know!
[20:02:28] fdans: you around still?
[20:02:42] k, too late, deploying :)
[20:02:44] milimetric: yep, looking at your patch
[20:02:52] ah, go ahead then
[20:02:54] oh ok :)
[20:02:56] nono, all yours
[20:03:09] I just didn't want special casing in aqs, otherwise it was cool
[20:04:29] milimetric: I thought the opposite
[20:04:57] I thought it was aqs who should know about particularities in the metrics, not the components
[20:05:34] my first approach was like yours but I moved it to aqs for that reason milimetric
[20:06:34] fdans: mmm, maybe, but the aqs client is dumb otherwise, it just takes parameters. So if we do something like that, we should maybe make a second layer
[20:06:45] that layer would know about the config and the inputs, and make the connection
[20:07:00] I think we're both right, neither of those things should know about the other
[20:07:12] we're already doing a special case for tops in aqs, though, right?
[20:07:21] the formatTops method
[20:07:30] for output, yeah, didn't like that either
[20:07:38] because it's replicated in GraphModel
[20:07:42] (the special casing)
[20:07:51] so it's really the config / GraphModel that should do all this work
[20:08:46] like let { common, unique } = graphModel.getParameters(userInput);
[20:08:49] fdans: ^
[20:08:57] I think that's what we can get to from both MetricWidget and Detail
[20:09:07] so this just seemed like a closer step to that
[20:10:47] anyway fdans, feel free to deploy either way
[20:13:02] ok, let's deploy this
[20:13:25] (CR) Fdans: [C: 2] Replaces hardcoded URL with dynamic date values [analytics/wikistats2] - https://gerrit.wikimedia.org/r/398306 (https://phabricator.wikimedia.org/T182859) (owner: Fdans)
[20:18:30] (PS1) Fdans: Release 2.1.2 [analytics/wikistats2] - https://gerrit.wikimedia.org/r/398321
[20:21:42] Gone for tonight lads
[20:21:56] Thanks for the patch and deploy milimetric and fdans !!
[20:22:18] (PS2) Fdans: Release 2.1.2 [analytics/wikistats2] - https://gerrit.wikimedia.org/r/398321
[20:24:02] (CR) Fdans: [V: 2 C: 2] Release 2.1.2 [analytics/wikistats2] - https://gerrit.wikimedia.org/r/398321 (owner: Fdans)
[20:25:01] thanks fdans, and let's keep in mind the refactor, it's needed for more than just this reason
[20:26:17] (PS1) Fdans: Release 2.1.2 [analytics/wikistats2] (release) - https://gerrit.wikimedia.org/r/398322
[20:26:42] (CR) Fdans: [V: 2 C: 2] Release 2.1.2 [analytics/wikistats2] (release) - https://gerrit.wikimedia.org/r/398322 (owner: Fdans)
[20:27:48] ok a-team patches deployed, and I just tested the hive query with the bucketed views and the K and it's looking good!
[20:27:56] see you on Monday
[20:30:53] Bye fdans_onholiday :)
[20:40:31] anyone have some info on how hue.wikimedia.org login is controlled?
[20:40:46] I have an access request for someone requesting shell access for analytics-privatedata-users in regards to it
[20:40:48] https://phabricator.wikimedia.org/T182908
[20:41:02] Seems odd they are asking for shell access, and then requesting access to a web service.
[20:48:56] byye
[20:49:00] robh, yeah
[20:49:20] they don't need shell access to log into hue, but hue interacts with hadoop, and for that they need a shell account
[20:49:35] What controls login to the hue portal?
[20:49:37] robh: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Access#HTTP_Access
[20:49:42] It seems they want both, heh
[20:49:44] it's ldap, but the account needs to be manually synced
[20:50:06] ahhhh
[20:50:12] ok, let me correct my reply
[20:50:22] so, once they have a shell account, then we can create their hue account
[20:51:10] so they need privatedata-users or just any stat shell?
[20:51:29] the task lacks details on exactly what they plan to do with the shell access side so i asked them to clarify
[20:51:35] but perhaps it's perfectly clear to you? =]
[20:51:47] he emailed me before, so a little~!
[20:52:12] i actually don't know for sure if he needs analytics-users or analytics-privatedata-users, but usually people using hadoop want access to webrequest or other private data, so usually it's analytics-privatedata-users
[20:52:28] if they want hadoop access, it's always going to be one of the analytics-* groups
[20:53:10] robh, also, i think he already has a shell account defined in data.yaml, just not in any groups(?)
[20:53:14] nope
[20:53:17] he has an ldap account
[20:53:18] no shell
[20:53:29] so he has the ldap nda flag but no login to systems yet
[20:53:47] (so he is defined in data.yaml yes)
[20:53:56] but in the ldap users only section for now
[20:54:18] ah ok
[20:54:48] all sorted, just need some feedback, thx for posting on task
[20:56:10] Finally got a shell request and those are typically the most rewarding parts of clinic duty (people are very happy when you give them access to things)
[20:56:21] feel bad making them explain more but thems the breaks.
[21:04:12] haha
[21:04:13] :)
[23:02:41] dear analytics, congrats on Wikistats 2.
[23:03:44] question: I assume that Edits in https://stats.wikimedia.org/v2/#/commons.wikimedia.org show the edits in namespace 0 and not the uploads to Commons. Is this correct? And if yes, when do you think this will be fixed? :)
[23:48:57] Analytics: Refresh zookeeper nodes on equiad - https://phabricator.wikimedia.org/T182924#3838887 (Nuria)
[23:49:59] Analytics: Refresh zookeeper nodes on eqiad - https://phabricator.wikimedia.org/T182924#3838896 (Nuria) a: elukey
[23:50:53] Analytics, Analytics-Kanban: Hardware refresh eventlogging box - https://phabricator.wikimedia.org/T182925#3838900 (Nuria)
[23:55:56] Analytics, Analytics-Kanban: Put in service 8 new hadoop nodes - https://phabricator.wikimedia.org/T182926#3838922 (Nuria)
[23:59:20] Analytics, Analytics-Kanban, User-Elukey: Refactor analytics cronjobs to alarm on failure with (maybe) cronic - https://phabricator.wikimedia.org/T172532#3838935 (Nuria)