[00:55:15] <wikibugs>	 10Analytics, 10Analytics-Wikistats: Trends for editor types, and new editors in particular (in Wikistats 2.0) - https://phabricator.wikimedia.org/T186791#3961936 (10jeblad) The way I calculate this is pretty dumb. It is a simple average over last twelve months and previous twelve months, and the difference bet...
[07:35:11] <elukey>	 gooood morning!
[07:35:24] <elukey>	 so analytics1029 has only 3G left in its root partition
[07:35:47] <elukey>	 and we have 30G in /tmp
[07:40:55] <elukey>	 weird, the big files are ASCII with output that seems to be a simple map-red task
[07:41:06] <elukey>	 3 1:0.0 2:237.0 3:0.0 4:237.0 5:118.5 6:167.5843 ..
[07:59:18] <elukey>	 all right deleted some files, need to follow up on this issue..
[08:06:18] <wikibugs>	 10Analytics, 10Discovery, 10Wikidata, 10Wikidata-Query-Service: Add pageviews total counts to WDQS - https://phabricator.wikimedia.org/T174981#3962185 (10Smalyshev) p:05Triage>03Normal
[08:10:22] <elukey>	 now I am trying to remember where are stored the pageviews archives
[08:17:03] <elukey>	 so stat1005's hdfs dir, and /mnt/hdfs was down
[08:17:12] <elukey>	 after the usual oom killer run
[08:17:27] <elukey>	 in theory every hour at :51 an rsync runs
[08:18:52] <elukey>	 and data is there
[08:19:22] <elukey>	 so in theory at :51 we'll have the rsync started and we'll be able to answer to the email about missing pageview archives
[08:29:42] <joal>	 Thanks a lot for that finding elukey :_)
[08:29:45] <joal>	 hi mate
[08:32:32] <elukey>	 joal: o/
[08:32:56] <elukey>	 the spikes in load on stat1005 are hitting also rsyncs
[08:33:04] <elukey>	 we'd need to discuss a solution
[08:33:40] <joal>	 :(
[08:34:00] <joal>	 elukey: are there ways to enforce usage limitation?
[08:35:08] <elukey>	 joal: we'd need to study what to do (a bit ignorant to be honest) but as short term plan we could ask to Ariel to move the rsync from stat1005 to stat1004
[08:35:13] <elukey>	 that is way less busy
[08:35:35] <elukey>	 it only need a rsync server and /mnt/hdfs (both already there iirc)
[08:39:09] <joal>	 Would do elukey 
[08:56:32] <dsaez>	 good morning guys
[08:56:41] <elukey>	 dsaez: o/
[08:56:52] <joal>	 Hi dsaez 
[08:57:06] <dsaez>	 qq: what is the the rsync to copy between stats machines?
[08:59:40] <dsaez>	 i'm trying,at stat1005: rsync -avz  code/ stat1004: 
[09:03:04] <elukey>	 dsaez: lemme try, I am not a super rsync user as Andrew so I don't know the command from the top of my head :D
[09:04:13] <dsaez>	 elukey, great, thx, i know that the command is somewhere on wikitech (?) but couldn't find
[09:04:23] <joal>	 dsaez: Can't help really - I'm super bad at rsync
[09:05:23] <dsaez>	 joal: no worries, I was ashamed that I didn't know, but now I see that is not trivial :D
[09:05:27] <elukey>	 dsaez: have you managed to do it in the past? Because for some dirs that are pre-whitelisted it should work, but in general IIRC you'd need a ssh key on stat1005 to mkae it work (that is not allowed of course)
[09:05:42] <elukey>	 but bare in mind that I don't have too much coffee in my veins atm
[09:05:44] <elukey>	 :D
[09:06:42] <dsaez>	 hehe... I think i did once, but couldn't find on my .bash_history , but don't worry guys, It is not urgent, I can ask ottomata later 
[09:13:02] <elukey>	 dsaez: found it!
[09:13:06] <elukey>	 rsync -av stat1004::home/elukey/whitelist.tsv .
[09:13:29] <elukey>	 so home is whitelisted in the rsync server conf and all the stats are allowed in their configs
[09:13:56] <elukey>	 so from stat1005 you just need to prepend: rsync -av stat1004::home/ to access the home directory dir, then
[09:14:08] <elukey>	 you add your home dir + file(s) path
[09:14:28] <elukey>	 like rsync -av stat1004::home/dsaez/this_is_a_cool_code_path .
[09:15:08] <dsaez>	 let me try
[09:16:04] <dsaez>	 it is asking for password :S
[09:16:47] <elukey>	 can you paste the command that you are using? It shouldn't ask
[09:17:31] <dsaez>	 @stat1004:  rsync -av stat1005:home/dsaez/code code
[09:17:47] <dsaez>	 oh wait
[09:17:55] <elukey>	 missing :
[09:17:56] <elukey>	 :P
[09:18:23] <dsaez>	 oooh!
[09:18:25] <elukey>	 rsync -av stat1005::home/etc..
[09:18:31] <dsaez>	 you are right!
[09:18:33] <dsaez>	 thanks
[09:18:39] <dsaez>	 wtf means the double :: ?
[09:21:40] <elukey>	 dsaez: I think it is called a "module" by rsync. We have a special configuration on all the stat hosts for the home directory, that basically dictates that all the rsync between stat hosts using as source the "home" module will not require any authenticataion
[09:21:46] <elukey>	 *authentication
[09:21:49] <elukey>	 from the user
[09:22:04] <elukey>	 and they are specified via the ::
[09:22:04] <dsaez>	 wow...nice, never heard about this
[09:22:28] <elukey>	 it is really nice, even if the syntax is weird the first time :D
[09:22:34] <elukey>	 I always forget it
[09:23:22] <dsaez>	 ups! no space at stat1004 :(
[09:24:22] <elukey>	 dsaez: can I remove the code dir on stat1004?
[09:24:30] <dsaez>	 yes
[09:25:08] <elukey>	 now you just unveiled an issue :D
[09:25:23] <elukey>	 on stat1005 we symlink the home dir to /srv/home
[09:25:34] <elukey>	 that is a different partition with a ton of space
[09:25:39] <elukey>	 on stat1004 we haven't done it
[09:25:46] <elukey>	 so our fault :)
[09:26:29] <dsaez>	 hehe..
[09:30:35] <elukey>	 lemme see if I can fix it
[09:36:55] <elukey>	 dsaez: is it ok if we fix this issue in the EU afternoon?
[09:37:06] <elukey>	 I'll send  you an email once done
[09:37:09] <dsaez>	 sure,
[09:37:13] <dsaez>	 thanks!
[09:37:19] <elukey>	 thank you!
[10:37:10] <wikibugs>	 10Analytics-EventLogging, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Verify duplicate entry warnings logged by the m4 mysql consumer - https://phabricator.wikimedia.org/T185291#3962734 (10elukey) On deployment-eventlog02 I tested https://gerrit.wikimedia.org/r/405687 and with my surprise none o...
[10:41:43] <elukey>	 really interesting --^
[11:27:29] * elukey lunch + errand!
[14:03:20] <wikibugs>	 10Analytics-Kanban, 10User-Elukey: Alarm when /mnt/hdfs is mounted but showing no files/dirs in there - https://phabricator.wikimedia.org/T187073#3963420 (10elukey)
[14:06:13] <ottomata>	 elukey:  that's why the pv files weren't updated ^?
[14:06:22] <ottomata>	 (hiiii btw! :) )
[14:07:38] <elukey>	 ottomata: hiiiii yes :)
[14:07:44] <elukey>	 just answered to the ml!
[14:08:14] <ottomata>	 did you just umount + mount?
[14:08:38] <elukey>	 yep exactly
[14:08:50] <ottomata>	 elukey:  I wonder if for things like that...we should see if we can install an analytics client on the dataset machine, and dl from hdfs instead..
[14:09:21] <elukey>	 ottomata: yep I was about to ask you something similar, and/or to move the rsync target from stat1005 to stat1004
[14:10:28] <ottomata>	 hm would rather do the analytics client if we can
[14:10:44] <ottomata>	 although.... the rsync runs on the ddataset hosts, right?
[14:10:48] <ottomata>	 pulls from stat1005?
[14:10:55] <ottomata>	 if that's the case then I don't mind either way
[14:11:01] <ottomata>	 dont' really want to start moving crons from sta1005 to stat1004
[14:11:12] <ottomata>	 but if the rsync runs elsewhere anyway, i'm fine with it
[14:11:52] <elukey>	 afaics it runs on dataset1001 and rsyncs from stat1005's /mnt/hdfs/etc..
[14:12:00] <ottomata>	 aye
[14:12:03] <ottomata>	 yeah i think so too
[14:12:12] <ottomata>	 in that case i'd be fine with pulling from stat1004
[14:14:33] <wikibugs>	 (03CR) 10Ottomata: Add ISP data to webrequest table (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/405899 (https://phabricator.wikimedia.org/T167907) (owner: 10Joal)
[14:19:00] <ottomata>	 elukey:  you ok if I deploy eventlogging now?
[14:19:11] <ottomata>	 gonna get my IP  changes and the eventcapsule changes in
[14:19:15] <ottomata>	 it shoul dbe no-op for mysql
[14:19:18] <ottomata>	 and everything
[14:19:28] <ottomata>	 until we add IP field to vk from https://gerrit.wikimedia.org/r/#/c/409354/
[14:19:34] <ottomata>	 but i have to deploy El first
[14:20:30] <mforns>	 hi teaaam :]
[14:20:35] <ottomata>	 hiii
[14:21:00] <ottomata>	 mforns:  i'm about to deploy the eventcapsule code change 
[14:21:03] <wikibugs>	 (03CR) 10Joal: Add ISP data to webrequest table (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/405899 (https://phabricator.wikimedia.org/T167907) (owner: 10Joal)
[14:21:09] <mforns>	 ottomata, ok
[14:21:27] <mforns>	 do you want me to look at metrics and logs and stuff?
[14:21:43] <elukey>	 ottomata: ok for me!
[14:21:59] <ottomata>	 i'll be watching some logs, just wanted to tell a couple of folks here that i'm doing it :)
[14:22:31] <mforns>	 ok
[14:22:43] <elukey>	 ottomata: you can also reset all my things, I commented one line to make a test
[14:22:51] <wikibugs>	 10Analytics-Kanban, 10Research-landing-page, 10Patch-For-Review: Pageviews/Stats on research.wikimedia.org - https://phabricator.wikimedia.org/T186819#3963480 (10bmansurov)
[14:22:51] <elukey>	 I have my results so ok to wip
[14:22:52] <ottomata>	 elukey:  its ok, my things are in beta already
[14:22:53] <elukey>	 *wipe
[14:23:00] <ottomata>	 so i'm deploying in prod now
[14:23:13] <elukey>	 ah prod sorry
[14:23:20] <elukey>	 I got it now :)
[14:23:21] <elukey>	 yep +1
[14:24:48] <wikibugs>	 10Analytics-Kanban, 10Research-landing-page, 10Patch-For-Review: Pageviews/Stats on research.wikimedia.org - https://phabricator.wikimedia.org/T186819#3956274 (10bmansurov) @Nuria thanks for the snippet. Your review on the patch above will be appreciated.
[14:46:41] <ottomata>	 !log deploying eventlogging for T186833 with EventCapsule in code and IP NO_DB_PROPERTIES
[14:46:43] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:46:43] <stashbot>	 T186833: Include X-Client-IP in EventLogging data and geocode during Hive JSON Refinement - https://phabricator.wikimedia.org/T186833
[14:50:23] <ottomata>	 cool, EL looks fine, merging vk change to produce IP
[14:50:38] <elukey>	 super
[14:51:17] <joal>	 by the way ottomata - Have you found your solution for geocode-testi
[14:51:20] <joal>	 ?
[14:51:55] <ottomata>	 joal:  hmm, haven't looked at it since friday, so no?
[14:52:00] <ottomata>	 i'm having trouble just testing spark stuff
[14:52:06] <ottomata>	 i tried doing like druid loader thing
[14:52:09] <joal>	 ottomata: I might know
[14:52:10] <ottomata>	 but got failures
[14:52:27] <joal>	 ottomata: failure from lacking data?
[14:53:42] <ottomata>	 no didn't get that far, spark context failures
[14:53:47] <joal>	 Arf :(
[14:53:55] <ottomata>	 TaskNotSerializable
[14:54:08] <joal>	 hm
[14:54:23] <wikibugs>	 (03PS5) 10Ottomata: Add TransformFunctions for JsonRefine job [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/407508 (https://phabricator.wikimedia.org/T185237)
[14:54:33] <ottomata>	 https://gerrit.wikimedia.org/r/#/c/407508/5/refinery-job/src/test/scala/org/wikimedia/analytics/refinery/job/jsonrefine/TestTransformFunctions.scala
[14:54:38] <wikibugs>	 10Analytics-EventLogging, 10Analytics-Kanban: Sunset MySQL data store for eventlogging. Find an alternative query interface for eventlogging on analytics cluster that can replace MariaDB - https://phabricator.wikimedia.org/T159170#3963585 (10Milimetric) @jcrespo: We really liked Clickhouse's performance and in...
[14:54:50] <ottomata>	 tried lots of different ways of instantiating spark in beforeEach()
[14:59:57] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Add TransformFunctions for JsonRefine job [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/407508 (https://phabricator.wikimedia.org/T185237) (owner: 10Ottomata)
[15:02:07] <elukey>	 ottomata: https://grafana.wikimedia.org/dashboard/db/varnishkafka?orgId=1&from=now-2d&to=now&var-instance=webrequest_jumbo_duplicate - looks gooooood
[15:04:18] <ottomata>	 COOOOl
[15:04:36] <ottomata>	 maybe we can do misc for real this week
[15:04:41] <ottomata>	 today...maybe!?
[15:04:44] <mforns>	 ottomata, mysql insertion rate is falling progressively since 20 minutes ago...
[15:04:48] <ottomata>	 ???
[15:04:51] <mforns>	 https://grafana.wikimedia.org/dashboard/db/eventlogging?orgId=1&from=now-3h&to=now-5m
[15:04:59] <ottomata>	 mforns:  i'm watching it insert tons of events with no failures
[15:05:07] <ottomata>	 maybe its just a rate-related blip because i restarted?
[15:05:16] <mforns>	 yes, but the throughput is less than normal
[15:05:44] <ottomata>	 mforns:  i think its just a blip.  i hope...
[15:05:53] <ottomata>	 the rate jumps and drops dramatically
[15:05:54] <mforns>	 mmmm, doesn't look like it, but lets observe the next 20 mins
[15:06:00] <ottomata>	 m, the average is low though
[15:06:04] <mforns>	 yea
[15:07:54] <ottomata>	 mforns:  btw, i'm bumping the el processors now, that will mess with metrics too, as it will need to do some rebalancing
[15:08:12] <mforns>	 ok, are you adding more?
[15:08:17] <ottomata>	 more?
[15:08:20] <ottomata>	 no
[15:08:26] <ottomata>	 i'm just bumping them to pick up ip change
[15:08:31] <mforns>	 ok ok
[15:08:39] <ottomata>	 i see IPs!
[15:08:43] <mforns>	 hehehe
[15:08:44] <ottomata>	 and events are being inserted into mysql too
[15:08:53] <ottomata>	 so everything is working from wht I can tell, just gotta make sure that througput is ok
[15:08:59] <mforns>	 ok
[15:10:37] <ottomata>	 mforns:  do we know who is runinng the Print schema?
[15:10:41] <ottomata>	 its inserting a LOT of data into mysql
[15:10:50] <ottomata>	 it is of the rate that we normally blacklist it
[15:10:55] <mforns>	 ottomata, was looking exactly into that! xD
[15:11:02] <mforns>	 it's inserting a lot of events
[15:11:11] <mforns>	 yea
[15:11:14] <mforns>	 one sec
[15:11:32] <milimetric>	 I'd blacklist it, if we don't know about it, it means someone didn't talk to us
[15:11:35] <mforns>	 baho and olga
[15:13:03] <mforns>	 ottomata, milimetric, that is the schema we discussed with a group of people about the skin field: https://phabricator.wikimedia.org/T175395
[15:13:54] <mforns>	 ottomata, insertions continue to drop in grafana
[15:15:03] <mforns>	 will histogram EL tables
[15:15:30] <ottomata>	 2018-02-12 15:15:21,252 [17235] (MainThread) Inserted 3000 Print_17630514 events in 1.217617 seconds
[15:15:35] <ottomata>	 dunno how it could be
[15:15:53] <ottomata>	 that'sh appening like eveyr 15 secds
[15:15:59] <ottomata>	 that's 200/ sec right there
[15:18:44] <ottomata>	 mforns:  just eyeballing the rate looks fine to me.  ithink the averages are just messed up
[15:18:46] <mforns>	 ottomata, I can not find those events in the slave
[15:18:50] <ottomata>	 you can see the 5 min one drop and then jump
[15:18:54] <ottomata>	 in the slave maybe ya, that takes a while
[15:18:56] <ottomata>	 let's check master
[15:20:31] <ottomata>	 mforns:  i see recent Print schema events in the master
[15:20:38] <mforns>	 ok ok
[15:21:05] <ottomata>	 it is possible the rate is slowly dropping, but that would be very strange, as we don't see any errors, and most things are working fine
[15:21:16] <mforns>	 yea, it's strange
[15:22:17] <mforns>	 ottomata, maybe the fact that those Print insertions are happening so often, it's collapsing the db and the other insertions happen all at once every 5 minutes? dunno
[15:23:49] <mforns>	 seems to be normalizing now
[15:25:04] <ottomata>	 hmm, dunno the Prints have been coming for a while now
[15:25:21] <elukey>	 mforns: if the db was collapsing we'd have a notification from the dbas probably :)
[15:27:47] <ottomata>	 mforns:  did they ever talk to us about the number of events it would be sending?
[15:27:52] <ottomata>	 the Print schema?
[15:27:57] <ottomata>	 i'm not going to do anytihng about it, but it is quite a lot
[15:27:59] <mforns>	 ottomata, not that I remember
[15:28:10] <ottomata>	 if the db is going fine, then i guess we can leave it, elukey whatchu think?
[15:28:11] <mforns>	 yes, I think we already have a week of data
[15:28:16] <mforns>	 or 4/5 days
[15:28:36] <mforns>	 which already will be a huge table
[15:29:08] <mforns>	 ~70M events
[15:29:18] <elukey>	 https://grafana.wikimedia.org/dashboard/db/mysql?orgId=1&var-dc=eqiad%20prometheus%2Fops&var-server=db1107&var-port=9104&from=now-7d&to=now
[15:29:25] <elukey>	 these are the metrics for the el master --^
[15:29:30] <elukey>	 don't see anything weird
[15:30:12] <elukey>	 having huge tables in the longer term will surely not be great
[15:30:32] <mforns>	 yes, we should blacklist that schema no?
[15:30:45] <mforns>	 will comment on the taskl
[15:31:40] <elukey>	 mforns: for 90d, assuming 70M events every 5d, it will be 1260M, that is a lot
[15:31:52] <elukey>	 let's make sure we don't have to run alter tables in there :D
[15:31:53] <mforns>	 yea
[15:31:55] <ottomata>	 great thanks yall
[15:32:49] <mforns>	 insertion rate seems to be recovering in grafana: https://grafana.wikimedia.org/dashboard/db/eventlogging?orgId=1&from=now-3h&to=now-5m
[15:33:18] <elukey>	 ottomata: did you see that the ops sync is one hour before today? Shall we send e-scrum and then join later on grosking ?
[15:34:11] <mforns>	 I guess it was that the restart sync'ed all queues to be flushed at the same time every 5 minutes. And after a while, each queue would get a bit spread out.
[15:36:02] <ottomata>	 oh elukey didn't see that..
[15:39:46] <wikibugs>	 10Analytics, 10Proton, 10Readers-Web-Backlog, 10MW-1.31-release-notes (WMF-deploy-2017-10-31 (1.31.0-wmf.6)), 10Patch-For-Review: Implement Schema:Print purging strategy - https://phabricator.wikimedia.org/T175395#3963806 (10mforns) Hi all!  We've noticed that the Print schema has been sending events to...
[15:41:35] <wikibugs>	 10Analytics-Kanban, 10Research-landing-page, 10Patch-For-Review: Pageviews/Stats on research.wikimedia.org - https://phabricator.wikimedia.org/T186819#3963809 (10Nuria) My eyes hurt from seeing that snippet  repeated on every page ...
[15:45:57] <wikibugs>	 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review: Provision new Kafka cluster(s) with security features - https://phabricator.wikimedia.org/T152015#3963838 (10Nuria)
[15:45:59] <wikibugs>	 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review: Set up (temporary) IPSec for Kafka jumbo-eqiad cluster - https://phabricator.wikimedia.org/T186598#3963837 (10Nuria) 05Open>03Resolved
[15:47:13] <wikibugs>	 10Analytics-Kanban, 10Research-landing-page, 10Patch-For-Review: Pageviews/Stats on research.wikimedia.org - https://phabricator.wikimedia.org/T186819#3963841 (10bmansurov) Thanks, @Nuria. It's on our roadmap to reduce repetition.
[15:49:17] <milimetric>	 does this logger need to be named here?  It doesn't look like it from just the utils code, but checking just in case: https://github.com/wikimedia/analytics-refinery/blob/master/python/refinery/util.py#L33
[15:50:12] <milimetric>	 I wanted to refactor it so all loggers use the INFO/ERROR setup that Marcel and Joseph have, otherwise that's duplicated all over the place with the two new sqoop scripts
[15:51:37] <wikibugs>	 10Analytics-Kanban: sqoop_mediawiki failed pagelinks for a few wikis - https://phabricator.wikimedia.org/T186529#3963850 (10Nuria) 05Open>03Resolved
[15:52:04] <mforns>	 milimetric, I think if you name it there, the logs will all have the tag refinery-utils
[15:52:26] <mforns>	 milimetric, not sure if you can rename a logger once created, maybe that would work
[15:52:27] <wikibugs>	 10Analytics-Kanban, 10Operations, 10ops-eqiad: dbstore1002 possibly MEMORY issues - https://phabricator.wikimedia.org/T183771#3963851 (10Nuria) 05Open>03Resolved
[15:52:57] <wikibugs>	 10Analytics-Kanban, 10Analytics-Wikistats, 10Patch-For-Review: Wikistats Bug: Menu to select projects doesn't work (sometimes?) - https://phabricator.wikimedia.org/T179530#3963853 (10Nuria) 05Open>03Resolved
[15:53:17] <wikibugs>	 10Analytics-Kanban, 10Analytics-Wikistats, 10Patch-For-Review: Wikistats Bug: Menu to select projects doesn't work (sometimes?) - https://phabricator.wikimedia.org/T179530#3727714 (10Nuria) Performance issue solved and tracked separately
[15:53:28] <wikibugs>	 10Analytics-Kanban, 10Analytics-Wikistats, 10Patch-For-Review: Please add download option 'as csv file' to Wikistats 2 - https://phabricator.wikimedia.org/T183192#3963856 (10Nuria) 05Open>03Resolved
[15:53:29] <milimetric>	 mforns: I was thinking of not naming it at all, because that's only needed in complicated multi-threaded instances
[15:53:30] <mforns>	 milimetric, oh! that was already there
[15:53:36] <mforns>	 ok ok
[15:53:41] <wikibugs>	 10Analytics-Kanban, 10Analytics-Wikistats, 10Patch-For-Review: Replace any debouncing with Vue.nextTick - https://phabricator.wikimedia.org/T180412#3963857 (10Nuria) 05Open>03Resolved
[15:53:54] <wikibugs>	 10Analytics-Kanban, 10Patch-For-Review: Add wikis to clickstream generation - https://phabricator.wikimedia.org/T185344#3963860 (10Nuria) 05Open>03Resolved
[15:54:10] <wikibugs>	 10Analytics-Kanban, 10Analytics-Wikistats: Add clear definitions to all metrics, along with links to Research: pages - https://phabricator.wikimedia.org/T183261#3963862 (10Nuria) 05Open>03Resolved
[15:54:12] <wikibugs>	 10Analytics-Kanban, 10Analytics-Wikistats: Add a tooltip to all non-obvious concepts like split categories, abbreviations - https://phabricator.wikimedia.org/T177950#3963863 (10Nuria)
[15:54:25] <wikibugs>	 10Analytics-Kanban, 10Pageviews-API, 10Patch-For-Review: Use country ISO codes instead of country names in top by country pageviews - https://phabricator.wikimedia.org/T184911#3963864 (10Nuria) 05Open>03Resolved
[15:54:30] <mforns>	 milimetric, maybe passing the logger as a parameter to all utils functions? :/
[15:54:34] <milimetric>	 mforns: so I was thinking of making an unnamed logger there, and exposing a configure_logger method that would just do what you and Joseph did
[15:54:54] <wikibugs>	 10Analytics-EventLogging, 10Analytics-Kanban: Timestamp format in Hive-refined EventLogging tables is incompatible with MySQL version - https://phabricator.wikimedia.org/T179540#3963866 (10Nuria) 05Open>03Resolved
[15:54:56] <wikibugs>	 10Analytics-EventLogging, 10Analytics-Kanban, 10Patch-For-Review: Implement EventLogging Hive refinement - https://phabricator.wikimedia.org/T162610#3963867 (10Nuria)
[15:55:06] <mforns>	 milimetric, sounds good!
[15:55:07] <milimetric>	 mforns: wanna talk in batcave, it's almost standup anyway
[15:55:11] <mforns>	 ok
[15:55:14] <wikibugs>	 10Analytics-Cluster, 10Analytics-Kanban, 10Language-Team, 10MediaWiki-extensions-UniversalLanguageSelector, and 3 others: Migrate table creation query to oozie for interlanguage links - https://phabricator.wikimedia.org/T170764#3963868 (10Nuria) 05Open>03Resolved
[15:55:24] <wikibugs>	 10Analytics-Kanban, 10Analytics-Wikistats, 10Patch-For-Review: Wrong  y-axis labels  on wikistats graph - https://phabricator.wikimedia.org/T184138#3963871 (10Nuria) 05Open>03Resolved
[15:55:37] <wikibugs>	 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review: Make tranquility work with Spark - https://phabricator.wikimedia.org/T168550#3963872 (10Nuria) 05Open>03Resolved
[15:55:52] <wikibugs>	 10Analytics-Cluster, 10Analytics-Kanban, 10Analytics-Wikistats, 10RESTBase-API, 10Services (done): Add "Pageviews by Country" AQS endpoint - https://phabricator.wikimedia.org/T181520#3963874 (10Nuria)
[15:56:06] <wikibugs>	 10Analytics-Kanban: Investigate oozie suspended workflows - https://phabricator.wikimedia.org/T163933#3963875 (10Nuria) 05Open>03Resolved
[15:56:11] <wikibugs>	 10Analytics, 10Proton, 10Readers-Web-Backlog, 10MW-1.31-release-notes (WMF-deploy-2017-10-31 (1.31.0-wmf.6)), 10Patch-For-Review: Implement Schema:Print purging strategy - https://phabricator.wikimedia.org/T175395#3963877 (10Jdlrobson) This can be attributed to T181297 which added events for impressions....
[15:56:23] <wikibugs>	 10Analytics-Kanban, 10Patch-For-Review: Productionize streaming jobs - https://phabricator.wikimedia.org/T176983#3963879 (10Nuria) 05Open>03Resolved
[15:56:35] <wikibugs>	 10Analytics: Upgrade Kafka on main cluster with security features - https://phabricator.wikimedia.org/T167039#3963881 (10Nuria)
[15:56:38] <wikibugs>	 10Analytics-Cluster, 10Analytics-Kanban, 10EventBus: Delete stale topics from main Kafka clusters - https://phabricator.wikimedia.org/T149594#3963880 (10Nuria) 05Open>03Resolved
[15:56:57] <wikibugs>	 10Analytics-Kanban, 10Analytics-Wikistats, 10Patch-For-Review: Link to 'more info' doesn't always work - https://phabricator.wikimedia.org/T183188#3963883 (10Nuria) 05Open>03Resolved
[15:57:14] <wikibugs>	 10Analytics-Kanban, 10Analytics-Wikistats, 10Patch-For-Review: UI not querying daily granularity for 3-month and 1-month periods - https://phabricator.wikimedia.org/T186075#3963886 (10Nuria) 05Open>03Resolved
[15:57:30] <wikibugs>	 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Move away from jmxtrans in favor of prometheus jmx_exporter - https://phabricator.wikimedia.org/T175344#3963888 (10Nuria)
[15:57:32] <wikibugs>	 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Add the prometheus jmx exporter to all the Hadoop daemons - https://phabricator.wikimedia.org/T177458#3963887 (10Nuria) 05Open>03Resolved
[15:57:47] <wikibugs>	 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review: Add IPv6 addresses for kafka-jumbo hosts - https://phabricator.wikimedia.org/T185262#3963889 (10Nuria) 05Open>03Resolved
[15:57:49] <wikibugs>	 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review: Provision new Kafka cluster(s) with security features - https://phabricator.wikimedia.org/T152015#3963890 (10Nuria)
[16:01:22] <elukey>	 nuria_: hola! The ops meeting is about to start, today it starts one hour before.. is it ok if I (and probably ottomata) join later on for grosking?
[16:01:31] <elukey>	 will send the e-scrum of course
[16:02:10] <nuria_>	 ottomata, elukey : ok, please send e-scrum
[16:02:22] <ottomata>	 will do
[16:05:28] <nuria_>	 ottomata: did you blacklisted print events so they do not get inserted into mysql eventlogging? cc mforns 
[16:05:40] <ottomata>	 nuria_:  no see above
[16:05:48] <ottomata>	 were they supposed to be blacklisted?
[16:15:39] <milimetric>	 added a patch, ottomata: https://gerrit.wikimedia.org/r/#/c/409930/
[16:20:56] <wikibugs>	 (03PS6) 10Joal: Add TransformFunctions for JsonRefine job [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/407508 (https://phabricator.wikimedia.org/T185237) (owner: 10Ottomata)
[16:22:34] <joal>	 ottomata: --^ One test fails, but it seems for logic reason -- scala issue solved
[16:26:00] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Add TransformFunctions for JsonRefine job [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/407508 (https://phabricator.wikimedia.org/T185237) (owner: 10Ottomata)
[16:31:49] <wikibugs>	 10Analytics-Kanban: Record and aggregate page previews - https://phabricator.wikimedia.org/T186728#3964092 (10Milimetric) p:05Triage>03Normal
[16:33:58] <wikibugs>	 10Analytics, 10Analytics-EventLogging: uBlock blocks EventLogging - https://phabricator.wikimedia.org/T186572#3964100 (10Milimetric) 05Open>03declined If there's any disagreement with Lego's thoughts, please re-open, but I agree.
[16:36:03] <ottomata>	 AH great joal  i did it with sqlContext originally, but i didn't have the case clsses outside of the test class
[16:36:04] <ottomata>	 great thank you!
[16:36:44] <ottomata>	 milimetric:  should we ask them about that first?
[16:36:53] <ottomata>	 were they expecting it to be mysql blacklisted?
[16:36:57] <milimetric>	 we discussed in standup, ottomata, and decided just to blacklist it now
[16:37:05] <milimetric>	 if they change the sampling, we can re-enable it
[16:37:13] <ottomata>	 milimetric:  i'm worried that this will just make folks mad
[16:37:18] <ottomata>	 maybe they are using mysql right now
[16:37:19] <ottomata>	 and don
[16:37:25] <ottomata>	 and don't expect it to just stop
[16:38:15] <nuria_>	 ottomata: but in a few  days mysql can no longer  that data so it will break no matter what 
[16:38:36] <milimetric>	 *no longer query
[16:39:12] <milimetric>	 ok, ottomata, let's wait today and do it tomorrow morning if nothing changes
[16:39:14] <nuria_>	 ottomata: let's give it to EOD today
[16:41:00] <ottomata>	 nuria_:  it wil break i a fe days?
[16:41:09] <ottomata>	 its been on at this rate since  Jan 18
[16:41:16] <nuria_>	 ottomata: as it will no longer fit in my sql
[16:41:26] <ottomata>	 as in the disk will fill up?
[16:46:09] <wikibugs>	 10Analytics: Productionize Pageview_sanitization hive code with Oozie job and refinery inclusion {hawk} - https://phabricator.wikimedia.org/T118839#3964189 (10Milimetric)
[16:46:11] <wikibugs>	 10Analytics: Backfill pageview_hourly sanitization - 1 month - {hawk} - DUPLICATE THIS TASK FOR EACH MONTH TO BACKFILL - https://phabricator.wikimedia.org/T118842#3964191 (10Milimetric)
[16:48:21] <wikibugs>	 10Analytics: Productionize Pageview_sanitization hive code with Oozie job and refinery inclusion {hawk} - https://phabricator.wikimedia.org/T118839#3964201 (10Milimetric)
[16:48:23] <wikibugs>	 10Analytics: Deploy pageview sanitization and start ongoing process {hawk} - https://phabricator.wikimedia.org/T118841#3964203 (10Milimetric)
[16:49:21] <wikibugs>	 10Analytics-Kanban: Productionitize netflow job - https://phabricator.wikimedia.org/T176984#3964208 (10Milimetric) a:03elukey
[16:53:33] <wikibugs>	 10Analytics: Make Wikipedia clickstream dataset available as API - https://phabricator.wikimedia.org/T185526#3964232 (10Milimetric)
[16:53:35] <wikibugs>	 10Analytics, 10Data-release: Wikipedia Clickstream dataset. Programmatic Access - https://phabricator.wikimedia.org/T134231#3964230 (10Milimetric)
[16:56:11] <wikibugs>	 (03PS3) 10Nuria:  [WIP] Fix issues with numer formatting [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/409714 (https://phabricator.wikimedia.org/T187010)
[16:57:05] <nuria_>	 ottomata: yes, I think  reading responded saying that they are with moving  it to hive
[17:11:26] <wikibugs>	 10Analytics, 10Analytics-Cluster: Move non-critical monthly jobs to the nice queue - https://phabricator.wikimedia.org/T186180#3936013 (10Ottomata) There aren't that many monthly jobs to move (mw-history, uniques, and now clickstream), and this month was especially bad because of some work that Erik B was doin...
[17:13:25] <elukey>	 maintenance ongoing on stat1004
[17:15:17] <ottomata>	 !log restarting eventlogging-processors to blacklist Print schema in eventlogging-valid-mixed (MySQL)
[17:15:17] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:16:37] <wikibugs>	 10Analytics-Kanban, 10Analytics-Wikistats: Create Daily & Monthly pageview dump with country data and Visualize on UI - https://phabricator.wikimedia.org/T90759#3964337 (10Nuria)
[17:16:39] <wikibugs>	 10Analytics-Cluster, 10Analytics-Kanban, 10Analytics-Wikistats, 10RESTBase-API, 10Services (done): Add "Pageviews by Country" AQS endpoint - https://phabricator.wikimedia.org/T181520#3964336 (10Nuria) 05Open>03Resolved
[17:18:29] <elukey>	 !log home dirs on stat1004 moved to /srv/home (/home symlinks to it)
[17:18:34] <elukey>	 mforns: done!
[17:18:35] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:18:46] <elukey>	 dsaez: you are now free to rsync again, more space now :)
[17:18:47] <ottomata>	 elukey:  there might be some wikitech docs that need updating, not sure though
[17:18:56] <ottomata>	 i used to tell people to copy things to /srv/$username
[17:19:16] <dsaez>	 elukey: great! thanks
[17:19:42] <elukey>	 ottomata: in theory it should be transparent right?
[17:20:39] <ottomata>	 elukey:  yes, i guess it doesn't matter, but hmm yeah not many user dirs in /srv
[17:20:42] <ottomata>	 only me and shilad
[17:21:19] <ottomata>	 mforns:  FYI Print blacklist merged and applied, expect to see drop in mysql insertion rate
[17:30:45] <wikibugs>	 (03PS1) 10Joal: [WIP] Manual importer of xml dumps to hdfs [analytics/refinery] - 10https://gerrit.wikimedia.org/r/409960
[17:38:27] <wikibugs>	 (03PS1) 10Joal: Delay clickstream monthly generation by 10 days [analytics/refinery] - 10https://gerrit.wikimedia.org/r/409966 (https://phabricator.wikimedia.org/T186180)
[17:39:31] <wikibugs>	 10Analytics, 10Analytics-Cluster, 10Patch-For-Review: Move Clickstream job to later in the month - https://phabricator.wikimedia.org/T186180#3964428 (10JAllemandou) a:03JAllemandou
[17:40:19] <wikibugs>	 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review: Move Clickstream job to later in the month - https://phabricator.wikimedia.org/T186180#3936013 (10JAllemandou)
[17:40:48] <joal>	 Gone for diner - Back after
[17:41:26] <awight>	 ottomata: Hi!  Is it best practice to keep event payloads small, limiting to metadata, or is it reasonable to include a few kB of content?
[18:00:06] <ottomata>	 awight:  for ease of use, maintenance, debugging, small is better, but there is no restriction really
[18:00:07] <ottomata>	 except hmm
[18:00:07] <ottomata>	 yes
[18:00:24] <ottomata>	 this one https://github.com/wikimedia/puppet/blob/production/hieradata/common.yaml#L426
[18:00:34] <ottomata>	 so, not bigger than 4MB please
[18:05:23] <wikibugs>	 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review: Move non-critical monthly jobs to the nice queue - https://phabricator.wikimedia.org/T186180#3964542 (10Tbayer)
[18:06:14] <wikibugs>	 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review: Move non-critical monthly jobs to the nice queue - https://phabricator.wikimedia.org/T186180#3936013 (10Tbayer) @JAllemandou This is not what the task was about; how about we create a separate one for the Clickstream job
[18:08:24] <wikibugs>	 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review: Move non-critical monthly jobs to the nice queue - https://phabricator.wikimedia.org/T186180#3964549 (10Tbayer) >>! In T186180#3946710, @Nuria wrote: > @Tbayer : maybe you can help us identify here what is not critical ? >  > We could schedule jobs...
[18:15:37] <elukey>	 a-team: is pagecounts_ez produced by Erik's cron script? I don't recall if it is marked as "deprecated" or anything similar
[18:16:05] <elukey>	 a user told me via email that https://dumps.wikimedia.org/other/pagecounts-ez/merged/2018/2018-02/ wasn't updated 
[18:16:27] <elukey>	 and after tracing it back I found a script that runs every day in Erik's crontab
[18:16:44] <elukey>	 but it should be in puppet probably if it is important no?
[18:17:13] * elukey uses the WMF's matra - "there is always a task"
[18:17:53] <mforns>	 ottomata, thank you!!
[18:18:01] <mforns>	 elukey, thanks also!
[18:19:11] <mforns>	 elukey, not sure if pagecounts_ez was produced by a cron or he himself run that code every month?
[18:23:59] <elukey>	 mforns: I tried to check Erik's scripts but I'll wait for a day to have them re-executed by cron :D
[18:24:12] <mforns>	 sure :]
[18:32:16] <elukey>	 all right people logging off :)
[18:32:19] <elukey>	 byyee
[18:33:12] <bearloga>	 nuria_ joal: qq about piwik. Can I just add a snippet similar to https://phabricator.wikimedia.org/T186819#3959151 to dataviz-literacy.wmflabs.org or will it not work because data coming from *.wmflabs.org might not be accepted?
[18:33:39] <nuria_>	 bearloga: i need to whitelist  the domain first and tehn it shall work fine
[18:33:43] <nuria_>	 *then
[18:33:46] <nuria_>	 bearloga: so yes
[18:33:47] <joal>	 Hi bearloga - No clue
[18:33:59] <nuria_>	 joal: NOW you know
[18:34:03] <joal>	 :D
[18:34:18] <bearloga>	 nuria_: neat! :D thank you!
[18:34:24] <joal>	 nuria_: I'll forget soon since I don't update the whitelist myself :)
[18:34:31] <nuria_>	 bearloga: siteId is different though
[18:34:42] <nuria_>	 bearloga: write us a ticket and we can be sure to do it
[18:35:11] <bearloga>	 nuria_: will do, thank you!
[18:37:33] <joal>	 ottomata: questions on spark?
[18:38:16] <ottomata>	 joal:  yaaa, hm, i'm actually kind of listening participating in this annual planning meeting
[18:38:24] <ottomata>	 if you gotta go dont' worry, i'll work on it and ask you tomorrow
[18:38:31] <joal>	 ok :)
[18:40:06] <ottomata>	 mainly, for some reason, when running in jsonrefine job
[18:40:13] <ottomata>	 the deuplicate_eventlogging function is called
[18:40:17] <ottomata>	 but the dataframe is not deduplicated
[18:40:23] <ottomata>	 but, i'm probably doing somethign silly
[18:41:13] <joal>	 ottomata: there was an issue with the code you sent earlier (test failing) - could be related?
[18:41:27] <ottomata>	 could be!
[18:43:29] <wikibugs>	 (03CR) 10Nuria: [V: 032 C: 032] Delay clickstream monthly generation by 10 days [analytics/refinery] - 10https://gerrit.wikimedia.org/r/409966 (https://phabricator.wikimedia.org/T186180) (owner: 10Joal)
[18:44:47] <wikibugs>	 10Analytics-Kanban, 10Discovery-Analysis, 10Reading-analysis: Pageviews/Stats on dataviz-literacy.wmflabs.org - https://phabricator.wikimedia.org/T187104#3964689 (10mpopov)
[18:48:11] <bearloga>	 nuria_: ticket up https://phabricator.wikimedia.org/T187104 please let me know if my configuration causes any problems
[18:48:18] <nuria_>	 bearloga: k
[18:48:45] <ottomata>	 joal:  btw, reminds me, this look ok to you? https://gerrit.wikimedia.org/r/#/c/405938/
[18:52:55] <wikibugs>	 (03CR) 10Ottomata: "small nit, general +1 make it happen :)" (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/409960 (owner: 10Joal)
[18:54:41] <wikibugs>	 (03CR) 10Joal: [C: 031] "Never tried that action, but looks ok !" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/405938 (https://phabricator.wikimedia.org/T185419) (owner: 10Ottomata)
[18:58:34] <awight>	 ottomata: Thanks!  Just got your earlier answer.  I’ll plan for metadata-only, for politeness :)
[19:20:02] <joal>	 elukey: feeling ready for tomorrow morning?
[19:30:21] <nuria_>	 ta-ta-channnnnnn
[19:35:11] <ottomata>	 oh yaa
[19:43:34] <joal>	 ottomata: I do hope we'll be done when you join
[19:46:22] <wikibugs>	 10Analytics, 10Analytics-EventLogging: uBlock blocks EventLogging - https://phabricator.wikimedia.org/T186572#3964974 (10Tgr) I think most people use ad blockers to prevent ads, and maybe to prevent ad companies from tracking them, not to prevent the websites they use from collecting the data they need to impr...
[19:47:45] <wikibugs>	 10Analytics, 10Analytics-EventLogging: uBlock blocks EventLogging - https://phabricator.wikimedia.org/T186572#3948086 (10Nuria) is there a way to whitelist eventlogging here? I know we tried that before with adblock with not much success
[19:50:49] <wikibugs>	 (03PS7) 10Ottomata: Add TransformFunctions for JsonRefine job [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/407508 (https://phabricator.wikimedia.org/T185237)
[19:50:58] <nuria_>	 bearloga: as far as i  know we have not capture ios data in piwik for wuite some time
[19:51:01] <nuria_>	 *quite
[19:51:51] <nuria_>	 bearloga: is that not the case 
[19:53:46] <bearloga>	 nuria_: no idea! I was told that's where ios data goes but I guess not???
[19:55:13] <nuria_>	 bearloga: hasn't been for a while (as far as i know) initially it did go there but that is no longer the case
[19:55:26] <nuria_>	 bearloga: woudl need to check e-mail
[19:56:31] <nuria_>	 bearloga: I * think*  there is IOS data on EL
[19:57:34] <nuria_>	 bearloga: i see some data there yes, i  am not sure whether is just old apps logging there though
[19:57:42] <wikibugs>	 10Analytics, 10Analytics-EventLogging: uBlock blocks EventLogging - https://phabricator.wikimedia.org/T186572#3965022 (10Tgr) Either reaching out to the EasyPrivacy authors or to the uBlock authors and asking their advice, given that we don't share personal data and have fairly high standards in how we use it....
[19:59:20] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Add TransformFunctions for JsonRefine job [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/407508 (https://phabricator.wikimedia.org/T185237) (owner: 10Ottomata)
[20:00:06] <bearloga>	 nuria_: thank you for clarifying!
[20:24:06] <nuria_>	 bearloga: the user/password for ios piwik data shoudl be known by ios team
[20:24:58] <nuria_>	 bearloga: once you get it you can log in to http://piwik.wikimedia.org
[20:28:46] <nuria_>	 bearloga: added link to doc
[20:29:32] <nuria_>	 bearloga: if  ios team cannot give you access we can probably create a new user
[20:30:25] <nuria_>	 bearloga: triple checked, there is data since Augh 2016
[20:30:28] <nuria_>	 *aug
[20:30:55] <bearloga>	 nuria_: thank you very much!!!
[20:35:01] <wikibugs>	 10Analytics-Kanban, 10Research-landing-page, 10Patch-For-Review: Pageviews/Stats on research.wikimedia.org - https://phabricator.wikimedia.org/T186819#3965121 (10Nuria) Let me know wehn this is deployed, data should flow in the day after (piwik reports are run every few hours)
[20:35:15] <ottomata>	 joal: hey speaking of 'geocoded_data' naming
[20:35:24] <ottomata>	 i have an opportunity right now to name the eventlogging ones something else
[20:35:26] <ottomata>	 should I?
[20:35:27] <ottomata>	 hmmm
[20:36:37] <joal>	 ottomata: Thinking eventlogging is the future - I'd say say yes - But now, past still exists ...
[20:37:08] <ottomata>	 haha yeah
[20:43:56] <wikibugs>	 10Analytics-Kanban, 10Research-landing-page, 10Patch-For-Review: Pageviews/Stats on research.wikimedia.org - https://phabricator.wikimedia.org/T186819#3965137 (10bmansurov) @Nuria it's already live.
[20:45:17] <wikibugs>	 (03CR) 10Milimetric: "This looks like it's working, and it's a nice minimal change.  There is one inconsistency and that's the KMB (capitals) on the dashboard v" [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/409714 (https://phabricator.wikimedia.org/T187010) (owner: 10Nuria)
[21:00:03] <ottomata>	 joal:  ok, you still around?  something is kinda weird
[21:00:09] <joal>	 tell me
[21:00:22] <ottomata>	 bc?
[21:00:26] <joal>	 sure
[21:03:24] <joal>	 CAn't hear you anymore andrew
[21:03:46] <joal>	 ottomata: -^
[21:03:46] <joal>	 is it me ?
[21:35:50] <wikibugs>	 10Analytics, 10EventBus, 10Services (done): ChangeProp workers die if they can't connect to redis - https://phabricator.wikimedia.org/T179684#3965267 (10Pchelolo) 05Open>03Resolved After the deployment of the https://github.com/wikimedia/change-propagation/pull/229 the death spiral bug is not observed an...
[22:32:20] <icinga-wm>	 PROBLEM - Hadoop NodeManager on analytics1058 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[22:33:30] <icinga-wm>	 PROBLEM - Hadoop NodeManager on analytics1031 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[22:37:55] <ebernhardson>	 ottomata[m]: ^ i think i broke a few nodes :( my guess would be whereever python puts temp files is filled up with very large files named tmp*
[23:03:56] <elukey>	 ebernhardson: o/
[23:04:10] <elukey>	 lemme check, just seen the alarms
[23:05:17] <elukey>	 I saw a similar issue this morning on an1029 but didn't find the culprit
[23:05:20] <ebernhardson>	 elukey: chances are if those files didn't get cleaned up there might also be other nodes with overfull tmp dir''s
[23:05:27] <ebernhardson>	 (that didn't manage to fail)
[23:06:26] <nuria_>	 elukey: i am here, i can help  cleaning 
[23:07:02] <elukey>	 ebernhardson: confirmed, a ton of space across nodes :D
[23:09:22] <elukey>	 !log cleaned up tmp files on all analytics hadoop worker nodes, job filling up tmp
[23:09:23] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[23:09:43] <ebernhardson>	 elukey: thanks for fixing, i know it's a bit late there
[23:10:53] <elukey>	 ebernhardson: it is super fine! I didn't think about this failure scenario, do you think that there is the chance to put up some fence to prevent this in the future?
[23:11:23] <elukey>	 possibly some limit in the os itself and/or in hadoop
[23:11:46] <ebernhardson>	 there ought to be something, but i'm not sure what. I don't know if yarn has the ability to monitor disk usage of a process
[23:12:21] <icinga-wm>	 RECOVERY - Hadoop NodeManager on analytics1058 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[23:13:19] <elukey>	 !log manual restart of Yarn Node Managers on analytics1058/31
[23:13:19] <ebernhardson>	 from my side i tihnk there might be a per-application directory for putting things instead of a global temp dir which should get cleaned up better
[23:13:20] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[23:13:40] <icinga-wm>	 RECOVERY - Hadoop NodeManager on analytics1031 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[23:14:16] <elukey>	 ebernhardson: do you mind (whenever you have time) to open a task about it? 
[23:14:22] <ebernhardson>	 elukey: sure i'll open one
[23:14:39] <elukey>	 that would be awesome, so we'll have a complete description of how to trigger this
[23:16:59] <elukey>	 !log re-run webrequest-load-wf-upload-2018-2-12-21 via Hue (node managers failure)
[23:17:00] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[23:18:47] <elukey>	 all right everything looks good as far as I can see!
[23:19:01] <elukey>	 ebernhardson: going offline, will re-check tomorrow! 
[23:19:41] <elukey>	 nuria_: sorry just seen the ping! As far as I can see all good, maybe some processes to restart later on if any
[23:19:43] <SMalyshev>	 elukey, ottomata[m]: kafka question. Change streams are not accessible outside production network, right? Do we plan any change streams that would be? 
[23:20:22] <nuria_>	 SMalyshev: change streams? 
[23:20:46] <SMalyshev>	 nuria_: like mediawiki.revision-create etc.
[23:21:08] <nuria_>	 SMalyshev: they are available as eventstreams
[23:21:18] <nuria_>	 SMalyshev: not all just some
[23:21:31] * elukey off!
[23:21:31] <nuria_>	 SMalyshev: see: https://blog.wikimedia.org/2017/03/20/eventstreams/
[23:21:36] <nuria_>	 SMalyshev: makes sense?
[23:22:00] <SMalyshev>	 nuria_: yeah but those are not useful for me, it's not kafka and not seekable
[23:22:20] <nuria_>	 SMalyshev: you mean then the kafka endpoints "raw"?
[23:22:36] <SMalyshev>	 nuria_: well, any kafka topic that contains the changes
[23:22:49] <SMalyshev>	 not necessarily the whole original one
[23:23:11] <nuria_>	 SMalyshev: right, thsu far we do not have any plans to have kafka consumable outside prod 
[23:23:16] <nuria_>	 *thus
[23:24:10] <SMalyshev>	 so this means for wdqs if I make kafka consumer it is not possible to use it in labs or by anybody who is not in our production cluster. So I wonder if there's some solution. now or in the future, that we could have for it?
[23:24:30] <nuria_>	 SMalyshev: there is kafka in labs but w/o prod data
[23:24:40] <SMalyshev>	 right, so that doesn't solve it
[23:24:50] <nuria_>	 SMalyshev: right, thsoe are separate things
[23:25:07] <nuria_>	 SMalyshev: production data does not exist in labs other than via teh db replicas
[23:25:07] <SMalyshev>	 right now RC API is accessible to all, but it's not a very good API
[23:25:27] <nuria_>	 SMalyshev: the kafka in labs has events 
[23:25:34] <nuria_>	 SMalyshev: but they are from deployment prep
[23:25:39] <SMalyshev>	 nuria_: well I don't need all production data I need only the changes
[23:26:11] <nuria_>	 SMalyshev: but from the prod db right?  Or are you trying to test for which  "testing" events are sufficient?
[23:26:47] <SMalyshev>	 nuria_: no, I am talking about if I want to run actual feed, but not in WMF production, would I have any options besides RC API
[23:26:53] <nuria_>	 SMalyshev: i see
[23:26:58] <SMalyshev>	 not for testing but real Wikidata changes
[23:27:14] <SMalyshev>	 testing is separate thing
[23:28:18] <nuria_>	 SMalyshev: for "consuming" events or "producing " events?
[23:28:34] <nuria_>	 SMalyshev: or both?
[23:28:53] <SMalyshev>	 consuming only
[23:29:03] <SMalyshev>	 I assume Wikidata would produce them :)
[23:29:07] <nuria_>	 SMalyshev: then RC is the only answer 
[23:29:23] <nuria_>	 SMalyshev: for the near term (it is our plan to migrate RC on top of kafka) 
[23:29:59] <nuria_>	 SMalyshev: but we do not plan to expose kafka raw events in the near term, seems a bit raw for consumer facing events no?
[23:30:07] <SMalyshev>	 hmm what do you mean migtate RC on top of Kafka?
[23:30:25] <nuria_>	 SMalyshev: RC will work as it does now
[23:30:37] <nuria_>	 SMalyshev: but instead of being power by its current stream of data 
[23:31:03] <SMalyshev>	 nuria_: I don't really need everything there... in fact, I need just page title and revision ID and event type (and timestamp but that comes for free with Kafka)
[23:31:06] <nuria_>	 SMalyshev: it will be powered by data in a kafka topic
[23:31:54] <nuria_>	 SMalyshev: https://phabricator.wikimedia.org/T185319
[23:32:36] <nuria_>	 SMalyshev: we do not plan to expose kafka "raw" for the outside world which i think (if i understand you correctly) is what you are asking
[23:32:37] <SMalyshev>	 nuria_: well, the API there is still far from ideal and I am not sure how it would reconcile all the timestamps (right now events are not sequential by any of the criteria)
[23:32:52] <SMalyshev>	 and also RC does not have many things like page prop changes
[23:33:00] <SMalyshev>	 so it won't be permanent solution
[23:33:21] <nuria_>	 SMalyshev: right, not saying that it is, it is a better backend for what we have
[23:33:47] <SMalyshev>	 nuria_: what about not raw? E.g. if I had a stream processor that would filter it (from raw topics to stream having just ids+revisions or so)? 
[23:34:05] <SMalyshev>	 is there a good way to make that thing public?
[23:34:43] <nuria_>	 SMalyshev: I  do not think so (at this time) . I can triple check with ottomata[m] tomorrow but even event streams was not that easy to come by
[23:35:12] <nuria_>	 SMalyshev: and we have no plans to expand kafka for the outside world to consume  directly from it
[23:35:27] <nuria_>	 SMalyshev: i think yours might be the 1st request about it 
[23:35:47] <nuria_>	 SMalyshev: we get more requests for more "consumable" endpoints (like event streams) though
[23:35:54] <SMalyshev>	 well yeah maybe it's kinda special case, esp. given that usual event stream is not enough
[23:39:52] <wikibugs>	 10Analytics, 10Analytics-Cluster: Hadoop jobs that generate large temporary files can take down nodes - https://phabricator.wikimedia.org/T187139#3965657 (10EBernhardson)
[23:41:11] <SMalyshev>	 nuria_: also, I am trying to connect to deployment-kafka-jumbo-1  on labs, and not getting anything - only timeouts and failure messages. is there any docs how to work with it?
[23:41:26] <SMalyshev>	 e.g. kafkacat -L -b deployment-kafka-jumbo-1.eqiad.wmflabs:9092
[23:42:03] <nuria_>	 SMalyshev: elukey and ottomata just set it up to test some upgrades so it might be totally dead, it is our testing environment so ya, it might be dead at times
[23:42:27] <wikibugs>	 10Analytics, 10Analytics-Cluster: Hadoop jobs that generate large temporary files can take down nodes - https://phabricator.wikimedia.org/T187139#3965669 (10EBernhardson)
[23:42:33] <nuria_>	 SMalyshev: also disk fills up a ton
[23:43:05] <nuria_>	 SMalyshev: as labs infrastructure is not made for large volumes of data
[23:43:12] <SMalyshev>	 ok, I see :) I'll try to ping one of them then. In general would be nice to have some way to test these things in non-production
[23:43:14] <nuria_>	 SMalyshev: we can probably look at it tomorrow 
[23:43:21] <SMalyshev>	 nuria_: great, thanks!
[23:43:59] <nuria_>	 SMalyshev: if we are not using the cluster is  probably ok but again, nothing somewhat big-daty is easy to test in labs
[23:44:23] <SMalyshev>	 nuria_: oh I know :) I have wdqs with 300G+ data :)
[23:44:25] <wikibugs>	 10Analytics, 10Analytics-Cluster: Hadoop jobs that generate large temporary files can take down nodes - https://phabricator.wikimedia.org/T187139#3965657 (10EBernhardson) From the application side it seems like i could perhaps put the temp files in a better place. One option would be PWD of the application whi...
[23:44:36] <nuria_>	 Steinsplitter: jaja ok
[23:45:19] <SMalyshev>	 nuria_: but testing on beta wikis probably would be ok though not ideal
[23:45:56] <SMalyshev>	 nuria_: is test.wikidata and so on hooked up to that kafka setup too?
[23:45:57] <nuria_>	 SMalyshev: ya, we do it for other systems like eventlogging, we just had the kafka cluster running in labs with deploy wiki events
[23:46:49] <SMalyshev>	 or /wikidata.beta.wmflabs.org ? 
[23:48:38] <nuria_>	 SMalyshev: mmm, those must have come too , yes , I cannot imagine why they wouldn't have 
[23:49:00] <SMalyshev>	 ok cool I'll wait then until the whole thing is up
[23:58:37] <nuria_>	 tgr : have not forgotten about safari referrals 
[23:58:42] <nuria_>	 tgr: need to look at those