[00:00:56] <wikibugs_>	 10Analytics-Cluster, 10Analytics-Kanban, 10Language-Team, 10MediaWiki-extensions-UniversalLanguageSelector, and 3 others: Migrate table creation query to oozie for interlanguage links - https://phabricator.wikimedia.org/T170764#3441928 (10Milimetric) Ready to review, @Amire80.  Please take a look at the ta...
[05:44:27] <wikibugs_>	 10Analytics, 10DBA: Drop MoodBar tables from all wikis - https://phabricator.wikimedia.org/T153033#3659536 (10Marostegui) >>! In T153033#3658843, @demon wrote: >>>! In T153033#3656161, @Marostegui wrote: >> Just to be clear, you are talking about dbstore1002/db1047? >> We also have to keep in mind that there a...
[07:16:02] <wikibugs_>	 10Analytics, 10DBA: Drop MoodBar tables from all wikis - https://phabricator.wikimedia.org/T153033#3659635 (10Marostegui) I have checked the tables across the masters:  s1: has data on `enwiki` s2:  has data only on: `nlwiki` s3: has data only on: ``` frwikisource incubatorwiki itwikivoyage sewikimedia tawiki...
[07:48:10] <wikibugs_>	 10Analytics, 10DBA: Drop MoodBar tables from all wikis - https://phabricator.wikimedia.org/T153033#3659653 (10demon) Testwiki we can drop for sure. So that just leaves 7 total wikis with viable data. Farrrrrrrr better.
[07:52:46] <wikibugs_>	 10Analytics, 10DBA: Drop MoodBar tables from all wikis - https://phabricator.wikimedia.org/T153033#3659668 (10Marostegui) Thanks @demon! I will exclude testwiki from the list of wikis the tables need to be imported from
[08:34:57] <wikibugs_>	 10Analytics, 10User-Elukey: Add the prometheus jmx exporter to all the Hadoop daemons - https://phabricator.wikimedia.org/T177458#3659718 (10elukey)
[08:40:38] <wikibugs_>	 10Analytics, 10User-Elukey: Add the prometheus jmx exporter to all the Druid daemons - https://phabricator.wikimedia.org/T177459#3659740 (10elukey)
[08:41:14] <wikibugs_>	 10Analytics, 10User-Elukey: Add the prometheus jmx exporter to all the Zookeeper daemons - https://phabricator.wikimedia.org/T177460#3659754 (10elukey)
[08:42:13] <wikibugs_>	 10Analytics-Cluster, 10Analytics-Kanban, 10monitoring, 10Patch-For-Review, 10User-Elukey: Use Prometheus for Kafka JMX metrics instead of jmxtrans - https://phabricator.wikimedia.org/T175922#3659769 (10elukey)
[08:42:15] <wikibugs_>	 10Analytics, 10User-Elukey: Move away from jmxtrans in favor of prometheus jmx_exporter - https://phabricator.wikimedia.org/T175344#3590888 (10elukey)
[09:34:12] <wikibugs_>	 (03PS2) 10Fdans: Add top articles by pageviews metric [analytics/wikistats2] (develop) - 10https://gerrit.wikimedia.org/r/382139 (https://phabricator.wikimedia.org/T175266)
[09:36:04] <wikibugs_>	 (03CR) 10Fdans: "@mforns thank you so much for such a great review! I added all the changes you've suggested. Big fan of linking directly to the articles, " (033 comments) [analytics/wikistats2] (develop) - 10https://gerrit.wikimedia.org/r/382139 (https://phabricator.wikimedia.org/T175266) (owner: 10Fdans)
[09:36:53] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] Add top articles by pageviews metric [analytics/wikistats2] (develop) - 10https://gerrit.wikimedia.org/r/382139 (https://phabricator.wikimedia.org/T175266) (owner: 10Fdans)
[09:43:20] <elukey>	 Running errand for a bit + early lunch people, ttl! 
[10:52:13] <joal>	 Hi a-team - Back in half-half mode - Will spend my day catching up and finishing beginning-of-quarter paperwork
[11:12:04] * elukey waves to joal 
[11:12:13] <joal>	 \o
[11:13:28] <elukey>	 joal: how are you feeling?
[11:14:07] <joal>	 elukey: As if an elephant has been walking over my forehead and belly - kinda
[11:14:51] <joal>	 elukey: But it's great, I can do things today, I'm not not anymore in between batroom for me, then for Lino, then caring Naé etc
[11:19:29] <wikibugs_>	 (03CR) 10Joal: [C: 04-1] "Comments inside. -1 because of path error." (036 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/365517 (https://phabricator.wikimedia.org/T170764) (owner: 10Amire80)
[11:19:47] <joal>	 elukey: How has it been goind this week?
[11:22:02] <elukey>	 joal: argh not a nice week for you, so sorry :( glad that you feel better now!
[11:22:11] <elukey>	 the week has been positive and negative so far
[11:22:22] <elukey>	 do you want good or bad news first ?:D
[11:22:25] <joal>	 hm - not sure what it means
[11:22:28] <joal>	 hm, not sure either :)
[11:22:33] <joal>	 you decide
[11:23:15] <elukey>	 so Kafka Jumbo is now configured with basic ACLs and related logging, everything's working fine and we are only missing TLS certificate management to start the game :)
[11:23:29] <joal>	 Yay, that seems like a good news :)
[11:24:02] <elukey>	 I found a way to expose druid metrics to prometheus but it is a bit weird and I'd need to know what you think about it, but not super urgent
[11:24:37] <elukey>	 Andrew yesterday tried to merge the change to add load balancing to Druid but for various network reasons that he'll explain everything was reverted
[11:25:00] <elukey>	 and from what I've read there is a fair amount of work to do if we want druid + load balancing
[11:25:15] <elukey>	 (that was the not-so-good one :P)
[11:25:27] <joal>	 mwarf
[11:27:47] <joal>	 Thanks for the heads up elukey 
[11:32:12] <elukey>	 I also have some ideas for Hadoop and JVMs, but everything can wait few days
[11:33:04] <elukey>	 see joal I miss our daily talks in the morning, otherwise I have to keep all my non-sense ideas for me and I feel bored
[11:33:15] <joal>	 :)
[11:33:53] <elukey>	 fdans doesn't like me so he avoid any chat until standup
[11:34:21] <elukey>	 and Marcel wasn't feeling well this week too
[11:34:38] <joal>	 hm, hard week for a-team:(
[11:34:40] <elukey>	 :P
[11:34:40] <fdans>	 nooooo lucaaaa
[11:34:43] <elukey>	 hahaahah
[11:34:59] <joal>	 elukey: about druid, LVS not being working also means us not starting to split clusters, right?
[11:35:21] <fdans>	 especially now, i’ve been doing your hours lately elukey
[11:35:27] <fdans>	 hellooooo joal
[11:36:18] <joal>	 Heya fdans 
[11:38:23] <milimetric>	 Always hard when you're not around jo :)
[11:38:36] <elukey>	 fdans: <3
[11:39:01] <joal>	 Hi milimetric 
[11:39:04] <elukey>	 joal: correct.. I mean we can split the clusters but we need to take some decisions first, Andrew will probably outline all of them during standup
[11:55:56] <wikibugs_>	 (03PS3) 10Fdans: Add top articles by pageviews metric [analytics/wikistats2] (develop) - 10https://gerrit.wikimedia.org/r/382139 (https://phabricator.wikimedia.org/T175266)
[11:56:29] <fdans>	 I think this was nuria_ and milimetric 's plan from the beginning
[11:57:05] <fdans>	 really dislike gerrit => suggest we get a super worse solution => now love gerrit
[11:58:57] <milimetric>	 we've been found out!!!  Quick, hide!
[11:59:29] <joal>	 :)
[12:20:08] <elukey>	 question for the druid masters
[12:20:23] <elukey>	 I am on druid1001 and I am trying to make sense of the logs in /var/log
[12:21:54] <elukey>	 something like 2017-10-05T00:00:00.000Z should collect all the queries right?
[12:22:21] <joal>	 I dont understand that last part elukey 
[12:23:01] <elukey>	 there is a log named "2017-10-05T00:00:00.000Z" 
[12:23:03] <elukey>	 :D
[12:23:32] <joal>	 Ah
[12:23:34] <elukey>	 and from http://druid.io/docs/0.9.2/configuration/index.html I don't see a way to name it differently
[12:23:41] <elukey>	 can only see druid.request.logging.dir
[12:24:56] <elukey>	 then one log for each daemon
[12:25:04] <elukey>	 with content tagged as Event feed
[12:26:14] <joal>	 well elukey- looks like you know even more than I do :)
[12:26:41] <elukey>	 nono I am dumping things in here hoping that somebody will enlight me :D
[12:26:59] <elukey>	 why am I doing it? I know it looks weird but I need a way to find druid metrics for prometheus :P
[12:28:14] <joal>	 elukey: what weould be interesting metrics?
[12:30:24] <elukey>	 joal: there are the metrics available - http://druid.io/docs/0.9.2/operations/metrics.html
[12:34:26] <elukey>	 if they are not needed I can only grab the jvm ones from the mbeans via jmx (only ones available grrr)
[12:44:06] <joal>	 elukey: I don't think I have seen any of those metrics yet, therefore it's difficult to say
[12:44:53] <elukey>	 joal: do they look useful from the description? (trying to get an idea if it is worth or not)
[12:45:23] <joal>	 elukey: I think I'd like broker-query-time and cahe-oriented data (almost all of them)
[12:45:46] <joal>	 elukey: The thing is, I don't know if they are individual per query, aggregated, and how they look like
[12:46:00] <joal>	 elukey: Best could be to enable then, and have a look?
[12:51:03] <elukey>	 I think that they are already enabled, but dumped in log files with text around
[12:51:47] <elukey>	 druid only emits data, and the "Recipients" can be logfile, http and graphite
[12:51:51] <joal>	 Ah, ok
[12:51:55] <elukey>	 with http it sends a post to and endpoint
[12:59:51] <wikibugs_>	 10Analytics, 10Patch-For-Review, 10Release-Engineering-Team (Kanban): Move Wikistats 2 from Differential to Gerrit - https://phabricator.wikimedia.org/T177288#3660553 (10fdans) a:05hashar>03fdans
[13:14:39] <elukey>	 fdans: just did git checkout release on thorium for wikistats2, and merged your code review (plus checked the puppet run)
[13:15:37] <fdans>	 elukey:  thank you ma friend :)
[13:43:24] <milimetric>	 fdans: did you end up using release as the "production" branch?
[13:43:42] <fdans>	 yes, that's the patch that elukey just merged
[13:43:49] <fdans>	 also updated docs to reflect that
[13:43:51] <milimetric>	 ok, cool
[13:44:21] <milimetric>	 just caught your chat with N too late last night, and wasn't sure
[13:44:58] <wikibugs_>	 10Analytics-Kanban, 10Patch-For-Review, 10Release-Engineering-Team (Kanban): Move Wikistats 2 from Differential to Gerrit - https://phabricator.wikimedia.org/T177288#3660665 (10fdans)
[13:46:19] <milimetric>	 elukey: I see druid finished the daily job you restarted.  Do you think it's healthy enough to run the monthly or should I still wait?
[13:47:14] <wikibugs_>	 10Analytics: Alert user about adblocker preventing AQS requests - https://phabricator.wikimedia.org/T177491#3660684 (10fdans)
[13:47:16] <elukey>	 milimetric: not a super expert but I'd say yes, the failure looked random
[13:47:30] <wikibugs_>	 10Analytics, 10Analytics-Wikistats: Alert user about adblocker preventing AQS requests - https://phabricator.wikimedia.org/T177491#3660698 (10fdans)
[13:48:02] <milimetric>	 k
[13:48:39] <milimetric>	 !log restarted banner_activity-druid-monthly for September again
[13:48:41] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:54:37] <elukey>	 ottomata: o/
[13:56:33] <elukey>	 I am attempting to have a single file containing only json metrics from druid with https://gerrit.wikimedia.org/r/#/c/382452/1/modules/druid/templates/log4j2.xml.erb
[13:56:51] <elukey>	 that could be completely wrong of course, me and log4j are not close friends
[14:18:38] <elukey>	 tested the change on druid1006's broker, works like a charm
[14:18:59] <elukey>	 there is now a file called /var/log/druid/broker-metrics.log with json stuff in there
[14:19:23] <elukey>	 meanwhile /var/log/druid/broker.log contains only the other generic logging
[14:19:34] <elukey>	 does it sound reasonable?
[14:19:55] <halfak>	 o/ analytics folks! 
[14:20:09] <elukey>	 hey 
[14:20:13] <halfak>	 I'm trying to transfer a big file from a university computer onto stat1006 through SCP
[14:20:31] <halfak>	 It looks like I can't connect to the IP address from the stat machine but I can from my local computer.
[14:20:42] <halfak>	 The files are too big to pass through my machine. 
[14:20:45] <halfak>	 What do?
[14:21:17] <ottomata>	 elukey:  sounds awesome!
[14:21:29] <elukey>	 \o/
[14:23:11] <ottomata>	 halfak:  hm.
[14:23:26] <elukey>	 surely the analytics firewall doesn't help
[14:23:28] <ottomata>	 quick idea:  can you set up a v simple http server that you could reach on that IP?
[14:24:00] <halfak>	 ottomata, I was thinking about that but then I'll be fighting with a university to open a port for that. 
[14:24:01] <elukey>	 and then got throught the proxy, good idea indeed
[14:24:30] <halfak>	 So either I fight with us or fight with an org I have no official affiliation with :/
[14:24:46] <halfak>	 You know.  What the heck.  I'm going to try.  Maybe it's already open 
[14:25:08] <halfak>	 It shouldn't be but, shouldn'ts are often ignored in university operations :) 
[14:26:02] <elukey>	 halfak: I haven't played with it but maybe the ProxyCommand ssh config might help
[14:27:02] <elukey>	 basically doing scp from university to stat1006 but having your local pc as proxy
[14:27:23] <ottomata>	 halfak:  aye
[14:27:49] <ottomata>	 yah, but then bytes are still going through his computer, i guess ok cause he's not storing, but still weird
[14:27:50] <halfak>	 elukey, that could work.  I don't like the idea of wasting that bandwidth.  The University's VMs are probably really geographically local while I'm not. 
[14:27:55] <halfak>	 But in the end, if that is what's needed. 
[14:27:55] <ottomata>	 yeah
[14:28:02] <halfak>	 I think I'm transfering something like a 1TB
[14:28:13] <halfak>	 My internet provider is going to destroy me. 
[14:28:28] <ottomata>	 elukey:  could halfak make another ssh private key from his uni account?  maybe just temporary?
[14:28:45] <ottomata>	 then we could allow access from that key, and he could do it the other way around: push from uni server
[14:29:11] <halfak>	 Oh!  I already have that temp key :) 
[14:29:21] <halfak>	 I've been using it just for accessing this one server. 
[14:32:37] <elukey>	 in theory it could be possible to forward the agent to the univ host and use it to authenticate with stat1006, but this wouldn't be super great since anybody with root on univ host will be able to get halfak's ssh agent
[14:33:48] <elukey>	 let's see if moritzm can help 
[14:34:07] <elukey>	 hi! Sorry to ping you but we'd need a consult for a probably simple scp issue
[14:35:47] <halfak>	 Maybe I could just connect to a remote ssh host just this once :D 
[14:36:02] <halfak>	 Seems like the easiest solution :)  
[14:36:06] <moritzm>	 sure, let me read backlog
[14:36:11] <halfak>	 No sharing my private key :D  
[14:36:17] <halfak>	 thanks moritzm :) 
[14:37:14] <halfak>	 Reiterating.  I need to transfer about 1TB of data from a University machine to stat1006.  I'd like to SCP but can't from a stat machine.  I've just confirmed that all ports but 22 are blocked by the University in question. 
[14:38:45] <elukey>	 ottomata: anything against me merging https://gerrit.wikimedia.org/r/#/c/382452/2/modules/druid/templates/log4j2.xml.erb and rolling restart druid ?
[14:41:15] <moritzm>	 halfak: not sure I fully understand, are they blocking outgoing, incoming SSH or both? scp $BIGFILE stat1006:~/ fails from you the university host fails?
[14:51:59] <halfak>	 moritzm, We are blocking outgoing ssh connections 
[14:52:20] <halfak>	 moritzm, I don't want to put my private key on the university's servers. 
[14:52:27] <halfak>	 But I could if you'd rather I do that ;) 
[14:52:57] <halfak>	 It'd be a temp key I'd put in place just for this. 
[14:54:21] <ottomata>	 elukey:  nope please do!
[14:54:38] <ottomata>	 that is a good idea, i've been doing grep -v to not have to look at those metrics in logs
[14:54:48] <elukey>	 super
[14:57:01] <milimetric>	 ]'
[14:57:07] <milimetric>	 heh, sorry, elukey wait
[14:57:18] <milimetric>	 maybe let's give this indexing job a chance to finish?
[14:57:59] <elukey>	 ahahha sure sure
[14:58:01] <milimetric>	 https://hue.wikimedia.org/jobbrowser/jobs/job_1504006918778_137686/single_logs
[14:58:21] <elukey>	 I wanted to do it after standup
[14:58:23] <milimetric>	 actually, elukey it's just in hadoop now so if you do it fast you can probably do it before the indexing starts
[14:58:24] <elukey>	 I'll merge in the meantime
[14:58:27] <elukey>	 nono
[14:58:31] <elukey>	 not in a hurry
[14:58:41] <milimetric>	 but indexing might take a while
[14:58:43] <moritzm>	 ah, so your concern is that the university machine is a centrally managed host where someone else is root. that's certainly not great, don't you have the possibility to use your notebook in the uni network and run the transfer from there?
[15:07:33] <wikibugs_>	 10Analytics-Kanban, 10Operations, 10Patch-For-Review, 10User-Elukey: Tune Kafka logs to register clients connected - https://phabricator.wikimedia.org/T173493#3661052 (10elukey)
[15:08:16] <wikibugs_>	 10Analytics-Kanban, 10Analytics-Wikistats: Handle long project names in Wikiselector - https://phabricator.wikimedia.org/T173373#3661057 (10mforns)
[15:08:27] <wikibugs_>	 10Analytics-Kanban, 10Patch-For-Review: Replace references to dbstore1002 by db1047 in reportupdater jobs - https://phabricator.wikimedia.org/T176639#3661058 (10mforns)
[15:11:06] <halfak>	 "notebook in the uni network"?
[15:11:09] <halfak>	 moritzm, ^ 
[15:11:17] <halfak>	 Oh!  like physically be present.  
[15:11:29] <halfak>	 No.  not the uni I'm affiliated with.  Would be a long flight. 
[15:11:51] <halfak>	 No one there has analytics cluster access. 
[15:13:01] <wikibugs_>	 (03PS1) 10Fdans: Merge branch 'develop' into master [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/382466 (https://phabricator.wikimedia.org/T177288)
[15:13:06] <elukey>	 one of the big issue that I am seeing now is that stat1006 is in the analytics network and we have special rules on our routers to block traffic going outside the analytics network on port 22
[15:13:15] <elukey>	 only few target hosts are allowed
[15:13:37] <halfak>	 No history of short-term exceptions? 
[15:13:55] <halfak>	 It seems like this would be the most secure strategy for getting the files moved. 
[15:14:11] <elukey>	 we could think about it, I don't see a big issue, I need to figure out though if this would be enough :D
[15:14:21] <halfak>	 Gotcha.  :) 
[15:14:26] <elukey>	 (I mean, if then we'll have other blockers)
[15:14:31] <moritzm>	 so the correct and clean way would be to sync this to a machine you trust (like your notebook) and then sync to stat1006 from that host. however, I can see that this is somewhat impractical if your ISP connection is slow
[15:14:35] <elukey>	 how urgent it is?
[15:15:20] <moritzm>	 adding an additional temporary key for the transfer from that uni host to stat1006 seems fine to me in terms of risk assessment
[15:15:37] <moritzm>	 but I think this would need to be signed off by Mark
[15:15:53] <moritzm>	 since it's a slight violation of our best practices/guidelines
[15:15:54] <halfak>	 Hmm...  The urgency really comes from how this is blocking me from other work.  If I can't get it done this week, it might never happen and that would crash a Research Team goal unless someone else picks it up. 
[15:16:12] <halfak>	 moritzm, agreed.  If we could do that, I can get the xfer started in short order. 
[15:16:44] <halfak>	 But yeah, kind of insecure.  One option is to delete the key as soon as I start the xfer.  That way someone would at least need to dig through memory to find it. 
[15:16:55] <halfak>	 Maybe it's even discarded from memory after the handshake?
[15:17:06] <halfak>	 Narrow that window :) 
[15:17:13] <halfak>	 DarTar, speak of the devil
[15:17:17] <halfak>	 :D  
[15:17:37] <halfak>	 Trying to figure out the urgency of transfering meen chul's ref's dataset to stat machines/dataset hosting. 
[15:17:49] <moritzm>	 actually, yes you can do that: ssh-agent has the brillantly named "ssh-add -D KEY" command to remove a key from the SSH agent
[15:17:52] <halfak>	 It's going to take some operations work to make it happen. 
[15:18:27] <DarTar>	 hey folks, I gotta head off to the office in a moment, bit hectic here with the girls, I’ll be back later
[15:18:32] <halfak>	 kk 
[15:19:08] <moritzm>	 halfak: I'd say ping Mark on IRC to get his okay and if that's fine Otto or someone from the US opsen should be able to add your temp key during US day time
[15:19:46] <halfak>	 Oh.  Uhh... maybe it'd be better to have a opsen explain it to Mark.  I'm gonna be confusing if I explain the plan. 
[15:22:22] <elukey>	 moritzm: super newbie question - say that we'd whitelist the univ external IP addres for ssh on the analytics cr1/cr2 firewall rules, would it be ok for halfak to cp its temp ssh key on stat1006 (the one to log in to the univ), make the transfer and then delete it? Or completely stupid plan due to other reasons? (like no way scp could grab data from outside to production)
[15:23:37] <moritzm>	 this could be done, but wouldn't have much benefit over going through the bastions/scp from what I can tell
[15:23:55] <moritzm>	 that uni should have good outgoing bandwidth I guess
[15:24:26] <elukey>	 ah so we want to do the other way around from univ to stat1006, got it
[15:24:29] <elukey>	 thanks :)
[15:25:29] <moritzm>	 elukey: yeah, I think that's ok. could you ping Mark and check with him? I'm afk for a bit now, playground and bringing the kids to bed
[15:25:49] <halfak>	 thanks to both of you for your help on this :) 
[15:26:18] <elukey>	 sure :)
[15:48:18] <milimetric>	 elukey: that job died anyway - so cluster's all yours
[15:48:32] <milimetric>	 joal said it was an actual problem with data, I'll follow up with him
[15:48:45] <elukey>	 super
[15:49:25] <milimetric>	 joal: I was looking at this, but it sounds like the errors you mention are logged somewhere else? https://hue.wikimedia.org/oozie/list_oozie_workflow/0061747-170829140538136-oozie-oozi-W/
[15:49:52] <wikibugs_>	 10Analytics, 10ChangeProp, 10EventBus, 10MediaWiki-JobQueue, 10Services (designing): Split ChangeProp metrics by wiki - https://phabricator.wikimedia.org/T175952#3661258 (10fgiunchedi) (apologies about the delay, I completely missed this!)  Yeah it is likely statsd isn't going to like a 800x increase. Go...
[15:49:52] <joal>	 milimetric: looking for links - will answer soon
[15:50:38] <ottomata>	 halfak:  qq, how are those ores schemas?
[15:50:50] <ottomata>	 (not htat I have time to work on it :) )
[15:50:50] <joal>	 milimetric: culprit is that one: https://hue.wikimedia.org/oozie/list_oozie_workflow/0011451-170829140538136-oozie-oozi-W/?coordinator_job_id=0009900-170228165458841-oozie-oozi-C
[15:51:02] <halfak>	 ottomata, still fighting with awight about them.  Nothing formalized in jsonschema yet
[15:51:16] <halfak>	 Oh wait you are asking about scores. 
[15:51:17] <joal>	 milimetric: That day has failed (data is available thanks to Spark)
[15:51:30] <halfak>	 Deployment estimate is Oct 15th.  I can get it up on wmflabs now though. 
[15:51:39] <halfak>	 Was planning to do that deployment today ^_^
[15:52:05] <wikibugs_>	 (03PS1) 10Milimetric: Add am.wikimedia to the whitelist [analytics/refinery] - 10https://gerrit.wikimedia.org/r/382471
[15:52:41] <ottomata>	 ah k, no hurry, just was curious because i want to get that in eventstreams
[15:52:46] <ottomata>	 buuut i really don't have time for a while to work on it :
[15:52:47] <ottomata>	 :/
[15:53:05] <wikibugs_>	 (03CR) 10Milimetric: [V: 032 C: 032] Add am.wikimedia to the whitelist [analytics/refinery] - 10https://gerrit.wikimedia.org/r/382471 (owner: 10Milimetric)
[16:04:36] <wikibugs_>	 10Analytics-Kanban, 10Operations: LVS for Druid - https://phabricator.wikimedia.org/T177511#3661358 (10Ottomata)
[16:08:42] <wikibugs_>	 10Analytics-Kanban, 10Analytics-Wikistats, 10Patch-For-Review: Create Druid public cluster such AQS can query druid public data - https://phabricator.wikimedia.org/T176223#3661390 (10Ottomata)
[16:11:23] <nuria_>	 milimetric: coming? 
[16:11:26] <nuria_>	 joal: coming?
[16:12:04] <joal>	 yes
[16:18:04] <wikibugs_>	 10Analytics: R execution on stat1005 -> 'stack smashing error' - https://phabricator.wikimedia.org/T174946#3661424 (10fdans) p:05Triage>03Low
[16:20:11] <wikibugs_>	 10Analytics: Make Spark 2.1 easily available on new CDH5.10 cluster - https://phabricator.wikimedia.org/T158334#3661446 (10Nuria)
[16:20:13] <wikibugs_>	 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review: Make tranquility work with Spark - https://phabricator.wikimedia.org/T168550#3661445 (10Nuria)
[16:25:59] <wikibugs_>	 10Analytics: Create small sample mediawiki-history table in MariaDB - https://phabricator.wikimedia.org/T165309#3262603 (10fdans) Let's sqoop out about a month of data from the mediawiki history. Also docs need to be updated to point out that this data is available.
[16:26:57] <wikibugs_>	 10Analytics: Create small sample mediawiki-history table in MariaDB - https://phabricator.wikimedia.org/T165309#3661485 (10fdans)
[16:27:08] <wikibugs_>	 10Analytics-Kanban: Create small sample mediawiki-history table in MariaDB - https://phabricator.wikimedia.org/T165309#3262603 (10fdans)
[16:30:31] <wikibugs_>	 10Analytics: Make Spark 2.1 easily available on new CDH5.10 cluster - https://phabricator.wikimedia.org/T158334#3661510 (10fdans)
[16:34:02] <wikibugs_>	 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review: Port Kafka alerts from check_graphite to check_prometheus - https://phabricator.wikimedia.org/T175923#3661520 (10Ottomata)
[16:38:21] <wikibugs_>	 10Analytics, 10Analytics-Cluster, 10Patch-For-Review: cdh::hadoop::directory (and other hdfs puppet command?) should quickly check if namenode is active before executing - https://phabricator.wikimedia.org/T130832#3661544 (10fdans)
[16:43:33] <wikibugs_>	 10Analytics, 10Discovery-Analysis: Get 'sparklyr' working on stats1005 - https://phabricator.wikimedia.org/T139487#2433802 (10fdans) We'll work on this after T158334 is completed
[16:45:39] <joal>	 Gone for diner, back after
[16:45:53] <nuria_>	 milimetric: will CR amir's work today
[16:45:58] <nuria_>	 *job, rather
[16:46:57] <milimetric>	 thanks nuria_, appreciated, I know that's likely to slip up in the avalanche of work we have otherwise
[16:51:35] <wikibugs_>	 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: dbstore1002 /srv filling up - https://phabricator.wikimedia.org/T168303#3661618 (10jcrespo) There is now 1TB available, but we have now a memory (swap) problem: https://grafana.wikimedia.org/dashboard/db/server-board?refresh=1m&orgId=1&var-server=dbstore...
[16:51:47] <DarTar>	 halfak, elukey: sounds like it’s solved? 
[16:53:08] <elukey>	 DarTar: techically it should be doable, but to be super strict and follow rules it might be better to open a phab task with all the details and ask Mark/Faidon to approve it
[16:53:54] <elukey>	 so if you guys could create one we'd just need to wait for the confirmation to proceed
[16:54:00] <DarTar>	 elukey: gotcha, I’ll create one with halfak 
[16:54:04] <halfak>	 DarTar: this is going beyond my allocated time for this work. 
[16:54:05] <halfak>	 Perfect!
[16:54:21] <halfak>	 I'll keep working so long as I don't need to draft phab tasks. 
[16:54:23] <halfak>	 :D
[16:54:54] <DarTar>	 elukey: is the process above a good description of what should go in the request?
[16:55:00] <DarTar>	 halfak: deal
[16:56:13] <elukey>	 DarTar: basically what Moritz said earlier on, but I can review it after you have created it
[16:56:24] <DarTar>	 elukey: great
[16:57:43] <elukey>	 halfak: in the meantime, can you try to test how fast the proxy solution could be?
[16:57:54] <halfak>	 elukey, proxy is a no go. 
[16:58:01] <halfak>	 My ISP will shut me down 
[16:58:34] <elukey>	 oh ok 
[16:58:43] <halfak>	 They don't really like it when you download and upload 1TB in a few days. :|
[16:59:20] <elukey>	 I am not really sure why since it will be a regular data transfer, but I trust you :)
[17:00:38] <nuria_>	 elukey: question if you may
[17:00:48] <elukey>	 nuria_: sure!
[17:01:06] <nuria_>	 elukey: regarding 1002 i do not understand where the memory swap problem comes from after freeing 1 tb
[17:01:28] <wikibugs_>	 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: dbstore1002 /srv filling up - https://phabricator.wikimedia.org/T168303#3661678 (10Marostegui) >>! In T168303#3661618, @jcrespo wrote: > There is now 1TB available,   Excellent news! Maybe this ticket can be closed then? }:-)  > but we have now a memory...
[17:01:58] <elukey>	 wow didn't see it
[17:02:34] <nuria_>	 elukey: ah, mnu just updated
[17:03:21] <elukey>	 not really sure why it is swapping now, but it must be related to the huge alter done
[17:04:32] <wikibugs_>	 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: dbstore1002 /srv filling up - https://phabricator.wikimedia.org/T168303#3661683 (10jcrespo) Don't want to do that without someone from analytics around. Swap usage seems to be going down, probably even faster when it catches up- so I will leave it as is...
[17:05:54] <wikibugs_>	 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: dbstore1002 /srv filling up - https://phabricator.wikimedia.org/T168303#3661684 (10Marostegui) >>! In T168303#3661683, @jcrespo wrote: > Don't want to do that without someone from analytics around. Swap usage seems to be going down, probably even faster...
[17:06:56] <DarTar>	 elukey, halfak: https://phabricator.wikimedia.org/T177521
[17:07:35] <wikibugs_>	 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: dbstore1002 /srv filling up - https://phabricator.wikimedia.org/T168303#3661704 (10elukey) Thanks a lot for all the work people, count me in tomorrow if you want to restart it. I agree with the plan, looks good :)
[17:14:56] <elukey>	 DarTar, halfak - commented in https://phabricator.wikimedia.org/T177521#3661737, let's wait for some feedback 
[17:16:21] <elukey>	 going afk now, but will re-check tomorrow!
[17:16:24] <elukey>	 nite a-team! 
[17:16:32] <mforns>	 bye elukey!
[17:16:46] <milimetric>	 o/
[17:31:28] <DarTar>	 thanks elukey 
[22:26:05] <wikibugs_>	 10Analytics, 10Proton, 10Readers-Web-Backlog, 10Patch-For-Review, 10Readers-Web-Kanban-Board: Implement Schema:Print purging strategy - https://phabricator.wikimedia.org/T175395#3662739 (10Tbayer) (Moving discussion back here [[https://gerrit.wikimedia.org/r/#/c/379829/ |from Gerrit]])  @mforns has voice...
[22:46:18] <AndyRussG>	 By chance were there any issues on the analytics cluster on Sept. 15 between 0 and 6 hrs UTC that might have impacted webrequest data moving into Hive?
[22:48:08] <wikibugs_>	 10Analytics, 10Patch-For-Review, 10User-bd808, 10cloud-services-team (Kanban): Remove logging from labs for schema https://meta.wikimedia.org/wiki/Schema:CommandInvocation - https://phabricator.wikimedia.org/T166712#3662768 (10Nuria) @bd808: sounds good, let us know when calls are no loger coming in and we...
[22:59:48] <wikibugs_>	 (03CR) 10Nuria: "Added some comments, have we tested this job runs?" (034 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/365517 (https://phabricator.wikimedia.org/T170764) (owner: 10Amire80)
[23:11:08] <nuria_>	 AndyRussG: I was looking at your FR ticket but your last comment does not quite add up, there is no data loss on our end (we monitor for that)
[23:13:00] <nuria_>	 AndyRussG: what do you see in September 15th? there are some varnish restarts that  I see on SAL but that's just about 
[23:14:21] <nuria_>	 AndyRussG: pageviews look stable: http://bit.ly/2z1bh9W
[23:30:38] <AndyRussG>	 hey nuria_ Mmm which ticket? There are a few...
[23:33:32] <wikibugs_>	 10Analytics-Tech-community-metrics, 10Developer-Relations (Oct-Dec 2017): Understand differences in results between "Top Authors" on Overview page vs. "Submitters" on Gerrit page - https://phabricator.wikimedia.org/T177566#3663034 (10Aklapper)
[23:36:12] <wikibugs_>	 10Analytics-Tech-community-metrics, 10Developer-Relations (Oct-Dec 2017): Understand differences in results between "Top Authors" on Overview page vs. "Submitters" on Gerrit page - https://phabricator.wikimedia.org/T177566#3663047 (10Aklapper) 05Open>03Resolved p:05Triage>03High Ah. Okay.  I went to ht...
[23:37:37] <AndyRussG>	 nuria_: the main issue is that donations rates seem OK, but impression rates (as queried via Druid) are all fluctuating on a daily basis and on average down
[23:37:49] <AndyRussG>	 Was just going to query directly via Hive
[23:37:53] <AndyRussG>	 instead of Druid
[23:38:32] <AndyRussG>	 We're by no means convinced it's an issue with an analytics pipeline, just thought I'd check to see if there anything obvious jumped out :)