[00:02:56] <grrrit-wm>	 (PS1) MaxSem: Fix metric name [analytics/discovery-stats] - https://gerrit.wikimedia.org/r/307457
[00:15:58] <grrrit-wm>	 (CR) Yurik: [C: 2] Fix metric name [analytics/discovery-stats] - https://gerrit.wikimedia.org/r/307457 (owner: MaxSem)
[00:16:08] <grrrit-wm>	 (CR) Yurik: [V: 2] Fix metric name [analytics/discovery-stats] - https://gerrit.wikimedia.org/r/307457 (owner: MaxSem)
[00:17:36] <grrrit-wm>	 (CR) Yurik: [C: 2 V: 2] Fix graphite port [analytics/discovery-stats] - https://gerrit.wikimedia.org/r/307449 (owner: MaxSem)
[00:18:40] <grrrit-wm>	 (CR) Yurik: [C: 2 V: 2] Refactor the logging command [analytics/discovery-stats] - https://gerrit.wikimedia.org/r/307448 (owner: MaxSem)
[00:19:37] <grrrit-wm>	 (CR) Yurik: [C: 2 V: 2] Remove unused uses [analytics/discovery-stats] - https://gerrit.wikimedia.org/r/307447 (owner: MaxSem)
[00:45:01] <grrrit-wm>	 (PS1) MaxSem: Merge branch 'master' into production [analytics/discovery-stats] - https://gerrit.wikimedia.org/r/307460
[01:12:18] <bawolff>	 Does Hadoop (Or some other analytics data stream) store info on what cookies people make requests with?
[06:10:49] <elukey>	 Just checked the oozie alarms
[06:11:09] <elukey>	 the number of dt:"-" has decreased a lot
[06:11:13] <elukey>	 but still present sometimes
[06:24:54] <elukey>	 ah and now an ERROR
[06:24:55] <elukey>	 sigh
[06:52:20] <joal>	 Hey elukey
[06:52:37] <joal>	 Can I help?
[06:53:41] <elukey>	 joal: o/
[06:53:57] <elukey>	 ah yes we need to run oozie with a more tolerant % of failures :(
[06:55:58] <elukey>	 I am investigating the new source of dt:"-"
[06:56:05] <elukey>	 that are a lot less than yesterday but still
[06:56:36] <elukey>	 I believe that we are now seeing some VSL timeouts that were covered before by the VSL store overflow errors
[06:56:55] <joal>	 hm
[06:57:20] <elukey>	 so now vk is able to keep more incomplete records in memory without dropping the oldest one periodically
[06:57:22] <joal>	 elukey: just checked the % of errors: we move between 0.5 and 5
[06:57:37] <elukey>	 and more of them are able to go to timeout :)
[06:58:02] <elukey>	 yesterday before the patch the errors were in the thousands per hour
[06:58:07] <elukey>	 now it is like hundreds
[06:58:21] <elukey>	 (I am talking per host errors)
[06:59:07] <joal>	 elukey: What strategy do we go for for refinery? 2 options: re-run only errors or restart a new coordinator for all?
[07:00:09] <elukey>	 joal: I thought that the only option was to run a more permissive coordinator
[07:00:15] <joal>	 elukey: Thanks A LOT for going with the VK stuff - It's really complex !
[07:00:53] <joal>	 elukey: correct, but I can either run a coord with 1 action, and run it every time we have an error, or restart a global new coord (as we did for misc)
[07:01:26] <elukey>	 ahhh okok! So a one time thing vs a more permanent one
[07:01:37] <joal>	 elukey: exactly
[07:01:42] <elukey>	 I'd say a one time coord run would be fine for the moment
[07:01:47] <elukey>	 I hope to solve this mess today
[07:02:15] <joal>	 k sir, I'll do that now
[07:02:35] <elukey>	 thanks for the sir :D
[07:02:59] * elukey thinks that the last time he made this kind of strong statement he ended up coding for a month
[07:03:20] <joal>	 huhuhu :D
[07:08:33] <joal>	 elukey: Job started
[07:10:11] <elukey>	 thanks!
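The "more tolerant % of failures" idea discussed above can be sketched roughly as follows. This is only an illustration, not the actual Oozie or refinery check: the function names, the record layout (dicts with a `dt` key) and the threshold values are all hypothetical, and the sample data is made up.

```python
def dt_missing_pct(records):
    """Percentage of records whose dt field is the placeholder "-"."""
    records = list(records)
    if not records:
        return 0.0
    missing = sum(1 for r in records if r.get("dt") == "-")
    return 100.0 * missing / len(records)


def within_tolerance(records, max_error_pct):
    """True if the share of dt:"-" records does not exceed the threshold."""
    return dt_missing_pct(records) <= max_error_pct


# Synthetic sample: 95 records with a timestamp, 5 with dt:"-".
sample = [{"dt": "2016-01-01T00:00:00"}] * 95 + [{"dt": "-"}] * 5

strict = within_tolerance(sample, max_error_pct=1.0)     # hypothetical strict threshold
tolerant = within_tolerance(sample, max_error_pct=10.0)  # hypothetical relaxed threshold
```

With error rates moving between 0.5% and 5% as reported above, a strict threshold would reject some hours that a more permissive one-off run would accept.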
[09:02:32] <elukey>	 https://gerrit.wikimedia.org/r/#/c/307483/ - created a patch to raise the timeout to 1500 seconds (now: 700)
[09:02:47] <elukey>	 I don't like this approach since it is not really data driven
[09:03:07] <elukey>	 but I don't have any data from varnish so I need to make guesses
[09:16:13] <wikibugs>	 Analytics-Tech-community-metrics, Developer-Relations: Migration to new Bitergia's development dashboard - https://phabricator.wikimedia.org/T137997#2593914 (Aklapper)
[09:16:15] <wikibugs>	 Analytics-Tech-community-metrics: Deployment of Demography panel - https://phabricator.wikimedia.org/T138757#2593912 (Aklapper) Open>Resolved This is now available as a panel on https://wikimedia.biterg.io
[09:16:17] <wikibugs>	 Analytics-Tech-community-metrics, Developer-Relations: Migration to new Bitergia's development dashboard - https://phabricator.wikimedia.org/T137997#2386336 (Aklapper)
[09:16:20] <wikibugs>	 Analytics-Tech-community-metrics: Deployment of Gerrit Delays panel for engineering - https://phabricator.wikimedia.org/T138752#2593915 (Aklapper) Open>Resolved This is now available as a panel on https://wikimedia.biterg.io
[09:16:21] <wikibugs>	 Analytics-Tech-community-metrics: Deployment of Mediawiki panels - https://phabricator.wikimedia.org/T138006#2593918 (Aklapper) Open>Resolved This is now available as a panel on https://wikimedia.biterg.io
[09:16:24] <wikibugs>	 Analytics-Tech-community-metrics, Developer-Relations: Migration to new Bitergia's development dashboard - https://phabricator.wikimedia.org/T137997#2386336 (Aklapper)
[09:16:26] <wikibugs>	 Analytics-Tech-community-metrics, Developer-Relations: Migration to new Bitergia's development dashboard - https://phabricator.wikimedia.org/T137997#2386336 (Aklapper)
[09:16:28] <wikibugs>	 Analytics-Tech-community-metrics, Developer-Relations: Migration to new Bitergia's development dashboard - https://phabricator.wikimedia.org/T137997#2386336 (Aklapper)
[09:16:29] <wikibugs>	 Analytics-Tech-community-metrics: Deployment of Mailing List panel - https://phabricator.wikimedia.org/T138001#2593921 (Aklapper) Open>Resolved This is now available as a panel on https://wikimedia.biterg.io
[09:16:32] <wikibugs>	 Analytics-Tech-community-metrics: Deployment of Gerrit Backlog panel for engineering - https://phabricator.wikimedia.org/T138000#2593924 (Aklapper) Open>Resolved This is now available as a panel on https://wikimedia.biterg.io
[09:16:34] <wikibugs>	 Analytics-Tech-community-metrics: Deployment of Gerrit (basic) panel - https://phabricator.wikimedia.org/T137999#2593926 (Aklapper) Open>Resolved This is now available as a panel on https://wikimedia.biterg.io
[09:16:36] <wikibugs>	 Analytics-Tech-community-metrics, Developer-Relations: Migration to new Bitergia's development dashboard - https://phabricator.wikimedia.org/T137997#2386336 (Aklapper)
[09:16:38] <wikibugs>	 Analytics-Tech-community-metrics: Deployment of Git panel - https://phabricator.wikimedia.org/T137998#2593930 (Aklapper) Open>Resolved This is now available as a panel on https://wikimedia.biterg.io
[09:16:40] <wikibugs>	 Analytics-Tech-community-metrics, Developer-Relations: Migration to new Bitergia's development dashboard - https://phabricator.wikimedia.org/T137997#2386336 (Aklapper)
[09:18:03] <wikibugs>	 Analytics-Tech-community-metrics: Deployment of IRC panel - https://phabricator.wikimedia.org/T138004#2386456 (Aklapper)
[09:18:05] <wikibugs>	 Analytics-Tech-community-metrics: IRC support to be added to GrimoireLab - https://phabricator.wikimedia.org/T138005#2593937 (Aklapper) Open>Resolved This is done.
[09:18:40] <wikibugs>	 Analytics-Tech-community-metrics: Deployment of Mediawiki panels - https://phabricator.wikimedia.org/T138006#2593945 (Aklapper)
[09:18:42] <wikibugs>	 Analytics-Tech-community-metrics, Developer-Relations (Jul-Sep-2016): Mediawiki support to be added to GrimoireLab - https://phabricator.wikimedia.org/T138007#2593943 (Aklapper) Open>Resolved This is done.
[09:52:35] <wikibugs>	 Analytics-Tech-community-metrics, Developer-Relations (Jul-Sep-2016), Security, Vuln-XSS: Potential XSS on korma.wmflabs.org - https://phabricator.wikimedia.org/T132966#2594002 (Aklapper) Open>Resolved This is deployed and resolved.
[09:52:38] <wikibugs>	 Analytics-Tech-community-metrics, Developer-Relations (Jul-Sep-2016), Security, Vuln-XSS: Potential XSS on korma.wmflabs.org - https://phabricator.wikimedia.org/T132966#2594004 (Aklapper)
[09:56:24] <wikibugs>	 Analytics, Labs: Put pageviews dataset in labs /public/dumps - https://phabricator.wikimedia.org/T142671#2594010 (Stigmj) Any chance for this to be expedited? At nowiki we are currently using these datasets for a local stats-service (https://tools.wmflabs.org/pagecount/) and the users are complaining (ht...
[09:59:15] <wikibugs>	 Analytics-Tech-community-metrics: Maniphest support to be added to GrimorieLab - https://phabricator.wikimedia.org/T138003#2594011 (Aklapper) p:Normal>High
[10:00:30] <wikibugs>	 Analytics-Tech-community-metrics: Mismatch between numbers for code merges per organization - https://phabricator.wikimedia.org/T129910#2594013 (Aklapper) p:Normal>Low a:Lcanasdiaz>None Moved to Backlog. Does not make sense to investigate in korma (legacy) hence lowering priority.
[10:00:44] <wikibugs>	 Analytics-Tech-community-metrics: korma: Mismatch between numbers for code merges per organization - https://phabricator.wikimedia.org/T129910#2594016 (Aklapper)
[10:05:50] <elukey>	 merged!
[10:19:23] <joal>	 taking a break a-team, see you in a bit
[10:57:14] <travis-ci>	 wikimedia/mediawiki-extensions-EventLogging#593 (wmf/1.28.0-wmf.17 - efc8c4d : Antoine Musso): The build has errored.
[10:57:14] <travis-ci>	 Change view : https://github.com/wikimedia/mediawiki-extensions-EventLogging/commit/efc8c4d0986d
[10:57:14] <travis-ci>	 Build details : https://travis-ci.org/wikimedia/mediawiki-extensions-EventLogging/builds/156186051
[11:52:16] <wikibugs>	 Analytics-Tech-community-metrics: korma: Profile names in UTF-8 incorrectly displayed as ??? - https://phabricator.wikimedia.org/T119540#2594097 (Aklapper) p:Low>Lowest
[11:53:57] <wikibugs>	 Analytics-Tech-community-metrics, Developer-Relations: Migration to new Bitergia's development dashboard - https://phabricator.wikimedia.org/T137997#2594100 (Aklapper)
[11:54:36] <wikibugs>	 Analytics-Tech-community-metrics, Developer-Relations: Migration to new Bitergia's development dashboard - https://phabricator.wikimedia.org/T137997#2386336 (Aklapper) p:Triage>Normal
[11:55:10] <wikibugs>	 Analytics-Tech-community-metrics: korma: Font used for "Organizations" header on contributors.html looks a bit out of place - https://phabricator.wikimedia.org/T100569#2594106 (Aklapper)
[11:58:42] <wikibugs>	 Analytics-Tech-community-metrics: korma: "Last 30 days" stats for specific mailing list display an account as one list item per username character - https://phabricator.wikimedia.org/T123927#2594109 (Aklapper) p:Low>Lowest
[11:59:27] <wikibugs>	 Analytics-Tech-community-metrics: korma: Clicking "Age of open changesets by Affiliation" explanation link / legend goes to top of page - https://phabricator.wikimedia.org/T110874#2594111 (Aklapper)
[11:59:42] <wikibugs>	 Analytics-Tech-community-metrics: korma: Time axis on repository.html only displays two months, repeated several items - https://phabricator.wikimedia.org/T115872#2594112 (Aklapper)
[11:59:56] <wikibugs>	 Analytics-Tech-community-metrics: korma: Empty "subject" and "creator" fields for mailing list thread on mls.html - https://phabricator.wikimedia.org/T116284#2594113 (Aklapper)
[12:00:11] <wikibugs>	 Analytics-Tech-community-metrics: korma: Illegible overlapping tables on narrow screens due to CSS - https://phabricator.wikimedia.org/T97115#2594115 (Aklapper)
[12:00:24] <wikibugs>	 Analytics-Tech-community-metrics, JavaScript: korma: Failed to load resource: the server responded with a status of 404 (Not Found) - https://phabricator.wikimedia.org/T65061#2594116 (Aklapper)
[12:00:26] <wikibugs>	 Analytics-Tech-community-metrics, JavaScript: korma: Syntax error, unrecognized expression on Korma profiles - https://phabricator.wikimedia.org/T126325#2594117 (Aklapper)
[12:01:29] <wikibugs>	 Analytics-Tech-community-metrics: korma: Panel for "Wiki revisions" on people.html does not provide 2016 data - https://phabricator.wikimedia.org/T141228#2594123 (Aklapper) p:Triage>Lowest
[12:07:50] <wikibugs>	 Analytics-Tech-community-metrics, Developer-Relations (Jul-Sep-2016): Create basic/high-level Kibana (dashboard) documentation - https://phabricator.wikimedia.org/T132323#2594129 (Aklapper)
[12:18:56] <wikibugs>	 Analytics-Tech-community-metrics, Developer-Relations (Jul-Sep-2016): Identify Wikimedia's most important/used info panels in korma.wmflabs.org - https://phabricator.wikimedia.org/T132421#2594147 (Aklapper)
[12:53:03] <wikibugs>	 Analytics-Kanban, Labs, Patch-For-Review: Put pageviews dataset in labs /public/dumps - https://phabricator.wikimedia.org/T142671#2594241 (Ottomata) p:Triage>High a:Ottomata
[12:54:43] <elukey>	 HIIII ottomata
[12:59:31] <ottomata>	 hiIi
[13:01:42] <halfak>	 o/ joal
[13:01:46] <halfak>	 Live systems chatting?
[13:44:32] <halfak|Mobile>	 o/ joal and millimetric
[13:44:41] <halfak|Mobile>	 Hard crash. Trying to recover
[13:44:49] <joal>	 good luck halfak|Mobile !
[13:48:27] <wikibugs>	 Analytics-Kanban: AQS Cassandra READ timeouts caused an increase of 503s - https://phabricator.wikimedia.org/T143873#2594422 (elukey) Next step that I'd like to do is compare the variation in traffic from the 17th onwards, to see if different queries started to land on AQS.
[14:10:25] <elukey>	 joal: do you have time for a naive question about where to find aqs logs?
[14:10:32] <joal>	 sure
[14:10:42] <joal>	 elukey: I think you know more than me ;)
[14:12:20] <elukey>	 no no like webrequest logs :P
[14:12:24] <joal>	 ah ok :)
[14:12:51] <elukey>	 can I find them via beeline with something like
[14:12:53] <elukey>	 FROM wmf_raw.webrequest
[14:12:53] <elukey>	 WHERE uri_host = "wikimedia.org" AND uri_query like '%/api/rest_v1/metrics/pageviews/per-article%'
[14:13:05] <elukey>	 or am I completely out of track?
[14:13:27] <joal>	 sounds correct, but why wmf_raw instead of wmf?
[14:14:34] <joal>	 also, if you want to do traffic analysis, look at wmf.aqs_hourly
[14:14:44] <joal>	 elukey: --^
[14:15:21] <elukey>	 mmm I thought that wmf was refined, so I preferred as much info as possible.. aaaand I didn't know about wmf.aqs_hourly :D
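A minimal sketch of how the query fragment above might be completed against the refined `wmf.webrequest` table. Several things here are assumptions not confirmed in this log: the helper name `aqs_per_article_query` is hypothetical, the `year`/`month`/`day`/`hour` partition columns are assumed, and the filter is placed on `uri_path` (the chat version matched on `uri_query`) on the assumption that the request path is stored there.

```python
def aqs_per_article_query(year, month, day, hour, limit=100):
    """Build a HiveQL string for per-article pageview API requests.

    Restricts to a single hour partition and adds a LIMIT so the
    result stays small. Column and partition names are assumptions.
    """
    return (
        "SELECT uri_host, uri_path, uri_query\n"
        "FROM wmf.webrequest\n"
        f"WHERE year = {int(year)} AND month = {int(month)}"
        f" AND day = {int(day)} AND hour = {int(hour)}\n"
        "  AND uri_host = 'wikimedia.org'\n"
        "  AND uri_path LIKE '/api/rest_v1/metrics/pageviews/per-article%'\n"
        f"LIMIT {int(limit)}"
    )


# Placeholder partition values, chosen only for illustration.
query = aqs_per_article_query(2016, 1, 1, 0, limit=10)
```

The resulting string could be saved to a file and run through beeline or `hive -f`; for aggregate traffic analysis the smaller `wmf.aqs_hourly` table is probably the better starting point.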
[14:18:53] <grrrit-wm>	 (CR) Mforns: [C: 2 V: 2] "LGTM Thanks a lot for doing that!" [analytics/reportupdater] - https://gerrit.wikimedia.org/r/307112 (https://phabricator.wikimedia.org/T144119) (owner: Hashar)
[14:21:28] <grrrit-wm>	 (PS3) Mforns: Support passing the exploded values by file path [analytics/reportupdater] - https://gerrit.wikimedia.org/r/306966 (https://phabricator.wikimedia.org/T132481)
[14:21:49] <wikibugs>	 Analytics-Kanban: AQS Cassandra READ timeouts caused an increase of 503s - https://phabricator.wikimedia.org/T143873#2594536 (elukey) The last outage that we registered on the 26th was related to a OOM:  Dates in UTC:  ``` 03:09  <icinga-wm> PROBLEM - cassandra CQL 10.64.48.117:9042 on aqs1003 is CRITICAL: C...
[14:23:56] <joal>	 ottomata: Hi !
[14:24:05] <ottomata>	 hiya!
[14:24:11] <joal>	 ottomata: Do we go for zookeeper change for Druid ?
[14:24:28] <ottomata>	 OH!
[14:24:37] <ottomata>	 yes let's do it, i'm on the phone, and in the middle of a thought
[14:24:45] <ottomata>	 so hmm, gimme a few mins, hopefully we can start before standup
[14:24:56] <joal>	 ottomata: you tell me when you're ready :)
[14:25:40] <hashar>	 mforns: thanks :]
[14:25:47] <hashar>	 mforns: will add CI conf later on
[14:26:19] <hashar>	 actually right now since I have found the patch :D
[14:26:55] <mforns>	 hashar, thank you!
[14:29:46] <joal>	 elukey: does wmf.aqs_hourly contain what you need?
[14:29:47] <grrrit-wm>	 (CR) Hashar: "recheck" [analytics/reportupdater] - https://gerrit.wikimedia.org/r/307112 (https://phabricator.wikimedia.org/T144119) (owner: Hashar)
[14:30:00] <joal>	 nuria_: hello
[14:30:41] <joal>	 nuria_: I added a line in the cassandra-backfilling etherpad to double check with you that month 2015-11 has been loaded (it has no line in the doc)
[14:31:05] <wikibugs>	 Analytics, Continuous-Integration-Config, Patch-For-Review: Add test runner and CI configuration to analytics/reportupdater - https://phabricator.wikimedia.org/T144119#2594575 (hashar) All good. Thank you @mforns
[14:31:24] <hashar>	 mforns: it is all set now. To reproduce CI run, you should be able to just run "tox" on your local machine
[14:31:30] <hashar>	 should give you the same environment
[14:31:44] <mforns>	 hashar, yes it works! already tried it :]
[14:32:11] <mforns>	 awesome
[14:32:11] <elukey>	 joal: it is like candy land for me
[14:32:18] <joal>	 elukey: ;)
[14:32:19] <elukey>	 thanks :D
[14:32:30] <joal>	 elukey: if you want I have spark scripts to help
[14:32:33] <joal>	 elukey: let me know :)
[14:32:51] <wikibugs>	 Analytics, Continuous-Integration-Config, Patch-For-Review: Add test runner and CI configuration to analytics/reportupdater - https://phabricator.wikimedia.org/T144119#2594581 (hashar) Open>Resolved
[14:34:27] <elukey>	 joal: I'd be really happy to look at them with you during the next days, I didn't know that I had so much data available :(
[14:34:40] <elukey>	 I feel super lazy for not having checked it before
[14:34:59] <joal>	 elukey: https://gist.github.com/anonymous/d7ac82770f52d8eaa71edead8f9e3a28
[14:35:16] <joal>	 elukey: we can go through that whenever you want, I think it can help :)
[14:35:51] <elukey>	 wow really nice!
[14:39:00] <milimetric>	 joal / mforns: I'm just catching up on email, we can talk anytime you two want (oh man, it's almost standup!)
[14:39:35] <joal>	 milimetric / mforns : I have time now if you want, or after standup :)
[14:39:47] <milimetric>	 sure, batcave!
[14:39:50] <mforns>	 ok
[14:40:58] <ottomata>	 ok joal ah you busy now?
[14:41:09] <ottomata>	 so, i'm just going to merge the puppet patch, run puppet, and then restart druid, ja?
[14:41:14] <ottomata>	 this will set up a zk cluster on the druid nodes
[14:41:17] <ottomata>	 and point the druid configs at it
[14:41:29] <joal>	 ottomata: I think that's right
[14:41:47] <joal>	 ottomata: Since everything is stored in deep storage, should be ok
[14:41:50] <ottomata>	 k
[14:42:16] <joal>	 ottomata: I'll still triple check
[14:54:10] <nuria_>	 joal: corrected etherpad, loading 0003741-160826130408204-oozie-oozi-C was for 2015-11
[14:54:24] <joal>	 thx nuria_ !
[14:57:08] <nuria_>	 elukey: where can i see the logs that had the memory OOM for cassandra?
[15:00:07] <elukey>	 nuria_: /var/log/cassandra/ and then system.log.1.zip IIRC
[15:00:23] <elukey>	 we are using Xmx 16Gb
[15:00:27] <elukey>	 that is huuuge
[15:01:06] <grrrit-wm>	 (PS3) Milimetric: [WIP] Script sqooping mediawiki tables into hdfs [analytics/refinery] - https://gerrit.wikimedia.org/r/306292 (https://phabricator.wikimedia.org/T141476)
[15:01:16] <nuria_>	 ottomata: standddupppp?
[15:02:09] <nuria_>	 ottomata: holaaa
[15:03:17] <wikibugs>	 Analytics-Kanban, Analytics-Wikimetrics: Stop vital signs metric creation on wikimetrics - https://phabricator.wikimedia.org/T143715#2594681 (Nuria) a:Nuria
[15:03:34] <ottomata>	 AHHHH
[15:03:35] <ottomata>	 sorrry
[15:08:37] <wikibugs>	 Analytics, Analytics-EventLogging, DBA, Research-and-Data: Queries on PageContentSaveComplete are starting to pileup - https://phabricator.wikimedia.org/T144278#2594696 (jcrespo)
[15:43:25] <wikibugs>	 Analytics-Kanban, Labs, Patch-For-Review: Put pageviews dataset in labs /public/dumps - https://phabricator.wikimedia.org/T142671#2594755 (Ottomata) Done!  I'm running the first rsync over now.  I didn't create a specific rsync module, it seems like the ::dumps one should be enough.  @Stigmj, I don't...
[15:47:46] <elukey>	 mmmm ottomata why was puppet failing for zk on 100[12] ?
[15:47:53] <elukey>	 (curious)
[15:48:53] <ottomata>	 elukey:  that is druid 1001, etc.
[15:48:58] <ottomata>	 elukey:  not sure yet, just started to ask  _joe_ a q  in mw sec
[15:49:17] <ottomata>	 but, it looks like the host specific override i have in hiera is not being used
[15:49:25] <elukey>	 ah okok
[15:59:47] <elukey>	 joal: ops sync?
[15:59:50] <joal>	 sure
[16:00:32] <mforns>	 milimetric, are you planning on continuing with denormalized table today?
[16:03:36] <milimetric>	 mforns: no, I think I will start writing spark for it
[16:03:43] <milimetric>	 or at least thinking of an algorithm that would be more efficient
[16:03:50] <milimetric>	 drawing on my chalkboard :)
[16:03:56] <milimetric>	 wanna do it together? cc joal
[16:04:03] <milimetric>	 (right now I'm just updating the calendar)
[16:04:37] <mforns>	 milimetric, sure
[16:05:56] <mforns>	 milimetric, in one of the patches of the page/user scala code there's this method I ended up deleting, that historifies the admin user name
[16:07:07] <mforns>	 milimetric, this might be similar to what we want to do... dunno
[16:07:19] * mforns looks
[16:08:11] <milimetric>	 a-team: I think I cleaned up the ops-duty events.  I tried to adjust them to everyone's work schedule (later for nuria earlier for joal).  Feel free to move them yourselves, the events are all editable by everyone
[16:08:33] <joal>	 thx a lot milimetric !
[16:08:33] <mforns>	 milimetric, looks perfect for me, thanks!
[16:09:07] <nuria_>	 milimetric: super thanks
[16:09:17] <milimetric>	 mforns: a-batcave-2 to look together?
[16:09:46] <wikibugs>	 Analytics-Kanban, Labs, Patch-For-Review: Put pageviews dataset in labs /public/dumps - https://phabricator.wikimedia.org/T142671#2594863 (Nuria) Open>Resolved
[16:09:58] <wikibugs>	 Analytics-Kanban, Analytics-Wikimetrics: Stop vital signs metric creation on wikimetrics - https://phabricator.wikimedia.org/T143715#2594864 (Nuria) Open>Resolved
[16:10:29] <mforns>	 milimetric, omw
[16:10:43] <nuria_>	 elukey: i can look at traffic patterns on aqs today if you want
[16:12:16] <joal>	 nuria_: if you do that, there is a spark script I wrote that can help : https://gist.github.com/anonymous/d7ac82770f52d8eaa71edead8f9e3a28
[16:12:35] <nuria_>	 joal: i was planning on borrowing/stealing that one, yes
[16:12:43] <joal>	 nuria_: great ;)
[16:12:53] <joal>	 nuria_: I was not sure you'd seen that
[16:13:18] <nuria_>	 joal: will let you know what i found cc elukey
[16:13:36] <joal>	 great, thx nuria_
[16:18:10] <elukey>	 nuria_: please do, thanks a lot :)
[16:30:24] <wikibugs>	 Analytics-Kanban, Labs, Patch-For-Review: Put pageviews dataset in labs /public/dumps - https://phabricator.wikimedia.org/T142671#2594927 (Stigmj) @Ottomata yeah they are there. Thank you.
[16:31:43] <elukey>	 nuria_: would it be possible to move the hw meeting from Monday? (ops meeting colliding)
[16:31:55] <nuria_>	 elukey: ah yes, let me allow edits
[16:32:13] <nuria_>	 elukey: done
[16:36:27] <elukey>	 thanks!
[16:53:41] <icinga-wm>	 PROBLEM - YARN NodeManager Node-State on analytics1039 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[16:56:21] <icinga-wm>	 RECOVERY - YARN NodeManager Node-State on analytics1039 is OK: OK: YARN NodeManager analytics1039.eqiad.wmnet:8041 Node-State: RUNNING
[17:04:53] <ottomata>	 joal:  FYI druid restarted with new zk cluster
[17:04:58] <ottomata>	 i think it looks ok
[17:05:43] <joal>	 ottomata: Just looked through pivot: no change for me !
[17:06:08] <joal>	 by the way a-team, no more chrome41 issue
[17:06:13] <elukey>	 going afk team! byyyeeee
[17:06:22] <joal>	 ottomata: I'm gonna double check using coordinator UI
[17:06:26] <joal>	 Bye elukey !
[17:06:27] <ottomata>	 k cool
[17:06:28] <milimetric>	 cool
[17:06:33] <ottomata>	 i see lots of segments being loaded i think
[17:06:42] <joal>	 ottomata: Great :)
[17:06:46] <joal>	 That's good sign !
[17:06:59] <joal>	 milimetric: no more issue since aug 17th
[17:08:18] <joal>	 ottomata: Shall we remove the pageviews datasource (it was test) ?
[17:08:21] <joal>	 milimetric: --^
[17:12:07] <ottomata>	 joal:  uhhhh, up to yall
[17:12:41] <joal>	 k ottomata :)
[17:12:54] <joal>	 ottomata: Thanks for the zookeeper thing
[17:21:27] <ottomata>	 yup, np!
[17:21:28] <ottomata>	 lunchtime!
[17:32:54] <nuria_>	 joal: the chrome issue went away with teh changes brandon did to send "unauthorized"
[17:33:06] <nuria_>	 to those requests , it stopped teh spam
[17:33:08] <nuria_>	 *the
[17:33:45] <joal>	 makes sense nuria_, didn't follow the ticket (my bad)
[17:34:13] <nuria_>	 joal: np at all, it took several tries
[17:37:06] <milimetric>	 joal: yeah, we can remove that, sorry was distracted
[17:37:12] <joal>	 np milimetric
[17:39:37] <wikibugs>	 Analytics, Analytics-EventLogging, DBA, Research-and-Data: Queries on PageContentSaveComplete are starting to pileup - https://phabricator.wikimedia.org/T144278#2594696 (DarTar) @jcrespo found the culprit. Let me talk to a few people but I think this is a legacy job that we can disable.
[17:40:01] <wikibugs>	 Analytics, Analytics-EventLogging, DBA, Research-and-Data: Queries on PageContentSaveComplete are starting to pileup - https://phabricator.wikimedia.org/T144278#2595274 (DarTar) p:Triage>Normal a:DarTar
[17:45:29] <joal>	 !log Drop pageviews test datasource in druid
[17:45:31] <analytics-logbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master
[17:47:26] <wikibugs>	 Analytics, Analytics-EventLogging, DBA, Research-and-Data: Queries on PageContentSaveComplete are starting to pileup - https://phabricator.wikimedia.org/T144278#2595287 (DarTar) @Neil_P._Quinn_WMF @Jdforrester-WMF @Milimetric this is one of the legacy scripts populating dashboards such as http://...
[17:49:35] <joal>	 logging off a-team, bye !
[17:58:18] <grrrit-wm>	 (CR) MaxSem: [C: 2 V: 2] Merge branch 'master' into production [analytics/discovery-stats] - https://gerrit.wikimedia.org/r/307460 (owner: MaxSem)
[18:19:07] <wikibugs>	 Analytics, Analytics-EventLogging, DBA, Research-and-Data: Queries on PageContentSaveComplete are starting to pileup - https://phabricator.wikimedia.org/T144278#2595384 (Jdforrester-WMF) Yes, we want to migrate them to Dashiki. CCing @HJiang-WMF who's working on the broader objective.
[18:23:24] <wikibugs>	 Analytics-Kanban: Capacity projections of pageview API document on wikitech - https://phabricator.wikimedia.org/T138318#2595390 (Nuria) https://wikitech.wikimedia.org/wiki/Analytics/AQS#Capacity_Projections
[18:30:53] <wikibugs>	 Analytics: Dashboards working on mobile - https://phabricator.wikimedia.org/T144299#2595396 (Nuria)
[18:31:07] <wikibugs>	 Analytics, Analytics-Dashiki: Dashboards working on mobile - https://phabricator.wikimedia.org/T144299#2595408 (Nuria)
[18:42:44] <nuria_>	 jdlrobson: did you get your question answered? hive -f blah.sql > out.txt does what you want
[18:42:56] <nuria_>	 jdlrobson: https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hive/Queries
[18:44:15] <grrrit-wm>	 (PS3) Mforns: Disable the deprecated option by_wiki [analytics/reportupdater] - https://gerrit.wikimedia.org/r/306968 (https://phabricator.wikimedia.org/T132481)
[18:45:05] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] Disable the deprecated option by_wiki [analytics/reportupdater] - https://gerrit.wikimedia.org/r/306968 (https://phabricator.wikimedia.org/T132481) (owner: Mforns)
[18:54:02] <wikibugs>	 Analytics, Beta-Cluster-Infrastructure, Services, scap, and 3 others: Set up AQS in Beta - https://phabricator.wikimedia.org/T116206#1743135 (bd808) >>! In T116206#2582429, @elukey wrote: > Thanks for reporting, this is my bad since analytics_hadoop_hosts is not in hiera labs. Since this value sh...
[19:07:35] <nuria_>	 ottomata: yt?
[19:10:02] <ottomata>	 hi yup
[19:10:04] <ottomata>	 nuria_:
[19:10:30] <nuria_>	 ottomata: wondering about this gist that joseph passed along: https://gist.github.com/anonymous/d7ac82770f52d8eaa71edead8f9e3a28
[19:10:52] <nuria_>	 ottomata: in order to look at the data
[19:11:27] <nuria_>	 i added couple lines:
[19:11:29] <nuria_>	 ottomata:
[19:11:40] <nuria_>	 https://www.irccloud.com/pastebin/BXWhZsza/
[19:12:12] <ottomata>	 nuria_:  ok, not familiar with it, but what's up?
[19:12:22] <nuria_>	 ottomata: but executing this gives tons of errors on spark shell
[19:12:26] <nuria_>	 https://www.irccloud.com/pastebin/rESGWdaq/
[19:12:51] <ottomata>	 how did you launch spark shell?
[19:13:43] <nuria_>	 ./spark-hell
[19:13:45] <nuria_>	 ./spark-shell
[19:13:46] <nuria_>	 jajaj
[19:13:50] <nuria_>	 ^ ottomata
[19:13:56] <nuria_>	 do i need to add jars?
[19:14:23] <ottomata>	 hm, no, just wondering about that requesting executor thing
[19:14:39] <ottomata>	 nuria_:  those are all warnings though
[19:14:41] <ottomata>	 does it not work?
[19:14:53] <nuria_>	 ottomata: well there are pages & pages & pages of warnings
[19:15:10] <ottomata>	 nuria_:  i am not sure. hm
[19:15:22] <joal>	 nuria_: I'm here for a minute
[19:15:28] <nuria_>	 ottomata: ok
[19:15:28] <wikibugs>	 Analytics, Beta-Cluster-Infrastructure, Services, scap, and 3 others: Set up AQS in Beta - https://phabricator.wikimedia.org/T116206#2595633 (bd808)
[19:15:29] <ottomata>	 this one sounds worrisome
[19:15:30] <ottomata>	 WARN ExecutorAllocationManager: Unable to reach the cluster manager to request 25 total executors!
[19:15:36] <ottomata>	 but the others sound harmless
[19:15:49] <joal>	 nuria_: try launching spark-shell using --master yarn
[19:15:56] <ottomata>	 oh ja :)
[19:15:59] <nuria_>	 joal: ahahahaha
[19:16:32] <jdlrobson>	 nuria_: hey i sort of got the answer. The problem I have is I'm working with too many rows. I think I need to sample it somehow.
[19:16:49] <nuria_>	 joal : me forgot all about spark
[19:17:02] <joal>	 nuria_: also, to prevent having to kill the shell, use LIMIT in your select * query, you'll have reasonable result size :)
[19:17:26] <jdlrobson>	 (haven't worked out best way to do that yet)
[19:17:30] <nuria_>	 jdlrobson: amounts of data are huge, you need to query like this: https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hive/Queries#Always_restrict_queries_to_a_date_range_.28partitioning.29
[19:17:44] <joal>	 nuria_: or, use results.take(10).foreach(println)
[19:17:45] <jdlrobson>	 yup that's what i figured
[19:18:15] <nuria_>	 joal: ya, i was using limit but errors were all over the place but hey, cause i was running locally
[19:18:32] <joal>	 I'll wait in case you have another error :)
[19:20:03] <wikibugs>	 Quarry: Forking your own query results in a new one owned by YuviPanda - https://phabricator.wikimedia.org/T144309#2595654 (Huji)
[19:22:39] <joal>	 nuria_: works better?
[19:22:40] <nuria_>	 joal: nah, do not worry, we will figure it out. it is running now
[19:22:45] <nuria_>	 joal: yes
[19:22:46] <joal>	 ok great :)
[19:22:50] <joal>	 tomorrow :)
[19:23:04] <nuria_>	 ciao
[19:37:49] <nuria_>	 ottomata: i take it back it no work
[19:37:53] <nuria_>	 https://www.irccloud.com/pastebin/PaXKedzK/
[19:39:25] <nuria_>	 ottomata: will look around to see if there is a setting we can use
[19:40:58] <ottomata>	 nuria_:  have you been looking at the spark job UI too?
[19:41:06] <nuria_>	 ottomata: ah no
[19:41:13] <nuria_>	 ottomata: in yarn?
[19:41:32] <ottomata>	 ja
[19:41:42] <ottomata>	 ssh -N stat1002.eqiad.wmnet -L 8088:analytics1001.eqiad.wmnet:8088
[19:41:43] <ottomata>	 then
[19:41:50] <ottomata>	 localhost:8088
[19:41:51] <ottomata>	 find your job
[19:41:54] <ottomata>	 click on application master
[19:42:00] <ottomata>	 you might have to change the uri back to localhost:8088 a few times
[19:42:04] <ottomata>	 but ja
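The tunnel steps above can be captured in a small helper that only builds the `ssh` command line; it does not run anything. The function name `yarn_ui_tunnel_command` and its parameters are hypothetical; the host names and port are the ones quoted in the chat.

```python
import shlex


def yarn_ui_tunnel_command(
    bastion="stat1002.eqiad.wmnet",      # host you can ssh into (from the chat)
    target="analytics1001.eqiad.wmnet",  # host serving the YARN UI (from the chat)
    port=8088,
    local_port=8088,
):
    """Return the ssh argv that forwards local_port to target:port via bastion."""
    return ["ssh", "-N", bastion, "-L", f"{local_port}:{target}:{port}"]


# Render the command as a copy-pasteable shell string.
command_line = shlex.join(yarn_ui_tunnel_command())
print(command_line)
```

Once the tunnel is up, the YARN UI is reachable at localhost:8088; links inside the UI may point back at the cluster host name and need to be rewritten to localhost:8088 by hand, as noted above.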
[20:35:25] <grrrit-wm>	 (CR) Nuria: [C: -1] Bookmark for browser dashboard regarding graph and time (2 comments) [analytics/dashiki] - https://gerrit.wikimedia.org/r/306980 (https://phabricator.wikimedia.org/T143689) (owner: Nuria)
[20:57:01] <grrrit-wm>	 (PS1) MaxSem: Set permissions like mediawiki-config [analytics/discovery-stats] (refs/meta/config) - https://gerrit.wikimedia.org/r/307583
[20:59:33] <wikibugs>	 Quarry: Forking your own query results in a new one owned by YuviPanda - https://phabricator.wikimedia.org/T144309#2595935 (yuvipanda) whoops :( try again?
[21:56:49] <wikibugs>	 Analytics-Cluster, Operations, Patch-For-Review: Migrate titanium to jessie (archiva.wikimedia.org upgrade) - https://phabricator.wikimedia.org/T123725#2596103 (Dzahn) I have rsynced the entire /var/lib/archiva from titanium over to meitnerium, the new jessie server.  One single file, the conf/archiv...
[22:02:05] <milimetric>	 nuria_: you around?
[22:03:01] <wikibugs>	 Analytics-Dashiki, Analytics-Kanban: Sort tabs layout alphabetically - https://phabricator.wikimedia.org/T144322#2596141 (Milimetric)
[22:04:38] <grrrit-wm>	 (PS1) Milimetric: Sort legend alphabetically [analytics/dashiki] - https://gerrit.wikimedia.org/r/307634 (https://phabricator.wikimedia.org/T144322)
[22:07:15] <wikibugs>	 Analytics, Analytics-EventLogging, DBA, Research-and-Data: Queries on PageContentSaveComplete are starting to pileup - https://phabricator.wikimedia.org/T144278#2596173 (Milimetric) @HJiang-WMF: happy to help you migrate these to reportupdater / dashiki.  Actually, VERY happy to do that, because...
[22:30:30] <wikibugs>	 Analytics-Cluster, Operations, Patch-For-Review: Migrate titanium to jessie (archiva.wikimedia.org upgrade) - https://phabricator.wikimedia.org/T123725#2596235 (Dzahn) Now we have the data but still get an Error 503 - Service Unavailable from the new server, even though the archiva service is running...
[22:33:18] <wikibugs>	 Analytics-Cluster, Operations, Patch-For-Review: Migrate titanium to jessie (archiva.wikimedia.org upgrade) - https://phabricator.wikimedia.org/T123725#2596244 (Dzahn) the issue is caused by archiva user being a different UID on old and new server, which means permissions are messed up even when we p...
[22:46:45] <wikibugs>	 Analytics, Analytics-EventLogging, DBA, Research-and-Data: Queries on PageContentSaveComplete are starting to pileup - https://phabricator.wikimedia.org/T144278#2596251 (DarTar) Same here.  @HJiang-WMF @Milimetric: happy to help as needed, going over the SQL I used for the dashboards etc.  I didn...
[22:50:06] <bearloga>	 howdy! Trying here alongside #wikimedia-operations. I have a question about https://wikitech.wikimedia.org/wiki/Analytics/Data_access#Access_Groups. The "analytics-users" group has access to Hadoop/Hive on stat1004, but "(NO PRIVATE DATA)". what is meant by that? can someone have access to hadoop/hive but not private data (e.g. a sanitized, PII-less subset
[22:50:06] <bearloga>	 of wmf.webrequest)?
[22:50:52] <wikibugs>	 Analytics-Cluster, Operations, Patch-For-Review: Migrate titanium to jessie (archiva.wikimedia.org upgrade) - https://phabricator.wikimedia.org/T123725#2596254 (Dzahn) fix running:  root@meitnerium:/var/lib# find /var/lib/archiva/ -uid 108 -exec chown archiva:archiva {} \;
[22:51:17] <wikibugs>	 Analytics-Cluster, Operations: Migrate titanium to jessie (archiva.wikimedia.org upgrade) - https://phabricator.wikimedia.org/T123725#2596255 (Dzahn)
[22:56:02] <wikibugs>	 Analytics-Cluster, Operations: Migrate titanium to jessie (archiva.wikimedia.org upgrade) - https://phabricator.wikimedia.org/T123725#2596268 (Dzahn) fixed, restarted service. i got the Archiva web UI now on meitnerium (when hacking my /etc/resolv.conf to point archiva.wm.org to it).
[23:24:29] <grrrit-wm>	 (CR) EBernhardson: [C: 2 V: 2] Set permissions like mediawiki-config [analytics/discovery-stats] (refs/meta/config) - https://gerrit.wikimedia.org/r/307583 (owner: MaxSem)
[23:55:53] <wikibugs>	 Analytics: Can't log into https://piwik.wikimedia.org/ - https://phabricator.wikimedia.org/T144326#2596371 (Tbayer)