[01:51:03] Analytics-Tech-community-metrics, DevRel-January-2016: Statistics for SCM project 'core' mix pywikibot/core, mediawiki/core and oojs/core - https://phabricator.wikimedia.org/T123808#1946757 (jayvdb) >>! In T123808#1945858, @Aklapper wrote: > @jayvdb, Thanks for finding this and raising this! And I think y... [04:10:38] Analytics-Tech-community-metrics, pywikibot-core, DevRel-January-2016: Statistics for SCM project 'core' mix pywikibot/core, mediawiki/core and oojs/core - https://phabricator.wikimedia.org/T123808#1946980 (jayvdb) [05:07:34] (PS1) Milimetric: Update hostnames to analytics-store [analytics/geowiki] - https://gerrit.wikimedia.org/r/265213 [05:08:03] (CR) jenkins-bot: [V: -1] Update hostnames to analytics-store [analytics/geowiki] - https://gerrit.wikimedia.org/r/265213 (owner: Milimetric) [05:10:01] (CR) Milimetric: Update hostnames to analytics-store (1 comment) [analytics/geowiki] - https://gerrit.wikimedia.org/r/265213 (owner: Milimetric) [06:24:26] o/ [07:36:51] Analytics-EventLogging, MediaWiki-API, Easy, Google-Code-In-2015, Patch-For-Review: ApiJsonSchema implements ApiBase::getCustomPrinter for no good reason - https://phabricator.wikimedia.org/T91454#1947125 (Florian) >>! In T91454#1946228, @greg wrote: > Well, we missed today's branching, but othe... [08:39:55] quick question for you guys: do we need to set any particular git config for https://github.com/wikimedia/operations-puppet ? [08:40:14] I am trying to send a code review with git review but it doesn't get my ssh cert [08:40:27] * elukey is probably doing something terribly wrong [08:57:55] * elukey should have used ssh:// 30 minutes ago probably [08:57:58] https://gerrit.wikimedia.org/r/#/c/265227/ :) [09:01:02] So the patch could be discared, I wanted to try the workflow [09:01:13] but it might be useful for the newbies like me [09:28:07] Hi elukey [09:28:22] Have you managed to git review ? [09:41:50] yep already merged by Yuvi :) [09:42:16] I don't have palladium's access so I couldn't do puppet apply [09:53:32] k [09:55:49] all right going out for ~2hrs, talk with you later! [09:58:24] laters ! [12:05:16] back! :) [12:07:23] joal: do you have a minute for https://phabricator.wikimedia.org/T123942 ? I am trying to figure out one thing [12:46:15] Analytics-Kanban, Patch-For-Review: Burrow should be restarted automatically when config changes - https://phabricator.wikimedia.org/T123942#1947617 (elukey) Tentative for a patch: https://gerrit.wikimedia.org/r/#/c/265246/ [12:46:29] Analytics-Kanban, Patch-For-Review: Burrow should be restarted automatically when config changes - https://phabricator.wikimedia.org/T123942#1947619 (elukey) p:Triage>Normal [12:46:55] ottomata: https://gerrit.wikimedia.org/r/265246 - if you have time :) [12:48:05] I created a krypton-test instance in labs, reading the docs this seems to be the next step after the change has been reviewed and before the merge to prod [12:48:05] why the hells do I have to be awake [12:48:52] Ironholds: too early? [12:49:20] well, that, but also I've been continuously awake foor..22 hours. [12:49:57] ah ok now I got your question :D [13:10:29] Hi elukey, sorry I was away for lunch [13:14:43] hey joal! 
Nothing super important, I just wanted to ask you some things before sending the CR but I proceeded anyway :) [13:14:57] okayy [13:15:19] Also, I'm not very burrow knowledgeable, so you probably will teach some ) [13:19:23] I've read about it this morning :d [13:28:26] elukey, update [13:28:28] the plane is tomorrow [13:28:44] I have been awake for 22 hours to guarantee I will be asleep on a plane that does not take off for another 26 hours [13:28:51] * Ironholds hurls everything into the sun [13:30:53] Ironholds: what's the plan? Can we bet on your next 26 hours? :D [13:31:21] joal: https://github.com/linkedin/Burrow/wiki/Consumer-Lag-Evaluation-Rules - this seems nice! Still need to understand it fully, but it seems well done [13:34:51] Quarry: Cannot download data from a query with Unicode characters in its title - https://phabricator.wikimedia.org/T123031#1947747 (XXN) I confirm this. I encountered the same problem with http://quarry.wmflabs.org/query/6945 [14:39:29] Analytics-Kanban: Projections of cost and scaling for pageview API. {hawk} [8 pts] - https://phabricator.wikimedia.org/T116097#1947959 (Milimetric) so if we need 3T per year, we'll naively need 15T for 5 years. But we shouldn't keep daily per-article resolution for that long. We could cut it dramatically by... [15:41:12] Analytics, operations, ops-eqiad: Possible bad mem chip or slot on dbproxy1004 - https://phabricator.wikimedia.org/T123546#1948124 (Cmjohnson) @ottomata @nuria let's coordinate a time that we can get this done. [15:41:52] Analytics, operations, ops-eqiad: Possible bad mem chip or slot on dbproxy1004 - https://phabricator.wikimedia.org/T123546#1948126 (Ottomata) I think we need to coordinate with @jcrespo. This box is more than just eventlogging db proxy. [15:47:04] milimetric: Where did you say logs would get dumped if my scripts failed again? [15:47:28] Analytics, operations, ops-eqiad: Possible bad mem chip or slot on dbproxy1004 - https://phabricator.wikimedia.org/T123546#1948150 (jcrespo) No I think dbproxy1004 only serves m4/eventlogging. But we can failover to another machine without needing downtime, I just need time to setup another proxy temp... [16:11:39] milimetric: let me know if you want to talk about piwik [16:13:47] Analytics-EventLogging, MediaWiki-API, Easy, Google-Code-In-2015, Patch-For-Review: ApiJsonSchema implements ApiBase::getCustomPrinter for no good reason - https://phabricator.wikimedia.org/T91454#1948220 (greg) Yeah, getting it on the deployments wiki page so it's not missed is important (I'm m... [16:49:11] Analytics-Cluster, EventBus, Services, operations: Investigate proper set up for using Kafka MirrorMaker with new main Kafka clusters. 
- https://phabricator.wikimedia.org/T123954#1948340 (aaron) [16:59:05] Analytics-Kanban, Analytics-Wikimetrics, Patch-For-Review, Puppet: Cleanup Wikimetrics puppet module so it can run puppet continuously without own puppetmaster {dove} [21 pts] - https://phabricator.wikimedia.org/T101763#1948417 (Nuria) Open>Resolved [16:59:20] Analytics-Kanban, Analytics-Wikimetrics, Patch-For-Review: Use fabric to deploy wikimetrics {dove} [13 pts] - https://phabricator.wikimedia.org/T122228#1948418 (Nuria) Open>Resolved [16:59:22] Analytics-Kanban, Analytics-Wikimetrics, Patch-For-Review, Puppet: Cleanup Wikimetrics puppet module so it can run puppet continuously without own puppetmaster {dove} [21 pts] - https://phabricator.wikimedia.org/T101763#1347110 (Nuria) [16:59:45] Analytics-Kanban: Create a set of celery tasks that can handle the global metric API input {kudu} [0 pts] - https://phabricator.wikimedia.org/T117288#1948423 (Nuria) [16:59:47] Analytics-Kanban, Patch-For-Review: Create celery chain or other organization that handles validation and computation {kudu} [13 pts] - https://phabricator.wikimedia.org/T118308#1948422 (Nuria) Open>Resolved [17:00:06] Analytics-Kanban: Implement a simple public API to calculate global metrics {kudu} [0 pts] - https://phabricator.wikimedia.org/T117285#1948424 (Nuria) Open>Resolved [17:00:30] Analytics-Kanban: Implement a simple public API to calculate global metrics {kudu} [0 pts] - https://phabricator.wikimedia.org/T117285#1770422 (Nuria) [17:00:32] Analytics-Kanban, Patch-For-Review: Build a public form that can hit the new API {kudu} [8 pts] - https://phabricator.wikimedia.org/T117289#1948436 (Nuria) Open>Resolved [17:00:46] Analytics-Kanban, Community-Wikimetrics, Patch-For-Review: Story: WikimetricsUser reports pages edited by cohort {kudu} [13 pts] - https://phabricator.wikimedia.org/T75072#1948439 (Nuria) Open>Resolved [17:02:00] a-team: joining standup in 2 minutes [17:02:39] madhuvishy: ok [17:07:12] Analytics-Kanban: Create Piwik cron to optimize dashboarding - https://phabricator.wikimedia.org/T124187#1948469 (Nuria) [17:08:04] Analytics-Kanban: Create Piwik cron to optimize dashboarding [3 pts] - https://phabricator.wikimedia.org/T124187#1948473 (Milimetric) [17:09:13] Analytics-Kanban: Create Piwik cron to optimize dashboarding [3 pts] - https://phabricator.wikimedia.org/T124187#1948478 (Nuria) Disable querying of data on db ever ytime you look at the dashboard [17:14:44] ottomata: after your ops meeting, can we chat about https://github.com/linkedin/Burrow/issues/4#issuecomment-172944046 [17:24:27] sho [17:37:51] !log restarted EventLogging because of Kafka consumption lag [17:37:54] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master [17:50:58] nuria, I was thinking to grab one (or both) of those tasks next: https://phabricator.wikimedia.org/T108599 https://phabricator.wikimedia.org/T108867 is that OK? [17:52:20] a-team, so that you know: hive has handled the query in 88 seconds, no problem [17:52:23] :( [17:52:28] mforns: on meeting , will look in a sec [17:53:06] sure no rush [17:54:44] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [30.0] [18:02:45] a-team, I have found a workaround using dataframes instead of raw sql [18:02:59] so nevermond sql :) [18:03:07] joal: coool :) [18:03:19] joal, there's a method in rdd called cartesian I think [18:03:48] ottomata: wanna chat now? 
if not i'll go to office [18:03:49] there is a cartesian join in dataframes which works (by opposition to the one in SQL :-P) [18:03:54] PROBLEM - Overall insertion rate from MySQL consumer on graphite1001 is CRITICAL: CRITICAL: 28.57% of data under the critical threshold [10.0] [18:04:53] I see [18:05:54] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 20.00% above the threshold [20.0] [18:07:55] madhuvishy: ja [18:07:58] just ordered lunch [18:08:07] ottomata: batcave? [18:08:09] sure [18:08:13] very loud here :) [18:16:08] mforns, yt [18:16:13] shoudl we EL to prod? [18:16:14] ottomata, yes [18:16:18] aha [18:16:24] let's do that :] [18:16:25] looks good in beta :) [18:16:26] ok [18:16:31] Analytics, Wikimedia-Developer-Summit-2016: Developer summit session: Pageview API from the Event Bus perspective - https://phabricator.wikimedia.org/T112956#1948744 (Aklapper) Wikimedia Developer Summit 2016 ended two weeks ago. This task is still open. **If the session in this task took place**, plea... [18:16:40] batcave ottomata? [18:16:51] sure i just got a giant salad [18:16:51] or do you want me to do that? [18:16:56] hehe [18:16:57] you can run deploy, and I hang out and eat? [18:16:58] and backup? [18:17:06] sure [18:17:14] k lets do it [18:17:18] omw [18:20:39] milimetric: jynus on ops says that EL is lagging only 3 hrs [18:21:30] mforns: the wikimedi abot sounds like a better task, [18:21:34] *wikimedia bot [18:23:23] mforns: are the mobile patches https://gerrit.wikimedia.org/r/#/c/264297/ [18:23:23] mforns: stopped until the mobile cache switch? [18:23:31] nuria, yes I guess so [18:23:40] mforns: btw, i have added madhuvishy as cr-er [18:25:44] Analytics, operations, ops-eqiad: Possible bad mem chip or slot on dbproxy1004 - https://phabricator.wikimedia.org/T123546#1948817 (Nuria) @Cmjohson: the update should only be a few minutes right? If so let's do it today/tomorrow if possible. [18:30:27] !log deployed EL in production with removal of queue [18:30:30] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master [18:35:35] ottomata: https://gerrit.wikimedia.org/r/#/c/265246/ updated :) [18:36:44] nuria, I'll take wikimedia bot, and setup the meeting. but in the meantime should I do the other one? [18:37:58] ah haha [18:37:59] elukey_: almost! [18:38:03] the subscribe goes on the service [18:38:13] the service subscribes to file changes [18:38:34] ottomata: sorry I need to sleep [18:39:01] subscribing a file to itself is a clear sign that my brain is not working [18:39:31] Analytics-Kanban: Communicate the WikimediaBot convention {hawk} [5 pts] - https://phabricator.wikimedia.org/T108599#1948902 (mforns) a:mforns [18:40:04] RECOVERY - Overall insertion rate from MySQL consumer on graphite1001 is OK: OK: Less than 20.00% under the threshold [100.0] [18:40:11] Analytics-Tech-community-metrics, DevRel-January-2016: Make GrimoireLib display *one* consistent name for one user, plus the *current* affiliation of a user - https://phabricator.wikimedia.org/T118169#1948905 (Lcanasdiaz) It is fixed now. [18:40:14] mforns: :) yay! [18:40:16] ees working! 
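[Editor's sketch, re the Burrow review at 18:35-18:39 (T123942, Gerrit 265246): ottomata's point is that the notify relationship belongs on the service resource, not on the config file itself. A minimal Puppet sketch of that pattern follows; resource titles, paths and the template name are placeholders, not copied from the actual operations/puppet module.]

    # Hedged sketch only: names and paths are illustrative.
    file { '/etc/burrow/burrow.cfg':
        ensure  => present,
        content => template('burrow/burrow.cfg.erb'),
    }

    # The service subscribes to its config file, so Puppet restarts Burrow
    # whenever the rendered config changes; subscribing the file to itself
    # (the first attempt) does nothing useful.
    service { 'burrow':
        ensure    => running,
        enable    => true,
        subscribe => File['/etc/burrow/burrow.cfg'],
    }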
[18:40:25] :] ottomata [18:40:33] Analytics-Tech-community-metrics, DevRel-January-2016: Many profiles on profile.html do not display identity's name though data is available - https://phabricator.wikimedia.org/T117871#1948909 (Lcanasdiaz) [18:40:35] Analytics-Tech-community-metrics, DevRel-January-2016: Make GrimoireLib display *one* consistent name for one user, plus the *current* affiliation of a user - https://phabricator.wikimedia.org/T118169#1948908 (Lcanasdiaz) Open>Resolved [18:40:36] mforns: we should be careful not to repeat this work: https://phabricator.wikimedia.org/T123546 [18:40:37] Analytics-Tech-community-metrics, Developer-Relations, DevRel-February-2016: Who are the top 50 independent contributors and what do they need from the WMF? - https://phabricator.wikimedia.org/T85600#1948910 (Lcanasdiaz) [18:41:11] madhuvishy: can you coordinate this small outage (minutes) for EL since you are on ops-duty this week: https://phabricator.wikimedia.org/T123546 [18:41:48] madhuvishy: i have added you to the ticket [18:42:46] nuria, I don't understand [18:43:10] mforns: sorry, wrong ticket [18:43:26] nuria, ah! fiu... [18:43:35] mforns: give a sec [18:43:37] sure [18:43:39] *give me a sec [18:43:49] xD [18:44:31] Analytics, operations, ops-eqiad: Possible bad mem chip or slot on dbproxy1004 - https://phabricator.wikimedia.org/T123546#1948935 (Nuria) @Cmjohson: @madhuvishy is on ops duty this week and she can help coordinate this small maintenance window. We just need to: 1. communicate to list 2. stop el,... [18:44:49] mforns: this is it https://phabricator.wikimedia.org/T117945 [18:45:45] madhuvishy: let me know if you feel Ok coordinating the small maintenance window for hardware for EL [18:45:57] nuria: sure but i don't understand - why do we need to stop all of EL? [18:46:17] we can just stop the mysql consumers right? [18:46:23] ottomata: ^ [18:46:26] Analytics-Kanban, DBA, Patch-For-Review: Pending maintenance on the eventlogging databases (db1046, db1047, dbstore1002, other dbstores) - https://phabricator.wikimedia.org/T120187#1948959 (Nuria) @jcrespo: let's do this hardware update before the conversion Ok? https://phabricator.wikimedia.org/T123546 [18:46:44] PROBLEM - Overall insertion rate from MySQL consumer on graphite1001 is CRITICAL: CRITICAL: 45.45% of data under the critical threshold [10.0] [18:47:03] doh! [18:47:10] mforns: this check probably isn't tuned right anymore [18:47:31] yeahhhh now its just gonna be really spikey [18:47:44] because it will just wait til it gets 5 mins or 3000 events [18:47:46] for each schema [18:48:12] ummmm [18:48:27] nuria, yes it's linked from the task itself, OK [18:48:39] madhuvishy: that would work too, you can update ticket telling jaime we can take teh downtime easy [18:48:39] what are we talking about? [18:48:46] ottomata, yes, it will be spikey for a while [18:49:13] nuria: yeah - we can just leave EL running - and no data loss hopefully [18:49:15] madhuvishy: sorry, the hardware outage : https://phabricator.wikimedia.org/T123546 [18:49:19] it will just reconsume [18:49:33] but ottomata, this happened before also [18:49:35] madhuvishy: just FYI to jaime that we do not need to fallback [18:49:40] oh utnil it gets out of sync [18:49:41] right [18:49:48] maybe we should graph and alert on a moving average? [18:50:06] madhuvishy: makes sense? 
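[Editor's sketch, re "maybe we should graph and alert on a moving average?" at 18:49: one way to smooth the now-spiky insertion-rate metric is to alert on a Graphite movingAverage() of it rather than on the raw series. The metric name is the one used by the existing alert (see T124204 below); the window length and how the Icinga check would consume the target are assumptions, not deployed configuration.]

    Current target (spiky since the batching change):
        eventlogging.overall.inserted.rate
    Smoothed alternative to alert on (15-minute window is illustrative):
        movingAverage(eventlogging.overall.inserted.rate, '15min')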
[18:50:16] nuria: yeah let me comment [18:50:34] ottomata: https://gerrit.wikimedia.org/r/#/c/265246/ - should be good now :) [18:50:37] madhuvishy: cause i think we should that before taking teh longer outage for toku db conversion [18:50:49] oooo this could be a fun one for elukey :) i will make a task [18:51:08] ottomata, yes :] [18:51:23] elukey: merged! :) [18:51:39] nuria: downtime for m4-master will still be there [18:51:43] that's not a problem [18:51:44] ? [18:52:12] i see ottomata mentioned that there might be other users of m4-master that are not EL [18:52:31] Analytics, Reading-Web, Wikipedia-iOS-App-Product-Backlog: As an end-user I shouldn't see non-articles in the list of trending articles - https://phabricator.wikimedia.org/T124082#1948995 (Milimetric) Doing this kind of heuristic as part of the API call is possible, then clients would get less than 1... [18:52:42] madhuvishy: i do not think that is the case from my prior conversations with sean but it will be something to confirm [18:53:11] Analytics, operations, ops-eqiad: Possible bad mem chip or slot on dbproxy1004 - https://phabricator.wikimedia.org/T123546#1948996 (madhuvishy) @jcrespo @Cmjohnson: EL can handle downtime - We will just stop the EL mysql consumers, and restart them after maintenance window - and data should get recons... [18:54:14] nuria: okay i commented on the ticket [18:54:34] I'll communicate to the list anyway saying mysql consumers are being stopped for a while [18:54:41] when it happens [18:54:52] ottomata: mmm so now the change can be merged in Palladium and pushed, but we'd need to test it in labs before that.. how can we prevent anybody from pushing out a newer change without pulling mine in? [18:55:03] (maybe I am confused about this workflow) [18:55:07] MarkTraceur: here's your logs (location's in the cmd prompt) [18:55:12] madhuvishy: thank you, sorry not to have mentioned this on standup earlier but i think is a good fit for ops duty as it i s all about commnunication mostly [18:55:12] https://www.irccloud.com/pastebin/fQNllp0k/ [18:56:01] Shoot. [18:56:35] milimetric: So my existing files couldn't get overwritten because they have data that doesn't match? Can we 1. delete the files and 2. run the generator again manually? [18:56:39] elukey: do you want to test burrow change in self hosted labs instance? [18:56:41] i.e. non-cron run [18:56:45] Analytics-Kanban: Tune eventlogging.overall.inserted.rate alert to use a movingAverage transformation - https://phabricator.wikimedia.org/T124204#1949004 (Ottomata) NEW a:elukey [18:56:48] nuria: yeah alright [18:57:00] elukey: you can't :) [18:57:04] MarkTraceur: I'll delete the files if you back them up [18:57:10] if you mean test in a non self hosted puppet master in labs [18:57:14] you can't [18:57:29] merge into production is applied everywhere in labs eventually [18:57:32] that uses it [18:57:40] in prod, it is a manual merge into local puppet repo clone on palladium [18:57:45] milimetric: Sold, where do you want them? Somewhere else in the public data? [18:57:46] and then it will be applied everywhere [18:57:52] Analytics-EventLogging, MediaWiki-API, Easy, Google-Code-In-2015, Patch-For-Review: ApiJsonSchema implements ApiBase::getCustomPrinter for no good reason - https://phabricator.wikimedia.org/T91454#1949019 (Florian) :P Ok, thanks for the answer! :) I'll edit wikitech:Deployments, [[ https://wikit... [18:58:01] MarkTraceur: grrr... 
I have no permission to chown them [18:58:09] Heh, yeah [18:58:12] MarkTraceur: I don't care about the backups that's just for you :) [18:58:12] madhuvishy: yep I tried it today but didn't finish it.. [18:58:29] elukey: okay - let me know if you need help there [18:58:31] Oh, then screw it, it's the same data, but better formatted and easier to maintain [18:58:35] but maybe ottomata can help with this. Andrew I can't do this: [18:58:35] milimetric@stat1003:/a/limn-public-data/metrics$ sudo -u stats chown -R stats multimedia-health/ [18:58:57] madhuvishy: I followed the things that you told me last time and it worked, all good :) [18:59:03] oh wait MarkTraceur you can just delete that directory completely if you're done with it [18:59:16] KK [18:59:16] elukey: cool :) [18:59:26] {{done}} [18:59:42] I gotta go, sorry, but if you delete that dir you should be good. [19:00:00] milimetric: Can I force the job to run again? [19:04:33] ottomata: it's possible to kill just the mysql consumers in EL right? [19:04:43] yes [19:05:24] ottomata: do we have to kill them individually? [19:05:38] * madhuvishy looks at puppet [19:05:57] all right, going offline.. byeeee o/ [19:06:49] Analytics-Kanban: Communicate the WikimediaBot convention {hawk} [5 pts] - https://phabricator.wikimedia.org/T108599#1524599 (Nuria) Please connect with @bd808 [19:10:34] ottomata: hmmm - not super sure - do you have a way to kill all processes in a consumer group? [19:15:09] Analytics-Tech-community-metrics, DevRel-January-2016: "Unavailable section name" displayed on repository.html - https://phabricator.wikimedia.org/T121102#1949076 (Lcanasdiaz) It is fixed and already deployed in production. [19:15:19] Analytics-Tech-community-metrics, DevRel-January-2016: "Unavailable section name" displayed on repository.html - https://phabricator.wikimedia.org/T121102#1949077 (Lcanasdiaz) Open>Resolved [19:16:16] (CR) Nuria: "Adding @joal. We are still looking forward to merge this, correct?" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/255105 (owner: DCausse) [19:22:43] madhuvishy: i have kill them with grep mysql | xargs kill -9 <> in the past, would love to know how to make it more sophisticated [19:26:10] milimetric: the piwik ui is still trying to run queries, did you restarted after changing the archiving settings? [19:34:24] PROBLEM - Overall insertion rate from MySQL consumer on graphite1001 is CRITICAL: CRITICAL: 20.00% of data under the critical threshold [10.0] [19:34:41] we started the research showcase, streamed at https://t.co/tdESrvwEhd [19:34:57] Analytics-Kanban: Communicate the WikimediaBot convention {hawk} [5 pts] - https://phabricator.wikimedia.org/T108599#1949173 (bd808) >>! In T108599#1949029, @Nuria wrote: > Please connect with @bd808 Yes, please. @Anomie and I can help with getting the new guidelines broadcasted out to bot developers, librar... [19:45:04] PROBLEM - Overall insertion rate from MySQL consumer on graphite1001 is CRITICAL: CRITICAL: 20.00% of data under the critical threshold [10.0] [19:46:24] nuria, which queries, the real-time visitor one? 
That's apparently safe [19:47:14] I restarted it, just to be safe, but then I changed some settings and verified they get picked up whether you restart or not [19:49:37] Analytics-Tech-community-metrics, pywikibot-core, DevRel-January-2016: Statistics for SCM project 'core' mix pywikibot/core, mediawiki/core and oojs/core - https://phabricator.wikimedia.org/T123808#1949253 (Aklapper) ...that dropdown also lists quite some duplicate entries like wikipedia, vendor, varn... [19:49:56] hello. i would like to check traffic stats for a bunch of domain names we have that aren't the regular project domains [19:50:03] stuff like wikiepdia.com [19:50:31] i was looking at 1000-sampled.json on oxygen, but if they are very low usage, it should probably check unsampled logs [19:50:45] could you point me to instructions to get that out of hadoop? [19:51:10] milimetric: was going to help point, but then i was wondering about projectview files [19:51:10] i want to justify deactivating a bunch of them , if they are rarely ever used [19:51:13] etc. [19:51:23] are bad domains like that in projectview/projectcounts? [19:51:24] PROBLEM - Overall insertion rate from MySQL consumer on graphite1001 is CRITICAL: CRITICAL: 20.00% of data under the critical threshold [10.0] [19:51:46] we have over 600 domain names :p [19:51:57] all kinds of typo and weird stuff [19:52:06] but just a fraction of them get traffic [19:52:09] (CR) Joal: "I Think so @nuria" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/255105 (owner: DCausse) [19:53:34] oh joal is here, maybe he can answer ^^^ [19:53:38] mutante: in lieu of an answer from milimetric, check out: [19:53:42] https://upload.wikimedia.org/wikipedia/commons/5/53/Introduction_to_Hive.pdf [19:53:56] thanks! [19:53:58] some good stuff here too [19:53:59] https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hive/Queries [19:54:03] https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hive [19:54:22] (Reading up) [19:54:27] mutante: if you are after special domain names (not subdomains), then you'll only find them in webrequestlog [19:54:43] webrequest log sorry mutante [19:54:51] (in hive, webrequest table) [19:55:16] thanks, both of you [19:55:23] looks on stat1002 [19:55:35] mutante: ja, hive --database wmf [19:55:38] webrequest [19:55:40] table [19:55:44] PROBLEM - Overall insertion rate from MySQL consumer on graphite1001 is CRITICAL: CRITICAL: 20.00% of data under the critical threshold [10.0] [19:55:46] ...you might need to be in the analytics-privatedata-users group [19:55:49] The thing is, this is a huge table even if it contains only 2 month of data. [19:56:12] mutante: 80Tb for 2 month ... [19:56:24] WARNING: Hive CLI is deprecated and migration to Beeline is recommended. [19:56:27] hive (wmf)> [19:56:29] mutante: let me know if it gets frustrating and I can help with the sql. But my first thought is maybe grouping and counting by hostname for an hour of traffic? [19:56:36] joal: as long as it's ok i run a query ? [19:56:37] So, your query might be long-ish depending on how much you are reqesuting to analyze :) [19:56:53] ignore the beehive thing, Hive cli is fine [19:56:54] well, if i just check an hour of traffic [19:57:07] then i could also use the sampled log on oxygen? 
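[Editor's sketch, re checking the typo domains against the 1:1000 sampled webrequest log on oxygen: count matching lines per candidate domain and multiply by 1000 for an order-of-magnitude estimate, as joal notes a domain absent from the sample very likely has under ~1000 hits. The file path and the assumption that the hostname appears verbatim in each JSON line are guesses, not the actual layout on oxygen.]

    #!/bin/bash
    # Hypothetical path; adjust to wherever the sampled log actually lives.
    LOG=/srv/log/webrequest/sampled-1000.json
    for domain in wikiepdia.com wikiepdia.org; do
        hits=$(grep -c -F "$domain" "$LOG")
        # 1:1000 sampling, so each matching line represents ~1000 requests.
        echo "$domain ~ $((hits * 1000)) requests (from $hits sampled lines)"
    done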
[19:57:07] mutante: for an hour, you'll be fiiiiiine :) [19:57:12] even a day is manageable [19:57:24] it get's trickier for more [19:57:30] ideally i want to say " nobody ever used this in the last year " :p [19:57:39] i dont know what our threshold is , heh [19:57:46] i want reasons to _remove_ domains :) [19:57:51] mutante: only 2 month of data anyway ... [19:57:58] ah, right [19:58:06] data-retention,, ack [19:58:10] privacy first, mutante :) [19:58:10] yeah, but looking at 2 months of data will be *really* slow [19:58:19] yeah, really really [19:58:31] like, kill the cluster slow :) [19:58:32] Analytics, operations, ops-eqiad: Possible bad mem chip or slot on dbproxy1004 - https://phabricator.wikimedia.org/T123546#1949298 (Ottomata) Since we are about to have an EL downtime anyway, can we fit this in as well? [19:58:59] milimetric: hopefully not, but long to get results :) [19:59:15] hmm, i'm wondering if i make a real difference whether i do this (and limit it to days) or just rely on sampled-1000.log [19:59:18] milimetric: it would kill the cluster if we run that in a queue preventing other jobs to happen [19:59:21] yeah. Hm... mutante: do you have a list of candidates for removal? [19:59:43] sampled-1000 is usually ok [19:59:58] milimetric: here's a random selection to start with https://gerrit.wikimedia.org/r/#/q/status:open+project:operations/dns+branch:master+topic:parking,n,z [20:00:03] the only problem is if you expect that there are less than 1000 hits for something over whatever period you're looking at [20:00:31] Well, using sampled-1000 is actually a good trick: if it's not in there, there a strong probability that it has not past 1000 hits [20:00:31] mutante: ok, that might help, you can look for hits where uri_host is equal to those [20:00:35] wikiepdia had 2 lines in the sampled-1000 [20:01:02] milimetric: thx [20:01:34] and mutante one idea would be to run for an hour first, then a day, and see how the performance looks, then if you're comfortable look at a few days [20:01:43] mutante: if you only filter a subset of project and count, you could go for a month long query and wait for results [20:01:52] But make sure you have it tested on hours before :) [20:02:07] milimetric: You always are faster than me :) [20:02:10] ok! [20:02:30] YuviPanda: Hellooo-oo? [20:02:56] Ironholds: helloooo-oo as well? [20:03:18] ottomata: what EL downtime are you talking about on the ticket? [20:03:23] Given that Ironholds had not slept for 22 hours at my morning time, I strongly suspect he won't answer :) [20:03:58] maybe madhuvishy knows [20:03:58] joal: Ironholds was in the office a few minutes back [20:04:05] Ah, maybe then :) [20:04:36] madhuvishy: do you know if YuviPanda has private notebooks working on our network or not yet ? [20:04:49] joal: not yet as far as I know [20:05:01] Crap ... k, thx madhuvishy :) [20:07:06] ottomata: hah , Exception: Permission denied: user=root, access=READ, [20:07:12] i'll ask for access :p [20:07:35] i can get on the client and use "describe" on tables but not select data [20:07:42] s/client/shell [20:07:46] right [20:07:47] good it works! [20:07:49] :) [20:09:11] mutante: yeah it lets you do everything until the point you try to read from hdfs [20:09:31] the desc etc is metadata that hive has I think [20:09:55] are you gonna use that "Beeline" stuff later? [20:10:19] hehe, that shoudl be a task too! [20:10:23] madhuvishy: did we make a task for that? 
[20:10:32] that's another good little fun one, maybe luca can do that too [20:10:39] ottomata: yeah i think we did [20:10:52] found it [20:10:52] https://phabricator.wikimedia.org/T116123 [20:11:11] Analytics: Make beeline easier to use as a Hive client {hawk} - https://phabricator.wikimedia.org/T116123#1949356 (Ottomata) @elukey this is another that could be fun and very helpful! [20:11:17] we should may be have some script or sth that can launch it with the configuration set up [20:11:46] yeah, probably there is some way to set some default vars, if not then ja a wrapper of some kind that reads from hive-site.xml or seomethign [20:13:36] one more question. what's the issue here [20:13:37] FAILED: SemanticException [Error 10041]: No partition predicate found for Alias "webrequest" Table "webrequest" [20:14:04] that happens when i just do example commands like SELECT agent_type FROM webrequest LIMIT 5; [20:15:28] ignore me, i just had to keep reading [20:16:21] :) [20:16:32] rookie mistake #1 ! :D [20:16:45] yes [20:18:21] hive (wmf)> SELECT agent_type FROM webrequest where year=2016 and month=1 and day=19 limit 5; [20:18:28] OK [20:18:28] agent_type [20:18:28] SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". [20:18:39] Analytics, operations, ops-eqiad: Possible bad mem chip or slot on dbproxy1004 - https://phabricator.wikimedia.org/T123546#1949407 (Ottomata) @cmjohnson and I will do this Jan 21 16:00 UTC (11am EST). Should be a very short and unnoticeable downtime. [20:22:12] ut you can ignore the SLF4J error [20:22:16] did you get no results? [20:22:25] btw, that is a pretty large query you are trying to launch [20:22:39] it would be best if, while exploring, you limited it to the smallest partition spec possible [20:22:50] where year, month, day, hour, webrequest_source [20:22:52] with all of those set [20:22:57] webrequest_source='misc' will give you the smallest [20:23:04] mutante: ^ [20:23:22] ok, i already canceled that again [20:23:36] sets hour too [20:25:04] source='misc' show me all the 15.wp hits, nice [20:27:42] Analytics-Kanban, DBA, Patch-For-Review: Pending maintenance on the eventlogging databases (db1046, db1047, dbstore1002, other dbstores) - https://phabricator.wikimedia.org/T120187#1949448 (Ottomata) Just worked this out in IRC. The downtime will start at 16:00 UTC. @madhuvishy will email the analyti... [20:28:30] joal: https://phabricator.wikimedia.org/T109286 [20:28:34] "The traffic move from mobile->text is now on hold (we did convert codfw, then we rolled back) due to purge-related issues that need to be addressed first, in blocking task https://phabricator.wikimedia.org/T124165. [20:28:34] " [20:28:53] MarkTraceur: yeah, we can force it if we have to but it's kind of a pain in the butt. I think it runs again in about 3 hours [20:28:57] ottomata: Yes, I followed the talk on traffic chan earlier on today [20:29:01] nice [20:29:06] Problems of cache invalidation [20:29:08] i actually found 3 hits on my typo domain in a one-hour range.. hmmm hmmm [20:29:11] so, we need to remember to not deploy unless we roll back our refinery patches [20:29:16] milimetric: All right, fair enough [20:29:18] :/ [20:29:22] that's what I get for merging too soon [20:29:25] Mwarf ... [20:29:37] ottomata: My bad, I have been pushing for that [20:29:42] milimetric: If it fails again, though... 
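[Editor's sketch, putting the Hive advice above together: a per-domain count for one hour with every partition column pinned (year, month, day, hour, webrequest_source), which avoids the "No partition predicate found" error and keeps the scan small. The domain list and the choice of webrequest_source are illustrative; typo domains may well be served from a source other than 'misc'.]

    -- Run from the wmf database (hive --database wmf).
    SELECT uri_host, COUNT(*) AS hits
    FROM webrequest
    WHERE year = 2016 AND month = 1 AND day = 19 AND hour = 14
      AND webrequest_source = 'misc'
      AND uri_host IN ('wikiepdia.com', 'wikiepdia.org')
    GROUP BY uri_host
    ORDER BY hits DESC;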
[20:30:22] ottomata: there was no deploy plan soon (still work ongoing both on mforns_gym app-sessions and nuria last_access_uniques [20:30:54] MarkTraceur: if it fails again I'll run it myself and fix any problems :) deal? [20:31:18] I mean, I'd be happy to fix it [20:31:26] But I don't know how [20:31:40] no worries, that's the kind of customer service we pride ourselves on here in -analytics [20:32:00] Heh [20:32:05] All right, fair enough [20:32:33] aye [20:32:36] joal: we'll leave it for now [20:32:55] ok ottomata [20:33:11] From what I have understood ottomata they still want to have it done soon-ish [20:33:19] cool, aye, just asked that :) [20:33:24] oh 1:1 with nuria! [20:33:34] ottomata: yessir [20:34:35] Analytics, Wikipedia-iOS-App-Product-Backlog, Patch-For-Review, iOS-app-v5-production: Puppetize Piwik to prepare for production deployment - https://phabricator.wikimedia.org/T103577#1949492 (Fjalapeno) Open>Resolved a:Fjalapeno Deployed [20:35:16] Analytics, Wikipedia-iOS-App-Product-Backlog, iOS-app-v5-production: Support Piwik in production - https://phabricator.wikimedia.org/T116308#1949505 (Fjalapeno) Open>Resolved a:Fjalapeno [20:37:31] (CR) Milimetric: Add a graph tracking how many people have enabled the cross-wiki notifications beta feature (1 comment) [analytics/limn-ee-data] - https://gerrit.wikimedia.org/r/264700 (owner: Catrope) [20:37:57] nuria: can we push our 1:1 by 30 minutes. I was at research showcase and would like to get lunch [20:38:09] (CR) Milimetric: [C: 2 V: 2] Add a graph tracking how many people have enabled the Flow beta feature [analytics/limn-flow-data] - https://gerrit.wikimedia.org/r/264698 (https://phabricator.wikimedia.org/T114111) (owner: Catrope) [20:39:43] hm, ottomata I've been reading brandon answer and now I wonder about reverting now [20:42:16] Analytics-General-or-Unknown, Community-Advocacy, Wikimedia-Extension-setup: enable Piwik on ru.wikimedia.org - https://phabricator.wikimedia.org/T91963#1949533 (Fjalapeno) [20:44:49] ottomata1: [20:44:53] https://www.irccloud.com/pastebin/rTpzNm2E/ [20:44:54] PROBLEM - Overall insertion rate from MySQL consumer on graphite1001 is CRITICAL: CRITICAL: 20.00% of data under the critical threshold [10.0] [20:51:14] PROBLEM - Overall insertion rate from MySQL consumer on graphite1001 is CRITICAL: CRITICAL: 20.00% of data under the critical threshold [10.0] [20:52:58] ergh ^ [20:53:02] annoying [20:55:54] ottomata1: what is this coming from? [20:56:12] also pasted email draft above. if it's fine i'll send [20:56:23] analytics and engg lists? [21:00:02] (in 1:1, with you shortly) [21:01:14] madhuvishy: i have a 1:1 with dario shortly after ours [21:01:23] madhuvishy: but i can do it later, at 3pm [21:01:25] nuria: ya it's fine - lets do now [21:01:31] i'll go for lunch after [21:01:36] +1 for email madhuvishy [21:01:40] ottomata: cool [21:01:55] madhuvishy: i think that alert is fireing more often now because of mforns queueiung change [21:02:01] made a ticket for fixing this morning [21:02:18] joal: aye [21:02:19] madhuvishy: ok, let's do it then [21:02:20] ja :/ [21:02:51] ottomata: so, revert tomorrow? (I'm about to go to bed) [21:03:37] ja [21:03:38] k [21:03:43] lets do later [21:03:50] nighters joal! 
[21:03:54] Thx ottomata [21:04:14] ottomata: I'll bug you tomorrow not to forget (I just made myself a post it :) [21:04:26] k :) [21:37:17] (PS6) Nuria: Drop support for message without rev id in avro decoders and make latestRev mandatory [analytics/refinery/source] - https://gerrit.wikimedia.org/r/255105 (owner: DCausse) [21:38:39] ottomata, yt? [21:40:16] ja hey [21:40:47] hey! [21:40:56] are you planning to rollback the EL deployment? [21:41:40] ottomata, ^ [21:42:54] oh no, I think it was another thing, now that I read the scrollback [21:43:17] mforns: no [21:43:20] yeah another thing :) [21:44:11] anyway ottomata I thought we might add a random number of seconds to the time limit, so that batches are not inserted in spikes, it seems this version of the code is not going to unsync [21:46:36] yeah [21:46:37] hm [21:46:50] I'm writing the commit msg [21:46:51] heh, i dunno [21:46:55] no? [21:46:57] or a random sleep before starting [21:47:02] would do it too [21:47:08] randomly changing the time might be funky [21:47:13] really, we might want to do a shorter time limit [21:47:21] which would also smooth this out [21:47:31] dunno though [21:47:42] yes, the spikes would be shorter and more frequent [21:48:39] but wouldn't we lose the performance gain we reached when batching inserts? [21:50:08] maybe, i guess, but, hm. i'd keep the max batch size large [21:50:13] so that if events come in real fast [21:50:16] they will be in large batches [21:50:28] but for schemas with small numbers of events, maybe its ok to insert frequently? [21:50:32] not sure. [21:50:37] I see, I think you're right [21:50:50] dunno though [21:50:54] i'm reluctant to change this now though [21:51:02] and where would you put the random sleep befor start? [21:51:05] since we are going to do a downtime tomorrow, and then heavily overload the thing after [21:51:06] aha, ok [22:16:14] PROBLEM - Overall insertion rate from MySQL consumer on graphite1001 is CRITICAL: CRITICAL: 20.00% of data under the critical threshold [10.0] [22:37:39] Analytics, Analytics-Cluster: https://yarn.wikimedia.org/cluster/scheduler should be behind ldap - https://phabricator.wikimedia.org/T116192#1950054 (Tbayer) OK, I've added this alternative to the documentation for now (not fully sure what's going on though, or whether the variant with bast1001 should be... [22:41:44] PROBLEM - Overall insertion rate from MySQL consumer on graphite1001 is CRITICAL: CRITICAL: 20.00% of data under the critical threshold [10.0] [22:58:37] Analytics: Remove LegacyPageviews from vital-signs - https://phabricator.wikimedia.org/T124244#1950171 (madhuvishy) NEW [22:59:46] Analytics: Move vital signs to its own instance {crow} [5 pts] - https://phabricator.wikimedia.org/T123944#1950184 (madhuvishy) [22:59:47] Analytics: Remove LegacyPageviews from vital-signs - https://phabricator.wikimedia.org/T124244#1950185 (madhuvishy) [23:00:14] Analytics: Make beeline easier to use as a Hive client {hawk} - https://phabricator.wikimedia.org/T116123#1950188 (Tbayer) One blocker I have personally encountered in using beeline is that there does not seem to be an option to raise the heap memory limit akin to [[https://wikitech.wikimedia.org/wiki/Analyt... [23:01:56] Analytics: Make beeline easier to use as a Hive client {hawk} - https://phabricator.wikimedia.org/T116123#1950196 (madhuvishy) @Tbayer This should not be a problem anymore. @Ottomata configured it to have 1024m as the default. 
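[Editor's sketch, to make the 21:44-21:51 exchange concrete: a Python outline of the batching loop being discussed, where a large max batch size keeps busy schemas efficient, a time limit bounds latency for quiet schemas, and an optional random startup sleep staggers the flush timers so consumers don't all insert at once. Names and numbers are illustrative, not the real eventlogging consumer code, which batches per schema and reads from Kafka.]

    import random
    import time

    def consume(events, insert_batch, max_batch_size=3000,
                time_limit=300, startup_jitter=30):
        """Batch items from `events` and flush them via `insert_batch(list)`.

        Illustrative only: both callables stand in for the real Kafka
        consumer and MySQL insert code.
        """
        # Optional jitter so multiple consumers don't flush in lockstep.
        time.sleep(random.uniform(0, startup_jitter))
        batch, last_flush = [], time.time()
        for event in events:
            batch.append(event)
            if len(batch) >= max_batch_size or time.time() - last_flush >= time_limit:
                insert_batch(batch)
                batch, last_flush = [], time.time()
        if batch:  # flush whatever is left at shutdown
            insert_batch(batch)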
[23:19:45] Analytics: Make beeline easier to use as a Hive client {hawk} - https://phabricator.wikimedia.org/T116123#1950359 (Tbayer) >>! In T116123#1950196, @madhuvishy wrote: > @Tbayer This should not be a problem anymore. @Ottomata configured it to have 1024m as the default. Thanks, that's great! Although there ar... [23:23:19] Analytics: Make beeline easier to use as a Hive client {hawk} - https://phabricator.wikimedia.org/T116123#1950366 (madhuvishy) @Tbayer aah, yes that'd be a problem them. We'll try to figure out a way to make configuring these options easier, but it's probably something better fixed in beeline upstream. (http... [23:44:04] madhuvishy: btw, http://googlecloudplatform.blogspot.com/2016/01/Dataflow-and-open-source-proposal-to-join-the-Apache-Incubator.html [23:48:43] Analytics, operations, ops-eqiad: Possible bad mem chip or slot on dbproxy1004 - https://phabricator.wikimedia.org/T123546#1950435 (Tbayer) [23:48:45] Analytics: Restore MobileWebSectionUsage_14321266 and MobileWebSectionUsage_15038458 - https://phabricator.wikimedia.org/T123595#1950434 (Tbayer) [23:48:47] Analytics-Kanban, DBA, Patch-For-Review: Pending maintenance on the eventlogging databases (db1046, db1047, dbstore1002, other dbstores) - https://phabricator.wikimedia.org/T120187#1950436 (Tbayer) [23:48:59] YuviPanda: google compute engine hhmmmm [23:49:15] madhuvishy: but as an apache project [23:49:21] right [23:49:41] looks interesting [23:51:08] Analytics: Restore MobileWebSectionUsage_14321266 and MobileWebSectionUsage_15038458 - https://phabricator.wikimedia.org/T123595#1950448 (Tbayer) >>! In T123595#1934514, @Nuria wrote: > @Tbayer: > > Given the many issues we have in our data store right now. > > Hardware: https://phabricator.wikimedia.org/... [23:51:44] PROBLEM - Overall insertion rate from MySQL consumer on graphite1001 is CRITICAL: CRITICAL: 20.00% of data under the critical threshold [10.0] [23:55:55] PROBLEM - Overall insertion rate from MySQL consumer on graphite1001 is CRITICAL: CRITICAL: 20.00% of data under the critical threshold [10.0]
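[Editor's sketch, re the beeline usability thread (T116123): one shape the "script or sth that can launch it with the configuration set up" idea could take is a small wrapper that pre-fills the JDBC URL and raises the client-side heap. Everything here is a guess at a reasonable default: the HiveServer2 host, port and the 2 GB heap are placeholders, and whether this covers the other options Tbayer mentions is untested.]

    #!/bin/bash
    # Hypothetical "beeline-wmf" wrapper sketched for T116123; not deployed code.
    # HADOOP_CLIENT_OPTS is the standard knob for the client-side JVM heap.
    export HADOOP_CLIENT_OPTS="-Xmx2048m ${HADOOP_CLIENT_OPTS}"
    # Host, port and default database are placeholders for the real
    # analytics HiveServer2 endpoint.
    exec beeline -u 'jdbc:hive2://hiveserver.example.org:10000/wmf' "$@"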