[00:05:37] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Add is_pageview as a dimension to the 'webrequest_sampled_128' Druid dataset - https://phabricator.wikimedia.org/T212778 (10Nuria) In neither turnilo nor superset does is_pageview appear as a dimension. I think we might need a job restart.
[00:35:52] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Add is_pageview as a dimension to the 'webrequest_sampled_128' Druid dataset - https://phabricator.wikimedia.org/T212778 (10Nuria) mmmm .. both jobs were restarted on the 1/7  Job Name    : webrequest-druid-hourly-coord App Path    : hdfs://analytics-hadoo...
[01:28:52] <wikibugs>	 10Analytics, 10Research, 10WMDE-Analytics-Engineering, 10User-Addshore, 10User-Elukey: Provide tools for querying MediaWiki replica databases without having to specify the shard - https://phabricator.wikimedia.org/T212386 (10Neil_P._Quinn_WMF) Product Analytics met with @elukey and @Milimetric on Monday...
[02:05:06] <wikibugs>	 10Analytics, 10Analytics-EventLogging, 10Operations, 10Performance-Team: New MongoDB version is not DFSG-compatible, dropped by Debian - https://phabricator.wikimedia.org/T213996 (10MaxSem)
[02:06:47] <wikibugs>	 10Analytics, 10Analytics-EventLogging, 10Operations, 10Performance-Team, 10Software-Licensing: New MongoDB version is not DFSG-compatible, dropped by Debian - https://phabricator.wikimedia.org/T213996 (10Peachey88)
[02:15:49] <wikibugs>	 10Analytics, 10Analytics-EventLogging, 10Operations, 10Performance-Team, 10Software-Licensing: New MongoDB version is not DFSG-compatible, dropped by Debian - https://phabricator.wikimedia.org/T213996 (10MaxSem)
[02:21:00] <wikibugs>	 10Analytics, 10Analytics-EventLogging, 10Operations, 10Performance-Team, 10Software-Licensing: New MongoDB version is not DFSG-compatible, dropped by Debian - https://phabricator.wikimedia.org/T213996 (10Legoktm) SSPL v2 is not substantially different, and IMO the perspective on the OSI's license-review...
[03:55:43] <wikibugs>	 10Analytics, 10Analytics-EventLogging, 10Operations, 10Performance-Team, 10Software-Licensing: New MongoDB version is not DFSG-compatible, dropped by Debian - https://phabricator.wikimedia.org/T213996 (10Krinkle) XHGui is scheduled to be migrated from tungsten (Jessie; Mongo 2.4.10) to webperf1002 (Stret...
[07:15:14] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10User-Elukey: Convert Aria tables to InnoDB on dbstore1002 - https://phabricator.wikimedia.org/T213706 (10elukey) The alter tables to convert Aria tables to InnoDB are still in progress :)
[07:24:44] <elukey>	 morning!
[07:32:20] <wikibugs>	 10Analytics, 10DBA, 10Operations, 10ops-eqiad: swap a2-eqiad PDU with on-site spare - https://phabricator.wikimedia.org/T213748 (10Marostegui)
[07:33:50] <wikibugs>	 10Analytics, 10DBA, 10Operations, 10ops-eqiad: swap a2-eqiad PDU with on-site spare - https://phabricator.wikimedia.org/T213748 (10Marostegui)
[07:36:09] <wikibugs>	 10Analytics, 10DBA, 10Operations, 10ops-eqiad: swap a2-eqiad PDU with on-site spare - https://phabricator.wikimedia.org/T213748 (10Peachey88)
[07:45:02] <wikibugs>	 10Analytics, 10DBA, 10Operations, 10ops-eqiad: swap a2-eqiad PDU with on-site spare - https://phabricator.wikimedia.org/T213748 (10Marostegui) db1075, s3 primary master, was failed over to db1078 which is in row C.
[07:56:04] <wikibugs>	 10Analytics, 10DBA, 10Operations, 10ops-eqiad: swap a2-eqiad PDU with on-site spare - https://phabricator.wikimedia.org/T213748 (10Marostegui) @RobH is this happening today too along with a3 maintenance or is this finally moved to Tue 22nd?
[08:02:40] <joal>	 Good morning elukey 
[08:03:11] <joal>	 elukey: Would you give me a minute for a quick catch up?
[08:04:06] <elukey>	 joal: bonjour!
[08:04:16] <elukey>	 of course!
[08:06:01] <joal>	 elukey: How shall we proceed? IRC, batcave? And of course, coffee first :)
[08:07:38] <elukey>	 batcave!
[08:07:48] <joal>	 OMW elukey :)
[08:30:48] <addshore>	 Morning all :)
[08:31:03] <joal>	 Hi addshore! Back I am!
[08:31:13] <addshore>	 :D
[08:31:55] <addshore>	 The thing I have been avoiding poking you about while you were ill is https://phabricator.wikimedia.org/T211015 :P
[08:32:23] <addshore>	 I have all of the data, just need to pull it out and do some maths
[08:32:45] <addshore>	 I wrote a very crappy query 2 days ago that I think basically shows me what I think I was expecting
[08:32:46] <joal>	 k addshore 
[08:32:55] <addshore>	 https://www.irccloud.com/pastebin/38K8ZZbm/
[08:33:10] <addshore>	 But while writing that I also figured, I should totally be writing this as a notebook or something
[08:35:00] <joal>	 addshore: Why not - book :)
[08:36:21] <addshore>	 exactyl, but at that stage I was already mostly there with that large query :D
[08:37:22] <joal>	 :)
[08:37:38] <joal>	 addshore: To me this query is not that large ;)
[08:38:06] <addshore>	 joal: yes, but your a crazy person :P
[08:38:16] <addshore>	 it actually turned out to be quite a nice query
[08:38:25] <addshore>	 my first attempt didnt have the sub queries, and was not as nice
[08:38:29] * joal thinks of how long of a query it takes to be seen as crazy
[09:11:53] <elukey>	 joal: disabled puppet on all the hosts, merging the change to spin up the testing cluster ok?
[09:12:00] <joal>	 please elukey
[09:13:41] <addshore>	 wow, i got totally distracted by coffee and other tickets :P
[09:13:57] <addshore>	 joal: does that query look roughly right as a starting point from what you know of the task?
[09:14:16] <addshore>	 basically, the event logging is on how many page views people interact with these 3 UI elements
[09:14:34] <addshore>	 only counted once per page view for each element
[09:15:06] <addshore>	 and then joining that with the pageview data for the same days, and from that coming up with a % of user views that interacted which each of the UI elements
[09:16:04] <joal>	 addshore: I follow you and it seems correct - Confirming one thing: the 
[09:16:08] <joal>	 aouch again
[09:16:32] <addshore>	 the ouch? touch? aouch? souch? :>PO
[09:16:38] <joal>	 :)
[09:16:49] <addshore>	 slouch? :P
[09:17:30] <joal>	 Very mouch ... The wikibasetermboxinteraction table contains events only for those pageviews that have interacted with UI, right?
[09:17:55] <addshore>	 yes
[09:18:37] <addshore>	 otherwise we would be sending multiple events for every single wikidata page view :/
[09:18:53] <addshore>	 and we only really need a rough % of interactions at this stage
[09:19:37] <joal>	 ok - Something to keep in mind in that namespace_id is not present on all pages (not for mobile app for instancde IIRC), but it should be minimal
[09:20:01] <addshore>	 okay, but I'm already only looking at AND p.access_method = 'desktop'
[09:20:10] <addshore>	 so for this that should be fine?
[09:20:14] <joal>	 Right - Missed that - Should be good
[09:20:17] <addshore>	 whooo
[09:21:14] <joal>	 maybe some more precisions for page patterns to prevent non-entities pages (not sure if they exist, I'm not good at mediawiki-namespaces and special pages stuff)
[09:22:17] <addshore>	 nope, everything in those 2 namespaces on wikidata will have the UI elements that we are tracking, and only those pages ;)
[09:23:07] <joal>	 SOunds good then
[09:23:42] <addshore>	 WHOOO, thanks for double checking my logic
[09:23:59] <addshore>	 sometimes i start writing these queries and get some numbers and then don't believe what I'm seeing / think I have just done it all wrong ;)
[09:25:09] <joal>	 Something to consider in the subqueries: remove the order and limit, and do inner join (days should exist in bot tables) - Since you'll join after, those costly steps are not needed
[09:25:16] <joal>	 addshore: --^
[09:26:33] <joal>	 addshore: Just double checked for yesterday - namespace_id is null for the same constraints you defined for 1.1% of events
[09:27:01] <joal>	 addshore: events being here instances of pageview - checking the view_counts
[09:27:43] <joal>	 0.8% in term of view_count - Seems ok :)
[09:28:08] <addshore>	 namespace_id is null for the same constraints you defined for 1.1% of events, realllly?
[09:28:25] <addshore>	 i wonder what causes that? O_o
[09:28:36] <joal>	 addshore: I don't know :(
[09:28:47] <joal>	 addshore: That would be interesting to be investigated
[09:28:50] <addshore>	 well
[09:29:03] <addshore>	 the namespace_id is added by PHP code into the x analytics header right? (if i remember)?
[09:29:18] <joal>	 addshore: About events in the table, can you explain to me again? I think I have misunderstood one bit you mentioned earlier
[09:29:43] <joal>	 addshore: yes indeed
[09:30:01] <elukey>	 joal: I am running puppet on the druid/stat/etc.. nodes, no changes in config so far
[09:30:02] <addshore>	 does the pageviews table include page views that erroeed out at some point? (i seem to remember no?)
[09:30:28] <joal>	 addshore: we keep only http_stats IN (200, 304) IIRC
[09:30:41] <joal>	 http_status sorry
[09:32:34] <addshore>	 joal: interesting, i'd been keen to have a look at some of those requests that are apparently missing the NS and see if I can spot what is going on
[09:33:01] <addshore>	 >> "addshore: About events in the table, can you explain to me again? I think I have misunderstood one bit you mentioned earlier" YES :)
[09:33:07] <joal>	 Thanks addshore - I'll be interested, keep me posted please :)
[09:33:45] <addshore>	 so, there are 3 UI JS links that we are tracking the interaction of, but all we care about is if on a given page load there is some interaction with those links, now how many times the links are clicked on a single page load, but if each one is
[09:33:55] <joal>	 addshore: About the eventlogging table, for my understanding - It receives events based on interactions with some javascript on the page, and therefore can send multiple events for each pageview - correct?
[09:34:10] <addshore>	 so, when the link is clicked, and if this page load has not already fired an event for this link, an event is sent, but if the user clicks it again, there is no event
[09:34:37] <addshore>	 joal: it can, but we specifically dont do that in the JS side, we will only send each type of event once per page load
[09:34:57] <joal>	 This means the JS reloads the page?
[09:35:03] <joal>	 generating a new pageview
[09:35:17] <addshore>	 the JS doesn't reload the page at all
[09:35:54] <addshore>	 so in a single page view, you could happily sit there and interact with these elements 10000 times with no reloading, and then only 3 events being fired, saying the user used the elements
[09:36:00] <joal>	 addshore: if a client interacts multiple time with this JS, you don't know - You only know the state of the box at load time, right?
[09:36:31] <joal>	 I'm still not on it :)
[09:36:47] <joal>	 So: 3 events per pageview, on per action-type
[09:36:56] <addshore>	 yes, well, so the code that fires the event and send it to the beacon, records within the JS if that event has been sent before on that page view / js session
[09:37:17] <addshore>	 so 3 events per page view max, and 1 event of each type per page view
[09:37:32] <addshore>	 that being event.actionType
[09:37:40] <addshore>	 either "hide", "show", or "all"
[09:38:51] <joal>	 hm - And it sends the events at page destruction? Or sends the events for previous page when new page is seen?
[09:39:07] <addshore>	 it sends the events as soon as the interaction happens
[09:40:52] <joal>	 addshore: right - So you get 1 event if `some` of this interaction type has happened on any give page - I have - Thanks for taking the time to explain :)
[09:41:10] <addshore>	 yes, that's right :)
[09:41:46] <addshore>	 we could have bundled the 3 possible interactions up into 1 event and then sent it on page destruct or something like that, but that was lots more logic, and the traffic of these events is pretty low :)
[10:48:18] <elukey>	 joal: puppet ran on analytics-tool1002, do you want to restart turnilo?
[10:48:35] <joal>	 Will do elukey - Thanks :)
[10:48:39] <joal>	 :q
[10:48:42] <joal>	 oops
[11:12:07] <icinga-wm>	 PROBLEM - Hadoop NodeManager on analytics1031 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[11:13:42] <elukey>	 this is the testing cluster
[11:17:44] <icinga-wm>	 PROBLEM - Hadoop NodeManager on analytics1034 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[11:19:01] <elukey>	 same thing
[11:43:43] <wikibugs>	 10Analytics, 10DBA, 10Operations, 10ops-eqiad, 10Patch-For-Review: swap a2-eqiad PDU with on-site spare - https://phabricator.wikimedia.org/T213748 (10Marostegui) All the systems owned by the DBAs are now off.
[11:52:50] <addshore>	 joal: i was just trying to look at something in turnilo :) and I wanted to filter the uri_query for results with "wbcheckconstraints" in them
[11:53:05] <addshore>	 I tried using "contains" but not sure if that is the right thing, and trying that seemed to time out :)(
[11:54:23] <addshore>	 trying regex also times out :(
[12:20:10] <joal>	 hm
[12:20:33] <elukey>	 joal: 8 hadoop workers are going down, not really sure why we have so many in one rack
[12:20:57] <elukey>	 as follow up step I'd need to verify if we could spread them around
[12:27:48] <wikibugs>	 10Analytics, 10EventBus, 10serviceops: Include git in our alpine docker image on docker-registry.wikimedia.org - https://phabricator.wikimedia.org/T213963 (10fselles) @Ottomata you can create a blubberfile like     ` version: v3 base: docker-registry.wikimedia.org/wikimedia-stretch apt:   packages:     - git...
[13:00:11] <elukey>	 so HDFS is already replicating the blocks that are offline
[13:00:32] <elukey>	 we'll have some over replicated blocks but it should get cleared later on
[13:00:44] <elukey>	 the maintenance should last a couple of hours (hopefully)
[13:00:50] <elukey>	 then we have another rack
[13:05:58] <icinga-wm>	 PROBLEM - Check status of defined EventLogging jobs on eventlog1002 is CRITICAL: CRITICAL: Stopped EventLogging jobs: eventlogging-consumer@mysql-m4-master-00 eventlogging-consumer@mysql-eventbus
[13:09:09] <elukey>	 buuuuu
[13:09:59] <elukey>	 re-downtimed
[13:10:21] <joal>	 :(
[13:10:26] <joal>	 Check status of defined EventLogging jobs on eventlog1002 is CRITICAL: CRITICAL: Stopped EventLogging jobs: eventlogging-consumer@mysql-m4-master-00 
[13:10:29] <joal>	                       eventlogging-consumer@mysql-eventbus
[13:10:32] <joal>	 miscpate
[13:10:34] <joal>	 oops
[13:47:35] <nuria>	 elukey: to get the is_pageview to show up on turnilo did you do anything other than adding it to the config (and restart) ?
[13:47:45] <joal>	 nuria: nope
[13:48:03] <nuria>	 joal: man... i tried taht yesterday with my local config to no avail
[13:48:23] <nuria>	 fdans: i think i missed chu chu train yesterday
[13:48:23] <joal>	 nuria: it would have showed up if `introspection: true` was set, but we decided not
[13:48:31] <nuria>	 joal: i tried that too
[13:48:40] <nuria>	 joal: that is why i started looking ta jobs
[13:49:08] <fdans>	 nuria: ahhhh I was gonna mention it in standup and it completely went off my head!
[13:49:10] <nuria>	 joal: i still do not get it, i have changed config a bunch of times on turnilo ....anyways, all working now
[13:49:40] <joal>	 Seems ok yes
[13:50:11] <elukey>	 as FYI people all the alarms that you are seeing are due to rack maintenance
[13:50:12] <joal>	 it's also present in Superset
[13:50:22] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Add is_pageview as a dimension to the 'webrequest_sampled_128' Druid dataset - https://phabricator.wikimedia.org/T212778 (10Nuria) Nice to know that druid admin interface displayed all dimensions: webrequest_source hostname time_firstbyte ip http_status re...
[13:50:34] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Add is_pageview as a dimension to the 'webrequest_sampled_128' Druid dataset - https://phabricator.wikimedia.org/T212778 (10Nuria) 05Open→03Resolved
[13:50:56] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Add is_pageview as a dimension to the 'webrequest_sampled_128' Druid dataset - https://phabricator.wikimedia.org/T212778 (10JAllemandou) Turnilo needed a patch (`webrequest_sampled_128` datasource had intrspection disabled), and I manually updated the colu...
[13:51:10] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Move users from stat1005 to stat1007 - https://phabricator.wikimedia.org/T205846 (10Nuria) 05Open→03Resolved
[13:54:18] <elukey>	 joal: is this needed? https://hue.wikimedia.org/oozie/list_oozie_workflow/0007764-181009135629101-oozie-oozi-W/
[13:54:30] <wikibugs>	 10Analytics: Alarms for virtualpageview should exist (probably in oozie) for jobs that have been idle too long - https://phabricator.wikimedia.org/T213716 (10Nuria) The hue dashboard for workflows displays SLAS (probably not big news) . Looked at docs and i saw we can add an alarm for durantion of job  which "se...
[13:55:24] <joal>	 WAT? elukey ?
[13:55:40] <elukey>	 I was checking pageview-hourly-wf-2019-1-12-14 and I saw that one
[13:56:11] <joal>	 elukey: something weird must be happening - 2019-1-12-14 is 7 days ago
[13:57:24] <joal>	 elukey: the current version of the pageview_hourly get back to 2018-10-19:12:00
[13:57:26] <elukey>	 !log re-run pageview-hourly-wf-2019-1-12-14's coordinator 
[13:57:27] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:57:32] <joal>	 Killing the job in prep
[13:57:39] <elukey>	 super
[13:57:52] <joal>	 elukey: I don't get it - When I checked, it was successfull?
[13:58:08] <elukey>	 it showed failed to me :(
[13:58:35] <joal>	 ok :)
[13:58:50] <joal>	 maybe I'm still not recovered after all///
[14:00:36] <joal>	 elukey: something weird here - cassandra pageview job has been run for 2019-01-12 --> data was available
[14:04:15] <elukey>	 it is indeed a bit weird
[14:04:24] <elukey>	 I swear that the pageview coord was red :)
[14:06:18] <wikibugs>	 10Analytics: Alarms for virtualpageview should exist (probably in oozie) for jobs that have been idle too long - https://phabricator.wikimedia.org/T213716 (10JAllemandou) I have a suggestion.  We could set the `<timeout>XXX</timeout>` control in oozie coordinators, replacing XXX with number of seconds before the...
[14:08:16] <wikibugs>	 10Analytics, 10Technical-Debt: Remove Zero support in analytics - https://phabricator.wikimedia.org/T213770 (10JAllemandou) To discuss with the team: Do we want to drop the column or nullifying the field would be enough? For webrequest, since data is dropped after 2 month, we can first nullify then drop for re...
[14:15:58] <wikibugs>	 (03CR) 10Joal: "I actually think that the split-query and udf-registration extracted is not a good idea - it prevents a clear understanding of what's goin" (031 comment) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/480796 (https://phabricator.wikimedia.org/T210543) (owner: 10Joal)
[14:18:15] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: JVM pauses cause Yarn master to failover - https://phabricator.wikimedia.org/T206943 (10Nuria) 05Open→03Resolved
[14:18:44] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Phabricator, 10Wikimedia-Stream, 10Patch-For-Review: Move KafkaSSE development from Differential to Github - https://phabricator.wikimedia.org/T212420 (10Nuria) 05Open→03Resolved
[14:19:48] <joal>	 ebernhardson: Hi! For when you're up - The query_clicks_daily job is suspended from a long time (2018-10-30) - is it expected? Shall we resume it? https://hue.wikimedia.org/oozie/list_oozie_coordinator/0022615-181112144035577-oozie-oozi-C/
[14:25:15] <wikibugs>	 10Analytics, 10EventBus, 10serviceops: Include git in our alpine docker image on docker-registry.wikimedia.org - https://phabricator.wikimedia.org/T213963 (10Ottomata) Yar ok.  I don't really want to put the schemas into EventGate, soooo I'll make a deploy repo after all! :)
[14:26:01] <wikibugs>	 10Analytics, 10EventBus, 10serviceops: Include git in our alpine docker image on docker-registry.wikimedia.org - https://phabricator.wikimedia.org/T213963 (10Ottomata) Hm, well, I did consider doing this for final prod deployment.........yar ok.  Nevermind.  I'll DO IT!
[14:26:11] <wikibugs>	 10Analytics-Kanban, 10Patch-For-Review: Clickstream job failing  due to change of types of namespace column - https://phabricator.wikimedia.org/T211717 (10Nuria) 05Open→03Resolved
[14:26:18] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Failure while refining webrequest upload 2018-12-01-14. Upgrade alarms - https://phabricator.wikimedia.org/T211000 (10Nuria) 05Open→03Resolved
[14:26:38] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Presto cluster online and usable with test data pushed from analytics prod infrastructure accessible by Cloud (labs) users - https://phabricator.wikimedia.org/T204951 (10Nuria)
[14:26:40] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Operations, 10netops: Figure out networking details for new cloud-analytics-eqiad Hadoop/Presto cluster - https://phabricator.wikimedia.org/T207321 (10Nuria) 05Open→03Resolved
[14:26:48] <wikibugs>	 10Analytics, 10Analytics-Kanban: Create Office Hours for Team Analytics - https://phabricator.wikimedia.org/T211609 (10Nuria)
[14:30:49] <wikibugs>	 10Analytics-Kanban: Make edit data lake data available as a snapshot on dump hosts that can be sourced by Presto - https://phabricator.wikimedia.org/T214043 (10Nuria) p:05Triage→03High
[14:46:21] <elukey>	 joal: ssh -L 50070:analytics1028.eqiad.wmnet:50070 analytics1028.eqiad.wmnet \o/
[14:47:18] <joal>	 Hurray elukey :)
[14:53:06] <elukey>	 joal,ottomata - today I was wondering how much load it would cause another coordinator to kafka jumbo
[14:53:12] <elukey>	 as a mental exercise
[14:53:29] <ottomata>	 ?
[14:54:08] <elukey>	 I mean that camus is a bit brutal when consuming from kafka
[14:54:20] <joal>	 elukey: you mean, how much load another camus process for instance I assume
[14:54:23] <joal>	 right
[14:54:24] <elukey>	 yeah
[14:54:28] <joal>	 hm
[14:54:35] <elukey>	 and if the testing one would interfere with the current one
[14:54:56] <elukey>	 maybe just limiting map red processes for testing is sufficient
[14:55:13] <elukey>	 also morning ottomata :)
[14:56:16] <joal>	 elukey: maybe also reducing the size of data we get?
[14:56:50] <joal>	 elukey: not sure how we can do that though (going for upload only for instance would be a wasy, but big already, and not relevant in term of variability of data
[14:57:30] <ottomata>	 elukey:  you mean camus consuming webreqeuest into new cluster?
[14:57:44] <elukey>	 ottomata: yes yes, even other things
[14:57:48] <ottomata>	 you could consume a different topic if you just want to test camus
[14:58:05] <elukey>	 well in theory we need to test as much as possible to avoid surprises
[14:58:06] <ottomata>	 (good mornign!)
[14:58:09] <ottomata>	 aye
[14:58:20] <ottomata>	 does camus allow us to consume only a single partition?
[14:58:40] <ottomata>	 not suree it might@!
[14:59:24] <elukey>	 ok so I'd need to put some thinking into tweaking the testing coordinator
[15:00:04] <wikibugs>	 10Analytics, 10EventBus, 10serviceops: Include git in our alpine docker image on docker-registry.wikimedia.org - https://phabricator.wikimedia.org/T213963 (10Ottomata) 05Open→03Declined For development phase, i'll use thee wmfdebug image.  For prod deployment (outside of staging k8s), we'll build the sch...
[15:00:07] <wikibugs>	 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10EventBus, and 3 others: Modern Event Platform: Stream Intake Service: Implementation: Deployment Pipeline - https://phabricator.wikimedia.org/T211247 (10Ottomata)
[15:00:54] <ottomata>	 elukey:  coordinator?
[15:01:03] <ottomata>	 ohhh i see
[15:01:09] <ottomata>	 instead of just applying all the roles
[15:01:10] <ottomata>	 hmmm
[15:01:18] <ottomata>	 elukey:  my guess is it would be ok
[15:01:26] <ottomata>	 it probably would add extra load to kafka, but i would guess its ok
[15:01:44] <ottomata>	 but we should be careful to just apply the role as is, there are probably things that won't work 100%
[15:01:45] <icinga-wm>	 PROBLEM - HDFS corrupt blocks on an-master1001 is CRITICAL: 10 ge 5 https://grafana.wikimedia.org/dashboard/db/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=39&fullscreen
[15:02:18] <elukey>	 the nodes of rack a2 are coming up
[15:02:29] <elukey>	 a3 sorry
[15:30:47] <icinga-wm>	 PROBLEM - Disk space on Hadoop worker on analytics1056 is CRITICAL: connect to address 10.64.5.19 port 5666: Connection refused
[15:30:47] <icinga-wm>	 PROBLEM - Hadoop DataNode on analytics1056 is CRITICAL: connect to address 10.64.5.19 port 5666: Connection refused
[15:31:13] <icinga-wm>	 PROBLEM - YARN NodeManager Node-State on analytics1056 is CRITICAL: connect to address 10.64.5.19 port 5666: Connection refused
[15:31:31] <icinga-wm>	 PROBLEM - Hadoop NodeManager on analytics1056 is CRITICAL: connect to address 10.64.5.19 port 5666: Connection refused
[15:31:47] <elukey>	 this is probably the rack
[15:34:58] <elukey>	 mforns: o/
[15:35:03] <mforns>	 elukey, hi!
[15:35:07] <elukey>	 sorry two workers are down :(
[15:35:11] <elukey>	 can you wait 10 mins?
[15:35:26] <mforns>	 elukey, no problem I'm in the hangouts, just come in when you're ready
[15:36:10] <elukey>	 mforns: there is also some maintenance to do on another rack, I hoped that it would have ended by now but it has still to start.. Do you mind if we reschedule? Even tomorrow 
[15:36:14] <elukey>	 really sorry
[15:36:24] <wikibugs>	 10Analytics, 10Discovery: Workflow to be able to move data files computed in jobs from analytics cluster to production - https://phabricator.wikimedia.org/T213976 (10Ottomata) In our discussion yesterday, we mention both git lfs and swift as an option for this.  I turned that idea down, but it seems it has bee...
[15:36:32] <ebernhardson>	 joal: doh! i stopped that to test something and forgot to re-deploy the new version. thanks!
[15:36:33] <mforns>	 elukey, sure, we can do it tomorrow, no problemo at all
[15:37:07] <mforns>	 elukey, just move it to same time, or whenever is good for you starting at 16h
[15:44:35] <icinga-wm>	 PROBLEM - Hadoop NodeManager on analytics1054 is CRITICAL: connect to address 10.64.5.17 port 5666: Connection refused
[15:44:43] <icinga-wm>	 PROBLEM - Hadoop DataNode on analytics1054 is CRITICAL: connect to address 10.64.5.17 port 5666: Connection refused
[15:45:29] <icinga-wm>	 PROBLEM - Disk space on Hadoop worker on analytics1054 is CRITICAL: connect to address 10.64.5.17 port 5666: Connection refused
[15:46:29] <icinga-wm>	 PROBLEM - Disk space on Hadoop worker on analytics1056 is CRITICAL: connect to address 10.64.5.19 port 5666: Connection refused
[15:46:29] <icinga-wm>	 PROBLEM - Hadoop DataNode on analytics1056 is CRITICAL: connect to address 10.64.5.19 port 5666: Connection refused
[15:46:57] <icinga-wm>	 PROBLEM - YARN NodeManager Node-State on analytics1056 is CRITICAL: connect to address 10.64.5.19 port 5666: Connection refused
[15:48:57] <elukey>	 I think that we have two broken disks
[15:49:03] <elukey>	 amazing
[15:49:13] <icinga-wm>	 PROBLEM - YARN NodeManager Node-State on analytics1054 is CRITICAL: connect to address 10.64.5.17 port 5666: Connection refused
[16:10:09] <elukey>	 joal: how is pageview daily ingested in turnilo?
[16:14:16] <elukey>	 neverming :)
[16:14:19] <elukey>	 *mind
[16:18:07] <icinga-wm>	 RECOVERY - Hadoop NodeManager on analytics1056 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[16:18:09] <icinga-wm>	 RECOVERY - YARN NodeManager Node-State on analytics1056 is OK: OK: YARN NodeManager analytics1056.eqiad.wmnet:8041 Node-State: RUNNING
[16:18:15] <icinga-wm>	 RECOVERY - Disk space on Hadoop worker on analytics1056 is OK: DISK OK
[16:18:17] <icinga-wm>	 RECOVERY - Hadoop DataNode on analytics1056 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.hdfs.server.datanode.DataNode
[16:22:39] <wikibugs>	 10Analytics: Broken disk on analytics1056 - https://phabricator.wikimedia.org/T214057 (10elukey) p:05Triage→03Normal
[16:22:59] <wikibugs>	 10Analytics: Broken disk on analytics1056 - https://phabricator.wikimedia.org/T214057 (10elukey)
[16:30:02] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Product-Analytics: dbstore1002 Mysql errors - https://phabricator.wikimedia.org/T213670 (10Marostegui) Another crash: ` InnoDB: Warning: a long semaphore wait: --Thread 139957469411072 has waited at dict0stats.cc line 2391 for 241.00 seconds the semaphore: X-lock (wait_ex)...
[16:31:00] <wikibugs>	 10Analytics, 10Knowledge-Integrity, 10Research, 10Epic: Citation Usage: run third round of data collection - https://phabricator.wikimedia.org/T213969 (10bmansurov)
[16:37:31] <icinga-wm>	 RECOVERY - Hadoop NodeManager on analytics1054 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[16:37:41] <icinga-wm>	 RECOVERY - YARN NodeManager Node-State on analytics1054 is OK: OK: YARN NodeManager analytics1054.eqiad.wmnet:8041 Node-State: RUNNING
[16:38:07] <icinga-wm>	 RECOVERY - Hadoop DataNode on analytics1054 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.hdfs.server.datanode.DataNode
[16:38:21] <icinga-wm>	 RECOVERY - Disk space on Hadoop worker on analytics1054 is OK: DISK OK
[16:42:13] <elukey>	 all right workers up
[16:42:14] <elukey>	 sigh
[16:47:21] <wikibugs>	 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Core Platform Team (Modern Event Platform (TEC2)), and 2 others: Modern Event Platform: Schema Registry: Implementation - https://phabricator.wikimedia.org/T206789 (10Ottomata) @Pchelolo, so aside from the eventual HTTP based schema registry idea, we wil...
[16:51:10] <wikibugs>	 10Analytics, 10Analytics-EventLogging, 10Operations, 10Performance-Team, 10Software-Licensing: New MongoDB version is not DFSG-compatible, dropped by Debian - https://phabricator.wikimedia.org/T213996 (10CDanis) p:05Triage→03Normal
[16:54:06] <elukey>	 mysql on dbstore1002 again down
[16:58:29] <wikibugs>	 10Analytics, 10Knowledge-Integrity, 10Research, 10Epic: Citation Usage: run third round of data collection - https://phabricator.wikimedia.org/T213969 (10DarTar) @bmansurov thanks for announcing the timeline. Is it unrealistic to have the patch reviewed and a deployment scheduled before All Hands week? Thi...
[17:00:09] * Nettrom sends some sympathy to the dbstore1002 admins
[17:01:38] <nuria>	 ping joal milimetric ottomata standdduppp 
[17:01:39] <wikibugs>	 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Core Platform Team (Modern Event Platform (TEC2)), and 2 others: Modern Event Platform: Schema Registry: Implementation - https://phabricator.wikimedia.org/T206789 (10Pchelolo) > Should we create a new schema repo now for analytics purposes, or should we...
[17:01:41] <ottomata>	 UH OH
[17:07:05] <wikibugs>	 10Analytics, 10Knowledge-Integrity, 10Research, 10Epic: Citation Usage: run third round of data collection - https://phabricator.wikimedia.org/T213969 (10bmansurov) @DarTar The patches are up for a review (I pinged @EBernhardson too). I may need your help to expedite this. Once they are reviewed, we probab...
[17:26:49] <wikibugs>	 10Analytics, 10Discovery: Workflow to be able to move data files computed in jobs from analytics cluster to production - https://phabricator.wikimedia.org/T213976 (10Ottomata) @fgiunchedi @mmodell got a sec sometime to discuss this?  Here's a quick summary:  Folks generate large-ish binary artifacts (e.g. ML m...
[17:32:49] <wikibugs>	 (03CR) 10Joal: "Comment inline - Thanks for that :)" (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/484657 (https://phabricator.wikimedia.org/T206894) (owner: 10Fdans)
[17:35:34] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Product-Analytics: dbstore1002 Mysql errors - https://phabricator.wikimedia.org/T213670 (10Marostegui) Replication flowing, I am glad we migrated most of those MyISAM tables away, so far it is working fine
[17:47:41] <nuria>	 ping ottomata grossskin
[17:47:52] <ottomata>	 ACK
[17:50:55] <wikibugs>	 10Analytics, 10Discovery: Workflow to be able to move data files computed in jobs from analytics cluster to production - https://phabricator.wikimedia.org/T213976 (10mforns) p:05Triage→03Normal
[17:52:58] <wikibugs>	 10Analytics: Add user_properties mysql table data to hadoop cluster - https://phabricator.wikimedia.org/T213910 (10mforns) user_properties is not the best case for one-off sqoop, because it is constantly updated. We would benefit from a real time approach, but this is not going to happen in the near future.
[17:53:15] <wikibugs>	 10Analytics: Add user_properties mysql table data to hadoop cluster - https://phabricator.wikimedia.org/T213910 (10mforns) p:05Triage→03Normal
[17:55:03] <icinga-wm>	 RECOVERY - Check status of defined EventLogging jobs on eventlog1002 is OK: OK: All defined EventLogging jobs are runnning.
[17:57:32] <elukey>	 re-enabled eventlogging mysql consumers
[17:58:36] <wikibugs>	 10Analytics, 10Analytics-Wikistats: [Wikistats v2] Default selection for (active) editors is confusing for inexperienced users - https://phabricator.wikimedia.org/T213800 (10mforns) p:05Triage→03Normal
[17:59:11] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Technical-Debt: Remove Zero support in analytics - https://phabricator.wikimedia.org/T213770 (10mforns) a:03JAllemandou
[18:00:01] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Technical-Debt: Remove Zero support in analytics - https://phabricator.wikimedia.org/T213770 (10mforns) p:05Triage→03Normal
[18:01:59] <wikibugs>	 10Analytics: Use MaxMind DB in piwik geo-location - https://phabricator.wikimedia.org/T213741 (10mforns) p:05Triage→03Low
[18:02:36] <wikibugs>	 10Analytics, 10Analytics-Kanban: Create staging environment for superset - https://phabricator.wikimedia.org/T213923 (10mforns) p:05High→03Normal
[18:04:37] <wikibugs>	 10Analytics, 10Operations, 10ops-eqiad, 10Patch-For-Review: Broken disk on analytics1056 - https://phabricator.wikimedia.org/T214057 (10mforns)
[18:05:53] <wikibugs>	 10Analytics, 10Analytics-EventLogging, 10Operations, 10Performance-Team, 10Software-Licensing: New MongoDB version is not DFSG-compatible, dropped by Debian - https://phabricator.wikimedia.org/T213996 (10mforns) We do not use MongoDB in EventLogging production. Thanks for the heads up. Removing EventLogg...
[18:14:04] <wikibugs>	 10Analytics, 10Operations, 10Performance-Team, 10Software-Licensing: New MongoDB version is not DFSG-compatible, dropped by Debian - https://phabricator.wikimedia.org/T213996 (10mforns)
[18:15:16] * elukey off!
[18:28:39] <gehel>	 ottomata: hey! you know anything about cassandra packaging? Seems that both Luca and Eric are off
[18:36:31] <ottomata>	 gehel:  i don't!  :/
[18:37:03] <gehel>	 ottomata: thanks anyway :) I should have tried a bit more on my own first, problem already solved
[18:50:11] <wikibugs>	 10Analytics, 10Operations, 10Performance-Team, 10Software-Licensing: New MongoDB version is not DFSG-compatible, dropped by Debian - https://phabricator.wikimedia.org/T213996 (10MaxSem) @mforns, so you don't need `python-pymongo` installed in `eventlogging::dependencies`?
[18:50:57] <wikibugs>	 10Analytics, 10Operations, 10Performance-Team, 10Software-Licensing: New MongoDB version is not DFSG-compatible, dropped by Debian - https://phabricator.wikimedia.org/T213996 (10Ottomata) nope! def not. must be some super legacy thang.
[19:00:21] <wikibugs>	 10Analytics, 10Research, 10Article-Recommendation: Transferring data from Hadoop to production MySQL database - https://phabricator.wikimedia.org/T213566 (10Ottomata) @bmansurov I think we can eventually figure out a way to get your dump files out of analytics to somewhere that can access mysql.  Where and h...
[19:02:13] <wikibugs>	 10Analytics, 10Research, 10Article-Recommendation: Transferring data from Hadoop to production MySQL database - https://phabricator.wikimedia.org/T213566 (10bmansurov)
[19:02:15] <wikibugs>	 10Analytics, 10Discovery: Workflow to be able to move data files computed in jobs from analytics cluster to production - https://phabricator.wikimedia.org/T213976 (10bmansurov)
[19:02:29] <wikibugs>	 10Analytics, 10Research, 10Article-Recommendation: Transferring data from Hadoop to production MySQL database - https://phabricator.wikimedia.org/T213566 (10bmansurov)
[19:02:31] <wikibugs>	 10Analytics, 10Discovery: Workflow to be able to move data files computed in jobs from analytics cluster to production - https://phabricator.wikimedia.org/T213976 (10bmansurov)
[19:07:37] <ottomata>	 be back in a bit, gotta pick up car from shop...
[19:09:05] <wikibugs>	 10Analytics, 10Research, 10Article-Recommendation: Generate article recommendations in Hadoop for use in production - https://phabricator.wikimedia.org/T210844 (10bmansurov)
[19:11:36] <wikibugs>	 10Analytics, 10Research, 10Wikidata: Copy Wikidata dumps to HDFs - https://phabricator.wikimedia.org/T209655 (10bmansurov)
[19:42:01] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Product-Analytics: Bug: can't make a YoY time series chart in Superset - https://phabricator.wikimedia.org/T210687 (10Tbayer) For some reason, that link only shows me data up to March - even after changing the metric to `SUM(view_count)` (which I guess was intended) and hit...
[19:44:46] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Product-Analytics: Bug: can't make a YoY time series chart in Superset - https://phabricator.wikimedia.org/T210687 (10Nuria) Just try again, teh tiem granularity needs to be set to "day"
[19:46:20] <wikibugs>	 10Analytics, 10Research, 10Article-Recommendation: Transferring data from Hadoop to production MySQL database - https://phabricator.wikimedia.org/T213566 (10Nuria) >Unless...unless that is we can actually write to the MySQL db from Hadoop. I do not think we shoudl consider this an option. We should ahve a cl...
[19:50:12] <wikibugs>	 10Analytics, 10DBA, 10Operations, 10Research, 10Article-Recommendation: Transferring data from Hadoop to production MySQL database - https://phabricator.wikimedia.org/T213566 (10bmansurov)
[19:50:45] <wikibugs>	 10Analytics, 10DBA, 10Operations, 10Research, 10Article-Recommendation: Transferring data from Hadoop to production MySQL database - https://phabricator.wikimedia.org/T213566 (10bmansurov) @Ottomata thanks! I've updated the task description and ping the groups you mentioned.
[19:51:10] <wikibugs>	 10Analytics, 10DBA, 10Operations, 10Research, 10Article-Recommendation: Transferring data from Hadoop to production MySQL database - https://phabricator.wikimedia.org/T213566 (10bmansurov)
[19:52:15] <wikibugs>	 10Analytics, 10DBA, 10Operations, 10Research, 10Article-Recommendation: Transferring data from Hadoop to production MySQL database - https://phabricator.wikimedia.org/T213566 (10bmansurov) @Banyek and @Dzahn I'd appreciate your input on this task. Thank you.
[19:57:49] <ottomata>	 (back)
[20:01:03] <wikibugs>	 10Analytics, 10DBA, 10Operations, 10Research, 10Article-Recommendation: Transferring data from Hadoop to production MySQL database - https://phabricator.wikimedia.org/T213566 (10Ottomata) > We should have a clear separation of concerns and while the hadoop cluster is in charge of computing the data the t...
[20:04:17] <wikibugs>	 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10EventBus, and 3 others: Modern Event Platform: Stream Intake Service: Implementation: Deployment Pipeline - https://phabricator.wikimedia.org/T211247 (10Ottomata) FYI, the symlinked (or copied, or packaged .tgz) chart is necessary in the `charts/...
[20:06:42] <wikibugs>	 10Analytics, 10DBA, 10Operations, 10Research, 10Article-Recommendation: Transferring data from Hadoop to production MySQL database - https://phabricator.wikimedia.org/T213566 (10Dzahn) How to install the importer scripts is what i started once in https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/476...
[20:20:58] <wikibugs>	 10Analytics, 10Analytics-EventLogging, 10Discovery, 10EventBus, 10Services: Rewrite Avro schemas (ApiAction, CirrusSearchRequestSet) as JSONSchema and produce to EventGate - https://phabricator.wikimedia.org/T214080 (10Ottomata) p:05Triage→03Normal
[20:21:43] <wikibugs>	 10Analytics, 10DBA, 10Operations, 10Research, 10Article-Recommendation: Transferring data from Hadoop to production MySQL database - https://phabricator.wikimedia.org/T213566 (10Nuria) >It has been abandoned after Analytics said to not use stat hosts and use Hadoop instead. To clarify: stats machines sho...
[20:24:28] <wikibugs>	 10Analytics, 10DBA, 10Operations, 10Research, 10Article-Recommendation: Transferring data from Hadoop to production MySQL database - https://phabricator.wikimedia.org/T213566 (10Ottomata) > to have a daemon on the mysql hosts To clarify, it is unlikely these scripts would run on the mysql servers themsel...
[20:24:37] <ottomata>	 bd808:  yt?
[20:24:45] <bd808>	 yup
[20:24:57] <ottomata>	 looking at ApiAction
[20:24:57] <ottomata>	 https://phabricator.wikimedia.org/T214080
[20:25:03] <ottomata>	 https://github.com/wikimedia/mediawiki-event-schemas/blob/master/avro/mediawiki/ApiAction/101453221640.avsc
[20:25:10] <ottomata>	 (because its simpler than CirrusSearchRequestSet)
[20:25:29] * bd808 tries to remember 2015 code
[20:25:29] <ottomata>	 the only difficult thing there is the params map.
[20:25:40] <ottomata>	 i guess here's a required q:
[20:25:43] <ottomata>	 is this data used at all?
[20:25:50] <ottomata>	 if it isn't...we don't have to port it :)
[20:26:10] <bd808>	 yes. its the only analytics we have at all on the ActionAPI
[20:26:13] <ottomata>	 ok
[20:26:17] <ottomata>	 so we port it.
[20:26:21] <ottomata>	 the params map is difficult
[20:26:28] <ottomata>	 since there's no difference between a map and a 'struct' in json
[20:26:29] <ottomata>	 only objects
[20:26:30] <bd808>	 and the map is the main thing really...
[20:26:44] <ottomata>	 do we have an idea of how many possible keys there are?
[20:26:54] <ottomata>	 (i
[20:26:57] <bd808>	 thousands
[20:26:59] <ottomata>	 (i'm guessing no, ...yeah)
[20:27:08] <ottomata>	 ok so putting them all in the schema isn't great
[20:27:13] <ottomata>	 we had this problem with ores schemas too
[20:27:22] <bd808>	 and new ones at any random MediaWiki release
[20:27:25] <ottomata>	 what if it was an array of { name, value} ?
[20:28:00] <bd808>	 then any and all code that accesses the hadoop table would need to be rewritten I guess?
[20:28:01] <ottomata>	 [{name: "format", value: "json"}, {name: "prop", value:"imageinfo"}, ...] 
[20:28:09] <ottomata>	 that's going to be true no matter waht
[20:28:17] <bd808>	 :/
[20:28:18] <ottomata>	 it'll be a new table with a new schema no matter what we do
[20:28:33] <ottomata>	 we won't be turning the old one off until everything is rewritten to the new one
[20:28:42] <ottomata>	 so we can have them both exist at the same time
[20:29:02] <ottomata>	 we might eventually be able to support maps
[20:29:11] <ottomata>	 but we'd need our hive importer to know how to do that
[20:29:16] <ottomata>	 right now it just examines the json data
[20:29:23] <ottomata>	 it could be possible to do it from schemas....
[20:29:28] <ottomata>	 i have a prototype for trhat actually..
[20:29:29] <bd808>	 "everything" is pretty much just some cron jobs I have on stats1005(?) and adhoc queries by tgr/anomie/whoever
[20:29:30] <ottomata>	 hmmmm
[20:29:49] <ottomata>	 aye, we can figure out how to port that stuff later
[20:30:18] <ottomata>	 nuria: yt?
[20:30:47] <nuria>	 ottomata: yes
[20:31:07] <bd808>	 as long as the data ends up in hadoop in a way that we can query for say "all records in June that passed foo_bar_baz=10" things will be fine
[20:31:21] <wikibugs>	 10Analytics, 10DBA, 10Operations, 10Research, 10Article-Recommendation: Transferring data from Hadoop to production MySQL database - https://phabricator.wikimedia.org/T213566 (10Nuria) I think one telling use case  the ilustrates why we want to decouple data loading from hadoop is a rollback. Say that yo...
[20:31:27] <nuria>	 ottomata: let me reda backscroll
[20:31:35] <ottomata>	 nuria:  well, my q is about kafkaconnect
[20:31:54] <ottomata>	 right now we can support maps from the Refine job because it doesn't know anything about the schema
[20:32:06] <ottomata>	 there are 3 solutions:
[20:32:39] <ottomata>	 make the schema awkward with arrays of {name, value} instead of real maps.
[20:32:41] <nuria>	 "we can support"  -> "we acnoot support"
[20:32:46] <nuria>	 *cannot right?
[20:32:49] <ottomata>	 cannot*
[20:32:50] <ottomata>	 sorry ya
[20:32:51] <nuria>	 k
[20:33:08] <ottomata>	 augment Refine to know how to access the schemas, and map from JSONSchema to Spark schema
[20:33:11] <ottomata>	 Or
[20:33:24] <ottomata>	 use Kafka Connect with my JSONSchema converter prototype
[20:33:34] <ottomata>	 to do KafkaConnect...we'd ahve to run KafkaConnect somewhere
[20:33:37] <ottomata>	 we want to do this eventually
[20:33:45] <nuria>	 which is basically another refine job with different code entirely right?
[20:33:47] <ottomata>	 but, it might be a lot of work to get all the details 
[20:33:56] <ottomata>	 nuria sort of not really
[20:34:00] <ottomata>	 its more like the camus replacement
[20:34:12] <ottomata>	 but it would create parquet hive tables itself
[20:34:17] <ottomata>	 instead of just json data on hdfs
[20:34:37] <ottomata>	 so, Refine woudl still be used, but more for augmentation rather than format converstion and schema evolution
[20:34:51] <ottomata>	 e.g. geocoding, deduplicating, ua mapping, whatever
[20:34:54] <nuria>	 ottomata: besides api action (that will be easy to change consumer wise, it is just used for ad-hoc query access)
[20:35:02] <ottomata>	 that's true.....
[20:35:18] <nuria>	 ottomata: who else runs into the issue with maps
[20:35:20] <ottomata>	 but, it kinda sucks to make the schema awkward now, since we might be able to support non awkward later
[20:35:25] <ottomata>	 mw rev score did
[20:35:26] <ottomata>	 ores
[20:35:32] <ottomata>	 but we did the name, value thing.
[20:35:43] <ottomata>	 some of the job queue stuff does, but that's because it was ported over
[20:35:47] <ottomata>	 from old stuff
[20:36:00] <ottomata>	 i'm looking at cirrussearchrequest set now
[20:36:08] <ottomata>	 if the keys are few enough, a struct/object is fine
[20:36:17] <ottomata>	 ebernhardson:  yt? ^ :)
[20:37:16] <wikibugs>	 10Analytics, 10Operations, 10Performance-Team, 10Patch-For-Review, 10Software-Licensing: New MongoDB version is not DFSG-compatible, dropped by Debian - https://phabricator.wikimedia.org/T213996 (10Krinkle)
[20:37:19] <wikibugs>	 10Analytics, 10DBA, 10Operations, 10Research, 10Article-Recommendation: Transferring data from Hadoop to production MySQL database - https://phabricator.wikimedia.org/T213566 (10bmansurov) Rollback is already taken care of the in the script level. We'll have different versions of the data in MySQL and ca...
[20:39:39] <nuria>	 ottomata:  I do not understand how  the variable number of parameters would work in a struct though?
[20:44:13] <ottomata>	 nuria:  they wouldn't be variable
[20:44:20] <ottomata>	 if the number of keys is small 
[20:44:23] <ottomata>	 and known
[20:44:26] <ottomata>	 it doesn't need to be a map
[20:44:39] <ottomata>	 they'd be in the schema
[20:45:32] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Community-Tech, 10Event Metrics, 10Community-Tech-Sprint: Add site to piwik.wikimedia.org for Event Metrics so we can measure traffic to tool - https://phabricator.wikimedia.org/T213735 (10MusikAnimal) @Nuria We have deployed to production. I can see in the network log...
[20:46:02] <wikibugs>	 10Analytics, 10DBA, 10Operations, 10Research, 10Article-Recommendation: Transferring data from Hadoop to production MySQL database - https://phabricator.wikimedia.org/T213566 (10Nuria) @bmansurov how do handle deleting data in your storage when you have reached capacity or when that dataset is bad? There...
[20:46:39] <nuria>	 bd808: is the params argument variable in the avro schema ? 
[20:47:11] <bd808>	 nuria: yes. its is the GET/POST parameters passed to the api
[20:47:26] <bd808>	 or a filtered subset of them
[20:47:42] <nuria>	 bd808: is that filter a whitelist that exists somewhere? ccott
[20:47:44] <bd808>	 probably the latter actually. Let me see if I can find the MediaWiki side
[20:47:48] <nuria>	 cc  ottomata 
[20:49:28] <ottomata>	 intersting, ya
[20:49:38] <ottomata>	 thousands of possible params is pretty huge thoug
[20:49:43] <ottomata>	 not sure if that's what we want in the schema :)
[20:50:01] <ottomata>	 i'm looking at the cirrussearchrequestset now, it looks like the number of keys in their maps is small enough
[20:51:22] <bd808>	 nope, I was wrong. Its all param names, but some values are redacted -- https://github.com/wikimedia/mediawiki/blob/a7ad3f7358ed6f3525d4f313970ccc8af95123f6/includes/api/ApiMain.php#L1658-L1677
[20:52:02] <wikibugs>	 10Analytics, 10DBA, 10Operations, 10Research, 10Article-Recommendation: Transferring data from Hadoop to production MySQL database - https://phabricator.wikimedia.org/T213566 (10bmansurov) @Nuria   > how do handle deleting data in your storage when you have reached capacity or when that dataset is bad? T...
[20:54:16] <ottomata>	 aye ok
[20:56:02] <wikibugs>	 10Analytics, 10Operations, 10ops-eqiad: Rack A2's hosts alarm for PSU broken - https://phabricator.wikimedia.org/T212861 (10RobH)
[20:56:07] <wikibugs>	 10Analytics, 10DBA, 10Operations, 10ops-eqiad, 10Patch-For-Review: swap a2-eqiad PDU with on-site spare - https://phabricator.wikimedia.org/T213748 (10RobH) 05Open→03Resolved a:03RobH Synced up with Chris via IRC:  All systems were able to come back up within a2 without incident.  The spare PDU is...
[21:00:12] <icinga-wm>	 RECOVERY - HDFS corrupt blocks on an-master1001 is OK: (C)5 ge (W)2 ge 0 https://grafana.wikimedia.org/dashboard/db/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=39&fullscreen
[21:23:09] <wikibugs>	 10Analytics: Update git lfs on stat1006/7 - https://phabricator.wikimedia.org/T214089 (10Halfak)
[21:29:19] <wikibugs>	 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Core Platform Team (Modern Event Platform (TEC2)), and 2 others: CI Support for Schema Registry - https://phabricator.wikimedia.org/T206814 (10Ottomata) a:03Pchelolo
[21:29:49] <wikibugs>	 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Core Platform Team (Modern Event Platform (TEC2)), and 2 others: Git Commit hook that adds a whole new file when a new version of schema is committed - https://phabricator.wikimedia.org/T206812 (10Ottomata) a:03Pchelolo
[21:30:11] <wikibugs>	 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Core Platform Team (Modern Event Platform (TEC2)), and 2 others: Develop a library for JSON schema backwards incompatibility detection - https://phabricator.wikimedia.org/T206889 (10Ottomata) a:03Pchelolo
[21:38:24] <wikibugs>	 10Analytics-EventLogging, 10Analytics-Kanban, 10EventBus, 10Product-Analytics, and 4 others: Modern Event Platform: Schema Guidelines and Conventions - https://phabricator.wikimedia.org/T214093 (10Ottomata) p:05Triage→03Normal
[21:38:43] <wikibugs>	 10Analytics-EventLogging, 10Analytics-Kanban, 10EventBus, 10Core Platform Team (Modern Event Platform (TEC2)), and 3 others: Modern Event Platform (TEC2) - https://phabricator.wikimedia.org/T185233 (10Ottomata)
[21:41:32] <wikibugs>	 10Analytics, 10Analytics-EventLogging, 10Discovery, 10EventBus, 10Services (watching): Rewrite Avro schemas (ApiAction, CirrusSearchRequestSet) as JSONSchema and produce to EventGate - https://phabricator.wikimedia.org/T214080 (10Pchelolo)
[21:45:09] <wikibugs>	 10Analytics-EventLogging, 10Analytics-Kanban, 10EventBus, 10Product-Analytics, and 4 others: Modern Event Platform: Schema Guidelines and Conventions - https://phabricator.wikimedia.org/T214093 (10Ottomata) I'm opening this task now, as we are starting to design some new schemas as part of our goals for Q3...
[22:37:51] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Community-Tech, 10Event Metrics, 10Community-Tech-Sprint: Add site to piwik.wikimedia.org for Event Metrics so we can measure traffic to tool - https://phabricator.wikimedia.org/T213735 (10Nuria) 05Open→03Resolved
[22:38:25] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Community-Tech, 10Event Metrics, 10Community-Tech-Sprint: Add site to piwik.wikimedia.org for Event Metrics so we can measure traffic to tool - https://phabricator.wikimedia.org/T213735 (10Nuria) Confirmed with @MusikAnimal that he can see usage dashboard cc @jmatazzoni...
[22:38:57] <wikibugs>	 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10MW-1.33-notes (1.33.0-wmf.6; 2018-11-27): dogs - https://phabricator.wikimedia.org/T214104 (10Bsausage64) 05Open→03Invalid p:05Triage→03Unbreak!
[23:03:41] <wikibugs>	 10Analytics, 10Operations, 10Research, 10Article-Recommendation, 10User-Marostegui: Transferring data from Hadoop to production MySQL database - https://phabricator.wikimedia.org/T213566 (10Marostegui) I don't think writing from Hadoop directly to M2 master is a good idea. But it is not really my call....
[23:46:39] <Nettrom>	 a-team: looks like EventLogging on betalabs doesn't copy events into MariaDB as expected, last update timestamp for three of our tables is sometime on 2019-01-04. Is there a magic button to press to turn it on again?