[11:29:57] * elukey lunch!
[12:36:24] (PS1) Fdans: Release 2.0.11 [analytics/wikistats2] - https://gerrit.wikimedia.org/r/393584
[12:36:52] (CR) Fdans: [V: 2 C: 2] Release 2.0.11 [analytics/wikistats2] - https://gerrit.wikimedia.org/r/393584 (owner: Fdans)
[12:40:13] (PS1) Fdans: Release 2.0.11 [analytics/wikistats2] (release) - https://gerrit.wikimedia.org/r/393586
[12:40:55] (CR) Fdans: [V: 2 C: 2] Release 2.0.11 [analytics/wikistats2] (release) - https://gerrit.wikimedia.org/r/393586 (owner: Fdans)
[12:58:20] joal: o/
[12:58:21] https://grafana.wikimedia.org/dashboard/db/prometheus-druid
[12:58:54] I updated the dashboard with percentiles (p50/75/90) for all the query metrics (both historical and broker)
[12:58:58] funny things to notice
[12:59:24] in druid_analytics, we have only druid1001's metrics listed for the broker
[12:59:44] and I am pretty sure that it is due to the fact that druid1001 is hardcoded in pivot's config :P
[12:59:54] we should have a load balanced ip now
[12:59:56] checking
[13:03:02] mmmm no there was the issue with the firewall yeah
[13:03:03] uffff
[13:03:19] anyhow, metrics look good!
[13:14:17] ok now those are aggregated by datasource
[13:28:32] (PS1) Fdans: [wip] Add pageviews by country endpoint [analytics/aqs] - https://gerrit.wikimedia.org/r/393591
[13:31:18] Heya elukey :) Thanks a lot for putting this together, it's really great :)
[13:31:26] \o/
[13:32:54] elukey: The meetup I went to on Friday was really interesting
[13:33:23] elukey: It would be great to discuss potential usages for warp10 - I'm sure we would find some
[13:33:57] Also elukey - Not sure if you've seen Faidon's comments on T167907 and T138396
[13:33:57] T167907: Incorporate data from the GeoIP2 ISP database to webrequest - https://phabricator.wikimedia.org/T167907
[13:33:57] T138396: Create ops dashboard with info like ipv6 traffic split - https://phabricator.wikimedia.org/T138396
[13:34:12] elukey: I feel a bit ashamed we've not put enough effort into those aspects
[13:35:05] ah nice!
[13:36:26] nah, thanks to your spark/druid work we'll be able to add netflow data in realtime, so indirectly we are doing great in my opinion :)
[13:37:05] I hear elukey, but the IPv6 stuff has already been discussed last year at all-hands, and nothing yet done :(
[13:37:15] Arf anyway, I'm a never-happy man :-P
[13:38:27] hey joal I know this is around your break time, but do you have a few minutes to talk about by country data sometime before standup?
[13:39:17] fdans: no afternoon break on monday, I regularly start late ;)
[13:39:26] So yeah, let's do it when you want fdans :)
[13:39:38] oh then cave now joal ?
[13:39:45] sure fdans
[13:48:26] ottomata: o/ morninggggggg
[13:48:43] whenever you have time I'd need some brain bounce with you for the prometheus stuff in hadoop
[13:48:58] there is a puppet thing that I am not sure how to do
[13:51:17] ok great! luca let's do now real quick? i'm still checking email and stuff so it'll be a while
[13:51:56] elukey: ^ :)
[13:52:29] ottomata: nono don't worry, later on is super fine!
Just wanted to book a slot of your time :)
[13:53:31] lets do now!
[13:53:41] a-batcave-2 elukey
[13:53:52] ack, grabbing headphones
[14:19:31] hi everyone
[14:21:27] Hi milimetric :)
[14:21:35] hi joal
[14:22:57] thanks for the geowiki fix, elukey
[14:29:00] :)
[14:29:04] wow, joal, that email with Erik sounds very promising
[14:29:35] milimetric: I hope we're close to the end of the research - Next, fixing time will come ;)
[14:29:58] Analytics, Operations, hardware-requests: Refresh or replace oxygen - https://phabricator.wikimedia.org/T181264#3788764 (Ottomata) For me, oxygen is not that useful, as we have 90 days of queryable webrequest data in Hadoop. I suppose it is nice to be able to do some quick sed/awk/jq magic on sample...
[14:30:24] happy to brain-bounce or pair joal
[14:30:44] Thanks milimetric, will definitely ping you soon
[14:37:13] Analytics, Operations, hardware-requests: Refresh or replace oxygen - https://phabricator.wikimedia.org/T181264#3784670 (fgiunchedi) The syslog servers have 2x spinning disks and 16GB of memory, so comparable to oxygen performance wise. I personally use the dashboard @elukey mentioned to investigate...
[14:40:27] Analytics, EventBus, Services (next): Support multiple partitions per topic in EventBus - https://phabricator.wikimedia.org/T157822#3788790 (Pchelolo) Although this is required for being able to horizontally scale the Kafka-based queue, optimization work done in T181007 made it possible to sustain 90...
[14:48:26] Analytics, Operations, hardware-requests: Refresh or replace oxygen - https://phabricator.wikimedia.org/T181264#3788817 (faidon) I use the sampled-1000 logs from time to time (and the 5xx ones, but less frequently), especially in incident-worthy situations, where speed is of the essence. Additionall...
[14:54:05] Analytics, EventBus, Services (next): Clean up retry-retry Kafka topics - https://phabricator.wikimedia.org/T179958#3788838 (fgiunchedi)
[14:54:44] Analytics, Operations, hardware-requests: Refresh or replace oxygen - https://phabricator.wikimedia.org/T181264#3784670 (BBlack) Yeah I still use oxygen pretty routinely. I often prefer being able to construct a CLI pipeline out of jq/grep/sed/sort/uniq/etc... to using a web UI, and usually sampled-...
[14:55:57] Analytics, Operations, hardware-requests: Refresh or replace oxygen - https://phabricator.wikimedia.org/T181264#3788843 (Ottomata) > I guess I could extract those stats from stat1006 instead? Are the webrequest logs available there? Naw, and we only have unsampled in Hadoop.
[14:57:03] Analytics, Operations, hardware-requests: Refresh or replace oxygen - https://phabricator.wikimedia.org/T181264#3788845 (Ottomata) > Also, the Hadoop data available in the UI doesn't have the same responsiveness in an immediate situation. IIRC it runs up to an hour behind realtime, It actually could...
[14:57:56] Analytics, EventBus, monitoring, Services (next): Clean up retry-retry Kafka topics - https://phabricator.wikimedia.org/T179958#3788847 (fgiunchedi)
[15:14:53] ottomata: did you want to brainbounce on jsonrefine stuff?
[15:17:04] milimetric: hm, not yet
[15:17:11] but maybe in 10 mins
[15:18:45] milimetric: actually, yes lets
[15:18:46] might help me
[15:18:57] in batcave
[15:20:05] omw
[15:38:33] * elukey coffee!
[15:39:52] Analytics, Analytics-Wikistats, Research: Renovation of Wikistats production jobs - https://phabricator.wikimedia.org/T176478#3788948 (Erik_Zachte) script stat1005:/home/ezachte/wikistats/dumps/bash/extract_dump.sh has been adapted to stat1005 it invokes perl file ../perl/WikiDumpFilterArticles.pl w...
[15:46:41] Hey, analytics team folks!
I'd like to chat with someone about the EventLogging system, to make sure that a change that I'm considering for NavTiming makes sense. Who'd be the right person?
[15:46:58] (And on a related note, is this the best place to ping, or should I be emailing your team list?)
[15:53:02] Hi marlier! you can definitely write in here if you wish, but I'd also consider following up with the owners of the NavTiming schema (Perf team iirc)
[15:53:26] I'm the new manager of the Perf team :-)
[15:54:20] It's less about the schema, and more about 1) the details of making a schema change and 2) making sure that I'm not going to accidentally do something that will have a negative effect on the eventlogging system as a whole
[15:57:02] It occurs to me that the best approach is probably a much more detailed Phab task than what currently exists. I'll get that squared away and loop back shortly.
[15:57:42] marlier: chatting here is good
[15:58:23] marlier: phab ticket works too but irc is best for fast conversation.
[15:58:27] marlier: and WELCOME!
[16:00:42] ping joal
[16:01:42] Quarry: Can't access database in Quarry - https://phabricator.wikimedia.org/T181411#3789075 (Edgars2007)
[16:02:27] Analytics-Cluster, Analytics-Kanban, Language-Team, MediaWiki-extensions-UniversalLanguageSelector, and 3 others: Migrate table creation query to oozie for interlanguage links - https://phabricator.wikimedia.org/T170764#3789090 (Nuria)
[16:02:42] Analytics-Kanban, Patch-For-Review: Move Wikimetrics to new cloud databases - https://phabricator.wikimedia.org/T180770#3789091 (Nuria)
[16:04:07] Analytics-Kanban: Geowiki stopped updating on October 24th - DATA LOSS (read comments) - https://phabricator.wikimedia.org/T179952#3789115 (Nuria) a:Milimetric>elukey
[16:05:59] Thanks, Nuria! Glad to be here.
[16:06:49] What I'm working on is https://phabricator.wikimedia.org/T181413 -- doing some refactoring of the NavigationTiming extension to allow us to oversample based on specific criteria (geography, user-agent).
[16:07:10] There are two things that I want to make sure won't be an issue.
[16:08:46] The first is pretty straightforward: I want to add a new field to the schema. My understanding is that I can just create a new rev of the schema page, change the revision specifier in extension.json, and the event logging system will pick that up when it starts receiving events with the new rev included in them. Is that correct?
[16:16:30] marlier: lol sorry! I didn't know your username :)
[16:16:43] welcome indeed!
[16:17:50] Analytics-Kanban, Patch-For-Review, Services (watching): Add action api counts to graphite-restbase job - https://phabricator.wikimedia.org/T176785#3789182 (JAllemandou) Ping ! If no other message comes in before tomorrow, I'll move forward with @Addshore suggestion of keeping existing restbase metri...
[16:49:10] ottomata: whenever you have time - https://gerrit.wikimedia.org/r/#/c/393611/3
[16:49:38] (so it might be nicer to separate cdh::hadoop into a common profile that has nothing to do with journal nodes
[17:08:23] Analytics-Kanban: Geowiki stopped updating on October 24th - DATA LOSS (read comments) - https://phabricator.wikimedia.org/T179952#3789338 (Milimetric) @Ijon take a look at https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Geowiki which documents data that's being lost by the old geowiki system....
[17:13:23] Analytics-Kanban, EventBus, Reading-Infrastructure-Team-Backlog, Trending-Service, and 3 others: Trending Edit's worker offsets disappear from Kafka - https://phabricator.wikimedia.org/T181346#3789381 (Milimetric)
[17:13:51] Analytics-Kanban, EventBus, Reading-Infrastructure-Team-Backlog, Trending-Service, and 3 others: Trending Edit's worker offsets disappear from Kafka - https://phabricator.wikimedia.org/T181346#3786853 (Milimetric) ping @Ottomata and @JAllemandou
[17:16:16] Analytics-Dashiki, Analytics-Kanban: Browser-major numbers represented as a percentage (when they're not) on the desktop tabular view - https://phabricator.wikimedia.org/T181164#3789387 (Milimetric) p:Triage>Normal a:Nuria
[17:16:47] Analytics-Kanban, Operations, monitoring, netops, User-Elukey: Pull netflow data in realtime from Kafka via Tranquillity/Spark - https://phabricator.wikimedia.org/T181036#3789390 (Milimetric)
[17:18:49] Analytics-Kanban, EventBus, Reading-Infrastructure-Team-Backlog, Trending-Service, and 3 others: Trending Edit's worker offsets disappear from Kafka - https://phabricator.wikimedia.org/T181346#3789393 (mobrovac) ping @elukey too :)
[17:20:30] Analytics, EventBus, Services (next): Support multiple partitions per topic in EventBus - https://phabricator.wikimedia.org/T157822#3017728 (Milimetric) We're moving this to Radar. Ping us or move it back if it becomes high priority again. cc @Ottomata
[17:21:46] Analytics, Analytics-EventLogging: design a system to assist software developers and researchers to perform automated data unit testing before pushing to production - https://phabricator.wikimedia.org/T85032#3789402 (Milimetric) Open>Resolved a:Milimetric Beta labs data testing takes care of...
[17:22:08] Quarry: Can't access database in Quarry - https://phabricator.wikimedia.org/T181411#3789406 (zhuyifei1999) Open>Invalid `s51434__mixnmatch_large_catalogs_p` exists on [[https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database#User_databases|ToolsDB]] host `tools.db.svc.eqiad.wmflabs`. However, sin...
[17:22:12] marlier: re schema change, yes, that is correct, adding a new field will mean that events will be stored in mysql in a new table that has that column
[17:22:20] marlier: validation wise things would work fine
[17:22:29] marlier: let me look at ticket
[17:31:57] (CR) Nuria: [V: 2 C: 2] "Sounds good." [analytics/refinery] - https://gerrit.wikimedia.org/r/392624 (owner: Joal)
[17:33:52] Analytics: Incorporate data from the GeoIP2 ISP database to webrequest - https://phabricator.wikimedia.org/T167907#3789456 (Nuria) Let's start buying a license for GeoIP v2 ISP databases
[17:35:52] milimetric: https://gerrit.wikimedia.org/r/#/c/393613/
[17:35:54] oof
[17:35:54] it got nastier
[17:36:16] nuria_: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Geowiki is the data loss from geowiki, confirmed it's not an ongoing problem, data was computed for the last few days since Luca fixed it
[17:36:49] milimetric: super thanks for documenting
[17:37:17] ottomata: indeed, that del code is ugly. looking closer
[17:37:35] yeah, be harsh, might be a nicer way to do that
[17:39:59] milimetric: suppose i could get rid of the dels by adding a conditional to the last comprehension that makes parsed_fields_from_line
[17:40:12] if k != 'capsule' and k not in capsule.keys()
[17:40:49] ottomata: nono, I wasn't done looking at it, just meant yeah I see it got worse
[17:41:06] still thinking, and I have to do lunch too, got a busy afternoon
[17:41:09] I'll review soon though
[17:49:12] nuria: thanks much.
[18:32:52] * elukey off!
[19:22:23] (PS5) Shilad Sen: Spark job to create desktop page ids viewed in each session with both language specific ids and wikidata data ids. [analytics/refinery/source] (nav-vectors) - https://gerrit.wikimedia.org/r/383761 (https://phabricator.wikimedia.org/T174796)
[19:27:15] Hi all. I have a really dumb question. How do I save / commit a draft comment in Gerrit?
[19:37:20] milimetric: just checking, will you have time to review this afternoon? would love to deploy that tomorrow morning
[19:38:14] yes! will do in the next 30 minutes
[19:38:18] ottomata: ^
[19:44:39] great thank youuu
[20:08:48] ottomata: wanna chat in cave? I still don't understand one thing
[20:09:47] ya, i'm kinda listening to open enrollment meeting
[20:10:14] coming to cave though
[20:10:21] will have you in hangout and bluejeans in background
[20:22:00] ottomata: did you take down the superset instance we had running on your shell?
[20:22:52] cc chelsyx bearloga
[20:27:33] nuria_: no?
[20:27:45] h
[20:27:46] oh
[20:27:50] but luca restarted :)
[20:29:43] nuria_: back up
[20:29:52] ottomata: cc chelsyx
[20:32:05] Thanks nuria_ !
[20:36:53] Gone for tonight a-team
[20:36:56] See you tomorrow
[20:46:03] Analytics: R execution on stat1005 -> 'stack smashing error' - https://phabricator.wikimedia.org/T174946#3790265 (Ottomata) @ezachte, this looks to be some deeper R + stretch upgrade bug. I think it will be very difficult to solve. Q: to unblock your charts, what do you need to generate the charts? We jus...
[21:22:48] Analytics, EventBus, MW-1.31-release-notes (WMF-deploy-2017-11-14 (1.31.0-wmf.8)), Patch-For-Review, Services (next): Timeouts on event delivery to EventBus - https://phabricator.wikimedia.org/T180017#3790421 (Ottomata) I caught a stacktrace: ``` Nov 27 20:31:41 kafka1001 eventlogging-servic...
[21:27:26] Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Fonts from fonts.googleapis.com on wikistats - https://phabricator.wikimedia.org/T178317#3688129 (Nuria) Open>Resolved
[21:28:00] Analytics-Cluster, Analytics-Kanban, Language-Team, MediaWiki-extensions-UniversalLanguageSelector, and 3 others: Migrate table creation query to oozie for interlanguage links - https://phabricator.wikimedia.org/T170764#3790434 (Nuria) Open>Resolved
[21:28:13] Analytics-Kanban, Patch-For-Review: Investigate the use of local_quorum for AQS - https://phabricator.wikimedia.org/T164348#3790436 (Nuria) Open>Resolved
[21:28:28] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Make tranquility work with Spark - https://phabricator.wikimedia.org/T168550#3790438 (Nuria)
[21:28:30] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Make Spark 2.1 easily available on new CDH5.10 cluster - https://phabricator.wikimedia.org/T158334#3790437 (Nuria) Open>Resolved
[21:28:51] Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Create Druid public cluster such AQS can query druid public data - https://phabricator.wikimedia.org/T176223#3790443 (Nuria) Open>Resolved
[21:28:54] Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Add edits endpoint to AQS using druid as a backend - https://phabricator.wikimedia.org/T174174#3790445 (Nuria)
[21:54:09] ottomata: I'm getting "AttributeError: 'module' object has no attribute 'future'" when trying to test the code
[21:54:18] I have librdkafka 9.3.1
[21:54:31] is that too new? old?
[21:54:43] is there a more fool-proof guide of installing EL with all the right dependencies?
[21:55:48] hmmm, how are you testing?
[21:56:00] future is not part of librdkafka
[21:56:11] that only works with kafka-python
[21:56:19] buuuut, it should be transparent if you are not using that
[21:56:22] milimetric: ^
[21:56:51] I'm just running "nosetests" after python setup.py install
[21:57:43] ha, I haven't tested EL stuff since before the kafka stuff? Crazy
[21:59:18] milimetric: in a virtual env?
[21:59:29] i usually do
[21:59:31] pip install -e .
[21:59:38] I don't use virtual env, no
[21:59:41] oh
[21:59:46] hm, then dunno what happens to you :)
[22:00:00] milimetric: do you use mediawiki vagrant?
[22:00:02] no you don't.
[22:00:02] ah, yeah sudo pip install -e . worked
[22:00:03] hm
[22:00:06] ok cool
[22:00:19] weird, I should update the readme
[22:00:25] ok, all pass, all good
[22:00:26] thanks ottomata
[22:01:39] k, merged
[22:01:51] I'll be here tomorrow morning to help in case
[22:02:21] great, thanks
[22:16:43] (CR) Shilad Sen: Spark job to create desktop page ids viewed in each session with both language specific ids and wikidata data ids. (1 comment) [analytics/refinery/source] (nav-vectors) - https://gerrit.wikimedia.org/r/383761 (https://phabricator.wikimedia.org/T174796) (owner: Shilad Sen)