[00:24:54] (03CR) 10MaxSem: WIP: reportupdater queries for EventLogging (032 comments) [analytics/discovery-stats] - 10https://gerrit.wikimedia.org/r/322007 (https://phabricator.wikimedia.org/T147034) (owner: 10MaxSem)
[07:16:22] 10Analytics, 06Research-and-Data: Hash IPs on webrequest table - https://phabricator.wikimedia.org/T150545#2789216 (10Tbayer) @leila Please be aware that contrary to the assumption expressed above, there are various people outside the Research and Ops teams (developers, PMs, analysts) who have been using this...
[07:41:33] PROBLEM - Kafka Broker Server on kafka1022 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args Kafka /etc/kafka/server.properties
[08:06:16] good morning everyone, kafka1022 had hardware trouble (probably a broken disk from what it appears right now, Giuseppe is currently rebooting it)
[08:37:24] 06Analytics-Kanban: Ongoing: Give me permissions in LDAP - https://phabricator.wikimedia.org/T150790#2805090 (10elukey) p:05Triage>03Normal
[08:42:48] 10Analytics, 10Analytics-Cluster, 06DC-Ops, 06Operations, 10ops-eqiad: Kafka1022 needs a new disk - https://phabricator.wikimedia.org/T151028#2805092 (10elukey)
[08:43:16] created the task for it! --^
[08:43:23] now kafka* daemons are systemctl masked
[08:43:36] so the only thing actually alarming are the under-replicated partitions
[08:49:14] 10Analytics, 10Analytics-Cluster, 06DC-Ops, 06Operations, 10ops-eqiad: Kafka1022 needs a new disk - https://phabricator.wikimedia.org/T151028#2805117 (10elukey) p:05Triage>03High
[08:49:34] two remarks
[08:50:39] 1) UUIDs are now in fstabs so at least something good to notice after an outage :P
[08:51:02] 2) either the alarms or the kafka partitions would need, imho, to be rethought
[08:51:38] we can't page the world and spam IRC for a broken disk each time
[09:20:23] Hi elukey, how was the trip yesterday?
[09:22:19] joal: o/ good! No delays, I was impressed :)
[09:22:24] Yay !
[09:22:39] I managed to grab some fresh fish at the "Trastienda" for lunch, loely
[09:22:42] *lovely
[09:22:45] And are you as tired as I am ? I guess yes ;)
[09:23:03] That's awesome :)
[09:25:28] yessss
[09:25:54] joal: need also to restart all java things on the cluster for openjdk upgrade
[09:26:00] maybe on Monday :P
[09:26:30] Yay ! Again a full restart ! Thanks java for ensuring a natural chaos-monkey on our systems ;)
[09:26:49] elukey: I prefer on Monday yes :)
[09:28:28] elukey: only hadoop/oozie, kafka is running jessie and already on the new java
[09:29:44] moritzm: ah yes yes the "cluster" for me and joal is hadoop :D
[09:30:35] moritzm: we talked this week about it - we have now a more resilient cluster and better procedures since we started doing regular restarts/reboots
[09:33:03] ok :-)
[09:58:03] (03PS37) 10Joal: [WIP] Refactor Mediawiki History scala code [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/301837 (https://phabricator.wikimedia.org/T141548)
[10:35:54] (03PS38) 10Joal: [WIP] Refactor Mediawiki History scala code [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/301837 (https://phabricator.wikimedia.org/T141548)
[10:39:44] (03PS29) 10Joal: [WIP] Join and denormalize all histories into one [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/307903 (owner: 10Milimetric)
[12:27:07] https://libraries.io/cargo/librdkafka-sys \o/
[12:28:48] rdkafka in rust?
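(Editor's aside: the "under-replicated partitions" elukey mentions above are topic partitions whose in-sync replica set shrank to 2 of 3 while kafka1022 is down. Purely as a hedged illustration — this is not the Icinga check or tooling actually in use here, and the broker address is a placeholder — a minimal Java sketch using the Kafka AdminClient to list such partitions could look like this:)

```java
import java.util.Map;
import java.util.Properties;
import java.util.Set;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.TopicPartitionInfo;

public class UnderReplicatedPartitions {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder: any reachable broker in the cluster works as bootstrap server.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker.example.org:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            Set<String> topics = admin.listTopics().names().get();
            Map<String, TopicDescription> descriptions = admin.describeTopics(topics).all().get();
            for (TopicDescription topic : descriptions.values()) {
                for (TopicPartitionInfo p : topic.partitions()) {
                    // Under-replicated: fewer in-sync replicas than assigned replicas.
                    if (p.isr().size() < p.replicas().size()) {
                        System.out.printf("%s-%d: %d of %d replicas in sync%n",
                                topic.name(), p.partition(), p.isr().size(), p.replicas().size());
                    }
                }
            }
        }
    }
}
```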
[12:29:32] yeah :)
[12:29:41] bindings for the C libs
[13:14:05] while refactoring vk I am understanding new pieces
[13:25:18] so vk does *a lot* of its operations on pre-allocated buffers, rarely using malloc
[13:25:32] this is great since it will ease testing
[13:31:03] elukey: from what I understood from ottomata, it's also why vk is so fast
[13:33:37] joal: ah yes I suspected something similar, but my brain was able to decode it only now :D
[13:59:00] (03PS1) 10Joal: [WIP] Update edit history to GraphFrames [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/322260
[14:37:29] 06Analytics-Kanban, 13Patch-For-Review: Evaluate a unit testing framework and add tests for the formatter function - https://phabricator.wikimedia.org/T147440#2806080 (10elukey) Rationale of the changes: I'd like to separate functionalities in multiple .c/.h modules that can be unit tested separately, and the...
[14:52:14] 10Analytics-Wikimetrics, 10Continuous-Integration-Config: tox runs all tests (including manual ones) - https://phabricator.wikimedia.org/T71183#2806104 (10hashar) 05Open>03declined
[14:58:15] (03CR) 10Joal: [V: 04-1] "This works on a local Flink cluster but there is a conflict on Guava jar when using this code on Flink+Yarn." [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/321936 (owner: 10Joal)
[15:06:40] joal: https://github.com/dibbhatt/kafka-spark-consumer - this one was the connector that I mentioned
[15:07:43] I added my notes to the Conferences wiki page
[15:08:12] and I also found http://events.linuxfoundation.org/events/apache-big-data-europe/program/slides
[15:12:41] (03PS30) 10Mforns: [WIP] Join and denormalize all histories into one [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/307903 (owner: 10Milimetric)
[15:15:11] (03CR) 10jenkins-bot: [V: 04-1] [WIP] Join and denormalize all histories into one [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/307903 (owner: 10Milimetric)
[15:15:36] :[
[15:17:57] (03PS31) 10Mforns: [WIP] Join and denormalize all histories into one [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/307903 (owner: 10Milimetric)
[15:22:25] (03PS1) 10Joal: [WIP] Spark-Streaming example reading webreques [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/322266
[15:22:53] :]
[15:23:23] oh no, I thought it was jenkins
[15:39:25] elukey, it's my ops week, should I restart kafka service in kafka1022?
[15:39:52] mforns: nono it has already been handled :)
[15:40:04] kafka is down now because we need to swap the disk first
[15:40:06] elukey, oh, ok
[15:45:10] mforns: so currently we have some topics with under-replicated partitions (2 out of 3 required) but we are fine
[15:45:18] elukey, I see
[15:59:20] 10Analytics, 06Research-and-Data: Hash IPs on webrequest table - https://phabricator.wikimedia.org/T150545#2806314 (10Nuria) @Tbayer: the 60 days IP data? Or You are saying data in general? If the former can you specify what the use is? (thanks)
[16:00:23] a-team: standdduppp
[16:12:59] 10Analytics, 06Research-and-Data: Hash IPs on webrequest table - https://phabricator.wikimedia.org/T150545#2806372 (10Tbayer) >>! In T150545#2806314, @Nuria wrote: > @Tbayer: the 60 days IP data? Or You are saying data in general? If the former can you specify what the use is? (thanks) The data that is the to...
[16:56:38] joal: you busy prepping?
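(Editor's aside: varnishkafka itself is C, and the point elukey and joal make above is that its formatter writes into buffers allocated once up front and reused per record, rather than calling malloc on the hot path; that is both why it is fast and why splitting the formatter into separately testable modules is attractive. A hypothetical Java sketch of that buffer-reuse pattern, not varnishkafka's actual code:)

```java
/**
 * Hypothetical sketch of the buffer-reuse pattern described above: the output
 * line is built in a StringBuilder allocated once and cleared between records,
 * so formatting a record does no per-record allocation of its own.
 */
public class PreallocatedFormatter {
    private final StringBuilder line = new StringBuilder(64 * 1024); // allocated once

    /** Formats one (made-up) record into the shared buffer and returns a view of it. */
    public CharSequence format(String host, String uri, int status) {
        line.setLength(0); // reuse the buffer instead of reallocating
        line.append(host).append('\t')
            .append(uri).append('\t')
            .append(status).append('\n');
        return line;
    }
}
```

Because such a formatter holds no state beyond its buffer, it can be exercised in isolation by a unit test, which is the "ease testing" point made above.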
[16:56:57] mforns / joal: we could batcave for a couple minutes before meeting
[16:57:08] milimetric, ok
[17:01:10] 10Analytics, 06Research-and-Data: Hash IPs on webrequest table - https://phabricator.wikimedia.org/T150545#2806515 (10leila) @Tbayer two points to share: 1) I've signed up to reach out to all the people who may be using this data. I will reach out to you and other researchers, for example, via research-interna...
[17:44:15] * elukey going afk!
[18:00:42] milimetric, mforns : Going to spend some time with Lino before he goes to bed, should be here in 1/2h more or less
[18:00:54] have fun
[18:00:56] joal, ok
[18:00:57] hi Lino!!
[18:00:58] :]
[18:01:17] milimetric, I'll also brb for a couple minutes
[18:33:58] i've compiled up a test version of refinery-hive that brings in lucene so i can stem some text for analysis, but trying to create the function i get an error with very little information 'FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask'
[18:34:19] I double checked the built jar, and the shade plugin is appropriately including the lucene dependencies, was wondering if anyone had other ideas on what could give that error?
[18:35:55] milimetric, mforns : I'm here !
[18:35:57] Batcave?
[18:36:22] Hi ebernhardson
[18:36:42] ebernhardson: Without debugging info, it'll be difficult indeed :(
[18:36:43] joal: hi! late friday for you :)
[18:36:52] joal: how can i get hive to give me more info?
[18:37:06] the entirety of output is just:
[18:37:06] e (ebernhardson)> CREATE FUNCTION english_stemmer AS 'org.wikimedia.analytics.refinery.hive.Stemmer';
[18:37:09] Failed to register ebernhardson.english_stemmer using class org.wikimedia.analytics.refinery.hive.Stemmer
[18:37:12] FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask
[18:37:13] ebernhardson: what CLI are you using, hive or beeline?
[18:37:16] joal: hive
[18:37:20] joal, heya
[18:37:21] k
[18:37:25] gimme 1 minute please
[18:37:28] sure mforns
[18:37:40] ebernhardson: Ok that already means something :)
[18:37:59] ebernhardson: as you probably know, hive registers UDFs
[18:38:05] And in that case doesn't manage to
[18:38:26] ebernhardson: I love adding meaningful information ;)
[18:38:28] joal: right, this class extends from org.apache.hadoop.hive.ql.exec.UDF i'm not sure what else :S
[18:38:32] https://github.com/nomoa/analytics-refinery-source/commit/733a43e3545b5ba9905fd70e2428380440c54bd5
[18:38:40] ebernhardson: Would you mind me having a look at your code?
[18:38:45] (nomoa == dcausse)
[18:39:04] omw to batcave joal
[18:40:21] ebernhardson: reading
[18:47:39] ebernhardson: sorry got hooked up, will be back in a few minutes
[18:47:47] joal: no worries, i appreciate the help!
[19:08:04] ebernhardson: did you figure it out?
[19:08:41] nuria: not yet :S randomly tried copying into hdfs and sourcing from there, and using beeline instead of hive, but same error
[19:09:13] ebernhardson: seems like hive would be trying to load your code from the usual path
[19:09:22] ebernhardson: but it is not going to be there, correct?
[19:10:03] ebernhardson: did you start hive with: hive --hiveconf hive.aux.jars.path= -f test-udf.hql
[19:10:13] nuria: no, its either in my home dir on stat1002, or /user/ebernhardson/ on hdfs. But the 'add jar' command seems to be ok, and list jars shows it
[19:10:16] nuria: nope lemme try that
[19:10:58] 10Analytics, 10Analytics-Cluster, 06Operations, 06Research-and-Data, and 2 others: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843#2806890 (10RobH) We've never added a GPU/Card to any of our servers in the past. Many of them may not have the space for them. My understanding is f...
[19:11:10] ebernhardson: also did you do jar -tf to make sure your jar has all deps that you need?
[19:12:20] nuria: yes, and it looks to have both the direct lucene dependency, and the dependencies that jar pulls in
[19:12:34] 10Analytics, 10Analytics-Cluster, 06Operations, 06Research-and-Data, and 2 others: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843#2806898 (10RobH) Alternatively Chris may be able to check any R430 for the space rather than downtime stat1004, as long as its a same generation R430.
[19:12:48] trying with the --hiveconf hive.aux.jars.path= didn't change anything
[19:12:50] ebernhardson: Back
[19:13:07] ebernhardson: ok, do any dependency version numbers collide with existing versions of those jars that are being used?
[19:13:13] joal: ah hello
[19:13:17] ebernhardson: I follow nuria in thinking it's an issue with the jar not being present on the machines
[19:13:22] nuria: hmm, i'm not sure how to check that
[19:13:47] joal: in the past, although it's been a few months, hive was happy to load a snapshot jar from my home directory
[19:13:56] k
[19:13:57] ebernhardson: on your mvn directory ~/.m2
[19:13:58] but right now i'm trying with it in hdfs so it's available
[19:14:45] ebernhardson: first thing in mind: don't use beeline to upload jars
[19:14:56] ebernhardson: the error is so high level that it could be 1) jar not in path 2) not able to load code as dep tree is not there
[19:14:59] then, versions: how is your jar named?
[19:15:15] joal: refinery-hive-0.0.37-SNAPSHOT.jar
[19:15:26] in hdfs://analytics-hadoop/user/ebernhardson/
[19:15:31] k ebernhardson, this is not our latest, shouldn't conflict
[19:15:41] ebernhardson: will have a look :)
[19:15:54] * ebernhardson is just wishing for more verbose error messages :)
[19:15:58] :)
[19:16:21] which is funny because on the other end of hadoop, my oozie jobs have such verbose output it's a pain to track down where the actual problem was...
[19:17:41] i suppose for conflicts, it's possible hadoop already has lucene somewhere? there is a sentences() built-in udf that could plausibly be using lucene, checking..
[19:18:03] although i thought the maven shade plugin took care of that somehow
[19:18:48] ebernhardson: i bet this is java's usual dependency hell (only surpassed by node dependency hell)
[19:19:01] ebernhardson: let's test something
[19:19:08] ebernhardson: remove any deps from your code
[19:19:11] hmm, no at least the sentences udf is using java.text.BreakIterator
[19:19:14] ebernhardson: and just do a dummy udf
[19:19:20] ebernhardson: see if that gets loaded
[19:19:24] nuria: ok, sec
[19:19:31] ebernhardson: if it does, the issue is deps
[19:19:41] ebernhardson: if it doesn't, it's the java classpath
[19:21:44] 10Analytics, 10Analytics-Cluster, 06Operations, 06Research-and-Data, and 2 others: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843#2806933 (10RobH) a:03RobH I'll steal this task for followup from Chris's findings and will update it accordingly.
[19:21:51] ebernhardson: moar logging info : hive -hiveconf hive.root.logger=INFO,console
[19:22:01] MOAR !
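(Editor's aside: the class being registered above is the old-style Hive UDF interface ebernhardson mentions, i.e. a class extending org.apache.hadoop.hive.ql.exec.UDF with an evaluate method; CREATE FUNCTION only succeeds if that class, and whatever it shades in, is visible to the session's classloader, e.g. via ADD JAR or hive.aux.jars.path. A minimal hedged sketch of that shape — the body is a placeholder, not the actual Lucene-based stemmer from the linked commit:)

```java
package org.wikimedia.analytics.refinery.hive;

import java.util.Locale;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/**
 * Minimal sketch of the class shape Hive expects for
 * CREATE FUNCTION english_stemmer AS 'org.wikimedia.analytics.refinery.hive.Stemmer';
 * The evaluate body is a stand-in; the real UDF under discussion wraps a Lucene stemmer.
 */
public class Stemmer extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        // Placeholder transformation standing in for Lucene stemming.
        return new Text(input.toString().toLowerCase(Locale.ENGLISH));
    }
}
```

The verbose console logger joal suggests (hive -hiveconf hive.root.logger=INFO,console) is what reveals, just below, whether registration failed because the jar was missing from the classpath or because the class itself was missing from the jar.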
[19:24:00] joal: ooh, much better: 16/11/18 19:23:36 ERROR exec.FunctionRegistry: Unable to load UDF class: java.lang.ClassNotFoundException: org.wikimedia.analytics.refinery.hive.Stemmer
[19:24:06] * ebernhardson now suspects PEBKAC
[19:24:06] :)
[19:25:31] ebernhardson: no stemmer class in said package :)
[19:26:17] yea i just double checked the jar contents and it has all of lucene, but not my udf ... checking what i did
[19:26:30] np, looks like a solved thing :)
[19:26:49] I'll update the docs about the log trick !
[19:26:49] thanks a bunch! i'll note down how to increase logging for the future
[19:26:56] perfect
[19:28:51] joal: would you be so kind as to document that here: https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hive/QueryUsingUDF#Testing_a_UDF_you_just_wrote
[19:29:09] nuria: will add in the two pages (with UDF, and without)
[19:29:23] joal: super thanks
[19:33:36] Done nuria
[19:33:41] Away for weekend a-team :)
[19:33:44] Bye !
[19:33:49] Bye ebernhardson, have a good weekend :)
[19:33:50] joal: ciaooo thanks again
[19:33:50] bye joal !
[19:33:56] nice weekend :]
[19:35:08] 10Analytics, 10Analytics-Cluster, 06Operations, 06Research-and-Data, and 2 others: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843#2806997 (10Cmjohnson) The size of the card is too large for the space inside the server.
[19:53:11] 10Analytics, 10Analytics-Cluster, 06Operations, 06Research-and-Data, and 2 others: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843#2807048 (10Nuria) @Cmjohnson : will it fit in 1002?
[20:03:16] 10Analytics, 10Analytics-Cluster, 06Operations, 06Research-and-Data, and 2 others: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843#2807065 (10RobH) stat1002 is an R510 3 PCIe G2 slots + 1 storage slot: One x8 slot Two x4 slots One Storage x4 slot So its appea...
[20:05:58] 10Analytics, 03Interactive-Sprint: Enhance Report Updater to be able to send data to graphite - https://phabricator.wikimedia.org/T150187#2807081 (10MaxSem) a:03MaxSem
[20:08:26] 06Analytics-Kanban: unusual number of '-" pageviews in October - https://phabricator.wikimedia.org/T150990#2807084 (10Nuria) {F4743325} Screenshot from https://tools.wmflabs.org/pageviews/?project=en.wikipedia.org&platform=all-access&agent=user&start=2016-01-01&end=2016-11-05&pages=-
[20:10:26] 10Analytics, 10Analytics-Cluster, 06Operations, 06Research-and-Data, and 2 others: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843#2807089 (10RobH) In fact, it seems that even if there is room in the R510, it lacks the power cables for PCI slot use. Some quick checking online fin...
[20:13:07] 10Analytics, 10Analytics-Cluster, 06Operations, 06Research-and-Data, and 2 others: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843#2807097 (10RobH) a:05RobH>03DarTar I'm assigning this back to @dartar as the original requestor, since it seems we cannot accommodate the request...
[21:09:39] 06Analytics-Kanban, 06Wikipedia-iOS-App-Backlog, 10iOS-app-feature-Analytics: Drop in iOS app pageviews since version 5.2.0 - https://phabricator.wikimedia.org/T148663#2807221 (10JMinor) @JoeWalsh @Fjalapeno can one of y'all update the user agent definition/pattern for the analytics team per @Nuria previous...
[21:18:33] mforns: found something else. archived records aren't coming in properly if their ar_page_id is null
[21:18:43] 06Analytics-Kanban, 06Wikipedia-iOS-App-Backlog, 10iOS-app-feature-Analytics: Drop in iOS app pageviews since version 5.2.0 - https://phabricator.wikimedia.org/T148663#2807238 (10Tbayer) For the record, the change related to Nuria's comment above: https://gerrit.wikimedia.org/r/#/c/319374
[21:18:57] that makes sense for page histories but doesn't make sense for edit metrics that need those records
[21:19:29] 06Analytics-Kanban: Can't log into https://piwik.wikimedia.org/ - https://phabricator.wikimedia.org/T144326#2807240 (10JMinor) @Milimetric I am still able to log in as "wikimedia-ios", but the sites on the site list no longer include the app. I wasn't able to pull the credentials file for piwik-wmf-admin yet. Is...
[21:19:34] I think this one we should fix, there are more than before
[21:19:37] milimetric, aha, what happens to those records with NULL ar_page_id
[21:19:39] ?
[21:20:08] I didn't look at the code closely but my guess is they're ignored because they have an artificial page_id
[21:20:13] (in page history)
[21:20:30] like probably the equivalent of an inner join when we need a left join?
[21:20:32] oh, so they are not in the history, right?
[21:20:48] no, they don't end up in the output files of the denormalization phase
[21:20:58] 06Analytics-Kanban: Can't log into https://piwik.wikimedia.org/ - https://phabricator.wikimedia.org/T144326#2807241 (10Tbayer) I just tried the login and password from the "piwik.credentials" file again and it no longer works for me.
[21:21:07] I see
[21:21:12] mforns: I'm telling you in case it's causing your problems too
[21:21:21] milimetric, ok will consider it
[21:21:30] and it's one of those things that stops in 2012 'cause ar_page is populated after that I think
[21:21:51] aha, yes agree we should fix this one
[21:23:52] 06Analytics-Kanban: Can't log into https://piwik.wikimedia.org/ - https://phabricator.wikimedia.org/T144326#2807247 (10Milimetric) Let's chat on IRC to figure this out. I see the user configured to access the IOS and IOS Beta websites in piwik. Something weird must be going on.
[21:24:18] HaeB: are you trying to look at the same site? IOS?
[21:24:49] hi JoshM_
[21:24:56] hey Dan
[21:25:03] thanks for the quick ping
[21:25:11] milimetric: i'm failing to access https://piwik.wikimedia.org/ in the first place
[21:25:26] HaeB: is your ldap auth working?
[21:25:26] (i.e. before seeing the site menu)
[21:26:11] so JoshM_ I understand your ldap is working, and your piwik login for wikimedia-ios, can you try it again though 'cause I just removed and re-added permissions
[21:26:11] ah, i can now log in with my own user account?
[21:26:21] HaeB: there are two layers of security there
[21:26:33] there's LDAP before you hit the site so the public can't get to it
[21:26:34] milimetric, yeah that's correct. will try again now
[21:26:42] cool, that works (i tried the piwik / piwik admin account first because that worked recently)
[21:26:48] and then there's a custom one for each site so people only have access to what they need
[21:27:05] ok, logged in as "HaeB" and seeing the ios site
[21:28:10] so, now I can't login under wikimedia-ios. did the password for that change? maybe I was in an old session...
[21:28:17] milimetric: got it
[21:29:32] 06Analytics-Kanban: (subtask) Dan's standard metrics - https://phabricator.wikimedia.org/T150025#2807256 (10Milimetric) Another problem that we should fix. Archive records with ar_page null are not generating revision / create history events. They must be discarded somewhere in the pipeline and this is messing...
[21:29:40] 06Analytics-Kanban: Can't log into https://piwik.wikimedia.org/ - https://phabricator.wikimedia.org/T144326#2807257 (10Tbayer) (Figured it out on IRC - it works for me now, but not for Josh.)
[21:29:50] JoshM_: no, I didn't change it, but I can change it now :)
[21:30:05] JoshM_: want me to change it to something specific?
[21:30:54] Yeah, I will send you directly
[21:34:48] a-team, logging off, have a nice weekend!
[21:35:07] have a good weekend!
[21:43:46] 06Analytics-Kanban: Can't log into https://piwik.wikimedia.org/ - https://phabricator.wikimedia.org/T144326#2807284 (10JMinor) Also resolved my issue over IRC. Thanks @Milimetric!
[21:44:14] np, glad it worked
[23:41:09] (03PS3) 10MaxSem: WIP: reportupdater queries for EventLogging [analytics/discovery-stats] - 10https://gerrit.wikimedia.org/r/322007 (https://phabricator.wikimedia.org/T147034)
[23:44:13] (03PS1) 10MaxSem: WIP: optionally record data to Graphite [analytics/reportupdater] - 10https://gerrit.wikimedia.org/r/322365 (https://phabricator.wikimedia.org/T150187)
[23:45:52] (03CR) 10jenkins-bot: [V: 04-1] WIP: optionally record data to Graphite [analytics/reportupdater] - 10https://gerrit.wikimedia.org/r/322365 (https://phabricator.wikimedia.org/T150187) (owner: 10MaxSem)
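(Editor's aside on the ar_page_id discussion earlier in the evening: milimetric's guess is that archive rows whose ar_page_id is NULL are dropped by what amounts to an inner join against the reconstructed page history, where a left outer join would keep them and still emit their revision/create events. The refinery denormalization code is Scala; purely to illustrate that distinction, here is a hypothetical Spark sketch using the Java API, with made-up table and column names beyond those mentioned in the chat:)

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ArchiveJoinSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("archive-join-sketch").getOrCreate();

        // Hypothetical inputs standing in for the real intermediate datasets.
        Dataset<Row> archive = spark.table("archive");           // ar_page_id may be NULL
        Dataset<Row> pageHistory = spark.table("page_history");  // page_id

        // An inner join silently drops archive rows with a NULL (or unmatched)
        // ar_page_id, which is the behaviour suspected above.
        Dataset<Row> inner = archive.join(pageHistory,
                archive.col("ar_page_id").equalTo(pageHistory.col("page_id")));

        // A left outer join keeps those rows, so their revision/create events
        // can still be produced even when no page record can be resolved.
        Dataset<Row> leftOuter = archive.join(pageHistory,
                archive.col("ar_page_id").equalTo(pageHistory.col("page_id")),
                "left_outer");

        System.out.println("inner: " + inner.count() + ", left_outer: " + leftOuter.count());
        spark.stop();
    }
}
```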