[12:18:27] so let's see
[12:18:33] from apt history Upgrade: libmysql-java:amd64 (5.1.41-1~deb8u1, 5.1.42-1~deb8u1)
[12:19:11] that should be only Fix CVE-2017-3586 and CVE-2017-3589 by backporting the latest stable release.
[12:19:22] ah "by backporting the latest stable release"
[12:20:01] https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-3586
[12:20:21] https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-3589
[12:22:40] but the weird thing is that it seems to fail doing the dns lookup
[12:22:56] elukey: indeed
[12:26:35] ah no wait a sec, the bug that I reported seems to be related to trying to resolve usernames as domains
[12:26:41] that was a kite bug in reading config
[12:26:46] so not related
[12:27:06] elukey@analytics1003:~$ telnet labsdb-analytics.eqiad.wmnet 3306 works fine
[12:27:57] elukey: I can connect using python
[12:28:02] Seems java related :(
[12:28:08] maybe a broken link ?
[12:28:41] 10Analytics, 10Analytics-EventLogging, 10Beta-Cluster-Infrastructure: deployment-kafka01 / partition is full - https://phabricator.wikimedia.org/T168564#3370226 (10hashar) Merci!
[12:29:31] elukey: There are some kite jars in sqoop lib
[12:31:09] elukey: on analytics1003 no broken links
[12:31:16] Hey, does anyone know where the data for Banner Activity in Pivot come from originally: what's the source of data before Druid/Pivot - are they available in webrequests, for example? Thanks.
[12:31:29] Hi GoranSM
[12:31:35] joal: Hi!
[12:31:40] Data in Pivot comes from parsing webrequest indeed
[12:31:48] raw or refined?
[12:32:20] Raw and refined are the same except refined has more fields and better file formats for querying - So refined in any case
[12:32:29] GoranSM: --^
[12:32:32] joal: Thanks a lot.
[12:32:37] no prob :)
[12:33:06] joal: I need to track banner impressions for WMDE: how many times a banner was served. Do you have any suggestion on how to figure that out from webrequests?
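The manual `telnet labsdb-analytics.eqiad.wmnet 3306` check above can be scripted so that a DNS-resolution failure (the symptom seen from Java) is distinguished from a TCP connectivity failure. A minimal sketch, not the exact check the team ran:

```python
# Separate DNS resolution from TCP connectivity, mirroring the manual
# `telnet labsdb-analytics.eqiad.wmnet 3306` test from the log.
import socket

def check_host(host, port, timeout=3.0):
    """Return 'dns_fail', 'tcp_fail', or 'ok' for host:port."""
    try:
        infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # Name resolution failed -- the error the Java client reported.
        return "dns_fail"
    for family, socktype, proto, _, addr in infos:
        try:
            with socket.socket(family, socktype, proto) as s:
                s.settimeout(timeout)
                s.connect(addr)
                return "ok"
        except OSError:
            continue
    return "tcp_fail"

if __name__ == "__main__":
    print(check_host("labsdb-analytics.eqiad.wmnet", 3306))
```

If this prints `ok` from the hadoop workers while the JDBC driver still fails, that points at the java / mysql-connector side rather than the host's resolver.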
[12:33:56] GoranSM: I imagine that if you have that request, it's because data in Pivot is not good enough?
[12:34:01] joal: Obviously, we're not talking about page views or anything similar; I need to know when (i.e. how many times) we have served a particular banner to some user, on some occasion.
[12:34:45] joal: can you try to re-run your script?
[12:34:58] just want to see with tcpdump what happens
[12:35:01] joal: In Pivot, for example, I cannot access the data for a campaign that we've stopped running, and imagine we would like to compare our actual campaign with our past campaign.
[12:35:26] GoranSM: hm - That doesn't seem right
[12:36:14] joal: The second reason is that we want to build an automated campaign tracking system (I work as a Data Analyst for WMDE, so I will be in charge of development), and that would mean Pivot goes out of the loop
[12:36:41] GoranSM: pivot can go out of the loop - Would druid (pivot backend) be good enough?
[12:37:17] I think for the moment we store all the data for all times in banner
[12:37:24] joal: Any reason why we wouldn't go for webrequests on Hadoop? I've never worked with Pivot, while I can do HiveQL
[12:37:54] GoranSM: I prefer it the other way around: Any reason druid wouldn't be good enough?
[12:38:02] joal: No way all data are there; checked for some of our past campaigns minutes ago - no data.
[12:38:10] elukey, joal: which host serves the broken query?
[12:38:15] and since when does it fail?
[12:38:31] GoranSM: webrequest is 1T per day - We already process that to push into druid - if druid is good enough, no need to re-read the huge dataset again
[12:38:40] moritzm: Hi !
[12:38:47] joal: The reason is, obviously, that I would need to learn to work with Druid just in order to develop this one thing; any reason why Hadoop wouldn't do (I already fetch some campaign related data from webrequests)?
[12:39:04] moritzm: I can't say which host - it's a hadoop job, therefore any of the nodes
[12:39:13] moritzm: Just tested it yesterday and today
[12:39:30] moritzm: automated job runs monthly, and it didn't fail on June 2nd
[12:39:47] (upgrade happened on 8th)
[12:39:54] GoranSM: Did the reason I gave you above make sense?
[12:40:03] moritzm: the error is https://gist.github.com/jobar/58d115e775cb2d40c5ca48da930839b4
[12:40:16] seems really weird, dns lookup failing
[12:40:30] GoranSM: Also, we are working on a new strategy to split webrequest in advance for use-cases similar to yours
[12:40:48] So yes, hadoop (with the new setting) will definitely do
[12:41:11] But if druid does as well, I suggest you use it (really faster, and optimised for analytics)
[12:41:14] GoranSM: --^
[12:41:40] joal: Ok I'll do it in Druid. Can you point me to our Druid documentation which explains how to access the data (not Druid administration, I've found that already)?
[12:42:46] GoranSM: Thanks for your understanding :)
[12:43:03] GoranSM: We don't have good docs around querying druid manually :S
[12:43:44] elukey: could be related to the JNI bug in the stack clash fixes
[12:43:47] GoranSM: Druid doc (http://druid.io/docs/0.10.0/querying/querying.html) will help, I can also help you write some examples if you want
[12:44:08] GoranSM: Idea is to curl druid100X from stat100X
[12:44:12] we'd really need a query which hits a specific host instead of a random one to verify that
[12:44:52] GoranSM: Query should be a POST with a JSON object as payload, expressing the query
[12:45:34] 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Make non-nullable columns in EL database nullable - https://phabricator.wikimedia.org/T167162#3370251 (10elukey)
[12:46:30] joal: Got it. Thank you very much, I will read the Druid documentation and try to query from stat100x.
By the way, having any examples available would be good, because I have to prepare this banner impression analytics rather quickly; could you send the examples to goran.milovanovic_ext.wikimedia.de?
[12:47:20] GoranSM: I don't really have good examples - I think it'd be faster to cowork a valid one if you wish :)
[12:47:53] joal: How can I contact you?
[12:48:26] I'm goran.milovanovic_ext.wikimedia.de, Data Analyst (contractor, working remotely from Belgrade)
[12:48:40] elukey: one possibility to verify would be to boot one of the hadoop nodes with the kernel parameter "stack_guard_gap=1" (which effectively reverts the kernel security fix by limiting the guard page to a single page) and re-running the query like 50-100 times (if the queried hadoop node is random, it should likely hit the modified system at least once)
[12:48:40] moritzm: maybe we can create a small java class that does connect via the mysql connector and see how it goes
[12:48:49] GoranSM: hm, to cowork you mean? Easier is hangout - We ping each other here, and decide to go to a room
[12:49:00] Hi GoranSM :)
[12:49:05] or we downgrade mysql-connector-java to validate
[12:49:13] joal: Hi :) And thank you so much for the support, stay in touch!
[12:49:29] moritzm: could be an option as well
[12:49:29] GoranSM: I'm Joseph Allemandou, data engineer contracting for WMF from Brest, France :)
[12:49:41] GoranSM: joal_AT_wikimedia.org
[12:49:55] GoranSM: As I said, you ping me here, and we can talk whenever you want :)
[12:50:09] GoranSM: It's a pleasure having people use our tools and data
[12:50:37] joal: Did we have a chance to meet yesterday during the Large scale analytics Hangout?
[12:50:43] We did :)
[12:50:57] GoranSM: I spent some time with Nathaniel, discussing Spark related stuff
[12:51:02] joal: :) Hi, Joseph, and once again, thank you very much!
[12:51:09] joal: do you happen to have a snippet of java code that connects to labsdb to quickly test if it works?
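The Druid query joal describes above ("a POST with a JSON object as payload", curl'd from stat100X to druid100X) could be sketched as below. The datasource name, aggregation field, and broker host/port here are illustrative guesses, not confirmed values from the log:

```python
# Sketch of a Druid native timeseries query, POSTed as JSON to a broker.
# "banner_activity_minutely", "request_count" and the druid1001 host are
# hypothetical placeholders -- adjust to what druid100X actually serves.
import json
from urllib import request

def banner_query(intervals):
    """Build a minimal Druid timeseries query: summed requests per day."""
    return {
        "queryType": "timeseries",
        "dataSource": "banner_activity_minutely",  # hypothetical name
        "granularity": "day",
        "intervals": intervals,
        "aggregations": [
            {"type": "longSum", "name": "requests", "fieldName": "request_count"}
        ],
    }

def post_query(query, broker="http://druid1001.eqiad.wmnet:8082"):
    """POST the query JSON to the broker, like `curl -X POST -d @query.json`."""
    req = request.Request(
        broker + "/druid/v2/",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(json.dumps(banner_query(["2017-06-01/2017-06-21"]), indent=2))
```

The equivalent shell form would be a `curl -X POST -H 'Content-Type: application/json' -d @query.json http://druid100X:8082/druid/v2/` from a stat host.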
[12:51:23] elukey: I don't, but can do
[12:51:24] * fdans lunch!
[12:51:40] joal: that would be awesome
[12:52:32] otherwise I'll do it
[12:53:38] elukey: doing it now
[13:13:53] joal: any news
[13:13:54] ?
[13:14:10] Trying to run my code, but I'm not used to the java CLI anymore
[13:14:39] elukey: batcave?
[13:15:52] sure
[13:25:15] joal: thanks for dealing with that labs kafka broker yesterday
[13:35:56] i think kafka filled up because there are a lot of app eventlogging events in beta
[13:36:05] usually beta eventlogging is quiet
[13:36:16] but there look to be a few MobileWikiApp* events / second in beta now
[13:49:29] ottomata: helloooo.. do you have a min to join the cave?
[13:50:07] np ottomata :)
[13:50:19] ottomata: sorry for the lag, fightin
[13:50:23] with sqoop
[13:51:43] 5 mins...
[14:08:40] moritzm: I'm sorry to have disturbed - It was a completely unrelated issue
[14:12:42] hi team
[14:12:43] :]
[14:20:43] hiii
[14:26:58] I made it
[14:27:31] ottomata: wanna finish the datasets cleanup with me today?
[14:31:06] haha yes milimetric! after 3 today? we got meetings and i have an interview at 2pm
[14:31:15] k, les do it
[14:33:54] milimetric: got a minute for me?
[14:33:58] def
[14:34:06] In da CAVE :)
[14:34:12] omw
[14:45:04] fdans: have you messed with the webpack config in any branch?
[14:45:17] I'm trying to follow different tutorials to make a simple prod config
[14:45:30] milimetric: lms
[14:45:31] and I'd like to move the files into webpack/ and restructure a bit
[14:45:48] if you pushed to some branch I can just check it out, but I don't see anything
[14:46:30] milimetric: no I'm using the one you pushed in aqs
[14:46:51] anybody know what the beta eventlogging mysql db root pw is?
[14:46:53] ok, cool, will finish that feature and put it in develop then start a new one
[14:47:10] ottomata: should be in /etc/mysql.d?
[14:48:31] ha, there's nothing in there!
[14:49:33] I don't know then, though there's this way to log in to mysql locally where you bypass any user access, with like a system account
[14:49:45] I always forget, one sec
[14:49:50] 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Make non-nullable columns in EL database nullable - https://phabricator.wikimedia.org/T167162#3370722 (10elukey) Following @Marostegui's advice I changed the script to generate more precise alter tables, adding to `varchar` and `varbinary` their length (...
[14:51:31] mforns: o/ --^
[14:51:35] re-generated the alters
[14:51:38] elukey, hi! reading
[14:52:37] arg, can't find it, sorry
[14:53:06] elukey, makes sense
[14:53:18] i am having the not uncommon: what the heck eventlogging code is this deployment prep instance running
[14:53:21] surely not MY code
[14:53:25] i changed it!
[15:02:37] ottomata, this happened to me when I did not python setup.py install it after pulling...
[15:02:37] OH MY STANDUP
[15:02:48] do people do that?
[15:02:52] you are def not supposed to do that anymore
[15:02:57] if someone did that, that is why this isn't working!
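A quick way to answer "what eventlogging code is this instance running" is to ask Python where it actually imports the module from: a stale global `setup.py install` shows up as a path under `/usr/local/lib/python2.7/...` instead of the scap deploy path. A small sketch (the `eventlogging` module name is the one from the log; the rest is generic):

```python
# Print the file a module would be imported from, to spot a stale global
# install shadowing the deployed copy.
import importlib

def module_origin(name):
    """Return the module's source file, '<built-in>' if it has none,
    or None if the module is not importable."""
    try:
        mod = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(mod, "__file__", "<built-in>")

if __name__ == "__main__":
    for name in ("eventlogging", "json"):
        print(name, "->", module_origin(name))
```

Running this on the beta instance would have shown immediately whether `eventlogging` resolved to `/usr/local/...` (the forbidden global install) or to the deploy directory.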
[15:03:56] mforns: ^
[15:04:20] ottomata, yea I did it
[15:04:24] :/
[15:05:07] scap deploy was not working in beta last time I tested
[15:05:16] never do a global setup.py install anymore
[15:05:22] it should work directly from the deploy path
[15:05:35] i just did : sudo find /usr/local/lib/python2.7/ -exec rm -rfv {} \;
[15:06:24] 10Analytics-Kanban, 10Patch-For-Review: Rename last_access_uniques to per-domain uniques - https://phabricator.wikimedia.org/T167043#3315501 (10Milimetric) p:05Triage>03Normal
[15:07:20] 10Analytics-Kanban: Rename unique_devices_project_wide to unique_devices_per_project_class - https://phabricator.wikimedia.org/T168402#3370812 (10Milimetric) p:05Triage>03Normal
[15:07:34] 10Analytics-Kanban, 10Patch-For-Review: Load webrequest raw data into druid so ops can use it for troubleshooting - https://phabricator.wikimedia.org/T166967#3370816 (10Milimetric) p:05Triage>03Normal
[15:07:36] 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Make non-nullable columns in EL database nullable - https://phabricator.wikimedia.org/T167162#3370817 (10Milimetric) p:05Triage>03Normal
[15:07:49] 10Analytics-Kanban: Modify EventLogging so that all table fields are nullable - https://phabricator.wikimedia.org/T167161#3370818 (10Milimetric) p:05Triage>03Normal
[15:07:55] 10Analytics-Kanban, 10Page-Previews, 10Reading-Web-Backlog: Update purging settings for Schema:Popups - https://phabricator.wikimedia.org/T167449#3370821 (10Milimetric) p:05Triage>03Normal
[15:08:02] 10Analytics-Kanban: Implement purging settings for Schema:ReadingDepth - https://phabricator.wikimedia.org/T167439#3370822 (10Milimetric) p:05Triage>03Normal
[15:09:04] 10Analytics-EventLogging, 10Analytics-Kanban: whitelist multimedia and upload wizard tables - https://phabricator.wikimedia.org/T166821#3370823 (10Milimetric) p:05Triage>03Normal
[15:09:20] 10Analytics-Kanban, 10Patch-For-Review: Update per-domain uniques fresh-sessions computation -
https://phabricator.wikimedia.org/T167005#3370824 (10Milimetric) p:05Triage>03Normal
[15:10:48] joal: ok :-)
[15:11:26] moritzm: classical PBKAC, I'm very ashamed to have cried wolf
[15:26:11] 10Analytics-Kanban: Change retention time for deployment-kafka01 - https://phabricator.wikimedia.org/T168576#3370874 (10JAllemandou) 05Open>03Resolved a:03Ottomata
[15:26:15] ottomata: I gave it 3 points for you
[15:29:34] danke :)
[15:36:47] (03PS1) 10Joal: Add two tables to sqoop on hadoop [analytics/refinery] - 10https://gerrit.wikimedia.org/r/360866
[15:37:58] nice joal, that basically proves that in the vast majority of cases revision count is a good approximation for editcount
[15:38:09] you can reply to those big threads and contradict me :)
[15:38:15] correct milimetric :)
[15:38:28] milimetric: I'll wait for the boss approval ;)
[15:38:59] milimetric: plus, I try not to contradict my fellows, I let them do it ;)
[15:39:48] well, it's your research so you should present it, but I can set you up with "actually, it's pretty close in reality, Joseph has numbers".
I'll do that now
[15:41:25] 10Analytics: Productionize analysis of editcount vs per_user_revision_count - https://phabricator.wikimedia.org/T168648#3370932 (10JAllemandou)
[15:41:29] milimetric: you can refer --^
[15:41:31] :)
[15:41:54] k, nice
[15:42:38] a-team, leaving now for the prez - see you after :)
[15:45:10] (03CR) 10Ottomata: [C: 031] Add two tables to sqoop on hadoop [analytics/refinery] - 10https://gerrit.wikimedia.org/r/360866 (owner: 10Joal)
[16:07:32] 10Analytics, 10EventBus, 10ORES, 10Reading-Infrastructure-Team-Backlog, and 3 others: Emit revision-score event to EventBus and expose in EventStreams - https://phabricator.wikimedia.org/T167180#3371058 (10Ottomata)
[16:08:28] 10Analytics, 10EventBus, 10ORES, 10Reading-Infrastructure-Team-Backlog, and 3 others: Emit revision-score event to EventBus and expose in EventStreams - https://phabricator.wikimedia.org/T167180#3319947 (10Ottomata) @halfak, could you look over https://gerrit.wikimedia.org/r/#/c/357457/5/jsonschema/mediawi...
[16:23:17] brb
[16:25:30] 10Analytics, 10EventBus, 10ORES, 10Reading-Infrastructure-Team-Backlog, and 3 others: Emit revision-score event to EventBus and expose in EventStreams - https://phabricator.wikimedia.org/T167180#3371136 (10Pchelolo) > @Pchelolo, what do we need to do to produce this from changeprop? An hour of time to imp...
[16:28:02] fdans: oh man, I was stuck for a couple of hours on the dumbest thing
[16:28:29] milimetric: aw, what was it?
[16:28:39] you had config = require('../webpack.config') in build/utils.js and I was running all over the place trying to figure out this little loop
[16:28:51] but the error was very clear in the stack trace
[16:29:13] I've gotta learn to like relax and read the error message, grok it fully
[16:29:21] if it's not super friendly I freak out :)
[16:29:46] I take it all personal, like WHY ARE YOU SO MEAN
[16:29:47] lol
[16:32:07] milimetric: WELL IT WAS ALL ON PURPOSE
[16:32:30] yeah, like the world has time to orchestrate a really sneaky well planned attack on me
[16:32:50] in June 2017 he'll be using the latest version of webpack-merge everyone, let's get ready
[16:36:57] hahah
[16:41:15] * elukey off!
[16:56:33] 10Analytics-Kanban, 10Wikimedia-Stream: Port RCStream clients to EventStreams - https://phabricator.wikimedia.org/T156919#3371331 (10Ottomata) Just a reminder that we will be shutting off RCStream after July 7th.
[17:28:57] (03PS1) 10Ottomata: Adding test for authorship [analytics/refinery] - 10https://gerrit.wikimedia.org/r/360881
[17:33:08] (03Abandoned) 10Ottomata: Adding test for authorship [analytics/refinery] - 10https://gerrit.wikimedia.org/r/360881 (owner: 10Ottomata)
[17:33:10] (03PS1) 10Ottomata: test auth [analytics/refinery] - 10https://gerrit.wikimedia.org/r/360882
[17:36:25] (03Abandoned) 10Ottomata: test auth [analytics/refinery] - 10https://gerrit.wikimedia.org/r/360882 (owner: 10Ottomata)
[17:40:23] (03CR) 10Ottomata: [C: 032] Remove deprecated ClientIpUDF and deprecate Legacy Pageview code.
[analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/358603 (https://phabricator.wikimedia.org/T118557) (owner: 10Ottomata)
[17:42:23] 10Analytics-Kanban, 10Operations, 10Traffic, 10Patch-For-Review: Replace Analytics XFF/client.ip data with X-Client-IP - https://phabricator.wikimedia.org/T118557#3371503 (10Ottomata) 05Open>03Resolved
[18:34:39] hm, the wikistats repository is mirrored and it appears it's not separate from the gerrit one, as changes are pushed to both: https://phabricator.wikimedia.org/diffusion/ANWS/
[18:49:34] 10Analytics, 10EventBus, 10ORES, 10Reading-Infrastructure-Team-Backlog, and 3 others: Emit revision-score event to EventBus and expose in EventStreams - https://phabricator.wikimedia.org/T167180#3371721 (10Pchelolo) A couple of open questions we need to answer first: 1. Are we going to post all revision-c...
[18:58:04] 10Analytics, 10EventBus, 10ORES, 10Reading-Infrastructure-Team-Backlog, and 3 others: Emit revision-score event to EventBus and expose in EventStreams - https://phabricator.wikimedia.org/T167180#3371757 (10Ottomata) > Are we going to post all revision-create events to the new stream even if ORES is not sup...
[19:00:15] 10Analytics, 10EventBus, 10ORES, 10Reading-Infrastructure-Team-Backlog, and 3 others: Emit revision-score event to EventBus and expose in EventStreams - https://phabricator.wikimedia.org/T167180#3371764 (10Pchelolo) > Not sure! If change-prop does validation of the events, I think I'm fine with that. It d...
[19:01:10] milimetric: !
[19:01:11] hiiii
[19:05:16] hey ottomata
[19:05:46] cave?
[19:05:53] k
[19:38:25] ottomata: super tiny CR for you when you have a minute :) https://gerrit.wikimedia.org/r/#/c/360895/
[19:49:23] 10Analytics, 10EventBus, 10ORES, 10Reading-Infrastructure-Team-Backlog, and 3 others: Emit revision-score event to EventBus and expose in EventStreams - https://phabricator.wikimedia.org/T167180#3371947 (10Pchelolo) > Not sure!
If change-prop does validation of the events, I think I'm fine with that. Hm.....
[19:54:03] done, thanks bearloga
[19:54:18] ottomata: thanks! :D
[19:55:28] 10Analytics, 10EventBus, 10ORES, 10Reading-Infrastructure-Team-Backlog, and 3 others: Emit revision-score event to EventBus and expose in EventStreams - https://phabricator.wikimedia.org/T167180#3319947 (10mobrovac) >>! In T167180#3371757, @Ottomata wrote: >> Are we going to post all revision-create events...
[19:56:18] ottomata: hypothetically it should become available the next time puppet agent runs on stat1002, which is probably sometime soon, right?
[19:56:36] i just ran puppet!
[19:56:37] you got it.
[19:56:38] :)
[19:56:42] shweet
[19:56:49] ottomata: ta!
[20:07:19] 10Analytics, 10EventBus, 10ORES, 10Reading-Infrastructure-Team-Backlog, and 3 others: Emit revision-score event to EventBus and expose in EventStreams - https://phabricator.wikimedia.org/T167180#3371998 (10Pchelolo) > That. Let's not duplicate if we don't have to. The revision-create traffic is pretty low...
[20:27:33] 10Analytics-Kanban, 10Wikimedia-Stream: Port RCStream clients to EventStreams - https://phabricator.wikimedia.org/T156919#3372018 (10Krinkle)
[20:29:02] 10Analytics, 10EventBus, 10ORES, 10Reading-Infrastructure-Team-Backlog, and 3 others: Emit revision-score event to EventBus and expose in EventStreams - https://phabricator.wikimedia.org/T167180#3372019 (10Ottomata) > This sounds like we are pushing this forward without a clear understanding of what we wan...
[20:51:21] 10Analytics-Kanban: Update undocumented EventLogging mediawiki hooks - https://phabricator.wikimedia.org/T158331#3372097 (10Ottomata) 05Open>03Resolved Done: https://www.mediawiki.org/wiki/Manual:Hooks/EventLoggingRegisterSchemas
[20:51:25] 10Analytics, 10Analytics-EventLogging, 10Collaboration-Team-Triage, 10MediaWiki-ContentHandler, and 5 others: Multiple MediaWiki hooks are not documented on mediawiki.org - https://phabricator.wikimedia.org/T157757#3372099 (10Ottomata)
[20:51:41] 10Analytics-Kanban: Update undocumented EventLogging mediawiki hooks - https://phabricator.wikimedia.org/T158331#3372114 (10Ottomata) 05Resolved>03Open Will resolve later...
[20:51:43] 10Analytics, 10Analytics-EventLogging, 10Collaboration-Team-Triage, 10MediaWiki-ContentHandler, and 5 others: Multiple MediaWiki hooks are not documented on mediawiki.org - https://phabricator.wikimedia.org/T157757#3015493 (10Ottomata)
[21:37:42] 10Analytics, 10Analytics-Backlog, 10Analytics-EventLogging, 10Fundraising-Backlog, and 4 others: Promise returned from LogEvent should resolve when logging is complete - https://phabricator.wikimedia.org/T112788#3372575 (10mmodell)
[21:44:21] 10Analytics, 10Analytics-EventLogging, 10Beta-Cluster-Infrastructure, 10Fundraising-Backlog, and 4 others: Beta Cluster EventLogging data is disappearing? - https://phabricator.wikimedia.org/T112926#3373008 (10mmodell)
[21:48:43] 10Analytics, 10Analytics-EventLogging, 10Fundraising-Backlog: Nested EventLogging data doesn't get copied to MySQL - https://phabricator.wikimedia.org/T112947#3373224 (10mmodell)
[22:26:09] 10Analytics-Tech-community-metrics, 10Gerrit, 10Upstream: Gerrit patchset 99101 cannot be accessed: "500 Internal server error" - https://phabricator.wikimedia.org/T161206#3373558 (10Paladox) The change on GitHub is https://github.com/wikimedia/operations-debs-kafka/commit/dbed4e47a6df5028263d62eb6ec97daa588...
[22:57:29] 10Analytics, 10Analytics-EventLogging, 10Fundraising-Backlog: Nested EventLogging data doesn't get copied to MySQL - https://phabricator.wikimedia.org/T112947#3373631 (10Nuria) 05Open>03Resolved