[03:44:49] in Hive webrequest, is there any way of seeing whether a request hit PHP or was served by Varnish?
[03:49:11] maybe the varnishes will have a different host value?
[03:58:37] mmmm I'll bug folks tomorrow ;p cya!
[08:17:24] joal_: o/
[08:17:35] whenever you are ready I'd like to start rebooting druid :)
[09:42:28] joal_: /me plays sad_trombone.wav
[09:42:29] CREATE KEYSPACE system_auth WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;
[09:45:47] this is probably why we have seen some 503s during the last reboots
[09:47:18] elukey: Arfff :(
[09:47:36] elukey: how did you find that?
[09:48:20] https://wikitech.wikimedia.org/wiki/Incident_documentation/20161021-Maps
[09:48:35] I helped Guillaume to follow up on it
[09:48:50] then I thought "wait a minute.. did we do it in the new cluster?"
[09:48:58] and then the sad trombone wav
[09:49:00] sigh
[09:49:00] good catch elukey !
[09:49:41] elukey: I mean, the price is small compared to a real issue and having to find it while the cluster is on fire !
[09:53:59] so from https://docs.datastax.com/en/cql/3.1/cql/cql_using/update_ks_rf_t.html it seems that we could use SimpleStrategy and 6?
[09:54:25] and then nodetool repair on one node to kick off the replicas
[09:54:53] mmm probably on all the nodes
[09:55:25] elukey: I think repair needs to happen on all nodes (from what I read), but I also think we don't want a global repair (on the per-article keyspace, it's a nightmare)
[09:56:28] elukey: I think we want to repair only the system_auth keyspace
[09:57:08] yeah
[09:58:23] and we can give it the keyspace (https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsRepair.html)
[09:58:27] so something like
[09:58:53] ALTER KEYSPACE "system_auth" WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 6 };
[09:59:01] on each node:
[09:59:09] nodetool repair system_auth
[09:59:17] (that is nodetool-{a,b})
[10:00:30] Sounds good :)
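Putting the pieces above together, a minimal sketch of the agreed fix; the ALTER statement is the one from the conversation, and running both per-instance wrappers on every node follows the plan stated at 09:59 (assuming cqlsh can connect locally with sufficient privileges):

```bash
# once, from any node: the replication factor change is cluster-wide metadata
cqlsh -e "ALTER KEYSPACE system_auth WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 6};"

# then on every aqs100[456] host, for both Cassandra instances,
# so the new replicas actually receive the auth data
nodetool-a repair system_auth
nodetool-b repair system_auth
```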
[10:23:36] hi, does anyone know mschwarzer (he works on citolytics) or whether he is on irc?
[10:24:08] Hi dcausse, sorry, I don't know him
[10:24:49] joal: are you aware of an apache flink job to generate data for citolytics (article recommendation)?
[10:25:02] dcausse: absolutely not
[10:25:05] :/
[10:25:18] did we ever schedule something like that with oozie?
[10:25:31] I mean apache flink
[10:25:38] dcausse: I have not heard of flink on our cluster (except us testing it)
[10:26:23] dcausse: it really depends on the job type
[10:26:44] ok, I don't know where the code is so it's hard to tell :/
[10:26:45] dcausse: flink is mostly used for streaming purposes - in that case, oozie makes no sense
[10:26:53] ah...
[10:27:14] dcausse: if it's a batch process in flink, oozie could do
[10:27:40] unfortunately I have no idea... I don't know where the code is
[10:28:16] https://github.com/wikimedia/citolytics
[10:29:10] dcausse: looks like those jobs are batch
[10:29:34] joal: at a glance it seems possible to use oozie then?
[10:30:43] dcausse: correct, using the plain old java way
[10:31:14] dcausse: there is no oozie action for flink - so it would mean launching jobs using java actions
[10:31:20] he needs to push his data to the production cluster; since we already have a process to do that for pageviews => elasticsearch, I've suggested plugging his stuff into our oozie workflows
[10:31:23] joal: ok
[10:32:30] dcausse: 2 things to keep in mind: the data his jobs depend on needs to be available on the cluster, and how many resources his jobs need
[10:33:22] joal: he uses clicks data so I suppose he already ran it on our cluster
[10:34:15] dcausse: I think the clicks data is generated by ellery and available online (but not sure)
[10:34:28] oh
[10:34:41] dcausse: I don't think those flink jobs have ever been run on the cluster (but I'm not aware of everything happening)
[10:35:00] ok...
[10:53:21] joal: stupid question: is it possible for oozie to wait for 2 data-in in input-events? I mean start the job only if 2 dependent jobs are done?
[10:53:34] dcausse: it sure is
[10:53:39] thanks!
[10:57:35] * elukey lunch!
[11:17:58] hi team!
[11:34:28] Hey mforns :)
[11:34:37] hello joal!
[12:10:06] Analytics-Kanban: Create documentation for edit history reconstruction - https://phabricator.wikimedia.org/T139763#2741447 (mforns) https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Data_loading
[12:21:36] joal: ready to reboot druid?
[12:21:50] sure elukey
[12:24:10] all right, proceeding with druid1001
[12:26:54] elukey: good thing that you restarted them, I needed a fresh cache for some tests ;)
[12:26:59] :P
[12:29:32] * joal is happy - removing commons and wikisource projects makes user and page workable
[12:29:57] * joal is sad - we need to find a way to handle commons and wikisource projects
[12:30:26] why is it almost always the case that I have a hate/love relationship with my solved problems?
[12:35:49] :D
[12:36:20] Analytics-Wikistats: Wikistats report on active editors for all projects (deduplicated) is inconsistent with other Wikistats reports - https://phabricator.wikimedia.org/T149087#2741529 (ezachte)
[12:56:17] ok joal I restarted clickhouse on druid1001
[12:56:26] can you check that everything is good?
[12:57:47] sure elukey
[12:59:02] I don't like non-puppetized things in prod
[12:59:12] I mean, I get the fact that we want to test things
[12:59:21] but at some point, if we use them, we need to put them in puppet
[13:02:42] Analytics-Kanban: Create documentation for edit history reconstruction - https://phabricator.wikimedia.org/T139763#2741570 (mforns) https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Serving_layer
[13:03:23] * elukey grumpy mode off
[13:08:35] joal: ?
[13:08:45] can I proceed with druid1002? :)
[13:09:09] elukey: yup !
[13:09:41] super
[13:14:09] urandom: o/ can you ping me whenever you have 10 minutes?
[13:29:10] Analytics-Kanban: Create documentation for edit history reconstruction - https://phabricator.wikimedia.org/T139763#2741629 (mforns) And also added the links to the new pages here: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake
[13:30:21] Analytics-Kanban: Create documentation for edit history reconstruction - https://phabricator.wikimedia.org/T139763#2741636 (mforns) I think this task is pretty much done. Of course, the docs are not perfect, so please add/modify anything you like.
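On the two data-in question above: a minimal oozie coordinator sketch under assumptions (dataset names, paths, and dates are hypothetical); the action only materializes once both dataset instances exist, which is how a job waits on two upstream jobs:

```xml
<coordinator-app name="wait-for-two" frequency="${coord:days(1)}"
                 start="2016-10-01T00:00Z" end="2017-01-01T00:00Z"
                 timezone="Universal" xmlns="uri:oozie:coordinator:0.4">
  <datasets>
    <dataset name="job_a_out" frequency="${coord:days(1)}"
             initial-instance="2016-10-01T00:00Z" timezone="Universal">
      <uri-template>hdfs:///user/example/job_a/${YEAR}-${MONTH}-${DAY}</uri-template>
    </dataset>
    <dataset name="job_b_out" frequency="${coord:days(1)}"
             initial-instance="2016-10-01T00:00Z" timezone="Universal">
      <uri-template>hdfs:///user/example/job_b/${YEAR}-${MONTH}-${DAY}</uri-template>
    </dataset>
  </datasets>
  <input-events>
    <!-- both data-in entries must be satisfied before the workflow starts -->
    <data-in name="a_done" dataset="job_a_out"><instance>${coord:current(0)}</instance></data-in>
    <data-in name="b_done" dataset="job_b_out"><instance>${coord:current(0)}</instance></data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>hdfs:///user/example/workflow.xml</app-path>
    </workflow>
  </action>
</coordinator-app>
```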
[13:31:18] Analytics-Kanban, Patch-For-Review: Extract edit oriented data from MySQL for simplewiki - https://phabricator.wikimedia.org/T134790#2741639 (mforns)
[13:31:20] Analytics: User History: Add history of annonymous users to history reconstruction - https://phabricator.wikimedia.org/T139760#2741638 (mforns)
[13:32:06] druid100[123] rebooted, clickhouse should be working
[13:39:43] elukey: confirmed !
[13:41:06] joal: if we are planning to keep it longer, I'd suggest spending some time to puppetize it
[13:42:05] elukey: we need to discuss that with the team :)
[13:43:58] of course, but I am going to have a strong opinion on it :D
[13:44:16] I am in favor of bending the rules for experimental services for a bit
[13:44:30] since testing these use cases in labs is painful
[13:44:46] but when we cross a certain usage we need to put things in puppet
[14:04:36] hey can someone link me the job coding task? i lost it...
[14:20:36] elukey: when I said discuss with the team, I was thinking about the need to keep it longer :) Of course puppetization will be needed in that case ;)
[14:20:57] joal: ahahhah okok
[14:21:07] I am in favor of keeping it in the medium/long term
[14:21:35] :D
[14:22:11] elukey: just ran some tests: it works really great, but there are some drawbacks
[14:22:37] elukey: particularly, there is no [sub]-query level cache
[14:23:02] elukey: in comparison to druid, this makes a big difference for repeated queries
[14:23:07] yeah
[14:23:16] I am super ignorant but I got this part
[14:23:17] :D
[14:23:35] :)
[14:24:29] elukey: I can't recall what we already discussed about clickhouse ;)
[14:24:57] I haven't with you, but I followed a bit of your discussions with the team
[14:25:25] Ahhhh, ok :)
[14:26:20] some results here if you want: https://docs.google.com/spreadsheets/d/1uPDmua7m3h5iH9LsbVPirOIcQ9Kc3D6LopW4S4UXpk8/edit#gid=0
[14:26:24] elukey: --^
[14:28:30] I haz no permissionz
[14:28:32] :(
[14:28:51] https://docs.google.com/a/wikimedia.org/spreadsheets/d/1uPDmua7m3h5iH9LsbVPirOIcQ9Kc3D6LopW4S4UXpk8/edit?usp=sharing
[14:28:55] elukey: --^ better?
[14:29:08] yesss
[14:29:52] wow, there seems to be no clear winner
[14:30:07] elukey: depends on the queries :)
[14:30:17] complex group by --> clickhouse way faster
[14:30:29] simple queries - same answers
[14:30:39] BUT, no cache on clickhouse
[14:30:43] :S
[14:42:43] elukey: last but not least, clickhouse answers a much larger span of queries than druid does
[14:44:16] Analytics: Provide API for sampling pageviews - https://phabricator.wikimedia.org/T126290#2741812 (Nuria) >Yes, this can be done manually with a hive query, but then that can be said about all Analytics APIs :) Not really, the APIs exist to provide aggregated (long term) data for WMF and the community. In t...
[14:50:29] elukey: about to go into a meeting, i'll ping you back when I'm done (~1 hour)
[14:55:25] elukey: leila couldn't get into pivot yesterday either, how can I check that she's in wmf/nda in puppet?
[14:55:37] urandom: super thanks!
[14:55:48] milimetric: going to check on terbium
[14:55:56] what is her username?
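A sketch of the kind of membership check being run here, under assumptions: standard OpenLDAP tooling, the wmf/nda group names from the conversation, and a guessed base DN of ou=groups,dc=wikimedia,dc=org (the actual directory layout may differ):

```bash
# check whether $USER appears in the wmf or nda LDAP groups
USER=example   # placeholder until the username is confirmed
for GROUP in wmf nda; do
  ldapsearch -x -b "ou=groups,dc=wikimedia,dc=org" "(cn=$GROUP)" member memberUid \
    | grep -i "$USER" || echo "$USER not found in $GROUP"
done
```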
[14:56:40] lzia or leila, lemme check on wikitech though
[14:57:19] it looks like it's "leila", elukey
[14:58:00] her case is even weirder than that of the other folks who don't have access, 'cause she's a researcher and should have pretty much everything
[14:58:36] and she should be in wmf
[15:01:45] milimetric: she seems to be neither in wmf nor in nda :(
[15:02:17] I tried leila and lzia
[15:02:27] this is weird
[15:02:32] she should be at least in wmf
[15:02:36] we should open a phab task
[15:50:56] Analytics-Kanban, Patch-For-Review: Extract edit oriented data from MySQL for simplewiki - https://phabricator.wikimedia.org/T134790#2741939 (Nuria)
[15:50:58] Analytics-Kanban: Create documentation for edit history reconstruction - https://phabricator.wikimedia.org/T139763#2741938 (Nuria) Open>Resolved
[15:51:09] Analytics-Kanban, Patch-For-Review: Extract edit oriented data from MySQL for simplewiki - https://phabricator.wikimedia.org/T134790#2277147 (Nuria) Open>Resolved
[15:51:22] Analytics-Kanban: Make yarn.wikimedia.org correctly proxy to Spark UI - https://phabricator.wikimedia.org/T147927#2741941 (Nuria) Open>Resolved
[15:51:37] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Examine puppet code for EventLogging and make sure monitoring is using the best counts - https://phabricator.wikimedia.org/T147321#2741942 (Nuria) Open>Resolved
[15:56:40] (PS1) Ori.livneh: Die if no metrics received in one minute [analytics/statsv] - https://gerrit.wikimedia.org/r/317847
[16:00:34] elukey: ok; meeting over!
[16:00:38] elukey: sup?
[16:02:04] joal: I lost track of what you said exactly about the boolean columns and now I'm confused
[16:02:19] hey milimetric:
[16:02:32] the booleans are coming across ok, but one of them was coming in as boolean when it shouldn't?
[16:02:46] milimetric: I think the boolean is currently stored as a TINYINT in mariadb
[16:03:01] yes, and there are some TINYINT(4), right?
[16:03:04] that hive thinks are bools?
[16:03:19] and for TINYINT(1), spark can read it as boolean, but for TINYINT(4) it can't
[16:03:23] but the TINYINT(1)s are all coming in ok as booleans, right?
[16:03:35] milimetric: correct, I think that's correct
[16:03:47] ok, you don't by chance know all the tinyint(4)s, right?
[16:04:01] milimetric: I have not checked each and every DB for column type, but I noticed it
[16:04:18] I'll check all schemas for tinyints
[16:04:23] thx and apologies
[16:04:24] milimetric: really?
[16:04:36] milimetric: wouldn't it be easier to cast to boolean, for instance?
[16:04:58] urandom: can we sync in ~1hr? I have a meeting now :(
[16:05:03] heh
[16:05:08] elukey: tag, you're it!
[16:05:15] the tinyint(1)s are being recognized correctly by hive, and casting would use up needless CPU there. I think I just need to find the tinyint(4)s and cast them to int
[16:05:17] elukey: but, yeaa.
[16:05:30] unless spark is behaving differently, joal
[16:06:03] milimetric: you mean cast them to bools?
[16:06:15] I don't get it milimetric
[16:06:27] joal: let's batcave, this is too confusing for something so simple :)
[16:06:38] OMW milimetric !
[16:06:39] https://hangouts.google.com/hangouts/_/wikimedia.org/a-batcave-2
[16:06:43] joal: ^
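The casting under discussion, as a minimal HiveQL sketch (table and column names are hypothetical): TINYINT(1) columns already arrive as booleans via JDBC, while wider TINYINTs arrive as integers and can be normalized explicitly:

```sql
-- hypothetical sqooped table; turn an integer-typed flag into a boolean
SELECT
  rev_id,
  IF(rev_minor_edit != 0, true, false) AS is_minor_edit
FROM revision_raw;
```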
[16:16:37] addshore: any idea why the wikidatawiki sql database has tinyint(3) and tinyint(4) for a bunch of columns that are tinyint(1) in other dbs, like enwiki?
[16:16:50] for example: revision.rev_is_minor, archive.ar_minor
[16:17:24] (PS2) Ori.livneh: Die if no metrics received in one minute [analytics/statsv] - https://gerrit.wikimedia.org/r/317847
[16:18:02] *reads up*
[16:18:23] milimetric: that seems rather odd...
[16:18:39] yep :)
[16:18:48] ottomata, elukey: I'm sorry, I completely missed the ops meeting :(
[16:18:58] should I file a task about it, and if so who do I CC, addshore?
[16:19:11] *quickly looks into it*
[16:20:08] i may be able to figure this out ;)
[16:22:23] thx, it's appreciated, but it's not a blocker for me right now (we have to deal with it anyway)
[16:22:40] milimetric: well, super odd: in the sql files they are simply always tinyint (nothing specified)
[16:22:53] sql files? what are those
[16:22:57] it should definitely not be anything that wikibase / wikidata has done
[16:23:16] like... the db page files?!
[16:23:28] https://github.com/wikimedia/mediawiki/blob/master/maintenance/tables.sql#L348
[16:26:59] so milimetric, on my local install I also have tinyint(3), so I guess tinyint(3) might be the default now when you specify tinyint
[16:27:35] it's possible the older dbs thus have tinyint(2) and tinyint(1), either because the default used to be different or the SQL that created the tables was different
[16:32:53] thanks addshore, I'm checking all schemas now and will file a task
[16:32:53] (CR) Ori.livneh: [C: 2 V: 2] Use ^ and $ while spliting metric value and type [analytics/statsv] - https://gerrit.wikimedia.org/r/308959 (owner: Addshore)
[16:33:07] thanks ori !
[16:33:07] Analytics, Beta-Cluster-Infrastructure, WikimediaPageViewInfo: Deploy WikimediaPageViewInfo extension to beta cluster - https://phabricator.wikimedia.org/T129602#2742043 (greg) >>! In T129602#2741033, @MZMcBride wrote: > Deploying this extension to the beta cluster should be fairly straightforward....
[16:33:19] addshore: np, sorry it took forever
[16:35:28] addshore: you're right, all the new dbs (wikivoyages, wikidata, etc.) have tinyint(3), and some of the super new ones have tinyint(4)
[16:36:34] (PS3) Ori.livneh: Die if no metrics received in one minute [analytics/statsv] - https://gerrit.wikimedia.org/r/317847
[16:36:49] (CR) Ori.livneh: [C: 2 V: 2] Die if no metrics received in one minute [analytics/statsv] - https://gerrit.wikimedia.org/r/317847 (owner: Ori.livneh)
[16:43:04] joal: I was right, log_deleted is legitimately a tinyint(3) in all dbs. And you were right that instead of tinyint(1), relatively recent dbs use tinyint(3) and new ones use tinyint(4). So I'm going to run the sqoop on something small now
[16:43:37] (PS23) Milimetric: Script sqooping mediawiki tables into hdfs [analytics/refinery] - https://gerrit.wikimedia.org/r/306292 (https://phabricator.wikimedia.org/T141476)
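The schema audit described above, as a sketch against MariaDB's information_schema (the optional database filter is hypothetical):

```sql
-- list every tinyint column and its display width across wiki databases
SELECT table_schema, table_name, column_name, column_type
FROM information_schema.columns
WHERE data_type = 'tinyint'
  AND table_schema LIKE '%wiki%'   -- hypothetical filter for wiki dbs
ORDER BY table_schema, table_name, column_name;
```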
[16:43:42] milimetric: how will you convert the data to make it coherent?
[16:43:55] joal: trying the if trick you mentioned, and will try other stuff if that doesn't work
[16:44:04] ok great
[16:44:27] milimetric: let me know if you want me to test values in spark
[16:59:08] joal: oh, of course this won't finish until your jobs finish
[16:59:26] but it's a really small wiki so it'll be available soon and I'll let you know when you can test
[16:59:43] milimetric: super, it's easy to test by hand
[17:15:43] urandom: o/
[17:16:03] so today I discovered that system_auth on aqs100[456] has replication 1
[17:16:06] :(
[17:16:22] so what I want to do is alter the keyspace to have replication 6
[17:16:37] and then run nodetool-{a,b} repair system_auth on all nodes
[17:16:53] just wanted to double check with you if this sounds ok or if there are precautions to take
[17:21:06] milimetric: I found an interesting thing, I think: pivot never does a groupBy query. It sends many smaller queries and combines the results
[17:21:14] milimetric: from what I understood
[17:21:42] anyway, AFK, will be back after dinner to check jobs
[17:21:53] joal: oh that's cool
[17:26:43] mforns: hey, I have a few minutes before my interview, is it useful to hang out or would I just bother you at this point?
[17:26:52] batcave milimetric !
[17:27:40] 1981 python3 sqoop-mediawiki-tables --jdbc-host analytics-store.eqiad.wmnet --output-dir /user/milimetric/wmf/data/raw/mediawiki/tables --wiki-file "/home/milimetric/refinery/static_data/mediawiki/grouped_wikis/grouped_wikis.csv" --timestamp 20161017000000 --user research --password-file /user/milimetric/mysql-analytics-research-client-pw.t
[17:27:43] heh
[17:27:44] oops
[17:27:48] mforns: https://hangouts.google.com/hangouts/_/wikimedia.org/a-batcave-2
[17:27:48] milimetric, having problems joining hangouts...
[17:27:49] hehe
[17:28:43] milimetric, timeouts..
[17:28:51] sucks
[17:30:15] no..
[17:30:57] milimetric, what was the other open videoconference app we used sometimes?
[17:31:22] mforns: https://appear.in/wmf-batcave
[17:35:42] team, logging off!
[17:35:46] byyeee o/
[18:07:58] (CR) Dzahn: [C: 1] Replace Bugzilla links by Phabricator links [analytics/wikistats] - https://gerrit.wikimedia.org/r/315417 (owner: Aklapper)
[18:08:39] heading to cafe
[18:26:23] elukey: did i miss you?
[18:56:18] urandom: yes, he logs off by 11 pst most days, can i help you though?
[19:09:33] (CR) Nuria: "Looks good, did you test across browsers and such?" [analytics/dashiki] - https://gerrit.wikimedia.org/r/316834 (https://phabricator.wikimedia.org/T147884) (owner: Milimetric)
[19:09:45] Analytics, Editing-Analysis: Determine: What percentage of new articles are created by non-autoconfirmed editors - https://phabricator.wikimedia.org/T149021#2742507 (Quiddity)
[19:12:39] (CR) Milimetric: "Yes, we only tested firefox and chrome though. The next change will need the same kind of manual cross-browser testing, so we figured we'" [analytics/dashiki] - https://gerrit.wikimedia.org/r/316834 (https://phabricator.wikimedia.org/T147884) (owner: Milimetric)
[19:15:06] (CR) Nuria: "Tested from src and layouts in chrome and things look good. Let's merge?" [analytics/dashiki] - https://gerrit.wikimedia.org/r/316834 (https://phabricator.wikimedia.org/T147884) (owner: Milimetric)
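Relatedly, for the sqoop run pasted above: Sqoop's MySQL JDBC driver treats TINYINT(1) as a bit/boolean by default, which is the standard knob behind this whole class of problem. A hedged sketch of the two usual workarounds (the connection string and column mapping are illustrative, not the actual script's flags; paths and names are hypothetical):

```bash
# illustrative only: force TINYINT(1) through as an integer instead of boolean
sqoop import \
  --connect 'jdbc:mysql://analytics-store.eqiad.wmnet/simplewiki?tinyInt1isBit=false' \
  --username research --password-file /user/example/mysql-pw.txt \
  --table revision \
  --target-dir /user/example/revision \
  --map-column-java rev_minor_edit=Integer
```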
[19:21:45] hi! looking at webrequest in Hive, I see a field "cache_status string Cache status". However in the doc it says "cache_status string Cache status (this field is wrong and should be removed)" https://wikitech.wikimedia.org/wiki/Analytics/Data/Webrequest
[19:22:19] anyone know if the field is OK? thx in advance!
[19:22:22] AndyRussG: we updated the field since then
[19:23:58] AndyRussG: corrected, but the logic in varnish is tricky, so if you do not know how things work, do learn a bit about cache stages
[19:25:04] nuria: ah fantastic... I see values of "hit", "miss" and "pass" in the data I pulled
[19:25:26] shall I correct the doc to remove the bit where it says the field is wrong? also, thx!!!!
[19:25:37] AndyRussG: I just did
[19:26:14] ah K :D
[19:33:03] milimetric: want me to take a look at this one: https://gerrit.wikimedia.org/r/#/c/316904/2
[19:33:05] ?
[19:33:12] milimetric: or are you still working on it?
[19:34:30] sorry, spaced out
[19:34:31] :)
[19:34:53] nuria: it's a WIP, mforns is working on it, was about to ping him and see if I can help
[19:35:02] milimetric: ok
[19:35:09] nuria: in the switch I allowed the packages to be upgraded and a bunch of them changed
[19:35:29] there are some pros and cons, but I think marcel sorted out most of the cons now
[19:35:31] milimetric: ya, not so fond of that, gotta say
[19:35:36] hey milimetric and nuria, I was about to log off
[19:35:46] mforns: np, push the stuff and I can take it over
[19:36:03] milimetric, ok, it's ugly right now though...
[19:36:12] I couldn't solve the sinon error
[19:36:14] nuria: happy to chat and explain, the way it was before was a little broken, so the upgrades aren't purely to clean up
[19:36:27] mforns: ok, we can talk tomorrow, I'll look at the sqoop stuff
[19:36:54] milimetric: what was broken? the test deps?
[19:36:55] milimetric, ok, we'll meet before stand-up and I hope I can get that error cleaned up
[19:37:25] nuria, I think there were tests that were not being executed before
[19:37:43] nuria: karma was passing some tests that shouldn't have passed, and failing some tests non-deterministically
[19:38:09] nuria: also, the require-js optimizer was really slow and the new version works in half the time, so builds are faster now
[19:38:28] I think the upgrade revealed some hidden testing errors
[19:38:33] the way gulp was reporting timing on the build was broken, now it's fine
[19:38:50] and the interaction between karma/jasmine/require made no sense before, now it's fine
[19:39:14] yeah, I'm pretty happy with the current state, it's much more straightforward and will be a good base to build on
[19:39:27] it's ok if we have to rewrite some tests that were broken before
[19:40:12] ok, after this 1st round of changes for the platform I think we should hold off on doing more until yarn works for us, does that sound good?
[19:41:52] nuria, makes sense to me
[19:42:57] oh yeah, I think this clears up all the cobwebs I know about, except the lack of proper models and how we pass data around, some of the D3 code being a bit spaghetti, and a couple of the old components like the project selector's autocomplete
[19:42:58] D3: test - ignore - https://phabricator.wikimedia.org/D3
[19:43:05] lol, what?
[19:43:13] oh, haha
[20:00:16] haha
[20:00:22] it's a differential patch bot!
[20:00:32] D425
[20:00:32] D425: Changes to work with latest node-rdkafka (0.5.5+) - https://phabricator.wikimedia.org/D425
[20:12:35] Analytics-Kanban: Inconsistant data in #all-sites-by-os-and-browser fot IE7 - https://phabricator.wikimedia.org/T148461#2742659 (Nuria) These requests are real and are happening (mostly) for Main_Page. Could be a bot or it could be an issue negotiating ssl again
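For questions like AndyRussG's above, and the very first one in this log about whether a request hit PHP or was served by Varnish, the cache_status field can be broken down directly in Hive; a sketch, with placeholder partition values:

```sql
-- 'hit' means Varnish served it from cache; 'miss' and 'pass' went to the backend
SELECT cache_status, COUNT(*) AS requests
FROM wmf.webrequest
WHERE webrequest_source = 'text'
  AND year = 2016 AND month = 10 AND day = 25 AND hour = 0
GROUP BY cache_status;
```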
[20:53:34] Analytics-Kanban: Inconsistant data in #all-sites-by-os-and-browser fot IE7 - https://phabricator.wikimedia.org/T148461#2742820 (Nuria) Our compatibility is edge: https://github.com/wikimedia/mediawiki/blob/1.27.1/includes/OutputPage.php#L2285
[21:36:58] wut https://github.com/implydata/pivot "Repository unavailable due to DMCA takedown."
[21:37:18] (the official Pivot repository, linked from https://pivot.wikimedia.org )
[22:05:44] wikimedia/mediawiki-extensions-EventLogging#615 (REL1_28 - 41f8b94 : Chad Horohoe): The build has errored.
[22:05:45] Change view : https://github.com/wikimedia/mediawiki-extensions-EventLogging/compare/REL1_28
[22:05:45] Build details : https://travis-ci.org/wikimedia/mediawiki-extensions-EventLogging/builds/170606150
[22:36:27] HaeB: wow
[23:47:35] nuria joal milimetric: out of curiosity, does our setup allow for UDFs written in C++?
[23:48:57] bearloga: no, we're just set up for jvm languages
[23:49:08] okie dokie, thanks :)
[23:50:11] HaeB / ori: we know, we pinged them and they said it's frivolous. We hope they figure things out, but until they decide one way or another, it's in limbo
[23:50:57] if the software is adored we can always replicate it ourselves in a simpler implementation, or we have a few other alternatives like saiku and caravel
[23:51:06] seems they haven't filed a counter-notice..
[23:51:36] HaeB: they have, and they made another repo available too, but both of those got rejected
[23:51:46] Analytics, Discovery, Discovery-Analysis: Pivot access for Discovery's Analysis team - https://phabricator.wikimedia.org/T149144#2743453 (mpopov)
[23:56:10] i see.. over a month would have seemed a long time to process a counter-notice
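On bearloga's UDF question: since the cluster only supports JVM languages, custom Hive functions ship as jars and get registered per-session. A hedged sketch of the registration step (the jar path and class name are hypothetical):

```sql
-- any JVM language (Java/Scala) that implements Hive's UDF interface works; C++ does not
ADD JAR hdfs:///user/example/udfs/my-udfs.jar;
CREATE TEMPORARY FUNCTION my_udf AS 'org.example.hive.MyUdf';
SELECT my_udf(uri_path)
FROM wmf.webrequest
WHERE year = 2016 AND month = 10 AND day = 25 AND hour = 0
LIMIT 10;
```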