[00:01:35] 10Analytics-Kanban, 10Wikimedia-Stream: Port RCStream clients to EventStreams - https://phabricator.wikimedia.org/T156919#3423378 (10Kenrick95)
[00:10:10] 10Analytics, 10Analytics-EventLogging, 10Page-Previews, 10Reading-Web-Backlog, and 2 others: Duplicate events sent in Firefox after back button press - https://phabricator.wikimedia.org/T170018#3423396 (10Jdlrobson) @krinkle the bug is definitely real please see activity on the upstream ticket specifically...
[00:12:59] 10Analytics, 10Analytics-EventLogging, 10Page-Previews, 10Reading-Web-Backlog, and 2 others: Duplicate events sent in Firefox after back button press - https://phabricator.wikimedia.org/T170018#3423399 (10Krinkle)
[00:29:10] bearloga, chelsyx: BTW the "See our Hive query" link at http://discovery.wmflabs.org/portal/#pageviews is broken
[00:30:06] HaeB: thanks for the heads-up! I'll fix it
[00:30:35] cool thanks!
[00:37:39] HaeB: fixed :)
[01:00:57] 10Analytics-Kanban, 10Wikimedia-Stream: Port RCStream clients to EventStreams - https://phabricator.wikimedia.org/T156919#3423523 (10BBlack)
[02:44:54] 10Analytics, 10Analytics-EventLogging, 10Page-Previews, 10Reading-Web-Backlog, and 2 others: Duplicate events sent in Firefox after back button press - https://phabricator.wikimedia.org/T170018#3423600 (10Krinkle) >>! In T170018#3423396, @Jdlrobson wrote: > @krinkle the bug is definitely real please see ac...
[04:29:11] SMalyshev: sorry, we can talk tomorrow
[05:24:46] !log drop _Edit_11448630_old from dbstore1002
[05:24:48] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[05:28:06] 10Analytics-Kanban, 10User-Elukey: dbstore1002 /srv filling up - https://phabricator.wikimedia.org/T168303#3423724 (10elukey) Table dropped after a consultation with data analysts and the analytics team.
[06:53:55] 10Analytics-Kanban, 10User-Elukey: dbstore1002 /srv filling up - https://phabricator.wikimedia.org/T168303#3423854 (10Marostegui) Awesome! Once you're ready to start purging old rows, we can try to optimize a couple of tables and see what happens with the claimed disk space
[07:14:16] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Contributors-Analysis, and 7 others: Managing size of page-create and revision-create tables in storage. Aggregation? - https://phabricator.wikimedia.org/T169898#3423885 (10Nemo_bis)
[09:29:19] 10Analytics, 10Analytics-EventLogging, 10Page-Previews, 10Reading-Web-Backlog, and 2 others: Duplicate events sent in Firefox after back button press - https://phabricator.wikimedia.org/T170018#3424168 (10phuedx) >>! In T170018#3422516, @Nuria wrote: > mmm just to understand this better: is your instrument...
[09:30:45] 10Analytics-Kanban, 10DBA, 10Operations, 10Patch-For-Review, 10User-Elukey: Puppetize Piwik's Database and set up periodical backups - https://phabricator.wikimedia.org/T164073#3424171 (10elukey) I want to observe how the patch that I merged behaves during the next days before closing.
[09:35:19] 10Analytics, 10Analytics-EventLogging, 10Page-Previews, 10Reading-Web-Backlog, and 2 others: Duplicate events sent in Firefox after back button press - https://phabricator.wikimedia.org/T170018#3424189 (10phuedx) [I've scheduled](https://wikitech.wikimedia.org/w/index.php?title=Deployments&type=revision&di...
[11:38:15] hey team :]
[11:40:08] hello :)
[11:50:40] mforns: after a chat with Jaime I filed https://gerrit.wikimedia.org/r/#/c/364412/, it should be the last one
[11:50:51] (I also fixed a little nit in the script args)
[11:50:58] we are getting closer :)
[11:51:19] elukey, reading
[11:53:44] after that one it will be only a matter of executing the script with the eventlogcleaner user
[11:53:58] (pointing it to my.cnf)
[11:54:21] * elukey lunch!
[11:54:26] (will brb in a bit)
[11:57:09] makes sense :]
[12:05:05] yoohoooo :)
[12:18:24] o/
[12:19:34] ottomata: I had a chat with Moritz this morning and the cdh packages should be able to be used on stretch too, since they basically depend on themselves and a few other things
[12:19:48] so we might get stat1005 as hadoop client + GPU on stretch
[12:19:51] 10Analytics, 10EventBus, 10ORES, 10Reading-Infrastructure-Team-Backlog, and 3 others: Emit revision-score event to EventBus and expose in EventStreams - https://phabricator.wikimedia.org/T167180#3424722 (10Ottomata) Ok, in https://gerrit.wikimedia.org/r/#/c/357457/8 I removed a lot of extra revision metada...
[12:20:08] oh!
[12:20:09] elukey: awesome
[12:20:18] we just need to do some /etc/apt.d something something to make that happen?
[12:21:13] elukey: if that is true...
[12:21:19] perhaps stat1006 should also be stretch
[12:21:29] i guess let's try it on stat1005 first?
[12:22:00] yes! If we make a mess then we just reimage :)
[12:22:48] was there anything special done during imaging of stat1005 for GPU stuff?
[12:23:03] and/or should we get hal fak or someone to test using GPU there to make sure it's available?
[12:23:12] and works for them?
[12:23:57] ottomata: for the moment I didn't do anything other than reimage, but possibly we'll need to deploy the GPU drivers.. The major pain point with jessie is that non-proprietary ones are a nightmare to backport
[12:24:31] about the specifics for apt, not really sure, but probably we'll need to add another component like the cdh one that you created for jessie?
[12:25:20] or, maybe we can just add the jessie dist as a source
[12:25:23] i'm trying that now
[12:25:32] cat /etc/apt/sources.list.d/thirdparty-cloudera.list
[12:25:37] deb http://apt.wikimedia.org/wikimedia jessie-wikimedia thirdparty/cloudera
[12:25:37] deb-src http://apt.wikimedia.org/wikimedia jessie-wikimedia thirdparty/cloudera
[12:25:46] apt-cache show hadoop-client
[12:25:54] Version: 2.6.0+cdh5.10.0+2102-1.cdh5.10.0.p0.72~jessie-cdh5.10.0
[12:26:37] hmm, but it won't auto install deps from that...
[12:29:41] moritzm: yt?
[12:31:47] 10Analytics-Kanban, 10DBA: Purge all old data from EventLogging master - https://phabricator.wikimedia.org/T168414#3424745 (10mforns)
[12:33:06] yep
[12:34:32] why wouldn't it install deps from that repo?
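For reference, the experiment above boils down to pointing a stretch host at the existing jessie component and refreshing the indices. A minimal sketch, reusing exactly the file and package quoted in the channel (the sudo usage is an assumption):

    # /etc/apt/sources.list.d/thirdparty-cloudera.list (as quoted above)
    deb http://apt.wikimedia.org/wikimedia jessie-wikimedia thirdparty/cloudera
    deb-src http://apt.wikimedia.org/wikimedia jessie-wikimedia thirdparty/cloudera

    # refresh the package indices and confirm the CDH packages are visible
    sudo apt-get update
    apt-cache show hadoop-client | grep '^Version'
    # expected: Version: 2.6.0+cdh5.10.0+2102-1.cdh5.10.0.p0.72~jessie-cdh5.10.0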
[12:34:37] i'm not sure yet
[12:34:39] as for the GPU stuff
[12:34:50] i've done the source list as shown above
[12:34:55] apt-get update
[12:35:07] then apt-get install hadoop
[12:35:08] says
[12:35:09] hadoop : Depends: parquet but it is not going to be installed
[12:35:09] I'd say let whoever requested that open a ticket with the requirements for the stack/software they want to use
[12:35:19] then we can say what we need to do to set that up
[12:35:20] says
[12:35:20] The following packages have unmet dependencies:
[12:35:20] parquet : Depends: hadoop-mapreduce but it is not going to be installed
[12:35:20] Depends: hadoop-client (>= 2.6.0+cdh5.4.0) but it is not going to be installed
[12:35:26] apt-get install parquet says ^
[12:35:26] let me have a look
[12:35:31] stat1005
[12:35:58] i could list all deps, but i think we don't want to do that
[12:36:00] 2.6.0+cdh5.4.0 ?
[12:36:04] i think that's just parquet
[12:36:21] should it be 2.6.0+cdh5.10.etc.. ?
[12:36:22] I see the problem
[12:36:27] that's just the depends
[12:36:28] >=
[12:36:35] ah >= okok
[12:36:38] parquet probably specifies a more minimal dep
[12:36:46] hadoop-client : Depends: hadoop-mapreduce (= 2.6.0+cdh5.10.0+2102-1.cdh5.10.0.p0.72~jessie-cdh5.10.0
[12:37:08] these are the trusty packages and hadoop-mapreduce is a native package
[12:37:23] ?
[12:37:27] jessie packages you mean?
[12:37:32] native?!
[12:37:41] scratch that
[12:37:42] ohh, stretch has hadoop packages?!
[12:37:52] it's an arch:all package after all
[12:37:56] but
[12:38:12] it depends on the openssl version which is present in trusty (libssl1.0.0)
[12:38:32] which does?
[12:38:44] oh hadoop-mr
[12:38:48] which is entirely bogus
[12:39:04] looking at the version on stat1002, it only ships a couple of jars
[12:39:25] and /usr/bin/mapred which is a shell script
[12:40:05] the only remaining explanation would be some program dlopen()ing libssl
[12:40:13] but that's most definitely not the case
[12:40:19] hmmmmmMMMM
[12:40:36] we could also try to forward-port libssl1.0.0 from jessie?
[12:40:40] so, this should be reported to CDH
[12:40:53] but fortunately there's a semi-elegant way to get around that:
[12:41:05] oh yea? :)
[12:41:25] forward-porting libssl1.0.0 from trusty would be really painful
[12:41:30] and confusing as well
[12:41:50] since some programs might try to dlopen() libssl1.0.0 and fall back to that instead of 1.1.0
[12:41:58] so let's avoid that
[12:41:59] but:
[12:42:26] there's a nice little tool in Debian called equivs
[12:42:33] libssl1.0.0 (>= 1.0.0) - this is insane :D
[12:42:39] ya
[12:43:05] which can easily generate a fake package which doesn't have any content
[12:43:36] oh...weiiird, ok.
[12:43:37] so we could make a package like "shut-up-cdh" and have it declare a "Provides: libssl1.0.0"
[12:43:43] and that should fix it AFAICT
[12:44:02] or instead of equivs we can also roll a simple package doing that on our own
[12:44:29] is there a cdh release for jessie?
[12:44:31] so, we would install shut-up-cdh and stretch's libssl package, right?
[12:44:34] and likely everything would work?
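The equivs route mentioned here works roughly as below. Note that, as the discussion later bears out (13:10), a bare "Provides: libssl1.0.0" was not enough; an unversioned Provides: cannot satisfy a versioned dependency like "libssl1.0.0 (>= 1.0.0)", so the dummy package most likely has to carry the real package name and a plausible version. A minimal sketch, with illustrative file names and version strings:

    sudo apt-get install equivs
    equivs-control libssl-dummy.ctl    # writes a template control file
    # edit libssl-dummy.ctl so it contains at least (values are made up):
    #   Package: libssl1.0.0
    #   Version: 1.0.1t-99~fake1
    #   Description: empty package satisfying CDH's bogus libssl1.0.0 Depends
    equivs-build libssl-dummy.ctl      # builds libssl1.0.0_1.0.1t-99~fake1_all.deb
    sudo dpkg -i libssl1.0.0_1.0.1t-99~fake1_all.deb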
[12:44:39] moritzm: yes
[12:44:41] yes, that should be fine
[12:44:42] we have it
[12:44:49] most nodes run jessie
[12:44:55] libssl is installed by all kinds of packages anyway (like openssh)
[12:44:56] stat1002 is the only one that doesn't
[12:44:59] aye
[12:45:11] k, I'm having a look what mapreduce declares on jessie
[12:45:39] try stat1004
[12:45:49] it's similar to stat1002, but jessie
[12:45:52] on jessie it's also libssl1.0.0
[12:45:55] aye
[12:45:56] which makes sense
[12:46:07] it ships openssl 1.0.1, but the soname is still 1.0.0
[12:46:58] shall I prep a quick fake package?
[12:47:09] mobrovac: sure!
[12:47:16] i'm reading some equivs manpages, but i'm sure you can do it way faster
[12:47:31] don't bother adding it to apt or anything yet, we can just try on stat1005 to see if this all works
[12:48:19] yeah, I'll make some tea and give it a shot
[12:48:19] I didn't get what's happening, maybe after all the tests somebody can explain? :D
[12:48:43] elukey: hadoop-mapreduce declares that it needs libssl1.0.0, which is not avail in stretch
[12:48:54] but, probably, it will work just fine with libssl1.0.2 which is in stretch
[12:49:02] with the fake package
[12:49:07] that provides 1.0.0
[12:49:08] okok
[12:49:10] exactly
[12:49:18] thanks :)
[12:49:27] so, we lie to apt to make it think the dep is satisfied
[12:50:37] elukey: other thought: do we really want/need user quotas?
[12:50:46] maybe just putting homedirs on /srv would be good enough
[12:52:49] ottomata: I think we do but probably it is not a priority for this quarter.. I mean, I'd strive to have stat100[56] up and running with users migrated to them, and only then think about quotas etc...
[12:52:53] wdyt?
[12:53:32] i think quotas will be annoying
[12:53:44] if someone runs up against one, we'd say "ok, just put stuff outside of your homedir"
[12:53:48] which really isn't that useful
[12:55:06] well no, we'd follow up with the user asking why he/she is saturating home space
[12:55:28] the alternative is for me/you to follow up over email when partitions are filled up
[12:56:01] i almost prefer the alternative, or maybe we can script a monthly or less email that just emails us homedir sizes
[12:56:02] so we can nag
[12:56:13] or, even just emails someone if their homedir is larger than some amount
[12:56:21] hmm, i guess that's what a quota with just a warning size would do
[12:56:28] maybe a warning size, but not hard cutoff quota?
[12:56:30] hmm, the Provides: hack doesn't seem to be enough, I installed mapreduce-libssl on stat1005, but it still rejects the installation of -mapreduce
[12:56:39] complaining that libssl1.0.0 is not around
[12:56:43] hm
[12:56:50] mapreduce-libssl is your fake package?
[12:57:23] moritzm: maybe
[12:57:25] it's the version?
[12:57:26] Version: 1.0
[12:57:28] vs Version: 1.0.0?
[12:57:31] seems unlikely
[12:57:32] but maybe?
[12:57:49] on jessie it is Version: 1.0.1t-1+deb8u6
[12:57:49] hmm, good point, let me update the version
[12:58:59] elukey: do you know of any reason I can't just symlink /home -> /srv/home?
[13:02:56] ottomata: different partitions?
[13:03:12] what is the error?
[13:06:40] elukey: no error
[13:07:00] just wondering, seems fine to do to me!
[13:07:51] ottomata: why not create a bigger partition only for /home ?
[13:09:39] elukey: it just seems useless, then we'd have the same problem of telling people to make special dirs in /srv so they can store more data
[13:10:02] i think folks should use their own home dirs for their working data
[13:10:10] and on these machines they need lots of space for working data
[13:10:16] so in the end Provides: turned out to be insufficient and I used equivs instead, but stat1005 now has hadoop-mapreduce installed
[13:10:24] ! nice!
[13:10:26] nice!
[13:10:40] thanks moritzm, ok, i'm going to try to puppetize stat1006
[13:10:56] can you add that package to apt.wm.org?
[13:11:15] we'll probably reinstall stat1006 as stretch too
[13:13:20] ottomata: if we don't put sane boundaries in multi-user hosts like stat100[56] they will become a mess like stat100[23], namely once in a while somebody just overflows the host resources impairing the work of others
[13:14:09] not saying that user quotas are the answer for all, but we definitely need to come up with a strategy imho
[13:14:30] elukey: i think you are right, but it's hard to find a balance
[13:14:34] i think maybe
[13:14:40] home -> srv/home
[13:14:41] +
[13:14:52] user warning quotas that email them at around 1T of use maybe?
[13:15:01] and a hard cutoff of something really large
[13:15:02] maybe 4T?
[13:15:19] something like that could be good
[13:15:29] i guess warning quota could be lower
[13:15:31] 500G maybe
[13:16:03] and in my experience when you cross 500G boundaries it is not due to big data but big mistakes in programming :D
[13:16:09] like logs not cleared, etc..
[13:16:15] aye
[13:16:25] or, it is big data, but then never cleaned up
[13:16:55] is there a task for the stat1005 setup?
[13:17:03] so what needs to be done is to:
[13:17:30] create the cdh component in our repo for stretch
[13:17:34] - create the cdh component in our repo for stretch
[13:17:34] https://phabricator.wikimedia.org/T152712
[13:17:44] - import the current trusty packages there
[13:17:44] mobrovac: we need to create a new component?
[13:17:48] (jessie)
[13:17:59] can't we just point apt at jessie cloudera thirdparty?
[13:18:02] like i've done on stat1005?
[13:18:11] oops
[13:18:11] sorry
[13:18:15] wrong ping (sorry marko!)
[13:18:42] ottomata: if we need to add/change packages like libssl1.0.0 then it might become a mess
[13:18:44] that's only backfire at some point, e.g. when there's new packages for stretch, but we want to keep the jessie versions for the analytics* cluster at the old versions
[13:18:58] that'll only backfire at some point, e.g. when there's new packages for stretch, but we want to keep the jessie versions for the analytics* cluster at the old versions
[13:19:06] but adding the new component is cleaner
[13:19:38] and we only want to import that fake package for the stretch repo
[13:19:47] hmmmm, ok. can we do that with reprepro updates?
[13:19:50] since everything is fine for cdh/jessie
[13:19:57] i guess so right?
[13:20:26] yeah, there's a few commits from Filippo recently who imported some RAID drivers from the upstream jessie repo to the stretch-wikimedia section
[13:21:37] cool
[13:22:46] moritzm: like: https://gerrit.wikimedia.org/r/#/c/364429/ ? :)
[13:24:06] ah yeah, that'll work great with https://github.com/wikimedia/puppet/blob/production/modules/role/manifests/analytics_cluster/apt.pp
[13:24:41] or him
[13:24:43] hm
[13:24:57] nice
[13:24:58] is there something I need to do to make it actually in stretch-wikimedia?
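Coming back to the homedir-size warning email floated at 12:56 and 13:15 above: a minimal cron-able sketch, assuming GNU du and a working local mailer; the 500G threshold is the number from the discussion, and the recipient is a made-up placeholder:

    #!/bin/bash
    # Nag about oversized home directories (sketch; run e.g. monthly from cron).
    THRESHOLD_GB=500                      # warning size discussed above
    ADMIN=root                            # made-up recipient; adjust as needed
    for dir in /home/*; do
        used_gb=$(du -s --block-size=1G "$dir" 2>/dev/null | cut -f1)
        if [ "${used_gb:-0}" -gt "$THRESHOLD_GB" ]; then
            echo "$dir uses ${used_gb}G (warning threshold ${THRESHOLD_GB}G)" \
                | mail -s "homedir size warning: $dir" "$ADMIN"
        fi
    done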
[13:25:18] 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review: Replacement of stat1002 and stat1003 - https://phabricator.wikimedia.org/T152712#2857791 (10MoritzMuehlenhoff) At this point there's no cdh release for stretch yet and the hadoop-mapreduce package for jessie has a broken Depends: on libssl1.0.0. Th...
[13:26:06] like that, but you'll also need to update aptrepo to create thirdparty/cdh for stretch-wikimedia
[13:26:38] moritzm: once that is merged, i can run reprepro update, right?
[13:27:10] yes, exactly
[13:27:27] reprepro --restrict cloudera-stretch update
[13:27:33] ack
[13:28:01] maybe check the puppet logs for commits by Filippo, he did mostly that for setting up the RAID drivers crap when he installed the swift servers with stretch
[13:28:11] should be mostly identical
[13:28:17] if my memory serves me right
[13:32:10] ok gonna try this...
[13:32:52] moritzm: did you add your fake ssl package to apt or should I?
[13:34:10] I can add it when the repo section is available, but that's not the case yet, right?
[13:37:09] ah, missed your patch
[13:37:24] let me know when puppet ran and I'll import the dummy deb
[13:41:23] it's not working yet
[13:41:37] update isn't doing anything for the stretch component, it's not adding the entries to InRelease
[13:41:39] not sure why...
[13:55:35] ottomata: see mediawiki_security
[13:55:41] replied there
[13:56:29] danke
[13:57:49] 10Analytics-Kanban, 10Wikimedia-Stream: Port RCStream clients to EventStreams - https://phabricator.wikimedia.org/T156919#3425014 (10Kenrick95)
[13:59:59] elukey: will be at sync meeting in just a few
[14:01:48] sure, let me know when and I'll join
[14:02:02] k, just want to finish this package thing so mor itz can upload that fake one
[14:07:31] moritzm: hm, ok, not totally sure, i did that, but still same problem. i've got a meeting with elukey now, will have to continue later
[14:07:58] elukey: am in bc
[15:01:59] ping marktraceur
[15:02:02] ping mforns
[15:02:07] sorry marktraceur
[15:09:09] 10Analytics, 10Analytics-Cluster, 10Operations: Clean up permissions for privatedata files on stat1002 - they should be group readable by statistics-privatedata-users - https://phabricator.wikimedia.org/T89887#3425286 (10elukey)
[15:13:30] 10Analytics, 10Analytics-Cluster: Support Wikipedia Zero change from X-CS to X-Analytics - https://phabricator.wikimedia.org/T68546#3425311 (10Ottomata) 05Open>03declined
[15:19:38] 10Analytics, 10Analytics-Cluster: Perf test RAID vs JBOD with new hardware and kafka versions - https://phabricator.wikimedia.org/T168538#3425366 (10Ottomata)
[15:19:40] 10Analytics, 10Analytics-Cluster, 10Operations, 10ops-eqiad: rack/setup/install new kafka nodes kafka-jumbo100[1-6] - https://phabricator.wikimedia.org/T167992#3425367 (10Ottomata)
[15:23:21] 10Analytics, 10EventBus, 10ORES, 10Reading-Infrastructure-Team-Backlog, and 3 others: Emit revision-score event to EventBus and expose in EventStreams - https://phabricator.wikimedia.org/T167180#3425375 (10Fjalapeno) @mobrovac One of the things we want to be able to do with the new event is expose ORES sco...
[15:25:00] 10Analytics, 10Project-Admins: Create tag "Analytics-Data-Quality" on Phabricator - https://phabricator.wikimedia.org/T169560#3425377 (10Nuria) Ping @Aklapper , anything mising so this can be created?
[15:46:59] milimetric, fdans, can I join you with Wikistats and deploy? Are you guys going to do that now or later?
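For context, the reprepro setup being debugged above (13:26-13:41) involves an update rule that pulls the jessie thirdparty/cloudera packages into the stretch distribution. A sketch of the two config stanzas from memory of reprepro's format - the rule name and field values are guesses, check reprepro(1) and the actual aptrepo puppet module:

    # conf/updates -- update rule copying packages between distributions
    Name: cloudera-stretch
    Method: http://apt.wikimedia.org/wikimedia
    Suite: jessie-wikimedia
    Components: thirdparty/cloudera>thirdparty/cloudera
    Architectures: amd64

    # conf/distributions -- the stretch-wikimedia stanza must also list the
    # component under Components: and reference the rule via Update:, after
    # which the command quoted at 13:27 applies it:
    #     reprepro --restrict cloudera-stretch update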
[15:47:07] (03PS1) 10Milimetric: Fix filename [analytics/refinery] - 10https://gerrit.wikimedia.org/r/364460
[15:47:33] milimetric: wanted to take a look at the state of the app right now?
[15:47:51] mforns: fdans: I got a long-ish list of things that are still bugging me, will finish writing it up in a few minutes and push it (it's in the README)
[15:47:55] then we can coordinate
[15:48:08] cool
[15:48:09] milimetric, OK thx
[15:48:16] (03CR) 10Milimetric: [V: 032 C: 032] Fix filename [analytics/refinery] - 10https://gerrit.wikimedia.org/r/364460 (owner: 10Milimetric)
[15:52:25] moritzm: just looked around for Filippo's commits for raid/swift, not finding
[15:53:25] (03CR) 10Mforns: "I think there's problems with the quotes, see comments :]" (032 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/362310 (https://phabricator.wikimedia.org/T164021) (owner: 10Nuria)
[15:55:31] ok fdans pull and check the README, added 3 sections in there with my thoughts
[15:55:43] we don't have to get to the nice to have but the others we should probably do
[15:55:57] milimetric: on it
[15:56:27] cool, and fdans let's chat after you take a look, don't just jump into doing it, then we can bring marcel in to
[15:56:29] *too
[15:56:55] yes :]
[15:59:25] 10Analytics, 10Analytics-EventLogging, 10Page-Previews, 10Reading-Web-Backlog, and 3 others: Duplicate events sent in Firefox after back button press - https://phabricator.wikimedia.org/T170018#3425543 (10phuedx) I can verify that this bug is fixed on at least https://hu.wikipedia.org/wiki/Sablon:A_Sz%C3%A...
[16:03:03] milimetric mforns wanna parlay now?
[16:03:09] fdans, yes
[16:03:11] to the batcave!
[16:03:11] :]
[16:13:28] 10Analytics: Add "desktop by browser" tab to browser reports - https://phabricator.wikimedia.org/T170286#3425599 (10Nuria)
[16:19:43] 10Analytics, 10Analytics-EventLogging, 10Page-Previews, 10Reading-Web-Backlog, and 3 others: Duplicate events sent in Firefox after back button press - https://phabricator.wikimedia.org/T170018#3425659 (10Nuria) >To be clear, right now we have a serious problem with EventLogging which has been blocking us...
[16:32:00] o/
[16:32:09] I'm late to the analytics/research hangout chill out.
[16:32:15] Because of tech management meeting :(
[16:35:14] s'ok i can chill out on my own
[16:35:16] chillliiiin
[16:35:30] don't have much to talk about, maybe stat box GPU stuff
[16:38:20] ottomata: on my way. missed the ping.
[16:44:39] 10Analytics-EventLogging, 10Analytics-Kanban, 10Contributors-Analysis, 10DBA, and 5 others: Drop tables with bad data: mediawiki_page_create_1 mediawiki_revision_create_1 - https://phabricator.wikimedia.org/T169781#3425892 (10kaldari) 05Open>03Resolved Looks good to me!
[16:44:41] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Contributors-Analysis, and 6 others: Record an event every time a new content namespace page is created - https://phabricator.wikimedia.org/T150369#3425894 (10kaldari)
[16:45:01] 10Analytics-EventLogging, 10Analytics-Kanban, 10Contributors-Analysis, 10DBA, and 4 others: Drop tables with bad data: mediawiki_page_create_1 mediawiki_revision_create_1 - https://phabricator.wikimedia.org/T169781#3425895 (10kaldari)
[16:46:57] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Contributors-Analysis, and 6 others: Record an event every time a new content namespace page is created - https://phabricator.wikimedia.org/T150369#3425915 (10kaldari)
[16:47:02] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Contributors-Analysis, and 6 others: Managing size of page-create and revision-create tables in storage. Aggregation? - https://phabricator.wikimedia.org/T169898#3425914 (10kaldari) 05Open>03Resolved
[16:47:16] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Contributors-Analysis, and 6 others: Managing size of page-create and revision-create tables in storage. Aggregation? - https://phabricator.wikimedia.org/T169898#3411985 (10kaldari)
[16:49:21] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Contributors-Analysis, and 6 others: Record an event every time a new content namespace page is created - https://phabricator.wikimedia.org/T150369#3425928 (10kaldari) 05Open>03Resolved This seems to be working smoothly now! Thanks @Ottomata...
[17:10:45] 10Analytics, 10ChangeProp, 10EventBus, 10Epic, and 2 others: [EPIC] Develop a JobQueue backend based on EventBus - https://phabricator.wikimedia.org/T157088#3426049 (10Pchelolo)
[17:12:41] * elukey off!
[17:24:11] 10Analytics, 10ChangeProp, 10EventBus, 10Services (later), 10User-mobrovac: Create a hook when job is posted to JobQueue - https://phabricator.wikimedia.org/T163380#3426150 (10Pchelolo)
[17:28:45] 10Analytics, 10ChangeProp, 10EventBus, 10Services (later), 10User-mobrovac: Create a hook when job is posted to JobQueue - https://phabricator.wikimedia.org/T163380#3195595 (10GWicke) IIRC the other option you looked into before was to create a "wrapper" JobQueue class that delegates to Redis *and* the n...
[17:32:49] 10Analytics, 10ChangeProp, 10EventBus, 10Services (later), 10User-mobrovac: Create a hook when job is posted to JobQueue - https://phabricator.wikimedia.org/T163380#3426198 (10Pchelolo) >>! In T163380#3426174, @GWicke wrote: > IIRC the other option you looked into before was to create a "wrapper" JobQueu...
[17:37:42] (03PS3) 10Nuria: Adding "tags" column to webrequest [analytics/refinery] - 10https://gerrit.wikimedia.org/r/362310 (https://phabricator.wikimedia.org/T164021)
[17:38:16] (03PS4) 10Nuria: Adding "tags" column to webrequest [analytics/refinery] - 10https://gerrit.wikimedia.org/r/362310 (https://phabricator.wikimedia.org/T164021)
[17:38:58] (03CR) 10Nuria: Adding "tags" column to webrequest (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/362310 (https://phabricator.wikimedia.org/T164021) (owner: 10Nuria)
[18:03:45] SMalyshev: yt?
[18:04:03] SMalyshev: want to talk tags?
[18:04:47] nuria_: on a meeting now, in a half hour?
[18:04:54] SMalyshev: sure
[18:14:14] 10Analytics, 10ChangeProp, 10EventBus, 10Services (later), 10User-mobrovac: Create a hook when job is posted to JobQueue - https://phabricator.wikimedia.org/T163380#3195595 (10mobrovac) Heh, this solution is easier but the other one is more correct. In the end, we will end up with MW posting jobs directl...
[18:20:36] milimetric or fdans you still around?
[18:20:58] mforns: yup!
[18:21:22] hey fdans can I ask you about the store changes in da cave?
[18:23:31] a-team: do we know if anyone is using the /a/eventlogging/archive files on stat1002?
[18:23:49] i think we used to keep them there for posterity/backup, but it's all in kafka now and/or written to mysql or hdfs
[18:23:53] mforns: yep omw
[18:23:58] i'd like to stop copying them there if we can
[18:24:10] ottomata, mmm no idea
[18:24:48] 10Analytics, 10EventBus, 10ORES, 10Reading-Infrastructure-Team-Backlog, and 3 others: Emit revision-score event to EventBus and expose in EventStreams - https://phabricator.wikimedia.org/T167180#3426466 (10mobrovac) >>! In T167180#3425375, @Fjalapeno wrote: > @mobrovac One of the things we want to be able...
[18:25:38] ottomata: I remember it the same way you do - that it was a backup before we were more reliable (back in the days when we would restore from logs if something happened)
[18:28:53] ottomata: i used them the other day yes
[18:29:19] nuria_: for what?
[18:29:27] ottomata: it is pretty easy to look at past data
[18:29:29] could you have used them from eventlog1001?
[18:29:54] ottomata: i try not to cat huge files in the prod machine...
[18:30:23] ottomata: right? and doing unix pipe stuff
[18:30:42] aye ok
[18:30:43] hm
[18:37:00] milimetric mforns just pushed the last bit of my assignments (tests), let's talk tomorrow morning EDT
[18:37:12] k!
[18:38:14] 10Analytics-Kanban: Non existing article is one of the most viewed according to the data returned by the /metrics/pageviews/top/ API - https://phabricator.wikimedia.org/T149178#2744325 (10Nuria) a:03Nuria
[18:38:53] 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Monitor that no worker nodes are in the default rack in net topology - https://phabricator.wikimedia.org/T163909#3426535 (10Nuria) 05Open>03Resolved
[18:39:05] 10Analytics-Kanban, 10Page-Previews, 10Reading-Web-Backlog (Tracking): Update purging settings for Schema:Popups - https://phabricator.wikimedia.org/T167449#3426536 (10Nuria) 05Open>03Resolved
[18:39:25] nuria_: done with meetings now :)
[18:39:28] 10Analytics-Cluster, 10Analytics-Kanban, 10Operations: Reinstall Analytics Hadoop Cluster with Debian Jessie - https://phabricator.wikimedia.org/T157807#3426537 (10Nuria) 05Open>03Resolved
[18:39:32] 10Analytics-Cluster, 10Analytics-Kanban, 10Operations: Reinstall Analytics Hadoop Cluster with Debian Jessie - https://phabricator.wikimedia.org/T157807#3017036 (10Nuria)
[18:39:37] 10Analytics-Cluster, 10Analytics-Kanban: Hadoop cluster expansion. Add Nodes - https://phabricator.wikimedia.org/T152713#3426539 (10Nuria) 05Open>03Resolved
[18:40:17] SMalyshev: yessir
[18:40:54] 10Analytics-Kanban, 10EventBus, 10Wikimedia-Stream, 10Services (watching), 10User-mobrovac: EventStreams - https://phabricator.wikimedia.org/T130651#3426545 (10Nuria)
[18:40:56] 10Analytics-Kanban, 10Wikimedia-Stream: Port RCStream clients to EventStreams - https://phabricator.wikimedia.org/T156919#3426544 (10Nuria) 05Open>03Resolved
[18:41:12] yo, fyi, analytics-store is down temporarily
[18:41:18] it crashed due to a long running alter transaction
[18:41:21] it is recovering
[18:41:24] i'm watching it
[18:41:25] 10Analytics-Kanban: Implement purging settings for Schema:ReadingDepth - https://phabricator.wikimedia.org/T167439#3426546 (10Nuria) 05Open>03Resolved
[18:41:34] 10Analytics-EventLogging, 10Analytics-Kanban: whitelist multimedia and upload wizard tables - https://phabricator.wikimedia.org/T166821#3426547 (10Nuria) 05Open>03Resolved
[18:41:53] 10Analytics-Kanban: Preserve userAgent field in apps schemas - https://phabricator.wikimedia.org/T164125#3426548 (10Nuria) 05Open>03Resolved
[18:42:08] fdans: will pull in a bit and do my stuff
[18:42:35] 10Analytics-Kanban, 10DBA, 10Patch-For-Review, 10User-Elukey: Create a user for the eventlogging_cleaner script on the analytics slaves - https://phabricator.wikimedia.org/T170118#3426549 (10Nuria)
[18:42:50] 10Analytics-Kanban: Non existing article is one of the most viewed according to the data returned by the /metrics/pageviews/top/ API - https://phabricator.wikimedia.org/T149178#3426550 (10Nuria)
[18:42:58] SMalyshev: your question?
[18:43:31] nuria_: so I've been looking into the tagging system, and I have several actually
[18:43:54] nuria_: 1. is tagging only for partitioning or for other things too? Does it make sense to have many tags on same row?
[18:44:15] SMalyshev: it is mostly for partitioning
[18:44:47] SMalyshev: yes we could have several tags on the same row
[18:45:03] nuria_: so how would it be partitioned if it has several tags?
[18:45:51] SMalyshev: not all tags might be used for partitioning, example: Pageview and php-api request
[18:46:20] aha, so it's configurable which tags go into a separate partition?
[18:46:30] SMalyshev: that request will end up on the Pageview partition (so jobs whose metrics are computed on pageviews can use it)
[18:46:32] milimetric, I also pushed all changes
[18:47:07] nuria_: aha, so php-api tag can be used for non-partition purposes, right?
[18:47:19] SMalyshev: the php-api tag might/might not be used for partition but in any case could be used for easy "selection" of records
[18:47:28] milimetric, now, looking at the link highlight bug that I "fixed" it has another edge case: when you click on the metric box
[18:47:42] SMalyshev: yes, but whether that makes sense depends on the tag itself
[18:47:42] it doesn't highlight the Reading menu...
[18:48:14] SMalyshev: it doesn't really make sense to tag really granular stuff
[18:48:22] trying to figure that out, but the router doesn't want to do what I tell it... :'(
[18:48:45] SMalyshev: as you might go through 1 TB of data to tag 100K
[18:48:59] nuria_: so how granular? i.e. for example wdqs has requests for SPARQL and for LDF - which are pretty different. does it make sense to have separate tags for them?
[18:49:26] SMalyshev: LDF is also a query system?
[18:49:46] SMalyshev: if both those requests are going to be grouped together to calculate metrics of usage of entities
[18:49:49] nuria_: it's not exactly a query, it's more like direct access to index
[18:50:11] nuria_: they are usually measured separately, since they are very different
[18:50:45] SMalyshev: to measure "usage" of system maybe, but if the goal is to measure "usage" of entities, does that matter? (honest question)
[18:51:53] SMalyshev: it depends what the goal is
[18:51:57] nuria_: the goal is kinda both I think :)
[18:52:13] SMalyshev: i doubt you can do both with the same code
[18:52:37] SMalyshev: measuring system usage is best done with things such as graphite, request counts for pageview API for example
[18:52:43] nuria_: e.g. separating SPARQL and LDF would make processing SPARQL data easier since we could easily exclude LDF ones (we can exclude it now by matching URI but that's probably slower?)
[18:52:57] SMalyshev: measuring entity usage is a metric that you need to define
[18:53:13] SMalyshev: as requests to entity X per 1 hour
[18:53:58] SMalyshev: aggregation in one case is not needed (distinct counts sent to graphite will do), in the second case aggregation is a must, it is a sum/time
[18:54:19] nuria_: so my idea was to make it in two stages. One stage is to make what we have now (raw query rows, intermixed with all other traffic in misc) into a separate dataset that has only SPARQL things extracted and segregated and easily queriable
[18:54:24] does it make sense?
[18:55:24] nuria_: and second stage is using this data set to make metrics - like entity usage, etc. - using sparql-only data instead of trying to process all misc cluster traffic each time we need something
[18:55:42] SMalyshev: ok, let me see if i got it: that will be a subset of records that will look like webrequest but will exist in a partition that only has the data you are interested in for 60 days, correct?
[18:55:54] SMalyshev: and you will have jobs that run on that data after
[18:56:20] yep something like that
[18:56:22] SMalyshev: if so, yes, sounds good
[18:56:55] I do not insist on "look like webrequest" even - e.g. many things that are in webrequest I may not need (though it won't hurt if it's easier that way)
[18:57:41] SMalyshev: it means that table has the same schema, so, yes, it is easier that way, so the partitio code is tag "agnostic"
[18:57:48] *partition code
[18:58:15] SMalyshev: see https://gerrit.wikimedia.org/r/#/c/357814/1/oozie/webrequest/split/split_webrequest.hql
[18:58:50] nuria_: ok then, so which tags would I define? Generally I have 3 kinds of traffic going to wdqs - sparql, ldf and all the rest (gui files, etc.)
[18:59:36] ideally I'd like to keep them segregated
[19:00:23] 10Analytics, 10Analytics-EventLogging, 10DBA: dbstore1002 crashed - https://phabricator.wikimedia.org/T170308#3426597 (10jcrespo)
[19:02:02] SMalyshev: Partitions (for things to work ok) need to be sizable, wikidata is already quite small and i am afraid further partitioning of it might be too small (kind of eye-balling problem) so i will create a 1st tag "sparql-things" used to partition
[19:02:29] SMalyshev: so all the requests that you are interested in go into that partition
[19:02:52] nuria_: so then it'd be just like hostname = 'query.wikidata.org' basically?
[19:02:54] SMalyshev: We will insert that tag in a table like: https://gerrit.wikimedia.org/r/#/c/357814/1/hive/webrequest/create_webrequest_split_tag_table.hql
[19:03:09] SMalyshev: that tells us that "it is to be used for partitioning"
[19:03:33] SMalyshev: "wikidata-query" sounds good
[19:03:58] ok, so that's really simple then.
[19:04:01] SMalyshev: you can also add tags to classify that data further that will not be used to partition
[19:04:15] SMalyshev: like "sparql"
[19:04:22] nuria_: one more question though: I see that tagger now only tags successful requests
[19:04:28] SMalyshev: yes
[19:04:57] nuria_: it may be a bit of a problem as we may generally want to have failed ones too - e.g. syntax errors, timeouts, etc. - these may be interesting too
[19:05:22] SMalyshev: so "wikidata_query" is used to partition and "ldf" and "sparql" are used later for your own jobs for something else
[19:05:34] nuria_: sounds good
[19:05:45] nuria_: so it's ok for a tagger to create more than one tag?
[19:06:12] SMalyshev: yes, it returns a set
[19:06:18] because there's @Tag(tag = "portal", - so I wondered what that "tag = " means there
[19:06:59] SMalyshev: https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/webrequest/tag/Tagger.java#L25
[19:07:52] SMalyshev: ah, good catch, that should also be updated just like the interface https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/webrequest/tag/Tag.java#L17
[19:08:01] SMalyshev: that is my mistake
[19:09:08] @Tag(tags="blah, blah, blah")
[19:09:24] ah, ok then :)
[19:10:08] SMalyshev: will correct
[19:10:16] nuria_: one more Q: tagAccumulator is used for looking at other tags, but I'm not supposed to modify it, right?
[19:10:25] SMalyshev: right
[19:10:53] SMalyshev: recursion-style
[19:11:24] ok, sounds clear enough for me, will try to build a tagger now and see how it goes. Thanks!
[19:11:40] SMalyshev: ok, will send code change for interface and add you as CR-er
[19:11:48] great, thanks!
[19:12:02] SMalyshev: i will start testing splitting today hopefully and let you know how it goes
[19:19:00] 10Analytics, 10Analytics-EventLogging, 10DBA: dbstore1002 crashed - https://phabricator.wikimedia.org/T170308#3426698 (10Marostegui) Yeah, big alter on s1 tables (adding PK) was running at the time :-(
[19:20:57] 10Analytics, 10Analytics-EventLogging, 10DBA: dbstore1002 crashed - https://phabricator.wikimedia.org/T170308#3426713 (10Marostegui) I will try the alters tomorrow to see if they go thru or if this host cannot cope with such big ones (which will be worrying)
[19:22:07] 10Analytics, 10ChangeProp, 10EventBus, 10Services (later), 10User-mobrovac: Support posting Jobs to EventBus simultaneously with normal job processing - https://phabricator.wikimedia.org/T163380#3426718 (10Pchelolo)
[19:24:31] 10Analytics, 10ChangeProp, 10EventBus, 10Services (later), 10User-mobrovac: Support posting Jobs to EventBus simultaneously with normal job processing - https://phabricator.wikimedia.org/T163380#3426740 (10Pchelolo) I've heavily edited the task description to incorporate input from @GWicke and @mobrovac...
[19:27:42] halfak: yt?
[19:27:48] aspell-id is not avail in stretch
[19:27:55] but it is installed by ores::base in puppet
[19:28:00] is it really needed?
[19:28:14] hmm... yes. But there can be an alternative.
[19:28:22] Any other Indonesian dicts available?
[19:28:31] like myspell-id or hunspell-id?
[19:28:51] done' see any
[19:30:01] don't?
[19:30:33] don't
[19:30:46] WTF
[19:30:52] Why would they drop a package.
[19:31:02] haha
[19:31:03] no idea
[19:31:06] i'm googling
[19:31:19] 10Analytics, 10ChangeProp, 10EventBus, 10Services (later), 10User-mobrovac: Support posting Jobs to EventBus simultaneously with normal job processing - https://phabricator.wikimedia.org/T163380#3426764 (10mobrovac) > Option 1: Create an AfterJobPush hook It should actually be a `BeforeJobPush` hook bec...
[19:32:19] AH
[19:32:23] halfak: it was never around
[19:32:25] we built it
[19:32:37] or imported it
[19:32:38] on it...
[19:34:01] fwiw ftp://ftp.gnu.org/gnu/aspell/dict/id/
[19:34:28] No changes since 2004 :) it shouldn't be too hard to fix the pkg if we need to
[19:35:13] i just copied it to the stretch distro
[19:36:36] 10Analytics, 10Project-Admins: Create tag "Analytics-Data-Quality" on Phabricator - https://phabricator.wikimedia.org/T169560#3426804 (10Aklapper) I've created https://phabricator.wikimedia.org/project/view/2886/ but don't know yet if you want H126 to be altered/adjusted?
[19:36:47] 10Analytics, 10Project-Admins: Create tag "Analytics-Data-Quality" on Phabricator - https://phabricator.wikimedia.org/T169560#3426806 (10Aklapper) p:05Triage>03Normal a:03Aklapper
[19:37:51] 10Analytics, 10Analytics-EventLogging, 10DBA: dbstore1002 crashed - https://phabricator.wikimedia.org/T170308#3426812 (10jcrespo) At least x1 broke- no time to reimport now.
[19:42:19] 10Analytics, 10Project-Admins: Create tag "Analytics-Data-Quality" on Phabricator - https://phabricator.wikimedia.org/T169560#3426865 (10Nuria) No, thank you, that is not needed.
[19:43:09] 10Analytics-Data-Quality, 10Analytics-Kanban: Non existing article is one of the most viewed according to the data returned by the /metrics/pageviews/top/ API - https://phabricator.wikimedia.org/T149178#3426870 (10Nuria)
[19:43:17] 10Analytics, 10DBA, 10Security, 10Wikimedia-Incident: MySQL password for research@analytics-store.eqiad.wmnet publicly revealed - https://phabricator.wikimedia.org/T170066#3426873 (10Legoktm)
[19:43:48] 10Analytics, 10Analytics-Cluster, 10Security: Access rights for HDFS on stat100* for Sqoop tasks - https://phabricator.wikimedia.org/T170052#3426875 (10Legoktm)
[19:45:24] halfak: good news! It looks like these jessie hadoop packages work fine on stretch! :)
[19:45:50] \o/
[19:45:52] Nice!
[19:46:21] 10Analytics-Tech-community-metrics, 10Developer-Relations (Jul-Sep 2017): Automatically sync mediawiki-identities/wikimedia-affiliations.json DB dump file with the data available on wikimedia.biterg.io - https://phabricator.wikimedia.org/T157898#3426911 (10Aklapper) Hmm, is it a side effect that Owlbot overwri...
[19:57:18] 10Analytics-Data-Quality, 10Analytics-Kanban: Non existing article is one of the most viewed according to the data returned by the /metrics/pageviews/top/ API - https://phabricator.wikimedia.org/T149178#3426989 (10Nuria) I am sorry we didi not looked at this earlier. Page got created December 7th, so pageview...
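The "just copied it to the stretch distro" at 19:35 is presumably reprepro's copy command, which duplicates a package from one distribution of the repo into another; a guess at the invocation, with the codenames used on apt.wikimedia.org:

    # on the apt repo host: copy the arch:all aspell-id package from the
    # jessie distribution into the stretch one (sketch)
    reprepro copy stretch-wikimedia jessie-wikimedia aspell-id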
[20:02:15] 10Analytics, 10RESTBase, 10Services (blocked): REST API entry point web request statistics at the Varnish level - https://phabricator.wikimedia.org/T122245#3427025 (10GWicke)
[20:05:50] 10Analytics, 10Analytics-EventLogging, 10DBA: dbstore1002 crashed - https://phabricator.wikimedia.org/T170308#3427081 (10Marostegui) Oh if a shard at least broke, then I won't try this alter again as it could corrupt another shard and we might need to even reimport it. We will need to skip this host and leav...
[20:08:12] (03PS1) 10Nuria: Tag annotation should reflect that a tagger can return several tags [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/364518 (https://phabricator.wikimedia.org/T164021)
[20:09:06] 10Analytics, 10ChangeProp, 10EventBus, 10Epic, and 2 others: [EPIC] Develop a JobQueue backend based on EventBus - https://phabricator.wikimedia.org/T157088#3427140 (10Pchelolo)
[20:09:41] (03PS2) 10Nuria: Tag annotation should reflect that a tagger can return several tags [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/364518 (https://phabricator.wikimedia.org/T164021)
[20:12:07] 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review: Replacement of stat1002 and stat1003 - https://phabricator.wikimedia.org/T152712#3427151 (10Ottomata) stat1005 has analytics client stuff applied (Thanks Moritz! :D ). statistics::* classes are going to be more difficult and require coordination....
[20:42:59] 10Analytics, 10ChangeProp, 10EventBus, 10Epic, and 2 others: [EPIC] Develop a JobQueue backend based on EventBus - https://phabricator.wikimedia.org/T157088#3427504 (10Pchelolo)
[20:45:25] (03PS1) 10Smalyshev: [WIP] Add tagger for Wikidata Query Service requests [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/364542 (https://phabricator.wikimedia.org/T169798)
[20:46:05] (03PS2) 10Smalyshev: [WIP] Add tagger for Wikidata Query Service requests [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/364542 (https://phabricator.wikimedia.org/T169798)
[20:48:15] (03PS3) 10Smalyshev: [WIP] Add tagger for Wikidata Query Service requests [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/364542 (https://phabricator.wikimedia.org/T169798)
[20:55:02] 10Analytics, 10Analytics-Cluster, 10Security: Access rights for HDFS on stat100* for Sqoop tasks - https://phabricator.wikimedia.org/T170052#3427635 (10dpatrick) p:05Triage>03Unbreak!
[21:04:45] 10Analytics, 10Analytics-Cluster, 10Security: Access rights for HDFS on stat100* for Sqoop tasks - https://phabricator.wikimedia.org/T170052#3427700 (10Legoktm) @dpatrick did you mean to triage this as Unbreak Now!? If so, why?
[21:05:50] 10Analytics, 10Analytics-Cluster, 10Security: Access rights for HDFS on stat100* for Sqoop tasks - https://phabricator.wikimedia.org/T170052#3427713 (10Bawolff) p:05Unbreak!>03Normal I think it was just the phab board's auto mess with priority "feature"
[21:10:36] (03CR) 10Nuria: [WIP] Add tagger for Wikidata Query Service requests (032 comments) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/364542 (https://phabricator.wikimedia.org/T169798) (owner: 10Smalyshev)
[21:17:59] (03CR) 10Smalyshev: [WIP] Add tagger for Wikidata Query Service requests (031 comment) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/364542 (https://phabricator.wikimedia.org/T169798) (owner: 10Smalyshev)
[21:22:00] (03PS4) 10Smalyshev: [WIP] Add tagger for Wikidata Query Service requests [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/364542 (https://phabricator.wikimedia.org/T169798)
[21:22:58] nuria_: check out this one: https://gerrit.wikimedia.org/r/#/c/364542/4/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/webrequest/tag/TaggerChain.java looks like there are some weird unicode chars in this file which my Eclipse hates
[21:23:24]
[22:20:57] (03CR) 10Bearloga: [C: 031] Tag annotation should reflect that a tagger can return several tags [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/364518 (https://phabricator.wikimedia.org/T164021) (owner: 10Nuria)
[22:34:53] 10Analytics-Tech-community-metrics, 10Developer-Relations (Jul-Sep 2017): Investigate detached duplicated accounts in DB with same username, same source, but different uuids - https://phabricator.wikimedia.org/T170093#3428204 (10Aklapper) First two were pretty sure shots, hence merged nearly all of them. Third...
[22:45:46] 10Analytics-Kanban, 10Analytics-Wikistats: Manage application state with vuex - https://phabricator.wikimedia.org/T169371#3428276 (10Milimetric) p:05Triage>03Normal
[22:57:48] 10Analytics, 10MediaWiki-API, 10RESTBase-API, 10Services (blocked): Top API user agents stats - https://phabricator.wikimedia.org/T142139#3428330 (10GWicke)
[22:58:58] 10Analytics, 10MediaWiki-API, 10RESTBase-API, 10Services (blocked): Top API user agents stats - https://phabricator.wikimedia.org/T142139#2524050 (10GWicke)
[22:59:16] 10Analytics, 10RESTBase, 10Services (blocked): REST API entry point web request statistics at the Varnish level - https://phabricator.wikimedia.org/T122245#1899324 (10GWicke)
[23:02:16] nuria_: question: what arguments do I give to the GetWebrequestTagsUDF function?
[23:09:11] nuria_: also, when I try to run it, I get Caused by: java.lang.ClassNotFoundException: org.reflections.Reflections - is it something I'm doing wrong?
[23:17:00] SMalyshev: checking
[23:17:55] SMalyshev: want to remove those bad chars? or else i can do it on the tag patch, give me a sec
[23:18:14] nuria_: I've removed them in my patch, yes
[23:18:23] SMalyshev: ok, then that works
[23:19:56] SMalyshev: let me check a couple things i was in the middle of and will get back to you on testing udf
[23:20:10] sure
[23:27:15] SMalyshev: did you test by building your code on 1002?
[23:27:29] nuria_: I built locally but ran it on 1002
[23:27:43] build & tests are fine
[23:27:53] SMalyshev: by moving the jars?
[23:27:58] but looks like jar doesn't have those classes
[23:27:59] yes
[23:28:40] SMalyshev: did you rsync 1 jar or several?
[23:29:08] 2 jars, refinery-core & refinery-hive
[23:29:35] SMalyshev: ok, let me try
[23:38:20] SMalyshev: still undoing some chnages i had to test this
[23:38:23] *changes
[23:45:02] SMalyshev: running queries
[23:47:04] SMalyshev: worked fine
[23:47:14] SMalyshev: check my code on stat1002 at:
[23:47:46] "/home/nuria/workplace/tag/test_UDF.hql"
[23:48:05] SMalyshev: one sec let me give you permits
[23:48:39] SMalyshev: let me know if that is similar to what you were doing
[23:49:39] nuria_: yeah but somehow that didn't work for me... not sure why. I just checked out the source and did mvn package
[23:49:46] and copied the jars
[23:50:23]
[23:51:11] SMalyshev: ok, try to use my jar at /home/nuria/workplace/refinery/source/refinery-hive/target/refinery-hive-0.0.46-SNAPSHOT.jar
[23:51:19] SMalyshev: and let me know if it works
[23:53:25] SMalyshev: also try doing mvn dependency:tree > tree.txt
[23:53:52] SMalyshev: that will print your mvn dep tree
[23:54:04] SMalyshev: which is likely fine as locally you do not have a problem
[23:54:18] nuria_: well yes your one works... but produces no tags
[23:54:18] SMalyshev: I see:
[23:54:53] shouldn't it also use a different core module? tags are in refinery-core?
[23:55:23] SMalyshev: right, it will produce no tags for wikidata queries, just for "portal"
[23:55:45] nuria_: but I want to test my tagger... the one that tags sparql ones
[23:56:12] SMalyshev: right, but the fact that my jar works means that the problem is the build of yours, we can fix that
[23:56:52] SMalyshev: do rsync your code to 1002 and let's build there so we can troubleshoot
[23:56:58] nuria_: yeah I think the problem is that jar has only classes from the project, not dependencies, so it can't find the new dependency?
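A quick way to check the "jar has only classes from the project" theory from 23:56 (a sketch; jar, grep and mvn are used in the standard way, and the jar path is the one quoted in the conversation):

    # does the built jar bundle the dependency at all?
    jar tf refinery-hive/target/refinery-hive-0.0.46-SNAPSHOT.jar \
        | grep -c 'org/reflections'   # 0 means Reflections was not shaded in

    # is the dependency on the compile classpath at least?
    mvn dependency:tree > tree.txt
    grep -i reflections tree.txt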
[23:57:37] with your jar the error is different: java.lang.ClassNotFoundException: org.wikimedia.analytics.refinery.core.webrequest.tag.WDQSTagger
[23:57:46] SMalyshev: right
[23:58:14] SMalyshev: if you put your source and code under your /home in 1002 we can look at it
[23:58:54] hmm stat1002 doesn't have gerrit... I need to dig out how to check out gerrit patches without gerrit...
[23:59:17] SMalyshev: You can 1) rsync from local "rsync -rva --delete ./source/ stat1002.eqiad.wmnet:~/workplace/refinery/source"
[23:59:38] SMalyshev: or if you want a gerrit change, clone it anonymously
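Checking out a Gerrit patch without Gerrit access comes down to an anonymous clone plus fetching the change ref; a sketch using the WDQS tagger change from earlier in the log (change 364542, patch set 4; the ref format is refs/changes/<last two digits>/<change>/<patch set>):

    git clone https://gerrit.wikimedia.org/r/analytics/refinery/source
    cd source
    git fetch https://gerrit.wikimedia.org/r/analytics/refinery/source \
        refs/changes/42/364542/4
    git checkout -b wdqs-tagger FETCH_HEAD
    mvn package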