[02:04:12] (CR) Zhuyifei1999: [C: 2] Remove call to nonexistent list.js [analytics/quarry/web] - https://gerrit.wikimedia.org/r/430495 (owner: Framawiki)
[02:04:53] (Merged) jenkins-bot: Remove call to nonexistent list.js [analytics/quarry/web] - https://gerrit.wikimedia.org/r/430495 (owner: Framawiki)
[08:28:22] (CR) Joal: "@nuria: I think going for that move allows us to gently move toward safer releases of new datasources. The set of patches involve manual d" (1 comment) [analytics/aqs] - https://gerrit.wikimedia.org/r/429765 (https://phabricator.wikimedia.org/T193387) (owner: Joal)
[08:35:00] Hi elukey - Thinking of you traveling to Madrid :) Enjoy your time in Spain ;)
[08:35:36] joal: thanks!! Will work a bit this morning and fly in the afternoon :)
[08:35:51] (building druid now)
[08:36:06] Wow - good luck :S
[08:42:55] elukey: reading just this page makes sure I'm actually gonna read the whole thing :) https://legacy.gitbook.com/book/steveloughran/kerberos_and_hadoop/details
[08:44:57] ahhaah
[08:59:13] so joal I have the druid 0.11 debs ready, I am going to copy them (it takes a bit) to the druid labs hosts
[08:59:25] ack elukey!
[08:59:41] elukey: I have the procedures ready to test whenever you want :)
[08:59:43] as far as I can see the upgrade process should be the same except for coordinators and overlords
[09:00:04] that need to be shut down all at once, then started one at a time
[09:00:18] (no mixture of 0.10 and 0.11 at the same time)
[09:00:25] (leader election changed)
[09:00:27] elukey: That'll mean downtime, but should be small :)
[09:00:40] or maybe not?
[09:00:53] since brokers and historicals will still be up, maybe not even downtime!
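The coordinator/overlord ordering discussed above (stop all nodes at once, then start them back one at a time, so 0.10 and 0.11 never participate in leader election together) can be sketched as a small sequencing helper. This is a hypothetical illustration: `rolling_upgrade`, `stop_fn`, and `start_fn` are stand-ins for whatever actually restarts the daemons (systemctl, salt, etc.), not real tooling from this conversation.

```python
def rolling_upgrade(hosts, stop_fn, start_fn):
    """Stop every host first, then start them back one by one.

    Stopping everything before any restart avoids a mixed 0.10/0.11
    quorum during leader election; starting one at a time keeps the
    window without a running leader as short as possible.
    """
    for h in hosts:          # phase 1: no version mixture, stop all
        stop_fn(h)
    for h in hosts:          # phase 2: bring nodes back one at a time
        start_fn(h)

# Toy usage recording the action order instead of touching real services.
log = []
rolling_upgrade(
    ["druid1", "druid2", "druid3"],
    stop_fn=lambda h: log.append(("stop", h)),
    start_fn=lambda h: log.append(("start", h)),
)
```

Every `("stop", …)` entry in `log` precedes every `("start", …)` entry, which is the property the upgrade notes require.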
[09:01:00] in theory they say that if you are quick and bring up at least one of them after shutting them down it shouldn't be a problem
[09:01:10] but I don't trust them anymore :P
[09:01:14] hehehehe :)
[09:01:25] https://github.com/druid-io/druid/releases/tag/druid-0.11.0 - do you see anything in there that would need our attention ?
[09:01:28] like config changes, etc.?
[09:01:47] checking
[09:05:21] elukey: Things I noticed: jvm/gc/time metric
[09:05:45] Service name changes (possible tweak needed for Tranquility, not sure)
[09:05:57] And cachingCost Balancer Strategy
[09:06:44] so the jvm metrics are grabbed via jmx so no problem
[09:07:15] service name changes not sure, we don't set anything else with druid.service buuut I'd need to triple check what that means
[09:08:09] very nice cachingCost indeed! Maybe we can test it after we see that the upgrade went ok?
[09:09:53] works for me elukey :)
[09:10:33] elukey: I was wondering about service names because Tranquility interacts with ZK and therefore uses those names
[09:10:34] all right waiting for the debs to be uploaded, then I'll upgrade in labs and ask you for some testing (whenever you have time)
[09:10:54] elukey: zookeeperDruidIndexingService: String = "druid/overlord"
[09:11:03] elukey: Looks like we are safe :)
[09:11:08] I am a bit worried that Tranquility's commit activity has been really poor recently
[09:11:41] I found, while checking, a pull request half done to fully support 0.10
[09:11:57] I have not checked that ... Not sure if that means it's stable enough, or if features should be available through a different system
[09:11:58] will add to the things to check
[09:12:51] elukey: while reading about Kerberos, found out about that one: https://ranger.apache.org/
[09:13:33] elukey: Doesn't replace Kerberos, Kerberos is step-0 towards security, but step-1 could really be Ranger!
[09:14:12] elukey: some nice-to-haves: Audit logging (can be used for different things than just checking who's done what)
[09:16:35] yeah Ranger is really cooool
[09:59:41] so historicals upgraded in labs
[10:00:43] elukey: query works :)
[10:00:45] I can see again the error in loading json
[10:00:49] :(
[10:00:57] but I think it is related to the format of the cached segments
[10:01:20] in theory, IIUC, druid unloads/unannounces/reloads them
[10:01:33] let's see what the coordinator says about the status of its segments
[10:02:39] d-2 is the coordinator leader and the UI reports all segments loaded
[10:02:42] so I think we are good :)
[10:04:03] ok - I'm gonna do some more tests :)
[10:05:06] overlords upgraded
[10:05:18] now I am going to do middlemanagers, brokers and historicals
[10:05:21] err coordinators
[10:05:26] k
[10:11:43] !log d-[123] Druid cluster upgraded to 0.11 in labs (project analytics)
[10:11:44] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:11:47] joal: done!
[10:11:58] elukey: okey! Starting some tests :)
[10:12:33] let's see if pivot works
[10:14:28] seems to be working (it doesn't find banner activity data for its related cube)
[10:14:39] elukey: indeed, no banner data
[10:15:23] from the coordinator's UI, I saw only two data sources though
[10:15:29] webrequest and webrequest_live
[10:15:50] Correct sire
[10:16:09] webrequest is through hadoop indexation, webrequest_live a fake streaming indexation (not launched now)
[10:46:04] all right I am going to log off!
[10:46:14] talk with you on Monday joal!
[10:46:30] elukey: I'm fighting
[10:46:42] with druid?
[10:46:50] elukey: with kafka and camus on labs - I expect to have tested things this afternoon :)
[10:47:28] elukey: enjoy Spain! Have some jabugo on me ;)
[10:48:26] o/
[11:15:46] Analytics: Varnishkafka does not play well with varnish 5.2 - https://phabricator.wikimedia.org/T177647#4181720 (R4q3NWnUx2CEhVyr) According to my testing it is ok.
I updated my older review for format allocation... forgot about that one.
[11:31:11] (PS1) Mforns: Add source page fields to wmf.virtualpageview_hourly [analytics/refinery] - https://gerrit.wikimedia.org/r/430889 (https://phabricator.wikimedia.org/T186728)
[11:32:16] (CR) Mforns: [C: -1] "Still needs testing and dropping the previous wmf.virtualpageview_hourly table." [analytics/refinery] - https://gerrit.wikimedia.org/r/430889 (https://phabricator.wikimedia.org/T186728) (owner: Mforns)
[11:34:46] Analytics-Kanban, MW-1.31-release-notes (WMF-deploy-2018-02-27 (1.31.0-wmf.23)), Patch-For-Review: Record and aggregate page previews - https://phabricator.wikimedia.org/T186728#4181782 (mforns) @Tbayer I will regenerate the whole table since the beginning of the dataset, so that the new fields are p...
[11:58:55] (CR) Mforns: "I tested this and it seems to work! Ready to review and merge." [analytics/refinery] - https://gerrit.wikimedia.org/r/430889 (https://phabricator.wikimedia.org/T186728) (owner: Mforns)
[12:00:49] Analytics-Kanban, MW-1.31-release-notes (WMF-deploy-2018-02-27 (1.31.0-wmf.23)), Patch-For-Review: Record and aggregate page previews - https://phabricator.wikimedia.org/T186728#4181856 (mforns) Also, I killed the previous oozie coordinator, so the current data set will not be refined any more.
[12:03:19] (CR) Joal: [C: 1] "Minimal nit in comment, code looks good :)" (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/430889 (https://phabricator.wikimedia.org/T186728) (owner: Mforns)
[12:08:23] (PS2) Mforns: Add source page fields to wmf.virtualpageview_hourly [analytics/refinery] - https://gerrit.wikimedia.org/r/430889 (https://phabricator.wikimedia.org/T186728)
[12:09:46] (CR) Mforns: [C: -1] "Wait, this new Gerrit feature does not work! It undid changes in another file! Or I don't know how to use it..."
[analytics/refinery] - https://gerrit.wikimedia.org/r/430889 (https://phabricator.wikimedia.org/T186728) (owner: Mforns)
[12:12:33] (PS3) Mforns: Add source page fields to wmf.virtualpageview_hourly [analytics/refinery] - https://gerrit.wikimedia.org/r/430889 (https://phabricator.wikimedia.org/T186728)
[12:13:26] (CR) Mforns: "OK, done. Sorry for the mess." [analytics/refinery] - https://gerrit.wikimedia.org/r/430889 (https://phabricator.wikimedia.org/T186728) (owner: Mforns)
[12:14:53] joal, sorry I forgot again to change the email when testing that oozie job... :/
[12:15:08] mforns: if you want I have a trick for you
[12:15:14] yep
[12:15:29] mforns: when you test jobs, how do you synchronize the oozie code in HDFS ?
[12:15:49] I copy the oozie directory in /tmp/mforns/oozie
[12:16:04] and assign it as a parameter to the oozie job call
[12:16:22] mforns: the full oozie folder contains a lot more than what you actually need for a single job test :)
[12:16:33] sure
[12:16:47] mforns: the way I do it is, I have a /users/joal/oozie folder, set up a long time ago
[12:17:16] In that folder, I updated the utils/send_error_email/workflow.xml file so that the default email is me instead of the team
[12:17:23] aha
[12:17:25] And when I sync, I only sync oozie subfolders
[12:17:44] aha
[12:17:47] makes sense
[12:18:00] The thing is not to forget -f when putting files, so that it overwrites them
[12:18:09] aha yes
[12:18:13] will do!
[12:18:35] hdfs dfs -put -f oozie/virtualpageview/hourly oozie/virtualpageview
[12:18:50] sure
[12:18:57] The email conf doesn't change like that, and you don't have to think about it :)
[12:19:19] yea :]
[12:19:23] joal, have to go pick up my daughter at school
[12:19:29] bye mforns :)
[12:19:30] see you in a bit
[12:19:32] byeeee
[12:19:34] see you later :)
[14:14:49] Analytics-Legal, WMF-Legal, Wikidata: Solve legal uncertainty of Wikidata - https://phabricator.wikimedia.org/T193728#4182279 (Psychoslave) Well, that's a good point @Denny. I can point to a message on the Wikidata WMfr mailing list where [@Karima clearly explains how some contributors envision that p...
[14:25:45] Analytics-Legal, WMF-Legal, Wikidata: Solve legal uncertainty of Wikidata - https://phabricator.wikimedia.org/T193728#4182296 (Ivanhercaz) > by removing them from Wikidata, or by some other solution yet to identify. Just a note. Items with statements that only have references to Wikipedia are not as...
[15:01:25] Analytics-Legal, WMF-Legal, Wikidata: Solve legal uncertainty of Wikidata - https://phabricator.wikimedia.org/T193728#4182350 (Denny) @Psychoslave, I am not sure I entirely follow. You said "there are contributors of Wikidata that do make massive imports of external data banks, regardless of the co...
[15:27:19] (CR) Nuria: Add a config param for druid datasources (2 comments) [analytics/aqs] - https://gerrit.wikimedia.org/r/429765 (https://phabricator.wikimedia.org/T193387) (owner: Joal)
[15:32:41] (CR) Nuria: [V: 2 C: 2] Add source page fields to wmf.virtualpageview_hourly [analytics/refinery] - https://gerrit.wikimedia.org/r/430889 (https://phabricator.wikimedia.org/T186728) (owner: Mforns)
[15:44:18] (CR) Joal: "Will update commit message."
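The oozie sync trick shown at [12:18:35] (push only the subfolder of the job under test to a personal HDFS oozie directory, never forgetting `-f` so existing files get overwritten) can be wrapped so the flag is impossible to forget. A hypothetical sketch: `build_sync_cmd` and `sync` are illustrative helpers, not existing refinery tooling; the paths are the ones from the conversation.

```python
import subprocess

def build_sync_cmd(job_subdir, hdfs_oozie_dir):
    """Build the hdfs command that syncs one oozie job subfolder.

    -f is always included so files already present in HDFS are
    overwritten (the easy thing to forget when syncing by hand).
    """
    return ["hdfs", "dfs", "-put", "-f", job_subdir, hdfs_oozie_dir]

def sync(job_subdir, hdfs_oozie_dir):
    """Run the sync against a real cluster (requires hdfs on PATH)."""
    subprocess.check_call(build_sync_cmd(job_subdir, hdfs_oozie_dir))

# e.g. sync("oozie/virtualpageview/hourly", "oozie/virtualpageview")
# reproduces the command quoted above.
```

Because the personal copy carries the patched utils/send_error_email/workflow.xml, test-job failure emails keep going to the tester rather than the team, with no per-run edits.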
(2 comments) [analytics/aqs] - https://gerrit.wikimedia.org/r/429765 (https://phabricator.wikimedia.org/T193387) (owner: Joal)
[15:49:02] (PS2) Joal: Add a config param for druid datasources [analytics/aqs] - https://gerrit.wikimedia.org/r/429765 (https://phabricator.wikimedia.org/T193387)
[15:50:44] Analytics-Legal, WMF-Legal, Wikidata: Solve legal uncertainty of Wikidata - https://phabricator.wikimedia.org/T193728#4182444 (Karima) My point of view on data was personal, I didn't work for Wikidata and I didn't import massive data... I am only a user. @Sylvain_WMFr can you remove my message of y...
[16:05:00] milimetric, heyooo
[16:05:55] I’m off Fridays until August, mforns, anything urgent?
[16:06:08] ooooh! sorryyyy
[16:06:09] Oh no - Sorry to have bothered :(
[16:06:15] byeee :]
[16:06:18] Enjoy time off milimetric - We forgot
[16:17:08] Analytics-Legal, WMF-Legal, Wikidata: Solve legal uncertainty of Wikidata - https://phabricator.wikimedia.org/T193728#4182575 (Pintoch) I think there are plenty of examples of non-CC0 data being imported in Wikidata. PubMedCentral is being imported at a large scale and as far as I can tell it is not...
[17:02:52] o/
[17:11:07] heya
[17:12:55] mforns: review my eventlogging log config patch? :) https://gerrit.wikimedia.org/r/#/c/430801/
[17:13:36] ottomata, looking :]
[17:14:16] the ExtraFilter thing will be used in https://gerrit.wikimedia.org/r/#/c/430808/
[17:17:52] Analytics-Legal, WMF-Legal, Wikidata: Solve legal uncertainty of Wikidata - https://phabricator.wikimedia.org/T193728#4182720 (Denny) I don't know the Open License. Given what I understand using automatic translation, the license requires attribution. So if the RNSR is a database in the sense of the...
[17:18:27] Analytics, Analytics-Kanban, EventBus, Patch-For-Review, Services (watching): Upgrade Kafka on main cluster with security features - https://phabricator.wikimedia.org/T167039#4182733 (Ottomata)
[17:23:38] (CR) Nuria: "I see, is there a puppet companion change we can link to this one? (in the commit message)" [analytics/aqs] - https://gerrit.wikimedia.org/r/429765 (https://phabricator.wikimedia.org/T193387) (owner: Joal)
[17:25:05] ottomata, ha... I just learned that JSON is YAML
[17:26:05] :!
[17:26:07] yeah!
[17:26:31] so you can put a json blob inside a yaml field? like:
[17:26:41] Analytics-Legal, WMF-Legal, Wikidata: Solve legal uncertainty of Wikidata - https://phabricator.wikimedia.org/T193728#4182743 (Pintoch) The import was discussed at various places, including at the [[https://www.wikidata.org/w/index.php?title=Wikidata:Dataset_Imports&oldid=664915540#RNSR_(Répertoire_N...
[17:26:56] blah: {"field": "value"}
[17:30:21] Analytics-Kanban, MW-1.31-release-notes (WMF-deploy-2018-02-27 (1.31.0-wmf.23)), Patch-For-Review: Record and aggregate page previews - https://phabricator.wikimedia.org/T186728#4182774 (mforns) Hey, no need to delete the current data set in advance. The data set with the new fields is being recomput...
[17:39:09] Analytics-Legal, WMF-Legal, Wikidata: Solve legal uncertainty of Wikidata - https://phabricator.wikimedia.org/T193728#4182817 (Denny) The property is about adding the RNSR ID. That's fine. On the data import hub page you linked, I don't see a mention of the license. Nor on your talk page. I find thi...
[18:53:20] Hello A-team! Is there a beta cluster we can use to test the X-analytics header sent with webrequest?
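The "JSON is YAML" exchange above ([17:25:05]–[17:26:56]) holds in practice: YAML 1.2's flow style is a superset of JSON, so a JSON blob can be dropped in verbatim as the value of a YAML key. A hypothetical config fragment showing the inline form next to its block-style equivalent (key names are made up for illustration):

```yaml
# A JSON object used directly as a YAML value: any YAML 1.2 parser
# accepts the inline braces form unchanged.
blah: {"field": "value"}

# The same data in YAML block style; both parse to the same mapping.
blah_expanded:
  field: value
```

This is handy when a config file is YAML overall but one field is most naturally maintained as a JSON snippet.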
[19:01:36] chelsyx: we don't have a hadoop cluster in beta
[19:01:39] but kafka and webrequests are there
[19:01:55] so if you just want to see the x-analytics values coming through
[19:01:56] you can do that
[19:01:57] :)
[19:03:15] i'm in a meeting atm, but can help you shortly if that is useful
[19:03:44] ottomata: Thanks! I will let the developer know.
[19:04:11] joal: in case you want to be nerd sniped: I might have a solution to the PartitionDataFrame dataframe methods
[19:04:13] :D
[19:04:19] not sure if it's better than calling partDF.df
[19:04:20] but it works!
[19:04:33] partDf.select("...")
[19:04:36] -> DataFrame tho
[19:04:58] i feel like getting a partDf back should be possible
[19:23:05] Analytics-Legal, WMF-Legal, Wikidata: Solve legal uncertainty of Wikidata - https://phabricator.wikimedia.org/T193728#4183021 (Pintoch) I am glad I got the discussion going then: you now have one concrete example to look at (or maybe two? you did not comment on PMC). I think it is fair to say that t...
[19:49:45] chelsyx: mmm.. wait, i thought we were not using x-analytics
[19:49:49] for anything
[19:50:01] chelsyx: as i mentioned on ticket it does not work automagically
[19:50:13] chelsyx: hadn't we decided to do EL
[19:50:30] chelsyx: for instrumenting both iOS and reading lists?
[19:53:43] chelsyx: or is this for existing fields?
[20:11:03] nuria_: It's not about reading lists. we are investigating a bug.
[20:11:25] nuria_: MobileWikiAppDailyStats is 100% sampling for iOS and should be tracking every opted-in user every day. However, we have only ~60k distinct iOS appInstallIDs daily in MobileWikiAppDailyStats, while there are ~150k distinct iOS appInstallIDs daily in mobile_apps_uniques
[22:11:30] chelsyx: ahhhh, i see, let me know what you expect to see in app install id and i can see if i can help
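The discrepancy chelsyx describes at [20:11:25] (~60k distinct iOS appInstallIDs in MobileWikiAppDailyStats vs ~150k in mobile_apps_uniques) is the kind of gap one would size by set comparison of the two daily ID populations. A hedged sketch: `id_gap` is a hypothetical helper, and the toy lists stand in for what would really come out of Hive queries against the two tables.

```python
def id_gap(daily_stats_ids, uniques_ids):
    """Compare distinct IDs from two sources for the same day.

    Returns the IDs seen in the uniques dataset but missing from the
    daily-stats dataset, plus the coverage ratio (shared / uniques).
    """
    daily = set(daily_stats_ids)
    uniques = set(uniques_ids)
    missing = uniques - daily
    coverage = len(daily & uniques) / len(uniques) if uniques else 1.0
    return missing, coverage

# Toy data: 2 of 5 unique-device IDs also report daily stats.
missing, coverage = id_gap(["a", "b"], ["a", "b", "c", "d", "e"])
```

With real data, inspecting properties of the `missing` set (app versions, opt-in status, install dates) would be a natural next step in narrowing down the bug.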