[00:03:13] halfak, hello
[00:03:46] halfak, got a minute to talk about my research?
[00:04:39] Analytics-Kanban: {flea} Self-serve Analysis - https://phabricator.wikimedia.org/T107955#1695861 (RobH)
[00:25:13] bye a-team see you tomorrow!
[05:15:47] Analytics-Tech-community-metrics, Possible-Tech-Projects: Port MediaWikiAnalysis to SQLAlchemy - https://phabricator.wikimedia.org/T114437#1696170 (Anmolkalia) @jgbarah, since I am supposed to complete at least one microtask before the application deadline, I think this one can be done in that much time....
[05:18:53] Analytics-Tech-community-metrics, Possible-Tech-Projects: Improving MediaWikiAnalysis - https://phabricator.wikimedia.org/T89135#1696177 (Anmolkalia) @jgbarah, I went through the microtasks. They sound pretty good to me and sum up all of what we aim to do. So let us start with the one which can be comple...
[09:22:40] Analytics-Backlog: Investigate US traffic by state normalized by population - https://phabricator.wikimedia.org/T114469#1696431 (JAllemandou) NEW
[09:34:10] Analytics-Backlog, Mobile-Apps: Investigate and fix inconsistent data in mobile_apps_uniques_daily - https://phabricator.wikimedia.org/T114406#1696467 (JAllemandou) Bug in refinery/oozie/mobile_apps/uniques/[daily|monthly] --> When we union old and currently computed data, we should filter the currently co...
[10:05:05] (CR) John Vandenberg: [C: 1] Fix tox to be able to run tests [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/242966 (owner: Hashar)
[10:09:07] Analytics-Backlog: JsonRevisionsSortedPerPage failed on enwiki-20150901-pages-meta-history - https://phabricator.wikimedia.org/T114359#1696560 (JAllemandou) Looked at the logs: Seemed to be an interruption exception. If so, there are chances that the issue comes from a timeout. There is a parameter that can be c...
[10:19:46] hi a-team!
[10:19:52] Hi mforns :)
[10:19:58] :]
[10:23:18] Analytics-Tech-community-metrics, Possible-Tech-Projects: Improving MediaWikiAnalysis - https://phabricator.wikimedia.org/T89135#1696592 (Qgil) @jgbarah, a co-mentor is required. Are Alvaro del Castillo, Daniel Izquierdo (mentioned in the description) on board?
[11:20:16] Analytics-Tech-community-metrics: Exclude certain repositories (upstream / inactive) from Gerrit metrics by blacklisting - https://phabricator.wikimedia.org/T103984#1696706 (Aklapper)
[11:58:41] Hi a-team, I'm away for a moment, will be back before standup
[11:58:53] ok joal see you then :]
[12:17:33] Analytics-Tech-community-metrics, DevRel-September-2015: Tech community KPIs for the WMF metrics meeting - https://phabricator.wikimedia.org/T107562#1696753 (Qgil) Some progress: we are presenting solid metrics for the first time in a quarterly review, next Monday. The slides will be published after the r...
[13:18:40] * joal is back !
[13:18:55] Mr halfak, are you around ?
[13:19:09] o/
[13:19:11] Yeah.
[13:19:13] :)
[13:19:21] Saw your notes re. extract & sort.
[13:19:30] I am unhappy to know it has failed :(
[13:20:16] I have seen you are running another job on altiscale
[13:20:40] Let me know when finished, we'll re-test the josn+ srot
[13:20:47] json+sort sorry
[13:20:50] halfak: --^
[13:21:09] Yeah. I'm sorting an old dataset because I need to get that run done.
[13:21:29] I think that a retry of json+sort will have to wait until either this job fails or I finish a whole new diff run.
[13:21:31] hm, not sure
[13:21:46] I understand about the old dataset
[13:21:51] but no prob
[13:22:19] I am very sorry and unhappy, I know you need that thing to work :(
[13:22:34] No worries. It will. We've just got to learn its nuances :D
[13:22:49] Either way, we get to test out the pure-mapper diff strategy with no sorting.
[13:23:00] Yeah, but learning nuances on a two-day job takes a long time !
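The "pure-mapper diff strategy with no sorting" halfak mentions is not spelled out in the log. As a hedged illustration only, here is a minimal Python sketch of the idea: each mapper buffers the revisions of a single page in memory and diffs consecutive texts, so no cluster-wide sort/shuffle phase is needed. All function and field names here are hypothetical, not taken from the actual job.

```python
import difflib
from itertools import groupby
from operator import itemgetter

def diff_page_revisions(revisions):
    """Given one page's revisions as (timestamp, text) tuples, yield a
    unified diff between each consecutive pair. Ordering happens
    in-memory, per page, inside the mapper -- no global sort phase."""
    ordered = sorted(revisions, key=itemgetter(0))
    for (_, old), (ts, new) in zip(ordered, ordered[1:]):
        diff = "\n".join(difflib.unified_diff(
            old.splitlines(), new.splitlines(), lineterm=""))
        yield ts, diff

def mapper(records):
    """Hypothetical mapper: records are (page_id, timestamp, text)
    tuples; group by page and emit (page_id, timestamp, diff)."""
    keyed = sorted(records, key=itemgetter(0))
    for page_id, revs in groupby(keyed, key=itemgetter(0)):
        pairs = [(ts, text) for _, ts, text in revs]
        for ts, diff in diff_page_revisions(pairs):
            yield page_id, ts, diff
```

The trade-off being discussed: this avoids the expensive json+sort step, but each mapper must hold one page's revision history in memory at a time.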
[13:23:14] halfak: Yes, that will be a good thing
[13:23:50] halfak: I have not yet looked at the task you created about loading data in the research cluster
[13:23:59] halfak: will do early next week
[13:24:26] Oh yes. Let me actually write something up for that.
[13:24:55] Cool :)
[13:38:55] Analytics-Cluster, Datasets-Archiving, Datasets-Webstatscollector: Mediacounts stalled after 2015-09-24 - https://phabricator.wikimedia.org/T113956#1696947 (Hydriz) Open>Resolved It seems like they were backfilled and they now exist. I have just pushed them to the Archive, thanks for your help!
[14:12:56] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1696984 (mobrovac) >>! In T114443#1695708, @GWicke wrote: > 1) Provide edit related events (ex: edit, creation, deletion, revision deletion, rename). Consumers: REST...
[14:16:28] (CR) Nuria: [C: 2] Fix tox to be able to run tests [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/242966 (owner: Hashar)
[14:20:39] nuria: \O/
[14:21:23] hashar: holaaa
[14:21:36] I am deploying the CI change for limn-mobile-data so the tests will run on patch / +2
[14:21:57] hashar: does that need verified +2?
[14:22:04] hashar: or Cr+2 is enough
[14:22:22] when one votes CR+2, Jenkins runs the jobs (that will be 'tox')
[14:22:30] and on success Jenkins does the Verified +2 and Submit for you
[14:22:36] but on failure it votes Verified -1
[14:22:53] so in theory you just have to +2 , and Jenkins makes sure the change is valid
[14:23:37] (CR) Hashar: "recheck" [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/242966 (owner: Hashar)
[14:24:04] hashar: ok, looking at wikimetrics change now
[14:24:15] https://integration.wikimedia.org/ci/job/tox-jessie/176/console :-)
[14:24:35] (CR) Hashar: "I have deployed the CI change so tox is run on the repositories now." [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/242966 (owner: Hashar)
[14:24:52] nuria: the wikimetrics one is a bit more complicated
[14:25:12] I had to pass session=False to pip.req.parse_requirements() for some reason
[14:25:40] the usedevelop=True is because we skip the dist step, and we ended up with bin/wikimetrics not being installed
[14:25:54] usedevelop runs something like pip install -e (or python setup.py develop)
[14:25:55] hashar: my wikimetrics code is from the jurassic period, let me update and test
[14:26:00] hehe
[14:26:41] the tests don't pass anyway
[14:26:45] that requires a redis backend
[14:29:56] hashar: right, i think the nosetests command has to be "nosetests --cover-erase -a "!nonDeterministic,!manual,!slow""
[14:30:05] so it does not run the manual tests
[14:30:11] "nosetests --cover-erase -a "!nonDeterministic,!manual,!slow""
[14:30:33] hashar: makes sense?
[14:32:28] hashar: let me change & retry
[14:32:35] sure up to whatever you want I guess
[14:32:46] I haven't even looked at the testsuite, just added a basic nosetests command
[14:33:14] CI wise, ideally Jenkins will just run 'tox'
[14:33:31] this way it is up to the developers to define the exact commands to run straight in the source repository
[14:33:41] this way you no longer depend on updating the Jenkins job configuration
[14:34:44] (PS2) Nuria: Fix up tox setup and setup.py parse_requirements() [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/243012 (owner: Hashar)
[14:35:12] hashar: works by running tox
[14:35:33] hashar: just changed the nosetests command you run by default when running tox, please take a look
[14:35:51] hashar: locally works fine. all tests that should be run are run and they pass
[14:36:20] hashar: let me know if it makes sense
[14:38:51] nuria: checking
[14:39:53] nuria: ConnectionError: Error 61 connecting to localhost:6379. Connection refused.
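The `-a "!nonDeterministic,!manual,!slow"` flag nuria proposes uses nose's attribute-selector plugin: tests are tagged with attributes, and a leading `!` excludes any test that has that attribute set. As a hedged, self-contained illustration (not nose's actual implementation, just a model of the selection rule), the logic looks roughly like this:

```python
def parse_selector(selector):
    """Split a nose-style attribute selector such as
    '!nonDeterministic,!manual,!slow' into (wanted, unwanted) sets."""
    wanted, unwanted = set(), set()
    for term in selector.split(","):
        term = term.strip()
        if term.startswith("!"):
            unwanted.add(term[1:])
        elif term:
            wanted.add(term)
    return wanted, unwanted

def selected(test_attrs, selector):
    """Return True if a test with the given attribute dict would run.
    Mirrors the intent of `nosetests -a <selector>`: every plain term
    must be truthy on the test, every '!term' must be absent or falsy."""
    wanted, unwanted = parse_selector(selector)
    if any(not test_attrs.get(a) for a in wanted):
        return False
    if any(test_attrs.get(a) for a in unwanted):
        return False
    return True
```

So an untagged test runs, while anything decorated as `manual`, `slow`, or `nonDeterministic` is skipped in CI, which is why the redis-dependent tests can be kept out of the Jenkins run.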
[14:39:53] :D
[14:40:16] because /tests/__init__.py launches wikimetrics which attempts to hit a local redis
[14:41:32] (CR) Hashar: [C: 1] "Looks like PS2 will cover the most common use cases. For CI that would fail with:" [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/243012 (owner: Hashar)
[14:41:39] in short: looks fine
[14:43:40] hashar: right, ok, let's merge then
[14:44:28] (CR) Nuria: [C: 2] Fix up tox setup and setup.py parse_requirements() [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/243012 (owner: Hashar)
[14:45:03] hashar: anything else we should take a look at?
[14:46:03] owner:hashar project:^analytics/.* is:open --> empty !!!!!!!!!!!
[14:46:17] nuria: nothing left apparently. Thanks a ton
[14:46:35] if you know of other repositories that might need jobs/tests, we can add them as well
[14:46:53] hashar: ahem... no it is the other way around, thanks to ya'
[14:51:36] :-D
[14:53:34] Analytics-Backlog, Analytics-Wikistats: Feed Wikistats traffic reports with aggregated hive data - https://phabricator.wikimedia.org/T114379#1697093 (Nuria)
[15:30:19] Analytics-Kanban, Patch-For-Review, WMF-deploy-2015-10-06_(1.27.0-wmf.2): Bug: client IP is being hashed differently by the different parallel processors {stag} [13 pts] - https://phabricator.wikimedia.org/T112688#1697272 (kevinator) Open>Resolved
[15:30:50] Analytics-Kanban, Patch-For-Review, WMF-deploy-2015-10-06_(1.27.0-wmf.2): Bug: client IP is being hashed differently by the different parallel processors {stag} [13 pts] - https://phabricator.wikimedia.org/T112688#1642642 (kevinator)
[15:30:51] Analytics-EventLogging, Analytics-Kanban: {stag} EventLogging on Kafka - https://phabricator.wikimedia.org/T102225#1697278 (kevinator)
[15:31:16] Analytics-Backlog, Analytics-EventLogging, Performance-Team, Patch-For-Review: Make webperf eventlogging consumers use eventlogging on Kafka - https://phabricator.wikimedia.org/T110903#1697281 (kevinator)
[15:32:52] Analytics-Backlog, Analytics-EventLogging, Performance-Team, Patch-For-Review: Make webperf eventlogging consumers use eventlogging on Kafka - https://phabricator.wikimedia.org/T110903#1589603 (kevinator)
[15:32:54] Analytics-EventLogging, Analytics-Kanban: {stag} EventLogging on Kafka - https://phabricator.wikimedia.org/T102225#1697292 (kevinator)
[15:35:15] Analytics-Kanban: {flea} Self-serve Analysis - https://phabricator.wikimedia.org/T107955#1697298 (kevinator)
[15:35:16] Analytics-Kanban: Introduction to Hive class {flea} [13 pts] - https://phabricator.wikimedia.org/T113545#1697297 (kevinator) Open>Resolved
[15:35:38] Analytics-Cluster, Analytics-Kanban: Send email when load jobs fail {hawk} [8 pts] - https://phabricator.wikimedia.org/T113253#1697301 (kevinator) Open>Resolved
[15:35:39] Analytics-Backlog, Analytics-Cluster: {epic} Implement better Webrequest load monitoring {hawk} - https://phabricator.wikimedia.org/T109192#1697302 (kevinator)
[15:39:50] Analytics-Kanban: Enable use of Python 3 in Spark {hawk} [8 pts] - https://phabricator.wikimedia.org/T113419#1697318 (kevinator)
[15:41:11] milimetric: my connection dropped
[15:41:29] yeah, you can type here, I think everyone's on
[15:43:24] I'll head to the office, people. Cya there.
[15:49:52] Analytics-Tech-community-metrics, Possible-Tech-Projects: Improving MediaWikiAnalysis - https://phabricator.wikimedia.org/T89135#1697328 (jgbarah)
[15:50:38] Analytics-Tech-community-metrics, Possible-Tech-Projects: Improving MediaWikiAnalysis - https://phabricator.wikimedia.org/T89135#1027974 (jgbarah) >>! In T89135#1696592, @Qgil wrote: > @jgbarah, a co-mentor is required. Are Alvaro del Castillo, Daniel Izquierdo (mentioned in the description) on board? A...
[15:53:52] Analytics-Tech-community-metrics, Possible-Tech-Projects: Improving MediaWikiAnalysis - https://phabricator.wikimedia.org/T89135#1697339 (jgbarah) >>! In T89135#1696177, @Anmolkalia wrote: > @jgbarah, I went through the microtasks. They sound pretty good to me and sum up all of what we aim to do. So let...
[15:57:37] Analytics-Tech-community-metrics, Possible-Tech-Projects: Port MediaWikiAnalysis to SQLAlchemy - https://phabricator.wikimedia.org/T114437#1697370 (jgbarah) This one needs some knowledge about Python, and will require you to learn about SQLAlchemy. Fortunately, there is plenty of documentation about SQLA...
[15:59:17] milimetric: ok, i'm going to do your stuff and some spark stuff for now
[15:59:24] lemme know when you are ready for me to push buttons
[15:59:46] thx ottomata, I think you can +2 and merge that anytime (needs rebase probably)
[16:00:06] otherwise the services guys are talking deployment details in #wikimedia-services
[16:00:18] i'll let you know when they're decided and we need access
[16:06:17] Analytics-Kanban, Patch-For-Review: Puppetize a server with a role that sets up Cassandra on Analytics machines [13 pts] {slug} - https://phabricator.wikimedia.org/T107056#1697401 (JAllemandou) Hey DevOps Guys, As part of that task, we would need the cassandra cluster to be accessible from the hadoop cluste...
[16:06:29] ottomata: Do i need to transfer to somebody ? --^
[16:07:09] Analytics-Backlog: Load Wikimedia JSON data into Altiscale "Research Cluster" HIVE - https://phabricator.wikimedia.org/T114489#1697404 (Halfak) NEW
[16:07:31] joal, see https://phabricator.wikimedia.org/T114489 -- Load Wikimedia JSON data into Altiscale "Research Cluster" HIVE
[16:07:34] Analytics-Kanban, Patch-For-Review: Puppetize a server with a role that sets up Cassandra on Analytics machines [13 pts] {slug} - https://phabricator.wikimedia.org/T107056#1697415 (Ottomata) @akosiaris ^
[16:07:49] I'm sure that is not enough detail. Please let me know what else you need.
[16:08:21] Also, you'll notice the dataset I reference is old. I have an open ticket with altiscale for an issue with transferring new datasets to the cluster.
[16:08:41] So I might get new data before you get a chance to pick up the ticket.
[16:08:45] * halfak crosses fingers
[16:11:09] (PS4) Nuria: [WIP] Changes to camus to debug/test avro in 1002 [analytics/camus] - https://gerrit.wikimedia.org/r/242907
[16:16:24] halfak: I'll sync up before tackling the ticket, just to be sure :)
[16:16:29] halfak: Thanks for the doc !
[16:16:34] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1697452 (Ottomata) Edit events would be awesome and totally doable with this MVP, but I'm a little worried about the amount of bike shedding that will go into design...
[16:16:42] Sounds good. no prob. :)
[16:16:45] ottomata: I have a request for you
[16:17:26] ottomata: We are at version wmf6 of camus I think, and in archiva we only have until wmf4
[16:17:34] Do you mind pushing that ?
[16:17:52] oh!
[16:18:17] ottomata: My camus checker is almost done (I've been fighting with maven a bit)
[16:18:20] oh! how's that possible...i guess I just git added and git pushed and git fat synced?
[16:18:27] You'll get a request for review today :)
[16:18:45] ottomata: I have no idea :(
[16:19:24] k will upload to archiva
[16:20:13] Also ottomata : I think the camus jars you build are fat jars
[16:20:31] should it not be?
[16:20:41] That would be great to change that to small ones (would prevent me from having to exclude in maven)
[16:20:54] exclude in maven? like the hadoop stuff?
[16:20:59] joal: am all for it if it works
[16:20:59] ottomata: Normally hadoop knows where to find hadoop jars :)
[16:21:03] not sure if i know much about it
[16:21:10] it might be a setting that just came in with camus? not sure
[16:21:24] hm ... I'll have a look at the repo
[16:21:27] and submit
[16:21:42] joal: https://archiva.wikimedia.org/#artifact~releases/org.wikimedia.analytics/camus-wmf/0.1.0-wmf6
[16:21:45] it's in archiva
[16:21:49] awesome :)
[16:21:51] Thanks
[16:21:56] it was already there though!
[16:22:07] Ah, yes, but not listed !
[16:22:14] So I didn't know !
[16:22:32] Analytics, MediaWiki-Authentication-and-authorization, Reading-Infrastructure-Team, MW-1.26-release, Patch-For-Review: Create dashboard to track key authentication metrics before, during and after AuthManager rollout - https://phabricator.wikimedia.org/T91701#1697467 (bd808) >>! In T91701#169582...
[16:25:14] Actually ottomata, not your fault: hadoop is a dependency for camus
[16:25:24] but we should tell it that the jars are provided
[16:25:59] aye, joal feel free to modify the pom
[16:37:56] madhuvishy, nuria, do you know how many people we trained to use hive?
[16:40:15] mforns: 7 as far as I know
[16:40:22] I'd check with Kevin
[16:40:29] madhuvishy, ok, thx
[16:41:22] mforns: https://phabricator.wikimedia.org/T107955 4 people here, Neil, S, and Tilman before - these I know. I remember Kevin helping someone on TPG, but not sure
[16:42:22] aha
[17:07:13] AHHH ceap
[17:07:16] oops, wrong chat
[17:24:01] ottomata: I just got it : camus jar was in archiva, but not the place I expected it, that's why I didn't find it :)
[17:24:10] ottomata: thanks anyway :)
[17:29:44] ah k, cool
[17:29:45] np
[17:30:00] Ahhhh ! ottomata : still issues !
[17:30:05] :(
[17:30:07] Mwarf
[17:30:13] oh wassup?
[17:30:35] Can't use the wmf6 jar, maven doesn't manage to read it properly
[17:31:10] Failed to read artifact descriptor for org.wikimedia.analytics:camus-wmf:jar:0.1.0-wmf6: Could not find artifact com.linkedin.camus:camus-parent:pom:0.1.0-wmf6
[17:31:15] ottomata: --^
[17:31:16] :(
[17:31:45] That's a shame, because that's actually the one I'm after (not even the camus-wmf wrapper)
[17:34:25] hm
[17:34:27] ?
[17:34:33] that's when you build a non fat jar?
[17:35:18] So, since the pom defines a dependency, it assumes it has to download it (non-fat jar mode)
[17:35:30] Analytics-Backlog: Spike: Can we have a production Event Logging endpoint from labs? - https://phabricator.wikimedia.org/T114503#1697728 (Milimetric) NEW
[17:40:42] ah, and the linkedin parent pom isn't in the repo?
[17:40:42] hm.
[17:41:33] joal: do we just need parent pom in repo?
[17:42:01] Don't know ottomata, I think it's the dependency (camus-etl-kafka) that is missing
[17:42:09] maybe not
[17:42:47] hm, it seems it says it can't find the parent pom
[17:42:49] and it needs it
[17:42:49] but
[17:42:55] https://archiva.wikimedia.org/#artifact-details-download-content~releases/org.wikimedia.analytics/camus-wmf/0.1.0-wmf6
[17:42:57] pom is there
[17:43:16] https://archiva.wikimedia.org/repository/releases/org/wikimedia/analytics/camus-wmf/0.1.0-wmf6/camus-wmf-0.1.0-wmf6.pom
[17:43:26] oh that is the parent of the parent?
[17:43:35] parent pom is com.linkedin.camus
[17:43:41] OH
[17:43:42] no
[17:43:47] got it, yes it needs parent pom
[17:43:48] think i can fix.
[17:43:59] You rock :)
[17:45:57] Analytics-Kanban, Patch-For-Review: Enable use of Python 3 in Spark {hawk} [8 pts] - https://phabricator.wikimedia.org/T113419#1697860 (Ottomata) ellery, we are not running any production spark jobs, and the more I think about this, the less dangerous I think it is. I tested that this fixes the problem in...
[17:47:34] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1697875 (bd808) >>! In T114443#1697452, @Ottomata wrote: > What about api.php logging? See T108618. Those logs are still being collected via udp2log, whereas the e...
[17:51:20] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1697887 (Ottomata) @bd808, what's your timeline on this? Producing directly into Kafka is fine, but we are trying to do two things with this MVP: - centralize and...
[17:56:33] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1697908 (bd808) >>! In T114443#1697887, @Ottomata wrote: > @bd808, what's your timeline on this? Producing directly into Kafka is fine, but we are trying to do two...
[17:56:55] Analytics-Kanban, Patch-For-Review: Enable use of Python 3 in Spark {hawk} [8 pts] - https://phabricator.wikimedia.org/T113419#1697912 (JAllemandou) @Ottomata: We do have some spark jobs in prod (mobile stats and restbase metrics). They are not python though, so hopefully not affected :)
[18:01:03] Analytics-Backlog, Developer-Relations, MediaWiki-API, Reading-Infrastructure-Team, and 5 others: Metrics about the use of the Wikimedia web APIs - https://phabricator.wikimedia.org/T102079#1697920 (Ottomata) @bd808, @eberhardson I believe is producing binary Avro to Kafka. At this time, I'd advise...
[18:02:12] joal: hi
[18:02:29] ottomata: I tried to shade the camus-etl-kafka package without hadoop deps (the same way I did for refinery) --> from 40Mb to 12Mb
[18:02:33] Hi HaeB
[18:02:37] Excuse me, I missed the time
[18:02:41] Joining now
[18:08:57] ah sorry joal got sidetracked
[18:15:48] joal: ok
[18:15:48] try now
[18:15:50] https://archiva.wikimedia.org/#artifact~releases/com.linkedin.camus/camus-parent/0.1.0-wmf6
[18:16:04] i had to manually create a directory on archiva host, that wasn't cool.
[18:16:04] Ironholds: Hey, you there ?
[18:16:04] :/
[18:16:42] ottomata: looks uncool indeed
[18:17:34] joal: yeah i don't fully understand there
[18:17:40] it's like it wouldn't let me upload the pom by itself.
[18:17:42] :(
[18:17:52] Have you seen the message about non-fat jar ?
[18:17:55] ottomata: --^
[18:17:57] and the directory didn't exist
[18:18:00] We could go that direction
[18:18:05] i think i probably should have done a full camus jar release
[18:18:06] hmmm
[18:18:19] joal: don't understand
[18:18:20] ottomata: I think that's it yes
[18:18:32] ah, so, we push a fat camus-wmf jar
[18:18:35] etl jar with hadoop bundled: 40M
[18:18:44] and if we make it non fat, it doesn't have camus deps
[18:18:46] etl jar without hadoop bundled: 40M
[18:18:49] but we don't deploy camus deps to cluster
[18:18:50] etl jar without hadoop bundled: 12M
[18:18:51] sorry
[18:19:10] All deps are shaded, except hadoop
[18:19:15] so, we need a camus-wmf jar with camus deps, but without hadoop ones
[18:19:17] ja?
[18:19:24] correct
[18:19:31] ottomata: same issue wit
[18:19:37] ok, then why does maven need this pom in archiva?
[18:19:41] with com.linkedin.camus:camus-etl-kafka:jar:0.1.0-wmf6
[18:19:44] if the camus deps are included
[18:20:08] I think maven assumes packages to be non-fat
[18:20:09] joal: i don't understand, if we have a camus-wmf artifact with camus deps in it
[18:20:15] ah, it's just maven
[18:20:21] So when there is a dep, it is downloaded
[18:20:23] so, you try to depend on camus-wmf
[18:20:25] ?
[18:20:30] Yes sir
[18:20:33] oh, but you want to depend on camus-etl-kafka, right?
[18:20:34] hm.
[18:20:38] Correct
[18:20:52] so we should do a full camus release with all artifacts, not just camus-wmf?
[18:21:03] I need the EtlKey and the EtlInputMapper, and some other classes
[18:21:15] i suppose you could just depend on camus-wmf, and since those are in that jar... it's ok?
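The fix joal and ottomata converge on — keep the camus deps inside the shaded jar but leave the hadoop ones out, since hadoop is already on the cluster via .deb packages — would normally be expressed in the pom roughly like this. This is a hedged sketch, not the actual camus pom; the artifact ID and version are illustrative only:

```xml
<!-- Hypothetical fragment: mark hadoop as "provided" so it is on the
     compile classpath but the maven-shade-plugin (which by default only
     bundles compile/runtime-scope dependencies) leaves it out of the
     fat jar; the cluster's own .deb-installed hadoop supplies it. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.6.0</version>
  <scope>provided</scope>
</dependency>
```

With `provided` scope the classes are available when building, but are excluded from the shaded artifact, which matches the 40M to 12M drop joal observed.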
[18:21:35] ottomata: correct, if it gets downloaded :)
[18:22:32] I think we could also revamp the maven of the packages, to ensure we just build the smallest part of it
[18:22:45] Would make it easier I guess (fewer dependencies)
[18:23:02] aye
[18:23:13] well, i mean, we DO want the camus deps in camus-wmf.jar
[18:23:24] Yes, but only the needed ones :)
[18:23:25] since we don't deploy with maven, those won't be on hosts
[18:23:31] but, we don't need hadoop deps
[18:23:36] since those are deployed by .deb packages
[18:23:45] There are 6 modules in camus, and etl-kafka only uses 2
[18:24:05] joal: can you depend on camus-wmf now that the camus-parent is in archiva?
[18:24:06] So half of it can be left (I say that for archiva release easiness)
[18:24:20] ottomata: nope, missing etl-kafka under linkedin
[18:24:26] oh, aye, k.
[18:24:27] hm.
[18:24:46] Then etl-kafka will miss the other two ...
[18:24:46] sooooooo really we should build maven release plugin into our camus fork
[18:24:58] I do agree :)
[18:25:21] Cause now the camus-wmf wrapper is not even in the fork, is it ?
[18:25:50] Analytics-Kanban, Patch-For-Review: Enable use of Python 3 in Spark {hawk} [8 pts] - https://phabricator.wikimedia.org/T113419#1698027 (Ottomata) Totally works! ``` export PYSPARK_PYTHON=python3 export PYSPARK_DRIVER_PYTHON=ipython3 bin/pyspark --master yarn Python 3.4.0 (default, Apr 11 2014, 13:05:11)...
[18:27:08] hm, joal it is
[18:27:19] https://github.com/wikimedia/analytics-camus
[18:27:23] https://github.com/wikimedia/analytics-camus/tree/wmf
[18:28:08] Analytics-Backlog: Investigate US traffic by state normalized by population - https://phabricator.wikimedia.org/T114469#1698031 (kevinator) I did something similar with a trial version of Tableau back in April (at the Philly Hackathon). It wasn't Virginia, but neighbouring Washington DC which stood out. I w...
[18:28:31] Riiiight --> dedicated branch
[18:28:38] Couldn't find it ottomata
[18:28:42] Ok, makes sense
[18:28:52] ja
[18:28:54] description here
[18:28:54] https://github.com/wikimedia/analytics-camus/tree/wmf#wikimedia-fork--branches
[18:29:03] maybe we should delete the wikimedia/camus fork though
[18:29:13] since i doubt linkedin will accept pull requests anymore
[18:29:17] I'll try to sort :D
[18:29:40] hm, now that I've found that, I'll try to revamp a bit of the stuff
[18:30:42] ottomata: I'll try to have it ready for monday, to provide camus-checker after :)
[18:31:26] Ironholds: Sorry to bug you, I'm asked to review the pageview code you used for the april pageview computation
[18:31:43] Ironholds: If you could send me a link to that place, that'd be great :)
[18:34:41] a-team, it's the end of my day !
[18:34:53] Thanks ottomata for the help on maven :)
[18:35:11] ciao
[18:35:13] joal: have a nice weekend!
[18:35:20] Thx madhuvishy !
[18:36:19] laters!
[18:55:13] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1698184 (Ottomata) Hm, @gwicke, if Mediawiki has access to schemas, and is relatively sure that it can produce valid messages for a given topic, why should a Mediawi...
[19:01:29] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1698223 (GWicke) @ottomata, main reason would be the ability to work with $simple_queue, $binary_kafka, $amazon_queue and so on without changes in MW code. This isn'...
[19:02:28] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1698226 (Ottomata) Oh, so the service is planned to be kafka agnostic? (I will come hang on your side after lunch :) )
[19:34:13] Analytics-Kanban, Patch-For-Review: Enable use of Python 3 in Spark {hawk} [8 pts] - https://phabricator.wikimedia.org/T113419#1698301 (ellery) Woohoo, thanks otto!
[19:41:17] Analytics-Kanban: Investigate sample cube pageview_count vs unsampled log pageview count [5 pts] - https://phabricator.wikimedia.org/T108925#1698315 (Tbayer) For the record, Oliver has shared the reference implementation of the pageview definition underlying pageviews05: https://github.com/wikimedia-research/...
[20:11:45] milimetric: do you know what's a good way to get pageview data before 5/2015?
[20:12:03] dumps?
[20:12:04] madhuvishy: the only way is the currently still existing sampled logs
[20:12:15] oh, you can get dumps but that's more limiting
[20:12:25] hmmm, where are the sampled logs?
[20:12:26] (eg. already has the old pv definition applied, pre-aggregated, etc.)
[20:12:33] aah
[20:12:36] hmmm
[20:12:38] the sampled logs are on stat1002, I can give you the dir
[20:12:55] but they're also loaded up from June 2014 to June 2015 in milimetric.webrequest_sampled
[20:13:51] okay
[20:15:48] thanks
[20:19:48] milimetric: Edward Galvez is looking for some Q1 data, I replied to him with what I know - but cc-ed you too
[20:20:05] in case i'm saying something stupid
[20:33:12] milimetric, joal, cass + restbase patch merged and applied
[20:34:00] ottomata: yeah, I think we need to do this ansible thing now:
[20:34:01] ansible-playbook -i production -e target=aqs roles/restbase/deploy.yml
[20:34:08] from where?
[20:34:10] and I think someone with ssh needs to do it
[20:34:25] mmm, not super sure, but i think from the ansible repo after installing all the deps?
[20:34:34] let's ask in -services
[20:43:03] milimetric: wait a moment
[20:43:15] first, you need to get rid of the trebuchet repo
[20:43:33] also, since this is completely untested, please do a dry-run before each command
[20:43:53] so, ansible-playbook -i production -e target=aqs roles/restbase/setup.yml --check --diff
[20:43:57] gwicke: let's talk in services
[22:18:12] Analytics-Backlog, MediaWiki-API, Research-and-Data, Patch-For-Review: log user agent in api.log - https://phabricator.wikimedia.org/T108618#1698846 (bd808) @Spage: would you object to me re-titling and adjusting the main task description here to reflect the real direction that this work is taking i...
[22:47:05] Analytics-Backlog, MediaWiki-API, Research-and-Data, Patch-For-Review: log user agent in api.log - https://phabricator.wikimedia.org/T108618#1698925 (bd808) >>! In T108618#1674708, @Nuria wrote: > >>Is there a way to get a unique identifier to the varnish log in MediaWiki code? Otherwise that data...
[22:52:01] Analytics, MediaWiki-Authentication-and-authorization, Reading-Infrastructure-Team, MW-1.26-release, Patch-For-Review: Create dashboard to track key authentication metrics before, during and after AuthManager rollout - https://phabricator.wikimedia.org/T91701#1698945 (Tgr) The task for upgrading...
[23:08:31] Analytics-Backlog, MediaWiki-API, Research-and-Data, Patch-For-Review: log user agent in api.log - https://phabricator.wikimedia.org/T108618#1698981 (Tgr) >>! In T108618#1698925, @bd808 wrote: > If we want to try and do that we should probably spin it off into a new task for more in depth discussion...
[23:30:42] Analytics, MediaWiki-Authentication-and-authorization, Reading-Infrastructure-Team, MW-1.26-release, Patch-For-Review: Create dashboard to track key authentication metrics before, during and after AuthManager rollout - https://phabricator.wikimedia.org/T91701#1699005 (bd808) >>! In T91701#169746...