[00:13:20] Analytics, Analytics-Tech-community-metrics: Set up metrics for Time on Site - https://phabricator.wikimedia.org/T119352#1824182 (GWicke) NEW
[00:13:45] Analytics, Analytics-Tech-community-metrics: Set up metrics for Time on Site - https://phabricator.wikimedia.org/T119352#1824189 (GWicke)
[08:45:25] Analytics-Tech-community-metrics, Possible-Tech-Projects, Outreachy-Round-11: Misc. improvements to MediaWikiAnalysis (which is part of the MetricsGrimoire toolset) - https://phabricator.wikimedia.org/T89135#1824480 (01tonythomas)
[08:45:33] Analytics-Tech-community-metrics, Outreachy-Round-11: Outreachy Proposal for Improving MediaWikiAnalysis - https://phabricator.wikimedia.org/T116733#1824478 (01tonythomas) Open>declined Thank you for your proposal. Sadly, the Outreachy administration team made it strict that candidates with any kind...
[09:53:57] Analytics-Kanban, WMDE-Analytics-Engineering, Wikidata, Patch-For-Review: Fix '.*http.*' not being tagged as spiders in webrequest [5 pts] {hawk} - https://phabricator.wikimedia.org/T119054#1824596 (Addshore) Per the drop on https://vital-signs.wmflabs.org/#projects=wikidatawiki/metrics=Pageviews it...
[10:08:38] Analytics-Kanban, WMDE-Analytics-Engineering, Wikidata, Patch-For-Review: Fix '.*http.*' not being tagged as spiders in webrequest [5 pts] {hawk} - https://phabricator.wikimedia.org/T119054#1824622 (JAllemandou) @Addshore: Not feasible since original user_agent is not present in pageview_hourly.
[10:10:06] Analytics-Kanban, WMDE-Analytics-Engineering, Wikidata, Patch-For-Review: Fix '.*http.*' not being tagged as spiders in webrequest [5 pts] {hawk} - https://phabricator.wikimedia.org/T119054#1824623 (Addshore) ahh okay! :/ Is it possible to add a note to the spike on the graph displayed on vital-sign...
[10:16:45] Analytics-Kanban, WMDE-Analytics-Engineering, Wikidata, Patch-For-Review: Fix '.*http.*' not being tagged as spiders in webrequest [5 pts] {hawk} - https://phabricator.wikimedia.org/T119054#1824639 (JAllemandou) @Addshore: The A are notes (there is a card if you place your mouse over it), and there...
[10:21:41] Analytics-Kanban, WMDE-Analytics-Engineering, Wikidata, Patch-For-Review: Fix '.*http.*' not being tagged as spiders in webrequest [5 pts] {hawk} - https://phabricator.wikimedia.org/T119054#1824644 (Addshore) Ahh, I never left my mouse over the tag for long enough. Is it not possible to do notes on...
[10:26:18] Analytics-Kanban, WMDE-Analytics-Engineering, Wikidata, Patch-For-Review: Fix '.*http.*' not being tagged as spiders in webrequest [5 pts] {hawk} - https://phabricator.wikimedia.org/T119054#1824649 (JAllemandou) Notes are to the dashiki page, but I think you can modify the existing ones if you wish :)
[10:39:47] Analytics-Kanban, WMDE-Analytics-Engineering, Wikidata, Patch-For-Review: Fix '.*http.*' not being tagged as spiders in webrequest [5 pts] {hawk} - https://phabricator.wikimedia.org/T119054#1824658 (Addshore) {{doing}}
[11:59:03] Analytics-Kanban, WMDE-Analytics-Engineering, Wikidata, Patch-For-Review: Fix '.*http.*' not being tagged as spiders in webrequest [5 pts] {hawk} - https://phabricator.wikimedia.org/T119054#1824807 (JAllemandou) Thanks :)
[12:01:28] (PS1) Addshore: Add and use retryingExternalCurlGet for php scripts [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/254834
[12:03:05] (PS2) Addshore: Add and use retryingExternalCurlGet for php scripts [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/254834
[12:10:11] (CR) Addshore: [C: 2 V: 2] Add and use retryingExternalCurlGet for php scripts [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/254834 (owner: Addshore)
[12:13:22] Analytics, operations, Database: db1046 running out of disk space -
https://phabricator.wikimedia.org/T119380#1824823 (jcrespo) NEW
[12:18:28] Analytics-Kanban: Expose the results of the global metric at a public link, that's available immediately for the API {kudu} [8 pts] - https://phabricator.wikimedia.org/T118310#1824844 (mforns) a:mforns
[12:42:16] (PS1) Addshore: Split sql query & exec lines [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/254844
[12:43:16] (CR) Addshore: [C: 2 V: 2] Split sql query & exec lines [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/254844 (owner: Addshore)
[13:20:40] * joal is away for a few hours
[13:33:53] (CR) DCausse: [C: 1] Implement ArraySum UDF [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254452 (owner: EBernhardson)
[13:38:45] Analytics-Kanban, CirrusSearch, Discovery, operations, and 2 others: Delete logs on stat1002 in /a/mw-log/archive that are more than 90 days old. - https://phabricator.wikimedia.org/T118527#1824949 (ArielGlenn) Oldest file in these directories is now Aug 25 (today is Nov 23), so looking good.
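The retryingExternalCurlGet change merged above is only visible as a commit message; a minimal Python sketch of the idea the name describes, retrying a flaky HTTP GET with the fetch callable injected (every name here is hypothetical, not the actual PHP implementation in analytics/limn-wikidata-data):

```python
import time

def retrying_get(fetch, retries=3, delay=0.0):
    """Call fetch() until it succeeds, retrying up to `retries` times.

    The transport is injected as a callable so this sketch stays
    network-agnostic; a real helper would wrap curl/urllib and only
    retry on transient errors (timeouts, 5xx), not on every exception.
    """
    last_error = None
    for attempt in range(retries):
        try:
            return fetch()
        except Exception as exc:  # a real helper would narrow this
            last_error = exc
            if delay:
                time.sleep(delay * (attempt + 1))  # linear backoff
    raise last_error
```

With an injected fetcher, the retry behaviour can be exercised without any network access.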
[13:48:29] (PS1) Addshore: Add site_stats/pages_by_namespace script [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/254847
[13:48:40] (CR) Addshore: [C: 2 V: 2] Add site_stats/pages_by_namespace script [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/254847 (owner: Addshore)
[13:54:05] (PS1) Addshore: Add WikimediaDb getPdo static method [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/254848
[13:54:15] (CR) Addshore: [C: 2 V: 2] Add WikimediaDb getPdo static method [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/254848 (owner: Addshore)
[13:56:18] (PS1) Addshore: Also count item talk pages [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/254849
[13:56:27] (CR) Addshore: [C: 2 V: 2] Also count item talk pages [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/254849 (owner: Addshore)
[13:57:48] (PS1) Addshore: Fix missing space in db query [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/254850
[13:58:00] (CR) Addshore: [C: 2 V: 2] Fix missing space in db query [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/254850 (owner: Addshore)
[14:19:05] goodmorniiniing
[14:26:51] Analytics-Backlog, operations, HTTPS: EventLogging sees too few distinct client IPs - https://phabricator.wikimedia.org/T119144#1825022 (Ottomata) BTW, we are soon removing `ip` and `x_forwarded_for` from the webrequest table, in favor of the `X-Client-IP` header that will be set on all varnish requests...
[14:29:55] Analytics-Kanban, operations, Database: db1046 running out of disk space - https://phabricator.wikimedia.org/T119380#1825024 (Milimetric) p:Triage>High
[14:30:12] Analytics-Kanban, operations, Database: db1046 running out of disk space - https://phabricator.wikimedia.org/T119380#1825027 (Milimetric) a:Milimetric
[14:51:39] (PS1) Addshore: Count WD query triples [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/254862
[14:52:43] (CR) Addshore: [C: 2 V: 2] Count WD query triples [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/254862 (owner: Addshore)
[14:53:38] morning ottomata !
[14:53:50] :)
[14:58:03] Analytics-Tech-community-metrics, DevRel-November-2015: Explain / sort out / fix SCM repository number mismatch on korma - https://phabricator.wikimedia.org/T116483#1825095 (Lcanasdiaz) Bug is pretty similar to T116484. It's also already isolated.
[15:14:19] Analytics-Tech-community-metrics, DevRel-November-2015: Explain / sort out / fix SCR repository number mismatch on korma - https://phabricator.wikimedia.org/T116484#1825153 (Lcanasdiaz) Fixed in upstream product version.
[15:23:36] Analytics-Kanban, CirrusSearch, Discovery, operations, and 2 others: Delete logs on stat1002 in /a/mw-log/archive that are more than 90 days old. - https://phabricator.wikimedia.org/T118527#1825194 (Ironholds) It would be really good if it could have been explicitly called out (with a ping) that the...
[15:39:30] Analytics-Kanban, CirrusSearch, Discovery, operations, and 2 others: Delete logs on stat1002 in /a/mw-log/archive that are more than 90 days old. - https://phabricator.wikimedia.org/T118527#1825288 (Ottomata) Aye ok. I should have been more explicit about this. (You were CCed on this ticket though...
[15:49:36] (PS1) Addshore: Add sparql lag to metrics [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/254876
[15:49:49] (CR) Addshore: [C: 2 V: 2] Add sparql lag to metrics [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/254876 (owner: Addshore)
[16:08:08] hola a-team
[16:08:17] Hi nuria
[16:08:22] heloo
[16:09:01] hi, morning
[16:12:09] Analytics-Backlog, Database: Set up bucketization of editCount fields {tick} - https://phabricator.wikimedia.org/T108856#1825375 (Milimetric) Also, @jcrespo, we're happy to work on the SQL, we were just following the process we used with Sean where he preferred to do some of this type of work.
[16:21:11] Hey ottomata, will you have time to deploy aqs today?
[16:21:41] sure, am in meeting now, but ja
[16:21:47] awesome :)
[16:52:29] Analytics-Backlog, operations, HTTPS: EventLogging sees too few distinct client IPs - https://phabricator.wikimedia.org/T119144#1825526 (leila) @Nuria and @ori, agreed. Please keep me in the loop if a discussion happens outside of this thread and T118557. We have an upcoming research in Q3 that relies h...
[16:53:31] Analytics-Backlog, operations, HTTPS: EventLogging sees too few distinct client IPs - https://phabricator.wikimedia.org/T119144#1825529 (ori) >>! In T119144#1825022, @Ottomata wrote: > BTW, we are [[ https://phabricator.wikimedia.org/T118557 | soon removing ]] `ip` and `x_forwarded_for` from the webrequ...
[16:53:37] Analytics-Backlog, operations, HTTPS: EventLogging sees too few distinct client IPs - https://phabricator.wikimedia.org/T119144#1825531 (leila)
[16:53:38] Analytics-Backlog, Analytics-Cluster, Improving access, Research-and-Data: Hashed IP addresses in refined webrequest logs - https://phabricator.wikimedia.org/T118595#1825532 (leila)
[16:54:27] Analytics-Backlog, Analytics-Cluster, Improving access, Research-and-Data: Hashed IP addresses in refined webrequest logs - https://phabricator.wikimedia.org/T118595#1804411 (leila)
[16:56:32] Analytics-Backlog, Database: Set up bucketization of editCount fields {tick} - https://phabricator.wikimedia.org/T108856#1825545 (jcrespo) I would prefer it too, but I cannot honestly guarantee to work quickly on this.
[17:01:50] a-team: joining standup. ottomata and i were in a meeting
[17:01:57] k
[17:03:59] Analytics-Kanban: Encapsulating the retrieval of schemas from local depot from KafkaSchemaRegistry [3] - https://phabricator.wikimedia.org/T119211#1825617 (Nuria) a:Nuria
[17:04:12] Analytics-Kanban: Research avro schema evolution, do we need a write and reader schema? [3] - https://phabricator.wikimedia.org/T119092#1825619 (Nuria) Open>Resolved
[17:04:28] (PS1) Madhuvishy: [WIP] Add new form for launching the global metrics report [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/254887
[17:05:08] (PS5) Madhuvishy: [WIP] Setup celery task workflow to handle running reports for the Global API [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/253750 (https://phabricator.wikimedia.org/T118308)
[17:07:13] Analytics-Backlog, Analytics-Cluster: Implement better Webrequest load monitoring {hawk} - https://phabricator.wikimedia.org/T109192#1825642 (Ottomata) Open>declined Joseph and Marcel did this when they made webrequest jobs depend on camus offsets.
[17:07:35] Analytics-Backlog, Analytics-Cluster: Implement better Webrequest load monitoring {hawk} - https://phabricator.wikimedia.org/T109192#1825644 (Ottomata) declined>Resolved
[17:10:06] (CR) Nuria: "There are two tasks on this regard: https://gerrit.wikimedia.org/r/#/c/252958/" [analytics/refinery] - https://gerrit.wikimedia.org/r/252956 (https://phabricator.wikimedia.org/T118570) (owner: DCausse)
[17:13:28] (CR) DCausse: "https://gerrit.wikimedia.org/r/#/c/252958/ is meant for camus and will be used by the kafka decoders." [analytics/refinery] - https://gerrit.wikimedia.org/r/252956 (https://phabricator.wikimedia.org/T118570) (owner: DCausse)
[17:13:33] ottomata: I'm preparing a deploy for aqs, we'll need that so Joseph can start backfilling
[17:13:51] k
[17:14:15] Analytics-Kanban: Write pageview API blogpost - https://phabricator.wikimedia.org/T118471#1825654 (kevinator) a:kevinator
[17:15:30] thx milimetric :)
[17:18:02] nuria: if you are doing cluster CRs, the cassandra one is ready for you as well :)
[17:19:31] joal: ok, will do.
[17:19:41] thks M'dame nuria :)
[17:19:50] joal: we still want that on the outside depot right?
[17:20:15] nuria: Can be in the refinery depot, it has its own module
[17:20:40] Analytics-Kanban: Write pageview API blogpost {melc} - https://phabricator.wikimedia.org/T118471#1825678 (kevinator)
[17:20:47] joal: right, its own git module, correct?
[17:21:11] no nuria, its own maven module :)
[17:21:48] joal: ah, ok, that is fine too
[17:21:55] ok cool :)
[17:22:14] like that nuria the code stays in refinery but is separated enough
[17:22:22] sounds good
[17:32:20] Analytics-Backlog, operations, HTTPS: EventLogging sees too few distinct client IPs - https://phabricator.wikimedia.org/T119144#1825736 (Nuria) @leila, @ottomata: note that even then not all eventlogging requests go through varnish, only the ones that come from the javascript client, there are 4 clients...
[17:34:08] Analytics-EventLogging, Analytics-Kanban, EventBus, Patch-For-Review: Make eventlogging logs configurable via python config file - https://phabricator.wikimedia.org/T118903#1825746 (madhuvishy) a:Ottomata>madhuvishy
[17:35:40] Analytics-Kanban: Prepare Pageview API lightning talk {melc} [5 pts] - https://phabricator.wikimedia.org/T119091#1825749 (mforns)
[17:35:46] Analytics-Kanban: Write pageview API blogpost [8 pts] {melc} - https://phabricator.wikimedia.org/T118471#1825752 (JAllemandou)
[17:36:01] Analytics-Kanban: Write pageview API blogpost {melc} [8] - https://phabricator.wikimedia.org/T118471#1825753 (Nuria)
[17:36:44] Analytics-Kanban, Datasets-General-or-Unknown: Wikimedia "top" pageviews API has problematic double-encoded JSON [8 pts] {melc} - https://phabricator.wikimedia.org/T118931#1825754 (JAllemandou)
[17:37:42] ottomata: ok, I merged the changes, so you can deploy aqs at your earliest
[17:39:22] Analytics-Kanban: Encapsulating the retrieval of schemas from local depot from KafkaSchemaRegistry [3 pts] - https://phabricator.wikimedia.org/T119211#1825769 (Milimetric)
[17:41:12] Analytics-Kanban: Backfill daily-top-articles in cassandra [2015-09-01 - 2015-11-16 (included)] [5 pts] {melc} - https://phabricator.wikimedia.org/T118991#1825774 (JAllemandou)
[17:41:57] Analytics-EventLogging, Analytics-Kanban, EventBus, Patch-For-Review: Make eventlogging logs configurable via python config file [5 pts] {oryx} - https://phabricator.wikimedia.org/T118903#1825779 (JAllemandou)
[17:43:06] Analytics-Kanban: Implement a simple public API to calculate global metrics {kudu} [0 pts] - https://phabricator.wikimedia.org/T117285#1825790 (JAllemandou)
[17:44:33] Analytics-Kanban: Create a set of celery tasks that can handle the global metric API input {kudu} [0 pts] - https://phabricator.wikimedia.org/T117288#1825799 (madhuvishy)
[17:46:29] Analytics-Kanban: Communicate the WikimediaBot convention {hawk} [5 pts] -
https://phabricator.wikimedia.org/T108599#1825838 (JAllemandou)
[17:48:51] Analytics-Kanban: Implement the logic of each node in the celery chain {kudu} [5 pts] - https://phabricator.wikimedia.org/T118309#1825860 (madhuvishy) a:madhuvishy
[17:49:13] Analytics-Kanban: Define a first set of metrics to be worked for wikistats 2.0 {lama} [8 pts] - https://phabricator.wikimedia.org/T112911#1825863 (Nuria)
[17:49:15] Analytics-Kanban, Analytics-Wikistats: {lama} Wikistats 2.0 - https://phabricator.wikimedia.org/T107175#1825864 (Nuria)
[17:50:15] Analytics-Backlog, Research-and-Data: Double check Article Title normalization - https://phabricator.wikimedia.org/T108867#1825874 (JAllemandou) Related task: https://phabricator.wikimedia.org/T117945
[17:59:11] Analytics-Backlog, Research-and-Data: Research Spike: Article Title normalization contains weird chars - https://phabricator.wikimedia.org/T108867#1825920 (JAllemandou)
[18:00:35] Analytics-Backlog, Research-and-Data: Research Spike: Article Title normalization contains weird chars [8 pts] {hawk} - https://phabricator.wikimedia.org/T108867#1532560 (JAllemandou)
[18:05:55] Analytics-Kanban: cassandra backfill monitoring [0 pts] {slug} - https://phabricator.wikimedia.org/T115360#1825958 (JAllemandou)
[18:06:47] Analytics-Backlog, Patch-For-Review: Wikimedia Analytics Refinery Jobs TestCamusPartitionChecker test failure when running as bd808 on stat1002 - https://phabricator.wikimedia.org/T119101#1825967 (Nuria) a:Nuria
[18:07:07] Analytics-Kanban, Patch-For-Review: Wikimedia Analytics Refinery Jobs TestCamusPartitionChecker test failure when running as bd808 on stat1002 {hawk} - https://phabricator.wikimedia.org/T119101#1825968 (JAllemandou)
[18:07:27] milimetric: hiya ok!
[18:08:15] checking
[18:08:45] check good
[18:08:46] proceeding..
[18:08:55] !log deploying AQS
[18:09:46] looks good from here
[18:10:41] Hey! I am looking for a quotable source for the number of users on Wikipedia that have JS disabled.
Couldn't find anything on the statistic pages I looked at so far, so I thought anyone here might know something? :)
[18:12:04] dcausse: yt?
[18:12:27] frimelle: indeed, we calculated that a while back, it's just that wiki is hard to find, lemme see
[18:12:28] need food, back shortly
[18:13:07] frimelle: https://www.mediawiki.org/wiki/Analytics/Reports/Clients_without_JavaScript
[18:13:38] frimelle: feel free to link from a more "findable" place
[18:13:56] frimelle: also note this was done almost a year ago
[18:14:01] nuria Awesome, thank you so much! I will check where I'd expect it to be :)
[18:14:26] Is there a plan to recalculate the numbers?
[18:15:15] frimelle: no, but you can file a phab task if you want that done under 'analytics-backlog' (noting why is important so we can prioritize accordingly)
[18:15:42] frimelle: our calculations were done to see if it was feasible to require javascript on logging, that was the use case
[18:16:17] nuria: Was more a question of curiosity and for the ArticlePlaceholder extension. But I'll consider opening a ticket!
[18:16:57] frimelle: ok, if it is just curiosity those numbers should be plenty, note that ie8 will soon not receive in js
[18:18:02] Okay, thanks!
[18:18:49] Analytics-Kanban, Datasets-General-or-Unknown: Wikimedia "top" pageviews API has problematic double-encoded JSON [8 pts] {melc} - https://phabricator.wikimedia.org/T118931#1826020 (Addshore) The output is now much better and much easier to work with! Many thanks!!!
[18:21:14] (CR) Nuria: Create and cleanup temp dirs in TestCamusPartitionChecker.scala (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254350 (https://phabricator.wikimedia.org/T119101) (owner: BryanDavis)
[18:22:04] Analytics-Kanban, Datasets-General-or-Unknown: Wikimedia "top" pageviews API has problematic double-encoded JSON [8 pts] {melc} - https://phabricator.wikimedia.org/T118931#1826032 (Milimetric) @Addshore: you're not supposed to thank us before I mention that we deployed the fix :) So, we deployed it, but...
[18:24:58] Analytics-Backlog, Research-and-Data: Research Spike: Article Title normalization contains weird chars [8 pts] {hawk} - https://phabricator.wikimedia.org/T108867#1826050 (Milimetric) Related task confirming that redirect logic is a bit of a ball of spaghetti: T104755
[18:26:17] milimetric, I updated the star wars credits
[18:26:21] yay :)
[18:26:38] mforns: the new top format is deployed, and data is backfilling
[18:26:48] (pure json now)
[18:26:58] (Abandoned) Nuria: [WIP] Adding test schema to test json->avro publishing [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254066 (owner: Nuria)
[18:27:20] milimetric, cool!
[18:30:02] Analytics-Backlog: Expose the results of the global metric at a public link, that's available immediately for the API {kudu} [8 pts] - https://phabricator.wikimedia.org/T118310#1826059 (mforns) a:mforns>None
[18:30:11] (CR) Nuria: [C: 2 V: 2] Clean up IpUtil trusted proxy initialization [analytics/refinery/source] - https://gerrit.wikimedia.org/r/253793 (owner: BryanDavis)
[18:33:55] mforns: +1 to your presentation, "ready to deploy" as far as I'm concerned
[18:34:07] thanks milimetric
[18:34:22] and mforns + madhuvishy: lemme know if you wanna talk about what Amanda said
[18:34:39] milimetric, I didn't get her concern
[18:35:03] so ultimately she wants someone on-wiki to hit a button and everything else to happen behind the scenes
[18:35:12] milimetric: I agree, only if we are still gonna do the on-wiki thing
[18:35:25] when she hits the button she gets a URL, and she can display the results of that URL on the wiki.
[18:35:43] sure, but when we implement that, we can also come up with a way of solving that detail, no? I mean, that was your point
[18:35:49] yeah, that's her ultimate goal. So the question is, do you guys wanna finish most of this work now or work on it more later
[18:35:59] I think she's ok with just the on-wikimetrics version for now
[18:36:16] yes, we can do that later, mforns
[18:36:33] yes, I think we can leave it for later and deliver this (logged in)
[18:37:21] milimetric, madhuvishy, I think the on-wiki thing is beautiful, but the thing that will save the 800 hours of work is what we're doing now
[18:37:40] I don't have an opinion either way, I'll see if Amanda does.
[18:37:49] ok
[18:42:02] nuria: yes
[19:00:19] Hah milimetric! I did not realise you hadn't announced it yet ;)
[19:00:38] :)
[19:00:41] I just loaded my thing, it was broken, so I fixed it by removing the evil hack I added for the double encoding stuff!
[19:01:18] If you haven't already seen it I could probably make this thing more generic https://grafana.wikimedia.org/dashboard/db/wikidata-top-page-views
[19:01:42] Hey addshore
[19:01:59] I also threw together a rough draft of the JS part of a datasource plugin for grafana for it ;) https://phabricator.wikimedia.org/P2338
[19:02:02] new data should start flowing in tonight :)
[19:03:16] addshore: the dashboard is cool !
[19:03:48] the grafana plugin is cooler ;) pick a page and a site and it will just graph it ;)
[19:04:19] addshore: have an example ?
[19:08:58] joal: give me 2 ticks!
[19:09:18] sure addshore :)
[19:09:20] thanks :)
[19:13:33] (PS2) Madhuvishy: [WIP] Add new form for launching the global metrics report [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/254887
[19:17:10] dcausse: what do we have pending on the search changesets? seems like very little, right?
[19:17:37] milimetric: let's please not add a wiki ui
[19:18:23] nuria: on analytics/camus/hive side I think we just have to review the changesets now.
[19:18:23] nuria: I don't think we are, for now we're just making all the changes inside wikimetrics, with no changes to that interface
[19:18:49] nuria: but I mean, if after that they say they want it, we have the tasks and we can choose to work on them or try to get someone else to work on them
[19:18:55] on mediawiki side we still have to encode the schema id into the kafka message.
[19:19:00] milimetric: ok, sounds good.
I cannot see much value adding a ui on top of wikimetrics ui
[19:19:29] well the proposal isn't to add a UI, they just want a single button
[19:19:59] but the decoder should be backward compatible and accept messages without ids, so safe to deploy even if mediawiki is not ready
[19:20:22] milimetric: ya, i think they need to understand a bit more of what goes on behind the scenes, i am really not in favor of the 'button' that knows how to execute my use case
[19:21:27] that's ok, nuria, but this is the solution that they want to drive towards. We're not doing it 100% right now, but maybe if you want, you can schedule a meeting and talk to Amanda about it. Personally I don't think it's a big deal either way
[19:22:13] milimetric: ok, we can schedule that meeting once the workflow in wikimetrics is completed
[19:22:17] dcausse: ok, lemme look at code again. ottomata: any further thoughts on avro ?
[19:25:40] (PS3) Nuria: Add 2 payloads map fields to CirrusSearchRequestSet avro schema [analytics/refinery/source] - https://gerrit.wikimedia.org/r/252958 (https://phabricator.wikimedia.org/T118570) (owner: DCausse)
[19:26:09] joal: back! just had an impromptu at-desk meeting :P I'll fire up my grafana instance now :)
[19:26:17] oh wait, I think I actually took a screenshot the other day..
[19:26:38] https://usercontent.irccloud-cdn.com/file/Qm3wHk9I/
[19:27:19] So basically you could specify as many combinations of projects and pages as you want and have them graphed with data direct from the api
[19:27:31] man, that's coooooool !
[19:27:39] I'll look at shoving the code in a git repo this week
[19:27:49] (CR) Nuria: "Keeping this review up" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/252958 (https://phabricator.wikimedia.org/T118570) (owner: DCausse)
[19:27:58] great addshore !
[19:28:01] I also bashed this together for pulling data straight from TSVs: github.com/addshore/grafana-tsv-datasource
[19:28:07] http://github.com/addshore/grafana-tsv-datasource
[19:29:21] Thanks for the link !
[19:29:21] * joal likes grafana !
[19:29:21] * addshore does too
[19:29:21] and I finally got my head around writing plugins for it
[19:29:21] I could probably also write an annotation plugin for it pulling annotations from the same pages / formats as dashiki
[19:30:21] I didn't know grafana supported annotations
[19:30:56] yup! there are even some used in places, i believe pulled from elastic
[19:33:47] (PS17) Nuria: Add support for custom timestamp and schema rev id in avro message decoders [analytics/refinery/source] - https://gerrit.wikimedia.org/r/251267 (https://phabricator.wikimedia.org/T117873) (owner: DCausse)
[19:34:27] I think after over 3 years here, I figured out what's up with the dashboarding problem:
[19:34:36] everyone just really wants to build their own dashboard
[19:34:51] :D milimetric
[19:35:11] :D
[19:35:24] well, I figured if I can do everything in grafana, then, well, why not ;)
[19:35:39] right, sure
[19:35:44] IMO it's probably the easiest ;)
[19:36:05] easy is very relative, I think easy usually means familiar
[19:36:05] whoa, you can point grafana at tsvs at a remote uri now?
[19:36:43] ottomata: well, with this plugin I made ;)
[19:36:49] though it needs some corners rounding
[19:41:01] ottomata: more thoughts on avro stuff? I was going to polish a bit the current patch and merge it
[19:42:07] nuria: i haven't looked at it since thursday, but i think it shouldn't support not having the integer ID
[19:42:22] i think we should enforce it
[19:42:33] cc dcausse
[19:43:00] ottomata: why not try latest?
if id is not present
[19:43:39] (PS2) BryanDavis: Create and cleanup temp dirs in TestCamusPartitionChecker.scala [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254350 (https://phabricator.wikimedia.org/T119101)
[19:43:54] i believe dcausse made it try earliest, not latest, no?
[19:46:14] (CR) BryanDavis: Create and cleanup temp dirs in TestCamusPartitionChecker.scala (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254350 (https://phabricator.wikimedia.org/T119101) (owner: BryanDavis)
[19:49:07] (CR) Joal: [C: 1] "Thanks Bryan for the modification !" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254350 (https://phabricator.wikimedia.org/T119101) (owner: BryanDavis)
[19:49:52] milimetric, mforns_gym : https://wikimedia.org/api/rest_v1/metrics/pageviews/top/en.wikipedia/all-access/2015/11/22
[19:49:56] :D
[19:50:14] joal: https://wikimedia.org/api/rest_v1/metrics/pageviews/top/en.wikipedia/all-access/2015/11/13
[19:50:16] ;)
[19:50:30] milimetric: awesome :)
[19:50:53] milimetric: now that it's checked, backfiiiiiiiiling !
[19:50:56] it'll respond with the new data once it's filled in
[19:51:06] k, sweet
[19:53:26] ottomata: lemme see.. would latest make more sense?
[19:53:32] cc dcausse
[19:53:54] nuria: probably not, since latest is less likely to work? or will nothing work i guess
[19:54:28] i dunno, i just think that since an avro binary message needs a schema to be able to be read, we should be able to associate a message with a schema at all times
[19:54:49] topic config is not set in stone, but revisions are immutable
[19:54:54] ottomata, dcausse : ok, i am fine if we agree on that.
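The pageview API "top" URLs pasted above follow a fixed pattern: project, access method, year, zero-padded month, zero-padded day. A small Python helper (hypothetical, but built to produce exactly the URLs quoted in the log):

```python
# Base endpoint taken verbatim from the URLs in the log above.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/top"

def top_url(project: str, access: str, year: int, month: int, day: int) -> str:
    """Build a 'top articles' Pageview API URL, zero-padding month and day."""
    return f"{BASE}/{project}/{access}/{year}/{month:02d}/{day:02d}"
```

For example, `top_url("en.wikipedia", "all-access", 2015, 11, 22)` reproduces the first URL milimetric pasted.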
[19:56:27] i mean, if dcausse needs backwards compatibility because of our mistake about how avro works, i think we should help him, but i'd prefer it if we just fixed the problem for good
[19:57:31] (PS18) Nuria: Add support for custom timestamp and schema rev id in avro message decoders [analytics/refinery/source] - https://gerrit.wikimedia.org/r/251267 (https://phabricator.wikimedia.org/T117873) (owner: DCausse)
[19:58:13] ottomata: ok, i can do changes, let me talk to dcausse before i change stuff, i just submitted a patch that corrects typos and javadoc w/o changing code
[19:59:15] bd808: did you test this fix on cluster? https://gerrit.wikimedia.org/r/#/c/254350/2/refinery-job/src/test/scala/org/wikimedia/analytics/refinery/job/TestCamusPartitionChecker.scala
[19:59:48] nuria: yes, it passed for me on stat1002
[19:59:59] but please do verify :)
[20:00:03] bd808: ok, lemme test locally then and merge
[20:08:06] (PS6) Madhuvishy: [WIP] Setup celery task workflow to handle running reports for the Global API [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/253750 (https://phabricator.wikimedia.org/T118308)
[20:08:08] (PS3) Madhuvishy: [WIP] Add new form for launching the global metrics report [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/254887
[20:09:11] (CR) Nuria: [C: 2] Create and cleanup temp dirs in TestCamusPartitionChecker.scala [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254350 (https://phabricator.wikimedia.org/T119101) (owner: BryanDavis)
[20:12:00] (Merged) jenkins-bot: Create and cleanup temp dirs in TestCamusPartitionChecker.scala [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254350 (https://phabricator.wikimedia.org/T119101) (owner: BryanDavis)
[20:12:42] joal: can you tell me how you graphed per host on the graphite UI?
[20:24:28] nuria: graphite accepts wildcards
[20:34:37] nuria: worked ?
[20:34:57] joal: will try again, on interview, can you send me a sample url?
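The avro discussion above keeps circling one point: a binary Avro message cannot be decoded without its writer schema, so the schema revision id should travel with the message, with a fallback revision only for old messages written before the id existed. A Python sketch of that idea; the framing below (one magic byte plus a 4-byte big-endian revision id) is a hypothetical illustration, not the actual refinery/camus wire format:

```python
import struct

MAGIC = 0x00  # hypothetical framing marker, not the real wire format

def encode_message(schema_rev_id: int, payload: bytes) -> bytes:
    """Prefix an Avro-encoded payload with its writer schema's revision id."""
    return struct.pack(">BI", MAGIC, schema_rev_id) + payload

def decode_message(message: bytes, fallback_rev_id: int):
    """Return (schema_rev_id, payload).

    Messages without the id prefix fall back to a configured revision,
    i.e. the 'try earliest if id is not present' behaviour debated above.
    Note the ambiguity this tolerates: an unframed payload whose first
    byte happens to equal MAGIC would be misread, which is one reason
    to enforce the id on all new messages instead.
    """
    if len(message) >= 5 and message[0] == MAGIC:
        _, rev_id = struct.unpack(">BI", message[:5])
        return rev_id, message[5:]
    return fallback_rev_id, message
```

The trailing comment in `decode_message` is essentially ottomata's argument for enforcing the integer id rather than supporting its absence.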
[20:35:04] sure nuria
[20:43:31] ragesoss: where is the wmflabs version of the education dashboard?
[20:52:02] Analytics-Backlog: Provide a way to find out what is being searched for on a wiki - https://phabricator.wikimedia.org/T119439#1826482 (Superyetkin) NEW
[20:53:06] !log Backfill november top data into cassandra (Json corrected)
[20:53:29] Started more than an hour ago, but I forgot to send the message ...
[20:54:53] Guys, I'm off for tonight
[20:54:57] See y'all tomorrow
[21:10:35] good night joal :)
[21:22:21] Analytics-Backlog, Wikimedia-Developer-Summit-2016: Developer summit session: Pageview API overview - https://phabricator.wikimedia.org/T112956#1826577 (Dave_Braunschweig) I don't know where this would fit in terms of priority, but something I was just looking for is what might be called "Referrer URL"...
[21:36:40] Analytics-Kanban: Write pageview API blogpost {melc} [8] - https://phabricator.wikimedia.org/T118471#1826604 (Milimetric) a:kevinator>Milimetric
[21:45:26] Analytics-Backlog, Research-and-Data: Research Spike: Article Title normalization contains weird chars [8 pts] {hawk} - https://phabricator.wikimedia.org/T108867#1826648 (Halfak) See also https://mako.cc/academic/hill_shaw-consider_the_redirect.pdf for a discussion of redirects and page views.
[22:03:24] (PS3) Bearloga: Functions for categorizing queries. (Work In Progress) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254461 (https://phabricator.wikimedia.org/T118218)
[22:04:56] (CR) jenkins-bot: [V: -1] Functions for categorizing queries. (Work In Progress) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254461 (https://phabricator.wikimedia.org/T118218) (owner: Bearloga)
[22:05:21] (CR) Bearloga: Functions for categorizing queries. (Work In Progress) (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254461 (https://phabricator.wikimedia.org/T118218) (owner: Bearloga)
[22:22:57] (CR) EBernhardson: Functions for categorizing queries.
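On the graphite question earlier (joal's "graphite accepts wildcards", and nuria's request for a sample url): Graphite's `/render` endpoint takes a `target` expression in which `*` matches one metric-tree node per series, which is how you get one line per host from a single URL. A minimal Python sketch of building such a URL; the base URL and metric path are invented for illustration, only the `/render` path and the `target`/`format` parameters are standard Graphite:

```python
from urllib.parse import urlencode

def graphite_render_url(base: str, target: str, fmt: str = "json") -> str:
    """Build a Graphite /render URL.

    A '*' in the target (e.g. 'servers.*.loadavg') expands server-side
    into one series per matching node, typically one per host.
    """
    return base.rstrip("/") + "/render?" + urlencode({"target": target, "format": fmt})
```

For example, `graphite_render_url("https://graphite.example.org", "servers.*.loadavg")` yields a URL whose query string carries the wildcarded target, URL-encoded.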
(Work In Progress) (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254461 (https://phabricator.wikimedia.org/T118218) (owner: Bearloga)
[22:35:51] Analytics-Tech-community-metrics, DevRel-November-2015: Explain / sort out / fix SCM repository number mismatch on korma - https://phabricator.wikimedia.org/T116483#1826788 (Lcanasdiaz) Fixed in upstream https://github.com/VizGrimoire/GrimoireLib/commit/35ca6f9eb0773b828d17e76e6a1d571184b900dc
[22:36:46] harej: outreachdashboard.wmflabs.org
[22:37:18] Thank you. I see you still have to register courses with the dashboard by hand?
[23:01:58] ragesoss: ^
[23:03:03] harej: not sure what you mean.
[23:05:38] ragesoss: Well, under [[Education Program:Wikimedia DC]] I have several courses and I would like them to show up on this dashboard.
[23:18:14] Analytics-Kanban, operations, Database: db1046 running out of disk space - https://phabricator.wikimedia.org/T119380#1826987 (Milimetric) @jcrespo: I think m4-master holds on to data for too long. Since everything is replicated to analytics-store, I think we can shorten the amount of time data lives on...