[09:57:48] Hi a-team ! [10:39:29] Analytics-Engineering, Wikidata: Dashboard repository for limn-wikidata-data - https://phabricator.wikimedia.org/T112506#1636649 (Addshore) NEW [10:40:34] Analytics-Engineering, Wikidata: Dashboard repository for limn-wikidata-data - https://phabricator.wikimedia.org/T112506#1636656 (Addshore) [10:40:40] Analytics-Engineering, Wikidata: Dashboard repository for limn-wikidata-data - https://phabricator.wikimedia.org/T112506#1636659 (aude) where, what data? [10:45:45] Analytics-Engineering, Wikidata: Dashboard repository for limn-wikidata-data - https://phabricator.wikimedia.org/T112506#1636668 (Addshore) Initially looking to replace the "Wikidata community engagement tracking" spreadsheet. [10:47:31] Analytics-Engineering, Wikidata: Dashboard repository for limn-wikidata-data - https://phabricator.wikimedia.org/T112506#1636680 (aude) hmmm... no idea what that is but ok. replacing a spreadsheet sounds good :) [10:49:53] Analytics-Engineering, Wikidata: Dashboard repository for limn-wikidata-data - https://phabricator.wikimedia.org/T112506#1636688 (Addshore) Indeed! Some of the information can already be pulled from the dbs. Other information will need scripts to run to collect and put in some tables in some other db on t... [12:09:54] Analytics-Tech-community-metrics, DevRel-September-2015: Automate the identities generation - https://phabricator.wikimedia.org/T111767#1636931 (Dicortazar) @Qgil, @Aklapper, the automated way to deal with identities is now working. The bitbucket repository is now being updated using the new identities a... [12:10:04] Analytics-Tech-community-metrics, DevRel-September-2015: Automate the identities generation - https://phabricator.wikimedia.org/T111767#1636932 (Dicortazar) Open>Resolved [12:10:06] Analytics-Tech-community-metrics, DevRel-September-2015: "Median time to review for Gerrit Changesets, per month": External vs. WMF/WMDE/etc patch authors - https://phabricator.wikimedia.org/T100189#1636934 (Dicortazar) [12:12:40] Analytics-Engineering, Wikidata: Dashboard repository for limn-wikidata-data - https://phabricator.wikimedia.org/T112506#1636944 (JanZerebecki) What type of repo are we talking about here? [12:12:53] Analytics-Tech-community-metrics, DevRel-September-2015: Mysterious repository breakdown(s)/sorting order - https://phabricator.wikimedia.org/T103474#1636947 (Dicortazar) a:Lcanasdiaz>Acs [12:14:29] Analytics-Engineering, Wikidata: Dashboard repository for limn-wikidata-data - https://phabricator.wikimedia.org/T112506#1636956 (JanZerebecki) Ah this: https://wikitech.wikimedia.org/wiki/Analytics/Dashboards#Make_repository_for_queries_and_configuration [12:17:17] Analytics-Tech-community-metrics, DevRel-September-2015: Mysterious repository breakdown(s)/sorting order - https://phabricator.wikimedia.org/T103474#1636966 (Dicortazar) This is fixed for the list of projects in SCR. Closing the task and kudos to @Lcanasdiaz and @Acs! [12:23:40] Analytics-Engineering, Wikidata: Dashboard repository for limn-wikidata-data - https://phabricator.wikimedia.org/T112506#1636987 (aude) so, we make one of https://meta.wikimedia.org/wiki/Research:Data/Dashboards ? looks somewhat straightforward to do [12:31:49] Analytics-Tech-community-metrics, DevRel-September-2015: Automate the identities generation - https://phabricator.wikimedia.org/T111767#1637055 (Aklapper) Thanks! So this brings us closer to resolving T100189! 
[12:36:16] Analytics-Tech-community-metrics, DevRel-September-2015: "Median time to review for Gerrit Changesets, per month": External vs. WMF/WMDE/etc patch authors - https://phabricator.wikimedia.org/T100189#1637068 (Dicortazar) Ok, this is already rsynced on the server. People out of the four organizations plus In... [12:36:26] Analytics-Tech-community-metrics, DevRel-September-2015: "Median time to review for Gerrit Changesets, per month": External vs. WMF/WMDE/etc patch authors - https://phabricator.wikimedia.org/T100189#1637069 (Dicortazar) Open>Resolved [12:36:26] Analytics-Tech-community-metrics, DevRel-September-2015: Tech community KPIs for the WMF metrics meeting - https://phabricator.wikimedia.org/T107562#1637070 (Dicortazar) [12:36:42] Analytics-Tech-community-metrics, DevRel-September-2015: "Median time to review for Gerrit Changesets, per month": External vs. WMF/WMDE/etc patch authors - https://phabricator.wikimedia.org/T100189#1307473 (Dicortazar) Closing this task as resolved. Kudos to @sduenas. [12:38:58] Analytics-Tech-community-metrics, DevRel-September-2015: Provide open changeset snapshot data on Sep 22 and Sep 24 (for Gerrit Cleanup Day) - https://phabricator.wikimedia.org/T110947#1637079 (Dicortazar) Ok, I get the point :). I'll simply calculate the open changesets on the 22nd and on the 24th. Indee... [12:49:14] Analytics-EventLogging: Update EventLogging documentation {stag} [3 pts] - https://phabricator.wikimedia.org/T112313#1637110 (Ottomata) [12:51:49] Analytics-Tech-community-metrics, DevRel-September-2015: Patches with Verified -1 should not be counted as open in our code review metrics - https://phabricator.wikimedia.org/T108507#1637112 (Dicortazar) Wondering if having that Verified -1 in mind, the definition of 'open' is the definition of 'patchsets... [12:52:12] Analytics-Backlog: Investigate sample cube pageview_count vs unsampled log pageview count - https://phabricator.wikimedia.org/T108925#1637113 (JAllemandou) @JKatzWMF I can try to help :) I am currently computing the daily pageview / webrequest rate for august, to see if there could be big differences daily (I... [13:08:01] Analytics-Engineering, Wikidata: Dashboard repository for limn-wikidata-data - https://phabricator.wikimedia.org/T112506#1637151 (Addshore) Indeed. We would need somewhere that runs scripts to pull data and store in some sql tables somewhere on the analytics cluster. Then this repo would contain the stuf... [13:12:59] Analytics-Tech-community-metrics, DevRel-September-2015: Patches with Verified -1 should not be counted as open in our code review metrics - https://phabricator.wikimedia.org/T108507#1637161 (Qgil) At http://korma.wmflabs.org/browser/gerrit_review_queue.html we have "New" and "Waiting for review", and the... [13:16:56] (PS1) Joal: Update code and separated repos scheme [analytics/ua-parser] - https://gerrit.wikimedia.org/r/238139 (https://phabricator.wikimedia.org/T106134) [13:26:50] Analytics-Engineering, Wikidata: Dashboard repository for limn-wikidata-data - https://phabricator.wikimedia.org/T112506#1637198 (JanZerebecki) https://wikitech.wikimedia.org/wiki/Analytics/Dashboards also describes how it can contain SQL queries that get executed regularly on stat1003.eqiad.wmnet . But i... [13:28:21] Analytics-Tech-community-metrics, DevRel-September-2015: "Median time to review for Gerrit Changesets, per month": External vs. WMF/WMDE/etc patch authors - https://phabricator.wikimedia.org/T100189#1637199 (Qgil) Good! Before we had Wikia as well.
Any @wikia.com contributor can be safely assigned to Wi... [13:30:45] Analytics-Engineering, Wikidata: Dashboard repository for limn-wikidata-data - https://phabricator.wikimedia.org/T112506#1637203 (JanZerebecki) @aude one of the things in this would be {T109310} [13:36:58] Analytics-Engineering, Wikidata: Dashboard repository for limn-wikidata-data - https://phabricator.wikimedia.org/T112506#1637227 (aude) @JanZerebecki that's one of the reasons I am asking. I would like to proceed with getting that done. :) [13:40:02] Analytics-Engineering, Wikidata: Dashboard repository for limn-wikidata-data - https://phabricator.wikimedia.org/T112506#1637243 (Christopher) @JanZerebecki It would be nice to consolidate the post-processed "ready to use" export data sets in a single directory that can be accessed by different presentation... [13:46:11] halfak: Ironholds around? [13:57:46] hey ottomata [13:59:39] hiya! [14:00:07] Is that you having added legoktm to the ua-parser CR? [14:00:23] I wonder if I made a mistake or if somebody added him [14:00:31] joal: sorry if the pageview api survey caught you by surprise. We should talk after you talk to Andrew [14:00:55] milimetric: indeed this morning I had some interesting reading :) [14:01:08] ottomata: --^ [14:01:56] milimetric: ready if you want :) [14:02:05] irc, cave ? [14:02:25] Analytics-Tech-community-metrics, ECT-October-2015: Correct affiliation for code review contributors of the past 30 days - https://phabricator.wikimedia.org/T112527#1637282 (Qgil) NEW [14:02:45] joal: not me [14:02:54] k ottomata hx :) [14:02:57] thanks sorry [14:03:15] cave, brt [14:03:21] k milimetric [14:04:46] *waves at everyone* [14:05:13] I'm just wondering if it is possible to deduce any posted data from the Webrequest logs in hadoop? [14:05:35] o/ aude [14:05:41] addshore: POST requests, yes [14:05:45] see the http_method field [14:05:55] ottomata: but probably not any of the data contained in the body? [14:05:56] Analytics-Tech-community-metrics, ECT-October-2015: Affiliations and country of resident should be visible in Korma's user profiles - https://phabricator.wikimedia.org/T112528#1637297 (Qgil) NEW [14:05:58] ah, addshore no [14:06:00] no body [14:06:02] okay :) [14:06:33] I'm finally getting around to looking at hive a bit and was just wondering! :) [14:07:52] Right now it looks like some people make requests to the api and include the action to be performed in the body rather than as a query param [14:09:01] halfak: wondering if you are coming to wikiconference usa? [14:09:10] or Ironholds [14:09:32] aude, not 100% sure yet. I'm still waiting to hear whether my talk was accepted. [14:09:36] :/ [14:09:41] But it seems likely. [14:09:59] * aude would like someone to show me how to use analytics stuff :) [14:10:16] since seems we want to do more of that for wikidata [14:11:06] :) seems like that's something we could work out either way. [14:11:12] halfak: ok [14:11:17] What sort of analyses are you looking to do ? [14:11:34] i'll probably have something already by then, but want to understand what else we can do [14:11:41] * aude made http://tools.wmflabs.org/audetools/wikidata-stats/ [14:12:48] probably can use the limn dashboard for that to show this over time [14:13:06] o/ milimetric [14:13:13] what's the status of limn? [14:13:27] Should we be trying to use the Graph extension instead? [14:13:35] halfak: I'll maintain / help but we're not putting new investment in it [14:13:39] what's the use case?
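(A minimal HiveQL sketch of what ottomata describes above: POST requests are visible in the webrequest logs via the http_method field, but request bodies are not captured, so only the method and URL can be seen. The partition fields, the example day, and the /w/api.php filter are assumptions for illustration, not a tested query.)

    -- count API POST requests for one (assumed) day's partition
    SELECT uri_host, COUNT(*) AS post_requests
    FROM wmf.webrequest
    WHERE year = 2015 AND month = 9 AND day = 14   -- hypothetical partition values
      AND http_method = 'POST'                     -- the field mentioned above
      AND uri_path = '/w/api.php'                  -- assumed filter for API calls
    GROUP BY uri_host;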
[14:13:40] ok [14:13:51] the reportcard graphs [14:13:52] Reports for WikiData [14:16:01] holaaaa [14:16:45] one sec guys [14:17:57] hola nuria [14:18:37] Analytics-Tech-community-metrics, ECT-October-2015: Affiliations and country of resident should be visible in Korma's user profiles - https://phabricator.wikimedia.org/T112528#1637348 (Qgil) Actually, according to https://www.mediawiki.org/w/index.php?title=Community_metrics&diff=0&oldid=1870477 this data... [14:18:44] * aude also sees https://vital-signs.wmflabs.org/ [14:18:54] which might be a nice approach [14:19:18] aude, ok, so do you want timeseries data? [14:19:23] milimetric: yes [14:19:48] basically, i run a query on all wikipedias (and other projects that use wikidata) [14:19:54] so the table you have on tool labs with a time dimension [14:19:57] and want to see the results in a table + graph [14:20:05] yes [14:20:22] * aude has done something like this with rrdtool but it's not very pretty [14:20:45] hm, vital signs doesn't have a table display like limn does. But it looks like you need every wiki, which is not feasible on a limn graph [14:21:14] milimetric: could have one graph per wiki [14:21:25] but nice to see comparison [14:21:57] that would be 850 graphs and make the dashboard meaningless for all but a couple of people [14:22:08] true :/ [14:23:05] but i want to know which wikis have the most usage (possibly normalized by number of pages with site links... e.g. are connected to wikidata) [14:25:26] aude: ok, so the dashboarding tools we have right now won't bubble up "top" wikis [14:25:33] :/ [14:25:47] it sounds like you have a few different things you want, but the first thing is to get your data [14:25:52] yeah :) [14:26:14] once i find wikis with the most usage (or most increased since last month) [14:26:15] so, you're generating this per-day on tool labs? [14:26:33] then i can go find pages that are using a lot of wikidata and see how it's being used [14:26:42] not per-day yet [14:27:05] but want that as a next step and need to figure out where to store the results [14:27:09] ok, and are you sure you actually want a dashboard and not just information on most-increased since last month? [14:27:25] * aude wants both :) [14:28:04] the reason I ask is because a dashboard with 6 metrics over 850 wikis will mean almost 5000 queries running daily and monitoring, etc. [14:28:24] doesn't have to be daily [14:28:28] weekly would be ok [14:28:29] One comment, if you are looking for changes, make sure to plot the change itself. [14:28:45] but if you really want that, then dashiki would be a good tool (the metrics-by-project layout of dashiki is what was used to put up vital-signs) [14:29:22] can look at dashiki :) [14:29:23] ok, weekly's fine too, but still, it's a lot of queries and a lot of work to make sure nothing's going wrong [14:29:40] aude: there's not much docs yet because others haven't used it yet [14:29:40] the queries are quick [14:29:50] 5000 * quick = slow :) [14:30:02] it's only one query [14:30:11] oh ok, good [14:30:13] * number of wikidata clients [14:30:14] that'll help [14:30:38] aude, I have some utilities to help run the same query against multiple wikis too. 
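(For the per-wiki data generation aude describes, halfak's Multiquery, linked just below, is one option; a plain shell loop is another. A rough sketch, assuming a dbnames.txt list of wiki database names and the analytics replica host mentioned later in this log; the query itself is only a placeholder.)

    # hypothetical sketch: run one query against many wiki databases, one TSV row per wiki
    while read db; do
      mysql -h analytics-store.eqiad.wmnet "$db" --batch --skip-column-names \
        -e "SELECT '$db', COUNT(*) FROM wbc_entity_usage;"
    done < dbnames.txt > usage_counts.tsv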
[14:30:52] halfak: already have something also [14:31:07] ok, so for dashiki to find the data easily, we put it in TSV files hosted publicly, in a path that looks like: <>/metrics/<>/<>.tsv [14:31:08] https://github.com/halfak/Multiquery [14:31:12] :) [14:31:25] tsv would be easy [14:32:48] aude, here's an example of that: https://edit-analysis.wmflabs.org/compare/ [14:32:51] * addshore wonders how long would be too long for a hive query.... [14:33:13] (that's a different dashiki layout), and here are the datafiles for it: http://datasets.wikimedia.org/limn-public-data/metrics/ [14:33:45] milimetric: looks nice [14:33:48] for example the "Rates over time" graph takes its data from http://datasets.wikimedia.org/limn-public-data/metrics/rates/ [14:33:58] ottomata: thanks for the news on decommissioning :) [14:34:21] so once you have your files in that kind of structure, you define your metrics here: https://meta.wikimedia.org/wiki/Dashiki:CategorizedMetrics [14:34:48] ok [14:34:59] aude, where do I learn more about the metrics you are using? [14:35:04] Or can I see your SQL? [14:35:36] aude: yeah, I recommend working with halfak to make sure you're plotting what's most interesting (the change over the last month, absolute value, etc.) [14:35:55] it's super ugly but [14:35:57] https://github.com/filbertkm/WikidataStats/blob/master/src/Console/Commands/StatsUpdateCommand.php#L38-L41 :) [14:36:01] or hacky [14:36:09] and i don't run it over time yet [14:36:27] i think we want totals to start with [14:36:30] and ping me once you have that stuff, I can help after that. FYI: the multimedia team is doing the same thing (setting up a vital-signs-like dashboard). Here's their gerrit change with my comments: https://gerrit.wikimedia.org/r/#/c/237134/ [14:36:40] they're using a reportupdater python runner that runs over all the wikis [14:36:40] then totals + change since last month (perhaps) [14:36:53] milimetric: thanks [14:36:54] but it sounds like you'll take care of the data generation, so that's not super important [14:37:17] data generation is somewhat easy, but then i got stuck on how best to store the data [14:37:36] tsv is simple enough [14:37:48] aude: TSV files are good if you have a tool like reportupdater that makes sure all the days you want are in there [14:38:04] otherwise, you can store in an intermediary mysql table and dump results to files as you need [14:38:08] * aude nods [14:38:10] ok [14:38:56] aude: if you want to play with dashiki, you can follow the README and get it set up locally, and ask me where there are holes in the explanation [14:39:07] ok [14:45:07] o/ aude, I was looking through that wbc_entity_usage table and it doesn't look like you are getting all that you could out of that query. [14:45:18] It seems like eu_aspect should have an index, but it doesn't. [14:45:42] If it did, then your query would run in fractions of a second on the biggest wikis even when this table gets really big. [14:45:54] halfak: hmmm [14:46:09] I think we might consider filing a bug [14:46:17] What's the production purpose of this table? [14:46:45] when a wikidata item changes, we know which client pages to purge / update [14:46:50] based on usage [14:47:04] Gotcha. So it's not just here for analytics. [14:47:06] and if a page is just using sitelinks, then if a statement changes, we don't have to update [14:47:09] yep [14:47:38] Do you think it would make sense to have an index on eu_aspect? [14:47:48] it might [14:48:14] So, you are breaking things down by eu_aspect to get counts.
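(A sketch of the aggregation being discussed, assuming, as aude confirms below, that the usage type is the first character of eu_aspect. Table and column names come from the conversation; aude's actual query is in the WikidataStats code linked above, so this is illustrative only.)

    -- per-wiki breakdown of entity usage by aspect type (sketch)
    SELECT LEFT(eu_aspect, 1) AS aspect_type, COUNT(*) AS usages
    FROM wbc_entity_usage
    GROUP BY aspect_type;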
Any other breakdowns that would be interesting? [14:48:29] not at this point [14:48:32] e.g. eu_aspect, eu_entity_id, count(*) [14:48:34] kk [14:48:43] How long are your queries generally running? [14:48:47] per wiki [14:48:51] these were very quick [14:49:00] can't give a number but quick [14:49:04] on labs [14:49:08] kk. [14:49:16] * halfak checks on analytics replicas [14:49:31] Yeah. It's doing a full table scan. [14:49:35] Which isn't terrible. [14:49:35] :/ [14:49:40] It'll probably take a few minutes. [14:49:42] an index could be nice [14:49:45] Yeah. [14:49:51] Would make this a non-issue. [14:50:13] So, we have a couple of options. Our old DBA didn't like adding indexes to the prod DBs for analytics purposes. [14:50:17] we potentially might also like a special page and api module as interface to this information [14:50:23] But he would add them to our analytics replicas. [14:50:31] like give me all pages on enwiki that use Q64 [14:50:37] That sounds like a prod use-case to me. [14:50:39] ;) [14:50:40] :) [14:50:41] yep [14:50:44] Which is convenient [14:51:08] the totals could potentially go in Special:Statistics :) [14:51:13] if we want [14:51:17] what kind of values are in the eu_aspect field? [14:51:25] afaik, a string [14:51:36] like 'L' = label [14:51:57] Why are you getting counts based on it? [14:52:12] It seems like Q64 would be an entity_id, right? [14:52:20] we want usage based on type [14:52:32] This L... label is a type? [14:52:36] e.g. sitelinks is not as interesting [14:52:41] L is a type [14:52:45] Gotcha. [14:52:58] What does "L.en" mean? [14:53:07] any is interesting which means the full entity is accessed in lua [14:53:23] "L.en" means the english label is used [14:53:53] on wikis like commons, the cache is split by language so we need to break label usage down by language [14:54:20] for statistics don't think we care about which language... just want aggregate [15:14:31] aude, is the type of the entity usage always represented by the first char of eu_aspect? [15:14:35] hey a-team, who's got the most recent perl fu and wants to look at a task? [15:14:41] halfak: yes [15:15:35] OK. I think a btree index will work like we'd need it to then. [15:15:48] ok [15:16:08] I just did a query against the user table that has a Btree index on user_name. "SELECT LEFT(user_name, 1), count(*) FROM user GROUP BY 1;" [15:16:14] It finished in 14.92 seconds [15:16:22] sounds ok [15:16:37] That's actually much slower than I expected, but still quite fast. [15:16:46] 26.2 million rows [15:16:53] would be sufficient [15:17:19] not sure if it helps, but there are fewer distinct eu_aspect values [15:18:10] Good point. [15:19:20] OK. So I think the Right(TM) place to start is a phab ticket. [15:19:30] ok [15:19:54] something like run these queries on analytics at x interval and produce tsvs [15:20:03] (and optimize the queries as needed) [15:20:14] + indexes [15:20:31] I was just thinking indexes for right now. [15:20:37] https://phabricator.wikimedia.org/T110664 [15:20:38] Assuming you want to keep using your own script [15:20:50] it doesn't need to be my script [15:21:39] milimetric, sorry to bug again. We talked about a more standard way to do daily, cron-y, SQL scripts to append to TSVs. Did anything ever come of that?
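(The index halfak proposes would be something like the DDL below; whether it lands in production or only on the analytics replicas is exactly the open question above. With a btree index on eu_aspect, the grouping by its first character can be answered by scanning the much smaller index instead of the whole table, which is what makes the user_name test query above fast despite its 26.2 million rows.)

    -- hypothetical DDL for the proposed index
    ALTER TABLE wbc_entity_usage ADD INDEX eu_aspect (eu_aspect);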
[15:22:23] np halfak, yes, we have reportupdater [15:22:32] i can look at that [15:22:48] the gerrit change i pasted above has an example [15:22:53] ok [15:23:08] https://gerrit.wikimedia.org/r/#/c/237134/ [15:23:12] thanks [15:23:45] i think that's what we want [15:24:01] \o/ [15:24:06] you basically put sql in the folder and configure it. There are options for running over all wikis or you can specify a subset of wikis [15:24:11] looks easy [15:24:41] (CR) Nuria: [C: 1] [WIP] Update agent_type in webrequest [analytics/refinery] - https://gerrit.wikimedia.org/r/237419 (https://phabricator.wikimedia.org/T108598) (owner: Joal) [15:25:00] there are other examples in the repos called limn-flow-data, limn-edit-data [15:25:19] we are getting one for wikidata [15:25:38] afaik there are other things we want, in addition to usage tracking [15:26:19] ottomata: never did perl :( [15:26:55] hah, ok! [15:26:58] i did! [15:27:02] am just looking for a second opinion [15:34:21] Analytics-Kanban: Document work so far on Last access uniques and validating the numbers {bear} - https://phabricator.wikimedia.org/T112010#1637610 (madhuvishy) Documented all the work done here: https://wikitech.wikimedia.org/wiki/Analytics/Unique_clients/Last_access_solution/Validation [15:35:48] Analytics-Dashiki, Analytics-Kanban, Browser-Support-Firefox, Patch-For-Review: vital-signs doesn't display pageviews graph in Firefox 41, 42 {crow} [3 pts] - https://phabricator.wikimedia.org/T109693#1637619 (Milimetric) The missing data is due to labs being inadequately powerful to run all the met... [15:41:36] Analytics-Kanban: Document work so far on Last access uniques and validating the numbers {bear} - https://phabricator.wikimedia.org/T112010#1637635 (kevinator) [15:45:25] Is there an easy way to insert the result of a hive query straight into an sql table? [15:46:11] ottomata: i tried to pass hive.plan.serialization.format=javaXML [15:46:16] to hive with -D [15:46:23] but no success [15:46:49] I tried inside a hive shell using SET [15:46:54] no chance either [15:49:23] ottomata: can you pass hive -f -Dblah=blah (seems not to work) [15:50:43] nuria: it has worked for me in the past i think [15:50:53] I also think it works nuria [15:51:35] joal: ok, will try again, and will try to remove code here & there (other genericudfs work so some piece of code is triggering this) [15:51:45] nuria: actually : --hiveconf [15:52:07] as in hive -f --hiveconf (what else?) [15:52:10] Analytics-Engineering, Wikidata: Dashboard repository for limn-wikidata-data - https://phabricator.wikimedia.org/T112506#1637698 (JanZerebecki) Yes that would be what the csv/tsv files are for, I was in no way suggesting not to create them. [15:52:20] if it is hiveconf [15:52:22] i think you can do [15:52:28] set hive.plan.serialization.format= ...; [15:52:33] in the hive prompt [15:52:42] Yup, that's sure [15:52:52] I tried, and got a different error [15:53:28] will remove some code from udf and retry [15:53:51] nuria: where is code again?
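(To untangle the flag confusion above, a sketch of the two ways that should set a Hive property, assuming a query file named query.hql. A bare -D is not a documented Hive CLI option (Hive takes configuration properties via --hiveconf, and -d/--define only does variable substitution), which is likely why the first attempt appeared to do nothing.)

    # set the property for a single invocation from the command line
    hive --hiveconf hive.plan.serialization.format=javaXML -f query.hql

    # or equivalently, inside the Hive shell or at the top of the script:
    #   SET hive.plan.serialization.format=javaXML;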
[16:01:10] nuria: some info [16:01:53] joal: yes [16:02:06] when running the query with no reduce: works [16:02:15] So code works if no serialisation needed [16:02:20] right [16:02:43] joal: and other genericUDFs work [16:02:51] joal: so some piece is triggering this [16:03:07] And with the javaXML set, I get that error: https://gist.github.com/jobar/0dafbbd4eb1549aa2182 [16:03:23] nuria: in tasking, will talk after :) [16:03:59] Analytics-Backlog, Analytics-Wikimetrics, Puppet: Cleanup Wikimetrics puppet module so it can run puppet continuously without own puppetmaster {dove} - https://phabricator.wikimedia.org/T101763#1637743 (madhuvishy) [16:06:06] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Handle duplicate insert error {stag} - https://phabricator.wikimedia.org/T112265#1637760 (Milimetric) a:Ottomata [16:08:14] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Handle duplicate insert error {stag} [5 pts] - https://phabricator.wikimedia.org/T112265#1637781 (mforns) [16:11:17] Analytics-Kanban, Patch-For-Review: Create Hadoop Job to load data into cassandra [21 pts] {slug} - https://phabricator.wikimedia.org/T108174#1637800 (Milimetric) [16:27:02] Analytics-Backlog, Analytics-EventLogging: Regularly purge EventLogging data in Hadoop {stag} [8 pts] - https://phabricator.wikimedia.org/T106253#1637864 (Milimetric) [16:29:57] Analytics-Cluster, Analytics-Kanban: Create Kafka deployment checklist on wikitech {hawk} - https://phabricator.wikimedia.org/T111408#1637882 (Milimetric) [16:35:07] Analytics-Backlog, Analytics-EventLogging: Send EventLogging validation logs to Logstash {oryx} - https://phabricator.wikimedia.org/T111412#1637902 (kevinator) [16:48:06] Analytics-Backlog, Analytics-EventLogging: Research sending EventLogging validation logs to Logstash {oryx} [5 pts] - https://phabricator.wikimedia.org/T111412#1637989 (kevinator) [16:48:59] Analytics-Backlog, Analytics-EventLogging: Regularly purge EventLogging data in Hadoop {stag} [8 pts] - https://phabricator.wikimedia.org/T106253#1637994 (Milimetric) https://www.elastic.co/guide/en/logstash/current/plugins-inputs-kafka.html [16:49:23] Analytics-Backlog, Analytics-EventLogging: Research sending EventLogging validation logs to Logstash {oryx} [5 pts] - https://phabricator.wikimedia.org/T111412#1603272 (Milimetric) https://www.elastic.co/guide/en/logstash/current/plugins-inputs-kafka.html [16:51:02] Analytics-Backlog: Prepare lightning talk on EL audit {tick} [5 pts] - https://phabricator.wikimedia.org/T112126#1638000 (Milimetric) [16:54:13] Analytics-Backlog, Analytics-Cluster: Spike replacing Camus with Gobblin {hawk} - https://phabricator.wikimedia.org/T111409#1638014 (kevinator) documentation on Gobblin: https://github.com/linkedin/gobblin/wiki/Kafka-HDFS-Ingestion [16:57:17] Analytics-Backlog, Analytics-Cluster: Spike replacing Camus with Gobblin {hawk} [13 pts] - https://phabricator.wikimedia.org/T111409#1638030 (Milimetric) [17:00:04] Analytics-Backlog, Discovery: Present Google Referrer Traffic analysis at Metrics meeting - https://phabricator.wikimedia.org/T109773#1638038 (kevinator) I think we can resolve this task as "declined" since it was decided not to go through with this. [17:06:38] nuria: have you seen the error I get with the javaXML parameter set ? [17:07:14] weird, heh ? [17:17:32] Analytics-Backlog: Resurrect logbot so we can leave a breadcrumb trail for incident reports, teammates, etc. - https://phabricator.wikimedia.org/T111393#1638143 (kevinator) captures IRC chats and logs them.
This would help us keep track of when things were deployed and why some actions were taken. [17:19:34] !log test_logbot This is a test after reading that logbot should work for us. [17:24:40] Analytics-Backlog: Resurrect logbot so we can leave a breadcrumb trail for incident reports, teammates, etc. [5 pts] - https://phabricator.wikimedia.org/T111393#1638194 (Milimetric) [17:25:12] ottomata: coming to batcave for candidate debrief ? [17:25:34] oh! [17:25:36] yes eating lunch too [17:43:12] joal: is this the CR? [17:43:42] nuria ? [17:43:52] joal: https://gerrit.wikimedia.org/r/#/c/238139/ [17:44:03] yup [17:44:33] Almost nothing nuria: deleted all content, added git submodules [17:44:45] k, let me get the repo cause i do not have it [17:44:50] np [17:45:06] ottomata: how should I push tags on gerrit ? [17:45:15] for the uap-core and uap-java repos [17:45:37] i think you have to push them directly [17:45:39] you can't push them for review [17:45:40] so [17:45:46] git push gerrit v1.1.1 [17:45:48] i think. [17:45:49] I am planning on tagging them v1.3.0-wmf2 [17:45:54] ok [17:45:57] k sounds good [17:46:02] if that doesn't work, you might not have perms, lemme know [17:46:08] The previous one was 1.3.0-wmf1 [17:46:10] sure [17:46:13] thx [17:46:35] I mean, there'll be nothing more than tags, cause the code doesn't change ! [17:46:36] oh, joal, hm. if this all changed, maybe a new v is called for? [17:46:41] oh ok [17:46:45] code didn't change? ok. [17:47:02] ottomata: v number is based on uap- v number [17:47:11] Then I changed the wmf1/52 [17:47:16] wmf1/2 [17:47:16] ah ok [17:47:17] cool [17:47:17] sorry [17:47:19] is good then [17:47:23] awesome [17:49:34] joal: tried to update submodules [17:49:38] https://www.irccloud.com/pastebin/6l6VMBxi/ [17:51:15] joal: let me see if some config file is listing your user [17:59:57] nuria: I guess there is [18:00:32] nuria: .gitmodules [18:00:36] Will change that [18:03:40] (PS2) Joal: Update code and separated repos scheme [analytics/ua-parser] - https://gerrit.wikimedia.org/r/238139 (https://phabricator.wikimedia.org/T106134) [18:32:27] joal: k [18:32:37] nuria hm? [18:33:47] joal: now it looks good, should i merge? [18:34:02] please go :) [18:34:31] nuria: I'll push tags on each submodule and the parent one, and try to deploy the jar on archiva [18:34:31] ottomata, informally; so if I said "eventlogging started producing duplicated IDs in schemas some time after last Thursday's deploy" you would say...? [18:34:42] ottomata: I'll need some help on deploy [18:34:56] Depending on how you are ottomata, today or tomorrow [18:35:14] today is good [18:35:16] hi yalls [18:35:17] (CR) Nuria: [C: 2 V: 2] "Tested "git submodule update --init" works as it should and that repo has the right set of files. Did not test correctness of code." [analytics/ua-parser] - https://gerrit.wikimedia.org/r/238139 (https://phabricator.wikimedia.org/T106134) (owner: Joal) [18:35:27] Ironholds: in mysql? [18:36:02] ottomata, indeed [18:36:13] there are records with the same uuid? [18:36:16] multiple? [18:37:08] joal: how can I help? [18:37:22] ottomata, event_pageid apparently - https://etherpad.wikimedia.org/p/completion-ab-sanity-check [18:37:32] wondering if you had any thoughts, since you've been an active EL tweaker over the last few weeks [18:37:36] oh, not uuids [18:37:37] ok [18:37:54] I will need to deploy a new jar in archiva, but not through the mvn stuff (pom is not 'ours') [18:37:55] yeah, so this is totally possible, um, hm.
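(The tag workflow being settled above, as a sketch; per ottomata, tags go straight to the gerrit remote rather than through review, and the tag name is the one joal plans to use.)

    # in each submodule repo, and then in the parent repo:
    git tag v1.3.0-wmf2
    git push gerrit v1.3.0-wmf2   # direct push; tags can't be pushed for review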
[18:38:08] i saw erik's email but haven't had time to look yet [18:39:37] ah, ja joal you can log into archiva and just upload the jar manually through the UI [18:39:47] oh, didn't know ! [18:39:51] thx ottomata :) [18:39:54] Will try that [18:39:58] joal: https://wikitech.wikimedia.org/wiki/Archiva#Uploading_Dependency_Artifacts [18:46:20] * milimetric lunch [18:49:15] Ironholds: that is not a uuid [18:49:29] Ironholds: the event_pageid [18:53:39] Ironholds: can you give me an example event_pageid [18:53:48] nuria, did you see the etherpad? [18:53:50] ottomata, sure! [18:53:53] lemme grab one [18:54:10] Ironholds: ya, sorry [18:54:38] ottomata, 7a123a0719685cea [18:54:48] Ironholds: i was saying that event_pageid is sent client side, so if the rest of the event is not the same [18:54:58] Ironholds: it points to a client (not a server) error [18:55:08] Ironholds: ahem,... makes sense? [18:55:35] nuria, it does! [18:55:40] uh, Ironholds, i connect so infrequently to mysql [18:55:42] what is the hostname? [18:55:50] analytics-store.eqiad.wmnet ! [18:55:58] thank you both very much for looking at this :) [18:56:24] Ironholds: is that in TestSearchSatisfaction2_13223897 ? [18:57:35] ah, found it in TestSearchSatisfaction_12423691 [18:57:44] Ironholds: i only see one record with that event_pageId [18:57:50] ottomata, it's in the second schema [18:57:56] the problems began appearing after last Thursday's deploy [18:58:05] TestSearchSatisfaction2_13223897 [18:58:05] ? [18:58:23] indeedy-doody [18:58:37] i don't see any record with that event_pageId in that table [18:59:02] Ironholds: ^ [19:01:08] uhhh. huh. [19:01:17] Ironholds: gergo just sent a response, take a look [19:01:36] aha, awesome [19:01:38] Ironholds: he knows the code and was able to pinpoint the client side error [19:01:44] cool; thank you both! [19:01:50] (and thank you gergo, wherever you are) [19:01:55] Ironholds (cc ottomata ) this type of error is client induced 99% [19:02:22] gergo is tgr|away [19:02:41] ok! so not duplicate kafka crap [19:02:44] phew [19:02:46] :) [19:03:08] milimetric: around? [19:03:41] ottomata: no worries, i just saw a comment about sendbeacon so i looked but i think it's the client code that needs fixes [19:04:02] ottomata: back to serialisaaationnnnn, yay! [19:05:07] woo [19:15:48] who wants to brain bounce node names for pageview api again?! :D [19:15:52] i guess milimetric when he gets back [19:15:54] maybe joal? :D [19:40:06] Hey ottomata ! [19:40:09] hello [19:40:28] Sorry, got a child request, priority-wise you lost ;) [19:40:34] hah [19:40:35] no worries [19:40:41] so, names ! [19:40:46] https://etherpad.wikimedia.org/p/pageview_api_nodes [19:42:53] joal: i'm cool with aqs [19:42:58] will ops be annoyed with that name? [19:43:10] hm, why would they ? [19:43:29] joal: i've lost the ticket [19:43:31] do you have? [19:43:34] the one about getting new nodes? [19:45:47] give a sec [19:46:05] https://phabricator.wikimedia.org/T111053 [19:46:14] ottomata: [19:46:17] --^ [19:47:15] danke [20:14:05] (PS1) Joal: Update ua-parser version to 1.3.0-wmf2 and tests [analytics/refinery/source] - https://gerrit.wikimedia.org/r/238302 (https://phabricator.wikimedia.org/T106134) [20:15:25] (PS2) Joal: Update ua-parser version to 1.3.0-wmf2 and tests [analytics/refinery/source] - https://gerrit.wikimedia.org/r/238302 (https://phabricator.wikimedia.org/T106134) [20:16:21] hey nuria [20:16:30] How are you doing with the hive stuff ? [20:27:18] a-team, have a good end of day !
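(For reference, the kind of duplicate check behind the investigation above; the table and column names are the ones quoted in the conversation, and the query itself is a hypothetical sketch. Because event_pageId is generated client-side, duplicates here point at client code rather than at EventLogging or Kafka, which is where the investigation landed.)

    -- find event_pageId values that occur more than once (sketch)
    SELECT event_pageId, COUNT(*) AS n
    FROM TestSearchSatisfaction2_13223897
    GROUP BY event_pageId
    HAVING n > 1
    ORDER BY n DESC;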
[20:28:02] good night joseph [20:29:21] see you tomorrow joal|night ! [20:43:58] ottomata: aqs is still ok with me. I think "Query Service" is an interesting idea as a general umbrella on top of all query services. Until then, <> query service makes sense [20:44:34] aye [20:44:42] good enough for me milimetric [20:46:27] ottomata: btw, I have access to the position in greenhouse, so you can send me links now [20:46:51] nuria: btw, I agreed with your comments and gergo's so I didn't comment further [20:47:42] cool [20:49:19] milimetric: do we go ahead and rename top to top-articles? [20:50:00] i see that it's suggested in the thread by mforns [20:52:13] madhuvishy: I don't think so. I think that concern was covered by the path being /v1/pageviews before /top [20:52:22] aah [20:52:27] we can have /v1/projectviews/top or whatever for other things [20:52:27] alright [20:52:32] cool [20:52:52] so, we are gonna do /top/year/month/day [20:52:55] well, projectviews no, because this includes project-level aggregation, but you know what i mean [20:52:58] month and day are optional [20:53:03] yes [20:53:05] yeah got it [20:53:24] i was thinking that's not too bad, but let me know if you hit anything tricky [20:53:45] i was thinking maybe like month and day are optional with "all-months" and "all-days" as defaults? [20:53:53] that fits with our other aggregation strategies [20:54:14] (PS1) Christopher Johnson (WMDE): first commit [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/238316 [20:54:16] (PS1) Christopher Johnson (WMDE): Merge branch 'master' of ssh://gerrit.wikimedia.org:29418/wikidata/analytics/dashboard [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/238317 [20:54:18] hmmm, interesting - i was thinking these would be ints [20:55:11] I was trying to think how we'd store the different levels of "top" in the same table [20:55:33] if they were ints, would we just put null in the "month" and "day" column when we're doing the yearly tops? [20:56:09] yeah that's what i thought [20:57:47] madhuvishy: that's ok too, but it's less consistent with the other endpoints [20:58:17] alright, i'll keep them as strings and put all-months and all-days as defaults then [20:58:18] i guess having "all-months" in this case might be weird, because it would technically be possible to say /2015/all-months/20 [20:58:23] yeah [20:58:25] to which the response would be ... weird :) [20:58:31] true [20:58:34] hmmm [20:58:54] well, that is true anyway? you could say 2015/null/5 [20:59:02] true too! [20:59:04] hmhm [20:59:11] i guess, if the month is null [20:59:16] day should be ignored [20:59:21] let's do all-months and all-days and just ignore days if months = all-months [20:59:28] yea [20:59:38] yeah, okay works for me [20:59:54] ottomata: so will it be aqs1001, aqs1002, and aqs1003? I'll update the puppetization if so [21:00:22] milimetric: do we want all-years? [21:01:08] madhuvishy: I want everything! But we never talked about "all-time" [21:01:25] milimetric: think so, yes [21:01:29] yeah all years is all time [21:01:31] i'm asking on the ticket if there are objections [21:01:41] the servers are ready for reinstall, but i wanted to get a second opsen opinion about names [21:01:43] i guess it'd be really easy to have all-years... [21:02:11] madhuvishy: let's add all-years and we'll talk to joseph tomorrow to see if it's a pain to add to the aggregation [21:02:20] okay works for me [21:03:55] joal|night: np on hive yet, still trying.
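(Roughly what the all-months / all-days sentinel scheme settled on above means for the stored rows: one table holds yearly, monthly, and daily tops, with sentinel strings instead of nulls. The dates here are hypothetical.)

    year  month       day        meaning
    2015  all-months  all-days   top articles for the whole year
    2015  09          all-days   top articles for September 2015
    2015  09          14         top articles for a single day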
[21:04:48] (PS1) Christopher Johnson (WMDE): Modify access rules [wikidata/analytics/dashboard] (refs/meta/config) - https://gerrit.wikimedia.org/r/238319 [21:06:03] ottomata: ok, I'll wait on that then [21:07:38] (CR) Christopher Johnson (WMDE): [C: 1] Modify access rules [wikidata/analytics/dashboard] (refs/meta/config) - https://gerrit.wikimedia.org/r/238319 (owner: Christopher Johnson (WMDE)) [21:25:09] (CR) Nuria: [C: 2 V: 2] "Looks good. I assume tests pass, right?" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/238302 (https://phabricator.wikimedia.org/T106134) (owner: Joal) [21:30:19] Analytics-EventLogging: EventLogging client should log errors (e.g url too long, or validation failure) - https://phabricator.wikimedia.org/T112592#1639278 (Krinkle) NEW [21:30:38] milimetric: Swagger spec says that if a param is "in" path [21:30:47] then it has to be required [21:31:04] so, if we want it to be truly optional, they have to be query params [21:31:39] are we okay with that, or do we want to leave it to the users to pass in all-days and all-months as required defaults [21:32:01] right, that's the decision. Hm [21:32:39] Analytics-EventLogging: EventLogging client should log errors (e.g url too long, or validation failure) - https://phabricator.wikimedia.org/T112592#1639278 (Krinkle) [21:32:42] I thought there were in path optionals, guess not [21:32:42] i wonder if we could put in path, and see if we can put defaults [21:32:50] let me try that [21:33:37] Analytics-EventLogging: EventLogging client should log errors (e.g url too long, or validation failure) - https://phabricator.wikimedia.org/T112592#1639278 (Krinkle) [21:33:58] milimetric: http://swagger.io/specification/#parameterObject is pretty clear in stating if in=path, required must be true though [21:35:16] right, i wonder where i got that idea. But I'm leaning towards making users specify the all-days or all-months [21:35:43] milimetric: yeah that will work fine [21:36:13] it would be weird to have some endpoints do query params and others not [21:36:14] don't know - maybe if we say required=true and give it a default value, it won't expect it in the path [21:36:17] yeahh [21:36:35] that's what i think too. especially like, year is in path, month and day are query params [21:36:46] yeah [21:37:13] oh, you might be right about required=true with a default, it might be ok to omit it then [21:37:37] give it a shot. If it doesn't work that's ok [21:37:57] okay, will try that [22:07:58] milimetric: default has no meaning for required params [22:08:37] ok, specifying the whole thing it is then [22:08:51] yup [22:21:14] milimetric: https://github.com/madhuvishy/restbase/commit/e26fbb9bd14571e1fec038a8be463b28fc71912b let me know what you think [22:21:27] (if you're still working, if not we can chat tomorrow) [23:29:59] Analytics-EventLogging, MediaWiki-extensions-WikimediaEvents: Provide oneIn() sampling function as part of standard library - https://phabricator.wikimedia.org/T112603#1639870 (Krinkle) NEW [23:32:22] looks good to me madhuvishy
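(A hypothetical Swagger 2.0 sketch of where the thread above ends up: path parameters must be declared required, so the "optional" month and day segments are modeled as ordinary required strings that clients fill with all-months / all-days. Route and parameter names here are illustrative, not the final spec.)

    /top/{year}/{month}/{day}:
      get:
        parameters:
          - name: year
            in: path
            required: true    # mandatory whenever in: path
            type: string
          - name: month
            in: path
            required: true    # pass "all-months" to aggregate over the year
            type: string
          - name: day
            in: path
            required: true    # pass "all-days" to aggregate over the month
            type: string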