[11:10:11] * elukey goes afk!
[14:02:54] hi folks :]
[14:07:08] mforns: hellooo!
[14:07:29] hey fdans :]
[14:07:33] halllo all!
[14:07:45] heya ottomata :]
[14:07:57] ottomata ohai!
[14:08:11] happy new year!
[14:08:15] what's kickin?!
[14:08:42] I'm working from a cafe in Santiago that serves the worst pancakes ever
[14:09:02] hehe
[14:09:35] hope you didn't feel too lonely last week ottomata
[14:10:27] naw it was ok! i did some writing, wrote a draft blog post, documentation, etc.
[14:10:42] cool
[14:11:07] it was actually nice for that kind of work
[14:15:40] niice
[14:17:00] I'll be afk for the next 20-30 min
[14:17:27] * fdans pays for his terrible pancakes
[14:40:09] running home from cafe, back in a few mins
[16:01:11] elukey, joal: staddupp?
[16:20:53] Analytics-Kanban, Patch-For-Review: Replace stat1001 - https://phabricator.wikimedia.org/T149438#2913048 (Ottomata)
[17:30:58] wikimedia/mediawiki-extensions-EventLogging#628 (wmf/1.29.0-wmf.7 - dcef992 : Translation updater bot): The build has errored.
[17:30:58] Change view : https://github.com/wikimedia/mediawiki-extensions-EventLogging/compare/wmf/1.29.0-wmf.7
[17:30:58] Build details : https://travis-ci.org/wikimedia/mediawiki-extensions-EventLogging/builds/188599821
[17:47:31] milimetric: double-checking the data by size gives me wrong results :(
[17:47:54] milimetric: I suggest we re-run the thing with the proper settings :S
[17:48:20] joal: re-run sqoop?
[17:48:25] milimetric: right
[17:48:41] ok, I'll launch it now
[17:48:53] milimetric: data currently in prod folders is ~224G, while the one I downloaded is 158G
[17:48:58] there must be something wrong
[17:49:13] joal: what command did you run it with?
[17:49:24] that's weird... doesn't make sense that it would partially work
[17:50:21] milimetric: I think I ran that: python3 /srv/deployment/analytics/refinery/bin/sqoop-mediawiki-tables --jdbc-host analytics-store.eqiad.wmnet --output-dir /user/joal/wmf/data/raw/mediawiki/tables --wiki-file "/mnt/hdfs/wmf/refinery/current/static_data/mediawiki/grouped_wikis/grouped_wikis.csv" --timestamp YYYYMMDD000000 --user research --password-file /user/hdfs/mysql-analytics-research-client-pw.txt
[17:51:13] :(
[17:52:08] joal: did you see errors in the logs for particular dbs? I would rerun it, but maybe there's something else going on
[17:52:47] milimetric: I had it running in a screen, and killed the session before checking for data consistency
[17:53:01] for example, if a table schema changed, rerunning wouldn't help
[17:53:09] makes sense milimetric
[17:53:15] ok, i'll just re-run it as well, and we'll compare the data, I'll keep the screen
[17:53:56] milimetric: I also deleted the data, since it seemed incorrect ... Mwarf - I'll stop taking fast actions for this week
[17:54:29] it's all good, sounds like we all had a good proper reset over the break
[17:54:41] :)
[17:56:55] milimetric: let me know how things go, I didn't monitor the job properly :(
[17:57:18] joal: I see the problem :)
[17:57:34] you used timestamp: YYYYMMDD000000, I should've specified that's meant to be replaced
[17:57:52] so that gets passed to the revision query verbatim, it probably just didn't sqoop anything from the revision tables
[17:58:17] so that's meant to be an actual date like 20170101000000
[17:59:00] * joal wishes to hide after such a dumb error
[17:59:14] thanks milimetric for spotting it
[17:59:30] :(
[17:59:38] there are many-eye errors and single-eye errors, that's what many-eyes are here for :)
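[Editor's note: a minimal sketch of a corrected invocation, assuming the fix milimetric describes above: substituting a concrete cutoff such as 20170101000000 for the literal YYYYMMDD000000 placeholder. The host, paths, and flags are copied from the command joal quoted; wrapping the call in Python's subprocess and deriving the timestamp with strftime are illustrative choices, not part of the refinery script itself.]

```python
#!/usr/bin/env python3
# Sketch only: re-run sqoop-mediawiki-tables with a real cutoff timestamp
# instead of the literal "YYYYMMDD000000" placeholder, which would otherwise
# be passed verbatim into the revision query and match nothing.
import subprocess
from datetime import datetime

# Midnight of a concrete day -> "20170101000000" (example cutoff, assumed).
cutoff = datetime(2017, 1, 1).strftime("%Y%m%d%H%M%S")

cmd = [
    "python3", "/srv/deployment/analytics/refinery/bin/sqoop-mediawiki-tables",
    "--jdbc-host", "analytics-store.eqiad.wmnet",
    "--output-dir", "/user/joal/wmf/data/raw/mediawiki/tables",
    "--wiki-file", "/mnt/hdfs/wmf/refinery/current/static_data/mediawiki/grouped_wikis/grouped_wikis.csv",
    "--timestamp", cutoff,  # a concrete date, not the placeholder
    "--user", "research",
    "--password-file", "/user/hdfs/mysql-analytics-research-client-pw.txt",
]
subprocess.run(cmd, check=True)
```

Equivalently, the fix is simply replacing YYYYMMDD000000 with 20170101000000 in the shell command joal pasted.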
[18:02:18] (as for the python path I was mentioning, that's a moot point because it's set to pick up the deployed python stuff, so the command you ran is perfect otherwise)
[18:05:18] though there clearly is some problem with running it as the hdfs user... weird
[18:06:44] I've kicked it off, it's looking ok
[18:27:26] milimetric: Thanks again :)
[18:30:25] nuria: is now when you moved the meeting to, or later?
[18:33:01] mforns: meeting?
[18:33:10] nuria, going!
[20:23:31] Analytics-Kanban, EventBus, Patch-For-Review, Services (next): Create alerts on EventBus error rate - https://phabricator.wikimedia.org/T153034#2914203 (Ottomata) Done! https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=graphite1001&service=EventBus+HTTP+Error+Rate+-4xx+%2B+5xx-
[20:39:33] Analytics-Kanban, Analytics-Wikistats: Visual Language for http://stats.wikimedia.org replacement - https://phabricator.wikimedia.org/T152033#2914274 (Nuria) Visual language is the navigational scheme / widget controls and the visual metaphors we use to talk about data. The deliverables are a set of wireframes o...
[20:40:29] mforns: sorry, did you want to talk more?
[20:42:05] milimetric, maybe we could talk about a couple of things, so I know what your opinion is and can forward it tomorrow?
[20:42:14] yeah, sure
[20:42:17] k
[20:42:30] Analytics-Kanban: Kill limn1 - https://phabricator.wikimedia.org/T146308#2914278 (Nuria) The dashboards that were to be migrated have been, and some dashboards have been deleted. The notable exception is the reportcard; we need to migrate that one, but we have not started yet.
[20:42:57] milimetric, basically about the time selector
[21:00:29] Analytics-Kanban, Easy, Google-Code-In-2016, Patch-For-Review: Add monthly request stats per article title to pageview api - https://phabricator.wikimedia.org/T139934#2914300 (Nuria) No, we do not mark tickets as resolved until code is deployed.
[21:35:59] Hi guys, the chairman of WMCZ told me it would be good if the Wikimetrics tool supported counting the number of pages edited by a cohort. Is an employee or a volunteer working on this? The chairman offered financial support for this work (I don't know where the money comes from; questions about that must be directed to him). It wouldn't be good if two or more people worked on this uncoordinated.
[21:36:10] a-team ^
[21:48:44] Urbanecm, we analytics worked on this task some time ago, see: https://phabricator.wikimedia.org/T75072
[21:49:14] however, there were some difficulties that rendered that task quite complex
[21:49:29] you can read more explanations in the task comments
[21:50:11] mforns, thanks. I'm reading it.
[21:50:16] we reduced the scope of the task and implemented a part of it, which at the moment was enough for us, but it looks like it won't suit WMCZ's needs
[21:50:33] Was the reduction radical?
[21:51:14] Urbanecm, quite big: the report calculates the pages edited by the cohort users separately for each user
[21:51:31] so if a user has edited the same page more than once, it will count as just one.
[21:51:33] but then
[21:52:16] it will sum up the total pages edited by all cohort users, so if several users have edited the same page, it will count it more than once...
[21:52:51] So it publishes a list of pages edited per user?
[21:54:09] no, rather ===> for user in cohort: #pages edited by user1 + #pages edited by user2 + ... + #pages edited by userN
[21:54:23] the result is a number
[21:54:26] Yes
[21:54:36] but it contains duplication
[21:54:45] Yeah
[21:55:37] say page A has been edited twice by user U and once by user V. Wikimetrics will count 2
[21:55:39] Why doesn't it count the numbers separately?
[21:55:49] it should count 1, because A is the only page edited
[21:57:05] wikimetrics is built in a way that makes deduplicating across users non-trivial; it would need quite a bit of work
[21:57:39] we analytics are not actively working on new features for wikimetrics
[21:57:53] any more
[21:58:08] and AFAIK no other team in the WMF is
[21:58:39] So wikimetrics is abandoned right now?
[21:59:05] we fix bugs, but plan not to actively improve it further
[21:59:21] Is a better tool planned?
[21:59:25] Urbanecm, would Quarry be an option?
[21:59:57] https://quarry.wmflabs.org/
[22:00:56] Yes, I wrote a query for WMCZ. I got a big thank-you as the reply, and a few days later I was told it would be better if all users had a non-trivial tool that returns the metrics reported as metrics to the WMF (grants etc.)
[22:01:31] I thought wikimetrics was a tool designed for this use.
[22:02:10] Urbanecm, yes, Wikimetrics is a tool meant for cohort-based metrics on top of Wikimedia databases
[22:02:35] But why isn't this tool actively developed? Not enough free teams?
[22:04:00] And what if WMCZ finds its own small team for the wikimetrics work (the original idea of the chairman)?
[22:04:03] mforns, ^
[22:06:48] Urbanecm, we try to prioritize the development of tools that will bring the most benefit to the community, and we decided not to work on wikimetrics actively in favor of other tools like the pageview api (https://wikitech.wikimedia.org/wiki/Analytics/PageviewAPI) or the 2.0 version of Wikistats (https://stats.wikimedia.org/)
[22:07:17] I suggest that you create a task in Phabricator (https://phabricator.wikimedia.org/) and tag it with the Analytics project
[22:08:16] It will land on our team board and we'll discuss it in our weekly backlog meeting. We'll comment on it to let you know what can be done.
[22:10:32] Urbanecm, mention in the task that WMCZ is planning to work on this, so we can discuss that option and how much we can help them with it
[22:17:49] Urbanecm, sorry for not being able to help you more; another option is to write to the analytics mailing list, where the whole analytics team and the subscribers will read it and will be able to discuss the idea
[22:18:10] https://lists.wikimedia.org/mailman/listinfo/Analytics
[22:22:35] Thanks for your advice. I'll create the task later.
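[Editor's note: a small self-contained sketch of the double counting mforns describes above, using the page-A example from the conversation. pages_edited_by is invented sample data standing in for whatever Wikimetrics derives per user; it is not a Wikimetrics API.]

```python
# Invented sample data: page "A" edited twice by user U and once by user V.
pages_edited_by = {
    "U": ["A", "A"],
    "V": ["A"],
}

# What the reduced-scope Wikimetrics report does: deduplicate within each
# user, then sum across users, so A is counted once per user who touched it.
per_user_sum = sum(len(set(pages)) for pages in pages_edited_by.values())
print(per_user_sum)  # 2

# The cohort-wide deduplication WMCZ wants: distinct pages across all users.
distinct_pages = set().union(*pages_edited_by.values())
print(len(distinct_pages))  # 1, because A is the only page edited
```

The gap between the two numbers is exactly the cross-user deduplication that mforns says would be non-trivial to add to Wikimetrics.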
[22:23:25] (CR) EBernhardson: UDF for extracting primary full text search request (3 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/327855 (https://phabricator.wikimedia.org/T149047) (owner: EBernhardson)
[22:27:11] (PS4) EBernhardson: UDF for extracting primary full text search request [analytics/refinery/source] - https://gerrit.wikimedia.org/r/327855 (https://phabricator.wikimedia.org/T149047)