[02:39:22] Analytics-EventLogging, operations, Icinga: eventlog2001 - CRITICAL status of defined EventLogging jobs - https://phabricator.wikimedia.org/T119930#1840331 (Dzahn) NEW [09:59:39] Analytics-Tech-community-metrics, DevRel-December-2015, DevRel-November-2015: "Tickets" (defunct Bugzilla) vs "Maniphest" sections on korma are confusing - https://phabricator.wikimedia.org/T106037#1840702 (Aklapper) [10:01:44] Analytics-Tech-community-metrics, DevRel-December-2015: Make GrimoireLib display *one* consistent name for one user, plus the *current* affiliation of a user - https://phabricator.wikimedia.org/T118169#1840722 (Aklapper) [10:02:18] Analytics-Tech-community-metrics, DevRel-December-2015: Many profiles on profile.html do not display identity's name though data is available - https://phabricator.wikimedia.org/T117871#1840724 (Aklapper) [10:03:00] Analytics-Tech-community-metrics, DevRel-December-2015: Backlogs of open changesets by affiliation - https://phabricator.wikimedia.org/T113719#1840727 (Aklapper) [10:03:36] Analytics-Tech-community-metrics, Developer-Relations, DevRel-December-2015: Check whether it is true that we have lost 40% of (Git) code contributors in the past 12 months - https://phabricator.wikimedia.org/T103292#1840729 (Aklapper) [10:04:10] Analytics-Tech-community-metrics, DevRel-December-2015: OwlBot seems to merge random user accounts in korma user data - https://phabricator.wikimedia.org/T119755#1840730 (Aklapper) [10:04:44] Analytics-Tech-community-metrics, DevRel-December-2015: OwlBot seems to merge random user accounts in korma user data - https://phabricator.wikimedia.org/T119755#1840731 (Aklapper) a:Dicortazar [10:05:35] Analytics-Tech-community-metrics, DevRel-December-2015: Automated generation of (Git) repositories for Korma - https://phabricator.wikimedia.org/T110678#1840732 (Aklapper) a:Dicortazar>Lcanasdiaz [10:06:25] Analytics-Tech-community-metrics, DevRel-December-2015: "Age of open changesets by Affiliation" has some 
"NaN" values - https://phabricator.wikimedia.org/T110875#1840736 (Aklapper) [10:06:53] milimetric: Awesome work on pageview API demo :) [10:08:27] Analytics-Tech-community-metrics, DevRel-December-2015: Explain / sort out / fix SCR repository number mismatch on korma - https://phabricator.wikimedia.org/T116484#1840746 (Aklapper) So far the numbers on korma.wmflabs.org have not changed. :) [10:08:30] Analytics-Tech-community-metrics, DevRel-December-2015: Explain / sort out / fix SCM repository number mismatch on korma - https://phabricator.wikimedia.org/T116483#1840748 (Aklapper) So far the numbers on korma.wmflabs.org have not changed. :) [11:20:38] Analytics-Backlog, Wikimedia-Developer-Summit-2016: Developer summit session: Pageview API overview - https://phabricator.wikimedia.org/T112956#1840968 (Akeron) >>! In T112956#1838690, @Milimetric wrote: > We do have dumps in the same format as the old ones available here: http://dumps.wikimedia.org/oth... [11:44:07] hi a-team! [11:52:04] Hey mforns :) [11:52:09] hey [11:52:17] batcave? [11:54:09] mforns: not yet now, will ping you when ready :) [11:54:16] but please go ahead ! 
[11:54:21] joal, ok np [12:01:31] (PS1) Addshore: Add prefix to all SQL files [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/256207 [12:02:42] (PS2) Addshore: Add prefix to all SQL files [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/256207 [12:02:52] (CR) Addshore: [C: 2] Add prefix to all SQL files [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/256207 (owner: Addshore) [12:03:10] (Merged) jenkins-bot: Add prefix to all SQL files [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/256207 (owner: Addshore) [12:22:53] (PS5) Addshore: Add new active_users script [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/256208 (https://phabricator.wikimedia.org/T119959) [12:23:32] (CR) Addshore: [C: 2 V: 2] Add new active_users script [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/256208 (https://phabricator.wikimedia.org/T119959) (owner: Addshore) [12:33:16] (PS1) Addshore: Limit user lang tracking to active users [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/256209 (https://bugzilla.wikimedia.org/119710) [12:33:36] (CR) Addshore: [C: 2 V: 2] Limit user lang tracking to active users [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/256209 (https://bugzilla.wikimedia.org/119710) (owner: Addshore) [12:47:53] (PS1) Addshore: Track anon vs logged in edits [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/256211 (https://phabricator.wikimedia.org/T119953) [12:48:15] (CR) Addshore: [C: 2 V: 2] Track anon vs logged in edits [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/256211 (https://phabricator.wikimedia.org/T119953) (owner: Addshore) [13:07:40] Analytics-Backlog, Analytics-Wikistats, DevRel-December-2015: Clean the code review queue of analytics/wikistats - https://phabricator.wikimedia.org/T113695#1841256 (Qgil) [13:08:00] Analytics-Tech-community-metrics, DevRel-December-2015: Affiliations and country of resident should 
be visible in Korma's user profiles - https://phabricator.wikimedia.org/T112528#1841260 (Aklapper) [13:08:08] Analytics-Tech-community-metrics, DevRel-December-2015: Legend for "review time for reviewers" and other strings on repository.html - https://phabricator.wikimedia.org/T103469#1841262 (Aklapper) [13:30:11] Analytics-Tech-community-metrics, DevRel-December-2015, Google-Code-In-2015: Names on scr-contributors.html should link to corresponding people.html page - https://phabricator.wikimedia.org/T118192#1841285 (Aklapper) [13:30:33] Analytics-Tech-community-metrics, DevRel-January-2016: Key performance indicator: Top contributors: Should have sane Ranking algorithm which takes (un)reliability of user data into account - https://phabricator.wikimedia.org/T64221#1841286 (Aklapper) [13:31:05] Analytics-Tech-community-metrics, DevRel-December-2015: What is contributors.html for, in contrast to who_contributes_code.html and sc[m,r]-contributors.html and top-contributors.html? - https://phabricator.wikimedia.org/T118522#1841289 (Aklapper) a:Aklapper [13:40:26] Hello dcausse, are you here? [13:40:40] hi joal [13:40:40] Analytics-Tech-community-metrics, Developer-Relations: Mark BayesianFilter repository as inactive - https://phabricator.wikimedia.org/T118460#1841309 (Aklapper) p:Triage>Low [13:41:24] dcausse: I'll deploy now the changes on avro etc [13:41:41] refinery-camus? [13:41:44] dcausse: I have a preliminary question: do I need to drop data before creating the hive table ? [13:41:57] Or do you want me to add some partitions? [13:42:16] dcausse: refinery-source (including refinery-camus), and refinery (oozie jobs) [13:42:29] ok [13:43:15] (CR) Joal: [C: 2] "Self merging for deployment." 
[analytics/refinery/source] - https://gerrit.wikimedia.org/r/255976 (owner: Joal) [13:43:18] so now camus is running and adding data to hive but with wrong partition boundaries [13:43:28] s/hive/hdfs/ [13:43:43] ok dcausse, makes sense [13:44:13] joal: can you disable camus for mediawiki? [13:44:21] data can be read for any of those partitions with the defined hive table ? [13:44:52] data is readable but maybe we should delete it because of the wrong partitioning [13:45:29] dcausse: Here is my plan: I'll deploy the new code at a given time, double check that contained data is correct, and move the old wrongly partitioned data in another place [13:45:39] dcausse: If needed, this data can be repartitioned [13:46:58] ok [13:47:16] I never tested it, I mean I don't know how the --check flag will behave [13:47:44] dcausse: I think the first imported hour will possibly be incomplete in camus due to the offset/timestamp thing, but we'll double check :) [13:47:52] ok [13:48:14] dcausse: for the moment, java compilation and upload to archiva :) [13:48:25] if something goes wrong we can delete everything [13:48:26] dcausse: I'll ping you when the interesting stuff starts :) [13:48:32] ok thanks! :) [13:48:34] dcausse: ok well understood :) [13:49:41] !log Deploying new refinery-source jars to archiva [13:49:54] !log New jars v0.0.23 [13:50:27] Hi guys o/ [13:50:36] this is Luca [13:50:52] hi elukey! [13:50:56] finally I joined the channel :) [13:50:59] how are you doing? [13:51:04] welcome! [13:51:10] good! thank you!! [13:51:14] :] [13:51:19] cool to see you here [13:52:22] I finished yesterday my previous job and now I am going to start reading a bit of Wikimedia material.. Andrew sent me a loooong list of links :) [13:52:29] elukey, you're starting in january, right? [13:52:49] yep! [13:53:21] hehe, yes, there are a lot of things to read, I'm still reading things after 1 year :] [13:54:07] Hello elukey ! 
Nice to see you around :) [13:54:28] I think working in a place like here, we'll never be done reading :D [13:54:46] Analytics-Tech-community-metrics, DevRel-December-2015: Automated generation of (Git) repositories for Korma - https://phabricator.wikimedia.org/T110678#1841343 (Lcanasdiaz) Open>Resolved [13:56:30] Hello joal! Oh yes I know, but I have to start sooner or later.. I would like not to bring down the analytics cluster on my first week of work [13:57:10] huhuhu elukey :) I don't think ottomata would let you do that anyway ;) [13:57:52] Analytics-Tech-community-metrics, DevRel-December-2015, Easy, Google-Code-In-2015: Clarify Demographics definitions on korma (Attracted vs. time served; retained) - https://phabricator.wikimedia.org/T97117#1841351 (Aklapper) **Update**: This is "upstream" code in Bitergia that we in Wikimedia "just"... [13:58:38] Analytics-Tech-community-metrics, DevRel-December-2015: Explain / sort out / fix SCR repository number mismatch on korma - https://phabricator.wikimedia.org/T116484#1841353 (Lcanasdiaz) I merged some local changes we had and it seems I ruined my own fix. Working on it v_v [13:58:43] elukey, joal, I'll leave for lunch now, will see you in a while [13:58:51] bye mforns [13:59:28] mforns: see you later! [13:59:40] o/ joal [13:59:46] Will be in the call in 5 minutes. [13:59:55] I'm finishing up some plots to show you :) [14:00:43] cool halfak, I had actually missed the time, thx for the reminder :) [14:01:10] :D [14:01:15] o/ milimetric [14:01:20] Analytics-Tech-community-metrics, DevRel-December-2015, Easy, Google-Code-In-2015: Clarify Demographics definitions on korma (Attracted vs. time served; retained) - https://phabricator.wikimedia.org/T97117#1841356 (Aklapper) https://codein.withgoogle.com/dashboard/tasks/5925015899865088/ [14:01:23] hey halfak [14:01:28] Just telling joal that I won't be able to join the call for a couple minutes. 
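An aside on the camus mis-partitioning discussed earlier ([13:43]–[13:45]): joal's plan is to set the wrongly partitioned data aside and, if needed, re-bucket each record into the hourly partition its timestamp actually belongs to. A minimal illustrative sketch of that re-bucketing, in Python rather than the actual Camus/refinery Java, assuming the year/month/day/hour partition layout used elsewhere in this log (path format here is hypothetical):

```python
from datetime import datetime, timezone

def hourly_partition(ts_seconds):
    """Map an epoch timestamp (seconds, UTC) to the hourly partition it belongs in.

    The year=/month=/day=/hour= path layout is assumed from the refinery
    convention; the exact string format here is illustrative only.
    """
    dt = datetime.fromtimestamp(ts_seconds, tz=timezone.utc)
    return "year=%d/month=%d/day=%d/hour=%d" % (dt.year, dt.month, dt.day, dt.hour)

def rebin(records):
    """Group (timestamp, payload) records by their correct hourly partition."""
    partitions = {}
    for ts, payload in records:
        partitions.setdefault(hourly_partition(ts), []).append(payload)
    return partitions
```

In practice this kind of re-sort would be done in Hive or Spark over HDFS data, not record-by-record in Python; the sketch only shows the timestamp-to-partition mapping that makes the repartitioning well defined.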
[14:01:33] Makin' plots for you guys :) [14:01:36] cool [14:04:44] milimetric: o/ (Luca) [14:05:10] hi elukey! welcome :) [14:05:20] thanks! [14:09:01] Analytics-Tech-community-metrics, DevRel-December-2015, Google-Code-In-2015: Names on scr-contributors.html should link to corresponding people.html page - https://phabricator.wikimedia.org/T118192#1841375 (Aklapper) [14:10:02] joal, just joined the call [14:10:41] sorry [14:11:35] Analytics-Tech-community-metrics, DevRel-December-2015, Google-Code-In-2015: Names on scr-contributors.html should link to corresponding people.html page - https://phabricator.wikimedia.org/T118192#1841378 (Aklapper) https://codein.withgoogle.com/dashboard/tasks/6557026578595840/ [14:41:15] Analytics-Backlog, Wikimedia-Developer-Summit-2016: Developer summit session: Pageview API overview - https://phabricator.wikimedia.org/T112956#1841448 (Dave_Braunschweig) I'm not sure where to share this code with the community, so I'll add it here for reference. I created a Python script to generate... [14:56:42] hey ottomata, do you have a minute? [14:58:38] what's up with that EL critical alert acknowledgement thing? Someone working on it? [14:58:53] Analytics-Backlog, Wikimedia-Developer-Summit-2016: Developer summit session: Pageview API overview - https://phabricator.wikimedia.org/T112956#1841459 (ezachte) The new data in http://dumps.wikimedia.org/other/pageviews/ already exclude spider requests, so contain user data only. The way I'm reading @M... [14:59:48] milimetric: It's my duty-week, but I left that unattended because the machine is EL2001 [14:59:56] Do you think we should care ? [15:00:10] oh ok, no, I was just curious since it was marked as acknowledged but didn't recover [15:00:26] ok :) [15:00:50] joal hiya [15:00:53] ja whats up? 
[15:01:05] I am wondering about deploying or not the change on IPs [15:01:09] i dunno who applied eventlogging puppet to el2001 [15:01:09] ottomata: --^ [15:01:10] wasn't me [15:01:12] i think it was ori [15:01:23] joal, aye? whatcha wondering about [15:01:47] ori asked me about that machine once, and i just told him we weren't using it...then a week or so later we saw some alarms there :) [15:02:11] Almost similar results, except for a few --> https://gist.github.com/jobar/5cd036153f69109ca969 [15:02:21] ottomata: let me know when you have it, I'll remove it [15:02:52] columns are ip, client_ip, hits [15:03:14] ottomata: nuria suggested we don't deploy it, to thoroughly test it [15:03:26] ottomata: I think I'm gonna go this direction [15:04:09] (PS1) Joal: Bump jar version for refine job [analytics/refinery] - https://gerrit.wikimedia.org/r/256224 [15:04:37] (CR) Joal: [C: 2 V: 2] "Self merging for deploy." [analytics/refinery] - https://gerrit.wikimedia.org/r/256224 (owner: Joal) [15:04:47] hm [15:05:02] ottomata: can I delete the gist? [15:05:07] hm, client_ip looks better! [15:05:15] sure joal, maybe show bblack though [15:05:26] we are ID'ing client IPs better than whatever he is doing in varnish, eh? [15:05:38] ottomata: o/ (Luca) [15:06:09] hello! luca! [15:06:10] :) [15:06:17] joal: have you shown bblack that? 
[15:06:17] Well, this list is for 13 hours, and while there is ~400 diffs, there is ~4.1 billion equals :) [15:06:22] aye [15:06:43] aye, practically it won't matter, buuuuut, i dunno, would be good for him to know [15:06:45] ottomata: I keep that gist for the moment, but want to drop it asap [15:07:10] also ottomata, I don't deploy the thing, waiting for bblack feedback [15:07:23] * joal takes decisions :) [15:07:32] it looks like many of the diffs are where ip is some internal address somewhere (not necessarily WMF), but refinery figures that out and finds a different IP [15:07:37] joal: that is good [15:07:53] it doesn't hurt to leave it in refine for a bit longer...unless bblack removes x_forwarded_for [15:08:07] hm, right [15:08:09] https://gerrit.wikimedia.org/r/#/c/253474/ [15:08:11] will comment on that [15:08:14] I thought that was the plan ottomata [15:08:21] k thanks [15:08:26] * joal gets back to deploy [15:09:00] elukey: how goes?! [15:13:44] !log deploying refinery [15:15:01] everything's good! I finished yesterday in Amazon and I am packing the last things.. Also I am starting to read all the links that you sent :) [15:15:11] dcausse: I'll wait for the camus run to start (at 15 past), then deploy new code, and then we'll monitor for the run after [15:15:29] joal: ok [15:30:03] (PS1) Joal: Add jars v0.0.23 to refinery artifacts [analytics/refinery] - https://gerrit.wikimedia.org/r/256229 [15:30:29] * joal almost forgot to update the jars in refinery before deploying [15:30:59] (CR) Joal: [C: 2 V: 2] "Self merging for deploy." 
[analytics/refinery] - https://gerrit.wikimedia.org/r/256229 (owner: Joal) [15:37:50] !log Refinery deployment fails from tin [15:45:17] Analytics-Tech-community-metrics, DevRel-December-2015, Patch-For-Review: Kill out-of-date scr-organizations-summary in korma - https://phabricator.wikimedia.org/T119756#1841534 (Aklapper) That seems to be an upstream page as https://github.com/Bitergia/grimoire-dashboard/blob/master/browser/scr-compan... [15:49:22] !log Refinery deployment retry [15:52:11] Analytics-Tech-community-metrics, DevRel-December-2015: Explain / sort out / fix SCM repository number mismatch on korma - https://phabricator.wikimedia.org/T116483#1841543 (Dicortazar) Ok, Korma data show that there are 1166 git repositories in the list of repos at http://korma.wmflabs.org/browser/scm-re... [15:58:28] Analytics-Tech-community-metrics, DevRel-December-2015, Patch-For-Review: Kill out-of-date scr-organizations-summary in korma - https://phabricator.wikimedia.org/T119756#1841564 (Dicortazar) You also need to remove a specific line in a config file. More info in the pull request. Thanks for the change! :). [16:00:04] !log oozie refine bundle killed, pageview_hourly killed, cassandra loading killed [16:00:21] !log deploying new refinery code to hdfs [16:09:29] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 7 others: Define edit related events for change propagation - https://phabricator.wikimedia.org/T116247#1841584 (Eevans) >>! In T116247#1839888, @Ottomata wrote: > @gwicke and I discussed the schema/revision in meta issue in IRC tod... [16:09:45] Analytics-Tech-community-metrics, Developer-Relations, DevRel-December-2015: Check whether it is true that we have lost 40% of (Git) code contributors in the past 12 months - https://phabricator.wikimedia.org/T103292#1841587 (Dicortazar) So far, the process has been focused on automating the list of Gi... 
[16:10:37] !log Add page_id field to webrequest and pageview_hourly tables [16:12:58] !log Restart oozie refine bundle [16:16:39] dcausse: camus job started with new jar [16:16:47] dcausse: so far, so good :) [16:16:52] joal: nice! :) [16:17:14] joal: did you activate the oozie job? [16:18:00] dcausse: not yet [16:18:05] ok [16:18:18] !log Restart oozie pageview_hourly coordinator [16:21:07] !log Restart oozie cassandra_loader bundle [16:27:54] joal, hi :] [16:28:11] do you have 15 mins for me to show you the code? [16:28:29] hey nuria [16:28:45] i think you may have submitted a patch with outdated code, you removed some of my changes in service.py from patchset 6 [16:28:54] https://gerrit.wikimedia.org/r/#/c/255188/6..7/eventlogging/service.py [16:32:51] mforns: Hey :) [16:33:03] hello! [16:33:09] sure, mforns, let's do that [16:33:13] batcave ? [16:33:16] cool! yes [16:34:47] Analytics-Tech-community-metrics, DevRel-December-2015, Patch-For-Review: Review/update mailing list repositories in korma - https://phabricator.wikimedia.org/T116285#1841676 (Aklapper) https://github.com/Bitergia/mediawiki-repositories/pull/13 [16:43:07] Analytics-Backlog, Wikimedia-Developer-Summit-2016: Developer summit session: Pageview API overview - https://phabricator.wikimedia.org/T112956#1841734 (Nemo_bis) > Running the monthly statistics in the past has typically taken two or three days to download and extract the data. The November run is und... [16:52:27] Analytics-Kanban, Database: Delete obsolete schemas {tick} [5 pts] - https://phabricator.wikimedia.org/T108857#1841806 (jcrespo) Open>Resolved It seems fixed now, last error: ``` 151130 18:18:45 [Note] Event Scheduler: [root@10.%].[log.delete_schedule] event execution failed. 
``` [16:52:28] Analytics-Kanban: Enforce policy for each schema: Sanitize {tick} [8 pts] - https://phabricator.wikimedia.org/T104877#1841808 (jcrespo) [16:58:58] Analytics-Kanban, operations, Database: db1046 running out of disk space - https://phabricator.wikimedia.org/T119380#1841834 (jcrespo) Immediate problems have been fixed and purging has been restarted, however, the long term problem persists until we can schedule some maintenance for defragmenting and co... [16:59:23] Analytics-Kanban, operations, Database: db1046 running out of disk space - https://phabricator.wikimedia.org/T119380#1841836 (jcrespo) Open>Resolved [17:03:51] Analytics-Kanban: Backfill EL data for 2015-11-27 incident - https://phabricator.wikimedia.org/T119981#1841851 (mforns) NEW a:mforns [17:05:08] elukey: feel free to poke us / ask questions here as much as you like [17:06:11] milimetric: thanks a lot! I'll start reviewing some wiki docs that Andrew sent to me, I am sure that I'll have a lot of questions but I'll try to batch them [17:06:30] otherwise you guys will try to kill me after one week of work :) [17:13:47] * madhuvishy waves to elukey :) [17:14:36] k madhuvishy, what's up with the logging task? [17:14:36] ? [17:15:08] mforns: what date ranges are you looking for? [17:16:59] ottomata: oh, nothing - i didn't look at it much yesterday - I just moved it from Done -> Ready to deploy because it was not merged. I tried to find out about the handlers key, couldn't find anything there. But i'll test whether the config overriding works if we call fileConfig after setting up the basicConfig [17:21:44] ottomata, sorry was asking a question in standup [17:21:59] ottomata, do you want to batcave-2? [17:23:18] mforns: ja actually gimme 10 mins to eat lunch [17:23:49] ottomata, sure! [17:25:02] joal: so the function is String getCountryName(String countryCode) [17:26:04] so my functional side and my hatred of OO are kind of preventing me from having a Country class, but is that what you'd do? 
like new Country("GB").getName() ? [17:26:31] ewwwwww [17:27:35] hm milimetric [17:27:48] Maybe you could add that function as a static function in geocode? [17:28:09] being static prevents the actual need for maxmind loading [17:28:12] ? [17:28:28] hm... is there negative impact from all the imports that Geocode needs? [17:28:39] like when I use the UDF that needs Geocode, are all those imports loading as well? [17:29:09] hm ... so small in my opinion that you shouldn't bother, but if you think it's important, then single function class (how bad) [17:29:31] ok, I'll pick between the two evils :) thx [17:29:35] it would be CountryUDF anyway [17:29:35] :) [17:29:44] sounds better than Country [17:29:49] hm, CountryByCountryCode? [17:29:59] UDF at the end yes, [17:30:11] Arf, the class name, got it sorry :) [17:31:41] Our UDF naming seems inconsistent though - some start with Get and Is, and others don't [17:32:00] true madhu [17:32:37] no the UDF I'd name CountryNameUDF, that seemed to be the majority of the convention [17:32:54] what I was talking about was the logic [17:34:40] ok mforns [17:34:58] ottomata, I'm with joseph in da cave [17:35:04] do you want to join us? [17:35:18] ottomata, we can talk, joseph is doing stuff [17:49:39] joal: checked mediawiki hdfs partitions and it seems to work: 2015/12/01/16 has been flagged _IMPORTED after 2 camus runs [17:49:54] dcausse: I had checked that as well :) [17:50:02] thanks! 
:) [17:51:13] dcausse: just started oozie job to add table partitions [17:51:20] ok [17:51:41] !log Start oozie coordinator adding cirrus-searchrequest-set partitions in hive [17:52:59] joal: cirrussearchrequestset is in wmf but I think the oozie job is configured to wmf_raw by default [17:53:09] wahhh, my bad [17:56:54] joal: good news, the error notification worked correctly :) [17:57:00] Great :) [17:57:17] dcausse: not tested on purpose, but happy that it works :) [17:57:55] dcausse: U can haz dataz :) [17:59:35] madhuvishy: o/ [17:59:58] joal: a dream come true :) [18:00:06] huhuhu [18:00:32] dcausse: SELECT min(ts), max(ts) from CirrusSearchRequestSet where year = 2015 and month = 12 and day = 1 and hour = 16; [18:00:37] --> results are correct :) [18:00:48] awesome [18:01:05] dcausse, I'll do what's necessary tomorrow to try and get the old data in the table (resorted etc) [18:01:16] Analytics-Tech-community-metrics, DevRel-December-2015, Patch-For-Review: Review/update mailing list repositories in korma - https://phabricator.wikimedia.org/T116285#1842069 (Dicortazar) Do you want to keep the activity of those old mailing lists? If so, we shouldn't remove them from the list. I'm no... [18:01:53] joal: it's not strictly necessary but thank you :) [18:02:35] dcausse: I'll timebox the thing ;) [18:02:46] ok [18:02:47] Analytics-Tech-community-metrics, DevRel-December-2015: OwlBot seems to merge random user accounts in korma user data - https://phabricator.wikimedia.org/T119755#1842088 (Dicortazar) I'm having a look at the data. If there's an id with loads of merges, this is usually an id to be added to the sortinghat b... [18:03:26] Analytics-Kanban, Database: Delete obsolete schemas {tick} [5 pts] - https://phabricator.wikimedia.org/T108857#1842093 (mforns) Thanks @jcrespo :] [18:03:35] joal: it looks like we just need to add partitions? 
i have a shell script that does that already, can knock it out [18:03:46] oh, but they will have wrong timestamps [18:03:51] ebernhardson: need to reorder [18:03:54] yea [18:03:58] right [18:04:16] ebernhardson: hive can do that [18:05:05] unrelated question about oozie :) we are writing up our first jobs (both in pyspark) to generate a page popularity score and then export those to elasticsearch. Is refinery the right place to put these scripts to integrate with oozie? [18:05:26] the thing i wasn't sure about, is it would mean our oozie jobs are on your team's deployment schedule and it seems it's a bit of work to deploy anything [18:05:57] yeah, ebernhardson, i think you should manage your own jobs [18:06:07] if you need refinery things, you can always depend on refinery jar [18:06:15] ok that makes sense [18:06:43] cool :) [18:19:01] Analytics-Backlog, Wikipedia-iOS-App-Product-Backlog, operations, vm-requests, iOS-5-app-production: Request one server to suport piwik analytics - https://phabricator.wikimedia.org/T116312#1842158 (JMinor) @Joe @mark any updates here or other info I can provide for this request? I don't have m... [18:23:43] Analytics-Kanban, Patch-For-Review: Add page_id to pageview_hourly when present in webrequest x_analytics header - https://phabricator.wikimedia.org/T116023#1842185 (JAllemandou) a:EBernhardson [18:25:56] joal, I need to leave for a while, I'll ping you when I'm back [18:26:22] mforns: I'm sorry I'll be gone :( [18:26:25] probably [18:26:32] joal, np, I'll have a look at counters [18:26:40] k mforns [18:29:11] Analytics-Engineering: Have dashiki read and write GET params to pass stateful versions of dashboard pages - https://phabricator.wikimedia.org/T119996#1842220 (Jdforrester-WMF) NEW [18:46:55] dcausse: for old data, I'll start at 2015-11-03, right ? 
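The min(ts)/max(ts) spot-check joal ran at [18:00:32] generalizes to an invariant: every timestamp in an hourly partition must fall inside that hour, which is exactly what the wrongly bounded camus output violated. A hedged Python sketch of that invariant (the real check is the HiveQL query in the log; function and field names here are illustrative):

```python
from datetime import datetime, timezone

def partition_bounds(year, month, day, hour):
    """Return the [start, end) epoch-second bounds of an hourly partition."""
    start = datetime(year, month, day, hour, tzinfo=timezone.utc).timestamp()
    return start, start + 3600

def partition_is_consistent(timestamps, year, month, day, hour):
    """True iff min/max of the record timestamps lie inside the partition's hour."""
    lo, hi = partition_bounds(year, month, day, hour)
    return lo <= min(timestamps) and max(timestamps) < hi
```

This mirrors the manual verification: if min(ts) or max(ts) falls outside the hour named by the year/month/day/hour partition keys, the data was bucketed with the wrong boundaries.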
[18:47:40] Actually, I'll start the 4th, first full day [18:48:37] joal: I think you can start at 2015-11-04 (full day with full data, 11/3 contains sampled data) [18:48:43] k [18:48:53] dcausse: Am I allowed to delete these folders ? [18:48:58] yes [18:49:04] ok [19:12:08] milimetric: is it possible to test that the cohort got deleted at the end of the RunProgramMetricsReport in test? I'm not very clear on how [19:19:38] ottomata: would it be possible to open port 8888 from the analytics cluster to wdqs100[12].eqiad.wmnet ? And if so, how? [19:21:56] addshore: it requires ops approval, what are you trying to do again? [19:22:14] pinged you in the other channel [19:33:50] milimetric: got a sec for a weird python import problem? [19:33:58] Analytics, Discovery, WMDE-Analytics-Engineering, Wikidata, and 3 others: Add firewall exception to get to wdqs*:8888 from analytics cluster - https://phabricator.wikimedia.org/T120010#1842561 (Addshore) NEW [19:35:13] Analytics-Backlog, Discovery, WMDE-Analytics-Engineering, Wikidata, and 3 others: Add firewall exception to get to wdqs*:8888 from analytics cluster - https://phabricator.wikimedia.org/T120010#1842581 (madhuvishy) [19:35:34] Analytics, Discovery, WMDE-Analytics-Engineering, Wikidata, and 3 others: Add firewall exception to get to wdqs*:8888 from analytics cluster - https://phabricator.wikimedia.org/T120010#1842585 (Addshore) [19:35:58] Analytics-Backlog, Discovery, WMDE-Analytics-Engineering, Wikidata, and 3 others: Add firewall exception to get to wdqs*:8888 from analytics cluster - https://phabricator.wikimedia.org/T120010#1842561 (Addshore) [19:36:28] addshore: we don't look at the Analytics board, so I changed it to the one we do look at [19:36:59] yup, and then I edit conflicted and added it back ;) and then switched it back again :D oh phabricator, how you handle edit conflicts is just great... 
[19:38:14] yeah, i would expect it told either of us there was a conflict before just updating it with the latest data [19:38:42] Analytics-Backlog, Discovery, WMDE-Analytics-Engineering, Wikidata, and 3 others: Add firewall exception to get to wdqs*:8888 from analytics cluster - https://phabricator.wikimedia.org/T120010#1842561 (Addshore) [19:43:28] (Draft1) Addshore: Add script to backfill language_usage data [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/256253 (https://phabricator.wikimedia.org/T119710) [19:45:53] mforns: does userAgent get inserted into eventlogging mysql tables? [19:46:05] ottomata: yeah [19:46:08] in the capsule [19:47:41] k danke [19:58:44] ottomata, sorry, yea thx madhu :] [19:59:03] ottomata, in fact it is a simplified user agent [19:59:13] oh? how's it get in there? [19:59:15] I think some details get stripped [19:59:18] oh, nm, nm [19:59:28] let me look at the code [20:05:03] ottomata, no, I don't think it gets simplified... unless "Mozilla/5.0 (Linux; Android 5.0; SM-G900V Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.133 Mobile Safari/537.36" is simplified to you :P [20:05:26] yeah [20:05:40] i think its not, i was getting confused for a sec, but what i was seeing was a problem with unit tests [20:05:51] aha [20:05:55] not sure when userAgent was added to EventCapsule, but it was never updated in fixtures.py [20:06:08] I see [20:15:41] On the deployment-prep zookeeper instance, it wants to remove zookeeperd and install zookeeper.... is this normal? 
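The fixtures.py problem ottomata hit above (userAgent added to EventCapsule but never to the test fixtures) is a schema/fixture drift bug. A hypothetical sketch of a guard against it, assuming the capsule fields and a fixture can both be seen as plain collections of field names (the real EventLogging schema and fixture objects differ):

```python
def missing_capsule_fields(capsule_fields, fixture):
    """Return capsule fields absent from a test fixture, sorted for stable output.

    An empty result means the fixture is in sync with the capsule schema;
    a non-empty one names exactly the drifted fields.
    """
    return sorted(set(capsule_fields) - set(fixture))

# Hypothetical example mirroring the bug in the log: 'userAgent' was added
# to the capsule schema but the fixture dict was never updated.
CAPSULE_FIELDS = ["uuid", "timestamp", "event", "userAgent"]
FIXTURE = {"uuid": "abc", "timestamp": 1448928000, "event": {}}
```

Run as a unit test, `missing_capsule_fields(CAPSULE_FIELDS, FIXTURE)` would have flagged `userAgent` the moment the capsule grew a field the fixtures lacked.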
[20:16:42] madhuvishy: the test db should be cleared before and after each test, so you can just make sure there are no records in the cohort table, I think [20:16:48] ottomata: hey, what's up, sure [20:16:50] (Draft1) Addshore: Use WikimediaCurl in more places [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/256286 [20:16:55] (Draft1) Addshore: Allow proxy and non proxy in WikimediaCurl [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/256285 [20:17:09] Analytics-Kanban: Add LRUCache for user agent parsing in refinery [3 pts] {hawk} - https://phabricator.wikimedia.org/T120015#1842785 (JAllemandou) NEW a:JAllemandou [20:17:12] (Merged) jenkins-bot: Allow proxy and non proxy in WikimediaCurl [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/256285 (owner: Addshore) [20:17:16] (Merged) jenkins-bot: Use WikimediaCurl in more places [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/256286 (owner: Addshore) [20:18:43] milimetric: batcave real quick? [20:18:50] omw [20:33:41] (Draft2) Addshore: curl php get headers as well as body [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/256294 [20:34:59] (Merged) jenkins-bot: curl php get headers as well as body [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/256294 (owner: Addshore) [20:35:02] Bye a-team, I'm off for tonight [20:35:11] bye joal! [20:35:32] mforns: At some point we WILL find some time :) [20:35:37] joal, xD [20:35:42] yes sure [20:45:29] (Draft1) Addshore: Track triples & lag per query host [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/256338 (https://phabricator.wikimedia.org/T119993) [20:52:20] mforns: do you know when I should be able to get the top articles for a month? Seems data is not available yet [20:52:22] https://wikimedia.org/api/rest_v1/metrics/pageviews/top/en.wikipedia/all-access/2015/10/all-days [20:58:33] milimetric: do you know answer to question above? 
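The endpoint kevinator queries above takes a literal `all-days` placeholder in the day field to request a whole month; as discussed next in the log, the field cannot simply be left blank, which is the source of the confusion. A small sketch of building the URL, using the path layout visible in kevinator's link (function name is illustrative):

```python
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/top"

def top_url(project, access, year, month, day="all-days"):
    """Build a pageview API 'top' endpoint URL.

    day defaults to the literal string 'all-days', which asks for the
    whole month's top articles; a specific day is passed as 'DD'.
    """
    return "%s/%s/%s/%d/%02d/%s" % (BASE, project, access, year, month, day)
```

For example, `top_url("en.wikipedia", "all-access", 2015, 10)` reproduces the exact URL from the log.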
[20:58:52] kevinator, reading [21:00:03] kevinator, in theory it should be available some hours after the start of next month [21:00:48] yeah and it seems like neither October nor November are available... [21:00:52] kevinator, but I'm not sure if monthly resolution is already available [21:00:55] yes [21:01:50] i was in batcave kevinator, yes, I'm not sure what the status on loading monthly is, you'll have to ask joseph tomorrow [21:02:08] ok... I'll ask Joseph [21:02:21] I do have another comment on the documentation @ https://wikimedia.org/api/rest_v1/?doc#!/Pageviews_data/get_metrics_pageviews_top_project_access_year_month_day [21:02:31] aha [21:02:52] it says monthly is available [21:02:57] I think the "all-days" should be mentioned next to the day field... [21:03:20] we should either fix that or change the docs, probably change the docs [21:03:30] oh [21:05:48] I see kevinator, we wrote it that way, because either if we write it like it is now or we write it as you suggest, we'll always reference the value of another field [21:06:58] for example: if we use your suggestion, we'd have to write: "The day of the date for which to retrieve top articles, in DD format. If you want to specify a whole year or a whole month, use all-days." [21:07:02] yeah... my first glance at the API documentation made me think I could leave the day field blank... [21:07:13] aha [21:07:19] then when it complained... I didn't read above to see that I had to write all-days [21:08:23] kevinator, but now that I rethink it, I agree with you that we should change that [21:09:19] kevinator, will create a task [21:09:30] thanks [21:13:21] Analytics-Backlog: Change the Pageview API's RESTBase docs for the top endpoint - https://phabricator.wikimedia.org/T120019#1842892 (mforns) NEW [21:13:28] kevinator, ^ [21:14:21] (Draft1) Addshore: Ignore bad offset when parsing http headers [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/256343 [21:14:58] thanks mforns... 
I added a thumbs up token :-)
[21:15:26] kevinator, what's that?
[21:15:48] I saw it, but is it visible outside the task?
[21:15:52] Analytics-Backlog: Have dashiki read and write GET params to pass stateful versions of dashboard pages {crow} - https://phabricator.wikimedia.org/T119996#1842900 (Milimetric)
[21:16:04] no, it's not visible outside the task
[21:16:22] I think phab tokens are meant to show support for a task from the community
[21:16:36] oh, but it's in the summary
[21:16:51] you don't need to scroll down, I see :]
[21:17:57] Analytics-Backlog: Have dashiki read and write GET params to pass stateful versions of dashboard pages {crow} - https://phabricator.wikimedia.org/T119996#1842906 (Milimetric) We probably won't get to this task this quarter, but I'm happy to help anyone else navigate the code. Baha worked on the toggle featur...
[21:19:44] Analytics-Kanban, operations, Database: db1046 running out of disk space - https://phabricator.wikimedia.org/T119380#1842934 (Milimetric) @jcrespo, just open up a different task and assign it to me if I can help in any way. Don't worry about tagging analytics projects, it seems everyone's confused abou...
[22:02:06] milimetric: https://docs.google.com/spreadsheets/d/1Qib7Nm0eyE9oMrst4cR5lHLvEg5C2P5TnpYnZhOWt4w/edit#gid=0&vpid=A1 has a couple other metrics that we didn't do yet
[22:02:13] because there were no tabs
[22:02:22] should we also have those?
[22:02:52] no madhuvishy, just the 4 metrics
[22:02:58] ok cool
[22:03:02] but that reminds me, did we end up checking on the pages created metric, madhuvishy?
[22:03:14] no
[22:03:26] ok, should I ask about it or do you want to?
[22:12:45] Analytics-Backlog, Wikimedia-Developer-Summit-2016: Developer summit session: Pageview API overview - https://phabricator.wikimedia.org/T112956#1843041 (Milimetric) >>! In T112956#1841448, @Dave_Braunschweig wrote: > I'm not sure where to share this code with the community, so I'll add it here for refer...
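[Editor's note: the exchange above is about the Pageview API's "top" endpoint, where a whole month is requested by passing the literal string `all-days` in the day position of the URL. A minimal Python sketch of how such a URL is assembled; `top_url` is a hypothetical helper for illustration, not part of any official client.]

```python
# Base of the "top" endpoint as seen in the URL quoted earlier in the log.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/top"

def top_url(project, access, year, month, day="all-days"):
    """Build a top-articles URL; day defaults to 'all-days' for a whole month."""
    return f"{BASE}/{project}/{access}/{year:04d}/{month:02d}/{day}"

url = top_url("en.wikipedia", "all-access", 2015, 10)
```

Fetching `url` with any HTTP client then returns the top articles for October 2015, once that month's data has been loaded.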
[22:42:49] Analytics-Backlog, Wikimedia-Developer-Summit-2016: Developer summit session: Pageview API overview - https://phabricator.wikimedia.org/T112956#1843097 (Milimetric) >>! In T112956#1841459, @ezachte wrote: > The new data in http://dumps.wikimedia.org/other/pageviews/ already exclude spider requests, so c...
[22:49:27] Analytics-Kanban: Reformat pageview API responses to allow for status reports and messages {slug} - https://phabricator.wikimedia.org/T117017#1843118 (Milimetric) >>! In T117017#1839501, @Ironholds wrote: > Agreed, never older than May, but are you planning on clearing old data out as new data comes in for st...
[22:50:50] is there any downside to using too many partition fields? For our popularity score we have `project`, `page_id` and `score`, and we partition by the `year`, `month`, `day` that the aggregation starts at
[22:51:40] the script that writes these scores to elasticsearch needs to group by project, so I'm thinking of moving that over to the partitioning side (there will be <1000 projects). Additionally thinking of adding one more partition field, for clarity, that declares the number of days that were aggregated
[22:52:40] but perhaps 1000 values is too many to partition over, and many of those 1000 are rather small projects that won't have very large data
[22:58:13] milimetric: sorry I missed your message. what did we want to ask? whether it's pages created or namespace edits?
[22:59:44] right madhuvishy, basically whether it's ok to use namespace_edits knowing that it includes pages created as well as articles improved, I think
[23:00:05] otherwise we'd have to create another metric to more accurately measure what they're trying to go for
[23:00:27] the spreadsheet says to use Pages created - but it doesn't give articles improved?
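[Editor's note: the partitioning question above is really about partition cardinality. Each distinct combination of partition values becomes its own directory of files, so multiplying date partitions by a high-cardinality field like `project` inflates the partition count dramatically, and many of those partitions will be tiny. A back-of-the-envelope sketch, assuming the "<1000 projects" figure from the discussion:]

```python
# Rough partition counts per year for the schemes discussed above.
days_per_year = 365
projects = 1000  # upper bound mentioned in the discussion

date_only = days_per_year              # year/month/day: ~one partition per day
with_project = days_per_year * projects  # add project as a partition field
```

365 partitions per year versus 365,000 is the trade-off: per-project pruning becomes free for the elasticsearch writer, but metastore overhead and small files grow by three orders of magnitude, which is why high-cardinality partition fields are usually treated with caution.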
[23:01:57] (PS1) Milimetric: Add UDF that turns a country code into a name [analytics/refinery/source] - https://gerrit.wikimedia.org/r/256354 (https://phabricator.wikimedia.org/T118323)
[23:04:28] (PS1) Milimetric: [WIP] Oozie-fy Country Breakdown Pageview Report [analytics/refinery] - https://gerrit.wikimedia.org/r/256355
[23:09:55] nite!
[23:10:09] good night
[23:11:34] milimetric: is there a standard way to return report failure?
[23:23:52] Is there a task for "please make Vital Signs less sad and not just be missing data for most days"?
[23:24:08] * James_F is happy to create one but doesn't want to duplicate…
[23:25:36] James_F: feel free to make one and tag Analytics-Backlog :) We can always mark it duplicate if there's something else.
[23:26:33] Sure!
[23:26:52] thanks :)
[23:27:17] milimetric: I think they want the sum of Pages Created, and Edits
[23:28:14] Analytics-Backlog: Vital Signs: Please make the data for enwiki and other big wikis less sad, and not just be missing for most days - https://phabricator.wikimedia.org/T120036#1843248 (Jdforrester-WMF) NEW
[23:30:55] Analytics-Backlog: Vital Signs: Please provide an "all languages" de-duplicated stream for the Community/Content groups of metrics - https://phabricator.wikimedia.org/T120037#1843264 (Jdforrester-WMF)
[23:34:00] milimetric: but is Edits = NameSpace edits = Sum of pages created and edits?