[00:02:43] milimetric: around? [00:03:53] madhuvishy: what's up [00:04:17] milimetric: if i move stuff around like we talked about [00:04:31] i have some doubts on what'll happen here - https://github.com/wikimedia/analytics-wikimetrics/blob/master/wikimetrics/models/report_nodes/report.py#L68 [00:04:34] in the init [00:04:46] it persists the parameters of the report [00:05:11] all the parameters are not known at this point - like the cohort name etc - is that okay? [00:05:29] yes, I think that's fine [00:05:36] well.... [00:06:21] yeah that's fine, the cohort id matters later I think, when we render the results [00:06:35] id we will know [00:06:44] only name and size are not known [00:07:03] i have to split the globalreport init into two parts [00:07:08] right, I think we can make the rest work, though I confess I can't guarantee that it'll just work out of the box [00:07:19] okay [00:07:22] let me try it [00:07:28] but if something breaks, it'll be fixable in the code that breaks, by just not assuming whatever we're assuming about the params [00:07:48] ya alright [00:08:11] since i have no history on why those assumptions are there, i'll be poking you more :) [00:08:32] (not now, but when it breaks) [00:13:30] (PS6) EBernhardson: Implement ArraySum UDF [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254452 [00:20:46] (CR) Milimetric: [C: 1] "Looks good, just gotta remove the [WIP] and I don't think any testing is needed for this task." (1 comment) [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/254887 (owner: Madhuvishy) [00:21:30] madhuvishy: you can poke me when it breaks, but I gotta go haul some chicken food back from the grocer, so I'll be out for a while. [00:21:54] milimetric: not now! its late for you, i'll ask tomorrow if anything [02:07:42] (PS1) EBernhardson: Add page_id to intermediate pageview [analytics/refinery] - https://gerrit.wikimedia.org/r/255318 [02:40:31] (CR) EBernhardson: "This is based off of https://gerrit.wikimedia.org/r/228010." [analytics/refinery] - https://gerrit.wikimedia.org/r/255318 (owner: EBernhardson) [02:40:47] (PS2) EBernhardson: Add page_id to intermediate pageview [analytics/refinery] - https://gerrit.wikimedia.org/r/255318 (https://phabricator.wikimedia.org/T116023) [05:17:42] Analytics-Backlog, Patch-For-Review: Add page_id to pageview_hourly when present in webrequest x_analytics header - https://phabricator.wikimedia.org/T116023#1830543 (EBernhardson) Ran a few exploratory queries which eventually led me to: ``` SELECT * FROM (select pageview_info['project'] as project, page... [05:20:25] (PS3) EBernhardson: Add page_id to intermediate pageview [analytics/refinery] - https://gerrit.wikimedia.org/r/255318 [05:25:37] (PS4) EBernhardson: Add page_id to intermediate pageview [analytics/refinery] - https://gerrit.wikimedia.org/r/255318 [05:25:42] (CR) EBernhardson: "after further investigation with exploratory queries in hive i decided to make page_id instead be page_ids array. further reasoning i" [analytics/refinery] - https://gerrit.wikimedia.org/r/255318 (owner: EBernhardson) [05:26:48] Analytics-Backlog, Patch-For-Review: Add page_id to pageview_hourly when present in webrequest x_analytics header - https://phabricator.wikimedia.org/T116023#1830546 (EBernhardson) I could see there being an argument to instead put x_analytics_map['page_id'] into the group by clause, any thoughts/opinions? 
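A minimal HiveQL sketch of the option raised at [05:26:48], i.e. keeping page_id as a single field and putting x_analytics_map['page_id'] into the GROUP BY. The fields x_analytics_map, pageview_info, is_pageview and the wmf.webrequest partitions come from the refined webrequest table discussed here; the real aggregation lives in the refinery pageview_hourly job, so this is only an approximation of that option, not the actual patch:

```
-- Hedged sketch: approximates a pageview_hourly-style aggregation with page_id
-- kept as a plain field in the GROUP BY; not the actual refinery query.
SELECT
    pageview_info['project']   AS project,
    x_analytics_map['page_id'] AS page_id,
    COUNT(1)                   AS view_count
FROM wmf.webrequest
WHERE webrequest_source IN ('text', 'mobile')
  AND year = 2015 AND month = 11 AND day = 11 AND hour = 11
  AND is_pageview = TRUE
GROUP BY
    pageview_info['project'],
    x_analytics_map['page_id'];
```

As joal notes later in the log, the table is stored as Parquet (columnar), so the extra grouped column should not grow the data much either way.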
[09:23:16] (PS1) Addshore: Fix 2 possible undeifned index notices [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255345 [09:23:45] (CR) Addshore: [C: 2 V: 2] Fix 2 possible undeifned index notices [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255345 (owner: Addshore) [09:26:06] Analytics-Backlog, Patch-For-Review: Add page_id to pageview_hourly when present in webrequest x_analytics header - https://phabricator.wikimedia.org/T116023#1830688 (JAllemandou) Thanks a lot Erik for the investigation ! I think I'd rather go for adding page_id as a single field and group by it. It makes... [09:28:55] Analytics-Kanban: Encapsulating the retrieval of schemas from local depot from KafkaSchemaRegistry [3 pts] - https://phabricator.wikimedia.org/T119211#1830690 (JAllemandou) --> Patch changed: https://gerrit.wikimedia.org/r/#/c/251267/12 [09:39:10] Analytics-Kanban, operations, Database: db1046 running out of disk space - https://phabricator.wikimedia.org/T119380#1830707 (jcrespo) I see two accelerations, on the 27 sep and on the 7 nov. There could be many explanations, from long running transactions being executed there, to the schema changes don... [10:46:51] (PS1) Addshore: Script for AVG and MAX item and prom page sizes [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255356 (https://phabricator.wikimedia.org/T119602) [10:47:41] (PS2) Addshore: Script for AVG and MAX item and prom page sizes [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255356 (https://phabricator.wikimedia.org/T119602) [10:53:11] (CR) DCausse: Add 2 payloads map fields to CirrusSearchRequestSet avro schema (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/252956 (https://phabricator.wikimedia.org/T118570) (owner: DCausse) [11:03:14] (PS1) Addshore: Simplify cron by using super scripts! [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255362 [11:03:43] (CR) Addshore: [C: 2 V: 2] Script for AVG and MAX item and prom page sizes [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255356 (https://phabricator.wikimedia.org/T119602) (owner: Addshore) [11:03:57] (CR) Addshore: [C: 2 V: 2] Simplify cron by using super scripts! 
[analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255362 (owner: Addshore) [11:24:37] hi a-team [12:14:48] (PS1) Addshore: Add terms_by_lang script [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255367 [12:15:33] (PS2) Addshore: Add terms_by_lang script [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255367 (https://phabricator.wikimedia.org/T119608) [12:21:15] (CR) Addshore: [C: 2 V: 2] Add terms_by_lang script [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255367 (https://phabricator.wikimedia.org/T119608) (owner: Addshore) [12:28:41] (PS1) Addshore: Add properties_by_datatype script [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255368 (https://phabricator.wikimedia.org/T119603) [12:29:52] (PS2) Addshore: Add properties_by_datatype script [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255368 (https://phabricator.wikimedia.org/T119603) [12:30:17] (CR) Addshore: [C: 2 V: 2] Add properties_by_datatype script [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255368 (https://phabricator.wikimedia.org/T119603) (owner: Addshore) [12:31:03] Analytics-Kanban: Write hive code doing pageview data anonimisation with two tables [13 pts] {hawk} - https://phabricator.wikimedia.org/T118838#1831090 (mforns) a:mforns [12:47:54] Analytics-Tech-community-metrics, DevRel-November-2015: Automated generation of (Git) repositories for Korma - https://phabricator.wikimedia.org/T110678#1831113 (Aklapper) @Dicortazar: Still wondering: >>! In T110678#1809233, @Aklapper wrote: > How got that ''initial'' list of Git repos in [[ https://gith... [12:50:01] Analytics-Tech-community-metrics, DevRel-November-2015: "Tickets" (defunct Bugzilla) vs "Maniphest" sections on korma are confusing - https://phabricator.wikimedia.org/T106037#1831119 (Aklapper) @Dicortazar: Reply to T106037#1701576 and in T114636 about the correct approach (place in code) is welcome. [13:20:09] Hi mforns :) [13:20:14] hey joal [13:20:17] :] [13:20:27] how is Lino doing? [13:20:35] Just in bed :) [13:20:47] I managed to have 2 hours this morning, hopefully two now :) [13:20:54] aha [13:21:11] And he's ok: compared to the previous one a few weeks ago, this one involves no fever [13:21:21] ah, cool [13:21:39] How about you ? [13:21:45] I'm fine [13:21:49] How is sanitizxation doing ? [13:22:22] this morning I had an episode with the tawn hall and the police [13:22:33] hoo ? [13:22:56] our tawn is purging the trees in the square near my home [13:23:08] this morning [13:23:48] and they are doing a horrible purge, they were leaving just the tree-trunks and a couple major branches... [13:24:12] so I went to tawn hall, spoke with the counselor, called the police,,, etc [13:24:25] k [13:24:43] but yea, nothing changed [13:24:46] it's sad [13:24:48] you managed to have change their way of doing ? [13:24:56] mwarf ... Sad it is :( [13:24:59] Poor trees [13:25:29] no, the thing is, I spoke with the chief gardener of the city and he said he agreed it was a tree-murder [13:25:59] and that he got orders to do so... and he was doing it with a broken heart... 
[13:26:23] O.o [13:26:36] I wonder who gives orders [13:26:49] me too [13:27:01] anyway [13:27:12] hm hm [13:27:14] Nor nice [13:28:11] re sanitization: I looked at the pageview_hourly table to see what are the dangerous fields, I looked at nuria's code to understand it, and was about to execute some of it [13:28:31] ok great :) [13:31:26] joal, I don't understand all of nuria's code though [13:31:36] do you want to cave to review that together? [13:31:40] sure [13:31:43] omw [14:06:15] Analytics-Tech-community-metrics, DevRel-November-2015: Explain / sort out / fix SCM repository number mismatch on korma - https://phabricator.wikimedia.org/T116483#1831256 (Aklapper) >>! In T116484#1807745, @Dicortazar wrote: > This seems to be an error when counting activity in empty repositories. That... [14:46:56] heya joal, is donate.wikimedia.org in pageview_hourly? [15:02:50] ottomata: normaly nope [15:02:56] Hi ottomata :) [15:03:12] ottomata: https://mail.google.com/mail/u/2/#search/donate.wikipedia.org/14f70a1ae31f5089 [15:03:22] ottomata: That's not what I merant :) [15:03:59] ottomata: https://gerrit.wikimedia.org/r/#/c/232177/ [15:04:02] Better :) [15:20:17] ottomata: You've seen my answer? [15:21:16] Arf, baby waking-up a-team, will be back when my wife arrives (normally soon) [15:31:09] Analytics, Multimedia, UploadWizard: Collect data on which unsupported file formats users are trying to upload - https://phabricator.wikimedia.org/T77796#1831474 (MarkTraceur) Open>declined a:MarkTraceur I'm going to close this as some combination of the following: * Not a good measure of wh... [15:33:42] ah, thanks joal! [15:46:35] ottomata: graphite check or later? [15:48:59] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [30.0] [15:50:31] joal am about to start looking at it [15:50:39] Thanks mate [15:50:45] If I can help let me know [15:50:59] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0] [15:51:10] k [15:57:13] Hey milimetric :) [15:57:18] hey joal [15:57:24] Don't worry, I'm fine having restarted one job ;) [15:57:34] I *suck* at being responsive to ops issues [15:57:41] huhuhu :) [15:58:30] ready for PV retro milimetric ? [15:59:15] mmmm yeah, sure [16:02:06] milimetric: on retro [16:26:22] ottomata: grafana kafka dashboard broken for me, is it for you? [16:27:23] yeah, godog temporarily disabled it because he suspected it was causing too much load on graphite [16:27:25] he's not sure thouhg [16:27:40] k ottomata [16:27:42] thx [16:34:26] Analytics-Backlog, MediaWiki-extensions-WikimediaEvents, The-Wikipedia-Library, Wikimedia-General-or-Unknown: Implement Schema:ExternalLinkChange - https://phabricator.wikimedia.org/T115119#1831678 (Sadads) @Legoktm can we provide you any more information? Did your initial investigation of impleme... 
[16:38:50] (PS1) Addshore: Add active_users tracking back [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255407 [16:39:03] (CR) Addshore: [C: 2 V: 2] Add active_users tracking back [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255407 (owner: Addshore) [16:48:51] Analytics, Beta-Cluster-Infrastructure, Services, WorkType-NewFunctionality: Set up AQS in Beta - https://phabricator.wikimedia.org/T116206#1831730 (Nuria) @mobrovac: Taking to #releng doesn't seem that they own this item, they help other teams to set up stuff in beta cluster but the setup (puppet... [16:51:10] Analytics-Backlog, Beta-Cluster-Infrastructure, Services, WorkType-NewFunctionality: Set up AQS in Beta - https://phabricator.wikimedia.org/T116206#1831740 (Nuria) [17:00:18] dcausse: yt? [17:00:35] Analytics-Backlog, Beta-Cluster-Infrastructure, Services, Scap3, WorkType-NewFunctionality: Set up AQS in Beta - https://phabricator.wikimedia.org/T116206#1831760 (mobrovac) @Nuria, this is part of the effort of creating a new deployment tool (called Scap3). RelEng and Services will make the init... [17:01:02] Analytics-Backlog, Beta-Cluster-Infrastructure, Services, Scap3, WorkType-NewFunctionality: Set up AQS in Beta - https://phabricator.wikimedia.org/T116206#1831762 (hashar) I am talking for myself, but I am pretty sure anyone from #releng will be happy to assist provide support. But do not expect... [17:02:52] (CR) Nuria: Implement ArraySum UDF (2 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254452 (owner: EBernhardson) [17:06:54] (PS4) Madhuvishy: Add new form for launching the global metrics report [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/254887 (https://phabricator.wikimedia.org/T117289) [17:11:12] Analytics-Backlog, Beta-Cluster-Infrastructure, Services, Scap3, WorkType-NewFunctionality: Set up AQS in Beta - https://phabricator.wikimedia.org/T116206#1831788 (Nuria) >Once the initial deployment is completed, we will show you how to keep it updated, at which point ownership will be transferr... [17:15:41] Analytics-Kanban, operations, Database: db1046 running out of disk space - https://phabricator.wikimedia.org/T119380#1831799 (Milimetric) >>! In T119380#1830707, @jcrespo wrote: > I see two accelerations, on the 27 sep and on the 7 nov. There could be many explanations, from long running transactions be... [17:16:48] Analytics-Backlog, Beta-Cluster-Infrastructure, Services, Scap3, WorkType-NewFunctionality: Set up AQS in Beta - https://phabricator.wikimedia.org/T116206#1831803 (mobrovac) >>! In T116206#1831788, @Nuria wrote: >>Once the initial deployment is completed, we will show you how to keep it updated,... [17:19:49] Analytics-Backlog, Beta-Cluster-Infrastructure, Services, Scap3, WorkType-NewFunctionality: Set up AQS in Beta - https://phabricator.wikimedia.org/T116206#1831808 (hashar) Nice summary @mobrovac. So it seems AQS on beta is "just" waiting for {T116335} / {T114999}, isn't it? [17:21:35] Analytics-Backlog, Beta-Cluster-Infrastructure, Services, Scap3, WorkType-NewFunctionality: Set up AQS in Beta - https://phabricator.wikimedia.org/T116206#1831811 (mobrovac) >>! In T116206#1831808, @hashar wrote: > So it seems AQS on beta is "just" waiting for {T116335} / {T114999}, isn't it? On... 
[17:21:49] Analytics, Deployment-Systems, Services, operations, Scap3: Use Scap3 for deploying AQS - https://phabricator.wikimedia.org/T114999#1831813 (mobrovac) [17:21:52] Analytics-Backlog, Beta-Cluster-Infrastructure, Services, Scap3, WorkType-NewFunctionality: Set up AQS in Beta - https://phabricator.wikimedia.org/T116206#1831812 (mobrovac) [17:40:44] (PS7) Madhuvishy: [WIP] Setup celery task workflow to handle running reports for the Global API [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/253750 (https://phabricator.wikimedia.org/T118308) [17:40:46] (PS5) Madhuvishy: Add new form for launching the global metrics report [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/254887 (https://phabricator.wikimedia.org/T117289) [17:41:32] Analytics-General-or-Unknown, Database, Patch-For-Review: Create a table in labs with replication lag data - https://phabricator.wikimedia.org/T71463#1831862 (jcrespo) Documented on: https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Database#Identifying_lag [17:46:15] nuria: yep [17:46:35] dcausse: did you see my comment about hive updates of avro schema? [17:55:52] Analytics-Kanban, operations, Database: db1046 running out of disk space - https://phabricator.wikimedia.org/T119380#1831889 (jcrespo) > And we can't really drop old revision_ids for schemas in an automated way Actually, I wasn't suggesting that, just doing regularly that even if the revision_id doesn'... [17:56:15] (CR) Nuria: Add 2 payloads map fields to CirrusSearchRequestSet avro schema (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/252956 (https://phabricator.wikimedia.org/T118570) (owner: DCausse) [17:56:51] nuria: yes, thanks for testing alter table [17:57:13] joal: see comments, i am merging table creation with avro schema as this table does not exist yet [17:57:14] https://gerrit.wikimedia.org/r/#/c/252956/1/hive/mediawiki/cirrus-searchrequest-set/create_CirrusSearchRequestSet_table.hql [17:57:20] (CR) EBernhardson: Implement ArraySum UDF (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254452 (owner: EBernhardson) [17:57:36] dcausse: i also tested you can *ahem* select past data [17:57:47] great [17:57:58] dcausse: cause you know... i wouldn't be surprised that is like great update but hey , no select no more [17:58:06] :) [17:58:17] (CR) Nuria: [C: 2] Add 2 payloads map fields to CirrusSearchRequestSet avro schema [analytics/refinery] - https://gerrit.wikimedia.org/r/252956 (https://phabricator.wikimedia.org/T118570) (owner: DCausse) [17:59:27] Analytics-Backlog, Patch-For-Review: Add page_id to pageview_hourly when present in webrequest x_analytics header - https://phabricator.wikimedia.org/T116023#1831901 (EBernhardson) yea that can work, i suppose i just lean towards keeping data small. But it's easier to query and no harm to have it separated... [18:01:06] nuria: huge thanks for your help [18:05:18] dcausse: doing my job you mean ? jaja ... [18:07:30] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 26.67% of data above the critical threshold [30.0] [18:07:35] nuria: can you change the comment before merging ? [18:07:43] joal: yes [18:08:17] joal: GOOD catch [18:08:18] cause comment says the table needs to be recreated at schema change [18:08:23] ;) [18:09:02] mforns: batcave for a minute for me to show you where I am ? [18:09:09] joal, sure! 
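For context on the [17:46]–[17:58] exchange about updating the Avro-backed CirrusSearchRequestSet table without recreating it: with Avro-backed Hive tables, a schema update is typically applied by pointing the table at the new .avsc revision via ALTER TABLE, after which older partitions should still be selectable (which is what the "select past data" test was confirming). The statement below is only a sketch of what that test might have looked like; the database/table name and schema path are assumptions, and whether the production table uses avro.schema.url or avro.schema.literal depends on the create_CirrusSearchRequestSet_table.hql patch itself:

```
-- Hedged sketch (not the tested statement): swap in a new Avro schema revision,
-- then confirm data written under the previous schema is still readable.
ALTER TABLE wmf_raw.CirrusSearchRequestSet          -- table name assumed
SET TBLPROPERTIES (
    'avro.schema.url' = 'hdfs:///path/to/avro_schema_repo/CirrusSearchRequestSet/111448028943.avsc'  -- hypothetical path
);

SELECT *
FROM wmf_raw.CirrusSearchRequestSet
WHERE year = 2015 AND month = 11 AND day = 11
LIMIT 10;
```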
[18:09:12] omw [18:09:31] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0] [18:11:22] (PS5) EBernhardson: Add page_id to intermediate pageview [analytics/refinery] - https://gerrit.wikimedia.org/r/255318 [18:13:10] (PS2) Nuria: Create CirrusSearchRequestSet table [analytics/refinery] - https://gerrit.wikimedia.org/r/252956 (https://phabricator.wikimedia.org/T118570) (owner: DCausse) [18:14:13] dcausse: please take a look at commit msg to see it makes sense: https://gerrit.wikimedia.org/r/#/c/252956/ [18:14:49] nuria: perfect [18:15:00] dcausse, joal: k merging [18:15:13] ebernhardson: awesome nuria thanks [18:16:11] (CR) Nuria: [V: 2] Create CirrusSearchRequestSet table [analytics/refinery] - https://gerrit.wikimedia.org/r/252956 (https://phabricator.wikimedia.org/T118570) (owner: DCausse) [18:17:35] dcausse: the last one is this one: https://gerrit.wikimedia.org/r/#/c/252958/6/refinery-camus/src/main/resources/avro_schema_repo/CirrusSearchRequestSet/111448028943.avsc [18:17:44] dcausse: which you also tested right? [18:18:30] ebernhardson: about the pageId addition, we have talked in standup, and it would be awesome to actually add it as a core field to the refined webrequest table before using in the pageview_hourly table [18:18:42] ebernhardson: Do you want me to pick up your patch and modify ? [18:19:00] And by teh way ebernhardson : Thanks a lot for all the work you are putting in on that :) [18:19:09] ebernhardson: That really helps us moving faster :) [18:23:11] nuria: yes [18:24:12] (CR) Joal: [C: -1] "As said on IRC, we'd rather add the page_id field to the refined webrequest table ([oozie|hive]/webrequest/refine/), and then use it in pa" (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/255318 (owner: EBernhardson) [18:24:28] dcausse: ok, merging [18:28:58] joal: i can probably add it to core field fairly easily, mostly it takes longest to test things [18:29:09] ebernhardson: yessir, you're right [18:29:30] AS I said ebernhardson : thanks again, that's really awesome to have help like that :) [18:29:58] sure, i needed something to do :) with mediawiki code freeze in place we are getting back to working out how to integrate analytics into search [18:30:04] things that take awhile and don't require deployments [18:30:17] ebernhardson: not front delpoys anyway ;) [18:30:31] yup [18:31:01] ebernhardson: If you need help, let us know :) [18:31:18] But from what I have seen ebernhardson, it's probably not the case :) [18:31:21] sure, i'll let you know. it's good for me to learn all this stuff anyways, digging through helps me understand what went wrong later when it breaks [18:31:32] true ebernhardson [18:33:24] (PS7) Nuria: Add CirrusSearchRequestSet avro schema to local schema repo [analytics/refinery/source] - https://gerrit.wikimedia.org/r/252958 (https://phabricator.wikimedia.org/T118570) (owner: DCausse) [18:33:42] (CR) Nuria: [C: 2] Add CirrusSearchRequestSet avro schema to local schema repo [analytics/refinery/source] - https://gerrit.wikimedia.org/r/252958 (https://phabricator.wikimedia.org/T118570) (owner: DCausse) [18:37:06] one thing i havn't found, how does page_id get into the x-analytics header? 
I was expecting to find it in the varnish vcl but didn't find anything: https://github.com/wikimedia/operations-puppet/blob/production/templates/varnish/analytics.inc.vcl.erb#L157-L197 [18:39:49] ebernhardson: mediawiki does it [18:40:14] it adds it to the response header [18:40:23] ottomata: ahh, ok [18:44:20] COooOl [18:44:21] http://blog.cloudera.com/blog/2015/11/cloudera-enterprise-5-5-is-now-generally-available/?elq=86ee7a28799b4b518d59d7462f83680f&elqCampaignId=1126&elqaid=2429&elqat=1&elqTrackId=83a24049dd7a489dac4536f98aa1985e [18:44:24] • Apache Spark 1.5 (including Spark SQL, DataFrames API, and MLlib per above) [18:46:13] Analytics-Backlog, Analytics-Cluster: Upgrade to CDH 5.5 - https://phabricator.wikimedia.org/T119646#1832032 (Ottomata) NEW [18:50:05] ottomata: GIMEEEEEE ! [18:50:07] :D [18:50:21] :D [18:50:24] nicee [18:50:31] are we gonna upgrade [18:50:42] joal, so far so good [18:50:42] https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=cp1054&service=Varnishkafka+Delivery+Errors+per+minute [18:50:47] madhuvishy: uhhh ya eventually! :) [18:51:20] ottomata: maybe we'll wait a few weeks, not to suffer the fistrt bugs [18:51:28] yes yes of course [18:51:50] ottomata: ok, this graphite thingy seems to work :) [18:51:58] * joal is happy :) [18:55:12] Analytics-Backlog, Patch-For-Review: Add page_id to pageview_hourly when present in webrequest x_analytics header - https://phabricator.wikimedia.org/T116023#1832110 (JAllemandou) And an interesting point is that, since we are using columnar format (parquet), data won't grow that much :) [19:03:46] hi folks! I want to store some persistent data on the stats cluster (certain QS vars from requests to / on donatewiki with date and hour) in order to aggregate unique hits across days without massive queries on webrequest [19:03:57] can I just create an ejegg database in hive? [19:04:29] ejegg, yes! [19:04:30] you can [19:04:31] and should. [19:04:32] :) [19:04:36] awesome, thanks! [19:05:45] (PS1) Addshore: Enable running getentities script for old days [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255427 [19:06:01] (CR) Addshore: [C: 2 V: 2] Enable running getentities script for old days [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255427 (owner: Addshore) [19:13:35] milimetric: we have deadly results with mforns_gym :) [19:13:40] milimetric: ping when you can chat, i came to the office [19:13:48] deadly?! [19:13:50] :) [19:13:52] Will show tomorrow :) [19:13:58] or cave if you want [19:14:02] the results are dying! what if they die before tomorrow! [19:14:08] * milimetric running to the cave [19:14:09] hehe :) [19:14:20] joal: okay i'll join in a minute :) [19:14:30] (madhuvishy we can talk after) [19:14:31] Hi! Does anyone know what percentage of users might have cookies disabled or otherwise not available at all? [19:15:08] (I don't mean only accepting session cookies 8p) Thx in advance! [19:21:36] (PS1) Addshore: Sent getclaims stat to correct date [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255429 [19:21:52] (CR) Addshore: [C: 2 V: 2] Sent getclaims stat to correct date [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255429 (owner: Addshore) [19:23:11] a-team I'm off ! [19:23:17] See you tommorow or monday :) [19:23:28] laters! [19:23:32] ciao [19:25:02] one more question - when webrequest_source is 'mobile', is that just the mobile content service? I don't see anything with that source and host 'donate.wikimedia.org'. 
Will I lose donatewiki requests from mobile devices if I limit to the 'text' partition? [19:25:05] AndyRussG: of users we do not know and there is no way to find out, of requests it possible to find out doing some digging, we will probably have those numbers next month [19:25:30] (uri_host = 'donate.wikimedia.org', that is) [19:25:34] ejegg: i doubt donathe is served by either [19:25:45] it is probably served by "other" [19:25:51] so no text no mobile [19:25:57] hmm, I saw 'text' on the first ten from a random hour yesterday [19:26:05] ejegg: but you will need to do some quries to find out [19:26:28] ejegg: ok, i guess there is no pattern [19:26:30] but it's going to be all in one cluster, right? [19:26:39] I mean for that specific host? [19:27:00] ah, lemme see if the physical host is consistent too... [19:27:43] nuria: K! thanks so much :) [19:28:30] ok, it's a few different hostnames but always 'text' partition [19:35:49] (CR) Nuria: "Given how well tested this is I think we should merge all the daily loading code and remove the hourly one that is not used (it can live o" [analytics/refinery] - https://gerrit.wikimedia.org/r/236224 (owner: Joal) [19:43:24] (Abandoned) Florianschmidtwelzow: Add backtotop-click and remove fontchanger menu [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/247045 (https://phabricator.wikimedia.org/T98701) (owner: Florianschmidtwelzow) [19:48:11] (PS1) Addshore: Split pages_by_namespace sql into own file [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255433 [19:48:22] (CR) Addshore: [C: 2] Split pages_by_namespace sql into own file [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255433 (owner: Addshore) [19:48:39] (Merged) jenkins-bot: Split pages_by_namespace sql into own file [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255433 (owner: Addshore) [19:51:27] Analytics-Cluster, Analytics-Kanban: Estimate number of users (or requests) that have cookies off (due to fresh session or incognito mode) - https://phabricator.wikimedia.org/T119653#1832269 (Nuria) NEW a:Nuria [19:56:58] Analytics-General-or-Unknown, Database, Patch-For-Review: Create a table in labs with replication lag data - https://phabricator.wikimedia.org/T71463#1832288 (jcrespo) Open>Resolved It only took a year, but it was finally done. [20:13:06] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 26.67% of data above the critical threshold [30.0] [20:15:06] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0] [20:40:37] ottomata: around? :D [20:40:58] ja [20:40:59] hiya [20:41:03] oh hallooo! [20:41:24] PHP Warning: fopen(): Unable to find the wrapper "zlib" - did you forget to enable it when you configured PHP, where can one find in puppet the place to enable this ? ;) [20:44:14] on stat1002 that is, but I assume its a blanket thing for the analytics nodes! [20:46:12] probably because they still have zend, and not much else does these days :) [20:46:49] php!? [20:46:55] whatcha doing in php on stat1002? 
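Putting ejegg's two questions above together (a personal Hive database for persistent aggregates, and donate.wikimedia.org apparently always landing in the 'text' partition), a rough sketch of that setup is below. Only the ejegg database name, wmf.webrequest and its uri_host/webrequest_source fields come from the discussion; the table name and columns are hypothetical, and the open caveat about possibly missing mobile-served requests still applies:

```
-- Hedged sketch of a personal database plus a small per-hour aggregate table,
-- so later reads don't need to re-scan wmf.webrequest. Names are hypothetical.
CREATE DATABASE IF NOT EXISTS ejegg;

CREATE TABLE IF NOT EXISTS ejegg.donatewiki_requests (
    uri_query STRING,
    hits      BIGINT
)
PARTITIONED BY (year INT, month INT, day INT, hour INT);

INSERT OVERWRITE TABLE ejegg.donatewiki_requests
PARTITION (year = 2015, month = 11, day = 25, hour = 0)
SELECT
    uri_query,
    COUNT(1) AS hits
FROM wmf.webrequest
WHERE webrequest_source = 'text'            -- per the spot check above; may miss other sources
  AND uri_host = 'donate.wikimedia.org'
  AND year = 2015 AND month = 11 AND day = 25 AND hour = 0
GROUP BY uri_query;
```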
[20:47:09] although, since i was already logged into stat1003 i just checked...and it has zlib [20:48:12] yes php, the evilest of them all, reading stuff from the gz api dumps [20:48:13] addshore: maybe you're doing something odd, because i get zlib on stat1002: php -r 'var_dump(extension_loaded("zlib"));' [20:48:29] hmmm [20:48:40] interesting, [20:49:45] (CR) Joal: "All the code present in this patch is currently in use." [analytics/refinery] - https://gerrit.wikimedia.org/r/236224 (owner: Joal) [20:49:51] (CR) Milimetric: [C: 2] Add new form for launching the global metrics report (1 comment) [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/254887 (https://phabricator.wikimedia.org/T117289) (owner: Madhuvishy) [20:50:21] ebernhardson: [20:50:22] php -r 'fopen( "zlib:///a/mw-log/archive/api/api.log-20151111.gz", "r" );' [20:50:46] *goes to make sure he is using the correct syntax* .... [20:51:15] hmm, yup that one has the error. oddly `php -i` claims to have: Registered PHP Streams => https, ftps, compress.zlib, compress.bzip2, php, file, glob, data, http, ftp, phar, zip [20:51:21] so something is odd..not sure what yet [20:51:50] php -r 'fopen( "compress.zlib:///a/mw-log/archive/api/api.log-20151111.gz", "r" );' works >.> [20:51:52] :D [20:51:54] :) [20:51:59] Teamwork, many thanks! [20:56:01] I'm replace a bit of an evil bash script with a slightly less evil php script ;) [21:05:18] (PS7) EBernhardson: Implement ArraySum UDF [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254452 [21:20:35] (PS3) Addshore: New version of log scanner [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255440 [21:20:53] (CR) Addshore: [C: 2 V: 2] New version of log scanner [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255440 (owner: Addshore) [21:21:02] (CR) Nuria: [C: 2] "Ah! my mistake, i though we did not have hourly loaded at all. Let's merge then." [analytics/refinery] - https://gerrit.wikimedia.org/r/236224 (owner: Joal) [21:32:41] milimetric: can i add a delete cohort method to the cohort service? [21:32:52] madhuvishy: definitely, yea [21:33:00] i see there are extensive delete methods in the controller though [21:33:37] milimetric: there are two, for viewer and owner - not sure why they are not already in the CohortService [21:34:18] madhuvishy: oh :/ right, those should be factored into the service [21:34:28] I think the service just came after the methods and we never moved them [21:34:37] milimetric: ah hmm, okay [21:34:55] it is tricky to delete though, because of all the relationships a cohort record has [21:35:22] milimetric: yeah [21:35:35] https://github.com/wikimedia/analytics-wikimetrics/blob/master/wikimetrics/controllers/cohorts.py#L316 [21:35:39] seems to do everything [21:36:23] yep. seems fairly easy to refactor into the service [21:40:36] i was attempting to test some changes i'm making to refinery that generates wmf.webrequest, but attempting to do a few tests is throwing exceptions: select x_analytics from wmf_raw.webrequest where year=2015 and month=11 and day=11 and hour=11 limit 10; [21:40:52] is there something special i need to do to read wmf_raw.webrequest, or perhaps i'm just not supposed to test that way? 
[21:41:31] i get: FAILED: RuntimeException org.apache.hadoop.hive.ql.metadata.HiveException: Failed with exception nulljava.lang.NullPointerException [21:45:30] hm, ebernhardson [21:45:31] try [21:45:33] ADD JAR /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar; [21:46:45] same error :( [21:47:27] oh, i had already added that one. I added both jars from refinery_webrequest.hql: hive-hcatalog-core.jar and refinery-hive-0.0.20.jar [21:54:03] ahh, it looks to be https://issues.apache.org/jira/browse/HIVE-10437 which is fixed in 1.2.0, but we are on 1.1.0 i believe [21:54:10] just have to force it to use a map/reduce job [21:59:26] huh! [21:59:37] yeah i guess select some fields or something might make it do that [21:59:42] oh you are [21:59:43] hm [21:59:45] count? [21:59:47] dunno weird. [22:01:15] haha [22:01:16] https://pixelastic.github.io/pokemonorbigdata/ [22:03:31] ottomata: this is addictive [22:03:58] i'm like, tokutek - big data - really [22:04:01] ? [22:22:40] took 5 tries but i finally got one right :) [22:29:02] (PS1) Addshore: api log scanner format whitelist [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255464 [22:29:26] (CR) Addshore: [C: 2 V: 2] api log scanner format whitelist [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255464 (owner: Addshore) [22:30:17] ebernhardson: Hadoop was the only one i was sure about [22:31:11] (PS1) Addshore: Remove old getclaims script [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255465 [22:31:25] (CR) Addshore: [C: 2 V: 2] Remove old getclaims script [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255465 (owner: Addshore) [22:32:59] (PS6) EBernhardson: Add page_id to webrequest and pageview_hourly [analytics/refinery] - https://gerrit.wikimedia.org/r/255318 [23:01:22] (PS8) Madhuvishy: [WIP] Setup celery task workflow to handle running reports for the Global API [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/253750 (https://phabricator.wikimedia.org/T118308) [23:45:26] madhuvishy: yt? [23:45:43] nuria: yes! [23:45:56] madhuvishy: take a look at this : https://wikitech.wikimedia.org/wiki/Analytics/Unique_clients/Last_access_solution/BotResearch [23:46:16] nuria: ooh, will read in a bit! [23:46:25] madhuvishy: i was looking into nocookie traffic to identify bots versus people [23:46:31] madhuvishy: ya norush
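Closing note on the NullPointerException ebernhardson hit at [21:41] when selecting from wmf_raw.webrequest: per the diagnosis at [21:54] (HIVE-10437, fixed in Hive 1.2.0 while the cluster runs 1.1.0), the usual way to "force it to use a map/reduce job" is to turn off fetch-task conversion for the session. This is a best guess at the workaround, not something confirmed in the log:

```
-- Hedged workaround sketch: disable fetch-task conversion so the simple SELECT
-- runs as a map/reduce job instead of a local fetch, sidestepping the NPE.
SET hive.fetch.task.conversion = none;

ADD JAR /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar;

SELECT x_analytics
FROM wmf_raw.webrequest
WHERE year = 2015 AND month = 11 AND day = 11 AND hour = 11
LIMIT 10;
```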