[00:27:21] (CR) Ori.livneh: [V: 2] Allow to specify date to compute the report for [analytics/blog] - https://gerrit.wikimedia.org/r/180158 (owner: QChris) [00:27:31] (CR) Ori.livneh: [V: 2] Add basic tox setup [analytics/blog] - https://gerrit.wikimedia.org/r/180159 (owner: QChris) [00:27:41] (CR) Ori.livneh: [C: 2 V: 2] Move blogreport code behind a __main__ guard [analytics/blog] - https://gerrit.wikimedia.org/r/180160 (owner: QChris) [00:27:50] (CR) Ori.livneh: [C: 2 V: 2] Add tests for parsing string to date [analytics/blog] - https://gerrit.wikimedia.org/r/180161 (owner: QChris) [00:40:12] (PS1) Ori.livneh: Add overall counts for URLs [analytics/blog] - https://gerrit.wikimedia.org/r/180369 [01:13:38] leila, yt? [01:14:16] nuria__: yes. [01:14:24] nuria__: what's up? [01:15:00] i am trying to productionize Ironholds' code to calculate app uniques from the webrequest table in hive [01:15:34] leila: for perf reasons i would like to avoid scanning 1 month of data if possible [01:16:23] leila: so i want to do some random sampling that will render data with an appropriate level of confidence [01:16:32] leila: so far makes sense? [01:17:11] question: we want the number of uniques per month? [01:17:27] leila: yes [01:17:42] leila: so it is a 'counting' problem [01:17:52] leila: no more complex than that [01:17:54] you want to do this in hive or not necessarily? [01:18:35] leila: we will do it with hive random sampling but first i wanted to ask if the approximation of [01:19:25] sample size like simple: http://en.wikipedia.org/wiki/Binomial_distribution#Confidence_intervals [01:19:56] (I'm trying to understand if approximation is needed. it may be needed if you stick to hive. I'm wondering if you consider other options like pig or Oozie at the moment.
[01:19:58] sorry, a simple calculation [01:20:47] this will run with oozie, but my concern is not to have to look at 1 month of data if it is not needed, say, if i can calculate [01:20:59] I understand [01:21:16] the number with a confidence of 95% looking at -let's say- 60% of the dataset [01:21:33] the query consumes less time & resources [01:21:36] I understand. please continue. [01:22:11] so trying to remember my stats from school i was trying to estimate the sample size for a "counting" problem [01:22:36] and i think [01:23:51] the formula was pretty simple (if you had a good guess as to the true value, which we do) [01:24:10] https://www.irccloud.com/pastebin/CXJ5XBF1 [01:24:35] i just cut and pasted, hopefully this makes sense [01:24:57] nuria__: there are multiple ways for doing this. I need to think about it for some time and let you know. Would it be good if I let you know before eod today? [01:25:17] leila: no rush at all, tomorrow is just as good [01:25:52] sounds good. I'll email you then, hopefully by eod. [01:26:02] leila: Many thanks!!! [01:26:08] np, nuria__. [01:28:54] nuria__: how long is the unique ID? [01:29:12] per record in webrequest table? [01:29:16] taht one? [01:29:16] yup [01:29:18] *that [01:30:12] Analytics-Visualization, Analytics-Engineering: PM shares a deep link into Limn Dashboard - https://phabricator.wikimedia.org/T78743#852299 (kevinator) NEW [01:31:01] boy, i imagine that our id is the partition key year/month/day/hour but this otto will know [01:31:19] no worries. I'll look at it. [01:31:26] nuria__: I'm signing off. will email later. [01:31:30] ciao [01:31:36] nuria__: no rush please ciao [03:47:34] Phabricator, Engineering-Community, Analytics-Tech-community-metrics: Monthly report of total / active Phabricator users - https://phabricator.wikimedia.org/T1003#852435 (Aklapper) >>! In T1003#851771, @chasemp wrote: > can I get on this list? You need to bribe mutante. >>! In T1003#851812, @Qgil wrote: > I...
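The pastebin nuria links above isn't preserved in this log, so this is a hedged reconstruction of the kind of calculation being discussed: the standard normal-approximation sample-size formula for a binomial "counting" problem (the one the Wikipedia link describes). Function name and defaults are illustrative, not from the original paste:

```python
import math

def sample_size(p_guess, margin, z=1.96, population=None):
    """Sample size for estimating a proportion via the normal
    approximation to the binomial: n = z^2 * p * (1 - p) / e^2.
    `p_guess` is the prior guess at the true proportion, `margin`
    the desired margin of error, `z` the z-score for the confidence
    level (1.96 ~ 95%). Applies the finite-population correction
    when `population` is given."""
    n = z * z * p_guess * (1.0 - p_guess) / (margin * margin)
    if population is not None:
        n = n / (1.0 + (n - 1.0) / population)
    return math.ceil(n)

# Worst-case guess (p = 0.5) at 95% confidence, 5% margin of error:
print(sample_size(0.5, 0.05))  # 385
```

With populations in the billions of rows, as discussed later in this log, the finite-population correction is negligible; the point is that the required sample size depends on the margin of error, not the dataset size.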
[08:05:47] Analytics-Tech-community-metrics, Phabricator: Metrics for key Wikimedia projects software in Maniphest - https://phabricator.wikimedia.org/T28#852651 (Qgil) [09:16:24] MediaWiki-extensions-MultimediaViewer, Multimedia, Analytics: Create MediaViewer image varnish hit/miss ratio dashboard - https://phabricator.wikimedia.org/T78205#852788 (Gilles) >>! In T78205#851804, @Tgr wrote: > Yeah, I didn't think of that. The Last-Modified header of thumbnails seems match when they were... [09:18:00] Analytics-EventLogging: EventLogging ValidateSchemaTest::testValidEvent() fails under HHVM - https://phabricator.wikimedia.org/T78680#852791 (hashar) Any idea why it would fail under HHVM and not under Zend ? One sure thing, the test pass now: https://integration.wikimedia.org/ci/job/mwext-EventLogging-teste... [09:27:55] MediaWiki-extensions-MultimediaViewer, Multimedia, Analytics: Create MediaViewer image varnish hit/miss ratio dashboard - https://phabricator.wikimedia.org/T78205#852821 (Gilles) Actually I see that there's a way to tell this only with headers, no need to calculate the local time difference. The "Age" header... [09:30:25] MediaWiki-extensions-MultimediaViewer, Multimedia, Analytics: Add Last-Modified and Date to performance logging - https://phabricator.wikimedia.org/T78767#852822 (Gilles) NEW a:Gilles [09:30:41] MediaWiki-extensions-MultimediaViewer, Multimedia, Analytics: Create MediaViewer image varnish hit/miss ratio dashboard - https://phabricator.wikimedia.org/T78205#852835 (Gilles) T78767 [09:34:22] MediaWiki-extensions-MultimediaViewer, Multimedia, Analytics: Add Last-Modified and Date to performance logging - https://phabricator.wikimedia.org/T78767#852842 (Gilles) Actually, as @tgr pointed out, Varnish's X-Timestamp is the same as Last-Modified, and we're already logging that. Assuming that the clocks... 
[09:46:02] MediaWiki-extensions-MultimediaViewer, Multimedia, Analytics: Add Last-Modified and Date to performance logging - https://phabricator.wikimedia.org/T78767#852853 (Gilles) Actually timestamp != Date for one very obvious reason: the EL event will only be recorded after the image load, and will depend on latency... [10:15:47] MediaWiki-extensions-MultimediaViewer, Multimedia, Analytics: Add Last-Modified and Date to performance logging - https://phabricator.wikimedia.org/T78767#852879 (Gilles) Saving this for later: P163 [10:26:13] MediaWiki-extensions-MultimediaViewer, Multimedia, Analytics: Add Last-Modified and Date to performance logging - https://phabricator.wikimedia.org/T78767#852888 (Gilles) Ah, it turns out that the "timestamp" column IS the Date header. So we only need Last-Modified. [10:30:58] MediaWiki-extensions-MultimediaViewer, Multimedia, Analytics: Add Last-Modified to performance logging - https://phabricator.wikimedia.org/T78767#852904 (Gilles) [10:31:32] MediaWiki-extensions-MultimediaViewer, Multimedia, Analytics: Add Last-Modified to performance logging - https://phabricator.wikimedia.org/T78767#852822 (Gilles) [11:11:50] Analytics-Tech-community-metrics, Engineering-Community, Phabricator: Monthly report of total / active Phabricator users - https://phabricator.wikimedia.org/T1003#852957 (Dzahn) ``` Hi Community Metrics team, this is your automatic monthly Phabricator statistics mail. Number of active users (any activity) i... [11:14:52] Analytics-Tech-community-metrics, Engineering-Community, Phabricator: Monthly report of total / active Phabricator users - https://phabricator.wikimedia.org/T1003#852958 (Dzahn) >>! In T1003#852435, @Aklapper wrote: >>>! In T1003#851771, @chasemp wrote: >> can I get on this list? > > You need to bribe mutant... 
[11:30:34] Analytics, MediaWiki-extensions-MultimediaViewer, Multimedia: Investigate if pre-rendering images is having an impact on performance - https://phabricator.wikimedia.org/T76035#852986 (fgiunchedi) yep clients are geo-located to the closest datacenter via dns, so different cp machines get very different clients... [11:31:05] Analytics-Tech-community-metrics, Engineering-Community, Phabricator: Monthly report of total / active Phabricator users - https://phabricator.wikimedia.org/T1003#852987 (Dzahn) >>! In T1003#852435, @Aklapper wrote: > but using $sql_name in the script won't work here as that is on the "phabricator_user" DB in... [11:34:31] Analytics-Tech-community-metrics, Engineering-Community, Phabricator: Monthly report of total / active Phabricator users - https://phabricator.wikimedia.org/T1003#852990 (Dzahn) >>>! In T1003#851249, @Aklapper wrote: >> Uh, //daily//? The idea was monthly (see topic), otherwise the queries using INTERVAL 1 MO... [11:39:41] Analytics-Tech-community-metrics, Engineering-Community, Phabricator: Monthly report of total / active Phabricator users - https://phabricator.wikimedia.org/T1003#853006 (Dzahn) [11:39:42] Analytics-Tech-community-metrics, Phabricator: SQL user/grant for phabricator statistics script - https://phabricator.wikimedia.org/T78311#853004 (Dzahn) Resolved>Open re-opening, because now we have an additional requirement. In T1003 we have been asked to add another query but it access a different da... [12:00:36] Analytics-Engineering: WebStatsCollector's pageviews definition should have a UDF - https://phabricator.wikimedia.org/T78779#853038 (Ironholds) NEW a:Ironholds [12:15:09] qchris, sorry didn't answer yesterday in time. Re "text" - yes, we plan to analyze future desktop traffic from carriers, once we start tagging it in varnish [12:15:40] yurikR: no worries. I'll respond to the email today. 
[12:16:00] qchris, re who needs it - Dan & the rest of zero team needs it so they can officially start using our graphs instead of limn [12:16:26] and so that in case we are asked about numbers, we can say that you at least looked over our procedure and found it reasonable [12:37:11] Analytics-General-or-Unknown: datasets.wikimedia.org SSL error - https://phabricator.wikimedia.org/T74805#853086 (QChris) Open>Resolved a:QChris ottomata moved stat1001 behind misc-web. Now SSL is handled before the request gets to stat1001, and the issue is gone. [13:28:36] (CR) QChris: [C: -1] "Only Nits." (3 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/180305 (owner: Ottomata) [13:51:34] (CR) Hashar: "recheck" [analytics/blog] - https://gerrit.wikimedia.org/r/180369 (owner: Ori.livneh) [13:51:42] (CR) jenkins-bot: [V: -1] Add overall counts for URLs [analytics/blog] - https://gerrit.wikimedia.org/r/180369 (owner: Ori.livneh) [13:52:17] bah qchris :-( [13:52:29] analytics/blog tox jobs are failing [13:52:32] Ouch :-( [13:52:37] They passed locally. [13:52:43] * qchris checks again [13:53:25] might just be the patch https://gerrit.wikimedia.org/r/#/c/180369/ [13:54:20] (PS1) QChris: [DO NOT SUBMIT] Empty test for CI [analytics/blog] - https://gerrit.wikimedia.org/r/180472 [13:54:26] (CR) jenkins-bot: [V: -1] [DO NOT SUBMIT] Empty test for CI [analytics/blog] - https://gerrit.wikimedia.org/r/180472 (owner: QChris) [13:54:46] That one succeeds for me locally :-) [13:54:49] (PS2) Hashar: Add overall counts for URLs [analytics/blog] - https://gerrit.wikimedia.org/r/180369 (owner: Ori.livneh) [13:54:54] (CR) jenkins-bot: [V: -1] Add overall counts for URLs [analytics/blog] - https://gerrit.wikimedia.org/r/180369 (owner: Ori.livneh) [13:55:21] (CR) Hashar: "I have fixed the flake8 errors. The other tox env failing is being investigated."
[analytics/blog] - https://gerrit.wikimedia.org/r/180369 (owner: Ori.livneh) [13:55:54] sh: 1: mysql_config: not found [13:55:54] hashar: https://integration.wikimedia.org/ci/job/analytics-blog-tox-py27/2/console [13:55:59] Yes :-/ [13:56:06] seems building sqlalchemy requires the mysql-dev package or something [13:56:18] Mhmm. [13:57:06] I remember I had to install mysql on my laptop to be able to build sqlalchemy [13:57:29] I guess that's nothing tox can cater for :-( [13:57:39] libmysqlclient-dev would do [13:57:46] just have to install it on the slaves [13:57:57] one day folks will be able to add in their test sudo apt-get install libmysqlclient-dev [13:58:00] Would that be ok to install? [13:58:03] meanwhile that has to be done in puppet [13:58:15] yeah doing so [13:58:21] we already have a bunch of -dev packages [13:58:25] Ok. Awesome! :-) [13:58:32] * hashar pesters at slow internet connection [13:58:59] ideally we would have an up-to-date python-sqlalchemy package installed which tox would use instead of compiling from pip [14:06:15] qchris: fixed :] [14:06:25] (CR) Hashar: "recheck" [analytics/blog] - https://gerrit.wikimedia.org/r/180472 (owner: QChris) [14:06:26] Awesome! [14:07:34] It's green! [14:07:35] \o/ [14:07:42] Thanks hashar! [14:07:42] (Abandoned) Hashar: [DO NOT SUBMIT] Empty test for CI [analytics/blog] - https://gerrit.wikimedia.org/r/180472 (owner: QChris) [14:07:53] (CR) Hashar: "The CI slaves were lacking libmysqlclient-dev which I have installed via puppet." 
[analytics/blog] - https://gerrit.wikimedia.org/r/180369 (owner: Ori.livneh) [14:07:59] (CR) Hashar: "recheck" [analytics/blog] - https://gerrit.wikimedia.org/r/180369 (owner: Ori.livneh) [14:47:31] (CR) QChris: [C: -1] [WIP] UDF for classifying pageviews according to https://meta.wikimedia.org/wiki/Research:Page_view/Generalised_filters (25 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/180023 (owner: Ottomata) [14:52:21] yay, christian comments :D [14:52:40] :-P [14:58:54] (CR) OliverKeyes: [WIP] UDF for classifying pageviews according to https://meta.wikimedia.org/wiki/Research:Page_view/Generalised_filters (3 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/180023 (owner: Ottomata) [15:14:27] (PS2) Ottomata: Catch ValueError raised by hive.partition_datetime_from_path [analytics/refinery] - https://gerrit.wikimedia.org/r/180305 [15:19:41] Analytics, MediaWiki-extensions-MultimediaViewer, Multimedia: Investigate if pre-rendering images is having an impact on performance - https://phabricator.wikimedia.org/T76035#853383 (Gilles) >>! In T76035#852986, @fgiunchedi wrote: > Disabling prerending and running the measurements again sounds easier to te... [15:21:43] Analytics, MediaWiki-extensions-MultimediaViewer, Multimedia: Investigate if pre-rendering images is having an impact on performance - https://phabricator.wikimedia.org/T76035#853385 (Gilles) Which regions do the cp4xxx servers cover, out of curiosity? [15:27:55] Analytics, MediaWiki-extensions-MultimediaViewer, Multimedia: Investigate if pre-rendering images is having an impact on performance - https://phabricator.wikimedia.org/T76035#853390 (Gilles) Nevermind, I answered my own question by looking at the data: P164 It seems to be predominantly Asia, with Taiwan taki... [15:34:00] Ironholds: yt? 
[15:34:19] nuria__, I deny everything [15:34:27] Ironholds: jajaja [15:35:07] Ironholds: i just have a question: the parse_url() in the query: https://gist.github.com/Ironholds/428014d22edb7969ff5c [15:35:41] yup? [15:35:43] it's one of the apache udfs right? you did not have [15:35:58] yeah, it's an inbuilt hive UDF [15:36:00] another jar for that udf [15:36:05] ok, thank youu [15:36:11] it requires an actual URL body, hence the CONCAT, but is otherwise mostly sane [15:36:22] amusingly I ended up re-implementing the UDF in C++ for another project :D [15:37:17] Ironholds: ok, talked to leila a bit yesterday about the sampling to see if we can use one of the simple formulas for sampling sizes of "counting" problems so we can reduce a bit the sweep through the table [15:37:27] gotcha [15:37:31] Ironholds: but thus far running the query i just get OOMs [15:37:43] oh, yes. The wonders of the hive client :D [15:37:43] nuria__: I'm on it. comparing a couple of different methods [15:37:50] export HADOOP_HEAPSIZE=1024 before launching hive [15:38:09] dep_hive_query automatically does this before any query, because I run into the problem so often, even with simple stuff :/ [15:38:28] ottomata, while we're talking hardware, did I hear something about new machines for something-or-other in the dev sync-up, or was I mishearing? [15:39:12] Ironholds: do you have a query troubleshooting wiki or should i start one? [15:39:12] that was new namenodes, to replace the ciscos [15:39:14] no new capacity [15:39:40] nuria__: Ironholds, I am worried about OOM errors. if they happen on the client side, not such a big deal, because you can increase HADOOP_HEAPSIZE as oliver notes. [15:39:48] if they happen on the hive server side...I am not yet sure [15:40:04] ottomata: agreed, that is why i was trying to look at the logs [15:40:52] nuria__: are you getting an application_id? i.e. your job is actually launching in hadoop? [15:41:24] ottomata: no, it was not [15:41:31] nuria__, please do start one!
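Ironholds' remark that parse_url() "requires an actual URL body, hence the CONCAT" can be illustrated outside Hive: URL parsers generally only recognise a host once a scheme is present, which is why the query glues a scheme onto uri_host before parsing. A rough Python analogue, with urllib.parse standing in for Hive's built-in parse_url:

```python
from urllib.parse import urlparse

# Without a scheme the host is not recognised -- everything lands in path:
bare = urlparse("en.wikipedia.org/wiki/Main_Page")
print(bare.netloc, "|", bare.path)   # '' | en.wikipedia.org/wiki/Main_Page

# Prepending a scheme (what the CONCAT in the Hive query does) fixes it:
full = urlparse("http://" + "en.wikipedia.org" + "/wiki/Main_Page")
print(full.netloc, "|", full.path)   # en.wikipedia.org | /wiki/Main_Page
```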
[15:41:36] ooh, that's WEIRD. [15:41:42] wait. nuria__ how are you running it? [15:42:05] Ironholds: from command line: time hive -f select-app-uniques.sql >& output.txt [15:42:12] well, there goes my theory [15:42:21] so, this has been happening more and more to me recently, too. I don't know why. [15:42:23] nuria, can you tell if it is accepted by the hive server? [15:42:27] Ironholds: that is the problem we were having with your pageviews query before we turned it into a udf...if [15:42:28] yeah [15:42:28] As of a couple of days ago, there are massive lags. [15:42:32] yeah [15:42:43] but it also happened to a totally unrelated query that was for legal [15:42:57] it stops just before it assigns an appID and predicts reducer counts. [15:42:59] and just...freezes. [15:44:54] Ironholds: did that query ever run? [15:44:59] i remember you left it at the end of the day [15:45:55] it did, for some reason, work when I ran it through R. [15:46:08] even though that's just a system call to hive -f temp_query_file.hql > temp_output_file.tsv [15:46:18] ottomata: i think the task is accepted by the hive server : [15:46:21] https://www.irccloud.com/pastebin/UUQALaxF [15:47:51] nuria__: how large do you expect the number of uniques in a month to be? [15:51:02] leila: we know it will be (for all apps) about 9 million as oliver gathered that data earlier [15:51:25] I see. thanks nuria__. [15:51:25] leila: the whole dataset is about 2500 * 10^6 for 1 month [15:51:53] Ironholds: troubleshooting doc started: https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hive/Troubleshooting [15:52:10] ta! [15:52:16] ah yes [15:52:20] that is the same problem Ironholds has [15:52:25] so yes, your job is accepted by the hive server [15:52:30] but it is not getting launched in hadoop [15:52:35] notice that it says 999 reducers. [15:52:37] that is not good [15:52:42] nuria__: what is your query? [15:53:16] nuria__: wanna get on the call?
[15:54:16] ottomata: the one i got from Ironholds : https://gist.github.com/Ironholds/428014d22edb7969ff5c [15:55:29] leila: give me a sec to see if we can figure out the query issues [15:55:43] sure. just ping me when you're ready nuria__ [15:55:57] leila: thank you! [16:26:52] ottomata: looks like incrementing the heapsize on the client makes hive able to send the task to hadoop [16:26:56] ottomata, JFYI I'm working on patching for qchris's comments on that patch, btw [16:26:57] Analytics-Engineering, Analytics-Wikimetrics: Re-run Wikimetrics data once Labs issues are fixed [8 pts] - https://phabricator.wikimedia.org/T78305#853495 (Milimetric) [16:27:04] ...that came out all unreadable. [16:27:36] Analytics-Engineering, Analytics-Wikimetrics: Re-run Wikimetrics data once Labs issues are fixed [8 pts] - https://phabricator.wikimedia.org/T78305#841895 (Milimetric) The old recurrent reports are saved by the backup system, but I put just the contents of the datafiles directory here: /data/project/wikimetri... [16:27:41] hulk patch java! hulk-mata have better things to do than patch java. If hulk-mata patch java, hulk effort's duplicated and hulk sad. [16:35:40] Analytics-Refinery: Getting Ananth started - https://phabricator.wikimedia.org/T77196#853510 (kevinator) [16:35:52] actually, hulk not know how to patch some bits. [16:42:28] Analytics-Refinery: Hive User can specify webrequest date range in query more easily - https://phabricator.wikimedia.org/T76531#853519 (Ottomata) [16:42:49] (PS6) OliverKeyes: [WIP] UDF for classifying pageviews according to https://meta.wikimedia.org/wiki/Research:Page_view/Generalised_filters [analytics/refinery/source] - https://gerrit.wikimedia.org/r/180023 (owner: Ottomata) [16:44:54] (PS1) Gilles: Add scroll metadata open/close events to dashboards [analytics/multimedia/config] - https://gerrit.wikimedia.org/r/180501 [16:46:01] nuria__: in the simple case, are you considering tablesample in hive? 
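The HADOOP_HEAPSIZE workaround that unblocked nuria's query can be wrapped the way Oliver describes dep_hive_query doing it: export the variable before every `hive -f` invocation. A hypothetical sketch (helper names are invented; only the environment variable and the hive command line come from the discussion above):

```python
import os
import subprocess

def hive_command(query_file, heapsize_mb=1024):
    """Return (argv, env) for running a HiveQL file with a larger
    client-side JVM heap -- equivalent to
    `export HADOOP_HEAPSIZE=1024; hive -f query_file`."""
    env = dict(os.environ, HADOOP_HEAPSIZE=str(heapsize_mb))
    return ["hive", "-f", query_file], env

def run_hive_file(query_file, output_file, heapsize_mb=1024):
    """Run the query, capturing stdout/stderr to a file, so the hive
    client does not OOM before the job is even submitted to hadoop."""
    argv, env = hive_command(query_file, heapsize_mb)
    with open(output_file, "w") as out:
        subprocess.run(argv, env=env, stdout=out,
                       stderr=subprocess.STDOUT, check=True)
```

This only addresses client-side OOMs; as ottomata notes above, server-side memory pressure is a separate question.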
[16:46:29] darnit [16:46:35] how do I reply to comments left on an old commit? [16:55:37] Analytics-Engineering: Oozie tutorial - https://phabricator.wikimedia.org/T78687#853536 (kevinator) a:kevinator [16:58:23] Analytics-Engineering: Oozie tutorial - https://phabricator.wikimedia.org/T78687#853538 (ggellerman) Andrew offered to do casual question & answer session. If that doesn't cover it, we can schedule a more formal presentation (possibly when everyone is in SF for WMDS) [16:58:24] interesting, nuria__, so you got your query started? [16:58:25] did it finish? [16:58:59] whoa, there are no partitions specified on that query? [17:01:36] Analytics, Analytics-Engineering: Analytics User uses CentralNotice cookie in x-analytics field of web-request logs - https://phabricator.wikimedia.org/T75835#853539 (kevinator) stalled>declined a:kevinator Closing task since it's not needed anymore. A workaround was implemented. [17:02:21] Analytics-Engineering, Analytics-Wikimetrics: Re-run Wikimetrics data once Labs issues are fixed [8 pts] - https://phabricator.wikimedia.org/T78305#853542 (Milimetric) [17:02:49] ottomata: no, there are not , oliver used to run it for a 'month' [17:04:33] ottomata: thus my conversation with leila about random sampling and reducing dataset a bit [17:04:49] aye [17:05:30] I mean, partitions are specified for webrequest_source [17:05:38] and no, I ran it for 31 days. 
Months are meaningless :) [17:06:16] sorry, 31 days [17:09:13] Analytics, Analytics-Engineering: Engineer adds data to X-Analytics header using mediawiki extension - https://phabricator.wikimedia.org/T78801#853554 (kevinator) NEW [17:09:31] Ironholds: at the time of running it, it will sweep all partitions on the cluster, which might amount to more than 31 days of data, that is why i was saying "month", but understood [17:09:45] yep [17:09:47] which is a problem :/ [17:15:50] Analytics-Refinery: Hive User can specify webrequest date range in query more easily - https://phabricator.wikimedia.org/T76531#853589 (kevinator) A little background: * a UDF was deemed the wrong solution to the problem because Hive would have had to send all the partitions to the UDF to determine if th... [17:19:59] Analytics-Refinery: Hive User calls UDF to extract fields out of X-Analytics header - https://phabricator.wikimedia.org/T78805#853590 (kevinator) NEW [17:26:22] Analytics: Move stat1001, stat1002 and stat1003 into Analytics VLAN - https://phabricator.wikimedia.org/T76346#853659 (Ottomata) @bblack I've started an etherpad to guide us for tomorrow's move: http://etherpad.wikimedia.org/p/stat-analytics-vlan. [17:32:10] (CR) QChris: [WIP] UDF for classifying pageviews according to https://meta.wikimedia.org/wiki/Research:Page_view/Generalised_filters (2 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/180023 (owner: Ottomata) [17:39:52] Analytics-Refinery: Hive User calls UDF to pull real requests or IP out of X-Forwarded-For header - https://phabricator.wikimedia.org/T78812#853687 (kevinator) NEW [17:43:55] Analytics-Refinery: Hive User calls UDF to pull real requestor IP out of X-Forwarded-For header - https://phabricator.wikimedia.org/T78812#853696 (Ottomata) [17:52:02] ottomata, question if you may: [17:52:15] yush?
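One way to keep a query from sweeping every partition, given the year/month/day/hour partition layout mentioned earlier in this log, is to generate an explicit partition predicate for the wanted date range. A hypothetical helper, sketched in Python (not the refinery's actual solution, which T76531 above was still designing):

```python
from datetime import datetime, timedelta

def partition_predicate(start, end):
    """Build a Hive WHERE fragment over (year, month, day) partition
    columns covering the half-open range [start, end), so partition
    pruning limits the scan instead of sweeping the whole table."""
    days, d = [], start
    while d < end:
        days.append("(year=%d AND month=%d AND day=%d)"
                    % (d.year, d.month, d.day))
        d += timedelta(days=1)
    return "(" + " OR ".join(days) + ")"

print(partition_predicate(datetime(2014, 12, 1), datetime(2014, 12, 3)))
# ((year=2014 AND month=12 AND day=1) OR (year=2014 AND month=12 AND day=2))
```

The generated fragment would be spliced into the query's WHERE clause alongside the existing webrequest_source partition filter.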
[17:55:59] ottomata: in the apps unique query [17:56:00] https://gist.github.com/Ironholds/428014d22edb7969ff5c [17:56:39] ottomata: could we do the distinct calculation in oozie using something that is not hive? [17:57:52] ottomata: or is hive the best place to do it (dataset by then will be about 10 millions) [17:58:49] hm, why count distinct? oh this is to know how many apps are installed out there? [18:03:38] ottomata: yes [18:06:15] not sure how oozie is relevant [18:06:24] the distinct in hive is probably fine [18:06:31] you could do filtering in a udf if you wanted to [18:07:37] qchris, how do I reply to comments on an old version of a commit? [18:07:45] (responding to more of your comments, have patched for some of them) [18:08:49] Ironholds: Go to [18:08:51] https://gerrit.wikimedia.org/r/#/c/180023/ [18:09:09] To the left of "Patch Set 5", there is a grey rectangle [18:09:14] oh, wait! [18:09:14] Click it. [18:09:16] yes, found it. doy. [18:09:19] Thankee :) [18:09:24] I don't know why I didn't see that before. [18:09:25] yw. [18:11:07] (CR) OliverKeyes: [WIP] UDF for classifying pageviews according to https://meta.wikimedia.org/wiki/Research:Page_view/Generalised_filters (7 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/180023 (owner: Ottomata) [18:11:18] qchris_away, thx, forwarded [18:11:18] t [18:11:35] yurikR: You're goot at stealing keyboard focus :-) [18:11:40] s/goot/good/ [18:11:52] i'm good at many things, focus is not one of them [18:12:01] Hahahaha :-P [18:48:01] Multimedia, MediaWiki-extensions-MultimediaViewer, Analytics: Add Last-Modified to performance logging - https://phabricator.wikimedia.org/T78767#853808 (Tgr) >>! In T78767#852888, @Gilles wrote: > Ah, it turns out that the "timestamp" column IS the Date header. So we only need Last-Modified. I think timesta... 
[18:49:21] Multimedia, MediaWiki-extensions-MultimediaViewer, Analytics: Add Last-Modified to performance logging - https://phabricator.wikimedia.org/T78767#853811 (Tgr) Oh yeah we have a manual timestamp field, I remember now. [18:53:26] Multimedia, MediaWiki-extensions-MultimediaViewer, Analytics: Add Last-Modified to performance logging - https://phabricator.wikimedia.org/T78767#853815 (Tgr) `Last-Modified` is a [[ http://www.w3.org/TR/cors/#simple-response-header | simple response header ]] so there should not be any CORS issues with this. [18:58:54] Multimedia, MediaWiki-extensions-MultimediaViewer, Analytics: Make upload.wikimedia.org serve images with Timing-Allow-Origin header - https://phabricator.wikimedia.org/T76020#853836 (Aklapper) If contributors wanted to work on this (as this task is marked as "easy"), would they find their way with the inform... [19:02:27] so um, Ironholds, qchris_away. i called that thing Webrequest on purpose [19:02:37] i could be convinced otherwise [19:03:15] as it can be more than just a class that works with pageview def. [19:04:06] originally, that change also had methods (and corresponding udfs) to get the site qualifier, or the project language [19:04:07] etc. [19:04:28] also, if we do ETL the webrequest logs to a different format, it might be a good place to build a webrequest object model [19:05:02] similar to the way I am doing for revisions here: https://gerrit.wikimedia.org/r/#/c/171056/5/refinery-core/src/main/java/org/wikimedia/mediawiki/Revision.java [19:07:33] ottomata, so, huh. 
[19:07:40] I think, two things [19:07:51] first, it's going to end up a REALLY big class if it's for all things webrequest [19:08:25] remember that just for the new def it will also have to incorporate access method tagging, and a wrapper around the UA parser that does device identification, and XFF param extraction, and a ton of other things [19:08:39] X_Analytics rather [19:08:50] and some of those are generaliseable methods to X_Analytics and some of them are very specific to pageviews. [19:09:24] e.g., some of those regex objects are only things we care about for pageviews. In fact, most of them. [19:09:31] it also creates an inconsistency between the WebStats UDF and the new def. [19:09:42] so basically I can see one of two things happening, here. [19:09:58] the first is we just suck it up and have a single Webstats class and pack everything into that [19:10:29] or we can have a Webstats class that contains generalised methods, such as X_Analytics parsing or site IDing, and then have a new/old def class that inherits from it. [19:11:03] I'd prefer the second, personally; abstracting things away is the big advantage of OOP! But I'm in the class of "not confident enough that they know what they're doing to do more than -1 something, Apache style". [19:11:11] So whatever the outcome, I'll be fine with it, but here is my case. [19:11:15] ...that was long. [19:14:37] all for inheritance, and hm, maybe what you say is ok [19:14:56] if doing OO things, i like object models, and webrequest could be one. that doesn't mean it couldn't encapsulate other more generic classes [19:15:06] so maybe the x-analytics and even pageview stuff could live elsewhere, dunno [19:15:15] but, it would be nice to have an instantiated webrequset object [19:15:17] from which you could do [19:15:24] if webrequest.isPageview() ... [19:15:28] rather than have to do [19:15:36] Webrequest.isPageview(fieldA, fieldB, fieldC) [19:15:37] etc. 
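Ottomata's two call shapes, an instantiated object versus a static per-field method, can be sketched as follows (in Python rather than the Java under discussion, with a deliberately toy stand-in predicate; the real pageview definition is far more involved):

```python
class Webrequest:
    """Toy object model of one webrequest line (illustrative fields only)."""

    def __init__(self, uri_host, uri_path, http_status):
        self.uri_host = uri_host
        self.uri_path = uri_path
        self.http_status = http_status

    def is_pageview(self):
        # Instantiated style: the object carries its own fields,
        # so callers write webrequest.is_pageview().
        return (self.http_status == 200
                and self.uri_path.startswith("/wiki/"))

    @staticmethod
    def is_pageview_fields(uri_host, uri_path, http_status):
        # Static style: every field is passed explicitly, as a Hive UDF
        # receives them -- no object-construction overhead per row.
        return http_status == 200 and uri_path.startswith("/wiki/")

r = Webrequest("en.wikipedia.org", "/wiki/Main_Page", 200)
print(r.is_pageview())  # True
```

The trade-off discussed above is exactly this: the object form is nicer for languages and tools that model whole requests, while the static form avoids constructing an object per row inside a UDF.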
[19:15:56] yup [19:15:58] Ironholds: ^ :) [19:16:03] agreed! [19:16:05] moar elegant hive! [19:16:12] well, that is not hive [19:16:24] with hive, I think it is unlikely that one would use an instantiated webrequest class [19:16:36] since I think the UDFs always work with specific fields [19:16:36] fair! [19:16:41] and it would just be overhead to do [19:16:46] new Webrequest(fieldA, fieldB...) [19:16:50] yup [19:16:59] hmn. [19:17:18] but certainly other languages and techs that want to work with webrequests (or other data sources) [19:17:21] what would happen if we had the UDF just accept (*)? I imagine variadic functions are a tremendous pain in Java (they are in C++) [19:17:40] not having to type everything out would be, you know: nice, though. [19:18:26] Ironholds: e.g. [19:18:26] https://github.com/declerambaul/WikiScalding/tree/master/src/main/scala/org/wikimedia/scalding [19:19:03] I am getting lost in what gets discussed here. Is it still about the name of that class? [19:19:30] Ironholds: I guess variable parameters is not hard? [19:19:30] http://docs.oracle.com/javase/1.5.0/docs/guide/language/varargs.html [19:19:34] Multimedia, MediaWiki-extensions-MultimediaViewer, Analytics: Make upload.wikimedia.org serve images with Timing-Allow-Origin header - https://phabricator.wikimedia.org/T76020#853850 (Tgr) I'm still hoping to find an empty weekend to turn all easy MM bugs into GCI tasks. That would require a pointer to the ri... [19:19:35] qchris: yes, and the motivations for it :) [19:19:46] Java continues to suck less than C++! [19:19:48] well done Java. [19:20:41] In its current form, the class only has one thing (uriHostPattern) that is not strictly "Pageviews only". [19:20:51] But uriHostPattern is pageview relevant. [19:20:58] So everything in there is pageviews. [19:21:07] Only one thing might be reusable. [19:21:20] qchris, I am planning for the future! [19:21:35] overengineering from the start! woohoo!
[19:21:43] Yes, for the future, a separate Webrequest class that uses the Pageview class is a nice thing. [19:22:01] :) that would be fine with me :) [19:22:12] haha, a Pageview could extend Webrequest :p [19:22:36] NE way, ok ok [19:23:47] Meh. Not ok. You gotta argue. You cannot give in after me writing some 5 lines. [19:24:06] :-P [19:25:49] qchris, that is kinda what Ironholds said above, and I think it makes sense too. I'm just arguing that modeling a webrequest might be a good idea [19:26:11] i'm fine if the webrequest model uses encapsulation to do its thang [19:26:17] qchris just approved an idea of mine [19:26:20] brb throwing a party [19:26:24] hahah [19:26:26] a CODE idea. Large party :D [19:26:30] although, i guess I'd start from my way, rather than your way [19:26:48] start with webrequest, and if it needs to be decomposed into smaller pieces later, then cool [19:27:09] If you prefer to carve it out later ... that's fine by me too. [19:27:24] we are so good at letting the other have his way. [19:27:28] (sometimes) [19:27:28] :) [19:27:41] :-P [19:34:17] mmmn. [19:34:23] I'd rather start off how we mean to continue. [19:34:40] So, for the time being, let's focus on reviewing the pageviews UDF (which now has a specific class) and getting it in [19:34:53] and then I'll throw in a commit that just splits the reusable elements out into a static class [19:34:55] and we can go from there. [19:35:20] oh awesome, i get it qchris, re @RunWith(Parameterized.class) [19:35:21] awesome! [19:35:22] but I'd like to avoid actively building on the current structure, if we know we'll want to change it. Changing it is trivial; changing it after N months of building gets progressively more finicky.
[19:35:23] (CR) QChris: [WIP] UDF for classifying pageviews according to https://meta.wikimedia.org/wiki/Research:Page_view/Generalised_filters (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/180023 (owner: Ottomata) [19:35:36] ottomata: \o/ [19:35:53] qchris, aha! I get what you mean (re above patch) [19:35:55] Ironholds: changing the underlying structure is trivial [19:35:57] that's...weirdly beautiful [19:35:58] the UDF can remain the same [19:39:40] ok, qchris, re testing [19:39:53] i like the parametrized input, expected format [19:40:12] That's great! [19:40:29] can we make a little framework for reading the test parameters from a file? [19:40:46] oh wait... [19:40:48] maybe this exists already [19:40:56] hold on, googling... [19:41:34] Hahaha. You really want that external file. I see. [19:41:45] Well, then let's do that external file. [19:42:00] Halfak is also a big fan, for when we want non-Java implementations [19:42:11] the argument makes sense to me. Let's go for it. [19:42:16] qchris: https://github.com/Pragmatists/JUnitParams/blob/master/src/test/java/junitparams/FileParamsTest.java [19:42:16] ? [19:42:56] * qchris looks [19:43:06] also [19:43:06] http://stackoverflow.com/questions/21401504/junit-parameterized-tests-using-file-input [19:45:17] ah this is a better link I think: http://www.codeyouneed.com/parameterized-junit-test/ [19:46:09] hm, as long as we can stick a json webrequest object into a csv file, then we could do: [19:46:21] Both are the same, aren't they? [19:46:26] {json webrequest object that is a pageview}, true [19:46:31] {json webrequest object that is not a pageview}, false [19:46:40] eh? [19:47:18] Both pages you linked use @RunWith(JUnitParamsRunner.class) [19:47:24] from pl.pragmatists [19:47:28] JUnitParams [19:47:33] yes, same, just a better doc [19:47:38] Ah. Ok. [19:49:21] ottomata: I think the library would be worthwhile to try. [19:50:18] About the "json in csv" ...
why put in json and not directly the needed parameters. [19:50:20] ? [19:50:44] (CR) OliverKeyes: [WIP] UDF for classifying pageviews according to https://meta.wikimedia.org/wiki/Research:Page_view/Generalised_filters (2 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/180023 (owner: Ottomata) [19:52:00] qchris, i think it would be better to start with the data we will actually be working with, especially if/when oliver adds new constraints that require new fields [19:52:19] wikimedia/mediawiki-extensions-EventLogging#296 (wmf/1.25wmf13 - 3d94580 : Reedy): The build passed. [19:52:19] Change view : https://github.com/wikimedia/mediawiki-extensions-EventLogging/commit/3d9458092336 [19:52:19] Build details : http://travis-ci.org/wikimedia/mediawiki-extensions-EventLogging/builds/44370217 [19:52:27] actually, i would almost rather encode more properties about each test object than just true (meaning isPageview) [19:52:28] we could do [19:52:44] {webrequest object}, {is_pageview: true, is_app_request: false} [19:52:44] etc. [19:52:52] maybe? hmm, i think you will not like that [19:52:57] tell me it is not good... [19:53:09] Analytics-Wikimetrics, Analytics-Engineering: Re-run Wikimetrics data once Labs issues are fixed [8 pts] - https://phabricator.wikimedia.org/T78305#853887 (kevinator) [19:53:22] It is great! [19:53:32] you do like it!? :) [19:53:37] But I only do not like JSON in the mix there [19:53:40] ah [19:53:54] it looks like this lib only works with a 2 column csv [19:54:14] Really? [19:54:28] That would make the whole thing pointless. [19:55:00] that's what all the examples do anyway, [19:55:03] column A is input [19:55:05] column B is expected [19:55:13] 1. @FileParameters("src/test/resources/NameUtils.test.csv") [19:55:13] 2. public void testNameCapitalization(String input, String expected) { [19:55:19] 1. john doe, John Doe [19:55:19] 2. DR. JOHN DOE, Dr.
John Doe [19:55:50] The docs about JUnit's parametrized tests are that stupid too. But JUnit's parametrized tests can do arbitrary numbers of parameters. [19:55:56] that is also how nuria's UAparser tests are kinda working [19:56:02] The docs just don't tell you explicitly how. [19:56:06] An array of two objects [19:56:16] oh [19:56:20] HMM [19:56:23] ok, will try some stuff [19:56:39] hm, but wouldn't an object with named fields be better for this anyway? [19:56:45] so you don't have to infer meaning from column order? [20:01:19] https://github.com/Pragmatists/JUnitParams/blob/master/src/test/java/junitparams/ParamsInAnnotationTest.java [20:01:25] ottomata: line 47 in ^. [20:01:39] So the library should be able to handle at least three fields. [20:02:18] If it can do two and three fields, I hope it can treat arbitrarily many. [20:02:26] (But I don't know). [20:02:59] About the named fields ... yes, that's a nice aspect of json. [20:03:04] hm, yeah, ok [20:03:24] But mixing two technologies unnecessarily ... I am not much of a fan of that. [20:03:38] Having only CSVs is simpler than JSON in CSVs. [20:03:38] csv + json you mean? [20:03:49] Yes. [20:04:07] DarTar: real nice e-mail with the freebase stuff! [20:05:35] Analytics-Engineering, Analytics-Dashiki, Analytics-Wikimetrics: Remove the "confusing" under reported data for Edits and Pages Created in Vital Signs - https://phabricator.wikimedia.org/T75617#853943 (kevinator) Scope change: remove all data, not just the data before Nov 1st. This work is now a subtask of T... [20:05:48] gwah, archiva isn't letting me log in!
[20:05:52] Analytics-Engineering, Analytics-Dashiki, Analytics-Wikimetrics: Remove the "confusing" under reported data for Edits and Pages Created in Vital Signs - https://phabricator.wikimedia.org/T75617#853949 (kevinator) [20:06:08] Analytics-Wikimetrics: Re-run Wikimetrics data once Labs issues are fixed [8 pts] - https://phabricator.wikimedia.org/T78305#841895 (kevinator) [20:07:44] Analytics-Wikimetrics, Analytics-Dashiki, Analytics-Engineering: Remove the "confusing" under reported data for Edits and Pages Created in Vital Signs - https://phabricator.wikimedia.org/T75617#853970 (Milimetric) Open>Resolved Done as of a few hours ago. Vital signs will have no data until tonight whe... [20:09:39] Analytics-Dashiki, Analytics-Wikimetrics: Remove the "confusing" under reported data for Edits and Pages Created in Vital Signs - https://phabricator.wikimedia.org/T75617#853987 (kevinator) [20:11:08] thanks nuria__ [20:13:38] oh, java [20:13:45] y u gotta obfuscate what test failed [20:15:35] qchris, confirmed, arbitrary # of columns is cool [20:15:42] ottomata: \o/ [20:17:06] there we go [20:24:53] (CR) OliverKeyes: [WIP] UDF for classifying pageviews according to https://meta.wikimedia.org/wiki/Research:Page_view/Generalised_filters (2 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/180023 (owner: Ottomata) [20:25:03] ah qchris! [20:25:10] i think it can read csv headers! [20:25:18] and map those to parameter names in tests [20:25:19] checking now! [20:25:23] all. the. better! [20:25:46] would still prefer json, and we could implement our own DataMapper [20:25:49] like [20:25:50] https://github.com/Pragmatists/JUnitParams/blob/master/src/main/java/junitparams/mappers/CsvWithHeaderMapper.java [20:25:51] but ja [20:25:55] dunno [20:25:59] or [20:26:00] hm [20:26:01] yeah [20:26:02] hm [20:26:57] I'd use plain $SOME_TECHNOLOGY if possible. Then you can edit the file nicely with /any/ tool that speaks $SOME_TECHNOLOGY. 
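[Editor's aside: the behaviour ottomata confirmed — one CSV row mapping onto an arbitrary number of test-method parameters — can be sketched with plain stdlib code. This is an illustration only, not the JUnitParams internals; the example row mirrors the pageview columns discussed later:]

```java
import java.util.Arrays;

public class CsvRowDemo {
    // Split one CSV line into however many trimmed columns it contains;
    // each column would land in its own test-method parameter slot.
    static String[] columns(String line) {
        String[] parts = line.split(",");
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim();
        }
        return parts;
    }

    public static void main(String[] args) {
        String row = "basic pageview, true, en.wikipedia.org, /wiki/Main_Page, 200";
        // Five columns here, but nothing in the mechanism fixes the count at two.
        System.out.println(Arrays.toString(columns(row)));
    }
}
```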
[20:27:49] csvs are nice, because they are just suited for spreadsheet programs. [20:28:14] And they align nicely (both horizontally and vertically) without any coding. [20:28:20] yes, i think you are right, one or the other. we could use a single input field [20:28:29] if using json [20:28:37] and stick the expected values in as fields too [20:28:39] e.g. [20:28:58] {hostname: ..., ip: ..., is_pageview: true, is_appview: false} [20:29:24] it would be nice to just be able to copy/paste webrequest jsons from kafka. also, if we use avro, they will look like json to most users. [20:29:44] so it might be nice to use something consistent throughout refinery. [20:29:51] If you go with json, I think we should separate input from assertion. hence I'd rather go with something like [20:29:52] {input: {hostname: ..., ip: ...}, is_pageview: true, is_appview: false} [20:29:53] just annoying to have to write csvs for these tests. dunno [20:30:00] ah [20:30:03] yeah that would be fine [20:30:07] HMM [20:31:09] I find json with many fields a bit unreadable. But if you say, it has to be json for the test specs, it's ok. [20:31:11] ok, qchris i think I am not going to try to implement a junitparams json mapper right now. CSV should work, doing some more testing. if it does, i will try to abstract the existing tests out into csv files and share them. [20:31:30] Cool. [20:31:30] if we get annoyed with writing CSV data for tests, then we can revisit this later. [20:32:08] Hey ... I missed the "not" in your above JSON mapper. [20:32:19] You really said you would /not/ do it? [20:32:28] Meh. I would have lost a bet :-D [20:32:39] yeah, i think that would be a rabbit hole that I don't have time to go down right now I think :) [20:32:44] Ok. Then CSVs for now. [20:32:46] i would probably try to get it all upstreamed and everything [20:33:01] Hahaha.
[20:33:50] * qchris keeps fingers crossed that the library's CSV parser does allow proper escaping, so we can use crazy characters in the CSVs. [20:34:47] qchris, wanna hear something fun? [20:35:00] Ironholds: totally! [20:35:02] your comment about splitting out the app logic, since it's reused but very self-contained [20:35:12] https://github.com/Ironholds/WMUtils/blob/master/R/log_sieve.R#L26-L35 [20:35:26] that is precisely what the hacky R implementation does because it was the most rational way of doing things :D [20:35:43] kevinator, when you put a task in a sprint do you leave it in your backlog project? It's a hassle to maintain state on two workboards. [20:36:13] kevinator: also, can I fix the typo in "Incomming" column of https://phabricator.wikimedia.org/project/board/840/ ? :) [20:36:20] Ironholds: :-) [20:37:29] spagewmf: yes, once the team has committed to completing a task in a sprint, I’ll eventually remove it from the “analytics-engineering” project where I groom the backlog. [20:37:51] spagewmf: you’re welcome to fix my typo :-) [20:37:58] pssshhh looks like named headers are not actually implemented [20:38:02] kevinator: thx. I wish you could drag-and-drop between workboard windows [20:38:06] the WithHeader class just skips the first line [20:38:16] the non header class implies that the WithHeader class would be smarter [20:38:16] :-D [20:38:20] it says [20:38:26] that would be nice [20:38:31] be sure [20:38:31] * the columns in the file match exactly the ordering of arguments in the test [20:38:31] * method. [20:38:49] that would imply to me that in the WithHeaders one, you wouldn't have to be sure they match! [20:39:02] oh well :( [20:39:20] :-( [20:53:51] qchris, qq [20:54:02] could I use the contains() strategy with the "sections=0" check as well? [20:54:12] That's not looking for any wildcards, it just cares that sections=0 is in the string somewhere. 
[20:55:16] Ironholds: Yes, if you're interested in a plain substring search, "contains" would do the trick. [20:55:27] grand! Thanks :) [20:55:31] But ... you're sure that you want a plain substring search? [20:55:39] I mean ... [20:56:04] You could also parse the parameters and look for a "section" key having value 0 in the set of parameters. [20:56:17] Argh. Shut up qchris. [20:56:32] Not your call. [20:56:42] Ironholds: Sorry. [20:56:50] Ironholds: Yes, contains allows you to do that. [20:56:51] hahah [20:56:54] awesome :) [20:57:36] what's the advantage of that approach? Just to play devil's advocate. [20:57:56] * qchris sprays holy water at Ironholds. [21:10:01] (PS7) Ottomata: [WIP] UDF for classifying pageviews according to https://meta.wikimedia.org/wiki/Research:Page_view/Generalised_filters [analytics/refinery/source] - https://gerrit.wikimedia.org/r/180023 [21:10:02] this is pretty awesooooome! ^ [21:10:06] Ironholds: [21:10:07] qchris: [21:10:20] https://gerrit.wikimedia.org/r/#/c/180023/7/refinery-core/src/test/resources/pageview_test_data.csv [21:10:23] together with [21:10:33] neat! [21:10:36] https://gerrit.wikimedia.org/r/#/c/180023/7/refinery-core/src/test/java/org/wikimedia/mediawiki/TestPageview.java [21:10:40] * Ironholds will remember to git pull before amending [21:10:42] (BTW, I renamed it to Pageview [21:10:46] i hope no one is real mad) [21:11:10] +1 to singular names where they make sense :) [21:12:17] OOp [21:12:22] booboo, amending...
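[Editor's aside: the advantage of qchris's parse-the-parameters alternative, which the devil's-advocate question leaves hanging, is precision: a plain contains("sections=0") also matches a query string containing e.g. "usersections=0". A sketch with a hypothetical helper, not the refinery code:]

```java
import java.util.HashMap;
import java.util.Map;

public class QueryParamDemo {
    // Parse a query string like "action=mobileview&sections=0"
    // into a key/value map.
    static Map<String, String> parseQuery(String query) {
        Map<String, String> params = new HashMap<>();
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    public static void main(String[] args) {
        String query = "action=mobileview&usersections=0";
        // Substring search matches even though no sections parameter exists:
        System.out.println(query.contains("sections=0"));
        // Parsing the parameters does not:
        System.out.println("0".equals(parseQuery(query).get("sections")));
    }
}
```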
[21:13:54] (PS8) Ottomata: [WIP] UDF for classifying pageviews according to https://meta.wikimedia.org/wiki/Research:Page_view/Generalised_filters [analytics/refinery/source] - https://gerrit.wikimedia.org/r/180023 [21:13:58] there we go [21:14:00] Ironholds: yea so [21:14:11] now, to add new test cases [21:14:20] you only need to edit the pageview_test_data.csv file [21:14:26] it is a csv [21:14:30] sweeet [21:14:32] halfak, ^ [21:14:34] columns are: [21:14:35] test_description, is_pageview, uri_host, uri_path, uri_query, http_status, content_type, user_agent [21:14:36] wait, can I make one request? [21:14:39] sure! [21:14:41] can we use TSVs? [21:14:45] CSVs are stupid and make baby jesus cry. [21:14:49] i dunno... [21:14:55] i'm using an external lib [21:14:55] also, sometimes people put commas in user agents. [21:15:04] because people are assholes [21:15:05] welp, don't put them in your test cases! [21:15:22] you're saying I artificially limit my test cases to not mimic the real-world scenarios the code will be exposed to? :D [21:15:49] Ironholds, let's just convert it to TSV ourselves :P [21:16:01] ANDREW HIERONYMUS OTTO! Go to your room and think about what you just did! [21:16:05] halfak, yeah, I guess we could [21:16:14] csv.replace("\t", "\\t").replace(",", "\t") [21:17:52] haha [21:17:52] welp [21:17:58] the answer is yes [21:17:59] but [21:18:10] you'd have to implement a mapper and create one that knows how to use tsvs [21:18:10] https://github.com/Pragmatists/JUnitParams/tree/master/src/main/java/junitparams/mappers [21:18:33] meh. let's just write a script that converts CSV to TSV. [21:18:50] Oh wait... You're saying something needs to read it. [21:18:59] yes [21:19:21] this junitparams lib automatically reads the csv columns into test method parameters [21:20:38] ottomata: That csv (or maybe tsv at some point) is great!
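[Editor's aside: halfak's one-liner above works as a quick conversion for simple rows, but note that it treats every comma as a separator, so a comma inside a field — exactly the user-agent case Ironholds raises — gets split too. A sketch of that one-liner:]

```java
public class CsvToTsvDemo {
    // Naive CSV -> TSV: escape any literal tabs, then turn every comma
    // into a tab. Commas that belong *inside* a field (e.g. in a
    // user agent string) are NOT protected.
    static String csvToTsv(String line) {
        return line.replace("\t", "\\t").replace(",", "\t");
    }

    public static void main(String[] args) {
        System.out.println(csvToTsv("en.wikipedia.org,/wiki/Main_Page,200"));
    }
}
```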
[21:20:58] Ironholds: halfak, I would prefer to do JSON all the way [21:21:03] but that would also require some more coding [21:21:07] this lib works easily with csvs [21:21:15] and i think that is good enough for now. we can really get into it if we need to later [21:24:09] oh! [21:25:58] Ironholds: it will let us use | char [21:26:00] instead of , [21:26:01] if you like [21:26:22] which do you prefer? [21:26:47] CSV; things have readers for those [21:26:53] aye :) [21:26:56] I've never seen a pipe-separated file and I'd rather we didn't start the trend ;p [21:27:15] this bit of the code is kinda dumb [21:27:36] this is how it is reading the line from the csv [21:27:37] https://github.com/Pragmatists/JUnitParams/blob/master/src/main/java/junitparams/internal/InvokeParameterisedMethod.java#L138 [21:27:54] if (character == ',' || character == '|') { [21:28:00] looks like you can escape commas [21:32:31] halfak, https://en.wikipedia.org/wiki/User_talk:Ironholds#December_2014:_Added_an_additional_educational_reference_for_learning_R_data_mining_techniques. boom [21:36:43] dammit, I can't directly git pull [21:36:44] wtf [21:36:46] rebase? [21:42:11] yeah [21:42:13] Ironholds: [21:42:27] if you have already locally committed [21:42:31] you should do [21:42:39] um [21:43:12] I haven't yet! [21:43:14] oh ok [21:43:16] then easier [21:43:17] git stash [21:43:22] and then rebase? [21:43:36] um, yes, but i'm not sure what to rebase to, since you are rebasing to an unmerged change [21:43:36] um [21:44:59] mmm, potato soup w/ bacon, hot sauce, and garlic :) [21:45:05] wrong room... [21:45:14] um, Ironholds [21:45:15] maybe [21:45:22] git review -m 180023 [21:45:23] ? [21:45:27] use at your own risk!
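[Editor's aside: per the InvokeParameterisedMethod line quoted above, the library splits on both ',' and '|' and lets a backslash escape a literal comma. A minimal reimplementation of that splitting behaviour — a sketch inferred from the quoted snippet, not the library source verbatim:]

```java
import java.util.ArrayList;
import java.util.List;

public class ParamSplitDemo {
    // Split on ',' or '|' unless the separator is preceded by a backslash,
    // in which case the separator is kept literally in the field.
    static List<String> splitParams(String line) {
        List<String> params = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            boolean isSeparator = (c == ',' || c == '|');
            if (isSeparator && current.length() > 0
                    && current.charAt(current.length() - 1) == '\\') {
                // Escaped: replace the backslash with the literal separator.
                current.setCharAt(current.length() - 1, c);
            } else if (isSeparator) {
                params.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        params.add(current.toString().trim());
        return params;
    }

    public static void main(String[] args) {
        // The escaped comma stays inside the user-agent field:
        System.out.println(splitParams("Mozilla/5.0 (Foo\\, Bar), true"));
    }
}
```

[This is why the escape support matters for the user-agent column: commas in agents survive as data rather than splitting the row.]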
[21:46:20] hmm, nope [21:46:53] ok, Ironholds, here is what I do, but I'm sure there is a better way [21:46:55] git stash [21:46:56] git checkout master [21:47:08] git branch -D review/ottomata/pageview [21:47:12] git review -d 180023 [21:47:14] git stash pop [21:47:55] aha [21:48:28] merge conflict! arghhhh! [21:48:38] oh sod it, I'll copy-paste [22:19:28] Ironholds: FYI while talking to ottomata and qchris this morning, I logged a task to create a UDF to parse the X-Analytics header: https://phabricator.wikimedia.org/T78805 [22:19:35] yay! [22:19:38] I am stealing the hell out of that [22:19:48] only it'll probably end up as a class rather than a single function [22:20:08] ok… I was saving it for the contractor, but you can call dibs because it looks like you need it now [22:20:11] because I figure we want both a generalised parser, and parsers for very-common tasks [22:20:17] e.g. UUID extraction [22:20:34] Analytics-Refinery: Hive User calls UDF to extract fields out of X-Analytics header - https://phabricator.wikimedia.org/T78805#854305 (Ironholds) a:Ironholds Yoink [22:20:43] will work on it as soon as I've got this UDF with otto and christian finished [22:20:54] you lot realise I have spent all week writing Java, right? [22:21:00] this fills me with fear. fear and impostor syndrome. [22:21:38] I thought I heard you call yourself a Java programmer in the meeting just now ;-) [22:22:15] naw, I said I'd been writing Java [22:22:22] I don't call myself a programmer, or a developer, except sarcastically [22:34:17] laters all! [22:34:40] laters ottomata!
[22:38:51] the only good bit of ottomata going out is fewer commit conflicts [22:38:53] :P [22:56:53] (CR) QChris: [C: 2 V: 2] Catch ValueError raised by hive.partition_datetime_from_path [analytics/refinery] - https://gerrit.wikimedia.org/r/180305 (owner: Ottomata) [23:16:04] Quarry: Switch Quarry to use Material Design Bootstrap theme - https://phabricator.wikimedia.org/T76140#854487 (yuvipanda) Open>declined a:yuvipanda But has a somewhat fucked up license, so... no. https://github.com/FezVrasta/bootstrap-material-design/blob/master/LICENSE.md [23:32:42] Analytics, MediaWiki-User-blocking: Generate stats on blocked IP (ranges) that attempt to edit - https://phabricator.wikimedia.org/T78840#854518 (Krenair) I imagine this is WMF Analytics' area? [23:45:54] (PS9) OliverKeyes: [WIP] UDF for classifying pageviews according to https://meta.wikimedia.org/wiki/Research:Page_view/Generalised_filters [analytics/refinery/source] - https://gerrit.wikimedia.org/r/180023 (owner: Ottomata) [23:47:34] (CR) OliverKeyes: [WIP] UDF for classifying pageviews according to https://meta.wikimedia.org/wiki/Research:Page_view/Generalised_filters (3 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/180023 (owner: Ottomata) [23:52:48] Analytics, MediaWiki-User-blocking: Generate stats on blocked IP (ranges) that attempt to edit - https://phabricator.wikimedia.org/T78840#854542 (Rjd0060) >>! In T78840#854518, @Krenair wrote: > I imagine this is WMF Analytics' area? I don't believe. To clarify it would be helpful for users (or some subset... [23:54:56] seeing gerrit-wm surface my name, is still tremendously weird [23:57:26] Analytics, MediaWiki-User-blocking: Make a SpecialPage to show stats on blocked IP (ranges) that attempt to edit - https://phabricator.wikimedia.org/T78840#854549 (Krenair)