[00:01:25] Analytics-Backlog, MediaWiki-API, Research-and-Data, Patch-For-Review: log user agent in api.log - https://phabricator.wikimedia.org/T108618#1668957 (EBernhardson) It seemed easy enough so i put the above patches together, i'm sure they can be iterated on a bit though. You don't have to use them if...
[00:10:16] (CR) Ori.livneh: [V: 2] Dead-simple parallelization [analytics/statsv] - https://gerrit.wikimedia.org/r/240290 (owner: Ori.livneh)
[00:18:32] milimetric, hi, you there?
[00:19:23] Analytics-Kanban: Introduction to Hive class {flea} - https://phabricator.wikimedia.org/T113545#1669071 (kevinator)
[00:19:24] Analytics-Kanban: {flea} Self-serve Analysis - https://phabricator.wikimedia.org/T107955#1669070 (kevinator)
[00:20:24] hi bmansurov
[00:21:27] milimetric, thanks for the reply on phab. Just to double check, no page view data for August 2014 in hadoop, right?
[00:22:02] milimetric, what does sampled UDP data include in it?
[00:22:07] no *unsampled* data from 2014 in hadoop
[00:22:39] sampled UDP data has the same format as unsampled, but we only keep 1/1000 requests
[00:23:46] Analytics-Backlog, MediaWiki-API, Research-and-Data, Patch-For-Review: log user agent in api.log - https://phabricator.wikimedia.org/T108618#1669099 (EBernhardson) On further thought decided the safest way was a simple two step: * Log to a new channel, ApiRequest. Unknown channels are ignored and n...
[00:23:51] milimetric, thanks. HTTP requests for viewing pages are TCP, right?
[00:24:12] bmansurov: I happened to recently load the sampled data in a table in my own db in hadoop. Do: "use milimetric; show tables;"
[00:24:27] ok
[00:24:54] bmansurov: right, I mean UDP as in the data was logged and gathered by the old UDP infrastructure, not Kafka
[00:25:19] But all web requests and responses except for like streaming video and stuff are TCP
[00:25:39] milimetric, i see
[00:25:58] milimetric, i see 2 tables
[00:26:16] webrequest_sampled and webrequest_sampled_kafka
[00:27:26] Sorry, I'm on my ipad, you want the first one, not the kafka one. The dt column in there is in the 2015-10-12 11:11:11 format. Check it out by selecting * from milimetric.webrequest_sampled limit 1
[00:27:54] milimetric, cool, thanks!
[00:28:01] milimetric, have a good one
[00:28:09] np
[00:28:17] thx :) u2
[00:32:41] Analytics, MediaWiki-extensions-MultimediaViewer, Patch-For-Review: Create MediaViewer image varnish hit/miss ratio dashboard - https://phabricator.wikimedia.org/T78205#1669130 (Jdlrobson) Open>stalled What's left to do here? I'm a little confused. No activity since April. Please update and un-s...
[00:33:14] Analytics, MediaWiki-extensions-MultimediaViewer, Patch-For-Review: Create dashboard showing file namespace page views and MediaViewer views - https://phabricator.wikimedia.org/T78189#1669134 (Jdlrobson) Open>stalled No activity since January no open patches - what's left to do?
[01:49:36] Analytics, MediaWiki-extensions-MultimediaViewer, Patch-For-Review: Create MediaViewer image varnish hit/miss ratio dashboard - https://phabricator.wikimedia.org/T78205#1669350 (Tgr) Done in https://grafana.wikimedia.org/#/dashboard/db/media I think? Although that's (Swift + Varnish) hit/miss, not pur...
[01:52:05] Analytics, MediaWiki-extensions-MultimediaViewer, Patch-For-Review: Create dashboard showing file namespace page views and MediaViewer views - https://phabricator.wikimedia.org/T78189#1669359 (Tgr) Fixing the MMV stats would be the first step as they are totally broken now: http://multimedia-metrics.w...
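[Editor's note: milimetric explains above that the sampled table keeps 1 in 1000 requests in the same format as unsampled data. A minimal sketch of how a count taken from that table would be scaled up to an estimated total — the row count below is a made-up example number, not real data:]

```python
# Sketch: estimating totals from the 1/1000-sampled webrequest data
# described above. The sampling rate matches the "we only keep 1/1000
# requests" comment; the example count is hypothetical.

SAMPLING_RATE = 1000  # 1 out of every 1000 requests is kept

def estimate_total(sampled_count, rate=SAMPLING_RATE):
    """Scale a count from the sampled table up to an estimate
    of the unsampled total."""
    return sampled_count * rate

# e.g. 4321 rows in webrequest_sampled for some hour would suggest
# roughly 4,321,000 actual requests in that hour.
print(estimate_total(4321))
```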
[02:30:52] hello
[02:31:00] partial eventlogging outage going on
[02:31:02] apparently
[02:31:05] anyone around to help?
[02:31:11] nuria: milimetric ?
[02:31:17] madhuvishy: ^
[02:31:20] I have sms'd otto
[02:37:09] hey yuvipanda. I'll take a look
[02:37:24] milimetric: ottomata also showed up, investigating on -operations
[02:37:26] and thank you :)
[02:38:08] thank you, I haven't done anything yet :)
[02:38:48] milimetric: ok :)
[03:11:04] Analytics, Developer-Relations, MediaWiki-API, Research consulting, and 3 others: Metrics about the use of the Wikimedia web APIs - https://phabricator.wikimedia.org/T102079#1669499 (EBernhardson) >>! In T102079#1622808, @Tgr wrote: > We discussed this recently in Reading Infrastructure; there are...
[05:26:08] Analytics-Backlog, MediaWiki-API, Research-and-Data, Patch-For-Review: log user agent in api.log - https://phabricator.wikimedia.org/T108618#1669628 (Tgr) Awesome, thanks a lot!
[08:09:26] Analytics-Tech-community-metrics, DevRel-September-2015: Provide open changeset snapshot data on Sep 22 and Sep 24 (for Gerrit Cleanup Day) - https://phabricator.wikimedia.org/T110947#1669841 (Dicortazar) Numbers for today. Data were updated up to 2:00 am CEST. * Total number of changesets waiting for a...
[08:10:45] Analytics-Tech-community-metrics, DevRel-September-2015: Provide open changeset snapshot data on Sep 22 and Sep 24 (for Gerrit Cleanup Day) - https://phabricator.wikimedia.org/T110947#1669846 (Dicortazar)
[08:27:33] Analytics-Tech-community-metrics, DevRel-September-2015: Provide open changeset snapshot data on Sep 22 and Sep 24 (for Gerrit Cleanup Day) - https://phabricator.wikimedia.org/T110947#1669869 (Dicortazar) Some method notes, just in case you liked to reproduce the analysis: This analysis uses the [[ http:...
[09:05:25] (PS1) Addshore: Add .gitreview [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/240651
[09:09:21] (PS1) Addshore: Script for tracking site_stats over time [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/240652
[09:10:51] (PS1) Addshore: Add getclaims property use tracking script [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/240653
[09:12:07] (PS1) Addshore: Add sample cron [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/240654
[09:26:48] Analytics-Tech-community-metrics, DevRel-September-2015: Provide open changeset snapshot data on Sep 22 and Sep 24 (for Gerrit Cleanup Day) - https://phabricator.wikimedia.org/T110947#1669972 (Dicortazar) Open>Resolved
[09:41:23] Analytics-Tech-community-metrics, DevRel-September-2015: Provide open changeset snapshot data on Sep 22 and Sep 24 (for Gerrit Cleanup Day) - https://phabricator.wikimedia.org/T110947#1670047 (Qgil)
[10:25:43] (CR) Christopher Johnson (WMDE): [C: 2 V: 2] changes per prototype review adds preliminary owl for metric definitions adds dates and percentage to infoBox reorganizes tabs adds items as [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/239855 (https://phabricator.wikimedia.org/T108404) (owner: Christopher Johnson (WMDE))
[10:59:52] (CR) Mforns: [C: -1] "Just a couple lines that can be removed." (6 comments) [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/240599 (owner: Milimetric)
[11:08:18] Does anyone know if mailing list info is on the analytics cluster anywhere?
[11:15:28] Analytics-Tech-community-metrics, DevRel-September-2015: Provide open changeset snapshot data on Sep 22 and Sep 24 (for Gerrit Cleanup Day) - https://phabricator.wikimedia.org/T110947#1670309 (Aklapper) Thank you a lot for these numbers! Followup happens in {T113378}.
[11:30:16] Analytics-Kanban: Investigate sample cube pageview_count vs unsampled log pageview count - https://phabricator.wikimedia.org/T108925#1670316 (JAllemandou) @Tbayer, @JKatzWMF : I move this task to done since you have not commented in negative ways :)
[11:33:28] Hey halfak
[11:33:32] You there ?
[12:13:27] Analytics-Tech-community-metrics, DevRel-September-2015: Provide open changeset snapshot data on Sep 22 and Sep 24 (for Gerrit Cleanup Day) - https://phabricator.wikimedia.org/T110947#1670430 (Aklapper) >>! In T110947#1669841, @Dicortazar wrote: > Numbers for today. Data were updated up to 2:00 am CEST....
[12:35:53] Ironholds: around? :)
[12:39:07] Analytics-Tech-community-metrics, DevRel-September-2015: Provide open changeset snapshot data on Sep 22 and Sep 24 (for Gerrit Cleanup Day) - https://phabricator.wikimedia.org/T110947#1670515 (Dicortazar) @Aklapper, I can update numbers tomorrow if needed. Please, let me know!
[12:57:41] Analytics-Kanban: {flea} Self-serve Analysis - https://phabricator.wikimedia.org/T107955#1670574 (coren)
[13:25:54] Analytics-Backlog, Analytics-EventLogging, Traffic, operations: EventLogging query strings are truncated to 1014 bytes by ?(varnishncsa? or udp packet size?) - https://phabricator.wikimedia.org/T91347#1670603 (BBlack) Open>Resolved The above commit should be live on all prod + beta varnishes...
[13:26:12] Analytics-EventLogging, MediaWiki-extensions-NavigationTiming, Performance-Team, operations: Increase maxUrlSize from 1000 to 1500 - https://phabricator.wikimedia.org/T112002#1670606 (BBlack) The varnish change should be live on all production and beta varnishes now, raising the limit to ~2K there.
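[Editor's note: T91347/T112002 above are about EventLogging beacon URLs being truncated by the varnish logging pipeline. A hedged sketch of a client-side pre-flight length check — the 2000-byte figure is the approximate "~2K" limit BBlack mentions, not an exact constant, and the URL format here is only illustrative:]

```python
# Sketch: check that a beacon-style event URL stays under the
# varnish logging limit (~2K after the change above; the old
# effective limit was 1014 bytes). Names and format are hypothetical.
import json
from urllib.parse import quote

VARNISH_URL_LIMIT = 2000  # approximate, per the "~2K" comment above

def beacon_url(base, event):
    """Serialize an event dict into a beacon-style query string."""
    return base + "?" + quote(json.dumps(event, separators=(",", ":")))

def fits_limit(url, limit=VARNISH_URL_LIMIT):
    """True if the encoded URL would survive the logging pipeline."""
    return len(url.encode("utf-8")) <= limit

url = beacon_url("https://example.org/beacon/event",
                 {"schema": "NavigationTiming", "rev": 1})
assert fits_limit(url)
```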
[13:43:57] Analytics-Kanban: {flea} Self-serve Analysis - https://phabricator.wikimedia.org/T107955#1670631 (coren)
[13:51:31] Analytics-Backlog, Analytics-EventLogging: Send raw server side events to Kafka using a PHP Kafka Client {stag} - https://phabricator.wikimedia.org/T106257#1670641 (Ottomata) Hey a-team, we can do this now! :D
[13:52:14] Cool ottomata --^
[13:52:44] ottomata: We had a talk yesterday with nuria about logs from search/discovery coming from client
[13:53:04] ottomata: Would like to get confirmation from you, since you handled the thing more directly with ebernhardson
[13:53:27] ottomata: I'm in interview now, but let's spend some time after please :)
[13:55:22] Analytics-Backlog, Analytics-EventLogging, Traffic, operations: EventLogging query strings are truncated to 1014 bytes by ?(varnishncsa? or udp packet size?) - https://phabricator.wikimedia.org/T91347#1670654 (Ottomata) Awesome, thanks @BBlack!
[13:58:51] ottomata: can I have php5-curl on stat1002? :)
[13:59:09] Analytics-Kanban: Support moving and adding new columns in reportupdater {lion} [5 pts] - https://phabricator.wikimedia.org/T113600#1670665 (Milimetric) NEW a:Milimetric
[13:59:33] mforns: does a different part of reportupdater make sure that the header and result rows have the same number of columns?
[13:59:38] or should I add that in update_results
[13:59:54] milimetric, mmmm I'm not sure...
[13:59:58] I decided against adding it yesterday but today I'm changing my mind :)
[14:00:03] milimetric, I'm in an interview with a candidate
[14:00:05] ok, I'll add it to the writer just in case it skips
[14:00:07] sorry!
[14:00:13] don't answer IRC!!
[14:00:13] no, we can talk later
[14:00:16] :]
[14:00:18] lol
[14:02:47] addshore: has it got a deb package? :) i'm sure you can, uH, would you file a quick phab ticket and assign to me?
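[Editor's note: the check milimetric and mforns discuss above — making sure header and result rows have the same number of columns before writing — could look roughly like this. Function and error wording are hypothetical, not reportupdater's actual code:]

```python
def validate_rows(header, rows):
    """Raise if any result row has a different number of columns
    than the header; return the rows unchanged otherwise."""
    expected = len(header)
    for i, row in enumerate(rows):
        if len(row) != expected:
            raise ValueError(
                "row %d has %d columns, header has %d"
                % (i, len(row), expected))
    return rows

header = ["date", "wiki", "edits"]
rows = [["2015-09-24", "enwiki", "1234"]]
assert validate_rows(header, rows) == rows
```

Putting this in the writer, as milimetric suggests, catches the mismatch at the last point before output regardless of which earlier step produced it.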
[14:02:51] i gotta puppetize
[14:03:11] Yeh, I'll make a phab ticket :)
[14:04:13] Analytics-Cluster, operations: php5-curl for stat1002 - https://phabricator.wikimedia.org/T113602#1670683 (Addshore) NEW a:Ottomata
[14:04:18] ottomata: ^^ :0 Thanks!
[14:06:16] (PS2) Milimetric: Handle new or re-arranged columns [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/240599 (https://phabricator.wikimedia.org/T113600)
[14:06:29] (CR) Milimetric: Handle new or re-arranged columns (5 comments) [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/240599 (https://phabricator.wikimedia.org/T113600) (owner: Milimetric)
[14:06:52] (CR) Milimetric: Handle new or re-arranged columns (1 comment) [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/240599 (https://phabricator.wikimedia.org/T113600) (owner: Milimetric)
[14:12:38] Thanks ottomata !
[14:12:42] ( milimetric thanks for review! )
[14:12:47] addshore yw
[14:12:52] easy peasy
[14:12:57] (:
[14:12:57] np
[14:14:47] addshore: Notice: /Stage[main]/Statistics::Compute/Package[php5-curl]/ensure: ensure changed 'purged' to 'present'
[14:15:30] Epic! Woo! I'll go continue what I was doing once I move rooms! (:
[14:16:01] holaaa
[14:17:47] Hi nuria :)
[14:21:19] hey nuria!
[14:21:29] btw, do you know where the eventlogging.client_errors metric comes from?
[14:21:36] http://grafana.wikimedia.org/#/dashboard/db/eventloggingschema?panelId=10&fullscreen&edit
[14:23:19] or maybe mforns knows? ^
[14:25:41] client errors could be
[14:25:44] url size
[14:25:52] that is directly sent to grafana by a hook
[14:26:22] in mediawiki code, you invoke a function in javascript and it posts directly to a statsd http endpoint
[14:26:39] ottomata: makes sense?
[14:27:26] yeah, am talking to Krin kle now in ops, thank you!
[14:27:47] ottomata: ya, BTW Krinkle i just merged the url size patch after talking to brandon
[14:29:18] ahhh ok
[14:43:48] joal: want to talk about fingerprinting?
[14:43:58] nuria: in interview, after :)
[14:44:05] joal: k
[15:01:46] hey milimetric
[15:01:53] I'm done
[15:02:10] hey mforns. I made the change and pushed a new patchset
[15:02:11] take a look
[15:02:39] ok!
[15:03:32] nuria: ready :)
[15:03:35] batcave ?
[15:05:31] joal: sure, 2 mins
[15:11:11] (PS1) Addshore: Add social stats tracking script [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/240710
[15:32:05] ottomata1: holaaaa
[15:32:12] hmmmm
[15:32:12] ottomata1: standupppp
[15:32:18] interview!
[15:32:37] research@ on mysqlstore doesnt have access to select into output file? :O
[15:32:57] or perhaps not into my home dir
[15:32:58] bah
[15:33:31] addshore, it tries to write the output file on the DB server.
[15:33:39] ohhhhh *facepalm*
[15:33:54] To write to stat1002/3 you want just write to stdout directly from the mysql cli
[15:34:05] okay
[15:34:15] * addshore just wants to select a whole tables as a tsv :0
[15:34:16] So, mysql -h .... -u research -e "SELECT * FROM derp" > outfile.tsv
[15:34:16] :)
[15:34:23] awesome
[15:34:29] You can also pass your query to stdin
[15:34:30] so
[15:34:45] cat query.sql | mysql -h ... -u research > query_output.tsv
[15:34:53] Godspeed :)
[15:35:02] Analytics-Kanban, Research consulting, Research-and-Data: Validate Uniques using Last Access cookie {bear} [55 pts] - https://phabricator.wikimedia.org/T101465#1670982 (Milimetric)
[15:36:51] Analytics-Kanban, Research consulting, Research-and-Data: Validate Uniques using Last Access cookie {bear} [89 pts] - https://phabricator.wikimedia.org/T101465#1339575 (Milimetric)
[15:39:03] Analytics-Kanban: Investigate sample cube pageview_count vs unsampled log pageview count [5 pts] - https://phabricator.wikimedia.org/T108925#1670984 (Milimetric)
[15:41:11] (PS1) Addshore: Add sql to tsv script [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/240724
[15:41:12] halfak: woo! ^^
[15:41:59] :D
[15:42:13] *goes to read where he should actually put them* ....
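[Editor's note: halfak's point above is that `SELECT ... INTO OUTFILE` writes on the DB server, so the TSV should be produced on the client by redirecting the mysql CLI's tab-separated stdout. A hedged sketch of wrapping that pattern — the host name and query are placeholders, and this is not addshore's actual script:]

```python
# Sketch: dump a query to a local TSV via the mysql CLI, mirroring
# "mysql -h ... -u research -e 'SELECT ...' > outfile.tsv" above.
import subprocess

def mysql_command(host, user, query):
    """Build the mysql CLI invocation; -e prints results
    tab-separated to stdout, which we capture locally."""
    return ["mysql", "-h", host, "-u", user, "-e", query]

def dump_query_to_tsv(host, user, query, outfile):
    """Run the query and write its TSV output on the local machine
    (INTO OUTFILE would write on the DB server instead)."""
    with open(outfile, "w") as f:
        subprocess.run(mysql_command(host, user, query),
                       stdout=f, check=True)

# Hypothetical usage, matching the chat example:
# dump_query_to_tsv("analytics-store.example", "research",
#                   "SELECT * FROM derp", "outfile.tsv")
```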
[15:43:47] Analytics-Kanban, Patch-For-Review: Make reportupdater support script execution [8 pts] {crow} - https://phabricator.wikimedia.org/T112109#1671018 (Milimetric)
[15:44:42] (PS1) Addshore: Also copy tsv files to aggregate-datasets [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/240725
[15:45:19] I guess I should also import the legacy data we have at some point...
[15:48:32] Analytics-Kanban, Privacy: Identify possible user identity reconstruction using location and user_agent_map pageview aggregated fields to try to link to IPs in webrequest {slug} - https://phabricator.wikimedia.org/T108843#1671035 (Milimetric)
[15:49:03] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Decommission remaining old Hadoop Workers {hawk} - https://phabricator.wikimedia.org/T112113#1671036 (Milimetric)
[15:49:12] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Create new Hive / Oozie server from old analytics Dell {hawk} - https://phabricator.wikimedia.org/T110090#1671037 (Milimetric)
[15:51:02] Analytics-Backlog, Analytics-EventLogging, Fundraising-Backlog, Unplanned-Sprint-Work, and 2 others: Promise returned from LogEvent should resolve when logging is complete - https://phabricator.wikimedia.org/T112788#1671043 (Nuria)
[16:06:07] Hey halfak !
[16:06:16] You can't hide anymore, I know you're here :D
[16:06:54] o/ joal
[16:06:59] Did I miss a ping?
[16:07:21] Earlier than acceptable :)
[16:08:04] halfak: we have not talked again about the help I could possibly give to your research folks
[16:08:27] Oh yes. It seems that Nitin has come back and would like to keep ownership over loading data in.
[16:08:32] So it was a false alarm.
[16:08:39] :)
[16:08:51] But it would be nice to keep reaching out to you to consult about some of the decisions we're making.
[16:08:52] halfak: Nothing to do with ownership though,
[16:09:08] Yeah. Probably the wrong word.
[16:09:16] no issue
[16:09:29] In any case, you know where to find me :)
[16:09:37] Yup :)
[16:09:42] Thanks for reminding me.
[16:10:57] no problem, I was wondering :)
[16:26:04] (CR) JanZerebecki: [C: 2 V: 2] Add .gitreview [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/240651 (owner: Addshore)
[16:51:20] hi milimetric.
[16:51:52] question about piwiki: the data collected is not kept on our servers, right, milimetric?
[16:53:19] leila: we are at staff meeting
[16:53:25] thanks madhuvishy.
[17:03:49] Analytics-Tech-community-metrics, Developer-Relations, DevRel-September-2015: Check whether it is true that we have lost 40% of (Git) code contributors in the past 12 months - https://phabricator.wikimedia.org/T103292#1671397 (Nemo_bis) https://www.openhub.net/p/mediawiki/contributors/summary agrees wi...
[17:08:46] Analytics-Backlog: Make Logstash consume from Kafka:eventlogging_EventError - https://phabricator.wikimedia.org/T113627#1671439 (mforns) NEW
[17:09:26] Analytics-Tech-community-metrics, DevRel-September-2015: Provide open changeset snapshot data on Sep 22 and Sep 24 (for Gerrit Cleanup Day) - https://phabricator.wikimedia.org/T110947#1671447 (Aklapper) >>! In T110947#1670515, @Dicortazar wrote: > @Aklapper, I can update numbers tomorrow if needed. Please...
[17:12:49] leila: yes, piwik data is kept on a server in labs
[17:13:06] I mean, yes, it's kept on our servers
[17:18:10] Analytics, Analytics-Backlog, Analytics-Cluster: Setup pipeline for search logs to travel through kafka and camus into hadoop - https://phabricator.wikimedia.org/T113521#1671521 (kevinator) p:Triage>High
[17:18:59] Analytics-Backlog: Make Logstash consume from Kafka:eventlogging_EventError - https://phabricator.wikimedia.org/T113627#1671548 (kevinator) p:Triage>Normal
[17:21:06] (PS1) Christopher Johnson (WMDE): adds sparql lookup function for metric metadata removes markdown files modifies owl to make legal uris [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/240758 (https://phabricator.wikimedia.org/T113180)
[17:22:12] Analytics-Backlog: Enable use of Python 3 in Spark - https://phabricator.wikimedia.org/T113419#1671568 (kevinator) p:Triage>Normal
[17:24:07] Analytics-Backlog, Wikimedia-Developer-Summit-2016: Developer summit session: Pageview API overview - https://phabricator.wikimedia.org/T112956#1671585 (kevinator) p:Triage>Normal
[17:24:07] (PS2) Christopher Johnson (WMDE): adds sparql lookup function for metric metadata removes markdown files modifies owl to make legal uris [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/240758 (https://phabricator.wikimedia.org/T113180)
[17:24:52] milimetric: I see. what can be the privacy issue around it? that labs is not private and there will be private data collected via piwiki?
[17:32:24] leila: there are a couple of issues with labs
[17:32:35] 1. it's not as secure as production, so we can't guarantee that we don't leak the data
[17:32:53] 2. we have a privacy policy that says we can't send private data to third parties for analysis from labs
[17:33:54] but, as far as I can see, and as far as I think legal has commented, we're not violating that policy by monitoring apps in labs with the piwik instance on labs. We can ask them and get another opinion on it. But I think after I made the adjustments to the privacy settings of piwik, we're ok from a legal and ethical perspective
[17:34:42] to be clear, no IPs are being stored anywhere, only half-masked IPs are stored for a restricted period of time. This is not dangerous from any definition I can find.
[17:38:35] joal: i am getting " FAILED: ClassCastException org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.primitive.BooleanObjectInspector"
[17:38:46] joal: when doing a trivial query on hive
[17:42:09] nuria: using a UDF ?
[17:42:16] ya
[17:42:28] https://www.irccloud.com/pastebin/gGW9BzNK/
[17:42:54] it's the "instr"
[17:43:27] instr is a hive language function, no ?
[17:44:10] joal: yes, sorry right query is: select user_agent_map, uri_path from webrequest where year=2015 and month=09 and day=01 and hour=01 and uri_path and instr("api", uri_path) >1 limit 10;
[17:46:45] joal: neverminddd!!!
[17:46:49] joal: operator error
[17:47:08] yup, instr(str, substr)
[17:47:45] oh now actually
[17:48:09] nuria: instr can also return null right? will null > 1 succeed?
[17:48:29] trueee
[17:49:15] madhuvishy: null >1 will not actually fail, but won't actually return a result I think
[17:49:30] may be coalesce(instr("api", uri_path), 0) > 1?
[17:49:39] more correct for sure :)
[17:50:44] nuria: then you dont have to put the "and uri_path" null check condition
[17:51:15] madhuvishy: nice
[17:53:43] nuria, madhuvishy : problem actually comes from the "AND uri_path " check --> "AND uri_path IS NOT NULL "
[17:53:57] aha
[18:04:57] Analytics-Backlog, MediaWiki-API, Research-and-Data, Patch-For-Review: log user agent in api.log - https://phabricator.wikimedia.org/T108618#1671906 (Nuria) Have we checked that user agent is not already present on API requests on webrequest table in hadoop? I think is there: >select user_agent_ma...
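[Editor's note: the Hive pitfall worked through above — nuria's first attempt had instr's arguments swapped, and a NULL from instr compares as neither true nor false, silently dropping the row — can be mimicked outside Hive. A rough Python analogue of `instr(str, substr)` and the `coalesce(..., 0) > 1` guard, not Hive's actual implementation:]

```python
def instr(s, sub):
    """Hive-style instr: 1-based position of sub in s,
    0 if absent, None (i.e. NULL) if either argument is NULL."""
    if s is None or sub is None:
        return None
    return s.find(sub) + 1  # str.find is 0-based and -1 if absent

def row_matches(uri_path):
    """Mimic the filter: coalesce(instr(uri_path, 'api'), 0) > 1.
    Without the coalesce, a NULL uri_path yields NULL > 1, which
    is neither true nor false, so the row silently disappears."""
    pos = instr(uri_path, "api")
    return (pos if pos is not None else 0) > 1

assert instr("/w/api.php", "api") == 4   # 1-based, like Hive
assert instr("/wiki/Foo", "api") == 0    # 0 when not found
assert instr(None, "api") is None        # NULL propagates
assert row_matches("/w/api.php") is True
assert row_matches(None) is False
```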
[18:10:10] nuria: ooh also, wanted to ask, what is the advantage of instr over say uri_path like '%api.php%'
[18:10:14] joal: ^
[18:10:34] madhuvishy: I don't know :)
[18:10:57] madhuvishy: none i think. the instr came faster to mind cause #meknownothingsql
[18:11:33] he he okay, i was like, huh why didn't we just write that instead of coalesce(instr(..))
[18:14:22] madhuvishy: I actually don't know if '%blha%' is executed differently from instr (that could be a simple loop over the char array)
[18:14:34] madhuvishy: fun question :)
[18:36:54] joal: will document what we have done & do checks and balances today, will send docs to security too so they can look at results. starting now as I just came out of meeting.
[18:38:52] milimetric: when you have time, let's chat about piwiki. Ellery is also around and talking about it in person can help.
[18:39:08] I can schedule something for later today if you're happy with that milimetric
[18:58:28] (CR) JanZerebecki: [C: -1] "As is this is too complex to be in shell. Use something more suitable like ocaml, haskel, rust or python... Alternatively maybe this can b" [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/240652 (owner: Addshore)
[18:59:13] (CR) JanZerebecki: [C: -1] "Even more complex shell, see previous patch regarding that." [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/240653 (owner: Addshore)
[19:00:51] (CR) JanZerebecki: "Will this get used somewhere or does this instead need to go into puppet?" [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/240654 (owner: Addshore)
[19:02:05] (CR) JanZerebecki: "Yay, no shell!" [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/240710 (owner: Addshore)
[19:04:15] (CR) JanZerebecki: "Barely acceptable use of shell ;)" (1 comment) [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/240724 (owner: Addshore)
[19:05:42] leila: ready!
[19:10:20] Thx nuria :)
[19:38:23] (CR) Mforns: [C: 2 V: 2] "Code looks awesome!" (2 comments) [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/240599 (https://phabricator.wikimedia.org/T113600) (owner: Milimetric)
[19:39:32] sweet, mforns you tested with real queries and stuff?
[19:39:33] (PS1) Joal: [WIP] Add camus offsets reader to refinery-core [analytics/refinery/source] - https://gerrit.wikimedia.org/r/240868
[19:39:42] milimetric, yes, seemed to work
[19:39:55] cool, that was fast
[19:40:01] anyway I made a copy of /a/limn-public-data
[19:40:21] :) thx, I'll merge their change that changes columns and I'll check on it tomorrow
[19:40:31] ok, cool
[19:40:41] btw, is gerrit logging all of you guys out as well?
[19:40:43] or is that just me
[19:40:53] mmm
[19:40:57] not happened to me
[19:42:08] (CR) Milimetric: [C: 2 V: 2] "The reportupdater changes needed to support this kind of change have been deployed. This means that you can change column order and add n" [analytics/limn-edit-data] - https://gerrit.wikimedia.org/r/237534 (owner: Jforrester)
[20:05:04] Bye a-team !
[20:05:10] bye joal!
[20:05:12] nite
[20:05:21] See you tomorrow :)
[20:05:52] I'm leaving too a-team, see ya!
[20:06:05] nite mforns
[20:07:14] for a second I was enthused you call yourselves the a-team then I just remembered [a]nalytics
[20:07:19] still cool tho
[20:07:41] I don't know what you mean. Mr. T out
[20:11:48] Analytics-Backlog, MediaWiki-API, Research-and-Data, Patch-For-Review: log user agent in api.log - https://phabricator.wikimedia.org/T108618#1672366 (Anomie) >>! In T108618#1671906, @Nuria wrote: > Have we checked that user agent is not already present on API requests on webrequest table in hadoop?...
[20:30:24] Analytics-Backlog, MediaWiki-API, Research-and-Data, Patch-For-Review: log user agent in api.log - https://phabricator.wikimedia.org/T108618#1672470 (Nuria) @Anomie: You have the query_url. This is what the table keeps: https://wikitech.wikimedia.org/wiki/Analytics/Data/Webrequest The point I was...
[20:31:17] Analytics-Backlog, MediaWiki-API, Research-and-Data, Patch-For-Review: log user agent in api.log - https://phabricator.wikimedia.org/T108618#1672477 (Tgr) Is there a way to get a unique identifier to the varnish log in MediaWiki code? Otherwise that data is not going to help much. (It would be nice...
[20:33:05] Analytics-Backlog, MediaWiki-API, Research-and-Data, Patch-For-Review: log user agent in api.log - https://phabricator.wikimedia.org/T108618#1672482 (Tgr) Varnish does not provide POST data and there is no way to get them from it. The original plan was to add them via X-Analytics headers but if we l...
[20:53:50] Analytics-Backlog, MediaWiki-API, Research-and-Data, Patch-For-Review: log user agent in api.log - https://phabricator.wikimedia.org/T108618#1672585 (EBernhardson) >>! In T108618#1672470, @Nuria wrote: plus harvest of ad-hoc data in hadoop (please note that data will not be on a table) in order to...
[21:02:33] tgr: yt?
[21:09:38] nuria: yes
[21:14:56] Analytics, Analytics-Cluster, Fundraising Tech Backlog, Fundraising-Backlog, operations: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1672662 (ellery) @Jgreen Does that mean the data in pgheres.bannerimpressions is incorrectly scaled? If not, w...
[21:36:15] gotta check out for now, will be back later
[21:37:54] Analytics, Analytics-Cluster, Fundraising Tech Backlog, Fundraising-Backlog, operations: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1672781 (atgo) hey @ellery - right now we've got it in Q2, which means we'll hopefully look at it before the ne...
[21:54:07] hi, is https://phabricator.wikimedia.org/diffusion/ODBE/repository/master/ being maintained?
[22:00:11] It build depends on things that are not in newer versions of ubuntu. We'll probably fix it for us, and can contribute the changes back if you want.
[22:01:48] simonft: It looks like it was maintained by ops, can you ask on #wikimedia-operations
[22:05:11] Analytics, MediaWiki-API, Research-and-Data: api.log does not indicate errors and exceptions - https://phabricator.wikimedia.org/T113672#1672882 (Spage) NEW
[22:06:14] madhuvishy: thanks
[22:06:42] np
[22:09:46] Analytics, Analytics-Cluster, Fundraising Tech Backlog, Fundraising-Backlog, operations: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1672900 (Jgreen) >>! In T97676#1672662, @ellery wrote: > @Jgreen Does that mean the data in pgheres.bannerimpr...
[22:13:36] tgr: so we are sure POST arguments do not come in via varnish, even with our extensions?
[22:15:24] Analytics, MediaWiki-API: api.log does not indicate errors and exceptions - https://phabricator.wikimedia.org/T113672#1672934 (DarTar)
[22:55:06] nuria: I asked somebody about that (can't remember, maybe bblack?) and apparently the varnish shared memory log thing that varnishncsa and varnishkafka is based on doesn't handle request bodies at all
[22:55:27] and since POST requests store the parameters in the body...
[23:06:14] nuria: the X-Varnish header contains a unique ID generated by varnish so that could be used to correlate logs
[23:07:00] unique-ish... it's only 9 digits
[23:09:21] Analytics-Backlog: English Wikipedia stats for 5 millionth article - https://phabricator.wikimedia.org/T113683#1673115 (Halfak) NEW
[23:10:22] Analytics-Backlog: English Wikipedia stats for 5 millionth article - https://phabricator.wikimedia.org/T113683#1673124 (Halfak)
[23:10:34] Analytics-Backlog: English Wikipedia stats for 5 millionth article - https://phabricator.wikimedia.org/T113683#1673115 (Halfak) @ezachte, I figure you might have some of these stats handy. Regretfully, I don't.
[23:11:59] nuria: does varnishkafka run on the last varnish server? or how are the records deduped?
[23:24:22] 'sequence' in the webrequest database is the varnish request id from X-Varnish, right?
[23:25:16] does that include both ids for a cached response?
[23:29:30] my varnish is a little rusty, but IIRC varnish never even looks at the body, it looks at the headers and pipes the body straight through. pretty likely POST body is not available :(
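[Editor's note: one way to reason about tgr's dedup question above — if each emitting host produces a per-host sequence number, records can be deduplicated on the (host, sequence) pair, since an id like X-Varnish's 9 digits is only unique per host, not globally. A sketch with assumed field names, not the actual refinery dedup code:]

```python
def dedupe(records):
    """Keep the first record seen for each (hostname, sequence)
    pair; a repeated pair indicates a duplicated log line."""
    seen = set()
    out = []
    for rec in records:
        key = (rec["hostname"], rec["sequence"])
        if key not in seen:
            seen.add(key)
            out.append(rec)
    return out

records = [
    {"hostname": "cp1001", "sequence": 123456789, "uri": "/w/api.php"},
    {"hostname": "cp1001", "sequence": 123456789, "uri": "/w/api.php"},  # duplicate
    {"hostname": "cp1002", "sequence": 123456789, "uri": "/wiki/Main_Page"},  # same seq, other host
]
assert len(dedupe(records)) == 2
```

Note the third record survives: the same sequence number on a different host is a distinct request, which is why the host must be part of the key.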