APRIL 11 - MONDAY
9 - 12:30pm : TUTORIAL SESSION 1 (Rooms 519A, 519B)
Centrality Measures on Big Graphs: Exact, Approximated, and Distributed Algorithms
- Francesco Bonchi, ISI Foundation
- Gianmarco De Francisci Morales, Aalto University
- Matteo Riondato, Two Sigma Investments
Abstract: Centrality measures quantify the relative importance of a node or an edge in a graph with respect to the other nodes or edges. In this tutorial, we survey the different definitions of centrality measures and the algorithms to compute them. We start with the most common measures (e.g., closeness centrality, betweenness centrality) and move on to more complex ones, such as spanning-edge centrality. In our presentation, we begin with exact algorithms and then move to approximation algorithms, including sampling-based ones, and to highly scalable MapReduce algorithms for huge graphs, both for exact computation and for keeping the measures up to date on dynamic graphs where edges are inserted or deleted over time. Our goal is to show how advanced algorithmic techniques and scalable systems can be used to obtain efficient algorithms for important graph mining tasks, and to encourage research in the area by highlighting open problems and possible directions.
A mini-website for the tutorial is available at http://matteo.rionda.to/centrtutorial/ .
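As a rough illustration of the exact algorithms mentioned in this abstract, closeness centrality on an unweighted graph can be computed with one breadth-first search (BFS) per node. The sketch below is a generic Python example, not the presenters' code; it assumes the graph is given as an adjacency-list dict and uses the common normalization (reachable nodes - 1) / (sum of shortest-path distances). The graph and node names are hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Return hop distances from source to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def closeness_centrality(adj):
    """Exact closeness: (reachable nodes - 1) / sum of shortest-path distances.
    Runs one BFS per node, i.e. O(n * (n + m)) time on an unweighted graph."""
    scores = {}
    for v in adj:
        dist = bfs_distances(adj, v)
        total = sum(dist.values())
        # Isolated nodes (no reachable neighbors) get a score of 0.0.
        scores[v] = (len(dist) - 1) / total if total > 0 else 0.0
    return scores

# Hypothetical path graph a - b - c: the middle node b is the most central.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
scores = closeness_centrality(graph)
print(scores)
```

Running one BFS per node costs O(n(n + m)) time, which is why the sampling-based approximations and MapReduce algorithms covered in the tutorial become attractive for huge graphs.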
2 - 5:30pm : TUTORIAL SESSION 2 (Rooms 519A, 519B)
Cryptographic Currencies Crash Course
- Aljosha Judmayer, SBA Research
- Edgar Weippl, SBA Research
Abstract: This tutorial aims to further close the gap between IT security research and the area of cryptographic currencies and block chains. We will describe and refer to Bitcoin as an example throughout the tutorial, as it is the most prominent representative of such a system. It is also a good reference point for discussing the underlying block chain mechanics, which are the foundation of various altcoins (e.g., Namecoin) and other derived systems. In this tutorial, the topic of cryptographic currencies is addressed solely from a technical IT security point of view. Therefore, we do not cover any legal, sociological, financial, or economic aspects. The tutorial is designed for participants with a solid IT security background but does not assume any prior knowledge of cryptographic currencies. Thus, we will quickly advance our discussion to the core aspects of this field.
APRIL 12 - TUESDAY
9 - 12:30pm : TUTORIAL SESSION 3 (Rooms 519A, 519B, 521ABC)
Mining Big Time-series Data on the Web
- Yasushi Sakurai, Kumamoto University
- Yasuko Matsubara, Kumamoto University
- Christos Faloutsos, Carnegie Mellon University
Abstract: Online news, blogs, SNS, and many other Web-based services have been attracting considerable interest for business and marketing purposes. Given a large collection of time series, such as web-click logs, online search queries, and blog and review entries, how can we efficiently and effectively find typical time-series patterns? What are the major tools for mining, forecasting, and outlier detection? Time-series data analysis is becoming increasingly important, thanks to the decreasing cost of hardware and the increasing online processing capability.
The objective of this tutorial is to provide a concise and intuitive overview of the most important tools that can help us find meaningful patterns in large-scale time-series data. Specifically, we review the state of the art in three related fields: (1) similarity search, pattern discovery, and summarization, (2) non-linear modeling and forecasting, and (3) extensions of time-series mining and tensor analysis. We also introduce case studies that illustrate their practical use for social media and Web-based services.
Automatic Entity Recognition and Typing in Massive Text Corpora
- Xiang Ren, University of Illinois at Urbana-Champaign
- Ahmed El-Kishky, University of Illinois at Urbana-Champaign
- Chi Wang, Microsoft Research
- Jiawei Han, University of Illinois at Urbana-Champaign
Abstract: In today’s computerized and information-based society, we are inundated with vast amounts of natural language text data, ranging from news articles, product reviews, and advertisements to a wide range of user-generated content from social media. To turn such massive unstructured text data into actionable knowledge, one of the grand challenges is to gain an understanding of entities and the relationships between them. In this tutorial, we introduce data-driven methods to recognize typed entities of interest in different kinds of text corpora (especially massive, domain-specific text corpora). These methods can automatically identify token spans as entity mentions in text and label their types (e.g., person, product, organization) in a scalable way. We demonstrate on real datasets, including news articles and Yelp reviews, how these typed entities aid in knowledge discovery and management.
2 - 5:30pm : TUTORIAL SESSION 4 (Rooms 519A, 519B)
Analyzing Sequential User Behavior on the Web
- Philipp Singer, Leibniz Institute for the Social Sciences GESIS
- Florian Lemmerich, Leibniz Institute for the Social Sciences GESIS
Abstract: The World Wide Web is an information environment that facilitates sequential user behavior between states. A prime example is the navigation of users between websites, enabled by the presence of hyperlinks. However, today we can think of many other kinds of transitional behavior that many of us perform on a daily basis. For instance, users who listen to music on Spotify transition between songs, users who check in at locations on Foursquare transition between geocoordinates, and users who write reviews on Amazon transition between products.
Accordingly, we consider all kinds of transitions between states as sequences on the Web. States can refer to any kind of categorical action performed, such as the ones listed above. Our research community has been interested in studying such sequences in various contexts, such as (i) modeling, (ii) the detection of regularities and patterns, or (iii) understanding the processes that produce the underlying sequences (e.g., cognitive strategies). Recent research has focused heavily on human navigation on the Web, but other types of transition data, such as mobility sequences, search sequences, or song-listening sequences, have also sparked researchers' interest. In this tutorial, we will give an outline of the fundamental methods for analyzing such categorical sequences on the Web and discuss some recent advancements in depth.
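One common starting point for modeling such transitions (not necessarily the approach the presenters emphasize) is a first-order Markov chain, whose maximum-likelihood transition probabilities are obtained by counting consecutive pairs of states and normalizing each row. The Python sketch below assumes each sequence is a list of categorical states; the song identifiers are hypothetical.

```python
from collections import Counter, defaultdict

def transition_probabilities(sequences):
    """Maximum-likelihood estimate of a first-order Markov chain from
    categorical sequences: P(dst | src) = count(src -> dst) / count(src -> *)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count every consecutive pair of states within one sequence.
        for src, dst in zip(seq, seq[1:]):
            counts[src][dst] += 1
    probs = {}
    for src, row in counts.items():
        total = sum(row.values())
        probs[src] = {dst: c / total for dst, c in row.items()}
    return probs

# Hypothetical listening sessions; states are song identifiers.
sessions = [
    ["song_a", "song_b", "song_c"],
    ["song_a", "song_b", "song_a"],
]
P = transition_probabilities(sessions)
print(P)
```

States that never appear as a source (here `song_c`, which only ends a session) get no row; higher-order chains or smoothing are natural extensions when the data are sparse.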