IC 05 / spring semester 2008
Franck.ghitalla
TSH Department
President of WebAtlas
Measurement(s) of dynamic phenomena on the web
Theory(ies), model(s), experiment(s), interfaces
Temporal patterns, Topic Detection and Tracking, network and human dynamics…
1) Some bibliographic landmarks
A.-L. Barabasi, Nature, 2005.
A.-L. Barabasi, Physics, 2005.
Kumar, Raghavan, Novak, Tomkins, WWW conference, 2003.
Beyond serving as online diaries, weblogs have evolved into a complex social structure, one which is in many ways ideal for the study of the propagation of information. As
weblog authors discover and republish information, we are able to use the existing link structure of blogspace to track its flow. Where the path by which it spreads is
ambiguous, we utilize a novel inference scheme that takes advantage of data describing historical, repeating patterns of "infection." Our paper describes this technique as well as
a visualization system that allows for the graphical tracking of information flow.
E. Adar, Lada A. Adamic, WebIntelligence Conference, 2005.
Abstract: A fundamental problem in text data mining is to extract meaningful structure from document streams that arrive continuously over time. E-mail and news articles are two natural examples of such streams, each characterized by topics that appear, grow in intensity for a period of time, and then fade away. The published literature in a particular research field can be seen to exhibit similar phenomena over a much longer time scale. Underlying much of the text mining work in this area is the following intuitive premise: that the appearance of a topic in a document stream is signaled by a "burst of activity," with certain features rising sharply in frequency as the topic emerges.
The goal of the present work is to develop a formal approach for modeling such "bursts," in such a way that they can be robustly and efficiently identified, and can provide an organizational framework for analyzing the underlying content. The approach is based on modeling the stream using an infinite-state automaton, in which bursts appear naturally as state transitions; it can be viewed as drawing an analogy with models from queueing theory for bursty network traffic. The resulting algorithms are highly efficient, and yield a nested representation of the set of bursts that imposes a hierarchical structure on the overall stream. Experiments with e-mail and research paper archives suggest that the resulting structures have a natural meaning in terms of the content that gave rise to them.
J. Kleinberg, 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2002.
Temporal patterns, Topic Detection and Tracking, network and human dynamics…
2) Modeling temporal phenomena on the web
1. Articulation of the TYPES of temporality (information ON and IN the net)
2. Articulation of the LEVELS of temporality (global/local dynamics)
3. Topic Detection and Tracking (TDT)
4. Dynamics of network (temporal patterns)
Operational model
Design of the measurement system(s)
Production/verification of hypotheses
Optimization/profiling of the capture and processing systems
Semiological question(s) of visualization and the challenge of spatializing temporal phenomena
2-1) Articulation of the TYPES of temporality (information ON and IN the net)
A contemporary concern: telephony, cryptography, the IPv6 standard and ad hoc networks… and now the web / at different scales
Extracting meaningful structures from information streams / the field of TDT (Topic Detection and Tracking) / A topic in a stream of documents: activity around the topic builds up, then falls away / Time as order (ordering principle)
BUT a distinction must be made between a "structural event" (network dynamics) and the propagation model (epidemiological and/or viral) of diffusion or flows
Information IN and ON the Net
IN and hypertext topology: "Any local change in the network topology can be obtained through a combination of four elementary processes: addition and removal of a node and addition or removal of an edge." / growth, preferential attachment as dynamic rules
ON and information propagation: models of viral circulation / the network topology as vector. Epidemiology, rumor, diffusion of innovation
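As an illustration of the "ON the net" view, where the link topology is the vector along which information spreads, here is a minimal sketch. It is not taken from the slides: the function name `cascade`, the per-link transmission probability `p` and the toy graph are assumptions, and the model is a plain independent-cascade process, not a specific epidemiological model.

```python
import random

def cascade(adjacency, source, p=0.5, seed=0):
    """Spread an item from `source` along links; return {node: step it was reached}.

    adjacency: dict mapping each node to the nodes it links to (hypothetical toy input).
    p: assumed probability that a newly reached node passes the item along a given link.
    """
    rng = random.Random(seed)
    reached_at = {source: 0}
    frontier = [source]
    step = 0
    while frontier:
        step += 1
        next_frontier = []
        for u in frontier:
            for v in adjacency.get(u, ()):
                # each link gets one chance to transmit, when its origin is first reached
                if v not in reached_at and rng.random() < p:
                    reached_at[v] = step
                    next_frontier.append(v)
        frontier = next_frontier
    return reached_at

blogs = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(cascade(blogs, "a", p=1.0))  # with p=1.0 the step equals the link distance from "a"
```

With `p=1.0` the process reduces to a breadth-first traversal, which makes the role of the topology explicit; lower values of `p` give one possible realization of a probabilistic spread.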
2-2) Articulation of the LEVELS of temporality (global/local dynamics)
Theoretical and technical obstacles: the intrinsic temporality of network objects / the temporality of the phenomenon studied (weak-signal detection, "background" movement, organization of actors…) / the temporality of the measurements / theoretical models of History
Example: when (and what) to probe? How regularly, and for what results?
Methodological property: mapping = making the dynamic static; measuring dynamic phenomena = introducing time into the static / the back-and-forth between static and dynamic
2-3) Topic Detection and Tracking (TDT)
TOPIC DETECTION AND TRACKING
"Time series" / queueing theory
Data elements are a function of time: D = {(t1,y1),(t2,y2),…,(tn,yn)}
Signal theory (frequency / amplitude or intensity) applied to text mining
Two-state measure (at its simplest) relative to a threshold
Multi-state measure: choice of the type of indicators, definition of the scales
TEMPORAL PATTERNS
Equal / non-equal time steps
linear (cycles) / non-linear patterns (but non-chaotic)
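A minimal sketch of the two kinds of measure named above, applied to a series D of (t, y) pairs. The function names, the threshold and the cut points are assumptions chosen for illustration; the slides do not prescribe an indicator or a scale.

```python
from bisect import bisect_left

def two_state(series, threshold):
    """Label each (t, y) as 1 when y is strictly above the threshold, else 0."""
    return [(t, 1 if y > threshold else 0) for t, y in series]

def multi_state(series, cut_points):
    """Label each (t, y) with the number of cut points strictly below y.

    cut_points must be sorted in increasing order; they define the scale.
    """
    return [(t, bisect_left(cut_points, y)) for t, y in series]

D = [(1, 2), (2, 7), (3, 12), (4, 4)]   # hypothetical daily counts for one topic
print(two_state(D, 5))          # [(1, 0), (2, 1), (3, 1), (4, 0)]
print(multi_state(D, [5, 10]))  # [(1, 0), (2, 1), (3, 2), (4, 0)]
```

The two-state case is the multi-state case with a single cut point, so the choice of scale is the only real design decision.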
2-3) Topic Detection and Tracking (TDT)
Hierarchical Structure and E-mail Streams
all the mail I sent and received during this period, unfiltered by content but excluding long files. It contains 34344 messages in UNIX mailbox format, totaling 41.7 megabytes of ascii text, excluding message headers.
Subsets of the collection can be chosen by selecting all messages that contain a particular string or set of strings; this can be viewed as an analogue of a "folder" of related messages, although messages in the present case are related not because they were manually filed together but because they are the response set to a particular query.
To give a qualitative sense for the kind of structure one obtains, Figures 2 and 3 show the results of computing bursts for two different queries using the automaton A2. Figure 2 shows an analysis of the stream of all messages containing the word "ITR," which is prominent in my e-mail because it is the name of a large National Science Foundation program for which my colleagues and I wrote two proposals in 1999-2000.
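The two-state automaton mentioned in this excerpt can be sketched as follows. This is a simplified reading of the idea, not the paper's code: gaps between messages are scored under a low-rate and a high-rate exponential model, entering the burst state costs `gamma * ln(n)`, leaving it is free, and the cheapest state sequence is found by dynamic programming. The function name, the defaults `s=2.0` and `gamma=1.0`, and the toy arrival times are assumptions.

```python
import math

def two_state_bursts(times, s=2.0, gamma=1.0):
    """Return one state per inter-arrival gap: 0 = base rate, 1 = burst.

    times: increasing message timestamps (at least two, not all equal).
    """
    gaps = [b - a for a, b in zip(times, times[1:])]
    n, total = len(gaps), sum(gaps)
    rates = (n / total, s * n / total)      # base rate and burst rate
    up_cost = gamma * math.log(n)           # cost of moving from state 0 to state 1

    def emit(state, gap):
        # negative log-density of an exponential gap under the state's rate
        return -math.log(rates[state]) + rates[state] * gap

    cost = [0.0, math.inf]                  # the automaton starts in state 0
    back = []
    for gap in gaps:
        new_cost, pointers = [], []
        for j in (0, 1):
            candidates = [cost[i] + (up_cost if j > i else 0.0) for i in (0, 1)]
            best = 0 if candidates[0] <= candidates[1] else 1
            new_cost.append(candidates[best] + emit(j, gap))
            pointers.append(best)
        cost = new_cost
        back.append(pointers)

    # trace the cheapest path backwards from the last gap
    state = 0 if cost[0] <= cost[1] else 1
    states = [state]
    for pointers in reversed(back[1:]):
        state = pointers[state]
        states.append(state)
    states.reverse()
    return states

# sparse messages, then eight in quick succession, then sparse again
arrivals = [0, 10, 20, 30, 31, 32, 33, 34, 35, 36, 37, 38, 48, 58, 68]
print(two_state_bursts(arrivals))
```

On this toy stream the eight short gaps are labeled as a single burst; raising `gamma` makes the automaton more reluctant to enter the burst state, and raising `s` demands a sharper acceleration.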
2-3) Topic Detection and Tracking (TDT)
Text Mining
2-4) Dynamics of network (temporal patterns)
The inscription of time in systems: the "invisible and continuous" time of the system / the temporality of notable events
Emergence: the "first event"
"The sudden jump in network property occurs at a 'critical state'. In random network theory, this state is <K>=1. From a mostly disconnected state, the system evolves suddenly to a single connected component"
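The jump at <K>=1 can be observed numerically. The sketch below is an illustration under stated assumptions, not a method from the slides: it draws `n * mean_degree / 2` random links between `n` nodes (an approximation of a random network with that mean degree) and measures the share of nodes in the largest connected component with a union-find structure. All names are hypothetical.

```python
import random

def largest_component_fraction(n, mean_degree, seed=0):
    """Share of nodes in the largest component of a random network."""
    rng = random.Random(seed)
    parent = list(range(n))

    def find(x):
        # follow parent links to the component root, shortening the path on the way
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _ in range(int(n * mean_degree / 2)):
        a, b = find(rng.randrange(n)), find(rng.randrange(n))
        if a != b:
            parent[a] = b                   # merge the two components

    sizes = {}
    for v in range(n):
        root = find(v)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / n

for k in (0.5, 1.0, 1.5, 2.0):
    print(k, round(largest_component_fraction(2000, k), 3))
```

Below the critical value the largest component holds only a small share of the nodes; above it, a single component quickly absorbs most of the network.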
Topology evolution (universal rules?): growth, preferential attachment
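Growth and preferential attachment can be combined in a few lines. The following is a minimal sketch, assuming each new node brings `m` links and chooses its targets with probability proportional to their current degree; the function name, the seed clique and the parameter values are illustrative assumptions.

```python
import random

def grow_network(n_nodes, m=2, seed=0):
    """Grow a network node by node; return its list of (new_node, target) links."""
    rng = random.Random(seed)
    # start from a small fully connected core of m + 1 nodes
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # every link endpoint appears once in `stubs`, so a uniform draw from it
    # selects a node with probability proportional to its degree
    stubs = [v for edge in edges for v in edge]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in sorted(targets):
            edges.append((new, t))
            stubs.extend((new, t))
    return edges

edges = grow_network(50)
degree = {}
for a, b in edges:
    degree[a] = degree.get(a, 0) + 1
    degree[b] = degree.get(b, 0) + 1
print(len(edges), max(degree.values()))
```

Early nodes tend to accumulate many more links than late ones, which is the "rich get richer" effect that the rule is meant to capture.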
2-4) Dynamics of network (temporal patterns)
Critical states / phase transition (internal factor?)
Equilibrium? Feature of spontaneous order?
Weak signal and predictability
Library of cases and methods for spotting rising/emerging curves
Memory and networks (potential reactivation of topologies/critical states)
Robustness/Vulnerability (external factor?)
Error and attack tolerance / planned organization and development?
Ordered / random (crystal/liquid)
Connected / fragmented (percolation)
Synchronized / random-phased (laser/light)
What types/degrees of correlation between external factors and phase transition?
Systemic mutations
Temporal patterns, Topic Detection and Tracking, network and human dynamics…
3) Systems, interfaces, cases
OBJECTIVES OF TIME SERIES VISUALIZATION(S) OR NETWORK EVOLUTION
Detect and validate properties of an unknown function f
Temporal behavior of data elements
When was something greatest/least?
Is there a pattern?
Are two series similar?
Do any of the series match a pattern?
Provide simpler, faster access to the series
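Several of these questions have direct computational counterparts. The sketch below is illustrative only: it assumes series are lists of (t, y) pairs sampled at the same time steps, and it uses plain Euclidean distance as the similarity measure, which is one possible choice among many. All function names are hypothetical.

```python
import math

def extremes(series):
    """Return (time of the greatest value, time of the least value)."""
    greatest = max(series, key=lambda point: point[1])
    least = min(series, key=lambda point: point[1])
    return greatest[0], least[0]

def distance(a, b):
    """Euclidean distance between two series sampled at the same time steps."""
    return math.sqrt(sum((ya - yb) ** 2 for (_, ya), (_, yb) in zip(a, b)))

def best_match(pattern, candidates):
    """Name of the candidate series closest to the pattern."""
    return min(candidates, key=lambda name: distance(pattern, candidates[name]))

pattern = [(0, 1), (1, 5), (2, 3)]
candidates = {
    "x": [(0, 1), (1, 5), (2, 4)],
    "y": [(0, 9), (1, 0), (2, 9)],
}
print(extremes(pattern))                # (1, 0)
print(best_match(pattern, candidates))  # x
```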
Model the (static) topological properties of the domain (mapping)
Distribute the measurement systems, process the data, ensure the visualization of patterns
Have predictive models or evolution scenarios available (which presupposes having tested them on several cases), articulated with the mapping
Theoretical and technical obstacles: library of cases
Example: "avian flu" as a strategic informational phenomenon
Operational model: global/local (topology, content), layer level (upper/aggregates), dynamic/static phenomena
An example in strategic monitoring: "avian flu"
Context: who is talking about H5N1 on the web? In what terms? Can the topic be located on the web? Through which channels and/or opinion relays does the information spread? Can we provide indicators of (a) location, (b) density and (c) propagation of the information associated with the topic?
Quantitative measurement of "noise" (Tendançologue-style)
Quantitative and qualitative thematic analysis (textual content)
SYNTHESIS: global/local (topology, content), layer level (upper/aggregates), dynamic/static phenomena
3 example systems
ThemeRiver: Visualizing Thematic Changes in Large Document Collections (Susan Havre, Elizabeth Hetzler, Paul Whitney, Lucy Nowell)
Interactive Visualization of Serial Periodic Data (John Carlis, Joseph Konstan)
Visual Queries for Finding Patterns in Time Series Data (Harry Hochheiser, Ben Shneiderman)
ThemeRiver: Visualizing Thematic Changes in Large Document Collections
River metaphor: Each attribute is mapped to a “current” in the “river”, flowing along the timeline
Current width ~= strength of theme
River width ~= global strength
Color mapping (similar themes – same color family)
Comparing two rivers
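The river metaphor translates into a simple stacking computation. The sketch below is an approximation for illustration, not the ThemeRiver implementation: it assumes the river is centered on the time axis, and the function name and input format are hypothetical.

```python
def river_layout(strengths):
    """Compute, for each time step, the river width and the band of each theme.

    strengths: {theme: [strength at each time step]}, all lists of equal length.
    Returns a list of {"river_width": total, "bands": {theme: (lower, upper)}}.
    """
    themes = list(strengths)
    steps = len(next(iter(strengths.values())))
    layout = []
    for i in range(steps):
        total = sum(strengths[t][i] for t in themes)  # river width = global strength
        lower = -total / 2                            # assumption: river centered on the axis
        bands = {}
        for t in themes:
            upper = lower + strengths[t][i]           # current width = strength of theme
            bands[t] = (lower, upper)
            lower = upper
        layout.append({"river_width": total, "bands": bands})
    return layout

themes = {"flu": [1, 2], "vaccine": [3, 0]}           # hypothetical theme strengths
for step in river_layout(themes):
    print(step)
```

A theme with zero strength at a given step collapses to a band of zero width, so its current visually disappears and can reappear later.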
Interactive Visualization of Serial Periodic Data
Spiral axis = serial attributes
Radii = periodic attributes
Period = 360°
Focus on pure serial periodic data (equal durations of cycles)
Simultaneous display of serial and periodic attributes (e.g. seasonality)
Traditional layouts exaggerate distance across period boundaries
Focus+Context / Zoom unsuitable
Chimpanzees: monthly food consumption, 1980-1988
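The spiral layout amounts to a change of coordinates. In this minimal sketch (names and the `ring_spacing` parameter are assumptions), one full period corresponds to 360°, so observations that are exactly one period apart fall on the same spoke, one ring further out.

```python
import math

def spiral_position(index, period, ring_spacing=1.0):
    """Map the index-th observation to (angle in degrees, radius) on a spiral."""
    turns = index / period                  # completed periods, possibly fractional
    angle_deg = (turns * 360.0) % 360.0     # position within the period
    radius = ring_spacing * turns           # position along the serial axis
    return angle_deg, radius

def to_xy(angle_deg, radius):
    """Convert polar spiral coordinates to Cartesian drawing coordinates."""
    a = math.radians(angle_deg)
    return radius * math.cos(a), radius * math.sin(a)

# monthly data with a 12-month period: index 3 and index 15 are the same
# month in consecutive years
print(spiral_position(3, 12))   # (90.0, 0.25)
print(spiral_position(15, 12))  # (90.0, 1.25)
```

This is why the layout shows serial and periodic attributes at once: reading along the spiral follows time, while reading along a spoke compares the same phase across periods.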
Interactive Visualization of Serial Periodic Data
12 common food types
Consistent ordering
Boundary lines: helpful?
112 food types
Multiple linked spirals:
2 chimpanzees
group avg size / max size
One data set at a time
One spoke at a time / animation
Dynamic query (Movie database)
Visual Queries for Finding Patterns in Time Series Data
Visualization alone is not enough (when dealing with multiple entities, e.g. stocks/genes)
identifying patterns and trends
Algorithmic/statistical methods
Intuitive tools for dynamic queries (e.g. QuerySketch)
Visual query operator for time series (e.g. 1500 stocks)
Rectangular region drawn on the timeline display
X-axis of the box = time period
Y-axis of the box = constraint on the values
Multiple timeboxes = conjunctive queries
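The timebox operator described above can be sketched as a filter. This is an illustration, not TimeSearcher code: a timebox is assumed to be a tuple `(t_start, t_end, y_min, y_max)`, an entity matches a box when all of its values in that time period lie within the value range, and several boxes are combined with a logical AND. Names and data are hypothetical.

```python
def in_timebox(series, timebox):
    """True if every value of the series inside the time period respects the constraint."""
    t_start, t_end, y_min, y_max = timebox
    inside = [y for t, y in series if t_start <= t <= t_end]
    # assumption: a series with no sample in the period does not match
    return bool(inside) and all(y_min <= y <= y_max for y in inside)

def query(entities, timeboxes):
    """Names of the entities that satisfy all timeboxes (conjunctive query)."""
    return [name for name, series in entities.items()
            if all(in_timebox(series, box) for box in timeboxes)]

stocks = {
    "A": [(0, 10), (1, 12), (2, 30)],
    "B": [(0, 10), (1, 25), (2, 30)],
}
boxes = [(0, 1, 5, 15), (2, 2, 25, 35)]   # low during steps 0-1, high at step 2
print(query(stocks, boxes))               # ['A']
```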
Visual Queries for Finding Patterns in Time Series Data
Entity display window
Query space
Controlling multiple boxes together
Query by example
Linked updates between views
http://www.cs.umd.edu/hcil/timesearcher/
http://cdc25.biol.vt.edu/Pubs/TysonNR.pdf