Work-in-Progress Presentations
These are presentations by 2nd year MAPi students and by PhD students from other programs of the MAP universities.
Current accepted/received talks:
- Geocast routing for Vehicular Ad-Hoc Networks, Rui Meireles (MAPi)
- Certifying Compiler for C with Fat pointers, Miguel Santos Silva (MAPi)
- Kolmogorov Complexity and Entropy Measures, Andreia Teixeira (MAPi)
- An Infrastructure for Experience Centered Agile Prototyping of Ambient Intelligence, José Luís Silva
- Focused Semi-Automatic Content Retrieval for Persistent Information Needs, Nuno Filipe Escudeiro
- Interoperability in Pedagogical e-Learning Services, Ricardo Queirós
- Learning from Ubiquitous Data Streams, Pedro Rodrigues
- Runtime Patching, Eduardo Marques
- Strategies for Discovering Network Motifs in Complex Networks, Pedro Ribeiro
- Clouder: A Flexible Large Scale Decentralized Object Store, Ricardo Vilaça (MAPi)
- Computer Assisted Diagnosis for Gastroenterology, Farhan Riaz
- An Ontology-based Approach to Model-Driven Software Product Lines, Nuno Ferreira (MAPi)
- Formalization and mechanization of Kleene algebras in Coq, David Pereira (MAPi)
- Inception of Software Validation and Verification Practices within CMMI Level 2, Paula Monteiro (MAPi)
- Command and control base console for heterogeneous vehicles, Rui Gonçalves (MAPi)
- Gossip-based Service Coordination for Scalability and Resilience, Filipe José Campos (MAPi)
- Self-managing service platform, Nuno Carvalho (MAPi)
- Process Architecture for Multimodel Environments, André Ferreira (MAPi)
- A Cooperation Infrastructure for Heterogeneous Vehicles and Sensors, José Carlos Pinto (MAPi)
- Data Warehouses in the Path from Databases to Archive, Arif Ur Rahman (MAPi)
- Electronic Health Record for Mobile Citizens, Tiago Pedrosa (MAPi)
- Adaptive Object-Modelling: Patterns, Tools and Applications, Hugo Ferreira (MAPi)
- Robust Distributed Data Aggregation, Paulo Jesus (MAPi)
- Time Series Data Mining, Marco Castro (MAPi)
- Visualizing and Slicing Annotated Programs, Daniela Cruz (MAPi)
- Construction of a Local Domain Ontology from News Stories, Brett Drury (MAPi)
- An artificial immune system framework for temporal anomaly detection
-
Title: Geocast routing for Vehicular Ad-Hoc Networks
Author: Rui Meireles, MAPi 2nd Year
Advisors: João Barros (IT/FEUP), Michel Ferreira (IT/FCUP), Peter Steenkiste (CMU)
-
Title: Certifying Compiler for C with Fat pointers
Author: Miguel Santos Silva, MAPi 2nd Year
Advisors: Mário Florido (LIACC/FCUP) and Karl Crary (CMU)
Abstract:
-
Title: Kolmogorov Complexity and Entropy Measures
Author: Andreia Teixeira, MAPi 2nd Year
Advisors: Armando Matos (LIACC/FCUP) and Luís F. Antunes (IT/FCUP)
-
Title: An Infrastructure for Experience Centered Agile Prototyping of Ambient Intelligence
Author: José Luís Silva, PhD Student at UM (non-MAPi)
Advisors: José Creissac Campos (DI/UM) and Michael D. Harrison (U. Newcastle)
-
Title: Focused Semi-Automatic Content Retrieval for Persistent Information Needs
Author: Nuno Filipe Escudeiro, PhD student PRODEI
Advisors: Alípio Jorge (LIAAD/FCUP) and Rui Camacho (LIAAD/FEUP)
-
Title: Interoperability in Pedagogical e-Learning Services
Author: Ricardo Queirós, PhD Student FCUP, non-MAPi
Advisors: José Paulo Leal (CRACS/FCUP)
Abstract:
The ultimate goal of this research is to improve the learning experience of students through the combination of pedagogical eLearning services. Service-oriented architectures are already being used in eLearning, but in this work the focus is on services of pedagogical value rather than on generic services adapted from other business systems. This approach to the architecture of eLearning platforms raises challenges addressed by this work, namely: conceptual modeling of the pedagogical eLearning services domain; interoperability and coordination of pedagogical eLearning services; conversion of existing eLearning systems to pedagogical services; and adaptation of eLearning services to individual learners. An improved eLearning platform will incorporate learning tools adequate to the domains it covers and will focus on the individual learner that uses it. With this approach we expect to raise the pedagogical value of eLearning platforms.
-
Title: Learning from Ubiquitous Data Streams
Author: Pedro Rodrigues, PhD Student FCUP, Non-MAPi
Advisors: João Gama (LIAAD/FEP) and Luís Lopes (CRACS/FCUP)
Abstract:
Today's data flows continuously from streams at high speed, producing examples over time, which forces a traditional data gathering process to create databases of potentially unbounded length. Also, data gathering and analysis have become ubiquitous, in the sense that our world is evolving into a setting where all devices, however small, will include sensing and processing abilities. Thus, if data is to be gathered centrally, this scenario also points to databases of potentially unbounded width. Hence, new techniques must be defined, or known methods adapted, to deal with this new ubiquitous streaming setting. This talk will cover a few topics targeted by the speaker in his PhD work over the past two years, concerning centralized and distributed data stream mining, pointing out different contributions and some paths for future developments.
-
Title: Runtime Patching
Author: Eduardo Marques, PhD Student FCUP, non-MAPi
Advisors: Luís Lopes (CRACS/FCUP) and Christoph Kirsch (Univ. Salzburg, Austria)
Abstract:
Programs typically require updates over their life cycle, and in modern software these updates are increasingly complex and dynamic. The simplest approach is to interrupt the program and replace it with a patched version through an offline process, but this compromises any high-availability and mission-critical requirements the program may have. To deal with these aspects, we consider runtime patching: the update of a program at runtime, in which the transition between old and new functionality occurs in a semantics-preserving and incremental manner.
-
Title: Strategies for Discovering Network Motifs in Complex Networks
Author: Pedro Ribeiro, PhD Student FCUP, non-MAPi
Advisors: Fernando Silva and Luís Lopes (CRACS/FCUP)
Abstract:
Complex networks appear in many fields of science. One possible way to study and characterize their structural properties is to find "network motifs", that is, sub-networks which are over-represented (and thus can have special functional meaning). This concept has gained the attention of the research community, and a multitude of variations and methods have been developed. Here we will define exactly what "network motifs" are and briefly describe which algorithms and strategies currently exist, with emphasis on how they compare to one another. Finding "network motifs" is a computationally hard problem and we will show how one could improve the existing methods. We will focus on parallel approaches and novel data structures that could lead to substantial efficiency gains on this problem. We will also give some preliminary results of our work.
-
Title: Clouder: A Flexible Large Scale Decentralized Object Store
Author: Ricardo Vilaça, MAPi 2nd Year
Advisors: Rui Oliveira (CCTC/UM)
Abstract:
The current exponential growth of data calls for massive scale capabilities of storage and processing. Such large volumes of data tend to rule out centralized storage and processing, making extensive and flexible data partitioning unavoidable. This is being acknowledged by several major Internet players embracing the Cloud computing model and offering first generation remote storage services with simple processing capabilities. In this work we present preliminary ideas for the architecture of a flexible, efficient and dependable fully decentralized object store able to manage very large sets of variable size objects and to coordinate in-place processing. Our targets are large local-area computing facilities composed of tens of thousands of nodes under the same administrative domain. The system should be capable of leveraging massive replication of data to balance read scalability and fault tolerance.
-
Title: Computer Assisted Diagnosis for Gastroenterology
Author: Farhan Riaz, PhD Student FCUP, non-MAPi
Advisors: Miguel Coimbra (IT/FCUP) and Mario Dinis Ribeiro (FMUP)
Abstract:
Endoscopy of the gastrointestinal (GI) tract is a medical procedure used for the diagnosis of cancer in patients. Because GI cancer is hard to diagnose at the initial stages, owing to unclear health conditions and a time-consuming and cumbersome diagnostic procedure, it is believed that a Computer Aided Diagnosis (CAD) system could help in the fast screening of patients, which might result in early diagnosis and hence cure of GI cancer. An early study into the design of such a CAD system suggests that segmentation of endoscopic images is required. This step was validated by physicians, who provided us with annotations that we refer to as the "gold-set of annotations of Endoscopic images". In our initial study on chromoendoscopy images, we found that state-of-the-art segmentation algorithms provide a very reasonable approximation of the image annotations.
-
Title: An Ontology-based Approach to Model-Driven Software Product Lines
Author: Nuno Ferreira, MAPi PhD Student
Advisors: Ricardo Machado (Algoritmi/UM) and Dragan Gasevic (U. Athabasca)
Abstract:
Software development in highly variable domains, constrained by tight regulations and involving many business concepts, results in applications that are hard to deliver and maintain, due to the complexity of dealing with the large number of concepts provided by the different parties and systems involved in the process. One way to tackle these problems is through combining software product lines and model-driven software development supported by ontologies. Software product lines and model-driven approaches would promote reuse of the software artifacts and, if supported by an ontological layer, those artifacts would be domain-validated. We intend to create a new conceptual framework for software development with domain-validated models in highly variable domains. To define such a framework we will propose a model that relates several dimensions and areas of software development through time and abstraction levels. This model would guarantee to the software house traceability of components, domain-validated artifacts, and easy to maintain and reusable components, due to the relations and mappings we propose to establish in the conceptual framework between the software artifacts and the ontology.
-
Title: Formalization and mechanization of Kleene algebras in Coq
Author: David Pereira, MAPi 2nd Year
Advisors: Nelma Moreira (LIACC/FCUP) and Simão Sousa (LIACC/UBI)
Abstract:
Kleene algebra (KA) [2] is an algebraic system that axiomatically captures properties of several important structures arising in Computer Science, and has been applied in several contexts such as automata and formal languages, semantics and logics of programs, and the design and analysis of algorithms, among others. A KA is an algebraic structure K = (K, 0, 1, +, ·, ∗) such that (K, 0, 1, +, ·) is an idempotent semiring and where the operator ∗ (Kleene’s star) is characterized by a set of axioms. There are several ways of axiomatizing a KA. Here we follow the work presented by Dexter Kozen in [3]. The axiomatization we are going to consider has the advantage of being sound over non-standard interpretations, and leads to a complete deductive system for the universal Horn theory of KA. In particular, it leads to a decision procedure for reasoning equationally in KA, as the equational theories of several classes of KA are the same and equal to that of regular expressions.
Kleene algebra with tests (KAT) [4] extends KA with an embedded Boolean algebra and is particularly suited for the formal verification of propositional programs. In particular, KAT subsumes propositional Hoare logic [5], a weaker kind of Hoare logic without the assignment axiom. The deductive rules of propositional Hoare logic can be encoded as KAT expressions, and these expressions are shown to be KAT theorems.
In this work we describe an ongoing formalization of the previous algebraic systems in the Coq [1] theorem prover. The current version of our formalization includes the encoding of KA and KAT themselves, plus the formalizations of regular languages, of regular expressions, and of propositional Hoare logic. In particular we proved that regular expressions are the standard model of KA, and that propositional Hoare logic deductive rules are theorems of KAT’s equational theory.
Our contribution ends with an overview of a decision procedure for regular expression equivalence (based on regular expression derivatives), and of how such a decision procedure can be encoded as a Coq proof tactic that automatically proves KAT equalities. We envision the usage of such a tactic together with the encoded theories for KAT as the formal system in which we can encode and prove proof obligations for the mechanization and automation of formal program verification, in the context of the Proof Carrying Code paradigm.
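To make the idea of deciding regular expression equivalence through derivatives more concrete, here is a minimal Python sketch. It is not the Coq tactic described in the abstract: the tuple encoding, the function names and the simplification rules are assumptions chosen for illustration. Two expressions are taken to be equivalent when every pair of derivatives reachable from them agrees on whether it accepts the empty word.

```python
# Regular expressions as nested tuples (a hypothetical encoding for this sketch).
EMPTY = ("empty",)   # the empty language, 0
EPS = ("eps",)       # the empty word, 1

def sym(c):
    return ("sym", c)

def alt(r, s):
    # Union, kept as a flat set so that +, is associative, commutative, idempotent.
    parts = set()
    for t in (r, s):
        if t[0] == "alt":
            parts |= t[1]
        elif t != EMPTY:
            parts.add(t)
    if not parts:
        return EMPTY
    if len(parts) == 1:
        return next(iter(parts))
    return ("alt", frozenset(parts))

def cat(r, s):
    if r == EMPTY or s == EMPTY:
        return EMPTY
    if r == EPS:
        return s
    if s == EPS:
        return r
    return ("cat", r, s)

def star(r):
    if r in (EMPTY, EPS):
        return EPS
    return r if r[0] == "star" else ("star", r)

def nullable(r):
    # True when the language of r contains the empty word.
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "sym"):
        return False
    if tag == "alt":
        return any(nullable(t) for t in r[1])
    return nullable(r[1]) and nullable(r[2])  # cat

def deriv(r, c):
    # Derivative of r with respect to the symbol c.
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "sym":
        return EPS if r[1] == c else EMPTY
    if tag == "alt":
        out = EMPTY
        for t in r[1]:
            out = alt(out, deriv(t, c))
        return out
    if tag == "cat":
        left = cat(deriv(r[1], c), r[2])
        return alt(left, deriv(r[2], c)) if nullable(r[1]) else left
    return cat(deriv(r[1], c), r)  # star

def equivalent(r, s, alphabet):
    # Explore pairs of derivatives; a nullability mismatch is a counterexample.
    seen, todo = set(), [(r, s)]
    while todo:
        pair = todo.pop()
        if pair in seen:
            continue
        seen.add(pair)
        a, b = pair
        if nullable(a) != nullable(b):
            return False
        todo.extend((deriv(a, c), deriv(b, c)) for c in alphabet)
    return True

a, b = sym("a"), sym("b")
print(equivalent(star(alt(a, b)), star(cat(star(a), star(b))), "ab"))
print(equivalent(star(a), cat(a, star(a)), "a"))
```

The first call checks the familiar identity (a + b)* = (a* b*)*, and the second compares a* with a a*, which differ on the empty word. Keeping unions as flat sets is what keeps the number of distinct derivatives finite in this sketch.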
-
Title: Inception of Software Validation and Verification Practices within CMMI Level 2
Author: Paula Monteiro, MAPi 2nd Year
Advisors: R. Machado (Algoritmi/UM) and R. Kazman
Abstract:
Validation and verification are mandatory activities that software companies must perform when developing software products with a high degree of quality. CMMI (Capability Maturity Model Integration), the software process maturity model developed by the Software Engineering Institute, is a Software Process Improvement maturity model composed of several Process Areas (PAs), which are grouped by Maturity Level (ML). Companies can achieve each of the 5 Maturity Levels by implementing the set of Process Areas required by that level. More and more companies are becoming aware that adopting CMMI can be a way to develop quality software. However, some companies are reluctant to adopt CMMI Maturity Level 2 because they do not consider this Maturity Level a benefit, since its implementation is expensive and does not cover the Validation and Verification efforts. The simultaneous adoption of CMMI Maturity Level 2 and the Validation and Verification Process Areas (from Maturity Level 3) lacks methodological recommendations, since some dependencies exist between those two CMMI Maturity Levels. This thesis will propose an approach to reconcile Validation and Verification practices with CMMI Maturity Level 2, adopting the ISO/IEC 29119 standard to provide a product lifecycle perspective. We start this work by studying the implementation of CMMI Maturity Level 2 together with some Process Areas from Maturity Level 3.
We are analyzing the dependencies between CMMI Process Areas to evaluate the impact of combining Process Areas from different Maturity Levels. The official CMMI documentation does not give a global view of the dependencies between all the CMMI Process Areas; it only shows the dependencies of each PA independently. To obtain this global view, we created a matrix that contains all the dependencies and a set of graphs that represent the information stored in the matrix. With this study we want to determine which Process Areas must be taken into account when implementing the Validation and Verification Process Areas simultaneously with CMMI Maturity Level 2.
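As a rough illustration of how per-PA dependency lists can be merged into one global matrix and then queried, consider the Python sketch below. The dependency edges are invented for the example and are not taken from the CMMI documentation or from the author's matrix; only the Process Area acronyms are real.

```python
def build_matrix(deps):
    """Turn per-PA dependency lists into a square 0/1 matrix."""
    names = sorted(set(deps) | {d for ds in deps.values() for d in ds})
    index = {name: i for i, name in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    for pa, needed in deps.items():
        for dep in needed:
            matrix[index[pa]][index[dep]] = 1  # row depends on column
    return names, matrix

def required(names, matrix, targets):
    """All PAs reachable from the targets by following dependencies."""
    index = {name: i for i, name in enumerate(names)}
    seen, stack = set(), [index[t] for t in targets]
    while stack:
        i = stack.pop()
        if i in seen:
            continue
        seen.add(i)
        stack.extend(j for j, flag in enumerate(matrix[i]) if flag)
    return sorted(names[i] for i in seen)

# Hypothetical dependencies, for illustration only.
deps = {
    "VER": ["RD", "CM"],
    "VAL": ["RD"],
    "RD": ["REQM"],
    "PP": ["REQM"],
    "REQM": [],
    "CM": [],
}
names, matrix = build_matrix(deps)
print(required(names, matrix, ["VER", "VAL"]))
```

With these made-up edges, asking for VER and VAL pulls in CM, RD and REQM but not PP, which is the kind of global question a single-PA view cannot answer directly.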
-
Title: Command and control base console for heterogeneous vehicles
Author: Rui Gonçalves, MAPi 2nd Year
Advisors: F. Pereira and G. Gonçalves (FEUP)
Abstract:
The importance of unmanned vehicles in a wide variety of applications in many different environments has been recognized for many years. As a result, the increasing level of automation creates a context in which the operators' understanding of the system is hardly taken into account as a way to improve its operation. However, given the current state of the art of the underlying technologies, this understanding is, in general, an extremely valuable component for meeting requirements and/or ensuring competitiveness. In order to tackle this issue, a recent concept in the design of person-machine interfaces, called tele-presence, has been emerging that allows operators to optimally use their judgment to make inferences about the system and the environment where it stands. Such an interface enables a clear view of, and presence in, the remote environment by bringing the operator very much inside the control loop, with the consequent improved redistribution of tasks. This type of architecture leads us to the analysis of a virtual reality environment and of an interface to the controller of all hardware elements, including the vehicles' sensor payloads.
This presentation will show the work that has been developed on a command and control software framework (Neptus). Neptus is a modular mixed-initiative framework (human operators in the control loop) for the operation of heterogeneous teams of vehicles such as autonomous and remotely operated underwater, surface, land, and air vehicles. Neptus is composed of mission and vehicle planning, supervision/execution, and post-mission analysis modules which are provided as services across a network. This presentation focuses mainly on the mission supervision/execution module, presenting an XML-based language for defining operation consoles and the underlying plug-in architecture.
For this framework, an architecture was developed to easily build consoles capable of handling several types of unmanned vehicles. These consoles are built from components that can be included in the framework as plugins. Examples of some of the developed console components will be given, along with a special example of component combination. This example will show how to combine the information distribution system (in the consoles) with the 3D world representation component and the video feed component in order to georeference video images, captured by vehicle onboard cameras, in the 3D world view. All these methodologies will be presented in line with the proposed main vision: to offer a new standard of methodologies for command and control software with different control levels for different types of vehicles in collaborative missions.
-
Title: Gossip-based Service Coordination for Scalability and Resilience
Author: Filipe José Campos, MAPi 2nd Year
Advisors: José Orlando Pereira (CCTC/UM)
Abstract:
Many interesting emerging applications involve the coordination of a large number of service instances, whether as targets or sources for information delivery or retrieval, respectively. These applications raise hard architectural, scalability, and resilience issues that are not suitably addressed by centralized or monolithic coordination solutions.
In order to achieve increasingly complex behaviors and to provide the desired fault tolerance guarantees in a service-oriented environment, we intend to take advantage of current standards and to produce similarly useful protocols from existing protocols used to achieve fault tolerance in distributed systems.
With this in mind, we have proposed a lightweight approach to service coordination aimed at such application scenarios, which is based on gossiping and thus potentially fully decentralized, requiring that each participant is concerned only with a small number of peers.
We show how our approach (WS-PushGossip), a proof-of-concept coordination protocol based upon the WS-Coordination framework, compares to the current publish-subscribe web services standard (WS-Notification) in terms of resource consumption and fault tolerance. At present, we have an implementation but still have to improve and evaluate it.
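The decentralized flavour of gossiping, where each participant only talks to a small number of peers, can be sketched with a toy push-gossip simulation in Python. This is not WS-PushGossip: the function name, the fanout parameter and the round-based model are assumptions made for illustration.

```python
import random

def push_gossip(n, fanout, max_rounds, seed=0):
    """Simulate push gossip among n nodes; return the informed count per round."""
    rng = random.Random(seed)   # seeded so that runs are repeatable
    informed = {0}              # node 0 starts with the message
    history = [len(informed)]
    for _ in range(max_rounds):
        if len(informed) == n:
            break
        newly = set()
        for node in informed:
            # Each informed node pushes to a few peers chosen at random.
            peers = [p for p in range(n) if p != node]
            newly.update(rng.sample(peers, min(fanout, len(peers))))
        informed |= newly
        history.append(len(informed))
    return history

history = push_gossip(n=64, fanout=3, max_rounds=20)
print(history)
```

No node needs a global view or a central coordinator here; each one only contacts `fanout` peers per round, which is the property the abstract relies on for scalability.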
-
Title: Self-managing service platform
Author: Nuno Carvalho, MAPi 2nd Year
Advisors: José Orlando Pereira (CCTC/UM)
Abstract:
The increasing complexity and size of a typical distributed system makes it very hard to understand the role of each of its components, such as middleware layers and physical resources, and the impact of their interactions on overall system performance. It is difficult to model the behavior of large distributed systems, both in terms of the number of components and of their reactions to high loads. This lack of understanding leads to incorrect component models, with missing dependencies and interactions, that severely limit the applicability of autonomic management methods.
In order to correctly model the behaviour of systems, we need to know exactly how a system reacts to actions. To achieve that, we performed several experimental analyses, covering distributed systems using various platforms and frameworks, the MySQL TPC-C benchmark, and the stability of network protocols under different system loads. These experiments give us valuable insights into how to correlate system behaviour with input actions, and therefore into how to model them. The overall results were (1) a framework capable of identifying and characterizing the system components, generating a model that mimics the system; and (2) a simulator that animates the previously generated model.
-
Title: Process Architecture for Multimodel Environments
Author: André Ferreira, MAPi 2nd Year
Advisors: Ricardo Machado (Algoritmi/UM)
Abstract:
In recent years the number of organizations combining different improvement models is increasing. The main challenge for these organizations is the interoperability of these models. An incorrect integration approach diminishes efficiency and effectiveness of combining different approaches. Additionally, the integration process is considered complex and the resulting system complexity is also considerable. An architecture aims to provide an abstraction to deal with complexity in the design and implementation of complex systems. The concept of architecture applied in the design of systems of processes is considered an emergent research topic. The current understanding of architectural characteristics is insufficient to inform effective process composition and architecture. With the research being carried out we propose to deliver an extensive study of one of these multimodel environments and provide an analysis of identified architectural building blocks and their relationships.
Our approach involves a thorough analysis of the current Quality Management System of the organization (Critical Software). We are using Rummler-Brache swim-lane diagrams to expose the existing inter-process structural relations. Rummler-Brache swim-lane diagrams provide a method for plotting and tracing processes, in particular the interconnections between processes and relevant organizational structures. During the study, numeric data will be collected that characterizes existing process components and relations. Additionally, we aim to categorize the types of relations that exist in the overall system of processes.
Our goal is to understand how improvement models were interpreted and which structuring elements currently support the organizational Quality Management System.
-
Title: A Cooperation Infrastructure for Heterogeneous Vehicles and Sensors
Author: José Carlos Pinto, MAPi 2nd Year
Advisors: F. Pereira and J. Sousa (FEUP)
Abstract:
We present the ongoing efforts to create a verified infrastructure for networks of heterogeneous vehicles and sensors as well as human operators. In our approach, we use a service-oriented architecture, achieving cooperation by means of service orchestration.
Recent developments in the fields of electronics, communications and computing have significantly lowered the production cost of wireless sensors and unmanned vehicles which are being used in multiple applications. Moreover, these individual wireless sensors and unmanned vehicles can also be part of larger networks for use in scenarios like remote sensing and surveillance.
There is an increased interest in using multiple vehicles and sensors simultaneously because the efficiency, in terms of spatial and/or temporal resolution of acquired data, tends to increase linearly with the number of nodes in the network. Along with node multiplicity, we are also pursuing node heterogeneity. Different vehicles and sensors may be adapted to register different environment variables, to fulfil a specific set of tasks or to use different communication means.
By having cooperative behaviour among heterogeneous network nodes, the number of applications of these systems increases even further. Consider, for instance, scenarios like search and rescue operations (some vehicles tailored for aerial search and others for rescue on the ground), border patrolling (static wireless cameras register movements while unmanned air vehicles follow targets), automated warfare (vehicles with different weaponry, vehicles for battle damage assessment, ...) and adaptive sensing (when a feature is detected, a vehicle tailored for that kind of inspection is sent to the location), among others.
In our envisioned system, cooperation is made possible by service requisition/provision and service orchestration. Sensors, vehicles and human operators provide a specific (time-varying) set of services and may request services from other nodes in the network. The set of services that can be provided at each instant of time may vary with the nodes’ internal states and also with the network topology, since some services may be an orchestration of other services in the network.
In order to create a verified (networked) system, one must extract system properties like reachability (the system will produce the desired result at some point), liveness (the system continually provides a minimal set of services) and robustness (the system will never reach an erroneous state). To do this verification we are developing both simulation platforms (simplified dynamics and communication models) and formal models based on Dynamic Input-Output Automata, Pi-Calculus, Graph Transformation Systems and Hybrid Automata.
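As a small illustration of what checking such properties can look like on a finite model, the Python sketch below explores a toy transition system by breadth-first search. The state names and transitions are hypothetical and far simpler than the automata-based and process-calculus models mentioned above; the point is only that "the desired result is reachable" and "an erroneous state is never reached" both reduce to questions about the set of reachable states.

```python
from collections import deque

def reachable(transitions, start):
    """Return every state reachable from start in a finite transition system."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in transitions.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Hypothetical life cycle of one service request in the network.
transitions = {
    "idle": ["requested"],
    "requested": ["orchestrating", "idle"],
    "orchestrating": ["delivered", "requested"],
    "delivered": ["idle"],
    "stalled": ["idle"],   # erroneous state with no incoming transition
}

states = reachable(transitions, "idle")
print("delivered" in states)   # the desired result can be produced
print("stalled" not in states) # the erroneous state is never entered
```

Real models of vehicles and sensors have continuous dynamics and changing topologies, so exhaustive exploration like this only applies to heavily abstracted versions of the system.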
-
Title: Data Warehouses in the Path from Databases to Archive
Author: Arif Ur Rahman, MAPi 2nd Year
Advisors: Gabriel David and Cristina Ribeiro (FEUP)
Abstract:
Organizations are increasingly relying on databases as the main component of their record keeping systems. However, as the amount and detail of information contained in such systems grows, so does the concern that in a few years most of it may be lost, when the current hardware, operating systems, database management systems (DBMS) and applications become obsolete and render the data repositories unreadable. It is very important to take steps to keep the data safe from changes in technology.
In the last academic year I have been working on a case study involving the ‘Course Evaluation System’ of the Faculty of Engineering, University of Porto (FEUP). It involved two major steps: migrating the database from a relational to a dimensional model, and explicitly storing the derived information embodied in code. This case study, a model migration approach for database preservation, was submitted to a conference (DEXA09) but was unfortunately not accepted. I have improved it according to the comments received and will submit it again to a conference next month.
I also studied the strategies and standards already established for digital records preservation, and found that this existing work can be helpful for database preservation. I have submitted a paper, ‘Applying electronic records standards to database preservation’, to iPRES09.
Currently, I am working on another part of the case study, which involves the database of the Human Resources department of FEUP. This case study will lead to the development of transformation rules which will be useful for migrating a database from a relational to a dimensional model.
The study of established digital preservation strategies also gave me some ideas about points which, if taken into account while developing a database, will make it easier to preserve the database in the future. I will soon start work on this and submit a paper to a conference.
-
Title: Electronic Health Record for Mobile Citizens
Author: Tiago Pedrosa, MAPi 2nd Year
Advisors: J.L. Oliveira, Rui Lopes (IEETA/UA)
Abstract:
An Electronic Health Record allows gathering information related to citizens' medical events, and is nowadays a key component for efficiently exploiting information technologies in health care institutions.
Despite the political and technical efforts to deploy it, there are still several shortcomings that impede its wide adoption, namely the coexistence of dissimilar and incompatible health information systems and the absence of a unique central repository for personal medical data. The heterogeneity of systems and types of records used, as well as the number of different institutions in different areas and countries, create the need for a secure aggregation of dispersed data.
To tackle this problem we have been investigating the construction of a Virtual Health Card System (VHCS). This VHCS can be seen as the migration of a physical smart card to online services that are responsible for presenting the user credentials to health systems as needed, providing single sign-on behavior. It also provides a way to save references to the dispersed health data and control access to it, promoting a virtual integration of medical records.
-
Title: Adaptive Object-Modelling: Patterns, Tools and Applications
Author: Hugo Sereno Ferreira, MAPi 2nd Year
Advisors: Ademar Aguiar and Pascoal Faria (INESC-Porto/FEUP)
Abstract:
-
Title: Robust Distributed Data Aggregation
Author: Paulo Jesus, MAPi 2nd Year
Advisors: Carlos Baquero and Paulo Sérgio Almeida (CCTC/UM)
Abstract:
Data aggregation is a fundamental technique in the design of efficient sensor networks and scalable systems in general. It enables the determination of meaningful system properties (e.g. network size; total storage capacity; average load; or majorities) in decentralized settings, directing the execution of the system. Robbert Van Renesse defined aggregation as "the ability to summarize information", stating that "it is the basis for scalability for many, if not all, large networking services". In a nutshell, data aggregation is considered a subset of information fusion, aiming at reducing (summarizing) the handled data volume.
Although apparently simple, aggregation has revealed itself to be a hard and interesting problem, especially when seeking solutions in distributed settings, where no single element holds a global view of the whole system. In the current state of the art, several algorithms address this problem from diverse approaches, exhibiting different characteristics in terms of accuracy, time and communication tradeoffs. In particular, most of the existing approaches to data aggregation lack reliability and robustness against faults, which is often a major concern in distributed networks (e.g. wireless sensor networks).
This research work focuses on the construction of robust aggregation algorithms in dynamic settings, giving particular attention to fault tolerance. It contributes to the resolution of important practical issues, such as fault tolerance and network changes, with the aim of enabling the effective practical application of a robust data aggregation scheme.
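To illustrate how an aggregate such as the average load can be computed without any node holding a global view, here is a minimal Python sketch of pairwise gossip averaging. It is a textbook-style scheme assumed for illustration, not one of the robust algorithms this work proposes; the function name and the load values are made up.

```python
import random

def gossip_average(values, exchanges, seed=0):
    """Repeatedly let two random nodes replace their values by their mean."""
    rng = random.Random(seed)   # seeded so that runs are repeatable
    state = list(values)
    for _ in range(exchanges):
        i, j = rng.sample(range(len(state)), 2)
        mean = (state[i] + state[j]) / 2
        state[i] = state[j] = mean   # the sum over all nodes is unchanged
    return state

loads = [10.0, 0.0, 4.0, 6.0, 20.0, 2.0, 8.0, 14.0]   # hypothetical node loads
estimates = gossip_average(loads, exchanges=200)
print(sum(loads) / len(loads))
print(estimates)
```

Each exchange preserves the total, so the local values drift towards the global average. The same invariant also hints at why faults are a concern: in this simple scheme, an exchange that only one of the two nodes completes would change the total, and the nodes would then converge to a wrong value.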
-
Title: Time Series Data-Mining
Author: Marco Castro, MAPi 2nd Year
Advisors: Paulo Azevedo (CCTC/UM)
Abstract:
Data Mining or Knowledge Discovery in Databases (KDD) is an important area of computer science. The relevance of this area is due to the enormous quantity of information produced daily by different sources, for instance the web, biological processes, finance, the aeronautic industry, retail, and telecommunications data. A random sample of 4,000 graphics from 15 of the world’s newspapers published from 1974 to 1989 found that more than 75% of all graphics were time series. Examples of time series are the variation of a stock index, the consumption of a certain good, the daily blood pressure of an individual, the annual rainfall in a city, the number of web page hits per second, the brain electrical activity of a patient measured at 256 Hz in an electroencephalogram (EEG), the motion of a blob in a football player's right foot as he shoots the ball, and the daily evolution of the oil price.
Time series data present characteristics that make their analysis particularly difficult. The large amount of data involved dooms any attempt at manual analysis to failure. Automatic analysis is also hard because we are typically dealing with massive data sets. For example, one hour of an electrocardiogram produces one gigabyte (GB) of data, a typical weblog produces 5 GB a week, and some existing databases, such as the Macho database, hold terabytes of information and grow at a rate of a gigabyte per day.
The task of finding previously unknown frequent patterns in time series is of particular importance. Also known as "recurrent patterns", "typical shapes", "similar patterns", or just "motifs", these time series subsequences usually describe the time series at hand, providing valuable insight to the domain expert. For example, in EEG time series a motif may be a pattern that usually precedes a seizure; in DNA it may be a sequence of symbols that has been preserved through evolution; in music, it may be a specific sequence of rhythms; in telecommunications, it may be a typical burst in traffic which happens when major social events take place near an antenna.
In our work, we investigate the time series motif discovery problem. Namely, we have been working on multivariate motif discovery, developing a scalable algorithm with few parameters. Our future goals are to improve the understanding of time series similarity measures and motif evaluation measures. Finally, we also intend to tackle the streaming time series case and to use the discovered motifs as building blocks for other data mining tasks. We hope these accomplishments will have a significant impact on the time series data mining community.
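To make the notion of a motif concrete, the following is a minimal sketch of motif-pair discovery on a univariate series by exhaustive search. It is not the scalable multivariate algorithm mentioned in the abstract; the function names, the choice of z-normalized Euclidean distance, and the exclusion of overlapping windows are all assumptions made for illustration.

```python
import math


def znorm(window):
    """Z-normalize a window so that shape, not offset or scale, is compared."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    if std == 0:
        return [0.0] * n  # constant window: no shape information
    return [(x - mean) / std for x in window]


def find_motif_pair(series, m):
    """Return (distance, i, j) for the closest pair of non-overlapping
    subsequences of length m, found by exhaustive comparison.

    Hypothetical helper for illustration; cost grows as O(n^2 * m).
    """
    windows = [znorm(series[i:i + m]) for i in range(len(series) - m + 1)]
    best = (math.inf, -1, -1)
    for i in range(len(windows)):
        # Start at i + m to skip overlapping (trivial) matches.
        for j in range(i + m, len(windows)):
            d = math.dist(windows[i], windows[j])
            if d < best[0]:
                best = (d, i, j)
    return best


if __name__ == "__main__":
    pattern = [0, 2, 5, 2, 0, -3]
    # The second occurrence is a shifted and scaled copy of the first.
    series = ([1, 4, 2, 7] + pattern + [3, 9, 1, 8, 2]
              + [10 + 2 * x for x in pattern] + [6, 0, 5])
    print(find_motif_pair(series, len(pattern)))
```

The quadratic cost of this exhaustive search is what motivates scalable alternatives for the massive data sets described above.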
-
Title: Visualizing and Slicing Annotated Programs
Author: Daniela Cruz, MAPi 2nd Year
Advisors: Pedro Rangel Henriques and Jorge Sousa Pinto (CCTC/UM)
Abstract:
In the last few years, concern with the correctness of programs has grown with developments in the area of Program Verification (an approach to formally specifying and automatically validating programs). This concern leads programmers to enrich programs with annotations, in order to guarantee their correct behavior.
In this talk we adapt the idea of postcondition-based slicing to the scope of program verification systems and annotation languages that adhere to design-by-contract principles. In this direction, we introduce the definition of a contract-based slice.
This allows us to address the problem of combining program slicing with annotated components in a rigorous and efficient way. Applications include both verification and classic applications of slicing. By assuring the quality of the components, we move a step closer to certifying the whole application.
To complement these efforts and the benefits of slicing techniques, an efficient way to visualize the annotated components and their slices is needed.
In this talk we also address this problem. To take full advantage of visualization, it is crucial to combine the visualization of the control/data flow with the textual representation of the source code. To attain this objective, we extend the notions of System Dependence Graph and slicing criterion.
-
Title: Construction of a Local Domain Ontology from News Stories
Author: Brett Drury, MAPi 2nd Year
Advisors: José João Almeida (CCTC/UM) and Luís Torgo (LIAAD/FCUP)
Abstract:
The identification of "actionable" information in news stories has become a popular area for investigation. News presents some unique challenges for the researcher. The size constraints of a news story often require that full background information is omitted. Although this is acceptable for a human reader, it makes any form of automatic analysis difficult. Computational analysis may require some background information to provide context to the news stories. There have been some attempts to identify and store background information. These approaches have tended to use an ontology to represent relationships and concepts present in the background information. The current methods of creating and populating ontologies with background information for news analysis were unsuitable for our future needs.
In this paper we present a method for automatically constructing and populating a domain ontology. This method produces an ontology which has the coverage of a manually created ontology and the ease of construction of a semi-automatic method. The proposed method uses a recursive algorithm which identifies relevant news stories from a corpus. For each story the algorithm tries to locate further related stories and background information. The proposed method also includes a pruning procedure which removes extraneous information from the ontology. Finally, it includes a procedure for adapting the ontology over time in response to changes in the monitored domain.
-
Title: An artificial immune system framework for temporal anomaly detection
Author: Mário Antunes, PhD Student FCUP, non-MAPi
Advisors: Manuel Eduardo Correia (CRACS/FCUP)
Abstract:
One emergent, widely used metaphor and rich source of inspiration for anomaly detection has been the vertebrate Immune System (IS). This is mainly due to its intrinsic nature of having to constantly protect the body against harm inflicted by external (nonself) harmful entities. The bridge between metaphor and the reality of new practical systems for anomaly detection is cemented by recent biological advancements and newly proposed theories on the dynamics of immune cells from the field of theoretical immunology.
This talk aims to present a generic immune-inspired architecture for temporal anomaly detection, based on Grossman's Tunable Activation Threshold (TAT) immunological hypothesis. Generally, TAT theory hypothesizes that immune cell activation depends on a threshold that is adjusted dynamically and that, at each point in time, corresponds to the integrated past history of the received signals. Thus, the data carry a strict temporal ordering and meaning, as in network intrusion detection and process change detection. Firstly, the aims and a short immunological background are presented, mainly the adopted TAT theory. Then, the overall architecture and its possible applications to real-world problems are described. Finally, some results, open issues and the main lines of research we intend to follow are analyzed and discussed.
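As a toy illustration of a threshold that integrates the past history of received signals, the sketch below flags a signal only when it exceeds an exponentially weighted average of earlier signals by a fixed margin. It is a simplified assumption made for this example, not the architecture presented in the talk; the function name and the `decay` and `margin` parameters are hypothetical.

```python
def tat_activations(signals, decay=0.9, margin=1.0):
    """Flag time steps whose signal exceeds an adaptive threshold.

    The threshold is an exponentially weighted average of the signals
    seen so far, so it depends on their temporal order. Both decay and
    margin are hypothetical tuning parameters for this sketch.
    """
    threshold = signals[0] if signals else 0.0
    flags = []
    for s in signals:
        # Test against the history accumulated before this signal.
        flags.append(s > threshold + margin)
        # Then integrate the new signal into the threshold.
        threshold = decay * threshold + (1 - decay) * s
    return flags


if __name__ == "__main__":
    spike = [1.0] * 20 + [5.0] + [1.0] * 5
    print(tat_activations(spike))    # only the isolated spike is flagged

    shift = [1.0] * 5 + [5.0] * 40
    print(tat_activations(shift))    # flags stop once the threshold adapts
```

A short burst stands out against a quiet history, while a sustained change in level is gradually absorbed into the threshold and stops triggering activation.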