Tuesday, December 15, 2009

Stereo Image Displacement


Today I built a small application to calculate the distance of objects from two stereo cameras. The image on the bottom left shows nearer objects in lighter colours and farther objects in darker tones. There are some problems, especially in blank areas where there are no strong features to match (see the red areas).

To compute this image, for each pixel in the top-left image I calculated the displacement of that point in the right image. The displacement is inversely proportional to the distance.
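The post does not include the application's code, so here is only a minimal C++ sketch of the block-matching idea (names, window size and disparity range are my own assumptions): for each pixel of the left image, slide a small window along the same scanline of the right image and keep the shift with the lowest sum of absolute differences. In blank areas every shift scores about the same, which is exactly why those regions fail.

    // Minimal block-matching sketch, assuming a rectified grayscale stereo
    // pair stored as plain 2D arrays of 0..255 values.
    #include <climits>
    #include <cstdlib>
    #include <vector>

    typedef std::vector<std::vector<int> > Gray; // grayscale image

    Gray disparityMap(const Gray& left, const Gray& right,
                      int win = 3, int maxDisp = 32) {
        int h = (int)left.size(), w = (int)left[0].size();
        Gray disp(h, std::vector<int>(w, 0));
        for (int y = win; y < h - win; ++y) {
            for (int x = win; x < w - win; ++x) {
                int bestD = 0, bestCost = INT_MAX;
                // Try every horizontal shift d and keep the best SAD match.
                for (int d = 0; d <= maxDisp && x - d - win >= 0; ++d) {
                    int cost = 0;
                    for (int dy = -win; dy <= win; ++dy)
                        for (int dx = -win; dx <= win; ++dx)
                            cost += std::abs(left[y + dy][x + dx]
                                             - right[y + dy][x + dx - d]);
                    if (cost < bestCost) { bestCost = cost; bestD = d; }
                }
                // Bigger displacement means a nearer object; scale to
                // 0..255 to draw near points in lighter colours.
                disp[y][x] = bestD;
            }
        }
        return disp;
    }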

Thursday, November 19, 2009

Measuring the distance of an object


This was a small demo that I did in Processing 1.0 to test stereo vision.

I took two pictures with a webcam. The second picture was taken 7 cm (b) to the left of the first. In this demo, to measure the distance of a point I pick the same point in each picture. The distance is then obtained by triangulation: the displacement (H) in pixels (green line on the door lock) is inversely proportional to the distance (Z),

H = f·b/Z

where f is the focal length of the camera in pixels.

This only holds when the two images are taken by two cameras that sit side by side on the same plane.
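As a worked example, here is a small C++ sketch of the triangulation. The 7 cm baseline comes from the demo, but the focal length value is hypothetical; for the real webcam it would have to be calibrated.

    #include <cstdio>

    // Triangulation for a rectified side-by-side pair: Z = f * b / H.
    double distanceFromDisparity(double fPx, double bCm, double hPx) {
        return fPx * bCm / hPx; // result in cm
    }

    int main() {
        const double f = 700.0; // hypothetical focal length, in pixels
        const double b = 7.0;   // baseline from the demo, in cm
        // A point displaced H = 20 px between the two pictures:
        std::printf("Z = %.1f cm\n",
                    distanceFromDisparity(f, b, 20.0)); // prints 245.0 cm
        return 0;
    }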

Monday, November 16, 2009

OpenCV the Computer Vision Library

Great library, but it is very hard to make it work with Visual Studio 2008. There are no prebuilt .lib files for VS, so you have to build them yourself.

These are the instructions from the file:

OpenCV-2.0.0a-win32.Readme.Please.txt

1. Download CMake from http://www.cmake.org/cmake/resources/software.html
   and install it.

2. Run CMake GUI tool and configure OpenCV there:
   2.1. select C:\OpenCV2.0 (or the installation directory you chose)
        as the source directory;
   2.2. choose some other directory name for the generated project files, e.g.
        C:\OpenCV2.0\vs2008, or D:\Work\OpenCV_MinGW etc.
   2.3. press Configure button, select your preferable build environment
   2.4. adjust any options at your choice
   2.5. press Configure again, then press Generate.
3a. In the case of Visual Studio or any other IDE, open the generated
   solution/workspace/project ..., e.g. C:\OpenCV2.0\vs2008\OpenCV.sln,
   build it in Release and Debug configurations

3b. In the case of command-line Makefiles, enter the destination directory
    and type "make" (or "nmake" etc.)
4. Add the output directories to the system path, e.g.:
   C:\OpenCV2.0\vs2008\bin\Debug;C:\OpenCV2.0\vs2008\bin\Release;%PATH%
   It is safe to add both directories, since the Debug
   OpenCV DLLs have the "d" suffix, which the Release DLLs do not have.
5. Optionally, add C:\OpenCV2.0\include\opencv to the list of
   include directories in your IDE settings,
   and the output library directories
   (e.g. C:\OpenCV2.0\vs2008\lib\{Debug,Release})
   to the list of library paths.

I lost an afternoon on this. It would be much better if they shipped the compiled binaries from the beginning…
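To check that the build actually works, a tiny test program like the sketch below helps. The exact .lib names and the image path are assumptions on my part, not something from the readme.

    // Minimal smoke test for the freshly built library, using the 2.0-era
    // headers. Link against the generated libs (for Release something like
    // cxcore200.lib and highgui200.lib; the Debug ones carry the extra "d").
    #include <cv.h>
    #include <highgui.h>

    int main() {
        cv::Mat img = cv::imread("C:\\test.jpg"); // placeholder path
        if (img.empty())
            return 1; // load failed: check the path and the DLLs in %PATH%
        cv::namedWindow("opencv-check");
        cv::imshow("opencv-check", img);
        cv::waitKey(0); // press any key to close
        return 0;
    }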

Friday, November 13, 2009

Thesis topic defined

“Building interactive spatial and temporal models
using multimodal data”

 

Abstract

The construction of virtual world models is an important step in several applications, from simulations to virtual reality. These three-dimensional worlds are usually constructed offline, from scratch or using large datasets extracted from satellites. Once constructed, they are usually static or hard to alter automatically. What is proposed in this thesis are methods that take advantage of several types of low-cost sensors and computer vision to quickly create and update virtual models. These virtual models will have live feeds from the sensors and from images and videos. The resulting model can be consulted in terms of space and time, turning several layers of information on and off. These models can then be used in several scenarios, such as retrieving information from a physical space, creating simulations or creating augmented reality scenes. One of the main ideas proposed is that changes in the virtual model can be reflected back onto the images and videos that helped in its construction. This reconstruction of reality has several applications in visualizing the results of environmental simulations, such as pollution or disaster simulations, showing the affected areas in real-life images. Additionally, it can be useful to superimpose virtual objects for augmented reality, turning the room where the user is into a completely different scenario. The construction of the models involves the study of several image processing algorithms and techniques. Additionally, to support the fast creation, visualization of and interaction with the models, several tools will have to be developed. The interaction should explore new paradigms different from the mouse and keyboard; essentially, it should take advantage of the computer vision knowledge learnt in the construction of the models. This document addresses the problems involved in this area and presents related work, preliminary solutions and a work plan for the thesis.

Thursday, April 30, 2009

Update

A lot got done during the last month:

The paper for CASA2009 was finished but not submitted; it will wait for a conference where the results get published on the internet.

The Interact paper was submitted; results are expected by May 15.

ExpoFCT was a success; the augmented reality room was one of the rooms that drew the most curiosity.

The audio assignment is done and was presented, with a project on voice detection in music.

Still missing: the visualization assignment, due May 31.
Still missing: the STC assignment, due May 28.

Wednesday, March 11, 2009

Paper review: Precise Selection for Multi-Touch Screens

Benko, H., Wilson, A. D., and Baudisch, P. 2006. Precise selection techniques for multi-touch screens. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Montréal, Québec, Canada, April 22 - 27, 2006). R. Grinter, T. Rodden, P. Aoki, E. Cutrell, R. Jeffries, and G. Olson, Eds. CHI '06. ACM, New York, NY, 1263-1272.

link

Current touch screens have a problem related to the precision of human fingers. Most graphical user interfaces are designed for mouse interaction, where the pointing device is much more precise and the control objects on screen can be only a few pixels wide. Keeping in mind that the most appealing aspect of touch screens is the ability to directly touch an object in order to interact with it, this paper presents several techniques to increase the pixel accuracy of the interaction. The techniques include simulating pressure using the area of touch and placing the cursor at the top of the finger. Another places the cursor at an offset from the finger to reach corners or edges. Using two fingers, the authors propose a resizable zoom window that facilitates the selection of small objects. Finally, a model for a contextual menu is proposed. The paper has a good related-work section about multi-touch prototypes and interfaces and includes an extensive study of user behaviour with these techniques.


Tuesday, March 10, 2009

Preparing a presentation...

Shorts:

I'm preparing a small 10 min talk for the Scientific and Technical Communication class about:

Multi-point interfaces and interaction


I am also writing a short paper for Interact2009


Call for papers: 3AMIGAS


Paper review: Supporting Multi-point Interaction in Visual Workspaces


Shoemaker, G. and Gutwin, C. 2007. Supporting multi-point interaction in visual workspaces. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (San Jose, California, USA, April 28 - May 03, 2007). CHI '07. ACM, New York, NY, 999-1008.

link

This paper describes how to interact with multiple control points in a task while using only a single pointing device. The main techniques described include splitting the screen when two control points are too far apart, and a fisheye context zoom around control points when more precision is needed. It does not use simultaneous multi-point interaction.

Wednesday, March 4, 2009

Paper Review: Photo-based question answering

Interesting paper with some ideas on how to use images for searching and question answering.

Yeh, T., Lee, J. J., and Darrell, T. 2008. Photo-based question answering. In Proceedings of the 16th ACM International Conference on Multimedia (Vancouver, British Columbia, Canada, October 26 - 31, 2008). MM '08. ACM, New York, NY, 389-398.

This paper describes a three-step method for answering photo-based questions. Traditional text-based query systems have problems when the user's questions centre on physical objects with distinct visual attributes. For example, with the question "where can I buy this poster?" using text alone, the user has to pose the question and also accurately describe the object or image desired. Using photos, the user instead poses the question and submits an image of the desired poster, reducing the amount of text required for the query. To answer the query, the authors propose a three-layer architecture. The first layer is template-based QA: it takes the question, checks whether there is an image database associated with it, and then tries to find the image in that database. If that fails, the information-retrieval layer searches an internal repository for similar questions already answered. If all else fails, the last layer answers the query through human computation, using experts or volunteer users, and feeds the result back to the IR layer. Three prototypes are presented: one for photo-based QA on Flickr, another for Yahoo!Answers, and a mobile application where the user takes pictures of physical objects and poses a question with the resulting image.

Friday, February 27, 2009

Doutorando em Filosofia

The fabulous and intriguing blog "Doutorando em Filosofia" is now open.
Here I will post ideas, topics and sites related to my PhD in Informatics.