Abstracts Track 2021

Area 1 - Business Analytics

Nr: 13

Real Estate Price Prediction with Artificial Intelligence Techniques


Sophia L. Zhou

Abstract: For investors, businesses, consumers, and governments, an accurate assessment of future housing prices is crucial to critical decisions in resource allocation, policy formation, and investment strategies. Previous studies are contradictory about the macroeconomic determinants of housing prices and have largely focused on one or two areas using point prediction. This study aims to develop data-driven models that accurately predict future housing market trends in different markets. It examined five metropolitan areas representing different market trends and compared three time-lag settings: no lag, a 6-month lag, and a 12-month lag. Linear regression (LR), random forest (RF), and artificial neural network (ANN) models were employed to model real estate prices using datasets comprising the S&P/Case-Shiller home price index and 12 demographic and macroeconomic features, such as gross domestic product (GDP), resident population, and personal income, for five metropolitan areas: Boston, Dallas, New York, Chicago, and San Francisco. The data from March 2005 to December 2018 were collected from the Federal Reserve Bank, the FBI, and Freddie Mac. In the original data, some factors are reported monthly, some quarterly, and some yearly; thus, two methods of imputing the missing values, backfill and interpolation, were compared. The models were evaluated by accuracy, mean absolute error, and root mean square error. The LR and ANN models outperformed the RF model due to RF's inherent limitations, and both the ANN and LR methods generated predictive models with high accuracy (>95%). Personal income, GDP, population, and measures of debt consistently appeared among the most important factors. The results also showed that the technique used to impute missing values and the choice of time lag can significantly influence model performance and require further investigation.
The best performing models varied by area, but the backfilled 12-month-lag LR models and the interpolated no-lag ANN models showed the most stable performance overall, with accuracies above 95% for each city. This study reveals the influence of input variables in different markets. It also provides evidence to support future studies identifying the optimal time lag and data imputation methods for establishing accurate predictive models.
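The imputation comparison and time-lag setup described in this abstract can be sketched as follows. This is a minimal illustrative stand-in, not the study's actual pipeline: the series, column names, and synthetic target are invented, and only one feature and one lag setting are shown.

```python
# Sketch: backfill vs. interpolation for mixed-frequency features,
# plus a lagged linear regression. All data here is synthetic.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# A quarterly indicator placed on a monthly index leaves gaps (NaN).
idx = pd.date_range("2005-03-01", periods=12, freq="MS")
gdp = pd.Series([100.0, np.nan, np.nan, 103.0, np.nan, np.nan,
                 105.0, np.nan, np.nan, 108.0, np.nan, np.nan], index=idx)

# The two imputation methods compared in the study:
gdp_backfill = gdp.bfill()        # copy the next known value backward
gdp_interp = gdp.interpolate()    # linear interpolation between knowns

df = pd.DataFrame({"gdp": gdp_interp})
df["price_index"] = np.linspace(150, 180, 12)  # synthetic target

# A 6-month lag: predict this month's index from features 6 months earlier.
lag = 6
X = df[["gdp"]].shift(lag).dropna()
y = df["price_index"].loc[X.index]

model = LinearRegression().fit(X, y)
print(round(model.score(X, y), 3))  # in-sample R^2
```

Swapping `gdp_interp` for `gdp_backfill`, varying `lag` over {0, 6, 12}, and replacing `LinearRegression` with an RF or ANN regressor reproduces the grid of configurations the abstract compares.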

Nr: 18

Knowledge Graph based Electrical Circuit Simulation and Component Selection


Rahman Syed, Johannes Bayer and Felix Thoma

Abstract: Electrical circuits can be considered graph structures with components (such as resistors, capacitors, or inductors) as nodes and wiring as edges. For simulation and hardware-implementation purposes, these nodes are equipped with attributes such as electrical characteristics and referenced against libraries of real-world products. The presented system takes an RDF representation of a netlist and uses Ngspice to calculate circuit parameters. Additional parameters can be specified using formulas, which are also represented in RDF. The parameters and signals calculated for components are then used as constraints to shortlist candidates from the product knowledge graph, and the shortlisted candidates can be optimized for cost if the marginal costs of procurement for the products are known. Signal and device characteristic matching criteria, as well as unit-standardization formulas, are also stored as RDF triples to simplify the addition of new device types, circuit characteristics, and matching criteria. A circuit simulator is used to predict the voltages at nodes and the current flows in wires. These values and parameters are substituted into formulas to derive additional values such as a component's power consumption. Formulas are specified in RDF, and the system checks which of the formulas specified for a component can be applied given the set of known parameters. These values and parameters are added as an enrichment to the RDF representation of the circuit, which is then used to shortlist products. Product information for components, such as resistance, capacitance, power output, and prices, is collected from web stores to build a knowledge graph of different device types. Multiple physical devices from various manufacturers and vendors, with differing parameters and physical characteristics, can match the component requirements known at this stage. Matching is achieved by applying filters for each known parameter of a circuit component to list the most suitable devices.
Component-level shortlists of matching devices from the knowledge graph also provide the engineer with pricing information extracted from vendor sites. While detailed costing of the final hardware implementation and full cost optimization are not yet achievable because of complicated pricing rules and ordering costs, the designer is given an overview of the potential options along with the price per piece and the minimum order size. In the future, the system is envisioned to support the engineer with automated constraint checking and product recommendations for implementing and altering circuits.
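The enrich-then-shortlist step described above can be sketched in plain Python. This is an illustrative stand-in for the RDF machinery: in the real system the formula (P = V·I), the matching criteria, and the product data live as RDF triples and are applied via the graph, and the simulated values come from Ngspice; here every name and number is invented.

```python
# Sketch: derive a component's power from simulated values, then
# filter a toy product catalogue by parameter constraints and report
# the cheapest match with its minimum order size. Data is invented.

# Simulated values for one resistor (from the circuit simulator
# in the real system).
component = {"type": "resistor", "resistance_ohm": 220.0,
             "voltage_v": 5.0, "current_a": 0.0227}

# Formula enrichment: P = V * I is applicable because V and I are known.
component["power_w"] = component["voltage_v"] * component["current_a"]

# Toy stand-in for the product knowledge graph built from web stores.
products = [
    {"id": "R220-A", "resistance_ohm": 220.0, "power_rating_w": 0.25,
     "price_eur": 0.02, "min_order": 100},
    {"id": "R220-B", "resistance_ohm": 220.0, "power_rating_w": 0.125,
     "price_eur": 0.01, "min_order": 1000},
    {"id": "R470-A", "resistance_ohm": 470.0, "power_rating_w": 0.25,
     "price_eur": 0.02, "min_order": 100},
]

# Shortlist: one filter per known parameter (matching resistance,
# sufficient power rating).
shortlist = [p for p in products
             if p["resistance_ohm"] == component["resistance_ohm"]
             and p["power_rating_w"] >= component["power_w"]]

# Cost overview: cheapest per-piece price among the matches.
best = min(shortlist, key=lambda p: p["price_eur"])
print(best["id"], best["price_eur"], best["min_order"])
```

Expressing the filters as data (as the system does with RDF triples) rather than hard-coding them is what makes new device types and matching criteria easy to add.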

Nr: 19

Automatic Measurement of Corporate Reputation for Retail Companies from Online Public Data on the Web


Marselo Sitorus and Rob Loke

Abstract: The retail industry consists of establishments selling consumer goods (e.g. technology, pharmaceuticals, food and beverages, apparel and accessories, home improvement, etc.) and services (e.g. specialty and movies) to customers through multiple channels of distribution, including both traditional brick-and-mortar and online retailing. Managing the corporate reputation of retail companies is crucial, as it has many advantages; for instance, it has been proven to impact generated revenues (Wang et al., 2016). But in order to manage corporate reputation, one has to be able to measure it or, nowadays even better, listen to the relevant social signals that are out there on the public web. One of the most extensive and widely used frameworks for measuring corporate reputation is conducting elaborate surveys with the respective stakeholders (Fombrun et al., 2015). This approach is valuable but laborious and resource-heavy, and it does not allow for the automatic alerts and quick, live insights that are sorely needed in the internet era. For these purposes, a social-listening approach is needed that can be tailored to online data, with consumer reviews as the main data source. Online review datasets are a form of electronic word-of-mouth (WOM) that, when a data source relevant to retail is picked, commonly contain relevant information about customers' perceptions of products (Pookulangara, 2011) and are massively available. The algorithm that we have built into our application provides retailers with reputation scores for all variables deemed relevant to retail in the model of Fombrun et al. (2015). Examples of such variables for products and services are high quality, good value, stands behind, and meets customer needs. We propose a new set of subvariables with which these variables can be operationalized for retail in particular.
Scores are calculated as proportions of positive opinion pairs, such as <fast, delivery> or <rude, staff>, that have been designed per variable. With these important insights extracted, companies can act accordingly and proceed to improve their corporate reputation. It is important to emphasize that, once the design is complete and implemented, all processing can be performed fully automatically and unsupervised. The application makes use of a state-of-the-art aspect-based sentiment analysis (ABSA) framework because of ABSA's ability to generate sentiment scores for all relevant variables and aspects. Since most online data is in open form and we deliberately want to avoid having human experts label any data, the unsupervised aspectator algorithm was picked. It employs a lexicon to calculate sentiment scores and uses syntactic dependency paths to discover candidate aspects (Bancken et al., 2014). We applied our approach to a large number of online review datasets sampled from a list of the 50 top global retailers according to the National Retail Federation (2020), covering both offline and online operations, which we scraped from Trustpilot, a public website well known to retailers. The algorithm was carefully evaluated by having two independent annotators manually annotate a randomly sampled subset of the datasets for validation purposes. The kappa score on this subset was 80%.
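The per-variable scoring described in this abstract can be sketched as follows. This is a toy illustration under assumptions: the opinion pairs, their polarities, and the variable names are invented, and in the real application the pairs come from the unsupervised ABSA step rather than a hard-coded list.

```python
# Sketch: a reputation score per variable as the proportion of
# positive opinion pairs among all pairs assigned to that variable.
from collections import defaultdict

# (variable, (opinion, aspect), polarity), as an ABSA step might emit.
pairs = [
    ("products_services", ("fast", "delivery"), "positive"),
    ("products_services", ("rude", "staff"), "negative"),
    ("products_services", ("great", "quality"), "positive"),
    ("workplace", ("friendly", "staff"), "positive"),
]

counts = defaultdict(lambda: {"positive": 0, "total": 0})
for variable, _pair, polarity in pairs:
    counts[variable]["total"] += 1
    if polarity == "positive":
        counts[variable]["positive"] += 1

scores = {v: c["positive"] / c["total"] for v, c in counts.items()}
print(scores)
```

Because the score is a simple proportion over automatically extracted pairs, the whole pipeline stays unsupervised once the pair-to-variable design is fixed.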