You will have the opportunity to evaluate an app and share your highlights and concerns with the team.
- American Psychiatric Association
- Article to read (attached separately as a PDF, since it is school-only access)
- PsyberGuide
- BrainHQ
After doing some research on an app, share your results in a 1–2 page APA paper and jump into Yellowdig to explain your findings. Your instructor will provide the defining features of synaptic legacy in Yellowdig.

Consider the following questions: What brain-based category does this app fall under? What efficacy research results have been published? What needs to be improved in terms of research studies with this app? Do they include culturally relevant data? How does the app work? What makes it appealing, or not? What needs to be refined in the future? Would you recommend it, or not? Does it mislead the population in terms of false outcomes or false advertising? Does this app provide a means for synaptic legacy?

https://doi.org/10.1007/s10664-022-10236-0
On the privacy of mental health apps
An empirical investigation and its implications for app development
Leonardo Horn Iwaya · M. Ali Babar · Awais Rashid · Chamila Wijayarathna
Accepted: 1 September 2022 / Published online: 8 November 2022
© The Author(s) 2022
Abstract
An increasing number of mental health services are now offered through mobile health
(mHealth) systems, such as in mobile applications (apps). Although there is an unprece-
dented growth in the adoption of mental health services, partly due to the COVID-19
pandemic, concerns about data privacy risks due to security breaches are also increasing.
Whilst some studies have analyzed mHealth apps from different angles, including secu-
rity, there is relatively little evidence for data privacy issues that may exist in mHealth apps
used for mental health services, whose recipients can be particularly vulnerable. This paper
reports an empirical study aimed at systematically identifying and understanding the data privacy issues present in mental health apps. We analyzed 27 top-ranked mental health apps
from Google Play Store. Our methodology enabled us to perform an in-depth privacy anal-
ysis of the apps, covering static and dynamic analysis, data sharing behaviour, server-side
tests, privacy impact assessment requests, and privacy policy evaluation. Furthermore, we
mapped the findings to the LINDDUN threat taxonomy, describing how threats manifest on
the studied apps. The findings reveal important data privacy issues such as unnecessary per-
missions, insecure cryptography implementations, and leaks of personal data and credentials
in logs and web requests. There is also a high risk of user profiling, as the apps' developers do not provide foolproof mechanisms against linkability, detectability, and identifiability.
Data sharing among 3rd-parties and advertisers in the current apps ecosystem aggravates
this situation. Based on the empirical findings of this study, we provide recommendations
to be considered by different stakeholders of mHealth apps in general and apps developers
in particular. We conclude that while developers ought to be more knowledgeable in con-
sidering and addressing privacy issues, users and health professionals can also play a role
by demanding privacy-friendly apps.
Keywords: Privacy · Security · Mobile health · Mental health apps · Privacy by design · Android · Empirical study
1 Introduction
The ongoing COVID-19 pandemic has dramatically increased the number of mental
health support services provided using applications developed for mobile devices. Such
applications are called mental health apps, a subcategory of mobile health (mHealth)
systems. Examples are chatbots (e.g., Wysa and Woebot) and text-a-therapist platforms
(e.g., TalkSpace and BetterHelp) that can be readily downloaded from app stores (e.g., iOS or Android) and used for seeking and/or providing help for mental health well-being (ECHAlliance 2020; Heilweil 2020). Even before the COVID-19 pandemic, these apps had made the provision of mental health services more accessible to people in
need, by lowering cost, eliminating traveling, saving time and reducing the fear of social
stigma/embarrassment attached to psychological treatment (Bakker et al. 2016; Price et al.
2014). Furthermore, mental health apps increase the availability of mental health services
(anywhere and anytime) to users and provide additional functionalities such as real-
time monitoring of users (Donker et al. 2013). Research also shows that mental health
apps improve users' autonomy and increase self-awareness and self-efficacy (Prentice and
Dobson 2014) leading to better health outcomes.
On the other hand, studies on the security of mHealth apps, in general, have shown that
many apps are insecure, threatening the privacy of millions of users (Papageorgiou et al.
2018). Insecure apps can be the prime targets of cyber attackers since personal health infor-
mation is of great value for cyber-criminals (IBM 2020). There is also increasing evidence
pointing to a widespread lack of security knowledge among mHealth developers, which is
usually linked to different issues, such as insufficient security guidelines, tight budgets and
deadlines, and lack of security testing (Aljedaani et al. 2020, 2021). App developers also heavily rely on a range of SDKs for analytics and advertising,
exacerbating the risks of data linkage, detectability, and re-identification of users in such
ecosystems (Solomos et al. 2019).
The real or perceived security risks leading to data privacy compromises are particularly
concerning for mental health apps because they deal with highly sensitive data, in contrast
to other general mHealth apps, e.g., for fitness and wellness. The stigma around mental ill-
nesses also increases the potentially negative impacts on users in case of privacy violations.
For instance, merely linking a user to a given app can reveal that they might have psychological problems (e.g., anxiety, depression, or other mental health conditions), which may make mental health app users feel more vulnerable and fragile.
The above-mentioned mHealth apps' data privacy concerns warrant evidence-based inquiries for improved understanding and actionable measures, as there is a paucity of empirical research on the full range of privacy threats that manifest in mental health apps; the existing research has only focused on privacy policy analysis (O'Loughlin et al. 2019; Powell et al. 2018; Robillard et al. 2019; Rosenfeld et al. 2017) or 3rd-party data
sharing (Huckvale et al. 2019). Hence, it is important to systematically identify and under-
stand the data privacy problems that may exist in mHealth apps as such a body of knowledge
can better inform the stakeholders in general and apps developers in particular.
In this study, we specifically focus on the subgroup of mHealth apps designed for mental
health and psychological support. This study was stimulated by one research question: What
is the current privacy status of top-ranked mental health apps? Here, we adopt a broad
definition of privacy that encompasses security and data protection, with an emphasis on the negative privacy impacts on data subjects.
The methodology for this investigation relied on a range of penetration testing tools and
methods for systematic analysis of privacy policies and regulatory compliance artefacts.
We selected a sample of 27 top-ranked mental health apps from the Google Play Store
that collected, stored and transmitted sensitive personal health information of users. We
subjected the apps to static and dynamic security analysis and privacy analysis using various
tools. Particular focus was put on the Mobile Security Framework (MobSF), which provides a wide range of static and dynamic analysis tools. Other tools, such as Drozer, Qualys SSL Labs, WebFX, CLAUDETTE, and PrivacyCheck, were also employed in this
study. Furthermore, we documented the privacy issues that we identified for each app by
mapping them to the well-known LINDDUN privacy threat categories (i.e., Linkability,
Identifiability, Non-repudiation, Detectability, Disclosure of information, Unawareness and
Non-compliance) (Deng et al. 2011).
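To make the static-analysis step concrete, the sketch below drives a local MobSF instance over its REST API to upload an APK and retrieve the scan report. This is a minimal illustration, not the authors' tooling: the endpoint paths (/api/v1/upload, /api/v1/scan, /api/v1/report_json) and the hash parameter follow MobSF's documented REST API as we understand it, and the server URL, API-key handling, and the app_name field are assumptions to verify against your MobSF version.

```typescript
// Hedged sketch: driving a local MobSF server over its REST API to upload an
// APK and pull the static-analysis report. Endpoint paths and parameters are
// assumptions based on MobSF's documented API. Requires Node 18+ (global fetch).
import { readFile } from "node:fs/promises";

const MOBSF = "http://localhost:8000";      // local MobSF server (assumption)
const API_KEY = process.env.MOBSF_API_KEY!; // shown on MobSF's API docs page

async function staticScan(apkPath: string) {
  // 1. Upload the APK as multipart form data.
  const form = new FormData();
  form.append("file", new Blob([await readFile(apkPath)]), "app.apk");
  const upload = await fetch(`${MOBSF}/api/v1/upload`, {
    method: "POST",
    headers: { Authorization: API_KEY },
    body: form,
  }).then((r) => r.json());

  // 2. Trigger the scan for the uploaded file's hash.
  await fetch(`${MOBSF}/api/v1/scan`, {
    method: "POST",
    headers: { Authorization: API_KEY, "Content-Type": "application/x-www-form-urlencoded" },
    body: new URLSearchParams({ hash: upload.hash }),
  });

  // 3. Fetch the JSON report (permissions, trackers, code issues, ...).
  return fetch(`${MOBSF}/api/v1/report_json`, {
    method: "POST",
    headers: { Authorization: API_KEY, "Content-Type": "application/x-www-form-urlencoded" },
    body: new URLSearchParams({ hash: upload.hash }),
  }).then((r) => r.json());
}

staticScan("sample.apk").then((report) => console.log(report.app_name ?? "scan complete"));
```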
This study's findings reveal alarming data privacy problems in the mHealth apps used by millions of users, who are likely to expect data privacy protection built into such apps. Our study's main findings include:
- Most apps pose linkability, identifiability, and detectability threats. This is a risk as some 3rd-parties can link, re-identify, and detect users' actions and data. Unawareness is also related to such threats, given that apps do not explain (e.g., in the privacy policy) the risks posed by targeted advertising on people experiencing mental problems and the risk of re-identification and disclosure of mental health conditions (e.g., anxiety, depression).
- Only 3/27 app developers responded to our query regarding Privacy Impact Assessments (PIAs), mentioning that they had performed a PIA on their app, and only two of them had made the PIA reports public. That suggests a high non-compliance rate, since mHealth apps tend to pose a high risk to the rights and freedoms of users.
- 24/27 app privacy policies were found to require at least college-level education to understand them; the remaining 3/27 required a 10th–12th-grade reading level (a readability sketch follows this list). Such findings also suggest further problems with regard to non-compliance, leading to data subjects' unawareness about the nature of the data processing activities in mental health apps, data controllers, and service providers.
- Static analysis reports show that 20/27 apps are at critical security risk, and 4/27 apps are at high security risk. Many of the issues are revealed through a simple static analysis, such as the use of weak cryptography. Dynamic analysis also shows that some apps transmit and log personal data in plain text. Four apps can leak such sensitive data to 3rd-parties, exacerbating risks of (re-)identification and information disclosure.
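As an aside on the readability finding, grade-level scoring of this kind is typically based on standard formulas. The paper used the WebFX readability tool; the sketch below is not that tool but a self-contained illustration using the standard Flesch-Kincaid grade-level formula, with a naive syllable counter that is purely an assumption for demonstration.

```typescript
// Illustrative readability scoring via the standard Flesch-Kincaid grade-level
// formula. The syllable counter is a crude heuristic, not a production method.

function countSyllables(word: string): number {
  // Heuristic: count groups of consecutive vowels, discount a trailing silent "e".
  const w = word.toLowerCase().replace(/[^a-z]/g, "");
  if (w.length === 0) return 0;
  const groups = w.match(/[aeiouy]+/g)?.length ?? 0;
  const silentE = w.endsWith("e") && groups > 1 ? 1 : 0;
  return Math.max(1, groups - silentE);
}

function fleschKincaidGrade(text: string): number {
  const sentences = Math.max(1, (text.match(/[.!?]+/g) ?? []).length);
  const words = text.split(/\s+/).filter((w) => /[a-zA-Z]/.test(w));
  const syllables = words.reduce((sum, w) => sum + countSyllables(w), 0);
  // Standard Flesch-Kincaid grade-level formula.
  return 0.39 * (words.length / sentences) + 11.8 * (syllables / words.length) - 15.59;
}

// Grades of roughly 13+ correspond to the "college-level" policies reported above.
console.log(fleschKincaidGrade("The controller processes personal data pursuant to legitimate interests.").toFixed(1));
```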
We have also synthesised the main findings and mapped them according to the LIND-
DUN privacy threat taxonomy (Deng et al. 2011). The findings highlight the prevalence
of data privacy problems among the top-ranked mental health apps. It is clear that compa-
nies and software developers should pay more attention to privacy protection mechanisms
while developing mHealth apps. At the same time, users and mental health practitioners
should demand (at least) compliance with privacy standards and regulations. Based on
the findings, we offer some recommendations for mHealth apps development companies,
apps developers, and other stakeholders.
2 Background
2.1 Privacy (and Security)
Until quite recently, the term privacy was treated under the umbrella of security. However,
this situation has changed with data privacy gaining significance and prominence of its
own. It is essential to clarify the difference between privacy and security for the research
reported in this paper. In this study, we are mainly interested in data privacy that can be
compromised as a result of security breaches. The concept of privacy comprises several
aspects such as informed consent, transparency, and compliance, that are not necessarily
connected to security. Whilst privacy is protected through security measures, privacy cannot
be satisfied solely on the basis of managing security (Brooks et al. 2017). For such reasons,
we regard security as part of a broad conceptualisation of privacy, which includes protecting
personal data. As a consequence, the study design reflects this contrast between privacy and
security. That is, apart from traditional security testing, this study also evaluates the apps' privacy policies, makes requests for privacy impact assessments, and gathers the developers' feedback on raised issues.
2.2 The ecosystem of mental health apps
Today's information systems are built upon a wide range of services involving multiple
stakeholders. Figure 1 presents a simplified Data Flow Diagram (DFD) that can help a
reader to identify the main actors in the mental health apps ecosystem for discussing the
privacy issues. As shown in Fig. 1, users (i.e., data subjects) have their data collected by
mHealth apps and transmitted to the companies (i.e., data controllers) as well as to the other
service providers (i.e., data processors). Privacy considerations should be made for every step of the DFD (i.e., a detailed DFD created by the apps' developers) in which personal data is processed, stored, and transmitted.

Fig. 1 Simplified Data Flow Diagram (DFD) for the apps ecosystem with an overview of the data subjects, data controllers, data processors, and privacy threats to consider
First, as shown in Fig. 1, the personal data flows from an app to a company-owned server.
Here developers have greater control over the system's design, so the main concern is the protection of data at rest, in transit, and in use. Developers can fully understand all aspects of the company-owned infrastructure (i.e., client and server sides). Thus, they can transparently communicate the nature of personal data collection and processing to users.
Data flows within this trusted boundary of the company-owned systems tend to be less
problematic regarding privacy. However, it is worth stressing that privacy goes beyond data
protection, so other privacy aspects should be considered, such as the unawareness and non-compliance threat categories.
Second, personal data flows to many 3rd-party service providers that support the collection and processing of the users' data. Most companies rely (often entirely) on public cloud infrastructures (e.g., Amazon AWS, Google Cloud) to maintain their servers and databases, as well as use many APIs that provide services for the apps to function (e.g., Crashlytics, RevenueCat, PayPal, Firebase). In such cases, developers have limited control over the
system, and the processing activities are not fully transparent anymore. Developers have to
trust service providers, and a shared responsibility model ensues. Thus, the data flows going
to service providers should be carefully scrutinized. This concern is particularly critical in
the context of mental health apps since the personal data is considered highly sensitive, as
previously mentioned.
Adding to the problem, companies often rely on advertising as a source of monetary
income for their apps, and mental health apps are no exception in such business mod-
els. Thus, a user's information provided for using an app may be distributed to the app
developer(s), to 3rd-party sites used for functionality reasons, and to unidentified 3rd-party
marketers and advertisers (Giota and Kleftaras 2014). Whilst users and health professionals
are expected to be aware of such risks, it is important that companies that develop mHealth
apps are also transparent about the business model in which they operate. Users already
have little control over their data that resides within the developers' systems, let alone the
data shared with 3rd-parties, such as mobile advertising platforms and data brokers.
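As an illustration of how this ecosystem can be captured for threat analysis (LINDDUN's first step models the system as a DFD, as Section 2.3 describes), the sketch below encodes a Fig. 1-style DFD as plain data. The node and flow names, the crossesTrustBoundary flag, and the sample payloads are illustrative assumptions, not the paper's actual model.

```typescript
// Minimal sketch of a simplified DFD (cf. Fig. 1) as data: entities, processes,
// data stores, and the flows between them. Names and payloads are assumptions.
type NodeKind = "external_entity" | "process" | "data_store";
type Role = "data subject" | "data controller" | "data processor";

interface DfdNode { id: string; kind: NodeKind; role: Role; }
interface DataFlow { from: string; to: string; data: string[]; crossesTrustBoundary: boolean; }

const nodes: DfdNode[] = [
  { id: "user", kind: "external_entity", role: "data subject" },
  { id: "app", kind: "process", role: "data controller" },
  { id: "company_server", kind: "data_store", role: "data controller" },
  { id: "3rd_party_api", kind: "process", role: "data processor" },
];

const flows: DataFlow[] = [
  { from: "user", to: "app", data: ["mood entries", "credentials"], crossesTrustBoundary: false },
  { from: "app", to: "company_server", data: ["mood entries"], crossesTrustBoundary: false },
  // Flows leaving the company-owned trust boundary deserve the closest scrutiny.
  { from: "app", to: "3rd_party_api", data: ["device IDs", "usage events"], crossesTrustBoundary: true },
];

// Privacy considerations apply to every flow; boundary-crossing ones first.
const toScrutinize = flows.filter((f) => f.crossesTrustBoundary);
console.log(toScrutinize.map((f) => `${f.from} -> ${f.to}: ${f.data.join(", ")}`));
```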
2.3 The LINDDUN threat taxonomy
LINDDUN is a well-known privacy threat modelling framework (Deng et al. 2011), recently
included in the NIST Privacy Framework (NIST 2022). Given the increasing popularity of the LINDDUN framework for systematically analyzing privacy threats during software systems
development, we decided to use LINDDUN to analyze and map the findings from our study.
The LINDDUN privacy threat analysis methodology consists of three main steps: (1) mod-
elling the systems, using DFDs and describing all data; (2) eliciting privacy threats, iterating
over the DFD elements to identify threats using a taxonomy; and, (3) managing the threats,
finding suitable solutions to tackle the uncovered threats.
We are mainly interested in the LINDDUN threat taxonomy, which can be used as a
standard reference for discussing privacy threats:
- Linkability: an adversary can link two items of interest (IOI) without knowing the identity of the data subject(s) involved (e.g., service providers are able to link data coming from different apps about the same data subject).
- Identifiability: an adversary can identify a data subject from a set of data subjects through an IOI (e.g., service providers can re-identify a user based on leaked data, metadata, and unique IDs).
- Non-repudiation: the data subject cannot deny a claim, such as having performed an action or sent a request (e.g., data and transactions stored by companies and service providers cannot be deleted, revealing the user's actions).
- Detectability: an adversary can distinguish whether an IOI about a data subject exists or not, regardless of being able to read the contents itself (e.g., attackers can detect that a user's device is communicating with mental health services).
- Disclosure of information: an adversary can learn the content of an IOI about a data subject (e.g., data is transmitted in plain text).
- Unawareness: the data subject is unaware of the collection, processing, storage, or sharing of their data, and the corresponding purposes (e.g., the company's privacy policy is not easy to understand or transparent about the nature of data processing).
- Non-compliance: the processing, storage, or handling of personal data is not compliant with legislation, regulation, or policy (e.g., a company fails to perform a PIA for a privacy-sensitive system).
Each of these seven threat categories is composed of distinct threat trees, forming the complete threat taxonomy. For instance, the Linkability category is subdivided into four threat trees: (1) Linkability of Entity (L_e); (2) Linkability of Data Flow (L_df); (3) Linkability of Data Store (L_ds); and (4) Linkability of Process (L_p). Each of the threat trees is modeled in a number of branches in which the leaf nodes refer to a specific threat. For instance, the threat tree of Linkability of Data Flow (L_df) develops in two main branches, i.e., Linkability of transactional data (transmitted data) (L_df1) and Linkability of contextual data (metadata) (L_df2). These two branches are then divided into other, more specific threats, e.g., data flow not fully protected (L_df6) or linkability based on IP address (L_df8). The other threat trees, i.e., Linkability of Entity and Linkability of Data Store, follow the same overall structure of branches and leaf nodes.
However, not all of the main threat categories are composed of multiple threat trees. The category of Unawareness, for example, contains only the threat tree for Unawareness of Entity (U_e); this is the only relevant tree, since only an entity can be unaware, not a data flow, data store, or process. And particularly for Information Disclosure, the LINDDUN methodology actually borrows its threat trees from Microsoft's security threat model, STRIDE (Howard and Lipner 2006).
For a complete account of all the LINDDUN threat categories, threat trees, and specific
threats, we refer the reader of this article to the catalogue compiled in Wuyts et al. (2014).
Some familiarity with LINDDUN is beneficial since we refer to specific threats through-
out the paper, e.g., when describing how LINDDUN was incorporated into our research
methodology for this study and when discussing the main findings and results.
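To illustrate how findings can be recorded against this taxonomy, the sketch below types the seven categories and tags a hypothetical finding with a category and a threat-tree leaf ID. The category names and the L_df6 leaf come from the taxonomy above; the Finding shape, the example app name, and the tally helper are assumptions made for illustration, not the authors' data model.

```typescript
// Illustrative sketch: recording per-app findings against the LINDDUN
// categories and threat-tree leaf IDs described above.

type LinddunCategory =
  | "Linkability"
  | "Identifiability"
  | "Non-repudiation"
  | "Detectability"
  | "Disclosure of information"
  | "Unawareness"
  | "Non-compliance";

interface Finding {
  app: string;              // app under analysis (dummy name here)
  evidence: string;         // what the test revealed
  categories: LinddunCategory[];
  threatIds?: string[];     // leaf nodes in the threat trees, e.g., "L_df6"
}

const example: Finding = {
  app: "example.mentalhealth.app", // hypothetical package name
  evidence: "Mood-journal entries transmitted over plain HTTP",
  categories: ["Disclosure of information", "Linkability"],
  threatIds: ["L_df6"], // data flow not fully protected
};

// Simple aggregation: how many findings fall under each category.
function tally(findings: Finding[]): Map<LinddunCategory, number> {
  const counts = new Map<LinddunCategory, number>();
  for (const f of findings)
    for (const c of f.categories) counts.set(c, (counts.get(c) ?? 0) + 1);
  return counts;
}

console.log(tally([example]));
```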
2.4 Related work
2.4.1 Security and privacy of mHealth apps in general
The broad category of mHealth apps includes several types of apps, such as wellness and
fitness apps (e.g., calorie counters, exercise trackers), personal health apps (e.g., diabetes
monitors, symptom checkers), and medical resource apps (e.g., drugs catalogues, medical
2 Page 6 of 42 Empir Software Eng (2023) 28:2
terminology libraries). In the past years, many studies have analyzed the security and pri-
vacy of mHealth apps in general. Some studies focus on the analysis of the more visible
aspects of the mHealth apps, looking into their privacy policies, user interfaces, documen-
tation, and websites (Adhikari et al. 2014; Sampat and Prabhakar 2017; Hutton et al. 2018;
Shipp and Blasco 2020).
For instance, the work of Hutton et al. (2018) contributed a set of heuristics for evaluating privacy characteristics of self-tracking apps. They introduced 26 heuristics under the four categories of (a) notice and awareness, (b) choice and consent, (c) access and participation, and (d) social disclosure usability. A group of 4 HCI and software engineering researchers then analyzed 64 mHealth apps for self-tracking using these heuristics by reviewing the apps' user interfaces, terms of service, and privacy policies, reaching a moderate agreement (kappa = .45) (Hutton et al. 2018). This work mentions that disagreements between raters mainly arose from confusion over privacy policies, which are often unclear regarding language and intent. We can also add that privacy lawyers would be better suited for this type of analysis, because the apps' terms of service and privacy policies are legal artefacts. Nonetheless, their results show that most apps performed poorly against the proposed heuristics; app maturity was not a predictor of enhanced privacy; and apps that collected health data (e.g., exercise and weight) performed worse than other self-tracking apps (Hutton et al.
2018). Adhikari et al. (2014) and Sampat and Prabhakar (2017) have also warned about the
issues concerning insufficient privacy policies (e.g., unclear or non-existent), lack of data
access and deletion functions, and opaqueness in the data sharing with 3rd-parties. Shipp
and Blasco (2020) also looked into menstrual apps in order to show that developers often
fail to consider menstruation and sex data as especially sensitive, mentioning only common
pieces of personal data (e.g., name, email) in their privacy policies.
Other studies have privileged the less visible aspects of mHealth apps' security and privacy, e.g., using pentesting tools to analyze the apps' code, network traffic, logs, and
generated data (He et al. 2014; Papageorgiou et al. 2018; Hussain et al. 2018; LaMalva
and Schmeelk 2020). The earlier work of He et al. (2014) expressed concerns about the
widespread use of unsecured Internet communication and 3rd-party servers by mHealth
apps. Papageorgiou et al. (2018) carried out a more in-depth security and privacy anal-
ysis, revealing several vulnerabilities, such as unnecessary permissions, use of insecure
cryptography, hard-coding and logging of sensitive information, insecure servers' SSL con-
figuration, and transmission of personal data to 3rd-parties. Similar threats have also been
identified in other studies as reported by Hussain et al. (2018) and LaMalva and Schmeelk
(2020).
The above-mentioned studies have contributed significantly to researchers' and practitioners' understanding of security and privacy threats in mHealth apps in general.
However, these studies often analyze mHealth apps in wellness and fitness categories
instead of apps with highly sensitive data, such as those in the mental health area. From 2020 to 2022, a sharply increasing number of users turned to mental health apps as an effect of the COVID-19 pandemic; this context motivated our research team to perform this study. Nonetheless, even after the pandemic, this trend is expected to continue with the increased adoption of mental health services supported by mHealth technologies.
2.4.2 Security and privacy of mental health apps
As shown in Table 1, we identified eight studies related to the security and privacy of mental health apps. However, the existing related work has a limited scope of analysis. Most researchers focus only on the apps' privacy policies (O'Loughlin et al. 2019; Powell et al.
Table 1 Comparison of the existing works on privacy and/or security for mental health apps according to their scope of analysis

Ref | Year | N. of Apps | Condition | Limitations
--- | --- | --- | --- | ---
Huang and Bashir (2017) | 2017 | 274 | Anxiety | (i) Limited to anxiety apps. (ii) Analyzes only apps' permissions.
Huckvale et al. (2019) | 2019 | 36 | Depression and smoking cessation | (i) Limited to depression and smoking cessation apps. (ii) Analyzes only the privacy policies and network traffic.
Muchagata and Ferreira (2019) | 2019 | 18 | Dementia | (i) Limited to dementia apps. (ii) Analyzes only PPs and permissions and performs a GDPR compliance check.
O'Loughlin et al. (2019) | 2019 | 116 | Depression | (i) Limited to depression apps. (ii) Analyzes only the PPs.
Parker et al. (2019) | 2019 | 61 | Mental health | (i) Analyzes only the apps' permissions and PPs.
Powell et al. (2018) | 2019 | 70 | Diabetes and mental health | (i) Analyzes only the complexity of PPs.
Robillard et al. (2019) | 2019 | 369 | Track and mood | (i) Analyzes only the PPs and terms of agreement.
Rosenfeld et al. (2017) | 2017 | 33 | Dementia | (i) Limited to dementia apps. (ii) Analyzes only the PPs.
This work | - | 27 | Mental health | (i) Limited to top-ranked mental health apps. (ii) PPs analyzed using AI-assisted tools.

Abbreviations: Privacy Policy (PP), Privacy Impact Assessment (PIA), Permissions (Per), Data Transfer (DT), Data Stored (DS), Server Configuration (SC), User Control (UC)
2018; Robillard et al. 2019; Rosenfeld et al. 2017). Another work investigates only the apps' permissions (Huang and Bashir 2017), or the combination of apps' permissions and privacy policies (Parker et al. 2019). Another study (Muchagata and Ferreira 2019) proposes a scope of analysis to check for GDPR compliance, i.e., assessing the types of collected data, apps' permissions, and evidence of consent management, data portability, and data deletion features. Such approaches mostly reveal Unawareness and Non-compliance issues, missing the other categories of privacy threats. That means their results lack the depth of penetration tests needed to identify the presence of concrete privacy threats.
One study has also examined the apps' network traffic and data transmissions, in addition
to assessing the privacy policies (Huckvale et al. 2019). Looking into the network traffic
enabled the identification of data that is transmitted to 3rd-parties, such as marketing and
advertising services. To some extent, this study may cover all LINDDUN threat categories,
but it misses many branches in the LINDDUN threat trees. For instance, logs and stored data
are not inspected for data leaks and weak access control; nor is the reverse-engineered code
reviewed for insecure coding. These types of inspections are important in order to achieve
breadth and depth of privacy analysis.
In this work, we employed an extensive assessment framework for the privacy analysis
of mental health apps, detailed in Section 3. In brief, our privacy analysis work included
a series of penetration tests, with static and dynamic analysis, inspecting apps' permissions, network traffic, identified servers, reverse-engineered code, databases, and generated
data, which had not been explored in the related work shown in Table 1. Furthermore, the
proposed privacy analysis also involves communication with companies and software devel-
opers by requesting the PIAs of the apps and discussing findings through the responsible
disclosure process.
3 Methodology
This section presents the methodology used for the privacy assessment of the mental health
apps. Figure 2 gives an overview of the main processes, specific steps, and tools used
throughout the study.
3.1 Apps selection process
For this study, we selected mobile applications developed for Android devices that can be
downloaded from Google Play Store. The initial identification of potential apps for the
study was performed using the google-play-scraper Node.js module,¹ essentially
searching for free mental health apps in English (see Fig. 2), and setting the location to
Australia.
This search resulted in 250 apps, as 250 is the default maximum number of results set by Google Play Store. In this study, we are particularly interested in top-ranked mental health apps.
The main reason for focusing on top-ranked apps is that we sought to concentrate efforts on
the most popular mental health apps, in which privacy impacts may affect millions of users.
In order to select only the top-ranked apps, we introduced the following refinement criteria during the app selection process: apps should have at least 100K downloads, a rating above 4 stars, and be categorized as MEDICAL or HEALTH AND FITNESS.
¹ The google-play-scraper is a Node.js module to scrape application data from the Google Play store. Website: https://www.npmjs.com/package/google-play-scraper
Fig. 2 An overview of the methodology used for investigating the privacy issues in mental health apps
This refinement reduced
our sample to 37 Android apps.
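For illustration, the search and refinement described above could be scripted roughly as follows with the google-play-scraper Node.js module. This is a hedged sketch, not the authors' script: the search options (term, num, lang, country, price) and result fields (minInstalls, score, genreId) reflect that module's documented API as we understand it, and the genre identifiers MEDICAL and HEALTH_AND_FITNESS are assumptions to verify.

```typescript
// Sketch of the app-identification step using the google-play-scraper
// Node.js module cited above. Field names (minInstalls, score, genreId)
// are assumptions based on the module's documented output shape.
import gplay from "google-play-scraper";

async function findTopRankedApps() {
  // Search for free mental health apps in English, location set to Australia.
  const results = await gplay.search({
    term: "mental health",
    num: 250,        // Google Play's default maximum
    lang: "en",
    country: "au",
    price: "free",   // assumption: restrict the search to free apps
  });

  // Fetch full details and apply the paper's refinement criteria:
  // >= 100K downloads, rating above 4 stars, MEDICAL or HEALTH_AND_FITNESS.
  const details = await Promise.all(results.map((r) => gplay.app({ appId: r.appId })));
  return details.filter(
    (a) =>
      a.minInstalls >= 100_000 &&
      a.score > 4 &&
      ["MEDICAL", "HEALTH_AND_FITNESS"].includes(a.genreId)
  );
}

findTopRankedApps().then((apps) => console.log(apps.length, "candidate apps"));
```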
We also wanted to limit our analysis to apps that require health and/or personal data as inputs in order to be functional and that transmit users' data to a remote host. That is, we wanted to avoid analyzing apps that do not collect any personal data; e.g., a mindfulness app that only plays music would most likely have fewer privacy impacts. To identify these types of apps, we manually inspected the apps to figure out whether they store and transmit personal data of their users. This process was carried out by two researchers who jointly tested the apps
and reached a consensus on whether to include or exclude the app from the study. There
were no disagreements between the researchers in this step. This manual analysis included
several tasks such as downloading the apps, reading their descriptions, creating and using
dummy accounts to use the apps, entering information, and checking the apps' functionalities.
The analysis stopped once we gathered sufficient evidence. That is, if an app collects and
stores or transmits at least one item of personal data (e.g., username, password, email, mood
levels, journal entry), we would consider this app for further analysis. We adopted this
low threshold for personal data collection because we assumed that even if an app adopted
stringent data minimisation strategies, there would still be potential privacy risks given the
rather sensitive context of the mental health apps.
This analysis identified nine apps that do not collect and transmit personal/health data of
users. Also, one of the apps provided forum and chat functionalities to users (e.g., to discuss problems that they face or to create support groups). The analysis of this app would reveal information about other users on the platform. The mere collection of personal data of other users (i.e., usernames, posts, replies) would require a full ethics application
to address potential privacy issues. Therefore, we omitted these 10 apps from our analysis
and selected the remaining 27 apps to perform the privacy-centred security analysis.
3.2 Privacy analysis process
As shown in Fig. 2, after filtering down to the 27 apps selected for analysis, we performed static
and dynamic security analysis to