Download Professional Data Engineer on Google Cloud Platform.Professional-Data-Engineer.Dump4Pass.2024-10-24.38q.vcex

Vendor: Google
Exam Code: Professional-Data-Engineer
Exam Name: Professional Data Engineer on Google Cloud Platform
Date: Oct 24, 2024
File Size: 277 KB

How to open VCEX files?

Files with VCEX extension can be opened by ProfExam Simulator.

Demo Questions

Question 1
Your weather app queries a database every 15 minutes to get the current temperature. The frontend is powered by Google App Engine and serves millions of users. How should you design the frontend to respond to a database failure?

  A. Issue a command to restart the database servers.
  B. Retry the query with exponential backoff, up to a cap of 15 minutes.
  C. Retry the query every second until it comes back online to minimize staleness of data.
  D. Reduce the query frequency to once every hour until the database comes back online.
Correct answer: B
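The retry strategy named in the keyed answer, exponential backoff with a cap, can be sketched in a few lines of Python. This is a minimal illustration, not App Engine or database client code: `run_query`, `backoff_delays`, and `query_with_backoff` are hypothetical names, the 1-second base delay and the doubling factor are assumptions, and only the 15-minute cap comes from the question.

```python
import random
import time


def backoff_delays(base=1.0, factor=2.0, cap=15 * 60):
    """Yield retry delays in seconds: base, base*factor, ... never above cap."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def query_with_backoff(run_query, max_attempts=20,
                       sleep=time.sleep, jitter=random.random):
    """Call run_query() until it succeeds or max_attempts is reached.

    run_query is a hypothetical zero-argument callable that raises on failure.
    """
    delays = backoff_delays()
    for attempt in range(1, max_attempts + 1):
        try:
            return run_query()
        except Exception:  # assumption: real code would catch only transient errors
            if attempt == max_attempts:
                raise
            # Sleep a random fraction of the current delay so that many
            # clients do not all retry at the same instant.
            sleep(next(delays) * jitter())
```

Passing `sleep` and `jitter` as parameters keeps the sketch testable without real waiting. The random fraction is an added assumption meant to spread out retries from many clients.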
Question 2
You want to process payment transactions in a point-of-sale application that will run on Google Cloud Platform. Your user base could grow exponentially, but you do not want to manage infrastructure scaling. 
Which Google database service should you use? 
 
  A. Cloud SQL
  B. BigQuery
  C. Cloud Bigtable
  D. Cloud Datastore
Correct answer: D
Question 3
You need to store and analyze social media postings in Google BigQuery at a rate of 10,000 messages per minute in near real-time. You initially design the application to use streaming inserts for individual postings. Your application also performs data aggregations right after the streaming inserts. You discover that the queries after streaming inserts do not exhibit strong consistency, and reports from the queries might miss in-flight data. How can you adjust your application design?

  A. Re-write the application to load accumulated data every 2 minutes.
  B. Convert the streaming insert code to batch load for individual messages.
  C. Load the original message to Google Cloud SQL, and export the table every hour to BigQuery via streaming inserts.
  D. Estimate the average latency for data availability after streaming inserts, and always run queries after waiting twice as long.
Correct answer: D
Question 4
Business owners at your company have given you a database of bank transactions. Each row contains the user ID, transaction type, transaction location, and transaction amount. They ask you to investigate what type of machine learning can be applied to the data. Which three machine learning applications can you use? 
(Choose three.) 
 
  A. Supervised learning to determine which transactions are most likely to be fraudulent.
  B. Unsupervised learning to determine which transactions are most likely to be fraudulent.
  C. Clustering to divide the transactions into N categories based on feature similarity.
  D. Supervised learning to predict the location of a transaction.
  E. Reinforcement learning to predict the location of a transaction.
  F. Unsupervised learning to predict the location of a transaction.
Correct answer: BCD
Question 5
Your company has hired a new data scientist who wants to perform complicated analyses across very large datasets stored in Google Cloud Storage and in a Cassandra cluster on Google Compute Engine. The scientist primarily wants to create labelled data sets for machine learning projects, along with some visualization tasks. She reports that her laptop is not powerful enough to perform her tasks and it is slowing her down. You want to help her perform her tasks. What should you do? 
  A. Run a local version of Jupyter on the laptop.
  B. Grant the user access to Google Cloud Shell.
  C. Host a visualization tool on a VM on Google Compute Engine.
  D. Deploy Google Cloud Datalab to a virtual machine (VM) on Google Compute Engine.
Correct answer: D
Question 6
MJTelco Case Study 
Company Overview 
MJTelco is a startup that plans to build networks in rapidly growing, underserved markets around the world. 
The company has patents for innovative optical communications hardware. Based on these patents, they can create many reliable, high-speed backbone links with inexpensive hardware. 
 
Company Background  
Founded by experienced telecom executives, MJTelco uses technologies originally developed to overcome communications challenges in space. Fundamental to their operation, they need to create a distributed data infrastructure that drives real-time analysis and incorporates machine learning to continuously optimize their topologies. Because their hardware is inexpensive, they plan to overdeploy the network allowing them to account for the impact of dynamic regional politics on location availability and cost. 
Their management and operations teams are situated all around the globe, creating a many-to-many relationship between data consumers and providers in their system. After careful consideration, they decided public cloud is the perfect environment to support their needs.
 
Solution Concept  
MJTelco is running a successful proof-of-concept (PoC) project in its labs. They have two primary needs: 
  • Scale and harden their PoC to support significantly more data flows generated when they ramp to more than 50,000 installations. 
  • Refine their machine-learning cycles to verify and improve the dynamic models they use to control topology definition. 
MJTelco will also use three separate operating environments – development/test, staging, and production – to meet the needs of running experiments, deploying new features, and serving production customers. 
 
Business Requirements  
  • Scale up their production environment with minimal cost, instantiating resources when and where needed in an unpredictable, distributed telecom user community. 
  • Ensure security of their proprietary data to protect their leading-edge machine learning and analysis. 
  • Provide reliable and timely access to data for analysis from distributed research workers.
  • Maintain isolated environments that support rapid iteration of their machine-learning models without affecting their customers. 
 
Technical Requirements 
  • Ensure secure and efficient transport and storage of telemetry data.
  • Rapidly scale instances to support between 10,000 and 100,000 data providers with multiple flows each.
  • Allow analysis and presentation against data tables tracking up to 2 years of data, storing approximately 100 million records/day.
  • Support rapid iteration of monitoring infrastructure focused on awareness of data pipeline problems both in telemetry flows and in production learning cycles. 
 
CEO Statement 
Our business model relies on our patents, analytics and dynamic machine learning. Our inexpensive hardware is organized to be highly reliable, which gives us cost advantages. We need to quickly stabilize our large distributed data pipelines to meet our reliability and capacity commitments. 
 
CTO Statement 
Our public cloud services must operate as advertised. We need resources that scale and keep our data secure. We also need environments in which our data scientists can carefully study and quickly adapt our models. Because we rely on automation to process our data, we also need our development and test environments to work as we iterate. 
 
CFO Statement 
The project is too large for us to maintain the hardware and software required for the data and analysis. Also, we cannot afford to staff an operations team to monitor so many data feeds, so we will rely on automation and infrastructure. Google Cloud’s machine learning will allow our quantitative researchers to work on our high-value problems instead of problems with our data pipelines. 
 
You create a new report for your large team in Google Data Studio 360. The report uses Google BigQuery as its data source. It is company policy to ensure employees can view only the data associated with their region, so you create and populate a table for each region. You need to enforce the regional access policy to the data. 
Which two actions should you take? (Choose two.)  
  A. Ensure all the tables are included in a global dataset.
  B. Ensure each table is included in a dataset for a region.
  C. Adjust the settings for each table to allow a related region-based security group view access.
  D. Adjust the settings for each view to allow a related region-based security group view access.
  E. Adjust the settings for each dataset to allow a related region-based security group view access.
Correct answer: BE
Question 7
You are implementing security best practices on your data pipeline. Currently, you are manually executing jobs as the Project Owner. You want to automate these jobs by taking nightly batch files containing non-public information from Google Cloud Storage, processing them with a Spark Scala job on a Google Cloud Dataproc cluster, and depositing the results into Google BigQuery. 
How should you securely run this workload? 
 
  A. Restrict the Google Cloud Storage bucket so only you can see the files.
  B. Grant the Project Owner role to a service account, and run the job with it.
  C. Use a service account with the ability to read the batch files and to write to BigQuery.
  D. Use a user account with the Project Viewer role on the Cloud Dataproc cluster to read the batch files and write to BigQuery.
Correct answer: C
Question 8
You are using Google BigQuery as your data warehouse. Your users report that the following simple query is running very slowly, no matter when they run the query:  
SELECT country, state, city FROM [myproject:mydataset.mytable] GROUP BY country  
You check the query plan for the query and see the following output in the Read section of Stage:1: 
 
 
 
What is the most likely cause of the delay for this query? 
 
  A. Users are running too many concurrent queries in the system.
  B. The [myproject:mydataset.mytable] table has too many partitions.
  C. Either the state or the city columns in the [myproject:mydataset.mytable] table have too many NULL values.
  D. Most rows in the [myproject:mydataset.mytable] table have the same value in the country column, causing data skew.
Correct answer: D
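The keyed answer attributes the delay to data skew in the grouping column. As a rough illustration of what skew means, the sketch below measures what share of rows fall under the most common key. The function name `key_skew` and the sample country values are made up for this example; it runs on an in-memory list and does not query BigQuery.

```python
from collections import Counter


def key_skew(values):
    """Return (most_common_key, share_of_rows) for a non-empty grouping column."""
    counts = Counter(values)
    key, rows = counts.most_common(1)[0]
    return key, rows / len(values)


# Made-up sample: 8 of 10 rows share one country value.
sample = ["US"] * 8 + ["CA", "MX"]
print(key_skew(sample))
```

A share close to 1.0 means almost all rows land in a single group, so the work for that group cannot be spread evenly.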
Question 9
Your globally distributed auction application allows users to bid on items. Occasionally, users place identical bids at nearly identical times, and different application servers process those bids. Each bid event contains the item, amount, user, and timestamp. You want to collate those bid events into a single location in real time to determine which user bid first. What should you do? 
 
  A. Create a file on a shared file system and have the application servers write all bid events to that file. Process the file with Apache Hadoop to identify which user bid first.
  B. Have each application server write the bid events to Cloud Pub/Sub as they occur. Push the events from Cloud Pub/Sub to a custom endpoint that writes the bid event information into Cloud SQL.
  C. Set up a MySQL database for each application server to write bid events into. Periodically query each of those distributed MySQL databases and update a master MySQL database with bid event information.
  D. Have each application server write the bid events to Google Cloud Pub/Sub as they occur. Use a pull subscription to pull the bid events using Google Cloud Dataflow. Give the bid for each item to the user in the bid event that is processed first.
Correct answer: B
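Each bid event in the question carries its own timestamp, so once the events are collated in one place, the first bidder can be decided by event time rather than by the order in which events happen to arrive. The sketch below illustrates that comparison only. The `Bid` class and `first_bidders` function are hypothetical, identical bids are assumed to mean the same item and amount, and no Cloud Pub/Sub or Cloud SQL calls are shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bid:
    item: str
    amount: float
    user: str
    timestamp: float  # event time in seconds, recorded when the bid was placed


def first_bidders(events):
    """Map each (item, amount) pair to the Bid with the earliest timestamp."""
    earliest = {}
    for bid in events:
        key = (bid.item, bid.amount)
        # Compare event timestamps, not arrival order.
        if key not in earliest or bid.timestamp < earliest[key].timestamp:
            earliest[key] = bid
    return earliest


# Made-up events that arrive out of order: bob's bid arrives first,
# but alice's timestamp is earlier.
events = [
    Bid("lamp", 50.0, "bob", 1002.0),
    Bid("lamp", 50.0, "alice", 1001.5),
]
print(first_bidders(events)[("lamp", 50.0)].user)
```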
Question 10
You set up a streaming data insert into a Redis cluster via a Kafka cluster. Both clusters are running on Compute Engine instances. You need to encrypt data at rest with encryption keys that you can create, rotate, and destroy as needed. What should you do? 
 
  A. Create a dedicated service account, and use encryption at rest to reference your data stored in your Compute Engine cluster instances as part of your API service calls.
  B. Create encryption keys in Cloud Key Management Service. Use those keys to encrypt your data in all of the Compute Engine cluster instances.
  C. Create encryption keys locally. Upload your encryption keys to Cloud Key Management Service. Use those keys to encrypt your data in all of the Compute Engine cluster instances.
  D. Create encryption keys in Cloud Key Management Service. Reference those keys in your API service calls when accessing the data in your Compute Engine cluster instances.
Correct answer: B