
DTS311TC Final Year Project Application of Smart Car for Line Following and Large Model Integration

Published: 2025-11-08



Abstract

In modern retail environments, efficiency and automation are crucial to meet increasing customer demands and reduce operational costs. This proposal focuses on the development of a smart car system designed to operate in a mapped environment, starting from an initial point where the car utilizes the large AI model "iFLYTEK Spark" to communicate with humans. The smart car records the required items mentioned during the conversation and navigates using SLAM (Simultaneous Localization and Mapping) to reach the line-following section. The line-following mechanism is implemented based on the RKNN platform, allowing precise navigation through predefined paths. Finally, the smart car arrives at the supermarket and identifies the items mentioned in the dialogue to complete the task. This approach aims to automate the process of item retrieval and delivery in retail settings,  improving  efficiency  and  enhancing  user  experience.  The  integration  of  advanced  AI technologies and automated navigation presents significant potential for revolutionizing traditional retail logistics, making operations more streamlined, and providing a seamless customer experience.

1. Introduction

1.1 Introduction and Background

In the modern retail industry, supermarkets face significant challenges such as rising labor costs, increasing customer demand for convenience, and the need for efficient operations. To address these issues, automation and intelligent systems are being introduced to streamline processes, enhance customer satisfaction, and reduce operational costs.

The concept of a smart car in retail environments aims to address these challenges by providing autonomous  navigation,  item  identification,  and  customer  interaction.  Equipped  with  advanced sensors and AI technologies, the smart car surpasses traditional inventory and logistics robots by offering multifunctional capabilities tailored to the needs of a retail setting. By leveraging sensor fusion, computer vision, and deep learning, the smart car can effectively navigate the supermarket, assist customers, and manage inventory tasks with minimal human intervention.

The integration of large AI models, such as "iFLYTEK Spark," further enhances the smart car's capabilities, particularly in natural language understanding and dialogue management. This enables the smart car to engage in more meaningful interactions with customers, understand their needs, and accurately perform the requested tasks. The use of SLAM (Simultaneous Localization and Mapping) technology provides the smart car with the ability to navigate complex environments autonomously, while RKNN-based line following ensures that the vehicle can adhere to predefined paths effectively.

1.2 Scope and Objectives

The scope of this project is to develop a smart car system that can autonomously navigate in a supermarket environment, interact with humans to gather information on required items, and carry out  the  retrieval  and  delivery  of  those  items.  The  key  components  of  this   system  include SLAM-based navigation, RKNN-powered line-following, and the integration of a large AI model for advanced human-robot interaction (Thrun, Burgard, & Fox, 2005; Rockchip, 2019).

The  specific  objectives  of  this  project  are  to  develop  a  multifunctional  smart  car  capable  of autonomous navigation, human interaction, line following, and item recognition. The project aims to create a navigation system that enables the smart car to autonomously explore and operate within a mapped environment using SLAM technology. To facilitate effective human-robot interaction, the "iFLYTEK Spark" large AI model will be employed, allowing the smart car to understand verbal instructions and record required items. Additionally, an RKNN-based line-following mechanism will be implemented to ensure precise path adherence within the supermarket. The smart car will also be equipped with the ability to recognize items mentioned during interaction and autonomously retrieve them  from  supermarket  shelves.  Such  AI-driven  mechanisms  align  with  advancements  in  deep learning applications for real-world scenarios (Goodfellow, Bengio, & Courville, 2016).

Ultimately, the project seeks to improve the efficiency of item retrieval and delivery in a retail environment, reducing reliance on human labor and enhancing the overall customer experience. The integration of these components will lead to a fully functional smart car system capable of transforming traditional retail logistics by providing automated solutions for item management, customer assistance, and operational efficiency.

2. Literature Review

The development of autonomous systems in retail environments is an area of active research and innovation. This literature review will explore key technologies and methodologies that have been previously studied and implemented to address similar challenges in retail automation.

2.1 Autonomous Navigation in Retail

Autonomous navigation has been a critical focus in retail automation, aiming to give robots the capability to navigate crowded and dynamic environments effectively. Research studies have demonstrated the use of SLAM (Simultaneous Localization and Mapping) technology for creating detailed maps of supermarket layouts, enabling autonomous vehicles to navigate efficiently. For example, Thrun et al. (2005), in their seminal work on probabilistic robotics, introduced techniques that form the foundation for modern SLAM methods, allowing for reliable mapping and localization in dynamic environments. Algorithms such as GMapping and Hector SLAM have been widely used to address localization and mapping in unknown environments, making autonomous systems more efficient in their navigation.

The  mathematical  foundation  of  SLAM  involves  probabilistic  estimation,  often  modeled  using Kalman filters or particle filters. The core problem in SLAM is to estimate the state of the robot (e.g., position and orientation) while simultaneously building a map of the environment. Techniques such as Extended Kalman Filters (EKF) and Particle Filters (PF) are employed to handle uncertainties in measurements and robot motion. These methods allow for continuous map building and correction, which is crucial in dynamic supermarket environments where aisles may change frequently.

The Extended Kalman Filter (EKF) used in SLAM can be expressed mathematically as follows:

State Prediction:

Predicts the robot's next state based on its previous state estimate and control input.

$$\hat{x}_{k|k-1} = f(\hat{x}_{k-1|k-1}, u_k), \qquad x_k = f(x_{k-1}, u_k) + w_k \tag{1}$$

$\hat{x}_{k|k-1}$: Predicted state at time step $k$.

$f(\cdot)$: Motion model, which maps the previous state and control input to the predicted state.

$u_k$: Control input, such as velocity or steering angle.

$w_k$: Process noise, accounting for uncertainty in motion. It is assumed to be zero-mean, so it affects the predicted covariance rather than the predicted mean.

Kalman Gain:

Calculates the optimal weighting for the measurement update, balancing prediction uncertainty against measurement noise.

$$K_k = P_{k|k-1} H_k^{T} \left( H_k P_{k|k-1} H_k^{T} + R_k \right)^{-1} \tag{2}$$

$P_{k|k-1}$: Predicted covariance, representing the uncertainty in the state prediction.

$H_k$: Jacobian matrix of the measurement model.

$R_k$: Measurement noise covariance, accounting for sensor inaccuracies.

State Update:

Corrects the predicted state using the measurement to yield an updated estimate.

$$\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k \left( z_k - h(\hat{x}_{k|k-1}) \right) \tag{3}$$

$z_k$: Actual measurement from the environment.

$h(\cdot)$: Measurement model, mapping the predicted state to the measurement space.

These steps allow the robot to build a continuously updated map of the environment while estimating its position accurately. This is especially critical in retail settings where the layout may frequently change due to dynamic obstacles like customers and inventory adjustments.
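The prediction, gain, and update steps described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the project's implementation: the unicycle motion model, the position-only measurement model, and every numeric value (`Q`, `R`, `dt`, the initial state) are assumptions chosen for the example.

```python
import numpy as np

def motion_model(x, u, dt):
    """Assumed unicycle model f(x, u): state [px, py, theta], control [v, omega]."""
    px, py, theta = x
    v, omega = u
    return np.array([px + v * np.cos(theta) * dt,
                     py + v * np.sin(theta) * dt,
                     theta + omega * dt])

def motion_jacobian(x, u, dt):
    """Jacobian of the motion model with respect to the state."""
    theta = x[2]
    v = u[0]
    return np.array([[1.0, 0.0, -v * np.sin(theta) * dt],
                     [0.0, 1.0,  v * np.cos(theta) * dt],
                     [0.0, 0.0, 1.0]])

def ekf_step(x, P, u, z, Q, R, dt):
    """One EKF cycle: state prediction, Kalman gain, state update."""
    # State prediction and predicted covariance
    x_pred = motion_model(x, u, dt)
    F = motion_jacobian(x, u, dt)
    P_pred = F @ P @ F.T + Q
    # Assumed measurement model h(x): the sensor observes position only
    H = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
    # Kalman gain
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    # State update with the measurement residual
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Illustrative values only
x0 = np.zeros(3)
P0 = np.eye(3) * 0.1
Q = np.eye(3) * 0.01
R = np.eye(2) * 0.05
x1, P1 = ekf_step(x0, P0, u=np.array([1.0, 0.0]),
                  z=np.array([0.12, 0.0]), Q=Q, R=R, dt=0.1)
```

In this example the motion model predicts an x-position of 0.10 and the sensor reports 0.12, so the updated estimate lands between the two, weighted by the Kalman gain.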

2.2 Human-Robot Interaction (HRI) Using Large AI Models

Human-robot  interaction  in  retail  environments  plays  a  crucial  role  in  enhancing  customer experience. The use of large AI models, like "iFLYTEK Spark" or GPT-based architectures, has proven effective in understanding natural language and engaging in meaningful dialogues. Vaswani et  al.  (2017)  introduced  the  Transformer  architecture,  which  has  become  a  core  component  in modern language models, enabling them to understand and generate human-like responses. These models significantly enhance the communication capabilities of autonomous robots, allowing them to comprehend  customer needs, provide recommendations,  and record  item requests  seamlessly, thereby contributing to an interactive and personalized shopping experience.

The Transformer architecture relies on self-attention mechanisms that allow the model to weigh different parts of an input sequence, making it highly efficient at handling conversational context.

The self-attention mechanism is computed as follows:

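$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left( \frac{Q K^{T}}{\sqrt{d_k}} \right) V \tag{4}$$

$Q$, $K$, $V$: Query, key, and value matrices derived from the input sequence.

$\frac{Q K^{T}}{\sqrt{d_k}}$: Scaled dot product, capturing relationships between input elements, where $d_k$ is the dimension of the key vectors.

$\mathrm{softmax}(\cdot)$: Normalizes attention scores to focus on key tokens.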

The Transformer also includes a feed-forward network at each position, which operates as follows:

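$$\mathrm{FFN}(x) = \mathrm{ReLU}(x W_1 + b_1)\, W_2 + b_2 \tag{5}$$

$x$: Input to the feed-forward network, i.e. the output of the attention sub-layer at a given position.

$W_1$, $W_2$: Trainable weight matrices.

$b_1$, $b_2$: Bias terms.

ReLU: Non-linear activation function, $\mathrm{ReLU}(a) = \max(0, a)$, which introduces non-linearity.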
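As a rough illustration of the self-attention and feed-forward computations described above, the following NumPy sketch applies both to a small random input. It is simplified on purpose: the same token matrix is used as query, key, and value (a real Transformer first applies learned projections), and all shapes and weights are arbitrary assumptions.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax with max-subtraction for numerical stability."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Return softmax(Q K^T / sqrt(d_k)) V together with the attention weights."""
    d_k = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))
    return weights @ V, weights

def feed_forward(x, W1, b1, W2, b2):
    """Position-wise feed-forward network: ReLU(x W1 + b1) W2 + b2."""
    return np.maximum(0.0, x @ W1 + b1) @ W2 + b2

rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))          # 4 tokens, model dimension 8 (assumed)
attended, weights = scaled_dot_product_attention(tokens, tokens, tokens)

W1, b1 = rng.normal(size=(8, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 8)), np.zeros(8)
ffn_out = feed_forward(attended, W1, b1, W2, b2)
```

Each row of `weights` sums to one, so every output token is a weighted average of the value vectors; the feed-forward network then transforms each position independently.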

2.3 Line Following and Path Planning

Line following is a foundational aspect of robotic navigation, particularly in controlled environments such as warehouses and supermarkets. Traditional methods use infrared sensors or cameras to detect lines, while recent advancements involve deep learning-based methods to improve robustness. The RKNN (Rockchip Neural Network) platform provides an efficient solution for edge AI, facilitating real-time line following on the vehicle itself. Combining RKNN deployment with the deep learning-based computer vision methods described by Goodfellow et al. (2016) makes it possible to build navigation systems that adapt to varying lighting and surface conditions, making them suitable for real-world retail environments.

The line-following task can be modeled as an image classification or segmentation problem, where a convolutional  neural  network  (CNN)  processes  camera  images  to  detect  the  path.  The  RKNN platform  allows  the  deployment  of such  models  efficiently  on  edge  devices,  enabling  real-time decision-making directly on the vehicle.

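Pooling Layer (e.g., Max Pooling):

$$y[i, j] = \max\left( x[i : i + k,\; j : j + k] \right) \tag{6}$$

$x[i : i + k,\; j : j + k]$: Region of the input feature map covered by the pooling window.

$k$: Size of the pooling window.

$y[i, j]$: Output value, representing the maximum value in the pooling region.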

Pooling layers improve computational efficiency and focus the model's attention on the most critical features of the path, enabling robust navigation in dynamic retail environments.
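The max-pooling operation can be illustrated with a short NumPy sketch. The `stride` parameter is an assumption added for the example: with `stride=1` the window indexing matches the form given above, while the default (`stride=k`) gives the non-overlapping pooling commonly used in CNNs.

```python
import numpy as np

def max_pool2d(x, k, stride=None):
    """Max pooling over a 2D feature map with a k x k window."""
    stride = k if stride is None else stride
    rows = (x.shape[0] - k) // stride + 1
    cols = (x.shape[1] - k) // stride + 1
    y = np.empty((rows, cols), dtype=x.dtype)
    for i in range(rows):
        for j in range(cols):
            r, c = i * stride, j * stride
            # Keep only the strongest activation inside the window
            y[i, j] = x[r:r + k, c:c + k].max()
    return y

feature_map = np.arange(16).reshape(4, 4)
pooled = max_pool2d(feature_map, k=2)
# pooled is [[5, 7], [13, 15]]
```

The 4 × 4 input shrinks to 2 × 2, which is the reduction in computation that pooling provides to later layers.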

3. Project Plans

The successful implementation of the smart car system for autonomous navigation and human-robot interaction  in  retail   environments  requires   a  well-structured  project  plan.  With  the  hardware components  already  available,  the  focus  will  shift  to  the  development,  testing,  and  deployment phases.   This   includes   integrating   navigation,    line-following,    and    large   AI    model-based communication  functionalities  into  a  cohesive  system.  Drawing  from  proven  methodologies  in autonomous  systems  (Zhang  &  Singh,  2017),  foundational  principles  of  deep  learning  (LeCun, Bengio, & Hinton, 2015), and advanced segmentation techniques (Shi & Malik, 2000), the project emphasizes robustness and efficiency to ensure reliable performance in real-world retail settings.

3.1 Smart Car System Design and Operational Framework

The proposed solution focuses on integrating autonomous navigation, human-robot interaction, and line-following capabilities to develop a smart car system that can navigate efficiently in a retail environment,  assist  customers,  and  retrieve  items  autonomously.  The  methodology  involves  the following steps:

3.1.1 SLAM-Based Navigation

To  enable  real-time  mapping  and  localization  in  a  dynamic  supermarket  environment,  SLAM (Simultaneous Localization and Mapping) will be implemented. The smart car will use LiDAR, IMU, and RGB cameras to build a real-time map of the environment and accurately determine its position. Path planning algorithms will then be developed to navigate through aisles and shelves, allowing the robot to move efficiently while avoiding obstacles.

The SLAM workflow, as depicted in Figure 1: SLAM Workflow Diagram, outlines the sequential processes involved in real-time mapping and localization. The system begins with data collection from sensors such as LiDAR and IMU, which is then processed through data association to match observations with existing map features. Tracks for new features are initialized, while outdated or irrelevant tracks and features are removed to maintain accuracy. The system predicts the smart car's next state based on control inputs like velocity and direction and dynamically updates the map to reflect changes in the environment. This iterative process ensures the smart car can adapt to dynamic environments  such  as  crowded  supermarkets  (Thrun,  Burgard,  &  Fox,  2005;  Engel,  Schöps,  & Cremers, 2014).

Figure 1: SLAM Workflow Diagram
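The data association step in this workflow, which decides whether an observation matches an existing map feature or starts a new track, could be sketched as a gated nearest-neighbour search. This is only one possible approach, not necessarily the one the project will use; the function name `associate`, the 2D point representation, and the 0.5 m gate are all assumptions for illustration.

```python
import numpy as np

def associate(observations, landmarks, gate=0.5):
    """Match each observed 2D point to its nearest known landmark.

    Returns (matches, new_features): matches is a list of
    (observation_index, landmark_index) pairs, and new_features lists the
    observation indices farther than `gate` from every landmark, which
    would initialise new tracks.
    """
    matches, new_features = [], []
    for obs_idx, obs in enumerate(observations):
        if len(landmarks) == 0:
            new_features.append(obs_idx)
            continue
        dists = np.linalg.norm(landmarks - obs, axis=1)
        best = int(np.argmin(dists))
        if dists[best] <= gate:
            matches.append((obs_idx, best))
        else:
            new_features.append(obs_idx)
    return matches, new_features

landmarks = np.array([[1.0, 1.0], [3.0, 2.0]])      # existing map features
observations = np.array([[1.1, 0.9], [5.0, 5.0]])   # current sensor detections
matches, new_features = associate(observations, landmarks)
# matches == [(0, 0)], new_features == [1]
```

Probabilistic SLAM systems typically gate on the Mahalanobis distance using the state covariance instead of a fixed Euclidean threshold; the fixed gate here keeps the example short.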

The SLAM system will continuously update the map as the robot moves, enabling the car to adapt to dynamic  changes  in  the  environment.  This  approach  aligns  with  the  techniques  introduced  in LSD-SLAM, a method for monocular visual SLAM that is highly adaptable to large-scale settings (Engel,  Schöps,  & Cremers, 2014). Furthermore, leveraging ORB-SLAM offers an accurate and versatile alternative for monocular SLAM applications, which is particularly valuable for robust, real-time localization (Mur-Artal, Montiel, & Tardos, 2015). The integration of object detection algorithms such as YOLOv3 also enhances the smart car's ability to detect and avoid obstacles with high precision and low latency (Redmon & Farhadi, 2018), while multi-view 3D object detection supports effective environmental perception across multiple perspectives (Chen et al., 2017). This comprehensive integration ensures that the smart car can navigate through a constantly changing supermarket environment in real time (Thrun, Burgard, & Fox, 2005).

Figure 2 shows the system architecture, including the interaction between the SLAM, AI interaction, and RKNN modules, as well as the sensors used for mapping and localization.

Figure 2: System Architecture Diagram

3.1.2 Human-Robot Interaction Using AI Models

The  smart  car  will  be  integrated  with  the  large  AI  model  "iFLYTEK  Spark"  to  enable  natural language processing (NLP) for meaningful interaction with customers. This capability allows the car to understand spoken customer queries, such as requests for specific products or assistance with store navigation.  The   system  also  incorporates   speech  recognition   and  intent  detection  modules  to accurately interpret customer needs, leveraging the self-attention mechanisms in Transformer models to handle complex conversational context (Vaswani et al., 2017). Additionally, pre-trained language models, such as BERT, have proven effective in enhancing the precision of intent detection, further refining the interaction quality in customer queries (Devlin et al., 2019).

Once the intent is detected, the system generates contextual responses, enabling the robot to engage with customers in a personalized manner. This interaction model is supported by techniques like the GPT-3 model, which enhances contextual response generation through advanced language modeling capabilities (Brown et al., 2020). This interaction will be key to enhancing the shopping experience by enabling the smart car to provide real-time assistance in a natural, human-like way.

Figure 3 illustrates the flow of interaction from customer speech input to the car’s response, showing the process of speech recognition, intent detection, and contextual response generation.

Figure 3: AI Interaction Flowchart
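The flow from recognised speech to intent detection, item recording, and response can be sketched as follows. The sketch does not call iFLYTEK Spark or any real speech API: the keyword matcher below is a stand-in for the large-model intent detector, and `CATALOG`, `DialogueState`, `detect_intent`, and `respond` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, field

# Hypothetical product catalogue used by the stand-in intent detector
CATALOG = {"milk", "bread", "apple", "water"}

@dataclass
class DialogueState:
    """Items the customer has requested so far."""
    requested_items: list = field(default_factory=list)

def detect_intent(utterance, catalog=CATALOG):
    """Stand-in for model-based intent detection: simple keyword spotting."""
    words = [w.strip(".,!?").lower() for w in utterance.split()]
    items = [w for w in words if w in catalog]
    intent = "request_item" if items else "unknown"
    return intent, items

def respond(utterance, state):
    """Record requested items and produce a contextual reply."""
    intent, items = detect_intent(utterance)
    if intent == "request_item":
        for item in items:
            if item not in state.requested_items:
                state.requested_items.append(item)
        return "Noted: " + ", ".join(items) + "."
    return "Sorry, which item would you like?"

state = DialogueState()
reply = respond("Please bring me milk and bread.", state)
# reply == "Noted: milk, bread."
# state.requested_items == ["milk", "bread"]
```

In the proposed system the text would come from a speech recognition module and the intent would be inferred by the large AI model; the recorded item list is what the later navigation and recognition stages consume.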

3.1.3 Line Following Using RKNN

The  RKNN  (Rockchip  Neural  Network)  platform  will  be  used  to  implement  a  line-following mechanism for the smart car. A Convolutional Neural Network (CNN) will be trained to detect floor lines and guide the robot along predefined paths within the supermarket. This setup allows the robot to  follow  a  specific  route  while  avoiding  obstacles  and  other  disruptions,  with  CNNs  proving effective  for  feature  extraction  in  image-based  tasks  (Goodfellow,  Bengio,  &  Courville,  2016). Additionally, the integration of MobileNet, an efficient neural network for mobile and embedded vision applications, supports edge AI processing on resource-limited devices, allowing real-time line detection and path correction (Howard et al., 2017).

As illustrated in Figure 4, the architecture demonstrates an RKNN-based framework for time-dependent state prediction. RKNN (Rockchip Neural Network) provides an efficient platform for deploying deep learning models on edge devices, enabling real-time processing. In this setup, the system takes the current state $y_r^i(t_j)$ and the control parameter $\mu(t_j)$ as inputs, processes them through multiple neural network layers, and outputs the predicted state $y_r^i(t_{j+1})$ for the next time step.

This  architecture  leverages  RKNN's  efficient  computation  capabilities,  which   are  particularly suitable for dynamic system modeling and real-time control applications. RKNN provides a robust platform for deploying deep learning models on edge devices, enabling high-performance inference even  on  resource-constrained  hardware  (Rockchip,  2019).  This  design  ensures  accurate  and consistent state predictions, making it ideal for real-world dynamic environments.

Figure 4: MLP for Time-State Prediction

Using edge AI enables the decision-making process to be executed instantly, ensuring the car reacts swiftly  to  environmental  changes.  This  capability  is  essential  for  navigating  narrow  aisles  and following  complex  paths,  which  are  often  challenging  for  traditional  line-following  algorithms. Moreover, implementing transfer learning techniques on edge devices has shown promise in further improving model performance and adaptability to dynamic environments (Pan & Yang, 2010).

Figure 5 shows a diagram illustrating how the CNN model processes the camera feed to detect the path and guide the smart car along its designated route.

Figure 5: CNN Line Following Diagram
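Once a model has produced a binary line mask from the camera image, turning that mask into a steering command can be as simple as a proportional controller on the line's horizontal offset. The sketch below assumes such a mask is already available (for example from a segmentation CNN); the function name, the region-of-interest height, and the gain are illustrative assumptions, and no RKNN API is used.

```python
import numpy as np

def steering_from_mask(mask, gain=1.0, roi_rows=20):
    """Convert a binary line mask (H x W) into a steering value in [-1, 1].

    Negative values steer left, positive values steer right, and None
    signals that no line pixels were found in the region of interest.
    """
    roi = mask[-roi_rows:]                 # rows closest to the car
    cols = np.nonzero(roi)[1]              # column indices of line pixels
    if cols.size == 0:
        return None
    center = (mask.shape[1] - 1) / 2.0
    error = (cols.mean() - center) / center   # normalised lateral offset
    return float(np.clip(gain * error, -1.0, 1.0))

mask = np.zeros((60, 81), dtype=np.uint8)
mask[:, 60] = 1                            # line to the right of image centre
command = steering_from_mask(mask)
# command == 0.5
```

Returning `None` when the line is lost lets the caller decide how to recover, for instance by slowing down or falling back to SLAM-based navigation.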

3.2 Experimental Design

To evaluate the performance of the proposed smart car system, a comprehensive experimental setup will be implemented, focusing on both individual modules—SLAM-based navigation, human-robot interaction,  and  line  following—and  the  integrated  system's  overall  functionality.  As  shown  in Figure 6: Task Map, a 4.5 × 4.5 m test area will be constructed. The experiment begins at point A, where the smart car initiates a dialogue with the user to obtain task details using the human-robot interaction module. After understanding and recording the requested item, the smart car employs the SLAM module to navigate autonomously to the yellow area, representing the item’s location. During this stage, the SLAM system ensures accurate localization and obstacle avoidance. Once the car reaches the yellow area and simulates item collection, it returns to A, switching to line-following mode. Finally, the car follows a predefined line from A to C to complete the task. This experimental design ensures a thorough evaluation of all system components under realistic conditions, providing valuable insights into their performance and integration.

Figure 6: Task Map
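The task sequence in this experiment (dialogue at A, SLAM navigation to the yellow area, return to A, then line following from A to C) can be modelled as a small state machine. The stage names and the `next_stage` helper below are hypothetical, chosen only to make the sequence explicit.

```python
from enum import Enum, auto

class Stage(Enum):
    DIALOGUE = auto()          # talk to the user at point A
    NAVIGATE_TO_ITEM = auto()  # SLAM navigation to the yellow area
    RETURN_TO_START = auto()   # drive back to point A
    LINE_FOLLOW = auto()       # follow the line from A to C
    DONE = auto()

TRANSITIONS = {
    Stage.DIALOGUE: Stage.NAVIGATE_TO_ITEM,
    Stage.NAVIGATE_TO_ITEM: Stage.RETURN_TO_START,
    Stage.RETURN_TO_START: Stage.LINE_FOLLOW,
    Stage.LINE_FOLLOW: Stage.DONE,
}

def next_stage(stage, stage_succeeded):
    """Advance to the next stage only when the current one has succeeded."""
    if stage is Stage.DONE or not stage_succeeded:
        return stage
    return TRANSITIONS[stage]

stage = Stage.DIALOGUE
history = [stage]
while stage is not Stage.DONE:
    stage = next_stage(stage, stage_succeeded=True)
    history.append(stage)
```

Keeping the stage logic separate from the individual modules makes it straightforward to log per-stage completion times during the evaluation.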

Each module and the integrated system will be assessed with the tests described below.

The SLAM system will be tested in a simulated supermarket environment featuring dynamically changing elements, such as moving obstacles and variable aisle layouts. Localization accuracy will be assessed by comparing the estimated positions from SLAM with ground truth data, following methods suggested by Thrun, Burgard, & Fox (2005). Additionally, map quality will be evaluated based on its completeness and accuracy in capturing the environment, while path-planning efficiency will be measured in terms of the time and distance required for the smart car to reach its destination (Engel, Schöps, & Cremers, 2014).
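One common way to quantify localization accuracy against ground truth is the root-mean-square error (RMSE) of the position estimates. The sketch below assumes the estimated and ground-truth trajectories are already time-aligned and expressed in the same coordinate frame; the function name and sample values are illustrative.

```python
import numpy as np

def localization_rmse(estimated, ground_truth):
    """RMSE of position error between two time-aligned trajectories (N x 2)."""
    estimated = np.asarray(estimated, dtype=float)
    ground_truth = np.asarray(ground_truth, dtype=float)
    if estimated.shape != ground_truth.shape:
        raise ValueError("trajectories must have the same shape")
    errors = np.linalg.norm(estimated - ground_truth, axis=1)
    return float(np.sqrt(np.mean(errors ** 2)))

# Illustrative values in metres
estimated = [[0.0, 0.0], [1.0, 1.0]]
ground_truth = [[0.0, 0.0], [1.0, 2.0]]
rmse = localization_rmse(estimated, ground_truth)   # sqrt(0.5), about 0.707 m
```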

For human-robot interaction (HRI), user studies will be conducted where participants engage with the smart car by requesting items or asking questions. Metrics such as speech recognition accuracy and response time will quantify system performance, while post-interaction surveys will assess user satisfaction and provide insights into the system’s usability in a retail setting. This dual approach of quantitative metrics and qualitative feedback is critical for capturing the nuanced effectiveness of the AI model, particularly in its natural language processing (NLP) and contextual response capabilities, as highlighted by Vaswani et al. (2017).
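Speech recognition accuracy is often reported as word error rate (WER), the word-level edit distance between the reference transcript and the recognised text divided by the reference length. The proposal does not name a specific metric, so using WER here is an assumption; the sketch below is a plain-Python version of that calculation.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)

wer = word_error_rate("bring me milk and bread", "bring me silk and bread")
# wer == 0.2 (one substitution in five reference words)
```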

The line-following system will be evaluated in supermarket-like environments with varied lighting conditions and floor surfaces. The accuracy of the smart car in following designated paths will be observed, alongside its ability to quickly respond to path deviations. These tests will assess the robustness of the CNN model for line detection and path tracking, crucial for stable navigation. Robustness across different environmental conditions and the system's ability to maintain consistent performance are key metrics, as discussed in Goodfellow, Bengio, & Courville (2016).

Finally, for the overall system evaluation, the smart car will autonomously navigate to designated locations, interact with users, and retrieve items based on their requests. Task completion time will be  recorded  to  evaluate  efficiency,  and  item  retrieval  accuracy  will  measure  the  system’s effectiveness  in  locating  and  delivering  requested  items.  Additionally,  system  stability  will  be monitored to ensure that no unexpected errors occur during operation. This end-to-end evaluation will confirm the system’s readiness for real-world deployment in dynamic retail environments, as recommended by Redmon & Farhadi (2018).

3.3 Progress Analysis and Gantt Chart

The progress of the smart car development project for deploying a large AI model and completing line-following tasks in a supermarket environment is tracked using a Gantt chart. This chart outlines the key phases, tasks, and their respective timelines, helping to visualize the schedule, dependencies, and progress for each task to ensure timely project completion.

3.3.1 Progress Analysis

The project is divided into five main phases: Preparation, Planning, Implementation, Evaluation & Improvement,  and  Summary.  Each  phase  contains  specific  tasks  and  milestones,  with  progress assessed based on the completion of these milestones.

• Phase 1: Preparation (Weeks 5-7)

The initial preparation tasks have been successfully completed. This phase included setting up the project environment, exploring the deployment requirements for the large AI model, defining the problem statement, and conducting an initial literature review to establish a solid foundation for integrating large-scale AI and line-following technology.

• Phase 2: Planning (Weeks 8-12)

The   planning  phase   is   underway,   focusing   on   designing   the   methodology   for   integrating SLAM-based navigation, line-following, and large AI model-driven interaction. Data collection for training the AI model and refining navigation and line-following algorithms has been completed. Proposal drafting is ongoing and expected to be finalized by the end of Week 9.

• Phase 3: Implementation (Weeks 10-17)

The  implementation  phase  is  set  to  begin  in  Week   10.  Key  tasks  include  developing  the  2D auto-encoder and memory module for handling real-time SLAM data, training a CNN model for line-following, and integrating the large AI model (e.g., iFLYTEK Spark) for customer interaction. Initial testing of the line-following and SLAM systems will be carried out towards the end of this phase.

• Phase 4: Evaluation & Improvement (Weeks 18-25)

Model evaluation and iterative improvements will start in Week 18. This phase includes testing the AI interaction model in a simulated supermarket environment, fine-tuning the line-following and navigation   algorithms,   and   comparing   different    configurations   to   optimize   overall    system performance.

• Phase 5: Summary (Weeks 26-28)

The final phase focuses on documentation, project presentation, and deployment preparation. A draft of the project report will be created, followed by the final presentation. The project will conclude with the submission of the dissertation and deployment plan for real-world testing.

3.3.2 Gantt Chart

The Gantt chart below provides a timeline for each phase and task, distributed across two semesters. This breakdown helps to track progress and manage project timelines effectively.

Figure 7: Project Timeline for Smart Car Development

4. Conclusion

This project sets out to develop a smart car system aimed at achieving autonomous navigation and human-robot interaction in retail environments. The system combines SLAM-based navigation for real-time mapping and localization, an AI-driven interaction model for processing and responding to customer requests, and a line-following mechanism to ensure precise movement along predefined routes. Through iterative testing and optimization, the smart car is designed to navigate dynamic spaces, assist customers effectively, and handle obstacles autonomously. This approach highlights the potential for AI-integrated robotics to enhance efficiency, reduce labor, and improve the customer experience in retail settings.

References

[1]  Thrun, S., Burgard, W., & Fox, D. (2005). Probabilistic Robotics. MIT Press. (pp. 51-72)

[2] Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press. (pp. 191-250)

[3] Redmon, J., & Farhadi, A. (2018). YOLOv3: An Incremental Improvement. arXiv preprint.

[4] Engel, J., Schöps, T., & Cremers, D. (2014). LSD-SLAM: Large-Scale Direct Monocular SLAM. European Conference on Computer Vision (ECCV).

[5] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is All You Need. Advances in Neural Information Processing Systems (NeurIPS).

[6] Chen, X., Ma, H., Wan, J., Li, B., & Xia, T. (2017). Multi-View 3D Object Detection Network for Autonomous Driving. IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7] Mur-Artal, R., Montiel, J. M. M., & Tardos, J. D. (2015). ORB-SLAM: A Versatile and Accurate Monocular SLAM System. IEEE Transactions on Robotics.

[8] Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL-HLT.

[9] Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., ... & Amodei, D. (2020). Language Models are Few-Shot Learners. Advances in Neural Information Processing Systems (NeurIPS).

[10] Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., ... & Adam, H. (2017). MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv preprint.

[11] Pan, S. J., & Yang, Q. (2010). A Survey on Transfer Learning. IEEE Transactions on Knowledge and Data Engineering, 22(10), 1345–1359.

[12] Rockchip. (2019). RKNN Toolkit User Guide. Retrieved from https://opensource.rock-chips.com/wiki_RKNN_Toolkit.

[13] Zhang, J., & Singh, S. (2017). LOAM: Lidar Odometry and Mapping in Real-time. Robotics: Science and Systems (RSS).

[14] LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep Learning. Nature, 521(7553), 436–444.

[15] Shi, J., & Malik, J. (2000). Normalized Cuts and Image Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 22(8), 888–905.